feat(wren-launcher): Expanded dbt<>Wren MDL conversions, added BigQuery support #1859
Conversation
Walkthrough

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as CLI User
    participant CLI as DbtAutoConvert (CLI)
    participant Conv as ConvertDbtProject / Converter
    participant DS as DataSource Builder
    participant MDL as WrenMDLManifest
    User->>CLI: run command (flag or prompt)
    CLI->>Conv: invoke conversion (IncludeStagingModels)
    Conv->>DS: parse profiles → build DataSource (BigQuery path possible)
    Conv->>Conv: read manifest.json & catalog.json
    alt semantic_manifest present
        Conv->>Conv: read semantic_manifest.json → extract enums, metrics, PKs
    end
    Conv->>Conv: apply staging filter (based on IncludeStagingModels)
    Conv->>Conv: generate relationships & metrics
    Conv->>MDL: assemble manifest (models, enums, metrics, relationships, datasources)
    MDL-->>CLI: return ConvertResult
    CLI-->>User: write/output MDL
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 3
🧹 Nitpick comments (4)
wren-launcher/commands/dbt/data_source.go (1)

119-137: Consider adding validation for keyfile_json format.

The function correctly extracts `keyfile_json` from the Additional map, but there's no validation of its format. Consider adding basic JSON validation to catch malformed service account keys early.

wren-launcher/commands/dbt/converter.go (3)
573-587: Consider sanitizing enum names for special characters.

The enum name generation uses model and column names directly, which might contain special characters that are invalid in enum names. Consider adding more robust sanitization:

```diff
 enumName = fmt.Sprintf("%s_%s_Enum", modelName, columnName)
 // Sanitize enum name
-enumName = strings.ReplaceAll(enumName, ".", "_")
+// Replace any non-alphanumeric characters with underscores
+re := regexp.MustCompile(`[^a-zA-Z0-9_]`)
+enumName = re.ReplaceAllString(enumName, "_")
+// Ensure it doesn't start with a number
+if len(enumName) > 0 && enumName[0] >= '0' && enumName[0] <= '9' {
+	enumName = "_" + enumName
+}
```
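Extracted as a standalone helper, the suggested sanitization would look roughly like this (a sketch; the function name is illustrative, not from the PR):

```go
package main

import (
	"fmt"
	"regexp"
)

// nonIdentRe matches any character that is not valid in an enum identifier.
var nonIdentRe = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// sanitizeEnumName replaces invalid characters with underscores and
// guards against identifiers that start with a digit.
func sanitizeEnumName(modelName, columnName string) string {
	name := fmt.Sprintf("%s_%s_Enum", modelName, columnName)
	name = nonIdentRe.ReplaceAllString(name, "_")
	if len(name) > 0 && name[0] >= '0' && name[0] <= '9' {
		name = "_" + name
	}
	return name
}

func main() {
	fmt.Println(sanitizeEnumName("orders.v2", "status-code")) // → orders_v2_status_code_Enum
}
```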
684-701: Add validation for measure references.

The code searches for input measures in the measure lookup but doesn't handle cases where measures might not be found. Consider adding validation and logging:

```diff
 // Find the model this metric is based on
 for model, measures := range measureLookup {
 	for _, inputMeasure := range inputMeasuresList {
 		if imMap, ok := inputMeasure.(map[string]interface{}); ok {
 			imName := getStringFromMap(imMap, "name", "")
 			if _, exists := measures[imName]; exists {
 				baseModel = model
 				break
+			} else if imName != "" {
+				pterm.Warning.Printf("Measure '%s' not found in model '%s'\n", imName, model)
 			}
 		}
 	}
 	if baseModel != "" {
 		break
 	}
 }
+if baseModel == "" && len(inputMeasuresList) > 0 {
+	pterm.Warning.Printf("Could not find model for metric '%s'\n", metricName)
+}
```
918-925: Handle edge cases in ref parsing.

The `parseRef` function assumes standard ref() syntax but might encounter variations. Consider handling more ref() variations:

```diff
 func parseRef(refStr string) string {
-	re := regexp.MustCompile(`ref\(['"]([^'"]+)['"]\)`)
+	// Handle ref() with optional spaces and both quote types
+	re := regexp.MustCompile(`ref\s*\(\s*['"]([^'"]+)['"]\s*\)`)
 	matches := re.FindStringSubmatch(refStr)
 	if len(matches) > 1 {
 		return matches[1]
 	}
 	return ""
 }
```
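The more tolerant regex accepts the common `ref()` spellings; a quick self-contained check (standalone sketch, not the converter's actual file):

```go
package main

import (
	"fmt"
	"regexp"
)

// refRegex tolerates optional whitespace and either quote style.
var refRegex = regexp.MustCompile(`ref\s*\(\s*['"]([^'"]+)['"]\s*\)`)

// parseRef extracts the model name from a dbt ref() expression,
// returning "" when the input is not a ref() call.
func parseRef(refStr string) string {
	matches := refRegex.FindStringSubmatch(refStr)
	if len(matches) > 1 {
		return matches[1]
	}
	return ""
}

func main() {
	fmt.Println(parseRef(`ref('stg_orders')`))  // → stg_orders
	fmt.Println(parseRef(`ref( "customers" )`)) // → customers
	fmt.Println(parseRef(`select 1`))           // → (empty string)
}
```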
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- wren-launcher/commands/dbt.go (3 hunks)
- wren-launcher/commands/dbt/converter.go (14 hunks)
- wren-launcher/commands/dbt/data_source.go (3 hunks)
- wren-launcher/commands/dbt/data_source_test.go (1 hunks)
- wren-launcher/commands/dbt/profiles.go (2 hunks)
- wren-launcher/commands/dbt/profiles_analyzer.go (2 hunks)
- wren-launcher/commands/dbt/wren_mdl.go (3 hunks)
- wren-launcher/commands/launch.go (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-07-09T02:43:20.433Z
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:50-53
Timestamp: 2025-07-09T02:43:20.433Z
Learning: In wren-launcher/commands/dbt/data_source.go, when encountering unsupported database types in convertConnectionToDataSource function, the user prefers to log a warning and continue processing instead of failing. The function should return nil, nil to allow processing to continue.
Applied to files:
wren-launcher/commands/dbt/profiles.go
wren-launcher/commands/dbt/profiles_analyzer.go
wren-launcher/commands/dbt.go
wren-launcher/commands/launch.go
wren-launcher/commands/dbt/data_source_test.go
wren-launcher/commands/dbt/data_source.go
wren-launcher/commands/dbt/converter.go
📚 Learning: 2025-06-20T02:37:21.292Z
Learnt from: cyyeh
PR: Canner/WrenAI#1763
File: wren-ai-service/src/pipelines/generation/semantics_description.py:31-33
Timestamp: 2025-06-20T02:37:21.292Z
Learning: In the wren-ai-service codebase, when adding new fields like "alias" to the output of functions that use Pydantic models for validation, the user prefers not to update the corresponding Pydantic model definitions to include these new fields.
Applied to files:
wren-launcher/commands/dbt/wren_mdl.go
📚 Learning: 2025-07-09T02:43:07.560Z
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:154-156
Timestamp: 2025-07-09T02:43:07.560Z
Learning: In the WrenAI codebase, the MapType methods for data sources should return the original type name for unknown types, rather than a default fallback like "string". This maintains consistency across all data source implementations.
Applied to files:
wren-launcher/commands/dbt/data_source.go
wren-launcher/commands/dbt/converter.go
🧬 Code Graph Analysis (3)
wren-launcher/commands/dbt.go (1)
- wren-launcher/commands/dbt/converter.go (3)
  - ConvertResult (30-34)
  - ConvertOptions (19-27)
  - ConvertDbtProjectCore (38-259)

wren-launcher/commands/launch.go (1)
- wren-launcher/commands/dbt.go (1)
  - DbtConvertProject (63-75)

wren-launcher/commands/dbt/converter.go (3)
- wren-launcher/commands/dbt/utils.go (1)
  - FileExists (30-36)
- wren-launcher/commands/dbt/data_source.go (2)
  - WrenBigQueryDataSource (218-224)
  - DataSource (12-16)
- wren-launcher/commands/dbt/wren_mdl.go (7)
  - WrenMDLManifest (4-13)
  - EnumDefinition (16-19)
  - WrenModel (22-30)
  - Relationship (52-58)
  - Metric (61-68)
  - View (71-75)
  - WrenColumn (40-49)
🔇 Additional comments (10)
wren-launcher/commands/dbt/profiles_analyzer.go (1)
136-136: LGTM! Clean implementation of method field extraction.

The addition of the `Method` field extraction follows the established pattern and correctly includes it in the `knownFields` map to prevent it from being stored in the `Additional` map. This properly supports BigQuery authentication methods as outlined in the PR objectives.

Also applies to: 149-149
wren-launcher/commands/launch.go (1)
518-518: LGTM! Correctly implements the new IncludeStagingModels parameter.

The addition of the `false` parameter for `IncludeStagingModels` is appropriate for the launch flow, where staging models are typically not needed. This aligns with the function signature update and maintains backward compatibility.

wren-launcher/commands/dbt/profiles.go (2)
28-28: LGTM! Well-implemented Method field for BigQuery support.

The `Method` field is properly added with appropriate YAML/JSON tags and an `omitempty` annotation, following the established pattern for optional database-specific fields. This correctly supports BigQuery authentication methods as mentioned in the PR objectives.

11-11: LGTM! Improved formatting consistency.

The formatting adjustments to the `Target` and `SSLMode` fields improve code consistency without affecting functionality.

Also applies to: 35-35
wren-launcher/commands/dbt.go (2)
27-27: LGTM! Well-implemented command line flag for staging model inclusion.

The `--include-staging-models` flag is properly added with a clear description that explains its purpose. This gives users control over whether staging models are included during conversion.

50-50: LGTM! Consistent implementation of IncludeStagingModels parameter.

The parameter is properly threaded through both the `DbtAutoConvert` command and the `DbtConvertProject` function, with clean formatting and consistent assignment to the `ConvertOptions` struct. The function signature update maintains clarity and follows Go conventions.

Also applies to: 63-72
wren-launcher/commands/dbt/data_source_test.go (2)

377-470: LGTM! Comprehensive BigQuery conversion tests.

The test cases properly cover both supported BigQuery authentication methods ("service-account" and "service-account-json") and validate correct field extraction and assignment. The test structure follows established patterns and provides good coverage for the new BigQuery functionality.

472-544: LGTM! Thorough BigQuery validation test coverage.

The validation tests comprehensively cover both valid and invalid scenarios, including:
- Both supported authentication methods
- Missing required fields (project, dataset)
- Unsupported oauth method
- Method-specific requirements (keyfile for service-account)
This ensures the BigQuery data source validation behaves correctly across all expected use cases.
wren-launcher/commands/dbt/wren_mdl.go (1)

7-11: LGTM! Clean struct additions for enhanced metadata support.

The new `EnumDefinition` and `Metric` structs are well-designed, and the extensions to `WrenMDLManifest` and `WrenColumn` are properly implemented with appropriate JSON tags and omitempty markers.

Also applies to: 15-19, 43-43, 60-68
wren-launcher/commands/dbt/converter.go (1)

431-433: LGTM! Staging model filtering is implemented correctly.

The filtering logic properly excludes models with `.stg_` or `.staging_` in their names when `IncludeStagingModels` is false.
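A minimal sketch of that filter in isolation, assuming node keys follow dbt's `model.<project>.<name>` convention (the helper name is illustrative, not the converter's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// isStagingNode reports whether a dbt manifest node key refers to a
// staging model, matching the ".stg_" / ".staging_" check described above.
func isStagingNode(nodeKey string) bool {
	return strings.Contains(nodeKey, ".stg_") || strings.Contains(nodeKey, ".staging_")
}

func main() {
	nodes := []string{
		"model.shop.stg_orders",
		"model.shop.staging_payments",
		"model.shop.orders",
	}
	includeStagingModels := false
	for _, key := range nodes {
		if !includeStagingModels && isStagingNode(key) {
			continue // skipped from conversion
		}
		fmt.Println("converting:", key) // → converting: model.shop.orders
	}
}
```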
This commit incorporates a series of changes based on code review feedback to improve the dbt to Wren MDL converter. The updates focus on enhancing security, adding more robust validation and parsing, and increasing the completeness of the generated MDL file.

In `wren_mdl.go`:
- feat: Adds a `DisplayName` field to the `WrenColumn` struct to support user-friendly labels for columns.

In `data_source.go`:
- feat: Adds JSON validation for the `keyfile_json` field in BigQuery data sources to provide earlier feedback on malformed credentials.
- refactor: Implements BigQuery-specific data type mappings in the `MapType` method to correctly convert types like `INT64`, `TIMESTAMP`, and `NUMERIC`.

In `converter.go`:
- fix(security): Omits the `keyfile_json` content from the generated `wren-datasource.json` to prevent exposing sensitive credentials in the output file.
- feat: Adds mapping for a dbt column's `meta.label` to the `DisplayName` property in the corresponding Wren `WrenColumn`, improving the usability of the output.
- refactor: Enhances enum name generation with more robust sanitization, replacing all non-alphanumeric characters to ensure valid identifiers.
- refactor: Improves the `ref()` parsing regex to handle optional whitespace, making it more resilient to formatting variations.
- refactor: Adds validation for metric `input_measures` to log a warning if a referenced measure cannot be found, improving debuggability.
- fix: Resolves a persistent compilation error related to an incorrect type assertion on `inputMeasures` by refactoring the logic to correctly identify the base model for a metric.
Thanks @cougrimes for the contribution. I'll take a look 👍
Thanks @cougrimes for working on this. It's a great start for BigQuery and dbt semantic layer integration! 👍
I've left some comments regarding the connection info and launching steps.
For the dbt semantic layer integration, could you provide a sample `semantic_manifest.json`? I'm not very familiar with the dbt semantic layer, so a sample would help us confirm whether the conversion logic is reasonable.
```go
OutputDir            string
ProfileName          string
Target               string
IncludeStagingModels bool
```
nice improvement 👍
type WrenBigQueryDataSource struct { | ||
Project string `json:"project"` | ||
Dataset string `json:"dataset"` | ||
Method string `json:"method"` | ||
Keyfile string `json:"keyfile,omitempty"` | ||
KeyfileJSON string `json:"keyfile_json,omitempty"` | ||
} |
They don't match what Wren AI requires for the BigQuery connection. You can refer to the Wren engine doc. The connection info should be

```json
{
  "credentials": "eyJ...",
  "dataset_id": "my_dataset",
  "project_id": "my-project"
}
```

`credentials` is the base64-encoded content of the JSON key file.
```diff
@@ -154,6 +169,17 @@ func ConvertDbtProjectCore(opts ConvertOptions) (*ConvertResult, error) {
 		"format": typedDS.Format,
 	},
 }
+case *WrenBigQueryDataSource:
+	// SECURITY FIX: Omit keyfile_json from output for security.
```
To build the connection info of BigQuery, we need to encode the content of the key file in base64. What's your concern about reading the `keyfile_json`? 🤔
wren-launcher/commands/launch.go (outdated)

```diff
@@ -515,12 +515,12 @@ func processDbtProject(projectDir string) (string, error) {
 	}
 
 	// Use the core conversion function from dbt package
-	result, err := DbtConvertProject(dbtProjectPath, targetDir, profileName, target, true)
+	result, err := DbtConvertProject(dbtProjectPath, targetDir, profileName, target, true, false)
```
We added a new option `IncludeStagingModels` for dbt conversion. We should add an additional step for it; otherwise, it will always be disabled.
Hi @cougrimes, after #1877, we added some formatter and lint tools for
@goldmedal Happy to make the changes requested, but I actually wound up hitting an unexpected issue when processing the MDL on my end—despite declaring BigQuery, I keep getting the MDL trying to parse as Trino rather than BQ. Do you have any other documentation on the MDL schema? I've been trying to work backward to figure out why this is happening to no avail.
What do you mean that the MDL is parsed as Trino? Which step does that? 🤔
You can check the doc for WrenMDL or the JSON schema.
Most of the issues I had been running into around Trino-esque errors are addressed in Canner/wren-engine#1290; issues with keys and auths resolved.
Actionable comments posted: 2
🔭 Outside diff range comments (1)
wren-launcher/commands/dbt/data_source.go (1)
4-11: Import os to support reading keyfile paths for service-account auth

We'll need os.ReadFile when handling dbt BigQuery profiles that use method: service-account with a keyfile path (see below).

```diff
 import (
 	"encoding/base64"
 	"encoding/json"
 	"fmt"
+	"os"
 	"path/filepath"
 	"strings"
 
 	"github.com/pterm/pterm"
 )
```
♻️ Duplicate comments (1)
wren-launcher/commands/launch.go (1)
525-525: IncludeStagingModels is hardcoded to false; there's no way to enable it in the Launch flow

This permanently disables staging models in this path. Per the PR objective and previous reviewer note, expose a toggle here (prompt or config/flag) instead of hardcoding. Apply this minimal change to pass a variable instead of a literal:

```diff
-	result, err := DbtConvertProject(dbtProjectPath, targetDir, profileName, target, true, false)
+	includeStagingModels := false // TODO: wire from config flag or prompt
+	result, err := DbtConvertProject(dbtProjectPath, targetDir, profileName, target, true, includeStagingModels)
```

If you want a quick interactive solution, add a prompt:

```go
// place near other ask* helpers
func askForIncludeStagingModels() (bool, error) {
	prompt := promptui.Select{
		Label: "Include staging models (stg_*, staging_*)?",
		Items: []string{"No", "Yes"},
	}
	_, result, err := prompt.Run()
	if err != nil {
		return false, err
	}
	return result == "Yes", nil
}
```

Then in processDbtProject(), before calling DbtConvertProject:

```go
includeStagingModels, _ := askForIncludeStagingModels() // default to No on error
```
🧹 Nitpick comments (4)
wren-launcher/commands/dbt/data_source.go (1)
272-283: Tighten validation by trimming whitespace

Avoid accepting whitespace-only values for required fields.

```diff
 func (ds *WrenBigQueryDataSource) Validate() error {
-	if ds.Project == "" {
+	if strings.TrimSpace(ds.Project) == "" {
 		return fmt.Errorf("project_id cannot be empty")
 	}
-	if ds.Dataset == "" {
+	if strings.TrimSpace(ds.Dataset) == "" {
 		return fmt.Errorf("dataset_id cannot be empty")
 	}
-	if ds.Credentials == "" {
+	if strings.TrimSpace(ds.Credentials) == "" {
 		return fmt.Errorf("credentials cannot be empty")
 	}
 	return nil
 }
```

wren-launcher/commands/dbt/converter.go (3)
433-437: Reduce false positives when excluding staging models

Checking substrings on nodeKey can skip non-staging models. Prefer checking the model name prefix.

```diff
-	if !includeStagingModels && (strings.Contains(nodeKey, ".stg_") || strings.Contains(nodeKey, ".staging_")) {
-		continue
-	}
+	if !includeStagingModels {
+		mn := getModelNameFromNodeKey(nodeKey)
+		if strings.HasPrefix(mn, "stg_") || strings.HasPrefix(mn, "staging_") {
+			continue
+		}
+	}
```
458-510: Possible duplicate relationships; add a cheap dedup by key

If the same relationship is captured via both embedded and compiled tests, duplicates can be produced. Dedup on name+type+condition to keep output clean.

```diff
 func generateRelationships(manifestData map[string]interface{}) []Relationship {
 	var relationships []Relationship
 	if nodes, ok := manifestData["nodes"].(map[string]interface{}); ok {
 		for nodeKey, nodeValue := range nodes {
 			nodeMap, ok := nodeValue.(map[string]interface{})
 			if !ok {
 				continue
 			}
@@
 		}
 	}
-	return relationships
+	// Dedup
+	seen := make(map[string]struct{}, len(relationships))
+	var unique []Relationship
+	for _, r := range relationships {
+		key := r.Name + "|" + r.JoinType + "|" + r.Condition
+		if _, ok := seen[key]; ok {
+			continue
+		}
+		seen[key] = struct{}{}
+		unique = append(unique, r)
+	}
+	return unique
 }
```
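The dedup-by-key idea in isolation (a sketch with the type trimmed down to the three fields involved, not the converter's actual code):

```go
package main

import "fmt"

// relationship carries just the fields used to build the dedup key.
type relationship struct {
	Name, JoinType, Condition string
}

// dedupRelationships drops exact repeats, keyed on name|type|condition,
// while preserving first-seen order.
func dedupRelationships(in []relationship) []relationship {
	seen := make(map[string]struct{}, len(in))
	var unique []relationship
	for _, r := range in {
		key := r.Name + "|" + r.JoinType + "|" + r.Condition
		if _, ok := seen[key]; ok {
			continue
		}
		seen[key] = struct{}{}
		unique = append(unique, r)
	}
	return unique
}

func main() {
	rels := []relationship{
		{"orders_customers", "MANY_TO_ONE", "orders.customer_id = customers.id"},
		{"orders_customers", "MANY_TO_ONE", "orders.customer_id = customers.id"}, // duplicate
		{"payments_orders", "MANY_TO_ONE", "payments.order_id = orders.id"},
	}
	fmt.Println(len(dedupRelationships(rels))) // → 2
}
```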
938-945: Nit: precompile the ref() regex once

Compiling on each call is unnecessary. Cache at package scope for micro-efficiency and readability. You can move the regex to a package-level var:

```go
var refRegex = regexp.MustCompile(`ref\s*\(\s*['"]([^'"]+)['"]\s*\)`)
```

Then in parseRef:

```diff
-	re := regexp.MustCompile(`ref\s*\(\s*['"]([^'"]+)['"]\s*\)`)
-	matches := re.FindStringSubmatch(refStr)
+	matches := refRegex.FindStringSubmatch(refStr)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
- wren-launcher/commands/dbt/converter.go (14 hunks)
- wren-launcher/commands/dbt/data_source.go (5 hunks)
- wren-launcher/commands/dbt/data_source_test.go (1 hunks)
- wren-launcher/commands/dbt/profiles_analyzer.go (2 hunks)
- wren-launcher/commands/launch.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- wren-launcher/commands/dbt/profiles_analyzer.go
- wren-launcher/commands/dbt/data_source_test.go
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-07-09T02:43:20.433Z
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:50-53
Timestamp: 2025-07-09T02:43:20.433Z
Learning: In wren-launcher/commands/dbt/data_source.go, when encountering unsupported database types in convertConnectionToDataSource function, the user prefers to log a warning and continue processing instead of failing. The function should return nil, nil to allow processing to continue.
Applied to files:
wren-launcher/commands/dbt/data_source.go
wren-launcher/commands/dbt/converter.go
📚 Learning: 2025-08-13T05:17:30.180Z
Learnt from: goldmedal
PR: Canner/WrenAI#1887
File: wren-launcher/commands/dbt/data_source.go:273-276
Timestamp: 2025-08-13T05:17:30.180Z
Learning: In the Wren AI codebase, there's an intentional distinction between dbt connection types and Wren AI data source types: dbt uses "sqlserver" as the connection type, while Wren AI expects "mssql" as the data source type. The conversion layer in convertConnectionToDataSource() correctly matches "sqlserver" from dbt profiles and the WrenMSSQLDataSource.GetType() method correctly returns "mssql" for Wren AI compatibility.
Applied to files:
wren-launcher/commands/dbt/data_source.go
📚 Learning: 2025-07-09T02:43:07.560Z
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:154-156
Timestamp: 2025-07-09T02:43:07.560Z
Learning: In the WrenAI codebase, the MapType methods for data sources should return the original type name for unknown types, rather than a default fallback like "string". This maintains consistency across all data source implementations.
Applied to files:
wren-launcher/commands/dbt/data_source.go
🧬 Code Graph Analysis (3)
wren-launcher/commands/launch.go (1)
- wren-launcher/commands/dbt.go (1)
  - DbtConvertProject (63-75)

wren-launcher/commands/dbt/data_source.go (1)
- wren-launcher/commands/dbt/profiles.go (1)
  - DbtConnection (16-40)

wren-launcher/commands/dbt/converter.go (3)
- wren-launcher/commands/dbt/utils.go (1)
  - FileExists (30-36)
- wren-launcher/commands/dbt/data_source.go (2)
  - WrenBigQueryDataSource (260-264)
  - DataSource (43-47)
- wren-launcher/commands/dbt/wren_mdl.go (7)
  - WrenMDLManifest (4-13)
  - EnumDefinition (16-19)
  - WrenModel (22-30)
  - Relationship (53-59)
  - Metric (62-69)
  - View (72-76)
  - WrenColumn (40-50)
🔇 Additional comments (5)
wren-launcher/commands/dbt/data_source.go (2)
77-89: BigQuery type dispatch looks good

Routing "bigquery" to a dedicated converter aligns with the DataSource pattern and prior guidance to warn-and-continue on unsupported types.

286-309: BigQuery type mapping: LGTM

Covers the common BigQuery types and correctly falls back to the original type for unknowns, per codebase convention.
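The convention noted above, returning the original name for unknown types, can be sketched like this (the specific mappings are illustrative, not the file's exact table):

```go
package main

import (
	"fmt"
	"strings"
)

// mapBigQueryType maps a few common BigQuery types to generic ones and,
// per the codebase convention, returns the original name for unknown types.
func mapBigQueryType(bqType string) string {
	switch strings.ToUpper(bqType) {
	case "INT64":
		return "integer"
	case "FLOAT64", "NUMERIC":
		return "double"
	case "STRING":
		return "varchar"
	case "TIMESTAMP":
		return "timestamp"
	case "BOOL":
		return "boolean"
	default:
		return bqType // unknown: keep the original type name
	}
}

func main() {
	fmt.Println(mapBigQueryType("INT64"))     // → integer
	fmt.Println(mapBigQueryType("GEOGRAPHY")) // → GEOGRAPHY
}
```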
wren-launcher/commands/dbt/converter.go (3)
172-180: BigQuery datasource output structure matches engine expectations and avoids credential leakage

Emits project_id, dataset_id, credentials (base64) only, and writes with 0600. This aligns with the engine schema and prior security guidance.

839-847: DisplayName mapping from dbt meta.label: LGTM

Correctly reads meta.label into WrenColumn.DisplayName as per the PR objective.
328-336: Verify MDL `dataSources` field usage

I wasn't able to locate any MDL schema or engine code in this repo that defines how the `dataSources` property is interpreted. If the engine actually expects a path (for example "wren-datasource.json") rather than the connector type string, we should either:

- Populate `manifest.DataSources` with the relative JSON filename,
- Or leave it empty and let the co-located `wren-datasource.json` be picked up.

Would you like to wire `ConvertDbtProjectCore` to pass the generated datasource path into `ConvertDbtCatalogToWrenMDL` so this field is populated correctly?
- fix: Adjusted some behaviors around keyfiles vs. inlined JSON
- fix: Corrected a logical flaw where ratio metrics in dbt were calculated at a row level (numerator / denominator) instead of after aggregation. The code now correctly generates expressions like SUM(numerator) / SUM(denominator).
- perf: Improved performance by pre-compiling a regular expression for parsing ref() functions at the package level, preventing it from being re-compiled on every function call.
Actionable comments posted: 2
♻️ Duplicate comments (3)
wren-launcher/commands/launch.go (1)
190-201: Nice addition: interactive staging-models prompt

This addresses the earlier feedback to expose IncludeStagingModels via an extra step. The prompt UX is clear and returns a sane bool.

wren-launcher/commands/dbt/converter.go (2)

861-869: DisplayName mapping from meta.label: LGTM

Column.DisplayName is now sourced from meta.label per PR objectives. Thanks for addressing the earlier gap.

655-784: Metric conversion covers simple, ratio, and derived correctly

The ratio handling now composes expressions using each measure's agg and expr metadata, avoiding bare-name division. Solid implementation.
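The ratio fix can be illustrated with a tiny expression builder: aggregate each side first, then divide (a sketch; the function and field names are illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// measure mirrors the two pieces of metadata used for ratio metrics.
type measure struct {
	Agg  string // e.g. "sum", "count"
	Expr string // e.g. "revenue"
}

// ratioExpression aggregates numerator and denominator before dividing,
// instead of emitting a row-level numerator/denominator.
func ratioExpression(num, den measure) string {
	return fmt.Sprintf("%s(%s) / %s(%s)",
		strings.ToUpper(num.Agg), num.Expr,
		strings.ToUpper(den.Agg), den.Expr)
}

func main() {
	fmt.Println(ratioExpression(measure{"sum", "revenue"}, measure{"count", "order_id"}))
	// → SUM(revenue) / COUNT(order_id)
}
```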
🧹 Nitpick comments (2)
wren-launcher/commands/launch.go (1)
537-546: Wire-up looks correct; minor defensive nit

The call now passes the new IncludeStagingModels argument correctly. If the prompt errors, you already warn and fall back to the default (false). To make the fallback explicit and clearer to readers, consider setting includeStagingModels = false in the error branch.

```diff
 includeStagingModels, err := askForIncludeStagingModels()
 if err != nil {
 	pterm.Warning.Println("Could not get staging model preference, defaulting to 'No'.")
+	includeStagingModels = false
 }
```
wren-launcher/commands/dbt/data_source.go (1)
151-231: BigQuery auth handling is robust; add small trims for resilience

Good coverage of service-account-json (inline), service-account (file path), fallback when method is empty, and explicit skipping of oauth per PR scope. To harden against whitespace in profiles, trim keyfile_json and the keyfile path before use.

```diff
 case "service-account-json":
 	// Extract inline JSON from Additional["keyfile_json"]
 	var keyfileJSON string
 	if kfj, exists := conn.Additional["keyfile_json"]; exists {
 		if kfjStr, ok := kfj.(string); ok {
-			keyfileJSON = kfjStr
+			keyfileJSON = strings.TrimSpace(kfjStr)
 		}
 	}
@@
 case "service-account", "":
 	// Prefer structured field; fall back to Additional["keyfile"]
 	keyfilePath := strings.TrimSpace(conn.Keyfile)
 	if keyfilePath == "" {
 		if kf, ok := conn.Additional["keyfile"]; ok {
 			if kfStr, ok := kf.(string); ok {
-				keyfilePath = kfStr
+				keyfilePath = strings.TrimSpace(kfStr)
 			}
 		}
 	}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- wren-launcher/commands/dbt/converter.go (14 hunks)
- wren-launcher/commands/dbt/data_source.go (5 hunks)
- wren-launcher/commands/launch.go (3 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-07-09T02:43:20.433Z
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:50-53
Timestamp: 2025-07-09T02:43:20.433Z
Learning: In wren-launcher/commands/dbt/data_source.go, when encountering unsupported database types in convertConnectionToDataSource function, the user prefers to log a warning and continue processing instead of failing. The function should return nil, nil to allow processing to continue.
Applied to files:
wren-launcher/commands/dbt/data_source.go
wren-launcher/commands/dbt/converter.go
📚 Learning: 2025-08-13T05:17:30.180Z
Learnt from: goldmedal
PR: Canner/WrenAI#1887
File: wren-launcher/commands/dbt/data_source.go:273-276
Timestamp: 2025-08-13T05:17:30.180Z
Learning: In the Wren AI codebase, there's an intentional distinction between dbt connection types and Wren AI data source types: dbt uses "sqlserver" as the connection type, while Wren AI expects "mssql" as the data source type. The conversion layer in convertConnectionToDataSource() correctly matches "sqlserver" from dbt profiles and the WrenMSSQLDataSource.GetType() method correctly returns "mssql" for Wren AI compatibility.
Applied to files:
wren-launcher/commands/dbt/data_source.go
📚 Learning: 2025-07-09T02:43:07.560Z
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:154-156
Timestamp: 2025-07-09T02:43:07.560Z
Learning: In the WrenAI codebase, the MapType methods for data sources should return the original type name for unknown types, rather than a default fallback like "string". This maintains consistency across all data source implementations.
Applied to files:
wren-launcher/commands/dbt/data_source.go
🧬 Code Graph Analysis (3)
wren-launcher/commands/dbt/data_source.go (1)
- wren-launcher/commands/dbt/profiles.go (1)
  - DbtConnection (16-40)

wren-launcher/commands/launch.go (1)
- wren-launcher/commands/dbt.go (1)
  - DbtConvertProject (63-75)

wren-launcher/commands/dbt/converter.go (3)
- wren-launcher/commands/dbt/utils.go (1)
  - FileExists (30-36)
- wren-launcher/commands/dbt/data_source.go (2)
  - WrenBigQueryDataSource (312-316)
  - DataSource (44-48)
- wren-launcher/commands/dbt/wren_mdl.go (7)
  - WrenMDLManifest (4-13)
  - EnumDefinition (16-19)
  - WrenModel (22-30)
  - Relationship (53-59)
  - Metric (62-69)
  - View (72-76)
  - WrenColumn (40-50)
🔇 Additional comments (5)
wren-launcher/commands/launch.go (1)
190-202
: No duplicate definitions of askForIncludeStagingModels found
Verified withrg -nC2 --type go -P '\bfunc\s+askForIncludeStagingModels\s*\('
; only one occurrence exists at wren-launcher/commands/launch.go:191. No redeclaration issues detected.wren-launcher/commands/dbt/data_source.go (1)
83-85: BigQuery dispatch added correctly
convertConnectionToDataSource now routes "bigquery" to the dedicated converter, aligning with the new data source support.
wren-launcher/commands/dbt/converter.go (3)
59-64: Optional files handled gracefully
Optional manifest.json and semantic_manifest.json are discovered and logged with clear fallbacks. This makes the conversion resilient to missing artifacts.
Also applies to: 85-92

172-180: BigQuery data source output matches engine requirements
Properties use project_id, dataset_id, and base64-encoded credentials as requested in the engine docs and maintainer feedback. File permissions for the output are secure elsewhere (0600).

433-437: Staging model exclusion implemented as specified
The includeStagingModels filter correctly skips stg_* and staging_* when disabled. Simple and effective.
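As a rough illustration of the rule this comment describes, the prefix filter can be sketched like this (function names are illustrative, not the converter's actual identifiers):

```go
package main

import (
	"fmt"
	"strings"
)

// isStagingModel reports whether a dbt model name uses a conventional
// staging prefix, mirroring the stg_*/staging_* rule described above.
func isStagingModel(name string) bool {
	lower := strings.ToLower(name)
	return strings.HasPrefix(lower, "stg_") || strings.HasPrefix(lower, "staging_")
}

// filterModels drops staging models unless includeStaging is set.
func filterModels(names []string, includeStaging bool) []string {
	var kept []string
	for _, n := range names {
		if !includeStaging && isStagingModel(n) {
			continue
		}
		kept = append(kept, n)
	}
	return kept
}

func main() {
	models := []string{"stg_orders", "staging_users", "orders", "customers"}
	fmt.Println(filterModels(models, false)) // [orders customers]
	fmt.Println(filterModels(models, true))  // all four kept
}
```

A case-insensitive check keeps the default behavior predictable even when model names are capitalized inconsistently.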
Actionable comments posted: 0
♻️ Duplicate comments (1)
wren-launcher/commands/launch.go (1)
14-14: Fix invalid import path (breaks build)
The import has a typo ("github.comcom"). This will prevent compilation.
Apply this diff:

- "github.comcom/Canner/WrenAI/wren-launcher/config"
+ "github.com/Canner/WrenAI/wren-launcher/config"
🧹 Nitpick comments (1)
wren-launcher/commands/launch.go (1)
536-544: Make the "No" fallback explicit on prompt failure; consider non-interactive override
Currently, if the prompt fails, includeStagingModels is already false due to the function's (false, err) return, but making that explicit improves readability and future-proofing. Also consider honoring a non-interactive flag/env (e.g., via config) before prompting so CI/automation can bypass TTY interaction.
Minimal explicit fallback:

  // Ask the user whether to include staging models
  includeStagingModels, err := askForIncludeStagingModels()
  if err != nil {
      pterm.Warning.Println("Could not get staging model preference, defaulting to 'No'.")
+     includeStagingModels = false
  }
If you’d like, I can wire this to a config/env-driven value (e.g., config.GetIncludeStagingModels()) with the prompt as a fallback.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
wren-launcher/commands/launch.go (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
wren-launcher/commands/launch.go (1)
wren-launcher/commands/dbt.go (1)
DbtConvertProject (63-75)
🔇 Additional comments (1)
wren-launcher/commands/launch.go (1)
190-201: All good: no duplicate definitions found
I've verified that askForIncludeStagingModels is defined only once in wren-launcher/commands/launch.go, so there's no risk of a redefinition compile error. Approving these changes.
Actionable comments posted: 0
♻️ Duplicate comments (3)
wren-launcher/commands/dbt/converter.go (3)
447-451: Relationship generation and de-dup: fix looks correct
The de-dup key concatenation is now valid Go and the stray brace is gone. Relationship extraction from both column-embedded tests and compiled test nodes is comprehensive.
Also applies to: 460-524
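For illustration, the de-duplication strategy this comment refers to can be sketched as follows (the rel struct and key format are simplified stand-ins, not the converter's actual types):

```go
package main

import "fmt"

// rel is a simplified stand-in for the converter's Relationship type.
type rel struct {
	FromModel, FromColumn, ToModel, ToColumn string
}

// dedupeRelationships keeps the first occurrence of each logical
// relationship, keyed on all four endpoint parts so that the same
// relationship discovered via two different test nodes is emitted once.
func dedupeRelationships(rels []rel) []rel {
	seen := make(map[string]bool)
	var out []rel
	for _, r := range rels {
		key := r.FromModel + "." + r.FromColumn + "->" + r.ToModel + "." + r.ToColumn
		if seen[key] {
			continue
		}
		seen[key] = true
		out = append(out, r)
	}
	return out
}

func main() {
	rels := []rel{
		{"orders", "customer_id", "customers", "id"},
		{"orders", "customer_id", "customers", "id"}, // duplicate from a second test node
		{"payments", "order_id", "orders", "id"},
	}
	fmt.Println(len(dedupeRelationships(rels))) // 2
}
```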
654-783: Metrics: correct use of measure agg/expr in ratio/simple metrics
This addresses the earlier concern about bare names; building expressions as AGG(expr) and using proper parentheses for ratios is correct.
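To make the AGG(expr) and parenthesization points concrete, here is a minimal sketch (the measure shape and function names are assumptions for illustration, not the converter's API):

```go
package main

import (
	"fmt"
	"strings"
)

// measure is a simplified stand-in for a dbt semantic-layer measure.
type measure struct {
	Agg  string // e.g. "sum", "count"
	Expr string // underlying column or SQL expression
}

// simpleExpr builds an aggregate expression such as SUM(amount)
// instead of emitting the bare measure name.
func simpleExpr(m measure) string {
	return fmt.Sprintf("%s(%s)", strings.ToUpper(m.Agg), m.Expr)
}

// ratioExpr parenthesizes numerator and denominator so operator
// precedence cannot change the meaning of the generated SQL.
func ratioExpr(num, den measure) string {
	return fmt.Sprintf("(%s) / (%s)", simpleExpr(num), simpleExpr(den))
}

func main() {
	revenue := measure{Agg: "sum", Expr: "amount"}
	orders := measure{Agg: "count", Expr: "order_id"}
	fmt.Println(simpleExpr(revenue))        // SUM(amount)
	fmt.Println(ratioExpr(revenue, orders)) // (SUM(amount)) / (COUNT(order_id))
}
```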
860-868: Column DisplayName mapping from meta.label: LGTM
This fulfills the PR objective and matches the WrenColumn API.
🧹 Nitpick comments (3)
wren-launcher/commands/launch.go (1)
536-541: Graceful fallback on prompt error is good; consider a non-interactive override
Defaulting to "No" on prompt failure is sensible. For CI/non-interactive scenarios, consider allowing an override via a flag/env/config (e.g., WREN_INCLUDE_STAGING_MODELS=true) to skip the prompt entirely when set.
Would you like me to wire an env-based override and document it?
wren-launcher/commands/dbt/converter.go (2)
433-438: Staging model filter: consider making prefixes configurable
The stg_/staging_ prefixes are a good default. Consider making the prefix list configurable (via opts or config/env) to support different naming conventions.
936-945: Guard getMapFromMap against nil parent maps
Minor robustness tweak: if m is nil, return defaultValue to mirror getStringFromMap.
Apply this diff:

  func getMapFromMap(m map[string]interface{}, key string, defaultValue map[string]interface{}) map[string]interface{} {
+     if m == nil {
+         return defaultValue
+     }
      if value, exists := m[key]; exists {
          if str, ok := value.(map[string]interface{}); ok {
              return str
          }
      }
      return defaultValue
  }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
wren-launcher/commands/dbt/converter.go (14 hunks)
wren-launcher/commands/launch.go (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-09T02:43:20.433Z
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:50-53
Timestamp: 2025-07-09T02:43:20.433Z
Learning: In wren-launcher/commands/dbt/data_source.go, when encountering unsupported database types in convertConnectionToDataSource function, the user prefers to log a warning and continue processing instead of failing. The function should return nil, nil to allow processing to continue.
Applied to files:
wren-launcher/commands/dbt/converter.go
🧬 Code Graph Analysis (2)
wren-launcher/commands/launch.go (1)
- wren-launcher/commands/dbt.go (1): DbtConvertProject (63-75)
wren-launcher/commands/dbt/converter.go (3)
- wren-launcher/commands/dbt/utils.go (1): FileExists (30-36)
- wren-launcher/commands/dbt/data_source.go (2): WrenBigQueryDataSource (312-316), DataSource (44-48)
- wren-launcher/commands/dbt/wren_mdl.go (7): WrenMDLManifest (4-13), EnumDefinition (16-19), WrenModel (22-30), Relationship (53-59), Metric (62-69), View (72-76), WrenColumn (40-50)
🔇 Additional comments (9)
wren-launcher/commands/launch.go (2)
190-200: Include-staging prompt: LGTM; no duplicate definition detected
The prompt implementation is clean and returns a boolean as expected. Also, despite an earlier summary claiming a duplicate, this file now contains a single definition of askForIncludeStagingModels.

544-545: All DbtConvertProject call sites updated
I ran the grep across the Go code and found only the function definition and the single invocation in wren-launcher/commands/launch.go, both using the new six-argument signature. No remaining calls with the old arity exist.

wren-launcher/commands/dbt/converter.go (7)
20-27: New ConvertOptions.IncludeStagingModels: LGTM
Adding this to options keeps the flow explicit and testable.

85-92: Optional semantic manifest handling: LGTM
Robust to missing/malformed semantic_manifest.json with clear warnings. Good trade-off for resilience.

213-237: Helpful conversion summary: LGTM
The added counts for relationships, metrics, and enums improve UX and debuggability.

326-336: Manifest initialization extended with Enums/Metrics/DataSources: LGTM
Types align with wren_mdl.go and keep the output self-descriptive.

344-395: Pre-processing for not_null/accepted_values on struct fields: LGTM
Solid coverage of both top-level and embedded field tests with enum de-dup. Nice.

907-911: PK mapping via semantic_models alias: verify alias aligns with catalog model name
Semantic model node_relation.alias may differ from the dbt model name derived from the catalog key. If you see sparse PKs in output, consider a fallback using the dbt model name or a map from alias->model name.
172-180: Add cautionary warning for BigQuery credentials
Writing base64-encoded credentials is necessary but sensitive; even with 0600 perms, it's worth flagging for end users. After the data source file is written, print a clear warning and double-check the property names against your MDL schema.
• After:

  pterm.Success.Printf("✓ WrenDataSource saved to: %s\n", dataSourcePath)

insert:

  if ds != nil && ds.GetType() == "bigquery" {
      pterm.Warning.Println("Note: wren-datasource.json contains BigQuery credentials. Treat it as sensitive and restrict access.")
  }

• Ensure the JSON keys "project_id", "dataset_id", and "credentials" exactly match what your engine's MDL/schema expects for BigQuery. The schema isn't in this repo; please verify against your external specs or documentation.
Actionable comments posted: 0
🧹 Nitpick comments (3)
wren-launcher/commands/dbt/data_source.go (3)
183-207: Minor: clarify error when method is omitted
When method is empty you treat it like service-account (good), but the error message only mentions service-account. Slight wording tweak improves UX.

- return nil, fmt.Errorf("bigquery: method 'service-account' requires 'keyfile' path")
+ return nil, fmt.Errorf("bigquery: auth method 'service-account' (or omitted) requires 'keyfile' path")

347-370: Nit: reuse shared type constants for consistency
You already declare integerType, doubleType, varcharType, booleanType, dateType, timestampType at the top. Returning those improves consistency and avoids typos.

  func (ds *WrenBigQueryDataSource) MapType(sourceType string) string {
      switch strings.ToUpper(sourceType) {
      case "INT64", "INTEGER":
-         return "integer"
+         return integerType
      case "FLOAT64", "FLOAT":
-         return "double"
+         return doubleType
      case "STRING":
-         return "varchar"
+         return varcharType
      case "BOOL", "BOOLEAN":
-         return "boolean"
+         return booleanType
      case "DATE":
-         return "date"
+         return dateType
      case "TIMESTAMP", "DATETIME":
-         return "timestamp"
+         return timestampType
      case "NUMERIC", "DECIMAL", "BIGNUMERIC":
-         return "double"
+         return doubleType
      case "BYTES":
-         return "varchar"
+         return varcharType
      case "JSON":
-         return "varchar"
+         return varcharType
      default:
          return strings.ToLower(sourceType)
      }
  }
209-214: Optional: Expand environment variables and "~" in keyfile paths
To make keyfile resolution more robust when dbt profiles specify paths using ~ or environment variables, consider updating lines 209-214 in wren-launcher/commands/dbt/data_source.go as follows:

- // If keyfile path is not absolute, join it
- // with the dbt project's home directory path.
- resolvedKeyfilePath := keyfilePath
- if !filepath.IsAbs(keyfilePath) && dbtHomePath != "" {
-     resolvedKeyfilePath = filepath.Join(dbtHomePath, keyfilePath)
- }
+ // Expand env vars and ~ in keyfile path, then resolve relative to dbtHomePath if provided.
+ expanded := os.ExpandEnv(keyfilePath)
+ if strings.HasPrefix(expanded, "~") {
+     if home, err := os.UserHomeDir(); err == nil {
+         expanded = filepath.Join(home, strings.TrimPrefix(expanded, "~"))
+     }
+ }
+ resolvedKeyfilePath := expanded
+ if !filepath.IsAbs(resolvedKeyfilePath) && dbtHomePath != "" {
+     resolvedKeyfilePath = filepath.Join(dbtHomePath, resolvedKeyfilePath)
+ }

We verified that in production code (converter.go), profile conversion goes through GetActiveDataSources, which correctly passes dbtHomePath. This change will therefore handle both relative and env-var/tilde-based keyfile paths in the main conversion flow.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
wren-launcher/commands/dbt/data_source.go (4 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-07-09T02:43:20.433Z
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:50-53
Timestamp: 2025-07-09T02:43:20.433Z
Learning: In wren-launcher/commands/dbt/data_source.go, when encountering unsupported database types in convertConnectionToDataSource function, the user prefers to log a warning and continue processing instead of failing. The function should return nil, nil to allow processing to continue.
Applied to files:
wren-launcher/commands/dbt/data_source.go
📚 Learning: 2025-08-13T05:17:30.180Z
Learnt from: goldmedal
PR: Canner/WrenAI#1887
File: wren-launcher/commands/dbt/data_source.go:273-276
Timestamp: 2025-08-13T05:17:30.180Z
Learning: In the Wren AI codebase, there's an intentional distinction between dbt connection types and Wren AI data source types: dbt uses "sqlserver" as the connection type, while Wren AI expects "mssql" as the data source type. The conversion layer in convertConnectionToDataSource() correctly matches "sqlserver" from dbt profiles and the WrenMSSQLDataSource.GetType() method correctly returns "mssql" for Wren AI compatibility.
Applied to files:
wren-launcher/commands/dbt/data_source.go
📚 Learning: 2025-07-09T02:43:07.560Z
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:154-156
Timestamp: 2025-07-09T02:43:07.560Z
Learning: In the WrenAI codebase, the MapType methods for data sources should return the original type name for unknown types, rather than a default fallback like "string". This maintains consistency across all data source implementations.
Applied to files:
wren-launcher/commands/dbt/data_source.go
🧬 Code Graph Analysis (1)
wren-launcher/commands/dbt/data_source.go (1)
wren-launcher/commands/dbt/profiles.go (1)
DbtConnection (16-40)
🔇 Additional comments (7)
wren-launcher/commands/dbt/data_source.go (7)
4-7: Imports for base64/json/os are appropriate
All three are used by the BigQuery converter. No issues.

83-85: Correct dispatch to BigQuery converter with dbtHomePath threading
Routing "bigquery" here and passing dbtHomePath enables proper resolution of relative keyfile paths downstream. Looks good.

157-165: Good: validate service account JSON before base64 encoding
The encodeJSON helper prevents encoding malformed credentials and keeps the original bytes for encoding. Solid approach.

166-183: Service-account-json flow is correct
Pulls from Additional["keyfile_json"], validates, and encodes. Clear error when missing. LGTM.

226-232: Graceful handling of unsupported oauth methods
Warning + skip aligns with project preference to continue processing on unsupported configurations.

234-240: BigQuery data source construction aligns with Wren engine schema
Fields and JSON tags ("project_id", "dataset_id", "credentials") match the documented shape.

333-344: Validation covers required fields
Trimming and checking project_id, dataset_id, credentials ensures correctness at emit-time.
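The trim-and-check validation described in that last comment can be sketched roughly as follows (the helper name is illustrative; only the three reviewed field names are taken from the source):

```go
package main

import (
	"fmt"
	"strings"
)

// validateBigQueryFields rejects any required field that is empty or
// whitespace-only, mirroring the trim-and-check style noted above.
func validateBigQueryFields(projectID, datasetID, credentials string) error {
	for name, v := range map[string]string{
		"project_id":  projectID,
		"dataset_id":  datasetID,
		"credentials": credentials,
	} {
		if strings.TrimSpace(v) == "" {
			return fmt.Errorf("bigquery: missing required field %q", name)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateBigQueryFields("my-project", "analytics", "ZXhhbXBsZQ==")) // <nil>
	fmt.Println(validateBigQueryFields("my-project", "  ", "creds") != nil)         // true
}
```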
test: Adjusted data source testing
fix: Go fmt linting
hi @cougrimes
This PR directly builds off #1827 to provide key updates and expansions around parsing dbt information into Wren MDL, as well as establishing bigquery-dbt support.
Initial Features & BigQuery Support
1. BigQuery Integration
Parses profiles.yml for BigQuery-specific connection details, including project, dataset, keyfile, and method.
Checks that supported authentication methods (service-account, service-account-json) are correctly configured.
oauth-related methods are screened out with the assumption live testing may come in a later iteration.
2. Metadata Mapping
Reads the meta.label field from the dbt project and maps it to the displayName property in the Wren MDL.
3. Configurable Staging Model Inclusion
Adds an --include-staging-models command-line flag to the dbt-auto-convert command.
When enabled, the conversion includes staging models with stg_ or staging_ in their names; otherwise, they are skipped by default.
Semantic Layer & Data Integrity Features
1. Relationship Generation from dbt Tests
Generates Wren Relationship objects from dbt relationships tests.
Handles both locations in manifest.json: embedded directly on struct fields, and as top-level, compiled test nodes for simple columns.
2. Metric Conversion from dbt Semantic Layer
Parses semantic_manifest.json to translate dbt metrics and their underlying semantic_models into Wren Metric objects.
Supports simple, ratio, and derived metric types.
Skips gracefully if semantic_manifest.json is not found.
3. Enum Definition Generation from accepted_values Tests
Generates EnumDefinition objects from dbt accepted_values tests.
A shared EnumDefinition is created, and all relevant columns are linked to it.
Also handles tests on struct fields.
4. Primary Key Identification from Semantic Models
Inspects the entities list within dbt semantic_models to identify the primary entity (type: "primary").
Uses node_relation.alias to correctly map the identified primary key to the corresponding dbt model and populate the primaryKey field in the Wren model.
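A rough sketch of this lookup, assuming simplified map shapes for semantic_manifest.json content (only entities, type, name, node_relation, and alias are drawn from the description; the rest is illustrative):

```go
package main

import "fmt"

// primaryKeysByAlias scans semantic models for the entity marked
// type "primary" and maps node_relation.alias -> primary key column.
func primaryKeysByAlias(semanticModels []map[string]interface{}) map[string]string {
	pks := make(map[string]string)
	for _, sm := range semanticModels {
		rel, _ := sm["node_relation"].(map[string]interface{})
		alias, _ := rel["alias"].(string)
		entities, _ := sm["entities"].([]interface{})
		for _, e := range entities {
			ent, _ := e.(map[string]interface{})
			if ent["type"] == "primary" {
				if name, ok := ent["name"].(string); ok && alias != "" {
					pks[alias] = name
				}
			}
		}
	}
	return pks
}

func main() {
	sms := []map[string]interface{}{
		{
			"node_relation": map[string]interface{}{"alias": "orders"},
			"entities": []interface{}{
				map[string]interface{}{"name": "order_id", "type": "primary"},
				map[string]interface{}{"name": "customer_id", "type": "foreign"},
			},
		},
	}
	fmt.Println(primaryKeysByAlias(sms)) // map[orders:order_id]
}
```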
5. Not Null Constraint Conversion
Finds not_null tests from all possible locations in the manifest.
When a not_null test is found, the NotNull field on the corresponding Wren WrenColumn is set to true.
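The accepted_values and not_null extraction described in sections 3 and 5 above can be sketched together (the columnTest shape is a simplified assumption, not dbt's actual manifest schema):

```go
package main

import (
	"fmt"
	"sort"
)

// columnTest is a simplified stand-in for a dbt column test entry.
type columnTest struct {
	Column string
	Test   string   // "not_null" or "accepted_values"
	Values []string // populated only for accepted_values
}

// extractConstraints collects not-null columns and one enum value set
// per column, roughly as described for the converter above.
func extractConstraints(tests []columnTest) (notNull map[string]bool, enums map[string][]string) {
	notNull = make(map[string]bool)
	enums = make(map[string][]string)
	for _, t := range tests {
		switch t.Test {
		case "not_null":
			notNull[t.Column] = true
		case "accepted_values":
			vals := append([]string(nil), t.Values...)
			sort.Strings(vals) // stable ordering makes shared enums easy to de-duplicate
			enums[t.Column] = vals
		}
	}
	return notNull, enums
}

func main() {
	tests := []columnTest{
		{Column: "status", Test: "accepted_values", Values: []string{"open", "closed"}},
		{Column: "id", Test: "not_null"},
	}
	nn, en := extractConstraints(tests)
	fmt.Println(nn["id"], en["status"]) // true [closed open]
}
```

Sorting each value set yields a canonical form, so two columns sharing the same accepted values can be linked to a single enum definition by comparing the sorted slices.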
.Miscellaneous notes
Made Semantic Layer Parsing Optional & Robust
Parsing of semantic_manifest.json was wrapped in checks to ensure that the converter runs successfully without errors even if the file is missing or malformed, simply skipping the dependent features.
Corrected Struct Definitions
The wren_mdl.go file was updated to include the necessary structs and fields for Metric and EnumDefinition to support the new features.
Summary by CodeRabbit
New Features
Tests
Chores