
Conversation

cougrimes commented Aug 3, 2025

This PR builds directly on #1827, providing key updates and expansions around parsing dbt information into Wren MDL, as well as establishing BigQuery dbt support.

Initial Features & BigQuery Support

1. BigQuery Integration

  • Added full support for converting dbt projects that use a BigQuery data source.
  • The converter correctly parses profiles.yml for BigQuery-specific connection details, including project, dataset, keyfile, and method.
  • Includes robust validation for BigQuery connections, ensuring required properties are present and that the supported authentication methods (service-account, service-account-json) are correctly configured. OAuth-based methods are screened out, on the assumption that live testing may come in a later iteration; the method dispatch is sketched after this list.
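
A minimal sketch of that dispatch, assuming simplified stand-ins for the repo's DbtConnection and WrenBigQueryDataSource structs (the field sets and the inline-JSON field here are illustrative; the real structs in wren-launcher/commands/dbt differ):

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"os"
)

// Illustrative stand-ins; the real structs carry more fields and tags.
type DbtConnection struct {
	Project     string
	Dataset     string
	Method      string
	Keyfile     string
	KeyfileJSON string
}

// The Wren engine expects project_id, dataset_id, and base64 credentials.
type WrenBigQueryDataSource struct {
	Project     string `json:"project_id"`
	Dataset     string `json:"dataset_id"`
	Credentials string `json:"credentials"`
}

func convertToBigQueryDataSource(conn *DbtConnection) (*WrenBigQueryDataSource, error) {
	var raw []byte
	switch conn.Method {
	case "service-account-json":
		raw = []byte(conn.KeyfileJSON) // inline key JSON from profiles.yml
	case "service-account", "":
		b, err := os.ReadFile(conn.Keyfile) // key file resolved on disk
		if err != nil {
			return nil, err
		}
		raw = b
	default:
		// oauth-style methods are screened out for now: warn and continue
		fmt.Printf("warning: unsupported bigquery auth method %q, skipping\n", conn.Method)
		return nil, nil
	}
	if !json.Valid(raw) {
		return nil, fmt.Errorf("bigquery credentials are not valid JSON")
	}
	return &WrenBigQueryDataSource{
		Project:     conn.Project,
		Dataset:     conn.Dataset,
		Credentials: base64.StdEncoding.EncodeToString(raw),
	}, nil
}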

2. Metadata Mapping

  • Reads a column's meta.label field from the dbt project and maps it to the displayName property in the Wren MDL; a lookup sketch follows.
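
A hedged sketch of the lookup, in the generic-map decoding style the converter uses elsewhere (the helper name here is mine, not the repo's):

// Pull meta.label from a decoded manifest column node; an empty result
// leaves DisplayName unset via the struct tag's omitempty.
func displayNameFromMeta(columnNode map[string]interface{}) string {
	meta, ok := columnNode["meta"].(map[string]interface{})
	if !ok {
		return ""
	}
	label, _ := meta["label"].(string)
	return label
}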

3. Configurable Staging Model Inclusion

  • Added an --include-staging-models command-line flag to the dbt-auto-convert command.
  • When this flag is used, the converter includes models with stg_ or staging_ in their names; otherwise they are skipped by default (see the sketch below).
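
A sketch of the default filter, assuming the guard runs over the model name parsed from each manifest node key:

// Skip stg_/staging_ models unless --include-staging-models was passed.
if !includeStagingModels &&
	(strings.HasPrefix(modelName, "stg_") || strings.HasPrefix(modelName, "staging_")) {
	continue
}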

Semantic Layer & Data Integrity Features

1. Relationship Generation from dbt Tests

  • The converter now automatically generates Wren Relationship objects from dbt relationships tests.
  • Robust Parsing: The logic correctly parses relationship tests from both possible locations in manifest.json: embedded directly on struct fields, and as top-level compiled test nodes for simple columns. An illustrative sketch follows.
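
A rough sketch of what a single relationships test becomes, assuming an illustrative Relationship field set (parseRef and getStringFromMap are the repo helpers discussed later in this thread):

// Build one Wren Relationship from a relationships test's kwargs,
// e.g. to: ref('customers'), field: id, on orders.customer_id.
func relationshipFromTest(fromModel, fromColumn string, kwargs map[string]interface{}) Relationship {
	toModel := parseRef(getStringFromMap(kwargs, "to", ""))
	toField := getStringFromMap(kwargs, "field", "")
	return Relationship{
		Name:      fmt.Sprintf("%s_%s_to_%s_%s", fromModel, fromColumn, toModel, toField),
		Models:    []string{fromModel, toModel},
		JoinType:  "MANY_TO_ONE",
		Condition: fmt.Sprintf("%s.%s = %s.%s", fromModel, fromColumn, toModel, toField),
	}
}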

2. Metric Conversion from dbt Semantic Layer

  • Added support for parsing semantic_manifest.json to translate dbt metrics and their underlying semantic_models into Wren Metric objects.
  • Correctly handles simple, ratio, and derived metric types; a ratio sketch follows this list.
  • This feature is optional; the script runs without error if semantic_manifest.json is not found.
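
For ratio metrics, the key detail (corrected later in this thread) is aggregating before dividing; a hedged sketch, with helper and parameter names of my own choosing:

// Compose a ratio expression from each measure's agg function and expr,
// e.g. ratioExpr("sum", "revenue", "sum", "order_count")
// yields "(SUM(revenue)) / (SUM(order_count))".
func ratioExpr(numAgg, numExpr, denAgg, denExpr string) string {
	return fmt.Sprintf("(%s(%s)) / (%s(%s))",
		strings.ToUpper(numAgg), numExpr,
		strings.ToUpper(denAgg), denExpr)
}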

3. Enum Definition Generation from accepted_values Tests

  • Generates Wren EnumDefinition objects from dbt accepted_values tests.
  • De-duplication Logic: If multiple columns share the exact same set of accepted values, only one EnumDefinition is created and all relevant columns are linked to it (a keying sketch follows this list).
  • Supports tests on both simple columns and nested struct fields.
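
One plausible shape for the de-duplication key (the repo may key differently):

// Identical accepted_values sets produce identical keys, so columns that
// share a value set get linked to a single EnumDefinition.
func enumKey(values []string) string {
	sorted := append([]string(nil), values...) // copy before sorting
	sort.Strings(sorted)
	return strings.Join(sorted, "|")
}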

4. Primary Key Identification from Semantic Models

  • Reads the entities list within dbt semantic_models to identify the primary entity (type: "primary").
  • Uses the node_relation.alias to map the identified primary key back to the corresponding dbt model and populate the primaryKey field in the Wren model; sketched after this list.
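
Sketched below with generic-map decoding; entities, alias, and primaryKeyByModel are illustrative names for the semantic model's entity list, its node_relation.alias, and an accumulator map:

// Record the primary entity's column for the model named by the alias.
for _, e := range entities {
	em, ok := e.(map[string]interface{})
	if !ok {
		continue
	}
	if getStringFromMap(em, "type", "") == "primary" {
		// dbt entities may carry an expr; fall back to the entity name
		primaryKeyByModel[alias] = getStringFromMap(em, "expr",
			getStringFromMap(em, "name", ""))
		break
	}
}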

5. Not Null Constraint Conversion

  • Identifies dbt not_null tests from all possible locations in the manifest.
  • When a not_null test is found, the NotNull field on the corresponding WrenColumn is set to true.

Miscellaneous notes

Made Semantic Layer Parsing Optional & Robust

  • The logic for parsing semantic_manifest.json is wrapped in checks so the converter runs successfully even when the file is missing or malformed, simply skipping the dependent features; a guard sketch follows.
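
A minimal sketch of the guard, assuming the repo's FileExists helper and pterm logging (both referenced later in this thread):

// Semantic-layer features are best-effort: missing or malformed input
// logs a warning and the converter carries on without them.
var semanticManifest map[string]interface{}
if FileExists(semanticManifestPath) {
	data, err := os.ReadFile(semanticManifestPath)
	if err == nil && json.Unmarshal(data, &semanticManifest) == nil {
		// extract enums, metrics, relationships, and primary keys ...
	} else {
		pterm.Warning.Println("semantic_manifest.json unreadable or malformed; skipping semantic features")
	}
}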

Corrected Struct Definitions

  • The wren_mdl.go file was updated to include the necessary structs and fields for Metric and EnumDefinition to support the new features.

Summary by CodeRabbit

  • New Features

    • BigQuery is supported as a data source with credential handling and validation.
    • dbt conversion can read semantic manifests and now exports metrics, enums, primary keys, inferred relationships, and column display names into the MDL output.
    • New CLI flag and interactive prompt to include or exclude staging models during conversion.
  • Tests

    • Added tests covering BigQuery data source creation, credential resolution, validation, and type mappings.
  • Chores

    • Minor formatting and consistency tweaks.

coderabbitai bot (Contributor) commented Aug 3, 2025

Walkthrough

Adds an --include-staging-models CLI flag and prompt, threads it through dbt conversion, adds semantic_manifest.json parsing to extract enums, metrics, relationships, and primary keys, adds BigQuery data-source support and tests, and extends the Wren MDL schema. A duplicate prompt function and a minor launch.go note were introduced.

Changes

Cohort / File(s) Change Summary
CLI & Call Sites
wren-launcher/commands/dbt.go, wren-launcher/commands/launch.go
Added --include-staging-models flag and interactive prompt; pass IncludeStagingModels through DbtAutoConvert → DbtConvertProject; updated call sites to accept the new boolean. A duplicate askForIncludeStagingModels definition is present, and launch.go contains a prompt-related note.
DBT Conversion Core
wren-launcher/commands/dbt/converter.go
Added semantic_manifest.json support; extended ConvertDbtCatalogToWrenMDL and helpers to accept semantic manifest path and includeStagingModels; added enum extraction, not-null/primary-key mapping, relationship generation, metrics conversion, and new summary counts; ConvertOptions gains IncludeStagingModels.
Data Source (BigQuery)
wren-launcher/commands/dbt/data_source.go, wren-launcher/commands/dbt/profiles.go, wren-launcher/commands/dbt/profiles_analyzer.go
Introduced WrenBigQueryDataSource and convertToBigQueryDataSource with credential handling (service-account-json, service-account/keyfile resolution, oauth warning), base64-encoded credentials, and type mapping; added DbtConnection.Method and parsing support.
Tests
wren-launcher/commands/dbt/data_source_test.go
Added BigQuery-focused tests: service-account-json, absolute/relative keyfile resolution, validation failure cases, and MapType tests; refactored/renamed a Postgres test to reflect default-port behavior.
Wren MDL Schema
wren-launcher/commands/dbt/wren_mdl.go
Added EnumDefinition and Metric structs; added EnumDefinitions and Metrics to WrenMDLManifest; added DisplayName and Enum fields to WrenColumn.

Sequence Diagram(s)

sequenceDiagram
    participant User as CLI User
    participant CLI as DbtAutoConvert (CLI)
    participant Conv as ConvertDbtProject / Converter
    participant DS as DataSource Builder
    participant MDL as WrenMDLManifest

    User->>CLI: run command (flag or prompt)
    CLI->>Conv: invoke conversion (IncludeStagingModels)
    Conv->>DS: parse profiles → build DataSource (BigQuery path possible)
    Conv->>Conv: read manifest.json & catalog.json
    alt semantic_manifest present
        Conv->>Conv: read semantic_manifest.json → extract enums, metrics, PKs
    end
    Conv->>Conv: apply staging filter (based on IncludeStagingModels)
    Conv->>Conv: generate relationships & metrics
    Conv->>MDL: assemble manifest (models, enums, metrics, relationships, datasources)
    MDL-->>CLI: return ConvertResult
    CLI-->>User: write/output MDL

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Suggested labels

launcher

Suggested reviewers

  • douenergy
  • wwwy3y3

Poem

"I nibble manifests and chase each key,
BigQuery crumbs and enums follow me.
Metrics hum softly, relationships sing,
Staging gates open — conversion takes wing. 🐇"



📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 443d62a and 8eac257.

📒 Files selected for processing (5)
  • wren-launcher/commands/dbt/converter.go (14 hunks)
  • wren-launcher/commands/dbt/data_source.go (4 hunks)
  • wren-launcher/commands/dbt/data_source_test.go (6 hunks)
  • wren-launcher/commands/dbt/profiles.go (1 hunks)
  • wren-launcher/commands/dbt/wren_mdl.go (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • wren-launcher/commands/dbt/profiles.go
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-07-09T02:43:20.433Z
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:50-53
Timestamp: 2025-07-09T02:43:20.433Z
Learning: In wren-launcher/commands/dbt/data_source.go, when encountering unsupported database types in convertConnectionToDataSource function, the user prefers to log a warning and continue processing instead of failing. The function should return nil, nil to allow processing to continue.

Applied to files:

  • wren-launcher/commands/dbt/data_source.go
  • wren-launcher/commands/dbt/data_source_test.go
  • wren-launcher/commands/dbt/converter.go
📚 Learning: 2025-08-13T05:17:30.180Z
Learnt from: goldmedal
PR: Canner/WrenAI#1887
File: wren-launcher/commands/dbt/data_source.go:273-276
Timestamp: 2025-08-13T05:17:30.180Z
Learning: In the Wren AI codebase, there's an intentional distinction between dbt connection types and Wren AI data source types: dbt uses "sqlserver" as the connection type, while Wren AI expects "mssql" as the data source type. The conversion layer in convertConnectionToDataSource() correctly matches "sqlserver" from dbt profiles and the WrenMSSQLDataSource.GetType() method correctly returns "mssql" for Wren AI compatibility.

Applied to files:

  • wren-launcher/commands/dbt/data_source.go
📚 Learning: 2025-07-09T02:43:07.560Z
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:154-156
Timestamp: 2025-07-09T02:43:07.560Z
Learning: In the WrenAI codebase, the MapType methods for data sources should return the original type name for unknown types, rather than a default fallback like "string". This maintains consistency across all data source implementations.

Applied to files:

  • wren-launcher/commands/dbt/data_source.go
🧬 Code Graph Analysis (3)
wren-launcher/commands/dbt/data_source.go (1)
wren-launcher/commands/dbt/profiles.go (1)
  • DbtConnection (16-40)
wren-launcher/commands/dbt/data_source_test.go (2)
wren-launcher/commands/dbt/profiles.go (3)
  • DbtProfiles (4-7)
  • DbtProfile (10-13)
  • DbtConnection (16-40)
wren-launcher/commands/dbt/data_source.go (7)
  • ValidateAllDataSources (433-455)
  • GetActiveDataSources (375-413)
  • WrenBigQueryDataSource (321-325)
  • DataSource (44-48)
  • WrenLocalFileDataSource (242-245)
  • DefaultDataSource (458-458)
  • WrenPostgresDataSource (286-292)
wren-launcher/commands/dbt/converter.go (3)
wren-launcher/commands/dbt/utils.go (1)
  • FileExists (30-36)
wren-launcher/commands/dbt/data_source.go (2)
  • WrenBigQueryDataSource (321-325)
  • DataSource (44-48)
wren-launcher/commands/dbt/wren_mdl.go (7)
  • WrenMDLManifest (4-13)
  • EnumDefinition (16-19)
  • WrenModel (22-30)
  • Relationship (53-59)
  • Metric (62-69)
  • View (72-76)
  • WrenColumn (40-50)
🔇 Additional comments (33)
wren-launcher/commands/dbt/wren_mdl.go (4)

5-13: LGTM!

The extension of WrenMDLManifest with new optional fields for enums, metrics, and enhanced column definitions correctly aligns with the PR objectives to support semantic layer features.


15-19: LGTM!

The EnumDefinition struct is well-designed for storing named enum values that can be referenced by columns, supporting the accepted_values test conversion mentioned in the PR objectives.


42-44: LGTM!

The addition of DisplayName and Enum fields to WrenColumn properly supports the meta.label mapping and enum linkage features described in the PR objectives.


61-69: LGTM!

The Metric struct correctly captures the essential attributes for semantic layer metrics with proper support for dimensions, aggregations, and display information as outlined in the PR objectives.

wren-launcher/commands/dbt/data_source_test.go (6)

4-8: LGTM!

The addition of necessary imports for base64 encoding, file operations, and path handling properly supports the new BigQuery test functionality.


87-123: LGTM!

The test function name change from TestFromDbtProfiles_PostgresWithDbName to TestFromDbtProfiles_PostgresWithDefaultPort accurately reflects the test's purpose and the code correctly validates default port behavior.


208-252: LGTM!

The TestValidateAllDataSources function provides comprehensive coverage for both valid and invalid profiles, ensuring proper validation behavior across different scenarios.


254-403: LGTM!

The BigQuery test implementation comprehensively covers all supported authentication methods (service-account-json, service-account with absolute/relative paths) and properly validates base64 encoding of credentials. The tests correctly use GetActiveDataSources and handle file operations appropriately.


405-457: LGTM!

The BigQuery validation tests provide excellent coverage of all validation scenarios (valid configuration, missing project, missing dataset, missing credentials) using a clean table-driven test approach.


582-635: LGTM!

The TestMapType function provides comprehensive coverage of type mapping across all data source types (BigQuery, LocalFile, DefaultDataSource, PostgresDataSource) with clear test cases and proper assertions.

wren-launcher/commands/dbt/data_source.go (6)

4-7: LGTM!

The addition of necessary imports for base64 encoding, JSON validation, and file operations properly supports the new BigQuery data source functionality.


83-85: LGTM!

The integration of BigQuery support into the connection conversion flow is correctly implemented and follows the established pattern for other data source types.


152-240: LGTM!

The convertToBigQueryDataSource function excellently handles all BigQuery authentication methods with comprehensive error handling, proper credential validation, and robust file path resolution. The implementation correctly supports both service-account-json (inline) and service-account (file path) methods while gracefully handling unsupported oauth methods.


321-325: LGTM!

The WrenBigQueryDataSource struct properly maps to the expected Wren engine API format with correct JSON field names (project_id, dataset_id, credentials).


327-344: LGTM!

The interface implementations are correct, with proper validation of required fields and appropriate error messages.


346-370: LGTM!

The BigQuery type mapping implementation correctly maps BigQuery-specific types to appropriate Wren types, following the pattern established in other data source implementations and addressing the previous review feedback.

wren-launcher/commands/dbt/converter.go (17)

8-8: LGTM!

The addition of the regexp package import is necessary for the new enum name sanitization functionality.


15-17: Helpful documentation comment for struct organization.

Good practice to document the code organization and prevent compilation errors.


26-26: LGTM!

The addition of IncludeStagingModels option properly supports the configurable staging model inclusion feature mentioned in the PR objectives.


59-63: LGTM!

The semantic manifest integration is well-implemented with proper file existence checking and clear user messaging about optional features.

Also applies to: 85-92


172-180: LGTM!

The BigQuery data source mapping correctly follows the Wren engine API specification with proper field names (project_id, dataset_id, credentials).


213-213: LGTM!

The function signature extension to include semantic manifest path and staging model options is correctly implemented.


234-236: LGTM!

The conversion summary enhancement provides valuable feedback to users about the comprehensive conversion results including relationships, metrics, and enums.


272-272: LGTM!

The function signature extension properly supports the new semantic layer features and staging model inclusion.


298-422: LGTM!

The semantic manifest parsing and pre-processing logic is robust and well-structured, with proper error handling and graceful fallbacks when optional files are missing or malformed.


433-438: LGTM!

The staging model filtering logic correctly implements the configurable inclusion feature using proper prefix matching.


447-455: LGTM!

The relationship and metrics generation integration is properly implemented with appropriate conditional logic based on data availability.


460-523: LGTM!

The relationship generation logic comprehensively handles both model column tests and compiled test nodes with proper deduplication. The implementation correctly extracts relationship information from various test structures.


574-652: LGTM!

The enum management and column test processing functions are well-designed with proper deduplication, name sanitization, and recursive handling of nested column structures.


654-783: LGTM!

The metric conversion logic excellently handles different metric types (simple, ratio, derived) with proper lookup tables and aggregation expression building. The ratio metric implementation correctly uses measure metadata for proper SQL generation.


786-786: LGTM!

The model conversion enhancements properly integrate enum mapping, not-null constraints, primary key assignment, and DisplayName mapping from meta.label as specified in the PR objectives.

Also applies to: 863-873, 907-910


925-927: LGTM!

The addition of null safety to getStringFromMap improves robustness when dealing with potentially nil map inputs.


936-969: LGTM!

The utility functions for map extraction, model name parsing, and ref string parsing are well-implemented and properly support the new semantic layer features.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

🧹 Nitpick comments (4)
wren-launcher/commands/dbt/data_source.go (1)

119-137: Consider adding validation for keyfile_json format.

The function correctly extracts keyfile_json from the Additional map, but there's no validation of its format. Consider adding basic JSON validation to catch malformed service account keys early.

wren-launcher/commands/dbt/converter.go (3)

573-587: Consider sanitizing enum names for special characters.

The enum name generation uses model and column names directly, which might contain special characters that are invalid in enum names.

Consider adding more robust sanitization:

 enumName = fmt.Sprintf("%s_%s_Enum", modelName, columnName)
 // Sanitize enum name
-enumName = strings.ReplaceAll(enumName, ".", "_")
+// Replace any non-alphanumeric characters with underscores
+re := regexp.MustCompile(`[^a-zA-Z0-9_]`)
+enumName = re.ReplaceAllString(enumName, "_")
+// Ensure it doesn't start with a number
+if len(enumName) > 0 && enumName[0] >= '0' && enumName[0] <= '9' {
+    enumName = "_" + enumName
+}

684-701: Add validation for measure references.

The code searches for input measures in the measure lookup but doesn't handle cases where measures might not be found.

Consider adding validation and logging:

 // Find the model this metric is based on
 for model, measures := range measureLookup {
 	for _, inputMeasure := range inputMeasuresList {
 		if imMap, ok := inputMeasure.(map[string]interface{}); ok {
 			imName := getStringFromMap(imMap, "name", "")
 			if _, exists := measures[imName]; exists {
 				baseModel = model
 				break
+			} else if imName != "" {
+				pterm.Warning.Printf("Measure '%s' not found in model '%s'\n", imName, model)
 			}
 		}
 	}
 	if baseModel != "" {
 		break
 	}
 }
+if baseModel == "" && len(inputMeasuresList) > 0 {
+	pterm.Warning.Printf("Could not find model for metric '%s'\n", metricName)
+}

918-925: Handle edge cases in ref parsing.

The parseRef function assumes standard ref() syntax but might encounter variations.

Consider handling more ref() variations:

 func parseRef(refStr string) string {
-	re := regexp.MustCompile(`ref\(['"]([^'"]+)['"]\)`)
+	// Handle ref() with optional spaces and both quote types
+	re := regexp.MustCompile(`ref\s*\(\s*['"]([^'"]+)['"]\s*\)`)
 	matches := re.FindStringSubmatch(refStr)
 	if len(matches) > 1 {
 		return matches[1]
 	}
 	return ""
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between abc1868 and 057420d.

📒 Files selected for processing (8)
  • wren-launcher/commands/dbt.go (3 hunks)
  • wren-launcher/commands/dbt/converter.go (14 hunks)
  • wren-launcher/commands/dbt/data_source.go (3 hunks)
  • wren-launcher/commands/dbt/data_source_test.go (1 hunks)
  • wren-launcher/commands/dbt/profiles.go (2 hunks)
  • wren-launcher/commands/dbt/profiles_analyzer.go (2 hunks)
  • wren-launcher/commands/dbt/wren_mdl.go (3 hunks)
  • wren-launcher/commands/launch.go (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: in wren-launcher/commands/dbt/data_source.go, when encountering unsupported database types in conver...
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:50-53
Timestamp: 2025-07-09T02:43:20.433Z
Learning: In wren-launcher/commands/dbt/data_source.go, when encountering unsupported database types in convertConnectionToDataSource function, the user prefers to log a warning and continue processing instead of failing. The function should return nil, nil to allow processing to continue.

Applied to files:

  • wren-launcher/commands/dbt/profiles.go
  • wren-launcher/commands/dbt/profiles_analyzer.go
  • wren-launcher/commands/dbt.go
  • wren-launcher/commands/launch.go
  • wren-launcher/commands/dbt/data_source_test.go
  • wren-launcher/commands/dbt/data_source.go
  • wren-launcher/commands/dbt/converter.go
📚 Learning: in the wren-ai-service codebase, when adding new fields like "alias" to the output of functions that...
Learnt from: cyyeh
PR: Canner/WrenAI#1763
File: wren-ai-service/src/pipelines/generation/semantics_description.py:31-33
Timestamp: 2025-06-20T02:37:21.292Z
Learning: In the wren-ai-service codebase, when adding new fields like "alias" to the output of functions that use Pydantic models for validation, the user prefers not to update the corresponding Pydantic model definitions to include these new fields.

Applied to files:

  • wren-launcher/commands/dbt/wren_mdl.go
📚 Learning: in the wrenai codebase, the maptype methods for data sources should return the original type name fo...
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:154-156
Timestamp: 2025-07-09T02:43:07.560Z
Learning: In the WrenAI codebase, the MapType methods for data sources should return the original type name for unknown types, rather than a default fallback like "string". This maintains consistency across all data source implementations.

Applied to files:

  • wren-launcher/commands/dbt/data_source.go
  • wren-launcher/commands/dbt/converter.go
🧬 Code Graph Analysis (3)
wren-launcher/commands/dbt.go (1)
wren-launcher/commands/dbt/converter.go (3)
  • ConvertResult (30-34)
  • ConvertOptions (19-27)
  • ConvertDbtProjectCore (38-259)
wren-launcher/commands/launch.go (1)
wren-launcher/commands/dbt.go (1)
  • DbtConvertProject (63-75)
wren-launcher/commands/dbt/converter.go (3)
wren-launcher/commands/dbt/utils.go (1)
  • FileExists (30-36)
wren-launcher/commands/dbt/data_source.go (2)
  • WrenBigQueryDataSource (218-224)
  • DataSource (12-16)
wren-launcher/commands/dbt/wren_mdl.go (7)
  • WrenMDLManifest (4-13)
  • EnumDefinition (16-19)
  • WrenModel (22-30)
  • Relationship (52-58)
  • Metric (61-68)
  • View (71-75)
  • WrenColumn (40-49)
🔇 Additional comments (10)
wren-launcher/commands/dbt/profiles_analyzer.go (1)

136-136: LGTM! Clean implementation of method field extraction.

The addition of the Method field extraction follows the established pattern and correctly includes it in the knownFields map to prevent it from being stored in the Additional map. This properly supports BigQuery authentication methods as outlined in the PR objectives.

Also applies to: 149-149

wren-launcher/commands/launch.go (1)

518-518: LGTM! Correctly implements the new IncludeStagingModels parameter.

The addition of the false parameter for IncludeStagingModels is appropriate for the launch flow, where staging models are typically not needed. This aligns with the function signature update and maintains backward compatibility.

wren-launcher/commands/dbt/profiles.go (2)

28-28: LGTM! Well-implemented Method field for BigQuery support.

The Method field is properly added with appropriate YAML/JSON tags and omitempty annotation, following the established pattern for optional database-specific fields. This correctly supports BigQuery authentication methods as mentioned in the PR objectives.


11-11: LGTM! Improved formatting consistency.

The formatting adjustments to the Target and SSLMode fields improve code consistency without affecting functionality.

Also applies to: 35-35

wren-launcher/commands/dbt.go (2)

27-27: LGTM! Well-implemented command line flag for staging model inclusion.

The --include-staging-models flag is properly added with a clear description that explains its purpose. This gives users control over whether staging models are included during conversion.


50-50: LGTM! Consistent implementation of IncludeStagingModels parameter.

The parameter is properly threaded through both the DbtAutoConvert command and the DbtConvertProject function, with clean formatting and consistent assignment to the ConvertOptions struct. The function signature update maintains clarity and follows Go conventions.

Also applies to: 63-72

wren-launcher/commands/dbt/data_source_test.go (2)

377-470: LGTM! Comprehensive BigQuery conversion tests.

The test cases properly cover both supported BigQuery authentication methods ("service-account" and "service-account-json") and validate correct field extraction and assignment. The test structure follows established patterns and provides good coverage for the new BigQuery functionality.


472-544: LGTM! Thorough BigQuery validation test coverage.

The validation tests comprehensively cover both valid and invalid scenarios, including:

  • Both supported authentication methods
  • Missing required fields (project, dataset)
  • Unsupported oauth method
  • Method-specific requirements (keyfile for service-account)

This ensures the BigQuery data source validation behaves correctly across all expected use cases.

wren-launcher/commands/dbt/wren_mdl.go (1)

7-11: LGTM! Clean struct additions for enhanced metadata support.

The new EnumDefinition and Metric structs are well-designed, and the extensions to WrenMDLManifest and WrenColumn are properly implemented with appropriate JSON tags and omitempty markers.

Also applies to: 15-19, 43-43, 60-68

wren-launcher/commands/dbt/converter.go (1)

431-433: LGTM! Staging model filtering is implemented correctly.

The filtering logic properly excludes models with .stg_ or .staging_ in their names when IncludeStagingModels is false.

This commit incorporates a series of changes based on code review feedback to improve the dbt to Wren MDL converter. The updates focus on enhancing security, adding more robust validation and parsing, and increasing the completeness of the generated MDL file.

In `wren_mdl.go`:
- feat: Adds a `DisplayName` field to the `WrenColumn` struct to support user-friendly labels for columns.

In `data_source.go`:
- feat: Adds JSON validation for the `keyfile_json` field in BigQuery data sources to provide earlier feedback on malformed credentials.
- refactor: Implements BigQuery-specific data type mappings in the `MapType` method to correctly convert types like `INT64`, `TIMESTAMP`, and `NUMERIC`.

In `converter.go`:
- fix(security): Omits the `keyfile_json` content from the generated `wren-datasource.json` to prevent exposing sensitive credentials in the output file.
- feat: Adds mapping for a dbt column's `meta.label` to the `DisplayName` property in the corresponding Wren `WrenColumn`, improving the usability of the output.
- refactor: Enhances enum name generation with more robust sanitization, replacing all non-alphanumeric characters to ensure valid identifiers.
- refactor: Improves the `ref()` parsing regex to handle optional whitespace, making it more resilient to formatting variations.
- refactor: Adds validation for metric `input_measures` to log a warning if a referenced measure cannot be found, improving debuggability.
- fix: Resolves a persistent compilation error related to an incorrect type assertion on `inputMeasures` by refactoring the logic to correctly identify the base model for a metric.
goldmedal (Collaborator)

Thanks @cougrimes for the contribution. I'll take a look 👍

wwwy3y3 requested a review from goldmedal on August 4, 2025 at 07:50
goldmedal (Collaborator) left a comment

Thanks @cougrimes for working on this. It's a great start for BigQuery and dbt semantic layer integration! 👍

I've left some comments regarding the connection info and launching steps.

For the dbt semantic layer integration, could you provide a sample semantic_manifest.json? I'm not very familiar with the dbt semantic layer, so a sample would help us confirm whether the conversion logic is reasonable.

OutputDir string
ProfileName string
Target string
IncludeStagingModels bool

nice improvement 👍

Comment on lines 219 to 225
type WrenBigQueryDataSource struct {
	Project     string `json:"project"`
	Dataset     string `json:"dataset"`
	Method      string `json:"method"`
	Keyfile     string `json:"keyfile,omitempty"`
	KeyfileJSON string `json:"keyfile_json,omitempty"`
}

They don't match what Wren AI requires for the BigQuery connection. You can refer to the Wren engine doc.
The connection info should be

{
  "credentials": "eyJ...",
  "dataset_id": "my_dataset",
  "project_id": "my-project"
}

credentials is the base64-encoded content of the JSON key file

@@ -154,6 +169,17 @@ func ConvertDbtProjectCore(opts ConvertOptions) (*ConvertResult, error) {
"format": typedDS.Format,
},
}
case *WrenBigQueryDataSource:
// SECURITY FIX: Omit keyfile_json from output for security.

To build the connection info of BigQuery, we need to encode the content of the key file in base64. What's your concern about reading the keyfile_json? 🤔

@@ -515,12 +515,12 @@ func processDbtProject(projectDir string) (string, error) {
}

// Use the core conversion function from dbt package
result, err := DbtConvertProject(dbtProjectPath, targetDir, profileName, target, true)
result, err := DbtConvertProject(dbtProjectPath, targetDir, profileName, target, true, false)

We added a new IncludeStagingModels option for dbt conversion. We should add an extra step for it; otherwise, it will always be disabled.

goldmedal (Collaborator)

Hi @cougrimes, after #1877 we added formatter and lint tools to the wren-launcher module, which causes some conflicts with this PR. If you don't mind, I can help resolve the conflicts by pushing to your branch directly.
By the way, I left some comments on this PR. If you don't have time to address them, we can merge this PR first and I'll improve them in a follow-up PR.
Anyway, thanks for your contribution.

cougrimes (Author)

@goldmedal Happy to make the changes requested, but I actually wound up hitting an unexpected issue when processing the MDL on my end—despite declaring BigQuery, I keep getting the MDL trying to parse as Trino rather than BQ. Do you have any other documentation on the MDL schema? I've been trying to work backward to figure out why this is happening to no avail.

goldmedal (Collaborator)

@goldmedal Happy to make the changes requested, but I actually wound up hitting an unexpected issue when processing the MDL on my end—despite declaring BigQuery, I keep getting the MDL trying to parse as Trino rather than BQ.

What do you mean by the MDL being parsed as Trino? Which step does that? 🤔

Do you have any other documentation on the MDL schema? I've been trying to work backward to figure out why this is happening to no avail.

You can check the doc for WrenMDL or the JSON schema.
Although some features (e.g. Metric, Enum, ...) are not yet documented, we can still put them in the MDL as context for SQL generation.

cougrimes (Author)

Most of the Trino-esque errors I had been running into are addressed in Canner/wren-engine#1290; the issues with keys and auth are resolved.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🔭 Outside diff range comments (1)
wren-launcher/commands/dbt/data_source.go (1)

4-11: Import os to support reading keyfile paths for service-account auth

We’ll need os.ReadFile when handling dbt BigQuery profiles that use method: service-account with keyfile path (see below).

 import (
 	"encoding/base64"
 	"encoding/json"
 	"fmt"
+	"os"
 	"path/filepath"
 	"strings"

 	"github.com/pterm/pterm"
 )
♻️ Duplicate comments (1)
wren-launcher/commands/launch.go (1)

525-525: IncludeStagingModels is hardcoded to false; there’s no way to enable it in the Launch flow

This permanently disables staging models in this path. Per the PR objective and previous reviewer note, expose a toggle here (prompt or config/flag) instead of hardcoding.

Apply this minimal change to pass a variable instead of a literal:

- result, err := DbtConvertProject(dbtProjectPath, targetDir, profileName, target, true, false)
+ includeStagingModels := false // TODO: wire from config flag or prompt
+ result, err := DbtConvertProject(dbtProjectPath, targetDir, profileName, target, true, includeStagingModels)

If you want a quick interactive solution, add a prompt:

// place near other ask* helpers
func askForIncludeStagingModels() (bool, error) {
	prompt := promptui.Select{
		Label: "Include staging models (stg_*, staging_*)?",
		Items: []string{"No", "Yes"},
	}
	_, result, err := prompt.Run()
	if err != nil {
		return false, err
	}
	return result == "Yes", nil
}

Then in processDbtProject(), before calling DbtConvertProject:

includeStagingModels, _ := askForIncludeStagingModels() // default to No on error
🧹 Nitpick comments (4)
wren-launcher/commands/dbt/data_source.go (1)

272-283: Tighten validation by trimming whitespace

Avoid accepting whitespace-only values for required fields.

 func (ds *WrenBigQueryDataSource) Validate() error {
-	if ds.Project == "" {
+	if strings.TrimSpace(ds.Project) == "" {
 		return fmt.Errorf("project_id cannot be empty")
 	}
-	if ds.Dataset == "" {
+	if strings.TrimSpace(ds.Dataset) == "" {
 		return fmt.Errorf("dataset_id cannot be empty")
 	}
-	if ds.Credentials == "" {
+	if strings.TrimSpace(ds.Credentials) == "" {
 		return fmt.Errorf("credentials cannot be empty")
 	}
 	return nil
 }
wren-launcher/commands/dbt/converter.go (3)

433-437: Reduce false positives when excluding staging models

Checking substrings on nodeKey can skip non-staging models. Prefer checking model name prefix.

-		if !includeStagingModels && (strings.Contains(nodeKey, ".stg_") || strings.Contains(nodeKey, ".staging_")) {
-			continue
-		}
+		if !includeStagingModels {
+			mn := getModelNameFromNodeKey(nodeKey)
+			if strings.HasPrefix(mn, "stg_") || strings.HasPrefix(mn, "staging_") {
+				continue
+			}
+		}

458-510: Possible duplicate relationships; add a cheap dedup by key

If the same relationship is captured via both embedded and compiled tests, duplicates can be produced. Dedup on name+type+condition to keep output clean.

 func generateRelationships(manifestData map[string]interface{}) []Relationship {
 	var relationships []Relationship
 	if nodes, ok := manifestData["nodes"].(map[string]interface{}); ok {
 		for nodeKey, nodeValue := range nodes {
 			nodeMap, ok := nodeValue.(map[string]interface{})
 			if !ok {
 				continue
 			}
@@
 		}
 	}
-	return relationships
+	// Dedup
+	seen := make(map[string]struct{}, len(relationships))
+	var unique []Relationship
+	for _, r := range relationships {
+		key := r.Name + "|" + r.JoinType + "|" + r.Condition
+		if _, ok := seen[key]; ok {
+			continue
+		}
+		seen[key] = struct{}{}
+		unique = append(unique, r)
+	}
+	return unique
 }

938-945: Nit: precompile the ref() regex once

Compiling on each call is unnecessary. Cache at package scope for micro-efficiency and readability.

You can move the regex to a package-level var:

var refRegex = regexp.MustCompile(`ref\s*\(\s*['"]([^'"]+)['"]\s*\)`)

Then in parseRef:

-	re := regexp.MustCompile(`ref\s*\(\s*['"]([^'"]+)['"]\s*\)`)
-	matches := re.FindStringSubmatch(refStr)
+	matches := refRegex.FindStringSubmatch(refStr)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 055cd67 and a2ac6f0.

📒 Files selected for processing (5)
  • wren-launcher/commands/dbt/converter.go (14 hunks)
  • wren-launcher/commands/dbt/data_source.go (5 hunks)
  • wren-launcher/commands/dbt/data_source_test.go (1 hunks)
  • wren-launcher/commands/dbt/profiles_analyzer.go (2 hunks)
  • wren-launcher/commands/launch.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • wren-launcher/commands/dbt/profiles_analyzer.go
  • wren-launcher/commands/dbt/data_source_test.go
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-07-09T02:43:20.433Z
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:50-53
Timestamp: 2025-07-09T02:43:20.433Z
Learning: In wren-launcher/commands/dbt/data_source.go, when encountering unsupported database types in convertConnectionToDataSource function, the user prefers to log a warning and continue processing instead of failing. The function should return nil, nil to allow processing to continue.

Applied to files:

  • wren-launcher/commands/dbt/data_source.go
  • wren-launcher/commands/dbt/converter.go
📚 Learning: 2025-08-13T05:17:30.180Z
Learnt from: goldmedal
PR: Canner/WrenAI#1887
File: wren-launcher/commands/dbt/data_source.go:273-276
Timestamp: 2025-08-13T05:17:30.180Z
Learning: In the Wren AI codebase, there's an intentional distinction between dbt connection types and Wren AI data source types: dbt uses "sqlserver" as the connection type, while Wren AI expects "mssql" as the data source type. The conversion layer in convertConnectionToDataSource() correctly matches "sqlserver" from dbt profiles and the WrenMSSQLDataSource.GetType() method correctly returns "mssql" for Wren AI compatibility.

Applied to files:

  • wren-launcher/commands/dbt/data_source.go
📚 Learning: 2025-07-09T02:43:07.560Z
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:154-156
Timestamp: 2025-07-09T02:43:07.560Z
Learning: In the WrenAI codebase, the MapType methods for data sources should return the original type name for unknown types, rather than a default fallback like "string". This maintains consistency across all data source implementations.

Applied to files:

  • wren-launcher/commands/dbt/data_source.go
🧬 Code Graph Analysis (3)
wren-launcher/commands/launch.go (1)
wren-launcher/commands/dbt.go (1)
  • DbtConvertProject (63-75)
wren-launcher/commands/dbt/data_source.go (1)
wren-launcher/commands/dbt/profiles.go (1)
  • DbtConnection (16-40)
wren-launcher/commands/dbt/converter.go (3)
wren-launcher/commands/dbt/utils.go (1)
  • FileExists (30-36)
wren-launcher/commands/dbt/data_source.go (2)
  • WrenBigQueryDataSource (260-264)
  • DataSource (43-47)
wren-launcher/commands/dbt/wren_mdl.go (7)
  • WrenMDLManifest (4-13)
  • EnumDefinition (16-19)
  • WrenModel (22-30)
  • Relationship (53-59)
  • Metric (62-69)
  • View (72-76)
  • WrenColumn (40-50)
🔇 Additional comments (5)
wren-launcher/commands/dbt/data_source.go (2)

77-89: BigQuery type dispatch looks good

Routing "bigquery" to a dedicated converter aligns with the DataSource pattern and prior guidance to warn-and-continue on unsupported types.


286-309: BigQuery type mapping: LGTM

Covers the common BigQuery types and correctly falls back to the original type for unknowns, per codebase convention.

wren-launcher/commands/dbt/converter.go (3)

172-180: BigQuery datasource output structure matches engine expectations and avoids credential leakage

Emits project_id, dataset_id, credentials (base64) only, and writes with 0600. This aligns with the engine schema and prior security guidance.


839-847: DisplayName mapping from dbt meta.label: LGTM

Correctly reads meta.label into WrenColumn.DisplayName as per PR objective.


328-336: Verify MDL dataSources field usage

I wasn’t able to locate any MDL schema or engine code in this repo that defines how the dataSources property is interpreted. If the engine actually expects a path (for example "wren-datasource.json") rather than the connector type string, we should either:

  • Populate manifest.DataSources with the relative JSON filename,
  • Or leave it empty and let the co-located wren-datasource.json be picked up.

Would you like to wire ConvertDbtProjectCore to pass the generated datasource path into ConvertDbtCatalogToWrenMDL so this field is populated correctly?

* fix: Adjusted some behaviors around keyfiles vs. inlined JSON
* fix: Corrected a logical flaw where ratio metrics in dbt were calculated at a row level (numerator / denominator) instead of after aggregation. The code now correctly generates expressions like SUM(numerator) / SUM(denominator).
* perf: Improved performance by pre-compiling a regular expression for parsing ref() functions at the package level, preventing it from being re-compiled on every function call.
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

♻️ Duplicate comments (3)
wren-launcher/commands/launch.go (1)

190-201: Nice addition: interactive staging-models prompt

This addresses the earlier feedback to expose IncludeStagingModels via an extra step. The prompt UX is clear and returns a sane bool.

wren-launcher/commands/dbt/converter.go (2)

861-869: DisplayName mapping from meta.label: LGTM

Column.DisplayName is now sourced from meta.label per PR objectives. Thanks for addressing the earlier gap.


655-784: Metric conversion covers simple, ratio, and derived correctly

The ratio handling now composes expressions using each measure’s agg and expr metadata, avoiding bare-name division. Solid implementation.

🧹 Nitpick comments (2)
wren-launcher/commands/launch.go (1)

537-546: Wire-up looks correct; minor defensive nit

The call now passes the new IncludeStagingModels argument correctly. If the prompt errors, you already warn and fall back to the default (false). To make the fallback explicit and clearer to readers, consider setting includeStagingModels = false in the error branch.

 includeStagingModels, err := askForIncludeStagingModels()
 if err != nil {
     pterm.Warning.Println("Could not get staging model preference, defaulting to 'No'.")
+    includeStagingModels = false
 }
wren-launcher/commands/dbt/data_source.go (1)

151-231: BigQuery auth handling is robust; add small trims for resilience

Good coverage of service-account-json (inline), service-account (file path), fallback when method is empty, and explicit skipping of oauth per PR scope. To harden against whitespace in profiles, trim keyfile_json and keyfile path before use.

 case "service-account-json":
     // Extract inline JSON from Additional["keyfile_json"]
     var keyfileJSON string
     if kfj, exists := conn.Additional["keyfile_json"]; exists {
         if kfjStr, ok := kfj.(string); ok {
-            keyfileJSON = kfjStr
+            keyfileJSON = strings.TrimSpace(kfjStr)
         }
     }
@@
 case "service-account", "":
     // Prefer structured field; fall back to Additional["keyfile"]
-    keyfilePath := strings.TrimSpace(conn.Keyfile)
+    keyfilePath := strings.TrimSpace(conn.Keyfile)
     if keyfilePath == "" {
         if kf, ok := conn.Additional["keyfile"]; ok {
             if kfStr, ok := kf.(string); ok {
-                keyfilePath = strings.TrimSpace(kfStr)
+                keyfilePath = strings.TrimSpace(kfStr)
             }
         }
     }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between a2ac6f0 and 696b358.

📒 Files selected for processing (3)
  • wren-launcher/commands/dbt/converter.go (14 hunks)
  • wren-launcher/commands/dbt/data_source.go (5 hunks)
  • wren-launcher/commands/launch.go (3 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-07-09T02:43:20.433Z
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:50-53
Timestamp: 2025-07-09T02:43:20.433Z
Learning: In wren-launcher/commands/dbt/data_source.go, when encountering unsupported database types in convertConnectionToDataSource function, the user prefers to log a warning and continue processing instead of failing. The function should return nil, nil to allow processing to continue.

Applied to files:

  • wren-launcher/commands/dbt/data_source.go
  • wren-launcher/commands/dbt/converter.go
📚 Learning: 2025-08-13T05:17:30.180Z
Learnt from: goldmedal
PR: Canner/WrenAI#1887
File: wren-launcher/commands/dbt/data_source.go:273-276
Timestamp: 2025-08-13T05:17:30.180Z
Learning: In the Wren AI codebase, there's an intentional distinction between dbt connection types and Wren AI data source types: dbt uses "sqlserver" as the connection type, while Wren AI expects "mssql" as the data source type. The conversion layer in convertConnectionToDataSource() correctly matches "sqlserver" from dbt profiles and the WrenMSSQLDataSource.GetType() method correctly returns "mssql" for Wren AI compatibility.

Applied to files:

  • wren-launcher/commands/dbt/data_source.go
📚 Learning: 2025-07-09T02:43:07.560Z
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:154-156
Timestamp: 2025-07-09T02:43:07.560Z
Learning: In the WrenAI codebase, the MapType methods for data sources should return the original type name for unknown types, rather than a default fallback like "string". This maintains consistency across all data source implementations.

Applied to files:

  • wren-launcher/commands/dbt/data_source.go
🧬 Code Graph Analysis (3)
wren-launcher/commands/dbt/data_source.go (1)
wren-launcher/commands/dbt/profiles.go (1)
  • DbtConnection (16-40)
wren-launcher/commands/launch.go (1)
wren-launcher/commands/dbt.go (1)
  • DbtConvertProject (63-75)
wren-launcher/commands/dbt/converter.go (3)
wren-launcher/commands/dbt/utils.go (1)
  • FileExists (30-36)
wren-launcher/commands/dbt/data_source.go (2)
  • WrenBigQueryDataSource (312-316)
  • DataSource (44-48)
wren-launcher/commands/dbt/wren_mdl.go (7)
  • WrenMDLManifest (4-13)
  • EnumDefinition (16-19)
  • WrenModel (22-30)
  • Relationship (53-59)
  • Metric (62-69)
  • View (72-76)
  • WrenColumn (40-50)
🔇 Additional comments (5)
wren-launcher/commands/launch.go (1)

190-202: No duplicate definitions of askForIncludeStagingModels found
Verified with rg -nC2 --type go -P '\bfunc\s+askForIncludeStagingModels\s*\('; only one occurrence exists at wren-launcher/commands/launch.go:191. No redeclaration issues detected.

wren-launcher/commands/dbt/data_source.go (1)

83-85: BigQuery dispatch added correctly

convertConnectionToDataSource now routes "bigquery" to the dedicated converter, aligning with the new data source support.

wren-launcher/commands/dbt/converter.go (3)

59-64: Optional files handled gracefully

Optional manifest.json and semantic_manifest.json are discovered and logged with clear fallbacks. This makes the conversion resilient to missing artifacts.

Also applies to: 85-92


172-180: BigQuery data source output matches engine requirements

Properties use project_id, dataset_id, and base64-encoded credentials as requested in the engine docs and maintainer feedback. File permissions for the output are secure elsewhere (0600).


433-437: Staging model exclusion implemented as specified

The includeStagingModels filter correctly skips stg_* and staging_* when disabled. Simple and effective.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
wren-launcher/commands/launch.go (1)

14-14: Fix invalid import path (breaks build)

The import has a typo ("github.comcom"). This will prevent compilation.

Apply this diff:

-	"github.comcom/Canner/WrenAI/wren-launcher/config"
+	"github.com/Canner/WrenAI/wren-launcher/config"
🧹 Nitpick comments (1)
wren-launcher/commands/launch.go (1)

536-544: Make the "No" fallback explicit on prompt failure; consider non-interactive override

Currently, if the prompt fails, includeStagingModels is already false due to the function’s (false, err) return, but making that explicit improves readability and future-proofing. Also consider honoring a non-interactive flag/env (e.g., via config) before prompting so CI/automation can bypass TTY interaction.

Minimal explicit fallback:

  // Ask the user whether to include staging models
  includeStagingModels, err := askForIncludeStagingModels()
  if err != nil {
     pterm.Warning.Println("Could not get staging model preference, defaulting to 'No'.")
+    includeStagingModels = false
  }

If you’d like, I can wire this to a config/env-driven value (e.g., config.GetIncludeStagingModels()) with the prompt as a fallback.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 696b358 and 1856a04.

📒 Files selected for processing (1)
  • wren-launcher/commands/launch.go (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
wren-launcher/commands/launch.go (1)
wren-launcher/commands/dbt.go (1)
  • DbtConvertProject (63-75)
🔇 Additional comments (1)
wren-launcher/commands/launch.go (1)

190-201: All good: no duplicate definitions found

I’ve verified that askForIncludeStagingModels is defined only once in wren-launcher/commands/launch.go, so there’s no risk of a redefinition compile error. Approving these changes.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
wren-launcher/commands/dbt/converter.go (3)

447-451: Relationship generation and de-dup: fix looks correct

The de-dup key concatenation is now valid Go and the stray brace is gone. Relationship extraction from both column-embedded tests and compiled test nodes is comprehensive.

Also applies to: 460-524


654-783: Metrics: correct use of measure agg/expr in ratio/simple metrics

This addresses the earlier concern about bare names; building expressions as AGG(expr) and using proper parentheses for ratios is correct.


860-868: Column DisplayName mapping from meta.label: LGTM

This fulfills the PR objective and matches the WrenColumn API.

🧹 Nitpick comments (3)
wren-launcher/commands/launch.go (1)

536-541: Graceful fallback on prompt error is good; consider a non-interactive override

Defaulting to "No" on prompt failure is sensible. For CI/non-interactive scenarios, consider allowing an override via a flag/env/config (e.g., WREN_INCLUDE_STAGING_MODELS=true) to skip the prompt entirely when set.

Would you like me to wire an env-based override and document it?

wren-launcher/commands/dbt/converter.go (2)

433-438: Staging model filter: consider making prefixes configurable

The stg_/staging_ prefixes are a good default. Consider making the prefix list configurable (via opts or config/env) to support different naming conventions.


936-945: Guard getMapFromMap against nil parent maps

Minor robustness tweak: if m is nil, return defaultValue to mirror getStringFromMap.

Apply this diff:

 func getMapFromMap(m map[string]interface{}, key string, defaultValue map[string]interface{}) map[string]interface{} {
+  if m == nil {
+    return defaultValue
+  }
   if value, exists := m[key]; exists {
     if str, ok := value.(map[string]interface{}); ok {
       return str
     }
   }
   return defaultValue
 }
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 1856a04 and bbb6510.

📒 Files selected for processing (2)
  • wren-launcher/commands/dbt/converter.go (14 hunks)
  • wren-launcher/commands/launch.go (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-09T02:43:20.433Z
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:50-53
Timestamp: 2025-07-09T02:43:20.433Z
Learning: In wren-launcher/commands/dbt/data_source.go, when encountering unsupported database types in convertConnectionToDataSource function, the user prefers to log a warning and continue processing instead of failing. The function should return nil, nil to allow processing to continue.

Applied to files:

  • wren-launcher/commands/dbt/converter.go
🧬 Code Graph Analysis (2)
wren-launcher/commands/launch.go (1)
wren-launcher/commands/dbt.go (1)
  • DbtConvertProject (63-75)
wren-launcher/commands/dbt/converter.go (3)
wren-launcher/commands/dbt/utils.go (1)
  • FileExists (30-36)
wren-launcher/commands/dbt/data_source.go (2)
  • WrenBigQueryDataSource (312-316)
  • DataSource (44-48)
wren-launcher/commands/dbt/wren_mdl.go (7)
  • WrenMDLManifest (4-13)
  • EnumDefinition (16-19)
  • WrenModel (22-30)
  • Relationship (53-59)
  • Metric (62-69)
  • View (72-76)
  • WrenColumn (40-50)
🔇 Additional comments (9)
wren-launcher/commands/launch.go (2)

190-200: Include-staging prompt: LGTM; no duplicate definition detected

The prompt implementation is clean and returns a boolean as expected. Also, despite an earlier summary claiming a duplicate, this file now contains a single definition of askForIncludeStagingModels.


544-545: All DbtConvertProject call sites updated

I ran the grep across the Go code and found only the function definition and the single invocation in wren-launcher/commands/launch.go, both using the new six-argument signature. No remaining calls with the old arity exist.

wren-launcher/commands/dbt/converter.go (7)

20-27: New ConvertOptions.IncludeStagingModels: LGTM

Adding this to options keeps the flow explicit and testable.


85-92: Optional semantic manifest handling: LGTM

Robust to missing/malformed semantic_manifest.json with clear warnings. Good trade-off for resilience.
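
The pattern, roughly (paths and variable names illustrative; FileExists is the helper from utils.go):

  // Treat the semantic manifest as optional: warn and skip on any failure.
  semanticPath := filepath.Join(targetDir, "semantic_manifest.json")
  var semanticManifest map[string]interface{}
  if FileExists(semanticPath) {
    if data, err := os.ReadFile(semanticPath); err != nil {
      pterm.Warning.Printf("Could not read semantic_manifest.json, skipping metrics: %v\n", err)
    } else if err := json.Unmarshal(data, &semanticManifest); err != nil {
      pterm.Warning.Printf("semantic_manifest.json is malformed, skipping metrics: %v\n", err)
      semanticManifest = nil
    }
  }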


213-237: Helpful conversion summary: LGTM

The added counts for relationships, metrics, and enums improve UX and debuggability.


326-336: Manifest initialization extended with Enums/Metrics/DataSources: LGTM

Types align with wren_mdl.go and keep the output self-descriptive.


344-395: Pre-processing for not_null/accepted_values on struct fields: LGTM

Solid coverage of both top-level and embedded field tests with enum de-dup. Nice.
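
For reference, the enum de-dup hinges on a canonical key over the sorted value set, roughly like this sketch (enumNameByKey and the surrounding variables are illustrative; the real struct fields live in wren_mdl.go):

  // Columns with identical accepted values resolve to the same key, so only
  // the first occurrence creates a new EnumDefinition; later ones link to it.
  sorted := append([]string(nil), acceptedValues...)
  sort.Strings(sorted)
  key := strings.Join(sorted, "\x1f") // unlikely separator avoids collisions
  enumName, seen := enumNameByKey[key]
  if !seen {
    enumName = proposedEnumName // e.g. derived from the first column's name
    enumNameByKey[key] = enumName
  }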


907-911: PK mapping via semantic_models alias: verify alias aligns with catalog model name

Semantic model node_relation.alias may differ from the dbt model name derived from the catalog key. If you see sparse PKs in output, consider a fallback using the dbt model name or a map from alias->model name.
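
If sparse PKs do show up, a fallback along these lines would cover it (aliasToModelName and modelsByName are illustrative maps; modelsByName would hold *WrenModel pointers):

  // Prefer the semantic-model alias; fall back to the catalog-derived name.
  modelName, ok := aliasToModelName[nodeRelationAlias]
  if !ok {
    modelName = dbtModelName // name derived from the catalog key
  }
  if model, exists := modelsByName[modelName]; exists {
    model.PrimaryKey = primaryEntityColumn
  }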


172-180: Add cautionary warning for BigQuery credentials

Writing base64-encoded credentials is necessary but sensitive; even with 0600 permissions, it’s worth flagging for end users. After the data source file is written, print a clear warning and double-check the property names against your MDL schema.

• After:

pterm.Success.Printf("✓ WrenDataSource saved to: %s\n", dataSourcePath)

insert:

if ds != nil && ds.GetType() == "bigquery" {
  pterm.Warning.Println("Note: wren-datasource.json contains BigQuery credentials. Treat it as sensitive and restrict access.")
}

• Ensure the JSON keys "project_id", "dataset_id", and "credentials" exactly match what your engine’s MDL/schema expects for BigQuery. The schema isn’t in this repo—please verify against your external specs or documentation.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
wren-launcher/commands/dbt/data_source.go (3)

183-207: Minor: clarify error when method is omitted

When method is empty you treat it like service-account (good), but the error message only mentions service-account. Slight wording tweak improves UX.

-            return nil, fmt.Errorf("bigquery: method 'service-account' requires 'keyfile' path")
+            return nil, fmt.Errorf("bigquery: auth method 'service-account' (or omitted) requires 'keyfile' path")

347-370: Nit: reuse shared type constants for consistency

You already declare integerType, doubleType, varcharType, booleanType, dateType, timestampType at the top. Returning those improves consistency and avoids typos.

 func (ds *WrenBigQueryDataSource) MapType(sourceType string) string {
   switch strings.ToUpper(sourceType) {
   case "INT64", "INTEGER":
-    return "integer"
+    return integerType
   case "FLOAT64", "FLOAT":
-    return "double"
+    return doubleType
   case "STRING":
-    return "varchar"
+    return varcharType
   case "BOOL", "BOOLEAN":
-    return "boolean"
+    return booleanType
   case "DATE":
-    return "date"
+    return dateType
   case "TIMESTAMP", "DATETIME":
-    return "timestamp"
+    return timestampType
   case "NUMERIC", "DECIMAL", "BIGNUMERIC":
-    return "double"
+    return doubleType
   case "BYTES":
-    return "varchar"
+    return varcharType
   case "JSON":
-    return "varchar"
+    return varcharType
   default:
     return strings.ToLower(sourceType)
   }
 }

209-214: Optional: Expand environment variables and “~” in keyfile paths

To make keyfile resolution more robust when dbt profiles specify paths using ~ or environment variables, consider updating lines 209–214 in wren-launcher/commands/dbt/data_source.go as follows:

-            // If keyfile path is not absolute, join it 
-            // with the dbt project's home directory path.
-            resolvedKeyfilePath := keyfilePath
-            if !filepath.IsAbs(keyfilePath) && dbtHomePath != "" {
-                resolvedKeyfilePath = filepath.Join(dbtHomePath, keyfilePath)
-            }
+            // Expand env vars and ~ in keyfile path, then resolve relative to dbtHomePath if provided.
+            expanded := os.ExpandEnv(keyfilePath)
+            if strings.HasPrefix(expanded, "~") {
+                if home, err := os.UserHomeDir(); err == nil {
+                    expanded = filepath.Join(home, strings.TrimPrefix(expanded, "~"))
+                }
+            }
+            resolvedKeyfilePath := expanded
+            if !filepath.IsAbs(resolvedKeyfilePath) && dbtHomePath != "" {
+                resolvedKeyfilePath = filepath.Join(dbtHomePath, resolvedKeyfilePath)
+            }

We verified that in production code (converter.go), profile conversion goes through GetActiveDataSources, which correctly passes dbtHomePath. This change will therefore handle both relative and env-var/tilde-based keyfile paths in the main conversion flow.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between bbb6510 and 443d62a.

📒 Files selected for processing (1)
  • wren-launcher/commands/dbt/data_source.go (4 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-07-09T02:43:20.433Z
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:50-53
Timestamp: 2025-07-09T02:43:20.433Z
Learning: In wren-launcher/commands/dbt/data_source.go, when encountering unsupported database types in convertConnectionToDataSource function, the user prefers to log a warning and continue processing instead of failing. The function should return nil, nil to allow processing to continue.

Applied to files:

  • wren-launcher/commands/dbt/data_source.go
📚 Learning: 2025-08-13T05:17:30.180Z
Learnt from: goldmedal
PR: Canner/WrenAI#1887
File: wren-launcher/commands/dbt/data_source.go:273-276
Timestamp: 2025-08-13T05:17:30.180Z
Learning: In the Wren AI codebase, there's an intentional distinction between dbt connection types and Wren AI data source types: dbt uses "sqlserver" as the connection type, while Wren AI expects "mssql" as the data source type. The conversion layer in convertConnectionToDataSource() correctly matches "sqlserver" from dbt profiles and the WrenMSSQLDataSource.GetType() method correctly returns "mssql" for Wren AI compatibility.

Applied to files:

  • wren-launcher/commands/dbt/data_source.go
📚 Learning: 2025-07-09T02:43:07.560Z
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:154-156
Timestamp: 2025-07-09T02:43:07.560Z
Learning: In the WrenAI codebase, the MapType methods for data sources should return the original type name for unknown types, rather than a default fallback like "string". This maintains consistency across all data source implementations.

Applied to files:

  • wren-launcher/commands/dbt/data_source.go
🧬 Code Graph Analysis (1)
wren-launcher/commands/dbt/data_source.go (1)
wren-launcher/commands/dbt/profiles.go (1)
  • DbtConnection (16-40)
🔇 Additional comments (7)
wren-launcher/commands/dbt/data_source.go (7)

4-7: Imports for base64/json/os are appropriate

All three are used by the BigQuery converter. No issues.


83-85: Correct dispatch to BigQuery converter with dbtHomePath threading

Routing “bigquery” here and passing dbtHomePath enables proper resolution of relative keyfile paths downstream. Looks good.


157-165: Good: validate service account JSON before base64 encoding

The encodeJSON helper prevents encoding malformed credentials and keeps the original bytes for encoding. Solid approach.
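
For readers following along, the validate-then-encode idea amounts to something like this sketch (the real encodeJSON may differ in name and signature):

  func encodeJSON(raw []byte) (string, error) {
    // Reject malformed JSON up front, then base64-encode the original bytes.
    if !json.Valid(raw) {
      return "", fmt.Errorf("invalid service account JSON")
    }
    return base64.StdEncoding.EncodeToString(raw), nil
  }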


166-183: Service-account-json flow is correct

Pulls from Additional["keyfile_json"], validates, and encodes. Clear error when missing. LGTM.


226-232: Graceful handling of unsupported oauth methods

Warning + skip aligns with project preference to continue processing on unsupported configurations.


234-240: BigQuery data source construction aligns with Wren engine schema

Fields and JSON tags (“project_id”, “dataset_id”, “credentials”) match the documented shape.


333-344: Validation covers required fields

Trimming and checking project_id, dataset_id, credentials ensures correctness at emit-time.

@cougrimes cougrimes requested a review from goldmedal August 19, 2025 18:44
test: Adjusted data source testing
fix: Go fmt linting
@goldmedal
Collaborator

hi @cougrimes
There are some lint and format check failures. Could you take a look?
You can run the following command locally to check them.

make check
