Skip to content

dbx deploy: named_parameters dictionary gets an extraneous "named_parameters" entry when renaming file: paths #855

@angus-dawson-idexx

Description

@angus-dawson-idexx

Expected Behavior

A deployment with dbx deploy [...] --write-specs-to-file=spec.json with a named_parameters definition like so:

named_parameters:
  conf-file: "file:fuse://path/to/some-file.yaml"

Should result in a specs file with the following named_parameters object:

"named_parameters": {
  "conf-file": "/dbfs/[...]/artifacts/path/to/some-file.yaml",
}

And only conf-file should be created as a keyword argument parameter in the workflow task on the web GUI.

Current Behavior

Instead, I get this:

"named_parameters": {
  "conf-file": "/dbfs/[...]/artifacts/path/to/some-file.yaml",
  "named_parameters": "/dbfs/[...]/artifacts/path/to/some-file.yaml"
}

And the workflow task on the web GUI ends up with two keyword argument parameters: conf-file and named_parameters.

Steps to Reproduce (for bugs)

Run a dbx deploy with a workflow that has python_wheel_task tasks with named_parameters, at least one of which has a value that starts with file:// or file:fuse://. Then check the keyword argument parameters of the task in the web GUI.

Context

I've traced the problem to this function:

def file_traverse(self, workflows, file_adjuster: FileReferenceAdjuster):
for element, parent, index in self.traverse(workflows):
if isinstance(element, str):
if element.startswith("file://") or element.startswith("file:fuse://"):
file_adjuster.adjust_file_ref(element, parent, index)

And this part in PropertyAdjuster.traverse():

if isinstance(_object, dict):
for key in list(_object.keys()):
item = _object[key]
yield item, _object, key
for _out in self.traverse(item, _object, index_in_parent):
yield _out

After yielding the correct tuple for conf-file:

('file:fuse://path/to/config.yaml', {'conf-file': 'file:fuse://path/to/config.yaml'}, 'conf-file')

It then attempts to traverse item with index_in_parent, which is named_parameters, and since item is a string, traverse jumps here and terminates:

yield _object, parent, index_in_parent

And yields essentially a duplicate tuple except with the wrong index_in_parent:

('file:fuse://path/to/config.yaml', {'conf-file': 'file:fuse://path/to/config.yaml'}, 'named_parameters')

Your Environment

  • dbx version used: 0.8.18
  • Databricks Runtime version: 12.2.x-scala2.12

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions