All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
NOTE: For CLI interfaces, we support the SemVer approach. However, for API components we don't use SemVer as of now. This may lead to instability when using dbx API methods directly.
Please read through Keep a Changelog (~5 min).
Unreleased changes must be tracked above this line. When releasing, copy the changelog to below this line, with the proper version and date, and empty the [Unreleased] section above.
- 📖 documentation on the dependency management
- ✨ failsafe switch for assets-based shared job clusters
- 🎨 404 page in docs is now rendered correctly
- ✏️ Small typos in the docs
- ✏️ Reference structures for `libraries` section
- 🔗 Broken links in the docs
- 📖 documentation on the integration tests
- ♻️ refactored poetry build logic
- 📖 indents in quickstart doc
- 📝 add integration tests to the quickstart structure
- ✨ add pip install extras option
- 🎨 Nice spinners for long-running processes (e.g. cluster start and run tracing)
- 🧪 Add convenient integration tests interface example
- 📖 Small typos in Jinja docs
- 📖 Formatting issues in cluster types doc
- 🐛 bug with context provisioning for `dbx execute`
- ⚡️ `dbx destroy` command
- ☁️ failsafe behaviour for shared clusters when assets-based launch is used
- 📖 Documentation with cluster types guidance
- 📖 Documentation with scheduling and orchestration links
- 📖 Documentation for mixed-mode projects DevOps
- ✨Add `.dbx/sync` folder to template gitignore
- ✨Changed the dependencies from the `mlflow` to a more lightweight `mlflow-skinny` option
- ✨Added suppression for too verbose `click` stacktraces
- ⚡️added `execute_shell_command` fixture, improving tests performance x2
- ⚡️added failsafe check for `get_experiment_by_name` call
- 🎨Switch all the CLI interfaces to `typer`
- ✨Add `workflow-name` argument to `dbx deploy`, `dbx launch` and `dbx execute`
- ✨Add `--workflows` argument to `dbx deploy`
- ✨Add `--assets-only` and `--from-assets` as a clearer replacement for old arguments
- ⚡️Add support for `--environment` parameter for `dbx sync` commands
- ✨Add flexible parameter overriding logic for `dbx execute` via new `--parameters` option
- ✨Add flexible parameter overriding logic for `dbx launch` via new `--parameters` option (RunNow API)
- ✨Add flexible parameter overriding logic for `dbx launch` via new `--parameters` option (RunSubmit API)
- ✨Add inplace Jinja support for YAML and JSON files, can be configured via `dbx configure --enable-inplace-jinja-support`
- ✨Add build logic options for `pip`, `poetry` and `flit`
- ✨Add build logic customization with `build.commands` section
- ✨Add support for custom Python functions in Jinja templates
- ✨Arguments `--allow-delete-unmatched`/`--disallow-delete-unmatched` were replaced with `--unmatched-behaviour` option.
- 🏷️Deprecate `jobs` section and rename it to `workflows`
- 🏷️Deprecate `job` and `jobs` options and rename them to `workflow` argument
- ✨Refactored all cluster-relevant methods into a separate `ClusterController`
- ✨Refactored model-related components for `.dbx/project.json` file
- ✨Refactored `launch`-related API-level code
- ⚡️Deleted `autouse` of `temp_project` fixture to speed up the tests
- 🚩Deprecate `--files-only` and `--as-run-submit` options
- 🚩Delete the Azure Data Factory-related functionality. Unfortunately we're unable to make this integration stable and secure due to a lack of resources and the lack of RunNow API.
- 💎Documentation framework changed from `sphinx` to `mkdocs`
- 💎Documentation has been heavily re-worked and improved
- 🐛`dbx sync` now takes into account `HTTP(S)_PROXY` env variables
- 🐛empty task parameters are now supported
- 🐛ACLs are now properly updated for Jobs API 2.1
- `--jinja-variables-file` for `dbx execute`
- Support `jobs_api_version` values provided by config in `ApiClient` construction
- References and wording in the Python template
- Callback issue in `--jinja-variables-file` for `dbx deploy`
- Added support for `python_wheel_task` in `dbx execute`
- Error in case when `.dbx/project.json` is non-existent
- Error in case when `environment` is not provided in the project file
- Path usage when `--upload-via-context` on win platform
- Additional `sync` command options (`--no-use-gitignore`, `--force-include`, etc.) for more control over what is synced.
- Additional `init` command option `--template` was added to allow using dbx templates distributed as part of python packages.
- Refactored the `--deployment-file` option for better modularity of the code
- Add upload via context for `dbx execute`
- Tasks naming in tests imports for Python template
- Task naming and references in the Python template
- Small typo in Python template
- Rename `workloads` to `tasks` in the Python package template
- Documentation structure has been refactored
- Option (`--include-output`) to include run stderr and stdout output to the console output
- Docs describing how-to for Python packaging
- New option for Jinja-based deployment parameter passing from a YAML file (`--jinja-variables-file`)
- Support for multitask jobs in `dbx execute`
- Local build command now produces only one file in the `dist` folder
- Add `dist` directory cleanup before core package build
- Add `--job-run-log-level` option to `dbx launch` to retrieve log after trace run
- Separate `unit-requirements.txt` file has been deleted from the template
- `RunSubmit` based launch when cloud storage is used as an artifact location
- Module-based interface for launching commands in Azure Pipelines
- All invocations in Azure Pipelines template are now module-based (`python -m ...`)
- Fix auth ordering (now env-variables based auth has priority across any other auth methods)
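
The auth ordering described above can be pictured with a minimal sketch. This is an illustration only, not the dbx implementation: `resolve_auth` and the `profile` mapping are hypothetical names introduced here, and the sketch assumes `DATABRICKS_HOST` and `DATABRICKS_TOKEN` are the environment variables in play.

```python
from typing import Mapping, Optional, Tuple


def resolve_auth(
    env: Mapping[str, str],
    profile: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str, str]:
    """Return (source, host, token), preferring environment variables.

    Hypothetical helper: it only illustrates the priority order.
    """
    host = env.get("DATABRICKS_HOST")
    token = env.get("DATABRICKS_TOKEN")
    if host and token:
        # Env-based auth wins even when a profile is also available.
        return ("env", host, token)
    if profile and profile.get("host") and profile.get("token"):
        # Fall back to profile-based credentials only when env is incomplete.
        return ("profile", profile["host"], profile["token"])
    raise ValueError("No Databricks credentials found in env or profile")
```

In practice one would probably pass `os.environ` as `env`; taking a plain mapping keeps the priority rule easy to test in isolation.
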
- Fix import issues in `dbx.api.storage` package
- Added dev container config for VSCode and GitHub CodeSpaces
- tests are now parallel (x2 less time spent per each CI pipeline launch)
- url-strip behaviour for old-format workspace host names (which was unsupported in the MLflow API and caused a lot of hard-to-explain errors)
- Docs fixed in terms of allowed versions
- Non-strict path adjustment policy has been deleted from code and docs
- Dropped support for environment variables in plain JSON/YAML files
- Refactored code for reading configurations
- Drop support for `ruamel.yaml` in favor of standard `pyyaml`
- All tests are now based on pytest
- Full support for env variables in Jinja-based deployment configs
- Documentation improvements for Jinja-based templates
- Now package builds are performed with `pip` by default
- Parsing of `requirements.txt` has been improved to properly handle comments in requirements files
- Recognition of `--branch-name` argument for `dbx launch`
- Path resolution for Jinja2 templates
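
For readers wondering what comment handling in a requirements file involves, here is a minimal sketch. It is not the parser used by dbx: `parse_requirements` is a hypothetical helper, and the sketch assumes the pip convention that a comment starts at `#` only at the beginning of a line or after whitespace.

```python
import re
from typing import List

# A comment starts at "#" only at line start or after whitespace,
# so URL fragments such as "#egg=name" are left intact.
COMMENT_RE = re.compile(r"(^|\s+)#.*$")


def parse_requirements(text: str) -> List[str]:
    """Return requirement specifiers with blank lines and comments removed."""
    requirements = []
    for raw_line in text.splitlines():
        line = COMMENT_RE.sub("", raw_line).strip()
        if line:
            requirements.append(line)
    return requirements
```

Full-line comments, trailing comments and blank lines are dropped, while a `#` that is part of a URL fragment is kept because no whitespace precedes it.
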
- YAML Example for deploying multi-task Python job
- YAML Example for deploying multi-task Scala job
- Support including jinja templates from subpaths of the current working directory
- Add `--path` and `--checkout` options to the `dbx init`
- Change the format of the `python_basic` to use pytest
- Add `sync repo` and `sync dbfs` commands for syncing local files to Databricks and watching for changes.
- Refactor the configuration code
- Refactor the JSON-related code
- Jinja2-based file recognition behaviour
- Documentation, examples and support for Jobs API 2.1
- Support for Jinja2-based templates inside deployment configuration
- Added new `--job` argument to deploy command for a single-job deploy and convenience
- Issue with empty paths in non-strict path adjustment logic
- Issues with `--no-package` argument for multi-task jobs
- Issues with named properties propagation for Jobs API 2.1
- path resolution on win platforms
- Provided bugfix for non-DBFS based mlflow artifact locations
- CI pipeline on win platform
- Provided bugfix for job/task name references in the deployment configuration
- Recognition of `conf/deployment.yml` file from conf directory as a default parameter
- Remove unnecessary references of `conf/deployment.yml` in CI pipelines
- Upgraded minimal `mlflow` version to 1.23
- Upgraded minimal `databricks-cli` version to 0.16.2
- Upgraded minimal requirements for Azure Data Factory dependent libraries
- Provided bugfix for emoji-based messages in certain shell environments
- Provided bugfix for cases when not all jobs are listed due to usage of Jobs API 2.1
- Provided bugfix for cases when file names are reused multiple times
- Provided bugfix for cases when `policy_name` argument needs to be applied on the tasks level
- Provided bugfix for ADF integration that deleted pipeline-level properties
- Add support for named property of the driver instance pool name
- Add support for built-in templates and project initialization via `dbx init`
- Provided bugfix for named property resolution in multitask-based jobs
- Update the contribution docs with CLA
- Update documentation about environment variables
- Add support for named job properties
- Add support for `spark_jar_task` in Azure Data Factory reflector
- Provide bugfix for strict path resolving in the execute command
- Provide bugfix for Azure Data Factory when using `existing_cluster_id`
- Update `databricks-cli` dependency to 0.16.2
- Improved code coverage
- Added support for environment variables in deployment files
- Fixed minor bug in exception text
- Provide a bugfix for execute issue
- Removed pydash from package dependencies, as it is not used. It is still needed as a dev requirement.
- Added support for multitask jobs.
- Added more explanations around DATABRICKS_HOST exception during API client initialization
- Add strict path adjustment policy and FUSE-based path adjustment
- Fix issue which stripped non-pyspark libraries from a requirements file during deploys.
- Fix issue which didn't update local package during remote execution.
- Support for yaml-based deployment files.
- Now dbx finds the git branch name from any subdirectory in the repository.
- Minor alterations in the documentation.
- Altered the Changelog based on Keep a Changelog
- Changed (for contributors): Makefile now requires pyenv.
- Changed (for contributors): Makefile is more self-describing and self-sufficient. `make clean install` will set you up with all that is needed. `make help` to see all available commands.
- Fix issue with execute parameters passing
- Fix issue with multi-version package upload
- Add explicit exception for artifact location change
- Add experimental support for fixed properties' propagation from cluster policies
- Added Run Submit API support.
- Fixed the issue with pywin32 installation for Azure imports on win platforms.
- Integration with Azure Data Factory.
- Some small internal behaviour fixes.
- Changed the behaviour of `dbx deploy --write-specs-to-file`, to make the structure of specs file compatible with environment structure.
- Added integrated permission management, please refer to documentation for details.
- Added `--write-specs-to-file` option for `dbx deploy` command.
- HotFix for execute command.
- Made internal refactorings after code coverage analysis.
- Fixed issue with job spec adjustment.
- Finalized the CI setup for the project.
- No code changes were done.
- Release is required to start correct version numbering on PyPI.
- Initial public release version.