Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Customizable column names and extra config placeholder #127

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

hrfmartins
Copy link

In this PR I implemented a placeholder for extra configurations for DQEngine. I also included customizable column names to replace the custom names.

Changes

  • New constructor for DQEngine
  • Turned get_invalid and get_valid to static, in order to use the internal naming resolution and avoid passing references around.

Linked issues

Resolves #46

Tests

  • manually tested
  • added unit tests
  • added integration tests

@hrfmartins hrfmartins requested a review from a team as a code owner January 23, 2025 13:46
@hrfmartins hrfmartins requested review from nehamilak-db and removed request for a team January 23, 2025 13:46
@mwojtyczka
Copy link
Contributor

mwojtyczka commented Jan 23, 2025

@hrfmartins thank you for the contribution. Really appreciated. I have a few requests. Can you please sign your commits? We require all commits to be signed with GPG key. Please also run make fmt to format the code and run pylint to make sure the build works before commiting.
See more here: https://databrickslabs.github.io/dqx/docs/dev/contributing/#first-contribution

We currently have an issue with running integration tests triggered from forks. Your PR may be blocked at the moment.

@hrfmartins hrfmartins force-pushed the feature/customizable-column-names branch 7 times, most recently from dbc3dd2 to a335d2a Compare January 23, 2025 15:12
@hrfmartins
Copy link
Author

@mwojtyczka Signing with GPG done and lint + fmt ran and issues fixed :) Sorry for the inconvenience.

Is there anything I can/need to do about the fork issue? Thank you

@mwojtyczka
Copy link
Contributor

mwojtyczka commented Jan 24, 2025

@mwojtyczka Signing with GPG done and lint + fmt ran and issues fixed :) Sorry for the inconvenience.

Is there anything I can/need to do about the fork issue? Thank you

Thank you! We are working on fixing the fork issue. Will keep you posted.

tests/integration/test_apply_checks.py Show resolved Hide resolved
tests/integration/test_apply_checks.py Show resolved Hide resolved
tests/integration/test_apply_checks.py Outdated Show resolved Hide resolved
@hrfmartins hrfmartins force-pushed the feature/customizable-column-names branch from ed300d9 to a09b0fe Compare February 1, 2025 11:00
@hrfmartins hrfmartins requested a review from mwojtyczka February 2, 2025 08:30
Copy link
Contributor

@mwojtyczka mwojtyczka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hrfmartins
Can you please extend the existing guide to show how to customize the reporting columns. Perhaps a new section, sth like "Additional configuration" before the custom checks:
https://github.com/databrickslabs/dqx/blob/main/docs/dqx/docs/guide.mdx#quality-rules-and-creation-of-custom-checks

Can you please also extend demos, probably a new cell here:
https://github.com/databrickslabs/dqx/blob/main/demos/dqx_demo_library.py#L286

tests/integration/test_apply_checks.py Outdated Show resolved Hide resolved
tests/integration/test_apply_checks.py Outdated Show resolved Hide resolved
tests/integration/test_apply_checks.py Outdated Show resolved Hide resolved
@hrfmartins
Copy link
Author

@mwojtyczka Just added documentation and a new entry on the notebook of the demos.
Also added the missing test that was needed 😄

dq_engine = DQEngine(ws, extra_params=ExtraParams(column_names={'errors': 'ERROR', 'warnings': 'WARN'}))
test_df = spark.createDataFrame([[1, 3, 3], [2, None, 4], [None, 4, None], [None, None, None]], SCHEMA)

checks = [{"criticality": "warn", "check": {"function": "col_test_check_func", "arguments": {"col_name": "a"}}}]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this test more focused. Please use regular check not a custom one. Currently col_test_check_func is used which is a custome check.

tests/integration/test_apply_checks.py Outdated Show resolved Hide resolved
tests/integration/test_apply_checks.py Outdated Show resolved Hide resolved


checks = [
{"criticality": "warn", "check": {"function": "col_test_check_func", "arguments": {"col_name": "a"}}},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use one of standard checks, not a custom check.

tests/integration/test_apply_checks.py Outdated Show resolved Hide resolved
docs/dqx/docs/guide.mdx Outdated Show resolved Hide resolved
docs/dqx/docs/guide.mdx Outdated Show resolved Hide resolved
docs/dqx/docs/guide.mdx Outdated Show resolved Hide resolved
docs/dqx/docs/guide.mdx Outdated Show resolved Hide resolved
docs/dqx/docs/guide.mdx Outdated Show resolved Hide resolved
@mwojtyczka
Copy link
Contributor

@mwojtyczka Just added documentation and a new entry on the notebook of the demos. Also added the missing test that was needed 😄

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE]: Make error and warning column names configurable
2 participants