analyzers/checks to generate metrics based on certain dimensions/group by columns #388

Open
obaidcuet opened this issue Oct 9, 2021 · 1 comment

obaidcuet commented Oct 9, 2021

Hello,

Does deequ have a feature to run analyzers or checks that generate metrics per dimension, i.e. grouped by certain columns?

Similar to what we would do in SQL:

select count(id), count(distinct id) 
from mytable
group by country, state;

In deequ, I would expect something like the following:

val analysisResult: AnalyzerContext = { AnalysisRunner
  // data to run the analysis on
  .onData(df)
  // dimensions/columns to group by, so that metrics are generated for each group instead of the full table
  .groupingColumns(Seq("country", "state")) 
  // define analyzers that compute metrics
  .addAnalyzer(Size())
  .addAnalyzer(Completeness("id"))
  .addAnalyzer(Distinctness("id")) 
  .addAnalyzer(Compliance("employee_name : proper size", "length(trim(employee_name)) > 1")) 
  // compute metrics
  .run()
}

Sorry if I missed something. I checked all the documentation but could not find an example.

-Obaid

obaidcuet commented Oct 10, 2021

Hi,

I can see there is a work in progress: #384.

There is also a similar issue, #381. From that discussion, it appears this is not possible with the current API.

There is some code that can be used as a workaround, but it will be hard to manage for a large number of grouped values:

val completenessStateNA = completeness.computeStateFrom(partitionNA)
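
For illustration, a minimal sketch of what a per-group workaround could look like, assuming a DataFrame `df` with `country`, `state` and `id` columns. It does not use the state-based approach above; it simply filters the data for each group and runs the regular `AnalysisRunner` on it. The helper name `metricsPerGroup` is hypothetical and not part of deequ, and the sketch assumes at least one grouping column whose name does not collide with the metric columns that deequ produces.

```scala
import com.amazon.deequ.analyzers.runners.{AnalysisRunner, AnalyzerContext}
import com.amazon.deequ.analyzers.{Completeness, Distinctness, Size}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

// Hypothetical helper (not a deequ API): runs the same analyzers once per group.
// Assumes groupCols is non-empty.
def metricsPerGroup(spark: SparkSession, df: DataFrame, groupCols: Seq[String]): DataFrame = {
  // Collect one row per distinct combination of grouping values to the driver.
  val groups = df.select(groupCols.map(col): _*).distinct().collect()

  val perGroup = groups.map { group =>
    // Null-safe equality (<=>) so that groups with null keys are matched as well.
    val predicate = groupCols.zipWithIndex
      .map { case (c, i) => col(c) <=> lit(group.get(i)) }
      .reduce(_ && _)

    // Run the usual analysis, restricted to the rows of this group.
    val result: AnalyzerContext = AnalysisRunner
      .onData(df.filter(predicate))
      .addAnalyzer(Size())
      .addAnalyzer(Completeness("id"))
      .addAnalyzer(Distinctness("id"))
      .run()

    // Tag every metric row with the group it was computed for,
    // keeping the original column type so the later union lines up.
    val metrics = AnalyzerContext.successMetricsAsDataFrame(spark, result)
    groupCols.zipWithIndex.foldLeft(metrics) { case (acc, (c, i)) =>
      acc.withColumn(c, lit(group.get(i)).cast(df.schema(c).dataType))
    }
  }

  // Stack the per-group metric tables; empty input yields an empty DataFrame.
  perGroup.reduceOption(_ union _).getOrElse(spark.emptyDataFrame)
}

// Usage: val grouped = metricsPerGroup(spark, df, Seq("country", "state"))
```

This launches a separate analysis for every group, so it shares the same drawback: it becomes slow and awkward once the number of grouped values is high.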

Are there any other references?

-Obaid
