Does deequ have a feature to run analyzers or checks that generate metrics per group, i.e. based on certain dimensions/group-by columns?
For example, as we would do in SQL:
select count(id), count(distinct id)
from mytable
group by country, state;
In deequ, I am expecting something like the following:
val analysisResult: AnalyzerContext = { AnalysisRunner
  // data to run the analysis on
  .onData(df)
  // desired (hypothetical) option: dimensions/columns to group by, so that
  // metrics are generated for each group instead of for the full table
  .groupingColumns(Seq("country", "state"))
  // define analyzers that compute metrics
  .addAnalyzer(Size())
  .addAnalyzer(Completeness("id"))
  .addAnalyzer(Distinctness("id"))
  .addAnalyzer(Compliance("employee_name : proper size", "length(trim(employee_name)) > 1"))
  // compute metrics
  .run()
}
Sorry if I missed something. I checked all the documentation but was not able to find an example.
-Obaid
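For reference, one possible workaround is to run the same analyzers once per group on a filtered DataFrame and tag the resulting metrics with the group key. The following is only a minimal sketch, not a built-in deequ feature: `metricsPerGroup` is a hypothetical helper name, and it assumes that `df` is non-empty, that the number of distinct `(country, state)` combinations is small enough to collect to the driver, and that the grouping column names do not clash with the metric columns (`entity`, `instance`, `name`, `value`).

```scala
import com.amazon.deequ.analyzers.runners.{AnalysisRunner, AnalyzerContext}
import com.amazon.deequ.analyzers.runners.AnalyzerContext.successMetricsAsDataFrame
import com.amazon.deequ.analyzers.{Completeness, Compliance, Distinctness, Size}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

// Hypothetical helper (not part of deequ): runs the same analyzers once per group.
def metricsPerGroup(spark: SparkSession, df: DataFrame, groupCols: Seq[String]): DataFrame = {
  // Collect the distinct group keys to the driver (assumes few groups).
  val groups = df.select(groupCols.map(col): _*).distinct().collect()

  val perGroup = groups.map { row =>
    // Build a filter matching exactly this group; <=> is null-safe equality.
    val condition = groupCols.zipWithIndex
      .map { case (c, i) => col(c) <=> lit(row.get(i)) }
      .reduce(_ && _)

    // Same analyzers as in the question, computed on this group's rows only.
    val result: AnalyzerContext = AnalysisRunner
      .onData(df.filter(condition))
      .addAnalyzer(Size())
      .addAnalyzer(Completeness("id"))
      .addAnalyzer(Distinctness("id"))
      .addAnalyzer(Compliance("employee_name : proper size", "length(trim(employee_name)) > 1"))
      .run()

    // Tag every metric row with the group key it belongs to.
    groupCols.zipWithIndex.foldLeft(successMetricsAsDataFrame(spark, result)) {
      case (metrics, (c, i)) => metrics.withColumn(c, lit(row.get(i)))
    }
  }

  // One row per (group, metric); fails on an empty input (see assumptions above).
  perGroup.reduce(_ union _)
}
```

It could be called as `metricsPerGroup(spark, df, Seq("country", "state")).show()`. Note that this launches a separate analysis run per group, so it is likely to be much slower than a single `GROUP BY` query when there are many groups.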