An Expectation Suite in Great Expectations is a collection of expectations about your data. Each expectation is a rule that your data should adhere to. An Expectation Suite allows you to validate your data and make sure it meets your requirements.
Start by creating a `.json` file to store your expectations:
```shell
touch myexpectations.json
```
Then, populate the file with the basic structure as follows:
```json
{
  "data_asset_type": null,
  "expectation_suite_name": "your_expectation_suite_name",
  "expectations": []
}
```
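You may prefer to generate the file rather than create it with `touch` and fill it in by hand. The following Python sketch writes the same skeleton using only the standard library. It makes two assumptions for illustration: you run it from the directory where the suite should live, and the suite name is `myexpectations` so that it matches the filename.

```python
import json
from pathlib import Path

# Skeleton of an Expectation Suite, mirroring the JSON structure above.
# Using "myexpectations" as the suite name (to match the filename) is an assumption.
suite = {
    "data_asset_type": None,  # json.dumps writes None as null
    "expectation_suite_name": "myexpectations",
    "expectations": [],
}

path = Path("myexpectations.json")  # written to the current working directory
# indent=2 keeps the file readable for later hand edits.
path.write_text(json.dumps(suite, indent=2) + "\n", encoding="utf-8")

# Read the file back to confirm it is valid JSON.
loaded = json.loads(path.read_text(encoding="utf-8"))
print(loaded)
```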
Inside the `expectations` list, add one JSON object for each test you plan to run. For example, with two tests the list would look like this:
```json
"expectations": [
  {
    "expectation_type": "PLACEHOLDER1",
    "kwargs": {
      "column": "PLACEHOLDER1"
    },
    "meta": {}
  },
  {
    "expectation_type": "PLACEHOLDER2",
    "kwargs": {
      "column": "PLACEHOLDER2"
    },
    "meta": {}
  }
]
```
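Hand-written JSON lists are easy to break with a stray trailing comma or a string padded with spaces. As a sketch, the list above can instead be built in Python and serialized with `json.dumps`, which always emits valid JSON.

- The helper `make_expectation` and the column names `id` and `email` are hypothetical.
- `expect_column_to_exist` and `expect_column_values_to_not_be_null` are standard Great Expectations expectation types. They stand in for the placeholders here.

```python
import json


def make_expectation(expectation_type: str, column: str, **extra_kwargs) -> dict:
    """Build one expectation object (hypothetical helper, not part of Great Expectations)."""
    return {
        "expectation_type": expectation_type,
        "kwargs": {"column": column, **extra_kwargs},
        "meta": {},
    }


# Two tests -> two objects in the "expectations" list.
# "id" and "email" are placeholder column names.
expectations = [
    make_expectation("expect_column_to_exist", "id"),
    make_expectation("expect_column_values_to_not_be_null", "email"),
]

# json.dumps separates items with commas and never adds a trailing comma.
print(json.dumps({"expectations": expectations}, indent=2))
```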
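Each expectation in your suite is represented as a JSON object containing an `expectation_type` and a `kwargs` object, plus an optional `meta` object. The `kwargs` object holds the specific parameters of the expectation. Note that `data_asset_type` belongs to the suite as a whole, not to individual expectations.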
```json
{
  "expectation_type": "<Expectation Test Name Here>",
  "kwargs": {
    "column": "<Column Name Here>",
    ...
  }
}
```
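A small shape check along these lines can catch structural mistakes before you run a validation. `check_expectation` is a hypothetical helper, not part of Great Expectations. It only checks the structure described in this guide and does not know which `kwargs` each expectation type accepts.

```python
from typing import Any, Dict, List


def check_expectation(obj: Dict[str, Any]) -> List[str]:
    """Return a list of structural problems; an empty list means the shape looks OK."""
    problems = []
    etype = obj.get("expectation_type")
    if not isinstance(etype, str) or not etype.strip():
        problems.append("missing or empty 'expectation_type'")
    elif etype != etype.strip():
        # e.g. "PLACEHOLDER1 " with a trailing space would not match any expectation name
        problems.append("'expectation_type' has leading/trailing whitespace")
    if not isinstance(obj.get("kwargs"), dict):
        problems.append("'kwargs' must be an object")
    if "meta" in obj and not isinstance(obj["meta"], dict):
        problems.append("'meta' must be an object")
    if "data_asset_type" in obj:
        problems.append("'data_asset_type' belongs at the suite level, not in an expectation")
    return problems


good = {"expectation_type": "expect_column_to_exist", "kwargs": {"column": "id"}, "meta": {}}
bad = {"data_asset_type": None, "expectation_type": "expect_column_to_exist ", "kwargs": {"column": "id"}}
print(check_expectation(good))
print(check_expectation(bad))
```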
The `kwargs` object is where you define the specific parameters for each expectation. Different expectations accept different `kwargs`, but some common ones include:

- `column`: The column in your dataset that the expectation is applied to.
- `min_value` and `max_value`: The minimum and maximum allowed values for the column, used in expectations like `expect_column_values_to_be_between`.
- `mostly`: A fraction between 0 and 1 that sets how many of the column's values must pass for the expectation to succeed. For instance, a `mostly` value of 0.95 means at least 95% of the values must pass.
- `strict_min` and `strict_max`: Whether `min_value` and `max_value` are exclusive.
  - When `true`, a value equal to the bound fails.
  - The default (`false`) treats the bounds as inclusive.
- `value_set`: A list of allowed values that the column's values should belong to, used in expectations like `expect_column_values_to_be_in_set`.
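The interplay of `min_value`/`max_value`, `strict_min`/`strict_max` and `mostly` is easiest to see in plain Python. The function below illustrates those semantics; it is not Great Expectations' own implementation. It makes two assumptions of its own:

- Missing values (`None`) are skipped rather than counted as failures.
- An all-missing column is treated as passing.

```python
from typing import Iterable, Optional


def values_between(
    values: Iterable[Optional[float]],
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
    strict_min: bool = False,
    strict_max: bool = False,
    mostly: float = 1.0,
) -> bool:
    """Sketch of expect_column_values_to_be_between semantics (not the GE implementation)."""
    # Assumption of this sketch: missing values are skipped, not counted as failures.
    present = [v for v in values if v is not None]
    if not present:
        return True  # nothing to check; treated as passing here

    def passes(v: float) -> bool:
        # strict_min/strict_max turn the inclusive bounds into exclusive ones.
        if min_value is not None and (v <= min_value if strict_min else v < min_value):
            return False
        if max_value is not None and (v >= max_value if strict_max else v > max_value):
            return False
        return True

    pass_rate = sum(1 for v in present if passes(v)) / len(present)
    # mostly: the minimum fraction of values that must pass (1.0 = all of them).
    return pass_rate >= mostly


ages = [0, 5, 10, 18, 99, 120, None]
print(values_between(ages, min_value=0, max_value=100))              # False
print(values_between(ages, min_value=0, max_value=100, mostly=0.8))  # True
print(values_between(ages, min_value=0, max_value=100, strict_min=True, mostly=0.8))  # False
```

The three calls behave as follows:

- Default `mostly` of 1.0: with 120 out of range, only 5 of the 6 non-missing values pass (about 0.83), so the check fails.
- `mostly=0.8`: the same 5 of 6 values are enough, so the check passes.
- `strict_min=True`: the 0 is also rejected, which drops the pass rate to 4 of 6, so the check fails.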
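An `expect_column_values_to_be_between` expectation that uses the parameters listed above looks like this. Replace each `<...>` placeholder with a real value. Numbers and booleans are written without quotes, and JSON booleans are lowercase (`true`/`false`).

```json
{
  "expectation_type": "expect_column_values_to_be_between",
  "kwargs": {
    "column": "<Your Column Name Here>",
    "min_value": <Your Minimum Value Here>,
    "max_value": <Your Maximum Value Here>,
    "mostly": <Your 'Mostly' Value Here>,
    "strict_min": <true/false>,
    "strict_max": <true/false>
  }
}
```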
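If you generate expectations from Python, `json.dumps` handles the literal conversions for you. `True`/`False` become `true`/`false`, and numbers are written unquoted. The column name `age` and the bounds below are hypothetical values for illustration.

```python
import json

# Hypothetical column and bounds; replace them with your own.
between = {
    "expectation_type": "expect_column_values_to_be_between",
    "kwargs": {
        "column": "age",
        "min_value": 0,
        "max_value": 120,
        "mostly": 0.95,
        "strict_min": False,  # serialized as false
        "strict_max": False,
    },
}

text = json.dumps(between, indent=2)
print(text)
```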
You can also include metadata about your Expectation Suite or individual expectations using the `meta` object. This is useful for adding notes or comments:

```json
"meta": {
  "notes": {
    "format": "markdown",
    "content": ["<Your Notes or Comments Here>"]
  }
}
```
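When many expectations need notes, a small helper can add the `meta` structure shown above. `add_markdown_note` is a hypothetical helper. It creates `meta.notes` with `"format": "markdown"` if that is missing, then appends the note to the `content` list.

```python
import json


def add_markdown_note(expectation: dict, note: str) -> dict:
    """Append a markdown note to an expectation's meta (hypothetical helper)."""
    meta = expectation.setdefault("meta", {})
    notes = meta.setdefault("notes", {"format": "markdown", "content": []})
    notes["content"].append(note)
    return expectation


exp = {"expectation_type": "expect_column_to_exist", "kwargs": {"column": "id"}}
add_markdown_note(exp, "Primary key; required by downstream joins.")
print(json.dumps(exp["meta"], indent=2))
```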
Repeat the process of creating and customizing expectations for each expectation you want to include in your suite.
In the end, your Expectation Suite might look something like this:
```json
{
  "expectation_suite_name": "<Your Expectation Suite Name Here>",
  "expectations": [
    {
      "expectation_type": "expect_column_values_to_be_between",
      "kwargs": {
        "column": "<Your Column Name Here>",
        "min_value": <Your Minimum Value Here>,
        "max_value": <Your Maximum Value Here>,
        "mostly": <Your 'Mostly' Value Here>,
        "strict_min": <true/false>,
        "strict_max": <true/false>
      },
      "meta": {
        "notes": {
          "format": "markdown",
          "content": ["<Your Notes or Comments Here>"]
        }
      }
    },
    ...
  ]
}
```
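The sketch below writes the finished suite to disk. It assumes the `expectations/` directory sits in the current working directory. It also derives the filename from `expectation_suite_name`, a choice of this sketch that keeps the file name and suite name in sync. The helper `save_suite`, the column `age`, and its bounds and note are all hypothetical.

```python
import json
from pathlib import Path


def save_suite(suite: dict, directory: Path = Path("expectations")) -> Path:
    """Write a suite to <directory>/<expectation_suite_name>.json (hypothetical helper)."""
    name = suite["expectation_suite_name"]
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{name}.json"
    path.write_text(json.dumps(suite, indent=2) + "\n", encoding="utf-8")
    return path


suite = {
    "expectation_suite_name": "myexpectations",
    "expectations": [
        {
            "expectation_type": "expect_column_values_to_be_between",
            # Hypothetical column, bounds and note for illustration.
            "kwargs": {"column": "age", "min_value": 0, "max_value": 120, "mostly": 0.95},
            "meta": {"notes": {"format": "markdown", "content": ["Ages outside 0-120 are likely entry errors."]}},
        },
    ],
}

saved = save_suite(suite)
print(saved)
```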
You can add as many expectations to your suite as you need, each with its own `kwargs`.
Lastly, save your JSON document in an `expectations/` directory. To use the suite with Easy GE, put its filename (without `.json`) in the `ExpectationSuiteName` field of your configuration. For example, for `myexpectations.json`:
```yaml
Source:
  Name: your_report_name
  Processor: Pandas
  Properties:
    InMemory:
      DataFrameName: your_declared_dataframe
Backend:
  ExpectationSuiteName: myexpectations
  Filesystem:
    WorkDir: /path/to/your/local/workdir
Report:
  NamingRegex: "%Y%m%d%H%M-report"
  Outputs:
    GenerateDocs: True
```
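Before running Easy GE, you might check that the `ExpectationSuiteName` value points at an existing file. The sketch below is a hypothetical pre-flight check, not part of Easy GE. Note these assumptions:

- Where Easy GE looks for the `expectations/` directory (for example, relative to `WorkDir`) is not stated here, so the directory is a parameter.
- Requiring the file's `expectation_suite_name` to match the filename is an extra consistency rule of this sketch, not a documented Easy GE requirement.
- The suite name is passed as a plain string. Parsing the YAML itself would need a third-party library such as PyYAML.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Union


def check_suite_reference(suite_name: str, expectations_dir: Union[str, Path]) -> List[str]:
    """Return problems with an ExpectationSuiteName value (hypothetical pre-flight check)."""
    problems = []
    if suite_name.endswith(".json"):
        problems.append("drop the '.json' extension from ExpectationSuiteName")
        suite_name = suite_name[: -len(".json")]
    path = Path(expectations_dir) / f"{suite_name}.json"
    if not path.is_file():
        problems.append(f"no suite file at {path}")
        return problems
    # Extra consistency rule of this sketch: the stored name should match the filename.
    stored = json.loads(path.read_text(encoding="utf-8")).get("expectation_suite_name")
    if stored != suite_name:
        problems.append(f"file declares expectation_suite_name {stored!r}, expected {suite_name!r}")
    return problems


# Demo in a throwaway directory so nothing else on disk is touched.
with tempfile.TemporaryDirectory() as tmp:
    exp_dir = Path(tmp) / "expectations"
    exp_dir.mkdir()
    (exp_dir / "myexpectations.json").write_text(
        json.dumps({"expectation_suite_name": "myexpectations", "expectations": []}),
        encoding="utf-8",
    )
    ok = check_suite_reference("myexpectations", exp_dir)
    with_ext = check_suite_reference("myexpectations.json", exp_dir)
    missing = check_suite_reference("other_suite", exp_dir)

print(ok)
print(with_ext)
print(missing)
```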
And that's it! You've created an Expectation Suite in Great Expectations. Remember, the expectations and parameters you actually use will depend heavily on your specific data and use case. Visit the Expectations Gallery for a comprehensive list of available Expectations and the parameters each one accepts.