A new way to define parametrizations #132
Introduction

Parametrizations are a powerful tool to avoid code duplication and scale tasks up.

Problem

Currently, parametrizations have some flaws.
Answers

Candidate solutions are posted below as answers, which can then be discussed in the threads beneath them. Your answer does not need to address all problems; it can also address only some of them. You can also start a thread to discuss your experience with parametrizations: what you find difficult, what works well, and which use cases are not easily supported. Your post does not have to include a resolution. Descriptions and questions are great!
This idea evolved in a discussion with 0az. The approach allows parametrizing tasks with functions while providing an "intuitive" interface. The main idea is to define a dictionary at the module level whose name starts with

For example, assume you have multiple data sets and different functions to plot the data. Here are the functions.

```python
def plot_histogram_of_all_variables(depends_on, produces, plot_kwargs, output_format):
    ...


def plot_kde_of_all_variables(depends_on, produces, plot_kwargs, output_format):
    ...
```

Next, we define the parametrization. Each key is a task name, and each value is a dictionary holding the task function and its arguments.

```python
task_dictionary = {
    f"task_{data_name}_{plot_name}": {
        "function": function,
        "depends_on": path_to_data(data_name),
        "produces": path_to_figure(data_name, plot_name),
    }
    for data_name in DATA
    for plot_name, function in [
        ("hist", plot_histogram_of_all_variables),
        ("kde", plot_kde_of_all_variables),
    ]
}
```

Additional keys in a task dictionary are treated as keyword arguments to that specific task function; e.g., we set the number of bins to 20 for the histogram plots. Global keyword arguments, set on the top-level dictionary, are assumed to apply to all tasks. For example, every figure should be saved as a PNG.

```python
task_dictionary = {
    f"task_{data_name}_{plot_name}": {
        "function": function,
        "depends_on": path_to_data(data_name),
        "produces": path_to_figure(data_name, plot_name),
        # Task-specific keyword argument: only histograms get a bin count.
        "plot_kwargs": {"bins": 20} if plot_name == "hist" else {},
    }
    for data_name in DATA
    for plot_name, function in [
        ("hist", plot_histogram_of_all_variables),
        ("kde", plot_kde_of_all_variables),
    ]
}
# Global keyword argument passed to every task.
task_dictionary["output_format"] = "png"
```

Markers

Markers can be added as usual by applying the decorators to the task functions. Another way is to use the special `"marks"` key, either inside a single task's dictionary or on the top-level dictionary to mark all tasks.

```python
import pytask

task_dictionary = {
    f"task_{data_name}_{plot_name}": {
        "function": function,
        # ... the same keys as in the previous example ...
        # KDEs cannot be computed on Windows. Who would have thought that!
        "marks": pytask.mark.skipif(ON_WINDOWS) if plot_name == "kde" else [],
    }
    for data_name in DATA
    for plot_name, function in [
        ("hist", plot_histogram_of_all_variables),
        ("kde", plot_kde_of_all_variables),
    ]
}
# All tasks should persist.
task_dictionary["marks"] = pytask.mark.persist
```
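The proposal leaves open how such a dictionary would be turned into individual tasks. The following is a minimal sketch of one possible expansion step in plain Python; it is not pytask code. It assumes that an entry is a task if its value is a dictionary with a `"function"` key, that every other top-level entry is a global keyword argument, that a top-level `"marks"` entry holds marks for all tasks, and that task-specific arguments override global ones. `ExpandedTask`, `expand_task_dictionary`, and the string stand-ins for marks and paths are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ExpandedTask:
    """A single concrete task derived from a task dictionary (hypothetical)."""

    name: str
    function: Callable[..., Any]
    kwargs: dict[str, Any]
    marks: list[Any] = field(default_factory=list)


def _as_list(marks: Any) -> list[Any]:
    """Normalize a single mark, a list of marks, or None into a list."""
    if marks is None:
        return []
    if isinstance(marks, (list, tuple)):
        return list(marks)
    return [marks]


def expand_task_dictionary(task_dictionary: dict[str, Any]) -> list[ExpandedTask]:
    """Turn the proposed module-level dictionary into a list of tasks."""
    # Assumption: a task entry is a dict that contains a "function" key.
    specs = {
        name: value
        for name, value in task_dictionary.items()
        if isinstance(value, dict) and "function" in value
    }
    # All remaining top-level entries are global settings for every task.
    global_kwargs = {k: v for k, v in task_dictionary.items() if k not in specs}
    global_marks = _as_list(global_kwargs.pop("marks", None))

    tasks = []
    for name, spec in specs.items():
        spec = dict(spec)  # Copy so the user's dictionary is left untouched.
        function = spec.pop("function")
        task_marks = _as_list(spec.pop("marks", None))
        # Assumption: task-specific keyword arguments override global ones.
        kwargs = {**global_kwargs, **spec}
        tasks.append(ExpandedTask(name, function, kwargs, task_marks + global_marks))
    return tasks


# Usage with stand-ins: strings replace pytask marks and file paths.
def plot_histogram_of_all_variables(depends_on, produces, plot_kwargs, output_format):
    return f"hist {depends_on} -> {produces} ({output_format}, {plot_kwargs})"


def plot_kde_of_all_variables(depends_on, produces, plot_kwargs, output_format):
    return f"kde {depends_on} -> {produces} ({output_format}, {plot_kwargs})"


DATA = ["a", "b"]
task_dictionary = {
    f"task_{data_name}_{plot_name}": {
        "function": function,
        "depends_on": f"{data_name}.csv",
        "produces": f"{data_name}_{plot_name}",
        "plot_kwargs": {"bins": 20} if plot_name == "hist" else {},
        "marks": "skipif" if plot_name == "kde" else [],
    }
    for data_name in DATA
    for plot_name, function in [
        ("hist", plot_histogram_of_all_variables),
        ("kde", plot_kde_of_all_variables),
    ]
}
task_dictionary["output_format"] = "png"
task_dictionary["marks"] = "persist"

for task in expand_task_dictionary(task_dictionary):
    print(task.name, task.marks, task.function(**task.kwargs))
```

The sketch recognizes task entries by the shape of their values rather than by their keys, so it does not depend on any naming prefix; a real implementation might dispatch on the key names instead.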
There is now a loop-based approach to parametrizations which basically solves all issues: https://pytask-dev.readthedocs.io/en/stable/tutorials/repeating_tasks_with_different_inputs.html
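The linked tutorial documents pytask's actual loop-based interface. As a rough, pytask-free illustration of the general pattern, the sketch below defines one task function per loop iteration and binds the loop values as default arguments, so each function keeps its own inputs. `register_task` and `REGISTRY` are hypothetical stand-ins for a task collector, not pytask's API.

```python
import inspect
from typing import Any, Callable

# Hypothetical stand-in for a task collector: (task name, function, bound inputs).
REGISTRY: list[tuple[str, Callable[..., Any], dict[str, Any]]] = []


def register_task(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Toy decorator that records a function under a unique task name."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        # Read the default arguments, which carry this iteration's inputs.
        inputs = {
            param_name: param.default
            for param_name, param in inspect.signature(func).parameters.items()
            if param.default is not inspect.Parameter.empty
        }
        REGISTRY.append((name, func, inputs))
        return func

    return decorator


DATA = ["a", "b"]

for data_name in DATA:
    for plot_name in ["hist", "kde"]:
        # Default arguments are evaluated now, inside this iteration. Reading
        # data_name in the body instead would give every function the values
        # from the last iteration once the loop has finished.
        @register_task(name=f"task_{data_name}_{plot_name}")
        def task_plot(
            depends_on=f"{data_name}.csv",
            produces=f"{data_name}_{plot_name}.png",
            kind=plot_name,
        ):
            return f"{kind}: {depends_on} -> {produces}"

for name, func, inputs in REGISTRY:
    print(name, func(**inputs))
```

Every iteration reuses the function name `task_plot`, so the unique task name has to come from somewhere else; here it is passed to the hypothetical decorator.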