Pipelines
Pipelines provide a versatile API for automating tasks efficiently. Below are some key features and best practices:
1. Reproducibility
Initialization and Configuration: Pipelines are initialized using the `__init__` method, allowing configuration of common elements. All parameters passed to the class constructor are stored in the `self.hparams` dictionary, facilitating reproducibility and serialization. Additionally, the `ignore` parameter of the `__init__` method excludes specific parameters, which avoids storing non-essential or large parameters.
ID and Working Directory: Each pipeline instance is assigned a unique identifier (`id`) upon initialization, aiding in tracking and identification. Pipelines also have a designated working directory for organizing generated files; it does not alter Python's working directory.
Public Interface: Pipelines offer the `run` method as the public interface for execution. The `run` method encapsulates the pipeline's logic and returns the output. Note that `run` is the only method that users should call directly. To implement your own pipeline, override the `_run` method, which is called from `run`.
Besides returning a result, the `run` method can also set public attributes of the pipeline instance. These attributes are implemented as read-only properties, ensuring a consistent state during execution. For instance, `seed` is a public attribute that is set during the pipeline run.
2. Composition
For instance, consider the following minimal example.
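A minimal sketch of such a composition is given here. It assumes a simplified stand-in `Pipeline` base class, which is hypothetical and not the library's actual class: it stores constructor parameters in `hparams` by hand, assigns an `id`, and has `run` delegate to `_run`. Summing the distances between consecutive points is also an assumption about what "between multiple points" means.

```python
import uuid
from typing import Any, Sequence


class Pipeline:
    """Hypothetical stand-in for the library's base class (not the real API)."""

    def __init__(self, **hparams: Any) -> None:
        # Constructor parameters are kept for reproducibility.
        # (Passed explicitly here; the real class is said to collect them itself.)
        self.hparams = dict(hparams)
        self.id = str(uuid.uuid4())  # unique identifier per instance

    def run(self, *args: Any, **kwargs: Any) -> Any:
        # Public entry point: delegates to the subclass's _run.
        return self._run(*args, **kwargs)

    def _run(self, *args: Any, **kwargs: Any) -> Any:
        raise NotImplementedError


class Distance(Pipeline):
    """Distance between two points under a given p-norm."""

    def __init__(self, norm: float = 2.0) -> None:
        super().__init__(norm=norm)
        self.norm = norm

    def _run(self, x: Sequence[float], y: Sequence[float]) -> float:
        powered = sum(abs(a - b) ** self.norm for a, b in zip(x, y))
        return powered ** (1.0 / self.norm)


class SumOfDistances(Pipeline):
    """Sums distances between consecutive points, then adds a constant."""

    def __init__(self, constant: float, distance_pipeline: Distance) -> None:
        super().__init__(constant=constant, distance_pipeline=distance_pipeline)
        self.constant = constant
        self.distance_pipeline = distance_pipeline  # composed pipeline

    def _run(self, points: Sequence[Sequence[float]]) -> float:
        total = sum(
            self.distance_pipeline.run(p, q) for p, q in zip(points, points[1:])
        )
        return total + self.constant


distance = Distance(norm=1)
total = SumOfDistances(constant=10, distance_pipeline=distance).run(
    [(0, 0), (3, 4), (3, 0)]
)
print(total)  # 7 + 4 under the 1-norm, plus the constant 10 -> 21.0
```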
In this example, we have two pipelines: `Distance` and `SumOfDistances`. The `Distance` pipeline calculates the distance between two points based on a specified norm. The `SumOfDistances` pipeline calculates the sum of distances between multiple points and adds a constant value. The `SumOfDistances` pipeline uses the `Distance` pipeline as a component, demonstrating pipeline composition.
3. Integration with CLI
Seamless CLI Integration: Pipelines integrate with `jsonargparse`, enabling the creation of command-line interfaces (CLI) for easy configuration and execution. Configuration can be provided via YAML files or directly through CLI arguments. For instance, `CLI` from `jsonargparse` can run a pipeline with the arguments given directly. Alternatively, a YAML file can hold some of the parameters, with the rest passed as arguments, or it can hold all of them; the pipeline is then run with that YAML file. The same invocations work from the shell, passing either the arguments or the YAML file.
4. Logging and Monitoring
Execution Log: Pipelines maintain a log of their executions, providing a comprehensive record of activities. The `status` property offers insights into the pipeline's state, from creation to completion, facilitating monitoring and troubleshooting.
5. Clonability
Cloning Pipelines: Pipelines are cloneable, enabling the creation of independent instances from existing ones. The `clone` method initializes a deep copy, providing a clean slate for each clone. Note that some attributes, such as `id`, are unique to each pipeline instance and are updated during cloning to maintain uniqueness.
6. Parallel and Distributed Environments
Parallel Execution: Pipelines support parallel execution, enabling faster processing of tasks and efficient resource utilization.
Distributed Execution: Pipelines can be executed in a distributed manner, suitable for deployment on clusters to leverage distributed computing resources effectively. This scalability enhances performance in large-scale processing environments.
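How parallel execution is configured is not shown here. As one generic illustration, and not the library's own mechanism, independent pipeline objects can be run concurrently with Python's standard `concurrent.futures`. `SquareSum` is a hypothetical stand-in class with only a `run` method, and `copy.deepcopy` merely stands in for the cloning described earlier.

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from typing import Optional, Sequence


class SquareSum:
    """Hypothetical stand-in pipeline: sums the squares of its inputs."""

    def __init__(self) -> None:
        self.result: Optional[float] = None  # set by run()

    def run(self, values: Sequence[float]) -> float:
        self.result = sum(v * v for v in values)
        return self.result


template = SquareSum()
batches = [[1, 2], [3, 4], [5, 6]]

# One independent copy per batch, so no state is shared between workers.
workers = [copy.deepcopy(template) for _ in batches]

with ThreadPoolExecutor(max_workers=3) as pool:
    # map() yields results in the same order as the inputs.
    results = list(
        pool.map(lambda pair: pair[0].run(pair[1]), zip(workers, batches))
    )

print(results)  # [5, 25, 61]
```

Giving each worker its own copy keeps the template untouched, which is the property that makes cloning useful before fanning work out.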