Commit
docs: Updated references
vijayvammi committed Feb 14, 2024
1 parent bba76ad commit 557393e
Showing 6 changed files with 42 additions and 34 deletions.
9 changes: 0 additions & 9 deletions docs/extensions.md
@@ -201,12 +201,3 @@ Example:
show_symbol_type_heading: true
members: None
heading_level: 3
-
-
-## Roadmap
-
-- AWS environments using Sagemaker pipelines or AWS step functions.
-- HPC environment using SLURM executor.
-- Database based Run log store.
-- Better integrations with experiment tracking tools.
-- Azure ML environments.
25 changes: 25 additions & 0 deletions docs/roadmap.md
@@ -0,0 +1,25 @@
+## AWS environments
+
+Bring in native AWS services to orchestrate workflows. The stack should be:
+
+- AWS Step Functions.
+- Sagemaker jobs: they accept a dynamic image name, whereas AWS Batch needs a job definition and can be tricky.
+- S3 for Run log and Catalog: already a tested and working prototype.
+- AWS Secrets Manager: access via the RBAC of the execution role.
+
+
+## HPC environment using a SLURM executor
+
+- Without native orchestration tools, the preferred way is to run with the local executor but use SLURM to schedule the jobs.
+
+## Database-based Run log store
+
+## Better integrations with experiment tracking tools
+
+Currently, the implementation of experiment tracking tools within magnus is limited. It might be better to
+choose a good open source implementation and stick with it.
+
+
+## Model registry service
+
+Could be interesting to bring in a model registry to catalog models.
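The SLURM item in the roadmap above could look something like the following submission script. This is a sketch only: the job name, resource requests, file names, and the exact `magnus` CLI invocation are all assumptions for illustration, not a documented interface.

```shell
#!/bin/bash
#SBATCH --job-name=magnus-pipeline    # hypothetical job name
#SBATCH --cpus-per-task=4             # illustrative resource requests
#SBATCH --mem=8G
#SBATCH --time=01:00:00

# SLURM only provides the allocation; magnus would run with its local
# executor inside the job. The flags below are assumed for illustration.
magnus execute --file pipeline.yaml --config local-config.yaml
```

Submitting with `sbatch run_pipeline.sh` would then let SLURM handle scheduling while the pipeline itself runs as a local execution.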
13 changes: 7 additions & 6 deletions docs/why-magnus.md
@@ -11,8 +11,8 @@ straightforward implementation for task orchestration. Nonetheless, due to their
design, orchestrating the flow of data—whether parameters or artifacts—can introduce complexity and
require careful handling.

-Magnus simplifies this aspect by introducing an intuitive mechanism for data flow, thereby
-streamlining data management. This approach allows the orchestrators to focus on their core
+Magnus simplifies this aspect by introducing an [intuitive mechanism for data flow](/example/dataflow),
+thereby streamlining data management. This approach allows the orchestrators to focus on their core
competency: allocating the necessary computational resources for task execution.

### Local first
@@ -22,8 +22,8 @@ In the context of the project's proof-of-concept (PoC) phase, the utilization of
experimentation. Data scientists require an environment that aligns with their established workflows,
which is most effectively achieved through the use of local development tools.

-Magnus serves as an intermediary stage, simulating the production environment by offering local
-versions of essential services—such as execution engines, data catalogs, secret management, and
+Magnus serves as an intermediary stage, simulating the production environment by offering [local
+versions](/configurations/overview/) of essential services—such as execution engines, data catalogs, secret management, and
experiment tracking—without necessitating intricate configuration. As the project transitions into the
production phase, these local stand-ins are replaced with their robust, production-grade counterparts.

@@ -40,8 +40,9 @@ experimentation, thus impeding iterative research and development.


Magnus is engineered to minimize the need for such extensive refactoring when operationalizing
-projects. It achieves this by allowing tasks to be defined as simple Python functions or Jupyter
-notebooks. This means that the research-centric components of the code can remain unchanged, avoiding
+projects. It achieves this by allowing tasks to be defined as [simple Python functions](/concepts/task/#python_functions)
+or [Jupyter notebooks](/concepts/task/#notebook). This means that the research-centric components of the code
+can remain unchanged, avoiding
the need for immediate refactoring and allowing for the postponement of these efforts until they
become necessary for the long-term maintenance of the product.

2 changes: 1 addition & 1 deletion magnus/interaction.py
@@ -121,7 +121,7 @@ def get_parameter(key: Optional[str] = None, cast_as: Optional[CastT] = None) ->
Args:
key (str, optional): The key of the parameter to retrieve. If not provided, all parameters will be returned.
cast_as (Type, optional): The type to cast the parameter to. If not provided, the type will remain as it is
-                for simple data types (int, float, bool, str). For nested parameters, it would be a dict.
+            for simple data types (int, float, bool, str). For nested parameters, it would be a dict.
Raises:
Exception: If the parameter does not exist and key is not provided.
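The `cast_as` behavior documented for `get_parameter` can be illustrated with a self-contained toy. The parameter store and values below are invented for illustration; this is not the magnus implementation:

```python
from typing import Any, Optional, Type

# Hypothetical stand-in for a run's parameter store.
_parameters = {"learning_rate": "0.01", "epochs": 10, "grid": {"depth": [2, 4]}}

def get_parameter(key: Optional[str] = None, cast_as: Optional[Type] = None) -> Any:
    """Toy version: return one parameter (optionally cast) or all of them."""
    if key is None:
        return dict(_parameters)
    if key not in _parameters:
        raise Exception(f"No parameter named {key!r}")
    value = _parameters[key]
    # Simple data types can be cast on the way out; nested values stay dicts.
    return cast_as(value) if cast_as is not None else value

print(get_parameter("learning_rate", cast_as=float))  # 0.01
print(get_parameter("epochs"))                        # 10
```

Without `cast_as`, the value is returned as stored; nested parameters come back as plain dicts, matching the docstring above.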
25 changes: 8 additions & 17 deletions magnus/sdk.py
@@ -28,7 +28,7 @@ class Catalog(BaseModel):
Use to instruct a task to sync data from/to the central catalog.
Please refer to [concepts](../../concepts/catalog) for more information.
-    Args:
+    Attributes:
get (List[str]): List of glob patterns to get from central catalog to the compute data folder.
put (List[str]): List of glob patterns to put into central catalog from the compute data folder.
@@ -112,21 +112,17 @@ class Task(BaseTraversal):
An execution node of the pipeline.
Please refer to [concepts](../../concepts/task) for more information.
-    Args:
+    Attributes:
name (str): The name of the node.
command (str): The command to execute.
- For python functions, [dotted path](../../concepts/task/#python_functions) to the function.
- For shell commands: command to execute in the shell.
- For notebooks: path to the notebook.
command_type (str): The type of command to execute.
Can be one of "shell", "python", or "notebook".
catalog (Optional[Catalog]): The catalog to sync data from/to.
Please see Catalog about the structure of the catalog.
overrides (Dict[str, Any]): Any overrides to the command.
Individual tasks can override the global configuration by referring to the
specific override.
@@ -141,23 +137,18 @@ class Task(BaseTraversal):
overrides:
custom_docker_image:
docker_image: "magnus/magnus:custom"
```
### Task specific configuration
```python
task = Task(name="task", command="echo 'hello'", command_type="shell",
overrides={'local-container': custom_docker_image})
```
notebook_output_path (Optional[str]): The path to save the notebook output.
Only used when command_type is 'notebook', defaults to command+_out.ipynb
optional_ploomber_args (Optional[Dict[str, Any]]): Any optional ploomber args.
Only used when command_type is 'notebook', defaults to {}
output_cell_tag (Optional[str]): The tag of the output cell.
Only used when command_type is 'notebook', defaults to "magnus_output"
terminate_with_failure (bool): Whether to terminate the pipeline with a failure after this node.
terminate_with_success (bool): Whether to terminate the pipeline with a success after this node.
on_failure (str): The name of the node to execute if the step fails.
@@ -208,7 +199,7 @@ class Stub(BaseTraversal):
A stub node can take an arbitrary number of arguments.
Please refer to [concepts](../../concepts/stub) for more information.
-    Args:
+    Attributes:
name (str): The name of the node.
terminate_with_failure (bool): Whether to terminate the pipeline with a failure after this node.
terminate_with_success (bool): Whether to terminate the pipeline with a success after this node.
@@ -231,7 +222,7 @@ class Parallel(BaseTraversal):
A node that executes multiple branches in parallel.
Please refer to [concepts](../../concepts/parallel) for more information.
-    Args:
+    Attributes:
name (str): The name of the node.
branches (Dict[str, Pipeline]): A dictionary of branches to execute in parallel.
terminate_with_failure (bool): Whether to terminate the pipeline with a failure after this node.
@@ -260,7 +251,7 @@ class Map(BaseTraversal):
A node that iterates over a list of items and executes a pipeline for each item.
Please refer to [concepts](../../concepts/map) for more information.
-    Args:
+    Attributes:
branch: The pipeline to execute for each item.
iterate_on: The name of the parameter to iterate over.
@@ -308,7 +299,7 @@ class Success(BaseModel):
Most often, there is no need to use this node as nodes can be instructed to
terminate_with_success and pipeline with add_terminal_nodes=True.
-    Args:
+    Attributes:
name (str): The name of the node.
"""

@@ -330,7 +321,7 @@ class Fail(BaseModel):
Most often, there is no need to use this node as nodes can be instructed to
terminate_with_failure and pipeline with add_terminal_nodes=True.
-    Args:
+    Attributes:
name (str): The name of the node.
"""

@@ -349,7 +340,7 @@ class Pipeline(BaseModel):
"""
A Pipeline is a directed acyclic graph of Steps that define a workflow.
-    Args:
+    Attributes:
steps (List[Stub | Task | Parallel | Map | Success | Fail]): A list of Steps that make up the Pipeline.
start_at (Stub | Task | Parallel | Map): The name of the first Step in the Pipeline.
name (str, optional): The name of the Pipeline. Defaults to "".
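The SDK attributes documented above can be approximated with stand-in dataclasses to show how a pipeline is composed. These classes are illustrative substitutes mirroring the documented attribute names, not the magnus SDK itself, and the step names and commands are invented:

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Stand-ins mirroring the documented Attributes sections; not the real SDK.
@dataclass
class Task:
    name: str
    command: str
    command_type: str = "python"   # one of "shell", "python", "notebook"
    terminate_with_success: bool = False
    on_failure: str = ""
    overrides: Dict[str, str] = field(default_factory=dict)

@dataclass
class Pipeline:
    steps: List[Task]
    start_at: str                  # name of the first step, per the docstring
    name: str = ""

    def step_names(self) -> List[str]:
        return [step.name for step in self.steps]

# Compose a two-step pipeline: a python-function task, then a notebook task.
fetch = Task(name="fetch", command="my_module.fetch_data")
train = Task(name="train", command="train.ipynb", command_type="notebook",
             terminate_with_success=True)
pipeline = Pipeline(steps=[fetch, train], start_at="fetch", name="demo")
print(pipeline.step_names())  # ['fetch', 'train']
```

The real SDK additionally wires `on_failure` and terminal nodes into a traversable graph; the sketch only shows how the documented attributes fit together.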
2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -1,6 +1,6 @@
site_name: Magnus
site_description: "Pipelines made easy"
-strict: false
+strict: true
repo_url: https://github.com/AstraZeneca/magnus-core

# TODO: Set up versioning
