diff --git a/RELEASE.md b/RELEASE.md
index 0e132d4275..79828bb9e2 100644
--- a/RELEASE.md
+++ b/RELEASE.md
@@ -14,6 +14,7 @@
 * Safeguard hooks when user incorrectly registers a hook class in settings.py.
 * Fixed parsing paths with query and fragment.
 * Remove lowercase transformation in regex validation.
+* Updated `Partitioned dataset lazy saving` docs page.
 
 ## Breaking changes to the API
 ## Documentation changes
diff --git a/docs/source/data/partitioned_and_incremental_datasets.md b/docs/source/data/partitioned_and_incremental_datasets.md
index 3ac91f83dc..f5acf16a05 100644
--- a/docs/source/data/partitioned_and_incremental_datasets.md
+++ b/docs/source/data/partitioned_and_incremental_datasets.md
@@ -175,6 +175,7 @@ new_partitioned_dataset:
   path: s3://my-bucket-name
   dataset: pandas.CSVDataset
   filename_suffix: ".csv"
+  save_lazily: True
 ```
 
 Here is the node definition:
@@ -238,6 +239,24 @@ def create_partitions() -> Dict[str, Callable[[], Any]]:
 When using lazy saving, the dataset will be written _after_ the `after_node_run` [hook](../hooks/introduction).
 ```
 
+```{note}
+Lazy saving is the default behaviour, meaning that if a `Callable` type is provided, the dataset will be written _after_ the `after_node_run` hook is executed.
+```
+
+In certain cases, it might be useful to disable lazy saving, such as when your object is already a `Callable` (e.g., a TensorFlow model) and you do not intend to save it lazily.
+To disable the lazy saving set `save_lazily` parameter to `False`:
+
+```yaml
+# conf/base/catalog.yml
+
+new_partitioned_dataset:
+  type: partitions.PartitionedDataset
+  path: s3://my-bucket-name
+  dataset: pandas.CSVDataset
+  filename_suffix: ".csv"
+  save_lazily: False
+```
+
 ## Incremental datasets
 
 {class}`IncrementalDataset` is a subclass of `PartitionedDataset`, which stores the information about the last processed partition in the so-called `checkpoint`.
`IncrementalDataset` addresses the use case when partitions have to be processed incrementally, that is, each subsequent pipeline run should process just the partitions which were not processed by the previous runs.
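
The behaviour that the added `save_lazily` documentation describes can be illustrated with a small toy model. This is a sketch under stated assumptions, not Kedro's implementation: `resolve_partitions` and `model` are hypothetical names introduced here, and the only thing taken from the diff is the rule that a `Callable` partition is invoked at save time unless `save_lazily` is `False`.

```python
from typing import Any, Dict


def resolve_partitions(partitions: Dict[str, Any], save_lazily: bool = True) -> Dict[str, Any]:
    """Toy model (hypothetical, not Kedro code) of what would be written per partition."""
    resolved = {}
    for partition_id, data in partitions.items():
        # With lazy saving enabled, a callable is treated as a deferred loader:
        # it is called at save time and its return value is what gets written.
        if save_lazily and callable(data):
            data = data()
        resolved[partition_id] = data
    return resolved


def model(x: int) -> int:
    """Stand-in for an object that is callable by nature, such as a trained model."""
    return x * 2


# Default: the lambda is invoked and its result is what would be saved.
lazy = resolve_partitions({"part_a": lambda: [1, 2, 3]})

# save_lazily=False: the callable itself is kept as the object to save.
eager = resolve_partitions({"model": model}, save_lazily=False)
```

In this sketch, leaving `save_lazily` at its default for the `model` partition would call `model()` with no arguments and fail, which is the kind of situation the new docs paragraph has in mind when it suggests disabling lazy saving.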