Revise documentation in the "Kedro for notebook users" section #2845
Comments
Let's do this together; I've done this process a dozen times or more already. It's far from perfect, but there are already several issues tracking how to make it easier.
So far I haven't seen many direct requests, but by casually walking around the office I see lots of data scientists using Kedro in Jupyter. And we should pay attention to Databricks as well.
Do you see anything we can build to simplify the process? @astrojuanlu
Same as my experience. I would add that many are not using it in the most efficient way: "I don't know which nodes I need to re-run, so I just re-run the whole pipeline." Although kedro run supports many different options, they are not well used; I wonder if we can show some examples in the notebook section.
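For instance, a hedged sketch of the kind of partial re-run we could demonstrate from a notebook, using the injected `session` object; the node names here are invented for illustration, and the equivalent `kedro run` CLI flags vary slightly between Kedro versions:

```python
# Sketch: re-run only part of a pipeline from a notebook launched with
# `kedro jupyter notebook`, instead of re-running everything.
session.run(
    pipeline_name="__default__",
    from_nodes=["preprocess_companies_node"],        # illustrative: start from this node
    to_nodes=["create_model_input_table_node"],      # illustrative: stop after this node
)
# Note: a KedroSession can only be run once; use the %reload_kedro line magic
# to get a fresh session before another run.
```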
I may reorder session -> catalog -> pipeline -> context, since
Fab, thanks for your help on this @noklam and @astrojuanlu. I think I can start on Pages 1 & 3 but will leave Page 2 until your return @astrojuanlu (so I have made a separate ticket for that work: #2855).
I opened a few over time, see for example #2583, #2593, #2700, #2777, #2819.
All done and released in 0.18.4 |
Child of #2799
Description
Looking into popular content (and content that could be popular if it were any good), I have identified this section on notebook/Kedro usage as problematic.
Context
There is lots of potential to help notebook users cross the Rubicon to using Kedro.
Possible Implementation
Currently we have two pages, but in my view they're the wrong way around, and there's a big chunk missing on the conversion from notebook -> Kedro and/or the phased introduction of Kedro support to notebooks. I think we should go with this ordering:
Looking at the pages in more detail:
Page 1: Phased support to use the Kedro `DataCatalog` as a data registry (terminology TBC)

- `DataCatalog` within your existing notebook (see the sketch below)
- `standalone-datacatalog` starter
- `pandas-iris` example
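To illustrate the kind of content Page 1 could show, a minimal, hedged sketch of using the `DataCatalog` as a standalone data registry inside an existing notebook. The dataset name and filepath are invented for illustration, the pandas dataset dependencies are assumed to be installed, and the dataset class naming (`CSVDataSet` vs `CSVDataset`) differs between Kedro versions:

```python
# Sketch: a standalone DataCatalog as a lightweight data registry in a notebook.
import yaml
from kedro.io import DataCatalog

# The same configuration would normally live in conf/base/catalog.yml.
catalog_config = yaml.safe_load(
    """
    companies:
      type: pandas.CSVDataSet
      filepath: data/01_raw/companies.csv
    """
)

catalog = DataCatalog.from_config(catalog_config)
companies = catalog.load("companies")  # returns a pandas DataFrame
```

The point of the phased approach would be that a notebook user gets declarative, catalog-driven data loading without having to adopt the rest of a Kedro project structure first.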
Page 2: How to convert your existing notebook to a Kedro project
Holy grail example. TBD. I need to pair with someone on this to work out how to write it up (it has potential to be a blog post too). I have a separate ticket for this work #2855.
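Not the holy grail example itself, but as a placeholder for the kind of step it would involve, a hedged sketch of a notebook cell refactored into a function and registered as a Kedro node; the function, dataset and node names are illustrative:

```python
# Sketch: a former notebook cell wrapped as a function and added to a pipeline.
from kedro.pipeline import node, pipeline


def preprocess_companies(companies):
    # Formerly a notebook cell operating on a raw dataframe.
    companies["iata_approved"] = companies["iata_approved"] == "t"
    return companies


data_processing = pipeline(
    [
        node(
            func=preprocess_companies,
            inputs="companies",                # dataset name from the catalog
            outputs="preprocessed_companies",
            name="preprocess_companies_node",
        )
    ]
)
```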
Page 3: How to use Kedro and a notebook side-by-side
Tidy up what we have on this page https://docs.kedro.org/en/stable/notebooks_and_ipython/kedro_and_notebooks.html to illustrate how to use a notebook for exploration side-by-side with your Kedro project.
Remove the complexity in the early part of the page under "A custom Kedro kernel" and just summarise what you get (a short usage sketch follows the list below):
- `catalog` (type `DataCatalog`): Data Catalog instance that contains all defined datasets; this is a shortcut for `context.catalog`
- `context` (type `KedroContext`): Kedro project context that provides access to Kedro's library components
- `pipelines` (type `Dict[str, Pipeline]`): Pipelines defined in your pipeline registry
- `session` (type `KedroSession`): Kedro session that orchestrates a pipeline run

The page should also cover:

- Iris dataset example: shows, with the `pandas-iris` starter:
  - how to add a notebook with `kedro jupyter notebook`
  - `catalog`, `context`, `pipelines` and `session`
  - the `%reload_kedro` line magic
  - the `%run_viz` line magic
- How to convert functions from Jupyter notebooks into Kedro nodes
- Work with managed services
- Connect an IPython shell to a Kedro project kernel
- Create a custom Jupyter kernel that automatically loads the extension and launches JupyterLab / QtConsole
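To make the summary above concrete, a hedged sketch of how the injected variables might be used in a notebook started with `kedro jupyter notebook`; the dataset name is invented for illustration and the pipeline shown is the registered default:

```python
# Sketch: typical cells using the variables injected by the Kedro IPython extension.
df = catalog.load("companies")               # load a dataset defined in the catalog
print(context.project_path)                  # project metadata via the KedroContext
print(pipelines["__default__"].describe())   # inspect the registered default pipeline
session.run(pipeline_name="__default__")     # run it end to end

# After editing project code or completing a run, refresh everything with the
# %reload_kedro line magic (and %run_viz opens Kedro-Viz, if it is installed).
```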
Page 4: Jupyter notebook/Kedro FAQs
A page that covers some of the commonly asked questions that we get.
How does this look? I'm interested in hearing from those who field or see questions coming in, or who generally have a vision on how we should present ourselves when it comes to notebook support: @astrojuanlu @merelcht @noklam @deepyaman