Lightweight Kedro Viz Experimentation using AST #1966

Merged on Sep 3, 2024 (63 commits)

Changes from 56 commits

Commits (63):
0e7f24d
merge main from remote
ravi-kumar-pilla Apr 25, 2024
c1aae75
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla Apr 26, 2024
177ccbc
merging remote
ravi-kumar-pilla May 1, 2024
8ecf9bf
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 2, 2024
37f3bf4
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 8, 2024
499d8c4
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 14, 2024
b3ab479
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 16, 2024
e295e92
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 20, 2024
905b198
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 21, 2024
490a89f
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 30, 2024
c1a099b
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 31, 2024
573e3c0
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla Jun 10, 2024
5a12c65
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla Jun 13, 2024
960c113
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla Jun 18, 2024
49c05b1
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla Jun 21, 2024
354e024
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla Jun 21, 2024
60e2f27
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla Jun 26, 2024
52c2060
partially working parser - WIP
ravi-kumar-pilla Jun 27, 2024
cfd99a7
partial working commit
ravi-kumar-pilla Jun 29, 2024
de4a4ef
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Jul 3, 2024
7125927
testing show code
ravi-kumar-pilla Jul 3, 2024
bff5a4c
adjust file permissions
ravi-kumar-pilla Jul 3, 2024
3038afd
update comments and rename parser file
ravi-kumar-pilla Jul 3, 2024
0e91504
remove gitignore
ravi-kumar-pilla Jul 3, 2024
a4b3b1a
handle func lambda case
ravi-kumar-pilla Jul 3, 2024
0a80f6c
mocking working draft proposal
ravi-kumar-pilla Jul 12, 2024
e31242f
reuse session with mock modules
ravi-kumar-pilla Jul 15, 2024
8b8e337
wip integration tests
ravi-kumar-pilla Jul 17, 2024
8e0ae73
sporadic working needs testing
ravi-kumar-pilla Jul 18, 2024
38782e3
update sys modules with patch
ravi-kumar-pilla Jul 18, 2024
1fc1faf
fix lint and pytests
ravi-kumar-pilla Jul 18, 2024
98361e3
add dataset factories test
ravi-kumar-pilla Jul 22, 2024
e120ccc
add e2e test
ravi-kumar-pilla Jul 22, 2024
a711cf0
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Jul 22, 2024
b7a1862
fix CI
ravi-kumar-pilla Jul 22, 2024
c5a6f2a
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Jul 22, 2024
06e35bf
dataset factory pattern support in lite mode
ravi-kumar-pilla Jul 23, 2024
78cd413
add doc strings
ravi-kumar-pilla Jul 23, 2024
f2dda93
add e2e test and clear unused func
ravi-kumar-pilla Jul 24, 2024
bfe069f
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Jul 24, 2024
35f1ed5
Merge branch 'main' into feature/kedro-viz-lite
ravi-kumar-pilla Jul 24, 2024
1cffd8a
Merge branch 'main' into feature/kedro-viz-lite
ravi-kumar-pilla Jul 25, 2024
fc8f7e4
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Jul 30, 2024
c31fbda
Merge branch 'main' into feature/kedro-viz-lite
ravi-kumar-pilla Aug 9, 2024
bc4aea2
testing relative to absolute imports
ravi-kumar-pilla Aug 13, 2024
60f9cd3
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Aug 16, 2024
8162147
testing relative imports
ravi-kumar-pilla Aug 16, 2024
840cb9f
working draft for relative imports multi-level
ravi-kumar-pilla Aug 17, 2024
76e3c2b
remove resolving relative dependencies
ravi-kumar-pilla Aug 19, 2024
2d18e9a
test
ravi-kumar-pilla Aug 19, 2024
16e1ef5
working draft
ravi-kumar-pilla Aug 19, 2024
8c6d878
modify test and standalone support for lite
ravi-kumar-pilla Aug 19, 2024
f9de2fe
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Aug 19, 2024
db1b416
improve readability
ravi-kumar-pilla Aug 20, 2024
fe09d20
fix lint and pytest
ravi-kumar-pilla Aug 20, 2024
fefafa6
revert link redirect
ravi-kumar-pilla Aug 21, 2024
ae94f1e
remove side effects
ravi-kumar-pilla Aug 21, 2024
57ea66a
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Aug 22, 2024
45da624
pr suggestions addressed
ravi-kumar-pilla Aug 22, 2024
bcdd304
fix dict issue
ravi-kumar-pilla Aug 22, 2024
f4cd1dd
merge main
ravi-kumar-pilla Aug 22, 2024
050bff2
moved package check under dirs and add exception block
ravi-kumar-pilla Aug 22, 2024
63b9fd3
merge main
ravi-kumar-pilla Sep 3, 2024
33 changes: 33 additions & 0 deletions package/features/steps/cli_steps.py
@@ -147,6 +147,16 @@ def exec_viz_command(context):
     )
 
 
+@when("I execute the kedro viz run command with lite option")
+def exec_viz_lite_command(context):
+    """Execute Kedro-Viz command."""
+    context.result = ChildTerminatingPopen(
+        [context.kedro, "viz", "run", "--lite", "--no-browser"],
+        env=context.env,
+        cwd=str(context.root_project_dir),
+    )
+
+
 @then("kedro-viz should start successfully")
 def check_kedroviz_up(context):
     """Check that Kedro-Viz is up and responding to requests."""
@@ -169,3 +179,26 @@ def check_kedroviz_up(context):
         )
     finally:
         context.result.terminate()
+
+
+@then("I store the response from main endpoint")
+def get_main_api_response(context):
+    max_duration = 30  # 30 seconds
+    end_by = time() + max_duration
+
+    while time() < end_by:
+        try:
+            response = requests.get("http://localhost:4141/api/main")
+            context.response = response.json()
+            assert response.status_code == 200
+        except Exception:
+            sleep(2.0)
+            continue
+        else:
+            break
+
+
+@then("I compare the responses in regular and lite mode")
+def compare_main_api_responses(context):
+    regular_mode_response = requests.get("http://localhost:4141/api/main").json()
+    assert context.response == regular_mode_response
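The `get_main_api_response` step retries `GET /api/main` every 2 seconds for up to 30 seconds. A minimal, stdlib-only sketch of that deadline-based polling pattern follows; `poll_until` and its parameters are hypothetical names, and the `fetch` callable stands in for the `requests.get(...)` call plus its status check.

```python
import time
from typing import Any, Callable, Optional


def poll_until(
    fetch: Callable[[], Any],
    max_duration: float = 30.0,
    interval: float = 2.0,
) -> Optional[Any]:
    """Call ``fetch`` until it returns without raising or the deadline passes.

    Hypothetical helper mirroring the retry loop in ``get_main_api_response``.
    Returns the first successful result, or ``None`` if every attempt failed.
    """
    end_by = time.monotonic() + max_duration
    while time.monotonic() < end_by:
        try:
            return fetch()
        except Exception:  # server not up yet, bad status, invalid JSON, ...
            time.sleep(interval)
    return None


# Usage: a fake fetch that fails twice before the "server" answers.
attempts = {"n": 0}


def flaky_fetch() -> dict:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("server not ready")
    return {"nodes": [], "edges": []}


result = poll_until(flaky_fetch, max_duration=5.0, interval=0.01)
```

Returning `None` on timeout makes a failed wait explicit to the caller; in the step above, `context.response` is never assigned if every attempt raises before the assignment. `time.monotonic()` is used instead of `time()` so that wall-clock adjustments cannot stretch or shorten the deadline.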
14 changes: 14 additions & 0 deletions package/features/viz.feature
@@ -24,3 +24,17 @@ Feature: Viz plugin in new project
         When I execute the kedro viz run command
         Then kedro-viz should start successfully
 
+    Scenario: Execute viz lite with latest Kedro
+        Given I have installed kedro version "latest"
+        And I have run a non-interactive kedro new with spaceflights-pandas starter
+        When I execute the kedro viz run command with lite option
+        Then kedro-viz should start successfully
+
+    Scenario: Compare viz responses in regular and lite mode
+        Given I have installed kedro version "latest"
+        And I have run a non-interactive kedro new with spaceflights-pandas starter
+        When I execute the kedro viz run command with lite option
+        Then I store the response from main endpoint
+        Given I have installed the project's requirements
+        When I execute the kedro viz run command
+        Then I compare the responses in regular and lite mode
14 changes: 13 additions & 1 deletion package/kedro_viz/data_access/managers.py
@@ -7,6 +7,8 @@
 
 import networkx as nx
 from kedro.io import DataCatalog
+from kedro.io.core import DatasetError
+from kedro.io.memory_dataset import MemoryDataset
 from kedro.pipeline import Pipeline as KedroPipeline
 from kedro.pipeline.node import Node as KedroNode
 from sqlalchemy.orm import sessionmaker
@@ -316,7 +318,17 @@ def add_dataset(
         Returns:
             The GraphNode instance representing the dataset that was added to the NodesRepository.
         """
-        obj = self.catalog.get_dataset(dataset_name)
+        try:
+            obj = self.catalog.get_dataset(dataset_name)
+        except DatasetError:
+            # This is to handle dataset factory patterns when running
+            # Kedro Viz in lite mode. The `get_dataset` function
+            # of DataCatalog calls AbstractDataset.from_config
+            # which tries to create a Dataset instance from the pattern
+
+            # pylint: disable=abstract-class-instantiated
+            obj = MemoryDataset()  # type: ignore[abstract]
+
         layer = self.catalog.get_layer_for_dataset(dataset_name)
         graph_node: Union[DataNode, TranscodedDataNode, ParametersNode]
         (
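`add_dataset` now tolerates a `DatasetError` from `catalog.get_dataset` by substituting an empty `MemoryDataset`, so a dataset that cannot be instantiated in lite mode still appears as a node in the graph. The fallback reduces to the stdlib-only sketch below; `DatasetLookupError`, `Placeholder` and `get_dataset_or_placeholder` are hypothetical stand-ins for `DatasetError`, `MemoryDataset` and the `try`/`except` in `add_dataset`.

```python
from typing import Any, Callable, Dict


class DatasetLookupError(Exception):
    """Hypothetical stand-in for ``kedro.io.core.DatasetError``."""


class Placeholder:
    """Hypothetical stand-in for an empty ``MemoryDataset``."""

    def __init__(self, name: str) -> None:
        self.name = name


def get_dataset_or_placeholder(lookup: Callable[[str], Any], dataset_name: str) -> Any:
    """Return ``lookup(dataset_name)``, or a placeholder if the lookup fails."""
    try:
        return lookup(dataset_name)
    except DatasetLookupError:
        # Only the expected lookup error is swallowed; anything else propagates.
        return Placeholder(dataset_name)


# Usage: a toy registry in which one name cannot be resolved.
registry: Dict[str, Any] = {"companies": "companies-dataset"}


def lookup(name: str) -> Any:
    if name not in registry:
        raise DatasetLookupError(f"cannot instantiate {name!r}")
    return registry[name]


found = get_dataset_or_placeholder(lookup, "companies")
fallback = get_dataset_or_placeholder(lookup, "reviews")
```

Catching only the specific error type, as the PR does with `DatasetError`, keeps unrelated failures visible instead of silently turning them into empty nodes.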
76 changes: 76 additions & 0 deletions package/kedro_viz/integrations/kedro/data_catalog_lite.py
@@ -0,0 +1,76 @@
+"""``DataCatalogLite`` is a custom implementation of Kedro's ``DataCatalog``
+to provide a MemoryDataset instance when running Kedro-Viz in lite mode.
+"""
+
+import copy
+from typing import Any, Optional
+
+from kedro.io.core import AbstractDataset, DatasetError, generate_timestamp
+from kedro.io.data_catalog import DataCatalog, _resolve_credentials
+from kedro.io.memory_dataset import MemoryDataset
+
+
+class DataCatalogLite(DataCatalog):
+    """``DataCatalogLite`` is a custom implementation of Kedro's ``DataCatalog``
+    to provide a MemoryDataset instance by overriding ``from_config`` of ``DataCatalog``
+    when running Kedro-Viz in lite mode.
+    """
+
+    @classmethod
+    def from_config(
+        cls,
+        catalog: Optional[dict[str, dict[str, Any]]],
+        credentials: Optional[dict[str, dict[str, Any]]] = None,
+        load_versions: Optional[dict[str, str]] = None,
+        save_version: Optional[str] = None,
+    ) -> DataCatalog:
+        datasets = {}
+        dataset_patterns = {}
+        catalog = copy.deepcopy(catalog) or {}
+        credentials = copy.deepcopy(credentials) or {}
+        save_version = save_version or generate_timestamp()
+        load_versions = copy.deepcopy(load_versions) or {}
+        user_default = {}
+
+        for ds_name, ds_config in catalog.items():
+            if not isinstance(ds_config, dict):
+                raise DatasetError(
+                    f"Catalog entry '{ds_name}' is not a valid dataset configuration. "
+                    "\nHint: If this catalog entry is intended for variable interpolation, "
+                    "make sure that the key is preceded by an underscore."
+                )
+
+            try:
+                ds_config = _resolve_credentials(
+                    ds_config, credentials
+                )  # noqa: PLW2901
+                if cls._is_pattern(ds_name):
+                    # Add each factory to the dataset_patterns dict.
+                    dataset_patterns[ds_name] = ds_config
+
+                else:
+                    try:
+                        datasets[ds_name] = AbstractDataset.from_config(
+                            ds_name, ds_config, load_versions.get(ds_name), save_version
+                        )
+                    except DatasetError:
+                        # pylint: disable=abstract-class-instantiated
+                        datasets[ds_name] = MemoryDataset()  # type: ignore[abstract]
+            except KeyError:
+                # pylint: disable=abstract-class-instantiated
+                datasets[ds_name] = MemoryDataset()  # type: ignore[abstract]
+
+        sorted_patterns = cls._sort_patterns(dataset_patterns)
+        if sorted_patterns:
+            # If the last pattern is a catch-all pattern, pop it and set it as the default
+            if cls._specificity(list(sorted_patterns.keys())[-1]) == 0:
+                last_pattern = sorted_patterns.popitem()
+                user_default = {last_pattern[0]: last_pattern[1]}
+
+        return cls(
+            datasets=datasets,
+            dataset_patterns=sorted_patterns,
+            load_versions=load_versions,
+            save_version=save_version,
+            default_pattern=user_default,
+        )
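`DataCatalogLite.from_config` sorts catalog entries into factory patterns and concrete datasets, substitutes a `MemoryDataset` whenever a concrete entry cannot be built, and promotes a least-specific pattern such as `{default}` to the catch-all default. The stdlib-only sketch below mimics that control flow under stated assumptions: `Placeholder`, `is_pattern`, `specificity`, `split_catalog` and `build_dataset` are hypothetical re-creations, the specificity rule (characters left after removing placeholders) is assumed to follow the same idea as Kedro's private `_specificity`, the pattern ordering is simplified, and the sketch catches any `Exception` where the PR catches only `DatasetError` and `KeyError`.

```python
import re
from typing import Any, Callable, Dict, Tuple


class Placeholder:
    """Hypothetical stand-in for an empty ``MemoryDataset``."""


def is_pattern(name: str) -> bool:
    # Assumed rule: a catalog key containing "{" is a dataset factory pattern.
    return "{" in name


def specificity(pattern: str) -> int:
    # Assumed rule: count the characters left after removing every {placeholder}.
    return len(re.sub(r"\{.*?\}", "", pattern))


def split_catalog(
    catalog: Dict[str, Dict[str, Any]],
    build_dataset: Callable[[str, Dict[str, Any]], Any],
) -> Tuple[Dict[str, Any], Dict[str, Dict[str, Any]], Dict[str, Dict[str, Any]]]:
    """Return ``(datasets, patterns, default_pattern)`` for a raw catalog dict."""
    datasets: Dict[str, Any] = {}
    patterns: Dict[str, Dict[str, Any]] = {}
    for name, config in catalog.items():
        if is_pattern(name):
            patterns[name] = config  # patterns stay as raw config
            continue
        try:
            datasets[name] = build_dataset(name, config)
        except Exception:
            # Lite mode: an entry that cannot be built still gets a node.
            datasets[name] = Placeholder()

    # Most specific first, ties broken alphabetically (simplified ordering).
    ordered = dict(
        sorted(patterns.items(), key=lambda item: (-specificity(item[0]), item[0]))
    )
    default: Dict[str, Dict[str, Any]] = {}
    if ordered:
        last = list(ordered)[-1]
        if specificity(last) == 0:
            # A pattern made only of placeholders, e.g. "{default}", is the catch-all.
            default = {last: ordered.pop(last)}
    return datasets, ordered, default


# Usage: the Excel entry fails to build (say, a missing optional dependency).
def build_dataset(name: str, config: Dict[str, Any]) -> Any:
    if config["type"] == "pandas.CSVDataset":
        return f"CSV({config['filepath']})"
    raise ValueError(f"cannot build {name!r}")


catalog = {
    "companies": {"type": "pandas.CSVDataset", "filepath": "data/companies.csv"},
    "shuttles": {"type": "pandas.ExcelDataset", "filepath": "data/shuttles.xlsx"},
    "{name}_csv": {"type": "pandas.CSVDataset", "filepath": "data/{name}.csv"},
    "{default}": {"type": "MemoryDataset"},
}
datasets, patterns, default = split_catalog(catalog, build_dataset)
```

Keeping patterns as raw configuration (rather than instantiating them up front) is what lets the catalog resolve pattern-matched names later, which is also why `add_dataset` needs its own fallback when that later resolution fails.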
92 changes: 70 additions & 22 deletions package/kedro_viz/integrations/kedro/data_loader.py
@@ -7,18 +7,22 @@
 
 import json
 import logging
+import sys
 from pathlib import Path
 from typing import Any, Dict, Optional, Tuple
+from unittest.mock import patch
 
 from kedro import __version__
-from kedro.framework.project import configure_project, pipelines
+from kedro.framework.project import configure_project, pipelines, settings
 from kedro.framework.session import KedroSession
 from kedro.framework.session.store import BaseSessionStore
 from kedro.framework.startup import bootstrap_project
 from kedro.io import DataCatalog
 from kedro.pipeline import Pipeline
 
 from kedro_viz.constants import VIZ_METADATA_ARGS
+from kedro_viz.integrations.kedro.data_catalog_lite import DataCatalogLite
+from kedro_viz.integrations.kedro.lite_parser import LiteParser
 
 logger = logging.getLogger(__name__)
 
@@ -69,12 +73,51 @@ def _get_dataset_stats(project_path: Path) -> Dict:
         return {}
 
 
+def _load_data_helper(
+    project_path: Path,
+    env: Optional[str] = None,
+    include_hooks: bool = False,
+    extra_params: Optional[Dict[str, Any]] = None,
+    is_lite: bool = False,
+):
+    """Helper to load data from a Kedro project."""
+
+    with KedroSession.create(
+        project_path=project_path,
+        env=env,
+        save_on_close=False,
+        extra_params=extra_params,
+    ) as session:
+        # check for --include-hooks option
+        if not include_hooks:
+            session._hook_manager = _VizNullPluginManager()  # type: ignore
+
+        context = session.load_context()
+        session_store = session._store
+
+        # Update the DataCatalog class for a custom implementation
+        # to handle kedro.io.core.DatasetError from
+        # `settings.DATA_CATALOG_CLASS.from_config`
+        if is_lite:
+            settings.DATA_CATALOG_CLASS = DataCatalogLite
+
+        catalog = context.catalog
+
+        # Pipelines is a lazy dict-like object, so we force it to populate here
+        # in case user doesn't have an active session down the line when it's first accessed.
+        # Useful for users who have `get_current_session` in their `register_pipelines()`.
+        pipelines_dict = dict(pipelines)
+        stats_dict = _get_dataset_stats(project_path)
+        return catalog, pipelines_dict, session_store, stats_dict
+
+
 def load_data(
     project_path: Path,
     env: Optional[str] = None,
     include_hooks: bool = False,
     package_name: Optional[str] = None,
     extra_params: Optional[Dict[str, Any]] = None,
+    is_lite: bool = False,
 ) -> Tuple[DataCatalog, Dict[str, Pipeline], BaseSessionStore, Dict]:
     """Load data from a Kedro project.
     Args:
@@ -87,6 +130,7 @@ def load_data(
             for underlying KedroContext. If specified, will update (and therefore
             take precedence over) the parameters retrieved from the project
             configuration.
+        is_lite: A flag to run Kedro-Viz in lite mode.
     Returns:
         A tuple containing the data catalog and the pipeline dictionary
         and the session store.
@@ -97,24 +141,28 @@ def load_data(
         # bootstrap project when viz is run in dev mode
         bootstrap_project(project_path)
 
-    with KedroSession.create(
-        project_path=project_path,
-        env=env,
-        save_on_close=False,
-        extra_params=extra_params,
-    ) as session:
-        # check for --include-hooks option
-        if not include_hooks:
-            session._hook_manager = _VizNullPluginManager()  # type: ignore
-
-        context = session.load_context()
-        session_store = session._store
-        catalog = context.catalog
-
-        # Pipelines is a lazy dict-like object, so we force it to populate here
-        # in case user doesn't have an active session down the line when it's first accessed.
-        # Useful for users who have `get_current_session` in their `register_pipelines()`.
-        pipelines_dict = dict(pipelines)
-        stats_dict = _get_dataset_stats(project_path)
-
-        return catalog, pipelines_dict, session_store, stats_dict
+    if is_lite:
+        lite_parser = LiteParser(project_path, package_name)
+        mocked_modules = lite_parser.get_mocked_modules()
+
+        if len(mocked_modules):
+            logger.warning(
+                "Kedro-Viz has mocked the following dependencies for lite-mode.\n"
+                "%s \n"
+                "In order to get a complete experience of Viz, "
+                "please install the missing Kedro project dependencies\n",
+                list(mocked_modules.keys()),
+            )
+
+        sys_modules_patch = sys.modules.copy()
+        sys_modules_patch.update(mocked_modules)
+
+        # Patch actual sys modules
+        with patch.dict("sys.modules", sys_modules_patch):
+            return _load_data_helper(
+                project_path, env, include_hooks, extra_params, is_lite
+            )
+    else:
+        return _load_data_helper(
+            project_path, env, include_hooks, extra_params, is_lite
+        )
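In lite mode, `load_data` asks `LiteParser.get_mocked_modules()` for stand-ins for the project's missing dependencies and then loads the project with `sys.modules` patched. `lite_parser.py` is not part of this excerpt, so the sketch below is a hypothetical, simplified illustration of the idea in the PR title (finding imports with Python's `ast` module), not the actual `LiteParser`: it parses source text, collects absolute imports, and maps every module whose top-level package `importlib.util.find_spec` cannot locate, together with its parent packages, to a `MagicMock`. The function names and the sample source are assumptions.

```python
import ast
import importlib.util
import sys
from typing import Dict, Set
from unittest.mock import MagicMock, patch


def imported_modules(source: str) -> Set[str]:
    """Collect dotted module names from the absolute imports in ``source``."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # node.level > 0 marks a relative import ("from . import nodes"); skipped.
            names.add(node.module)
    return names


def mock_missing_modules(source: str) -> Dict[str, MagicMock]:
    """Map every import whose top-level package is not installed to a MagicMock."""
    mocks: Dict[str, MagicMock] = {}
    for name in sorted(imported_modules(source)):
        if importlib.util.find_spec(name.split(".")[0]) is not None:
            continue  # installed: keep the real module
        parts = name.split(".")
        # Register the module and each parent package ("pkg", "pkg.sub", ...).
        for i in range(1, len(parts) + 1):
            mocks.setdefault(".".join(parts[:i]), MagicMock())
    return mocks


# Usage: source text standing in for a project module.
PIPELINE_SOURCE = """
import json
import definitely_missing_pkg
from another_missing_pkg.sub import helper
from . import nodes
"""

mocked = mock_missing_modules(PIPELINE_SOURCE)

# patch.dict adds the mocks to sys.modules and restores the original on exit.
with patch.dict(sys.modules, mocked):
    import definitely_missing_pkg
    from another_missing_pkg.sub import helper

    imports_succeeded = isinstance(helper, MagicMock)

restored = "definitely_missing_pkg" not in sys.modules
```

Registering parent packages matters because a `MagicMock` has no `__path__`, so the import system cannot import `pkg.sub` through a mocked `pkg` alone. The PR passes a full copy of `sys.modules` updated with the mocks; passing only the mocks has the same effect, since `patch.dict` updates the target in place and restores its original contents when the block exits.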