Fds 1843 use graph #1425

Closed
afwillia wants to merge 49 commits into develop from FDS-1843-use-graph

Changes from 47 commits

Commits (49)
1c67b4a
Add data_model_graph_pickle to generator
afwillia Apr 11, 2024
a986b39
Add data_model_graph_pickle to metadata.py
afwillia Apr 11, 2024
0390b69
Add data_model_graph to attributes_explorer
afwillia Apr 11, 2024
37bbf74
Add data_model_graph_pickle to tangled_tree
afwillia Apr 11, 2024
4596560
Merge branch 'develop' into FDS-1843-use-graph
afwillia May 7, 2024
b5dd8e4
Add data_model_graph argument to ManifestGenerator.create_manifests
afwillia May 8, 2024
c8da900
use pickle.load to read pickle file
afwillia May 8, 2024
601b5d0
Add data model graph pickle to attributes_explorer
afwillia May 8, 2024
c6d01de
Add data model graph pickle to tangled_tree
afwillia May 8, 2024
77979c7
Merge branch 'develop' into FDS-1843-use-graph
afwillia Jun 5, 2024
9907b4e
Fix mix-up between parsed data model and graph data model
afwillia Jun 6, 2024
481ebd3
Add data model pickle parsing to tangled_tree
afwillia Jun 6, 2024
f63231a
Add data_model_pickle file to tests
afwillia Jun 6, 2024
f3fdc41
Add data model pickle file
afwillia Jun 6, 2024
032771f
run black
afwillia Jun 6, 2024
bf0e9ba
fix a couple sonarcloud issues with the graph_data_model variable
afwillia Jun 6, 2024
bc59218
Add data_model_graph_pickle to metadata.py and tests
afwillia Jun 6, 2024
5973a75
run black on generator.py
afwillia Jun 6, 2024
bf1a449
remove type check from networkx
afwillia Jun 6, 2024
69bb182
fix pylint issues
afwillia Jun 6, 2024
d93aa9d
remove pickle from metadata test because it wasn't created with data_…
afwillia Jun 7, 2024
aef4f62
create two metadata_model objects with different data_model_label gra…
afwillia Jun 7, 2024
08b2ae5
add display_label graph pickle
afwillia Jun 7, 2024
a088d63
Merge branch 'develop' into FDS-1843-use-graph
afwillia Jun 26, 2024
68672c3
Add multiple combinations of jsonld and pickle to test_viz attribute_…
afwillia Jun 28, 2024
de07c55
Fix missing variables in tangled_tree when jsonld and pickle are supp…
afwillia Jun 28, 2024
77f5ec4
Add combinations of jsonld and pickle to tangled_tree tests
afwillia Jun 28, 2024
5934ed5
add graph_data_model argument description
afwillia Jun 28, 2024
51313bd
add data_model_graph_pickle description to MetadataModel
afwillia Jun 28, 2024
e4e1368
run black on test_viz.py
afwillia Aug 26, 2024
9a78a18
Update schematic/manifest/generator.py
afwillia Aug 26, 2024
05b4fe0
Add read_pickle function to read a binary pickle file
afwillia Aug 26, 2024
0964501
Use read_pickle instead of importing pickle and opening file
afwillia Aug 26, 2024
697cc3e
Add note for pickle files that don't fit in memory. Not sure if this …
afwillia Aug 26, 2024
774dc91
Use read_pickle to load the pickle file in metadata.py
afwillia Aug 26, 2024
bb7c450
read_pickle instead of import pickle in attributes_explorer
afwillia Aug 26, 2024
8f5bc1a
set self.graph_data_model and update it if data_model_grapher is not …
afwillia Aug 26, 2024
35c13dd
use if ... is not None instead of if not ...
afwillia Aug 26, 2024
f8c14d4
use if ... is not None instead of if not ...
afwillia Aug 26, 2024
13c9ffc
add an input to read_pickle
afwillia Aug 26, 2024
9205d6d
run black on io_utils.py
afwillia Aug 27, 2024
172e1b1
use read_pickle to read pickle file. Set default data_model_labels
afwillia Aug 27, 2024
d4d3a30
set default data_model_labels
afwillia Aug 27, 2024
730b227
alter logic for setting graph_data_model from pickle or DataModelGrap…
afwillia Aug 27, 2024
795e263
fix pickle reading logic in tangled tree
afwillia Aug 27, 2024
0ddb552
fix logic checking for None parameters in attributes_explorer.py
afwillia Aug 27, 2024
537af6f
import "import pickle" should be placed before "from schematic import…
afwillia Aug 27, 2024
23b7e1d
update docstring and use if ... is not None instead of if not ...
afwillia Aug 27, 2024
fa5e308
use if ... is None instead of if ... is not None
afwillia Aug 27, 2024
28 changes: 21 additions & 7 deletions schematic/manifest/generator.py
@@ -21,6 +21,7 @@
build_service_account_creds,
)
from schematic.utils.df_utils import update_df, load_df
from schematic.utils.io_utils import read_pickle
from schematic.utils.schema_utils import extract_component_validation_rules
from schematic.utils.validate_utils import rule_in_rule_list
from schematic.utils.schema_utils import DisplayLabelType
@@ -1642,11 +1643,15 @@ def create_manifests(
title: Optional[str] = None,
strict: Optional[bool] = True,
use_annotations: Optional[bool] = False,
graph_data_model: Optional[nx.MultiDiGraph] = None,
data_model_graph_pickle: Optional[str] = None,
) -> Union[List[str], List[pd.DataFrame]]:
"""Create multiple manifests

Args:
path_to_data_model (str): str path to data model
data_model_graph_pickle (str): filepath to a data model graph stored as a pickle file
graph_data_model (nx.MultiDiGraph): a networkx MultiDiGraph object
data_types (list): a list of data types
access_token (str, optional): synapse access token. Required when getting an existing manifest. Defaults to None.
dataset_ids (list, optional): a list of dataset ids when generating an existing manifest. Defaults to None.
@@ -1677,16 +1682,25 @@
"Please check your submission and try again."
)

data_model_parser = DataModelParser(path_to_data_model=path_to_data_model)
if not graph_data_model:
@andrewelamb (Contributor) commented on Aug 27, 2024 (resolved):

You might want this to be `if graph_data_model is None`; I'm not sure how nx.MultiDiGraph handles truthiness.

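To illustrate the reviewer's point: networkx graphs fall back to __len__ for truthiness, so an empty MultiDiGraph is falsy and `if not graph_data_model:` would silently discard it. A minimal sketch, not part of the diff:

import networkx as nx

graph = nx.MultiDiGraph()  # an explicitly supplied, but still empty, graph

print(not graph)      # True  -- bool() falls back to __len__(), i.e. the node count
print(graph is None)  # False -- a graph object was passed in

# With `if not graph_data_model:` the empty graph above would be ignored and the
# data model re-parsed from disk; `if graph_data_model is None:` only rebuilds
# the graph when no graph argument was given at all.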
if data_model_graph_pickle:
"""What if pickle file does not fit in memory?"""
graph_data_model = read_pickle(data_model_graph_pickle)
else:
data_model_parser = DataModelParser(
path_to_data_model=path_to_data_model
)

# Parse Model
parsed_data_model = data_model_parser.parse_model()
# Parse Model
parsed_data_model = data_model_parser.parse_model()

# Instantiate DataModelGraph
data_model_grapher = DataModelGraph(parsed_data_model, data_model_labels)
# Instantiate DataModelGraph
data_model_grapher = DataModelGraph(
parsed_data_model, data_model_labels
)

# Generate graph
graph_data_model = data_model_grapher.graph
# Generate graph
graph_data_model = data_model_grapher.graph

# Gather all returned result urls
all_results = []
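The hunk above interleaves the old and new bodies of create_manifests; condensed, the new precedence is: use a supplied graph object, otherwise load the pickle, otherwise parse the JSON-LD. A standalone sketch of that decision, written as an illustration rather than code taken from the PR:

from typing import Optional

import networkx as nx

from schematic.schemas.data_model_graph import DataModelGraph
from schematic.schemas.data_model_parser import DataModelParser
from schematic.utils.io_utils import read_pickle


def resolve_graph(
    path_to_data_model: str,
    data_model_labels: str = "class_label",
    graph_data_model: Optional[nx.MultiDiGraph] = None,
    data_model_graph_pickle: Optional[str] = None,
) -> nx.MultiDiGraph:
    """Hypothetical helper mirroring the precedence in the diff above."""
    if graph_data_model is not None:
        # An in-memory graph wins outright.
        return graph_data_model
    if data_model_graph_pickle is not None:
        # Next preference: a pre-built graph serialized to disk.
        return read_pickle(data_model_graph_pickle)
    # Last resort: parse the JSON-LD model and build the graph from scratch.
    parsed_data_model = DataModelParser(
        path_to_data_model=path_to_data_model
    ).parse_model()
    return DataModelGraph(parsed_data_model, data_model_labels).graph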
26 changes: 18 additions & 8 deletions schematic/models/metadata.py
@@ -17,6 +17,7 @@
from schematic.store.synapse import SynapseStorage

from schematic.utils.df_utils import load_df
from schematic.utils.io_utils import read_pickle

from schematic.models.validate_manifest import validate_all
from opentelemetry import trace
@@ -42,12 +43,14 @@ def __init__(
inputMModelLocation: str,
inputMModelLocationType: str,
data_model_labels: str,
data_model_graph_pickle: Optional[str] = None,
) -> None:
"""Instantiates a MetadataModel object.

Args:
inputMModelLocation: local path, uri, synapse entity id (e.g. gs://, syn123, /User/x/…); present location
inputMModelLocationType: specifier to indicate where the metadata model resource can be found (e.g. 'local' if file/JSON-LD is on local machine)
data_model_graph_pickle: filepath to a data model graph stored as pickle file.
"""
# extract extension of 'inputMModelLocation'
# ensure that it is necessarily pointing to a '.jsonld' file
@@ -60,17 +63,24 @@
self.inputMModelLocation = inputMModelLocation
self.path_to_json_ld = inputMModelLocation

data_model_parser = DataModelParser(path_to_data_model=self.inputMModelLocation)
# Parse Model
parsed_data_model = data_model_parser.parse_model()
# Use graph, if provided. Otherwise parse data model for graph.
if data_model_graph_pickle:
self.graph_data_model = read_pickle(data_model_graph_pickle)
self.dmge = DataModelGraphExplorer(self.graph_data_model)
else:
data_model_parser = DataModelParser(
path_to_data_model=self.inputMModelLocation
)
# Parse Model
parsed_data_model = data_model_parser.parse_model()

# Instantiate DataModelGraph
data_model_grapher = DataModelGraph(parsed_data_model, data_model_labels)
# Instantiate DataModelGraph
data_model_grapher = DataModelGraph(parsed_data_model, data_model_labels)

# Generate graph
self.graph_data_model = data_model_grapher.graph
# Generate graph
self.graph_data_model = data_model_grapher.graph

self.dmge = DataModelGraphExplorer(self.graph_data_model)
self.dmge = DataModelGraphExplorer(self.graph_data_model)

# check if the type of MModel file is "local"
# currently, the application only supports reading from local JSON-LD files
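A hedged usage sketch of the new MetadataModel parameter, reusing the example file names from this PR's test fixtures; passing data_model_graph_pickle takes the pickle branch above and skips the parse and graph-building steps:

from schematic.models.metadata import MetadataModel

# Paths mirror tests/test_metadata.py in this PR; adjust for your own model.
metadata_model = MetadataModel(
    inputMModelLocation="tests/data/example.model.jsonld",
    inputMModelLocationType="local",
    data_model_labels="class_label",
    data_model_graph_pickle="tests/data/example.model.pickle",
)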
8 changes: 8 additions & 0 deletions schematic/utils/io_utils.py
@@ -3,6 +3,7 @@
from typing import Any
import json
import urllib.request
import pickle
from schematic import LOADER


@@ -40,3 +41,10 @@ def load_schemaorg() -> Any:
data_path = "data_models/schema_org.model.jsonld"
schema_org_path = LOADER.filename(data_path)
return load_json(schema_org_path)


def read_pickle(file_path: str) -> Any:
"""Read pickle file"""
with open(file_path, "rb") as fle:
data = pickle.load(fle)
return data
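A minimal usage sketch of the new helper; the path comes from the test fixtures added in this PR, and the security caveat is general pickle behaviour rather than anything the diff states:

from schematic.utils.io_utils import read_pickle

# Load a pickled data model graph produced by a trusted schematic run.
# pickle.load can execute arbitrary code, so never point this at untrusted files.
graph_data_model = read_pickle("tests/data/example.model.pickle")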
26 changes: 16 additions & 10 deletions schematic/visualization/attributes_explorer.py
@@ -5,12 +5,13 @@
from typing import Optional, no_type_check
import numpy as np
import pandas as pd
import networkx as nx # type: ignore

from schematic.schemas.data_model_parser import DataModelParser
from schematic.schemas.data_model_graph import DataModelGraph, DataModelGraphExplorer
from schematic.schemas.data_model_json_schema import DataModelJSONSchema
from schematic.utils.schema_utils import DisplayLabelType
from schematic.utils.io_utils import load_json
from schematic.utils.io_utils import load_json, read_pickle

logger = logging.getLogger(__name__)

@@ -22,34 +23,39 @@ class AttributesExplorer:
def __init__(
self,
path_to_jsonld: str,
data_model_labels: DisplayLabelType,
data_model_labels: DisplayLabelType = "class_label",
data_model_grapher: Optional[DataModelGraph] = None,
data_model_graph_explorer: Optional[DataModelGraphExplorer] = None,
parsed_data_model: Optional[dict] = None,
graph_data_model: Optional[nx.MultiDiGraph] = None,
data_model_graph_pickle: Optional[str] = None,
) -> None:
self.path_to_jsonld = path_to_jsonld

self.jsonld = load_json(self.path_to_jsonld)
if graph_data_model is not None:
self.graph_data_model = graph_data_model
elif data_model_graph_pickle is not None:
self.graph_data_model = read_pickle(data_model_graph_pickle)

# Parse Model
if not parsed_data_model:
if parsed_data_model is None:
data_model_parser = DataModelParser(
path_to_data_model=self.path_to_jsonld,
)
parsed_data_model = data_model_parser.parse_model()

# Instantiate DataModelGraph
if not data_model_grapher:
if data_model_grapher is None:
data_model_grapher = DataModelGraph(parsed_data_model, data_model_labels)

# Generate graph
self.graph_data_model = data_model_grapher.graph
# Generate graph
self.graph_data_model = data_model_grapher.graph

# Instantiate Data Model Graph Explorer
if not data_model_graph_explorer:
self.dmge = DataModelGraphExplorer(self.graph_data_model)
else:
if data_model_graph_explorer is not None:
self.dmge = data_model_graph_explorer
else:
self.dmge = DataModelGraphExplorer(self.graph_data_model)

# Instantiate Data Model Json Schema
self.data_model_js = DataModelJSONSchema(
34 changes: 22 additions & 12 deletions schematic/visualization/tangled_tree.py
@@ -18,7 +18,7 @@
from schematic.visualization.attributes_explorer import AttributesExplorer
from schematic.schemas.data_model_parser import DataModelParser
from schematic.schemas.data_model_graph import DataModelGraph, DataModelGraphExplorer
from schematic.utils.io_utils import load_json
from schematic.utils.io_utils import load_json, read_pickle
from schematic.utils.schema_utils import DisplayLabelType


@@ -43,14 +43,15 @@ class Node(TypedDict):
children: list[str]


class TangledTree: # pylint: disable=too-many-instance-attributes
class TangledTree:  # pylint: disable=too-many-instance-attributes, too-many-arguments
"""Tangled tree class"""

def __init__(
self,
path_to_json_ld: str,
figure_type: FigureType,
data_model_labels: DisplayLabelType,
data_model_labels: DisplayLabelType = "class_label",
data_model_graph_pickle: Optional[str] = None,
) -> None:
# Load jsonld
self.path_to_json_ld = path_to_json_ld
@@ -59,19 +60,26 @@ def __init__(
# Parse schema name
self.schema_name = path.basename(self.path_to_json_ld).split(".model.jsonld")[0]

parsed_data_model = None

# Instantiate Data Model Parser
data_model_parser = DataModelParser(
path_to_data_model=self.path_to_json_ld,
)
if data_model_graph_pickle is None:
data_model_parser = DataModelParser(
path_to_data_model=self.path_to_json_ld,
)

# Parse Model
parsed_data_model = data_model_parser.parse_model()
# Parse Model
parsed_data_model = data_model_parser.parse_model()

# Instantiate DataModelGraph
data_model_grapher = DataModelGraph(parsed_data_model, data_model_labels)
# Instantiate DataModelGraph
data_model_grapher = DataModelGraph(parsed_data_model, data_model_labels)

# Generate graph
self.graph_data_model = data_model_grapher.graph
# Generate graph
self.graph_data_model = data_model_grapher.graph

else:
self.graph_data_model = read_pickle(data_model_graph_pickle)
data_model_grapher = self.graph_data_model

# Instantiate Data Model Graph Explorer
self.dmge = DataModelGraphExplorer(self.graph_data_model)
@@ -91,6 +99,8 @@ def __init__(
data_model_grapher=data_model_grapher,
data_model_graph_explorer=self.dmge,
parsed_data_model=parsed_data_model,
graph_data_model=self.graph_data_model,
data_model_graph_pickle=data_model_graph_pickle,
)

# Create output paths.
Binary file added tests/data/example.display.label.model.pickle
Binary file not shown.
Binary file added tests/data/example.model.pickle
Binary file not shown.
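The fixture files above are binary, so the diff cannot show how they were produced. One plausible way to regenerate such a pickle, reusing the parse and graph calls that appear elsewhere in this diff (an assumption, not something the PR documents):

import pickle

from schematic.schemas.data_model_parser import DataModelParser
from schematic.schemas.data_model_graph import DataModelGraph

# Parse the JSON-LD model and build its graph, as the diffed modules do.
parsed_data_model = DataModelParser(
    path_to_data_model="tests/data/example.model.jsonld"
).parse_model()
graph_data_model = DataModelGraph(parsed_data_model, "class_label").graph

# Serialize the graph so it can later be passed via data_model_graph_pickle.
with open("tests/data/example.model.pickle", "wb") as fle:
    pickle.dump(graph_data_model, fle)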
19 changes: 18 additions & 1 deletion tests/test_metadata.py
@@ -20,6 +20,20 @@ def metadata_model(helpers, data_model_labels):
inputMModelLocation=helpers.get_data_path("example.model.jsonld"),
data_model_labels=data_model_labels,
inputMModelLocationType="local",
data_model_graph_pickle=helpers.get_data_path("example.model.pickle"),
)

return metadata_model


def metadata_model_display(helpers, data_model_labels):
metadata_model = MetadataModel(
inputMModelLocation=helpers.get_data_path("example.model.jsonld"),
data_model_labels=data_model_labels,
inputMModelLocationType="local",
data_model_graph_pickle=helpers.get_data_path(
"example.display.label.model.pickle"
),
)

return metadata_model
@@ -34,7 +48,10 @@ class TestMetadataModel:
)
def test_get_component_requirements(self, helpers, as_graph, data_model_labels):
# Instantiate MetadataModel
meta_data_model = metadata_model(helpers, data_model_labels)
if data_model_labels == "class_label":
meta_data_model = metadata_model(helpers, data_model_labels)
else:
meta_data_model = metadata_model_display(helpers, data_model_labels)

if data_model_labels == "display_label":
source_component = "BulkRNAseqAssay"
59 changes: 46 additions & 13 deletions tests/test_viz.py
@@ -13,30 +13,63 @@
logger = logging.getLogger(__name__)


@pytest.fixture
def attributes_explorer(helpers):
@pytest.fixture(
params=[
("example.model.jsonld", "example.model.pickle"),
("example.model.jsonld", ""),
pytest.param(("", ""), marks=pytest.mark.xfail),
pytest.param(("", "example.model.pickle"), marks=pytest.mark.xfail),
]
)
def attributes_explorer(request, helpers):
A collaborator commented:

Take a look at this doc here: https://www.draconianoverlord.com/2017/11/28/using-given/when/then-for-tests.html/

Personally, I am a huge fan of giving each test a very explicit purpose. The way I do this is to clearly state a set of criteria the test is expected to follow in a:

# GIVEN some initial setup (Explain what the setup is doing)

# WHEN I apply some action after the initial setup (The functions you are testing)

# THEN I expect the output/behavior to look like (Add your assertions)

I added some tests like this in: #1472 if you wanted some examples

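A minimal illustration of the GIVEN/WHEN/THEN layout the reviewer suggests, applied to the fixture under review; the test name and assertions are hypothetical and not part of this PR:

def test_attributes_explorer_builds_graph_from_pickle(helpers):
    # GIVEN a JSON-LD data model and a pickled graph built from it
    path_to_jsonld = helpers.get_data_path("example.model.jsonld")
    path_to_graph = helpers.get_data_path("example.model.pickle")

    # WHEN an AttributesExplorer is constructed from the pickled graph
    explorer = AttributesExplorer(
        path_to_jsonld,
        data_model_labels="class_label",
        data_model_graph_pickle=path_to_graph,
    )

    # THEN it exposes a usable graph and graph explorer
    assert explorer.graph_data_model is not None
    assert explorer.dmge is not None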
# Get JSONLD file path
path_to_jsonld = helpers.get_data_path("example.model.jsonld")
param1, param2 = request.param
path_to_jsonld = helpers.get_data_path(param1)
path_to_graph = helpers.get_data_path(param2)

# Initialize AttributesExplorer
attributes_explorer = AttributesExplorer(
path_to_jsonld,
data_model_labels="class_label",
)
if param2 != "":
attributes_explorer = AttributesExplorer(
path_to_jsonld,
data_model_graph_pickle=path_to_graph,
data_model_labels="class_label",
)
else:
attributes_explorer = AttributesExplorer(
path_to_jsonld,
data_model_labels="class_label",
)
yield attributes_explorer


@pytest.fixture
def tangled_tree(helpers):
@pytest.fixture(
params=[
("example.model.jsonld", "example.model.pickle"),
("example.model.jsonld", ""),
pytest.param(("", ""), marks=pytest.mark.xfail),
pytest.param(("", "example.model.pickle"), marks=pytest.mark.xfail),
]
)
def tangled_tree(helpers, request):
figure_type = "component"

# Get JSONLD file path
path_to_jsonld = helpers.get_data_path("example.model.jsonld")
param1, param2 = request.param
path_to_jsonld = helpers.get_data_path(param1)
path_to_graph = helpers.get_data_path(param2)

# Initialize TangledTree
tangled_tree = TangledTree(
path_to_jsonld, figure_type, data_model_labels="class_label"
)
if param2 == "":
tangled_tree = TangledTree(
path_to_jsonld, figure_type, data_model_labels="class_label"
)
else:
tangled_tree = TangledTree(
path_to_jsonld,
figure_type,
data_model_labels="class_label",
data_model_graph_pickle=path_to_graph,
)
yield tangled_tree

