fair-workflows/openpredict

OpenPREDICT: Open and FAIR implementation of the PREDICT method for drug repurposing

Open and FAIR implementation of the PREDICT method described in the paper "PREDICT: a method for inferring novel drug indications with application to personalized medicine", Gottlieb A, Stein GY, Ruppin E, Sharan R. Mol Syst Biol. 2011;7:496. Published 2011 Jun 7. doi:10.1038/msb.2011.26

Sources


Bio2RDF datasets

| Dataset | RDF Dataset Obtained from | Metadata |
|---|---|---|
| Drugbank | http://download.bio2rdf.org/files/release/4/drugbank/drugbank.nq.gz | http://download.bio2rdf.org/files/release/4/drugbank/bio2rdf-drugbank.nq |
| Kegg | http://download.bio2rdf.org/files/release/4/kegg/kegg-drug.nq.gz, http://download.bio2rdf.org/files/release/4/kegg/kegg-genes.nq.gz | http://download.bio2rdf.org/files/release/4/kegg/bio2rdf-kegg.nq |
| SIDER | http://download.bio2rdf.org/files/release/4/sider/sider-se.nq.gz | http://download.bio2rdf.org/files/release/4/sider/bio2rdf-sider.nq |
| HGNC | http://download.bio2rdf.org/files/release/4/hgnc/hgnc.nq.gz | http://download.bio2rdf.org/files/release/4/hgnc/bio2rdf-hgnc.nq |
| GOA | http://download.bio2rdf.org/files/release/4/goa/goa_human.nq.gz | http://download.bio2rdf.org/files/release/4/goa/bio2rdf-goa.nq |

FAIRified datasets

| Dataset | Raw Data | RDF Data Generated | Metadata | Generated By |
|---|---|---|---|---|
| PREDICT drug indication gold standard | msb201126-s1.csv | predict_gold_standard_omim.nq.gz | predict_gold_standard_omim_metadata.nq | MappingPREDICTGoldstandard.ipynb |
| Pubchem IDs mapping for Drugbank | pubchem.tsv | pubchem_mapping.nq.gz | pubchem_mapping_metadata.nq | RDFConversionOfPubchemMapping.ipynb |
| Protein-protein interactions | human_interactome.tsv | human_interactome.nq.gz | human_interactome_metadata.nq | HumanInteractome.ipynb |
| HPO Phenotype annotations for diseases | phenotype_annotation_hpoteam.tab | hpo_annotations.nq.gz | hpo_annotations_metadata.nq | OMIMHpoAnnotations.ipynb |
| MESH Phenotype annotations for diseases | mim2mesh.tsv | omim_mesh_annotations.nq.gz | omim_mesh_annotations_metadata.nq | RDFConversionOfMeshAnnotation.ipynb |
| MESH Phenotype annotations using BioPortal for diseases | meshAnnotationsFromBioPorttalUsingOMIMDesc.txt | omim_mesh_bioportal.nq.gz | omim_mesh_bioportal_metadata.nq | RDFConversionOfMeshAnnotation-BioPortal.ipynb (raw file generated by getMeshTerms.py) |

Pre-processing Bio2RDF data to fix parsing issues

This step applies only to the Bio2RDF datasets. Each Bio2RDF dataset must have its format fixed before it is uploaded to the triple store, otherwise parsing fails. Example:

python src/preprocess_bio2rdf.py -i drugbank.nq.gz -o refined_drugbank.nq.gz
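
What exactly src/preprocess_bio2rdf.py repairs is not described here. Purely as an illustration of this kind of clean-up, the sketch below streams a gzipped N-Quads file and keeps only the lines that pass a loose shape check (subject, predicate, object, optional graph, closing dot). The regular expression and the function names `looks_like_nquad` and `refine` are assumptions made for this example, not the project's actual logic.

```python
import gzip
import re

# Loose shape check for one N-Quads statement. This is an assumption made for
# illustration; it is not a complete N-Quads parser.
_IRI = r"<[^<>\s]*>"
_BNODE = r"_:\S+"
_LITERAL = r'"(?:[^"\\]|\\.)*"(?:\^\^<[^<>\s]*>|@[A-Za-z0-9-]+)?'
_STATEMENT = re.compile(
    rf"^\s*(?:{_IRI}|{_BNODE})\s+{_IRI}\s+(?:{_IRI}|{_BNODE}|{_LITERAL})"
    rf"(?:\s+(?:{_IRI}|{_BNODE}))?\s*\.\s*$"
)


def looks_like_nquad(line: str) -> bool:
    """Return True if the line has the rough shape of an N-Quads statement."""
    return bool(_STATEMENT.match(line))


def refine(in_path: str, out_path: str) -> tuple[int, int]:
    """Copy well-shaped statements to out_path; return (kept, dropped) counts."""
    kept = dropped = 0
    with gzip.open(in_path, "rt", encoding="utf-8", errors="replace") as src, \
         gzip.open(out_path, "wt", encoding="utf-8") as dst:
        for line in src:
            if not line.strip() or line.lstrip().startswith("#"):
                continue  # skip blank lines and comments
            if looks_like_nquad(line):
                dst.write(line)
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

Dropping malformed lines is only one possible strategy; the real script may instead rewrite them, so treat the counts returned by `refine` as a diagnostic rather than a replacement for the project's own pre-processing.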

Upload RDF data to the triple store

Upload each RDF dataset to the triple store (GraphDB or Virtuoso).
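
For GraphDB, one way to script the upload is through the RDF4J-style REST interface, where statements are added by POSTing to a repository's `/statements` resource. The sketch below assumes such an interface; the repository URL shown in the docstring is a hypothetical local address, the helper names are made up for this example, and Virtuoso uses a different loading mechanism that is not covered here.

```python
import gzip
import urllib.request


def build_upload_request(repository_url: str, nquads: bytes) -> urllib.request.Request:
    """Build a POST that adds N-Quads statements to an RDF4J-style repository."""
    return urllib.request.Request(
        url=repository_url.rstrip("/") + "/statements",
        data=nquads,
        method="POST",
        headers={"Content-Type": "application/n-quads"},
    )


def upload_file(repository_url: str, path: str) -> int:
    """Upload one gzipped N-Quads file and return the HTTP status code.

    Example (hypothetical local repository):
        upload_file("http://localhost:7200/repositories/openpredict",
                    "refined_drugbank.nq.gz")
    The whole file is read into memory, which may be impractical for the
    larger datasets; a bulk loader is likely the better choice there.
    """
    with gzip.open(path, "rb") as handle:
        request = build_upload_request(repository_url, handle.read())
    with urllib.request.urlopen(request) as response:
        return response.status
```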

Requirements

See the Dockerfile.

Answering Competency Questions with SPARQL

CQ1.1: Which steps are meant to be executed manually and which to be executed computationally?

PREFIX bpmn: <http://dkm.fbk.eu/index.php/BPMN2_Ontology#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX p-plan: <http://purl.org/net/p-plan#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX opredict: <http://purl.org/plex/Instances/OpenPREDICT#>
SELECT ?step ?stepType ?instructions ?description 
                WHERE  
	                {
					values ?stepType { bpmn:ManualTask bpmn:ScriptTask }

					?instructions rdf:type p-plan:Plan.
					?step rdf:type ?stepType.
					?step dul:isDescribedBy ?instructions.					
					?instructions dc:description ?description.
					?step p-plan:isStepOfPlan opredict:Plan_Main_Protocol_v01.
	                } 
                 

To run : Yasgui Link
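
Besides Yasgui, a query such as the one above can be sent from a script using the standard SPARQL Protocol. The sketch below is one possible way to do that with the Python standard library; the helper names are invented for this example, and the endpoint is the one listed under "How to reproduce the results?", which may not be reachable from every network.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the "How to reproduce the results?" section.
ENDPOINT = "http://graphdb.dumontierlab.com/repositories/openpredict"


def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows_from_results(payload: dict) -> list[dict]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in payload["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str) -> list[dict]:
    """Send the query and return the flattened rows (needs network access)."""
    with urllib.request.urlopen(build_query_request(endpoint, query)) as response:
        return rows_from_results(json.load(response))
```

Calling `run_query(ENDPOINT, query_text)` with any of the competency-question queries as `query_text` returns a list of dictionaries keyed by the SELECT variables; variables left unbound by an OPTIONAL clause are simply absent from that row.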

CQ1.2: For the manual parts, who are the developers and who are the agents responsible to execute each step?

PREFIX bpmn: <http://dkm.fbk.eu/index.php/BPMN2_Ontology#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX p-plan: <http://purl.org/net/p-plan#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX opredict: <http://purl.org/plex/Instances/OpenPREDICT#>
SELECT ?step ?role ?agent ?creator ?publisher ?instructions ?description 
                WHERE  
	                {
					values ?stepType { bpmn:ManualTask }

					?instructions rdf:type p-plan:Plan.
					?step rdf:type ?stepType.
					?step dul:isDescribedBy ?instructions.					
					?instructions dc:description ?description.
					?association prov:hadPlan ?instructions.
					?association prov:agent ?agent.
					?association prov:hadRole ?role.
					OPTIONAL {?plan dc:creator ?creator}
					OPTIONAL {?plan dc:publisher ?publisher}

					?step p-plan:isStepOfPlan opredict:Plan_Main_Protocol_v01.
	                } 

To run : Yasgui Link

CQ1.3: Which datasets were used as input for the computational steps and their respective formats?

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX edam: <http://edamontology.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX p-plan: <http://purl.org/net/p-plan#>
PREFIX opredict: <http://purl.org/plex/Instances/OpenPREDICT#>
SELECT ?step ?instructions ?usage ?usageEntity ?downloadURL ?dataFormat ?dataFormatLabel
WHERE  
{
	?usageEntity rdf:type dcat:Distribution. 
	?usage prov:entity ?usageEntity.
	?plan prov:qualifiedUsage ?usage.
	?step dul:isDescribedBy ?plan.
	?step rdf:type edam:operation_2409.

	?usageEntity dcat:mediaType ?dataFormat
	OPTIONAL { ?dataFormat rdfs:label ?dataFormatLabel.}
	
	OPTIONAL { ?usageEntity dcat:downloadURL ?downloadURL.}
	OPTIONAL { ?plan dc:description ?instructions.}
	OPTIONAL { ?step p-plan:hasInputVar ?varInput.}
	OPTIONAL { ?step p-plan:hasOutputVar ?varOutput.}

	?step p-plan:isStepOfPlan opredict:Plan_Main_Protocol_v01.

} 

To run : Yasgui Link

CQ1.4: What are the inputs and outputs of manual steps?

PREFIX bpmn: <http://dkm.fbk.eu/index.php/BPMN2_Ontology#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX p-plan: <http://purl.org/net/p-plan#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX opredict: <http://purl.org/plex/Instances/OpenPREDICT#>
SELECT ?step ?varInput ?varOutput ?instructions ?description 
                WHERE  
	                {
					values ?stepType { bpmn:ManualTask}

					?instructions rdf:type p-plan:Plan.
					?step rdf:type ?stepType.
					?step dul:isDescribedBy ?instructions.
					?instructions dc:description ?description.
					OPTIONAL { ?step p-plan:hasInputVar ?varInput.}
					OPTIONAL { ?step p-plan:hasOutputVar ?varOutput.}
					?step p-plan:isStepOfPlan opredict:Plan_Main_Protocol_v01.
	                } 
ORDER BY DESC(?varInput)

To run : Yasgui Link

CQ2.1: What are the main steps of OpenPREDICT protocol?

PREFIX p-plan: <http://purl.org/net/p-plan#>
PREFIX opredict: <http://purl.org/plex/Instances/OpenPREDICT#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX pwo: <http://purl.org/spar/pwo#>
SELECT ?stepA ?stepB
WHERE  
{
	?stepA p-plan:isStepOfPlan opredict:Plan_Main_Protocol_v01.
	?stepA dul:precedes ?stepB.
	
	OPTIONAL {
			opredict:Plan_Main_Protocol_v01 pwo:hasFirstStep ?stepTopLevel.
			?stepTopLevel dul:precedes ?stepB.
		}
}
ORDER BY  DESC(?stepTopLevel)

To run : Yasgui Link
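
The query above returns pairs of steps linked by dul:precedes rather than a single ordered list. As a sketch of how a linear ordering could be recovered from those pairs, the example below applies a topological sort; the step names are hypothetical placeholders, not steps taken from the OpenPREDICT plan.

```python
from graphlib import TopologicalSorter


def order_steps(pairs: list[tuple[str, str]]) -> list[str]:
    """Order steps so that each one comes after every step that precedes it."""
    sorter = TopologicalSorter()
    for step_a, step_b in pairs:
        sorter.add(step_b, step_a)  # step_a must come before step_b
    return list(sorter.static_order())


# Hypothetical step names, for illustration only.
pairs = [
    ("generate_features", "train_model"),
    ("prepare_data", "generate_features"),
]
print(order_steps(pairs))
```

`graphlib` requires Python 3.9 or later, and `static_order()` raises `graphlib.CycleError` if the precedes relation contains a cycle.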

CQ2.2: What are the steps of a plan and how each step instruction is described?

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX p-plan: <http://purl.org/net/p-plan#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT ?language ?instructions ?description ?step 
WHERE  
{
	?instructions rdf:type p-plan:Plan.
	?step dul:isDescribedBy ?instructions.					
	?instructions dc:description ?description.
	?instructions dc:language ?language.				
}

To run : Yasgui Link

CQ2.3: What instructions specify the code used in OpenPREDICT steps?

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX p-plan: <http://purl.org/net/p-plan#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX bpmn: <http://dkm.fbk.eu/index.php/BPMN2_Ontology#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT ?specInstruction ?specification ?instructions ?description ?step ?language 
WHERE  
{
	?instructions rdf:type p-plan:Plan.
	?step dul:isDescribedBy ?instructions.
	?step rdf:type bpmn:ScriptTask.
	?instructions dc:description ?description.
	?instructions dc:language ?language.
	OPTIONAL
	{
		?instructions dul:isDescribedBy ?specInstruction.
		?specInstruction dc:description ?specification.
	}
}
ORDER BY ?step

To run : Yasgui Link

CQ3.1: What are the existing versions of a workflow and what are their provenance?

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT ?workflow  ?wflVersion ?creator ?createDate
WHERE  
{
	?workflow rdf:type dul:Workflow.
	?workflow dc:hasVersion ?wflVersion.
	?workflow dc:creator ?creator.
	?workflow dc:created ?createDate.
}

To run : Yasgui Link

CQ3.2: Which instructions were removed/changed/added from one version to another?

PREFIX p-plan: <http://purl.org/net/p-plan#>
PREFIX opredict: <http://purl.org/plex/Instances/OpenPREDICT#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX bpmn: <http://dkm.fbk.eu/index.php/BPMN2_Ontology#>
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT * 
WHERE  
{
	?step p-plan:isStepOfPlan opredict:Plan_Main_Protocol_v01.
	?step dul:isDescribedBy ?instruction.
	?step rdf:type ?stepType.
	values ?stepType { bpmn:ManualTask bpmn:ScriptTask }
	
	FILTER NOT EXISTS 
	{
		?step p-plan:isStepOfPlan opredict:Plan_Main_Protocol_v02.
		
	}
	
	FILTER NOT EXISTS 
	{
		?instructionNextVersion prov:wasRevisionOf ?instruction.
		?stepNextVersion dul:isDescribedBy ?instructionNextVersion. 
		?stepNextVersion p-plan:isStepOfPlan opredict:Plan_Main_Protocol_v02.
	}
	
}

To run : Yasgui Link

CQ3.3: Which steps were automatized from one version to another?

PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX bpmn: <http://dkm.fbk.eu/index.php/BPMN2_Ontology#>
SELECT ?stepPriorVersion ?planPriorVersion ?stepNewVersion ?planNewVersion
WHERE  
{					
	?planNewVersion prov:wasRevisionOf ?planPriorVersion.
	?planNewVersion dc:description ?planNewVersionDesc.
	?planPriorVersion dc:description ?planPriorVersionDesc.
	?stepNewVersion dul:isDescribedBy ?planNewVersion.
	?stepNewVersion rdf:type ?stepNewVersionType. 
	?stepPriorVersion dul:isDescribedBy ?planPriorVersion.
	?stepPriorVersion rdf:type ?stepPriorVersionType. 

	values ?stepPriorVersionType { bpmn:ManualTask}.
	values ?stepNewVersionType { bpmn:ScriptTask}

} 

To run : Yasgui Link

CQ3.4: Which datasets were removed/changed/added for the different versions?

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX edam: <http://edamontology.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX p-plan: <http://purl.org/net/p-plan#>
PREFIX opredict: <http://purl.org/plex/Instances/OpenPREDICT#>
SELECT ?step ?instructions ?usage ?usageEntity ?downloadURL ?dataFormat ?dataFormatLabel
WHERE  
{
	?usageEntity rdf:type dcat:Distribution. 
	?usage prov:entity ?usageEntity.
	?plan prov:qualifiedUsage ?usage.
	?step dul:isDescribedBy ?plan.
	?step rdf:type edam:operation_2409.

	?usageEntity dcat:mediaType ?dataFormat
	OPTIONAL { ?dataFormat rdfs:label ?dataFormatLabel.}
	
	OPTIONAL { ?usageEntity dcat:downloadURL ?downloadURL.}
	OPTIONAL { ?plan dc:description ?instructions.}
	OPTIONAL { ?step p-plan:hasInputVar ?varInput.}
	OPTIONAL { ?step p-plan:hasOutputVar ?varOutput.}

	OPTIONAL {?step p-plan:isStepOfPlan opredict:Plan_Main_Protocol_v01}
	OPTIONAL {?step p-plan:isStepOfPlan opredict:Plan_Main_Protocol_v02}

} 

To run : Yasgui Link

CQ3.5: Which workflow version was used in each execution and what was generated?

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX p-plan: <http://purl.org/net/p-plan#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?plan ?version ?execution ?stepExecuted ?wflExecArtifact 
WHERE  
{
	?execution rdf:type p-plan:Activity.
	?execution p-plan:correspondsToStep ?stepExecuted.
	?stepExecuted p-plan:isStepOfPlan ?plan.
	?plan rdf:type dul:Workflow.
	?plan dc:hasVersion ?version.
	?execution prov:generated ?wflExecArtifact.
}
ORDER BY ?version

To run : Yasgui Link

How to reproduce the results?

  • Use the OpenPREDICT GraphDB SPARQL endpoint (http://graphdb.dumontierlab.com/repositories/openpredict) to query all data
  • If you prefer not to use the given SPARQL endpoint, collect all sources from the links above, pre-process the Bio2RDF datasets (see the section "Pre-processing Bio2RDF data to fix parsing issues"), then create your own triple store and upload each RDF dataset to it (currently tested with GraphDB and Virtuoso)
  • Clone the project
git clone https://github.com/fair-workflows/openpredict.git
  • Install Docker to set up the environment

To install Docker: https://docs.docker.com/install/

  • Build

From the openpredict directory, edit the workflow/config.yml file and set sparql_ep to the SPARQL endpoint given above or to your own SPARQL endpoint.

cd openpredict/
docker build -t openpredict .
  • Run Jupyter
docker run -d --rm --name openpredict -p 8888:8888 openpredict
  • Execute CWL workflow
docker exec -it openpredict cwltool --outdir=/juypter/run/ workflow/openpredict-ipynb.cwl workflow/config.yml

--outdir	the folder in which the outputs are generated
 

You should see two Jupyter notebook output files (output_fg.ipynb, output_ml.ipynb), together with the other generated results, in your outdir ('/juypter/run/').
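
A quick way to confirm that the workflow finished is to check for the two notebooks named above. The sketch below is a hypothetical helper, not part of the repository; it has to run where the output folder is visible, for example inside the container or on a copy of the folder.

```python
from pathlib import Path

# Notebook names taken from the description of the expected outputs.
EXPECTED_NOTEBOOKS = ("output_fg.ipynb", "output_ml.ipynb")


def missing_outputs(outdir: str) -> list[str]:
    """Return the expected notebook names that are not present in outdir."""
    folder = Path(outdir)
    return [name for name in EXPECTED_NOTEBOOKS if not (folder / name).is_file()]
```

An empty list from `missing_outputs("/juypter/run/")` means both notebooks were written; it says nothing about whether the notebooks themselves ran without errors.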