ARAX
Note: for the developer Wiki, see here https://github.com/RTXteam/RTX/wiki
ARAX is a web-based software service and tool for exploring structured biomedical knowledge graphs from multiple sources to answer translational questions. ARAX can answer Translator queries expressed in the JSON-based Translator Reasoner API (TRAPI) format, with query graphs that contain nodes and edges as well as additional query metadata about the user's intent. An open-access article describing ARAX has been published in the journal Bioinformatics. ARAX is integrated with numerous Knowledge Sources and Knowledge Providers, and it automatically determines which of them are used to answer a given query, although the user can override this choice.
ARAX can be queried using either of two query representations:
- Via the TRAPI (Translator Reasoner API) format, which triggers automated answering and ranking.
- Using ARAXi, a domain-specific language that gives users finer-grained control over which algorithms are used, and how, when asking their questions.
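As a rough illustration of how the two representations differ in the request body, the sketch below builds one of each in Python. The field names (`message.query_graph`, `operations.actions`) and every ARAXi command string are copied from the curl examples further down in this README. The helper names `trapi_request` and `araxi_request` are hypothetical and are not part of ARAX.

```python
import json


def trapi_request(query_graph):
    """Wrap a TRAPI query graph in a request body (hypothetical helper)."""
    return {"message": {"query_graph": query_graph}}


def araxi_request(actions):
    """Wrap a list of ARAXi command strings in a request body (hypothetical helper)."""
    return {"message": {}, "operations": {"actions": list(actions)}}


# TRAPI: describe what to find; ARAX decides how to answer and rank.
one_hop = trapi_request({
    "nodes": {
        "n00": {"ids": ["CHEMBL.COMPOUND:CHEMBL112"],
                "categories": ["biolink:ChemicalSubstance"]},
        "n01": {"categories": ["biolink:Protein"]},
    },
    "edges": {
        "e00": {"subject": "n00", "object": "n01",
                "predicates": ["biolink:physically_interacts_with"]},
    },
})

# ARAXi: spell out each processing step explicitly, in order.
two_hop = araxi_request([
    "add_qnode(name=arthritis, key=n00)",
    "add_qnode(categories=biolink:Protein, is_set=true, key=n01)",
    "add_qnode(categories=biolink:ChemicalSubstance, key=n02)",
    "add_qedge(subject=n00, object=n01, key=e00)",
    "add_qedge(subject=n01, object=n02, key=e01, predicates=biolink:physically_interacts_with)",
    "expand(edge_key=[e00,e01], kp=ARAX/KG2)",
    "resultify(ignore_edge_direction=true)",
    "return(message=true, store=false)",
])

print(json.dumps(one_hop, indent=2))
print(json.dumps(two_hop, indent=2))
```

In the TRAPI form the request carries only the graph pattern, while the ARAXi form leaves `message` empty and lists the steps to run.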
There are two ways that an end-user can post a query to ARAX:
- Via the NCATS Biomedical Data Translator's (still in beta) web-based User Interface.
- Via ARAX's web browser interface
There are two ways that a developer can post a query to ARAX:
- Via ARAX's web API
- Via the NCATS Biomedical Data Translator's Autonomous Relay System (ARS)
There are two main modes for interacting with ARAX: the first is via posting TRAPI messages to the ARAX API. Examples of doing this are included here.
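For readers who prefer Python to curl, here is a minimal sketch of posting a TRAPI message using only the standard library. The endpoint URL and headers are copied from the curl examples in this README. The function names `build_request` and `post_query` and the 300-second timeout are assumptions made for illustration.

```python
import json
import urllib.request

# Endpoint copied from the curl examples in this README.
ARAX_QUERY_URL = "https://arax.ncats.io/api/arax/v1.1/query"


def build_request(payload, url=ARAX_QUERY_URL):
    """Return a POST request carrying `payload` as JSON, without sending it."""
    body = json.dumps(payload).encode("utf-8")
    headers = {"accept": "application/json", "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def post_query(payload, url=ARAX_QUERY_URL, timeout=300):
    """Send the query and return the decoded JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(payload, url), timeout=timeout) as resp:
        return json.load(resp)


# The one-hop query graph used in the curl example of this README.
query = {
    "message": {
        "query_graph": {
            "edges": {
                "e00": {
                    "subject": "n00",
                    "object": "n01",
                    "predicates": ["biolink:physically_interacts_with"],
                }
            },
            "nodes": {
                "n00": {
                    "categories": ["biolink:ChemicalSubstance"],
                    "ids": ["CHEMBL.COMPOUND:CHEMBL112"],
                },
                "n01": {"categories": ["biolink:Protein"]},
            },
        }
    }
}

request = build_request(query)
print(request.get_method(), request.full_url)
# response = post_query(query)  # uncomment to contact the live service
```

Building the request separately from sending it makes it possible to inspect the payload before contacting the service.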
The second way to interact with ARAX is via the GUI. There, you will see four different query types:
- You can build a query graph by clicking on its icon.
- You can enter the value of the `query_graph` element in a TRAPI message (circumventing the need to manually POST TRAPI queries) by clicking on its icon.
- You can enter ARAXi domain-specific language commands by clicking on its icon.
- You can enter an ARS PK ID (to pull results from the Autonomous Relay System) after clicking on its icon.
No matter which method is used, after submitting a query, the results can be viewed via the links under output on the left vertical bar.
If you want to look up an identifier for a specific natural language term, please use the Synonyms link under the Tools section of the left vertical bar.
Each of the query methods has a link to an example so a user can see what sort of information is to be expected. If you run into any issues with using any aspect of the system, please open an issue here.
ARAX is registered in SmartAPI here.
If you would like to deploy your own instance, please see the dependencies listed here, the instructions here for how to build the Expander Agent portion of the Knowledge Graph (more info about this knowledge graph, called KG2, is available here), and the deployment wiki.
- one-hop query:
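# Write the TRAPI query to a file; quoting 'EOF' keeps the shell from expanding anything inside the JSON.
cat <<'EOF' >onehop.json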
{
"bypass_cache": false,
"enforce_edge_directionality": false,
"max_results": 100,
"message": {
"query_graph": {
"edges": {
"e00": {
"object": "n01",
"predicates": ["biolink:physically_interacts_with"],
"subject": "n00"
}
},
"nodes": {
"n00": {
"categories": ["biolink:ChemicalSubstance"],
"ids": ["CHEMBL.COMPOUND:CHEMBL112"]
},
"n01": {
"categories": ["biolink:Protein"]
}
}
}
},
"page_number": 1,
"page_size": 100,
"return_minimal_metadata": false,
"stream_progress": false
}
EOF
curl -X POST \
"https://arax.ncats.io/api/arax/v1.1/query?bypass_cache=false" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d @onehop.json
This should result in a response like the following (truncated):
{
"context": "https://raw.githubusercontent.com/biolink/biolink-model/master/context.jsonld",
"datetime": "2021-05-10 11:56:19",
"description": "Normal completion",
"id": "https://arax.ncats.io/api/arax/v1.1/response/9182",
"logs": [
{
"code": "",
"level": "INFO",
"message": "ARAX Query launching on incoming Query",
"timestamp": "2021-05-10T11:56:19.774118"
},
...
],
"message": {
"knowledge_graph": {
"edges": {
"ARAX/KG2:CHEBI:4056-biolink:physically_interacts_with-CHEMBL.COMPOUND:CHEMBL112": {
"attributes": [
{
"attribute_source": "infores:rtx_kg2_kp",
"attribute_type_id": "biolink:original_source",
"value": "infores:semmeddb",
"value_type_id": "biolink:InformationResource"
},
{
"attribute_source": "infores:semmeddb",
"attribute_type_id": "biolink:has_supporting_publications",
"value": [
"PMID:10872641",
"PMID:11330834",
"PMID:23032911",
"PMID:25319358",
"PMID:25753323",
"PMID:25956474",
"PMID:30293568",
"PMID:30915487",
"PMID:31600996"
],
"value_type_id": "biolink:Publication"
},
{
"attribute_source": "infores:arax_ara",
"attribute_type_id": "biolink:knowledge_provider_source",
"value": "infores:rtx_kg2_kp",
"value_type_id": "biolink:InformationResource"
}
],
"object": "CHEMBL.COMPOUND:CHEMBL112",
"predicate": "biolink:physically_interacts_with",
"subject": "CHEBI:4056"
},
"ARAX/KG2:CHEMBL.COMPOUND:CHEMBL112-biolink:molecularly_interacts_with-UniProtKB:O00519": {
"attributes": [
{
"attribute_source": "infores:rtx_kg2_kp",
"attribute_type_id": "biolink:original_source",
"value": "infores:chembl",
"value_type_id": "biolink:InformationResource"
},
{
"attribute_source": "infores:arax_ara",
"attribute_type_id": "biolink:knowledge_provider_source",
"value": "infores:rtx_kg2_kp",
"value_type_id": "biolink:InformationResource"
}
],
"object": "UniProtKB:O00519",
"predicate": "biolink:molecularly_interacts_with",
"subject": "CHEMBL.COMPOUND:CHEMBL112"
},
"ARAX/KG2:CHEMBL.COMPOUND:CHEMBL112-biolink:molecularly_interacts_with-UniProtKB:P08684": {
"attributes": [
{
"attribute_source": "infores:rtx_kg2_kp",
"attribute_type_id": "biolink:original_source",
"value": "infores:pharos",
"value_type_id": "biolink:InformationResource"
},
{
"attribute_source": "infores:arax_ara",
"attribute_type_id": "biolink:knowledge_provider_source",
"value": "infores:rtx_kg2_kp",
"value_type_id": "biolink:InformationResource"
}
],
"object": "UniProtKB:P08684",
"predicate": "biolink:molecularly_interacts_with",
"subject": "CHEMBL.COMPOUND:CHEMBL112"
},
"ARAX/KG2:CHEMBL.COMPOUND:CHEMBL112-biolink:molecularly_interacts_with-UniProtKB:P10635": {
"attributes": [
{
"attribute_source": "infores:rtx_kg2_kp",
"attribute_type_id": "biolink:original_source",
"value": "infores:pharos",
"value_type_id": "biolink:InformationResource"
},
{
"attribute_source": "infores:arax_ara",
"attribute_type_id": "biolink:knowledge_provider_source",
"value": "infores:rtx_kg2_kp",
"value_type_id": "biolink:InformationResource"
}
],
"object": "UniProtKB:P10635",
"predicate": "biolink:molecularly_interacts_with",
"subject": "CHEMBL.COMPOUND:CHEMBL112"
},
"ARAX/KG2:CHEMBL.COMPOUND:CHEMBL112-biolink:molecularly_interacts_with-UniProtKB:P12268": {
"attributes": [
{
"attribute_source": "infores:rtx_kg2_kp",
"attribute_type_id": "biolink:original_source",
"value": "infores:pharos",
"value_type_id": "biolink:InformationResource"
},
{
"attribute_source": "infores:arax_ara",
"attribute_type_id": "biolink:knowledge_provider_source",
"value": "infores:rtx_kg2_kp",
"value_type_id": "biolink:InformationResource"
}
],
"object": "UniProtKB:P12268",
"predicate": "biolink:molecularly_interacts_with",
"subject": "CHEMBL.COMPOUND:CHEMBL112"
},
...
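As a sketch of how a client might read a response like the one above, the following Python walks the `knowledge_graph` edges and collects each edge's original source and supporting publications. The attribute type identifiers are the ones visible in the response excerpt. The function name `summarize_edges` is hypothetical, and the sample data is a trimmed copy of two edges from that excerpt (only the first two of the nine PMIDs are kept).

```python
def summarize_edges(response):
    """Return one summary dict per knowledge-graph edge (hypothetical helper)."""
    edges = response.get("message", {}).get("knowledge_graph", {}).get("edges", {})
    summaries = []
    for edge_key, edge in edges.items():
        sources, publications = [], []
        for attr in edge.get("attributes", []):
            type_id = attr.get("attribute_type_id")
            value = attr.get("value")
            if type_id == "biolink:original_source":
                sources.append(value)
            elif type_id == "biolink:has_supporting_publications":
                # The value may be a single string or a list of identifiers.
                publications.extend(value if isinstance(value, list) else [value])
        summaries.append({
            "key": edge_key,
            "subject": edge.get("subject"),
            "predicate": edge.get("predicate"),
            "object": edge.get("object"),
            "sources": sources,
            "publications": publications,
        })
    return summaries


# Trimmed copy of two edges from the response excerpt above.
sample = {
    "message": {"knowledge_graph": {"edges": {
        "ARAX/KG2:CHEBI:4056-biolink:physically_interacts_with-CHEMBL.COMPOUND:CHEMBL112": {
            "attributes": [
                {"attribute_type_id": "biolink:original_source",
                 "value": "infores:semmeddb"},
                {"attribute_type_id": "biolink:has_supporting_publications",
                 "value": ["PMID:10872641", "PMID:11330834"]},
            ],
            "object": "CHEMBL.COMPOUND:CHEMBL112",
            "predicate": "biolink:physically_interacts_with",
            "subject": "CHEBI:4056",
        },
        "ARAX/KG2:CHEMBL.COMPOUND:CHEMBL112-biolink:molecularly_interacts_with-UniProtKB:O00519": {
            "attributes": [
                {"attribute_type_id": "biolink:original_source",
                 "value": "infores:chembl"},
            ],
            "object": "UniProtKB:O00519",
            "predicate": "biolink:molecularly_interacts_with",
            "subject": "CHEMBL.COMPOUND:CHEMBL112",
        },
    }}}
}

summaries = summarize_edges(sample)
for s in summaries:
    print(s["subject"], s["predicate"], s["object"], s["sources"], len(s["publications"]))
```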
- two-hop query with several overlay commands and filtering:
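# Write the ARAXi query to a file; quoting 'EOF' keeps the shell from expanding anything inside the JSON.
cat <<'EOF' >kitchensink.json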
{
"bypass_cache": false,
"enforce_edge_directionality": false,
"max_results": 100,
"message": {},
"operations": {"actions": [
"add_qnode(name=arthritis, key=n00)",
"add_qnode(categories=biolink:Protein, is_set=true, key=n01)",
"add_qnode(categories=biolink:ChemicalSubstance, key=n02)",
"add_qedge(subject=n00, object=n01, key=e00)",
"add_qedge(subject=n01, object=n02, key=e01, predicates=biolink:physically_interacts_with)",
"expand(edge_key=[e00,e01], kp=ARAX/KG2)",
"overlay(action=overlay_clinical_info, observed_expected_ratio=true, virtual_relation_label=C1, subject_qnode_key=n00, object_qnode_key=n02)",
"filter_kg(action=remove_edges_by_attribute, edge_attribute=probably_treats, direction=below, threshold=.8, remove_connected_nodes=t, qnode_key=n02)",
"overlay(action=compute_jaccard, start_node_key=n00, intermediate_node_key=n01, end_node_key=n02, virtual_relation_label=J1)",
"overlay(action=predict_drug_treats_disease, subject_qnode_key=n02, object_qnode_key=n00, virtual_relation_label=P1)",
"resultify(ignore_edge_direction=true)",
"filter_results(action=limit_number_of_results, max_results=15)",
"return(message=true, store=false)"
]},
"page_number": 1,
"page_size": 100,
"return_minimal_metadata": false,
"stream_progress": false
}
EOF
curl -X POST \
"https://arax.ncats.io/api/arax/v1.1/query?bypass_cache=false" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d @kitchensink.json
This should result in a response like the following (truncated). Because the query uses several overlay commands that each hit a different database, it may take a minute or two to complete.
{
"context": "https://raw.githubusercontent.com/biolink/biolink-model/master/context.jsonld",
"datetime": "2021-05-10 11:36:58",
"description": "Normal completion",
"logs": [
...
],
"message": {
"knowledge_graph": {
"edges": {
"ARAX/KG2:CHEBI:67079-biolink:physically_interacts_with-CHEMBL.TARGET:CHEMBL1641359": {
"attributes": [
{
"attribute_source": "infores:rtx_kg2_kp",
"attribute_type_id": "biolink:original_source",
"value": "infores:semmeddb",
"value_type_id": "biolink:InformationResource"
},
{
"attribute_source": "infores:semmeddb",
"attribute_type_id": "biolink:has_supporting_publications",
"value": [
"PMID:22552402",
"PMID:27320659",
"PMID:30199704"
],
"value_type_id": "biolink:Publication"
},
{
"attribute_source": "infores:arax_ara",
"attribute_type_id": "biolink:knowledge_provider_source",
"value": "infores:rtx_kg2_kp",
"value_type_id": "biolink:InformationResource"
}
],
"object": "CHEMBL.TARGET:CHEMBL1641359",
"predicate": "biolink:physically_interacts_with",
"subject": "CHEBI:67079"
},
"ARAX/KG2:CHEBI:67079-biolink:physically_interacts_with-CHEMBL.TARGET:CHEMBL3301559": {
"attributes": [
{
"attribute_source": "infores:rtx_kg2_kp",
"attribute_type_id": "biolink:original_source",
"value": "infores:semmeddb",
"value_type_id": "biolink:InformationResource"
},
{
"attribute_source": "infores:semmeddb",
"attribute_type_id": "biolink:has_supporting_publications",
"value": [
"PMID:29427163"
],
"value_type_id": "biolink:Publication"
},
{
"attribute_source": "infores:arax_ara",
"attribute_type_id": "biolink:knowledge_provider_source",
"value": "infores:rtx_kg2_kp",
"value_type_id": "biolink:InformationResource"
}
],
"object": "CHEMBL.TARGET:CHEMBL3301559",
"predicate": "biolink:physically_interacts_with",
"subject": "CHEBI:67079"
},
"ARAX/KG2:CHEBI:67079-biolink:physically_interacts_with-UniProtKB:O00206": {
"attributes": [
{
"attribute_source": "infores:rtx_kg2_kp",
"attribute_type_id": "biolink:original_source",
"value": "infores:semmeddb",
"value_type_id": "biolink:InformationResource"
},
{
"attribute_source": "infores:semmeddb",
"attribute_type_id": "biolink:has_supporting_publications",
"value": [
"PMID:30099678"
],
"value_type_id": "biolink:Publication"
},
{
"attribute_source": "infores:arax_ara",
"attribute_type_id": "biolink:knowledge_provider_source",
"value": "infores:rtx_kg2_kp",
"value_type_id": "biolink:InformationResource"
}
],
"object": "UniProtKB:O00206",
"predicate": "biolink:physically_interacts_with",
"subject": "CHEBI:67079"
},
"ARAX/KG2:CHEBI:67079-biolink:physically_interacts_with-UniProtKB:P01137": {
"attributes": [
{
"attribute_source": "infores:rtx_kg2_kp",
"attribute_type_id": "biolink:original_source",
"value": "infores:semmeddb",
"value_type_id": "biolink:InformationResource"
},
{
"attribute_source": "infores:semmeddb",
"attribute_type_id": "biolink:has_supporting_publications",
"value": [
"PMID:15285804"
],
"value_type_id": "biolink:Publication"
},
{
"attribute_source": "infores:arax_ara",
"attribute_type_id": "biolink:knowledge_provider_source",
"value": "infores:rtx_kg2_kp",
"value_type_id": "biolink:InformationResource"
}
],
"object": "UniProtKB:P01137",
"predicate": "biolink:physically_interacts_with",
"subject": "CHEBI:67079"
},
...
Currently, ARAX queries every SmartAPI-registered, TRAPI-compliant knowledge provider. These include:
- Clinical Data Provider
- Exposure Provider
- Molecular Data Provider
- Service Provider
- Genetics Provider
- Connections Hypothesis Provider
- RTX-KG2
- The entire codebase is accessible at https://github.com/RTXteam/RTX/
- ARAXi Domain Specific Language documentation: https://github.com/RTXteam/RTX/blob/master/code/ARAX/Documentation/DSL_Documentation.md
- Project README: https://github.com/RTXteam/RTX/blob/master/README.md
The ARAX ranking algorithm primarily uses two pieces of information: the shape of a result graph (e.g., smaller-diameter graphs with more edges are better) and quantitative information on edges. In more detail, after gathering the result graph from Knowledge Providers, ARAX adds Normalized Google Distance edges that essentially represent literature co-occurrence between nodes. Next, each edge is weighted more or less heavily based on its origin and associated publications (e.g., a SemMedDB edge with few associated publications is weighted less than a DrugBankDB edge). After this, three structural algorithms (Max Flow, Frobenius Norm, and Maximum Weighted Path) combine these edge weights and the shape of the graph into scores, and the average rank from these three approaches is used as the final score. Because the final score is rank-based, scores from different queries are not comparable, but scores within the same query are. A visualization of the ranking algorithm is available here.
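The rank-averaging step can be sketched as follows. This is an illustration of the idea only, not ARAX's implementation: the per-method scores are made-up numbers, the function names are hypothetical, ties are assumed to share the mean of their positions, and the sketch stops at the average rank without guessing how ARAX turns it into the reported score.

```python
def rank_descending(scores):
    """Rank scores so the highest gets rank 1; tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks


def average_rank(score_lists):
    """Average each result's rank across several scoring methods."""
    per_method = [rank_descending(scores) for scores in score_lists]
    n_results = len(score_lists[0])
    return [sum(r[i] for r in per_method) / len(per_method) for i in range(n_results)]


# Made-up scores for three results (A, B, C) under the three structural methods.
max_flow = [0.9, 0.4, 0.7]
frobenius_norm = [2.1, 1.0, 2.5]
max_weighted_path = [0.8, 0.3, 0.6]

combined = average_rank([max_flow, frobenius_norm, max_weighted_path])
best = min(range(len(combined)), key=lambda i: combined[i])
print(combined, "best result index:", best)
```

A lower average rank is better here. Since a rank depends on which other results are present, the same result can land at a different value in a different query, which is one way to see why scores are only comparable within a query.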