A CTEC Parser
A parser designed for free text clinical trial eligibility criteria (CTEC). Parsing free text CTEC and formalizing into OMOP CDM v5 table. The parser was trained on 250 clinical trials on Alzheimer's. The annotation guidelines is in folder Supple Materials.
Developed in Dr. Chunhua Weng's lab in Department of Biomedical Informatics at Columbia
Author: Tian Kang
Affiliation: Department of Biomedical Informatics, Columbia University
Contact Email: [email protected]
Last update: June 20, 2016 (add Negation detection in NER step)
Version: 1.0
Primary steps:
- Entity recogntion
- Attribute recognition
- Clinical relation identification
- Data standardization
Exmaple input:
Age 70 years or older.
Must have a clinical diagnosis of Alzheimer's Disease.
Life expectancy less than 1 year.
Subjects must be in reasonably good health, based on medical history, physical examination, vital signs, and ECG.
Subjects with a past or current history of seizures cannot participate.
Subjects with clinically significant heart disease, pulmonary disease, diabetes, neurologic or psychiatric disease (Group 1 subjects must have Alzheimer's Disease), or any other illness that could interfere with interpretation of study results.
Current use of donepezil, rivastigmine or galantamine.
GDS-5 score < 6.
Within the previous 2 years, unstable and clinically siginificant cardivascular disease.
Example output:
<root>
<sent>
<text>Age 70 years or older .</text>
<entity class="Observation" index="T1" negated="N" relation="T2:has_value" start="0"> Age </entity>
<attribute class="Measurement" index="T2" start="1"> 70 years or older </attribute>
</sent>
<sent>
<text>Must have a clinical diagnosis of Alzheimer's Disease .</text>
<entity class="Condition" index="T3" negated="N" relation="None" start="6"> Alzheimer's Disease </entity>
</sent>
<sent>
<text>Life expectancy less than 1 year .</text>
<entity class="Observation" index="T4" negated="N" relation="T5:has_value" start="0"> Life expectancy </entity>
<attribute class="Measurement" index="T5" start="2"> less than 1 year </attribute>
</sent>
<sent>
<text>Subjects must be in reasonably good health , based on medical history , physical examination , vital signs , and ECG .</text>
<attribute class="Qualifier" index="T6" start="5"> good </attribute>
<entity class="Condition" index="T7" negated="N" relation="T6:modified_by" start="6"> health </entity>
<entity class="Observation" index="T8" negated="N" relation="T6:modified_by" start="11"> history </entity>
<entity class="Observation" index="T9" negated="N" relation="None" start="13"> physical examination </entity>
<entity class="Observation" index="T10" negated="N" relation="None" start="16"> vital signs </entity>
<entity class="Procedure_Device" index="T11" negated="N" relation="None" start="20"> ECG </entity>
</sent>
<sent>
<text>Subjects with a past or current history of seizures cannot participate .</text>
<entity class="Condition" index="T12" negated="Y" relation="None" start="8"> seizures cannot participate </entity>
</sent>
<sent>
<text>Subjects with clinically significant heart disease , pulmonary disease , diabetes , neurologic or psychiatric disease ( Group 1 subjects must have Alzheimer's Disease ) , or any other illness that could interfere with interpretation of study results .</text>
<attribute class="Qualifier" index="T13" start="2"> clinically significant </attribute>
<entity class="Condition" index="T14" negated="N" relation="T13:modified_by" start="4"> heart disease </entity>
<entity class="Condition" index="T15" negated="N" relation="None" start="7"> pulmonary disease </entity>
<entity class="Condition" index="T16" negated="N" relation="None" start="10"> diabetes </entity>
<entity class="Condition" index="T17" negated="N" relation="None" start="12"> neurologic or psychiatric disease </entity>
<attribute class="Qualifier" index="T18" start="28"> other </attribute>
<entity class="Condition" index="T19" negated="N" relation="T18:modified_by" start="29"> illness </entity>
</sent>
<sent>
<text>Current use of donepezil , rivastigmine or galantamine .</text>
<entity class="Drug" index="T20" negated="N" relation="None" start="3"> donepezil </entity>
<entity class="Drug" index="T21" negated="N" relation="None" start="5"> rivastigmine </entity>
</sent>
<sent>
<text>GDS-5 score smaller than 6 .</text>
<entity class="Observation" index="T22" negated="N" relation="T23:has_value" start="0"> GDS-5 score </entity>
<attribute class="Measurement" index="T23" start="2"> smaller than 6 </attribute>
</sent>
<sent>
<text>Within the previous 2 years , unstable and clinically siginificant cardivascular disease .</text>
<attribute class="Temporal_measurement" index="T24" start="0"> Within the previous 2 years </attribute>
<attribute class="Qualifier" index="T25" start="6"> unstable </attribute>
<attribute class="Qualifier" index="T26" start="8"> clinically siginificant </attribute>
<entity class="Condition" index="T27" negated="N" relation="T24:has_temp|T26:modified_by|T25:modified_by" start="10"> cardivascular disease </entity>
</sent>
</root>
First download all codes and decompress
Fast Usage:
- open
wrapper_for_parsing.sh
- set the parameter lists to your task-based ones
- run
sh wrapper_for_parsing.sh
and parsing results will be generated in XML files.
(See example output directly running sh wrapper_for_parsing.sh
without changing)
Step-by-step Usage:
- NER step: run
python NamedEntityRecognition.py $1:<input directory> $2:<input text name> $3:<output directory>
- Clinical Relation: run
python Relation.py $3:<output directory> $2:<input text name>
Example commands:
python NamedEntityRecognition.py output test.txt output
python Relation.py output test.txt
The example output would be output/test_NER.xml and output/test_Parsed.xml
-
This parser assumes MetaMap is installed and requires that the MetaMap support services are running. If you have MetaMap installed in
$MM
, these can be started as:$MM/bin/skrmedpostctl start
$MM/bin/wsdserverctl start
Go to
features_dir
and openmetamap_tag.sh
; follow the guidance to change the MetaMap root dir and start running -
Python package required:
nltk
networkx
codecs
libsvm
practnlptools
- Stadardize entities and attributes concepts using OHDSI standards
- Convert the final format into JSON
- Extend use case to more diseases
Requirements
- Docker
Usage
Make sure to rename .env.example
to .env
and update the AWS S3 credentials.
Usage: ./deployment/bin/compose [up | eliie [bash | parse | skr | wsd]]