Section Headings
DeepPhe uses a list of section headings to identify parts of notes and reports to consider more important than other parts. This helps reduce the amount of noise (false positives) within the system's output.
The default list of section headings is contained in the sections.txt file.
The first few lines are comments about the file.
Note that, by default, the section headings in the list are case-sensitive.
You can modify the existing file or you can create a copy and modify that copy.
To make the system use a modified copy, update the file data/pipeline/DeepPhe.document.piper so that this line
add Sectionizer
looks like the following line, with OUR.sections.txt replaced with your filename (and directory).
# Use customized list of section headers
add Sectionizer sections_file=/org/apache/ctakes/cancer/sections/OUR.sections.txt
You can add variations of an existing section heading to the end of its line, separated by commas.
For example, if your institution uses **FINAL REPORT as a heading, you can update the line
Final Report,Final Report
to be
Final Report,Final Report,\*\*FINAL REPORT
Note that each asterisk is escaped with a backslash because the asterisk is one of the characters that must be escaped, as described in the comments at the top of the sections.txt file.
You can add new section headings too. To do so, add a new line to the sections.txt file in the following format:
- the line starts with a nicely readable name for the section
- after that, add a comma
- after that, add a comma-separated list of each variation that should be recognized as equivalent
For example, if your reports contain a section with the heading 'GENOMIC RESULTS:', you might add the following line
Genomic Results,GENOMIC RESULTS:
Note that this line would recognize only the uppercase heading.
If you also want a heading of 'Genomic Results' to be recognized, your line would be
Genomic Results,Genomic Results,GENOMIC RESULTS:
If you also want the heading GENOMIC RESULTS (without the trailing colon) to be recognized, you could add that as well.
Genomic Results,Genomic Results,GENOMIC RESULTS:,GENOMIC RESULTS
However, DeepPhe appends an optional trailing colon to every pattern, so this would suffice:
Genomic Results,Genomic Results,GENOMIC RESULTS
This example line
Genomic Results,Genomic,GENOMIC RESULTS:
would identify the following lines as the start of a Genomic Results section:
Genomic
Genomic:
GENOMIC RESULTS:
The following would not be recognized as the start of a Genomic Results section:
Genomic Result (additional word on the line)
GENOMIC RESULTS (missing trailing colon)
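The matching behavior shown by the example line above can be approximated with a short Python sketch. This is only an illustration under stated assumptions, not DeepPhe's actual implementation: it assumes each variation is treated as a regular expression (which is what the escaping rule suggests), that an optional trailing colon is allowed after each variation, and that the heading must occupy the whole line. The function names parse_section_line and is_heading are hypothetical, and the simple comma split ignores the comment lines and any escaping rules described at the top of sections.txt.

```python
import re


def parse_section_line(line):
    """Split 'Name,variation1,variation2' into (name, [variations]).

    Simplification: splits on every comma and does not handle comments
    or escaped characters.
    """
    name, *variations = line.strip().split(",")
    return name, variations


def is_heading(text_line, variations):
    """Return True if the whole line is a variation, optionally followed by a colon.

    Assumption: each variation is a regular expression and matching is
    case-sensitive, as described for the default configuration.
    """
    candidate = text_line.strip()
    for variation in variations:
        if re.fullmatch(variation + r":?", candidate):
            return True
    return False


name, variations = parse_section_line("Genomic Results,Genomic,GENOMIC RESULTS:")
for candidate in ["Genomic", "Genomic:", "GENOMIC RESULTS:",
                  "Genomic Result", "GENOMIC RESULTS"]:
    print(f"{candidate!r} -> {is_heading(candidate, variations)}")
```

Under these assumptions the sketch accepts the three lines listed above as recognized and rejects the two listed as not recognized, which can be a quick way to sanity-check a new line before adding it to your sections file.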
You can use the SectionWriter component to see which sections of a document are being recognized.
Adding the following line to DeepPhe.document.piper after the add Sectionizer line creates a subdirectory containing files that list the section headings the system found in your reports and notes.
add org.healthnlp.deepphe.uima.cc.SectionWriter SubDirectory=SECTIONS
You can add the SectionWriter multiple times to see how the sections change as the pipeline runs, for example after Sectionizer and then again after SectionRemover.
// Discover sections.
add Sectionizer
// Write out the sections recognized
add org.healthnlp.deepphe.uima.cc.SectionWriter SubDirectory=SECTIONS
// Remove sections that should not be used by the rest of the pipeline.
add SectionRemover
// Using SectionWriter after SectionRemover writes out the headers of
// the sections that will be annotated
add org.healthnlp.deepphe.uima.cc.SectionWriter SubDirectory=SECTIONS_IMPORTANT