
Naming Input Files

johnsd11 edited this page Feb 11, 2022 · 3 revisions

NOTE: DeepPhe can only process plain text files. It does not handle files with embedded note text, markup tags, or binary formats (Word, PDF).

DeepPhe Input File Names

Follow these naming conventions to produce better results: DeepPhe gives sections different importance depending on the type of report or note. End each filename (before the extension) with one of the following suffixes, preceded by the underscore (_) character:

Clinical Notes:

  • NOTE
  • PGN
  • DS

Pathology Reports:

  • SP
  • PATH

Radiology Reports:

  • RAD

Examples

     fake_patient1_doc1_RAD.txt
     fake_patient1_doc2_SP.txt
     fake_patient1_doc3_NOTE.txt
     fake_patient1_doc4_DS.txt
     fake_patient1_doc9_PATH.txt

Meaning of the Abbreviations

  • PGN - Progress Note
  • DS - Discharge Summary
  • NOTE - Clinical Note
  • SP - Surgical Pathology
  • PATH - Pathology Report
  • RAD - Radiology Report
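
The suffix convention above can be checked before running DeepPhe. The following is a minimal sketch, not part of DeepPhe itself: the names `SUFFIX_TYPES` and `report_type` are hypothetical, and the exact-case match on the suffix is an assumption, since this page does not say whether DeepPhe treats suffixes as case-sensitive.

```python
from pathlib import Path

# Suffix-to-report-type table copied from the abbreviation list above.
SUFFIX_TYPES = {
    "NOTE": "Clinical Note",
    "PGN": "Progress Note",
    "DS": "Discharge Summary",
    "SP": "Surgical Pathology",
    "PATH": "Pathology Report",
    "RAD": "Radiology Report",
}


def report_type(filename):
    """Return the report type implied by the filename suffix, or None.

    Hypothetical helper: it looks only at the text after the last
    underscore in the name (extension removed) and assumes an
    exact, upper-case match.
    """
    stem = Path(filename).stem  # e.g. "fake_patient1_doc1_RAD"
    if "_" not in stem:
        return None
    suffix = stem.rsplit("_", 1)[1]
    return SUFFIX_TYPES.get(suffix)
```

A call such as `report_type("fake_patient1_doc2_SP.txt")` returns "Surgical Pathology", while a name without a recognized suffix returns `None`, which makes it easy to flag files that may need renaming.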

DeepPhe Input File Directories

DeepPhe associates notes with patients based upon directory structure. Create a single directory to contain a collection of patients. Within that directory create a directory per patient. Place all notes associated with a patient in that patient's directory.

Example

breast_cancer_corpus/
       patient01/
              patient01_05_01_2019_NOTE.txt
              patient01_05_01_2019_RAD.txt
       patient02/
              hospitalA_visit4_NOTE.txt
              procedureXX_DS.txt
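
To confirm that a corpus follows the layout above, the directory tree can be listed per patient. This is a rough sketch under stated assumptions rather than DeepPhe's own logic: `notes_by_patient` is a hypothetical helper, and it assumes notes are `.txt` files placed directly inside each patient directory.

```python
from pathlib import Path


def notes_by_patient(corpus_dir):
    """Map each patient directory name to a sorted list of its .txt files.

    Hypothetical helper: every immediate subdirectory of corpus_dir is
    treated as one patient, and only files directly inside it are listed.
    """
    corpus = Path(corpus_dir)
    result = {}
    for patient_dir in sorted(p for p in corpus.iterdir() if p.is_dir()):
        result[patient_dir.name] = sorted(
            f.name
            for f in patient_dir.iterdir()
            if f.is_file() and f.suffix == ".txt"
        )
    return result
```

Calling `notes_by_patient("breast_cancer_corpus")` on the example tree would return one entry for `patient01` and one for `patient02`, each listing that patient's note files. A patient with an empty list signals misplaced or non-text notes.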