
llm_for_loinc

Use an LLM to predict standard LOINC codes from lab source codes

This repo hosts code for the final project of CSE 6250 at Georgia Tech. The goal is to reproduce key results from the paper Automated LOINC Standardization Using Pre-trained Large Language Models. The code is based on TensorFlow.

Dependencies

The dependencies are specified in requirements.txt. Run the following to install them:

pip install -r requirements.txt

Datasets and preprocessing

Two datasets are required for this project: 1) the official LOINC table, and 2) the MIMIC-III Clinical Database 1.4. Proper registration and/or training is needed to obtain both datasets, so they are not included in this repository. Once downloaded, place the data in the /datasets folder.
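The exact filenames the preprocessing script reads are defined in the script itself; a plausible layout, assuming the standard distribution filenames (Loinc.csv from the loinc.org release, D_LABITEMS.csv from the MIMIC-III download), would look like:

```
datasets/
├── Loinc.csv        # official LOINC table from the loinc.org release
└── D_LABITEMS.csv   # MIMIC-III lab item dictionary from PhysioNet
```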

Before model training and testing, run the following to preprocess and augment the data:

python preprocessing/data_processing.py
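The referenced paper builds contrastive training pairs by perturbing source strings. As a minimal sketch of that augmentation idea, the snippet below applies two simple perturbations, random character deletion and random word swapping; the function names and parameters are hypothetical illustrations, not the repo's actual preprocessing API.

```python
import random

def random_char_delete(text: str, p: float = 0.1) -> str:
    # Drop each character independently with probability p.
    kept = [c for c in text if random.random() > p]
    return "".join(kept) or text

def random_word_swap(text: str, n_swaps: int = 1) -> str:
    # Swap n_swaps randomly chosen pairs of words.
    words = text.split()
    for _ in range(n_swaps):
        if len(words) < 2:
            break
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)

source = "hemoglobin a1c blood"  # a raw lab source string
positives = [random_char_delete(source), random_word_swap(source)]
```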

Model training and testing

Model development consists of several stages. Use the following commands to train and test each stage (a hedged embedding-and-retrieval sketch follows the list):

  • Test the pre-trained Sentence-T5 model (the module includes code that downloads the pre-trained weights from TensorFlow Hub)
python test/test_pretained.py
  • Train and test first-stage fine-tuning
python train/train_first_stage.py
python test/test_first_stage.py
  • Train and test second-stage fine-tuning
python train/train_second_stage.py
python test/test_second_stage.py
  • Skip first-stage fine-tuning and go directly to second-stage fine-tuning
python test/test_skip_stage.py
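As a rough illustration of what the pre-trained test does, the sketch below embeds lab source strings and candidate LOINC display names with Sentence-T5 from TensorFlow Hub and matches each source to its nearest target by cosine similarity. The hub handle, example strings, and the embed helper are assumptions for illustration, not the repo's actual code.

```python
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401 -- registers the SentencePiece ops ST5 needs

# Hub handle is an assumption; the repo may pin a different ST5 size/version.
encoder = hub.KerasLayer("https://tfhub.dev/google/sentence-t5/st5-base/1")

def embed(texts):
    outputs = encoder(tf.constant(texts))
    # Some ST5 SavedModels wrap the [batch, 768] embedding tensor
    # in a tuple/list; unwrap defensively.
    embeddings = outputs[0] if isinstance(outputs, (list, tuple)) else outputs
    return tf.nn.l2_normalize(embeddings, axis=1)

sources = ["hgb a1c", "sodium serum"]  # lab source strings
targets = ["Hemoglobin A1c/Hemoglobin.total in Blood",
           "Sodium [Moles/volume] in Serum or Plasma"]  # LOINC long names

# Cosine similarity = dot product of L2-normalized vectors.
sim = tf.matmul(embed(sources), embed(targets), transpose_b=True)
nearest = tf.argmax(sim, axis=1).numpy()  # index of best LOINC match per source
```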
