A sophisticated Natural Language Processing (NLP) system specifically designed for medical text analysis. This pipeline combines state-of-the-art NLP models to extract meaningful information from medical texts, including patient information, conditions, temporal data, and intent classification.
- Advanced Entity Extraction: Powered by GLiNER for accurate medical entity recognition
- Smart Intent Classification: Zero-shot classification using a BART model
- Temporal Information Analysis: Precise extraction of dates, times, and frequencies
- Structured Data Output: Well-organized JSON output format
- Medical Domain Specialization: Optimized for medical terminology and context
- Patient Information
  - Names
  - Age
  - Gender
  - Patient IDs
- Medical Information
  - Conditions
  - Symptoms
  - Medications
  - Procedures
  - Tests
  - Dosages
  - Frequencies
- Temporal Information
  - Dates
  - Times
  - Durations
  - Frequencies
- Location Information
  - Hospitals
  - Departments
  - Rooms
- Python 3.8+
- 4GB+ RAM
- CUDA-compatible GPU (optional, for faster processing)
transformers>=4.30.0
torch>=2.0.0
spacy>=3.5.0
gliner>=1.0.0
pydantic>=2.0.0
fastapi>=0.100.0
uvicorn>=0.22.0
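To confirm that an environment meets the minimums listed above, a small check along these lines may help. It is only a sketch: the `check_requirements` and `parse_version` helpers are hypothetical (they are not part of this repository), and the version comparison is deliberately simple, looking only at the first three numeric components.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the requirements list above.
REQUIRED = {
    "transformers": "4.30.0",
    "torch": "2.0.0",
    "spacy": "3.5.0",
    "gliner": "1.0.0",
    "pydantic": "2.0.0",
    "fastapi": "0.100.0",
    "uvicorn": "0.22.0",
}


def parse_version(text):
    """Turn a string such as '2.1.0+cu118' into a tuple such as (2, 1, 0)."""
    parts = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)  # leading digits only, so '0rc1' -> 0
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (3 - len(parts))  # pad '3.5' to (3, 5, 0)
    return tuple(parts)


def check_requirements(required=REQUIRED):
    """Return {package: problem} for every missing or outdated package."""
    problems = {}
    for name, minimum in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if parse_version(installed) < parse_version(minimum):
            problems[name] = f"{installed} < {minimum}"
    return problems


if __name__ == "__main__":
    for name, problem in check_requirements().items():
        print(f"{name}: {problem}")
```

An empty result means every listed package is present at or above its minimum version.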
- Clone the repository
git clone https://github.com/MohamedSebaie/Medical_Task_Management_ChatBot.git
cd Medical_Task_Management_ChatBot
- Create a virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install required packages
pip install -r requirements.txt
- Install the spaCy model
python -m spacy download en_core_web_sm
from app.services.nlp_pipeline import MedicalNLPPipeline
# Initialize pipeline
nlp = MedicalNLPPipeline()
# Process text
text = "Patient John Doe, 45 years old, presents with diabetes and hypertension"
result = nlp.process_text(text)
# Access results
print(result["entities"]) # Extracted entities
print(result["intent"]) # Classified intent
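from fastapi import FastAPI
from app.services.nlp_pipeline import MedicalNLPPipeline

app = FastAPI()
nlp = MedicalNLPPipeline()  # load the models once, when the module is imported

@app.post("/process")
async def process_text(text: str):
    # A bare `str` parameter is read from the query string: POST /process?text=...
    return nlp.process_text(text)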
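A client for the endpoint above could look roughly like the following sketch. It relies on two assumptions: the server is reachable at `http://localhost:8000` (the default uvicorn port), and `text` is sent as a query parameter, which is how FastAPI interprets a plain `str` argument. The helper names `build_process_request` and `process_remote` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_process_request(text, base_url="http://localhost:8000"):
    """Build a POST request for /process with `text` in the query string."""
    query = urllib.parse.urlencode({"text": text})
    return urllib.request.Request(f"{base_url}/process?{query}", method="POST")


def process_remote(text, base_url="http://localhost:8000"):
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(build_process_request(text, base_url)) as response:
        return json.load(response)


if __name__ == "__main__":
    request = build_process_request("Patient John Doe, 45 years old")
    print(request.get_method(), request.full_url)
```

Only `process_remote` touches the network; `build_process_request` can be inspected without a server running.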
{
  "intent": {
    "primary_intent": "add_patient",
    "confidence": 0.944
  },
  "entities": {
    "patient_info": [
      {
        "text": "John Doe",
        "type": "patient",
        "confidence": 0.995
      },
      {
        "text": "45 years old",
        "type": "age",
        "confidence": 0.990
      }
    ],
    "medical_info": [
      {
        "text": "diabetes",
        "type": "condition",
        "confidence": 0.986
      },
      {
        "text": "hypertension",
        "type": "condition",
        "confidence": 0.985
      }
    ]
  },
  "temporal_info": {
    "dates": [],
    "times": [],
    "patterns": []
  },
  "processed_at": "2024-12-20T02:38:48.669624"
}
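Because every entity carries a confidence score, downstream code may want to keep only the confident ones. The sketch below walks the `entities` section of a result shaped like the sample output above; the `confident_entities` helper and the 0.9 default threshold are illustrative assumptions, not part of the pipeline.

```python
def confident_entities(result, threshold=0.9):
    """Yield (group, type, text) for entities at or above the threshold."""
    for group, entities in result.get("entities", {}).items():
        for entity in entities:
            if entity["confidence"] >= threshold:
                yield group, entity["type"], entity["text"]


# Trimmed copy of the sample output, used here in place of a real pipeline call.
sample = {
    "entities": {
        "patient_info": [
            {"text": "John Doe", "type": "patient", "confidence": 0.995},
            {"text": "45 years old", "type": "age", "confidence": 0.990},
        ],
        "medical_info": [
            {"text": "diabetes", "type": "condition", "confidence": 0.986},
            {"text": "hypertension", "type": "condition", "confidence": 0.985},
        ],
    }
}

if __name__ == "__main__":
    for group, kind, text in confident_entities(sample, threshold=0.99):
        print(f"{group}: {kind} = {text}")
```

With a threshold of 0.99 only the two `patient_info` entries pass; lowering it to the default lets the conditions through as well.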
- GLiNER:
  - Primary entity extraction engine
  - Pre-trained on medical data
  - Advanced medical terminology handling
- Zero-shot classifier:
  - Intent classification using BART
  - Flexible classification system
  - Multiple medical intent support
- spaCy:
  - Linguistic feature extraction
  - Temporal information processing
  - Text preprocessing
- Text Input → Preprocessing
- Parallel Processing:
  - Entity Extraction (GLiNER)
  - Intent Classification (Zero-shot)
  - Temporal Analysis (spaCy)
- Result Structuring
- Output Generation
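The flow above can be sketched with a thread pool. This is an illustration of the fan-out and merge shape only: the three worker functions are hypothetical stubs standing in for the GLiNER, zero-shot, and spaCy stages, and how the actual pipeline schedules them is not stated here.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime


def preprocess(text):
    """Collapse runs of whitespace; a stand-in for the real preprocessing."""
    return " ".join(text.split())


# Hypothetical stubs: each one represents a model call in the real pipeline.
def extract_entities(text):
    return {"patient_info": [], "medical_info": []}


def classify_intent(text):
    return {"primary_intent": None, "confidence": 0.0}


def analyze_temporal(text):
    return {"dates": [], "times": [], "patterns": []}


def process_text(text):
    """Preprocess once, run the three analyses concurrently, merge the results."""
    clean = preprocess(text)
    with ThreadPoolExecutor(max_workers=3) as pool:
        entities = pool.submit(extract_entities, clean)
        intent = pool.submit(classify_intent, clean)
        temporal = pool.submit(analyze_temporal, clean)
        # .result() blocks until the corresponding stage has finished.
        return {
            "intent": intent.result(),
            "entities": entities.result(),
            "temporal_info": temporal.result(),
            "processed_at": datetime.now().isoformat(),
        }


if __name__ == "__main__":
    print(process_text("Patient  John Doe,   45 years old"))
```

Threads are used here because model inference in PyTorch typically releases the GIL during heavy computation; a process pool or sequential calls would produce the same merged structure.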
CUDA_VISIBLE_DEVICES=0
MODEL_PATH=/path/to/models
LOG_LEVEL=INFO
# config.py
DEFAULT_CONFIG = {
    "model_name": "urchade/gliner_base",
    "device": -1,  # CPU
    "batch_size": 32,
    "max_length": 512
}
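One way the environment variables and `DEFAULT_CONFIG` could be combined is sketched below. The mapping is an assumption made for illustration: the `load_config` helper, the `model_path` and `log_level` keys, and the rule that any visible GPU selects device `0` are not taken from the repository.

```python
import os

DEFAULT_CONFIG = {
    "model_name": "urchade/gliner_base",
    "device": -1,  # CPU
    "batch_size": 32,
    "max_length": 512,
}


def load_config(env=None):
    """Overlay environment settings on a copy of DEFAULT_CONFIG."""
    env = os.environ if env is None else env
    config = dict(DEFAULT_CONFIG)  # copy, so the defaults are never mutated

    # Assumption: if any GPU is visible, use the first one; otherwise stay on CPU.
    visible = env.get("CUDA_VISIBLE_DEVICES", "").strip()
    if visible and visible != "-1":
        config["device"] = 0

    # Assumption: these two keys are illustrative additions to the config.
    config["model_path"] = env.get("MODEL_PATH")
    config["log_level"] = env.get("LOG_LEVEL", "INFO").upper()
    return config


if __name__ == "__main__":
    print(load_config())
```

Passing a plain dictionary as `env` makes the function easy to exercise without touching the real environment.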
Add a new patient Sarah Miller, female, 32 years old, with hypertension
Assign medication Lisinopril 10mg daily for Sarah Miller
Schedule appointment for Sarah Miller next Monday at 2 PM
Add a new patient Michael Chen, male, 55 years old, with arthritis
Assign medication Methotrexate 1000mg weekly for Michael Chen
Schedule follow-up for Michael Chen in 2 weeks
Add a new patient Emily Wilson, female, 28 years old, with migraine
Assign medication SuperPainAway 250mg for Emily Wilson
Book appointment for Emily Wilson tomorrow morning
Add a new patient Robert Brown, male, 61 years old, with high cholesterol
Assign medication Atorvastatin 40mg for Robert Brown
Schedule check-up for Robert Brown next Friday
Add a new patient Lisa Anderson, female, 35 years old, with pregnancy
Assign medication Accutane 20mg daily for Lisa Anderson
Schedule prenatal check for Lisa Anderson next week
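To make the dosage and frequency fields in commands like these concrete, here is a rough regex-based sketch. It is not how the pipeline extracts them (that is done by the models described earlier); the patterns, the unit list, and the `parse_medication_command` helper are all illustrative assumptions.

```python
import re

# Illustrative patterns only: a number followed by a unit, and a few frequency words.
DOSAGE = re.compile(r"\b(\d+(?:\.\d+)?)\s*(mg|mcg|g|ml)\b", re.IGNORECASE)
FREQUENCY = re.compile(r"\b(daily|weekly|monthly)\b", re.IGNORECASE)


def parse_medication_command(command):
    """Return the first dosage and frequency found in the command, if any."""
    dosage = DOSAGE.search(command)
    frequency = FREQUENCY.search(command)
    return {
        "dosage": f"{dosage.group(1)} {dosage.group(2).lower()}" if dosage else None,
        "frequency": frequency.group(1).lower() if frequency else None,
    }


if __name__ == "__main__":
    print(parse_medication_command(
        "Assign medication Lisinopril 10mg daily for Sarah Miller"
    ))
```

A command without a frequency word, such as the Atorvastatin example, yields `None` for `frequency` rather than a guess.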
- GLiNER Initialization Errors
  - Solution: Verify CUDA installation and model paths
- Memory Issues
  - Solution: Reduce batch size or use CPU mode
- Performance Optimization
  - Solution: Enable batch processing for large datasets
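For the memory and batch-processing advice above, a chunking helper such as the following may be a useful starting point. It is a sketch under assumptions: `batched` and `process_in_batches` are hypothetical helpers, and `process_batch` is any callable that takes a list of texts and returns a list of results, since whether the pipeline exposes a batch API is not stated here.

```python
def batched(items, batch_size=32):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def process_in_batches(texts, process_batch, batch_size=32):
    """Apply `process_batch` to each chunk and concatenate the results in order."""
    results = []
    for batch in batched(texts, batch_size):
        results.extend(process_batch(batch))
    return results


if __name__ == "__main__":
    # Stand-in for a model call: upper-case every text in the batch.
    print(process_in_batches(["a", "b", "c"], lambda batch: [t.upper() for t in batch], batch_size=2))
```

Lowering `batch_size` trades throughput for a smaller peak memory footprint; with only a per-text method available, `process_batch` can simply be `lambda batch: [nlp.process_text(t) for t in batch]`.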
- Entity Recognition Accuracy: ~95%
- Intent Classification Accuracy: ~94%
- Processing Speed: ~100ms per text (GPU)
- Maximum Text Length: 512 tokens
# Run tests
pytest tests/
# Run specific test
pytest tests/test_pipeline.py -k test_entity_extraction
# Format code
black .
# Check types
mypy .
Full documentation is available in the /docs directory.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
- GLiNER team for the base model
- Hugging Face for transformers library
- spaCy for NLP utilities
Mohamed Sebaie - LinkedIn Profile
Project Link: https://github.com/MohamedSebaie/Medical_Task_Management_ChatBot
⭐ If you found this project useful, please give it a star!