This project is an end-to-end deep learning implementation for classifying chest cancer from CT scan images. It uses MLflow and DVC to apply MLOps practices: experiment tracking, modular pipeline design, and containerized deployment.
- Comprehensive Workflow:
  - Modular implementation of deep learning pipelines.
  - Uses MLflow for experiment tracking and model registration.
  - Tracks and optimizes data pipelines with DVC.
- Scalable Framework:
  - Seamless integration of MLflow with local and remote servers.
  - CI/CD-based deployments using Docker and cloud platforms.
- User-Friendly Interface:
  - Deployable web app for cancer classification.
  - Real-time predictions from CT scan images.
- Languages: Python
- Libraries: TensorFlow, Keras, MLflow, DVC
- Deployment: Docker, AWS, Azure
- Frontend: HTML, CSS (basic)
- Version Control: GitHub
├── app
│ ├── api
│ ├── pipeline
│ ├── templates
│ └── utils
├── data
│ ├── raw
│ └── processed
├── experiments
│ └── MLflow runs and metadata
├── docker
├── models
├── notebooks
│ └── experiment_notebooks
└── requirements.txt
- Python: Install Python 3.8 or later.
- MLflow: Set up MLflow locally or on a remote server like DagsHub or AWS.
- DVC: Install DVC for data versioning.
- Docker: Required for deployment.
- Cloud Accounts: AWS or Azure credentials for deployment.
- Clone the repository:
  git clone https://github.com/yourusername/chest-cancer-classification.git
  cd chest-cancer-classification
- Install dependencies:
  pip install -r requirements.txt
- Set up MLflow tracking URI:
  - For local: http://localhost:5000
  - For remote: Configure with your cloud-based URI.
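As a minimal sketch of the tracking URI step, the snippet below reads the URI from the `MLFLOW_TRACKING_URI` environment variable (which MLflow itself honors) and falls back to the local address given above. The helper names `resolve_tracking_uri` and `configure_mlflow` are hypothetical and not part of this repository.

```python
import os

# Default matches the local URI given above. MLFLOW_TRACKING_URI is the
# environment variable that MLflow reads for its tracking server address.
DEFAULT_TRACKING_URI = "http://localhost:5000"


def resolve_tracking_uri(env=None):
    """Return the tracking URI from the environment, or the local default."""
    env = os.environ if env is None else env
    uri = env.get("MLFLOW_TRACKING_URI", "").strip()
    return uri or DEFAULT_TRACKING_URI


def configure_mlflow(env=None):
    """Point MLflow at the resolved URI if MLflow is installed."""
    uri = resolve_tracking_uri(env)
    try:
        import mlflow
    except ImportError:
        return uri  # MLflow is not installed, so there is nothing to configure
    mlflow.set_tracking_uri(uri)
    return uri
```

With this approach, switching between a local and a remote server only requires changing one environment variable, not the code.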
- Data Ingestion:
  - Raw CT scan images are processed into labeled datasets.
  - Metadata is managed using DVC.
- Preprocessing:
  - Images are normalized and augmented before training.
- Training and Experimentation:
  - The model is trained using TensorFlow with hyperparameter tuning.
  - MLflow tracks all experiment parameters, metrics, and artifacts.
- Deployment:
  - A web application accepts images and returns predictions.
  - The Dockerized app is deployed on AWS or Azure using CI/CD pipelines.
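The experiment tracking part of the training step could look roughly like the sketch below. It is an illustration under assumptions, not the project's actual training code: `flatten_params` and `log_training_run` are hypothetical helpers, and the shape of the config dictionary is assumed. The MLflow calls used (`start_run`, `log_params`, `log_metrics`) are part of MLflow's public API. Artifact logging is left out to keep the sketch short.

```python
def flatten_params(config, prefix=""):
    """Flatten a nested config dict into dotted keys, e.g. {"model.lr": 0.001}."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat


def log_training_run(config, metrics, run_name=None):
    """Log parameters and final metrics to MLflow and return what was logged."""
    params = flatten_params(config)
    try:
        import mlflow
    except ImportError:
        return params, metrics  # MLflow is not installed, so logging is skipped
    with mlflow.start_run(run_name=run_name):
        mlflow.log_params(params)    # one entry per flattened config key
        mlflow.log_metrics(metrics)  # e.g. final accuracy and loss
    return params, metrics
```

Flattening the config first keeps nested hyperparameters searchable as individual parameters in the MLflow UI.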
- MLflow Experiment Tracking:
  - Automatic logging of parameters, metrics, and models.
  - Supports both local and remote tracking servers.
- DVC Integration:
  - Tracks dataset versions and ensures data consistency.
- Web Interface:
  - Intuitive image uploader to classify chest cancer from CT scans.
  - Predicts cancer types such as Adenocarcinoma.
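To illustrate how a prediction could be turned into a readable label, here is a small sketch that picks the highest-scoring class from a list of model output probabilities. Only "Adenocarcinoma" is named in this README; the other class names below are hypothetical placeholders, and `decode_prediction` is an assumed helper, not a function from this repository.

```python
# Only "Adenocarcinoma" comes from this README. The remaining labels are
# hypothetical placeholders used purely for illustration.
CLASS_NAMES = ["Adenocarcinoma", "Class B (hypothetical)", "Class C (hypothetical)"]


def decode_prediction(probabilities, class_names=CLASS_NAMES):
    """Return (label, confidence) for the highest-scoring class."""
    if len(probabilities) != len(class_names):
        raise ValueError("expected one probability per class")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_names[best], float(probabilities[best])
```

The order of `CLASS_NAMES` would need to match the class order used when the model was trained.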
- Run MLflow Server:
  mlflow ui
- Execute Training Pipeline:
  python app/api/train.py --config config/train_config.yaml
- Launch Application:
  docker-compose up
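The `--config` flag in the training command above could be handled with a small `argparse` entry point like the one sketched here. This is an assumption about how a script such as `app/api/train.py` might read its arguments, not a copy of the actual file, and `parse_args` is a hypothetical name.

```python
import argparse


def parse_args(argv=None):
    """Parse the --config option used in the training command."""
    parser = argparse.ArgumentParser(description="Run the training pipeline.")
    parser.add_argument(
        "--config",
        required=True,
        help="Path to the training config file, e.g. config/train_config.yaml",
    )
    # argv=None makes argparse fall back to sys.argv[1:]
    return parser.parse_args(argv)
```

Passing `argv` explicitly, as in `parse_args(["--config", "config/train_config.yaml"])`, makes the parser easy to exercise without touching the real command line.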
- Expand dataset for better accuracy.
- Improve web app UI using modern frameworks.
- Automate pipeline with more MLOps tools.
We welcome contributions! Please fork the repository and submit a pull request.
MIT License - See LICENSE for details.