Implementation of a basic NLP text summarization task as a complete end-to-end ML project.
- Create `template.py` to generate the structure of the project
- Create a Python virtual environment specifically for this project
- Write the required packages in `requirements.txt`
Package Name | Usage |
---|---|
transformers | Pretrained Models |
transformers[sentencepiece] | Unsupervised text tokenizer and detokenizer (SentencePiece) |
datasets | Efficient data preprocessing and access to public datasets |
rouge-score | Evaluating auto-generated text against human-written summaries |
py7zr | Compress, decompress, encrypt, and decrypt 7z archives |
pandas | Working with tabular data |
nltk | Text processing library for classification, tokenization, stemming, ... |
tqdm | Progress bars for loops |
PyYAML | YAML parser and emitter |
matplotlib | Static, animated, and interactive visualizations |
torch | Deep learning library (PyTorch) |
notebook | Jupyter notebook environment |
boto3 | AWS SDK for Python |
mypy-boto3-s3 | Type annotations for the boto3 S3 service |
python-box | Dictionary replacement with dot-notation access |
ensure | Assertion helper |
fastapi | Web framework for building APIs |
uvicorn | ASGI (Asynchronous Server Gateway Interface) web server for Python |
jinja2 | Template engine |
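The `template.py` step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the file list is assembled from the paths mentioned in this README, and placing `logging`, `utils`, and `entity` under `src/` is an assumption.

```python
import logging
from pathlib import Path

logging.basicConfig(level=logging.INFO, format="[%(asctime)s]: %(message)s")

# Paths taken from the steps in this README. Putting logging/, utils/ and
# entity/ under src/ is an assumption; adjust to the real layout.
FILES = [
    ".github/workflows/main.yaml",
    "src/logging/__init__.py",
    "src/utils/common.py",
    "src/entity/__init__.py",
    "src/constants/__init__.py",
    "src/config/configuration.py",
    "src/components/__init__.py",
    "src/pipeline/__init__.py",
    "config/config.yaml",
    "params.yaml",
    "requirements.txt",
    "main.py",
    "app.py",
    "Dockerfile",
]


def create_structure(files, root="."):
    """Create each file (and its parent directories) unless it already exists.

    Returns the list of paths that were newly created.
    """
    created = []
    for rel in files:
        path = Path(root) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        if path.exists():
            logging.info("Skipped existing file: %s", path)
        else:
            path.touch()
            created.append(path)
            logging.info("Created empty file: %s", path)
    return created


if __name__ == "__main__":
    create_structure(FILES)
```

Because existing files are skipped, the script can be re-run safely after new entries are added to the list.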
- Write code in `logging/__init__.py`
- Write code in `utils/common.py`
- Work on the `research/*.ipynb` files for training and saving the model
- Upload the dataset to a public host such as GitHub
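As a rough idea of what the logging and utility modules above might contain, here is a standard-library-only sketch. The logger name, log file name, and both function names are hypothetical. The real `utils/common.py` probably also relies on `PyYAML`, `python-box`, and `ensure` from the requirements table, which are left out here to keep the example self-contained.

```python
import logging
import sys
from pathlib import Path

LOG_FORMAT = "[%(asctime)s: %(levelname)s: %(module)s: %(message)s]"


def get_logger(name="textSummarizerLogger", log_dir="logs"):
    """Return a logger that writes to a log file and to stdout.

    The logger name and the log file name are illustrative assumptions.
    """
    log_path = Path(log_dir)
    log_path.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        formatter = logging.Formatter(LOG_FORMAT)
        handlers = (
            logging.FileHandler(log_path / "running_logs.log"),
            logging.StreamHandler(sys.stdout),
        )
        for handler in handlers:
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger


def create_directories(paths, logger=None):
    """Create every directory in `paths`, including missing parents."""
    for p in paths:
        Path(p).mkdir(parents=True, exist_ok=True)
        if logger is not None:
            logger.info("Created directory: %s", p)
```

Other modules would then call `get_logger()` once and reuse the returned logger, so that every stage writes to the same file.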
- Stages:
  - Data Ingestion 🟢
  - Data Validation 🟢
  - Data Transformation 🟢
  - Model Training 🟢
  - Model Evaluation 🟢
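One way to picture how the five stages run in sequence is a small runner that logs the start and end of each stage and stops at the first failure. This is only a sketch: `run_stages` and the placeholder stage function are hypothetical names, and the real stage bodies live in the project's pipeline modules.

```python
import logging

logging.basicConfig(level=logging.INFO, format="[%(levelname)s] %(message)s")
logger = logging.getLogger("pipeline")


def run_stages(stages):
    """Run (name, callable) pairs in order and return the completed names.

    An exception in any stage is logged and re-raised, so later stages
    never run on top of a failed one.
    """
    completed = []
    for name, stage in stages:
        logger.info(">>> Stage '%s' started", name)
        try:
            stage()
        except Exception:
            logger.exception("Stage '%s' failed", name)
            raise
        logger.info(">>> Stage '%s' completed", name)
        completed.append(name)
    return completed


def _placeholder_stage():
    """Stand-in for a real stage; does nothing."""


# Stage names follow the list above; the callables are placeholders.
STAGES = [
    ("Data Ingestion", _placeholder_stage),
    ("Data Validation", _placeholder_stage),
    ("Data Transformation", _placeholder_stage),
    ("Model Training", _placeholder_stage),
    ("Model Evaluation", _placeholder_stage),
]

if __name__ == "__main__":
    run_stages(STAGES)
```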
- Work on the TRAINING WORKFLOW for each stage:
  - Update the `config/config.yaml`
  - Initialize the `params.yaml`
  - Update the `entity` and define the dataclasses
  - Update the `src/constants/`
  - Update the `src/config/configuration.py`
  - Update the `src/components`
  - Update the `src/pipeline`
  - Update the `main.py`
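The workflow above (config file, entity dataclass, configuration manager, component) might fit together as in the following sketch for a single stage. All class names, field names, and the example URL are assumptions made for illustration. A plain dict stands in for the parsed `config/config.yaml` so the example runs without `PyYAML`.

```python
from dataclasses import dataclass
from pathlib import Path

# Stand-in for the parsed contents of config/config.yaml.
# Keys and values are hypothetical.
EXAMPLE_CONFIG = {
    "artifacts_root": "artifacts",
    "data_ingestion": {
        "root_dir": "artifacts/data_ingestion",
        "source_url": "https://example.com/data.zip",
        "local_data_file": "artifacts/data_ingestion/data.zip",
    },
}


@dataclass(frozen=True)
class DataIngestionConfig:
    """Entity: an immutable, typed view of one section of the config."""
    root_dir: Path
    source_url: str
    local_data_file: Path


class ConfigurationManager:
    """Turns raw config data into one entity object per stage."""

    def __init__(self, config: dict):
        self.config = config

    def get_data_ingestion_config(self) -> DataIngestionConfig:
        section = self.config["data_ingestion"]
        return DataIngestionConfig(
            root_dir=Path(section["root_dir"]),
            source_url=section["source_url"],
            local_data_file=Path(section["local_data_file"]),
        )


class DataIngestion:
    """Component: receives only the entity it needs, not the whole config."""

    def __init__(self, config: DataIngestionConfig):
        self.config = config

    def describe(self) -> str:
        return f"download {self.config.source_url} -> {self.config.local_data_file}"


if __name__ == "__main__":
    manager = ConfigurationManager(EXAMPLE_CONFIG)
    print(DataIngestion(manager.get_data_ingestion_config()).describe())
```

Freezing the dataclass means a component cannot accidentally modify its configuration at run time; the other stages would follow the same pattern with their own entity classes.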
- Before the model training stage:
  - The following pip commands must be executed:
    - `!pip install --upgrade accelerate`
    - `!pip uninstall -y transformers accelerate`
    - `!pip install transformers accelerate`
  - Update the `params.yaml` file
- Work on the PREDICTION PIPELINE:
  - Create `src/pipeline/prediction.py`
  - Update the `app.py`
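A prediction pipeline of the kind described above could look like the sketch below. To keep it runnable without downloading a model, the summarizer is injected as a callable. It is assumed to behave like a Hugging Face summarization pipeline, returning a list of dicts with a `summary_text` key. The class layout, the generation parameters, and `fake_summarizer` are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

Summarizer = Callable[..., List[Dict[str, str]]]


class PredictionPipeline:
    """Wraps a summarizer callable behind a single predict() method."""

    def __init__(self, summarizer: Summarizer, gen_kwargs: Optional[dict] = None):
        self.summarizer = summarizer
        # Illustrative generation settings, not values from the project.
        self.gen_kwargs = gen_kwargs or {
            "max_length": 128,
            "num_beams": 8,
            "length_penalty": 0.8,
        }

    def predict(self, text: str) -> str:
        if not text.strip():
            raise ValueError("Input text is empty")
        output = self.summarizer(text, **self.gen_kwargs)
        return output[0]["summary_text"]


def fake_summarizer(text: str, **kwargs) -> List[Dict[str, str]]:
    """Stand-in model: returns the first sentence as the 'summary'."""
    first_sentence = text.split(".")[0].strip()
    return [{"summary_text": first_sentence + "."}]


if __name__ == "__main__":
    pipeline = PredictionPipeline(fake_summarizer)
    print(pipeline.predict("The cat sat on the mat. It then slept."))
```

In `app.py`, a route handler would then only need to call `predict()` on a pipeline object built with the trained model and tokenizer in place of the stand-in.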
- Work on the DEPLOYMENT PHASE:
  - Update `Dockerfile`
  - Create `.github/workflows/main.yaml` for CI/CD