Welcome to my Language Processing and Text Categorization Projects Repository! This collection showcases various projects that address challenges in natural language processing (NLP) and text analysis, leveraging both classical techniques and modern deep learning solutions.
The implementations cover a range of language processing and categorization tasks, using techniques such as:
- Classic NLP methods for spell checking and error correction
- Deep Learning models for text classification and sentiment analysis
- Large Language Models (LLMs), used for semantic understanding and fine-tuned for downstream tasks
Explore the following projects included in this repository:
A spell checker that uses Levenshtein distance for error correction and finite state transducers (FSTs) to model typographical errors in text.
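The Levenshtein-based correction described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual implementation: the function names `levenshtein` and `suggest`, the `max_distance` threshold, and the toy vocabulary are all assumptions made for the example, and the FST component is not shown.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the stored row short
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def suggest(word: str, vocabulary: Sequence[str], max_distance: int = 2) -> str:
    """Return the closest vocabulary word within `max_distance` edits
    (hypothetical threshold), or the input word if nothing is close enough."""
    if not vocabulary or word in vocabulary:
        return word
    best = min(vocabulary, key=lambda candidate: levenshtein(word, candidate))
    return best if levenshtein(word, best) <= max_distance else word


# Toy vocabulary, purely illustrative.
vocabulary = ["spelling", "speaking", "selling"]
print(levenshtein("kitten", "sitting"))  # 3
print(suggest("speling", vocabulary))    # spelling
```

A linear scan over the vocabulary is fine for a small word list; a larger dictionary would usually call for candidate pruning or an index, which this sketch leaves out.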
A deep learning model that classifies text by sentiment, using pre-trained word embeddings such as GloVe together with a custom neural network architecture.
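As a rough illustration of how pre-trained embeddings can feed a classifier, the sketch below parses vectors in the plain-text GloVe layout (a token followed by its space-separated components) and averages them into a fixed-size feature vector. The helper names `parse_glove` and `average_embedding` and the two-dimensional toy vectors are assumptions for the example; averaging is only one simple option, and the project's own network may consume embeddings differently.

```python
from typing import Dict, Iterable, List


def parse_glove(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe-style text lines: a token followed by its vector components."""
    embeddings: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(value) for value in parts[1:]]
    return embeddings


def average_embedding(tokens: Iterable[str],
                      embeddings: Dict[str, List[float]],
                      dim: int) -> List[float]:
    """Mean of the vectors of known tokens; a zero vector if none are known."""
    known = [embeddings[token] for token in tokens if token in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vector[i] for vector in known) / len(known) for i in range(dim)]


# Toy 2-dimensional vectors standing in for a real GloVe file (illustrative only).
toy_lines = [
    "good 1.0 0.0",
    "bad -1.0 0.0",
    "movie 0.0 1.0",
]
vectors = parse_glove(toy_lines)
features = average_embedding("good movie".split(), vectors, dim=2)
print(features)  # [0.5, 0.5]
```

The resulting feature vector has the same length regardless of sentence length, so it can be passed directly to a dense classification layer.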
Semantic analysis with BERT, GPT, and similar models, including fine-tuning pre-trained models for text classification tasks such as sentiment analysis.
- PyTorch and TensorFlow for deep learning model development
- scikit-learn for machine learning approaches and data preprocessing
- Hugging Face Transformers for working with pre-trained large language models
- NLTK and spaCy for natural language processing tasks
- Levenshtein distance for spell checking and typo correction
- Word embeddings (GloVe, Word2Vec) for semantic representation
- Deep Neural Networks for text classification and sentiment analysis
- Fine-tuning of BERT/GPT models for advanced semantic analysis
Each project includes detailed results and experiments showcasing the performance of different approaches:
- Spell Checker: Accuracy of spelling correction and performance on common misspellings.
- Text Classification: Performance metrics such as accuracy, recall, and F1-score for sentiment analysis tasks.
- Large Language Models: Comparative performance of pre-trained models on various text classification datasets.
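For reference, the classification metrics named above can be computed from true and predicted labels as in the sketch below. It assumes a binary task with `1` as the positive class; the function name `binary_metrics` and the sample labels are made up for illustration and are not results from these projects. Precision is included because F1 is defined as the harmonic mean of precision and recall.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int],
                   y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every ratio against an empty denominator.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
print(binary_metrics(y_true, y_pred))
```

With these sample labels there are three true positives, one false positive, and one false negative, so precision, recall, and F1 all come out to 0.75 while accuracy is 4/6.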