Fake News Detection System (Python) Overview The Fake News Detection System is a Python-based application that uses natural language processing and machine learning techniques to identify and classify news articles as either genuine or fake. The system utilizes a combination of text preprocessing, feature extraction, and logistic regression for accurate classification.
Features
- Text Preprocessing Remove stop words and perform stemming to clean and simplify the text data. Use regular expressions to clean the text and handle special characters.
- Feature Extraction with TF-IDF Utilize the Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer to convert text data into numerical features. TF-IDF weighting helps capture the importance of words in the context of the entire dataset.
- Data Splitting Split the dataset into training and testing sets using the train_test_split function from scikit-learn. Ensure a balanced distribution of genuine and fake news in both sets.
- Logistic Regression Model Implement a logistic regression model using scikit-learn's LogisticRegression. Train the model on the training dataset to learn patterns and relationships between features and labels.
- Model Evaluation Assess the accuracy of the trained model using the accuracy_score metric. Evaluate the model's performance on the testing dataset to ensure generalization. Dataset The Fake News Detection System uses the (submit.csv). You can download the dataset from https://www.kaggle.com/c/fake-news/data. Technologies Used
Data Processing: NumPy, Pandas, NLTK (Natural Language Toolkit) Machine Learning: scikit-learn Version Control: Git
How to Contribute Fork the repository. Clone the forked repository to your local machine. Create a new branch for your feature or bug fix. Make your changes and commit them. Push your changes to your fork on GitHub. Submit a pull request to the original repository. Installation and Setup Clone the repository to your local machine. Install dependencies using pip install -r requirements.txt. Run the application or script to train and test the fake news detection model.
License This project is licensed under the MIT License.