An advanced web application for efficiently summarizing Bengali news articles.
- Overview
- Problem Statement
- Features
- Model Architecture
- Project Structure
- Installation
- Usage
- License
Bengali Text Summarizer is a web application developed for CSE499B Senior Design Project II, the final-year project of the B.Sc. program. Users paste in a Bengali news article and receive a concise summary, which makes it easier to keep up with the large volume of Bengali news published online.
- Web Crawler: Initially developed to fetch articles from various Bengali newspaper portals.
- Data Collection: Gathered articles and summaries from multiple sources.
- Model Training: Fine-tuned a pre-trained Seq2Seq model (google/mt5-small) on the collected articles and summaries.
- Web Interface: Built using Next.js 15 and React 19 for a responsive and user-friendly experience.
The proliferation of Bangla news portals makes it hard to keep up with so many articles in a limited time, a problem compounded by the language's inherent complexity.
- Overwhelming volume of Bengali news content online
- Limited time for comprehensive reading
- Inherent complexity of the Bengali language
- Difficulty in quickly grasping essential information from articles
- Input Bengali news articles
- Generate concise summaries
- Responsive design for various devices
- Dark mode support
Our summarization model is based on the Seq2Seq architecture, using the pre-trained google/mt5-small model and MT5Tokenizer.
Set | Metric | Text (token count) | Summary (token count) |
---|---|---|---|
Training | Mean length | 1576.52 | 61.15 |
Training | Max length | 9645 | 316 |
Training | Min length | 23 | 5 |
Training | Std length | 943.45 | 25.43 |
Validation | Mean length | 1266.48 | 56.78 |
Validation | Max length | 2559 | 105 |
Validation | Min length | 153 | 22 |
Validation | Std length | 540.35 | 17.93 |
Test | Mean length | 1302.62 | 57.51 |
Test | Max length | 2548 | 105 |
Test | Min length | 182 | 21 |
Test | Std length | 542.46 | 17.75 |
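The statistics in the table can be reproduced from a list of per-example token counts. The sketch below is illustrative only: the function names and the example counts are hypothetical, and it assumes the reported "Std length" is a sample standard deviation, which the source does not state. Because the mean article length is well above the `max_length=512` used during tokenization, the second helper estimates how many examples would be truncated at a given limit.

```python
import statistics


def length_stats(token_counts):
    """Return mean, max, min, and standard deviation of token counts."""
    return {
        "mean": statistics.mean(token_counts),
        "max": max(token_counts),
        "min": min(token_counts),
        # Sample standard deviation (n - 1 denominator); this is an assumption.
        "std": statistics.stdev(token_counts),
    }


def fraction_truncated(token_counts, limit):
    """Share of examples whose token count exceeds `limit`."""
    return sum(1 for n in token_counts if n > limit) / len(token_counts)


# Hypothetical token counts, for illustration only (not project data).
example_counts = [120, 480, 512, 900, 1600]
stats = length_stats(example_counts)
truncated = fraction_truncated(example_counts, limit=512)
```

Running the same helpers over the real tokenized training set would show how much article text is discarded at 512 tokens.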
import torch
from transformers import (
    MT5ForConditionalGeneration,
    MT5Tokenizer,
    Seq2SeqTrainingArguments,
)

# Load the pre-trained checkpoint and its tokenizer.
model_name = "google/mt5-small"
tokenizer = MT5Tokenizer.from_pretrained(model_name)
model = MT5ForConditionalGeneration.from_pretrained(model_name)

# tokenize_data and the df_4_* DataFrames are not shown here.
# max_length caps the article tokens; max_target_length caps the summary tokens.
train_inputs = tokenize_data(df_4_train, max_length=512, max_target_length=100)
val_inputs = tokenize_data(df_4_val, max_length=512, max_target_length=100)
test_inputs = tokenize_data(df_4_test, max_length=512, max_target_length=100)

# Training configuration: evaluate once per epoch and keep the two latest checkpoints.
training_args = Seq2SeqTrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=5,
    weight_decay=0.01,
    save_total_limit=2,
    predict_with_generate=True,
    save_safetensors=False,
)
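The `tokenize_data` helper called above is not shown in this README. The following is a minimal sketch of what such a helper might look like, not the project's actual implementation. It assumes the data exposes `"text"` and `"summary"` columns (hypothetical names) and, unlike the call above, takes the tokenizer as an explicit argument so the function is self-contained. Replacing padding ids with `-100` in the labels is a common convention so that the loss ignores padded positions.

```python
def tokenize_data(df, tokenizer, max_length=512, max_target_length=100):
    """Tokenize articles and reference summaries for Seq2Seq fine-tuning.

    Assumptions (not from the original project): `df` exposes "text" and
    "summary" columns, and `tokenizer` is passed in explicitly.
    """
    texts = [str(t) for t in df["text"]]
    summaries = [str(s) for s in df["summary"]]

    # Encode the articles, truncating and padding to max_length tokens.
    model_inputs = tokenizer(
        texts, max_length=max_length, truncation=True, padding="max_length"
    )
    # Encode the reference summaries as targets.
    targets = tokenizer(
        text_target=summaries,
        max_length=max_target_length,
        truncation=True,
        padding="max_length",
    )
    # Replace padding ids with -100 so padded positions are ignored by the loss.
    pad_id = tokenizer.pad_token_id
    model_inputs["labels"] = [
        [tok if tok != pad_id else -100 for tok in seq]
        for seq in targets["input_ids"]
    ]
    return model_inputs
```

With the objects defined earlier, a call would look like `tokenize_data(df_4_train, tokenizer, max_length=512, max_target_length=100)`.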
Epoch | Training Loss | Validation Loss |
---|---|---|
1 | 1.046200 | 0.692979 |
2 | 1.015400 | 0.683604 |
3 | 1.027700 | 0.676918 |
4 | 0.988000 | 0.672858 |
5 | 0.994400 | 0.671896 |
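To read the table at a glance, the per-epoch change in validation loss can be computed directly from the reported values. This is a small illustrative calculation; the variable names are hypothetical and the numbers are copied from the table above.

```python
# Validation loss per epoch, copied from the table above.
val_loss = [0.692979, 0.683604, 0.676918, 0.672858, 0.671896]

# Absolute drop between consecutive epochs.
deltas = [round(prev - cur, 6) for prev, cur in zip(val_loss, val_loss[1:])]

# Relative improvement from the first to the last epoch, in percent.
total_improvement_pct = 100 * (val_loss[0] - val_loss[-1]) / val_loss[0]

print(deltas)
print(f"{total_improvement_pct:.2f}%")
```

The drops shrink with each epoch, and the overall improvement in validation loss across the five epochs is about 3%.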
src/
├── app/
│   ├── api/
│   │   └── bts-summarize  // BTS summarization API endpoint
│   ├── fonts  // Custom fonts directory
│   ├── globals.css  // Global CSS styles
│   ├── layout.tsx  // Root layout component
│   └── page.tsx  // Home page component
├── component/
│   ├── layout/
│   │   ├── footer.tsx  // Footer component
│   │   └── NavigationBar.tsx  // Navigation bar component
│   ├── page-contents/
│   │   ├── AdditionalContents/
│   │   │   ├── AuthDialog.tsx  // User authentication dialog
│   │   │   ├── FacultyAdvisor.tsx  // Faculty advisor details
│   │   │   ├── ProjectMetadata.tsx  // Brings project metadata together
│   │   │   ├── ProjectOverview.tsx  // Project overview section
│   │   │   ├── StatsCard.tsx  // Statistical metrics card
│   │   │   ├── TeamMembers.tsx  // Team members list
│   │   │   └── TrainingChart.tsx  // Training data chart
│   │   └── SummaryGenerator/
│   │       ├── ArticleInput.tsx  // Input for articles to summarize
│   │       ├── ArticleList.tsx  // List of articles
│   │       ├── ArticleSummary.tsx  // Summarized article display
│   │       ├── CategoryList.tsx  // Article category list
│   │       ├── Header.tsx  // Header for Summary Generator
│   │       ├── MainContent.tsx  // Main content area
│   │       ├── Sidebar.tsx  // Sidebar navigation
│   │       └── SummaryGenerator.tsx  // Main Summary Generator component
│   └── ui  // Shadcn UI components
├── context/
│   └── ThemeContext.tsx  // Theme context for app theming
├── hooks/
│   └── useSummaryGenerator.tsx  // Custom hook for Summary Generator
└── lib/
    ├── constants.ts  // Application-wide constants
    ├── errors.ts  // Error handling utilities
    ├── huggingface.ts  // Hugging Face API utilities
    ├── types.ts  // TypeScript types and interfaces
    ├── utils.ts  // Utility functions
    └── validation.ts  // Data validation functions
- Clone the repository:
    git clone https://github.com/your-username/bengali-text-summarizer.git
- Navigate to the project directory:
    cd bengali-text-summarizer
- Install dependencies:
    npm install
- Start the development server:
    npm run dev
- Open your browser and navigate to http://localhost:3000
- Input a Bengali news article in the provided text area
- Click the "Summarize" button
- View the generated summary
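The same summarization can presumably be triggered from a script instead of the browser. The sketch below rests on several assumptions that the source does not confirm: that the `bts-summarize` route is served at `/api/bts-summarize`, that it accepts a JSON body with a `"text"` field, and that it returns JSON. The function names are hypothetical; check the route handler for the real request and response shape.

```python
import json
import urllib.request


def build_summarize_request(article, base_url="http://localhost:3000"):
    """Build a POST request for the summarization endpoint.

    Assumptions: the route lives at /api/bts-summarize and accepts a JSON
    body with a "text" field; the real payload shape may differ.
    """
    payload = json.dumps({"text": article}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/bts-summarize",
        data=payload,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def summarize(article, base_url="http://localhost:3000"):
    """Send the request and return the decoded JSON response."""
    request = build_summarize_request(article, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the development server running, `summarize("...")` would send the article text to the local endpoint. Encoding the payload as UTF-8 with `ensure_ascii=False` keeps the Bengali text readable in the request body.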
This project is licensed under the MIT License. See the LICENSE file for details.