Twitter has become an important communication channel in times of emergency. The ubiquity of smartphones enables people to announce an emergency they're observing in real time. Because of this, more agencies are interested in programmatically monitoring Twitter (e.g., disaster relief organizations and news agencies).
But it's not always clear whether a person's words are actually announcing a disaster. Take this example:
The author explicitly uses the word “ABLAZE” but means it metaphorically. This is clear to a human right away, especially with the visual aid. But it’s less clear to a machine.
The task is to build a machine learning model that predicts which tweets are about real disasters and which ones aren't. You'll have access to a dataset of 10,000 tweets that were hand-classified.
Here are a few examples of natural disaster-related tweets:
"🚨 Breaking News: A massive earthquake measuring 7.5 on the Richter scale struck the coastal region today. Prayers for the safety of everyone affected. Stay safe and be prepared! 🙏 #Earthquake #SafetyFirst"
"🔥 Wildfires are spreading rapidly in the forest area, posing a serious threat to nearby communities. Emergency services are on high alert. Evacuation orders have been issued. Please follow instructions from authorities. #Wildfires #SafetyAlert"
"⚠️ Tropical Storm Alert: The meteorological department has issued a warning for a potential tropical storm formation in the coming days. Stay tuned for updates and take necessary precautions. #TropicalStorm #StaySafe"
"💨 Strong winds and heavy rainfall are expected in the region due to an approaching cyclone. Secure loose objects, stay indoors, and avoid unnecessary travel. Safety should be the top priority. #Cyclone #WeatherUpdate"
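Tweets like these could be caught by simple keyword matching, but as the "ABLAZE" example shows, keywords alone misfire on metaphorical usage. A minimal baseline sketch makes the failure mode concrete (the keyword list here is illustrative, not taken from the dataset):

```python
# Naive baseline: flag a tweet as a disaster if it contains a trigger word.
# The keyword list is illustrative only.
DISASTER_WORDS = {"earthquake", "wildfire", "cyclone", "storm", "ablaze"}

def keyword_predict(tweet: str) -> int:
    """Return 1 (disaster) if any trigger word appears, else 0."""
    words = tweet.lower().split()
    return int(any(w.strip('#!.,"') in DISASTER_WORDS for w in words))

print(keyword_predict("A massive earthquake struck the coastal region"))  # 1
print(keyword_predict("LOOK AT THE SKY IT WAS ABLAZE"))  # 1 — a false positive
```

The second tweet is metaphorical, yet the baseline flags it anyway, which is exactly why a model that reads context is needed.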
To solve this, we fine-tuned BERT for disaster-tweet classification, comparing the bert-base-cased and bert-base-uncased checkpoints and experimenting with both PyTorch-based and TensorFlow-based transformer implementations. The TensorFlow model was loaded from the pretrained TFBertForSequenceClassification class.
We prepared a simple train.py training script; the full code snippets are available in @code_train.py.
We'll use the following command to launch training (run once per model; pass bert-base-multilingual-cased as --fine_tune_model for the second run):

!python scripts/train.py \
  --fine_tune_model bert-base-uncased \
  --dataset_path train.csv \
  --lr 1e-3 \
  --per_device_train_batch_size 32 \
  --epochs 10
Model: "tf_bert_for_sequence_classification" (bert-base-uncased)
Total params: 109,483,778

Model: "tf_bert_for_sequence_classification" (bert-base-multilingual-cased)
Total params: 177,855,747
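The 109,483,778 figure is consistent with BERT-base plus a 2-way classification head, and can be reproduced from the architecture's standard dimensions (30,522-token vocab, hidden size 768, 12 layers, 3,072-unit feed-forward):

```python
# Reproduce the parameter count of TFBertForSequenceClassification
# on bert-base-uncased from the standard BERT-base dimensions.
V, H, L, I = 30522, 768, 12, 3072   # vocab, hidden size, layers, FFN size
P, T = 512, 2                        # max positions, token-type vocab

embeddings = V * H + P * H + T * H + 2 * H   # word/pos/type tables + LayerNorm
per_layer = (
    4 * (H * H + H)      # Q, K, V and attention-output projections
    + (H * I + I)        # feed-forward up-projection
    + (I * H + H)        # feed-forward down-projection
    + 2 * (2 * H)        # two LayerNorms
)
pooler = H * H + H
classifier = H * 2 + 2   # 2-way disaster / not-disaster head

total = embeddings + L * per_layer + pooler + classifier
print(total)  # -> 109483778, matching the 109,483,778 in the model summary
```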
Training completed with the following results:
***** train metrics *****
epoch = 10
train_runtime = 0:7:45
train_samples.shuffle = 1000
train_samples_per_second = 85.00
🛠 Frameworks and tools used: