Traffic information, such as traffic incidents and real-time traffic status, plays a fundamental role in improving the efficiency of Intelligent Transportation Systems (ITS). Conventionally, traffic information is obtained from physical sensors such as GPS, cameras, or loop detectors. Recently, social media has also been regarded as a potential source of traffic information, with its users acting as social sensors: as these platforms have grown in popularity, both individuals and authoritative agencies frequently post transportation information online. Because these platforms carry a large volume of real-time user-generated content, they have become powerful and inexpensive information sources. Social media use is now widespread, and Twitter is currently one of the most prominent platforms; in 2018 it recorded more than 336 million monthly active users. The information in tweets can be used for a variety of tasks, such as predicting a person’s influence in an area, forecasting weather, analysing people’s reviews and reported problems, and supporting disaster management.
In the field of transportation, too, there has been a sharp increase in the use of social media data to address issues such as predicting traffic and accidents in an area. Earlier, when Twitter provided the coordinates of the user posting a tweet, prior work reported accidents and other causes of traffic in that area. Under its current terms and conditions, however, Twitter no longer provides the user’s coordinates, which makes it challenging to determine the area of a reported issue. Here, we applied data mining to transport-related tweets collected using India as the bounding box, and filtered them so that every tweet we classify as traffic-related or not contains a mention of a location. This addresses the problem and makes it easier to infer, from the tweet itself, the area the user is reporting on.
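The location filter described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes a small, hypothetical gazetteer (`GAZETTEER`) of Indian place names and uses case-insensitive, word-boundary matching with Python's `re` module. The bounding-box collection step is not shown, and `find_locations` and `filter_tweets_with_location` are illustrative names only.

```python
import re

# Hypothetical gazetteer: a tiny sample of Indian place names, for illustration only.
# A real system would need a far larger list (cities, localities, road names).
GAZETTEER = {"delhi", "mumbai", "bengaluru", "chennai", "kolkata", "ring road"}


def find_locations(text, gazetteer=GAZETTEER):
    """Return the gazetteer entries mentioned in a tweet (whole-word, case-insensitive)."""
    lowered = text.lower()
    found = []
    for place in sorted(gazetteer):  # sorted so the output order is deterministic
        if re.search(r"\b" + re.escape(place) + r"\b", lowered):
            found.append(place)
    return found


def filter_tweets_with_location(tweets):
    """Keep only tweets that mention at least one known location, paired with those locations."""
    kept = []
    for tweet in tweets:
        places = find_locations(tweet)
        if places:
            kept.append((tweet, places))
    return kept


if __name__ == "__main__":
    sample = [
        "Heavy jam on Ring Road near Delhi University",
        "Stuck in traffic again, so annoying",
        "Accident reported in Mumbai, avoid the highway",
    ]
    for tweet, places in filter_tweets_with_location(sample):
        print(places, "<-", tweet)
```

A fixed gazetteer is easy to audit, but it misses misspellings, abbreviations and localities that are not in the list; a named-entity recognizer is one possible alternative for the matching step.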
Social media data, here Twitter, is therefore a rich source of information that can help improve traffic prediction. However, Twitter data are both noisy and unstructured, so an effective text-mining method is needed to extract useful transport-related information from tweets. In this study, we employ and compare two deep learning methods, a Deep Neural Network (DNN) and Long Short-Term Memory (LSTM), for training on and classifying accident-related tweets. Unlike classifiers such as logistic regression or Support Vector Machines (SVMs), deep learning does not seek a direct functional relationship between the input features and the output classification. Instead, it is a set of machine learning algorithms that attempt to learn at multiple levels, corresponding to different levels of abstraction. A DNN is organized into multiple layers, and its output is expressed as a composition of these layers, in which higher-level features are composed from lower-level ones; this gives it the potential to model complex data with fewer units than a similarly performing shallow network. We report results for a Recurrent Neural Network (RNN), a DNN with LSTM, and baseline algorithms including Logistic Regression, SVM, Naïve Bayes and Random Forest. We also used Facebook’s “FastText” library for classification, which gave excellent results, as shown in the following sections.
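To make the idea of an output "expressed as a composition of layers" concrete, the sketch below builds a tiny feed-forward network in plain Python as a chain of functions, so that the network computes f2(f1(x)). It illustrates the concept only and is not the authors' DNN or LSTM: the weights are hand-picked and hypothetical, there is no training, and the three inputs merely stand in for something like counts of three traffic keywords in a tweet. The names `dense`, `compose` and `classifier` are illustrative.

```python
import math


def dense(weights, biases, activation):
    """Return a fully connected layer as a function: x -> activation(W x + b)."""
    def layer(x):
        return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(weights, biases)]
    return layer


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def compose(*layers):
    """Chain layers so the network computes f_L(...f_2(f_1(x)))."""
    def network(x):
        for layer in layers:
            x = layer(x)  # each layer's output is the next layer's input
        return x
    return network


# Hypothetical, hand-picked weights: 3 inputs -> 2 hidden units -> 1 output.
hidden = dense([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]], [0.0, -1.0], relu)
output = dense([[1.0, 1.0]], [-1.0], sigmoid)
classifier = compose(hidden, output)

if __name__ == "__main__":
    # Single output read as a score for "traffic-related" (illustrative only).
    print(classifier([1.0, 1.0, 0.0]))
```

The output layer sees only the hidden layer's features, never the raw input, which is what composing lower-level features into higher-level ones means here. In practice the weights would be learned by backpropagation, usually with a deep learning framework rather than hand-written code.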