Contributors
- @victoryg739 - Classification Model, EDA, Data Preparation and Data Cleaning, Data Extraction and Slides
- @Nivlek06 - Practical Motivation, EDA, Slides and Conclusion
- @Roseus9 - Practical Motivation, EDA, Slides and Conclusion
Problem Definition
- Can we predict stock prices with online comments/trends on social media?
Solution Approach
- Use a classification model to check whether r/WallStreetBets posts can predict stock price movements.
Models Used
- VADER Sentiment Analysis
- Gradient Boosted Tree
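To illustrate how the two models above might fit together, here is a minimal sketch that trains a gradient boosted classifier on sentiment-style features. It uses scikit-learn's `GradientBoostingClassifier` as one common implementation; the source does not say which library the project used. The data is synthetic, and the feature names (`sentiment`, `mentions`) and the up/down labelling rule are assumptions made for illustration, not the project's dataset or pipeline. In the project, the sentiment column would come from VADER.

```python
import random

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

random.seed(0)

# Synthetic stand-in data (NOT the project's dataset). Each row is
# [compound sentiment score in [-1, 1], number of mentions that day].
X, y = [], []
for _ in range(400):
    sentiment = random.uniform(-1, 1)
    mentions = random.randint(1, 500)
    # Hypothetical rule with noise: positive sentiment tends to precede an up day.
    went_up = 1 if sentiment + random.gauss(0, 0.3) > 0 else 0
    X.append([sentiment, mentions])
    y.append(went_up)

# Hold out a quarter of the rows to estimate out-of-sample accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because the labels here are generated from the sentiment feature itself, the accuracy only shows that the pipeline runs; it says nothing about real market behaviour.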
Conclusion
We took a closer look at how r/WallStreetBets interacts with the stock market. We examined activity on the subreddit and analysed whether there was any significant correlation between the sentiment scores of stocks and their price movements. We found that the classification model does provide some predictive power on our dataset.
To test the model further, we recommend forward testing it on real-time market data and evaluating it on more recent r/WallStreetBets data.
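As a rough illustration of the correlation analysis mentioned in the conclusion, the sketch below computes a Pearson r-value between sentiment scores and price changes using only the standard library. The p-value here comes from a permutation test, which is a swapped-in technique and not necessarily how the project computed it. The numbers are made up for the example and are not results from this project.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up illustrative numbers, not results from this project.
sentiment = [0.62, -0.35, 0.10, 0.81, -0.54, 0.27, -0.12, 0.45]
price_change_pct = [1.8, -0.9, 0.2, 2.5, -1.4, 0.3, -0.6, 1.1]

r = pearson_r(sentiment, price_change_pct)
p = permutation_p_value(sentiment, price_change_pct)
print(f"r = {r:.3f}, p = {p:.4f}")
```

An r-value near 0 with a large p-value would suggest no linear relationship; a value near 1 or -1 with a small p-value would suggest a strong one.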
What did we learn from this project?
- Natural language processing (NLP) based on VADER (Valence Aware Dictionary and sEntiment Reasoner)
- Classification model using a gradient boosted tree
- Pearson correlation (r-value and p-value)
- Data cleaning for sentiment analysis (lemmatization, stopword removal, etc.)
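The data cleaning step in the list above could look something like the following minimal sketch. The stopword list is a tiny hand-written assumption, and lemmatization is left out because it normally relies on an external library (for example NLTK's `WordNetLemmatizer`). The function name `clean_post` is hypothetical.

```python
import re

# Tiny illustrative stopword list (an assumption); a real pipeline would
# use a fuller list such as the one shipped with NLTK.
STOPWORDS = {
    "a", "an", "the", "is", "are", "to", "of", "and",
    "in", "on", "for", "it", "this", "i",
}


def clean_post(text):
    """Lowercase, strip URLs and non-letters, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    tokens = text.split()
    return [t for t in tokens if t not in STOPWORDS]


print(clean_post("GME is going to the moon!!! https://example.com"))
# → ['gme', 'going', 'moon']
```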
References
- https://www.analyticsvidhya.com/blog/2020/04/beginners-guide-exploratory-data-analysis-text-data/
- https://scholarworks.rit.edu/cgi/viewcontent.cgi?article=12195&context=theses
- https://www.kaggle.com/code/radema/yolo-explorative-analysis-on-wallstreetbets
- https://medium.com/geekculture/debunking-r-wallstreetbets-with-machine-learning-257a867ecc76