Development of the scraping process used to collect data for the Pundits Review website - https://www.punditsreview.com/
Pundits Review scrapes and processes news articles about the Premier League to give players and teams a review score each week. Each Monday, the project collects articles, divides them into phrases, identifies the player or club each phrase refers to, and then predicts the sentiment of that phrase. See more on how it works here!
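The weekly processing steps (split into phrases, identify the player or club, predict sentiment) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the project's code: the alias table `ENTITIES`, the word lists `POSITIVE`/`NEGATIVE`, and the rule of summing phrase sentiment per entity are all hypothetical. In particular, the toy word lists stand in for the prediction models, which are not part of this repository.

```python
from __future__ import annotations

import re
from collections import defaultdict
from dataclasses import dataclass

# HYPOTHETICAL alias table: lower-case name variant -> canonical player or club.
# The real project's entity list is not part of this repository.
ENTITIES = {
    "salah": "Mohamed Salah",
    "liverpool": "Liverpool",
    "haaland": "Erling Haaland",
    "man city": "Manchester City",
    "manchester city": "Manchester City",
}

# HYPOTHETICAL word lists standing in for the removed sentiment models.
POSITIVE = {"brilliant", "superb", "excellent", "clinical"}
NEGATIVE = {"poor", "sloppy", "awful", "wasteful"}


@dataclass
class ScoredPhrase:
    text: str
    entity: str
    sentiment: int  # +1 positive, -1 negative, 0 neutral


def split_into_phrases(article: str) -> list[str]:
    """Split the article into sentences, then split each sentence on commas and semicolons."""
    phrases = []
    for sentence in re.split(r"(?<=[.!?])\s+", article.strip()):
        for part in re.split(r"[,;]", sentence):
            part = part.strip()
            if part:
                phrases.append(part)
    return phrases


def identify_entity(phrase: str) -> str | None:
    """Return the canonical name of the earliest whole-word alias in the phrase, if any."""
    lowered = phrase.lower()
    hits = []
    for alias, name in ENTITIES.items():
        match = re.search(rf"\b{re.escape(alias)}\b", lowered)
        if match:
            hits.append((match.start(), name))
    return min(hits)[1] if hits else None


def predict_sentiment(phrase: str) -> int:
    """Toy lexicon score clipped to -1, 0 or +1 (a stand-in, not the real model)."""
    words = re.findall(r"[a-z']+", phrase.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)


def process_article(article: str) -> list[ScoredPhrase]:
    """Run split -> identify -> predict, keeping only phrases that mention an entity."""
    results = []
    for phrase in split_into_phrases(article):
        entity = identify_entity(phrase)
        if entity is not None:
            results.append(ScoredPhrase(phrase, entity, predict_sentiment(phrase)))
    return results


def weekly_scores(articles: list[str]) -> dict[str, int]:
    """Sum phrase sentiment per entity across one week's articles (hypothetical scoring rule)."""
    totals: dict[str, int] = defaultdict(int)
    for article in articles:
        for scored in process_article(article):
            totals[scored.entity] += scored.sentiment
    return dict(totals)


if __name__ == "__main__":
    article = (
        "Salah was brilliant against Brentford, but Liverpool were sloppy at the back. "
        "Haaland was clinical again."
    )
    for scored in process_article(article):
        print(scored)
    print(weekly_scores([article]))
```

Splitting on commas as well as sentence boundaries is one simple way to stop a sentence like "Salah was brilliant, but Liverpool were sloppy" from attributing one sentiment to both names; the actual phrase-splitting rule used by Pundits Review may differ.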
This repository shows how the method used to scrape and process football articles from news sites developed over time. Each directory holds the work behind one phase of building the solution: Phase One is the first method used, and the final solution is the method eventually integrated into the Pundits Review project.
NOTE: Prediction models have been removed from this repository.
Phase One Method: Beautiful Soup & Requests libraries used together inside a notebook
Phase Two Method: Scrapy replaces Beautiful Soup & Requests inside the notebook
Phase Three Method: More functions moved into modules; the pipeline takes shape, but the crawler is still called from the notebook
Final Solution: Core files used inside the Scrapy spider that was eventually integrated into the project
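The Phase One idea (download a page, then pull the article body out of its `<p>` tags) can be sketched as follows. Note the swap: this version uses the standard library's `urllib.request` and `html.parser` instead of Requests and Beautiful Soup, so it runs without third-party packages. With those libraries the equivalent calls would be along the lines of `requests.get(url, timeout=10).text` and `[p.get_text() for p in BeautifulSoup(html, "html.parser").find_all("p")]`. The class and function names are hypothetical, and treating every `<p>` as article text is a simplifying assumption; real news sites usually need site-specific selectors.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ParagraphExtractor(HTMLParser):
    """Collect whitespace-normalised text from <p> elements (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: list[str] = []
        self._in_p = False
        self._buffer: list[str] = []

    def _flush(self) -> None:
        # Join the collected text fragments and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self._in_p:  # HTML lets </p> be omitted, so a new <p> closes the old one
                self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:  # text in nested tags such as <b> or <a> is kept
            self._buffer.append(data)

    def close(self):
        super().close()
        if self._in_p:  # flush a final unclosed <p>
            self._flush()
            self._in_p = False


def extract_paragraphs(html: str) -> list[str]:
    """Return the text of each <p> element in document order."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it using the charset from the response headers."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0 (article-scraper sketch)"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Typical use would be `extract_paragraphs(fetch_html(article_url))` for some article URL. Keeping fetching and parsing in separate functions lets the parsing step be tested on saved HTML without network access, which is also roughly the division of labour Scrapy later imposes between downloading and the spider's parse callbacks.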