Fixes for issues affecting the FBref scraper #281
Closed
This PR fixes the following issues:

- … `"all"` regardless of the type of stats queried. This caused an issue because the cache might not contain the table needed. These tables are now cached in separate files.
- … rows, the website adds a row to the table that replicates the table header. This caused `read_schedule` to fail, because the number of rows in `df_table` would be greater than the number of match URLs obtained (see [FBref] Non-data rows in the table body should be removed #277). I added logic to remove those replicated headers when they are found.
- … `Scores & Fixtures` on the `Big 5 European Leagues Stats` page, so it would go to the generic `Scores & Fixtures` page instead, which shows games currently being played. Because of this, I had to move the optimisation in `read_leagues` that combines the top five leagues under that label, as `read_schedule` needs the top five leagues separately rather than in their combined form.
- … `IndexError`, supposedly when no flag is present. I fixed this by switching the logic to regular expressions, so that no error is thrown when the flag is missing.

Additionally, it moves `pretty-error` to the dev dependencies group, because it would otherwise be installed in repositories importing this library (which should not be the case). I'm not sure I've done this correctly, and I had to remove some imports, so please let me know if this breaks previous behaviour and advise me on what I should do instead. It also updates `pandas` to v2.0.
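The caching fix in the first item amounts to keying the cache file on the stat type instead of a fixed `"all"` suffix. Below is a minimal sketch of that idea. The `cache_path` helper and the filename pattern are hypothetical and are not the library's actual naming scheme.

```python
from pathlib import Path


def cache_path(data_dir: Path, kind: str, league: str, season: str, stat_type: str) -> Path:
    """Return a cache file path that is unique per stat type.

    Hypothetical naming scheme: with a fixed "all" suffix, tables for
    different stat types (e.g. "standard" and "shooting") would share one
    file. Including ``stat_type`` gives each table its own file.
    """
    # Replace spaces so the league name is filesystem-friendly
    safe_league = league.replace(" ", "_")
    return data_dir / f"{kind}_{safe_league}_{season}_{stat_type}.html"


if __name__ == "__main__":
    base = Path("data/FBref")
    print(cache_path(base, "players", "ENG-Premier League", "2223", "shooting"))
```

Because the stat type is part of the key, a cached "standard" table can no longer be returned when "shooting" was requested.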
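The header-row fix in the second item can be illustrated without any HTML parsing. Once a table's body rows are extracted, rows that merely repeat the header are dropped before the row count is compared with the list of match URLs. This is a pure-Python sketch only. `drop_repeated_headers` is a hypothetical helper, and the team names and URLs are placeholders. The actual scraper may detect these rows from the page markup (for example, a CSS class on the `<tr>` element) rather than by comparing cell text.

```python
from typing import List, Sequence


def drop_repeated_headers(header: Sequence[str], rows: List[Sequence[str]]) -> List[Sequence[str]]:
    """Remove body rows whose cells are identical to the table header."""
    header_cells = [cell.strip() for cell in header]
    return [row for row in rows if [cell.strip() for cell in row] != header_cells]


header = ["Wk", "Date", "Home", "Score", "Away"]
rows = [
    ["1", "2022-08-05", "Team A", "1-0", "Team B"],
    ["Wk", "Date", "Home", "Score", "Away"],  # replicated header row inside the body
    ["1", "2022-08-06", "Team C", "2-2", "Team D"],
]
match_urls = ["/en/matches/a", "/en/matches/b"]  # placeholder match report links

clean_rows = drop_repeated_headers(header, rows)
# With the replicated header removed, each remaining row lines up with one match URL
assert len(clean_rows) == len(match_urls)
```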
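For the third item, the combined label is only an optimisation for readers that can fetch one combined stats page, while `read_schedule` must iterate over the individual leagues. Below is a sketch that applies the combination only where it is wanted instead of inside the league listing. The league identifiers, the combined label, and the `combine_big5` helper are assumptions for illustration and may not match the library's exact names.

```python
from typing import List

# Assumed league identifiers; the library's actual IDs may differ
BIG5 = {
    "ENG-Premier League",
    "ESP-La Liga",
    "FRA-Ligue 1",
    "GER-Bundesliga",
    "ITA-Serie A",
}
BIG5_COMBINED = "Big 5 European Leagues Combined"  # assumed label


def combine_big5(leagues: List[str]) -> List[str]:
    """Collapse the five top leagues into the combined label when all are requested.

    Intended for stats readers that can use the combined page. A schedule
    reader would skip this step and keep the leagues separate.
    """
    if BIG5.issubset(leagues):
        rest = [lg for lg in leagues if lg not in BIG5]
        return [BIG5_COMBINED] + rest
    return list(leagues)


requested = sorted(BIG5) + ["NED-Eredivisie"]
print(combine_big5(requested))  # stats readers: combined label plus the remaining league
print(requested)                # schedule reader: leagues stay separate
```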
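The last item replaces index-based splitting, which raises `IndexError` when the expected flag token is absent, with an optional regular-expression group that simply yields `None`. Below is a minimal sketch. It assumes a cell text of the form `"<flag code> <name>"` (e.g. `"eng Premier League"`), where the lowercase flag code may be missing. Both that format and the `parse_flagged_name` helper are assumptions, not the scraper's actual code.

```python
import re
from typing import Optional, Tuple

# Assumed format: optional lowercase 2-3 letter flag code, whitespace, then the name
FLAG_RE = re.compile(r"(?:(?P<flag>[a-z]{2,3})\s+)?(?P<name>.+)")


def parse_flagged_name(text: str) -> Tuple[Optional[str], str]:
    """Split a cell into (flag, name), returning None for a missing flag instead of raising."""
    match = FLAG_RE.fullmatch(text.strip())
    if match is None:  # only an empty string fails to match
        return None, ""
    # A non-participating optional group returns None rather than raising
    return match.group("flag"), match.group("name")
```

With `str.split` and a fixed index, a cell without a flag leaves fewer tokens than expected. With the optional group, the same input just produces `(None, name)`.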