retrosheet_daily table missing game.source #62

Open
segiddins opened this issue May 25, 2021 · 2 comments
Labels
bug Something isn't working

Comments

@segiddins

cwdaily outputs daily lines for each player, which include the source for the game information.
For games with multiple sources, there will be multiple daily entries for a given (player_id, game_id) tuple, and right now there's no column that can be used to disambiguate.

E.g. select * from retrosheet_daily where game_dt = '1943-06-19' and player_id = 'mackr101';

yields 5 rows for 2 games (the two halves of a doubleheader), with each game having both a box score and a deduced game, according to https://raw.githubusercontent.com/chadwickbureau/retrosplits/master/daybyday/playing-1943.csv. I'm not sure why mackr101 in particular has 2 deduced game entries for CHA194306191, but that's probably an issue in Chadwick.
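To find every affected (player_id, game_id) pair rather than checking one player at a time, a grouped count is one option. The sketch below runs against an in-memory SQLite table that only borrows the name `retrosheet_daily`; the three-column schema, the `find_duplicate_keys` helper, the second game ID, the `otherp001` player, and all inserted rows are assumptions made up to mirror the row counts described above, not data from the real database.

```python
import sqlite3


def find_duplicate_keys(conn):
    """Return (game_id, player_id, n_rows) for keys that appear more than once."""
    query = """
        SELECT game_id, player_id, COUNT(*) AS n_rows
        FROM retrosheet_daily
        GROUP BY game_id, player_id
        HAVING COUNT(*) > 1
        ORDER BY game_id, player_id
    """
    return conn.execute(query).fetchall()


# Hypothetical, minimal stand-in for the real table (which has many more columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE retrosheet_daily (game_id TEXT, game_dt TEXT, player_id TEXT)"
)

# Synthetic rows: 3 entries for game 1 and 2 for game 2, matching the
# "5 rows for 2 games" in the report. The second game ID is assumed.
conn.executemany(
    "INSERT INTO retrosheet_daily VALUES (?, ?, ?)",
    [
        ("CHA194306191", "1943-06-19", "mackr101"),
        ("CHA194306191", "1943-06-19", "mackr101"),
        ("CHA194306191", "1943-06-19", "mackr101"),
        ("CHA194306192", "1943-06-19", "mackr101"),
        ("CHA194306192", "1943-06-19", "mackr101"),
        ("CHA194306191", "1943-06-19", "otherp001"),  # made-up player, single row
    ],
)

dupes = find_duplicate_keys(conn)
for row in dupes:
    print(row)
```

Without a source column, this only shows which keys are duplicated; it cannot tell which row came from the box score and which from the deduced game.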

@droher
Owner

droher commented Jun 26, 2021

Hmm, I put in some protection against this problem here, but it looks like it's not working:

def remove_redundant_box_score_files() -> None:

I'll try to patch. Adding a general source column across all of these tables would be a great idea. For now, I do have an extra retrosheet_deduced_game table that you can join on to find which games have deduced entries -- I know that doesn't help with disambiguation, though.
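As a rough illustration of the join workaround, the sketch below flags which daily rows belong to a game that also has a deduced entry. It assumes `retrosheet_deduced_game` exposes a `game_id` column, which is not confirmed in this thread; the schemas, the `flag_deduced_games` helper, and the placeholder IDs (`G1`, `G2`, `p1`) are all hypothetical.

```python
import sqlite3


def flag_deduced_games(conn):
    """Return (game_id, player_id, n_rows, has_deduced) per daily key."""
    query = """
        SELECT d.game_id,
               d.player_id,
               COUNT(*) AS n_rows,
               EXISTS (
                   SELECT 1
                   FROM retrosheet_deduced_game AS dg
                   WHERE dg.game_id = d.game_id
               ) AS has_deduced
        FROM retrosheet_daily AS d
        GROUP BY d.game_id, d.player_id
        ORDER BY d.game_id, d.player_id
    """
    return conn.execute(query).fetchall()


# Hypothetical minimal schemas; the game_id column on the deduced table is assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retrosheet_daily (game_id TEXT, player_id TEXT)")
conn.execute("CREATE TABLE retrosheet_deduced_game (game_id TEXT)")

# Placeholder rows: G1 has a box score row plus a deduced row, G2 has one row.
conn.executemany(
    "INSERT INTO retrosheet_daily VALUES (?, ?)",
    [("G1", "p1"), ("G1", "p1"), ("G2", "p1")],
)
conn.execute("INSERT INTO retrosheet_deduced_game VALUES ('G1')")

flags = flag_deduced_games(conn)
for row in flags:
    print(row)
```

Using `EXISTS` instead of a plain join keeps the row counts from being multiplied if the deduced table happens to list a game more than once. As noted above, this identifies affected games but does not say which of the duplicate rows is the deduced one.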

droher added the bug label Jun 26, 2021
@droher
Owner

droher commented Apr 16, 2022

This hasn't been resolved in the code, but I've manually removed the duplicated games from my Retrosheet fork, so the newly published version should be free of this bug.
