Dataset: covid-19-uk-data/data/covid-19-cases-uk.csv has non-numerical values in some of the total cases fields #29

Closed
Jcamain opened this issue Apr 6, 2020 · 3 comments

Jcamain commented Apr 6, 2020

@tomwhite Tom, is there any chance you could change the '1 to 4' values in the Total Cases field? Your dataset is the best one I've found. I've been taking the NHS England data from the ArcGIS dashboard, but it doesn't include Wales and Scotland, so I'd like to reference yours instead. I work for Qlik and we are helping out where we can: our software allows for a huge amount of analysis of the data, and your set is perfect for it. I'm happy to share any additional mapping fields and data I'm working on, as well as the dashboard.

tomwhite (Owner) commented Apr 6, 2020

Hi @Jcamain - thanks for raising this, glad you are finding the dataset useful.

What would you change the '1 to 4' values to? This is how PHE published them right at the beginning, before switching to actual case numbers. I agree it's annoying, but the simplest thing is probably to filter the cases file to drop dates up to and including 2020-03-05 (the last date with '1 to 4'-style values in it).
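A minimal sketch of that filtering in Python, using only the standard `csv` module. It assumes the file has a `Date` column holding ISO-8601 dates (YYYY-MM-DD), so plain string comparison orders them correctly; the column name and the helper name `drop_early_rows` are assumptions, not taken from the dataset's documentation.

```python
import csv

# Last date with '1 to 4'-style values, per the maintainer's comment.
CUTOFF = "2020-03-05"

def drop_early_rows(lines, date_column="Date"):
    """Yield CSV rows (as dicts) dated strictly after CUTOFF.

    `date_column` is an assumed column name. ISO-8601 date strings
    sort lexically in date order, so '>' on strings is enough here.
    """
    for row in csv.DictReader(lines):
        if row[date_column] > CUTOFF:
            yield row
```

Pass it a file object opened with `newline=""` (as the `csv` documentation recommends) and it yields only the rows dated after 2020-03-05, which should contain plain numbers only.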

timday commented Apr 6, 2020

FWIW, in my (Python) code that reads the CSV file, I map every cell I expect to contain a number through a function, which is currently:

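```python
def value(s):
    # Early PHE figures were given as the range '1 to 4'; substitute its average.
    if s == '1 to 4':
        return 2.5
    # Everything else is expected to be a plain number.
    return float(s)
```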

I chose 2.5 on the grounds that if 1, 2, 3 and 4 are all equally likely, that's the average: (1 + 2 + 3 + 4) / 4 = 2.5. Routing every cell through one function also means that if any more odd values like that turned up and broke the conversion to float, I could intercept them there too. (Fortunately it's been pure numbers all the way since.)
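The same averaging argument extends to any range of whole numbers: if every integer from a to b is equally likely, the average is (a + b) / 2, which gives 2.5 for "1 to 4". A hedged sketch, assuming any other ranged values would also be written as "a to b" (only "1 to 4" actually appears in this thread); the name `value_with_ranges` is hypothetical.

```python
import re

# Assumed format for ranged values, e.g. "1 to 4". Only "1 to 4" is
# known to occur in this dataset; other ranges are hypothetical.
_RANGE = re.compile(r"\s*(\d+)\s+to\s+(\d+)\s*")

def value_with_ranges(s):
    """Convert a cell to float, mapping an "a to b" range to its midpoint.

    For equally likely integers a..b the mean is (a + b) / 2.
    """
    m = _RANGE.fullmatch(s)
    if m:
        low, high = int(m.group(1)), int(m.group(2))
        return (low + high) / 2
    # float() tolerates surrounding whitespace such as " 12 ".
    return float(s)
```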

Pretty much standard operating procedure for real-world data, IMHO. If it's not something like "1 to 4", there'll be the odd "n/a" or "?" or blank cell for your processing to choke on. You always need a way of dealing with them.
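One way to make that handling explicit, sketched under assumptions: treat a small, hypothetical set of placeholder strings (blank, "n/a", "?") as missing and return `None`, but let anything else that won't parse fail loudly so new oddities get noticed. The `MISSING` set and the `parse_cell` name are illustrative, not part of the dataset.

```python
# Hypothetical placeholders treated as "no data"; extend as new ones appear.
MISSING = {"", "n/a", "?"}

def parse_cell(s):
    """Return a float, or None for a known placeholder.

    Any other unparseable text raises ValueError naming the offending
    value, rather than being silently dropped.
    """
    cleaned = s.strip()
    if cleaned.lower() in MISSING:
        return None
    try:
        return float(cleaned)
    except ValueError:
        raise ValueError(f"unexpected cell value: {s!r}") from None
```

Returning `None` rather than, say, 0 keeps missing cells distinguishable from genuine zero counts.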

Jcamain (Author) commented Apr 7, 2020

Thanks, guys! To be fair, I thought it might cause some issues, but everything is working as it should. I'll post my dashboard when I'm finished, in case you find it interesting. J

Jcamain closed this as completed Apr 7, 2020

3 participants