No cases data for England for latest day? (c.f Wales and Scotland, which have it) #45
Comments
I'm getting the data from the cases CSV file from the PHE dashboard (https://coronavirus.data.gov.uk/). The latest data in there is for 2020-04-26, so that's what appears in this repo. Previously I was just taking the latest data and assigning it to the current date, which was not necessarily the correct thing to do. Interestingly, the deaths CSV file does have data for 2020-04-27.
@timday The statistics from the various data sources now use different date types. For positive case numbers in England and Wales, the term is Specimen Date, which is the date of the first positive specimen taken from a tested individual. In contrast, other data, such as the death figures mentioned by @tomwhite, use Reporting Date or something similar, which is the date the government published the data after receiving them in batches from the labs; each batch contains results from many different specimen dates. So the latest available dates are inconsistent across sources, and will remain so unless the publishers unify them some day. However, how much this matters depends on what type of data you are looking at. For example, if you only care about cumulative figures, either in total or broken down by area, the figure for the latest specimen date in England is essentially the same thing as the figure for the latest reporting date in Scotland, as far as I understand.
Hmmm.... thanks, interesting. For the purposes of looking at cases by region across England, Scotland and Wales (and using only days for which data is available for all nations) I'm now wondering whether the "best"/"most realistic" thing to do is either:
or
Not at all clear to me which is more "correct". It's only the |
In terms of cases by region, if you look at the Wales historical data CSV file provided on this dashboard, England and Wales are actually on the same page (both use Specimen Date), unlike Scotland, as I mentioned above. What England and Wales do on their dashboards is the first option you mentioned (shifting the latest specimen date to match the latest reporting date of Scotland), which I think is the most realistic way to check the latest cases breakdown. The justification is that this approach gives you the latest available cases breakdown. It also makes sense for cases with unknown regions. But do bear in mind that this approach only makes sense for data that is both 1. the latest and 2. cumulative. If other types of data, either historical or daily, are involved, I see no sensible way to make them consistent across all nations, unless the publishers unify their standard.
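As a rough sketch of the "shift the latest specimen date" option described above: take each nation's most recent cumulative rows and relabel them with a common "as of" date, keeping the original date alongside. The function name `latest_snapshot` and the row keys (`date`, `nation`, `area`, `total`) are assumptions made for illustration; they are not the column names of any particular file in this repo.

```python
def latest_snapshot(rows, as_of):
    """Return each nation's most recent cumulative rows, relabelled to `as_of`.

    `rows` is a list of dicts with hypothetical keys 'date' (ISO string),
    'nation', 'area' and 'total'. ISO dates compare correctly as strings.
    """
    # Find the latest date present for each nation.
    latest_date = {}
    for r in rows:
        nation = r["nation"]
        if nation not in latest_date or r["date"] > latest_date[nation]:
            latest_date[nation] = r["date"]

    # Keep only rows on that latest date; remember where they came from.
    return [
        {**r, "source_date": r["date"], "date": as_of}
        for r in rows
        if r["date"] == latest_date[r["nation"]]
    ]
```

Keeping `source_date` makes it obvious later that an England figure labelled with one date may really be a specimen-date figure from the day before. As noted above, this only makes sense for the latest cumulative figures, not for daily or historical series.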
Another thing I notice from charting the England cases data: Compare the output from
with
the places listed in both haven't changed at all (or declined by 1 in Barnsley's case). This seems most unlikely given the general rate of increase previously, and it looks more like the data from the 27th has simply been "reused" on the 28th. There's also something new going on with some regions becoming more "gappy" (e.g. Isle of Wight); I'm sure I'd have noticed that before, as it results in gaps appearing in some of my charts which were continuous lines before.
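One way to check this kind of thing systematically is to list every area whose cumulative total did not increase between two dates. This is only a sketch: `unchanged_areas` and the row keys (`date`, `area`, `total`) are hypothetical names, and areas missing on either date are skipped rather than flagged.

```python
def unchanged_areas(rows, day_a, day_b):
    """List areas whose cumulative total on day_b is not above that on day_a.

    `rows` is a list of dicts with hypothetical keys 'date', 'area', 'total'.
    Returns (area, total_on_day_a, total_on_day_b) tuples, sorted by area.
    """
    totals = {(r["area"], r["date"]): r["total"] for r in rows}
    flagged = []
    for area in sorted({r["area"] for r in rows}):
        before = totals.get((area, day_a))
        after = totals.get((area, day_b))
        # Skip areas with a gap on either date; they need a separate check.
        if before is None or after is None:
            continue
        if after <= before:
            flagged.append((area, before, after))
    return flagged
```

A long list of flagged areas on the most recent date would be consistent with the latest day simply being incomplete rather than cases genuinely levelling off.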
That is because, among all the data for a range of specimen dates that the government receives every day, only a fairly small number come from yesterday. I'm no expert here, but I guess this means very few tests can return results within a single day. This is one of the reasons for the daily revision, so the data for 27/04/2020 will be more reasonable if you look at it on 29/04/2020 than on 28/04/2020. Btw, the revision can affect data from over a month ago, though not for every region. However, I am not quite sure about tom's current daily update process. I mentioned it in #41, but it seems this revision issue hasn't been fully addressed, judging by the data you attached above. The way I deal with this is to simply overwrite the historical data for England and Wales with the new data on a daily basis.
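The "overwrite on a daily basis" approach could look roughly like the sketch below: every stored row for the nations that publish revised specimen-date data is dropped and replaced by whatever the newest download contains, while other nations are left alone. The function name, the row keys and the default `revised` tuple are assumptions for illustration, not a description of this repo's actual update process.

```python
def overwrite_history(stored, fresh, revised=("England", "Wales")):
    """Replace all stored rows for revised nations with rows from `fresh`.

    `stored` and `fresh` are lists of dicts with hypothetical keys
    'date' (ISO string), 'nation', 'area' and 'total'.
    Rows for nations not listed in `revised` are kept from `stored` as-is.
    """
    kept = [r for r in stored if r["nation"] not in revised]
    replaced = [r for r in fresh if r["nation"] in revised]
    return sorted(
        kept + replaced,
        key=lambda r: (r["date"], r["nation"], r["area"]),
    )
```

Replacing the whole history rather than appending the newest day means that revisions reaching back weeks are picked up automatically, at the cost of losing the figures as they were first published unless the old files are archived.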
Just looking at today's update.
but comparing with the numbers in my previous comment, it can be seen that the numbers for the 27th and 28th have been bumped up from 2782 (both) to 2799 and 2801.
Yes, these latest data could be misleading; dropping the last day's data could help reveal the true trend, in a sense. May I suggest another method if you want to keep the England and Wales historical data consistent with Scotland: concatenate the latest total numbers from each daily file, i.e. the 29/04/2020 cumulative data published on 30/04/2020, the 28/04/2020 data published on 29/04/2020, etc. Through this you can transform specimen-date data into reporting-date data. tom has archived all the old CSV files, which makes it quite easy to do so.
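A minimal sketch of that concatenation, for a single area: from each archived daily file, take only the cumulative total on the latest date that file contains, and string those together. The name `first_published_series` and the input shape (a dict from publication date to a list of `(date, total)` pairs) are assumptions; reading the archived CSV files into that shape is left out.

```python
def first_published_series(archives):
    """Build a reporting-style series from archived daily snapshots.

    `archives` maps a publication date (ISO string) to a list of
    (date, cumulative_total) pairs for ONE area, as found in that day's file.
    For each snapshot only the row with the latest date is used, so later
    revisions of earlier dates are deliberately ignored.
    """
    series = {}
    for published in sorted(archives):
        rows = archives[published]
        if not rows:
            continue  # nothing published for this area that day
        latest_date, total = max(rows, key=lambda row: row[0])
        # Keep the value from the earliest file that reported this date.
        series.setdefault(latest_date, total)
    return sorted(series.items())
```

The resulting series behaves like Scotland's reporting-date figures, in that each point is the total as first published, but it will no longer match the revised specimen-date history in the newest file.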
I note the last commit to data/covid-19-cases-uk.csv is "Update for 2020-04-27 for England, using new process". However, there is no data for any England region for 2020-04-27 in the file, although Scotland and Wales do have data for that date, and England data for 2020-04-26 and before is still present.