get_eurostat fails with correct table ID #293
Comments
I have the same issue with different codes. Any news or hints on how to resolve this?
@bt-hb @CubicTom Thank you for reporting. I tried replicating
@CubicTom which datasets do you get the issue with, or is it with all available datasets? If all the datasets that you fail to download are on the larger side (such as the quarterly data
@pitkant Thanks for your reply! Using version 4.0.0 I can confirm that namq_10_gdp can be downloaded. Two series IDs that are reproducing the error for me are
The last time I used the function successfully with these IDs was March 28th, 2024.
Right, thanks. I think I now understand what the problem is. If you look at that dataset in the Eurostat data browser you can see that it's a rather big one, with 920 different categories for different types of activities alone. The returned object of those two queries is not the data but an XML file, like
This is described in the Eurostat help pages: API - Detailed guidelines - Asynchronous API. When I use the above-mentioned URI for an asynchronous request, I get the following message:
And so on. The eurostat package does not currently have the functionality to handle asynchronous requests. This might get implemented sometime in the future, but I have to be frank that it's not very high on my priority list right now. PRs are of course always welcome.
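Until asynchronous handling exists, a caller could at least check whether a downloaded body is XML rather than tabular data before trying to parse it. The following is a minimal sketch in base R. `looks_like_xml()` is a hypothetical helper, not a function in the eurostat package, and both sample strings are placeholders rather than real Eurostat responses.

```r
# Hypothetical helper, not part of the eurostat package.
# Returns TRUE when a response body appears to be XML instead of tabular data.
looks_like_xml <- function(body) {
  # Ignore leading whitespace, then check for an opening angle bracket.
  trimmed <- trimws(body, which = "left")
  startsWith(trimmed, "<")
}

# Placeholder inputs (assumptions, not actual Eurostat output).
xml_body <- "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<placeholder/>"
tsv_body <- "geo\ttime\tvalue\nFI\t2023\t1"

if (looks_like_xml(xml_body)) {
  message("Server returned XML, possibly an asynchronous response; try filtering the query.")
}
```

A check like this only signals that something other than data came back. It does not poll for or retrieve the asynchronous result.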
@pitkant Thanks for the explanation. Any idea why this has only stopped working recently? I have been using this request regularly for about two or three years now without any issues...
Good question! It may be that Eurostat has changed something on their side. The Asynchronous API guidelines page also seems to be more detailed now than it was when I last checked it. See especially the "More details on asynchronous trigger and thresholds..." collapsible section there:
If you only need a subset of the data then filtering it accordingly might solve your problem. I will have to make sure that a sensible message is displayed to the end user if the server is attempting to give an asynchronous response.
@pitkant If using filters, get_eurostat will not allow me to get the flags. If you have any idea how to retrieve those, I would happily filter the query.
@CubicTom in that case it seems I should hurry with the 4.1 release, which adds the option to make SDMX queries with filters instead of directing filtered queries to API Statistics. Retrieving some big datasets can quite quickly reach "between 500 000 cells and 5 000 000 cells", the level where async kicks in. Above 5 000 000 cells the query needs to be filtered, because otherwise it seems that it won't play nice at all: "if above 5 000 000 cells, a client request error is returned and more filters need to be added to the extraction query to reduce its estimated cost." EDIT: How stringent is this limit of 500 000 cells in practice? It of course depends on the number of values and so on, but also on the number of categories:
Testing with some items from the eurostat TOC, I noticed that datasets that had under 1 million values were handled normally, whereas datasets with over 1 million values returned an XML response. I was writing the faster data.table functionalities with datasets that have 100+ million values in mind, so there has definitely been a policy change with regard to accessing data.
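The thresholds quoted above from the Eurostat guidelines can be written down as a small lookup. This is only an illustrative sketch: `classify_request()` is a hypothetical helper, the estimated cell count has to be supplied by the caller, and the treatment of the exact boundary values is an assumption, since the quoted text does not specify it.

```r
# Hypothetical helper illustrating the quoted thresholds; not part of the eurostat package.
classify_request <- function(cells) {
  stopifnot(is.numeric(cells), all(cells >= 0))
  # Assumption: a count exactly on a boundary falls into the lower category.
  cut(cells,
      breaks = c(-Inf, 5e5, 5e6, Inf),
      labels = c("synchronous", "asynchronous", "needs more filters"))
}

# Three made-up cell counts, one per category.
as.character(classify_request(c(1e5, 1e6, 1e7)))
```

As the test with TOC items suggests, the behaviour observed in practice may not line up exactly with these documented figures, so the result is best treated as a rough indication.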
As posted in #304 : @CubicTom I have received the following message from Eurostat user support:
So I think that the issues are related to big datasets not being cached as they previously were. Excerpt from the Eurostat documentation: "When a data request is initiated, the system first checks if the exact same request was already performed previously and if applicable lookup the data directly from an internal cache and return it as a response." I'm not sure if today's hotfix has renewed the cache for all files or not (probably not, sounds like a process that takes some time) but maybe something has changed for the better now.
@pitkant Thank you so much for this info! Indeed, the old code I had already commented out in favor of another (but waaaaaay slooooower) procedure now works flawlessly again! 🚀 Best wishes and keep up the good work,
Many thanks for the very useful package.
Up until early January (I think prior to the latest update of the namq_10_gdp dataset on 26th January), I was able to pull quarterly GDP data using the code:
data <- eurostat::get_eurostat("namq_10_gdp")
However, now I have been getting the following error message:
Error in eurostat::get_eurostat("namq_10_gdp") : get_eurostat_raw fails with the id namq_10_gdp
I have double checked the dataset ID using search_eurostat and manually via the website and I believe it is correct. https://ec.europa.eu/eurostat/databrowser/view/namq_10_gdp/default/table?lang=en&category=euroind.ei_qna.ei_namq_10_ma
Other datasets download fine -- for example nama_10_gdp works -- and
check_access_to_data()
is TRUE. For info, I am running v4.0 of the eurostat package with RStudio v2023.03.1.