Generator function for getting new records #185
I agree completely that the package should keep its simplicity, and I also like that it is very light on dependencies. Using openalexR is really straightforward (and the refactoring of the parameters discussed in #182 will make it even easier). But the problem is, as you point out, the memory limit and, I assume (I have no test cases), the slowdown when downloading a large number of pages (memory reallocations as the result list grows). I will not go into that second part here; it is an independent discussion, though it comes up later as a by-product. So we are at the memory limit. The primary question, as I see it, is how this package can deal with the memory limit and avoid hitting it. I see two steps here:
I have only looked at `oa_request()` so far, and I assume that the main memory limitation arises when downloading multiple pages, which are then kept in memory in the result list. The most promising approach could be, since the number of pages is known, to allocate the memory for all downloads up front and then just overwrite the individual page results (see the sketch below). If I understand R memory management correctly, this should also be faster, as it does not need to reallocate memory each time, and it would tell you before the retrieval starts whether there is enough unfragmented memory. But there is still the question of how to deal with the case when the memory cannot be allocated at the beginning. There are two (or three?) approaches:
I am generally in favour of automatic functions, so that the user can concentrate on their question (which has to do with the literature) and is not sidetracked by thinking about complicated ways of solving the problem of too many hits in a search or snowballing from too many papers, but this is, as I see it, the most difficult to implement. The easiest is an error message if the memory cannot be allocated, plus a hook function which, if it returns TRUE, continues processing, and if it returns FALSE, skips processing and fetches the next page.
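A minimal sketch of the pre-allocation idea, not openalexR's actual code: since the number of pages is known before downloading starts, the result list can be allocated once and its slots overwritten in place instead of growing the list page by page. `fetch_page()` is a hypothetical stand-in for the per-page request performed inside `oa_request()`.

```r
# Hypothetical sketch: pre-allocate the result list, then fill slots in place.
download_pages <- function(query, total_count, per_page = 200) {
  n_pages <- ceiling(total_count / per_page)

  # Allocate all slots up front; if this fails, it fails before any
  # network requests are made, rather than partway through the download.
  results <- vector("list", n_pages)

  for (i in seq_len(n_pages)) {
    # fetch_page() is a hypothetical helper for a single page request
    results[[i]] <- fetch_page(query, page = i)
  }
  results
}
```

Filling a pre-sized list avoids the repeated copying that happens when a list is grown with `c()` or `append()` inside a loop, which is the slowdown mentioned above.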
Hi Rainer, a few thoughts:
Thanks for the clarifications, @trangdata. Please feel free to close this issue.
Done in #184
Moved part of #182 here.
@rkrug:
@yjunechoe:
@trangdata:
I mostly agree with @yjunechoe. Our vision for the package covers the common use case of accessing a smaller number of OpenAlex records that can fit in memory. I would argue that the use cases involving larger amounts of data can probably be solved with the OpenAlex snapshot download/bulk export.
Nonetheless, @rkrug, while I'm not completely clear on the setHooks/getHooks idea, I have implemented a generator function in #184 with the new optional dependency coro. Would this cover your use case? @yjunechoe any thoughts?
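For reference, a minimal sketch of how a coro-based generator can yield one page of records at a time, so only a single page needs to be held in memory. This is only an illustration of the pattern, not the implementation in #184; `fetch_page()` and `process()` are hypothetical stand-ins.

```r
library(coro)

# Hypothetical sketch: a generator that yields one page per call,
# letting the caller decide how (or whether) to accumulate results.
page_generator <- generator(function(query, n_pages) {
  for (page in seq_len(n_pages)) {
    # fetch_page() stands in for the actual per-page request
    yield(fetch_page(query, page = page))
  }
})

# Usage: process each page as it arrives instead of storing all pages.
gen <- page_generator(my_query, n_pages = 100)
loop(for (page in gen) {
  process(page)  # e.g. write to disk or aggregate incrementally
})
```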