
Support large search result sets #124

Open
reece opened this issue May 5, 2015 · 11 comments
Labels
enhancement New feature or request keep alive exempt issue from staleness checks

Comments

@reece
Member

reece commented May 5, 2015

Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/eutils #124
Migrated by bitbucket-issue-migration on 2016-05-25 23:09:02


NCBI's E-utilities interface supports large search result sets very nicely by sending results in chunks. eutils currently handles only the first chunk.

See http://www.ncbi.nlm.nih.gov/books/NBK25500/#chapter1.Demonstration_Programs
Perl excerpt that generates the continuation URLs:

for($retstart = 0; $retstart < $Count; $retstart += $retmax) {
   my $efetch = "$utils/efetch.fcgi?" .
                "rettype=$report&retmode=text&retstart=$retstart&retmax=$retmax&" .
                "db=$db&query_key=$QueryKey&WebEnv=$WebEnv";
   # ... fetch $efetch and append the chunk to the output ...
}
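The same continuation loop, sketched in Python (a rough translation of the NCBI example above; the base URL and values below are placeholders for illustration, not eutils library calls):

```python
from urllib.parse import urlencode

def continuation_urls(utils, db, query_key, webenv, count, retmax=500, report="abstract"):
    """Yield one efetch URL per page of a WebEnv-stored result set."""
    for retstart in range(0, count, retmax):
        params = urlencode({
            "rettype": report, "retmode": "text",
            "retstart": retstart, "retmax": retmax,
            "db": db, "query_key": query_key, "WebEnv": webenv,
        })
        yield f"{utils}/efetch.fcgi?{params}"

# Placeholder values: 1200 hits at 500 per page -> 3 continuation URLs.
urls = list(continuation_urls("https://eutils.ncbi.nlm.nih.gov/entrez/eutils",
                              "pubmed", 1, "NCID_example", count=1200))
```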

The purpose of this issue is to provide full support for large result sets using WebEnv histories.

Possible implementation:
This seems like an obvious use of Python iterators for results. I'd like to keep eutils.xmlfacades.esearchresults.ESearchResults as parsing-only; however, its interface methods are appropriate. So, one implementation is to write an upper-level class (eutils.esearchresults) that wraps the xmlfacade version, holds a reference to the client, and provides an iterator over results. This upper-level ESearchResults would be passed back to callers in lieu of the xmlfacade version.
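A minimal sketch of that shape (class and method names are hypothetical, not the current eutils API): the upper-level object holds the client and lazily fetches subsequent pages. The fake client below stands in for the real one only to make the sketch self-contained:

```python
# Hypothetical upper-level result set that pages transparently.
# Assumes a client.esearch(db, term, retstart=..., retmax=...) call returning
# an object with .ids and .count -- an assumed shape, not the released API.

class PagedESearchResults:
    def __init__(self, client, db, term, retmax=250):
        self._client, self._db, self._term, self._retmax = client, db, term, retmax

    def __iter__(self):
        retstart = 0
        while True:
            page = self._client.esearch(self._db, self._term,
                                        retstart=retstart, retmax=self._retmax)
            if not page.ids:
                return
            yield from page.ids
            retstart += len(page.ids)
            if retstart >= page.count:
                return

# Stand-in client so the sketch runs without the network.
class _FakePage:
    def __init__(self, ids, count):
        self.ids, self.count = ids, count

class _FakeClient:
    _ALL = list(range(7))
    def esearch(self, db, term, retstart=0, retmax=250):
        return _FakePage(self._ALL[retstart:retstart + retmax], len(self._ALL))

ids = list(PagedESearchResults(_FakeClient(), "pubmed", "tp53", retmax=3))
# iterates all 7 ids across three pages of at most 3
```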

@reece reece added major enhancement New feature or request labels May 26, 2016
@reece reece added this to the 0.1.0 milestone May 26, 2016
@reece reece removed this from the 0.1.0 milestone Jul 23, 2016
@moritzschaefer

Is this still on the radar?

@reece
Member Author

reece commented Oct 30, 2018

It's certainly still desirable. No ETA. I'm happy to take a PR for this issue.

@dhimmel

dhimmel commented Sep 10, 2019

Ah this is a real deal-breaker to an otherwise nice package! Although I am glad the package did show the following warning:

WARNING:eutils._internal.client:NCBI found 13241 results, but we truncated the reply at 250 results; see https://github.com/biocommons/eutils/issues/124/

If it is any guidance, a few years ago I made this implementation to deal with the pagination. Anyway, I don't think I'll have the time soon to make a PR with this contribution, but will keep it on my radar.

@leipzig

leipzig commented Feb 14, 2020

How do we shut off the warnings? warnings.simplefilter("ignore") is not effective.

@reece
Member Author

reece commented Feb 17, 2020

That command suppresses warnings made through the warnings module.

The messages that you're seeing are emitted through the logging module, so the warnings module can't suppress them; you'd have to adjust the logging configuration instead.

If you're running from the command line, the best/easiest workaround is probably to redirect stderr to a separate file (or /dev/null).
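That said, the logging module does let you raise the threshold for one specific logger (logger name taken from the warning text quoted above; this is a workaround sketch, not an official eutils option):

```python
import logging

# Suppress only the truncation warning's logger; other output is unaffected.
logging.getLogger("eutils._internal.client").setLevel(logging.ERROR)
```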

@PazBazak

Is this repo still maintained?

@reece
Member Author

reece commented Dec 27, 2022

I don't need eutils in my work at the moment, so I'm not adding new features or fixing bugs. But, I will gladly accept PRs if you have something to contribute.

@Sdamirsa

Sdamirsa commented Oct 6, 2023

I tried to add custom "retstart" and "retmax" variables to create a loop and fetch the results by paging through my PubMed search ids. After five hours I still couldn't make it work, but I am sure that we can add retstart and retmax as custom variables. In VS Code, you can ctrl+click on xx.esearch to see the code behind it, which is:

def esearch(self, db, term, retmax=250, retstart=0):
    """query the esearch endpoint"""
    # retstart/retmax likely belong in the request args dict passed to the QueryService
    esr = ESearchResult(self._qs.esearch({"db": db, "term": term,
                                          "retmax": retmax, "retstart": retstart}))

    if esr.count > retmax:
        logger.warning("NCBI found {esr.count} results, but we truncated the reply at {esr.retmax}"
                       " results; see https://github.com/biocommons/eutils/issues/124/".format(esr=esr))
    return esr

And you can ctrl+click on ESearchResult to see the code behind it:
class ESearchResult(Base):
    # def __init__(self, xml_string, retmax=250, retstart=0):
    #     self._xml_root = ET.fromstring(xml_string)
    #     self._retmax = retmax
    #     self._retstart = retstart

    _root_tag = "eSearchResult"

    @property
    def count(self):
        return int(self._xml_root.find("Count").text)

    @property
    def retmax(self):
        return int(self._xml_root.find("RetMax").text)

    # @retmax.setter
    # def retmax(self, value):
    #     self._retmax = value
    #     self._xml_root.find("RetMax").text = str(value)

    @property
    def retstart(self):
        return int(self._xml_root.find("RetStart").text)

    # @retstart.setter
    # def retstart(self, value):
    #     self._retstart = value
    #     self._xml_root.find("RetStart").text = str(value)

    @property
    def ids(self):
        return [int(id) for id in self._xml_root.xpath("/eSearchResult/IdList/Id/text()")]

    @property
    def webenv(self):
        try:
            return self._xml_root.find("WebEnv").text
        except AttributeError:
            return None
You can see my code trying to set retmax and retstart as modifiable variables, hoping to download a big batch of articles by looping through PubMed results:

i = 0          # retstart offset; count = total hits from an initial esearch
while i <= count:
    ai = ec.esearch(db='pubmed', term=search_term, retmax=400, retstart=i)
    i += 400

I hope someone with more experience can put 1 hour into this and solve this issue, which will help so many people like me :) Cheers to this future hero :)
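For anyone landing here later, a hedged sketch of the complete loop: assuming esearch accepted retstart/retmax as in the snippet above (the released client does not take retstart), first ask for the total count, then page until it is exhausted. The fake esearch below exists only to make the sketch runnable:

```python
def collect_ids(esearch, db, term, retmax=400):
    """Collect all ids by paging; `esearch` is any callable with the assumed
    (db, term, retmax, retstart) signature from the snippet above."""
    first = esearch(db=db, term=term, retmax=retmax, retstart=0)
    ids = list(first.ids)
    for retstart in range(retmax, first.count, retmax):
        ids.extend(esearch(db=db, term=term, retmax=retmax, retstart=retstart).ids)
    return ids

# Stand-in for illustration: pretend the search matched 1000 PubMed ids.
class _Page:
    def __init__(self, ids, count):
        self.ids, self.count = ids, count

def _fake_esearch(db, term, retmax, retstart):
    all_ids = list(range(1000))
    return _Page(all_ids[retstart:retstart + retmax], len(all_ids))

ids = collect_ids(_fake_esearch, "pubmed", "tp53", retmax=400)
# 1000 ids collected in pages of 400 (retstart 0, 400, 800)
```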


github-actions bot commented Jan 5, 2024

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Jan 5, 2024

This issue was closed because it has been stalled for 7 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 12, 2024
@jsstevenson jsstevenson added the keep alive exempt issue from staleness checks label Apr 4, 2024
@jsstevenson
Contributor

Just hit this issue myself -- I'm reopening this issue and will get a PR up... sometime.

@jsstevenson jsstevenson reopened this Apr 4, 2024

7 participants