Harvest

There is a celerybeat running every day at:

EST Timezone (Eastern Standard Time) UTC -5
EDT Timezone (Eastern Daylight Time) UTC -4

How to check harvest?

The easiest way to check the harvest is the Grafana dashboard at https://inspire-prod-grafana.web.cern.ch. A Grafana alert also posts a message to the ops/harvest topic on Zulip.

How to harvest?

We harvest many collections (sets) from arXiv, but a single paper can be harvested as well. The collections relevant to INSPIRE are:

cs
econ
eess
math
physics
physics:astro-ph
physics:cond-mat
physics:gr-qc
physics:hep-ex
physics:hep-lat
physics:hep-ph
physics:hep-th
physics:math-ph
physics:nlin
physics:nucl-ex
physics:nucl-th
physics:physics
physics:quant-ph
q-bio
q-fin
stat
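
The list above is passed to the crawler as a single comma-separated `sets` value. As a minimal sketch, the value can be built from a one-per-line list so that it is easier to review and edit. The variable names `SETS` and `sets_csv` are illustrative assumptions, not part of any INSPIRE tooling.

```shell
# Sets to harvest, one per line (copied from the list above).
SETS="cs
econ
eess
math
physics
physics:astro-ph
physics:cond-mat
physics:gr-qc
physics:hep-ex
physics:hep-lat
physics:hep-ph
physics:hep-th
physics:math-ph
physics:nlin
physics:nucl-ex
physics:nucl-th
physics:physics
physics:quant-ph
q-bio
q-fin
stat"

# Join the lines with commas to get the value for --kwarg 'sets=...'.
sets_csv=$(printf '%s\n' "$SETS" | paste -sd, -)

# Print the argument for review; nothing is scheduled here.
echo "--kwarg 'sets=${sets_csv}'"
```

The sketch only prints the argument, so it can be checked by eye before it is pasted into a real command.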

Harvest by collection

$ ssh inspire-prod-crawler1
$ inspirehep crawler schedule arXiv article --kwarg 'from_date=2018-12-06' --kwarg 'until_date=2018-12-07' --kwarg 'sets=cs,econ,eess,math,physics,physics:astro-ph,physics:cond-mat,physics:gr-qc,physics:hep-ex,physics:hep-lat,physics:hep-ph,physics:hep-th,physics:math-ph,physics:nlin,physics:nucl-ex,physics:nucl-th,physics:physics,physics:quant-ph,q-bio,q-fin,stat'

Note: always set both from_date and until_date; together they bound the date range that is harvested.

The schedule command triggers a harvest. You can check the tasks in the queue (RabbitMQ) at any time with the following command:

$ ssh inspire-prod-broker1
$ rabbitmqctl -p inspire list_queues | grep harvests
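
To get a single number instead of reading the raw listing, the output can be piped through a small filter. This is a sketch under the assumption that `list_queues` prints its default columns (queue name, then message count); `pending_harvests` is a hypothetical helper name.

```shell
# Sum the message counts of all queues whose name contains "harvests".
# Reads `rabbitmqctl list_queues` output on stdin, assuming the default
# layout: queue name in the first column, message count in the last.
pending_harvests() {
    awk '$1 ~ /harvests/ && $NF ~ /^[0-9]+$/ { total += $NF }
         END { print total + 0 }'
}

# Possible usage on the broker host:
#   rabbitmqctl -p inspire list_queues | pending_harvests
```

Banner lines and non-matching queues are skipped because their first field does not contain "harvests" or their last field is not a number. A result of 0 means no harvest tasks are waiting in the queue.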

Harvest a single paper

$ inspirehep crawler schedule arXiv_single article --kwarg 'identifier=oai:arXiv.org:1604.05726'
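
The identifier in the command above is the arXiv OAI identifier, which is the arXiv ID prefixed with `oai:arXiv.org:`. A small helper can build it from a plain arXiv ID; `arxiv_oai_id` is a hypothetical name used for illustration only.

```shell
# Build the OAI identifier for an arXiv ID.
# Accepts "1604.05726" or "arXiv:1604.05726"; the optional prefix is dropped.
arxiv_oai_id() {
    arxiv_id="${1#arXiv:}"
    printf 'oai:arXiv.org:%s\n' "$arxiv_id"
}

# Possible usage:
#   inspirehep crawler schedule arXiv_single article \
#       --kwarg "identifier=$(arxiv_oai_id 1604.05726)"
```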

You can check the logs by running:

$ inspirehep crawler job list
$ inspirehep crawler job logs <JOB_ID>
