This git repository contains metadata for Akvaplan-niva publications with a DOI.
The Deno Deploy service https://dois.deno.dev/ is connected to the main branch
of this repository and is the data source for https://akvaplan.no/en/publications.
The metadata is collected by an automated pipeline and stored as NDJSON in the slim
directory.
The pipeline finds Akvaplan-niva publications in Crossref, CRISTIN, and OpenAlex.
./bin/doi-pipeline
To add DOIs manually, add or edit an NDJSON file matching doi/*/*.ndjson,
then run the pipeline.
Inspect and push approved changes in slim/*.ndjson.
Update the KV store in the data service:
$ curl --netrc -XPOST https://dois.deno.dev/ingest
{"ingested":1669,"total":1669,"elapsed":198.636,"start":"2023-07-11T13:16:13.589Z","end":"2023-07-11T13:19:32.225Z","ok":true}
The DOI pipeline consists of the following steps:
First, create a list of unique DOIs
- Extract DOIs from raw text references
- Find DOIs in Crossref, CRISTIN, and OpenAlex
- Add these to the NDJSON-formatted DOIs in doi
- De-duplicate the DOIs
(The raw text references were augmented with DOIs from Crossref's SimpleTextQuery when these were missing in the original source.)
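
The extraction and de-duplication steps above could look roughly like the following sketch. It is not taken from the pipeline's source: the function names, the regular expression, and the sample DOIs are all assumptions made for illustration. It relies on DOIs being case-insensitive, so values are lowercased before comparison.

```typescript
// Hypothetical helpers; the real pipeline may extract and normalize differently.

// Common DOI shape: "10.", a 4-9 digit registrant prefix, "/", then a suffix.
const DOI_PATTERN = /10\.\d{4,9}\/[^\s"<>]+/g;

// Lowercase the DOI and strip punctuation that often trails it in running text.
function normalizeDoi(doi: string): string {
  return doi.trim().replace(/[.,;)\]]+$/, "").toLowerCase();
}

// Find every DOI-like string in a raw text reference.
function extractDois(text: string): string[] {
  return (text.match(DOI_PATTERN) ?? []).map(normalizeDoi);
}

// De-duplicate while keeping first-seen order.
function uniqueDois(dois: string[]): string[] {
  return Array.from(new Set(dois.map(normalizeDoi)));
}

// Made-up reference text with the same DOI written twice in different casing.
const sampleText =
  "See https://doi.org/10.1234/Example.One. Also doi:10.1234/example.one, and (10.5678/example.two)";
const sampleDois = uniqueDois(extractDois(sampleText));
```

Here `sampleDois` holds two entries, since the first two mentions normalize to the same DOI.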
For each DOI
- Fetch "works" metadata from Crossref API
- Create slim metadata from Crossref works
- Fetch PDF location from Unpaywall API
Notice: The pipeline aggressively caches all HTTP responses, calling the APIs just once per DOI across all invocations. On Linux, the cache is located in $HOME/.cache/deno/https/api.*.org.
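
The fetch-once behaviour can be illustrated with a small wrapper. This is only a sketch of the idea, assuming an in-memory `Map`; the pipeline itself keeps its cache on disk so that it survives between invocations, and `cachedFetcher` and `Fetcher` are hypothetical names.

```typescript
// Hypothetical sketch: an in-memory stand-in for the pipeline's on-disk cache.
type Fetcher = (url: string) => Promise<string>;

// Wrap a fetcher so each URL is requested at most once.
// Storing the promise (not the resolved value) also collapses concurrent calls.
function cachedFetcher(
  fetcher: Fetcher,
  cache: Map<string, Promise<string>> = new Map(),
): Fetcher {
  return (url: string) => {
    let pending = cache.get(url);
    if (pending === undefined) {
      pending = fetcher(url);
      cache.set(url, pending);
    }
    return pending;
  };
}

// Usage with a fake fetcher that counts how often it is called.
let calls = 0;
const fakeFetch: Fetcher = (url) => {
  calls += 1;
  return Promise.resolve(`body of ${url}`);
};
const fetchOnce = cachedFetcher(fakeFetch);
const first = fetchOnce("https://api.example.org/works/10.1234/example.one");
const second = fetchOnce("https://api.example.org/works/10.1234/example.one");
```

Both calls return the same promise and the underlying fetcher runs once. A failed request would be cached as well in this sketch, so a real implementation would likely evict rejected entries.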
Finally
- Add PDF URL to slim metadata
- Partition slim metadata, creating one file per year in slim
- Show summary counts
- Verify SHA checksums
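
The per-year partitioning can be sketched as below. The `SlimRecord` shape is an assumption (the actual slim fields are not described here), as are the function names and the sample records; the sketch only shows grouping by year and serializing each group as NDJSON, one JSON object per line.

```typescript
// Hypothetical record shape; the real slim metadata has its own fields.
interface SlimRecord {
  doi: string;
  published: string; // assumed to be an ISO date such as "2023-01-15"
}

// Group records by the four-digit year at the start of `published`.
function partitionByYear(records: SlimRecord[]): Map<string, SlimRecord[]> {
  const byYear = new Map<string, SlimRecord[]>();
  for (const record of records) {
    const year = record.published.slice(0, 4);
    const bucket = byYear.get(year) ?? [];
    bucket.push(record);
    byYear.set(year, bucket);
  }
  return byYear;
}

// Serialize records as NDJSON: one JSON document per line, newline-terminated.
function toNdjson(records: SlimRecord[]): string {
  return records.map((r) => JSON.stringify(r)).join("\n") + "\n";
}

// Made-up sample data.
const sampleRecords: SlimRecord[] = [
  { doi: "10.1234/a", published: "2022-03-01" },
  { doi: "10.1234/b", published: "2023-01-15" },
  { doi: "10.1234/c", published: "2022-11-30" },
];
const partitions = partitionByYear(sampleRecords);
```

Each entry of `partitions` could then be written to a file named after its year, for example with `toNdjson(partitions.get("2022") ?? [])` as the file content.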