cnrdh edited this page Jun 12, 2017 · 7 revisions

Several endpoints run a special _file service for uploading and downloading file attachments.

Downloading (large) file collections

The file server can return all attached files as a zip or tar.gz archive, but on-the-fly packing is not viable for very large file collections.

Download attached files sequentially

Pipe the list of filenames (extracted from the JSON response of the _file BASE URI) to wget:

BASE="https://api.npolar.no/dataset/c3db82e3-adfa-413c-9523-5b3fb09708ed/_file" # Example BASE URI
UUID=$(basename "${BASE%/*}") && mkdir -p "$UUID" && cd "$UUID" # Optional: extract the UUID and use it as the download directory
wget -qO- "$BASE" | grep -Po '"filename"\s*:\s*"\K([^"]*)' | wget -nc --base="$BASE/" -i-
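The one-liner above relies on grep -P (PCRE), which some grep builds lack (for example the BSD grep shipped with macOS). The following is a minimal sketch of the same pipeline using only grep -o and sed. The function names json_values and download_files are hypothetical helpers, not part of the API, and the sketch assumes the _file response contains plain "filename": "..." string pairs without escaped double quotes.

```shell
# Print one value per line for the given JSON key, reading JSON on stdin.
# This is pattern matching, not real JSON parsing: it assumes string values
# that contain no escaped double quotes.
json_values() {
  grep -o "\"$1\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" |
    sed -e 's/"$//' -e 's/^.*"//'   # drop the closing quote, then everything up to the opening quote
}

# Download every file listed by the _file endpoint into the current directory.
# -nc skips files that already exist, so the command can be re-run after an interruption.
download_files() {
  wget -qO- "$1" | json_values filename | wget -nc --base="$1/" -i-
}

# Usage (not run here because it needs network access):
#   download_files "$BASE"
```

Keeping the extraction in its own function makes it reusable for other keys in the same response, such as md5sum.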

Check file integrity

Every uploaded file has an MD5 checksum, so the downloaded copies can be verified with md5sum:

# Run in the folder containing the downloaded files
META=$(wget -qO- "$BASE")
echo "$META" | grep -Po '"md5sum"\s*:\s*"\K([^"]*)' > /tmp/md5
echo "$META" | grep -Po '"filename"\s*:\s*"\K([^"]*)' > /tmp/filename
paste /tmp/md5 /tmp/filename > md5sums.txt
md5sum -c md5sums.txt
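Pairing checksums with filenames by line position silently goes wrong if any entry lacks one of the two keys. The sketch below adds a count check and writes the list in the "checksum, two spaces, filename" layout that md5sum itself produces. The function names json_values and md5_list are hypothetical, and the sketch assumes each file entry in the response carries both a "filename" and an "md5sum" string.

```shell
# Print one value per line for the given JSON key, reading JSON on stdin.
# Pattern matching only: assumes string values without escaped double quotes.
json_values() {
  grep -o "\"$1\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" |
    sed -e 's/"$//' -e 's/^.*"//'
}

# Read the _file JSON on stdin and print "checksum  filename" lines.
# Returns 1 if the number of checksums and filenames differ.
md5_list() {
  meta="$(cat)"
  tmp_md5="$(mktemp)"
  tmp_name="$(mktemp)"
  printf '%s\n' "$meta" | json_values md5sum > "$tmp_md5"
  printf '%s\n' "$meta" | json_values filename > "$tmp_name"
  n_md5=$(($(wc -l < "$tmp_md5")))
  n_name=$(($(wc -l < "$tmp_name")))
  if [ "$n_md5" -ne "$n_name" ]; then
    echo "md5_list: $n_md5 checksums but $n_name filenames" >&2
    rm -f "$tmp_md5" "$tmp_name"
    return 1
  fi
  # paste joins the two lists with a tab; awk rewrites that as two spaces.
  paste "$tmp_md5" "$tmp_name" | awk -F '\t' '{ printf "%s  %s\n", $1, $2 }'
  rm -f "$tmp_md5" "$tmp_name"
}

# Usage (not run here because it needs network access):
#   wget -qO- "$BASE" | md5_list > md5sums.txt && md5sum -c md5sums.txt
```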