This set of scripts simulates an archival system where datasets get archived/retrieved on demand.
It listens to jobs posted to RabbitMQ by the Backend; these instruct the Archive Mock to run the script that simulates the archive system, for both archival and retrieval. After a job is retrieved from RabbitMQ, it is executed. An archival job is executed as follows:
- Mark job as being handled (`jobStatusMessage: "inProgress"`)
- Mark dataset(s) as being archived (`archivable: false`, `archiveStatusMessage: "started"`)
- Add DataBlocks
- Mark dataset(s) as archived (`retrievable: true`, `archiveStatusMessage: "datasetOnArchiveDisk"`)
- Mark job as done (`jobStatusMessage: "finishedSuccessful"`)
Note: the way the DataBlocks (not OrigDatablocks) are generated in the mock does not necessarily correspond to any specific implementation of an archival system; it is just an example.
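As a rough illustration of the steps above, an archival handler could look like the following sketch. The endpoint paths, payload shapes, the `datasetList` field name and the `patch_job`/`patch_dataset` helpers are assumptions for illustration, not the mock's actual implementation.

```python
import urllib.parse
import requests

BASE_URL = "http://localhost/api/v3"   # assumed backend base URL (--scicat-url)
TOKEN = "<access token>"               # from --scicat-token or a prior login
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def patch_job(job_id, fields):
    # Hypothetical helper: update status fields of a job on the backend.
    requests.patch(f"{BASE_URL}/Jobs/{job_id}", json=fields,
                   headers=HEADERS).raise_for_status()

def patch_dataset(pid, fields):
    # Hypothetical helper: update lifecycle fields of a dataset.
    # PIDs contain slashes, so they are URL-encoded for the endpoint.
    encoded = urllib.parse.quote_plus(pid)
    requests.patch(f"{BASE_URL}/Datasets/{encoded}", json=fields,
                   headers=HEADERS).raise_for_status()

def handle_archival_job(job):
    # 1. Mark the job as being handled.
    patch_job(job["id"], {"jobStatusMessage": "inProgress"})
    for pid in job["datasetList"]:  # name of the dataset list field is assumed
        # 2. Mark the dataset as being archived.
        patch_dataset(pid, {"datasetlifecycle": {
            "archivable": False, "archiveStatusMessage": "started"}})
        # 3. Add example DataBlocks (mock contents, see the note above;
        #    endpoint and payload are assumptions).
        requests.post(f"{BASE_URL}/Datasets/{urllib.parse.quote_plus(pid)}/datablocks",
                      json={"size": 0, "packedSize": 0, "dataFileList": []},
                      headers=HEADERS).raise_for_status()
        # 4. Mark the dataset as archived and retrievable.
        patch_dataset(pid, {"datasetlifecycle": {
            "retrievable": True, "archiveStatusMessage": "datasetOnArchiveDisk"}})
    # 5. Mark the job as done.
    patch_job(job["id"], {"jobStatusMessage": "finishedSuccessful"})
```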
A similar script handles retrieval:
- Mark job as being handled (`jobStatusMessage: "inProgress"`)
- Mark dataset(s) as being retrieved (retrieve status message: `"started"`)
- Mark dataset(s) as retrieved (retrieve status message: `"datasetReceived"`)
- Mark job as done (`jobStatusMessage: "finishedSuccessful"`)
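A corresponding retrieval handler could look like this sketch, reusing the hypothetical `patch_job`/`patch_dataset` helpers from above. The exact lifecycle field name used for the retrieve status message is an assumption.

```python
def handle_retrieval_job(job):
    # 1. Mark the job as being handled.
    patch_job(job["id"], {"jobStatusMessage": "inProgress"})
    for pid in job["datasetList"]:
        # 2. Mark the dataset as being retrieved (field name assumed).
        patch_dataset(pid, {"datasetlifecycle": {"retrieveStatusMessage": "started"}})
        # 3. Mark the dataset as retrieved.
        patch_dataset(pid, {"datasetlifecycle": {"retrieveStatusMessage": "datasetReceived"}})
    # 4. Mark the job as done.
    patch_job(job["id"], {"jobStatusMessage": "finishedSuccessful"})
```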
It requires access to a RabbitMQ and a SciCat Backend instance, with the latter configured to post jobs to the former. It depends on `pika` for RabbitMQ communication.
```
python3 job_handler_mq_client_mock.py [PARAMS]
```
After starting, the script waits for any archival/retrieval jobs posted to RabbitMQ and attempts to handle them. Datasets can get stuck in a non-archivable and non-retrievable state if the Job or Dataset contains unexpected (as in, out-of-spec) elements.
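On the message-queue side, a pika consumer along these lines could receive the job messages. The queue name, the message schema and the dispatch to the `handle_archival_job`/`handle_retrieval_job` sketches above are assumptions rather than the mock's actual code.

```python
import json
import pika  # RabbitMQ client library the mock depends on

# Connection details would come from --rabbitmq-url / --rabbitmq-user / --rabbitmq-password.
credentials = pika.PlainCredentials("guest", "guest")
connection = pika.BlockingConnection(
    pika.ConnectionParameters(host="localhost", credentials=credentials))
channel = connection.channel()

QUEUE = "client.jobs.write"  # assumed queue name; check your backend configuration
channel.queue_declare(queue=QUEUE, durable=True)

def on_message(ch, method, properties, body):
    job = json.loads(body)
    # Dispatch based on the job type posted by the backend (field name assumed).
    if job.get("type") == "archive":
        handle_archival_job(job)
    elif job.get("type") == "retrieve":
        handle_retrieval_job(job)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue=QUEUE, on_message_callback=on_message)
channel.start_consuming()
```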
List of parameters:
Parameter | Description | Required |
---|---|---|
--scicat-url | The base URL of the backend (usually ends in /api/v_ ) | Yes |
--scicat-user | The user to use for connecting to SciCat | No (can use token instead) |
--scicat-password | The user's password for connecting to SciCat | No (can use token instead) |
--scicat-token | The token to use for connecting to SciCat | No (can use user and password instead) |
--rabbitmq-url | RabbitMQ API URL | Yes |
--rabbitmq-user | The username used for accessing RabbitMQ | Yes |
--rabbitmq-password | The password used for accessing RabbitMQ | Yes |
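A minimal argument-parsing sketch for these flags might look like the following. It is an illustration of the interface described in the table, not the script's actual parser.

```python
import argparse

parser = argparse.ArgumentParser(
    description="Mock archive system handling SciCat archival/retrieval jobs")
parser.add_argument("--scicat-url", required=True,
                    help="Base URL of the backend, usually ending in /api/v_")
parser.add_argument("--scicat-user", help="SciCat user (alternative to --scicat-token)")
parser.add_argument("--scicat-password", help="Password for the SciCat user")
parser.add_argument("--scicat-token", help="SciCat token (alternative to user/password)")
parser.add_argument("--rabbitmq-url", required=True, help="RabbitMQ API URL")
parser.add_argument("--rabbitmq-user", required=True, help="Username for RabbitMQ")
parser.add_argument("--rabbitmq-password", required=True, help="Password for RabbitMQ")
args = parser.parse_args()
```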
It still requires some kind of access to a SciCat Backend and a RabbitMQ instance; you can use scicatlive if you do not have instances of those two to test with. The following environment variables can be set instead of passing the corresponding parameters:
Variable | Corresponding Parameter |
---|---|
SCI_URL | --scicat-url |
SCI_USER | --scicat-user |
SCI_PW | --scicat-password |
TOKEN | --scicat-token |
RMQ_URL | --rabbitmq-url |
RMQ_USER | --rabbitmq-user |
RMQ_PW | --rabbitmq-password |
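When running in an environment where these variables are set (for example in a container), the configuration could be read roughly as in this sketch; the exact precedence between variables and command-line parameters is an assumption.

```python
import os

# Environment variables mirror the command-line parameters listed above.
config = {
    "scicat_url": os.environ.get("SCI_URL"),
    "scicat_user": os.environ.get("SCI_USER"),
    "scicat_password": os.environ.get("SCI_PW"),
    "scicat_token": os.environ.get("TOKEN"),
    "rabbitmq_url": os.environ.get("RMQ_URL"),
    "rabbitmq_user": os.environ.get("RMQ_USER"),
    "rabbitmq_password": os.environ.get("RMQ_PW"),
}
```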
- Use `python scicat_ingestion.py -b "http://catamel.localhost/api/v3/" -u admin -p 2jf70TPNZsS --path [PATH TO DATASET]` to ingest a dataset. NOTE: You must add a `transfer.yaml` file (an example is included in this repository) to the dataset, which will serve as a metadata source for the ingestor.
- Use `python job_create.py -b "http://catamel.localhost/api/v3/" -u admin -p 2jf70TPNZsS --dataset-id [PID of the previously created dataset] --job-type [archival / retrieval]` to create an archival/retrieval job (new datasets can only be archived initially, and archival can only happen once).
- The job will almost immediately be handled by the archival mock script.
- You can check the `datasetlifecycle` of your dataset to see that it has been archived/retrieved, using the backend's Swagger UI with the `GET /Datasets/{id}` endpoint (see the sketch below).
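If you prefer the command line over the Swagger UI, a request along these lines would fetch the dataset and print its lifecycle. The URL encoding of the PID and the token handling are assumptions about the backend's API.

```python
import urllib.parse
import requests

BASE_URL = "http://catamel.localhost/api/v3"
TOKEN = "<access token>"
pid = "<PID of the previously created dataset>"

# PIDs contain slashes, so they must be URL-encoded for the /Datasets/{id} endpoint.
encoded_pid = urllib.parse.quote_plus(pid)
r = requests.get(f"{BASE_URL}/Datasets/{encoded_pid}",
                 params={"access_token": TOKEN})  # newer backends may expect an Authorization: Bearer header instead
r.raise_for_status()
print(r.json().get("datasetlifecycle"))
```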
You can use the `dataset_ingest_job_mock.py` script to ingest and archive datasets by providing it a source folder of datasets (each dataset in its own subfolder); each dataset must have a `transfer.yaml` as a source of metadata (an example is provided with this repository). This is mostly useful as an API test: it does not replicate the job message queue, but it also removes the dependency on RabbitMQ (SciCat is obviously still needed). The command to use it is the following:
```
python dataset_ingest_job_mock.py -b [backend address] -u [username] -p [password] --path [path to dataset collection]
```
Optionally, one can add `--check-dataset-duplication` to avoid ingesting the same datasets multiple times (see the sketch below).
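For orientation, the expected folder layout and ingestion loop might look roughly like this sketch. The `transfer.yaml` handling, the duplication check and the placeholder helpers are assumptions for illustration, not the script's actual logic.

```python
import os
import yaml  # PyYAML, used here to read the transfer.yaml metadata files

def dataset_already_ingested(dataset_dir):
    # Placeholder for --check-dataset-duplication: the real script would ask
    # the backend whether this dataset already exists (assumption).
    return False

def ingest_and_archive(dataset_dir, metadata):
    # Placeholder: the real script would create the dataset in SciCat and
    # run the archival steps for it.
    print(f"Would ingest {dataset_dir} with metadata keys: {list(metadata or {})}")

def ingest_collection(collection_path, check_duplication=False):
    # Each dataset lives in its own subfolder of the collection folder.
    for name in sorted(os.listdir(collection_path)):
        dataset_dir = os.path.join(collection_path, name)
        if not os.path.isdir(dataset_dir):
            continue
        transfer_file = os.path.join(dataset_dir, "transfer.yaml")
        if not os.path.isfile(transfer_file):
            print(f"Skipping {name}: no transfer.yaml found")
            continue
        with open(transfer_file) as f:
            metadata = yaml.safe_load(f)  # metadata source for the ingestor
        if check_duplication and dataset_already_ingested(dataset_dir):
            print(f"Skipping {name}: already ingested")
            continue
        ingest_and_archive(dataset_dir, metadata)
```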