
ArtifactDB/SewerRat-py


Python interface to the SewerRat API


Pretty much as it says on the tin: this package provides a Python client for the SewerRat API. It assumes that users of the sewerrat client and the SewerRat API itself are accessing the same shared filesystem, which is typically the case on high-performance computing clusters in scientific institutions. To demonstrate, let's spin up a mock SewerRat instance:

import sewerrat as sr
_, url = sr.start_sewerrat()

Let's mock up a directory of metadata files:

import tempfile
import os

mydir = tempfile.mkdtemp()
with open(os.path.join(mydir, "metadata.json"), "w") as handle:
    handle.write('{ "first": "foo", "last": "bar" }')

os.mkdir(os.path.join(mydir, "diet"))
with open(os.path.join(mydir, "diet", "metadata.json"), "w") as handle:
    handle.write('{ "fish": "barramundi" }')

We can then register this directory via the register() function. Similarly, we can deregister it with deregister(mydir, url=url).

# Only indexing metadata files named 'metadata.json'.
sr.register(mydir, names=["metadata.json"], url=url)
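To get a feel for what the names= filter selects, here is a minimal stdlib-only sketch that lists the files such a registration would pick up. It assumes that subdirectories are scanned as well, as the diet/ example suggests. The helper find_metadata_files() and the extra notes.txt file are hypothetical and exist only for this illustration; this is not how SewerRat itself does its indexing.

```python
import os
import tempfile

def find_metadata_files(root, names):
    """Hypothetical helper: list files under `root` whose basename is in `names`."""
    wanted = set(names)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            if fname in wanted:
                # Report paths relative to the directory being registered.
                hits.append(os.path.relpath(os.path.join(dirpath, fname), root))
    return sorted(hits)

# Rebuild the mock directory so this block runs on its own.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "metadata.json"), "w") as handle:
    handle.write('{ "first": "foo", "last": "bar" }')
os.mkdir(os.path.join(demo_dir, "diet"))
with open(os.path.join(demo_dir, "diet", "metadata.json"), "w") as handle:
    handle.write('{ "fish": "barramundi" }')
# An unrelated file that the name filter should skip.
with open(os.path.join(demo_dir, "notes.txt"), "w") as handle:
    handle.write("not a metadata file")

found = find_metadata_files(demo_dir, ["metadata.json"])
print(found)
```

Only the two metadata.json files are listed; notes.txt is ignored because its name is not in the list.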

To search the index, we use the query() function to perform free-text searches. This does not require filesystem access and can be done remotely.

sr.query(url, "foo")
sr.query(url, "bar*") # partial match to 'bar...'
sr.query(url, "bar* AND foo") # boolean operations
sr.query(url, "fish:bar*") # match in the 'fish' field
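For intuition about the wildcard and field-scoped terms above, the following sketch applies a rough local approximation to the two mock metadata files. Everything here is an assumption made for illustration: tokens() and local_match() are hypothetical helpers, the tokenization rule is a guess, and only flat string fields and single terms are handled (no AND). The real matching is done by the SewerRat API and may differ in detail.

```python
import fnmatch
import json
import re

def tokens(value):
    """Split a string into lowercase alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", value.lower())

def local_match(metadata, term):
    """Hypothetical stand-in for one search term, optionally 'field:pattern'."""
    field, sep, pattern = term.partition(":")
    if not sep:
        # No field prefix, so the whole term is the pattern.
        field, pattern = None, term
    pattern = pattern.lower()
    for key, value in metadata.items():
        if field is not None and key != field:
            continue
        if isinstance(value, str) and any(
            fnmatch.fnmatchcase(tok, pattern) for tok in tokens(value)
        ):
            return True
    return False

root_meta = json.loads('{ "first": "foo", "last": "bar" }')
diet_meta = json.loads('{ "fish": "barramundi" }')

for term in ["foo", "bar*", "fish:bar*"]:
    print(term, local_match(root_meta, term), local_match(diet_meta, term))
```

Under these assumptions, "bar*" matches both files ('bar' and 'barramundi'), while "fish:bar*" only matches the file that has a 'fish' field.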

We can also search on the user, path components, and time of creation:

sr.query(url, user="LTLA") # created by myself
sr.query(url, path="diet/") # path has 'diet/' in it

import time
sr.query(url, after=time.time() - 3600) # created less than 1 hour ago
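The after= value above is a Unix timestamp in seconds, as returned by time.time(). For searches anchored to a calendar date rather than a relative offset, a small conversion helper may be handy. This is only a sketch: to_epoch_seconds() is a hypothetical helper, not part of sewerrat, and treating the date as midnight UTC is an assumption.

```python
import time
from datetime import datetime, timezone

def to_epoch_seconds(year, month, day):
    """Hypothetical helper: midnight UTC on the given date, as Unix seconds."""
    return datetime(year, month, day, tzinfo=timezone.utc).timestamp()

start_of_2024 = to_epoch_seconds(2024, 1, 1)
one_day_ago = time.time() - 24 * 3600  # relative offset, as in the example above

print(start_of_2024)
print(one_day_ago)
```

Either value could then be passed along the lines of sr.query(url, after=start_of_2024).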

Once we find a file of interest from a registered directory, we can retrieve its metadata, or other files in the same directory, or the entire directory itself:

sr.retrieve_metadata(mydir + "/metadata.json", url)
sr.list_files(mydir, url)
sr.retrieve_file(mydir + "/diet/metadata.json", url)
sr.retrieve_directory(mydir, url)

Check out the API documentation for more details on each function. For the concepts underlying SewerRat itself, check out the SewerRat repository for a detailed explanation.