Libraries to provide support for the use of Bagit bags and Bagit Profiles
The BagIt Support library complies with version 1.3.0 of the BagIt Profiles specification and includes the following
bag profiles:
Because these profiles are built in, we do our best to keep them up to date, but they may occasionally need to be updated.
The BagIt Support library uses BagIt profiles whose JSON complies with the BagIt Profiles specification. To
support constraints on custom tag files, a section called Other-Info provides additional constraints. The Other-Info
section is a list of JSON objects, each named for the tag file it describes (e.g. APTrust-Info), whose fields use the
same parameter types as fields in the Bag-Info section.
APTrust Other-Info
"Other-Info" : [{
  "APTrust-Info": {
    "Title": {
      "required": true,
      "description": "The title to be used"
    },
    "Access": {
      "required": true,
      "values": ["Consortia", "Institution", "Restricted"]
    },
    "Storage-Option": {
      "required": true,
      "values": [
        "Standard",
        "Glacier-OH",
        "Glacier-OR",
        "Glacier-VA",
        "Glacier-Deep-OH",
        "Glacier-Deep-OR",
        "Glacier-Deep-VA"
      ]
    }
  }
}]
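To illustrate how a tag-file section such as APTrust-Info might be applied, the following minimal sketch checks a map of tag values against the required and values constraints. It is not the library's implementation: OtherInfoSketch, FieldConstraint, and check are hypothetical names introduced only for this example, and the constraints are built by hand rather than parsed from JSON.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; these names are not part of the BagIt Support API.
class OtherInfoSketch {

    // Constraints for one field: whether it is required and which values are
    // allowed (an empty set means any value is accepted).
    static final class FieldConstraint {
        final boolean required;
        final Set<String> allowedValues;

        FieldConstraint(final boolean required, final Set<String> allowedValues) {
            this.required = required;
            this.allowedValues = allowedValues;
        }
    }

    // Returns a list of problems; an empty list means the tags conform.
    static List<String> check(final Map<String, FieldConstraint> constraints,
                              final Map<String, String> tags) {
        final List<String> errors = new ArrayList<>();
        constraints.forEach((field, constraint) -> {
            final String value = tags.get(field);
            if (value == null) {
                if (constraint.required) {
                    errors.add("Missing required field: " + field);
                }
            } else if (!constraint.allowedValues.isEmpty()
                       && !constraint.allowedValues.contains(value)) {
                errors.add("Invalid value for " + field + ": " + value);
            }
        });
        return errors;
    }

    public static void main(final String[] args) {
        final Map<String, FieldConstraint> aptrustInfo = new LinkedHashMap<>();
        aptrustInfo.put("Title", new FieldConstraint(true, Set.of()));
        aptrustInfo.put("Access",
            new FieldConstraint(true, Set.of("Consortia", "Institution", "Restricted")));

        // "Title" is missing and "Public" is not an allowed Access value
        System.out.println(check(aptrustInfo, Map.of("Access", "Public")));
    }
}
```

Collecting every problem before reporting, rather than failing on the first one, mirrors the idea of an error that describes all sections that failed to validate.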
The BagProfile class provides three constructors:
- A default constructor which uses the beyondtherepository profile.
- A constructor which takes a BagProfile.BuiltIn specifying a built-in profile to use.
- A constructor which takes an InputStream. This is intended to be the JSON content of the BagIt profile and allows external profiles to be used.
e.g. Using a Built In Profile
final String profileIdentifier = "beyondtherepository";
final BagProfile.BuiltIn builtInProfile = BagProfile.BuiltIn.from(profileIdentifier);
final BagProfile profile = new BagProfile(builtInProfile);
As mentioned above, one of the BagProfile constructors takes an InputStream, so to use a custom BagIt
profile all you need to do is provide an InputStream for your profile's JSON.
final BagProfile profile;
final Path json = Paths.get("/profiles/bagit-profile.json");
try (InputStream is = Files.newInputStream(json)) {
profile = new BagProfile(is);
}
A BagProfile can validate that a Bag (read by the gov.loc bagit library) conforms to the profile. To run this
validation, call BagProfile#validateBag(Bag). If validation fails, a RuntimeException is thrown describing which
sections failed to validate.
final Path bag = Paths.get("/bags/my-really-cool-bag");
final BagReader reader = new BagReader();
try {
    final Bag readBag = reader.read(bag);
    profile.validateBag(readBag);
} catch (IOException | UnparsableVersionException | MaliciousPathException | UnsupportedAlgorithmException |
         InvalidBagitFileFormatException e) {
    log.error("Unable to read bag", e);
}
In addition to validating a Bag, a BagProfile can also validate a BagConfig before writing begins, in order to
verify that all the tag files used will be compliant with the given BagProfile. If a BagConfig fails to validate, a
RuntimeException is thrown.
final Path yaml = Paths.get("/config/sample-bag.yml");
final BagConfig config = new BagConfig(yaml.toFile());
profile.validateConfig(config);
To help write BagIt bags, a basic BagWriter is provided which only writes the metadata (tag files, manifests) of a
bag. The user must populate the payload files for the bag and track the checksum of each payload file. A bagit.txt
is generated by default, but every other tag file, including bag-info.txt, is written only if data is provided for
it. The BagConfig class can be used to help load values for tag files such as bag-info.txt.
The BagWriter comes with a few methods to help populate tag files for the bag:
public void registerChecksums(final String algorithm, final Map<File, String> filemap)
public void addTags(final String key, final Map<String, String> values)
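Because the caller is responsible for tracking payload checksums, it may help to see one way of building the Map<File, String> that registerChecksums accepts. The sketch below uses only the JDK; PayloadChecksums, hexDigest, and checksumsFor are hypothetical names, and it assumes lowercase hex-encoded digests are what the writer expects. Note that the JDK algorithm name (SHA-1) differs from the BagIt name (sha1), so mapping between the two is left to the caller here.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; not part of the BagIt Support API.
class PayloadChecksums {

    // Hex-encodes the digest of one file using a JDK algorithm name such as "SHA-1".
    static String hexDigest(final Path file, final String jdkAlgorithm)
            throws IOException, NoSuchAlgorithmException {
        final MessageDigest digest = MessageDigest.getInstance(jdkAlgorithm);
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), digest)) {
            final byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // reading the stream is what updates the digest
            }
        }
        final StringBuilder hex = new StringBuilder();
        for (final byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Walks the payload directory and digests every regular file found.
    static Map<File, String> checksumsFor(final Path dataDir, final String jdkAlgorithm)
            throws IOException, NoSuchAlgorithmException {
        final List<Path> files;
        try (Stream<Path> walk = Files.walk(dataDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        final Map<File, String> checksums = new HashMap<>();
        for (final Path file : files) {
            checksums.put(file.toFile(), hexDigest(file, jdkAlgorithm));
        }
        return checksums;
    }

    public static void main(final String[] args) throws Exception {
        final Path dataDir = Files.createTempDirectory("data");
        Files.write(dataDir.resolve("hello.txt"), "hello".getBytes(StandardCharsets.UTF_8));
        // The resulting map has the shape accepted by registerChecksums
        checksumsFor(dataDir, "SHA-1")
            .forEach((file, sum) -> System.out.println(sum + "  " + file));
    }
}
```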
Writing a Bag
final Long bytesWritten;
final Long filesWritten;
final Path bag = Paths.get("/bags/sample-bag");
final Path yaml = Paths.get("/config/sample-bag.yml");
final BagItDigest sha1 = BagItDigest.SHA1;
final Map<File, String> sha1Checksums = new HashMap<>();
// work to populate data directory
...
// configure the BagWriter
final BagWriter writer = new BagWriter(bag, Set.of(sha1.bagitName()));
writer.registerChecksums(sha1.bagitName(), sha1Checksums);
// register tag files
final BagConfig config = new BagConfig(yaml.toFile());
config.getTagFiles().forEach(filename -> writer.addTags(filename, config.getFieldsForTagFile(filename)));
// finish the bag-info.txt with information from populating the data directory
// (copy the existing tags so the generated values can be merged in)
final Map<String, String> info = new HashMap<>(writer.getTags(BagConfig.BAG_INFO_KEY));
final Map<String, String> generated = Map.of(
    BagConfig.BAG_SIZE_KEY, byteCountToDisplaySize(bytesWritten),
    BagConfig.PAYLOAD_OXUM_KEY, bytesWritten + "." + filesWritten,
    BagConfig.BAGGING_DATE_KEY, DateTimeFormatter.ISO_LOCAL_DATE.format(LocalDate.now()));
info.putAll(generated);
writer.addTags(BagConfig.BAG_INFO_KEY, info);
writer.write();
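The example above leaves bytesWritten and filesWritten to the caller. As a rough sketch of how those values could be derived after the data directory is populated, the following JDK-only code walks a payload directory and formats a Payload-Oxum value as total bytes, a period, then the file count. PayloadOxumSketch and payloadOxum are hypothetical names, not library methods.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; not part of the BagIt Support API.
class PayloadOxumSketch {

    // Returns "<total bytes>.<file count>" for all regular files under dataDir.
    static String payloadOxum(final Path dataDir) throws IOException {
        final List<Path> files;
        try (Stream<Path> walk = Files.walk(dataDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        long bytes = 0;
        for (final Path file : files) {
            bytes += Files.size(file);
        }
        return bytes + "." + files.size();
    }

    public static void main(final String[] args) throws IOException {
        final Path dataDir = Files.createTempDirectory("data");
        Files.write(dataDir.resolve("a.txt"), "abc".getBytes(StandardCharsets.UTF_8));
        Files.write(dataDir.resolve("b.txt"), "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(payloadOxum(dataDir));
    }
}
```

Counting after the payload is written, instead of tallying during the copy, is simpler but costs an extra directory walk; either approach yields the same value.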
sample-bag.yml
bag-info.txt:
  Source-Organization: org.duraspace
  External-Description: Sample bag
  External-Identifier: SAMPLE_001
  Bag-Group-Identifier: SAMPLE
  Internal-Sender-Identifier: SAMPLE_001
  Internal-Sender-Description: Sample bag
aptrust-info.txt:
  Access: Restricted
  Title: Sample bag
The BagIt Support library can assist with serialization and deserialization of Bagit bags.
Supported formats are:
- zip: zip, application/zip
- tar: tar, application/tar, application/x-tar, application/gtar, application/x-gtar
- gzip (only tar+gz when serializing): tgz, gzip, tar+gzip, application/gzip, application/x-gzip, application/x-compressed-tar
Because gzip is a compression/decompression format, when deserializing gzip only decompression occurs. This means that it will require more space to decompress a tar+gzip bag because it will first decompress the gzip portion, then extract the tar archive.
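The extra space requirement can be seen in a small sketch of the decompression step on its own. This uses only java.util.zip and is an illustration, not the library's deserializer: GunzipSketch and gunzip are hypothetical names, and the subsequent tar extraction is omitted because the JDK has no tar reader.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; not part of the BagIt Support API.
class GunzipSketch {

    // Decompresses a gzip file to target and returns target. The decompressed
    // file exists on disk alongside the original until one of them is removed.
    static Path gunzip(final Path gzipped, final Path target) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gzipped))) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(final String[] args) throws IOException {
        // Build a small gzip file so the example runs without external input
        final Path gz = Files.createTempFile("sample", ".tar.gz");
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(gz))) {
            out.write(new byte[4096]);
        }
        final Path tar = gunzip(gz, Files.createTempFile("sample", ".tar"));
        System.out.println(Files.size(gz) + " compressed bytes -> "
            + Files.size(tar) + " bytes on disk");
    }
}
```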
The SerializationSupport class offers helper methods for instantiating the correct BagSerializer or
BagDeserializer depending on what is passed in:
public static BagSerializer serializerFor(final String contentType, final BagProfile profile)
public static BagDeserializer deserializerFor(final Path serializedBag, final BagProfile profile)
When retrieving a BagSerializer, the correct serializer is created based on the given contentType and BagProfile.
If the contentType is not supported by either the BagProfile or the SerializationSupport class, a
RuntimeException is thrown.
final Path bag = Paths.get("/bags/my-really-cool-bag");
final String contentType = "zip";
final BagProfile profile = new BagProfile(getProfileInputStream());
final BagSerializer serializer = SerializationSupport.serializerFor(contentType, profile);
final Path serialized = serializer.serialize(bag);
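The two checks described above (does the profile allow the content type, and is there a serializer for it) can be sketched as a simple lookup. This is a hypothetical illustration rather than the SerializationSupport implementation: ContentTypeLookupSketch, Format, and formatFor are invented names, the alias table is copied from the supported formats list, and profileAccepted stands in for whatever content types a profile permits. The exception types are this sketch's own choice.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the SerializationSupport implementation.
class ContentTypeLookupSketch {

    enum Format { ZIP, TAR, GZIP }

    // Aliases copied from the supported formats list
    private static final Map<String, Format> FORMATS = new HashMap<>();

    static {
        register(Format.ZIP, "zip", "application/zip");
        register(Format.TAR, "tar", "application/tar", "application/x-tar",
                 "application/gtar", "application/x-gtar");
        register(Format.GZIP, "tgz", "gzip", "tar+gzip", "application/gzip",
                 "application/x-gzip", "application/x-compressed-tar");
    }

    private static void register(final Format format, final String... aliases) {
        for (final String alias : aliases) {
            FORMATS.put(alias, format);
        }
    }

    // Checks the profile first, then the built-in table.
    static Format formatFor(final String contentType, final Set<String> profileAccepted) {
        final String key = contentType.toLowerCase(Locale.ROOT);
        if (!profileAccepted.contains(key)) {
            throw new IllegalArgumentException("Profile does not allow " + contentType);
        }
        final Format format = FORMATS.get(key);
        if (format == null) {
            throw new UnsupportedOperationException("No serializer for " + contentType);
        }
        return format;
    }

    public static void main(final String[] args) {
        final Set<String> accepted = Set.of("application/zip", "application/x-tar");
        System.out.println(formatFor("application/zip", accepted));
    }
}
```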
Retrieving a BagDeserializer is similar to retrieving a BagSerializer. To find the appropriate BagDeserializer, the
Apache Tika library is used to detect the content type of the Path. If the BagProfile does not support the detected
content type, a RuntimeException is thrown, and if SerializationSupport does not have built-in support for the
content type, an UnsupportedOperationException is thrown.
final Path bag = Paths.get("/bags/my-really-cool-bag.tar.gz");
final BagProfile profile = new BagProfile(getProfileInputStream());
final BagDeserializer deserializer = SerializationSupport.deserializerFor(bag, profile);
final Path deserialized = deserializer.deserialize(bag);
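Content detection reads the file itself instead of trusting its extension. The sketch below is a simplified stand-in for that idea using well-known magic bytes and only the JDK; it does not use Tika and is far less thorough. MagicBytesSketch and sniff are hypothetical names, and the returned strings are labels chosen for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in for content detection; this is not Apache Tika.
class MagicBytesSketch {

    static String sniff(final Path file) throws IOException {
        // The tar "ustar" marker sits at offset 257, so read the first 262 bytes
        final byte[] header = new byte[262];
        final int read;
        try (InputStream in = Files.newInputStream(file)) {
            read = in.readNBytes(header, 0, header.length);
        }
        // gzip streams start with 0x1f 0x8b
        if (read >= 2 && (header[0] & 0xff) == 0x1f && (header[1] & 0xff) == 0x8b) {
            return "application/gzip";
        }
        // zip local file headers start with "PK" 0x03 0x04
        if (read >= 4 && header[0] == 'P' && header[1] == 'K'
                && header[2] == 3 && header[3] == 4) {
            return "application/zip";
        }
        // POSIX tar archives carry "ustar" at offset 257
        if (read >= 262
                && "ustar".equals(new String(header, 257, 5, StandardCharsets.US_ASCII))) {
            return "application/x-tar";
        }
        return "application/octet-stream";
    }

    public static void main(final String[] args) throws IOException {
        // The file name deliberately lacks a gzip extension
        final Path gz = Files.createTempFile("bag", ".bin");
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(gz))) {
            out.write("payload".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(sniff(gz));
    }
}
```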