Many of the reports in AIPscan, such as "File format count" and the "File format" and "File format version" reports introduced in #76, refer to "file formats". The file format names found in AIPscan are aggregated from the format names in Archivematica METS files and ultimately reflect the Archivematica FPR data model (and, one degree further, the PRONOM data model). These names do not always match what an end user would expect a "file format" to be. In the words of @ross-spencer, "they're not quite distinct file formats, and they're not quite format families either." Ross has suggested that a better name for these might be "format naming group".
To give a few examples:
Most end users would probably consider PDF to be a file format, and variations of it to be file format versions. In Archivematica/AIPscan, "Acrobat PDF 1.4 - Portable Document Format" and "Acrobat PDF 1.5 - Portable Document Format" are considered to be different file formats, not different versions of the same format. In the Archivematica FPR, these are aggregated into a "Portable Document Format" format group, but that aspect of the FPR data model has not made its way down to AIPscan yet.
Similarly, most end users would consider JPEG to be a file format. In PRONOM, valid files with a .jpg/.jpeg file extension have the following file format names, among others, each with one or more associated PUIDs:
"Raw JPEG Stream"
"JPEG File Interchange Format"
"Exchangeable Image File Format (Compressed)"
By the time we get to AIPscan, reading format names from the METS files, we seem to have all of the above as well as "JPEG" and "Generic JPEG". This makes it really difficult for an end user to see all of the files they would consider to be in the JPEG format. And in this instance, Archivematica's format groups likely wouldn't help us, as the nearest format group is "Image (Raster)".
I'm not sure what the solution to this looks like at this point. It might be useful to do some thinking about whether or how to communicate some of these subtleties through the UI, as well as what might become possible by bringing additional data sources into AIPscan.
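To make the problem concrete, here is a minimal sketch of what an additional data source could enable. It assumes a hand-maintained lookup table from format names to the "family" an end user has in mind. `FORMAT_FAMILIES` and `count_by_family` are hypothetical names for illustration; nothing like them exists in AIPscan, the FPR, or PRONOM today.

```python
from collections import Counter

# Hypothetical lookup table from METS format names to a user-facing family.
# The names on the left are the ones mentioned in this issue; the mapping
# itself is an assumption, not data from the FPR or PRONOM.
FORMAT_FAMILIES = {
    "Raw JPEG Stream": "JPEG",
    "JPEG File Interchange Format": "JPEG",
    "Exchangeable Image File Format (Compressed)": "JPEG",
    "JPEG": "JPEG",
    "Generic JPEG": "JPEG",
    "Acrobat PDF 1.4 - Portable Document Format": "PDF",
    "Acrobat PDF 1.5 - Portable Document Format": "PDF",
}


def count_by_family(format_names, families=FORMAT_FAMILIES):
    """Count files per user-facing family.

    Format names without an entry in the lookup table are counted
    under their own name, so nothing is silently dropped.
    """
    return Counter(families.get(name, name) for name in format_names)


# One entry per file, as the format name would appear in a report.
sample_files = [
    "Raw JPEG Stream",
    "JPEG File Interchange Format",
    "Generic JPEG",
    "Acrobat PDF 1.4 - Portable Document Format",
    "Acrobat PDF 1.5 - Portable Document Format",
    "Plain Text File",
]

counts = count_by_family(sample_files)
```

With the sample above, the three JPEG-flavoured names collapse into a single "JPEG" count and the two PDF names into a single "PDF" count, while "Plain Text File" is reported unchanged. Who would maintain such a table, and whether it belongs in AIPscan at all, is exactly the open question.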
One stopgap suggested by @ross-spencer is to list the related PUIDs alongside the format name wherever possible in the UI, e.g.:
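A rough sketch of how such a label could be built, assuming the report code has the format name and the PUIDs recorded for its files to hand. `format_label` is a hypothetical helper, not an existing AIPscan function, and the PUIDs in the usage line are sample values that should be checked against PRONOM.

```python
def format_label(format_name, puids):
    """Return a display label such as 'Name (puid1, puid2)'.

    Duplicate and empty PUIDs are dropped; if none remain, the bare
    format name is returned so the label never shows empty brackets.
    """
    unique = sorted({puid for puid in puids if puid})
    if not unique:
        return format_name
    return f"{format_name} ({', '.join(unique)})"


# Sample values only; verify the PUIDs against PRONOM before relying on them.
label = format_label("JPEG File Interchange Format", ["fmt/44", "fmt/43", "fmt/43"])
```

Showing the PUIDs does not merge the different JPEG-like names, but it does give users a stable identifier they can look up when a format name is ambiguous.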