
Updating existing items

Andrew Berger edited this page Oct 25, 2023 · 7 revisions

Any item that has already been accessioned into the SDR can be updated through Preassembly. This process is sometimes referred to as "reaccessioning."

The requirements for updating items via Preassembly are that:

  • the Preassembly user making the update must have permission to manage the item in Argo
  • the item must be in an "Accessioned" state; if the item is "Opened" or "In accessioning", it must be returned to the Accessioned state before you run Preassembly

How updates work

Every time Preassembly updates an item it creates a new version of that item. This new version can be entirely new (i.e. every file is different from the previous version) or it can be an incremental change to the existing item, such as the addition of a new file or the replacement of an existing file with a new version of that file.

It is also possible to "remove" files from an existing item when making an update via Preassembly, but please keep in mind that the files will only be "removed" from the latest version of the item. They will be retained in the item's previous version history stored in the preservation system.

Replacing all files in an item

To replace all files in an item, you can follow the same steps you would follow to accession an item for the first time:

  • stage all of the files you are going to accession
  • create a manifest.csv listing the druids and folders in the accessioning batch
  • optionally, create a file_manifest.csv to apply specific settings to individual files
  • run a discovery report and address any issues found in the report
  • run Preassembly

Making incremental updates

An incremental update is an update where you intend to add, modify, or remove files in an item or set of items while leaving the remaining files unchanged. It is possible to do this by:

  • staging only the files that are new and/or modified
  • creating a manifest.csv listing all the druids and folders in the accessioning batch
  • submitting a file_manifest.csv that contains a list of all files to be contained in the new version of the item or set of items

The file_manifest.csv is the key to incremental updates: without it, Preassembly would have no way to determine which files are new and which should be left unchanged.
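The role of the file manifest can be illustrated with a small model. This is only a sketch of the behavior described on this page, not Preassembly's actual code: the function name `plan_update` and its arguments are hypothetical, and it ignores derivative handling such as regenerated JP2s.

```python
def plan_update(current, staged, manifest=None):
    """Classify files for the next version of one item (illustrative model only).

    current  -- filenames in the item's latest accessioned version
    staged   -- filenames staged for this update
    manifest -- filenames listed for the item in file_manifest.csv,
                or None when no file manifest is supplied
    """
    current, staged = set(current), set(staged)
    if manifest is None:
        # Without a manifest, the staged files are treated as the
        # complete contents of the next version.
        next_version = staged
    else:
        # With a manifest, the manifest defines the complete contents.
        next_version = set(manifest)
    return {
        "added": sorted(next_version - current),
        "updated": sorted(next_version & current & staged),
        "unchanged": sorted((next_version & current) - staged),
        "removed": sorted(current - next_version),
    }


# Five pages, each with a TIFF and a JP2; only the Page 4 TIFF is staged.
pages = [f"ww089xf7663_{n:04d}.{ext}" for n in range(1, 6) for ext in ("tif", "jp2")]
with_manifest = plan_update(pages, ["ww089xf7663_0004.tif"], manifest=pages)
without_manifest = plan_update(pages, ["ww089xf7663_0004.tif"])
print(with_manifest["removed"])          # []
print(len(without_manifest["removed"]))  # 9
```

In this model, the same staged file produces no removals when a manifest is present and removes every unstaged file when it is absent.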

Creating the file_manifest.csv

Unlike first-time accessioning, incremental updates require a file_manifest.csv. The simplest way to create one is to start from a CSV of the item's current files, downloaded from Argo. To obtain this CSV:

If updating a single item

  1. Navigate to the item's page in Argo
  2. Scroll down to the "Content" section
  3. In the upper right-hand side of the "Content" section, click on the link labeled "Download CSV"

If updating a batch of items

  1. Navigate to the Argo bulk actions page and choose "New Bulk Action"
  2. Choose "Export structural metadata" (located in the list of CSV-based bulk actions)
  3. Enter the list of druids that you will be updating
  4. Submit the bulk action
  5. Wait for the bulk action to complete and then download the CSV
  6. Open the CSV in an editor of your choice
  7. Edit the CSV as needed

Whether for one item or a batch of items, this downloaded CSV follows the same structure. It contains a list of all files currently in the item or set of items, including all of the specific settings for each file. This is what's known within SDR as "structural metadata": the structure that enables different types of display (book, image, video, 3D, etc.). See Consul for further documentation on SDR structural metadata, including a deeper explanation of the structure behind this CSV.

Once you've obtained the CSV, edit it as needed to reflect the updates you are going to make using Preassembly.

  • If you are replacing existing files and making no other changes,
    • Do not modify the CSV at all. You can move ahead to the next step: "staging your files".
  • If you are adding any files,
    • Insert one line for each file into the CSV
    • These new lines must be positioned exactly where you want the new files to appear within the structure of the item
    • Depending on where the new file goes, you may need to revise the "sequence" column for existing lines, for example when the new file is inserted in the middle of the list rather than appended at the end
    • You do not need to fill out every column in the CSV for each new file but you must include
      • druid
      • resource_label
      • resource_type
      • sequence
      • filename
      • publish
      • shelve
      • preserve
    • Other columns are optional and will be filled in by the system with default values
      • file_label - this will be filled by the filename
      • rights_view, rights_download, rights_location - these will be the same as the object rights
      • mimetype - this will be determined by the system
      • role - this will be left blank if not filled in
  • If you are removing files so that they will not appear in the new version,
    • Remove the lines corresponding to those files
    • Note that if the only change you're making is to remove files, you can do this in Argo without using Preassembly
  • Finally, save the new CSV as "file_manifest.csv"
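Before saving, it can help to confirm that every line has the required columns filled in. The following is a minimal sketch using only Python's standard library; the function name `rows_missing_required` is hypothetical, and `other.txt` in the sample is an invented file used only to show a failing row.

```python
import csv
import io

# Required columns, as listed above.
REQUIRED_COLUMNS = ("druid", "resource_label", "resource_type", "sequence",
                    "filename", "publish", "shelve", "preserve")


def rows_missing_required(csv_text):
    """Return (line_number, [empty required columns]) for each incomplete row."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        missing = [c for c in REQUIRED_COLUMNS if not (row.get(c) or "").strip()]
        if missing:
            problems.append((line_number, missing))
    return problems


HEADER = ("druid,resource_label,resource_type,sequence,filename,file_label,"
          "publish,shelve,preserve,rights_view,rights_download,rights_location,"
          "mimetype,role")
sample = "\n".join([
    HEADER,
    # Optional columns left blank: accepted.
    "nx301cp7407,new text file,object,3,newfile.txt,,yes,yes,yes,,,,,",
    # Hypothetical row with an empty "shelve" value: reported.
    "nx301cp7407,another file,object,4,other.txt,,yes,,yes,,,,,",
])
print(rows_missing_required(sample))  # [(3, ['shelve'])]
```

This only checks that values are present; it does not check that they are values the system will accept.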

Staging your files

Once you've created the new file_manifest.csv, you are ready to stage your files for Preassembly. The steps for staging files for updates are no different from the steps for staging files when accessioning an object for the first time, with one exception: you only need to include the files that are being added or modified.

In the following example, I've staged

  • modified files for druids jq399nx1812 (Page 1) and ww089xf7663 (Page 4)
  • a new file for druid nx301cp7407
.
├── file_manifest.csv
├── jq399nx1812
│   └── jq399nx1812_0001.tif
├── manifest.csv
├── nx301cp7407
│   └── newfile.txt
└── ww089xf7663
    └── ww089xf7663_0004.tif 

In the file_manifest.csv, I inserted newfile.txt as a new resource (sequence #3) in item nx301cp7407. I also removed the two lines for "Page 2" of jq399nx1812, so that page will be removed by my Preassembly update.

druid,resource_label,resource_type,sequence,filename,file_label,publish,shelve,preserve,rights_view,rights_download,rights_location,mimetype,role
jq399nx1812,Page 1,image,1,jq399nx1812_0001.tif,jq399nx1812_0001.tif,no,no,yes,world,world,,image/tiff,
jq399nx1812,Page 1,image,1,jq399nx1812_0001.jp2,jq399nx1812_0001.jp2,yes,yes,no,world,world,,image/jp2,
nx301cp7407,Page 1,image,1,nx301cp7407_0001.tif,nx301cp7407_0001.tif,no,no,yes,world,world,,image/tiff,
nx301cp7407,Page 1,image,1,nx301cp7407_0001.jp2,nx301cp7407_0001.jp2,yes,yes,no,world,world,,image/jp2,
nx301cp7407,Page 2,image,2,nx301cp7407_0002.tif,nx301cp7407_0002.tif,no,no,yes,world,world,,image/tiff,
nx301cp7407,Page 2,image,2,nx301cp7407_0002.jp2,nx301cp7407_0002.jp2,yes,yes,no,world,world,,image/jp2,
nx301cp7407,new text file,object,3,newfile.txt,newfile.txt,yes,yes,yes,world,world,,,
ww089xf7663,Page 1,image,1,ww089xf7663_0001.tif,ww089xf7663_0001.tif,no,no,yes,world,world,,image/tiff,
ww089xf7663,Page 1,image,1,ww089xf7663_0001.jp2,ww089xf7663_0001.jp2,yes,yes,no,world,world,,image/jp2,
ww089xf7663,Page 2,image,2,ww089xf7663_0002.tif,ww089xf7663_0002.tif,no,no,yes,world,world,,image/tiff,
ww089xf7663,Page 2,image,2,ww089xf7663_0002.jp2,ww089xf7663_0002.jp2,yes,yes,no,world,world,,image/jp2,
ww089xf7663,Page 3,image,3,ww089xf7663_0003.tif,ww089xf7663_0003.tif,no,no,yes,world,world,,image/tiff,
ww089xf7663,Page 3,image,3,ww089xf7663_0003.jp2,ww089xf7663_0003.jp2,yes,yes,no,world,world,,image/jp2,
ww089xf7663,Page 4,image,4,ww089xf7663_0004.tif,ww089xf7663_0004.tif,no,no,yes,world,world,,image/tiff,
ww089xf7663,Page 4,image,4,ww089xf7663_0004.jp2,ww089xf7663_0004.jp2,yes,yes,no,world,world,,image/jp2,
ww089xf7663,Page 5,image,5,ww089xf7663_0005.tif,ww089xf7663_0005.tif,no,no,yes,world,world,,image/tiff,
ww089xf7663,Page 5,image,5,ww089xf7663_0005.jp2,ww089xf7663_0005.jp2,yes,yes,no,world,world,,image/jp2,
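A quick local check can confirm that every staged file is actually listed in file_manifest.csv. This is a sketch under two assumptions taken from the example above: each object folder is named after its druid, and filenames in the manifest are relative to that folder. The function name is hypothetical and this is not part of Preassembly.

```python
import csv
from pathlib import Path


def staged_files_not_in_manifest(staging_dir):
    """Return (druid, filename) pairs that are staged but absent from file_manifest.csv."""
    staging_dir = Path(staging_dir)
    with open(staging_dir / "file_manifest.csv", newline="") as handle:
        listed = {(row["druid"], row["filename"]) for row in csv.DictReader(handle)}

    unlisted = []
    # Assumption: each subfolder of the staging directory is named after a druid.
    for folder in sorted(p for p in staging_dir.iterdir() if p.is_dir()):
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                key = (folder.name, path.relative_to(folder).as_posix())
                if key not in listed:
                    unlisted.append(key)
    return unlisted
```

Calling `staged_files_not_in_manifest(".")` from the staging directory shown above should return an empty list if every staged file has a line in the manifest.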

Run the discovery report

After staging your files and manifests, create a new "project" and run a discovery report. Since you are using a file_manifest.csv:

  • in the box labeled "Processing configuration", leave "Default" as the selected option
  • check the box labeled "I have a file manifest"

Remember, you must use the "I have a file manifest" option when making incremental updates.

[Screenshot: project settings for the discovery report]

The discovery report will show you the changes that will be made to the object. Review these changes to make sure they are what you expect. Pay particular attention to any files that will be deleted and make sure this list is correct before proceeding to Preassembly.

[Screenshot: discovery report output for this example]

In this example, the report is showing me the changes I've made, namely:

  • updating a file in jq399nx1812
  • removing two files from jq399nx1812
  • adding a new file to nx301cp7407
  • updating a file in ww089xf7663

Note: when submitting a new TIFF file, as in this set of updates, the system will also update the corresponding JP2 to reflect the new TIFF.
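One way to cross-check the report's list of deleted and added files is to compare the CSV exported from Argo with the edited file_manifest.csv. The sketch below is an independent check, not how the discovery report itself is computed; the function names are hypothetical, and the "Page 2" filenames in the sample are assumed for illustration. Files that keep the same name but have new content will not appear in this comparison.

```python
import csv
import io
from collections import defaultdict


def files_by_druid(csv_text):
    """Map each druid to the set of filenames listed for it."""
    files = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        files[row["druid"]].add(row["filename"])
    return files


def expected_changes(exported_csv, edited_csv):
    """Return, per druid, the filenames removed from or added to the manifest."""
    before, after = files_by_druid(exported_csv), files_by_druid(edited_csv)
    changes = {}
    for druid in sorted(set(before) | set(after)):
        removed = sorted(before[druid] - after[druid])
        added = sorted(after[druid] - before[druid])
        if removed or added:
            changes[druid] = {"removed": removed, "added": added}
    return changes


# Trimmed to two columns for brevity; the Page 2 filenames are assumed.
exported = """druid,filename
jq399nx1812,jq399nx1812_0001.tif
jq399nx1812,jq399nx1812_0001.jp2
jq399nx1812,jq399nx1812_0002.tif
jq399nx1812,jq399nx1812_0002.jp2
"""
edited = """druid,filename
jq399nx1812,jq399nx1812_0001.tif
jq399nx1812,jq399nx1812_0001.jp2
"""
print(expected_changes(exported, edited))
```

If the discovery report lists deletions that this comparison does not, that is a signal to stop and review the job settings before running Preassembly.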

Run Preassembly

If the list looks correct and there are no errors on the discovery report, proceed to Preassembly by clicking the "Run Preassembly" button at the bottom of the discovery report page. This will trigger Preassembly to send the new/modified files to SDR along with the structural metadata that indicates that the remaining files should not be changed.

Troubleshooting

Incremental updates are new to Preassembly as of October 2023. Before then, Preassembly could make updates to items, but only if you staged all files, including files that would not be changed, every time you ran Preassembly. Staging only the new files would result in only the new files being included in the new version of an item; all other files would be removed in that version. There is still a risk that this could happen if you do not use the file_manifest.csv.

Problem: the discovery report says that almost all of my files are going to be deleted

If this happens, check the settings on your discovery report job. Remember that you must check the box that says "I have a file manifest". If you did not check that box, start a new project with that box checked, and do not run Preassembly until you see the correct discovery report output on your new job.

If the system does not see a file manifest, it will not be able to determine which files should remain unchanged from the previous version of the item. It will then fall back to processing the job as if the staged files are the only files to include in the next version of the item.
