Test cases for bag-info values #8

Open
1 of 2 tasks
acdha opened this issue Feb 28, 2017 · 5 comments

Comments

@acdha
Member

acdha commented Feb 28, 2017

In discussion with @johnscancella and @reeset on reeset/bagit_sharp#1, we came up with some additions to the duplicate metadata test case:

  • Interleaved ordering (e.g. contact-email / contact-name / contact-email / contact-name) should be preserved when the bag is saved
  • Case-insensitive handling of the keys – e.g. "Contact-Email" and "contact-email". https://tools.ietf.org/html/draft-kunze-bagit-14#page-6 only mandates insensitivity for the reserved names but I think this should be clarified in the spec to mandate case-insensitive access for all values to avoid confusion.
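As a rough illustration of the two behaviours listed above, here is a minimal Python sketch of a tag container that preserves interleaved ordering and key spelling while matching keys case-insensitively. The `BagInfo` class, its method names, and the sample addresses are hypothetical and invented for this example; they are not taken from bagit.py, bagit-java, or bagit_sharp.

```python
class BagInfo:
    """Hypothetical ordered, multi-valued tag store with case-insensitive lookup."""

    def __init__(self):
        # Entries are kept as (key, value) pairs in insertion order,
        # with each key's original spelling left untouched.
        self._entries = []

    def add(self, key, value):
        self._entries.append((key, value))

    def get_all(self, key):
        """Return every value whose key matches, ignoring case."""
        wanted = key.casefold()
        return [v for k, v in self._entries if k.casefold() == wanted]

    def items(self):
        """Return (key, value) pairs in their original order."""
        return list(self._entries)


# Interleaved duplicate keys with mixed case (sample values are made up).
info = BagInfo()
info.add("Contact-Email", "a@example.org")
info.add("Contact-Name", "Person A")
info.add("contact-email", "b@example.org")
info.add("contact-name", "Person B")
```

Storing a flat list of pairs, rather than a dictionary keyed on a normalized name, is what lets the interleaved order survive a later save.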
acdha added a commit that referenced this issue Feb 28, 2017
This adds tests to confirm that order is preserved across duplicate keys
and for case-insensitivity for custom key values. This is only half of
the test since a library would need to confirm the expected output for
key access and after saving.
@johnscancella
Contributor

> Case-insensitive handling of the keys – e.g. "Contact-Email" and "contact-email". https://tools.ietf.org/html/draft-kunze-bagit-14#page-6 only mandates insensitivity for the reserved names but I think this should be clarified in the spec to mandate case-insensitive access for all values to avoid confusion.

Is this getting into too much implementation detail?

@acdha
Member Author

acdha commented Mar 7, 2017

@johnscancella I'm a bit mixed on that but I've been leaning case-insensitive for everything since the info file is intended to be human-managed and humans tend not to care. It seems like a bug if someone has “Crawl-date” in one bag and “Crawl-Date” in another but a program only sees one of them because it used a case-sensitive library.
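To make that failure mode concrete, here is a small sketch using plain dictionaries as stand-ins for two parsed bags. The helper names and the date values are made up for illustration and do not come from any BagIt library.

```python
# Two bags whose authors spelled the same custom tag differently.
bag_a = {"Crawl-date": "2017-03-01"}
bag_b = {"Crawl-Date": "2017-03-07"}


def lookup_sensitive(tags, key):
    """Exact-match lookup, as a case-sensitive library would do."""
    return tags.get(key)


def lookup_insensitive(tags, key):
    """Lookup that ignores case differences in the key."""
    wanted = key.casefold()
    for k, v in tags.items():
        if k.casefold() == wanted:
            return v
    return None


sensitive_hits = [lookup_sensitive(b, "Crawl-Date") for b in (bag_a, bag_b)]
insensitive_hits = [lookup_insensitive(b, "Crawl-Date") for b in (bag_a, bag_b)]
```

With the exact-match lookup, the first bag silently yields `None`, which is the kind of discrepancy a program would not notice on its own.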

What do you think - play it conservatively and add a spec update before finishing this ticket? This might either be partially out of scope or otherwise incompatible with this repo currently since we only collect valid/invalid bags and this would be a discrepancy in how a tool processes the bag rather than a question of validity.

@johnscancella
Contributor

I vote to make a spec update saying that keys are case-insensitive but values are case-sensitive, since the case of a value might carry some special meaning.

I would also add that implementations should preserve case of the keys as entered. That way you can do something like

  1. read an existing bag
  2. write it out to a different directory
  3. compare file to file - there should be 0 differences
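The three steps above could be sketched as a round-trip check along these lines. This is only an outline under simplifying assumptions: the helper functions are hypothetical, only `bag-info.txt` is compared rather than a whole bag, and every line is assumed to be a single-line `Key: value` entry with LF line endings (no folded continuation lines).

```python
import filecmp
import tempfile
from pathlib import Path


def read_bag_info(path):
    """Parse single-line 'Key: value' entries, keeping order and key case."""
    entries = []
    with open(path, encoding="utf-8", newline="") as f:
        for line in f.read().splitlines():
            key, _, value = line.partition(": ")
            entries.append((key, value))
    return entries


def write_bag_info(path, entries):
    """Write entries back out exactly as stored, one per line."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        for key, value in entries:
            f.write(f"{key}: {value}\n")


def round_trips(source_text):
    """Read a tag file, write it to a different directory, compare bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src" / "bag-info.txt"
        dst = Path(tmp) / "dst" / "bag-info.txt"
        src.parent.mkdir()
        dst.parent.mkdir()
        src.write_bytes(source_text.encode("utf-8"))
        write_bag_info(dst, read_bag_info(src))
        # shallow=False forces a byte-for-byte comparison.
        return filecmp.cmp(src, dst, shallow=False)


sample = (
    "Contact-Email: a@example.org\n"
    "contact-name: Person A\n"
    "Contact-Email: b@example.org\n"
)
unchanged = round_trips(sample)
```

An implementation that normalized key case or regrouped duplicate keys on save would fail this comparison, which is the property the steps are meant to catch.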

@acdha
Member Author

acdha commented Mar 7, 2017 via email

@johnscancella
Contributor

Yeah, we can specify that in the README and use bagit.py and bagit-java as source implementations to look at for testing.
