-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Links #120
base: master
Are you sure you want to change the base?
Links #120
Conversation
This commit adds a follow_links parameter to Bag.make_bag and Bag.__init__. When set to true Bag._is_dangerous will allow links to files that exist outside of the payload directory. By default it is set to False, but I think a reasonable argument could be made for making the default set to True. See LibraryOfCongress#115 for more context.
There were several places in the code where filesystem walking happens. These needed to be adjusted to take the follow_links option that is provided to make_bag or the Bag constructor. In addition there are a few functions outside of the Bag class which walk the fileystem which needed to be aware of whether the user wants to follow links. Interestingly this work surfaced a bit of a bug in bagit-python. If the directory to be bagged contains a symlink before bagging, bagit will happily atomically move that directory into the bag's payload directory. However when the manifests are created they totally ignore the symbolic links. So the data is there present on the filesystem but totally not represented in the manifests. With the addition of --follow-links you can now create a bag that contains symlinks, and verify it later. I think an argument could be made that this should be the default behavior, and perhaps you should only be able to turn it off?
real_path = os.path.realpath(norm_path) | ||
real_bag_path = os.path.realpath(norm_bag_path) | ||
if os.path.commonprefix((real_path, real_bag_path)) != real_bag_path \ | ||
and not self.follow_links: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So --follow-links
allows a symlink pointing to /etc/passwd
?
I see two different use cases for symbolic links:
For 1) the most reasonable behavior seems to follow the links as this PR implements and should indeed be the default. The bahvior for 2) should only be enabled explicitly, e.g. while developing or for testing, with a flag like There's also security issues when allowing payload data to point outside the bag dir, |
Yes. It seems simplest for bagit-python to trust the integrity of the filesystem, and if it can read files in the payload directory it should do so. Perhaps this is naive. But the current behavior where symlinks could be present in the payload and totally ignored by the bagging and validation process seems wrong. |
test.py
Outdated
src = j(os.path.dirname(__file__), "README.rst") | ||
dst = j(self.tmpdir, "README.rst") | ||
os.symlink(src, dst) | ||
self.assertRaisesRegex( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it must be assertRaisesRegexp to work in 2.7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes -- nice catch!
Currently if you attempt to bag a directory that contains symbolic links bagit.py happily creates the bag, but the manifests will not include the linked files or directories.
bagit.py --validate
will say this bag is valid, because the manifests don't mention the linked content, and the code that walks the filesystem during validation ignores links.However when you are looking at the filesystem you can see there is content present that is not in the manifest. This seems to be contrary to 3.A.4:
This PR includes changes to support following symbolic links when creating and validating bags:
I think an argument could be made that follow links should be the default. I'm thinking of situations where bags are quite large and may use symlinks to manage large payload files/directories.
Is this a crazy idea?