Long file names in tar are getting cut in index #1

Open
harelglik opened this issue Aug 7, 2014 · 2 comments

@harelglik

Hi,
Great effort!
We are trying to use your code, and it seems the index file is not created correctly for long file names inside a tar. Changing the tar index creation to use TarArchiveInputStream with:
while (null != (tarArchiveEntry = tarArchiveInputStream.getNextTarEntry())) ...
instead of reading the byte array seems to fix the problem.

@harelglik
Author

More investigation brought me here: http://stackoverflow.com/questions/2078778/what-exactly-is-the-gnu-tar-longlink-trick
So, for long file names there is a special tar entry before the real file entry, and its data is the actual file name.
I made a fix to the index creation (it also makes sure these special long-name meta entries aren't added to the index).
There is still a problem in getFileInfo: the index (which now holds the correct name) is compared against the name from the TarArchiveEntry (which, once again, is not read in full). I have commented out this check for now.
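
For readers following along, here is a minimal sketch of the idea described above, not the fix that was actually made. It walks the 512-byte tar blocks by hand, treats a header with typeflag 'L' as a GNU long-name meta entry, and attaches its data to the next real entry instead of indexing it. The class name TarLongNameIndex, the methods buildIndex and header, and the choice to store the offset of the real entry's header are all assumptions for illustration. The sketch holds the whole archive in memory, ignores header checksums, and does not handle POSIX.1-2001 (pax) extended headers.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: maps each entry name to the offset of its header block. */
class TarLongNameIndex {
    static final int BLOCK = 512;

    static Map<String, Long> buildIndex(byte[] tar) {
        Map<String, Long> index = new LinkedHashMap<>();
        String pendingLongName = null; // name carried by a preceding 'L' entry, if any
        int pos = 0;
        // A header block starting with a NUL byte marks the end of the archive.
        while (pos + BLOCK <= tar.length && tar[pos] != 0) {
            String name = cString(tar, pos, 100);            // name field: offset 0, 100 bytes
            long size = parseOctal(tar, pos + 124, 12);      // size field: offset 124, octal
            char type = (char) tar[pos + 156];               // typeflag: offset 156
            int dataStart = pos + BLOCK;
            int padded = (int) ((size + BLOCK - 1) / BLOCK) * BLOCK; // data is padded to 512
            if (type == 'L') {
                // GNU long-name meta entry: its data is the full name of the NEXT entry.
                pendingLongName = cString(tar, dataStart, (int) size);
            } else {
                // Real entry: prefer the long name if one was just seen.
                index.put(pendingLongName != null ? pendingLongName : name, (long) pos);
                pendingLongName = null;
            }
            pos = dataStart + padded;
        }
        return index;
    }

    /** Reads a NUL-terminated string of at most maxLen bytes. */
    static String cString(byte[] buf, int off, int maxLen) {
        int end = Math.min(buf.length, off + maxLen);
        int len = 0;
        while (off + len < end && buf[off + len] != 0) len++;
        return new String(buf, off, len, StandardCharsets.UTF_8);
    }

    /** Parses an octal numeric field padded with NULs or spaces. */
    static long parseOctal(byte[] buf, int off, int len) {
        String s = new String(buf, off, len, StandardCharsets.US_ASCII).trim();
        return s.isEmpty() ? 0L : Long.parseLong(s, 8);
    }

    /** Fixture helper for experiments: builds a bare header (no checksum is written). */
    static byte[] header(String name, long size, char type) {
        byte[] h = new byte[BLOCK];
        byte[] n = name.getBytes(StandardCharsets.UTF_8);
        System.arraycopy(n, 0, h, 0, Math.min(n.length, 100)); // name is cut at 100 bytes
        byte[] sz = String.format("%011o", size).getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(sz, 0, h, 124, sz.length);
        h[156] = (byte) type;
        return h;
    }
}
```

The same pending-name approach would also apply to the comparison in getFileInfo: the name to compare is the one taken from the 'L' entry, not the truncated name field of the header that follows it.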

@JDatta
Owner

JDatta commented Aug 7, 2014

When I first wrote it, I had only the format defined by POSIX.1-1988 (ustar) in mind. This format supports file names up to 256 characters (https://www.gnu.org/software/tar/manual/html_chapter/tar_8.html). I do not think it works for POSIX.1-2001, and we do not currently have any special handling for different types of tar headers (L or otherwise).
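
As a rough illustration of where the 256-character limit comes from: a ustar header stores the path in two fields, a 100-byte name at offset 0 and a 155-byte prefix at offset 345, and the full path is the prefix, a slash, and the name. The sketch below joins the two; the class UstarName and its methods are hypothetical names, and it assumes the block has already been identified as a ustar header (it does not check the magic field).

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: rebuilds the full path from a ustar header block. */
class UstarName {
    static String fullName(byte[] header) {
        String name = field(header, 0, 100);     // name field: offset 0, 100 bytes
        String prefix = field(header, 345, 155); // prefix field: offset 345, 155 bytes
        // 155 (prefix) + 1 (slash) + 100 (name) = 256 characters at most.
        return prefix.isEmpty() ? name : prefix + "/" + name;
    }

    /** Reads a NUL-terminated field of at most maxLen bytes. */
    static String field(byte[] buf, int off, int maxLen) {
        int len = 0;
        while (len < maxLen && off + len < buf.length && buf[off + len] != 0) len++;
        return new String(buf, off, len, StandardCharsets.UTF_8);
    }
}
```

An index builder that reads only the 100-byte name field would therefore truncate any path that relies on the prefix field, in addition to paths stored through GNU 'L' entries.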

I vaguely remember that I first tried using tarArchiveInputStream.getNextTarEntry() for index creation, but it probably caused some issues, so I had to do the offset manipulation myself. I cannot recollect the reason now.

I plan to revisit this as soon as I have some bandwidth (maybe in a couple of weeks). If you have an immediate fix, you are welcome to contribute it.

JDatta added the bug label Apr 22, 2017
JDatta added this to the 2.2_beta release milestone Apr 22, 2017