Long file names in tar are getting cut in index #1

harelglik · 2014-08-07T07:11:22Z

Hi,
Great effort!
We are trying to use your code, and it seems the index file is not created correctly for long file names inside a tar. Changing the tar index creation part to use TarArchiveInputStream
with:
while (null != (tarArchiveEntry = tarArchiveInputStream.getNextTarEntry())) ...
instead of using the byte array seems to fix the problem.

harelglik · 2014-08-07T14:34:01Z

More investigation brought me here: http://stackoverflow.com/questions/2078778/what-exactly-is-the-gnu-tar-longlink-trick
So, for long file names there is a special tar entry before the real file entry, and its data is the actual file name.
I made a fix to the index creation (it also makes sure these special long-name meta entries aren't added to the index).
There is still a problem in getFileInfo: the index (which now holds the correct name) is compared against the name from the TarArchiveEntry (which is, again, not read in full). I have commented out this check for now.
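
For anyone following along, here is a minimal sketch of the long-name handling described above, written against a raw byte array rather than TarArchiveInputStream. It is not the project's actual code: the class name `TarLongNameIndexSketch`, the method `buildIndex`, and the choice to store the offset of the real entry's header block are all assumptions made for illustration. It only relies on the standard header layout (name at offset 0, size at offset 124, typeflag at offset 156) and on the GNU convention that a typeflag of `L` marks an entry whose data is the name of the next entry. Base-256 size fields and PAX headers are not handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class TarLongNameIndexSketch {
    static final int BLOCK = 512;

    /** Maps each real entry's full name to the offset of its header block (hypothetical index shape). */
    static Map<String, Long> buildIndex(byte[] tar) {
        Map<String, Long> index = new LinkedHashMap<>();
        String pendingLongName = null;
        int pos = 0;
        // A header whose first byte is NUL is treated as the end-of-archive marker.
        while (pos + BLOCK <= tar.length && tar[pos] != 0) {
            String shortName = cString(tar, pos, 100);
            long size = parseOctal(tar, pos + 124, 12);
            char typeFlag = (char) tar[pos + 156];
            int dataStart = pos + BLOCK;
            if (typeFlag == 'L') {
                // GNU long-name meta entry: its data is the name of the NEXT entry.
                pendingLongName = cString(tar, dataStart, (int) size);
            } else {
                // Real entry: prefer the long name if one was just seen.
                index.put(pendingLongName != null ? pendingLongName : shortName, (long) pos);
                pendingLongName = null;
            }
            // Entry data is padded to a whole number of 512-byte blocks.
            pos = dataStart + (int) ((size + BLOCK - 1) / BLOCK) * BLOCK;
        }
        return index;
    }

    /** Reads a NUL-terminated string of at most maxLen bytes, clamped to the buffer. */
    static String cString(byte[] buf, int start, int maxLen) {
        int limit = Math.min(start + maxLen, buf.length);
        int end = start;
        while (end < limit && buf[end] != 0) {
            end++;
        }
        return new String(buf, start, end - start, StandardCharsets.UTF_8);
    }

    /** Parses an octal header field, skipping leading spaces and stopping at the first non-digit. */
    static long parseOctal(byte[] buf, int start, int len) {
        int i = start;
        int end = start + len;
        while (i < end && buf[i] == ' ') {
            i++;
        }
        long value = 0;
        while (i < end && buf[i] >= '0' && buf[i] <= '7') {
            value = value * 8 + (buf[i] - '0');
            i++;
        }
        return value;
    }
}
```

With this shape, the `L` meta entries never reach the index, and the real entry is recorded under its full name instead of the truncated 100-byte name field.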

JDatta · 2014-08-07T14:59:37Z

When I first wrote it, I had only the format defined by POSIX.1-1988 (ustar) in mind. This format supports file names up to 256 characters (https://www.gnu.org/software/tar/manual/html_chapter/tar_8.html). I do not think it works for POSIX.1-2001, and we do not currently have any special handling for different types of tar headers (L or otherwise).
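
To illustrate where the 256-character limit comes from: a ustar header stores the name in two fields, a 100-byte `name` at offset 0 and a 155-byte `prefix` at offset 345, which are joined with a `/`. The following is only a sketch of that rule, not the project's code; the class name `UstarNameSketch` and its methods are hypothetical. It uses the prefix only when the magic field at offset 257 is exactly `ustar` followed by a NUL, since other tar variants may use those bytes differently.

```java
import java.nio.charset.StandardCharsets;

class UstarNameSketch {
    /** Returns prefix + "/" + name for POSIX ustar headers, or just name otherwise. */
    static String fullName(byte[] header) {
        String name = field(header, 0, 100);
        if (!hasUstarMagic(header)) {
            return name;
        }
        String prefix = field(header, 345, 155);
        return prefix.isEmpty() ? name : prefix + "/" + name;
    }

    /** True when bytes 257..262 are "ustar" followed by NUL (the POSIX magic). */
    static boolean hasUstarMagic(byte[] header) {
        byte[] magic = {'u', 's', 't', 'a', 'r', 0};
        for (int i = 0; i < magic.length; i++) {
            if (header[257 + i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads a NUL-terminated header field of at most len bytes. */
    static String field(byte[] header, int start, int len) {
        int end = start;
        while (end < start + len && header[end] != 0) {
            end++;
        }
        return new String(header, start, end - start, StandardCharsets.UTF_8);
    }
}
```

Names that do not fit this split are the ones that need the GNU `L` entries or POSIX.1-2001 extended headers.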

I vaguely remember that I first tried using tarArchiveInputStream.getNextTarEntry() for index creation. It probably caused some issues, which is why I had to do the offset manipulation myself. I cannot recollect the reason now.

I plan to revisit this as soon as I have some bandwidth (maybe in a couple of weeks). If you have an immediate fix, you are welcome to contribute it.

JDatta added the bug label Apr 22, 2017

JDatta added this to the 2.2_beta release milestone Apr 22, 2017
