-
Notifications
You must be signed in to change notification settings - Fork 287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix 1431 binary comment (0.27) #1432
Conversation
Thank you very much for your very fast response!! |
Good. Let's see if @nicofooo, @vinc17fr, @norbertj42, @tester0077 have comments about this as they have been involved in other changes concerning the comment code. Here's the bill for my services: Head over to https://openhub.net, it's free to register. Give me some kudos and say something nice about Exiv2. |
Is any particular image used/required/recommended for this test? |
Thanks Arnold (@tester0077). I'm delighted to say that @nicofooo has tested this PR and he is happy. I'll wait a couple of days to see if @vinc17fr and @norbertj42 wish to comment before merging. Yes there is a test file which @JuergenWolz shared privately and discussed by email. So, I think we're close to closing and merging this PR. The following discussion with @JuergenWolz is very technical and presented here for transparency.
$ ../0.27-maintenance/build/bin/exiv2 -g Comment ~/Downloads/IMG_0027.CR2
Exif.Photo.UserComment Undefined 264 Tiere Sand Momo
$ Curiously, #1267 also solves @JuergenWolz's issue as the current 0.27-maintenance (without #1432) behaviour is good, thanks to removing trailing nul bytes.
None-the-less the 0.27.3 toString() return of
|
Apologies to @norbertj42 for not mentioning his involvement in this issue in my previous message. Comments on this issue are welcome from @vinc17fr and @norbertj42 and anybody else. |
As side note and looking at some of the sample runs of your tvisitor app shown in some of your replies, is there a man page or more detailed explanation of the app's options? As I am not a frequent user of either tvisitor or exiv2, the 'help' output from tvisitor is rather sparse and there seems to be more fine grained modifiers for the app. |
The current syntax for tvisitor is: $ tvisitor
usage: tvisitor -{ U | S | R | X | C | I }+ path+
$ The options are:
The options can be in any order and undefined characters are ignored. The most common option that I use is -pR which is equivalent to exiv2's -pR. It could be simply stated as tvisitor -R foo. In the test harness, I use -pRU. The option U prints Unknown (and known) tags. An unknown tag is an item of metadata for which tvisitor does not know the name. A tiff 'tag' is identified by a 16 bit integer and these are defined in the TIFF-EP and Exif Specifications. The MakeNote tags are not standardised and unknown tags are reported in the following style:
The option S prints the Structure of the file. For example, use the following file: https://clanmills.com/Stonehenge.jpg, we see the structure of the JPEG file:
The option R performs a Recursive descent of the file and dumps embedded structures such as the TIFF which contains the Exif Metadata. It also descends into IPTC data and ICC Profiles.
There is no plan to have a man page for tvisitor because it has a 200 page book! tvisitor isn't intended for any production use and has been written to explain how Exiv2 works. In less than 4000 lines of code it decodes the metadata in all formats supported by Exiv2 plus ISOBMFF formats .CR3, .HEIC and .AVIF. Additionally, it supports BigTiff, extended JPEG, dumping ICC profiles and many other features which are not supported in Exiv2. tvisitor is currently being tested using more than 10,000 images harvested from ExifTool, raw.Pixls.us, RawSamples.ch and images collected from issues reported to Exiv2. My aim is to successfully read 9990 which is 99.9% reliability. I fully expect the Community to attack me concerning the 0.1% that are not successfully decoded. On-line abuse from the Community is the reason that I am retiring. I have written the book for two purposes:
Exiv2 provides a unique capability to the Community and its long term maintenance is of importance to Linux. To my knowledge, no book as been written about Image Metadata. The tvisitor.cpp code would provide a good resource from which to develop a new Metadata Library. |
I had a look on the modification and did tests with some of my pictures. Behaviour is now perfect for me. Thanks a lot. |
Note that if you change the API in an incompatible way so that applications silently break, that's bad, because various existing applications that were working suddenly behave incorrectly because of the change. Here, this might also have security issues as the comment is longer than expected, thus could yield a buffer overflow. If an application displays So |
@clanmills Thank you Robin. I haven't had a chance to download and review your book, but I presume you will simply add the explanation to it - or perhaps it is already part of it. |
@vinc17fr : I had a discussion with Robin about this some time ago. There was a good reason for the change: when you want to change a comment field with the exiv2 command line tool, charset must be given. So it makes sense to have charset also in the output to be consistent. |
@norbertj42 I'm not talking of the command line tool (it is not used by applications), but of the API. As I've said, it is sufficient to add a new function to the API. This new function can be used by applications if they need it, and by the command line tool to output the charset. The issue is that many applications are not maintained actively. I had a look at the gThumb source, but did not manage to find where the charset prologue was added. Or perhaps you can provide patches for these applications? |
For Nomacs, I think I will be able to patch the code: it already has a function to remove |
@norbertj42 I agree with vinc17fr, the charset info ought to be separate from the data. |
@tester0077 I wrote that text this morning and added it to the book. Today, I arrived at the conclusion that the book cannot be finished in the next two weeks. Every week, I discover something new that deserves attention and it ends up in the book. A new issue about Nikon LensData arrived last night. This is an encrypted tag that Exiv2 doesn't understand. It'll be a 3 or 4 days to investigate and deal with that next week. The book will get finished in 2021. @norbertj42 Working on a release on my own is quite a daunting task. I'm careful about tracking everything in the release. However, the |
This channge has been made for good reason. If I remove the "charset=" behaviour, I'm sure @vinc17fr will want to dispute that also. It's not a perfect world. This is purely cosmetic. The security argument is bogus. The function returns a string and the application should deal with it in a safe an orderly manner. |
All what I'm saying is that one should get the comment as before. I don't see any good reason for this change: if the charset is useful, it could be provided by a new function, thus without gratuitously breaking existing applications. In the unpatched Nomacs, due to the "charset=..." text, the comment is almost not visible at all without scrolling (this depends on the pane width, but if one increases the width, one wastes other screen space), so that this is not just cosmetic. Now, I've written a patch for Nomacs, which extends the exiv2 string cleanup to support |
@vinc17fr You are a very determined person who will not give up until you have things your own way. I am not changing it on 0.27-maintenance as I'm sure it somebody equally strong willed will want to argue that it shouldn't be reverted. I propose that you submit a PR to 'master' and you'll have your way in 0.28. For now, I'm working hard to get my book finished and to retire. I would like to thank you for your generous words of appreciation for my years of work on Exiv2. |
@vinc17fr I will revisit this on Monday. Your point of view has merit. |
I've slept on this and spent another couple of hours looking at the code. I want charset=encoding (prologed string) in the output from the exiv2 command-line program. It should be possible for CommentValue::toString() to return an "unprologed string" and for CommentValue::write() to output the "prologed string". I believe that would make everybody happy. I didn't write Exiv2 and there are aspects of the code that I don't understand. Exifdatum::write() uses toString(). Without changing the API, I can't see a way to determine the value of the prolog string charset=encoding from Exiv2datum::write(). For a "dot" release, I am reluctant to change the API. I am happy to screen share on Zoom to step the code together. Two heads are better than one. I'll sleep on this again tonight. If I can't find a way to determine the prolog string charset=encoding in Exifdatum::write(), I will merge this PR on Monday. |
I think toString() is behaving correctly by returning a prologed string such as "charset=encoding ABCD". The correct method to get the value of a comment is CommentValue.comment() and documented here: https://exiv2.org/doc/classExiv2_1_1CommentValue.html One way to call that is: Exiv2::CommentValue commentValue (value().toString());
os << commentValue.comment() ; I've plugged this in here:
S.jpg is a copy of https://clanmills.com/Stonehenge.jpg
I don't think we should modify the 0.27.3 behaviour as it appears to be correct. Applications wanting the unprologed comment should use the clumsy code: Exiv2::CommentValue(value().toString()).comment() ; Comments welcome. I'm now very confident about merging this PR on Monday (without the illustrative plug in this comment). |
Thank you for having another look at this issue. I am a bit behind right now because I replaced my main C: drive, along with the Christmas preparation, such as they are under Covid, but I will have a look at the code in the links you have provided ASAP. |
Well, according to https://exiv2.org/doc/classExiv2_1_1CommentValue.html, there are two ways to get a comment: So this doesn't mean that a prologed string is more correct than just the comment string. Until now, Perhaps it could be decided in the future to change the behavior, but this is a change of API in an incompatible way, and this must not be done in a minor release. Please read: Semantic Versioning 2.0.0. Note also that in general (outside exiv2), the |
Right. I've found the fix for which I was searching this morning. With this, we should both be happy because I'm doing things your way, and the output of the exiv2 command-line program shows the prolog as documented in the man page. I've moved the prolog printer from CommentValue::write() into Exifdatum::write(). So:
When the plug is enabled, we get:
I'll submit this now (with the plug disabled) for you to test. This change has caused exceptions to appear in our test suite in conversions.sh which I will investigate tomorrow (I believe they are valid behaviour and should modify the reference files in the test suite). As a future enhancement the "special code" in std::ostream& Exifdatum::write() should be moved into a global function "printComment()" and the TagInfo for Photo.UserComment should declare that instead of printValue. I'll do that when this 0.27-maintenance is ported to 'master' for 0.28. I avoid extending the API on 0.27-maintenance. |
As I see, nobody objected the original changes as in 1432, this is removing isBinary in CommentValue::comment(). |
You're right. We've wandered a little away from your issue. Several people have discussed matters relating to this since @0.27.3 shipped and I am trying to make everybody happy. I'll be honest and say that I don't know what CommentValue toString() did in years gone by. Everything we're discussing here arises from the response to #1046 which was a "hard bug" concerning GPSProcessingMethod and GPSAreaInformation. This lead to many discoveries including:
And there were other matters such as binary ending up in the output, and other pain I have forgotten. I always try to make everybody happy. I don't always succeed. I hope we're done. |
Hi Robin, |
@JuergenWolz Thanks for your input on this. There's a saying "when you're in a hole, stop digging!". I'm strongly tempted to return to the state of the PR on Saturday. Here's the 0.27.2 behaviour with a very slightly modified samples/exifvalue which reports toString() and value().
And, I believe that's how it has worked for years. You'll notice that toString() reports charset in a slightly different syntax from that documented in 0.27.3 I've been trying, without success, to pacify @vinc17fr in his complaints that some applications are behaving differently with 0.27.3. The prime libexiv2 suspect has been toString(), however this evidence does not support that. I'm happy to tabulate similar behaviour with:
Before I undertake that investigation (which will take about 30 minutes), I'd like to hear your thoughts about the following proposal:
Your thoughts? |
Yes, I had observed slight differences in syntax, charset="Unicode" vs. charset=Unicode , and I know that charset="Unicode" existed for years. |
P.S.: 3. 0.27-maintenance with Saturday's PR would be fine for me. |
@clanmills I am fine with your approach - and will muss you, thanks for all |
Took longer that 30 minutes. Running the builds/tests is easy. Presenting the results in MarkDown is tough:
Conclusions I will restore this PR to the code on Saturday because:
std::ostringstream os ;
os << value() ;
std::string comment = os.str(); The documented way to determine the comment without the prefix charset= is discussed above: std::string comment = Exiv2::CommentValue(value().toString()).comment()); |
OK, Debian bug 974608 against gthumb and Debian bug 974616 against nomacs updated to say not to use internal libexiv2 functions to get the user comment, but the documented way you gave. BTW, it would be better not to expose internals, such as |
@vinc17fr: As you have not said "Thank You, Robin", I will retire from open-source when this PR is merged/closed. I have no interest in your opinion and aggressive behaviour. |
Hi Robin, |
Thanks, @JuergenWolz. It's been mostly fun, interesting and mentally rewarding. If everybody was as nice as you, @nicofooo, @norbertj42 and @tester0077 it would be all fun. Sadly there are a few people in the Community who make trouble. You might find the introduction to Chapter 11 Project Management interesting https://clanmills.com/exiv2/book/#11 (By the way, Chapter 11 in the US Tax Code is Bankruptcy!) I am now retired from the Exiv2. We leave on a pre-Christmas family vacation with the grand-kids on Friday. I'll get the book finished in the New Year. The book is mostly impenetrable and very detailed. That's quite intentional. This is not something to help you relax, or read to the cat. It's solid shit! |
@clanmills All the very best wishes for the holidays & retirement.
I still remember my surprise when I first asked a question re exiv2lib I
don't recall how many years ago and received a prompt and helpful reply.
Thank you for your unfailing help over the years.
Stay healthy & remember: illegitimi no carborundum :-)
Arnold
PS: looking forward to the completion of your book. It will be an
excellent summary of & reference for all things "Exiv2"
|
@tester0077 I've made a remarkable discovery today while working on Previews for my book and also #1464. Two or three months ago you asked me about data inside Stonehenge.jpg that doesn't seem to be needed. It's a couple of thumbnails that have nothing to do with Exif, ICC, XMP or IPTC. I've beefed up tvisitor to dump the segments that follow SOS (start of scan). There's a couple of thumbnails hiding at the bottom of the lake. I learn something every day. #1464 (comment) I've agreed to release Exiv2 v0.27.4 on 2021-05-22 to include ISOBMFF support (.CR3, .HEIC and .AVIF) and after that I'm out of here. There's going to be a Zoom meeting late-February/early-March to discuss the future of Exiv2. You'll get an invitation. |
Thank you Robin.
Never investigated this any further, but I'll keep this discovery in mind.
Also good to know the extra bytes are not just leftover junk :-)
Please do let me know about the Zoom meeting. I do want to keep up with Exiv2
Arnold
|
Arnold: @tester0077 You're on the invitation list on #1466. I hope you can join us. |
Two changes:
isBinary()
function and call in src/value.cppstd::string CommentValue::comment()
I will wait for you to test on your application and provide feedback before submitting this to the code base. Both @norbertj42 and @vinc17fr may also wish to comment.
Discussion
Let me explain how I believe this works. I didn't write Exiv2, so there are matters which I do not totally understand. There are three "Comments" in the Exif Spec (CIPA DC-008-2019, p45). UserComment and two GPS tags:
Exif Comment Specification
The character code used in the UserComment tag is identified based on an ID code in a fixed 8-byte area at the start of the tag data area. The unused portion of the area shall be padded with NULL ("00.H"). ID codes are assigned by means of registration. The designation method and references for each character code are given in Table 9. The value of Count N is determined based on the 8 bytes in the character code area and the number of bytes in the user comment part. Since the TYPE is not ASCII, NULL termination is not necessary.
Exiv2 Comment Implementation
When an application reads the image (with image->readMetadata()), the image is updated to have a a vector[] of Exifdatum key/values. https://exiv2.org/doc/classExiv2_1_1Exifdatum.html
You have 100% transparent access to the "raw" data read from the file. The API also provides convenience methods such as toString() which reports "charset=Ascii ABCD".
I understand your complain that
toString()
was returningbinary comment
. I have modified it to report the bytes in the string. Please be aware, that your concern about toString() is a presentation issue.Reporting
charset=encoding
in front of the data was added in v0.27.3 in response to #1046. The exiv2 command-line program was enhanced to parse charset=encoding, Comments were tested in the test suite, and Comments were documented in the exiv2.1 man page.The subject of the correctness of this was discussed in #1258. I received several very unpleasant messages from @vinc17fr on the subject of
charset=encoding
appearing in the UI of some applications which use libexiv2. He used the meaningless term "buggy". I believe it is correct fortoString()
to returncharset=Ascii ABCD
. An application may present the value in a different format using the "raw" data visible to them.