Allow negative struct parent keys in PDFMergerUtility #149

Open: wants to merge 1 commit into trunk


bernhardf-ro

PDFMergerUtility assumes that structural parent tree keys are non-negative.
However, negative values do not appear to be forbidden by the specification, and validators accept them.
This patch adapts the class to handle negative struct parent keys correctly.
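Why negative keys need care during a merge can be illustrated with a minimal, self-contained sketch that does not use PDFBox. It assumes, for illustration only, that each parent tree can be modelled as a sorted map of integer keys and that all source keys are shifted by a single offset; merge and safeOffset are hypothetical helpers, not PDFMergerUtility methods. With a naive offset of "largest destination key + 1", a negative source key lands on an existing destination key, whereas also subtracting the smallest source key keeps the shifted range clear.

```java
import java.util.Map;
import java.util.TreeMap;

public class ParentTreeOffsetSketch {

    // Hypothetical helper: copies the destination tree, then adds every source
    // entry shifted by one offset, failing on the first key collision.
    static TreeMap<Integer, String> merge(TreeMap<Integer, String> dest,
                                          TreeMap<Integer, String> src,
                                          int offset) {
        TreeMap<Integer, String> result = new TreeMap<>(dest);
        for (Map.Entry<Integer, String> e : src.entrySet()) {
            int newKey = e.getKey() + offset;
            String previous = result.put(newKey, e.getValue());
            if (previous != null) {
                throw new IllegalStateException("key collision at " + newKey);
            }
        }
        return result;
    }

    // Hypothetical offset choice: places all shifted source keys above the
    // destination keys, even when the source keys start below zero.
    static int safeOffset(TreeMap<Integer, String> dest, TreeMap<Integer, String> src) {
        int nextDestKey = dest.isEmpty() ? 0 : dest.lastKey() + 1;
        int minSrcKey = src.isEmpty() ? 0 : src.firstKey();
        return nextDestKey - minSrcKey;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> doc = new TreeMap<>(Map.of(-1, "a", 0, "b", 1, "c"));

        // Naive offset 2 ("next free destination key"): source key -1 maps to 1.
        try {
            merge(doc, doc, doc.lastKey() + 1);
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage()); // key collision at 1
        }

        int offset = safeOffset(doc, doc); // 2 - (-1) = 3
        System.out.println(merge(doc, doc, offset).keySet()); // [-1, 0, 1, 2, 3, 4]
    }
}
```

Whether the patch computes its offset this way is not visible in this thread; the sketch only shows how a negative key can collide when a document is appended to itself.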

@THausherr
Contributor

Do you have a PDF where this happens?

@bernhardf-ro
Author

Here is a document that reproduces the issue. (Sorry, I forgot to attach it to the issue in the first place.)
merge.pdf
It is valid PDF/UA-1 according to PDF Accessibility Checker 2021.
Appending it to itself using PDFMergerUtility results in a document that is not valid PDF/UA (a "Structural parent tree" issue).
With the patch applied the merge result is valid PDF/UA-1.

```diff
@@ -1498,8 +1499,8 @@ private void updateStructParentEntries(PDPage page, int structParentOffset) thro
         List<PDAnnotation> newannots = new ArrayList<>(annots.size());
         annots.forEach(annot ->
         {
-            int structParent = annot.getStructParent();
-            if (structParent >= 0)
+            int structParent = annot.getCOSObject().getInt(COSName.STRUCT_PARENT, Integer.MIN_VALUE); // allow for negative struct parent values
```
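The effect of this hunk can be sketched in isolation. The sketch models annotation dictionaries as plain maps with a hypothetical getInt(dict, key, default) helper instead of PDFBox classes. The line that replaces the removed if (structParent >= 0) is cut off in this view, so the new check, structParent != Integer.MIN_VALUE, is an assumption. Under the old rule a stored negative key is never shifted, and a stored -1 cannot be told apart from a missing entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class StructParentShiftSketch {

    // Hypothetical helper mirroring getInt(key, default): a missing entry yields the default.
    static int getInt(Map<String, Integer> dict, String key, int defaultValue) {
        Integer value = dict.get(key);
        return value == null ? defaultValue : value;
    }

    // Old rule: default -1 and only non-negative keys are shifted, so stored
    // negative keys are skipped and a stored -1 looks like a missing entry.
    static List<Integer> shiftOld(List<Map<String, Integer>> annots, int offset) {
        List<Integer> shifted = new ArrayList<>();
        for (Map<String, Integer> annot : annots) {
            int structParent = getInt(annot, "StructParent", -1);
            if (structParent >= 0) {
                shifted.add(structParent + offset);
            }
        }
        return shifted;
    }

    // Assumed new rule: Integer.MIN_VALUE marks a missing entry, so every
    // stored key, negative or not, is shifted.
    static List<Integer> shiftNew(List<Map<String, Integer>> annots, int offset) {
        List<Integer> shifted = new ArrayList<>();
        for (Map<String, Integer> annot : annots) {
            int structParent = getInt(annot, "StructParent", Integer.MIN_VALUE);
            if (structParent != Integer.MIN_VALUE) {
                shifted.add(structParent + offset);
            }
        }
        return shifted;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> annots = List.of(
                Map.of(),                      // no StructParent entry
                Map.of("StructParent", -3),
                Map.of("StructParent", -1),
                Map.of("StructParent", 0),
                Map.of("StructParent", 4));
        System.out.println(shiftOld(annots, 10)); // [10, 14]
        System.out.println(shiftNew(annots, 10)); // [7, 9, 10, 14]
    }
}
```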
Contributor

This changes nothing except the default value used when the STRUCT_PARENT entry is missing. If Integer.MIN_VALUE makes more sense than -1, the getter should be changed rather than this piece of code.

Author

I intentionally kept the patch to a single class and as minimal as possible, e.g. I did not rename variables. Feel free to adapt it, and let me know if you have any questions.

Contributor

So the question is: do we need to change the default value used when the entry isn't set?

Author

That would be an API change that breaks every explicit check for -1 (i.e. both == -1 and != -1 checks) in all existing integrations.
If a one-parameter method that defaults to Integer.MIN_VALUE is necessary (IMHO it is not), it would have to be a new one, e.g. getIntSigned(COSName) (and similarly for arrays, possibly for other number types).
The only change in this regard that I consider necessary is improving the API documentation of getInt(COSName) (and similar methods) to clarify that they cannot be used if the result may be negative.
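The additive approach described in this comment can be sketched with a toy dictionary class. ToyDictionary is a hypothetical stand-in, not PDFBox's COSDictionary, and getIntSigned is only the name suggested in this thread, not an existing PDFBox method. The one-parameter getInt keeps its -1 default, so existing == -1 and != -1 checks behave as before, while getIntSigned uses Integer.MIN_VALUE to mark a missing entry.

```java
import java.util.HashMap;
import java.util.Map;

public class SignedGetterSketch {

    // Toy stand-in for a PDF dictionary with integer entries (not COSDictionary).
    static final class ToyDictionary {
        private final Map<String, Integer> entries = new HashMap<>();

        void setInt(String key, int value) {
            entries.put(key, value);
        }

        int getInt(String key, int defaultValue) {
            Integer value = entries.get(key);
            return value == null ? defaultValue : value;
        }

        // Existing one-parameter contract: -1 when the entry is missing,
        // so a stored -1 and a missing entry look the same.
        int getInt(String key) {
            return getInt(key, -1);
        }

        // Hypothetical addition: Integer.MIN_VALUE signals a missing entry,
        // leaving -1 usable as a real value.
        int getIntSigned(String key) {
            return getInt(key, Integer.MIN_VALUE);
        }
    }

    public static void main(String[] args) {
        ToyDictionary dict = new ToyDictionary();
        dict.setInt("StructParent", -1);
        System.out.println(dict.getInt("StructParent"));       // -1 (stored value)
        System.out.println(dict.getInt("Missing"));            // -1 (default)
        System.out.println(dict.getIntSigned("StructParent")); // -1
        System.out.println(dict.getIntSigned("Missing") == Integer.MIN_VALUE); // true
    }
}
```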
