
Getting bounding boxes with different dpi settings #4252

Closed · Answered by JorjMcKie
zhuwenfei-wintech asked this question in Q&A

No, performance would not benefit from incorporating this; rather the contrary. The standard PDF user unit is 1/72 inch (one point, i.e. 72 points per inch) in virtually all PDFs, and any deviation from this is taken care of by the base library MuPDF. So page content coordinates will be correct in any case.
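A minimal sketch of that conversion, in plain Python with no PyMuPDF dependency: since page coordinates are in points (1/72 inch), rendering at a given dpi scales every coordinate by `dpi / 72`. The names `zoom_for_dpi` and `scale_rect` are hypothetical, and a rect is assumed to be an `(x0, y0, x1, y1)` tuple, the same order as the first four items of each entry returned by PyMuPDF's `page.get_text("words")`.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

POINTS_PER_INCH = 72.0  # default PDF user unit is 1/72 inch


def zoom_for_dpi(dpi: float) -> float:
    """Scale factor from PDF points to pixels at the given resolution."""
    return dpi / POINTS_PER_INCH


def scale_rect(rect: Rect, dpi: float) -> Rect:
    """Map a rect from page coordinates (points) to pixel coordinates.

    Assumes a plain zoom matrix (same factor on both axes, no rotation
    or translation), so every coordinate is multiplied by one factor.
    """
    zoom = zoom_for_dpi(dpi)
    x0, y0, x1, y1 = rect
    return (x0 * zoom, y0 * zoom, x1 * zoom, y1 * zoom)


if __name__ == "__main__":
    word_rect = (72.0, 100.0, 144.5, 112.0)  # example rect in points
    for dpi in (72, 150, 300):
        print(dpi, scale_rect(word_rect, dpi))
```

In PyMuPDF the same mapping is written `rect * fitz.Matrix(zoom, zoom)`, and `page.get_pixmap(dpi=...)` renders with that zoom, so the scaled float coordinates correspond to the image's pixel grid.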

From the perspective of text extraction, your use case is peripheral. For the sake of brevity, my explanation above was somewhat imprecise: bounding box coordinates in the image should be integers, so the correct and complete computation is `(rect * matrix).irect`. For points there is no similar integer transformation available.
Given all that, we don't want to bloat text extraction code with this sort…
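The rounding step can also be sketched without PyMuPDF. The helper `to_irect` below is a hypothetical stand-in for `Rect.irect`: it rounds outward (floor for the top-left corner, ceil for the bottom-right) so the integer box still encloses the float box. The small tolerance `EPS` is an assumption meant to absorb floating-point noise, loosely modelled on MuPDF's rounding, so results may differ from `irect` in rare edge cases. Points, by contrast, are only scaled and stay floats; `pixel_bbox` and `pixel_point` are likewise hypothetical names.

```python
import math
from typing import Tuple

Rect = Tuple[float, float, float, float]  # float box (x0, y0, x1, y1)
IRect = Tuple[int, int, int, int]         # integer box in pixel space

# Tolerance for floating-point noise: an assumption loosely modelled on
# MuPDF's rounding, not a documented PyMuPDF constant.
EPS = 1e-3


def to_irect(rect: Rect, eps: float = EPS) -> IRect:
    """Round a float rect outward to the smallest enclosing integer rect."""
    x0, y0, x1, y1 = rect
    return (
        math.floor(x0 + eps),
        math.floor(y0 + eps),
        math.ceil(x1 - eps),
        math.ceil(y1 - eps),
    )


def pixel_bbox(rect_pt: Rect, dpi: float) -> IRect:
    """Integer bounding box, in an image rendered at dpi, of a rect in points."""
    zoom = dpi / 72.0
    x0, y0, x1, y1 = rect_pt
    return to_irect((x0 * zoom, y0 * zoom, x1 * zoom, y1 * zoom))


def pixel_point(x_pt: float, y_pt: float, dpi: float) -> Tuple[float, float]:
    """A point is only scaled; there is no integer counterpart, so it stays float."""
    zoom = dpi / 72.0
    return (x_pt * zoom, y_pt * zoom)


if __name__ == "__main__":
    word = (72.0, 100.0, 144.5, 112.3)  # example word rect in points
    print(pixel_bbox(word, 150))
    print(pixel_point(36.0, 18.0, 150))
```

With PyMuPDF, the expression from the answer, `(rect * matrix).irect` with `matrix = fitz.Matrix(dpi / 72, dpi / 72)`, performs both steps at once; the sketch only makes the arithmetic visible.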

Replies: 1 comment, 2 replies

2 replies
@zhuwenfei-wintech
@JorjMcKie
Answer selected by zhuwenfei-wintech