Getting bounding boxes with different dpi settings #4252
Currently the bboxes I get when calling get_text are computed at dpi=72. I have to rescale each bbox myself if I need a higher dpi.
Are you referring to computing the coordinates of page content inside the rendered image? That is no problem at all, whatever resolution you chose when creating the pixmap: just take the image dimensions, which are represented by … If you then take any …
No, performance would not benefit from incorporating this - rather the contrary. The standard user unit size in PDF is 72 points per inch in 99.99999% of all cases, and any deviations from this will be taken care of by the base library MuPDF. So page content coordinates will be correct in any case.
From the perspective of text extraction, your use case is peripheral. For the sake of brevity, my above explanation was somewhat imprecise: the bounding box coordinates in the image should be integers, so the correct / complete computation is (rect * matrix).irect. For points there is no similar transformation available. Given all that, we don't want to bloat text extraction code with this sort…
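For readers who want to see the arithmetic behind (rect * matrix).irect, here is a minimal pure-Python sketch that does not import PyMuPDF. It assumes the matrix is a plain scaling by dpi / 72 in both directions, and that the integer rectangle is obtained by rounding outward (floor for the top-left corner, ceil for the bottom-right). The second point is my reading of the library's behaviour rather than something stated in this thread, and bbox_to_pixels is a hypothetical helper name, not a PyMuPDF function.

```python
import math

# Default PDF user unit: 72 points per inch (as stated in the reply above).
PDF_POINTS_PER_INCH = 72


def bbox_to_pixels(bbox, dpi):
    """Map an (x0, y0, x1, y1) bbox in PDF points to integer pixel coordinates.

    Hypothetical helper for illustration only; not part of PyMuPDF.
    """
    # Scale every coordinate by dpi / 72. Multiplying before dividing
    # keeps simple cases free of floating point noise.
    x0, y0, x1, y1 = (c * dpi / PDF_POINTS_PER_INCH for c in bbox)
    # Assumption: round outward so the integer box still contains
    # the scaled box.
    return (math.floor(x0), math.floor(y0), math.ceil(x1), math.ceil(y1))


bbox = (72.0, 100.5, 200.25, 115.0)   # e.g. a bbox taken from get_text output
print(bbox_to_pixels(bbox, 72))        # coordinates at the default resolution
print(bbox_to_pixels(bbox, 144))       # coordinates in an image rendered at 144 dpi
```

With PyMuPDF itself the same idea would presumably be a one-liner along the lines of (rect * pymupdf.Matrix(dpi / 72, dpi / 72)).irect, which is why a separate dpi option in the text extraction code adds little.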