Return type for surya_recognition #320

Open
kevinhu opened this issue Oct 26, 2024 · 0 comments
kevinhu commented Oct 26, 2024

Looking at the surya_recognition function, its return type annotation List[Optional[Page]] suggests that each element in the returned list could be either a Page object or None.

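However, based on the implementation, the function always returns a list of Page objects and never None: each iteration of the final loop constructs a Page and appends it to new_pages, so the annotation could be tightened to List[Page].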

def surya_recognition(doc, page_idxs, langs: List[str], rec_model, pages: List[Page], batch_multiplier=1) -> List[Optional[Page]]:
    # Slice images in higher resolution than detection happened in
    images = [render_image(doc[pnum], dpi=settings.SURYA_OCR_DPI) for pnum in page_idxs]
    box_scale = settings.SURYA_OCR_DPI / settings.SURYA_DETECTOR_DPI
    processor = rec_model.processor
    selected_pages = [p for i, p in enumerate(pages) if i in page_idxs]
    surya_langs = [langs] * len(page_idxs)
    detection_results = [p.text_lines.bboxes for p in selected_pages]
    polygons = deepcopy([[b.polygon for b in bboxes] for bboxes in detection_results])
    # Scale polygons to get correct image slices
    for j, poly in enumerate(polygons):
        skip_idxs = []
        for z, p in enumerate(poly):
            for i in range(len(p)):
                p[i] = [int(p[i][0] * box_scale), int(p[i][1] * box_scale)]
            x_coords = [p[i][0] for i in range(len(p))]
            y_coords = [p[i][1] for i in range(len(p))]
            bbox = [min(x_coords), min(y_coords), max(x_coords), max(y_coords)]
            if (bbox[2] - bbox[0]) * (bbox[3] - bbox[1]) == 0:
                skip_idxs.append(z)
        if len(skip_idxs) > 0:
            polygons[j] = [p for i, p in enumerate(poly) if i not in skip_idxs]
    results = run_recognition(images, surya_langs, rec_model, processor, polygons=polygons, batch_size=int(get_batch_size() * batch_multiplier))
    new_pages = []
    for idx, (page_idx, result, old_page) in enumerate(zip(page_idxs, results, selected_pages)):
        text_lines = old_page.text_lines
        ocr_results = result.text_lines
        blocks = []
        for i, line in enumerate(ocr_results):
            scaled_bbox = rescale_bbox([0, 0, images[idx].size[0], images[idx].size[1]], old_page.text_lines.image_bbox, line.bbox)
            block = Block(
                bbox=scaled_bbox,
                pnum=page_idx,
                lines=[Line(
                    bbox=scaled_bbox,
                    spans=[Span(
                        text=line.text,
                        bbox=scaled_bbox,
                        span_id=f"{page_idx}_{i}",
                        font="",
                        font_weight=0,
                        font_size=0,
                    )
                    ]
                )]
            )
            blocks.append(block)
        page = Page(
            blocks=blocks,
            pnum=page_idx,
            bbox=old_page.text_lines.image_bbox,
            rotation=0,
            text_lines=text_lines,
            ocr_method="surya"
        )
        new_pages.append(page)
    return new_pages
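
To show why the annotation matters to callers, here is a minimal sketch of the difference. It is not the project's code: the `Page` dataclass is a hypothetical stand-in with a single `pnum` field, and the `build_pages_*` and `page_numbers_*` helpers are made up for illustration. With `List[Optional[Page]]`, a type checker makes every caller guard against `None`, even though that branch can never run. With `List[Page]`, the guard is unnecessary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    # Hypothetical stand-in for the real Page schema; only pnum is modeled.
    pnum: int


def build_pages_loose(page_idxs: List[int]) -> List[Optional[Page]]:
    # Mirrors the current annotation: the body never yields None,
    # but the signature tells callers that it might.
    return [Page(pnum=i) for i in page_idxs]


def build_pages_strict(page_idxs: List[int]) -> List[Page]:
    # Same body, annotated with what it actually produces.
    return [Page(pnum=i) for i in page_idxs]


def page_numbers_loose(pages: List[Optional[Page]]) -> List[int]:
    # A type checker requires this None guard, although it filters nothing here.
    return [p.pnum for p in pages if p is not None]


def page_numbers_strict(pages: List[Page]) -> List[int]:
    # No guard needed: every element is known to be a Page.
    return [p.pnum for p in pages]
```

Both variants behave identically at runtime; the only difference is how much defensive code the signature forces on callers.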
