feat(translator): skip pages that are too large to process
- Add size check for PDF pages (height > 1200 or width > 2000)
- Log warning message for skipped large pages
- Improve processing efficiency by avoiding unnecessary work on oversized pages
awwaawwa committed Mar 8, 2025
1 parent d774a6f commit ebd693d
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions babeldoc/high_level.py
@@ -138,8 +138,18 @@ def start_parse_il(
if pages and (pageno not in pages):
    continue
page.pageno = pageno

if not translation_config.should_translate_page(pageno + 1):
    continue

height, width = (
    page.cropbox[3] - page.cropbox[1],
    page.cropbox[2] - page.cropbox[0],
)
if height > 1200 or width > 2000:
    logger.warning(f"page {pageno + 1} is too large, skip")
    continue

translation_config.raise_if_cancelled()
# The current program no longer relies on
# the following layout recognition results,
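The size check in this hunk can be exercised in isolation. The following is a minimal sketch, not the code in babeldoc/high_level.py: it assumes the cropbox is an (x0, y0, x1, y1) sequence, as the index arithmetic in the diff suggests, and that the limits are in PDF points. The names `page_size`, `is_too_large`, and `pages_to_process` are hypothetical helpers introduced only for illustration.

```python
import logging

logger = logging.getLogger(__name__)

# Thresholds copied from the diff; the unit (PDF points) is an assumption.
MAX_PAGE_HEIGHT = 1200
MAX_PAGE_WIDTH = 2000


def page_size(cropbox):
    """Return (height, width) for a cropbox assumed to be (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = cropbox
    return y1 - y0, x1 - x0


def is_too_large(cropbox, max_height=MAX_PAGE_HEIGHT, max_width=MAX_PAGE_WIDTH):
    """Hypothetical helper: True when either dimension exceeds its limit."""
    height, width = page_size(cropbox)
    return height > max_height or width > max_width


def pages_to_process(cropboxes):
    """Yield zero-based page numbers whose cropbox passes the size check."""
    for pageno, cropbox in enumerate(cropboxes):
        if is_too_large(cropbox):
            # Same message format as the commit: page numbers are 1-based in logs.
            logger.warning(f"page {pageno + 1} is too large, skip")
            continue
        yield pageno
```

Because both comparisons are strict, a page that is exactly 1200 high and 2000 wide is still processed; only pages exceeding a limit in at least one dimension are skipped.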