feat(formula-detection): improve formula font detection and adjust element order #142

Merged: awwaawwa merged 2 commits into main from feat/improve-formula-detection on Mar 9, 2025

awwaawwa commented on Mar 9, 2025

This PR enhances formula detection capabilities and optimizes element processing order:

  • Add GlosaMath font pattern to formula font detection
  • Reorder paragraph types in paragraph_finder for better processing priority
  • Improve detection of mathematical content in documents

Technical Details

  1. Added "GlosaMath.+" to the formula font pattern regex to properly detect formulas using GlosaMath font
  2. Reordered paragraph types in paragraph_finder.py to optimize processing priority, placing tables and figures after their captions
  3. Together, these changes are intended to improve the accuracy of formula detection and of overall document structure analysis
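
To illustrate item 1, here is a minimal sketch of how a font-name pattern with a `GlosaMath.+` alternative might behave. Only the `GlosaMath.+` alternative comes from this PR. The names `FORMULA_FONT_ALTERNATIVES`, `FORMULA_FONT_PATTERN`, and `is_formula_font`, the extra `CMMI.+` entry, and the use of a full match against the font name are all assumptions made for illustration, not the project's actual code.

```python
import re

# Hypothetical list of alternatives. Only "GlosaMath.+" is taken from this PR;
# "CMMI.+" is an assumed placeholder so the alternation has more than one entry.
FORMULA_FONT_ALTERNATIVES = [
    r"CMMI.+",       # assumed example entry, not taken from the project
    r"GlosaMath.+",  # the alternative added in this PR
]

# Join the alternatives into one regex, wrapping each in a non-capturing group.
FORMULA_FONT_PATTERN = re.compile(
    "|".join(f"(?:{alt})" for alt in FORMULA_FONT_ALTERNATIVES)
)


def is_formula_font(font_name: str) -> bool:
    """Return True if the whole font name matches one of the alternatives.

    Assumption: the font name is matched in full. The real project may
    match differently (for example, after stripping a subset prefix).
    """
    return FORMULA_FONT_PATTERN.fullmatch(font_name) is not None
```

Note that `.+` requires at least one character after `GlosaMath`, so under this sketch a name such as `GlosaMath-Regular` matches while the bare name `GlosaMath` does not.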

awwaawwa added 2 commits March 9, 2025 17:02
…ement order

- Add GlosaMath font pattern to formula font detection
- Reorder paragraph types in paragraph_finder for better processing priority
- Improve detection of mathematical content in documents
awwaawwa merged commit d8878c9 into main on Mar 9, 2025
awwaawwa deleted the feat/improve-formula-detection branch on Mar 9, 2025 09:05