Missing text using Type 1 (CID) font in markdown result #618

Open
sophie-aistribute opened this issue Feb 12, 2025 · 0 comments

Describe the bug
The PDF uses the fonts listed in the image below. LlamaParse extracts all of the text (see the `text` field in the returned result), but the text set in a Type 1 (CID) font is missing from the markdown result (the `md` field).

Image

Files
type1-cid.PDF

Job ID
a924c195-7455-45c0-a186-c3d6527c2df2

Client:

  • Frontend (cloud.llamaindex.ai)
  • TypeScript Library

Additional context

  • Premium mode
  • Skip image extraction

Returned Result:

{
  "pages": [
    {
      "page": 1,
      "text": "SHANGHAI,CHINA\n\n\n\nEDMONTON,AB\n\n\n\n                07-JAN-2025",
      "md": "07-JAN-2025",
      "images": [
        {
          "name": "page_1.jpg",
          "height": 841.95,
          "width": 595.35,
          "x": 0,
          "y": 0,
          "original_width": 1131,
          "original_height": 1600,
          "type": "full_page_screenshot"
        }
      ],
      "charts": [],
      "items": [
        {
          "type": "text",
          "value": "07-JAN-2025",
          "md": "07-JAN-2025",
          "bBox": {
            "x": 485,
            "y": 693.95,
            "w": 44,
            "h": 8
          }
        }
      ],
      "status": "OK",
      "links": [],
      "width": 595.35,
      "height": 841.95,
      "triggeredAutoMode": false,
      "parsingMode": "premium",
      "structuredData": null,
      "noStructuredContent": false,
      "noTextContent": false
    }
  ],
  "job_metadata": {
    "credits_used": 150,
    "job_credits_usage": 0,
    "job_pages": 0,
    "job_auto_mode_triggered_pages": 0,
    "job_is_cache_hit": true,
    "credits_max": 1000
  }
}
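
For anyone who wants to check their own results for the same discrepancy, here is a minimal sketch that compares the `text` and `md` fields of a page. It assumes the JSON shape shown above; `missing_from_markdown` is a hypothetical helper written for this illustration, not part of LlamaParse or its client libraries.

```python
def missing_from_markdown(page: dict) -> list[str]:
    """Return the non-empty lines of page["text"] that do not appear in page["md"].

    Hypothetical helper: a plain substring check, so it only flags text
    that is absent from the markdown verbatim.
    """
    md = page.get("md") or ""
    lines = [line.strip() for line in (page.get("text") or "").splitlines()]
    return [line for line in lines if line and line not in md]


# Trimmed copy of the page object from the returned result above.
page = {
    "page": 1,
    "text": "SHANGHAI,CHINA\n\n\n\nEDMONTON,AB\n\n\n\n                07-JAN-2025",
    "md": "07-JAN-2025",
}

print(missing_from_markdown(page))
# → ['SHANGHAI,CHINA', 'EDMONTON,AB']
```

On the page above, this flags the two strings that are present in `text` but dropped from `md`.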

Full page screenshot

Image

sophie-aistribute added the bug label on Feb 12, 2025
hexapode self-assigned this on Feb 13, 2025