JSON Support for generic and formatted output #18

Open
wants to merge 6 commits into
base: main

@rafeyrana rafeyrana commented Dec 25, 2024

JSON Output Support

Changes:

  1. Added support for reliable JSON output

    • Generic JSON output: The model chooses the JSON structure itself, based on the hierarchy of content in the image.
    • Predefined format JSON output: Users can pass a response type definition object to enforce consistent JSON structure.
  2. Code changes for modularity and extensibility

    • Code cleanup: Refactored for better maintainability.
    • New folder structure:
      • services/: Application logic.
      • utils/: Reusable helper functions.
      • prompts/: Centralized prompt definitions.
  3. Comprehensive testing suite

    • Test coverage for all 6 flows:
      • Input types: Remote URL and Local file.
      • Output types: Markdown, Generic JSON, Predefined JSON format.
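
The six flows are the cross product of the two input types and the three output types. A minimal sketch of enumerating them is shown below; the `Flow` type and the `buildFlows` helper are hypothetical names used for illustration and are not taken from the PR's test code.

```typescript
// Hypothetical illustration: enumerate the 2 x 3 = 6 test flows.
type InputType = "local-file" | "remote-url";
type OutputType = "markdown" | "generic-json" | "predefined-json";

interface Flow {
  input: InputType;
  output: OutputType;
}

const INPUT_TYPES: InputType[] = ["local-file", "remote-url"];
const OUTPUT_TYPES: OutputType[] = ["markdown", "generic-json", "predefined-json"];

// Cross product, grouped by output type so the order mirrors the test log
// (local file first, then remote URL, for each output type).
function buildFlows(): Flow[] {
  const flows: Flow[] = [];
  for (const output of OUTPUT_TYPES) {
    for (const input of INPUT_TYPES) {
      flows.push({ input, output });
    }
  }
  return flows;
}

buildFlows().forEach((flow, i) => {
  console.log(`Test ${i + 1}: ${flow.input} -> ${flow.output}`);
});
```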

Enhancements in OCR Functionality:

The ocr function now accepts three optional parameters for JSON output:

  • outputFormat
    • Options: ["json", "markdown"]
    • Default: "markdown"
  • jsonStructure
    • Options: null (generic JSON) or a predefined format object.
    • Default: null, which yields generic JSON output when outputFormat = "json".
  • reattempts
    • Number of times to retry when the model's output does not match the predefined JSON structure.
    • Default: 3
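
A sketch of how these three parameters and their defaults might be typed and resolved. The parameter names and default values come from the list above; the `JsonOcrOptions` type, the `JsonStructure` alias, and the `resolveJsonOptions` helper are hypothetical and may not match the PR's actual code.

```typescript
type OutputFormat = "markdown" | "json";

// Hypothetical shape for a predefined format object.
type JsonStructure = { [key: string]: unknown };

interface JsonOcrOptions {
  outputFormat?: OutputFormat;
  jsonStructure?: JsonStructure | null;
  reattempts?: number;
}

interface ResolvedJsonOcrOptions {
  outputFormat: OutputFormat;
  jsonStructure: JsonStructure | null;
  reattempts: number;
}

// Fill in the documented defaults: "markdown", null (generic JSON), and 3.
function resolveJsonOptions(options: JsonOcrOptions = {}): ResolvedJsonOcrOptions {
  const resolved: ResolvedJsonOcrOptions = {
    outputFormat: options.outputFormat ?? "markdown",
    jsonStructure: options.jsonStructure ?? null,
    reattempts: options.reattempts ?? 3,
  };
  // Assumption: a retry count must be a non-negative integer.
  if (!Number.isInteger(resolved.reattempts) || resolved.reattempts < 0) {
    throw new RangeError("reattempts must be a non-negative integer");
  }
  return resolved;
}

console.log(resolveJsonOptions({ outputFormat: "json" }));
```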

Test results:

llama-ocr % npm test  

> [email protected] test
> tsx ./test/index.ts


=== Testing Markdown Output ===

Test 1: Local Receipt - Markdown
**Trader Joe's Receipt**

**Store Information**

*   Store Name: Trader Joe's
*   Store Address: 785 Oak Grove Road, Concord, CA 94518
*   Store Phone Number: (925) 521-1134

**Receipt Details**

*   **Transaction Date and Time:** Not Provided
*   **Transaction Number:** Not Provided

**Items Purchased**

| Item Description | Quantity | Unit Price | Total Price |
| --- | --- | --- | --- |
| Sour Cream & Onion Corn | 2 | $2.49 | $4.98 |
| Sliced Whole Wheat Bread | 1 | $2.49 | $2.49 |
| Rice Cakes Korean Tteok | 1 | $3.99 | $3.99 |
| Squash Zucchini 1.5 lb | 1 | $2.49 | $2.49 |
| Greens Kale 10 oz | 1 | $1.99 | $1.99 |
| Squash Spaghetti Each | 2 | $2.49 | $4.98 |
| 50% Less Salt Roasted SA | 1 | $2.99 | $2.99 |
| Banana Each | 6 | $0.19 | $1.14 |
| Pasta Gnocchi Pranzo | 1 | $1.99 | $1.99 |
| ORG Coconut Milk | 1 | $1.69 | $1.69 |
| ORG Yellow Mustard | 1 | $1.79 | $1.79 |
| HOL Traditional Active D | 1 | $1.29 | $1.29 |

**Subtotal:** $26.83

**Balance to Pay:** $25.00

**Gift Card Tendered:** $1.83

**Payment Method:** Visa Debit

**Footer**

*   **Payment Card Purchase Transaction**
*   **Customer Copy**
✅ Test 1 passed

Test 2: Remote Receipt - Markdown
The provided image is a receipt from a bar or restaurant, titled "Berghotel Grosse Scheidegg" with an address of "3818 Grindelwald Familie R. Müller". The receipt includes the following information:

* Date and Time: 30.07.2007/13:29:17
* Bar Number: 4572
* Table Number: 7/01

The receipt lists the items ordered, along with their prices in Swiss Francs (CHF):

| Item | Price (CHF) |
| --- | --- |
| 2xLatte Macchiato | 9.00 |
| 1xGloki | 5.00 |
| 1xSchweinschnitzel | 22.00 |
| 1xChässpätzli | 18.50 |

The total cost of the order is 54.50 CHF, with a Value Added Tax (VAT) of 3.85 CHF, making the total amount due 54.50 CHF. The receipt also includes the following notes:

* Incl. 7.6% MwSt
* Entsricht in Euro 36.33 EUR
* Es bediente Sie: Ursula

The receipt concludes with the following contact information:

* MwSt Nr.: 430 234
* Tel.: 033 853 67 16
* Fax.: 033 853 67 19
* E-mail: [email protected]
✅ Test 2 passed

=== Testing JSON Output (No Structure) ===

Test 3: Local Receipt - JSON (No Structure)
{
  "store": {
    "name": "Trader Joe's",
    "address": "785 Oak Grove Road, Concord, CA 94518"
  },
  "items": [
    {
      "name": "Sour Cream & Onion Corn",
      "price": "$2.49",
      "quantity": 1
    },
    {
      "name": "Sliced Whole Wheat Bread",
      "price": "$2.49",
      "quantity": 1
    },
    {
      "name": "Rice Cakes Korean Tteok",
      "price": "$3.99",
      "quantity": 1
    },
    {
      "name": "Squash Zucchini 1.5 lb",
      "price": "$2.49",
      "quantity": 1
    },
    {
      "name": "Greens Kale 10 oz",
      "price": "$1.99",
      "quantity": 1
    },
    {
      "name": "Squash Spaghetti Each",
      "price": "$2.49",
      "quantity": 1
    },
    {
      "name": "50% Less Salt Roasted Sa",
      "price": "$2.99",
      "quantity": 1
    },
    {
      "name": "Banana Each",
      "price": "$1.14",
      "quantity": 6
    },
    {
      "name": "Pasta Gnocchi Pranzo",
      "price": "$1.99",
      "quantity": 1
    },
    {
      "name": "ORG Coconut Milk",
      "price": "$1.69",
      "quantity": 1
    },
    {
      "name": "ORG Yellow Mustard",
      "price": "$1.79",
      "quantity": 1
    },
    {
      "name": "HOL Traditional Active D",
      "price": "$1.29",
      "quantity": 1
    }
  ],
  "totals": {
    "subtotal": "$26.83",
    "tax": "$0.00",
    "total": "$26.83"
  },
  "payment": {
    "method": "Visa Debit",
    "amount": "$1.83"
  }
}
✅ Test 3 passed

Test 4: Remote Receipt - JSON (No Structure)
{
  "receipt": {
    "header": {
      "company_name": "Berghotel Grosse Scheidegg",
      "address": "3818 Grindelwald Familie R. Müller"
    },
    "date": "30.07.2007/13:29:17",
    "bar": "Tisch 7/01",
    "items": [
      {
        "description": "2xLatte Macchiato",
        "quantity": "à 4.50 CHF",
        "total": "9.00"
      },
      {
        "description": "1xGloki",
        "quantity": "à 5.00 CHF",
        "total": "5.00"
      },
      {
        "description": "1xSchweinschnitzel",
        "quantity": "à 22.00 CHF",
        "total": "22.00"
      },
      {
        "description": "1xChässpätzli",
        "quantity": "à 18.50 CHF",
        "total": "18.50"
      }
    ],
    "subtotals": {
      "total": "CHF 54.50",
      "incl_7_6_MwSt": "CHF 54.50",
      "entspricht_in_Euro": "36.33 EUR"
    },
    "footer": {
      "mwst_nr": "430 234",
      "tel": "033 853 67 16",
      "fax": "033 853 67 19",
      "email": "[email protected]"
    }
  }
}
✅ Test 4 passed

=== Testing JSON Output (With Structure) ===

Test 5: Local Receipt - JSON (With Structure)

Convert the provided image into JSON format matching exactly the following structure:

{
  "store": {
    "name": "string",
    "address": "string",
    "phone": "string"
  },
  "transaction": {
    "type": "string",
    "items": [
      {
        "name": "string",
        "price": "number"
      }
    ],
    "total": "number",
    "payment": {
      "method": "string",
      "amount": "number"
    }
  }
}

Requirements:
- Must match the provided structure exactly
- All fields must be present
- No additional fields allowed
- No Delimiters: Do not use code fences or delimiters like ``` . DO NOT INCLUDE ANY OTHER COMMENT OR EXPLANATIONS JUST OUTPUT THE JSON
- COMPULSORY REQUIREMENT: YOUR RESPONSE SHOULD ONLY BE THE JSON OBJECT REQUESTED. THE RESPONSE SHOULD BE DIRECTLY PARSEABLE INTO JSON USING JSON.PARSE()

this is the result from the model: 
  {
  "store": {
    "name": "Trader Joe's",
    "address": "785 Oak Grove Road, Concord, CA 94518",
    "phone": "925 521-1134"
  },
  "transaction": {
    "type": "sale",
    "items": [
      {
        "name": "Sour Cream & Onion Corn",
        "price": 2.49
      },
      {
        "name": "Sliced Whole Wheat Bread",
        "price": 2.49
      },
      {
        "name": "Rice Cakes Korean Tteok",
        "price": 3.99
      },
      {
        "name": "Squash Zucchini 1.5 lb",
        "price": 2.49
      },
      {
        "name": "Greens Kale 10 oz",
        "price": 1.99
      },
      {
        "name": "Squash Spaghetti each",
        "price": 2.49
      },
      {
        "name": "50% Less Salt Roasted SA Banana each",
        "price": 2.99
      },
      {
        "name": "Pasta Gnocchi Pranzo",
        "price": 1.99
      },
      {
        "name": "ORG Coconut Milk",
        "price": 1.69
      },
      {
        "name": "ORG Yellow Mustard",
        "price": 1.79
      },
      {
        "name": "HOL Traditional Active D",
        "price": 1.29
      }
    ],
    "total": 26.83,
    "payment": {
      "method": "Gift Card Tendered",
      "amount": 25
    }
  }
}
✅ Test 5 passed

Test 6: Remote Receipt - JSON (With Structure)

Convert the provided image into JSON format matching exactly the following structure:

{
  "store": {
    "name": "string",
    "address": "string",
    "phone": "string"
  },
  "transaction": {
    "type": "string",
    "items": [
      {
        "name": "string",
        "price": "number"
      }
    ],
    "total": "number",
    "payment": {
      "method": "string",
      "amount": "number"
    }
  }
}

Requirements:
- Must match the provided structure exactly
- All fields must be present
- No additional fields allowed
- No Delimiters: Do not use code fences or delimiters like ``` . DO NOT INCLUDE ANY OTHER COMMENT OR EXPLANATIONS JUST OUTPUT THE JSON
- COMPULSORY REQUIREMENT: YOUR RESPONSE SHOULD ONLY BE THE JSON OBJECT REQUESTED. THE RESPONSE SHOULD BE DIRECTLY PARSEABLE INTO JSON USING JSON.PARSE()

this is the result from the model: 
  {
  "store": {
    "name": "Berghotel Grosse Scheidegg",
    "address": "3818 Grindelwald",
    "phone": "033 853 67 16"
  },
  "transaction": {
    "type": "bar",
    "items": [
      {
        "name": "2xLatte Macchiato",
        "price": 9
      },
      {
        "name": "1xGloki",
        "price": 5
      },
      {
        "name": "1xSchweinschnitzel",
        "price": 22
      },
      {
        "name": "1xChässpätzli",
        "price": 18.5
      }
    ],
    "total": 54.5,
    "payment": {
      "method": "cash",
      "amount": 36.33
    }
  }
}
✅ Test 6 passed

=== Test Summary ===
6/6 tests passed ✅
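
Tests 5 and 6 ask the model for a reply that parses with JSON.parse() and matches the template exactly, with every field present and no extra fields. One way to check that, and to retry when the check fails, is sketched below. The template convention (leaves are the strings "string" or "number", arrays hold a single element template) is inferred from the prompt in the log. `matchesStructure`, `parseWithRetries`, and the `generate` callback are hypothetical names, and the PR's own validation and retry logic may differ.

```typescript
type Template = "string" | "number" | Template[] | { [key: string]: Template };

// Returns true only if `value` has exactly the shape the template describes.
function matchesStructure(value: unknown, template: Template): boolean {
  if (template === "string") return typeof value === "string";
  if (template === "number") return typeof value === "number" && Number.isFinite(value);
  if (Array.isArray(template)) {
    if (!Array.isArray(value)) return false;
    const [itemTemplate] = template;
    // An empty template array places no constraint on the items.
    return itemTemplate === undefined || value.every((item) => matchesStructure(item, itemTemplate));
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) return false;
  const record = value as Record<string, unknown>;
  const expectedKeys = Object.keys(template);
  // Same key count plus every expected key present means no missing or extra fields.
  if (Object.keys(record).length !== expectedKeys.length) return false;
  return expectedKeys.every(
    (key) =>
      Object.prototype.hasOwnProperty.call(record, key) &&
      matchesStructure(record[key], template[key]),
  );
}

// Assumption: one initial attempt plus up to `reattempts` retries.
async function parseWithRetries(
  generate: () => Promise<string>,
  template: Template,
  reattempts = 3,
): Promise<unknown> {
  for (let attempt = 0; attempt <= reattempts; attempt++) {
    const raw = await generate();
    try {
      const parsed: unknown = JSON.parse(raw);
      if (matchesStructure(parsed, template)) return parsed;
    } catch {
      // Reply was not valid JSON; fall through and try again.
    }
  }
  throw new Error(`No reply matched the structure after ${reattempts + 1} attempts`);
}
```

Here `generate` stands in for whatever call sends the prompt and image to the model and returns its raw text reply.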
