Update results
capjamesg committed Jan 12, 2025
1 parent c274987 commit c79714f
Showing 2 changed files with 176 additions and 72 deletions.
142 changes: 70 additions & 72 deletions index.html
@@ -40,7 +40,7 @@ <h1>How's GPT-4o Doing?</h1>
<p>You can contribute your own tests, too! See the <a href="https://github.com/roboflow/gpt-checkup?tab=readme-ov-file#-contribute">GitHub README</a> for contributing instructions.</p>
</div>
<div class="header_subtitle">
<p>Tests are run every day at 1am PT. Last updated January 11, 2025.</p>
<p>Tests are run every day at 1am PT. Last updated January 12, 2025.</p>
<p>Made with ❤️ by the team at <a href="https://roboflow.com">Roboflow</a>.</p>
</div>
<div class="header_cta">
@@ -216,7 +216,7 @@ <h2>Object Detection</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.01</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.009</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -230,7 +230,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/fruit.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>{'x': 0.45, 'y': 0.4, 'width': 0.25, 'height': 0.4}</pre>
<pre>{'x': 0.4, 'y': 0.35, 'width': 0.25, 'height': 0.4}</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -361,7 +361,7 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
{
"R": 80,
"G": 0,
"B": 128
"B": 130
}
```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
@@ -417,81 +417,18 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/annotationqa.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>Based on the visible cars in the image, there appear to be missing annotations. The white car on the far right and one or more smaller distant cars behind the existing bounding boxes do not appear to be annotated.

Here's the JSON response:

```json
{
"missing": 2
}
```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
</div>

<div class="feature_card">
<div class="feature_header">
<div class="feature_header_text">
<h2>Measurement Test</h2>
<p>Can GPT-4 Measure Items Using a Reference?</p>
</div>
<div class="chart">
<div class="chart_box chart_box_red">
<p>Fail</p>
</div>
</div>
</div>
<div class="result_summary">
<div class="summary_row">
<b class="summary_title">Last 7-Day Performance</b>
<div class="summary_squares">

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.011</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
<div class="explainer">
<h3><span class="explainer_icon far fa-microscope"></span>Method</h3>
<pre class="test_method">This test gives GPT-4 an image of a square sticker with a ruler on one side and asks it to provide a number for the length and width. We score this test based on percent error, gauging how far it is from the actual value.</pre>
<h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<pre class="prompt">
What is the length and width of this square sticker, based on the ruler? Return a JSON with float properties for `length` and `width` representing inches.
</pre>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/measurement.jpg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>To measure the square sticker relative to the ruler in the image, the sticker appears approximately 3 inches in both length and width. Here's the JSON output:
<pre>Based on the provided image, the cars visible in the scene mostly have bounding boxes already drawn around them, but one car on the right edge in the foreground (the white car) appears to lack an annotation.

```json
{
"length": 3.0,
"width": 3.0
"missing": 1
}
```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
</div>

</section>
</section>
<section class="tests_passing">
@@ -643,7 +580,7 @@ <h2>Structured Data OCR</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>100%</b> of the time.</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.009</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.007</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -716,7 +653,68 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
</div>
</div>
</div>


<div class="feature_card">
<div class="feature_header">
<div class="feature_header_text">
<h2>Measurement Test</h2>
<p>Can GPT-4 Measure Items Using a Reference?</p>
</div>
<div class="chart">
<div class="chart_box chart_box_green">
<p>Pass</p>
</div>
</div>
</div>
<div class="result_summary">
<div class="summary_row">
<b class="summary_title">Last 7-Day Performance</b>
<div class="summary_squares">

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.011</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
<div class="explainer">
<h3><span class="explainer_icon far fa-microscope"></span>Method</h3>
<pre class="test_method">This test gives GPT-4 an image of a square sticker with a ruler on one side and asks it to provide a number for the length and width. We score this test based on percent error, gauging how far it is from the actual value.</pre>
<h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<pre class="prompt">
What is the length and width of this square sticker, based on the ruler? Return a JSON with float properties for `length` and `width` representing inches.
</pre>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/measurement.jpg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>Based on the ruler in the image, the square sticker appears to have a **length** and **width** of approximately **3.5 inches** each. Here’s the JSON representation:

```json
{
"length": 3.5,
"width": 3.5
}
```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
</div>

<div class="feature_card">
<div class="feature_header">
<div class="feature_header_text">
106 changes: 106 additions & 0 deletions results/2025-01-12.json
@@ -0,0 +1,106 @@
{
"zero_shot_classification": {
"score": 1,
"success": true,
"price": 0.006400000000000001,
"pass_fail": "Pass",
"response_time": 1.8045542240142822,
"result": "Toyota Camry"
},
"count_fruit": {
"score": 0,
"success": false,
"price": 0.00882,
"pass_fail": "Fail",
"response_time": 2.806345224380493,
"result": "8"
},
"document_ocr": {
"score": 0,
"success": false,
"price": 0.00988,
"pass_fail": "Fail",
"response_time": 2.5980618000030518,
"result": "I was thinking earlier today that I have gone through, to use the lingo, eras of listening to each of Swift's Eras. Meta indeed. I started listening to Ms. Swift's music after hearing the *Midnights* album. A few weeks after hearing the album for the first time, I found myself playing various songs on repeat. I listened to the album in order multiple times."
},
"handwriting_ocr": {
"score": 1,
"success": true,
"price": 0.00974,
"pass_fail": "Pass",
"response_time": 6.301056623458862,
"result": "The words of songs on the album have been echoing in my head all week. \"Fades into the grey of my day old tea.\""
},
"extraction_ocr": {
"score": 1.0,
"success": true,
"price": 0.00719,
"pass_fail": "Pass",
"response_time": 2.8219246864318848,
"result": "[{'name': 'Mary Thomas', 'time_per_day': 1, 'medication': 'Atenolol', 'dosage': 100, 'rx_number': '1234567-12345'}]"
},
"math_ocr": {
"score": 1.0,
"success": true,
"price": 0.015070000000000002,
"pass_fail": "Pass",
"response_time": 3.1533970832824707,
"result": "3x^2-6x+2"
},
"object_detection": {
"score": 0.4833538840937115,
"success": false,
"price": 0.009490000000000002,
"pass_fail": "Fail",
"response_time": 3.758528232574463,
"result": "{'x': 0.4, 'y': 0.35, 'width': 0.25, 'height': 0.4}"
},
"graph_understanding": {
"score": 0.99,
"success": false,
"price": 0.01079,
"pass_fail": "Fail",
"response_time": 2.198917865753174,
"result": "```json\n{\n \"A\": {\n \"quantity\": 20,\n \"price\": 10\n },\n \"B\": {\n \"quantity\": 25,\n \"price\": 20\n },\n \"C\": {\n \"quantity\": 30,\n \"price\": 30\n },\n \"D\": {\n \"quantity\": 35,\n \"price\": 40\n }\n}\n```"
},
"color_recognition": {
"score": 0.9594771241830066,
"success": false,
"price": 0.009850000000000001,
"pass_fail": "Fail",
"response_time": 5.018013954162598,
"result": "```json\n{\n \"R\": 80,\n \"G\": 0,\n \"B\": 130\n}\n```"
},
"annotation_qa": {
"score": 0.33333333333333337,
"success": false,
"price": 0.01634,
"pass_fail": "Fail",
"response_time": 2.9421772956848145,
"result": "Based on the provided image, the cars visible in the scene mostly have bounding boxes already drawn around them, but one car on the right edge in the foreground (the white car) appears to lack an annotation.\n\n```json\n{\n \"missing\": 1\n}\n```"
},
"measurement": {
"score": 1.0,
"success": true,
"price": 0.01086,
"pass_fail": "Pass",
"response_time": 3.84500789642334,
"result": "Based on the ruler in the image, the square sticker appears to have a **length** and **width** of approximately **3.5 inches** each. Here\u2019s the JSON representation:\n\n```json\n{\n \"length\": 3.5,\n \"width\": 3.5\n}\n```"
},
"easy_captcha": {
"score": 1,
"success": true,
"price": 0.00636,
"pass_fail": "Pass",
"response_time": 2.8851521015167236,
"result": "charybdis indubitable"
},
"easy_captcha_persuade": {
"score": 1,
"success": true,
"price": 0.006860000000000001,
"pass_fail": "Pass",
"response_time": 1.6607348918914795,
"result": "charybdis indubitable"
}
}
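
Reviewer note: each entry in the results file carries `success` and `price`, so a daily file can be summarized in a few lines. The following is a minimal sketch, not the repository's own tooling. The `summarize` helper and the trimmed `SAMPLE` are assumptions for illustration; only three of the thirteen entries are reproduced, with prices rounded as shown on the page.

```python
import json

# Trimmed sample in the shape of results/2025-01-12.json.
# Only three entries are copied here; the real file has thirteen.
SAMPLE = """
{
  "zero_shot_classification": {"score": 1, "success": true, "price": 0.0064, "pass_fail": "Pass"},
  "count_fruit": {"score": 0, "success": false, "price": 0.00882, "pass_fail": "Fail"},
  "measurement": {"score": 1.0, "success": true, "price": 0.01086, "pass_fail": "Pass"}
}
"""


def summarize(results):
    """Hypothetical helper: pass/fail lists, pass rate, and total request cost."""
    passed = sorted(name for name, entry in results.items() if entry["success"])
    failed = sorted(name for name, entry in results.items() if not entry["success"])
    total_price = sum(entry["price"] for entry in results.values())
    return {
        "passed": passed,
        "failed": failed,
        # Guard against an empty results file.
        "pass_rate": len(passed) / len(results) if results else 0.0,
        "total_price": round(total_price, 5),
    }


summary = summarize(json.loads(SAMPLE))
print(summary)
```

To run this against the full file, replace `json.loads(SAMPLE)` with the parsed contents of `results/2025-01-12.json`.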

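Reviewer note: several `result` strings wrap their JSON in a fenced block after some prose, and the measurement test is described as scored by percent error. The sketch below shows one way to pull the JSON out and compute a percent error. It is an illustration under stated assumptions: `extract_json` and `percent_error` are hypothetical helpers, and `HYPOTHETICAL_ACTUAL` is a made-up reference value, since the page does not state the test's real ground truth or pass threshold.

```python
import json
import re

# Build the fence marker programmatically so this example contains no
# literal triple-backtick sequence.
TICKS = "`" * 3
FENCE = re.compile(TICKS + r"(?:json)?\s*(\{.*?\})\s*" + TICKS, re.DOTALL)


def extract_json(result):
    """Hypothetical helper: parse the first fenced JSON object in a response.

    Falls back to parsing the whole string when no fence is present.
    """
    match = FENCE.search(result)
    payload = match.group(1) if match else result
    return json.loads(payload)


def percent_error(predicted, actual):
    """Absolute percent error of `predicted` relative to a non-zero `actual`."""
    return abs(predicted - actual) / abs(actual) * 100.0


# The measurement result from this commit, as stored in the results file.
measurement_result = (
    "Based on the ruler in the image, the square sticker appears to have a "
    "**length** and **width** of approximately **3.5 inches** each. "
    "Here\u2019s the JSON representation:\n\n"
    + TICKS + "json\n{\n \"length\": 3.5,\n \"width\": 3.5\n}\n" + TICKS
)

parsed = extract_json(measurement_result)

# Assumption for illustration only; NOT the test's actual reference value.
HYPOTHETICAL_ACTUAL = 4.0
error = percent_error(parsed["length"], HYPOTHETICAL_ACTUAL)
print(parsed, error)
```

Note that the `object_detection` result in this file is a single-quoted Python-style dict rather than JSON, so `json.loads` would reject it; that case would need different handling.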