Commit

Update results
capjamesg committed Dec 15, 2024
1 parent 88f72cc commit a1c2dc2
Showing 2 changed files with 115 additions and 9 deletions.
18 changes: 9 additions & 9 deletions index.html
@@ -40,7 +40,7 @@ <h1>How's GPT-4o Doing?</h1>
<p>You can contribute your own tests, too! See the <a href="https://github.com/roboflow/gpt-checkup?tab=readme-ov-file#-contribute">GitHub README</a> for contributing instructions.</p>
</div>
<div class="header_subtitle">
-<p>Tests are run every day at 1am PT. Last updated December 14, 2024.</p>
+<p>Tests are run every day at 1am PT. Last updated December 15, 2024.</p>
<p>Made with ❤️ by the team at <a href="https://roboflow.com">Roboflow</a>.</p>
</div>
<div class="header_cta">
@@ -122,7 +122,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/fruit.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>8</pre>
+<pre>9</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -230,7 +230,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/fruit.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>{'x': 0.35, 'y': 0.35, 'width': 0.18, 'height': 0.25}</pre>
+<pre>{'x': 0.5, 'y': 0.4, 'width': 0.2, 'height': 0.35}</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -347,9 +347,9 @@ <h3><span class="explainer_icon far fa-image"></span>Image</h3>
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>```json
{
-"R": 80,
+"R": 78,
"G": 0,
-"B": 128
+"B": 130
}
```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
@@ -405,9 +405,9 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/annotationqa.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>To evaluate the completeness of the annotations, I analyzed the red bounding boxes and checked for visible cars that are not enclosed within a bounding box. The white car on the right side of the image has no bounding box and is therefore a missing annotation.
+<pre>Based on observation, the cars in the foreground and background are mostly annotated with red bounding boxes. However, there appears to be at least one car (on the left lane in the far background) that is not annotated.

-Here is the response:
+Here is the JSON response:

```json
{
@@ -453,7 +453,7 @@ <h2>Measurement Test</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.009</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.01</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -467,7 +467,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/measurement.jpg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>Based on the ruler in the image, the square sticker appears to be approximately 3 inches in both length and width.
+<pre>Based on the ruler in the image, the square sticker measures approximately 3 inches for both the length and width. Here's the JSON:

```json
{
106 changes: 106 additions & 0 deletions results/2024-12-15.json
@@ -0,0 +1,106 @@
{
"zero_shot_classification": {
"score": 1,
"success": true,
"price": 0.00481,
"pass_fail": "Pass",
"response_time": 1.8396103382110596,
"result": "Toyota Camry"
},
"count_fruit": {
"score": 0,
"success": false,
"price": 0.007870000000000002,
"pass_fail": "Fail",
"response_time": 1.9412617683410645,
"result": "9"
},
"document_ocr": {
"score": 0,
"success": false,
"price": 0.0086,
"pass_fail": "Fail",
"response_time": 2.5448005199432373,
"result": "I was thinking earlier today that I have gone through, to use the lingo, eras of listening to each of Swift's Eras. Meta indeed. I started listening to Ms. Swift's music after hearing the *Midnights* album. A few weeks after hearing the album for the first time, I found myself playing various songs on repeat. I listened to the album in order multiple times."
},
"handwriting_ocr": {
"score": 1,
"success": true,
"price": 0.00876,
"pass_fail": "Pass",
"response_time": 7.667215347290039,
"result": "The words of songs on the album have been echoing in my head all week. \"Fades into the grey of my day old tea.\""
},
"extraction_ocr": {
"score": 1.0,
"success": true,
"price": 0.00719,
"pass_fail": "Pass",
"response_time": 2.70133113861084,
"result": "[{'name': 'Mary Thomas', 'time_per_day': 1, 'medication': 'Atenolol', 'dosage': 100, 'rx_number': '1234567-12345'}]"
},
"math_ocr": {
"score": 1.0,
"success": true,
"price": 0.015290000000000002,
"pass_fail": "Pass",
"response_time": 3.4348642826080322,
"result": "3x^2-6x+2"
},
"object_detection": {
"score": 0.4872881355932201,
"success": false,
"price": 0.009490000000000002,
"pass_fail": "Fail",
"response_time": 2.5180845260620117,
"result": "{'x': 0.5, 'y': 0.4, 'width': 0.2, 'height': 0.35}"
},
"graph_understanding": {
"score": 0.99,
"success": false,
"price": 0.01031,
"pass_fail": "Fail",
"response_time": 1.978865146636963,
"result": "```json\n{\n \"A\": { \"quantity\": 20, \"price\": 10 },\n \"B\": { \"quantity\": 25, \"price\": 20 },\n \"C\": { \"quantity\": 30, \"price\": 30 },\n \"D\": { \"quantity\": 35, \"price\": 40 }\n}\n```"
},
"color_recognition": {
"score": 0.9620915032679739,
"success": false,
"price": 0.008870000000000001,
"pass_fail": "Fail",
"response_time": 3.2238097190856934,
"result": "```json\n{\n \"R\": 78,\n \"G\": 0,\n \"B\": 130\n}\n```"
},
"annotation_qa": {
"score": 0.33333333333333337,
"success": false,
"price": 0.016800000000000002,
"pass_fail": "Fail",
"response_time": 2.8600192070007324,
"result": "Based on observation, the cars in the foreground and background are mostly annotated with red bounding boxes. However, there appears to be at least one car (on the left lane in the far background) that is not annotated.\n\nHere is the JSON response:\n\n```json\n{\n \"missing\": 1\n}\n```"
},
"measurement": {
"score": 0.8571428571428572,
"success": false,
"price": 0.00958,
"pass_fail": "Fail",
"response_time": 3.3993048667907715,
"result": "Based on the ruler in the image, the square sticker measures approximately 3 inches for both the length and width. Here's the JSON:\n\n```json\n{\n \"length\": 3.0,\n \"width\": 3.0\n}\n```"
},
"easy_captcha": {
"score": 1,
"success": true,
"price": 0.004790000000000001,
"pass_fail": "Pass",
"response_time": 1.674936294555664,
"result": "charybdis indubitable"
},
"easy_captcha_persuade": {
"score": 1,
"success": true,
"price": 0.00529,
"pass_fail": "Pass",
"response_time": 1.3542070388793945,
"result": "charybdis indubitable"
}
}
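
Each entry in the results file above carries the same fields (`score`, `success`, `price`, `pass_fail`, `response_time`, `result`), so a day's run can be summarized with a few lines of Python. The sketch below is illustrative only and is not taken from the repository: the helper names `summarize` and `failing_tests` are hypothetical, and the sample embeds three of the thirteen entries with float noise trimmed from the prices.

```python
import json

# Three entries copied from results/2024-12-15.json (fields trimmed for brevity).
SAMPLE = """
{
  "zero_shot_classification": {"score": 1, "success": true, "price": 0.00481, "pass_fail": "Pass"},
  "count_fruit": {"score": 0, "success": false, "price": 0.00787, "pass_fail": "Fail"},
  "graph_understanding": {"score": 0.99, "success": false, "price": 0.01031, "pass_fail": "Fail"}
}
"""


def summarize(results):
    """Return (passed, total, total_price) for one day's results mapping."""
    passed = sum(1 for entry in results.values() if entry["success"])
    total_price = sum(entry["price"] for entry in results.values())
    # Round to 5 decimals to hide float noise such as 0.007870000000000002.
    return passed, len(results), round(total_price, 5)


def failing_tests(results):
    """Return the names of tests whose `success` flag is false, sorted."""
    return sorted(name for name, entry in results.items() if not entry["success"])


if __name__ == "__main__":
    results = json.loads(SAMPLE)
    passed, total, cost = summarize(results)
    print(f"{passed}/{total} tests passed, total cost ${cost}")
    print("Failing:", ", ".join(failing_tests(results)))
```

The sketch reads the `success` flag rather than thresholding `score`, because the two do not always agree in this file: `graph_understanding` scores 0.99 yet is recorded as a fail. To run it against the real file, replace `json.loads(SAMPLE)` with `json.load(open("results/2024-12-15.json"))`.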
