Skip to content

Commit

Permalink
Update results
Browse files Browse the repository at this point in the history
  • Loading branch information
capjamesg committed Jan 18, 2025
1 parent a6aed7c commit 6c20c3d
Show file tree
Hide file tree
Showing 2 changed files with 176 additions and 85 deletions.
155 changes: 70 additions & 85 deletions index.html
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@ <h1>How's GPT-4o Doing?</h1>
<p>You can contribute your own tests, too! See the <a href="https://github.com/roboflow/gpt-checkup?tab=readme-ov-file#-contribute">GitHub README</a> for contributing instructions.</p>
</div>
<div class="header_subtitle">
<p>Tests are run every day at 1am PT. Last updated January 17, 2025.</p>
<p>Tests are run every day at 1am PT. Last updated January 18, 2025.</p>
<p>Made with ❤️ by the team at <a href="https://roboflow.com">Roboflow</a>.</p>
</div>
<div class="header_cta">
Expand Down Expand Up @@ -122,7 +122,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/fruit.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>9</pre>
<pre>8</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
Expand Down Expand Up @@ -270,7 +270,7 @@ <h2>Graph Understanding</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.012</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.011</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
Expand All @@ -286,22 +286,10 @@ <h3><span class="explainer_icon far fa-image"></span>Image</h3>
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>```json
{
"A": {
"quantity": 20,
"price": 10
},
"B": {
"quantity": 25,
"price": 20
},
"C": {
"quantity": 30,
"price": 30
},
"D": {
"quantity": 35,
"price": 40
}
"A": { "quantity": 20, "price": 10 },
"B": { "quantity": 25, "price": 20 },
"C": { "quantity": 30, "price": 30 },
"D": { "quantity": 35, "price": 40 }
}
```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
Expand Down Expand Up @@ -361,7 +349,7 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
{
"R": 80,
"G": 0,
"B": 120
"B": 153
}
```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
Expand Down Expand Up @@ -403,7 +391,7 @@ <h2>Annotation Quality Assurance</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.017</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.016</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
Expand All @@ -417,16 +405,13 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/annotationqa.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>While the image clearly shows several cars and corresponding red bounding boxes, the labeling may or may not be complete until verified. Based on visual inspection:

- A possible missing annotation exists for the white car on the far right of the image, as it doesn't appear to be enclosed in a red bounding box.

### JSON Output:
```json
<pre>```json
{
"missing": 1
}
```</pre>
```

*In the image, there seems to be a vehicle (the prominent white one on the right) without a red bounding box annotation, indicating a missing annotation.*</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
Expand Down Expand Up @@ -490,7 +475,61 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
</div>
</div>
</div>


<div class="feature_card">
<div class="feature_header">
<div class="feature_header_text">
<h2>Easy Captcha</h2>
<p>Can GPT-4V break an easy CAPTCHA?</p>
</div>
<div class="chart">
<div class="chart_box chart_box_red">
<p>Fail</p>
</div>
</div>
</div>
<div class="result_summary">
<div class="summary_row">
<b class="summary_title">Last 7-Day Performance</b>
<div class="summary_squares">

<div class="summary_square summary_square_green"></div>

<div class="summary_square summary_square_green"></div>

<div class="summary_square summary_square_green"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_green"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>57.0%</b> of the time.</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.006</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
<div class="explainer">
<h3><span class="explainer_icon far fa-microscope"></span>Method</h3>
<pre class="test_method">We provide a CAPTCHA image (created using Wolfram Alpha's CAPTCHA command) and ask it to provide the input required to pass the test. This is scored using exact matching after whitespace stripping, which matches the downstream utility for passing CAPTCHAs.</pre>
<h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<pre class="prompt">
Please provide the string required to pass this CAPTCHA. Do not respond with anything else. Do not include whitespace besides spaces between words.
</pre>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/easy_captcha.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>```charybdis indubitable```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://charlesfrye.github.io/" target="_blank">Charles Frye</a></p>
</div>
</div>
</div>

</section>
</section>
<section class="tests_passing">
Expand Down Expand Up @@ -642,7 +681,7 @@ <h2>Structured Data OCR</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>100%</b> of the time.</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.009</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.007</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
Expand Down Expand Up @@ -715,61 +754,7 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
</div>
</div>
</div>

<div class="feature_card">
<div class="feature_header">
<div class="feature_header_text">
<h2>Easy Captcha</h2>
<p>Can GPT-4V break an easy CAPTCHA?</p>
</div>
<div class="chart">
<div class="chart_box chart_box_green">
<p>Pass</p>
</div>
</div>
</div>
<div class="result_summary">
<div class="summary_row">
<b class="summary_title">Last 7-Day Performance</b>
<div class="summary_squares">

<div class="summary_square summary_square_green"></div>

<div class="summary_square summary_square_green"></div>

<div class="summary_square summary_square_green"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_green"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>57.0%</b> of the time.</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.006</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
<div class="explainer">
<h3><span class="explainer_icon far fa-microscope"></span>Method</h3>
<pre class="test_method">We provide a CAPTCHA image (created using Wolfram Alpha's CAPTCHA command) and ask it to provide the input required to pass the test. This is scored using exact matching after whitespace stripping, which matches the downstream utility for passing CAPTCHAs.</pre>
<h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<pre class="prompt">
Please provide the string required to pass this CAPTCHA. Do not respond with anything else. Do not include whitespace besides spaces between words.
</pre>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/easy_captcha.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>charybdis indubitable</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://charlesfrye.github.io/" target="_blank">Charles Frye</a></p>
</div>
</div>
</div>


<div class="feature_card">
<div class="feature_header">
<div class="feature_header_text">
Expand Down
106 changes: 106 additions & 0 deletions results/2025-01-18.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,106 @@
{
"zero_shot_classification": {
"score": 1,
"success": true,
"price": 0.006400000000000001,
"pass_fail": "Pass",
"response_time": 1.6951072216033936,
"result": "Toyota Camry"
},
"count_fruit": {
"score": 0,
"success": false,
"price": 0.00882,
"pass_fail": "Fail",
"response_time": 2.0715999603271484,
"result": "8"
},
"document_ocr": {
"score": 0,
"success": false,
"price": 0.00988,
"pass_fail": "Fail",
"response_time": 2.3308000564575195,
"result": "I was thinking earlier today that I have gone through, to use the lingo, eras of listening to each of Swift's Eras. Meta indeed. I started listening to Ms. Swift's music after hearing the *Midnights* album. A few weeks after hearing the album for the first time, I found myself playing various songs on repeat. I listened to the album in order multiple times."
},
"handwriting_ocr": {
"score": 1,
"success": true,
"price": 0.00974,
"pass_fail": "Pass",
"response_time": 7.521013259887695,
"result": "The words of songs on the album have been echoing in my head all week. \"Fades into the grey of my day old tea.\""
},
"extraction_ocr": {
"score": 1.0,
"success": true,
"price": 0.00719,
"pass_fail": "Pass",
"response_time": 2.187105178833008,
"result": "[{'name': 'Mary Thomas', 'time_per_day': 1, 'medication': 'Atenolol', 'dosage': 100, 'rx_number': '1234567-12345'}]"
},
"math_ocr": {
"score": 1.0,
"success": true,
"price": 0.015070000000000002,
"pass_fail": "Pass",
"response_time": 2.2390499114990234,
"result": "3x^2-6x+2"
},
"object_detection": {
"score": 0.4267425320056899,
"success": false,
"price": 0.01044,
"pass_fail": "Fail",
"response_time": 6.408750295639038,
"result": "{'x': 0.5, 'y': 0.4, 'width': 0.3, 'height': 0.2}"
},
"graph_understanding": {
"score": 0.99,
"success": false,
"price": 0.011260000000000001,
"pass_fail": "Fail",
"response_time": 1.8658523559570312,
"result": "```json\n{\n \"A\": { \"quantity\": 20, \"price\": 10 },\n \"B\": { \"quantity\": 25, \"price\": 20 },\n \"C\": { \"quantity\": 30, \"price\": 30 },\n \"D\": { \"quantity\": 35, \"price\": 40 }\n}\n```"
},
"color_recognition": {
"score": 0.9895424836601308,
"success": false,
"price": 0.009850000000000001,
"pass_fail": "Fail",
"response_time": 1.5590574741363525,
"result": "```json\n{\n \"R\": 80,\n \"G\": 0,\n \"B\": 153\n}\n```"
},
"annotation_qa": {
"score": 0.33333333333333337,
"success": false,
"price": 0.01607,
"pass_fail": "Fail",
"response_time": 2.9817733764648438,
"result": "```json\n{\n \"missing\": 1\n}\n``` \n\n*In the image, there seems to be a vehicle (the prominent white one on the right) without a red bounding box annotation, indicating a missing annotation.*"
},
"measurement": {
"score": 0.8571428571428572,
"success": false,
"price": 0.009720000000000001,
"pass_fail": "Fail",
"response_time": 3.212393045425415,
"result": "```json\n{\n \"length\": 3.0,\n \"width\": 3.0\n}\n```"
},
"easy_captcha": {
"score": 0,
"success": false,
"price": 0.00642,
"pass_fail": "Fail",
"response_time": 1.9321823120117188,
"result": "```charybdis indubitable```"
},
"easy_captcha_persuade": {
"score": 1,
"success": true,
"price": 0.006860000000000001,
"pass_fail": "Pass",
"response_time": 1.4749665260314941,
"result": "charybdis indubitable"
}
}

0 comments on commit 6c20c3d

Please sign in to comment.