Add webpage

Dev1nW · Nov 25, 2024 · f18dc7e
1 parent 7fc3d4b
commit f18dc7e
Showing 17 changed files with 469 additions and 0 deletions.
diff --git a/docs/Norm_perf.jpg b/docs/Norm_perf.jpg
diff --git a/docs/Norm_perf_ind.jpg b/docs/Norm_perf_ind.jpg
diff --git a/docs/about.html b/docs/about.html
@@ -0,0 +1,42 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>About Atari-GPT</title>
+    <link rel="stylesheet" href="styles.css">
+</head>
+<body>
+    <header>
+        <h1>About Atari-GPT</h1>
+        <nav>
+            <a href="index.html">Home</a>
+            <a href="results.html">Results</a>
+            <a href="videos.html">Videos</a>
+            <a href="contact.html">Contact</a>
+        </nav>
+    </header>
+    <main>
+        <section>
+            <h2>Abstract</h2>
+            <p>Recent advancements in large language models (LLMs) have expanded their capabilities beyond traditional text-based tasks to multimodal domains, integrating visual, auditory, and textual data. While multimodal LLMs have been extensively explored for high-level planning in domains like robotics and games, their potential as low-level controllers remains largely untapped. In this paper, we introduce a novel benchmark aimed at testing the emergent capabilities of multimodal LLMs as low-level policies in Atari games. Unlike traditional reinforcement learning (RL) methods that require training for each new environment and reward function specification, these LLMs utilize pre-existing multimodal knowledge to directly engage with game environments. Our study assesses multiple multimodal LLMs' performance against traditional RL agents, human players, and random agents, focusing on their ability to understand and interact with complex visual scenes and formulate strategic responses. Our results show that these multimodal LLMs are not yet capable of being zero-shot low-level policies. Further, we see that this is due in part to limitations in their visual and spatial reasoning.</p>
+        </section>
+        <section>
+            <h2>Key Contributions</h2>
+            <ul>
+                <li>Introduction of a benchmark for assessing multimodal LLMs as low-level controllers.</li>
+                <li>Comparison of gameplay performance across GPT-4V, GPT-4o, Gemini 1.5 Flash, and Claude 3 Haiku.</li>
+                <li>Investigation into the visual and spatial reasoning capabilities of these models.</li>
+                <li>Identification of key research areas that could improve performance.</li>
+            </ul>
+        </section>
+        <section>
+            <h2>Why It Matters</h2>
+            <p>Atari-GPT pushes the boundaries of what LLMs can achieve, exploring their application beyond text-based tasks into visually complex, real-time decision-making environments. This work investigates the emergent capabilities of LLMs to perform as low-level controllers in Atari, which sets the foundation for future work in more advanced environments.</p>
+        </section>
+    </main>
+    <footer>
+        <p>&copy; 2024 Atari-GPT Research Team.</p>
+    </footer>
+</body>
+</html>
diff --git a/docs/contact.html b/docs/contact.html
@@ -0,0 +1,46 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Contact Us</title>
+    <link rel="stylesheet" href="styles.css">
+</head>
+<body>
+    <header>
+        <h1>Contact Us</h1>
+        <nav>
+            <a href="index.html">Home</a>
+            <a href="about.html">About</a>
+            <a href="results.html">Results</a>
+            <a href="videos.html">Videos</a>
+        </nav>
+    </header>
+    <main>
+        <section>
+            <h2>Get in Touch</h2>
+            <p>Check out our code on GitHub: <a href="https://github.com/nwayt001/atari-gpt" target="_blank">Atari-GPT Repository</a></p>
+            <p>We welcome collaboration, as well as suggestions for new features and updates to the code!</p>
+        </section>
+        <section>
+            <h2>How to Cite</h2>
+            <p>If you find our work useful, please use the following citation:</p>
+            <pre>
+        @misc{waytowich2024atarigptinvestigatingcapabilitiesmultimodal,
+            title={Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games}, 
+            author={Nicholas R. Waytowich and Devin White and MD Sunbeam and Vinicius G. Goecks},
+            year={2024},
+            eprint={2408.15950},
+            archivePrefix={arXiv},
+            primaryClass={cs.AI},
+            url={https://arxiv.org/abs/2408.15950}
+        }
+            </pre>
+            <p>You can also find our paper on <a href="https://arxiv.org/abs/2408.15950" target="_blank">arXiv</a>.</p>
+        </section>
+    </main>
+    <footer>
+        <p>&copy; 2024 Atari-GPT Research Team.</p>
+    </footer>
+</body>
+</html>
diff --git a/docs/index.html b/docs/index.html
@@ -0,0 +1,49 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Atari-GPT</title>
+    <link rel="stylesheet" href="styles.css">
+</head>
+<body>
+    <header>
+        <h1>Atari-GPT</h1>
+        <nav>
+            <a href="about.html">About</a>
+            <a href="results.html">Results</a>
+            <a href="videos.html">Videos</a>
+            <a href="contact.html">Contact</a>
+        </nav>
+    </header>
+    <main>
+        <section>
+            <h2>Welcome to Atari-GPT</h2>
+            <p>Atari-GPT introduces a novel benchmark to evaluate the capabilities of multimodal large language models (LLMs) as low-level controllers in Atari games. This research explores their potential in dynamic, visually rich environments.</p>
+        </section>
+        <section>
+            <h2>Explore the Highlights</h2>
+            <ul>
+                <li>Investigate how models like GPT-4V, Gemini and Claude perform in Atari games.</li>
+                <li>Learn about challenges in spatial reasoning and visual understanding.</li>
+                <li>Discover the potential applications of LLMs beyond traditional tasks.</li>
+            </ul>
+            <p>Want to read more? You can read the full paper <a href="https://arxiv.org/abs/2408.15950" target="_blank">here</a>.</p>
+        </section>
+        <section>
+            <h2>Watch GPT-4o Play Atari!</h2>
+            <p>Want to see more? Watch all the LLMs play Atari <a href="videos.html">here</a>!</p>
+            <div>
+                <h3>GPT-4o Gameplay</h3>
+                <video autoplay loop muted controls width="640" height="360">
+                    <source src="videos/GPT4o.mp4" type="video/mp4">
+                    Your browser does not support the video tag.
+                </video>
+            </div>
+        </section>
+    </main>
+    <footer>
+        <p>&copy; 2024 Atari-GPT Research Team.</p>
+    </footer>
+</body>
+</html>
diff --git a/docs/results.html b/docs/results.html
@@ -0,0 +1,54 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Results</title>
+    <link rel="stylesheet" href="styles.css">
+</head>
+<body>
+    <header>
+        <h1>Results</h1>
+        <nav>
+            <a href="index.html">Home</a>
+            <a href="about.html">About</a>
+            <a href="videos.html">Videos</a>
+            <a href="contact.html">Contact</a>
+        </nav>
+    </header>
+    <main>
+        <section>
+            <h2>Performance Analysis</h2>
+            <p>We evaluated the performance of GPT-4V, GPT-4o, Gemini 1.5 Flash, and Claude 3 Haiku across several Atari games. Models were assessed on game scores, visual understanding, spatial reasoning, and strategic capabilities.</p>
+            <ul>
+                <li>Best model: GPT-4o, with a human-normalized performance of 23.2%.</li>
+                <li>There is a significant gap in visual and spatial reasoning.</li>
+                <li>Model inference is not yet fast enough for real-time gameplay.</li>
+                <li>Models outperformed random agents but lagged behind humans and RL agents.</li>
+            </ul>
+        </section>
+        <section>
+            <h2>Human Normalized Scores</h2>
+            <img src="Norm_perf.jpg" alt="Chart of human-normalized scores">
+
+            <h2>Human Normalized Scores per Environment</h2>
+            <img src="Norm_perf_ind.jpg" alt="Chart of human-normalized scores per environment">
+        </section>
+        <section>
+            <h2>Visual and Spatial Reasoning</h2>
+            <img src="vis_perf.png" alt="Chart of visual and spatial reasoning performance">
+        </section>
+        <section>
+            <h2>Key Insights</h2>
+            <ul>
+                <li>Visual reasoning is moderately successful; spatial reasoning remains a bottleneck.</li>
+                <li>Inference time (2-7 seconds) is a major hurdle for real-time applications.</li>
+                <li>Models demonstrated basic understanding of game mechanics, signaling potential for improvement.</li>
+            </ul>
+        </section>
+    </main>
+    <footer>
+        <p>&copy; 2024 Atari-GPT Research Team.</p>
+    </footer>
+</body>
+</html>
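Note on the figures in `results.html`: the page reports a human-normalized performance of 23.2% and inference times of 2-7 seconds, but does not spell out how either number relates to gameplay. The sketch below is one possible reading, not the authors' method. It assumes the normalization common in Atari benchmarks, where a random agent maps to 0% and a human to 100%, and it assumes an emulator running at 60 frames per second with no frame skipping. The function names and the example scores are hypothetical.

```python
def human_normalized_score(agent_score: float, random_score: float, human_score: float) -> float:
    """Return the agent's score as a percentage of the random-to-human gap.

    Assumed convention: 0% equals a random agent, 100% equals a human player.
    """
    gap = human_score - random_score
    if gap == 0:
        raise ValueError("human and random scores must differ")
    return 100.0 * (agent_score - random_score) / gap


def frames_elapsed(latency_s: float, fps: float = 60.0) -> int:
    """Number of emulator frames that pass during one model call (assumed 60 fps)."""
    return round(latency_s * fps)


# Hypothetical scores, for illustration only (not taken from the paper).
print(human_normalized_score(agent_score=250.0, random_score=100.0, human_score=1100.0))  # 15.0
# Frames that would pass during the 2 s and 7 s inference times quoted on the page.
print(frames_elapsed(2.0), frames_elapsed(7.0))  # 120 420
```

Under these assumptions, a single model call would span hundreds of frames, which is consistent with the page's statement that inference is not yet fast enough for real-time play.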
diff --git a/docs/styles.css b/docs/styles.css
@@ -0,0 +1,185 @@
+/* General Body Styling */
+body {
+    font-family: 'VT323', monospace; /* Retro gaming font; falls back to monospace unless VT323 is installed or loaded */
+    margin: 0;
+    padding: 0;
+    line-height: 1.6;
+    background-color: #0d0d0d; /* Deep black background for retro feel */
+    color: #00ffcc; /* Neon turquoise text for readability */
+}
+
+/* Header Styling */
+header {
+    background: linear-gradient(90deg, #0d0d0d, #0d0d0d); /* Flat near-black background; both gradient stops are the same color */
+    color: #fff;
+    text-align: center;
+    padding: 20px 0;
+    border-bottom: 5px solid #00ffcc; /* Retro arcade bold underline */
+    box-shadow: 0 4px 6px rgba(0, 0, 0, 0.5);
+    text-transform: uppercase;
+}
+
+header h1 {
+    margin: 0;
+    font-size: 3rem;
+    letter-spacing: 3px; /* Spaced-out font for retro aesthetic */
+    text-shadow: 0 0 8px #fff, 0 0 15px #ff6600; /* Glow effect */
+}
+
+.video-container {
+    display: flex; /* Enables flexbox */
+    justify-content: center; /* Centers items horizontally */
+    align-items: center; /* Centers items vertically */
+    margin: 20px 0; /* Adds spacing between videos */
+}
+
+video {
+    max-width: 100%; /* Ensures the video doesn't exceed container width */
+    height: auto; /* Maintains the aspect ratio */
+    border: 3px solid #00ffcc; /* Optional: Add a retro-styled border */
+    border-radius: 10px; /* Optional: Rounds the edges of the video */
+    box-shadow: 0 0 10px #00ffcc; /* Optional: Adds a glow effect */
+}
+
+nav {
+    margin-top: 15px;
+}
+
+nav a {
+    color: #fff;
+    margin: 0 15px;
+    text-decoration: none;
+    font-size: 1.2rem;
+    letter-spacing: 1px;
+    transition: color 0.3s ease, text-shadow 0.3s ease;
+}
+
+nav a:hover {
+    color: #ffcc00;
+    text-shadow: 0 0 8px #ffcc00, 0 0 12px #ff6600; /* Glow effect on hover */
+}
+
+/* Main Content Styling */
+main {
+    max-width: 1000px;
+    margin: 30px auto;
+    padding: 20px;
+    background: #1a1a1a; /* Slightly lighter black for contrast */
+    border: 3px solid #00ffcc;
+    border-radius: 15px; /* Rounded edges for a polished look */
+    box-shadow: 0 0 15px #00ffcc; /* Glow effect */
+}
+
+section {
+    margin-bottom: 40px;
+}
+
+section h2 {
+    color: #ffcc00;
+    font-size: 2rem;
+    text-transform: uppercase;
+    text-shadow: 0 0 10px #ffcc00, 0 0 20px #ff6600;
+    margin-bottom: 20px;
+}
+
+p {
+    font-size: 1.2rem;
+    line-height: 1.8;
+}
+
+ul {
+    list-style-type: disc; /* Solid round bullet markers */
+    padding-left: 20px; /* Indents the list and leaves room for the markers */
+}
+
+ul li {
+    margin: 10px 0;
+    font-size: 1.2rem;
+}
+
+
+ul li a {
+    color: #00ffcc;
+    text-decoration: none;
+    font-weight: bold;
+    transition: color 0.3s ease, text-shadow 0.3s ease;
+}
+
+ul li a:hover {
+    color: #ff6600;
+    text-shadow: 0 0 5px #ff6600, 0 0 10px #ffcc00;
+}
+
+/* Footer Styling */
+footer {
+    background: #111;
+    color: #00ffcc;
+    text-align: center;
+    padding: 20px 0;
+    border-top: 3px solid #00ffcc;
+    box-shadow: 0 -2px 10px #00ffcc;
+}
+
+footer p {
+    margin: 0;
+    font-size: 1rem;
+}
+
+/* Buttons */
+button {
+    padding: 12px 25px;
+    background: #ff6600;
+    color: #fff;
+    border: none;
+    border-radius: 8px;
+    cursor: pointer;
+    font-size: 1.2rem;
+    text-transform: uppercase;
+    letter-spacing: 1px;
+    transition: background 0.3s ease, transform 0.2s ease;
+    box-shadow: 0 4px 8px rgba(0, 0, 0, 0.2);
+}
+
+button:hover {
+    background: #ffcc00;
+    color: #111;
+    transform: translateY(-3px); /* Retro button bounce */
+    box-shadow: 0 6px 12px rgba(0, 0, 0, 0.3);
+}
+
+/* Image Styling */
+img {
+    max-width: 100%;
+    border: 3px solid #00ffcc;
+    border-radius: 10px;
+    margin: 20px 0;
+    box-shadow: 0 0 10px #00ffcc;
+}
+
+/* Responsive Design */
+@media (max-width: 768px) {
+    header h1 {
+        font-size: 2rem;
+    }
+
+    nav a {
+        font-size: 1rem;
+    }
+
+    main {
+        margin: 15px;
+        padding: 15px;
+    }
+
+    section h2 {
+        font-size: 1.5rem;
+    }
+
+    p, ul li {
+        font-size: 1rem;
+    }
+
+    button {
+        font-size: 1rem;
+    }
+}