
Commit

Merge branch 'main' of github.com:visual-haystacks/visual-haystacks.github.io
tsunghan-wu committed Oct 15, 2024
2 parents 57bd266 + cbccbee commit afeb3e0
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions index.html
@@ -7,7 +7,7 @@
<!-- Replace the content tag with appropriate information -->
<meta name="description" content="SVisual Haystacks: Answering Harder Questions About Sets of Images">
<meta property="og:title" content="Visual Haystacks" />
<meta property="og:description" content="SOCIAL MEDIA DESCRIPTION TAG TAG" />
<meta property="og:description" content="Recent advancements in Large Multimodal Models (LMMs) have made significant progress in the field of single-image visual question answering. However, these models face substantial challenges when tasked with queries that span extensive collections of images, similar to real-world scenarios like searching through large photo albums, finding specific information across the internet, or monitoring environmental changes through satellite imagery. This paper explores the task of Multi-Image Visual Question Answering (MIQA): given a large set of images and a natural language query, the task is to generate a relevant and grounded response. We propose a new public benchmark, dubbed Visual Haystacks (VHs), specifically designed to evaluate LMMs capabilities in visual retrieval and reasoning over sets of unrelated images, where we perform comprehensive evaluations demonstrating that even robust closed-source models struggle significantly. Towards addressing these shortcomings, we introduce MIRAGE (Multi-Image Retrieval Augmented Generation), a novel retrieval/QA framework tailored for LMMs that confronts the challenges of MIQA with marked efficiency and accuracy improvements over baseline methods. Our evaluation shows that MIRAGE surpasses closed-source GPT-4o models by up to 11% on the VHs benchmark and offers up to 3.4x improvements in efficiency over text-focused multi-stage approaches." />
<meta property="og:url" content="http://visual-haystacks.github.io" />
<!-- Path to banner image, should be in the path listed below. Optimal dimensions are 1200X630-->
<meta property="og:image" content="/static/images/VHs_logo.png" />
@@ -16,7 +16,7 @@


<meta name="twitter:title" content="Visual Haystacks: Answering Harder Questions About Sets of Images">
<meta name="twitter:description" content="TWITTER BANNER DESCRIPTION META TAG">
<meta name="twitter:description" content="Recent advancements in Large Multimodal Models (LMMs) have made significant progress in the field of single-image visual question answering. However, these models face substantial challenges when tasked with queries that span extensive collections of images, similar to real-world scenarios like searching through large photo albums, finding specific information across the internet, or monitoring environmental changes through satellite imagery. This paper explores the task of Multi-Image Visual Question Answering (MIQA): given a large set of images and a natural language query, the task is to generate a relevant and grounded response. We propose a new public benchmark, dubbed Visual Haystacks (VHs), specifically designed to evaluate LMMs capabilities in visual retrieval and reasoning over sets of unrelated images, where we perform comprehensive evaluations demonstrating that even robust closed-source models struggle significantly. Towards addressing these shortcomings, we introduce MIRAGE (Multi-Image Retrieval Augmented Generation), a novel retrieval/QA framework tailored for LMMs that confronts the challenges of MIQA with marked efficiency and accuracy improvements over baseline methods. Our evaluation shows that MIRAGE surpasses closed-source GPT-4o models by up to 11% on the VHs benchmark and offers up to 3.4x improvements in efficiency over text-focused multi-stage approaches." />
<!-- Path to banner image, should be in the path listed below. Optimal dimensions are 1200X600-->
<meta name="twitter:image" content="static/images/VHs_logo.png">
<meta name="twitter:card" content="Visual Haystack Project Logo: A cartoon character photo sitting on top of a haystack of images.">
