This directory contains all the materials and documentation related to the human study conducted for Autonomous Driving System Testing with Domain Augmentation.
- Human Survey Results
- Platform Used for Human Survey
- Mechanical Turk Template
- Tutorial for Participants
- Dataset of Images for Human Survey
- GLM Analysis
The human survey was conducted to evaluate the realism and semantic validity of the augmented images generated during the experiments.
The anonymized responses from the survey are available here.
Key Findings:
- Semantic Validity: Participants identified semantically valid images with 92% accuracy on the validation set.
- Realism Evaluation: Responses were collected using a Likert scale. Average realism ratings ranged from 3.5 to 4.7 across augmentation techniques.
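
The two summary figures above could be recomputed from the anonymized responses along these lines. This is a minimal sketch, not the authors' analysis script: the record layout, the field names (`technique`, `realism`, `answer_valid`, `truth_valid`), and the 1-5 Likert range are assumptions made for illustration, and the sample rows are made up.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical response records; the field names are assumptions,
# not the actual schema of the published survey results.
responses = [
    {"technique": "Instruction Editing", "realism": 4, "answer_valid": True, "truth_valid": True},
    {"technique": "Instruction Editing", "realism": 5, "answer_valid": False, "truth_valid": False},
    {"technique": "Inpainting", "realism": 3, "answer_valid": True, "truth_valid": False},
    {"technique": "Inpainting", "realism": 4, "answer_valid": True, "truth_valid": True},
]


def semantic_accuracy(rows):
    """Fraction of responses whose validity judgment matches the ground truth."""
    correct = sum(1 for r in rows if r["answer_valid"] == r["truth_valid"])
    return correct / len(rows)


def mean_realism_by_technique(rows):
    """Average Likert realism rating for each augmentation technique."""
    ratings = defaultdict(list)
    for r in rows:
        ratings[r["technique"]].append(r["realism"])
    return {technique: mean(values) for technique, values in ratings.items()}


print(semantic_accuracy(responses))
print(mean_realism_by_technique(responses))
```

Grouping by technique before averaging keeps the realism ratings comparable across Instruction Editing, Inpainting, and Inpainting with Refinement.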
The human survey was deployed on a custom-built web platform designed to mirror the Mechanical Turk interface, so that every participant evaluated images under the same conditions. The platform is a separate project built with Django, a Python web framework.
- Task Support: Participants completed both realism and semantic validity assessments using a streamlined, intuitive interface.
- Data Management: The platform stores all collected responses locally on the host machine.
- Deployment: It can be hosted on a local server, making it easy to set up and run for small-scale studies or controlled experiments.
- Consistency: Ensures uniform task presentation across all participants, minimizing variability in the survey environment.
The source code for the web platform is available here.
Below are two example screenshots from the platform, showcasing a realism question (left) and a semantic validity question (right):
We created a Mechanical Turk template for scaling the survey to a larger participant pool.
The template contains:
- Survey structure and design.
- Instructions for participants to complete the task effectively.
The semantic validity template is provided in mturk_semantic_template.html and is accessible here. The realism template is provided in mturk_realism_template.html and is accessible here.
To ensure participants understood the task, we created a comprehensive tutorial explaining how to evaluate image realism and semantic validity.
The tutorial includes:
- Definitions and examples of semantic validity and realism.
- Step-by-step instructions for answering questions on the survey platform.
- Example tasks with feedback for learning purposes.
The semantic validity tutorial can be accessed here. The realism tutorial can be accessed here.
The dataset used for the human survey consists of augmented images generated using the following techniques:
- Instruction Editing
- Inpainting
- Inpainting with Refinement
The dataset is available here.
For any additional questions, please refer to the project documentation or contact the authors directly.
To analyze the relationship between user expertise and survey performance, we conducted a Generalized Linear Model (GLM) regression.
Key Variables:
- Predictors: Years of driving experience, ADS knowledge, vision quality, and ADS-related experience.
- Outcomes: False positive rate on the semantic validity task, and realism ratings.
Findings:
- No statistically significant relationship was observed between any user expertise variable and either performance metric.
- A detailed summary of the regression analysis is available here (semantic) and here (realism).
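
To illustrate what such a regression checks, the sketch below fits a small GLM in plain Python and reports Wald z-scores. It is not the authors' analysis: the source does not state the GLM family, link function, or software used, so a binomial family with a logit link and a binary outcome (whether a response was a false positive) are assumed here. The single predictor (years of driving experience), the toy data, and all function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_logistic_glm(X, y, iterations=25):
    """Fit a binomial GLM with logit link by Newton-Raphson.

    Returns the coefficients (intercept first) and the information matrix
    X'WX from the last iteration, which matches the converged estimate
    once the updates have become negligible.
    """
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for row, target in zip(rows, y):
            eta = sum(b * v for b, v in zip(beta, row))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (target - p) * row[i]
                for j in range(k):
                    info[i][j] += w * row[i] * row[j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta, info


def wald_z(beta, info):
    """Wald z-scores: each estimate divided by its standard error."""
    k = len(beta)
    z = []
    for i in range(k):
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        var_i = solve(info, unit)[i]  # i-th diagonal entry of the inverse information matrix
        z.append(beta[i] / math.sqrt(var_i))
    return z


# Toy data (made up): years of driving experience vs. false-positive flag.
years = [[1], [2], [3], [4], [5], [6], [7], [8]]
false_positive = [0, 1, 0, 1, 0, 1, 1, 0]

beta, info = fit_logistic_glm(years, false_positive)
z = wald_z(beta, info)
print("coefficients:", beta)
print("Wald z-scores:", z)
```

Under these assumptions, a predictor whose |z| stays below about 1.96 is not significant at the 5% level, which is the kind of result summarized in the findings above. In practice a statistics library would normally replace the hand-written fitting loop.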