Ivyyyy24381/pointing

Pointing and Human Dog Interaction

Background: https://github.com/csci1951a-spring-2024/final-project-gestubots/blob/main/README.md

A detailed data breakdown is in: https://github.com/csci1951a-spring-2024/final-project-gestubots/blob/main/data/README.md

2. Data statistics

We ran the experiment with 8 human-dog participant pairs. Each run consisted of 12 human-dog interaction trials and 4 human-human interaction trials (16 trials per run, 128 trials in total). Because we compare 5 different pointing vectors, this yields 640 pointing-arm and 640 non-pointing-arm records (1280 in total). The dataset can be broken down into the following subsets:

  • Dog selection data: records how the dog makes sequential decisions based on the gesture command (96 data points)
  • Ground intersection: records where a specific vector intersects the ground plane and the distance from that point to each target in the scene. The ground intersection data can be further broken into:
    • Experiment: equidistant targets & human pointing for a dog (960 data points)
    • Calibration: equidistant targets & human pointing for a human (320 data points)
    • Test: vertically distributed targets & human pointing for a dog (we have sliced the video streams, but the dataset size for test and validation is TBD)
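The counts above can be cross-checked with a few lines of arithmetic. This is a minimal sketch, assuming one selected frame per trial and one record per vector per arm (pointing and non-pointing); the constant and function names are hypothetical and not taken from the project code.

```python
# Counts quoted in the text above.
PAIRS = 8                 # human-dog participant pairs
DOG_TRIALS_PER_RUN = 12   # human pointing for a dog
HUMAN_TRIALS_PER_RUN = 4  # human pointing for a human
VECTORS = 5               # pointing vectors being compared
ARMS = 2                  # pointing arm and non-pointing arm (assumed split)


def dataset_sizes():
    """Return the expected size of each subset (assumes one frame per trial)."""
    trials_per_run = DOG_TRIALS_PER_RUN + HUMAN_TRIALS_PER_RUN
    total_trials = PAIRS * trials_per_run
    return {
        "total_trials": total_trials,
        "records_per_arm": total_trials * VECTORS,
        "dog_selection": PAIRS * DOG_TRIALS_PER_RUN,
        "experiment": PAIRS * DOG_TRIALS_PER_RUN * VECTORS * ARMS,
        "calibration": PAIRS * HUMAN_TRIALS_PER_RUN * VECTORS * ARMS,
    }


sizes = dataset_sizes()
for name, count in sizes.items():
    print(f"{name}: {count}")
```

Under these assumptions the experiment and calibration subsets add up to the 1280 arm records quoted above.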

3. Attributes:

  • Dog Selection Data:

    • Data Date: Date of the experimental session.
    • Trial #: Sequential number of the trial within an experimental session.
    • Target Location: Location of the hidden treat.
    • Dog Selection: Sequence of dog's attempts to select the correct cup.
  • Vector Intersection Data:

    • Img Name: Unique identifier for each image/frame.
    • Vector Selection: Type of pointing vector used.
    • Vector Intersection: Coordinates of the intersection of the vector with the ground plane.
    • Target Distance: Distance to the target from the vector intersection.
    • Closest Target: Location and distance of the closest target to the vector intersection.

  • Target Selection Probability Data:

    • Probability: Probability of target selection.
    • Perplexity: Measure of uncertainty in target selection.
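This README does not give the formula used for perplexity. A common definition, assumed here, is the exponential of the Shannon entropy of the target selection distribution. The sketch below follows that definition; the function name and the example probabilities are hypothetical.

```python
import math


def perplexity(probabilities):
    """Perplexity of a discrete distribution, assumed to be exp(Shannon entropy).

    The input is normalized first, so unnormalized scores are accepted.
    """
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    entropy = 0.0
    for p in probabilities:
        q = p / total
        if q > 0:  # terms with zero probability contribute nothing
            entropy -= q * math.log(q)
    return math.exp(entropy)


# Hypothetical selection probabilities over four targets.
confident = [0.85, 0.05, 0.05, 0.05]
uncertain = [0.25, 0.25, 0.25, 0.25]
print(perplexity(confident))
print(perplexity(uncertain))
```

With this definition, perplexity ranges from 1 (all probability on one target) to the number of targets (a uniform distribution), so lower values indicate a more decisive selection.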

4. Data source:

Our data was collected from recorded experimental sessions using an Intel RealSense D435 depth camera. Human pointing gestures were detected with Google's MediaPipe library. The sample was generated by manually selecting the frames in which gesture commands occurred, resulting in a sample size of 96 recorded trials. While the sample is representative of natural communication between humans and dogs, the experimental setup and the manual selection process may introduce some sampling bias. During the experiment, we observed repetitive pointing behaviors from human participants. In addition, dogs tended to prefer targets that were closer to the owner. These observations are not yet analyzed quantitatively in our dataset.

Data Cleaning

MediaPipe lets us extract the location of each body joint. We saved this output to JSON and selected a subset of attributes (i.e., the upper-body 2D joint locations) to generate the vectors that are raycast to the ground for our own dataset. To ensure the integrity of the dataset, we checked data formats and ranges for consistency. Incomplete or inconsistent entries were removed, and essential fields such as trial number and target location were verified for completeness. MediaPipe occasionally produced inaccurate gesture detections, so we saved a visualization of its output to allow later inspection and reprocessing. Missing and duplicate values were not an issue because we produced our own dataset. We ran into data type issues when aligning the depth and color images, which we resolved by digging further into the RealSense camera API. We would also like to integrate the pipeline with the user interaction component.
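The raycast-to-ground step can be illustrated with a standard ray-plane intersection. This is a minimal sketch rather than the project's implementation: it assumes the joints have already been converted to 3D points in a world frame with the z axis pointing up and the ground at z = 0, and all function names, joint coordinates, and target positions are hypothetical.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ray_ground_intersection(origin, through,
                            plane_point=(0.0, 0.0, 0.0),
                            plane_normal=(0.0, 0.0, 1.0)):
    """Intersect the ray from `origin` through `through` with the ground plane.

    Returns the 3D hit point, or None if the ray is parallel to the plane
    or points away from it.
    """
    direction = tuple(t - o for o, t in zip(origin, through))
    denom = _dot(direction, plane_normal)
    if abs(denom) < 1e-9:
        return None  # ray is parallel to the ground
    to_plane = tuple(p - o for o, p in zip(origin, plane_point))
    t = _dot(to_plane, plane_normal) / denom
    if t < 0:
        return None  # ground plane is behind the pointing direction
    return tuple(o + t * d for o, d in zip(origin, direction))


def closest_target(point, targets):
    """Return (name, distance) of the target nearest to `point`."""
    distances = {name: math.dist(point, pos) for name, pos in targets.items()}
    name = min(distances, key=distances.get)
    return name, distances[name]


# Hypothetical shoulder and wrist positions in meters.
shoulder = (0.0, 0.0, 2.0)
wrist = (0.5, 0.0, 1.0)
targets = {"left": (1.0, 1.0, 0.0), "right": (1.0, -0.5, 0.0)}

hit = ray_ground_intersection(shoulder, wrist)
print(hit, closest_target(hit, targets))
```

The same function covers each pointing vector by changing the two joints passed in, for example eye and wrist instead of shoulder and wrist.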

Data Distribution

Below is a brief analysis on the data we have processed:

| Vector | Euclidean Dist [m] | Stdev (Dist) | Accuracy | Stdev (Accuracy) | PP | Stdev (PP) |
| --- | --- | --- | --- | --- | --- | --- |
| Nose to Wrist | 0.514 | 0.280 | 0.957 | 0.163 | 3.128 | 0.484 |
| Eye to Wrist | 0.516 | 0.307 | 0.960 | 0.146 | 3.213 | 0.585 |
| Shoulder to Wrist | 0.565 | 0.283 | 0.940 | 0.182 | 3.111 | 0.584 |
| Elbow to Wrist | 0.868 | 1.179 | 0.925 | 0.184 | 3.372 | 0.568 |
| Head | 2.711 | 0.683 | 0.518 | 0.364 | 3.581 | 0.520 |
| Dog (baseline) | | | 0.649 | | 4.000 | |

  • "Vector" represents the type of pointing vector used.
  • "Euclidean Dist [m]" is the average Euclidean distance in meters.
  • "Stdev (Dist)" is the standard deviation of the Euclidean distance.
  • "Accuracy" denotes the average accuracy of the model.
  • "Stdev (Accuracy)" is the standard deviation of the accuracy.
  • "PP" refers to the average perplexity.
  • "Stdev (PP)" is the standard deviation of the perplexity.
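The averages and standard deviations described above can be computed by grouping per-frame records by vector. The sketch below is illustrative only: the record layout and the example values are hypothetical, and it assumes the sample standard deviation, since this README does not say whether the sample or population form was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(records):
    """Group (vector, value) pairs by vector and return {vector: (mean, stdev)}.

    Uses the sample standard deviation (an assumption); a group with a single
    value reports a standard deviation of 0.0.
    """
    grouped = defaultdict(list)
    for vector, value in records:
        grouped[vector].append(value)
    return {
        vector: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for vector, values in grouped.items()
    }


# Hypothetical per-frame Euclidean distances in meters, not real data.
records = [
    ("Nose to Wrist", 0.4),
    ("Nose to Wrist", 0.6),
    ("Head", 2.0),
    ("Head", 3.0),
]
for vector, (avg, sd) in summarize(records).items():
    print(f"{vector}: mean={avg:.3f} m, stdev={sd:.3f} m")
```

The same grouping applies to the accuracy and perplexity columns by passing those values instead of distances.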

The Euclidean distances for most pointing vectors show a relatively uniform distribution without significant skewness. Outliers are minimal, with a few potential outliers observed in the "Elbow to Wrist" and "Head" vectors. The accuracy values for the "Dog" baseline are skewed toward lower values.

Summary

While collecting the data, we encountered challenges in keeping frame selection consistent across trials. We also observed variability in dog selection attempts and target distances. Moving forward, we plan to analyze the effectiveness of different pointing vectors in human-dog interaction, considering the factors that influence pointing accuracy. The data collection process has guided our focus and highlighted the complexities of natural gesture interactions between humans and dogs. We also plan to shift our focus toward creating interactive components that let users visualize the vector intersections on images.
