Please find the text descriptions and metadata for each visual grounding prompt here. The scene data can be found at the original sources:
The following files can be found here:
vigil3d_scannet.csv
- visual grounding descriptions for selected ScanNet scenesvigil3d_scannetpp.csv
- visual grounding descriptions for selected ScanNet++ scenes
Each CSV includes the following fields:
prompt_id
- ID of prompt (ideally, should be unique across scene datasets)scene_id
- ID of sceneobject_id
- IDs of target objects (as specified in ground truth object mapping)object_name
- text label of object category, if applicabledescription
- natural language description of target objectpair_id
- integer ID representing pairs of descriptions which were designed to be similaris_valid
- 1 if a valid prompt, else 0 if it should be ignored
Additional fields (most prefixed as num_
) include manual metadata characterizing the linguistic patterns of each prompt.
These include
num_attributes_target
- number of attributes used to describe target object(s)num_attributes_anchors
- number of attributes used to describe anchor objectsnum_attribute_type_*
- captures use of different attribute types (e.g. color, size, material)num_relationships_target
- number of relationships used to compare target objects with other objectsnum_relationships_anchors
- number of relationships used to compare anchor objects to one anothernum_relationship_type_*
- captures use of different relationship types (e.g. near, far, vertical)num_coreferences
- 1 if coreferences are usednum_target_not_first_np
- 1 if target is first noun phrase (NP) in promptgranularity
- captures specificity of target object reference (e.g. generic, coarse-grained, fine-grained)num_anchor_type_*
- captures use of different types of anchors (e.g. single, multiple, viewpoint, agent)num_negation
- 1 if negation was used in prompt, else 0