docs: DOC-243: Update task agreements page #6343

Merged 5 commits on Sep 12, 2024
41 changes: 29 additions & 12 deletions docs/source/guide/stats.md
@@ -1,12 +1,12 @@
---
title: Task agreement and how it is calculated
short: Task agreement matrix
title: How task agreement and labeling consensus are calculated
short: Task agreements
tier: enterprise
type: guide
order: 0
order_enterprise: 307
meta_title: Data Labeling Statistics
meta_description: Label Studio Enterprise documentation about task agreement, annotator consensus, and other data annotation statistics for data labeling and machine learning projects.
meta_title: Task agreement in Label Studio Enterprise
meta_description: Task agreement, or labeling consensus, and other data annotation statistics for data labeling and machine learning projects.
section: "Review & Measure Quality"
---

@@ -17,7 +17,7 @@ Annotation statistics help you determine the quality of your dataset, its readin

## Task agreement

Task agreement shows the consensus between multiple annotators when labeling the same task. There are several types of task agreement in Label Studio Enterprise:
Task agreement, also known as "labeling consensus" or "annotation consensus," shows the consensus between multiple annotators when labeling the same task. There are several types of task agreement in Label Studio Enterprise:
- a per-task agreement score, visible on the Data Manager page for a project. This displays how well the annotations on a particular task match across annotators.
- an inter-annotator agreement matrix, visible on the Members page for a project. This displays how well the annotations from specific annotators agree with each other in general, or for specific tasks.

@@ -155,15 +155,18 @@ The following **text edit distance** algorithms are available:
- Needleman-Wunsch
- Smith-Waterman

### Intersection over Union example
### Intersection over union example

The Intersection over Union (IoU) metric compares the area of overlapping regions, such as bounding boxes, polygons or textual / time series one-dimensional spans with the overall area, or union, of the regions.
The Intersection over Union (IoU) metric compares the overlap between regions, such as bounding boxes, polygons, or textual/time series one-dimensional spans, against the combined area, or union, of the regions.

For example, for two annotations `x` and `y` containing either bounding boxes or polygons, the following calculation occurs:
- LSE identifies whether any regions overlap across the two annotations. Overlapping can be only considered with matched labels.
- For each pair of overlapping regions across the annotations, the area of the overlap, or intersection `aI` is compared to the combined area `aU` of both regions, referred to as the union of the regions: `aI` ÷ `aU`
- The average of `aI` ÷ `aU` for each pair of regions is used as the IoU calculation for a pair of annotations, or each IoU calculation for each label or region is used.
For example, if there are two bounding boxes for each of the `x` and `y` annotations, the agreement of `x` and `y` = ((`aI` ÷ `aU`) + (`aI` ÷ `aU`)) ÷ 2.
For two annotations, `x` and `y`, which contain either bounding boxes or polygons, the following steps occur:

* **Identifying Overlapping Regions**: The system identifies whether any regions overlap across the two annotations. Overlaps are only considered for matched labels (i.e., regions assigned the same label or class).
* **Calculating IoU for Each Pair**: For each pair of overlapping regions, the area of overlap, or intersection (`aI`), is divided by the total combined area of the two regions, known as the union (`aU`). This gives the IoU for that pair as `aI ÷ aU`, which results in a value between `0` and `1`, where `1` indicates perfect overlap and `0` indicates no overlap.
* **Tracking the Maximum IoU**: When comparing multiple regions (e.g., multiple bounding boxes), the system tracks the highest IoU value for the pair using the formula `max_iou = max(iou, max_iou)`. This ensures that the most significant agreement between the two annotations is captured.
* **Avoiding Averaging Misconceptions**: In some cases, there may be multiple overlapping regions between annotations `x` and `y`. Rather than averaging all IoU values (which could be misleading), the highest IoU for each pair is retained, ensuring the most representative comparison of agreement between the annotations.

This method ensures that only the strongest level of overlap between regions is recorded for each annotation pair, reflecting the highest possible agreement between the two annotations.

#### Intersection over union with text

@@ -174,6 +177,20 @@ For two given task annotations `x` and `y`, the agreement score formula is `m(x,
- For hypertext annotations, the span is defined by the `startOffset` and `endOffset` keys.
- For paragraphs of dialogue annotations, the span is defined by the `startOffset` and `endOffset` keys.

#### Intersection over union with time series

Intersection over Union (IoU) for time series data evaluates the overlap between two labeled regions within the time series. Here's how it works:

1. **Identify Regions**: Determine the start and end points of the labeled regions in the time series data.
2. **Calculate Intersection**: Find the overlapping duration between the two regions.
3. **Calculate Union**: Determine the total duration covered by both regions.
4. **Compute IoU**: Divide the intersection duration by the union duration.

For example, if you have two regions:
- Region A: (0, 20)
- Region B: (10, 30)

The intersection is (10, 20) with a duration of 10 units, and the union is (0, 30) with a duration of 30 units. The IoU would be 10 ÷ 30 ≈ 0.33.

#### Intersection over union with other metrics

The IoU metric can be combined with other metrics. Several metrics in Label Studio Enterprise use IoU to establish initial agreement across annotations, and then compute the [precision](#precision-example), [recall](#recall-example), or [F1-score](#f1-score-example) for the IoU values above a specific threshold. Text IoU can also include the [edit distance algorithm](#edit-distance-algorithm-example).
