labelbox integration #2995

Merged
merged 58 commits into from
Dec 12, 2024
Changes from 14 commits
ce6f182
wip labelbox integration
tyesayan Nov 21, 2024
8307576
changes:
tyesayan Nov 22, 2024
c24cc23
document labelbox converter api
tyesayan Nov 22, 2024
a92d216
change labelbox metadata in deeplake dataset info
tyesayan Nov 23, 2024
e42e2c3
add secret
artgish Nov 25, 2024
762ba0a
add retries for labelbox video download and mark to skip integration …
tyesayan Nov 25, 2024
58294a3
fix labelbox json parsing issues
tyesayan Nov 25, 2024
6289211
add labelbox grouping support
tyesayan Nov 25, 2024
00feb16
fix CI failures
tyesayan Nov 25, 2024
ebcc166
add creds and fail_on_error args to labelbox integration functions
tyesayan Nov 25, 2024
fb83641
Merge branch 'v3' into v3-labelbox-integration
tyesayan Nov 26, 2024
ae58f9b
reformat labelbox files
tyesayan Nov 26, 2024
d1e52b7
Merge branch 'v3-labelbox-integration' of https://github.com/activelo…
tyesayan Nov 26, 2024
5cd0dda
fix mypy
activesoull Nov 26, 2024
2baedcf
add cred_id arg for labelbox dataset create functions
tyesayan Nov 26, 2024
34b0761
Merge branch 'v3-labelbox-integration' of https://github.com/activelo…
tyesayan Nov 26, 2024
b49c3e4
fix typo
tyesayan Nov 26, 2024
40af0ac
fix typo
tyesayan Nov 26, 2024
e80e9da
fix remote urls upload to labelbox
Nov 27, 2024
cfeff84
update test_labebox project id
Nov 27, 2024
253a945
add url_presigner to handle delegated access video resources from sto…
tyesayan Nov 27, 2024
d2c8175
Merge branch 'v3-labelbox-integration' of https://github.com/activelo…
tyesayan Nov 27, 2024
6589ffd
faster frame fetching and some improvements in labelbox integration
Nov 27, 2024
879f684
labelbox integration performance improvement and fixes
tyesayan Nov 27, 2024
94f2aea
download videos before generating frames
tyesayan Dec 2, 2024
64aa4e7
fix mask generating issue
tyesayan Dec 2, 2024
d4d2510
fix av error exception
tyesayan Dec 2, 2024
3977c1a
implement custom interpolators for labelbox
tyesayan Dec 2, 2024
f420b76
bring back generating frames from stream for labelbox
tyesayan Dec 2, 2024
cafdfca
fix labelbox annotations interpolation
tyesayan Dec 2, 2024
9ae0cb5
fix labelbox samples count
tyesayan Dec 3, 2024
6ee3bbe
add fail_on_labelbox_project_export_error argument to skip labelbox e…
tyesayan Dec 3, 2024
3dca0a9
Merge branch 'v3' of https://github.com/activeloopai/deeplake into v3…
tyesayan Dec 4, 2024
e8b5e95
fix assertion failure in labelbox integration while parsing segments
tyesayan Dec 4, 2024
a143b4e
add metadata support in labelbox integration
tyesayan Dec 5, 2024
edf0e54
fix labelbox values interpolation
tyesayan Dec 5, 2024
669e26f
add duplication check for create_labelbox_annotation_project
tyesayan Dec 6, 2024
88b548b
fix duplicate label addition for checkboxes during interpolation
tyesayan Dec 7, 2024
60e2426
update connect to labelbox test
tyesayan Dec 8, 2024
f29fceb
reformat labelbox integration files
tyesayan Dec 8, 2024
f30b95b
fix labelbox_utils mypy errors
tyesayan Dec 8, 2024
e3cec14
update labelbox integration test
tyesayan Dec 10, 2024
e70faac
fix labelbox integration test
tyesayan Dec 10, 2024
b68d910
Merge branch 'v3' of https://github.com/activeloopai/deeplake into v3…
tyesayan Dec 10, 2024
1719be5
fix mypy lint error
tyesayan Dec 10, 2024
c15d447
Trigger Action
zaaram Dec 10, 2024
bb18a99
add labelbox azure utils test
tyesayan Dec 11, 2024
560d4b9
Merge branch 'v3-labelbox-integration' of https://github.com/activelo…
tyesayan Dec 11, 2024
d7a6ed2
cleanup labelbox debug classes
tyesayan Dec 11, 2024
b5b2f23
reformat labelbox ingestion files
tyesayan Dec 11, 2024
937b2d9
fixed darglint
activesoull Dec 11, 2024
06dde0c
Merge branch 'v3-labelbox-integration' of github.com:activeloopai/dee…
activesoull Dec 11, 2024
dd3557b
fixed darglint
activesoull Dec 11, 2024
519b357
use labelbox export_v2 instead of export
tyesayan Dec 11, 2024
7b38832
update error log for labelbox project export
tyesayan Dec 11, 2024
b13f582
skip test_connect_to_labelbox
tyesayan Dec 11, 2024
1a2acc0
reformat test_labelbox.py
tyesayan Dec 11, 2024
90839e0
skip test_labelbox_azure_utils
tyesayan Dec 11, 2024
2 changes: 2 additions & 0 deletions .github/workflows/test-push.yml
@@ -149,3 +149,5 @@ jobs:
oauth_client_secret: ${{ secrets.GDRIVE_CLIENT_SECRET }}
oauth_refresh_token: ${{ secrets.GDRIVE_REFRESH_TOKEN }}
sonar_token: ${{ secrets.SONAR_TOKEN }}
labelbox_token: ${{ secrets.LABELBOX_TOKEN }}

7 changes: 7 additions & 0 deletions deeplake/integrations/labelbox/__init__.py
@@ -0,0 +1,7 @@
from deeplake.integrations.labelbox.labelbox_ import (
    create_dataset_for_video_annotation,
    create_dataset_for_video_annotation_with_custom_data_filler,
    create_dataset_from_video_annotation_project,
    create_dataset_from_video_annotation_project_with_custom_data_filler,
    converter_for_video_project_with_id,
)
317 changes: 317 additions & 0 deletions deeplake/integrations/labelbox/labelbox_.py
@@ -0,0 +1,317 @@
import deeplake
import os
import labelbox as lb # type: ignore

from deeplake.integrations.labelbox.labelbox_utils import *
from deeplake.integrations.labelbox.labelbox_converter import labelbox_video_converter
from deeplake.integrations.labelbox.v3_converters import *


def converter_for_video_project_with_id(
    project_id,
    client,
    deeplake_ds_loader,
    lb_api_key,
    group_mapping=None,
    fail_on_error=False,
):
    """
    Creates a converter from a Labelbox video project to the Deeplake dataset format, based on annotation types.

    Args:
        project_id (str): The unique identifier for the Labelbox project to convert.
        client (LabelboxClient): An authenticated Labelbox client instance for API access.
        deeplake_ds_loader (callable): A function that creates/loads a Deeplake dataset given a name.
        lb_api_key (str): Labelbox API key for authentication.
        group_mapping (dict, optional): A dictionary mapping annotation kinds (labelbox_kind) to the desired tensor group name (tensor_name). This mapping determines whether annotations of the same kind should be grouped into the same tensor or kept separate.
        fail_on_error (bool, optional): Whether to raise an exception if data validation fails. Defaults to False.

    Returns:
        labelbox_type_converter or None: Returns a labelbox_type_converter if successful, None if no data is found.
        The returned converter can be used to apply Labelbox annotations to a Deeplake dataset.

    Raises:
        Exception: If project data validation fails and fail_on_error is True.

    Example:
        >>> client = LabelboxClient(api_key='your_api_key')
        >>> converter = converter_for_video_project_with_id(
        ...     '<project_id>',
        ...     client,
        ...     lambda name: deeplake.load(name),
        ...     'your_api_key',
        ...     group_mapping={"raster-segmentation": "mask"}
        ... )
        >>> if converter:
        ...     # Use converter to apply annotations
        ...     ds = converter.dataset_with_applied_annotations()

    Notes:
        - Supports Video ontology from labelbox.
        - The function first validates the project data before setting up converters.
    """
    project_json = labelbox_get_project_json_with_id_(client, project_id, fail_on_error)

    if len(project_json) == 0:
        print("no data")
        return None

    ds_name = project_json[0]["projects"][project_id]["name"]
    deeplake_dataset = deeplake_ds_loader(ds_name)

    if not validate_project_data_(project_json, deeplake_dataset, project_id, "video"):
        if fail_on_error:
            raise Exception("Data validation failed")

    ontology_id = project_json[0]["projects"][project_id]["project_details"][
        "ontology_id"
    ]
    ontology = client.get_ontology(ontology_id)

    converters = {
        "rectangle": bbox_converter_,
        "radio": radio_converter_,
        "checklist": checkbox_converter_,
        "point": point_converter_,
        "line": line_converter_,
        "raster-segmentation": raster_segmentation_converter_,
        "text": text_converter_,
    }
    return labelbox_video_converter(
        ontology,
        converters,
        project_json,
        project_id,
        deeplake_dataset,
        {"ds": deeplake_dataset, "lb_api_key": lb_api_key},
        group_mapping=group_mapping,
    )
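
A note on `group_mapping`: the docstring says it maps an annotation kind to a tensor name, so that annotations of one kind can share a tensor instead of each getting its own. The converter internals are not part of this file, so the following is only a minimal sketch of how such a lookup could behave. `resolve_tensor_name`, `group_annotations`, and the sample feature names are hypothetical and are not taken from this PR.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def resolve_tensor_name(
    kind: str, feature_name: str, group_mapping: Optional[Dict[str, str]] = None
) -> str:
    """Return the mapped tensor name for `kind`, else fall back to the feature name."""
    if group_mapping and kind in group_mapping:
        return group_mapping[kind]
    return feature_name


def group_annotations(
    annotations: List[Tuple[str, str]],
    group_mapping: Optional[Dict[str, str]] = None,
) -> Dict[str, List[str]]:
    """Group (kind, feature_name) pairs by the tensor they would be written to."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for kind, feature_name in annotations:
        tensor_name = resolve_tensor_name(kind, feature_name, group_mapping)
        grouped[tensor_name].append(feature_name)
    return dict(grouped)


# Hypothetical annotations: two masks and one bounding box.
annotations = [
    ("raster-segmentation", "car_mask"),
    ("raster-segmentation", "road_mask"),
    ("rectangle", "car_box"),
]

# Without a mapping every feature keeps its own tensor.
separate = group_annotations(annotations)
# With a mapping both masks land in a single "mask" tensor.
merged = group_annotations(annotations, {"raster-segmentation": "mask"})
```

Under this assumed behavior, kinds that are absent from the mapping are left untouched, which matches the "grouped or kept separate" wording in the docstring.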


def create_dataset_for_video_annotation_with_custom_data_filler(
    deeplake_ds_path,
    video_paths,
    lb_client,
    data_filler,
    deeplake_creds=None,
    deeplake_token=None,
    overwrite=False,
    lb_ontology=None,
    lb_batch_priority=5,
    lb_dataset_name=None,
    fail_on_error=False,
):
    """
    Creates a Deeplake dataset for video annotation and sets up the corresponding Labelbox project.
    Processes videos frame-by-frame using a custom data filler function.

    Args:
        deeplake_ds_path (str): Path where the Deeplake dataset will be created/stored.
            Can be local path or remote path (e.g. 'hub://org/dataset')
        video_paths (List[str]): List of paths to the video files to be processed. Paths can be local or pre-signed remote URLs.
        lb_client (LabelboxClient): Authenticated Labelbox client instance
        data_filler (dict): Dictionary containing two functions:
            - 'create_tensors': callable(ds) -> None
                Creates necessary tensors in the dataset
            - 'fill_data': callable(ds, idx, frame_num, frame) -> None
                Fills dataset with processed frame data
        deeplake_creds (dict, optional): Dictionary containing credentials for deeplake. Defaults to None
        deeplake_token (str, optional): Authentication token for Deeplake cloud storage.
        overwrite (bool, optional): Whether to overwrite existing dataset. Defaults to False
        lb_ontology (Ontology, optional): Labelbox ontology to connect to project. Defaults to None
        lb_batch_priority (int, optional): Priority for Labelbox batches. Defaults to 5
        lb_dataset_name (str, optional): Custom name for Labelbox dataset.
            Defaults to deeplake_ds_path basename + '_from_deeplake'
        fail_on_error (bool, optional): Whether to raise an exception if creating Labelbox batches fails. Defaults to False

    Returns:
        Dataset: Created Deeplake dataset containing processed video frames and metadata for Labelbox project
    """
    ds = deeplake.empty(
        deeplake_ds_path,
        creds=deeplake_creds,
        token=deeplake_token,
        overwrite=overwrite,
    )

    data_filler["create_tensors"](ds)

    for idx, video_path in enumerate(video_paths):
        for frame_num, frame in frame_generator_(video_path):
            data_filler["fill_data"](ds, idx, frame_num, frame)

    if lb_dataset_name is None:
        lb_dataset_name = os.path.basename(deeplake_ds_path) + "_from_deeplake"

    lb_ds = lb_client.create_dataset(name=lb_dataset_name)
    task = lb_ds.create_data_rows(video_paths)
    task.wait_till_done()

    # Create a new project
    project = lb_client.create_project(
        name=os.path.basename(deeplake_ds_path), media_type=lb.MediaType.Video
    )

    ds.info["labelbox_meta"] = {
        "project_id": project.uid,
        "type": "video",
        "sources": video_paths,
    }

    task = project.create_batches_from_dataset(
        name_prefix=lb_dataset_name, dataset_id=lb_ds.uid, priority=lb_batch_priority
    )

    if task.errors():
        if fail_on_error:
            raise Exception(f"Error creating batches: {task.errors()}")

    if lb_ontology:
        project.connect_ontology(lb_ontology)

    ds.commit()

    return ds
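
Both the default Labelbox dataset name and the project name in the function above are derived with `os.path.basename(deeplake_ds_path)`. Here is a quick standard-library check of how that expression behaves for the path shapes mentioned in the docstring. The example paths are made up for illustration, and `default_lb_dataset_name` is a hypothetical helper that only mirrors the expression used in the function.

```python
import os


def default_lb_dataset_name(deeplake_ds_path: str) -> str:
    """Mirror of the default used above: path basename plus a fixed suffix."""
    return os.path.basename(deeplake_ds_path) + "_from_deeplake"


remote_name = default_lb_dataset_name("hub://org/dataset")
local_name = default_lb_dataset_name("./data/videos_ds")
# A trailing slash leaves an empty basename, so only the suffix remains.
trailing_slash_name = default_lb_dataset_name("hub://org/dataset/")

print(remote_name)          # dataset_from_deeplake
print(local_name)           # videos_ds_from_deeplake
print(trailing_slash_name)  # _from_deeplake
```

If callers may pass paths with a trailing slash, stripping it before taking the basename (for example with `rstrip("/")`) would be one way to avoid an empty project name. Whether that case needs handling here is a judgment call for the maintainers.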


def create_dataset_for_video_annotation(
    deeplake_ds_path,
    video_paths,
    lb_client,
    deeplake_creds=None,
    deeplake_token=None,
    overwrite=False,
    lb_ontology=None,
    lb_batch_priority=5,
    fail_on_error=False,
):
    """
    See create_dataset_for_video_annotation_with_custom_data_filler for complete documentation.

    The only difference is this function uses default tensor creation and data filling functions:
    - create_tensors_default_: Creates default tensor structure
    - fill_data_default_: Fills tensors with default processing
    """
    return create_dataset_for_video_annotation_with_custom_data_filler(
        deeplake_ds_path,
        video_paths,
        lb_client,
        data_filler={
            "create_tensors": create_tensors_default_,
            "fill_data": fill_data_default_,
        },
        deeplake_creds=deeplake_creds,
        deeplake_token=deeplake_token,
        lb_ontology=lb_ontology,
        lb_batch_priority=lb_batch_priority,
        overwrite=overwrite,
        fail_on_error=fail_on_error,
    )


def create_dataset_from_video_annotation_project_with_custom_data_filler(
    deeplake_ds_path,
    project_id,
    lb_client,
    lb_api_key,
    data_filler,
    deeplake_creds=None,
    deeplake_token=None,
    overwrite=False,
    fail_on_error=False,
):
    """
    Creates a Deeplake dataset from an existing Labelbox video annotation project using custom data processing.
    Downloads video frames from Labelbox and processes them using provided data filler functions.

    Args:
        deeplake_ds_path (str): Path where the Deeplake dataset will be created/stored.
            Can be local path or cloud path (e.g. 'hub://org/dataset')
        project_id (str): Labelbox project ID to import data from
        lb_client (LabelboxClient): Authenticated Labelbox client instance
        lb_api_key (str): Labelbox API key for accessing video frames
        data_filler (dict): Dictionary containing two functions:
            - 'create_tensors': callable(ds) -> None
                Creates necessary tensors in the dataset
            - 'fill_data': callable(ds, idx, frame_num, frame) -> None
                Fills dataset with processed frame data
        deeplake_creds (dict, optional): Dictionary containing credentials for deeplake. Defaults to None
        deeplake_token (str, optional): Authentication token for Deeplake cloud storage.
            Required if using hub:// path. Defaults to None
        overwrite (bool, optional): Whether to overwrite existing dataset. Defaults to False
        fail_on_error (bool, optional): Whether to raise an exception if data validation fails. Defaults to False

    Returns:
        Dataset: Created Deeplake dataset containing processed video frames and Labelbox metadata.
        Returns empty dataset if no data found in project.

    Notes:
        - The function does not fetch the annotations from Labelbox, only the video frames. After creating the dataset, use the converter to apply annotations.
    """
    ds = deeplake.empty(
        deeplake_ds_path,
        overwrite=overwrite,
        creds=deeplake_creds,
        token=deeplake_token,
    )
    data_filler["create_tensors"](ds)

    proj = labelbox_get_project_json_with_id_(lb_client, project_id, fail_on_error)
    if len(proj) == 0:
        print("no data")
        return ds

    if not validate_project_creation_data_(proj, project_id, "video"):
        if fail_on_error:
            raise Exception("Data validation failed")

    video_files = []

    for idx, p in enumerate(proj):
        video_url = p["data_row"]["row_data"]
        for frame_num, frame in frame_generator_(video_url, f"Bearer {lb_api_key}"):
            data_filler["fill_data"](ds, idx, frame_num, frame)

        video_files.append(p["data_row"]["external_id"])

    ds.info["labelbox_meta"] = {
        "project_id": project_id,
        "type": "video",
        "sources": video_files,
    }

    ds.commit()

    return ds


def create_dataset_from_video_annotation_project(
    deeplake_ds_path,
    project_id,
    lb_client,
    lb_api_key,
    deeplake_creds=None,
    deeplake_token=None,
    overwrite=False,
    fail_on_error=False,
):
    """
    See create_dataset_from_video_annotation_project_with_custom_data_filler for complete documentation.

    The only difference is this function uses default tensor creation and data filling functions:
    - create_tensors_default_: Creates default tensor structure
    - fill_data_default_: Fills tensors with default processing
    """
    return create_dataset_from_video_annotation_project_with_custom_data_filler(
        deeplake_ds_path,
        project_id,
        lb_client,
        lb_api_key,
        data_filler={
            "create_tensors": create_tensors_default_,
            "fill_data": fill_data_default_,
        },
        deeplake_creds=deeplake_creds,
        deeplake_token=deeplake_token,
        overwrite=overwrite,
        fail_on_error=fail_on_error,
    )
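
For readers who want to supply their own `data_filler`, the contract described in the docstrings is a dict with a `create_tensors(ds)` callable and a `fill_data(ds, idx, frame_num, frame)` callable, invoked once per dataset and once per frame respectively. The sketch below exercises that contract without Deeplake or Labelbox installed. `InMemoryDataset`, `fake_frame_generator`, and the tensor names are hypothetical stand-ins and not part of this PR; the real `frame_generator_` decodes actual video frames.

```python
from typing import Any, Dict, Iterator, List, Tuple


class InMemoryDataset:
    """Hypothetical stand-in for a Deeplake dataset: named tensors backed by lists."""

    def __init__(self) -> None:
        self.tensors: Dict[str, List[Any]] = {}

    def create_tensor(self, name: str) -> None:
        self.tensors[name] = []

    def append(self, sample: Dict[str, Any]) -> None:
        for name, value in sample.items():
            self.tensors[name].append(value)


def create_tensors(ds: InMemoryDataset) -> None:
    """Called once, before any frame is processed."""
    for name in ("frames", "video_idx", "frame_idx"):
        ds.create_tensor(name)


def fill_data(ds: InMemoryDataset, idx: int, frame_num: int, frame: Any) -> None:
    """Called once per frame with the video index and the frame number."""
    ds.append({"frames": frame, "video_idx": idx, "frame_idx": frame_num})


def fake_frame_generator(video_path: str) -> Iterator[Tuple[int, str]]:
    """Hypothetical replacement for frame_generator_: yields (frame_num, frame)."""
    for frame_num in range(3):
        yield frame_num, f"{video_path}#frame{frame_num}"


def run_fill(video_paths: List[str], data_filler: Dict[str, Any]) -> InMemoryDataset:
    """Same loop shape as the functions in this file, minus the Labelbox calls."""
    ds = InMemoryDataset()
    data_filler["create_tensors"](ds)
    for idx, video_path in enumerate(video_paths):
        for frame_num, frame in fake_frame_generator(video_path):
            data_filler["fill_data"](ds, idx, frame_num, frame)
    return ds


data_filler = {"create_tensors": create_tensors, "fill_data": fill_data}
ds = run_fill(["a.mp4", "b.mp4"], data_filler)
```

Keeping the two callables in a plain dict mirrors how the default filler is passed in this file, so a custom filler can be swapped in without touching the frame loop.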