diff --git a/docs/source/docker/integration/docker_trigger_from_api.rst b/docs/source/docker/integration/docker_trigger_from_api.rst
index 984669d3a..100155002 100644
--- a/docs/source/docker/integration/docker_trigger_from_api.rst
+++ b/docs/source/docker/integration/docker_trigger_from_api.rst
@@ -1,6 +1,6 @@
 .. _integration-docker-trigger-from-api:
 
-Trigger a Docker Job from the API using a Remote Datasource
+Trigger a Docker Job from the Platform or Code
 ===========================================================
 
 Introduction
@@ -12,7 +12,6 @@
 the provided dataset. The results are immediately available in the webapp for
 visualization and the selected samples are sent back to your
 :ref:`cloud bucket <ref-docker-with-datasource>`.
-
 Advantages
 ----------
 
@@ -20,36 +19,10 @@ Advantages
 - You can automatically trigger a new job when data is added to your dataset.
 - Use the Lightly Docker as a background worker that processes new jobs automatically.
 
-Requirements
-------------
-This recipe requires that you already have a dataset in the Lightly Platform
-configured to use the data in your AWS S3 bucket. Create such a dataset in 2 steps:
-
-1. `Create a new dataset `_ in Lightly.
-   Make sure that you choose the input type `Images` or `Videos` correctly,
-   depending on the type of files in your cloud storage bucket.
-2. Edit your dataset, select the storage source as your datasource and fill out the form.
-   In our example we use an S3 bucket.
-
-   .. figure:: ../../getting_started/resources/LightlyEdit2.png
-      :align: center
-      :alt: Lightly S3 connection config
-      :width: 60%
-
-      Lightly S3 connection config
-
-If you don't know how to fill out the form, follow the full tutorial to
-`create a Lightly dataset connected to your S3 bucket `_.
-
-Furthermore, you should have access to a machine running docker. Ideally, it
-also has a CUDA-GPU. A fast GPU will speed up the process significantly,
-especially for large datasets.
-
 Download the Lightly Docker
 ---------------------------
-Next, the Lightly Docker should be installed. Please follow the instructions for
-the :ref:`ref-docker-setup`.
+Please follow the instructions for the :ref:`ref-docker-setup`.
 
 
 Register the Lightly Docker as a Worker
 ---------------------------------------
@@ -81,92 +54,138 @@ The state of the worker on the `Docker Workers `_
+First, we need a dataset in the Lightly Platform that is configured to use the
+data in your AWS S3 bucket. You can create such a dataset in 2 steps, either
+in the web app or from Python code:
+
+.. tabs::
+
+    .. tab:: Web App
+
+        **Create a Dataset**
+
+        1. `Create a new dataset `_ in Lightly.
+           Make sure that you choose the input type `Images` or `Videos` correctly,
+           depending on the type of files in your cloud storage bucket.
+        2. Edit your dataset, select the storage source as your datasource and fill out the form.
+           In our example we use an S3 bucket.
+
+           .. figure:: ../../getting_started/resources/LightlyEdit2.png
+              :align: center
+              :alt: Lightly S3 connection config
+              :width: 60%
+
+              Lightly S3 connection config
+
+        If you don't know how to fill out the form, follow the full tutorial to
+        `create a Lightly dataset connected to your S3 bucket `_.
+
+    .. tab:: Python Code
+
+        .. literalinclude:: examples/create_dataset.py
+
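+If such a dataset already exists, you can skip the creation step and connect
+the client directly to it by its id. A minimal sketch, assuming ``DATASET_ID``
+is the id shown for your dataset in the Lightly Platform:
+
+.. code-block:: python
+
+    import lightly
+
+    # connect to an existing dataset instead of creating a new one
+    client = lightly.api.ApiWorkflowClient(
+        token="TOKEN",
+        dataset_id="DATASET_ID",
+    )
+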
+And now we can schedule a new job.
+
+.. tabs::
+
+    .. tab:: Web App
+
+        **Trigger the Job**
+
+        To trigger a new job you can click on the schedule run button on the dataset
+        overview as shown in the screenshot below:
+
+        .. image:: images/schedule-compute-run.png
+
+        After clicking on the button you will see a wizard to configure the parameters
+        for the job.
+
+        .. image:: images/schedule-compute-run-config.png
+
+        In our example we use the following parameters.
+
+        .. code-block:: javascript
+            :caption: Docker Config
+
+            {
+                enable_corruptness_check: true,
+                remove_exact_duplicates: true,
+                enable_training: false,
+                pretagging: false,
+                pretagging_debug: false,
+                method: 'coreset',
+                stopping_condition: {
+                    n_samples: 0.1,
+                    min_distance: -1
+                },
+                scorer: 'object-frequency',
+                scorer_config: {
+                    frequency_penalty: 0.25,
+                    min_score: 0.9
+                }
+            }
+
+        .. code-block:: javascript
+            :caption: Lightly Config
+
+            {
+                loader: {
+                    batch_size: 16,
+                    shuffle: true,
+                    num_workers: -1,
+                    drop_last: true
+                },
+                model: {
+                    name: 'resnet-18',
+                    out_dim: 128,
+                    num_ftrs: 32,
+                    width: 1
+                },
+                trainer: {
+                    gpus: 1,
+                    max_epochs: 100,
+                    precision: 32
+                },
+                criterion: {
+                    temperature: 0.5
+                },
+                optimizer: {
+                    lr: 1,
+                    weight_decay: 0.00001
+                },
+                collate: {
+                    input_size: 64,
+                    cj_prob: 0.8,
+                    cj_bright: 0.7,
+                    cj_contrast: 0.7,
+                    cj_sat: 0.7,
+                    cj_hue: 0.2,
+                    min_scale: 0.15,
+                    random_gray_scale: 0.2,
+                    gaussian_blur: 0.5,
+                    kernel_size: 0.1,
+                    vf_prob: 0,
+                    hf_prob: 0.5,
+                    rr_prob: 0
+                }
+            }
+
+        Once the parameters are set you can schedule the run with a click on **schedule**.
+
+    .. tab:: Python Code
+
+        .. literalinclude:: examples/trigger_job_s3.py
+
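+The ``stopping_condition`` decides how much data the run selects. A minimal
+sketch of the two forms ``n_samples`` accepts, assuming a float in [0, 1] is
+interpreted as a fraction of the dataset, an integer as an absolute count, and
+that keys omitted from ``worker_config`` fall back to the defaults above:
+
+.. code-block:: python
+
+    # floats in [0, 1] select a fraction of the dataset, here 10%
+    client.schedule_compute_worker_run(
+        worker_config={'stopping_condition': {'n_samples': 0.1}},
+    )
+
+    # an integer would select an absolute number of samples instead:
+    # worker_config={'stopping_condition': {'n_samples': 500}}
+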
 
 View the progress of the Lightly Docker
 ----------------------------------------
@@ -206,7 +225,7 @@ so you want to add a subset of them to your dataset.
 This workflow is supported by the Lightly Platform using a datapool. It
 remembers which raw data in your S3 bucket has already been processed and
 will ignore it in future docker runs. Thus you can send the same job again to the
-Lightly Worker. It will find your new raw data in the S3 bucket, stream, embed
+worker. It will find your new raw data in the S3 bucket, stream, embed
 and subsample it and then add it to your existing dataset. The samplers will
 take the existing data in your dataset into account when sampling new data
 to be added to your dataset.
diff --git a/docs/source/docker/integration/docker_with_datasource.rst b/docs/source/docker/integration/docker_with_datasource.rst
index 3bfb2f1f1..2fc48fa2f 100644
--- a/docs/source/docker/integration/docker_with_datasource.rst
+++ b/docs/source/docker/integration/docker_with_datasource.rst
@@ -1,7 +1,7 @@
 .. _ref-docker-with-datasource:
 
-Using the docker with an S3 bucket as remote datasource.
+Using the docker with an S3 bucket as remote datasource
 ========================================================
 
 Introduction
diff --git a/docs/source/docker/integration/examples/create_dataset.py b/docs/source/docker/integration/examples/create_dataset.py
new file mode 100644
index 000000000..d0edef7b7
--- /dev/null
+++ b/docs/source/docker/integration/examples/create_dataset.py
@@ -0,0 +1,15 @@
+import lightly
+
+# we create the Lightly client to connect to the API
+client = lightly.api.ApiWorkflowClient(token="TOKEN")
+
+# create a new dataset using Python code
+# and connect it to an existing S3 bucket
+client.create_dataset('dataset-name')
+client.set_s3_config(
+    resource_path="s3://bucket/dataset",
+    region="eu-central-1",
+    access_key="KEY",
+    secret_access_key="SECRET",
+    thumbnail_suffix=None,
+)
diff --git a/docs/source/docker/integration/examples/trigger_job_s3.py b/docs/source/docker/integration/examples/trigger_job_s3.py
new file mode 100644
index 000000000..3c7855538
--- /dev/null
+++ b/docs/source/docker/integration/examples/trigger_job_s3.py
@@ -0,0 +1,73 @@
+import lightly
+
+# we create the Lightly client to connect to the API
+# don't forget to pass the dataset_id if you don't use the client from the
+# previous snippet (create a dataset)
+client = lightly.api.ApiWorkflowClient(token="TOKEN", dataset_id="DATASET_ID")
+
+# schedule the compute run using our custom config
+# we show the full default config here; you can edit the values according
+# to your needs
+client.schedule_compute_worker_run(
+    worker_config={
+        'enable_corruptness_check': True,
+        'remove_exact_duplicates': True,
+        'enable_training': False,
+        'pretagging': False,
+        'pretagging_debug': False,
+        'method': 'coreset',
+        'stopping_condition': {
+            'n_samples': 0.1,
+            'min_distance': -1
+        },
+        'scorer': 'object-frequency',
+        'scorer_config': {
+            'frequency_penalty': 0.25,
+            'min_score': 0.9
+        }
+    },
+    lightly_config={
+        'loader': {
+            'batch_size': 16,
+            'shuffle': True,
+            'num_workers': -1,
+            'drop_last': True
+        },
+        'model': {
+            'name': 'resnet-18',
+            'out_dim': 128,
+            'num_ftrs': 32,
+            'width': 1
+        },
+        'trainer': {
+            'gpus': 1,
+            'max_epochs': 100,
+            'precision': 32
+        },
+        'criterion': {
+            'temperature': 0.5
+        },
+        'optimizer': {
+            'lr': 1,
+            'weight_decay': 0.00001
+        },
+        'collate': {
+            'input_size': 64,
+            'cj_prob': 0.8,
+            'cj_bright': 0.7,
+            'cj_contrast': 0.7,
+            'cj_sat': 0.7,
+            'cj_hue': 0.2,
+            'min_scale': 0.15,
+            'random_gray_scale': 0.2,
+            'gaussian_blur': 0.5,
+            'kernel_size': 0.1,
+            'vf_prob': 0,
+            'hf_prob': 0.5,
+            'rr_prob': 0
+        }
+    }
+)
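+
+# datapool note, a sketch of the re-run workflow described in the docs above:
+# the datapool remembers which raw data in the bucket has already been
+# processed, so once new data has been added to the bucket you can schedule
+# the exact same job again and only the new samples will be considered,
+# for example from a periodic script:
+#
+#   client.schedule_compute_worker_run(
+#       worker_config={...},   # same config as above
+#       lightly_config={...},
+#   )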