Igor lig 724 write python script to test milestone 01 (#726)
* Update imagenette results with new models

* Update benchmark results for ImageNette

* Add script and update tutorial

* Remove `.` from title

* Update title

* Implement feedback

* Update text

* Implement feedback
IgorSusmelj authored Mar 9, 2022
1 parent 4688bc9 commit 2c15ac8
Showing 4 changed files with 224 additions and 117 deletions.
251 changes: 135 additions & 116 deletions docs/source/docker/integration/docker_trigger_from_api.rst
@@ -1,6 +1,6 @@
.. _integration-docker-trigger-from-api:

Trigger a Docker Job from the Platform or Code
==============================================

Introduction
@@ -12,44 +12,17 @@ the provided dataset. The results are immediately available in the webapp for
visualization and the selected samples are sent back to your
:ref:`cloud bucket <platform-create-dataset>`.


Advantages
----------

- You can submit jobs through the API, fully automating the Lightly workflow.
- You can automatically trigger a new job when data is added to your dataset.
- Run the Lightly Docker as a background worker that processes new jobs automatically.

Requirements
------------
This recipe requires that you already have a dataset in the Lightly Platform
configured to use the data in your AWS S3 bucket. Create such a dataset in 2 steps:

1. `Create a new dataset <https://app.lightly.ai/dataset/create>`_ in Lightly.
Make sure that you choose the input type `Images` or `Videos` correctly,
depending on the type of files in your cloud storage bucket.
2. Edit your dataset, select your cloud storage provider as the datasource, and fill out the form.
In our example we use an S3 bucket.

.. figure:: ../../getting_started/resources/LightlyEdit2.png
:align: center
:alt: Lightly S3 connection config
:width: 60%

Lightly S3 connection config

If you don't know how to fill out the form, follow the full tutorial to
`create a Lightly dataset connected to your S3 bucket <https://docs.lightly.ai/getting_started/dataset_creation/dataset_creation_aws_bucket.html>`_.

Furthermore, you should have access to a machine running Docker. Ideally, it
also has a CUDA-capable GPU. A fast GPU will speed up the process
significantly, especially for large datasets.
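
If you are unsure whether your machine exposes a usable GPU, you can run a
quick check from Python. This is a minimal sketch; it assumes ``torch`` is
installed (it ships as a dependency of the ``lightly`` package):

.. code-block:: python

   import torch

   # True if a CUDA-capable GPU is visible to PyTorch; the docker run
   # falls back to CPU otherwise (slower, but functional)
   print(torch.cuda.is_available())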


Download the Lightly Docker
---------------------------
Please follow the instructions for the :ref:`ref-docker-setup`.


Register the Lightly Docker as a Worker
@@ -81,92 +54,138 @@ The state of the worker on the `Docker Workers <https://app.lightly.ai/docker/workers>`_
page should now indicate that the worker is in an idle state.
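
Registration can also be scripted from Python. The following is a minimal
sketch; it assumes the ``register_compute_worker`` method of the
``ApiWorkflowClient``, so check that your installed version of the
``lightly`` package exposes it before relying on it.

.. code-block:: python

   import lightly

   # connect to the Lightly API
   client = lightly.api.ApiWorkflowClient(token="TOKEN")

   # register a worker and keep its id; the docker container must be started
   # with this worker_id so that it can pick up scheduled jobs
   worker_id = client.register_compute_worker(name="my-worker")
   print(worker_id)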


Triggering a Job through the API
--------------------------------

To trigger a new job you can click on the **Schedule Run** button on the dataset
overview, as shown in the screenshot below:

.. image:: images/schedule-compute-run.png

After clicking on the button you will see a wizard to configure the parameters
for the job.

.. image:: images/schedule-compute-run-config.png

In our example we use the following parameters.



.. code-block:: javascript
   :caption: Docker Config

   {
     enable_corruptness_check: true,
     remove_exact_duplicates: true,
     enable_training: false,
     pretagging: false,
     pretagging_debug: false,
     method: 'coreset',
     stopping_condition: {
       n_samples: 0.1,
       min_distance: -1
     },
     scorer: 'object-frequency',
     scorer_config: {
       frequency_penalty: 0.25,
       min_score: 0.9
     }
   }

.. code-block:: javascript
   :caption: Lightly Config

   {
     loader: {
       batch_size: 16,
       shuffle: true,
       num_workers: -1,
       drop_last: true
     },
     model: {
       name: 'resnet-18',
       out_dim: 128,
       num_ftrs: 32,
       width: 1
     },
     trainer: {
       gpus: 1,
       max_epochs: 100,
       precision: 32
     },
     criterion: {
       temperature: 0.5
     },
     optimizer: {
       lr: 1,
       weight_decay: 0.00001
     },
     collate: {
       input_size: 64,
       cj_prob: 0.8,
       cj_bright: 0.7,
       cj_contrast: 0.7,
       cj_sat: 0.7,
       cj_hue: 0.2,
       min_scale: 0.15,
       random_gray_scale: 0.2,
       gaussian_blur: 0.5,
       kernel_size: 0.1,
       vf_prob: 0,
       hf_prob: 0.5,
       rr_prob: 0
     }
   }

Once the parameters are set, you can schedule the run by clicking **Schedule**.

Create a Dataset and Trigger a Job
-----------------------------------

There are two ways to trigger a new job: through the user interface of our
Web App, or programmatically using our Python package.


.. tabs::

.. tab:: Web App

**Create a Dataset**

This recipe requires that you already have a dataset in the Lightly Platform
configured to use the data in your AWS S3 bucket. Create such a dataset in 2 steps:

1. `Create a new dataset <https://app.lightly.ai/dataset/create>`_ in Lightly.
Make sure that you choose the input type `Images` or `Videos` correctly,
depending on the type of files in your cloud storage bucket.
2. Edit your dataset, select your cloud storage provider as the datasource, and fill out the form.
In our example we use an S3 bucket.

.. figure:: ../../getting_started/resources/LightlyEdit2.png
:align: center
:alt: Lightly S3 connection config
:width: 60%

Lightly S3 connection config

If you don't know how to fill out the form, follow the full tutorial to
`create a Lightly dataset connected to your S3 bucket <https://docs.lightly.ai/getting_started/dataset_creation/dataset_creation_aws_bucket.html>`_.


.. tab:: Python Code

.. literalinclude:: examples/create_dataset.py

And now we can schedule a new job.

.. tabs::

.. tab:: Web App

**Trigger the Job**

To trigger a new job you can click on the **Schedule Run** button on the dataset
overview, as shown in the screenshot below:

.. image:: images/schedule-compute-run.png

After clicking on the button you will see a wizard to configure the parameters
for the job.

.. image:: images/schedule-compute-run-config.png

In our example we use the following parameters.



.. code-block:: javascript
   :caption: Docker Config

   {
     enable_corruptness_check: true,
     remove_exact_duplicates: true,
     enable_training: false,
     pretagging: false,
     pretagging_debug: false,
     method: 'coreset',
     stopping_condition: {
       n_samples: 0.1,
       min_distance: -1
     },
     scorer: 'object-frequency',
     scorer_config: {
       frequency_penalty: 0.25,
       min_score: 0.9
     }
   }

.. code-block:: javascript
   :caption: Lightly Config

   {
     loader: {
       batch_size: 16,
       shuffle: true,
       num_workers: -1,
       drop_last: true
     },
     model: {
       name: 'resnet-18',
       out_dim: 128,
       num_ftrs: 32,
       width: 1
     },
     trainer: {
       gpus: 1,
       max_epochs: 100,
       precision: 32
     },
     criterion: {
       temperature: 0.5
     },
     optimizer: {
       lr: 1,
       weight_decay: 0.00001
     },
     collate: {
       input_size: 64,
       cj_prob: 0.8,
       cj_bright: 0.7,
       cj_contrast: 0.7,
       cj_sat: 0.7,
       cj_hue: 0.2,
       min_scale: 0.15,
       random_gray_scale: 0.2,
       gaussian_blur: 0.5,
       kernel_size: 0.1,
       vf_prob: 0,
       hf_prob: 0.5,
       rr_prob: 0
     }
   }

Once the parameters are set, you can schedule the run by clicking **Schedule**.

.. tab:: Python Code

.. literalinclude:: examples/trigger_job_s3.py


View the progress of the Lightly Docker
@@ -206,7 +225,7 @@ so you want to add a subset of them to your dataset.
This workflow is supported by the Lightly Platform using a datapool. It
remembers which raw data in your S3 bucket has already been processed and will
ignore it in future docker runs. Thus you can send the same job again to the
worker. It will find your new raw data in the S3 bucket, stream, embed
and subsample it and then add it to your existing dataset. The samplers will
take the existing data in your dataset into account when sampling new data to be
added to your dataset.
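
Because the datapool remembers what has already been processed, re-running a
job is simply another call to ``schedule_compute_worker_run`` with the same
configuration. The sketch below passes only a ``worker_config`` and assumes
that omitted options fall back to their defaults; in practice, reuse the
configuration from your original run.

.. code-block:: python

   import lightly

   client = lightly.api.ApiWorkflowClient(token="TOKEN", dataset_id="DATASET_ID")

   # schedule the same job again; only raw data that was not part of a
   # previous run is streamed, embedded and subsampled
   client.schedule_compute_worker_run(
       worker_config={
           'method': 'coreset',
           'stopping_condition': {
               'n_samples': 0.1,
               'min_distance': -1
           },
       },
   )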
2 changes: 1 addition & 1 deletion docs/source/docker/integration/docker_with_datasource.rst
@@ -1,7 +1,7 @@

.. _ref-docker-with-datasource:

Using the docker with an S3 bucket as remote datasource
========================================================

Introduction
Expand Down
15 changes: 15 additions & 0 deletions docs/source/docker/integration/examples/create_dataset.py
@@ -0,0 +1,15 @@
import lightly

# we create the Lightly client to connect to the API
client = lightly.api.ApiWorkflowClient(token="TOKEN")

# create a new dataset using Python code
# and connect it to an existing S3 bucket
client.create_dataset('dataset-name')
client.set_s3_config(
resource_path="s3://bucket/dataset",
region="eu-central-1",
access_key="KEY",
secret_access_key="SECRET",
thumbnail_suffix=None,
)
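
# note: after create_dataset() the client keeps track of the new dataset, so
# its id can be read back for later use, e.g. when scheduling a job on this
# dataset (the dataset_id attribute is assumed from the lightly client API)
print(client.dataset_id)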
73 changes: 73 additions & 0 deletions docs/source/docker/integration/examples/trigger_job_s3.py
@@ -0,0 +1,73 @@
import lightly

# we create the Lightly client to connect to the API
# don't forget to pass the dataset_id if you don't reuse the client from the
# previous snippet (dataset creation)
client = lightly.api.ApiWorkflowClient(token="TOKEN", dataset_id="DATASET_ID")

# schedule the compute run using our custom config.
# we show the full default config here; you can easily edit the
# values according to your needs.
client.schedule_compute_worker_run(
    worker_config={
        'enable_corruptness_check': True,
        'remove_exact_duplicates': True,
        'enable_training': False,
        'pretagging': False,
        'pretagging_debug': False,
        'method': 'coreset',
        'stopping_condition': {
            'n_samples': 0.1,
            'min_distance': -1
        },
        'scorer': 'object-frequency',
        'scorer_config': {
            'frequency_penalty': 0.25,
            'min_score': 0.9
        }
    },
lightly_config={
'loader': {
'batch_size': 16,
'shuffle': True,
'num_workers': -1,
'drop_last': True
},
'model': {
'name': 'resnet-18',
'out_dim': 128,
'num_ftrs': 32,
'width': 1
},
'trainer': {
'gpus': 1,
'max_epochs': 100,
'precision': 32
},
'criterion': {
'temperature': 0.5
},
'optimizer': {
'lr': 1,
'weight_decay': 0.00001
},
'collate': {
'input_size': 64,
'cj_prob': 0.8,
'cj_bright': 0.7,
'cj_contrast': 0.7,
'cj_sat': 0.7,
'cj_hue': 0.2,
'min_scale': 0.15,
'random_gray_scale': 0.2,
'gaussian_blur': 0.5,
'kernel_size': 0.1,
'vf_prob': 0,
'hf_prob': 0.5,
'rr_prob': 0
}
}
)
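
# once scheduled, the registered worker picks up the job automatically; the
# progress of the run can then be followed in the web app, as described in
# the section on viewing the progress of the Lightly Docker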
