Igor lig 724 write python script to test milestone 01 (#726)
* Update imagenette results with new models

* Update benchmark results for ImageNette

* Add script and update tutorial

* Remove `.` from title

* Update title

* Implement feedback

* Update text

* Implement feedback
IgorSusmelj authored Mar 9, 2022
1 parent 4688bc9 commit 2c15ac8
Showing 4 changed files with 224 additions and 117 deletions.
251 changes: 135 additions & 116 deletions docs/source/docker/integration/docker_trigger_from_api.rst
@@ -1,6 +1,6 @@
.. _integration-docker-trigger-from-api:

Trigger a Docker Job from the Platform or Code
==============================================

Introduction
@@ -12,44 +12,17 @@ the provided dataset. The results are immediately available in the webapp for
visualization and the selected samples are sent back to your
:ref:`cloud bucket <platform-create-dataset>`.


Advantages
----------

- You can submit jobs through the API, fully automating the Lightly workflow.
- You can automatically trigger a new job when data is added to your dataset.
- Run the Lightly Docker as a background worker that processes new jobs automatically.

Requirements
------------
This recipe requires that you already have a dataset in the Lightly Platform
configured to use the data in your AWS S3 bucket. Create such a dataset in 2 steps:

1. `Create a new dataset <https://app.lightly.ai/dataset/create>`_ in Lightly.
Make sure that you choose the input type `Images` or `Videos` correctly,
depending on the type of files in your cloud storage bucket.
2. Edit your dataset, select your cloud storage provider as the datasource, and fill out the form.
In our example we use an S3 bucket.

.. figure:: ../../getting_started/resources/LightlyEdit2.png
:align: center
:alt: Lightly S3 connection config
:width: 60%

Lightly S3 connection config

If you don't know how to fill out the form, follow the full tutorial to
`create a Lightly dataset connected to your S3 bucket <https://docs.lightly.ai/getting_started/dataset_creation/dataset_creation_aws_bucket.html>`_.

Furthermore, you should have access to a machine running Docker. Ideally, it
also has a CUDA-capable GPU. A fast GPU will speed up the process
significantly, especially for large datasets.
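
If you are unsure whether your machine exposes a usable GPU, you can run a
quick check from Python. This is a minimal sketch; it assumes ``torch`` is
installed (it ships as a dependency of the ``lightly`` package):

.. code-block:: python

   import torch

   # True if a CUDA-capable GPU is visible to PyTorch; the docker run
   # falls back to CPU otherwise (slower, but functional)
   print(torch.cuda.is_available())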


Download the Lightly Docker
---------------------------
Please follow the instructions for the :ref:`ref-docker-setup`.


Register the Lightly Docker as a Worker
@@ -81,92 +54,138 @@ The state of the worker on the `Docker Workers <https://app.lightly.ai/docker/workers>`_
page should now indicate that the worker is in an idle state.
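
Registration can also be scripted from Python. The following is a minimal
sketch; it assumes the ``register_compute_worker`` method of the
``ApiWorkflowClient``, so check that your installed version of the
``lightly`` package exposes it before relying on it.

.. code-block:: python

   import lightly

   # connect to the Lightly API
   client = lightly.api.ApiWorkflowClient(token="TOKEN")

   # register a worker and keep its id; the docker container must be started
   # with this worker_id so that it can pick up scheduled jobs
   worker_id = client.register_compute_worker(name="my-worker")
   print(worker_id)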


Triggering a Job through the API
--------------------------------

To trigger a new job you can click on the **Schedule Run** button on the dataset
overview, as shown in the screenshot below:

.. image:: images/schedule-compute-run.png

After clicking on the button you will see a wizard to configure the parameters
for the job.

.. image:: images/schedule-compute-run-config.png

In our example we use the following parameters.



.. code-block:: javascript
   :caption: Docker Config

   {
     enable_corruptness_check: true,
     remove_exact_duplicates: true,
     enable_training: false,
     pretagging: false,
     pretagging_debug: false,
     method: 'coreset',
     stopping_condition: {
       n_samples: 0.1,
       min_distance: -1
     },
     scorer: 'object-frequency',
     scorer_config: {
       frequency_penalty: 0.25,
       min_score: 0.9
     }
   }

.. code-block:: javascript
   :caption: Lightly Config

   {
     loader: {
       batch_size: 16,
       shuffle: true,
       num_workers: -1,
       drop_last: true
     },
     model: {
       name: 'resnet-18',
       out_dim: 128,
       num_ftrs: 32,
       width: 1
     },
     trainer: {
       gpus: 1,
       max_epochs: 100,
       precision: 32
     },
     criterion: {
       temperature: 0.5
     },
     optimizer: {
       lr: 1,
       weight_decay: 0.00001
     },
     collate: {
       input_size: 64,
       cj_prob: 0.8,
       cj_bright: 0.7,
       cj_contrast: 0.7,
       cj_sat: 0.7,
       cj_hue: 0.2,
       min_scale: 0.15,
       random_gray_scale: 0.2,
       gaussian_blur: 0.5,
       kernel_size: 0.1,
       vf_prob: 0,
       hf_prob: 0.5,
       rr_prob: 0
     }
   }

Once the parameters are set, you can schedule the run by clicking **Schedule**.

Create a Dataset and Trigger a Job
-----------------------------------

There are two ways to trigger a new job: through the user interface of our
Web App, or programmatically using our Python package.


.. tabs::

.. tab:: Web App

**Create a Dataset**

This recipe requires that you already have a dataset in the Lightly Platform
configured to use the data in your AWS S3 bucket. Create such a dataset in 2 steps:

1. `Create a new dataset <https://app.lightly.ai/dataset/create>`_ in Lightly.
Make sure that you choose the input type `Images` or `Videos` correctly,
depending on the type of files in your cloud storage bucket.
2. Edit your dataset, select your cloud storage provider as the datasource, and fill out the form.
In our example we use an S3 bucket.

.. figure:: ../../getting_started/resources/LightlyEdit2.png
:align: center
:alt: Lightly S3 connection config
:width: 60%

Lightly S3 connection config

If you don't know how to fill out the form, follow the full tutorial to
`create a Lightly dataset connected to your S3 bucket <https://docs.lightly.ai/getting_started/dataset_creation/dataset_creation_aws_bucket.html>`_.


.. tab:: Python Code

.. literalinclude:: examples/create_dataset.py

And now we can schedule a new job.

.. tabs::

.. tab:: Web App

**Trigger the Job**

To trigger a new job you can click on the **Schedule Run** button on the dataset
overview, as shown in the screenshot below:

.. image:: images/schedule-compute-run.png

After clicking on the button you will see a wizard to configure the parameters
for the job.

.. image:: images/schedule-compute-run-config.png

In our example we use the following parameters.



.. code-block:: javascript
   :caption: Docker Config

   {
     enable_corruptness_check: true,
     remove_exact_duplicates: true,
     enable_training: false,
     pretagging: false,
     pretagging_debug: false,
     method: 'coreset',
     stopping_condition: {
       n_samples: 0.1,
       min_distance: -1
     },
     scorer: 'object-frequency',
     scorer_config: {
       frequency_penalty: 0.25,
       min_score: 0.9
     }
   }

.. code-block:: javascript
   :caption: Lightly Config

   {
     loader: {
       batch_size: 16,
       shuffle: true,
       num_workers: -1,
       drop_last: true
     },
     model: {
       name: 'resnet-18',
       out_dim: 128,
       num_ftrs: 32,
       width: 1
     },
     trainer: {
       gpus: 1,
       max_epochs: 100,
       precision: 32
     },
     criterion: {
       temperature: 0.5
     },
     optimizer: {
       lr: 1,
       weight_decay: 0.00001
     },
     collate: {
       input_size: 64,
       cj_prob: 0.8,
       cj_bright: 0.7,
       cj_contrast: 0.7,
       cj_sat: 0.7,
       cj_hue: 0.2,
       min_scale: 0.15,
       random_gray_scale: 0.2,
       gaussian_blur: 0.5,
       kernel_size: 0.1,
       vf_prob: 0,
       hf_prob: 0.5,
       rr_prob: 0
     }
   }

Once the parameters are set, you can schedule the run by clicking **Schedule**.

.. tab:: Python Code

.. literalinclude:: examples/trigger_job_s3.py


View the progress of the Lightly Docker
@@ -206,7 +225,7 @@ so you want to add a subset of them to your dataset.
This workflow is supported by the Lightly Platform using a datapool. It
remembers which raw data in your S3 bucket has already been processed and will
ignore it in future docker runs. Thus you can send the same job again to the
worker. It will find your new raw data in the S3 bucket, stream, embed
and subsample it and then add it to your existing dataset. The samplers will
take the existing data in your dataset into account when sampling new data to be
added to your dataset.
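
Because the datapool remembers what has already been processed, re-running a
job is simply another call to ``schedule_compute_worker_run`` with the same
configuration. The sketch below passes only a ``worker_config`` and assumes
that omitted options fall back to their defaults; in practice, reuse the
configuration from your original run.

.. code-block:: python

   import lightly

   client = lightly.api.ApiWorkflowClient(token="TOKEN", dataset_id="DATASET_ID")

   # schedule the same job again; only raw data that was not part of a
   # previous run is streamed, embedded and subsampled
   client.schedule_compute_worker_run(
       worker_config={
           'method': 'coreset',
           'stopping_condition': {
               'n_samples': 0.1,
               'min_distance': -1
           },
       },
   )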
2 changes: 1 addition & 1 deletion docs/source/docker/integration/docker_with_datasource.rst
@@ -1,7 +1,7 @@

.. _ref-docker-with-datasource:

Using the docker with an S3 bucket as remote datasource
========================================================

Introduction
Expand Down
15 changes: 15 additions & 0 deletions docs/source/docker/integration/examples/create_dataset.py
@@ -0,0 +1,15 @@
import lightly

# we create the Lightly client to connect to the API
client = lightly.api.ApiWorkflowClient(token="TOKEN")

# create a new dataset using Python code
# and connect it to an existing S3 bucket
client.create_dataset('dataset-name')
client.set_s3_config(
resource_path="s3://bucket/dataset",
region="eu-central-1",
access_key="KEY",
secret_access_key="SECRET",
thumbnail_suffix=None,
)
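
# note: after create_dataset() the client keeps track of the new dataset, so
# its id can be read back for later use, e.g. when scheduling a job on this
# dataset (the dataset_id attribute is assumed from the lightly client API)
print(client.dataset_id)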
73 changes: 73 additions & 0 deletions docs/source/docker/integration/examples/trigger_job_s3.py
@@ -0,0 +1,73 @@
import lightly

# we create the Lightly client to connect to the API
# don't forget to pass the dataset_id if you don't reuse the client from the
# previous snippet (dataset creation)
client = lightly.api.ApiWorkflowClient(token="TOKEN", dataset_id="DATASET_ID")

# schedule the compute run using our custom config.
# we show the full default config here; you can easily edit the
# values according to your needs.
client.schedule_compute_worker_run(
    worker_config={
        'enable_corruptness_check': True,
        'remove_exact_duplicates': True,
        'enable_training': False,
        'pretagging': False,
        'pretagging_debug': False,
        'method': 'coreset',
        'stopping_condition': {
            'n_samples': 0.1,
            'min_distance': -1
        },
        'scorer': 'object-frequency',
        'scorer_config': {
            'frequency_penalty': 0.25,
            'min_score': 0.9
        }
    },
lightly_config={
'loader': {
'batch_size': 16,
'shuffle': True,
'num_workers': -1,
'drop_last': True
},
'model': {
'name': 'resnet-18',
'out_dim': 128,
'num_ftrs': 32,
'width': 1
},
'trainer': {
'gpus': 1,
'max_epochs': 100,
'precision': 32
},
'criterion': {
'temperature': 0.5
},
'optimizer': {
'lr': 1,
'weight_decay': 0.00001
},
'collate': {
'input_size': 64,
'cj_prob': 0.8,
'cj_bright': 0.7,
'cj_contrast': 0.7,
'cj_sat': 0.7,
'cj_hue': 0.2,
'min_scale': 0.15,
'random_gray_scale': 0.2,
'gaussian_blur': 0.5,
'kernel_size': 0.1,
'vf_prob': 0,
'hf_prob': 0.5,
'rr_prob': 0
}
}
)
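
# once scheduled, the registered worker picks up the job automatically; the
# progress of the run can then be followed in the web app, as described in
# the section on viewing the progress of the Lightly Docker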
