address feedback

stac-extensions · Nov 4, 2024 · cd8ab8c
1 parent 0989894
commit cd8ab8c
Showing 1 changed file with 21 additions and 18 deletions.
diff --git a/MIGRATION_TO_MLM.md b/MIGRATION_TO_MLM.md
@@ -6,10 +6,10 @@ The ML Model Extension was started at Radiant Earth on October 4th, 2021. It was
 
 ## Shared Goals
 
-Both the ML Model Extension and the Machine Learning Model (MLM) Extension aim to provide a standard way to catalog machine learning (ML) models that work with Earth observation (EO) data. Their main goals are:
+Both the ML Model Extension and the Machine Learning Model (MLM) Extension aim to provide a standard way to catalog machine learning (ML) models that work with, but are not limited to, Earth observation (EO) data. Their main goals are:
 
 1. **Search and Discovery**: Helping users find and use ML models.
-2. **Describing Inference Requirements**: Making it easier to run these models by describing input requirements and outputs.
+2. **Describing Inference and Training Requirements**: Making it easier to run these models by describing input requirements and outputs.
 3. **Reproducibility**: Providing runtime information and links to assets so that model inference is reproducible.
 
 ## Schema Changes
@@ -39,6 +39,7 @@ Notable differences:
 - The MLM Extension covers more details at both the Item and Asset levels, making it easier to describe and use model metadata.
 - The MLM Extension covers Runtime requirements within the [Container Asset](https://github.com/crim-ca/mlm-extension?tab=readme-ov-file#container-asset), while the ML Model Extension records [similar information](./README.md#inferencetraining-runtimes) in the `ml-model:inference-runtime` or `ml-model:training-runtime` asset roles.
 - The MLM extension has a corresponding Python library, [`stac-model`](https://pypi.org/project/stac-model/) which can be used to create and validate MLM metadata. An example of the library in action is [here](https://github.com/crim-ca/mlm-extension/blob/main/stac_model/examples.py#L14). The ML Model extension does not support this and requires the JSON to be written manually by interpreting the JSON Schema or existing examples.
+- MLM is easier to maintain and enhance in a fast-moving ML ecosystem thanks to its use of pydantic models, while still being compatible with pystac for extension and STAC core validation.
 
 ## Changes in Field Names
 
@@ -49,27 +50,29 @@ Notable differences:
 | `ml-model:type`                    | N/A                | No direct equivalent, it is implied by the `mlm` prefix in MLM fields and directly specified by the schema identifier.                                                                                                                                                                                                                                    |
 | `ml-model:learning_approach`       | `mlm:tasks`        | Removed in favor of specifying specific `mlm:tasks`.                                                                                                                                                                                                                                                                                                      |
 | `ml-model:prediction_type`         | `mlm:tasks`        | `mlm:tasks` provides a more comprehensive enum of prediction types.                                                                                                                                                                                                                                                                                       |
-| `ml-model:architecture`            | `mlm:architecture` | The MLM provides specific guidance on using Papers With Code - Computer Vision identifiers for model architectures. No guidance is provided in ML Model.                                                                                                                                                                                                  |
-| `ml-model:training-processor-type` | `mlm:accelerator`  | MLM defines more choices for accelerators in an enum and specifies that this is the accelerator for inference (the focus of the MLM extension is inference). ML Model only accepts `cpu` or `gpu` but this isn't sufficient today where we have models optimized for different CPU architectures, CUDA GPUs, Intel GPUs, AMD GPUs, Mac Silicon, and TPUs. |
+| `ml-model:architecture`            | `mlm:architecture` | The MLM provides specific guidance on using *Papers With Code - Computer Vision* identifiers for model architectures. No guidance is provided in ML Model.                                                                                                                                                                                                  |
+| `ml-model:training-processor-type` | `mlm:accelerator`  | MLM defines more choices for accelerators in an enum and specifies that this is the accelerator for inference. ML Model only accepts `cpu` or `gpu`, which is no longer sufficient now that models are optimized for different CPU architectures, CUDA GPUs, Intel GPUs, AMD GPUs, Apple Silicon, and TPUs. |
 | `ml-model:training-os`             | N/A                | This field is no longer recommended in the MLM for training or inference; instead, users can specify an optional `mlm:training-runtime` asset.                                                                                                                                                                                                            |
 
 
 ### New Fields in MLM
 
-- **`mlm:name`**: A required name for the model.
-- **`mlm:framework`**: The framework used to train the model.
-- **`mlm:framework_version`**: The version of the framework. Useful in case a container runtime asset is not specified or if the consumer of the MLM wants to run the model outside of a container.
-- **`mlm:memory_size`**: The in-memory size of the model.
-- **`mlm:total_parameters`**: Total number of model parameters.
-- **`mlm:pretrained`**: Indicates if the model is derived from a pretrained model.
-- **`mlm:pretrained_source`**: Source of the pretrained model by name or URL if it is less well known.
-- **`mlm:batch_size_suggestion`**: Suggested batch size for the given accelerator.
-- **`mlm:accelerator_constrained`**: Indicates if the model requires a specific accelerator.
-- **`mlm:accelerator_summary`**: Description of the accelerator. This might contain details on the exact accelerator version (TPUv4 vs TPUv5) and their configuration.
-- **`mlm:accelerator_count`**: Minimum number of accelerator instances required.
-- **`mlm:input`**: Describes the model's input shape, dtype, and normalization and resize transformations.
-- **`mlm:output`**: Describes the model's output shape and dtype.
-- **`mlm:hyperparameters`**: Additional hyperparameters relevant to the model.
+| Field Name                       | Description                                                                                                             |
+|----------------------------------|-------------------------------------------------------------------------------------------------------------------------|
+| **`mlm:name`**                   | A required name for the model.                                                                                          |
+| **`mlm:framework`**              | The framework used to train the model.                                                                                  |
+| **`mlm:framework_version`**      | The version of the framework. Useful in case a container runtime asset is not specified or if the consumer of the MLM wants to run the model outside of a container. |
+| **`mlm:memory_size`**            | The in-memory size of the model.                                                                                        |
+| **`mlm:total_parameters`**       | Total number of model parameters.                                                                                       |
+| **`mlm:pretrained`**             | Indicates if the model is derived from a pretrained model.                                                              |
+| **`mlm:pretrained_source`**      | Source of the pretrained model by name or URL if it is less well known.                                                 |
+| **`mlm:batch_size_suggestion`**  | Suggested batch size for the given accelerator.                                                                         |
+| **`mlm:accelerator_constrained`**| Indicates if the model requires a specific accelerator.                                                                 |
+| **`mlm:accelerator_summary`**    | Description of the accelerator. This might contain details on the exact accelerator version (TPUv4 vs TPUv5) and their configuration. |
+| **`mlm:accelerator_count`**      | Minimum number of accelerator instances required.                                                                       |
+| **`mlm:input`**                  | Describes the model's input shape, dtype, and normalization and resize transformations.                                 |
+| **`mlm:output`**                 | Describes the model's output shape and dtype.                                                                           |
+| **`mlm:hyperparameters`**        | Additional hyperparameters relevant to the model.                                                                       |
 
 ### Asset Objects