Update for removal of distinct V1 and V2 endpoints
PGijsbers committed Nov 28, 2023
1 parent 33b6e42 commit 714eaf2
Showing 1 changed file with 30 additions and 60 deletions.
90 changes: 30 additions & 60 deletions docs/migration.md
@@ -1,31 +1,22 @@
# Migration
This implementation currently maintains two separate endpoints.
There are "old" endpoints (Python-V1), which mimic responses of the PHP REST API (PHP-V1)
as closely as possible, and new endpoints (V2) which have some additional changes and
will be updated going forward.

The advised way to upgrade connector packages that currently interact with the old
JSON API is to first migrate from the old PHP API to our re-implemented V1 API.
See "[V1: PHP to Python](#v1--php-to-python)" for the differences between the PHP and
Python API. After that migration, continue with the "[V1 to V2](#v1-to-v2)" guide.

Connectors currently using the XML API are recommended to upgrade to V2 directly,
in which case using the generated REST API documentation is recommended.

## V1: PHP to Python

The first iteration of the new server has nearly identical responses to the old JSON
endpoints, but there are exceptions. Most exceptions are either bug fixes or arise from
technical limitations. This list covers the most important changes, but there may
be some undocumented changes for edge cases. The PHP API was underspecified, and we
decided that reverse engineering the specifications which mostly arise from
implementation details was not worth the effort. If there is a behavioral change which
was not documented but affects you, please [open a bug report](https://github.com/openml/server-api/issues/new?assignees=&labels=bug%2C+triage&projects=&template=bug-report.md&title=).

### All Endpoints
The Python reimplementation provides the same endpoints as the old API, which
largely function the same way. However, there are a few major deviations:

* Use of typed JSON: e.g., when a value represents an integer, it is returned as integer.
* Lists when multiple values are possible: if a field can have none, one, or multiple entries (e.g., authors), we always return a list.
* Restriction or expansion of input types as appropriate.
* Standardizing authentication and access messages, and consistently executing those checks
before fetching data or providing error messages about the data.
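
A connector that has to work against both the old and the new responses during migration can normalize values on its side. The following is a minimal sketch, not part of the server; the helper names `as_int` and `as_list` are hypothetical, and it assumes the old API may send integers as strings and may send a single value where the new API sends a list.

```python
from typing import Any


def as_int(value: Any) -> int:
    """Accept an integer sent as a JSON number or as a string such as "42"."""
    return int(value)


def as_list(value: Any) -> list:
    """Normalize a field that may be missing, a single value, or a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]
```

Once a connector only talks to the new API, helpers like these should become unnecessary, because typed values and lists are returned directly.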

The list above is not exhaustive. Minor changes include, for example, bug fixes and the removal of unnecessary nesting.
There may be undocumented changes, especially in edge cases which may not have occurred in the test environment.
As the PHP API was underspecified, the re-implementation is based on a mix of reading old code and probing the API.
If there is a behavioral change which was not documented but affects you, please [open a bug report](https://github.com/openml/server-api/issues/new?assignees=&labels=bug%2C+triage&projects=&template=bug-report.md&title=).

## All Endpoints
The following changes affect all endpoints.

#### Error on Invalid Input
### Error on Invalid Input
When providing input of invalid types (e.g., a non-integer dataset id) the HTTP header
and JSON content will be different.

@@ -45,17 +36,15 @@ and JSON content will be different.
These endpoints now enforce stricter input constraints.
Constraints for each endpoint parameter are documented in the API docs.

#### Other Errors
### Other Errors
For any other error messages, the response is identical except that the outer field is `"detail"` instead of `"error"`:

```diff title="JSON Content"
- {"error":{"code":"112","message":"No access granted"}}
+ {"detail":{"code":"112","message":"No access granted"}}
```
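
Client code that must handle both response shapes while migrating could look up the error object under either key. This is a hedged sketch; `extract_error` is a hypothetical helper name, and it only covers error bodies shaped like the example above (an object with a `"code"` field).

```python
from typing import Any, Optional


def extract_error(payload: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the error object from either response shape, or None."""
    # Check the new key first, then fall back to the old one.
    for key in ("detail", "error"):
        value = payload.get(key)
        if isinstance(value, dict) and "code" in value:
            return value
    return None
```

Note that the error code stays a string (`"112"`) in both shapes shown above, so comparisons against it should use strings.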

In some cases the JSON endpoints previously returned XML ([example](https://github.com/openml/OpenML/issues/1200)).
Python-V1 will always return JSON.

In some cases the JSON endpoints previously returned XML ([example](https://github.com/openml/OpenML/issues/1200)); the new API always returns JSON.

```diff title="XML replaced by JSON"
- <oml:error xmlns:oml="http://openml.org/openml">
@@ -65,20 +54,26 @@ Python-V1 will always return JSON.
+ {"detail": {"code":"103", "message": "Authentication failed" } }
```

### Datasets
## Datasets

#### `GET /{dataset_id}`
### `GET /{dataset_id}`
- Dataset format names are normalized to be all lower-case
(`"Sparse_ARFF"` -> `"sparse_arff"`).
- Non-`arff` datasets will not incorrectly have a `"parquet_url"`:
https://github.com/openml/OpenML/issues/1189
- Non-`arff` datasets will not incorrectly have a `"parquet_url"` ([openml#1189](https://github.com/openml/OpenML/issues/1189)).
- If `"creator"` contains multiple comma-separated creators it is always returned
as a list, instead of it depending on the quotation used by the original uploader.
- For (some?) datasets that have multiple values in `"ignore_attribute"`, this field
is correctly populated instead of omitted.
- Processing date is formatted with a `T` in the middle:
```diff title="processing_date"
- "2019-07-09 15:22:03"
+ "2019-07-09T15:22:03"
```
- Fields which may contain lists of values (e.g., `creator`, `contributor`) now always
return a list (which may also be empty or contain a single element).
- Fields without a set value are no longer automatically removed from the response.
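
On the client side, the date and format changes can be absorbed with the standard library. The sketch below is an illustration only, with hypothetical helper names; it relies on `datetime.fromisoformat`, which accepts either a space or a `T` between the date and the time.

```python
from datetime import datetime


def parse_processing_date(value: str) -> datetime:
    """Parse "2019-07-09 15:22:03" and "2019-07-09T15:22:03" alike."""
    return datetime.fromisoformat(value)


def normalize_format(name: str) -> str:
    """Lower-case a dataset format name, e.g. "Sparse_ARFF"."""
    return name.lower()
```

Parsing dates instead of comparing raw strings keeps a connector working against both the old and the new response.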


#### `GET /data/list/{filters}`
### `GET /data/list/{filters}`

The endpoint now accepts the filters in the body of the request, instead of as query parameters.
```diff
@@ -95,28 +90,3 @@ includes datasets which are private.

The `limit` and `offset` parameters can now be used independently; you no longer need
to provide both if you wish to set only one.
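
Because the filters now travel in the request body and `limit` and `offset` are independent, a connector can send only the values it actually sets. The following sketch shows one way to assemble such a filter dictionary. The function name and the flat layout are assumptions for illustration; the exact body schema is not shown here and should be taken from the generated API docs.

```python
from typing import Any, Optional


def build_list_filters(
    limit: Optional[int] = None,
    offset: Optional[int] = None,
    **other_filters: Any,
) -> dict[str, Any]:
    """Collect only the filters that were actually set."""
    candidates = {"limit": limit, "offset": offset, **other_filters}
    # Drop unset filters so that, e.g., `limit` can be sent without `offset`.
    return {key: value for key, value in candidates.items() if value is not None}
```

For example, `build_list_filters(limit=10)` yields a dictionary containing only `limit`, which can then be serialized as the JSON body of the request.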

## V1 to V2
Most of the changes are focused on standardizing responses, working on:

* using JSON types.
* removing levels of nesting for endpoints which return single-field JSON.
* always returning lists for fields which may contain multiple values even if it
contains only one element or no element.
* restricting or expanding input types as appropriate.
* standardizing authentication and access messages, and consistently executing those checks
before fetching data or providing error messages about the data.


### Datasets

#### `GET /{dataset_id}`

- Processing date is formatted with a `T` in the middle:
```diff title="processing_date"
- "2019-07-09 15:22:03"
+ "2019-07-09T15:22:03"
```
- Fields which may contain lists of values (e.g., `creator`, `contributor`) now always
return a list (which may also be empty or contain a single element).
- Fields without a set value are no longer automatically removed from the response.
