Refactor/remove v1 v2 #115

Merged
merged 3 commits into from
Nov 28, 2023
90 changes: 30 additions & 60 deletions docs/migration.md
@@ -1,31 +1,22 @@
# Migration
This implementation currently maintains two separate endpoints.
There are "old" endpoints (Python-V1), which mimic responses of the PHP REST API (PHP-V1)
as closely as possible, and new endpoints (V2) which have some additional changes and
will be updated going forward.

The advised way to upgrade connector packages that currently interact with the old
JSON API is to first migrate from the old PHP API to our re-implemented V1 API.
See "[V1: PHP to Python](#v1--php-to-python)" for the differences between the PHP and
Python API. After that migration, continue with the "[V1 to V2](#v1-to-v2)" guide.

Connectors currently using the XML API are recommended to upgrade to V2 directly,
in which case using the generated REST API documentation is recommended.

## V1: PHP to Python

The first iteration of the new server has nearly identical responses to the old JSON
endpoints, but there are exceptions. Most exceptions are either bug fixes, or arise from
technical limitations. This list covers the most important changes, but there may
be some undocumented changes for edge cases. The PHP API was underspecified, and we
decided that reverse engineering the specifications which mostly arise from
implementation details was not worth the effort. If there is a behavioral change which
was not documented but affects you, please [open a bug report](https://github.com/openml/server-api/issues/new?assignees=&labels=bug%2C+triage&projects=&template=bug-report.md&title=).

### All Endpoints
The Python reimplementation provides the same endpoints as the old API, which are
largely functioning the same way. However, there are a few major deviations:

* Use of typed JSON: e.g., when a value represents an integer, it is returned as integer.
* Lists when multiple values are possible: if a field can have none, one, or multiple entries (e.g., authors), we always return a list.
* Restriction or expansion of input types as appropriate.
* Standardizing authentication and access messages, and consistently executing those checks
before fetching data or providing error messages about the data.

The list above is not exhaustive. Minor changes include, for example, bug fixes and the removal of unnecessary nesting.
There may be undocumented changes, especially in edge cases which may not have occurred in the test environment.
As the PHP API was underspecified, the re-implementation is based on a mix of reading old code and probing the API.
If there is a behavioral change which was not documented but affects you, please [open a bug report](https://github.com/openml/server-api/issues/new?assignees=&labels=bug%2C+triage&projects=&template=bug-report.md&title=).

## All Endpoints
The following changes affect all endpoints.

#### Error on Invalid Input
### Error on Invalid Input
When providing input of invalid types (e.g., a non-integer dataset id) the HTTP header
and JSON content will be different.

@@ -45,17 +36,15 @@ and JSON content will be different.
These endpoints now enforce stricter input constraints.
Constraints for each endpoint parameter are documented in the API docs.

#### Other Errors
### Other Errors
For any other error messages, the response is identical except that the outer field will be `"detail"` instead of `"error"`:

```diff title="JSON Content"
- {"error":{"code":"112","message":"No access granted"}}
+ {"detail":{"code":"112","message":"No access granted"}}
```
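
A client that must work against both the PHP API and the new API during migration could read the error from whichever key is present. The following is a minimal sketch, not part of this repository: the helper name `extract_error` is hypothetical, and it assumes the error body is the `{"code": ..., "message": ...}` object shown above.

```python
from typing import Any, Dict, Optional, Tuple


def extract_error(payload: Dict[str, Any]) -> Optional[Tuple[int, str]]:
    """Return (code, message) from an error response, or None if none is found.

    Hypothetical helper: accepts the new "detail" key and falls back to the
    old "error" key used by the PHP API.
    """
    body = payload.get("detail", payload.get("error"))
    # Anything that is not a code/message object is treated as "no known error".
    if not isinstance(body, dict) or "code" not in body or "message" not in body:
        return None
    # Codes arrive as strings in the examples above; convert for comparisons.
    return int(body["code"]), str(body["message"])
```

Both `{"error": {...}}` and `{"detail": {...}}` payloads then yield the same tuple, so calling code does not need to branch on the server version.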

In some cases the JSON endpoints previously returned XML ([example](https://github.com/openml/OpenML/issues/1200)).
Python-V1 will always return JSON.

In some cases the JSON endpoints previously returned XML ([example](https://github.com/openml/OpenML/issues/1200)); the new API always returns JSON.

```diff title="XML replaced by JSON"
- <oml:error xmlns:oml="http://openml.org/openml">
@@ -65,20 +54,26 @@ Python-V1 will always return JSON.
+ {"detail": {"code":"103", "message": "Authentication failed" } }
```

### Datasets
## Datasets

#### `GET /{dataset_id}`
### `GET /{dataset_id}`
- Dataset format names are normalized to be all lower-case
(`"Sparse_ARFF"` -> `"sparse_arff"`).
- Non-`arff` datasets will not incorrectly have a `"parquet_url"`:
https://github.com/openml/OpenML/issues/1189
- Non-`arff` datasets will not incorrectly have a `"parquet_url"` ([openml#1189](https://github.com/openml/OpenML/issues/1189)).
- If `"creator"` contains multiple comma-separated creators it is always returned
as a list, instead of it depending on the quotation used by the original uploader.
- For (some?) datasets that have multiple values in `"ignore_attribute"`, this field
is correctly populated instead of omitted.
- Processing date is formatted with a `T` in the middle:
```diff title="processing_date"
- "2019-07-09 15:22:03"
+ "2019-07-09T15:22:03"
```
- Fields which may contain lists of values (e.g., `creator`, `contributor`) now always
return a list (which may also be empty or contain a single element).
- Fields without a set value are no longer automatically removed from the response.
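
Connector code that has to tolerate both the old and the new response shapes could normalize these fields on the client side. This is a minimal sketch under stated assumptions: the helper names `parse_processing_date` and `as_list` are hypothetical and not part of this repository, and a field without a set value is assumed to arrive as `null` (`None`) or to be absent.

```python
from datetime import datetime
from typing import Any, List


def parse_processing_date(value: str) -> datetime:
    """Parse a processing date in either the old or the new format.

    datetime.fromisoformat accepts a space as well as a "T" between
    the date and time parts.
    """
    return datetime.fromisoformat(value)


def as_list(value: Any) -> List[Any]:
    """Return the value as a list, matching the new always-a-list behavior."""
    if value is None:
        # Assumption: an unset field is represented as None (or was omitted).
        return []
    if isinstance(value, list):
        return value
    # Old-style scalar value: wrap it in a single-element list.
    return [value]
```

For example, `as_list(response.get("creator"))` gives a list regardless of which server produced `response`.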


#### `GET /data/list/{filters}`
### `GET /data/list/{filters}`

The endpoint now accepts the filters in the body of the request, instead of as query parameters.
```diff
@@ -95,28 +90,3 @@ includes datasets which are private.

The `limit` and `offset` parameters can now be used independently; you no longer need
to provide both if you wish to set only one.

## V1 to V2
Most of the changes are focused on standardizing responses, working on:

* using JSON types.
* removing levels of nesting for endpoints which return single-field JSON.
* always returning lists for fields which may contain multiple values even if it
contains only one element or no element.
* restricting or expanding input types as appropriate.
* standardizing authentication and access messages, and consistently executing those checks
before fetching data or providing error messages about the data.


### Datasets

#### `GET /{dataset_id}`

- Processing date is formatted with a `T` in the middle:
```diff title="processing_date"
- "2019-07-09 15:22:03"
+ "2019-07-09T15:22:03"
```
- Fields which may contain lists of values (e.g., `creator`, `contributor`) now always
return a list (which may also be empty or contain a single element).
- Fields without a set value are no longer automatically removed from the response.
8 changes: 3 additions & 5 deletions src/main.py
@@ -3,10 +3,9 @@
import uvicorn
from fastapi import FastAPI
from routers.mldcat_ap.dataset import router as mldcat_ap_router
from routers.v1.datasets import router as datasets_router_v1_format
from routers.v1.qualities import router as qualities_router
from routers.v1.tasktype import router as ttype_router
from routers.v2.datasets import router as datasets_router
from routers.openml.datasets import router as datasets_router
from routers.openml.qualities import router as qualities_router
from routers.openml.tasktype import router as ttype_router


def _parse_args() -> argparse.Namespace:
@@ -39,7 +38,6 @@ def create_api() -> FastAPI:
    app = FastAPI()

    app.include_router(datasets_router)
    app.include_router(datasets_router_v1_format)
    app.include_router(qualities_router)
    app.include_router(mldcat_ap_router)
    app.include_router(ttype_router)
2 changes: 1 addition & 1 deletion src/routers/mldcat_ap/dataset.py
@@ -5,7 +5,7 @@
from sqlalchemy import Connection

from routers.dependencies import expdb_connection, userdb_connection
from routers.v2.datasets import get_dataset
from routers.openml.datasets import get_dataset

router = APIRouter(prefix="/mldcat_ap/datasets", tags=["datasets"])

File renamed without changes.