Improve HTTP API #370

Closed
wants to merge 1 commit into from
Conversation

jrwdunham
Contributor

@jrwdunham jrwdunham commented Jun 5, 2018

Connected to #369

Summary

  • Adds consistent REST (HTTP JSON) endpoints for almost every model in the Storage Service.
  • Builds documentation into the API using a generated OpenAPI 3.0 YAML file and Swagger UI interface to same.
  • Auto-generates Python client code which is self-documenting and has client-side validation.
  • Introduces abstractions and an example of how to implement an OpenAPI-compliant custom endpoint GET locations/{pk}/browse/ using Python data structures.

This PR adds a new set of versioned API endpoints (under the beta namespace) which address #369 by being:

  • consistent: same HTTP method + path regex for all endpoints or operations of a given type; all resources expose the same read/write or read-only operations;
  • thoroughly documented, using the OpenAPI 3.0 spec (see /api/beta/) and a Swagger UI interface to same (see /api/beta/doc/);
  • able to generate API client code (see /api/beta/client/), which is built by clientbuilder.py; and
  • more complete: they introduce needed endpoints, e.g., advanced search over all exposed resources, DELETE, PUT (update), etc.
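
To make the consistency point concrete, here is a minimal sketch of how one fixed table of operations could be expanded into the same HTTP method and path pattern for every resource. The operation names, the `STANDARD_OPERATIONS` table, the `build_routes` helper and the collection names are hypothetical illustrations, not Remple's actual names; only the `/api/beta/` prefix comes from this PR.

```python
# Hypothetical sketch: names below are illustrative, not Remple's real API.
STANDARD_OPERATIONS = [
    # (operation, HTTP method, path template relative to the API prefix)
    ("index", "GET", "{collection}/"),
    ("create", "POST", "{collection}/"),
    ("show", "GET", "{collection}/{{pk}}/"),
    ("update", "PUT", "{collection}/{{pk}}/"),
    ("delete", "DELETE", "{collection}/{{pk}}/"),
]


def build_routes(collections, prefix="/api/beta/"):
    """Expand every collection name into the same set of (method, path) pairs."""
    routes = []
    for collection in collections:
        for operation, method, template in STANDARD_OPERATIONS:
            path = prefix + template.format(collection=collection)
            routes.append((operation, method, path))
    return routes


# Example collection names, chosen only for illustration.
routes = build_routes(["locations", "packages"])
for operation, method, path in routes:
    print("{:<7} {:<7} {}".format(operation, method, path))
```

Because every resource passes through the same table, a client that knows how to update one resource knows how to update all of them.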

How to Understand OpenAPI 3.0

OpenAPI suffers from too much documentation. I have found this one to be good:

How to Understand this PR

  • The API endpoints introduced here are configured in storage_service/locations/api/beta/
    • __init__.py defines the resources dict that tells Remple which resources with which endpoints to expose.
    • resources/ defines the resource classes (which point to Django models to infer read schemata and Formencode schemata for generating mutate schemata).
      • resources.py — simple resources with little configuration (i.e., no custom endpoints)
      • locations.py — a module of its own because it has a custom endpoint: browse
    • schemata.py defines the (Formencode) schemata for viewing, creating and updating resources.
  • Remple is the mini REST framework introduced here. If this PR is merged, we will probably want to move Remple into its own repo so that Archivematica can use it also.
    • resources.py provides the base classes for defining resources.
    • querybuilder.py holds the logic for transforming JSON/Python data structures into Django ORM filters.
    • routebuilder.py takes a resources config dict and can return a list of Django url() instances that can be included in urlpatterns.
    • openapi.py can generate an OpenAPI data structure given a configured Remple API and spit out an OpenAPI 3.0-conformant YAML file that can be used to generate the Swagger UI JavaScript/HTML app as well as the Python client code.
    • clientbuilder.py is a Python module that must be modified to contain an OpenAPI data structure so that it can define (at runtime) a set of Python classes that constitute a client for the SS API.
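
As a rough illustration of the kind of transformation querybuilder.py is described as performing, the sketch below turns a JSON-like list of conditions into Django-style lookup keyword arguments. The input format, the `OPERATOR_TO_LOOKUP` table, the `to_filter_kwargs` helper and the attribute names are assumptions made for this example, not the module's actual interface. It does not import Django; it only builds the dict that could be passed to `Model.objects.filter(**kwargs)`.

```python
# Hypothetical sketch: the filter format and names are assumptions,
# not querybuilder.py's actual input format.
OPERATOR_TO_LOOKUP = {
    "=": "exact",
    "<": "lt",
    "<=": "lte",
    ">": "gt",
    ">=": "gte",
    "contains": "contains",
    "in": "in",
}


def to_filter_kwargs(conditions):
    """Turn [attribute, operator, value] triples into Django-style lookups.

    The result is a plain dict such as {"purpose__exact": "AS"}; passing it
    to ``filter(**kwargs)`` would AND all of the conditions together.
    """
    kwargs = {}
    for attribute, operator, value in conditions:
        try:
            lookup = OPERATOR_TO_LOOKUP[operator]
        except KeyError:
            raise ValueError("Unsupported operator: {!r}".format(operator))
        kwargs["{}__{}".format(attribute, lookup)] = value
    return kwargs


# Attribute names here are illustrative only.
query = [["purpose", "=", "AS"], ["used", ">", 1024]]
filter_kwargs = to_filter_kwargs(query)
print(filter_kwargs)
```

Rejecting unknown operators up front keeps arbitrary client input from being turned into ORM lookups.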

Questions

  • Would it be better to use TastyPie, Django Rest Framework or APIStar (Py3 only, but has OpenAPI spec generation built-in) instead of Remple, the custom-built mini REST framework introduced here? At a high level, rolling our own API mini framework gives us more control as well as the ability to be on the bleeding edge when it comes to the OpenAPI spec (cf. codegen's unavailability for OpenAPI v. 3.0). The benefits of using an existing framework are a lower maintenance burden for us and a battle-tested code base with buy-in from disparate stakeholders.
  • Can existing code generation tools be used to build the Python client/SDK instead of rolling our own in clientbuilder.py? From my research, the answer was "No, because these tools have not caught up to OpenAPI v. 3.0 yet." See Swagger Codegen. My preference would be to use clientbuilder.py and switch to a standard OpenAPI code generation tool when such tools catch up to the spec, an option that is opened up for us by virtue of using OpenAPI.
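
For readers unfamiliar with defining classes at runtime from a data structure, which is how clientbuilder.py is described above, here is a minimal sketch of the general technique using Python's built-in `type()`. The `SPEC` fragment, the operation IDs and the helper names are invented for illustration and are not taken from clientbuilder.py; the generated methods only describe the request they would make rather than sending it.

```python
# Hypothetical sketch: not clientbuilder.py's actual code. The spec fragment
# and the generated method names are invented for illustration.
SPEC = {
    "paths": {
        "/locations/": {
            "get": {"operationId": "get_locations",
                    "summary": "View all locations."},
        },
        "/locations/{pk}/": {
            "get": {"operationId": "get_location",
                    "summary": "View an existing location."},
        },
    },
}


def make_method(http_method, path, summary):
    def method(self, **path_params):
        # Return a description of the request instead of sending it.
        return http_method.upper(), self.base_url + path.format(**path_params)
    method.__doc__ = summary  # the spec's summary becomes the docstring
    return method


def build_client_class(spec, name="Client"):
    """Create a client class with one method per operation in the spec."""
    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    attrs = {"__init__": __init__}
    for path, operations in spec["paths"].items():
        for http_method, operation in operations.items():
            attrs[operation["operationId"]] = make_method(
                http_method, path, operation["summary"])
    return type(name, (object,), attrs)


Client = build_client_class(SPEC)
client = Client("http://localhost:8000/api/beta/")
print(client.get_location(pk="abc"))
print(Client.get_location.__doc__)
```

Copying the spec's summaries into docstrings is one way a generated client can end up self-documenting.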

@jrwdunham jrwdunham changed the title WIP: Improve REST API WIP: Improve HTTP API Jun 5, 2018
@jrwdunham jrwdunham changed the base branch from stable/0.11.x to qa/0.x June 5, 2018 21:51
This was referenced Jun 5, 2018
@qubot qubot force-pushed the dev/issue-369-improve-http-api branch from 4850aa7 to 9105797 Compare June 5, 2018 22:39
@jrwdunham jrwdunham self-assigned this Jun 5, 2018
@jrwdunham jrwdunham added the Type: enhancement An improvement to existing functionality. label Jun 5, 2018
@qubot qubot force-pushed the dev/issue-369-improve-http-api branch 4 times, most recently from 2cd181a to 41ed013 Compare June 13, 2018 18:36
@sevein
Member

sevein commented Jun 13, 2018

I think that you've accidentally checked in storage_service/.pytest_cache/v/cache/*.

@qubot qubot force-pushed the dev/issue-369-improve-http-api branch 3 times, most recently from db0dc01 to f6f8a2f Compare June 13, 2018 20:01
@jrwdunham
Contributor Author

Thanks @sevein: removed those

@qubot qubot force-pushed the dev/issue-369-improve-http-api branch 7 times, most recently from cffa4d5 to ae871b7 Compare June 18, 2018 19:52
@qubot qubot force-pushed the dev/issue-369-improve-http-api branch from ae871b7 to 2c4f950 Compare June 18, 2018 21:01
@jrwdunham jrwdunham changed the title WIP: Improve HTTP API Improve HTTP API Jun 18, 2018
@jrwdunham jrwdunham requested review from sevein and ross-spencer June 18, 2018 23:27
@jambun
Contributor

jambun commented Jun 20, 2018

Hi @jrwdunham. We have spent some time going through this and our overall impression is positive. We like the approach and were able to understand how you had done things and why.

Using clientbuilder as a stopgap makes sense. Hopefully swagger support isn’t too far away. We noticed this: https://github.com/openapitools/openapi-generator.

Our understanding is a bit abstract at this stage - we’ve only eyeballed the code. We will be able to give more detailed feedback after using it.

path_dict[http_method][attr] = val
paths[path] = path_dict

def _get_search_request_body_examples(self, resource_name):
Contributor Author

This is specific to the Archivematica Storage Service, so it should be removed or moved outside of Remple. Furthermore, it seems that it is not even being used.

rsrc_collection_name):
for action in ('search_post', 'new_search'):
http_method = {'search_post': 'post', 'new_search': 'get'}.get(
action, 'search')
Contributor Author

Document or remove the implicit usage here of the non-standard HTTP method SEARCH.
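
To spell out the fallback being flagged: `dict.get(action, 'search')` returns `'search'` for any action missing from the mapping. The standalone sketch below reproduces just that lookup. The `http_method_for` helper and the `some_other_action` name are hypothetical, and the set of allowed keys lists the operation fields that an OpenAPI 3.0 Path Item Object defines.

```python
# The eight operation keys an OpenAPI 3.0 Path Item Object defines.
OPENAPI_METHODS = {
    'get', 'put', 'post', 'delete', 'options', 'head', 'patch', 'trace'}

ACTION_TO_METHOD = {'search_post': 'post', 'new_search': 'get'}


def http_method_for(action):
    # Hypothetical helper: any action missing from the mapping silently
    # falls back to the non-standard method name 'search'.
    return ACTION_TO_METHOD.get(action, 'search')


for action in ('search_post', 'new_search', 'some_other_action'):
    method = http_method_for(action)
    status = 'valid OpenAPI key' if method in OPENAPI_METHODS else 'non-standard'
    print('{} -> {} ({})'.format(action, method, status))
```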

path_dict[http_method]['requestBody'] = request_body
path_dict[http_method]['responses'] = getattr(
self, responses_meth)(resource_name)
path_dict[http_method]['tags'] = [rsrc_collection_name]
Contributor Author

It should be possible to remove some of the duplication here with _set_crud_paths.

"""
return [
OrderedDict([
('url', self.get_dflt_server_path()),
Contributor Author

TODO: should we be able to use info in django_settings to specify a true URL here instead of just a path?

schema is constructed by introspecting the Django model, while the
create and update schemata are constructed by introspecting the relevant
Formencode schemata attached as class attributes on the resource class.
"""
Contributor Author

@jrwdunham jrwdunham Jun 22, 2018

TODO: it seems that OpenAPI's readOnly and writeOnly boolean property attributes could be used here. See https://swagger.io/docs/specification/data-models/data-types/, in particular the Read-Only and Write-Only Properties section.
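
A minimal sketch of what that could look like: one schema whose properties carry `readOnly` or `writeOnly`, from which the request and response views are derived. The property names and the `properties_for` helper are hypothetical and not taken from the Storage Service models.

```python
# Hypothetical schema: property names are invented for illustration.
SCHEMA = {
    "type": "object",
    "properties": {
        "uuid": {"type": "string", "readOnly": True},
        "description": {"type": "string"},
        "password": {"type": "string", "writeOnly": True},
    },
}


def properties_for(schema, direction):
    """List the properties relevant to a 'request' or a 'response'.

    readOnly properties are left out of requests; writeOnly properties
    are left out of responses.
    """
    excluded = {"request": "readOnly", "response": "writeOnly"}[direction]
    return sorted(
        name for name, prop in schema["properties"].items()
        if not prop.get(excluded, False))


print(properties_for(SCHEMA, "request"))
print(properties_for(SCHEMA, "response"))
```

With this approach a single schema per resource might replace separate read and mutate schemata, at the cost of relying on tooling that honours the two flags.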

input-named resource.

TODO: should every parameter in an update request be optional, given
that the resource being updated is presumably valid?
Contributor Author

No. This comment is misleading. Update via Remple has the semantics of PUT not PATCH. A full representation of the post-update state must be sent in the update request.
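
The difference can be sketched with two small functions operating on plain dicts. This is an illustration of the general PUT and PATCH semantics, not Remple code, and the field names are invented. In a real API a PUT that omits a required field would more likely be rejected by validation than accepted.

```python
def put(stored, payload):
    """PUT: the payload is the complete new state; nothing is carried over."""
    return dict(payload)


def patch(stored, payload):
    """PATCH: only the supplied fields change; the rest are preserved."""
    merged = dict(stored)
    merged.update(payload)
    return merged


# Invented field names, for illustration only.
stored = {"description": "Main store", "quota": 100}
payload = {"description": "Renamed store"}

after_put = put(stored, payload)
after_patch = patch(stored, payload)
print(after_put)
print(after_patch)
```

Under PUT the omitted `quota` is gone from the new state, which is why making every update parameter optional would only make sense with PATCH semantics.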

"""Return a read schema for the resource named ``resource_name``.

The read schema describes what is returned by the server as a
representation of an instance of the input-named resource.
Contributor Author

Docstring should indicate that this is done by inspecting the Django model.

non_empty_ptr = [p for p in parts_to_right if int(p)]
if non_empty_ptr:
new_parts.append(part)
return 'v{}'.format('_'.join(new_parts))
Contributor Author

The above is only needed if we want to support semver-compliant API versions, which may not be a good idea.
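
For context, one plausible reading of the fragment above is that it turns a dotted version string into a URL-friendly slug, dropping trailing zero components. The sketch below implements that reading; it is an assumption about the intent, not the PR's actual function, and `version_to_slug` is a hypothetical name.

```python
def version_to_slug(version):
    """Turn a dotted version such as '1.2.0' into a slug such as 'v1_2'.

    Assumed behaviour: trailing components equal to zero are dropped, but
    the major component is always kept.
    """
    parts = version.split('.')
    while len(parts) > 1 and int(parts[-1]) == 0:
        parts.pop()
    return 'v{}'.format('_'.join(parts))


for version in ('3.0.0', '1.2.0', '0.1.0', '2.0.1'):
    print(version, '->', version_to_slug(version))
```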

return getattr(field, 'default', NOT_PROVIDED)


def get_required(field, default):
Contributor Author

Fix function name since it returns a boolean indicating whether a field is required and whether it is null(able).

@helrond

helrond commented Sep 13, 2018

This is a huge step forward in making the Storage Service API consistent, predictable and documented. Thank you! I haven't had a chance to spin this up locally yet, but I'm planning to do that in the next week or so and may have more specific feedback after that.

I am not a real developer and have only a very rudimentary knowledge of the Archivematica codebase. From my perspective, however, I'm not sure it makes sense to implement a custom REST framework. It seems like maintenance for Archivematica is already challenging enough that alleviating some of that burden would outweigh the benefits of OpenAPI 3.0 compliance, especially since, as you note, the tooling isn't really there yet. But perhaps I don't fully understand the benefits of OpenAPI 3.0 over 2.x.

Labels
stalled Type: enhancement An improvement to existing functionality.
4 participants