Skip to content

Commit

Permalink
Merge pull request #4148 from lsst-sqre/tickets/DM-48622
Browse files Browse the repository at this point in the history
DM-48622: Document setting quotas in Gafaelfawr
  • Loading branch information
rra authored Jan 29, 2025
2 parents 1929016 + ac68246 commit 048c416
Show file tree
Hide file tree
Showing 4 changed files with 204 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/admin/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@ Administrators operate infrastructure, manage secrets, and are involved in the d
audit-secrets
update-pull-secret
migrating-secrets
set-quotas

.. toctree::
:caption: Troubleshooting
Expand Down
15 changes: 15 additions & 0 deletions docs/admin/set-quotas.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
##############
Setting quotas
##############

Phalanx currently supports quotas for API services and for notebook spawning.
The quotas for notebook spawning can also be used to temporarily disable spawning of new notebooks.

Quotas are configured in the application configuration for Gafaelfawr (see :px-app:`gafaelfawr`).

Gafaelfawr calculates the quota for users and returns it via the ``/auth/api/v1/user-info`` REST API, which is used by Nublado (see :px-app:`nublado`) to limit the available notebook sizes and decide whether a user is allowed to spawn a notebook.
It also enforces API quotas directly, rejecting requests with an HTTP 429 error after the user has exceeded their quota.

See :doc:`/applications/gafaelfawr/quotas` for how to configure quotas.

See :doc:`/applications/nublado/block-spawns` for how to use quotas to block spawning of new notebooks.
1 change: 1 addition & 0 deletions docs/applications/gafaelfawr/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ Guides

bootstrap
manage-schema
quotas
recreate-token
add-oidc-client
github-organizations
Expand Down
187 changes: 187 additions & 0 deletions docs/applications/gafaelfawr/quotas.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,187 @@
##############
Setting quotas
##############

As part of its function as the authorization service for the Rubin Science Platform, Gafaelfawr also tracks user quotas and enforces API quotas.
Quotas are normally set in the Gafaelfawr :file:`values-{environment}.yaml` file for each environment.
They can also be temporarily overridden by using the Gafaelfawr REST API.

Types of quota
==============

Gafaelfawr tracks two types of quota for each user: API quotas and notebook quotas.
The notebook quotas are enforced by :px-app:`nublado` and only tracked in Gafaelfawr.

API quotas
----------

An API quota limits a user to a number of requests in each 15 minute interval.
After 15 minutes, the user's usage resets and they get their full quota again.

Every named service has a separate API quota.
This quota may not exist, in which case requests to that service are not rate limited.
All requests on behalf of a user count against that user's quota, whether they are made directly by the user or indirectly by another service on behalf of the user.

The scope of a "named service" for API quota purposes is the ``config.service`` key of a ``GafaelfawrIngress`` resource.
Every ``GafaelfawrIngress`` with the same ``config.service`` value consumes the API quota by the same name.

If a user is not subject to any quota for a particular service, no quota-related HTTP headers will be present in the response.
If a quota is in place, multiple ``X-RateLimit-*`` headers will be set.
See `the Gafaelfawr rate limit header documentation <https://gafaelfawr.lsst.io/user-guide/headers.html#rate-limit-headers>`__ for more details.
These headers are based on the rate limiting used by GitHub.

If the user exceeds their quota, subsequent requests will be rejected with an HTTP 429 response code.
That response will include the same ``X-RateLimit-*`` headers as well as the HTTP-standard ``Retry-After`` header, which specifies the time at which the user's quota will reset.

Notebook quotas
---------------

The user's notebook quota controls the maximum number of CPU equivalents and the maximum amount of memory that a user's notebook can use.
This, in turn, is used to filter the menu of sizes of notebooks that the user can request from the Nublado notebook spawning page.
Only sizes that fall below the user's quota will be available.

The size of the Nublado image is set as Kubernetes limits on the CPU and memory of the pod.
If the pod uses more CPU than its limit, it will be throttled.
If it uses more memory than its limit, the pod, and the user's notebook, will be terminated.
This is an abrupt experience, and it will usually not obvious to the user that this is why their notebook died.
Unfortunately, Kubernetes doesn't offer better options at present.

The requested CPU and memory, which Kubernetes uses for scheduling and node pool scale-up, are currently always set to 25% of the limits.

The notebook quota also includes a boolean flag, ``spawn``, which controls whether that user should be able to spawn new notebooks.
This flag is enforced by Nublado, not by Gafaelfawr.

Setting quotas
==============

Quotas are normally set via the ``config.quota`` key in :file:`values-{environment}.yaml` for Gafaelfawr in a given environment.
Default quotas that apply to every environment can be set in Gafaelfawr's :file:`values.yaml`.

There are three sections under the ``config.quota`` key.

Default quota
-------------

The default quota setting controls the quotas that all users get when there are no more specific rules (discussed below).
The ``api`` key should contain a mapping of service names to number of requests per 15 minutes.
The ``notebook`` key should contain ``cpu`` and ``memory`` keys specifying the default CPU and memory limits.
The memory limit is given in a floating point number of GiB.

For example:

.. code-block:: yaml
config:
quota:
default:
api:
datalinker: 1000
notebook:
cpu: 2.0
memory: 4.0
This sets a quota of 1000 requests per 15 minutes for the ``datalinker`` service, no quotas for any other API service, and a default limit of 2.0 CPU equivalents and 4.0 GiB of memory for notebooks.

The default quota for all API services not listed is unlimited.
To set a default quota of 0, explicitly list the API service with a quota of 0.

Group quota
-----------

Second, the ``groups`` key sets **additional** quota granted to members of specific groups.
The quota for every group of which the user is a member is added to the default quota.
For example:

.. code-block:: yaml
config:
quota:
groups:
g_developers:
api:
datalinker: 500
notebook:
cpu: 0.0
memory: 4.0
If this were combined with the above default quota, members of the ``g_developers`` group would receive a total of 1500 requests per 15 minutes for datalinker, and a total of 8.0 GiB of memory for notebooks.
The CPU quota for notebooks would be unchanged.

Normally, the group quota can only add to the individual quota.
There are two exceptions: the ``spawn`` flag for notebooks, and any API quotas for services that have no default quotas.
Consider the following addditional configuration:

.. code-block:: yaml
config:
quota:
groups:
g_limited:
api:
tap: 1000
notebook:
cpu: 0.0
memory: 0.0
spawn: false
If combined with the previous default configuration, members of the ``g_limited`` group will have a quota of 1000 requests per 15 minutes to the tap service.
Users who are not a member of that group will continue to have unlimited access to the tap service.
Also, members of the ``g_limited`` group will not be allowed to spawn new notebooks, because their ``spawn`` flag is set to false instead of the default of true.
Note that ``cpu`` and ``memory`` are also set because they are required fields, but are set to 0.0 so they don't add anything to the quota.

Bypas groups
------------

Finally, some groups can be allowed to bypass all quota limits.
This is done with the ``bypass`` key.

.. code-block:: yaml
config:
quota:
bypass:
- "g_admins"
All members of any group listed under ``bypass`` will ignore all quota restrictions, including the ``spawn`` flag for notebook quotas.

Overriding quotas
=================

Finally, Gafaelfawr supports temporary quota overrides.
This is done via the following REST API:

``GET /auth/api/v1/quota-overrides``
Retrieves the current quota overrides in JSON format.
Returns 404 if there are no quota overrides.

``PUT /auth/api/v1/quota-overrides``
Creates or replaces the quota overrides.
The body should be in JSON format.
There is no ``PATCH`` API; the complete override configuration has to be provided.

``DELETE /auth/api/v1/quota-overrides``
Delete the quota overrides.
Returns 404 if there are no quota overrides and 204 on success.

These routes require a token with ``admin:token`` scope.
You can create such a token via the token drop-down from the front page of a Phalanx installation that uses :px-app:`squareone`.

The body sent via ``PUT`` and returned by ``GET`` is the same format as the ``config.quota`` key for the Gafaelfawr configuration except in JSON format.
The following :command:`curl` command template may be useful for setting the quota overrides:

.. prompt:: bash

curl -X PUT -H 'Authorization: bearer <token>' \
-H 'Content-Type: application/json' \
-d '<json>' \
https://<base-url>/auth/api/v1/quota-overrides

Replace ``<token>`` with your ``admin:token`` token, ``<json>`` with the content of the quota override, and ``<base-url>`` with the base URL of the Phalanx environment.

Quota overrides, unlike group quotas, are not additive.
Instead, they override (as in the name) any quota from the default or group sections.
If the quota override configuration generates a notebook quota or an API quota for a particular service, the default and group quota information for notebooks or that API are ignored completely.
Otherwise, the normal quota default and group quota information applies.

One common reason to set a quota override is to temporarily block notebook spawns.
That use is described in more detail at :doc:`/applications/nublado/block-spawns`.

0 comments on commit 048c416

Please sign in to comment.