Add specification for list pagination

Related-to: blueprint pagination-of-resources
Change-Id: Ia79276b0278c04ed7c404c74ba90b84e1d7600a0
Ian Cordasco 2016-12-21 11:40:45 -06:00
parent da0aacda85
commit 0cc403fec2

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

==============================
Pagination of List Resources
==============================

https://blueprints.launchpad.net/craton/+spec/pagination-of-resources

Craton is intended to manage large quantities of devices and other objects
without sacrificing performance. Craton needs to add pagination support in
order to efficiently handle queries on large collections.

Problem description
===================

In the current implementation, a request to one of our collection resources
attempts to return every value the requester is allowed to see (based on
authentication, etc.). For example, if a user and project have access to 5000
hosts, then a ``GET`` request against ``/v1/hosts`` returns all 5000. Result
sets this large can, and likely will, slow Craton's response times to the
point where the API becomes unusable.

Proposed change
===============

We propose adding pagination query parameters to all collection endpoints.
These parameters would fall back to defaults when the user does not supply
them.

We specifically propose that:

#. Craton use a default page size of 30 and require the page size to be at
   least 10 and at most 100 items.

#. Craton make the next page both discoverable *and* calculable. In other
   words, responses would use "link" hypermedia relations to indicate the
   first, previous, next, and last page URLs, which the server generates for
   the client.

#. Craton assume the defaults for requests that have no pagination query
   parameters. For example, a ``GET`` request to ``/v1/hosts`` would imply a
   page size of 30 and return the first 30 results.
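
A minimal sketch of how an endpoint could apply these defaults, assuming the
query string has already been parsed into a mapping. The helper name
``parse_pagination_params`` is hypothetical, and clamping an out-of-range
``limit`` (rather than rejecting it with ``400 Bad Request``) is an assumption;
the proposal above does not say which of the two Craton should do.

.. code-block:: python

    from typing import Mapping, Optional, Tuple

    # Page-size values proposed in this specification.
    DEFAULT_LIMIT = 30
    MIN_LIMIT = 10
    MAX_LIMIT = 100


    def parse_pagination_params(
        query: Mapping[str, str],
    ) -> Tuple[int, Optional[str]]:
        """Return ``(limit, marker)`` for a request's query parameters.

        Hypothetical helper: a missing ``limit`` falls back to the default
        and an out-of-range one is clamped into [MIN_LIMIT, MAX_LIMIT]. A
        missing ``marker`` becomes None, meaning "start at the first item".
        """
        raw_limit = query.get("limit")
        if raw_limit is None:
            limit = DEFAULT_LIMIT
        else:
            # int() raises ValueError for non-numeric input; a real handler
            # would translate that into a 400 Bad Request response.
            limit = min(max(int(raw_limit), MIN_LIMIT), MAX_LIMIT)
        return limit, query.get("marker")

For example, ``parse_pagination_params({})`` returns ``(30, None)`` and
``parse_pagination_params({"limit": "500"})`` returns ``(100, None)``.
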

To provide pagination to users, we suggest using ``limit`` and ``marker``
parameters to indicate the page size and the last seen ID, respectively. This
allows users to begin pagination after a specific item rather than at a
particular page. For example, if a user is checking the listing for new hosts
and knows the ID of the last host they encountered, they can provide
``marker=:id&limit=30`` to get the newer hosts. If we instead used ``page``
and ``per_page``, users could miss items, because deleting hosts shifts the
page number on which the last seen host appears.

This implies that the default ``limit`` value would be 30 and the default
``marker`` would be null (to indicate that no ID has been seen yet).

This combination of parameters is practically the standard in OpenStack.
Operators who know OpenStack's existing Compute, Images, etc. APIs will
already be familiar with these parameters.
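
The ``limit``/``marker`` semantics can be sketched as follows. This sketch
assumes the collection is ordered by ascending integer ``id`` and interprets
the marker as "return items whose ``id`` is greater than this value", which is
one possible choice and keeps working even if the marker's own row has been
deleted. ``get_page`` is a hypothetical name; a database-backed version would
express the same idea as ``WHERE id > :marker ORDER BY id LIMIT :limit``.

.. code-block:: python

    from typing import Any, Dict, List, Optional

    Item = Dict[str, Any]


    def get_page(items: List[Item], limit: int,
                 marker: Optional[int] = None) -> List[Item]:
        """Return up to ``limit`` items that come after ``marker``.

        ``items`` must be sorted by ascending ``id``. With ``marker=None``
        the page starts at the first item.
        """
        if marker is not None:
            items = [item for item in items if item["id"] > marker]
        return items[:limit]


    hosts = [{"id": host_id} for host_id in range(1, 8)]
    first_page = get_page(hosts, limit=3)  # ids 1, 2, 3
    # The client passes the last ID it saw as the next marker.
    second_page = get_page(hosts, limit=3, marker=first_page[-1]["id"])  # ids 4, 5, 6
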

In addition to pagination parameters, this spec proposes adding link relations
to the response body, as defined by JSON Hyper-Schema and `favored by the API WG`_.

This makes the API easier to use for everyone, including people using the API
directly and people writing API wrappers such as python-cratonclient. It does,
however, have the downside of changing our response bodies and JSON Schema.

Finally, I'd like to strongly propose that we include these links in every
response. Which relation types we include would depend on where in the
pagination the user is, roughly as follows:

#. Include a ``self`` relation on every page that tells the user exactly which
   page they are currently on.

#. If there is a page before the current one, include the ``prev`` **and**
   ``first`` relations. These tell the user what the previous page and the
   first page are.

#. If there is a page after the current one, include the ``next`` **and**
   ``last`` relations. These are the counterparts of ``prev`` and ``first``,
   respectively.

It is worth noting that, without properly implemented caching, the ``last``
relation could become computationally expensive to calculate for every
pagination query.
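
The rules above can be sketched as a function that builds the ``links`` list
for one page. It assumes the same ascending-``id`` ordering as a marker-based
listing, that the full sorted list of IDs is available (obtaining it is the
costly step the caching note refers to), and that the ``last`` page is simply
the final ``limit`` items. ``build_links``, ``_page_url`` and the base URL are
hypothetical.

.. code-block:: python

    from bisect import bisect_right
    from typing import Dict, List, Optional
    from urllib.parse import urlencode


    def _page_url(base_url: str, limit: int, marker: Optional[int]) -> str:
        """Build a page URL; the first page carries no marker."""
        params = {"limit": limit}
        if marker is not None:
            params["marker"] = marker
        return f"{base_url}?{urlencode(params)}"


    def build_links(base_url: str, ids: List[int], limit: int,
                    marker: Optional[int] = None) -> List[Dict[str, str]]:
        """Return the link relations for the page that follows ``marker``.

        ``ids`` is the sorted list of every ID in the collection.
        """
        # Index of the first item on the current page.
        start = 0 if marker is None else bisect_right(ids, marker)
        links = [{"rel": "self", "href": _page_url(base_url, limit, marker)}]

        if start > 0:  # at least one item precedes this page
            prev_start = max(0, start - limit)
            # The previous page begins right after ids[prev_start - 1].
            prev_marker = ids[prev_start - 1] if prev_start > 0 else None
            links.append({"rel": "first",
                          "href": _page_url(base_url, limit, None)})
            links.append({"rel": "prev",
                          "href": _page_url(base_url, limit, prev_marker)})

        if start + limit < len(ids):  # at least one item follows this page
            last_start = len(ids) - limit  # always >= 1 in this branch
            links.append({"rel": "next",
                          "href": _page_url(base_url, limit,
                                            ids[start + limit - 1])})
            links.append({"rel": "last",
                          "href": _page_url(base_url, limit,
                                            ids[last_start - 1])})
        return links

For a collection with IDs 1 to 100 and ``limit=30``, the first page gets
``self``, ``next`` (``marker=30``) and ``last`` (``marker=70``), while the page
after ``marker=90`` gets ``self``, ``first`` and ``prev`` (``marker=60``).
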

Alternatives
------------

Alternative query parameters to ``limit`` and ``marker`` are:

#. Use ``page`` and ``per_page`` parameters to indicate the 1-indexed page
   number and the number of items on each page, respectively. This means that
   users can change how many items they get on each request and can resume at
   an arbitrary place by specifying the ``page`` parameter.

   This would imply that the default ``page`` value would be 1 and the default
   ``per_page`` would be 30.

   These two parameters are used by a significant number of large APIs but
   are not common in OpenStack itself. They are simple: an API user can keep
   incrementing the page number to get the next page, without calculating the
   next value from values in the previous response.

   They do, however, prevent users from resuming iteration from the last item
   they received in a list. Further, they introduce the risk that users miss
   objects because of deletions or other changes in the corresponding
   collection. Finally, these parameters give users only an opaque idea of
   where in a paginated resource they are and how to resume pagination.

#. Use ``limit`` and ``offset`` parameters to provide functionality and
   opacity similar to ``per_page`` and ``page``, respectively.

   The default ``limit`` would, again, be 30 and the default ``offset`` would
   be 0.

   This combination of parameters is also present in a small number of
   OpenStack projects, but it shares some of the drawbacks of ``page`` and
   ``per_page`` when compared to ``limit`` and ``marker``.
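
The risk of missing objects can be shown with a small simulation. It assumes a
collection ordered by ascending ID, and the helper names are hypothetical.
``page`` and ``per_page`` map onto ``offset = (page - 1) * per_page``, so the
offset case stands in for both alternatives.

.. code-block:: python

    from typing import List, Tuple


    def page_by_offset(ids: List[int], limit: int, offset: int) -> List[int]:
        """``limit``/``offset`` paging (``page``/``per_page`` behaves alike)."""
        return ids[offset:offset + limit]


    def page_by_marker(ids: List[int], limit: int, marker: int) -> List[int]:
        """``limit``/``marker`` paging: items after the last ID seen."""
        return [i for i in ids if i > marker][:limit]


    def simulate_deletion() -> Tuple[List[int], List[int]]:
        ids = list(range(1, 11))                 # hosts 1 through 10
        seen = page_by_offset(ids, 5, offset=0)  # client reads hosts 1-5
        ids.remove(2)                            # host 2 is deleted meanwhile
        by_offset = page_by_offset(ids, 5, offset=5)
        by_marker = page_by_marker(ids, 5, marker=seen[-1])
        return by_offset, by_marker

``simulate_deletion()`` returns ``([7, 8, 9, 10], [6, 7, 8, 9, 10])``: after
the deletion, ``offset=5`` skips host 6 entirely, while the marker-based
request still resumes right after host 5.
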

An alternative way to provide pagination links is:

#. Link headers, as defined in :rfc:`5988`, using relation types from the
   registry established by :rfc:`5988` and extended by :rfc:`6903`.

   These are also commonly used outside of OpenStack and were popular prior
   to the practice of including the relations in the response body. The
   benefit to Craton of using this method is that it doesn't affect our JSON
   Schema or existing response bodies. A major problem with this approach is
   that a relation type can be repeated in a Link header, and Requests, the
   HTTP library used by the majority of the Python world, does not parse such
   links correctly. Further, the author of this specification is not aware of
   widespread support for parsing these header values.
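
To illustrate the Link header alternative and the repeated-relation problem,
the sketch below serializes and parses a ``Link`` header using only the
standard library. The parser is deliberately simplified (it only understands a
``rel`` parameter directly after the URI) and every name in it is
hypothetical. Requests exposes parsed Link headers as ``Response.links``, a
dictionary keyed by relation type, so, like ``links_by_rel`` here, it keeps
only one link per relation.

.. code-block:: python

    import re
    from typing import Dict, List, Tuple

    # One link-value: "<URI>; rel=..." with the rel optionally quoted.
    _LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')


    def format_link_header(links: List[Tuple[str, str]]) -> str:
        """Serialize (url, rel) pairs as a Link header value."""
        return ", ".join(f'<{url}>; rel="{rel}"' for url, rel in links)


    def parse_link_header(value: str) -> List[Tuple[str, str]]:
        """Parse a Link header into (url, rel) pairs, keeping repeats."""
        return _LINK_RE.findall(value)


    def links_by_rel(value: str) -> Dict[str, str]:
        """Index links by rel; a repeated rel keeps only its last URL."""
        return {rel: url for url, rel in parse_link_header(value)}

A header carrying two ``alternate`` links and one ``next`` link yields three
pairs from ``parse_link_header`` but only two entries from ``links_by_rel``.
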

Data model impact
-----------------

This should have **no** impact on our data model.

REST API impact
---------------

This specification will have two impacts on our REST API:

#. It will add identical ``limit`` and ``marker`` query parameters to a number
   of existing and future endpoints.

#. It will change the fundamental structure of our list responses in order to
   accommodate the link relations.

At the moment, for example, a ``GET`` request made to ``/v1/hosts`` has a
response body that looks like:

.. code-block:: json

    [
        {
            "active": true,
            "cell_id": null,
            "device_type": "Computer",
            "id": 1,
            "ip_address": "12.12.12.15",
            "name": "foo2Host",
            "note": null,
            "parent_id": null,
            "region_id": 1
        },
        {
            "active": true,
            "cell_id": null,
            "device_type": "Phone",
            "id": 2,
            "ip_address": "11.11.11.14",
            "name": "fooHost",
            "note": null,
            "parent_id": null,
            "region_id": 1
        }
    ]

This would need to change to:

.. code-block:: json

    {
        "items": [
            {
                "active": true,
                "cell_id": null,
                "device_type": "Computer",
                "id": 1,
                "ip_address": "12.12.12.15",
                "name": "foo2Host",
                "note": null,
                "parent_id": null,
                "region_id": 1
            },
            {
                "active": true,
                "cell_id": null,
                "device_type": "Phone",
                "id": 2,
                "ip_address": "11.11.11.14",
                "name": "fooHost",
                "note": null,
                "parent_id": null,
                "region_id": 1
            }
        ],
        "links": [
            {
                "rel": "first",
                "href": "https://craton.environment.com/v1/hosts?limit=30"
            },
            {
                "rel": "next",
                "href": "https://craton.environment.com/v1/hosts?limit=30&marker=2"
            },
            {
                "rel": "self",
                "href": "https://craton.environment.com/v1/hosts?limit=30&marker=1"
            }
        ]
    }

Security impact
---------------

Pagination support reduces the attack surface for denial of service (DoS)
attacks aimed at Craton. It is not, however, sufficient on its own to prevent
DoS attacks, and deployers should take additional measures to further mitigate
them.

Notifications impact
--------------------

Craton does not yet have notifications.

Other end user impact
---------------------

This will have a minor effect on python-cratonclient. The ``list`` calls it
implements will need to become smarter so that they can handle pagination for
the user automatically.
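
A minimal sketch of the kind of ``list`` helper the client would need: a
generator that keeps following the ``next`` relation until the server stops
providing one. ``iterate_collection`` and the ``fetch_json`` callable are
hypothetical and are not python-cratonclient's actual API; the response shape
assumed is the ``items``/``links`` body proposed in the REST API impact
section.

.. code-block:: python

    from typing import Any, Callable, Dict, Iterator, Optional


    def iterate_collection(fetch_json: Callable[[str], Dict[str, Any]],
                           url: str) -> Iterator[Dict[str, Any]]:
        """Yield every item of a paginated collection, page by page.

        ``fetch_json`` performs a GET on a URL and returns the decoded body,
        expected to look like {"items": [...], "links": [...]}.
        """
        next_url: Optional[str] = url
        while next_url is not None:
            body = fetch_json(next_url)
            yield from body["items"]
            # Follow the "next" relation; stop when the server omits it.
            next_url = next(
                (link["href"] for link in body.get("links", [])
                 if link["rel"] == "next"),
                None,
            )

With Requests, ``fetch_json`` could be ``lambda url: session.get(url).json()``
for an already configured ``requests.Session``.
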

Performance Impact
------------------

This code will be called frequently, but it should not have any negative
performance impact on the service.

Other deployer impact
---------------------

None

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:

- icordasc

Other contributors:

- None

Work Items
----------

- Add basic pagination support, with tests to ensure that it works
  independently of the other features proposed in this specification
- Add link relation support to response bodies

Dependencies
============

N/A

Testing
=======

This should be tested at several levels, but at a minimum at the functional
level.

Documentation Impact
====================

This will impact our API reference documentation.

References
==========

* `IANA Link Relations Registry`_
* :rfc:`5988`
* :rfc:`6903`
* `JSON Hyper-Schema`_
* `"Pagination, Filtering, and Sorting" by the OpenStack API WG`_

.. _favored by the API WG:
   http://specs.openstack.org/openstack/api-wg/guidelines/links.html

.. _IANA Link Relations Registry:
   https://www.iana.org/assignments/link-relations/link-relations.xhtml

.. _JSON Hyper-Schema:
   http://json-schema.org/latest/json-schema-hypermedia.html

.. _"Pagination, Filtering, and Sorting" by the OpenStack API WG:
   http://specs.openstack.org/openstack/api-wg/guidelines/pagination_filter_sort.html