Merge "Proposal to count resources to check quota in API for cells"

Jenkins 2016-11-18 11:03:36 +00:00 committed by Gerrit Code Review
commit de208254e8
1 changed files with 231 additions and 0 deletions

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

===============================================
Count resources to check quota in API for cells
===============================================
https://blueprints.launchpad.net/nova/+spec/cells-count-resources-to-check-quota-in-api
For cellsv2, quota tables are moving to the API database as data global to
a deployment. Currently, for instance delete, quota reservations are made in
the API and then committed in compute. This is a disconnect which couples
compute cells with the API cell. In cellsv2, we endeavor to decouple compute
cells from the API cell as much as possible -- ideally, cells should not
need to have the API database connection in their configuration.
We propose a new approach: count consumed resources and check the count
against the quota limits in the API. This replaces the current reserve/commit
model, in which a reservation record is created, quota usage records are
created and marked as "in_use" when they are committed, and the reservation
record is deleted.
Problem description
===================
The current quota design consists of reservations and commits/rollbacks. A
simplified explanation of how it works during a create is: "reserve" creates a
reservation record and a usage record indicating resources are "reserved."
"Commit" updates the "reserved" and "in use" fields of the usage record and
deletes the reservation record. "Rollback" updates the "reserved" field of the
usage record and deletes the reservation record.
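The reserve/commit/rollback bookkeeping described above can be sketched as a
toy in-memory model. This is only an illustration under stated assumptions:
``ToyQuota`` and ``Usage`` are hypothetical names, and the real implementation
keeps this state in database tables rather than in Python objects.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Usage:
    """Toy stand-in for a quota usage record."""
    in_use: int = 0
    reserved: int = 0


class ToyQuota:
    """Hypothetical in-memory model of reserve/commit/rollback."""

    def __init__(self, limit):
        self.limit = limit
        self.usage = Usage()
        self.reservations = {}          # reservation id -> delta
        self._ids = itertools.count(1)

    def reserve(self, delta):
        # Create a reservation record and mark the resources as reserved.
        if self.usage.in_use + self.usage.reserved + delta > self.limit:
            raise RuntimeError("over quota")
        rid = next(self._ids)
        self.reservations[rid] = delta
        self.usage.reserved += delta
        return rid

    def commit(self, rid):
        # Move the delta from "reserved" to "in use" and drop the reservation.
        delta = self.reservations.pop(rid)
        self.usage.reserved -= delta
        self.usage.in_use += delta

    def rollback(self, rid):
        # Undo the "reserved" bump and drop the reservation.
        delta = self.reservations.pop(rid)
        self.usage.reserved -= delta


quota = ToyQuota(limit=10)
rid = quota.reserve(4)
quota.commit(rid)
```

The point of the sketch is that every reserve must later be paired with a
commit or rollback, which is the step that ends up split between the API and
compute.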
For instance delete, resources are first reserved in the API when a request is
received and then the reservation is later committed in compute when the
resources are freed. In cellsv2, this means compute cells will write to the API
database for the quota commit. By counting resources in the API to check quota,
we can reduce [*]_ the need for compute cells to write to the API database.
At least, we will eliminate the situation where a quota reserve/commit is split
across the API cell and compute cells.
.. [*] Quota reads and writes cannot be completely eliminated in compute cells
   in a special case: nova-compute de/allocating fixed IPs from
   nova-network during de/provisioning. This special case can be removed
   when nova-network is fully removed.
Use Cases
---------
* Operators want to partition their deployments into cells for scaling, failure
domain, and buildout reasons. When partitioned, coupling between the API cell
and compute cells should be minimized.
Proposed change
===============
Consumed resources will be counted to check quota instead of the current
reserve/commit/rollback model.
* "Reserve," "commit," and "rollback" calls will be removed everywhere.
* "Reserve" calls will be replaced with something like "check_resource_count"
  which will query the databases for consumed resources, count them, and raise
  OverQuota if quota limits for the project can't accommodate the request.
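A minimal sketch of what a count-based check might look like follows. The
signature, the ``counters`` mapping of callables, and the locally defined
``OverQuota`` class are all assumptions made for illustration; the spec only
names "check_resource_count" and OverQuota, not their final form.

```python
class OverQuota(Exception):
    """Stand-in for the OverQuota error named in this spec."""


def check_resource_count(limits, counters, deltas):
    """Raise OverQuota if any requested delta would exceed its limit.

    limits:   {resource: maximum allowed for the project}
    counters: {resource: zero-argument callable returning the current count}
    deltas:   {resource: amount the incoming request would add}
    """
    for resource, delta in deltas.items():
        # Count at check time; nothing is reserved or written.
        used = counters[resource]()
        if used + delta > limits[resource]:
            raise OverQuota(
                "%s: %d in use + %d requested > limit %d"
                % (resource, used, delta, limits[resource]))


# Hypothetical limits and counters for one project.
limits = {'instances': 10, 'cores': 20}
counters = {'instances': lambda: 9, 'cores': lambda: 18}

# Fits exactly within both limits, so no exception is raised.
check_resource_count(limits, counters, {'instances': 1, 'cores': 2})
```

Because the check only reads counts, there is no record to commit or roll back
afterward.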
Alternatives
------------
The initial proposal for this work, and an alternative to this one, was to
commit quota immediately in the API wherever possible. The drawback of that
approach is that later updates to the quota_usages records cannot be entirely
avoided: if a resize fails, resource consumption must be updated accordingly
in those records. With a resource counting approach, no such update is
needed.
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
With the resource counting approach, a project can consume more resources
than its quota allows if requests race near the quota limit. This is because
consumed resources must be aggregated across instances in separate databases,
and the count and the subsequent resource consumption are not atomic. For
example, request Y passes its quota check at the API. Shortly afterward, a
racing request X, which also passed its quota check, consumes the remaining
resources allowed for the project. Request Y then consumes resources beyond
the quota.
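The race can be shown with a small deterministic simulation. This is a sketch
only: ``simulate_race`` is a hypothetical helper, and the numbers are made up
to place the project close to its limit.

```python
def simulate_race(limit, consumed, requests):
    """Model requests that all pass the quota check before any consumes."""
    # Every request is checked against the same, not yet updated, count.
    passed = [r for r in requests if consumed + r <= limit]
    # Only afterward do the requests that passed consume their resources.
    for r in passed:
        consumed += r
    return consumed


# Two racing requests of 2 each, with 8 of 10 already consumed.
final = simulate_race(limit=10, consumed=8, requests=[2, 2])
print(final, "consumed against a limit of 10")
```

Each request fits on its own, but together they push the project over its
limit.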
Performance Impact
------------------
Performance will be adversely affected when counting resources such as cores
and ram, because the allocations records do not currently store a project
association. In the future, once the placement API has more data, we will be
able to count these resources with an efficient query through it. Until then,
counting cores and ram requires the following approach:
* Get all instances by project per cell, parsing the flavor JSON blobs and
  adding up the counts. For example::

    instance_get_all_by_filters(filters={'project_id': myproj},
                                expected_attrs=['flavor'])

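The per-cell aggregation could be sketched as follows. Plain dicts stand in
for instance records, each cell is a list of such dicts, and the flavor is
assumed to be a flat JSON blob with ``vcpus`` and ``memory_mb`` keys; the
serialized flavor format actually stored is more involved, and
``count_usage`` is a hypothetical name.

```python
import json


def count_usage(cells, project_id):
    """Sum instances, cores and ram for one project across all cells."""
    totals = {'instances': 0, 'cores': 0, 'ram': 0}
    for cell_instances in cells:            # one query result per cell
        for inst in cell_instances:
            if inst['project_id'] != project_id:
                continue
            flavor = json.loads(inst['flavor'])   # parse the flavor blob
            totals['instances'] += 1
            totals['cores'] += flavor['vcpus']
            totals['ram'] += flavor['memory_mb']
    return totals


# Made-up data: two cells, one instance belonging to another project.
cell1 = [
    {'project_id': 'myproj',
     'flavor': json.dumps({'vcpus': 2, 'memory_mb': 4096})},
]
cell2 = [
    {'project_id': 'myproj',
     'flavor': json.dumps({'vcpus': 4, 'memory_mb': 8192})},
    {'project_id': 'other',
     'flavor': json.dumps({'vcpus': 8, 'memory_mb': 16384})},
]

usage = count_usage([cell1, cell2], 'myproj')
```

The same pass also yields an instance tally, which is why a separate query
for the instance count may be unnecessary.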
All other resources should be able to be counted in one step:
* instances: this can be obtained from instance_mappings table in API DB.
  We may be able to create a tally from the aforementioned cores/ram query
  and use that instead of doing a new query of instance_mappings.
* security_groups: deprecated in 2.36 and not checked in Nova with Neutron.
  This is checked in the API with nova-network. security_groups are in the
  cell database so this would be a cell DB read from the API to check.
* floating_ips: deprecated in 2.36 and not checked in Nova with Neutron. This
  is checked when auto_assign_floating_ip allocates a floating ip with
  nova-network. floating_ips are in the cell database so this would be a
  local DB read until nova-network is removed.
* fixed_ips: not checked in Nova with Neutron. This is checked when
  nova-compute de/allocates a fixed_ip with nova-network. fixed_ips are in
  the cell database so this would be a local DB read until nova-network is
  removed.
* metadata_items: this is a limit on allowed number of image metadata items
  and is checked when image metadata requests come in. No counting of
  resources is necessary.
* injected_files: Similar to metadata_items.
* injected_file_content_bytes: Similar to metadata_items.
* injected_file_path_bytes: Similar to metadata_items.
* security_group_rules: Similar to security_groups.
* key_pairs: this can be obtained from key_pairs table in API DB.
* server_groups: this can be obtained from instance_groups table in API DB.
* server_group_members: this can be obtained from instance_group_member table
  in API DB.
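The list above can be summarized as a lookup from resource name to the
database where it would be counted. This is a sketch for orientation only:
``COUNT_SOURCE`` and ``resources_counted_in`` are hypothetical names, and
``None`` marks the per-request limits for which no counting is necessary.

```python
# Where each resource would be counted, per the list above.
# 'api' = API DB, 'cell' = cell DB, None = no counting necessary.
COUNT_SOURCE = {
    'instances': 'api',
    'cores': 'cell',
    'ram': 'cell',
    'security_groups': 'cell',
    'security_group_rules': 'cell',
    'floating_ips': 'cell',
    'fixed_ips': 'cell',
    'key_pairs': 'api',
    'server_groups': 'api',
    'server_group_members': 'api',
    'metadata_items': None,
    'injected_files': None,
    'injected_file_content_bytes': None,
    'injected_file_path_bytes': None,
}


def resources_counted_in(source):
    """Return the sorted resource names counted in the given source."""
    return sorted(r for r, s in COUNT_SOURCE.items() if s == source)


api_counted = resources_counted_in('api')
cell_counted = resources_counted_in('cell')
```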
Other deployer impact
---------------------
None
Developer impact
----------------
Nova developers will no longer call quota "reserve," "commit," or "rollback."
Instead, they will call quota "check_resource_count" or similar when adding a
new API which will consume quota.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
melwitt
Other contributors:
None
Work Items
----------
* Add a method in nova/objects/quota.py called check_resource_count that counts
consumed resources and raises OverQuota if the request would go over quota
limits.
* Remove reserve/commit/rollback everywhere.
* Mark "reserve," "commit," and "rollback" methods as DEPRECATED in the
docstrings to prevent their further use.
Dependencies
============
None
Testing
=======
New unit tests will be added to cover the new resource counting scenarios.
For the most part, this work should be transparent to end-users, so the
existing suite of unit, functional, and integration tests should suffice
for testing what is proposed.
There is an outstanding review for a regression test for the "quota out of
sync" bug that could be used to verify this proposal solves that problem
as a side effect.
Documentation Impact
====================
None
References
==========
None
History
=======
.. list-table:: Revisions
   :header-rows: 1

   * - Ocata
     - Introduced