From 22b389641f7ef5df4ec0ae6957cf3fddaea2f573 Mon Sep 17 00:00:00 2001
From: Jay Pipes
Date: Mon, 12 Mar 2018 14:16:59 -0400
Subject: [PATCH] Support initial allocation ratios

Provide separate CONF options for specifying the initial allocation
ratio for compute nodes. Change the default values for
CONF.xxx_allocation_ratio options to None and change the behaviour of
the resource tracker to only override allocation ratios for *existing*
compute nodes if the CONF.xxx_allocation_ratio value is not None.

Change-Id: I5e6cf306dcac71f78f89d90a14ecc3bbbd7e0f42
blueprint: initial-allocation-ratios
---
 .../approved/initial-allocation-ratios.rst | 313 ++++++++++++++++++
 1 file changed, 313 insertions(+)
 create mode 100644 specs/stein/approved/initial-allocation-ratios.rst

diff --git a/specs/stein/approved/initial-allocation-ratios.rst b/specs/stein/approved/initial-allocation-ratios.rst
new file mode 100644
index 000000000..a280db119
--- /dev/null
+++ b/specs/stein/approved/initial-allocation-ratios.rst
@@ -0,0 +1,313 @@
..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License.

  http://creativecommons.org/licenses/by/3.0/legalcode

======================================
Default allocation ratio configuration
======================================

https://blueprints.launchpad.net/nova/+spec/initial-allocation-ratios

Provide separate CONF options for specifying the initial allocation
ratio for compute nodes. Change the default values for
CONF.xxx_allocation_ratio options to None and change the behaviour of
the resource tracker to only override allocation ratios for *existing*
compute nodes if the CONF.xxx_allocation_ratio value is not None.

The primary goal of this feature is to support setting allocation ratios
both through the placement API and through configuration options.

Problem description
===================

Manually set placement allocation ratios are overwritten
---------------------------------------------------------

There is currently no way for an admin to set the allocation ratio on an
individual compute node resource provider's inventory record in the placement
API without the resource tracker eventually overwriting that value the next
time it runs the ``update_available_resource`` periodic task on the
``nova-compute`` service.

The saga of the allocation ratio values on the compute host
------------------------------------------------------------

The process by which nova determines the allocation ratio for CPU, RAM and
disk resources on a hypervisor is confusing and `error`_ `prone`_. The
``compute_nodes`` table in the nova cell DB contains three fields representing
the allocation ratio for CPU, RAM and disk resources on that hypervisor. These
fields are populated using different default values depending on the version
of nova running on the ``nova-compute`` service.

.. _error: https://bugs.launchpad.net/nova/+bug/1742747
.. _prone: https://bugs.launchpad.net/nova/+bug/1789654

Upon starting up, the resource tracker in the ``nova-compute`` service worker
`checks`_ to see whether a record exists for itself in the ``compute_nodes``
table of the nova cell DB. If it does not find one, the resource tracker
`creates`_ a record in the table, `setting`_ the associated allocation ratio
values in the ``compute_nodes`` table to the values it finds in the
``cpu_allocation_ratio``, ``ram_allocation_ratio`` and
``disk_allocation_ratio`` nova.conf configuration options, but only if the
config option value is not equal to 0.0.

.. _checks: https://github.com/openstack/nova/blob/852de1e/nova/compute/resource_tracker.py#L566
.. _creates: https://github.com/openstack/nova/blob/852de1e/nova/compute/resource_tracker.py#L577-L590
.. _setting: https://github.com/openstack/nova/blob/6a68f9140/nova/compute/resource_tracker.py#L621-L645
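To make the current behavior concrete, here is a simplified sketch of the
logic described above. It paraphrases the linked resource tracker code rather
than quoting it; the function name and structure are illustrative only:

.. code-block:: python

    def _copy_allocation_ratios_from_conf(compute_node, conf):
        """Sketch of today's behavior: copy conf ratios unless they are 0.0."""
        for attr in ('cpu_allocation_ratio',
                     'ram_allocation_ratio',
                     'disk_allocation_ratio'):
            conf_value = getattr(conf, attr)
            # With the current 0.0 defaults this check is skipped, so a
            # freshly created record keeps the 0.0 sentinel values.
            if conf_value != 0.0:
                setattr(compute_node, attr, conf_value)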
The default values of the ``cpu_allocation_ratio``, ``ram_allocation_ratio``
and ``disk_allocation_ratio`` CONF options are `currently set`_ to ``0.0``.

.. _currently set: https://github.com/openstack/nova/blob/852de1e/nova/conf/compute.py#L400

The resource tracker saves these default ``0.0`` values to the
``compute_nodes`` table when it calls ``save()`` on the compute node object.
However, there is `code`_ in ``ComputeNode._from_db_obj`` that, upon
**reading** the record back from the database after the first save, changes
the values from ``0.0`` to ``16.0``, ``1.5`` or ``1.0``.

.. _code: https://github.com/openstack/nova/blob/852de1e/nova/objects/compute_node.py#L177-L207

The ``ComputeNode`` object that was ``save()``'d by the resource tracker has
these new values for some period of time while the record in the
``compute_nodes`` table continues to have the wrong ``0.0`` values. The next
time the resource tracker runs its ``update_available_resource()`` periodic
task, the new ``16.0``/``1.5``/``1.0`` values are saved to the
``compute_nodes`` table.

There is a `fix`_ for `bug/1789654`_ that stops the resource tracker from
persisting zero allocation ratios, to avoid initializing a placement
allocation_ratio with 0.0 (an allocation ratio of 0.0 is multiplied by the
total amount of inventory, so placement would report 0 usable resources on
the system).

.. _fix: https://review.openstack.org/#/c/598365/
.. _bug/1789654: https://bugs.launchpad.net/nova/+bug/1789654

Use Cases
---------

An administrator would like to set allocation ratios for individual resources
on a compute node via the placement API *without that value being overwritten*
by the compute node's resource tracker.

An administrator chooses to only use the configuration file to set allocation
ratio overrides on their compute nodes and does not want to use the placement
API to set these ratios.

Proposed change
===============

First, we propose to change the default values of the existing
``CONF.cpu_allocation_ratio``, ``CONF.ram_allocation_ratio`` and
``CONF.disk_allocation_ratio`` options from ``0.0`` to ``None``. The reason
for this change is that the ``0.0`` default is later silently changed to
``16.0``, ``1.5`` or ``1.0``, which is weird and confusing.

We will also change the resource tracker to **only** overwrite the compute
node's allocation ratios with the values of the ``cpu_allocation_ratio``,
``ram_allocation_ratio`` and ``disk_allocation_ratio`` CONF options **if the
value of these options is NOT ``None``**.

In other words, if any of these CONF options is set to something *other than*
``None``, then the CONF option is considered the complete override value for
that resource class' allocation ratio. Even if an admin manually adjusts the
allocation ratio of the resource class in the placement API, the next time the
``update_available_resource()`` periodic task runs, it will be overwritten
with the value of the CONF option, as the sketch below illustrates.
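A minimal sketch of this override rule, assuming the new ``None`` defaults;
the function name is illustrative, not Nova's actual code:

.. code-block:: python

    def _apply_allocation_ratio_overrides(compute_node, conf):
        """Only override ratios the operator explicitly configured."""
        for attr in ('cpu_allocation_ratio',
                     'ram_allocation_ratio',
                     'disk_allocation_ratio'):
            conf_value = getattr(conf, attr)
            if conf_value is not None:
                # A non-None option is a complete override; it wins even
                # over values set manually via the placement API.
                setattr(compute_node, attr, conf_value)
            # A None option leaves the existing value (and therefore any
            # manual placement API change) untouched.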
Second, we propose to add three new nova.conf configuration options:

* ``initial_cpu_allocation_ratio``
* ``initial_ram_allocation_ratio``
* ``initial_disk_allocation_ratio``

These will be used to determine how to set the *initial* allocation ratios of
the ``VCPU``, ``MEMORY_MB`` and ``DISK_GB`` resource classes when a compute
worker first starts up and creates its compute node record in the nova cell
DB and the corresponding inventory records in the placement service. The
values of these new configuration options will only be used if the compute
service's resource tracker is not able to find a record in the placement
service for the compute node the resource tracker is managing.

The default values of these CONF options shall be ``16.0``, ``1.5`` and
``1.0`` respectively, matching the defaults the original allocation ratio
CONF options had before they were set to ``0.0``.

These new ``initial_xxx_allocation_ratio`` CONF options shall **ONLY** be used
if the resource tracker detects no existing record in the ``compute_nodes``
nova cell DB table for that hypervisor.

Finally, we will also need to continue applying the ``xxx_allocation_ratio``
or ``initial_xxx_allocation_ratio`` config options when reading records from
the DB whose values are ``0.0`` or ``None``. For an existing record with 0.0
values, we do what the compute service does: use the configured
``xxx_allocation_ratio`` option if it is not ``None``, and fall back to the
``initial_xxx_allocation_ratio`` option otherwise.

In addition, we will add an online data migration that updates all
``compute_nodes`` table records that have ``0.0`` or ``None`` allocation
ratios. At some point we drop that compatibility code with a blocker
migration and remove the code in ``nova.objects.ComputeNode._from_db_obj``
that adjusts allocation ratios.

We also propose to add a nova-status upgrade check that iterates the cells
looking for ``compute_nodes`` records with ``0.0`` or ``None`` allocation
ratios and emits a warning that the online data migration has not been run.
We could also check the CONF options, and if they are explicitly set to 0.0,
fail the status check.

Alternatives
------------

None

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

None

Other deployer impact
---------------------

None

Developer impact
----------------

None

Upgrade impact
--------------

We need an online data migration for any ``compute_nodes`` records with
existing ``0.0`` or ``None`` allocation ratios. For an existing record with
0.0 values, we will replace them using the configured
``xxx_allocation_ratio`` option if it is not ``None``, falling back to the
``initial_xxx_allocation_ratio`` option otherwise; this shared fallback logic
is sketched below.

.. note:: Migrating 0.0 allocation ratios from existing ``compute_nodes``
   table records is necessary because the ComputeNode object based on those
   table records is what gets used in the scheduler [1]_, specifically the
   ``NUMATopologyFilter`` and ``CPUWeigher`` (the ``CoreFilter``,
   ``DiskFilter`` and ``RamFilter`` also use them, but those filters are
   deprecated for removal so they are not a concern here).

Clearly, in order to take advantage of the ability to manually set allocation
ratios on a compute node, that hypervisor would need to be upgraded.
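A minimal sketch of that shared fallback, with hypothetical helper and
constant names; the defaults mirror the proposed
``initial_xxx_allocation_ratio`` defaults:

.. code-block:: python

    # Proposed defaults for initial_cpu/ram/disk_allocation_ratio.
    INITIAL_RATIO_DEFAULTS = {
        'cpu_allocation_ratio': 16.0,
        'ram_allocation_ratio': 1.5,
        'disk_allocation_ratio': 1.0,
    }

    def _effective_ratio(attr, db_value, conf):
        """Resolve a ratio for a record whose stored value is 0.0 or None."""
        if db_value not in (None, 0.0):
            # A real stored value is authoritative; nothing to migrate.
            return db_value
        conf_value = getattr(conf, attr)
        if conf_value is not None:
            # An explicit xxx_allocation_ratio override wins.
            return conf_value
        # Otherwise fall back to the initial_xxx_allocation_ratio value.
        return getattr(conf, 'initial_' + attr, INITIAL_RATIO_DEFAULTS[attr])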
No impact to old compute hosts.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  yikun

Work Items
----------

* Change the default values for the ``CONF.xxx_allocation_ratio`` options to
  ``None``.
* Modify the resource tracker to only set allocation ratios on the compute
  node object when the CONF options are non- ``None``.
* Add the new ``initial_xxx_allocation_ratio`` CONF options and modify the
  resource tracker's initial compute node creation to use these values.
* Remove the code in ``ComputeNode._from_db_obj()`` that changes allocation
  ratio values.
* Add an online DB migration to process all ``compute_nodes`` records with
  existing ``0.0`` or ``None`` allocation ratios.
* Add a nova-status upgrade check for ``0.0`` or ``None`` allocation ratios.

Dependencies
============

None

Testing
=======

No extraordinary testing outside of normal unit and functional testing.

Documentation Impact
====================

A release note explaining the use of the new ``initial_xxx_allocation_ratio``
CONF options should be created, along with a more detailed document in the
admin guide explaining the following primary scenarios:

* When the deployer wants to **ALWAYS** set an override value for a resource
  on a compute node. This is where the deployer would ensure that the
  ``cpu_allocation_ratio``, ``ram_allocation_ratio`` and
  ``disk_allocation_ratio`` CONF options are set to a non- ``None`` value.
* When the deployer wants to set an **INITIAL** value for a compute node's
  allocation ratio but wants to allow an admin to adjust this afterwards
  without making any CONF file changes. This scenario uses the new
  ``initial_xxx_allocation_ratio`` options for the initial ratio values and
  then shows the deployer using the osc-placement commands to manually set an
  allocation ratio for a resource class on a resource provider.
* When the deployer wants to **ALWAYS** use the placement API to set
  allocation ratios. In this case the deployer should ensure that the
  ``CONF.xxx_allocation_ratio`` options are all set to ``None`` and should
  issue placement REST API calls to
  ``PUT /resource_providers/{uuid}/inventories/{resource_class}`` [2]_ or
  ``PUT /resource_providers/{uuid}/inventories`` [3]_ to set the allocation
  ratios of their resources as needed (or use the related ``osc-placement``
  plugin commands [4]_).

References
==========

.. [1] https://github.com/openstack/nova/blob/a534ccc5a7/nova/scheduler/host_manager.py#L255
.. [2] https://developer.openstack.org/api-ref/placement/#update-resource-provider-inventory
.. [3] https://developer.openstack.org/api-ref/placement/#update-resource-provider-inventories
.. [4] https://docs.openstack.org/osc-placement/latest/

Nova Stein PTG discussion:

* https://etherpad.openstack.org/p/nova-ptg-stein

Bugs:

* https://bugs.launchpad.net/nova/+bug/1742747
* https://bugs.launchpad.net/nova/+bug/1729621
* https://bugs.launchpad.net/nova/+bug/1739349
* https://bugs.launchpad.net/nova/+bug/1789654

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Stein
     - Proposed