From b4db4ea9a3b500f658ac561cad8e240afcbbd06d Mon Sep 17 00:00:00 2001
From: Dan Smith <dansmith@redhat.com>
Date: Wed, 14 Feb 2018 10:11:19 -0800
Subject: [PATCH] Add placement-req-filter spec

Change-Id: I0d874e56cfe50d21410e01c85e1fbd67bd32278f
---
 specs/rocky/approved/placement-req-filter.rst | 236 ++++++++++++++++++
 1 file changed, 236 insertions(+)
 create mode 100644 specs/rocky/approved/placement-req-filter.rst

diff --git a/specs/rocky/approved/placement-req-filter.rst b/specs/rocky/approved/placement-req-filter.rst
new file mode 100644
index 000000000..dd31302a7
--- /dev/null
+++ b/specs/rocky/approved/placement-req-filter.rst
@@ -0,0 +1,236 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+===========================
+Placement Request Filtering
+===========================
+
+https://blueprints.launchpad.net/nova/+spec/placement-req-filter
+
+As we move to having the scheduler rely on placement for providing the
+initial host list, we discover other use cases and edge scenarios
+where this may not be as efficient as we hope. With the goal of
+getting cellsv1 users converted over to cellsv2, we also have to
+consider deployment layouts that are in place that may be hard to
+change, or have other benefits to very large users.
+
+This spec specifically addresses one such concern of existing cellsv1
+users, but represents a class of problems that center around a need to
+make a more efficient request to placement than one purely based on
+the resources and traits implied by the flavor the user has
+chosen. Thus while a solution to this single problem is the goal of
+the implementation described here, we aim to provide a generic
+mechanism for solving those other problems along the way.
+
+Problem description
+===================
+
+With cellsv1, some deployments use the top-level filtering cell
+scheduler to pre-select a cell based soley on the tenant of the
+user making the request. This then limits the amount of work
+(i.e. hosts that must be filtered) that the scheduler within the cell
+must do in order to make a selection. With the global nature of
+cellsv2's scheduler and placement scope, this is not currently
+possible. Thus, for a cloud with a large amount of free space, a
+modest request that previously only considered ~200 hosts within a
+cell (due to pre-selection of a cell by tenant) may now have to filter
+many thousands of hosts in order to make a selection, most of which
+are categorically not valid based on the tenant mapping.
+
+Use Cases
+---------
+
+As a deployer, I wish to segregate users into cells for technical,
+security, or budget reasons and need efficient scheduling of the
+resources within those cells.
+
+Proposed change
+===============
+
+This spec aims to add a small and lightweight mechanism to the early
+phase of the scheduling process, where the request to placement is
+formed from things like the flavor selected by the user. It should
+provide us a way to opt in to certain behaviors, represented by simple
+modular transformations made to the `RequestSpec` object before we
+make the request to placement.
+
+These modules will be called "request filters" and will perform
+transformations on the `RequestSpec` object. They will be enabled
+initially through dedicated configuration variables (ideally boolean)
+in the short term for the sake of simplicity. As we grow more of
+these, it may make sense to enable a list-of-request-filters sort of
+configuration paradigm, like our existing scheduler filters.
+
+For the tenant-to-cell limiting functionality, a single new request
+filter will be provided and enabled by a single boolean configuration
+knob in the ``scheduler`` group. When enabled, this filter will:
+
+#. Look for host aggregates with metadata items of
+   `filter_tenant_id=$tenant`, for the tenant id making the request
+#. Augment the `RequestSpec` object to indicate that the result
+   should be limited to the matching aggregates
+#. Fail if no aggregates match
+
+This depends on placement aggregates overlaying with host aggregates
+configured with this key. Mirroring of those aggregates has been
+planned to happen automatically in nova, but this functionality will
+work with manual aggregate setup until that point and would only be
+required by deployers wishing to use this feature.
+
+To make this work, we will need to extend the
+``RequestSpec.destination`` to contain an ``aggregates`` field, peer
+to the ``host``, ``node``, and ``cell`` limits already present. The
+``get_allocation_candidates()`` scheduler client method will also need
+to consider those aggregates and pass the UUIDs to placement,
+indicating that the resulting nodes must be members of one of those
+aggregates. The aggregate metadata key used here
+(``filter_tenant_id``) is the same as the one used by the
+AggregateMultiTenancyIsolation scheduler filter to accomplish the same
+thing via filtering. As such, existing users of that filter will be
+able to easily convert to this request filter approach which will be
+more efficient as well.
+
+Alternatives
+------------
+
+We could build knowledge of tenant affinity into placement
+itself. This would require a less generic change to the API, as well
+as require another purpose-built change for the next thing we need
+along these lines.
+
+We could not provide a mechanism for this sort of filtering. This may
+result in cellsv1 users not migrating to cellsv2, hamper cellsv2
+adoption in general, or worst-case cause cellsv1 users to migrate away
+from nova.
+
+We could require that deployers handle this by assigning private
+flavors with trait requirements to control scheduling. This would
+result in a flavor explosion (aka `The Skittles(tm) Effect`) for cases
+like the one driving this, where all tenants would need their own
+flavors.
+
+We could also take the approach of a request filter, but instead of
+mapping tenants to host/placement aggregates, simply map them to a
+an auto-generated trait with the tenant id in the name. This approach
+would require a lot more trait churn on hosts when changing the
+boundaries, but may be a very viable option for other use-cases of the
+request filter pattern.
+
+Data model impact
+-----------------
+
+The `Destination` object (stored in the `RequestSpec`) will need to
+gain an `AggregateList` field. Besides this, no other data model
+changes will be required (`Aggregate` already has metadata for us to
+use).
+
+REST API impact
+---------------
+
+The Nova REST API will not be changed. The placement API will need to
+provide for aggregates to be specified in the ``allocation_candidates``
+query, which will be handled as part of a different spec.
+
+Security impact
+---------------
+
+None.
+
+Notifications impact
+--------------------
+
+None.
+
+Other end user impact
+---------------------
+
+None.
+
+Performance Impact
+------------------
+
+Large performance gain in some circumstances, derived from the ability
+to consider smaller groups of hosts during scheduling.
+
+Other deployer impact
+---------------------
+
+No impact to deployers not choosing to enable the
+functionality. Direct impact to deployers that need to be able to
+isolate tenants into cells (or other aggregates).
+
+Developer impact
+----------------
+
+None.
+
+Upgrade impact
+--------------
+
+None other than the usual placement-before-nova requirement when we
+add something to placement that nova depends on.
+
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  danms
+
+Work Items
+----------
+
+#. Add `AggregateList` to `Destination` object
+#. Add a query method to AggregateList that allows filtering by key
+   and value
+#. Make scheduler request to placement include aggregate members
+#. Add a lightweight request filter mechanism
+#. Add a request filter that does the tenant-to-aggregate mapping operation
+
+
+Dependencies
+============
+
+* This will require adding aggregate membership to
+  the `allocation_candidates` API, which is covered by:
+  https://blueprints.launchpad.net/nova/+spec/alloc-candidates-member-of
+* While not a hard dependency, this will be more automatic with
+  mirroring of host aggregates into placement, which is covered by:
+  https://blueprints.launchpad.net/nova/+spec/placement-mirror-host-aggregates
+
+
+Testing
+=======
+
+* Unit and functional tests for the filter mechanism, filter itself,
+  and the scheduler-to-placement API changes are simple
+
+Documentation Impact
+====================
+
+* Compute scheduler admin guide updates to describe the setup and use
+  of this feature
+
+References
+==========
+
+* Discussion with CERN folks about their requirements for moving from
+  cellsv1: http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-02-14.log.html#t2018-02-14T15:41:34
+
+
+History
+=======
+
+.. list-table:: Revisions
+   :header-rows: 1
+
+   * - Release Name
+     - Description
+   * - Rocky
+     - Introduced