Address follow-on comments in strict-two-level spec

This change addresses concerns raised in the original review of the
strict two-level enforcement model:

  Ibfb2ba2ffb0115fa7cf81d30bf9a025652d9ba42

bp strict-two-level-enforcement-model

Change-Id: I7190443de8be189eac06a6f01f99a5e5bfabbbc9
Lance Bragstad 2018-06-07 15:13:31 +00:00
parent d4e19f17ff
commit 3cdec964e2
1 changed file with 84 additions and 48 deletions


@@ -100,13 +100,12 @@ respectively.
A -> B;
A -> C;
A [label="A (l=20, u=0)"];
A [label="A (l=20, u=4)"];
B [label="B (u=0)"];
C [label="C (u=0)"];
}
Technically, both ``B`` and ``C`` can use up to 10 ``cores`` each and consume
the entire limit for the tree. Resulting in:
Technically, both ``B`` and ``C`` can use up to 8 ``cores`` each, resulting in:
.. blockdiag::
@@ -116,15 +115,16 @@ the entire limit for the tree. Resulting in:
A -> B;
A -> C;
A [label="A (l=20, u=0)"];
B [label="B (u=10)", textcolor = "#00af00"];
C [label="C (u=10)", textcolor = "#00af00"];
A [label="A (l=20, u=4)"];
B [label="B (u=8)", textcolor = "#00af00"];
C [label="C (u=8)", textcolor = "#00af00"];
}
If ``A`` attempts to claim two ``cores``, the usage check will fail because
``oslo.limit`` will fetch the hierarchy from keystone and check the usage of
each project in the hierarchy by using the callback provided by the service to
see that both ``B`` and ``C`` have 10 ``cores`` each:
If ``A`` attempts to claim two more ``cores``, the usage check will fail
because ``oslo.limit`` will fetch the hierarchy from keystone, check the usage
of each project in the hierarchy using the callback provided by the service,
and see that the total usage of ``A``, ``B``, and ``C`` is equal to the limit
of the tree, set by ``A.limit``.
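The check described above can be sketched as follows. This is an illustrative
model only; the flat ``usage`` mapping and ``usage_callback`` are hypothetical
stand-ins for what ``oslo.limit`` would obtain from keystone and the service,
not the library's actual API:

```python
# Hypothetical sketch of the strict-two-level usage check: sum usage for
# every project in the tree and compare it against the parent's limit.

def enforce(parent, children, limit, requested, usage_callback):
    """Fail if the tree's total usage plus the request exceeds the limit."""
    tree_usage = usage_callback(parent)
    tree_usage += sum(usage_callback(child) for child in children)
    if tree_usage + requested > limit:
        raise RuntimeError(
            "claim of %d exceeds tree limit %d (current usage %d)"
            % (requested, limit, tree_usage))

# Values from the example: A has l=20, u=4; B and C each use 8 cores.
usage = {"A": 4, "B": 8, "C": 8}
enforce("A", ["B", "C"], limit=20, requested=0, usage_callback=usage.get)
try:
    # A's attempt to claim two more cores fails: 20 + 2 > 20.
    enforce("A", ["B", "C"], limit=20, requested=2, usage_callback=usage.get)
except RuntimeError as exc:
    print(exc)
```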
.. blockdiag::
@@ -134,9 +134,9 @@ see that both ``B`` and ``C`` have 10 ``cores`` each:
A -> B;
A -> C;
A [label="A (l=20, u=2)", textcolor = "#FF0000"];
B [label="B (u=10)"];
C [label="C (u=10)"];
A [label="A (l=20, u=6)", textcolor = "#FF0000"];
B [label="B (u=8)"];
C [label="C (u=8)"];
}
Despite the usage of the tree being equal to the limit, we can still add
@@ -151,9 +151,9 @@ children to the tree:
A -> C;
A -> D;
A [label="A (l=20, u=0)"];
B [label="B (u=10)"];
C [label="C (u=10)"];
A [label="A (l=20, u=4)"];
B [label="B (u=8)"];
C [label="C (u=8)"];
D [label="D (u=0)", textcolor = "#00af00"];
}
@@ -169,9 +169,9 @@ tree prevents ``D`` from claiming any ``cores``:
A -> C;
A -> D;
A [label="A (l=20, u=0)"];
B [label="B (u=10)"];
C [label="C (u=10)"];
A [label="A (l=20, u=4)"];
B [label="B (u=8)"];
C [label="C (u=8)"];
D [label="D (u=2)", textcolor = "#FF0000"];
}
@@ -181,9 +181,9 @@ because it provides a very clear escalation path. When a request fails because
the tree limit has been exceeded, a user has all the information they need to
provide meaningful context in a support ticket (e.g. their project ID and the
parent project ID). An administrator of project ``A`` should be able to
reshuffle usage accordingly. A system administrator should be able to do the
same thing. Providing this information in tree structures with more than a
depth of two is much harder, but may be implemented with a separate model.
reshuffle usage accordingly. A system administrator should be able to do so
as well.
Providing this information in tree structures with more than a depth of two is
much harder, but may be implemented with a separate model.
.. blockdiag::
@@ -194,9 +194,9 @@ depth of two is much harder, but may be implemented with a separate model.
A -> C;
C -> D;
A [label="A (l=20, u=0)"];
B [label="B (u=10)"];
C [label="C (u=10)"];
A [label="A (l=20, u=4)"];
B [label="B (u=8)"];
C [label="C (u=8)"];
D [label="D (u=0)", textcolor = "#FF0000"];
}
@@ -211,14 +211,15 @@ project-specific override for ``cores``:
A -> B;
A -> C;
A [label="A (l=20, u=0)"];
B [label="B (l=12, u=10)", textcolor = "#00af00"];
C [label="C (u=10)"];
A [label="A (l=20, u=4)"];
B [label="B (l=12, u=8)", textcolor = "#00af00"];
C [label="C (u=8)"];
}
Note that regardless of this update, any subsequent requests to claim more
``cores`` in the tree will be forbidden since the usage is equal to the limit
of the ``A``. If ``cores`` are released from ``C``, ``B`` can claim them:
of ``A``. If ``cores`` are released from ``A`` and ``C``, ``B`` can claim
them:
.. blockdiag::
@@ -228,9 +229,9 @@ of the ``A``. If ``cores`` are released from ``C``, ``B`` can claim them:
A -> B;
A -> C;
A [label="A (l=20, u=0)"];
B [label="B (l=12, u=10)"];
C [label="C (u=8)", textcolor = "#00af00"];
A [label="A (l=20, u=2)", textcolor = "#00af00"];
B [label="B (l=12, u=8)"];
C [label="C (u=6)", textcolor = "#00af00"];
}
.. blockdiag::
@@ -241,9 +242,9 @@ of the ``A``. If ``cores`` are released from ``C``, ``B`` can claim them:
A -> B;
A -> C;
A [label="A (l=20, u=0)"];
A [label="A (l=20, u=2)"];
B [label="B (l=12, u=12)", textcolor = "#00af00"];
C [label="C (u=8)"];
C [label="C (u=6)"];
}
While ``C`` is still under its default allocation of 10 ``cores``, it won't be
@@ -259,9 +260,9 @@ had:
A -> B;
A -> C;
A [label="A (l=20, u=0)"];
A [label="A (l=20, u=2)"];
B [label="B (l=12, u=12)"];
C [label="C (u=10)", textcolor = "#FF0000"];
C [label="C (u=8)", textcolor = "#FF0000"];
}
Creating or updating a project with a limit that exceeds the limit of ``A`` is
@@ -362,12 +363,17 @@ to validate limits before writing them to the backend.
If keystone is configured to use the ``strict-two-level`` enforcement model and
current project structure within keystone violates the two-level project
constraint, keystone should fail to start. To aid operators, we can develop a
``keystone-manage`` command, to check the hierarchical structure of the
projects in the deployment and warn operators if keystone is going to fail to
start. This gives operators the ability to check and fix their project
hierarchy before they deploy keystone with the new model. This clearly
communicates a set project structure to operators at run time.
constraint, keystone should fail to start. On start-up, keystone will scan the
database to ensure that no project hierarchy exceeds two levels and that
``keystone.conf [DEFAULT] max_project_tree_depth = 2``. If either condition
fails, keystone will report an appropriate error message and refuse to start.
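The depth validation could look something like the following sketch. The
``{project_id: parent_id}`` mapping is an illustrative simplification, not
keystone's actual schema or internal API:

```python
# Illustrative start-up check for the strict-two-level model: walk each
# project's parent chain and reject any hierarchy deeper than two levels.

def max_tree_depth(projects):
    """Return the deepest level in a forest of {project_id: parent_id}."""
    def depth(pid):
        d = 1
        while projects.get(pid) is not None:
            pid = projects[pid]
            d += 1
        return d
    return max(depth(pid) for pid in projects) if projects else 0

def check_startup(projects, configured_depth):
    """Refuse to start unless the tree and config satisfy the model."""
    if configured_depth != 2 or max_tree_depth(projects) > 2:
        raise SystemExit(
            "strict-two-level enforcement requires a project hierarchy no "
            "deeper than two levels and max_project_tree_depth = 2")

# A valid two-level tree: A is the root, B and C are its children.
check_startup({"A": None, "B": "A", "C": "A"}, configured_depth=2)
```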
To aid operators, we can develop a ``keystone-manage`` command to check the
hierarchical structure of the projects in the deployment and warn operators if
keystone is going to fail to start. This gives operators the ability to check
and fix their project hierarchy before they deploy keystone with the new model.
This clearly communicates a set project structure to operators at run time.
Proposed Library Changes & Consumption
--------------------------------------
@@ -544,22 +550,52 @@ None
Performance Impact
------------------
Usage Caching
^^^^^^^^^^^^^
Performance of this model is expected to be sub-optimal in comparison to flat
enforcement. The main factor contributing to expected performance loss is the
calculation of usage for the tree. The ``oslo.limit`` library will need to
calculate the usage for every project in the tree in order to provide an answer
to the service regarding the request.
One possible solution to mitigate performance concerns here would be to
calculate usage for projects in parallel.
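A parallel usage calculation could be sketched as below; ``usage_callback`` is
a hypothetical stand-in for the per-project usage callback a service would
register with ``oslo.limit``:

```python
# Query each project's usage concurrently instead of serially, so the
# cost of the tree-wide check is closer to one callback round-trip.
from concurrent.futures import ThreadPoolExecutor

def tree_usage(project_ids, usage_callback, max_workers=8):
    """Sum usage across a project tree, querying projects in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(usage_callback, project_ids))

usage = {"A": 4, "B": 8, "C": 8}
print(tree_usage(["A", "B", "C"], usage.get))  # prints 20
```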
Limit Caching
^^^^^^^^^^^^^
Other services will be required to make additional calls to keystone to
retrieve limit information in order to do quota enforcement. This will add some
overhead to the overall performance of the API call.
overhead to the overall performance of the API call because it requires a
round-trip to keystone.
It is also worth noting that both Registered Limits and Project Limits are not
expected to change frequently. This means the data is safe to cache for some
period of time. Caching has already been implemented internally to keystone,
similar to how keystone caches responses for other resources. Caching can also
be done client-side to avoid making frequent calls to keystone for relatively
static limit information.
expected to change frequently. This means that the limit data could be
cached client-side in ``oslo.limit``. However, this introduces concerns about
limit invalidation. Consider the following example:
* client-side cache TTL is set to one hour for limit information from keystone
* one minute later, an administrator decreases the limit for ``cores`` on a
particular project
* two minutes later, a user makes a request to create an instance in the
project that should bring them over the limit just set by the administrator
* due to client-side caching, the service considers the project within its
limit and allows the instance to be created
* the current usage is out of sync with the limit set by the administrator and
the service won't realize this until the TTL expires in another 57 minutes
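The staleness window in the steps above can be demonstrated with a minimal
TTL cache; the ``TTLCache`` class and the ``limits`` mapping are illustrative
only, not an existing caching library or keystone's API:

```python
# A stale TTL cache keeps serving the old limit after an administrator
# lowers it, so the service enforces against out-of-date data.
import time

class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry timestamp)

    def get(self, key, fetch):
        value, expiry = self.store.get(key, (None, 0))
        if time.monotonic() >= expiry:
            value = fetch(key)  # the round-trip to keystone
            self.store[key] = (value, time.monotonic() + self.ttl)
        return value

limits = {"cores": 20}
cache = TTLCache(ttl_seconds=3600)
cache.get("cores", limits.get)   # caches 20 for one hour
limits["cores"] = 10             # administrator lowers the limit
stale = cache.get("cores", limits.get)
print(stale)                     # prints 20: the update is invisible until the TTL expires
```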
Client-side caching is going to be a very specific case that needs to be
handled carefully because cache invalidation strategies are going to be
distributed across services. One possible mitigation would be for client-side
caching and keystone to share the same cache instance, making it easier to
perform cache invalidation. However, this raises the operational bar for
administrators and requires assumptions about underlying infrastructure.
Until we know we can make those types of assumptions or find an alternative
solution for cache invalidation, client-side caching should be avoided to
prevent situations like the one described above. We should err on the side of
accuracy when retrieving limit information.
Other Deployer Impact
---------------------