diff --git a/specs/mitaka/unknown-schemas.rst b/specs/mitaka/unknown-schemas.rst
new file mode 100644
index 0000000..076935c
--- /dev/null
+++ b/specs/mitaka/unknown-schemas.rst
@@ -0,0 +1,208 @@
..

==========================================
Policy Support for Unknown Table Schemas
==========================================

The new distributed architecture requires the policy engine to
handle the case where the schemas for some datasource drivers are
unknown. Today the policy engine assumes the schemas for all
datasource drivers are known at load-time. This spec outlines
a mechanism for supporting unknown schemas.


Problem description
===================

In the new distributed architecture, the policy engine will not know
the schema for every datasource at the time rules are loaded from the
database. This is currently problematic because column references are
compiled away when rules are loaded from the database, and that
compilation procedure requires the schema. Thus, in the new architecture,
the policy engine will crash when it tries to load policy rules that
contain column references.


Proposed change
===============

The fix for this problem is to enable the policy engine to load rules
that include column references without compiling away those column references.
The main reason column references are compiled away is that the
internal data structures for representing rules cannot represent
column references natively, and hence the column references must be
removed at load time. The first task is to extend the internal
data structures in compile.py so they can represent named columns.

The second reason column references get compiled away is that they cannot
be evaluated (even semantically) without the schema. The second task
is therefore to extend the run-time capabilities of the policy engine so that
rules can be disabled without being deleted. A disabled rule is
completely hidden from the evaluation engine when answering queries,
yet remains visible, marked as "disabled", when users view
the rules.

Every time a new rule is inserted, if its schema is unknown, that
rule must be disabled. Moreover, every rule using a table that depends
on the table in the head of that rule must be disabled. Deletion is
handled similarly, except that deletion can cause other rules to be enabled.

Every time the schema changes, all rules impacted by that schema
change should be checked for consistency, and disabled rules
must be enabled once all of their schemas are known. Once a rule
is enabled, its column references are compiled away.

If a second schema arrives (unequal to the first), the policy engine
must check for consistency and recompile any rules whose schema
may have changed.

For example, if the following rule is inserted before the schemas
for nova:servers and neutron:networks are known, the rule will
be disabled since it has column references::

  p(x, z) :- nova:servers(id=x, network=y), neutron:networks(id=y, status=z)

When the nova schema becomes known, this rule is validated
against that schema but is not enabled, because the neutron schema
is still unknown.

Finally, when the neutron schema becomes known, the column references
are compiled away and the rule is officially enabled.
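To make the enable/disable mechanics concrete, here is a minimal standalone
Python sketch of the bookkeeping described above: track which datasource
schemas are known, disable a rule (and, transitively, its dependents) when a
schema it needs is missing, and re-check disabled rules whenever a schema
arrives. The names used (``Rule``, ``RuleRegistry``, ``set_schema``) are
purely illustrative assumptions and are not part of the existing compile.py
or runtime code::

  # Illustrative sketch only: class and method names are hypothetical and do
  # not reflect the actual Congress implementation in compile.py or the runtime.

  from collections import namedtuple

  # A rule is summarized by the table in its head and the tables in its body.
  Rule = namedtuple('Rule', ['id', 'head_table', 'body_tables'])


  class RuleRegistry(object):
      """Track which rules are disabled because some schema is still unknown."""

      def __init__(self):
          self.rules = {}             # rule id -> Rule
          self.known_schemas = set()  # datasource names whose schemas are known
          self.disabled = set()       # ids of rules hidden from evaluation

      def insert(self, rule):
          self.rules[rule.id] = rule
          if not self._schemas_known(rule):
              self._disable_with_dependents(rule)

      def set_schema(self, datasource):
          """Record that a datasource schema is now known and re-enable rules."""
          self.known_schemas.add(datasource)
          changed = True
          while changed:           # iterate to a fixpoint, since enabling one
              changed = False      # rule may allow its dependents to be enabled
              for rule in self.rules.values():
                  if rule.id in self.disabled and self._can_enable(rule):
                      # The real engine would validate the rule against the
                      # schema here and compile its column references away.
                      self.disabled.discard(rule.id)
                      changed = True

      def _schemas_known(self, rule):
          # A table like 'nova:servers' belongs to datasource 'nova'; tables
          # without ':' are defined by other policy rules, not by datasources.
          datasources = {t.split(':')[0] for t in rule.body_tables if ':' in t}
          return datasources <= self.known_schemas

      def _can_enable(self, rule):
          disabled_heads = {self.rules[i].head_table
                            for i in self.disabled if i != rule.id}
          return (self._schemas_known(rule) and
                  not set(rule.body_tables) & disabled_heads)

      def _disable_with_dependents(self, rule):
          """Disable a rule and, transitively, rules that use its head table."""
          self.disabled.add(rule.id)
          for other in self.rules.values():
              if (other.id not in self.disabled and
                      rule.head_table in other.body_tables):
                  self._disable_with_dependents(other)


  # The example rule from above stays disabled until both schemas arrive.
  registry = RuleRegistry()
  registry.insert(Rule('r1', 'p', ('nova:servers', 'neutron:networks')))
  assert 'r1' in registry.disabled
  registry.set_schema('nova')     # still disabled: neutron schema unknown
  registry.set_schema('neutron')  # both schemas known, so the rule is enabled
  assert 'r1' not in registry.disabled

Rule deletion is omitted from the sketch; as described above, it would run the
same dependency analysis in reverse and could re-enable rules.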
Alternatives
------------

Instead of disabling rules, another option is to modify the
evaluation engine to do best-effort query evaluation. The evaluation
algorithms themselves would know about column references and would
attempt to operate even if the schema were unknown.

The downside of this alternative is that such rules are
semantically ambiguous, and hence the result of evaluation has
unknown semantic value.


Policy
------

N/A

Policy actions
--------------

N/A

Data sources
------------

N/A

Data model impact
-----------------

No database modifications are required.


REST API impact
---------------

The Rule object will have an additional boolean field representing whether
or not the rule is disabled.


Security impact
---------------

N/A


Notifications impact
--------------------

N/A

Other end user impact
---------------------

N/A

Performance impact
------------------

Rule inserts could now be slower: if the inserted rule gets disabled,
that can cause many other rules to be disabled as well.

Rule deletions can likewise cause many policy rules to be enabled.

Schema updates are expensive because the policy engine must run consistency
checks on all relevant rules and potentially recompile them.


Other deployer impact
---------------------

N/A

Developer impact
----------------

N/A

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  thinrichs

Other contributors:


Work items
----------

1. Modify the compile.py data structures to natively represent column
   references. Include a 'disabled' flag.
2. Modify the query evaluation engine to ignore disabled rules.
3. Modify triggers to ignore disabled tables.
4. Enable/disable rules on insert/delete/set-schema:

   - Write a dependency analysis routine to compute the rules/tables
     that are disabled once a given table is disabled.
     This will likely need a data structure that tracks disabled tables.
   - Modify the update routine to do the schema check and enable/disable
     rules as appropriate, using the dependency analysis.
   - Modify set-schema to appropriately enable/disable rules.
     We may want to add a field to rules recording which tables the
     compilation depended on.


Dependencies
============

N/A

Testing
=======

Unit test coverage should be mostly adequate.

The only real need for tempest tests would be testing the startup of
Congress, but that is not supported with tempest.


Documentation impact
====================

N/A

References
==========

The need for this spec was discussed at the Liberty Midcycle Sprint:
https://etherpad.openstack.org/p/congress-liberty-sprint