Merge "Add spec for handling unknown schemas"

Jenkins 2015-11-26 04:11:20 +00:00 committed by Gerrit Code Review
commit ce21f8b706
1 changed files with 208 additions and 0 deletions


@ -0,0 +1,208 @@
..
==========================================
Policy Support for Unknown Table Schemas
==========================================
The new distributed architecture requires the policy engine to
handle the case where the schemas of some datasource drivers are
unknown. Today the policy engine assumes that the schemas of all
datasource drivers are known at load time. This spec outlines
a mechanism for supporting unknown schemas.
Problem description
===================
For the new distributed architecture, the policy engine will not know
the schema for all the datasources at the time rules are loaded from the
database. This is currently problematic because column-references are
compiled away at the time rules are loaded from the database, and that
compilation procedure requires the schema. Thus in the new architecture,
the policy engine will crash when it tries to load policy rules that
contain column references.
Proposed change
===============
The fix is to enable the policy engine to load rules that include
column references without compiling those references away.

Column references are compiled away today for two reasons. The first
is that the internal data structures for representing rules cannot
represent column references natively, so the references must be
removed at load time. The first task is therefore to extend the
internal data structures in compile.py so that they can represent
named columns.

The second reason is that column references cannot be evaluated
(even semantically) without the schema. A rule whose schema is
unknown therefore cannot take part in evaluation, so the second task
is to extend the run-time capabilities of the policy engine so that
rules can be disabled without being deleted. A disabled rule is
completely hidden from the evaluation engine when it answers queries,
yet remains visible, marked as "disabled", when users view the rules.
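
The two tasks above can be pictured with a minimal sketch. The class and
function names here (``Literal``, ``Rule``, ``visible_to_evaluator``) are
hypothetical and are not taken from compile.py; the sketch only shows one way
a rule could carry named-column arguments and a disabled flag.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Literal:
    """An atom such as nova:servers(id=x, network=y) (hypothetical class)."""
    table: str
    positional: Tuple[str, ...] = ()          # arguments given by position
    named: Tuple[Tuple[str, str], ...] = ()   # (column name, variable) pairs

    def has_column_refs(self) -> bool:
        return bool(self.named)


@dataclass
class Rule:
    """A rule that keeps its column references and a disabled flag."""
    head: Literal
    body: List[Literal]
    disabled: bool = False

    def tables_with_column_refs(self) -> Set[str]:
        # Tables whose schema is needed before this rule can be compiled.
        return {lit.table for lit in self.body if lit.has_column_refs()}


def visible_to_evaluator(rules: List[Rule]) -> List[Rule]:
    """Disabled rules are hidden from evaluation but not deleted."""
    return [r for r in rules if not r.disabled]
```

Listing rules for a user would iterate over all rules and report the flag,
while query evaluation would only ever see the filtered list.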
Every time a new rule is inserted, if the schema needed to resolve its
column references is unknown, that rule must be disabled. Moreover,
every rule using a table that depends on the table in the head of that
rule must also be disabled. Deletion is handled similarly, except that
deleting a rule can cause other rules to be enabled.

Every time a schema changes, all rules impacted by that change
should be checked for consistency, and disabled rules must be
enabled once all the schemas they depend on are known. Once a rule
is enabled, its column references are compiled away.
If a second schema arrives that differs from the first, the policy engine
must check for consistency and recompile any rules whose schema
may have changed.
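
The propagation described above amounts to a fixpoint computation over the
rule dependency graph. The following is a minimal sketch under simplifying
assumptions: rules are given as a plain dict, the function name
``disabled_closure`` is hypothetical, and a table counts as disabled as soon
as any rule defining it is disabled.

```python
def disabled_closure(rules, unresolved):
    """Compute every rule that must be disabled.

    rules:      dict mapping rule id -> (head table, set of body tables)
    unresolved: ids of rules whose column references cannot be resolved
                because a schema is unknown
    """
    disabled = set(unresolved)
    disabled_tables = {rules[rid][0] for rid in disabled}
    changed = True
    while changed:
        changed = False
        for rid, (head, body) in rules.items():
            # A rule that reads a disabled table is disabled too, and its
            # head table then becomes disabled for rules further downstream.
            if rid not in disabled and body & disabled_tables:
                disabled.add(rid)
                disabled_tables.add(head)
                changed = True
    return disabled
```

Recomputing the closure after a delete or a schema update and comparing it
with the previous result yields the rules to enable or disable. An
incremental algorithm would avoid the full recomputation, at the cost of
more bookkeeping.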
For example, if the following rule is inserted before the schemas
for nova-servers and neutron-networks are known, the rule will
be disabled since it has column references::

    p(x, z) :- nova:servers(id=x, network=y), neutron:networks(id=y, status=z)

When the nova schema becomes known, this rule is validated
against that schema but is not enabled because the neutron schema
is still unknown.
Finally, when the neutron schema becomes known, the column references
are compiled away and the rule is enabled.
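
To make the example concrete, here is a minimal sketch of compiling column
references into positional arguments once schemas are available. The function
names and the column lists used in the usage note are assumptions made for
illustration; they are not the real nova or neutron driver schemas, and the
sketch does not guard against a user variable that happens to be named like a
generated one.

```python
import itertools


def compile_literal(table, named_args, schema, fresh):
    """Rewrite named arguments into positional ones using the schema."""
    unknown = [col for col in named_args if col not in schema]
    if unknown:
        raise ValueError("%s has no column(s) %s" % (table, unknown))
    # Columns the rule does not mention get a fresh, otherwise unused variable.
    return (table, tuple(named_args[col] if col in named_args else next(fresh)
                         for col in schema))


def try_compile(body, schemas):
    """Validate against every known schema; compile only if all are known.

    Returns the positional body, or None if the rule must stay disabled.
    """
    fresh = ("_v%d" % i for i in itertools.count())
    compiled, complete = [], True
    for table, named_args in body:
        schema = schemas.get(table)
        if schema is None:
            complete = False      # schema still unknown: keep rule disabled
            continue
        compiled.append(compile_literal(table, named_args, schema, fresh))
    return compiled if complete else None
```

With the rule body from the example and the assumed schemas
``["id", "name", "network"]`` for nova:servers and ``["id", "status"]`` for
neutron:networks, ``try_compile`` returns ``None`` while either schema is
missing, and a fully positional body once both are present.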
Alternatives
------------
Instead of disabling rules, another option is to modify the
evaluation engine to do a best-effort query evaluation. The evaluation
algorithms themselves would know about column-references, and would
attempt to operate even if the schema was unknown.
The downside to this alternative is that the rules are actually
semantically ambiguous, and hence the result of evaluation has
unknown semantic value.
Policy
------
N/A
Policy actions
--------------
N/A
Data sources
------------
N/A
Data model impact
-----------------
No database modifications are required.
REST API impact
---------------
The Rule object will have an additional boolean field representing whether
or not the rule is disabled.
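
As an illustration only, a rule returned by the API might then look like the
dict built below. The key name ``disabled`` and the helper ``rule_to_api``
are assumptions, since this spec does not fix the field name, and the other
keys are shown only for context.

```python
def rule_to_api(rule_id, rule_text, disabled):
    """Build a hypothetical API representation of a policy rule."""
    return {
        "id": rule_id,
        "rule": rule_text,
        # New boolean field proposed by this spec (name is an assumption).
        "disabled": bool(disabled),
    }
```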
Security impact
---------------
N/A
Notifications impact
--------------------
N/A
Other end user impact
---------------------
N/A
Performance impact
------------------
Rule inserts could now be slower: if the inserted rule gets disabled,
many other rules that depend on it may have to be disabled as well.
Likewise, a rule deletion could cause many policy rules to be enabled.
Schema updates are expensive because the policy engine must run consistency
checks on all affected rules and potentially recompile them.
Other deployer impact
---------------------
N/A
Developer impact
----------------
N/A
Implementation
==============
Assignee(s)
-----------
Primary assignee:
thinrichs
Other contributors:
<launchpad-id or None>
Work items
----------
1. Modify compile.py data structures to natively represent column
   references. Include a 'disabled' flag.
2. Modify the query evaluation engine to ignore disabled rules.
3. Modify triggers to ignore disabled tables.
4. Enable/disable rules on insert/delete/set-schema.

   - Write a dependency analysis routine to compute the rules/tables
     that are disabled once a given table is disabled.
     Likely to need a data structure that tracks disabled tables.
   - Modify the update routine to do a schema check and enable/disable rules
     as appropriate using the dependency analysis.
   - Modify set-schema to appropriately enable/disable rules.
     May want to add a field to rules that says which tables the compilation
     was dependent on.
Dependencies
============
N/A
Testing
=======
Unit test coverage should be mostly adequate.
The only real need for tempest tests would be to test the startup of Congress,
but tempest does not support that.
Documentation impact
====================
N/A
References
==========
The need for this spec was discussed at the Liberty Midcycle Sprint.
https://etherpad.openstack.org/p/congress-liberty-sprint