diff --git a/specs/newton/approved/dependency-driven-resource-deallocation.rst b/specs/newton/approved/dependency-driven-resource-deallocation.rst new file mode 100644 index 0000000..3250ad3 --- /dev/null +++ b/specs/newton/approved/dependency-driven-resource-deallocation.rst @@ -0,0 +1,544 @@ +.. + This work is licensed under a Creative Commons Attribution 3.0 Unported + License. + + http://creativecommons.org/licenses/by/3.0/legalcode + +================================================== +Dependency-driven multi-step resource deallocation +================================================== + +https://blueprints.launchpad.net/murano/+spec/dependency-driven-resource-deallocation + +Murano components may allocate various kinds of resources: virtual machines, +networks, volumes etc. When these components get removed from the deployment +appropriate resources have to be deallocated. Current implementation of this +process has some significant limitations and flaws. + +This specification aims to address these issues and provide a design for the +better resource deallocation / garbage collection system for MuranoPL. + +Problem description +=================== + +In Murano the deallocation of resources is managed by a garbage collection +system (GC). Its present implementation is based on the execution of special +methods called ``.destroy()`` which may be defined in each MuranoPL class and +are intended to contain the custom code to deallocate resources allocated by +the objects of those classes. +These methods get executed when their objects leave object graph, however the +exact order of these executions is currently undefined. + +There are two different scenarios when the objects may leave the object graph +thus causing their ``.destroy`` methods to be called: + +* "Offline" changes of the Object Graph, i.e. the changes introduced in the + serialized version of the object graph via the API. These changes are + detected by comparing the incoming object graph (the one passed from the + API for deployment) with the "snapshot" of current environment made after + the previous deployment has been completed. If some object exists in the + "snapshot" but is missing in the input graph it is considered to be + removed. Such objects are deserialized from the snapshot and their + ``.destroy`` methods are called in the order from deepest nested objects + towards topmost ones. + +* "Runtime" changes. Some objects may be removed from the object graph during + deployment: they may be unreferenced or assigned to Runtime properties or + local variable only. As a result after the deployment completes these + objects are not serialized neither into the output version of the object + graph nor into its "snapshot". The next deployments will know nothing about + their existence so the objects will be lost forever. To recover the + resources allocated by such unreferenced objects murano analyzes its + ObjectStore after the execution is complete. Each object which is present in + the store but is not present in the output version of the object graph is + considered to be "orphan" and thus its ``.destroy`` method is called. The + order of these calls is currently undefined: the objects get destroyed based + on their position in the object store, which is hardly predictable. + +Such a design is insufficient for production grade applications which +often require the following scenarios: + +* If some object is going to be deleted another object (either owning or just + referencing it) may need to execute some actions before or after the object + is deleted. + +* When a group of nested or interconnected objects is about to be deleted the + order in which their destructors should be executed may be different in + different cases. + +* Sometimes the actions being executed during the destruction of an object may + depend on the fact whether some other object is about to be deleted or not. + + + *Example* + + *Consider an Application which consists of a Network component and several + VM components. All the components are owned by the Application but there + are no ownership relationships between them. When the Application is going + to be deleted (i.e. its whole subgraph leaves the environment) all of its + components are about to be removed as well, and so their ``.destroy`` + methods will be called. Since Network does not own the VM's the order + of these calls is undefined. Due to various implementation details it may + be impossible to remove the Network before the VMs which are connected to + it (e.g. in case when the VM has a mandatory requirement to be always + connected to at least one network). In this case the ``.destroy`` of a + Network component should be always called after all the VMs have been + destroyed.* + +Proposed change +=============== + +Several improvements have to be made in Murano engine to address the problems +described above. + +General concepts +---------------- + +Destruction dependencies +^^^^^^^^^^^^^^^^^^^^^^^^ + +There should be a way to establish directional `destruction dependencies` +between Murano objects. If object `Foo` establishes such a dependency on the +object `Bar` then: + +* `Foo` will be notified when `Bar` is about to be destroyed. These + notifications are covered in details in "Multi-step destruction" section + below. + +* If both `Foo` and `Bar` are going to be destroyed in the same garbage + collection execution, `Bar` will be destroyed before Foo. + +These dependencies are not related to object ownership relationship or +property-based cross-references: the owner may have a destruction dependency on +its nested object or vise versa; the objects referencing each other may have +some destruction dependency established. Even entirely unrelated objects may +have a destruction dependency between them. + +Since the destruction dependencies are directional there is a theoretical +possibility of a circular dependency to exist. In case if two or more objects +form such a circle they will still be notified about pending destruction of +their dependencies, however the order of this notifications - and the +destruction itself - is undefined in this case. + +For now it's proposed to use destruction dependencies only for runtime garbage +collection. Probably, it will be implemented when we find a solution for issue +mentioned above in `Known issues`_. + + +Multi-step destruction +^^^^^^^^^^^^^^^^^^^^^^ + +Instead of just iterating through all the objects going to be destructed and +calling their ``.destroy`` methods Murano should perform a multi-step garbage +collection according to the following algorithm: + +1. Detect all the objects going to be destroyed using either comparison of + current object graph with its snapshot or the reference detection (see + "Offline" and "Online" changes in "Problem description" section) + +2. Sort the list of detected objects using the following comparator: for + any two objects A and B in the list: + + .. code:: + + IF (A has-a-destruction-dependency-on B) + AND (NOT B has-a-destruction-dependency-on A) + THEN A>B + ELSE IF (A owns B) THEN A>B + ELSE A==B + + where `has-a-destruction-dependency-on` means that the left operand object + has a destruction dependency (probably transitive) on right operand object, + `owns` means that the left operand object owns (probably transitively) the + right operand object. + + .. note:: + + The destruction order of objects which are considered to be equal by + the algorithm above is undefined. Even more, future implementations may + destroy such objects in parallel. + + +3. For each object in the list: + + 3.1. Notify all the objects having a destruction dependency on it that the + target object will be destroyed. + + 3.2. Call the ``.destroy()`` method of the object if it is present. + + 3.3. Change the object's status to "Destroyed" (see below). + +Destroyed objects +^^^^^^^^^^^^^^^^^ + +When an object is being processed by a garbage collector, it means that there +are no live references to it from the objects of the environment. However there +may be cases when the code which handles either the pre-destroy notification +(p. 3.1 above) or the actual ``.destroy`` method re-establishes the references +to the object being destructed, and thus the object remains in the object graph +after the GC is completed. Since the resources may be deallocated at this time +the regular usage of the object is not possible, however if it is assigned to +a property of some another object in the graph it may not always be possible to +just nullify that property since it may cause a contract violation. + +To resolve such collisions it is proposed to explicitly mark such destroyed +objects as "destroyed". MuranoPL executor will not allow to execute any methods +on such objects, however their properties remain accessible (i.e. readable) so +any runtime information associated with them may be recovered. Destroyed +objects will be serialized with the rest of object graph but the +json-representation of the object will have a special flag in their class +header (the "?" section) to indicate their special status. When deserialized +from json such objects will retain their "destroyed" status, so the method +execution will still be impossible even in subsequent deployments. + +When the destroyed objects are unreferenced from the object graph they go away +without additional actions: garbage collector ignores them since their +resources have already been released. + +Garbage collection executions +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The multi-step object destruction described above should take place in three +different scenarios: + +1. *(currently existing)* Before the deployment, destroys objects which were + present in the object graph after the previous deployment was finished but + were not found in the incoming object graph of a new deployment, i.e. the + ones explicitly removed using the API. + +2. *(currently existing)* After the deployment, destroys the objects existing + in the Object Store but not being a part of the **persistable** object graph + of current environment, i.e. having no references to them from the + persistable (In, Out, InOut) properties of the environment or its transitive + children). + +3. *(proposed)* During the deployment, explicitly initiated from MuranoPL code. + Destroys objects which are not part of the **complete** object graph, i.e. + having no references to them from any properties of the environment + (including runtime and private properties) AND not being referenced by local + variables in any frame of all the the green threads of current deployment. + +To implement p.3 above a new algorithm is needed. It should analyze all the +active contexts of all the running greenthreads of the current deployment and +retrieve all the data variables from that context, traversing through the +parent contexts as well. All the objects of ``MuranoObject`` type collected +this way should be added to the "queue of active roots" to be used for further +processing. +For each object of this queue the algorithm should save the id of the object +into the "result set" and then find objects which are reachable from the +current one (i.e. the objects of ``MuranoObject`` type contained in properties +of any kind). For each such object the algorithm should check whether its id is +already present in the "result set". If not, the object is added to the end of +"queue of active roots". The algorithm runs till it processes all the objects +of the queue. +The "result set" which the algorithm gets at the end of this process contains +the ids of alive objects. All other objects of the object store should be +considered as candidates for garbage collection. + +There should be one additional check made before doing the actual destruction: +some of the objects may have no valid access paths from MuranoPL objects, but +could be referenced from some python-back objects. This happens when an Object +is passed to some method backed with pythonic code. In this case the executor +creates an object of type ``MuranoObjectInterface`` - a wrapper to simplify the +work with a murano object from python. This object contains a reference to the +actual MuranoPL object within. If appropriate python object is alive, then +its corresponding MuranoPL object should not be garbage collected even if there +are no references to it from the active roots of MuranoPL. +To keep track of such situations the Garbage Collector should contain a special +dictionary mapping ids of the objects to the weak proxies pointing to the +``MuranoObjectInterface`` objects passed to pythonic code. For every garbage +collection candidate the algorithm should check if there is a map entry for +this object and the weak proxy at that entry is alive. If this is true, then +the object is excluded from the list of GC candidates. + +The resulting list of GC candidates is then destroyed as described in +"Multi-step destruction" section above. + +Known issues +^^^^^^^^^^^^ + +Murano is using ``Objects`` and ``ObjectsCopy`` objects to transfer data between +deployments. When destruction dependencies will be implemented handler will make +changes (if any) to objects in ``ObjectsCopy``. So, this changes aren't applied +during next deployment. + +It's proposed to change the way of object model generation, with updating new model +objects if they have been changed after last deployment. However, solving this +issued is not an aim of this specification, so we can skip details. + +Code changes +------------ + +GC class +^^^^^^^^ + +A new python-backed Murano class called ``GC`` should be added to the core +library. It should have the following static methods: + +* ``collect()`` - initiates garbage collection of unreferenced objects of + current deployment (see p.3 in "Garbage collection executions" section + above). + +* ``isDestroyed(object)`` - checks if the ``object`` was already destroyed + during some GC session and thus its methods cannot be called. + +* ``subscribe(target, handler=null)`` - establishes a destruction dependency + from the caller to the object passed as ``target``. Method may be called + several times, in this case only a single destruction dependency will be + established, however the same amount of calls of ``unsubscribe`` will be + required to remove it. + + ``handler`` argument is optional. If passed it should be the name of an + instance methods defined by the caller class to handle notification of + ``target``'s destruction (see "Multi-step destruction" section above: this + handlers is executed for p. 3.1) + + The following arguments will be passed to the handler method + + * ``sender`` - an instance of ``GC`` class describing the current + garbage collection session. + + * ``object`` - a target object which is going to be destroyed. It is not + recommended to persist the reference to this object anywhere. This will not + prevent the object from being garbage collected but the object will be + moved to the "destroyed" state which is almost always bad. The option to do + so is considered to be advanced feature which should not be done unless it + is absolutely necessary. + +* ``unsubscribe(target)`` - removes the destruction dependency from the caller + to the object passed as ``target``. Method may be called several times + without any side-effects. If ``subscribe`` was called more than once the same + (or more) amount of calls to ``unsubscribe`` is needed to remove the + dependency. + +An instance of ``GC`` class will be created during the garbage collection +session to encapsulate runtime information about this session. It defines a +single method which may be of use during a GC session: + +* ``isDoomed(object)`` - checks if the ``object`` is marked for destruction + during this GC session. + +Pythonic back-end of the ``GC`` class should be able to establish destruction +dependencies by storing the back-refs to the dependent object in attributes of +the python's instance of MuranoObject representing the dependency. + + .. note:: + + This is the opposite of how the destruction dependencies are stored + when the model is serialized: in serialized form that is the dependent + object who owns the reference to the dependency object. In runtime it + is the dependency object who owns the reference to dependent object. + + +When the garbage collection is needed the class will be instantiated and a list +of objects-to-delete created based on the current state of object graph, object +store and the execution context. Garbage collections will use this object for +all the steps of the workflow. + +Alternatives +------------ + +Application developers may try to implement their own event-based notification +logic to notify about pending and completed object destructions. However it +will solve only part of the problem: notifications will work properly, but they +will not affect the order in which the objects are destroyed, so the workflows +will be too complicated. Also this alternative will not have the advanced +features proposed in this spec, such as ability to check if some object is +going to be destroyed. + +Data model impact +----------------- + +None + +REST API impact +--------------- + +None + +Versioning impact +----------------- + +The proposed change is completely backwards compatible: without explicit +destruction dependencies objects will be collected based on their ownership +relationships, i.e. as it is done in the current implementation. + +The packages containing classes which explicitly call the methods of ``GC`` +should have package format of at least 1.4 to prevent their execution on older +versions of Murano which do not have this feature. + +Other end user impact +--------------------- + +None + +Deployer impact +--------------- + +None + +Developer impact +---------------- + +Developers will get the new MuranoPL-based API to manage resource deallocation +lifecycle. If they do not want to use it they don't need to do anything. + + +Murano-dashboard / Horizon impact +--------------------------------- + +None + +Implementation +============== + +Assignee(s) +----------- + +Primary assignee: + ativelkov + +Other contributors: + starodubcevna + +Work Items +---------- + +* Implement a system to define and use destruction dependencies in runtime. + +* Introduce changes to MuranoObject class to keep track of "destroyed" + object status. + +* Modify the serializer / deserializer to properly persist the value of the + "destroyed object" flag. + +* Modify the code which instantiates yaql contexts for MuranoPL so all the + created contexts are tracked by the execution session. + +* Implement sorting algorithms to arrange objects-to-be-destroyed based on + criteria defined in p.2 of "Multi-step destruction" section above. + +* Modify an algorithm to collect alive object roots from the runtime and + private properties and local variables of executing threads. + +* Implement multi-step destruction workflow. + +* Implement ``GC`` class to bind all the above. + +* Create test-runner-based tests to cover all the test scenarios. + +* Document the new features. + + +Dependencies +============ + +The development of this feature will enable Application Development Framework +[1] to address resource deallocation problems during application uninstall. + +Testing +======= + +Tests should be written for test-runner to cover various scenarios of resource +deallocation. + +Runtime garbage collection +-------------------------- + +There should be test cases covering that: + +* objects assigned to persistent (Input, Output, InputOutput) properties (both + locally-declared and inherited) of objects reachable from the current roots + are NOT garbage collected; + +* objects assigned to transient (Runtime and undeclared) properties (both + locally-declared and inherited) of objects reachable from the current roots + are NOT garbage collected; target properties should be both locally-declared + and inherited; + +* objects assigned to static properties of various classes are NOT garbage + collected; + +* objects passed to python-backed objects and unreferenced in MuranoPL are NOT + garbage collected unless their MuranoObjectInterface proxies are unreferenced + / GC'ed in python; + +* objects assigned to local variables of the current execution frame (i.e. + variables of the current method and all the caller methods in call stack) + including method arguments are NOT garbage collected; + +* single unreferenced objects ARE garbage collected; + +* graphs of interconnected objects having no references from non-collected + objects ARE garbage collected; + +* objects passed to python-backed objects and unreferenced in both MuranoPL and + python ARE garbage collected; + +* garbage collector correctly processes stack-frame objects from green-threads + other than the one it is executed from; + +Destruction dependency resolution order +--------------------------------------- + +There should be test cases covering that: + +* if some child object has a destruction dependency on its parent, the parent + gets destroyed before the child; + +* if some parent object has a destruction dependency on its child, the child + gets destroyed before the parent; + +* if some objects not being the part of some ownership hierarchy have some + destruction dependency, the dependency-object is destroyed before the + dependent one; + +* if some objects have circular destruction dependency they are all destroyed + (the order is not enforced by the test); + +Destruction events +------------------ + +Given the base scenario of object A having a destruction dependency on object B +and B being GC'ed, there should be tests covering that: + +* the right order of events occurs (A gets warned about possible B's + destruction -> A is notified about inevitable B's destruction -> B is + destroyed -> A is notified that B was destroyed); + +* A may prevent B's destruction by establishing a reference on B in the warning + handler; + +* A may cancel GC in both warning and pre-destroy notification handlers; + +* A may establish more then 1 destruction dependency on B and still be + notified just once; + +* A may remove the destruction dependency and not get notified on B's + destruction; + +* If A established N destruction dependencies and then removed them M times, + (N>M) then notifications are still delivered; + +* If A established N destruction dependencies and then removed them M times, + (N<=M) then notifications are not delivered; + +* B may establish a destruction dependency on itself thus subscribing to + appropriate notifications; + +* ``phase`` property of GC instance is correct in appropriate event handlers; + +* ``isDoomed`` and ``isDestroyed`` methods return appropriate values when + called by A for B in appropriate event handlers. + +Documentation Impact +==================== + +Developers documentation should be updated to describe the new ``GC`` class and +its static and instance methods, as well as the design guidelines for +application developers to follow to utilize the new capability. + +References +========== + +[1] https://github.com/openstack/murano-specs/blob/master/specs/newton/approved/application-development-framework.rst