Update GC spec with implementation details

Change-Id: I4a4d23690d459a30ff5b931eebb77a7d4ebd9bcd
This commit is contained in:
Valerii Kovalchuk 2016-09-13 15:34:13 +03:00
parent f69397ccdb
commit 7b05cd043a
1 changed files with 114 additions and 124 deletions

View File

@ -85,6 +85,16 @@ often require the following scenarios:
Network component should be always called after all the VMs have been
destroyed.*
Another issue is that murano uses ``Objects`` and ``ObjectsCopy`` objects
to transfer data between deployments. When destruction dependencies are
implemented, the handler will make changes (if any) to objects in
``ObjectsCopy``. Therefore, these changes are not applied during the next
deployment and this should be addressed.
Also, sometimes it can be useful to deallocate resources used by the
unreferenced objects even before the end of deployment on demand from the
application code.
Proposed change
===============
@ -120,10 +130,9 @@ form such a circle they will still be notified about pending destruction of
their dependencies, however the order of this notifications - and the
destruction itself - is undefined in this case.
For now it's proposed to use destruction dependencies only for runtime garbage
collection. Probably, it will be implemented when we find a solution for issue
mentioned above in `Known issues`_.
Destruction dependencies are going to be used during all kinds of garbage
collection: pre-deployment ("offline"), on-demand (during deployment) and
post-deployment.
Multi-step destruction
^^^^^^^^^^^^^^^^^^^^^^
@ -132,19 +141,19 @@ Instead of just iterating through all the objects going to be destructed and
calling their ``.destroy`` methods Murano should perform a multi-step garbage
collection according to the following algorithm:
1. Detect all the objects going to be destroyed using either comparison of
current object graph with its snapshot or the reference detection (see
"Offline" and "Online" changes in "Problem description" section)
1. Detect all the objects going to be destroyed. It can be done by using
Python GC to track and collect objects, as described in the
*Garbage collection executions* section.
2. Sort the list of detected objects using the following comparator: for
any two objects A and B in the list:
.. code::
IF (A has-a-destruction-dependency-on B)
IF (A owns B) THEN A>B
ELSE IF (A has-a-destruction-dependency-on B)
AND (NOT B has-a-destruction-dependency-on A)
THEN A>B
ELSE IF (A owns B) THEN A>B
ELSE A==B
where `has-a-destruction-dependency-on` means that the left operand object
@ -152,21 +161,28 @@ collection according to the following algorithm:
`owns` means that the left operand object owns (probably transitively) the
right operand object.
.. note::
The objects which are considered to be equal by the algorithm above can be
destroyed in parallel.
The destruction order of objects which are considered to be equal by
the algorithm above is undefined. Even more, future implementations may
destroy such objects in parallel.
The result of the sorting is the dictionary with indexes as keys and
lists of objects with equal destruction priority as values.
3. Sort the keys of the dictionary in the reversed order of destruction
priority and for each key start parallel notification about scheduled
destruction of the objects in the corresponding group. During
notification, handlers of the objects that have a destruction dependency on
some "sentenced" object will be invoked.
3. For each object in the list:
4. Sort the keys of the dictionary in the direct order of destruction
priority and for each index in the dictionary start parallel destruction of
all objects in the corresponding group. Destruction of individual object
means calling the ``.destroy()`` method of the object if present and
changing the object's status to "Destroyed" (see below).
3.1. Notify all the objects having a destruction dependency on it that the
target object will be destroyed.
3.2. Call the ``.destroy()`` method of the object if it is present.
3.3. Change the object's status to "Destroyed" (see below).
As an environment does not have owners, it will always be in the last
group of destruction. There is no guarantee that some other objects (for
example, heat stack) will be alive at the time of its destruction. Thus
``io.murano.Environment`` class should not have the ``.destroy()`` method.
Destroyed objects
^^^^^^^^^^^^^^^^^
@ -174,7 +190,7 @@ Destroyed objects
When an object is being processed by a garbage collector, it means that there
are no live references to it from the objects of the environment. However there
may be cases when the code which handles either the pre-destroy notification
(p. 3.1 above) or the actual ``.destroy`` method re-establishes the references
(step 4 above) or the actual ``.destroy`` method re-establishes the references
to the object being destructed, and thus the object remains in the object graph
after the GC is completed. Since the resources may be deallocated at this time
the regular usage of the object is not possible, however if it is assigned to
@ -182,8 +198,10 @@ a property of some another object in the graph it may not always be possible to
just nullify that property since it may cause a contract violation.
To resolve such collisions it is proposed to explicitly mark such destroyed
objects as "destroyed". MuranoPL executor will not allow to execute any methods
on such objects, however their properties remain accessible (i.e. readable) so
objects as "destroyed". It means setting object's ``destroyed`` attribute to
``True`` and removing the self-reference from it.
MuranoPL executor will not allow to execute any methods on such objects,
however their properties remain accessible (i.e. readable) so
any runtime information associated with them may be recovered. Destroyed
objects will be serialized with the rest of object graph but the
json-representation of the object will have a special flag in their class
@ -191,9 +209,8 @@ header (the "?" section) to indicate their special status. When deserialized
from json such objects will retain their "destroyed" status, so the method
execution will still be impossible even in subsequent deployments.
When the destroyed objects are unreferenced from the object graph they go away
without additional actions: garbage collector ignores them since their
resources have already been released.
When "destroyed" objects are unreferenced from the object graph, their
properties get nullified and they get destroyed automatically by Python GC.
Garbage collection executions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -218,53 +235,50 @@ different scenarios:
(including runtime and private properties) AND not being referenced by local
variables in any frame of all the the green threads of current deployment.
To implement p.3 above a new algorithm is needed. It should analyze all the
active contexts of all the running greenthreads of the current deployment and
retrieve all the data variables from that context, traversing through the
parent contexts as well. All the objects of ``MuranoObject`` type collected
this way should be added to the "queue of active roots" to be used for further
processing.
For each object of this queue the algorithm should save the id of the object
into the "result set" and then find objects which are reachable from the
current one (i.e. the objects of ``MuranoObject`` type contained in properties
of any kind). For each such object the algorithm should check whether its id is
already present in the "result set". If not, the object is added to the end of
"queue of active roots". The algorithm runs till it processes all the objects
of the queue.
The "result set" which the algorithm gets at the end of this process contains
the ids of alive objects. All other objects of the object store should be
considered as candidates for garbage collection.
To implement scenario 3, a new algorithm is needed. As mentioned in the
*Multi-step destruction* section, Python garbage collector can be used
for that. MuranoPL can make use of the way Python's ``gc.collect`` works.
There should be one additional check made before doing the actual destruction:
some of the objects may have no valid access paths from MuranoPL objects, but
could be referenced from some python-back objects. This happens when an Object
is passed to some method backed with pythonic code. In this case the executor
creates an object of type ``MuranoObjectInterface`` - a wrapper to simplify the
work with a murano object from python. This object contains a reference to the
actual MuranoPL object within. If appropriate python object is alive, then
its corresponding MuranoPL object should not be garbage collected even if there
are no references to it from the active roots of MuranoPL.
To keep track of such situations the Garbage Collector should contain a special
dictionary mapping ids of the objects to the weak proxies pointing to the
``MuranoObjectInterface`` objects passed to pythonic code. For every garbage
collection candidate the algorithm should check if there is a map entry for
this object and the weak proxy at that entry is alive. If this is true, then
the object is excluded from the list of GC candidates.
Python library ``gc`` allows running on-demand garbage collection through the
``gc.collect()`` method and provides access to unreachable objects that the
collector found but cannot free through collection ``gc.garbage`` [2].
The resulting list of GC candidates is then destroyed as described in
"Multi-step destruction" section above.
To make use of this, there should be an ability to:
Known issues
^^^^^^^^^^^^
* Make object store have weak links only and prevent hard links in other DSL
places so that only links between objects remain.
Murano is using ``Objects`` and ``ObjectsCopy`` objects to transfer data between
deployments. When destruction dependencies will be implemented handler will make
changes (if any) to objects in ``ObjectsCopy``. So, this changes aren't applied
during next deployment.
* Prevent murano objects that should have been destroyed by Python GC from
being destroyed.
It's proposed to change the way of object model generation, with updating new model
objects if they have been changed after last deployment. However, solving this
issued is not an aim of this specification, so we can skip details.
* Get the list of such objects and destroy them in correct order and notify
subscribers about destruction.
The prevention can be done by adding ``__del__()`` magic method to the
MuranoObject class and creating cyclic reference in the object to itself.
When ``gc.collect()`` is done, all unreachable objects can be examined and
murano objects owned by current executor can be distinguished among them.
The difference in Python 3.4 and higher is that objects with a ``__del__()``
method don't end up in ``gc.garbage`` anymore and this list should be empty
most of the time [3]. So logic of adding object to GC candidates can be added
directly to ``__del__()``.
It means that in Python versions 3.4 and higher, murano objects will be added
to planned destruction from ``__del__()`` call caused by ``gc.collect()``,
and in versions prior to 3.4, presence of ``__del__()`` along with cyclic
reference to itself will provide adding the object to ``gc.garbage`` list,
and it can be added to candidates for destruction from there.
This logic can be used for garbage collection in all three scenarios
mentioned above.
The resulting list of GC candidates is then destroyed as described in the
*Multi-step destruction* section above.
With this approach, the comparison of Objects and ObjectsCopy is not needed
anymore. Garbage collector works with the same objects on each deployment,
so all changes are saved properly.
Code changes
------------
@ -282,21 +296,21 @@ library. It should have the following static methods:
* ``isDestroyed(object)`` - checks if the ``object`` was already destroyed
during some GC session and thus its methods cannot be called.
* ``subscribe(target, handler=null)`` - establishes a destruction dependency
from the caller to the object passed as ``target``. Method may be called
several times, in this case only a single destruction dependency will be
established, however the same amount of calls of ``unsubscribe`` will be
required to remove it.
* ``isDoomed(object)`` - can be used within the ``.destroy()`` method to
test if another object is also going to be destroyed.
* ``subscribeDestruction(publisher, subscriber, handler=null)`` - establishes
a destruction dependency from the ``subscriber`` to the object passed as
``publisher``. Method may be called several times, in this case only a single
destruction dependency will be established, however the same amount of calls
of ``unsubscribeDestruction`` will be required to remove it.
``handler`` argument is optional. If passed it should be the name of an
instance methods defined by the caller class to handle notification of
``target``'s destruction (see "Multi-step destruction" section above: this
handlers is executed for p. 3.1)
instance method defined by the caller class to handle notification of
``publisher``'s destruction (see "Multi-step destruction" section above: this
handler is executed for p. 3.1)
The following arguments will be passed to the handler method
* ``sender`` - an instance of ``GC`` class describing the current
garbage collection session.
The following arguments will be passed to the handler method:
* ``object`` - a target object which is going to be destroyed. It is not
recommended to persist the reference to this object anywhere. This will not
@ -305,35 +319,15 @@ library. It should have the following static methods:
so is considered to be advanced feature which should not be done unless it
is absolutely necessary.
* ``unsubscribe(target)`` - removes the destruction dependency from the caller
to the object passed as ``target``. Method may be called several times
without any side-effects. If ``subscribe`` was called more than once the same
(or more) amount of calls to ``unsubscribe`` is needed to remove the
* ``unsubscribeDestruction(publisher, subscriber, handler=null)`` - removes
the destruction dependency from the ``subscriber`` to the object passed as
``publisher``. Method may be called several times without any side-effects.
If ``subscribeDestruction`` was called more than once the same (or more)
amount of calls to ``unsubscribeDestruction`` is needed to remove the
dependency.
An instance of ``GC`` class will be created during the garbage collection
session to encapsulate runtime information about this session. It defines a
single method which may be of use during a GC session:
* ``isDoomed(object)`` - checks if the ``object`` is marked for destruction
during this GC session.
Pythonic back-end of the ``GC`` class should be able to establish destruction
dependencies by storing the back-refs to the dependent object in attributes of
the python's instance of MuranoObject representing the dependency.
.. note::
This is the opposite of how the destruction dependencies are stored
when the model is serialized: in serialized form that is the dependent
object who owns the reference to the dependency object. In runtime it
is the dependency object who owns the reference to dependent object.
When the garbage collection is needed the class will be instantiated and a list
of objects-to-delete created based on the current state of object graph, object
store and the execution context. Garbage collections will use this object for
all the steps of the workflow.
``handler`` argument is optional and must correspond the handler passed
during subscription if it was provided back then.
Alternatives
------------
@ -399,6 +393,7 @@ Primary assignee:
ativelkov
Other contributors:
Stan Lagun <istalker2>
starodubcevna
Work Items
@ -412,15 +407,12 @@ Work Items
* Modify the serializer / deserializer to properly persist the value of the
"destroyed object" flag.
* Modify the code which instantiates yaql contexts for MuranoPL so all the
created contexts are tracked by the execution session.
* Implement collecting unreferenced murano objects utilizing Python ``gc``
library
* Implement sorting algorithms to arrange objects-to-be-destroyed based on
criteria defined in p.2 of "Multi-step destruction" section above.
* Modify an algorithm to collect alive object roots from the runtime and
private properties and local variables of executing threads.
* Implement multi-step destruction workflow.
* Implement ``GC`` class to bind all the above.
@ -476,7 +468,7 @@ There should be test cases covering that:
python ARE garbage collected;
* garbage collector correctly processes stack-frame objects from green-threads
other than the one it is executed from;
other than the one it is executed from
Destruction dependency resolution order
---------------------------------------
@ -502,16 +494,14 @@ Destruction events
Given the base scenario of object A having a destruction dependency on object B
and B being GC'ed, there should be tests covering that:
* the right order of events occurs (A gets warned about possible B's
destruction -> A is notified about inevitable B's destruction -> B is
destroyed -> A is notified that B was destroyed);
* the right order of events occurs (B scheduled for destruction -> A is
notified about planned B's destruction -> B's ``.destroy()`` method is
called -> B gets destroyed);
* A may prevent B's destruction by establishing a reference on B in the warning
* A may prevent B's destruction by establishing a reference on B in the
handler;
* A may cancel GC in both warning and pre-destroy notification handlers;
* A may establish more then 1 destruction dependency on B and still be
* A may establish more than 1 destruction dependency on B and still be
notified just once;
* A may remove the destruction dependency and not get notified on B's
@ -526,8 +516,6 @@ and B being GC'ed, there should be tests covering that:
* B may establish a destruction dependency on itself thus subscribing to
appropriate notifications;
* ``phase`` property of GC instance is correct in appropriate event handlers;
* ``isDoomed`` and ``isDestroyed`` methods return appropriate values when
called by A for B in appropriate event handlers.
@ -535,10 +523,12 @@ Documentation Impact
====================
Developers documentation should be updated to describe the new ``GC`` class and
its static and instance methods, as well as the design guidelines for
application developers to follow to utilize the new capability.
its methods, as well as the design guidelines for application developers to
follow to utilize the new capability.
References
==========
[1] https://github.com/openstack/murano-specs/blob/master/specs/newton/approved/application-development-framework.rst
[2] https://docs.python.org/2/library/gc.html
[3] https://docs.python.org/3/library/gc.html