The backoff looping call logging previously decided whether the
timeout had been or was about to be exceeded by including idle
time and its internal time. However, this is misleading: the
overall operation may take 30 seconds externally, with a
user-initiated timeout of 30 seconds requested, and the error
might say that something like 18 seconds have passed, which is
incorrect.
The logic is actually correct; only the logging was
misleading.
Fixes the exception message to use the total time.
Change-Id: Ie9ef5a53abb24f6ab7de0ad57a672c0a8d7ff0ee
Instead of having a copy-pasted version in this project, let's just
use the original directly. It is added to the public API of
oslo.utils in the dependency.
Depends-On: https://review.openstack.org/614806
Change-Id: If0dfac2505d097c117ef94c99399b1614f1e1f8f
Patch [1] switched to using an eventlet Event for loopingcall events.
It may now happen that a stop event is sent when another event has
already been sent and the loopingcall is no longer running.
That causes an AssertionError in the eventlet.event module.
To avoid that, we should check whether the loopingcall is
running before calling _abort.set().
[1] https://review.openstack.org/#/c/611807/
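The guard described above can be sketched roughly as follows. This is a minimal illustration, not the oslo.service code: GuardedLoop, finish() and abort_signals are hypothetical names, and threading.Event stands in for the eventlet event (unlike the eventlet event, it tolerates a repeated set(), so the counter is what shows the guard working).

```python
import threading


class GuardedLoop:
    """Hypothetical stand-in for a looping call; not the oslo.service class."""

    def __init__(self):
        self._running = False
        self._abort = threading.Event()
        self.abort_signals = 0  # counts how often the abort event was signalled

    def start(self):
        self._abort.clear()
        self._running = True

    def finish(self):
        # The loop ended on its own, so there is nothing left to abort.
        self._running = False

    def stop(self):
        # Only signal the abort event while the loop is still running.
        if self._running:
            self._running = False
            self.abort_signals += 1
            self._abort.set()


loop = GuardedLoop()
loop.start()
loop.stop()
loop.stop()  # second call is a no-op because the loop is no longer running

finished = GuardedLoop()
finished.start()
finished.finish()
finished.stop()  # no abort is signalled for a loop that already ended
```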
Closes-Bug: #1800879
Change-Id: I28ad3bdb51a20350c90dee4420058c30946897e5
This fixes broken logic for finding out how to use green threads. Using
threading.Event with eventlet creates useless load related to timers.
Change-Id: I62e9f1a7cde8846be368fbec58b8e0825ce02079
Closes-Bug: #1798774
This reverts commit 5975da493b.
Added code to support the case where unit tests are not being
monkey patched (for example, heat).
Change-Id: If715fbe21ac085e4f5c83cef0729dbca8dcb19ca
Some of the openstack services implement worker tasks that are based on
the oslo-service LoopingCallBase objects. They do this as a way to have
a task that runs periodically as a greenthread within a child worker
process. For example, the neutron-server runs AgentStatusCheckWorker()
objects as base service workers in its child worker processes.
When the parent server process handles a SIGTERM signal it attempts to
stop all services launched on each of the child worker processes (i.e.,
ProcessLauncher.stop()). That results in a stop() being called on each
of the underlying base services and then a wait() to ensure that they
complete before shutdown.
If any service that is implemented on a LoopingCallBase related object
is suspended on a greenthread.sleep(), the previous call to stop() will
have no effect, and so the wait() will block until the sleep() finishes.
For tasks that either have a frequent FixedLoopingBase interface or a
short initial_delay this may not be a problem, but for those with a long
delay this could mean that the wait() blocks for minutes before the
process is allowed to shut down.
To solve this the LoopingCallBase calls to greenthread.sleep() are being
replaced with a threading.Event() object's wait() method. This allows a
caller of stop() to interrupt the sleep and expedite the shutdown.
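A minimal sketch of the idea, using only the standard library: InterruptibleSleeper is a hypothetical helper, not the LoopingCallBase implementation, and the 30 second and 0.05 second values are arbitrary.

```python
import threading
import time


class InterruptibleSleeper:
    """Hypothetical helper; sleeps on an Event so stop() can cut it short."""

    def __init__(self):
        self._abort = threading.Event()

    def sleep(self, timeout):
        # Returns True if interrupted by stop(), False if the timeout elapsed.
        return self._abort.wait(timeout)

    def stop(self):
        self._abort.set()


sleeper = InterruptibleSleeper()
# Ask for a stop shortly after the long sleep begins.
threading.Timer(0.05, sleeper.stop).start()

started = time.monotonic()
interrupted = sleeper.sleep(30.0)
elapsed = time.monotonic() - started
```

With a plain sleep the caller would be blocked for the full 30 seconds; here the wait returns as soon as stop() sets the event.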
Closes-Bug: #1660210
Change-Id: I5835f9595826df5349e4cc8b1da8529bb960ee04
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
The backoff timer has a few issues that can cause it to get stuck
in an infinite loop and never time out.
1. The random.gauss() function used to generate random jitter can
return negative values, so when it does, it makes the elapsed time
self._error_time go "backward."
2. The random jitter is used as a multiplier for the self._interval,
so self._interval can deviate far away from the mean over time and
walk to zero, causing self._interval to be 0, which will prevent
the timer from making progress from that point on because idle
will always evaluate to zero and the elapsed time won't increase.
3. The evaluated interval doesn't have a lower bound, so over time
it can get extremely small if jitter (the mean) < 0.5.
This adds a min_interval keyword argument to the BackOffLoopingCall
start() function that defaults to 0.001s and uses it to lower bound
the interval calculations. We'll also take the absolute value of the
return from random.gauss() to prevent elapsed time going backward, and
we'll calculate the running self._interval separately to make it track
the desired growth rate of the backoff and not let it drift with the
random.gauss() values.
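The three fixes can be sketched together as below. This is an illustration under assumptions, not the BackOffLoopingCall code: backoff_delays, factor, the 0.1 standard deviation and the rng parameter are hypothetical; only min_interval, the abs() on random.gauss() and the separately tracked interval come from the description above.

```python
import random


def backoff_delays(initial_delay=1.0, factor=2.0, jitter=0.75,
                   max_interval=60.0, min_interval=0.001, count=10,
                   rng=None):
    """Return `count` idle times for a jittered exponential backoff."""
    rng = rng or random.Random()
    interval = initial_delay
    delays = []
    for _ in range(count):
        # abs() keeps the multiplier non-negative, so elapsed time
        # can never go backward.
        multiplier = abs(rng.gauss(jitter, 0.1))
        # Clamp the idle time between min_interval and max_interval.
        idle = max(min_interval, min(interval * multiplier, max_interval))
        delays.append(idle)
        # Grow the base interval independently of the random draw so it
        # tracks the desired growth rate instead of drifting.
        interval = max(min_interval, min(interval * factor, max_interval))
    return delays


delays = backoff_delays(rng=random.Random(42))
total_elapsed = sum(delays)
```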
Closes-Bug: #1686159
Change-Id: Id17668a34d5cedbe870c9056350a7e9c7196faa7
Currently when using FixedIntervalLoopingCall, folks need to
add timeout checking logic in their function if they need it.
Adding a new class FixedIntervalWithTimeoutLoopingCall to
provide timeout checking support will save them that effort.
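The kind of timeout check callers would otherwise write by hand might look like this sketch. It does not use the oslo.service API: run_fixed_interval and LoopTimeout are hypothetical names, and the function is assumed to signal completion by returning True.

```python
import time


class LoopTimeout(Exception):
    """Hypothetical exception raised when the loop exceeds its timeout."""


def run_fixed_interval(func, interval, timeout):
    """Call func every `interval` seconds until it returns True.

    Raises LoopTimeout once more than `timeout` seconds have passed.
    """
    started = time.monotonic()
    while True:
        if func():
            return
        if time.monotonic() - started >= timeout:
            raise LoopTimeout("gave up after %.2f sec" % timeout)
        time.sleep(interval)


attempts = []


def done_on_third_call():
    attempts.append(1)
    return len(attempts) >= 3


run_fixed_interval(done_on_third_call, interval=0.01, timeout=5.0)

try:
    run_fixed_interval(lambda: False, interval=0.01, timeout=0.05)
    timed_out = False
except LoopTimeout:
    timed_out = True
```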
Change-Id: I78bfb9e259c2394137d7efbc0ee96bb18a6dc5e7
We use "%(delay).2f" to print the time taken, so even
if the delay is, say, 0.001, we still print the
warning. We should round the delay to two decimal places
to avoid odd-looking lines that say:
run outlasted interval by 0.00 sec
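A small sketch of the rounding, assuming the warning is only emitted when the rounded delay is still positive. overrun_warning is a hypothetical helper that returns the message instead of logging it.

```python
def overrun_warning(elapsed, interval):
    """Return the warning text, or None if the overrun rounds to zero."""
    # Round first, so overruns below 0.005 sec do not trigger a warning.
    delay = round(elapsed - interval, 2)
    if delay > 0:
        return "run outlasted interval by %(delay).2f sec" % {"delay": delay}
    return None


tiny = overrun_warning(1.001, 1.0)   # overrun of about 0.001 sec
real = overrun_warning(1.25, 1.0)    # overrun of 0.25 sec
```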
Change-Id: I7137eb7ad985d7f35adc62a65ad1218a39a7a959
'event' is both an imported module and a function argument
name in this function, which can be confusing, so rename
the argument.
Change-Id: I9382746f5117329b9292f811fc6a05118e76a9d0
The RetryDecorator is constructed with a list of expected exceptions and
if the function being retried raises an exception in that list, it is
currently logging a warning (with traceback). If there is a timeout, an
error is also logged.
This is bad practice because only the caller of the method being retried
has context as to what's going on and knows what level things should be
logged at (and if tracebacks should be logged at all).
This change drops the warning and error logging to debug level. We let
the caller handle the final re-raised exception and handle logging it at
the appropriate level and with the appropriate context for the message
to make sense to someone reading the logs rather than the somewhat
obscure messages logged within RetryDecorator.
Closes-Bug: #1503017
Change-Id: I07344aae977aca540a0555eef8d35b07bb969cbb
If the periodic task for a dynamic looping call returns no suggestion
for the delay, we should use periodic_interval_max. If
periodic_interval_max is not specified, we should raise RuntimeError.
Otherwise, passing None as the delay causes an exception in eventlet's
sleep() call.
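The fallback can be sketched as a small helper. next_delay is a hypothetical name, and capping a suggested delay at periodic_interval_max is an assumption made for the illustration rather than something stated above.

```python
def next_delay(suggested, periodic_interval_max=None):
    """Pick the delay before the next run of a dynamic looping call."""
    if suggested is None:
        if periodic_interval_max is None:
            # Nothing sensible to sleep for, so fail loudly instead of
            # handing None to sleep().
            raise RuntimeError("no delay suggested and "
                               "periodic_interval_max is not set")
        return periodic_interval_max
    if periodic_interval_max is not None:
        # Assumption: a suggested delay is capped at the maximum.
        return min(suggested, periodic_interval_max)
    return suggested


fallback = next_delay(None, periodic_interval_max=30)
capped = next_delay(120, periodic_interval_max=30)
plain = next_delay(5)

try:
    next_delay(None)
    raised = False
except RuntimeError:
    raised = True
```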
Change-Id: Ida3e75bc64132d6b5920fa94657aa962e2fe9f53
Closes-bug: #1489998
The DEBUG log message in the _run_loop() method records every
looping call on every interval, which may cause log noise. For example,
in ceilometer the partition coordinator runs a fixed-interval looping
call every two seconds by default.
Change-Id: I32aa474ece2458aa7cc1a5c271a40f664bf07af2
Using the utility function gets a better name.
For example:
$ python
>>> from oslo_utils import reflection
>>> class A(object):
...     def m(self):
...         pass
...
>>> z = A()
>>> reflection.get_callable_name(z.m)
'__main__.A.m'
Versus:
>>> z.m.__name__
'm'
Change-Id: I2eb9f49f6d7578d5d4e3a40cbd695e6622f3f850
This function should be hidden from public usage by
prefixing it with an underscore, since it really is an
internal implementation detail and should be treated as such.
Change-Id: I3b1d6c73556d22de19923ca273a6dfc9c9b77e97
When no exceptions are passed in, the exception tuple is
empty and no exceptions are caught, so document that
this happens.
>>> excps = ()
>>> try:
...     raise IOError("I am broken")
... except excps:
...     print("captured")
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
IOError: I am broken
Change-Id: I141ffd53b8f864bb0502d2e022264ea3145213ff
In some cases we don't want LoopingCall to be interrupted by
an uncaught exception in the callee. When such an exception occurs,
LoopingCall needs to wait just as it would if the call had succeeded.
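A rough sketch of that behaviour, independent of the real LoopingCall class: run_loop and its stop_on_exception and interval parameters are hypothetical names chosen for the illustration.

```python
import time


def run_loop(func, iterations, interval=0.0, stop_on_exception=True):
    """Call func `iterations` times, sleeping `interval` between calls."""
    completed = 0
    for _ in range(iterations):
        try:
            func()
        except Exception:
            if stop_on_exception:
                raise
            # Otherwise swallow the error and carry on as if the call
            # had succeeded, including the usual wait below.
        completed += 1
        time.sleep(interval)
    return completed


def flaky():
    raise ValueError("callee failed")


survived = run_loop(flaky, iterations=3, stop_on_exception=False)

try:
    run_loop(flaky, iterations=3, stop_on_exception=True)
    interrupted = False
except ValueError:
    interrupted = True
```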
Co-Authored-By: Eugene Nikanorov <enikanorov@mirantis.com>
Related-Bug: #1458119
Change-Id: I58da03017b923cf5ec2223ecfc61c766726bd1de
Per suggestion during review of this changeset:
Ic5d57fcf769a4af53cd1cf82a3ca93142dbdb03f
If we use six.wraps then it makes things better!
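To illustrate what wrapping buys, here is a sketch using functools.wraps from the standard library as a stand-in for six.wraps. retry_once and fetch are hypothetical names and are not part of RetryDecorator.

```python
import functools


def retry_once(func):
    """Hypothetical decorator that retries the call a single time."""
    @functools.wraps(func)  # copies __name__, __doc__, etc. onto wrapper
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            return func(*args, **kwargs)
    return wrapper


@retry_once
def fetch():
    """Fetch something."""
    return "ok"


name = fetch.__name__
doc = fetch.__doc__
result = fetch()
```

Without the wraps call, the decorated function would report the name and docstring of the inner wrapper instead of its own.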
Depends-On: Ibd2410e0153053b5121155474e99752256c7e4b8
Change-Id: Icc14d61504b9db0d91aa9b177abdab57c2f7ee55
RetryDecorator from oslo.vmware needs a better home. It allows
users to specify specific exceptions upon which the request can
be retried.
http://git.openstack.org/cgit/openstack/oslo.vmware/tree/oslo_vmware/api.py#n48
Depends-On: I0f07858e96ea3baf46f8a453e253b9ed29c7f7e2
Depends-On: I33bd2d9dff9cb7dc1a50177db7286b7317966784
Closes-Bug: #1465629
Change-Id: Ic5d57fcf769a4af53cd1cf82a3ca93142dbdb03f
This refactors the looping call subclasses to share
a common '_run_loop' function, which avoids duplicating that
code and takes advantage of the commonality of both of these
subclasses.
This also moves this code closer to not being dependent on
eventlet.
Change-Id: I61c0bbd6a7cda11f96374fd7c453ac5fa89ed613
Monotonic time and oslo.utils stopwatches will not go backwards
and therefore will not cause periodicity and other problems where tasks
are run again or at the wrong times in a system that is having its
clock altered (via ntpd or other).
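A minimal sketch of measuring elapsed time against a monotonic clock, using time.monotonic() from the standard library. MonotonicTimer is a hypothetical class for illustration and is not the oslo.utils stopwatch.

```python
import time


class MonotonicTimer:
    """Hypothetical timer; unaffected by wall-clock adjustments."""

    def __init__(self):
        self._started = time.monotonic()

    def elapsed(self):
        # time.monotonic() never goes backward, so this is never negative.
        return time.monotonic() - self._started


timer = MonotonicTimer()
first = timer.elapsed()
second = timer.elapsed()
```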
Related-Bug: #1450438
Change-Id: I90881842185c607eb6c8ea5bb7326a37e1bc3742