Failing to release a stack_lock in the database is a fatal action.

When a heat-engine thread activity completes, it calls release on
its stack_lock object in the database. If that release fails because
the database cannot be updated, that engine process is no longer
usable. This change catches the failure, logs it, and terminates the
engine process so that a new one can be started. New heat engines
automatically purge stale stack_locks from the database.
Also, if the thread exit does not tear down the process within
5 seconds, the non-blockable OS-level exit call (os._exit) is invoked.

This bug is very timing-specific: the database error must be present
at the moment the stack_lock release is attempted.

Change-Id: I7663b2270bf325cd8e3dd194f2994227fd6f5e8a
Story: 2003439
Task: 24635
Nakul Dahiwade 2018-08-14 18:16:11 +00:00
parent d96b564fd4
commit d7daa3438d
1 changed file with 15 additions and 1 deletion

@@ -15,8 +15,11 @@ import collections
 import datetime
 import functools
 import itertools
+import os
 import pydoc
+import signal
 import socket
+import sys
 
 import eventlet
 from oslo_config import cfg
@@ -174,6 +177,10 @@ class ThreadGroupManager(object):
         :param kwargs: Keyword-args to be passed to func
 
         """
+        def _force_exit(*args):
+            LOG.info('Graceful exit timeout exceeded, forcing exit.')
+            os._exit(-1)
+
         def release(gt):
             """Callback function that will be passed to GreenThread.link().
@@ -188,7 +195,14 @@ class ThreadGroupManager(object):
                     assert not notify.signalled()
                     notify.signal()
             else:
-                lock.release()
+                try:
+                    lock.release()
+                except Exception:
+                    # allow up to 5 seconds for sys.exit to gracefully shutdown
+                    signal.signal(signal.SIGALRM, _force_exit)
+                    signal.alarm(5)
+                    LOG.exception("FATAL. Failed stack_lock release. Exiting")
+                    sys.exit(-1)
 
         # Link to self to allow the stack to run tasks
         stack.thread_group_mgr = self