Fix watchdog timeout fix
In I6cae11c1e89f6ccc78cb5bfaf61ef78e846e87be, we attempted to fix an error where long-running workers never reset their watchdog timeout flag, meaning that once a job timed out, all further jobs on that worker timed out.

That change cleared the flag each time ansible ran. However, that flag is also used in conjunction with the abort flag to determine whether a failed or null result should be sent back to Zuul (a null result causes a job to be rescheduled). By clearing the flag before, say, a post playbook, we would lose the information that the abort was due to a timeout rather than a direct abort request, and would return the null result to Zuul. As a result, every job that timed out would be relaunched.

Instead of clearing the flag before each ansible run, clear it once at the start of the job launch. The flag then stays set for the rest of the job after any ansible timeout. That should be fine for both the aborted job check and the new "timed out" log message.

The typo this change corrects (_watchog_timeout instead of _watchdog_timeout) indicates this was the intended logic.

Change-Id: Ie31409a7706b6cf4d7ce858b4d5f0c00e4ee31da
parent cef224d162
commit 7f7ddbdfa0
@@ -815,7 +815,7 @@ class NodeWorker(object):
         result = None
         self._sent_complete_event = False
         self._aborted_job = False
-        self._watchog_timeout = False
+        self._watchdog_timeout = False
 
         try:
             self.sendStartEvent(job_name, args)
@@ -1424,8 +1424,6 @@ class NodeWorker(object):
                 preexec_fn=os.setsid,
                 env=env_copy,
             )
-            # Reset timeout flag
-            self._watchdog_timeout = False
             ret = None
             watchdog = Watchdog(ANSIBLE_DEFAULT_PRE_TIMEOUT,
                                 self._ansibleTimeout,
@@ -1467,8 +1465,6 @@ class NodeWorker(object):
                 preexec_fn=os.setsid,
                 env=env_copy,
             )
-            # Reset timeout flag
-            self._watchdog_timeout = False
             ret = None
             watchdog = Watchdog(timeout + ANSIBLE_WATCHDOG_GRACE,
                                 self._ansibleTimeout,
@@ -1522,8 +1518,6 @@ class NodeWorker(object):
                 preexec_fn=os.setsid,
                 env=env_copy,
             )
-            # Reset timeout flag
-            self._watchdog_timeout = False
             ret = None
             watchdog = Watchdog(ANSIBLE_DEFAULT_POST_TIMEOUT,
                                 self._ansibleTimeout,
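The effect described in the commit message can be illustrated with a small, self-contained sketch. This is not the launcher code: `ToyWorker`, `reset_per_run`, `_run_playbook`, and the `"SUCCESS"`/`"FAILURE"` strings are hypothetical names chosen for illustration, and the sketch assumes that a watchdog timeout sets both the timeout flag and the abort flag. Only the two flag names mirror the diff.

```python
class ToyWorker:
    """Hypothetical stand-in for NodeWorker; models only the two flags."""

    def __init__(self, reset_per_run):
        # reset_per_run=True models the earlier change (clear before each
        # ansible run); False models this change (clear once per job).
        self.reset_per_run = reset_per_run
        self._aborted_job = False
        self._watchdog_timeout = False

    def _run_playbook(self, times_out):
        if self.reset_per_run:
            # Earlier behavior: the flag is cleared before every run.
            self._watchdog_timeout = False
        if times_out:
            # Assumption: the watchdog records the timeout and aborts the job.
            self._watchdog_timeout = True
            self._aborted_job = True
        return not times_out

    def launch(self, playbook_timeouts):
        # Both flags are cleared once at the start of the job launch.
        self._aborted_job = False
        self._watchdog_timeout = False
        ok = True
        for times_out in playbook_timeouts:  # e.g. pre, main, post
            ok = self._run_playbook(times_out) and ok
        if self._aborted_job and not self._watchdog_timeout:
            # Abort without a recorded timeout: null result, job is rescheduled.
            return None
        return "SUCCESS" if ok else "FAILURE"


if __name__ == "__main__":
    playbooks = [False, True, False]  # the main playbook times out
    print(ToyWorker(reset_per_run=True).launch(playbooks))
    print(ToyWorker(reset_per_run=False).launch(playbooks))
```

With the per-run reset, the post playbook clears the timeout flag, so the job looks like a plain abort and yields a null result. With the per-job reset, the timeout is still visible when the result is chosen, and a later job on the same worker starts with a clean flag.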