Retry jobs failed with MERGER_FAILURE

We sometimes get jobs that fail with MERGER_FAILURE. This result is
misleading to the user because it doesn't actually indicate a merge
conflict but some infrastructure-related error. We already have
various retry mechanisms in place within the executor that cover most
of the possible failure causes. But catching every place that needs a
retry is difficult, so we should add a safety net and reschedule jobs
that failed with MERGER_FAILURE.

Change-Id: I8844b11850c0a2cd3faddb7d8e944750c9da78ea
Tobias Henkel 2018-10-12 09:48:47 +02:00
parent 859141cf4f
commit c982bfed4d
1 changed file with 15 additions and 0 deletions


@@ -395,6 +395,21 @@ class ExecutorClient(object):
        if result in ('DISCONNECT', 'ABORTED'):
            # Always retry if the executor just went away
            build.retry = True
        if result == 'MERGER_FAILURE':
            # The build result MERGER_FAILURE is a bit misleading here
            # because when we got here we know that there are no merge
            # conflicts. Instead this is most likely caused by some
            # infrastructure failure. This can be anything like connection
            # issue, drive corruption, full disk, corrupted git cache, etc.
            # This may or may not be a recoverable failure so we should
            # retry here respecting the max retries. But to be able to
            # distinguish from RETRY_LIMIT which normally indicates pre
            # playbook failures we keep the build result after the max
            # attempts.
            if (build.build_set.getTries(build.job.name) <
                    build.job.attempts):
                build.retry = True
        result_data = data.get('data', {})
        warnings = data.get('warnings', [])
        self.log.info("Build %s complete, result %s, warnings %s" %
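
For illustration only, here is a minimal, self-contained sketch of the decision
this hunk implements. The names used here (Build, on_build_completed, tries,
max_attempts) are invented for the example and are not the real Zuul API, which
tracks the try count on the build set:

    class Build:
        """Stand-in for the scheduler's build object (hypothetical)."""

        def __init__(self, job_name, max_attempts):
            self.job_name = job_name
            self.max_attempts = max_attempts
            self.tries = 1          # attempts made so far, including this one
            self.retry = False
            self.result = None


    def on_build_completed(build, result):
        """Decide whether a finished build should be rescheduled."""
        if result in ('DISCONNECT', 'ABORTED'):
            # The executor went away; always retry.
            build.retry = True
        elif result == 'MERGER_FAILURE':
            # Treat this as a likely infrastructure failure and retry, but
            # only up to the configured attempt limit so the final result
            # stays MERGER_FAILURE rather than becoming RETRY_LIMIT.
            if build.tries < build.max_attempts:
                build.retry = True
        if not build.retry:
            build.result = result
        return build.retry


    b = Build('unit-tests', max_attempts=3)
    print(on_build_completed(b, 'MERGER_FAILURE'))  # True: 1 try < 3 attempts
    b.tries = 3
    b.retry = False
    print(on_build_completed(b, 'MERGER_FAILURE'))  # False: attempt limit reached

The key point of the change is the last case: the job is rescheduled while
tries remain, but once the limit is reached the MERGER_FAILURE result is kept
instead of being converted, so it stays distinguishable from RETRY_LIMIT.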