updater: Shuffle suffixes so we don't keep hitting the same failures

When tuning your updater, you often want to try a new config, see how it changes your metrics, then adjust concurrency up or down depending on how your container layer is responding. If your containers haven't been doing well, though, and you've got a giant backlog of async pendings to work through, updater restarts to change concurrency previously posed a problem: the updater would walk the suffix directories in the same order every start-up.

So, if you found a config that was making decent progress for a while but still had *some* failures, and you wanted to try tweaking settings to see if you could *reduce* those failures -- you'd likely start getting *all* failures as it went to retry the failed ones first and all at once. If you continued trying to tweak configs to get your failures to a reasonable rate, you'd almost certainly over-correct for this handful of overwhelmed DBs and not the overall cluster.

Now, shuffle the suffixes before we walk them.

Change-Id: I3ef34119f0cb563ab405a6517335a24dbaf2b4c3
Closes-Bug: #1878056
2020-05-09 23:16:04 -07:00 · dee98a74d4
parent d050ef82f7
commit dee98a74d4
1 changed file with 4 additions and 2 deletions
--- a/swift/obj/updater.py
+++ b/swift/obj/updater.py
@@ -19,7 +19,7 @@ import signal
 import sys
 import time
 from swift import gettext_ as _
-from random import random
+from random import random, shuffle

 from eventlet import spawn, Timeout

@@ -230,7 +230,9 @@ class ObjectUpdater(Daemon):
                                      'to a valid policy (%(error)s)') % {
                                    'directory': asyncdir, 'error': e})
                continue
-            for prefix in self._listdir(async_pending):
+            prefix_dirs = self._listdir(async_pending)
+            shuffle(prefix_dirs)
+            for prefix in prefix_dirs:
                prefix_path = os.path.join(async_pending, prefix)
                if not os.path.isdir(prefix_path):
                    continue
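
For readers who want to see the effect in isolation, the following is a minimal standalone sketch of the same idea: list a directory, shuffle the listing in place, then walk it. It is not Swift's code. The function name `shuffled_prefix_dirs` and the `rng` parameter are hypothetical, and `os.listdir` stands in for the updater's `self._listdir` helper on the assumption that both return a plain list.

```python
import os
import random


def shuffled_prefix_dirs(async_pending, rng=random):
    """Return sub-directory names of async_pending in random order.

    Hypothetical helper for illustration only; `rng` may be the
    `random` module or a `random.Random` instance.
    """
    # os.listdir returns a fresh list, so shuffling it in place is safe.
    prefix_dirs = os.listdir(async_pending)
    rng.shuffle(prefix_dirs)
    # Mirror the isdir check in the diff: skip entries that are not directories.
    return [
        prefix for prefix in prefix_dirs
        if os.path.isdir(os.path.join(async_pending, prefix))
    ]
```

Passing a seeded `random.Random` makes the order reproducible in tests, while the default gives a different walk order on each call, which is what keeps a restart from retrying the same directories first.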