Randomize the container list for uploads

When we work through the list of containers in alphabetical order, we
end up duplicating much of the layer fetching because related images
are processed at the same time. Images like cinder-api, cinder-backup
and cinder-volume share many of the same layers. Since we don't ensure
that a given layer hash is fetched only once during the
multiprocessing, we end up fetching the same layers repeatedly. By
randomizing the upload order, we reduce the likelihood that we'll be
fetching the same family of service containers concurrently.

Change-Id: Ifbcd55de52c9e2283203b1c6e2adeb266d43eca6
Related-Bug: #1844446
(cherry picked from commit 3adfefa13a)
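To make the effect concrete, here is a minimal, hypothetical sketch of
why alphabetical order hurts (the image names come from the commit
message; the worker count and list are illustrative, not the real
upload code):

import random

# Alphabetical order clusters images that share layers; the three
# cinder-* images are nearly identical below their service layer.
images = ['cinder-api', 'cinder-backup', 'cinder-volume',
          'glance-api', 'keystone', 'neutron-server']

# With a pool of N workers taking tasks from the front of the list,
# sorted order hands all three cinder images out simultaneously, so
# their shared layers are downloaded three times in parallel.
workers = 3
print('sorted  :', images[:workers])

# Shuffling makes it unlikely that one image family lands on several
# workers at once, so shared layers are usually fetched only once.
random.shuffle(images)
print('shuffled:', images[:workers])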
Alex Schultz 2019-09-26 11:05:15 -06:00 committed by Emilien Macchi
parent cc48b071b8
commit 0c3c2623de
1 changed file with 18 additions and 2 deletions

@@ -19,6 +19,7 @@ import hashlib
 import json
 import netifaces
 import os
+import random
 import re
 import requests
 from requests.adapters import HTTPAdapter
@@ -218,6 +219,7 @@ class ImageUploadManager(BaseImageManager):
         container_images = self.load_config_files(self.CONTAINER_IMAGES) or []
         upload_images = uploads + container_images
 
+        tasks = []
         for item in upload_images:
             image_name = item.get('imagename')
             uploader = item.get('uploader', DEFAULT_UPLOADER)
@@ -232,10 +234,24 @@ class ImageUploadManager(BaseImageManager):
             multi_arch = item.get('multi_arch', self.multi_arch)
 
             uploader = self.uploader(uploader)
-            task = UploadTask(
+            tasks.append(UploadTask(
                 image_name, pull_source, push_destination,
                 append_tag, modify_role, modify_vars, self.dry_run,
-                self.cleanup, multi_arch)
+                self.cleanup, multi_arch))
+
+        # NOTE(mwhahaha): We want to randomize the upload process because of
+        # the shared nature of container layers. Because we multiprocess the
+        # handling of containers, if performed in an alphabetical order (the
+        # default) we end up duplicating the fetching of container layers.
+        # Things like cinder-volume and cinder-backup share almost all of
+        # the same layers, so when they are fetched at the same time we
+        # duplicate the processing. By randomizing the list we reduce the
+        # amount of duplication that occurs. In my testing this took a run
+        # from ~30mins down to ~20mins. In the future this could be improved
+        # by adding locking to the container fetching based on layer hashes,
+        # but that will require a significant rewrite.
+        random.shuffle(tasks)
+        for task in tasks:
             uploader.add_upload_task(task)
 
         for uploader in self.uploaders.values():
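The NOTE above points at layer-hash locking as the eventual fix. A
rough sketch of that idea, assuming a thread-based pool for simplicity
(the real uploader multiprocesses, which would need shared state such
as a multiprocessing.Manager; fetch_layer and fetch_fn are
hypothetical names, not tripleo-common API):

import threading

_layer_locks = {}
_locks_guard = threading.Lock()
_fetched = {}

def _lock_for(layer_hash):
    # Create at most one lock object per layer hash.
    with _locks_guard:
        return _layer_locks.setdefault(layer_hash, threading.Lock())

def fetch_layer(layer_hash, fetch_fn):
    # fetch_fn stands in for the real layer pull. Only the first
    # caller for a given hash downloads it; concurrent callers block
    # on the per-hash lock and then reuse the cached result.
    with _lock_for(layer_hash):
        if layer_hash not in _fetched:
            _fetched[layer_hash] = fetch_fn(layer_hash)
    return _fetched[layer_hash]

With something like that in place, upload order would no longer affect
how often a layer is fetched; the shuffle is the cheap interim
mitigation this commit chose instead of that significant rewrite.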