From 480f58004d485b8388823fdd0fa8380cc999bd6e Mon Sep 17 00:00:00 2001
From: Kevin Carter
Date: Thu, 31 Aug 2017 23:40:41 -0500
Subject: [PATCH] Hyperconverged Containers

Change-Id: I62c72341a1eadf6e119215a657941e7fd5fbb077
Signed-off-by: Kevin Carter
---
 specs/queens/hyperconverged-containers.rst | 168 +++++++++++++++++++++
 1 file changed, 168 insertions(+)
 create mode 100644 specs/queens/hyperconverged-containers.rst

diff --git a/specs/queens/hyperconverged-containers.rst b/specs/queens/hyperconverged-containers.rst
new file mode 100644
index 0000000..0cd7293
--- /dev/null
+++ b/specs/queens/hyperconverged-containers.rst
@@ -0,0 +1,168 @@
+Hyper-Converged Containers
+##########################
+:date: 2017-09-01 22:00
+:tags: containers, hyperconverged, performance
+
+Reduce container counts across the infrastructure hosts.
+
+To lower our deployment times and resource consumption across the board, this
+spec looks to remove single-purpose containers that provide little to no
+benefit to the architecture at scale.
+
+This change groups services, resulting in fewer containers. It does not mix
+service categories, so there is no risk of cross-polluting a service with
+unknown packages or unknown workloads. We are only looking to minimize the
+container types we have and to simplify operations. By converging containers
+we remove no fewer than 10 steps from the container deployment process and
+the service setup. Operationally, we reduce the load on teams managing
+clouds at any scale.
+
+
+Problem description
+===================
+
+When we started this project, we set out with the best of intentions to
+create a pseudo micro-service model for our system layout and container
+orchestration. While this works today, it creates many containers that are
+unnecessary in terms of resource utilization.
+
+
+Proposed change
+===============
+
+Converge groups of containers found within the `env.d` directory into a
+single container wherever possible. Most of the changes needed to get this
+work done have already been committed. In some instances we will need to
+"revert a change" to get the core functionality of this spec into master,
+but little to no development will be required to complete the initial
+convergence work.
+
+Once the convergence work is complete, we intend to develop a set of
+playbooks which will allow the deployer to run an "opt-in" set of tasks that
+clean up containers and services wherever necessary. Services behind a load
+balancer will need to be updated. Updates to the load balancer will be
+covered by the "opt-in" playbooks, provided the environment is using our
+supported software load balancer (HAProxy). The "opt-in" playbooks will need
+to be codified, tested, and documented. Should the hyperconverged work be
+cherry-picked to a stable branch, the new playbooks will first need to exist
+and be tested within our periodic gates. We expect no playbook impact in
+terms of the general deployer workflow.
+
+
+Alternatives
+------------
+
+We could leave everything as-is, which carries the resource requirements we
+currently have along with the understanding that the resources required will
+grow, given that OpenStack services, both existing and net new, are ever
+expanding.
+
+
+Playbook/Role impact
+--------------------
+
+At least one new playbook will be added, allowing a deployer to clean up old
+container types from the runtime and inventory should they decide to. The
+cleanup playbook(s) will be "opt-in" and will not be part of our normal
+automated deployment process.
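+
+As a rough sketch only, a cleanup play might resemble the following; the
+group name, variables, and cleanup commands shown here are illustrative
+assumptions rather than a final design::
+
+    ---
+    # Illustrative sketch: destroy a retired container on its physical host
+    # and prune it from the dynamic inventory. This play is run explicitly
+    # by the deployer and is never part of the normal deployment workflow.
+    - name: Remove retired single-purpose containers
+      hosts: "{{ retired_containers | default('none') }}"
+      gather_facts: false
+      user: root
+      tasks:
+        - name: Stop and destroy the retired container
+          command: "lxc-destroy -f -n {{ inventory_hostname }}"
+          delegate_to: "{{ physical_host }}"
+
+        - name: Drop the retired container from inventory
+          command: >-
+            /opt/openstack-ansible/scripts/inventory-manage.py
+            -r {{ inventory_hostname }}
+          delegate_to: localhost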
+
+
+Upgrade impact
+--------------
+
+There is no upgrade impact with this change, as any existing deployment
+would already have all the required associations within inventory. Services
+would continue to function normally after this change. Greenfield
+deployments, on the other hand, would have fewer containers to manage, which
+reduces the resource requirements while also ensuring we retain the host,
+network, and process separation we have today.
+
+We will create a set of playbooks to clean up some of the redundant
+containers that would exist post-upgrade; however, the execution of these
+playbooks would be opt-in.
+
+
+Security impact
+---------------
+
+Security is not a concern within this spec; however, reducing the container
+count would reduce the potential attack surface we already have.
+
+
+Performance impact
+------------------
+
+Hyperconverging containers will reduce resource consumption on the physical
+hosts. Reducing the resources required to run an OpenStack cloud will
+improve the performance of the playbooks and the system as a whole.
+
+
+End user impact
+---------------
+
+N/A
+
+
+Deployer impact
+---------------
+
+Deployers will have fewer containers to manage and be concerned with as they
+run clouds over long periods of time.
+
+* Within an upgrade scenario a deployer will have the option to "opt-in" to
+  a hyperconverged setup. This change will have no service impact on running
+  deployments by default.
+
+
+Developer impact
+----------------
+
+N/A
+
+
+Dependencies
+------------
+
+* If we are to test the "opt-in" cleanup playbooks, we will need a periodic
+  upgrade gate job. The playbooks would be executed by the upgrade gate job
+  and would post results to the ML/channel so that the OSA development team
+  is notified of any failure.
+
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  Kevin Carter (IRC: cloudnull)
+  Major Hayden (IRC: mhayden)
+
+
+Work items
+----------
+
+* Converge the containers into fewer groups.
+* Create the "opt-in" container reduction playbooks.
+* Document the new playbooks.
+
+
+Testing
+=======
+
+* The core functionality of this patch will be tested on every commit.
+* If the upgrade test dependencies are met, we can create a code path within
+  the periodic gates and test the "opt-in" cleanup playbooks.
+
+
+Documentation impact
+====================
+
+Documentation will be created for the new "opt-in" container cleanup
+playbooks.
+
+
+References
+==========
+
+N/A