diff --git a/specs/containerize-openstack.rst b/specs/containerize-openstack.rst
new file mode 100644
index 0000000000..d817990ced
--- /dev/null
+++ b/specs/containerize-openstack.rst
@@ -0,0 +1,243 @@
..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License.

  http://creativecommons.org/licenses/by/3.0/legalcode

======================
Containerize OpenStack
======================

When upgrading or downgrading OpenStack, it is possible to use package-based
or image-based management. Containerizing OpenStack is meant to optimize
image-based management of OpenStack. Containerizing OpenStack solves a
manageability and availability problem with the current state-of-the-art
deployment systems in OpenStack.

Problem description
===================

Current state-of-the-art deployment systems use either image-based or
package-based upgrades.

Image-based upgrades are used by TripleO. When TripleO updates a system, it
creates an image of the entire disk and deploys that, rather than just the
parts that compose the OpenStack deployment. This results in a significant
loss of availability. Furthermore, running VMs are shut down during the
imaging process. However, image-based systems offer atomicity, because all
related software for a service is updated in one atomic action by reimaging
the system.

Other systems use package-based upgrades. Package-based upgrades suffer from
a non-atomic nature. An update may update one or more RPM packages, the
update process can fail for any number of reasons, and there is no way to
back out the changes already applied. In an OpenStack deployment it is
typically desirable to update a service that does one thing, including its
dependencies, as an atomic unit. Package-based upgrades do not offer
atomicity.

To solve this problem, containers can be used to provide an image-based
update approach that offers atomic upgrade of a running system with minimal
interruption in service. A rough prototype of compute upgrade [1] shows a
window of unavailability of approximately 10 seconds during a software
update. The prototype keeps virtual machines running without interruption.

Use cases
---------
1. Upgrade or roll back OpenStack deployments atomically. The end user wants
   to change the running software versions in her system to deploy a new
   upstream release without interrupting service for significant periods.
2. Upgrade OpenStack component by component. The end user wants to upgrade
   her system in fine-grained chunks to limit the damage from a failed
   upgrade.
3. Roll back OpenStack component by component. The end user experienced a
   failed upgrade and wishes to roll back to the last known good working
   version.


Proposed change
===============
An OpenStack deployment based on containers is represented in a tree
structure, with each node representing a container set and each leaf
representing a container.
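
To make the atomicity called for in the use cases concrete before describing
the structure in detail, the following is a minimal sketch, using the plain
docker CLI and hypothetical image names and tags rather than the eventual
Kolla tooling, of how a single container could be upgraded and rolled back::

    # Pull the new image first so the unavailability window is limited to
    # the stop/start below.
    docker pull example.registry/nova-compute:2015.1

    # Replace the running container with one based on the new image. Each
    # image is one atomic unit; there is no partially applied package state
    # to back out.
    docker stop nova_compute
    docker rm nova_compute
    docker run -d --name nova_compute example.registry/nova-compute:2015.1

    # Rollback is the same operation using the previous, known-good tag.
    docker stop nova_compute
    docker rm nova_compute
    docker run -d --name nova_compute example.registry/nova-compute:2014.2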

The full properties of a container set:

* A container set is composed of one or more container subsets or one or more
  individual containers
* A container set provides a single logical service
* A container set is managed as a unit during startup, shutdown, and version
  changes
* Each container set is launched together as one unit
* A container set with subsets is launched as one unit including all subsets
* A container set is not atomically managed
* A container set provides appropriate hooks for high availability monitoring

The full properties of a container:

* A container is atomically upgraded or rolled back
* A container includes a monotonically increasing generation number to
  identify the container's age in comparison with other containers
* A container has a single responsibility
* A container may be super-privileged when it needs significant access to the
  host, including:

  * the network namespace of the host
  * the PID namespace of the host
  * the IPC namespace of the host
  * filesystem sharing of the host for persistent storage

* A container may lack any privileges when it does not require significant
  access to the host.
* A container should include a check function for evaluating its own health.
* A container will include proper PID 1 handling for reaping exited child
  processes.

The top level container sets are composed of:

* database control
* messaging control
* high availability control
* OpenStack control
* OpenStack compute operation
* OpenStack storage operation

The various container sets are composed in more detail as follows:

* Database control

  * galera
  * mariadb
  * mongodb

* Messaging control

  * rabbitmq

* High availability control

  * HAProxy

* OpenStack control

  * keystone
  * glance-controller

    * glance-api
    * glance-registry

  * nova-controller

    * nova-api
    * nova-conductor
    * nova-scheduler

  * neutron-controller

    * neutron-server
    * neutron-agents
    * metadata

  * ceilometer-controller

    * ceilometer-alarm
    * ceilometer-api
    * ceilometer-base
    * ceilometer-central
    * ceilometer-collector
    * ceilometer-notification

  * heat-controller

    * heat-api
    * heat-engine

* OpenStack compute operation

  * nova-compute
  * nova-libvirt
  * neutron-agents-linux-bridge
  * neutron-agents-ovs
  * dhcp
  * l3

* OpenStack storage operation

  * Cinder
  * Swift

    * swift-account
    * swift-base
    * swift-container
    * swift-object
    * swift-proxy-server

In order to achieve the desired results, we plan to permit super-privileged
containers. A super-privileged container is defined as any container launched
with the --privileged=true flag to docker that:

* bind-mounts specific security-crucial host operating system directories
  with -v. This includes nearly all directories in the filesystem except for
  leaf directories with no other host operating system use.
* shares any namespace with the --ipc=host, --pid=host, or --net=host flags

We will use the docker flag --restart=always to provide some measure of high
availability for the individual containers and to ensure they operate
correctly as currently designed.

A host tool will run each container's built-in check script via docker exec
on a pre-configured timer to validate that the container is operational. If
the container does not pass its health check, it will be restarted.
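
As an illustration of the pieces described above, the following minimal
sketch, in which the image name, bind mounts, environment file, and check
script path are hypothetical rather than part of this specification, shows
how a super-privileged compute container could be launched and monitored
from the host; the environment file is one possible form of the key/value
configuration discussed below::

    # Launch a compute-style container that shares the host namespaces it
    # needs, bind-mounts host directories for persistent state, restarts
    # automatically, and receives its configuration as key/value pairs.
    docker run -d --name nova_compute \
        --privileged=true \
        --net=host --pid=host --ipc=host \
        -v /run:/run \
        -v /var/lib/nova:/var/lib/nova \
        --env-file /etc/kolla/nova-compute.env \
        --restart=always \
        example.registry/nova-compute:2015.1

    # Host-side monitor: run the container's built-in check function on a
    # pre-configured timer and restart the container if the check fails.
    while true; do
        if ! docker exec nova_compute /opt/kolla/check.sh; then
            docker restart nova_compute
        fi
        sleep 60
    done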

Integration of metadata with fig or a similar single-node Docker
orchestration tool will be implemented. Even though fig executes on a single
node, the containers will be designed to run multi-node, and the deployment
tool should take some form of information that allows it to operate
multi-node. The deployment tool should take a set of key/value pairs as
inputs and convert them into the environment passed to Docker. These
key/value pairs could be provided as a file or as environment variables. We
will not offer integration with multi-node scheduling or orchestration tools;
instead, we expect our consumers to manage each bare metal machine using our
fig (or similar) tool integration.

Any contributions from the community of the metadata required to run these
containers using a multi-node orchestration tool will be warmly received, but
will generally not be maintained by the core team.

The technique for launching the deploy script is not handled by Kolla. This
is a problem for a higher-level deployment tool such as TripleO or Fuel to
tackle.

Logs from the individual containers will be retrievable in a consistent way.

Security impact
---------------

Container usage in super-privileged mode may impact security. For example,
when using --net=host mode and bind-mounting /run, which is necessary for a
compute node, it is possible that a compute breakout could corrupt the host
operating system.

To mitigate security concerns, solutions such as SELinux and AppArmor should
be used where appropriate to contain the security privileges of the
containers.

Performance Impact
------------------

The upgrade or downgrade process changes from a multi-hour outage to an
approximately 10 second outage across the system.

Implementation
==============


Assignee(s)
-----------

Primary assignee:

kolla maintainers

Work Items
----------

1. Container Sets
2. Containers
3. A minimal proof of concept single-node fig deployment integration
4. A minimal proof of concept fig healthchecking integration

Testing
=======

Functional tests will be implemented in the OpenStack check/gating system to
automatically verify that each container passes the functional tests stored
in the project's repositories.

Documentation Impact
====================

The documentation impact is unclear, as this project is a proof of concept
with no clear delivery consumer.


References
==========

* [1] https://github.com/sdake/compute-upgrade