diff --git a/doc/source/design/media/image18.png b/doc/source/design/media/image18.png new file mode 100644 index 00000000..3c111999 Binary files /dev/null and b/doc/source/design/media/image18.png differ diff --git a/doc/source/design/ref_arch_100_nodes.rst b/doc/source/design/ref_arch_100_nodes.rst new file mode 100644 index 00000000..e2a0ba79 --- /dev/null +++ b/doc/source/design/ref_arch_100_nodes.rst @@ -0,0 +1,1455 @@ +.. _ref_arch_for_100_300_500_nodes: + +=========================================================== +OpenStack Reference Architecture For 100, 300 and 500 Nodes +=========================================================== + +This document proposes a new Reference Architecture (RA) of OpenStack +installation on top of Kubernetes that supports a number of 100, 300 and 500 +compute nodes, using container technologies to improve scalability and +high availability of OpenStack Control Plane services. Containerization +of OpenStack components will also enable provisioning, patching and +upgrading large numbers of nodes in parallel, with high reliability and +minimal downtime. + +Introduction/Executive Summary +------------------------------ + +This document contains recommendations for building specific clouds +depending for different use cases. All recommendations are validated and +tested on the described scale in both synthetic and real-world +configurations. + +The proposed Reference Architecture applies the following open source +tools (among others): + +- OpenStack Control Plane is a scalable, modular cloud controller with + support for all aspects of virtualized infrastructure. + +- Ceph is a distributed storage system that provides all the most + popular types of storage to a virtualized infrastructure: object + storage, virtual block storage and distributed file system. + +- InfluxDB is a time-series database optimized for collecting metrics + from multiple sources in nearly-real time and providing access to + recorded metrics. + +- Docker containers are used to isolate OpenStack services from the + underlying operating system and control the state of every + service more precisely. + +Highlights +~~~~~~~~~~ + +Highlights of this document include: + +- Hardware and network configuration of the lab used to develop the + Reference Architecture. + +- OpenStack Control Plane overview - Details how the OpenStack Control + Plane is organized, including placement of the services for + scaling and high availability of the control plane. + +- Data plane overview - Describes the approach to the data plane and + technology stack used in the Reference Architecture. + +- Granular update and upgrade overview - Describes how the proposed + Reference Architecture supports updating and upgrading on all + levels from individual services to the whole OpenStack + application. + +Overview +-------- + +Hardware and network considerations +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This section summarizes hardware considerations and network layouts for +the proposed solution. It defines the basic requirements to server +equipment hosting the cloud based on the CCP RA. Requirements to network +infrastructure in terms of L2 and L3 topologies, services like DNS and +NTP and external access provided in the network. 
+ +OpenStack Control Plane +~~~~~~~~~~~~~~~~~~~~~~~ + +The Control Plane consists of OpenStack component services, like Nova, +Glance and Keystone, and supplementary services like MySQL database +server and RabbitMQ server, all enveloped in Docker containers and +managed by an orchestrator (e.g. Kubernetes). + +OpenStack Data Plane +~~~~~~~~~~~~~~~~~~~~ + +OpenStack data plane is constituted by backends to various drivers of +different components of OpenStack. They all fall into 3 main categories: + +- Hypervisor is data plane component backing OpenStack Compute (Nova), + for example, libvirt or VMWare vSphere. + +- Networking is multiple data plane components under management of + OpenStack Networking, for example, OpenVSwitch. + +- Storage has multiple components managed by OpenStack Storage and + Images services. This category includes such systems as LVM, + iSCSI, Ceph and others. + +Granular Life Cycle Management, Updates and Upgrades +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This document describes the strategy of updating the OpenStack cloud and +its components to new version. The strategy of upgrade is based on +containerization of all those components. Containers effectively split +the state of the system into set of states of individual container. +Every container's state is managed mostly independently. + +Hardware Overview +----------------- + +Server Hardware Specifications +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The following server hardware was used in a lab to install and test the +proposed architecture solution. For Compute nodes, two configurations +are used. + +Configuration One + +- Server model is Dell R630 + +- 2x12 Core CPUs E5-2680v3 + +- 256GB of RAM + +- 2x800GB SSD Intel S3610 + +- 2x10GB Intel X710 dual-port NICs + +Configuration Two + +- Server model is Lenovo RD550-1U + +- 2x12 Core CPUs E5-2680v3 + +- 256GB of RAM + +- 2x800GB SSD Intel S3610 + +- 2x10GB Intel X710 dual-port NICs + +For Storage nodes, the following configuration is used. + +- Server model is Lenovo RD650 + +- 2x12 Core CPUs E5-2670v3 + +- 128GB RAM + +- 2x480GB SSD Intel S3610 + +- 10x2TB HDD + +- 2x10GB Intel X710 dual-port NICs + +Resource Quantities +~~~~~~~~~~~~~~~~~~~ + +Compute/Controller Resources +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The number of Compute/Controller nodes in the environment: 100 nodes + +The number of CPU Cores available to hypervisors and control plane +services: 2400 cores + +The amount of RAM available to hypervisors and control plane services: +25,600 GB + +Storage Resources +^^^^^^^^^^^^^^^^^ + +* The number of Storage nodes in the environment: 45 nodes + +* The number of CPU Cores available to storage services: 1,080 cores + +* The amount of RAM available to storage cache: 5,760 GB + +* The total size of raw disk space available on storage nodes: 900 TB + +Servers are installed in 9 racks connected by ToR switches to spine +switches. + +Network Schema +-------------- + +Underlay Network Topology +~~~~~~~~~~~~~~~~~~~~~~~~~ + +The environment employs leaf switches topology in the underlay network. +BGP protocol used in the underlay network to ensure multipath links +aggregation (IP ECMP) to leaf switches. ToR leaf switches are connected +to spines with 40GbE uplinks. + +The leaf switches use VXLANs to provide overlay network to servers, and +MLAG aggregation to ensure availability and performance on the +downstream links. Servers are connected to ToR switches with 40GbE +port-channel links (4x10GbE with MLAG aggregation). 
+ +The following diagram depicts the network schema of the environment: + +Figure 1. Network schema of the lab environment +''''''''''''''''''''''''''''''''''''''''''''''' + +|image0| + +No specific QoS configuration was made in the underlay network. Assume +that all services share the total bandwidth of network link without +guarantees for individual processes or sockets. + +The following models of switching hardware were used throughout testing +effort in the schema described above: + +- Spine switches: Arista 7508E (4x2900PS, 6xFabric-E modules, + 1xSupervisorE module) + +- ToR switches: Arista 7050X + +Network for OpenStack Platform +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +OpenStack platform uses underlay network to exchange data between its +components, expose public API endpoints and transport the data of +overlay or tenant networks. The following classes of networks are +defined for OpenStack platform in proposed architecture. + +- Management network. This is the network where sensitive data exchange + happens. Sensitive data includes authentication and authorization + credentials and tokens, database traffic, RPC messages and + notifications. This network also provides access to + administrative API endpoints exposed by components of the + platform. + +- Public network. Network of this class provides access to API + endpoints exposed by various components of the platform. It also + connects Floating IPs of virtual server instances to external + network segments and Internet. + +- Storage network. Specialized network to transport data of storage + subsystem, i.e. Ceph cluster. It also connects to internal Ceph + API endpoints. + +- Overlay network. Networks of this class transport the payload of + tenant networks, i.e. the private networks connecting virtual + server instances. The current architecture employs VXLANs for L2 + encapsulation. + +There can be one or more networks of each class in a particular cloud, +for example, Management network class can include separate segments for +messaging, database traffic and Admin API endpoints, and so on. + +In addition to role segmentation, networks in one class can be separated +by scale. For example, Public network class might include multiple +per-rack segments, all connected by an L3 router. + +Figure 2. Logical networks topology in OpenStack environment +'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' + +|image1| + +Control Plane +------------- + +OpenStack Overview +~~~~~~~~~~~~~~~~~~ + +OpenStack is a system that provides Infrastructure as a Service (IaaS). +IaaS is essentially a set of APIs that allow for creation of elements of +typical data center infrastructure, such as virtual servers, networks, +virtual disks and so on. Every aspect of an infrastructure is controlled +by a particular component of OpenStack: + +- OpenStack Compute (Nova) controls virtual servers lifecycle, from + creation through management to termination. + +- OpenStack Networking (Neutron) provides connectivity between virtual + servers and to the world outside. + +- OpenStack Image (Glance) holds base disk images used to boot the + instances of virtual servers with an operating system contained + in the image. + +- OpenStack Block Storage (Cinder) manages the virtual block storage + and supplies it to virtual servers as block devices. + +- OpenStack Identity (Keystone) provides authentication/authorization + to other components and clients of all APIs. 
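+
+Each of these components exposes its own HTTP API behind the common
+Keystone service catalog. The following minimal sketch is an
+illustration only: it uses the openstacksdk client library and a
+hypothetical, admin-scoped ``clouds.yaml`` entry named ``ra-cloud`` to
+touch each of the APIs listed above.
+
+.. code-block:: python
+
+    import openstack
+
+    # Credentials and endpoints are resolved from clouds.yaml;
+    # "ra-cloud" is a placeholder cloud name used for this sketch.
+    conn = openstack.connect(cloud="ra-cloud")
+
+    print("servers :", [s.name for s in conn.compute.servers()])        # Nova
+    print("networks:", [n.name for n in conn.network.networks()])       # Neutron
+    print("images  :", [i.name for i in conn.image.images()])           # Glance
+    print("volumes :", [v.name for v in conn.block_storage.volumes()])  # Cinder
+    print("projects:", [p.name for p in conn.identity.projects()])      # Keystone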
+
+Guidelines for OpenStack at Scale
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Currently, OpenStack scalability is limited by several factors:
+
+- Placement responsiveness desired
+
+- MySQL scalability
+
+- Messaging system scalability
+
+- Scalability of the SDN
+
+- Scalability of the SDS
+
+Scheduler Scalability
+^^^^^^^^^^^^^^^^^^^^^
+
+In general, the scheduling strategy of OpenStack is completely
+optimistic, which means that the state of all resources is considered
+whenever a placement decision is made. As more factors are added to the
+consideration set (special hardware, more compute nodes, affinity,
+etc.), the time to place resources increases at a quadratic rate. There
+is work being done on splitting out the scheduling and placement parts
+of Nova and isolating them as a separate service, like Cinder or
+Neutron. For now, however, this is a work in progress and we should not
+rely on increased scheduler performance for another few releases.
+
+OpenStack can seamlessly accommodate different server types based on
+CPU, memory and local disk without partitioning of the server pool.
+Several partitioning schemes exist that provide the ability to specify
+pools of servers appropriate for a particular workload type. A commonly
+used scheme is server aggregates, which allows specific image sets to
+be scheduled to a set of servers based on server and image tags that
+the administrator has defined.
+
+MySQL Scalability
+^^^^^^^^^^^^^^^^^
+
+As almost all OpenStack services use MySQL to track state, scalability
+of the database can become a problem. Mirantis OpenStack deploys with a
+MySQL + Galera cluster, which allows reads and writes to be scaled in a
+limited fashion.
+
+However, careful tuning and maintenance of the databases remain very
+important. It is recommended to run MySQL on dedicated nodes. The
+Control Plane DB should not be used by Tenant Applications or other
+solutions like Zabbix; those should provision another MySQL cluster on
+other physical nodes.
+
+Messaging System Scalability
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A majority of OpenStack services use AMQP implementations (RabbitMQ and
+Qpid) for message transport and RPC. As with MySQL, there are examples
+of installations in the wild which have successfully scaled this
+messaging infrastructure to tens of thousands of nodes.
+
+Services Overview
+~~~~~~~~~~~~~~~~~
+
+The following paragraphs describe how the individual services and
+components should be placed and configured in the proposed Reference
+Architecture. All services are divided into two categories: stateless
+and stateful. Each category requires a specific approach, which is
+outlined below.
+
+Stateless Services
+^^^^^^^^^^^^^^^^^^
+
+Services of this type do not record their state in their environment.
+These services can be killed and restarted without risk of losing data.
+Most OpenStack component services are inherently stateless, being
+either HTTP-based API servers or message queue processors.
+
+The general approach to stateless services is to envelop the service in
+a Docker container with minimal to no configuration embedded.
+
+Stateful Services
+^^^^^^^^^^^^^^^^^
+
+Services that keep track of their state in some persistent storage and
+cannot be restarted in a clean environment without losing that state
+are considered stateful. The main stateful component of the OpenStack
+platform is the MySQL state database. Most other services rely on the
+database to keep track of their own state.
+
+Containers that run stateful services must have persistent storage
+attached to them. This storage must remain available in case the
+container with a stateful service has to be moved to another location.
+Such availability can be ensured at the application level via
+replication of some sort, or at the network level by redirecting
+connections from the moved container to the same storage location.
+
+OpenStack Identity (Keystone)
+-----------------------------
+
+The Keystone service provides identity to all other OpenStack services
+and to external clients of the OpenStack APIs. The main component of
+the Identity service is an HTTP server that exposes the API of the
+service.
+
+The Keystone service is used by all other components of OpenStack for
+authorization of requests, including system requests between services,
+and thus is called frequently even when the platform is idle. This
+generates load on the CPU resources of the server where it runs. It is
+therefore recommended to run the Keystone service on dedicated nodes,
+although it can be co-located with other resource-intensive components.
+
+For availability and load distribution, it is recommended to run at
+least 3 instances of Keystone at the scale of 100 and 300 nodes. At the
+scale of 500 nodes, 5 instances of the Keystone server are recommended.
+
+A load balancer with a Virtual IP address should be placed in front of
+the Keystone services to handle failures and distribute requests
+evenly.
+
+Per these recommendations, the number of physical nodes running the
+Keystone service is 3 or 5. However, instances of Keystone can share
+these nodes with any other services.
+
+Apache2 Web Server
+~~~~~~~~~~~~~~~~~~
+
+The Apache web server wraps the Keystone and Horizon services and
+exposes them as web services. However, since Horizon can generate
+significant load on resources, co-locating it with Keystone is not
+recommended, although it is possible at the scale of 100 to 500 nodes.
+See below for recommendations on scaling Horizon.
+
+Figure 3. Keystone API servers in redundant configuration
+''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+|image2|
+
+OpenStack Compute (Nova)
+------------------------
+
+Nova has multiple services, not all of which are stateless. The API
+server, scheduler and conductor in particular are stateless, while the
+nova-compute service is not, as it manages the state of data plane
+components on a compute node.
+
+Nova API
+~~~~~~~~
+
+A web server that exposes the Compute API of the OpenStack platform.
+This is a stateless service. The Nova API server consumes significant
+resources at the scale of hundreds of nodes.
+
+The Nova API server should run on dedicated physical servers to ensure
+that it does not share resources with other services that have a
+comparably big footprint. It can be co-located with more lightweight
+services.
+
+For availability reasons, it is recommended to run at least 3 instances
+of the API server at any time. A load balancer with a Virtual IP
+address shall be placed in front of all API servers.
+
+Figure 4. Nova API servers in redundant configuration
+''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+|image3|
+
+Nova Scheduler
+~~~~~~~~~~~~~~
+
+The scheduling service of Nova is stateless and can be co-located with
+services that have a larger footprint. The Scheduler should be scaled
+by adding more service instances. Every scheduler instance must run
+with at least 3 worker threads.
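+
+The number of running control plane service instances can be checked
+against these recommendations through the Compute API. A minimal
+sketch with openstacksdk follows; the ``ra-cloud`` entry is a
+placeholder and admin credentials are assumed.
+
+.. code-block:: python
+
+    import collections
+
+    import openstack
+
+    conn = openstack.connect(cloud="ra-cloud")  # placeholder cloud name
+
+    # Group Nova control plane services by binary and host; the proposed
+    # architecture expects at least 3 scheduler and 3 conductor instances.
+    hosts = collections.defaultdict(set)
+    for service in conn.compute.services():
+        hosts[service.binary].add(service.host)
+
+    for binary in ("nova-scheduler", "nova-conductor"):
+        print(f"{binary}: running on {len(hosts[binary])} host(s)")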
+
+Using multiple scheduler processes might lead to unexpected delays in
+provisioning due to scheduler retries if many simultaneous placement
+requests are being served.
+
+At the scale of 100 and 300 nodes, 3 instances of the Scheduler will
+suffice. For 500 nodes, 5 instances are recommended.
+
+Since they can be co-located with other services, they do not take any
+additional physical servers from the pool. Schedulers can be placed on
+nodes running other Nova control plane services (i.e. API and
+Conductor).
+
+Nova Conductor
+~~~~~~~~~~~~~~
+
+The centralized conductor service in Nova adds a database interface
+layer for improved security and complex operations. It is a stateless
+service that is scaled horizontally by adding instances running on
+different physical servers.
+
+To ensure that the Conductor service is highly available, run 3
+instances of the service at any time. This does not require dedicated
+physical servers from the pool.
+
+Figure 5. Scaling of Nova Scheduler and Conductor services
+'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+|image4|
+
+Nova Compute
+~~~~~~~~~~~~
+
+The Compute service is essentially an agent service of Nova. An
+instance of it runs on every hypervisor node in an OpenStack
+environment and controls the virtual machines and other connected
+resources at the node level.
+
+The Compute service in the proposed architecture is not containerized.
+It runs on the same node as the hypervisor and communicates with
+various local and remote APIs to get the virtual machines up and
+running. One instance of the Compute service runs per hypervisor host.
+
+Services Placement
+~~~~~~~~~~~~~~~~~~
+
+Nova control plane services are spread across 3 nodes, with the
+following services distributed evenly between those nodes:
+
+- 3 instances of Conductor service
+
+- 3 instances of API service
+
+- 3 to 5 instances of Scheduler service
+
+A load balancer is placed in front of the API services to ensure
+availability and load distribution for the Compute API of the OpenStack
+platform.
+
+Compute services run per-node on hypervisor hosts (compute nodes) in a
+non-redundant fashion.
+
+OpenStack Networking (Neutron)
+------------------------------
+
+This component includes an API server that exposes an HTTP-based API
+and a set of agents which actually manage data plane resources, such as
+IP addresses, firewalls and virtual switches.
+
+Neutron Server
+~~~~~~~~~~~~~~
+
+The API server exposes the Neutron API and passes all web service calls
+to the Neutron plugin for processing. This service generates moderate
+load on the processors and memory of physical servers. In the proposed
+architecture, it runs on the same nodes as other lightweight API
+servers. A minimum of 3 instances of the server is recommended for
+redundancy and load distribution.
+
+A load balancer working in front of the API servers helps ensure their
+availability and distributes the flow of requests.
+
+Figure 6. Neutron Server services in redundant configuration
+'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+|image5|
+
+Neutron DHCP agent
+~~~~~~~~~~~~~~~~~~
+
+The DHCP agent of Neutron is responsible for assigning IP addresses to
+VMs. The agent scales horizontally by adding new instances distributed
+between different nodes. The DHCP agent can be co-located with any
+other services and can run on hypervisor hosts as well.
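+
+Whether the DHCP (and other) agents are actually spread over several
+nodes can be verified through the Networking API. The sketch below is
+illustrative only and assumes openstacksdk with an admin-scoped
+``ra-cloud`` entry (a placeholder name).
+
+.. code-block:: python
+
+    import openstack
+
+    conn = openstack.connect(cloud="ra-cloud")  # placeholder cloud name
+
+    # List the hosts running a DHCP agent; the recommendation above is to
+    # distribute several instances between different nodes.
+    dhcp_hosts = sorted(
+        agent.host for agent in conn.network.agents()
+        if agent.agent_type == "DHCP agent"
+    )
+    print(f"DHCP agents on {len(dhcp_hosts)} host(s): {dhcp_hosts}")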
+
+Neutron L3 agent
+~~~~~~~~~~~~~~~~
+
+The L3 agent controls routing in Public and Private networks by
+creating and managing namespaces, routers, floating IP addresses and
+network translations. The agent can be scaled by adding instances. High
+availability of the agent is ensured by running its instances in a
+Corosync/Pacemaker cluster or by using orchestrator-driven clustering
+and state tracking (e.g. the Kubernetes events system with etcd).
+
+The agent has a low resource footprint and can be co-located with more
+resource-intensive services, for example, the Neutron Server.
+
+Neutron L2 agent
+~~~~~~~~~~~~~~~~
+
+The L2 agent manages the data link layer configuration of the overlay
+network for Compute nodes. It also provides L2 functions to the L3
+agent. The agent is specific to the Neutron plugin.
+
+The L2 agent used with the OpenVSwitch plugin generates high CPU load
+when creating and monitoring the OVS configurations. It has to run on
+any host that runs nova-compute, neutron-l3-agent or neutron-dhcp-agent.
+
+Neutron metadata agent
+~~~~~~~~~~~~~~~~~~~~~~
+
+The metadata agent provides network metadata to virtual servers. For
+availability and redundancy, it is recommended to run at least 3
+instances of the agent. They can share a node with other Neutron or
+control plane services.
+
+Services Placement
+~~~~~~~~~~~~~~~~~~
+
+Neutron control plane services run in batches of 3 for redundancy of
+the Neutron Server. All micro services of Neutron can be co-located
+with the Neutron Servers or with other components' services.
+
+A load balancer should be placed in front of the Neutron Server to
+ensure availability and load distribution for the Network API of the
+OpenStack platform.
+
+L3 and DHCP agents are co-located with instances of Neutron Server on
+the same physical nodes. The L2 agent works on every Compute node and
+on every node that runs the L3 and/or DHCP agent.
+
+OpenStack Images (Glance)
+-------------------------
+
+The Images service consists of an API service and an indexing service.
+Both of them are stateless and can be containerized.
+
+Glance API
+~~~~~~~~~~
+
+This service exposes the Images API of the OpenStack platform. It is
+used to upload and download images and snapshots.
+
+The Glance API server has low resource consumption and can be
+co-located with other services. It does not require dedicated physical
+servers.
+
+The Glance API server scales by adding new instances. A load balancer
+is required to provide a single virtual address to access the API. The
+load can be evenly distributed between instances of Glance API.
+
+An important parameter of the Image Service architecture is the storage
+backend. The following types of storage are proposed for Glance in this
+Reference Architecture:
+
+- Local filesystem
+
+- Ceph cluster
+
+With local storage, the consistency of the store for all instances of
+Glance API must be ensured by an external component (e.g. a replication
+daemon).
+
+The Ceph backend natively provides availability and replication of
+stored data. Multiple instances of the Glance API service work with the
+same shared store in the Ceph cluster.
+
+Figure 7. Glance API service in redundant configuration with Ceph back-end
+''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+|image6|
+
+Glance Registry
+~~~~~~~~~~~~~~~
+
+The Registry serves the image metadata part of the Images API. It
+stores and manages a catalog of images. It has a low footprint and can
+share a physical node with resource-intensive services.
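+
+Regardless of which back end is configured, clients interact only with
+the load-balanced Images API endpoint; the storage layout stays hidden
+behind it. A minimal sketch of uploading an image with openstacksdk
+follows; the cloud name and file path are placeholders.
+
+.. code-block:: python
+
+    import openstack
+
+    conn = openstack.connect(cloud="ra-cloud")  # placeholder cloud name
+
+    # The upload goes to the load-balanced Glance API endpoint; whether
+    # the bits land on a local filesystem or in the Ceph cluster is
+    # decided by the configured store, not by the client.
+    image = conn.create_image(
+        name="cirros-test",
+        filename="cirros.qcow2",    # placeholder path to a local image file
+        disk_format="qcow2",
+        container_format="bare",
+        wait=True,
+    )
+    print(image.id, image.status)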
+
+Services Placement
+~~~~~~~~~~~~~~~~~~
+
+The micro services of Glance do not require dedicated physical servers.
+They can be co-located with other services.
+
+For the purposes of high availability and redundancy, at least 3
+instances of the Glance API service should run at any time. A load
+balancer must be placed in front of those instances to provide a single
+API endpoint and distribute the load.
+
+OpenStack Block Storage (Cinder)
+--------------------------------
+
+The Block Storage service manages and exposes virtual block storage.
+The server that handles the management of low-level storage requires
+access to the disk subsystem.
+
+Cinder API
+~~~~~~~~~~
+
+This service exposes the Volume API of the OpenStack platform. The
+Volume API is not used very often by other components of OpenStack and
+does not consume significant resources. It can run on the same physical
+node with other non-resource-intensive services.
+
+For availability and redundancy purposes, it is proposed to run at
+least 3 instances of Cinder API. A load balancer should be placed in
+front of these instances to provide load distribution and failover
+capabilities.
+
+Figure 8. Cinder API service in redundant configuration
+''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+|image7|
+
+Cinder Scheduler
+~~~~~~~~~~~~~~~~
+
+The Scheduler selects an instance of the Volume micro service to call
+when a client requests creation of a virtual block volume. The
+Scheduler is not resource intensive and can be co-located with other
+services. It scales horizontally by adding new instances of the
+service. For redundancy purposes, the instances of the scheduler
+service should be distributed between different physical servers.
+
+Cinder Volume
+~~~~~~~~~~~~~
+
+The Volume service manages block storage via the appropriate API, which
+depends on the technology in use. With Ceph as the provider of virtual
+block storage, a single instance of the Cinder Volume service is
+required to manage the Ceph cluster via its API. If virtual block
+storage is provided by LVM with local disks of Compute nodes, the
+Volume service must run on every Compute node in the OpenStack
+environment.
+
+In the Ceph case, it is recommended to run at least 3 instances of the
+Volume service for availability and redundancy of the service. Each
+virtual volume is managed by one instance of the Volume service at a
+time. However, if that instance is lost, another one takes over its
+volumes.
+
+Figure 9. Cinder services in redundant configuration
+''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+|image8|
+
+Services Placement
+~~~~~~~~~~~~~~~~~~
+
+Cinder services do not require dedicated physical nodes. They run on
+the same physical servers as other components of the control plane of
+the OpenStack platform.
+
+The instances of the Cinder API service are placed behind a load
+balancer to ensure load distribution and availability of the service.
+
+OpenStack Dashboard (Horizon)
+-----------------------------
+
+The Horizon dashboard provides the user interface to the cloud's
+provisioning APIs. It is a web application running on top of the
+Apache2 web server. For availability purposes, multiple instances of
+the server are started. A load balancer is placed in front of the
+instances to provide load distribution and failover. Horizon does not
+require a dedicated physical node. It can be co-located with services
+of other components of the OpenStack platform.
+
+For security reasons, it is not recommended to use the same instances
+of the Apache2 server to wrap both the Horizon and Keystone API
+services.
+
+RabbitMQ
+--------
+
+The messaging server allows all distributed components of an OpenStack
+service to communicate with each other. Services use internal RPC for
+communication. All OpenStack services also send broadcast notifications
+via the message queue bus.
+
+Clustering
+~~~~~~~~~~
+
+The prerequisite for High Availability of the queue server is a
+configured and working RabbitMQ cluster. All data/state required for
+the operation of a RabbitMQ cluster is replicated across all nodes. An
+exception to this is message queues, which by default reside on one
+node, though they are visible and reachable from all nodes.
+
+Cluster assembly requires installing and using a clustering plugin on
+all servers. The proposed solution for RabbitMQ clustering is the
+rabbitmq-autocluster plugin.
+
+The RabbitMQ cluster also needs a proper fencing mechanism to exclude
+split-brain conditions and preserve a quorum. The proposed solution for
+this problem is using the 'pause\_minority' partition mode with the
+rabbitmq-autocluster plugin.
+
+Replication
+~~~~~~~~~~~
+
+The replication mechanism for RabbitMQ queues is known as 'mirroring'.
+By default, queues within a RabbitMQ cluster are located on a single
+node (the node on which they were first declared). This is unlike
+exchanges and bindings, which can always be considered to be on all
+nodes. Queues can optionally be made mirrored across multiple nodes.
+Each mirrored queue consists of one master and one or more slaves, with
+the oldest slave being promoted to the new master if the old master
+disappears for any reason.
+
+Messages published to the queue are replicated to all members of the
+cluster. Consumers are connected to the master regardless of which node
+they connect to, with slave nodes dropping messages that have been
+acknowledged at the master.
+
+Queue mirroring therefore aims to enhance availability, but not
+distribution of load across nodes (all participating nodes each do all
+the work). It is important to note that using mirroring in RabbitMQ
+actually reduces the availability of queues: it drops performance by
+about 2 times in tests and eventually leads to failures of RabbitMQ
+because of an extremely high rate of context switches on the nodes'
+CPUs.
+
+There are two main types of messages in OpenStack:
+
+- Remote Procedure Call (RPC) messages carry commands and/or requests
+  between microservices within a single component of the OpenStack
+  platform (e.g. nova-conductor to nova-compute).
+
+- Notification messages are issued by a microservice upon specific
+  events and are consumed by other components (e.g. Nova
+  notifications about creating VMs are consumed by Ceilometer).
+
+In the proposed OpenStack architecture, only notification queues are
+mirrored (a sketch of such a policy is shown below). All other queues
+are not, and if an instance of the RabbitMQ server dies after a message
+is sent but before it is read, that message is gone forever. This is a
+trade-off for a significant (at least 2 times) performance and
+stability boost in a potential bottleneck service. Drawbacks of this
+mode of operation are:
+
+- Long-running tasks might get stuck in transition states due to the
+  loss of messages. For example, Heat stacks might never leave the
+  spawning state. Most of the time, such conditions can be fixed by
+  the user via the API.
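+
+A mirroring policy that matches only the notification queues can be
+installed through the RabbitMQ management HTTP API. The sketch below
+is illustrative: the queue-name pattern, host name and credentials are
+assumptions and must be adjusted to the actual deployment.
+
+.. code-block:: python
+
+    import requests
+
+    # Mirror only notification queues (OpenStack names them
+    # "notifications.info", "notifications.error" and so on by default);
+    # RPC queues stay unmirrored, as described above.
+    policy = {
+        "pattern": r"^notifications\.",
+        "definition": {"ha-mode": "all"},
+        "apply-to": "queues",
+        "priority": 0,
+    }
+
+    # "%2F" is the URL-encoded default vhost "/"; the host, port and
+    # credentials below are placeholders.
+    response = requests.put(
+        "http://rabbit-node-1:15672/api/policies/%2F/ha-notifications",
+        json=policy,
+        auth=("guest", "guest"),
+    )
+    response.raise_for_status()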
+
+Data Persistence
+~~~~~~~~~~~~~~~~
+
+OpenStack's RPC mechanism does not impose requirements for durable
+queues or messages. Thus, no durability is required for RabbitMQ
+queues, and there are no 'disk' nodes in the cluster. Restarting a
+RabbitMQ node will therefore cause all data of that node to be lost.
+Since OpenStack does not rely on RPC as a guaranteed transport, this
+does not break the control plane. Clients detect the failure of the
+server they are talking to and connect to another server automatically.
+
+The RabbitMQ service is considered stateless in the terms defined in
+this document for the reasons mentioned above.
+
+Networking Considerations
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+RabbitMQ nodes address each other using domain names, either short or
+fully-qualified (FQDNs). Therefore, hostnames of all cluster members
+must be resolvable from all cluster nodes, as well as from machines on
+which command line tools such as rabbitmqctl might be used.
+
+RabbitMQ clustering has several modes of dealing with network
+partitions, primarily consistency oriented. Clustering is meant to be
+used across a LAN. It is not recommended to run clusters that span a
+WAN. The Shovel or Federation plugins are better solutions for
+connecting brokers across a WAN. Note that Shovel and Federation are
+not equivalent to clustering.
+
+Services Placement
+~~~~~~~~~~~~~~~~~~
+
+RabbitMQ servers are to be installed on dedicated nodes. Co-locating
+RabbitMQ with other Control Plane services has a negative impact on its
+performance and stability due to high resource consumption under load.
+Other services that have different resource usage patterns can prevent
+RabbitMQ from allocating sufficient resources and thus make messaging
+unstable.
+
+Based on that, RabbitMQ will require 3 dedicated nodes out of the pool
+of Compute/Controller servers. These nodes can also host lightweight
+services like Cinder, Keystone or Glance.
+
+Alternatives
+~~~~~~~~~~~~
+
+RabbitMQ is the server of choice for OpenStack messaging. Other
+alternatives include:
+
+- 0MQ (ZeroMQ), a lightweight messaging library that integrates into
+  all components and provides server-less distributed message
+  exchange.
+
+- Kafka, a distributed commit log type messaging system, supported by
+  the oslo.messaging library in experimental mode.
+
+ZeroMQ
+^^^^^^
+
+This library provides direct exchange of messages between
+microservices. Its architecture may include simple brokers or proxies
+that just relay messages to endpoints, thus reducing the number of
+network connections.
+
+ZeroMQ library support has been present in OpenStack since early
+releases. However, the implementation assumed direct connections
+between services and thus a full mesh network between all nodes. This
+architecture doesn't scale well. More recent implementations introduce
+simple proxy services on every host that aggregate messages and relay
+them to a central proxy, which does host-based routing.
+
+Benchmarks show that both direct and proxy-based ZeroMQ implementations
+are more efficient than RabbitMQ in terms of throughput and latency.
+However, in the direct implementation, the limit on network connections
+is quickly exhausted at scale.
+
+The major downside of the ZeroMQ-based solution is that the queues
+don't have any persistence. This is acceptable for RPC messaging, but
+Notifications may require durable queues. Thus, if RPC uses ZeroMQ,
+Telemetry will require a separate messaging transport (RabbitMQ or
+Kafka).
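+
+oslo.messaging already keeps the RPC transport separate from the
+notification transport, so Telemetry notifications can be moved to a
+dedicated messaging back end without touching RPC. A minimal sketch
+with placeholder hosts and credentials follows; the notification URL
+could equally point at a Kafka cluster via the experimental
+``kafka://`` driver.
+
+.. code-block:: python
+
+    from oslo_config import cfg
+    import oslo_messaging
+
+    conf = cfg.CONF
+
+    # RPC over the main RabbitMQ cluster; a multi-host URL lets clients
+    # reconnect to another broker if one fails.
+    rpc_transport = oslo_messaging.get_transport(
+        conf,
+        url="rabbit://openstack:secret@rabbit-1:5672,"
+            "openstack:secret@rabbit-2:5672,"
+            "openstack:secret@rabbit-3:5672/",
+    )
+
+    # Notifications on a separate transport, e.g. a dedicated cluster.
+    notification_transport = oslo_messaging.get_notification_transport(
+        conf, url="rabbit://openstack:secret@notif-rabbit-1:5672/")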
+
+Kafka
+^^^^^
+
+Kafka, a distributed commit log based service, is supported in
+OpenStack's oslo.messaging library only as an experimental feature.
+This makes it unfeasible to include in the Reference Architecture.
+
+MySQL/Galera
+------------
+
+The state database of OpenStack contains all data that describe the
+state of the cloud. All components of OpenStack use the database to
+read and store changes in their state and in the state of the data
+plane components.
+
+Clustering
+~~~~~~~~~~
+
+The proposed clustering solution is based on the native
+orchestrator-specific state management, with etcd providing distributed
+monitoring and data exchange for the cluster. Cluster operations are
+triggered by orchestrator events and handled by custom scripts.
+
+Failover and fencing of failed instances of MySQL is provided by
+scripts triggered by the orchestrator upon changes in the state and
+availability of the members of the Galera cluster. State and
+configuration information is provided by the etcd cluster.
+
+Data Persistence
+~~~~~~~~~~~~~~~~
+
+Galera implements a replication mechanism to ensure that any data
+written to one of the MySQL servers is synchronously duplicated to the
+other members of the cluster. When a new instance joins the cluster,
+one of two replication methods is used to synchronize it: IST or SST.
+If the initial data set exists on the new node, the incremental method
+(IST) is used. Otherwise, full replication (SST) is performed.
+
+Since all nodes in the cluster have synchronous copies of the data set
+at any time, there is no need to use shared storage. All DB servers
+work with local disk storage.
+
+Replication
+~~~~~~~~~~~
+
+Incremental synchronous replication is used to keep the MySQL databases
+of the members of the Galera cluster in sync. If a new member is added
+to the cluster, full replication (SST) is performed.
+
+Full SST replication can take an indefinite time if the data set is big
+enough. To mitigate this risk, the proposed architecture includes a
+number of hot standby MySQL servers in addition to one Active server.
+Access to these servers is provided by an instance of a load balancer
+(see details in the Networking Considerations section).
+
+The proposed architecture allows failing instances of the MySQL server
+to be replaced quickly without the need to run full replication. It is
+still necessary to restore the pool of hot standby instances whenever a
+failover event occurs.
+
+Networking Considerations
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A load balancer is a key element of the networking configuration of the
+Galera cluster. The load balancer must be coordinated with the cluster,
+in that it redirects write requests to the appropriate instance of the
+MySQL server. It also ensures failover to hot standby instances and
+fencing of failed active nodes.
+
+Services Placement
+~~~~~~~~~~~~~~~~~~
+
+MySQL servers forming the Galera cluster could run on the same physical
+servers as instances of the RabbitMQ broker.
+
+Ceph Distributed Storage
+------------------------
+
+Summary
+~~~~~~~
+
+Ceph is a distributed storage system with built-in replication
+capability. The architecture of Ceph is designed for resiliency and
+durability.
+
+Ceph provides a few different APIs for the different types of supported
+storage:
+
+- Distributed file system
+
+- Object storage
+
+- Virtual block device
+
+The OpenStack platform generally uses the Object API and the Block
+Device API of Ceph for Image and Virtual Block storage respectively.
+
+The main components of Ceph are as follows.
+
+- A Ceph Monitor maintains a master copy of the cluster map. A cluster
+  of Ceph monitors ensures high availability should a monitor daemon
+  fail. Storage cluster clients retrieve a copy of the cluster map
+  from the Ceph Monitor.
+
+- A Ceph OSD Daemon checks its own state and the state of other OSDs
+  and reports back to monitors.
+
+- RADOS gateway exposes the Object API of the platform. This API is
+  generally compatible with the OpenStack Object Storage API, which
+  allows it to be used as a Glance back-end without additional
+  modifications.
+
+Ceph Monitor
+~~~~~~~~~~~~
+
+For reliability purposes, Ceph Monitors should be placed on different
+physical nodes. Those nodes might be the Storage nodes themselves,
+albeit that is not generally recommended. Proper resilience of the
+cluster can be ensured by using 3 instances of Ceph Monitor, each
+running on a different host.
+
+Ceph OSD
+~~~~~~~~
+
+Every storage device requires an instance of the Ceph OSD daemon to run
+on the node. These daemons might be resource intensive under certain
+conditions. Since one node usually has multiple devices attached to it,
+there is usually more than one OSD process running on the node. Thus,
+it is recommended that no other services are placed on the OSD nodes.
+
+RADOS Gateway
+~~~~~~~~~~~~~
+
+This service exposes an Object API via HTTP. It has a low resource
+footprint and can be co-located with other services, for example, with
+a Ceph Monitor. Multiple radosgw daemons should be used to provide
+redundancy of the service. A load balancer should be placed in front of
+the radosgw instances for load distribution and failover.
+
+Services Placement
+~~~~~~~~~~~~~~~~~~
+
+Ceph scales very well by adding new OSD nodes when a capacity increase
+is required, so the size of the Ceph cluster may vary for different
+clouds of the same scale. In this document, a cluster with 45 OSD nodes
+is described.
+
+Instances of the Monitor service require 1 dedicated node per instance,
+for a total of 3 nodes. RADOS gateway servers run on the same nodes as
+the Monitors.
+
+Control Plane Operations Monitoring
+-----------------------------------
+
+Summary
+~~~~~~~
+
+Monitoring the OpenStack Control Plane infrastructure is essential for
+operating the platform. Its main goal is to ensure that an operator is
+alerted about failures and degradations in the service level of the
+environment. Metering plays a less important role in this type of
+monitoring.
+
+The currently proposed solution for infrastructure monitoring is
+produced by the Stacklight project.
+
+Stacklight is a distributed framework for collecting, processing and
+analyzing metrics and logs from OpenStack components. It includes the
+following services:
+
+- Stacklight Collector + Aggregator
+
+- Elasticsearch + Kibana
+
+- InfluxDB + Grafana
+
+- Nagios
+
+Stacklight Collector
+~~~~~~~~~~~~~~~~~~~~
+
+A smart agent that runs on every node in the OpenStack environment. It
+collects logs, processes metrics and notifications, and generates and
+sends alerts when needed. A specialized instance of the Collector,
+called the Aggregator, aggregates metrics at the cluster level and
+performs special correlation functions.
+
+Elasticsearch + Kibana
+~~~~~~~~~~~~~~~~~~~~~~
+
+These components are responsible for analyzing large amounts of textual
+data, particularly log records and files coming from OpenStack platform
+nodes. Kibana provides a graphical interface for configuring and
+viewing correlations in messages from multiple sources.
+
+A typical setup requires at least a quad-core server with 8 GB of RAM
+and fast disks (ideally, SSDs). The actual disk space needed to run the
+subsystem depends on several factors, including the size of your
+OpenStack environment, the retention period, the logging level, and the
+workload. The larger these are, the more disk space you need to run the
+Elasticsearch-Kibana solution. It is highly recommended to use
+dedicated disk(s) for your data storage.
+
+InfluxDB + Grafana
+~~~~~~~~~~~~~~~~~~
+
+InfluxDB is a time series database that provides high throughput and
+real-time queries at the cost of reduced support for standard SQL
+capabilities, such as efficiently updating records. Grafana exposes a
+graphical interface to the time series data, displays graphs of certain
+metrics over time, and so on.
+
+The hardware configuration (RAM, CPU, disk(s)) required by this
+subsystem depends on the size of your cloud environment and other
+factors like the retention policy. An average setup would require a
+quad-core server with 8 GB of RAM and access to a 500-1000 IOPS disk.
+For sizeable production deployments it is strongly recommended to use a
+disk capable of 1000+ IOPS like an SSD. See the InfluxDB Hardware
+Sizing Guide for additional sizing information.
+
+Nagios
+~~~~~~
+
+In the Stacklight architecture, Nagios does not perform actual checks
+on the hosts. Instead, it provides the transport and user interface for
+the alerting subsystem. It receives alerts from the Collectors and
+generates notifications to the end users or cloud operators.
+
+A typical setup would require at least a quad-core server with 8 GB of
+RAM and fast disks (ideally, SSDs).
+
+Services Placement
+~~~~~~~~~~~~~~~~~~
+
+The InfluxDB and Elasticsearch subsystems of the Stacklight solution
+should run on dedicated servers. Additionally, clustered InfluxDB needs
+at least three nodes to form a quorum. Elasticsearch scales
+horizontally by adding instances, each running on a separate node. At
+least three instances of Elasticsearch are recommended.
+
+The Stacklight Aggregator service can share a physical node with other
+low-profile services.
+
+The Nagios server can be co-located with the Stacklight Aggregator due
+to its low resource footprint.
+
+A total of 5 additional physical nodes is required to install the
+Control Plane Operations Monitoring framework based on Stacklight.
+
+Services Placement Summary
+--------------------------
+
+The following table summarizes the placement requirements of the
+services described above.
+
+Table 1.
Placement requirements summary per a service at scale of 100 nodes +''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' + ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| Service | Number Of Instances | Number Of Dedicated Nodes | Can Share A Node | Requires a Load Balancer | ++==========================+=======================+=============================+====================+============================+ +| keystone-all | 3 | - | yes | yes | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| nova-api | 3 | - | yes | yes | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| nova-scheduler | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| nova-conductor | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| nova-compute\* | - | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| neutron-server | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| neutron-dhcp-agent | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| neutron-l2-agent\* | - | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| neutron-l3-agent | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| neutron-metadata-agent | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| glance-api | 3 | - | yes | yes | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| glance-registry | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| cinder-api | 3 | - | yes | yes | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| cinder-scheduler | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| cinder-volume | - | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| horizon/apache2 | 3 | - | yes | yes | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| rabbitmq-server | 3 | 3 | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| mysqld-server | 3 | 3 | yes | yes\*\* | 
++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| ceph-mon | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| ceph-osd\*\*\* | - | - | no | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| radosgw | 3 | - | yes | yes | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| lma-aggregator | 1 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| Influxdb + Grafana | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| Elasticsearch + Kibana | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| Nagios | 1 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| **TOTAL** | - | 6 | - | - | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ + +Table 2. Placement requirements summary per a service at scale of 300 nodes +''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' + ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| Service | Number Of Instances | Number Of Dedicated Nodes | Can Share A Node | Requires a Load Balancer | ++==========================+=======================+=============================+====================+============================+ +| keystone-all | 3 | 3 | yes | yes | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| nova-api | 3 | - | yes | yes | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| nova-scheduler | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| nova-conductor | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| nova-compute\* | - | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| neutron-server | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| neutron-dhcp-agent | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| neutron-l2-agent\* | - | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| neutron-l3-agent | 3 | - | yes | no | 
++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| neutron-metadata-agent | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| glance-api | 3 | - | yes | yes | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| glance-registry | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| cinder-api | 3 | - | yes | yes | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| cinder-scheduler | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| cinder-volume | - | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| horizon/apache2 | 3 | - | yes | yes | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| rabbitmq-server | 3 | 3 | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| mysqld-server | 3 | 3 | yes | yes\*\* | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| ceph-mon | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| ceph-osd\*\*\* | - | - | no | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| radosgw | 3 | - | yes | yes | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| lma-aggregator | 1 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| Influxdb + Grafana | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| Elasticsearch + Kibana | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| Nagios | 1 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| **TOTAL** | - | 9 | - | - | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ + +Table 3. 
Placement requirements summary per a service at scale of 500 nodes +''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' + ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| Service | Number Of Instances | Number Of Dedicated Nodes | Can Share A Node | Requires a Load Balancer | ++==========================+=======================+=============================+====================+============================+ +| keystone-all | 5 | 5 | yes | yes | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| nova-api | 3 | - | yes | yes | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| nova-scheduler | 5 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| nova-conductor | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| nova-compute\* | - | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| neutron-server | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| neutron-dhcp-agent | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| neutron-l2-agent\* | - | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| neutron-l3-agent | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| neutron-metadata-agent | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| glance-api | 3 | - | yes | yes | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| glance-registry | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| cinder-api | 3 | - | yes | yes | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| cinder-scheduler | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| cinder-volume | - | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| horizon/apache2 | 3 | - | yes | yes | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| rabbitmq-server | 3 | 3 | no | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| mysqld-server | 3 | 3 | yes | yes\*\* | 
++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| ceph-mon | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| ceph-osd\*\*\* | - | - | no | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| radosgw | 3 | - | yes | yes | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| lma-aggregator | 1 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| Influxdb + Grafana | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| Elasticsearch + Kibana | 3 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| Nagios | 1 | - | yes | no | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ +| **TOTAL** | - | 11 | - | - | ++--------------------------+-----------------------+-----------------------------+--------------------+----------------------------+ + +\* this service runs on a Compute/hypervisor host + +\*\* this service might require specialized load balancer (proxysql) + +\*\*\* this service runs on a Storage/OSD node + +Total number of nodes for control plane is 6 which includes separate servers +for monitoring infrastructure and RabbitMQ/Galera services. The following +schema describes the layout of services in the test cluster. + +|image9| + +Data Plane +---------- + +Compute Virtualization +~~~~~~~~~~~~~~~~~~~~~~ + +QEMU implementation of Linux KVM is used as a virtualization hypervisor. +The KVM runs under control of libvirtd daemon, configured and managed by +Nova Compute microservice (nova-compute). + +Network +~~~~~~~ + +Depending on the picked Neutron plugin, data plane network might be +realized by different technology stacks. Reference architecture employs +OVS+VXLAN plugin, network data plane at compute hosts runs on +OpenVSwitch virtual switching software. + +Storage +~~~~~~~ + +Local disks and Ceph cluster RBD volumes are both used to provide block +storage capability to different services in the platform. + +Ephemeral +^^^^^^^^^ + +Disk files of the VM instances running in the cloud are stored in Ceph +distributed storage via RBD. This enables live migration of VMs and high +availability of data in certain cases. + +Virtual Block Devices +^^^^^^^^^^^^^^^^^^^^^ + +Virtual block storage devices are provided by Ceph via RBD. + +Images +^^^^^^ + +Images used by Nova to spin up new VM instances are stored in Ceph via +Glance service's back end driver. + +Snapshot +^^^^^^^^ + +Snapshots are stored in Ceph. + +Database +^^^^^^^^ + +Database files are kept on the local storage devices of nodes that run +database servers. The replication and availability of the data are +ensured by WSREP mechanism of Galera cluster. + +Monitoring +^^^^^^^^^^ + +The monitoring metrics and time-series data are written to InfluxDB +database. The database keeps its files on a local storage, similar to +the approach taken for state database (see above). 
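+
+As an illustration of how collectors hand metrics over to this
+database, the sketch below writes a single point using the InfluxDB
+1.x HTTP line protocol; the host, database name, measurement and tag
+values are placeholders.
+
+.. code-block:: python
+
+    import time
+
+    import requests
+
+    # One point in line protocol: measurement,tags fields timestamp(ns).
+    point = f"cpu_idle,hostname=node-001 value=93.5 {time.time_ns()}"
+
+    response = requests.post(
+        "http://influxdb-node:8086/write",
+        params={"db": "lma", "precision": "ns"},
+        data=point,
+    )
+    response.raise_for_status()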
+ +Logs +^^^^ + +Logs are written to local disks of nodes, plus optionally to remote logs +collector service which is the part of Telemetry component of the +platform. + +References +---------- + +This section contains references to external documents used in +preparation of this document. + +1. `Testing on scale of 1000 compute nodes `__ + +2. `Performance testing of OpenStack `__ + +3. `RabbitMQ Clustering `__ + +4. `RabbitMQ High Availability `__ + +5. `ZeroMQ Status Update `__ + +.. |image0| image:: media/image03.png + :width: 6.50000in + :height: 3.68056in +.. |image1| image:: media/image02.png + :width: 6.50000in + :height: 3.33333in +.. |image2| image:: media/image05.png + :width: 6.50000in + :height: 3.68056in +.. |image3| image:: media/image07.png + :width: 6.50000in + :height: 3.68056in +.. |image4| image:: media/image09.png + :width: 6.50000in + :height: 3.68056in +.. |image5| image:: media/image11.png + :width: 6.50000in + :height: 3.68056in +.. |image6| image:: media/image13.png + :width: 6.50000in + :height: 3.68056in +.. |image7| image:: media/image15.png + :width: 6.50000in + :height: 3.68056in +.. |image8| image:: media/image17.png + :width: 6.50000in + :height: 3.68056in +.. |image9| image:: media/image18.png + :width: 6.50000in + :height: 3.68056in diff --git a/doc/source/index.rst b/doc/source/index.rst index f43dc73f..bfc94dcf 100644 --- a/doc/source/index.rst +++ b/doc/source/index.rst @@ -46,6 +46,7 @@ Design docs design/clusters_on_k8s design/ost_compute_on_k8s + design/ref_arch_100_nodes design/ref_arch_1000_nodes Indices and tables