Architecture

Storage hardware selection options include three areas: cost, performance, and reliability.

The selection of hardware for a storage-focused OpenStack cloud must reflect the fact that the workloads are storage intensive. These workloads are not compute intensive, nor are they consistently network intensive. The network may be heavily utilized to transfer stored data, but the workloads are not otherwise network intensive.

For a storage-focused OpenStack architecture design, the selection of storage hardware determines the overall performance and scalability of the design. A number of different factors must be considered in the design process:

Cost: The overall cost of the solution plays a major role in which storage architecture, and therefore which storage hardware, is selected.

Performance: The performance of the solution, measured by observing the latency of storage I/O requests, also plays a major role. Storage latency can be a major consideration even in a compute-focused OpenStack cloud: in some compute-intensive workloads, minimizing the delays that the CPU experiences while fetching data from storage can have a significant impact on the overall performance of the application.

Scalability: "Scalability" refers to how well the storage solution performs as it is expanded up to its maximum size. A storage solution that performs well in small configurations but degrades as it expands is not considered scalable. Conversely, a solution that continues to perform well at maximum expansion is considered scalable.

Expandability: "Expandability" refers to the overall ability of the solution to grow. A storage solution that expands to 50 PB is considered more expandable than a solution that only expands to 10 PB. This metric is related to, but different from, scalability, which is a measure of the solution's performance as it expands.

Latency is one of the key considerations in a storage-focused OpenStack cloud. Using solid-state disks (SSDs) to minimize latency for instance storage, and so reduce the CPU delays caused by waiting for storage, increases performance. It is also recommended to evaluate the gains from using RAID controller cards in compute hosts to improve the performance of the underlying disk subsystem.

The selection of storage architecture (and the corresponding storage hardware, if there is an option) is determined by evaluating possible solutions against the key factors above. This evaluation determines whether a scale-out solution (such as Ceph, GlusterFS, or similar) should be used or whether a single, highly expandable and scalable centralized storage array would be a better choice. If a centralized storage array is the right fit for the requirements, then the hardware is determined by the array vendor. It is possible to build a storage array using commodity hardware with open source software, but doing so requires access to people with the expertise to build such a system. On the other hand, a scale-out storage solution that uses direct-attached storage (DAS) in the servers may be an appropriate choice. If so, the server hardware needs to be configured to support the storage solution.

Some factors that might affect a particular storage architecture (and corresponding storage hardware) of a storage-focused OpenStack cloud:

Connectivity: Based on the storage solution selected, ensure that the connectivity matches the storage solution requirements. If a centralized storage array is selected, it is important to determine how the hypervisors will connect to the storage array. Connectivity can affect latency and thus performance. It is recommended to check that the network characteristics minimize latency to boost the overall performance of the design.

Latency: Determine whether the use case will have consistent or highly variable latency.

Throughput: Ensure that the storage solution throughput is optimized based on application requirements.

Server hardware: Use of DAS impacts the server hardware choice and affects host density, instance density, power density, OS-hypervisor, and management tools, to name a few.
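The distinction between scalability and expandability can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the `StorageOption` class, the 1.25 latency tolerance, and all measurements are hypothetical and do not describe any real product.

```python
from dataclasses import dataclass


@dataclass
class StorageOption:
    name: str
    max_capacity_pb: float        # expandability: largest size the solution can reach
    latency_ms_by_capacity: dict  # measured capacity in PB -> observed I/O latency in ms


def latency_degradation(option):
    """Latency at the largest measured size divided by latency at the smallest."""
    sizes = sorted(option.latency_ms_by_capacity)
    smallest, largest = sizes[0], sizes[-1]
    return option.latency_ms_by_capacity[largest] / option.latency_ms_by_capacity[smallest]


def is_scalable(option, max_degradation=1.25):
    """Treat a solution as scalable if latency stays within the chosen tolerance."""
    return latency_degradation(option) <= max_degradation


# Hypothetical measurements, not vendor data.
option_a = StorageOption("A", 50, {1: 2.0, 10: 3.0, 50: 6.0})
option_b = StorageOption("B", 10, {1: 2.0, 5: 2.1, 10: 2.2})

for option in (option_a, option_b):
    print(option.name, option.max_capacity_pb,
          round(latency_degradation(option), 2), is_scalable(option))
```

With these made-up numbers, option A is the more expandable of the two (50 PB versus 10 PB) but fails the scalability check because its latency triples at full size, while option B is less expandable but stays within the tolerance.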

Compute (Server) Hardware Selection

Compute (server) hardware must be evaluated against four opposing dimensions:

Server density: A measure of how many servers can fit into a given measure of physical space, such as a rack unit [U].

Resource capacity: The number of CPU cores, how much RAM, or how much storage a given server will deliver.

Expandability: The number of additional resources that can be added to a server before it has reached its limit.

Cost: The relative cost of the hardware weighted against the level of design effort needed to build the system.

The dimensions need to be weighed against each other to determine the best design for the desired purpose. For example, increasing server density can mean sacrificing resource capacity or expandability. Increasing resource capacity and expandability can increase cost but decrease server density. Decreasing cost often means decreasing supportability, server density, resource capacity, and expandability.

For a storage-focused OpenStack architecture design, a secondary design consideration for selecting server hardware is the compute capacity (CPU cores and RAM capacity). As a result, the required server hardware must supply adequate CPU sockets, additional CPU cores, and more RAM; network connectivity and storage capacity are not as critical. The hardware needs to provide enough network connectivity and storage capacity to meet the user requirements; however, they are not the primary consideration.

Since there is only a need for adequate CPU and RAM capacity, some server hardware form factors are better suited to this storage-focused design than others:

Most blade servers support dual-socket, multi-core CPUs; avoiding this limit means choosing "full width" or "full height" blades, which sacrifices server density. The high density blade servers (for example, both HP BladeSystem and Dell PowerEdge M1000e), which support up to 16 servers in only 10 rack units using "half height" or "half width" blades, lose 50% of that density (only 8 servers in 10 U) if a "full width" or "full height" option is used.

1U rack-mounted servers (servers that occupy only a single rack unit) might be able to offer greater server density than a blade server solution (40 servers in a rack, providing space for the top of rack (ToR) switches, versus 32 "full width" or "full height" blade servers in a rack), but often are limited to dual-socket, multi-core CPU configurations. Note that as of the Icehouse release, neither HP, IBM, nor Dell offered 1U rack servers with more than 2 CPU sockets. To obtain greater than dual-socket support in a 1U rack-mount form factor, customers need to buy their systems from Original Design Manufacturers (ODMs) or second-tier manufacturers. This may cause issues for organizations that have preferred vendor policies or concerns with support and hardware warranties of non-tier 1 vendors.

2U rack-mounted servers provide quad-socket, multi-core CPU support but with a corresponding decrease in server density (half the density offered by 1U rack-mounted servers).

Larger rack-mounted servers, such as 4U servers, often provide even greater CPU capacity, commonly supporting four or even eight CPU sockets. These servers have greater expandability but much lower server density and usually greater hardware cost.

The so-called "sled servers" (rack-mounted servers that support multiple independent servers in a single 2U or 3U enclosure) deliver increased density as compared to typical 1U-2U rack-mounted servers. For example, many sled servers offer four independent dual-socket nodes in 2U for a total of 8 CPU sockets in 2U. However, the dual-socket limitation on individual nodes may not be sufficient to offset their additional cost and configuration complexity.

Other factors that will strongly influence server hardware selection for a storage-focused OpenStack design architecture:

Instance density: In this architecture, instance density and CPU-RAM oversubscription are lower. More hosts will be required to support the anticipated scale, especially if the design uses dual-socket hardware designs.

Host density: Another option to address the higher host count is to use a quad-socket platform. Taking this approach decreases host density, which also increases rack count. This configuration affects the number of power connections and also impacts network and cooling requirements.

Power and cooling density: The power and cooling density requirements might be lower than with blade, sled, or 1U server designs due to lower host density (by using 2U, 3U or even 4U server designs). For data centers with older infrastructure, this might be a desirable feature.

Server hardware selection for a storage-focused OpenStack design architecture must weigh a "scale up" solution against a "scale out" solution. Which is the better solution, a smaller number of larger hosts or a larger number of smaller hosts, depends on a combination of factors including cost, power, cooling, physical rack and floor space, support-warranty, and manageability.
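The density figures quoted above can be reproduced with a few lines of arithmetic. The following Python sketch assumes 40 usable rack units per rack (for example, a 42U rack with 2U set aside for ToR switches) and assumes that a "full height" blade is a quad-socket system; both are illustrative assumptions rather than figures stated by any vendor.

```python
USABLE_RACK_UNITS = 40  # assumption: 42U rack with 2U reserved for ToR switches

# name: (enclosure height in U, servers per enclosure, CPU sockets per server)
FORM_FACTORS = {
    "half-height blade": (10, 16, 2),
    "full-height blade": (10, 8, 4),  # quad-socket is an assumption
    "1U rack server": (1, 1, 2),
    "2U rack server": (2, 1, 4),
    "sled, 4 nodes in 2U": (2, 4, 2),
}


def per_rack(height_u, servers_per_enclosure, sockets_per_server):
    """Return (servers, CPU sockets) that fit in the usable rack space."""
    enclosures = USABLE_RACK_UNITS // height_u
    servers = enclosures * servers_per_enclosure
    return servers, servers * sockets_per_server


for name, spec in FORM_FACTORS.items():
    servers, sockets = per_rack(*spec)
    print(f"{name}: {servers} servers, {sockets} sockets per rack")
```

Under these assumptions the sketch gives 40 servers per rack for 1U systems, 32 for full-height blades, and 20 for 2U systems, which matches the comparison in the text. It deliberately ignores power, cooling, and cost, which the surrounding discussion treats as equally important.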

Networking Hardware Selection

Some of the key considerations that should be included in the selection of networking hardware include:

Port count: The user will require networking hardware that has the requisite port count.

Port density: The network design is affected by the physical space that is required to provide the requisite port count. A switch that can provide 48 10 GbE ports in 1U has a much higher port density than a switch that provides 24 10 GbE ports in 2U. In general, a higher port density is preferred because it leaves more rack space for compute or storage components. It is also important to consider fault domains and power density. Finally, higher density switches are more expensive, so it is important not to over-design the network.

Port speed: The networking hardware must support the proposed network speed, for example: 1 GbE, 10 GbE, or 40 GbE (or even 100 GbE).

Redundancy: The level of network hardware redundancy required is influenced by the user requirements for high availability and by cost considerations. Network redundancy can be achieved by adding redundant power supplies or paired switches. If this is a requirement, the hardware needs to support this configuration. User requirements determine whether a completely redundant network infrastructure is required.

Power requirements: Make sure that the physical data center provides the necessary power for the selected network hardware. This is not typically an issue for top of rack (ToR) switches, but may be an issue for spine switches in a leaf and spine fabric, or end of row (EoR) switches.

Protocol support: It is possible to gain even more performance out of a single storage system by using specialized network technologies such as RDMA, SRP, iSER and SCST. The specifics of using these technologies are beyond the scope of this book.
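The port density comparison above is simple arithmetic, sketched below in Python. The function names and the 96-port requirement are hypothetical and chosen only for illustration.

```python
import math


def port_density(ports, rack_units):
    """Ports provided per rack unit of switch height."""
    return ports / rack_units


def rack_units_needed(required_ports, ports_per_switch, switch_height_u):
    """Rack space consumed by enough switches to supply the required ports."""
    switches = math.ceil(required_ports / ports_per_switch)
    return switches * switch_height_u


dense = port_density(48, 1)   # 48 x 10 GbE ports in 1U
sparse = port_density(24, 2)  # 24 x 10 GbE ports in 2U
print(dense, sparse)

# Hypothetical requirement of 96 ports.
print(rack_units_needed(96, 48, 1), rack_units_needed(96, 24, 2))
```

For the hypothetical 96-port requirement, the denser switch occupies 2U and the less dense switch occupies 8U, which is the rack space trade-off the port density consideration describes.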

Software Selection

Selecting software to be included in a storage-focused OpenStack architecture design includes three areas:

Operating system (OS) and hypervisor

OpenStack components

Supplemental software

Design decisions made in each of these areas impact the rest of the OpenStack architecture design.

Operating System and Hypervisor

The selection of OS and hypervisor has a significant impact on the overall design and also affects server hardware selection. Ensure that the storage hardware is supported by the selected operating system and hypervisor combination and that the networking hardware selection and topology will work with the chosen operating system and hypervisor combination. For example, if the design uses Link Aggregation Control Protocol (LACP), the OS and hypervisor are both required to support it.

Some areas that could be impacted by the selection of OS and hypervisor include:

Cost: Selection of a commercially supported hypervisor, such as Microsoft Hyper-V, results in a different cost model than selecting a community-supported open source hypervisor like KVM or Xen. Similarly, choosing Ubuntu over Red Hat (or vice versa) has an impact on cost due to support contracts. Conversely, business or application requirements might dictate a specific or commercially supported hypervisor.

Supportability: Whichever hypervisor is chosen, the staff needs to have appropriate training and knowledge to support the selected OS and hypervisor combination. If they do not, training will need to be provided, which could have a cost impact on the design. Another aspect to consider is the support for the OS-hypervisor. The support of a commercial product such as Red Hat, SUSE, or Windows is the responsibility of the OS vendor. If an open source platform is chosen, the support comes from in-house resources. Either decision has a cost that will have an impact on the design.

Management tools: The management tools used for Ubuntu and KVM differ from the management tools for VMware vSphere. Although both OS and hypervisor combinations are supported by OpenStack, selecting one combination over the other has very different impacts on the rest of the design.

Scale and performance: Make sure that the selected OS and hypervisor combination meets the scale and performance requirements of this storage-focused OpenStack cloud. The chosen architecture needs to meet the targeted instance-host ratios with the selected OS-hypervisor combination.

Security: Make sure that the design can accommodate the regular periodic installation of application security patches while maintaining the required workloads. The frequency of security patches for the proposed OS-hypervisor combination will have an impact on performance, and the patch installation process could affect maintenance windows.

Supported features: Determine what features of OpenStack are required. This will often determine the selection of the OS-hypervisor combination, because certain features are only available with specific OSs or hypervisors. If required features are not available with the preferred combination, the design might need to be modified to meet the user requirements.

Interoperability: Consideration should be given to the ability of the selected OS-hypervisor combination to interoperate or co-exist with other OS-hypervisors, or other software solutions in the overall design, if that is a requirement. Operational and troubleshooting tools for one OS-hypervisor combination may differ from the tools used for another OS-hypervisor combination. As a result, the design needs to address whether the two sets of tools need to interoperate.

OpenStack Components

The selection of OpenStack components has a significant direct impact on the overall design. While certain components will always be present (Nova and Glance, for example), other services may not need to be present. As an example, a certain design may not require OpenStack Heat. Omitting Heat would not typically have a significant impact on the overall design; however, if the architecture uses a replacement for OpenStack Swift for its storage component, this could potentially have significant impacts on the rest of the design.

A storage-focused design might require the ability to use Heat to launch instances with Cinder volumes to perform storage-intensive processing.

For a storage-focused OpenStack design architecture, the following components would typically be used:

Keystone

Horizon

Nova (including the use of multiple hypervisor drivers)

Swift (or another object storage solution)

Cinder

Glance

Neutron or nova-network

The exclusion of certain OpenStack components may limit or constrain the functionality of other components. If a design opts to include Heat but exclude Ceilometer, then the design will not be able to take advantage of Heat's auto scaling functionality (which relies on information from Ceilometer). Because Heat can be used to spin up a large number of instances to perform the compute-intensive processing, including Heat in a compute-focused architecture design is strongly recommended.
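The way that excluding one component constrains another can be sketched as a simple dependency check. The Python below is a hypothetical illustration: the feature names and the `FEATURE_REQUIREMENTS` table are invented for this example and encode only the two relationships mentioned in the text, not a complete map of OpenStack dependencies.

```python
# Hypothetical feature names mapped to the components each one needs.
FEATURE_REQUIREMENTS = {
    "heat-auto-scaling": {"heat", "ceilometer"},
    "heat-launch-with-cinder-volumes": {"heat", "nova", "cinder"},
}


def unavailable_features(selected_components):
    """Return the features whose required components are not all selected."""
    selected = {name.lower() for name in selected_components}
    return sorted(
        feature
        for feature, required in FEATURE_REQUIREMENTS.items()
        if not required <= selected
    )


# The typical storage-focused component list, plus Heat but without Ceilometer.
design = ["Keystone", "Horizon", "Nova", "Swift", "Cinder", "Glance", "Neutron", "Heat"]
print(unavailable_features(design))
```

For this design the check reports that auto scaling is unavailable, while launching instances with Cinder volumes through Heat remains possible. Adding Ceilometer to the list clears the report.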

Supplemental Software

While OpenStack is a fairly complete collection of software projects for building a platform for cloud services, there are additional pieces of software that might need to be added to any given OpenStack design.

Networking Software

OpenStack Neutron provides a wide variety of networking services for instances. There are many additional networking software packages that may be useful to manage the OpenStack components themselves. Some examples include HAProxy, keepalived, and various routing daemons (like Quagga). Some of these software packages, HAProxy in particular, are described in more detail in Chapter 8 of the OpenStack High Availability Guide. For a general purpose OpenStack cloud, it is reasonably likely that the OpenStack infrastructure components will need to be highly available, and therefore networking software packages like HAProxy will need to be included.

Management Software

This includes software for providing clustering, logging, monitoring, and alerting. The factors for determining which software packages in this category should be selected are outside the scope of this design guide. This design guide focuses specifically on how the selected supplemental software solution affects the overall OpenStack cloud design.

The choice of clustering software, such as Corosync or Pacemaker, is determined primarily by the availability design requirements. Therefore, the impact of including (or not including) these software packages is determined by the availability of the cloud infrastructure and the complexity of supporting the configuration after it is deployed. The OpenStack High Availability Guide provides more details on the installation and configuration of Corosync and Pacemaker, should these packages need to be included in the design.

Requirements for logging, monitoring, and alerting are determined by operational considerations. Each of these sub-categories includes a number of options. For example, in the logging sub-category one might consider Logstash, Splunk, Log Insight, or some other log aggregation-consolidation tool. Logs should be stored in a centralized location to make it easier to perform analytics against the data. Log data analytics engines can also provide automation and issue notification, by providing a mechanism to both alert and automatically attempt to remediate some of the more commonly known issues.

If any of these software packages are needed, then the design must account for the additional resource consumption (CPU, RAM, storage, and network bandwidth for a log aggregation solution, for example). Some other potential design impacts include:

OS-hypervisor combination: Ensure that the selected logging, monitoring, or alerting tools support the proposed OS-hypervisor combination.

Network hardware: The network hardware selection needs to be supported by the logging, monitoring, and alerting software.

Database Software

Virtually all of the OpenStack components require access to back-end database services to store state and configuration information. Choose an appropriate back-end database that will satisfy the availability and fault tolerance requirements of the OpenStack services.

MySQL is generally considered to be the de facto database for OpenStack; however, other compatible databases are also known to work. Note, however, that Ceilometer uses MongoDB.

The solution selected to provide high availability for the database will change based on the selected database. If MySQL is selected, then a number of options are available. For active-active clustering, a replication technology such as Galera can be used. For active-passive, some form of shared storage must be used. Each of these potential solutions has an impact on the design:

Solutions that employ Galera/MariaDB will require at least three MySQL nodes.

MongoDB will have its own design considerations with regard to making the database highly available.

OpenStack design, generally, does not include shared storage, but for a high availability design some components might require it depending on the specific implementation.
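One way to see why a Galera-based design needs at least three nodes is to work through majority quorum arithmetic. The Python sketch below assumes a simple, unweighted majority rule, which is a simplification of how a real cluster computes quorum; the function names are hypothetical.

```python
def quorum_size(nodes):
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """Nodes that can be lost while the survivors still hold a majority."""
    return nodes - quorum_size(nodes)


for nodes in (2, 3, 5):
    print(nodes, quorum_size(nodes), tolerated_failures(nodes))
```

Under this simplified model a two-node cluster cannot lose any node without losing its majority, while three nodes is the smallest cluster that survives a single node failure.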