<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="architecture-storage-hardware">
<title>Architecture</title>
<para>The selection of storage hardware involves weighing three
areas:</para>
<itemizedlist>
<listitem>
<para>Cost</para>
</listitem>
<listitem>
<para>Performance</para>
</listitem>
<listitem>
<para>Reliability</para>
</listitem>
</itemizedlist>
<para>The selection of hardware for a storage-focused OpenStack
cloud must reflect the fact that the workloads are storage
intensive. These workloads are not compute intensive, nor are
they consistently network intensive; the network may be
heavily utilized to transfer storage data, but the workloads are
not otherwise network intensive. The hardware selection for a
storage-focused OpenStack architecture design must reflect
this emphasis on storage-intensive workloads.</para>
<para>For a storage-focused OpenStack architecture design, the
selection of storage hardware determines the overall
performance and scalability of the design. A
number of different factors must be considered in the design
process:</para>
<itemizedlist>
<listitem>
<para>Cost: The overall cost of the solution plays a
major role in which storage architecture, and the
resulting storage hardware, is selected.</para>
</listitem>
<listitem>
<para>Performance: The performance of the solution,
measured by observing the latency of storage I/O
requests, also plays a major role. Storage latency
can be a major consideration; in some
storage-intensive workloads, minimizing the delays
that the CPU experiences while fetching data from the
storage can have a significant impact on the overall
performance of the application.</para>
</listitem>
<listitem>
<para>Scalability: Refers to how well the
storage solution performs as it is expanded up to its
maximum size. A storage solution that performs well in
small configurations but degrades in performance as
it expands is not scalable; a solution that
continues to perform well at maximum expansion is
considered scalable.</para>
</listitem>
<listitem>
<para>Expandability: Refers to the overall
ability of the solution to grow. A storage solution
that expands to 50 PB is considered more expandable
than a solution that only scales to 10 PB. This
metric is related to, but distinct from,
scalability, which is a measure of the solution's
performance as it expands.</para>
</listitem>
</itemizedlist>
<para>Latency is one of the key considerations in a
storage-focused OpenStack cloud. Using solid-state disks
(SSDs) to minimize latency for instance storage and to reduce
CPU delays caused by waiting for the storage
increases performance. It is also recommended to evaluate the
gains from using RAID controller cards in compute hosts to
improve the performance of the underlying disk
subsystem.</para>
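<para>As a point of comparison when evaluating SSDs or RAID
controller options, a simple random-read latency test can be
run against a candidate device before committing to a hardware
choice. The following command is an illustrative sketch only;
the test file path is a placeholder and the exact flags depend
on the fio version in use:</para>
<screen><prompt>$</prompt> <userinput>fio --name=latency-test --filename=/var/lib/nova/instances/testfile \
  --size=1G --direct=1 --ioengine=libaio --rw=randread --bs=4k \
  --iodepth=1 --runtime=60 --time_based</userinput></screen>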
<para>The selection of storage architecture (and the corresponding
storage hardware, if there is an option) is determined by
evaluating possible solutions against the key factors above.
This will determine if a scale-out solution (such as Ceph,
GlusterFS, or similar) should be used or if a single, highly
expandable and scalable centralized storage array would be a
better choice. If a centralized storage array is the right fit
for the requirements then the hardware will be determined by
the array vendor. It is possible to build a storage array
using commodity hardware with open source software, but doing
so requires access to people with the expertise to build such a
system. On the other hand, a scale-out storage solution that
uses direct-attached storage (DAS) in the servers may be an
appropriate choice. If this is true, then the server hardware
needs to be configured to support the storage solution.</para>
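<para>For example, if a Ceph-based scale-out solution is
selected, the Block Storage service is pointed at the Ceph
cluster through its volume driver settings. The following
cinder.conf excerpt is a minimal sketch only; the pool name,
user, and secret UUID are placeholders that depend on the Ceph
deployment:</para>
<programlisting>volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
rbd_secret_uuid = SECRET_UUID</programlisting>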
<para>Some potential impacts that might affect a particular
storage architecture (and corresponding storage hardware) of a
storage-focused OpenStack cloud include:</para>
<itemizedlist>
<listitem>
<para>Connectivity: Based on the storage solution
selected, ensure the connectivity matches the storage
solution requirements. If a centralized storage array
is selected it is important to determine how the
hypervisors will connect to the storage array.
Connectivity can affect latency and thus performance.
It is recommended to check that the network
characteristics will minimize latency to boost the
overall performance of the design.</para>
</listitem>
<listitem>
<para>Latency: Determine if the use case will have
consistent or highly variable latency.</para>
</listitem>
<listitem>
<para>Throughput: Ensure that the storage solution
throughput is optimized based on application
requirements.</para>
</listitem>
<listitem>
<para>Server Hardware: Use of DAS impacts the server
hardware choice and affects host density, instance
density, power density, OS-hypervisor, and management
tools, to name a few.</para>
</listitem>
</itemizedlist>
<section xml:id="compute-server-hardware-selection">
<title>Compute (Server) Hardware Selection</title>
<para>Compute (server) hardware must be evaluated against four
opposing dimensions:</para>
<itemizedlist>
<listitem>
<para>Server density: A measure of how many servers can
fit into a given measure of physical space, such as a
rack unit [U].</para>
</listitem>
<listitem>
<para>Resource capacity: The number of CPU cores, how much
RAM, or how much storage a given server will
deliver.</para>
</listitem>
<listitem>
<para>Expandability: The number of additional resources
that can be added to a server before it has reached
its limit.</para>
</listitem>
<listitem>
<para>Cost: The relative cost of the hardware weighed against
the level of design effort needed to build the
system.</para>
</listitem>
</itemizedlist>
<para>The dimensions need to be weighed against each other to
determine the best design for the desired purpose. For
example, increasing server density can mean sacrificing
resource capacity or expandability. Increasing resource
capacity and expandability can increase cost but decrease
server density. Decreasing cost often means decreasing
supportability, server density, resource capacity, and
expandability.</para>
<para>For a storage-focused OpenStack architecture design, a
secondary design consideration for selecting server hardware
will be the compute capacity (CPU cores and RAM capacity). As
a result, the required server hardware must supply adequate
CPU sockets, additional CPU cores, and more RAM; network
connectivity and storage capacity are not as critical. The
hardware will need to provide enough network connectivity and
storage capacity to meet the user requirements; however, they
are not the primary consideration.</para>
<para>Since there is only a need for adequate CPU and RAM
capacity, some server hardware form factors will be better
suited to this storage-focused design than others:</para>
<itemizedlist>
<listitem>
<para>Most blade servers typically support dual-socket
multi-core CPUs; avoiding this limit means choosing
"full width" or "full height" blades, which means
losing server density. High-density blade servers
(for example, both HP BladeSystem and Dell PowerEdge
M1000e), which support up to 16 servers in only 10
rack units using "half height" or "half width" blades,
drop to half that density (only eight servers
in 10 U) if a "full width" or "full height" option is
used.</para>
</listitem>
<listitem>
<para>1U rack-mounted servers (servers that occupy only a
single rack unit) might be able to offer greater
server density than a blade server solution (40
servers in a rack, providing space for the top of rack
(ToR) switches, versus 32 "full width" or "full
height" blade servers in a rack), but often are
limited to dual-socket, multi-core CPU configurations.
Note that as of the Icehouse release, neither HP, IBM,
nor Dell offered 1U rack servers with more than 2 CPU
sockets. To obtain greater than dual-socket support in
a 1U rack-mount form factor, customers need to buy
their systems from Original Design Manufacturers
(ODMs) or second-tier manufacturers. This may cause
issues for organizations that have preferred vendor
policies or concerns with support and hardware
warranties of non-tier 1 vendors.</para>
</listitem>
<listitem>
<para>2U rack-mounted servers provide quad-socket,
multi-core CPU support but with a corresponding
decrease in server density (half the density offered
by 1U rack-mounted servers).</para>
</listitem>
<listitem>
<para>Larger rack-mounted servers, such as 4U servers,
often provide even greater CPU capacity, commonly
supporting four or even eight CPU sockets. These
servers have greater expandability, but they
have much lower server density and usually
greater hardware cost.</para>
</listitem>
<listitem>
<para>The so-called "sled servers" (rack-mounted servers
that support multiple independent servers in a single
2U or 3U enclosure) deliver increased density as
compared to typical 1U or 2U rack-mounted servers. For
example, many sled servers offer four independent
dual-socket nodes in 2U for a total of eight CPU sockets
in 2U. However, the dual-socket limitation on
individual nodes may not be sufficient to offset their
additional cost and configuration complexity.</para>
</listitem>
</itemizedlist>
<para>Other factors that will strongly influence server hardware
selection for a storage-focused OpenStack design
architecture:</para>
<itemizedlist>
<listitem>
<para>Instance density: In this architecture, instance
density and CPU-RAM oversubscription are lower (see the
nova.conf sketch after this list). More
hosts will be required to support the anticipated
scale, especially if the design uses dual-socket
hardware.</para>
<listitem>
<para>Host density: Another option to address the higher
host count is to use a quad-socket platform. Taking
this approach decreases host density, which in turn
increases rack count. This configuration affects the
number of power connections and also impacts network
and cooling requirements.</para>
</listitem>
<listitem>
<para>Power and cooling density: The power and cooling
density requirements might be lower than with blade,
sled, or 1U server designs due to lower host density
(by using 2U, 3U or even 4U server designs). For data
centers with older infrastructure, this might be a
desirable feature.</para>
</listitem>
</itemizedlist>
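<para>The lower CPU and RAM oversubscription mentioned above is
ultimately expressed through the Compute scheduler's allocation
ratios. The following nova.conf values are arbitrary
illustrations rather than recommendations, and would be tuned
to the actual workloads:</para>
<programlisting>cpu_allocation_ratio = 4.0
ram_allocation_ratio = 1.0</programlisting>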
<para>Storage-focused OpenStack architecture design server
hardware selection should focus on a "scale up" versus "scale
out" solution. The determination of which is the better
solution, a smaller number of larger hosts or a larger number of
smaller hosts, depends on a combination of factors
including cost, power, cooling, physical rack and floor space,
support-warranty, and manageability.</para></section>
<section xml:id="networking-hardware-selections">
<title>Networking Hardware Selection</title>
<para>Some of the key considerations that should be included in
the selection of networking hardware include:</para>
<itemizedlist>
<listitem>
<para>Port count: The user will require networking
hardware that has the requisite port count.</para>
</listitem>
<listitem>
<para>Port density: The network design will be affected by
the physical space that is required to provide the
requisite port count. A switch that can provide 48 10
GbE ports in 1U has a much higher port density than a
switch that provides 24 10 GbE ports in 2U. In
general, higher port density leaves more rack
space for compute or storage components, which is
preferred. It is also important to consider fault
domains and power density. Finally, higher-density
switches are more expensive; therefore, it is important
not to over-design the network.</para>
</listitem>
<listitem>
<para>Port speed: The networking hardware must support the
proposed network speed, for example: 1 GbE, 10 GbE, or
40 GbE (or even 100 GbE).</para>
</listitem>
<listitem>
<para>Redundancy: The level of network hardware redundancy
required is influenced by the user requirements for
high availability and cost considerations. Network
redundancy can be achieved by adding redundant power
supplies or paired switches. If this is a requirement,
the hardware will need to support this configuration.
User requirements will determine if a completely
redundant network infrastructure is required.</para>
</listitem>
<listitem>
<para>Power requirements: Make sure that the physical data
center provides the necessary power for the selected
network hardware. This is not typically an issue for
top of rack (ToR) switches, but may be an issue for
spine switches in a leaf and spine fabric, or end of
row (EoR) switches.</para>
</listitem>
<listitem>
<para>Protocol support: It is possible to gain even more
performance out of a single storage system by using
specialized network technologies such as RDMA, SRP,
iSER, and SCST. The specifics of using these
technologies are beyond the scope of this book.</para>
</listitem>
</itemizedlist></section>
<section xml:id="software-selection-arch-storage"><title>Software Selection</title>
<para>Selecting software to be included in a storage-focused
OpenStack architecture design includes three areas:</para>
<itemizedlist>
<listitem>
<para>Operating system (OS) and hypervisor</para>
</listitem>
<listitem>
<para>OpenStack components</para>
</listitem>
<listitem>
<para>Supplemental software</para>
</listitem>
</itemizedlist>
<para>Design decisions made in each of these areas impact the
rest of the OpenStack architecture design.</para></section>
<section xml:id="operating-system-and-hypervisor-arch-storage">
<title>Operating System and Hypervisor</title>
<para>The selection of OS and hypervisor has a significant impact
on the overall design and also affects server hardware
selection. Ensure that the storage hardware is supported by
the selected operating system and hypervisor combination and
that the networking hardware selection and topology will work
with the chosen operating system and hypervisor combination.
For example, if the design uses Link Aggregation Control
Protocol (LACP), the OS and hypervisor are both required to
support it.</para>
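<para>As an illustration, on a Debian or Ubuntu based host LACP
is typically configured as an 802.3ad bond. The following
/etc/network/interfaces excerpt is a sketch only; the interface
names are placeholders, and the attached switch ports must also
be configured for LACP:</para>
<programlisting>auto bond0
iface bond0 inet manual
    bond-mode 802.3ad
    bond-miimon 100
    bond-lacp-rate 1
    bond-slaves eth0 eth1</programlisting>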
<para>Some areas that could be impacted by the selection of OS and
hypervisor include:</para>
<itemizedlist>
<listitem>
<para>Cost: Selecting a commercially supported
hypervisor, such as Microsoft Hyper-V, results in
a different cost model than selecting a
community-supported open source hypervisor like
KVM or Xen. Similarly, choosing Ubuntu over Red
Hat (or vice versa) will have an impact on cost due to
support contracts. Conversely, business or application
requirements might dictate a specific or commercially
supported hypervisor.</para>
</listitem>
<listitem>
<para>Supportability: Whichever hypervisor is chosen, the
staff needs to have appropriate training and knowledge
to support the selected OS and hypervisor combination.
If they do not, training will need to be provided,
which could have a cost impact on the design. Another
aspect to consider is support for the
OS-hypervisor combination. The support of a commercial product
such as Red Hat, SUSE, or Windows is the
responsibility of the OS vendor. If an open source
platform is chosen, the support comes from in-house
resources. Either decision has a cost that will have
an impact on the design.</para>
</listitem>
<listitem>
<para>Management tools: The management tools used for
Ubuntu and KVM differ from the management tools
for VMware vSphere. Although both OS and hypervisor
combinations are supported by OpenStack, there will
naturally be very different impacts to the rest of the
design as a result of the selection of one combination
versus the other.</para>
</listitem>
<listitem>
<para>Scale and performance: Make sure that the selected OS
and hypervisor combination meet the appropriate scale
and performance requirements needed for this
storage-focused OpenStack cloud. The chosen architecture will
need to meet the targeted instance-host ratios with
the selected OS-hypervisor combination.</para>
</listitem>
<listitem>
<para>Security: Make sure that the design can accommodate
the regular periodic installation of application
security patches while maintaining the required
workloads. The frequency of security patches for the
proposed OS-hypervisor combination will have an impact
on performance and the patch installation process
could affect maintenance windows.</para>
</listitem>
<listitem>
<para>Supported features: Determine which features of
OpenStack are required. This will often determine the
selection of the OS-hypervisor combination. Certain
features are only available with specific OSs or
hypervisors. If required features are not available
with a particular combination, the design might need
to be modified to meet the user requirements.</para>
</listitem>
<listitem>
<para>Interoperability: Consideration should be given to
the ability of the selected OS-hypervisor combination
to interoperate or co-exist with other OS-hypervisor
combinations, or with other software solutions in the
overall design, if that is a requirement. Operational
and troubleshooting tools for one OS-hypervisor
combination may differ from the tools used for another
OS-hypervisor combination. As a result, the design will
need to address whether the two sets of tools need to
interoperate.
</para>
</listitem>
</itemizedlist></section>
<section xml:id="openstack-components-arch-storage"><title>OpenStack Components</title>
<para>The selection of OpenStack components has a significant
direct impact on the overall design. While there are certain
components that will always be present (Nova and Glance, for
example), there are other services that may not need to be
present. As an example, a certain design may not require
OpenStack Heat. Omitting Heat would not typically have a
significant impact on the overall design. However, if the
architecture uses a replacement for OpenStack Swift for its
storage component, this could potentially have significant
impacts on the rest of the design.</para>
<para>A storage-focused design might require the ability to use
Heat to launch instances with Cinder volumes to perform
storage-intensive processing.</para>
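<para>A minimal Orchestration template for that pattern might
resemble the following sketch; the image and flavor names are
placeholders that would be replaced with values from the target
cloud:</para>
<programlisting>heat_template_version: 2013-05-23
description: Launch an instance with an attached Cinder volume

resources:
  data_volume:
    type: OS::Cinder::Volume
    properties:
      size: 100

  worker:
    type: OS::Nova::Server
    properties:
      image: my-image      # placeholder
      flavor: m1.large     # placeholder

  attachment:
    type: OS::Cinder::VolumeAttachment
    properties:
      volume_id: { get_resource: data_volume }
      instance_uuid: { get_resource: worker }
      mountpoint: /dev/vdb</programlisting>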
<para>For a storage-focused OpenStack design architecture, the
following components would typically be used:</para>
<itemizedlist>
<listitem>
<para>Keystone</para>
</listitem>
<listitem>
<para>Horizon</para>
</listitem>
<listitem>
<para>Nova (including the use of multiple hypervisor
drivers)</para>
</listitem>
<listitem>
<para>Swift (or another object storage solution)</para>
</listitem>
<listitem>
<para>Cinder</para>
</listitem>
<listitem>
<para>Glance</para>
</listitem>
<listitem>
<para>Neutron or nova-network</para>
</listitem>
</itemizedlist>
<para>The exclusion of certain OpenStack components may limit or
constrain the functionality of other components. If a design
opts to include Heat but exclude Ceilometer, then the design
will not be able to take advantage of Heat's auto-scaling
functionality (which relies on information from Ceilometer).
Because Heat can be used to spin up a large
number of instances to perform storage-intensive
processing, including Heat in a storage-focused architecture
design is strongly recommended.</para></section>
<section xml:id="supplemental-software-arch-storage"><title>Supplemental Software</title>
<para>While OpenStack is a fairly complete collection of software
projects for building a platform for cloud services, there are
additional pieces of software that might need to be added to
any given OpenStack design.</para></section>
<section xml:id="networking-software-arch-storage"><title>Networking Software</title>
<para>OpenStack Neutron provides a wide variety of networking
services for instances. There are many additional networking
software packages that may be useful to manage the OpenStack
components themselves. Some examples include HAProxy,
keepalived, and various routing daemons (like Quagga). Some of
these software packages, HAProxy in particular, are described
in more detail in the OpenStack High Availability Guide. For a
storage-focused OpenStack cloud, it is reasonably likely that
the OpenStack infrastructure components will need to be highly
available, and therefore networking software packages like
HAProxy will need to be included.</para>
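<para>As a sketch of how such a package might be used, the
following HAProxy excerpt balances the Identity service public
API across two controller nodes; the addresses are
placeholders:</para>
<programlisting>listen keystone_public
    bind 192.0.2.10:5000
    balance roundrobin
    server controller1 192.0.2.11:5000 check
    server controller2 192.0.2.12:5000 check</programlisting></section>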
<section xml:id="management-software-arch-storage"><title>Management Software</title>
<para>This includes software for providing clustering, logging,
monitoring, and alerting. The factors for determining which
software packages in this category should be selected are
outside the scope of this design guide. This design guide
focuses specifically on how the selected supplemental software
solution impacts or affects the overall OpenStack cloud
design.</para>
<para>Clustering software, such as Corosync or Pacemaker, is
determined primarily by the availability design requirements.
Therefore, the impact of including (or not including) these
software packages is determined by the availability of the
cloud infrastructure and the complexity of supporting the
configuration after it is deployed. The OpenStack High
Availability Guide provides more details on the installation
and configuration of Corosync and Pacemaker, should these
packages need to be included in the design.</para>
<para>Requirements for logging, monitoring, and alerting are
determined by operational considerations. Each of these
sub-categories includes a number of options. For
example, in the logging sub-category one might consider
Logstash, Splunk, Log Insight, or some other log
aggregation-consolidation tool. Logs should be stored in a
centralized location to make it easier to perform analytics
against the data. Log data analytics engines can also provide
automation and issue notification by providing a mechanism to
both alert on and automatically attempt to remediate some of
the more commonly known issues.</para>
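<para>For example, forwarding logs from each node to a central
collector can be as simple as a single rsyslog rule (the double
"@" selects TCP); the host name below is a placeholder:</para>
<programlisting>*.* @@logserver.example.com:514</programlisting>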
<para>If any of these software packages are needed, then the
design must account for the additional resource consumption
(CPU, RAM, storage, and network bandwidth for a log
aggregation solution, for example). Some other potential
design impacts include:</para>
<itemizedlist>
<listitem>
<para>OS-hypervisor combination: Ensure that the
selected logging, monitoring, or alerting tools
support the proposed OS-hypervisor combination.</para>
</listitem>
<listitem>
<para>Network hardware: The network hardware selection
needs to be supported by the logging, monitoring, and
alerting software.</para>
</listitem>
</itemizedlist></section>
<section xml:id="database-software-arch-storage"><title>Database Software</title>
<para>Virtually all of the OpenStack components require access to
back-end database services to store state and configuration
information. Choose an appropriate back-end database which
will satisfy the availability and fault tolerance requirements
of the OpenStack services.</para>
<para>MySQL is generally considered to be the de facto database
for OpenStack, but other compatible databases are also
known to work. Note, however, that Ceilometer uses
MongoDB.</para>
<para>The solution selected to provide high availability for the
database will change based on the selected database. If MySQL
is selected, then a number of options are available. For
active-active clustering, a replication technology such as
Galera can be used. For active-passive clustering, some form of
shared storage must be used. Each of these potential solutions has an
impact on the design:</para>
<itemizedlist>
<listitem>
<para>Solutions that employ Galera/MariaDB will require at
least three MySQL nodes.</para>
</listitem>
<listitem>
<para>MongoDB has its own design considerations
with regard to making the database highly
available.</para>
</listitem>
<listitem>
<para>OpenStack design, generally, does not include shared
storage but for a high availability design some
components might require it depending on the specific
implementation.</para>
</listitem>
</itemizedlist>
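<para>As an illustration of the three-node requirement, a
minimal Galera configuration applied on each MySQL node might
resemble the following sketch; the provider library path,
cluster name, and node addresses are placeholders that depend
on the distribution and deployment:</para>
<programlisting>wsrep_provider = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_name = "openstack_db"
wsrep_cluster_address = "gcomm://192.0.2.21,192.0.2.22,192.0.2.23"
wsrep_sst_method = rsync
binlog_format = ROW
default_storage_engine = InnoDB</programlisting></section></section>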