<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter [
<!ENTITY % openstack SYSTEM "openstack.ent">
%openstack;
]>
<chapter version="5.0" xml:id="scaling" xmlns="http://docbook.org/ns/docbook"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:ns5="http://www.w3.org/2000/svg"
xmlns:ns4="http://www.w3.org/1998/Math/MathML"
xmlns:ns3="http://www.w3.org/1999/xhtml"
xmlns:ns="http://docbook.org/ns/docbook">
<?dbhtml stop-chunking?>
<title>Scaling</title>
  <para>Whereas traditional applications required larger hardware to scale
  ("vertical scaling"), cloud-based applications typically request more
  discrete hardware ("horizontal scaling"). If your cloud is successful,
eventually you must add resources to meet the increasing demand.<indexterm
class="singular">
<primary>scaling</primary>
<secondary>vertical vs. horizontal</secondary>
</indexterm></para>
<para>To suit the cloud paradigm, OpenStack itself is designed to be
horizontally scalable. Rather than switching to larger servers, you procure
more servers and simply install identically configured services. Ideally,
you scale out and load balance among groups of functionally identical
  services (for example, compute nodes or <literal>nova-api</literal> nodes)
  that communicate on a message bus.</para>
<section xml:id="starting">
<title>The Starting Point</title>
<para>Determining the scalability of your cloud and how to improve it is
an exercise with many variables to balance. No one solution meets
everyone's scalability goals. However, it is helpful to track a number of
metrics. Since you can define virtual hardware templates, called "flavors"
in OpenStack, you can start to make scaling decisions based on the flavors
    you'll provide. For starters, these templates define the amount of RAM,
    the root disk size, the amount of ephemeral disk space available, and
    the number of virtual cores.<indexterm class="singular">
<primary>virtual machine (VM)</primary>
</indexterm><indexterm class="singular">
<primary>hardware</primary>
<secondary>virtual hardware</secondary>
</indexterm><indexterm class="singular">
<primary>flavor</primary>
</indexterm><indexterm class="singular">
<primary>scaling</primary>
<secondary>metrics for</secondary>
</indexterm></para>
<para>The default OpenStack flavors are shown in <xref
linkend="os-flavors-table" />.</para>
<?hard-pagebreak ?>
<table rules="all" xml:id="os-flavors-table">
<caption>OpenStack default flavors</caption>
<col width="20%" />
<col width="20%" />
<col width="20%" />
<col width="20%" />
<col width="20%" />
<thead>
<tr>
<th align="left">Name</th>
<th align="right">Virtual cores</th>
<th align="right">Memory</th>
<th align="right">Disk</th>
<th align="right">Ephemeral</th>
</tr>
</thead>
<tbody>
<tr>
<td><para>m1.tiny</para></td>
<td align="right"><para>1</para></td>
<td align="right"><para>512 MB</para></td>
<td align="right"><para>1 GB</para></td>
<td align="right"><para>0 GB</para></td>
</tr>
<tr>
<td><para>m1.small</para></td>
<td align="right"><para>1</para></td>
<td align="right"><para>2 GB</para></td>
<td align="right"><para>10 GB</para></td>
<td align="right"><para>20 GB</para></td>
</tr>
<tr>
<td><para>m1.medium</para></td>
<td align="right"><para>2</para></td>
<td align="right"><para>4 GB</para></td>
<td align="right"><para>10 GB</para></td>
<td align="right"><para>40 GB</para></td>
</tr>
<tr>
<td><para>m1.large</para></td>
<td align="right"><para>4</para></td>
<td align="right"><para>8 GB</para></td>
<td align="right"><para>10 GB</para></td>
<td align="right"><para>80 GB</para></td>
</tr>
<tr>
<td><para>m1.xlarge</para></td>
<td align="right"><para>8</para></td>
<td align="right"><para>16 GB</para></td>
<td align="right"><para>10 GB</para></td>
<td align="right"><para>160 GB</para></td>
</tr>
</tbody>
</table>
    <para>The starting point for most operators is the core count of your
    cloud. By applying some ratios, you can gather information about: <itemizedlist>
<listitem>
<para>The number of virtual machines (VMs) you expect to run,
<code>((overcommit fraction &times;&#160;cores) / virtual cores per
instance)</code></para>
</listitem>
<listitem>
          <para>How much storage is required, <code>(flavor disk size
          &times;&#160;number of instances)</code></para>
</listitem>
</itemizedlist> You can use these ratios to determine how much
additional infrastructure you need to support your cloud.</para>
    <para>Here is an example that uses these ratios to gather scalability
    information for the number of VMs expected as well as the storage needed.
    The following numbers support (16 &times; 200) / 2 = 1600 VM instances and
    require 80 TB of storage for <code>/var/lib/nova/instances</code>:</para>
<itemizedlist>
<listitem>
<para>200 physical cores.</para>
</listitem>
<listitem>
        <para>Most instances are size m1.medium (two virtual cores, 50 GB of
        storage: 10 GB root disk plus 40 GB ephemeral).</para>
</listitem>
<listitem>
<para>Default CPU overcommit ratio (<code>cpu_allocation_ratio</code>
in nova.conf) of 16:1.</para>
</listitem>
</itemizedlist>
<note>
      <para>Regardless of the overcommit ratio, an instance cannot be placed
      on any physical node with fewer raw (pre-overcommit) resources than the
      instance flavor requires.</para>
</note>
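    <para>As a minimal sketch of this arithmetic only (the variable names are
    made up for illustration and are not part of any OpenStack API), the
    following Python snippet reproduces the numbers above:</para>
    <programlisting language="python"><![CDATA[
# Capacity estimate from the ratios above -- a sketch, not an OpenStack API.
overcommit_ratio = 16       # cpu_allocation_ratio in nova.conf
physical_cores = 200
vcpus_per_instance = 2      # m1.medium
disk_per_instance_gb = 50   # 10 GB root + 40 GB ephemeral for m1.medium

# (overcommit fraction x cores) / virtual cores per instance
max_instances = (overcommit_ratio * physical_cores) // vcpus_per_instance
# flavor disk size x number of instances
storage_gb = max_instances * disk_per_instance_gb

print(max_instances)            # 1600
print(storage_gb / 1000, "TB")  # 80.0 TB
]]></programlisting>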
<para>However, you need more than the core count alone to estimate the
load that the API services, database servers, and queue servers are likely
to encounter. You must also consider the usage patterns of your
cloud.</para>
<para>As a specific example, compare a cloud that supports a managed
web-hosting platform with one running integration tests for a development
project that creates one VM per code commit. In the former, the heavy work
of creating a VM happens only every few months, whereas the latter puts
    constant heavy load on the cloud controller. You must consider your
    average VM lifetime, as a longer lifetime generally means less load on
    the cloud controller.<indexterm class="singular">
<primary>cloud controllers</primary>
<secondary>scalability and</secondary>
</indexterm></para>
<para>Aside from the creation and termination of VMs, you must consider
the impact of users accessing the service—particularly on
<literal>nova-api</literal> and its associated database. Listing instances
garners a great deal of information and, given the frequency with which
users run this operation, a cloud with a large number of users can
increase the load significantly. This can occur even without their
knowledge—leaving the OpenStack dashboard instances tab open in the
browser refreshes the list of VMs every 30 seconds.</para>
<para>After you consider these factors, you can determine how many cloud
    controller cores you require. A typical server with eight cores and 8 GB
    of RAM is sufficient for up to a rack of compute nodes, given the above
    caveats.</para>
<para>You must also consider key hardware specifications for the
performance of user VMs, as well as budget and performance needs,
including storage performance (spindles/core), memory availability
(RAM/core), network bandwidth<indexterm class="singular">
<primary>bandwidth</primary>
<secondary>hardware specifications and</secondary>
</indexterm> (Gbps/core), and overall CPU performance (CPU/core).</para>
<tip>
<para>For a discussion of metric tracking, including how to extract
metrics from your cloud, see <xref
linkend="logging_monitoring" />.</para>
</tip>
</section>
<section xml:id="add_controller_nodes">
<title>Adding Cloud Controller Nodes</title>
<para>You can facilitate the horizontal expansion of your cloud by adding
nodes. Adding compute nodes is straightforward—they are easily picked up
by the existing installation. However, you must consider some important
points when you design your cluster to be highly available.<indexterm
class="singular">
<primary>compute nodes</primary>
<secondary>adding</secondary>
</indexterm><indexterm class="singular">
<primary>high availability</primary>
</indexterm><indexterm class="singular">
<primary>configuration options</primary>
<secondary>high availability</secondary>
</indexterm><indexterm class="singular">
<primary>cloud controller nodes</primary>
<secondary>adding</secondary>
</indexterm><indexterm class="singular">
<primary>scaling</primary>
<secondary>adding cloud controller nodes</secondary>
</indexterm></para>
<para>Recall that a cloud controller node runs several different services.
You can install services that communicate only using the message queue
internally—<code>nova-scheduler</code> and <code>nova-console</code>—on a
new server for expansion. However, other integral parts require more
care.</para>
<para>You should load balance user-facing services such as dashboard,
<code>nova-api</code>, or the Object Storage proxy. Use any standard HTTP
load-balancing method (DNS round robin, hardware load balancer, or
software such as Pound or HAProxy). One caveat with dashboard is the VNC
proxy, which uses the WebSocket protocol—something that an L7 load
balancer might struggle with. See also <link
xlink:href="http://docs.openstack.org/developer/horizon/topics/deployment.html#session-storage"
xlink:title="Horizon session storage">Horizon session
storage</link>.</para>
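    <para>As an illustration only, a minimal HAProxy configuration balancing
    <code>nova-api</code> across two hypothetical controllers might look like
    the following (the host addresses are placeholders):</para>
    <programlisting><![CDATA[
# Sketch: load balance nova-api (compute API, port 8774) across two nodes
listen nova-api
    bind 0.0.0.0:8774
    balance roundrobin
    option tcplog
    server controller1 192.168.1.11:8774 check
    server controller2 192.168.1.12:8774 check
]]></programlisting>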
  <para>You can configure some services, such as <code>nova-api</code> and
  <code>glance-api</code>, to use multiple processes by changing a flag in
  their configuration file—allowing them to share work between multiple
  cores on one machine.</para>
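  <para>For example, worker counts are controlled by options such as the
  following (the option names match typical nova and glance configurations;
  the values are illustrative):</para>
  <programlisting><![CDATA[
# nova.conf -- run eight nova-api (compute API) worker processes
[DEFAULT]
osapi_compute_workers = 8

# glance-api.conf -- run four glance-api worker processes
[DEFAULT]
workers = 4
]]></programlisting>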
<tip>
<para>Several options are available for MySQL load balancing, and the
supported AMQP brokers have built-in clustering support. Information on
how to configure these and many of the other services can be found in
<xref linkend="operations" xrefstyle="part-num-title" />.<indexterm
class="singular">
<primary>Advanced Message Queuing Protocol (AMQP)</primary>
</indexterm></para>
</tip>
</section>
<section xml:id="segregate_cloud">
<title>Segregating Your Cloud</title>
  <para>You segregate your cloud when you want to offer users different
  regions, whether to satisfy legal requirements for data storage, to gain
  redundancy across earthquake fault lines, or to provide low-latency API
  calls. Use one of the following OpenStack methods to segregate your cloud:
<emphasis>cells</emphasis>, <emphasis>regions</emphasis>,
<emphasis>availability zones</emphasis>, or <emphasis>host
aggregates</emphasis>.<indexterm class="singular">
<primary>segregation methods</primary>
</indexterm><indexterm class="singular">
<primary>scaling</primary>
<secondary>cloud segregation</secondary>
</indexterm></para>
  <para>Each method provides different functionality; the methods are best
  divided into two groups:</para>
<itemizedlist>
<listitem>
<para>Cells and regions, which segregate an entire cloud and result in
running separate Compute deployments.</para>
</listitem>
<listitem>
<para><glossterm baseform="availability zone">Availability
zones</glossterm> and host aggregates, which merely divide a single
Compute deployment.</para>
</listitem>
</itemizedlist>
<para><xref linkend="segragation_methods" /> provides a comparison view of
each segregation method currently provided by OpenStack Compute.<indexterm
class="singular">
<primary>endpoints</primary>
<secondary>API endpoint</secondary>
</indexterm></para>
<table rules="all" xml:id="segragation_methods">
<caption>OpenStack segregation methods</caption>
<thead>
<tr>
<th></th>
<th>Cells</th>
<th>Regions</th>
<th>Availability zones</th>
<th>Host aggregates</th>
</tr>
</thead>
<tbody>
<tr>
<td><para><emphasis role="bold">Use when you need</emphasis>
</para></td>
<td><para>A single <glossterm>API endpoint</glossterm> for compute,
or you require a second level of scheduling.</para></td>
<td><para>Discrete regions with separate API endpoints and no
coordination between regions.</para></td>
<td><para>Logical separation within your nova deployment for
physical isolation or redundancy.</para></td>
<td><para>To schedule a group of hosts with common
features.</para></td>
</tr>
<tr>
<td><para><emphasis role="bold">Example</emphasis> </para></td>
<td><para>A cloud with multiple sites where you can schedule VMs
"anywhere" or on a particular site.</para></td>
<td><para>A cloud with multiple sites, where you schedule VMs to a
particular site and you want a shared infrastructure.</para></td>
<td><para>A single-site cloud with equipment fed by separate power
supplies.</para></td>
<td><para>Scheduling to hosts with trusted hardware
support.</para></td>
</tr>
<tr>
<td><para><emphasis role="bold">Overhead</emphasis> </para></td>
<td><para>Considered experimental.</para><para>A new service,
nova-cells.</para><para>Each cell has a full nova installation
except nova-api.</para></td>
<td><para>A different API endpoint for every
region.</para><para>Each region has a full nova installation.
</para></td>
<td><para>Configuration changes to <filename>nova.conf</filename>.</para></td>
<td><para>Configuration changes to <filename>nova.conf</filename>.</para></td>
</tr>
<tr>
<td><para><emphasis role="bold">Shared services</emphasis>
</para></td>
<td><para>Keystone</para><para><code>nova-api</code> </para></td>
<td><para>Keystone</para></td>
<td><para>Keystone</para><para>All nova services</para></td>
<td><para>Keystone</para><para>All nova services</para></td>
</tr>
</tbody>
</table>
<section xml:id="cells_regions">
<title>Cells and Regions</title>
    <para>OpenStack Compute cells are designed to allow running the cloud in
    a distributed fashion without having to use more complicated
    technologies or being invasive to existing nova installations. Hosts in a
cloud are partitioned into groups called <emphasis>cells</emphasis>.
Cells are configured in a tree. The top-level cell ("API cell") has a
host that runs the <code>nova-api</code> service, but no
<code>nova-compute</code> services. Each child cell runs all of the
other typical <code>nova-*</code> services found in a regular
installation, except for the <code>nova-api</code> service. Each cell
has its own message queue and database service and also runs
<code>nova-cells</code>, which manages the communication between the API
cell and child cells.<indexterm class="singular">
<primary>scaling</primary>
<secondary>cells and regions</secondary>
</indexterm><indexterm class="singular">
<primary>cells</primary>
<secondary>cloud segregation</secondary>
</indexterm><indexterm class="singular">
<primary>region</primary>
</indexterm></para>
    <para>This allows a single API server to be used to control access
    to multiple cloud installations. Introducing a second level of
scheduling (the cell selection), in addition to the regular
<code>nova-scheduler</code> selection of hosts, provides greater
flexibility to control where virtual machines are run.</para>
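    <para>As a sketch, the relevant <filename>nova.conf</filename> options
    for enabling cells look like the following (the cell names are
    illustrative, and options may vary by release):</para>
    <programlisting><![CDATA[
# nova.conf on the API cell
[cells]
enable = True
cell_type = api
name = api

# nova.conf on a child (compute) cell
[cells]
enable = True
cell_type = compute
name = cell1
]]></programlisting>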
    <para>Unlike having a single API endpoint, regions have a separate API
    endpoint per installation, allowing for a more discrete separation.
    Users wanting to run instances across sites have to explicitly select a
    region. However, the additional complexity of running a new service is
    not required.</para>
<para>The OpenStack dashboard (horizon) can be configured to use multiple
regions. This can be configured through the <option>AVAILABLE_REGIONS</option> parameter.</para>
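    <para>In the dashboard's <filename>local_settings.py</filename>, this
    parameter takes a list of (Keystone endpoint, display name) pairs; the
    endpoints below are placeholders:</para>
    <programlisting language="python"><![CDATA[
# local_settings.py -- offer a region selector in the dashboard
AVAILABLE_REGIONS = [
    ('http://region-1.example.com:5000/v2.0', 'RegionOne'),
    ('http://region-2.example.com:5000/v2.0', 'RegionTwo'),
]
]]></programlisting>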
</section>
<section xml:id="availability_zones">
<title>Availability Zones and Host Aggregates</title>
<para>You can use availability zones, host aggregates, or both to
partition a nova deployment.<indexterm class="singular">
<primary>scaling</primary>
<secondary>availability zones</secondary>
</indexterm></para>
    <para>Availability zones are implemented through, and configured in a
    similar way to, host aggregates. However, you use them for different
    reasons.</para>
<section xml:id="az_s3">
<title>Availability zone</title>
<para>This enables you to arrange OpenStack compute hosts into logical
groups and provides a form of physical isolation and redundancy from
other availability zones, such as by using a separate power supply or
network equipment.<indexterm class="singular">
<primary>availability zone</primary>
</indexterm></para>
<para>You define the availability zone in which a specified compute
host resides locally on each server. An availability zone is commonly
used to identify a set of servers that have a common attribute. For
instance, if some of the racks in your data center are on a separate
power source, you can put servers in those racks in their own
availability zone. Availability zones can also help separate different
classes of hardware.</para>
<para>When users provision resources, they can specify from which
availability zone they want their instance to be built. This allows
cloud consumers to ensure that their application resources are spread
across disparate machines to achieve high availability in the event of
hardware failure.</para>
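      <para>For example (the aggregate and zone names here are made up), an
      operator can create an availability zone through a host aggregate, and
      a user can then target it at boot time:</para>
      <screen><prompt>$</prompt> <userinput>nova aggregate-create rack1-aggregate az-rack1</userinput>
<prompt>$</prompt> <userinput>nova aggregate-add-host rack1-aggregate compute-01</userinput>
<prompt>$</prompt> <userinput>nova boot --flavor m1.medium --image my-image \
  --availability-zone az-rack1 my-instance</userinput></screen>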
</section>
<section xml:id="ha_s3">
      <title>Host aggregate</title>
<para>This enables you to partition OpenStack Compute deployments into
logical groups for load balancing and instance distribution. You can
use host aggregates to further partition an availability zone. For
example, you might use host aggregates to partition an availability
zone into groups of hosts that either share common resources, such as
storage and network, or have a special property, such as trusted
computing hardware.<indexterm class="singular">
<primary>scaling</primary>
<secondary>host aggregate</secondary>
</indexterm><indexterm class="singular">
<primary>host aggregate</primary>
</indexterm></para>
<para>A common use of host aggregates is to provide information for
use with the <literal>nova-scheduler</literal>. For example, you might
use a host aggregate to group a set of hosts that share specific
flavors or images.</para>
      <para>The general case for this is setting key-value pairs in the
      aggregate metadata and matching key-value pairs in a flavor's
      <parameter>extra_specs</parameter> metadata. The
      <parameter>AggregateInstanceExtraSpecsFilter</parameter> in
      the filter scheduler will enforce that instances be scheduled only on
      hosts in aggregates that define the same key with the same value.</para>
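      <para>A sketch of this pairing with the nova CLI follows; the
      aggregate, key, and flavor names are examples only (depending on the
      release, the flavor key may need the
      <literal>aggregate_instance_extra_specs:</literal> scope
      prefix):</para>
      <screen><prompt>$</prompt> <userinput>nova aggregate-create fast-storage-hosts</userinput>
<prompt>$</prompt> <userinput>nova aggregate-set-metadata fast-storage-hosts ssd=true</userinput>
<prompt>$</prompt> <userinput>nova flavor-key m1.medium.ssd set ssd=true</userinput></screen>
      <para>With <parameter>AggregateInstanceExtraSpecsFilter</parameter>
      enabled, instances of the <literal>m1.medium.ssd</literal> flavor are
      scheduled only onto hosts in the <literal>fast-storage-hosts</literal>
      aggregate.</para>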
<para>An advanced use of this general concept allows different
flavor types to run with different CPU and RAM allocation ratios so
that high-intensity computing loads and low-intensity development and
testing systems can share the same cloud without either starving the
high-use systems or wasting resources on low-utilization systems. This
works by setting <parameter>metadata</parameter> in your host
aggregates and matching <parameter>extra_specs</parameter> in your
flavor types.</para>
<para>The first step is setting the aggregate metadata keys
<parameter>cpu_allocation_ratio</parameter> and
<parameter>ram_allocation_ratio</parameter> to a floating-point
      value. The scheduler filters
      <parameter>AggregateCoreFilter</parameter> and
      <parameter>AggregateRamFilter</parameter> will use those values rather
      than the global defaults in <filename>nova.conf</filename> when
      scheduling to hosts in the aggregate. It is important to be cautious
      when using this feature, since each host can be in multiple aggregates
      but should have only one allocation ratio for each resource. It is up
to you to avoid putting a host in multiple aggregates that define
different values for the same <phrase
role="keep-together">resource</phrase>.</para>
<para>This is the first half of the equation. To get flavor types
that are guaranteed a particular ratio, you must set the
<parameter>extra_specs</parameter> in the flavor type to the
      key-value pair you want to match in the aggregate. For example, if you
      define the <parameter>extra_specs</parameter> key
      <parameter>cpu_allocation_ratio</parameter> as "1.0", then instances
      of that type will run in aggregates only where the metadata key
      <parameter>cpu_allocation_ratio</parameter> is also defined as "1.0".
      In practice, it is better to define an additional key-value pair in
      the aggregate metadata to match on rather than match directly on
      <parameter>cpu_allocation_ratio</parameter> or
      <parameter>ram_allocation_ratio</parameter>. This allows better
abstraction. For example, by defining a key
<parameter>overcommit</parameter> and setting a value of "high,"
"medium," or "low," you could then tune the numeric allocation ratios
in the aggregates without also needing to change all flavor types
relating to them.</para>
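      <para>A sketch of the <parameter>overcommit</parameter> approach with
      the nova CLI (the aggregate, key, value, and flavor names are all
      illustrative):</para>
      <screen><prompt>$</prompt> <userinput>nova aggregate-set-metadata low-overcommit-hosts \
  overcommit=low cpu_allocation_ratio=1.0</userinput>
<prompt>$</prompt> <userinput>nova flavor-key m1.large.dedicated set overcommit=low</userinput></screen>
      <para>Tuning <parameter>cpu_allocation_ratio</parameter> in the
      aggregate later requires no changes to the flavors that reference
      <literal>overcommit=low</literal>.</para>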
<note>
<para>Previously, all services had an availability zone. Currently,
only the <literal>nova-compute</literal> service has its own
availability zone. Services such as
<literal>nova-scheduler</literal>, <literal>nova-network</literal>,
and <literal>nova-conductor</literal> have always spanned all
availability zones.</para>
        <para>When you run any of the following operations, the services
        appear in their own internal availability zone
        (CONF.internal_service_availability_zone): <itemizedlist>
            <listitem>
              <para><literal>nova host-list</literal> (os-hosts)</para>
            </listitem>
            <listitem>
              <para><literal>euca-describe-availability-zones verbose</literal></para>
            </listitem>
          </itemizedlist>The internal availability zone is hidden in
        <literal>euca-describe-availability-zones</literal> (nonverbose).</para>
<para>CONF.node_availability_zone has been renamed to
CONF.default_availability_zone and is used only by the
<literal>nova-api</literal> and <literal>nova-scheduler</literal>
services.</para>
<para>CONF.node_availability_zone still works but is
deprecated.</para>
</note>
</section>
</section>
</section>
<section xml:id="scalable_hardware">
<title>Scalable Hardware</title>
<para>While several resources already exist to help with deploying and
installing OpenStack, it's very important to make sure that you have your
deployment planned out ahead of time. This guide presumes that you have at
least set aside a rack for the OpenStack cloud but also offers suggestions
for when and what to scale.</para>
<section xml:id="hardware_procure">
<title>Hardware Procurement</title>
<para>“The Cloud” has been described as a volatile environment where
servers can be created and terminated at will. While this may be true,
it does not mean that your servers must be volatile. Ensuring that your
cloud's hardware is stable and configured correctly means that your
cloud environment remains up and running. Basically, put effort into
creating a stable hardware environment so that you can host a cloud that
users may treat as unstable and volatile.<indexterm class="singular">
<primary>servers</primary>
<secondary>avoiding volatility in</secondary>
</indexterm><indexterm class="singular">
<primary>hardware</primary>
<secondary>scalability planning</secondary>
</indexterm><indexterm class="singular">
<primary>scaling</primary>
<secondary>hardware procurement</secondary>
</indexterm></para>
<para>OpenStack can be deployed on any hardware supported by an
OpenStack-compatible Linux distribution.</para>
<para>Hardware does not have to be consistent, but it should at least
have the same type of CPU to support instance migration.</para>
<para>The typical hardware recommended for use with OpenStack is the
standard value-for-money offerings that most hardware vendors stock. It
should be straightforward to divide your procurement into building
blocks such as "compute," "object storage," and "cloud controller," and
      request as many of these as you need. Alternatively, if you are unable
      to spend more, existing servers are quite likely to be able to support
      OpenStack, provided they meet your performance requirements and support
      your virtualization technology.</para>
</section>
<section xml:id="capacity_planning">
<title>Capacity Planning</title>
<para>OpenStack is designed to increase in size in a straightforward
manner. Taking into account the considerations that we've mentioned in
this chapter—particularly on the sizing of the cloud controller—it
should be possible to procure additional compute or object storage nodes
as needed. New nodes do not need to be the same specification, or even
vendor, as existing nodes.<indexterm class="singular">
<primary>capability</primary>
<secondary>scaling and</secondary>
</indexterm><indexterm class="singular">
<primary>weight</primary>
</indexterm><indexterm class="singular">
<primary>capacity planning</primary>
</indexterm><indexterm class="singular">
<primary>scaling</primary>
<secondary>capacity planning</secondary>
</indexterm></para>
      <para>For compute nodes, <code>nova-scheduler</code> will take care of
      differences in sizing with respect to core count and RAM amounts;
      however, you should consider that the user experience changes with
differing CPU speeds. When adding object storage nodes, a
<glossterm>weight</glossterm> should be specified that reflects the
<glossterm>capability</glossterm> of the node.</para>
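      <para>For Object Storage, the weight is set when a device is added to
      the ring. In this hypothetical example, a new node's disk is given a
      weight of 200, twice that of an existing 100-weight disk, because it
      has twice the capacity (the IP, port, and device name are
      placeholders):</para>
      <screen><prompt>$</prompt> <userinput>swift-ring-builder object.builder add r1z1-10.0.0.51:6000/sdb1 200</userinput></screen>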
<para>Monitoring the resource usage and user growth will enable you to
know when to procure. <xref linkend="logging_monitoring" /> details some
useful metrics.</para>
</section>
<section xml:id="burin_testing">
<title>Burn-in Testing</title>
      <para>The chances of failure for a server's hardware are high at the
      start and at the end of its life. As a result, much of the effort of
      dealing with hardware failures while in production can be avoided by
      appropriate burn-in testing that attempts to trigger early-stage
      failures. The general
principle is to stress the hardware to its limits. Examples of burn-in
tests include running a CPU or disk benchmark for several
days.<indexterm class="singular">
<primary>testing</primary>
<secondary>burn-in testing</secondary>
</indexterm><indexterm class="singular">
<primary>troubleshooting</primary>
<secondary>burn-in testing</secondary>
</indexterm><indexterm class="singular">
<primary>burn-in testing</primary>
</indexterm><indexterm class="singular">
<primary>scaling</primary>
<secondary>burn-in testing</secondary>
</indexterm></para>
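      <para>As an example of the principle, the standard
      <literal>stress</literal> and <literal>badblocks</literal> utilities
      can exercise CPU, memory, and disk for extended periods (the durations
      and device name are illustrative; <literal>badblocks -w</literal> is
      destructive, so run it only on disks with no data):</para>
      <screen><prompt>$</prompt> <userinput>stress --cpu 8 --vm 4 --vm-bytes 1G --timeout 172800</userinput>
<prompt>$</prompt> <userinput>badblocks -wsv /dev/sdb</userinput></screen>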
</section>
</section>
</chapter>