<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter [
<!ENTITY % openstack SYSTEM "openstack.ent">
%openstack;
]>
<chapter version="5.0" xml:id="scaling" xmlns="http://docbook.org/ns/docbook"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:ns5="http://www.w3.org/2000/svg"
xmlns:ns4="http://www.w3.org/1998/Math/MathML"
xmlns:ns3="http://www.w3.org/1999/xhtml"
xmlns:ns="http://docbook.org/ns/docbook">
<?dbhtml stop-chunking?>
<title>Scaling</title>
  <para>Whereas traditional applications required larger hardware to scale
  ("vertical scaling"), cloud-based applications typically request more
  discrete hardware ("horizontal scaling"). If your cloud is successful,
eventually you must add resources to meet the increasing demand.<indexterm
class="singular">
<primary>scaling</primary>
<secondary>vertical vs. horizontal</secondary>
</indexterm></para>
<para>To suit the cloud paradigm, OpenStack itself is designed to be
horizontally scalable. Rather than switching to larger servers, you procure
more servers and simply install identically configured services. Ideally,
you scale out and load balance among groups of functionally identical
  services (for example, compute nodes or <literal>nova-api</literal> nodes)
  that communicate on a message bus.</para>
<section xml:id="starting">
<title>The Starting Point</title>
<para>Determining the scalability of your cloud and how to improve it is
an exercise with many variables to balance. No one solution meets
everyone's scalability goals. However, it is helpful to track a number of
metrics. Since you can define virtual hardware templates, called "flavors"
in OpenStack, you can start to make scaling decisions based on the flavors
    you'll provide. For starters, these templates define the amount of RAM,
    the root disk size, the amount of ephemeral disk space available, and
    the number of virtual cores.<indexterm class="singular">
<primary>virtual machine (VM)</primary>
</indexterm><indexterm class="singular">
<primary>hardware</primary>
<secondary>virtual hardware</secondary>
</indexterm><indexterm class="singular">
<primary>flavor</primary>
</indexterm><indexterm class="singular">
<primary>scaling</primary>
<secondary>metrics for</secondary>
</indexterm></para>
<para>The default OpenStack flavors are shown in <xref
linkend="os-flavors-table" />.</para>
<?hard-pagebreak ?>
<table rules="all" xml:id="os-flavors-table">
<caption>OpenStack default flavors</caption>
<col width="20%" />
<col width="20%" />
<col width="20%" />
<col width="20%" />
<col width="20%" />
<thead>
<tr>
<th align="left">Name</th>
<th align="right">Virtual cores</th>
<th align="right">Memory</th>
<th align="right">Disk</th>
<th align="right">Ephemeral</th>
</tr>
</thead>
<tbody>
<tr>
<td><para>m1.tiny</para></td>
<td align="right"><para>1</para></td>
<td align="right"><para>512 MB</para></td>
<td align="right"><para>1 GB</para></td>
<td align="right"><para>0 GB</para></td>
</tr>
<tr>
<td><para>m1.small</para></td>
<td align="right"><para>1</para></td>
<td align="right"><para>2 GB</para></td>
<td align="right"><para>10 GB</para></td>
<td align="right"><para>20 GB</para></td>
</tr>
<tr>
<td><para>m1.medium</para></td>
<td align="right"><para>2</para></td>
<td align="right"><para>4 GB</para></td>
<td align="right"><para>10 GB</para></td>
<td align="right"><para>40 GB</para></td>
</tr>
<tr>
<td><para>m1.large</para></td>
<td align="right"><para>4</para></td>
<td align="right"><para>8 GB</para></td>
<td align="right"><para>10 GB</para></td>
<td align="right"><para>80 GB</para></td>
</tr>
<tr>
<td><para>m1.xlarge</para></td>
<td align="right"><para>8</para></td>
<td align="right"><para>16 GB</para></td>
<td align="right"><para>10 GB</para></td>
<td align="right"><para>160 GB</para></td>
</tr>
</tbody>
</table>
    <para>The starting point for most operators is the core count of your
    cloud. By applying some ratios, you can gather information about: <itemizedlist>
<listitem>
<para>The number of virtual machines (VMs) you expect to run,
<code>((overcommit fraction &times;&#160;cores) / virtual cores per
instance)</code></para>
</listitem>
<listitem>
          <para>How much storage is required, <code>(flavor disk size
          &times;&#160;number of instances)</code></para>
</listitem>
</itemizedlist> You can use these ratios to determine how much
additional infrastructure you need to support your cloud.</para>
    <para>Here is an example that uses these ratios to gather scalability
    information for the number of VMs expected as well as the storage needed.
    The following numbers support (16 &times; 200) / 2 = 1600 VM instances and
    require 80 TB of storage for <code>/var/lib/nova/instances</code>:</para>
<itemizedlist>
<listitem>
<para>200 physical cores.</para>
</listitem>
<listitem>
        <para>Most instances are size m1.medium (two virtual cores, 50 GB of
        storage: 10 GB root disk plus 40 GB ephemeral).</para>
</listitem>
<listitem>
<para>Default CPU overcommit ratio (<code>cpu_allocation_ratio</code>
in nova.conf) of 16:1.</para>
</listitem>
</itemizedlist>
<note>
      <para>Regardless of the overcommit ratio, an instance cannot be placed
      on any physical node with fewer raw (pre-overcommit) resources than the
      instance flavor requires.</para>
</note>
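    <para>As a minimal sketch of this arithmetic only (the variable names are
    made up for illustration and are not part of any OpenStack API), the
    following Python snippet reproduces the numbers above:</para>
    <programlisting language="python"><![CDATA[
# Capacity estimate from the ratios above -- a sketch, not an OpenStack API.
overcommit_ratio = 16       # cpu_allocation_ratio in nova.conf
physical_cores = 200
vcpus_per_instance = 2      # m1.medium
disk_per_instance_gb = 50   # 10 GB root + 40 GB ephemeral for m1.medium

# (overcommit fraction x cores) / virtual cores per instance
max_instances = (overcommit_ratio * physical_cores) // vcpus_per_instance
# flavor disk size x number of instances
storage_gb = max_instances * disk_per_instance_gb

print(max_instances)            # 1600
print(storage_gb / 1000, "TB")  # 80.0 TB
]]></programlisting>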
<para>However, you need more than the core count alone to estimate the
load that the API services, database servers, and queue servers are likely
to encounter. You must also consider the usage patterns of your
cloud.</para>
<para>As a specific example, compare a cloud that supports a managed
web-hosting platform with one running integration tests for a development
project that creates one VM per code commit. In the former, the heavy work
of creating a VM happens only every few months, whereas the latter puts
    constant heavy load on the cloud controller. You must consider your
    average VM lifetime, as a longer lifetime generally means less load on
    the cloud controller.<indexterm class="singular">
<primary>cloud controllers</primary>
<secondary>scalability and</secondary>
</indexterm></para>
<para>Aside from the creation and termination of VMs, you must consider
the impact of users accessing the service—particularly on
<literal>nova-api</literal> and its associated database. Listing instances
garners a great deal of information and, given the frequency with which
users run this operation, a cloud with a large number of users can
increase the load significantly. This can occur even without their
knowledge—leaving the OpenStack dashboard instances tab open in the
browser refreshes the list of VMs every 30 seconds.</para>
<para>After you consider these factors, you can determine how many cloud
    controller cores you require. A typical server with eight cores and 8 GB
    of RAM is sufficient for up to a rack of compute nodes, given the above
    caveats.</para>
<para>You must also consider key hardware specifications for the
performance of user VMs, as well as budget and performance needs,
including storage performance (spindles/core), memory availability
(RAM/core), network bandwidth<indexterm class="singular">
<primary>bandwidth</primary>
<secondary>hardware specifications and</secondary>
</indexterm> (Gbps/core), and overall CPU performance (CPU/core).</para>
<tip>
<para>For a discussion of metric tracking, including how to extract
metrics from your cloud, see <xref
linkend="logging_monitoring" />.</para>
</tip>
</section>
<section xml:id="add_controller_nodes">
<title>Adding Cloud Controller Nodes</title>
<para>You can facilitate the horizontal expansion of your cloud by adding
nodes. Adding compute nodes is straightforward—they are easily picked up
by the existing installation. However, you must consider some important
points when you design your cluster to be highly available.<indexterm
class="singular">
<primary>compute nodes</primary>
<secondary>adding</secondary>
</indexterm><indexterm class="singular">
<primary>high availability</primary>
</indexterm><indexterm class="singular">
<primary>configuration options</primary>
<secondary>high availability</secondary>
</indexterm><indexterm class="singular">
<primary>cloud controller nodes</primary>
<secondary>adding</secondary>
</indexterm><indexterm class="singular">
<primary>scaling</primary>
<secondary>adding cloud controller nodes</secondary>
</indexterm></para>
<para>Recall that a cloud controller node runs several different services.
You can install services that communicate only using the message queue
internally—<code>nova-scheduler</code> and <code>nova-console</code>—on a
new server for expansion. However, other integral parts require more
care.</para>
<para>You should load balance user-facing services such as dashboard,
<code>nova-api</code>, or the Object Storage proxy. Use any standard HTTP
load-balancing method (DNS round robin, hardware load balancer, or
software such as Pound or HAProxy). One caveat with dashboard is the VNC
proxy, which uses the WebSocket protocol—something that an L7 load
balancer might struggle with. See also <link
xlink:href="http://docs.openstack.org/developer/horizon/topics/deployment.html#session-storage"
xlink:title="Horizon session storage">Horizon session
storage</link>.</para>
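    <para>As an illustration only, a minimal HAProxy configuration balancing
    <code>nova-api</code> across two hypothetical controllers might look like
    the following (the host addresses are placeholders):</para>
    <programlisting><![CDATA[
# Sketch: load balance nova-api (compute API, port 8774) across two nodes
listen nova-api
    bind 0.0.0.0:8774
    balance roundrobin
    option tcplog
    server controller1 192.168.1.11:8774 check
    server controller2 192.168.1.12:8774 check
]]></programlisting>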
  <para>You can configure some services, such as <code>nova-api</code> and
  <code>glance-api</code>, to use multiple processes by changing a flag in
  their configuration file—allowing them to share work between multiple
  cores on one machine.</para>
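  <para>For example, worker counts are controlled by options such as the
  following (the option names match typical nova and glance configurations;
  the values are illustrative):</para>
  <programlisting><![CDATA[
# nova.conf -- run eight nova-api (compute API) worker processes
[DEFAULT]
osapi_compute_workers = 8

# glance-api.conf -- run four glance-api worker processes
[DEFAULT]
workers = 4
]]></programlisting>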
<tip>
<para>Several options are available for MySQL load balancing, and the
supported AMQP brokers have built-in clustering support. Information on
how to configure these and many of the other services can be found in
<xref linkend="operations" xrefstyle="part-num-title" />.<indexterm
class="singular">
<primary>Advanced Message Queuing Protocol (AMQP)</primary>
</indexterm></para>
</tip>
</section>
<section xml:id="segregate_cloud">
<title>Segregating Your Cloud</title>
  <para>You segregate your cloud when you want to offer users different
  regions, whether to satisfy legal requirements for data storage, to gain
  redundancy across earthquake fault lines, or to provide low-latency API
  calls. Use one of the following OpenStack methods to segregate your cloud:
<emphasis>cells</emphasis>, <emphasis>regions</emphasis>,
<emphasis>availability zones</emphasis>, or <emphasis>host
aggregates</emphasis>.<indexterm class="singular">
<primary>segregation methods</primary>
</indexterm><indexterm class="singular">
<primary>scaling</primary>
<secondary>cloud segregation</secondary>
</indexterm></para>
  <para>Each method provides different functionality; the methods are best
  divided into two groups:</para>
<itemizedlist>
<listitem>
<para>Cells and regions, which segregate an entire cloud and result in
running separate Compute deployments.</para>
</listitem>
<listitem>
<para><glossterm baseform="availability zone">Availability
zones</glossterm> and host aggregates, which merely divide a single
Compute deployment.</para>
</listitem>
</itemizedlist>
<para><xref linkend="segragation_methods" /> provides a comparison view of
each segregation method currently provided by OpenStack Compute.<indexterm
class="singular">
<primary>endpoints</primary>
<secondary>API endpoint</secondary>
</indexterm></para>
<table rules="all" xml:id="segragation_methods">
<caption>OpenStack segregation methods</caption>
<thead>
<tr>
<th></th>
<th>Cells</th>
<th>Regions</th>
<th>Availability zones</th>
<th>Host aggregates</th>
</tr>
</thead>
<tbody>
<tr>
<td><para><emphasis role="bold">Use when you need</emphasis>
</para></td>
<td><para>A single <glossterm>API endpoint</glossterm> for compute,
or you require a second level of scheduling.</para></td>
<td><para>Discrete regions with separate API endpoints and no
coordination between regions.</para></td>
<td><para>Logical separation within your nova deployment for
physical isolation or redundancy.</para></td>
<td><para>To schedule a group of hosts with common
features.</para></td>
</tr>
<tr>
<td><para><emphasis role="bold">Example</emphasis> </para></td>
<td><para>A cloud with multiple sites where you can schedule VMs
"anywhere" or on a particular site.</para></td>
<td><para>A cloud with multiple sites, where you schedule VMs to a
particular site and you want a shared infrastructure.</para></td>
<td><para>A single-site cloud with equipment fed by separate power
supplies.</para></td>
<td><para>Scheduling to hosts with trusted hardware
support.</para></td>
</tr>
<tr>
<td><para><emphasis role="bold">Overhead</emphasis> </para></td>
<td><para>Considered experimental.</para><para>A new service,
nova-cells.</para><para>Each cell has a full nova installation
except nova-api.</para></td>
<td><para>A different API endpoint for every
region.</para><para>Each region has a full nova installation.
</para></td>
<td><para>Configuration changes to <filename>nova.conf</filename>.</para></td>
<td><para>Configuration changes to <filename>nova.conf</filename>.</para></td>
</tr>
<tr>
<td><para><emphasis role="bold">Shared services</emphasis>
</para></td>
<td><para>Keystone</para><para><code>nova-api</code> </para></td>
<td><para>Keystone</para></td>
<td><para>Keystone</para><para>All nova services</para></td>
<td><para>Keystone</para><para>All nova services</para></td>
</tr>
</tbody>
</table>
<section xml:id="cells_regions">
<title>Cells and Regions</title>
    <para>OpenStack Compute cells are designed to allow running the cloud in
    a distributed fashion without having to use more complicated
    technologies or being invasive to existing nova installations. Hosts in a
cloud are partitioned into groups called <emphasis>cells</emphasis>.
Cells are configured in a tree. The top-level cell ("API cell") has a
host that runs the <code>nova-api</code> service, but no
<code>nova-compute</code> services. Each child cell runs all of the
other typical <code>nova-*</code> services found in a regular
installation, except for the <code>nova-api</code> service. Each cell
has its own message queue and database service and also runs
<code>nova-cells</code>, which manages the communication between the API
cell and child cells.<indexterm class="singular">
<primary>scaling</primary>
<secondary>cells and regions</secondary>
</indexterm><indexterm class="singular">
<primary>cells</primary>
<secondary>cloud segregation</secondary>
</indexterm><indexterm class="singular">
<primary>region</primary>
</indexterm></para>
    <para>This allows a single API server to be used to control access
    to multiple cloud installations. Introducing a second level of
scheduling (the cell selection), in addition to the regular
<code>nova-scheduler</code> selection of hosts, provides greater
flexibility to control where virtual machines are run.</para>
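    <para>As a sketch, the relevant <filename>nova.conf</filename> options
    for enabling cells look like the following (the cell names are
    illustrative, and options may vary by release):</para>
    <programlisting><![CDATA[
# nova.conf on the API cell
[cells]
enable = True
cell_type = api
name = api

# nova.conf on a child (compute) cell
[cells]
enable = True
cell_type = compute
name = cell1
]]></programlisting>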
    <para>Unlike having a single API endpoint, regions have a separate API
    endpoint per installation, allowing for a more discrete separation.
    Users wanting to run instances across sites have to explicitly select a
    region. However, the additional complexity of running a new service is
    not required.</para>
<para>The OpenStack dashboard (horizon) can be configured to use multiple
regions. This can be configured through the <option>AVAILABLE_REGIONS</option> parameter.</para>
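    <para>In the dashboard's <filename>local_settings.py</filename>, this
    parameter takes a list of (Keystone endpoint, display name) pairs; the
    endpoints below are placeholders:</para>
    <programlisting language="python"><![CDATA[
# local_settings.py -- offer a region selector in the dashboard
AVAILABLE_REGIONS = [
    ('http://region-1.example.com:5000/v2.0', 'RegionOne'),
    ('http://region-2.example.com:5000/v2.0', 'RegionTwo'),
]
]]></programlisting>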
</section>
<section xml:id="availability_zones">
<title>Availability Zones and Host Aggregates</title>
<para>You can use availability zones, host aggregates, or both to
partition a nova deployment.<indexterm class="singular">
<primary>scaling</primary>
<secondary>availability zones</secondary>
</indexterm></para>
    <para>Availability zones are implemented through, and configured in a
    similar way to, host aggregates. However, you use them for different
    reasons.</para>
<section xml:id="az_s3">
<title>Availability zone</title>
<para>This enables you to arrange OpenStack compute hosts into logical
groups and provides a form of physical isolation and redundancy from
other availability zones, such as by using a separate power supply or
network equipment.<indexterm class="singular">
<primary>availability zone</primary>
</indexterm></para>
<para>You define the availability zone in which a specified compute
host resides locally on each server. An availability zone is commonly
used to identify a set of servers that have a common attribute. For
instance, if some of the racks in your data center are on a separate
power source, you can put servers in those racks in their own
availability zone. Availability zones can also help separate different
classes of hardware.</para>
<para>When users provision resources, they can specify from which
availability zone they want their instance to be built. This allows
cloud consumers to ensure that their application resources are spread
across disparate machines to achieve high availability in the event of
hardware failure.</para>
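      <para>For example (the aggregate and zone names here are made up), an
      operator can create an availability zone through a host aggregate, and
      a user can then target it at boot time:</para>
      <screen><prompt>$</prompt> <userinput>nova aggregate-create rack1-aggregate az-rack1</userinput>
<prompt>$</prompt> <userinput>nova aggregate-add-host rack1-aggregate compute-01</userinput>
<prompt>$</prompt> <userinput>nova boot --flavor m1.medium --image my-image \
  --availability-zone az-rack1 my-instance</userinput></screen>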
</section>
<section xml:id="ha_s3">
      <title>Host aggregate</title>
<para>This enables you to partition OpenStack Compute deployments into
logical groups for load balancing and instance distribution. You can
use host aggregates to further partition an availability zone. For
example, you might use host aggregates to partition an availability
zone into groups of hosts that either share common resources, such as
storage and network, or have a special property, such as trusted
computing hardware.<indexterm class="singular">
<primary>scaling</primary>
<secondary>host aggregate</secondary>
</indexterm><indexterm class="singular">
<primary>host aggregate</primary>
</indexterm></para>
<para>A common use of host aggregates is to provide information for
use with the <literal>nova-scheduler</literal>. For example, you might
use a host aggregate to group a set of hosts that share specific
flavors or images.</para>
      <para>The general case for this is setting key-value pairs in the
      aggregate metadata and matching key-value pairs in a flavor's
      <parameter>extra_specs</parameter> metadata. The
      <parameter>AggregateInstanceExtraSpecsFilter</parameter> in
      the filter scheduler will enforce that instances be scheduled only on
      hosts in aggregates that define the same key with the same value.</para>
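      <para>A sketch of this pairing with the nova CLI follows; the
      aggregate, key, and flavor names are examples only (depending on the
      release, the flavor key may need the
      <literal>aggregate_instance_extra_specs:</literal> scope
      prefix):</para>
      <screen><prompt>$</prompt> <userinput>nova aggregate-create fast-storage-hosts</userinput>
<prompt>$</prompt> <userinput>nova aggregate-set-metadata fast-storage-hosts ssd=true</userinput>
<prompt>$</prompt> <userinput>nova flavor-key m1.medium.ssd set ssd=true</userinput></screen>
      <para>With <parameter>AggregateInstanceExtraSpecsFilter</parameter>
      enabled, instances of the <literal>m1.medium.ssd</literal> flavor are
      scheduled only onto hosts in the <literal>fast-storage-hosts</literal>
      aggregate.</para>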
<para>An advanced use of this general concept allows different
flavor types to run with different CPU and RAM allocation ratios so
that high-intensity computing loads and low-intensity development and
testing systems can share the same cloud without either starving the
high-use systems or wasting resources on low-utilization systems. This
works by setting <parameter>metadata</parameter> in your host
aggregates and matching <parameter>extra_specs</parameter> in your
flavor types.</para>
<para>The first step is setting the aggregate metadata keys
<parameter>cpu_allocation_ratio</parameter> and
<parameter>ram_allocation_ratio</parameter> to a floating-point
      value. The scheduler filters
      <parameter>AggregateCoreFilter</parameter> and
      <parameter>AggregateRamFilter</parameter> will use those values rather
      than the global defaults in <filename>nova.conf</filename> when
      scheduling to hosts in the aggregate. It is important to be cautious
      when using this feature, since each host can be in multiple aggregates
      but should have only one allocation ratio for each resource. It is up
to you to avoid putting a host in multiple aggregates that define
different values for the same <phrase
role="keep-together">resource</phrase>.</para>
<para>This is the first half of the equation. To get flavor types
that are guaranteed a particular ratio, you must set the
<parameter>extra_specs</parameter> in the flavor type to the
      key-value pair you want to match in the aggregate. For example, if you
      define the <parameter>extra_specs</parameter> key
      <parameter>cpu_allocation_ratio</parameter> as "1.0", then instances
      of that type will run in aggregates only where the metadata key
      <parameter>cpu_allocation_ratio</parameter> is also defined as "1.0".
      In practice, it is better to define an additional key-value pair in
      the aggregate metadata to match on rather than match directly on
      <parameter>cpu_allocation_ratio</parameter> or
      <parameter>ram_allocation_ratio</parameter>. This allows better
abstraction. For example, by defining a key
<parameter>overcommit</parameter> and setting a value of "high,"
"medium," or "low," you could then tune the numeric allocation ratios
in the aggregates without also needing to change all flavor types
relating to them.</para>
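      <para>A sketch of the <parameter>overcommit</parameter> approach with
      the nova CLI (the aggregate, key, value, and flavor names are all
      illustrative):</para>
      <screen><prompt>$</prompt> <userinput>nova aggregate-set-metadata low-overcommit-hosts \
  overcommit=low cpu_allocation_ratio=1.0</userinput>
<prompt>$</prompt> <userinput>nova flavor-key m1.large.dedicated set overcommit=low</userinput></screen>
      <para>Tuning <parameter>cpu_allocation_ratio</parameter> in the
      aggregate later requires no changes to the flavors that reference
      <literal>overcommit=low</literal>.</para>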
<note>
<para>Previously, all services had an availability zone. Currently,
only the <literal>nova-compute</literal> service has its own
availability zone. Services such as
<literal>nova-scheduler</literal>, <literal>nova-network</literal>,
and <literal>nova-conductor</literal> have always spanned all
availability zones.</para>
        <para>When you run any of the following operations, the services
        appear in their own internal availability zone
        (CONF.internal_service_availability_zone): <itemizedlist>
            <listitem>
              <para><literal>nova host-list</literal> (os-hosts)</para>
            </listitem>
            <listitem>
              <para><literal>euca-describe-availability-zones verbose</literal></para>
            </listitem>
          </itemizedlist>The internal availability zone is hidden in
        <literal>euca-describe-availability-zones</literal> (nonverbose).</para>
<para>CONF.node_availability_zone has been renamed to
CONF.default_availability_zone and is used only by the
<literal>nova-api</literal> and <literal>nova-scheduler</literal>
services.</para>
<para>CONF.node_availability_zone still works but is
deprecated.</para>
</note>
</section>
</section>
</section>
<section xml:id="scalable_hardware">
<title>Scalable Hardware</title>
<para>While several resources already exist to help with deploying and
installing OpenStack, it's very important to make sure that you have your
deployment planned out ahead of time. This guide presumes that you have at
least set aside a rack for the OpenStack cloud but also offers suggestions
for when and what to scale.</para>
<section xml:id="hardware_procure">
<title>Hardware Procurement</title>
<para>“The Cloud” has been described as a volatile environment where
servers can be created and terminated at will. While this may be true,
it does not mean that your servers must be volatile. Ensuring that your
cloud's hardware is stable and configured correctly means that your
cloud environment remains up and running. Basically, put effort into
creating a stable hardware environment so that you can host a cloud that
users may treat as unstable and volatile.<indexterm class="singular">
<primary>servers</primary>
<secondary>avoiding volatility in</secondary>
</indexterm><indexterm class="singular">
<primary>hardware</primary>
<secondary>scalability planning</secondary>
</indexterm><indexterm class="singular">
<primary>scaling</primary>
<secondary>hardware procurement</secondary>
</indexterm></para>
<para>OpenStack can be deployed on any hardware supported by an
OpenStack-compatible Linux distribution.</para>
<para>Hardware does not have to be consistent, but it should at least
have the same type of CPU to support instance migration.</para>
<para>The typical hardware recommended for use with OpenStack is the
standard value-for-money offerings that most hardware vendors stock. It
should be straightforward to divide your procurement into building
blocks such as "compute," "object storage," and "cloud controller," and
      request as many of these as you need. Alternatively, if you are unable
      to spend more, existing servers are quite likely to be able to support
      OpenStack, provided they meet your performance requirements and support
      your virtualization technology.</para>
</section>
<section xml:id="capacity_planning">
<title>Capacity Planning</title>
<para>OpenStack is designed to increase in size in a straightforward
manner. Taking into account the considerations that we've mentioned in
this chapter—particularly on the sizing of the cloud controller—it
should be possible to procure additional compute or object storage nodes
as needed. New nodes do not need to be the same specification, or even
vendor, as existing nodes.<indexterm class="singular">
<primary>capability</primary>
<secondary>scaling and</secondary>
</indexterm><indexterm class="singular">
<primary>weight</primary>
</indexterm><indexterm class="singular">
<primary>capacity planning</primary>
</indexterm><indexterm class="singular">
<primary>scaling</primary>
<secondary>capacity planning</secondary>
</indexterm></para>
      <para>For compute nodes, <code>nova-scheduler</code> will take care of
      differences in sizing with respect to core count and RAM amounts;
      however, you should consider that the user experience changes with
differing CPU speeds. When adding object storage nodes, a
<glossterm>weight</glossterm> should be specified that reflects the
<glossterm>capability</glossterm> of the node.</para>
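      <para>For Object Storage, the weight is set when a device is added to
      the ring. In this hypothetical example, a new node's disk is given a
      weight of 200, twice that of an existing 100-weight disk, because it
      has twice the capacity (the IP, port, and device name are
      placeholders):</para>
      <screen><prompt>$</prompt> <userinput>swift-ring-builder object.builder add r1z1-10.0.0.51:6000/sdb1 200</userinput></screen>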
<para>Monitoring the resource usage and user growth will enable you to
know when to procure. <xref linkend="logging_monitoring" /> details some
useful metrics.</para>
</section>
<section xml:id="burin_testing">
<title>Burn-in Testing</title>
      <para>The chances of failure for a server's hardware are high at the
      start and at the end of its life. As a result, much of the effort of
      dealing with hardware failures while in production can be avoided by
      appropriate burn-in testing that attempts to trigger early-stage
      failures. The general
principle is to stress the hardware to its limits. Examples of burn-in
tests include running a CPU or disk benchmark for several
days.<indexterm class="singular">
<primary>testing</primary>
<secondary>burn-in testing</secondary>
</indexterm><indexterm class="singular">
<primary>troubleshooting</primary>
<secondary>burn-in testing</secondary>
</indexterm><indexterm class="singular">
<primary>burn-in testing</primary>
</indexterm><indexterm class="singular">
<primary>scaling</primary>
<secondary>burn-in testing</secondary>
</indexterm></para>
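      <para>As an example of the principle, the standard
      <literal>stress</literal> and <literal>badblocks</literal> utilities
      can exercise CPU, memory, and disk for extended periods (the durations
      and device name are illustrative; <literal>badblocks -w</literal> is
      destructive, so run it only on disks with no data):</para>
      <screen><prompt>$</prompt> <userinput>stress --cpu 8 --vm 4 --vm-bytes 1G --timeout 172800</userinput>
<prompt>$</prompt> <userinput>badblocks -wsv /dev/sdb</userinput></screen>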
</section>
</section>
</chapter>