operations-guide/doc/openstack-ops/ch_arch_cloud_controller.xml

788 lines
31 KiB
XML

<?xml version="1.0" encoding="UTF-8"?>
<chapter version="5.0" xml:id="cloud_controller_design"
xmlns="http://docbook.org/ns/docbook"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:ns5="http://www.w3.org/1999/xhtml"
xmlns:ns4="http://www.w3.org/2000/svg"
xmlns:ns3="http://www.w3.org/1998/Math/MathML"
xmlns:db="http://docbook.org/ns/docbook">
<?dbhtml stop-chunking?>
<title>Designing for Cloud Controllers and <phrase
role="keep-together">Cloud Management</phrase></title>
<para>OpenStack is designed to be massively horizontally scalable, which
allows all services to be distributed widely. However, to simplify this
guide, we have decided to discuss services of a more central nature, using
the concept of a <emphasis>cloud controller</emphasis>. A cloud controller
is just a conceptual simplification. In the real world, you design an
architecture for your cloud controller that enables high availability so
that if any node fails, another can take over the required tasks. In
reality, cloud controller tasks are spread out across more than a single
node.<indexterm class="singular">
<primary>design considerations</primary>
<secondary>cloud controller services</secondary>
</indexterm><indexterm class="singular">
<primary>cloud controllers</primary>
<secondary>concept of</secondary>
</indexterm></para>
<para>The cloud controller provides the central management system for
OpenStack deployments. Typically, the cloud controller manages
authentication and sends messaging to all the systems through a message
queue.</para>
<para>For many deployments, the cloud controller is a single node. However,
to have high availability, you have to take a few considerations into
account, which we'll cover in this chapter.</para>
<para>The cloud controller manages the following services for the
cloud:<indexterm class="singular">
<primary>cloud controllers</primary>
<secondary>services managed by</secondary>
</indexterm></para>
<variablelist>
<varlistentry>
<term>Databases</term>
<listitem>
<para>Tracks current information about users and instances, for
example, in a database, typically one database instance managed per
service</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Message queue services</term>
<listitem>
<para>All AMQP—Advanced Message Queue Protocol—messages for services
are received and sent according to the queue broker<indexterm
class="singular">
<primary>Advanced Message Queuing Protocol (AMQP)</primary>
</indexterm></para>
</listitem>
</varlistentry>
<varlistentry>
<term>Conductor services</term>
<listitem>
<para>Proxy requests to a database</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Authentication and authorization for identity management</term>
<listitem>
<para>Indicates which users can do what actions on certain cloud
resources; quota management is spread out among services,
however<indexterm class="singular">
<primary>authentication</primary>
</indexterm></para>
</listitem>
</varlistentry>
<varlistentry>
<term>Image-management services</term>
<listitem>
<para>Stores and serves images with metadata on each, for launching in
the cloud</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Scheduling services</term>
<listitem>
<para>Indicates which resources to use first; for example, spreading
out where instances are launched based on an algorithm</para>
</listitem>
</varlistentry>
<varlistentry>
<term>User dashboard</term>
<listitem>
<para>Provides a web-based front end for users to consume OpenStack
cloud services</para>
</listitem>
</varlistentry>
<varlistentry>
<term>API endpoints</term>
<listitem>
<para>Offers each service's REST API access, where the API endpoint
catalog is managed by the Identity service</para>
</listitem>
</varlistentry>
</variablelist>
<para>For our example, the cloud controller has a collection of
<code>nova-*</code> components that represent the global state of the cloud;
talks to services such as authentication; maintains information about the
cloud in a database; communicates to all compute nodes and storage
<glossterm>worker</glossterm>s through a queue; and provides API access.
Each service running on a designated cloud controller may be broken out into
separate nodes for scalability or availability.<indexterm class="singular">
<primary>storage</primary>
<secondary>storage workers</secondary>
</indexterm><indexterm class="singular">
<primary>workers</primary>
</indexterm></para>
<para>As another example, you could use pairs of servers for a collective
cloud controller—one active, one standby—for redundant nodes providing a
given set of related services, such as:</para>
<itemizedlist>
<listitem>
<para>Front end web for API requests, the scheduler for choosing which
compute node to boot an instance on, Identity services, and the
dashboard</para>
</listitem>
<listitem>
<para>Database and message queue server (such as MySQL, RabbitMQ)</para>
</listitem>
<listitem>
<para>Image service for the image management</para>
</listitem>
</itemizedlist>
<para>Now that you see the myriad designs for controlling your cloud, read
more about the further considerations to help with your design
decisions.</para>
<section xml:id="hardware_consid">
<title>Hardware Considerations</title>
<para>A cloud controller's hardware can be the same as a compute node,
though you may want to further specify based on the size and type of cloud
that you run.<indexterm class="singular">
<primary>hardware</primary>
<secondary>design considerations</secondary>
</indexterm><indexterm class="singular">
<primary>design considerations</primary>
<secondary>hardware considerations</secondary>
</indexterm></para>
<para>It's also possible to use virtual machines for all or some of the
services that the cloud controller manages, such as the message queuing.
In this guide, we assume that all services are running directly on the
cloud controller.</para>
<para><xref linkend="controller-hardware-sizing" /> contains common
considerations to review when sizing hardware for the cloud controller
design.<indexterm class="singular">
<primary>cloud controllers</primary>
<secondary>hardware sizing considerations</secondary>
</indexterm><indexterm class="singular">
<primary>Active Directory</primary>
</indexterm><indexterm class="singular">
<primary>dashboard</primary>
</indexterm></para>
<table rules="all" xml:id="controller-hardware-sizing">
<caption>Cloud controller hardware sizing considerations</caption>
<col width="25%" />
<col width="75%" />
<thead>
<tr>
<th>Consideration</th>
<th>Ramification</th>
</tr>
</thead>
<tbody>
<tr>
<td><para>How many instances will run at once?</para></td>
<td><para>Size your database server accordingly, and scale out
beyond one cloud controller if many instances will report status at
the same time and scheduling where a new instance starts up needs
computing power.</para></td>
</tr>
<tr>
<td><para>How many compute nodes will run at once?</para></td>
<td><para>Ensure that your messaging queue handles requests
successfully and size accordingly.</para></td>
</tr>
<tr>
<td><para>How many users will access the API?</para></td>
<td><para>If many users will make multiple requests, make sure that
the CPU load for the cloud controller can handle it.</para></td>
</tr>
<tr>
<td><para>How many users will access the
<glossterm>dashboard</glossterm> versus the REST API
directly?</para></td>
<td><para>The dashboard makes many requests, even more than the API
access, so add even more CPU if your dashboard is the main interface
for your users.</para></td>
</tr>
<tr>
<td><para>How many <code>nova-api</code> services do you run at once
for your cloud?</para></td>
<td><para>You need to size the controller with a core per
service.</para></td>
</tr>
<tr>
<td><para>How long does a single instance run?</para></td>
<td><para>Starting instances and deleting instances is demanding on
the compute node but also demanding on the controller node because
of all the API queries and scheduling needs.</para></td>
</tr>
<tr>
<td><para>Does your authentication system also verify
externally?</para></td>
<td><para>External systems such as LDAP or <glossterm>Active
Directory</glossterm> require network connectivity between the cloud
controller and an external authentication system. Also ensure that
the cloud controller has the CPU power to keep up with
requests.</para></td>
</tr>
</tbody>
</table>
</section>
<section xml:id="separate_services">
<title>Separation of Services</title>
<para>While our example contains all central services in a single
location, it is possible and indeed often a good idea to separate services
onto different physical servers. <xref linkend="sep-services-table" /> is
a list of deployment scenarios we've seen and their
justifications.<indexterm class="singular">
<primary>provisioning/deployment</primary>
<secondary>deployment scenarios</secondary>
</indexterm><indexterm class="singular">
<primary>services</primary>
<secondary>separation of</secondary>
</indexterm><indexterm class="singular">
<primary>separation of services</primary>
</indexterm><indexterm class="singular">
<primary>design considerations</primary>
<secondary>separation of services</secondary>
</indexterm></para>
<table rules="all" xml:id="sep-services-table">
<caption>Deployment scenarios</caption>
<col width="25%" />
<col width="75%" />
<thead>
<tr>
<th>Scenario</th>
<th>Justification</th>
</tr>
</thead>
<tbody>
<tr>
<td><para>Run <code>glance-*</code> servers on the
<code>swift-proxy</code> server.</para></td>
<td><para>This deployment felt that the spare I/O on the Object
Storage proxy server was sufficient and that the Image Delivery
portion of glance benefited from being on physical hardware and
having good connectivity to the Object Storage back end it was
using.</para></td>
</tr>
<tr>
<td><para>Run a central dedicated database server.</para></td>
<td><para>This deployment used a central dedicated server to provide
the databases for all services. This approach simplified operations
by isolating database server updates and allowed for the simple
creation of slave database servers for failover.</para></td>
</tr>
<tr>
<td><para>Run one VM per service.</para></td>
<td><para>This deployment ran central services on a set of servers
running KVM. A dedicated VM was created for each service
(<literal>nova-scheduler</literal>, rabbitmq, database, etc). This
assisted the deployment with scaling because administrators could
tune the resources given to each virtual machine based on the load
it received (something that was not well understood during
installation).</para></td>
</tr>
<tr>
<td><para>Use an external load balancer.</para></td>
<td><para>This deployment had an expensive hardware load balancer in
its organization. It ran multiple <code>nova-api</code> and
<code>swift-proxy</code> servers on different physical servers and
used the load balancer to switch between them.</para></td>
</tr>
</tbody>
</table>
<para>One choice that always comes up is whether to virtualize. Some
services, such as <code>nova-compute</code>, <code>swift-proxy</code> and
<code>swift-object</code> servers, should not be virtualized. However,
control servers can often be happily virtualized—the performance penalty
can usually be offset by simply running more of the service.</para>
</section>
<section xml:id="database">
<title>Database</title>
<para>OpenStack Compute uses an SQL database to store and retrieve stateful
information. MySQL is the popular database choice in the OpenStack
community.<indexterm class="singular">
<primary>databases</primary>
<secondary>design considerations</secondary>
</indexterm><indexterm class="singular">
<primary>design considerations</primary>
<secondary>database choice</secondary>
</indexterm></para>
<para>Loss of the database leads to errors. As a result, we recommend that
you cluster your database to make it failure tolerant. Configuring and
maintaining a database cluster is done outside OpenStack and is determined
by the database software you choose to use in your cloud environment.
MySQL/Galera is a popular option for MySQL-based databases.</para>
</section>
<section xml:id="message_queue">
<title>Message Queue</title>
<para>Most OpenStack services communicate with each other using the
<emphasis>message queue</emphasis>.<indexterm class="singular">
<primary>messages</primary>
<secondary>design considerations</secondary>
</indexterm><indexterm class="singular">
<primary>design considerations</primary>
<secondary>message queues</secondary>
</indexterm> For example, Compute communicates to block storage services
and networking services through the message queue. Also, you can
optionally enable notifications for any service. RabbitMQ, Qpid, and 0mq
are all popular choices for a message-queue service. In general, if the
message queue fails or becomes inaccessible, the cluster grinds to a halt
and ends up in a read-only state, with information stuck at the point
where the last message was sent. Accordingly, we recommend that you
cluster the message queue. Be aware that clustered message queues can be a
pain point for many OpenStack deployments. While RabbitMQ has native
clustering support, there have been reports of issues when running it at a
large scale. While other queuing solutions are available, such as 0mq and
Qpid, 0mq does not offer stateful queues. Qpid is the <phrase
role="keep-together">messaging</phrase> system of choice for Red Hat and
its derivatives. Qpid does not have native clustering capabilities and
requires a supplemental service, such as Pacemaker or Corsync. For your
message queue, you need to determine what level of data loss you are
comfortable with and whether to use an OpenStack project's ability to
retry multiple MQ hosts in the event of a failure, such as using Compute's
ability to do so.<indexterm class="singular">
<primary>0mq</primary>
</indexterm><indexterm class="singular">
<primary>Qpid</primary>
</indexterm><indexterm class="singular">
<primary>RabbitMQ</primary>
</indexterm><indexterm class="singular">
<primary>message queue</primary>
</indexterm></para>
</section>
<section xml:id="conductor">
<title>Conductor Services</title>
<para>In the previous version of OpenStack, all
<literal>nova-compute</literal> services required direct access to the
database hosted on the cloud controller. This was problematic for two
reasons: security and performance. With regard to security, if a compute
node is compromised, the attacker inherently has access to the database.
With regard to performance, <literal>nova-compute</literal> calls to the
database are single-threaded and blocking. This creates a performance
bottleneck because database requests are fulfilled serially rather than in
parallel.<indexterm class="singular">
<primary>conductors</primary>
</indexterm><indexterm class="singular">
<primary>design considerations</primary>
<secondary>conductor services</secondary>
</indexterm></para>
<para>The conductor service resolves both of these issues by acting as a
proxy for the <literal>nova-compute</literal> service. Now, instead of
<literal>nova-compute</literal> directly accessing the database, it
contacts the <literal>nova-conductor</literal> service, and
<literal>nova-conductor</literal> accesses the database on
<literal>nova-compute</literal>'s behalf. Since
<literal>nova-compute</literal> no longer has direct access to the
database, the security issue is resolved. Additionally,
<literal>nova-conductor</literal> is a nonblocking service, so requests
from all compute nodes are fulfilled in parallel.</para>
<note>
<para>If you are using <literal>nova-network</literal> and multi-host
networking in your cloud environment, <literal>nova-compute</literal>
still requires direct access to the database.<indexterm class="singular">
<primary>multi-host networking</primary>
</indexterm></para>
</note>
<para>The <literal>nova-conductor</literal> service is horizontally
scalable. To make <literal>nova-conductor</literal> highly available and
fault tolerant, just launch more instances of the
<code>nova-conductor</code> process, either on the same server or across
multiple servers.</para>
</section>
<section xml:id="api">
<title>Application Programming Interface (API)</title>
<para>All public access, whether direct, through a command-line client, or
through the web-based dashboard, uses the API service. Find the API
reference at <link
xlink:href="http://api.openstack.org/"></link>.<indexterm class="singular">
<primary>API (application programming interface)</primary>
<secondary>design considerations</secondary>
</indexterm><indexterm class="singular">
<primary>design considerations</primary>
<secondary>API support</secondary>
</indexterm></para>
<para>You must choose whether you want to support the Amazon EC2
compatibility APIs, or just the OpenStack APIs. One issue you might
encounter when running both APIs is an inconsistent experience when
referring to images and instances.</para>
<para>For example, the EC2 API refers to instances using IDs that contain
hexadecimal, whereas the OpenStack API uses names and digits. Similarly,
the EC2 API tends to rely on DNS aliases for contacting virtual machines,
as opposed to OpenStack, which typically lists IP addresses.<indexterm
class="singular">
<primary>DNS (Domain Name Server, Service or System)</primary>
<secondary>DNS aliases</secondary>
</indexterm><indexterm class="singular">
<primary>troubleshooting</primary>
<secondary>DNS issues</secondary>
</indexterm></para>
<para>If OpenStack is not set up in the right way, it is simple to have
scenarios in which users are unable to contact their instances due to
having only an incorrect DNS alias. Despite this, EC2 compatibility can
assist users migrating to your cloud.</para>
<para>As with databases and message queues, having more than one
<glossterm>API server</glossterm> is a good thing. Traditional HTTP
load-balancing techniques can be used to achieve a highly available
<code>nova-api</code> service.<indexterm class="singular">
<primary>API (application programming interface)</primary>
<secondary>API server</secondary>
</indexterm></para>
</section>
<section xml:id="extensions">
<title>Extensions</title>
<para>The <link xlink:href="http://docs.openstack.org/api/api-specs.html"
xlink:title="API Specifications">API Specifications</link> define the core
actions, capabilities, and mediatypes of the OpenStack API. A client can
always depend on the availability of this core API, and implementers are
always required to support it in its <phrase
role="keep-together">entirety</phrase>. Requiring strict adherence to the
core API allows clients to rely upon a minimal level of functionality when
interacting with multiple implementations of the same API.<indexterm
class="singular">
<primary>extensions</primary>
<secondary>design considerations</secondary>
</indexterm><indexterm class="singular">
<primary>design considerations</primary>
<secondary>extensions</secondary>
</indexterm></para>
<para>The OpenStack Compute API is extensible. An extension adds
capabilities to an API beyond those defined in the core. The introduction
of new features, MIME types, actions, states, headers, parameters, and
resources can all be accomplished by means of extensions to the core API.
This allows the introduction of new features in the API without requiring
a version change and allows the introduction of vendor-specific niche
functionality.</para>
</section>
<section xml:id="scheduling">
<title>Scheduling</title>
<para>The scheduling services are responsible for determining the compute
or storage node where a virtual machine or block storage volume should be
created. The scheduling services receive creation requests for these
resources from the message queue and then begin the process of determining
the appropriate node where the resource should reside. This process is
done by applying a series of user-configurable filters against the
available collection of nodes.<indexterm class="singular">
<primary>schedulers</primary>
<secondary>design considerations</secondary>
</indexterm><indexterm class="singular">
<primary>design considerations</primary>
<secondary>scheduling</secondary>
</indexterm></para>
<para>There are currently two schedulers:
<literal>nova-scheduler</literal> for virtual machines and
<literal>cinder-scheduler</literal> for block storage volumes. Both
schedulers are able to scale horizontally, so for high-availability
purposes, or for very large or high-schedule-frequency installations, you
should consider running multiple instances of each scheduler. The
schedulers all listen to the shared message queue, so no special load
balancing is required.</para>
</section>
<section xml:id="images">
<title>Images</title>
<para>The OpenStack Image service consists of two parts:
<code>glance-api</code> and <code>glance-registry</code>. The former is
responsible for the delivery of images; the compute node uses it to
download images from the back end. The latter maintains the metadata
information associated with virtual machine images and requires a
database.<indexterm class="singular">
<primary>glance</primary>
<secondary>glance registry</secondary>
</indexterm><indexterm class="singular">
<primary>glance</primary>
<secondary>glance API server</secondary>
</indexterm><indexterm class="singular">
<primary>metadata</primary>
<secondary>OpenStack Image service and</secondary>
</indexterm><indexterm class="singular">
<primary>Image service</primary>
<secondary>design considerations</secondary>
</indexterm><indexterm class="singular">
<primary>design considerations</primary>
<secondary>images</secondary>
</indexterm></para>
<para>The <code>glance-api</code> part is an abstraction layer that allows
a choice of back end. Currently, it supports:</para>
<variablelist>
<varlistentry>
<term>OpenStack Object Storage</term>
<listitem>
<para>Allows you to store images as objects.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>File system</term>
<listitem>
<para>Uses any traditional file system to store the images as
files.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>S3</term>
<listitem>
<para>Allows you to fetch images from Amazon S3.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>HTTP</term>
<listitem>
<para>Allows you to fetch images from a web server. You cannot write
images by using this mode.</para>
</listitem>
</varlistentry>
</variablelist>
<para>If you have an OpenStack Object Storage service, we recommend using
this as a scalable place to store your images. You can also use a file
system with sufficient performance or Amazon S3—unless you do not need the
ability to upload new images through OpenStack.</para>
</section>
<section xml:id="dashboard">
<title>Dashboard</title>
<para>The OpenStack dashboard (horizon) provides a web-based user
interface to the various OpenStack components. The dashboard includes an
end-user area for users to manage their virtual infrastructure and an
admin area for cloud operators to manage the OpenStack environment as a
whole.<indexterm class="singular">
<primary>dashboard</primary>
</indexterm><indexterm class="singular">
<primary>design considerations</primary>
<secondary>dashboard</secondary>
</indexterm></para>
<para>The dashboard is implemented as a Python web application that
normally runs in <glossterm>Apache</glossterm> <code>httpd</code>.
Therefore, you may treat it the same as any other web application,
provided it can reach the API servers (including their admin endpoints)
over the <phrase role="keep-together">network</phrase>.<indexterm
class="singular">
<primary>Apache</primary>
</indexterm></para>
</section>
<section xml:id="authentication">
<title>Authentication and Authorization</title>
<para>The concepts supporting OpenStack's authentication and authorization
are derived from well-understood and widely used systems of a similar
nature. Users have credentials they can use to authenticate, and they can
be a member of one or more groups (known as projects or tenants,
interchangeably).<indexterm class="singular">
<primary>credentials</primary>
</indexterm><indexterm class="singular">
<primary>authorization</primary>
</indexterm><indexterm class="singular">
<primary>authentication</primary>
</indexterm><indexterm class="singular">
<primary>design considerations</primary>
<secondary>authentication/authorization</secondary>
</indexterm></para>
<para>For example, a cloud administrator might be able to list all
instances in the cloud, whereas a user can see only those in his current
group. Resources quotas, such as the number of cores that can be used,
disk space, and so on, are associated with a project.</para>
<para>OpenStack Identity provides
authentication decisions and user attribute information, which is then
used by the other OpenStack services to perform authorization. The policy is
set in the <filename>policy.json</filename> file. For <phrase
role="keep-together">information</phrase> on how to configure these, see
<xref linkend="projects_users" />.<indexterm class="singular">
<primary>Identity</primary>
<secondary>authentication decisions</secondary>
</indexterm><indexterm class="singular">
<primary>Identity</primary>
<secondary>plug-in support</secondary>
</indexterm></para>
<para>OpenStack Identity supports different plug-ins for authentication
decisions and identity storage. Examples of these plug-ins include:</para>
<itemizedlist role="compact">
<listitem>
<para>In-memory key-value Store (a simplified internal storage
structure)</para>
</listitem>
<listitem>
<para>SQL database (such as MySQL or PostgreSQL)</para>
</listitem>
<listitem>
<para>Memcached (a distributed memory object caching system)</para>
</listitem>
<listitem>
<para>LDAP (such as OpenLDAP or Microsoft's Active Directory)</para>
</listitem>
</itemizedlist>
<para>Many deployments use the SQL database; however, LDAP is also a
popular choice for those with existing authentication infrastructure that
needs to be integrated.</para>
</section>
<section xml:id="network_consid">
<title>Network Considerations</title>
<para>Because the cloud controller handles so many different services, it
must be able to handle the amount of traffic that hits it. For example, if
you choose to host the OpenStack Image service on the cloud controller,
the cloud controller should be able to support the transferring of the
images at an acceptable speed.<indexterm class="singular">
<primary>cloud controllers</primary>
<secondary>network traffic and</secondary>
</indexterm><indexterm class="singular">
<primary>networks</primary>
<secondary>design considerations</secondary>
</indexterm><indexterm class="singular">
<primary>design considerations</primary>
<secondary>networks</secondary>
</indexterm></para>
<para>As another example, if you choose to use single-host networking
where the cloud controller is the network gateway for all instances, then
the cloud controller must support the total amount of traffic that travels
between your cloud and the public Internet.</para>
<para>We recommend that you use a fast NIC, such as 10 GB. You can also
choose to use two 10 GB NICs and bond them together. While you might not
be able to get a full bonded 20 GB speed, different transmission streams
use different NICs. For example, if the cloud controller transfers two
images, each image uses a different NIC and gets a full 10 GB of
bandwidth.<indexterm class="singular">
<primary>bandwidth</primary>
<secondary>design considerations for</secondary>
</indexterm></para>
</section>
</chapter>