<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="prescriptive-example-storage-focus">
<?dbhtml stop-chunking?>
<title>Prescriptive Examples</title>
<para>Storage-focused architectures are highly dependent on the
specific use case. Three specific example use cases are
discussed in this section: an object store with a RESTful
interface, compute analytics with parallel file systems, and a
high performance database.</para>
    <para>The first example describes a REST interface without a
        high performance requirement. Because there is no need for
        a high performance caching tier, the object store is
        deployed as a traditional object store running on
        traditional spindles.</para>
    <para>Swift is a highly scalable object store that is part of
        the OpenStack project. The following diagram illustrates
        the example architecture:</para>
<mediaobject>
<imageobject>
<imagedata
fileref="../images/Storage_Object.png"
/>
</imageobject>
</mediaobject>
<para>This example uses the following components:</para>
<para>Network:</para>
<itemizedlist>
<listitem>
            <para>10 GbE horizontally scalable spine-leaf back end
                storage and front end network.</para>
</listitem>
</itemizedlist>
<para>Storage hardware:</para>
<itemizedlist>
<listitem>
            <para>10 storage servers, each with 12x4 TB disks, for
                480 TB of raw space and approximately 160 TB of
                usable space after three replicas.</para>
</listitem>
</itemizedlist>
<para>Proxy:</para>
<itemizedlist>
<listitem>
<para>3x proxies</para>
</listitem>
<listitem>
<para>2x10 GbE bonded front end</para>
</listitem>
<listitem>
<para>2x10 GbE back end bonds</para>
</listitem>
<listitem>
            <para>Approximately 60 Gbps of total bandwidth to the
                back end storage cluster</para>
</listitem>
</itemizedlist>
    <note><para>For some applications, it may be necessary to
        implement a third-party caching layer to achieve suitable
        performance.</para></note>
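    <para>The sizing above can be sanity-checked with a short
        calculation. This is an illustrative sketch; the replica
        count of three is an assumption based on Swift's default
        replica count:</para>

```python
# Sketch: verify the Swift example sizing above.
# The replica count of 3 is an assumption (Swift's default).
servers = 10
disks_per_server = 12
disk_size_tb = 4
replicas = 3

raw_tb = servers * disks_per_server * disk_size_tb   # 480 TB raw
usable_tb = raw_tb / replicas                        # ~160 TB usable

# Proxy bandwidth: three proxies, each with 2x10 GbE bonds
# toward the back end storage cluster.
proxies = 3
bonded_links = 2
link_gbps = 10
backend_gbps = proxies * bonded_links * link_gbps    # 60 Gbps total

print(raw_tb, usable_tb, backend_gbps)
```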
<section xml:id="compute-analytics-with-sahara"><title>Compute Analytics with Sahara</title>
<para>Analytics of large data sets can be highly dependent on the
performance of the storage system. Some clouds using storage
systems such as HDFS have inefficiencies which can cause
performance issues. A potential solution to this is to
implement a storage system designed with performance in mind.
Traditionally, parallel file systems have filled this need in
the HPC space and could be a consideration, when applicable,
for large scale performance-oriented systems.</para>
<para>This example discusses an OpenStack Object Store with a high
performance requirement. OpenStack has integration with Hadoop
through the Sahara project, which is leveraged to manage the
Hadoop cluster within the cloud.</para>
<mediaobject>
<imageobject>
<imagedata
fileref="../images/Storage_Hadoop3.png"
/>
</imageobject>
</mediaobject>
    <para>The actual hardware requirements and configuration are
        similar to those of the High Performance Database example
        below. In this case, the architecture uses Ceph's
        Swift-compatible REST interface, together with features
        that allow a caching pool to be attached to accelerate the
        presented pool.</para></section>
<section xml:id="high-performance-database-with-trove">
<title>High Performance Database with Trove</title>
<para>Databases are a common workload that can greatly benefit
from a high performance storage back end. Although enterprise
storage is not a requirement, many environments have existing
storage that can be used as back ends for an OpenStack cloud.
As shown in the following diagram, a storage pool can be
carved up to provide block devices with OpenStack Block
Storage to instances as well as an object interface. In this
        example, the database I/O requirements are high and demand
        storage presented from a fast SSD pool.</para>
<para>A storage system is used to present a LUN that is backed by
a set of SSDs using a traditional storage array with OpenStack
Block Storage integration or a storage platform such as Ceph
or Gluster.</para>
    <para>This kind of system can also provide additional
        performance in other situations. For example, in the
        database example, a portion of the SSD pool can act as a
        block device for the database server. In the high
        performance analytics example, the REST interface is
        accelerated by the inline SSD cache layer.</para>
<mediaobject>
<imageobject>
<imagedata
fileref="../images/Storage_Database_+_Object5.png"
/>
</imageobject>
</mediaobject>
    <para>Ceph was selected to present a Swift-compatible REST
        interface, as well as block level storage from a
        distributed storage cluster. It is highly flexible and has
        features, such as self healing and automatic balancing,
        that reduce the cost of operations. Erasure coded pools
        are used to maximize the amount of usable space. Note that
        erasure coded pools carry special considerations, such as
        higher computational requirements and limitations on the
        operations allowed on an object; for example, partial
        writes are not supported in an erasure coded pool.</para>
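    <para>The usable-space benefit of erasure coding over
        replication can be illustrated with a short calculation.
        This is a sketch; the erasure coding profile of k=8 data
        chunks and m=4 coding chunks is an assumed example, not a
        recommendation from this guide:</para>

```python
# Sketch: compare usable capacity of a 3-replica pool versus an
# erasure coded pool on the same raw capacity. The k=8/m=4
# profile is an assumed example, not a recommendation.
raw_tb = 480

replicas = 3
replicated_usable_tb = raw_tb / replicas      # 160 TB usable

k, m = 8, 4                                   # data / coding chunks
ec_usable_tb = raw_tb * k / (k + m)           # 320 TB usable

# Storage overhead factor: raw capacity consumed per usable TB.
overhead_replicated = raw_tb / replicated_usable_tb   # 3.0x
overhead_ec = raw_tb / ec_usable_tb                   # 1.5x

print(replicated_usable_tb, ec_usable_tb)
```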
<para>A potential architecture for Ceph, as it relates to the
examples above, would entail the following:</para>
<para>Network:</para>
<itemizedlist>
<listitem>
            <para>10 GbE horizontally scalable spine-leaf back end
                storage and front end network</para>
</listitem>
</itemizedlist>
<para>Storage hardware:</para>
<itemizedlist>
<listitem>
<para>5 storage servers for caching layer 24x1 TB SSD
</para>
</listitem>
<listitem>
            <para>10 storage servers, each with 12x4 TB disks, for
                480 TB of raw space and approximately 160 TB of
                usable space after three replicas</para>
</listitem>
</itemizedlist>
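    <para>The storage hardware above can be summarized with a
        short calculation. This is a sketch; the example does not
        state how the caching layer itself is protected, so only
        its raw capacity is computed here:</para>

```python
# Sketch: summarize the Ceph example storage hardware above.
# The caching layer's internal protection scheme is not specified
# in the example, so only its raw capacity is computed.
cache_servers = 5
cache_ssds_per_server = 24
cache_ssd_tb = 1
cache_raw_tb = cache_servers * cache_ssds_per_server * cache_ssd_tb  # 120 TB SSD

hdd_servers = 10
hdd_disks_per_server = 12
hdd_disk_tb = 4
replicas = 3
hdd_raw_tb = hdd_servers * hdd_disks_per_server * hdd_disk_tb  # 480 TB raw
hdd_usable_tb = hdd_raw_tb / replicas                          # ~160 TB usable

print(cache_raw_tb, hdd_raw_tb, hdd_usable_tb)
```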
<para>REST proxy:</para>
<itemizedlist>
<listitem>
<para>3x proxies</para>
</listitem>
<listitem>
<para>2x10 GbE bonded front end</para>
</listitem>
<listitem>
<para>2x10 GbE back end bonds</para>
</listitem>
<listitem>
            <para>Approximately 60 Gbps of total bandwidth to the
                back end storage cluster</para>
</listitem>
</itemizedlist>
<para>The SSD cache layer is used to present block devices
directly to Hypervisors or instances. The SSD cache systems
can also be used as an inline cache for the REST interface.
</para></section>
</section>