Remove passive voice from chap 5, arch guide

Change-Id: Iaaa9d2a052c9f81cefebd18bdc866d96d4fed64e
Closes-Bug: #1427935
This commit is contained in:
Suyog Sainkar 2015-05-07 16:21:30 +10:00 committed by Brian Moss
parent dbf44fc980
commit 31771bef7b
6 changed files with 588 additions and 702 deletions


@@ -5,167 +5,141 @@
version="5.0"
xml:id="network_focus">
<title>Network focused</title>
<para>All OpenStack deployments are dependent, to some extent, on
network communication in order to function properly due to a
service-based nature. In some cases, however, use cases
dictate that the network is elevated beyond simple
infrastructure. This chapter is a discussion of architectures
that are more reliant or focused on network services. These
architectures are heavily dependent on the network
infrastructure and need to be architected so that the network
services perform and are reliable in order to satisfy user and
application requirements.</para>
<para>All OpenStack deployments depend on network communication in order
to function properly due to its service-based nature. Some use cases,
however, elevate the network beyond simple
infrastructure. This chapter discusses architectures that are more
reliant or focused on network services. These architectures depend
on the network infrastructure and require
network services that perform reliably in order to satisfy user and
application requirements.</para>
<para>Some possible use cases include:</para>
<variablelist>
<varlistentry>
<term>Content delivery network</term>
<listitem>
<para>This could include
streaming video, photographs or any other cloud based
repository of data that is distributed to a large
number of end users. Mass market streaming video will
be very heavily affected by the network configurations
that would affect latency, bandwidth, and the
distribution of instances. Not all video streaming is
consumer focused. For example, multicast videos (used
for media, press conferences, corporate presentations,
web conferencing services, and so on) can also utilize a
content delivery network. Content delivery will be
affected by the location of the video repository and
its relationship to end users. Performance is also
affected by network throughput of the back-end systems,
as well as the WAN architecture and the cache
methodology.</para>
<para>This includes streaming video, viewing photographs, or
accessing any other cloud-based data repository distributed to
a large number of end users. Network configuration affects
latency, bandwidth, and the distribution of instances. Therefore,
it impacts video streaming. Not all video streaming is
consumer-focused. For example, multicast videos (used for media,
press conferences, corporate presentations, and web conferencing
services) can also use a content delivery network.
The location of the video repository and its relationship to end
users affects content delivery. Network throughput of the back-end
systems, as well as the WAN architecture and the cache methodology,
also affect performance.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Network management functions</term>
<listitem>
<para>A cloud that provides
network service functions would be built to support
the delivery of back-end network services such as DNS,
NTP or SNMP and would be used by a company for
internal network management.</para>
<para>Use this cloud to provide network service functions built to
support the delivery of back-end network services such as DNS,
NTP, or SNMP. A company can use these services for internal
network management.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Network service offerings</term>
<listitem>
<para>A cloud can be used to
run customer facing network tools to support services.
For example, VPNs, MPLS private networks, GRE tunnels
and others.</para>
<para>Use this cloud to run customer-facing network tools to
support services. Examples include VPNs, MPLS private networks,
and GRE tunnels.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Web portals or web services</term>
<listitem>
<para>Web servers are a common
application for cloud services and we recommend
an understanding of the network requirements.
The network will need to be able to scale out to meet
user demand and deliver webpages with a minimum of
latency. Internal east-west and north-south network
bandwidth must be considered depending on the details
of the portal architecture.</para>
<para>Web servers are a common application for cloud services,
and we recommend an understanding of their network requirements.
The network requires scaling out to meet user demand and deliver
web pages with a minimum latency. Depending on the details of
the portal architecture, consider the internal east-west and
north-south network bandwidth.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>High speed and high volume transactional systems</term>
<listitem>
<para>
These types of applications are very sensitive to
network configurations. Examples include many
financial systems, credit card transaction
applications, trading and other extremely high volume
systems. These systems are sensitive to network jitter
and latency. They also have a high volume of both
east-west and north-south network traffic that needs
to be balanced to maximize efficiency of the data
delivery. Many of these systems have large high
performance database back ends that need to be
accessed.</para>
These types of applications are sensitive to network
configurations. Examples include financial systems,
credit card transaction applications, and trading and other
extremely high volume systems. These systems are sensitive
to network jitter and latency. They must balance a high volume
of east-west and north-south network traffic to
maximize efficiency of the data delivery.
Many of these systems must access large, high performance
database back ends.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>High availability</term>
<listitem>
<para>These types of use cases are
highly dependent on the proper sizing of the network
to maintain replication of data between sites for high
availability. If one site becomes unavailable, the
extra sites will be able to serve the displaced load
until the original site returns to service. It is
important to size network capacity to handle the loads
that are desired.</para>
<para>These types of use cases are dependent on the proper sizing
of the network to maintain replication of data between sites for
high availability. If one site becomes unavailable, the extra
sites can serve the displaced load until the original site
returns to service. It is important to size network capacity
to handle the desired loads.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Big data</term>
<listitem>
<para>Clouds that will be used for the
management and collection of big data (data ingest)
will have a significant demand on network resources.
Big data often uses partial replicas of the data to
maintain data integrity over large distributed clouds.
Other big data applications that require a large
amount of network resources are Hadoop, Cassandra,
NuoDB, RIAK and other No-SQL and distributed
databases.</para>
<para>Clouds used for the management and collection of big data
(data ingest) have a significant demand on network resources.
Big data often uses partial replicas of the data to maintain
integrity over large distributed clouds. Other big data
applications that require a large amount of network resources
are Hadoop, Cassandra, NuoDB, Riak, and other NoSQL and
distributed databases.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Virtual desktop infrastructure (VDI)</term>
<listitem>
<para>This use case
is very sensitive to network congestion, latency,
jitter and other network characteristics. Like video
streaming, the user experience is very important
however, unlike video streaming, caching is not an
option to offset the network issues. VDI requires both
upstream and downstream traffic and cannot rely on
caching for the delivery of the application to the end
user.</para>
<para>This use case is sensitive to network congestion, latency,
jitter, and other network characteristics. Like video streaming,
the user experience is important. However, unlike video
streaming, caching is not an option to offset the network issues.
VDI requires both upstream and downstream traffic and cannot rely
on caching for the delivery of the application to the end user.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Voice over IP (VoIP)</term>
<listitem>
<para>This is extremely sensitive to
network congestion, latency, jitter and other network
characteristics. VoIP has a symmetrical traffic
pattern and it requires network quality of service
(QoS) for best performance. It may also require an
active queue management implementation to ensure
delivery. Users are very sensitive to latency and
jitter fluctuations and can detect them at very low
levels.</para>
<para>This is sensitive to network congestion, latency, jitter,
and other network characteristics. VoIP has a symmetrical traffic
pattern and it requires network quality of service (QoS) for best
performance. In addition, you can implement active queue management
to deliver voice and multimedia content. Users are sensitive to
latency and jitter fluctuations and can detect them at very low
levels.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Video Conference or web conference</term>
<listitem>
<para>This also is
extremely sensitive to network congestion, latency,
jitter and other network flaws. Video Conferencing has
a symmetrical traffic pattern, but unless the network
is on an MPLS private network, it cannot use network
quality of service (QoS) to improve performance.
Similar to VOIP, users will be sensitive to network
performance issues even at low levels.</para>
<para>This is sensitive to network congestion, latency, jitter,
and other network characteristics. Video Conferencing has a
symmetrical traffic pattern, but unless the network is on an
MPLS private network, it cannot use network quality of service
(QoS) to improve performance. Similar to VoIP, users are
sensitive to network performance issues even at low levels.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>High performance computing (HPC)</term>
<listitem>
<para>This is a complex
use case that requires careful consideration of the
traffic flows and usage patterns to address the needs
of cloud clusters. It has high East-West traffic
patterns for distributed computing, but there can be
substantial North-South traffic depending on the
specific application.</para>
<para>This is a complex use case that requires careful
consideration of the traffic flows and usage patterns to address
the needs of cloud clusters. It has high east-west traffic
patterns for distributed computing, but there can be substantial
north-south traffic depending on the specific application.</para>
</listitem>
</varlistentry>
</variablelist>


@@ -5,222 +5,190 @@
version="5.0"
xml:id="architecture-network-focus">
<title>Architecture</title>
<para>Network focused OpenStack architectures have many
similarities to other OpenStack architecture use cases. There
are a number of very specific considerations to keep in mind when
designing for a network-centric or network-heavy application
environment.</para>
<para>Networks exist to serve as a medium of transporting data
between systems. It is inevitable that an OpenStack design
has inter-dependencies with non-network portions of OpenStack
as well as on external systems. Depending on the specific
workload, there may be major interactions with storage systems
both within and external to the OpenStack environment. For
example, if the workload is a content delivery network, then
the interactions with storage will be two-fold. There will be
traffic flowing to and from the storage array for ingesting
and serving content in a north-south direction. In addition,
there is replication traffic flowing in an east-west
direction.</para>
<para>Compute-heavy workloads may also induce interactions with
the network. Some high performance compute applications
require network-based memory mapping and data sharing and, as
a result, will induce a higher network load when they transfer
results and data sets. Others may be highly transactional and
issue transaction locks, perform their functions and rescind
transaction locks at very high rates. This also has an impact
on the network performance.</para>
<para>Some network dependencies are going to be external to
OpenStack. While OpenStack Networking is capable of providing network
ports, IP addresses, some level of routing, and overlay
networks, there are some other functions that it cannot
provide. For many of these, external systems or equipment may
be required to fill in the functional gaps. Hardware load
balancers are an example of equipment that may be necessary to
distribute workloads or offload certain functions. Note that,
as of the Kilo release, dynamic routing is currently in
its infancy within OpenStack and may need to be implemented
either by an external device or a specialized service instance
within OpenStack. Tunneling is a feature provided by OpenStack Networking,
however it is constrained to a Networking-managed region. If the
need arises to extend a tunnel beyond the OpenStack region to
either another region or an external system, it is necessary
to implement the tunnel itself outside OpenStack or by using a
tunnel management system to map the tunnel or overlay to an
external tunnel. OpenStack does not currently provide quotas
for network resources. Where network quotas are required, it
is necessary to implement quality of service management
outside of OpenStack. In many of these instances, similar
solutions for traffic shaping or other network functions will
be needed.
<para>Network-focused OpenStack architectures have many similarities to
other OpenStack architecture use cases. There are several factors
to consider when designing for a network-centric or network-heavy
application environment.</para>
<para>Networks exist to serve as a medium of transporting data between
systems. It is inevitable that an OpenStack design has inter-dependencies
with non-network portions of OpenStack as well as on external systems.
Depending on the specific workload, there may be major interactions with
storage systems both within and external to the OpenStack environment.
For example, in the case of a content delivery network, there is twofold
interaction with storage. Traffic flows to and from the storage array for
ingesting and serving content in a north-south direction. In addition,
there is replication traffic flowing in an east-west direction.</para>
<para>Compute-heavy workloads may also induce interactions with the
network. Some high performance compute applications require network-based
memory mapping and data sharing and, as a result, induce a higher network
load when they transfer results and data sets. Others may be highly
transactional and issue transaction locks, perform their functions, and
revoke transaction locks at high rates. This also has an impact on the
network performance.</para>
<para>Some network dependencies are external to OpenStack. While
OpenStack Networking is capable of providing network ports, IP addresses,
some level of routing, and overlay networks, there are some other
functions that it cannot provide. For many of these, you may require
external systems or equipment to fill in the functional gaps. Hardware
load balancers are an example of equipment that may be necessary to
distribute workloads or offload certain functions. As of the Icehouse
release, dynamic routing is currently in its infancy within OpenStack and
you may require an external device or a specialized service instance
within OpenStack to implement it. OpenStack Networking provides a
tunneling feature, however it is constrained to a Networking-managed
region. If the need arises to extend a tunnel beyond the OpenStack region
to either another region or an external system, implement the tunnel
itself outside OpenStack or use a tunnel management system to map the
tunnel or overlay to an external tunnel. OpenStack does not currently
provide quotas for network resources. Where network quotas are required,
implement quality of service management outside of OpenStack. In many of
these instances, similar solutions for traffic shaping or other network
functions are needed.
</para>
<para>
Depending on the selected design, Networking itself might not
even support the required
<glossterm baseform="Layer-3 network">layer-3
support the required <glossterm baseform="Layer-3 network">layer-3
network</glossterm> functionality. If you choose to use the
provider networking mode without running the layer-3 agent, you
must install an external router to provide layer-3 connectivity
to outside systems.
</para>
<para>Interaction with orchestration services is inevitable in
larger-scale deployments. The Orchestration module is capable of allocating
network resource defined in templates to map to tenant
networks and for port creation, as well as allocating floating
IPs. If there is a requirement to define and manage network
resources in using orchestration, we recommend that the
design include the Orchestration module to meet the demands of
users.</para>
larger-scale deployments. The Orchestration module is capable of
allocating network resources defined in templates to map to tenant
networks and for port creation, as well as allocating floating IPs.
If there is a requirement to define and manage network resources when
using orchestration, we recommend that the design include the
Orchestration module to meet the demands of users.</para>
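<para>The following minimal sketch illustrates how a template can define
the tenant network resources described above. The resource types come from
the standard Heat resource catalog; the network name, CIDR, and the
external network UUID placeholder are illustrative assumptions rather than
values from this guide.</para>
<programlisting language="python"># A HOT-style template expressed as a Python dictionary and written to
# disk so the Orchestration module can launch it as a stack.
import json

template = {
    "heat_template_version": "2013-05-23",
    "resources": {
        "app_net": {
            "type": "OS::Neutron::Net",
            "properties": {"name": "app-tenant-net"},
        },
        "app_subnet": {
            "type": "OS::Neutron::Subnet",
            "properties": {
                "network_id": {"get_resource": "app_net"},
                "cidr": "10.10.10.0/24",
            },
        },
        "app_port": {
            "type": "OS::Neutron::Port",
            "properties": {"network_id": {"get_resource": "app_net"}},
        },
        "app_floating_ip": {
            "type": "OS::Neutron::FloatingIP",
            "properties": {
                "floating_network_id": "EXTERNAL_NET_UUID",  # placeholder
                "port_id": {"get_resource": "app_port"},
            },
        },
    },
}

with open("network-stack.json", "w") as handle:
    json.dump(template, handle, indent=2)
# The resulting file can then be launched with the Orchestration client,
# for example: heat stack-create -f network-stack.json app-network</programlisting>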
<section xml:id="design-impacts">
<title>Design impacts</title>
<para>A wide variety of factors can affect a network focused
OpenStack architecture. While there are some considerations
shared with a general use case, specific workloads related to
network requirements will influence network design
decisions.</para>
<para>One decision includes whether or not to use Network Address
Translation (NAT) and where to implement it. If there is a
requirement for floating IPs to be available instead of using
public fixed addresses then NAT is required. This can be seen
in network management applications that rely on an IP
endpoint. An example of this is a DHCP relay that needs to
know the IP of the actual DHCP server. In these cases it is
easier to automate the infrastructure to apply the target IP
to a new instance rather than reconfigure legacy or external
systems for each new instance.</para>
<para>NAT for floating IPs managed by Networking will reside within
the hypervisor but there are also versions of NAT that may be
running elsewhere. If there is a shortage of IPv4 addresses
there are two common methods to mitigate this externally to
OpenStack. The first is to run a load balancer either within
OpenStack as an instance, or use an external load balancing
solution. In the internal scenario, load balancing software,
such as HAproxy, can be managed with Networking's
Load-Balancer-as-a-Service (LBaaS). This is specifically to
manage the
Virtual IP (VIP) while a dual-homed connection from the
HAproxy instance connects the public network with the tenant
private network that hosts all of the content servers. In the
external scenario, a load balancer would need to serve the VIP
and also be joined to the tenant overlay network through
external means or routed to it via private addresses.</para>
<para>Another kind of NAT that may be useful is protocol NAT. In
some cases it may be desirable to use only IPv6 addresses on
instances and operate either an instance or an external
service to provide a NAT-based transition technology such as
NAT64 and DNS64. This provides the ability to have a globally
routable IPv6 address while only consuming IPv4 addresses as
necessary or in a shared manner.</para>
<para>Application workloads will affect the design of the
underlying network architecture. If a workload requires
network-level redundancy, the routing and switching
architecture will have to accommodate this. There are
differing methods for providing this that are dependent on the
network hardware selected, the performance of the hardware,
and which networking model is deployed. Some examples of this
are the use of Link aggregation (LAG) or Hot Standby Router
Protocol (HSRP). There are also the considerations of whether
to deploy OpenStack Networking or legacy networking (nova-network)
and which plug-in to select
for OpenStack Networking. If using an external system, Networking will need to
be configured to run
<glossterm baseform="Layer-2 network">layer 2</glossterm>
with a provider network
configuration. For example, it may be necessary to implement
HSRP to terminate layer-3 connectivity.</para>
<para>Depending on the workload, overlay networks may or may not
be a recommended configuration. Where application network
connections are small, short lived or bursty, running a
dynamic overlay can generate as much bandwidth as the packets
it carries. It also can induce enough latency to cause issues
with certain applications. There is an impact to the device
generating the overlay which, in most installations, will be
the hypervisor. This will cause performance degradation on
packet per second and connection per second rates.</para>
<para>Overlays also come with a secondary option that may or may
not be appropriate to a specific workload. While all of them
will operate in full mesh by default, there might be good
reasons to disable this function because it may cause
excessive overhead for some workloads. Conversely, other
workloads will operate without issue. For example, most web
services applications will not have major issues with a full
mesh overlay network, while some network monitoring tools or
storage replication workloads will have performance issues
with throughput or excessive broadcast traffic.</para>
<para>Many people overlook an important design decision: The choice
of layer-3
protocols. While OpenStack was initially built with only IPv4
<para>A wide variety of factors can affect a network-focused OpenStack
architecture. While there are some considerations shared with a general
use case, specific workloads related to network requirements influence
network design decisions.</para>
<para>One decision includes whether or not to use Network Address
Translation (NAT) and where to implement it. If there is a requirement
for floating IPs instead of public fixed addresses then you must use
NAT. An example of this is a DHCP relay that must know the IP of the
DHCP server. In these cases it is easier to automate the infrastructure
to apply the target IP to a new instance rather than to reconfigure
legacy or external systems for each new instance.</para>
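<para>As a minimal sketch of the floating IP case, the following example
allocates a floating IP through the Networking v2.0 API and maps it to an
existing instance port. The endpoint URL, token, and UUIDs are
placeholders, not values from this guide.</para>
<programlisting language="python"># Allocate a floating IP from an external network and associate it with
# an instance port; Networking then performs the NAT in the hypervisor.
import json
import requests

NEUTRON_URL = "http://controller:9696/v2.0"
HEADERS = {"X-Auth-Token": "ADMIN_TOKEN", "Content-Type": "application/json"}

body = {
    "floatingip": {
        "floating_network_id": "EXTERNAL_NET_UUID",
        "port_id": "INSTANCE_PORT_UUID",
    }
}
reply = requests.post(NEUTRON_URL + "/floatingips", headers=HEADERS,
                      data=json.dumps(body))
print(reply.json()["floatingip"]["floating_ip_address"])</programlisting>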
<para>NAT for floating IPs managed by Networking resides within the
hypervisor but there are also versions of NAT that may be running
elsewhere. If there is a shortage of IPv4 addresses there are two common
methods to mitigate this externally to OpenStack. The first is to run a
load balancer either within OpenStack as an instance, or use an external
load balancing solution. In the internal scenario, Networking's
Load-Balancer-as-a-Service (LBaaS) can manage load balancing
software, for example HAproxy. This is specifically to manage the
Virtual IP (VIP) while a dual-homed connection from the HAproxy instance
connects the public network with the tenant private network that hosts
all of the content servers. In the external scenario, a load balancer
needs to serve the VIP and also connect to the tenant overlay
network through external means or through private addresses.</para>
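<para>A sketch of the internal load balancing scenario follows, using the
LBaaS v1 resources exposed through the Networking API. The endpoint,
token, and UUIDs are placeholders, and the attribute names assume the
LBaaS v1 API of this era; verify them against your installed
version.</para>
<programlisting language="python"># Create a pool of content servers on the private tenant subnet and a
# virtual IP (VIP) that fronts it; only the VIP needs public reachability.
import json
import requests

NEUTRON_URL = "http://controller:9696/v2.0"
HEADERS = {"X-Auth-Token": "ADMIN_TOKEN", "Content-Type": "application/json"}

def post(path, body):
    reply = requests.post(NEUTRON_URL + path, headers=HEADERS,
                          data=json.dumps(body))
    return reply.json()

pool = post("/lb/pools", {"pool": {
    "name": "web-pool",
    "protocol": "HTTP",
    "lb_method": "ROUND_ROBIN",
    "subnet_id": "TENANT_SUBNET_UUID",
}})["pool"]

vip = post("/lb/vips", {"vip": {
    "name": "web-vip",
    "protocol": "HTTP",
    "protocol_port": 80,
    "pool_id": pool["id"],
    "subnet_id": "TENANT_SUBNET_UUID",
}})["vip"]
print(vip["address"])</programlisting>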
<para>Another kind of NAT that may be useful is protocol NAT. In some
cases it may be desirable to use only IPv6 addresses on instances and
operate either an instance or an external service to provide a NAT-based
transition technology such as NAT64 and DNS64. This provides the ability
to have a globally routable IPv6 address while only consuming IPv4
addresses as necessary or in a shared manner.</para>
<para>Application workloads affect the design of the underlying network
architecture. If a workload requires network-level redundancy, the
routing and switching architecture has to accommodate this. There
are differing methods for providing this that are dependent on the
selected network hardware, the performance of the hardware, and which
networking model you deploy. Examples include
Link aggregation (LAG) and Hot Standby Router Protocol (HSRP). Also
consider whether to deploy OpenStack Networking or
legacy networking (nova-network), and which plug-in to select for
OpenStack Networking. If using an external system, configure Networking
to run <glossterm baseform="Layer-2 network">layer 2</glossterm>
with a provider network configuration. For example, implement HSRP
to terminate layer-3 connectivity.</para>
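<para>For the provider network case, a minimal sketch follows that defines
a VLAN provider network so that an external device, such as an HSRP pair,
terminates layer 3. The endpoint, token, VLAN ID, and physical network
label are assumptions for illustration.</para>
<programlisting language="python"># Define a shared VLAN provider network; instances attached to it use the
# external gateway (for example an HSRP virtual IP) for layer-3 access,
# so no layer-3 agent runs inside OpenStack.
import json
import requests

NEUTRON_URL = "http://controller:9696/v2.0"
HEADERS = {"X-Auth-Token": "ADMIN_TOKEN", "Content-Type": "application/json"}

body = {"network": {
    "name": "provider-vlan-201",
    "shared": True,
    "provider:network_type": "vlan",
    "provider:physical_network": "physnet1",
    "provider:segmentation_id": 201,
}}
requests.post(NEUTRON_URL + "/networks", headers=HEADERS,
              data=json.dumps(body))</programlisting>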
<para>Depending on the workload, overlay networks may not be the best
solution. Where application network connections are
small, short lived, or bursty, running a dynamic overlay can generate
as much bandwidth as the packets it carries. It also can induce enough
latency to cause issues with certain applications. There is an impact
to the device generating the overlay which, in most installations,
is the hypervisor. This causes performance degradation on packet
per second and connection per second rates.</para>
<para>Overlays also come with a secondary option that may not be
appropriate to a specific workload. While all of them operate in full
mesh by default, there might be good reasons to disable this function
because it may cause excessive overhead for some workloads. Conversely,
other workloads operate without issue. For example, most web services
applications do not have major issues with a full mesh overlay network,
while some network monitoring tools or storage replication workloads
have performance issues with throughput or excessive broadcast
traffic.</para>
<para>Many people overlook an important design decision: The choice of
layer-3 protocols. While OpenStack was initially built with only IPv4
support, Networking now supports IPv6 and dual-stacked networks.
Note that, as of the Icehouse release, this only includes
stateless address auto configuration but work is in
progress to support stateless and stateful DHCPv6 as well as
IPv6 floating IPs without NAT. Some workloads become possible
through the use of IPv6 and IPv6 to IPv4 reverse transition
mechanisms such as NAT64 and DNS64 or <glossterm>6to4</glossterm>,
because these
options are available. This will alter the requirements for
any address plan as single-stacked and transitional IPv6
deployments can alleviate the need for IPv4 addresses.</para>
<para>As of the Kilo release, OpenStack has limited support
for dynamic routing, however there are a number of options
available by incorporating third party solutions to implement
routing within the cloud including network equipment, hardware
nodes, and instances. Some workloads will perform well with
nothing more than static routes and default gateways
configured at the layer-3 termination point. In most cases
this will suffice, however some cases require the addition of
at least one type of dynamic routing protocol if not multiple
protocols. Having a form of interior gateway protocol (IGP)
available to the instances inside an OpenStack installation
opens up the possibility of use cases for anycast route
injection for services that need to use it as a geographic
location or failover mechanism. Other applications may wish to
directly participate in a routing protocol, either as a
passive observer as in the case of a looking glass, or as an
active participant in the form of a route reflector. Since an
instance might have a large amount of compute and memory
resources, it is trivial to hold an entire unpartitioned
routing table and use it to provide services such as network
path visibility to other applications or as a monitoring
As of the Icehouse release, this only includes stateless
address auto configuration but work is in progress to support stateless
and stateful DHCPv6 as well as IPv6 floating IPs without NAT. Some
workloads are possible through the use of IPv6 and IPv6 to IPv4
reverse transition mechanisms such as NAT64 and DNS64 or
<glossterm>6to4</glossterm>.
This alters the requirements for any address plan as single-stacked and
transitional IPv6 deployments can alleviate the need for IPv4
addresses.</para>
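<para>The following small address-planning sketch shows why single-stacked
IPv6 deployments relieve pressure on IPv4: one delegated prefix yields a
/64 per tenant network, while IPv4 is drawn only from a small shared pool
for transition mechanisms such as NAT64. The prefixes are documentation
examples, not recommendations.</para>
<programlisting language="python">import ipaddress

# One /48 delegated to the site provides a /64 for every tenant network.
site_prefix = ipaddress.ip_network("2001:db8:100::/48")
tenant_prefixes = list(site_prefix.subnets(new_prefix=64))
print("IPv6 tenant networks available:", len(tenant_prefixes))   # 65536

# A small shared IPv4 pool is enough for the NAT64 transition service.
nat64_pool = ipaddress.ip_network("198.51.100.0/26")
print("IPv4 addresses reserved for NAT64:", nat64_pool.num_addresses)  # 64</programlisting>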
<para>As of the Icehouse release, OpenStack has limited support for
dynamic routing, however there are a number of options available by
incorporating third party solutions to implement routing within the
cloud including network equipment, hardware nodes, and instances. Some
workloads perform well with nothing more than static routes and default
gateways configured at the layer-3 termination point. In most cases this
is sufficient, however some cases require the addition of at least one
type of dynamic routing protocol if not multiple protocols. Having a
form of interior gateway protocol (IGP) available to the instances
inside an OpenStack installation opens up the possibility of use cases
for anycast route injection for services that need to use it as a
geographic location or failover mechanism. Other applications may wish
to directly participate in a routing protocol, either as a passive
observer, as in the case of a looking glass, or as an active participant
in the form of a route reflector. Since an instance might have a large
amount of compute and memory resources, it is trivial to hold an entire
unpartitioned routing table and use it to provide services such as
network path visibility to other applications or as a monitoring
tool.</para>
<para>
Path maximum transmission unit (MTU) failures are lesser known
but harder to diagnose. The MTU must be large enough to handle
normal traffic, overhead from an overlay network, and the
desired layer-3 protocol. When you add externally built tunnels,
the MTU packet size is reduced. In this case, you must pay
attention to the fully calculated MTU size because some systems
are configured to ignore or drop path MTU discovery packets.
</para>
<para>Path maximum transmission unit (MTU) failures are lesser known but
harder to diagnose. The MTU must be large enough to handle normal
traffic, overhead from an overlay network, and the desired layer-3
protocol. Adding externally built tunnels reduces the MTU packet size.
In this case, you must pay attention to the fully
calculated MTU size because some systems ignore or
drop path MTU discovery packets.</para>
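<para>A short arithmetic sketch of the MTU budget follows, assuming the
textbook header sizes for GRE over IPv4. Verify the exact encapsulation in
use before settling on an MTU.</para>
<programlisting language="python"># MTU budget for a GRE overlay on a 1500-byte underlay.
UNDERLAY_MTU = 1500        # physical network MTU
OUTER_IPV4_HEADER = 20     # outer IPv4 header added by the tunnel
GRE_HEADER = 4             # base GRE header
GRE_KEY = 4                # optional key field carrying the tunnel ID

overhead = OUTER_IPV4_HEADER + GRE_HEADER + GRE_KEY
tenant_mtu = UNDERLAY_MTU - overhead
print("Largest frame an instance can send without fragmentation:", tenant_mtu)
# 1500 - 28 = 1472; an externally built tunnel layered on top of this
# overlay reduces the figure further.</programlisting>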
</section>
<section xml:id="tunables">
<title>Tunable networking components</title>
<para>Consider configurable networking components related to an
OpenStack architecture design when designing for network intensive
workloads include MTU and QoS. Some workloads will require a larger
MTU than normal based on a requirement to transfer large blocks of
data. When providing network service for applications such as video
streaming or storage replication, it is recommended to ensure that
both OpenStack hardware nodes and the supporting network equipment
are configured for jumbo frames where possible. This will allow for
a better utilization of available bandwidth. Configuration of jumbo
frames should be done across the complete path the packets will
traverse. If one network component is not capable of handling jumbo
frames then the entire path will revert to the default MTU.</para>
<para>Quality of Service (QoS) also has a great impact on network
intensive workloads by providing instant service to packets which
have a higher priority due to their ability to be impacted by poor
network performance. In applications such as Voice over IP (VoIP)
differentiated services code points are a near requirement for
proper operation. QoS can also be used in the opposite direction for
mixed workloads to prevent low priority but high bandwidth
applications, for example backup services, video conferencing or
file sharing, from blocking bandwidth that is needed for the proper
operation of other workloads. It is possible to tag file storage
traffic as a lower class, such as best effort or scavenger, to allow
the higher priority traffic through. In cases where regions within a
cloud might be geographically distributed it may also be necessary
to plan accordingly to implement WAN optimization to combat latency
or packet loss.</para>
<title>Tunable networking components</title>
<para>Consider configurable networking components related to an
OpenStack architecture design when designing for network intensive
workloads; these components include MTU and QoS. Some workloads require
a larger MTU than normal due to the transfer of large blocks of data.
When providing network service for applications such as video
streaming or storage replication, we recommend that you configure
both OpenStack hardware nodes and the supporting network equipment
for jumbo frames where possible. This allows for better use of
available bandwidth. Configure jumbo frames
across the complete path the packets traverse. If one network
component is not capable of handling jumbo frames then the entire
path reverts to the default MTU.</para>
<para>Quality of Service (QoS) also has a great impact on network
intensive workloads as it provides instant service to packets that
have a higher priority because they are more susceptible to poor
network performance. In applications such as Voice over IP (VoIP),
differentiated services code points are a near requirement for proper
operation. You can also use QoS in the opposite direction for mixed
workloads to prevent low priority but high bandwidth applications,
for example backup services, video conferencing, or file sharing,
from blocking bandwidth that is needed for the proper operation of
other workloads. It is possible to tag file storage traffic as a
lower class, such as best effort or scavenger, to allow the higher
priority traffic through. In cases where regions within a cloud might
be geographically distributed it may also be necessary to plan
accordingly to implement WAN optimization to combat latency or
packet loss.</para>
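<para>As a minimal illustration of traffic classing from the application
side, the following sketch marks sockets with DSCP values. It assumes a
Linux host where the IP_TOS socket option is honoured and an upstream
network that trusts the marking.</para>
<programlisting language="python">import socket

def open_marked_socket(dscp):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # The DSCP field occupies the upper six bits of the TOS byte,
    # so multiply the DSCP value by four.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp * 4)
    return sock

voice_socket = open_marked_socket(46)   # Expedited Forwarding for VoIP
backup_socket = open_marked_socket(8)   # CS1 (scavenger) for backup traffic</programlisting>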
</section>
</section>


@@ -6,67 +6,63 @@
xml:id="operational-considerations-networking-focus">
<?dbhtml stop-chunking?>
<title>Operational considerations</title>
<para>Network focused OpenStack clouds have a number of
operational considerations that will influence the selected
design. Topics including, but not limited to, dynamic routing
of static routes, service level agreements, and ownership of
user management all need to be considered.</para>
<para>One of the first required decisions is the selection of a
telecom company or transit provider. This is especially true
if the network requirements include external or site-to-site
network connectivity.</para>
<para>Additional design decisions need to be made about monitoring
and alarming. These can be an internal responsibility or the
responsibility of the external provider. In the case of using
an external provider, SLAs will likely apply. In addition,
other operational considerations such as bandwidth, latency,
and jitter can be part of a service level agreement.</para>
<para>The ability to upgrade the infrastructure is another subject
for consideration. As demand for network resources increase,
operators will be required to add additional IP address blocks
and add additional bandwidth capacity. Managing hardware and
software life cycle events, for example upgrades,
decommissioning, and outages while avoiding service
interruptions for tenants, will also need to be
considered.</para>
<para>Maintainability will also need to be factored into the
overall network design. This includes the ability to manage
and maintain IP addresses as well as the use of overlay
identifiers including VLAN tag IDs, GRE tunnel IDs, and MPLS
tags. As an example, if all of the IP addresses have to be
changed on a network, a process known as renumbering, then the
design needs to support the ability to do so.</para>
<para>Network focused applications themselves need to be addressed
when concerning certain operational realities. For example,
the impending exhaustion of IPv4 addresses, the migration to
IPv6 and the utilization of private networks to segregate
different types of traffic that an application receives or
generates. In the case of IPv4 to IPv6 migrations,
applications should follow best practices for storing IP
addresses. It is further recommended to avoid relying on IPv4
features that were not carried over to the IPv6 protocol or
have differences in implementation.</para>
<para>When using private networks to segregate traffic,
applications should create private tenant networks for
database and data storage network traffic, and utilize public
networks for client-facing traffic. By segregating this
traffic, quality of service and security decisions can be made
to ensure that each network has the correct level of service
that it requires.</para>
<para>Finally, decisions must be made about the routing of network
traffic. For some applications, a more complex policy
framework for routing must be developed. The economic cost of
transmitting traffic over expensive links versus cheaper
links, in addition to bandwidth, latency, and jitter
requirements, can be used to create a routing policy that will
satisfy business requirements.</para>
<para>How to respond to network events must also be taken into
consideration. As an example, how load is transferred from one
link to another during a failure scenario could be a factor in
the design. If network capacity is not planned correctly,
failover traffic could overwhelm other ports or network links
and create a cascading failure scenario. In this case, traffic
that fails over to one link overwhelms that link and then
moves to the subsequent links until the all network traffic
stops.</para>
<para>Network-focused OpenStack clouds have a number of operational
considerations that influence the selected design, including:</para>
<itemizedlist>
<listitem>
<para>Dynamic routing of static routes</para>
</listitem>
<listitem>
<para>Service level agreements (SLAs)</para>
</listitem>
<listitem>
<para>Ownership of user management</para>
</listitem>
</itemizedlist>
<para>An initial network consideration is the selection of a telecom
company or transit provider.</para>
<para>Make additional design decisions about monitoring and alarming.
This can be an internal responsibility or the responsibility of the
external provider. In the case of using an external provider, service
level agreements (SLAs) likely apply. In addition, other operational
considerations such as bandwidth, latency, and jitter can be part of an
SLA.</para>
<para>Consider the ability to upgrade the infrastructure. As demand for
network resources increases, operators add additional IP address blocks
and add additional bandwidth capacity. In addition, consider managing
hardware and software life cycle events, for example upgrades,
decommissioning, and outages, while avoiding service interruptions for
tenants.</para>
<para>Factor maintainability into the overall network design. This
includes the ability to manage and maintain IP addresses as well as the
use of overlay identifiers including VLAN tag IDs, GRE tunnel IDs, and
MPLS tags. As an example, if you need to change all of the IP
addresses on a network, a process known as renumbering, then the design
must support this function.</para>
<para>Address network-focused applications when considering certain
operational realities. For example, consider the impending exhaustion
of IPv4 addresses, the migration to IPv6, and the use of private
networks to segregate different types of traffic that an application
receives or generates. In the case of IPv4 to IPv6 migrations,
applications should follow best practices for storing IP addresses.
We recommend you avoid relying on IPv4 features that did not carry over
to the IPv6 protocol or have differences in implementation.</para>
<para>To segregate traffic, allow applications to create a private tenant
network for database and storage network traffic. Use a public network
for services that require direct client access from the internet. Upon
segregating the traffic, consider quality of service (QoS) and security
to ensure each network has the required level of service.</para>
<para>Finally, consider the routing of network traffic.
For some applications, develop a complex policy framework for
routing. To create a routing policy that satisfies business requirements,
consider the economic cost of transmitting traffic over expensive links
versus cheaper links, in addition to bandwidth, latency, and jitter
requirements.</para>
<para>Additionally, consider how to respond to network events. As an
example, how load transfers from one link to another during a
failure scenario could be a factor in the design. If you do not plan
network capacity correctly, failover traffic could overwhelm other ports
or network links and create a cascading failure scenario. In this case,
traffic that fails over to one link overwhelms that link and then moves
to the subsequent links until all network traffic stops.</para>
</section>


@@ -6,34 +6,34 @@
xml:id="prescriptive-example-large-scale-web-app">
<?dbhtml stop-chunking?>
<title>Prescriptive examples</title>
<para>A large-scale web application has been designed with cloud
principles in mind. The application is designed to scale
horizontally in a bursting fashion and will generate a high
<para>An organization designs a large-scale web application with cloud
principles in mind. The application scales
horizontally in a bursting fashion and generates a high
instance count. The application requires an SSL connection to
secure data and must not lose connection state to individual
servers.</para>
<para>An example design for this workload is depicted in the
figure below. In this example, a hardware load balancer is
configured to provide SSL offload functionality and to connect
<para>The figure below depicts an example design for this workload.
In this example, a hardware load balancer provides SSL offload
functionality and connects
to tenant networks in order to reduce address consumption.
This load balancer is linked to the routing architecture as it
will service the VIP for the application. The router and load
balancer are configured with GRE tunnel ID of the
application's tenant network and provided an IP address within
This load balancer links to the routing architecture as it
services the VIP for the application. The router and load
balancer use the GRE tunnel ID of the
application's tenant network and an IP address within
the tenant subnet but outside of the address pool. This is to
ensure that the load balancer can communicate with the
application's HTTP servers without requiring the consumption
of a public IP address.</para>
<para>Because sessions persist until they are closed, the routing and
switching architecture is designed for high availability.
Switches are meshed to each hypervisor and each other, and
<para>Because sessions persist until closed, the routing and
switching architecture provides high availability.
Switches mesh to each hypervisor and each other, and
also provide an MLAG implementation to ensure that layer-2
connectivity does not fail. Routers are configured with VRRP
and fully meshed with switches to ensure layer-3 connectivity.
Since GRE is used as an overlay network, Networking is installed
and configured to use the Open vSwitch agent in GRE tunnel
connectivity does not fail. Routers use VRRP
and fully mesh with switches to ensure layer-3 connectivity.
Since GRE provides an overlay network, Networking is present
and uses the Open vSwitch agent in GRE tunnel
mode. This ensures all devices can reach all other devices and
that tenant networks can be created for private addressing
that you can create tenant networks for privately addressed
links to the load balancer.
<mediaobject>
<imageobject>
@@ -44,9 +44,9 @@
</mediaobject></para>
<para>A web service architecture has many options and optional
components. Due to this, it can fit into a large number of
other OpenStack designs however a few key components will need
other OpenStack designs. A few key components, however, need
to be in place to handle the nature of most web-scale
workloads. The user needs the following components:</para>
workloads. You require the following components:</para>
<itemizedlist>
<listitem>
<para>OpenStack Controller services (Image, Identity,
@@ -66,59 +66,59 @@
<para>Telemetry module</para>
</listitem>
</itemizedlist>
<para>
Beyond the normal Identity, Compute, Image service and Object
Storage components, the Orchestration module is a recommended
component to handle properly scaling the workloads to adjust to
demand. Due to the requirement for auto-scaling,
the design includes the Telemetry module. Web services
tend to be bursty in load, have very defined peak and valley
usage patterns and, as a result, benefit from automatic scaling
of instances based upon traffic. At a network level, a split
network configuration will work well with databases residing on
private tenant networks since these do not emit a large quantity
of broadcast traffic and may need to interconnect to some
databases for content.
<para>Beyond the normal Identity, Compute, Image service, and Object
Storage components, we recommend the Orchestration module
to handle the proper scaling of workloads to adjust to
demand. Due to the requirement for auto-scaling,
the design includes the Telemetry module. Web services
tend to be bursty in load, have very defined peak and valley
usage patterns and, as a result, benefit from automatic scaling
of instances based upon traffic. At a network level, a split
network configuration works well with databases residing on
private tenant networks since these do not emit a large quantity
of broadcast traffic and may need to interconnect to some
databases for content.
</para>
<section xml:id="load-balancing">
<title>Load balancing</title>
<para>Load balancing was included in this design to spread
requests across multiple instances. This workload scales well
horizontally across large numbers of instances. This allows
instances to run without publicly routed IP addresses and
simply rely on the load balancer for the service to be
globally reachable. Many of these services do not require
<para>Load balancing spreads requests across multiple instances.
This workload scales well horizontally across large numbers of
instances. This enables instances to run without publicly
routed IP addresses and instead to rely on the load
balancer to provide a globally reachable service.
Many of these services do not require
direct server return. This aids in address planning and
utilization at scale since only the virtual IP (VIP) must be
public.</para></section>
public.</para>
</section>
<section xml:id="overlay-networks">
<title>Overlay networks</title>
<para>
The overlay functionality design includes OpenStack Networking
in Open vSwitch GRE tunnel mode.
In this case, the layer-3 external routers are paired with
VRRP and switches should be paired with an implementation of
MLAG running to ensure that you do not lose connectivity with
In this case, the layer-3 external routers pair with
VRRP, and switches pair with an implementation of
MLAG to ensure that you do not lose connectivity with
the upstream routing infrastructure.
</para>
</section>
<section xml:id="performance-tuning">
<title>Performance tuning</title>
<para>Network level tuning for this workload is minimal.
Quality-of-Service (QoS) will be applied to these workloads
<para>Network level tuning for this workload is minimal.
Quality-of-Service (QoS) applies to these workloads
for a middle ground Class Selector depending on existing
policies. It will be higher than a best effort queue but lower
policies. It is higher than a best effort queue but lower
than an Expedited Forwarding or Assured Forwarding queue.
Since this type of application generates larger packets with
longer-lived connections, bandwidth utilization can be
optimized for long duration TCP. Normal bandwidth planning
longer-lived connections, you can optimize bandwidth utilization
for long duration TCP. Normal bandwidth planning
applies here with regards to benchmarking a session's usage
multiplied by the expected number of concurrent sessions with
overhead.</para></section>
overhead.</para>
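<para>A back-of-the-envelope version of that bandwidth calculation follows.
The per-session figure, session count, and overhead factor are
illustrative assumptions rather than measurements from this design.</para>
<programlisting language="python"># Benchmarked session usage multiplied by expected concurrency plus overhead.
PER_SESSION_MBPS = 0.5        # measured long-duration TCP session usage
CONCURRENT_SESSIONS = 20000   # expected peak number of sessions
PROTOCOL_OVERHEAD = 0.10      # headroom for TCP/IP and TLS overhead

required_gbps = (PER_SESSION_MBPS * CONCURRENT_SESSIONS *
                 (1 + PROTOCOL_OVERHEAD)) / 1000
print("Plan for roughly %.1f Gbps of north-south capacity" % required_gbps)
# 0.5 Mbps x 20000 sessions x 1.1 = 11.0 Gbps</programlisting>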
</section>
<section xml:id="network-functions">
<title>Network functions</title>
<para>Network functions is a broad category but encompasses
<para>Network functions is a broad category but encompasses
workloads that support the rest of a system's network. These
workloads tend to consist of large amounts of small packets
that are very short lived, such as DNS queries or SNMP traps.
@@ -134,63 +134,57 @@
<para>The supporting network for this type of configuration needs
to have a low latency and evenly distributed availability.
This workload benefits from having services local to the
consumers of the service. A multi-site approach is used as
consumers of the service. Use a multi-site approach as
well as deploying many copies of the application to handle
load as close as possible to consumers. Since these
applications function independently, they do not warrant
running overlays to interconnect tenant networks. Overlays
also have the drawback of performing poorly with rapid flow
setup and may incur too much overhead with large quantities of
small packets and are therefore not recommended.</para>
<para>QoS is desired for some workloads to ensure delivery. DNS
small packets and therefore we do not recommend them.</para>
<para>QoS is desirable for some workloads to ensure delivery. DNS
has a major impact on the load times of other services and
needs to be reliable and provide rapid responses. It is to
configure rules in upstream devices to apply a higher Class
needs to be reliable and provide rapid responses. Configure rules
in upstream devices to apply a higher Class
Selector to DNS to ensure faster delivery or a better spot in
queuing algorithms.</para></section>
queuing algorithms.</para>
</section>
<section xml:id="cloud-storage">
<title>Cloud storage</title>
<para>
Another common use case for OpenStack environments is to provide
a cloud-based file storage and sharing service. You might
consider this a storage-focused use case, but its network-side
requirements make it a network-focused use case.</para>
<para>
For example, consider a cloud backup application. This workload
has two specific behaviors that impact the network. Because this
workload is an externally-facing service and an
internally-replicating application, it has both <glossterm
baseform="north-south traffic">north-south</glossterm> and
<glossterm>east-west traffic</glossterm>
considerations, as follows:
</para>
<para>Another common use case for OpenStack environments is providing
a cloud-based file storage and sharing service. You might
consider this a storage-focused use case, but its network-side
requirements make it a network-focused use case.</para>
<para>For example, consider a cloud backup application. This workload
has two specific behaviors that impact the network. Because this
workload is an externally-facing service and an
internally-replicating application, it has both <glossterm
baseform="north-south traffic">north-south</glossterm> and
<glossterm>east-west traffic</glossterm>
considerations:</para>
<variablelist>
<varlistentry>
<term>north-south traffic</term>
<listitem>
<para>
When a user uploads and stores content, that content moves
<para>When a user uploads and stores content, that content moves
into the OpenStack installation. When users download this
content, the content moves from the OpenStack
installation. Because this service is intended primarily
content, the content moves out from the OpenStack
installation. Because this service operates primarily
as a backup, most of the traffic moves southbound into the
environment. In this situation, it benefits you to
configure a network to be asymmetrically downstream
because the traffic that enters the OpenStack installation
is greater than the traffic that leaves the installation.
</para>
is greater than the traffic that leaves the installation.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>east-west traffic</term>
<listitem>
<para>
Likely to be fully symmetric. Because replication
<para>Likely to be fully symmetric. Because replication
originates from any node and might target multiple other
nodes algorithmically, it is less likely for this traffic
to have a larger volume in any specific direction. However
this traffic might interfere with north-south traffic.
</para>
this traffic might interfere with north-south traffic.</para>
</listitem>
</varlistentry>
</variablelist>
@@ -201,16 +195,15 @@
/>
</imageobject>
</mediaobject>
<para>
This application prioritizes the north-south traffic over
<para>This application prioritizes the north-south traffic over
east-west traffic: the north-south traffic involves
customer-facing data.
</para>
customer-facing data.</para>
<para>The network design in this case is less dependent on
availability and more dependent on being able to handle high
bandwidth. As a direct result, it is beneficial to forego
redundant links in favor of bonding those connections. This
increases available bandwidth. It is also beneficial to
configure all devices in the path, including OpenStack, to
generate and pass jumbo frames.</para></section>
availability and more dependent on being able to handle high
bandwidth. As a direct result, it is beneficial to forgo
redundant links in favor of bonding those connections. This
increases available bandwidth. It is also beneficial to
configure all devices in the path, including OpenStack, to
generate and pass jumbo frames.</para>
</section>
</section>


@@ -13,27 +13,23 @@
involve those made about the protocol layer and the point when
IP comes into the picture. As an example, a completely
internal OpenStack network can exist at layer 2 and ignore
layer 3 however, in order for any traffic to go outside of
that cloud, to another network, or to the Internet, a layer-3
router or switch must be involved.</para>
<para>
The past few years have seen two competing trends in
layer 3. In order for any traffic to go outside of
that cloud, to another network, or to the Internet, however, you must
use a layer-3 router or switch.</para>
<para>The past few years have seen two competing trends in
networking. One trend leans towards building data center network
architectures based on layer-2 networking. Another trend treats
the cloud environment essentially as a miniature version of the
Internet. This approach is radically different from the network
architecture approach that is used in the staging environment:
the Internet is based entirely on layer-3 routing rather than
layer-2 switching.
</para>
<para>
A network designed on layer-2 protocols has advantages over one
architecture approach in the staging environment:
the Internet only uses layer-3 routing rather than
layer-2 switching.</para>
<para>A network designed on layer-2 protocols has advantages over one
designed on layer-3 protocols. In spite of the difficulties of
using a bridge to perform the network role of a router, many
vendors, customers, and service providers choose to use Ethernet
in as many parts of their networks as possible. The benefits of
selecting a layer-2 design are:
</para>
selecting a layer-2 design are:</para>
<itemizedlist>
<listitem>
<para>Ethernet frames contain all the essentials for
@@ -47,13 +43,13 @@
protocol.</para>
</listitem>
<listitem>
<para>More layers added to the Ethernet frame only slow
<para>Adding more layers to the Ethernet frame only slows
the networking process down. This is known as 'nodal
processing delay'.</para>
</listitem>
<listitem>
<para>Adjunct networking features, for example class of
service (CoS) or multicasting, can be added to
<para>You can add adjunct networking features, for
example class of service (CoS) or multicasting, to
Ethernet as readily as IP networks.</para>
</listitem>
<listitem>
@@ -62,45 +58,37 @@
</listitem>
</itemizedlist>
<para>Most information starts and ends inside Ethernet frames.
Today this applies to data, voice (for example, VoIP) and
video (for example, web cameras). The concept is that, if more
of the end-to-end transfer of information from a source to a
destination can be done in the form of Ethernet frames, more
of the benefits of Ethernet can be realized on the network.
Though it is not a substitute for IP networking, networking at
layer 2 can be a powerful adjunct to IP networking.
</para>
Today this applies to data, voice (for example, VoIP), and
video (for example, web cameras). The concept is that, if you can
perform more of the end-to-end transfer of information from
a source to a destination in the form of Ethernet frames, the network
benefits more from the advantages of Ethernet.
Although it is not a substitute for IP networking, networking at
layer 2 can be a powerful adjunct to IP networking.</para>
<para>
Layer-2 Ethernet usage has these advantages over layer-3 IP
network usage:
</para>
<itemizedlist>
<listitem>
<para>Speed</para>
</listitem>
<listitem>
<para>Reduced overhead of the IP hierarchy.</para>
</listitem>
<listitem>
<para>No need to keep track of address configuration as systems
move around. Whereas the simplicity of layer-2
protocols might work well in a data center with hundreds
of physical machines, cloud data centers have the
additional burden of needing to keep track of all virtual
machine addresses and networks. In these data centers, it
is not uncommon for one physical node to support 30-40
instances.</para>
</listitem>
</itemizedlist>
<important>
<para>Networking at the frame level says nothing
about the presence or absence of IP addresses at the packet
level. Almost all ports, links, and devices on a network of
LAN switches still have IP addresses, as do all the source and
limited.</para>
</listitem>
<listitem>
<para>You must accommodate the need to maintain a set of
layer-4 devices to handle traffic control.</para>
</listitem>
<listitem>
<para>MLAG, often used for switch redundancy, is a
without IP addresses and ICMP.</para>
</listitem>
<listitem>
<para>Configuring <glossterm
baseform="Address Resolution Protocol (ARP)">ARP</glossterm>
can be complicated on large layer-2 networks.</para>
</listitem>
<listitem>
<para>All network devices need to be aware of all MACs,
even instance MACs, so there is constant churn in MAC
tables and network state changes as instances start and
stop.</para>
</listitem>
<listitem>
<para>Migrating MACs (instance migration) to different
physical locations is a potential problem if you do not
set ARP table timeouts properly.</para>
</listitem>
</itemizedlist>
<para>It is important to know that layer 2 has a very limited set
with the new location of the instance.</para>
<para>In a layer-2 network, all devices are aware of all MACs,
even those that belong to instances. The network state
information in the backbone changes whenever an instance starts
or stops. As a result there is far too much churn in
the MAC tables on the backbone switches.</para>
</section>
<section xml:id="layer-3-arch-advantages">
<title>Layer-3 architecture advantages</title>
<para>In the layer 3 case, there is no churn in the routing tables
due to instances starting and stopping. The only time there
would be a routing state change is in the case of a Top
of Rack (ToR) switch failure or a link failure in the backbone
itself. Other advantages of using a layer-3 architecture
include:</para>
straightforward.</para>
</listitem>
<listitem>
<para>You can configure layer 3 to use <glossterm
baseform="Border Gateway Protocol (BGP)">BGP</glossterm>
confederation for scalability so core routers have state
proportional to the number of racks, not to the number of
servers or instances (see the example after this list).</para>
</listitem>
<listitem>
<para>Routing takes instance MAC and IP addresses
out of the network core, reducing state churn. Routing
state changes only occur in the case of a ToR switch
failure or backbone link failure.</para>
</listitem>
example ICMP, to monitor and manage traffic.</para>
</listitem>
<listitem>
<para>Layer-3 architectures enable the use of Quality
of Service (QoS) to manage network performance.</para>
</listitem>
</itemizedlist>
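<para>To make the scaling argument concrete, the following sketch
compares the forwarding state a flat layer-2 backbone must hold with
that of a routed core that aggregates one prefix per rack. The rack,
server, and instance counts are illustrative assumptions only.</para>
<programlisting language="python"># Illustrative comparison of backbone forwarding state.
# The counts are assumptions chosen to show the scaling difference.
racks = 50
servers_per_rack = 40
instances_per_server = 30

# A flat layer-2 backbone learns every MAC, including instance MACs.
layer2_mac_entries = racks * servers_per_rack * (instances_per_server + 1)

# A layer-3 core holding one aggregate route per rack scales with racks.
layer3_route_entries = racks

print(f"layer-2 backbone MAC entries: {layer2_mac_entries}")
print(f"layer-3 core route entries:   {layer3_route_entries}")</programlisting>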
<para>The main limitation of layer 3 is that there is no built-in
isolation mechanism comparable to the VLANs in layer-2
networks. Furthermore, the hierarchical nature of IP addresses
means that an instance is on the same subnet as its
physical host. This means that you cannot migrate it outside
of the subnet easily. For these reasons, network
virtualization needs to use IP <glossterm>encapsulation</glossterm>
and software at the end hosts for isolation and the separation of
the addressing in the virtual layer from the addressing in the
physical layer. Other potential disadvantages of layer 3
include the need to design an IP addressing scheme rather than
relying on the switches to keep track of the MAC
addresses automatically and to configure the interior gateway routing
protocol in the switches.</para>
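<para>Designing that addressing scheme usually means carving a supernet
into predictable per-rack prefixes. The following sketch uses only the
Python standard library; the 10.0.0.0/16 supernet and the one-/24-per-rack
split are assumptions for illustration.</para>
<programlisting language="python"># Carve per-rack prefixes out of a supernet for a routed design.
# The supernet and prefix length are illustrative assumptions.
import ipaddress

supernet = ipaddress.ip_network("10.0.0.0/16")

# One /24 per rack keeps core routing state proportional to rack count.
rack_prefixes = list(supernet.subnets(new_prefix=24))

for rack_id, prefix in enumerate(rack_prefixes[:4]):
    print(f"rack {rack_id}: {prefix} ({prefix.num_addresses} addresses)")</programlisting>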
</section>
</section>
Data in an OpenStack cloud moves both between instances across
the network (also known as East-West), as well as in and out
of the system (also known as North-South). Physical server
nodes have network requirements that are independent of instance
network requirements, which you must isolate from the core
network to account for scalability. We recommend
functionally separating the networks for security purposes and
tuning performance through traffic shaping.</para>
<para>You must consider a number of important general technical
and business factors when planning and
designing an OpenStack network. They include:</para>
<itemizedlist>
<listitem>
future production environments.</para>
</listitem>
</itemizedlist>
<para>Bearing in mind these considerations, we recommend the following:</para>
<itemizedlist>
<listitem>
<para>Layer-3 designs are preferable to layer-2
architectures.</para>
</listitem>
<listitem>
</itemizedlist></section>
<section xml:id="additional-considerations-network-focus">
<title>Additional considerations</title>
<para>There are several further considerations when designing a
network-focused OpenStack cloud.</para>
<section xml:id="openstack-networking-versus-nova-network">
<title>OpenStack Networking versus legacy networking (nova-network)
considerations</title>
<para>Selecting the type of networking technology to implement
depends on many factors. OpenStack Networking (neutron) and
legacy networking (nova-network) both have their advantages and
disadvantages. They are both valid and supported options that fit
different use cases:</para>
<informaltable rules="all">
<col width="40%" />
<col width="60%" />
<title>Redundant networking: ToR switch high availability
risk analysis</title>
<para>A technical consideration of networking is the idea that
you should install switching gear in a data center
with backup switches in case of hardware failure.</para>
<para>Research indicates the mean time between failures (MTBF) on switches
is between 100,000 and 200,000 hours. This number depends
on the ambient temperature of the switch in the data
center. When properly cooled and maintained, this translates to
between 11 and 22 years before failure. Even in the worst case
of poor ventilation and high ambient temperatures in the data
center, the MTBF is still 2-3 years. See <link
xlink:href="http://www.garrettcom.com/techsupport/papers/ethernet_switch_reliability.pdf">http://www.garrettcom.com/techsupport/papers/ethernet_switch_reliability.pdf</link>
and <link
xlink:href="http://www.n-tron.com/pdf/network_availability.pdf">http://www.n-tron.com/pdf/network_availability.pdf</link>.
for further information.</para>
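<para>The published figures convert to service life and availability as
follows. This is simple arithmetic; the eight-hour mean time to repair
(MTTR) is an assumption for a spares-on-site model, not a vendor
figure.</para>
<programlisting language="python"># Convert switch MTBF figures into expected service life and availability.
# The 8-hour MTTR is an assumed repair time for a spares-on-site model.
HOURS_PER_YEAR = 24 * 365
MTTR_HOURS = 8

for mtbf_hours in (100000, 200000):
    years = mtbf_hours / HOURS_PER_YEAR
    availability = mtbf_hours / (mtbf_hours + MTTR_HOURS)
    print(f"MTBF {mtbf_hours} h: about {years:.1f} years, "
          f"availability {availability:.4%}")</programlisting>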
<para>In most cases, it is much more economical to use a
single switch with a small pool of spare switches to replace
failed units than it is to outfit an entire data center with
redundant switches. Applications should tolerate rack level
outages without affecting normal
operations, since network and compute resources are easily
provisioned and plentiful.</para>
</section>
<section xml:id="preparing-for-future-ipv6-support">
<title>Preparing for the future: IPv6 support</title>
<para>One of the most important networking topics today is the
impending exhaustion of IPv4 addresses. In early 2014, ICANN
announced that they started allocating the final IPv4 address
blocks to the Regional Internet Registries (<link
xlink:href="http://www.internetsociety.org/deploy360/blog/2014/05/goodbye-ipv4-iana-starts-allocating-final-address-blocks/">http://www.internetsociety.org/deploy360/blog/2014/05/goodbye-ipv4-iana-starts-allocating-final-address-blocks/</link>).
This means the IPv4 address space is close to being fully
allocated. As a result, it will soon become difficult to
allocate more IPv4 addresses to an application that has
experienced growth, or that you expect to scale out, due to the lack
of unallocated IPv4 address blocks.</para>
<para>For network-focused applications, the future is the IPv6
protocol. IPv6 increases the address space significantly,
fixes long-standing issues in the IPv4 protocol, and will
become essential for network-focused applications in the
future.</para>
<para>OpenStack Networking supports IPv6 when configured to take
advantage of it. To enable IPv6, create an IPv6 subnet in
Networking and use IPv6 prefixes when creating security
groups.</para>
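<para>A minimal sketch of that workflow, using the openstacksdk Python
library, follows. The cloud name, network name, and the documentation
prefix 2001:db8:0:1::/64 are placeholder assumptions; check the exact
attribute names against the SDK release in use.</para>
<programlisting language="python"># Sketch: create an IPv6 subnet with SLAAC using openstacksdk.
# The cloud name, network name, and prefix below are placeholders.
import openstack

conn = openstack.connect(cloud="mycloud")      # credentials from clouds.yaml

network = conn.network.find_network("private")
subnet = conn.network.create_subnet(
    network_id=network.id,
    ip_version=6,
    cidr="2001:db8:0:1::/64",
    ipv6_address_mode="slaac",
    ipv6_ra_mode="slaac",
    name="private-v6")
print(f"created IPv6 subnet {subnet.cidr}")</programlisting>
</section>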
<section xml:id="asymmetric-links">
<title>Asymmetric links</title>
<para>When designing a network architecture, the traffic patterns
of an application heavily influence the allocation of
total bandwidth and the number of links that you use to send
and receive traffic. Applications that provide file storage
for customers allocate bandwidth and links to favor
incoming traffic, whereas video streaming applications
allocate bandwidth and links to favor outgoing traffic.</para>
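<para>A simple capacity-planning calculation can seed that split. The
following sketch divides an assumed total bandwidth budget according to
an expected ingress-to-egress ratio; both figures are
placeholders.</para>
<programlisting language="python"># Split a bandwidth budget by an expected ingress:egress ratio.
# The 40 Gbps budget and the 4:1 ratio are placeholder assumptions.
total_gbps = 40
ingress_share = 4        # for example, a file-storage service taking uploads
egress_share = 1

total_shares = ingress_share + egress_share
ingress_gbps = total_gbps * ingress_share / total_shares
egress_gbps = total_gbps * egress_share / total_shares

print(f"provision roughly {ingress_gbps:.0f} Gbps inbound and "
      f"{egress_gbps:.0f} Gbps outbound")</programlisting>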
</section>
<section xml:id="performance-network-focus">
<title>Performance</title>
<para>It is important to analyze the applications' tolerance for
latency and jitter when designing an environment to support
network focused applications. Certain applications, for
example VoIP, are less tolerant of latency and jitter. Where
latency and jitter are concerned, certain applications may
require tuning of QoS parameters and network device queues to
ensure that the network transmits their traffic immediately or
guarantees them a minimum bandwidth. Since OpenStack currently does
not support these functions, consider carefully your selected
network plug-in.</para>
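<para>When you assess that tolerance, it helps to reduce raw round-trip
samples to the two numbers that matter: mean latency and jitter. The
following sketch uses the mean absolute difference between consecutive
samples as a simple jitter measure; the sample values are
invented.</para>
<programlisting language="python"># Summarize round-trip samples into mean latency and jitter.
# Jitter here is the mean absolute difference between consecutive
# samples, a common simplification; the sample values are invented.
samples_ms = [20.1, 19.8, 25.3, 21.0, 20.4, 38.9, 20.2]

mean_latency = sum(samples_ms) / len(samples_ms)
deltas = [abs(b - a) for a, b in zip(samples_ms, samples_ms[1:])]
jitter = sum(deltas) / len(deltas)

print(f"mean latency {mean_latency:.1f} ms, jitter {jitter:.1f} ms")</programlisting>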
<para>The location of a service may also impact the application or
consumer experience. If an application serves
differing content to different users it must properly direct
connections to those specific locations. Where appropriate,
use a multi-site installation for these situations.</para>
<para>You can implement networking in two separate
ways. Legacy networking (nova-network) provides a flat DHCP network
with a single broadcast domain. This implementation does not
support tenant isolation networks or advanced plug-ins, but it
is currently the only way to implement a distributed layer-3
variety of network methods. Some of these include a layer-2
only provider network model, external device plug-ins, or even
OpenFlow controllers.</para>
<para>Networking at large scales becomes a set of boundary
questions. How large a layer-2 domain must be depends on the
number of nodes within the domain
and the amount of broadcast traffic that passes between
instances. Breaking layer-2 boundaries may require the
implementation of overlay networks and tunnels. This decision
is a balancing act between the need for smaller overhead and
the need for a smaller domain.</para>
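<para>One way to approach the boundary question is to estimate the
background broadcast load a domain would generate before deciding
whether to split it. The per-host broadcast rate and the budget in the
following sketch are rough assumptions; measure real traffic before
committing to a design.</para>
<programlisting language="python"># Rough estimate of broadcast load in a single layer-2 domain.
# The per-host rate and the budget are assumptions, not measurements.
nodes = 200
instances_per_node = 35
broadcasts_per_host_per_sec = 0.5      # ARP, DHCP, and similar chatter
budget_per_sec = 1000                  # what the quietest member tolerates

hosts = nodes * (instances_per_node + 1)
broadcast_load = hosts * broadcasts_per_host_per_sec
utilization = broadcast_load / budget_per_sec

print(f"{hosts} hosts generate about {broadcast_load:.0f} broadcasts/s, "
      f"{utilization:.0%} of the assumed budget")</programlisting>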
<para>When selecting network devices, be aware that making this
decision based on the greatest port density often comes with a
drawback. Aggregation switches and routers have not all kept
pace with Top of Rack switches and may induce bottlenecks on

xml:id="user-requirements-network-focus">
<?dbhtml stop-chunking?>
<title>User requirements</title>
<para>Network-focused architectures vary from the general-purpose
architecture designs. Certain network-intensive applications influence
these architectures. Some of the business requirements that influence
the design include:</para>
<itemizedlist>
<listitem>
<para>Network latency through slow page loads, degraded video
streams, and low quality VoIP sessions impacts the user
experience. Users are often not aware of how network design and
architecture affect their experiences. Both enterprise customers
and end-users rely on the network for delivery of an application.
Network performance problems can result in a negative experience
for the end-user, as well as productivity and economic loss.
</para>
</listitem>
<listitem>
<para>Regulatory requirements: Consider regulatory
requirements about the physical location of data as it traverses
the network. In addition, maintain network segregation of private
data flows while ensuring an encrypted network between cloud
locations where required. Regulatory requirements for encryption
and protection of data in flight affect network architectures as
the data moves through various networks.</para>
</listitem>
</itemizedlist>
<para>Many jurisdictions have legislative and regulatory requirements
governing the storage and management of data in cloud environments.
Common areas of regulation include:</para>
<itemizedlist>
<listitem>
<para>Data retention policies ensuring storage of persistent data
and records management to meet data archival requirements.</para>
</listitem>
<listitem>
<para>Data ownership policies governing the possession and
responsibility for data.</para>
</listitem>
<listitem>
<para>Data sovereignty policies governing the storage of data in
foreign countries or otherwise separate jurisdictions.</para>
</listitem>
<listitem>
<para>Data compliance policies govern where information can and
cannot reside in certain locations.</para>
</listitem>
</itemizedlist>
<para>Examples of such legal frameworks include the data protection
framework of the European Union
(<link xlink:href="http://ec.europa.eu/justice/data-protection/">http://ec.europa.eu/justice/data-protection/</link>)
and the requirements of the Financial Industry Regulatory Authority
(<link xlink:href="http://www.finra.org/Industry/Regulation/FINRARules">http://www.finra.org/Industry/Regulation/FINRARules</link>)
in the United States. Consult a local regulatory body for more
information.</para>
<section xml:id="high-availability-issues-network-focus">
<title>High availability issues</title>
<para>Depending on the application and use case, network-intensive
OpenStack installations can have high availability requirements.
Financial transaction systems have a much higher requirement for high
availability than a development application. Use network availability
technologies, for example quality of service (QoS), to improve the
network performance of sensitive applications such as VoIP and video
streaming.</para>
<para>High performance systems have SLA requirements for a minimum
QoS with regard to guaranteed uptime, latency, and bandwidth. The level
of the SLA can have a significant impact on the network architecture and
requirements for redundancy in the systems.</para>
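<para>An SLA uptime target translates directly into an annual downtime
budget, which in turn drives how much network redundancy you need. The
following sketch shows the arithmetic for a few common availability
levels.</para>
<programlisting language="python"># Annual downtime budget implied by common SLA availability targets.
MINUTES_PER_YEAR = 365 * 24 * 60

for availability in (0.99, 0.999, 0.9999):
    downtime_minutes = MINUTES_PER_YEAR * (1 - availability)
    print(f"{availability:.2%} uptime allows about "
          f"{downtime_minutes:.0f} minutes of downtime per year")</programlisting>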
</section>
<section xml:id="risks-network-focus">
<title>Risks</title>
<variablelist>
<varlistentry>
<term>Network misconfigurations</term>
<listitem>
<para>Configuring incorrect IP addresses, VLANs, and routes
can cause outages to areas of the network or, in the worst-case
scenario, the entire cloud infrastructure. Automate network
configuration to minimize the opportunity for operator error,
which can cause disruptive problems (see the validation sketch
after this list).</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Capacity planning</term>
<listitem>
<para>Cloud networks require management for capacity and growth
over time. Capacity planning includes the purchase of network
circuits and hardware that can potentially have lead times
measured in months or years.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Network tuning</term>
<listitem>
<para>Configure cloud networks to minimize link loss, packet loss,
packet storms, broadcast storms, and loops.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Single Point Of Failure (SPOF)</term>
<listitem>
<para>Consider high availability at the physical and environmental
layers. If there is a single point of failure due to only one
upstream link, or only one power supply, an outage can become
unavoidable.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Complexity</term>
<listitem>
<para>An overly complex network design can be difficult to
maintain and troubleshoot. Although automated tools that handle
overlay networks or device-level configuration can mitigate this,
avoid or document non-traditional interconnects between functions
and specialized hardware to prevent outages.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Non-standard features</term>
<listitem>
<para>There are additional risks that arise from configuring the
cloud network to take advantage of vendor specific features.
One example is multi-link aggregation (MLAG) used to provide
redundancy at the aggregator switch level of the network. MLAG
is not a standard and, as a result, each vendor has their own
proprietary implementation of the feature. MLAG architectures
are not interoperable across switch vendors, which leads to
vendor lock-in, and can delay or prevent component upgrades.</para>
</listitem>
</varlistentry>
</variablelist>
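<para>As an example of the automated checking mentioned under network
misconfigurations, the following sketch validates candidate settings
before anyone pushes them to devices. It uses only the Python standard
library; the configuration values are invented, and the 1-4094 range is
the conventional IEEE 802.1Q VLAN ID limit.</para>
<programlisting language="python"># Validate candidate network settings before applying them.
# The configuration values are invented; extend the checks as needed.
import ipaddress

candidate = {"address": "10.1.2.300/24", "gateway": "10.1.2.1", "vlan": 4095}

errors = []
try:
    ipaddress.ip_interface(candidate["address"])
except ValueError as exc:
    errors.append(f"bad address: {exc}")
try:
    ipaddress.ip_address(candidate["gateway"])
except ValueError as exc:
    errors.append(f"bad gateway: {exc}")
if candidate["vlan"] not in range(1, 4095):
    errors.append("VLAN ID must be between 1 and 4094")

for error in errors:
    print(error)</programlisting>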
</section>
<section xml:id="security-network-focus"><title>Security</title>
<para>Designers often overlook security or add it only after
implementing a design. Consider security implications and
requirements before designing the physical and logical network
topologies. Make sure that the networks are properly segregated
and that traffic flows reach the correct destinations without
crossing through undesirable locations. Consider the following
example factors:</para>
<itemizedlist>
<listitem>
<para>Firewalls</para>
</listitem>
<listitem>
<para>Overlay interconnects for joining separated tenant networks</para>
</listitem>
<listitem>
<para>Routing through or avoiding specific networks</para>
</listitem>
</itemizedlist>
<para>How networks attach to hypervisors can expose security
vulnerabilities. To mitigate against exploiting hypervisor breakouts,
separate networks from other systems and schedule instances for the
network onto dedicated compute nodes. This prevents attackers
from having access to the networks from a compromised instance.</para>
</section>
</section>