===============================
Security services for instances
===============================

Entropy to instances
~~~~~~~~~~~~~~~~~~~~

We consider entropy to refer to the quality and source of
random data that is available to an instance. Cryptographic
technologies typically rely heavily on randomness, requiring a
high quality pool of entropy to draw from. It is typically hard
for a virtual machine to get enough entropy to support these
operations, which is referred to as entropy starvation. Entropy
starvation can manifest in instances as something seemingly
unrelated. For example, slow boot time may be caused by the
instance waiting for SSH key generation. Entropy starvation
may also motivate users to employ poor quality entropy sources
from within the instance, making applications running in the
cloud less secure overall.
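
A quick way to gauge whether a guest is entropy-starved is to read the
size of the kernel's entropy pool from inside the instance; a value
that stays very low (a few hundred bits or less) while applications
block on ``/dev/random`` suggests starvation:

.. code:: console

   $ cat /proc/sys/kernel/random/entropy_avail
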
Fortunately, a cloud architect may address these issues by
providing a high quality source of entropy to the cloud
instances. This can be done by having enough hardware random
number generators (HRNG) in the cloud to support the instances.
In this case, "enough" is somewhat domain specific. For
everyday operations, a modern HRNG is likely to produce enough
entropy to support 50-100 compute nodes. High-bandwidth HRNGs,
such as the RdRand instruction available with Intel Ivy Bridge
and newer processors, could potentially handle more nodes. For a
given cloud, an architect needs to understand the application
requirements to ensure that sufficient entropy is available.

The Virtio RNG is a random number generator that uses
``/dev/random`` as its source of entropy by default. However, it can be
configured to use a hardware RNG or a tool such as the entropy
gathering daemon (`EGD <http://egd.sourceforge.net>`_) to provide
a way to fairly and securely distribute entropy through a
distributed system. The Virtio RNG is enabled using the ``hw_rng``
property of the metadata used to create the instance.
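
As an illustrative sketch, the device can be requested through image
metadata and permitted through flavor extra specs. The property and
extra spec names below (``hw_rng_model``, ``hw_rng:allowed``) are the
ones documented for libvirt-based deployments; the image and flavor
names are placeholders:

.. code:: console

   # Ask for a virtio RNG device when instances boot from this image.
   $ openstack image set --property hw_rng_model=virtio trusted-image

   # Allow the device for a flavor (rate limits can also be set here).
   $ openstack flavor set --property hw_rng:allowed=True m1.secure
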
Scheduling instances to nodes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Before an instance is created, a host for the image
instantiation must be selected. This selection is performed by
the ``nova-scheduler`` which determines how to dispatch compute
and volume requests.

The ``FilterScheduler`` is the default scheduler for OpenStack
Compute, although other schedulers exist (see the section `Scheduling
<https://docs.openstack.org/ocata/config-reference/compute/schedulers.html>`_
in the `OpenStack Configuration Reference
<https://docs.openstack.org/ocata/config-reference/config-overview.html>`_
). This works in collaboration with 'filter hints' to decide where an
instance should be started. This process of host selection allows
administrators to fulfill many different security and compliance
requirements. Depending on the cloud deployment type for example, one
could choose to have tenant instances reside on the same hosts whenever
possible if data isolation was a primary concern. Conversely one could
attempt to have instances for a tenant reside on as many different hosts
as possible for availability or fault tolerance reasons.

Filter schedulers fall under four main categories:

Resource based filters
  These filters will create an instance based on the utilization of
  the hypervisor host sets and can trigger on free or used properties
  such as RAM, IO, or CPU utilization.

Image based filters
  These filters delegate instance creation based on the image used,
  such as the operating system of the VM or the type of image used.

Environment based filters
  These filters will create an instance based on external details,
  such as a specific IP range, availability zones, or the same host
  as another instance.

Custom criteria
  These filters delegate instance creation based on user or
  administrator provided criteria such as trusts or metadata parsing.

Multiple filters can be applied at once, such as the
``ServerGroupAffinity`` filter to ensure an instance is created on a
member of a specific set of hosts and the ``ServerGroupAntiAffinity``
filter to ensure that same instance is not created on another specific
set of hosts. These filters should be analyzed carefully to ensure
they do not conflict with each other and result in rules that prevent
the creation of instances.

.. image:: ../figures/filteringWorkflow1.png

The ``GroupAffinity`` and ``GroupAntiAffinity`` filters conflict and
should not both be enabled at the same time.

The ``DiskFilter`` filter is capable of oversubscribing disk space.
While not normally an issue, this can be a concern on storage devices
that are thinly provisioned, and this filter should be used with
well-tested quotas applied.

We recommend you disable filters that parse things that are provided
by users or are able to be manipulated, such as metadata.
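
As a hedged sketch of pinning the filter list explicitly rather than
relying on defaults, assuming the ``crudini`` utility is available: the
option name and section vary by release (Ocata-era Nova uses
``scheduler_default_filters`` under ``[DEFAULT]``, newer releases use
``enabled_filters`` under ``[filter_scheduler]``), so adjust accordingly:

.. code:: console

   # Enable only the filters the deployment actually needs.
   $ crudini --set /etc/nova/nova.conf DEFAULT scheduler_default_filters \
     RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ServerGroupAntiAffinityFilter
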
Trusted images
~~~~~~~~~~~~~~

In a cloud environment, users work with either pre-installed images or
images they upload themselves. In both cases, users should be able to
ensure the image they are utilizing has not been tampered with. The
ability to verify images is a fundamental imperative for security. A
chain of trust is needed from the source of the image to the
destination where it's used. This can be accomplished by signing
images obtained from trusted sources and by verifying the signature
prior to use. Various ways to obtain and create verified images will
be discussed below, followed by a description of the image signature
verification feature.

Image creation process
----------------------

The OpenStack Documentation provides guidance on how to create and
upload an image to the Image service. Additionally, it is assumed that
you have a process by which you install and harden operating systems.
Thus, the following items will provide additional guidance on how to
ensure your images are transferred securely into OpenStack. There are
a variety of options for obtaining images. Each has specific steps that
help validate the image's provenance.

The first option is to obtain boot media from a trusted source.

.. code:: console

   $ mkdir -p /tmp/download_directory
   $ cd /tmp/download_directory
   $ wget http://mirror.anl.gov/pub/ubuntu-iso/CDs/precise/ubuntu-12.04.2-server-amd64.iso
   $ wget http://mirror.anl.gov/pub/ubuntu-iso/CDs/precise/SHA256SUMS
   $ wget http://mirror.anl.gov/pub/ubuntu-iso/CDs/precise/SHA256SUMS.gpg
   $ gpg --keyserver hkp://keyserver.ubuntu.com --recv-keys 0xFBB75451
   $ gpg --verify SHA256SUMS.gpg SHA256SUMS
   $ sha256sum -c SHA256SUMS 2>&1 | grep OK

The second option is to use the
`OpenStack Virtual Machine Image Guide <https://docs.openstack.org/image-guide/>`_.
In this case, you will want to follow your organization's OS hardening
guidelines or those provided by a trusted third-party such as the
`Linux STIGs <http://iase.disa.mil/stigs/os/unix-linux/Pages/index.aspx>`_.

The final option is to use an automated image builder. The following
example uses the Oz image builder. The OpenStack community has recently
created a newer tool worth investigating: disk-image-builder. We have
not evaluated this tool from a security perspective.

Example of RHEL 6 CCE-26976-1 which will help implement NIST 800-53
Section *AC-19(d)* in Oz:

.. code:: xml

   <template>
     <name>centos64</name>
     <os>
       <name>RHEL-6</name>
       <version>4</version>
       <arch>x86_64</arch>
       <install type='iso'>
         <iso>http://trusted_local_iso_mirror/isos/x86_64/RHEL-6.4-x86_64-bin-DVD1.iso</iso>
       </install>
       <rootpw>CHANGE THIS TO YOUR ROOT PASSWORD</rootpw>
     </os>
     <description>RHEL 6.4 x86_64</description>
     <repositories>
       <repository name='epel-6'>
         <url>http://download.fedoraproject.org/pub/epel/6/$basearch</url>
         <signed>no</signed>
       </repository>
     </repositories>
     <packages>
       <package name='epel-release'/>
       <package name='cloud-utils'/>
       <package name='cloud-init'/>
     </packages>
     <commands>
       <command name='update'>
         yum update
         yum clean all
         rm -rf /var/log/yum
         sed -i '/^HWADDR/d' /etc/sysconfig/network-scripts/ifcfg-eth0
         echo -n > /etc/udev/rules.d/70-persistent-net.rules
         echo -n > /lib/udev/rules.d/75-persistent-net-generator.rules
         chkconfig --level 0123456 autofs off
         service autofs stop
       </command>
     </commands>
   </template>

It is recommended to avoid the manual image building process as it is
complex and prone to error. Additionally, using an automated system
like Oz for image building or a configuration management utility like
Chef or Puppet for post-boot image hardening gives you the ability to
produce a consistent image as well as track compliance of your base
image to its respective hardening guidelines over time.

If subscribing to a public cloud service, you should check with the
cloud provider for an outline of the process used to produce their
default images. If the provider allows you to upload your own images,
you will want to ensure that you are able to verify that your image
was not modified before using it to create an instance. To do this,
refer to the following section on Image Signature Verification, or
the following paragraph if signatures cannot be used.

Images come from the Image service to the Compute service on a node.
This transfer should be protected by running over TLS. Once the image
is on the node, it is verified with a basic checksum and then its
disk is expanded based on the size of the instance being launched. If,
at a later time, the same image is launched with the same instance
size on this node, it is launched from the same expanded image.
Since this expanded image is not re-verified by default before
launching, it is possible that it has been tampered with. The user
would not be aware of tampering unless a manual inspection of the
files in the resulting image is performed.

Image signature verification
----------------------------

Several features related to image signing are now available in
OpenStack. As of the Mitaka release, the Image service can verify
these signed images, and, to provide a full chain of trust, the
Compute service has the option to perform image signature verification
prior to image boot. Successful signature validation before image
boot ensures the signed image hasn't changed. With this feature
enabled, unauthorized modification of images (e.g., modifying the
image to include malware or rootkits) can be detected.

Administrators can enable instance signature verification by setting
the ``verify_glance_signatures`` flag to ``True`` in the
``/etc/nova/nova.conf`` file. When enabled, the Compute service
automatically validates the signed image when it is retrieved from
the Image service. If this verification fails, the boot will not occur.
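
A minimal sketch of enabling the flag, assuming the ``crudini`` utility
is available (in recent releases the option lives under the ``[glance]``
section; verify the section name and the compute service name for your
release and distribution before applying):

.. code:: console

   $ crudini --set /etc/nova/nova.conf glance verify_glance_signatures True
   # Restart the compute service so the setting takes effect
   # (the unit name varies by distribution).
   $ systemctl restart openstack-nova-compute
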
The OpenStack Operations Guide provides guidance on how to create and
upload a signed image, and how to use this feature. For more
information, see `Adding Signed Images
<https://docs.openstack.org/operations-guide/ops-user-facing-operations.html#adding-signed-images>`_
in the Operations Guide.

Instance migrations
~~~~~~~~~~~~~~~~~~~

OpenStack and the underlying virtualization layers provide for
the live migration of images between OpenStack nodes, allowing
you to seamlessly perform rolling upgrades of your OpenStack
compute nodes without instance downtime. However, live
migrations also carry significant risk. To understand the risks
involved, the following are the high-level steps performed
during a live migration:

1. Start instance on destination host
2. Transfer memory
3. Stop the guest and sync disks
4. Transfer state
5. Start the guest

Live migration risks
--------------------

At various stages of the live migration process, the contents of an
instance's runtime memory and disk are transmitted over the network
in plain text. Thus there are several risks that need to be addressed
when using live migration. The following non-exhaustive list details
some of these risks:

* *Denial of Service (DoS)*: If something fails during the migration
  process, the instance could be lost.

* *Data exposure*: Memory or disk transfers must be handled securely.

* *Data manipulation*: If memory or disk transfers are not handled
  securely, then an attacker could manipulate user data during the
  migration.

* *Code injection*: If memory or disk transfers are not handled
  securely, then an attacker could manipulate executables, either on
  disk or in memory, during the migration.

Live migration mitigations
--------------------------

There are several methods to mitigate some of the risks associated
with live migration; the following list details some of them:

* Disable live migration
* Isolated migration network
* Encrypted live migration

Disable live migration
----------------------

At this time, live migration is enabled in OpenStack
by default. Live migrations can be disabled by adding the
following lines to the nova ``policy.json`` file:

.. code:: json

   {
       "compute_extension:admin_actions:migrate": "!",
       "compute_extension:admin_actions:migrateLive": "!"
   }

Migration network
-----------------

As a general practice, live migration traffic should be restricted
to the management security domain; see
:doc:`../introduction/security-boundaries-and-threats`.

Because live migration traffic is transmitted in plain text and
carries the contents of the disk and memory of a running instance, it
is recommended that you further separate live migration traffic onto
a dedicated network. Isolating the traffic to a dedicated network can
reduce the risk of exposure.
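
As a hedged sketch, assuming the ``crudini`` utility and a recent Nova
release that supports the ``live_migration_inbound_addr`` option in the
``[libvirt]`` section (the address below is a placeholder for an
interface on the dedicated migration network):

.. code:: console

   # Bind incoming migration traffic on this compute node to the
   # dedicated migration network interface.
   $ crudini --set /etc/nova/nova.conf libvirt live_migration_inbound_addr 192.0.2.10
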
Encrypted live migration
------------------------

If there is a sufficient business case for keeping live migration
enabled, then libvirtd can provide encrypted tunnels for the live
migrations. However, this feature is not currently exposed in either
the OpenStack Dashboard or nova-client commands, and can only be
accessed through manual configuration of libvirtd. The live migration
process then changes to the following high-level steps:

1. Instance data is copied from the hypervisor to libvirtd.
2. An encrypted tunnel is created between libvirtd processes on both
   source and destination hosts.
3. The destination libvirtd host copies the instance back to the
   underlying hypervisor.
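
A minimal sketch, assuming a libvirt-based deployment and the
``crudini`` utility: Nova can be told to tunnel migration data through
the libvirtd connection (the ``live_migration_tunnelled`` option exists
in the ``[libvirt]`` section of recent releases), and libvirtd itself
must then be configured for an encrypted transport such as TLS:

.. code:: console

   # Tunnel migration data through libvirtd rather than a separate,
   # unencrypted port (verify the option name for your release).
   $ crudini --set /etc/nova/nova.conf libvirt live_migration_tunnelled True

   # libvirtd must additionally listen with TLS and have CA, server,
   # and client certificates deployed on each host; see the libvirt
   # documentation for the full certificate setup.
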
Monitoring, alerting, and reporting
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As an OpenStack virtual machine is a server image able to
be replicated across hosts, best practice in logging applies
similarly between physical and virtual hosts. Operating
system-level and application-level events should be logged,
including access events to hosts and data, user additions and
removals, changes in privilege, and others as dictated by the
environment. Ideally, you can configure these logs to export to
a log aggregator that collects log events, correlates them for
analysis, and stores them for reference or further action. One
common tool to do this is an
`ELK stack, or Elasticsearch, Logstash, and Kibana <https://www.elastic.co/>`_.
These logs should be reviewed at a regular cadence such as
a live view by a network operations center (NOC), or if the
environment is not large enough to necessitate a NOC, then logs
should undergo a regular log review process.
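
As a minimal sketch of shipping instance logs to an aggregator (the
hostname and port below are placeholders, and your collector may
expect a different protocol or format), an rsyslog forwarding rule can
be dropped onto each instance:

.. code:: console

   # Forward all syslog messages to the central collector over TCP.
   $ echo '*.* @@log-aggregator.example.com:514' > /etc/rsyslog.d/60-forward.conf
   $ systemctl restart rsyslog
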
Often, an interesting event triggers an alert that is
sent to a responder for action. Frequently this alert takes the
form of an email with the messages of interest. An interesting
event could be a significant failure, or known health indicator
of a pending failure. Two common utilities for managing alerts
are `Nagios <https://www.nagios.org>`_ and
`Zabbix <https://www.zabbix.com/>`_.

Updates and patches
~~~~~~~~~~~~~~~~~~~

A hypervisor runs independent virtual machines. This
hypervisor can run in an operating system or directly on the
hardware (called baremetal). Updates to the hypervisor are not
propagated down to the virtual machines. For example, if a
deployment is using XenServer and has a set of Debian virtual
machines, an update to XenServer will not update anything
running on the Debian virtual machines.

Therefore, we recommend that clear ownership of virtual
machines be assigned, and that those owners be responsible for
the hardening, deployment, and continued functionality of the
virtual machines. We also recommend that updates be deployed on
a regular schedule. These patches should be tested in an
environment as closely resembling production as possible to
ensure both stability and resolution of the issue behind the
patch.

Firewalls and other host-based security controls
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Most common operating systems include host-based firewalls
for additional security. While we recommend that virtual
machines run as few applications as possible (to the point of
being single-purpose instances, if possible), all applications
running on a virtual machine should be profiled to determine
what system resources the application needs access to, the
lowest level of privilege required for it to run, and what the
expected network traffic is that will be going into and coming
from the virtual machine. This expected traffic should be added
to the host-based firewall as allowed traffic (or whitelisted),
along with any necessary logging and management communication
such as SSH or RDP. All other traffic should be explicitly
denied in the firewall configuration.
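
A minimal host-based firewall sketch along these lines, assuming an
iptables-based Linux guest (the SSH-only profile is an illustrative
placeholder for whatever your application profile actually allows):

.. code:: console

   # Allow established sessions, loopback, and inbound SSH for management.
   $ iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
   $ iptables -A INPUT -i lo -j ACCEPT
   $ iptables -A INPUT -p tcp --dport 22 -j ACCEPT

   # Explicitly deny everything else inbound.
   $ iptables -P INPUT DROP
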
On Linux virtual machines, the application profile above
can be used in conjunction with a tool like
`audit2allow <http://wiki.centos.org/HowTos/SELinux#head-faa96b3fdd922004cdb988c1989e56191c257c01>`_
to build an SELinux policy that will further protect sensitive
system information on most Linux distributions. SELinux uses a
combination of users, policies, and security contexts to
compartmentalize the resources needed for an application to run,
segmenting them from system resources that are not needed.
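
A hedged sketch of that workflow (the module name is a placeholder,
and the denials must first be recorded by exercising the application,
for example while SELinux is in permissive mode):

.. code:: console

   # Build and install a local policy module from the application's
   # recorded AVC denials.
   $ grep myapp /var/log/audit/audit.log | audit2allow -M myapp_local
   $ semodule -i myapp_local.pp
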
OpenStack provides security groups for both hosts and the
network to add defense in depth to the virtual machines in a
given project. These are similar to host-based firewalls as
they allow or deny incoming traffic based on port, protocol,
and address; however, security group rules are applied to
incoming traffic only, while host-based firewall rules can
be applied to both incoming and outgoing traffic. It
is also possible for host and network security group rules to
conflict and deny legitimate traffic. We recommend ensuring
that security groups are configured correctly for the
networking being used. See :ref:`networking-security-groups`
in this guide for more detail.
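
As an illustrative sketch (the security group name, port, and remote
range are placeholders, and option names may differ slightly between
client versions), a narrowly scoped rule can be added with the
OpenStack client:

.. code:: console

   # Permit inbound SSH to members of this security group only from a
   # management network.
   $ openstack security group rule create --proto tcp --dst-port 22 \
     --remote-ip 192.0.2.0/24 app-servers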