Add attestation interface spec

Story: #2002713
Task: #22554
Change-Id: I9b89b2b7e99dfba92d8b4b2f8416bff57ee1b508
This commit is contained in:
Julia Kreger 2018-06-19 18:39:45 -07:00 committed by Leo McGann
parent 7e05695bd3
commit 4f97a6ffaa
2 changed files with 500 additions and 0 deletions

View File

@ -0,0 +1,499 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
=============================================
Attestation Interface and Keylime Integration
=============================================
https://storyboard.openstack.org/#!/story/2002713
In order to help verify that baremetal nodes are in a trustworthy
state, we are in need of an interface that allows us to take certain
actions or verification steps while proceeding along the state machine.
Some of these steps may involve calling an external attestation server,
or executing a special step during cleaning in order to ensure that a
node is owned by the attestation server.
At a high level, we need an interface of hooks. And there is no better
way than to provide a facility to execute external tooling.
Problem description
===================
Terms Glossary
--------------
In trying to bring together two unrelated services, a bit of namespace
pollution was inevitable. So for those unfamiliar with Keylime terminology and
to avoid confusion with Openstack vocabulary, we will define all the terms
needed for this spec here.
"Trusted Platform Module (TPM)" - a microcontroller within a machine which can
create and store hashes securely. All nodes looking to use Keylime for
attestation will need to have TPM 2.0.
"Integrity Measurement Architecture (IMA)" - A security subsystem in Linux
which gathers hashes of files, file metadata, and process metadata as a
'measurement' of the system. Stores the measurement in the machine's TPM. In
the context of Ironic and Keylime integration, we will need to run IMA on the
node we are attesting.
"allowlist" - A hash representing the golden state of the node. In the context
of Keylime, an allowlist is compared with an IMA measurement to see if the node
has been tampered with in an unauthorized way.
"Keylime verifier" - A component of the Keylime suite which is responsible for
comparing the allowlist to the measurement gathered from the node we are
attesting. The verifier will run on a machine external to Ironic and the node
Ironic is controlling and looking to attest.
"Keylime registrar" - A component of the Keylime suite which Ironic will need
to talk to in order to initiate the attestation workflow for a node. The
registrar also runs on a machine external to Ironic and the node. The verifier
and registrar may run on the same machine, but it is not necessary and the
decision is left to the operator.
"Keylime agent" - A component of the Keylime suite which runs on the node we
are attesting. The agent will command IMA to collect measurements and
send the measurements to the verifier.
"Keylime tenant" - An API which the Ironic conductor will need to use to
communicate with the registrar and verifier. Not to be confused with Openstack
tenants.
Introduction
------------
Presently, we rely upon a certain level of trust for users that leverage
baremetal resources. While we do perform cleaning between deployments,
a malicious attacker could potentially modify firmware of attached devices
in ways that may or may not be readily detectable.
The solution that has been proposed for this is the use of a measured launch
environments with engagement of Trusted Platform Management (TPM) modules to
help ensure that the running system profile is exactly as desired or approved,
by the attestation service.
But from a security standpoint, security is not always about code.
Sometimes security is adherence to process. To leverage TPM's for
attestation, we propose Keylime, an open source remote boot attestation and
runtime integrity measurement system.
The first step requires a new interface type 'attestation_interface'
to be added as a subclass of 'BaseDriver'. This would then come with a
'attestation_interface' implementation which would use Keylime to learn about
the security state of a node and manage configurations. All calls to the
attestation interface would happen along existing clean and deployment
workflows and simply fail transition if a node is deemed to be compromised.
The second step is a set of enhancements for the ramdisk to support TPM 2.0,
and installation of the Keylime agent. From there the Keylime agent
would communicate with the registrar and verifier. The manager would
trigger attestations at certain points along the node's workflow ex) during
the boot process. Note that in order to perform attestation, the verifier
must be within the same network as the node.
Proposed change
===============
Attestation Interface
---------------------
The addition of a ``attestation_interface`` field in the ``nodes`` table,
which maps to a `task.node.driver.attestation` interface, along with the other
standard configuration parameters and defaults behavior that exists with
the driver composition model.
Accordingly the ``attestation_interface`` would be returned on the node object
when retrieved via the REST API, and will be able to be set as another
interface.
The attestation_interface will provide a means of configuring and orchestrating
a node's connection with a verifier machine.
The Ironic controller will work under the assumption that the
network used to communicate with the attestation service is secure and
that the attestation entity is also always trustworthy. Trying to concern
ourselves with issues like replay attacks or spoofed messages is beyond
the scope of IMA attestation workflows.
To accommodate operator workflows wherein an operator may not have
access to the attestation service, we cannot allow the attestation service
perform any orchestration. This requires all communication to an
attestation service to involve the Ironic controller polling an API for a
status or instructing the attestation service or node to take action, as
opposed to receiving information from the attestation service or node
itself. For example, Keylime offers revocation frameworks for taking
action immediately upon a node being compromised. However, from
Ironic's perspective, allowing another service to do any orchestration
could put Ironic in a state where it does not know what is happening
on the node.
Presently, we are mainly concerned with monitoring deployment and
cleaning of a node. The intended workflow will be to use the interface
during these steps to ensure the firmware of a node has not been
modified.
Keylime Interface
-----------------
The Keylime interface will inherit from the AttestationInterface class. The
purpose of the interface is to allow the controller to gather relevant
information about the security state of the node and take action based on
the results. Doing so will require methods which will make calls to the
Keylime verifier through the available REST API as well as calls to the IPA
to pass necessary configuration parameters. Keylime is anticipated to be
supported by generic hardware types.
Keylime Configuration
---------------------
The Keylime verifier and Keylime registrar are two components of the Keylime
suite which must be stood up by an operator. The verifier and registrar will
need TLS connections over https in order to communicate. The Keylime tenant CLI
is installed on ironic controller. The operator will be responsible for
securing any network the registrar and verifier are setup in.
Detailed communication requirements are list as following:
Keylime tenant -> Keylime verifier: mutual TLS connection
Keylime verifier/tenant -> node: unencrypted connection
Keylime verifier/node/tenant -> registrar: mutual TLS connection for
post/put requests; unencrypted connection for get/delete requests
Every Keylime agent must have a uuid associated with it in order to register
itself with the registrar. It generates its uuid using the Keylime config
file. The uuid defaults to a random id.
Allowlist and Excludelist
-------------------------
Allowlists and Excludelists will be generated beforehand and hosted on a
remote server or in the conductor's filesystem. A filepath for the conductor's
filesystem or url to a remote server to locate such files will be supplied to
Ironic before provisioning. Allowlists may also be signed with a checksum to
ensure they have not been tampered with. Such checksums would also be
supplied to Ironic with a path or url to the file. Supplying an allowlist is
required in order to perform attestation. Excludelists are not required but
are used in a majority of Keylime use cases.
The paths of the allowlist, checksum, and excludelist can be saved in
``driver_info\keylime_allowlist``,
``driver_info\keylime_allowlist_checksum``, and
``driver_info\keylime_excludelist``.
Linux's IMA submodule gathers measurement list signed with TPM quote. The
Ironic controller will send the allowlist to the verifier using the Keylime
tenant. The Keylime verifier obtains the measurement list and performs
attestation by comparing the measurement list against allowlist.
Alternatives
------------
We could add such functionality to various interfaces, but generally
attestation will be a specific model for a deployment or portion of a
deployment, and thus we may one day have need for "vendor" specific drivers
for particular attestation solutions and workflow. As such, not creating a
new interface for this seems less ideal.
Another alternative would be to perform certain checks along state transitions.
For example, at clean time we can check the firmware and fail if things have
been modified. However, this is undesirable in a scenario where we have strict
workflows and processes we want to adhere to. In the situation where an owner
lends a node to an untrustworthy lessee the owner might want to ensure the
lessee does not perform any unexpected actions. This is also less extensible
to other workflows such as a periodic monitoring.
Data model impact
-----------------
Addition of a ``attestation_interface`` field to the node object, and this
will require a database migration to create the field. The field will
default to ``None`` which will map to a no-attestation interface.
State Machine Impact
--------------------
No impact to the state machine is expected. All calls to the new interface's
methods will take place in existing workflows driven by the state machine.
Action will be taken on a result immediately upon receiving the result.
REST API impact
---------------
The ``attestation_interface`` will be added to the node object and guarded by
an API microversion.
Client (CLI) impact
-------------------
"ironic" CLI
~~~~~~~~~~~~
None
"openstack baremetal" CLI
~~~~~~~~~~~~~~~~~~~~~~~~~
The OSC plugin will be changed accordingly to assist users in
changing the new ``attestation_interface`` field.
RPC API impact
--------------
This new ``attestation_interface`` field requires the RPC version to be
incremented.
Driver API impact
-----------------
The attestation interface methods that would be proposed would consist
of a ``no-attestation`` interface defined on a new base class
AttestationInterface.
These methods would consist of::
def validate_security_status(self, task):
"""Grabs the latest information about the node's security state
from the attestation machine. Returns nothing on success, raises
an exception if status is not what we expect or unable to reach
verifier to obtain a status.
"""
def start_attestation(self, task):
"""Grabs the allowist, allowlist checksum, and excludelist from
``driver_info`` instructions. Verifies the integrity of the allowlist
using the checksum. Attempts to send the allowlist and excludelist to
the attestation service. Sending allowlist and excludelist allows the
node to begin attesting itself. Returns nothing on success, raises an
exception if checksum does not pass or is unable to reach the
verifier to send allowlist/excludelist.
"""
def unregister_node(self, task):
"""Unregisters the node from the verifier machine. Returns
nothing on success, raises an exception if status is not what
we expect.
"""
These methods can be used during the node's cleaning and
deployment time. The action taken on a particular security state
will be configurable. Whether or not we raise an error on attestation failure
will be configurable.
A few additional variables will need to be saved as part of ``driver_info``
in order to manage the node. These include:
``driver_info\keylime_allowlist`` the allowlist for a node.
``driver_info\keylime_allowlist_checksum`` a checksum for the allowlist
to ensure the allowlist has not been tampered with.
``driver_info\keylime_excludelist`` the excludelist for a node.
``driver_info\keylime_agent_uuid`` the uuid for a Keylime agent. Needed
for querying the verifier for a security status and associating an
allowlist/excludelist pair with a node in the Keylime verifier.
Workflow
--------
With all this in mind, we have devised the following workflow for deployment/
cleaning using a Keylime implementation of the attestation interface.
Beforehand, the operator will stand up a machine with the Keylime verifier and
registrar. The user will generate their own allowlist, allowlist checksum,
and excludelist for the node. An admin may make these files available on the
same machine as the Ironic controller and pass in the filepath to
``driver_info`` or a non admin may make these files available to grab and
instead pass in a url to ``driver_info``. This step must be done before
provisioning. The operator will also pass in how to locate the Keylime
registrar and verifier to ``driver_info.``
During the image building process the node image will be set up with an
instance of the Keylime agent, as well as TPM, and IMA configurations which
will allow the Keylime agent to run. The Keylime agent will register itself
with the Keylime registrar automatically once started. At this point booting
has begun and the node may send its first heartbeat back to the Ironic
controller.
Next, start_attestation() will be called to send the allowlist and
excludelist to the verifier. The conductor will make an rpc call to the agent
to retrieve the Keylime agent's uuid, the Keylime agent's address, and the
port which the Keylime agent is listening on. The Ironic controller will save
these variables as ``driver_info\keylime_agent_uuid``,
``driver_info\keylime_agent_address``, and
``driver_info\keylime_agent_port`` for further use. If the conductor does not
receive these credentials cleaning will fail.
The allowlist and excludelist will be sent to the verifier by calling the
keylime_tenant cli programatically. Once the verifier has received the
allowlist and excludelist, attestation will begin. The verifier will
periodically poll the Keylime agent for IMA measurements and compare them
with the allowlist and excludelist to determine if the node has been tampered
with. The verifier will record the status of the node, but take no action on
the status.
At this point, the conductor may perform a validate_security_status() call to
get the status of the node. If the status is what we expect, we may proceed.
If the status is something we do not expect, or the controller is unable to
access the verifier due to network issues, we will fail the deployment.
The Keylime agent will need to be unregistered with a call to unregister_node()
to instruct the Keylime verifier to end its connection and remove the node from
its database.
Here is a diagram for the anticipated workflow:
diagram {
Image; Node; Keylime-tenant; Keylime-verifier; Keylime-registrar;
activation = none; span_height = 1; edge_length = 250;
default_note_color = white; default_fontsize = 12;
Image -> Node [label = "The node is booted with an image generated by
diskimage-builder tool. Keylime and TPM environment is setup in the image"];
Node -> Keylime-registrar [label = "Makes a post request to register the
Keylime agent on the node"];
Keylime-registrar -> Node [label = "Responses the node with an encrypted AIK"];
Node -> Keylime-registrar [label = "Makes an activation request with an
ephemeral registrar key from TPM"];
Keylime-registrar -> Node[label = "200 OK"];
Node -> Keylime-tenant [label = "First heartbeat"];
Keylime-tenant -> Keylime-tenant [label = "The allowlist and excludelist are
provided by the user to the Keylime tenant command"];
Keylime-tenant -> Keylime-verifier [label = "Sends allowlist and excludelist
and adds the Keylime agent uuid to the verifier"];
Keylime-tenant -> Node [label ="Gets TPM quote from the node to check the
Keylime agents validity with the registrar"];
Keylime-verifier -> Node [label ="Starts polling the node for verification"];
Keylime-tenant -> Keylime-verifier [label = "Gets the current status of the
node"];
}
Workflows which allow node lessees to bring their own Keylime instance in to
attest a node is theoretically possible within the framework given in this
spec. However, Keylime currently lacks certain features needed to make this
fully automated in Ironic.
Nova driver impact
------------------
None
Ramdisk impact
--------------
To have the Keylime agent work with TPM 2.0, certain libraries and
configuration must be provided. These enhancements will come as part of the
ramdisk. This includes tpm2-tss software stack, tpm2-tools utilities,
and, although not required, the tpm2-abrmd resource manager.
Keylime-agent will be setup on the ramdisk. A new dib element will be created
to install keylime-agent and make it run as a system service.
A new IPA extension will be needed to collect and send back to the conductor
the keylime_agent_uuid, keylime_agent_address, and keylime_agent_port.
Security impact
---------------
It has a positive impact on security, since we can verify if the node is
trustworthy by the attestation service.
Other end user impact
---------------------
None
Scalability impact
------------------
None
Performance Impact
------------------
None
Other deployer impact
---------------------
The ``attestation`` interface will not be enabled by default, since the default
will map to a ``no-attestation`` interface.
Config options
~~~~~~~~~~~~~~
Options for configuring whether or not cleaning and deployment
should fail in face of attestation failure will be part of the new
``[keylime]`` section
fail_clean_on_attestation_failure
Boolean to determine whether to fail clean on attestation failure
fail_deploy_on_attestation_failure
Boolean to determine whether to fail deploy on attestation failure
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
Leo McGann <ljmcgann> lmcgann@redhat.com
Danni Shi <sdanni> sdanni@redhat.com
Other contributors:
None
Work Items
----------
* Add ``attestation_interface`` database field.
* Implement base interface addition
* Implement ``no-attestation`` interface.
* Add node RPC object field
* Add API support and microversion.
* Implement Keylime attestation interface.
Dependencies
============
None
Testing
=======
Testing for this interface and basic functionality, as well as integration
testing using the ansible-keylime-tpm-emulator for TPM emulation.
Upgrades and Backwards Compatibility
====================================
No issues are anticipated.
Documentation Impact
====================
Documentation will be provided about how to use keylime-verifier and
keylime-registrar.
References
==========
https://github.com/keylime

View File

@ -0,0 +1 @@
../approved/attestation-interface.rst