telemetry: reorg and cleanup data collection

- ha notes are redundant to ha-guide
- move meter definition section under notifications as that's what
it relates to
- move the configuration of standard meters to install guide.

Change-Id: Ib09dea06c609e3611a8a9d20d5f3a1c8f757a547
gord chung 2017-02-22 14:58:59 +00:00 committed by Alexandra Settle
parent a98a174092
commit b6946421d0
4 changed files with 103 additions and 262 deletions


@ -109,9 +109,8 @@ The alarm evaluation process uses the same mechanism for workload
partitioning as the central and compute agents. The
`Tooz <https://pypi.python.org/pypi/tooz>`_ library provides the
coordination within the groups of service instances. For further
information about this approach, see the section called
:ref:`Support for HA deployment of the central and compute agent services
<ha-deploy-services>`.
information about this approach, see the `high availability guide
<https://docs.openstack.org/ha-guide/controller-ha-telemetry.html>`_.
To use this workload partitioning solution set the
``evaluation_service`` option to ``default``. For more


@ -42,7 +42,8 @@ Data collection
#. If polling many resources or at a high frequency, you can add additional
central and compute agents as necessary. The agents are designed to scale
horizontally. For more information see, :ref:`ha-deploy-services`.
horizontally. For more information refer to the `high availability guide
<https://docs.openstack.org/ha-guide/controller-ha-telemetry.html>`_.
Data storage
------------


@ -38,6 +38,7 @@ RESTful API (deprecated in Ocata)
Notifications
~~~~~~~~~~~~~
All OpenStack services send notifications about the executed operations
or system state. Several notifications carry information that can be
metered. For example, CPU time of a VM instance created by OpenStack
@ -199,35 +200,104 @@ compute service, see
telemetry/ocata/configure_services/nova/install-nova-ubuntu.html>`__ in the
Installation Tutorials and Guides.
Middleware for the OpenStack Object Storage service
---------------------------------------------------
Meter definitions
-----------------
A subset of Object Store statistics requires additional middleware to
be installed behind the proxy of Object Store. This additional component
emits notifications containing data-flow-oriented meters, namely the
``storage.objects.(incoming|outgoing).bytes`` values. These
meters are listed in :ref:`telemetry-object-storage-meter`, marked with
``notification`` as origin.
The Telemetry service collects a subset of the meters by filtering
notifications emitted by other OpenStack services. You can find the meter
definitions in a separate configuration file, called
``ceilometer/meter/data/meters.yaml``. This enables
operators/administrators to add new meters to the Telemetry project by updating
the ``meters.yaml`` file without any need for additional code changes.
The instructions on how to install this middleware can be found in
`Configure the Object Storage service for Telemetry
<https://docs.openstack.org/project-install-guide/
telemetry/ocata/configure_services/swift/install-swift-ubuntu.html>`__
section in the Installation Tutorials and Guides.
.. note::
Telemetry middleware
--------------------
The ``meters.yaml`` file should be modified with care. Unless intended,
do not remove any existing meter definitions from the file. Also, the
collected meters can differ in some cases from what is referenced in the
documentation.
Telemetry provides HTTP request and API endpoint counting
capability in OpenStack. This is achieved by
storing a sample for each event marked as ``audit.http.request``,
``audit.http.response``, ``http.request`` or ``http.response``.
A standard meter definition looks like:
It is recommended that these notifications be consumed as events rather
than samples to better index the appropriate values and avoid massive
load on the Metering database. If preferred, Telemetry can consume these
events as samples if the services are configured to emit ``http.*``
notifications.
.. code-block:: yaml

   ---
   metric:
     - name: 'meter name'
       event_type: 'event name'
       type: 'type of meter eg: gauge, cumulative or delta'
       unit: 'name of unit eg: MB'
       volume: 'path to a measurable value eg: $.payload.size'
       resource_id: 'path to resource id eg: $.payload.id'
       project_id: 'path to project id eg: $.payload.owner'
       metadata: 'additional key-value data describing resource'

The definition above shows a simple meter definition with some fields,
from which ``name``, ``event_type``, ``type``, ``unit``, and ``volume``
are required. If there is a match on the event type, samples are generated
for the meter.
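As a minimal sketch, a definition that sets only the required fields might
look like the following. The meter name ``example.resource.size`` and the
event type ``example.resource.create`` are hypothetical and serve only as
illustration; they are not definitions shipped with Telemetry.

.. code-block:: yaml

   ---
   metric:
     # Hypothetical meter: the name and event type are illustrative only.
     - name: 'example.resource.size'
       event_type: 'example.resource.create'
       type: 'gauge'
       unit: 'MB'
       # JSON path to the measured value in the notification payload
       volume: $.payload.size

In practice, a definition would usually also set ``resource_id`` and
``project_id`` so that the generated samples can be attributed to a resource
and a project.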
The ``meters.yaml`` file contains the sample
definitions for all the meters that Telemetry collects from
notifications. The value of each field is specified as a JSON path that
locates the right value in the notification message, so you need to know
the format of the consumed notification. Each JSON path starts with ``$.``.
For instance, if you need the ``size`` information from the payload, you can
define it as ``$.payload.size``.
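To illustrate how the JSON paths relate to a message, the sketch below assumes
a notification whose payload carries ``id``, ``owner``, and ``size`` keys. The
message shape, the values, and the meter and event names are assumptions made
for this example, not taken from a real service.

.. code-block:: yaml

   ---
   # Assumed notification (abridged, hypothetical values):
   #   event_type: example.resource.upload
   #   payload:
   #     id: 'abc123'
   #     owner: 'project-1'
   #     size: 2048
   metric:
     - name: 'example.resource.size'
       event_type: 'example.resource.upload'
       type: 'gauge'
       unit: 'B'
       volume: $.payload.size        # would resolve to 2048
       resource_id: $.payload.id     # would resolve to 'abc123'
       project_id: $.payload.owner   # would resolve to 'project-1'

Each path is evaluated against the whole notification, which is why every path
begins at the root (``$``) and then descends into ``payload``.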
A notification message may contain multiple meters. You can use the ``*``
wildcard in the meter definition to capture all of them and generate a
sample for each, as shown in the following example:
.. code-block:: yaml

   ---
   metric:
     - name: $.payload.measurements.[*].metric.[*].name
       event_type: 'event_name.*'
       type: 'delta'
       unit: $.payload.measurements.[*].metric.[*].unit
       volume: $.payload.measurements.[*].result
       resource_id: $.payload.target
       user_id: $.payload.initiator.id
       project_id: $.payload.initiator.project_id

In the above example, the ``name`` field is a JSON path that matches
a list of meter names defined in the notification message.
You can use complex operations on JSON paths. In the following example,
the ``volume`` and ``resource_id`` fields perform an arithmetic operation
and a string concatenation, respectively:
.. code-block:: yaml

   ---
   metric:
     - name: 'compute.node.cpu.idle.percent'
       event_type: 'compute.metrics.update'
       type: 'gauge'
       unit: 'percent'
       volume: $.payload.metrics[?(@.name='cpu.idle.percent')].value * 100
       resource_id: $.payload.host + "_" + $.payload.nodename

You can use the ``timedelta`` plug-in to evaluate the difference in seconds
between two ``datetime`` fields from one notification.
.. code-block:: yaml

   ---
   metric:
     - name: 'compute.instance.booting.time'
       event_type: 'compute.instance.create.end'
       type: 'gauge'
       unit: 'sec'
       volume:
         fields: [$.payload.created_at, $.payload.launched_at]
         plugin: 'timedelta'
       project_id: $.payload.tenant_id
       resource_id: $.payload.instance_id

Polling
~~~~~~~
@ -378,107 +448,6 @@ The list of collected meters can be found in
compute node. If ``conductor.send_sensor_data`` is set, this
misconfiguration causes duplicated IPMI sensor samples.
.. _ha-deploy-services:
Support for HA deployment
~~~~~~~~~~~~~~~~~~~~~~~~~
Both the polling agents and notification agents can run in an HA deployment,
which means that multiple instances of these services can run in
parallel with workload partitioning among these running instances.
The `Tooz <https://pypi.python.org/pypi/tooz>`__ library provides the
coordination within the groups of service instances. Tooz supports `various
drivers <https://docs.openstack.org/developer/tooz/drivers.html>`__
including the following back end solutions:
- `Zookeeper <http://zookeeper.apache.org/>`__. Recommended solution by
the Tooz project.
- `Redis <http://redis.io/>`__. Recommended solution by the Tooz
project.
- `Memcached <http://memcached.org/>`__. Recommended for testing.
You must configure a supported Tooz driver for the HA deployment of the
Telemetry services.
For information about the required configuration options that have to be
set in the ``ceilometer.conf`` configuration file for both the central
and Compute agents, see the `Coordination section
<https://docs.openstack.org/ocata/config-reference/telemetry/telemetry-config-options.html>`__
in the OpenStack Configuration Reference.
Notification agent HA deployment
--------------------------------
Workload partitioning support is particularly useful because pipeline
processing is now handled exclusively by the notification agent, which
may result in a larger load.
To enable workload partitioning by notification agent, the ``backend_url``
option must be set in the ``ceilometer.conf`` configuration file.
Additionally, ``workload_partitioning`` should be enabled in the
`Notification section <https://docs.openstack.org/ocata/config-reference/telemetry/telemetry-config-options.html>`__ in the OpenStack Configuration Reference.
The notification agent creates multiple queues to divide the workload across
all active agents. The number of queues can be controlled by the
``pipeline_processing_queues`` option in the ``ceilometer.conf`` configuration
file.
.. note::
A larger value will result in better distribution of
tasks but will also require more memory and longer startup time. It is
recommended to have a value approximately three times the number of active
notification agents. At a minimum, the value should be equal to the number
of active agents.
Polling agent HA deployment
---------------------------
.. note::
Without the ``backend_url`` option being set, only one instance each
of the central and Compute agent services is able to run and
function correctly.
The availability check of the instances is provided by heartbeat
messages. When the connection with an instance is lost, the workload
is reassigned among the remaining instances in the next polling
cycle.
.. note::
``Memcached`` uses a ``timeout`` value, which should always be set
to a value that is higher than the ``heartbeat`` value set for
Telemetry.
For backward compatibility and supporting existing deployments, the
central agent configuration also supports using different configuration
files for groups of service instances of this type that are running in
parallel. For enabling this configuration set a value for the
``partitioning_group_prefix`` option in the `polling section
<https://docs.openstack.org/ocata/config-reference/telemetry/telemetry-config-options.html>`__
in the OpenStack Configuration Reference.
.. warning::
For each sub-group of the central agent pool with the same
``partitioning_group_prefix`` a disjoint subset of meters must be
polled, otherwise samples may be missing or duplicated. The list of
meters to poll can be set in the ``/etc/ceilometer/pipeline.yaml``
configuration file. For more information about pipelines see
:ref:`telemetry-data-pipelines`.
To enable the Compute agent to run multiple instances simultaneously
with workload partitioning, the ``workload_partitioning`` option has to
be set to ``True`` under the `Compute section
<https://docs.openstack.org/ocata/config-reference/telemetry/telemetry-config-options.html>`__
in the ``ceilometer.conf`` configuration file.
Send samples to Telemetry
~~~~~~~~~~~~~~~~~~~~~~~~~
@ -545,131 +514,3 @@ following command should be invoked:
| user_id | 679b0499e7a34ccb9d90b64208401f8e |
| volume | 48.0 |
+-------------------+--------------------------------------------+
.. _telemetry-meter-definitions:
Meter definitions
-----------------
The Telemetry service collects a subset of the meters by filtering
notifications emitted by other OpenStack services. You can find the meter
definitions in a separate configuration file, called
``ceilometer/meter/data/meters.yaml``. This enables
operators/administrators to add new meters to the Telemetry project by updating
the ``meters.yaml`` file without any need for additional code changes.
.. note::
The ``meters.yaml`` file should be modified with care. Unless intended,
do not remove any existing meter definitions from the file. Also, the
collected meters can differ in some cases from what is referenced in the
documentation.
A standard meter definition looks like:
.. code-block:: yaml

   ---
   metric:
     - name: 'meter name'
       event_type: 'event name'
       type: 'type of meter eg: gauge, cumulative or delta'
       unit: 'name of unit eg: MB'
       volume: 'path to a measurable value eg: $.payload.size'
       resource_id: 'path to resource id eg: $.payload.id'
       project_id: 'path to project id eg: $.payload.owner'
       metadata: 'additional key-value data describing resource'

The definition above shows a simple meter definition with some fields,
from which ``name``, ``event_type``, ``type``, ``unit``, and ``volume``
are required. If there is a match on the event type, samples are generated
for the meter.
The ``meters.yaml`` file contains the sample
definitions for all the meters that Telemetry collects from
notifications. The value of each field is specified as a JSON path that
locates the right value in the notification message, so you need to know
the format of the consumed notification. Each JSON path starts with ``$.``.
For instance, if you need the ``size`` information from the payload, you can
define it as ``$.payload.size``.
A notification message may contain multiple meters. You can use the ``*``
wildcard in the meter definition to capture all of them and generate a
sample for each, as shown in the following example:
.. code-block:: yaml

   ---
   metric:
     - name: $.payload.measurements.[*].metric.[*].name
       event_type: 'event_name.*'
       type: 'delta'
       unit: $.payload.measurements.[*].metric.[*].unit
       volume: $.payload.measurements.[*].result
       resource_id: $.payload.target
       user_id: $.payload.initiator.id
       project_id: $.payload.initiator.project_id

In the above example, the ``name`` field is a JSON path that matches
a list of meter names defined in the notification message.
You can even use complex operations on JSON paths. In the following example,
the ``volume`` and ``resource_id`` fields perform an arithmetic operation
and a string concatenation, respectively:
.. code-block:: yaml

   ---
   metric:
     - name: 'compute.node.cpu.idle.percent'
       event_type: 'compute.metrics.update'
       type: 'gauge'
       unit: 'percent'
       volume: $.payload.metrics[?(@.name='cpu.idle.percent')].value * 100
       resource_id: $.payload.host + "_" + $.payload.nodename

You can use the ``timedelta`` plug-in to evaluate the difference in seconds
between two ``datetime`` fields from one notification.
.. code-block:: yaml

   ---
   metric:
     - name: 'compute.instance.booting.time'
       event_type: 'compute.instance.create.end'
       type: 'gauge'
       unit: 'sec'
       volume:
         fields: [$.payload.created_at, $.payload.launched_at]
         plugin: 'timedelta'
       project_id: $.payload.tenant_id
       resource_id: $.payload.instance_id

Block Storage audit script setup to get notifications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you want to collect OpenStack Block Storage notification on demand,
you can use :command:`cinder-volume-usage-audit` from OpenStack Block Storage.
This script becomes available when you install OpenStack Block Storage,
so you can use it without any specific settings and you don't need to
authenticate to access the data. To use it, you must run this command in
the following format:
.. code-block:: console
$ cinder-volume-usage-audit \
--start_time='YYYY-MM-DD HH:MM:SS' --end_time='YYYY-MM-DD HH:MM:SS' --send_actions
This script outputs which volumes or snapshots were created, deleted, or
existed in a given period of time, along with some information about these
volumes or snapshots. Information about the existence and size of
volumes and snapshots is stored in the Telemetry service. This data is
also stored as an event, which is the recommended usage because it provides
better indexing of data.
Using this script via cron you can get notifications periodically, for
example, every 5 minutes::
*/5 * * * * /path/to/cinder-volume-usage-audit --send_actions


@ -1,15 +1,15 @@
==============================
Highly available Telemetry API
==============================
==========================
Highly available Telemetry
==========================
The `Telemetry service
<https://docs.openstack.org/admin-guide/common/get-started-telemetry.html>`_
provides a data collection service and an alarming service.
Telemetry central agent
Telemetry polling agent
~~~~~~~~~~~~~~~~~~~~~~~
The Telemetry central agent can be configured to partition its polling
The Telemetry polling agent can be configured to partition its polling
workload between multiple agents. This enables high availability (HA).
Both the central and the compute agent can run in an HA deployment.