[docs] Edits the StackLight Collector plugin guide

Edits the following sections of the StackLight Collector
plugin 0.10.0 documentation:

* Introduction
* Requirements
* Prerequisites
* Limitations
* Release notes
* Licenses
* References

Change-Id: Ifd19b71bb120d46713bc2c983114ed08206cd0b9
(cherry picked from commit 988b67ed30)
Maria Zlatkova 2016-07-14 19:11:00 +03:00
parent c4d9ff5b99
commit 8e45b8db96
7 changed files with 166 additions and 152 deletions

@ -3,82 +3,78 @@
Introduction
------------
The **StackLight Collector Plugin** is used to install and configure
several software components that are used to collect and process all the
data that we think is relevant to provide deep operational insights about
your OpenStack environment. These finely integrated components are
collectively referred to as the **StackLight Collector** (or just **the Collector**).
The **StackLight Collector Plugin for Fuel** is used to install and configure
several software components that are used to collect and process all the data
that is relevant to provide deep operational insights about your OpenStack
environment. These finely integrated components are collectively referred to
as the **StackLight Collector**, or just **the Collector**.
.. note:: The Collector has evolved over time and so the term
'collector' is a little bit of a misnomer since it is
more of a **smart monitoring agent** than a mere data 'collector'.
.. note:: The Collector has evolved over time, so the term *collector* is a
little bit of a misnomer since it is more of a *smart monitoring agent*
than a mere data *collector*.
The Collecor is a key component of the so-called
The Collector is the key component of the so-called
`Logging, Monitoring and Alerting toolchain of Mirantis OpenStack
<https://launchpad.net/lma-toolchain>`_ (a.k.a StackLight).
<https://launchpad.net/lma-toolchain>`_, also known as StackLight.
.. image:: ../../images/toolchain_map.png
:align: center
:width: 90%
:width: 80%
The Collector is installed on every node of your OpenStack
environment. Each Collector is individually responsible for supporting
all the monitoring functions of your OpenStack environment for both
the operating system and the services running on the node.
Note also that the Collector running on the *primary controller*
(the controller which owns the management VIP) is called the
**Aggregator** since it performs additional aggregation and correlation
functions. The Aggregator is the central point of convergence for
all the faults and anomalies detected at the node level. The
fundamental role of the Aggregator is to issue an opinion about the
health status of your OpenStack environment at the cluster
level. As such, the Collector may be viewed as a monitoring
The Collector is installed on every node of your OpenStack environment. Each
Collector is individually responsible for supporting all the monitoring
functions of your OpenStack environment for both the operating system and the
services running on the node. The Collector running on the *primary controller*
(the controller which owns the management VIP) is called the **Aggregator**
since it performs additional aggregation and correlation functions. The
Aggregator is the central point of convergence for all the faults and
anomalies detected at the node level. The fundamental role of the Aggregator
is to issue an opinion about the health status of your OpenStack environment
at the cluster level. As such, the Collector may be viewed as a monitoring
agent for cloud infrastructure clusters.
The main building blocks of the Collector are:
The main building blocks of the Collector are as follows:
* **collectd** which comes bundled with a collection of monitoring plugins.
Some of them are standard collectd plugins while others are purpose-built
plugins written in python to perform various OpenStack services checks.
* **Heka**, `a golang data processing swiss army knife by Mozilla
<https://github.com/mozilla-services/heka>`_.
Heka supports a number of standard input and output plugins
that allows to ingest data from a variety of sources
including collectd, log files and RabbitMQ,
as well as to persist the operational data to external backend servers like
Elasticsearch, InfluxDB and Nagios for search and further processing.
* **A collection of Heka plugins** written in Lua which does
the actual data processing such as running metrics transformations,
running alarms and logs parsing.
* The **collectd** daemon, which comes bundled with a collection of monitoring
plugins. Some of them are standard collectd plugins while others are
purpose-built plugins written in Python to perform various OpenStack
services checks.
* **Heka**, `a golang data-processing multifunctional tool by Mozilla
<https://github.com/mozilla-services/heka>`_. Heka supports a number of
standard input and output plugins that allow it to ingest data from a variety
of sources including collectd, log files, and RabbitMQ, as well as to
persist the operational data to external back-end servers like Elasticsearch,
InfluxDB, and Nagios for search and further processing.
* **A collection of Heka plugins** written in Lua, which perform the actual
data processing such as running metrics transformations, running alarms, and
logs parsing.
.. note:: An important function of the Collector is to normalize
the operational data into an internal `Heka message structure
<https://hekad.readthedocs.io/en/stable/message/index.html>`_
representation that can be ingested into the Heka's stream processing
pipeline. The stream processing pipeline uses matching policies to
representation that can be ingested into the Heka's stream-processing
pipeline. The stream-processing pipeline uses matching policies to
route the Heka messages to the `Lua <http://www.lua.org/>`_ plugins that
will perform the actual data computation functions.
perform the actual data-computation functions.
There are three types of Lua plugins that were developed for the Collector:
The following Lua plugins were developed for the Collector:
* The **decoder plugins** to sanitize and normalize the ingested data.
* The **filter plugins** to process the data.
* The **encoder plugins** to serialize the data that is
sent to the backend servers.
* **decoder plugins** sanitize and normalize the ingested data.
* **filter plugins** process the data.
* **encoder plugins** serialize the data that is sent to the back-end servers.
There are five types of data sent by the Collector (and the Aggregator)
to the backend servers:
The following are the types of data sent by the Collector (and the Aggregator)
to the back-end servers:
* The logs and the notifications, which are referred to as events,
sent to Elasticsearch for indexing.
* The logs and the notifications, which are referred to as events, sent to
Elasticsearch for indexing.
* The metric time series sent to InfluxDB.
* The annotation sent to InfluxDB.
* The OpenStack environment clusters health status
sent as *passive checks* to Nagios
* The annotations sent to InfluxDB.
* The OpenStack environment clusters health status sent as *passive checks*
to Nagios.
.. note:: The annotations are like notification messages
which are exposed in Grafana. They contain information about the
anomalies and faults that have been detected by the Collector.
They basicaly contain the same information as the *passive checks*
sent to Nagios. In addition, they may contain 'hints' about what
the Collector think could be the root cause of a problem.
.. note:: The annotations are like notification messages that are exposed in
Grafana. They contain information about the anomalies and faults that have
been detected by the Collector. Annotations basically contain the same
information as the *passive checks* sent to Nagios. In addition, they may
contain hints on what can be the root cause of a problem.
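
The decoder, filter, and encoder stages described in this section can be pictured with a minimal sketch. This is not the plugin's code: the real plugins are written in Lua and run inside Heka, whereas the sketch below is plain Python, and every name in it (``decode``, ``make_threshold_filter``, ``encode``, ``run_pipeline``, the ``disk_used_percent`` metric, and the 90.0 threshold) is a hypothetical placeholder chosen for illustration.

```python
import json
from typing import Any, Callable, Dict, List, Optional, Tuple

Message = Dict[str, Any]
Matcher = Callable[[Message], bool]
Plugin = Callable[[Message], Optional[Message]]


def decode(raw_line: str) -> Optional[Message]:
    """Decoder: sanitize and normalize a raw 'name value' sample."""
    parts = raw_line.split()
    if len(parts) != 2:
        return None  # malformed input is discarded
    name, raw_value = parts
    try:
        value = float(raw_value)
    except ValueError:
        return None
    return {"type": "metric", "name": name.lower(), "value": value}


def make_threshold_filter(limit: float) -> Plugin:
    """Filter: turn a metric above `limit` into an alarm message."""
    def threshold_filter(msg: Message) -> Optional[Message]:
        if msg["value"] > limit:
            return {"type": "alarm", "name": msg["name"],
                    "value": msg["value"], "severity": "warning"}
        return None
    return threshold_filter


def encode(msg: Message) -> str:
    """Encoder: serialize a message for a back-end server."""
    return json.dumps(msg, sort_keys=True)


def run_pipeline(raw_lines: List[str],
                 routes: List[Tuple[Matcher, Plugin]]) -> List[str]:
    """Decode each line, route it by matcher, and encode plugin results."""
    encoded = []
    for raw_line in raw_lines:
        msg = decode(raw_line)
        if msg is None:
            continue
        for matches, plugin in routes:
            if matches(msg):
                result = plugin(msg)
                if result is not None:
                    encoded.append(encode(result))
    return encoded


# Hypothetical matching policy: only this one metric reaches the filter.
routes = [
    (lambda m: m["type"] == "metric" and m["name"] == "disk_used_percent",
     make_threshold_filter(90.0)),
]
alarms = run_pipeline(
    ["DISK_USED_PERCENT 93", "disk_used_percent 40",
     "not a metric line", "cpu_idle 99"],
    routes,
)
```

Here the matcher plays the role of the matching policy: a message reaches a filter only if the matcher accepts it, and only filter results are serialized for the back end.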

@ -7,7 +7,7 @@ Third-party components
++++++++++++++++++++++
+----------------------------+------------------------------------------+------------------------+
| Name | Project Web Site | License |
| Name | Project website | License |
+============================+==========================================+========================+
| Heka | https://github.com/mozilla-services/heka | Mozilla Public License |
+----------------------------+------------------------------------------+------------------------+
@ -54,7 +54,7 @@ Puppet modules
++++++++++++++
+-----------------------+-----------------------------------------------------+-----------+
| Name | Project Web Site | License |
| Name | Project website | License |
+=======================+=====================================================+===========+
| puppet-collectd | https://github.com/puppet-community/puppet-collectd | Apache v2 |
+-----------------------+-----------------------------------------------------+-----------+

@ -3,9 +3,12 @@
Limitations
-----------
* The plugin is not compatible with an OpenStack environment deployed with nova-network.
The StackLight Collector plugin 0.10.0 has the following limitations:
* When you re-execute tasks on deployed nodes using the Fuel CLI, the *collectd*
processes will be restarted on these nodes during the post-deployment
phase. See `bug #1570850
<https://bugs.launchpad.net/lma-toolchain/+bug/1570850>`_ for details.
* The plugin is not compatible with an OpenStack environment deployed with
nova-network.
* When you re-execute tasks on deployed nodes using the Fuel CLI, the
*collectd* processes will be restarted on these nodes during the
post-deployment phase.
See `bug #1570850 <https://bugs.launchpad.net/lma-toolchain/+bug/1570850>`_.

@ -3,9 +3,9 @@
Prerequisites
-------------
Prior to installing the StackLight Collector Plugin,
you may want to install the backend services the *collector* uses
to store the data. These backend services include:
Prior to installing the StackLight Collector plugin for Fuel, you may want to
install the back-end services the *collector* uses to store the data. These
back-end services include the following:
* Elasticsearch
* InfluxDB
@ -13,12 +13,14 @@ to store the data. These backend services include:
There are two installation options:
1. Install the backend services automatically within a Fuel environment using the Fuel Plugins listed below.
#. Install the back-end services automatically within a Fuel environment using
the following Fuel plugins:
* `StackLight Elasticsearch-Kibana Fuel Plugin Installation Guide <http://fuel-plugin-elasticsearch-kibana.readthedocs.io/en/latest/installation.html#installation-guide>`_.
* `StackLight InfluxDB-Grafana Fuel Plugin Installation Guide <http://fuel-plugin-influxdb-grafana.readthedocs.io/en/latest/installation.html#installation-guide>`_.
* `StackLight Infrastructure Alerting Fuel Plugin Installation Guide <http://fuel-plugin-lma-infrastructure-alerting.readthedocs.io/en/latest/installation.html#installation-guide>`_.
* `StackLight Elasticsearch-Kibana Fuel Plugin Installation Guide <http://fuel-plugin-elasticsearch-kibana.readthedocs.io/en/latest/installation.html#installation-guide>`_
* `StackLight InfluxDB-Grafana Fuel Plugin Installation Guide <http://fuel-plugin-influxdb-grafana.readthedocs.io/en/latest/installation.html#installation-guide>`_
* `StackLight Infrastructure Alerting Fuel Plugin Installation Guide <http://fuel-plugin-lma-infrastructure-alerting.readthedocs.io/en/latest/installation.html#installation-guide>`_
2. Install the backend services on your own outside of a Fuel environment.
Note that in this case, the installation must comply with the StackLight Collector
Plugin's :ref:`requirements <plugin_requirements>`.
#. Install the back-end services manually outside of a Fuel environment.
In this case, the installation must comply with the
:ref:`requirements <plugin_requirements>` of the StackLight Collector
plugin.

@ -7,12 +7,12 @@
References
----------
* The `StackLight Collector plugin <https://github.com/openstack/fuel-plugin-lma-collector>`_ project at GitHub.
* The `StackLight Elasticsearch-Kibana plugin <https://github.com/openstack/fuel-plugin-elasticsearch-kibana>`_ project at GitHub.
* The `StackLight InfluxDB-Grafana plugin <https://github.com/openstack/fuel-plugin-influxdb-grafana>`_ project at GitHub.
* The `StackLight Infrastructure Alerting plugin <https://github.com/openstack/fuel-plugin-lma-Infrastructure-alerting>`_ project at GitHub.
* The official `Kibana documentation <https://www.elastic.co/guide/en/kibana/3.0/index.html>`_.
* The official `Elasticsearch documentation <https://www.elastic.co/guide/en/elasticsearch/reference/1.4/index.html>`_.
* The official `InfluxDB documentation <https://docs.influxdata.com/influxdb/v0.10/>`_.
* The official `Grafana documentation <http://docs.grafana.org/v2.6/>`_.
* The official `Nagios documentation <https://www.nagios.org/documentation/>`_.
* The `StackLight Collector plugin <https://github.com/openstack/fuel-plugin-lma-collector>`_ project at GitHub
* The `StackLight Elasticsearch-Kibana plugin <https://github.com/openstack/fuel-plugin-elasticsearch-kibana>`_ project at GitHub
* The `StackLight InfluxDB-Grafana plugin <https://github.com/openstack/fuel-plugin-influxdb-grafana>`_ project at GitHub
* The `StackLight Infrastructure Alerting plugin <https://github.com/openstack/fuel-plugin-lma-Infrastructure-alerting>`_ project at GitHub
* The official `Kibana documentation <https://www.elastic.co/guide/en/kibana/3.0/index.html>`_
* The official `Elasticsearch documentation <https://www.elastic.co/guide/en/elasticsearch/reference/1.4/index.html>`_
* The official `InfluxDB documentation <https://docs.influxdata.com/influxdb/v0.10/>`_
* The official `Grafana documentation <http://docs.grafana.org/v2.6/>`_
* The official `Nagios documentation <https://www.nagios.org/documentation/>`_

@ -1,119 +1,126 @@
.. _release_notes:
.. raw:: latex
\pagebreak
Release notes
-------------
Version 0.10.0
++++++++++++++
* Changes
In addition to bug fixes, the StackLight Collector plugin 0.10.0 for Fuel
contains the following updates:
* Separate processing pipeline for logs and metrics
* Separated the processing pipeline for logs and metrics.
Prior to StackLight version 0.10.0, there was one instance of the *hekad*
process running to process both the logs and the metrics. Starting with StackLight
version 0.10.0, the processing of the logs and notifications is separated
from the processing of the metrics in two different *hekad* instances.
This allows for better performance and control of the flow when the
maximum buffer size on disk has reached a limit. With the *hekad* instance
processing the metrics, the buffering policy mandates to drop the metrics
when the maximum buffer size is reached. With the *hekad* instance
processing the logs, the buffering policy mandates to block the
entire processing pipeline. This way, we can avoid
losing logs (and notifications) when the Elasticsearch
server is inaccessible for a long period of time.
As a result, the StackLight collector has now two processes running
on the node:
Prior to StackLight version 0.10.0, there was one instance of the *hekad*
process running to process both the logs and the metrics. Starting with
StackLight version 0.10.0, the processing of the logs and notifications is
separated from the processing of the metrics in two different *hekad*
instances. This allows for better performance and control of the flow when
the on-disk buffer reaches its maximum size. With the *hekad*
instance processing the metrics, the buffering policy is to drop the
metrics when the maximum buffer size is reached. With the *hekad* instance
processing the logs, the buffering policy is to block the entire
processing pipeline. This helps to avoid losing logs (and notifications)
when the Elasticsearch server is inaccessible for a long period of time.
As a result, the StackLight collector now has two processes running
on the node:
* One for the *log_collector* service
* One for the *metric_collector* service
* One for the *log_collector* service
* One for the *metric_collector* service
* Metrics derived from logs are aggregated by the *log_collector* service.
* The metrics derived from logs are now aggregated by the *log_collector*
service.
To avoid flooding the *metric_collector* with bursts of metrics derived
from logs, the *log_collector* service sends metrics by bulk to the
*metric_collector* service.
An example of aggregated metric derived from logs is the
`openstack_<service>_http_response_time_stats
<http://fuel-plugin-lma-collector.readthedocs.io/en/latest/appendix_b.html#api-response-times>`_.
To avoid flooding the *metric_collector* with bursts of metrics derived from
logs, the *log_collector* service sends metrics in bulk to the
*metric_collector* service. An example of an aggregated metric derived from
logs is the `openstack_<service>_http_response_time_stats
<http://fuel-plugin-lma-collector.readthedocs.io/en/latest/appendix_b.html#api-response-times>`_.
* Diagnostic tool
* Added a diagnostic tool.
A diagnostic tool is now available to help diagnose problems.
The diagnostic tool checks that the toolchain is properly installed
and configured across the entire LMA toolchain. Please check the
`Diagnostic Tool
<http://fuel-plugin-lma-collector.readthedocs.io/en/latest/configuration.html#diagnostic>`_
section of the User Guide for more information.
* Bug fixes
A diagnostic tool is now available to help diagnose issues. The diagnostic
tool checks that all components are properly installed and configured across
the entire LMA toolchain. For more information, see the
`Diagnostic Tool
<http://fuel-plugin-lma-collector.readthedocs.io/en/latest/configuration.html#diagnostic>`_
section of the User Guide.
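
The two buffering policies described in the 0.10.0 notes above (drop for metrics, block for logs) can be illustrated with a small sketch. It is only a model of the behavior, not Heka's implementation or configuration: the ``BoundedBuffer`` class, its ``on_full`` parameter, and the item-count limit are assumptions made for the example (the real buffer is on disk and is limited by size).

```python
from collections import deque
from typing import Any, Deque, List


class BoundedBuffer:
    """Bounded queue with a 'drop' or 'block' policy when full (illustrative)."""

    def __init__(self, max_items: int, on_full: str) -> None:
        if on_full not in ("drop", "block"):
            raise ValueError("on_full must be 'drop' or 'block'")
        self.max_items = max_items
        self.on_full = on_full
        self.items: Deque[Any] = deque()
        self.dropped = 0

    def offer(self, item: Any) -> bool:
        """Return True if the producer may continue, False if it must wait."""
        if len(self.items) < self.max_items:
            self.items.append(item)
            return True
        if self.on_full == "drop":
            self.dropped += 1  # the item is lost, the pipeline keeps flowing
            return True
        return False  # nothing is lost, but the producer must retry later

    def drain(self, count: int) -> List[Any]:
        """Simulate the back end accepting up to `count` buffered items."""
        drained = []
        while self.items and len(drained) < count:
            drained.append(self.items.popleft())
        return drained


# Metrics: losing samples is acceptable, stalling the pipeline is not.
metric_buffer = BoundedBuffer(max_items=2, on_full="drop")
metric_results = [metric_buffer.offer(n) for n in range(4)]

# Logs: the pipeline stalls until the back end is reachable again.
log_buffer = BoundedBuffer(max_items=2, on_full="block")
log_results = [log_buffer.offer(n) for n in range(4)]
```

With the drop policy, every ``offer`` call succeeds and the overflow is counted in ``dropped``. With the block policy, ``offer`` starts returning ``False`` once the buffer is full, so the producer keeps the item and retries after ``drain`` frees space.
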
Version 0.9.0
+++++++++++++
* Changes
The StackLight Collector plugin 0.9.0 for Fuel contains the following updates:
* Upgrade to Heka *0.10.0*.
* Upgraded to Heka *0.10.0*.
* Collect libvirt metrics on compute nodes.
* Added the capability to collect libvirt metrics on compute nodes.
* Detect spikes of errors in the OpenStack services logs.
* Added the capability to detect spikes of errors in the OpenStack services
logs.
* Report OpenStack workers status per node.
* Added the capability to report OpenStack workers status per node.
* Support multi-environment deployments.
* Added support for multi-environment deployments.
* Add support for Sahara logs and notifications.
* Added support for Sahara logs and notifications.
* Bug fixes
* Bug fixes:
* Reconnect to the local RabbitMQ instance if the connection has been lost
(`#1503251 <https://bugs.launchpad.net/lma-toolchain/+bug/1503251>`_).
* Added the capability to reconnect to the local RabbitMQ instance if the
connection has been lost.
See `#1503251 <https://bugs.launchpad.net/lma-toolchain/+bug/1503251>`_.
* Enable buffering for Elasticsearch, InfluxDB, Nagios and TCP outputs to reduce
congestion in the Heka pipeline (`#1488717
<https://bugs.launchpad.net/lma-toolchain/+bug/1488717>`_, `#1557388
<https://bugs.launchpad.net/lma-toolchain/+bug/1557388>`_).
* Enabled buffering for Elasticsearch, InfluxDB, Nagios, and TCP outputs to
reduce congestion in the Heka pipeline.
See `#1488717 <https://bugs.launchpad.net/lma-toolchain/+bug/1488717>`_,
`#1557388 <https://bugs.launchpad.net/lma-toolchain/+bug/1557388>`_.
* Return the correct status for Nova when Midonet is used (`#1531541
<https://bugs.launchpad.net/lma-toolchain/+bug/1531541>`_).
* Fixed the status for Nova when Midonet is used.
See `#1531541 <https://bugs.launchpad.net/lma-toolchain/+bug/1531541>`_.
* Return the correct status for Neutron when Contrail is used (`#1546017
<https://bugs.launchpad.net/lma-toolchain/+bug/1546017>`_).
* Fixed the status for Neutron when Contrail is used.
See `#1546017 <https://bugs.launchpad.net/lma-toolchain/+bug/1546017>`_.
* Increase the maximum number of file descriptors (`#1543289
<https://bugs.launchpad.net/lma-toolchain/+bug/1543289>`_).
* Increased the maximum number of file descriptors.
See `#1543289 <https://bugs.launchpad.net/lma-toolchain/+bug/1543289>`_.
* Avoid spawning several hekad processes (`#1561109
<https://bugs.launchpad.net/lma-toolchain/+bug/1561109>`_).
* Prevented the spawning of several hekad processes.
See `#1561109 <https://bugs.launchpad.net/lma-toolchain/+bug/1561109>`_.
* Remove the monitoring of individual queues of RabbitMQ (`#1549721
<https://bugs.launchpad.net/lma-toolchain/+bug/1549721>`_).
* Removed the monitoring of individual queues of RabbitMQ. See `#1549721
<https://bugs.launchpad.net/lma-toolchain/+bug/1549721>`_.
* Rotate hekad logs every 30 minutes if necessary (`#1561603
<https://bugs.launchpad.net/lma-toolchain/+bug/1561603>`_).
* Added the capability to rotate hekad logs every 30 minutes if necessary.
See `#1561603 <https://bugs.launchpad.net/lma-toolchain/+bug/1561603>`_.
Version 0.8.0
+++++++++++++
* Support for alerting in two different modes:
The StackLight Collector plugin 0.8.0 for Fuel contains the following updates:
* Email notifications.
* Added support for alerting in two different modes:
* Integration with Nagios.
* Email notifications
* Upgrade to InfluxDB 0.9.5.
* Integration with Nagios
* Upgrade to Grafana 2.5.
* Upgraded to InfluxDB 0.9.5.
* Management of the LMA collector service by Pacemaker on the controller nodes for improved reliability.
* Upgraded to Grafana 2.5.
* Management of the LMA collector service by Pacemaker on the controller nodes
for improved reliability.
* Monitoring of the LMA toolchain components (self-monitoring).
* Support for configurable alarm rules in the Collector.
* Added support for configurable alarm rules in the Collector.
Version 0.7.0
+++++++++++++
* Initial release of the plugin. This is a beta version.
The initial release of the StackLight Collector plugin. This is a beta version.

@ -1,8 +1,14 @@
.. _plugin_requirements:
.. raw:: latex
\pagebreak
Requirements
------------
The StackLight Collector plugin 0.10.0 has the following requirements:
+-------------------------------------------------------+-------------------------------------------------------------------+
| Requirement | Version/Comment |
+=======================================================+===================================================================+