Plugin configuration
--------------------
|
|
To configure your plugin, you need to follow these steps:
|
|
#. `Create a new environment <http://docs.mirantis.com/openstack/fuel/fuel-8.0/user-guide.html#launch-wizard-to-create-new-environment>`_
   with the Fuel web user interface.
|
|
#. Click the **Settings** tab and select the **Other** category.
|
|
#. Scroll down through the settings until you find the **LMA Infrastructure Alerting
   Plugin** section. You should see a page like this.
|
|
   .. image:: ../images/lma_infrastructure_alerting_settings.png
      :width: 800
      :align: center
|
|
#. Check the *LMA Infrastructure Alerting Plugin* box and fill in the required fields
   as indicated below.
|
|
   a. Change the Nagios web interface password (recommended).
   #. Check the boxes corresponding to the types of notification you would
      like to be alerted for by email (*CRITICAL*, *WARNING*, *UNKNOWN*, *RECOVERY*).
   #. Specify the recipient email address for the alerts.
   #. Specify the sender email address for the alerts.
   #. Specify the SMTP server address and port.
   #. Specify the SMTP authentication method.
   #. Specify the SMTP username and password (required if the authentication method
      isn't *None*).
|
|
#. When you are done with the settings, scroll down to the bottom of the page and click
   the **Save Settings** button.
|
|
#. Click the *Nodes* tab and assign the *LMA Infrastructure Alerting* role to nodes
   as shown below. In this example, the *Infrastructure_Alerting*
   role is assigned to three different nodes along with the *Elasticsearch_Kibana* role
   and the *InfluxDB_Grafana* role. This means that the three plugins of the LMA toolchain
   can be installed on the same nodes.
|
|
|
|
   .. image:: ../images/lma_infrastructure_alerting_role.png
      :width: 800
      :align: center
|
|
   .. note:: You can assign the *Infrastructure_Alerting* role to up to three nodes.
      Nagios clustering for high availability requires that you assign
      the *Infrastructure_Alerting* role to at least three nodes. Note also that
      it is possible to add or remove a node with the *Infrastructure_Alerting*
      role after deployment.
|
|
#. Click **Apply Changes**.
|
|
#. Adjust the disk configuration if necessary (see the `Fuel User Guide
   <http://docs.mirantis.com/openstack/fuel/fuel-8.0/user-guide.html#disk-partitioning>`_
   for details). By default, the *LMA Infrastructure Alerting Plugin* allocates:
|
|
   * 20% of the first available disk for the operating system, within a range of
     15GB minimum and 50GB maximum,
   * 10GB for */var/log*,
   * at least 20GB for the Nagios data in */var/nagios*.
|
|
#. `Configure your environment <http://docs.mirantis.com/openstack/fuel/fuel-8.0/user-guide.html#configure-your-environment>`_
   as needed.
|
|
#. `Verify the networks <http://docs.mirantis.com/openstack/fuel/fuel-8.0/user-guide.html#verify-networks>`_
   on the Networks tab of the Fuel web UI.
|
|
#. Finally, `Deploy <http://docs.mirantis.com/openstack/fuel/fuel-8.0/user-guide.html#deploy-changes>`_ your changes.
|
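The default operating system allocation described in the disk configuration step is 20% of the first available disk, kept between 15GB and 50GB. The Bash sketch below reproduces that arithmetic so you can estimate the result for a given disk size. It is not the plugin's own partitioning code, and rounding down to whole gigabytes is an assumption of the sketch.

.. code-block:: bash

   # Estimate the operating system allocation (in GB) on the first disk:
   # 20% of the disk size, kept within 15GB (minimum) and 50GB (maximum).
   # Integer arithmetic: fractions of a GB are rounded down (assumption).
   os_partition_gb() {
     local disk_gb="$1"
     local size=$(( disk_gb * 20 / 100 ))
     if [ "$size" -lt 15 ]; then
       size=15
     elif [ "$size" -gt 50 ]; then
       size=50
     fi
     echo "$size"
   }

For example, ``os_partition_gb 50`` prints ``15`` (10GB raised to the minimum), ``os_partition_gb 100`` prints ``20`` and ``os_partition_gb 500`` prints ``50``. The */var/log* and Nagios data allocations are listed separately in that step and are not part of this figure.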
|
.. _plugin_install_verification:
|
|
Plugin verification
-------------------

Be aware that, depending on the number of nodes and the deployment setup,
deploying a Mirantis OpenStack environment can typically take anywhere
from 30 minutes to several hours. Once your deployment is complete,
you should see a deployment success notification message with
a link to the Nagios dashboard as shown below.
|
|
.. image:: ../images/deployment_notification.png
   :align: center
   :width: 800
|
|
From the Fuel web UI **Dashboard** view, click the **Nagios** link.
Once you have authenticated (the username is ``nagiosadmin`` and the
password is the one defined in the settings of the plugin), you should be directed to
the *Nagios Home Page* as shown below.
|
|
|
|
.. image:: ../images/nagios_homepage.png
   :align: center

Managing Nagios
---------------
|
|
You can get the current status of the OpenStack environment by clicking
the *Services* menu item as shown below.
|
|
.. image:: ../images/nagios_services.png
   :align: center
   :width: 800
|
|
The *LMA Infrastructure Alerting Plugin* configures Nagios for all the
hosts and services that have been deployed in the environment. The alarms (or
service checks in Nagios terms) are configured in **passive mode** because their
results are sent to Nagios by the *LMA Collector* and *Aggregator* instead of
being polled by Nagios (see the `LMA Collector documentation
<http://fuel-plugin-lma-collector.readthedocs.org/>`_ for more details).
|
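In passive mode, Nagios does not poll the services itself; it only records the results submitted to it. To observe this behavior on a test environment, the following Bash sketch submits one passive result by hand with the standard Nagios ``PROCESS_SERVICE_CHECK_RESULT`` external command. It assumes the Debian/Ubuntu ``nagios3`` command file location (``/var/lib/nagios3/rw/nagios.cmd``) and that external commands are enabled (``check_external_commands=1``). It is not a description of how the *LMA Collector* itself delivers its results.

.. code-block:: bash

   # Submit one passive service check result to Nagios by writing an
   # external command to the Nagios command file.
   # Arguments: host name, service description, return code
   # (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN) and plugin output text.
   # NAGIOS_CMD_FILE defaults to the Debian/Ubuntu nagios3 path (assumption).
   submit_passive_check() {
     local host="$1" service="$2" code="$3" output="$4"
     local cmd_file="${NAGIOS_CMD_FILE:-/var/lib/nagios3/rw/nagios.cmd}"
     case "$code" in
       0|1|2|3) ;;
       *) echo "invalid return code: $code" >&2; return 1 ;;
     esac
     # Format: [epoch] PROCESS_SERVICE_CHECK_RESULT;host;service;code;output
     printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
       "$(date +%s)" "$host" "$service" "$code" "$output" >> "$cmd_file"
   }

For example, ``submit_passive_check 00-global-clusters-env1 'hypothetical-service' 1 'WARNING: manual test'`` would report a *WARNING* state until the next result arrives. The environment ID and the service description are placeholders to replace with values shown in your Nagios UI.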
|
.. note:: The alert notifications for the nodes and clusters of nodes are
   disabled by default to avoid alert fatigue and because they are not
   necessarily indicative of a condition affecting the overall health state
   of an OpenStack service cluster. If you nonetheless want to enable those alerts,
   go to the service details page and click the *Enable notifications
   for this service* link within the *Service Commands* panel as shown below.
|
|
.. image:: ../images/nagios_enable_notifs.png
   :align: center
   :width: 800
|
|
There are also two *Virtual Hosts* representing the health state of the
*service clusters* and *node clusters*:
|
|
* *00-global-clusters-env${ENVID}* for the service clusters like the Nova
  cluster, the Keystone cluster, the RabbitMQ cluster and so on.
|
|
* *00-node-clusters-env${ENVID}* for the physical node clusters like the
  cluster of controller nodes, the cluster of storage nodes and so on.
|
|
These *Virtual Hosts* offer a high-level view of the health state of
those clusters in the OpenStack environment.
|
|
Configuring service checks on InfluxDB metrics
----------------------------------------------
|
|
You can configure additional alarms (other than those already defined in the
*LMA Collector*) based on the metrics stored in the InfluxDB database. For
example, you could define an alert to be notified when the CPU activity of a
particular process crosses a particular threshold. Say you would like to set a
'warning' alarm at 30% of system CPU usage and a 'critical' alarm at 50% of
system CPU usage for the Elasticsearch process. The steps to define those
alarms in Nagios would be as follows:
|
|
#. Connect to the *LMA Infrastructure Alerting* node.
|
|
      Total Warnings: 0
      Total Errors: 0
|
|
      Things look okay - No serious problems were detected during the pre-flight check
|
|
#. Restart the Nagios server::
|
|
      [root@node-13 ~]# /etc/init.d/nagios3 restart
|
|
#. Go to the Nagios dashboard and verify that the service check has been added.
|
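Because Nagios refuses to start with an invalid configuration, you may prefer to combine the pre-flight check and the restart shown in the steps above, so that the restart only happens when the check passes. A minimal Bash sketch reusing the same commands; the ``NAGIOS_BIN`` and ``NAGIOS_INIT`` overrides are hypothetical conveniences of the sketch, not plugin settings.

.. code-block:: bash

   # Restart Nagios only if its configuration passes the pre-flight check.
   # Relies on "nagios3 -v" exiting with a non-zero status on errors.
   restart_nagios_if_valid() {
     local cfg="${1:-/etc/nagios3/nagios.cfg}"
     local nagios_bin="${NAGIOS_BIN:-nagios3}"
     local init_script="${NAGIOS_INIT:-/etc/init.d/nagios3}"
     if "$nagios_bin" -v "$cfg"; then
       "$init_script" restart
     else
       echo "Nagios configuration check failed; not restarting." >&2
       return 1
     fi
   }

Run it as ``restart_nagios_if_valid`` on the *LMA Infrastructure Alerting* node. The pre-flight output is still printed, so you can read the reported errors when the restart is skipped.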
|
From there, you can define additional service checks for different hosts or
host groups using the same ``check_influx`` command. You just need to provide
these three required arguments when defining new service checks:
|
|
* A valid InfluxDB query that returns only one row with a single value.
  Check the `InfluxDB documentation <https://influxdb.com/docs/v0.10/query_language>`_
  to learn how to use the InfluxDB query language.
* A range specification for the warning threshold.
* A range specification for the critical threshold.
|
|
.. note:: Threshold ranges are defined following the `Nagios format
   <https://nagios-plugins.org/doc/guidelines.html#THRESHOLDFORMAT>`_.
|
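As an illustration of the three arguments, the following Bash sketch prints a Nagios service definition for the Elasticsearch CPU example. It assumes that the ``check_influx`` command defined in the previous steps receives the query, the warning range and the critical range as ``$ARG1$``, ``$ARG2$`` and ``$ARG3$``, in that order, and that a ``generic-service`` template exists. The host name, service description and query are hypothetical placeholders.

.. code-block:: bash

   # Print a Nagios service definition that calls check_influx with the
   # three required arguments: query, warning range and critical range.
   # Assumption: check_influx maps them to $ARG1$, $ARG2$ and $ARG3$.
   influx_service_definition() {
     local host="$1" description="$2" query="$3" warning="$4" critical="$5"
     # '!' separates check_command arguments in Nagios, so it cannot
     # appear inside the query itself.
     case "$query" in
       *'!'*) echo "the query must not contain '!'" >&2; return 1 ;;
     esac
     printf 'define service {\n'
     printf '    use                  generic-service\n'
     printf '    host_name            %s\n' "$host"
     printf '    service_description  %s\n' "$description"
     printf '    check_command        check_influx!%s!%s!%s\n' \
       "$query" "$warning" "$critical"
     printf '}\n'
   }

For example, ``influx_service_definition node-13 'Elasticsearch system CPU' "$QUERY" 30 50`` prints a definition with a warning range of ``30`` and a critical range of ``50``, where ``$QUERY`` holds your InfluxDB query. You could redirect it to a file under ``/etc/nagios3/conf.d/`` (the file name is up to you) and validate the configuration with ``nagios3 -v`` before restarting Nagios.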
|
|
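To make the threshold format concrete, the following Bash sketch (using ``awk`` for the numeric comparisons) evaluates a value against Nagios-style ranges and derives the resulting state, checking the critical range first. It follows the Nagios plugin guidelines linked in the note above but is only an illustration: it is not the code of ``check_influx`` and it does not validate malformed ranges.

.. code-block:: bash

   # Return 0 (alert) if VALUE falls in the alert zone of a Nagios RANGE:
   #   10     alert if value < 0 or > 10
   #   10:    alert if value < 10
   #   ~:10   alert if value > 10
   #   10:20  alert if value < 10 or > 20
   #   @10:20 alert if 10 <= value <= 20
   range_alert() {
     awk -v value="$1" -v range="$2" 'BEGIN {
       value += 0
       invert = 0
       if (substr(range, 1, 1) == "@") { invert = 1; range = substr(range, 2) }
       no_low = 0; no_high = 0; low = 0
       sep = index(range, ":")
       if (sep == 0) {
         high = range + 0
       } else {
         start = substr(range, 1, sep - 1)
         end = substr(range, sep + 1)
         if (start == "~") { no_low = 1 } else { low = start + 0 }
         if (end == "") { no_high = 1 } else { high = end + 0 }
       }
       within = (no_low || value >= low) && (no_high || value <= high)
       alert = invert ? within : !within
       exit (alert ? 0 : 1)
     }'
   }

   # Print OK, WARNING or CRITICAL for VALUE given WARNING and CRITICAL ranges.
   threshold_state() {
     if range_alert "$1" "$3"; then
       echo CRITICAL
     elif range_alert "$1" "$2"; then
       echo WARNING
     else
       echo OK
     fi
   }

With the thresholds of the Elasticsearch example, ``threshold_state 35 30 50`` prints ``WARNING``, ``threshold_state 55 30 50`` prints ``CRITICAL`` and ``threshold_state 20 30 50`` prints ``OK``, because a range of ``30`` raises an alert when the value is below 0 or above 30.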
|
|
|
Using an external SMTP server with STARTTLS
-------------------------------------------
|
|
If your SMTP server requires STARTTLS, you need to make some
manual adjustments to the Nagios configuration after the deployment of
your environment.
|
|
.. note:: Prior to enabling STARTTLS, you need to set the *SMTP Authentication method*
   parameter in the plugin's settings to either *Plain*, *Login* or *CRAM-MD5*.
|
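Before editing the Nagios configuration, you may want to check from the *LMA Infrastructure Alerting* node that your SMTP server accepts a STARTTLS session with the configured credentials. The Bash sketch below sends a test message with the same ``-S smtp-use-starttls`` option. It assumes that ``mail`` is a Heirloom-style ``mailx`` that understands the ``smtp``, ``smtp-auth``, ``smtp-auth-user`` and ``smtp-auth-password`` variables; ``MAIL_CMD`` is a hypothetical override of the sketch.

.. code-block:: bash

   # Send a test message through an SMTP server that requires STARTTLS.
   # auth_method: plain, login or cram-md5 (lowercase, as mailx expects).
   # Note: the password appears on the command line (visible in ps),
   # so prefer test credentials.
   send_starttls_test_mail() {
     local server="$1" auth_method="$2" user="$3" password="$4"
     local sender="$5" recipient="$6"
     echo "STARTTLS test message from $(uname -n)" | "${MAIL_CMD:-mail}" \
       -S smtp="smtp://$server" \
       -S smtp-use-starttls \
       -S smtp-auth="$auth_method" \
       -S smtp-auth-user="$user" \
       -S smtp-auth-password="$password" \
       -r "$sender" \
       -s "Nagios STARTTLS test" \
       "$recipient"
   }

For example: ``send_starttls_test_mail smtp.example.com:587 login alerts@example.com 'secret' nagios@example.com ops@example.com``, where every value is a placeholder for your own plugin settings.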
|
#. Log in to the *LMA Infrastructure Alerting* node.
|
|
#. Edit the
   ``/etc/nagios3/conf.d/cmd_notify-service-by-smtp-with-long-service-output.cfg``
   file to add the ``-S smtp-use-starttls`` option to the ``mail`` command. For
   example::
|
|
      define command{
          command_name notify-service-by-smtp-with-long-service-output
|
Troubleshooting
---------------
|
|
If you cannot access the Nagios UI, follow these troubleshooting tips:
|
|
#. Check that the *LMA Collector* nodes are able to connect to the Nagios
   VIP address on port *8001*.
|
|
#. Check that the Nagios configuration is valid::
|
|
      [root@node-13 ~]# nagios3 -v /etc/nagios3/nagios.cfg
|
|
      Total Warnings: 0
      Total Errors: 0
|
|
      Things look okay - No serious problems were detected during the pre-flight check
|
|
#. Check that the Nagios server is up and running::
|
|
      [root@node-13 ~]# /etc/init.d/nagios3 status
|
|
#. If Nagios is down, start it::
|
|
      [root@node-13 ~]# /etc/init.d/nagios3 start
|
|
|
|
      [root@node-13 ~]# /etc/init.d/apache2 status
|
|
#. If Apache is down, start it::
|
|
      [root@node-13 ~]# /etc/init.d/apache2 start
|
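The status and start steps of the list above can be combined into a small Bash sketch that checks each service and starts only those that are down. It relies on the LSB convention that an init script's ``status`` action returns a non-zero exit code when the service is not running; ``INIT_DIR`` is a hypothetical override of the sketch.

.. code-block:: bash

   # For each service name, query its init script and start the service
   # if the status action reports that it is not running.
   ensure_running() {
     local init_dir="${INIT_DIR:-/etc/init.d}" service
     for service in "$@"; do
       if "$init_dir/$service" status >/dev/null 2>&1; then
         echo "$service is running"
       else
         echo "$service is not running, starting it"
         "$init_dir/$service" start
       fi
     done
   }

Run it as ``ensure_running nagios3 apache2`` on the *LMA Infrastructure Alerting* node.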
|
Finally, Nagios may report a host or service state as *UNKNOWN*.
Two cases can be distinguished:
|
|
* 'UNKNOWN: No datapoint have been received ever',
* 'UNKNOWN: No datapoint have been received over the last X seconds'.
|
|
Both cases indicate that Nagios doesn't receive regular passive checks from
the *LMA Collector*. This may have several causes:
|
|
* The 'hekad' process of the *LMA Collector* fails to communicate with Nagios,
* The 'collectd' and/or 'hekad' process of the *LMA Collector* has crashed,
* One or several alarm rules are misconfigured.
|
To remedy the above situations, follow the `troubleshooting tips
<http://fuel-plugin-lma-collector.readthedocs.org/en/latest/user/configuration.html#troubleshooting>`_
of the *LMA Collector Plugin User Guide*.
|
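Some of these causes can be checked quickly from an *LMA Collector* node. The following Bash sketch reports whether the ``hekad`` and ``collectd`` processes are running and whether the Nagios VIP address accepts TCP connections on port *8001*. It uses ``pgrep`` and Bash's ``/dev/tcp`` pseudo-device with ``timeout``; it does not detect misconfigured alarm rules, for which the LMA Collector guide remains the reference.

.. code-block:: bash

   # Report whether each named process is running; return 1 if any is missing.
   check_collector_processes() {
     local missing=0 proc
     for proc in "$@"; do
       if pgrep -x "$proc" >/dev/null; then
         echo "$proc: running"
       else
         echo "$proc: NOT running"
         missing=1
       fi
     done
     return "$missing"
   }

   # Check that HOST accepts TCP connections on PORT (default 8001).
   # timeout avoids hanging when a firewall silently drops the packets.
   check_tcp_port() {
     local host="$1" port="${2:-8001}"
     if timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
       echo "$host:$port reachable"
     else
       echo "$host:$port NOT reachable"
       return 1
     fi
   }

For example, run ``check_collector_processes hekad collectd`` and then ``check_tcp_port 192.0.2.10 8001``, where ``192.0.2.10`` stands for your Nagios VIP address.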
|