Merge "Updated the User section"

This commit is contained in:
Jenkins 2016-02-23 10:56:15 +00:00 committed by Gerrit Code Review
commit def0c8e343
5 changed files with 112 additions and 124 deletions

Binary file not shown (image; before: 123 KiB, after: 40 KiB).

Binary file not shown (image; before: 132 KiB, after: 69 KiB).

Binary file not shown (image; before: 143 KiB, after: 98 KiB).

Binary file not shown (image; after: 178 KiB).

@ -10,66 +10,67 @@ Plugin configuration
To configure your plugin, you need to follow these steps: To configure your plugin, you need to follow these steps:
#. `Create a new environment <http://docs.mirantis.com/openstack/fuel/fuel-7.0/user-guide.html#launch-wizard-to-create-new-environment>`_ 1. `Create a new environment <http://docs.mirantis.com/openstack/fuel/fuel-8.0/user-guide.html#launch-wizard-to-create-new-environment>`_
with the Fuel web user interface. with the Fuel web user interface.
#. Click on the Settings tab of the Fuel web UI. #. Click the **Settings** tab and select the **Other** category.
#. Scroll down the page and select the LMA Infrastructure Alerting Plugin in the left column. #. Scroll down through the settings until you find the **LMA Infrastructure Alerting
The LMA Infrastructure Alerting Plugin settings screen should appear as shown below. Plugin** section. You should see a page like this.
.. image:: ../images/lma_infrastructure_alerting_settings.png .. image:: ../images/lma_infrastructure_alerting_settings.png
:width: 800 :width: 800
:align: center :align: center
#. Select the LMA Infrastructure Alerting Plugin checkbox and fill in the required fields. #. Check the *LMA Infrastructure Alerting Plugin* box and fill in the required fields
as indicated below.
* Change the nagiosadmin password (optional). a. Change the Nagios web interface password (recommended).
#. Check the boxes corresponding to the types of notification you would
like to receive by email (*CRITICAL*, *WARNING*, *UNKNOWN*, *RECOVERY*).
#. Specify the recipient email address for the alerts.
#. Specify the sender email address for the alerts.
#. Specify the SMTP server address and port.
#. Specify the SMTP authentication method.
#. Specify the SMTP username and password (required if the authentication method isn't *None*).
* Specify the recipient email address for the alerts. #. When you are done with the settings, scroll down to the bottom of the page and click
the **Save Settings** button.
* Specify the sender email address for the alerts. #. Click the *Nodes* tab and assign the *LMA Infrastructure Alerting* role to nodes
as shown below. You can see in this example that the *Infrastructure_Alerting*
* Specify the SMTP server address and port. role is assigned to three different nodes along with the *Elasticsearch_Kibana* role
and the *InfluxDB_Grafana* role. This means that the three plugins of the LMA toolchain
* Specify the SMTP authentication method. can be installed on the same nodes.
* Specify the SMTP username and password (required if the authentication method isn't 'None').
* Specify which types of notification should be sent by email.
#. Assign the *LMA Infrastructure Alerting* role to a node as shown in the figure below.
.. image:: ../images/lma_infrastructure_alerting_role.png .. image:: ../images/lma_infrastructure_alerting_role.png
:width: 800 :width: 800
:align: center :align: center
.. note:: Because of a bug with Fuel 7.0 (see bug `#1496328 .. note:: You can assign the *Infrastructure_Alerting* role to up to three nodes.
<https://bugs.launchpad.net/fuel-plugins/+bug/1496328>`_), the UI won't let Nagios clustering for high availability requires that you assign
you assign the *LMA Infrastructure Alerting* role if at least one node is already the *Infrastructure_Alerting* role to at least three nodes. Note also that
assigned with one of the built-in roles. it is possible to add or remove a node with the *Infrastructure_Alerting*
role after deployment.
To work around this problem, you should either remove the already assigned built-in roles or use the Fuel CLI:: #. Click **Apply Changes**.
$ fuel --env <environment id> node set \ #. Adjust the disk configuration if necessary (see the `Fuel User Guide
--node-id <node_id> --role=infrastructure_alerting <http://docs.mirantis.com/openstack/fuel/fuel-8.0/user-guide.html#disk-partitioning>`_
for details). By default, the *LMA Infrastructure Alerting Plugin* allocates:
#. Please take into consideration the information on the disks partitioning. * 20% of the first available disk for the operating system by honoring a range of
By default, the LMA Infrastructure Alerting Plugin allocates: 15GB minimum and 50GB maximum,
* 10GB for */var/log*,
* At least 20 GB for the Nagios data in */var/nagios*.
- 20% of the first available disk for the operating system by honoring a range of 15GB minimum and 50GB maximum. #. `Configure your environment <http://docs.mirantis.com/openstack/fuel/fuel-8.0/user-guide.html#configure-your-environment>`_
- 10GB for */var/log*.
- At least 20 GB for the Nagios data in */var/nagios*.
Please check the `Fuel User Guide <http://docs.mirantis.com/openstack/fuel/fuel-7.0/user-guide.html#disk-partitioning>`_
if you would like to change the default configuration of the disks partitioning.
#. `Configure your environment <http://docs.mirantis.com/openstack/fuel/fuel-7.0/user-guide.html#configure-your-environment>`_
as needed. as needed.
#. `Verify the networks <http://docs.mirantis.com/openstack/fuel/fuel-7.0/user-guide.html#verify-networks>`_ on the Networks tab of the Fuel web UI. #. `Verify the networks <http://docs.mirantis.com/openstack/fuel/fuel-8.0/user-guide.html#verify-networks>`_
on the Networks tab of the Fuel web UI.
#. `Deploy <http://docs.mirantis.com/openstack/fuel/fuel-7.0/user-guide.html#deploy-changes>`_ your changes. #. And finally, `Deploy <http://docs.mirantis.com/openstack/fuel/fuel-8.0/user-guide.html#deploy-changes>`_ your changes.
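The default disk allocation described in the steps above (20% of the first disk for the operating system, bounded to the 15GB to 50GB range, plus 10GB for */var/log*) can be sketched as a small calculation. This is only an illustration: the function names are hypothetical, and the rule that the remaining space goes to */var/nagios* is an assumption, since the text only states that at least 20 GB is reserved for the Nagios data.

```python
def os_partition_size_gb(disk_size_gb: float) -> float:
    """Return 20% of the first disk, clamped to the 15GB..50GB range."""
    twenty_percent = disk_size_gb * 20 / 100
    return min(max(twenty_percent, 15.0), 50.0)


def default_allocation_gb(disk_size_gb: float) -> dict:
    """Sketch of the default layout for a single disk of the given size.

    Assumption (not stated in the guide): whatever remains after the
    operating system and /var/log goes to /var/nagios.
    """
    os_gb = os_partition_size_gb(disk_size_gb)
    log_gb = 10.0
    nagios_gb = disk_size_gb - os_gb - log_gb
    if nagios_gb < 20.0:
        # The guide states that at least 20 GB is needed for the Nagios data.
        raise ValueError("disk too small: less than 20 GB left for /var/nagios")
    return {"os": os_gb, "/var/log": log_gb, "/var/nagios": nagios_gb}
```

For example, under these assumptions a 100GB disk would give 20GB to the operating system, 10GB to */var/log* and 70GB to */var/nagios*, while a 40GB disk would be rejected because fewer than 20 GB would remain for the Nagios data.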
.. _plugin_install_verification: .. _plugin_install_verification:
@ -79,38 +80,17 @@ Plugin verification
Be aware that, depending on the number of nodes and deployment setup, Be aware that, depending on the number of nodes and deployment setup,
deploying a Mirantis OpenStack environment can typically take anything deploying a Mirantis OpenStack environment can typically take anything
from 30 minutes to several hours. But once your deployment is complete, from 30 minutes to several hours. But once your deployment is complete,
you should see a notification that looks like the following: you should see a deployment success notification message with
a link to the Nagios dashboard as shown below.
.. image:: ../images/deployment_notification.png .. image:: ../images/deployment_notification.png
:align: center :align: center
:width: 800 :width: 800
Once your deployment has completed, you should verify that Nagios is From the Fuel web UI **Dashboard** view, click on the **Nagios** link.
installed properly through checking its URL:: Once you have authenticated (username is ``nagiosadmin`` and the
password is defined in the settings of the plugin), you should be directed to
http://<HOST>:8001/ the *Nagios Home Page* as shown below.
Where *HOST* is the IP address of the node which runs the Nagios server.
.. note:: You can retrieve the IP address where Nagios is installed using
the `fuel` command line::
[root@fuel ~]# fuel nodes
id | status | name | cluster | ip | mac ....
---|----------|------------------|---------|-----------|------------------- ....
14 | ready | Untitled (20:0c) | 8 | 10.20.0.8 | 08:00:27:29:20:0c ....
13 | ready | Untitled (47:b7) | 8 | 10.20.0.4 | 08:00:27:54:47:b7 ....
... | roles | pending_roles | online | group_id
... |-----------------------------|---------------|--------|---------
... | controller | | True | 8
... | lma_infrastructure_alerting | | True | 8
Once you have authenticated to the Nagios UI (the username is ``nagiosadmin`` and the
password is defined in the settings of the plugin), you should get to this
page:
.. image:: ../images/nagios_homepage.png .. image:: ../images/nagios_homepage.png
:align: center :align: center
@ -120,45 +100,53 @@ Managing Nagios
--------------- ---------------
You can get the current status of the OpenStack environment by clicking on You can get the current status of the OpenStack environment by clicking on
the *Services* menu item: the *Services* menu item as shown below.
.. image:: ../images/nagios_services.png .. image:: ../images/nagios_services.png
:align: center :align: center
:width: 900 :width: 800
The LMA Infrastructure Alerting plugin has provisioned Nagios with all the The *LMA Infrastructure Alerting Plugin* configures Nagios for all the
hosts and services that have been deployed in the environment. The alarms (or hosts and services that have been deployed in the environment. The alarms (or
service checks in Nagios vocabulary) are configured in passive mode because service checks in Nagios terms) are created in **passive mode** as
they are received from the LMA collectors and aggregator (see the `LMA they are received from the *LMA Collector* and *Aggregator* (see the `LMA
Collector documentation <http://fuel-plugin-lma-collector.readthedocs.org/>`_ Collector documentation <http://fuel-plugin-lma-collector.readthedocs.org/>`_
for more details). for more details).
.. note:: Notifications for system and node cluster alarms are disabled by .. note:: The alert notifications for the nodes and clusters of nodes are
default because they can be triggered often while not affecting the overall disabled by default to avoid alert fatigue and because they are not
health of the OpenStack services. If you want to enable notifications for a necessarily indicative of a condition affecting the overall health state
particular service, go to the service's details page and click on the 'Enable of an OpenStack service cluster. If you nonetheless want to enable those alerts,
notifications for this service' link in the 'Service Commands' panel. go to the service details page and click on the *Enable notifications
for this service* link within the *Service Commands* panel as shown below.
There are also two *virtual* hosts representing the service and node clusters: .. image:: ../images/nagios_enable_notifs.png
:align: center
:width: 800
* *00-global-clusters-env${ENVID}* for the service clusters like the Nova There are also two *Virtual Hosts* representing the health state of the
*service clusters* and *node clusters*:
* *00-global-clusters-env${ENVID}* for the service clusters like the Nova
cluster, the Keystone cluster, the RabbitMQ cluster and so on. cluster, the Keystone cluster, the RabbitMQ cluster and so on.
* *00-node-clusters-env${ENVID}* for the physical node clusters like the * *00-node-clusters-env${ENVID}* for the physical node clusters like the
cluster of controller nodes, the cluster of storage nodes and so on. cluster of controller nodes, the cluster of storage nodes and so on.
These additional 2 entities offer the high-level view on the healthiness of the These *Virtual Hosts* entities offer a high-level health state view for
OpenStack environment. those clusters in the OpenStack environment.
Configuring service checks on InfluxDB metrics Configuring service checks on InfluxDB metrics
---------------------------------------------- ----------------------------------------------
You could configure additional alarms (other than those already defined in the You could configure additional alarms (other than those already defined in the
LMA Collector) based on the metrics stored in the InfluxDB database. For *LMA Collector*) based on the metrics stored in the InfluxDB database. You
instance, if you wanted to be alerted when the system CPU usage of the could, for example, define an alert to be notified when the CPU activity for a
Elasticsearch process reaches a certain threshold, you could set up a 'warning' particular process crosses a particular threshold.
alarm at say 30% of CPU usage threshold and a 'critical' alarm at 50% of CPU Say, for example, that you would like to set a 'warning'
usage threshold. The steps to define those alarms in Nagios would be as follows: alarm at 30% of system CPU usage and a 'critical' alarm at 50% of system CPU usage for the
Elasticsearch process.
The steps to define those alarms in Nagios would be as follows:
#. Connect to the *LMA Infrastructure Alerting* node. #. Connect to the *LMA Infrastructure Alerting* node.
@ -196,43 +184,43 @@ usage threshold. The steps to define those alarms in Nagios would be as follow:
Total Warnings: 0 Total Warnings: 0
Total Errors: 0 Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check Here, things look okay. No serious problems were detected during the pre-flight check.
5. Restart the Nagios server,::
#. Restart the Nagios server::
[root@node-13 ~]# /etc/init.d/nagios3 restart [root@node-13 ~]# /etc/init.d/nagios3 restart
#. Go to the Nagios dashboard and verify that the service check has been added. #. Go to the Nagios dashboard and verify that the service check has been added.
From there, you could define additional service checks for different hosts or
host groups using the same ``check_influx`` command.
You will just need to provide these three required arguments for defining new service checks:
From there, you can define additional service checks for different hosts or hostgroups using the same ``check_influx`` command. You just need to provide the 3 required arguments when defining the service checks: * A valid InfluxDB query that should return only one row with a single value.
Check the `InfluxDB documentation <https://influxdb.com/docs/v0.10/query_language>`_
to learn how to use the InfluxDB's query language.
* A range specification for the warning threshold.
* A range specification for the critical threshold.
* A valid InfluxDB query that should return only one row with a single value. Check the `InfluxDB documentation <https://influxdb.com/docs/v0.9/query_language/index.html>`_ to learn how to use InfluxDB query language. .. note:: Threshold ranges are defined following the `Nagios format
<https://nagios-plugins.org/doc/guidelines.html#THRESHOLDFORMAT>`_.
* A range specification for the warning threshold.
* A range specification for the critical threshold.
.. _note: Threshold ranges are defined following the `Nagios format <https://nagios-plugins.org/doc/guidelines.html#THRESHOLDFORMAT>`_.
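To make the threshold range format more concrete, here is a minimal Python sketch of how a range specification is interpreted according to the Nagios plugin guidelines. It is not the code of ``check_influx``; the function names ``parse_range``, ``triggers_alert`` and ``check_status`` are hypothetical and only illustrate the semantics of the warning and critical arguments.

```python
import math


def parse_range(spec: str):
    """Parse a Nagios threshold range into (start, end, inside).

    Format: [@]start:end. A bare "N" means 0..N, "N:" means N..infinity,
    "~:N" means -infinity..N. A leading "@" inverts the test.
    """
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        start_s, end_s = spec.split(":", 1)
    else:
        start_s, end_s = "0", spec
    if start_s == "~":
        start = -math.inf
    else:
        start = float(start_s) if start_s else 0.0
    end = float(end_s) if end_s else math.inf
    if start > end:
        raise ValueError("range start must not exceed range end")
    return start, end, inside


def triggers_alert(value: float, spec: str) -> bool:
    """Alert when the value is outside the range (inside it with '@')."""
    start, end, inside = parse_range(spec)
    in_range = start <= value <= end
    return in_range if inside else not in_range


def check_status(value: float, warning: str, critical: str) -> str:
    """Evaluate the critical range first, then the warning range."""
    if triggers_alert(value, critical):
        return "CRITICAL"
    if triggers_alert(value, warning):
        return "WARNING"
    return "OK"
```

With the 30% and 50% CPU usage thresholds of the earlier example expressed as the ranges ``30`` and ``50``, ``check_status(35, "30", "50")`` evaluates to ``WARNING`` and ``check_status(60, "30", "50")`` to ``CRITICAL``.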
Using an external SMTP server with STARTTLS Using an external SMTP server with STARTTLS
------------------------------------------- -------------------------------------------
If your SMTP server requires the use of STARTTLS, you need to make some If your SMTP server requires STARTTLS, you need to make some
manual adjustments to the Nagios configuration after the deployment of the manual adjustments to the Nagios configuration after the deployment of
environment has completed. To enable STARTTLS, you should have configured the SMTP your environment.
Authentication method to use either to Plain, Login or CRAM-MD5 first.
.. note:: Future versions of the LMA Infrastructure Alerting plugin will .. note:: Prior to enabling STARTTLS, you need to configure the *SMTP Authentication method*
support the configuration of STARTTLS from the Fuel UI. parameter in the plugin's settings to use either *Plain*, *Login* or *CRAM-MD5*.
#. Log in to the *LMA Infrastructure Alerting* node. #. Log in to the *LMA Infrastructure Alerting* node.
#. Edit the #. Edit the
``/etc/nagios3/conf.d/cmd_notify-service-by-smtp-with-long-service-output.cfg`` ``/etc/nagios3/conf.d/cmd_notify-service-by-smtp-with-long-service-output.cfg``
file to add the ``-S smtp-use-starttls`` option to the `mail` command. For file to add the ``-S smtp-use-starttls`` option to the `mail` command. For
instance:: example::
define command{ define command{
command_name notify-service-by-smtp-with-long-service-output command_name notify-service-by-smtp-with-long-service-output
@ -270,11 +258,12 @@ Authentication method to use either to Plain, Login or CRAM-MD5 first.
Troubleshooting Troubleshooting
--------------- ---------------
If you cannot access the Nagios UI, check the following: If you cannot access the Nagios UI, follow these troubleshooting tips.
#. Check if the nodes are able to connect to the Nagios server on port *8001*. #. Check that the *LMA Collector* nodes are able to connect to the Nagios
VIP address on port *8001*.
#. Check the Nagios configuration is valid:: #. Check that the Nagios configuration is valid::
[root@node-13 ~]# nagios3 -v /etc/nagios3/nagios.cfg [root@node-13 ~]# nagios3 -v /etc/nagios3/nagios.cfg
@ -283,14 +272,13 @@ If you cannot access the Nagios UI, check the following:
Total Warnings: 0 Total Warnings: 0
Total Errors: 0 Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check Here, things look okay. No serious problems were detected during the pre-flight check.
#. Check that the Nagios server is up and running:: #. Check that the Nagios server is up and running::
[root@node-13 ~]# /etc/init.d/nagios3 status [root@node-13 ~]# /etc/init.d/nagios3 status
#. If Nagios is down, start it:: #. If Nagios is down, restart it::
[root@node-13 ~]# /etc/init.d/nagios3 start [root@node-13 ~]# /etc/init.d/nagios3 start
@ -298,23 +286,23 @@ If you cannot access the Nagios UI, check the following:
[root@node-13 ~]# /etc/init.d/apache2 status [root@node-13 ~]# /etc/init.d/apache2 status
#. If Apache is down, start it:: #. If Apache is down, restart it::
[root@node-13 ~]# /etc/init.d/apache2 start [root@node-13 ~]# /etc/init.d/apache2 start
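The connectivity check on port *8001* mentioned in the first troubleshooting step can be scripted. The following is a minimal sketch using only the Python standard library; the function name ``can_connect`` is hypothetical, and the address you pass must be the one of your own Nagios server or VIP. It only tests that a TCP connection can be opened, not that the Nagios web interface answers correctly.

```python
import socket


def can_connect(host: str, port: int = 8001, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake;
        # the context manager closes the socket again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Running ``can_connect("<nagios address>")`` from one of the nodes would return ``False`` if the port is filtered or if nothing listens on it, in which case the service status checks above are the next thing to look at.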
If Nagios reports some hosts or services as 'UNKNOWN: No data received for at Finally, Nagios may report a host or service state as *UNKNOWN*.
least X seconds ', it indicates that the LMA collector fails to communicate Two cases can be distinguished:
with the Nagios service:
#. First, check that the LMA Collector is running properly on these nodes * 'UNKNOWN: No datapoint have been received ever',
by following the troubleshooting instructions of the * 'UNKNOWN: No datapoint have been received over the last X seconds'.
`LMA Collector Fuel Plugin User Guide <http://fuel-plugin-lma-collector.readthedocs.org/en/latest/user/configuration.html#troubleshooting>`_.
#. Check if the nodes are able to connect to the Nagios server on port *8001*. Both cases indicate that Nagios doesn't receive regular passive checks from
the *LMA Collector*. This may be due to different problems:
If Nagios reports some hosts or services as 'UNKNOWN: No datapoint have been * The 'hekad' process of the *LMA Collector* fails to communicate with Nagios,
received ever' or 'UNKNOWN: No datapoint have been received over the last X * The 'collectd' and/or 'hekad' process of the *LMA Collector* has crashed,
seconds ', it indicates that the LMA collector fails to determine the status of * One or several alarm rules are misconfigured.
the service because either the alarm rule is misconfigured or no metric is
received. In both cases, follow the troubleshooting instructions of the To remedy the above situations, follow the `troubleshooting tips
`LMA Collector Fuel Plugin User Guide <http://fuel-plugin-lma-collector.readthedocs.org/en/latest/user/configuration.html#troubleshooting>`_. <http://fuel-plugin-lma-collector.readthedocs.org/en/latest/user/configuration.html#troubleshooting>`_
of the *LMA Collector Plugin User Guide*.