[UG] New Troubleshooting sections

This patch adds the following articles to the
Troubleshooting section of Fuel UG:

- Restarting an OpenStack service
- Enabling debug mode
- Logging OS services
- Checking OS services status

Change-Id: I498ae6f5caa5e22204a1b1403ab23928eed6810f
Olena Logvinova 2016-09-29 13:24:23 +03:00
parent 94d4ddaeda
commit 87e8bd8401
5 changed files with 696 additions and 1 deletion


@@ -12,4 +12,8 @@ This section includes the following topics:
.. toctree::
:maxdepth: 3
troubleshooting/service-status.rst
troubleshooting/restart-service.rst
troubleshooting/logging.rst
troubleshooting/debug-mode.rst
troubleshooting/network.rst


@@ -0,0 +1,45 @@
.. _debug-mode:
==========================================
Enable debug mode for an OpenStack service
==========================================
Most OpenStack services use the same configuration options to enable debug
logging, which you can use to troubleshoot your OpenStack environment.
**To enable the debug mode for an OpenStack service:**
#. Log in to each controller node.
#. Locate and open the required OpenStack service configuration file in the
``/etc`` directory, for example, ``/etc/nova/nova.conf``.
#. In the ``DEFAULT`` section, change the value for the ``debug`` parameter
to ``True``:
.. code-block:: ini
debug = True
If the configuration file that you edit contains the ``use_syslog``
parameter, change its value to ``False``:
.. code-block:: ini
use_syslog = False
Disabling syslog protects the Fuel Master node from being overloaded with
debug messages.
#. Save the changes.
#. The following services require additional configuration to enable the debug
mode:
* For Cinder, edit the configuration file on *each node with the Cinder
  role*.
* For Glance, edit two configuration files: ``/etc/glance/glance-api.conf``
and ``/etc/glance/glance-registry.conf``.
* For Ironic, edit the ``/etc/nova/nova.conf`` file of the ``nova-compute``
service configured to work with Ironic.
#. Restart the service. See :ref:`restart-service`.
.. caution:: Remember to revert the original values in the OpenStack service
configuration file when troubleshooting is done.
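As a quick check, you can confirm both values in the edited file before
restarting the service, for example for the Compute service (the output
below is illustrative):

.. code-block:: console

   # grep -E '^(debug|use_syslog)' /etc/nova/nova.conf
   debug = True
   use_syslog = False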


@@ -0,0 +1,58 @@
==========================
OpenStack services logging
==========================
Depending on your needs, use the following logging locations for the OpenStack
services:
* On the Fuel Master node, the log files of all OpenStack services are located
in ``/var/log/remote/<NODE_HOSTNAME_OR_IP>/<SERVICE_NAME>.log``.
* On each node of your environment, the log files are located in the
``/var/log/<SERVICE_NAME>-all.log`` file and the ``/var/log/<SERVICE_NAME>/``
folder. Some OpenStack services, for example, Horizon and Ironic, have only
a log folder in ``/var/log/<SERVICE_NAME>/`` and do not have a
``/var/log/<SERVICE_NAME>-all.log`` file.
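For example, to follow the ``nova-api`` log of a particular node from the
Fuel Master node (the ``node-1`` host name is only an illustration):

.. code-block:: console

   # tail -f /var/log/remote/node-1/nova-api.log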
Some OpenStack services have additional logging locations. The following table
lists these locations:
.. list-table::
:widths: 10 25
:header-rows: 1
* - Service name
- Log files location
* - Corosync/Pacemaker
- Fuel Master node:
* /var/log/remote/<NODE_HOSTNAME_OR_IP>/attrd.log
* /var/log/remote/<NODE_HOSTNAME_OR_IP>/crmd.log
* /var/log/remote/<NODE_HOSTNAME_OR_IP>/cib.log
* /var/log/remote/<NODE_HOSTNAME_OR_IP>/lrmd.log
* /var/log/remote/<NODE_HOSTNAME_OR_IP>/pengine.log
* - Horizon
- Controller node:
* /var/log/apache2/horizon_access.log
* /var/log/apache2/horizon_error.log
* - Keystone
- Controller node:
Since the Keystone service is available through the Apache server,
the Apache logs contain the Keystone logs:
* /var/log/apache2/error.log
* /var/log/apache2/access.log
* /var/log/apache2/keystone_wsgi_admin_access.log
* /var/log/apache2/keystone_wsgi_admin_error.log
* /var/log/apache2/keystone_wsgi_main_access.log
* /var/log/apache2/keystone_wsgi_main_error.log
* - MySQL
- Controller node:
* /var/log/syslog
* - Neutron
- Controller node:
* /var/log/openvswitch
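For example, to search a node's Pacemaker resource manager log for errors
from the Fuel Master node (the ``node-1`` host name is only an
illustration):

.. code-block:: console

   # grep -i error /var/log/remote/node-1/lrmd.log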


@@ -0,0 +1,339 @@
.. _restart-service:
============================
Restart an OpenStack service
============================
Troubleshooting of an OpenStack service usually requires a service restart.
To restart an OpenStack service, complete the steps described in the
following table on *all controller nodes* unless indicated otherwise.
.. caution:: Before restarting a service on the next controller node,
verify that the service is up and running on the node where you
have restarted it using the :command:`service <SERVICE_NAME> status`
command.
.. note:: Since a resource restart requires a considerable amount of time,
some commands listed in the table below do not provide an
immediate output.
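For example, a typical restart and verification sequence for a single
service looks as follows (the process ID in the output is illustrative):

.. code-block:: console

   # service glance-api restart
   # service glance-api status
   glance-api start/running, process 2154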
.. list-table::
:widths: 3 25
:header-rows: 1
* - Service name
- Restart procedure
* - Ceilometer
- #. Log in to a controller node CLI.
#. Restart the Ceilometer services:
.. code-block:: console
# service ceilometer-agent-central restart
# service ceilometer-api restart
# service ceilometer-agent-notification restart
# service ceilometer-collector restart
#. Verify the status of the Ceilometer services. See
:ref:`service-status`.
#. Repeat steps 1-3 on all controller nodes.
* - Cinder
- #. Log in to a controller node CLI.
#. Restart the Cinder services:
.. code-block:: console
# service cinder-api restart
# service cinder-scheduler restart
#. Verify the status of the Cinder services. See
:ref:`service-status`.
#. Repeat steps 1-3 on all controller nodes.
#. On every node with Cinder role, run:
.. code-block:: console
# service cinder-volume restart
# service cinder-backup restart
#. Verify the status of the ``cinder-volume`` and ``cinder-backup``
services.
* - Corosync/Pacemaker
- #. Log in to a controller node CLI.
#. Restart the Corosync and Pacemaker services:
.. code-block:: console
# service corosync restart
# service pacemaker restart
#. Verify the status of the Corosync and Pacemaker services. See
:ref:`service-status`.
#. Repeat steps 1-3 on all controller nodes.
* - Glance
- #. Log in to a controller node CLI.
#. Restart the Glance services:
.. code-block:: console
# service glance-api restart
# service glance-registry restart
#. Verify the status of the Glance services. See
:ref:`service-status`.
#. Repeat steps 1-3 on all controller nodes.
* - Horizon
- Since the Horizon service is available through the Apache server,
you should restart the Apache service on all controller nodes:
#. Log in to a controller node CLI.
#. Restart the Apache server:
.. code-block:: console
# service apache2 restart
#. Verify whether the Apache service is successfully running after
restart:
.. code-block:: console
# service apache2 status
#. Verify whether the Apache ports are opened and listening:
.. code-block:: console
# netstat -nltp | egrep apache2
#. Repeat steps 1-4 on all controller nodes.
* - Ironic
- #. Log in to a controller node CLI.
#. Restart the Ironic services:
.. code-block:: console
# service ironic-api restart
# service ironic-conductor restart
#. Verify the status of the Ironic services. See
:ref:`service-status`.
#. Repeat steps 1-3 on all controller nodes.
#. On any controller node, run the following command for the
``nova-compute`` service configured to work with Ironic:
.. code-block:: console
# crm resource restart p_nova_compute_ironic
#. Verify the status of the ``p_nova_compute_ironic`` service.
* - Keystone
- Since the Keystone service is available through the Apache server,
complete the following steps on all controller nodes:
#. Log in to a controller node CLI.
#. Restart the Apache server:
.. code-block:: console
# service apache2 restart
#. Verify whether the Apache service is successfully running after
restart:
.. code-block:: console
# service apache2 status
#. Verify whether the Apache ports are opened and listening:
.. code-block:: console
# netstat -nltp | egrep apache2
#. Repeat steps 1-4 on all controller nodes.
* - MySQL
- #. Log in to any controller node CLI.
#. Run the following command:
.. code-block:: console
# pcs status | grep -A1 mysql
In the output, the resource ``clone_p_mysqld`` should be in the
``Started`` status.
#. Disable the ``clone_p_mysqld`` resource:
.. code-block:: console
# pcs resource disable clone_p_mysqld
#. Verify that the resource ``clone_p_mysqld`` is in the ``Stopped``
status:
.. code-block:: console
# pcs status | grep -A2 mysql
It may take some time for this resource to be stopped on all
controller nodes.
#. Enable the ``clone_p_mysqld`` resource:
.. code-block:: console
# pcs resource enable clone_p_mysqld
#. Verify that the resource ``clone_p_mysqld`` is in the ``Started``
status again on all controller nodes:
.. code-block:: console
# pcs status | grep -A2 mysql
.. warning:: Use the :command:`pcs` commands instead of the :command:`crm`
commands to restart the service.
The :command:`pcs` tool correctly stops the service according
to the quorum policy, which prevents MySQL failures.
* - Neutron
- Use the following restart steps for the Neutron DHCP agent as an
example for all Neutron agents.
#. Log in to any controller node CLI.
#. Verify the DHCP agent status:
.. code-block:: console
# pcs resource show | grep -A1 neutron-dhcp-agent
The output should contain the list of all controllers in the
``Started`` status.
#. Stop the DHCP agent:
.. code-block:: console
# pcs resource disable clone_neutron-dhcp-agent
#. Verify the Corosync status of the DHCP agent:
.. code-block:: console
# pcs resource show | grep -A1 neutron-dhcp-agent
The output should contain the list of all controllers in the
``Stopped`` status.
#. Verify the ``neutron-dhcp-agent`` status on the OpenStack side:
.. code-block:: console
# neutron agent-list
The output table should contain the DHCP agents for every
controller node with ``xxx`` in the ``alive`` column.
#. Start the DHCP agent on every controller node:
.. code-block:: console
# pcs resource enable clone_neutron-dhcp-agent
#. Verify the DHCP agent status:
.. code-block:: console
# pcs resource show | grep -A1 neutron-dhcp-agent
The output should contain the list of all controllers in the
``Started`` status.
#. Verify the ``neutron-dhcp-agent`` status on the OpenStack side:
.. code-block:: console
# neutron agent-list
The output table should contain the DHCP agents for every
controller node with ``:-)`` in the ``alive`` column and ``True``
in the ``admin_state_up`` column.
* - Nova
- #. Log in to a controller node CLI.
#. Restart the Nova services:
.. code-block:: console
# service nova-api restart
# service nova-cert restart
# service nova-compute restart
# service nova-conductor restart
# service nova-consoleauth restart
# service nova-novncproxy restart
# service nova-scheduler restart
# service nova-spicehtml5proxy restart
# service nova-xvpvncproxy restart
#. Verify the status of the Nova services. See
:ref:`service-status`.
#. Repeat steps 1-3 on all controller nodes.
#. On every compute node, run:
.. code-block:: console
# service nova-compute restart
#. Verify the status of the ``nova-compute`` service.
* - RabbitMQ
- #. Log in to any controller node CLI.
#. Disable the RabbitMQ service:
.. code-block:: console
# pcs resource disable master_p_rabbitmq-server
#. Verify whether the service is stopped:
.. code-block:: console
# pcs status | grep -A2 rabbitmq
#. Enable the service:
.. code-block:: console
# pcs resource enable master_p_rabbitmq-server
During the startup process, the output of the :command:`pcs status`
command can show all existing RabbitMQ services in the ``Slaves``
mode.
#. Verify the service status:
.. code-block:: console
# rabbitmqctl cluster_status
In the output, the ``running_nodes`` field should contain the host names
of all controllers in the ``rabbit@<HOSTNAME>`` format. The
``partitions`` field should be empty.
* - Swift
- #. Log in to a controller node CLI.
#. Restart the Swift services:
.. code-block:: console
# service swift-account-auditor restart
# service swift-account restart
# service swift-account-reaper restart
# service swift-account-replicator restart
# service swift-container-auditor restart
# service swift-container restart
# service swift-container-reconciler restart
# service swift-container-replicator restart
# service swift-container-sync restart
# service swift-container-updater restart
# service swift-object-auditor restart
# service swift-object restart
# service swift-object-reconstructor restart
# service swift-object-replicator restart
# service swift-object-updater restart
# service swift-proxy restart
#. Verify the status of the Swift services. See
:ref:`service-status`.
#. Repeat steps 1-3 on all controller nodes.
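When the same commands must be run on every controller node, you can also
iterate over the nodes from the Fuel Master node; a minimal sketch, assuming
the ``node-1`` to ``node-3`` host names and key-based SSH access:

.. code-block:: console

   # for node in node-1 node-2 node-3; do ssh $node 'hostname; service apache2 status'; done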


@@ -0,0 +1,249 @@
.. _service-status:
==================================
Verify an OpenStack service status
==================================
To ensure that an OpenStack service is up and running, verify the service
status on *every controller node*. Some OpenStack services require additional
verification on the non-controller nodes. The following table describes the
verification steps for the common OpenStack services.
.. note:: In the table below, the output of the
:command:`service <SERVICE_NAME> status` command should contain the
service status and the process ID unless indicated otherwise.
For example, ``neutron-server start/running, process 283``.
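For services that are exposed through well-known ports, you can combine the
status check with a port check, as the Horizon and Keystone rows below do
for the Apache server:

.. code-block:: console

   # service apache2 status
   # netstat -nltp | egrep ':80|:443|5000|35357'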
.. list-table:: **Verifying an OpenStack service status**
:widths: 3 25
:header-rows: 1
* - Service name
- Verification procedure
* - Ceilometer
- #. On every MongoDB node, run:
.. code-block:: console
# service mongodb status
# netstat -nltp | grep mongo
The output of the :command:`netstat` command should contain the
management and local IP addresses and ports in the
``LISTEN`` status.
#. On every controller node, run:
.. code-block:: console
# service ceilometer-agent-central status
# service ceilometer-api status
# service ceilometer-agent-notification status
# service ceilometer-collector status
#. On every compute node, run:
.. code-block:: console
# service ceilometer-polling status
#. On any controller node, run :command:`pcs status | grep ceilometer`
or :command:`crm status | grep ceilometer` to identify the node that
currently handles requests and to verify its status. The output should
contain the node ID and the ``Started`` status.
* - Cinder
- #. On every controller node, run:
.. code-block:: console
# service cinder-api status
# service cinder-scheduler status
#. On every node with the Cinder role, run:
.. code-block:: console
# service cinder-volume status
# service cinder-backup status
* - Corosync/Pacemaker
- On every controller node:
#. Run :command:`service corosync status` and
:command:`service pacemaker status`.
#. Verify the output of the :command:`pcs status` or
:command:`crm status` command. The ``Online`` field should contain
all the controllers' host names.
#. Verify the output of the :command:`pcs resource show` or
:command:`crm resource show` command. All resources should be
``Started``.
* - Glance
- On every controller node, run:
.. code-block:: console
# service glance-api status
# service glance-registry status
* - Heat
- #. On any controller node, verify the status of Heat engines:
.. code-block:: console
# source openrc
# heat service-list
The output should contain a table listing the Heat engines of all
controller nodes in the ``up`` status.
#. On every controller node, run:
.. code-block:: console
# service heat-api status
# service heat-api-cfn status
# service heat-api-cloudwatch status
# service heat-engine status
* - Horizon
- Since the Horizon service is available through the Apache server,
you should verify the Apache service status as well. Complete the
following steps on all controller nodes:
#. Verify whether the Apache service is running using the
:command:`service apache2 status` command.
#. Verify whether the Horizon ports are opened and listening using the
:command:`netstat -nltp | egrep ':80|:443'` command. The output
should contain the management and local IP addresses with either
port 80 or 443 in the ``LISTEN`` status.
* - Ironic
- #. On every controller node, run :command:`service ironic-api status`.
#. On every Ironic node, run :command:`service ironic-conductor status`.
#. On any controller node, run :command:`pcs status | grep ironic`.
The output should contain the name or ID of the node where the
``p_nova_compute_ironic`` resource is running.
* - Keystone
- Since the Keystone service is available through the Apache server,
you should verify the Apache service status as well. Complete the
following steps on all controller nodes (and on the nodes with the
Keystone role, if any):
#. Verify whether the Apache service is running using
:command:`service apache2 status`.
#. Verify whether the Keystone ports are opened and listening using
:command:`netstat -nltp | egrep '5000|35357'`. The output should
contain the management and local IP addresses with the ports 5000
and 35357 in the ``LISTEN`` status.
* - MySQL/Galera
- On any controller node:
#. Verify the output of the :command:`pcs status | grep -A1 clone_p_mysql`
or :command:`crm status | grep -A1 clone_p_mysql` command. The resource
``clone_p_mysqld`` should be in the ``Started`` status for all
controllers.
#. Verify the output of the
:command:`mysql -e "show status" | egrep 'wsrep_(local_state|incoming_address)'`
command. The ``wsrep_local_state_comment`` variable should be
``Synced``, the ``wsrep_incoming_address`` field should contain all
IP addresses of the controller nodes (in the management network).
* - Neutron
- #. On every compute node, run:
.. code-block:: console
# service neutron-openvswitch-agent status
#. On every controller node:
#. Verify the ``neutron-server`` service status:
.. code-block:: console
# service neutron-server status
#. Verify the statuses of the Neutron agents:
.. code-block:: console
# service neutron-metadata-agent status
# service neutron-dhcp-agent status
# service neutron-l3-agent status
# service neutron-openvswitch-agent status
#. On any controller node:
#. Verify the states of the Neutron agents:
.. code-block:: console
# source openrc
# neutron agent-list
The output table should list all the Neutron agents with the
``:-)`` value in the ``alive`` column and the ``True`` value in
the ``admin_state_up`` column.
#. Verify the Corosync/Pacemaker status:
.. code-block:: console
# pcs status | grep -A2 neutron
The output should contain the Neutron resources in the ``Started``
status for all controller nodes.
* - Nova
- * Using the Fuel CLI:
#. On every controller node, run:
.. code-block:: console
# service nova-api status
# service nova-cert status
# service nova-compute status
# service nova-conductor status
# service nova-consoleauth status
# service nova-novncproxy status
# service nova-scheduler status
# service nova-spicehtml5proxy status
# service nova-xvpvncproxy status
#. On every compute node, run :command:`service nova-compute status`.
* Using the Nova CLI:
.. code-block:: console
# source openrc
# nova service-list
The output should contain a table listing the Nova services. The status
of the services should be ``enabled`` and their state should be ``up``.
* - RabbitMQ
- * On any controller node, run :command:`rabbitmqctl cluster_status`.
In the output, the ``running_nodes`` field should contain the host names
of all the controllers in the ``rabbit@<HOSTNAME>`` format. The
``partitions`` field should be empty.
* - Swift
- * On every controller node, run:
.. code-block:: console
# service swift-account-auditor status
# service swift-account status
# service swift-account-reaper status
# service swift-account-replicator status
# service swift-container-auditor status
# service swift-container status
# service swift-container-reconciler status
# service swift-container-replicator status
# service swift-container-sync status
# service swift-container-updater status
# service swift-object-auditor status
# service swift-object status
# service swift-object-reconstructor status
# service swift-object-replicator status
# service swift-object-updater status
# service swift-proxy status
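To avoid typing each Swift status command separately, you can loop over
the service names; a minimal sketch covering the core services listed
above:

.. code-block:: console

   # for s in swift-account swift-container swift-object swift-proxy; do service $s status; done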
.. seealso:: :ref:`restart-service`