[UG] New Troubleshooting sections

This patch adds the following articles to the
Troubleshooting section of Fuel UG:

- Restarting an OpenStack service
- Enabling debug mode
- Logging OS services
- Checking OS services status

Change-Id: I498ae6f5caa5e22204a1b1403ab23928eed6810f
Olena Logvinova 2016-09-29 13:24:23 +03:00
parent 94d4ddaeda
commit 87e8bd8401
5 changed files with 696 additions and 1 deletion


@@ -12,4 +12,8 @@ This section includes the following topics:
.. toctree::
:maxdepth: 3
troubleshooting/service-status.rst
troubleshooting/restart-service.rst
troubleshooting/logging.rst
troubleshooting/debug-mode.rst
troubleshooting/network.rst


@@ -0,0 +1,45 @@
.. _debug-mode:
==========================================
Enable debug mode for an OpenStack service
==========================================
Most OpenStack services use the same configuration options to enable debug
logging, which you can use to troubleshoot your OpenStack environment.
**To enable the debug mode for an OpenStack service:**
#. Log in to each controller node.
#. Locate and open the required OpenStack service configuration file in the
``/etc`` directory, for example, ``/etc/nova/nova.conf``.
#. In the ``DEFAULT`` section, change the value for the ``debug`` parameter
to ``True``:
.. code-block:: ini
debug = True
If the configuration file that you edit contains the ``use_syslog``
parameter, change its value to ``False``:
.. code-block:: ini
use_syslog = False
Disabling syslog protects the Fuel Master node from being overloaded with
debug messages.
#. Save the changes.
#. The following services require additional configuration to enable the debug
mode:
* For Cinder, edit the configuration file on *each node with the Cinder
  role*.
* For Glance, edit two configuration files: ``/etc/glance/glance-api.conf``
and ``/etc/glance/glance-registry.conf``.
* For Ironic, edit the ``/etc/nova/nova.conf`` file of the ``nova-compute``
service configured to work with Ironic.
#. Restart the service. See :ref:`restart-service`.
.. caution:: Remember to revert the original values in the OpenStack service
configuration file when troubleshooting is done.
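As a quick check, you can confirm both values in the edited file before
restarting the service, for example for the Compute service (the output
below is illustrative):

.. code-block:: console

   # grep -E '^(debug|use_syslog)' /etc/nova/nova.conf
   debug = True
   use_syslog = False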


@@ -0,0 +1,58 @@
==========================
OpenStack services logging
==========================
Depending on your needs, use the following logging locations for the OpenStack
services:
* On the Fuel Master node, the log files of all OpenStack services are located
in ``/var/log/remote/<NODE_HOSTNAME_OR_IP>/<SERVICE_NAME>.log``.
* On each node of your environment, the log files are located in the
``/var/log/<SERVICE_NAME>-all.log`` file and the ``/var/log/<SERVICE_NAME>/``
folder. Some OpenStack services, for example, Horizon and Ironic, have only
a log folder in ``/var/log/<SERVICE_NAME>/`` and do not have a
``/var/log/<SERVICE_NAME>-all.log`` file.
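For example, to follow the ``nova-api`` log of a particular node from the
Fuel Master node (the ``node-1`` host name is only an illustration):

.. code-block:: console

   # tail -f /var/log/remote/node-1/nova-api.log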
Some OpenStack services have additional logging locations. The following table
lists these locations:
.. list-table::
:widths: 10 25
:header-rows: 1
* - Service name
- Log files location
* - Corosync/Pacemaker
- Fuel Master node:
* /var/log/remote/<NODE_HOSTNAME_OR_IP>/attrd.log
* /var/log/remote/<NODE_HOSTNAME_OR_IP>/crmd.log
* /var/log/remote/<NODE_HOSTNAME_OR_IP>/cib.log
* /var/log/remote/<NODE_HOSTNAME_OR_IP>/lrmd.log
* /var/log/remote/<NODE_HOSTNAME_OR_IP>/pengine.log
* - Horizon
- Controller node:
* /var/log/apache2/horizon_access.log
* /var/log/apache2/horizon_error.log
* - Keystone
- Controller node:
Since the Keystone service is available through the Apache server,
the Apache logs contain the Keystone logs:
* /var/log/apache2/error.log
* /var/log/apache2/access.log
* /var/log/apache2/keystone_wsgi_admin_access.log
* /var/log/apache2/keystone_wsgi_admin_error.log
* /var/log/apache2/keystone_wsgi_main_access.log
* /var/log/apache2/keystone_wsgi_main_error.log
* - MySQL
- Controller node:
* /var/log/syslog
* - Neutron
- Controller node:
* /var/log/openvswitch
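For example, to search a node's Pacemaker resource manager log for errors
from the Fuel Master node (the ``node-1`` host name is only an
illustration):

.. code-block:: console

   # grep -i error /var/log/remote/node-1/lrmd.log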


@@ -0,0 +1,339 @@
.. _restart-service:
============================
Restart an OpenStack service
============================
Troubleshooting of an OpenStack service usually requires a service restart.
To restart an OpenStack service, complete the steps described in the
following table on *all controller nodes* unless indicated otherwise.
.. caution:: Before restarting a service on the next controller node,
verify that the service is up and running on the node where you
have restarted it using the :command:`service <SERVICE_NAME> status`
command.
.. note:: Since a resource restart requires a considerable amount of time,
some commands listed in the table below do not provide an
immediate output.
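For example, a typical restart and verification sequence for a single
service looks as follows (the process ID in the output is illustrative):

.. code-block:: console

   # service glance-api restart
   # service glance-api status
   glance-api start/running, process 2154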
.. list-table::
:widths: 3 25
:header-rows: 1
* - Service name
- Restart procedure
* - Ceilometer
- #. Log in to a controller node CLI.
#. Restart the Ceilometer services:
.. code-block:: console
# service ceilometer-agent-central restart
# service ceilometer-api restart
# service ceilometer-agent-notification restart
# service ceilometer-collector restart
#. Verify the status of the Ceilometer services. See
:ref:`service-status`.
#. Repeat steps 1-3 on all controller nodes.
* - Cinder
- #. Log in to a controller node CLI.
#. Restart the Cinder services:
.. code-block:: console
# service cinder-api restart
# service cinder-scheduler restart
#. Verify the status of the Cinder services. See
:ref:`service-status`.
#. Repeat steps 1-3 on all controller nodes.
#. On every node with Cinder role, run:
.. code-block:: console
# service cinder-volume restart
# service cinder-backup restart
#. Verify the status of the ``cinder-volume`` and ``cinder-backup``
services.
* - Corosync/Pacemaker
- #. Log in to a controller node CLI.
#. Restart the Corosync and Pacemaker services:
.. code-block:: console
# service corosync restart
# service pacemaker restart
#. Verify the status of the Corosync and Pacemaker services. See
:ref:`service-status`.
#. Repeat steps 1-3 on all controller nodes.
* - Glance
- #. Log in to a controller node CLI.
#. Restart the Glance services:
.. code-block:: console
# service glance-api restart
# service glance-registry restart
#. Verify the status of the Glance services. See
:ref:`service-status`.
#. Repeat steps 1-3 on all controller nodes.
* - Horizon
- Since the Horizon service is available through the Apache server,
you should restart the Apache service on all controller nodes:
#. Log in to a controller node CLI.
#. Restart the Apache server:
.. code-block:: console
# service apache2 restart
#. Verify whether the Apache service is successfully running after
restart:
.. code-block:: console
# service apache2 status
#. Verify whether the Apache ports are opened and listening:
.. code-block:: console
# netstat -nltp | egrep apache2
#. Repeat steps 1-4 on all controller nodes.
* - Ironic
- #. Log in to a controller node CLI.
#. Restart the Ironic services:
.. code-block:: console
# service ironic-api restart
# service ironic-conductor restart
#. Verify the status of the Ironic services. See
:ref:`service-status`.
#. Repeat steps 1-3 on all controller nodes.
#. On any controller node, run the following command for the
``nova-compute`` service configured to work with Ironic:
.. code-block:: console
# crm resource restart p_nova_compute_ironic
#. Verify the status of the ``p_nova_compute_ironic`` service.
* - Keystone
- Since the Keystone service is available through the Apache server,
complete the following steps on all controller nodes:
#. Log in to a controller node CLI.
#. Restart the Apache server:
.. code-block:: console
# service apache2 restart
#. Verify whether the Apache service is successfully running after
restart:
.. code-block:: console
# service apache2 status
#. Verify whether the Apache ports are opened and listening:
.. code-block:: console
# netstat -nltp | egrep apache2
#. Repeat steps 1-4 on all controller nodes.
* - MySQL
- #. Log in to any controller node CLI.
#. Run the following command:
.. code-block:: console
# pcs status | grep -A1 mysql
In the output, the resource ``clone_p_mysqld`` should be in the
``Started`` status.
#. Disable the ``clone_p_mysqld`` resource:
.. code-block:: console
# pcs resource disable clone_p_mysqld
#. Verify that the resource ``clone_p_mysqld`` is in the ``Stopped``
status:
.. code-block:: console
# pcs status | grep -A2 mysql
It may take some time for this resource to be stopped on all
controller nodes.
#. Enable the ``clone_p_mysqld`` resource:
.. code-block:: console
# pcs resource enable clone_p_mysqld
#. Verify that the resource ``clone_p_mysqld`` is in the ``Started``
status again on all controller nodes:
.. code-block:: console
# pcs status | grep -A2 mysql
.. warning:: Use the :command:`pcs` commands instead of the :command:`crm`
commands to restart the service.
The :command:`pcs` tool correctly stops the service according
to the quorum policy, which prevents MySQL failures.
* - Neutron
- Use the following restart steps for the Neutron DHCP agent as an
example for all Neutron agents.
#. Log in to any controller node CLI.
#. Verify the DHCP agent status:
.. code-block:: console
# pcs resource show | grep -A1 neutron-dhcp-agent
The output should contain the list of all controllers in the
``Started`` status.
#. Stop the DHCP agent:
.. code-block:: console
# pcs resource disable clone_neutron-dhcp-agent
#. Verify the Corosync status of the DHCP agent:
.. code-block:: console
# pcs resource show | grep -A1 neutron-dhcp-agent
The output should contain the list of all controllers in the
``Stopped`` status.
#. Verify the ``neutron-dhcp-agent`` status on the OpenStack side:
.. code-block:: console
# neutron agent-list
The output table should contain the DHCP agents for every
controller node with ``xxx`` in the ``alive`` column.
#. Start the DHCP agent on every controller node:
.. code-block:: console
# pcs resource enable clone_neutron-dhcp-agent
#. Verify the DHCP agent status:
.. code-block:: console
# pcs resource show | grep -A1 neutron-dhcp-agent
The output should contain the list of all controllers in the
``Started`` status.
#. Verify the ``neutron-dhcp-agent`` status on the OpenStack side:
.. code-block:: console
# neutron agent-list
The output table should contain the DHCP agents for every
controller node with ``:-)`` in the ``alive`` column and ``True``
in the ``admin_state_up`` column.
* - Nova
- #. Log in to a controller node CLI.
#. Restart the Nova services:
.. code-block:: console
# service nova-api restart
# service nova-cert restart
# service nova-compute restart
# service nova-conductor restart
# service nova-consoleauth restart
# service nova-novncproxy restart
# service nova-scheduler restart
# service nova-spicehtml5proxy restart
# service nova-xvpvncproxy restart
#. Verify the status of the Nova services. See
:ref:`service-status`.
#. Repeat steps 1-3 on all controller nodes.
#. On every compute node, run:
.. code-block:: console
# service nova-compute restart
#. Verify the status of the ``nova-compute`` service.
* - RabbitMQ
- #. Log in to any controller node CLI.
#. Disable the RabbitMQ service:
.. code-block:: console
# pcs resource disable master_p_rabbitmq-server
#. Verify whether the service is stopped:
.. code-block:: console
# pcs status | grep -A2 rabbitmq
#. Enable the service:
.. code-block:: console
# pcs resource enable master_p_rabbitmq-server
During the startup process, the output of the :command:`pcs status`
command can show all existing RabbitMQ services in the ``Slaves``
mode.
#. Verify the service status:
.. code-block:: console
# rabbitmqctl cluster_status
In the output, the ``running_nodes`` field should contain the host names
of all controllers in the ``rabbit@<HOSTNAME>`` format. The
``partitions`` field should be empty.
* - Swift
- #. Log in to a controller node CLI.
#. Restart the Swift services:
.. code-block:: console
# service swift-account-auditor restart
# service swift-account restart
# service swift-account-reaper restart
# service swift-account-replicator restart
# service swift-container-auditor restart
# service swift-container restart
# service swift-container-reconciler restart
# service swift-container-replicator restart
# service swift-container-sync restart
# service swift-container-updater restart
# service swift-object-auditor restart
# service swift-object restart
# service swift-object-reconstructor restart
# service swift-object-replicator restart
# service swift-object-updater restart
# service swift-proxy restart
#. Verify the status of the Swift services. See
:ref:`service-status`.
#. Repeat steps 1-3 on all controller nodes.
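When the same commands must be run on every controller node, you can also
iterate over the nodes from the Fuel Master node; a minimal sketch, assuming
the ``node-1`` to ``node-3`` host names and key-based SSH access:

.. code-block:: console

   # for node in node-1 node-2 node-3; do ssh $node 'hostname; service apache2 status'; done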


@@ -0,0 +1,249 @@
.. _service-status:
==================================
Verify an OpenStack service status
==================================
To ensure that an OpenStack service is up and running, verify the service
status on *every controller node*. Some OpenStack services require additional
verification on the non-controller nodes. The following table describes the
verification steps for the common OpenStack services.
.. note:: In the table below, the output of the
:command:`service <SERVICE_NAME> status` command should contain the
service status and the process ID unless indicated otherwise.
For example, ``neutron-server start/running, process 283``.
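For services that are exposed through well-known ports, you can combine the
status check with a port check, as the Horizon and Keystone rows below do
for the Apache server:

.. code-block:: console

   # service apache2 status
   # netstat -nltp | egrep ':80|:443|5000|35357'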
.. list-table:: **Verifying an OpenStack service status**
:widths: 3 25
:header-rows: 1
* - Service name
- Verification procedure
* - Ceilometer
- #. On every MongoDB node, run:
.. code-block:: console
# service mongodb status
# netstat -nltp | grep mongo
The output of the :command:`netstat` command should contain the
management and local IP addresses and ports in the
``LISTEN`` status.
#. On every controller node, run:
.. code-block:: console
# service ceilometer-agent-central status
# service ceilometer-api status
# service ceilometer-agent-notification status
# service ceilometer-collector status
#. On every compute node, run:
.. code-block:: console
# service ceilometer-polling status
#. On any controller node, run :command:`pcs status | grep ceilometer`
or :command:`crm status | grep ceilometer` to identify the node that
currently handles requests and to verify its status. The output should
contain the node ID and the ``Started`` status.
* - Cinder
- #. On every controller node, run:
.. code-block:: console
# service cinder-api status
# service cinder-scheduler status
#. On every node with the Cinder role, run:
.. code-block:: console
# service cinder-volume status
# service cinder-backup status
* - Corosync/Pacemaker
- On every controller node:
#. Run :command:`service corosync status` and
:command:`service pacemaker status`.
#. Verify the output of the :command:`pcs status` or
:command:`crm status` command. The ``Online`` field should contain
all the controllers' host names.
#. Verify the output of the :command:`pcs resource show` or
:command:`crm resource show` command. All resources should be
``Started``.
* - Glance
- On every controller node, run:
.. code-block:: console
# service glance-api status
# service glance-registry status
* - Heat
- #. On any controller node, verify the status of Heat engines:
.. code-block:: console
# source openrc
# heat service-list
The output should contain a table listing the Heat engines of all
controller nodes in the ``up`` status.
#. On every controller node, run:
.. code-block:: console
# service heat-api status
# service heat-api-cfn status
# service heat-api-cloudwatch status
# service heat-engine status
* - Horizon
- Since the Horizon service is available through the Apache server,
you should verify the Apache service status as well. Complete the
following steps on all controller nodes:
#. Verify whether the Apache service is running using the
:command:`service apache2 status` command.
#. Verify whether the Horizon ports are opened and listening using the
:command:`netstat -nltp | egrep ':80|:443'` command. The output
should contain the management and local IP addresses with either
port 80 or 443 in the ``LISTEN`` status.
* - Ironic
- #. On every controller node, run :command:`service ironic-api status`.
#. On every Ironic node, run :command:`service ironic-conductor status`.
#. On any controller node, run :command:`pcs status | grep ironic`.
The output should contain the name or ID of the node where the
``p_nova_compute_ironic`` resource is running.
* - Keystone
- Since the Keystone service is available through the Apache server,
you should verify the Apache service status as well. Complete the
following steps on all controller nodes (and on the nodes with the
Keystone role, if any):
#. Verify whether the Apache service is running using
:command:`service apache2 status`.
#. Verify whether the Keystone ports are opened and listening using
:command:`netstat -nltp | egrep '5000|35357'`. The output should
contain the management and local IP addresses with the ports 5000
and 35357 in the ``LISTEN`` status.
* - MySQL/Galera
- On any controller node:
#. Verify the output of the :command:`pcs status | grep -A1 clone_p_mysql`
or :command:`crm status | grep -A1 clone_p_mysql` command. The resource
``clone_p_mysqld`` should be in the ``Started`` status for all
controllers.
#. Verify the output of the
:command:`mysql -e "show status" | egrep 'wsrep_(local_state|incoming_address)'`
command. The ``wsrep_local_state_comment`` variable should be
``Synced``, the ``wsrep_incoming_address`` field should contain all
IP addresses of the controller nodes (in the management network).
* - Neutron
- #. On every compute node, run:
.. code-block:: console
# service neutron-openvswitch-agent status
#. On every controller node:
#. Verify the ``neutron-server`` service status:
.. code-block:: console
# service neutron-server status
#. Verify the statuses of the Neutron agents:
.. code-block:: console
# service neutron-metadata-agent status
# service neutron-dhcp-agent status
# service neutron-l3-agent status
# service neutron-openvswitch-agent status
#. On any controller node:
#. Verify the states of the Neutron agents:
.. code-block:: console
# source openrc
# neutron agent-list
The output table should list all the Neutron agents with the
``:-)`` value in the ``alive`` column and the ``True`` value in
the ``admin_state_up`` column.
#. Verify the Corosync/Pacemaker status:
.. code-block:: console
# pcs status | grep -A2 neutron
The output should contain the Neutron resources in the ``Started``
status for all controller nodes.
* - Nova
- * Using the Fuel CLI:
#. On every controller node, run:
.. code-block:: console
# service nova-api status
# service nova-cert status
# service nova-compute status
# service nova-conductor status
# service nova-consoleauth status
# service nova-novncproxy status
# service nova-scheduler status
# service nova-spicehtml5proxy status
# service nova-xvpvncproxy status
#. On every compute node, run :command:`service nova-compute status`.
* Using the Nova CLI:
.. code-block:: console
# source openrc
# nova service-list
The output should contain a table listing the Nova services. The status
of the services should be ``enabled`` and their state should be ``up``.
* - RabbitMQ
- * On any controller node, run :command:`rabbitmqctl cluster_status`.
In the output, the ``running_nodes`` field should contain the host names
of all the controllers in the ``rabbit@<HOSTNAME>`` format. The
``partitions`` field should be empty.
* - Swift
- * On every controller node, run:
.. code-block:: console
# service swift-account-auditor status
# service swift-account status
# service swift-account-reaper status
# service swift-account-replicator status
# service swift-container-auditor status
# service swift-container status
# service swift-container-reconciler status
# service swift-container-replicator status
# service swift-container-sync status
# service swift-container-updater status
# service swift-object-auditor status
# service swift-object status
# service swift-object-reconstructor status
# service swift-object-replicator status
# service swift-object-updater status
# service swift-proxy status
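To avoid typing each Swift status command separately, you can loop over
the service names; a minimal sketch covering the core services listed
above:

.. code-block:: console

   # for s in swift-account swift-container swift-object swift-proxy; do service $s status; done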
.. seealso:: :ref:`restart-service`