diff --git a/userdocs/fuel-user-guide/troubleshooting.rst b/userdocs/fuel-user-guide/troubleshooting.rst
index b0b517ab4..96a77c929 100644
--- a/userdocs/fuel-user-guide/troubleshooting.rst
+++ b/userdocs/fuel-user-guide/troubleshooting.rst
@@ -12,4 +12,8 @@ This section includes the following topics:
 .. toctree::
    :maxdepth: 3
 
-   troubleshooting/network.rst
\ No newline at end of file
+   troubleshooting/service-status.rst
+   troubleshooting/restart-service.rst
+   troubleshooting/logging.rst
+   troubleshooting/debug-mode.rst
+   troubleshooting/network.rst
diff --git a/userdocs/fuel-user-guide/troubleshooting/debug-mode.rst b/userdocs/fuel-user-guide/troubleshooting/debug-mode.rst
new file mode 100644
index 000000000..e1b36ebd8
--- /dev/null
+++ b/userdocs/fuel-user-guide/troubleshooting/debug-mode.rst
@@ -0,0 +1,45 @@
+.. _debug-mode:
+
+==========================================
+Enable debug mode for an OpenStack service
+==========================================
+
+Most OpenStack services use the same configuration option to enable debug
+logging, which you can use to troubleshoot your OpenStack environment.
+
+**To enable the debug mode for an OpenStack service:**
+
+#. Log in to each controller node.
+#. Locate and open the required OpenStack service configuration file in the
+   ``/etc`` directory, for example, ``/etc/nova/nova.conf``.
+#. In the ``DEFAULT`` section, change the value of the ``debug`` parameter
+   to ``True``:
+
+   .. code-block:: ini
+
+      debug = True
+
+   If the configuration file that you edit contains the ``use_syslog``
+   parameter, change its value to ``False``:
+
+   .. code-block:: ini
+
+      use_syslog = False
+
+   Disabling syslog protects the Fuel Master node from being overloaded with
+   debug messages. You can also apply these changes from the command line as
+   shown in the example at the end of this section.
+
+#. Save the changes.
+#. The following services require additional configuration to enable the
+   debug mode:
+
+   * For Cinder, edit the configuration file on *each node with the Cinder
+     role*.
+   * For Glance, edit two configuration files: ``/etc/glance/glance-api.conf``
+     and ``/etc/glance/glance-registry.conf``.
+   * For Ironic, edit the ``/etc/nova/nova.conf`` file of the ``nova-compute``
+     service configured to work with Ironic.
+
+#. Restart the service. See :ref:`restart-service`.
+
+.. caution:: Remember to restore the original values in the OpenStack service
+   configuration file when troubleshooting is complete.
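+
+If you prefer to apply these settings from the command line, the following
+minimal sketch shows the equivalent edits for the Nova configuration file.
+It assumes that the ``crudini`` utility is installed on the node, which is
+not the case in a default deployment:
+
+.. code-block:: bash
+
+   # A sketch only; adjust the file name to the service that you debug.
+   # Assumes the crudini package is installed, for example through
+   # "apt-get install crudini".
+   crudini --set /etc/nova/nova.conf DEFAULT debug True
+   crudini --set /etc/nova/nova.conf DEFAULT use_syslog False
+   # Restart the affected services afterwards. See the restart procedure.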
diff --git a/userdocs/fuel-user-guide/troubleshooting/logging.rst b/userdocs/fuel-user-guide/troubleshooting/logging.rst
new file mode 100644
index 000000000..ad4e05ff9
--- /dev/null
+++ b/userdocs/fuel-user-guide/troubleshooting/logging.rst
@@ -0,0 +1,58 @@
+==========================
+OpenStack services logging
+==========================
+
+Depending on your needs, use the following logging locations for the
+OpenStack services:
+
+* On the Fuel Master node, the log files of all OpenStack services are
+  located in ``/var/log/remote//SERVICE_NAME.log``.
+
+* On each node of your environment, the log files are located in the
+  ``/var/log/-all.log`` file and the ``/var/log//`` folder. Some OpenStack
+  services, for example, Horizon and Ironic, have only a log folder in
+  ``/var/log//`` and do not have a ``/var/log/-all.log`` file.
+
+Some OpenStack services have additional logging locations. The following
+table lists these locations:
+
+.. list-table::
+   :widths: 10 25
+   :header-rows: 1
+
+   * - Service name
+     - Log files location
+   * - Corosync/Pacemaker
+     - Fuel Master node:
+
+       * /var/log/remote//attrd.log
+       * /var/log/remote//crmd.log
+       * /var/log/remote//cib.log
+       * /var/log/remote//lrmd.log
+       * /var/log/remote//pengine.log
+   * - Horizon
+     - Controller node:
+
+       * /var/log/apache2/horizon_access.log
+       * /var/log/apache2/horizon_error.log
+   * - Keystone
+     - Controller node:
+
+       Since the Keystone service is available through the Apache server,
+       the Apache logs contain the Keystone logs:
+
+       * /var/log/apache2/error.log
+       * /var/log/apache2/access.log
+       * /var/log/apache2/keystone_wsgi_admin_access.log
+       * /var/log/apache2/keystone_wsgi_admin_error.log
+       * /var/log/apache2/keystone_wsgi_main_access.log
+       * /var/log/apache2/keystone_wsgi_main_error.log
+   * - MySQL
+     - Controller node:
+
+       * /var/log/syslog
+   * - Neutron
+     - Controller node:
+
+       * /var/log/openvswitch
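+
+To follow a service log while you reproduce an issue, you can read the log
+file directly on the corresponding node. The following is a minimal example
+for the ``nova-api`` log; the exact file name may differ in your environment:
+
+.. code-block:: console
+
+   # tail -f /var/log/nova/nova-api.log | grep -i error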
diff --git a/userdocs/fuel-user-guide/troubleshooting/restart-service.rst b/userdocs/fuel-user-guide/troubleshooting/restart-service.rst
new file mode 100644
index 000000000..61d53dacb
--- /dev/null
+++ b/userdocs/fuel-user-guide/troubleshooting/restart-service.rst
@@ -0,0 +1,339 @@
+.. _restart-service:
+
+============================
+Restart an OpenStack service
+============================
+
+Troubleshooting of an OpenStack service usually requires a service restart.
+To restart an OpenStack service, complete the steps described in the
+following table on *all controller nodes* unless indicated otherwise.
+
+.. caution:: Before restarting a service on the next controller node,
+   verify that the service is up and running on the node where you have
+   just restarted it by using the :command:`service status` command.
+
+.. note:: Since a resource restart requires a considerable amount of time,
+   some commands listed in the table below do not provide immediate output.
+
+.. list-table::
+   :widths: 3 25
+   :header-rows: 1
+
+   * - Service name
+     - Restart procedure
+   * - Ceilometer
+     - #. Log in to a controller node CLI.
+       #. Restart the Ceilometer services:
+
+          .. code-block:: console
+
+             # service ceilometer-agent-central restart
+             # service ceilometer-api restart
+             # service ceilometer-agent-notification restart
+             # service ceilometer-collector restart
+
+       #. Verify the status of the Ceilometer services. See
+          :ref:`service-status`.
+       #. Repeat steps 1 - 3 on all controller nodes.
+   * - Cinder
+     - #. Log in to a controller node CLI.
+       #. Restart the Cinder services:
+
+          .. code-block:: console
+
+             # service cinder-api restart
+             # service cinder-scheduler restart
+
+       #. Verify the status of the Cinder services. See
+          :ref:`service-status`.
+       #. Repeat steps 1 - 3 on all controller nodes.
+       #. On every node with the Cinder role, run:
+
+          .. code-block:: console
+
+             # service cinder-volume restart
+             # service cinder-backup restart
+
+       #. Verify the status of the ``cinder-volume`` and ``cinder-backup``
+          services.
+   * - Corosync/Pacemaker
+     - #. Log in to a controller node CLI.
+       #. Restart the Corosync and Pacemaker services:
+
+          .. code-block:: console
+
+             # service corosync restart
+             # service pacemaker restart
+
+       #. Verify the status of the Corosync and Pacemaker services. See
+          :ref:`service-status`.
+       #. Repeat steps 1 - 3 on all controller nodes.
+   * - Glance
+     - #. Log in to a controller node CLI.
+       #. Restart the Glance services:
+
+          .. code-block:: console
+
+             # service glance-api restart
+             # service glance-registry restart
+
+       #. Verify the status of the Glance services. See
+          :ref:`service-status`.
+       #. Repeat steps 1 - 3 on all controller nodes.
+   * - Horizon
+     - Since the Horizon service is available through the Apache server,
+       you should restart the Apache service on all controller nodes:
+
+       #. Log in to a controller node CLI.
+       #. Restart the Apache server:
+
+          .. code-block:: console
+
+             # service apache2 restart
+
+       #. Verify whether the Apache service is running successfully after
+          the restart:
+
+          .. code-block:: console
+
+             # service apache2 status
+
+       #. Verify whether the Apache ports are open and listening:
+
+          .. code-block:: console
+
+             # netstat -nltp | egrep apache2
+
+       #. Repeat steps 1 - 4 on all controller nodes.
+   * - Ironic
+     - #. Log in to a controller node CLI.
+       #. Restart the Ironic services:
+
+          .. code-block:: console
+
+             # service ironic-api restart
+             # service ironic-conductor restart
+
+       #. Verify the status of the Ironic services. See
+          :ref:`service-status`.
+       #. Repeat steps 1 - 3 on all controller nodes.
+       #. On any controller node, run the following command for the
+          ``nova-compute`` service configured to work with Ironic:
+
+          .. code-block:: console
+
+             # crm resource restart p_nova_compute_ironic
+
+       #. Verify the status of the ``p_nova_compute_ironic`` service.
+
+   * - Keystone
+     - Since the Keystone service is available through the Apache server,
+       complete the following steps on all controller nodes:
+
+       #. Log in to a controller node CLI.
+       #. Restart the Apache server:
+
+          .. code-block:: console
+
+             # service apache2 restart
+
+       #. Verify whether the Apache service is running successfully after
+          the restart:
+
+          .. code-block:: console
+
+             # service apache2 status
+
+       #. Verify whether the Apache ports are open and listening:
+
+          .. code-block:: console
+
+             # netstat -nltp | egrep apache2
+
+       #. Repeat steps 1 - 4 on all controller nodes.
+
+   * - MySQL
+     - #. Log in to any controller node CLI.
+       #. Run the following command:
+
+          .. code-block:: console
+
+             # pcs status | grep -A1 mysql
+
+          In the output, the ``clone_p_mysqld`` resource should be in the
+          ``Started`` status.
+       #. Disable the ``clone_p_mysqld`` resource:
+
+          .. code-block:: console
+
+             # pcs resource disable clone_p_mysqld
+
+       #. Verify that the ``clone_p_mysqld`` resource is in the ``Stopped``
+          status:
+
+          .. code-block:: console
+
+             # pcs status | grep -A2 mysql
+
+          It may take some time for this resource to be stopped on all
+          controller nodes.
+       #. Enable the ``clone_p_mysqld`` resource:
+
+          .. code-block:: console
+
+             # pcs resource enable clone_p_mysqld
+
+       #. Verify that the ``clone_p_mysqld`` resource is in the ``Started``
+          status again on all controller nodes:
+
+          .. code-block:: console
+
+             # pcs status | grep -A2 mysql
+
+       .. warning:: Use the :command:`pcs` commands instead of
+          :command:`crm` to restart the service. The pcs tool correctly
+          stops the service according to the quorum policy, which prevents
+          MySQL failures.
+   * - Neutron
+     - Use the following steps for restarting the Neutron DHCP agent as an
+       example for restarting all Neutron agents.
+
+       #. Log in to any controller node CLI.
+       #. Verify the DHCP agent status:
+
+          .. code-block:: console
+
+             # pcs resource show | grep -A1 neutron-dhcp-agent
+
+          The output should contain the list of all controllers in the
+          ``Started`` status.
+       #. Stop the DHCP agent:
+
+          .. code-block:: console
+
+             # pcs resource disable clone_neutron-dhcp-agent
+
+       #. Verify the Corosync status of the DHCP agent:
+
+          .. code-block:: console
+
+             # pcs resource show | grep -A1 neutron-dhcp-agent
+
+          The output should contain the list of all controllers in the
+          ``Stopped`` status.
+       #. Verify the ``neutron-dhcp-agent`` status on the OpenStack side:
+
+          .. code-block:: console
+
+             # neutron agent-list
+
+          The output table should contain the DHCP agents for every
+          controller node with ``xxx`` in the ``alive`` column.
+       #. Start the DHCP agent on every controller node:
+
+          .. code-block:: console
+
+             # pcs resource enable clone_neutron-dhcp-agent
+
+       #. Verify the DHCP agent status:
+
+          .. code-block:: console
+
+             # pcs resource show | grep -A1 neutron-dhcp-agent
+
+          The output should contain the list of all controllers in the
+          ``Started`` status.
+       #. Verify the ``neutron-dhcp-agent`` status on the OpenStack side:
+
+          .. code-block:: console
+
+             # neutron agent-list
+
+          The output table should contain the DHCP agents for every
+          controller node with ``:-)`` in the ``alive`` column and ``True``
+          in the ``admin_state_up`` column.
+   * - Nova
+     - #. Log in to a controller node CLI.
+       #. Restart the Nova services:
+
+          .. code-block:: console
+
+             # service nova-api restart
+             # service nova-cert restart
+             # service nova-compute restart
+             # service nova-conductor restart
+             # service nova-consoleauth restart
+             # service nova-novncproxy restart
+             # service nova-scheduler restart
+             # service nova-spicehtml5proxy restart
+             # service nova-xenvncproxy restart
+
+       #. Verify the status of the Nova services. See
+          :ref:`service-status`.
+       #. Repeat steps 1 - 3 on all controller nodes.
+       #. On every compute node, run:
+
+          .. code-block:: console
+
+             # service nova-compute restart
+
+       #. Verify the status of the ``nova-compute`` service.
+   * - RabbitMQ
+     - #. Log in to any controller node CLI.
+       #. Disable the RabbitMQ service:
+
+          .. code-block:: console
+
+             # pcs resource disable master_p_rabbitmq-server
+
+       #. Verify whether the service is stopped:
+
+          .. code-block:: console
+
+             # pcs status | grep -A2 rabbitmq
+
+       #. Enable the service:
+
+          .. code-block:: console
+
+             # pcs resource enable master_p_rabbitmq-server
+
+          During the startup process, the output of the :command:`pcs status`
+          command can show all existing RabbitMQ services in the ``Slaves``
+          mode.
+
+       #. Verify the service status:
+
+          .. code-block:: console
+
+             # rabbitmqctl cluster_status
+
+          In the output, the ``running_nodes`` field should contain all the
+          controllers' host names in the ``rabbit@`` format. The
+          ``partitions`` field should be empty.
+   * - Swift
+     - #. Log in to a controller node CLI.
+       #. Restart the Swift services:
+
+          .. code-block:: console
+
+             # service swift-account-auditor restart
+             # service swift-account restart
+             # service swift-account-reaper restart
+             # service swift-account-replicator restart
+             # service swift-container-auditor restart
+             # service swift-container restart
+             # service swift-container-reconciler restart
+             # service swift-container-replicator restart
+             # service swift-container-sync restart
+             # service swift-container-updater restart
+             # service swift-object-auditor restart
+             # service swift-object restart
+             # service swift-object-reconstructor restart
+             # service swift-object-replicator restart
+             # service swift-object-updater restart
+             # service swift-proxy restart
+
+       #. Verify the status of the Swift services. See
+          :ref:`service-status`.
+       #. Repeat steps 1 - 3 on all controller nodes.
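+
+When you restart several services of one component in a row, a small wrapper
+can reduce typing and immediately show the resulting state. The following is
+a minimal sketch only; the ``SERVICES`` list is an example and must be
+adjusted to the component that you troubleshoot:
+
+.. code-block:: bash
+
+   #!/bin/bash
+   # Minimal sketch: restart a set of services on the current node and print
+   # their status so that you can verify each one before you proceed to the
+   # next controller node. The SERVICES list below is only an example.
+   SERVICES="glance-api glance-registry"
+   for svc in ${SERVICES}; do
+       service "${svc}" restart
+       service "${svc}" status
+   done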
diff --git a/userdocs/fuel-user-guide/troubleshooting/service-status.rst b/userdocs/fuel-user-guide/troubleshooting/service-status.rst
new file mode 100644
index 000000000..c6d9a489f
--- /dev/null
+++ b/userdocs/fuel-user-guide/troubleshooting/service-status.rst
@@ -0,0 +1,249 @@
+.. _service-status:
+
+==================================
+Verify an OpenStack service status
+==================================
+
+To ensure that an OpenStack service is up and running, verify the service
+status on *every controller node*. Some OpenStack services require additional
+verification on the non-controller nodes. The following table describes the
+verification steps for the common OpenStack services.
+
+.. note:: In the table below, the output of the
+   :command:`service status` command should contain the
+   service status and the process ID unless indicated otherwise.
+   For example, ``neutron-server start/running, process 283``.
+
+.. list-table:: **Verifying an OpenStack service status**
+   :widths: 3 25
+   :header-rows: 1
+
+   * - Service name
+     - Verification procedure
+   * - Ceilometer
+     - #. On every MongoDB node, run:
+
+          .. code-block:: console
+
+             # service mongodb status
+             # netstat -nltp | grep mongo
+
+          The output of the :command:`netstat` command returns the
+          management and local IP addresses and ports in the
+          ``LISTEN`` status.
+
+       #. On every controller node, run:
+
+          .. code-block:: console
+
+             # service ceilometer-agent-central status
+             # service ceilometer-api status
+             # service ceilometer-agent-notification status
+             # service ceilometer-collector status
+
+       #. On every compute node, run:
+
+          .. code-block:: console
+
+             # service ceilometer-polling status
+
+       #. On any controller node, run :command:`pcs status | grep ceilometer`
+          or :command:`crm status | grep ceilometer` to verify which node
+          currently handles requests and its status. The output should
+          contain the node ID and the ``Started`` status.
+   * - Cinder
+     - #. On every controller node, run:
+
+          .. code-block:: console
+
+             # service cinder-api status
+             # service cinder-scheduler status
+
+       #. On every node with the Cinder role, run:
+
+          .. code-block:: console
+
+             # service cinder-volume status
+             # service cinder-backup status
+
+   * - Corosync/Pacemaker
+     - On every controller node:
+
+       #. Run :command:`service corosync status` and
+          :command:`service pacemaker status`.
+       #. Verify the output of the :command:`pcs status` or
+          :command:`crm status` command. The ``Online`` field should contain
+          all the controllers' host names.
+       #. Verify the output of the :command:`pcs resource show` or
+          :command:`crm resource show` command. All resources should be
+          ``Started``.
+   * - Glance
+     - On every controller node, run:
+
+       .. code-block:: console
+
+          # service glance-api status
+          # service glance-registry status
+
+   * - Heat
+     - #. On any controller node, verify the status of the Heat engines:
+
+          .. code-block:: console
+
+             # source openrc
+             # heat service-list
+
+          The output should contain a table with the list of the Heat
+          engines for all controller nodes in the ``up`` status.
+       #. On every controller node, run:
+
+          .. code-block:: console
+
+             # service heat-api status
+             # service heat-api-cfn status
+             # service heat-api-cloudwatch status
+             # service heat-engine status
+
+   * - Horizon
+     - Since the Horizon service is available through the Apache server,
+       you should verify the Apache service status as well. Complete the
+       following steps on all controller nodes:
+
+       #. Verify whether the Apache service is running using the
+          :command:`service apache2 status` command.
+       #. Verify whether the Horizon ports are open and listening using the
+          :command:`netstat -nltp | egrep ':80|:443'` command. The output
+          should contain the management and local IP addresses with either
+          port 80 or 443 in the ``LISTEN`` status.
+   * - Ironic
+     - #. On every controller node, run :command:`service ironic-api status`.
+       #. On every Ironic node, run :command:`service ironic-conductor status`.
+       #. On any controller node, run :command:`pcs status | grep ironic`.
+          The output should contain the name or ID of the node where the
+          ``p_nova_compute_ironic`` resource is running.
+   * - Keystone
+     - Since the Keystone service is available through the Apache server,
+       you should verify the Apache service status as well. Complete the
+       following steps on all controller nodes (and the nodes with the
+       Keystone role if any):
+
+       #. Verify whether the Apache service is running using
+          :command:`service apache2 status`.
+       #. Verify whether the Keystone ports are open and listening using
+          :command:`netstat -nltp | egrep '5000|35357'`. The output should
+          contain the management and local IP addresses with the ports 5000
+          and 35357 in the ``LISTEN`` status.
+   * - MySQL/Galera
+     - On any controller node:
+
+       #. Verify the output of the :command:`pcs status|grep -A1 clone_p_mysql`
+          or :command:`crm status|grep -A1 clone_p_mysql` command. The
+          ``clone_p_mysqld`` resource should be in the ``Started`` status for
+          all controllers.
+       #. Verify the output of the
+          :command:`mysql -e "show status" | egrep 'wsrep_(local_state|incoming_address)'`
+          command. The ``wsrep_local_state_comment`` variable should be
+          ``Synced`` and the ``wsrep_incoming_address`` field should contain
+          the IP addresses of all the controller nodes (in the management
+          network).
+   * - Neutron
+     - #. On every compute node, run:
+
+          .. code-block:: console
+
+             # service neutron-openvswitch-agent status
+
+       #. On every controller node:
+
+          #. Verify the ``neutron-server`` service status:
+
+             .. code-block:: console
+
+                # service neutron-server status
+
+          #. Verify the statuses of the Neutron agents:
+
+             .. code-block:: console
+
+                # service neutron-metadata-agent status
+                # service neutron-dhcp-agent status
+                # service neutron-l3-agent status
+                # service neutron-openvswitch-agent status
+
+       #. On any controller node:
+
+          #. Verify the states of the Neutron agents:
+
+             .. code-block:: console
+
+                # source openrc
+                # neutron agent-list
+
+             The output table should list all the Neutron agents with the
+             ``:-)`` value in the ``alive`` column and the ``True`` value in
+             the ``admin_state_up`` column.
+
+          #. Verify the Corosync/Pacemaker status:
+
+             .. code-block:: console
+
+                # pcs status | grep -A2 neutron
+
+             The output should contain the Neutron resources in the
+             ``Started`` status for all controller nodes.
+   * - Nova
+     - * Using the Fuel CLI:
+
+         #. On every controller node, run:
+
+            .. code-block:: console
+
+               # service nova-api status
+               # service nova-cert status
+               # service nova-compute status
+               # service nova-conductor status
+               # service nova-consoleauth status
+               # service nova-novncproxy status
+               # service nova-scheduler status
+               # service nova-spicehtml5proxy status
+               # service nova-xenvncproxy status
+
+         #. On every compute node, run :command:`service nova-compute status`.
+
+       * Using the Nova CLI:
+
+         .. code-block:: console
+
+            # source openrc
+            # nova service-list
+
+         The output should contain a table with the list of the Nova
+         services. The status of the services should be ``enabled`` and
+         their state should be ``up``.
+   * - RabbitMQ
+     - * On any controller node, run :command:`rabbitmqctl cluster_status`.
+
+         In the output, the ``running_nodes`` field should contain all the
+         controllers' host names in the ``rabbit@`` format. The
+         ``partitions`` field should be empty.
+   * - Swift
+     - * On every controller node, run:
+
+         .. code-block:: console
+
+            # service swift-account-auditor status
+            # service swift-account status
+            # service swift-account-reaper status
+            # service swift-account-replicator status
+            # service swift-container-auditor status
+            # service swift-container status
+            # service swift-container-reconciler status
+            # service swift-container-replicator status
+            # service swift-container-sync status
+            # service swift-container-updater status
+            # service swift-object-auditor status
+            # service swift-object status
+            # service swift-object-reconstructor status
+            # service swift-object-replicator status
+            # service swift-object-updater status
+            # service swift-proxy status
+
+.. seealso:: :ref:`restart-service`
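+
+To check several services and their listening ports on a node in one pass,
+you can combine the commands above in a small script. The following is a
+minimal sketch only; the ``SERVICES`` and ``PORTS`` values are examples for
+Keystone and must be adjusted to the component that you verify:
+
+.. code-block:: bash
+
+   #!/bin/bash
+   # Minimal sketch: print the status of the listed services and the related
+   # listening sockets on the current node. SERVICES and PORTS are examples.
+   SERVICES="apache2"
+   PORTS="5000|35357"
+   for svc in ${SERVICES}; do
+       service "${svc}" status
+   done
+   netstat -nltp | egrep "${PORTS}"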