Add troubleshooting section

- add a separate troubleshooting section to the documentation contents
- describe possible problems with the plugin and the NSX platform
- remove the sentence about the pre-built plugin package from the last page

Change-Id: I26ab3c774206916c62c88fafd31f444af6f633a8
Cherry-picked from 5474fac
Igor Zinovik 2016-02-19 18:14:49 +03:00
parent 1be1b29e74
commit b726058f89
2 changed files with 226 additions and 4 deletions


@@ -23,6 +23,9 @@ Through documentation we use terms "NSX" and "NSXv" interchangeably, both of
 these terms refer to `VMware NSX virtualized network platform
 <https://www.vmware.com/products/nsx>`_.
+Pre-built package of the plugin you can find in
+`Fuel Plugin Catalog <https://www.mirantis.com/products/openstack-drivers-and-plugins/fuel-plugins>`_.
+
 Documentation contents
 ======================
@@ -35,7 +38,4 @@ Documentation contents
    source/configuration
    source/usage
    source/release-notes
-
-Pre-built package of the plugin you can find in
-`Fuel Plugin Catalog <https://www.mirantis.com/products/openstack-drivers-and-plugins/fuel-plugins>`_.
+   source/troubleshooting


@@ -0,0 +1,222 @@
Troubleshooting
===============
Neutron NSX plugin issues
-------------------------
Neutron NSX plugin does not have a separate log file; its messages are logged
by the neutron server. The default neutron server log file on OpenStack
controllers is ``/var/log/neutron/server.log``.
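
To quickly locate plugin-related errors, you can grep the neutron server log
on each controller, for example::

 root@node-1:~# grep ERROR /var/log/neutron/server.log | less
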
Inability to resolve NSX Manager hostname
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you see the following message::
2016-02-19 ... ERROR neutron.service [-] Unrecoverable error: please check log for details.
2016-02-19 ... ERROR neutron.service Traceback (most recent call last):
...
2016-02-19 ... ERROR neutron ServerNotFoundError: Unable to find the server at nsxmanager.mydom.org
2016-02-19 ... ERROR neutron
It means that the controller cannot resolve the NSX Manager hostname
(``nsxmanager.mydom.org`` in this example) that is specified in the
configuration file. Check that the DNS server IP addresses you specified in
the *Host OS DNS Servers* section of the Fuel web UI are correct and reachable
by all controllers (keep in mind that the default route for controllers is via
the *Public* network). Also verify that the hostname you entered is correct by
trying to resolve it with the ``host`` or ``dig`` utilities.
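
For example, to check the resolution from a controller
(``nsxmanager.mydom.org`` stands for your NSX Manager hostname, and the
address shown is illustrative)::

 root@node-1:~# host nsxmanager.mydom.org
 nsxmanager.mydom.org has address 172.16.0.249
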
SSL/TLS certificate problems
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
::
2016-02-19 ... ERROR neutron File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py",
line 1251, in _conn_request
2016... 10939 ERROR neutron conn.connect()
2016-02-19 ... ERROR neutron File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py",
line 1043, in connect
2016-02-19 ... ERROR neutron raise SSLHandshakeError(e)
2016-02-19 ... ERROR neutron SSLHandshakeError: [Errno 1] _ssl.c:510: error:
14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
This error indicates that you enabled SSL/TLS certificate verification, but
certificate verification failed during the connection to NSX Manager. Possible
reasons:

#. The NSX Manager certificate has expired. Log into the NSX Manager web GUI
   and check the certificate validity dates.
#. The certification authority (CA) certificate has expired or is invalid.
   The CA certificate is specified by the ``ca_file`` directive in ``nsx.ini``.
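
A quick way to check the validity dates without logging into the web GUI is
``openssl`` from any controller (the hostname below is the example one used
above, and the dates in the output are illustrative)::

 root@node-1:~# echo | openssl s_client -connect nsxmanager.mydom.org:443 2>/dev/null | openssl x509 -noout -dates
 notBefore=Feb 19 00:00:00 2015 GMT
 notAfter=Feb 19 23:59:59 2016 GMT
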
User access problems
~~~~~~~~~~~~~~~~~~~~
::
2016-02-19 ... CRITICAL neutron [-] Forbidden: Forbidden: https://172.16.0.249/api/1.0/
appliance-management/summary/system
...
2016-02-19 ... ERROR neutron File "/usr/lib/python2.7/dist-packages/vmware_nsx/plugins/
nsx_v/vshield/common/VcnsApiClient.py", line 119, in request
2016-02-19 ... ERROR neutron raise cls(uri=uri, status=status, header=header, response=response)
2016-02-19 ... ERROR neutron Forbidden: Forbidden: https://172.16.0.249/api/1.0/
appliance-management/summary/system
Possible causes:

* The username is incorrect.
* The password is incorrect.
* The user account does not have sufficient privileges to perform certain
  operations.
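
You can check the credentials outside of neutron with ``curl`` against the
same API endpoint that appears in the traceback (the credentials below are
placeholders, take the real ones from ``nsx.ini``; ``-k`` skips certificate
verification)::

 root@node-1:~# curl -ks -o /dev/null -w '%{http_code}\n' -u 'admin:secret' https://172.16.0.249/api/1.0/appliance-management/summary/system
 200

A ``403`` response with the same credentials that neutron uses confirms the
problem is on the NSX Manager side.
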
Non-existent vCenter entity specified
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If a vCenter entity referenced in the plugin configuration does not exist,
the plugin reports the following message (the name of the missing setting
varies):

::
2016-02-19 ... ERROR neutron File "/usr/lib/python2.7/dist-packages/vmware_nsx/plugins/
nsx_v/plugin.py", line 2084, in _validate_config
2016-02-19 ... ERROR neutron raise nsx_exc.NsxPluginException(err_msg=error)
2016-02-19 ... ERROR neutron NsxPluginException: An unexpected error occurred in the NSX
Plugin: Configured datacenter_moid not found
2016-02-19 ... ERROR neutron
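
To see which managed object IDs the plugin is configured with, grep the
plugin configuration (the path below is the usual location of ``nsx.ini``,
but it may differ between plugin versions)::

 root@node-1:~# grep -E 'datacenter_moid|cluster_moid|resource_pool_id|datastore_id' /etc/neutron/plugins/vmware/nsx.ini
 datacenter_moid = datacenter-2

You can look up the actual IDs in the vSphere Managed Object Browser at
``https://<vcenter-host>/mob``.
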
Non-existent transport zone
~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the transport zone does not exist, the plugin fails with the following
message:
::
2016-02-19 ... CRITICAL neutron [req-81bbb7f6-...] NsxPluginException: An unexpected error
occurred in the NSX Plugin: Configured vdn_scope_id not found
...
2016-02-19 ... ERROR neutron Traceback (most recent call last):
2016-02-19 ... ERROR neutron raise nsx_exc.NsxPluginException(err_msg=error)
2016-02-19 ... ERROR neutron NsxPluginException: An unexpected error occurred in the NSX
Plugin: Configured vdn_scope_id not found
You can get the list of available transport zones via a GET request to the
NSX Manager API URL ``https://nsx-manager.yourdomain.org/api/2.0/vdn/scopes``,
as shown below.
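
For example (placeholder credentials; ``-k`` skips certificate verification)::

 root@node-1:~# curl -ks -u 'admin:secret' https://nsx-manager.yourdomain.org/api/2.0/vdn/scopes

The response is an XML document that lists each transport zone with its
``vdnscope-*`` ID; the configured ``vdn_scope_id`` must be one of them.
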
Neutron client returns 504 Gateway timeout
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   root@node-1:~# neutron router-create r_app --router_type exclusive

Result::

   <html><body><h1>504 Gateway Time-out</h1>
   The server didn't respond in time.
   </body></html>
This may signal that your NSX Manager or vCenter server is overloaded and
cannot handle incoming requests within a certain amount of time. A possible
solution is to increase the haproxy timeouts for the nova API and neutron by
doubling the values of the following settings:

* timeout client
* timeout client-fin
* timeout server
* timeout server-fin

Edit the configuration files (located in ``/etc/haproxy/conf.d``) and restart
haproxy on all controllers.
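
A sketch of the doubled values, assuming the current timeouts are 3 minutes
(check the actual values in your configuration files first)::

 timeout client 6m
 timeout client-fin 6m
 timeout server 6m
 timeout server-fin 6m

On Fuel controllers haproxy is typically managed by Pacemaker, so restart it
with ``crm resource restart p_haproxy`` rather than with the init script.
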
NSX platform issues
-------------------
Transport network connectivity
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before debugging connectivity problems between VMs spread across ESXi cluster
hosts, verify that the transport (underlay) network functions properly.
You can get the list of vmknic adapters used for VXLAN tunnels with the
``esxcli`` command by providing the DVS name. Then use one of them as the
outgoing interface for ping and try to reach another ESXi host.
::

 ~ # esxcli network vswitch dvs vmware vxlan vmknic list --vds-name computeDVS
 Vmknic Name  Switch Port ID  VDS Port ID  Endpoint ID  VLAN ID  IP           Netmask
 -----------  --------------  -----------  -----------  -------  -----------  -------------
 vmk1         50331670        33           0            0        172.16.0.91  255.255.255.0
Provide the ``++netstack=vxlan`` option to operate via the VXLAN networking
stack::

 ~ # ping ++netstack=vxlan -d -s 1550 -I vmk1 172.29.46.12
If the host does not respond, try the following:

* Remove the ``-d`` (disable the don't fragment bit) and ``-s`` (packet size)
  options and ping again; ping will then use 56-byte packets (see the example
  after this list). If the reply is delivered successfully, consider revising
  the MTU on the network switches.
* If ping with smaller packets also fails, check the uplink interface
  configuration (e.g. the VLAN ID).
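
A minimal example of the reduced ping from the same vmknic::

 ~ # ping ++netstack=vxlan -I vmk1 172.29.46.12
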
Verify NSX controllers state
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NSX controllers must form a cluster majority. You can verify the NSX
controller cluster state in the web UI (``Network & Security -> Installation
-> Management``). All controllers must be in the normal status.
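
In addition to the web UI, you can log into each NSX controller over SSH and
check the cluster state from its console (a controller CLI command)::

 nsx-controller # show control-cluster status
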
Verify ESXi hosts connectivity with NSX controllers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Check that each ESXi host has established a connection with the NSX
controllers::

 ~ # esxcli network ip connection list | grep 1234
 tcp   0   0  172.16.0.252:51916  192.168.130.101:1234  ESTABLISHED  77203  newreno  netcpa-worker
Check that all connections are in the ESTABLISHED state. If a connection is
not established:

* Check that the ESXi host can reach the NSX controller.
* Check the firewall rules between the ESXi host and the NSX controller.
* Check that the netcpa agent (the process responsible for communication
  between ESXi and the NSX controller) is running:
  ``/etc/init.d/netcpad status``. If it is not running, start it and verify
  that it is running::

   ~ # /etc/init.d/netcpad start
   ~ # /etc/init.d/netcpad status
   netCP agent service is running
Verify that the control plane is enabled and the connection is up::

 ~ # esxcli network vswitch dvs vmware vxlan network list --vds-name computeDVS
 VXLAN ID  Multicast IP               Control Plane                        Controller Connection  Port Count  MAC Entry Count  ARP Entry Count
 --------  -------------------------  -----------------------------------  ---------------------  ----------  ---------------  ---------------
 5000      N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.130.101 (up)   2           0                0
vSphere/NSX infrastructure is not running after power outage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vCenter and NSX management VMs must be started in a certain order. Please see
the `VMware KB article
<https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2139067>`_.