The input parameter "is_ovs_port" is no longer needed in the method
``IpLinkCommand.set_netns`` since [1].
[1]https://review.opendev.org/c/openstack/neutron/+/905836
Trivial-Fix
Change-Id: I0e36cf8afe76904997e14eca415a0e978f05c55a
If the ``IpLinkCommand.set_netns`` fails, the method restores the
previous device namespace before raising the exception.
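The restore-then-raise behaviour can be sketched as follows. This is a minimal illustration, not the Neutron code: ``FakeDevice`` and ``set_netns_with_rollback`` are hypothetical names, and the failed move is simulated.

```python
class FakeDevice:
    """Hypothetical stand-in for a network device; not Neutron's class."""

    def __init__(self, namespace=None, fail_on_move=False):
        self.namespace = namespace
        self.fail_on_move = fail_on_move

    def move(self, namespace):
        # Simulate a move that changes state and may then fail.
        self.namespace = namespace
        if self.fail_on_move:
            raise RuntimeError("device did not appear in %s" % namespace)


def set_netns_with_rollback(device, namespace):
    """Move the device; on failure restore the old namespace and re-raise."""
    previous = device.namespace
    try:
        device.move(namespace)
    except Exception:
        # Put the device back where it was before propagating the error.
        device.namespace = previous
        raise
```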
Closes-Bug: #2049590
Change-Id: I73b36ef161441b52922d888c11a144eafe8a7ed0
Normal network namespaces are bind-mounted to files under
/var/run/netns. If a process deleting a network namespace gets killed
during that operation there is the chance that the bind mount to the
netns has been removed, but the file under /var/run/netns still exists.
When the neutron-ovn-metadata-agent tries to clean up such network
namespaces it first tries to validate that the network namespace is
empty. For the cases described above this fails, as this network
namespace no longer really exists, but is just a stray file lying
around.
To fix this we treat network namespaces where we get an `OSError` with
errno 22 (Invalid Argument) as empty. The calls to pyroute2 to delete
the namespace will then clean up the file.
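A minimal sketch of the errno check, assuming a hypothetical ``list_devices`` callable that returns the device names in the namespace and raises ``OSError`` when the namespace file has no namespace mounted on it. The real agent code may be structured differently.

```python
import errno


def namespace_is_empty(list_devices):
    """Return True if the namespace has no devices or is only a stray file.

    ``list_devices`` is a hypothetical callable used for illustration.
    """
    try:
        return not list_devices()
    except OSError as exc:
        if exc.errno == errno.EINVAL:
            # errno 22: the file exists but no namespace is bound to it.
            return True
        # Any other error is unexpected and must not be hidden.
        raise
```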
Additionally we add a guard to teardown_datapath to continue even if
this fails. Failing to remove a datapath is not critical and leaves, in
the worst case, a process and a network namespace running. However,
previously it would have also prevented the creation of new datapaths,
which is critical for VM startup.
Closes-Bug: #2037102
Change-Id: I7c43812fed5903f98a2e491076c24a8d926a59b4
IPv4 DAD is non-existent in Linux or its failure is silent, so we
never needed to catch and ignore it. On the other hand, IPv6 DAD
failure is explicit, hence this change.
This of course leaves the metadata service dead on hosts where
duplicate address detection failed. But if we catch the
DADFailed exception and delete the address, at least other
functions of the dhcp-agent should not be affected.
With this the IPv6 isolated metadata service is not redundant, which
is the best we can do without a redesign.
Also document the promised service level of isolated metadata.
Added additional tests for the metadata driver as well.
Change-Id: I6b544c5528cb22e5e8846fc47dfb8b05f70f975c
Partial-Bug: #1953165
When a network device that is an OVS internal port is moved to a
namespace, it may sometimes show "shy port syndrome" [1].
Even though the set_netns method waits for the device to be in the
namespace, the device may be present during this check and then
disappear for a short time, which causes failures, e.g. in
functional tests like the one described in [2].
To avoid that, this patch proposes a simple (and ugly) sleep of 1 second
before checking if the port really exists in the namespace. If it is a
"shy" port, it should already have flapped during that 1 second.
[1] https://bugs.launchpad.net/neutron/+bug/1618987
[2] https://bugs.launchpad.net/neutron/+bug/1961740
Related-Bug: #1961740
Related-Bug: #1998337
Change-Id: I442587e7ef55917f4ea873e190bf8afbc0e911e1
When DAD fails on an IPv6 address, both the 'dadfailed'
and 'tentative' flags will be set. So change the code
to check for 'dadfailed' first, just to be explicit.
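The ordering matters because a failed address carries both flags. A minimal sketch, assuming the flags arrive as a collection of flag-name strings (``address_state`` is a hypothetical helper, not the Neutron function):

```python
def address_state(flags):
    """Classify an IPv6 address from its flag names."""
    # A DAD-failed address is also 'tentative', so test 'dadfailed' first.
    if "dadfailed" in flags:
        return "dadfailed"
    if "tentative" in flags:
        return "tentative"
    return "ready"
```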
Added better unit testing to cover more cases as well.
Trivialfix
Change-Id: I2dddc296826e5ab5e057c32a554e353577cc36e8
This patch also removes the pylint disablement message control
statements and imports NetNS and IPRoute from the new locations
in pyroute2.
Trivial-Fix
Change-Id: I298a7da767473c236ddf03c5702a2904d4870284
Added ``devlink.get_port`` method that provides information about
a devlink port [1]. It is used to retrieve information about a port
representor connected to a local OVS instance (aka: hardware offloaded
ports). This method reports the PF PCI address, the PF index, the VF
index and the PF name; the PF name will be used to enforce the QoS
policies on the SR-IOV parent device (similar to what is done in the
ML2/SRIOV agent).
[1]https://www.kernel.org/doc/html/latest/networking/devlink/devlink-port.html
Related-Bug: #1998608
Change-Id: I34daf554cabcf17cb6371d510d5827457012516d
Created a new add_ip_addresses privileged function
which takes an iterable of CIDRs and adds them
in one privileged call. This is so we don't have to
take on additional privileged-call overhead when calling
add_ip_address in a loop.
For parity, performed the same change on the
delete_ip_address function.
Closes-Bug: #1987281
Partial-Bug: #1981113
Change-Id: Ib1278af20c3b3b057712453cb249aba34b684a21
When a new IP route is created, before passing the route protocol,
check whether it is a string and whether this string is one of the
protocols defined in pyroute2. If so, pass the protocol number.
In the same way, when the IP route is returned, if the protocol is a
number, convert it to the corresponding protocol string.
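The two conversions can be sketched as below. The table is a hand-written subset of the rtnetlink RTPROT_* values, used here as an assumption in place of the pyroute2 definitions, and both function names are hypothetical.

```python
# Subset of the rtnetlink route protocol numbers (RTPROT_*); the patch
# itself relies on the protocols defined in pyroute2.
PROTO_BY_NAME = {"unspec": 0, "redirect": 1, "kernel": 2, "boot": 3,
                 "static": 4}
NAME_BY_PROTO = {number: name for name, number in PROTO_BY_NAME.items()}


def proto_to_number(proto):
    """Return the number for a known protocol name, else the input as-is."""
    if isinstance(proto, str) and proto in PROTO_BY_NAME:
        return PROTO_BY_NAME[proto]
    return proto


def proto_to_name(proto):
    """Return the name for a known protocol number, else the input as-is."""
    if isinstance(proto, int) and proto in NAME_BY_PROTO:
        return NAME_BY_PROTO[proto]
    return proto
```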
Closes-Bug: #1988037
Change-Id: I4ca66d86705a55b2b63083c229629c16b6136283
"pyroute2" methods can include some objects that don't implement
any serialization method (e.g.: "nla_slot" [1]). In those methods
that require an output ("get_*", "list_*", etc.), the Neutron
IP library formats the output inside the privsep context only to
contain serializable objects.
However this library is also returning the blobs returned from
the "pyroute2" library, without parsing and formatting, from
methods that don't require an output ("set_*", "add_*", "delete_*",
etc.). This patch removes the "return" statement from those methods
because the output is not required and to avoid issues like those
reported in the related bug.
[1]8716b9b5c0/pyroute2/netlink/__init__.py (L1754)
Closes-Bug: #1986644
Change-Id: I491dbdabfda0ca010ca56355b71dfe150ed71a71
Now it is mandatory, at least for IPv6 addresses, to define a table
when an IP rule is added. The default table selected is "default"
(table=253). In any case, all commands calling this method right now are
specifying the table in the kwargs.
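A minimal sketch of the defaulting, assuming the rule arguments are passed as keyword arguments; ``ip_rule_kwargs`` is a hypothetical helper and not the Neutron method.

```python
RT_TABLE_DEFAULT = 253  # the "default" routing table


def ip_rule_kwargs(**kwargs):
    """Return the rule arguments, filling in the table only when missing."""
    kwargs.setdefault("table", RT_TABLE_DEFAULT)
    return kwargs
```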
Partial-Bug: #1981963
Change-Id: Ia44ac34ca9b91719a86f4d573c9777a4708d69a4
Those extra logs should tell more about what IP addresses are
added/removed in the qrouter namespace by the keepalived process and
hopefully help us understand failures in the functional CI job,
like those described in the related bug.
Related-bug: #1956958
Change-Id: I5e924922baffbf2e059f243b115ff799e8432a56
Check if group and/or local addresses passed to ip_lib / add_vxlan()
are IPv4 or IPv6. In case of IPv4 fill 'vxlan_group' and 'vxlan_local'
arguments and in case of IPv6 fill 'vxlan_group6' and 'vxlan_local6'
arguments to be passed down to privileged create_interface() method.
In case of an invalid address format raise an AddrFormatError exception.
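The address-family selection can be sketched as follows. This illustration uses the standard library ``ipaddress`` module, so an invalid address raises ``ValueError`` here rather than the ``AddrFormatError`` mentioned above; ``vxlan_address_kwargs`` is a hypothetical helper.

```python
import ipaddress


def vxlan_address_kwargs(group=None, local=None):
    """Build the address keyword arguments for a VXLAN interface."""
    kwargs = {}
    for name, address in (("vxlan_group", group), ("vxlan_local", local)):
        if address is None:
            continue
        # ip_address() raises ValueError on an invalid address string.
        version = ipaddress.ip_address(address).version
        # IPv6 addresses use the keys with the "6" suffix.
        key = name if version == 4 else name + "6"
        kwargs[key] = address
    return kwargs
```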
Closes-Bug: #1952897
Change-Id: I2e3b0c1635627edb2c86c6120b0410ab3c4678b2
"IPWrapper.add_vxlan" method must have "dev" parameter as a positional
argument. A VXLAN interface must always be created on top of an existing
network device:
https://www.kernel.org/doc/Documentation/networking/vxlan.txt
Closes-Bug: #1954316
Change-Id: Ia082f8531ffcc1599206124774599dcdb500274a
When an interface is moved to a new namespace, especially with OVS
internal ports, the interface first disappears from any network
namespace and then is added again. The ovs-vswitchd service detects
this interface change as reported in [1]. This delay is the cause
of the related bug, where some interfaces are not present when
the L3 agent needs to manipulate them.
[1]https://bugs.launchpad.net/neutron/+bug/1948832/comments/3
Closes-Bug: #1948832
Change-Id: I3af4d0afa784899689ccb595ce6ba64495431eb9
To check the existence of a namespace, instead of listing the
namespaces directory (by default "/var/run/netns"), this patch
directly checks the existence of the namespace directory, using
"os.path.exists".
This check is faster than listing the whole directory and avoids
timeout problems as reported in the related bug.
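A minimal sketch of the direct check. The function name and the ``netns_dir`` parameter are assumptions for illustration; only the use of "os.path.exists" and the default directory come from the description above.

```python
import os


def network_namespace_exists(name, netns_dir="/var/run/netns"):
    """Check one path instead of listing the whole namespaces directory."""
    return os.path.exists(os.path.join(netns_dir, name))
```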
Closes-Bug: #1947974
Change-Id: I558d50d28378beb3710d98a2113ff9549c82ae17
Since version 0.6.2, pyroute2 library dynamically imports the needed
modules when loaded. A static analysis will fail when checking the
import references.
Change-Id: I5aaf9494a2d5c2533199e6b92d4df8fe785f83a3
Closes-Bug: #1930750
"get_routing_table" uses "pyroute2.IPDB" that has been deprecated.
"list_ip_routes" has been improved to be able to read multipath
routes.
Closes-Bug: #1926476
Change-Id: I0299fa11a7afefbd2999f81cd4ed3beed572009c
This is a leftover of the "ip route" command migration to Pyroute2.
A new parameter, "proto", is added to the IP route add and list
commands. The default protocol used is "static".
Story: #2007686
Task: #41284
Related-Bug: #1492714
Change-Id: I319fd0611d3e8a3a09d6d4e077a17a622f74f51c
"IpCommandBase" class was implemented to provide a common interface
for all "ip" command subclasses. This base class provided a COMMAND
class variable, to define the "ip" shell command subparameter and
two execution methods, "_run" and "_as_root".
Now that all "ip" command classes have been migrated to Pyroute2, this
basic interface is not needed anymore.
Story: #2007686
Task: #41558
Change-Id: Ib7d30b954bef3bc3551f1ca206873df354d1ab23
As reported in LP#1896734, there is a limit in the size of information
that can be transmitted in one single message between an application
and the privsep daemon. The read socket buffer is limited in size;
a message exceeding this size will generate an exception.
In order to limit the amount of information to be sent, this patch
improves the performance of "get_devices_with_ip". In the previous
implementation, the whole list of network devices from a namespace
was retrieved. In some environments, the list of devices could be
so big that the list returned by "privileged.get_link_devices" can
exceed the read buffer size (as reported in the LP bug when the
OVS agent tries to retrieve the list of IP addresses in the system).
Now the function calls "privileged.get_ip_addresses", that returns
a much smaller list. This patch is also reducing the number of system
calls to just one; the previous implementation was retrieving first
the devices link information list (that method was returning a much
bigger blob) and then, per device, retrieving the IP address
information.
Change-Id: I97ada62484023b9833ed12afd68eb4c8d337fd1f
Related-Bug: #1896734
Replace rootwrap execution with privsep context execution.
This series of patches will progressively replace any
rootwrap call.
This patch migrates some missing execution methods present in
the code and removes unneeded rootwrap filters.
Story: #2007686
Task: #41558
Change-Id: I1542dc4cf98658fc9a40018192498c7a5cd1c3fe
Replace rootwrap execution with privsep context execution.
This series of patches will progressively replace any
rootwrap call.
This patch replaces some "IpNetnsCommand" command execution
methods.
Change-Id: Ic5fdf221a2a2cd0951539b0e040d2a941feee287
Story: #2007686
Task: #41558
The main idea of the commit is to fix the code
according to the latest oslo.i18n requirements:
https://docs.openstack.org/oslo.i18n/latest/
1. Removed log translation if the log is not seen by users
in a raised exception or an API call response.
2. Kept the translated log if it is used in a raised exception.
3. Removed the log message 'Error while reading %s',
which was "dead" (unused) code in the function
"_get_value_from_conf_file"
of the module "agent/linux/dhcp.py".
Partial-Bug: 1600788
Change-Id: Ifb5455336b06c2c87a930b816c90b4a766856b1e
In "IpAddrCommand.list" method, the "scope" parameter is a string
("link", "site", "global" or "host"). This method will retrieve all
devices with an IP address calling "ip_lib.get_devices_with_ip".
Since [1], "ip_lib.get_devices_with_ip" converts the "scope" string
parameter to the pyroute2 format (see
"pyroute2.netlink.rtnl.rtscopes"). The list command should therefore
skip the earlier conversion.
Closes-Bug: #1899141
[1]https://review.opendev.org/#/c/747406/
Change-Id: I55a0f4341b328af52ea3bd758a72f633fbe3abcb
If the device is not ready, the method should report this
event. The calling code can, if needed, log a message at a higher
level.
Change-Id: Ib7c5ba736f6e4ccc88df665faeef304c176a24e7
Closes-Bug: #1896920
This method is using ip_lib.get_devices_with_ip function to get
IP addresses with scope "link".
Unfortunately this method wasn't translating scope names to the pyroute2
values and, because of that, wasn't returning the correct IP addresses.
Now this is fixed and the correct link-local IPv6 addresses are returned.
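The translation can be sketched as below. The table holds the rtnetlink RT_SCOPE_* values written out by hand, as an assumption in place of the pyroute2 definitions; ``addresses_with_scope`` and the record layout are hypothetical.

```python
# Address scope values from rtnetlink (RT_SCOPE_*).
SCOPE_BY_NAME = {"global": 0, "site": 200, "link": 253, "host": 254}


def addresses_with_scope(addresses, scope_name):
    """Return the addresses whose numeric scope matches ``scope_name``."""
    # Translate the name once; comparing the raw name never matches.
    scope = SCOPE_BY_NAME[scope_name]
    return [a["address"] for a in addresses if a["scope"] == scope]
```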
Change-Id: Ia41c1bc627ad2ce89d658ff1fdedee802f6dfa15
Closes-Bug: #1892489
Some arping versions only accept an integer number for the
"deadline" (-w) parameter.
Change-Id: Icf5e2a73b15407419d5c922e236181af85aad0dc
Closes-Bug: #1885169
Recent changes in some versions of iproute2 CLI output (v4.18),
have invalidated the regular expression used to parse the
"ip link" output.
To solve this problem and avoid future ones, pyroute2 is used to
retrieve the virtual functions information and set the VF attributes
(spoofcheck, min_tx_rate, max_tx_rate and link_state).
pyroute2 extended the "ip link" support to retrieve this information,
adding "ext_mask=1" in the get command. If no virtual functions are
present in this particular network interface, the added method,
"get_link_vfs", will return an empty list.
The set commands can raise an "InterfaceOperationNotSupported"
exception if the operation is not supported. For min_tx_rate, if the
driver does not support setting a minimum bandwidth, an "InvalidArgument"
(from a pyroute2.NetlinkError(22)) exception will be raised.
Change-Id: I680da4f64bd114f1caecaaeedbf8a4b1915a0849
Closes-Bug: #1878042
"keepalived_state_change" monitor does not use eventlet but normal
Python threads. When "send_ip_addr_adv_notif" is called from inside
the monitor, the arping command is never sent because the eventlet
thread does not start. In order to be able to be called from this
process, this method should also have an alternative implementation
using "threading".
"TestMonitorDaemon.test_new_fip_sends_garp" is also modified to
actually test the GARP sent. The test was originally implemented with
only one interface in the monitored namespace.
"keepalived_state_change" sends a GARP when a new IP address is added
in an interface other than the monitored one. That's why this patch
creates a new interface and sets it as the monitor interface. When
a new IP address is added to the other interface, the monitor populates
it by sending a GARP through the modified interface [1].
[1] 8ee34655b8/neutron/agent/l3/keepalived_state_change.py (L90)
Change-Id: Ib69e21b4645cef71db07595019fac9af77fefaa1
Closes-Bug: #1870313
By default, if no metric is defined, the kernel interprets it as the
highest priority value (0).
The current implementation, using pyroute2, is a translation from
the CLI command "ip route". This command uses the netlink API to
communicate with the kernel. In IPv6, when the metric value is not
set, it is translated to 1024 by default [1].
[1]https://access.redhat.com/solutions/3659171
Change-Id: I0c5f9e320bbbf314a2d6a22c515bf903de84cdaf
Related-Bug: #1855759
The gateway IP address in the gateway dictionary returned by
"ip_lib.list_ip_routes" is stored in "via".
"priority" parameter is changed to "metric", to match input and
output parameters.
Change-Id: I67ae473dca8d706f963c3b55b9410f9a79d7f32b
Closes-Bug: #1855759
In "ip_lib.ensure_device_is_ready", before retrieving the interface
attributes, a check is done to know if the interface exists. In case
it does not exist, the exception "NetworkInterfaceNotFound" will not
be raised and written in the logs.
Change-Id: I4b9fd0885d850601717274a5058e042871211bbb
Closes-Bug: #1854723
IP monitor is a method that is going to be executed in a separate
process, to monitor the IP address changes in a namespace.
This method spawns a thread to read from a socket opened by Pyroute2.
The read function is a blocking method that will end only when the
socket is closed. To avoid thread starvation that can happen using
greenthreads, IP monitor will use kernel threads.
This will increase the resources used but will ensure that no message
is lost when reading the monitor socket.
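The pattern of a blocking reader running in a kernel thread can be sketched with the standard library alone. This is a generic illustration, not the Pyroute2-based monitor: a plain socket stands in for the netlink socket, and ``read_messages`` and ``start_reader`` are hypothetical names.

```python
import queue
import socket
import threading


def read_messages(sock, messages):
    """Blocking reader: returns only when the peer closes the socket."""
    while True:
        data = sock.recv(4096)
        if not data:
            break
        messages.put(data)


def start_reader(sock):
    """Run the reader in a kernel thread so it cannot starve other work."""
    messages = queue.Queue()
    thread = threading.Thread(target=read_messages, args=(sock, messages),
                              daemon=True)
    thread.start()
    return thread, messages
```

The caller consumes events from the returned queue while the thread stays blocked in ``recv``.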
Reduced the number of IPs generated in "test_add_and_remove_multiple_ips"
to shorten the test run time.
Change-Id: I3fbba2854d40ab0f683443aa30c2a95752345d2e
Closes-Bug: #1849547
It seems that is_enabled_and_bind_by_default() from
neutron.common.ipv6_utils was copied directly into
oslo_utils.netutils, so start using it instead.
Trivialfix
Change-Id: I00fa441e7a20fcd1115485bb8ab75750e6a8cf07
In "NamespaceFixture", before deleting the namespace, this patch
introduces a check to first kill all processes running on it.
Closes-Bug: #1838793
Change-Id: I27f3db33f2e7ab685523fd2d6922177d7c9cb71b
- Add a new property to IPDevice to allow us to identify
the kind of the interface.
This change is required as an out of tree interface driver
which supports operations on a per-physnet basis
needs to be aware of the kind of interface an interface driver
created in order to correlate between an interface driver
and an interface created by it.
Change-Id: Icbdb011a639475f416ca1b98fdf3ce2f52482c7c
Partial-Bug: #1834176
In order to capture all IP address changes, the method reading the
netlink socket will be executed in a parallel thread. Once the
"ip_monitor" method is stopped, this blocking thread will be killed.
A new functional test, "test_add_multiple_ips", is added in order to
stress test this method.
Change-Id: I8f1de4a31f97bab734a33f94c3069444defd870f
Closes-Bug: #1832307
This method allows tracking any IP address change in a
namespace. In future patches, this method will replace
the current IP monitor used in the keepalived_state_change
daemon. The current implementation relies on a spawned shell,
executed in root mode, and the output of this shell,
conveniently parsed.
If the passed namespace is not None, this new method must
be executed in privileged mode (root user), but cannot use
privsep because it is a blocking function and can exhaust the
number of working threads.
This function should be executed in a parallel thread, returning
the data using the eventlet queue. Pyroute2 does not yet implement
a non-blocking method to retrieve the command output or to know if
the buffer has data. This method, spawned in a greenthread, must be
stopped by killing this thread.
An example of how to use it can be found in the functional tests
implemented in this patch.
Change-Id: I86e4487035d60e1b52e951dd3cd50d6bb54f388b
Related-Bug: #1680183