Commit Graph

284 Commits

Author SHA1 Message Date
Brian Haley c3b855a100 Remove obsolete PID files before start
External processes, such as radvd, can refuse to start
and throw an exception such as:

  "Unable to convert value in $pidfile"

because the given pidfile has more than one PID in it.
The situation can happen when the neutron node is reset
and the obsolete PID files are not cleaned before neutron
is started.

This commit adds PID file cleanup before external
process start.

Closes-bug: #2033980
Change-Id: Id62bf18067d0b144c3e8825c7603cc1e51dca052
2023-10-20 17:09:20 -04:00
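The cleanup described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the change itself: the helper name `remove_stale_pid_file` is hypothetical, and it assumes the PID file is a regular file that the calling user may delete.

```python
import os


def remove_stale_pid_file(pid_file):
    """Delete a leftover PID file; return True if a file was removed.

    Hypothetical helper: meant to be called right before an external
    process (e.g. radvd) is spawned, so the process starts with a clean
    PID file instead of one left over from before a node reset.
    """
    try:
        os.remove(pid_file)
        return True
    except FileNotFoundError:
        # Nothing to clean up: no PID file was left behind.
        return False
```

Removing the file unconditionally before start is the simplest policy; it assumes the caller has already established that the process is not running.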
Anton Kurbatov f3c743d090 Do not update static routes in snat-ns for dvr router with ha
If a router is distributed with HA enabled, the keepalived service is
responsible for setting static routes. This patch checks that router HA
is disabled before adding routes. Otherwise, duplicate routes are
created, which causes problems when such a route needs to be removed.
In addition, this patch fixes multipath routes in the snat-ns when HA
is not enabled.

Closes-Bug: #1999678
Signed-off-by: Anton Kurbatov <Anton.Kurbatov@acronis.com>
Change-Id: I8f1004b3fe2cad79cb61aa942b257c1508d18b68
2022-12-15 10:23:01 +00:00
Rajesh Tailor 8ab5ee1d17 Fix remaining typos in comments and tests
Change-Id: I872422cffd1f9a2e59b5e18a86695e5cb6edc2cd
2022-07-06 21:20:27 +05:30
Rodolfo Alonso Hernandez d73ec5000b [L3] Fix "NDPProxyAgentExtension.ha_state_change" call
The parameter "data" passed to the method "ha_state_change" is not
a router but a dictionary with "router_id" info.

The method "NDPProxyAgentExtension._process_router" requires the
router ID and the "enable_ndp_proxy" value, stored in the agent
router cache.

Closes-Bug: #1967839
Related-Bug: #1877301
Change-Id: Iab163e69f7e3641e2e1a451374231b6ccfa74c3e
2022-04-08 16:36:00 +00:00
Zuul d72eaabd83 Merge "[Agent Side] L3 router support ndp proxy" 2022-03-12 03:59:15 +00:00
labedz f430cd0072 Don't set HA ports down while L3 agent restart.
Because of the fix for bug [1] and an issue with the linux_utils
get_process_count_by_name() function, the L3 agent puts all its HA
ports down during the initialization phase. Unfortunately, this
operation can break L3 communication that is already working. Bringing
a ha-* port from the down state back up can take a few seconds, and
some VRRP packets can be lost in the meantime. That triggers keepalived
on the other node, so a router HA state change may be triggered.

This change prevents putting HA ports down when, during the
initialization phase, the L3 agent finds its own net namespaces already
configured. The existence of such a net namespace is good evidence that
a network configuration exists, so the host wasn't rebooted and most
probably this is just an agent restart.

[1] https://bugs.launchpad.net/neutron/+bug/1597461

Closes-Bug: #1959151
Change-Id: Id9c906b2d141c3bedd80fb5f868190f8a4b66f54
2022-03-01 14:27:42 +00:00
Yang JianFeng 9b27020a65 [Agent Side] L3 router support ndp proxy
The agent-side code needs to consider three scenarios:
1. Non-DVR router. All related rules are applied in the
   qrouter-namespace.
2. DVR router with the local agent mode set to dvr_no_external.
   All related rules are applied in the snat-namespace.
3. DVR router with the local agent mode set to dvr. In this scenario,
   all related rules are applied in the fip-namespace.

Change-Id: Ie8729586d318be4a673858021a0116e09e193522
Partial-Bug: #1877301
2022-02-25 12:42:13 +08:00
Zuul 3ddc6dcbd9 Merge "Cleanup router for which processing added router failed" 2021-11-10 22:15:47 +00:00
Slawek Kaplonski 41159bd9a4 Cleanup router for which processing added router failed
In the _process_added_router() method of the L3 agent, if processing
the router fails, router_info should be cleaned up, e.g. removed from
the router cache, so that it is not treated as an updated router in the
next iteration of the agent.

Closes-Bug: #1947993
Change-Id: Ic0bc3d951d32efadc116708bfe518a711730429d
2021-11-08 16:42:08 +01:00
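The cleanup-on-failure pattern from the commit above can be sketched as follows. This is an illustration under assumptions: `router_cache` is a plain dict, and `process` stands in for whatever processing the agent performs on a newly added router. Neither is the agent's real data structure or method.

```python
def process_added_router(router_cache, router_id, process):
    """Cache a new router, process it, and undo the caching on failure.

    Hypothetical sketch: router_cache is a dict keyed by router ID and
    process is any callable that may raise.
    """
    router_cache[router_id] = {"id": router_id}
    try:
        process(router_cache[router_id])
    except Exception:
        # Drop the half-initialized entry so that the next iteration
        # treats the router as new again, not as an updated router.
        router_cache.pop(router_id, None)
        raise
```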
Zuul 81f8524527 Merge "[DVR] Set arp entries only for IPs from the correct subnet" 2021-11-02 03:24:50 +00:00
Rodolfo Alonso Hernandez 8127221479 Check a namespace existence by checking only its own directory
To check the existence of a namespace, instead of listing the
namespaces directory (by default "/var/run/netns"), this patch
directly checks the existence of the namespace directory, using
"os.path.exists".

This check is faster than listing the whole directory and avoids
timeout problems as reported in the related bug.

Closes-Bug: #1947974
Change-Id: I558d50d28378beb3710d98a2113ff9549c82ae17
2021-10-25 09:59:32 +00:00
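The check described in the commit above amounts to a single path lookup. A minimal sketch, assuming the default "/var/run/netns" location mentioned in the message; the function name and the `netns_dir` parameter are illustrative, not the names used in Neutron:

```python
import os

NETNS_RUN_DIR = "/var/run/netns"  # default location named in the commit


def namespace_exists(name, netns_dir=NETNS_RUN_DIR):
    """Return True if an entry for the namespace exists on disk.

    One os.path.exists() call replaces listing the whole directory,
    which is the slow operation when many namespaces are present.
    """
    return os.path.exists(os.path.join(netns_dir, name))
```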
Rodolfo Alonso Hernandez c20f2e5136 [HA] Do not add initial state change delay in HA router
The initial state ("primary", "backup") should be set immediately.
In [1], a transition delay to "primary" was introduced. This delay
is unnecessary when the first state is set.

Closes-Bug: #1945512

[1]https://review.opendev.org/q/I70037da9cdd0f8448e0af8dd96b4e3f5de5728ad

Change-Id: Ibe9178c4126977f1321e414676d67f28e5ec9b57
2021-10-04 14:28:22 +00:00
Slawek Kaplonski 771fdc0b07 [DVR] Set arp entries only for IPs from the correct subnet
When a DVR router processes internal ports, it checks all ports
connected to the subnet and adds permanent ARP entries for all fixed
IPs and allowed address pairs from those ports in the qrouter
namespace.
But a port can have fixed IPs from different subnets, e.g. from an IPv4
and an IPv6 subnet, and until now Neutron checked neither the subnet_id
of the fixed_ip address nor the IP version of the allowed address
pair's IP address. That resulted in adding ARP entries for all IPs
through all interfaces, e.g. an IPv4 address was added as if it were
reachable through the interface connected to the IPv6 subnet.

This patch adds checking of the subnet_id for fixed_ips and ip version
for the allowed address pairs configured on the port to avoid that
problem.

Closes-Bug: #1936980
Change-Id: Id5afad7af74d69f8b4159163d23807a1cf032733
2021-09-24 09:51:30 +00:00
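The filtering described in the commit above can be sketched as follows. The port dictionary layout (`fixed_ips`, `allowed_address_pairs`, `subnet_id`, `ip_address`) follows the field names mentioned in the message, but the function itself is a hypothetical illustration rather than the patched Neutron code.

```python
import ipaddress


def arp_ips_for_subnet(port, subnet_id, subnet_cidr):
    """Return the IPs of a port that belong on the given subnet.

    Fixed IPs are matched by subnet_id; allowed address pairs (which
    may be single addresses or CIDRs) are matched by IP version only.
    """
    version = ipaddress.ip_network(subnet_cidr).version
    ips = [
        fixed_ip["ip_address"]
        for fixed_ip in port.get("fixed_ips", [])
        if fixed_ip["subnet_id"] == subnet_id
    ]
    for pair in port.get("allowed_address_pairs", []):
        address = pair["ip_address"]
        # strict=False accepts both plain addresses and CIDRs.
        if ipaddress.ip_network(address, strict=False).version == version:
            ips.append(address)
    return ips
```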
Zuul 7be2008090 Merge "[L3] Use processing queue for network update events" 2021-07-19 14:20:01 +00:00
Zuul d1228f265b Merge "Populate self.floating_ips_dict using "ip rule" information" 2021-07-19 14:00:30 +00:00
Rodolfo Alonso Hernandez a03c240ef4 Populate self.floating_ips_dict using "ip rule" information
When the L3 agent starts, it reads the floating IP rule priorities from
a state file created by "FipRulePriorityAllocator". If a floating IP is
not registered in this file, the method:
- Creates a new priority for this floating IP.
- Creates the "ip rule" in the namespace.
- Adds a new entry in "self.floating_ips_dict".

All "ip rules" present in the namespace that do not match the
registered fixed IP address ("from") and the priority assigned
are deleted.

Closes-Bug: #1891673
Closes-Bug: #1929821

Change-Id: Ia3fbde3304ab5f3c309dc62dbf58274afbcf4614
2021-07-08 15:40:08 +00:00
Slawek Kaplonski 6ce48c30bd [L3] Use processing queue for network update events
Router_info's _process_internal_ports() method is the one that
manipulates the router_info.internal_ports cache, and the L3 agent's
network_update() method relies on that cache to check whether the
updated network is connected to the router.
So they shouldn't run concurrently, as that may cause race conditions
and unexpected issues, such as the one described in the related bug.

Until now, network_update was the only event processed without using
the queue of events, and that made the race condition described above
possible.
To fix that, this patch changes the network_update method so that it
now adds an update event to the queue for each router hosted by the
agent. When each of those per-router events is processed, the agent
checks whether the network is actually connected to the router and, if
so, schedules a router update to be processed.

Closes-Bug: #1933234
Change-Id: I2efe66a7415f7a18fb85bd2536a1901e751d6203
2021-07-08 17:03:43 +02:00
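The queue-based flow from the commit above can be sketched roughly as follows. This is a simplified single-threaded illustration: the function names mirror the message, but the event tuple, the `internal_networks` mapping and `process_events` are assumptions made for the example, not the agent's real structures.

```python
import queue


def network_update(event_queue, hosted_router_ids, network_id):
    """Enqueue one event per hosted router instead of acting directly."""
    for router_id in hosted_router_ids:
        event_queue.put((router_id, network_id))


def process_events(event_queue, internal_networks):
    """Drain the queue and return the routers that need an update.

    internal_networks maps router ID -> set of connected network IDs
    (a stand-in for the per-router internal ports cache).
    """
    routers_to_update = []
    while True:
        try:
            router_id, network_id = event_queue.get_nowait()
        except queue.Empty:
            break
        if network_id in internal_networks.get(router_id, set()):
            routers_to_update.append(router_id)
    return routers_to_update
```

Because the connectivity check happens while the event is being processed, it is serialized with the other events that modify the same cache.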
Hemanth Nakkina be7d0bb6ab Update arp entry of snat port on qrouter ns
In some cases, the ARP entry of the SNAT port is not updated
in the qrouter namespace. The l3-agent calls get_ports_by_subnet()
while setting ARP entries for the subnet, and the SNAT port is
not returned if it is still unbound. One scenario where this is
observed is when a router is created, the external gateway is set
and an internal subnet is attached to the router in quick
succession.

This patch retrieves snat port details from router info
as well and updates arp entry for snat port.

Closes-Bug: #1933092
Change-Id: I7ee797b4b930306cf6360922d855f8b24f1b813d
2021-07-02 17:06:43 +05:30
Slawek Kaplonski 24dcbcbe09 Block metadata requests to not go out from the router
Packets sent from instances to the metadata service, which is running
in the router's namespace, should never leave the router.
Even if e.g. the NAT rule that redirects them to port 9697 isn't
installed in iptables for some reason, in the worst case such requests
should be dropped.

Before this patch, rules in the mangle table marked such packets with a
specific mark, but the packets weren't blocked later.
This patch adds a rule to DROP such packets in the "scope" chain in the
filter table.

Co-authored-by: Rodolfo Alonso Hernandez <ralonsoh@redhat.com>

Related-Bug: #1920778
Change-Id: I6e9eec8fe9606d21fbce3699b4262e0783f667ec
2021-03-26 10:16:07 +01:00
Rodolfo Alonso Hernandez c89c1f53db Remove rootwrap execution (1)
Replace rootwrap execution with privsep context execution.
This series of patches will progressively replace any
rootwrap call.

This patch replaces some "IpNetnsCommand" command execution
methods.

Change-Id: Ic5fdf221a2a2cd0951539b0e040d2a941feee287
Story: #2007686
Task: #41558
2021-02-06 16:22:43 +00:00
Rodolfo Alonso Hernandez 0a0f647ea0 Delete HA metadata proxy PID and config with elevated privileges
Both files cannot be deleted with the default permissions because
those files are created by the "root" user.

Change-Id: I73dd37b3104fac8d3172f520f71cffd85d040c4b
Closes-Bug: #1907695
2020-12-13 21:50:31 +00:00
Bernard Cafarelli 8d6c301301 Update requirements for recent pip failures
Bump astroid test requirement to 2.4.0
Older versions trigger an error on wrapt dependency:
https://github.com/PyCQA/astroid/issues/755

Bump pylint accordingly to new astroid

Fix some new PEP8 warnings appearing with new versions, and filter out
the larger I202 "Additional newline in a group of imports" one for now

Drop psutil from functional requirements, it indicated an old version
and we have it in common requirements now

Bump a series of lower-constraints and requirements to work with new pip
resolver, testing with steps outlined at:
http://lists.openstack.org/pipermail/openstack-discuss/2020-December/019285.html
This includes eventlet 0.22.1, previous versions triggered a hard to
track error on enum34
Cap cryptography in lower-constraints to prevent discovery failure in
relevant job (other jobs have it capped via upper-constraints)

Change-Id: Ie74ea517a403e6e2a7a4e0a245dd20e5281339e8
Closes-Bug: #1907242
2020-12-09 13:17:51 +01:00
Slawek Kaplonski 489e0ead72 Fix migration from the HA to non-HA routers
If any failure occurs while an HA router is being set down, router_info
stays in the L3 agent's cache as HaRouter.
If the next update on the router is a migration to a non-HA router,
this is the wrong class and it causes other issues, e.g. with
remove_vip_by_ip_address(), which is valid only for HA routers.

This patch fixes the issue by checking the router's ha and distributed
flags and updating the local cache with a new router_info class if at
least one of those flags doesn't match.

Change-Id: Ib0d3a501f88c149baea7d715c7cfe5811bc85e4f
Closes-Bug: #1892846
2020-11-16 21:56:30 +01:00
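The flag comparison described in the commit above can be sketched as follows. The `ha` and `distributed` keys come from the message; the function name and the plain-dict representation of a router are assumptions made for illustration.

```python
def needs_new_router_info(cached_router, updated_router):
    """Return True if the cached router_info class no longer fits.

    Both arguments are dicts with optional boolean "ha" and
    "distributed" keys (hypothetical representation).
    """
    return any(
        bool(cached_router.get(flag)) != bool(updated_router.get(flag))
        for flag in ("ha", "distributed")
    )
```

For example, a cached router with `ha=True` that receives an update with `ha=False` would be re-created with the matching class instead of being updated in place.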
Zuul bdee7b0c58 Merge "Ensure fip ip rules deleted when fip removed" 2020-09-08 10:17:55 +00:00
Bence Romsics a1f4ee3ade metadata-ipv6: Router namespace
We push a v6 host route to make the guest send its metadata requests
in the direction of our router. We redirect it to haproxy which
mangles the headers and sends the request along to metadata-agent.

Apparently the supported list of dhcp options for dhcpv6 is quite
short in dnsmasq (cf. dnsmasq --help dhcp6) - not including anything
like classless-static-route for dhcpv4. So we must rely solely on
radvd to push host routes to the guest.

Metadata access over IPv6 is supposed to work both on dual-stack and
v6-only networks.

The following v6 subnet modes are supposed to work:

--ipv6-ra-mode slaac --ipv6-address-mode slaac
--ipv6-ra-mode dhcpv6-stateless --ipv6-address-mode dhcpv6-stateless
--ipv6-ra-mode dhcpv6-stateful --ipv6-address-mode dhcpv6-stateful

Change-Id: I28f2914b1b67659af2db7240eae730ac43daccd2
Partial-Bug: #1460177
2020-08-31 13:02:49 +02:00
Bence Romsics a0b18d553d metadata-ipv6: DHCP namespace
Send IPv6 metadata traffic (dst=fe80::a9fe:a9fe) to the metadata-agent.

When running on an IPv6-enabled system, bind haproxy (i.e. the
metadata-proxy) to both 169.254.169.254 and fe80::a9fe:a9fe.

We do not introduce new config options. The usual config options
(enable_isolated_metadata, force_metadata, enable_metadata_proxy)
now control the metadata service over both IPv4 and IPv6.

This change series only affects the guests' access to the metadata
service (over tenant networks). They change nothing about how the
metadata-agent talks to Nova's metadata service.

Metadata access over IPv6 is supposed to work both on dual-stack and
v6-only networks.

In order to enable the metadata service on pre-existing isolated
networks during an upgrade, this change makes each dhcp-agent restart
trigger a quick restart of dhcp-agent-controlled metadata-proxies,
so they can pick up their new config making them also bind to
fe80::a9fe:a9fe.

Change-Id: If35f00d1fc9e4ab7e232660362410ce7320c45ba
Partial-Bug: #1460177
2020-08-31 13:02:39 +02:00
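As a side note on the two addresses in the commit above: the low 32 bits of fe80::a9fe:a9fe are the hexadecimal form of 169.254.169.254. The snippet below only demonstrates that arithmetic relationship with the standard library; whether Neutron computes the address this way is not stated in the message, and the constant names are illustrative.

```python
import ipaddress

METADATA_V4 = ipaddress.IPv4Address("169.254.169.254")
LINK_LOCAL_PREFIX = ipaddress.IPv6Address("fe80::")

# Place the 32-bit IPv4 value into the low bits of the link-local prefix.
metadata_v6 = ipaddress.IPv6Address(int(LINK_LOCAL_PREFIX) | int(METADATA_V4))

print(metadata_v6)  # fe80::a9fe:a9fe
```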
Zuul be1e4f845d Merge "Improve terminology in the Neutron tree" 2020-08-28 14:06:18 +00:00
Slawek Kaplonski 81d375d39a Handle properly existing LLA address during l3 agent restart
When the L3 agent hosts routers that have subnets with Prefix
Delegation enabled, the agent couldn't properly handle the
IpAddressAlreadyExists exception raised when the pd module tries to
configure link-local IPv6 addresses.

This is now fixed, and the L3 agent can restart without problems in
that case.

Change-Id: Icc995f7b2b465921e41342711d17539f16ead0ce
Closes-Bug: #1892362
2020-08-21 13:18:14 +02:00
Brian Haley 055036ba2b Improve terminology in the Neutron tree
There is no real reason we should be using some of the
terms we do, they're outdated, and we're behind other
open-source projects in this respect. Let's switch to
using more inclusive terms in all possible places.

Change-Id: I99913107e803384b34cbd5ca588451b1cf64d594
2020-08-19 16:47:53 -04:00
Edward Hope-Morley 5eca44bfa8 Ensure fip ip rules deleted when fip removed
The information needed to delete ip rules associated
with fips is held in memory between add and remove so
a restart of the l3-agent results in any fips that
existed before the restart having their ip rules
persist after the fips are removed. This patch
ensures that an agent restart reloads this information
so that ip rules associated with a fip are correctly
removed when the fip is removed.

Change-Id: If656a703c996ccc7719b1b09d793c5bbdfd6f3c1
Closes-Bug: #1891673
2020-08-18 20:39:10 +01:00
Brian Haley 7594bb0627 Remove the dependency on the "mock" package
Now that we are python3 only, we should move to using the built
in version of mock that supports all of our testing needs and
remove the dependency on the "mock" package.

This patch moves all references to "import mock" to
"from unittest import mock". It also cleans up some new line
inconsistency.

Fixed an inconsistency in the OVSBridge.deferred() definition
as it needs to also have an *args argument.

Fixed an issue where an l3-agent test was mocking
functools.partial, causing a python3.8 failure.

Unit tests only, removing from tests/base.py affects
functional tests which need additional work.

Change-Id: I40e8a8410840c3774c72ae1a8054574445d66ece
2020-04-28 18:05:37 -04:00
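The import change described in the commit above looks like this in a test. The `fetch_status` function and the fake client are made up for the example; only the `from unittest import mock` line reflects the change itself.

```python
# Before: "import mock" (third-party package).
# After: the standard library version, available on Python 3.
from unittest import mock


def fetch_status(client):
    """Toy function under test; not part of Neutron."""
    return client.get_status()


fake_client = mock.Mock()
fake_client.get_status.return_value = "ACTIVE"
```

Since the module is still bound to the name `mock`, existing calls such as `mock.patch(...)` or `mock.Mock()` keep working unchanged after the import line is swapped.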
LIU Yulong 12b9149e20 Not remove the running router when MQ is unreachable
When the L3 agent gets a router update notification, it tries to
retrieve the router info from the neutron server. If the message queue
is down or unreachable at that time, the agent gets message queue
related exceptions and then runs the resync actions. Sometimes a
RabbitMQ cluster is not easy to recover, and a long MQ recovery time
means the router info sync RPC never succeeds before it reaches the
maximum number of retries. Then the bad thing happens: the L3 agent
tries to remove the router, which basically shuts down all the existing
L3 traffic of this router.

This patch removes that final router removal action and lets the
router keep running as it is.

Closes-Bug: #1871850
Change-Id: I9062638366b45a7a930f31185cd6e23901a43957
2020-04-24 17:44:27 -04:00
Oleg Bondarev 5663517613 Support L3 agent cleanup on shutdown
Add an option to delete all routers on agent shutdown.

Closes-Bug: #1851609
Change-Id: I7a4056680d8453b2ef2dcc853437a0ec4b3e8044
2019-12-16 17:01:31 -05:00
Zuul 101a7a6215 Merge "Start using oslo_utils.netutils.is_ipv6_enabled()" 2019-10-26 21:32:02 +00:00
Brian Haley 555238da69 Start using oslo_utils.netutils.is_ipv6_enabled()
Seems that is_enabled_and_bind_by_default() from
neutron.common.ipv6_utils was copied directly into
oslo_utils.netutils, so start using it instead.

Trivialfix

Change-Id: I00fa441e7a20fcd1115485bb8ab75750e6a8cf07
2019-10-16 21:44:56 -04:00
Rodolfo Alonso Hernandez 6a5a75d5a6 Add radvd_user config option
In some deployments, the "neutron" user does not have the permissions
to modify the kernel interfaces. In those cases the radvd user should
be defined. This patch introduces a new config option: "radvd_user".

This config option is the username passed to radvd, used to drop root
privileges and change the user ID to username and the group ID to the
primary group of username. If no user is specified (the default is an
empty string), the user executing the L3 agent is passed. If "root" is
specified, no "username" parameter is passed, because radvd is spawned
as root.

Change-Id: Ie9a6fbf04d453a3c1c0bddf9ecaa3d4d6467e8ff
Closes-Bug: #1844688
2019-10-14 13:01:30 +00:00
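The selection rules for "radvd_user" described above can be sketched as follows. The function name and the `agent_user` parameter are hypothetical. The sketch assumes the username is handed to radvd through its `-u` option, and that an agent itself running as "root" is treated like an explicit "root" setting, which the commit message does not spell out.

```python
def radvd_username_args(radvd_user, agent_user):
    """Return the extra radvd arguments for the configured user.

    radvd_user: value of the "radvd_user" option ("" by default).
    agent_user: name of the user executing the L3 agent.
    """
    user = radvd_user or agent_user
    if user == "root":
        # radvd is spawned as root, so there is nothing to drop to.
        return []
    return ["-u", user]
```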
LIU Yulong f51e5ce924 Remove get_external_network_id for router
The L3 agent has supported multiple external networks for a long
time, so remove this RPC call since it is not used.
According to the codesearch results in [1] and [2], we only remove
the RPC from the neutron built-in L3 agent. On the neutron server
side and in the RPC callback classes, the function remains.

[1] http://codesearch.openstack.org/?q=get_external_network_id
[2] http://codesearch.openstack.org/?q=L3RpcCallback

Change-Id: I764423e175d6e82729a647e415a9f267f495916f
Closes-Bug: #1844168
2019-09-20 13:31:32 +00:00
Rodolfo Alonso Hernandez 3f022a193f Delay HA router transition from "backup" to "master"
As described in the bug, when a HA router transitions from "master" to
"backup", the "keepalived" processes set the virtual IP in all other
HA routers. Each HA router then advertises it and "keepalived"
decides, according to a trivial algorithm (higher interface IP), which
one should be "master". At this point, the "keepalived" processes
running on the other servers remove the HA router virtual IP assigned
an instant before.

To avoid transitioning some routers from "backup" to "master" and then
back to "backup" in a very short period, this patch delays the "backup"
to "master" transition, waiting for a possible new "backup" state. If,
during the waiting period (set to the HA VRRP advert time, 2 seconds by
default) before the HA state is set to "master", the L3 agent receives
a new "backup" HA state, the L3 agent does nothing.

Closes-Bug: #1837635

Change-Id: I70037da9cdd0f8448e0af8dd96b4e3f5de5728ad
2019-08-27 16:47:00 +00:00
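The delayed transition described in the commit above can be sketched as a small state holder. This is a simplified model with explicit timestamps so that it is deterministic; the class name and its methods are hypothetical and do not reflect how the L3 agent actually schedules the delay.

```python
class HaStateDebouncer:
    """Apply "master" only after a quiet period; apply others at once."""

    def __init__(self, delay=2.0):
        self.delay = delay          # seconds, the HA VRRP advert time
        self.state = None
        self._master_at = None      # time at which "master" takes effect

    def notify(self, new_state, now):
        if new_state == "master":
            # Do not apply yet: wait for a possible new "backup".
            self._master_at = now + self.delay
        else:
            # A "backup" cancels any pending "master" transition.
            self._master_at = None
            self.state = new_state

    def current(self, now):
        if self._master_at is not None and now >= self._master_at:
            self.state = "master"
            self._master_at = None
        return self.state
```

With this model, a "master" notification at t=0 followed by "backup" at t=1 never surfaces as "master", which is the flapping the commit aims to avoid.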
Zuul 0cde163967 Merge "Remove 'gateway_external_network_id' config option" 2019-08-05 12:40:08 +00:00
Adrian Chiris 0e80d2251e Pass get_networks() callback to interface driver
In order to support out of tree interface drivers it is required
to pass a callback to allow the drivers to query information about
the network.

- Allow passing **kwargs to interface drivers
- Pass get_networks() as `get_networks_cb` kw arg
  `get_networks_cb` has the same API as
  `neutron.neutron_plugin_base_v2.NeutronPluginBaseV2.get_networks()`
  minus the request context, which will be embedded in the callback
  itself.

The out of tree interface drivers in question are:

MultiInterfaceDriver - a per-physnet interface driver that delegates
                       operations on a per-physnet basis.
IPoIBInterfaceDriver - an interface driver for IPoIB (IP over Infiniband)
                       networks.

Those drivers are part of networking-mlnx [1]. Their implementation
is vendor agnostic, so they can later be moved to a more common place
if desired.

[1] https://github.com/openstack/networking-mlnx

Change-Id: I74d9f449fb24f64548b0f6db4d5562f7447efb25
Closes-Bug: #1834176
2019-07-30 20:21:16 +03:00
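One way to embed the request context in such a callback is `functools.partial`. The sketch below is an assumption-laden illustration: `get_networks` here is a stand-in with a simplified signature, `ExampleInterfaceDriver` is a made-up driver, and the commit message does not say that `partial` is the mechanism actually used.

```python
import functools


def get_networks(context, filters=None, fields=None):
    """Hypothetical stand-in for the plugin's get_networks()."""
    networks = [{"id": "net-1", "tenant": context["tenant"]}]
    if filters and "id" in filters:
        networks = [n for n in networks if n["id"] in filters["id"]]
    return networks


class ExampleInterfaceDriver:
    """Made-up out-of-tree driver that accepts extra keyword args."""

    def __init__(self, conf, **kwargs):
        self.conf = conf
        self.get_networks_cb = kwargs.get("get_networks_cb")


context = {"tenant": "demo"}
driver = ExampleInterfaceDriver(
    conf={},
    # The driver never sees the context; it is bound into the callback.
    get_networks_cb=functools.partial(get_networks, context),
)
```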
Slawek Kaplonski 9b2e472ae9 Remove 'gateway_external_network_id' config option
This option has been deprecated for a couple of releases already.
In Stein we removed the 'external_network_bridge' option from the L3
agent's config, so now it's time to remove this one as well.

A new upgrade check is also introduced to check and warn users if
gateway_external_network_id was used in the deployment.

This patch also removes method _check_router_needs_rescheduling() from
neutron/db/l3_db.py module as it is not needed anymore.

This patch also removes unit tests:
test_update_gateway_agent_exists_supporting_network
test_update_gateway_agent_exists_supporting_multiple_network
test_router_update_gateway_no_eligible_l3_agent
from neutron/tests/unit/extensions/test_l3.py module as those
tests are not needed when there is no "gateway_external_network_id"
config option anymore.

Change-Id: Id01571cd42cfe9c5ce91e90159917c7d3c963878
2019-07-26 13:19:14 +02:00
Brian Haley b79842f289 Start enforcing E125 flake8 directive
Removed E125 (continuation line does not distinguish itself
from next logical line) from the ignore list and fixed all
the indentation issues.  Didn't think it was going to be
close to 100 files when I started.

Change-Id: I0a6f5efec4b7d8d3632dd9dbb43e0ab58af9dff3
2019-07-19 23:39:41 -04:00
LIU Yulong 426a5b2833 Adjust some HA router log
If the router is deleted concurrently, the HA state change log
message is unnecessary and sometimes confusing.
Also log the PID of the router's keepalived-state-change child
process.

Change-Id: Id57dd787c254994af967db17647a3a28925714da
Related-Bug: #1798475
2019-07-03 04:50:45 +00:00
Slawek Kaplonski c195352e70 Remove mock of not existing method in L3 agent UT.
There was a mock of ri._get_floatingips_bound_to_host() in the
L3 test_agent unit test module. This method was removed a long time
ago in [1], so this mock is not necessary anymore.

TrivialFix

[1] https://review.opendev.org/#/c/499725/

Change-Id: Ia93cab667f8154663ba62b78bc0329ee16b8202c
2019-06-18 09:23:51 +02:00
Miguel Lavalle 0b3f5f429d Support multiple external networks in L3 agent
Change [1] removed the deprecated option external_network_bridge. Per
commit message in change [2], "l3 agent can handle any networks by
setting the neutron parameter external_network_bridge and
gateway_external_network_id to empty". So the consequence of [1] was to
introduce a regression whereby multiple external networks are not
supported by the L3 agent anymore.

This change proposes a new simplified rule. If
gateway_external_network_id is defined, that is the network that the L3
agent will use. If not and multiple external networks exist, the L3
agent will handle any of them.

[1] https://review.opendev.org/#/c/567369/
[2] https://review.opendev.org/#/c/59359

Change-Id: Idd766bd069eda85ab6876a78b8b050ee5ab66cf6
Closes-Bug: #1824571
2019-05-27 19:23:28 -05:00
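The simplified rule proposed in the commit above fits in a few lines. The function name is hypothetical; only the `gateway_external_network_id` option name comes from the message.

```python
def handles_external_network(network_id, gateway_external_network_id=""):
    """Decide whether the L3 agent handles the given external network.

    If gateway_external_network_id is set, only that network is
    handled; otherwise any external network is accepted.
    """
    if gateway_external_network_id:
        return network_id == gateway_external_network_id
    return True
```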
Zuul 554b7cd228 Merge "Add router_factory to l3-agent and L3 extension API" 2019-04-27 06:37:15 +00:00
Yang Youseok ec875b42b6 Add router_factory to l3-agent and L3 extension API
Currently, most implementations override the L3NatAgent class itself
for their own logic since there is no proper interface to extend the
RouterInfo class. This adds unnecessary complexity for developers
who just want to extend the router mechanism instead of the whole RPC.

Add a RouterFactory class in which developers can register RouterInfo
classes and to which RouterInfo creation is delegated. Separate the
functions and variables that are currently used externally from
RouterInfo into an abstract class, so that extensions can use the
basic interface.

Provide the router registration function to the L3 extension API so
that extensions can extend the RouterInfo class that corresponds to
each feature set (ha, distributed, ha + distributed).

Depends-On: https://review.openstack.org/#/c/620348/
Closes-Bug: #1804634
Partially-Implements: blueprint openflow-based-dvr
Change-Id: I1eff726900a8e67596814ca9a5f392938f154d7b
2019-04-26 10:22:50 +09:00
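The registration idea behind RouterFactory can be sketched as follows. Only the class name comes from the commit message; the `register` and `create` method names, the feature-set keys and the toy router classes are assumptions for illustration and may differ from the real interface.

```python
class RouterFactory:
    """Map a set of router features to the class that implements it."""

    def __init__(self):
        self._routers = {}

    def register(self, features, router_cls):
        # frozenset makes ("ha", "distributed") and
        # ("distributed", "ha") resolve to the same entry.
        self._routers[frozenset(features)] = router_cls

    def create(self, features, **kwargs):
        return self._routers[frozenset(features)](**kwargs)


class LegacyRouter:
    def __init__(self, router_id):
        self.router_id = router_id


class HaDvrRouter(LegacyRouter):
    pass


factory = RouterFactory()
factory.register([], LegacyRouter)
factory.register(["ha", "distributed"], HaDvrRouter)
```

An extension would then call `register` with its own class for a feature set instead of subclassing the whole agent.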
Zuul 4765708ec7 Merge "Packets getting lost during SNAT with too many connections" 2019-04-24 03:13:18 +00:00
Zuul aee110ca42 Merge "Mock check if ipv6 is enabled in L3 agent unit tests" 2019-04-18 19:45:55 +00:00
Swaminathan Vasudevan 30f35e08f9 Packets getting lost during SNAT with too many connections
We have a problem with SNAT with too many connections using the
same source and destination on the network nodes.

In addition, we can see in the conntrack table that the
"insert_failed" counter increases.

This might be a generic problem with conntrack and linux.
We suspect that we encounter the following "limitation / bug"
in the kernel.

There seems to be a workaround to alleviate this behavior by
setting the --random-fully flag in iptables for port consumption.

This patch fixes the problem by adding --random-fully to
the SNAT rules.

Change-Id: I246c1f56df889bad9c7e140b56c3614124d80a19
Closes-Bug: #1814002
2019-04-12 10:12:04 -04:00
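A rule of the kind described in the commit above can be assembled as shown below. The helper name, the device name and the address are made up for the example, and the exact rule layout used by Neutron may differ. The `--random-fully` option itself is a real iptables SNAT option, but it needs an iptables build that supports it.

```python
def snat_rule_args(external_device, source_ip, random_fully=True):
    """Build the argument list for an iptables SNAT rule (sketch)."""
    args = ["-o", external_device, "-j", "SNAT", "--to-source", source_ip]
    if random_fully:
        # Fully randomize source port selection to reduce collisions
        # when many connections share the same source and destination.
        args.append("--random-fully")
    return args


print(" ".join(snat_rule_args("qg-example", "203.0.113.10")))
```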