Commit Graph

134 Commits

Author SHA1 Message Date
Brian Haley b7201b9fbb Add router ID in HA router process() debug message
All the other messages in this file have the router ID
in them, so add it to the process() method as well to aid
in debugging HA port issues.

Related-bug: #2037239
Change-Id: Ic6fba46d5e80aae95c977b63228ec8458ea60f5d
2023-09-25 14:53:23 -04:00
Vasyl Saienko 25ec6e7e4f Set ip_nonlocal_bind to 1 for HA routers and DVR snat
Set ip_nonlocal_bind to 1 to allow starting applications (like ipsec
from vpnaas) in both routers. With ip_nonlocal_bind set to 0, ipsec
cannot be started in both routers simultaneously, because the process
can't bind to a non-existent address. This was worked around in [1]
by setting a dependency on the Python process during failover.

This completely reverts [2], which was partially reverted by [3].

[1] https://review.opendev.org/c/openstack/neutron-vpnaas/+/823904
[2] https://review.opendev.org/393886
[3] https://review.opendev.org/c/openstack/neutron/+/752360

Related-Bug: 1999761

Change-Id: I18a15aa3ca745b2b794610350f538d02575ccbe0
2023-01-05 14:45:20 +00:00
Damian Dabrowski 5288593faf [L3-HA] Disable automatic link-local address assignment for HA routers
In order to fix both [1] and [2], we set
`net.ipv6.conf.all.addr_gen_mode=1` in the HA router namespace to
prevent auto-assigning link-local addresses (LLA) to the interfaces.
We don't need LLA auto-assignment as keepalived manages them.
With this change, link-local addresses exist only on the active
router, which prevents 'dadfailed' and stops MLD packets from being
sent from the standby router.

Previously we also reverted [3] to always keep the qg-* interface up on
both the active and standby router instances, regardless of whether
keepalived is started.
Without a link-local address assigned, the backup router instance won't
send any packets, so I see no reason to keep the qg-* interface down.

[1] https://bugs.launchpad.net/neutron/+bug/1952907
[2] https://bugs.launchpad.net/neutron/+bug/1859832
[3] https://review.opendev.org/c/openstack/neutron/+/834162

Closes-Bug: #1952907
Related-Bug: #1859832
Depends-On: https://review.opendev.org/c/openstack/neutron/+/834162
Change-Id: I306f14aa6b7e8bb69a81f441be337bc1a584d3b2
2022-05-24 11:30:02 +00:00
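A minimal Python sketch of the sysctl change described above, assuming it is applied by running `sysctl -w` inside the router's network namespace through `ip netns exec`. The helpers `sysctl_cmd` and `ha_router_sysctl_cmds` and the table `HA_ROUTER_SYSCTLS` are hypothetical illustrations, not Neutron's API. With `addr_gen_mode=1` the kernel does not generate an IPv6 link-local address on its own.

```python
from typing import Dict, List

# Hypothetical table of sysctls applied to an HA router namespace.
# addr_gen_mode=1: the kernel does not auto-generate IPv6 link-local
# addresses; keepalived manages them instead.
HA_ROUTER_SYSCTLS: Dict[str, int] = {"net.ipv6.conf.all.addr_gen_mode": 1}


def sysctl_cmd(namespace: str, key: str, value: int) -> List[str]:
    """Build an argv that sets one sysctl inside a network namespace."""
    return ["ip", "netns", "exec", namespace, "sysctl", "-w", f"{key}={value}"]


def ha_router_sysctl_cmds(namespace: str) -> List[List[str]]:
    """Return one command per sysctl in the (hypothetical) HA table."""
    return [sysctl_cmd(namespace, k, v) for k, v in HA_ROUTER_SYSCTLS.items()]


if __name__ == "__main__":
    for argv in ha_router_sysctl_cmds("qrouter-example"):
        print(" ".join(argv))
```

Each argv could be handed to `subprocess.run(argv, check=True)` with root privileges; building argument lists instead of shell strings avoids quoting problems.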
Edward Hope-Morley 36bf1df46d Partially revert "Do not link up HA router gateway in backup node"
This partially reverts commit c52029c39a.

We revert everything except one minor addition to
neutron/agent/l3/ha_router.py which ensures that ha_confs path is
created when the keepalived manager is initialised.

Closes-Bug: #1965297
Change-Id: I14ad015c4344b32f7210c924902dac4e6ad1ae88
2022-05-24 11:24:30 +00:00
Slawek Kaplonski 21eabbcf03 [DVR] Fix update of the MTU in the DVR HA routers
This is a follow-up to patch [1], which fixed updating the MTU in the
snat namespace for DVR routers.
DVR-HA routers had an additional issue: the
L3 agent tried to update the MTU of the qr- interface in
self.ha_namespace, which for DVR-HA routers is the snat namespace.

This patch fixes that issue by setting the MTU on the qr- interface in
the qrouter namespace and also setting the MTU on the snat interface in
the snat namespace.

[1] https://review.opendev.org/c/openstack/neutron/+/799226

Closes-bug: #1933273
Change-Id: I409bc674b65e4f495ebd42d03e97a09d51482339
2021-10-13 14:33:19 +02:00
Zuul ed3374746d Merge "Revert "[L3][HA] Retry when setting HA router GW status."" 2021-09-15 09:49:32 +00:00
Slawek Kaplonski b5dd6efdca [DVR] Fix update of the MTU in the SNAT namespace
When a network's MTU is changed, Neutron sends a notification about it
to the L3 agents. For DVR (and DVR HA) routers, the MTU is then changed
in the qrouter- namespace, but it should also be changed on the snat
interfaces in the snat namespace. That part was missing.

This patch adds a special implementation of the internal_network_updated()
method in the DvrEdgeRouter class so it can also configure the MTU
in the snat namespace.

This patch also stops passing the attributes "interface_name",
"ip_cidrs" and "mtu" to the internal_network_updated() method and passes
the "port" dict there instead. This is consistent with what is already
done in e.g. the internal_network_added() method, and the "port" dict is
actually necessary to properly configure the snat internal interface in
the snat namespace.

This patch also adds a functional test of updating the network MTU for
all types of routers, as there was no such test at all.

There is an additional issue with DVR-HA which isn't fixed by this patch
and for which a follow-up will be proposed. Because of that, this patch
is marked as a partial fix for the related bug.

Related-Bug: #1933273
Change-Id: I200acfcaaae7f056ea9a563fead9ff2de8464971
2021-08-30 16:49:01 +02:00
Edward Hope-Morley 344fc0c8d2 Revert "[L3][HA] Retry when setting HA router GW status."
In short this patch can cause the privsep reader thread to
die resulting in the l3 agent getting stuck and e.g. not
processing any router updates. See related LP bug for full
explanation.

Closes-Bug: #1927868

This reverts commit 662f483120.

Change-Id: Ide7e9771d08eb623dd75941e425813d9b857b4c6
2021-08-20 12:26:42 +01:00
Slawek Kaplonski 82fd968011 [L3HA] Add extra logs to the process of ha state changes
Some extra debug logs may be useful to understand exactly what happens
during HA state transitions, e.g. to understand failures like the one
described in the related bug.

Related-bug: #1939507
Change-Id: Id708b2c7a602df8d4ba1b32e58d4b152b5c58ba6
2021-08-12 16:48:57 +02:00
yangjianfeng f192153b44 HA-non-DVR router don't need manually add static route
When a router is set to HA mode, the keepalived process takes over
generating the route entries, so the code that adds static routes
is redundant.

However, for DVR-HA routers on a dvr_snat node, the keepalived process
runs in the snat namespace and does not manage the qrouter namespace, so
the code that manually adds static routes still needs to be called.

Closes-Bug: #1927849
Change-Id: Id09de6c43c0fab4009336e253c88f54219398053
2021-05-16 04:49:01 -04:00
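The rule above can be restated as a small decision function. This is a hypothetical sketch that only encodes the three cases named in the commit message; the function and parameter names are assumptions, not Neutron's actual code.

```python
def needs_manual_static_routes(is_ha: bool, is_distributed: bool) -> bool:
    """Decide whether the agent must add static routes itself (hypothetical).

    - legacy (non-HA) routers: the agent adds static routes itself
    - HA, non-DVR routers: keepalived owns the routes, so skip
    - DVR-HA routers: keepalived runs only in the snat namespace, so the
      qrouter namespace still needs routes added by the agent
    """
    if is_ha and not is_distributed:
        return False
    return True
```

Expressing the condition as "skip only for HA without DVR" keeps the DVR-HA exception explicit rather than hiding it in nested checks.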
Rodolfo Alonso 19eb12bd29 Revert "Implement "kill" method using os.kill()"
This reverts commit 4b21111eb1.

Reason for revert: This method is unstable and prone to timeouts

Change-Id: I6064d60e4d63b085046aace7683d766a79dd22da
2021-03-25 22:05:58 +00:00
Rodolfo Alonso Hernandez 4b21111eb1 Implement "kill" method using os.kill()
Implement the "kill" method (send a signal to a process) using the
Python native library "os".

In functional tests, "RootHelperProcess.kill" method should not fail if
the process does not exist.

Closes-Bug: #1843446
Closes-Bug: #1843418

Change-Id: Iee97a83779dd3e20eb3a223fb8557a94b8f15dc0
2021-03-22 08:58:20 +00:00
Rodolfo Alonso Hernandez 662f483120 [L3][HA] Retry when setting HA router GW status.
When an HA router instance changes state (active, backup), the
GW interface is set up or down. As reported in the bug, while
keepalived is configuring the interface, the interface disappears
from and reappears in the kernel namespace, as seen in the udev
messages.

This patch is a workaround until the real issue is addressed (if
possible): it retries the interface configuration for a short period
of time.

Related-Bug: #1916024

Change-Id: I8ced69f4f8e7d7c73da130a57e89e9d66590390b
2021-03-02 10:45:50 +00:00
Slawek Kaplonski 0d8ae15767 Remove update_initial_state() method from the HA router
This method was intended to check the state of the HA router on the
node and update it in the neutron server.
Patch [1] added a check of the initial status to the
neutron_keepalived_state_change_monitor process.
This method could also cause race conditions in which the event setting
the correct state of the router is not processed, so the router may end
up with two nodes in the "primary" state in Neutron's DB.

Neutron_keepalived_state_change_monitor was notifying the agent about
the router's initial state only if this state was 'primary'.
Now it will always notify the agent, letting the agent set the router's
state to 'backup' if needed (which was previously done by the removed
update_initial_state() method).

[1] https://review.opendev.org/c/openstack/neutron/+/642295

Change-Id: I2cc58c30cf844ee0ecf0611ecdec430086464790
Closes-Bug: #1916022
2021-02-23 14:58:29 +00:00
Zuul d9769028c6 Merge "Don't raise FileNotFoundError during disabling keepalived" 2020-08-28 16:54:30 +00:00
Slawek Kaplonski a08893368a Don't raise FileNotFoundError during disabling keepalived
If keepalived's config no longer exists, there is no
need to raise any exception while the L3 agent is trying to clean up
this config.

Change-Id: I9ec81ad0c10379294d3145c5902e8b81b65c0221
Closes-Bug: #1892866
2020-08-25 15:06:04 +02:00
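A minimal sketch of idempotent cleanup in the spirit of this fix, assuming the config is a single file on disk. The function name `remove_keepalived_config` is hypothetical; `contextlib.suppress` is the standard-library way to ignore one specific exception type.

```python
import contextlib
import os


def remove_keepalived_config(path: str) -> None:
    """Delete a config file, treating an already-missing file as success."""
    # Cleanup is idempotent: if the file is already gone, the goal is met,
    # so FileNotFoundError is swallowed while other OSErrors still propagate.
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)
```

Suppressing only `FileNotFoundError`, rather than every `OSError`, keeps real failures such as permission errors visible.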
Brian Haley 055036ba2b Improve terminology in the Neutron tree
There is no real reason we should be using some of the
terms we do, they're outdated, and we're behind other
open-source projects in this respect. Let's switch to
using more inclusive terms in all possible places.

Change-Id: I99913107e803384b34cbd5ca588451b1cf64d594
2020-08-19 16:47:53 -04:00
jufeng 554b5c2267 Support gateway which is not in subnet CIDR in ha_router
There are cases where the gateway is not in the subnet CIDR.
To support this, we can set two routes as follows:
ip route add 172.16.0.1/32 dev eth0
ip route add default via 172.16.0.1 dev eth0

Closes-bug: #1861674
Change-Id: I69356e926b15de7f1f99540e7cb98671c634e8a9
2020-07-09 09:11:28 +00:00
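The two-route technique above can be sketched as a command builder. This is an illustrative sketch, not the patch's code: `gateway_route_cmds` is a hypothetical name, and it only builds `ip route` argument lists without executing them. An IPv6 gateway gets `ip -6` and a /128 host route as an assumed extension of the IPv4 example.

```python
import ipaddress
from typing import List


def gateway_route_cmds(gateway_ip: str, subnet_cidr: str, device: str) -> List[List[str]]:
    """Build the routes needed to reach a default gateway (hypothetical helper).

    If the gateway lies outside the subnet CIDR, first add an on-link host
    route (/32 for IPv4, /128 for IPv6) so the default route via it is valid.
    """
    gw = ipaddress.ip_address(gateway_ip)
    subnet = ipaddress.ip_network(subnet_cidr, strict=False)
    base = ["ip", "-6"] if gw.version == 6 else ["ip"]
    cmds: List[List[str]] = []
    if gw not in subnet:
        # max_prefixlen is 32 for IPv4 addresses and 128 for IPv6 addresses
        cmds.append(base + ["route", "add", f"{gw}/{gw.max_prefixlen}", "dev", device])
    cmds.append(base + ["route", "add", "default", "via", str(gw), "dev", device])
    return cmds
```

For a gateway inside the subnet only the default route is produced, because the kernel already has a connected route to it.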
LIU Yulong c52029c39a Do not link up HA router gateway in backup node
The L3 router sets its devices' link up by default.
For HA routers, the gateway device is plugged
in on all scheduled hosts. When the gateway device is
up on a backup node, it sends out IPv6-related
packets (MLDv2), depending on some kernel settings.
This causes the physical fabric to think that the
gateway MAC is now on the backup node, and
eventually L3 traffic to the master node is broken.

This patch sets the backup gateway device link down
by default. When VRRP sets the master state on
one host, the L3 agent state change procedure
brings the gateway device link up.

Closes-Bug: #1859832
Change-Id: I8dca2c1a2f8cb467cfb44420f0eea54ca0932b05
2020-03-25 16:09:42 +08:00
Rodolfo Alonso Hernandez 3437572906 Replace "ip monitor" command with Pyroute2 implementation
Use the "ip monitor" tool implemented with Pyroute2 library in
the neutron-keepalived-state-change monitor.

Change-Id: I932b62a8e0fa1a2f51bbde44134272f0b31b5c76
Related-Bug: #1680183
2019-12-08 22:38:45 +00:00
Rodolfo Alonso Hernandez 3f022a193f Delay HA router transition from "backup" to "master"
As described in the bug, when an HA router transitions from "master" to
"backup", "keepalived" processes will set the virtual IP in all other
HA routers. Each HA router will then advertise it and "keepalived" will
decide, according to a trivial algorithm (higher interface IP), which
one should be "master". At this point, the other "keepalived" processes
running in the other servers will remove the HA router virtual IP
assigned an instant before.

To avoid transitioning some routers from "backup" to "master" and then
back to "backup" in a very short period, this patch delays the "backup"
to "master" transition, waiting for a possible new "backup" state. If,
during the waiting period before setting the HA state to "master" (the
HA VRRP advert time, 2 seconds by default), the L3 agent receives a new
"backup" HA state, the L3 agent does nothing.

Closes-Bug: #1837635

Change-Id: I70037da9cdd0f8448e0af8dd96b4e3f5de5728ad
2019-08-27 16:47:00 +00:00
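The delayed promotion described above works like a debounce. Below is a minimal, deterministic sketch, assuming timestamps are passed in explicitly instead of using timers or threads; the class name `PromotionDelay` is hypothetical, and it uses "primary", the term later commits in this log adopt for what this commit calls "master".

```python
from typing import Optional


class PromotionDelay:
    """Hold a backup -> primary transition for `delay` seconds (sketch).

    A 'backup' event inside the window cancels the pending promotion, so a
    router that is briefly elected and then demoted never flips state.
    """

    def __init__(self, delay: float = 2.0) -> None:
        self.delay = delay  # the commit uses the VRRP advert time, 2 s by default
        self.state = "backup"
        self._pending_since: Optional[float] = None

    def on_event(self, state: str, now: float) -> None:
        if state == "primary":
            if self.state != "primary" and self._pending_since is None:
                self._pending_since = now  # start the waiting period
        else:
            self._pending_since = None  # cancel any pending promotion
            self.state = state  # demotions are applied immediately

    def tick(self, now: float) -> str:
        """Apply the promotion once the waiting period has fully elapsed."""
        if self._pending_since is not None and now - self._pending_since >= self.delay:
            self.state = "primary"
            self._pending_since = None
        return self.state
```

Only the upward transition is delayed; demotions take effect at once, which matches the goal of never holding the VIP longer than needed.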
LIU Yulong 426a5b2833 Adjust some HA router log
If the router is concurrently deleted, the HA
state change log is unnecessary, and it is
sometimes confusing.
Also log the PID of the router's
keepalived-state-change child process.

Change-Id: Id57dd787c254994af967db17647a3a28925714da
Related-Bug: #1798475
2019-07-03 04:50:45 +00:00
Zuul c3e611eaf1 Merge "Add kill hooks for external processes" 2019-06-05 01:09:51 +00:00
Slawek Kaplonski 93015527f0 Add kill hooks for external processes
This patch adds the possibility to configure kill hooks used to kill
external processes, like dnsmasq or keepalived.

Change-Id: I29dfbedfb7167982323dcff1c4554ee780cc48db
Closes-Bug: #1825943
2019-06-03 14:39:51 +02:00
LIU Yulong 26388a9952 Set neutron-keepalived-state-change proctitle
This lets us count the processes correctly.

Related-Bug: #1798475
Change-Id: I9c6651ed192669b91a4683f5f3bd2795e8d8276a
2019-05-23 15:22:35 +08:00
Rodolfo Alonso Hernandez aacd11ab9f Remove rootwrap configuration from neutron-keepalived-state-change
The new IP command introduced by Ie3fe825d65408fc969c478767b411fe0156e9fbc
requires only privsep initialization. This patch fixes the privsep
error FailedToDropPrivileges raised when executed under neutron-rootwrap.

Closes-Bug: #1823038

Change-Id: I6cde3c9dae7ffdccce49e88c3c79d1c379f291cf
2019-05-15 17:22:48 +00:00
LIU Yulong 45957f12c8 Keep HA ports info for HA router during entire lifecycle
Once the HA port is set, it must keep this value no matter
what the server returns, because there is a race condition
between the L3 agent syncing router info for processing
and the server deleting the router.

This patch adds a helper function for every ha_port set
action. Once ha_port is not None, it always keeps its
original value.

Closes-Bug: #1826726
Change-Id: I96a088d25048be02a9c5b12c1d087df075b36fc4
2019-05-05 10:34:09 +08:00
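The "set once, then keep" rule can be sketched as a guarded setter. The class `HaPortHolder` and its method names are hypothetical illustrations, and the port is modelled as a plain dict for simplicity.

```python
from typing import Any, Dict, Optional


class HaPortHolder:
    """Hypothetical holder in which the first non-None HA port sticks."""

    def __init__(self) -> None:
        self._ha_port: Optional[Dict[str, Any]] = None

    @property
    def ha_port(self) -> Optional[Dict[str, Any]]:
        return self._ha_port

    def set_ha_port(self, port: Optional[Dict[str, Any]]) -> None:
        # Later syncs may race with router deletion on the server and
        # return None or stale data; once set, keep the original value.
        if self._ha_port is None:
            self._ha_port = port
```

Routing every assignment through one setter means the race is handled in a single place instead of at each call site.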
Zuul 540449cfbd Merge "Add log file for neutron-keepalived-state-change" 2019-04-24 19:04:47 +00:00
LIU Yulong ccf76c36bb Add log file for neutron-keepalived-state-change
neutron-keepalived-state-change may fail to start, with no way
to find out why. This patch adds a log file for it.

Change-Id: I688a6e6d0ac42c00d87571484f726e0eae091675
Related-Bug: #1822155
2019-04-18 01:04:16 +00:00
Zuul f8b990736b Merge "remove neutron.common.constants" 2019-04-11 18:33:05 +00:00
Boden R 9bbe9911c4 remove neutron.common.constants
All of the externally consumed variables from neutron.common.constants
now live in neutron-lib. This patch removes neutron.common.constants
and switches all uses over to lib.

NeutronLibImpact

Depends-On: https://review.openstack.org/#/c/647836/
Change-Id: I3c2f28ecd18996a1cee1ae3af399166defe9da87
2019-04-04 14:10:26 -06:00
Edward Hope-Morley afbbec83a2 Don't pass None arg to neutron-keepalived-state-change
The original fix for bug 1818614 added two new CLI args
when spawning neutron-keepalived-state-change, but if
e.g. self.agent_conf.AGENT.root_helper_daemon is unset,
the string "None" is passed, which breaks the
neutron-keepalived-state-change daemon.

Change-Id: I4afcdbbf2f3d2dafcad241ba3fc0778b52b8fc85
Related-Bug: #1818614
Related-Bug: #1823038
2019-04-04 14:30:35 +01:00
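The bug above comes from formatting `None` into a command-line flag. A minimal sketch of the fix pattern, assuming options arrive as a name-to-value dict; `build_cli_args` and the flag names in the usage note are hypothetical, not the daemon's real options.

```python
from typing import Dict, List, Optional


def build_cli_args(executable: str, options: Dict[str, Optional[str]]) -> List[str]:
    """Turn options into '--name=value' flags, omitting unset (None) ones."""
    args = [executable]
    for name, value in options.items():
        if value is None:
            # f"{None}" would yield the literal string "None", which the
            # spawned daemon would then try to use as a real value.
            continue
        args.append(f"--{name}={value}")
    return args
```

For example, `build_cli_args("neutron-keepalived-state-change", {"root_helper": "sudo", "root_helper_daemon": None})` yields only the `--root_helper=sudo` flag after the executable.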
Zuul b6c2d2afd1 Merge "Set initial ha router state in neutron-keepalived-state-change" 2019-03-14 19:08:19 +00:00
Slawek Kaplonski 8fec1ffc83 Set initial ha router state in neutron-keepalived-state-change
With HA routers, it may sometimes happen that
keepalived sets the router's status to MASTER before the
neutron-keepalived-state-change daemon spawns "ip monitor"
to monitor IP changes in the router's namespace.

In such a case, neutron-keepalived-state-change never
notices that keepalived set the router to MASTER, and the L3 agent is
not notified about it, so the router is not configured properly.

To avoid this race condition, neutron-keepalived-state-change now
checks whether the VIP address is already configured on the HA interface
before it spawns "ip monitor". If keepalived has already configured it,
the daemon notifies the L3 agent that the router is set to
MASTER.

Change-Id: Ie3fe825d65408fc969c478767b411fe0156e9fbc
Closes-Bug: #1818614
2019-03-12 12:29:36 +01:00
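The initial check described above amounts to "is the VIP already among the interface's addresses?". A minimal sketch, assuming the interface addresses are available as CIDR strings; `vip_configured` and `initial_state` are hypothetical names, and "primary" is the later term for MASTER. Note that in this commit the daemon only notified the agent when the router was already MASTER; a later commit in this log changed it to always notify.

```python
import ipaddress
from typing import Iterable


def vip_configured(vip_cidr: str, configured: Iterable[str]) -> bool:
    """True if the VIP's address is already present among configured CIDRs."""
    vip = ipaddress.ip_interface(vip_cidr).ip
    return any(ipaddress.ip_interface(c).ip == vip for c in configured)


def initial_state(vip_cidr: str, configured: Iterable[str]) -> str:
    # Checked once, before starting the address monitor, so a VIP that
    # keepalived configured earlier is not missed.
    return "primary" if vip_configured(vip_cidr, configured) else "backup"
```

Comparing addresses rather than raw strings avoids false mismatches when the same IP is listed with a different prefix length.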
Sławek Kapłoński b09b44608b Remove deprecated 'external_network_bridge' option
This option was deprecated and marked for removal in Ocata. As
we are now in the Stein development cycle, I think it's a good time
to remove it.

Change-Id: I07474713206c218710544ad98c08caaa37dbf53a
2019-03-09 22:07:38 +00:00
Brian Haley b083d39a83 Change agents to use get_devices_with_ip()
Instead of instantiating an IPDevice object just to get
the list of IPs, call get_devices_with_ip() instead since
that's what it's doing anyways.

Trivialfix

Change-Id: I5055d24a40d45f3f3b13b05249d353ea67acf4d5
2019-02-05 18:30:01 +02:00
Bernard Cafarelli 6124f60297 Switch isolated metadata proxy to bind to 169.254.169.254
Currently the metadata proxy binds to default 0.0.0.0, which does not
add any advantage (metadata requests are not sent to random IP
addresses), and may allow access to cloud information from
third parties.

This changes the generated configuration to bind to METADATA_DEFAULT_IP
address instead.

This is not enabled in the other metadata proxy configuration (in the L3
agent), as this would require net.ipv4.ip_nonlocal_bind everywhere
(currently only enabled for DVR) or transparent mode in haproxy (which
requires net.ipv4.ip_nonlocal_bind anyway).

Changed set_ip_nonlocal_bind_for_namespace() to support setting the
value in both the given and root namespace correctly, since it was
only used from inside the neutron codebase according to codesearch.

Change-Id: I388391cf697dade1a163d15ab568b33134f7b2d9
Co-Authored-By: Andrey Arapov <andrey.arapov@nixaid.com>
Closes-Bug: #1745618
2019-01-30 14:17:43 +00:00
Zuul 9ad2e05088 Merge "filter "updated_at" and "revision_number" in _gateway_ports_equal" 2018-12-01 09:36:11 +00:00
hujin 6541304d5e filter "updated_at" and "revision_number" in _gateway_ports_equal
When the HA attribute of the router changes, the code determines
whether the gateway in memory is consistent with the gateway
in the database to decide whether it needs to be reconfigured.
But the comparison is wrong.

After the HA attribute of the router changes, the relevant parameters
of the gateway port are updated by the ML2 agent,
including "binding:host_id", "updated_at" and "revision_number".
Method "_gateway_ports_equal" removes
only the "binding:host_id" property of the port,
so the ports are reported as unequal every time.

Change-Id: I19e024ff360611d191da2bd3bff1b86abe1a8ea1
Closes-Bug: 1797298
2018-11-20 10:11:24 +08:00
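A minimal sketch of the comparison this fix describes, assuming ports are plain dicts. The names `gateway_ports_equal` and `IGNORED_PORT_KEYS` are hypothetical stand-ins, not Neutron's private method; the sketch builds filtered copies so the caller's dicts are never mutated.

```python
from typing import Any, Dict

# Fields that change whenever the HA attribute flips but do not require
# the gateway to be reconfigured.
IGNORED_PORT_KEYS = frozenset({"binding:host_id", "updated_at", "revision_number"})


def gateway_ports_equal(port1: Dict[str, Any], port2: Dict[str, Any]) -> bool:
    """Compare two port dicts while ignoring bookkeeping fields."""
    def relevant(port: Dict[str, Any]) -> Dict[str, Any]:
        # Filtered copy; the original dicts are left untouched.
        return {k: v for k, v in port.items() if k not in IGNORED_PORT_KEYS}
    return relevant(port1) == relevant(port2)
```

Keeping the ignored fields in one named constant makes it easy to extend the list if another bookkeeping attribute starts changing on every update.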
Brian Haley b847cd02c5 Enable 'all' IPv6 forwarding knob correctly
When the external gateway is plugged and we enable IPv6
forwarding on it, make sure the 'all' sysctl knob is also
enabled, else IPv6 packets will not be forwarded.  This
seems to only affect HA routers that default to disabling
this 'all' knob on creation.

Also, when we are removing all the IPv6 addresses from a
HA router internal interface, set 'accept_ra' to zero so
it doesn't accidentally auto-configure an address.  Set
it back to one when adding them back.

Re-homed newly added _wait_until_ipv6_forwarding_has_state()
accordingly.

Closes-bug: #1787919

Change-Id: Ia1f311ee31d1479089685367a97bf13cf170b342
2018-11-15 14:59:49 -05:00
Swaminathan Vasudevan 81652cd939 DVR-HA: Configure extra routes on router namespace in dvr_snat node
Extra routes are not configured in the router namespace on the
dvr_snat node with a DVR-HA configuration.
This patch fixes the problem.

Change-Id: If620b23564479042aa6f58640bcd6705e5eb52cf
Closes-Bug: #1797037
2018-10-12 11:00:35 -04:00
Slawek Kaplonski 3e9e2a5b4b Disable IPv6 forwarding by default on HA routers
For HA routers, IPv6 forwarding is now disabled by default and
then enabled only on the master node.
Before this patch it was done the opposite way: forwarding was
enabled by default and then disabled on backup nodes.
When forwarding was enabled/disabled for the qg- port, MLDv2 packets
were sent, which might lead to temporary packet loss as packets to the
FIP were sent to the backup node instead of the master one.

Related-Bug: #1771841

Change-Id: Ia6b772e91c1f94612ca29d7082eca999372e60d6
2018-05-31 20:19:21 +00:00
Brian Haley 922cd0a938 Change ha_state property to always return a value
Right now, ha_state could return any value that is in
the state file, or even '' if the file is empty.  Instead,
return 'unknown' if it's empty.

We also need to update the translation map in the HA code
to deal with this new value to avoid a KeyError.

Related-bug: #1755243

Change-Id: I94a39e574cf4ff5facb76df352c14cbaba793e98
2018-04-17 14:23:23 +00:00
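A minimal sketch of the "always return a value" behaviour, assuming the state lives in a small text file. `read_ha_state` is a hypothetical name, treating a missing file like an empty one is an added assumption, and the target values in `TRANSLATION_MAP` are assumptions too: the commit only says 'unknown' must be handled to avoid a KeyError. "primary" is the later term for "master".

```python
from typing import Dict

# Hypothetical mapping from stored keepalived states to reported states;
# the right-hand values are assumptions, but every key read_ha_state can
# return must be present to avoid a KeyError.
TRANSLATION_MAP: Dict[str, str] = {
    "primary": "active",
    "backup": "standby",
    "fault": "standby",
    "unknown": "unknown",
}


def read_ha_state(state_file: str) -> str:
    """Return the state stored in state_file, or 'unknown' if it is empty."""
    try:
        with open(state_file) as f:
            state = f.read().strip()
    except FileNotFoundError:
        state = ""  # assumption: treat a missing file like an empty one
    return state or "unknown"
```

Normalizing the empty case at the single read point keeps every consumer of the state free of special cases.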
Zuul aa58e9c2eb Merge "Fix callers of get_devices_with_ip() to pass addresses" 2018-01-05 07:53:02 +00:00
Brian Haley c62d54d0c2 Fix HA router initialization exception
When an HA router initialization fails early, it can lead to:

 AttributeError: 'HaRouter' object has no attribute 'process_monitor'

Initialize 'self.process_monitor' in the RouterInfo init code
in case we try to clean up early.

Change-Id: Iddeaeef13adee10f7b130e3f9e584b6e9f037030
Closes-bug: #1735557
2017-11-30 16:47:43 -05:00
Brian Haley d2b909f533 Move check_ha_state_for_router() into notification code
As soon as we call router_info.initialize(), we could
possibly try and process a router.  If it is HA, and
we have not fully initialized the HA port or keepalived
manager, we could trigger an exception.

Move the call to check_ha_state_for_router() into the
update notification code so it's done after the router
has been created.  Updated the functional tests for this
since the unit tests are now invalid.

Also added a retry counter to the RouterUpdate object so
the l3-agent code will stop re-enqueuing the same update
in an infinite loop.  We will delete the router if the
limit is reached.

Finally, have the L3 HA code verify that ha_port and
keepalived_manager objects are valid during deletion since
there is no need to do additional work if they are not.

Change-Id: Iae65305cbc04b7af482032ddf06b6f2162a9c862
Closes-bug: #1726370
2017-11-07 13:10:55 -05:00
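The retry counter described above bounds how often a failing update is re-enqueued. A minimal sketch, assuming a simple FIFO and a `process` callback that returns success; `RouterUpdateSketch`, `drain` and the budget of 5 are hypothetical, not the l3-agent's real classes or values.

```python
import collections
from dataclasses import dataclass
from typing import Callable, Deque, List

DEFAULT_TRIES = 5  # hypothetical retry budget, not Neutron's value


@dataclass
class RouterUpdateSketch:
    router_id: str
    tries: int = DEFAULT_TRIES


def drain(queue: Deque[RouterUpdateSketch], process: Callable[[str], bool]) -> List[str]:
    """Process queued updates, re-enqueuing failures until their budget runs out.

    Returns the IDs of routers whose updates were abandoned; per the commit
    message, the agent would then delete those routers.
    """
    abandoned: List[str] = []
    while queue:
        update = queue.popleft()
        if process(update.router_id):
            continue
        update.tries -= 1
        if update.tries > 0:
            queue.append(update)  # retry later instead of looping forever
        else:
            abandoned.append(update.router_id)
    return abandoned


if __name__ == "__main__":
    q = collections.deque([RouterUpdateSketch("r1", tries=2)])
    print(drain(q, lambda rid: False))  # r1 fails twice and is abandoned
```

Storing the counter on the update itself means each router's budget travels with it through the queue.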
Brian Haley 7b8289253c Fix callers of get_devices_with_ip() to pass addresses
If callers of get_devices_with_ip(), or
device.addr.list(to=address) pass an ip_cidr, it
could match any ip_cidr in that range on the interface.
Callers need to pass the IP without the prefix portion in
order to match it exactly.  Added a helper utility to
strip the CIDR part from an ip_cidr.

Determined the unit test for this can't actually check
this case since we are mocking the return value from
/sbin/ip, so modified it to just make sure the dict
is correct.

Added a functional test that adds two IP addresses in
the same IP range to verify that we actually filter
correctly when a 'to=IP' is specified.

Change-Id: I3a95b3bb72a43f322ad23892d8959398aac22a1c
Closes-bug: #1728080
2017-10-31 16:20:28 -04:00
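The sketch below illustrates why the prefix must be stripped. `ip_from_cidr` is a hypothetical stand-in for the helper the commit adds, and `filter_addresses` is a simplified model (an assumption, not iproute2 itself) of a `to=` filter: a value with a prefix matches every address in that network, while a bare IP matches only itself.

```python
import ipaddress
from typing import List


def ip_from_cidr(ip_cidr: str) -> str:
    """Strip the prefix length: '10.0.0.5/24' -> '10.0.0.5'."""
    return str(ipaddress.ip_interface(ip_cidr).ip)


def filter_addresses(addresses: List[str], to: str) -> List[str]:
    """Model of a 'to=' filter: a prefixed value matches every address
    inside that network; a bare IP matches only that exact address."""
    target = ipaddress.ip_interface(to)
    if "/" in to:
        return [a for a in addresses if ipaddress.ip_interface(a).ip in target.network]
    return [a for a in addresses if ipaddress.ip_interface(a).ip == target.ip]
```

With two addresses in 10.0.0.0/24, filtering on `"10.0.0.5/24"` returns both, while filtering on `ip_from_cidr("10.0.0.5/24")` returns only the exact match, which mirrors the functional test the commit describes.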
Boden R 60f8048c7c use synchronized lock decorator from neutron-lib
neutron-lib contains the synchronized lockutils decorator as well as
the SYNCHRONIZED_PREFIX global. This patch consumes them from
neutron-lib and removes them from neutron.

NeutronLibImpact

Change-Id: I729da348e340509f2d09f8a6436716e2398f1583
2017-10-04 13:57:42 -06:00
Inessa Vasilevskaya 7322bd6efb Make code follow log translation guideline
Since Pike, log messages should not be translated.
This patch removes calls to i18n _LC, _LI, _LE, _LW from
logging logic throughout the code. Translators definition
from neutron._i18n is removed as well.
This patch also removes log translation verification from
the ignore directive in tox.ini.

Change-Id: If9aa76fcf121c0e61a7c08088006c5873faee56e
2017-08-14 02:01:48 +00:00
Ihar Hrachyshka cc69828ff0 Apply network MTU changes to l3 ports
This patch makes the L3 agent update its ports' MTU when it's changed on
the core plugin side.

Related-Bug: #1671634
Change-Id: I4444da6358e8b8420a3a365e1107b02f5bb1161d
2017-08-11 11:10:10 -04:00