After updating pylint, it started emitting additional "R"
warnings in some cases, fix some of them.
use-a-generator,
unnecessary-lambda-assignment,
consider-using-max-builtin,
consider-using-generator,
consider-using-in,
use-list-literal,
consider-using-from-import
Trivialfix
Change-Id: Ife6565cefcc30b4e8a0df9121c9454cf744225df
Because of the fix for bug[1] and issue with linux_utils
get_process_count_by_name() L3 agent puts all it's HA ports down
during initialization phase. Unfortunately such operation can break
already working L3 communication. Rewiring ha-* port from down state to
up can takes few seconds and some VRRP packages could be lost then.
That triggers keepalived on other node so router HA state change
may be triggered.
This change prevents putting HA ports down when during initialization
phase L3 agent finds already configured own net namespaces. Existance
of such net namespace is a good proof that there is a network
configuration existing so host wasn't rebooted so most probably it is
just agent restart.
[1] https://bugs.launchpad.net/neutron/+bug/1597461
Closes-Bug: #1959151
Change-Id: Id9c906b2d141c3bedd80fb5f868190f8a4b66f54
In the _process_added_router() method of the L3 agent, if processing
router will fail, router_info should be cleaned to e.g. be removed from
the router cache so it will not be treated as updated router in next
iteration of the agent.
Closes-Bug: #1947993
Change-Id: Ic0bc3d951d32efadc116708bfe518a711730429d
This patch switches over to callback payloads for ROUTER
AFTER_CREATE, AFTER_UPDATE and AFTER_DELETE events.
Change-Id: Ie818ffbb1a291faa80501157b46ff6671d5c26ba
This patch switches over to callback payloads for ROUTER
BEFORE_CREATE, PRECOMMIT_CREATE, BEFORE_UPDATE and
PRECOMMIT_DELETE events.
Change-Id: I4a52c773d3f753c918df0986f1d261083156651c
Router_info's _process_internal_ports() method is the one which is
manipulating router_info.internal_ports cache and network_update()
method from the L3 agent is relying on that Router_info's cache to
check if updated network is connected to the router or not.
So they shouldn't be run together as that may cause some race conditions
and unexpected issues, like e.g. described in the related bug.
Until now, network_update event was the only one which was processed
without using queue of events. And because of that such race condition
as described above were possible.
To fix that, this patch changes network_update method in the way that it
now adds update events for each router hosted by agent to the queue.
Those events for single routers are then processed, checks if network is
actually connected to the router and if yes, schedules router update to
be processed.
Closes-Bug: #1933234
Change-Id: I2efe66a7415f7a18fb85bd2536a1901e751d6203
In order to dig the real action of a ResourceUpdate, add logs for:
1. add/update router
2. delete router
3. delete namespace
4. agent extension router add/delete/update actions
Change-Id: I5c0ff485cd0c966afe535f8063deca6e410e012d
Related-bug: #1881995
This method was intended to check state of the HA router on the
node and update it in the neutron server.
Patch [1] added check of the initial status to the
neutron_keepalived_state_change_monitor process.
It also could cause some race conditions and event which is setting
correct state of the router will be not processed thus router may endup
with two nodes with "primary" state in the Neutron's DB.
Neutron_keepalived_state_change_monitor was notifying agent about
router's initial state only if this state was 'primary'.
Now it will notify agent always to let agent set router's state as
'backup' if needed (that was previously done by this removed
update_initial_state() method).
[1] https://review.opendev.org/c/openstack/neutron/+/642295
Change-Id: I2cc58c30cf844ee0ecf0611ecdec430086464790
Closes-Bug: #1916022
In case if during switching HA router to be down, there will be any
failure, router_info will be stored in L3 agent's cache as HaRouter.
In case when next update on the router is migration to non-HA router
this is wrong class and it causes other issues, e.g. with
remove_vip_by_ip_address() which is correct only for HA routers.
This patch fixes that issue by adding check of the router's ha and
distributed flags and update local cache with new router_info class
in case if at least one of those flags don't match.
Change-Id: Ib0d3a501f88c149baea7d715c7cfe5811bc85e4f
Closes-Bug: #1892846
For some agent extension implementation, it may need the router_info
to do some clean up work. So this patch just moves the extension
delete action forward.
Closes-Bug: #1897423
Change-Id: I3434ec7c0942229b99e67de7500090dedb37b13f
Now that we use setproctitle for neutron-server workers (and
neutron-keepalived-state-change), this has the side effect of changing
the process name for agents, impacting some monitoring systems. More
details in launchpad bug.
This patch fixes it by setting the name with setproctitle to:
agent name (original process name).
Also use the newly introduced name constants to replace existing
hardcoded uses.
Change-Id: I74c3a4d3e9f833752571a75f196560cd45529385
Closes-Bug: #1881297
When the L3 agent get a router update notification, it will try to
retrieve the router info from neutron server. But at this time, if
the message queue is down/unreachable. It will get exceptions related
message queue. The resync actions will be run then. Sometimes, rabbitMQ
cluster is not so much easy to recover. Then Long time MQ recover time
will cause the router info sync RPC never get successful until it meets
the max retry time. Then the bad thing happens, L3 agent is trying to
remove the router now. It basically shutdown all the existing L3 traffic
of this router.
This patch directly removes the final router removal action, let the
router run as it is.
Closes-Bug: #1871850
Change-Id: I9062638366b45a7a930f31185cd6e23901a43957
Seems that is_enabled_and_bind_by_default() from
neutron.common.ipv6_utils was copied directly into
oslo_utils.netutils, so start using it instead.
Trivialfix
Change-Id: I00fa441e7a20fcd1115485bb8ab75750e6a8cf07
L3 agent supports multiple external networks from a long
time ago, so remove this RPC call since it is not used.
According to codesearch of [1] and [2], we just remove
neutron built-in L3 agent RPC. For neutron server side
or RPC callback classes, the function is still remained.
[1] http://codesearch.openstack.org/?q=get_external_network_id
[2] http://codesearch.openstack.org/?q=L3RpcCallback
Change-Id: I764423e175d6e82729a647e415a9f267f495916f
Closes-Bug: #1844168
In fetch_and_sync_all_routers method is used python's range function.
Range function accepts integers.
This patch is fixing divide behaviour in py3 where result number is float,
by retyping float to int as it is represented in py2.
Change-Id: Ifffdee0d4a3226d4871cfabd0bdbf13d7058a83e
Closes-Bug: #1824334
As described in the bug, when a HA router transitions from "master" to
"backup", "keepalived" processes will set the virtual IP in all other
HA routers. Each HA router will then advert it and "keepalived" will
decide, according to a trivial algorithm (higher interface IP), which
one should be "master". At this point, the other "keepalived" processes
running in the other servers, will remove the HA router virtual IP
assigned an instant before
To avoid transitioning some routers form "backup" to "master" and then
to "backup" in a very short period, this patch delays the "backup" to
"master" transition, waiting for a possible new "backup" state. If
during the waiting period (set to the HA VRRP advert time, 2 seconds
default) to set the HA state to "master", the L3 agent receives a new
"backup" HA state, the L3 agent does nothing.
Closes-Bug: #1837635
Change-Id: I70037da9cdd0f8448e0af8dd96b4e3f5de5728ad
In order to support out of tree interface drivers it is required
to pass a callback to allow the drivers to query information about
the network.
- Allow passing **kwargs to interface drivers
- Pass get_networks() as `get_networks_cb` kw arg
`get_networks_cb` has the same API as
`neutron.neutron_plugin_base_v2.NeutronPluginBaseV2.get_networks()`
minus the the request context which will be embeded in the callback
itself.
The out of tree interface drivers in question are:
MultiInterfaceDriver - a per-physnet interface driver that delegates
operations on a per-physnet basis.
IPoIBInterfaceDriver - an interface driver for IPoIB (IP over Infiniband)
networks.
Those drivers are a part of networking-mlnx[1], Their implementation
is vendor agnostic so they can later be moved to a more common place
if desired.
[1] https://github.com/openstack/networking-mlnx
Change-Id: I74d9f449fb24f64548b0f6db4d5562f7447efb25
Closes-Bug: #1834176
This option was deprecated since couple of releases already.
In Stein we removed 'external_network_bridge' option from L3 agent's
config so now it's time to remove also this one.
There is also new upgrade check introduced to check and warn
users if gateway_external_network_id was used in the deployment.
This patch also removes method _check_router_needs_rescheduling() from
neutron/db/l3_db.py module as it is not needed anymore.
This patch also removes unit tests:
test_update_gateway_agent_exists_supporting_network
test_update_gateway_agent_exists_supporting_multiple_network
test_router_update_gateway_no_eligible_l3_agent
from neutron/tests/unit/extensions/test_l3.py module as those
tests are not needed when there is no "gateway_external_network_id"
config option anymore.
Change-Id: Id01571cd42cfe9c5ce91e90159917c7d3c963878
- Added get_networks() RPC call for DHCP agent
- Added get_networks() RPC call for L3 agent
This change is required in order to support out of tree
MultiInterfaceDriver and IPoIBInterfaceDriver interface drivers
as they require information on the network a port is being plugged
to.
These RPCs will be passed as kwargs when loading the relevant
interface driver.
get_networks() keyword args map to the keyword arguments of:
neutron.neutron_plugin_base_v2.NeutronPluginBaseV2.get_networks()
Change-Id: I11d82380aad8655a4fdc9656737b912b16e2859b
Partial-Bug: #1834176
And set it to all the L3 RPC functions. Move to neutron-lib
if it will be widely used corss other projects.
Related-Bug: #1835663
Change-Id: Ie7743db097fd45df432af341470336d6a5662c6f
Change [1] removed the deprecated option external_network_bridge. Per
commit message in change [2], "l3 agent can handle any networks by
setting the neutron parameter external_network_bridge and
gateway_external_network_id to empty". So the consequence of [1] was to
introduce a regression whereby multiple external networks are not
supported by the L3 agent anymore.
This change proposes a new simplified rule. If
gateway_external_network_id is defined, that is the network that the L3
agent will use. If not and multiple external networks exist, the L3
agent will handle any of them.
[1] https://review.opendev.org/#/c/567369/
[2] https://review.opendev.org/#/c/59359
Change-Id: Idd766bd069eda85ab6876a78b8b050ee5ab66cf6
Closes-Bug: #1824571
RPC notifier method can sometimes be time-consuming,
this will cause other parallel processing resources
fail to send notifications in time. This patch changes
the notify to asynchronous.
Closes-Bug: #1824911
Change-Id: I3f555a0c78fbc02d8214f12b62c37d140bc71da1
Currently, most implementations override the L3NatAgent class itself
for their own logic since there is no proper interface to extend
RouterInfo class. This adds unnecessary complexity for developers
who just want to extend router mechanism instead of whole RPC.
Add a RouterFactory class that developer can registers RouterInfo class
and delegate it for RouterInfo creation. Seperate functions and variables
which currently used externally to abstract class from RouterInfo, so that
extension can use the basic interface.
Provide the router registration function to the l3 extension API so that
extension can extend RouterInfo itself which correspond to each features
(ha, distribtued, ha + distributed)
Depends-On: https://review.openstack.org/#/c/620348/
Closes-Bug: #1804634
Partially-Implements: blueprint openflow-based-dvr
Change-Id: I1eff726900a8e67596814ca9a5f392938f154d7b
Add a unique id for resource update, then we can calculate
the resource processing time and track it.
Related-Bug: #1825152
Related-Bug: #1824911
Related-Bug: #1821912
Related-Bug: #1813787
Change-Id: Ib4d197c6c180c32860964440882393794aabb6ef
All of the externally consumed variables from neutron.common.constants
now live in neutron-lib. This patch removes neutron.common.constants
and switches all uses over to lib.
NeutronLibImpact
Depends-On: https://review.openstack.org/#/c/647836/
Change-Id: I3c2f28ecd18996a1cee1ae3af399166defe9da87
This option is deprecated and marked to be deleted in Ocata. So
as we are now in Stein development cycle I think that it's good time
to remove it.
Change-Id: I07474713206c218710544ad98c08caaa37dbf53a
Removing an active or a standby HA router from an agent that has a
valid DVR serviceable port (such as DHCP), does not remove the
HA interface associated with the Router in the SNAT namespace.
When we try to add the HA router back to the agent, then it
adds more than one HA interface to the SNAT Namespace causing
more problems and we sometimes also see multiple active routers.
This bug might have been introduced by this patch [1].
Fix the problem by just adding the router namespaces without HA
interfaces when there is no HA and re-insert the HA interfaces
when HA router is bound to the agent into the namespace.
[1] https://review.openstack.org/#/c/522362/
Closes-Bug: #1816698
Change-Id: Ie625abcb73f8185bb2bee06dcd26a01d8af0b0d1
I noticed in the functional logs that the l3-agent is constantly
logging this message, even when just adding or removing a single
router:
Resizing router processing queue green pool size to: 8
It's misleading as the pool is not being resized, it's still 8,
so let's only log when we're actually changing the pool size.
Change-Id: I5dc42fa4b4c1964b7d027681b61550cd82e83234
If l3-agent was restarted by a regular action, such as config change,
package upgrade, manually service restart etc. We should not set the
HA port down during such scenarios. Unless the physical host was
rebooted, aka the VRRP processes were all terminated.
This patch adds a new RPC call during l3 agent init, it will try to
retrieve the HA router count first. And then compare the VRRP process
(keepalived) count and 'neutron-keepalived-state-change' count
with the hosting router count. If the count matches, then that
set HA port to 'DOWN' state action will not be triggered anymore.
Closes-Bug: #1798475
Change-Id: I5e2bb64df0aaab11a640a798963372c8d91a06a8
There is a race condition between nova-compute boots instance and
l3-agent processes DVR (local) router in compute node. This issue
can be seen when a large number of instances were booted to one
same host, and instances are under different DVR router. So the
l3-agent will concurrently process all these dvr routers in this
host at the same time.
For now we have a green pool for the router ResourceProcessingQueue
with 8 greenlet, but some of these routers can still be waiting, event
worse thing is that there are time-consuming actions during the router
processing procedure. For instance, installing arp entries, iptables
rules, route rules etc.
So when the VM is up, it will try to get meta via the local proxy
hosting by the dvr router. But the router is not ready yet in that
host. And finally those instances will not be able to setup some
config in the guest OS.
This patch adds a new measurement based on the router quantity to
indicate the L3 router process queue green pool size. The pool size
will be limit from 8 (original value) to 32, because we do not want
the L3 agent cost too much host resource on processing router in the
compute node.
Related-Bug: #1813787
Change-Id: I62393864a103d666d5d9d379073f5fc23ac7d114
The neutron.common.rpc module has been in neutron-lib for awhile now and
neutron is shimmed to use neutron-lib already.
This patch removes neutron.common.rpc and switches the code over to use
neutron-lib's implementation where needed.
NeutronLibImpact
Change-Id: I733f07a8c4a2af071b3467bd710290eee11a4f4c
In case of an l3 agent sync it is important to understand when
a router is processing an update to identify when it applies
changes that can cause failovers.
Change-Id: Ie9ba2a8ffebfcc3bfb35f7a48f73a25352309b4e
Today the neutron common exceptions already live in neutron-lib and are
shimmed from neutron. This patch removes the neutron.common.exceptions
module and changes neutron's imports over to use their respective
neutron-lib exception module instead.
NeutronLibImpact
Change-Id: I9704f20eb21da85d2cf024d83338b3d94593671e
If the L3 agent fails to send its report_state
to the server, it logs an exception:
Failed reporting state!: MessagingTimeout: Timed out...
If it then tries a second time and succeeds it just
goes on happily. It would be nice if it logged that
it had success on the subsequent attempt so someone
looking at the logs know it recovered.
Change-Id: I9019782588caffcb647cf1fd557f76ce89cea254
The L3AgentExtension class delete_router() method expects a
dict as it's 'data' argument, but the l3-agent code that
deletes a router was passing just the router ID. Change to
correctly pass a router dictionary if one exists.
Change-Id: I112d1f8dce9defddfbd8fbfa75bf538e308e1561
Closes-bug: #1809134
In case when 2 dvr routers are connected to each other with
tenant network, those routers needs to be always deployed
on same compute nodes.
So this patch changes dvr routers scheduler that it will create
dvr router on each host on which there are vms or other dvr routers
connected to same subnets.
Co-Authored-By: Swaminathan Vasudevan <SVasudevan@suse.com>
Closes-Bug: #1786272
Change-Id: I579c2522f8aed2b4388afacba34d9ffdc26708e3
This patch switches callbacks over to the payload object style events
[1] for ROUTER and ROUTER_GATEWAY BEFORE_DELETE based notifications. To
do so a DBEventPayload object is used with the publish() method to pass
along the related data.
NeutronLibImpact
[1] https://docs.openstack.org/neutron-lib/latest/contributor/callbacks.html#event-payloads
Change-Id: I3ce4475643f4f0afed01f2e9956b3bf84714e6f2
Moved the router processing queue code to the agent/common
directory and renamed it "resource processing queue". This
way it can be consumed by other agents, or possibly even
moved to neutron-lib in the future.
Change-Id: I735cf5b0a915828c420c3316b78a48f6d54035e6