L3 RPC loop could delete a router on concurrent update

routers_updated does not acquire any lock; it just updates
a set for later processing by the RPC loop.

self.updated_routers can be changed by a concurrent update
notification. If that change happens around the time of the
self.plugin_rpc.get_routers call, the newly added routers are
mistakenly treated as admin_state_up=false routers, which
are safe to delete.
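
The race can be reproduced deterministically in a small simulation. The sketch below is not the agent code: FakeAgent, fake_get_routers and the router IDs are hypothetical stand-ins, and the concurrent notification is simulated by calling routers_updated from inside the fake RPC call, which is roughly where a context switch could occur.

```python
# Hypothetical stand-in for the agent; only the set handling mirrors the
# pre-fix logic. Nothing here is the real Neutron API.
class FakeAgent:
    def __init__(self):
        self.updated_routers = set()
        self.removed_routers = set()

    def routers_updated(self, router_ids):
        # No lock is taken: the notification just records the IDs.
        self.updated_routers.update(router_ids)

    def fake_get_routers(self, router_ids):
        # Simulate a notification arriving while the RPC call is in flight.
        self.routers_updated(["r2"])
        # Only the routers that were asked for are returned.
        return [{"id": rid} for rid in router_ids]

    def rpc_loop_before_fix(self):
        if self.updated_routers:
            router_ids = list(self.updated_routers)
            routers = self.fake_get_routers(router_ids)
            fetched = set(r["id"] for r in routers)
            # self.updated_routers now also holds "r2", which was never
            # requested, so it is wrongly scheduled for deletion.
            self.removed_routers.update(self.updated_routers - fetched)
            self.updated_routers.clear()


agent = FakeAgent()
agent.routers_updated(["r1"])
agent.rpc_loop_before_fix()
print(sorted(agent.removed_routers))  # ['r2']
```

In this simulation "r2" ends up in removed_routers even though it was only updated, and its update notification is lost when the set is cleared.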

Create a local copy of updated_routers and preserve
any fresh updated_routers entries for the next _rpc_loop
pass.
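
A minimal sketch of the snapshot-and-clear approach, assuming a simplified model of the agent: FakeAgent, fake_get_routers, the visible set and the router IDs are all hypothetical, and the late notification is injected from inside the fake RPC call.

```python
# Hypothetical model of the agent's bookkeeping; not the real Neutron API.
class FakeAgent:
    def __init__(self, visible):
        self.updated_routers = set()
        self.removed_routers = set()
        # Routers the fake plugin will return (admin_state_up=true).
        self.visible = set(visible)

    def routers_updated(self, router_ids):
        self.updated_routers.update(router_ids)

    def fake_get_routers(self, router_ids):
        # A notification for "r2" arrives while the call is in flight.
        self.routers_updated(["r2"])
        return [{"id": rid} for rid in router_ids if rid in self.visible]

    def rpc_loop_after_fix(self):
        if self.updated_routers:
            # Capture the pending IDs, then clear the shared set right away
            # so later notifications accumulate for the next pass.
            updated_routers = set(self.updated_routers)
            self.updated_routers.clear()
            routers = self.fake_get_routers(list(updated_routers))
            fetched = set(r["id"] for r in routers)
            # Compare against the snapshot, not the shared set.
            self.removed_routers.update(updated_routers - fetched)


agent = FakeAgent(visible=["r1", "r2"])
agent.routers_updated(["r1", "r3"])   # "r3" is not visible to the plugin
agent.rpc_loop_after_fix()
print(sorted(agent.removed_routers))  # ['r3']
print(sorted(agent.updated_routers))  # ['r2']
```

Only "r3", which was requested but not returned, is scheduled for deletion; the late "r2" update stays in updated_routers for the next pass.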

Change-Id: Icc7377f9c29e248c3b34562465e859b15ecc2ec3
Closes-Bug: #1315467
Partial-Bug: #1253896
Attila Fazekas 2014-05-04 19:54:37 +02:00
parent e3d0c2b811
commit 45381fe1c7
1 changed files with 11 additions and 5 deletions

@@ -793,18 +793,24 @@ class L3NATAgent(firewall_l3_agent.FWaaSL3AgentRpcCallback, manager.Manager):
         # _rpc_loop and _sync_routers_task will not be
         # executed in the same time because of lock.
         # so we can clear the value of updated_routers
-        # and removed_routers
+        # and removed_routers, but they can be updated by
+        # updated_routers and removed_routers rpc call
         try:
             LOG.debug(_("Starting RPC loop for %d updated routers"),
                       len(self.updated_routers))
             if self.updated_routers:
-                router_ids = list(self.updated_routers)
+                # We're capturing and clearing the list, and will
+                # process the "captured" updates in this loop,
+                # and any updates that happen due to a context switch
+                # will be picked up on the next pass.
+                updated_routers = set(self.updated_routers)
+                self.updated_routers.clear()
+                router_ids = list(updated_routers)
                 routers = self.plugin_rpc.get_routers(
                     self.context, router_ids)
                 # routers with admin_state_up=false will not be in the fetched
                 fetched = set([r['id'] for r in routers])
-                self.removed_routers.update(self.updated_routers - fetched)
-                self.updated_routers.clear()
+                self.removed_routers.update(updated_routers - fetched)
                 self._process_routers(routers)
             self._process_router_delete()