Revert resize: wait for events according to hybrid plug

Since 4817165fc5, when reverting a
resized instance back to the source host, the libvirt driver waits for
vif-plugged events when spawning the instance. When called from
finish_revert_resize() in the source compute manager, libvirt's
finish_revert_migration() does not pass vifs_already_plugged to
_create_domain_and_network(), making the latter use the default False
value.

When the source compute manager calls
network_api.migrate_instance_finish() in finish_revert_resize(), this
updates the port binding back to the source host. If Neutron is
configured to use OVS hybrid plug, it will send the vif-plugged event
immediately after completing this request. This happens before the
virt driver's finish_revert_migration() method is called. This causes
the wait in the libvirt driver to time out because the event is
received before Nova starts waiting for it.

The Neutron OVS L2 agent sends vif-plugged events when two conditions
are met: first, the port must be bound to the host managed by the
L2 agent; second, the agent must have finished configuring the port on
OVS. That configuration involves assigning the port a local VLAN for
tenant isolation, applying security group rules if required, and
applying QoS policies or other agent extensions like service function
chaining.

During the boot process, we first bind the port to the host and then
plug the interface into OVS, which triggers the L2 agent to configure
it and emit the vif-plugged event. In the revert case, as noted above,
the interface is already plugged into OVS on the source node when
hybrid plug is used, so updating the port binding back to the source
node fulfils the second condition and the vif-plugged event is sent
immediately.

Events sent immediately after the port binding update are hereafter
known as "bind-time" events. For ports that do not use OVS hybrid plug,
Neutron will continue to send vif-plugged events only when Nova
actually plugs the VIF. These types of events are hereafter known as
"plug-time" events. OVS hybrid plug is a per-agent setting, so for
a particular host, bind-time events are all-or-nothing for the
ovs backend: either all VIF_TYPE=ovs ports have them, or no ovs ports
have them. In general, a host will only have one network backend.
The only exception to this is SR-IOV, which is commonly deployed on
the same host as other network backends such as OVS or linuxbridge.
SR-IOV ports with VNIC_TYPE=direct-physical will always have only
bind-time events. If an instance mixes OVS ports that have
hybrid-plug=False with direct-physical ports, it will have both kinds
of events.
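
As a rough illustration (not part of the change itself), the rule this
patch applies to OVS ports boils down to the following sketch; the
standalone classify_event_timing() helper is hypothetical, while the
real check lives in the VIF model below and only looks at the
ovs_hybrid_plug detail:

    # Illustrative sketch only: mirrors the check added to
    # nova.network.model, where a VIF with 'ovs_hybrid_plug': True in
    # its binding details emits its vif-plugged event at bind time,
    # and everything else at plug time.
    def classify_event_timing(vif_details):
        """Return 'bind-time' or 'plug-time' for a port's details."""
        if vif_details.get('ovs_hybrid_plug', False):
            return 'bind-time'
        return 'plug-time'

    print(classify_event_timing({'ovs_hybrid_plug': True}))   # bind-time
    print(classify_event_timing({'ovs_hybrid_plug': False}))  # plug-time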

This patch adds methods to the NetworkInfo model that report which
kind of event each VIF generates. The resize-revert logic then uses
them to decide where to wait for external events: in the compute
manager, after updating the port binding, for bind-time events, and/or
in the libvirt driver, when plugging the VIFs, for plug-time events.
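
For illustration, a short self-contained sketch of the new helpers in
action (the VIF ids are made-up; it assumes a Nova tree with this
change applied so that nova.network.model has the new methods):

    from nova.network import model

    network_info = model.NetworkInfo([
        model.VIF(id='hybrid-port', details={'ovs_hybrid_plug': True}),
        model.VIF(id='normal-port', details={'ovs_hybrid_plug': False}),
    ])

    # Waited for by the compute manager around migrate_instance_finish():
    print(network_info.get_bind_time_events())
    # [('network-vif-plugged', 'hybrid-port')]

    # Passed by the libvirt driver's finish_revert_migration() to
    # _create_domain_and_network() as external_events:
    print(network_info.get_plug_time_events())
    # [('network-vif-plugged', 'normal-port')]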

Closes-bug: 1832028
Co-Authored-By: Sean Mooney <work@seanmooney.info>
Change-Id: I51cdcae67be8c68a55bc939de4ea0aba2361dcc4
Artom Lifshitz 2019-03-20 10:38:12 -04:00 committed by Sean Mooney
parent a628d2f09a
commit 19f9b37721
7 changed files with 240 additions and 34 deletions

nova/compute/manager.py

@@ -4164,6 +4164,50 @@ class ComputeManager(manager.Manager):
        self.compute_rpcapi.finish_revert_resize(context, instance,
                migration, migration.source_compute)

    def _finish_revert_resize_network_migrate_finish(self, context, instance,
                                                     migration):
        """Causes port binding to be updated. In some Neutron or port
        configurations - see NetworkModel.get_bind_time_events() - we
        expect the vif-plugged event from Neutron immediately and wait for it.
        The rest of the time, the event is expected further along in the
        virt driver, so we don't wait here.

        :param context: The request context.
        :param instance: The instance undergoing the revert resize.
        :param migration: The Migration object of the resize being reverted.
        :raises: eventlet.timeout.Timeout or
                 exception.VirtualInterfacePlugException.
        """
        network_info = instance.get_network_info()
        events = []
        deadline = CONF.vif_plugging_timeout
        if deadline and utils.is_neutron() and network_info:
            events = network_info.get_bind_time_events()
            if events:
                LOG.debug('Will wait for bind-time events: %s', events,
                          instance=instance)
        error_cb = self._neutron_failed_migration_callback
        try:
            with self.virtapi.wait_for_instance_event(instance, events,
                                                      deadline=deadline,
                                                      error_callback=error_cb):
                # NOTE(hanrong): we need to change migration.dest_compute to
                # source host temporarily.
                # "network_api.migrate_instance_finish" will setup the network
                # for the instance on the destination host. For revert resize,
                # the instance will back to the source host, the setup of the
                # network for instance should be on the source host. So set the
                # migration.dest_compute to source host at here.
                with utils.temporary_mutation(
                        migration, dest_compute=migration.source_compute):
                    self.network_api.migrate_instance_finish(context,
                                                             instance,
                                                             migration)
        except eventlet.timeout.Timeout:
            with excutils.save_and_reraise_exception():
                LOG.error('Timeout waiting for Neutron events: %s', events,
                          instance=instance)

    @wrap_exception()
    @reverts_task_state
    @wrap_instance_event(prefix='compute')
@@ -4211,17 +4255,8 @@ class ComputeManager(manager.Manager):
        self.network_api.setup_networks_on_host(context, instance,
                                                migration.source_compute)

        # NOTE(hanrong): we need to change migration.dest_compute to
        # source host temporarily. "network_api.migrate_instance_finish"
        # will setup the network for the instance on the destination host.
        # For revert resize, the instance will back to the source host, the
        # setup of the network for instance should be on the source host.
        # So set the migration.dest_compute to source host at here.
        with utils.temporary_mutation(
                migration, dest_compute=migration.source_compute):
            self.network_api.migrate_instance_finish(context,
                                                     instance,
                                                     migration)
        self._finish_revert_resize_network_migrate_finish(
            context, instance, migration)

        network_info = self.network_api.get_instance_nw_info(context,
                                                              instance)
@@ -6439,8 +6474,8 @@ class ComputeManager(manager.Manager):
        return migrate_data

    @staticmethod
    def _neutron_failed_live_migration_callback(event_name, instance):
        msg = ('Neutron reported failure during live migration '
    def _neutron_failed_migration_callback(event_name, instance):
        msg = ('Neutron reported failure during migration '
               'with %(event)s for instance %(uuid)s')
        msg_args = {'event': event_name, 'uuid': instance.uuid}
        if CONF.vif_plugging_is_fatal:

@@ -6518,7 +6553,7 @@ class ComputeManager(manager.Manager):
        disk = None

        deadline = CONF.vif_plugging_timeout
        error_cb = self._neutron_failed_live_migration_callback
        error_cb = self._neutron_failed_migration_callback

        # In order to avoid a race with the vif plugging that the virt
        # driver does on the destination host, we register our events
        # to wait for before calling pre_live_migration. Then if the

nova/network/model.py

@@ -458,6 +458,16 @@ class VIF(Model):
                    'ips': ips}
        return []

    def has_bind_time_event(self):
        """When updating the port binding to a host that already has the
        instance in a shutoff state (in practice, this currently means
        reverting a resize or cold migration), the following Neutron/port
        configurations cause network-vif-plugged events to be sent as soon as
        the binding is updated:

        - OVS with hybrid plug
        """
        return self.is_hybrid_plug_enabled()

    def is_hybrid_plug_enabled(self):
        return self['details'].get(VIF_DETAILS_OVS_HYBRID_PLUG, False)

@@ -515,6 +525,24 @@ class NetworkInfo(list):
    def json(self):
        return jsonutils.dumps(self)

    def get_bind_time_events(self):
        """When updating the port binding to a host that already has the
        instance in a shutoff state (in practice, this currently means
        reverting a resize or cold migration), return external events that are
        sent as soon as the binding is updated.
        """
        return [('network-vif-plugged', vif['id'])
                for vif in self if vif.has_bind_time_event()]

    def get_plug_time_events(self):
        """When updating the port binding to a host that already has the
        instance in a shutoff state (in practice, this currently means
        reverting a resize or cold migration), return external events that are
        sent when the VIF is plugged.
        """
        return [('network-vif-plugged', vif['id'])
                for vif in self if not vif.has_bind_time_event()]


class NetworkInfoAsyncWrapper(NetworkInfo):
    """Wrapper around NetworkInfo that allows retrieving NetworkInfo

nova/tests/unit/compute/test_compute.py

@@ -5891,7 +5891,9 @@ class ComputeTestCase(BaseTestCase,
            old_vm_state = vm_states.ACTIVE
        else:
            old_vm_state = vm_states.STOPPED
        params = {'vm_state': old_vm_state}
        params = {'vm_state': old_vm_state,
                  'info_cache': objects.InstanceInfoCache(
                      network_info=network_model.NetworkInfo([]))}
        instance = self._create_fake_instance_obj(params)

        self.stub_out('nova.virt.fake.FakeDriver.finish_migration', fake)

@@ -6041,7 +6043,9 @@ class ComputeTestCase(BaseTestCase,
        def fake(*args, **kwargs):
            pass

        instance = self._create_fake_instance_obj()
        params = {'info_cache': objects.InstanceInfoCache(
            network_info=network_model.NetworkInfo([]))}
        instance = self._create_fake_instance_obj(params)

        self.stub_out('nova.virt.fake.FakeDriver.finish_migration', fake)
        self.stub_out('nova.virt.fake.FakeDriver.finish_revert_migration',

nova/tests/unit/compute/test_compute_mgr.py

@@ -5054,6 +5054,81 @@ class ComputeManagerUnitTestCase(test.NoDBTestCase,
                self.context, fake_instance, fake_bdm)
        block_stats.assert_called_once_with(fake_instance, 'vda')

    def _test_finish_revert_resize_network_migrate_finish(self, vifs, events):
        instance = fake_instance.fake_instance_obj(self.context)
        instance.info_cache = objects.InstanceInfoCache(
            network_info=network_model.NetworkInfo(vifs))
        migration = objects.Migration(
            source_compute='fake-source',
            dest_compute='fake-dest')

        def fake_migrate_instance_finish(context, instance, migration):
            self.assertEqual(migration.source_compute, 'fake-source')
            # NOTE(artom) This looks weird, but it's checking that the
            # temporaty_mutation() context manager did its job.
            self.assertEqual(migration.dest_compute, 'fake-source')

        with test.nested(
                mock.patch.object(self.compute.virtapi,
                                  'wait_for_instance_event'),
                mock.patch.object(self.compute.network_api,
                                  'migrate_instance_finish',
                                  side_effect=fake_migrate_instance_finish)
        ) as (mock_wait, mock_migrate_instance_finish):
            self.compute._finish_revert_resize_network_migrate_finish(
                self.context, instance, migration)
            mock_wait.assert_called_once_with(
                instance, events, deadline=CONF.vif_plugging_timeout,
                error_callback=self.compute._neutron_failed_migration_callback)
            mock_migrate_instance_finish.assert_called_once_with(
                self.context, instance, migration)

    def test_finish_revert_resize_network_migrate_finish_wait(self):
        """Test that we wait for bind-time events if we have a hybrid-plugged
        VIF.
        """
        self._test_finish_revert_resize_network_migrate_finish(
            [network_model.VIF(id=uuids.hybrid_vif,
                               details={'ovs_hybrid_plug': True}),
             network_model.VIF(id=uuids.normal_vif,
                               details={'ovs_hybrid_plug': False})],
            [('network-vif-plugged', uuids.hybrid_vif)])

    def test_finish_revert_resize_network_migrate_finish_dont_wait(self):
        """Test that we're not waiting for any events if we don't have any
        hybrid-plugged VIFs.
        """
        self._test_finish_revert_resize_network_migrate_finish(
            [network_model.VIF(id=uuids.hybrid_vif,
                               details={'ovs_hybrid_plug': False}),
             network_model.VIF(id=uuids.normal_vif,
                               details={'ovs_hybrid_plug': False})],
            [])

    def test_finish_revert_resize_network_migrate_finish_no_vif_timeout(self):
        """Test that we're not waiting for any events if vif_plugging_timeout
        is 0.
        """
        self.flags(vif_plugging_timeout=0)
        self._test_finish_revert_resize_network_migrate_finish(
            [network_model.VIF(id=uuids.hybrid_vif,
                               details={'ovs_hybrid_plug': True}),
             network_model.VIF(id=uuids.normal_vif,
                               details={'ovs_hybrid_plug': True})],
            [])

    @mock.patch.object(utils, 'is_neutron', return_value=False)
    def test_finish_revert_resize_network_migrate_finish_not_neutron(self, _):
        """Test that we're not waiting for any events if we're not using
        Neutron.
        """
        self._test_finish_revert_resize_network_migrate_finish(
            [network_model.VIF(id=uuids.hybrid_vif,
                               details={'ovs_hybrid_plug': True}),
             network_model.VIF(id=uuids.normal_vif,
                               details={'ovs_hybrid_plug': True})],
            [])


class ComputeManagerBuildInstanceTestCase(test.NoDBTestCase):
    def setUp(self):

nova/tests/unit/network/test_network_info.py

@@ -15,6 +15,7 @@
#    under the License.

from oslo_config import cfg
from oslo_utils.fixture import uuidsentinel as uuids

from nova import exception
from nova.network import model

@@ -857,6 +858,21 @@ iface eth1 inet static
                                                  libvirt_virt_type='lxc')
        self.assertEqual(expected, template)

    def test_get_events(self):
        network_info = model.NetworkInfo([
            model.VIF(
                id=uuids.hybrid_vif,
                details={'ovs_hybrid_plug': True}),
            model.VIF(
                id=uuids.normal_vif,
                details={'ovs_hybrid_plug': False})])
        self.assertEqual(
            [('network-vif-plugged', uuids.hybrid_vif)],
            network_info.get_bind_time_events())
        self.assertEqual(
            [('network-vif-plugged', uuids.normal_vif)],
            network_info.get_plug_time_events())


class TestNetworkMetadata(test.NoDBTestCase):
    def setUp(self):

nova/tests/unit/virt/libvirt/test_driver.py

@@ -17203,8 +17203,9 @@ class LibvirtConnTestCase(test.NoDBTestCase,
            self.assertEqual(0, domain.resume.call_count)

    def _test_create_with_network_events(self, neutron_failure=None,
                                         power_on=True):
                                         power_on=True, events=None):
        generated_events = []
        events_passed_to_prepare = []

        def wait_timeout():
            event = mock.MagicMock()

@@ -17222,6 +17223,7 @@ class LibvirtConnTestCase(test.NoDBTestCase,
            m.event_name = '%s-%s' % (name, tag)
            m.wait.side_effect = wait_timeout
            generated_events.append(m)
            events_passed_to_prepare.append((name, tag))
            return m

        virtapi = manager.ComputeVirtAPI(mock.MagicMock())

@@ -17241,7 +17243,8 @@ class LibvirtConnTestCase(test.NoDBTestCase,
        def test_create(cleanup, create, fw_driver, plug_vifs):
            domain = drvr._create_domain_and_network(self.context, 'xml',
                                                     instance, vifs,
                                                     power_on=power_on)
                                                     power_on=power_on,
                                                     external_events=events)
            plug_vifs.assert_called_with(instance, vifs)

            pause = self._get_pause_flag(drvr, vifs, power_on=power_on)

@@ -17256,7 +17259,9 @@ class LibvirtConnTestCase(test.NoDBTestCase,
        test_create()

        if utils.is_neutron() and CONF.vif_plugging_timeout and power_on:
        if events and utils.is_neutron() and CONF.vif_plugging_timeout:
            self.assertEqual(events_passed_to_prepare, events)
        elif utils.is_neutron() and CONF.vif_plugging_timeout and power_on:
            prepare.assert_has_calls([
                mock.call(instance, 'network-vif-plugged', uuids.vif_1),
                mock.call(instance, 'network-vif-plugged', uuids.vif_2)])
@@ -17269,6 +17274,22 @@ class LibvirtConnTestCase(test.NoDBTestCase,
        else:
            self.assertEqual(0, prepare.call_count)

    @mock.patch('nova.utils.is_neutron', new=mock.Mock(return_value=True))
    def test_create_with_network_events_passed_in(self):
        self._test_create_with_network_events(
            events=[('network-vif-plugged', uuids.fake_vif)])

    @mock.patch('nova.utils.is_neutron', new=mock.Mock(return_value=False))
    def test_create_with_network_events_passed_in_nova_net(self):
        self._test_create_with_network_events(
            events=[('network-vif-plugged', uuids.fake_vif)])

    @mock.patch('nova.utils.is_neutron', new=mock.Mock(return_value=True))
    def test_create_with_network_events_passed_in_0_timeout(self):
        self.flags(vif_plugging_timeout=0)
        self._test_create_with_network_events(
            events=[('network-vif-plugged', uuids.fake_vif)])

    @mock.patch('nova.utils.is_neutron', return_value=True)
    def test_create_with_network_events_neutron(self, is_neutron):
        self._test_create_with_network_events()
@@ -19558,10 +19579,13 @@ class LibvirtDriverTestCase(test.NoDBTestCase, TraitsComparisonMixin):
        def fake_create_domain(_self, context, xml, instance, network_info,
                               block_device_info=None, power_on=None,
                               vifs_already_plugged=None):
                               vifs_already_plugged=None,
                               external_events=None):
            self.fake_create_domain_called = True
            self.assertEqual(powered_on, power_on)
            self.assertFalse(vifs_already_plugged)
            self.assertEqual(self.events_passed_to_fake_create,
                             external_events)
            return mock.MagicMock()

        def fake_enable_hairpin(self):

@@ -19605,9 +19629,15 @@ class LibvirtDriverTestCase(test.NoDBTestCase, TraitsComparisonMixin):
        f = open(libvirt_xml_path, 'w')
        f.close()

        network_info = network_model.NetworkInfo(
            [network_model.VIF(id=uuids.normal_plug_time_vif),
             network_model.VIF(id=uuids.hybrid_bind_time_vif,
                               details={'ovs_hybrid_plug': True})])
        self.events_passed_to_fake_create = [
            ('network-vif-plugged', uuids.normal_plug_time_vif)]

        self.drvr.finish_revert_migration(
                                  context.get_admin_context(), ins_ref,
                                  [], None, power_on)
                              context.get_admin_context(), ins_ref,
                              network_info, None, power_on)
        self.assertTrue(self.fake_create_domain_called)

    def test_finish_revert_migration_power_on(self):
@@ -19639,7 +19669,8 @@ class LibvirtDriverTestCase(test.NoDBTestCase, TraitsComparisonMixin):
            if del_inst_failed:
                mock_rmtree.side_effect = OSError(errno.ENOENT,
                                                  'test exception')
            drvr.finish_revert_migration(context, ins_ref, [])
            drvr.finish_revert_migration(context, ins_ref,
                                         network_model.NetworkInfo())
            if backup_made:
                mock_rename.assert_called_once_with('/fake/foo_resize',
                                                    '/fake/foo')

@@ -19678,7 +19709,9 @@ class LibvirtDriverTestCase(test.NoDBTestCase, TraitsComparisonMixin):
                                  return_value=image_meta),
                mock.patch.object(drvr, '_get_guest_xml',
                                  side_effect=fake_get_guest_xml)):
            drvr.finish_revert_migration('', instance, None, power_on=False)
            drvr.finish_revert_migration('', instance,
                                         network_model.NetworkInfo(),
                                         power_on=False)

    def test_finish_revert_migration_snap_backend(self):
        drvr = libvirt_driver.LibvirtDriver(fake.FakeVirtAPI(), False)

@@ -19692,7 +19725,9 @@ class LibvirtDriverTestCase(test.NoDBTestCase, TraitsComparisonMixin):
                mock.patch.object(drvr, '_get_guest_xml')) as (
                mock_image, mock_cdn, mock_ggx):
            mock_image.return_value = {'disk_format': 'raw'}
            drvr.finish_revert_migration('', ins_ref, None, power_on=False)
            drvr.finish_revert_migration('', ins_ref,
                                         network_model.NetworkInfo(),
                                         power_on=False)

            drvr.image_backend.rollback_to_snap.assert_called_once_with(
                libvirt_utils.RESIZE_SNAPSHOT_NAME)

@@ -19732,7 +19767,9 @@ class LibvirtDriverTestCase(test.NoDBTestCase, TraitsComparisonMixin):
                mock.patch.object(drvr, '_get_guest_xml')) as (
                mock_rbd, mock_image, mock_cdn, mock_ggx):
            mock_image.return_value = {'disk_format': 'raw'}
            drvr.finish_revert_migration('', ins_ref, None, power_on=False)
            drvr.finish_revert_migration('', ins_ref,
                                         network_model.NetworkInfo(),
                                         power_on=False)
            self.assertFalse(drvr.image_backend.rollback_to_snap.called)
            self.assertFalse(drvr.image_backend.remove_snap.called)

nova/virt/libvirt/driver.py

@@ -5727,14 +5727,15 @@ class LibvirtDriver(driver.ComputeDriver):
                                   block_device_info=None, power_on=True,
                                   vifs_already_plugged=False,
                                   post_xml_callback=None,
                                   destroy_disks_on_failure=False):
                                   destroy_disks_on_failure=False,
                                   external_events=None):
        """Do required network setup and create domain."""
        timeout = CONF.vif_plugging_timeout
        if (self._conn_supports_start_paused and
            utils.is_neutron() and not
            vifs_already_plugged and power_on and timeout):
            events = self._get_neutron_events(network_info)
        if (self._conn_supports_start_paused and utils.is_neutron()
                and not vifs_already_plugged and power_on and timeout):
            events = (external_events if external_events
                      else self._get_neutron_events(network_info))
        else:
            events = []
@@ -9129,9 +9130,19 @@ class LibvirtDriver(driver.ComputeDriver):
        xml = self._get_guest_xml(context, instance, network_info, disk_info,
                                  instance.image_meta,
                                  block_device_info=block_device_info)
        self._create_domain_and_network(context, xml, instance, network_info,
                                        block_device_info=block_device_info,
                                        power_on=power_on)
        # NOTE(artom) In some Neutron or port configurations we've already
        # waited for vif-plugged events in the compute manager's
        # _finish_revert_resize_network_migrate_finish(), right after updating
        # the port binding. For any ports not covered by those "bind-time"
        # events, we wait for "plug-time" events here.
        events = network_info.get_plug_time_events()
        if events:
            LOG.debug('Instance is using plug-time events: %s', events,
                      instance=instance)
        self._create_domain_and_network(
            context, xml, instance, network_info,
            block_device_info=block_device_info, power_on=power_on,
            external_events=events)

        if power_on:
            timer = loopingcall.FixedIntervalLoopingCall(