Fix AttributeError in RT._update_usage_from_migration

Change Ieb539c9a0cfbac743c579a1633234537a8e3e3ee in Stein
added some logging in _update_usage_from_migration to log
the flavor for an inbound and outbound migration.

If an instance is resized and then the resize is immediately
confirmed, it's possible to race with ComputeManager._confirm_resize
setting the instance.old_flavor to None before the migration
status is changed to "confirmed" while the update_available_resource
periodic is running which will result in _update_usage_from_migration
hitting an AttributeError when trying to log instance.old_flavor.flavorid
since instance.old_flavor is None.

There are a few key points there:

- We get into _update_usage_from_migration because the
  _update_available_resource method gets in-progress migrations
  related to the host (in this case the source compute) and the
  migration is consider in-progress until its status is "confirmed".

- The instance is not in the tracked_instances dict when
  _update_usage_from_migration runs because RT only tracks instances
  where the instance.host matches the RT.host and in this case the
  instance has been resized to another compute and the instance.host
  is pointing at the dest compute.

The fix here is to simply check if we got the instance.old_flavor and
not log the message if we do not have it, which gets us back to the old
behavior.

This bug was found by noticing it in CI job logs - there is a link to
hits in logstash in the bug report.

As for the "incoming and not tracked" case in _update_usage_from_migration
I have not modified that since I am not sure we have the same race nor
have I seen it in CI logs.

Change-Id: I43e34b3ff1424d42632a3e8f842c93508905aa1a
Closes-Bug: #1834349
This commit is contained in:
Matt Riedemann 2019-06-26 13:26:44 -04:00
parent 707deb1589
commit 818419c9d3
2 changed files with 31 additions and 2 deletions

View File

@ -1114,8 +1114,14 @@ class ResourceTracker(object):
itype = self._get_instance_type(instance, 'old_', migration)
numa_topology = self._get_migration_context_resource(
'numa_topology', instance, prefix='old_')
LOG.debug('Starting to track outgoing migration %s with flavor %s',
migration.uuid, itype.flavorid, instance=instance)
# We could be racing with confirm_resize setting the
# instance.old_flavor field to None before the migration status
# is "confirmed" so if we did not find the flavor in the outgoing
# resized instance we won't track it.
if itype:
LOG.debug('Starting to track outgoing migration %s with '
'flavor %s', migration.uuid, itype.flavorid,
instance=instance)
if itype:
cn = self.compute_nodes[nodename]

View File

@ -2761,6 +2761,29 @@ class TestUpdateUsageFromMigration(test.NoDBTestCase):
_NODENAME)
self.assertFalse(get_mock.called)
def test_missing_old_flavor_outbound_resize(self):
"""Tests the case that an instance is not being tracked on the source
host because it has been resized to a dest host. The confirm_resize
operation in ComputeManager sets instance.old_flavor to None before
the migration.status is changed to "confirmed" so the source compute
RT considers it an in-progress migration and tries to update tracked
usage from the instance.old_flavor (which is None when
_update_usage_from_migration runs). This test just makes sure that the
RT method gracefully handles the instance.old_flavor being gone.
"""
migration = _MIGRATION_FIXTURES['source-only']
rt = resource_tracker.ResourceTracker(
migration.source_compute, mock.sentinel.virt_driver)
ctxt = context.get_admin_context()
instance = objects.Instance(
uuid=migration.instance_uuid, old_flavor=None,
migration_context=objects.MigrationContext())
rt._update_usage_from_migration(
ctxt, instance, migration, migration.source_node)
self.assertNotIn('Starting to track outgoing migration',
self.stdlog.logger.output)
self.assertNotIn(migration.instance_uuid, rt.tracked_migrations)
class TestUpdateUsageFromMigrations(BaseTestCase):
@mock.patch('nova.compute.resource_tracker.ResourceTracker.'