cinder-manage online_data_migrations fixes

Addresses some issues with this command:

1) When used without the --max-count option, the summary table will
   always show zero migrations run, because it only accounts for the
   last batch, and the loop only exits when the last batch does no work.

2) "remaining" counts cannot be accurate, given the way migrations are
   implemented, because the "found" count refers to the number of rows
   that exist in the database, not the number that still need the
   migration applied.

3) In the case where no migrations are successful, but some raise
   exceptions, the command was exiting with status zero, which usually
   indicates "success". This can cause issues that cause migration
   failures to go unnoticed, especially when automated.

4) When exceptions do occur, a minimally useful message is output, and
   no detail about the exception is available to the user. The exception
   detail should be logged.

5) Inaccuracies in the documentation - "--max_number" should be
   "--max-count", and stale references to the "--ignore_state" option,
   which was removed in [1]

The solution for (3) introduces a new exit status, 2. See release note
for details.

These changes are aligned with equivalents [2][3] for the nova-manage
command, except for the calculation of "Total Needed" - nova seems to
interpret the "found" count differently/inconsistently.

[1] https://review.openstack.org/510201
[2] https://review.openstack.org/605828
[3] https://review.openstack.org/608091

Change-Id: I878480eb2359625cde839b073230844acc645cba
Closes-Bug: #1794364
Closes-Bug: #1796192
(cherry picked from commit d47486d317)
This commit is contained in:
imacdonn 2018-10-17 22:48:18 +00:00
parent 43d6c50b73
commit 74fd810ad1
6 changed files with 156 additions and 41 deletions

View File

@ -88,6 +88,8 @@ from cinder.volume import utils as vutils
CONF = cfg.CONF
LOG = logging.getLogger(__name__)
RPC_VERSIONS = {
'cinder-scheduler': scheduler_rpcapi.SchedulerAPI.RPC_API_VERSION,
'cinder-volume': volume_rpcapi.VolumeAPI.RPC_API_VERSION,
@ -339,34 +341,32 @@ class DbCommands(object):
def _run_migration(self, ctxt, max_count):
ran = 0
exceptions = False
migrations = {}
for migration_meth in self.online_migrations:
count = max_count - ran
try:
found, done = migration_meth(ctxt, count)
except Exception:
print(_("Error attempting to run %(method)s") %
{'method': migration_meth.__name__})
msg = (_("Error attempting to run %(method)s") %
{'method': migration_meth.__name__})
print(msg)
LOG.exception(msg)
exceptions = True
found = done = 0
name = migration_meth.__name__
remaining = found - done
if found:
print(_('%(found)i rows matched query %(meth)s, %(done)i '
'migrated, %(remaining)i remaining') % {'found': found,
'meth': name,
'done': done,
'remaining':
remaining})
migrations.setdefault(name, (0, 0, 0))
migrations[name] = (migrations[name][0] + found,
migrations[name][1] + done,
migrations[name][2] + remaining)
'migrated') % {'found': found,
'meth': name,
'done': done})
migrations[name] = found, done
if max_count is not None:
ran += done
if ran >= max_count:
break
return migrations
return migrations, exceptions
@args('--max_count', metavar='<number>', dest='max_count', type=int,
help='Maximum number of objects to consider.')
@ -383,20 +383,19 @@ class DbCommands(object):
max_count = 50
print(_('Running batches of %i until complete.') % max_count)
# FIXME(jdg): So this is annoying and confusing,
# we iterate through in batches until there are no
# more updates, that's AWESOME!! BUT we only print
# out a table reporting found/done AFTER the loop
# here, so that means the response the user sees is
# always a table of "needed 0" and "completed 0".
# So it's an indication of "all done" but it seems like
# some feedback as we go would be nice to have here.
ran = None
exceptions = False
migration_info = {}
while ran is None or ran != 0:
migrations = self._run_migration(ctxt, max_count)
migration_info.update(migrations)
ran = sum([done for found, done, remaining in migrations.values()])
migrations, exceptions = self._run_migration(ctxt, max_count)
ran = 0
for name in migrations:
migration_info.setdefault(name, (0, 0))
migration_info[name] = (
max(migration_info[name][0], migrations[name][0]),
migration_info[name][1] + migrations[name][1],
)
ran += migrations[name][1]
if not unlimited:
break
headers = ["{}".format(_('Migration')),
@ -408,6 +407,18 @@ class DbCommands(object):
t.add_row([name, info[0], info[1]])
print(t)
# NOTE(imacdonn): In the "unlimited" case, the loop above will only
# terminate when all possible migrations have been effected. If we're
# still getting exceptions, there's a problem that requires
# intervention. In the max-count case, exceptions are only considered
# fatal if no work was done by any other migrations ("not ran"),
# because otherwise work may still remain to be done, and that work
# may resolve dependencies for the failing migrations.
if exceptions and (unlimited or not ran):
print(_("Some migrations failed unexpectedly. Check log for "
"details."))
sys.exit(2)
sys.exit(1 if ran else 0)
@args('--enable-replication', action='store_true', default=False,

View File

@ -450,8 +450,8 @@ class TestCinderManageCmd(test.TestCase):
command.online_data_migrations, 10)
self.assertEqual(1, exit.code)
expected = """\
5 rows matched query mock_mig_1, 4 migrated, 1 remaining
6 rows matched query mock_mig_2, 6 migrated, 0 remaining
5 rows matched query mock_mig_1, 4 migrated
6 rows matched query mock_mig_2, 6 migrated
+------------+--------------+-----------+
| Migration | Total Needed | Completed |
+------------+--------------+-----------+
@ -466,6 +466,78 @@ class TestCinderManageCmd(test.TestCase):
self.assertEqual(expected, sys.stdout.getvalue())
@mock.patch('cinder.context.get_admin_context')
def test_online_migrations_no_max_count(self, mock_get_context):
self.useFixture(fixtures.MonkeyPatch('sys.stdout', StringIO()))
fake_remaining = [120]
def fake_migration(context, count):
self.assertEqual(mock_get_context.return_value, context)
found = 120
done = min(fake_remaining[0], count)
fake_remaining[0] -= done
return found, done
command_cls = self._fake_db_command((fake_migration,))
command = command_cls()
exit = self.assertRaises(SystemExit,
command.online_data_migrations, None)
self.assertEqual(0, exit.code)
expected = """\
Running batches of 50 until complete.
120 rows matched query fake_migration, 50 migrated
120 rows matched query fake_migration, 50 migrated
120 rows matched query fake_migration, 20 migrated
120 rows matched query fake_migration, 0 migrated
+----------------+--------------+-----------+
| Migration | Total Needed | Completed |
+----------------+--------------+-----------+
| fake_migration | 120 | 120 |
+----------------+--------------+-----------+
"""
self.assertEqual(expected, sys.stdout.getvalue())
@mock.patch('cinder.context.get_admin_context')
def test_online_migrations_error(self, mock_get_context):
self.useFixture(fixtures.MonkeyPatch('sys.stdout', StringIO()))
good_remaining = [50]
def good_migration(context, count):
self.assertEqual(mock_get_context.return_value, context)
found = 50
done = min(good_remaining[0], count)
good_remaining[0] -= done
return found, done
bad_migration = mock.MagicMock()
bad_migration.side_effect = test.TestingException
bad_migration.__name__ = 'bad_migration'
command_cls = self._fake_db_command((bad_migration, good_migration))
command = command_cls()
# bad_migration raises an exception, but it could be because
# good_migration had not completed yet. We should get 1 in this case,
# because some work was done, and the command should be reiterated.
exit = self.assertRaises(SystemExit,
command.online_data_migrations, max_count=50)
self.assertEqual(1, exit.code)
# When running this for the second time, there's no work left for
# good_migration to do, but bad_migration still fails - should
# get 2 this time.
exit = self.assertRaises(SystemExit,
command.online_data_migrations, max_count=50)
self.assertEqual(2, exit.code)
# When --max-count is not used, we should get 2 if all possible
# migrations completed but some raise exceptions
good_remaining = [50]
exit = self.assertRaises(SystemExit,
command.online_data_migrations, None)
self.assertEqual(2, exit.code)
@mock.patch('cinder.cmd.manage.DbCommands.online_migrations',
(mock.Mock(side_effect=((2, 2), (0, 0)), __name__='foo'),))
def test_db_commands_online_data_migrations_ignore_state_and_max(self):

View File

@ -310,10 +310,8 @@ and cgsnapshots will be removed from the database::
cinder-manage db online_data_migrations
--max_count <max>
--ignore_state
max_count is optional. Default is 50.
ignore_state is optional. Default is False.
After running the above migration command to migrate CGs to generic
volume groups, CG and group APIs work as follows:

View File

@ -67,16 +67,30 @@ Cinder Db
Purge database entries that are marked as deleted, that are older than the number of days specified.
``cinder-manage db online_data_migrations``
``cinder-manage db online_data_migrations [--max-count <n>]``
Perform online data migrations for database upgrade between releases in batches.
Perform online data migrations for database upgrade between releases in
batches.
This command interprets the following options when it is invoked:
--max_count Maximum number of objects to consider.
--ignore_state Force records to migrate even if another operation is
performed on them. This may be dangerous, please refer to
release notes for more information.
--max-count Maximum number of objects to migrate. If not specified,
all possible migrations will be completed, in batches of
50 at a time.
Returns exit status 0 if no (further) updates are possible, 1 if the
``--max-count`` option was used and some updates were completed
successfully (even if others generated errors), 2 if some updates generated
errors and no other migrations were able to take effect in the last batch
attempted, or 127 if invalid input is provided (e.g. non-numeric
max-count).
This command should be run after upgrading the database schema. If it exits
with partial updates (exit status 1) it should be called again, even if
some updates initially generated errors, because some updates may depend on
others having completed. If it exits with status 2, intervention is
required to resolve the issue causing remaining updates to fail. It should
be considered successfully completed only when the exit status is 0.
Cinder Logs
~~~~~~~~~~~

View File

@ -238,10 +238,17 @@ After maintenance window
upgrade is performed.
* Since Ocata, you also need to run ``cinder-manage db online_data_migrations``
command to make sure data migrations are applied. The tool let's you limit
the impact of the data migrations by using ``--max_number`` option to limit
number of migrations executed in one run. You need to complete all of the
migrations before starting upgrade to the next version (e.g. you need to
complete Ocata's data migrations before proceeding with upgrade to Pike; you
won't be able to execute Pike's DB schema migrations before completing
Ocata's data migrations).
command to make sure data migrations are applied. The tool lets you limit
the impact of the data migrations by using ``--max-count`` option to limit
number of migrations executed in one run. If this option is used, the
exit status will be 1 if any migrations were successful (even if others
generated errors, which could be due to dependencies between migrations).
The command should be rerun while the exit status is 1. If no further
migrations are possible, the exit status will be 2 if some migrations are
still generating errors, which requires intervention to resolve. The
command should be considered completed successfully only when the exit
status is 0. You need to complete all of the migrations before starting
upgrade to the next version (e.g. you need to complete Ocata's data
migrations before proceeding with upgrade to Pike; you won't be able to
execute Pike's DB schema migrations before completing Ocata's data
migrations).

View File

@ -0,0 +1,13 @@
---
upgrade:
- |
The ``cinder-manage db online_data_migrations`` command now returns exit
status 2 in the case where some migrations failed (raised exceptions) and
no others were completed successfully from the last batch attempted. This
should be considered a fatal condition that requires intervention. Exit
status 1 will be returned in the case where the ``--max-count`` option was
used and some migrations failed but others succeeded (updated at least one
row), because more work may remain for the non-failing migrations, and
their completion may be a dependency for the failing ones. The command
should be reiterated while it returns exit status 1, and considered
completed successfully only when it returns exit status 0.