Share server migration

This patch adds a specification for migrating a share server
and all its resources to a new destination. This spec proposes
a mechanism similar to the existing share migration.

APIImpact
Partially-Implements: bp share-server-migration

Change-Id: I535efdc6d8f5517163b6c285e7c1503a4313b6ee
Signed-off-by: Douglas Viroel <viroel@gmail.com>
Douglas Viroel 2020-06-16 16:57:29 +00:00
parent eb16cffa49
commit a84a5d0610
1 changed files with 508 additions and 0 deletions

@ -0,0 +1,508 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
======================
Share Server Migration
======================
https://blueprints.launchpad.net/manila/+spec/share-server-migration
Manila supports the deployment model where share drivers are able to handle the
creation and the management of share servers as well as shares and their
capabilities[1]. By managing different share servers per tenant level, Manila
leverages its capability of configuring storage entities and provides more
manageability for administrators. Introduced in the Liberty release and later
improved in the Mitaka, Newton and Ocata releases, the share migration operation
allows administrators to move a share across back ends, in a non-disruptive
manner, by implementing a 2-phase migration approach. This spec now proposes to extend
this migration concept to the share server entity, relying on share drivers
that can do this operation in an atomic and efficient way.
Problem description
===================
Administrators might need to handle situations like back end evacuation or
rebalancing, and face the problem of migrating lots of shares, one by one, to
a specific, and probably common, destination. Even with additional tools or
scripts, this task can be hard to manage and, above all, hard to recover from
when it ends in a failure state. The lack of a feature that helps administrators
rebalance or evacuate large storage systems is the reason for proposing the
following solution.
Use Cases
=========
There are several scenarios where share server migration comes in handy and
provides benefits to cloud administrators:
* **Rebalance**: move shares to a back end that has more free capacity, freeing
up space for other shares to grow over time;
* **Optimization**: move shares and spare a back end in order to conserve
power. Move data closer to the hosts for a better network performance;
* **Evacuation**: evacuate a back end that is too old or that is experiencing
failures;
* **Maintenance**: move shares to a newer hardware version/model;
* **Others**: change shares' configuration like: share network,
security services, etc.
Proposed change
===============
As designed for share migration in the Newton release[2], the 2-phase migration
logic will also be implemented for share servers. Invoking
`share-server-migration-start` starts the 1st phase, which copies all data
from source to destination, including all shares, snapshots and share access
rules, if supported by the driver that implements it.
After the 1st phase finishes, administrators can plan and start the 2nd phase
by invoking `share-server-migration-complete` to finish the operation. This
phase usually disrupts access to the shares, since their export locations
might be updated.
It is important to note that when migrating a share server, many share
attributes won't be modified during the process, while share server attributes
might change depending on the provided parameters. Administrators will be able
to provide a new 'Share Network' to associate to the new share server, but
won't be able to change its shares' attributes like 'Share Type' since this
is a share level entity and different 'Share Types' can live in the same share
server.
Share API and Manager Changes
-----------------------------
The share API will hold all validations needed before proceeding with driver's
calls and database updates. The API will check if any of the shares within the
share server being migrated are in an invalid state or have any dependent
resource that cannot be migrated together with the share. The migration can
fail earlier if one of those validations cannot be satisfied.
Before starting the migration, the share server and all its shares will have
their status updated to reflect the operation that is being executed and to
block any other operation that could be triggered after this one started.
The source share server and all its shares will have their status updated to
``server_migrating`` while the destination share server will be updated to
``server_migrating_to``. By changing all shares' status, users will be
able to identify that a group of shares is blocked for receiving any other
operation.
After running through all validations with success, the share server's new
attribute called `task_state` will be updated to ``server_migration_starting``
and the scheduler will be invoked to validate if the host matches with the
provided share types.
When the request reaches the share manager's migration start method, a driver
call will be triggered to check whether the destination back end can handle the
operation before starting the migration. If one of the required options can't
be satisfied, the migration will fail.
The share manager will update the share server's `task_state` to
``server_migrating`` and all its instances' status to ``server_migrating``. A
new share server might be requested in the destination back end to hold all the
data from source. It is expected that drivers will be able to identify that a
new server is being requested for migration purposes. After that, the driver
will be called to start the share server migration and to return immediately.
A share manager periodic task will continuously check share servers that have
the `task_state` set to ``server_migrating`` to invoke the driver's call
`share_server_migration_continue` to track the progress of share servers that
are in the 1st phase of the migration. After successfully finishing the 1st
phase, the share server `task_state` will be updated to
``server_migrating_phase1_done``.
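The polling step described above can be sketched as follows. This is only an illustration under stated assumptions, not Manila's implementation: `FakeDriver`, `poll_migrating_servers` and the dict-based share server records are hypothetical, and the boolean return value of the driver call is assumed. Only the `task_state` values and the driver call name `share_server_migration_continue` come from this spec.

```python
class FakeDriver:
    """Hypothetical driver that reports phase 1 as done after N polls."""

    def __init__(self, polls_needed):
        self.polls_needed = polls_needed
        self.polls = 0

    def share_server_migration_continue(self, share_server):
        # Assumed contract: return True once the 1st phase has finished,
        # raise an exception if the migration failed.
        self.polls += 1
        return self.polls >= self.polls_needed


def poll_migrating_servers(driver, share_servers):
    """One run of the periodic task over all known share servers."""
    for server in share_servers:
        if server["task_state"] != "server_migrating":
            continue  # not in the 1st phase, nothing to track
        try:
            done = driver.share_server_migration_continue(server)
        except Exception:
            server["task_state"] = "server_migration_error"
            continue
        if done:
            server["task_state"] = "server_migrating_phase1_done"


servers = [{"id": "s1", "task_state": "server_migrating"},
           {"id": "s2", "task_state": None}]
driver = FakeDriver(polls_needed=2)

poll_migrating_servers(driver, servers)
state_after_first_poll = servers[0]["task_state"]
poll_migrating_servers(driver, servers)
state_after_second_poll = servers[0]["task_state"]
```

Share servers that are not in ``server_migrating`` are skipped, so the task stays cheap when no migration is running.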
Finally, share manager's `share_server_migration_complete` method can be
invoked for share servers that already completed the 1st phase, to finish the
migration. In this phase, the driver is called to finish the share server
migration and perform the last steps in the back end and return the list of
export locations for all its shares. The `task_state` of the share server is
set to ``server_migration_completed`` and all its shares have their export
paths updated before they become ``available`` again.
Before moving to the 2nd phase, either during the data copy or after the 1st
phase has completed, administrators can cancel the operation by invoking the
`share_server_migration_cancel` API. If supported by the driver, the cancel
operation will delete everything new that was created during the process, and
the share server and all its shares will go back to their initial state.
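The `task_state` flow described in this section can be summarized as a small transition table. The table below is inferred from the narrative in this spec and is an assumption, not an enumeration taken from the Manila code base; `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical names.

```python
# Transitions inferred from the spec text; None means "no migration running".
ALLOWED_TRANSITIONS = {
    None: {"server_migration_starting"},
    "server_migration_starting": {
        "server_migrating",
        "server_migration_error",
    },
    "server_migrating": {
        "server_migrating_phase1_done",
        "server_migration_cancelled",
        "server_migration_error",
    },
    "server_migrating_phase1_done": {
        "server_migration_completed",
        "server_migration_cancelled",
        "server_migration_error",
    },
}


def can_transition(current, new):
    """Return True if moving from `current` to `new` fits the flow above."""
    return new in ALLOWED_TRANSITIONS.get(current, set())
```

Note that cancellation is reachable only from the two 1st-phase states, which mirrors the rule that a migration cannot be cancelled once the 2nd phase has started.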
Scheduler Changes
-----------------
The scheduler filters can be used to validate if the destination host can
hold all shares associated to the share server being migrated. Share API will
need to provide the share server's total size along with all associated share
types' capabilities in order to validate if the destination host is suitable
for the new share server. However, the scheduler won't be able to validate
share servers that span multiple pools. For this type of scenario,
share server migration will need to rely on the driver's checks to validate the
feasibility of the operation.
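A capacity-only version of this check could look like the sketch below. The function names and the dict fields (`size`, `free_capacity_gb`) are assumptions made for illustration, and share type capability matching is deliberately left out.

```python
def share_server_total_size(shares):
    """Sum of the sizes (assumed to be in GiB) of all shares on the server."""
    return sum(share["size"] for share in shares)


def host_can_hold(host, shares):
    """Capacity-only check; capability matching is not part of this sketch."""
    return share_server_total_size(shares) <= host["free_capacity_gb"]


# Hypothetical data: two shares on the share server, two candidate pools.
shares = [{"id": "a", "size": 10}, {"id": "b", "size": 25}]
big_pool = {"name": "host@dummy1#pool2", "free_capacity_gb": 100}
small_pool = {"name": "host@dummy1#pool1", "free_capacity_gb": 20}
```

Because the whole share server is treated as one unit against a single pool, a check like this cannot cover share servers whose shares are spread over several pools, which is why the driver checks remain necessary.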
Alternatives
------------
The alternative is to use scripts or any other automation tool to move all
shares to a new destination, one by one, using share migration feature.
Data model impact
-----------------
A new field will be added to the `Share Server` table to help track the state
of a share server migration. The new field `task_state` will work
like the field of the same name that already exists in the `Share` table.
Administrators will be able to reset the `task_state` by issuing the API
`share-server-reset-task-state`, as shown in the next section.
REST API impact
---------------
The following new admin-only API methods will be implemented:
1) `share-server-migration-start`
Migrates a share server::
POST /share-servers/{share_server_id}/action
Body::
{
"migration_start": {
"writable": true,
"nondisruptive": true,
"preserve_snapshots": true,
"host": "host@dummy1#pool2",
"new_share_network_id": "new_share_network_id"
}
}
The `host` parameter is the host string that identifies where the share server
will be migrated to.
The capabilities `writable`, `nondisruptive` and
`preserve_snapshots`, if enabled, must be supported by the drivers that
implement this feature. If one of the capabilities isn't supported, the
migration will fail later in the driver's compatibility check.
By setting `writable` to ``true`` it's expected that all shares remain writable
during the first phase of the migration, where the data copy usually occurs.
However, this doesn't guarantee that they will remain writable during the second
phase, where the cutover usually happens for drivers that don't support a
`nondisruptive` migration.
By specifying `nondisruptive` equal to ``true``, the migration will be
performed without disrupting clients during the entire process, which usually
means that export locations won't be modified, and hence new network
allocations won't be made for the new share server.
If `preserve_snapshots` is set, it's expected that all snapshots from all
shares will be migrated together with the share server. If not supported by the
driver, users will need to consider unmanaging or deleting all snapshots
before proceeding with the migration.
The only optional parameter is `new_share_network_id`, which may need to be
provided to fit destination network requirements.
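As an illustration of the request described above, the following sketch builds the `migration_start` body and leaves out the optional `new_share_network_id` when it is not given. The helper name `build_migration_start_body` is hypothetical and not part of any client library.

```python
import json


def build_migration_start_body(host, writable, nondisruptive,
                               preserve_snapshots,
                               new_share_network_id=None):
    """Build the JSON body for the migration_start action (sketch)."""
    options = {
        "host": host,
        "writable": writable,
        "nondisruptive": nondisruptive,
        "preserve_snapshots": preserve_snapshots,
    }
    # The share network is the only optional parameter in this spec.
    if new_share_network_id is not None:
        options["new_share_network_id"] = new_share_network_id
    return {"migration_start": options}


body = build_migration_start_body("host@dummy1#pool2", True, True, True)
payload = json.dumps(body)
```

The resulting `payload` is what would be sent in a ``POST`` to ``/share-servers/{share_server_id}/action``.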
If the provided `share_server_id` doesn't exist, the API will respond with
``404 Not Found``. If one of the optional parameters is invalid or doesn't
exist, the API will respond with ``400 Bad Request``. If during the initial
validations in the Share API, one of the resources is busy or has an invalid
status, the API will respond with ``409 Conflict``.
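The order of these checks can be sketched as below. Everything except the three error codes is an assumption: the function name, the dict layout, the ``active`` share server status used as the valid starting state, and the ``202`` success code, which this spec does not define.

```python
def precheck_status_code(share_servers, server_id, params, known_networks):
    """Return the HTTP status the API would answer with (sketch)."""
    server = share_servers.get(server_id)
    if server is None:
        return 404  # unknown share_server_id

    network_id = params.get("new_share_network_id")
    if network_id is not None and network_id not in known_networks:
        return 400  # optional parameter is invalid or doesn't exist

    busy = [s for s in server["shares"] if s["status"] != "available"]
    if server["status"] != "active" or busy:
        return 409  # a resource is busy or has an invalid status

    return 202  # assumed success code; the work continues asynchronously


# Hypothetical data used to exercise the checks.
share_servers = {
    "ok": {"status": "active", "shares": [{"status": "available"}]},
    "busy": {"status": "active", "shares": [{"status": "migrating"}]},
}
known_networks = {"net-1"}
```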
Upon a failure, the share server and all its shares will have their status
updated to ``available`` and their `task_state` set to
``server_migration_error``.
2) `share-server-migration-complete`
Start the 2nd phase of migration::
POST /share-servers/{share_server_id}/action
Body::
{"migration_complete": {}}
Triggers the start of the 2nd phase of migration on a share server that already
finished the 1st phase.
If the provided `share_server_id` doesn't exist, the API will respond with
``404 Not Found``.
If the operation can't be performed due to unsupported migration state, the API
will respond with ``400 Bad Request``.
Upon a failure in the second phase of the migration, the share server and all
its shares will have their status updated to ``error`` and their `task_state`
set to ``server_migration_error``. At this point, it won't be possible to
determine the status of the share server and its shares, and it will be up to
the administrator to manually fix this problem.
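The difference between a failure in the 1st phase and one in the 2nd phase can be sketched as a single helper. The function name and the dict-based resources are hypothetical; the status and `task_state` values are the ones given in this spec.

```python
def apply_migration_failure(share_server, shares, phase):
    """Update the share server and its shares after a failed migration."""
    # A 1st-phase failure leaves the source usable; after a 2nd-phase
    # failure the real state is unknown, so resources are marked "error".
    status = "available" if phase == 1 else "error"
    for resource in [share_server, *shares]:
        resource["status"] = status
        resource["task_state"] = "server_migration_error"


server = {"id": "s1", "status": "server_migrating", "task_state": None}
shares = [{"id": "a", "status": "server_migrating", "task_state": None}]
apply_migration_failure(server, shares, phase=2)
```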
3) `share-server-migration-cancel`
Attempts to cancel migration::
POST /share-servers/{share_server_id}/action
Body::
{"migration_cancel": {}}
To cancel a migration in progress, the operation must not be in the 2nd phase
and the driver must support such operation.
If the provided `share_server_id` doesn't exist, the API will respond with
``404 Not Found``.
If the operation can't be performed due to unsupported migration state or
unsupported operation within the driver, the API will respond with
``400 Bad Request``.
After a successful migration cancellation operation, the share server and all
its shares will have their status updated to ``available`` and their
`task_state` set to ``server_migration_cancelled``.
4) `share-server-migration-get-progress`
Attempts to obtain migration progress::
POST /share-servers/{share_server_id}/action
Body::
{"migration_get_progress": {}}
Response::
{"total_progress": 30}
Gives the current migration progress in a percentage value. Drivers might also
provide additional information together with `total_progress` info.
If the provided `share_server_id` doesn't exist, the API will respond with
``404 Not Found``.
If the provided `share_server_id` isn't performing a migration, the API will
respond with ``400 Bad Request``.
5) `share-server-reset-task-state`
Reset task state field value::
POST /share-servers/{share_server_id}/action
Body::
{
"reset_task_state": {
"task_state": "migration_error"
}
}
If the provided `share_server_id` doesn't exist, the API will respond with
``404 Not Found``.
6) `share-server-migration-check`
Check if a share server can be migrated to a destination host::
POST /share-servers/{share_server_id}/action
Body::
{
"migration_check": {
"writable": true,
"nondisruptive": true,
"preserve_snapshots": true,
"host": "host@dummy1#pool2",
"new_share_network_id": "new_share_network_id"
}
}
Response::

    {
        "compatible": true,
        "requested_capabilities": {
            "writable": true,
            "nondisruptive": true,
            "preserve_snapshots": true,
            "host": "host@dummy1#pool2",
            "new_share_network_id": "new_share_network_id"
        },
        "supported_capabilities": {
            "writable": true,
            "nondisruptive": false,
            "preserve_snapshots": true,
            "new_share_network_id": "new_share_network_id",
            "migration_cancel": true,
            "migration_get_progress": false
        }
    }

Checks the feasibility of migrating a share server to a destination host.
Drivers will be able to check if the provided destination host can hold the
share server and which migration options will be available for this operation.
By answering `compatible` equal to ``true`` or ``false``, the admin will know
if the provided host is a feasible destination for the share server.
The migration options `writable`, `nondisruptive` and `preserve_snapshots`
show if the driver supports such options while migrating the share server.
If supported, the current share network or, if provided, the
`new_share_network_id` will also appear in the `supported_capabilities` field.
The migration operations `migration_cancel` and `migration_get_progress` may
also be available depending on the driver implementation.
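An administrator or a client tool could compare the two capability maps from the check response to see which requested options the driver cannot honor. The sketch below assumes the response layout shown in this section; `unsupported_options` is a hypothetical helper, not part of python-manilaclient.

```python
MIGRATION_OPTIONS = ("writable", "nondisruptive", "preserve_snapshots")


def unsupported_options(requested, supported):
    """List the options that were requested but are not supported."""
    return [
        option for option in MIGRATION_OPTIONS
        if requested.get(option) and not supported.get(option)
    ]


# Values taken from the example response in this section.
requested = {"writable": True, "nondisruptive": True,
             "preserve_snapshots": True}
supported = {"writable": True, "nondisruptive": False,
             "preserve_snapshots": True}
missing = unsupported_options(requested, supported)
```

With these values `missing` contains only ``nondisruptive``, so starting the migration with that option enabled would be expected to fail in the driver's compatibility check.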
Driver impact
-------------
Vendors that want to support share server migration must implement the
following interfaces:
* **choose_share_server_compatible_for_migration**: interface needed to tell
the share manager which compatible share server can be used as destination
in a migration operation;
* **share_server_migration_check_compatibility**: it will always be called
before starting the migration to check if the driver supports migrating the
share server to the required destination, and to report which capabilities
will be supported in such an operation;
* **share_server_migration_start**: called to start the first phase of
migration. The procedure should be started in the back end and return
immediately.
* **share_server_migration_continue**: will be called to monitor the progress
of a share server migration. Drivers will answer if the 1st phase was already
finished or raise an exception in case of failure.
* **share_server_migration_complete**: starts the 2nd phase of the migration,
to complete the operation by cutting over the access from the source and
providing access through the destination.
* **share_server_migration_cancel**: drivers will implement this call if they
support the cancellation of a migration operation that is already in
progress. The migration cancellation won't be available for share servers
that already started the 2nd phase;
* **share_server_migration_get_progress**: drivers will implement this call to
provide the total progress of the migration.
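The interfaces listed above could be grouped as in the following sketch. Only the method names come from this spec; the class names, the argument lists and the return values are assumptions made for illustration and do not reflect the final driver API. The cancel and progress calls are left non-abstract because the spec describes them as implemented only by drivers that support them.

```python
import abc


class ShareServerMigrationDriver(abc.ABC):
    """Hypothetical grouping of the driver interfaces named in this spec."""

    @abc.abstractmethod
    def choose_share_server_compatible_for_migration(self, candidates):
        """Pick a destination share server, or None to request a new one."""

    @abc.abstractmethod
    def share_server_migration_check_compatibility(self, server, dest_host):
        """Return a dict with 'compatible' and the supported capabilities."""

    @abc.abstractmethod
    def share_server_migration_start(self, server, dest_server):
        """Start the 1st phase in the back end and return immediately."""

    @abc.abstractmethod
    def share_server_migration_continue(self, server, dest_server):
        """Return True once the 1st phase is done; raise on failure."""

    @abc.abstractmethod
    def share_server_migration_complete(self, server, dest_server):
        """Run the 2nd phase (cutover) and return the new export locations."""

    def share_server_migration_cancel(self, server, dest_server):
        raise NotImplementedError("cancel is not supported by this driver")

    def share_server_migration_get_progress(self, server, dest_server):
        raise NotImplementedError("progress is not supported by this driver")


class DummyDriver(ShareServerMigrationDriver):
    """Trivial implementation used only to exercise the sketch."""

    def choose_share_server_compatible_for_migration(self, candidates):
        return candidates[0] if candidates else None

    def share_server_migration_check_compatibility(self, server, dest_host):
        return {"compatible": True, "writable": True,
                "nondisruptive": False, "preserve_snapshots": True}

    def share_server_migration_start(self, server, dest_server):
        return None

    def share_server_migration_continue(self, server, dest_server):
        return True

    def share_server_migration_complete(self, server, dest_server):
        return {"share-1": ["10.0.0.2:/share-1"]}

    def share_server_migration_get_progress(self, server, dest_server):
        return {"total_progress": 100}
```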
As in the share migration approach, drivers will be invoked to check
the compatibility with the destination back end before starting the migration.
During this validation, drivers will be able to return the capabilities
supported for migrating a share server to the provided destination, such as
remaining writable, preserving snapshots and others.
After that, `share_server_migration_start` will take place and ask drivers to
start the 1st phase of the migration, which should be handled asynchronously.
Manila will reuse the same periodic task from share migration to continuously
check if the 1st phase is already completed by calling the driver interface
`share_server_migration_continue`.
Finally, the driver will need to perform the last steps to complete the share
server migration when the `share_server_migration_complete` is invoked. At this
moment, the access to the source share server shares may be interrupted,
depending on driver's capabilities, and moved to the new destination.
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
During the migration process, users won't be able to perform any management
operation on the shares that belong to the share server being migrated.
Depending on driver's capabilities, users may also lose write access to those
shares.
Performance Impact
------------------
No performance impact is expected on implementing this feature. However,
depending on how many shares are placed within a share server, other
operations can be impacted due to the number of database operations triggered
by a share server migration, during sanity checks and status updates on all
affected resources (shares, snapshots, access, etc).
Other deployer impact
---------------------
Drivers that implement share server migration might need to retrieve the
configuration of other back ends in order to access them and provide a way of
copying all the data. Administrators will need to keep these configuration
files up to date in all their share service instances.
Developer impact
----------------
None.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
dviroel
Work Items
----------
* Implement main patch that contains:
* New API methods for share server migration;
* New Scheduler call for share server migration start;
* Share Manager implementation for share server migration;
* Database updates for Share Server model;
* New driver interfaces for migration of share servers.
* Update python-manilaclient with new share server's CLI commands.
* For testing:
* Improve and implement both container and dummy drivers to support share
server migration across different back ends.
* New functional tests in manila-tempest-plugin.
* Documentation updates.
Dependencies
============
None.
Testing
=======
The container driver will need to be improved to support share server migration
across different back ends.
New functional tests will be added to perform share server migration on the
same back end and across different back ends. Vendors that implement support
for this feature will be encouraged to run these tests in their CI.
Documentation Impact
====================
The following documentation will be updated:
* API reference: Will update the Share Server API by adding the new actions for
share server migration procedure.
* Admin reference: Will add information on how the functionality works and
which drivers support it.
* Developer reference: Will add information on how the new functionality works,
and which interfaces need to be implemented.
References
==========
[1] https://docs.openstack.org/manila/ussuri/admin/shared-file-systems-share-server-management.html
[2] https://opendev.org/openstack/manila-specs/src/branch/master/specs/newton/newton-migration-improvements.rst
[3] https://etherpad.opendev.org/p/share-server-migration