From 64703bf1c756a770de0f27cd821cc3b21cbd38c1 Mon Sep 17 00:00:00 2001
From: Gorka Eguileor
Date: Wed, 2 Aug 2017 19:53:34 +0200
Subject: [PATCH] Provisioning Improvements

Spec to add support for standardized over-provisioning calculations and
make all drivers' status reports consistent between different backends.

Change-Id: Ie709333d75fb8c1c4693eae7d09d7f4925dc3fca
---
 specs/queens/provisioning-improvements.rst | 490 +++++++++++++++++++++
 1 file changed, 490 insertions(+)
 create mode 100644 specs/queens/provisioning-improvements.rst

diff --git a/specs/queens/provisioning-improvements.rst b/specs/queens/provisioning-improvements.rst
new file mode 100644
index 00000000..f6d01234
--- /dev/null
+++ b/specs/queens/provisioning-improvements.rst
@@ -0,0 +1,490 @@
..
    This work is licensed under a Creative Commons Attribution 3.0 Unported
    License.

    http://creativecommons.org/licenses/by/3.0/legalcode

=========================
Provisioning Improvements
=========================

Include the URL of your launchpad blueprint:

https://blueprints.launchpad.net/cinder/+spec/provisioning-improvements

Cinder provisioning is still a source of pain for everyone: end users,
admins, and developers. Multiple factors have contributed to our current
situation, such as information being dispersed across different specs, the
developer's reference, and even the code, but we also have cases of
misinterpreted documentation, documentation not keeping up with the
project's evolution, and even misleading or incorrect documentation.

This spec will build on those that preceded it on the same topic [1]_ [2]_
and others related to the subject [3]_ to bring a consolidated and updated
view on the matter, as well as add some minor improvements and fix some
issues, in hopes that we can provide a better experience for all involved.


Problem description
===================

Our current situation is quite chaotic: we have volume creations that fail
based on an incorrect capacity calculation but that would have succeeded if
we had just received an update on the stats of the backend, we have drivers
that are reporting incorrect data in their stats, and we have volumes that
cannot be created when they should be allowed to.

Before going any further we first need to define the terms we'll be using,
to ensure they hold the same meaning for all of us, as this is the source of
some of our current issues.

Disagreement on the mapping of these terms and their descriptions is
understandable, but for the sake of understanding each other we'll hold the
descriptions below as true, since they were defined as such in our specs and
in most of our code.

Improvements to the wording of the terms and the field names can be
discussed at another time and updated in the specs, documentation, and code
accordingly.

For the sake of completeness, and to remove any misunderstandings that could
lead to different implementations in the drivers, which we currently have,
the descriptions also include some clarifications and examples that may
reference current Cinder code.

Terminology
-----------

GB:
  Even though this is formally known as the symbol for a gigabyte -a decimal
  unit of measurement- we will be using it throughout the spec and our code
  as the symbol for the gibibyte -the binary unit of measurement defined by
  the International Electrotechnical Commission (IEC), with symbol GiB-, so
  when we talk about 1GB we are talking about 1024MB, and the same applies
  to TB and MB.

Total capacity:
  It is the total physical capacity that would be available in the storage
  array's pool being used by Cinder if no volumes were present.

  This is currently reported by the drivers as `total_capacity_gb` and, as
  the name indicates, should be reported in GB, with a precision no greater
  than 2 decimals.

  If the storage array has 5TB of space but the pool used by Cinder is
  limited to 1TB, then the driver should report a `total_capacity_gb` of
  1024GB.

Volume size:
  It is the maximum physical size that a volume can take in the storage
  array.

  This is referenced throughout the code as `volume_size`.

  For a thick volume the `volume_size` will be the same as the free capacity
  we lost when it was provisioned, whereas for a thin volume it will be
  greater than the space used for the volume in the storage array until the
  volume gets completely full.

Free capacity:
  It is the current physical capacity available in the storage array's pool
  being used by Cinder. The number and volume sizes of the thin and thick
  volumes that have been provisioned by Cinder or directly in the storage
  array are irrelevant here.

  This is currently reported by the drivers as `free_capacity_gb` and, as
  the name indicates, should be reported in GB, with a precision no greater
  than 2 decimals.

  If the storage array has 5TB of space with a total of 3TB available for
  all its pools, but Cinder is using a pool that has a limit of 1TB of which
  it has already used 400GB, and someone has manually created volumes
  outside of Cinder that are currently using 124GB of space, then the driver
  should report a `free_capacity_gb` of 500GB (1TB = 1024GB = 400GB + 124GB
  + 500GB).

Provisioned capacity:
  The amount of capacity that would be used in the storage array's pool
  being used by Cinder if all the volumes present there were completely
  full.

  This is currently reported by the drivers as `provisioned_capacity_gb`
  and, as the name indicates, should be reported in GB, with a precision no
  greater than 2 decimals. This is a required field and *must always be
  present*.

  This includes not only volumes created by Cinder but also all other
  existing volumes in that backend, but *does not include snapshots*.

  Let's expand the earlier example from "free capacity", where 524GB of the
  available 1TB had already been used, and say that the 124GB that were
  externally created were all used by 1GB thick volumes, and that Cinder was
  using the 400GB with 400 thick volumes of 1GB and 20 empty thin volumes of
  20GB each. In this situation our reported `provisioned_capacity_gb` value
  should be 924GB ((124 * 1GB) + (400 * 1GB) + (20 * 20GB)).

  If a driver does not report the `provisioned_capacity_gb` data we'll use
  the automatically calculated `allocated_capacity_gb` instead, as described
  below.

Allocated capacity:
  Contrary to what the name may suggest, this is not referring to the
  "allocated" space on the storage array, but to the provisioned volumes
  that were created by this specific Cinder Volume backend process on the
  storage array's pool being used by Cinder and that are still present.

  It is important to note that this refers to a specific service backend, so
  if you are running a multi-backend Cinder service, or multiple Cinder
  Volume services, where more than one backend is configured to use the same
  storage array's pool, then each one of these backends will only report the
  sum of the `volume_size` of the volumes it created, and not the sum of the
  `volume_size` of all the volumes that have been created by Cinder
  services.

  This is currently reported by the Volume service as
  `allocated_capacity_gb` and, as the name indicates, should be reported in
  GB.

  If two volumes had been created, one thick and one thin, each one of 1GB,
  then you would be reporting 2GB as `allocated_capacity_gb`, but if you
  were to unmanage one of those volumes then you would only be reporting
  1GB, even though the volume is still there and will still be counted in
  the `provisioned_capacity_gb`.

  This field is calculated directly by the Cinder core code, and drivers
  should not calculate or report this information in their
  `get_volume_stats` method.

Over subscription ratio:
  It is the maximum ratio between the "provisioned capacity" and the "total
  capacity", represented as a real number. A ratio of 1.0 means that the
  "provisioned capacity" cannot exceed the "total capacity", whereas a value
  of 5.0 means that the Cinder backend is allowed to create as much as 5
  times the "total capacity" of the storage array's pool in volumes.

  This will only have an effect when a thin provisioned volume is being
  created, and will be ignored for thick provisioned ones.

  This is currently reported by the drivers as
  `max_over_subscription_ratio`, with a value greater than or equal to 1.0,
  preferably with no more than 2 decimals of precision.

  This value is optional, and when it is missing from the driver's status
  report the value defined in the `[DEFAULT]` section of the Cinder
  scheduler receiving the request will be used. So vendors should make sure
  that they are correctly returning this value in their drivers if they
  support thin provisioning, and admins should make sure they have a
  consistent default value of `max_over_subscription_ratio` across all
  scheduler nodes.

  Note that this ratio is per backend or per pool depending on the driver
  implementation.

Reserved percentage:
  Represents the percentage of the storage array's "total capacity" that is
  reserved and should not be used for calculations. It is represented by an
  integer value going from 0 up to 100.

  This is currently reported by the drivers as `reserved_percentage`, as an
  integer between 0 and 100.

  The default value is 0 if the field is missing in the status report from
  the backend or if the user has not defined it in the backend's Cinder
  configuration. This is per backend or per pool depending on the driver
  implementation.

Provisioning support:
  Cinder backends may support up to two different types of provisioning,
  *thin* and *thick*, and drivers are expected to indicate that they are
  capable of at least one of them in their capabilities report.

  The way to report support for these is setting the boolean fields
  `thin_provisioning_support` and/or `thick_provisioning_support` to true.
  Any provisioning type that is not reported will default to false.

  A Cinder backend may support both provisioning types at the same time.

Volume provisioning type:
  For Cinder backends that only support one of the provisioning types, all
  volumes created on them will be of that type, and we can use the volume
  type's extra specs to make the scheduler filter out backends that do not
  support a specific provisioning type:

  - 'thin_provisioning_support': '<is> True' or '<is> False'
  - 'thick_provisioning_support': '<is> True' or '<is> False'

  But if our deployment is using a backend that supports both provisioning
  types simultaneously, we need to be explicit about the type of
  provisioning we want for a volume, using the volume type's extra spec
  `provisioning:type` and setting it to `thin` or `thick`.

  If no `provisioning:type` is defined for a volume it will default to thin
  if the backend is capable of it, and the driver is expected to honor this
  assumption.
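
To tie the above terms together, here is a minimal, illustrative sketch -not
taken from any real driver, with a hypothetical backend name and the numbers
reused from the earlier examples- of the stats dictionary that a backend
supporting both provisioning types could return from its `get_volume_stats`
method::

  # 1TB (1024GB) pool with 500GB physically free and 924GB provisioned.
  # Note that `allocated_capacity_gb` is intentionally absent: it is
  # calculated by the Cinder core code, not reported by the driver.
  stats = {
      'volume_backend_name': 'example_backend',  # hypothetical name
      'total_capacity_gb': 1024,
      'free_capacity_gb': 500,
      'provisioned_capacity_gb': 924,
      'max_over_subscription_ratio': 5.0,  # optional, >= 1.0
      'reserved_percentage': 0,            # integer, 0 to 100
      'thin_provisioning_support': True,
      'thick_provisioning_support': True,
  }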

Incorrect reports
-----------------

Given the above terms, which were originally defined in their corresponding
specs, even if there may be additional comments in this one, we can
determine that there is a good number of Cinder drivers that do not follow
these definitions and are reporting what would be incorrect values.

Reporting incorrect values means that on a heterogeneous cloud you'll have
inconsistent scheduling, and an admin will not be able to make sense of the
stats from the volumes.

To illustrate this, here are some of the interpretations we can see across
different drivers for the `provisioned_capacity_gb`:

* Sum of all the volumes' max sizes, which is correct.
* Sum of all the volumes' physical disk usage, which is wrong.
* Sum of the Cinder volumes' physical disk usage, which is wrong.

And something similar happens with the `allocated_capacity_gb`, where
drivers go and report the value directly instead of letting the Cinder core
code take care of it. Drivers have been known to report the following
information here:

* Sum of the Cinder volumes' max sizes, which is correct.
* Sum of the physical disk usage, which is wrong.
* Sum of all the volumes' max sizes, which is wrong.

Provisioning calculations
-------------------------

Some of the creation failures are based on the `provisioned_capacity_gb`
value being wrong, but there are other cases where Cinder's calculations for
over provisioning do not match the industry's standard definition, which
creates confusion and undesired behavior for some admins.

The standard provisioning calculation to check if a volume of `volume_size`
fits is::

  ((provisioned_capacity_gb + volume_size) <=
   (total_capacity_gb
    x (1 - (reserved_percentage / 100.0))
    x max_over_subscription_ratio))

Whereas the Cinder calculation, which was agreed on as the best calculation
for being considered safer, is::

  (volume_size <=
   (free_capacity_gb
    - (total_capacity_gb x reserved_percentage / 100.0))
   x max_over_subscription_ratio)
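
For illustration only, here is a minimal Python sketch of both checks -using
the field names defined above, not the actual `CapacityFilter` code-::

  def fits_standard(volume_size, total, provisioned, reserved_pct, mosr):
      # Standard: compare the provisioned (virtual) capacity against the
      # over-subscribed, reserve-adjusted total capacity.
      virtual_total = total * (1 - reserved_pct / 100.0) * mosr
      return (provisioned + volume_size) <= virtual_total

  def fits_cinder(volume_size, total, free, reserved_pct, mosr):
      # Cinder: over-subscribe only the remaining physical free space,
      # after subtracting the reserved portion of the total capacity.
      adjusted_free = free - (total * reserved_pct / 100.0)
      return volume_size <= adjusted_free * mosr

With the example pool from the Terminology section (1024GB total, 500GB
free, 924GB provisioned, no reserve) and a ratio of 2.0, the standard check
would accept up to 1124GB in new volumes, whereas the Cinder check would
accept at most 1000GB.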

Calculating max over subscription ratio
---------------------------------------

Most deployments have very dynamic workloads, each with different physical
storage requirements, which means that one month we may require many volumes
of which we barely use any space, and the next month we may require fewer
volumes but use most of the provisioned capacity.

This makes it almost impossible to accurately model our storage requirements
at deployment time, which is precisely when we have to set the
`max_over_subscription_ratio` for our Cinder backends.

As requirements change, one option would be to change the configuration and
restart our Cinder Volume services, but since Cinder is also in the data
path the restart may take a long time and will have a considerable impact on
our cloud users.

Not being able to determine the best `max_over_subscription_ratio`
beforehand, and not being able to easily restart the Cinder service, is a
common pain that most operators have with backends supporting thin
provisioning.

Use Cases
=========

The basic case for fixing the status reports is that we would like to have
consistent reporting from our backends, both for the admins to see in the
logs and for the scheduler to use.

Any operator using thin provisioning storage will want to optimize their
storage usage and dynamically adjust to the changing requirements of their
cloud.

As for the alternative calculations, they would greatly benefit any backend
that is close to its full capacity, or one that is creating huge volumes
that usually never get filled.

Proposed change
===============

Incorrect reports
-----------------

Since we have consolidated all the documentation in one place -this spec-
where we clearly state the expected driver behavior, all driver maintainers
will be urged to make their drivers compliant with it. This will mean
adapting their drivers to follow this document's definition of
`provisioned_capacity_gb` and to stop reporting the `allocated_capacity_gb`
field.

Automatic over subscription ratio calculation
---------------------------------------------

To allow automatic over subscription ratio calculation we will add a new
configuration option named `auto_max_over_subscription_ratio` that will
instruct Cinder to use the configured `max_over_subscription_ratio` as a
starting reference while the backend is empty and then, once there is data,
to calculate the current value on each driver stats report with the
following formula::

  adjusted_total = total_capacity_gb x (1 - (reserved_percentage / 100.0))
  ratio = provisioned_capacity_gb / (adjusted_total - free_capacity_gb)

If the driver is not reporting `provisioned_capacity_gb` then we'll proceed
to use the `allocated_capacity_gb` instead::

  adjusted_total = total_capacity_gb x (1 - (reserved_percentage / 100.0))
  ratio = allocated_capacity_gb / (adjusted_total - free_capacity_gb)

This new configuration option will be independent of the drivers and will be
part of Cinder's core code, so if `auto_max_over_subscription_ratio` is not
defined, or is set to `False`, then Cinder will continue behaving as it does
now (returning the `max_over_subscription_ratio` that is reported by the
driver, or the one configured by default if not present). But if it is set
to `True`, Cinder will always return the calculated ratio as explained
above.

There are a couple of drivers that are already doing this, Pure and
Kaminario's K2, but with different configuration options,
`pure_automatic_max_oversubscription_ratio` and
`auto_calc_max_oversubscription_ratio` respectively, so we'll deprecate
those configuration options and remove the code within those drivers when we
add the generic code to Cinder.
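
A minimal sketch of the proposed calculation, including both fallbacks -a
hypothetical helper, not the final implementation- could look like::

  def auto_max_over_subscription_ratio(stats, configured_ratio):
      # Prefer the driver-reported provisioned capacity; fall back to the
      # core-calculated allocated capacity when it is missing.
      provisioned = stats.get('provisioned_capacity_gb')
      if provisioned is None:
          provisioned = stats.get('allocated_capacity_gb', 0)

      adjusted_total = (stats['total_capacity_gb'] *
                        (1 - stats.get('reserved_percentage', 0) / 100.0))
      used = adjusted_total - stats['free_capacity_gb']
      if used <= 0 or not provisioned:
          # Empty backend: use the configured value as the starting
          # reference.
          return configured_ratio
      return provisioned / used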

Provisioning calculations
-------------------------

Instead of continuing the fight with admins and developers over which one of
the approaches is best -the standard calculation or Cinder's- we will be
adding a new configuration option called `over_provisioning_calculation`,
which will take the values `standard` and `cinder`, will default to `cinder`
for backward compatibility, and will be used by the `CapacityFilter` to
determine which one of the mechanisms to use.

This configuration option will also affect the `CapacityWeigher`, as it will
need to do the free space calculation according to the standard definition
as well.

As one can assume, thick provisioning will see no modifications to its
behavior.

Alternatives
------------

* Don't support standard over-provisioning calculations.
* Instead of modifying `CapacityFilter` and `CapacityWeigher`, create 2 new
  classes.
* Instead of adding the `over_provisioning_calculation` configuration
  option, make the filter use the options JSON file provided by
  `scheduler_json_config_location`. This data seems to be currently missing
  on some of the operations, like migrate and extend, so that would need to
  be changed.

Data model impact
-----------------

N/A

REST API impact
---------------

The only affected API will be the `get_pools` API, which will be able to
return the 2 new fields, `total_used_capacity_gb` and
`cinder_used_capacity_gb`, when they are being reported by the driver's
`get_volume_stats` method. The fields will not be present if the drivers are
not reporting them.

Security impact
---------------

N/A

Notifications impact
--------------------

N/A

Other end user impact
---------------------

The user may see new fields when calling cinderclient's `get_pools`.

Performance Impact
------------------

Depending on the driver and the storage array, performance could increase or
decrease, since getting provisioned sizes instead of physical sizes could be
faster or slower.

Other deployer impact
---------------------

With the change of the values returned by Cinder backends for
`allocated_capacity_gb` and `provisioned_capacity_gb` we may experience
failures creating volumes until we correct the values of
`reserved_percentage` and `max_over_subscription_ratio` in our cloud, since
we may have been using incorrect ones.

Two new configuration options will be added (see the sample configuration
below):

* `auto_max_over_subscription_ratio`: Boolean value that will instruct
  Cinder to automatically calculate the over subscription ratio based on
  current usage instead of using a fixed value.

* `over_provisioning_calculation`: Will allow selecting what kind of
  calculation the `CapacityFilter` does to determine if there is space for a
  volume in a backend. Acceptable values are `standard` and `cinder`. The
  default value will be `cinder`.
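
For illustration, this is how the two options could look in `cinder.conf`
once implemented -the backend section name is hypothetical, and the exact
placement of the options is an assumption of this sketch-::

  [DEFAULT]
  # Scheduler side: use the industry-standard over-provisioning check.
  over_provisioning_calculation = standard

  [backend1]
  # Backend side: derive the over subscription ratio from current usage
  # instead of using a fixed configured value.
  auto_max_over_subscription_ratio = True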

Developer impact
----------------

Driver maintainers will need to verify, and fix if necessary, their stats
reports for `allocated_capacity_gb` and `provisioned_capacity_gb`, unless
they start using the new `auto_max_over_subscription_ratio` configuration
option.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  None

Other contributors:
  None

Work Items
----------

* File bugs for drivers that are not in compliance.
* Fix the drivers' stats reporting.
* Add support for the 3 new fields, `provisioned_capacity_precision`,
  `total_used_capacity_gb`, and `cinder_used_capacity_gb`, in the scheduler,
  the `get_pools` API, and the client.
* Modify `CapacityFilter` to support the standard over-provisioning
  calculation.
* Modify `CapacityWeigher` to support standard over-provisioning
  calculations.
* Add to the volume manager the estimation mechanism for drivers that don't
  report `provisioned_capacity_gb`.
* Update all the developer reference docs to ensure that there is no more
  confusion about what the reported stats need to contain, and make sure
  that the wiki page on how to contribute a driver links to that
  documentation, explaining the importance of following it when writing the
  driver.

Dependencies
============

N/A

Testing
=======

New unit tests will be added to test the changed code.

Documentation Impact
====================

Since our current documentation is lacking in this aspect, we will add to it
and update it to reflect what's expected of the drivers in their stats
reports.

End user documentation should also be updated.

References
==========

.. [1] https://specs.openstack.org/openstack/cinder-specs/specs/kilo/over-subscription-in-thin-provisioning.html
.. [2] https://specs.openstack.org/openstack/cinder-specs/specs/newton/differentiate-thick-thin-in-scheduler.html
.. [3] https://specs.openstack.org/openstack/cinder-specs/specs/liberty/standard-capabilities.html