diff --git a/doc/source/devref/filter_scheduler.rst b/doc/source/devref/filter_scheduler.rst index 01359000f56b..9f51462d3bc3 100644 --- a/doc/source/devref/filter_scheduler.rst +++ b/doc/source/devref/filter_scheduler.rst @@ -1,103 +1,245 @@ -.. - Copyright 2011 OpenStack LLC - All Rights Reserved. - - Licensed under the Apache License, Version 2.0 (the "License"); you may - not use this file except in compliance with the License. You may obtain - a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, WITHOUT - WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the - License for the specific language governing permissions and limitations - under the License. - - Source for illustrations in doc/source/image_src/zone_distsched_illustrations.odp - (OpenOffice Impress format) Illustrations are "exported" to png and then scaled - to 400x300 or 640x480 as needed and placed in the doc/source/images directory. - Filter Scheduler -===================== +================ -The Scheduler is akin to a Dating Service. Requests for the creation of new instances come in and the most applicable Compute nodes are selected from a large pool of potential candidates. In a small deployment we may be happy with the currently available Chance Scheduler which randomly selects a Host from the available pool. Or if you need something a little more fancy you may want to use the Filter Scheduler, which selects Compute hosts from a logical partitioning of available hosts. +The **Filter Scheduler** supports `filtering` and `weighting` to make informed +decisions on where a new instance should be created. This Scheduler supports +only working with Compute Nodes. - .. 
image:: /images/dating_service.png +Filtering +--------- -The Filter Scheduler supports filtering and weighing to make informed decisions on where a new instance should be created. +.. image:: /images/filteringWorkflow1.png -So, how does this all work? +The Filter Scheduler first builds a dictionary of unfiltered hosts, then +filters them using filter properties, and finally chooses hosts for the +requested number of instances (each time it chooses the least-cost host and +appends it to the list of selected hosts). -Costs & Weights ---------------- -When deciding where to place an Instance, we compare a Weighted Cost for each Host. The Weighting, currently, is just the sum of each Cost. Costs are nothing more than integers from `0 - max_int`. Costs are computed by looking at the various Capabilities of the Host relative to the specs of the Instance being asked for. Trying to put a plain vanilla instance on a high performance host should have a very high cost. But putting a vanilla instance on a vanilla Host should have a low cost. +If it turns out that it can't find candidates for the next instance, it means +that there are no more appropriate hosts locally. -Some Costs are more esoteric. Consider a rule that says we should prefer Hosts that don't already have an instance on it that is owned by the user requesting it (to mitigate against machine failures). Here we have to look at all the other Instances on the host to compute our cost. +Both `filtering` and `weighting` are quite flexible +in the Filter Scheduler. It supports many filtering strategies, and you +can also implement `your own algorithm of +filtering`. 
-An example of some other costs might include selecting: - * a GPU-based host over a standard CPU - * a host with fast ethernet over a 10mbps line - * a host that can run Windows instances - * a host in the EU vs North America - * etc +There are some standard filter classes to use (:mod:`nova.scheduler.filters`): -This Weight is computed for each Instance requested. If the customer asked for 1000 instances, the consumed resources on each Host are "virtually" depleted so the Cost can change accordingly. +* |AllHostsFilter| - this filter performs no filtering. It + returns all the available hosts. +* |AvailabilityZoneFilter| - filters hosts by availability zone. It returns + hosts with the same availability zone as the one requested in the instance + properties. +* |ComputeFilter| - checks that the capabilities provided by the compute + service satisfy the extra specifications associated with the instance type. + It returns a list of hosts that can create the instance type. +* |CoreFilter| - filters based on CPU core utilization. It passes a host if + it has a sufficient number of CPU cores. +* |IsolatedHostsFilter| - filters based on the "image_isolated" and "host_isolated" + flags. +* |JsonFilter| - allows a simple JSON-based grammar for selecting hosts. +* |RamFilter| - filters hosts by their RAM. It returns only the hosts with + enough available RAM. +* |SimpleCIDRAffinityFilter| - allows putting a new instance on a host within + the same IP block. - .. image:: /images/costs_weights.png - -Filtering and Weighing ----------------------- -The filtering (excluding compute nodes incapable of fulfilling the request) and weighing (computing the relative "fitness" of a compute node to fulfill the request) rules used are very subjective operations ... Service Providers will probably have a very different set of filtering and weighing rules than private cloud administrators. 
The filtering and weighing aspects of the `FilterScheduler` are flexible and extensible. - - .. image:: /images/filtering.png - -Host Filter ------------ - -As we mentioned earlier, filtering hosts is a very deployment-specific process. Service Providers may have a different set of criteria for filtering Compute nodes than a University. To facilitate this, the `FilterScheduler` supports a variety of filtering strategies as well as an easy means for plugging in your own algorithms. Specifying filters involves 2 settings. One makes filters available for use. The second specifies which filters to use by default (out of the filters available). The reason for this second option is that there may be support to allow end-users to specify specific filters during a build at some point in the future. - -Making filters available: - -Filters are made available to the scheduler via the `--scheduler_available_filters` setting. This setting can be specified more than once and should contain lists of filter class names (with full paths) to make available. Specifying 'nova.scheduler.filters.standard_filters' will cause all standard filters under 'nova.scheduler.filters' to be made available. That is the default setting. Additionally, you can specify your own classes to be made available. For example, 'myfilter.MyFilterClass' can be specified. Now that you've configured which filters are available, you should set which ones you actually want to use by default. - -Setting the default filtering classes: - -The default filters to use are set via the `--scheduler_default_filters` setting. This setting should contain a list of class names. You should not specify the full paths with these class names. By default this flag is set to `['AvailabilityZoneFilter', 'RamFilter', 'ComputeFilter']`. 
Below is a list of standard filter classes: - - * `AllHostsFilter` includes all hosts (essentially is a No-op filter) - * `AvailabilityZoneFilter` provides host filtering based on availability_zone - * `ComputeFilter` provides host filtering based on `InstanceType` extra_specs, comparing against host capability announcements - * `CoreFilter` provides host filtering based on number of cpu cores - * `DifferentHostFilter` provides host filtering based on scheduler_hint's 'different_host' value. With the scheduler_hints extension, this allows one to put a new instance on a different host from another instance - * `IsolatedHostsFilter` provides host filtering based on the 'isolated_hosts' and 'isolated_images' flags/settings. - * `JSONFilter` filters hosts based on simple JSON expression grammar. Using a LISP-like JSON structure the caller can request instances based on criteria well beyond what `ComputeFilter` specifies. See `nova.tests.scheduler.test_host_filters` for examples. - * `RamFilter` provides host filtering based on the memory needed vs memory free - * `SameHostFilter` provides host filtering based on scheduler_hint's 'same_host' value. With the scheduler_hints extension, this allows one to put a new instance on the same host as another instance - * `SimpleCIDRAffinityFilter` provides host filtering based on scheduler_hint's 'build_near_host_ip' value. With the scheduler_hints extension, this allows one to put a new instance on a host within the same IP block. - -To create your own `HostFilter` the user simply has to derive from `nova.scheduler.filters.BaseHostFilter` and implement one method: `host_passes`. This method accepts a `HostState` instance describing a host as well as a `filter_properties` dictionary. Host capabilities can be found in `HostState`.capabilities and other properites can be found in `filter_properties` like `instance_type`, etc. Your method should return True if it passes the filter. 
- -Flags ------ - -Here are some of the main flags you should set in your `nova.conf` file: +Now we can focus on these standard filter classes in detail. We will skip the +simplest ones, such as |AllHostsFilter|, |CoreFilter| and |RamFilter|, +because their functionality is quite simple and can be understood just from the +code. For example, the |RamFilter| class has the following implementation: :: - --scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler - --scheduler_available_filters=nova.scheduler.filters.standard_filters - # --scheduler_available_filters=myfilter.MyOwnFilter - --scheduler_default_filters=RamFilter,ComputeFilter,MyOwnFilter + class RamFilter(filters.BaseHostFilter): + """Ram Filter with over subscription flag""" -`scheduler_driver` is the real workhorse of the operation. For Filter Scheduler, you need to specify a class derived from `nova.scheduler.filter_scheduler.FilterScheduler`. -`scheduler_default_filters` are the host filters to be used for filtering candidate Compute nodes. + def host_passes(self, host_state, filter_properties): + """Only return hosts with sufficient available RAM.""" + instance_type = filter_properties.get('instance_type') + requested_ram = instance_type['memory_mb'] + free_ram_mb = host_state.free_ram_mb + return free_ram_mb * FLAGS.ram_allocation_ratio >= requested_ram -Some optional flags which are handy for debugging are: +Here `ram_allocation_ratio` means the virtual RAM to physical RAM allocation +ratio (it is 1.5 by default). Nice and simple. + +The next standard filter to describe is |AvailabilityZoneFilter|, and it isn't +difficult either. This filter just compares the availability zone of the compute +node with the availability zone from the properties of the request. Each compute +service has its own availability zone, so deployment engineers have the option to +run the scheduler with availability zone support and can configure availability +zones on each compute host. 
This class's `host_passes` method returns `True` if the +availability zone mentioned in the request is the same as on the current compute host. + +|ComputeFilter| checks if a host can create `instance_type`. Note that +instance types describe the compute, memory and storage capacity of nova +compute nodes; an instance type is a list of characteristics such as the number of vCPUs, +the amount of RAM and so on. So |ComputeFilter| looks at the hosts' capabilities (a host +without the requested specifications can't be chosen for creating the +instance) and checks if the host's service is up based on the last heartbeat. Finally, +this filter can verify if the host satisfies some `extra specifications` +associated with the instance type (of course, if there are no such extra +specifications, every host suits them). + +Next is |IsolatedHostsFilter|. There can be some special hosts +reserved for specific images. These hosts are called **isolated**. The +images that run on the isolated hosts are also called isolated. This filter +checks if the `image_isolated` flag named in the instance specifications is the same +as the one the host has. + +|SimpleCIDRAffinityFilter| looks at the subnet mask and checks if +the network address of the current host is in the same subnetwork as the one +defined in the request. + +|JsonFilter| - this filter makes it possible to write complicated +queries for filtering on host capabilities, based on a simple JSON-like syntax. +The following operations can be used on the host state properties: +'=', '<', '>', 'in', '<=', '>=', and they can be combined with the following +logical operations: 'not', 'or', 'and'. For example, here is a query you can +find in the tests: :: - --connection_type=fake - --verbose + ['and', + ['>=', '$free_ram_mb', 1024], + ['>=', '$free_disk_mb', 200 * 1024] + ] -Using the `Fake` virtualization driver is handy when you're setting this stuff up so you're not dealing with a million possible issues at once. 
When things seem to working correctly, switch back to whatever hypervisor your deployment uses. +This query selects only hosts with at least 1024 MB of free RAM +and, at the same time, at least 200 GB of free disk space. + +Many filters use data from `scheduler_hints`, which are defined when the +user creates the new server. The only exception to this rule is +|JsonFilter|, which takes its data in a less straightforward way. + +To use filters you specify the following two settings: + +* `scheduler_available_filters` - defines the available filters. +* `scheduler_default_filters` - defines the filters to be used by default from the + list of available ones. + +Host Manager sets these flags in `nova.conf` to the following defaults: + +:: + + --scheduler_available_filters=nova.scheduler.filters.standard_filters + --scheduler_default_filters=RamFilter,ComputeFilter,AvailabilityZoneFilter + +These two lines mean that all the filters in `nova.scheduler.filters` +are available, and the default ones are |RamFilter|, |ComputeFilter| +and |AvailabilityZoneFilter|. + +If you want to create **your own filter** you just need to inherit from +|BaseHostFilter| and implement one method: +`host_passes`. This method should return `True` if the host passes the filter. It +takes `host_state` (which describes the host) and a `filter_properties` dictionary as +parameters. + +In the end, `nova.conf` should contain lines like these: + +:: + + --scheduler_driver=nova.scheduler.distributed_scheduler.FilterScheduler + --scheduler_available_filters=nova.scheduler.filters.standard_filters + --scheduler_available_filters=myfilter.MyFilter + --scheduler_default_filters=RamFilter,ComputeFilter,MyFilter + +As you can see, the `scheduler_driver` flag is set to the `FilterScheduler`, +available filters can be specified more than once, and the list of +default filters should contain only class names, not full paths to the +classes. 
+ +Costs and weights +----------------- + +Filter Scheduler uses so-called **weights** and **costs** during its work. + +`Costs` are computed integers that express how fit a host is to be +chosen for the request. Costs are computed by comparing the host's +characteristics with the characteristics from the request. So trying to +put an instance on an inappropriate host (for example, trying to put a really +simple, plain instance on a high performance host) would have a high cost, +and putting an instance on an appropriate host would have a low cost. + +So let's find out how all this computing works. + +Before weighing, Filter Scheduler creates the list of tuples containing weights +and cost functions to use for weighing hosts. These functions can be taken from +the cache if this operation has been done before (the cache depends on the +`topic` of the node; Filter Scheduler works only with Compute Nodes, so the topic +would be "`compute`" here). If there are no cost functions in the cache associated +with "compute", Filter Scheduler tries to get these cost functions from `nova.conf`. +The weight in each tuple is the weight of the cost function paired with it. It can +also be taken from `nova.conf`. After that the Scheduler weighs hosts using the +selected cost functions. It does this using the `weighted_sum` method, whose parameters are: + +* `weighted_fns` - list of cost functions paired with their weights; +* `host_states` - hosts to be weighed; +* `weighing_properties` - dictionary of values that can influence weights. + +This method first creates a grid of function results (it just computes the value of +each function using `host_state` and `weighing_properties`) - `scores`, with +one row per host and one column per function. The next step is to +multiply the value in each cell of the grid by the weight of the corresponding cost +function. The final step is to sum the values in each row - the sum is the +weight of the host described in that row. 
This method returns the host with the +lowest weight - the best one. + +As for cost functions, it is important to say that we use the +`compute_fill_first_cost_fn` function by default, which simply returns the host's +free RAM: + +:: + + def compute_fill_first_cost_fn(host_state, weighing_properties): + """More free ram = higher weight. So servers with less free ram will be + preferred.""" + return host_state.free_ram_mb + +You can implement your own cost function for the host capabilities +you would like to take into account. Using different cost functions (several +of them can be used at the same time) makes the choice of the next +host for creating the new instance flexible. + +These cost functions should be set up in `nova.conf` with the flag +`least_cost_functions` (there can be more than one function, separated by +commas). By default this line would look like this: + +:: + + --least_cost_functions=nova.scheduler.least_cost.compute_fill_first_cost_fn + +The weights of cost functions should also be described in `nova.conf`. +The flag name for a weight has the following form: +**function_name_weight**. + +For the default cost function, it would be `compute_fill_first_cost_fn_weight`, +and by default it is 1.0. + +:: + + --compute_fill_first_cost_fn_weight=1.0 + +Filter Scheduler finds a local list of acceptable hosts by repeated filtering and +weighing. Each time it chooses a host, it virtually consumes resources on it, +so subsequent selections can adjust accordingly. This is useful if the customer +asks for a large number of instances, because the weight is computed for +each instance requested. + +.. image:: /images/filteringWorkflow2.png + +In the end Filter Scheduler sorts the selected hosts by their weight and provisions +instances on them. + +P.S.: you can find more examples of using Filter Scheduler and standard filters +in :mod:`nova.tests.scheduler`. + +.. 
|AllHostsFilter| replace:: :class:`AllHostsFilter ` +.. |AvailabilityZoneFilter| replace:: :class:`AvailabilityZoneFilter ` +.. |BaseHostFilter| replace:: :class:`BaseHostFilter ` +.. |ComputeFilter| replace:: :class:`ComputeFilter ` +.. |CoreFilter| replace:: :class:`CoreFilter ` +.. |IsolatedHostsFilter| replace:: :class:`IsolatedHostsFilter ` +.. |JsonFilter| replace:: :class:`JsonFilter ` +.. |RamFilter| replace:: :class:`RamFilter ` +.. |SimpleCIDRAffinityFilter| replace:: :class:`SimpleCIDRAffinityFilter ` diff --git a/doc/source/images/costs_weights.png b/doc/source/images/costs_weights.png deleted file mode 100644 index b65e98b0c5d1..000000000000 Binary files a/doc/source/images/costs_weights.png and /dev/null differ diff --git a/doc/source/images/dating_service.png b/doc/source/images/dating_service.png deleted file mode 100644 index 49f1bd86a30d..000000000000 Binary files a/doc/source/images/dating_service.png and /dev/null differ diff --git a/doc/source/images/filtering.png b/doc/source/images/filtering.png deleted file mode 100644 index 4303bded8a9a..000000000000 Binary files a/doc/source/images/filtering.png and /dev/null differ diff --git a/doc/source/images/filteringWorkflow1.png b/doc/source/images/filteringWorkflow1.png new file mode 100644 index 000000000000..58da979d793e Binary files /dev/null and b/doc/source/images/filteringWorkflow1.png differ diff --git a/doc/source/images/filteringWorkflow2.png b/doc/source/images/filteringWorkflow2.png new file mode 100644 index 000000000000..e0fe66acfe2f Binary files /dev/null and b/doc/source/images/filteringWorkflow2.png differ
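
Review note: the weighted-sum step described in the "Costs and weights" section of the patched document can be sketched as follows. This is only a minimal illustration under stated assumptions, not the code from `nova.scheduler.least_cost`: the `HostState` tuple, the host names, and the exact return value are hypothetical, and only the parameter names (`weighted_fns`, `host_states`, `weighing_properties`) and the default "free RAM" cost come from the text.

```python
from collections import namedtuple

# Hypothetical stand-in for the scheduler's host state object; only the
# free_ram_mb attribute is taken from the documented cost function.
HostState = namedtuple("HostState", ["host", "free_ram_mb"])


def fill_first_cost_fn(host_state, weighing_properties):
    """Cost is the free RAM, so hosts with less free RAM cost less."""
    return host_state.free_ram_mb


def weighted_sum(weighted_fns, host_states, weighing_properties):
    """Return (weight, host_state) for the host with the lowest weight.

    weighted_fns is a list of (weight, cost_function) tuples.
    """
    if not host_states:
        raise ValueError("no hosts to weigh")
    best = None
    for host_state in host_states:
        # One "row" of the grid: each cost function's score times its weight.
        total = sum(weight * fn(host_state, weighing_properties)
                    for weight, fn in weighted_fns)
        if best is None or total < best[0]:
            best = (total, host_state)
    return best


hosts = [
    HostState("node1", 4096),
    HostState("node2", 1024),
    HostState("node3", 2048),
]
weight, chosen = weighted_sum([(1.0, fill_first_cost_fn)], hosts, {})
print(chosen.host, weight)
```

With a single cost function weighted 1.0, the host with the least free RAM is selected, which matches the "fill first" behaviour the document describes. Adding further `(weight, function)` tuples changes the ranking without touching the summing logic.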