nova/nova/scheduler
melanie witt bf65fdd59e Stop globally caching host states in scheduler HostManager
Currently, in the scheduler HostManager, we cache host states in
a map global to all requests. This used to be okay because we were
always querying the entire compute node list for every request to
pass on to filtering. So we cached the host states globally and
updated them per request and removed "dead nodes" from the cache
(compute nodes still in the cache that were not returned from
ComputeNodeList.get_all).

As of Ocata, we started filtering our ComputeNodeList query based on
an answer from placement about which resource providers could satisfy
the request, instead of querying the entire compute node list every
time. This is much more efficient (don't consider compute nodes that
can't possibly fulfill the request) BUT it doesn't play well with the
global host state cache. We started seeing "Removing dead compute node"
messages in the logs, signaling removal of compute nodes from the
global cache when compute nodes were actually available.

If request A comes in and all compute nodes can satisfy its request,
then request B arrives concurrently and no compute nodes can satisfy
its request, then request B will remove all the compute nodes
from the global host state cache and then request A will get "no valid
hosts" at the filtering stage because get_host_states_by_uuids returns
a generator that hands out hosts from the global host state cache.

This removes the global host state cache from the scheduler HostManager
and instead generates a fresh host state map per request and uses that
to return hosts from the generator. Because we're filtering the
ComputeNodeList based on a placement query per request, each request
can have a completely different set of compute nodes that can fulfill
it, so we're not gaining much by caching host states anyway.
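The old-versus-new behavior can be sketched roughly as follows. This is a
simplified illustration, not nova's actual code: the class names echo
HostManager and get_host_states_by_uuids from the commit, but the dict-based
host states and string node ids are stand-ins invented for the example.

```python
class GlobalCacheHostManager:
    """Old behavior: one host-state map shared by all requests."""

    def __init__(self):
        # Global cache, mutated by every concurrent request.
        self.host_state_map = {}

    def get_host_states_by_uuids(self, compute_nodes):
        # compute_nodes stands in for the placement-filtered node list.
        seen = set(compute_nodes)
        for node in seen:
            self.host_state_map[node] = {'node': node}
        # "Dead node" cleanup: anything not in THIS request's placement
        # answer is evicted, even if a concurrent request still needs it.
        for node in list(self.host_state_map):
            if node not in seen:
                del self.host_state_map[node]
        # Generator hands out hosts from the shared cache, so a later
        # eviction by another request empties this request's results.
        return (self.host_state_map[n] for n in seen
                if n in self.host_state_map)


class PerRequestHostManager:
    """New behavior: a fresh host-state map built per request."""

    def get_host_states_by_uuids(self, compute_nodes):
        # Local map; no other request can evict entries from it.
        host_state_map = {node: {'node': node} for node in compute_nodes}
        return (host_state_map[n] for n in compute_nodes)


if __name__ == '__main__':
    # Request A matches two nodes; request B, arriving concurrently,
    # matches none and (with the global cache) evicts A's nodes.
    gm = GlobalCacheHostManager()
    gen_a = gm.get_host_states_by_uuids(['n1', 'n2'])
    list(gm.get_host_states_by_uuids([]))      # request B evicts everything
    print(list(gen_a))                         # A gets "no valid hosts"

    pm = PerRequestHostManager()
    gen_a = pm.get_host_states_by_uuids(['n1', 'n2'])
    list(pm.get_host_states_by_uuids([]))      # request B, isolated map
    print(len(list(gen_a)))                    # A still sees both nodes
```

The per-request variant trades away cache reuse, which, as noted above,
buys little once each request's node set comes from its own placement query.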

Co-Authored-By: Dan Smith <dansmith@redhat.com>

Closes-Bug: #1742827
Related-Bug: #1739323

Change-Id: I40c17ed88f50ecbdedc4daf368fff10e90e7be11
(cherry picked from commit c98ac6adc5)
2018-01-29 19:23:04 +00:00
client Log consumer uuid when retrying claims in the scheduler 2017-10-06 03:41:42 +00:00
filters Refined fix for validating image on rebuild 2017-11-27 20:51:11 -05:00
weights Add PCIWeigher 2017-06-08 09:44:46 +01:00
__init__.py Improve hacking rule to avoid author markers 2014-05-05 14:35:20 +02:00
caching_scheduler.py Mark Chance and Caching schedulers as deprecated 2017-08-09 10:53:53 -07:00
chance.py Mark Chance and Caching schedulers as deprecated 2017-08-09 10:53:53 -07:00
driver.py placement: scheduler uses allocation candidates 2017-07-07 11:35:54 -04:00
filter_scheduler.py Fix an error in _get_host_states when deleting a compute node 2017-12-21 10:37:08 -05:00
host_manager.py Stop globally caching host states in scheduler HostManager 2018-01-29 19:23:04 +00:00
ironic_host_manager.py Set IronicNodeState.uuid in _update_from_compute_node 2017-07-25 17:52:47 -04:00
manager.py Raise NoValidHost if no allocation candidates 2017-08-08 14:00:08 +02:00
rpcapi.py conf: remove *_topic config opts 2017-07-17 21:27:02 -07:00
utils.py Refined fix for validating image on rebuild 2017-11-27 20:51:11 -05:00