nova/nova/scheduler
melanie witt bf65fdd59e Stop globally caching host states in scheduler HostManager
Currently, in the scheduler HostManager, we cache host states in
a map global to all requests. This used to be okay because we were
always querying the entire compute node list for every request to
pass on to filtering. So we cached the host states globally and
updated them per request and removed "dead nodes" from the cache
(compute nodes still in the cache that were not returned from
ComputeNodeList.get_all).

As of Ocata, we started filtering our ComputeNodeList query based on
an answer from placement about which resource providers could satisfy
the request, instead of querying the entire compute node list every
time. This is much more efficient (don't consider compute nodes that
can't possibly fulfill the request) BUT it doesn't play well with the
global host state cache. We started seeing "Removing dead compute node"
messages in the logs, signaling removal of compute nodes from the
global cache when compute nodes were actually available.

If request A comes in and all compute nodes can satisfy its request,
then request B arrives concurrently and no compute nodes can satisfy
its request, then request B will remove all the compute nodes
from the global host state cache and then request A will get "no valid
hosts" at the filtering stage because get_host_states_by_uuids returns
a generator that hands out hosts from the global host state cache.

This removes the global host state cache from the scheduler HostManager
and instead generates a fresh host state map per request and uses that
to return hosts from the generator. Because we're filtering the
ComputeNodeList based on a placement query per request, each request
can have a completely different set of compute nodes that can fulfill
it, so we're not gaining much by caching host states anyway.
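The old-versus-new behavior can be sketched roughly as follows. This is a
simplified illustration, not nova's actual code: the class names echo
HostManager and get_host_states_by_uuids from the commit, but the dict-based
host states and string node ids are stand-ins invented for the example.

```python
class GlobalCacheHostManager:
    """Old behavior: one host-state map shared by all requests."""

    def __init__(self):
        # Global cache, mutated by every concurrent request.
        self.host_state_map = {}

    def get_host_states_by_uuids(self, compute_nodes):
        # compute_nodes stands in for the placement-filtered node list.
        seen = set(compute_nodes)
        for node in seen:
            self.host_state_map[node] = {'node': node}
        # "Dead node" cleanup: anything not in THIS request's placement
        # answer is evicted, even if a concurrent request still needs it.
        for node in list(self.host_state_map):
            if node not in seen:
                del self.host_state_map[node]
        # Generator hands out hosts from the shared cache, so a later
        # eviction by another request empties this request's results.
        return (self.host_state_map[n] for n in seen
                if n in self.host_state_map)


class PerRequestHostManager:
    """New behavior: a fresh host-state map built per request."""

    def get_host_states_by_uuids(self, compute_nodes):
        # Local map; no other request can evict entries from it.
        host_state_map = {node: {'node': node} for node in compute_nodes}
        return (host_state_map[n] for n in compute_nodes)


if __name__ == '__main__':
    # Request A matches two nodes; request B, arriving concurrently,
    # matches none and (with the global cache) evicts A's nodes.
    gm = GlobalCacheHostManager()
    gen_a = gm.get_host_states_by_uuids(['n1', 'n2'])
    list(gm.get_host_states_by_uuids([]))      # request B evicts everything
    print(list(gen_a))                         # A gets "no valid hosts"

    pm = PerRequestHostManager()
    gen_a = pm.get_host_states_by_uuids(['n1', 'n2'])
    list(pm.get_host_states_by_uuids([]))      # request B, isolated map
    print(len(list(gen_a)))                    # A still sees both nodes
```

The per-request variant trades away cache reuse, which, as noted above,
buys little once each request's node set comes from its own placement query.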

Co-Authored-By: Dan Smith <dansmith@redhat.com>

Closes-Bug: #1742827
Related-Bug: #1739323

Change-Id: I40c17ed88f50ecbdedc4daf368fff10e90e7be11
(cherry picked from commit c98ac6adc5)
2018-01-29 19:23:04 +00:00
client Log consumer uuid when retrying claims in the scheduler 2017-10-06 03:41:42 +00:00
filters Refined fix for validating image on rebuild 2017-11-27 20:51:11 -05:00
weights Add PCIWeigher 2017-06-08 09:44:46 +01:00
__init__.py Improve hacking rule to avoid author markers 2014-05-05 14:35:20 +02:00
caching_scheduler.py Mark Chance and Caching schedulers as deprecated 2017-08-09 10:53:53 -07:00
chance.py Mark Chance and Caching schedulers as deprecated 2017-08-09 10:53:53 -07:00
driver.py placement: scheduler uses allocation candidates 2017-07-07 11:35:54 -04:00
filter_scheduler.py Fix an error in _get_host_states when deleting a compute node 2017-12-21 10:37:08 -05:00
host_manager.py Stop globally caching host states in scheduler HostManager 2018-01-29 19:23:04 +00:00
ironic_host_manager.py Set IronicNodeState.uuid in _update_from_compute_node 2017-07-25 17:52:47 -04:00
manager.py Raise NoValidHost if no allocation candidates 2017-08-08 14:00:08 +02:00
rpcapi.py conf: remove *_topic config opts 2017-07-17 21:27:02 -07:00
utils.py Refined fix for validating image on rebuild 2017-11-27 20:51:11 -05:00