This should be ControllerExtraConfig, since the current parameter name
has been deprecated for some time and is inconsistent with all other
roles.
Since Mitaka is now EOL, this also removes references to the
worker-config-mitaka-and-below environment.
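In an environment file the rename amounts to something like the sketch below (assuming the deprecated form was the lowercase-initial spelling; the actual config payload is omitted):

```yaml
parameter_defaults:
  # Deprecated, inconsistently cased name:
  # controllerExtraConfig: {...}
  # Current name, matching the other roles:
  ControllerExtraConfig: {}
```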
Change-Id: I0f07b3abbe290ed7f740a6f4915e16be39e3a4c6
We have a few black-sheep compute nodes in rh1 that have no SSD
and/or less memory. The non-SSD nodes tend to be preferred by the
scheduler because of the larger disk capacity they report (1000 GB >
200 GB), and that's particularly a problem when a node has only 64
GB of memory and a 1 TB drive: it tends to get over-scheduled
even though it's our slowest node.
Since we almost exclusively care about distributing memory evenly,
let's weigh scheduler decisions only on that. Note that the normal
filters are left in place, so we shouldn't ever try to schedule
more VMs than a node can handle. This will only stop us from
preferring the nodes with slower storage.
Also note that this change is already live in rh1 and seems to be
working fine. I'm just updating the env to reflect the change.
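One way this could be expressed as hieradata in the env file, sketched below. The hieradata key is an assumption about what puppet-nova exposes; the class path is nova's standard RAM weigher:

```yaml
parameter_defaults:
  ControllerExtraConfig:
    # Weigh hosts on available RAM only; the enabled filters still
    # enforce the hard limits on what a node can accept
    nova::scheduler::filter::scheduler_weight_classes: 'nova.scheduler.weights.ram.RAMWeigher'
```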
Change-Id: I54731e49ed9bb08a6048bc52fe25412d9de6473c
Running the full 24 workers doesn't increase our capacity as far as
I can tell, so it's just a waste of CPU and memory. Neutron is by
far the most heavily used service in rh1, and limiting its workers
seems to cause issues, so I'm leaving it alone.
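A sketch of what the worker reduction looks like in the env file. The service keys and counts here are illustrative assumptions, not the exact set changed by this patch; the point is that Neutron's workers are deliberately left at their default:

```yaml
parameter_defaults:
  ControllerExtraConfig:
    # Illustrative reduced worker counts (hypothetical keys/values)
    nova::api::osapi_compute_workers: 4
    glance::api::workers: 4
    # Neutron workers intentionally not set: limiting them causes issues
```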
Change-Id: I39b89a0eeb9791fb60b44f3bf62dc31bd721624f
With a max of 60 envs, we're starting to hit scheduling errors due
to lack of disk space on some of the compute nodes. In reality,
none of the compute nodes are using more than 61% of their disk,
and most are under 50%, so a 33% increase in overcommit should be
safe enough.
We may also want to increase the scheduler retries to help with
this problem. Part of the issue is that most of the compute nodes
have sufficient disk available, but if we happen to get unlucky and
pick three in a row that don't, the instance fails. More
scheduler retries would help with that.
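Both knobs could be expressed as hieradata along these lines. The key names and values are assumptions for illustration only; the actual ratio would reflect the 33% bump described above:

```yaml
parameter_defaults:
  ControllerExtraConfig:
    # Allow ~33% more disk overcommit than before (value illustrative)
    nova::scheduler::filter::disk_allocation_ratio: 1.33
    # Try more hosts before failing the build (value illustrative)
    nova::scheduler::filter::scheduler_max_attempts: 10
```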
Change-Id: Ifb2db1ddafd183aa4c9584b406e3b47bf7b0c5a9
If we have more than 4 Heat engine workers in rh1, they can generate
so much traffic that the other services have trouble keeping up,
which causes all kinds of problems. 4 seems to be a pretty good
sweet spot where Heat has plenty of capacity to create stacks but
doesn't DoS the other services.
Note that this is already in the running config on rh1. This is
just a patch to update the env file so the change is persistent.
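In the env file this is a one-line cap, sketched below assuming puppet-heat's engine worker parameter is exposed as hieradata:

```yaml
parameter_defaults:
  ControllerExtraConfig:
    # Cap Heat engine workers so Heat can't swamp the other services
    heat::engine::num_engine_workers: 4
```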
Change-Id: I2c5d33cf2307349ea231ad1ba07170b250a84cef