This patch delivers the first working version of a distributed
scheduler implementation based on local and persistent job
queues. The idea is inspired by the parallel computing pattern
known as "Work stealing", although it doesn't fully repeat it
due to the nature of Mistral.
See https://en.wikipedia.org/wiki/Work_stealing for details.
Advantages of this scheduler implementation:
* When the cluster topology is stable, there are no job
processing delays caused by DB polling intervals. A job gets
scheduled in memory and is also saved into the persistent storage
for reliability. A persistent job can be picked up from the DB
only after a configured period of time, so in practice that
happens only after the node responsible for processing it locally
has crashed.
* Low DB load. DB polling still exists, but it's no longer the
primary scheduling mechanism; it's rather a protection against
node crashes. That means that the polling interval can now be
made large, like 30 seconds instead of 1-2 seconds. Less DB load
leads to fewer DB deadlocks between scheduler instances and fewer
retries on MySQL.
* Better scalability as a result of the lower DB load. A larger
number of engines no longer leads to much higher contention
because DB polling intervals are now large.
* Protection from jobs hanging forever in the processing state.
In the existing implementation, if a scheduler captured a job
for processing (set its "processing" flag to True) and then
crashed, the job stays in the processing state forever in the DB.
Instead of a boolean "processing" flag, the new implementation
uses a timestamp showing when a job was captured. That gives us
the opportunity to make such jobs eligible for recapturing and
further processing after a certain configured timeout.
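The timestamp-based capture described in the last point can be
sketched roughly as follows. This is only an illustration under
assumptions, not the code in this patch: the Job class, the
"captured_at" field and the "capture_timeout" parameter are
hypothetical names, and a real implementation would most likely
do the check and the update as one atomic conditional DB update.

```python
import datetime as dt
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    # Hypothetical model: field names are assumptions made for
    # this sketch and may not match the real scheduled jobs table.
    id: int
    captured_at: Optional[dt.datetime] = None


def is_capturable(job, now, capture_timeout):
    """Return True if the job is free or its capture has expired."""
    if job.captured_at is None:
        return True

    # A capture older than the timeout is treated as abandoned,
    # e.g. because the capturing scheduler crashed.
    return now - job.captured_at > capture_timeout


def capture(job, now, capture_timeout):
    """Try to capture the job; return True on success."""
    if not is_capturable(job, now, capture_timeout):
        return False

    # In-memory stand-in for an atomic conditional DB update.
    job.captured_at = now

    return True
```

With a boolean flag, the second scheduler in this scenario could
never take over the job; with a timestamp it can do so once the
configured timeout has passed.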
TODO:
* More testing
* DB migration for the new scheduled jobs table
* Benchmarks and testing under load
* Standardize the scheduler interface and write an adapter for the
existing scheduler so that we can choose between scheduler
implementations. It's highly desirable to make the transition to
the new scheduler smooth in production: we always need to be able
to roll back to the existing scheduler.
Partial blueprint: mistral-redesign-scheduler
Partial blueprint: mistral-eliminate-scheduler-delays
Change-Id: If7d06b64ac14d01e80d31242e1640cb93f2aa6fe