* When a subworkflow completes, it sends its result to the parent
workflow through the Scheduler (a delayed call), which operates
through the database and has a delay between iterations.
This patch optimizes this by reusing the existing decorator
@action_queue.process to make the RPC calls that convey
subworkflow results outside of a DB transaction, in the same
way as we schedule action runs after completion of a task.
The main reason for making this change is how the Scheduler now
works in HA mode. It doesn't scale well because every Scheduler
instance keeps querying the DB for delayed calls eligible for
processing, so in an HA setup many Schedulers often take the
same delayed calls and clash with each other, causing DB
deadlocks in MySQL. They are caused by the MySQL locking model
(it's documented in their docs), so we have means to handle
them. However, the Scheduler still remains a bottleneck in the
system and it's better to reduce the load on it as much as
possible.
One more reason to make this change is that using the Scheduler
doesn't eliminate the possibility of losing RPC messages (when
a DB TX is committed and the RPC call has not been made yet)
anyway. If we use the Scheduler for scheduling RPC calls we
just shift the place where the DB and MQ can get out of sync
to the Scheduler. In other words, it is a fundamental problem
of syncing two external data sources that can't be naturally
enrolled into one distributed transaction.
Based on our experience of running big workflows, we concluded
that simplifying network protocols gives better results,
meaning that the fewer components we use for network
communications the better. Eventually it increases performance,
reduces the load on the system and also reduces the
probability of having the DB and MQ out of sync.
We used to use the Scheduler for running actions on executors
too, by scheduling RPC calls, but at some point we saw that it
reduced performance by 40-50% without bringing any real
benefits at this expense. On the contrary, the Scheduler was an
even worse bottleneck because of this. So we decided to
eliminate the Scheduler from this chain, and in practice the
system became much more performant and reliable. So now I did
the same with delivering a subworkflow result.
I believe when it comes to recovering from situations of
DB and MQ being out of sync we need to come up with special
tools that will assume some minimal human intervention
(although I think we can recover some things automatically).
Such a tool should just make it very obvious what's broken
and how to fix it, and make it convenient to fix it (restart
a task/action etc.).
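For illustration only, the idea of making RPC calls outside of a
DB transaction can be sketched as below. This is not the actual
action_queue code: the names process, schedule and
FakeTransaction are hypothetical, and the "RPC call" is just a
list append. Calls queued while the handler runs are flushed only
after the handler (and its transaction) has finished normally.

```python
import functools
import threading

# Hypothetical thread-local queue of calls deferred by the current handler.
_local = threading.local()


def schedule(call):
    """Remember a callable to run once the enclosing handler has finished."""
    if not hasattr(_local, "calls"):
        _local.calls = []
    _local.calls.append(call)


def process(func):
    """Run func, then flush the calls it queued (hypothetical decorator)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        _local.calls = []
        try:
            result = func(*args, **kwargs)
            pending = list(_local.calls)
        finally:
            # Drop the queue whether func succeeded or raised.
            _local.calls = []
        # Reached only if func returned normally, i.e. after its commit.
        for call in pending:
            call()
        return result
    return wrapper


events = []


class FakeTransaction:
    """Stand-in for a DB transaction; it only records begin/commit/rollback."""

    def __enter__(self):
        events.append("tx begin")
        return self

    def __exit__(self, exc_type, exc, tb):
        events.append("tx rollback" if exc_type else "tx commit")
        return False


@process
def on_subworkflow_complete(result):
    with FakeTransaction():
        events.append("db update")
        # Queued inside the transaction, executed after it commits.
        schedule(lambda: events.append("rpc send %s" % result))


on_subworkflow_complete("done")
```

Note that the window between "tx commit" and the queued call is
exactly where the DB and MQ can still get out of sync.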
* Processing the action queue now happens in a new greenthread
because otherwise the Mistral engine can get into a deadlock
by sending a request to itself while processing another one.
This can happen when we use blocking RPC, which is the only
option for now.
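A toy model of this deadlock and of the workaround, assuming a
single-worker engine and plain OS threads instead of
greenthreads. SingleWorkerEngine and all of its methods are
hypothetical and only mimic a blocking RPC call.

```python
import queue
import threading


class SingleWorkerEngine:
    """Toy engine that handles one request at a time (hypothetical)."""

    def __init__(self):
        self._requests = queue.Queue()
        self.log = []
        self.followups = []
        worker = threading.Thread(target=self._serve, daemon=True)
        worker.start()

    def _serve(self):
        while True:
            handler, args, done = self._requests.get()
            handler(*args)
            done.set()

    def call(self, handler, *args, timeout=5):
        """Blocking RPC stand-in: enqueue a request, wait until it is handled."""
        done = threading.Event()
        self._requests.put((handler, args, done))
        return done.wait(timeout)

    def on_subworkflow_result(self, result):
        self.log.append("parent got %s" % result)

    def on_action_complete(self, result):
        self.log.append("subworkflow done")
        # Calling self.call(...) directly here would block the only worker
        # while it waits for a request that only it can serve. Handing the
        # call off to another thread lets this handler return first.
        t = threading.Thread(
            target=self.call, args=(self.on_subworkflow_result, result)
        )
        t.start()
        self.followups.append(t)


engine = SingleWorkerEngine()
ok = engine.call(engine.on_action_complete, "42")
for t in engine.followups:
    t.join()
```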
* Other small fixes
Change-Id: Ic3cf6c47bba215dc6a13944b0585cce59e4e88f9