It has been migrated to the Jenkins community:
https://github.com/jenkinsci/gearman-plugin/
Depends-On: Ib6010d7ce85a934501c50a53e9ac78dcf74bc403
Change-Id: I0c84db2ad3fbb4d9f0eff793a0159c6ed3a8e25c
Instead of registering all functions for every node each time the
functions change, register only the delta each time. This should
cut down on the number of CAN_DO updates we were sending in the
past.
Note that we handle the loss of all functions with RESET_ABILITIES
rather than sending a CANT_DO for each function that is no longer
available. Also, starting a new connection will always begin with
RESET_ABILITIES to clear any potentially stale state from the gearman
server.
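Here is a minimal sketch of the delta computation, assuming the registered functions are kept as a set of names. `FunctionDelta` and its methods are hypothetical and are not the plugin's API. Packets are shown as plain strings only to make the decisions visible. The sketch assumes a partial loss still sends one CANT_DO per removed function, while a total loss sends a single RESET_ABILITIES.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: compute the packets needed to move a worker's
// registered functions from oldFns to newFns. TreeSet keeps the output
// order deterministic.
final class FunctionDelta {
    static List<String> packets(Set<String> oldFns, Set<String> newFns) {
        List<String> out = new ArrayList<>();
        if (newFns.isEmpty()) {
            // Losing every function: one RESET_ABILITIES instead of N CANT_DOs.
            if (!oldFns.isEmpty()) {
                out.add("RESET_ABILITIES");
            }
            return out;
        }
        Set<String> removed = new TreeSet<>(oldFns);
        removed.removeAll(newFns);
        Set<String> added = new TreeSet<>(newFns);
        added.removeAll(oldFns);
        for (String fn : removed) {
            out.add("CANT_DO " + fn);
        }
        for (String fn : added) {
            out.add("CAN_DO " + fn);
        }
        return out;
    }

    // A new connection always starts by clearing any stale server-side state.
    static List<String> onNewConnection(Set<String> fns) {
        List<String> out = new ArrayList<>();
        out.add("RESET_ABILITIES");
        for (String fn : new TreeSet<>(fns)) {
            out.add("CAN_DO " + fn);
        }
        return out;
    }
}
```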
Change-Id: I2b16117fce30ddb3e11b338043204cf726c7f1d4
A lock is kept on a gearman worker when a WORK_FAIL event happens.
This causes the worker thread to stall the next time the
AvailabilityMonitor attempts to get a lock on the worker.
As a result, the Jenkins nodes stop working (they no longer run
builds). Unlock the worker on a WORK_FAIL event to avoid this
deadlock state.
This fixes issue https://issues.jenkins-ci.org/browse/JENKINS-28891
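The following is a minimal sketch of the shape of this fix. A plain `ReentrantLock` stands in for the worker lock, and `WorkFailDemo` is a hypothetical name. Releasing the lock in a `finally` block means the WORK_FAIL path can no longer leave it held.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: the lock taken while handling a job is released on
// every exit path, including the one that answers with WORK_FAIL.
final class WorkFailDemo {
    private final ReentrantLock workerLock = new ReentrantLock();

    // Returns the packet name that would be sent back for this job.
    String handleJob(boolean functionKnown) {
        workerLock.lock();
        try {
            if (!functionKnown) {
                return "WORK_FAIL";
            }
            return "WORK_COMPLETE";
        } finally {
            // Without this, a WORK_FAIL would leave the lock held and the
            // next lock attempt from another thread would block forever.
            workerLock.unlock();
        }
    }

    boolean isLocked() {
        return workerLock.isLocked();
    }
}
```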
Change-Id: I015ce9732fd535676a832680f39e220b09df95cf
Gearman plugin had a race between adding jobs to the functionList and
registering jobs. When registering jobs, the functionMap is cleared; when
adding a job, the plugin checks whether the job is in the functionMap before
running it. If we happen to trigger registration of jobs when we get a
response from gearman with a job assignment, then the functionMap can be
empty, making us send a WORK_FAIL instead of running the job.
To make things worse, this Jenkins worker would not send a subsequent
GRAB_JOB and would livelock, never doing any useful work.
Correct this by making the processing for gearman events synchronous in
the work loop. This ensures that we never try to clear the function map
and check against it at the same time via different threads. To make
this happen the handleSessionEvent() method puts all events on a thread
safe queue for synchronous processing. This has allowed us to simplify
the work() loop and basically do the following:
    while running:
        init()
        register()
        process one event
        run function if processed
        drive IO
This is much easier to reason about, as we only have to handle the
bookkeeping and code for one thing at a time.
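Here is a minimal sketch of the queue-based loop described above. It assumes a `ConcurrentLinkedQueue` as the thread-safe queue, and `EventLoopSketch` and its members are hypothetical names. The session callback only enqueues. The work-loop thread is the only code that clears or reads the function map, so registration and the lookup for an assigned job can no longer interleave.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: session callbacks only enqueue events; the work loop
// thread is the only one touching functionMap.
final class EventLoopSketch {
    private final ConcurrentLinkedQueue<String> events = new ConcurrentLinkedQueue<>();
    private final Map<String, Runnable> functionMap = new HashMap<>();
    final List<String> log = new ArrayList<>();

    // Called from the I/O side: never touches functionMap directly.
    void handleSessionEvent(String assignedJob) {
        events.add(assignedJob);
    }

    // Called only from the work loop thread.
    void register(Map<String, Runnable> functions) {
        functionMap.clear();
        functionMap.putAll(functions);
    }

    // One iteration: process at most one event, run its function if known.
    void workOnce() {
        String job = events.poll();
        if (job == null) {
            return;
        }
        Runnable fn = functionMap.get(job);
        if (fn != null) {
            fn.run();
            log.add("WORK_COMPLETE " + job);
        } else {
            log.add("WORK_FAIL " + job);
        }
    }
}
```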
Change-Id: Id537710f6c8276a528ad78afd72c5a7c8e8a16ac
If a build job is requested with the "OFFLINE_NODE_WHEN_COMPLETE"
parameter set to a true value, then mark the node as temporarily
offline when the build is complete (regardless of the outcome).
This facilitates single-use slaves (or slaves that need cleanup
after their jobs). "Temporarily offline" was chosen as the
most lightweight method of preventing new builds, to allow
either an external cleanup action (which would then bring the
existing node back online) or external deletion of the node.
To accomplish this, the NodeAvailabilityMonitor unlock call is
moved from the StartJobWorker gearman function out into the gearman
worker so that the lock is held for the entire run of the job
and past the point at which the StartJobWorker sets
the node offline.
Also, supply the name of the gearman worker (which includes the
node name) with the build data to the client. This way the client
will know which worker performed the job, and whose node may need
to be manipulated if the offline flag is set.
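The following is a minimal sketch of the ordering this change relies on. It makes several assumptions:

* "A true value" is taken to mean a string accepted by `Boolean.parseBoolean`.
* A boolean field stands in for the real offline call, which in Jenkins would be something like `Computer#setTemporarilyOffline`.
* `OfflineWhenComplete` is a hypothetical class.

The point is that the lock is released only after the offline flag is set.

```java
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: run a build while holding the node's availability
// lock and, if requested, mark the node offline before releasing it.
final class OfflineWhenComplete {
    private final ReentrantLock monitor = new ReentrantLock();
    boolean temporarilyOffline = false; // stand-in for the real node state

    // Assumption: any value Boolean.parseBoolean accepts ("true", any case).
    static boolean wantsOffline(Map<String, String> params) {
        return Boolean.parseBoolean(params.get("OFFLINE_NODE_WHEN_COMPLETE"));
    }

    void runJob(Map<String, String> params, Runnable build) {
        monitor.lock();
        try {
            build.run();
        } finally {
            // Regardless of the build outcome, and while the lock is still
            // held, so no new build can be scheduled before the node is offline.
            if (wantsOffline(params)) {
                temporarilyOffline = true;
            }
            monitor.unlock();
        }
    }
}
```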
Change-Id: I5cda75eb44b26ec58e5f03d0aa980af09ee023f6
It's possible that some InterruptedExceptions were being hidden by
the GrabJobEventHandler, which must catch those and not throw them.
So wherever we call driveSessionIO, check to make sure that we are
still supposed to be running afterwards.
Make sure all other places where InterruptedException is caught
do something reasonable with it.
Remove printStackTrace calls and replace with logging calls.
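Here is a minimal sketch of the two rules above. `InterruptAwareLoop` is hypothetical, and a `Thread.sleep` stands in for the real `driveSessionIO`. A caught `InterruptedException` restores the thread's interrupt flag instead of being swallowed. The loop also re-checks whether it should still run after every I/O call.

```java
// Hypothetical sketch: restore the interrupt flag where InterruptedException
// is caught, and re-check the running state after each I/O call.
final class InterruptAwareLoop {
    private volatile boolean running = true;
    int iterations = 0;

    void stop() {
        running = false;
    }

    // Stand-in for driveSessionIO(); may block and be interrupted.
    void driveSessionIO() {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            // Don't hide it: restore the flag so the loop can see it.
            Thread.currentThread().interrupt();
        }
    }

    void work() {
        while (running && !Thread.currentThread().isInterrupted()) {
            iterations++;
            driveSessionIO();
            // We may have been stopped or interrupted while doing I/O.
            if (!running || Thread.currentThread().isInterrupted()) {
                break;
            }
        }
    }
}
```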
Change-Id: I0790eece8582c1ee2cd28e8866bfd4a9d5d700cd
Don't catch any exceptions while running the job; instead, report
them back to the client (via a catch-all exception handler in
StartJobWorker).
If the worker raises an exception, unlock the node monitor, in case
the worker didn't get to the point where it would be unlocked.
This change has the side effect that if the gearman server disconnects
while the job is running, the worker should return from watching the
job run (as soon as it notices, which currently takes up to 5 seconds).
This is helpful because the worker will then be available to register
with gearman again, including sending CAN_DO packets. But the node monitor will still
prevent it from scheduling a new job while the one it started earlier
is still running.
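The following is a minimal sketch of the catch-all pattern. `MonitoredJob` and `CatchAllJob` are hypothetical, and so are the `ok:`/`error:` result strings. The job body has no exception handling of its own and calls the supplied unlock callback once its build has started. A single handler turns any exception into a result for the client. That handler also releases the monitor if the job failed before reaching its own unlock.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical job body; it calls unlockMonitor once the build has started.
interface MonitoredJob {
    String run(Runnable unlockMonitor) throws Exception;
}

final class CatchAllJob {
    final ReentrantLock nodeMonitor = new ReentrantLock();

    // One catch-all here reports any exception back as the job's result.
    String execute(MonitoredJob job) {
        nodeMonitor.lock();
        try {
            return "ok: " + job.run(nodeMonitor::unlock);
        } catch (Exception e) {
            // The job may have failed before reaching its own unlock call.
            if (nodeMonitor.isHeldByCurrentThread()) {
                nodeMonitor.unlock();
            }
            return "error: " + e.getMessage();
        }
    }
}
```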
Change-Id: Ie01ef0f9e706d81452b189099e36242ab9967950
Every node (slave or master) gets an AvailabilityMonitor that
handles mutually exclusive access to scheduling builds on that
node. If Jenkins wants to run a build on the node, it will only
be able to do so if we are not waiting for a response to a
GRAB_JOB packet from Gearman. Likewise, immediately before
sending a GRAB_JOB, we lock the monitor and only unlock it if
we either get a NO_JOB response, or after the job we were just
assigned starts building.
(As an exception to the above rule: since Jenkins will apply the
same scheduling veto logic to the build that we request via Gearman
while we still hold the lock, we tell the monitor to expect a request
for that build from Jenkins and we permit Jenkins to build it even
if the lock is held.)
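Here is a minimal sketch of a per-node monitor with the rules described above, built on `synchronized`/`wait`/`notifyAll`. The class and method names are hypothetical and are not the plugin's AvailabilityMonitor API. The Gearman worker holds the monitor while a GRAB_JOB is outstanding. Jenkins' scheduling check passes when the monitor is free, or when the build is the one the worker said to expect.

```java
// Hypothetical sketch of a per-node availability monitor.
final class AvailabilityMonitorSketch {
    private Object owner = null;         // worker holding the monitor, or null
    private String expectedBuild = null; // build we asked Jenkins to run

    // Taken immediately before sending GRAB_JOB; blocks while held elsewhere.
    synchronized void lock(Object worker) throws InterruptedException {
        while (owner != null) {
            wait();
        }
        owner = worker;
    }

    // Released on NO_JOB, or once the assigned job has started building.
    synchronized void unlock(Object worker) {
        if (owner == worker) {
            owner = null;
            expectedBuild = null;
            notifyAll();
        }
    }

    // Called while still holding the lock, just before requesting the build.
    synchronized void expectBuild(String buildId) {
        expectedBuild = buildId;
    }

    // Jenkins-side veto: allowed if free, or if this is the expected build.
    synchronized boolean canTake(String buildId) {
        return owner == null || buildId.equals(expectedBuild);
    }
}
```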
Change-Id: Iae03932aef4b503c69699b99d38a6fc2691fb02e
This rearranges a bit of the previous change to move the WaitBool
functionality into an AvailabilityChecker class.
It implements the new feature of not grabbing jobs while Jenkins is in
quiet mode (preparing for shutdown) by using a busy wait. There doesn't seem to
be an event framework for that, and well, it doesn't happen very
often, so a slow busy wait probably isn't terrible.
This only applies to Executor workers, not Management workers
(so jobs can still be stopped and descriptions set).
Removed the default-name constructor of AbstractWorkerThread because
it is not used anywhere now (removed its test as well).
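The following is a minimal sketch of the slow busy wait. The quiet-mode check is passed in as a `BooleanSupplier` so the sketch stays self-contained. In a real plugin it would wrap a check such as `Jenkins#isQuietingDown()`. The poll interval is an assumption, and `QuietModeGate` is hypothetical. As described above, only executor workers would call this gate, so management workers are unaffected.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: before an executor worker grabs a job, poll until
// Jenkins is no longer in quiet mode. Returns the number of polls made.
final class QuietModeGate {
    static int waitUntilNotQuiet(BooleanSupplier isQuietingDown, long pollMillis) {
        int polls = 0;
        while (isQuietingDown.getAsBoolean()) {
            polls++;
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the flag and give up waiting.
                Thread.currentThread().interrupt();
                break;
            }
        }
        return polls;
    }
}
```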
Change-Id: I6d5e1cd3cb47c8876ceb909d205cb66445388992
Jobs can be triggered by non-gearman sources, sadly. This makes the gearman
plugin aware of when executors are busy and it will refrain from grabbing a
job from gearman in those circumstances.
It's far from perfect, but should at least handle the most likely cases where
a job is already running on an executor.
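Here is a minimal sketch of the check, under the simplest assumed policy: grab a Gearman job only when every executor on the node is idle. `ExecutorState` stands in for Jenkins' executor API, where `Executor#isIdle()` plays this role. `BusyCheck` is hypothetical, and the real plugin's policy may be finer-grained.

```java
import java.util.List;

// Hypothetical sketch: refrain from grabbing a Gearman job while any
// executor is busy, since non-Gearman triggers also occupy executors.
final class BusyCheck {
    interface ExecutorState {
        boolean isIdle();
    }

    static boolean mayGrabJob(List<? extends ExecutorState> executors) {
        for (ExecutorState e : executors) {
            if (!e.isIdle()) {
                return false;
            }
        }
        return true;
    }
}
```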
Change-Id: If993c6d6bc63ed89b385d2e5bb41762ef84a429f
We previously assumed that we would connect to gearman immediately and be able
to send a SET_CLIENT_ID packet; if we could not, an exception was raised that
stopped any further gearman processing. Instead, don't assume we can send that
packet, and do send it immediately after reconnecting.
Also, lower the retry delay from 3 seconds to 2. I don't think we'll mind
Jenkins being in a tighter loop, because it probably won't be doing anything else.
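The following is a minimal sketch of the connect loop. The connection attempt is abstracted as a `BooleanSupplier`, and `ReconnectSketch` and the client ID are hypothetical. The loop never assumes the first attempt succeeds. It retries after a fixed delay (2 seconds by default, matching this change) and sends SET_CLIENT_ID only after a successful connection, so each reconnect sends it again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: retry until connected, then send SET_CLIENT_ID.
final class ReconnectSketch {
    static final long RETRY_DELAY_MS = 2000; // lowered from 3 s to 2 s

    final List<String> sent = new ArrayList<>();

    void connect(BooleanSupplier tryConnect, String clientId) {
        connect(tryConnect, clientId, RETRY_DELAY_MS);
    }

    void connect(BooleanSupplier tryConnect, String clientId, long delayMs) {
        while (!tryConnect.getAsBoolean()) {
            try {
                Thread.sleep(delayMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return; // give up without sending anything
            }
        }
        // Only now is it safe to send; done after every (re)connection.
        sent.add("SET_CLIENT_ID " + clientId);
    }
}
```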
Change-Id: Id585726e23076ad22e2935173995db2bdadf9974
* Send GRAB_JOB after NOOP.
* Send GRAB_JOB after initial setup to start the event-state system.
* Don't send GRAB_JOB after changing the functions. The gearman server will
maintain the existing state of the connection if functions change, so if
we are sleeping, we will still get appropriate NOOP messages even after
changing functions.
* Don't send GRAB_JOB when the task queue is empty. The event-state system
should send it when necessary.
* Send GRAB_JOB after job completion (to restart the event-state system).
Fixes 1183454.
Also fixes cases where we would end up grabbing multiple jobs in one cycle.
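Here is a minimal sketch encoding the rules listed above as a decision function. The `Event` names are hypothetical labels for the situations in the list, not the plugin's types.

```java
// Hypothetical sketch: whether the worker sends GRAB_JOB after each event.
final class GrabJobPolicy {
    enum Event { INITIAL_SETUP, NOOP, FUNCTIONS_CHANGED, TASK_QUEUE_EMPTY, JOB_COMPLETE }

    static boolean sendGrabJob(Event e) {
        switch (e) {
            case INITIAL_SETUP: // start the event-state system
            case NOOP:          // the server says work may be available
            case JOB_COMPLETE:  // restart the event-state system
                return true;
            default:
                // FUNCTIONS_CHANGED: the server keeps the connection state.
                // TASK_QUEUE_EMPTY: the event-state system sends it when needed.
                return false;
        }
    }
}
```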
Change-Id: I36a890711ecdfef62a6554bac820acf0ca8b5f5b
Add a package-local implementation of something like the GearmanWorker from
java-gearman (based on GearmanWorkerImpl).
It is much simpler than the existing GearmanWorkerImpl and is more suited to
the way we need to use it in the Jenkins plugin. It assumes jobs are always
changed in batches, and only changes jobs at the top of the event loop (not
when a job is running).
The worker threads are updated to only request job changes when there is an
actual difference.
WORK_STATUS events are sent every 10 seconds while a job is running.
run-fast is updated to only remove the gearman plugin from the work directory,
preserving any other plugins that may be installed.
This isn't very elegant, but is a start and broadly demonstrates what we need
the plugin to do.
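The following is a minimal sketch of two behaviours from this change:

* Function changes are requested only when the set actually differs.
* WORK_STATUS becomes due at most once every 10 seconds while a job runs.

Timestamps are passed in explicitly so the logic is deterministic. `WorkerBookkeeping` is hypothetical, and so is the choice to measure the first interval from job start.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the worker's change detection and status timing.
final class WorkerBookkeeping {
    static final long STATUS_INTERVAL_MS = 10_000;

    private Set<String> functions = new HashSet<>();
    private long lastStatusMs = 0;

    // Returns true only if a change request should actually be sent.
    boolean setFunctions(Set<String> newFunctions) {
        if (functions.equals(newFunctions)) {
            return false;
        }
        functions = new HashSet<>(newFunctions);
        return true;
    }

    void jobStarted(long nowMs) {
        lastStatusMs = nowMs;
    }

    // True when a WORK_STATUS should be sent at time nowMs.
    boolean statusDue(long nowMs) {
        if (nowMs - lastStatusMs >= STATUS_INTERVAL_MS) {
            lastStatusMs = nowMs;
            return true;
        }
        return false;
    }
}
```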
Change-Id: I26df504534ec50f03c9e0ef772a709046cf88a23