16 Commits

Author SHA1 Message Date
Antoine Musso 0c47b4292d Retire the project on OpenDev
It has been migrated to the Jenkins community:
https://github.com/jenkinsci/gearman-plugin/

Depends-On: Ib6010d7ce85a934501c50a53e9ac78dcf74bc403
Change-Id: I0c84db2ad3fbb4d9f0eff793a0159c6ed3a8e25c
2021-05-27 17:23:43 +02:00
Jenkins c5637cef49 Merge "Register the diff of functions" 2016-03-08 21:49:21 +00:00
Clark Boylan 73359c0456 Register the diff of functions
Instead of registering all functions for every node each time the
functions change register the delta of the functions each time. This
should cut down on the amount of CAN_DO updates we were doing in the
past.

Note that we handle the loss of all functions with RESET_ABILITIES
rather than sending a CANT_DO for each function that is no longer
available. Also, starting a new connection will always begin with
RESET_ABILITIES to clear any potentially stale state from the gearman
server.
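
The delta registration described above can be sketched roughly as
follows. This is an illustration, not the plugin's code: the function
name plan_registration and the tuple representation of packets are
hypothetical, and sending CANT_DO for a partial removal is an assumption
(the message only states that losing all functions is handled with
RESET_ABILITIES).

```python
def plan_registration(previous, current, new_connection=False):
    """Return the (packet, function) pairs a worker would send.

    previous: functions registered before the change
    current:  functions that should be registered now
    """
    previous, current = set(previous), set(current)

    if new_connection:
        # A new connection always starts with RESET_ABILITIES to clear
        # any stale state on the gearman server, then registers everything.
        return [("RESET_ABILITIES", None)] + [("CAN_DO", f) for f in sorted(current)]

    if previous and not current:
        # All functions were lost: one RESET_ABILITIES instead of a
        # CANT_DO per function.
        return [("RESET_ABILITIES", None)]

    # Assumption: partial removals are sent as individual CANT_DO packets.
    removed = sorted(previous - current)
    added = sorted(current - previous)
    return [("CANT_DO", f) for f in removed] + [("CAN_DO", f) for f in added]
```

With this shape, an unchanged function set produces no packets at all,
which is where the reduction in CAN_DO traffic would come from.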

Change-Id: I2b16117fce30ddb3e11b338043204cf726c7f1d4
2016-03-08 10:32:26 -08:00
Khai Do 08e9c429de Fix deadlock from a WORK_FAIL event
A lock is kept on a gearman worker when a WORK_FAIL event happens.
This causes the worker thread to stall the next time the
AvailabilityMonitor attempts to get a lock on the worker.
This causes the jenkins nodes to stop working (not run builds
anymore). Unlock the worker on a WORK_FAIL event to avoid this
deadlock state.

This fixes issue https://issues.jenkins-ci.org/browse/JENKINS-28891

Change-Id: I015ce9732fd535676a832680f39e220b09df95cf
2015-07-01 10:17:20 +00:00
Clark Boylan 65a08e0e95 Fix race between adding job and registering
Gearman plugin had a race between adding jobs to the functionList and
registering jobs. When registering jobs the functionMap is cleared, when
adding a job the plugin checks if the job is in the functionMap before
running it. If we happen to trigger registration of jobs when we get a
response from gearman with a job assignment then the functionMap can be
empty making us send a work fail instead of running the job.

To make things worse, this jenkins worker would not send a subsequent
GRAB_JOB and would livelock, never doing any useful work.

Correct this by making the processing for gearman events synchronous in
the work loop. This ensures that we never try to clear the function map
and check against it at the same time via different threads. To make
this happen the handleSessionEvent() method puts all events on a thread
safe queue for synchronous processing. This has allowed us to simplify
the work() loop and basically do the following:

  while running:
    init()
    register()
    process one event
    run function if processed
    drive IO

This is much easier to reason about as we essentially only have
bookkeeping and the code for one thing at a time.
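
A minimal sketch of that loop shape, assuming a single worker thread and
a thread-safe queue fed by the session callback. The class, method, and
event names here are hypothetical stand-ins for the Java code, and
init() and the IO driving are left out.

```python
import queue


class Worker:
    def __init__(self, functions):
        self.events = queue.Queue()      # thread-safe hand-off from IO callbacks
        self.function_map = {}           # only touched by the work loop
        self.pending = dict(functions)   # functions waiting to be registered

    def handle_session_event(self, event):
        # May be called from another thread; it only enqueues the event.
        self.events.put(event)

    def set_functions(self, functions):
        # Request a registration; applied at the top of the next step.
        self.pending = dict(functions)

    def register(self):
        if self.pending is not None:
            self.function_map = self.pending
            self.pending = None

    def step(self):
        """One pass of the loop: register, then process at most one event."""
        self.register()
        try:
            kind, name = self.events.get_nowait()
        except queue.Empty:
            return None
        if kind == "JOB_ASSIGN":
            function = self.function_map.get(name)
            if function is None:
                return ("WORK_FAIL", name)
            return ("WORK_COMPLETE", function())
        return None
```

Because registration and the lookup both happen inside step(), the map
cannot be cleared by one thread while another thread checks it.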

Change-Id: Id537710f6c8276a528ad78afd72c5a7c8e8a16ac
2015-05-05 14:39:21 -07:00
James E. Blair e45ffe249d Add OFFLINE_NODE_WHEN_COMPLETE option
If a build job is requested with the "OFFLINE_NODE_WHEN_COMPLETE"
parameter set to a true value, then mark the node as temporarily
offline when the build is complete (regardless of the outcome).

This facilitates single-use slaves (or slaves that need cleanup
after their jobs).  "Temporarily offline" was chosen as the
most lightweight method of preventing new builds to facilitate
either performing an external cleanup action (which would then
online the existing node), or external deletion of the node.

To accomplish this, the NodeAvailabilityMonitor unlock call is
moved from the StartJobWorker gearman function out into the gearman
worker so that the lock is held during the entire run of the job
and further past the point where the StartJobWorker will set
the node offline.

Also, supply the name of the gearman worker (which includes the
node name) with the build data to the client.  This way the client
will know which worker performed the job, and whose node may need
to be manipulated if the offline flag is set.

Change-Id: I5cda75eb44b26ec58e5f03d0aa980af09ee023f6
2013-08-06 11:09:14 -07:00
James E. Blair a9adc4b7d9 Make logging more consistent
Change-Id: Ifedb1ddbd7663900438fd89f2eabb3c1f4c2a5aa
2013-07-11 12:39:24 -07:00
James E. Blair 2ba8b61b3a More deadlock fixes.
It's possible that some InterruptedExceptions were being hidden by
the GrabJobEventHandler, which must catch those and not throw them.
So wherever we call driveSessionIO, check to make sure that we are
still supposed to be running afterwards.

Make sure all other places where InterruptedException is caught
do something reasonable with it.

Remove printStackTrace calls and replace with logging calls.

Change-Id: I0790eece8582c1ee2cd28e8866bfd4a9d5d700cd
2013-06-14 21:22:27 -07:00
James E. Blair 352664ee95 Send final packet immediately.
Rather than waiting for it to be sent on the next pass through the
loop.

Change-Id: If77bf1e0be6946d6a0b5db33cec5d1caf98cd058
2013-06-14 17:28:08 -07:00
James E. Blair 4556818799 Report exceptions while running the job to the client.
Don't catch any exceptions while running the job; instead, report
them back to the client (via a catch-all exception handler in
StartJobWorker).

If the worker raises an exception, unlock the node monitor, in case
the worker didn't get to the point where it would be unlocked.

This change has the side effect that if the gearman server disconnects
while the job is running, the worker should return from watching the
job run (as soon as it notices, currently up to 5 seconds).  This is
helpful in that it will be available to register with gearman again,
including sending CAN_DO packets.  But the node monitor will still
prevent it from scheduling a new job while the one it started earlier
is still running.

Change-Id: Ie01ef0f9e706d81452b189099e36242ab9967950
2013-06-14 15:23:07 -07:00
James E. Blair 6041401766 Handle mutex scheduling from Gearman or Jenkins.
Every node (slave or master) gets an AvailabilityMonitor that
handles mutually exclusive access to scheduling builds on that
node.  If Jenkins wants to run a build on the node, it will only
be able to do so if we are not waiting for a response to a
GRAB_JOB packet from Gearman.  Likewise, immediately before
sending a GRAB_JOB, we lock the monitor and only unlock it if
we either get a NO_JOB response, or after the job we were just
assigned starts building.

(As an exception to the above rule, Jenkins applies the same
scheduling veto logic to the build that we request via Gearman while
we still hold the lock. We therefore tell the monitor to expect a
request for that build from Jenkins, and we permit Jenkins to build
it even if the lock is held.)
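
The locking rule and its exception might be sketched as below. This is
an assumption-laden illustration: the method names try_lock, expect,
and can_take are hypothetical, and the lock here is non-blocking for
brevity, whereas a real worker may wait for the lock.

```python
import threading


class AvailabilityMonitor:
    def __init__(self):
        self._mutex = threading.Lock()   # protects the fields below
        self._holder = None              # worker waiting on GRAB_JOB, if any
        self._expected = None            # build Jenkins may start despite the lock

    def try_lock(self, worker):
        # Taken immediately before sending GRAB_JOB.
        with self._mutex:
            if self._holder is not None and self._holder != worker:
                return False
            self._holder = worker
            return True

    def unlock(self, worker):
        # Called on NO_JOB, or after the assigned job starts building.
        with self._mutex:
            if self._holder == worker:
                self._holder = None
                self._expected = None

    def expect(self, worker, build_id):
        # The lock holder announces the build it requested via Gearman.
        with self._mutex:
            if self._holder == worker:
                self._expected = build_id

    def can_take(self, build_id):
        # Scheduling veto: Jenkins may build if nobody holds the lock,
        # or if this is exactly the build the lock holder announced.
        with self._mutex:
            return self._holder is None or build_id == self._expected
```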

Change-Id: Iae03932aef4b503c69699b99d38a6fc2691fb02e
2013-06-13 12:42:51 -07:00
James E. Blair 76cb343b8c Don't grab jobs when shutting down.
This rearranges a bit of the previous change to move the WaitBool
functionality into an AvailabilityChecker class.

It implements the new feature of not grabbing jobs while in the
quiet mode for shutdown by using a busy wait.  There doesn't seem to
be an event framework for that, and well, it doesn't happen very
often, so a slow busy wait probably isn't terrible.

This only applies to Executor workers, not Management workers
(so jobs can still be stopped and descriptions set).

Removed the default-name constructor of AbstractWorkerThread because
it is not used anywhere now (removed its test as well).

Change-Id: I6d5e1cd3cb47c8876ceb909d205cb66445388992
2013-06-12 07:59:46 -07:00
James E. Blair 858fb155fe Don't grab a job when executors are busy.
Jobs can be triggered by non-gearman sources, sadly.  This makes the gearman
plugin aware of when executors are busy and it will refrain from grabbing a
job from gearman in those circumstances.

It's far from perfect, but should at least handle the most likely cases where
a job is already running on an executor.

Change-Id: If993c6d6bc63ed89b385d2e5bb41762ef84a429f
2013-06-11 16:04:09 -07:00
James E. Blair 61e47a1e8b Handle not connecting at start.
We previously assumed that we would connect to gearman immediately and be able
to send a SET_CLIENT_ID packet; that would raise an exception and stop any
further gearman processing.  Instead, don't assume we can send that packet,
and also, do send it immediately after reconnecting.

Also, lower the retry delay from 3 seconds to 2.  I don't think we'll mind
Jenkins being in a tighter loop because it probably won't be doing anything else.

Change-Id: Id585726e23076ad22e2935173995db2bdadf9974
2013-06-06 15:40:05 -07:00
James E. Blair 0b017fe95d Rework when GRAB_JOB is sent.
* Send GRAB_JOB after NOOP.
* Send GRAB_JOB after initial setup to start the event-state system.
* Don't send GRAB_JOB after changing the functions.  The gearman server will
  maintain the existing state of the connection if functions change, so if
  we are sleeping, we will still get appropriate NOOP messages even after
  changing functions.
* Don't send GRAB_JOB when the task queue is empty.  The event-state system
  should send it when necessary.
* Send GRAB_JOB after job completion (to restart the event-state system).
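
The rules in this list can be condensed into a small decision function.
This is only a sketch: apart from NOOP, which is the Gearman packet
name, the event names are hypothetical labels for the situations listed
above.

```python
# Situations after which the worker sends GRAB_JOB.
GRAB_JOB_TRIGGERS = {"NOOP", "INITIAL_SETUP", "JOB_COMPLETE"}

# Situations that no longer trigger GRAB_JOB.
GRAB_JOB_NON_TRIGGERS = {"FUNCTIONS_CHANGED", "TASK_QUEUE_EMPTY"}


def should_send_grab_job(event):
    """Return True if the worker should send GRAB_JOB after this event."""
    return event in GRAB_JOB_TRIGGERS
```

Keeping the triggers in one place makes it easier to see that each
cycle sends at most one GRAB_JOB.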

Fixes 1183454.

Also fixes cases where we would end up grabbing multiple jobs in one cycle.

Change-Id: I36a890711ecdfef62a6554bac820acf0ca8b5f5b
2013-05-24 16:55:38 -07:00
James E. Blair 20844c7e46 Add local GearmanWorker.
Add a package local implementation of something like the GearmanWorker from
java-gearman (based on GearmanWorkerImpl).

It is much simpler than the existing GearmanWorkerImpl and is more suited to
the way we need to use it in the Jenkins plugin.  It assumes jobs are always
changed in batches, and only changes jobs at the top of the event loop (not
when a job is running).

The worker threads are updated to only request job changes when there is an
actual difference.

WORK_STATUS events are sent every 10 seconds while a job is running.

run-fast is updated to only remove the gearman plugin from the work directory,
preserving any other plugins that may be installed.

This isn't very elegant, but is a start and broadly demonstrates what we need
the plugin to do.

Change-Id: I26df504534ec50f03c9e0ef772a709046cf88a23
2013-04-22 10:29:03 -07:00