gearman-plugin

Commit Graph

Author	SHA1	Message	Date
Antoine Musso	0c47b4292d	Retire the project on OpenDev It has been migrated to the Jenkins community: https://github.com/jenkinsci/gearman-plugin/ Depends-On: Ib6010d7ce85a934501c50a53e9ac78dcf74bc403 Change-Id: I0c84db2ad3fbb4d9f0eff793a0159c6ed3a8e25c	2021-05-27 17:23:43 +02:00
Jenkins	c5637cef49	Merge "Register the diff of functions"	2016-03-08 21:49:21 +00:00
Clark Boylan	73359c0456	Register the diff of functions Instead of registering all functions for every node each time the functions change register the delta of the functions each time. This should cut down on the amount of CAN_DO updates we were doing in the past. Note that we handle the loss of all functions with RESET_ABILITIES rather than sending a CANT_DO for each function that is no longer available. Also, starting a new connection will always begin with RESET_ABILITIES to clear any potentially stale state from the gearman server. Change-Id: I2b16117fce30ddb3e11b338043204cf726c7f1d4	2016-03-08 10:32:26 -08:00
Khai Do	08e9c429de	Fix deadlock from a WORK_FAIL event A lock is kept on a gearman worker when a WORK_FAIL event happens. This causes the worker thread to stall the next time the AvailabilityMonitor attempts to get a lock on the worker. This causes the jenkins nodes to stop working (not run builds anymore). Unlock the worker on a WORK_FAIL event to avoid this deadlock state. This fixes issue https://issues.jenkins-ci.org/browse/JENKINS-28891 Change-Id: I015ce9732fd535676a832680f39e220b09df95cf	2015-07-01 10:17:20 +00:00
Clark Boylan	65a08e0e95	Fix race between adding job and registering Gearman plugin had a race between adding jobs to the functionList and registering jobs. When registering jobs the functionMap is cleared, when adding a job the plugin checks if the job is in the functionMap before running it. If we happen to trigger registration of jobs when we get a response from gearman with a job assignment then the functionMap can be empty making us send a work fail instead of running the job. To make things worse this jenkins worker would not send a subsequent GET JOB and would livelock, never doing any useful work. Correct this by making the processing for gearman events synchronous in the work loop. This ensures that we never try to clear the function map and check against it at the same time via different threads. To make this happen the handleSessionEvent() method puts all events on a thread safe queue for synchronous processing. This has allowed us to simplify the work() loop and basically do the following: while running: init(); register(); process one event; run function if processed; drive IO. This is much easier to reason about as we essentially only have bookkeeping and the code for one thing at a time. Change-Id: Id537710f6c8276a528ad78afd72c5a7c8e8a16ac	2015-05-05 14:39:21 -07:00
James E. Blair	e45ffe249d	Add OFFLINE_NODE_WHEN_COMPLETE option If a build job is requested with the "OFFLINE_NODE_WHEN_COMPLETE" parameter set to a true value, then mark the node as temporarily offline when the build is complete (regardless of the outcome). This facilitates single-use slaves (or slaves that need cleanup after their jobs). "Temporarily offline" was chosen as the most lightweight method of preventing new builds to facilitate either performing an external cleanup action (which would then online the existing node), or external deletion of the node. To accomplish this, the NodeAvailabilityMonitor unlock call is moved from the StartJobWorker gearman function out into the gearman worker so that the lock is held during the entire run of the job and further past the point where the StartJobWorker will set the node offline. Also, supply the name of the gearman worker (which includes the node name) with the build data to the client. This way the client will know which worker performed the job, and whose node may need to be manipulated if the offline flag is set. Change-Id: I5cda75eb44b26ec58e5f03d0aa980af09ee023f6	2013-08-06 11:09:14 -07:00
James E. Blair	a9adc4b7d9	Make logging more consistent Change-Id: Ifedb1ddbd7663900438fd89f2eabb3c1f4c2a5aa	2013-07-11 12:39:24 -07:00
James E. Blair	2ba8b61b3a	More deadlock fixes. It's possible that some InterruptedExceptions were being hidden by the GrabJobEventHandler, which must catch those and not throw them. So wherever we call driveSessionIO, check to make sure that we are still supposed to be running afterwards. Make sure all other places where InterruptedException is caught do something reasonable with it. Remove printStackTrace calls and replace with logging calls. Change-Id: I0790eece8582c1ee2cd28e8866bfd4a9d5d700cd	2013-06-14 21:22:27 -07:00
James E. Blair	352664ee95	Send final packet immediately. Rather than waiting for it to be sent on the next pass through the loop. Change-Id: If77bf1e0be6946d6a0b5db33cec5d1caf98cd058	2013-06-14 17:28:08 -07:00
James E. Blair	4556818799	Report exceptions while running the job to the client. Don't catch any exceptions while running the job; instead, report them back to the client (via a catch-all exception handler in StartJobWorker). If the worker raises an exception, unlock the node monitor, in case the worker didn't get to the point where it would be unlocked. This change has the side effect that if the gearman server disconnects while the job is running, the worker should return from watching the job run (as soon as it notices, currently up to 5 seconds). This is helpful in that it will be available to register with gearman again, including sending CAN_DO packets. But the node monitor will still prevent it from scheduling a new job while the one it started earlier is still running. Change-Id: Ie01ef0f9e706d81452b189099e36242ab9967950	2013-06-14 15:23:07 -07:00
James E. Blair	6041401766	Handle mutex scheduling from Gearman or Jenkins. Every node (slave or master) gets an AvailabilityMonitor that handles mutually exclusive access to scheduling builds on that node. If Jenkins wants to run a build on the node, it will only be able to do so if we are not waiting for a response to a GRAB_JOB packet from Gearman. Likewise, immediately before sending a GRAB_JOB, we lock the monitor and only unlock it if we either get a NO_JOB response, or after the job we were just assigned starts building. (As an exception to the above rule, since Jenkins will apply the same scheduling veto logic to the build that we request via Gearman, (while we still hold the lock) we tell the monitor to expect a request for that build from Jenkins and we permit Jenkins to build it even if the lock is held.) Change-Id: Iae03932aef4b503c69699b99d38a6fc2691fb02e	2013-06-13 12:42:51 -07:00
James E. Blair	76cb343b8c	Don't grab jobs when shutting down. This rearranges a bit of the previous change to move the WaitBool functionality into an AvailabilityChecker class. It implements the new feature of not grabbing jobs while in the quiet mode for shutdown by using a busy wait. There doesn't seem to be an event framework for that, and well, it doesn't happen very often, so a slow busy wait probably isn't terrible. This only applies to Executor workers, not Management workers (so jobs can still be stopped and descriptions set). Removed the default-name constructor of AbstractWorkerThread because it is not used anywhere now (removed its test as well). Change-Id: I6d5e1cd3cb47c8876ceb909d205cb66445388992	2013-06-12 07:59:46 -07:00
James E. Blair	858fb155fe	Don't grab a job when executors are busy. Jobs can be triggered by non-gearman sources, sadly. This makes the gearman plugin aware of when executors are busy and it will refrain from grabbing a job from gearman in those circumstances. It's far from perfect, but should at least handle the most likely cases where a job is already running on an executor. Change-Id: If993c6d6bc63ed89b385d2e5bb41762ef84a429f	2013-06-11 16:04:09 -07:00
James E. Blair	61e47a1e8b	Handle not connecting at start. We previously assumed that we would connect to gearman immediately and be able to send a SET_CLIENT_ID packet; that would raise an exception and stop any further gearman processing. Instead, don't assume we can send that packet, and also, do send it immediately after reconnecting. Also, lower the retry delay from 3 seconds to 2. I don't think we'll mind Jenkins being in a tighter loop because it probably won't be doing anything else. Change-Id: Id585726e23076ad22e2935173995db2bdadf9974	2013-06-06 15:40:05 -07:00
James E. Blair	0b017fe95d	Rework when GRAB_JOB is sent. * Send GRAB_JOB after NO_OP. * Send GRAB_JOB after initial setup to start the event-state system. * Don't send GRAB_JOB after changing the functions. The gearman server will maintain the existing state of the connection if functions change, so if we are sleeping, we will still get appropriate NOOP messages even after changing functions. * Don't send GRAB_JOB when the task queue is empty. The event-state system should send it when necessary. * Send GRAB_JOB after job completion (to restart the event-state system). Fixes 1183454. Also fixes cases where we would end up grabbing multiple jobs in one cycle. Change-Id: I36a890711ecdfef62a6554bac820acf0ca8b5f5b	2013-05-24 16:55:38 -07:00
James E. Blair	20844c7e46	Add local GearmanWorker. Add a package local implementation of something like the GearmanWorker from java-gearman (based on GearmanWorkerImpl). It is much simpler than the existing GearmanWorkerImpl and is more suited to the way we need to use it in the Jenkins plugin. It assumes jobs are always changed in batches, and only changes jobs at the top of the event loop (not when a job is running). The worker threads are updated to only request job changes when there is an actual difference. WORK_STATUS events are sent every 10 seconds while a job is running. run-fast is updated to only remove the gearman plugin from the work directory, preserving any other plugins that may be installed. This isn't very elegant, but is a start and broadly demonstrates what we need the plugin to do. Change-Id: I26df504534ec50f03c9e0ef772a709046cf88a23	2013-04-22 10:29:03 -07:00
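
The simplified work() loop described in commit 65a08e0e95 can be sketched roughly as follows. This is an illustration only, not the plugin's code: the class name, the string-valued events, and the `trace` list are hypothetical, and "run function if processed" is read here as "run the job if the processed event was a job assignment". The point it shows is that handleSessionEvent() only enqueues, so the function map is only ever touched from the loop thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of a single-threaded work loop fed by a thread-safe queue.
public class WorkLoopSketch {
    // Events arrive from the IO thread; ConcurrentLinkedQueue is safe for that.
    private final ConcurrentLinkedQueue<String> events = new ConcurrentLinkedQueue<>();
    // Records what the loop did, purely so the ordering can be inspected.
    public final List<String> trace = new ArrayList<>();

    // May be called from any thread: it never processes, it only enqueues.
    public void handleSessionEvent(String event) {
        events.add(event);
    }

    // One pass through the loop body, in the order given in the commit message.
    public void step() {
        trace.add("init");
        trace.add("register");
        String event = events.poll(); // process at most one event per pass
        if (event != null) {
            trace.add("process " + event);
            if (event.equals("JOB_ASSIGN")) {
                trace.add("run function"); // assumption: only job assignments run a job
            }
        }
        trace.add("drive IO");
    }

    // Stand-in for "while running": a fixed number of passes.
    public void work(int passes) {
        for (int i = 0; i < passes; i++) {
            step();
        }
    }
}
```

Because registration and the job lookup both happen inside step(), they cannot interleave, which is the property the commit relies on to remove the race.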

16 Commits
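
The delta registration described in commit 73359c0456 might look something like the sketch below. It is not taken from the plugin: the class and method names are hypothetical, packets are modelled as plain strings, and two points are assumptions. The first is that RESET_ABILITIES is used only when every function is lost (one reading of the commit message). The second is that a partial removal sends CANT_DO for each dropped function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: compute which packets move the server from the
// currently registered function set to the wanted one.
public class FunctionDelta {
    public static List<String> plan(Set<String> registered, Set<String> wanted,
                                    boolean newConnection) {
        List<String> packets = new ArrayList<>();
        if (newConnection) {
            // A new connection always starts by clearing possibly stale server state.
            packets.add("RESET_ABILITIES");
            for (String name : new TreeSet<>(wanted)) {
                packets.add("CAN_DO " + name);
            }
            return packets;
        }
        if (wanted.isEmpty()) {
            // Losing every function: one reset instead of a CANT_DO per function.
            if (!registered.isEmpty()) {
                packets.add("RESET_ABILITIES");
            }
            return packets;
        }
        // Otherwise send only the difference (sorted for a stable order).
        Set<String> removed = new TreeSet<>(registered);
        removed.removeAll(wanted);
        Set<String> added = new TreeSet<>(wanted);
        added.removeAll(registered);
        for (String name : removed) {
            packets.add("CANT_DO " + name); // assumption: partial removals use CANT_DO
        }
        for (String name : added) {
            packets.add("CAN_DO " + name);
        }
        return packets;
    }
}
```

With an unchanged function set the plan is empty, which is where the reduction in CAN_DO traffic would come from.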