To avoid confusion with nodepool-launcher, we've decided to rename
zuul-launcher to zuul-executor.
Change-Id: I7d03cf0f0093400f4ba2e4beb1c92694224a3e8c
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
Currently our post playbook timeout value is hardcoded to 10 minutes; for
the majority of our jobs this is okay. However, when projects need to
transfer a lot of data (e.g. kolla's 2.6GB tarballs) zuul will abort the
post playbook.
For zuulv3, we should properly expose this value to be configured per
job, but for today just bump our timeout to 30 minutes.
Change-Id: I12dcbfe60bb1d59c3af8a13f49f04e3b68ff7197
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
In I6cae11c1e89f6ccc78cb5bfaf61ef78e846e87be, we attempted to fix
an error where long-running workers never reset their watchdog
timeout flag, meaning that once a job timed out, all further jobs
on that worker timed out. That change cleared the flag each time
ansible ran. However, that flag is also used in conjunction with
the abort flag to determine whether a failed or null result should
be sent back to Zuul (a null result will cause a job to be
rescheduled). By clearing the flag before, say, a post playbook
we would lose the information that the abort was due to a timeout
rather than a direct abort request, and return the null result to
Zuul. This means all jobs that timeout would be relaunched.
Instead of clearing the flag before each ansible run, clear it once
at the start of the job launch. This means it will be set for any
ansible timeout. That should be fine for both the aborted job check
as well as the new "timed out" log message.
The typo this change corrects indicates this was the intended logic.
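A minimal sketch (with illustrative names, not the actual launcher code) of the flag lifecycle described above: the timeout flag is cleared once per job launch rather than before each ansible run, so an abort caused by a timeout remains distinguishable from a direct abort request when the final result is computed.

```python
class JobWorker:
    """Illustrative model of the timeout/abort flag interaction."""

    def __init__(self):
        self._watchdog_timed_out = False
        self._aborted = False

    def launch(self, playbooks):
        # Reset once at the start of the job launch, so that any
        # ansible run that times out leaves the flag set for the
        # final result check.
        self._watchdog_timed_out = False
        self._aborted = False
        for playbook in playbooks:
            self.run_ansible(playbook)

    def run_ansible(self, playbook):
        # The watchdog would set _watchdog_timed_out and _aborted on
        # timeout; elided in this sketch.
        pass

    def result(self):
        # Only a direct abort request returns a null result (which
        # causes Zuul to reschedule); a timeout is a plain failure.
        if self._aborted and not self._watchdog_timed_out:
            return None
        if self._watchdog_timed_out:
            return 'FAILURE'
        return 'SUCCESS'
```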
Change-Id: Ie31409a7706b6cf4d7ce858b4d5f0c00e4ee31da
The watchdog timeout emits an operator log, but no end-user visible
message. Add some text to the error message if we do time out.
Change-Id: I38fed8e020a966362ee708025ab5bc9aa5995c68
For the long-lived worker, the flag never gets reset, which means that
every job that runs after a job that times out will show as failed for
no good reason.
Change-Id: I6cae11c1e89f6ccc78cb5bfaf61ef78e846e87be
There is a bug (https://github.com/ansible/ansible/issues/18281) in the
ansible synchronize module that causes any retry attempt at
synchronizing to fail because the paths get munged resulting in invalid
paths. Unfortunately this also means that the error message we get is
not for the first failed sync attempt but for the last, making it hard
to debug why things failed in the first place.
Address this by not attempting to retry until ansible is fixed. This way
we get accurate error messages more quickly (as we don't retry over and
over and generate a bad error message at the end).
Change-Id: I545c44b11f37576edc8768a3ed78962ff870995f
The logic to rsync files into AFS is very complex, requiring
an rsync command for each of the pseudo-build-roots that are
produced by our docs jobs. Rather than try to do this in ansible
YAML, move it into an ansible module where it is much simpler.
Change-Id: I4cab8003442734ed48c67e09ea8407ec69303d87
The custom command module used in order to collect job output was
also being used by the pre and post playbooks. This meant that
instead of going to the ansible log file, the rsync output would
end up in /tmp/console.html on the zuul launcher.
To correct this, create separate library directories for use by
the pre and post playbooks which will contain all of the modules
except the custom command module. Write separate ansible.cfg files
for them, and instruct ansible-playbook to use those config files.
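A hypothetical sketch of generating per-phase ansible.cfg files, each pointing at a different library directory; the function and directory names here are illustrative, not the actual launcher code.

```python
import configparser
import os


def write_ansible_cfg(jobdir, phase, include_custom_command):
    """Write an ansible.cfg whose library path includes the custom
    command module only when requested (i.e. for the main playbook,
    not the pre/post playbooks)."""
    library = os.path.join(
        jobdir,
        'library' if include_custom_command else 'library-no-command')
    config = configparser.ConfigParser()
    config['defaults'] = {
        'library': library,
        'log_path': os.path.join(jobdir, '%s-ansible.log' % phase),
    }
    path = os.path.join(jobdir, 'ansible-%s.cfg' % phase)
    with open(path, 'w') as f:
        config.write(f)
    return path
```

Pointing ansible-playbook at the right file is then a matter of setting ANSIBLE_CONFIG (or -i equivalents) per invocation.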
Change-Id: I5eb6bcc48bcaa6b056af1af7da93f29408f9db41
Add the Ansible-standard rsync output format option to rsync, and
also output the filter file to the logs to aid in debugging.
Change-Id: I68daf93ee7f5d501e51ec90d201830a18c6e5a47
While trying to follow a failed post-playbook in the gate, it was
harder than desirable to determine which task was failing. Add names to
the tasks so that we can track what is going on.
Change-Id: I35fd7ad75c82f6a82fc8d12b7fd48860c1ab10f1
We still need to set up our timeout-var environment variable,
otherwise devstack-gate will fail to read BUILD_TIMEOUT and default
jobs to 120 minute timeouts.
Change-Id: Ieccba55eaab83074a409efdbb928b4a4fdfdecf7
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
For the generated shell scripts, which are named using UUID4, prepend a
sequence count to them so that the ordering of the scripts is easy to
tell when looking in '_zuul_ansible/scripts/'. Keep the uuid to
avoid potential collisions in /tmp.
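A minimal sketch of the naming scheme: keep the uuid for collision-safety in /tmp, but prefix a zero-padded sequence counter so a directory listing sorts the scripts in execution order (the helper name is illustrative).

```python
import uuid


def make_script_name(sequence):
    # Zero-pad the counter so lexical sort order matches execution
    # order; the uuid suffix keeps names unique across jobs.
    return '%02d-%s.sh' % (sequence, uuid.uuid4().hex)
```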
Change-Id: Id80bf5139ba1ce12c62945421d49c5e3cd8e2f48
For the generated shell scripts in ansiblelaunchserver.py, have them
be generated in numerical order. For example 01.sh, 02.sh, etc. This
will allow us to tell the ordering of the scripts when looking in
'_zuul_ansible/scripts/'.
Change-Id: Iba6231242a58a23549c92aa32620d498e05886f8
The find command that collected the marker files is expected
to print paths with a leading '/' (see later commands which
grep for '^/') but this was omitted. This would cause all jobs
which published to the root (whether they had any content in
the root directory or were simply only intended to publish to a
subdir of the root) to conflict with each other.
Also, correct a missing fully-qualified path.
Change-Id: I6030c2b101026ff8e72cf4043e1d1b4fbffc5dcb
It seems that Jenkins does this. At least with FTP. We don't have
any leading / on AFS targets, but do the same there for symmetry.
Change-Id: Icb7451c0f3f5fa62c8a15fc621fd30f2df166c96
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
From the manual:
Enabling pipelining reduces the number of SSH operations required to
execute a module on the remote server, by executing many ansible
modules without actual file transfer. This can result in a very
significant performance improvement when enabled, however when using
“sudo:” operations you must first disable ‘requiretty’ in
/etc/sudoers on all managed hosts.
Basically, in local testing there is a speed improvement. However, I
believe the better reason to enable this is to reduce the number of
SSH transactions we perform on our workers. In doing this we reduce
our potential for SSH connection issues.
However, it also appears that async operations do not use this setting,
simply because of how async works.
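A small sketch of what enabling this looks like when the launcher generates its ansible.cfg; the section and option names follow Ansible's documented ssh_connection settings, while the helper itself is illustrative.

```python
import configparser


def enable_pipelining(config):
    # Equivalent to writing:
    #   [ssh_connection]
    #   pipelining = True
    # in the generated ansible.cfg.
    if not config.has_section('ssh_connection'):
        config.add_section('ssh_connection')
    config.set('ssh_connection', 'pipelining', 'True')
    return config
```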
Change-Id: Ib224fbf1fed19be3ce7db4da0c466e3d11acc365
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
In order to get to the point where playbooks that people write for tests
are playbooks that they could conceivably also use outside of the zuul
context, we need to remove the need for zuul-specific things in the main
playbook.
Add a pre-playbook that runs before the playbook and runs the things
that are not tied to current JJB content - namely setting up the logger
and prepping directories.
Move the SUCCESS/FAILURE message to the post-playbook.
Extract the injected variables into a variables file and add a
-e@vars.yaml option to the playbook invocation. This provides variables
in a known namespace. Obviously there is still an exercise in how a user
might write a playbook that wants to consume those variables in some
way.
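A hedged sketch of building the ansible-playbook invocation with the extracted variables file; only the -e@ form comes from the change description, the function and paths are illustrative.

```python
def build_playbook_cmd(playbook, vars_path):
    # Pass the extracted variables file to the playbook run via
    # ansible-playbook's -e@<file> syntax.
    return ['ansible-playbook', playbook, '-e@%s' % vars_path]
```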
Change-Id: Ie5ec6ec65a03ceea9afc3ac59df73cb28f5ca4dd
The async module is complex, and we're only using it to handle the
running cumulative timeout. However, we still fall back on the watchdog
timeout from time to time. Make things simpler by just having that be
how we time things out.
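A minimal sketch of a watchdog of the kind described: a timer that fires an abort callback when the cumulative job timeout expires, replacing the per-task async timeout (names are illustrative, not the launcher's actual class).

```python
import threading


class Watchdog:
    """Fire a callback if stop() is not called within the timeout."""

    def __init__(self, timeout, on_timeout):
        self.timed_out = False
        self._on_timeout = on_timeout
        self._timer = threading.Timer(timeout, self._expire)

    def _expire(self):
        # Record that the abort was caused by a timeout before
        # invoking the abort callback.
        self.timed_out = True
        self._on_timeout()

    def start(self):
        self._timer.start()

    def stop(self):
        self._timer.cancel()
```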
Change-Id: Ie51de4a135d953c4ad9dcb773d27b3c54ca8829b
Now that we're using the command module, just do inline script content
to make debugging/reading easier.
Change-Id: Ia63f77fd41a03b4662c26f9d0f3b70d1e6a8b5d3
Having a modified command module with the zuul_runner logic allows us to
use normal command and shell entries in the playbooks. (shell is just a
wrapper around command)
At this moment in time it's an invasive fork of the run_command method
on AnsibleModule. That's not optimal for long term, but should get us
closer to being able to discuss appropriate hook points with upstream
ansible.
Use environment task parameter instead of parameters
ansible has a structure for passing in environment variables which we
can use. We did not use it before due to a behavior in ansible from
pre-2.2 that set LANG settings in the environment in a way that caused
us to need to clean things in zuul_runner. The module_set_locale
variable defaults to False in 2.2, but to True in 2.1 (which was the
regression). Set the config value explicitly just to be sure.
Change-Id: Iae4769f923ecf74462e1fe43168ea93ff1c61d6e
In the next patch, we're going to change the body of zuul_runner. But,
in order to render that diff well, do the rename in this patch.
Change-Id: I3727f506cae5da561948869bd8f8daaf42e4dc0d
This contains several fixes:
* Support remove-prefix. This is used by the FTP publisher we are
replacing.
* Fix sed expressions. They were missing a '/'.
* Make the target directory before rsync. Rsync requires the target
root directory exist before running. Elsewhere we solved that by
encoding the mkdir into the remote rsync command. Since we are
running locally here, just run 'mkdir -p' before running rsync.
However, it must be done with the keytab, so include it in the
k5start command (so that we do not need to run k5start twice).
* Include the 'user' in the site definition as the principal for
k5start.
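A hedged sketch of composing the single k5start invocation described above, with the mkdir and rsync chained inside one shell command so the keytab is used only once. The exact k5start flags, principal, and paths are illustrative assumptions, not taken from the change itself.

```python
def build_k5start_cmd(keytab, principal, source, target):
    # Chain 'mkdir -p' and rsync inside one shell so both run under
    # the credentials obtained by a single k5start invocation.
    # NOTE: the k5start flag usage here is an assumption for
    # illustration; consult the k5start man page for real usage.
    inner = 'mkdir -p %s && rsync -a %s/ %s/' % (target, source, target)
    return ['k5start', '-t', '-f', keytab, principal, '--',
            '/bin/sh', '-c', inner]
```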
Change-Id: I69c263a35e732b9a21d411bd30215945783d1023
Rather than requiring the launcher to be run with k5start,
run k5start only during the specific rsync command where it is
required.
Change-Id: I1d8258c4b13d21c96072d1a03c3a3472b0d878d5
This is an extension to JJB that works only in zuul-launcher, not
Jenkins. It allows copying the results of a build into AFS.
It actually isn't really AFS specific at all, other than it
checks that the destination path is under /afs. Otherwise, it
behaves as a local copy on the launcher itself.
It also contains the logic needed to publish OpenStack's
documentation builds, which can appear as subdirectories of other
builds.
Change-Id: Icda75266219d2d7167e80aaad8e290443cfdbadc
We are seeing intermittent failures in zuul trying to talk to the node
which look like the 10s ssh negotiation timing out. Extremely
busy test nodes that are using their entire network bandwidth to pull
packages may take longer than this.
Try to reduce this by bumping the timeout.
Change-Id: Ic4ec2ea3c8b77cb308fb1a85514d831acf6c4b67
Jobs no longer launch using this code. Revert so we can debug the
issue.
This reverts commit b6341fbe63.
Change-Id: Ie8076e3e162e3f223367321d8f57ccb48a0f57f6
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
Run ssh-keygen on the known_hosts file to extract the ssh_host_key. We
do this to help debug the scenario when the remote node's
identification has changed:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle
attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
51:82:00:1c:7e:6f:ac:ac:de:f1:53:08:1c:7d:55:68.
Please contact your system administrator.
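The lookup itself can be sketched as building an ssh-keygen -F command, which prints the recorded key for a host from a given known_hosts file; the helper name and paths are illustrative.

```python
def build_hostkey_cmd(known_hosts, host):
    # 'ssh-keygen -F <host> -f <file>' looks up and prints the key
    # recorded for that host in the given known_hosts file, which is
    # what we log for debugging identification changes.
    return ['ssh-keygen', '-F', host, '-f', known_hosts]
```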
Change-Id: Ica41c80db91e7b08dbc34516b3812da4148c36e3
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
We sometimes see errors rsyncing data from the node or to the
log server. Since these are all rsync commands, they are safe
to retry. Attempt all post playbook rsyncs up to 3 times with
a 30 second delay between each attempt.
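The retry policy reads, in generic form, like the sketch below; the actual change expresses this as ansible retries/delay task parameters, so this function is purely illustrative.

```python
import time


def run_with_retries(func, attempts=3, delay=30, sleep=time.sleep):
    """Call func, retrying up to `attempts` times with a fixed
    delay between attempts; re-raise the last failure."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise
            sleep(delay)
```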
Change-Id: I329e1f1f31d53d82799e3485a912b76e2249d03f
This is a noop change, which removes the hardcoded node IP address
from our playbook. This is a step forward to allow users to re-run our
playbooks in an effort to reproduce problems locally.
Change-Id: I3d3b979fb9bfffce1ea1466403a277e6f6e146cc
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
Because we are using the private MASS_DO gearman operation to
register functions, the gear.Worker does not know what functions
are registered and therefore the routine which automatically
re-registers functions after a gear server disconnect was not
effective. Correct this by also storing the function list when
sending MASS_DO. This will result in the worker actually sending
CAN_DO packets rather than MASS_DO in the case of a reconnect,
but at least it will be correct, if not efficient.
This error would cause existing nodes attached to zuul launchers
to be unable to run jobs after a zuul (geard) restart.
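A hypothetical sketch of the fix: remember every function name sent via the bulk MASS_DO registration so the normal reconnect path, which replays CAN_DO per function, still works. The class and method names are illustrative and not the gear library's API.

```python
class Worker:
    """Illustrative model of function registration bookkeeping."""

    def __init__(self):
        self.functions = set()

    def send_can_do(self, name):
        # Record and register a single function (CAN_DO packet
        # sending elided in this sketch).
        self.functions.add(name)

    def send_mass_do(self, names):
        # Also record the functions registered in bulk, so a
        # reconnect can re-register them.
        self.functions.update(names)

    def reconnect(self):
        # After a geard restart, replay registration function by
        # function: correct, if less efficient than MASS_DO.
        for name in sorted(self.functions):
            self.send_can_do(name)
```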
Change-Id: I60804355a8b3a3cfb79a12dd6e6f0e219fe50c31
When we use 'delegate_to' to run commands locally, the 'remote'
side of the Ansible connection is the local host. When running
these tasks it will write to the 'remote_tmp' directory, which
is actually the local ~/.ansible/tmp directory. We also set
'keep_remote_files' to true in order to avoid a race condition
with 'async' on the actual remote hosts, but in this case, these
two options in combination end up meaning 'keep some files in
the local ~/.ansible/tmp directory indefinitely' which is not
good for our long-running launchers.
Instead, set 'remote_tmp' to a subdirectory of the jobdir so that
when used in the local context, it will be cleaned up at the end
of the run. In the remote context, it will end up in a similarly
randomly named directory under /tmp on the worker. Ansible will
create that directory. This has the side benefit of removing the
Ansible running the job further from potential uses of Ansible
within the job (which may continue to use ~/.ansible by default).
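A sketch of pointing Ansible's remote_tmp at a subdirectory of the jobdir so that local (delegate_to) runs leave their files somewhere removed with the job; the helper and paths are illustrative, while the [defaults] remote_tmp option follows Ansible's documented configuration.

```python
import configparser
import os


def set_remote_tmp(config, jobdir):
    # Equivalent to writing:
    #   [defaults]
    #   remote_tmp = <jobdir>/.ansible/tmp
    # in the generated ansible.cfg.
    if not config.has_section('defaults'):
        config.add_section('defaults')
    config.set('defaults', 'remote_tmp',
               os.path.join(jobdir, '.ansible', 'tmp'))
    return config
```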
Change-Id: I70475d5844cbd66bf670566f992fdec263d271a5