puppet-tripleo

Commit Graph

Author	SHA1	Message	Date
Ghanshyam Mann	e06f50cb06	Retire Tripleo: remove repo content TripleO project is retiring - https://review.opendev.org/c/openstack/governance/+/905145 this commit remove the content of this project repo Change-Id: I73df79a8698625815ea4e3099904da448a49887e	2024-02-24 11:42:30 -08:00
Takashi Kajinami	ef041632ea	Remove implementations for Docker support ... because Docker support has been removed from tht and these are no longer used. Depends-on: https://review.opendev.org/843755 Change-Id: I5719d06464ba2c1d37898b44f70ac5521ceaaf7e	2022-06-20 17:29:07 +09:00
Takashi Kajinami	d6ed3576af	Pacemaker: Replace hiera by lookup (1) The hiera function is deprecated and does not work with the latest hieradata version 5. It should be replaced by the new lookup function[1]. [1] https://puppet.com/docs/puppet/7/hiera_automatic.html With the lookup function, we can define value type and merge behavior, but these are kept default at this moment to limit scope of this change to just simple replacement. Adding value type might be useful to make sure the value is in expected type (especially when a boolean value is expected), but we will revisit that later. example: lookup(<NAME>, [<VALUE TYPE>], [<MERGE BEHAVIOR>], [<DEFAULT VALUE>]) Note that this change does not cover manifests to set up pacemaker resources. These will be covered by a separate patch. Change-Id: Id5d601477477f0f33277f115527139f81d80133e	2022-05-25 16:00:34 +09:00
Michele Baldessari	5cb4565c01	Use pcs 0.9 style authkey/remotes when doing an upgrade We leverage the _override keys for pacemaker and pacemaker_remote services in order to decide if we should use the old way of managing remotes (managed authkey and pcs 0.9 way of doing remotes). The pacemaker_remote override keys are introduced via https://review.opendev.org/741610 Related-Bug: #1888398 Change-Id: Iad23663d6c98e4fd3a507980638870e0ad0cee45	2020-07-31 10:44:55 +02:00
Michele Baldessari	d833bcd92e	Fix typo in remote pcsd_bind_addr Typo slipped in when this code was merged, we need to reference the proper variable Closes-Bug: #1861668 Change-Id: Ida10d018e73fb19bb72032fcb2113e1762fb94fa	2020-02-03 11:03:17 +01:00
Zuul	97640c3611	Merge "Add support to configure pcsd bind address"	2019-12-21 05:14:39 +00:00
Takashi Kajinami	b5ee4bacac	Add support to configure pcsd bind address Add support to configure pcsd bind address so that we can make pcsd listen on specific address instead of all interfaces on the node. Related-Bug: #1856626 Depends-on: https://review.opendev.org/#/c/697942 Depends-On: https://review.opendev.org/700250 Change-Id: I442b190b6fa429ee3a81fd2ea84ada6ed9bca7d2	2019-12-20 23:27:40 +00:00
Tobias Urdin	1523a4b804	Convert all class usage to relative names Change-Id: Ib2ed745b682cf12f9469a5a64451adcabec400af	2019-12-08 23:23:25 +01:00
Michele Baldessari	f1a593b642	Initial support for tls_priorities We add initial support for being able to specify tls priorities in pacemaker. For bundles this will happen via an env variable because pacemaker_remote is started normally as a process and there is no sourcing of /etc/sysconfig/pacemaker. Tested on both queens and stein. Via a deploy and a redeploy against existing cloud. Observed that: A) We got PCMK_tls_priorities inside /etc/sysconfig/pacemaker with the value that was passed in THT B) Containers had the following env variable set: "PCMK_tls_priorities=normal", The '-e' addition is a noop in case the PCMK_tls_priorities is unset so that we do not change the signature of the resources and hence do not needlessly restart the HA resource. Depends-On: I1971810f6a90f244ed5ced972a5fe7fde29dde86 Change-Id: I703b5a429f48063474aace85bc45d948f5c91435	2019-07-27 07:59:45 +00:00
Michele Baldessari	16c5f16925	Make sure we pass the proper new pcs 0.10 variables With I852d5d7aa8578c45f0c7215827cedd4ea72c8d0b we will now use the new system to create pacemaker remotes when pcs 0.10 is installed on the system (RHEL/Centos >=8). This new way of doing things requires the remotes to set up pcsd and so we need to pass the same pcs user and password that are used by pacemaker itself to the remote resource. Tested on an IHA OSP15 deployment and got a successful deployment. Depends-On: I852d5d7aa8578c45f0c7215827cedd4ea72c8d0b Change-Id: I02a8c1d618b11f68f9272b9044ff81ac39d6a81b	2019-06-04 07:37:06 +02:00
Michele Baldessari	e288dbd825	Make sure rhel-plugin-push.service is stopped after pacemaker stops When issuing a normal reboot command on an overcloud node the following stop sequence can take place: [Diagram: Pacemaker and paunch-container-shutdown, each with an arrow pointing down to docker] If there are docker plugins that are allowed to stop before docker and also before pacemaker, it might happen that stopping them down during the pacemaker stop will cause a bunch of timeouts and a failure to stop containers: Sep 13 17:53:00.821030 controller-0.localdomain pacemakerd[6147]: notice: Shutting down Pacemaker Sep 13 17:54:15.798026 controller-0.localdomain lrmd[6284]: warning: galera-bundle-docker-0_monitor_60000 process (PID 226329) timed out Sep 13 17:54:15.799004 controller-0.localdomain lrmd[6284]: warning: galera-bundle-docker-0_monitor_60000:226329 - timed out after 20000ms One of these plugins is 'rhel-push-plugin.service'. It seems that when this plugin is free to stop before docker on shutdown, it is very possible that docker commands can start timing out. Before: Before adding the symlink we would need 15mins to reboot a node and we would get a bunch of timeouts on shutdown and some failed actions on boot. After: A reboot will take a reasonable couple of minutes to complete with no failed actions at boot and timeouts during shutdown. NB: We add the symlink unconditionally as systemd will ignore it if the service is not installed. Change-Id: I6f6d27f2457efcc49d9edd8a2f98484c5f7c0933 Closes-Bug: #1792701	2018-09-20 11:53:04 +02:00
Michele Baldessari	feca86b730	Fix ordering when pacemaker with bundles is being used If you gracefully restart a node with pacemaker on it, the following can happen: 1) docker service gets stopped first 2) pacemaker gets shutdown 3) pacemaker tries to shutdown the bundles but fails due to 1) This can make it so that after the reboot, because shutting down the services failed, two scenarios can take place: A) The node gets fenced (when stonith is configured) because it failed to stop a resource B) The state of the resource might be saved as Stopped and when the node comes back up (if multiple nodes were rebooted at the same time) the CIB might have Stopped as the target state for the resource. In the case of B) we will see something like the following: Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ] Full list of resources: Docker container set: rabbitmq-bundle [192.168.0.1:8787/rhosp12/openstack-rabbitmq-docker:pcmklatest] rabbitmq-bundle-0 (ocf:heartbeat:rabbitmq-cluster): Stopped overcloud-controller-0 rabbitmq-bundle-1 (ocf:heartbeat:rabbitmq-cluster): Stopped overcloud-controller-0 rabbitmq-bundle-2 (ocf:heartbeat:rabbitmq-cluster): Stopped overcloud-controller-0 Docker container set: galera-bundle [192.168.0.1:8787/rhosp12/openstack-mariadb-docker:pcmklatest] galera-bundle-0 (ocf:heartbeat:galera): Stopped overcloud-controller-0 galera-bundle-1 (ocf:heartbeat:galera): Stopped overcloud-controller-0 galera-bundle-2 (ocf:heartbeat:galera): Stopped overcloud-controller-0 Docker container set: redis-bundle [192.168.0.1:8787/rhosp12/openstack-redis-docker:pcmklatest] redis-bundle-0 (ocf:heartbeat:redis): Stopped overcloud-controller-0 redis-bundle-1 (ocf:heartbeat:redis): Stopped overcloud-controller-0 redis-bundle-2 (ocf:heartbeat:redis): Stopped overcloud-controller-0 ip-192.168.0.12 (ocf:heartbeat:IPaddr2): Stopped ip-10.19.184.160 (ocf:heartbeat:IPaddr2): Stopped ip-10.19.104.14 (ocf:heartbeat:IPaddr2): Stopped ip-10.19.104.19 (ocf:heartbeat:IPaddr2): Stopped ip-10.19.105.11 (ocf:heartbeat:IPaddr2): Stopped ip-192.168.200.15 (ocf:heartbeat:IPaddr2): Stopped 
Docker container set: haproxy-bundle [192.168.0.1:8787/rhosp12/openstack-haproxy-docker:pcmklatest] haproxy-bundle-docker-0 (ocf:heartbeat:docker): FAILED (blocked)[ overcloud-controller-0 overcloud-controller-2 overcloud-controller-1 ] haproxy-bundle-docker-1 (ocf:heartbeat:docker): FAILED (blocked)[ overcloud-controller-0 overcloud-controller-2 overcloud-controller-1 ] haproxy-bundle-docker-2 (ocf:heartbeat:docker): FAILED (blocked)[ overcloud-controller-0 overcloud-controller-2 overcloud-controller-1 ] openstack-cinder-volume (systemd:openstack-cinder-volume): Started overcloud-controller-0 Failed Actions: * rabbitmq-bundle-docker-0_stop_0 on overcloud-controller-0 'unknown error' (1): call=93, status=Timed Out, exitreason='none', last-rc-change='Fri Nov 17 13:55:35 2017', queued=0ms, exec=20023ms * rabbitmq-bundle-docker-1_stop_0 on overcloud-controller-0 'unknown error' (1): call=94, status=Timed Out, exitreason='none', last-rc-change='Fri Nov 17 13:55:35 2017', queued=0ms, exec=20037ms * galera-bundle-docker-0_stop_0 on overcloud-controller-0 'unknown error' (1): call=96, status=Timed Out, exitreason='none', last-rc-change='Fri Nov 17 13:55:35 2017', queued=0ms, exec=20035ms We fix this by adding the docker service to resource-agents-deps.target.wants which is the recommended method to make sure non pacemaker managed resources come up before pacemaker during a start and get stopped after pacemaker's service stop: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/s1-nonpacemakerstartup-haar We conditionalize this change when docker is enabled and we also make sure that we make the change only after the docker package is installed, in order to cover split stack deployments as well. With this change we were able to restart nodes without observing any timeouts during stop or stopped resources at startup. 
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com> Change-Id: I6a4dc3d4d4818f15e9b7e68da3eb07e54b0289fa Closes-Bug: #1733348	2017-11-21 09:44:57 +01:00
Jenkins	8ca7a1e390	Merge "Ensure hiera step value is an integer"	2017-06-16 23:25:23 +00:00
Michele Baldessari	332755c0fd	Only set the stonith property on the pacemaker_master node It makes little sense to enforce the stonith property on remote nodes and/or all cluster nodes. We can just enforce it once on the pacemaker_master node as it is a cluster-wide property anyway. We can also remove the tripleo::fencing -> pacemaker::stonith constraint in the pacemaker remote profile now as the fencing stuff happens on step 5 anyway and the property is set at step 1. While this works in general it creates extra CIB changes for nothing and slows down the deployment. Change-Id: Ifef08033043a4cc90a6261e962d2fdecdf275650 Closes-Bug: #1696336	2017-06-14 09:01:26 +02:00
Steve Baker	94f13e6608	Ensure hiera step value is an integer The step is typically set with the hieradata setting an integer value: {"step": 1} However it would be useful for the value to be a string so that substitutions are possible, for example: {"step": "%{::step}"} This change ensures the step parameter defaults to an integer by calling Integer(hiera('step')) This change was made by manually removing the undef defaults from fluentd.pp, uchiwa.pp, and sensu.pp then bulk updating with: find ./ -type f -print0 |xargs -0 sed -i "s/= hiera('step')/= Integer(hiera('step'))/" Change-Id: I8a47ca53a7dea8391103abcb8960a97036a6f5b3	2017-06-14 14:31:52 +12:00
Chris Jones	19d177c182	Add support for autofencing to Pacemaker Remote. We now configure stonith devices for Pacemaker Remote nodes. Change-Id: I87c60bd56feac6dedc00a3c458b805aa9b71d9ce Depends-On: Ifb4d19a6b9920b0e340555d6441878c7234eb197 Partial-Bug: #1686115	2017-04-26 10:21:09 +01:00
Michele Baldessari	25b327c9a0	pacemaker remote profile support This support enables a base profile called pacemaker_remote which will allow the operator to automatically configure the pacemaker_remote service on such nodes. This manifest also automatically adds any pacemaker_remote nodes to the pacemaker cluster. Depends-On: I0c01ecb7df1a0f9856fdc866b9d06acf0283fa4f Depends-On: Ic0488f4fc63e35b9aede60fae1e2cab34b1fbdd5 Change-Id: I92953afcc7d536d387381f08164cae8b52f41605	2017-01-24 15:46:51 +01:00

17 Commits
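
The bulk update described in commit 94f13e6608 can be tried safely on a scratch copy first. The following is a minimal sketch, not the author's exact procedure: it assumes GNU sed (for `sed -i` without a suffix argument), and the temporary directory and the `example.pp` manifest are hypothetical stand-ins for the real module tree.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory standing in for the module checkout.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical manifest with the pre-change default for the step parameter.
cat > "$workdir/example.pp" <<'EOF'
class example (
  $step = hiera('step'),
) {
}
EOF

# Same find/xargs/sed pipeline as quoted in the commit message,
# pointed at the scratch directory instead of ./
find "$workdir" -type f -print0 \
  | xargs -0 sed -i "s/= hiera('step')/= Integer(hiera('step'))/"

# Capture and show the rewritten manifest.
result=$(cat "$workdir/example.pp")
printf '%s\n' "$result"
```

Running the pipeline a second time should leave the file unchanged, because the rewritten line no longer contains the text `= hiera('step')` that the pattern matches.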