17 Commits

Author SHA1 Message Date
Ghanshyam Mann e06f50cb06 Retire Tripleo: remove repo content
TripleO project is retiring
- https://review.opendev.org/c/openstack/governance/+/905145

This commit removes the content of this project repo.

Change-Id: I73df79a8698625815ea4e3099904da448a49887e
2024-02-24 11:42:30 -08:00
Takashi Kajinami ef041632ea Remove implementations for Docker support
... because Docker support has been removed from tht and these are no
longer used.

Depends-on: https://review.opendev.org/843755
Change-Id: I5719d06464ba2c1d37898b44f70ac5521ceaaf7e
2022-06-20 17:29:07 +09:00
Takashi Kajinami d6ed3576af Pacemaker: Replace hiera by lookup (1)
The hiera function is deprecated and does not work with the latest
hieradata version 5. It should be replaced by the new lookup
function[1].

[1] https://puppet.com/docs/puppet/7/hiera_automatic.html

With the lookup function, we can define value type and merge behavior,
but these are kept default at this moment to limit scope of this change
to just simple replacement. Adding value type might be useful to make
sure the value is in expected type (especially when a boolean value is
expected), but we will revisit that later.

example:
lookup(<NAME>, [<VALUE TYPE>], [<MERGE BEHAVIOR>], [<DEFAULT VALUE>])

Note that this change does not cover manifests to set up pacemaker
resources. These will be covered by a separate patch.

Change-Id: Id5d601477477f0f33277f115527139f81d80133e
2022-05-25 16:00:34 +09:00
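The hiera-to-lookup replacement described in d6ed3576af is mechanical, so it can be illustrated with a small rewrite helper. What follows is only a sketch under stated assumptions, not the tooling used for the commit: the function name `hiera_to_lookup` and the key `some_key` are hypothetical, and the regular expression only handles single-quoted keys and defaults without nested parentheses. The value type and merge behavior are passed as `undef` to keep the defaults, as the commit message describes.

```python
import re

# Matches hiera('key') or hiera('key', default).
# Assumption: keys are single-quoted and defaults contain no parentheses.
_HIERA_CALL = re.compile(r"\bhiera\(\s*('[^']*')\s*(?:,\s*([^()]+?)\s*)?\)")


def hiera_to_lookup(source: str) -> str:
    """Rewrite simple hiera() calls in Puppet source text to lookup() calls."""

    def _rewrite(match: re.Match) -> str:
        key, default = match.group(1), match.group(2)
        if default is None:
            # No default given: lookup(<NAME>) is enough.
            return f"lookup({key})"
        # Keep value type and merge behavior at their defaults (undef).
        return f"lookup({key}, undef, undef, {default})"

    return _HIERA_CALL.sub(_rewrite, source)


if __name__ == "__main__":
    print(hiera_to_lookup("$step = Integer(hiera('step'))"))
    print(hiera_to_lookup("$value = hiera('some_key', false)"))
```

Calls such as `hiera_array(...)` are left untouched by this pattern, and anything the pattern cannot parse would still need a manual review.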
Michele Baldessari 5cb4565c01 Use pcs 0.9 style authkey/remotes when doing an upgrade
We leverage the _override keys for pacemaker and pacemaker_remote
services in order to decide if we should use the old way of managing
remotes (managed authkey and pcs 0.9 way of doing remotes).

The pacemaker_remote override keys are introduced via
https://review.opendev.org/741610

Related-Bug: #1888398

Change-Id: Iad23663d6c98e4fd3a507980638870e0ad0cee45
2020-07-31 10:44:55 +02:00
Michele Baldessari d833bcd92e Fix typo in remote pcsd_bind_addr
A typo slipped in when this code was merged; we need to reference the
proper variable.

Closes-Bug: #1861668

Change-Id: Ida10d018e73fb19bb72032fcb2113e1762fb94fa
2020-02-03 11:03:17 +01:00
Zuul 97640c3611 Merge "Add support to configure pcsd bind address" 2019-12-21 05:14:39 +00:00
Takashi Kajinami b5ee4bacac Add support to configure pcsd bind address
Add support to configure pcsd bind address so that we can
make pcsd listen on specific address instead of all interfaces
on the node.

Related-Bug: #1856626
Depends-on: https://review.opendev.org/#/c/697942
Depends-On: https://review.opendev.org/700250
Change-Id: I442b190b6fa429ee3a81fd2ea84ada6ed9bca7d2
2019-12-20 23:27:40 +00:00
Tobias Urdin 1523a4b804 Convert all class usage to relative names
Change-Id: Ib2ed745b682cf12f9469a5a64451adcabec400af
2019-12-08 23:23:25 +01:00
Michele Baldessari f1a593b642 Initial support for tls_priorities
We add initial support for being able to specify tls priorities in
pacemaker. For bundles this will happen via an env variable because
pacemaker_remote is started normally as a process and there is no
sourcing of /etc/sysconfig/pacemaker.

Tested on both queens and stein, via a deploy and a redeploy against an
existing cloud. Observed that:
A) We got PCMK_tls_priorities inside /etc/sysconfig/pacemaker with the
value that was passed in THT
B) Containers had the following env variable set:
  "PCMK_tls_priorities=normal",

The '-e' addition is a noop in case the PCMK_tls_priorities is unset
so that we do not change the signature of the resources and hence do
not needlessly restart the HA resource.

Depends-On: I1971810f6a90f244ed5ced972a5fe7fde29dde86
Change-Id: I703b5a429f48063474aace85bc45d948f5c91435
2019-07-27 07:59:45 +00:00
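The "noop when unset" behavior described in f1a593b642 can be sketched as follows. This is a minimal illustration, not the Puppet code from the commit: the helper name `tls_env_options` is hypothetical, and it assumes the container options are assembled as a list of command-line arguments. The point is that an unset value contributes nothing, so the resource definition stays identical and no restart is triggered.

```python
from typing import List, Optional


def tls_env_options(tls_priorities: Optional[str]) -> List[str]:
    """Return the extra '-e' container options for PCMK_tls_priorities.

    Hypothetical helper: when the value is unset (None or empty), no
    option is added, so the resulting option list is unchanged.
    """
    if not tls_priorities:
        return []
    return ["-e", f"PCMK_tls_priorities={tls_priorities}"]


if __name__ == "__main__":
    print(tls_env_options(None))
    print(tls_env_options("normal"))
```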
Michele Baldessari 16c5f16925 Make sure we pass the proper new pcs 0.10 variables
With I852d5d7aa8578c45f0c7215827cedd4ea72c8d0b we will now
use the new system to create pacemaker remotes when pcs 0.10
is installed on the system (RHEL/CentOS >= 8). This new
way of doing things requires the remotes to set up pcsd
and so we need to pass the same pcs user and password that
are used by pacemaker itself to the remote resource.

Tested on an IHA OSP15 deployment and got a successful deployment.

Depends-On: I852d5d7aa8578c45f0c7215827cedd4ea72c8d0b
Change-Id: I02a8c1d618b11f68f9272b9044ff81ac39d6a81b
2019-06-04 07:37:06 +02:00
Michele Baldessari e288dbd825 Make sure rhel-push-plugin.service is stopped after pacemaker stops
When issuing a normal reboot command on an overcloud node the following
stop sequence can take place:
------------- -----------------------------
| Pacemaker | | paunch-container-shutdown |
------------- -----------------------------
          |     |
           \   /
            \ /
        ----------
        | docker |
        ----------

If docker plugins are allowed to stop before docker and also before
pacemaker, shutting them down while pacemaker is stopping can cause a
number of timeouts and a failure to stop containers:
Sep 13 17:53:00.821030 controller-0.localdomain pacemakerd[6147]: notice: Shutting down Pacemaker
Sep 13 17:54:15.798026 controller-0.localdomain lrmd[6284]: warning: galera-bundle-docker-0_monitor_60000 process (PID 226329) timed out
Sep 13 17:54:15.799004 controller-0.localdomain lrmd[6284]: warning: galera-bundle-docker-0_monitor_60000:226329 - timed out after 20000ms

One of these plugins is 'rhel-push-plugin.service'. It seems that when
this plugin is free to stop before docker on shutdown, it is very
possible that docker commands can start timing out.

Before:
Before adding the symlink we would need 15mins to reboot a node and
we would get a bunch of timeouts on shutdown and some failed actions on
boot.

After:
A reboot will take a reasonable couple of minutes to complete with no
failed actions at boot and no timeouts during shutdown.

NB: We add the symlink unconditionally as systemd will ignore it if the
service is not installed.

Change-Id: I6f6d27f2457efcc49d9edd8a2f98484c5f7c0933
Closes-Bug: #1792701
2018-09-20 11:53:04 +02:00
Michele Baldessari feca86b730 Fix ordering when pacemaker with bundles is being used
If you gracefully restart a node with pacemaker on it, the following can happen:
1) docker service gets stopped first
2) pacemaker gets shut down
3) pacemaker tries to shut down the bundles but fails due to 1)

Because shutting down the services failed, one of two scenarios can
take place after the reboot:
A) The node gets fenced (when stonith is configured) because it failed to stop a resource
B) The state of the resource might be saved as Stopped and when the node
   comes back up (if multiple nodes were rebooted at the same time) the CIB
   might have Stopped as the target state for the resource.

In the case of B) we will see something like the following:

Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]

Full list of resources:

 Docker container set: rabbitmq-bundle [192.168.0.1:8787/rhosp12/openstack-rabbitmq-docker:pcmklatest]
   rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Stopped overcloud-controller-0
   rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Stopped overcloud-controller-0
   rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Stopped overcloud-controller-0
 Docker container set: galera-bundle [192.168.0.1:8787/rhosp12/openstack-mariadb-docker:pcmklatest]
   galera-bundle-0 (ocf::heartbeat:galera): Stopped overcloud-controller-0
   galera-bundle-1 (ocf::heartbeat:galera): Stopped overcloud-controller-0
   galera-bundle-2 (ocf::heartbeat:galera): Stopped overcloud-controller-0
 Docker container set: redis-bundle [192.168.0.1:8787/rhosp12/openstack-redis-docker:pcmklatest]
   redis-bundle-0 (ocf::heartbeat:redis): Stopped overcloud-controller-0
   redis-bundle-1 (ocf::heartbeat:redis): Stopped overcloud-controller-0
   redis-bundle-2 (ocf::heartbeat:redis): Stopped overcloud-controller-0
 ip-192.168.0.12 (ocf::heartbeat:IPaddr2): Stopped
 ip-10.19.184.160 (ocf::heartbeat:IPaddr2): Stopped
 ip-10.19.104.14 (ocf::heartbeat:IPaddr2): Stopped
 ip-10.19.104.19 (ocf::heartbeat:IPaddr2): Stopped
 ip-10.19.105.11 (ocf::heartbeat:IPaddr2): Stopped
 ip-192.168.200.15 (ocf::heartbeat:IPaddr2): Stopped
 Docker container set: haproxy-bundle [192.168.0.1:8787/rhosp12/openstack-haproxy-docker:pcmklatest]
   haproxy-bundle-docker-0 (ocf::heartbeat:docker): FAILED (blocked)[ overcloud-controller-0 overcloud-controller-2 overcloud-controller-1 ]
   haproxy-bundle-docker-1 (ocf::heartbeat:docker): FAILED (blocked)[ overcloud-controller-0 overcloud-controller-2 overcloud-controller-1 ]
   haproxy-bundle-docker-2 (ocf::heartbeat:docker): FAILED (blocked)[ overcloud-controller-0 overcloud-controller-2 overcloud-controller-1 ]
 openstack-cinder-volume (systemd:openstack-cinder-volume): Started overcloud-controller-0

Failed Actions:
* rabbitmq-bundle-docker-0_stop_0 on overcloud-controller-0 'unknown error' (1): call=93, status=Timed Out, exitreason='none',
    last-rc-change='Fri Nov 17 13:55:35 2017', queued=0ms, exec=20023ms
* rabbitmq-bundle-docker-1_stop_0 on overcloud-controller-0 'unknown error' (1): call=94, status=Timed Out, exitreason='none',
    last-rc-change='Fri Nov 17 13:55:35 2017', queued=0ms, exec=20037ms
* galera-bundle-docker-0_stop_0 on overcloud-controller-0 'unknown error' (1): call=96, status=Timed Out, exitreason='none',
    last-rc-change='Fri Nov 17 13:55:35 2017', queued=0ms, exec=20035ms

We fix this by adding the docker service to
resource-agents-deps.target.wants which is the recommended method to
make sure non pacemaker managed resources come up before pacemaker
during a start and get stopped after pacemaker's service stop:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/s1-nonpacemakerstartup-haar

We make this change conditional on docker being enabled, and we also
make sure that it is applied only after the docker package is
installed, in order to cover split stack deployments as well.

With this change we were able to restart nodes without
observing any timeouts during stop or stopped resources
at startup.

Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>

Change-Id: I6a4dc3d4d4818f15e9b7e68da3eb07e54b0289fa
Closes-Bug: #1733348
2017-11-21 09:44:57 +01:00
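The fix in feca86b730 amounts to placing a symlink to the docker unit inside the `resource-agents-deps.target.wants` directory. The sketch below illustrates that idea only; it is not the Puppet code from the commit. The helper name `add_dependency` and the directory arguments are assumptions, and the demo writes into a temporary directory rather than the real systemd configuration directory.

```python
import os
import tempfile
from pathlib import Path


def add_dependency(unit_name: str, systemd_dir: str, unit_dir: str) -> Path:
    """Symlink a unit into resource-agents-deps.target.wants (idempotent).

    systemd_dir and unit_dir are assumptions for illustration, for example
    /etc/systemd/system and /usr/lib/systemd/system on a real node.
    """
    wants_dir = Path(systemd_dir) / "resource-agents-deps.target.wants"
    wants_dir.mkdir(parents=True, exist_ok=True)
    link = wants_dir / unit_name
    if not link.is_symlink():
        # A dangling link is acceptable: the unit may not be installed yet.
        link.symlink_to(Path(unit_dir) / unit_name)
    return link


def demo() -> str:
    """Create the link under a temporary directory and return its target."""
    with tempfile.TemporaryDirectory() as tmp:
        link = add_dependency("docker.service", tmp, "/usr/lib/systemd/system")
        add_dependency("docker.service", tmp, "/usr/lib/systemd/system")  # second call is a no-op
        return os.readlink(link)


if __name__ == "__main__":
    print(demo())
```

On a real node, systemd would need to reload its configuration before the new dependency takes effect.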
Jenkins 8ca7a1e390 Merge "Ensure hiera step value is an integer" 2017-06-16 23:25:23 +00:00
Michele Baldessari 332755c0fd Only set the stonith property on the pacemaker_master node
It makes little sense to enforce the stonith property on remote nodes and/or
all cluster nodes. We can just enforce it once on the pacemaker_master
node as it is a cluster-wide property anyway. We can also remove the
tripleo::fencing -> pacemaker::stonith constraint in the pacemaker
remote profile now as the fencing stuff happens on step 5 anyway and
the property is set at step 1.

While the previous approach works in general, it creates extra CIB
changes for nothing and slows down the deployment.

Change-Id: Ifef08033043a4cc90a6261e962d2fdecdf275650
Closes-Bug: #1696336
2017-06-14 09:01:26 +02:00
Steve Baker 94f13e6608 Ensure hiera step value is an integer
The step is typically set with the hieradata setting an integer value:

  {"step": 1}

However it would be useful for the value to be a string so that
substitutions are possible, for example:

  {"step": "%{::step}"}

This change ensures the step parameter defaults to an integer by
calling Integer(hiera('step'))

This change was made by manually removing the undef defaults from
fluentd.pp, uchiwa.pp, and sensu.pp then bulk updating with:

    find ./ -type f -print0 |xargs -0 sed -i "s/= hiera('step')/= Integer(hiera('step'))/"

Change-Id: I8a47ca53a7dea8391103abcb8960a97036a6f5b3
2017-06-14 14:31:52 +12:00
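The bulk update in 94f13e6608 is a plain textual substitution, which can be mirrored in a few lines. This is a sketch of the same replacement applied to a string, assuming the exact spelling `= hiera('step')` used by the sed expression; the function name `wrap_step_in_integer` is hypothetical.

```python
def wrap_step_in_integer(manifest_text: str) -> str:
    """Apply the same substitution as the sed expression in the commit message.

    Running it twice is safe: after the first pass the text reads
    "= Integer(hiera('step'))", which no longer contains "= hiera('step')".
    """
    return manifest_text.replace("= hiera('step')", "= Integer(hiera('step'))")


if __name__ == "__main__":
    before = "  $step = hiera('step'),"
    after = wrap_step_in_integer(before)
    print(after)
    # A second pass leaves the text unchanged.
    print(wrap_step_in_integer(after) == after)
```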
Chris Jones 19d177c182 Add support for autofencing to Pacemaker Remote.
We now configure stonith devices for Pacemaker Remote nodes.

Change-Id: I87c60bd56feac6dedc00a3c458b805aa9b71d9ce
Depends-On: Ifb4d19a6b9920b0e340555d6441878c7234eb197
Partial-Bug: #1686115
2017-04-26 10:21:09 +01:00
Michele Baldessari 25b327c9a0 pacemaker remote profile support
This support enables a base profile called pacemaker_remote which will
allow the operator to automatically configure the pacemaker_remote
service on such nodes. This manifest also automatically adds any
pacemaker_remote nodes to the pacemaker cluster.

Depends-On: I0c01ecb7df1a0f9856fdc866b9d06acf0283fa4f
Depends-On: Ic0488f4fc63e35b9aede60fae1e2cab34b1fbdd5
Change-Id: I92953afcc7d536d387381f08164cae8b52f41605
2017-01-24 15:46:51 +01:00