Commit Graph

51 Commits

Author SHA1 Message Date
Ghanshyam Mann e06f50cb06 Retire TripleO: remove repo content
The TripleO project is retiring:
- https://review.opendev.org/c/openstack/governance/+/905145

This commit removes the content of this project repo.

Change-Id: I73df79a8698625815ea4e3099904da448a49887e
2024-02-24 11:42:30 -08:00
Takashi Kajinami 8427725125 Pacemaker: Replace hiera by lookup (2)
The hiera function is deprecated and does not work with the latest
hieradata version 5. It should be replaced by the new lookup
function[1].

[1] https://puppet.com/docs/puppet/7/hiera_automatic.html

With the lookup function, we can define the value type and merge
behavior, but these are kept at their defaults for now to limit the
scope of this change to a simple replacement. Adding a value type
might be useful to make sure the value is of the expected type
(especially when a boolean is expected), but we will revisit that later.

example:
lookup(<NAME>, [<VALUE TYPE>], [<MERGE BEHAVIOR>], [<DEFAULT VALUE>])

This covers the remaining manifests that set up pacemaker resources.
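
A minimal before/after sketch of the mechanical replacement (key name
illustrative):

  # before
  $user = hiera('tripleo::profile::pacemaker::rabbitmq_bundle::user', 'root')
  # after: value type and merge behavior left at their defaults (undef)
  $user = lookup('tripleo::profile::pacemaker::rabbitmq_bundle::user', undef, undef, 'root')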

Change-Id: I749b979a7333f68a646f36afa912603b1af0a943
2022-09-08 02:29:49 +09:00
Takashi Kajinami 6dc7cde6c6 RabbitMQ: Migrate environment/volumes definition
This change effectively migrates the environment and volumes used by
the rabbitmq pacemaker resource from puppet-tripleo to tht, so that we
can reduce the amount of logic we implement in the puppet layer.

Depends-On: https://review.opendev.org/854943
Change-Id: I5c895c6ad76d635f574824161f612eb102c673f4
2022-08-30 03:41:03 +00:00
Takashi Kajinami ae15e803e0 RabbitMQ: Simplify how to suppress error from pam_unix.so
This is a follow-up of 44985bd42d, and
replaces the implementation that suppresses errors from pam_unix.so
with the quiet option, as CentOS 7/RHEL 7 support was removed long ago.
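
A sketch of what the quiet-based approach can look like, assuming
stdlib's file_line and a pam_unix session entry in the container's
/etc/pam.d/su:

  # replace the pam_unix session entry with one carrying the quiet option
  file_line { 'su-pam_unix-quiet':
    path  => '/etc/pam.d/su',
    line  => 'session         required        pam_unix.so quiet',
    match => '^session\s+required\s+pam_unix\.so',
  }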

Change-Id: I620f96dc21c5bc85b14152e92c79b648c4a1b343
2022-08-07 02:07:59 +09:00
Takashi Kajinami ef041632ea Remove implementations for Docker support
... because Docker support has been removed from tht and these are no
longer used.

Depends-On: https://review.opendev.org/843755
Change-Id: I5719d06464ba2c1d37898b44f70ac5521ceaaf7e
2022-06-20 17:29:07 +09:00
Takashi Kajinami eac5caa96c Fix lint failures
We started seeing some lint failures which were not caught properly
before. This change fixes all these failures to unblock the lint job.

Change-Id: I8efbf29e0d153d48f114d8799ffb67e3c7a8185f
2022-01-31 16:25:16 +00:00
Bogdan Dobrelya 39aad09567 Make reply_ and _fanout queues non HA
Based on [0][1], for better performance of a rabbitmq cluster,
short-lived queues should not be replicated for HA. These are not only
amq.* but also reply_* for RPC calls and *_fanout for casts/notifications.

Note: there have been quite a few fixes in oslo.messaging to address
missing reply_ queues; the most recent was [2].

[0] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit
[1] http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016569.html
[2] https://review.opendev.org/q/Id5cddbefbe24ef100f1cc522f44430df77d217cb
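
A hand-run equivalent of such a policy, wrapped in a Puppet exec; the
exclusion pattern is illustrative, the real one lives in the resource
agent:

  exec { 'rabbitmq-ha-policy':
    path    => ['/usr/sbin', '/usr/bin'],
    command => 'rabbitmqctl set_policy ha-all "^(?!amq\.|reply_|.*_fanout_).*" \'{"ha-mode":"all"}\'',
  }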

Change-Id: Ibf95bb7029cbe7f7bf8823fe2e724e9cafbf31c6
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
2021-11-30 14:10:45 +01:00
Michele Baldessari fdca31a200 Bind mount the IPA crt when internal_tls is enabled
In order for later reviews to make use of the FreeIPA internal
CA we need to first bind mount it within the container.

We need to add a default in the hiera definition (/etc/ipa/ca.crt)
in order to break a cyclic dependency on the subsequent patches.
(THT child change will set the rabbitmq::ssl_cacert key)

Related-Bug: #1946374
Change-Id: Ib0236f9c086d520d0a27e3aa8b41927bc7b50c26
2021-10-09 09:16:55 +02:00
Cédric Jeanneret e91aac2822 Add missing "z" flag for specific mounts
Depending on the host's history, some directory contents may not have
the correct SELinux type. This has been seen with the OVN service
during a Queens -> Train FFU:

while the /var/lib/openvswitch/ovn directory had the correct
container_file_t type, some files in this location were typed with
openvswitch_var_lib_t, leading to errors during the deploy part of the
upgrade (after the OS upgrade, when the deploy is running on the cleaned
host).
The specific issue depends on the actual files with the wrong label, but
usually it involves a container crash/error, leading to a deploy error
and a manual intervention to correct the SELinux type in that location.

This situation may happen when the host was first deployed on Queens,
since it was using Docker. For the record, back then the Docker daemon
was configured to disable SELinux support, so it didn't really care
about labels; but the situation is different with Podman, where we have
full SELinux support at all levels of the OS, leading to the issue.

For the record, tripleo-heat-templates as well as tripleo-ansible set
"setype: container_file_t" on the directories, but we don't use
"recurse: true", to avoid performance issues - some locations might be
huge, and it would take too much time to relabel everything via ansible.

This patch aims to converge all the mounts to the same options, and
ensure no SELinux denial can prevent the containers from starting and
functioning.
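
A sketch of a converged storage map entry as consumed by the pacemaker
bundle resources (paths from the OVN example above; the z option asks
the runtime to relabel the content):

  $storage_maps = {
    'ovn-dbs-db' => {
      'source-dir' => '/var/lib/openvswitch/ovn',
      'target-dir' => '/run/ovn',
      'options'    => 'rw,z',  # z relabels files to container_file_t on mount
    },
  }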

Change-Id: Ic3e427156fc82c524c763d1896937fcc3c49fabb
Closes-Bug: #1943459
2021-09-14 12:59:31 +02:00
Michele Baldessari ae8e9c4912 Allow to use the upstream rabbitmq-server-ha OCF resource agent
We introduce a new hiera key in order to be able to use the upstream
rabbitmq-server-ha OCF resource [1].
For it to work inside bundles we need to have a rabbitmq-server
package inside the bundle which includes at least
https://github.com/rabbitmq/rabbitmq-server/pull/2853
and also we need to be using at least pacemaker-2.0.4-4.el8.

The rationale for this work is that the current rabbitmq-cluster
resource agent maintained under the ClusterLabs umbrella is
a cloned resource, which is limited in the number of actions
it can do in a number of situations (partition, failover, etc).

The upstream resource agent is a master/slave resource; it allows for
more expressive semantics in general and is preferable [2].

[1] https://github.com/rabbitmq/rabbitmq-server/blob/master/scripts/rabbitmq-server-ha.ocf
[2] https://github.com/lemenkov/pmk-rmq.md/blob/master/pmk-rmq.md
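
Illustratively, the new hiera key boils down to selecting which agent
the bundle is created with (flag and key names here are assumptions,
not the exact ones from this change):

  $use_upstream_ra = lookup('rabbitmq_server_ha_ocf', undef, undef, false)  # hypothetical key
  $ocf_agent_name  = $use_upstream_ra ? {
    true    => 'rabbitmq:rabbitmq-server-ha',  # master/slave agent shipped with rabbitmq-server
    default => 'heartbeat:rabbitmq-cluster',   # cloned agent from ClusterLabs
  }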

Co-authored-by: Bogdan Dobrelya <bdobrelia@redhat.com>
Change-Id: Ia273d0dbc668bbae4c6e9cb535bd68783faf0148
2021-07-31 20:56:46 +02:00
Takashi Kajinami 26ee01a0d9 Allow tuning timeouts for rabbitmq pacemaker resource
This change introduces several timeout parameters so that users can
tune operation timeouts for the rabbitmq resource in pacemaker.
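
A sketch of where such knobs end up, assuming puppet-pacemaker's
op_params string (names and values illustrative):

  pacemaker::resource::ocf { 'rabbitmq':
    ocf_agent_name => 'heartbeat:rabbitmq-cluster',
    op_params      => 'start timeout=200s stop timeout=200s monitor interval=10s timeout=40s',
  }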

Change-Id: Iaecdc0adb8455b2e660624f19a42e6dede5b931d
2021-06-09 08:26:34 +09:00
Takashi Kajinami f08d83de05 Fix lint errors with the latest lint packages
This change fixes the lint errors detected since we removed pins of
lint packages.
Note that this change also replaces the absolute name used to call
the tripleo::stunnel::service_proxy resource type, which is not yet
detected by the latest lint rules.

Closes-Bug: #1928079
Change-Id: I12ba801db92cb3df1d05f14f4c150ac765f0b874
2021-05-11 22:17:37 +09:00
Michele Baldessari d185cbf032 Allow OCF resources to be created with --force
While moving to running pcs commands on the host and off short-lived
containers, we are confronted with the issue that pcs usually checks
for the resource agent's existence on the host before creating it.
Since we'd rather avoid installing the needed resource agents on the
host (they live inside the container), we allow a new 'force_ocf'
parameter to be passed in those situations where we might need it.
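
A sketch of the plumbing, assuming the flag is simply forwarded so that
resource creation runs "pcs resource create ... --force" (only the
'force_ocf' spelling comes from this change):

  class tripleo_profile_example (Boolean $force_ocf = false) {
    pacemaker::resource::ocf { 'rabbitmq':
      ocf_agent_name => 'heartbeat:rabbitmq-cluster',
      force          => $force_ocf,  # assumed to add --force to the pcs call
    }
  }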

Depends-On: I20eb78a061a334b20f6b2274591c5d313a0af532

Related-Bug: #1863442
Change-Id: If9048196b5c03e3cfaba72f043b7f7275568bdc4
2020-05-08 08:12:28 +00:00
Takashi Kajinami 5f77bc71ac Remove unnecessary usage of hiera
We don't need to use hiera if the parameter is actually implemented
in the class.

Change-Id: Ia916707eaecb7a6d48f992ff2112fe8507544ee1
2020-04-21 23:30:39 +09:00
Michele Baldessari 06c4aa7446 Log stdout of HA containers
When podman dropped the journald log-driver, we rushed to move to the
supported k8s-file driver. This had the side effect of losing the
stdout logs of the HA containers.

In fact, previously we were easily able to troubleshoot haproxy startup
failures just by looking in the journal. These days, if haproxy fails
to start we have no traces whatsoever in the logs, because when a
container fails it gets stopped by pacemaker (and consequently removed)
and no logs remain on the system.

Tested as follows:
1) Redeploy a previously deployed overcloud that did not have the patch
and observe that we now log the startup of HA bundles in /var/log/containers/stdouts/*bundle.log

[root@controller-0 stdouts]# ls -l *bundle.log |grep -v -e init -e restart
-rw-------. 1 root root   16032 Apr 14 14:13 openstack-cinder-volume.log
-rw-------. 1 root root   19515 Apr 14 14:00 haproxy-bundle.log
-rw-------. 1 root root   10509 Apr 14 14:03 ovn-dbs-bundle.log
-rw-------. 1 root root    6451 Apr 14 14:00 redis-bundle.log

2) Deploy a composable HA overcloud from scratch with the patch above
and observe that we obtain the stdout on disk.

Note that most HA containers log to their usual on-host files just
fine; we are mainly missing the haproxy logs and the kolla startup
output of the HA containers.

Closes-Bug: #1872734

Change-Id: I4270b398366e90206adffe32f812632b50df615b
2020-04-15 20:10:03 +00:00
Alex Schultz a566d6b9b8 Add check for bootstrap_node for downcase
Downcase in puppet 6.14 throws an error if the input to it is Undef. We
can avoid this by checking for a value before trying to downcase.

See context https://review.rdoproject.org/r/#/c/26297/
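
A minimal sketch of the guard (hiera key illustrative):

  $bootstrap_node = hiera('rabbitmq_short_bootstrap_node_name', undef)
  if $bootstrap_node {
    $bootstrap_node_lc = downcase($bootstrap_node)
  } else {
    $bootstrap_node_lc = undef
  }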

Change-Id: Ib2e97060523a4198a14949a15c9171b56928699c
2020-04-07 14:51:41 -06:00
Damien Ciabrini e60351ee09 HA: fix rabbitmq readiness check for rabbitmq-server 3.8
In HA profiles, we wait for rabbitmq application readiness by
parsing the output of "rabbitmqctl status". This breaks with
rabbitmq-server 3.8 which changed the output of that command.

Fix our check by using a "rabbitmqctl eval" and by relying on
a stable function call rather than parsing output. This
approach works for rabbitmq-server 3.6 to 3.8.
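
A sketch of the eval-based probe wrapped in the usual retrying exec
(retry values illustrative; the Erlang call shown is the
stable-function style used elsewhere in this history, cf. 3a8c2b0dc7
below):

  exec { 'rabbitmq-ready':
    command   => 'rabbitmqctl eval "rabbit_nodes:is_running(node(), rabbit)." | grep -q true',
    path      => ['/usr/sbin', '/usr/bin', '/bin'],
    provider  => shell,
    tries     => 180,
    try_sleep => 10,
  }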

Change-Id: Id88d0aee74e4b26fd64bbc2da5d0c0fc4bbd6644
Co-Authored-By: Yatin Karel <ykarel@redhat.com>
Closes-Bug: #1864962
2020-02-27 16:41:44 +01:00
Michele Baldessari d766eb81a3 Make the bundle user configurable via hiera
Allow all bundles' --user option to be overridden, as some of them
might prefer switching to a non-root user when possible.
The ovn-dbs bundle is a bit special because it never specified any
user; hence we default that user to undef and do not set anything.

Tested as follows:
1. deployed an overcloud
2. patched it with this change
3. redeployed and then observed that no HA container has restarted at all
4. verified cinder-volume runs with root by default:
USER  PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root    1  0.0  0.0   4204   716 ?        Ss   09:01   0:00 dumb-init --single-child -- /bin/bash /usr/local/bin/kolla_start
root    7  0.7  0.7 912976 145760 ?       S    09:01   1:04 /usr/bin/python3 /usr/bin/cinder-volume --config-file /usr/share/cinder/cinder-dist.conf --config-file /etc/cinder/cinder.conf
root   71  0.1  0.6 925800 124640 ?       S    09:01   0:14 /usr/bin/python3 /usr/bin/cinder-volume --config-file /usr/share/cinder/cinder-dist.conf --config-file /etc/cinder/cinder.conf
5. added 'tripleo::profile::pacemaker::cinder::volume_bundle::bundle_user: cinder' to
   the templates and redeployed
6. Observed that cinder-volume got restarted and now runs with cinder
   user:
USER   PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
cinder   1  0.0  0.0   4204   804 ?        Ss   12:23   0:00 dumb-init --single-child -- /bin/bash /usr/local/bin/kolla_start
cinder   7  2.1  0.7 912976 145432 ?       S    12:23   0:04 /usr/bin/python3 /usr/bin/cinder-volume --config-file /usr/share/cinder/cinder-dist.conf --config-file /etc/cinder/cinder.conf
cinder  64  0.3  0.5 919908 118452 ?       S    12:23   0:00 /usr/bin/python3 /usr/bin/cinder-volume --config-file /usr/share/cinder/cinder-dist.conf --config-file /etc/cinder/cinder.conf

Change-Id: I985d0d192ef3accf7fdd31503348de80713fded4
2020-01-13 11:40:32 +01:00
Tobias Urdin 1523a4b804 Convert all class usage to relative names
Change-Id: Ib2ed745b682cf12f9469a5a64451adcabec400af
2019-12-08 23:23:25 +01:00
Michele Baldessari bad716070a Switch HA containers to k8s-file log-driver and make it a parameter
Currently in puppet-tripleo for the HA container we hardcode the following:
 options => "--user=root --log-driver=journald -e KOLLA_CONFIG_STRATEGY=COPY_ALWAYS${tls_priorities_real}",

Since at least podman has had some changes in terms of supported driver
backends (and bugs), it's best if we make this configurable. While
we're at it, we should also switch to k8s-file as the driver when
podman is being used, which is what all other containers use. When
docker is the default container_cli we will stick to journald as usual.
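
A sketch of the resulting selection (the options string mirrors the
snippet above; container_cli is how tht exposes the runtime choice):

  $container_cli = hiera('container_cli', 'podman')
  $log_driver = $container_cli ? {
    'podman' => 'k8s-file',
    default  => 'journald',
  }
  # tls_priorities suffix omitted for brevity, see f1a593b642 below
  $options = "--user=root --log-driver=${log_driver} -e KOLLA_CONFIG_STRATEGY=COPY_ALWAYS"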

Tested this on a Train environment and successfully verified that
we still see the correct logs in /var/log/containers/.../...

Change-Id: I5b1483826f816d11a064a937d59f9a8f468315a5
Closes-Bug: #1853517
2019-11-22 11:36:37 +01:00
Michele Baldessari 3a8c2b0dc7 Make the rabbitmq-ready exec more stringent
Currently we use the following command to determine if rabbit is
up and running *and* ready to service requests:
rabbitmqctl eval "rabbit_mnesia:is_clustered()." | grep -q true

Now, we have occasionally observed that rabbitmqctl policy commands
executed after said exec['rabbitmq-ready'] will fail.

One potential reason is that is_clustered() can return true *before*
the rabbit app is actually running. In fact we can see it does
return true even though the app is stopped:
()[root@controller-1 /]$ rabbitmqctl stop_app
Stopping rabbit application on node rabbit@controller-1 ...
()[root@controller-1 /]$ rabbitmqctl eval 'rabbit_mnesia:is_clustered().'
true

Let's switch to a combination of commands that check for the cluster to
be up *and* the rabbitmq app to be running:
()[root@controller-1 /]$ rabbitmqctl stop_app
Stopping rabbit application on node rabbit@controller-1 ...
()[root@controller-1 /]$ rabbitmqctl eval 'rabbit_nodes:is_running(node(), rabbit).'
false

Suggested-By: Bogdan Dobrelya <bdobreli@redhat.com>
Closes-Bug: #1835615

Change-Id: I29f779145a39cd16374a91626f7fae1581a18224
2019-08-19 19:56:35 +00:00
Michele Baldessari f1a593b642 Initial support for tls_priorities
We add initial support for specifying TLS priorities in pacemaker.
For bundles this happens via an env variable, because pacemaker_remote
is started as a normal process and /etc/sysconfig/pacemaker is not
sourced.

Tested on both queens and stein, via a deploy and a redeploy against an
existing cloud. Observed that:
A) We got PCMK_tls_priorities inside /etc/sysconfig/pacemaker with the
value that was passed in THT
B) Containers had the following env variable set:
  "PCMK_tls_priorities=normal",

The '-e' addition is a no-op when PCMK_tls_priorities is unset, so
that we do not change the signature of the resources and hence do not
needlessly restart the HA resources.
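
A sketch of how such a no-op '-e' addition can be computed (the
resulting string is what bad716070a above interpolates as
tls_priorities_real; the hiera key here is hypothetical):

  $tls_priorities = hiera('tripleo::pacemaker::tls_priorities', undef)  # hypothetical key
  $tls_priorities_real = $tls_priorities ? {
    undef   => '',
    default => " -e PCMK_tls_priorities=${tls_priorities}",
  }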

Depends-On: I1971810f6a90f244ed5ced972a5fe7fde29dde86
Change-Id: I703b5a429f48063474aace85bc45d948f5c91435
2019-07-27 07:59:45 +00:00
Jiri Stransky bac59f433b Fix rabbitmq staged upgrade
Fix the short name overriding, and add long name (fqdn) overriding.

Change-Id: Ia152aed696be15119ba5b75177ef82bc786c4b05
Partial-Bug: #1832588
2019-06-28 09:06:11 +00:00
Zuul 1e5c120f48 Merge "RabbitMQ: always allow promotion on HA queue during failover" 2019-06-14 19:40:52 +00:00
Michele Baldessari 610c8d8d41 RabbitMQ: always allow promotion on HA queue during failover
When the RabbitMQ cluster experiences a rolling restart of its peers,
the master of an HA queue fails over from one replica to another.

If there are messages sent to the HA queue while some rabbit
nodes are restarting, the latter will reconnect as unsynchronized
slaves. It can happen that during a rolling restart all rabbit
nodes reconnect as unsynchronized, which prevents RabbitMQ from
automatically electing a new master for failover. This has other
side effects on fanout queues and may prevent OpenStack
notifications from being consumed properly.

Change the HA policy to always allow a promotion, even when all
replicas are unsynchronized. When such a rare condition happens,
rely on the OpenStack clients to retry RPCs if they need to.

Closes-Bug: #1823305
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Change-Id: Id9bdd36aa0ee81424212e3a89185311817a15aee
2019-06-14 10:07:24 +02:00
Jiri Stransky 566703dc27 Fix RabbitMQ locale for CentOS 7 (Puppet part)
It seems that CentOS 7 does not have the C.UTF-8 locale. Since we need
a UTF-8-based locale, use en_US.UTF-8 instead.

Change-Id: I25d2b9a227a7c5de127bdfd9d2f387be9eea01e0
Partial-Bug: #1823062
2019-04-04 11:14:18 +02:00
Michele Baldessari a92d1fccc6 Force C.UTF-8 when dealing with rabbitmq
When we use rabbitmq 3.7 we might hit the following issue when running rabbitmqctl commands inside containers (as puppet does):

  Error: Failed to apply catalog: Cannot parse invalid user line: warning:
  the VM is running with native name encoding of latin1 which may cause
  Elixir to malfunction as it expects utf8. Please ensure your locale is
  set to UTF-8 (which can be verified by running "locale" in your shell)

This is fundamentally the tripleo version of
https://github.com/voxpupuli/puppet-rabbitmq/issues/671

This is a strict requirement coming from Elixir:
https://github.com/elixir-lang/elixir/issues/3548

Since containers do not have UTF-8 as a default we have this problem:
[root@overcloud-controller-0 ~]# podman exec -it rabbitmq-bundle-podman-0 sh
()[root@overcloud-controller-0 /]$ rabbitmqctl -q list_users
warning: the VM is running with native name encoding of latin1 which may cause Elixir to malfunction as it expects utf8. Please ensure your locale is set to UTF-8 (which can be verified by running "locale" in your shell)
user tags
guest [administrator]
()[root@overcloud-controller-0 /]$ locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Related-Bug: #1822673
Change-Id: I21ef2e7862f3e5e21812d342b1681f8d5f7f005d
2019-04-02 14:25:07 +02:00
Sofer Athlan-Guyot 48b1775e35 Extra variables to reprovision pacemaker cluster one node at a time.
For the upgrade we have to re-provision the controller cluster, one
node at a time.

Using extra override variables set in hiera, we are able to specify to
pacemaker which nodes should be added to the cluster.

Change-Id: I2f6ef4679265718fbbe8726ee6c81832bc468f3e
Implements: blueprint upgrades-with-os
2019-02-12 10:20:48 +01:00
Oliver Walsh 035de7493d cell_v2 multi-cell
- move nova dbsync from nova-api to nova-conductor
  - nova db is more tightly coupled to conductor/computes
  - we don't have a nova-api services on a CellController
  - super-conductor on Controller will sync cell0 db
- when deploying an additional cell
  - duplicate service node name hiera for transport_urls on cell stack
  - nova -> oslo_messaging_rpc_cell_node_names
  - neutron agent -> oslo_messaging_rpc_node_names
  - rabbit -> rabbit nodes are cell controllers

bp tripleo-multicell-basic

Co-Authored-By: Martin Schuppert <mschuppert@redhat.com>

Change-Id: I79c1080605611c5c7748a28d2afcc9c7275a2e5d
2019-02-05 09:53:50 +01:00
Michele Baldessari 736d69dad9 Add retries to HA bundles
The retry is needed in a composable HA environment because two nodes
might be modifying the CIB at the same time, so we need to retry more
than once to get the freshest CIB, modify it and push it back. Currently
all HA resources have it but we did not add it to the bundles. While it
is a rare race, we should still plug it.
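
A sketch, assuming the same tries/try_sleep knobs that the other
pacemaker resources use (parameter and value choices illustrative):

  pacemaker::resource::bundle { 'rabbitmq-bundle':
    image     => '192.168.24.1:8787/tripleoupstream/centos-binary-rabbitmq:latest',
    replicas  => 3,
    tries     => 20,  # re-read and re-push the CIB if another node modified it concurrently
    try_sleep => 3,
  }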

Change-Id: Ib9d9c76c83f103e329a9c575ae5c110d5ad3c048
Closes-Bug: #1809223
2019-01-04 12:51:51 +00:00
Michele Baldessari 44985bd42d Remove some of the excessive rabbitmq bundle logging
By removing the optional pam-systemd session line we get rid of the
following line:
pam_systemd(su:session): Failed to connect to system bus: No such file or directory

It is useless inside a container anyway since the pam_systemd module
registers user sessions.

By adding a sufficient pam_succeed_if call for when the user belongs to
the rabbitmq group, we get rid of the following spurious logs:
Oct 23 13:52:52 overcloud-controller-0 su: pam_unix(su:session): session opened for user rabbitmq by (uid=0)
Oct 23 13:52:54 overcloud-controller-0 su: pam_unix(su:session): session closed for user rabbitmq

We do not need this inside a container anyway. In the future (with
pam_unix 1.2.0 and onwards) we will be able to use the quiet option
instead.
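
A sketch of the added PAM line via stdlib's file_line (the exact
module arguments in the change may differ; ordering matters, the line
must precede the pam_unix session entry to short-circuit it):

  file_line { 'su-skip-pam_unix-for-rabbitmq':
    path => '/etc/pam.d/su',
    line => 'session         sufficient      pam_succeed_if.so quiet use_uid user ingroup rabbitmq',
  }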

Depends-On: Ic0789da4645a4ee186d82ad7d943de78d4d5c443

Change-Id: Icd199ca4ce4848c971488d8ab69e668add86b150
Related-Bug: #1806451
2018-12-11 16:17:16 +00:00
Michele Baldessari 177d951be3 Allow the container backend to be configurable
We added a container backend in puppet-pacemaker via
Ia4a7b58d14d80e85d51e98acec1aad2ba90b69de. Let's now
let tripleo override it when needed.

Tested this via some hiera key overrides and it works correctly.

Change-Id: I610923327462b901840131316a4984c8fe98faaa
2018-11-15 20:41:24 +01:00
Michele Baldessari c372f5e6a1 Remove restart_flag leftovers for bundles
Since the introduction of I62870c055097569ceab2ff67cf0fe63122277c5b
"Introduce restart_bundle containers to detect config changes and
restart pacemaker resources" we actually use paunch to detect any
config changes (by verifying an md5 hash over the generated config
files of the service).

With this new way of detecting changes there is no need to use the
old 'tripleo::pacemaker::resource_restart_flag' method to restart
pcmk services.

Let's just remove this unused code.

Change-Id: Ib12dbe66575e3d54a8ec7d2c72c2b4619bc39b03
2018-10-18 05:46:21 +00:00
Michele Baldessari f2484a0bf9 Fix up property names in case of mixed case hostnames
When deploying a stack that contains mixed-case hostnames
the following error might be triggered:
Debug: try 15/20: /usr/sbin/pcs -f
/var/lib/pacemaker/cib/puppet-cib-backup20180405-8-1sqw3dc property set
--node TEST-STACK34-controller-1 redis-role=true
Debug: Error: Error: unable to set attribute redis-role
Could not map name=TEST-STACK34-controller-1 to a UUID
while the name in the cluster is test-stack34-controller-1

This used to work pre-bundles because we used the facter provided
$::hostname variable which was lower-cased for us. With bundles we
switched to setting cluster properties from the service bootstrap nodes
and so we used the '<service>_short_node_names' hiera key which might
contain mixed-case hostnames.

In order to fix this we just downcase() the short_node_names hiera
string that we loop on so we can get the same behaviour we had on bare
metal.
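
A sketch of the loop (the hiera key matches the example below;
pacemaker::property as used throughout puppet-tripleo):

  hiera('rabbitmq_short_node_names', []).each |String $node| {
    pacemaker::property { "rabbitmq-role-${downcase($node)}":
      property => 'rabbitmq-role',
      value    => true,
      node     => downcase($node),
    }
  }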

Tested on an env with mixed-case hostnames:
[root@uppercaseovercloud-controller-0 keystone]# hiera -c /etc/puppet/hiera.yaml rabbitmq_short_node_names
["UPPERCASEOverCloud-controller-0",
 "UPPERCASEOverCloud-controller-1",
 "UPPERCASEOverCloud-controller-2"]

Cluster pcs properties were set correctly:
[root@uppercaseovercloud-controller-0 keystone]# pcs property |grep rabbitmq
 uppercaseovercloud-controller-0: galera-role=true haproxy-role=true rabbitmq-role=true redis-role=true rmq-node-attr-last-known-rabbitmq=rabbit@uppercaseovercloud-controller-0
 uppercaseovercloud-controller-1: galera-role=true haproxy-role=true rabbitmq-role=true redis-role=true rmq-node-attr-last-known-rabbitmq=rabbit@uppercaseovercloud-controller-1
 uppercaseovercloud-controller-2: galera-role=true haproxy-role=true rabbitmq-role=true redis-role=true rmq-node-attr-last-known-rabbitmq=rabbit@uppercaseovercloud-controller-2

Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Depends-On: Ie240b8a4217827dd8ade82479a828817d63143ba
Closes-bug: #1773219
Change-Id: I5bd49c4a1b13b2310f8a1173aa6b86abfa5dab3d
2018-05-28 10:28:14 +02:00
Zuul 1a73b868ce Merge "Support separate oslo.messaging services for RPC and Notifications" 2018-04-29 13:02:17 +00:00
Zuul 408db62e22 Merge "Support both rabbitmq and oslo.messaging service nodes" 2018-04-07 00:39:46 +00:00
Andrew Smith c04557fba4 Support separate oslo.messaging services for RPC and Notifications
This commit introduces separate oslo.messaging services in place of
a single rabbitmq server. This enables the separation of rpc and
notifications, the continued use of a single rabbitmq server, as well
as the use of alternative oslo.messaging drivers/backends.

This patch:
* adds oslo_messaging_* hiera parameters
* updates the rabbitmq and qdrouterd services
* adds a release note

Depends-On: I03e99d35ed043cf11bea9b7462058bd80f4d99da
Depends-On: I934561612d26befd88a9053262836b47bdf4efb0
Change-Id: Ie181a92731e254b7f613ad25fee6cc37e985c315
2018-03-20 12:55:02 -04:00
Jiri Stransky d8d86cfe68 Conventional log directories for pacemaker bundles
Use /var/log/containers/<service> instead of /var/log/<service>, as
the rest of the containerized services.

Change-Id: Id5760c16260de991ff95168c76186edc113752c8
Depends-On: Icb311984104eac16cd391d75613517f62ccf6696
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Closes-Bug: #1731969
2018-03-19 12:55:12 +00:00
Andrew Smith 79ccad4b8d Support both rabbitmq and oslo.messaging service nodes
This commit selects either the rabbitmq hosts or the hosts associated
with the oslo.messaging rpc and notify services. This is required for
the transition of t-h-t to the use of the separated oslo.messaging
service backends.

This patch:
* select rpc and notify hosts from rabbitmq or oslo_messaging
* modify qdrouterd inter-router link port
* update qdr unit spec
* add release note

Needed-By: I934561612d26befd88a9053262836b47bdf4efb0
Change-Id: I154e2fe6f66b296b9b643627d57696e5178e1815
2018-03-16 18:16:42 -04:00
Damien Ciabrini 1cfecc39dc Fix rabbitmq-ready check for single node HA deployments
The current rabbitmq-ready exec waits for rabbitmq to become clustered
before it allows user creation. Unfortunately this doesn't work when
the deployment contains a single node, because rabbit doesn't trigger
the clustering mode at all.

Set the exec test according to the number of rabbit nodes, in order
to check for cluster state only when necessary.
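
A sketch of the gating; the probes shown here reuse the eval commands
that appear in other commits of this history, the actual commands in
this change may differ:

  $rabbit_nodes = hiera('rabbitmq_node_names', [])
  if length($rabbit_nodes) > 1 {
    $check = 'rabbitmqctl eval "rabbit_mnesia:is_clustered()." | grep -q true'
  } else {
    $check = 'rabbitmqctl eval "rabbit_nodes:is_running(node(), rabbit)." | grep -q true'
  }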

Closes-Bug: #1741345

Change-Id: I24e5e344b7f657ce5d42a7c7c45be7b5ed5e6445
Co-Authored-By: John Eckersberg <jeckersb@redhat.com>
2018-01-05 10:14:48 +00:00
Michele Baldessari 2f33d74173 Fix up the rabbitmq-ready check
So the current rabbitmq-ready exec has a few unexpected problems:

1) The notify mechanism is not being called, but after discussion
we're comfortable calling this all the time, just like we do
for galera.
2) Calling rabbitmqctl inside a container is problematic because
the mere invocation of the cluster_status command will actually
spawn an epmd process, which will take the epmd port and
subsequently make the rabbitmq-bundle started by pacemaker fail to
form a cluster.

For this reason (working around the rabbitmqctl issue is potentially
doable once we upgrade to erlang 19.x, but not with older versions),
it is vital that this container gets spawned with /bin/epmd nooped
to /bin/true.

We now only proceed after rabbit tells us that it is part of a cluster.
Just checking for rabbit being up is not enough because if the user gets
created before the node joins a cluster, it might not be replicated
(depending on the timing).

Partial-Bug: #1739026

Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Co-Authored-By: John Eckersberg <jeckersb@redhat.com>
Change-Id: I54c541d86782665ae0f689428a16edc155f87993
Depends-On: Ie74a13a6c8181948900ea0de8ee9717f76f3ce79
2017-12-20 07:24:29 +01:00
Michele Baldessari b2dc580a3f Make sure rabbitmq is fully up before creating any rabbitmq resources
Right now after creating the rabbitmq pacemaker resource, we have no
guarantee that rabbit will be up. Let's add the same mechanism we use
today with the galera-ready exec resource. This gives us the guarantee
that once the resource has been created it is up and we can actually
create rabbitmq users (some 3rd party plugins do that, not stock
TripleO).

Specifically, we probe that the '{rabbit,' app shows up in the status,
so we can guarantee that rabbit is running before invoking any other
rabbitmqctl commands.

Change-Id: Ib37eb2e591f97de54ee6449817ae8d70c6541753
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
2017-11-17 07:26:55 +01:00
Alex Schultz 3c58543678 Revert "Revert "Set meta container-attribute-target=host attribute""
This reverts commit 1681d3bceb. 

NOTE: This needs to be tested against scenario004-containers before merging.

This is needed because when we run bundles we actually
want to store attributes on a per-node basis and not on a per-bundle
basis. By activating this attribute, pacemaker will pass
some extra OCF_RESKEY_CRM_meta attributes that will help us in this
decision.
We can merge this once we have packages for pacemaker and
resource-agents releases that contain the necessary fixes.

Proper pacemaker and resource-agents packages are now in the repo [1],
so we can merge this and backport it to pike.

[1] https://buildlogs.centos.org/centos/7/cloud/x86_64/openstack-pike/

Closes-Bug: #1713007

Change-Id: Ie968470126833939c19223f04db29556e550673d
2017-10-30 16:12:46 +00:00
John Trowbridge 1681d3bceb Revert "Set meta container-attribute-target=host attribute"
This patch broke the containers scenario004 test because it relies on a
newer mariadb container than has actually passed CI at this time.

To revert this revert, we need to make sure we test
scenario004-containers against that patch.

This reverts commit 6bcb011723.

Closes-Bug: 1721497

Change-Id: I34c7c388eed94db1735c45e26661a0af8cdce8e9
2017-10-06 13:03:04 +00:00
Michele Baldessari 6bcb011723 Set meta container-attribute-target=host attribute
This is needed because when we run bundles we actually
want to store attributes on a per-node basis and not on a per-bundle
basis. By activating this attribute, pacemaker will pass
some extra OCF_RESKEY_CRM_meta attributes that will help us in this
decision.

We can merge this once we have packages for pacemaker and
resource-agents releases that contain the necessary fixes.

Proper pacemaker and resource-agents packages are now in the repo [1],
so we can merge this and backport it to pike.

[1] https://buildlogs.centos.org/centos/7/cloud/x86_64/openstack-pike/

Closes-Bug: #1713007

Change-Id: I0dd06e953b4c81f217d0f4199b2337e4c3358086
2017-09-28 14:05:21 +02:00
Damien Ciabrini 86a3261b4d Enable TLS configuration for containerized RabbitMQ
In non-containerized deployments, RabbitMQ can be configured to use TLS for
serving and mirroring traffic.

Fix the creation of the rabbitmq bundle resource to enable TLS when
configured. The key and cert are passed like the other configuration
files and must be copied by Kolla at container startup.

Change-Id: Ia64d79462de7012e5bceebf0ffe478a1cccdd6c9
Partial-Bug: #1709558
2017-08-09 07:51:58 +00:00
Michele Baldessari 1da0b51ecc Fix up the control-port for rabbitmq bundles
Mistakenly this was set to 3121, which is the same port that pacemaker
remote uses. Move this to 3122, which was the plan all along.

Also fix a wrong port comment in redis and mysql at the same time.

Change-Id: Iccca6a53a769570443091577c7d86f47119d9cbb
2017-07-21 10:46:48 +02:00
Martin André 1e90178298 Leverage kolla config_files to copy config into containers
This solves a problem with bind-mounts when the containers are holding
file descriptors open.

At the same time this makes the template more robust to puppet changes
since new config files will be available in the containers without
needing to update the templates.

Closes-Bug: #1698323
Change-Id: I857c94ba5f7f064d7c58df621ec5d477654b9166
Depends-On: I78dcec741a941dc21adba33ba33a6dc6ff1d217c
2017-07-12 09:56:56 +00:00
Steve Baker 94f13e6608 Ensure hiera step value is an integer
The step is typically set with the hieradata setting an integer value:

  {"step": 1}

However it would be useful for the value to be a string so that
substitutions are possible, for example:

  {"step": "%{::step}"}

This change ensures the step parameter defaults to an integer by
calling Integer(hiera('step'))

This change was made by manually removing the undef defaults from
fluentd.pp, uchiwa.pp, and sensu.pp then bulk updating with:

    find ./ -type f -print0 |xargs -0 sed -i "s/= hiera('step')/= Integer(hiera('step'))/"

Change-Id: I8a47ca53a7dea8391103abcb8960a97036a6f5b3
2017-06-14 14:31:52 +12:00
Michele Baldessari b10adec303 Make sure the resource bundles use a location_rule
In composable HA we bind resources to nodes that have special
node properties. We need to do this for bundle resources as well,
otherwise there is a potential race where, during a small window of
time, the bundle might be started on nodes where it is not supposed
to run.
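
A sketch of such a location rule as passed to the bundle resources
(structure as used elsewhere in puppet-tripleo; the expression matches
the role node properties seen in f2484a0bf9 above):

  $location_rule = {
    'resource_discovery' => 'exclusive',
    'score'              => 0,
    'expression'         => ['rabbitmq-role eq true'],
  }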

Tested with the depends-on and correctly obtained a containerized
composable HA deployment:

Docker container set: rabbitmq-bundle
[192.168.24.1:8787/tripleoupstream/centos-binary-rabbitmq:latest]
  rabbitmq-bundle-0    (ocf::heartbeat:rabbitmq-cluster):      Started overcloud-rabbit-0
  rabbitmq-bundle-1    (ocf::heartbeat:rabbitmq-cluster):      Started overcloud-rabbit-1
  rabbitmq-bundle-2    (ocf::heartbeat:rabbitmq-cluster):      Started overcloud-rabbit-2
Docker container set: galera-bundle
[192.168.24.1:8787/tripleoupstream/centos-binary-mariadb:latest]
  galera-bundle-0      (ocf::heartbeat:galera):        Master overcloud-galera-0
  galera-bundle-1      (ocf::heartbeat:galera):        Master overcloud-galera-1
  galera-bundle-2      (ocf::heartbeat:galera):        Master overcloud-galera-2
Docker container set: redis-bundle
[192.168.24.1:8787/tripleoupstream/centos-binary-redis:latest]
  redis-bundle-0       (ocf::heartbeat:redis): Master overcloud-controller-0
  redis-bundle-1       (ocf::heartbeat:redis): Slave overcloud-controller-1
  redis-bundle-2       (ocf::heartbeat:redis): Slave overcloud-controller-2
ip-192.168.24.11       (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0
ip-10.0.0.7    (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1
ip-172.16.2.11 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-2
ip-172.16.2.9  (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0
ip-172.16.1.6  (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1
ip-172.16.3.7  (ocf::heartbeat:IPaddr2):       Started overcloud-controller-2
Docker container set: haproxy-bundle
[192.168.24.1:8787/tripleoupstream/centos-binary-haproxy:latest]
  haproxy-bundle-docker-0      (ocf::heartbeat:docker):        Started overcloud-controller-0
  haproxy-bundle-docker-1      (ocf::heartbeat:docker):        Started overcloud-controller-1
  haproxy-bundle-docker-2      (ocf::heartbeat:docker):        Started overcloud-controller-2

Depends-On: I44449861cbfe56304b8829c9ca10fd648353b3ae
Change-Id: I48fb490040497ba08cae19937159c0efdf99e3f8
2017-06-09 21:18:27 +02:00