Commit Graph

51 Commits

Author SHA1 Message Date
Ghanshyam Mann e06f50cb06 Retire TripleO: remove repo content
The TripleO project is retiring:
- https://review.opendev.org/c/openstack/governance/+/905145

This commit removes the content of this project repo.

Change-Id: I73df79a8698625815ea4e3099904da448a49887e
2024-02-24 11:42:30 -08:00
Takashi Kajinami 8427725125 Pacemaker: Replace hiera by lookup (2)
The hiera function is deprecated and does not work with the latest
hieradata version 5. It should be replaced by the new lookup
function[1].

[1] https://puppet.com/docs/puppet/7/hiera_automatic.html

With the lookup function, we can define the value type and merge
behavior, but these are kept at their defaults for now to limit the
scope of this change to a simple replacement. Adding a value type
might be useful to make sure the value is of the expected type
(especially when a boolean is expected), but we will revisit that later.

example:
lookup(<NAME>, [<VALUE TYPE>], [<MERGE BEHAVIOR>], [<DEFAULT VALUE>])

This covers the remaining manifests that set up pacemaker resources.
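
A minimal before/after sketch of the mechanical replacement (key name
illustrative):

  # before
  $user = hiera('tripleo::profile::pacemaker::rabbitmq_bundle::user', 'root')
  # after: value type and merge behavior left at their defaults (undef)
  $user = lookup('tripleo::profile::pacemaker::rabbitmq_bundle::user', undef, undef, 'root')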

Change-Id: I749b979a7333f68a646f36afa912603b1af0a943
2022-09-08 02:29:49 +09:00
Takashi Kajinami 6dc7cde6c6 RabbitMQ: Migrate environment/volumes definition
This change effectively migrates the environment and volumes used by
the rabbitmq pacemaker resource from puppet-tripleo to tht, so that we
can reduce the amount of logic we implement in the puppet layer.

Depends-On: https://review.opendev.org/854943
Change-Id: I5c895c6ad76d635f574824161f612eb102c673f4
2022-08-30 03:41:03 +00:00
Takashi Kajinami ae15e803e0 RabbitMQ: Simplify how to suppress error from pam_unix.so
This is a follow-up of 44985bd42d, and
replaces the implementation that suppresses errors from pam_unix.so
with the quiet option, as CentOS 7/RHEL 7 support was removed long ago.
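
A sketch of what the quiet-based approach can look like, assuming
stdlib's file_line and a pam_unix session entry in the container's
/etc/pam.d/su:

  # replace the pam_unix session entry with one carrying the quiet option
  file_line { 'su-pam_unix-quiet':
    path  => '/etc/pam.d/su',
    line  => 'session         required        pam_unix.so quiet',
    match => '^session\s+required\s+pam_unix\.so',
  }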

Change-Id: I620f96dc21c5bc85b14152e92c79b648c4a1b343
2022-08-07 02:07:59 +09:00
Takashi Kajinami ef041632ea Remove implementations for Docker support
... because Docker support has been removed from tht and these are no
longer used.

Depends-On: https://review.opendev.org/843755
Change-Id: I5719d06464ba2c1d37898b44f70ac5521ceaaf7e
2022-06-20 17:29:07 +09:00
Takashi Kajinami eac5caa96c Fix lint failures
We started seeing some lint failures which were not caught properly
before. This change fixes all these failures to unblock the lint job.

Change-Id: I8efbf29e0d153d48f114d8799ffb67e3c7a8185f
2022-01-31 16:25:16 +00:00
Bogdan Dobrelya 39aad09567 Make reply_ and _fanout queues non HA
Based on [0][1], for better performance of a rabbitmq cluster,
short-lived queues should not be replicated for HA. These are not only
amq.* but also reply_* for RPC calls and *_fanout for casts/notifications.

Note: there have been quite a few fixes in oslo.messaging to address
missing reply_ queues; the most recent was [2].

[0] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit
[1] http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016569.html
[2] https://review.opendev.org/q/Id5cddbefbe24ef100f1cc522f44430df77d217cb
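
A hand-run equivalent of such a policy, wrapped in a Puppet exec; the
exclusion pattern is illustrative, the real one lives in the resource
agent:

  exec { 'rabbitmq-ha-policy':
    path    => ['/usr/sbin', '/usr/bin'],
    command => 'rabbitmqctl set_policy ha-all "^(?!amq\.|reply_|.*_fanout_).*" \'{"ha-mode":"all"}\'',
  }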

Change-Id: Ibf95bb7029cbe7f7bf8823fe2e724e9cafbf31c6
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
2021-11-30 14:10:45 +01:00
Michele Baldessari fdca31a200 Bind mount the IPA crt when internal_tls is enabled
In order for later reviews to make use of the FreeIPA internal
CA we need to first bind mount it within the container.

We need to add a default in the hiera definition (/etc/ipa/ca.crt)
in order to break a cyclic dependency on the subsequent patches.
(THT child change will set the rabbitmq::ssl_cacert key)

Related-Bug: #1946374
Change-Id: Ib0236f9c086d520d0a27e3aa8b41927bc7b50c26
2021-10-09 09:16:55 +02:00
Cédric Jeanneret e91aac2822 Add missing "z" flag for specific mounts
Depending on the host's history, some directory contents may not have
the correct SELinux type. This has been seen with the OVN service
during a Queens -> Train FFU:

while the /var/lib/openvswitch/ovn directory had the correct
container_file_t type, some files in this location were typed with
openvswitch_var_lib_t, leading to errors during the deploy part of the
upgrade (after the OS upgrade, when the deploy is running on the cleaned
host).
The specific issue depends on the actual files with the wrong label, but
usually it involves a container crash/error, leading to a deploy error
and a manual intervention to correct the SELinux type in that location.

This situation may happen when the host was first deployed on Queens,
since it was using Docker. For the record, back then the Docker daemon
was configured to disable SELinux support, so it didn't really care
about labels; but the situation is different with Podman, where we have
full SELinux support at all levels of the OS, leading to the issue.

For the record, tripleo-heat-templates as well as tripleo-ansible set
"setype: container_file_t" on the directories, but we don't use
"recurse: true", to avoid performance issues - some locations might be
huge, and it would take too much time to relabel everything via ansible.

This patch aims to converge all the mounts to the same options, and
ensure no SELinux denial can prevent the containers from starting and
functioning.
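
A sketch of a converged storage map entry as consumed by the pacemaker
bundle resources (paths from the OVN example above; the z option asks
the runtime to relabel the content):

  $storage_maps = {
    'ovn-dbs-db' => {
      'source-dir' => '/var/lib/openvswitch/ovn',
      'target-dir' => '/run/ovn',
      'options'    => 'rw,z',  # z relabels files to container_file_t on mount
    },
  }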

Change-Id: Ic3e427156fc82c524c763d1896937fcc3c49fabb
Closes-Bug: #1943459
2021-09-14 12:59:31 +02:00
Michele Baldessari ae8e9c4912 Allow to use the upstream rabbitmq-server-ha OCF resource agent
We introduce a new hiera key in order to be able to use the upstream
rabbitmq-server-ha OCF resource [1].
For it to work inside bundles we need to have a rabbitmq-server
package inside the bundle which includes at least
https://github.com/rabbitmq/rabbitmq-server/pull/2853
and also we need to be using at least pacemaker-2.0.4-4.el8.

The rationale for this work is that the current rabbitmq-cluster
resource agent maintained under the ClusterLabs umbrella is
a cloned resource, which is limited in the number of actions
it can do in a number of situations (partition, failover, etc).

The upstream resource agent is a master/slave resource; it allows for
more expressive semantics in general and is preferable [2].

[1] https://github.com/rabbitmq/rabbitmq-server/blob/master/scripts/rabbitmq-server-ha.ocf
[2] https://github.com/lemenkov/pmk-rmq.md/blob/master/pmk-rmq.md
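
Illustratively, the new hiera key boils down to selecting which agent
the bundle is created with (flag and key names here are assumptions,
not the exact ones from this change):

  $use_upstream_ra = lookup('rabbitmq_server_ha_ocf', undef, undef, false)  # hypothetical key
  $ocf_agent_name  = $use_upstream_ra ? {
    true    => 'rabbitmq:rabbitmq-server-ha',  # master/slave agent shipped with rabbitmq-server
    default => 'heartbeat:rabbitmq-cluster',   # cloned agent from ClusterLabs
  }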

Co-authored-by: Bogdan Dobrelya <bdobrelia@redhat.com>
Change-Id: Ia273d0dbc668bbae4c6e9cb535bd68783faf0148
2021-07-31 20:56:46 +02:00
Takashi Kajinami 26ee01a0d9 Allow tuning timeouts for rabbitmq pacemaker resource
This change introduces several timeout parameters so that users can
tune operation timeouts for the rabbitmq resource in pacemaker.
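
A sketch of where such knobs end up, assuming puppet-pacemaker's
op_params string (names and values illustrative):

  pacemaker::resource::ocf { 'rabbitmq':
    ocf_agent_name => 'heartbeat:rabbitmq-cluster',
    op_params      => 'start timeout=200s stop timeout=200s monitor interval=10s timeout=40s',
  }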

Change-Id: Iaecdc0adb8455b2e660624f19a42e6dede5b931d
2021-06-09 08:26:34 +09:00
Takashi Kajinami f08d83de05 Fix lint errors with the latest lint packages
This change fixes the lint errors detected since we removed pins of
lint packages.
Note that this change also replaces the absolute name used to call
the tripleo::stunnel::service_proxy resource type, which is not yet
detected by the latest lint rules.

Closes-Bug: #1928079
Change-Id: I12ba801db92cb3df1d05f14f4c150ac765f0b874
2021-05-11 22:17:37 +09:00
Michele Baldessari d185cbf032 Allow OCF resources to be created with --force
While moving to running pcs commands on the host and off short-lived
containers, we are confronted with the issue that pcs usually checks
for the resource agent's existence on the host before creating it.
Since we'd rather avoid installing the needed resource agents on the
host (they live inside the container), we allow a new 'force_ocf'
parameter to be passed in those situations where we might need it.
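
A sketch of the plumbing, assuming the flag is simply forwarded so that
resource creation runs "pcs resource create ... --force" (only the
'force_ocf' spelling comes from this change):

  class tripleo_profile_example (Boolean $force_ocf = false) {
    pacemaker::resource::ocf { 'rabbitmq':
      ocf_agent_name => 'heartbeat:rabbitmq-cluster',
      force          => $force_ocf,  # assumed to add --force to the pcs call
    }
  }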

Depends-On: I20eb78a061a334b20f6b2274591c5d313a0af532

Related-Bug: #1863442
Change-Id: If9048196b5c03e3cfaba72f043b7f7275568bdc4
2020-05-08 08:12:28 +00:00
Takashi Kajinami 5f77bc71ac Remove unnecessary usage of hiera
We don't need to use hiera if the parameter is actually implemented
in the class.

Change-Id: Ia916707eaecb7a6d48f992ff2112fe8507544ee1
2020-04-21 23:30:39 +09:00
Michele Baldessari 06c4aa7446 Log stdout of HA containers
When podman dropped the journald log-driver, we rushed to move to the
supported k8s-file driver. This had the side effect of losing the
stdout logs of the HA containers.

In fact, previously we were easily able to troubleshoot haproxy startup
failures just by looking in the journal. These days, if haproxy fails
to start we have no traces whatsoever in the logs, because when a
container fails it gets stopped by pacemaker (and consequently removed)
and no logs remain on the system.

Tested as follows:
1) Redeploy a previously deployed overcloud that did not have the patch
and observe that we now log the startup of HA bundles in /var/log/containers/stdouts/*bundle.log

[root@controller-0 stdouts]# ls -l *bundle.log |grep -v -e init -e restart
-rw-------. 1 root root   16032 Apr 14 14:13 openstack-cinder-volume.log
-rw-------. 1 root root   19515 Apr 14 14:00 haproxy-bundle.log
-rw-------. 1 root root   10509 Apr 14 14:03 ovn-dbs-bundle.log
-rw-------. 1 root root    6451 Apr 14 14:00 redis-bundle.log

2) Deploy a composable HA overcloud from scratch with the patch above
and observe that we obtain the stdout on disk.

Note that most HA containers log to their usual on-host files just
fine; we are mainly missing the haproxy logs and the kolla startup
output of the HA containers.

Closes-Bug: #1872734

Change-Id: I4270b398366e90206adffe32f812632b50df615b
2020-04-15 20:10:03 +00:00
Alex Schultz a566d6b9b8 Add check for bootstrap_node for downcase
Downcase in puppet 6.14 throws an error if the input to it is Undef. We
can avoid this by checking for a value before trying to downcase.

See context https://review.rdoproject.org/r/#/c/26297/
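
A minimal sketch of the guard (hiera key illustrative):

  $bootstrap_node = hiera('rabbitmq_short_bootstrap_node_name', undef)
  if $bootstrap_node {
    $bootstrap_node_lc = downcase($bootstrap_node)
  } else {
    $bootstrap_node_lc = undef
  }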

Change-Id: Ib2e97060523a4198a14949a15c9171b56928699c
2020-04-07 14:51:41 -06:00
Damien Ciabrini e60351ee09 HA: fix rabbitmq readiness check for rabbitmq-server 3.8
In HA profiles, we wait for rabbitmq application readiness by
parsing the output of "rabbitmqctl status". This breaks with
rabbitmq-server 3.8 which changed the output of that command.

Fix our check by using a "rabbitmqctl eval" and by relying on
a stable function call rather than parsing output. This
approach works for rabbitmq-server 3.6 to 3.8.
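
A sketch of the eval-based probe wrapped in the usual retrying exec
(retry values illustrative; the Erlang call shown is the
stable-function style used elsewhere in this history, cf. 3a8c2b0dc7
below):

  exec { 'rabbitmq-ready':
    command   => 'rabbitmqctl eval "rabbit_nodes:is_running(node(), rabbit)." | grep -q true',
    path      => ['/usr/sbin', '/usr/bin', '/bin'],
    provider  => shell,
    tries     => 180,
    try_sleep => 10,
  }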

Change-Id: Id88d0aee74e4b26fd64bbc2da5d0c0fc4bbd6644
Co-Authored-By: Yatin Karel <ykarel@redhat.com>
Closes-Bug: #1864962
2020-02-27 16:41:44 +01:00
Michele Baldessari d766eb81a3 Make the bundle user configurable via hiera
Allow all bundles' --user option to be overridden, as some of them
might prefer switching to a non-root user when possible.
The ovn-dbs bundle is a bit special because it never specified any
user; hence we default that user to undef and do not set anything.

Tested as follows:
1. deployed an overcloud
2. patched it with this change
3. redeployed and then observed that no HA container has restarted at all
4. verified cinder-volume runs with root by default:
USER  PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root    1  0.0  0.0   4204   716 ?        Ss   09:01   0:00 dumb-init --single-child -- /bin/bash /usr/local/bin/kolla_start
root    7  0.7  0.7 912976 145760 ?       S    09:01   1:04 /usr/bin/python3 /usr/bin/cinder-volume --config-file /usr/share/cinder/cinder-dist.conf --config-file /etc/cinder/cinder.conf
root   71  0.1  0.6 925800 124640 ?       S    09:01   0:14 /usr/bin/python3 /usr/bin/cinder-volume --config-file /usr/share/cinder/cinder-dist.conf --config-file /etc/cinder/cinder.conf
5. added 'tripleo::profile::pacemaker::cinder::volume_bundle::bundle_user: cinder' to
   the templates and redeployed
6. Observed that cinder-volume got restarted and now runs with cinder
   user:
USER   PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
cinder   1  0.0  0.0   4204   804 ?        Ss   12:23   0:00 dumb-init --single-child -- /bin/bash /usr/local/bin/kolla_start
cinder   7  2.1  0.7 912976 145432 ?       S    12:23   0:04 /usr/bin/python3 /usr/bin/cinder-volume --config-file /usr/share/cinder/cinder-dist.conf --config-file /etc/cinder/cinder.conf
cinder  64  0.3  0.5 919908 118452 ?       S    12:23   0:00 /usr/bin/python3 /usr/bin/cinder-volume --config-file /usr/share/cinder/cinder-dist.conf --config-file /etc/cinder/cinder.conf

Change-Id: I985d0d192ef3accf7fdd31503348de80713fded4
2020-01-13 11:40:32 +01:00
Tobias Urdin 1523a4b804 Convert all class usage to relative names
Change-Id: Ib2ed745b682cf12f9469a5a64451adcabec400af
2019-12-08 23:23:25 +01:00
Michele Baldessari bad716070a Switch HA containers to k8s-file log-driver and make it a parameter
Currently in puppet-tripleo for the HA container we hardcode the following:
 options => "--user=root --log-driver=journald -e KOLLA_CONFIG_STRATEGY=COPY_ALWAYS${tls_priorities_real}",

Since at least podman has had some changes in terms of supported driver
backends (and bugs), it's best if we make this configurable. While
we're at it, we should also switch to k8s-file as the driver when
podman is being used, which is what all other containers use. When
docker is the default container_cli we will stick to journald as usual.
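
A sketch of the resulting selection (the options string mirrors the
snippet above; container_cli is how tht exposes the runtime choice):

  $container_cli = hiera('container_cli', 'podman')
  $log_driver = $container_cli ? {
    'podman' => 'k8s-file',
    default  => 'journald',
  }
  # tls_priorities suffix omitted for brevity, see f1a593b642 below
  $options = "--user=root --log-driver=${log_driver} -e KOLLA_CONFIG_STRATEGY=COPY_ALWAYS"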

Tested this on a Train environment and successfully verified that
we still see the correct logs in /var/log/containers/.../...

Change-Id: I5b1483826f816d11a064a937d59f9a8f468315a5
Closes-Bug: #1853517
2019-11-22 11:36:37 +01:00
Michele Baldessari 3a8c2b0dc7 Make the rabbitmq-ready exec more stringent
Currently we use the following command to determine if rabbit is
up and running *and* ready to service requests:
rabbitmqctl eval "rabbit_mnesia:is_clustered()." | grep -q true

Now, we have occasionally observed that rabbitmqctl policy commands
executed after said exec['rabbitmq-ready'] will fail.

One potential reason is that is_clustered() can return true *before*
the rabbit app is actually running. In fact we can see it does
return true even though the app is stopped:
()[root@controller-1 /]$ rabbitmqctl stop_app
Stopping rabbit application on node rabbit@controller-1 ...
()[root@controller-1 /]$ rabbitmqctl eval 'rabbit_mnesia:is_clustered().'
true

Let's switch to a combination of commands that check for the cluster to
be up *and* the rabbitmq app to be running:
()[root@controller-1 /]$ rabbitmqctl stop_app
Stopping rabbit application on node rabbit@controller-1 ...
()[root@controller-1 /]$ rabbitmqctl eval 'rabbit_nodes:is_running(node(), rabbit).'
false

Suggested-By: Bogdan Dobrelya <bdobreli@redhat.com>
Closes-Bug: #1835615

Change-Id: I29f779145a39cd16374a91626f7fae1581a18224
2019-08-19 19:56:35 +00:00
Michele Baldessari f1a593b642 Initial support for tls_priorities
We add initial support for specifying TLS priorities in pacemaker.
For bundles this happens via an env variable, because pacemaker_remote
is started as a normal process and /etc/sysconfig/pacemaker is not
sourced.

Tested on both queens and stein, via a deploy and a redeploy against an
existing cloud. Observed that:
A) We got PCMK_tls_priorities inside /etc/sysconfig/pacemaker with the
value that was passed in THT
B) Containers had the following env variable set:
  "PCMK_tls_priorities=normal",

The '-e' addition is a no-op when PCMK_tls_priorities is unset, so
that we do not change the signature of the resources and hence do not
needlessly restart the HA resources.
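
A sketch of how such a no-op '-e' addition can be computed (the
resulting string is what bad716070a above interpolates as
tls_priorities_real; the hiera key here is hypothetical):

  $tls_priorities = hiera('tripleo::pacemaker::tls_priorities', undef)  # hypothetical key
  $tls_priorities_real = $tls_priorities ? {
    undef   => '',
    default => " -e PCMK_tls_priorities=${tls_priorities}",
  }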

Depends-On: I1971810f6a90f244ed5ced972a5fe7fde29dde86
Change-Id: I703b5a429f48063474aace85bc45d948f5c91435
2019-07-27 07:59:45 +00:00
Jiri Stransky bac59f433b Fix rabbitmq staged upgrade
Fix the short name overriding, and add long name (fqdn) overriding.

Change-Id: Ia152aed696be15119ba5b75177ef82bc786c4b05
Partial-Bug: #1832588
2019-06-28 09:06:11 +00:00
Zuul 1e5c120f48 Merge "RabbitMQ: always allow promotion on HA queue during failover" 2019-06-14 19:40:52 +00:00
Michele Baldessari 610c8d8d41 RabbitMQ: always allow promotion on HA queue during failover
When the RabbitMQ cluster experiences a rolling restart of its peers,
the master of an HA queue fails over from one replica to another.

If there are messages sent to the HA queue while some rabbit
nodes are restarting, the latter will reconnect as unsynchronized
slaves. It can happen that during a rolling restart all rabbit
nodes reconnect as unsynchronized, which prevents RabbitMQ from
automatically electing a new master for failover. This has other
side effects on fanout queues and may prevent OpenStack
notifications from being consumed properly.

Change the HA policy to always allow a promotion, even when all
replicas are unsynchronized. When such a rare condition happens,
rely on the OpenStack clients to retry RPCs if they need to.

Closes-Bug: #1823305
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Change-Id: Id9bdd36aa0ee81424212e3a89185311817a15aee
2019-06-14 10:07:24 +02:00
Jiri Stransky 566703dc27 Fix RabbitMQ locale for CentOS 7 (Puppet part)
It seems that CentOS 7 does not have the C.UTF-8 locale. Since we need
a UTF-8-based locale, use en_US.UTF-8 instead.

Change-Id: I25d2b9a227a7c5de127bdfd9d2f387be9eea01e0
Partial-Bug: #1823062
2019-04-04 11:14:18 +02:00
Michele Baldessari a92d1fccc6 Force C.UTF-8 when dealing with rabbitmq
When we use rabbitmq 3.7 we might hit the following issue when running rabbitmqctl commands inside containers (as puppet does):

  Error: Failed to apply catalog: Cannot parse invalid user line: warning:
  the VM is running with native name encoding of latin1 which may cause
  Elixir to malfunction as it expects utf8. Please ensure your locale is
  set to UTF-8 (which can be verified by running "locale" in your shell)

This is fundamentally the tripleo version of
https://github.com/voxpupuli/puppet-rabbitmq/issues/671

This is a strict requirement coming from Elixir:
https://github.com/elixir-lang/elixir/issues/3548

Since containers do not have UTF-8 as a default we have this problem:
[root@overcloud-controller-0 ~]# podman exec -it rabbitmq-bundle-podman-0 sh
()[root@overcloud-controller-0 /]$ rabbitmqctl -q list_users
warning: the VM is running with native name encoding of latin1 which may cause Elixir to malfunction as it expects utf8. Please ensure your locale is set to UTF-8 (which can be verified by running "locale" in your shell)
user tags
guest [administrator]
()[root@overcloud-controller-0 /]$ locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Related-Bug: #1822673
Change-Id: I21ef2e7862f3e5e21812d342b1681f8d5f7f005d
2019-04-02 14:25:07 +02:00
Sofer Athlan-Guyot 48b1775e35 Extra variables to reprovision pacemaker cluster one node at a time.
For the upgrade we have to re-provision the controller cluster, one
node at a time.

Using extra override variables set in hiera, we are able to specify to
pacemaker which nodes should be added to the cluster.

Change-Id: I2f6ef4679265718fbbe8726ee6c81832bc468f3e
Implements: blueprint upgrades-with-os
2019-02-12 10:20:48 +01:00
Oliver Walsh 035de7493d cell_v2 multi-cell
- move nova dbsync from nova-api to nova-conductor
  - nova db is more tightly coupled to conductor/computes
  - we don't have a nova-api services on a CellController
  - super-conductor on Controller will sync cell0 db
- when deploying an additional cell
  - duplicate service node name hiera for transport_urls on cell stack
  - nova -> oslo_messaging_rpc_cell_node_names
  - neutron agent -> oslo_messaging_rpc_node_names
  - rabbit -> rabbit nodes are cell controllers

bp tripleo-multicell-basic

Co-Authored-By: Martin Schuppert <mschuppert@redhat.com>

Change-Id: I79c1080605611c5c7748a28d2afcc9c7275a2e5d
2019-02-05 09:53:50 +01:00
Michele Baldessari 736d69dad9 Add retries to HA bundles
The retry is needed in a composable HA environment because two nodes
might be modifying the CIB at the same time, so we need to retry more
than once to get the freshest CIB, modify it and push it back. Currently
all HA resources have it but we did not add it to the bundles. While it
is a rare race, we should still plug it.
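
A sketch, assuming the same tries/try_sleep knobs that the other
pacemaker resources use (parameter and value choices illustrative):

  pacemaker::resource::bundle { 'rabbitmq-bundle':
    image     => '192.168.24.1:8787/tripleoupstream/centos-binary-rabbitmq:latest',
    replicas  => 3,
    tries     => 20,  # re-read and re-push the CIB if another node modified it concurrently
    try_sleep => 3,
  }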

Change-Id: Ib9d9c76c83f103e329a9c575ae5c110d5ad3c048
Closes-Bug: #1809223
2019-01-04 12:51:51 +00:00
Michele Baldessari 44985bd42d Remove some of the excessive rabbitmq bundle logging
By removing the optional pam-systemd session line we get rid of the
following line:
pam_systemd(su:session): Failed to connect to system bus: No such file or directory

It is useless inside a container anyway since the pam_systemd module
registers user sessions.

By adding a sufficient pam_succeed_if call for when the user belongs to
the rabbitmq group, we get rid of the following spurious logs:
Oct 23 13:52:52 overcloud-controller-0 su: pam_unix(su:session): session opened for user rabbitmq by (uid=0)
Oct 23 13:52:54 overcloud-controller-0 su: pam_unix(su:session): session closed for user rabbitmq

We do not need this inside a container anyway. In the future (with
pam_unix 1.2.0 and onwards) we will be able to use the quiet option
instead.
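
A sketch of the added PAM line via stdlib's file_line (the exact
module arguments in the change may differ; ordering matters, the line
must precede the pam_unix session entry to short-circuit it):

  file_line { 'su-skip-pam_unix-for-rabbitmq':
    path => '/etc/pam.d/su',
    line => 'session         sufficient      pam_succeed_if.so quiet use_uid user ingroup rabbitmq',
  }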

Depends-On: Ic0789da4645a4ee186d82ad7d943de78d4d5c443

Change-Id: Icd199ca4ce4848c971488d8ab69e668add86b150
Related-Bug: #1806451
2018-12-11 16:17:16 +00:00
Michele Baldessari 177d951be3 Allow the container backend to be configurable
We added a container backend in puppet-pacemaker via
Ia4a7b58d14d80e85d51e98acec1aad2ba90b69de. Let's now
let tripleo override it when needed.

Tested this via some hiera key overrides and it works correctly.

Change-Id: I610923327462b901840131316a4984c8fe98faaa
2018-11-15 20:41:24 +01:00
Michele Baldessari c372f5e6a1 Remove restart_flag leftovers for bundles
Since the introduction of I62870c055097569ceab2ff67cf0fe63122277c5b
"Introduce restart_bundle containers to detect config changes and
restart pacemaker resources" we actually use paunch to detect any
config changes (by verifying an md5 hash over the generated config
files of the service).

With this new way of detecting changes there is no need to use the
old 'tripleo::pacemaker::resource_restart_flag' method to restart
pcmk services.

Let's just remove this unused code.

Change-Id: Ib12dbe66575e3d54a8ec7d2c72c2b4619bc39b03
2018-10-18 05:46:21 +00:00
Michele Baldessari f2484a0bf9 Fix up property names in case of mixed case hostnames
When deploying a stack that contains mixed-case hostnames
the following error might be triggered:
Debug: try 15/20: /usr/sbin/pcs -f
/var/lib/pacemaker/cib/puppet-cib-backup20180405-8-1sqw3dc property set
--node TEST-STACK34-controller-1 redis-role=true
Debug: Error: Error: unable to set attribute redis-role
Could not map name=TEST-STACK34-controller-1 to a UUID
while the name in the cluster is test-stack34-controller-1

This used to work pre-bundles because we used the facter provided
$::hostname variable which was lower-cased for us. With bundles we
switched to setting cluster properties from the service bootstrap nodes
and so we used the '<service>_short_node_names' hiera key which might
contain mixed-case hostnames.

In order to fix this we just downcase() the short_node_names hiera
string that we loop on so we can get the same behaviour we had on bare
metal.
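
A sketch of the loop (the hiera key matches the example below;
pacemaker::property as used throughout puppet-tripleo):

  hiera('rabbitmq_short_node_names', []).each |String $node| {
    pacemaker::property { "rabbitmq-role-${downcase($node)}":
      property => 'rabbitmq-role',
      value    => true,
      node     => downcase($node),
    }
  }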

Tested on an env with mixed-case hostnames:
[root@uppercaseovercloud-controller-0 keystone]# hiera -c /etc/puppet/hiera.yaml rabbitmq_short_node_names
["UPPERCASEOverCloud-controller-0",
 "UPPERCASEOverCloud-controller-1",
 "UPPERCASEOverCloud-controller-2"]

Cluster pcs properties were set correctly:
[root@uppercaseovercloud-controller-0 keystone]# pcs property |grep rabbitmq
 uppercaseovercloud-controller-0: galera-role=true haproxy-role=true rabbitmq-role=true redis-role=true rmq-node-attr-last-known-rabbitmq=rabbit@uppercaseovercloud-controller-0
 uppercaseovercloud-controller-1: galera-role=true haproxy-role=true rabbitmq-role=true redis-role=true rmq-node-attr-last-known-rabbitmq=rabbit@uppercaseovercloud-controller-1
 uppercaseovercloud-controller-2: galera-role=true haproxy-role=true rabbitmq-role=true redis-role=true rmq-node-attr-last-known-rabbitmq=rabbit@uppercaseovercloud-controller-2

Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Depends-On: Ie240b8a4217827dd8ade82479a828817d63143ba
Closes-bug: #1773219
Change-Id: I5bd49c4a1b13b2310f8a1173aa6b86abfa5dab3d
2018-05-28 10:28:14 +02:00
Zuul 1a73b868ce Merge "Support separate oslo.messaging services for RPC and Notifications" 2018-04-29 13:02:17 +00:00
Zuul 408db62e22 Merge "Support both rabbitmq and oslo.messaging service nodes" 2018-04-07 00:39:46 +00:00
Andrew Smith c04557fba4 Support separate oslo.messaging services for RPC and Notifications
This commit introduces separate oslo.messaging services in place of
a single rabbitmq server. This enables the separation of rpc and
notifications, the continued use of a single rabbitmq server, as well
as the use of alternative oslo.messaging drivers/backends.

This patch:
* adds oslo_messaging_* hiera parameters
* updates the rabbitmq and qdrouterd services
* adds a release note

Depends-On: I03e99d35ed043cf11bea9b7462058bd80f4d99da
Depends-On: I934561612d26befd88a9053262836b47bdf4efb0
Change-Id: Ie181a92731e254b7f613ad25fee6cc37e985c315
2018-03-20 12:55:02 -04:00
Jiri Stransky d8d86cfe68 Conventional log directories for pacemaker bundles
Use /var/log/containers/<service> instead of /var/log/<service>, as
the rest of the containerized services.

Change-Id: Id5760c16260de991ff95168c76186edc113752c8
Depends-On: Icb311984104eac16cd391d75613517f62ccf6696
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Closes-Bug: #1731969
2018-03-19 12:55:12 +00:00
Andrew Smith 79ccad4b8d Support both rabbitmq and oslo.messaging service nodes
This commit selects either the rabbitmq hosts or the hosts associated
with the oslo.messaging rpc and notify services. This is required for
the transition of t-h-t to the use of the separated oslo.messaging
service backends.

This patch:
* select rpc and notify hosts from rabbitmq or oslo_messaging
* modify qdrouterd inter-router link port
* update qdr unit spec
* add release note

Needed-By: I934561612d26befd88a9053262836b47bdf4efb0
Change-Id: I154e2fe6f66b296b9b643627d57696e5178e1815
2018-03-16 18:16:42 -04:00
Damien Ciabrini 1cfecc39dc Fix rabbitmq-ready check for single node HA deployments
The current rabbitmq-ready exec waits for rabbitmq to become clustered
before it allows user creation. Unfortunately this doesn't work when
the deployment contains a single node, because rabbit doesn't trigger
the clustering mode at all.

Set the exec test according to the number of rabbit nodes, in order
to check for cluster state only when necessary.
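
A sketch of the gating; the probes shown here reuse the eval commands
that appear in other commits of this history, the actual commands in
this change may differ:

  $rabbit_nodes = hiera('rabbitmq_node_names', [])
  if length($rabbit_nodes) > 1 {
    $check = 'rabbitmqctl eval "rabbit_mnesia:is_clustered()." | grep -q true'
  } else {
    $check = 'rabbitmqctl eval "rabbit_nodes:is_running(node(), rabbit)." | grep -q true'
  }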

Closes-Bug: #1741345

Change-Id: I24e5e344b7f657ce5d42a7c7c45be7b5ed5e6445
Co-Authored-By: John Eckersberg <jeckersb@redhat.com>
2018-01-05 10:14:48 +00:00
Michele Baldessari 2f33d74173 Fix up the rabbitmq-ready check
So the current rabbitmq-ready exec has a few unexpected problems:

1) The notify mechanism is not being called, but after discussion
we're comfortable calling this all the time, just like we do
for galera.
2) Calling rabbitmqctl inside a container is problematic because
the mere invocation of the cluster_status command will actually
spawn an epmd process, which will take the epmd port and
subsequently make the rabbitmq-bundle started by pacemaker fail to
form a cluster.

For this reason (working around the rabbitmqctl issue is potentially
doable once we upgrade to erlang 19.x, but not with older versions),
it is vital that this container gets spawned with /bin/epmd nooped
to /bin/true.

We now only proceed after rabbit tells us that it is part of a cluster.
Just checking for rabbit being up is not enough because if the user gets
created before the node joins a cluster, it might not be replicated
(depending on the timing).

Partial-Bug: #1739026

Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Co-Authored-By: John Eckersberg <jeckersb@redhat.com>
Change-Id: I54c541d86782665ae0f689428a16edc155f87993
Depends-On: Ie74a13a6c8181948900ea0de8ee9717f76f3ce79
2017-12-20 07:24:29 +01:00
Michele Baldessari b2dc580a3f Make sure rabbitmq is fully up before creating any rabbitmq resources
Right now after creating the rabbitmq pacemaker resource, we have no
guarantee that rabbit will be up. Let's add the same mechanism we use
today with the galera-ready exec resource. This gives us the guarantee
that once the resource has been created it is up and we can actually
create rabbitmq users (some 3rd party plugins do that, not stock
TripleO).

Specifically, we probe that the '{rabbit,' app shows up in the status,
so we can guarantee that rabbit is running before invoking any other
rabbitmqctl commands.

Change-Id: Ib37eb2e591f97de54ee6449817ae8d70c6541753
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
2017-11-17 07:26:55 +01:00
Alex Schultz 3c58543678 Revert "Revert "Set meta container-attribute-target=host attribute""
This reverts commit 1681d3bceb. 

NOTE: This needs to be tested against scenario004-containers before merging.

This is needed because when we run bundles we actually
want to store attributes on a per-node basis and not on a per-bundle
basis. By activating this attribute, pacemaker will pass
some extra OCF_RESKEY_CRM_meta attributes that will help us in this
decision.
We can merge this once we have packages for pacemaker and
resource-agents releases that contain the necessary fixes.

Proper pacemaker and resource-agents packages are now in the repo [1],
so we can merge this and backport it to pike.

[1] https://buildlogs.centos.org/centos/7/cloud/x86_64/openstack-pike/

Closes-Bug: #1713007

Change-Id: Ie968470126833939c19223f04db29556e550673d
2017-10-30 16:12:46 +00:00
John Trowbridge 1681d3bceb Revert "Set meta container-attribute-target=host attribute"
This patch broke the containers scenario004 test because it relies on a
newer mariadb container than has actually passed CI at this time.

To revert this revert, we need to make sure we test
scenario004-containers against that patch.

This reverts commit 6bcb011723.

Closes-Bug: 1721497

Change-Id: I34c7c388eed94db1735c45e26661a0af8cdce8e9
2017-10-06 13:03:04 +00:00
Michele Baldessari 6bcb011723 Set meta container-attribute-target=host attribute
This is needed because when we run bundles we actually
want to store attributes on a per-node basis and not on a per-bundle
basis. By activating this attribute, pacemaker will pass
some extra OCF_RESKEY_CRM_meta attributes that will help us in this
decision.

We can merge this once we have packages for pacemaker and
resource-agents releases that contain the necessary fixes.

Proper pacemaker and resource-agents packages are now in the repo [1],
so we can merge this and backport it to pike.

[1] https://buildlogs.centos.org/centos/7/cloud/x86_64/openstack-pike/

Closes-Bug: #1713007

Change-Id: I0dd06e953b4c81f217d0f4199b2337e4c3358086
2017-09-28 14:05:21 +02:00
Damien Ciabrini 86a3261b4d Enable TLS configuration for containerized RabbitMQ
In non-containerized deployments, RabbitMQ can be configured to use TLS for
serving and mirroring traffic.

Fix the creation of the rabbitmq bundle resource to enable TLS when
configured. The key and cert are passed like the other configuration
files and must be copied by Kolla at container startup.

Change-Id: Ia64d79462de7012e5bceebf0ffe478a1cccdd6c9
Partial-Bug: #1709558
2017-08-09 07:51:58 +00:00
Michele Baldessari 1da0b51ecc Fix up the control-port for rabbitmq bundles
Mistakenly this was set to 3121, which is the same port that pacemaker
remote uses. Move this to 3122, which was the plan all along.

Also fix a wrong port comment in redis and mysql at the same time.

Change-Id: Iccca6a53a769570443091577c7d86f47119d9cbb
2017-07-21 10:46:48 +02:00
Martin André 1e90178298 Leverage kolla config_files to copy config into containers
This solves a problem with bind-mounts when the containers are holding
file descriptors open.

At the same time this makes the template more robust to puppet changes
since new config files will be available in the containers without
needing to update the templates.

Closes-Bug: #1698323
Change-Id: I857c94ba5f7f064d7c58df621ec5d477654b9166
Depends-On: I78dcec741a941dc21adba33ba33a6dc6ff1d217c
2017-07-12 09:56:56 +00:00
Steve Baker 94f13e6608 Ensure hiera step value is an integer
The step is typically set with the hieradata setting an integer value:

  {"step": 1}

However it would be useful for the value to be a string so that
substitutions are possible, for example:

  {"step": "%{::step}"}

This change ensures the step parameter defaults to an integer by
calling Integer(hiera('step'))

This change was made by manually removing the undef defaults from
fluentd.pp, uchiwa.pp, and sensu.pp then bulk updating with:

    find ./ -type f -print0 |xargs -0 sed -i "s/= hiera('step')/= Integer(hiera('step'))/"

Change-Id: I8a47ca53a7dea8391103abcb8960a97036a6f5b3
2017-06-14 14:31:52 +12:00
Michele Baldessari b10adec303 Make sure the resource bundles use a location_rule
In composable HA we bind resources to nodes that have special
node properties. We need to do this for bundle resources as well,
otherwise there is a potential race where, during a small window of
time, the bundle might be started on nodes where it is not supposed
to run.
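
A sketch of such a location rule as passed to the bundle resources
(structure as used elsewhere in puppet-tripleo; the expression matches
the role node properties seen in f2484a0bf9 above):

  $location_rule = {
    'resource_discovery' => 'exclusive',
    'score'              => 0,
    'expression'         => ['rabbitmq-role eq true'],
  }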

Tested with the depends-on and correctly obtained a containerized
composable HA deployment:

Docker container set: rabbitmq-bundle
[192.168.24.1:8787/tripleoupstream/centos-binary-rabbitmq:latest]
  rabbitmq-bundle-0    (ocf::heartbeat:rabbitmq-cluster):      Started overcloud-rabbit-0
  rabbitmq-bundle-1    (ocf::heartbeat:rabbitmq-cluster):      Started overcloud-rabbit-1
  rabbitmq-bundle-2    (ocf::heartbeat:rabbitmq-cluster):      Started overcloud-rabbit-2
Docker container set: galera-bundle
[192.168.24.1:8787/tripleoupstream/centos-binary-mariadb:latest]
  galera-bundle-0      (ocf::heartbeat:galera):        Master overcloud-galera-0
  galera-bundle-1      (ocf::heartbeat:galera):        Master overcloud-galera-1
  galera-bundle-2      (ocf::heartbeat:galera):        Master overcloud-galera-2
Docker container set: redis-bundle
[192.168.24.1:8787/tripleoupstream/centos-binary-redis:latest]
  redis-bundle-0       (ocf::heartbeat:redis): Master overcloud-controller-0
  redis-bundle-1       (ocf::heartbeat:redis): Slave overcloud-controller-1
  redis-bundle-2       (ocf::heartbeat:redis): Slave overcloud-controller-2
ip-192.168.24.11       (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0
ip-10.0.0.7    (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1
ip-172.16.2.11 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-2
ip-172.16.2.9  (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0
ip-172.16.1.6  (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1
ip-172.16.3.7  (ocf::heartbeat:IPaddr2):       Started overcloud-controller-2
Docker container set: haproxy-bundle
[192.168.24.1:8787/tripleoupstream/centos-binary-haproxy:latest]
  haproxy-bundle-docker-0      (ocf::heartbeat:docker):        Started overcloud-controller-0
  haproxy-bundle-docker-1      (ocf::heartbeat:docker):        Started overcloud-controller-1
  haproxy-bundle-docker-2      (ocf::heartbeat:docker):        Started overcloud-controller-2

Depends-On: I44449861cbfe56304b8829c9ca10fd648353b3ae
Change-Id: I48fb490040497ba08cae19937159c0efdf99e3f8
2017-06-09 21:18:27 +02:00