Commit Graph

41 Commits

Author SHA1 Message Date
Sven Kieske 984cb0a754 mariadb: fix cluster recovery
sometimes cluster recovery didn't work
because we only look for the sequence number in the last 200 lines
of the log file.

fix this by ingesting the complete file and only register the last
sequence number we find.

Closes-Bug: 1821173

Change-Id: Iea2661c9d5d262cf99edd5f5b567f252607a0003
Signed-off-by: Sven Kieske <kieske@osism.tech>
2024-04-22 11:05:11 +02:00
Martin Hiner a13d83400f Rename kolla_docker to kolla_container
Changes name of ansible module kolla_docker to
kolla_container.

Change-Id: I13c676ed0378aa721a21a1300f6054658ad12bc7
Signed-off-by: Martin Hiner <m.hiner@partner.samsung.com>
2023-11-15 13:54:57 +01:00
Michal Nasiadka cea076f379 Introduce oneshot docker_restart_policy
docker_restart_policy: no causes systemd units to not get created
and we use it in CI to disable restarts on services.

Introducing oneshot policy to not create systemd unit for oneshot
containers (those that are running bootstrap tasks, like db
bootstrap and don't need a systemd unit), but still create systemd
units for long lived containers but with Restart=No.

Change-Id: I9e0d656f19143ec2fcad7d6d345b2c9387551604
2023-11-14 15:17:50 +00:00
Ivan Halomi 9a3f463345 Add support of podman deployment
This change adds basic deployment based on Podman
container manager as an alternative to Docker.

Signed-off-by: Ivan Halomi <i.halomi@partner.samsung.com>
Signed-off-by: Martin Hiner <m.hiner@partner.samsung.com>
Signed-off-by: Petr Tuma <p.tuma@partner.samsung.com>
Change-Id: I2b52964906ba8b19b8b1098717b9423ab954fa3d
Depends-On: Ie4b4c1cf8fe6e7ce41eaa703b423dedcb41e3afc
2023-10-20 17:51:52 +02:00
Ivan Halomi 910f9bd36f Usage of kolla_container_engine variable instead of docker
First part of patchset:
 https://review.opendev.org/c/openstack/kolla-ansible/+/799229/
in which was suggested to split patch into smaller ones.

This implements kolla_container_engine variable
in command calls of docker,so later on it can be
also used for podman without further change.

Signed-off-by: Ivan Halomi <i.halomi@partner.samsung.com>
Change-Id: Ic30b67daa2e215524096ad1f4385c569e3d41b95
2022-10-28 09:15:55 +02:00
Michal Nasiadka 1aac65de0c Fix issues introduced by ansible-lint 6.6.0
mainly jinja spacing and jinja[invalid] related

Change-Id: I6f52f2b0c1ef76de626657d79486d31e0f47f384
2022-09-21 14:34:54 +00:00
Michal Arbet 63d72ea7e8 Use Docker healthchecks for mariadb-server service
This change enables the use of Docker healthchecks for
mariadb-server service.

Depends-On: https://review.opendev.org/c/openstack/kolla/+/805613
Change-Id: I893687a0501ea0f281b879df3141a354bff9eca6
2022-08-22 08:27:28 +00:00
Piotr Parczewski 377b6b4251 Improve MariaDB restore procedure
Change-Id: Id05122cb564f3e7475b2b76da8c111e2c72601b8
2022-01-11 21:47:21 +01:00
Michal Arbet 09b3c6ca07 Refactor mariadb to support shards
Kolla-ansible is currently installing mariadb
cluster on hosts defined in group['mariadb']
and render haproxy configuration for this hosts.

This is not enough if user want to have several
service databases in several mariadb clusters (shards).

Spread service databases to multiple clusters (shards)
is usefull especially for databases with high load
(neutron,nova).

How it works ?

It works exactly same as now, but group reference 'mariadb'
is now used as group where all mariadb clusters (shards)
are located, and mariadb clusters are installed to
dynamic groups created by group_by and host variable
'mariadb_shard_id'.

It also adding special user 'shard_X' which will be used
for creating users and databases, but only if haproxy
is not used as load-balance solution.

This patch will not affect user which has all databases
on same db cluster on hosts in group 'mariadb', host
variable 'mariadb_shard_id' is set to 0 if not defined.

Mariadb's task in loadbalancer.yml (haproxy) is configuring
mariadb default shard hosts as haproxy backends. If mariadb
role is used to install several clusters (shards), only
default one is loadbalanced via haproxy.

Mariadb's backup is working only for default shard (cluster)
when using haproxy as mariadb loadbalancer, if proxysql
is used, all shards are backuped.

After this patch will be merged, there will be way for proxysql
patches which will implement L7 SQL balancing based on
users and schemas.

Example of inventory:

[mariadb]
server1
server2
server3 mariadb_shard_id=1
server4 mariadb_shard_id=1
server5 mariadb_shard_id=2
server6 mariadb_shard_id=3

Extra:
wait_for_loadbalancer is removed instead of modified as its role
is served by check already. The relevant refactor is applied as
well.

Change-Id: I933067f22ecabc03247ea42baf04f19100dffd08
Co-Authored-By: Radosław Piliszek <radoslaw.piliszek@gmail.com>
2021-04-07 23:19:42 +02:00
fudunwei 068f3fea50 Negative seqno need to be considered when comparing seqno
Need to consider Negative seqno to compare in some cases,
but the task does not support to do that, we need to make it work.

1.we use mariabackup to restore datas on control1, delete the
mariadb data on control2 and control3, and then use cluster recovery,
 as a result that the seqno of the other two nodes will be '-1'.

2. add one more control node into our existing mariadb cluster,
and then use cluster recovery, the seqno of the new node will be '-1'.

Change-Id: Ic1ac8656f28c3835e091637014f075ac5479d390
2021-01-29 13:46:37 +08:00
fudunwei aec42f0f5f Correct spell error
correct spell 'Cheching' to 'Checking'

Change-Id: I3ceb6960c3b38f371d0d4163ee37d4b34e61f401
2021-01-25 09:58:12 +08:00
Mark Goddard f903d774af Fix mariadb_recovery when mariadb container is missing
Mariadb recovery fails if a cluster has previously been deployed, but any of
the mariadb containers do not exist.

Steps to reproduce
==================

* Deploy a mariadb galera cluster
* Remove the mariadb container from at least one host (docker rm -f mariadb)
* Run kolla-ansible mariadb_recovery

Expected results
================

The cluster is recovered, and a new container deployed where necessary.

Actual results
==============

The task 'Stop MariaDB containers' fails on any host where the container does
not exist.

Solution
========

This change fixes the issue by using the 'ignore_missing' flag for kolla_docker
with the stop_container action. This means the task does not fail when the
container does not exist. It is also necessary to swap some 'docker cp'
commands for 'cp' on the host, using the path to the volume.

Closes-Bug: #1907658

Change-Id: Ibd4a6adeb8443e12c45cbab65f501392ffb16fc7
2020-12-10 12:27:25 +00:00
Michal Nasiadka 026f5cc48a Custom haproxy script for monitoring galera
Depends-On: https://review.opendev.org/710217/

Change-Id: I85652f23e487c40192106d23f2cdd45a3077deca
2020-05-20 13:02:44 +02:00
LinPeiWen 8a206699d4 mariadb container name variable
mariadb container name variable is fixed in some places,
but in the defaults directory, mariadb container_name variable
is variable. If the mariadb container_name variable is changed
during deployment, it will not be assigned to container_name,
but a fixed 'mariadb' name.

Change-Id: Ie8efa509953d5efa5c3073c9b550be051a7f4f9b
2020-03-25 01:17:29 -04:00
Zuul 67a9d289b4 Merge "Fix multiple issues with MariaDB handling" 2020-01-21 09:29:59 +00:00
Radosław Piliszek 9f14ad651a Fix multiple issues with MariaDB handling
These affected both deploy (and reconfigure) and upgrade
resulting in WSREP issues, failed deploys or need to
recover the cluster.

This patch makes sure k-a does not abruptly terminate
nodes to break cluster.
This is achieved by cleaner separation between stages
(bootstrap, restart current, deploy new) and 3 phases
for restarts (to keep the quorum).

Upgrade actions, which operate on a healthy cluster,
went to its section.

Service restart was refactored.

We no longer rely on the master/slave distinction as
all nodes are masters in Galera.

Closes-bug: #1857908
Closes-bug: #1859145
Change-Id: I83600c69141714fc412df0976f49019a857655f5
2020-01-15 20:15:09 +01:00
Mark Goddard 5fb10e08fe Ansible lint: use command module instead of shell
Change-Id: Ibf40216b847f103e383f19fe1ef608a75fcfd452
Co-Authored-By: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
2020-01-13 10:45:10 +00:00
Mark Goddard a6cb008c54 Ansible lint: task names
Change-Id: Iecbc2fe5fa3391dca5a3cc7e575314b95942114b
Co-Authored-By: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
2020-01-13 10:38:12 +00:00
Michal Nasiadka 1009931162 Change local_action to delegate_to: localhost
As part of the effort to implement Ansible code linting in CI
(using ansible-lint) - we need to implement recommendations from
ansible-lint output [1].

One of them is to stop using local_action in favor of delegate_to -
to increase readability and and match the style of typical ansible
tasks.

[1]: https://review.opendev.org/694779/

Partially implements: blueprint ansible-lint

Change-Id: I46c259ddad5a6aaf9c7301e6c44cd8a1d5c457d3
2019-11-22 15:04:44 +00:00
Mark Goddard f979ae1f8e Fix restart policy after MariaDB recovery
After performing a recovery of MariaDB, the mariadb containers are left
without a restart policy. This leaves them unable to recover from the
crash of a single galera node. There is another issue, in that the
'master' node is left in a bootstrap configuration, with the
--wsrep-new-cluster argument configured as BOOTSTRAP_ARGS.

This change fixes these issues by removing the restart policy of 'no'
from the 'slave' containers, and recreating the master container without
the restart policy or bootstrap arguments.

Change-Id: I36c875611931163ca2c29ae93b71d3af64cb197c
Closes-Bug: #1851594
2019-11-07 10:08:57 +00:00
Zuul d8e961eeaa Merge "Wait for MariaDB to be accessible via HAProxy" 2019-08-27 12:58:06 +00:00
Scott Solkhon 03cd7eb356 Wait for MariaDB to be accessible via HAProxy
Explicitly wait for the database to be accessible via the load balancer.
Sometimes it can reject connections even when all database services are up,
possibly due to the health check polling in HAProxy.

Closes-Bug: #1840145
Change-Id: I7601bb710097a78f6b29bc4018c71f2c6283eef2
2019-08-15 10:00:36 +00:00
Radosław Piliszek 6a737b1968 Fix handling of docker restart policy
Docker has no restart policy named 'never'. It has 'no'.
This has bitten us already (see [1]) and might bite us again whenever
we want to change the restart policy to 'no'.

This patch makes our docker integration honor all valid restart policies
and only valid restart policies.
All relevant docker restart policy usages are patched as well.

I added some FIXMEs around which are relevant to kolla-ansible docker
integration. They are not fixed in here to not alter behavior.

[1] https://review.opendev.org/667363

Change-Id: I1c9764fb9bbda08a71186091aced67433ad4e3d6
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
2019-07-18 13:39:06 +00:00
Mark Goddard 86f373a198 Fixes for MariaDB bootstrap and recovery
* Fix wsrep sequence number detection. Log message format is
  'WSREP: Recovered position: <UUID>:<seqno>' but we were picking out
  the UUID rather than the sequence number. This is as good as random.

* Add become: true to log file reading and removal since
  I4a5ebcedaccb9261dbc958ec67e8077d7980e496 added become: true to the
  'docker cp' command which creates it.

* Don't run handlers during recovery. If the config files change we
  would end up restarting the cluster twice.

* Wait for wsrep recovery container completion (don't detach). This
  avoids a potential race between wsrep recovery and the subsequent
  'stop_container'.

* Finally, we now wait for the bootstrap host to report that it is in
  an OPERATIONAL state. Without this we can see errors where the
  MariaDB cluster is not ready when used by other services.

Change-Id: Iaf7862be1affab390f811fc485fd0eb6879fd583
Closes-Bug: #1834467
2019-07-05 09:20:34 +00:00
Mark Goddard b123bf6621 Use become for all docker tasks
Many tasks that use Docker have become specified already, but
not all. This change ensures all tasks that use the following
modules have become:

* kolla_docker
* kolla_ceph_keyring
* kolla_toolbox
* kolla_container_facts

It also adds become for 'command' tasks that use docker CLI.

Change-Id: I4a5ebcedaccb9261dbc958ec67e8077d7980e496
2019-06-06 19:04:58 +01:00
Mark Goddard a4bb8567da Fix up config file permissions on the host
Several config file permissions are incorrect on the host. In general,
files should be 0660, and directories and executables 0770.

Change-Id: Id276ac1864f280554e98b937f2845bb424d521de
Closes-Bug: #1821579
2019-04-02 17:23:31 +01:00
caoyuan 471985dc2c Update usage of "|" to "is"
With the more recent versions of ansible, we should now use
"is" instead of the "|"

This should update it.

Change-Id: I6fba56fca182349972e8b0ee5452b37aa4090e0c
2018-08-13 12:40:10 +05:30
Ha Manh Dong 30be04ea91 Specify 'become' for all tasks that use kolla_docker module
Add become to all tasks that use the module "kolla_docker"

Change-Id: I4309c4011687b88ec31d739fd8f834fe2326ff10
Partial-Implements: blueprint ansible-specific-task-become
2018-06-08 12:39:24 +00:00
Eduardo Gonzalez 8a63c801c6 Fix mariadb recover seqnum regex
Regex used to find the recover seqnum partition is not
returning the real num id rather a None.
Task fails due seqnum[0] is not iterable.

Change-Id: I1be55b6ebfc17c6d423e638662ec2a9f4b9b49a2
Closes-Bug: #1752128
2018-04-12 09:03:34 +02:00
Eduardo Gonzalez ea1a1dee0d Verify YAML syntax in gates
This patchset implements yamllint test to all *.yml
files.

Also fixes syntax errors to make jobs to pass.

Change-Id: I3186adf9835b4d0cada272d156b17d1bc9c2b799
2018-03-26 17:56:22 +02:00
Alexandru Bogdan Pica 465bc9ee1c Improve mariadb_recovery
The purpose of this change is to improve upon
https://review.openstack.org/#/c/531122/

- Moved vars inside the defaults/main.yml file
- Made the regex for the lineinfile safer

Change-Id: Id581c0b36f3d4bd61d3627b8364b79296b967387
Closes-Bug: 1746567
Related-Bug: 1682153
2018-02-01 00:01:44 +02:00
Marian Tudosoiu cead8ec623 Rework mariadb recovery tasks
In recover_cluster.yaml playbook the task to find the highest
seqno/Global Transaction ID is no longer relying only on grastate.dat
Instead it now follows the recommendations from galera cluster website
http://galeracluster.com/documentation-webpages/restartingcluster.html

Closes-Bug: 1682153

Change-Id: I5fc3eaa8baee659576c4c39aef9cfd351c8e9af7
2018-01-31 01:09:09 +02:00
Benjamin Diaz 6f64549e1b Set bash as shell when executing mariadb recovery task
Added 'executable' argument to the shell action in the
'Comparing seqno value' task in the cluster recovery playbook.

Change-Id: I3e96a4a76b44ffb558b9a41cde16e66a8d0fab1a
Closes-Bug: #1729603
2017-11-27 15:07:38 -03:00
caoyuan 5cd55bf236 Optimize reconfiguration for mariadb
Change-Id: I278609f9832955849bc9381120a1b260f5a03f1b
Partially-implements: blueprint better-reconfigure
2017-07-22 08:50:08 +08:00
Duong Ha-Quang 41686edba9 Replace always_run by check_mode
always_run is deprecated and removed in Ansible 2.4
check_mode is introduced in Ansible 2.2 and Kolla-ansible bump Ansible to
2.2.0 so it's safe to replace always_run by check_mode now.

Change-Id: Id1028d38b7bde30a6afe17b319dcdc77907914ab
Closes-Bug: #1643633
Implements: blueprint migrate-to-ansible-2-2-0
2017-06-15 08:10:33 +00:00
Eduardo Gonzalez f56d8e58ba Revert "updating-deprecated-ansible-modules"
check_mode option is included in Ansible 2.2.
Using in our playbooks mean that any other version before
Ansible 2.2 can be used

This reverts commit 529f202d00.

Change-Id: I3af96290443d760346264e6d994fd2a44de65543
Closes-Bug: #1644828
2016-11-25 14:41:38 +00:00
gardlt 529f202d00 updating-deprecated-ansible-modules
* update ceph tasks
* update mariadb tasks
Closes-Bug: #1643633

Change-Id: Ib81789574843edba6e33394a7f66a2e8077075eb
2016-11-21 21:52:26 -06:00
Jeffrey Zhang 1bcb139392 Choose node with largest seqno number for mariadb recovery
When all mariadb nodes are stopped gracefully, mariadb galera will
write it's last executed position into the grastate.dat file. Need find
the node with largest seqno number in that file and recovery from that
node.

Closes-Bug: #1627717
Change-Id: I6e97c190eec99c966bffde0698f783e519ba14bd
2016-10-09 12:08:58 +00:00
Jeffrey Zhang 0fcee87549 map the host localtime to the container
Closes-Bug: #1577148
Change-Id: I636cefc63cf532434a41af3898b63dffa711e280
2016-05-03 09:27:51 +08:00
SamYaple afce10dea4 Fix mysql bootstrap
The lightsout recover patch broke multinode mysql. Also the lightsout
recovery didnt probably pass the --wsrep-new-cluster flag. This
updates the mariadb bootstrap to work with multinode again.

Closes-Bug: #1559480
Related-Id: I903c3bcd069af39814bcabcef37684b1f043391f
Change-Id: I1ec91a8b2144930ea8f04cc1c201b53712352e4e
2016-03-21 12:23:00 +00:00
SamYaple 2aaaed770e MariaDB lights out recovery
This playbook only matters for multinode since AIO can recover from
power outage without additional configuration.

DocImpact
Implements: blueprint mariadb-lights-out
Change-Id: I903c3bcd069af39814bcabcef37684b1f043391f
2016-03-16 15:53:44 +00:00