Create the node's is_pc flag before checking whether more than one
such flag exists. This avoids a race condition in which there are
0 is_pc flags and galera starts with --wsrep-new-cluster on 2 nodes.
Because the flag is set before the check, and setting it is synchronous
through the Pacemaker CIB, when more than one node attempts to bootstrap
with --wsrep-new-cluster only one node will see <= 1 is_pc flags. The
others will see more than one, fail, and reattempt to start. By then
one of the nodes will already be bootstrapped, so reelection will not
be triggered and the bootstrap section will be skipped.
Change-Id: I82a71132eef7877ac7ab1ed04263044b3b1e8d9b
Closes-bug: #1617400
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
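The set-before-check ordering described for the is_pc flag can be
sketched as follows. This is a minimal illustration, not the OCF
script itself: a plain Python set stands in for the Pacemaker CIB, the
function name try_bootstrap is hypothetical, and updates are assumed
to become visible in the order they are made.

```python
def try_bootstrap(is_pc_flags, node):
    """Set this node's flag first, then count the flags.

    is_pc_flags is a hypothetical stand-in for the is_pc attributes
    stored in the CIB. Returns True if the node may bootstrap.
    """
    is_pc_flags.add(node)          # set the flag before checking
    return len(is_pc_flags) <= 1   # proceed only if no other flag exists


flags = set()
results = [try_bootstrap(flags, n) for n in ("node-1", "node-2")]
print(results)
```

Under these assumptions the first node sees a single flag and
bootstraps, while the second sees two flags and has to retry.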
* more intelligent calculation of the default port provider
* additional check for whether the patchcord exists in the bridge
The OVS provider should be used for fake (OVS) ports inserted
into an OVS bridge, and for inserting a native Linux interface
into an OVS bridge.
But when adding a native Linux subinterface to an OVS bridge,
the lnx provider should be used so that the port is created
correctly.
Change-Id: Ib76b3340eca1ea22095da1cfffe7c224a139fb71
Closes-bug: #1682835
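The provider rule from the port provider change can be sketched as a
small function. The function name, its parameters, and the detection
of a subinterface by a dot in its name (for example eth0.101) are
assumptions made for illustration; the real calculation lives in the
Puppet code and may differ.

```python
def choose_port_provider(port_name, is_fake_ovs_port=False):
    """Pick the provider for a port added to an OVS bridge.

    Hypothetical helper: a subinterface is assumed to be recognizable
    by a dot in its name, such as "eth0.101".
    """
    if is_fake_ovs_port:
        return "ovs"   # fake (OVS) ports always use the OVS provider
    if "." in port_name:
        return "lnx"   # native Linux subinterface
    return "ovs"       # plain native Linux interface


print(choose_port_provider("eth0.101"))
```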
Previously, the backup restoration phase was not considered part of the
State Snapshot Transfer, as only the backup creation and transportation
processes were checked for this purpose. To cover the missing phase as
well, it is more reasonable to monitor the process that controls the
entire State Snapshot Transfer.
Closes-Bug: 1660275
Change-Id: Ie98af501c1cd130098381a8463452f892898470b
Signed-off-by: Gabor Orosz <gabor.orosz@ericsson.com>
According to [1], if the file name has no extension,
the server adds an extension of ".err".
This causes two different issues.
The first is that the /dev/stdout.err file cannot be created
in newer versions of mysql (5.6.35 at least).
The second is a potential OOM under an intensive stderr flow from mysql,
because the /dev mount point is a devtmpfs, an in-memory pseudo file system.
[1] https://dev.mysql.com/doc/refman/5.6/en/server-options.html#option_mysqld_log-error
Change-Id: I430d8dd085433e8cb84fdf0595b8357758d9ea75
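The extension rule quoted from [1] can be illustrated with a short
sketch. The function name is hypothetical and the code only mirrors
the documented rule; it is not how the server parses the option.

```python
import os.path


def effective_error_log(path):
    """Return the file name the documented rule would produce.

    A name without an extension gets ".err" appended; a name that
    already has an extension is left unchanged.
    """
    _, ext = os.path.splitext(path)
    return path if ext else path + ".err"


print(effective_error_log("/dev/stdout"))
```

This shows why pointing the error log at /dev/stdout ends up creating
a regular file under /dev.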
- mysqld_safe helped to restart mysqld, so cluster convergence
was a few times faster
- Added logging for more convenient debugging
Closes-Bug: #1652008
Change-Id: I6ee1bf451f7c954aeb7a2974bb980054422454b8
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
For some reason ip_forward is sometimes reset
in the vrouter namespace, which leads to connectivity test
failures. To fix this, move the ip_forward setting into the
get_ns() function so it is checked and set on each monitor operation.
Change-Id: I2d10d465259adc1f30161fee488feef4179b5c70
Related-bug: #1654967
1. Do not set routing on status command - this is useless and
destructive
2. Save default routes into a separate file and restore them
after flush
Change-Id: Ia128979920e054343b2ac05e437683772c81731a
Closes-bug: #1654967
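The save-and-restore step for default routes can be sketched as a
filter over route listing lines. The function name and the sample
lines are illustrative assumptions; the actual scripts work on the
output of the ip tool and a file on disk.

```python
def split_default_routes(route_lines):
    """Separate default routes from all other routes.

    The default routes are the ones to save before a flush and
    restore afterwards.
    """
    defaults, others = [], []
    for line in route_lines:
        fields = line.split()
        if fields and fields[0] == "default":
            defaults.append(line)
        else:
            others.append(line)
    return defaults, others


# Illustrative input, shaped like "ip route show" output.
sample = [
    "default via 10.0.0.1 dev eth0",
    "10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.5",
]
saved, rest = split_default_routes(sample)
print(saved)
```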
With change Iaa4855d769fe1e0203fcfb9981413273e0e4dda2
we detect whether the node is running as a primary component
while it is not master. While it is a good solution, sometimes
we face a race condition when the node which is a 'master' gets a lower
sequence number due to other nodes updating their gtid at the same
time. Although it happens rarely and mostly on slow or overloaded
environments, it leads to redundant mysql restarts and service
downtime for OpenStack APIs.
The proper fix would be to use a master-slave resource and corresponding
script, but this is far too big a change for the bug in question.
The proposed solution checks whether the node is a primary component
during start and monitor operations, and also checks the number of
currently running primary components by setting and querying an
additional attribute `is_pc`. It triggers monitor failure only when
the node is not running with the 'master' GTID, is a primary component,
and there is more than one primary component.
Misc: fix function return codes to reflect shell 'true'
and 'false' numeric values.
Change-Id: Id3ea32347ed37a6efffd3ee85dfb3110b2e8c8ca
Closes-bug: #1651982
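The monitor failure condition from the is_pc change can be written as
a single predicate. The function and parameter names are hypothetical;
only the three-part condition comes from the commit message.

```python
def monitor_should_fail(has_master_gtid, is_primary_component,
                        primary_component_count):
    """Fail the monitor only when all three conditions hold."""
    return (not has_master_gtid
            and is_primary_component
            and primary_component_count > 1)


print(monitor_should_fail(False, True, 2))
```

A lone primary component without the 'master' GTID does not trigger a
failure, which is what avoids the redundant restarts.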
1. Neither script flushes the ip route table for non-local
routes, which makes them non-idempotent
2. Haproxy did not add routes on reload
Change-Id: I498870b45ac47e6d6d8808d18964f3c2777c930c
Closes-bug: #1652765
Pacemaker controls mysqld, thus we don't need the mysqld_safe wrapper, which
does the same. This should help to get statuses for OCF scripts on very
heavily loaded systems.
Change-Id: I73649d60c3cc08cbe696c6bc97ee5aa0ad430908
Related-Bug: 1636841
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
Since keystone client CLI was substituted by openstackclient,
we should use openstackclient CLI to check keystone service.
Change-Id: Ia36ee208346fac73b036f86e3dd5f575013985cb
Closes-Bug: #1638440
If Type=forking is set in systemd's service unit file, systemd requires the started
application to fork a child process and exit. This change makes rabbit-fence
work in this situation. It does not have any negative effect on ubuntu's
/sbin/init, except forking one more time. In short, this change makes rabbit-fence
more robust and portable at a very limited cost.
Change-Id: Ia0dfc204ba6879bd4252585a719c8ad9afac7daa
Closes-bug: #1633715
There was a typo near !/${OCF_RESOURCE_INSTANCE}:/
This change also eliminates unnecessary forks, as the node score
is in the last column of the string.
Change-Id: I3c4efacb1cf6038a14cb6834ed94903bc9f92466
Closes-Bug: #1623935
Signed-off-by: Pavel Glushchak <pglushchak@virtuozzo.com>
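Reading the score from the last column needs no extra processes. A
minimal sketch of the idea follows; the function name and the sample
line format are hypothetical and only illustrate taking the last
whitespace-separated field.

```python
def last_column(line):
    """Return the last whitespace-separated field of a line."""
    return line.rsplit(None, 1)[-1]


# Hypothetical line; the real format comes from Pacemaker tooling.
print(last_column("p_example:node-1.test.domain.local   1000"))
```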
${OCF_RESKEY_pid_file} is undefined;
as a result, mysqld is always stopped by SIGKILL
Closes-bug: #1626616
Change-Id: Ie5176c74caf55a6db1f84f9733f1a72adc7ea9f2
As per OpenStack licensing guidelines [1]:
[H102 H103] Newly contributed Source Code should be licensed under
the Apache 2.0 license.
[H104] Files with no code shouldn't contain any license header nor
comments, and must be left completely empty.
[1] http://docs.openstack.org/developer/hacking/#openstack-licensing
Change-Id: Id37d20c647b9f4e580732cf5d600ec9c53fdb7d8
Keystone service is run as wsgi service behind Apache Httpd
in Newton. So, we don't need to restart it, but instead we
can restart httpd service.
Change-Id: Ib3861cf6a57f6c6850b66162cbd29535e93f8da1
Closes-Bug: #1622355
Heat now supports convergence mode, which gives it a chance to
trigger bug 1599104. When services are stopped by an interruption
signal they leave their target queues open, and the number of queues
keeps growing each time this is repeated.
This patch adds an expiration policy for worker queues, allowing a
queue to be closed once it has not been used for a long period of
time (1 hour).
This fix should solve the problem of the growing number of RabbitMQ queues.
Change-Id: Icda32000f391780c4e3d5d3ebcc519bf853283b7
Related-Bug: #1599104
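A one-hour queue expiration can be expressed as a RabbitMQ policy with
the "expires" key, which takes milliseconds. The sketch below only
builds the rabbitmqctl argument list; the policy name and queue
pattern are hypothetical placeholders, not the values used by this
patch.

```python
import json

ONE_HOUR_MS = 60 * 60 * 1000


def set_policy_command(name, pattern, ttl_ms=ONE_HOUR_MS):
    """Build a rabbitmqctl set_policy call that expires unused queues."""
    definition = json.dumps({"expires": ttl_ms})
    return ["rabbitmqctl", "set_policy", "--apply-to", "queues",
            name, pattern, definition]


# Policy name and pattern are placeholders for illustration only.
cmd = set_policy_command("example-expire", "^example-worker")
print(cmd)
```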
This script updates the master node in fuel 9.x releases.
It upgrades system packages, runs fuel/examples/deploy.sh
as in the bootstrap admin node process (puppet tasks run),
then restarts all important master node services.
Logging for this script is also added.
Closes-bug: #1605602
Related-bug: #1616472
Related-bug: #1616393
Change-Id: Ic4ef722b861d260c3679dca9c74f6cc62052c376
This change fixes 'generate_vms.sh' idempotency by undefining the
domain on error, and also adds command execution output to the logs.
Change-Id: I94a9b1340a521da2bbfd1c08d7e1e0dc47aa9f51
Closes-Bug: #1613241
Signed-off-by: Maksim Malchuk <mmalchuk@mirantis.com>
Otherwise an error is printed into lrmd.log on each script invocation,
see the referenced bug for details. The correct OCF_FUNCTIONS_DIR
setup is taken from the original RabbitMQ OCF script.
Change-Id: Ice31967698d0e3c0d35e25728115d040927a9d26
Closes-Bug: #1616161
Currently fuel-rabbit-fence is not able
to manage its pid-file properly.
The current implementation was refactored
to use "fcntl".
Closes-bug: #1614963
Change-Id: Ia5f58c3eb964d25c733cc91a6a5373ddbf193e77
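One common way to manage a pid-file with the fcntl module is to hold
an exclusive lock on it for the life of the daemon. The sketch below
assumes fcntl.flock and a hypothetical function name; the commit only
states that "fcntl" is used, so the real code may use a different call.

```python
import fcntl
import os


def acquire_pid_file(path):
    """Lock the pid-file and write our pid into it.

    Returns the open file descriptor on success (keep it open to hold
    the lock), or None if another holder already has the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Non-blocking exclusive lock; raises OSError if already held.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    os.ftruncate(fd, 0)
    os.write(fd, str(os.getpid()).encode())
    return fd
```

The lock is released automatically when the process exits, so a stale
pid-file left by a crash does not block the next start.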
This change updates the calls that we use in the ocf scripts to determine
if the rules are present to include the -w flag to prevent the scripts
from failing if another iptables call is currently running. It has been
reported that this can occur when the ocf scripts are running in
parallel to the puppet deployment (firewall task).
Change-Id: Ia603f5643720a5fa5407de36ca75830a7c3f57fa
Closes-Bug: #1605540
If all the mysql nodes are booted at the exact same time, we can end up
with a situation where the master determination can occur almost at the
same time. This change updates the gtid fetching that is done during
master determination to include a retry with a random 1-10 second sleep
in an attempt to allow for the other nodes to update pacemaker with
their gtid information.
Change-Id: Ib12fb927391857ca9e3fb0a3ee45a7eec9e7913e
Closes-Bug: #1610180
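The retry with a random 1-10 second sleep can be sketched as below.
The function name, the attempts parameter, and the injectable sleep
are assumptions for illustration; the OCF script implements this in
shell.

```python
import random
import time


def fetch_with_retry(fetch, attempts=3, sleep=time.sleep):
    """Call fetch() until it returns a value or attempts run out.

    Between attempts, sleep a random 1-10 seconds so that nodes booted
    at the same moment do not retry in lockstep.
    """
    for attempt in range(attempts):
        value = fetch()
        if value is not None:
            return value
        if attempt < attempts - 1:
            sleep(random.randint(1, 10))
    return None
```

Passing a custom sleep callable makes the delay easy to observe or
skip in tests.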
This change updates the calls that we use in the ocf scripts to
determine if the rules are present to include the -n flag to prevent
unnecessary dns lookups which can lead to deployment failures if dns is
unavailable.
Change-Id: I17d04fbad6def1217429fc3c92bed997fd510eb8
Closes-Bug: #1605540
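Taken together with the -w change for the same bug, a rule listing
call carries both flags. The sketch below only builds the argument
list; the function name is hypothetical and the use of -L to list a
chain is an assumption about how the scripts check for rules.

```python
def iptables_list_command(chain):
    """Build an iptables listing call for one chain.

    -w waits for the xtables lock instead of failing when another
    iptables call is running; -n prints numeric addresses and ports,
    avoiding DNS lookups.
    """
    return ["iptables", "-w", "-n", "-L", chain]


print(iptables_list_command("INPUT"))
```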
At the end of the fuel-migration process the script syncs
fuel-migration.log before creating the flag, so we cannot see whether
the flag was created properly.
The flag creation was therefore moved before the final log sync.
Change-Id: Iabf768a3824947896024415533e3587365917a9f
Related-Bug: #1606298
This change moves the hiera usage from the rabbit-fence service to
the cluster::rabbitmq_fence puppet class.
Related-Bug: #1603182
Change-Id: I109487d2cd1d0eab19dd995959e1fcc68594a1bc
Signed-off-by: Maksim Malchuk <mmalchuk@mirantis.com>
If the upstream script is not available, we were using the ocf_log
command but missed the inclusion of the ocf-shellfuncs that defined it.
This pulls in ocf-shellfuncs so we will not improperly error when the
timing condition occurs.
Change-Id: If058cb1e7eaafbdc962ba17d4d730924cc086cc5
Closes-Bug: #1599479
This change adds a check to the fuel rabbitmq ocf script to ensure we
are returning the correct return code. Because the rabbitmq cluster
configuration may exist on all nodes prior to the rabbitmq-server
package, we get into a state where the rabbitmq-server script is
partially available. But since we are sourcing the upstream script, when
it fails it returns 1 which is being interpreted as the wrong error.
This change returns not running if the script is not available since
clearly we can't be running if the package is not installed yet.
Change-Id: Idbc2a9ded39a47e06183793ac4a63115f93c9ba6
Closes-Bug: #1599479
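The return code decision for a missing upstream script can be sketched
as follows. OCF_NOT_RUNNING is 7 in the OCF resource agent API; the
function name and the path check are assumptions, since the real agent
is a shell script.

```python
import os

OCF_NOT_RUNNING = 7  # standard OCF return code for "not running"


def status_if_upstream_missing(script_path):
    """Return OCF_NOT_RUNNING when the upstream script is absent.

    Returns None when the script exists, meaning the normal status
    logic should run instead.
    """
    if not os.path.isfile(script_path):
        return OCF_NOT_RUNNING
    return None


print(status_if_upstream_missing("/nonexistent/upstream-script"))
```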
Just do stop, if called. Do not report success
from guessing the current state, ensure it stopped
instead.
Closes-bug: #1596434
Change-Id: I97b2e47e8810d53f0455aa8b4852fd76b03ecebc
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>