This patchset implements key rotation for managers only. The user
can specify either the full entity name (e.g. 'mgr.XXXX') or
simply 'mgr', which stands for the local manager.
After the entity's directory is located, a new pending key is
generated, the keyring file is mutated to include the new key and
then replaced in situ. Lastly, the manager service is restarted.
Note that Ceph only has one active manager at any given time,
so it only makes sense to call this action on _every_ mon unit.
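
The keyring rewrite step can be sketched as follows. This is a minimal illustration, not the charm's actual code: it assumes an INI-style keyring with a `key = ...` line under an `[entity]` header, and the helper names `replace_key` and `rewrite_keyring`, as well as the temp-file-plus-rename approach, are hypothetical.

```python
import os
import tempfile


def replace_key(keyring_text, entity, new_key):
    """Return keyring_text with the key of [entity] swapped for new_key."""
    out, in_section = [], False
    for line in keyring_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Section header: track whether we are inside the target entity.
            in_section = stripped[1:-1] == entity
        elif in_section and stripped.split("=")[0].strip() == "key":
            # Keep the original indentation, replace only the value.
            indent = line[: len(line) - len(line.lstrip())]
            line = f"{indent}key = {new_key}"
        out.append(line)
    return "\n".join(out) + "\n"


def rewrite_keyring(path, entity, new_key):
    """Rewrite the keyring at path and swap it in with a rename (assumed)."""
    with open(path) as fh:
        updated = replace_key(fh.read(), entity, new_key)
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as fh:
        fh.write(updated)
    os.replace(tmp, path)
```

Writing to a temporary file in the same directory and renaming it means a reader never sees a half-written keyring; the service restart described above would follow this step.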
Change-Id: Ie24b3f30922fa5be6641e37635440891614539d5
func-test-pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/1195
This duplicates the check performed for ceph status and specialises it for
radosgw-admin sync status instead.
The config options available are:
- nagios_rgw_zones: the zones that are expected to be connected
- nagios_rgw_additional_checks: equivalent to nagios_additional_checks;
  allows a configurable set of strings to grep for as critical alerts.
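
How the two options might drive the check can be sketched as below. This is an assumption-laden illustration: the function name `check_rgw_sync`, the plain substring test for zone names, and the message wording are all hypothetical and do not reflect the real check or the exact output format of radosgw-admin sync status.

```python
import re

# Nagios exit codes.
OK, CRITICAL = 0, 2


def check_rgw_sync(status_text, expected_zones, additional_checks=()):
    """Return (exit_code, message) for a captured sync status output."""
    # Assumption: a connected zone is mentioned by name in the output.
    missing = [zone for zone in expected_zones if zone not in status_text]
    if missing:
        return CRITICAL, "CRITICAL: zones not found: " + ", ".join(missing)
    # Any configured pattern that matches is treated as a critical alert.
    for pattern in additional_checks:
        if re.search(pattern, status_text):
            return CRITICAL, f"CRITICAL: matched '{pattern}'"
    return OK, "OK: all expected zones found"
```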
Change-Id: Ideb35587693feaf1cc0736e981005332e91ca861
During cluster deployment a situation can arise where there are
already osd relations but osds are not yet fully added to the cluster.
This can make version retrieval fail for osds. Retry version retrieval
to give the cluster a chance to settle.
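
The retry idea can be sketched with a small decorator. This is a generic illustration under stated assumptions: the name `retry`, its parameters, and the fixed delay are hypothetical and are not the helper the charm actually uses.

```python
import functools
import time


def retry(times=3, delay=0.0, exceptions=(Exception,)):
    """Call the wrapped function up to `times` times before giving up."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == times:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay)  # give the cluster time to settle
        return wrapper
    return decorator
```

A version lookup decorated with `@retry(times=5, delay=2)` would then tolerate OSDs that are related but have not yet joined the cluster.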
Also update tests to install OpenStack from latest/edge.
Change-Id: I12a1bcd32be2ed8a8e5ee0e304f716f5a190bd57
This PR makes some small changes in the upgrade path logic by
providing a fallback method of fetching the current ceph-mon
version and adding additional checks to see if the upgrade can
be done in a sane way.
Closes-Bug: #2024253
Change-Id: I1ca4316aaf4f0b855a12aa582a8188c88e926fa6
When the charm performs a Ceph version upgrade, it determines the previous
Ceph release from the `source` config variable, which is stored in a
persistent file. However, updating that persistent file is broken: we use
hookenv.Config under the ops framework, but hookenv._run_atexit, which
saves the changes to the file, is never called.
Partial-Bug: #2007976
Change-Id: Ibf12a2b87736cb1d32788672fb390e027f15b936
func-test-pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/1047
This reverts commit dfbda68e1a.
Reason for revert:
The Ceph version check does not seem to account for the user that
executes the nrpe check. It fails to read the keyrings needed to run the
command because it is run by a non-root user.
$ juju run-action --wait nrpe/0 run-nrpe-check name=check-ceph-daemons-versions
unit-nrpe-0:
  UnitId: nrpe/0
  id: "20"
  results:
    Stderr: |
      2023-02-01T03:03:09.556+0000 7f4677361700 -1 auth: unable to find
      a keyring on
      /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin:
      (2) No such file or directory
      2023-02-01T03:03:09.556+0000 7f4677361700 -1
      AuthRegistry(0x7f467005f540) no keyring found at
      /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,
      disabling cephx
      2023-02-01T03:03:09.556+0000 7f4677361700 -1 auth: unable to find
      a keyring on
      /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin:
      (2) No such file or directory
      2023-02-01T03:03:09.556+0000 7f4677361700 -1
      AuthRegistry(0x7f4670064d88) no keyring found at
      /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,
      disabling cephx
      2023-02-01T03:03:09.560+0000 7f4677361700 -1 auth: unable to find
      a keyring on
      /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin:
      (2) No such file or directory
      2023-02-01T03:03:09.560+0000 7f4677361700 -1
      AuthRegistry(0x7f4677360000) no keyring found at
      /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,
      disabling cephx
      [errno 2] RADOS object not found (error connecting to the cluster)
    check-output: 'UNKNOWN: could not determine OSDs versions, error: Command ''[''ceph'',
      ''versions'']'' returned non-zero exit status 1.'
  status: completed
  timing:
    completed: 2023-02-01 03:03:10 +0000 UTC
    enqueued: 2023-02-01 03:03:09 +0000 UTC
    started: 2023-02-01 03:03:09 +0000 UTC
Related-Bug: #1943628
Change-Id: I84b306e84661e6664e8a69fa93dfdb02fa4f1e7e
Also, drop python-dbus for simplicity since "check_upstart_job" in nrpe
is not enabled any longer, and the python-dbus package is no longer
available on jammy either.
[on focal with systemd]
$ ls -1 /etc/nagios/nrpe.d/
check_ceph.cfg
check_conntrack.cfg
check_reboot.cfg
check_systemd_scopes.cfg
Closes-Bug: #1998163
Change-Id: I30bc22ae8509367207004b90eb2c38ad0fae9ffe
Add a new module ceph_status for checking ceph-mon status.
Provide the ceph_shared helpers for querying current status of
ceph-mon units. Also add some initial testing for the charm module.
Change-Id: I5079023ca692f0a2b7bfda96bb1834b8e9b1f0cc
This check does not require manually setting the number of expected
OSDs.
Initially, the charm sets the count (per-host) to the number of OSDs
present in the OSD tree. The count will be updated (on a per-host
basis) when the number of OSDs grows, but not when it shrinks. There
is a charm action to reset the expected count using information from
the OSD tree.
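
The grow-only bookkeeping described above can be sketched as follows. The function names and the dict-of-counts representation are hypothetical; only the behaviour (grow per host, never shrink, explicit reset) comes from this change.

```python
def update_expected(expected, current):
    """Raise per-host expected OSD counts, never lower them."""
    merged = dict(expected)
    for host, count in current.items():
        merged[host] = max(merged.get(host, 0), count)
    return merged


def reset_expected(current):
    """What the reset action does conceptually: trust the OSD tree again."""
    return dict(current)


def hosts_below_expected(expected, current):
    """Map each short host to its (present, expected) OSD counts."""
    return {
        host: (current.get(host, 0), want)
        for host, want in expected.items()
        if current.get(host, 0) < want
    }
```

With `expected = {"host-a": 3}` and a tree reporting `{"host-a": 2}`, the expectation stays at 3 and `hosts_below_expected` flags `host-a` until an operator runs the reset.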
Closes-Bug: #1952985
Change-Id: Ia6a060bf151908c1d4159e6bdffa7bfe1f0a7988
Alert rules can be attached as a resource and will be transmitted via
the metrics-endpoint relation. Default alert rules taken from upstream
ceph have been added for reference.
Change-Id: I6a3c6f06e9b9d911b35c8ced1968becc6471b362
In addition to trivial changes (passing `event` into
the `copy_pool` function), this change introduces an
update to the actions/__init__.py that allows succinct
import and use from the main charm.py.
An apparently unrelated change is the removal of
charm-proof from the lint job, as it fails with the
removal of actions/copy-pool.
Change-Id: I66a5590ddf0f0bb5ca073a91b451f8c78598609a
func-test-pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/866
This patchset changes a single action, 'change-osd-weight', so that
it's implemented with the operator framework.
Change-Id: Ia11885a2096b6e4b1ecda5caea38939e17098e1d
Add support for the metrics-endpoint relation. This allows relating
ceph-mon to prometheus-k8s which is being used in the COS Lite
observability stack. Upon relation, the ceph prometheus module will be
enabled and a corresponding scrape job configured for prometheus-k8s.
Drive-by test improvement for the utils module.
Change-Id: Iaeee57aaa6f3678fdaef35f2582b4b4c974acb2a
This patchset implements the first rewrite of the charm using the
operator framework by simply calling into the hooks.
This change also includes functional validation of charm upgrades
from the previous stable to the locally built charm.
Fix tempest breakage for python < 3.8
Co-authored-by: Chris MacNaughton <chris.macnaughton@canonical.com>
Change-Id: I61308bb2900134ea163d9e92444066a3cb0de43d
func-test-pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/849
This NRPE check reports whether the versions of cluster daemons have diverged.
WARN - any minor version diverged
WARN - any versions are 1 release behind the mon
CRIT - any versions are 2 releases behind the mon
CRIT - any versions are ahead of the mon
A Juju action, 'get-versions-report', is also provided. It gives users
a quick way to see the daemon versions running on cluster hosts.
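
The severity rules above can be sketched as a small classifier. This is an illustration only: it assumes dotted numeric version strings whose first component identifies the release, treats "2 releases behind" as two or more, and uses hypothetical function names.

```python
OK, WARN, CRIT = 0, 1, 2


def parse(version):
    """Turn '17.2.6' into (17, 2, 6)."""
    return tuple(int(part) for part in version.split("."))


def classify(mon, daemon):
    """Compare one daemon version against the mon version."""
    mon_v, daemon_v = parse(mon), parse(daemon)
    behind = mon_v[0] - daemon_v[0]  # difference in major release
    if behind < 0 or behind >= 2:
        return CRIT  # ahead of the mon, or two (or more) releases behind
    if behind == 1 or mon_v != daemon_v:
        return WARN  # one release behind, or same release but diverged
    return OK


def check(mon, daemons):
    """Return the worst severity across all daemon versions."""
    return max((classify(mon, d) for d in daemons), default=OK)
```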
Closes-Bug: #1943628
Change-Id: I41b5c8576dc9cf885fa813a93e6d51e8804eb9d8
This action automatically repairs inconsistent placement groups
which are caused by read errors.
PGs are repaired using `ceph pg repair <pgid>`.
Action is only taken if one of a PG's shards has a "read_error",
and no action will be taken if any additional errors are found.
No action will be taken if multiple "read_errors" are found.
This action is intended to be safe to run in all contexts.
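
The decision rule can be sketched as below. The input shape is an assumption (a list of shard dicts, each carrying an `errors` list), and `should_repair` and `repair_command` are hypothetical names; only the `ceph pg repair <pgid>` command comes from this change.

```python
def should_repair(shards):
    """Repair only when the PG has exactly one error and it is a read_error."""
    errors = [err for shard in shards for err in shard.get("errors", [])]
    # Multiple read_errors, or any other error type, means no action.
    return errors == ["read_error"]


def repair_command(pgid):
    """Build the argument list for repairing a single PG."""
    return ["ceph", "pg", "repair", pgid]
```

Requiring the flattened error list to equal `["read_error"]` covers both exclusions at once: a second read error and any additional error type each make the comparison fail.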
Closes-Bug: #1923218
Change-Id: I903dfe02aa3b7c67414e3d0d9b57f4042d301830
The get-or-create-user action creates or gets a user, with its mon
and osd capabilities, and retrieves the related keyring.
The delete-user action deletes users.
Closes-Bug: 1899215
func-test-pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/765
Change-Id: I2bd148e442990b6ff978624023bd85a741c6259a
This change adds a profile name parameter in the create-pool action that
allows a replicated pool to be created with a CRUSH profile other than
the default replicated_rule.
Closes-Bug: #1905573
Change-Id: Ib21ded8f4a977b4a2d57c6b6b4bb82721b12c4ea
When ceph-mon is blocked on waiting for enough OSDs to be available,
it will display a message to that effect.
But this is misleading if ceph-mon has not been related to ceph-osd.
So if the two are not related,
and ceph-mon is waiting for OSDs,
then display a message about the relation missing.
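
The branching can be sketched as follows. The function name, parameters, and message strings are hypothetical and are not the charm's actual wording.

```python
def waiting_message(osd_related, have_osds, expected_osds):
    """Pick the status message shown while OSDs are still missing."""
    if have_osds >= expected_osds:
        return None  # nothing to wait for
    if not osd_related:
        # Waiting would never end: the relation itself is missing.
        return "Missing relation: ceph-osd"
    return f"Waiting for OSDs ({have_osds} of {expected_osds})"
```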
Closes-Bug: #1886558
Change-Id: Ic5ee9d33d2bb874af7fc7c325773f88c5661fcc6
The mock third party library was needed for mock support in py2
runtimes. Since we now only support py36 and later, we can use the
standard lib unittest.mock module instead.
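
A minimal illustration of the standard library module in use. The `current_hostname` helper is hypothetical and exists only to have something to mock.

```python
from unittest import mock


def current_hostname(run):
    """Return the hostname reported by the injected command runner."""
    return run(["hostname"]).strip()


# Stand-in for a subprocess call; no third-party 'mock' package needed.
fake_run = mock.Mock(return_value="mon-0\n")
```

Existing tests typically only need `import mock` replaced with `from unittest import mock`; the Mock, patch, and call APIs keep the same names.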
Change-Id: Idffdcf1153821c3d9514f3410e5609ea8c99fe74
This allows the user to change the configuration parameter
'balancer-mode' via Juju in order to set the balancer mode for Ceph.
Change-Id: I60dbd5f163e0c9d004275eff65db7ada41ad2660
Closes-Bug: #1888914
Adds a new get-quorum-status action to return some distilled info from
'ceph quorum_status', primarily for verification of which mon units are
online.
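
One way to distil the output is sketched below. It assumes the JSON form of 'ceph quorum_status' with `quorum_names`, `quorum_leader_name`, and `monmap.mons[].name` fields; the function name and the keys of the returned dict are hypothetical and may differ from what the action actually returns.

```python
def distill_quorum(status):
    """Reduce a parsed quorum_status document to who is online and offline."""
    known = [mon["name"] for mon in status["monmap"]["mons"]]
    online = status.get("quorum_names", [])
    return {
        "leader": status.get("quorum_leader_name"),
        "online": sorted(online),
        # Mons in the monmap that are not part of the current quorum.
        "offline": sorted(set(known) - set(online)),
    }
```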
Partial-Bug: #1917690
Change-Id: I608832d849ee3e4f5d150082c328b63c6ab43de7
These changes provide more detailed outputs for the "list-pools" action.
The default action output has not changed ("<pool_id> <pool_name>,
<pool_id> <pool_name>, ..."), but when you pass the "format=json"
parameter, it will provide a list of pools with details about each pool.
The list of pools (with or without details) is parsed from
`ceph osd dump`.
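
The two output modes can be sketched as below. This assumes the dump has already been parsed from JSON and that each entry under `pools` carries `pool` (the id) and `pool_name` keys; the function name `format_pools` is hypothetical.

```python
import json


def format_pools(osd_dump, fmt="text"):
    """Render pools either as the default text list or as detailed JSON."""
    pools = osd_dump.get("pools", [])
    if fmt == "json":
        # Detailed mode: pass every per-pool field through unchanged.
        return json.dumps(pools)
    # Default mode: "<pool_id> <pool_name>, <pool_id> <pool_name>, ..."
    return ", ".join(f"{p['pool']} {p['pool_name']}" for p in pools)
```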
Closes-Bug: #1920135
Change-Id: I6e2b834628312ed458527420ca83052d29bd2b9a
When the noout flag is set in a Ceph cluster, the Nagios check
currently marks this as a warning (like Ceph itself). However,
setting it to CRITICAL will raise visibility, and indicate to the
operator that this should be a temporary state.
Closes-Bug: 1926551
Change-Id: I9831cfea3f63e82fbc8bfebc938a9795b69111c7
Currently mon_relation only calls notify_rbd_mirrors when the cluster is
already bootstrapped, which leads to broker requests not being handled
for other relations in some cases.
The change also moves the bootstrap attempt code into a separate
function and adds unit tests for mon_relation to cover different
branches for various inputs.
Closes-Bug: #1942224
Change-Id: Id9b611d128acb7d49a9a9ad9c096b232fefd6c68