Commit Graph

139 Commits

Author SHA1 Message Date
Luciano Lo Giudice 0572504230 Implement the 'rotate-key' action for managers
This patchset implements key rotation for managers only. The user
can specify either the full entity name (e.g. 'mgr.XXXX') or
simply 'mgr', which stands for the local manager.

After the entity's directory is located, a new pending key is
generated, the keyring file is mutated to include the new key and
then replaced in situ. Lastly, the manager service is restarted.

Note that Ceph has only one active manager at any given time,
so it only makes sense to call this action on _every_ mon unit.

Change-Id: Ie24b3f30922fa5be6641e37635440891614539d5
func-test-pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/1195
2024-04-05 19:37:36 -03:00
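The entity-name handling described in the 'rotate-key' commit above can be sketched as follows. This is only an illustration: the function name `resolve_mgr_entity` and the `local_id` parameter are hypothetical and are not taken from the charm's code.

```python
def resolve_mgr_entity(entity: str, local_id: str) -> str:
    """Return the full 'mgr.<id>' name for the entity given to the action.

    Hypothetical helper: 'mgr' alone is taken to mean the local manager,
    whose id is passed in as `local_id` (an assumption of this sketch).
    """
    if entity == "mgr":
        # Short form: expand to the local manager's full name.
        return f"mgr.{local_id}"
    if entity.startswith("mgr.") and len(entity) > len("mgr."):
        # Full form: already names a specific manager.
        return entity
    # The commit states that only managers are supported.
    raise ValueError(f"unsupported entity: {entity!r}")
```

Rejecting anything that is not a manager mirrors the commit's statement that key rotation is implemented for managers only.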
Luciano Lo Giudice 380532111f Implement the 'list-entities' action
This action is the first step needed to implement key rotation
in charmed Ceph.

Change-Id: I59012621a0d9a2a1197fd7f8f0155cf85a37a056
2024-03-26 18:10:36 -03:00
Zuul 0a03288a72 Merge "Add nagios check for radosgw-admin sync status" 2024-01-10 07:40:46 +00:00
Danny Cocks 8d7c1060aa Add nagios check for radosgw-admin sync status
This duplicates the check performed for ceph status and specialises it for
radosgw-admin sync status instead.

The config options available are:
- nagios_rgw_zones: this is which zones are expected to be connected
- nagios_rgw_additional_checks: this is equivalent to nagios_additional_checks
and allows for a configurable set of strings to grep for as critical alerts.

Change-Id: Ideb35587693feaf1cc0736e981005332e91ca861
2024-01-10 10:42:24 +11:00
Samuel Walladge ffe81367e1 Add config option for rbd_stats_pools
This allows configuring RBD IO statistics collection for RBD pools.

Co-authored-by: Yoshi Kadokawa <yoshi.kadokawa@canonical.com>

Closes-Bug: #2042405

Related-Bug: #1989648
Change-Id: I2252163533a312f0f53165f946711ab20bb0e3c9
2023-11-13 00:35:37 +00:00
Peter Sabaini 324679f061 Tox: add Python 3.11 section to tox.ini
Also improve mocking unit tests

Change-Id: Ie4356c23e97cec48f5731323bc90d63335ecc753
2023-11-10 14:12:29 +02:00
Peter Sabaini 55beb2504d Fix version retrieval
During cluster deployment a situation can arise where there are
already osd relations but osds are not yet fully added to the cluster.
This can make version retrieval fail for osds. Retry version retrieval
to give the cluster a chance to settle.

Also update tests to install OpenStack from latest/edge

Change-Id: I12a1bcd32be2ed8a8e5ee0e304f716f5a190bd57
2023-09-29 21:04:23 +02:00
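The retry approach described in the 'Fix version retrieval' commit above could look roughly like this. It is a generic sketch, not the charm's implementation: the helper name `retry` and its parameters are assumptions made for illustration.

```python
import time


def retry(func, attempts=5, delay=1.0, exceptions=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or `attempts` tries have been made.

    Hypothetical helper: names and defaults are illustrative only.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except exceptions as err:
            last_error = err
            # Wait before the next try, but not after the final one.
            if attempt < attempts - 1:
                sleep(delay)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

A caller would wrap the version query in `func`, so that a cluster that has not yet settled gets a few more chances before the failure is reported.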
Luciano Lo Giudice 1a41aa24ce Fix ceph-mon upgrade path
This PR makes some small changes in the upgrade path logic by
providing a fallback method of fetching the current ceph-mon
version and adding additional checks to see if the upgrade can
be done in a sane way.

Closes-Bug: #2024253
Change-Id: I1ca4316aaf4f0b855a12aa582a8188c88e926fa6
2023-07-06 16:59:37 -03:00
Chris MacNaughton 88d37461dc Ensure broker requests are re-processed on upgrade-charm
When broker-request caching was added, it broke functionality
that ensured that clients were updated on charm-upgrade, this
change enables a bypass of that cache functionality and uses
it to re-process broker requests in the upgrade-charm hook.

Depends-On: https://review.opendev.org/c/openstack/charms.ceph/+/848311
Func-Test-Pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/1066
Closes-Bug: #1968369
Change-Id: Ibdad1fd5976fdf2d5f3384f1b120b0d5dda34947
2023-06-29 19:58:49 -03:00
jneo8 e99c38ae4c Fix persistent config file not being updated
When Ceph performs a version upgrade, it determines the previous Ceph
version from the `source` config variable, which is stored in a persistent file.
However, updating the persistent file is broken: we use hookenv.Config
from within the ops framework, but hookenv._run_atexit, which
saves the changes to the file, is never called.

Partial-Bug: #2007976
Change-Id: Ibf12a2b87736cb1d32788672fb390e027f15b936
func-test-pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/1047
2023-05-23 09:51:53 +08:00
Nobuto Murata c9389a8cd0 Revert "Create NRPE check to verify ceph daemons versions"
This reverts commit dfbda68e1a.

Reason for revert:

The Ceph version check does not seem to take into account which user
executes the nrpe check. It fails to read the keyrings needed to run the
command because it is run by a non-root user.

$ juju run-action --wait nrpe/0 run-nrpe-check name=check-ceph-daemons-versions
unit-nrpe-0:
  UnitId: nrpe/0
  id: "20"
  results:
    Stderr: |
      2023-02-01T03:03:09.556+0000 7f4677361700 -1 auth: unable to find
      a keyring on
      /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin:
      (2) No such file or directory
      2023-02-01T03:03:09.556+0000 7f4677361700 -1
      AuthRegistry(0x7f467005f540) no keyring found at
      /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,
      disabling cephx
      2023-02-01T03:03:09.556+0000 7f4677361700 -1 auth: unable to find
      a keyring on
      /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin:
      (2) No such file or directory
      2023-02-01T03:03:09.556+0000 7f4677361700 -1
      AuthRegistry(0x7f4670064d88) no keyring found at
      /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,
      disabling cephx
      2023-02-01T03:03:09.560+0000 7f4677361700 -1 auth: unable to find
      a keyring on
      /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin:
      (2) No such file or directory
      2023-02-01T03:03:09.560+0000 7f4677361700 -1
      AuthRegistry(0x7f4677360000) no keyring found at
      /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,
      disabling cephx
      [errno 2] RADOS object not found (error connecting to the cluster)
    check-output: 'UNKNOWN: could not determine OSDs versions, error: Command ''[''ceph'',
      ''versions'']'' returned non-zero exit status 1.'
  status: completed
  timing:
    completed: 2023-02-01 03:03:10 +0000 UTC
    enqueued: 2023-02-01 03:03:09 +0000 UTC
    started: 2023-02-01 03:03:09 +0000 UTC

Related-Bug: #1943628
Change-Id: I84b306e84661e6664e8a69fa93dfdb02fa4f1e7e
2023-02-01 12:31:16 +09:00
Nobuto Murata df676a097f Make sure lockfile-progs package is installed
Also, drop python-dbus for simplicity since "check_upstart_job" in nrpe
is not enabled any longer. And the python-dbus package is no longer
available on jammy either.

    [on focal with systemd]
    $ ls -1 /etc/nagios/nrpe.d/
    check_ceph.cfg
    check_conntrack.cfg
    check_reboot.cfg
    check_systemd_scopes.cfg

Closes-Bug: #1998163
Change-Id: I30bc22ae8509367207004b90eb2c38ad0fae9ffe
2023-01-19 02:14:15 +00:00
Zuul 16eb15791a Merge "Adds operator-native mds provides library" 2022-10-26 15:45:42 +00:00
Peter Sabaini 9debe75064 Rewrite the get-erasure-profile action with the ops framework
Change-Id: I07cb5838c446ba08469e1d0f22d75d74c40ef29c
2022-10-26 11:28:42 +02:00
Chris MacNaughton 40521754ae Adds operator-native mds provides library
Change-Id: Id9783ca8f7091d9f6fb9419642d08383685bffb3
2022-10-25 11:22:46 +02:00
Zuul 4ac36718f3 Merge "Rewrite get_health action with the Operator framework" 2022-10-11 15:08:37 +00:00
Zuul 9c96720d8b Merge "rewrite create-erasure-profile with ops framework" 2022-10-11 15:08:35 +00:00
Zuul 4d9d7a5b90 Merge "Rewrite the create-crush-rule action with the ops framework" 2022-10-11 14:39:32 +00:00
Chris MacNaughton 7703ba5c28 Add operator-native ceph-client library
Change-Id: Id9caf3b385094b9bc4010893034185d0a47c45d4
2022-10-07 13:05:53 -04:00
Zuul 11b7a7340b Merge "Rewrite update status machinery with the ops framework" 2022-10-07 14:45:06 +00:00
Peter Sabaini 4cb09a7ad4 Rewrite update status machinery with the ops framework
Add a new module ceph_status for checking ceph-mon status.

Provide the ceph_shared helpers for querying current status of
ceph-mon units. Also add some initial testing for the charm module.

Change-Id: I5079023ca692f0a2b7bfda96bb1834b8e9b1f0cc
2022-10-06 17:22:40 +02:00
Chris MacNaughton 255888fef3 Rewrite get_health action with the Operator framework
Change-Id: I68645a3d00c0622c7701c8177bcd510c3092afe4
2022-10-06 11:40:32 +00:00
Chris MacNaughton 0905362f04 rewrite create-erasure-profile with ops framework
Change-Id: I27b0e926865ecb39ad4f5ad25de8266e9db75695
2022-10-06 11:40:27 +00:00
Chris MacNaughton 5ae30304dd Rewrite the create-crush-rule action with the ops framework
Change-Id: Ifaccd20ba4a0f148a38d14edf0c26bd4a4d5d655
2022-10-06 11:40:22 +00:00
Edin Sarajlic b8af44aefa Add nagios check for expected number of OSDs
This check does not require manually setting the number of expected
OSDs.

Initially, the charm sets the count (per-host) to that of what's
present in the OSD tree. The count will be updated (on a per-host
basis) when the number of OSDs grows, but not when it shrinks. There
is a charm action to reset the expected count using information from
the OSD tree.

Closes-Bug: #1952985
Change-Id: Ia6a060bf151908c1d4159e6bdffa7bfe1f0a7988
2022-10-05 13:02:54 +00:00
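The grow-only behaviour described in the 'Add nagios check for expected number of OSDs' commit above can be sketched as follows. The function names and the per-host dictionaries are hypothetical; the charm's actual data structures are not shown in the commit message.

```python
def update_expected_osds(expected, observed):
    """Return new per-host expected counts that only ever grow.

    `expected` and `observed` map host name -> OSD count (assumed shape).
    """
    updated = dict(expected)
    for host, count in observed.items():
        # Raise the expectation when a host gains OSDs, never lower it.
        updated[host] = max(count, updated.get(host, 0))
    return updated


def reset_expected_osds(observed):
    """Model the reset action: take the counts straight from the OSD tree."""
    return dict(observed)
```

Because the expected count never shrinks on its own, a host that loses an OSD keeps its old expectation and the check can flag the difference until an operator runs the reset action.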
Peter Sabaini e36a1890b4 Fix: make ceph_metrics test more robust
Instead of messing with the harness' construction, patch the missing
network-get in place.

Change-Id: I162a0b73d76a3ed18689c2baf258372efe5f2ec4
2022-09-29 15:14:41 +02:00
Peter Sabaini 9c7101f573 Implement prometheus alert rules
Alert rules can be attached as a resource and will be transmitted via
the metrics-endpoint relation. Default alert rules taken from upstream
ceph have been added for reference.

Change-Id: I6a3c6f06e9b9d911b35c8ced1968becc6471b362
2022-09-23 14:22:06 +02:00
Zuul 235993f479 Merge "Fix: disable prometheus module on relation depart" 2022-09-13 14:39:10 +00:00
Peter Sabaini 449f6aea4d Fix: disable prometheus module on relation depart
Disable the ceph prometheus module on relation departure

Change-Id: I44f906aa17407c19fa2bbb9b4fbaa86964837b9a
2022-09-09 15:15:38 +02:00
Chris MacNaughton e60a23ae16 Rewrite actions/copy_pool into the operator framework
In addition to trivial changes (passing `event` into
the `copy_pool` function), this change introduces an
update to the actions/__init__.py that allows succinct
import and use from the main charm.py.

An apparently unrelated change is the removal of
charm-proof from the lint job, as it fails with the
removal of actions/copy-pool.

Change-Id: I66a5590ddf0f0bb5ca073a91b451f8c78598609a
func-test-pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/866
2022-09-08 16:41:33 +00:00
Luciano Lo Giudice 5656db92df Rewrite the 'change-osd-weight' action to use the ops framework
This patchset changes a single action, 'change-osd-weight' so that
it's implemented with the operator framework.

Change-Id: Ia11885a2096b6e4b1ecda5caea38939e17098e1d
2022-09-08 12:28:17 -03:00
Peter Sabaini 24dfc7440d Add support for prometheus-k8s
Add support for the metrics-endpoint relation. This allows relating
ceph-mon to prometheus-k8s which is being used in the COS Lite
observability stack. Upon relation, the ceph prometheus module will be
enabled and a corresponding scrape job configured for prometheus-k8s.

Drive-by test improvement for the utils module

Change-Id: Iaeee57aaa6f3678fdaef35f2582b4b4c974acb2a
2022-09-06 10:14:37 +02:00
Luciano Lo Giudice 1ee3d04fda First rewrite of ceph-mon with operator framework
This patchset implements the first rewrite of the charm using the
operator framework by simply calling into the hooks.

This change also includes functional validation about charm upgrades
from the previous stable to the locally built charm.

Fix tempest breakage for python < 3.8

Co-authored-by: Chris MacNaughton <chris.macnaughton@canonical.com>

Change-Id: I61308bb2900134ea163d9e92444066a3cb0de43d
func-test-pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/849
2022-08-19 19:00:56 -03:00
Chris MacNaughton a1d0518c80 Disable insecure global-id reclamation
Closes-Bug: #1929262
Change-Id: Id9f4cfdd70bab0090b66cbc8aeb258936cbf909e
2022-08-16 16:56:37 -04:00
Hicham El Gharbi dfbda68e1a Create NRPE check to verify ceph daemons versions
This NRPE check verifies whether the versions of cluster daemons have diverged.

WARN - any minor version diverged
WARN - any versions are 1 release behind the mon
CRIT - any versions are 2 releases behind the mon
CRIT - any versions are ahead of the mon

A juju action, 'get-versions-report', is also provided,
which gives users a quick way to see the
daemon versions running on cluster hosts.

Closes-Bug: #1943628
Change-Id: I41b5c8576dc9cf885fa813a93e6d51e8804eb9d8
2022-07-19 12:18:06 +02:00
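The severity rules listed in the 'Create NRPE check to verify ceph daemons versions' commit above can be illustrated with a small sketch. It assumes versions are `(major, minor, patch)` tuples and treats "2 releases behind" as "two or more"; both are assumptions of this example, as are the names `classify`, `OK`, `WARN` and `CRIT`.

```python
OK, WARN, CRIT = 0, 1, 2


def classify(mon_version, daemon_versions):
    """Return OK, WARN or CRIT for daemon versions compared with the mon.

    Versions are (major, minor, patch) tuples (assumed representation).
    """
    status = OK
    for version in daemon_versions:
        behind = mon_version[0] - version[0]
        if behind < 0 or behind >= 2:
            # Ahead of the mon, or two (or more) releases behind.
            return CRIT
        if behind == 1 or version != mon_version:
            # One release behind, or same release with a different minor/patch.
            status = WARN
    return status
```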
Connor Chamberlain a1cffc6693 Added safe-pg-repair action
This action automatically repairs inconsistent placement groups
which are caused by read errors.

PGs are repaired using `ceph pg repair <pgid>`.

Action is only taken if one of a PG's shards has a "read_error",
and no action will be taken if any additional errors are found.
No action will be taken if multiple "read_errors" are found.

This action is intended to be safe to run in all contexts.

Closes-Bug: #1923218
Change-Id: I903dfe02aa3b7c67414e3d0d9b57f4042d301830
2022-06-23 18:18:35 +08:00
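The decision rule in the 'Added safe-pg-repair action' commit above amounts to "exactly one error in total, and it must be a read_error". A minimal sketch, assuming each shard's errors are available as a list of strings (the input shape and the name `is_safe_to_repair` are hypothetical):

```python
def is_safe_to_repair(shard_errors):
    """Decide whether a PG qualifies for automatic repair.

    `shard_errors` holds one list of error names per shard (assumed shape).
    """
    # Flatten the per-shard lists into one list of errors for the PG.
    all_errors = [err for errors in shard_errors for err in errors]
    # Repair only when the single error present is a read_error.
    return all_errors == ["read_error"]
```

Under this rule a PG with two read errors, or a read error plus any other error, is left alone, which matches the commit's intent that the action is safe to run in all contexts.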
Juan Pablo Norena d3b2494ee8 Add get-or-create-user and delete-user actions for ceph auth
The get-or-create-user action creates or gets a user,
with its mon and osd capabilities, and retrieves the related
keyring.
The delete-user action deletes users.

Closes-Bug: 1899215
func-test-pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/765
Change-Id: I2bd148e442990b6ff978624023bd85a741c6259a
2022-06-09 07:11:08 -05:00
Robert Gildein 37105f11cd Add list-crush-rules action
This action provides a list of crush rules defined in the Ceph cluster.

Closes-bug: #1957458
Change-Id: I2a5fdae776e00d869a624e1107ab42cf69bb2f50
2022-03-29 17:02:12 +02:00
Chris MacNaughton c07fb2dc6a Remove functionality for auth-supported
Closes-Bug: #1841445
Change-Id: I394d025ff5c0b4a73c6683d67b0949484a5924a1
2022-03-22 11:30:32 +01:00
Aqsa Malik da798bdd95 Add profile-name parameter in create-pool action
This change adds a profile name parameter in the create-pool action that
allows a replicated pool to be created with a CRUSH profile other than
the default replicated_rule.

Closes-Bug: #1905573

Change-Id: Ib21ded8f4a977b4a2d57c6b6b4bb82721b12c4ea
2022-02-11 16:35:30 +01:00
Samuel Walladge 48c52fafdd Display information if missing OSD relation
When ceph-mon is blocked on waiting for enough OSDs to be available,
it will display a message to that effect.
But this is misleading if ceph-mon has not been related to ceph-osd.
So if the two are not related,
and ceph-mon is waiting for OSDs,
then display a message about the relation missing.

Closes-Bug: #1886558
Change-Id: Ic5ee9d33d2bb874af7fc7c325773f88c5661fcc6
2022-01-13 14:44:56 +10:30
James Page e2d8f32d31 Use unittest.mock instead of mock
The mock third party library was needed for mock support in py2
runtimes. Since we now only support py36 and later, we can use the
standard lib unittest.mock module instead.

Change-Id: Idffdcf1153821c3d9514f3410e5609ea8c99fe74
2021-12-16 09:37:23 +00:00
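The switch described in the 'Use unittest.mock instead of mock' commit above means tests import `mock` from the standard library. A small sketch of what such a test can look like; `ceph_health` and `demo` are hypothetical functions written for this example and are not part of the charm.

```python
import subprocess
from unittest import mock  # standard library, replaces the third-party 'mock'


def ceph_health():
    """Hypothetical function under test: return the output of 'ceph health'."""
    return subprocess.check_output(["ceph", "health"]).decode().strip()


def demo():
    # Patch check_output so no real ceph binary is needed.
    with mock.patch.object(
        subprocess, "check_output", return_value=b"HEALTH_OK\n"
    ) as fake:
        result = ceph_health()
    fake.assert_called_once_with(["ceph", "health"])
    return result
```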
Zuul 1e148346b7 Merge "Add balancer module support for 'upmap'" 2021-10-05 22:29:16 +00:00
Luciano Lo Giudice 691605e6fc Add balancer module support for 'upmap'
This allows the user to change the configuration parameter
'balancer-mode' via Juju in order to set the balancer mode for Ceph.

Change-Id: I60dbd5f163e0c9d004275eff65db7ada41ad2660
Closes-Bug: #1888914
2021-10-04 11:53:21 -03:00
Xav Paice 282e23416f Add get-quorum-status action
Adds a new get-quorum-status action to return some distilled info from
'ceph quorum_status', primarily for verification of which mon units are
online.

Partial-Bug: #1917690

Change-Id: I608832d849ee3e4f5d150082c328b63c6ab43de7
2021-09-23 12:56:58 +02:00
Zuul ab0ccb2450 Merge "Add format option to "list-pools" action" 2021-09-10 08:29:30 +00:00
Robert Gildein 185f1719d5 Add format option to "list-pools" action
These changes provide more detailed outputs for the "list-pools" action.
The default action output has not changed ("<pool_id> <pool_name>,
<pool_id> <pool_name>, ..."), but when you pass the "format=json"
parameter, it will provide a list of pools with details about each pool.

The list of pools (with or without details) is parsed from
`ceph osd dump`.

Closes-Bug: #1920135
Change-Id: I6e2b834628312ed458527420ca83052d29bd2b9a
2021-09-09 13:50:00 +02:00
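The two output formats described in the 'Add format option to "list-pools" action' commit above can be sketched as follows. The sketch assumes the JSON form of `ceph osd dump` contains a `pools` list whose entries carry `pool` (the id) and `pool_name` keys; the function name `list_pools` and its `fmt` parameter are hypothetical.

```python
import json


def list_pools(osd_dump_json, fmt="text"):
    """Format pools from JSON `ceph osd dump` output (assumed key names)."""
    pools = json.loads(osd_dump_json)["pools"]
    if fmt == "json":
        # Detailed form: return every field reported for each pool.
        return json.dumps(pools)
    # Default form: "<pool_id> <pool_name>, <pool_id> <pool_name>, ..."
    return ", ".join(f"{p['pool']} {p['pool_name']}" for p in pools)
```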
Garrett Thompson 375a1d0056 Change noout to be a CRITICAL alert instead of WARNING.
When the noout flag is set in a Ceph cluster, the Nagios check
currently marks this as a warning (like Ceph itself). However,
setting it to CRITICAL will raise visibility, and indicate to the
operator that this should be a temporary state.

Closes-Bug: 1926551
Change-Id: I9831cfea3f63e82fbc8bfebc938a9795b69111c7
2021-09-07 14:34:33 -06:00
Dmitrii Shcherbakov 82743ab7e5 Notify more relations when cluster is bootstrapped
Currently mon_relation only calls notify_rbd_mirrors when the cluster is
already bootstrapped which leads to broker requests not being handled
for other relations in some cases.

The change also moves the bootstrap attempt code into a separate
function and adds unit tests for mon_relation to cover different
branches for various inputs.

Closes-Bug: #1942224
Change-Id: Id9b611d128acb7d49a9a9ad9c096b232fefd6c68
2021-09-01 23:26:57 +03:00
Zuul 6afeafc0ea Merge "Add support dashboard relation" 2021-08-20 16:42:34 +00:00