Commit Graph

942 Commits

Author SHA1 Message Date
Zuul 945c958ff4 Merge "Don't expect a static job name" 2024-03-21 15:53:35 +00:00
Nobuto Murata fb32621831 Don't expect a static job name
A job name passed via the prometheus_scrape library doesn't end up as a
static job name in the Prometheus configuration file in the COS world,
even though COS expects a fixed string. In practice, we cannot have a
static job name like job=ceph in any of the alert rules in COS, since the
charms will convert the string "ceph" into:

> juju_MODELNAME_ID_APPNAME_prometheus_scrape_JOBNAME(ceph)-N

Let's give up on a static job name and use "up{}" so that the rule
is annotated with the model name/ID, etc. without any job-specific
condition. This will break the alert rules when one unit has more than
one scraping endpoint, because there will be no way to distinguish
multiple scraping jobs. Ceph MON only has one Prometheus endpoint for
the time being, so this change shouldn't cause an immediate issue.
Overall, it's not ideal, but it is better than the current status,
which is an alert error out of the box.

The following alert rule:
> up{} == 0
will be converted and annotated as:
> up{juju_application="ceph-mon",juju_model="ceph",juju_model_uuid="UUID"} == 0
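The annotation step can be pictured as filling the empty label set of
the selector with Juju topology labels. This is a minimal sketch only,
assuming a hypothetical `annotate_selector` helper and an expression
whose selector is the literal `up{}`; the actual conversion in COS is
not implemented as a string replacement like this.

```python
def annotate_selector(expr: str, metric: str, labels: dict[str, str]) -> str:
    """Fill the empty label set of `metric{}` in `expr` with topology labels.

    Hypothetical helper for illustration: a real implementation would parse
    the PromQL expression instead of doing a plain string replacement.
    """
    # Sort the label names so the generated selector is deterministic.
    matchers = ",".join(f'{name}="{value}"' for name, value in sorted(labels.items()))
    return expr.replace(f"{metric}{{}}", f"{metric}{{{matchers}}}")


topology = {
    "juju_model": "ceph",
    "juju_model_uuid": "UUID",
    "juju_application": "ceph-mon",
}
print(annotate_selector("up{} == 0", "up", topology))
# up{juju_application="ceph-mon",juju_model="ceph",juju_model_uuid="UUID"} == 0
```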

Closes-Bug: #2044062

Change-Id: I0df8bc0238349b5f03179dfb8f4da95da48140c7
2024-03-18 15:29:49 +09:00
Peter Sabaini 762ad83c19 Fix: defer cos-prometheus for bootstrap
If a COS prometheus changed event is processed but bootstrap hasn't
completed yet, we need to retry the event at a later time.

Closes-bug: #2042891

Change-Id: I3d274c09522f9d7ef56bc66f68d8488150c125d8
2024-03-01 22:25:54 +01:00
Peter Sabaini 35f9af8c96 Fixup: multisite alert rule help texts
Change-Id: I558804c8bbd162a15bd97a023ac612d32fd96b02
2024-01-19 19:12:40 +01:00
Zuul 6ae78a6e6c Merge "Add alerting rules for RGW multisite deployments" 2024-01-18 17:41:30 +00:00
Peter Sabaini 24fccea832 Add alerting rules for RGW multisite deployments
Add default prometheus alerting rules for RadosGW multisite deployments based
on the built-in Ceph RGW multisite metrics.

Note that the prometheus_alerts.yml.default rule file is included
for reference only. The ceph-mon charm will use the
resource file from https://charmhub.io/ceph-mon/resources/alert-rules
for deployment so that operators can easily customize these rules.

Change-Id: I5a12162d73686963132a952bddd85ec205964de4
2024-01-17 16:50:37 +01:00
Peter Sabaini 1c9f3b210d Don't error out on missing OSDs
Ceph Reef has a behaviour change where it doesn't always return
version keys for all components. In
I12a1bcd32be2ed8a8e5ee0e304f716f5a190bd57 an attempt was made to fix
this by retrying; however, this code path can also be hit when a
component such as the OSDs is absent. While a cluster without OSDs
wouldn't be functional, it still should not cause the charm to error.

As a fix, just make the OSD component optional when querying for a
version instead of retrying.
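A minimal sketch of treating the OSD component as optional, assuming
the input is the decoded JSON of `ceph versions` (a mapping from daemon
type to `{version string: count}`) and a hypothetical
`component_versions` helper; the charm's actual code may differ.

```python
from typing import Dict, Set, Tuple


def component_versions(
    versions: Dict[str, Dict[str, int]],
    required: Tuple[str, ...] = ("mon",),
    optional: Tuple[str, ...] = ("osd",),
) -> Dict[str, Set[str]]:
    """Map each component to the set of version strings it reports."""
    result: Dict[str, Set[str]] = {}
    for name in required:
        # A missing required component still raises KeyError.
        result[name] = set(versions[name])
    for name in optional:
        # An absent optional component (e.g. no OSDs yet) maps to an empty
        # set instead of raising KeyError.
        result[name] = set(versions.get(name, {}))
    return result


# Shortened example output with no "osd" key, as on a cluster without OSDs.
sample = {
    "mon": {"ceph version 18.2.0 (hash) reef (stable)": 3},
    "overall": {"ceph version 18.2.0 (hash) reef (stable)": 3},
}
print(component_versions(sample))
```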

Change-Id: I5524896c7ad944f6f22fb1498ab0069397b52418
2024-01-16 11:26:42 +01:00
Zuul 7223f2634f Merge "Retry setting rbd_stats_pools prometheus config" 2024-01-10 07:45:07 +00:00
Zuul 0a03288a72 Merge "Add nagios check for radosgw-admin sync status" 2024-01-10 07:40:46 +00:00
Danny Cocks 8d7c1060aa Add nagios check for radosgw-admin sync status
This duplicates the check performed for ceph status and specialises it for
radosgw-admin sync status instead.

The config options available are:
- nagios_rgw_zones: the zones that are expected to be connected
- nagios_rgw_additional_checks: the equivalent of nagios_additional_checks;
  allows for a configurable set of strings to grep for as critical alerts.

Change-Id: Ideb35587693feaf1cc0736e981005332e91ca861
2024-01-10 10:42:24 +11:00
Luciano Lo Giudice d76939ef70 Retry setting rbd_stats_pools prometheus config
Setting the 'mgr/prometheus/rbd_stats_pools' option can fail
if it is attempted too early, even if the cluster is bootstrapped. This
is seen particularly in ceph-radosgw test runs. This patchset therefore
adds a retry decorator to work around the issue.
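A retry decorator of this kind can be sketched with the standard
library alone. This is a hypothetical `retry` helper, not necessarily
the decorator the patch uses; the attempt count, the delay and the flaky
`set_rbd_stats_pools` stand-in are assumptions for illustration.

```python
import functools
import time


def retry(attempts: int = 3, delay: float = 1.0, exc_type=Exception):
    """Retry the wrapped call up to `attempts` times on `exc_type`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay * attempt)  # simple linear back-off
        return wrapper
    return decorator


calls = []


@retry(attempts=3, delay=0)
def set_rbd_stats_pools(pools: str) -> str:
    # Hypothetical stand-in for running
    # `ceph config set mgr mgr/prometheus/rbd_stats_pools <pools>`:
    # fails twice (as if the mgr were not ready yet), then succeeds.
    calls.append(pools)
    if len(calls) < 3:
        raise RuntimeError("mgr not ready")
    return pools


print(set_rbd_stats_pools("rbd"), len(calls))  # rbd 3
```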

Change-Id: Id9b7b903e67154e7d2bb6fecbeef7fac126804a8
2024-01-03 18:10:30 -03:00
Luciano Lo Giudice 03868b2c9f Revert default source to 'bobcat'
The OpenStack libs don't recognize Ceph releases when specifying
the charm source. Instead, we have to use an OpenStack release.
Since it was set to quincy, reset it to bobcat.

Closes-Bug: #2026651
Change-Id: Ibac09d2bf77eeba69789434eaa6112c2028fbf64
2023-12-15 17:45:31 -03:00
Samuel Walladge ffe81367e1 Add config option for rbd_stats_pools
This allows configuring RBD I/O statistics collection for RBD pools.

Co-authored-by: Yoshi Kadokawa <yoshi.kadokawa@canonical.com>

Closes-Bug: #2042405

Related-Bug: #1989648
Change-Id: I2252163533a312f0f53165f946711ab20bb0e3c9
2023-11-13 00:35:37 +00:00
Peter Sabaini 324679f061 Tox: add Python 3.11 section to tox.ini
Also improve mocking in unit tests

Change-Id: Ie4356c23e97cec48f5731323bc90d63335ecc753
2023-11-10 14:12:29 +02:00
Peter Sabaini bc7a0fb6c3 Fix: increase timeout for get versions
Change-Id: Iee13e9a88f047f5835aee8e5a308ce2035d28891
2023-10-02 12:34:31 +02:00
Peter Sabaini 55beb2504d Fix version retrieval
During cluster deployment a situation can arise where there are
already osd relations but osds are not yet fully added to the cluster.
This can make version retrieval fail for osds. Retry version retrieval
to give the cluster a chance to settle.

Also update tests to install OpenStack from latest/edge

Change-Id: I12a1bcd32be2ed8a8e5ee0e304f716f5a190bd57
2023-09-29 21:04:23 +02:00
Peter Sabaini 3567a0589c Prune CI test jobs and test bundles
Change-Id: I1be06ec2901ac414388f4875c95631e4ed50145e
2023-09-04 16:26:25 +02:00
Luciano Lo Giudice 84cdcf3cd5 Return previous result of processed broker requests
Instead of returning an empty dict for already processed
broker requests, store the result and return it. This works
around issues in charms like ceph-fs that spin indefinitely
waiting for the response to a request that never arrives.
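The store-and-return behaviour can be sketched as a cache keyed by
request ID. The `BrokerRequestCache` class and the `{"exit-code": ...}`
response shape are illustrative assumptions, not the charm's actual
storage mechanism.

```python
from typing import Any, Callable, Dict


class BrokerRequestCache:
    """Remember the response to each processed broker request by request ID."""

    def __init__(self) -> None:
        self._responses: Dict[str, Dict[str, Any]] = {}

    def process(self, request_id: str,
                handler: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
        if request_id in self._responses:
            # Already processed: hand back the stored result instead of {},
            # so a client still waiting on this request gets its answer.
            return self._responses[request_id]
        response = handler()
        self._responses[request_id] = response
        return response


cache = BrokerRequestCache()
first = cache.process("req-1", lambda: {"exit-code": 0})
# The handler is not run again for a request ID that was already processed.
again = cache.process("req-1", lambda: {"exit-code": 1})
print(again)  # {'exit-code': 0}
```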

Closes-Bug: #2031414
Change-Id: Ie86f007d76fe75cc07cf7a973eff3f535a11dbe7
2023-08-23 11:57:52 -03:00
Zuul 61c8209e3d Merge "Add 2023.2 Bobcat support" 2023-08-04 15:12:34 +00:00
Corey Bryant 49e21b83a3 Add 2023.2 Bobcat support
* sync charm-helpers to classic charms
* change openstack-origin/source default to quincy
* add mantic to metadata series
* align testing with bobcat
* add new bobcat bundles
* add bobcat bundles to tests.yaml
* add bobcat tests to osci.yaml
* update build-on and run-on bases
* drop kinetic
* update charmcraft_channel to 2.x/stable

Change-Id: I4c9d7fc9f3f3588fa777b5ecb14971ff923f2d11
2023-08-03 13:52:35 -04:00
Jadon Naas 3eb5898a65 Add docs key and point at Discourse
Add the 'docs' key and point it at a Discourse topic
previously populated with the charm's README contents.

When the new charm revision is released to the Charmhub,
this Discourse-based content will be displayed there. In
the absence of this new key, the Charmhub's default
behaviour is to display the value of the charm's
'description' key.

Change-Id: I173cadb5a8208283883e1119dbfc5d661809cc5f
2023-07-18 13:55:47 -04:00
Peter Sabaini ab84214805 Set consistent source
Avoid the unintuitive situation where users are deploying from
channel=quincy but get an older ceph due to deploying series=focal by
explicitly setting source=quincy which is what most users want anyway;
those that do not can still explicitly set source.

Change-Id: I9428e93ba6107ba5e2ebcc667995b3d88eb03d27
2023-07-10 09:36:25 +02:00
Luciano Lo Giudice 1a41aa24ce Fix ceph-mon upgrade path
This PR makes some small changes in the upgrade path logic by
providing a fallback method of fetching the current ceph-mon
version and adding additional checks to see if the upgrade can
be done in a sane way.

Closes-Bug: #2024253
Change-Id: I1ca4316aaf4f0b855a12aa582a8188c88e926fa6
2023-07-06 16:59:37 -03:00
Chris MacNaughton 88d37461dc Ensure broker requests are re-processed on upgrade-charm
When broker-request caching was added, it broke functionality
that ensured that clients were updated on charm-upgrade. This
change enables a bypass of that cache functionality and uses
it to re-process broker requests in the upgrade-charm hook.

Depends-On: https://review.opendev.org/c/openstack/charms.ceph/+/848311
Func-Test-Pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/1066
Closes-Bug: #1968369
Change-Id: Ibdad1fd5976fdf2d5f3384f1b120b0d5dda34947
2023-06-29 19:58:49 -03:00
Nobuto Murata b31de1f027 Don't clear osd_memory_target unconditionally
The charm can now set osd_memory_target, but not per device class
or type, by the nature of how the charm works. Always resetting
osd_memory_target when it is not passed over the relation is somewhat
risky, since operators may have set osd_memory_target by hand with the
`ceph config` command outside of the charm. Let's be less disruptive
on charm upgrade.
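The intended behaviour can be sketched as: set the value only when the
relation provides one, and otherwise do nothing rather than remove it.
`osd_memory_target_cmd` is a hypothetical helper, and the `osd.0`
target is only an example; `ceph config set <who> <option> <value>` and
`ceph config rm <who> <option>` are the standard Ceph CLI forms.

```python
from typing import List, Optional


def osd_memory_target_cmd(who: str, value: Optional[int]) -> Optional[List[str]]:
    """Return the `ceph config` command to run, or None to leave things alone.

    When the relation carries no value we do nothing, rather than running
    `ceph config rm`, so a value an operator set by hand survives.
    """
    if value is None:
        return None
    return ["ceph", "config", "set", who, "osd_memory_target", str(value)]


print(osd_memory_target_cmd("osd.0", 4294967296))
print(osd_memory_target_cmd("osd.0", None))  # None
```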

Closes-Bug: #1934143
Change-Id: I34dd33e54193a9ebdbc9571d153aa6206c85a067
2023-06-16 21:28:04 +09:00
Zuul 58469bc459 Merge "Configure ceph with osd-memory-target from ceph-osd charm" 2023-06-16 10:29:49 +00:00
Samuel Walladge 87da61b687 Configure ceph with osd-memory-target from ceph-osd charm
Change-Id: Id3f21f8ab68fb88529b6cbd78217e27772c2739c
2023-06-13 19:31:26 +09:00
Peter Sabaini af2323c457 rbd mirror relation: be persistent in getting pool info
Auth for getting pool details can fail initially if we set up an RBD
mirror relation at cloud bootstrap. Add some retries to give it another
chance.

Change-Id: I2f5ac561120b1abe52ea0621bb472bc78495fa97
Partial-Bug: #2021967
2023-06-01 10:46:27 +02:00
jneo8 e99c38ae4c Fix persistent config file not update bug
When ceph performs a version upgrade, the charm checks the previous
ceph release from the `source` config value stored in a persistent file.
However, updating that persistent file is broken: we use hookenv.Config
from within the ops framework, but hookenv._run_atexit, which saves the
changes to the file, is never called.
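The failure mode can be illustrated with a minimal analogue of a config
dict that remembers previous values. `PersistentConfig` is loosely
modelled on the idea behind charm-helpers' hookenv.Config but is not its
code; the JSON file and the `source` values are assumptions for the
example.

```python
import json
import os
import tempfile


class PersistentConfig(dict):
    """Config dict that can report the value saved by the previous hook."""

    def __init__(self, current: dict, path: str) -> None:
        super().__init__(current)
        self.path = path
        self._prev = {}
        if os.path.exists(path):
            with open(path) as f:
                self._prev = json.load(f)

    def previous(self, key):
        return self._prev.get(key)

    def save(self) -> None:
        # Must be called explicitly. If nothing calls it (as when the
        # atexit callbacks never run), the next hook sees stale values.
        with open(self.path, "w") as f:
            json.dump(dict(self), f)


path = os.path.join(tempfile.mkdtemp(), "config.json")
hook1 = PersistentConfig({"source": "quincy"}, path)
hook1.save()
hook2 = PersistentConfig({"source": "reef"}, path)
print(hook2.previous("source"))  # quincy
```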

Partial-Bug: #2007976
Change-Id: Ibf12a2b87736cb1d32788672fb390e027f15b936
func-test-pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/1047
2023-05-23 09:51:53 +08:00
Peter Sabaini 357421f391 Testing: use mysql and rabbitmq from LTS
For better stability use LTS series for rabbitmq and mysql when
testing instead of interim releases.

Also remove xena (non-lts) from tests and yoga as a source default

Change-Id: Ie443c55dc4cc1b7f63eacfee79b28f210f1277e4
2023-05-22 08:58:38 +02:00
Peter Sabaini f172b8cd1e Fix: testing bundles for jammy and lunar were off
Change-Id: I314fef8551e896ab35678bc78f0233cb42030413
2023-05-08 20:40:03 +02:00
Luciano Lo Giudice f23d9e3d3e Remove relation test
The CephRelationTest class wasn't of much use and the test was
rather flaky, since it compared public IP addresses.

Change-Id: Iba5aad1d895ba8b28ce364899a1e41275dc3003b
func-test-pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/1034
2023-04-06 19:18:16 -03:00
Chris MacNaughton 1a81c4416c Add support for interim Ubuntu releases
- update bundles to include UCA pocket tests
- update test configuration
- update metadata to include kinetic and lunar
- update snapcraft to allow run-on for kinetic and lunar

Change-Id: I6b229b502dd4ee9f1d219240b86f7826abf0c25d
2023-03-17 08:56:03 -04:00
Zuul 7ea8cf6a35 Merge "Use a different name for the local key/value store" 2023-03-14 19:26:45 +00:00
Luciano Lo Giudice 616c4e3367 Use a different name for the local key/value store
The operator framework and charmhelpers use the same path for the
local K/V store, which causes problems when running certain hooks
like 'pre-series-upgrade'. In order to work around this issue, this
patchset makes the charmhelpers lib use a different path, while
migrating the DB file before doing so.
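The migrate-then-switch step can be sketched as a one-time copy of the
DB file to the new path. `migrate_kv_store` and both file names are
hypothetical; the actual paths used by the charm are not given here.

```python
import os
import shutil
import tempfile


def migrate_kv_store(old_path: str, new_path: str) -> str:
    """Copy the K/V DB to its new, charm-helpers-only path, once."""
    if os.path.exists(old_path) and not os.path.exists(new_path):
        # Copy rather than move: the old file stays in place because the
        # operator framework keeps using that path.
        shutil.copy2(old_path, new_path)
    return new_path


workdir = tempfile.mkdtemp()
old = os.path.join(workdir, "shared-state.db")     # hypothetical old path
new = os.path.join(workdir, "charmhelpers-kv.db")  # hypothetical new path
with open(old, "w") as f:
    f.write("existing data")
migrate_kv_store(old, new)
```

Only after the copy would charm-helpers be pointed at `new_path`, so no
existing state is lost in the switch.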

Closes-Bug: #2005137
Change-Id: Ic2e024371ff431888731753d29fff8538232009a
2023-03-14 12:11:08 -03:00
Facundo Ciccioli b9f7805203 Fix Nagios additional checks functionality
Commit 40b22e3d in the juju/charm-helpers repo introduced shell quoting
of each argument passed to the check, which makes the double-quoting
done here not only unnecessary but also damaging to the final command.
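The double-quoting problem can be reproduced with Python's `shlex`,
assuming the library's per-argument quoting behaves like `shlex.quote`.
The `HEALTH_WARN` argument is only an example value.

```python
import shlex

# An additional-check argument as it used to be prepared here, with its own
# embedded double quotes, and the same argument left unquoted.
pre_quoted = '"HEALTH_WARN"'
plain = "HEALTH_WARN"

# Once every argument is shell-quoted by the library, the pre-quoted value
# reaches the command with literal quote characters still in it.
print(shlex.split(shlex.quote(pre_quoted)))  # ['"HEALTH_WARN"']
print(shlex.split(shlex.quote(plain)))       # ['HEALTH_WARN']
```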

Closes-Bug: #2008784
Change-Id: Ifedd5875d27e72a857b01a48afcd058476734695
func-test-pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/1022
2023-03-13 14:36:46 +01:00
Chris MacNaughton 3a774be961 Fix issue with ceph-client relation handling
A bug was introduced when changing ceph-client to
an operator framework library that caused the
fallback application_name handling to present
a class name rather than a remote application name.

This change updates the handling to get at an
`app.name` rather than an `app`.

As a drive-by, this also allow-lists the fully-
qualified rename.sh.

Closes-Bug: #1995086
Change-Id: I57b685cb78ba5c4930eb0fa73d7ef09d39d73743
func-test-pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/1022
2023-03-10 19:41:11 +00:00
Zuul 011dddf14d Merge "Add kinetic support" 2023-02-01 15:51:50 +00:00
Nobuto Murata c9389a8cd0 Revert "Create NRPE check to verify ceph daemons versions"
This reverts commit dfbda68e1a.

Reason for revert:

The Ceph version check doesn't account for the user that executes the
NRPE check. It fails to find keyrings to execute the command because
it is run by a non-root user.

$ juju run-action --wait nrpe/0 run-nrpe-check name=check-ceph-daemons-versions
unit-nrpe-0:
  UnitId: nrpe/0
  id: "20"
  results:
    Stderr: |
      2023-02-01T03:03:09.556+0000 7f4677361700 -1 auth: unable to find
      a keyring on
      /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin:
      (2) No such file or directory
      2023-02-01T03:03:09.556+0000 7f4677361700 -1
      AuthRegistry(0x7f467005f540) no keyring found at
      /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,
      disabling cephx
      2023-02-01T03:03:09.556+0000 7f4677361700 -1 auth: unable to find
      a keyring on
      /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin:
      (2) No such file or directory
      2023-02-01T03:03:09.556+0000 7f4677361700 -1
      AuthRegistry(0x7f4670064d88) no keyring found at
      /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,
      disabling cephx
      2023-02-01T03:03:09.560+0000 7f4677361700 -1 auth: unable to find
      a keyring on
      /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin:
      (2) No such file or directory
      2023-02-01T03:03:09.560+0000 7f4677361700 -1
      AuthRegistry(0x7f4677360000) no keyring found at
      /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,
      disabling cephx
      [errno 2] RADOS object not found (error connecting to the cluster)
    check-output: 'UNKNOWN: could not determine OSDs versions, error: Command ''[''ceph'',
      ''versions'']'' returned non-zero exit status 1.'
  status: completed
  timing:
    completed: 2023-02-01 03:03:10 +0000 UTC
    enqueued: 2023-02-01 03:03:09 +0000 UTC
    started: 2023-02-01 03:03:09 +0000 UTC

Related-Bug: #1943628
Change-Id: I84b306e84661e6664e8a69fa93dfdb02fa4f1e7e
2023-02-01 12:31:16 +09:00
Zuul 87600a9c31 Merge "Create a key for ceph-osd for crash module auth" 2023-01-31 15:41:18 +00:00
Corey Bryant ccd9d43e2a Add kinetic support
Add 22.10 run-on base.

Change-Id: I2de43d9d547849ffea6df502a249c771a77a78aa
2023-01-31 10:11:16 -05:00
James Page 58fc48ebbe Ensure crushtool --test called correctly
Later Ceph releases require that the --test function of crushtool
is called with replica information for validation.

Pass in "--num-rep 3" as a basic check plus "--show-statistics"
to silence a non-fatal warning message.
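The resulting invocation can be sketched as follows. The
`crushtool_test_cmd` and `validate_crush_map` helpers and the map path
are hypothetical; `-i`, `--test`, `--num-rep` and `--show-statistics`
are standard crushtool options.

```python
import subprocess
from typing import List


def crushtool_test_cmd(compiled_map: str, num_rep: int = 3) -> List[str]:
    """Build a crushtool invocation that validates a compiled CRUSH map."""
    return [
        "crushtool", "-i", compiled_map, "--test",
        "--num-rep", str(num_rep),  # later releases need replica information
        "--show-statistics",        # silences the non-fatal warning
    ]


def validate_crush_map(compiled_map: str) -> None:
    # Raises subprocess.CalledProcessError if crushtool rejects the map.
    subprocess.check_call(crushtool_test_cmd(compiled_map))


print(" ".join(crushtool_test_cmd("/tmp/crushmap.bin")))
```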

This can be cleanly cherry-picked back at least as far as
Ceph 12.2.x.

Change-Id: I76d21ddd9da79535f68490b4231ae13705e27edb
Closes-Bug: 2003690
2023-01-23 12:17:22 +00:00
Samuel Walladge b2408e9dd7 Create a key for ceph-osd for crash module auth
This will be set on the osd relation,
so the ceph-osd charm can use this key for auth
by the crash reporting module.

ref. https://docs.ceph.com/en/latest/mgr/crash/

See https://review.opendev.org/c/openstack/charm-ceph-osd/+/869139
for how this key is used by ceph-osd.

Closes-Bug: #2000630
Change-Id: Ic95aae6b5981a6df1e0b3c310bcef8018c494a24
2023-01-23 08:33:31 +10:30
Nobuto Murata df676a097f Make sure lockfile-progs package is installed
Also, drop python-dbus for simplicity, since "check_upstart_job" in nrpe
is no longer enabled and the python-dbus package is no longer
available on jammy either.

    [on focal with systemd]
    $ ls -1 /etc/nagios/nrpe.d/
    check_ceph.cfg
    check_conntrack.cfg
    check_reboot.cfg
    check_systemd_scopes.cfg

Closes-Bug: #1998163
Change-Id: I30bc22ae8509367207004b90eb2c38ad0fae9ffe
2023-01-19 02:14:15 +00:00
Luciano Lo Giudice af9143503c Unpin tox version
This unpinning is meant to solve the issues with tox 4.x breaking
all the virtualenv dependencies.

Change-Id: Ifc3381b2f2e4e41ebf6676080bf1831baffb0d42
2023-01-19 11:11:53 +09:00
Zuul 458517e89c Merge "Fix: init alert rules on rel change" 2022-12-01 17:13:14 +00:00
Peter Sabaini 1cf9d4d228 Fix: init alert rules on rel change
Check for alert rules early, on first metrics-endpoint rel change

Change-Id: Iea39c33c614d204ee39ad39da68c31d213ed19e6
2022-11-30 14:59:32 +01:00
Peter Sabaini d76dda4f55 Work around config initialisation behaviour change
The previous (classic) version of the charm initialised a Config
object in the install hook and let it go out of scope. Initialise
a config object explicitly in the install and upgrade charm hooks.

Change-Id: Ic389c840cc4253adaddcaa50d184db6ca66cb397
2022-11-02 14:38:03 +01:00
Zuul 16eb15791a Merge "Adds operator-native mds provides library" 2022-10-26 15:45:42 +00:00
Zuul a7e627f380 Merge "Add kinetic support" 2022-10-26 15:45:40 +00:00