Add support for prometheus-k8s

Add support for the metrics-endpoint relation. This allows relating
ceph-mon to prometheus-k8s, which is used in the COS Lite
observability stack. Upon relation, the ceph prometheus module is
enabled and a corresponding scrape job is configured for prometheus-k8s.

Drive-by test improvement for the utils module

Change-Id: Iaeee57aaa6f3678fdaef35f2582b4b4c974acb2a
Peter Sabaini 2022-09-02 17:44:42 +02:00
parent 1ee3d04fda
commit 24dfc7440d
9 changed files with 2741 additions and 3 deletions


@ -140,6 +140,9 @@ The charm supports Ceph metric monitoring with Prometheus. Add relations to the
> **Note**: Prometheus support is available starting with Ceph Luminous
(xenial-queens UCA pocket).
Alternatively, integration with the [COS Lite][cos-lite] observability
stack is available via the metrics-endpoint relation.
## Actions
This section lists Juju [actions][juju-docs-actions] supported by the charm.
@ -224,3 +227,4 @@ For general charm questions refer to the OpenStack [Charm Guide][cg].
[cloud-archive-ceph]: https://wiki.ubuntu.com/OpenStack/CloudArchive#Ceph_and_the_UCA
[upstream-ceph-buckets]: https://docs.ceph.com/docs/master/rados/operations/crush-map/#types-and-buckets
[jq]: https://stedolan.github.io/jq/
[cos-lite]: https://charmhub.io/cos-lite


@ -0,0 +1,306 @@
# Copyright 2022 Canonical Ltd.
# See LICENSE file for licensing details.
"""## Overview.
This document explains how to use the `JujuTopology` class to
create and consume topology information from Juju in a consistent manner.
The goal of the Juju topology is to uniquely identify a piece
of software running across any of your Juju-managed deployments.
This is achieved by combining the following four elements:
- Model name
- Model UUID
- Application name
- Unit identifier
For a more in-depth description of the concept, as well as a
walk-through of its use case in observability, see
[this blog post](https://juju.is/blog/model-driven-observability-part-2-juju-topology-metrics)
on the Juju blog.
## Library Usage
This library may be used to create and consume `JujuTopology` objects.
The `JujuTopology` class provides three ways to create instances:
### Using the `from_charm` method
Enables instantiation by supplying the charm as an argument. When
creating topology objects for the current charm, this is the recommended
approach.
```python
topology = JujuTopology.from_charm(self)
```
### Using the `from_dict` method
Allows for instantiation using a dictionary of relation data, like the
`scrape_metadata` from Prometheus or the labels of an alert rule. When
creating topology objects for remote charms, this is the recommended
approach.
```python
scrape_metadata = json.loads(relation.data[relation.app].get("scrape_metadata", "{}"))
topology = JujuTopology.from_dict(scrape_metadata)
```
### Using the class constructor
Enables instantiation using whatever values you want. While this
is useful in some very specific cases, this is almost certainly not
what you are looking for as setting these values manually may
result in observability metrics which do not uniquely identify a
charm in order to provide accurate usage reporting, alerting,
horizontal scaling, or other use cases.
```python
topology = JujuTopology(
model="some-juju-model",
model_uuid="00000000-0000-0000-0000-000000000001",
application="fancy-juju-application",
unit="fancy-juju-application/0",
charm_name="fancy-juju-application-k8s",
)
```
"""
import re
from collections import OrderedDict
from typing import Dict, List, Optional
# The unique Charmhub library identifier, never change it
LIBID = "bced1658f20f49d28b88f61f83c2d232"
LIBAPI = 0
LIBPATCH = 2
class InvalidUUIDError(Exception):
"""Invalid UUID was provided."""
def __init__(self, uuid: str):
self.message = "'{}' is not a valid UUID.".format(uuid)
super().__init__(self.message)
class JujuTopology:
"""JujuTopology is used for storing, generating and formatting juju topology information."""
def __init__(
self,
model: str,
model_uuid: str,
application: str,
unit: str = None,
charm_name: str = None,
):
"""Build a JujuTopology object.
A `JujuTopology` object is used for storing and transforming
Juju topology information. This information is used to
annotate Prometheus scrape jobs and alert rules. Such
annotation, when applied to scrape jobs, helps in identifying
the source of the scraped metrics. On the other hand, when
applied to alert rules topology information ensures that
evaluation of alert expressions is restricted to the source
(charm) from which the alert rules were obtained.
Args:
model: a string name of the Juju model
model_uuid: a globally unique string identifier for the Juju model
application: an application name as a string
unit: a unit name as a string
charm_name: name of charm as a string
"""
if not self.is_valid_uuid(model_uuid):
raise InvalidUUIDError(model_uuid)
self._model = model
self._model_uuid = model_uuid
self._application = application
self._charm_name = charm_name
self._unit = unit
def is_valid_uuid(self, uuid):
"""Validate the supplied UUID against the Juju Model UUID pattern."""
# TODO:
# Harness is hardcoding a UUID that is v1 not v4: f2c1b2a6-e006-11eb-ba80-0242ac130004
# See: https://github.com/canonical/operator/issues/779
#
# >>> uuid.UUID("f2c1b2a6-e006-11eb-ba80-0242ac130004").version
# 1
#
# we changed the validation of the 3rd UUID block: 4[a-f0-9]{3} -> [a-f0-9]{4}
# See: https://github.com/canonical/operator/blob/main/ops/testing.py#L1094
#
# Juju in fact generates a UUID v4: https://github.com/juju/utils/blob/master/uuid.go#L62
# but does not validate it is actually v4:
# See:
# - https://github.com/juju/utils/blob/master/uuid.go#L22
# - https://github.com/juju/schema/blob/master/strings.go#L79
#
# Once Harness fixes this, we should remove this comment and refactor the regex or
# the entire method using the uuid module to validate UUIDs
regex = re.compile(
"^[a-f0-9]{8}-?[a-f0-9]{4}-?[a-f0-9]{4}-?[89ab][a-f0-9]{3}-?[a-f0-9]{12}$"
)
return bool(regex.match(uuid))
@classmethod
def from_charm(cls, charm):
"""Creates a JujuTopology instance by using the model data available on a charm object.
Args:
charm: a `CharmBase` object for which the `JujuTopology` will be constructed
Returns:
a `JujuTopology` object.
"""
return cls(
model=charm.model.name,
model_uuid=charm.model.uuid,
application=charm.model.app.name,
unit=charm.model.unit.name,
charm_name=charm.meta.name,
)
@classmethod
def from_dict(cls, data: dict):
"""Factory method for creating `JujuTopology` children from a dictionary.
Args:
data: a dictionary with five keys providing topology information. The keys are
- "model"
- "model_uuid"
- "application"
- "unit"
- "charm_name"
`unit` and `charm_name` may be empty, but will result in more limited
labels. However, this allows us to support charms without workloads.
Returns:
a `JujuTopology` object.
"""
return cls(
model=data["model"],
model_uuid=data["model_uuid"],
application=data["application"],
unit=data.get("unit", ""),
charm_name=data.get("charm_name", ""),
)
def as_dict(
self, *, remapped_keys: Dict[str, str] = None, excluded_keys: List[str] = None
) -> OrderedDict:
"""Format the topology information into an ordered dict.
Keeping the dictionary ordered is important to be able to
compare dicts without having to resort to deep comparisons.
Args:
remapped_keys: A dictionary mapping old key names to new key names,
which will be substituted when invoked.
excluded_keys: A list of key names to exclude from the returned dict.
"""
ret = OrderedDict(
[
("model", self.model),
("model_uuid", self.model_uuid),
("application", self.application),
("unit", self.unit),
("charm_name", self.charm_name),
]
)
if excluded_keys:
ret = OrderedDict({k: v for k, v in ret.items() if k not in excluded_keys})
if remapped_keys:
ret = OrderedDict(
(remapped_keys.get(k), v) if remapped_keys.get(k) else (k, v) for k, v in ret.items() # type: ignore
)
return ret
@property
def identifier(self) -> str:
"""Format the topology information into a terse string.
This crops the model UUID, making it unsuitable for comparisons against
anything but other identifiers. Mainly to be used as a display name or file
name where long strings might become an issue.
>>> JujuTopology( \
model = "a-model", \
model_uuid = "00000000-0000-4000-8000-000000000000", \
application = "some-app", \
unit = "some-app/1" \
).identifier
'a-model_00000000_some-app'
"""
parts = self.as_dict(
excluded_keys=["unit", "charm_name"],
)
parts["model_uuid"] = self.model_uuid_short
values = parts.values()
return "_".join([str(val) for val in values]).replace("/", "_")
@property
def label_matcher_dict(self) -> Dict[str, str]:
"""Format the topology information into a dict with keys having 'juju_' as prefix.
Relabelled topology never includes the unit as it would then only match
the leader unit (i.e. the unit that produced the dict).
"""
items = self.as_dict(
remapped_keys={"charm_name": "charm"},
excluded_keys=["unit"],
).items()
return {"juju_{}".format(key): value for key, value in items if value}
@property
def label_matchers(self) -> str:
"""Format the topology information into a promql/logql label matcher string.
Topology label matchers should never include the unit as it
would then only match the leader unit (i.e. the unit that
produced the matchers).
"""
items = self.label_matcher_dict.items()
return ", ".join(['{}="{}"'.format(key, value) for key, value in items if value])
@property
def model(self) -> str:
"""Getter for the juju model value."""
return self._model
@property
def model_uuid(self) -> str:
"""Getter for the juju model uuid value."""
return self._model_uuid
@property
def model_uuid_short(self) -> str:
"""Getter for the juju model uuid value, truncated to the first eight characters."""
return self._model_uuid[:8]
@property
def application(self) -> str:
"""Getter for the juju application value."""
return self._application
@property
def charm_name(self) -> Optional[str]:
"""Getter for the juju charm name value."""
return self._charm_name
@property
def unit(self) -> Optional[str]:
"""Getter for the juju unit value."""
return self._unit
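
The label-matcher and identifier formatting described in the docstrings above can be tried out without a charm. The following is a minimal standalone sketch, not the library itself: `label_matchers` and `identifier` are hypothetical free functions that mirror the behaviour of the properties of the same name, and the model and application names are made up. The UUID used is the v1 value quoted in the `is_valid_uuid` comment, which the relaxed pattern accepts.

```python
import re
from collections import OrderedDict

# Same pattern as in is_valid_uuid above: the version digit is not pinned to 4.
UUID_RE = re.compile(
    "^[a-f0-9]{8}-?[a-f0-9]{4}-?[a-f0-9]{4}-?[89ab][a-f0-9]{3}-?[a-f0-9]{12}$"
)


def label_matchers(model, model_uuid, application, charm_name=None):
    """Build a PromQL/LogQL matcher string (hypothetical helper for illustration)."""
    if not UUID_RE.match(model_uuid):
        raise ValueError("'{}' is not a valid UUID.".format(model_uuid))
    topology = OrderedDict(
        [
            ("model", model),
            ("model_uuid", model_uuid),
            ("application", application),
            ("charm", charm_name),  # "charm_name" is remapped to "charm"
        ]
    )
    # The unit is deliberately left out; empty values are dropped.
    return ", ".join(
        'juju_{}="{}"'.format(key, value)
        for key, value in topology.items()
        if value
    )


def identifier(model, model_uuid, application):
    """Terse display name: model, first eight UUID characters, application."""
    return "_".join([model, model_uuid[:8], application]).replace("/", "_")


matchers = label_matchers(
    "ceph", "f2c1b2a6-e006-11eb-ba80-0242ac130004", "ceph-mon", "ceph-mon"
)
short_name = identifier(
    "a-model", "00000000-0000-4000-8000-000000000000", "some-app"
)
```

Leaving `charm_name` unset simply drops the `juju_charm` matcher, which is the "more limited labels" case mentioned in `from_dict`.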

File diff suppressed because it is too large


@ -36,6 +36,8 @@ provides:
interface: ceph-rbd-mirror
prometheus:
interface: http
metrics-endpoint:
interface: prometheus_scrape
dashboard:
interface: ceph-dashboard
requires:

src/ceph_metrics.py

@ -0,0 +1,51 @@
# Copyright 2022 Canonical Ltd.
# See LICENSE file for licensing details.
"""Provide ceph metrics to prometheus
Configure prometheus scrape jobs via the metrics-endpoint relation.
"""
import logging
from typing import Optional, Union, List
from charms.prometheus_k8s.v0 import prometheus_scrape
from charms_ceph import utils as ceph_utils
from ops.framework import BoundEvent
logger = logging.getLogger(__name__)
DEFAULT_CEPH_JOB = {
"metrics_path": "/metrics",
"static_configs": [{"targets": ["*:9283"]}],
}
class CephMetricsEndpointProvider(prometheus_scrape.MetricsEndpointProvider):
def __init__(
self,
charm,
relation_name: str = prometheus_scrape.DEFAULT_RELATION_NAME,
jobs=None,
alert_rules_path: str = prometheus_scrape.DEFAULT_ALERT_RULES_RELATIVE_PATH, # noqa
refresh_event: Optional[Union[BoundEvent, List[BoundEvent]]] = None,
):
if jobs is None:
jobs = [DEFAULT_CEPH_JOB]
super().__init__(
charm,
relation_name=relation_name,
jobs=jobs,
alert_rules_path=alert_rules_path,
refresh_event=refresh_event,
)
def _on_relation_changed(self, event):
"""Enable prometheus on relation change"""
if self._charm.unit.is_leader() and ceph_utils.is_bootstrapped():
logger.debug(
"is_leader and is_bootstrapped, running rel changed: %s", event
)
ceph_utils.mgr_enable_module("prometheus")
logger.debug("module_enabled")
super()._on_relation_changed(event)
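
The default job above uses a wildcard host (`*:9283`). As far as the `prometheus_scrape` library documents it, such a target is filled in with the addresses advertised by each unit. The sketch below illustrates that idea only: `expand_wildcard_targets` is a hypothetical helper, not a function of the library, and the unit addresses are invented.

```python
import copy

DEFAULT_CEPH_JOB = {
    "metrics_path": "/metrics",
    "static_configs": [{"targets": ["*:9283"]}],
}


def expand_wildcard_targets(job, unit_addresses):
    """Return a copy of job with every "*:port" target expanded per address.

    Hypothetical helper: it mimics the idea, not the prometheus_scrape code.
    """
    expanded = copy.deepcopy(job)  # leave the module-level default untouched
    for static_config in expanded["static_configs"]:
        targets = []
        for target in static_config["targets"]:
            host, _, port = target.rpartition(":")
            if host == "*":
                targets.extend(
                    "{}:{}".format(addr, port) for addr in unit_addresses
                )
            else:
                targets.append(target)
        static_config["targets"] = targets
    return expanded


job = expand_wildcard_targets(DEFAULT_CEPH_JOB, ["10.0.0.10", "10.0.0.11"])
```

Copying the job before editing it matters here because `DEFAULT_CEPH_JOB` is a shared module-level dict.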


@ -5,6 +5,7 @@ from ops.main import main
import ops_openstack.core
import ceph_hooks as hooks
import ceph_metrics
class CephMonCharm(ops_openstack.core.OSBaseCharm):
@ -70,6 +71,8 @@ class CephMonCharm(ops_openstack.core.OSBaseCharm):
self._stored.is_started = True
fw = self.framework
self.metrics_endpoint = ceph_metrics.CephMetricsEndpointProvider(self)
fw.observe(self.on.install, self.on_install)
fw.observe(self.on.config_changed, self.on_config)
fw.observe(self.on.pre_series_upgrade, self.on_pre_series_upgrade)


@ -84,7 +84,7 @@ deps = -r{toxinidir}/requirements.txt
[testenv:pep8]
basepython = python3
deps = flake8==3.9.2
-charm-tools==2.8.3
+charm-tools==2.8.4
commands = flake8 {posargs} unit_tests tests actions files src
charm-proof


@ -0,0 +1,91 @@
#!/usr/bin/env python3
# Copyright 2022 Canonical Ltd.
# See LICENSE file for licensing details.
from unittest.mock import patch
import unittest
from ops import storage, model, framework
from ops.testing import Harness, _TestingModelBackend
import charm
class TestCephMetrics(unittest.TestCase):
def setUp(self):
super().setUp()
self.harness = Harness(charm.CephMonCharm)
# BEGIN: Workaround until network_get is implemented
class _TestingOPSModelBackend(_TestingModelBackend):
def network_get(self, endpoint_name, relation_id=None):
network_data = {
"bind-addresses": [
{
"addresses": [{"value": "10.0.0.10"}],
}
],
}
return network_data
self.harness._backend = _TestingOPSModelBackend(
self.harness._unit_name, self.harness._meta
)
self.harness._model = model.Model(
self.harness._meta, self.harness._backend
)
self.harness._framework = framework.Framework(
storage.SQLiteStorage(":memory:"),
self.harness._charm_dir,
self.harness._meta,
self.harness._model,
)
# END Workaround
self.addCleanup(self.harness.cleanup)
self.harness.begin()
self.harness.set_leader(True)
def test_init(self):
self.assertEqual(
self.harness.charm.metrics_endpoint._relation_name,
"metrics-endpoint",
)
@patch("ceph_metrics.ceph_utils.is_bootstrapped", return_value=True)
@patch("ceph_metrics.ceph_utils.is_mgr_module_enabled", return_value=False)
@patch("ceph_metrics.ceph_utils.mgr_enable_module")
def test_add_rel(
self,
mgr_enable_module,
_is_mgr_module_enable,
_is_bootstrapped,
):
rel_id = self.harness.add_relation("metrics-endpoint", "prometheus")
self.harness.add_relation_unit(rel_id, "prometheus/0")
unit_rel_data = self.harness.get_relation_data(
rel_id, self.harness.model.unit
)
self.assertEqual(
unit_rel_data["prometheus_scrape_unit_address"], "10.0.0.10"
)
# Trigger relation change event as a side effect
self.harness.update_relation_data(
rel_id, "prometheus/0", {"foo": "bar"}
)
mgr_enable_module.assert_called_once()
app_rel_data = self.harness.get_relation_data(
rel_id, self.harness.model.app
)
jobs = app_rel_data["scrape_jobs"]
self.assertEqual(
jobs,
(
'[{"metrics_path": "/metrics", '
'"static_configs": [{"targets": ["*:9283"]}]}]'
),
)
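
In `test_add_rel` the mock arguments appear in the reverse order of the `@patch` decorators. That is standard `unittest.mock` behaviour: decorators are applied bottom-up, so the lowest `@patch` supplies the first mock argument. A small self-contained sketch, patching two `os` functions chosen purely for illustration:

```python
import os
from unittest.mock import patch


@patch("os.getcwd", return_value="/tmp/fake")
@patch("os.getpid", return_value=1234)
def check(getpid_mock, getcwd_mock):
    # The bottom decorator (os.getpid) is passed as the first argument.
    values = (os.getpid(), os.getcwd())
    return values, getpid_mock.called, getcwd_mock.called


result = check()
```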


@ -297,8 +297,7 @@ class CephUtilsTestCase(test_utils.CharmTestCase):
releases = utils.get_ceph_osd_releases()
self.assertEqual(len(releases), 2)
-self.assertEqual(releases[0], ceph_release_1)
-self.assertEqual(releases[1], ceph_release_2)
+self.assertEqual(sorted(releases), [ceph_release_1, ceph_release_2])
@mock.patch.object(utils.subprocess, 'check_output')
@mock.patch.object(utils.json, 'loads')
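
The drive-by change to the releases assertion compares a sorted copy instead of individual indexes, which keeps the test stable if the function under test does not guarantee ordering (assumed here; the diff does not show its implementation). A minimal sketch with made-up release names; `assertCountEqual` is a standard `unittest` alternative that ignores order without requiring the expected list to be sorted:

```python
import unittest


class ReleasesOrderTest(unittest.TestCase):
    def test_order_insensitive(self):
        # Stand-in for the value returned by the function under test.
        releases = ["pacific", "octopus"]
        self.assertEqual(len(releases), 2)
        # Sorting first makes the comparison independent of return order.
        self.assertEqual(sorted(releases), ["octopus", "pacific"])
        # Equivalent check that needs no sorting on either side.
        self.assertCountEqual(releases, ["octopus", "pacific"])
```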