Get conductor metric data

This change adds the capability for the ironic-conductor
and standalone service processes to transmit timer and counter
metrics to the message bus notifier, where they may be consumed by
Ceilometer, ironic-prometheus-exporter, or any other consumer of
metrics event data on the message bus.

This functionality is not presently supported on dedicated API
services such as those running as an ``ironic-api`` application
process, or Ironic WSGI application. This is due to the lack of
an internal trigger mechanism to transmit the data in a metrics
update to the message bus and/or notifier plugin.

This change requires ironic-lib 5.4.0 to collect and ship metrics via
the message bus.

Depends-On: https://review.opendev.org/c/openstack/ironic-lib/+/865311
Change-Id: If6941f970241a22d96e06d88365f76edc4683364
Julia Kreger 2022-11-23 08:25:10 -08:00
parent 8d2d0bfc8b
commit 82b8ec7a39
10 changed files with 350 additions and 65 deletions

View File

@ -17,8 +17,11 @@ These performance measurements, herein referred to as "metrics", can be
emitted from the Bare Metal service, including ironic-api, ironic-conductor,
and ironic-python-agent. By default, none of the services will emit metrics.
Configuring the Bare Metal Service to Enable Metrics
====================================================
Note that statsd is not the only supported means of metrics collection
and transmission. Alternatives are covered later in this document.
Configuring the Bare Metal Service to Enable Metrics with Statsd
================================================================
Enabling metrics in ironic-api and ironic-conductor
---------------------------------------------------
@ -62,6 +65,30 @@ in the ironic configuration file as well::
agent_statsd_host = 198.51.100.2
agent_statsd_port = 8125
.. Note::
Use of a different metrics backend with the agent is not presently
supported.
Transmission to the Message Bus Notifier
========================================
Whether you are using Ceilometer,
`ironic-prometheus-exporter <https://docs.openstack.org/ironic-prometheus-exporter/latest/>`_,
or your own scripting to consume the message bus notifications,
metrics data can be sent to the message bus notifier from the timer methods
*and* additional gauge counters by setting the ``[metrics]backend``
configuration option to ``collector``. When this is the case,
information is cached locally and periodically sent along with the general sensor
data update to the messaging notifier, where it can be consumed off of the message bus
or via a notifier plugin (as is done with ironic-prometheus-exporter).
.. NOTE::
Transmission of timer data only works for the Conductor or ``single-process``
Ironic service model. A separate webserver process presently does not have
the capability of triggering the call to retrieve and transmit the data.
.. NOTE::
This functionality requires ironic-lib version 5.4.0 to be installed.
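
As a minimal sketch of the configuration described above, the following
``ironic.conf`` fragment selects the ``collector`` backend and enables the
periodic sensor data notification that carries the cached metrics. The option
names are taken from this change; the values other than ``send_sensor_data``
are the documented defaults, and it is assumed that a notification driver for
the message bus is already configured elsewhere in the file.

```ini
[metrics]
# Cache timer and gauge data locally instead of sending it to statsd.
backend = collector

[sensor_data]
# The periodic task that ships the data is off by default.
send_sensor_data = true
# Include the conductor's own metrics in the notifications (default).
enable_for_conductor = true
# Seconds between notifications (default).
interval = 600
```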
Types of Metrics Emitted
========================
@ -79,6 +106,9 @@ additional load before enabling metrics. To see which metrics have changed names
or have been removed between releases, refer to the `ironic release notes
<https://docs.openstack.org/releasenotes/ironic/>`_.
Additional conductor metrics in the form of counts will also be generated in
limited locations where pertinent to the activity of the conductor.
.. note::
With the default statsd configuration, each timing metric may create
additional metrics due to how statsd handles timing metrics. For more

View File

@ -98,6 +98,8 @@ class ConductorManager(base_manager.BaseConductorManager):
def __init__(self, host, topic):
super(ConductorManager, self).__init__(host, topic)
# NOTE(TheJulia): This is less a metric-able count, but a means to
# sort out nodes and prioritise a subset (of non-responding nodes).
self.power_state_sync_count = collections.defaultdict(int)
@METRICS.timer('ConductorManager._clean_up_caches')
@ -1433,6 +1435,11 @@ class ConductorManager(base_manager.BaseConductorManager):
finally:
waiters.wait_for_all(futures)
# report a count of the nodes
METRICS.send_gauge(
'ConductorManager.PowerSyncNodesCount',
len(nodes))
def _sync_power_state_nodes_task(self, context, nodes):
"""Invokes power state sync on nodes from synchronized queue.
@ -1451,6 +1458,7 @@ class ConductorManager(base_manager.BaseConductorManager):
can do here to avoid failing a brand new deploy to a node that
we've locked here, though.
"""
# FIXME(comstud): Since our initial state checks are outside
# of the lock (to try to avoid the lock), some checks are
# repeated after grabbing the lock so we can unlock quickly.
@ -1497,6 +1505,12 @@ class ConductorManager(base_manager.BaseConductorManager):
LOG.info("During sync_power_state, node %(node)s was not "
"found and presumed deleted by another process.",
{'node': node_uuid})
# TODO(TheJulia): The chance exists that we orphan a node
# in power_state_sync_count. Although it is not much data,
# it could eventually cause the memory footprint to grow
# on an exceptionally large ironic deployment. We should
# make sure we clean it up at some point, but given the
# minimal impact, it is definitely low-hanging fruit.
except exception.NodeLocked:
LOG.info("During sync_power_state, node %(node)s was "
"already locked by another process. Skip.",
@ -1513,6 +1527,7 @@ class ConductorManager(base_manager.BaseConductorManager):
# regular power state checking, maintenance is still a required
# condition.
filters={'maintenance': True, 'fault': faults.POWER_FAILURE},
node_count_metric_name='ConductorManager.PowerSyncRecoveryNodeCount',
)
def _power_failure_recovery(self, task, context):
"""Periodic task to check power states for nodes in maintenance.
@ -1859,6 +1874,7 @@ class ConductorManager(base_manager.BaseConductorManager):
predicate=lambda n, m: n.conductor_affinity != m.conductor.id,
limit=lambda: CONF.conductor.periodic_max_workers,
shared_task=False,
node_count_metric_name='ConductorManager.SyncLocalStateNodeCount',
)
def _sync_local_state(self, task, context):
"""Perform any actions necessary to sync local state.
@ -2644,14 +2660,63 @@ class ConductorManager(base_manager.BaseConductorManager):
# Yield on every iteration
eventlet.sleep(0)
def _sensors_conductor(self, context):
"""Called to collect and send metrics "sensors" for the conductor."""
# populate the message which will be sent to ceilometer
# or other data consumer
message = {'message_id': uuidutils.generate_uuid(),
'timestamp': datetime.datetime.utcnow(),
'hostname': self.host}
try:
ev_type = 'ironic.metrics'
message['event_type'] = ev_type + '.update'
sensors_data = METRICS.get_metrics_data()
except AttributeError:
# TODO(TheJulia): Remove this at some point, but right now
# don't inherently break on version mismatches when people
# disregard requirements.
LOG.warning(
'get_sensors_data has been configured to collect '
'conductor metrics, however the installed ironic-lib '
'library lacks the functionality. Please update '
'ironic-lib to a minimum of version 5.4.0.')
except Exception as e:
LOG.exception(
"An unknown error occurred while attempting to collect "
"sensor data from within the conductor. Error: %(error)s",
{'error': e})
else:
message['payload'] = (
self._filter_out_unsupported_types(sensors_data))
if message['payload']:
self.sensors_notifier.info(
context, ev_type, message)
@METRICS.timer('ConductorManager._send_sensor_data')
@periodics.periodic(spacing=CONF.conductor.send_sensor_data_interval,
enabled=CONF.conductor.send_sensor_data)
@periodics.periodic(spacing=CONF.sensor_data.interval,
enabled=CONF.sensor_data.send_sensor_data)
def _send_sensor_data(self, context):
"""Periodically collects and transmits sensor data notifications."""
if CONF.sensor_data.enable_for_conductor:
if CONF.sensor_data.workers == 1:
# Directly call the sensors_conductor when only one
# worker is permitted, so we collect data serially
# instead.
self._sensors_conductor(context)
else:
# Also, do not apply the general threshold limit to
# the self collection of "sensor" data from the conductor,
# as we're not launching external processes, we're just reading
# from an internal data structure, if we can.
self._spawn_worker(self._sensors_conductor, context)
if not CONF.sensor_data.enable_for_nodes:
# NOTE(TheJulia): If node sensor data is not required, then
# skip the rest of this method.
return
filters = {}
if not CONF.conductor.send_sensor_data_for_undeployed_nodes:
if not CONF.sensor_data.enable_for_undeployed_nodes:
filters['provision_state'] = states.ACTIVE
nodes = queue.Queue()
@ -2659,7 +2724,7 @@ class ConductorManager(base_manager.BaseConductorManager):
filters=filters):
nodes.put_nowait(node_info)
number_of_threads = min(CONF.conductor.send_sensor_data_workers,
number_of_threads = min(CONF.sensor_data.workers,
nodes.qsize())
futures = []
for thread_number in range(number_of_threads):
@ -2675,7 +2740,7 @@ class ConductorManager(base_manager.BaseConductorManager):
break
done, not_done = waiters.wait_for_all(
futures, timeout=CONF.conductor.send_sensor_data_wait_timeout)
futures, timeout=CONF.sensor_data.wait_timeout)
if not_done:
LOG.warning("%d workers for send sensors data did not complete",
len(not_done))
@ -2684,13 +2749,14 @@ class ConductorManager(base_manager.BaseConductorManager):
"""Filters out sensor data types that aren't specified in the config.
Removes sensor data types that aren't specified in
CONF.conductor.send_sensor_data_types.
CONF.sensor_data.data_types.
:param sensors_data: dict containing sensor types and the associated
data
:returns: dict with unsupported sensor types removed
"""
allowed = set(x.lower() for x in CONF.conductor.send_sensor_data_types)
allowed = set(x.lower() for x in
CONF.sensor_data.data_types)
if 'all' in allowed:
return sensors_data
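
The hunk above shows only the start of the type filter. As a rough, standalone sketch of the behaviour it implies (and which the tests further down exercise with ``All``, ``t1`` and ``t3``), the function below keeps only the sensor types named in an allow-list, compared case-insensitively, and passes everything through when ``all`` is present. The name ``filter_sensor_types`` and the dict-comprehension tail are assumptions for illustration; the remainder of the real method is not shown in this diff.

```python
def filter_sensor_types(sensors_data, allowed_types):
    """Return only the sensor types listed in allowed_types.

    Hypothetical standalone helper; the allow-list stands in for the
    CONF.sensor_data.data_types option read by the conductor.
    """
    allowed = {name.lower() for name in allowed_types}
    if 'all' in allowed:
        # The special value "ALL" disables filtering entirely.
        return sensors_data
    # Assumed tail of the method: keep keys whose lower-cased name is allowed.
    return {sensor_type: value
            for sensor_type, value in sensors_data.items()
            if sensor_type.lower() in allowed}


fake_sensors_data = {"t1": {'f1': 'v1'}, "t2": {'f1': 'v1'}}
print(filter_sensor_types(fake_sensors_data, ['All']))
print(filter_sensor_types(fake_sensors_data, ['t1']))
print(filter_sensor_types(fake_sensors_data, ['t3']))
```

Note that an empty result is falsy, which matters for ``_sensors_conductor`` above: it only sends a notification when the filtered payload is non-empty.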

View File

@ -18,6 +18,7 @@ import inspect
import eventlet
from futurist import periodics
from ironic_lib import metrics_utils
from oslo_log import log
from ironic.common import exception
@ -29,6 +30,9 @@ from ironic.drivers import base as driver_base
LOG = log.getLogger(__name__)
METRICS = metrics_utils.get_metrics_logger(__name__)
def periodic(spacing, enabled=True, **kwargs):
"""A decorator to define a periodic task.
@ -46,7 +50,7 @@ class Stop(Exception):
def node_periodic(purpose, spacing, enabled=True, filters=None,
predicate=None, predicate_extra_fields=(), limit=None,
shared_task=True):
shared_task=True, node_count_metric_name=None):
"""A decorator to define a periodic task to act on nodes.
Defines a periodic task that fetches the list of nodes mapped to the
@ -84,6 +88,9 @@ def node_periodic(purpose, spacing, enabled=True, filters=None,
iteration to determine the limit.
:param shared_task: if ``True``, the task will have a shared lock. It is
recommended to start with a shared lock and upgrade it only if needed.
:param node_count_metric_name: A string value to identify a metric
representing the count of matching nodes to be recorded upon the
completion of the periodic.
"""
node_type = collections.namedtuple(
'Node',
@ -116,10 +123,11 @@ def node_periodic(purpose, spacing, enabled=True, filters=None,
else:
local_limit = limit
assert local_limit is None or local_limit > 0
node_count = 0
nodes = manager.iter_nodes(filters=filters,
fields=predicate_extra_fields)
for (node_uuid, *other) in nodes:
node_count += 1
if predicate is not None:
node = node_type(node_uuid, *other)
if accepts_manager:
@ -158,6 +166,11 @@ def node_periodic(purpose, spacing, enabled=True, filters=None,
local_limit -= 1
if not local_limit:
return
if node_count_metric_name:
# Send post-run metrics.
METRICS.send_gauge(
node_count_metric_name,
node_count)
return wrapper
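
To illustrate the idea behind ``node_count_metric_name`` without the rest of the periodic machinery, here is a simplified sketch: a decorator counts the items it iterates over and reports the total as a gauge once the loop completes. ``FakeMetrics`` and ``counted_periodic`` are hypothetical names invented for this example; only the ``send_gauge(name, value)`` call shape is taken from the diff, and the real decorator additionally handles filters, predicates, limits and task locking.

```python
import functools


class FakeMetrics:
    """In-memory stand-in for a metrics logger (hypothetical)."""

    def __init__(self):
        self.gauges = {}

    def send_gauge(self, name, value):
        # Record the most recent value reported for each gauge name.
        self.gauges[name] = value


METRICS = FakeMetrics()


def counted_periodic(node_count_metric_name=None):
    """Wrap a per-node function and report how many nodes were seen."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(nodes):
            node_count = 0
            for node in nodes:
                node_count += 1
                func(node)
            if node_count_metric_name:
                # Send post-run metrics, mirroring the hunk above.
                METRICS.send_gauge(node_count_metric_name, node_count)
        return wrapper
    return decorator


visited = []


@counted_periodic(node_count_metric_name='Demo.NodeCount')
def visit(node):
    visited.append(node)


visit(['node-a', 'node-b', 'node-c'])
print(METRICS.gauges)
```

Because the metric name is optional, periodic tasks that do not pass it behave exactly as before and emit no gauge.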

View File

@ -44,6 +44,7 @@ from ironic.conf import neutron
from ironic.conf import nova
from ironic.conf import pxe
from ironic.conf import redfish
from ironic.conf import sensor_data
from ironic.conf import service_catalog
from ironic.conf import snmp
from ironic.conf import swift
@ -80,6 +81,7 @@ neutron.register_opts(CONF)
nova.register_opts(CONF)
pxe.register_opts(CONF)
redfish.register_opts(CONF)
sensor_data.register_opts(CONF)
service_catalog.register_opts(CONF)
snmp.register_opts(CONF)
swift.register_opts(CONF)

View File

@ -97,41 +97,6 @@ opts = [
cfg.IntOpt('node_locked_retry_interval',
default=1,
help=_('Seconds to sleep between node lock attempts.')),
cfg.BoolOpt('send_sensor_data',
default=False,
help=_('Enable sending sensor data message via the '
'notification bus')),
cfg.IntOpt('send_sensor_data_interval',
default=600,
min=1,
help=_('Seconds between conductor sending sensor data message '
'to ceilometer via the notification bus.')),
cfg.IntOpt('send_sensor_data_workers',
default=4, min=1,
help=_('The maximum number of workers that can be started '
'simultaneously for send data from sensors periodic '
'task.')),
cfg.IntOpt('send_sensor_data_wait_timeout',
default=300,
help=_('The time in seconds to wait for send sensors data '
'periodic task to be finished before allowing periodic '
'call to happen again. Should be less than '
'send_sensor_data_interval value.')),
cfg.ListOpt('send_sensor_data_types',
default=['ALL'],
help=_('List of comma separated meter types which need to be'
' sent to Ceilometer. The default value, "ALL", is a '
'special value meaning send all the sensor data.')),
cfg.BoolOpt('send_sensor_data_for_undeployed_nodes',
default=False,
help=_('The default for sensor data collection is to only '
'collect data for machines that are deployed, however '
'operators may desire to know if there are failures '
'in hardware that is not presently in use. '
'When set to true, the conductor will collect sensor '
'information from all nodes when sensor data '
'collection is enabled via the send_sensor_data '
'setting.')),
cfg.IntOpt('sync_local_state_interval',
default=180,
help=_('When conductors join or leave the cluster, existing '

View File

@ -43,6 +43,7 @@ _opts = [
('nova', ironic.conf.nova.list_opts()),
('pxe', ironic.conf.pxe.opts),
('redfish', ironic.conf.redfish.opts),
('sensor_data', ironic.conf.sensor_data.opts),
('service_catalog', ironic.conf.service_catalog.list_opts()),
('snmp', ironic.conf.snmp.opts),
('swift', ironic.conf.swift.list_opts()),

View File

@ -0,0 +1,89 @@
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from oslo_config import cfg
from ironic.common.i18n import _
opts = [
cfg.BoolOpt('send_sensor_data',
default=False,
deprecated_group='conductor',
deprecated_name='send_sensor_data',
help=_('Enable sending sensor data message via the '
'notification bus.')),
cfg.IntOpt('interval',
default=600,
min=1,
deprecated_group='conductor',
deprecated_name='send_sensor_data_interval',
help=_('Seconds between conductor sending sensor data message '
'via the notification bus. This was originally for '
'consumption via ceilometer, but the data may also '
'be consumed via a plugin like '
'ironic-prometheus-exporter or any other message bus '
'data collector.')),
cfg.IntOpt('workers',
default=4, min=1,
deprecated_group='conductor',
deprecated_name='send_sensor_data_workers',
help=_('The maximum number of workers that can be started '
'simultaneously for send data from sensors periodic '
'task.')),
cfg.IntOpt('wait_timeout',
default=300,
deprecated_group='conductor',
deprecated_name='send_sensor_data_wait_timeout',
help=_('The time in seconds to wait for send sensors data '
'periodic task to be finished before allowing periodic '
'call to happen again. Should be less than the '
'``[sensor_data]interval`` value.')),
cfg.ListOpt('data_types',
default=['ALL'],
deprecated_group='conductor',
deprecated_name='send_sensor_data_types',
help=_('List of comma separated meter types which need to be '
'sent to Ceilometer. The default value, "ALL", is a '
'special value meaning send all the sensor data. '
'This setting only applies to baremetal sensor data '
'being processed through the conductor.')),
cfg.BoolOpt('enable_for_undeployed_nodes',
default=False,
deprecated_group='conductor',
deprecated_name='send_sensor_data_for_undeployed_nodes',
help=_('The default for sensor data collection is to only '
'collect data for machines that are deployed, however '
'operators may desire to know if there are failures '
'in hardware that is not presently in use. '
'When set to true, the conductor will collect sensor '
'information from all nodes when sensor data '
'collection is enabled via the send_sensor_data '
'setting.')),
cfg.BoolOpt('enable_for_conductor',
default=True,
help=_('Whether to include sensor metric data for the Conductor '
'process itself in the message payload for sensor '
'data which allows operators to gather instance '
'counts of actions and states to better manage '
'the deployment.')),
cfg.BoolOpt('enable_for_nodes',
default=True,
help=_('Whether to transmit any sensor data for any nodes under '
'this conductor\'s management. This option supersedes '
'the ``send_sensor_data_for_undeployed_nodes`` '
'setting.')),
]
def register_opts(conf):
conf.register_opts(opts, group='sensor_data')

View File

@ -26,6 +26,7 @@ from unittest import mock
import eventlet
from futurist import waiters
from ironic_lib import metrics as ironic_metrics
from oslo_config import cfg
import oslo_messaging as messaging
from oslo_utils import uuidutils
@ -4273,7 +4274,8 @@ class SensorsTestCase(mgr_utils.ServiceSetUpMixin, db_base.DbTestCase):
def test__filter_out_unsupported_types_all(self):
self._start_service()
CONF.set_override('send_sensor_data_types', ['All'], group='conductor')
CONF.set_override('data_types', ['All'],
group='sensor_data')
fake_sensors_data = {"t1": {'f1': 'v1'}, "t2": {'f1': 'v1'}}
actual_result = (
self.service._filter_out_unsupported_types(fake_sensors_data))
@ -4282,7 +4284,8 @@ class SensorsTestCase(mgr_utils.ServiceSetUpMixin, db_base.DbTestCase):
def test__filter_out_unsupported_types_part(self):
self._start_service()
CONF.set_override('send_sensor_data_types', ['t1'], group='conductor')
CONF.set_override('data_types', ['t1'],
group='sensor_data')
fake_sensors_data = {"t1": {'f1': 'v1'}, "t2": {'f1': 'v1'}}
actual_result = (
self.service._filter_out_unsupported_types(fake_sensors_data))
@ -4291,7 +4294,8 @@ class SensorsTestCase(mgr_utils.ServiceSetUpMixin, db_base.DbTestCase):
def test__filter_out_unsupported_types_non(self):
self._start_service()
CONF.set_override('send_sensor_data_types', ['t3'], group='conductor')
CONF.set_override('data_types', ['t3'],
group='sensor_data')
fake_sensors_data = {"t1": {'f1': 'v1'}, "t2": {'f1': 'v1'}}
actual_result = (
self.service._filter_out_unsupported_types(fake_sensors_data))
@ -4305,7 +4309,8 @@ class SensorsTestCase(mgr_utils.ServiceSetUpMixin, db_base.DbTestCase):
for i in range(5):
nodes.put_nowait(('fake_uuid-%d' % i, 'fake-hardware', '', None))
self._start_service()
CONF.set_override('send_sensor_data', True, group='conductor')
CONF.set_override('send_sensor_data', True,
group='sensor_data')
task = acquire_mock.return_value.__enter__.return_value
task.node.maintenance = False
@ -4334,7 +4339,8 @@ class SensorsTestCase(mgr_utils.ServiceSetUpMixin, db_base.DbTestCase):
nodes.put_nowait(('fake_uuid', 'fake-hardware', '', None))
self._start_service()
self.service._shutdown = True
CONF.set_override('send_sensor_data', True, group='conductor')
CONF.set_override('send_sensor_data', True,
group='sensor_data')
self.service._sensors_nodes_task(self.context, nodes)
acquire_mock.return_value.__enter__.assert_not_called()
@ -4343,7 +4349,8 @@ class SensorsTestCase(mgr_utils.ServiceSetUpMixin, db_base.DbTestCase):
nodes = queue.Queue()
nodes.put_nowait(('fake_uuid', 'fake-hardware', '', None))
CONF.set_override('send_sensor_data', True, group='conductor')
CONF.set_override('send_sensor_data', True,
group='sensor_data')
self._start_service()
@ -4361,7 +4368,7 @@ class SensorsTestCase(mgr_utils.ServiceSetUpMixin, db_base.DbTestCase):
nodes = queue.Queue()
nodes.put_nowait(('fake_uuid', 'fake-hardware', '', None))
self._start_service()
CONF.set_override('send_sensor_data', True, group='conductor')
CONF.set_override('send_sensor_data', True, group='sensor_data')
task = acquire_mock.return_value.__enter__.return_value
task.node.maintenance = True
@ -4384,10 +4391,10 @@ class SensorsTestCase(mgr_utils.ServiceSetUpMixin, db_base.DbTestCase):
mock_spawn):
self._start_service()
CONF.set_override('send_sensor_data', True, group='conductor')
CONF.set_override('send_sensor_data', True, group='sensor_data')
# NOTE(galyna): do not wait for threads to be finished in unittests
CONF.set_override('send_sensor_data_wait_timeout', 0,
group='conductor')
CONF.set_override('wait_timeout', 0,
group='sensor_data')
_mapped_to_this_conductor_mock.return_value = True
get_nodeinfo_list_mock.return_value = [('fake_uuid', 'fake', None)]
self.service._send_sensor_data(self.context)
@ -4395,6 +4402,37 @@ class SensorsTestCase(mgr_utils.ServiceSetUpMixin, db_base.DbTestCase):
self.service._sensors_nodes_task,
self.context, mock.ANY)
@mock.patch.object(queue, 'Queue', autospec=True)
@mock.patch.object(manager.ConductorManager, '_sensors_conductor',
autospec=True)
@mock.patch.object(manager.ConductorManager, '_spawn_worker',
autospec=True)
@mock.patch.object(manager.ConductorManager, '_mapped_to_this_conductor',
autospec=True)
@mock.patch.object(dbapi.IMPL, 'get_nodeinfo_list', autospec=True)
def test___send_sensor_data_disabled(
self, get_nodeinfo_list_mock,
_mapped_to_this_conductor_mock,
mock_spawn, mock_sensors_conductor,
mock_queue):
self._start_service()
CONF.set_override('send_sensor_data', True, group='sensor_data')
CONF.set_override('enable_for_nodes', False,
group='sensor_data')
CONF.set_override('enable_for_conductor', False,
group='sensor_data')
# NOTE(galyna): do not wait for threads to be finished in unittests
CONF.set_override('wait_timeout', 0,
group='sensor_data')
_mapped_to_this_conductor_mock.return_value = True
get_nodeinfo_list_mock.return_value = [('fake_uuid', 'fake', None)]
self.service._send_sensor_data(self.context)
mock_sensors_conductor.assert_not_called()
# NOTE(TheJulia): Can't use the spawn worker since it records other,
# unrelated calls. So, queue works well here.
mock_queue.assert_not_called()
@mock.patch('ironic.conductor.manager.ConductorManager._spawn_worker',
autospec=True)
@mock.patch.object(manager.ConductorManager, '_mapped_to_this_conductor',
@ -4407,12 +4445,42 @@ class SensorsTestCase(mgr_utils.ServiceSetUpMixin, db_base.DbTestCase):
mock_spawn.reset_mock()
number_of_workers = 8
CONF.set_override('send_sensor_data', True, group='conductor')
CONF.set_override('send_sensor_data_workers', number_of_workers,
group='conductor')
CONF.set_override('send_sensor_data', True, group='sensor_data')
CONF.set_override('workers', number_of_workers,
group='sensor_data')
# NOTE(galyna): do not wait for threads to be finished in unittests
CONF.set_override('send_sensor_data_wait_timeout', 0,
group='conductor')
CONF.set_override('wait_timeout', 0,
group='sensor_data')
_mapped_to_this_conductor_mock.return_value = True
get_nodeinfo_list_mock.return_value = [('fake_uuid', 'fake',
None)] * 20
self.service._send_sensor_data(self.context)
self.assertEqual(number_of_workers + 1,
mock_spawn.call_count)
# TODO(TheJulia): At some point, we should add a test to validate that
# a modified filter to return all nodes actually works, although
# the way the sensor tests are written, the list is all mocked.
@mock.patch('ironic.conductor.manager.ConductorManager._spawn_worker',
autospec=True)
@mock.patch.object(manager.ConductorManager, '_mapped_to_this_conductor',
autospec=True)
@mock.patch.object(dbapi.IMPL, 'get_nodeinfo_list', autospec=True)
def test___send_sensor_data_one_worker(
self, get_nodeinfo_list_mock, _mapped_to_this_conductor_mock,
mock_spawn):
self._start_service()
mock_spawn.reset_mock()
number_of_workers = 1
CONF.set_override('send_sensor_data', True, group='sensor_data')
CONF.set_override('workers', number_of_workers,
group='sensor_data')
# NOTE(galyna): do not wait for threads to be finished in unittests
CONF.set_override('wait_timeout', 0,
group='sensor_data')
_mapped_to_this_conductor_mock.return_value = True
get_nodeinfo_list_mock.return_value = [('fake_uuid', 'fake',
@ -4421,9 +4489,21 @@ class SensorsTestCase(mgr_utils.ServiceSetUpMixin, db_base.DbTestCase):
self.assertEqual(number_of_workers,
mock_spawn.call_count)
# TODO(TheJulia): At some point, we should add a test to validate that
# a modified filter to return all nodes actually works, although
# the way the sensor tests are written, the list is all mocked.
@mock.patch.object(messaging.Notifier, 'info', autospec=True)
@mock.patch.object(ironic_metrics.MetricLogger,
'get_metrics_data', autospec=True)
def test__sensors_conductor(self, mock_get_metrics, mock_notifier):
metric = {'metric': 'data'}
mock_get_metrics.return_value = metric
self._start_service()
self.service._sensors_conductor(self.context)
self.assertEqual(mock_notifier.call_count, 1)
self.assertEqual('ironic.metrics', mock_notifier.call_args.args[2])
metrics_dict = mock_notifier.call_args.args[3]
self.assertEqual(metrics_dict.get('event_type'),
'ironic.metrics.update')
self.assertDictEqual(metrics_dict.get('payload'),
metric)
@mgr_utils.mock_record_keepalive

View File

@ -0,0 +1,39 @@
---
features:
- |
Adds the ability for Ironic to send conductor process metrics
for monitoring. This requires the use of a new ``[metrics]backend``
option value of ``collector``. This data was previously only available
through the use of statsd. This requires ``ironic-lib`` version ``5.4.0``
or newer. This capability can be disabled by setting the
``[sensor_data]enable_for_conductor`` option to ``False``.
- |
Adds a ``[sensor_data]enable_for_nodes`` configuration option
to allow operators to disable sending node metric data via the
message bus notifier.
- |
Adds a new gauge metric ``ConductorManager.PowerSyncNodesCount``
which tracks the nodes considered for power state synchronization.
- Adds a new gauge metric ``ConductorManager.PowerSyncRecoveryNodeCount``
which represents the number of nodes which are being evaluated for power
state recovery checking.
- Adds a new gauge metric ``ConductorManager.SyncLocalStateNodeCount``
which represents the number of nodes being tracked locally by the
conductor.
issues:
- Sensor data notifications to the message bus, such as when using the
``[metrics]backend`` configuration option of ``collector`` on a dedicated
API service process or instance, are not presently supported. This
functionality requires a periodic task to trigger the transmission
of metrics messages to the message bus notifier.
deprecations:
- The setting values starting with ``send_sensor`` in the ``[conductor]``
configuration group have been deprecated and moved to a ``[sensor_data]``
configuration group. The names have been updated to shorter,
operator-friendly names.
upgrades:
- Settings starting with ``send_sensor`` in the ``[conductor]``
configuration group have been moved to a ``[sensor_data]`` configuration
group and have been renamed to have shorter names. If configuration
values are not updated, the ``oslo.config`` library will emit a warning
in the logs.
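
As an illustration of the rename described in the upgrade note, the fragment below shows the deprecated ``[conductor]`` settings next to their ``[sensor_data]`` equivalents. The name mapping is taken from the ``deprecated_name`` values in this change; the values are the defaults apart from ``send_sensor_data``, which is shown enabled purely for the example.

```ini
# Before: deprecated names in the [conductor] group
[conductor]
send_sensor_data = true
send_sensor_data_interval = 600
send_sensor_data_workers = 4
send_sensor_data_wait_timeout = 300
send_sensor_data_types = ALL
send_sensor_data_for_undeployed_nodes = false

# After: the same settings in the [sensor_data] group
[sensor_data]
send_sensor_data = true
interval = 600
workers = 4
wait_timeout = 300
data_types = ALL
enable_for_undeployed_nodes = false
```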

View File

@ -14,7 +14,7 @@ WebOb>=1.7.1 # MIT
python-cinderclient!=4.0.0,>=3.3.0 # Apache-2.0
python-glanceclient>=2.8.0 # Apache-2.0
keystoneauth1>=4.2.0 # Apache-2.0
ironic-lib>=4.6.1 # Apache-2.0
ironic-lib>=5.4.0 # Apache-2.0
python-swiftclient>=3.2.0 # Apache-2.0
pytz>=2013.6 # MIT
stevedore>=1.29.0 # Apache-2.0