Add options to limit certain metrics

Some Agent plugins generate metrics that may not be desirable on some
installations where resources are severely limited.  This change adds
the ability to limit potentially non-essential metrics from some
plugins using the following boolean parameters:

* system: cpu_idle_only (reduces total CPU metrics by 3)
* system: net_bytes_only (reduces network metrics by 6 per device)
* libvirt: ping_only (reduces per-VM metrics by 20+)

These changes are described in more detail in the documentation
accompanying this patch.

monasca-setup now has the ability to configure these parameters
from the command line.  For example, to limit all non-essential
system metrics:
monasca-setup -d system -a 'cpu_idle_only=true
                            net_bytes_only=true
                            send_io_stats=false'

To limit libvirt per-VM metrics to host_alive_status only:
monasca-setup -d libvirt -a 'ping_only=true' --overwrite

Change-Id: I1fc7839907100dae52432e1af33170457b5888ef
David Schroeder 2015-09-15 14:05:03 -06:00
parent 65d850c988
commit 630d7805e0
7 changed files with 51 additions and 5 deletions


@ -4,6 +4,7 @@
- [System Checks](#system-checks)
- [System Metrics](#system-metrics)
- [Limiting System Metrics](#limiting-system-metrics)
- [Standard Plugins](#standard-plugins)
- [Dot File Configuration](#dot-file-configuration)
- [Default Plugin Detection](#default-plugin-detection)
@ -114,6 +115,20 @@ This section documents the system metrics that are sent by the Agent. This sect
| monasca.emit_time_sec | service=monitoring component=monasca-agent | Amount of time that the forwarder took to send metrics to the Monasca API.
| monasca.collection_time_sec | service=monitoring component=monasca-agent | Amount of time that the collector took for this collection run
### Limiting System Metrics
It is possible to reduce the number of system metrics with certain configuration parameters.
| Config Option | Values | Description |
| -------------- | ---------- | ------------------------------------------------------------------------------------------ |
| net_bytes_only | true/false | Sends bytes/sec metrics only, disabling packets/sec, packets_dropped/sec, and errors/sec. |
| cpu_idle_only | true/false | Sends idle_perc only, disabling wait/stolen/system/user metrics. |
| send_io_stats | true/false | If true, sends I/O metrics for each disk device. If false, sends only disk space metrics. |
These parameters may be added to `instances` in the plugin `.yaml` configuration file, or set via `monasca-setup` like this:
```
monasca-setup -d system -a 'cpu_idle_only=true net_bytes_only=true send_io_stats=false' --overwrite
```
By default, all metrics are enabled.
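As a rough illustration of what the `-a` argument string above ends up meaning for a plugin instance, the sketch below turns `key=value` pairs into a dictionary with real booleans. The function name `parse_detection_args` is hypothetical, and this is not how `monasca-setup` itself parses its arguments; it only shows the resulting values a plugin would see through `instance.get(...)`.

```python
def parse_detection_args(arg_string):
    """Split 'key=value key=value' into a dict, turning true/false into booleans.

    Hypothetical helper for illustration only; not part of monasca-setup.
    """
    parsed = {}
    for pair in arg_string.split():
        key, _, value = pair.partition('=')
        lowered = value.lower()
        if lowered in ('true', 'false'):
            parsed[key] = (lowered == 'true')
        else:
            # Leave non-boolean values (paths, numbers as text) untouched.
            parsed[key] = value
    return parsed


instance = parse_detection_args(
    'cpu_idle_only=true net_bytes_only=true send_io_stats=false')
print(instance)
```

With these values, `instance.get('cpu_idle_only')` and `instance.get('net_bytes_only')` are truthy, and `send_io_stats` is `False` rather than the non-empty (and therefore truthy) string `'false'`.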
# Standard Plugins
Plugins are the way to extend the Monasca Agent. Plugins add additional functionality that allow the agent to perform checks on other applications, servers or services. This section describes the standard plugins that are delivered by default.
@ -997,6 +1012,8 @@ If the owner of the VM is in a different tenant the Agent Cross-Tenant Metric Su
`ping_check` includes the command line (sans the IP address) used to perform a ping check against instances. Set to False (or omit altogether) to disable ping checks. This is automatically populated during `monasca-setup` from a list of possible `ping` command lines. Generally, `fping` is preferred over `ping` because it can return a failure with sub-second resolution, but if `fping` does not exist on the system, `ping` will be used instead. If `ping_check` is disabled, the `host_alive_status` metric will not be published unless that VM is inactive. This is because the host status is inconclusive without a ping check.
`ping_only` will suppress all per-VM metrics aside from `host_alive_status` and `vm.host_alive_status`, including all I/O, network, memory, and CPU metrics. [Aggregate Metrics](#aggregate-metrics), however, would still be enabled if `ping_only` is true. By default, `ping_only` is false. If `ping_only` is set to true and `ping_check` is set to false, the only metrics published by the Libvirt plugin would be the Aggregate Metrics.
Example config:
```
init_config:
@ -1009,6 +1026,7 @@ init_config:
nova_refresh: 14400
vm_probation: 300
ping_check: /usr/bin/fping -n -c1 -t250 -q
ping_only: false
instances:
- {}
```
@ -1016,6 +1034,11 @@ instances:
Note: If the Nova service login credentials are changed, `monasca-setup` would need to be re-run to use the new credentials. Alternately, `/etc/monasca/agent/conf.d/libvirt.yaml` could be modified directly.
Example `monasca-setup` usage:
```
monasca-setup -d libvirt -a 'ping_check=false ping_only=false'
```
### Instance Cache
The instance cache (`/dev/shm/libvirt_instances.yaml` by default) contains data that is not available to libvirt, but queried from Nova. To limit calls to the Nova API, the cache is only updated if a new instance is detected (libvirt sees an instance not already in the cache), or every `nova_refresh` seconds (see Configuration above).


@ -28,13 +28,13 @@ class Cpu(checks.AgentCheck):
cpu_stats.iowait,
cpu_stats.idle,
cpu_stats.steal,
dimensions)
dimensions, instance)
if send_rollup_stats:
self.gauge('cpu.total_logical_cores', psutil.cpu_count(logical=True), dimensions)
num_of_metrics += 1
log.debug('Collected {0} cpu metrics'.format(num_of_metrics))
def _format_results(self, us, sy, wa, idle, st, dimensions):
def _format_results(self, us, sy, wa, idle, st, dimensions, instance):
data = {'cpu.user_perc': us,
'cpu.system_perc': sy,
'cpu.wait_perc': wa,
@ -42,7 +42,7 @@ class Cpu(checks.AgentCheck):
'cpu.stolen_perc': st}
for key in data.keys():
if data[key] is None:
if (data[key] is None or instance.get('cpu_idle_only') and 'idle_perc' not in key):
del data[key]
[self.gauge(key, value, dimensions) for key, value in data.iteritems()]
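The filter in the hunk above relies on `and` binding tighter than `or`: a metric is dropped when its value is `None`, or when `cpu_idle_only` is set and the key is not the idle percentage. The patched code deletes keys while looping over `data.keys()`, which works on Python 2 (where `keys()` returns a list) but raises `RuntimeError` on Python 3. A minimal sketch of the same rule that avoids mutating the dictionary during iteration, using a hypothetical helper name `filter_cpu_metrics`:

```python
def filter_cpu_metrics(data, instance):
    """Return only the CPU metrics that should be sent.

    Hypothetical helper mirroring the rule in the patch: drop None values,
    and drop everything except idle_perc when cpu_idle_only is set.
    """
    idle_only = bool(instance.get('cpu_idle_only'))
    return {
        key: value
        for key, value in data.items()
        if value is not None and not (idle_only and 'idle_perc' not in key)
    }


sample = {'cpu.user_perc': 1.0,
          'cpu.system_perc': 2.0,
          'cpu.wait_perc': None,
          'cpu.idle_perc': 97.0,
          'cpu.stolen_perc': 0.0}

print(filter_cpu_metrics(sample, {'cpu_idle_only': True}))
print(sorted(filter_cpu_metrics(sample, {})))
```

Note that a value of `0.0` is kept, since only `None` counts as missing.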


@ -233,6 +233,10 @@ class LibvirtCheck(AgentCheck):
except OSError as e:
self.log.warn("OS error running '{0}' returned {1}".format(ping_cmd, e))
# Skip the remainder of the checks if ping_only is True in the config
if self.init_config.get('ping_only'):
continue
# Accumulate aggregate data
for gauge in agg_gauges:
if gauge in instance_cache.get(inst_name):


@ -37,6 +37,8 @@ class Network(checks.AgentCheck):
nic = nics[nic_name]
self.rate('net.in_bytes_sec', nic.bytes_recv, device_name=nic_name, dimensions=dimensions)
self.rate('net.out_bytes_sec', nic.bytes_sent, device_name=nic_name, dimensions=dimensions)
if instance.get('net_bytes_only'):
continue
self.rate('net.in_packets_sec', nic.packets_recv, device_name=nic_name, dimensions=dimensions)
self.rate('net.out_packets_sec', nic.packets_sent, device_name=nic_name, dimensions=dimensions)
self.rate('net.in_errors_sec', nic.errin, device_name=nic_name, dimensions=dimensions)
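The early `continue` in this hunk is what yields the "6 fewer metrics per device" figure in the commit message: the two byte rates are always emitted, and the six packet, error, and drop rates are skipped when `net_bytes_only` is set. The sketch below models that selection with a stand-in `NicCounters` tuple (field names follow psutil's per-NIC counters) and a hypothetical `select_nic_metrics` helper. Only the first three non-byte metric names appear in the hunk; the remaining three names are assumptions.

```python
from collections import namedtuple

# Stand-in for psutil's per-NIC counters, so the sketch runs without psutil.
NicCounters = namedtuple(
    'NicCounters',
    'bytes_recv bytes_sent packets_recv packets_sent errin errout dropin dropout')


def select_nic_metrics(nic, instance):
    """Return (metric_name, raw_counter) pairs to report for one NIC."""
    metrics = [('net.in_bytes_sec', nic.bytes_recv),
               ('net.out_bytes_sec', nic.bytes_sent)]
    if instance.get('net_bytes_only'):
        # Same effect as the early `continue` in the patched loop.
        return metrics
    metrics.extend([
        ('net.in_packets_sec', nic.packets_recv),
        ('net.out_packets_sec', nic.packets_sent),
        ('net.in_errors_sec', nic.errin),
        # The three names below are assumed; they are not shown in the hunk.
        ('net.out_errors_sec', nic.errout),
        ('net.in_packets_dropped_sec', nic.dropin),
        ('net.out_packets_dropped_sec', nic.dropout),
    ])
    return metrics


nic = NicCounters(100, 200, 3, 4, 0, 0, 1, 2)
print(len(select_nic_metrics(nic, {})))
print(len(select_nic_metrics(nic, {'net_bytes_only': True})))
```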


@ -2,6 +2,7 @@
Detection classes should be platform independent
"""
import ast
import logging
import sys
@ -58,6 +59,15 @@ class Plugin(object):
"""
raise NotImplementedError
def literal_eval(self, testval):
"""Return a literal boolean value if applicable
"""
if 'false' in str(testval).lower() or 'true' in str(testval).lower():
return ast.literal_eval(str(testval).capitalize())
else:
return testval
@property
def name(self):
"""Return _name if set otherwise the class name.
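One thing to watch in the `literal_eval` helper added above: it tests whether `'true'` or `'false'` occurs anywhere in the value, so an argument that merely contains one of those words (for example a path such as `/opt/true/ping`) would be handed to `ast.literal_eval`, which raises on input that is not a Python literal. A stricter variant, sketched here under the assumption that only the exact words should be converted, compares the whole value instead. The name `coerce_bool` is hypothetical and not part of the patch.

```python
def coerce_bool(testval):
    """Return True/False for the exact words 'true'/'false' (any case).

    Any other value, including strings that merely contain those words,
    is returned unchanged. Hypothetical alternative to the patch's helper.
    """
    text = str(testval).strip().lower()
    if text == 'true':
        return True
    if text == 'false':
        return False
    return testval


print(coerce_bool('true'))
print(coerce_bool('False'))
print(coerce_bool('/opt/true/ping'))
```

Because `str(True).lower()` is `'true'`, values that are already booleans pass through with the same result.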


@ -1,4 +1,3 @@
import ast
import ConfigParser
import grp
import logging
@ -120,7 +119,12 @@ class Libvirt(Plugin):
break
if 'ping_check' not in init_config:
log.info("\tUnable to find suitable ping command, disabling ping checks.")
init_config['ping_check'] = ast.literal_eval('False')
init_config['ping_check'] = self.literal_eval('False')
# Handle monasca-setup detection arguments, which take precedence
if self.args:
for arg in self.args:
init_config[arg] = self.literal_eval(self.args[arg])
config['libvirt'] = {'init_config': init_config,
'instances': [{}]}


@ -31,6 +31,9 @@ class System(Plugin):
with open(os.path.join(self.template_dir, 'conf.d/' + metric + '.yaml'), 'r') as metric_template:
default_config = yaml.load(metric_template.read())
config[metric] = default_config
if self.args:
for arg in self.args:
config[metric]['instances'][0][arg] = self.literal_eval(self.args[arg])
log.info('\tConfigured {0}'.format(metric))
except (OSError, IOError):
log.info('\tUnable to configure {0}'.format(metric))