Added additional core related cpu and aggregated disk metrics

Added 1 additional cpu core metric and 2 rollup disk metrics.
Added a configuration parameter to the disk plugin configuration
to enable/disable the aggregated disk metrics. Also added additional
documentation to the README file to reflect how to add dimensions
to a plugin instance.

Change-Id: I7e73f7103afa1e3e366769d56eb91233f5513726
Gary Hessler 2015-03-17 18:36:11 -06:00
parent 728ce70e85
commit 9fcfd752f8
5 changed files with 62 additions and 12 deletions
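
The disk rollup added by this commit can be sketched outside the agent as follows. This is a minimal illustration, not the plugin code: `rollup_disk_usage` and its `(total_bytes, used_bytes)` input tuples are hypothetical names introduced here, while the metric names and the 1048576 divisor are taken from the diff. Note that the sketch uses float division, whereas the plugin divides with `/` on integers.

```python
BYTES_PER_MB = 1048576  # same divisor the disk plugin uses in this commit


def rollup_disk_usage(usages):
    """Aggregate per-partition (total_bytes, used_bytes) pairs into node-level MB values.

    `usages` is a hypothetical stand-in for the values the plugin reads
    from psutil.disk_usage() for each partition it keeps.
    """
    total_capacity = sum(total for total, _ in usages)
    total_used = sum(used for _, used in usages)
    return {
        'disk.total_space_mb': total_capacity / float(BYTES_PER_MB),
        'disk.total_used_space_mb': total_used / float(BYTES_PER_MB),
    }


# Two partitions: 2 MB total with 1 MB used, and 1 MB total with 0.5 MB used.
rollup = rollup_disk_usage([(2097152, 1048576), (1048576, 524288)])
```

Keeping the aggregation separate from metric emission makes it easy to check the arithmetic without a running agent.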

@ -214,11 +214,15 @@ Change
You must replace all of the curly brace values. You can also optionally tweak any of the other configuration items, such as a port number in the case of a port conflict. The config file options are documented in the agent.yaml.template file. You may also specify zero or more dimensions to be included in every metric generated on that node, using the dimensions: value. Example: (include no extra dimensions on every metric)
dimensions: (This means no dimensions)
dimensions: (No dimensions example)
OR
dimensions: service:nova (This means one dimension called service with a value of nova)
dimensions: (Single dimension example)
service: nova
OR
dimensions: service:nova, group:group_a, zone:2 (This means three dimensions)
dimensions: (3 dimensions example)
service: nova
group: group_a
zone: 2
Once the configuration file has been updated and saved, monasca-agent must be restarted.
@ -369,8 +373,11 @@ This section documents the system metrics that are sent by the Agent. This sect
| cpu.stolen_perc | | Percentage of stolen CPU time, i.e. the time spent in other OS contexts when running in a virtualized environment |
| cpu.system_perc | | Percentage of time the CPU is used at the system level |
| cpu.user_perc | | Percentage of time the CPU is used at the user level |
| cpu.total_logical_cores | | Total number of logical cores available for an entire node (Includes hyper threading). **NOTE: This is an optional metric that is only sent when send_rollup_stats is set to true.** |
| disk.inode_used_perc | device, mount_point | The percentage of inodes that are used on a device |
| disk.space_used_perc | device, mount_point | The percentage of disk space that is being used on a device |
| disk.total_space_mb | | The total amount of disk space aggregated across all the disks on a particular node. **NOTE: This is an optional metric that is only sent when send_rollup_stats is set to true.** |
| disk.total_used_space_mb | | The total amount of used disk space aggregated across all the disks on a particular node. **NOTE: This is an optional metric that is only sent when send_rollup_stats is set to true.** |
| io.read_kbytes_sec | device | Kbytes/sec read by an io device |
| io.read_req_sec | device | Number of read requests/sec to an io device |
| io.read_time_sec | device | Amount of read time/sec to an io device |
@ -460,8 +467,12 @@ init_config:
instances:
- username: john_smith
password: 123456
dimensions:
node_type: test
- username: jane_smith
password: 789012
dimensions:
node_type: production
```
#### init_config
@ -470,6 +481,9 @@ In the init_config section you can specify an arbitrary number of global name:va
#### instances
The instances section is a list of instances that this check will be run against. Your actual check() method is run once per instance. The name:value pairs for each instance specify details about the instance that are necessary for the check.
#### dimensions
The instances section can also contain optional dimensions. These dimensions will be added to any metrics generated by the check for that instance.
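
As a rough illustration of how per-instance dimensions could end up on each metric, the sketch below overlays an instance's `dimensions` on node-wide ones. `merge_dimensions` is a hypothetical helper written for this example, not the agent's actual `_set_dimensions` implementation, and letting instance values win on a name clash is an assumption.

```python
def merge_dimensions(agent_dimensions, instance):
    """Return node-wide dimensions overlaid with one instance's dimensions.

    Hypothetical helper: the precedence (instance values override
    node-wide values with the same name) is an assumption.
    """
    merged = dict(agent_dimensions or {})
    merged.update((instance or {}).get('dimensions') or {})
    return merged


# Instances mirroring the YAML example above.
instances = [
    {'username': 'john_smith', 'dimensions': {'node_type': 'test'}},
    {'username': 'jane_smith', 'dimensions': {'node_type': 'production'}},
]
agent_dimensions = {'service': 'nova'}  # node-wide dimensions from the agent config

per_instance = [merge_dimensions(agent_dimensions, inst) for inst in instances]
```

Each check run would then tag its metrics with the merged dictionary for the instance being processed.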
#### Plugin Documentation
Your plugin should include an example YAML configuration file to be placed in /etc/monasca/agent/conf.d/ which has the name of the plugin YAML file plus the extension '.example', so the example configuration file for the process plugin would be at /etc/monasca/agent/conf.d/process.yaml.example. This file should include a set of example init_config and instances clauses that demonstrate how the plugin can be configured.

@ -3,3 +3,7 @@ init_config:
instances:
# Cpu check only supports one configured instance
- name: cpu_stats
# Send additional cpu rollup system metrics (Default = False)
# These metrics include the following metrics aggregated across all cpus:
# cpu.total_logical_cores
#send_rollup_stats: True

@ -9,6 +9,11 @@ instances:
# Send additional disk i/o system metrics (Default = True)
#send_io_stats: False
# Send additional disk rollup system metrics (Default = False)
# These metrics include the following metrics aggregated across all disks:
# disk.total_space_mb and disk.total_used_space_mb
#send_rollup_stats: True
# Some infrastructures have many constantly changing virtual devices (e.g. folks
# running constantly churning linux containers) whose metrics aren't
# interesting. To filter out a particular pattern of devices

@ -17,13 +17,23 @@ class Cpu(checks.AgentCheck):
"""
dimensions = self._set_dimensions(None, instance)
if instance is not None:
send_rollup_stats = instance.get("send_rollup_stats", False)
else:
send_rollup_stats = False
cpu_stats = psutil.cpu_times_percent(percpu=False)
self._format_results(cpu_stats.user + cpu_stats.nice,
cpu_stats.system + cpu_stats.irq + cpu_stats.softirq,
cpu_stats.iowait,
cpu_stats.idle,
cpu_stats.steal,
dimensions)
num_of_metrics = self._format_results(cpu_stats.user + cpu_stats.nice,
cpu_stats.system + cpu_stats.irq + cpu_stats.softirq,
cpu_stats.iowait,
cpu_stats.idle,
cpu_stats.steal,
dimensions)
if send_rollup_stats:
self.gauge('cpu.total_logical_cores', psutil.cpu_count(logical=True), dimensions)
num_of_metrics += 1
log.debug('Collected {0} cpu metrics'.format(num_of_metrics))
def _format_results(self, us, sy, wa, idle, st, dimensions):
data = {'cpu.user_perc': us,
@ -38,4 +48,4 @@ class Cpu(checks.AgentCheck):
[self.gauge(key, value, dimensions) for key, value in data.iteritems()]
log.debug('Collected {0} cpu metrics'.format(len(data)))
return len(data)

@ -17,29 +17,36 @@ class Disk(checks.AgentCheck):
"""
dimensions = self._set_dimensions(None, instance)
rollup_dimensions = dimensions.copy()
if instance is not None:
use_mount = instance.get("use_mount", True)
send_io_stats = instance.get("send_io_stats", True)
send_rollup_stats = instance.get("send_rollup_stats", False)
# If we filter devices, get the list.
device_blacklist_re = self._get_re_exclusions(instance)
fs_types_to_ignore = self._get_fs_exclusions(instance)
else:
use_mount = True
fs_types_to_ignore = []
device_blacklist_re = None
send_io_stats = True
send_rollup_stats = False
device_blacklist_re = None
fs_types_to_ignore = []
partitions = psutil.disk_partitions(all=True)
if send_io_stats:
disk_stats = psutil.disk_io_counters(perdisk=True)
disk_count = 0
total_capacity = 0
total_used = 0
for partition in partitions:
if partition.fstype not in fs_types_to_ignore \
or (device_blacklist_re \
and not device_blacklist_re.match(partition.device)):
device_name = self._get_device_name(partition.device)
disk_usage = psutil.disk_usage(partition.mountpoint)
total_capacity += disk_usage.total
total_used += disk_usage.used
st = os.statvfs(partition.mountpoint)
if use_mount:
dimensions.update({'mount_point': partition.mountpoint})
@ -71,6 +78,16 @@ class Disk(checks.AgentCheck):
except KeyError:
log.debug('No Disk I/O metrics available for {0}...Skipping'.format(device_name))
if send_rollup_stats:
self.gauge("disk.total_space_mb",
total_capacity/1048576,
dimensions=rollup_dimensions)
self.gauge("disk.total_used_space_mb",
total_used/1048576,
dimensions=rollup_dimensions)
log.debug('Collected 2 rolled-up disk usage metrics')
def _get_re_exclusions(self, instance):
"""Parse device blacklist regular expression"""
filter = None