Updating metric names to match new format

Change-Id: I395c56864373dc43c89c86e455102ea54c50a64c
This commit is contained in:
gary-hessler 2014-08-14 11:03:09 -06:00
parent 178509a3aa
commit b65128c818
9 changed files with 122 additions and 109 deletions


@@ -351,12 +351,8 @@ This section documents the system metrics that are sent by the Agent. This sect
| cpu.stolen_perc | | Percentage of stolen CPU time, i.e. the time spent in other OS contexts when running in a virtualized environment |
| cpu.system_perc | | Percentage of time the CPU is used at the system level |
| cpu.user_perc | | Percentage of time the CPU is used at the user level |
| disk.free_inodes | device | The number of inodes that are free on a device |
| disk.used_inodes | device | The number of inodes that are used on a device |
| disk.total_inodes | device | The total number of inodes that are available on a device |
| disk.used_kbytes | device | The number of kilobytes of disk space that are used on a device |
| disk.total_kbytes | device | The total number of kilobytes of disk space that are available on a device |
| disk.free_kbytes | device | The number of kilobytes of disk space that are free on a device |
| disk_inode_utilization_perc | device | The percentage of inodes that are used on a device |
| disk_space_utilization_perc | device | The percentage of disk space that is being used on a device |
| io.read_kbytes_sec | device | Kbytes/sec read by an io device
| io.read_req_sec | device | Number of read requests/sec to an io device
| io.write_kbytes_sec | device | Kbytes/sec written by an io device
@@ -373,7 +369,7 @@ This section documents the system metrics that are sent by the Agent. This sect
| mem.usable_mb | | Total megabytes of usable memory
| mem.usable_perc | | Percentage of total memory that is usable
| mem.used_buffers | | Number of buffers being used by the kernel for block io
| mem_used_cached | | Memory used for the page cache
| mem.used_cached | | Memory used for the page cache
| mem.used_shared | | Memory shared between separate processes and typically used for inter-process communication
| net.bytes_in | device | Number of network bytes received
| net.bytes_out | device | Number of network bytes sent
@@ -381,9 +377,9 @@ This section documents the system metrics that are sent by the Agent. This sect
| net.packets_out | device | Number of network packets sent
| net.errors_in | device | Number of network errors on incoming network traffic
| net.errors_out | device | Number of network errors on outgoing network traffic
| collector.threads.count | service=monasca component=agent | Number of threads that the collector is consuming for this collection run
| collector.emit.time | service=monasca component=agent | Amount of time that the collector took for sending the collected metrics to the Forwarder for this collection run
| collector.collection.time | service=monasca component=agent | Amount of time that the collector took for this collection run
| threads_count | service=monasca component=collector | Number of threads that the collector is consuming for this collection run
| emit_time | service=monasca component=collector | Amount of time that the collector took for sending the collected metrics to the Forwarder for this collection run
| collection_time | service=monasca component=collector | Amount of time that the collector took for this collection run
# Plugin Checks
Plugins are the way to extend the Monasca Agent. Plugins add additional functionality that allows the agent to perform checks on other applications, servers or services.
@@ -557,19 +553,19 @@ The process checks return the following metrics:
| Metric Name | Dimensions | Semantics |
| ----------- | ---------- | --------- |
| processes.mem.real | process_name | Amount of real memory a process is using
| processes.mem.rss | process_name | Amount of rss memory a process is using
| processes.io.read_count | process_name | Number of reads by a process
| processes.io.write_count | process_name | Number of writes by a process
| processes.io.read_bytes | process_name | Bytes read by a process
| processes.io.write_bytes | process_name | Bytes written by a process
| processes.threads | process_name | Number of threads a process is using
| processes.cpu_perc | process_name | Percentage of cpu being consumed by a process
| processes.vms | process_name | Amount of virtual memory a process is using
| processes.open_file_decorators | process_name | Number of files being used by a process
| processes.involuntary_ctx_switches | process_name | Number of involuntary context switches for a process
| processes.voluntary_ctx_switches | process_name | Number of voluntary context switches for a process
| processes.pid_count | process_name | Number of processes that exist with this process name
| process.mem.real | process_name, service, component | Amount of real memory a process is using
| process.mem.rss | process_name, service, component | Amount of rss memory a process is using
| process.io.read_count | process_name, service, component | Number of reads by a process
| process.io.write_count | process_name, service, component | Number of writes by a process
| process.io.read_bytes | process_name, service, component | Bytes read by a process
| process.io.write_bytes | process_name, service, component | Bytes written by a process
| process.threads | process_name, service, component | Number of threads a process is using
| process.cpu_perc | process_name, service, component | Percentage of cpu being consumed by a process
| process.vms | process_name, service, component | Amount of virtual memory a process is using
| process.open_file_descriptors | process_name, service, component | Number of open file descriptors being used by a process
| process.involuntary_ctx_switches | process_name, service, component | Number of involuntary context switches for a process
| process.voluntary_ctx_switches | process_name, service, component | Number of voluntary context switches for a process
| process.pid_count | process_name, service, component | Number of processes that exist with this process name
## Http Endpoint Checks
This section describes the http endpoint check that can be performed by the Agent. Http endpoint checks are simple up/down checks on services, such as HTTP/REST APIs. Given a list of URLs, the agent can dispatch an http request to each and report the success or failure to the API as a metric.
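The up/down behaviour described above can be sketched standalone. This is a minimal illustration, not the agent's actual http_check plugin; `check_endpoint` and its injectable `fetch` parameter are hypothetical names introduced here so the logic can be tested without a network.

```python
# Minimal sketch of an up/down endpoint check (illustrative, not the
# agent's http_check plugin).
from urllib.request import urlopen
from urllib.error import URLError


def check_endpoint(url, fetch=None, timeout=10):
    """Return 1 if the URL answers with an HTTP status < 400, else 0.

    `fetch` is injectable for testing; by default it performs a real
    request with urllib and returns the response status code.
    """
    if fetch is None:
        def fetch(u):
            return urlopen(u, timeout=timeout).status
    try:
        status = fetch(url)
    except (URLError, OSError):
        # Connection refused, DNS failure, timeout: the service is down.
        return 0
    return 1 if status < 400 else 0
```

A caller would then report the returned 0/1 value as a gauge with a dimension identifying the URL or service checked.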


@@ -108,16 +108,16 @@ class Collector(object):
def collector_stats(self, num_metrics, num_events, collection_time, emit_time):
metrics = {}
thread_count = threading.active_count()
metrics['monagent.collector.threads.count'] = thread_count
metrics['threads_count'] = thread_count
if thread_count > MAX_THREADS_COUNT:
log.warn("Collector thread count is high: %d" % thread_count)
metrics['monagent.collector.collection.time'] = collection_time
metrics['collection_time'] = collection_time
if collection_time > MAX_COLLECTION_TIME:
log.info("Collection time (s) is high: %.1f, metrics count: %d, events count: %d" %
(collection_time, num_metrics, num_events))
metrics['monagent.collector.emit.time'] = emit_time
metrics['emit_time'] = emit_time
if emit_time is not None and emit_time > MAX_EMIT_TIME:
log.info("Emit time (s) is high: %.1f, metrics count: %d, events count: %d" %
(emit_time, num_metrics, num_events))
@@ -163,7 +163,10 @@ class Collector(object):
# Add in metrics on the collector run, emit_duration is from the previous run
for name, value in self.collector_stats(len(metrics_list), len(events),
collect_duration, self.emit_duration).iteritems():
metrics_list.append(Measurement(name, timestamp, value, {}))
metrics_list.append(Measurement(name,
timestamp,
value,
{'service': 'monasca', 'component': 'collector'}))
emitter_statuses = self._emit(metrics_list)
self.emit_duration = timer.step()
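The loop above tags every collector self-metric with fixed `service`/`component` dimensions. The tagging can be sketched in isolation; the `Measurement` namedtuple here is a stand-in for the agent's class, and `tag_collector_stats` is a hypothetical helper name:

```python
import time
from collections import namedtuple

# Stand-in for monagent's Measurement class.
Measurement = namedtuple('Measurement', ['name', 'timestamp', 'value', 'dimensions'])


def tag_collector_stats(stats, timestamp=None):
    """Turn a {metric_name: value} dict into Measurements tagged as
    originating from the monasca collector."""
    if timestamp is None:
        timestamp = time.time()
    return [Measurement(name, timestamp, value,
                        {'service': 'monasca', 'component': 'collector'})
            for name, value in stats.items()]
```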


@@ -113,13 +113,9 @@ class Disk(Check):
self.logger.exception("Cannot parse %s" % (parts,))
if inodes:
usage_data['%s.disk_total_inodes' % parts[0]] = parts[1]
usage_data['%s.disk_used_inodes' % parts[0]] = parts[2]
usage_data['%s.disk_free_inodes' % parts[0]] = parts[3]
usage_data['%s.disk_inode_utilization_perc' % parts[0]] = float(parts[2]) / parts[1] * 100
else:
usage_data['%s.disk_total_kbytes' % parts[0]] = parts[1]
usage_data['%s.disk_used_kbytes' % parts[0]] = parts[2]
usage_data['%s.disk_free_kbytes' % parts[0]] = parts[3]
usage_data['%s.disk_space_utilization_perc' % parts[0]] = float(parts[2]) / parts[1] * 100
return usage_data
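The utilization arithmetic above (used divided by total, times 100) can be checked with a small sketch. The df-style `parts` layout (device, total, used, free) mirrors the parsing in the check; `utilization_perc` is a name introduced here for illustration:

```python
def utilization_perc(total, used):
    """Percentage of capacity in use, as computed by the disk check."""
    return float(used) / float(total) * 100.0


# Example df-style row: device, total, used, free (inodes or kbytes).
parts = ['/dev/sda1', 1000, 250, 750]
usage = {'%s.disk_inode_utilization_perc' % parts[0]:
         utilization_perc(parts[1], parts[2])}
```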
@@ -261,20 +257,20 @@ class IO(Check):
names = {"wait": "await",
"svc_t": "svctm",
"%b": "%util",
"kr/s": "io_read_kbytes_sec",
"kw/s": "io_write_kbytes_sec",
"kr/s": "io.read_kbytes_sec",
"kw/s": "io.write_kbytes_sec",
"actv": "avgqu-sz"}
elif os_name == "freebsd":
names = {"svc_t": "await",
"%b": "%util",
"kr/s": "io_read_kbytes_sec",
"kw/s": "io_write_kbytes_sec",
"kr/s": "io.read_kbytes_sec",
"kw/s": "io.write_kbytes_sec",
"wait": "avgqu-sz"}
elif os_name == "linux":
names = {"rkB/s": "io_read_kbytes_sec",
"r/s": "io_read_req_sec",
"wkB/s": "io_write_kbytes_sec",
"w/s": "io_write_req_sec"}
names = {"rkB/s": "io.read_kbytes_sec",
"r/s": "io.read_req_sec",
"wkB/s": "io.write_kbytes_sec",
"w/s": "io.write_req_sec"}
# translate if possible
return names.get(metric_name, metric_name)
@@ -435,9 +431,9 @@ class Load(Check):
# Split out the 3 load average values
load = [res.replace(',', '.') for res in re.findall(r'([0-9]+[\.,]\d+)', uptime)]
return {'load_avg_1_min': float(load[0]),
'load_avg_5_min': float(load[1]),
'load_avg_15_min': float(load[2]),
return {'cpu.load_avg_1_min': float(load[0]),
'cpu.load_avg_5_min': float(load[1]),
'cpu.load_avg_15_min': float(load[2]),
}
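The regex above extracts the three load averages from `uptime` output, tolerating locales that print a comma as the decimal separator. A self-contained sketch of the same parsing, using the new `cpu.`-prefixed names:

```python
import re


def parse_load_avg(uptime):
    """Extract the 1/5/15-minute load averages from `uptime` output.

    Decimal commas (e.g. '0,08' in some locales) are normalized to dots
    before conversion, matching the Load check's approach.
    """
    load = [res.replace(',', '.')
            for res in re.findall(r'([0-9]+[\.,]\d+)', uptime)]
    return {'cpu.load_avg_1_min': float(load[0]),
            'cpu.load_avg_5_min': float(load[1]),
            'cpu.load_avg_15_min': float(load[2])}
```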
@@ -537,35 +533,35 @@ class Memory(Check):
# Physical memory
# FIXME units are in MB, we should use bytes instead
try:
memData['mem_total_mb'] = int(meminfo.get('MemTotal', 0)) / 1024
memData['mem_free_mb'] = int(meminfo.get('MemFree', 0)) / 1024
memData['memphysBuffers'] = int(meminfo.get('Buffers', 0)) / 1024
memData['memphysCached'] = int(meminfo.get('Cached', 0)) / 1024
memData['memphysShared'] = int(meminfo.get('Shmem', 0)) / 1024
memData['mem.total_mb'] = int(meminfo.get('MemTotal', 0)) / 1024
memData['mem.free_mb'] = int(meminfo.get('MemFree', 0)) / 1024
memData['mem.used_buffers'] = int(meminfo.get('Buffers', 0)) / 1024
memData['mem.used_cached'] = int(meminfo.get('Cached', 0)) / 1024
memData['mem.used_shared'] = int(meminfo.get('Shmem', 0)) / 1024
memData['mem_usable_perc'] = memData['mem_total_mb'] - memData['mem_free_mb']
memData['mem.usable_perc'] = memData['mem.total_mb'] - memData['mem.free_mb']
# Usable is relative since cached and buffers are actually used to speed things up.
memData['mem_usable_mb'] = memData['mem_free_mb'] + \
memData['memphysBuffers'] + memData['memphysCached']
memData['mem.usable_mb'] = memData['mem.free_mb'] + \
memData['mem.used_buffers'] + memData['mem.used_cached']
if memData['mem_total_mb'] > 0:
memData['mem_usable_perc'] = float(
memData['mem_usable_mb']) / float(memData['mem_total_mb'])
if memData['mem.total_mb'] > 0:
memData['mem.usable_perc'] = float(
memData['mem.usable_mb']) / float(memData['mem.total_mb'])
except Exception:
self.logger.exception('Cannot compute stats from /proc/meminfo')
# Swap
# FIXME units are in MB, we should use bytes instead
try:
memData['mem_swap_total_mb'] = int(meminfo.get('SwapTotal', 0)) / 1024
memData['mem_swap_free_mb'] = int(meminfo.get('SwapFree', 0)) / 1024
memData['mem.swap_total_mb'] = int(meminfo.get('SwapTotal', 0)) / 1024
memData['mem.swap_free_mb'] = int(meminfo.get('SwapFree', 0)) / 1024
memData['mem_swap_used_mb'] = memData[
'mem_swap_total_mb'] - memData['mem_swap_free_mb']
memData['mem.swap_used_mb'] = memData[
'mem.swap_total_mb'] - memData['mem.swap_free_mb']
if memData['mem_swap_total_mb'] > 0:
memData['mem_swap_free_perc'] = float(
memData['mem_swap_free_mb']) / float(memData['mem_swap_total_mb'])
if memData['mem.swap_total_mb'] > 0:
memData['mem.swap_free_perc'] = float(
memData['mem.swap_free_mb']) / float(memData['mem.swap_total_mb'])
except Exception:
self.logger.exception('Cannot compute swap stats')
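The physical-memory arithmetic in this hunk can be isolated into a small sketch. `mem_stats` is a hypothetical helper, not the agent's method; note that, as in the code above, `mem.usable_perc` comes out as a 0-1 ratio, and /proc/meminfo values (kB) are divided by 1024 to get MB:

```python
def mem_stats(meminfo):
    """Compute the mem.* gauges from a /proc/meminfo-style dict of kB values."""
    total = int(meminfo.get('MemTotal', 0)) // 1024
    free = int(meminfo.get('MemFree', 0)) // 1024
    buffers = int(meminfo.get('Buffers', 0)) // 1024
    cached = int(meminfo.get('Cached', 0)) // 1024
    data = {'mem.total_mb': total,
            'mem.free_mb': free,
            'mem.used_buffers': buffers,
            'mem.used_cached': cached}
    # Buffers and page cache are reclaimable, so they count as usable.
    data['mem.usable_mb'] = free + buffers + cached
    if total > 0:
        data['mem.usable_perc'] = float(data['mem.usable_mb']) / float(total)
    return data
```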
@@ -747,11 +743,11 @@ class Cpu(Check):
When figures are not available, False is sent back.
"""
def format_results(us, sy, wa, idle, st):
data = {'cpu_user_perc': us,
'cpu_system_perc': sy,
'cpu_wait_perc': wa,
'cpu_idle_perc': idle,
'cpu_stolen_perc': st}
data = {'cpu.user_perc': us,
'cpu.system_perc': sy,
'cpu.wait_perc': wa,
'cpu.idle_perc': idle,
'cpu.stolen_perc': st}
for key in data.keys():
if data[key] is None:
del data[key]
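The None-pruning loop above works because Python 2's `keys()` returns a list; under Python 3 deleting while iterating a keys view raises `RuntimeError`. A version-safe equivalent (a sketch, not the agent's code) filters into a new dict instead:

```python
def format_results(us, sy, wa, idle, st):
    """Build the cpu.* gauge dict, dropping metrics the platform
    could not provide (None values)."""
    data = {'cpu.user_perc': us,
            'cpu.system_perc': sy,
            'cpu.wait_perc': wa,
            'cpu.idle_perc': idle,
            'cpu.stolen_perc': st}
    # Filtering avoids mutating the dict while iterating over it.
    return {k: v for k, v in data.items() if v is not None}
```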


@@ -67,6 +67,9 @@ class MySql(AgentCheck):
host, port, user, password, mysql_sock, defaults_file, dimensions, options = self._get_config(
instance)
if 'service' not in dimensions:
dimensions.update({'service': 'mysql'})
if (not host or not user) and not defaults_file:
raise Exception("Mysql host and user are needed.")
@@ -84,7 +87,7 @@ class MySql(AgentCheck):
password = instance.get('pass', '')
mysql_sock = instance.get('sock', '')
defaults_file = instance.get('defaults_file', '')
dimensions = instance.get('dimensions', None)
dimensions = instance.get('dimensions', {})
options = instance.get('options', {})
return host, port, user, password, mysql_sock, defaults_file, dimensions, options


@@ -28,18 +28,18 @@ class Network(AgentCheck):
}
NETSTAT_GAUGE = {
('udp4', 'connections'): 'net_udp4_connections',
('udp6', 'connections'): 'net_udp6_connections',
('tcp4', 'established'): 'net_tcp4_established',
('tcp4', 'opening'): 'net_tcp4_opening',
('tcp4', 'closing'): 'net_tcp4_closing',
('tcp4', 'listening'): 'net_tcp4_listening',
('tcp4', 'time_wait'): 'net_tcp4_time_wait',
('tcp6', 'established'): 'net_tcp6_established',
('tcp6', 'opening'): 'net_tcp6_opening',
('tcp6', 'closing'): 'net_tcp6_closing',
('tcp6', 'listening'): 'net_tcp6_listening',
('tcp6', 'time_wait'): 'net_tcp6_time_wait',
('udp4', 'connections'): 'net.udp4_connections',
('udp6', 'connections'): 'net.udp6_connections',
('tcp4', 'established'): 'net.tcp4_established',
('tcp4', 'opening'): 'net.tcp4_opening',
('tcp4', 'closing'): 'net.tcp4_closing',
('tcp4', 'listening'): 'net.tcp4_listening',
('tcp4', 'time_wait'): 'net.tcp4_time_wait',
('tcp6', 'established'): 'net.tcp6_established',
('tcp6', 'opening'): 'net.tcp6_opening',
('tcp6', 'closing'): 'net.tcp6_closing',
('tcp6', 'listening'): 'net.tcp6_listening',
('tcp6', 'time_wait'): 'net.tcp6_time_wait',
}
def __init__(self, name, init_config, agent_config, instances=None):
@@ -100,7 +100,7 @@ class Network(AgentCheck):
if iface in self._excluded_ifaces and metric in exclude_iface_metrics:
# skip it!
continue
self.rate('net_%s' % metric, val, device_name=iface)
self.rate('net.%s' % metric, val, device_name=iface)
count += 1
self.log.debug("tracked %s network metrics for interface %s" % (count, iface))


@@ -5,18 +5,18 @@ from monagent.common.util import Platform
class ProcessCheck(AgentCheck):
PROCESS_GAUGE = ('processes_threads',
'processes_cpu.pct',
'processes_mem.rss',
'processes_mem.vms',
'processes_mem.real',
'processes_open_file_decorators',
'processes_ioread_count',
'processes_iowrite_count',
'processes_ioread_bytes',
'processes_iowrite_bytes',
'processes_voluntary_ctx_switches',
'processes_involuntary_ctx_switches')
PROCESS_GAUGE = ('process.threads',
'process.cpu_perc',
'process.mem.rss',
'process.mem.vms',
'process.mem.real',
'process.open_file_descriptors',
'process.io.read_count',
'process.io.write_count',
'process.io.read_bytes',
'process.io.write_bytes',
'process.voluntary_ctx_switches',
'process.involuntary_ctx_switches')
@staticmethod
def is_psutil_version_later_than(v):
@@ -193,7 +193,7 @@ class ProcessCheck(AgentCheck):
self.log.debug('ProcessCheck: process %s analysed' % name)
self.gauge('processes_pid_count', len(pids), dimensions=dimensions)
self.gauge('process.pid_count', len(pids), dimensions=dimensions)
metrics = dict(zip(ProcessCheck.PROCESS_GAUGE,
self.get_process_metrics(pids,


@@ -41,6 +41,8 @@ class Zookeeper(AgentCheck):
timeout = float(instance.get('timeout', 3.0))
dimensions = instance.get('dimensions', {})
if 'service' not in dimensions:
dimensions.update({'service': 'zookeeper'})
sock = socket.socket()
sock.settimeout(timeout)
buf = StringIO()
@@ -74,14 +76,14 @@ class Zookeeper(AgentCheck):
if buf is not None:
# Parse the response
metrics, new_dimensions = self.parse_stat(buf)
dimensions.update(new_dimensions)
new_dimensions.update(dimensions)
# Write the data
for metric, value in metrics:
self.gauge(metric, value, dimensions=dimensions)
self.gauge(metric, value, dimensions=new_dimensions)
else:
# Reading from the client port timed out, track it as a metric
self.increment('zookeeper.timeouts', dimensions=dimensions)
self.increment('zookeeper.timeouts', dimensions=new_dimensions)
@classmethod
def parse_stat(cls, buf):
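The change above from `dimensions.update(new_dimensions)` to `new_dimensions.update(dimensions)` reverses merge precedence: dimensions configured on the instance now win over those parsed from the stat response. A sketch of the resulting behaviour (`merge_dimensions` is an illustrative name, not agent code):

```python
def merge_dimensions(parsed, configured):
    """Merge dimensions parsed from a check's output with the
    instance-configured ones, letting configured values win on conflict."""
    merged = dict(parsed)       # copy; parsed values form the base
    merged.update(configured)   # configured values override on key clash
    return merged
```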


@@ -39,13 +39,13 @@ class ServicePlugin(Plugin):
for process in self.found_processes:
# Watch the service processes
log.info("\tMonitoring the {0} {1} process.".format(process, self.service_name))
config.merge(watch_process([process], self.service_name))
config.merge(watch_process([process], self.service_name, process))
if self.service_api_url and self.search_pattern:
# Setup an active http_status check on the API
log.info("\tConfiguring an http_check for the {0} API.".format(self.service_name))
config.merge(service_api_check(self.service_name + '-api', self.service_api_url,
self.search_pattern, self.service_name))
self.search_pattern, self.service_name + '_api'))
return config


@@ -26,7 +26,7 @@ def find_process_name(pname):
return None
def watch_process(search_strings, service=None):
def watch_process(search_strings, service=None, component=None):
"""Takes a list of process search strings and returns a Plugins object with the config set.
This was built as a helper as many plugins setup process watching
"""
@@ -34,17 +34,16 @@ def watch_process(search_strings, service=None):
parameters = {'name': search_strings[0],
'search_string': search_strings}
# If service parameter is set in the plugin config, add the service dimension which
# will override the service in the agent config
if service:
parameters['dimensions'] = {'service': service}
dimensions = _get_dimensions(service, component)
if len(dimensions) > 0:
parameters['dimensions'] = dimensions
config['process'] = {'init_config': None,
'instances': [parameters]}
return config
def service_api_check(name, url, pattern, service=None):
def service_api_check(name, url, pattern, service=None, component=None):
"""Setup a service api to be watched by the http_check plugin."""
config = agent_config.Plugins()
parameters = {'name': name,
@@ -53,12 +52,26 @@ def service_api_check(name, url, pattern, service=None):
'timeout': 10,
'use_keystone': True}
# If service parameter is set in the plugin config, add the service dimension which
# will override the service in the agent config
if service:
parameters['dimensions'] = {'service': service}
dimensions = _get_dimensions(service, component)
if len(dimensions) > 0:
parameters['dimensions'] = dimensions
config['http_check'] = {'init_config': None,
'instances': [parameters]}
return config
def _get_dimensions(service, component):
dimensions = {}
# If service parameter is set in the plugin config, add the service dimension which
# will override the service in the agent config
if service:
dimensions.update({'service': service})
# If component parameter is set in the plugin config, add the component dimension which
# will override the component in the agent config
if component:
dimensions.update({'component': component})
return dimensions