Added additional core-related CPU and aggregated disk metrics

Added one additional CPU core metric and two rollup disk metrics. Added a send_rollup_stats configuration parameter to the cpu and disk plugin configurations to enable or disable these rollup metrics. Also updated the README file to show how to add dimensions to a plugin instance.

Change-Id: I7e73f7103afa1e3e366769d56eb91233f5513726
2015-03-17 18:36:11 -06:00
parent 728ce70e85
commit 9fcfd752f8
5 changed files with 62 additions and 12 deletions
--- a/README.md
+++ b/README.md
@ -214,11 +214,15 @@ Change

 You must replace all of the curly brace values and you can also optionally tweak any of the other configuration items as well like a port number in the case of a port conflict.  The config file options are documented in the agent.yaml.template file.  You may also specify zero or more dimensions that would be included in every metric generated on that node, using the dimensions: value. Example: (include no extra dimensions on every metric)

-    dimensions: (This means no dimensions)
+    dimensions: (No dimensions example)
 			OR
-    dimensions: service:nova (This means one dimension called service with a value of nova)
+    dimensions: (Single dimension example)
+        service: nova
    		OR
-    dimensions: service:nova, group:group_a, zone:2 (This means three dimensions)
+    dimensions: (3 dimensions example)
+        service: nova
+        group: group_a
+        zone: 2

 Once the configuration file has been updated and saved, monasca-agent must be restarted.

@ -369,8 +373,11 @@ This section documents the system metrics that are sent by the Agent.  This sect
 | cpu.stolen_perc | | Percentage of stolen CPU time, i.e. the time spent in other OS contexts when running in a virtualized environment |
 | cpu.system_perc | | Percentage of time the CPU is used at the system level |
 | cpu.user_perc  | | Percentage of time the CPU is used at the user level |
+| cpu.total_logical_cores  | | Total number of logical cores available for an entire node (Includes hyper threading).  **NOTE: This is an optional metric that is only sent when send_rollup_stats is set to true.** |
 | disk.inode_used_perc | device, mount_point | The percentage of inodes that are used on a device |
 | disk.space_used_perc | device, mount_point | The percentage of disk space that is being used on a device |
+| disk.total_space_mb | | The total amount of disk space aggregated across all the disks on a particular node.  **NOTE: This is an optional metric that is only sent when send_rollup_stats is set to true.** |
+| disk.total_used_space_mb | | The total amount of used disk space aggregated across all the disks on a particular node.  **NOTE: This is an optional metric that is only sent when send_rollup_stats is set to true.** |
 | io.read_kbytes_sec | device | Kbytes/sec read by an io device
 | io.read_req_sec | device   | Number of read requests/sec to an io device
 | io.read_time_sec | device   | Amount of read time/sec to an io device
@ -460,8 +467,12 @@ init_config:
 instances:
    - username: john_smith
      password: 123456
+      dimensions:
+          node_type: test
    - username: jane_smith
      password: 789012
+      dimensions:
+          node_type: production
 ```

 #### init_config
@ -470,6 +481,9 @@ In the init_config section you can specify an arbitrary number of global name:va
 #### instances
 The instances section is a list of instances that this check will be run against. Your actual check() method is run once per instance. The name:value pairs for each instance specify details about the instance that are necessary for the check.

+#### dimensions
+The instances section can also contain optional dimensions. These dimensions will be added to any metrics generated by the check for that instance.
+
 #### Plugin Documentation
 Your plugin should include an example YAML configuration file to be placed in /etc/monasca/agent/conf.d/ which has the name of the plugin YAML file plus the extension '.example', so the example configuration file for the process plugin would be at /etc/monasca/agent/conf.d/process.yaml.example. This file should include a set of example init_config and instances clauses that demonstrate how the plugin can be configured.
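
The per-instance dimensions documented in this README change can be pictured as a simple dictionary overlay. The sketch below is illustrative only: `merge_dimensions` is a hypothetical helper, not the agent's `_set_dimensions` method, and the choice to let instance values override defaults is an assumption.

```python
def merge_dimensions(default_dimensions, instance):
    """Return a new dict: defaults overlaid with the instance's optional dimensions."""
    merged = dict(default_dimensions or {})
    if instance is not None:
        # An instance without a 'dimensions' key contributes nothing extra.
        merged.update(instance.get("dimensions") or {})
    return merged


# Instances modelled on the README example (passwords omitted).
instances = [
    {"username": "john_smith", "dimensions": {"node_type": "test"}},
    {"username": "jane_smith", "dimensions": {"node_type": "production"}},
    {"username": "no_dimensions"},
]

for instance in instances:
    print(instance["username"], merge_dimensions({"service": "nova"}, instance))
```

Returning a fresh dictionary keeps one instance's dimensions from leaking into the next.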

--- a/conf.d/cpu.yaml
+++ b/conf.d/cpu.yaml
@ -3,3 +3,7 @@ init_config:
 instances:
    # Cpu check only supports one configured instance
    - name: cpu_stats
+      # Send additional cpu rollup system metrics (Default = False)
+      # These metrics include the following metrics aggregated across all cpus:
+      # cpu.total_logical_cores
+      #send_rollup_stats: True
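
Because `send_rollup_stats` is commented out by default, a check has to cope with the key being absent, or with no instance at all. A minimal sketch of that lookup, assuming a hypothetical helper named `read_option` (the plugins in this commit inline the same logic instead):

```python
def read_option(instance, name, default):
    """Return instance[name] if present, otherwise the default.

    Tolerates instance being None, which mirrors the
    'if instance is not None' guard used in the plugins.
    """
    if instance is None:
        return default
    return instance.get(name, default)


print(read_option({"name": "cpu_stats"}, "send_rollup_stats", False))
print(read_option({"name": "cpu_stats", "send_rollup_stats": True}, "send_rollup_stats", False))
print(read_option(None, "send_rollup_stats", False))
```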
--- a/conf.d/disk.yaml
+++ b/conf.d/disk.yaml
@ -9,6 +9,11 @@ instances:
      # Send additional disk i/o system metrics (Default = True)
      #send_io_stats: False

+      # Send additional disk rollup system metrics (Default = False)
+      # These metrics include the following metrics aggregated across all disks:
+      # disk.total_space_mb and disk.total_used_space_mb
+      #send_rollup_stats: True
+
      # Some infrastructures have many constantly changing virtual devices (e.g. folks
      # running constantly churning linux containers) whose metrics aren't
      # interesting. To filter out a particular pattern of devices
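
The two disk rollup gauges enabled by `send_rollup_stats` are sums over every partition, converted from bytes to megabytes. A minimal sketch of that arithmetic, assuming a hypothetical `rollup_disk_usage` helper that takes plain `(total_bytes, used_bytes)` pairs rather than psutil results. It uses `//` explicitly; disk.py uses `/`, which floors for integers under Python 2 (the plugins call `iteritems`, so Python 2 appears to be the target).

```python
BYTES_PER_MB = 1048576  # 1024 * 1024, the divisor used in disk.py


def rollup_disk_usage(usages):
    """Sum (total_bytes, used_bytes) pairs and convert to whole megabytes."""
    total_capacity = 0
    total_used = 0
    for total_bytes, used_bytes in usages:
        total_capacity += total_bytes
        total_used += used_bytes
    return {
        "disk.total_space_mb": total_capacity // BYTES_PER_MB,
        "disk.total_used_space_mb": total_used // BYTES_PER_MB,
    }


# Two made-up partitions: 10 MB with 4 MB used, 20 MB with 5 MB used.
sample = [
    (10 * BYTES_PER_MB, 4 * BYTES_PER_MB),
    (20 * BYTES_PER_MB, 5 * BYTES_PER_MB),
]
print(rollup_disk_usage(sample))
```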
--- a/monasca_agent/collector/checks_d/cpu.py
+++ b/monasca_agent/collector/checks_d/cpu.py
@ -17,13 +17,23 @@ class Cpu(checks.AgentCheck):
        """
        dimensions = self._set_dimensions(None, instance)

+        if instance is not None:
+            send_rollup_stats = instance.get("send_rollup_stats", False)
+        else:
+            send_rollup_stats = False
+
        cpu_stats = psutil.cpu_times_percent(percpu=False)
-        self._format_results(cpu_stats.user + cpu_stats.nice,
-                             cpu_stats.system + cpu_stats.irq + cpu_stats.softirq,
-                             cpu_stats.iowait,
-                             cpu_stats.idle,
-                             cpu_stats.steal,
-                             dimensions)
+        num_of_metrics = self._format_results(cpu_stats.user + cpu_stats.nice,
+                                              cpu_stats.system + cpu_stats.irq + cpu_stats.softirq,
+                                              cpu_stats.iowait,
+                                              cpu_stats.idle,
+                                              cpu_stats.steal,
+                                              dimensions)
+        if send_rollup_stats:
+            self.gauge('cpu.total_logical_cores', psutil.cpu_count(logical=True), dimensions)
+            num_of_metrics += 1
+        log.debug('Collected {0} cpu metrics'.format(num_of_metrics))
+

    def _format_results(self, us, sy, wa, idle, st, dimensions):
        data = {'cpu.user_perc': us,
@ -38,4 +48,4 @@ class Cpu(checks.AgentCheck):

        [self.gauge(key, value, dimensions) for key, value in data.iteritems()]

-        log.debug('Collected {0} cpu metrics'.format(len(data)))
+        return len(data)
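
The cpu.py change moves the debug log out of `_format_results` so that the optional rollup gauge can be included in the reported count. A minimal sketch of that control flow, using a hypothetical `FakeCpuCheck` class that records gauges in a list instead of sending them; it is not the agent's `AgentCheck` API.

```python
class FakeCpuCheck(object):
    """Stand-in for the Cpu check that only records what would be sent."""

    def __init__(self):
        self.sent = []

    def gauge(self, name, value, dimensions=None):
        self.sent.append((name, value, dimensions))

    def _format_results(self, data, dimensions):
        for key, value in data.items():
            self.gauge(key, value, dimensions)
        # Return the count so the caller can add optional metrics to it.
        return len(data)

    def check(self, data, logical_cores, send_rollup_stats):
        num_of_metrics = self._format_results(data, {})
        if send_rollup_stats:
            self.gauge('cpu.total_logical_cores', logical_cores, {})
            num_of_metrics += 1
        return num_of_metrics


data = {'cpu.user_perc': 12.5, 'cpu.system_perc': 3.0}
print(FakeCpuCheck().check(data, logical_cores=8, send_rollup_stats=True))
```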
--- a/monasca_agent/collector/checks_d/disk.py
+++ b/monasca_agent/collector/checks_d/disk.py
@ -17,29 +17,36 @@ class Disk(checks.AgentCheck):

        """
        dimensions = self._set_dimensions(None, instance)
+        rollup_dimensions = dimensions.copy()

        if instance is not None:
            use_mount = instance.get("use_mount", True)
            send_io_stats = instance.get("send_io_stats", True)
+            send_rollup_stats = instance.get("send_rollup_stats", False)
            # If we filter devices, get the list.
            device_blacklist_re = self._get_re_exclusions(instance)
            fs_types_to_ignore = self._get_fs_exclusions(instance)
        else:
            use_mount = True
-            fs_types_to_ignore = []
-            device_blacklist_re = None
            send_io_stats = True
+            send_rollup_stats = False
+            device_blacklist_re = None
+            fs_types_to_ignore = []

        partitions = psutil.disk_partitions(all=True)
        if send_io_stats:
            disk_stats = psutil.disk_io_counters(perdisk=True)
        disk_count = 0
+        total_capacity = 0
+        total_used = 0
        for partition in partitions:
            if partition.fstype not in fs_types_to_ignore \
                or (device_blacklist_re \
                and not device_blacklist_re.match(partition.device)):
                    device_name = self._get_device_name(partition.device)
                    disk_usage = psutil.disk_usage(partition.mountpoint)
+                    total_capacity += disk_usage.total
+                    total_used += disk_usage.used
                    st = os.statvfs(partition.mountpoint)
                    if use_mount:
                        dimensions.update({'mount_point': partition.mountpoint})
@ -71,6 +78,16 @@ class Disk(checks.AgentCheck):
                        except KeyError:
                            log.debug('No Disk I/O metrics available for {0}...Skipping'.format(device_name))

+        if send_rollup_stats:
+            self.gauge("disk.total_space_mb",
+                        total_capacity/1048576,
+                        dimensions=rollup_dimensions)
+            self.gauge("disk.total_used_space_mb",
+                        total_used/1048576,
+                        dimensions=rollup_dimensions)
+            log.debug('Collected 2 rolled-up disk usage metrics')
+
+
    def _get_re_exclusions(self, instance):
        """Parse device blacklist regular expression"""
        filter = None