34 Commits

Author SHA1 Message Date
melissaml 87cbdd6649 Replace git.openstack.org URLs with opendev.org URLs
Change-Id: I790c1876a3e44da8623c74632332f0e453dce1f6
2019-07-09 16:36:22 +00:00
Kevin Carter 969a30c6c7 Add grafana
This change introduces grafana into the stack, which gives us a great
way to visualize the data. The grafana role from cloudalchemy is used
for the bulk of the deployment.

Because the grafana deployment playbook is now standalone, the mentions
of grafana in the other ops directories have been removed.

Change-Id: I23e1c96cd1fda7ece9b86a69f9f0326913de714d
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-04-13 10:31:34 -05:00
Jean-Philippe Evrard 9d4cf24ca7 Do not log passwords
This prevents data from being leaked into the callback plugin.

Change-Id: I3c3b03c18c547824ae1f4ac3272f5b2f8142dec0
2018-04-11 13:37:59 +02:00
Ramon Orru 2113b36bf0 Moving telegraf-plugins to drop them properly
When playbook-influx-telegraf.yml runs, it uses roles from the mgrzybek
openstack-ansible-telegraf repo. Playbooks from that repo search for
scripts in different directories, reading them from a source path.

Change-Id: Ib1ca9f60ad5e686790b56e1c66ab53ed9cc490b7
2018-03-08 14:20:02 +01:00
Ramon Orru 1b1e2853d1 Using cluster_metrics host as default output
InfluxDB is usually already installed at this point
(by running the playbook-influx-db.yml playbook), so in most cases using
that host as the default output avoids having to specify additional information.

Change-Id: Iac5e16c3d24a74119ea2179ecc3e5273de20676e
2018-03-08 14:05:41 +01:00
Ramon Orru 3dbc21a0c3 Fixing typo
Change-Id: If52051ed649b83b772d2747cd47f6d299a61f386
2018-03-08 14:00:36 +01:00
Nguyen Hung Phuong 65340472d2 Remove empty file
cluster_metrics/etc/user_metrics.yml
is an empty file, so we should probably delete it.

Change-Id: Id8b2905a2c94dc3953ca2fbd924ba68e6c5c1bf5
2018-02-23 16:29:22 +07:00
Mathieu Grzybek 364e3135e7 Updates the last update date
Change-Id: I061a63930397b7af738420e9f5809619bd440ed0
2017-12-01 14:17:25 +01:00
Mathieu Grzybek 98b4699bc7 Uses a dedicated telegraf role
Change-Id: I17651120ac4b77fc6e0d4cbcde99e4e1fa36d1d1
2017-12-01 14:14:26 +01:00
Bjoern Teipel 69f5a01eb4 Fixing timing_counter based swift graphs
The non_negative_derivative function applied to the timing_counter
based graphs is replaced by the mean function, since these
timing_counter metrics do not behave exactly like traditional counters.
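A minimal Python sketch of why the two functions differ, assuming InfluxQL-style semantics where non_negative_derivative returns the per-unit rate of change between consecutive points and drops negative rates. The sample data is hypothetical: for a cumulative counter the derivative yields a meaningful rate, while for timing values that rise and fall independently most rates come out negative and are discarded, so the mean is the more useful summary.

```python
from statistics import mean


def non_negative_derivative(points, unit=1.0):
    """Per-`unit`-seconds rate of change between consecutive (time, value)
    points, dropping negative rates the way a counter reset would be dropped."""
    rates = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        rate = (v1 - v0) / (t1 - t0) * unit
        if rate >= 0:
            rates.append(rate)
    return rates


# Hypothetical cumulative counter: the derivative is a sensible rate.
counter = [(0, 0.0), (10, 50.0), (20, 150.0)]
# Hypothetical timing samples (ms): each value stands alone, it is not a total.
timings = [(0, 120.0), (10, 80.0), (20, 95.0), (30, 60.0)]

print(non_negative_derivative(counter))   # [5.0, 10.0]
print(non_negative_derivative(timings))   # [1.5]
print(mean(v for _, v in timings))        # 88.75
```

Three of the four timing samples would be reduced to a single surviving "rate", which is why a derivative-based graph misrepresents this kind of metric.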

Change-Id: I5e2e5cdd2d04f469853f59f96da68839830bb359
2017-06-27 15:42:09 -05:00
Jenkins 00e01eb56a Merge "Add prometheus_client as optional output" 2017-06-13 14:57:50 +00:00
Jenkins 9375440072 Merge "Ensure interval is set to integer" 2017-06-13 14:57:36 +00:00
Melvin Hillsman 9e706b13a3 Add prometheus_client as optional output
Currently the only output plugin used is influxdb. This adds the
prometheus_client output plugin and its directives as options.
The telegraf.conf.j2 template has been updated to check for the
outputs_prometheus_client variable and two other related variables.
The new variable names and default values are shown in vars.yml.
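A minimal Python sketch of the conditional the template is described as performing, written in plain Python rather than Jinja2. Only the outputs_prometheus_client name comes from the commit; influxdb_urls and prometheus_listen are hypothetical stand-ins, since the two other related variables are not named here. The `[[outputs.influxdb]]` / `urls` and `[[outputs.prometheus_client]]` / `listen` keys are standard telegraf settings, and `:9273` is the prometheus_client plugin's usual default listen address.

```python
def render_outputs(influxdb_urls, outputs_prometheus_client=False,
                   prometheus_listen=":9273"):
    """Render telegraf output sections; prometheus_client is opt-in."""
    lines = [
        "[[outputs.influxdb]]",
        "  urls = [" + ", ".join(f'"{url}"' for url in influxdb_urls) + "]",
    ]
    # Mirrors an `{% if outputs_prometheus_client %}` block in a template.
    if outputs_prometheus_client:
        lines += [
            "",
            "[[outputs.prometheus_client]]",
            f'  listen = "{prometheus_listen}"',
        ]
    return "\n".join(lines) + "\n"


print(render_outputs(["http://127.0.0.1:8086"], outputs_prometheus_client=True))
```

Leaving the flag at its default keeps the rendered config identical to an influxdb-only setup.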

Change-Id: I8d9380a4cc2ea58b4ad98b9fc964d45ff82090ed
Signed-off-by: Melvin Hillsman <mrhillsman@gmail.com>
2017-06-12 11:24:20 -05:00
mrhillsman 192390315a Ensure interval is set to integer
The run_int interval is being set to True. This patch ensures that
the interval is returned as an integer.
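A short Python illustration of why a value of True can slip through: bool is a subclass of int, so `isinstance(True, int)` is True and `int(True)` is 1. The normalize_interval helper below is hypothetical and is not the patch's actual fix; it only shows one way to insist on a real integer.

```python
def normalize_interval(value):
    """Return an interval in whole seconds, rejecting booleans."""
    # bool subclasses int, so an isinstance(value, int) check alone would
    # accept True and quietly turn it into a 1-second interval.
    if isinstance(value, bool):
        raise TypeError(f"interval must be an integer, got {value!r}")
    return int(value)


print(normalize_interval(10))    # 10
print(normalize_interval("15"))  # 15
```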

Change-Id: I1faf616b16e9da3dac45bec9e1ec3ca098563552
Signed-off-by: mrhillsman <mrhillsman@gmail.com>
2017-06-12 15:15:58 +00:00
mrhillsman fa89a223fd Update mysql login_host for grafana to allow being set via variable
Change-Id: I4c7bfd6cffe7949ad06f0baaed951ebe75d0c7c7
Signed-off-by: mrhillsman <mrhillsman@gmail.com>
2017-06-09 15:07:31 -05:00
Bjoern Teipel 2d2c1f1419 Telegraf playbook fixes
The vm_quota.py telegraf plugin is missing but is still
referenced inside the playbook-influx-telegraf.yml
playbook.
Additionally, my.cnf does not need to be present
on the telegraf hosts/containers for them to function.
The influxdb_protocol override exposes the protocol used
to communicate with influxdb, usually HTTP.

Change-Id: I90226d02e82d2516be4a4d84baff22e46ce709fb
2017-05-09 15:40:13 -05:00
Bjoern Teipel a4f8d4a972 Fixing counter based graphs inside the Swift dashboard
All counter based graphs inside the Swift dashboard are fixed
and now correctly show the timings per second rather than total
values per telegraf flush interval.
Additionally, the High Response Time graphs now use the timing_upper
metrics.

Minor issues inside the playbook-influx-telegraf.yml and telegraf.conf.j2
are fixed to support deployments without optional components like ironic,
cinder, etc.
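A minimal Python sketch of summarising one flush window of timing samples, to show why an "upper" series suits a High Response Time graph: the maximum keeps a single slow request visible, while the mean dilutes it. The field names are modeled on the lower/upper/mean/count fields that telegraf's statsd input reports for timings; the samples are hypothetical.

```python
from statistics import mean


def aggregate_timings(samples_ms):
    """Summarise one flush interval's timing samples (milliseconds)."""
    return {
        "count": len(samples_ms),
        "lower": min(samples_ms),
        "upper": max(samples_ms),  # slowest request seen in the window
        "mean": mean(samples_ms),
    }


window = [12.0, 48.0, 15.0, 250.0, 20.0]  # hypothetical samples
print(aggregate_timings(window))
```

Here the one 250 ms request dominates "upper" but only pulls "mean" up to 69 ms.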

Change-Id: I0ac0d2004416cae7a6d137d98ab685b7abc22d3f
2017-04-25 10:41:04 -05:00
Bjoern Teipel 335d23f32b Adding Swift Proxy Server Dashboard
Provide an initial version of a grafana OpenStack Swift Dashboard for the
Swift Proxy Server. The metrics are gathered by the built-in statsd
functionality of Swift and are forwarded via local telegraf daemons
to InfluxDB.
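A minimal Python sketch of the wire format involved, assuming the plain-text StatsD protocol over UDP, where a timer sample is sent as `<name>:<value>|ms`. The metric name and the 127.0.0.1:8125 target (8125 is the conventional StatsD port) are illustrative assumptions, not values taken from this repository; Swift emits such packets itself when its statsd logging options are set.

```python
import socket


def statsd_timing_packet(name, value_ms):
    """Encode one StatsD timer sample as `<name>:<value>|ms`."""
    return f"{name}:{value_ms}|ms".encode("ascii")


def send_timing(name, value_ms, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send to a local statsd listener such as telegraf."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(statsd_timing_packet(name, value_ms), (host, port))


print(statsd_timing_packet("proxy-server.object.GET.200.timing", 42))
```

UDP keeps the Swift proxy from blocking on metrics: if no local listener is running, the datagram is simply lost.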

Change-Id: Ieb7df97fbc7534e34ebde5a5fe365ff479de81fe
2017-04-13 16:19:09 -05:00
Kevin Carter 33da8fc8eb Ensure the components are isolated from the system
This creates a specific slice from which all OpenStack services will operate.
By creating an independent slice, these components will be governed
separately from the system slice, allowing us to better optimise resource
consumption.

See the following for more information on slices:

* https://www.freedesktop.org/software/systemd/man/systemd.slice.html

See the following for more information on resource controls:

* https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html

Tools like ``systemd-cgtop`` and ``systemd-cgls`` will now give us
insight into specific processes, process groups, and resource consumption
in ways that we've not had access to before. To enable some of this reporting,
the accounting options have been added to the [Service] section of the unit
file.
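A minimal Python sketch that renders the kind of [Service] section described. Slice= and the *Accounting= options are real systemd directives, but the slice name openstack.slice, the ExecStart value, and the exact set of accounting options are assumptions, since the commit does not list them.

```python
def service_section(exec_start, slice_name="openstack.slice"):
    """Render a systemd [Service] section that places the unit in a
    dedicated slice and turns on per-unit resource accounting."""
    lines = [
        "[Service]",
        f"ExecStart={exec_start}",
        f"Slice={slice_name}",
        # Accounting lets tools like systemd-cgtop report usage per unit.
        "CPUAccounting=true",
        "MemoryAccounting=true",
        "TasksAccounting=true",
    ]
    return "\n".join(lines) + "\n"


print(service_section("/usr/local/bin/example-service"))
```

After a daemon-reload and restart, systemd-cgls should list such a unit under the slice's cgroup rather than under system.slice.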

Change-Id: Ife2e28ce6b3e0d0219b8a5ec2ca8d9dbe513d5a7
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2017-03-28 23:58:51 -05:00
Kevin Carter e786c69b15 Added Cinder storage pools data
This change adds the cinder storage pools data to the influx metric
collection system as a plugin.

Change-Id: I632b53aa09d69c6df28b86988629242a26ab9b50
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2017-01-27 17:38:28 +00:00
Nish Patwa f0b26e6301 Added kapacitor scripts
Added kapacitor tickscripts to trigger alerts based on certain
thresholds.
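Kapacitor tickscripts typically express such thresholds as lambda conditions on the .warn() and .crit() properties of an alert node. The Python sketch below only mirrors that decision logic; the alert_level helper and the 80/90 thresholds are hypothetical and are not taken from the added scripts.

```python
def alert_level(value, warn, crit):
    """Map a metric value to a Kapacitor-style alert level (higher is worse)."""
    if value >= crit:
        return "CRITICAL"
    if value >= warn:
        return "WARNING"
    return "OK"


# Hypothetical CPU usage thresholds, in percent.
for usage in (42.0, 85.0, 97.5):
    print(usage, alert_level(usage, warn=80.0, crit=90.0))
```

Checking the critical threshold first ensures a value above both thresholds is reported at the more severe level.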

Change-Id: I66d1b1e58d279405637d9a2f06b3aae19fa29cc3
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2017-01-26 22:10:45 +00:00
Kevin Carter 229739377d Added kvm_virsh to telegraf plugins
The KVM virsh plugin already existed; however, the setup was not using the
new playbook plugin system. This change moves the KVM virsh plugin into
that system and updates the plugin to use the influxdb line format
instead of the json format, which was recently deprecated.
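A minimal Python sketch of serialising a sample into the InfluxDB line protocol, `measurement[,tag=value...] field=value[,...] [timestamp]`, including the escaping rules for measurements, tags, and string field values. The kvm measurement, host tag, and field names are hypothetical and are not necessarily what the kvm_virsh plugin reports.

```python
def _escape(text, chars):
    """Backslash-escape each character in `chars`."""
    for char in chars:
        text = text.replace(char, "\\" + char)
    return text


def format_field(value):
    """Render one field value using line-protocol type rules."""
    if isinstance(value, bool):  # check bool first: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an `i` suffix
    if isinstance(value, float):
        return repr(value)
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'  # strings are double-quoted


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Build one line: measurement,tags fields [timestamp]."""
    line = _escape(measurement, ", ")
    for key in sorted(tags):  # InfluxDB recommends sorting tags by key
        line += f",{_escape(key, ',= ')}={_escape(str(tags[key]), ',= ')}"
    line += " " + ",".join(
        f"{_escape(key, ',= ')}={format_field(value)}"
        for key, value in fields.items()
    )
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


print(to_line_protocol("kvm", {"host": "compute1"}, {"domains": 3, "vcpus": 12}))
```

The printed line is `kvm,host=compute1 domains=3i,vcpus=12i`; omitting the timestamp lets the server assign one on receipt.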

Change-Id: Ib23a0a231044389aab5669dc0c467175cd220423
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2017-01-10 18:02:49 +00:00
Kevin Carter 5b93b9a2c2 Added nova quota plugin
This change adds a second plugin to the telegraf setup. The telegraf
config file is also changed to allow more than one external plugin to be
executed and to let every plugin finish executing between telegraf
reporting intervals.

Each plugin may account for up to 8 seconds of runtime, so the telegraf
agent now uses a dynamic reporting interval based on the number of plugins
a given agent needs to execute.
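A minimal Python sketch of a plugin-count-based interval, assuming the interval is simply the 8-second per-plugin budget multiplied by the number of plugins, with a hypothetical 10-second floor. The commit does not state the exact formula, so treat both the multiplication and the floor as assumptions.

```python
PER_PLUGIN_SECONDS = 8  # worst-case runtime per plugin, from the commit message


def reporting_interval(plugin_count, minimum=10):
    """Integer seconds long enough for every plugin to finish in one interval."""
    return max(minimum, PER_PLUGIN_SECONDS * plugin_count)


for count in (1, 2, 3):
    # telegraf expects a duration string such as "24s" for its interval
    print(count, f'interval = "{reporting_interval(count)}s"')
```

Returning an int (rather than a float or bool) keeps the rendered duration string well formed.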

Change-Id: I652e8e2f13bd4fb9135280b76f2344177a14eaf7
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2016-12-15 15:19:50 +00:00
Kevin Carter 76ad4f52da Add ironic to the metric collection plugins
Change-Id: Ia2e9f19b284ba48beeee8a5d0c4b2a0bd34dd798
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2016-12-15 14:48:11 +00:00
Jenkins 926dd95918 Merge "Adding influx relay to make the existing monitoring stack highly available" 2016-11-17 13:40:58 +00:00
Nish Patwa 17450f35f3 Adding influx relay to make the existing monitoring stack highly available
Added InfluxDB relay to make the existing monitoring stack highly
available. The relay replicates the data to multiple database instances.
Also added configuration in HAProxy that load balances read queries
to the InfluxDB instances and write queries to the InfluxDB relays.

        ┌─────────────────┐
        │writes & queries │
        └─────────────────┘
                 │
                 ▼
         ┌───────────────┐
         │               │
┌────────│ Load Balancer │─────────┐
│        │               │         │
│        └──────┬─┬──────┘         │
│               │ │                │
│               │ │                │
│        ┌──────┘ └────────┐       │
│        │ ┌─────────────┐ │       │┌──────┐
│        │ │/write or UDP│ │       ││/query│
│        ▼ └─────────────┘ ▼       │└──────┘
│  ┌──────────┐      ┌──────────┐  │
│  │ InfluxDB │      │ InfluxDB │  │
│  │ Relay    │      │ Relay    │  │
│  └──┬────┬──┘      └────┬──┬──┘  │
│     │    │              │  │     │
│     │  ┌─┼──────────────┘  │     │
│     │  │ └──────────────┐  │     │
│     ▼  ▼                ▼  ▼     │
│  ┌──────────┐      ┌──────────┐  │
│  │          │      │          │  │
└─▶│ InfluxDB │      │ InfluxDB │◀─┘
   │          │      │          │
   └──────────┘      └──────────┘
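A minimal Python sketch of the routing in the diagram above: the load balancer sends /write traffic to a relay and /query traffic to an InfluxDB instance, and each relay copies every write to all InfluxDB instances. The round-robin choice and the host names are assumptions; the commit does not say which balancing algorithm HAProxy uses. UDP writes, which the diagram also sends to the relays, are left out.

```python
from itertools import cycle


class RelayTopology:
    """Toy model of HAProxy + influxdb-relay + InfluxDB routing."""

    def __init__(self, relays, databases):
        self.relays = list(relays)
        self.databases = list(databases)
        self._next_relay = cycle(self.relays)
        self._next_database = cycle(self.databases)

    def route(self, path):
        """Backend the load balancer would pick for this request path."""
        if path == "/write":
            return next(self._next_relay)
        if path == "/query":
            return next(self._next_database)
        raise ValueError(f"unsupported path: {path}")

    def replicate(self, line):
        """A relay forwards each write to every InfluxDB instance."""
        return {database: line for database in self.databases}


topology = RelayTopology(["relay1", "relay2"], ["influx1", "influx2"])
print(topology.route("/write"), topology.route("/write"), topology.route("/query"))
print(topology.replicate("cpu usage=0.5"))
```

Because every relay writes to every database, any single InfluxDB instance can answer a query, which is what makes balancing reads across them safe.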

This patch depends on the following patch:
https://review.openstack.org/#/c/392328/

Change-Id: I05bdaa0e2fb251b48df1d26d09ad63942872293a
2016-11-10 15:54:56 +00:00
Jean-Philippe Evrard 03c48e2357 Use apt_repository update_cache feature
With Ansible 2.2, the apt_repository update_cache feature has
been fixed: when a new repo is added, apt-get update is run
after the addition if update_cache is set to yes.

Combined with the apt module now properly checking the cache
validity, this means we can now properly update the cache
with registering variables.

Change-Id: Ic9788156a88223dc0d27fafa2a798f396135f990
2016-11-04 16:51:31 +00:00
Joshua Hesketh 66b017144f Replace github with git.o.o
Change-Id: Id1ec52d14ecac9fd68b261c2be0e1dcdabbf7d81
2016-11-02 12:16:10 +11:00
Kevin Carter baf0553a36 Remove deprecated ansible_ssh_host variable
This changes 'ansible_ssh_host' to 'ansible_host'. The 'ansible_ssh_host'
variable has been deprecated as noted here: [0].

[0] - http://docs.ansible.com/ansible/intro_inventory.html#hosts-and-groups

Change-Id: Ie34bb924b55d4e1c7b4568c2eadd2a7a1a60a821
Related-Bug: #1636606
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2016-10-25 19:43:24 +00:00
Jenkins 763125e4b3 Merge "Added playbook to deploy Kapacitor" 2016-09-28 14:43:19 +00:00
Nish Patwa c1f7a5b2fb Added playbook to deploy Kapacitor
Added a playbook to deploy Kapacitor, an alerting tool that works
with influxdb. Updated the readme to demonstrate how to deploy Kapacitor.

Kapacitor can be used to trigger alerts based on unexpected
events. It subscribes to influxdb to collect data.

General Flow:
Telegraf -> InfluxDB -> Grafana
Telegraf -> InfluxDB -> Kapacitor

Change-Id: I5c400cf9efbda43bb5cb7a9bbd890435e74127f3
2016-09-20 22:03:56 +00:00
Kevin Carter 4da4287b14 Updated the grafana dash boards
Change-Id: Ic5b87e62d830f8d77d39af97bd17966b6bb1e038
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2016-09-18 10:17:22 -05:00
Nish Patwa 7f7262996f Updated the readme file to add missing commands.
Added commands to the readme file to point to the correct inventory
file and to run grafana on HAProxy.

Change-Id: I14756b8986738a558f63d497b617dbc40f4d977d
2016-09-13 15:52:45 +00:00
Kevin Carter 19255fd1a8 implement minimal metric collection
This change implements a metric collection system using influxdata
(influxdb and telegraf) with visualization using grafana. No
dashboard automation is provided at this time; however, a template
dashboard can be used by importing the JSON files from the
dashboards directory.

Change-Id: I5445b01170054393a31afc2a20ffb3ea4eda1209
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2016-09-09 13:08:38 -05:00