Currently the remove_config() function only removes an exact match for
the configured instance.
This change allows removing plugin configuration when the complete
configuration is not known.
Use case: a compute node has been removed, so any host_alive
ping checks that are configured for it should be removed. However, at
the time of removal the list of target_hostnames to match is
not known.
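A minimal sketch of partial matching, assuming each configured instance is a plain dict and an instance "matches" when every key/value pair supplied for removal is present in it. The helper names `matches` and `remove_matching` are hypothetical, and the field names in the demo are only modelled on a host_alive ping check; they are not monasca_setup's actual API.

```python
# Hypothetical helpers: these names are not monasca_setup's actual API.


def matches(instance, criteria):
    """Return True if every key/value pair in criteria appears in instance."""
    return all(key in instance and instance[key] == value
               for key, value in criteria.items())


def remove_matching(instances, criteria):
    """Split instances into (kept, removed) using partial-match criteria."""
    kept, removed = [], []
    for instance in instances:
        if matches(instance, criteria):
            removed.append(instance)
        else:
            kept.append(instance)
    return kept, removed


if __name__ == "__main__":
    # Illustrative field names, modelled on a host_alive ping check.
    instances = [
        {"host_name": "compute1", "alive_test": "ping",
         "target_hostname": "compute1-mgmt"},
        {"host_name": "compute2", "alive_test": "ping",
         "target_hostname": "compute2-mgmt"},
    ]
    # target_hostname is unknown at removal time, so it is left out.
    kept, removed = remove_matching(
        instances, {"host_name": "compute1", "alive_test": "ping"})
    print(len(kept), len(removed))
```

Keys that are not known at removal time are simply left out of the criteria, so they cannot prevent a match.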
Change-Id: I8050e1eed68d7b64f7a968b061afa69fe2e86d72
Story: 2004539
Task: 28287
This patch removes the code that copies /sbin/ip to
/usr/bin/monasca-agent-ip. /sbin/ip does not work when it is copied to
a new name that is longer than 2 characters. The error is:
./monasca-agent-ip a
Object "nasca-agent-ip" is unknown, try "ip help".
As this does not work on RHEL, SLES, or Ubuntu, this code
should be removed.
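The error above is consistent with ip deriving an object name from its own program name. The sketch below is a simplified Python model, assuming that ip skips the first two characters of the basename of argv[0] whenever that name is longer than two characters (so "iplink" would mean "link"). It models the observed behaviour only and is not the real iproute2 code; `object_from_program_name` is a hypothetical name.

```python
import os


def object_from_program_name(argv0):
    """Model which ip object would be inferred from the program name.

    Returns None for a plain "ip" name; otherwise the name with its
    first two characters dropped, whatever those characters are.
    """
    name = os.path.basename(argv0)
    if len(name) > 2:
        return name[2:]
    return None
```

Under this model, "./monasca-agent-ip" yields "nasca-agent-ip", which is exactly the unknown object in the error message.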
Change-Id: I439be00070eb1cf16416325f23a86fc7cd518acc
Story: 2001593
Task: 6543
This is useful, for example, when monitoring slab memory leaks. Gathering
this metric requires psutil 5.4.4 or later (released on Apr 13th 2018).
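A minimal sketch of gating the metric on the psutil version, assuming the slab value is exposed as a "slab" field of `psutil.virtual_memory()` on Linux from 5.4.4 onwards. The function names and the dict-based interface are hypothetical; the dict would come from the namedtuple's `_asdict()`.

```python
import re

# Assumption: the "slab" field of psutil.virtual_memory() is Linux-only
# and first appears in psutil 5.4.4.
MIN_PSUTIL_VERSION = (5, 4, 4)


def version_tuple(version):
    """Convert a version string such as "5.4.4" into a tuple of ints."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


def slab_bytes(vmem_fields, psutil_version):
    """Return slab memory in bytes, or None if it cannot be reported.

    vmem_fields is a dict, e.g. psutil.virtual_memory()._asdict().
    """
    if version_tuple(psutil_version) < MIN_PSUTIL_VERSION:
        return None
    return vmem_fields.get("slab")


# Possible usage with psutil installed (not executed here):
#   import psutil
#   slab_bytes(psutil.virtual_memory()._asdict(), psutil.__version__)
```

Comparing integer tuples rather than strings keeps "5.10.0" correctly above "5.4.4".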
Story: 2006815
Task: 37375
Change-Id: Ibe8def9e2a7c967a34236889aa03b287065abcdc
Since the Luminous release of Ceph, the plugin no longer exports metrics
such as object storage daemon stats, placement groups and pool stats.
Check for the installed version of the Ceph command and parse results
according to version.
Include test data for Jewel and Luminous Ceph clusters.
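A minimal sketch of the version check, assuming `ceph --version` prints a line starting with "ceph version X.Y.Z" and that Luminous is major release 12 (Jewel is 10). The function names are hypothetical; how the plugin actually dispatches to its per-version parsers is not shown here.

```python
import re

# Ceph major release numbers: Jewel is 10.x, Luminous is 12.x.
LUMINOUS_MAJOR = 12

VERSION_RE = re.compile(r"ceph version (\d+)\.(\d+)\.(\d+)")


def ceph_major_version(version_output):
    """Extract the major version from `ceph --version` output."""
    match = VERSION_RE.search(version_output)
    if match is None:
        raise ValueError("unrecognised ceph version output: %r"
                         % version_output)
    return int(match.group(1))


def is_luminous_or_later(version_output):
    """True if the output needs the Luminous-style result parsing."""
    return ceph_major_version(version_output) >= LUMINOUS_MAJOR
```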
Story: 2005032
Task: 29515
Change-Id: I0aef0db25f49545c715b07880edd57135e3beafe
Co-Authored-By: Bharat Kunwar <bharat@stackhpc.com>
Co-Authored-By: Doug Szumski <doug@stackhpc.com>
Currently we have no way to monitor internal TLS/SSL certificates,
i.e. the SSL certificates used by MySQL for replication, RabbitMQ for
distribution, etc. The cert_check plugin is not adequate for this purpose
because it can only check certificates on HTTPS endpoints. Furthermore,
checking these internal certificates over the network is cumbersome
because the agent plugin would have to speak service-specific protocols.
This patch adds a cert_file_check plugin to detect the certificate expiry
(in days from now) for a given X.509 certificate file in PEM format.
Like the cert_check plugin, this plugin emits a metric,
'cert_file.cert_expire_days', which contains the number of days from now
until the given certificate expires. If the certificate has already expired,
this is a negative number.
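The expiry arithmetic can be sketched as follows, assuming the certificate's notAfter timestamp has already been read from the PEM file as a timezone-aware UTC datetime (the parsing step is out of scope here). `cert_expire_days` is a hypothetical name, and returning fractional days is a design choice of this sketch.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 24 * 60 * 60


def cert_expire_days(not_after, now=None):
    """Days from now until not_after; negative once the cert has expired."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (not_after - now).total_seconds() / SECONDS_PER_DAY
```

Passing `now` explicitly keeps the function deterministic for unit tests.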
Change-Id: Id95cc7115823f972e234417223ab5906b57447cc
Story: 2006753
A powerful metric to watch for a Swift cluster is the
number of handoff partitions on a drive on a storage node.
A build-up of handoff partitions on a particular server could
indicate a disk problem or a bottleneck somewhere in the cluster.
Better still, it can show when would be a good time to rebalance
the ring (as you'd want to do it when existing backend data
movement is at a minimum).
So it turns out to be a great visualisation of the health of
a cluster.
That's what this check plugin does. Each instance check takes
the following values:
ring: <path to a Swift ring file>
devices: <path to the directory of mountpoints>
granularity: <either server or device>
To determine primary vs handoff partitions on a drive,
the Swift ring needs to be consulted. If a storage node stores
more than one ring, an instance should be defined for each.
You give swift a bunch of disks. These disks are placed in what
swift calls the 'devices' location. That is a directory where a
mount point for each mounted swift drive is located.
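A minimal sketch of the primary vs handoff classification, assuming Swift's usual on-disk layout of `<devices>/<device>/<datadir>/<partition>` with numeric partition directories, and representing the ring as a callable that returns the set of (ip, device) pairs that are primary for a partition. All names here are hypothetical.

```python
import os


def local_partitions(devices_dir, device, datadir="objects"):
    """Return partition numbers found under <devices_dir>/<device>/<datadir>.

    Assumes each partition is stored as a numeric directory name.
    """
    path = os.path.join(devices_dir, device, datadir)
    try:
        entries = os.listdir(path)
    except FileNotFoundError:
        return []
    return sorted(int(name) for name in entries if name.isdigit())


def classify_partitions(partitions, primary_nodes_for, local_node):
    """Count how many partitions are primary vs handoff for local_node.

    primary_nodes_for: callable mapping a partition number to the set of
    (ip, device) pairs the ring lists as primary for it.
    local_node: the (ip, device) pair of the drive being inspected.
    """
    primary = handoff = 0
    for part in partitions:
        if local_node in primary_nodes_for(part):
            primary += 1
        else:
            handoff += 1
    return primary, handoff
```

With the Swift module installed, `primary_nodes_for` could plausibly wrap the ring's `get_part_nodes(part)`, collecting each returned device's "ip" and "device" values; treat that wiring as an assumption rather than the plugin's actual code.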
Finally, you can decide on the granularity, which defaults to
`server` if not defined. Only 2 metrics are created from this
check:
swift.partitions.primary_count
swift.partitions.handoff_count
In addition to the hostname dimension, a ring dimension is also set,
allowing handoff vs primary partitions to be graphed for each ring.
When the granularity is set to `device`, an additional
dimension is added to the metrics: the device name (the name of
the device's mount point). This allows graphing and monitoring
each device in a server if a finer granularity is required.
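The effect of the granularity setting on the emitted metrics can be sketched as below, assuming per-device (primary, handoff) counts are already known and that the agent adds the hostname dimension itself. The dimension keys "ring" and "device" and the tuple output format are assumptions of this sketch.

```python
def build_metrics(ring_name, per_device_counts, granularity="server"):
    """Build (name, value, dimensions) tuples for one ring.

    per_device_counts: dict mapping device name -> (primary, handoff).
    """
    if granularity == "server":
        # Sum every device on the server into one pair of metrics.
        primary = sum(p for p, _ in per_device_counts.values())
        handoff = sum(h for _, h in per_device_counts.values())
        dims = {"ring": ring_name}
        return [("swift.partitions.primary_count", primary, dims),
                ("swift.partitions.handoff_count", handoff, dict(dims))]
    if granularity == "device":
        metrics = []
        for device, (primary, handoff) in sorted(per_device_counts.items()):
            dims = {"ring": ring_name, "device": device}
            metrics.append(("swift.partitions.primary_count", primary, dims))
            metrics.append(("swift.partitions.handoff_count", handoff,
                            dict(dims)))
        return metrics
    raise ValueError("granularity must be 'server' or 'device'")
```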
Because the Swift ring needs to be consulted, the Python Swift module
must be installed at runtime, but it isn't required for the unit
tests. Making it a runtime dependency means that, when the check is
loaded, it logs an error and then exits if it can't import the swift
module.
This is the second of two Swift check plugins I've been working on.
For more details see my blog post[1]
[1] - https://oliver.net.au/?p=358
Change-Id: Ie91add9af39f2ab0e5b575390c0c6355563c0bfc
Swift outputs a lot of statsd metrics that you can point directly
at monasca-agents. However, there is another Swift endpoint,
recon, that can be used to gather more metrics.
The Swift recon (or reconnaissance) API is an endpoint that each of the
storage node servers makes available via a REST API. This API can
either be queried manually or via the swift-recon tool.
This patch adds a check plugin that queries the recon REST API
and sends the metrics to Monasca.
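A minimal sketch of querying recon and turning its JSON into metrics, assuming endpoints of the form `http://<host>:<port>/recon/<name>` returning JSON. The port, the metric prefix, and the rule of keeping only numeric leaves (dropping strings, lists, and booleans) are assumptions of this sketch, not the plugin's behaviour.

```python
import json
from urllib.request import urlopen


def recon_url(host, port, endpoint):
    """Build a recon URL such as http://host:port/recon/load."""
    return "http://%s:%d/recon/%s" % (host, port, endpoint)


def fetch_recon(host, port, endpoint, timeout=5):
    """GET a recon endpoint and decode its JSON body."""
    with urlopen(recon_url(host, port, endpoint), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def flatten(prefix, data):
    """Flatten nested JSON into {metric_name: number}.

    Non-numeric leaves (strings, lists, booleans, None) are dropped.
    """
    metrics = {}
    if isinstance(data, bool):
        return metrics
    if isinstance(data, (int, float)):
        metrics[prefix] = data
    elif isinstance(data, dict):
        for key, value in data.items():
            metrics.update(flatten("%s.%s" % (prefix, key), value))
    return metrics
```

For example, `flatten("swift.recon.load", fetch_recon("127.0.0.1", 6200, "load"))` would produce one metric per numeric field, under the assumptions above.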
This is the first of two Swift check plugins I'm working on.
For more details see my blog post[1]
[1] - https://oliver.net.au/?p=358
Change-Id: I503d74936f6f37fb261c1592845968319695475a
Prometheus plugin support is emphasized in README.rst. The formatting of
the first page is also updated to improve readability. The Prometheus section
in Plugins.md is fixed so it is correctly referenced from the table of contents.
Story: 2005625
Task: 30878
Change-Id: Icbf305435d1bacdeabd1654af5e14b58a3248282
This adds support for setting the statsd metrics aggregation interval
as part of Monasca setup. Setting this interval is useful for users
calculating rates from statsd metrics.
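Why the interval matters for rates: a statsd counter is summed over one flush interval, so the per-second rate is that total divided by the interval length. A minimal sketch, assuming counter increments arrive as (name, value) pairs; `flush_counters` is a hypothetical name, and the setup option that sets the interval is not shown.

```python
def flush_counters(increments, interval_seconds):
    """Aggregate counter increments for one flush interval.

    Returns {name: (total, per_second_rate)}.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    totals = {}
    for name, value in increments:
        totals[name] = totals.get(name, 0) + value
    return {name: (total, total / interval_seconds)
            for name, total in totals.items()}
```

The same total reported over a 10 second and a 60 second interval gives rates that differ by a factor of six, so users need to know which interval was configured.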
Story: 2005063
Task: 29607
Change-Id: I22f5f1700c438245fd7e98deb40d706358349b6c
Currently this paragraph is not very readable because it contains many
variable names, which are sometimes written in double quotes and sometimes
without. Mark these variables as inline code (with backticks) to make the
paragraph more readable.
Change-Id: I6c06d50404e570db8e791da89a0f95f98597798e
Just some improvements to the formatting to help make the markdown
more readable. Did not try to address correctness of any of the
content.
Change-Id: I323ae1a942ca48e63416421407345b32aa2da121
To support Python 3 in the near future, the following was done:
* Removed the dependency on supervisor.
* Added a template configuration for a systemd target that includes all
services.
* Added a template configuration for a systemd service for every single
service.
* Changed monasca_setup to use the new templates.
In the meantime, the code was formatted to comply with pep8 settings, and
some other small changes were made to comply with pycodestyle and
pydocstring.
Task: 4126
Story: 2000975
Depends-On: https://review.openstack.org/#/c/566475/
Change-Id: I0d0c4ea41a830581d6b9f247fad6a2dda1f96cbe
Users can now configure a single default Keystone for all
services in `http_check.yml` instead of providing it separately for
every instance.
Setting the Keystone config in every instance is marked as deprecated.
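A minimal sketch of the default-plus-override behaviour, assuming the shared Keystone settings live in the check's `init_config` and that any per-instance value still wins but triggers a deprecation warning. The key names in `KEYSTONE_KEYS` and the function name are hypothetical, not the real `http_check.yml` schema.

```python
import warnings

# Hypothetical key names; the real http_check.yml keys may differ.
KEYSTONE_KEYS = ("keystone_url", "username", "password", "project_name")


def resolve_keystone(init_config, instance):
    """Merge default Keystone settings with per-instance overrides."""
    config = {k: init_config[k] for k in KEYSTONE_KEYS if k in init_config}
    overridden = [k for k in KEYSTONE_KEYS if k in instance]
    for key in overridden:
        config[key] = instance[key]
    if overridden:
        # Per-instance settings still work but are deprecated.
        warnings.warn("per-instance Keystone settings are deprecated: %s"
                      % ", ".join(overridden), DeprecationWarning)
    return config
```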
Story: 2001843
Task: 12610
Change-Id: If52b52efab6cc14a7df583b1dc2596b04e6813bc
The LXC plugin throws an exception when trying to collect CPU metrics. This
patch fixes that (tests are passing) and adds a swap collector.
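One way a swap collector could read container swap usage is sketched below, assuming a cgroup v1 memory controller whose `memory.stat` file contains a `swap <bytes>` line when swap accounting is enabled. This is an illustration under that assumption, not the plugin's actual implementation; the function names are hypothetical.

```python
def parse_memory_stat(text):
    """Parse cgroup memory.stat text ("key value" per line) into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def swap_bytes(memory_stat_text):
    """Return swap usage in bytes, or None if swap is not accounted."""
    return parse_memory_stat(memory_stat_text).get("swap")
```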
Change-Id: I3b12ac6ce199006bc1e024d2b2626657519e4f0b
Story: 2001563
Task: 6507
This is disastrous for the rest of the system when
run outside a venv, as it overwrites the system ip,
which then loses its capabilities to run for everyone else.
Story: 2001593
Task: 6542
Change-Id: Ie0b7ef25b0f2cf6aca61adda4de5767ac2300cae
Adds a detection plugin to monitor the Cassandra service, including
the process and the data directory (passed through the args).
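A minimal sketch of how a detection plugin might turn its args into process and directory checks, assuming the args arrive as space-separated `key=value` pairs. The `directory_names` key, the default data directory, and the shape of the returned config are all hypothetical; only the idea of matching the Cassandra process by its `CassandraDaemon` main class is taken as given.

```python
def parse_detection_args(arg_string):
    """Parse space-separated key=value pairs; tokens without '=' are ignored."""
    args = {}
    for token in arg_string.split():
        key, sep, value = token.partition("=")
        if sep:
            args[key] = value
    return args


def cassandra_checks(args, default_dir="/var/lib/cassandra/data"):
    """Build hypothetical process and directory check configs."""
    directory = args.get("directory_names", default_dir)
    return {
        "process": {"name": "cassandra",
                    "search_string": ["CassandraDaemon"]},
        "directory": {"directory": directory},
    }
```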
Change-Id: Ic2c20bc878527f607c0eb871e98a79c1521c0507
Story: 2001499
Task: 6289
Improvements for Monasca self-monitoring.
Add metrics:
- number of cores
- memory used (percentage)
- file system used (percentage)
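The three values above can be sketched with the standard library alone, assuming "used" memory means total minus available and file system usage is taken from `shutil.disk_usage`. The metric names in the returned dict are hypothetical.

```python
import os
import shutil


def percent_used(used, total):
    """Return used/total as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def memory_used_percent(total, available):
    """Memory used (%) given total and available bytes."""
    return percent_used(total - available, total)


def self_monitoring_sample(path="/"):
    """Collect core count and file system usage for one mount point."""
    usage = shutil.disk_usage(path)
    return {
        # os.cpu_count() may return None if the count is undeterminable.
        "cpu.num_cores": os.cpu_count(),
        "fs.used_perc": percent_used(usage.used, usage.total),
    }
```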
Story: 2001407
Task: 6099
Change-Id: I11dd367543b6c17b9935aa4826345dd5df721445
Add a configurable maximum batch size for measurements written
to the Monasca API. This prevents killing the Monasca API in memory-limited
configurations. The default is no limit.
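A minimal sketch of the batching, assuming measurements are collected in a list before being sent and that a missing or zero limit means "send everything in one batch". `batches` is a hypothetical name, not the agent's actual function.

```python
def batches(measurements, max_batch_size=None):
    """Yield lists of at most max_batch_size measurements.

    None or 0 means no limit: everything is sent in a single batch.
    """
    measurements = list(measurements)
    if not max_batch_size:
        if measurements:
            yield measurements
        return
    for start in range(0, len(measurements), max_batch_size):
        yield measurements[start:start + max_batch_size]
```

Each yielded batch would be posted separately, so the API never has to hold more than `max_batch_size` measurements from one request in memory.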
Change-Id: I2bf84501cc51c24843d7c3befd8f9dd42f010f0c
Story: 2001434