openstacksdk statsd records for API operations are now response code
specific. This change adds a glob to the statsd record path so that our
API graphs include metrics for all response codes. I believe the new
record layout came with the openstacksdk 0.103.0 update. We also
update the paths for servers POST, server details, and flavor details as
they have changed.
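As a rough illustration of what the glob does, the sketch below matches
dotted metric names against a pattern one path node at a time. The metric
names and the position of the response code node are assumptions made up
for this example; the real layout should be checked in graphite.

```python
from fnmatch import fnmatchcase


def graphite_match(pattern: str, name: str) -> bool:
    """Match a dotted metric name against a pattern, one path node at a time."""
    pattern_nodes = pattern.split(".")
    name_nodes = name.split(".")
    if len(pattern_nodes) != len(name_nodes):
        return False
    return all(fnmatchcase(n, p) for p, n in zip(pattern_nodes, name_nodes))


# Hypothetical metric names; the real layout should be checked in graphite.
metrics = [
    "stats.timers.example.compute.GET.servers.200.mean",
    "stats.timers.example.compute.GET.servers.404.mean",
    "stats.timers.example.compute.POST.servers.202.mean",
]

# The "*" node stands in for the response code.
pattern = "stats.timers.example.compute.GET.servers.*.mean"
matched = [m for m in metrics if graphite_match(pattern, m)]
print(matched)
```

With the glob in place, one graph target picks up both the 200 and the 404
series for the same operation instead of only a single fixed path.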
Note the network info is empty, which is why we don't get graphs for
it, but the paths appear correct. I think this may be because we don't
need to query network info in any of our clouds currently.
While we are at it we stop updating the airship and inap cloud graphs
since those should be cleaned up and this keeps the review overhead
smaller.
Change-Id: I5a6b80118afaf3b7782a1d1c131787f208583799
I think I generally messed these up on the original import, as every
stat seems to refer to the same thing. Over time, the layout of
openstacksdk stats has changed, meaning these graphs no longer work at
all. Use stats that are actually in graphite, which should show the
overall health of API requests.
Change-Id: I6bd82b38d80db2b56a399f80132a723564f9bc40
These resources were kindly donated by OSU OSL (https://osuosl.org/).
We have about 15 ARM64 nodes; we're sticking with just the regular
"os.large" instances.
(Re)generate the nodepool graphs to account for this.
The mirror site is active.
Depends-On: https://review.opendev.org/c/opendev/system-config/+/786155
Change-Id: I8bc34beabd130d4a8bb004b0e029ec96945a95df
grafyaml knows this is deprecated, but it is worse than that: it no
longer refreshes the variable at all. "1" means "on load", which is
what we want.
Change-Id: I34ecdd30c2188cb7e6ec32e33c6a6e99b6240934
The templating we end up with in the running grafana for the OVH regions
on the OVH dashboard is null. We set our OpenStack datasource to be our
default datasource but maybe we need to set it explicitly. Do this to
see if it changes the behavior.
Change-Id: Ie95dd980a5c117e1849b08a3611330ff06987c34
The minor updates are apparently due to us not having run the script
the last time it was updated with new urls.
Change-Id: I255d1e47b5cff29a3ed377b65ceab677ab1c272e
All of these dashboards are the same, and have mostly copied the same
issues from one another. This makes updating anything a massive pain.
This implements a single dashboard template with a small script to
create individual dashboards for each provider and its regions.
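A minimal sketch of the template-plus-script idea follows. It is not the
script from this change; the template fields, the provider table, and the
output file naming are all hypothetical and only show the general shape.

```python
from string import Template

# Hypothetical, heavily trimmed dashboard template.
DASHBOARD_TEMPLATE = Template(
    "dashboard:\n"
    "  title: 'Nodepool: $title'\n"
    "  regions: $regions\n"
)

# Hypothetical provider table: short name -> (display title, regions).
PROVIDERS = {
    "examplecloud": ("Example Cloud", ["region-a", "region-b"]),
}


def render_all(providers):
    """Return a mapping of output file name to rendered dashboard text."""
    rendered = {}
    for name, (title, regions) in providers.items():
        rendered[f"nodepool-{name}.yaml"] = DASHBOARD_TEMPLATE.substitute(
            title=title, regions=", ".join(regions)
        )
    return rendered


rendered = render_all(PROVIDERS)
print(rendered["nodepool-examplecloud.yaml"])
```

Keeping the layout in one template means a fix such as a y-axis format
change is made once and then regenerated for every provider.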
I have included a range of fixes. The y-axis format has changed in
later versions of grafana. The API time tracking is no longer scaled,
but we just tell grafana it is in ms and it displays it correctly.
The test nodes history graph is moved to the top, as it is probably
the most interesting graph (note this splits itself out per region if
multiple regions are selected). Values for "null as zero" are
consistently set. Various formatting fixes for the labels are
included.
Change-Id: I5fbffaec3c82aa1fce0947f771de67edd15f7dfc
These stats aren't updating any more. Unfortunately, I don't think
there's any current replacement as nodepool doesn't have any insight
into the job it is satisfying a request for.
Change-Id: Ib69fbda5ee019180cd8761d0ead474b426bce379
Since we now query a cloud for its quota information, let's track the
response rate in grafana.
Change-Id: Ie9e2727b5dc3d18f5e5fc37be89a9a5f9492eb47
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
Following the update to Zuul v3 some things changed:
- nodes.delete became nodes.deleting
- nodes.used became nodes.in-use, but nodes.used is still relevant
  as it's the status between 'in-use' and 'deleting'
- Add a panel for displaying failed nodes
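The renamed states above could be turned into graph targets roughly as
follows. The metric prefix is hypothetical, and using "failed" as the
metric node for the new panel is an assumption, not something this
message confirms.

```python
# States named in the list above; "failed" as a metric node is an assumption.
NODE_STATES = ["in-use", "used", "deleting", "failed"]


def node_targets(prefix):
    """Build one target per node state under a hypothetical prefix."""
    return [f"{prefix}.nodes.{state}" for state in NODE_STATES]


targets = node_targets("stats.gauges.example")
for target in targets:
    print(target)
```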
Change-Id: I240d082115bd9078e45984d8fcff212a4e40e842
Depends-On: I6a89752d74ed7424267c3af3937ad01fb4bb8f86
Now that nodepool has been switched to use shade, we need to update
grafana to use the new shade syntax for Server related tasks.
Change-Id: I7698d54d89bda5327ac434fd8e662f0fe58d7f5e
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
I missed some tweaks on the previous Test Nodes graph change.
Also make the job runtimes wider like Paul suggested.
Change-Id: I5ac43909a679d273a557112ad8526a68de15f4f1
Add axis labels and units where appropriate.
Change the launch attempts graphs to summarize to 1m rather than
1h since grafana lets us zoom in. 1m is the lowest native unit
of time that will always show whole numbers for this metric (whose
lowest non-zero value is 1 event / 10 seconds).
Change the test nodes graph to stacked to match the way we normally
draw this graph, but change the tooltip to 'individual' so that
when hovering, individual values for the different states are
displayed, rather than cumulative (which does not make sense for
this application).
Also change the tooltip for the node graphs on the zuul dashboard
in the same manner.
Change-Id: I500aa486362476cff76a3d254093723f27021bed
Depends-On: Ie542dc4d0e151a00e84cc970c2cfa8c02377d7bf
These are per-region versions of the nodepool node state graph,
except that the values are not stacked in order to make the
individual values more accessible.
Change-Id: I8ec90758828484a9ffb7a90d2eacbcccc8b78bb4
There is no .error metric, but rather, errors are broken out by
cause. For this graph, simply display their sum.
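To make the "display their sum" step concrete, here is a small sketch
that adds per-cause error series point by point. The cause names and
values are invented for illustration, and treating missing points as
zero is an assumption of this sketch. In a dashboard the same effect
would more likely come from a Graphite function such as sumSeries over
a wildcard path.

```python
def sum_series(series_by_cause):
    """Sum several equally long series point by point, treating None as 0."""
    return [
        sum(value or 0 for value in point)
        for point in zip(*series_by_cause.values())
    ]


# Hypothetical error causes and datapoints.
errors = {
    "timeout": [1, 0, None],
    "quota": [0, 2, 1],
}

total = sum_series(errors)
print(total)
```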
Change-Id: Iae19e4e78098f3373c3195ff3ec52a11c5e92a3b