The zuul job queue is no longer stored in geard so we can safely remove
this graph, which has been empty for a long time. Additionally, we remove
the geard queue graph for logstash as we are beginning the process of
shutting down those services.
We update the span size of the remaining graphs to make them render a
bit more nicely without the old graphs in place.
Change-Id: I1a690cc90279547b9766c6043db6dbbe3e66deb9
We have replaced zk01-03 with zk04-06. We are less interested in the
data for 01-03 on the dashboard now so limit what the dashboard renders
to 04-06.
Change-Id: I5404ba40035f259e234c79529388d8af01fdc0ba
This adds the event queue processing time to the zuul dashboard,
which shows the effect of ZK on event processing. It also adds
several ZooKeeper server performance metrics.
Change-Id: I56196b781e8f7950c3db40647f84d7e8bc6c499d
The zuul executors are ze01-ze12.opendev.org now and not
ze01-ze12.openstack.org. We need to update the zuul status grafana
dashboard to only select the opendev servers and render them properly.
Change-Id: Iff6f311f7ba5bcfb9d23f2d289f071b791a788e8
Since node requests are something that we generally want to see
reduced to 0, it can be misleading to show a graph where the lowest
point on the y axis is, for example (as I write this), nearly 4,000.
Fix the minimum to zero so it's easier to see what the overall trend
toward zero is.
Change-Id: Iad1b5667fd6d1d4bfa9fe50706ad71debd01d5c6
Make it clear that these values are percentages. We update labels to
convey that and scale the values so that we get 71% instead of 7.145.
Change-Id: I64bb9cfa536c2ba395be0264839a7a50929d7477
Start tracking HDD usage for zuul-executors, as they now have the
ability to stop / start builds.
Change-Id: Ibf891deadcaa8e5d323992e9626765341ca5c44a
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
The scale argument was incorrectly outside the function for the load
average graph. Also, we seem to have acquired some extra ")" characters
somehow; remove them.
Change-Id: I0a6a90ed30fac11aedc889cfdcfd4dbb3f815b5a
We're back to using zeXX_openstack_org in the metric names; this
translates them to zeXX in the legend.
Change-Id: I39af625885c3cd179555b3d3143a2391e49a7c81
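The legend translation described above can be illustrated with a small
Python sketch. It only shows the string rewrite itself, not the query the
dashboard actually uses; the function name `legend_name` and the regular
expression are assumptions made for illustration.

```python
import re

# Assumed pattern: an executor metric segment such as "ze01_openstack_org".
_EXECUTOR_SEGMENT = re.compile(r"(ze\d+)_openstack_org")


def legend_name(metric_segment: str) -> str:
    """Return the short executor name (e.g. "ze01") for a metric segment.

    Segments that do not match the assumed pattern are returned unchanged.
    """
    return _EXECUTOR_SEGMENT.sub(r"\1", metric_segment)


print(legend_name("ze01_openstack_org"))  # ze01
```

Capturing the `zeXX` prefix and dropping the rest keeps the legend short
without hard-coding the list of executors.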
This adds starting builds (builds which have not yet run their
first playbook), and percent available memory. Both of these are
used to control whether new jobs are accepted.
Change-Id: Ia0312d13c739da1d19983c8678f0198b0d8ca314
Due to differing heights, the current layout wastes a lot of space
at lower screen widths. This rearranges the graphs so that the
taller ones are on the bottom and therefore tile better.
Change-Id: I062cd1d96d236c564b15ca96acb72e7f2e49f012
There is a clear correlation between the Executor load and the
number of Running builds. By re-ordering graphs, we're able to get
them to show one directly above the other, making it easier to
spot trends in the data.
When they're in different columns, it's harder to see the
correlation.
Change-Id: I7706c3293fbd702695fb3e4e917e33ca947beef9
This metric is no longer sent in newer versions of geard. If we
aren't running it now, we will be soon. Go ahead and drop it
from the graphs.
Change-Id: I2ee92f0673b28704a6e28b400554cd5a2c9642cd
We have a new metric to track if executors are accepting builds or
not.
Change-Id: Icd8671c026e2ed93b0acff536df795b3fa030539
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
Update zuul-status to use zuulv3 statsd information.
Change-Id: Ida83ad181d30acaec33fa39f7ef353ef99e232eb
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
This is the correct key under zuulv3. A bug (fix in progress) still
had some reports occasionally being emitted under the old key (on
reconfiguration events).
Change-Id: I5e2e65b1c9831f2f60e7916e26a3792d86ec25ae
Remove the tempest run count which doesn't provide a lot of insight.
Add graphs for two new metrics emitted by executors.
Change-Id: I23c563f1aa155fed341155a5f887c95448a17a07
The ordering is based on what we have in zuul/layout.yaml.
Change-Id: Idcb46c6c3a630bf032192642aa733a545ab03eea
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
This graph approximately answers the question "How many runs would
a third-party CI system be expected to handle for the integrated
gate?"
Change-Id: Iae74306ce3c922be3d82d61ca86e724f7e048dff
Add axis labels and units where appropriate.
Change the launch attempts graphs to summarize to 1m rather than
1h since grafana lets us zoom in. 1m is the lowest native unit
of time that will always show whole numbers for this metric (whose
lowest non-zero value is 1 event / 10 seconds).
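As a rough illustration of summarizing to 1m, the Python sketch below sums
10-second samples into one-minute buckets. It assumes each sample is a
whole event count for its 10-second interval; the function name
`summarize_sum` and the sample data are hypothetical, and this is not how
the dashboard itself computes the series.

```python
from collections import defaultdict


def summarize_sum(samples, bucket_seconds=60):
    """Sum (timestamp, value) samples into fixed-width time buckets.

    Returns a sorted list of (bucket_start, total) pairs.
    """
    buckets = defaultdict(int)
    for timestamp, value in samples:
        # Align each sample to the start of the bucket that contains it.
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start] += value
    return sorted(buckets.items())


# Hypothetical data: one launch attempt in every 10-second sample
# over two minutes.
samples = [(t, 1) for t in range(0, 120, 10)]
print(summarize_sum(samples))  # [(0, 6), (60, 6)]
```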
Change the test nodes graph to stacked to match the way we normally
draw this graph, but change the tooltip to 'individual' so that
when hovering, individual values for the different states are
displayed, rather than cumulative (which does not make sense for
this application).
Also change the tooltip for the node graphs on the zuul dashboard
in the same manner.
Change-Id: I500aa486362476cff76a3d254093723f27021bed
Depends-On: Ie542dc4d0e151a00e84cc970c2cfa8c02377d7bf
Right now, the values we display are averages, which is confusing
to people. By setting valueName to current, we'll actually display the
current count for singlestats.
Change-Id: Icb1a62fb8b289165679ceec16e7d65dab98bf602
Depends-On: I4df8d130fce45cf58b01808997fc561cf8c4b42d
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
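To show why an average can confuse readers of a single-value panel, here
is a minimal Python sketch comparing the mean of a series with its most
recent value. The helper names and the sample data are hypothetical, and
the sketch does not reproduce Grafana's own implementation.

```python
def average(values):
    """Mean of all samples in the displayed time range."""
    return sum(values) / len(values)


def current(values):
    """Most recent sample in the displayed time range."""
    return values[-1]


# Hypothetical queue lengths sampled over the dashboard's time range.
queue_lengths = [0, 0, 12, 3]

print(average(queue_lengths))  # 3.75
print(current(queue_lengths))  # 3
```

A fractional average such as 3.75 is hard to read as a count, whereas the
current value is a number that actually occurred.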
Trivial change that updates the names of the rows for the zuul status
dashboard. This is also to ensure our gate is working correctly.
Change-Id: I68d8f40bee4ee9230d5abc3a4391eb66b2188d93
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
This is our first grafyaml dashboard, which reproduces the current
graphs on status.o.o/zuul rendered by graphite. While we aren't
actually running grafyaml upstream yet, we can get started on building
our dashboards.
We also added a tox job to properly gate on the configuration.
Change-Id: Ia738bcb510e146ab38566f0c13ff483ec618a6ed
Depends-On: I16b9affd4402fe5d1637238a2e27f22fdd3986ff
Signed-off-by: Paul Belanger <pabelanger@redhat.com>