The exception above [1] occurs, for example [2], when Elasticsearch returns
more than one zuul_executor as a list.
This is what line 58 is able to sort:
[(12.5, '5'), (12.5, '4'), (12.5, '3'), (25.0, '6'), (18.75, '2'), (6.25, '13'), (12.5, '1')]
This is what makes it throw the exception:
[(8.13953488372093, 'ze06.opendev.org'),
(12.790697674418604, 'ze10.opendev.org'),
(5.813953488372093, 'ze05.opendev.org'),
(8.13953488372093, 'ze01.opendev.org'),
(16.27906976744186, 'ze04.opendev.org'),
(4.651162790697675, 'ze03.opendev.org'),
(3.488372093023256, 'ze02.opendev.org'),
(4.651162790697675, 'ze08.opendev.org'),
(12.790697674418604, 'ze09.opendev.org'),
(20.930232558139537, 'ze12.opendev.org'),
(1.1627906976744187, 'ze11.opendev.org'),
(1.1627906976744187, ['ze12.opendev.org', 'ze11.opendev.org'])]
[1] https://0050cb9fd8118437e3e0-3c2a18acb5109e625907972e3aa6a592.ssl.cf5.rackcdn.com/790065/7/check/openstack-tox-py38/4968a73/tox/test_results/1449136.yaml.log
[2] https://review.opendev.org/c/openstack/tripleo-ci-health-queries/+/787569/6/output/elastic-recheck/1449136.yaml
Change-Id: Ie559d5764d9f68420119a7f9608389f0745a9c02
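A minimal reproduction of the failure: the last two entries in the failing list share the value 1.1627906976744187, so Python 3's tuple comparison falls through to the second elements, a str and a list, and raises TypeError. The normalize_executor() and sort_executors() helpers below are hypothetical and only illustrate one possible fix (joining list values into a single string before sorting); they are not necessarily what this change does.

```python
def normalize_executor(value):
    """Hypothetical helper: collapse a zuul_executor value that may be a
    list of hostnames into a single comma-separated string."""
    if isinstance(value, list):
        return ",".join(value)
    return value


def sort_executors(rows):
    """Sort (percentage, executor) tuples after normalizing the executor."""
    return sorted((pct, normalize_executor(name)) for pct, name in rows)


rows = [
    (1.1627906976744187, "ze11.opendev.org"),
    (1.1627906976744187, ["ze12.opendev.org", "ze11.opendev.org"]),
]

try:
    # The percentages tie, so Python ends up comparing a list with a str.
    sorted(rows)
except TypeError as exc:
    print("raw sort failed:", exc)

print(sort_executors(rows))
```

Sorting on the percentage alone, e.g. sorted(rows, key=lambda row: row[0]), would also avoid the error, because the executor values are then never compared.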
Jobs are failing with the following error in n-cpu:
Guest refused to detach volume <uuid>:
nova.exception.DeviceDetachFailed: Device detach failed for vdb:
Unable to detach the device from the live config.
At this time, this query has:
20 hits in the last 7 days, check and gate, all failures
Related-Bug: #1882521
Change-Id: Ib92b679f2d1dbd8131f58c8bb85fc2a3f65dbfb5
Created by simply installing the package in a local venv.
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Change-Id: Ie27eb9fca09b2b204e5bc95076fc5ada8128944f
This should make the wrappers obsolete, as all file-writing
operations are now atomic, ensuring that we either write the entire
file or fail.
That is important, as we do not want to end up serving partial files
from the web server.
Change-Id: I696e2474b557e6b5fea707a198f32cea721cc150
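One common way to get this all-or-nothing behavior, sketched here as an assumption rather than as the code in this change, is to write to a temporary file in the target directory and then rename it over the destination. os.replace() is atomic on POSIX when both paths are on the same filesystem, so the web server sees either the old file or the complete new one. The atomic_write() helper and its 0o644 default mode are hypothetical.

```python
import os
import tempfile


def atomic_write(path, data, mode=0o644):
    """Hypothetical helper: replace the file at path with data so that
    readers see either the old contents or the new ones, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives next to the target so os.replace() stays
    # within one filesystem, which is what makes the rename atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        # mkstemp() creates owner-only files; assume the web server
        # needs them to be world-readable.
        os.chmod(tmp_path, mode)
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the partial temp file and re-raise the original error.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A caller would simply use atomic_write(target_path, payload) wherever it previously opened the target file for writing.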
Using a single test run orchestrator makes it easier to maintain
the project, especially as pytest is actively maintained.
Change-Id: I5c843984bc0a1b9264e744373e6a3fd9d43e99cd
Refactors configuration loading in order to simplify it and to
allow overriding defaults using environment variables.
This behavior is similar to other tools like pip or ansible, which
can load any configurable option from the environment.
This step eases migration towards containerized use, where we do not
want to keep any secrets inside containers and may want to avoid
volume mounts, especially when testing.
Change-Id: I0d3a9f19b0ba8d1604d0ca63db01296a3219fb47
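Overriding defaults from the environment can be sketched as a lookup that checks an environment variable before falling back to the built-in default. The ER_ prefix, the option names, and the load_config() function below are hypothetical and are not taken from this change.

```python
import os

# Hypothetical defaults; the real option names may differ.
DEFAULTS = {
    "es_url": "http://localhost:9200",
    "db_password": "",
}


def load_config(defaults=DEFAULTS, environ=None, prefix="ER_"):
    """Return a config dict where <PREFIX><OPTION> environment variables
    (e.g. ER_DB_PASSWORD) override the built-in defaults."""
    if environ is None:
        environ = os.environ
    config = {}
    for key, default in defaults.items():
        env_name = prefix + key.upper()
        config[key] = environ.get(env_name, default)
    return config


print(load_config(environ={"ER_DB_PASSWORD": "s3cret"}))
```

Accepting environ as a parameter keeps the function easy to test; inside a container the secret would arrive through the real process environment rather than a mounted file.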
Replaces a static implementation that received the password with a
member function that can make use of the config object.
Change-Id: If9617b6db73eb49c5193f098d45e357a267529dd
- Part of replacing the Puppet deployment with Ansible and docker-compose
- This change tests container building
Change-Id: Id70f156e63751ebd14908cc5da969e964f63645b
Story: TRIPLEOCI-177
Switches query testing to pytest, which provides the following:
- a test generator for each query (parametrize)
- the ability to run a single query test
- an HTML report with test results, making failures easier to
investigate
- parallel execution
- a minor bugfix for an issue that prevented the queries from running
with py38, as the config parser accepts only strings (None is invalid).
Change-Id: I982c694a5160a9ecfd117d177d30b911cfe53425
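The py38 problem is consistent with a change in Python 3.8, where the ConfigParser constructor started processing defaults through read_dict(), so each value is validated and None raises TypeError unless allow_no_value=True. A minimal sketch of one workaround follows, converting None to an empty string before building the parser; the make_parser() helper and the option names are hypothetical, and the actual fix in this change may differ.

```python
import configparser


def make_parser(defaults):
    """Hypothetical helper: build a ConfigParser from a defaults dict that
    may contain None, which Python 3.8+ rejects in the constructor."""
    clean = {key: "" if value is None else str(value)
             for key, value in defaults.items()}
    return configparser.ConfigParser(defaults=clean)


defaults = {"db_user": None, "db_port": 3306}

try:
    configparser.ConfigParser(defaults=defaults)
except TypeError as exc:
    print("rejected:", exc)

parser = make_parser(defaults)
print(parser.defaults())
```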
Apparently, under Linux, pip can become fully blocked due to the
presence of keyring. That is a known open bug, so we apply a
workaround until it is fixed.
Change-Id: I30a5ec1b04b57a5604cb3caa3bc56ea2476e89ba
- Drop py27, as it is out of support
- Enable py38 testing, as it is already the default Python on several distros
- Remove six as a dependency, as it is no longer needed for pure py3
Change-Id: I1e825073abc6cd55aa2fdc363358f2701152c57b
- Upgraded hacking (flake8)
- Added a more modern tox linters environment (with a pep8 alias)
- Temporarily added skips for newer rules that currently fail
- Fixed a few basic rule violations
- Moved the flake8 config to setup.cfg (keeping it in tox.ini is not recommended)
Change-Id: I75b3ce5d2ce965a9dc5bdfaa49b2aacd8f0195ad
The previous query for 1708704 was causing e-r to OOM, resulting in a
lack of graph data. This is because tripleo's logstash.txt file is not
getting parsed properly and ends up with the entire file as a single
event. e-r then downloads all of those file copies, fills its memory and
breaks. We can work around this by only looking in job-output.txt files
for this bug. Once tripleo's fix has flushed the bad events out (10
days after the fix merges), we can revert this change.
Change-Id: Id619f90ffe84b3d4de334ea4b17026b9b3239d33
The JSON file outputs of e-r are loaded by web browsers in order to
render our graphs. These JSON files are quite large, partly because we
pretty-print them with 4-space indents and they are deeply nested.
Stop pretty printing (humans can pass the files through a filter if
necessary) to reduce the size of these files and make browsers happier
(less time spent downloading).
Change-Id: I19dedc2994169932eb0e90b6cdea3856637f5ef0
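The size difference is easy to see with Python's json module. The sample data below is made up and only loosely mimics nested graph data; indent=4 corresponds to the pretty printing being dropped. To read a compact file, a filter such as python -m json.tool restores the indentation.

```python
import json

# Made-up nested data, loosely shaped like per-bug graph series.
data = {
    "buglist": [
        {"number": str(n), "data": [{"data": [[i, i * 2] for i in range(5)]}]}
        for n in range(3)
    ]
}

pretty = json.dumps(data, indent=4)                    # pretty printed
compact = json.dumps(data)                             # default separators
smallest = json.dumps(data, separators=(",", ":"))     # no spaces at all

print(len(pretty), len(compact), len(smallest))
```

All three strings decode to the same object; passing separators=(",", ":") trims the remaining spaces as well, at no cost to parsers.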
After the flot update, the xaxis labels aren't meaningful to humans (in
fact, I'm not quite sure what they were showing us). We can explicitly
state that the input is in milliseconds and set the label render format.
Doing this gives us labels that are meaningful to humans.
Change-Id: I7912a536f3de2756404f8c7e7f31d8bd5890ab22
Getting elasticsearch data for bug 1708704 is failing
in the check queue with:
pyelasticsearch.exceptions.ElasticHttpError: \
(500, 'ArrayIndexOutOfBoundsException[null]')
This might have to do with the size of the messages returned by the
hits on the tripleo and kolla jobs, but I'm not sure.
What is clear, though, is that graph generation is blowing up in the
check queue on that bug but not in the gate queue, maybe due to a
smaller result set there. This adds error handling to graph generation
for when a specific bug query fails, so that it does not halt the
entire graph build.
Change-Id: Ibe18c9cccc421a6549a18148f1a2ce3c1e4339d4
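The error handling described above amounts to catching the failure per bug, logging it, and moving on to the next bug. A minimal sketch, assuming hypothetical build_graphs() and fetch_bug_data names; the error reported above is pyelasticsearch's ElasticHttpError, replaced here by a stand-in QueryError class so the sketch runs without that library.

```python
import logging

LOG = logging.getLogger("graph")


class QueryError(Exception):
    """Stand-in for pyelasticsearch.exceptions.ElasticHttpError."""


def build_graphs(bug_ids, fetch_bug_data):
    """Build graph data for each bug, skipping bugs whose query fails
    instead of aborting the whole build."""
    graphs = {}
    for bug in bug_ids:
        try:
            graphs[bug] = fetch_bug_data(bug)
        except QueryError:
            LOG.exception("Failed to get data for bug %s, skipping", bug)
    return graphs


def fake_fetch(bug):
    # Fake data source: simulate the 500 error seen for bug 1708704.
    if bug == "1708704":
        raise QueryError("(500, 'ArrayIndexOutOfBoundsException[null]')")
    return {"hits": 3}


print(build_graphs(["1449136", "1708704", "1882521"], fake_fetch))
```

Catching only the query error, rather than every exception, keeps genuine bugs in the graph code visible.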
45 hits in the last 7 days, with a spike since Dec 16, mostly on
fortnebula nodes, all failures, check and gate.
Change-Id: Ic856d11e183075244de556b86d4ecdc7bcc78abd
This also shows up in multi-node jobs when attaching a volume,
so update the query to include volume attach failures.
Change-Id: Ie77e3998b2ff4a508fa3b6078acb3ceec15d7e37
211 hits in 7 days, check and gate, all failures.
The message shows up in the n-api logs, but filtering on n-api logs
isn't sufficient to get a 100% failure rate in logstash, because some
tempest tests, like
test_create_list_show_delete_interfaces_by_network_port, handle the
error and work around it. Even though the message shows up, not all
jobs fail because of it, so we use the tempest failure log in the
console to fingerprint this bug.
Change-Id: I7b4a3f4a483c5166e9aee1507f12bb31069a8fe0
This is split off from bug 1848078, since this test specifically
seems to hit this issue a lot in multinode jobs, where the instance
is shelved on one node and unshelved on another.
14 hits in 7 days, all failures.
Change-Id: I9bd41e356abf72ff08415693b7b2b11a035a542d