It removes the existing TQE collect-logs role in order to
test with the new opendev/ansible-role-collect-logs role.
https://tree.taiga.io/project/tripleo-ci-board/task/1001
Change-Id: Ib7892fca145a8c1947f54bfa8f7a35675e625e4d
Signed-off-by: Chandan Kumar <chkumar@redhat.com>
The reproducer script assumes the job was started by zuul. In some instances
we still run jobs from platforms other than zuul, such as ci.centos. In
those instances log collection fails because the reproducer script creation
is missing a lot of files.
This patch includes that role only when the zuul variable is
defined.
Change-Id: I01ec134f85943a8332832bdc6caf138d978c1c37
ARA reports are generated automatically in infra when
they are in the ara-report directory. Don't generate HTML in upstream
jobs, to save space.
Change-Id: I7aa81ae3b06878baeab471e340477ca8a20f8594
The standalone deployment log is not indexed by logstash, which
makes it impossible to investigate errors related to it.
Change-Id: Ia70bf9850fab5a3b92332b4885e895ed98653596
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
Sometimes a container or service does not start, and this
doesn't make the CI fail. Until now, the failed containers
were listed in the /var/log/extras/ tree, but it was not
checked on a regular basis.
This patch makes it a hard failure if either
a service or a container doesn't start as expected.
Co-Authored-By: Cédric Jeanneret <cjeanner@redhat.com>
Related-Bug: #1816523
Change-Id: I001e2f27d2b562bb0be87c8eaadcf3622e530498
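
The check described above could be sketched roughly as follows. This is
only an illustration in Python, not the role's actual implementation: the
function name failed_containers and the tab-separated "name, status" input
layout are assumptions made for the example.

```python
import re

# Matches a status such as "Exited (1) 2 hours ago" and captures the exit code.
EXITED = re.compile(r"^Exited \((\d+)\)")


def failed_containers(ps_lines):
    """Return (name, exit_code) for every container that exited non-zero.

    `ps_lines` is assumed to hold "name<TAB>status" strings; this layout is
    hypothetical and chosen only for the example.
    """
    failed = []
    for line in ps_lines:
        name, _, status = line.partition("\t")
        match = EXITED.match(status.strip())
        if match and int(match.group(1)) != 0:
            failed.append((name, int(match.group(1))))
    return failed
```

A caller would then fail the job whenever the returned list is not empty,
instead of only writing it to a file that nobody reads.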
Implement container_cp() function for podman to be more in line with
"docker cp" where DEST is treated as a normal file.
The lack of -T option caused the container logs to be copied twice,
once in containers/<name>/log and a second time in
containers/<name>/log/log.
Change-Id: Ia6e20e29f8fe55330538f46b2b91d09f0129f43f
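
The double copy described above comes from how a copy treats an existing
destination directory. A minimal Python sketch of the two behaviours,
assuming hypothetical helper names copy_into and copy_as (these are not
part of the patch, which works with cp inside a shell function):

```python
import shutil
from pathlib import Path


def copy_into(src: Path, dest: Path) -> Path:
    """Mimic `cp -r SRC DEST`: if DEST is an existing directory,
    SRC ends up nested inside it as DEST/<basename of SRC>."""
    target = dest / src.name if dest.is_dir() else dest
    shutil.copytree(src, target, dirs_exist_ok=True)
    return target


def copy_as(src: Path, dest: Path) -> Path:
    """Mimic `cp -rT SRC DEST`: DEST itself is the target, so the
    contents of SRC land directly in DEST even if it already exists."""
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest
```

With an existing containers/<name>/log directory, copy_into yields
containers/<name>/log/log, while copy_as keeps everything in
containers/<name>/log.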
This is mandatory if we want to know why a
container is in "Exited (non-Zero)" state. Until now, we only
got the stdout of the currently running containers, which is, in
fact, of little interest.
Change-Id: I94917543a9d058c911d04e083d7e9cd32335eb44
Until now, we had no output for the iptables rules applied to IPv6.
It is useful to collect them, especially since the standard behavior
is to try IPv6 first if a name has an associated IPv6 address, even
for localhost.
This patch also reformats the output in the files a bit, adding some
headers for better readability.
Change-Id: I62e5c03fa38f5c4c266fbf27bd4f1ec0f3bf0633
Uses the new location of openstack-virtual-baremetal, as it was
imported into the openstack organization earlier today.
Change-Id: I6f6904b67052a97d8ebc5e4f3d766efb93fec7a5
Avoids repeated output of HTML content on the console and
marks the task as not changed in ansible.
A grep failure would be easy to investigate as the file is collected.
Change-Id: I7520e8f3b5c01f39affeac398aeaeffe6dfdb6cf
Partial-Bug: #1787912
zuul_variables and zuul_console were added when there
was an investigation into simulating zuul behavior
for the reproducer. We are no longer following that
workflow for the reproducer and therefore these
failing lines can be removed.
Change-Id: I5f057ce78273ddf8cd6381c9a420b317713379b6
Collecting the package list from containers is failing with the
error:
unexpected EOF while looking for matching `)'
The issue is caused by https://review.openstack.org/#/c/610491/
Apart from fixing the issue, this patch also:
- Redirects stderr to the container info file as well
- Uses a subshell instead of a bash array for readability
- Uses 'set -x' to print commands instead of echo "+ $cmd"
- Removes extra blank lines from the container ALL_INFO file
Change-Id: Iff347eeed47c64af14bcd181d104c94612663802
Story: https://tree.taiga.io/project/tripleo-ci-board/task/377
Upgrades yamllint to the latest version and adopts its strict
checking.
Fixes all known problems reported by yamllint so we don't have to do
that while touching these files.
Change-Id: I4bdc520d9e2aff086c4b463718bc1e053261a4f5
Story: https://tree.taiga.io/project/tripleo-ci-board/task/381
Getting HAProxy stats about its backends and health will help when
we have issues with timeouts and the like.
Closes-Bug: #1803716
Change-Id: Ic787f4ac32bf53c4409d8fd058a976fbb552cb94
Podman doesn't implement `cp', and its `top' subcommand does not
work like docker - it doesn't take `ps' standard options.
Change-Id: Id185477d842f9f10ca18efc4aad94ceedb94a53a
* Start running flake8 and ansible-lint via pre-commit
* Bumped hacking version to last release and fixed new errors
Change-Id: Iefe8794abba70660559fcb8cba12dc0b41737882
Story: https://tree.taiga.io/project/tripleo-ci-board/task/381
* The file being generated is /home/zuul/tempest.log, not
tempest-output.log, which is why it could not be indexed
in logstash.
* The tempest_log_file var is used twice in the validate-tempest role
and tempest.log is used in each place, which also means that
tempest_output.log was never found in the CI logs.
Related-Bug: #1802971
Change-Id: I9bb9f8bdd0a17d2a1481356caaf186ed6348f6ba
Makes those files conformant with current linting rules and avoids
linting errors when we need to touch them again.
Previously, running "pre-commit run -a" uncovered these errors; it
no longer reports any errors.
Change-Id: Ie4cf229c8f11c2b55b323eac23c89483b26d3781
Looks like we were using yum in the fedora28 job in build-test-packages;
we need to generalize the code in build-test-packages so it works with
fedora too.
Also, install-build-repo was trying to use yum.
Change-Id: I8cea39a9923e23c5f0fceb895a1efe4cb8ec395d
Story: https://tree.taiga.io/project/tripleo-ci-board/task/319?kanban-status=1447275
This uses the zuul-variables file for zuul.projects, and the json console
log written by the Depends-On change, for the playbooks. This is tracked by
the ci team with [1]. These are made available to, but not yet used in, the
reproducer (that will come later, see the story linked in [1]).
[1] https://tree.taiga.io/project/tripleo-ci-board/task/270
Depends-On: https://review.openstack.org/615191
Co-Authored-By: Ronelle Landy <rlandy@redhat.com>
Change-Id: I64923cfd75e697b98507be1ff398c14654108ddf
Use the atop[1] tool to monitor the whole job process.
Atop generates binary output that can be downloaded
and then investigated locally.
Using atop -r /path/to/atop.bin you can read the file;
press "t" to move 10 seconds further, or press "b" to
jump to a specific time in the job and see what
happened on the host at that time. This allows tracking
all resources at a specific time.
It also allows tracking containers separately.
For more info you can visit the site[1].
If atop installation fails it shouldn't fail the job,
so ignore_errors is added.
Currently it's for the undercloud in OVB and all nodes in
multinode.
[1] https://www.atoptool.nl/
Change-Id: I7e17db3e376218f620a18db7ea7ca82d7578f618
Depends-On: Ibcdcfb4d8c5c94e1a06c7e635b0b6778ad318094
- Fix the warning in the output by setting no_log for the sensitive
variable.
[WARNING]: Module did not set no_log for influxdb_password
- Fix the deprecation of "static" by using 'import_tasks' for static
inclusions or 'include_tasks' for dynamic inclusions.
Change-Id: I774d59b0d1bf5324c5a8b7c95a06f07299478e6a
We want to ensure we get all the logs we want, from both docker
and podman engines, even if we get some mixed environment for
some reason.
Change-Id: I00e2f9b7755b7e32b7ed20b482d851aacb17464e
Playbooks, roles and logs for the overcloud deployment
are now in /var/lib/mistral. These files should be
captured by default.
Change-Id: I00f7de1d1f6a4ac1c8785b92c6edef10c95bc6cd
ara_overcloud_db_path was undefined in the collect-logs role and
undercloud data was collected twice. Ansible doesn't alert
about an undefined variable if it's in "environment".
Closes-Bug: #1794238
Change-Id: I1d982a129337188a884e366cdc56a07637107e4b