
207 Commits

Author SHA1 Message Date
Chandan Kumar 920762abf4 Remove existing TQE collect-logs to test new role
It removes the existing TQE collect-logs role in order to
test with the new opendev/ansible-role-collect-logs role.

https://tree.taiga.io/project/tripleo-ci-board/task/1001

Change-Id: Ib7892fca145a8c1947f54bfa8f7a35675e625e4d
Signed-off-by: Chandan Kumar <chkumar@redhat.com>
2019-05-02 15:08:26 +00:00
Sagi Shnaidman 89668ff83b Collect services statuses in logs
Change-Id: I5fbcec3a587cdb76d486e05951940bfc3fa61b76
2019-03-21 19:18:03 +02:00
Arx Cruz 12d708db90 Add testrepository.subunit file to root dir
The os_tempest role copies these files to a different directory than the
default one for tempest, since they don't use it to submit the results
to openstack health.

https://tree.taiga.io/project/tripleo-ci-board/task/832?kanban-status=1447275

Change-Id: I3e70d31ea3c6e1ea966b812703cdcdbb293dc722
2019-03-12 13:55:55 +01:00
Zuul 0d841863be Merge "Reproducer script: run only on job started by zuul" 2019-03-02 06:20:58 +00:00
Gabriele Cerami e38421592b Reproducer script: run only on job started by zuul
The reproducer script assumes that a job was started by zuul. In some instances
we still run jobs from platforms other than zuul, such as ci.centos. In
those instances the log collection fails because the reproducer script creation
misses a lot of files.
This patch includes that role only when the zuul variable is
defined.

Change-Id: I01ec134f85943a8332832bdc6caf138d978c1c37
2019-03-01 14:51:12 +00:00
Sagi Shnaidman 0ad99eb267 Use infra generation of ARA reports
ARA reports are generated automatically in infra when
they are in the ara-report directory. Don't generate HTML in upstream
jobs, to save space.

Change-Id: I7aa81ae3b06878baeab471e340477ca8a20f8594
2019-02-28 13:54:20 +00:00
Zuul 5d927f0c97 Merge "Raise an error if a service or container is failed" 2019-02-27 13:45:07 +00:00
Sagi Shnaidman 8a6b1ae251 Rename errors file if it's big
When renamed, it's not sent to logstash and not indexed, which
prevents logstash OOM failures.
Change-Id: I6389f6700a401f494187a283ca8fbc3b69784541
2019-02-25 21:04:00 +02:00
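
The size check described in 8a6b1ae251 might look roughly like the following Python sketch. The threshold, the `.big` suffix, and the function names are assumptions made for illustration; the commit message does not state the actual values or how the role performs the rename.

```python
import os
import tempfile

# Hypothetical threshold and suffix: the commit does not state the real values.
MAX_BYTES = 1024 * 1024
RENAMED_SUFFIX = ".big"


def rename_if_big(path, max_bytes=MAX_BYTES, suffix=RENAMED_SUFFIX):
    """Rename path when it is larger than max_bytes; return the resulting path."""
    if os.path.getsize(path) <= max_bytes:
        return path
    new_path = path + suffix
    os.replace(path, new_path)
    return new_path


def demo():
    """Create one small and one oversized file and apply the check to both."""
    with tempfile.TemporaryDirectory() as tmp:
        small = os.path.join(tmp, "errors.txt")
        big = os.path.join(tmp, "errors-big.txt")
        with open(small, "w") as handle:
            handle.write("one error\n")
        with open(big, "w") as handle:
            handle.write("x" * 2048)
        kept = rename_if_big(small, max_bytes=1024)
        moved = rename_if_big(big, max_bytes=1024)
        return os.path.basename(kept), os.path.basename(moved), os.path.exists(big)
```

An indexer that matches files by exact name would then skip the renamed file, which is the effect the commit message describes.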
Bogdan Dobrelya 95f6287755 Include standalone deploy logs for logstash index
The standalone deployment log is not indexed by logstash, which
makes it impossible to investigate errors related to it.

Change-Id: Ia70bf9850fab5a3b92332b4885e895ed98653596
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
2019-02-25 13:33:54 +01:00
Chandan Kumar 74983421a1 Raise an error if a service or container is failed
Sometimes a container or service does not start, and this
doesn't make the CI fail. Until now, the failed containers
have been listed in the /var/log/extras/ tree, but it's not
checked on a regular basis.

This patch intends to make a hard failure in case either
a service or a container doesn't start as expected.

Co-Authored-By: Cédric Jeanneret <cjeanner@redhat.com>
Related-Bug: #1816523
Change-Id: I001e2f27d2b562bb0be87c8eaadcf3622e530498
2019-02-25 11:39:36 +01:00
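
As a rough illustration of the hard failure introduced by 74983421a1, the Python sketch below scans container status lines and raises when any container exited with a non-zero code. The tab-separated input format, the sample container names, and the function names are assumptions for the example; the role's real check is not shown in the commit message.

```python
import re

# Matches a status such as "Exited (1) 3 minutes ago" and captures the exit code.
EXITED = re.compile(r"^Exited \((\d+)\)")


def failed_containers(ps_lines):
    """Return names of containers whose status shows a non-zero exit code.

    Each line is assumed to be "<name><TAB><status>" (hypothetical format).
    """
    failed = []
    for line in ps_lines:
        name, _, status = line.partition("\t")
        match = EXITED.match(status.strip())
        if match and int(match.group(1)) != 0:
            failed.append(name)
    return failed


def check(ps_lines):
    """Raise RuntimeError when any container failed, instead of only listing it."""
    failed = failed_containers(ps_lines)
    if failed:
        raise RuntimeError("failed containers: " + ", ".join(failed))


# Hypothetical sample input: one running, one cleanly exited, one failed.
sample = [
    "nova_api\tUp 5 minutes",
    "keystone_db_sync\tExited (0) 4 minutes ago",
    "neutron_api\tExited (1) 3 minutes ago",
]
```

Calling `check(sample)` raises because `neutron_api` exited with code 1, while a container that exited with code 0 is not treated as a failure.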
Martin André 87c3d5c390 Treat dest as normal file when copying out of podman containers
Implement container_cp() function for podman to be more in line with
"docker cp" where DEST is treated as a normal file.

The lack of -T option caused the container logs to be copied twice,
once in containers/<name>/log and a second time in
containers/<name>/log/log.

Change-Id: Ia6e20e29f8fe55330538f46b2b91d09f0129f43f
2019-02-19 10:49:52 +01:00
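
The duplicated log/log directory mentioned in 87c3d5c390 follows from how a cp-style copy resolves its destination. The Python sketch below only models that path resolution, assuming GNU cp semantics for -T (--no-target-directory); it is not the role's container_cp() function, and the function name and the paths are hypothetical.

```python
from pathlib import PurePosixPath


def copy_target(src, dest, dest_is_existing_dir, no_target_directory):
    """Return the path where src ends up for a cp-like copy."""
    src_path = PurePosixPath(src)
    dest_path = PurePosixPath(dest)
    if dest_is_existing_dir and not no_target_directory:
        # Without -T, an existing directory DEST receives SRC as a child.
        return dest_path / src_path.name
    # With -T (or when DEST does not exist), DEST itself is the target.
    return dest_path


# Hypothetical paths, chosen to mirror the layout named in the commit message.
nested = copy_target("/var/log", "containers/nova/log", True, False)
flat = copy_target("/var/log", "containers/nova/log", True, True)
```

Here `nested` resolves to containers/nova/log/log, the duplicate location from the commit message, and `flat` resolves to containers/nova/log.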
Ronelle Landy 891a714889 Design new role for zuul-based reproducer
This role creates a minimal bash script
to launch the reproducer via zuul/nodepool
and friends with the same variables as used
in the CI job.

Depends-On: https://review.rdoproject.org/r/#/c/18664/
Depends-On: https://review.openstack.org/#/c/635550
Story: https://tree.taiga.io/project/tripleo-ci-board/task/607
Change-Id: I9cdfa3e9d710257cdcd15979dcf0c65222ff3ac6
2019-02-08 11:39:09 -05:00
Cédric Jeanneret eff3808538 Loop over *all* containers in order to get the container STDOUT
This is mandatory if we want to know why a
container is in "Exited (non-Zero)" state. Until now, we got
only the stdout of the currently running containers, which is, in
fact, of little interest.

Change-Id: I94917543a9d058c911d04e083d7e9cd32335eb44
2019-02-04 11:21:09 +01:00
Cédric Jeanneret 689fef43af Add output for ip6tables and related ipv6 things
Until now, we haven't had any output for iptables rules applied to v6.
It can be interesting to get them, especially since standard behavior
is to try ipv6 first if a name has an associated v6 address. Even for localhost.

This patch also reformats the output in the files a bit, adding some
headers for better readability.

Change-Id: I62e5c03fa38f5c4c266fbf27bd4f1ec0f3bf0633
2019-01-31 07:49:30 +01:00
Chandan Kumar df2401d766 Use os_tempest for running tempest on standalone
It depends on the ansible-config_template and python_venv_build
roles.[1.]

Make tempest_cidr cacheable so that it can be consumed in the
os_tempest role. Because os_tempest does not depend on extra-commons,
an extra task was added to make the variable usable.

[1.] https://tree.taiga.io/project/tripleo-ci-board/us/234

https://tree.taiga.io/project/tripleo-ci-board/us/554

Change-Id: I5eb7fb64411220bc198ebae15f866693eadc3a4d
2019-01-23 16:30:57 +05:30
Sorin Sbarnea 28b46708ef Corrected openstack-virtual-baremetal repo location
Uses the new location of openstack-virtual-baremetal as it was
imported to openstack organization earlier today.

Change-Id: I6f6904b67052a97d8ebc5e4f3d766efb93fec7a5
2019-01-15 16:01:53 +00:00
Sorin Sbarnea 55f3bfecac Improve output of Verify Sphinx build task
Avoids repeated output of HTML content on the console and
marks the task as not changed in ansible.

A grep failure would be easy to investigate as the file is collected.

Change-Id: I7520e8f3b5c01f39affeac398aeaeffe6dfdb6cf
Partial-Bug: #1787912
2019-01-02 16:42:18 +00:00
Ronelle Landy c399e3a842 Remove reproducer lines added to get zuul related info
zuul_variables and zuul_console were added when there
was an investigation into simulating zuul behavior
for the reproducer. We are no longer following that
workflow for the reproducer and therefore these
failing lines can be removed.

Change-Id: I5f057ce78273ddf8cd6381c9a420b317713379b6
2018-12-19 16:54:19 -05:00
Zuul e25a4ca4d5 Merge "Fix regression of collect-logs package listing" 2018-12-05 19:01:11 +00:00
Sorin Sbarnea a5580e7a98 Fix regression of collect-logs package listing
Collecting the package list from the container fails with the
error:
    unexpected EOF while looking for matching `)'

Issue is caused by https://review.openstack.org/#/c/610491/

This patch also does the following, apart from fixing the issue:
- Redirect stderr to the container info file as well
- Use a subshell instead of a bash array to improve readability
- Use 'set -x' to print commands instead of echo "+ $cmd"
- Remove extra blank lines from the container ALL_INFO file

Change-Id: Iff347eeed47c64af14bcd181d104c94612663802
Story: https://tree.taiga.io/project/tripleo-ci-board/task/377
2018-12-04 16:56:03 +00:00
Sorin Sbarnea 72141b7fab Adopt yamllint strict linting
Upgrades yamllint to the latest version and adopts its strict
checking.

Fix all known problems reported by yamllint so we don't have to do
that while touching these files.

Change-Id: I4bdc520d9e2aff086c4b463718bc1e053261a4f5
Story: https://tree.taiga.io/project/tripleo-ci-board/task/381
2018-11-26 12:37:21 +00:00
Zuul 3be1290957 Merge "Correct some commands from the "Collect container info" tasks" 2018-11-23 18:56:01 +00:00
Zuul baa798b0d7 Merge "Allow to collect HAProxy stats and log them in a file" 2018-11-22 22:11:22 +00:00
Zuul c5156af3e6 Merge "Migrate flake8 to pre-commit" 2018-11-22 21:48:07 +00:00
Cédric Jeanneret 0aee2f5e44 Allow to collect HAProxy stats and log them in a file
Getting HAProxy stats about its backends and health will help when
we have issues with timeouts and the like.

Closes-Bug: #1803716
Change-Id: Ic787f4ac32bf53c4409d8fd058a976fbb552cb94
2018-11-21 08:15:30 +01:00
Cédric Jeanneret 54ec8d226c Correct some commands from the "Collect container info" tasks
Podman doesn't implement `cp', and its `top' subcommand does not
work like docker - it doesn't take `ps' standard options.

Change-Id: Id185477d842f9f10ca18efc4aad94ceedb94a53a
2018-11-21 08:15:01 +01:00
Zuul 7f6a3ff733 Merge "Fixed logstash file name for tempest" 2018-11-21 00:59:58 +00:00
Sorin Sbarnea cf56a554db Migrate flake8 to pre-commit
* Start running flake8 and ansible-lint via pre-commit
* Bumped hacking version to last release and fixed new errors

Change-Id: Iefe8794abba70660559fcb8cba12dc0b41737882
Story: https://tree.taiga.io/project/tripleo-ci-board/task/381
2018-11-20 15:38:35 +00:00
Cédric Jeanneret 25760f71ba Correct buggy typo
Change-Id: I6e898ba9339bbce980a4681ec79803461b4aa88c
2018-11-20 07:51:44 +01:00
Zuul 7fcb5c2801 Merge "Adapt code to newer code style (linters)" 2018-11-20 06:34:00 +00:00
Chandan Kumar 5102788c23 Fixed logstash file name for tempest
* The file being generated is /home/zuul/tempest.log,
  not tempest-output.log, which is why it cannot be
  indexed in logstash.
* The tempest_log_file var is used twice in the validate-tempest role,
  and tempest.log is used in each place, which also means that
  tempest_output.log was never found in ci logs.

Related-Bug:#1802971

Change-Id: I9bb9f8bdd0a17d2a1481356caaf186ed6348f6ba
2018-11-19 11:49:38 +00:00
Cédric Jeanneret 1647bf5bc6 ensure we get podman container logs
Change-Id: Ied69b2e73bc6b60bd0279684aab34bea75c7695e
2018-11-19 11:35:48 +01:00
Sorin Sbarnea cc82349363 Adapt code to newer code style (linters)
Makes those files conform to current linting rules and avoids
linting errors when we need to touch them again.

Previously, running "pre-commit run -a" uncovered these errors; it
no longer reports any errors.

Change-Id: Ie4cf229c8f11c2b55b323eac23c89483b26d3781
2018-11-16 12:42:51 +00:00
Zuul 712b4da542 Merge "remove older and slower portions of collect-logs" 2018-11-16 09:16:27 +00:00
Cédric Jeanneret 2af2c91909 Fetch version for container engine.
Also ensure we have a decent output with some more spacing

Change-Id: I92b7e14d844eef81af73f6e720ba352a87f0d5a1
2018-11-15 08:27:32 +01:00
Zuul d9b119acda Merge "Introduce zuul.projects and executed pre-run playbooks for reproducer" 2018-11-12 20:51:16 +00:00
Quique Llorente 5b7e12b826 Use correct package manager at DLRN
Looks like we were using yum in the fedora28 job in build-test-packages; we
need to generalize the code in build-test-packages so it works with fedora
too.

Also, install-build-repo was trying to use yum.

Change-Id: I8cea39a9923e23c5f0fceb895a1efe4cb8ec395d
Story: https://tree.taiga.io/project/tripleo-ci-board/task/319?kanban-status=1447275
2018-11-10 12:09:17 +00:00
Marios Andreou 2f2d7ab4e2 Introduce zuul.projects and executed pre-run playbooks for reproducer
This uses the zuul-variables file for zuul.projects and the json console
log written by the depends on, for the playbooks. This is tracked by the
ci team with [1]. These are made available to but not yet used in the
reproducer (that will come later, see the story linked in [1]).

[1] https://tree.taiga.io/project/tripleo-ci-board/task/270
Depends-On: https://review.openstack.org/615191
Co-Authored-By: Ronelle Landy <rlandy@redhat.com>
Change-Id: I64923cfd75e697b98507be1ff398c14654108ddf
2018-11-09 15:27:22 +00:00
Zuul 5689e8f6aa Merge "Run atop for monitoring deployment" 2018-11-09 13:35:14 +00:00
Sagi Shnaidman 5215572299 Run atop for monitoring deployment
Use the atop[1] tool to monitor the whole job process.
Atop generates binary output that can be downloaded
and then investigated locally.
Using atop -r /path/to/atop.bin you can read the file,
press "t" to move 10 seconds further, or
press "b" to jump to a specific time in the job and see
what happened on the host at that time. It allows tracking
all resources at a specific time.
It also allows tracking containers separately.
For more info you can visit the site[1].
If atop installation fails it shouldn't fail the job,
so ignore_errors is added.
Currently it's for the undercloud in OVB and all nodes in
multinode.

[1] https://www.atoptool.nl/

Change-Id: I7e17db3e376218f620a18db7ea7ca82d7578f618
Depends-On: Ibcdcfb4d8c5c94e1a06c7e635b0b6778ad318094
2018-11-08 01:27:33 +02:00
Sorin Sbarnea 5fb90bcb09 Fix runtime Ansible warnings
- Remove the warning from the output by setting no_log for the sensitive
variable.
[WARNING]: Module did not set no_log for influxdb_password

- Address the deprecation of "static" by using 'import_tasks' for static
inclusions or 'include_tasks' for dynamic inclusions.

Change-Id: I774d59b0d1bf5324c5a8b7c95a06f07299478e6a
2018-11-07 13:43:26 +00:00
Zuul cf3c08c0b4 Merge "Use dnf and python3 on platforms where these are default" 2018-11-06 17:46:57 +00:00
Zuul e393c2c832 Merge "by default collect all files in /var/lib/mistral" 2018-11-05 18:56:24 +00:00
Sorin Sbarnea 04585eb6f4 Use dnf and python3 on platforms where these are default
Roles should not make any assumptions about having some facts
already gathered; at start they should ensure they gather any
missing facts that are used inside the role.

Change-Id: I49fd1a0c070d96aecb880164acde490c9e7c95ef
Story: https://tree.taiga.io/project/tripleo-ci-board/task/153
Depends-On: https://review.openstack.org/#/c/615489/
2018-11-05 15:58:33 +00:00
Sagi Shnaidman 229922b603 Add /var/log/tripleo-container-image-prepare.log
Change-Id: I71299390b543c1f43eb852ebe4a20e490e6c7d8a
2018-10-28 23:19:37 +02:00
Cédric Jeanneret 47914f502d Add podman support for log collection
We want to ensure we get all the logs we want, from both docker
and podman engines, even if we get some mixed environment for
some reason.

Change-Id: I00e2f9b7755b7e32b7ed20b482d851aacb17464e
2018-10-05 14:20:03 +02:00
Wes Hayutin 1c5ad2c975 by default collect all files in /var/lib/mistral
Playbooks, roles and logs for the overcloud deployment
now are in /var/lib/mistral.  These files should be
captured by default.

Change-Id: I00f7de1d1f6a4ac1c8785b92c6edef10c95bc6cd
2018-10-02 12:34:52 -06:00
Sagi Shnaidman 20943755b2 Add log of ansible modify container role to logstash
Index logs of ansible modify images role, which updates containers

Change-Id: Id29de6d65e85e40580a30c67da133a317b823213
2018-10-02 19:52:55 +03:00
Sagi Shnaidman 22126f7010 Fix overcloud ARA data collection
ara_overcloud_db_path was undefined in the collect-logs role, so
undercloud data was collected twice. Ansible doesn't alert
about an undefined variable if it's in "environment".

Close-Bug: #1794238
Change-Id: I1d982a129337188a884e366cdc56a07637107e4b
2018-09-28 13:28:35 +03:00
Zuul 5e0a6b1e9f Merge "Fix used paths to match custom working dir" 2018-09-27 08:01:09 +00:00