Commit Graph

31 Commits

Clark Boylan ddb310c599 Retire this project
We've shut down the subunit2sql processors and don't need to manage them
with puppet any more.

Depends-On: https://review.opendev.org/c/openstack/project-config/+/839235
Change-Id: Id72f597c28807684fcfc8795a93ba0dc1b7e403c
2022-04-25 09:51:59 -07:00
Matthew Treinish fc0a11ebc4
Strip legacy prefix if it's present
Migrated jobs have the legacy- prefix, which makes tracking things across
the migration a bit difficult. This commit strips that prefix from the job
name on db insertion.

Change-Id: Ibfca7a7f4ff66ec6d3809f0ee96a98d598e27e91
2017-10-03 15:35:05 -04:00
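A minimal sketch of the prefix stripping described above; the helper name is hypothetical and the real worker may do this inline while building the metadata.

```python
# Hypothetical helper illustrating the change above: drop a leading
# "legacy-" from the job name before it is stored as the build_name.
def strip_legacy_prefix(job_name):
    prefix = "legacy-"
    if job_name.startswith(prefix):
        return job_name[len(prefix):]
    return job_name

print(strip_legacy_prefix("legacy-tempest-dsvm-neutron-full"))  # tempest-dsvm-neutron-full
print(strip_legacy_prefix("openstack-tox-py27"))                # unchanged
```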
Matthew Treinish 2a7070995e
Ensure that build_names are unique per project
With the migration to zuul v3, job names for common jobs, like unit
tests, are no longer unique per project. This causes issues for
openstack-health and subunit2sql because we can no longer easily filter a
unit test job between projects. This commit adds a function to ensure
that the project name is in the build_name in the metadata (assuming
devstack or tempest is not in the job_name). This way we can still filter
by job name per project.

Change-Id: I5231105af975d987d5f5e5f77c6ac038e10cd832
2017-10-03 15:34:18 -04:00
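A rough sketch of that idea, assuming the metadata dict carries build_name and project keys; the helper name and the exact key names are illustrative rather than the worker's actual code.

```python
# Illustrative only: fold the project name into build_name so common
# zuul v3 job names stay filterable per project, skipping devstack and
# tempest jobs as described above.
def qualify_build_name(metadata):
    build_name = metadata.get("build_name", "")
    project = metadata.get("project", "")
    if "devstack" in build_name or "tempest" in build_name:
        return metadata
    if project and project not in build_name:
        metadata["build_name"] = "%s-%s" % (project, build_name)
    return metadata

print(qualify_build_name({"build_name": "openstack-tox-py27",
                          "project": "openstack/nova"}))
```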
Matthew Treinish c8b965bde6
Add source_url to mqtt payload
This commit adds the source_url to the mqtt payload for events emitted
by the subunit worker. The build_uuid alone isn't enough to tell what's
going on, because we typically emit two events for each uuid, one for the
file with a .gz extension and one for the file without. We need the
source_url to distinguish between the two.

Change-Id: I56f0b9fb1128b412cd88e49164eb91c3e9e3e4cb
2017-04-26 17:01:25 -04:00
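For illustration, a payload shaped roughly as the commit describes; any field other than build_uuid and source_url is an assumption, and the URL is made up.

```python
import json

# Example payload carrying both the build_uuid and the source_url, so the
# .gz and non-.gz events for the same uuid can be told apart.
payload = json.dumps({
    "build_uuid": "abcd1234",
    "source_url": "https://logs.example.org/job/logs/testrepository.subunit.gz",
    "status": "success",  # assumed field
})
print(payload)
```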
Matthew Treinish 4037e7b1ba
Handle mqtt errors on failures before out_event is set
There are some conditions where we fail before out_event is defined. In
those cases the mqtt publish won't work and will raise an exception
because out_event is not defined. This commit addresses that by using
fields (the original source for out_event) instead of out_event: fields
exists earlier, and build_uuid always remains the same across the two
dicts. Additionally, we initialize fields to an empty dict outside of the
try block to ensure that, in the off chance of a failure before fields is
defined, we don't fail while sending a failure notification on mqtt.

Change-Id: I546a2ee42ea0f67f35c5e132d51f239a32a5582e
2017-04-25 14:18:09 -04:00
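A minimal sketch of that pattern, not the worker's actual code; handle(), process() and publish() are placeholders.

```python
# Sketch: initialize `fields` before the try block so any failure, even
# one raised before out_event exists, can still be reported with the
# build_uuid taken from `fields`.
def publish(status, build_uuid):
    print(status, build_uuid)  # stand-in for the real mqtt publish

def handle(job_arguments, process):
    fields = {}  # defined up front, so the except block can always use it
    try:
        fields = dict(job_arguments)   # the original source for out_event
        out_event = process(fields)    # may raise before out_event is set
        publish("success", fields.get("build_uuid"))
        return out_event
    except Exception:
        publish("failure", fields.get("build_uuid"))
```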
Matthew Treinish 5e8ec7ad3f
Config is a dict so use get not getattr
The config file is YAML and is parsed into a dict. However, a recent
patch mistakenly treated it as a ConfigParser object and used getattr to
access the mqtt options. That doesn't work; it needs to be a plain get()
on the dict. This commit does that, so it should now work without
crashing.

Change-Id: Ia3613adb6a037c9d406e6e33361d6d5ee826c9cd
2017-04-13 22:33:10 -04:00
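A small demonstration of the bug, assuming PyYAML is available; the option names are examples rather than the real config schema.

```python
import yaml

config = yaml.safe_load("mqtt_host: mqtt.example.org\nmqtt_port: 1883\n")

print(config.get("mqtt_host"))             # correct: 'mqtt.example.org'
print(getattr(config, "mqtt_host", None))  # wrong: a dict has no such attribute -> None
```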
Matthew Treinish c06cdc8a8a
Add MQTT support to the gearman worker
This commit adds support to the gearman worker for publishing an mqtt
message when processing a gearman job succeeds or fails. By default this
is disabled since it requires extra configuration to tell the worker how
to talk to the mqtt broker.

Right now the payload of the message is just the build_uuid and whether
it was written to the db or not. Eventually some details about the
subunit2sql db entry will be added to the payload, but that first
requires changes to either subunit2sql or to how the worker calls the
subunit2sql api.

Change-Id: Ibd13b737eccf52863a69d20843cb7d50242f7bb9
2017-04-10 16:08:10 -04:00
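A hedged sketch of such a publish step, assuming the paho-mqtt client library; the topic, broker hostname and payload fields are illustrative and not the worker's actual configuration.

```python
import json

import paho.mqtt.publish as publish

def notify(broker, topic, build_uuid, committed):
    # Publish a small JSON payload once a gearman job succeeds or fails.
    payload = json.dumps({"build_uuid": build_uuid, "committed": committed})
    publish.single(topic, payload, hostname=broker)

# notify("mqtt.example.org", "subunit/processed", "abcd1234", True)
```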
Matthew Treinish 7f701e51ee
Switch to storing run wall time in runs database
This commit uses an option added in the subunit2sql 1.8.0 release to
set the run_time column in the runs table to the wall time of the
subunit stream, in other words the stop time of the last test minus the
start time of the first test. By default subunit2sql stores the sum of
all the individual test run times as the value for the run_time column,
which is of questionable value for analysis. By switching what we
store for this value in the infra DB, hopefully this data will be of more
use to people.

Change-Id: Ief4cfa91f9661444b3680b428da0a3c4ca5dedd0
2016-09-26 09:37:26 -04:00
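A worked example of the difference between the two run_time definitions; the timestamps are made up.

```python
import datetime

# Two overlapping tests: 30s and 50s long, spanning one minute overall.
tests = [
    (datetime.datetime(2016, 9, 26, 9, 0, 0), datetime.datetime(2016, 9, 26, 9, 0, 30)),
    (datetime.datetime(2016, 9, 26, 9, 0, 10), datetime.datetime(2016, 9, 26, 9, 1, 0)),
]
sum_of_test_times = sum((stop - start).total_seconds() for start, stop in tests)
wall_time = (max(s for _, s in tests) - min(s for s, _ in tests)).total_seconds()
print(sum_of_test_times)  # 80.0 (the old default)
print(wall_time)          # 60.0 (what the run_time column now stores)
```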
Matthew Treinish fa6b116168
Stop using a separate retriever thread
Since we recently switched from having two threads, one to retrieve the
subunit and one to process it, to having a single thread do both, there
isn't any reason to launch the retrieval in a separate thread anymore.
This commit removes the use of threading and runs everything in the
single process.

Change-Id: I5205fc73178b7d5a4bbee61e68b16b63499f5dd8
2016-04-20 16:13:26 -04:00
James E. Blair fa17a37a52 Make main thread non-daemon
This way it will wait to exit.

Change-Id: I2ee353b2c25e67bd87a42abf6b725867d4727b67
2016-04-20 13:09:11 -07:00
James E. Blair b8b253cfaf Ensure all jobs send a completion packet
In the job handler, aborted jobs would never send a work_complete
packet, which would cause the gearman server to track them
indefinitely.  Update the handler so that all jobs send either
a work_complete or work_exception.

Change-Id: Ia3b914762d46b1873888d5025e4ba86f9d042895
2016-04-13 16:14:09 -07:00
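A hedged sketch of that handler shape, assuming the OpenStack gear library's worker-job methods (sendWorkComplete / sendWorkException); the processing function is a placeholder.

```python
def handle_job(job, process):
    # Whatever happens, answer the gearman server so it stops tracking
    # the job: work_complete on success, work_exception on any failure.
    try:
        result = process(job.arguments)
    except Exception as e:
        job.sendWorkException(str(e).encode("utf-8"))
    else:
        job.sendWorkComplete(result)
```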
Matthew Treinish d31b5d9710
Make subunit2sql population single threaded
This commit switches the gearman worker to handle both retrieving the
subunit file and populating the DB with that data in a single thread, as
opposed to pushing it onto an in-memory queue and processing the streams
in a separate thread. This should have two major benefits: first, it
makes the behavior much more consistent, since we only send work
completed to the gearman server after the data is already in the db;
second, it should significantly reduce our memory consumption.

Change-Id: Ifaac2f150e26158b0a22e1c33e0c5d39c10182a4
2016-04-08 14:59:07 -04:00
Jenkins 224ed54b1e Merge "Fix log_url parsing for new job types" 2016-04-08 18:21:24 +00:00
Tim Buckley 3663cc2be0 Fix log_url parsing for new job types
Previously, the log_dir was determined by taking the dirname() of the
log_url twice, which worked when all collected 'testrepository.subunit'
files were located under 'logs/'. Now that the worker is collecting
subunit files from all jobs, this assumption can result in the Zuul
UUID being cut off from the generated log_dir. This change adds a
check to make sure path segments are only removed when the log_dir
is 'logs/' or 'logs/old/'.

Change-Id: I75c53c498261e44989cdb7bf49d909ebde2b2699
2016-03-09 12:46:17 -07:00
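A hypothetical version of that check; the real parsing lives in the gearman worker and may differ in detail.

```python
import posixpath

def log_dir_from_url(log_url):
    # Strip the subunit file name itself.
    log_dir = posixpath.dirname(log_url)
    # Only strip further segments for the 'logs/' and 'logs/old/' wrappers,
    # so the Zuul UUID directory isn't cut off for other job layouts.
    while posixpath.basename(log_dir) in ("logs", "old"):
        log_dir = posixpath.dirname(log_dir)
    return log_dir

print(log_dir_from_url("http://logs.example.org/82/1234/logs/testrepository.subunit"))
print(log_dir_from_url("http://logs.example.org/82/1234/testrepository.subunit.gz"))
```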
Matthew Treinish 5d769819f5
Remove inaccurate comment
This commit removes an inaccurate comment that got copied and pasted
over when the script was originally created. The StdOut processor from
the logstash gearman client only works in foreground mode (for obvious
reasons), and since it was the simplest example it was used as a basis
for the subunit processor. However, the subunit processor doesn't share
this limitation and can be (and has been) run as a daemon in the
background.

Change-Id: I332523567d16b0994b06e278b68ffe8bcb7d9bfa
2016-03-01 20:51:23 -05:00
Matthew Treinish ad3cc21c67
Switch use of cStringIO.StringIO to io.BytesIO
This commit switches all our uses of cStringIO.StringIO to io.BytesIO.
The StringIO class is the wrong class to use for this, especially
considering that subunit v2 is binary and not a string. This commit
fixes it so we're using the proper type.

Change-Id: I5d21f7ca2f40cbd5c2659f6cee1570ebe2d4a983
2016-03-01 19:59:30 -05:00
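A minimal illustration of the type distinction; the byte string is placeholder content, not a real subunit packet.

```python
import io

raw = b"subunit v2 payload bytes"   # placeholder: subunit v2 is a binary protocol
stream = io.BytesIO(raw)            # io.BytesIO holds bytes; StringIO would not
assert isinstance(stream.read(), bytes)
```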
Matthew Treinish e12fc216ed
Add more debug logging for closed file issues
This commit adds additional debug logging to the subunit gearman
worker to try and identify the source of the closed-file bug we're
hitting in production. It logs the memory address of the
cStringIO.StringIO objects before and after they are pushed onto the
queue, and adds a distinct message when an object is found closed
before or after being on the queue.

Change-Id: I4a4f064e8885f8f9c8fc2974c0f71d837f41454e
2016-03-01 12:32:47 -05:00
Jenkins f4667bbba4 Merge "Use first test from subunit_stream for run_at value" 2016-03-01 00:57:55 +00:00
Matthew Treinish d8063ad5f5
Fix bug on missing subunit2sql data
This commit fixes an issue when handling a nonexistent subunit stream.
The script was returning an error over gearman but then continued to
internally queue up the subunit event for processing, even though
there wasn't a stream. This would cause a stacktrace later when the
actual processing was done on the internal queue. This commit moves
the queueing inside an else block to prevent it from running when
we don't have a subunit stream.

Change-Id: I66fdc5d7ae3702411a0b42757087cf61a4c69e35
2016-02-29 17:25:55 -05:00
Matthew Treinish 551a4fc136
Use first test from subunit_stream for run_at value
There is an issue when there is a backlog or a worker crashes: when we
start to work through the gearman queue, all the runs newly added to
the subunit2sql db have a run_at value of when they were processed, not
when they were actually run.

The subunit2sql cli (and corresponding python api) by default uses
the current time as the run_at time when creating a new run. However,
there is an option which allows you to specify the value for run_at. This
commit uses the first test run from the subunit stream as the run_at
value to ensure that we have a somewhat accurate run_at value for the
run in the db, regardless of when the subunit stream is actually
processed.

Change-Id: I55663476865ec3739faf1b077aff7a02d0e0fa79
2016-02-27 21:30:30 -05:00
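A rough sketch of using the earliest test start time as run_at; the results structure and the create_run(run_at=...) call mentioned in the comment are hypothetical stand-ins for the subunit2sql API.

```python
import datetime

# Hypothetical parsed results keyed by test id, each with a start time.
results = {
    "test_a": {"start_time": datetime.datetime(2016, 2, 27, 20, 58, 1)},
    "test_b": {"start_time": datetime.datetime(2016, 2, 27, 20, 58, 5)},
}

# Use the first test's start time as run_at instead of "now", so the run
# is dated by when it ran rather than when the backlog was processed.
run_at = min(test["start_time"] for test in results.values())
print(run_at)  # would be passed along, e.g. create_run(run_at=run_at)
```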
Jenkins f5b7b81a2b Merge "Allow worker to make use of subunit2sql targets" 2016-01-14 01:00:35 +00:00
Matthew Treinish 5327cc64a8
Handle 404 and download errors properly
This commit adds proper error handling to the subunit gearman worker
in the event a download error occurs. Prior to this change, if a
subunit stream couldn't be downloaded for whatever reason, an empty
row was added to the subunit2sql db: after logging the exception, an
empty stream file got passed to subunit2sql, which treated it as a run
that didn't run anything. This is a waste of time and is confusing to
users of the DB, because there will be a number of runs in the DB which
didn't actually run anything.

Change-Id: I1f8fd7ffd9c16ce2dddd534d4c641e6d65249d91
2015-12-07 15:48:31 -05:00
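A hedged sketch of a download step with explicit failure handling, assuming the requests library and gzip-compressed streams; the real worker may use a different HTTP client.

```python
import gzip
import io

import requests

def retrieve_subunit(url):
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()   # a 404 or other download error raises here
                              # instead of silently yielding an empty stream
    data = resp.content
    if url.endswith(".gz"):
        data = gzip.decompress(data)
    return io.BytesIO(data)
```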
Matthew Treinish c38d9d37d4
Lets time the super long migrations we have to run
Sometimes the subunit2sql DB migrations can take a *really* long time;
it'd kinda be nice to know exactly how long. In an effort to actually
know the duration, this commit simply runs the DB migrations under
time to record how much time we're actually spending on running these
migrations. That way it'll be logged in case anyone wants to bother to
check. (which is admittedly unlikely)

Change-Id: I31fe204f0544e9b7b58158a578552f907aa18543
2015-11-24 16:41:20 -05:00
Clint Byrum 390f44eb0e Allow worker to make use of subunit2sql targets
A plugin interface has been added, so make sure we load them. This will
allow us to eventually ingest counters while processing the stream.

Change-Id: I0614986eeae3c6f4681162c755311eab5a730862
Depends-On: Id16c87382f3767982ce815d129ae56a941375546
Implements: counter-inspection
2015-11-11 18:15:57 -08:00
Matthew Treinish 65f8c07f06
Ensure we pass a StringIO object to subunit_read
This commit adjusts the retrieve_subunit_v2() method to ensure the
returned buffer is an open "file". Just passing the GzipFile object
to subunit2sql doesn't work, so this attempts to sidestep the issue
by wrapping the contents of the decompressed file in a cStringIO
object.

Change-Id: I4cf2642e90e7850512c7d51d2375cd5871861c64
2015-10-13 20:25:44 -04:00
Matthew Treinish 468f46722a
Ensure we close the subunit_v2 stream file object
The subunit_v2 stream file object (or file-like object) that eventually
gets passed into the event processor is never closed. The read_subunit
class only does a flush() on the object when it's done with it. This
commit adds a close() after all the processing is done, just to ensure
we don't leak in the future.

Change-Id: I2975d30f772a1fe60489b7d6a9f4ae50d0d87e7f
2015-10-13 19:18:06 -04:00
Matthew Treinish 14bb617079
Remove subunit v1 to v2 conversion step
We stopped storing subunit v1 on the log servers a long time ago, but
there wasn't a pressing need to stop doing the conversion stage because
a v2 stream will just pass through subunit-1to2. However, the subprocess
command used for that step was leaking FDs, which was causing issues. So
this commit fixes the leak by just removing the conversion stage, since
it no longer has a purpose anyway.

Change-Id: I04c6120c2c98dcd17eb21c908919450b11934100
2015-10-13 19:01:16 -04:00
Matthew Treinish 5813f6d125
Make unhandled exceptions not fatal
This commit removes a raise in the server class when it encounters an
unhandled exception. Previously the exception would be logged and then
re-raised, which was fatal to the process. We encountered a DB
connection issue a few weeks ago which shouldn't have been fatal to the
worker (just to that one event/db connection), but it crashed the
worker process. So this attempts to prevent that from happening in the
future by not making this sort of failure fatal to the whole process.

Maybe this isn't a good idea, but the first time we encountered an
unhandled exception being raised in the server class it brought down
the worker for ~22 days. So the thought is that, instead of crashing,
just logging the failure and moving on is a better idea. However, if a
truly fatal exception is raised, this means the process will never
crash and will just log it on each event.

Change-Id: I4019831e24144205508f7918dab5c8fce6cfc74a
2015-06-22 09:52:59 -04:00
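A minimal sketch of the resulting loop shape; get_event and process are placeholders for the worker's actual plumbing.

```python
import logging

log = logging.getLogger(__name__)

def run_forever(get_event, process):
    while True:
        event = get_event()
        try:
            process(event)
        except Exception:
            # Previously this re-raised and took the whole worker down;
            # now the failure is logged and the next event is handled.
            log.exception("Unhandled exception processing event, continuing")
```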
Matthew Treinish 32cb330367 Revert "Add support to the subunit workers to reuse zuul uuids"
With grenade jobs this breaks the unique constraint on the runs table for the id column.

This reverts commit 811e5a717c.

Change-Id: I723c0415b83e3b33106e8d3557013678b666fac6
2015-01-29 19:21:27 +00:00
Matthew Treinish 811e5a717c Add support to the subunit workers to reuse zuul uuids
The latest subunit2sql release, 0.2.1, added support for setting the
id for a run. This commit adds support to the gearman workers for
doing this using the zuul build_uuids. This way, if an event has a
build_uuid set in the metadata, the id column for the row will match
zuul's. This will be useful when we eventually want to correlate
data between zuul and subunit2sql.

Change-Id: Ic4aa13a6c92c3bc759e080fdc4c2e07ff8b881bd
2014-12-23 12:54:18 -05:00
Matthew Treinish 9a06307c40 Add subunit2sql gearman workers
This adds a new gearman worker to process the subunit files from
the gate job runs. It will use subunit2sql to connect to a sql
server and process the data from the subunit file. The
log-gearman-client is modified to allow for pushing subunit jobs
to gearman, and the worker model for processing logs is borrowed
to process the subunit files.

Change-Id: I83103eb6afc22d91f916583c36c0e956c23a64b3
2014-10-29 13:03:49 -04:00