Commit Graph

56 Commits

Author SHA1 Message Date
Clark Boylan fb7c8790dd Retire this project
We've shut down the log processing service and don't need to manage it
with puppet anymore.

Depends-On: https://review.opendev.org/c/openstack/project-config/+/839235
Change-Id: I451488faf6a7502a5171d2a4299d7a4e40d96072
2022-04-25 09:48:21 -07:00
melanie witt 89bfe00dda Stream log files instead of loading full files into memory
For a while now, we have been seeing Elasticsearch indexing
quickly fall behind as some log files generated in the gate have become
larger. Currently, we download a full log file into memory and then
emit it line-by-line to be received by a logstash listener. When log
files are large (example: 40M) logstash gets bogged down processing
them.

Instead of downloading full files into memory, we can stream the files
and emit their lines on-the-fly to try to alleviate load on the log
processor.

This:

  * Replaces urllib2.urlopen with requests using stream=True
  * Removes manual decoding of gzip and deflate compression
    formats as these are decoded automatically by requests.iter_lines
  * Removes unrelated unused imports
  * Removes an unused arg 'retry' from the log retrieval method
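The streaming approach described above can be sketched roughly as follows (a minimal illustration, not the actual log-gearman-worker code; the function name is made up, and `requests.get(stream=True)` with `iter_lines` is the real API being relied on):

```python
import requests


def stream_log_lines(url):
    # Stream the response instead of reading the whole body into memory.
    # requests transparently decodes gzip/deflate Content-Encoding, so no
    # manual decompression step is needed here.
    resp = requests.get(url, stream=True, timeout=30)
    resp.raise_for_status()
    # iter_lines yields one decoded line at a time as chunks arrive, so a
    # 40M log file never has to fit in memory all at once.
    for line in resp.iter_lines(decode_unicode=True):
        yield line
```

Because the lines are emitted on the fly, the logstash listener starts receiving data before the download completes, which is the load-smoothing effect the commit is after.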

Change-Id: I6d32036566834da75f3a73f2d086475ef3431165
2020-11-09 09:49:07 -08:00
Zuul 09b0ed74f7 Merge "Handle case where content encoding isn't set" 2019-08-27 18:01:38 +00:00
Zuul c0f67cef05 Merge "Don't try to get .gz suffixed files in addition to base url" 2019-08-24 02:25:13 +00:00
Clark Boylan 972b6355b0 Handle case where content encoding isn't set
Belts and suspenders for cases where content encoding may not be
present. I believe this is possible if the content is served with the
identity encoding. In that case setting the encoding header isn't
required.

Change-Id: If18670d4fd3656a35f818247539b7afad39493e6
2019-08-23 18:45:06 -07:00
Clark Boylan 15991cfded Don't try to get .gz suffixed files in addition to base url
Zuul gives us the source url to index. Previously we tried to fetch
url + .gz because in many cases we uploaded the file as a gzip file but
logically treated it as unzipped. Now with logs in swift we compress
files without the .gz suffix. This means we should be able to always
fetch the url that zuul provides unmodified.

Depends-On: https://review.opendev.org/678303
Change-Id: I0ea4d9daa905ccb50372b73b5035758fc0963716
2019-08-23 21:55:17 +00:00
Clark Boylan b9063a7e7e Fix systemd severity filter input data
The severity filters are passed the entire json event and not just a
string. Update the systemd filter to access the message string out of
the event json dict.

Prior to this we get a type error:

  2019-08-19 17:18:48,055 Exception handling log event.
  Traceback (most recent call last):
    File "/usr/local/bin/log-gearman-worker.py", line 255, in
  _handle_event
      keep_line = f.process(out_event)
    File "/usr/local/bin/log-gearman-worker.py", line 183, in process
      m = self.SYSTEMDRE.match(msg)
  TypeError: expected string or buffer
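The fix amounts to pulling the message string out of the event dict before matching, along these lines (a sketch only: the regex and the "message" field name are illustrative stand-ins, not the exact ones in log-gearman-worker.py):

```python
import re

# Illustrative systemd-journal style line:
#   "Aug 19 17:18:48 host service[123]: DEBUG some text"
SYSTEMDRE = re.compile(
    r'^\w{3} \d{2} \d{2}:\d{2}:\d{2} \S+ \S+\[\d+\]: (?P<severity>\w+)')


def keep_line(event):
    # The filter is handed the full json event dict, not a bare string,
    # so extract the message field first -- matching on the dict itself
    # is what raised the TypeError above.
    msg = event.get("message", "")
    m = SYSTEMDRE.match(msg)
    if m and m.group("severity") == "DEBUG":
        return False
    return True
```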

Change-Id: I7ab56ac397133f00539d9d3374fa400363ef12d6
2019-08-19 10:27:45 -07:00
Ian Wienand 3119c0cddd log-gearman-worker: remove obsolete GET debug filter, add local filter
Now that logs have moved into swift, the os-loganalyze middleware that
stripped DEBUG level logs when the URL was given a ?level= parameter
no longer functions.

We move to filtering DEBUG statements directly.  Because services in
devstack now run as systemd services, their log files are actually
journalctl dumps.  Thus we add a new filter for systemd style
timestamps and messages (this is loosely based on the zuul log viewer
at [1]).

[1] 8c1f4e9d6b/web/src/actions/logfile.js

Change-Id: I54087c95c809612758139136d5b3e86b1a6372be
2019-08-19 17:15:25 +10:00
Ian Wienand 5b30a3a6c0 log-gearman-worker: Remove jenkins streaming workaround
We don't need to worry about the file changing under us any more; this
was all pre-zuul, let alone pre-using-swift for logs.  Remove this
workaround.

Change-Id: I5938dcef5550d4c62c8158c5f89ace75ae99aedc
2019-08-19 13:06:58 +10:00
Ian Wienand bca04e3155 log-gearman-worker: handle deflate encoded values
We are now logging to swift and store the objects as deflate encoded
data [1].  This means that we get back "Content-Encoding: deflate"
data when downloading the logs (even though we don't advertise accepting
it).

So put in a path for deflate encoding to the extant code with zlib.
For completeness we also update our Accept-Encoding: header to show we
accept deflate.

[1] 60e7542875/roles/upload-logs-swift/library/zuul_swift_upload.py (L608)
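The extra decode path can be sketched like this (an illustration of the idea, not the worker's exact code: the helper name is made up, and the raw-deflate fallback with `-zlib.MAX_WBITS` is belt-and-suspenders for servers that omit the zlib header):

```python
import zlib


def decode_body(data, content_encoding):
    # Swift serves the stored objects back with
    # "Content-Encoding: deflate"; plain zlib.decompress handles the
    # zlib-wrapped form, and a negative window size decodes raw deflate
    # streams that lack the zlib header.
    if content_encoding == "deflate":
        try:
            return zlib.decompress(data)
        except zlib.error:
            return zlib.decompress(data, -zlib.MAX_WBITS)
    return data
```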

Change-Id: I328bafea3ddae858fd77af043f16c499ddd5a30e
2019-08-19 13:02:32 +10:00
Clark Boylan 772a94ff6d Force geard to listen on ::
By default geard only listens on ipv4 0.0.0.0 which means ipv6
connections don't work. Because we run dual stack and things expect ipv6
to work (we have AAAA dns records after all) force geard to listen on ::
which will accept ipv6 and ipv4 connections.
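The dual-stack behavior being relied on here is standard socket semantics, sketched below with a plain stdlib socket rather than geard's own code (the helper name is made up): binding an AF_INET6 socket to `::` with IPV6_V6ONLY disabled accepts IPv4 connections too, as v4-mapped addresses.

```python
import socket


def dual_stack_listener(port):
    # An AF_INET6 socket bound to "::" accepts both IPv6 and IPv4
    # connections (IPv4 peers appear as v4-mapped addresses), provided
    # IPV6_V6ONLY is off -- the same effect geard gets when told to
    # listen on :: instead of the default 0.0.0.0.
    sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
    sock.bind(("::", port))
    sock.listen(5)
    return sock
```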

Change-Id: Ibf3bfc5f80ca139b375ee2902dc3149ac791ef96
2018-10-18 15:47:14 -07:00
Zuul 8c748b0cd5 Merge "Add support for running a standalone geard" 2018-10-18 16:44:21 +00:00
James E. Blair 625bb48d13 Add severity info to logstash and filter out DEBUG lines
This adds severity as a logstash field for every oslo formatted
log line, and does not add any lines which are at DEBUG level.

This means we no longer rely on the level=INFO query parameter
in order to remove DEBUG lines, so we will avoid sending them
to logstash regardless of whether os-loganalyze is used.
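The filter's shape is roughly the following (a sketch with made-up field names and a simplified oslo-format regex, not the worker's actual implementation): parse the severity out of each oslo-formatted line, drop DEBUG lines outright, and attach the severity as a field on everything else.

```python
import re

# Simplified oslo log format:
#   "2018-08-06 10:00:40.123 1234 INFO nova.compute [...] message"
OSLORE = re.compile(
    r'^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(\.\d+)? \d+ '
    r'(?P<severity>[A-Z]+)')


def annotate(event):
    # Non-oslo lines pass through untouched; DEBUG lines are dropped
    # before they ever reach logstash; everything else gains a
    # "severity" field to query on.
    m = OSLORE.match(event.get("message", ""))
    if not m:
        return event
    if m.group("severity") == "DEBUG":
        return None  # filtered out entirely
    event["severity"] = m.group("severity")
    return event
```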

Change-Id: I8c4ac76a7fa0c3badd82fc7c54959ef6eb052732
2018-08-06 10:00:40 -07:00
Clark Boylan ffb50e9e8e Add support for running a standalone geard
We don't use the jenkins log client 0mq events anymore with zuulv3.
Instead zuul jobs submit the log indexing jobs directly to the gearman
queue for log processing. This means we only need a geard to be running
so add support for running just that daemon.

Change-Id: Iedcb5b29875494b8e18fa125adb08ec2e34d0064
2017-12-22 10:21:52 -08:00
Clark Boylan 54eb1a0785 Collapse logically identical filenames for crm114
Log files come with many names while still containing the same logical
content. That may be because the path to them differs (eg /var/log/foo.log
and /opt/stack/log/foo.log) or due to file rotations (eg
/var/log/foo.log and /var/log/foo.log.1) or due to compression (eg
/var/log/foo.log and /var/log/foo.log.gz). At the end of the day these
are all the same foo.log log file.

This means when we do machine learning on the log files we can collapse
all these different cases down into a single case that we learn on. This
has become more important since we recently ran out of disk space due
to all the non-unique log paths out there for our log files, but it should
also result in better learning.
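The normalization amounts to stripping path, rotation, and compression differences down to one basename, roughly like this (a sketch of the idea, with an invented helper name and illustrative suffix patterns):

```python
import os
import re


def logical_name(path):
    # /var/log/foo.log, /opt/stack/log/foo.log, foo.log.1 and foo.log.gz
    # all collapse to the same logical name, so CRM114 learns a single
    # model per log file instead of one per path variant.
    name = os.path.basename(path)
    name = re.sub(r'\.gz$', '', name)   # drop compression suffix
    name = re.sub(r'\.\d+$', '', name)  # drop rotation suffix
    return name
```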

Change-Id: I4ba276870b73640909ac469b336a436eb127f611
2017-11-22 23:05:35 +00:00
Clark Boylan f35b4e2490 Reduce log worker internal queue size
We are having OOM problems with larger log files. Attempt to make this
more robust by having much smaller internal log line message queues (we
reduce the queue size to about 10% of the original size). The idea here
is that with the old 130k queue size full, grabbing a large log file
adds significant memory overhead on top of the queue, whereas with a
small 16k queue we really only have to worry about the size of the
logfile itself.

Depends-On: Iddbbab9ea5996df4922bf7927deb8f0354378ab7
Change-Id: I761fabaa1b5aae64790def721980151f9fdc720d
2017-11-18 23:40:46 +00:00
Clark Boylan 88e0d21347 Only send mqtt events for processed files
We were previously sending events for every file we attempted to
process, not just those that were actually processed, and also for every
single log line event. This effectively doubled the io performed by the
logstash workers which seemed to slow the whole pipeline down. Trim it
down to only recording events for log files that are processed which
should significantly trim down the total number of events.

Change-Id: I0daf3eb2e2b3240e3efa4f2c7bac57de99505df0
2017-08-03 15:28:45 -07:00
Clark Boylan becc05e0aa else needs to be else:
Fix missing ':' syntax error.

Change-Id: I65d26db42eb871c230fd880457e12a25016baf1e
2017-08-03 09:44:46 -07:00
Matthew Treinish 662ae3777c
Handle cases without a build_change field
Previously the mqtt topic generation always assumed a build_change was
present. However there are some cases where there isn't a build_change in
the metadata, like periodic, post, and release jobs. This commit handles
those edge cases so it uses the build queue in the topic instead of the
build_change. If that doesn't work the topic is just the project.
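The fallback chain reads roughly like this (a sketch with invented names: the real `_generate_topic` method and its metadata keys live in the gearman client, and the topic prefix here is illustrative):

```python
def generate_topic(metadata, basename="gearman-logstash"):
    # Review jobs carry a build_change; periodic, post, and release jobs
    # do not, so fall back to the build queue, and finally to just the
    # project name.
    project = metadata.get("project", "unknown")
    change = metadata.get("build_change")
    if change:
        return "%s/%s/%s" % (basename, project, change)
    queue = metadata.get("build_queue")
    if queue:
        return "%s/%s/%s" % (basename, project, queue)
    return "%s/%s" % (basename, project)
```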

Change-Id: I26dba76e3475749d00a45b076d981778f885c339
2017-08-03 11:46:55 -04:00
Clark Boylan 8a55071b10 Fix syntax errors
There was bad indentation and a missing '.' in config.get.
_generate_topic() is an object method, not a global function, and it
takes an action argument.

Change-Id: I01c4af83cf98f0d7191041a864618a1608f97647
2017-08-02 15:23:29 -07:00
Matthew Treinish b1a4357058
Add MQTT support to the gearman worker
This commit adds support to the gearman worker for publishing an mqtt
message when processing a gearman job succeeds or fails. It also adds
a message for when the processor passes the logs to logstash either via
stdout or over a socket. By default this is disabled since it requires
extra configuration to tell the worker how to talk to the mqtt broker.

Depends-On: Id0308d2d4d1843fcca73f459cffa2ae944bebd0c
Change-Id: I43be3562780c61591ebede61f3a8929e8217f199
2017-04-27 10:06:34 -04:00
Clark Boylan 1f0f91fdbb Reduce log client logging by default
We had been running at debug level which is incredibly verbose. Remove
the -d flag. This will cause the logs which are logged to go to
stdout/err which should mean that upstart (or whatever init system) will
deal with them for us.

We should properly clean this up so that debug logging is useful again
in the long term.

Change-Id: I613c135ea56507d083df8c66e8846c6fbfa8b2ed
2016-09-27 17:18:23 -07:00
Clark Boylan 8491d10d26 Decouple log processing from logstash
As part of the move to logstash 2.0 we are relying on upstream packaging
for logstash. This packaging replaces a lot of the micromanagement of
users and groups and dirs that was done in puppet for logstash. This is
great news because it's less work for us but means that the log
processors can't rely on puppet resources for those items and we don't
actually want to install logstash package everywhere we run log
processor daemons.

Since the log processors don't need a logstash service running and
actually don't need any of the logstash stuff at all decouple them
completely and have the log processor daemons use their own user, group,
log dir, config dir, etc. With this in place we can easily switch to
using the logstash packages only where we actually need logstash to be
running.

Change-Id: I2354fbe9d3ab25134c52bfe58f562dfdf9ff6786
2016-03-09 13:52:35 -08:00
James E. Blair d7d9d50ee2 Change node_region to node_provider
This matches nodepool terminology to reduce confusion.

Change-Id: I3a8776010dcaf6677a450d0a9cb770313e604019
2015-12-17 14:51:59 -08:00
K Jonathan Harker 84c7e72312 Revert "Switch to using the new log_processor project"
This reverts commit b548b141ce.

b548b141ce was supposed to depend-on
https://review.openstack.org/248868

Change-Id: If3d4ad8a1cd45e6e63155a76dc1477ab38b156e3
2015-12-07 16:21:00 -08:00
K Jonathan Harker b548b141ce Switch to using the new log_processor project
The python scripts have been moved to their own project at
openstack-infra/log_processor. Delete the files here and start
installing that project from source. As a part of this split, the
.py extension has been dropped from the filename of the installed
executables.

Change-Id: Ied3025df46b5014a092be0c26e43d4f90699a43f
2015-11-25 15:23:26 -08:00
Matthew Treinish b2190b1a42
Add a node_region field to the job metadata
The node region can be figured out from the build_node very easily and
having a discrete field will make filtering to a single region much
simpler. This commit adds a new metadata field 'node_region' which is
the cloud region that the build_node ran in.

Change-Id: I06bbb62d21871ee61dbfb911143efff376992b98
2015-11-19 19:16:13 -05:00
Joshua Hesketh cd55cdf7d7 Revert "Create subunit processor subclass"
This reverts commit 135ac1809d.

EventProcessor was called before being defined. The code also doesn't
look entirely right. Reverting this to fix up the logstash servers

Change-Id: I2fb8081426646565814090c152d04d7349c16945
2015-11-19 11:05:53 +00:00
Jenkins 75ed9aca88 Merge "Create subunit processor subclass" 2015-11-18 14:23:41 +00:00
Jenkins 4b308c5308 Merge "Add the ability to filter on project" 2015-11-15 13:49:48 +00:00
Jenkins 6f4720fd7b Merge "Process ZUUL_VOTING parameter" 2015-10-08 15:35:29 +00:00
Matt Riedemann eeddbf5a43 Process ZUUL_VOTING parameter
Read the ZUUL_VOTING parameter and add to the event before posting for
log processing.

The plan is that elastic-recheck will eventually use this field for
filtering out non-voting jobs from the e-r uncategorized bugs page.

Depends-On: I40746bb77aab900c1dd2637f940c14f72a904a61

Change-Id: I1f3c2a65104db39fdd7d786d421cded1b436a5f6
2015-09-16 09:04:27 -07:00
Anita Kuno 2b6961467a Add build_zuul_url parameter
Currently logstash does not track the zuul url. The zuul url
contains the zm (zuul merger) node identifier.

While trying to troubleshoot a zuul cloning issue, I noticed all
faiures were coming from the same zm (zuul merger) node. Tracking
the build_zuul_url can be helpful. This patch adds the
build_zuul_url parameter.

Change-Id: I83358dc0d9b27852df2395a9c52d2daaaeda712b
2015-09-01 14:22:16 -04:00
Clark Boylan 17883b76b0 Import socket so we can use it to get name info
Previously this was using socket.getaddrinfo() without importing socket
and causing the daemon to fail. Running in the foreground did not use
statsd thus did not attempt to resolve the statsd host which is how this
got past manual testing. Import socket to get everything working again.

Change-Id: I280973bdcdf472736a07d19173559b062ed74d3c
2015-07-17 11:15:19 -07:00
Jenkins a193d901cf Merge "Lazily connect to logstash" 2015-07-15 22:50:58 +00:00
Jenkins 2595ee1273 Merge "Retry on EAI_AGAIN name resolution failures" 2015-07-15 22:50:57 +00:00
K Jonathan Harker 135ac1809d Create subunit processor subclass
This allows for subunit files that do not include subunit in the name.

Change-Id: I8504fad6a4dea98700c204984cf00fea95de8369
2015-06-11 11:18:06 -07:00
K Jonathan Harker 622f6d9471 Add the ability to filter on project
Implement a project-filter option to gearman client config alongside the
job-filter and build-queue-filter options.

Change-Id: Ia71f216f4acc9de145eb9124df691393d2a86808
2015-06-11 09:13:41 -07:00
Clark Boylan 2aa7b07ebb Lazily connect to logstash
Because boot order is such a mess we will lazily connect to the logstash
TCP/UDP ports to allow for logstash to come up before we start writing
to it. This takes advantage of existing logstash restart handler code in
the log processors.

Change-Id: I836c55806c88cc86b7973b3d40f4bfce076970f5
2015-03-05 11:14:15 -08:00
Clark Boylan 3cd22c77cc Retry on EAI_AGAIN name resolution failures
There is no sane way to convince Ubuntu to start these services after
name resolution is working (because sysv init is horribly broken on
Ubuntu). Work around this by catching EAI_AGAIN errors during name
resolution and retrying until we can resolve names.

This logs each failed resolution attempt so that users are aware of the
issue if investigating logs.
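The retry loop looks roughly like this (a minimal sketch, not the daemon's exact code; `socket.EAI_AGAIN` is the stdlib constant for "temporary failure in name resolution"):

```python
import socket
import time


def resolve_with_retry(host, port, delay=1):
    # sysv init on Ubuntu can start us before name resolution works, so
    # keep retrying on EAI_AGAIN (a temporary failure) and log every
    # attempt so the condition is visible when investigating logs.
    while True:
        try:
            return socket.getaddrinfo(host, port)
        except socket.gaierror as e:
            if e.errno == socket.EAI_AGAIN:
                print("Temporary name resolution failure for %s, "
                      "retrying" % host)
                time.sleep(delay)
            else:
                raise
```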

Change-Id: If94d4f04d0e1cfedc358fd9d678a36fc9cd8aa7b
2015-03-05 10:29:48 -08:00
Clark Boylan e3641f727f Start processes after network and named
Log processing requires networking and name resolution to be available.
Specify these deps in the LSB init headers so that we get proper boot
time start sequences for these services.

Change-Id: Ic36eba2654e7425f3aba8ee5c215150b7d94d658
2015-03-04 08:36:51 -08:00
Matthew Treinish 19d70bee3e Add support to log gearman client to filter on build-queue
This commit adds a new job filter to the gearman client to filter
based on the build queue. This is used for the subunit jobs which
we don't want to run on check jobs.

Change-Id: If81fe98d8d67bb718c53a963695a7d06f5f6625d
2014-11-19 09:42:47 -05:00
Matthew Treinish e5fbd6ca48 Add subunit2sql gearman workers
This adds a new gearman worker to process the subunit files from
the gate job runs. It will use subunit2sql to connect to a sql
server and process the data from the subunit file. The
log-gearman-client is modified to allow for pushing subunit jobs
to gearman, and the worker model for processing logs is borrowed
to process the subunit files.

Change-Id: I83103eb6afc22d91f916583c36c0e956c23a64b3
2014-10-29 13:03:49 -04:00
Clark Boylan 742c92e537 Handle log processing subprocess cleanup better
We are leaking file descriptors in our log worker processes because we
are not catching all possible errors, leaving some cleanup actions
undone. Catch errors more aggressively so that all cleanup happens.

Change-Id: I7a73a36c6fc42d4eba636cf36c8cfffcea48a318
2014-09-03 17:03:27 -07:00
Christian Berendt 46b9ae5771 Use except x as y instead of except x, y
According to https://docs.python.org/3/howto/pyporting.html the
syntax changed in Python 3.x. The new syntax is usable with
Python >= 2.6 and should be preferred to be compatible with Python3.

Enabled hacking check H231.
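The syntax change in question, shown side by side (the function here is just a toy to demonstrate the portable form):

```python
# Old, Python 2-only spelling -- a SyntaxError on Python 3:
#     try:
#         risky()
#     except ValueError, e:
#         ...
#
# Portable spelling, valid on Python >= 2.6 and all of Python 3:
def parse_int(value):
    try:
        return int(value)
    except ValueError as e:
        # 'as' binds the exception instance in both Python 2.6+ and 3.
        return "error: %s" % e
```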

Change-Id: I4c20a04bc7732efc2d4bbcbc3d285107b244e5fa
2014-05-29 23:55:41 +02:00
Clark Boylan bbbf64f74c Don't treat IDs as uniquely special in CRM114
The openstack logs are full of various IDs and UUIDs, but they are not
uniquely special when it comes to filtering them. Instead replace each
ID with a token, making CRM114's life much easier.
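The substitution can be sketched as follows (the patterns and token are illustrative, not the exact ones the filter uses): every UUID-shaped or 32-hex-digit ID becomes one fixed token, so CRM114 sees a single feature instead of millions of unique ones.

```python
import re

# Matches dashed UUIDs and bare 32-hex-digit IDs (illustrative patterns).
UUID_RE = re.compile(
    r'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
    r'|[0-9a-f]{32}', re.IGNORECASE)


def tokenize_ids(line):
    # Replace every ID with a fixed token so CRM114 doesn't treat each
    # unique ID as a distinct feature to learn.
    return UUID_RE.sub('_ID_', line)
```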

Change-Id: Id9b430c0d31889b89e4e0c1790a2405d73f501b5
2014-03-24 15:19:49 -07:00
Clark Boylan 5a3ff67db4 Better logstash field data.
We are currently using a lot of wildcard searches in elasticsearch which
are slow. Provide better field data so that we can replace those
wildcard searches with filters. In particular add a short uuid field and
make the filename tag field the basename of the filepath so that grenade
and non grenade files all end up with the same tags.

Change-Id: If558017fceae96bcf197e611ab5cac1cfe7ae9bf
2014-03-13 14:42:58 -07:00
James E. Blair c24a8a75e7 Use statsd in logstash client
Have the log-gearman-client (aka jenkins-log-client) initialize
the statsd parameters when starting the geard server.  Also, make
sure that the python statsd package is installed on the host.

Change-Id: I04fe1a7609f08bc710891b6a3b92d0f4d156d86c
2014-02-24 15:34:48 -08:00
Clark Boylan cc5d9265ec Handle log filter exceptions more gracefully.
If there is an exception filtering a log event handle that by removing
the filter and continuing to process the remaining log events for the
associated file. This prevents non-filter data from being lost when the
filters have an exception.

Change-Id: I65141daf21a873096829c41fdc2c77cbeecde2e3
2014-02-10 10:20:12 -08:00
Clark Boylan 585112564d Close unneeded fds before execing CRM 114.
CRM 114 is being forked off of the gearman worker processes and as a
result has open fds for log files and tcp connections. CRM 114 should be
isolated from the fds so that it doesn't crash when they change
unexpectedly. Close the fds using the subprocess.Popen close_fds flag.
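The flag in question is the standard `subprocess.Popen(close_fds=True)`; a minimal sketch of the fork (the helper name is invented and `cmd` stands in for the real crm invocation):

```python
import subprocess


def run_classifier(cmd, line):
    # close_fds=True keeps the child from inheriting the worker's
    # log-file and tcp descriptors, isolating it from fds that may
    # change underneath it unexpectedly.
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, close_fds=True)
    out, _ = p.communicate(line)
    return out
```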

Change-Id: I4fbdf3564771be7d7a7e4c518e571634de576253
2014-02-05 09:44:55 -08:00