For a while now, Elasticsearch indexing has been quickly falling behind
as some log files generated in the gate have grown larger. Currently, we
download a full log file into memory and then emit it line by line to a
logstash listener. When log files are large (for example, 40MB),
logstash gets bogged down processing them.
Instead of downloading full files into memory, we can stream the files
and emit their lines on-the-fly to try to alleviate load on the log
processor.
This:
* Replaces use of urllib2.urlopen with requests with stream=True
* Removes manual decoding of gzip and deflate compression
  formats, as requests decodes these automatically when iterating
  over the response with iter_lines
* Removes unrelated unused imports
* Removes an unused arg 'retry' from the log retrieval method
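The streaming approach described above can be sketched roughly as
follows. This is a minimal illustration, not the code from this change:
the function names, the emit callback, and the timeout value are all
hypothetical. It assumes the requests library is installed and relies on
Response.iter_lines to yield one line at a time without buffering the
whole body.

```python
def iter_log_lines(response, encoding="utf-8"):
    """Yield text lines from a streamed response, one at a time.

    `response` only needs an iter_lines() method that yields bytes, which
    is what a requests.Response provides by default.
    """
    for raw_line in response.iter_lines():
        # Replace undecodable bytes rather than aborting on a bad line.
        yield raw_line.decode(encoding, errors="replace")


def stream_log(url, emit, timeout=30):
    """Fetch `url` and pass each line to `emit` as it arrives.

    `emit` is a hypothetical callback standing in for whatever hands the
    line to the logstash listener.
    """
    import requests  # third-party; imported here so iter_log_lines has no dependency

    # stream=True defers downloading the body until it is iterated.
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        for line in iter_log_lines(response):
            emit(line)
```

Only one chunk of the body is held in memory at a time, so peak memory
use no longer scales with the size of the log file.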
Change-Id: I6d32036566834da75f3a73f2d086475ef3431165
This is a thing that puppet has gone back and forth on and now we are on
the wrong side of it. Fix it like we've fixed it elsewhere.
Change-Id: I6d514b2345ff284c57409cc508786b76258d9f4a
The logstash log processing and subunit2sql tooling are often paired
together on servers. Currently subunit2sql depends on
os-performance-tools, which depends on statsd<3.0. That means our
package resource here for statsd, which tries to install the latest
version, conflicts with our usage of subunit2sql on the same server.
Avoid this conflict by installing statsd 2.1.2 for geard, which also
satisfies subunit2sql.
Change-Id: I3ac04cb93025ae2e2115ed23ba4927c2060f6dc8
We don't use the jenkins log client 0mq events anymore with zuulv3.
Instead zuul jobs submit the log indexing jobs directly to the gearman
queue for log processing. This means we only need geard to be running,
so add support for running just that daemon.
Change-Id: Iedcb5b29875494b8e18fa125adb08ec2e34d0064
Now that we are upgrading to Xenial, we need to take into account
that we're running with systemd and reload it so that it picks up the
new service.
Change-Id: Id02ac2bc51132a8d8d4a77cb05d41fa902765b28
The new paho-mqtt 1.3.0 release brings
https://github.com/eclipse/paho.mqtt.python/commit/0a8cccc
which prevents its use on Ubuntu Trusty's default Python interpreter.
Until we upgrade to a newer Python there, stay on paho-mqtt 1.2.3 so
that things keep working.
Change-Id: I4ffcd8c7906c86a40f3cd8f8d83fb8208944d189
This commit adds support to the gearman worker for publishing an mqtt
message when processing a gearman job succeeds or fails. It also adds
a message for when the processor passes the logs to logstash either via
stdout or over a socket. By default this is disabled since it requires
extra configuration to tell the worker how to talk to the mqtt broker.
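A rough sketch of the opt-in behaviour described above, under stated
assumptions: the EventPublisher class, the topic layout, and the payload
fields are hypothetical and are not taken from this change. The only
real library call is paho.mqtt.publish.single, which connects,
publishes one message, and disconnects.

```python
import json


class EventPublisher:
    """Publish job events, or do nothing when no publisher is configured."""

    def __init__(self, publish=None, base_topic="logprocessor"):
        # publish is a callable taking (topic, payload); None means disabled,
        # which mirrors the "disabled by default" behaviour.
        self._publish = publish
        self._base_topic = base_topic  # hypothetical topic prefix

    def emit(self, event, job_name, **details):
        """Send one event; return True if a message was published."""
        if self._publish is None:
            return False
        topic = "%s/%s" % (self._base_topic, event)
        payload = json.dumps(dict(details, job=job_name), sort_keys=True)
        self._publish(topic, payload)
        return True


def paho_publisher(hostname, port=1883):
    """Build a publish callable backed by paho-mqtt (third-party)."""
    import paho.mqtt.publish as mqtt_publish

    def publish(topic, payload):
        mqtt_publish.single(topic, payload=payload, hostname=hostname, port=port)

    return publish
```

A worker would construct EventPublisher(paho_publisher("broker.example.org"))
only when a broker is configured, and a bare EventPublisher() otherwise,
so the job handling code can call emit unconditionally.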
Depends-On: Id0308d2d4d1843fcca73f459cffa2ae944bebd0c
Change-Id: I43be3562780c61591ebede61f3a8929e8217f199
As part of the move to logstash 2.0 we are relying on upstream packaging
for logstash. This packaging replaces a lot of the micromanagement of
users, groups, and dirs that was done in puppet for logstash. This is
great news because it's less work for us, but it means that the log
processors can't rely on puppet resources for those items, and we don't
actually want to install the logstash package everywhere we run log
processor daemons.
Since the log processors don't need a logstash service running and
actually don't need any of the logstash stuff at all, decouple them
completely and have the log processor daemons use their own user, group,
log dir, config dir, etc. With this in place we can easily switch to
using the logstash packages only where we actually need logstash to be
running.
Change-Id: I2354fbe9d3ab25134c52bfe58f562dfdf9ff6786
The log_processor class may be applied to a server along with
other classes that also declare python-daemon as a dependency.
As puppet cannot handle this, add an "if ! defined" check.
Change-Id: I40dc68bd93f113912373cb10b376819d30eb3087
The python scripts have been moved to their own project at
openstack-infra/log_processor. Delete the files here and start
installing that project from source. As a part of this split, the
.py extension has been dropped from the filename of the installed
executables.
Change-Id: Ied3025df46b5014a092be0c26e43d4f90699a43f
In anticipation of puppet 4, start trying to deal with puppet 4 things
that can be helpfully predicted by puppet lint plugins. Also fix lint
errors caught by the puppet-lint-absolute_classname-check and
puppet-lint-empty_string-check gems.
This patch changes a scope.lookupvar function call to a Ruby instance
variable in the jenkins-log-client.default.erb template. This is safe
to do because the template is called in the log_processor::client class
and the variable in question is within that scope.
Change-Id: Ia7d6af8bc76a65e37f5dfd184e37855fe3b97046
Have the log-gearman-client (aka jenkins-log-client) initialize
the statsd parameters when starting the geard server. Also, make
sure that the python statsd package is installed on the host.
Change-Id: I04fe1a7609f08bc710891b6a3b92d0f4d156d86c
Separate the jenkins log client and worker bits into a new module
called log_processor with ::client and ::worker classes.
Instantiate two workers on each logstash worker node.
Change-Id: I7cfec410983c25633e6b555f22a85e9435884cfb