Retire repository

Fuel (in the openstack namespace) and fuel-ccp (in the x namespace)
repositories are unused and ready to be retired.

This change removes all content from the repository and adds the usual
README file to point out that the repository is retired, following the
process from
https://docs.openstack.org/infra/manual/drivers.html#retiring-a-project

See also
http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011647.html

Depends-On: https://review.opendev.org/699362
Change-Id: I6b38110f2d230006cd9cce1da5d2cf76cf470d35
Andreas Jaeger 2019-12-18 09:54:55 +01:00
parent 617e225baa
commit 16f5f4cf44
81 changed files with 10 additions and 11090 deletions

.gitignore

@@ -1,69 +0,0 @@
*.py[cod]
# C extensions
*.so
# Packages
*.egg
*.egg-info
dist
build
.eggs
eggs
parts
bin
var
sdist
develop-eggs
.installed.cfg
lib
lib64
# Installer logs
pip-log.txt
# Unit test / coverage reports
.coverage
cover
.tox
nosetests.xml
.testrepository
.venv
# Translations
*.mo
# Mr Developer
.mr.developer.cfg
.project
.pydevproject
# Complexity
output/*.html
output/*/index.html
# Sphinx
doc/build
# oslo-config-generator
etc/*.sample
# pbr generates these
AUTHORS
ChangeLog
# Editors
*~
.*.swp
.*sw?
# Vagrant
.vagrant
vagrant/Vagrantfile.custom
vagrant/vagrantkey*
# generated openrc
openrc
# tests
tests/.cache/*

LICENSE

@@ -1,176 +0,0 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.

README.rst

@@ -0,0 +1,10 @@
This project is no longer maintained.
The contents of this repository are still available in the Git
source code management system. To see the contents of this
repository before it reached its end of life, please check out the
previous commit with "git checkout HEAD^1".
For any further questions, please email
openstack-discuss@lists.openstack.org or join #openstack-dev on
Freenode.

@@ -1,16 +0,0 @@
FROM {{ image_spec("base-tools") }}
MAINTAINER {{ maintainer }}
# Install alarm-manager and dependencies
COPY alarm-manager.py /opt/ccp/bin/
COPY requirements.txt /tmp/requirements.txt
COPY config-files /etc/alarm-manager/
RUN pip install --no-cache-dir -r /tmp/requirements.txt \
&& useradd --user-group alarm-manager \
&& usermod -a -G microservices alarm-manager \
&& chown -R alarm-manager: /etc/alarm-manager \
&& chmod 755 /opt/ccp/bin/alarm-manager.py \
&& rm -f /tmp/requirements.txt
USER alarm-manager

@@ -1,603 +0,0 @@
#!/usr/bin/env python
#
# Copyright 2016 Mirantis, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
# Global imports
# --------------
import argparse
import hashlib
import jinja2
import logging
import logging.config
import os
import pyinotify
import re
import sys
import yaml
# Best practice code for logging
# ------------------------------
try: # Python 2.7+
from logging import NullHandler
except ImportError:
class NullHandler(logging.Handler):
def emit(self, record):
pass
# Global variables initialization
# -------------------------------
dflt_cfg_dir = os.path.join(
'/etc', 'alarm-manager')
dflt_config = os.path.join(
dflt_cfg_dir, 'config', 'alarm-manager.ini')
dflt_template = os.path.join(
dflt_cfg_dir, 'templates', 'lua_alarming_template.j2')
dflt_cfg_template = os.path.join(
dflt_cfg_dir,
'templates', 'alarm_manager_lua_config_template.cfg.j2')
dflt_dest_dir = os.path.join(
'/opt', 'ccp', 'lua', 'modules', 'stacklight_alarms')
dflt_cfg_dest_dir = os.path.join(
'/var', 'lib', 'hindsight', 'load', 'analysis')
dflt_alarm_file = 'alarms.yaml'
# Logging initialization
# ----------------------
def logger_init(cfg_file):
"""Initialize logger instance."""
log = logging.getLogger()
log.setLevel(logging.DEBUG)
try:
log.debug('Looking for log configuration file: %s' % cfg_file)
# Default logging configuration file
logging.config.fileConfig(cfg_file)
except Exception:
# Only add handler if not already done
if len(log.handlers) == 0:
# Hardcoded default logging configuration if no/bad config file
console_handler = logging.StreamHandler(sys.stdout)
fmt_str = "[%(asctime)s.%(msecs)03d %(name)s %(levelname)s] " \
"%(message)s"
console_handler.setFormatter(
logging.Formatter(fmt_str, "%Y-%m-%d %H:%M:%S"))
log.addHandler(console_handler)
log.setLevel(logging.DEBUG)
log.debug('Defaulting to stdout')
return log
log = logger_init(None)
# Class for keeping configuration parameters
# ------------------------------------------
class AlarmConfig():
"""
Class used to store parameters
"""
def __init__(self, code_dest_dir, config_dest_dir,
source_file, template, config_template):
self._code_dest_dir = code_dest_dir
self._config_dest_dir = config_dest_dir
self._source_file = source_file
self._template = template
self._config_template = config_template
self._sha256 = None
# Class for processing inotify events
# ------------------------------------
class InotifyEventsHandler(pyinotify.ProcessEvent):
"""
Class used to process inotify events.
"""
def my_init(self, cfg, name, out=None):
"""
@param cfg: configuration to use for generation callback.
@type cfg: AlarmConfig.
@param name: File name to be watched.
@type name: String.
@param out: Logger where events will be written.
@type out: Object providing a valid logging object interface.
"""
if out is None:
out = log
self._out = out
self._cfg = cfg
self._name = name
def process_default(self, event):
"""
Writes event string representation to logging object provided to
my_init().
@param event: Event to be processed. Can be of any type of events but
IN_Q_OVERFLOW events (see method process_IN_Q_OVERFLOW).
@type event: Event instance
"""
self._out.debug(
'Received event %s'
% str(event))
# File name on which inotify event has been triggered does
# not match => return right away
if event.name != self._name:
self._out.debug(
'Ignoring event %s (path does not match %s)'
% (str(event), self._name))
return
self._out.info('File %s has been updated' % event.name)
# Callback function called with proper parameters
if not yaml_alarms_2_lua_and_hindsight_cfg_files(
self._cfg
):
log.error('Error converting YAML alarms into LUA code')
# Check alarm entry for field existence and type
# TODO: see if we can use similar methods from
# fuel-ccp which uses jsonschema to validate types.
# -------------------------------------------------
def check_alarm_entry_field(alarm, field, ftype):
try:
akeys = alarm.keys()
# Field lookup
if field not in akeys:
log.error('Error parsing file: alarm entry does ' +
'not have a %s field: %s'
% (field, alarm))
return False
# Do we need to check for proper type too ?
if ftype is not None:
vfield = alarm[field]
vftype = type(vfield)
# Check for proper type
if vftype is not ftype:
log.error('Error parsing file: alarm entry field %s '
'is not of type %s: found %s [%s]'
% (field, ftype.__name__, vftype.__name__, alarm))
return False
except Exception as e:
log.error('Error checking for %s: %s' % (field, e))
return False
return True
# YAML alarms structure validation
#
# TODO: see if we can use similar methods from
# fuel-ccp which uses jsonschema to validate types.
#
# TODO: do not return False right away
# when processing lists, so that most errors
# are reported at once and correctness can be
# reached faster
# -------------------------------------------------
def validate_yaml(alarms_yaml):
log.info('Validating YAML alarms structure')
ctx = ''
try:
log.debug('Retrieving all alarms')
# Try to retrieve alarms definitions
# and check for overall validity
alarms = alarms_yaml['alarms']
if alarms is None:
log.error('Error parsing file: empty alarm list')
return False
# alarms entry should be a list
atype = type(alarms)
if atype is not list:
log.error('Error parsing file: alarms entry is not a list (%s)'
% atype.__name__)
return False
# Keep the complete list of alarm names
anames = []
# Checking all alarms
for alarm in alarms:
akeys = alarm.keys()
if not check_alarm_entry_field(alarm, 'name', str):
return False
# TODO do we need to add some more checks here ?
anames.append(alarm['name'])
log.debug('Found %d alarms' % len(anames))
# Try to retrieve alarms groups definitions
# and check overall validity
log.debug('Retrieving alarms groups')
cluster_alarms = alarms_yaml['node_cluster_alarms']
ckeys = cluster_alarms.keys()
for ckey in ckeys:
log.debug('Parsing alarms group %s' % ckey)
ctx = ' under node_cluster_alarms[%s]' % ckey
# Is there an 'alarms' key defined
# (if not, the next line raises an exception)
c_alarms = cluster_alarms[ckey]['alarms']
if c_alarms is None:
log.error('Error parsing file: empty alarm list%s' % ctx)
return False
# Now check validity of alarm entries
akeys = c_alarms.keys()
log.debug('Found %d alarms in group %s' % (len(akeys), ckey))
for k in akeys:
# Must be a list
v = c_alarms[k]
ktype = type(v)
if ktype is not list:
log.error(('Error parsing file: alarm entry for %s ' +
'is not a list (%s)%s')
% (k, ktype.__name__, ctx))
return False
# Each member of list must be a string
for s in v:
stype = type(s)
if stype is not str:
log.error(('Error parsing file: alarm entry for %s ' +
'is not a list of strings (%s) [%s]%s')
% (k, stype.__name__, s, ctx))
return False
# Now check that all alarms referenced in
# alarm groups have been defined
for agroup in c_alarms:
for aname in c_alarms[agroup]:
if aname not in anames:
log.error(
('Error parsing file: alarm with name %s is not ' +
'defined but is referenced in alarm group %s')
% (aname, agroup))
return False
except KeyError as e:
log.error('Error parsing file: can not find %s key%s' % (e, ctx))
return False
except Exception as e:
log.error('Error parsing file: unknown exception %s %s'
% (type(e), str(e)))
return False
return True
# Retrieve alarm by its name within list
# --------------------------------------
def find_alarm_by_name(aname, alarms):
for alarm in alarms:
if alarm['name'] == aname:
return alarm
return None
# Check file for content changes and return a boolean
# True => content has changed
# False => content is unchanged
#
# The file path can be altered using string substitutions
# so as to adapt to Hindsight's current running state,
# since files are moved around once taken into account
# ---------------------------------------------------
def content_changed(file_fullpath, file_content, replace=None):
log.debug(
'Checking file %s for changes'
% file_fullpath)
fullpath = file_fullpath
# Do we need to replace some parts of the path
if replace is not None:
for k in replace.keys():
fullpath = fullpath.replace(k, replace[k])
log.debug(
'Checking file path %s for changes'
% fullpath)
# File does not exist => needs to be created therefore
# content has changed
if not os.path.isfile(fullpath):
log.debug(
'File %s does not exist'
% fullpath)
return True
# Read the file content
with open(fullpath, 'r') as in_fd:
try:
old_content = in_fd.read()
# Compare former content to new one
if old_content == file_content:
log.debug(
'File %s content has not changed'
% fullpath)
return False
except Exception as e:
log.error(
'Error reading %s got exception: %s'
% (fullpath, e))
return True
log.debug(
'File %s content has changed'
% fullpath)
return True
# Convert YAML file containing alarms into lua code
# and create Hindsight configuration files
# -------------------------------------------------
def yaml_alarms_2_lua_and_hindsight_cfg_files(
alarm_config):
(lua_code_dest_dir,
lua_config_dest_dir,
yaml_file,
template,
cfg_template) = (alarm_config._code_dest_dir,
alarm_config._config_dest_dir,
alarm_config._source_file,
alarm_config._template,
alarm_config._config_template)
log.info(
'Converting alarm YAML file %s to LUA code in %s and configs in %s'
% (yaml_file, lua_code_dest_dir, lua_config_dest_dir))
try:
if os.stat(yaml_file).st_size == 0:
log.error('File %s will not be parsed: size = 0' % yaml_file)
return False
# Open file and retrieve YAML structure if correctly formed
with open(yaml_file, 'r') as in_fd:
try:
alarms_defs = in_fd.read()
sha256sum = hashlib.sha256(alarms_defs).hexdigest()
if sha256sum == alarm_config._sha256:
log.warning('No change detected in file: %s' % yaml_file)
return True
alarm_config._sha256 = sha256sum
alarms_yaml = yaml.load(alarms_defs)
except yaml.YAMLError as exc:
log.error('Error parsing file: %s' % exc)
return False
# Check overall validity of alarms definitions
if not validate_yaml(alarms_yaml):
log.error('Error validating alarms definitions')
return False
# Now retrieve the information for config and code files generation
cluster_alarms = alarms_yaml['node_cluster_alarms']
for afd_cluster_name in cluster_alarms:
for key in cluster_alarms[afd_cluster_name]['alarms'].keys():
# Keys cannot contain dashes or other non-alphanumeric characters
if not re.match('^[A-Za-z0-9]*$', key):
log.error('Alarm group name can only contain letters ' +
'and digits: %s'
% key)
return False
# Build list of associated alarms
alarms = []
for aname in cluster_alarms[afd_cluster_name]['alarms'][key]:
alarms.append(
find_alarm_by_name(
aname, alarms_yaml['alarms']))
# Write LUA code file
afd_file = 'afd_node_%s_%s_alarms' % (afd_cluster_name, key)
lua_code_dest_file = os.path.join(
lua_code_dest_dir, "%s.lua" % afd_file)
lua_code = template.render(alarms=alarms)
updated_lua_code = False
# Check if the generated code has changed
if not content_changed(lua_code_dest_file, lua_code):
log.info('Unchanged LUA file %s' % lua_code_dest_file)
else:
# LUA code changes should force re-generation of config
# file so as to force Hindsight to take changes into
# account
updated_lua_code = True
log.info('Writing LUA file: %s' % lua_code_dest_file)
# Produce LUA code file corresponding to alarm
with open(lua_code_dest_file, 'w') as out_fd:
try:
out_fd.write(lua_code)
except Exception as e:
log.error('Error writing %s: got exception: %s'
% (lua_code_dest_file, e))
return False
# Write LUA config file
afd_file = 'afd_node_%s_%s_alarms' % (afd_cluster_name, key)
lua_config_dest_file = os.path.join(
lua_config_dest_dir, "%s.cfg" % afd_file)
lua_config = cfg_template.render(
afd_file=afd_file,
afd_cluster_name=afd_cluster_name,
afd_logical_name=key
)
# Check if the generated config has changed
# or if we need to force config writing due to
# changes in LUA code above
#
# Note that config is written into .../load/...
# and moved to .../run/... by Hindsight
if (
not content_changed(
lua_config_dest_file,
lua_config,
{'/load/': '/run/'}) and
not updated_lua_code):
log.info('Unchanged config file %s' % lua_config_dest_file)
else:
log.info('Writing config file: %s' % lua_config_dest_file)
with open(lua_config_dest_file, 'w') as out_fd:
try:
out_fd.write(lua_config)
except Exception as e:
log.error('Error writing %s: got exception: %s'
% (lua_config_dest_file, e))
return False
except Exception as e:
log.error('Error got exception: %s' % e)
return False
return True
# Command line argument parsing
# -----------------------------
def cmd_line_args_parser():
parser = argparse.ArgumentParser(
description="""Alarm manager watches for new alarms definitions
in specified directory and applies them TBC ...
"""
)
parser.add_argument(
'-c', '--config',
help='log level and format configuration file (default %s)'
% dflt_config,
default=dflt_config,
dest='config'
)
parser.add_argument(
'-d', '--code-destdir',
help='destination path for LUA plugins code files ' +
'(default %s)' % dflt_dest_dir,
default=dflt_dest_dir,
dest='code_dest_dir'
)
parser.add_argument(
'-D', '--config-destdir',
help='destination path for LUA plugins configuration ' +
'files (default %s)' % dflt_cfg_dest_dir,
default=dflt_cfg_dest_dir,
dest='config_dest_dir'
)
parser.add_argument(
'-t', '--template',
help='LUA template file (default %s)' % dflt_template,
default=dflt_template,
dest='template'
)
parser.add_argument(
'-T', '--config-template',
help='LUA plugins configuration template file (default %s)' %
(dflt_cfg_template),
default=dflt_cfg_template,
dest='cfg_template'
)
parser.add_argument(
'-w', '--watch-path',
help='path to watch for changes (default %s)' %
(dflt_cfg_dir),
default=dflt_cfg_dir,
dest='watch_path'
)
parser.add_argument(
'-x', '--exit',
help='exit program without watching filesystem changes',
action='store_const',
const=True, default=False,
dest='exit'
)
args = parser.parse_args()
log = logger_init(args.config)
log.info('Watch path: %s\n\tConfig: %s\n\tTemplate: %s'
% (args.watch_path, args.config, args.template))
if (
not os.path.isdir(args.watch_path) or
not os.access(args.watch_path, os.R_OK)):
log.error("{} not a directory or is not readable"
.format(args.watch_path))
sys.exit(1)
if (
not os.path.isdir(args.code_dest_dir) or
not os.access(args.code_dest_dir, os.W_OK)):
log.error("{} not a directory or is not writable"
.format(args.code_dest_dir))
sys.exit(1)
if (
not os.path.isdir(args.config_dest_dir) or
not os.access(args.config_dest_dir, os.W_OK)):
log.error("{} not a directory or is not writable"
.format(args.config_dest_dir))
sys.exit(1)
if (
not os.path.isfile(args.template) or
not os.access(args.template, os.R_OK)):
log.error("{} not a file or is not readable".format(args.template))
sys.exit(1)
if (
not os.path.isfile(args.cfg_template) or
not os.access(args.cfg_template, os.R_OK)):
log.error("{} not a file or is not readable".format(args.cfg_template))
sys.exit(1)
src = os.path.join(args.watch_path, dflt_alarm_file)
log.info('Looking for existing readable file: %s' % src)
if os.access(src, os.R_OK):
log.info('Using LUA template %s and LUA config template %s'
% (args.template, args.cfg_template))
j2_env = jinja2.Environment(
loader=jinja2.FileSystemLoader(
os.path.dirname(
args.template)),
trim_blocks=True)
template = j2_env.get_template(
os.path.basename(
args.template))
j2_cfg_env = jinja2.Environment(
loader=jinja2.FileSystemLoader(
os.path.dirname(
args.cfg_template)),
trim_blocks=True)
cfg_template = j2_cfg_env.get_template(
os.path.basename(
args.cfg_template))
alarm_cfg = AlarmConfig(
args.code_dest_dir,
args.config_dest_dir,
src,
template,
cfg_template)
if not yaml_alarms_2_lua_and_hindsight_cfg_files(
alarm_cfg
):
log.error('Error converting YAML alarms into LUA code')
# Asked to leave right away or continue watching inotify events ?
if args.exit:
sys.exit(0)
# watch manager instance
wm = pyinotify.WatchManager()
# notifier instance and init
notifier = pyinotify.Notifier(
wm,
default_proc_fun=InotifyEventsHandler(
cfg=alarm_cfg,
name=dflt_alarm_file))
# What mask to apply
mask = pyinotify.IN_CLOSE_WRITE
log.debug('Start monitoring of %s' % args.watch_path)
# Do not recursively dive into path
# Do not add watches on newly created subdir in path
# Do not do globbing on path name
wm.add_watch(args.watch_path,
mask, rec=False,
auto_add=False,
do_glob=False)
# Loop forever (until sigint signal get caught)
notifier.loop(callback=None)
if __name__ == '__main__':
cmd_line_args_parser()
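
The checks in validate_yaml() and the group-name test above imply a small schema for alarms.yaml. A minimal Python sketch of a document that passes every check; the alarm and group names are hypothetical:

import yaml

doc = yaml.safe_load("""
alarms:
  - name: CpuUsage
node_cluster_alarms:
  controller:
    alarms:
      cpu: [CpuUsage]
""")

# Top-level 'alarms' must be a list of entries carrying a string 'name'.
names = [a['name'] for a in doc['alarms']]
# Each group under 'node_cluster_alarms' has an 'alarms' mapping whose keys
# are purely alphanumeric and whose values list defined alarm names.
for group in doc['node_cluster_alarms'].values():
    for key, refs in group['alarms'].items():
        assert key.isalnum()
        assert all(isinstance(r, str) and r in names for r in refs)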

@@ -1,22 +0,0 @@
[loggers]
keys=root
[handlers]
keys=stream_handler
[formatters]
keys=formatter
[logger_root]
level=DEBUG
handlers=stream_handler
[handler_stream_handler]
class=StreamHandler
level=DEBUG
formatter=formatter
args=(sys.stdout,)
[formatter_formatter]
format=%(asctime)s.%(msecs)03d - %(name)s - %(levelname)s - %(message)s
datefmt=%Y-%m-%d %H:%M:%S
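
This is a stock logging.config.fileConfig layout; a minimal sketch of loading it, assuming it sits at the default path used by alarm-manager.py:

import logging.config

logging.config.fileConfig('/etc/alarm-manager/config/alarm-manager.ini')
logging.getLogger().debug('alarm-manager logging configured')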

@@ -1,41 +0,0 @@
local M = {}
setfenv(1, M) -- Remove external access to contain everything in the module
local alarms = {
{% for alarm in alarms %}
{
{% for fkey in alarm.keys()|sort() %}
{% if fkey != "trigger" %}
['{{ fkey }}'] = '{{ alarm[fkey] }}',
{% endif %}
{% endfor %}
{% if alarm.trigger is defined %}
['trigger'] = {
{% if alarm.trigger.logical_operator is defined %}
['logical_operator'] = '{{ alarm.trigger.logical_operator }}',
{% endif %}
['rules'] = {
{% for rule in alarm.trigger.rules %}
{
{% for fkey in rule.keys()|sort() %}
{% if fkey != "fields" %}
['{{ fkey }}'] = '{{ rule[fkey] }}',
{% endif %}
{% endfor %}
{% if rule.fields is defined %}
['fields'] = {
{% for fkey in rule.fields.keys() %}
['{{ fkey }}'] = '{{ rule.fields[fkey] }}'
{% endfor %}
},
{% endif %}
},
{% endfor %}
},
},
{% endif %}
},
{% endfor %}
}
return alarms
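
A minimal sketch of rendering this template the way alarm-manager.py does, using its default template directory; the alarm fields here are hypothetical:

import jinja2

env = jinja2.Environment(
    loader=jinja2.FileSystemLoader('/etc/alarm-manager/templates'),
    trim_blocks=True)
template = env.get_template('lua_alarming_template.j2')
print(template.render(alarms=[{'name': 'CpuUsage', 'severity': 'warning'}]))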

@@ -1,6 +0,0 @@
# The order of packages is significant, because pip processes them in the order
# of appearance. Changing the order has an impact on the overall integration
# process, which may cause wedges in the gate later.
pyinotify>=0.9.6;sys_platform!='win32' and sys_platform!='darwin' and sys_platform!='sunos5' # MIT
PyYAML>=3.1.0 # MIT
Jinja2>=2.8 # BSD License (3 clause)

@@ -1,26 +0,0 @@
FROM {{ image_spec("base-tools") }}
MAINTAINER {{ maintainer }}
RUN apt-get -y -t jessie-backports --no-install-recommends install golang \
&& apt-get clean
# ReplaceMe with a heka package install
COPY install-heka.sh /tmp/
RUN mkdir -p /var/cache/hekad /usr/share/heka/lua_modules /etc/heka
RUN bash -x /tmp/install-heka.sh
# Add this to heka package?
COPY plugins/modules /usr/share/heka/lua_modules/
COPY plugins/decoders /usr/share/heka/lua_decoders/
COPY plugins/encoders /usr/share/heka/lua_encoders/
RUN useradd --user-group heka \
&& usermod -a -G microservices heka \
&& chown -R heka: /usr/share/heka /etc/heka /var/cache/hekad
# https://github.com/mozilla-services/heka/issues/1881
ENV GODEBUG cgocheck=0
# We need to mount docker.sock for the docker plugin. And this socket needs
# docker group or root user permissions.
#USER heka

@@ -1,4 +0,0 @@
%microservices ALL=(root) NOPASSWD: /bin/chown heka\:microservices /var/log/microservices, /usr/bin/chown heka\:microservices /var/log/microservices
%microservices ALL=(root) NOPASSWD: /bin/chmod 2775 /var/log/microservices, /usr/bin/chmod 2775 /var/log/microservices
%microservices ALL=(root) NOPASSWD: /bin/chown heka\: /var/cache/hekad, /usr/bin/chown heka\: /var/cache/hekad
%microservices ALL=(root) NOPASSWD: /bin/chown heka\:microservices /var/lib/microservices/heka, /usr/bin/chown heka\:microservices /var/lib/microservices/heka

@@ -1,36 +0,0 @@
#!/bin/bash
set -e
PLUGINDIR="$1"
export GOPATH="/go"
mkdir -p "$GOPATH/src" "$GOPATH/bin"
chmod -R 777 "$GOPATH"
export PATH=/usr/local/go/bin:$GOPATH/bin/:$PATH
echo "Get system dependencies..."
BUILD_DEPS="git gcc g++ libc6-dev make cmake debhelper fakeroot patch"
apt-get update
apt-get install -y --no-install-recommends $BUILD_DEPS
echo "Get and build Heka..."
cd /tmp
git clone -b dev --single-branch https://github.com/mozilla-services/heka
cd heka
touch message/message.pb.go # make sure message/message.pb.go has a date
# more recent than message/message.proto, to
# prevent make from attempting to re-generate
# message.pb.go
source build.sh # changes GOPATH to /tmp/heka/build/heka and builds Heka
install -vD /tmp/heka/build/heka/bin/* /usr/local/bin/
cp -rp /tmp/heka/build/heka/lib/lib* /usr/lib/
cp -rp /tmp/heka/build/heka/lib/luasandbox/modules/* /usr/share/heka/lua_modules/
echo "Clean up..."
apt-get purge -y --auto-remove $BUILD_DEPS
apt-get clean
rm -rf /tmp/heka
rm -rf /var/lib/apt/lists/*
rm -rf $GOPATH

@@ -1,100 +0,0 @@
-- Copyright 2015-2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
local l = require 'lpeg'
l.locale(l)
local dt = require "date_time"
local common_log_format = require 'common_log_format'
local patt = require 'os_patterns'
local utils = require 'os_utils'
local msg = {
Timestamp = nil,
Type = 'log',
Hostname = nil,
Payload = nil,
Pid = nil,
Fields = nil,
Severity = nil,
}
local severity_label = utils.severity_to_label_map[msg.Severity]
local access_log_pattern = read_config("access_log_pattern") or error(
"access_log_pattern configuration must be specificed")
local access_log_grammar = common_log_format.build_apache_grammar(access_log_pattern)
local request_grammar = l.Ct(patt.http_request)
-- Since "common_log_format.build_apache_grammar", doesnt support ErrorLogFormat,
-- we have to create error log grammar by ourself. Example error log string:
-- 2016-08-15 10:46:27.679999 wsgi:error 340:140359488239360 Not Found: /favicon.ico
local sp = patt.sp
local colon = patt.colon
local p_timestamp = l.Cg(l.Ct(dt.rfc3339_full_date * (sp + l.P"T") * dt.rfc3339_partial_time * (dt.rfc3339_time_offset + dt.timezone_offset)^-1), "Timestamp")
local p_module = l.Cg(l.R("az")^0, "Module")
local p_errtype = l.Cg(l.R("az")^0, "ErrorType")
local p_pid = l.Cg(l.digit^-5, "Pid")
local p_tid = l.Cg(l.digit^-15, "TreadID")
local p_mess = l.Cg(patt.Message, "Message")
local error_log_grammar = l.Ct(p_timestamp * sp * p_module * colon * p_errtype * sp * p_pid * colon * p_tid * sp * p_mess)
function prepare_message (timestamp, pid, severity, severity_label, programname, payload)
msg.Logger = 'openstack.horizon-apache'
msg.Payload = payload
msg.Timestamp = timestamp
msg.Pid = pid
msg.Severity = severity
msg.Fields = {}
msg.Fields.programname = programname
msg.Fields.severity_label = severity_label
end
function process_message ()
-- logger is either "horizon-access" or "horizon-error"
local logger = read_message("Logger")
local log = read_message("Payload")
local m
if logger == "horizon-access" then
m = access_log_grammar:match(log)
if m then
prepare_message(m.Timestamp, m.Pid, "6", "INFO", logger, log)
msg.Fields.http_status = m.status
msg.Fields.http_response_time = m.request_time.value / 1e6 -- us to sec
local request = m.request
r = request_grammar:match(request)
if r then
msg.Fields.http_method = r.http_method
msg.Fields.http_url = r.http_url
msg.Fields.http_version = r.http_version
end
else
return -1, string.format("Failed to parse %s log: %s", logger, string.sub(log, 1, 64))
end
elseif logger == "horizon-error" then
m = error_log_grammar:match(log)
if m then
prepare_message(m.Timestamp, m.Pid, "3", "ERROR", logger, m.Message)
else
return -1, string.format("Failed to parse %s log: %s", logger, string.sub(log, 1, 64))
end
else
error("Logger unknown")
end
return utils.safe_inject_message(msg)
end
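
For illustration only (the plugin itself uses the LPEG grammar above), a Python regex that decomposes the sample error-log line from the comment into the same captures:

import re

line = ('2016-08-15 10:46:27.679999 wsgi:error '
        '340:140359488239360 Not Found: /favicon.ico')
m = re.match(r'(?P<Timestamp>\S+ \S+) (?P<Module>[a-z]+):(?P<ErrorType>[a-z]+) '
             r'(?P<Pid>\d+):(?P<ThreadID>\d+) (?P<Message>.*)', line)
print(m.groupdict())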

@@ -1,72 +0,0 @@
-- Copyright 2015-2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
local l = require 'lpeg'
l.locale(l)
local common_log_format = require 'common_log_format'
local patt = require 'os_patterns'
local utils = require 'os_utils'
local msg = {
Timestamp = nil,
Type = 'log',
Hostname = nil,
Payload = nil,
Pid = nil,
Fields = nil,
Severity = 6,
}
local severity_label = utils.severity_to_label_map[msg.Severity]
local apache_log_pattern = read_config("apache_log_pattern") or error(
"apache_log_pattern configuration must be specificed")
local apache_grammar = common_log_format.build_apache_grammar(apache_log_pattern)
local request_grammar = l.Ct(patt.http_request)
function process_message ()
-- logger is either "keystone-apache-public" or "keystone-apache-admin"
local logger = read_message("Logger")
local log = read_message("Payload")
local m
m = apache_grammar:match(log)
if m then
msg.Logger = 'openstack.keystone-apache'
msg.Payload = log
msg.Timestamp = m.time
msg.Fields = {}
msg.Fields.http_status = m.status
msg.Fields.http_response_time = m.request_time.value / 1e6 -- us to sec
msg.Fields.programname = logger
msg.Fields.severity_label = severity_label
local request = m.request
m = request_grammar:match(request)
if m then
msg.Fields.http_method = m.http_method
msg.Fields.http_url = m.http_url
msg.Fields.http_version = m.http_version
end
return utils.safe_inject_message(msg)
end
return -1, string.format("Failed to parse %s log: %s", logger, string.sub(log, 1, 64))
end
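
request_grammar pulls the method, URL and HTTP version out of the request string; an illustrative regex equivalent, using the sample request quoted in os_patterns.lua:

import re

request = 'OPTIONS /example.com HTTP/1.0'
m = re.match(r'(?P<http_method>[A-Z]{3,}) (?P<http_url>\S+) '
             r'HTTP/(?P<http_version>\d\.\d)', request)
print(m.groupdict())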

@@ -1,62 +0,0 @@
-- Copyright 2015-2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
require "string"
local dt = require "date_time"
local l = require 'lpeg'
l.locale(l)
local patt = require 'os_patterns'
local utils = require 'os_utils'
local msg = {
Timestamp = nil,
Type = 'log',
Hostname = nil,
Payload = nil,
Pid = nil,
Fields = {
programname = 'mysql',
severity_label = nil,
},
Severity = nil,
}
-- mysqld logs are cranky, the hours have no leading zero and the "real" severity level is enclosed by square brackets...
-- 2016-07-28 11:09:24 139949080807168 [Note] InnoDB: Dumping buffer pool(s) not yet started
-- Different pieces of pattern
local sp = patt.sp
local colon = patt.colon
local p_timestamp = l.Cg(dt.rfc3339_full_date * sp^1 * dt.rfc3339_partial_time, "Timestamp")
local p_thread_id = l.digit^-15
local p_severity_label = l.P"[" * l.Cg(l.R("az", "AZ")^0 / string.upper, "SeverityLabel") * l.P"]"
local p_message = l.Cg(patt.Message, "Message")
local mysql_grammar = l.Ct(p_timestamp * sp^1 * p_thread_id * sp^1 * p_severity_label * sp^1 * p_message)
function process_message ()
local log = read_message("Payload")
local logger = read_message("Logger")
local m = mysql_grammar:match(log)
if not m then return -1 end
msg.Timestamp = m.Timestamp
msg.Logger = logger
msg.Payload = m.Message
msg.Fields.severity_label = m.SeverityLabel
return utils.safe_inject_message(msg)
end
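
For illustration, the sample mysqld line from the comment decomposed with a Python regex the way mysql_grammar does (note the one- or two-digit hour):

import re

line = ('2016-07-28 11:09:24 139949080807168 '
        '[Note] InnoDB: Dumping buffer pool(s) not yet started')
m = re.match(r'(?P<Timestamp>\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}) '
             r'(?P<ThreadId>\d+) \[(?P<SeverityLabel>\w+)\] (?P<Message>.*)', line)
print(m.group('SeverityLabel').upper(), m.group('Message'))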

@@ -1,164 +0,0 @@
-- Copyright 2015-2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
require "string"
require "table"
local l = require 'lpeg'
l.locale(l)
local patt = require 'os_patterns'
local utils = require 'os_utils'
local service_pattern = read_config("heka_service_pattern") or
error('heka_service_pattern must be specified')
local msg = {
Timestamp = nil,
Type = 'log',
Hostname = nil,
Payload = nil,
Pid = nil,
Fields = nil,
Severity = nil,
}
-- traceback_lines is a reference to a table used to accumulate lines of
-- a Traceback. traceback_key represents the key of the Traceback lines
-- being accumulated in traceback_lines. This is used to know when to
-- stop accumulating and inject the Heka message.
local traceback_key = nil
local traceback_lines = nil
function prepare_message (service, timestamp, pid, severity_label,
python_module, programname, cont_name, payload)
msg.Logger = 'openstack.' .. service
msg.Timestamp = timestamp
msg.Payload = payload
msg.Pid = pid
msg.Severity = utils.label_to_severity_map[severity_label] or 7
msg.Fields = {}
msg.Fields.severity_label = severity_label
msg.Fields.python_module = python_module
msg.Fields.programname = programname
msg.Fields.container_name = cont_name
msg.Payload = payload
end
-- OpenStack log messages are of this form:
-- 2015-11-30 08:38:59.306 3434 INFO oslo_service.periodic_task [-] Blabla...
--
-- [-] is the "request" part, it can take multiple forms.
function process_message ()
local cont_name = read_message("Fields[ContainerName]")
local program = string.match(cont_name, service_pattern)
local service = nil
if program == nil then
program = "unknown_program"
else
service = string.match(program, '(.-)%-.*')
--- Most of the OS services should match the pattern, e.g. "nova-api",
--- but some of them don't, e.g. "keystone".
if service == nil then
service = string.match(program, '(%a+)')
end
end
--- If service is still nil, we failed to match the current service
--- using both patterns, so we fall back to a default.
if service == nil then
service = "unknown_service"
end
local log = read_message("Payload")
local m
m = patt.openstack:match(log)
if not m then
return -1 --, string.format("Failed to parse %s log: %s", logger, string.sub(log, 1, 64))
end
-- You could debug something here using this:
-- add_to_payload(string.format("Debug: %s\n", VAR))
-- inject_payload("txt", "debug")
local key = {
Timestamp = m.Timestamp,
Pid = m.Pid,
SeverityLabel = m.SeverityLabel,
PythonModule = m.PythonModule,
service = service,
program = program,
}
if traceback_key ~= nil then
-- If traceback_key is not nil then it means we've started accumulating
-- lines of a Python traceback. We keep accumulating the traceback
-- lines until we get a different log key.
if utils.table_equal(traceback_key, key) then
table.insert(traceback_lines, m.Message)
return 0
else
prepare_message(traceback_key.service, traceback_key.Timestamp,
traceback_key.Pid, traceback_key.SeverityLabel,
traceback_key.PythonModule, traceback_key.program,
cont_name, table.concat(traceback_lines, ''))
traceback_key = nil
traceback_lines = nil
-- Ignore safe_inject_message status code here to still get a
-- chance to inject the current log message.
utils.safe_inject_message(msg)
end
end
if patt.traceback:match(m.Message) then
-- Python traceback detected, begin accumulating the lines making
-- up the traceback.
traceback_key = key
traceback_lines = {}
table.insert(traceback_lines, m.Message)
return 0
end
prepare_message(service, m.Timestamp, m.Pid, m.SeverityLabel, m.PythonModule,
program, cont_name, m.Message)
m = patt.openstack_request_context:match(msg.Payload)
if m then
msg.Fields.request_id = m.RequestId
if m.UserId then
msg.Fields.user_id = m.UserId
end
if m.TenantId then
msg.Fields.tenant_id = m.TenantId
end
end
m = patt.openstack_http:match(msg.Payload)
if m then
msg.Fields.http_method = m.http_method
msg.Fields.http_status = m.http_status
msg.Fields.http_url = m.http_url
msg.Fields.http_version = m.http_version
msg.Fields.http_response_size = m.http_response_size
msg.Fields.http_response_time = m.http_response_time
m = patt.ip_address:match(msg.Payload)
if m then
msg.Fields.http_client_ip_address = m.ip_address
end
end
return utils.safe_inject_message(msg)
end

@@ -1,87 +0,0 @@
-- Copyright 2015-2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
require "string"
local l = require 'lpeg'
l.locale(l)
local dt = require "date_time"
local patt = require 'os_patterns'
local utils = require 'os_utils'
local msg = {
Timestamp = nil,
Type = 'log',
Hostname = nil,
Payload = nil,
Fields = {},
Severity = nil,
}
-- OVS logs look like this:
-- 2016-08-10T09:27:41Z|00038|connmgr|INFO|br-ex<->tcp:127.0.0.1:6633: 2 flow_mods 10 s ago (2 adds)
-- Different pieces of pattern
local sp = patt.sp
local colon = patt.colon
local pipe = patt.pipe
local dash = patt.dash
local p_timestamp = l.Cg(l.Ct(dt.rfc3339_full_date * (sp + l.P"T") * dt.rfc3339_partial_time * (dt.rfc3339_time_offset + dt.timezone_offset)^-1), "Timestamp")
local p_id = l.Cg(l.digit^-5, "Message_ID")
local p_module = l.Cg(l.R("az")^0, "Module")
local p_severity_label = l.Cg(l.R("AZ")^0, "SeverityLabel")
local p_message = l.Cg(patt.Message, "Message")
local ovs_grammar = l.Ct(p_timestamp * pipe * p_id * pipe * p_module * pipe * p_severity_label * pipe * p_message)
local pattern = read_config("heka_service_pattern") or "^k8s_(.-)%..*"
function process_message ()
local cont_name = read_message("Fields[ContainerName]")
local program = string.match(cont_name, pattern)
local service = nil
if program == nil then
program = "unknown_program"
else
service = string.match(program, '(.-)%-.*')
end
--- If service is still nil, we failed to match the current service
--- using both patterns, so we fall back to a default.
if service == nil then
service = "unknown_service"
end
local log = read_message("Payload")
local m = ovs_grammar:match(log)
if not m then return -1 end
if m.SeverityLabel == "WARN" then
m.SeverityLabel = "WARNING"
end
msg.Timestamp = m.Timestamp
msg.Logger = service
msg.Payload = m.Message
msg.Severity = utils.label_to_severity_map[m.SeverityLabel] or 7
msg.Fields.module = m.Module
msg.Fields.message_id = m.Message_ID
msg.Fields.programname = program
msg.Fields.container_name = cont_name
msg.Fields.severity_label = m.SeverityLabel
return utils.safe_inject_message(msg)
end
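
The pipe-delimited OVS format makes the grammar's captures easy to see; an illustrative split of the sample line from the comment:

line = ('2016-08-10T09:27:41Z|00038|connmgr|INFO|'
        'br-ex<->tcp:127.0.0.1:6633: 2 flow_mods 10 s ago (2 adds)')
timestamp, message_id, module, severity, message = line.split('|', 4)
print(timestamp, message_id, module, severity, message, sep='\n')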

@@ -1,73 +0,0 @@
-- Copyright 2015-2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
local dt = require "date_time"
local l = require 'lpeg'
l.locale(l)
local patt = require 'os_patterns'
local utils = require 'os_utils'
local msg = {
Timestamp = nil,
Type = 'log',
Hostname = nil,
Payload = nil,
Pid = nil,
Fields = {
programname = 'rabbitmq',
severity_label = nil,
},
Severity = nil,
}
-- RabbitMQ message logs are formatted like this:
-- =ERROR REPORT==== 2-Jan-2015::09:17:22 ===
-- Blabla
-- Blabla
--
local message = l.Cg(patt.Message / utils.chomp, "Message")
-- The token before 'REPORT' isn't standardized, so it can be a valid severity
-- level such as 'INFO' or 'ERROR', but also 'CRASH' or 'SUPERVISOR'.
local severity = l.Cg(l.R"AZ"^1, "SeverityLabel")
local day = l.R"13" * l.R"09" + l.R"19"
local datetime = l.Cg(day, "day") * patt.dash * dt.date_mabbr * patt.dash * dt.date_fullyear *
"::" * dt.rfc3339_partial_time
local timestamp = l.Cg(l.Ct(datetime)/ dt.time_to_ns, "Timestamp")
local grammar = l.Ct("=" * severity * " REPORT==== " * timestamp * " ===" * l.P'\n' * message)
function process_message ()
local log = read_message("Payload")
local m = grammar:match(log)
if not m then
return -1
end
msg.Timestamp = m.Timestamp
msg.Payload = m.Message
msg.Logger = read_message("Logger")
if utils.label_to_severity_map[m.SeverityLabel] then
msg.Severity = utils.label_to_severity_map[m.SeverityLabel]
elseif m.SeverityLabel == 'CRASH' then
msg.Severity = 2 -- CRITICAL
else
msg.Severity = 5 -- NOTICE
end
msg.Fields.severity_label = utils.severity_to_label_map[msg.Severity]
return utils.safe_inject_message(msg)
end
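
A sketch of the severity fallback above; the map mirrors label_to_severity_map from os_utils.lua:

label_to_severity = {'EMERGENCY': 0, 'ALERT': 1, 'CRITICAL': 2, 'ERROR': 3,
                     'WARNING': 4, 'NOTICE': 5, 'INFO': 6, 'DEBUG': 7}

def rabbit_severity(label):
    # Known labels map straight through; 'CRASH' becomes CRITICAL and any
    # other report token (e.g. 'SUPERVISOR') falls back to NOTICE.
    if label in label_to_severity:
        return label_to_severity[label]
    return 2 if label == 'CRASH' else 5

print(rabbit_severity('ERROR'), rabbit_severity('CRASH'), rabbit_severity('SUPERVISOR'))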

@@ -1,49 +0,0 @@
-- Copyright 2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
--
-- The code in this file was inspired by Heka's rsyslog.lua decoder plugin.
-- https://github.com/mozilla-services/heka/blob/master/sandbox/lua/decoders/rsyslog.lua
local syslog = require "syslog"
local utils = require "os_utils"
local msg = {
Timestamp = nil,
Type = 'log',
Hostname = read_config("hostname"),
Payload = nil,
Pid = nil,
Severity = nil,
Fields = nil
}
-- See https://github.com/openstack/swift/blob/2a8b455/swift/common/utils.py#L1423-L1424
local swift_grammar = syslog.build_rsyslog_grammar('<%PRI%>%programname%: %msg%')
function process_message ()
local log = read_message("Payload")
local fields = swift_grammar:match(log)
if not fields then return -1 end
msg.Severity = fields.pri.severity
fields.syslogfacility = fields.pri.facility
fields.pri = nil
msg.Payload = fields.msg
fields.msg = nil
msg.Fields = fields
return utils.safe_inject_message(msg)
end
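
fields.pri carries the raw syslog PRI value; per RFC 3164 it decomposes as facility * 8 + severity, e.g.:

pri = 134  # "<134>" = facility 16 (local0), severity 6 (INFO)
facility, severity = divmod(pri, 8)
print(facility, severity)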

@@ -1,55 +0,0 @@
-- Copyright 2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
--
-- The code in this file was inspired by Heka's rsyslog.lua decoder plugin.
-- https://github.com/mozilla-services/heka/blob/master/sandbox/lua/decoders/rsyslog.lua
local syslog = require "syslog"
local utils = require "os_utils"
local msg = {
Timestamp = nil,
Type = 'log',
Hostname = read_config("hostname"),
Payload = nil,
Pid = nil,
Severity = nil,
Fields = nil
}
-- See https://tools.ietf.org/html/rfc3164
local grammar = syslog.build_rsyslog_grammar('<%PRI%>%TIMESTAMP% %syslogtag% %msg%')
function process_message ()
local log = read_message("Payload")
local fields = grammar:match(log)
if not fields then return -1 end
msg.Timestamp = fields.timestamp
fields.timestamp = nil
msg.Severity = fields.pri.severity
fields.syslogfacility = fields.pri.facility
fields.pri = nil
fields.programname = fields.syslogtag.programname
msg.Pid = fields.syslogtag.pid
fields.syslogtag = nil
msg.Payload = fields.msg
fields.msg = nil
msg.Fields = fields
return utils.safe_inject_message(msg)
end

@@ -1,26 +0,0 @@
-- Copyright 2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
require "string"
local interpolate = require "msg_interpolate"
local utils = require "os_utils"
local header_template = "<%{Severity}>%{%FT%TZ} %{Hostname} %{programname}[%{Pid}]:"
function process_message()
local timestamp = read_message("Timestamp") / 1e9
local header = interpolate.interpolate_from_msg(header_template, timestamp)
local payload = string.format("%s %s\n", header, read_message("Payload"))
return utils.safe_inject_payload("txt", "", payload)
end
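
The header template expands to a classic syslog-style prefix; an illustrative rendering with hypothetical field values (%FT%TZ is strftime shorthand for %Y-%m-%dT%H:%M:%SZ):

import time

ts = time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime(0))
print('<%d>%s %s %s[%d]: GET /healthcheck'
      % (6, ts, 'node-1', 'nova-api', 1234))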

@@ -1,145 +0,0 @@
-- Copyright 2015-2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
local table = require 'table'
local dt = require "date_time"
local l = require 'lpeg'
l.locale(l)
local tonumber = tonumber
local M = {}
setfenv(1, M) -- Remove external access to contain everything in the module
function format_uuid(t)
return table.concat(t, '-')
end
function anywhere (patt)
return l.P {
patt + 1 * l.V(1)
}
end
sp = l.space
colon = l.P":"
dash = l.P"-"
dot = l.P'.'
quote = l.P'"'
pipe = l.P'|'
local x4digit = l.xdigit * l.xdigit * l.xdigit * l.xdigit
local uuid_dash = l.C(x4digit * x4digit * dash * x4digit * dash * x4digit * dash * x4digit * dash * x4digit * x4digit * x4digit)
local uuid_nodash = l.Ct(l.C(x4digit * x4digit) * l.C(x4digit) * l.C(x4digit) * l.C(x4digit) * l.C(x4digit * x4digit * x4digit)) / format_uuid
-- Return a UUID string in canonical format (eg with dashes)
Uuid = uuid_nodash + uuid_dash
-- Parse a datetime string and return a table with the following keys
-- year (string)
-- month (string)
-- day (string)
-- hour (string)
-- min (string)
-- sec (string)
-- sec_frac (number less than 1, can be nil)
-- offset_sign ('-' or '+', can be nil)
-- offset_hour (number, can be nil)
-- offset_min (number, can be nil)
--
-- The datetime string can be formatted as
-- 'YYYY-MM-DD( |T)HH:MM:SS(.ssssss)?(offset indicator)?'
TimestampTable = l.Ct(dt.rfc3339_full_date * (sp + l.P"T") * dt.rfc3339_partial_time * (dt.rfc3339_time_offset + dt.timezone_offset)^-1)
-- Returns the parsed datetime converted to nanosec
Timestamp = TimestampTable / dt.time_to_ns
programname = (l.R("az", "AZ", "09") + l.P"." + dash + l.P"_")^1
Pid = l.digit^1
SeverityLabel = l.P"CRITICAL" + l.P"ERROR" + l.P"WARNING" + l.P"INFO" + l.P"AUDIT" + l.P"DEBUG"
Message = l.P(1)^0
-- Capture for OpenStack logs producing four values: Timestamp, Pid,
-- SeverityLabel, PythonModule and Message.
--
-- OpenStack log messages are of this form:
-- 2015-11-30 08:38:59.306 3434 INFO oslo_service.periodic_task [-] Blabla...
--
-- [-] is the "request" part, it can take multiple forms. See below.
openstack = l.Ct(l.Cg(Timestamp, "Timestamp")* sp * l.Cg(Pid, "Pid") * sp *
l.Cg(SeverityLabel, "SeverityLabel") * sp * l.Cg(programname, "PythonModule") *
sp * l.Cg(Message, "Message"))
-- Capture for OpenStack request context producing three values: RequestId,
-- UserId and TenantId.
--
-- Notes:
--
-- OpenStack logs include a request context, enclosed between square brackets.
-- It takes one of these forms:
--
-- [-]
-- [req-0fd2a9ba-448d-40f5-995e-33e32ac5a6ba - - - - -]
-- [req-4db318af-54c9-466d-b365-fe17fe4adeed 8206d40abcc3452d8a9c1ea629b4a8d0 112245730b1f4858ab62e3673e1ee9e2 - - -]
--
-- In the 1st case the capture produces nil.
-- In the 2nd case the capture produces one value: RequestId.
-- In the 3rd case the capture produces three values: RequestId, UserId, TenantId.
--
-- The request id may be formatted as 'req-xxx' or 'xxx' depending on the project.
-- The user id and tenant id may not be present depending on the OpenStack release.
openstack_request_context = (l.P(1) - "[" )^0 * "[" * l.P"req-"^-1 *
l.Ct(l.Cg(Uuid, "RequestId") * sp * ((l.Cg(Uuid, "UserId") * sp *
l.Cg(Uuid, "TenantId")) + l.P(1)^0)) - "]"
local http_method = l.Cg(l.R"AZ"^3, "http_method")
local url = l.Cg( (1 - sp)^1, "http_url")
local http_version = l.Cg(l.digit * dot * l.digit, "http_version")
-- Pattern for the "<http_method> <http_url> HTTP/<http_version>" format
-- found in both OpenStack and Apache log files.
-- Example: OPTIONS /example.com HTTP/1.0
http_request = http_method * sp * url * sp * l.P'HTTP/' * http_version
-- Patterns for HTTP status, HTTP response size and HTTP response time in
-- OpenStack logs.
--
-- Notes:
-- Nova changes the default log format of eventlet.wsgi (see nova/wsgi.py) and
-- prefixes the HTTP status, response size and response time values with
-- respectively "status: ", "len: " and "time: ".
-- Other OpenStack services just rely on the default log format.
-- TODO(pasquier-s): build the LPEG grammar based on the log_format parameter
-- passed to eventlet.wsgi.server similar to what the build_rsyslog_grammar
-- function does for RSyslog.
local openstack_http_status = l.P"status: "^-1 * l.Cg(l.digit^3, "http_status")
local openstack_response_size = l.P"len: "^-1 * l.Cg(l.digit^1 / tonumber, "http_response_size")
local openstack_response_time = l.P"time: "^-1 * l.Cg(l.digit^1 * dot^0 * l.digit^0 / tonumber, "http_response_time")
-- Capture for OpenStack HTTP producing six values: http_method, http_url,
-- http_version, http_status, http_response_size and http_response_time.
openstack_http = anywhere(l.Ct(
quote * http_request * quote * sp *
openstack_http_status * sp * openstack_response_size * sp *
openstack_response_time
))
-- Capture for IP addresses producing one value: ip_address.
ip_address = anywhere(l.Ct(
l.Cg(l.digit^-3 * dot * l.digit^-3 * dot * l.digit^-3 * dot * l.digit^-3, "ip_address")
))
-- Pattern used to match the beginning of a Python Traceback.
traceback = l.P'Traceback (most recent call last):'
return M
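-- A standalone usage sketch (not part of the original file). It assumes
-- this module is importable as 'os_patterns' and that lpeg and the
-- sandbox 'date_time' module are available:
--
--   local patt = require 'os_patterns'
--   local line = '2015-11-30 08:38:59.306 3434 INFO oslo_service.periodic_task [-] Blabla...'
--   local m = patt.openstack:match(line)
--   -- m.Timestamp (nanoseconds), m.Pid == '3434', m.SeverityLabel == 'INFO',
--   -- m.PythonModule == 'oslo_service.periodic_task', m.Message == '[-] Blabla...'
--
--   -- A dashless UUID is normalized to the canonical format:
--   patt.Uuid:match('0fd2a9ba448d40f5995e33e32ac5a6ba')
--   -- -> '0fd2a9ba-448d-40f5-995e-33e32ac5a6ba'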

View File

@ -1,89 +0,0 @@
-- Copyright 2015-2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
local cjson = require 'cjson'
local string = require 'string'
local patt = require 'os_patterns'
local pairs = pairs
local inject_message = inject_message
local inject_payload = inject_payload
local read_message = read_message
local pcall = pcall
local M = {}
setfenv(1, M) -- Remove external access to contain everything in the module
severity_to_label_map = {
[0] = 'EMERGENCY',
[1] = 'ALERT',
[2] = 'CRITICAL',
[3] = 'ERROR',
[4] = 'WARNING',
[5] = 'NOTICE',
[6] = 'INFO',
[7] = 'DEBUG',
}
label_to_severity_map = {
EMERGENCY = 0,
ALERT = 1,
CRITICAL = 2,
ERROR = 3,
WARNING = 4,
NOTICE = 5,
INFO = 6,
DEBUG = 7,
}
function chomp(s)
return string.gsub(s, "\n$", "")
end
-- Call inject_message() wrapped by pcall()
function safe_inject_message(msg)
local ok, err_msg = pcall(inject_message, msg)
if not ok then
return -1, err_msg
else
return 0
end
end
-- Call inject_payload() wrapped by pcall()
function safe_inject_payload(payload_type, payload_name, data)
local ok, err_msg = pcall(inject_payload, payload_type, payload_name, data)
if not ok then
return -1, err_msg
else
return 0
end
end
-- Shallow comparison between two tables.
-- Return true if the two tables have the same keys with identical
-- values, otherwise false.
function table_equal(t1, t2)
-- all key-value pairs in t1 must be in t2
for k, v in pairs(t1) do
if t2[k] ~= v then return false end
end
-- there must not be other keys in t2
for k, v in pairs(t2) do
if t1[k] == nil then return false end
end
return true
end
return M
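-- Usage sketch for the pure helpers above (the module is required
-- elsewhere as 'stacklight.utils'; safe_inject_message and
-- safe_inject_payload only work inside the Heka/Hindsight sandbox):
--
--   local utils = require 'stacklight.utils'
--   utils.severity_to_label_map[3]            -- -> 'ERROR'
--   utils.label_to_severity_map.ERROR         -- -> 3
--   utils.chomp('one line\n')                 -- -> 'one line'
--   utils.table_equal({a=1, b=2}, {b=2, a=1}) -- -> true
--   utils.table_equal({a=1}, {a=1, c=3})      -- -> false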

View File

@ -1,29 +0,0 @@
FROM {{ image_spec("base-tools") }}
MAINTAINER {{ maintainer }}
# We use MOS packages for hindsight, lua_sandbox and lua_sandbox_extensions
COPY sources.mos.list /etc/apt/sources.list.d/
COPY mos.pref /etc/apt/preferences.d/
COPY bootstrap-hindsight.sh /opt/ccp/bin/
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 1FA22B08 \
&& apt-get update \
&& apt-get install -y --no-install-recommends \
hindsight \
lua-sandbox-extensions \
&& cp /usr/share/luasandbox/sandboxes/heka/input/prune_input.lua \
/usr/share/luasandbox/sandboxes/heka/input/heka_tcp.lua \
/var/lib/hindsight/run/input/
ADD output/*.lua /var/lib/hindsight/run/output/
ADD input/*.lua /var/lib/hindsight/run/input/
ADD analysis/*.lua /var/lib/hindsight/run/analysis/
ADD modules/*.lua /opt/ccp/lua/modules/stacklight/
RUN useradd --user-group hindsight \
&& usermod -a -G microservices hindsight \
&& chown -R hindsight: /var/lib/hindsight /etc/hindsight \
&& tar cf - -C /var/lib hindsight | tar xf - -C /opt/ccp
USER hindsight

View File

@ -1,117 +0,0 @@
-- Copyright 2015-2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
local string = require 'string'
local message = require 'stacklight.message'
local afd = require 'stacklight.afd'
local afd_annotation = require 'stacklight.afd_annotation'
-- node or service
local afd_type = read_config('afd_type') or error('afd_type must be specified!')
local afd_msg_type
local afd_metric_name
if afd_type == 'node' then
afd_msg_type = 'afd_node_metric'
afd_metric_name = 'node_status'
elseif afd_type == 'service' then
afd_msg_type = 'afd_service_metric'
afd_metric_name = 'service_status'
else
error('invalid afd_type value')
end
-- e.g. 'controller' for a node AFD, 'rabbitmq' for a service AFD
local afd_cluster_name = read_config('afd_cluster_name') or
error('afd_cluster_name must be specified!')
-- e.g. 'cpu' for a node AFD, 'queue' for a service AFD
local afd_logical_name = read_config('afd_logical_name') or
error('afd_logical_name must be specified!')
local hostname = read_config('hostname') or error('hostname must be specified')
local afd_file = read_config('afd_file') or error('afd_file must be specified')
local all_alarms = require('stacklight_alarms.' .. afd_file)
local A = require 'stacklight.afd_alarms'
A.load_alarms(all_alarms)
function process_message()
local metric_name = read_message('Fields[name]')
local ts = read_message('Timestamp')
local value, err_msg = message.read_values()
if not value then
return -1, err_msg
end
-- retrieve field values
local fields = {}
for _, field in ipairs(A.get_metric_fields(metric_name)) do
local field_value = read_message(string.format('Fields[%s]', field))
if not field_value then
return -1, "Cannot find Fields[" .. field .. "] for the metric " .. metric_name
end
fields[field] = field_value
end
A.add_value(ts, metric_name, value, fields)
return 0
end
function timer_event(ns)
if A.is_started() then
local state, alarms = A.evaluate(ns)
if state then -- it was time to evaluate at least one alarm
for _, alarm in ipairs(alarms) do
afd.add_to_alarms(
alarm.state,
alarm.alert['function'],
alarm.alert.metric,
alarm.alert.fields,
{}, -- tags
alarm.alert.operator,
alarm.alert.value,
alarm.alert.threshold,
alarm.alert.window,
alarm.alert.periods,
alarm.alert.message)
end
-- Message example:
-- msg = {
-- Type = 'afd_node_metric',
-- Payload = '{"alarms":[...]}',
-- Fields = {
-- name = 'node_status',
-- value = 0,
-- hostname = 'node1',
-- source = 'cpu',
-- cluster = 'system',
-- dimensions = {'cluster', 'source', 'hostname'},
-- }
-- }
local msg = afd.inject_afd_metric(
afd_msg_type, afd_metric_name, afd_cluster_name, afd_logical_name,
state, hostname)
if msg then
afd_annotation.inject_afd_annotation(msg)
end
end
else
A.set_start_time(ns)
end
end
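-- Illustrative Hindsight configuration for this plugin (a sketch; the
-- filename and the values below are assumptions, not from the original
-- repository):
--
--   filename = 'afd.lua'
--   ticker_interval = 10
--   afd_type = 'node'                -- 'node' or 'service'
--   afd_cluster_name = 'controller'  -- cluster the AFD applies to
--   afd_logical_name = 'cpu'         -- logical source of the status
--   hostname = 'node1'
--   afd_file = 'alarms'              -- loads stacklight_alarms.<afd_file>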

View File

@ -1,31 +0,0 @@
#!/bin/bash
# This script bootstraps Hindsight's working directory when using
# emptyDir Kubernetes volumes. Since emptyDir volumes are created empty,
# Hindsight would fail to start because its run files are missing, so
# this script copies the pristine tree into the destination directory
# given as its command line parameter.
#
# Usage: bootstrap-hindsight.sh <destination-directory>
set -e
if [ $# -ne 1 ]; then
echo "Usage: $0 directory"
exit 1
fi
if [ ! -d "$1" ]; then
echo "Error: $1 does not exist or is not a directory"
exit 1
fi
SRC=/opt/ccp/hindsight
if [ ! -d "$SRC" ]; then
echo "Error: $SRC does not exist or is not a directory"
exit 1
fi
tar cf - -C "$SRC" . | tar xf - -C "$1" --strip-components=1

View File

@ -1,700 +0,0 @@
--
-- This sandbox queries the kubelet "stats" API to collect statistics on Kubernetes
-- pods and namespaces.
--
-- The sandbox injects Heka messages for the following metrics:
--
-- * k8s_check: Expresses the success or failure of the data collection.
-- * k8s_pods_count: The number of pods in a given namespace.
-- * k8s_pods_count_total: The total number of pods on the node.
-- * k8s_pod_cpu_usage: The CPU usage of a given pod. For example 50 means that
-- the pod consumes 50% of CPU. The value may be greater than 100 on
-- multicore nodes.
-- * k8s_namespace_cpu_usage: The CPU usage of all the pods of a given namespace.
-- * k8s_pods_cpu_usage: The CPU usage of all the pods on the node.
-- * k8s_pod_memory_usage: The memory in Bytes used by a given pod. For example
-- 100000 means that the pod consumes 100000 Bytes of memory.
-- * k8s_namespace_memory_usage: The memory in Bytes used by all the pods of
-- a given namespace.
-- * k8s_pods_memory_usage: The memory in Bytes used by all the pods on the
-- node.
-- * k8s_pod_working_set: The working set in Bytes of a given pod.
-- * k8s_namespace_working_set: The working set in Bytes of all the pods of a
-- given namespace.
-- * k8s_pods_working_set: The working set in Bytes of all the pods on the
-- node.
-- * k8s_pod_major_page_faults: The number of major page faults per second
-- for a given pod.
-- * k8s_namespace_major_page_faults: The number of major page faults per second
-- for all the pods of a given namespace.
-- * k8s_pods_major_page_faults: The number of major page faults per second for
-- all the pods on the node.
-- * k8s_pod_page_faults: The number of minor page faults per second for
-- a given pod.
-- * k8s_namespace_page_faults: The number of minor page faults per second for
-- all the pods of a given namespace.
-- * k8s_pods_page_faults: The number of minor page faults per second for all
-- the pods on the node.
-- * k8s_pod_rx_bytes: The number of bytes per second received over the network
-- for a given pod.
-- * k8s_namespace_rx_bytes: The number of bytes per second received over the
-- network for all the pods of a given namespace.
-- * k8s_pods_rx_bytes: The number of bytes per second received over the
-- network for all the pods on the node.
-- * k8s_pod_tx_bytes: The number of bytes per second sent over the network
-- for a given pod.
-- * k8s_namespace_tx_bytes: The number of bytes per second sent over the
-- network for all the pods of a given namespace.
-- * k8s_pods_tx_bytes: The number of bytes per second sent over the
-- network for all the pods on the node.
-- * k8s_pod_rx_errors: The number of errors per second received over the network
-- for a given pod.
-- * k8s_namespace_rx_errors: The number of errors per second received over the
-- network for all the pods of a given namespace.
-- * k8s_pods_rx_errors: The number of errors per second received over the
-- network for all the pods on the node.
-- * k8s_pod_tx_errors: The number of errors per second sent over the network
-- for a given pod.
-- * k8s_namespace_tx_errors: The number of errors per second sent over the
-- network for all the pods of a given namespace.
-- * k8s_pods_tx_errors: The number of errors per second sent over the
-- network for all the pods on the node.
--
-- Configuration variables:
--
-- * kubernetes_host: The hostname or IP to use to access the Kubernetes
-- API. Optional. Default is "kubernetes".
-- * kubelet_stats_node: The name of the Kubernetes node on which the
--   kubelet to query runs. At init time the plugin uses the Kubernetes API
-- to get the corresponding internal IP address. Required.
-- * kubelet_stats_port: The port to use to access the Kubelet stats API.
-- Optional. Default value is 10255.
--
-- Configuration example:
--
-- filename = "kubelet_stats.lua"
-- kubelet_stats_node = "node1"
-- kubelet_stats_port = 10255
-- ticker_interval = 10 -- query Kubelet every 10 seconds
--
local cjson = require 'cjson'
local date_time = require 'lpeg.date_time'
local http = require 'socket.http'
local https = require 'ssl.https'
local io = require 'io'
local ltn12 = require 'ltn12'
local function read_file(path)
local fh, err = io.open(path, 'r')
if err then return nil, err end
local content = fh:read('*all')
fh:close()
return content, nil
end
-- Get the internal IP address of "node_name" by querying the Kubernetes API
local function get_node_ip_address(kubernetes_host, node_name)
local token_path = '/var/run/secrets/kubernetes.io/serviceaccount/token'
local token, err_msg = read_file(token_path)
if not token then
return nil, err_msg
end
local url = string.format('https://%s/api/v1/nodes/%s',
kubernetes_host, node_name)
local resp_body = {}
local res, code, headers, status = https.request {
url = url,
cafile = '/var/run/secrets/kubernetes.io/serviceaccount/ca.crt',
headers = {
Authorization = string.format('Bearer %s', token)
},
sink = ltn12.sink.table(resp_body)
}
if not res then
return nil, code
end
local ok, doc = pcall(cjson.decode, table.concat(resp_body))
if not ok then
local err_msg = string.format(
'HTTP response does not contain valid JSON: %s', doc)
return nil, err_msg
end
local status = doc['status']
if not status then
return nil, 'HTTP JSON does not contain node status'
end
local addresses = status['addresses']
if not addresses then
return nil, 'HTTP JSON does not contain node addresses'
end
for _, address in ipairs(addresses) do
if address['type'] == 'InternalIP' then
return address['address'], ''
end
end
return nil, string.format('No IP address found for %s', node_name)
end
local kubernetes_host = read_config('kubernetes_host') or 'kubernetes'
local kubelet_stats_port = read_config('kubelet_stats_port') or 10255
local kubelet_stats_node = read_config('kubelet_stats_node')
assert(kubelet_stats_node, 'kubelet_stats_node missing in plugin config')
local kubelet_stats_ip_address, err_msg = get_node_ip_address(
kubernetes_host, kubelet_stats_node)
assert(kubelet_stats_ip_address, err_msg)
local summary_url = string.format('http://%s:%d/stats/summary',
kubelet_stats_ip_address, kubelet_stats_port)
local pods_stats = {}
-- message skeletons for each metric type
local k8s_check_msg = {
Type = 'metric',
Timestamp = nil,
Hostname = nil,
Fields = {
name = 'k8s_check',
value = nil,
dimensions = {'hostname'},
hostname = nil
}
}
local k8s_pod_msg = {
Type = 'metric',
Timestamp = nil,
Hostname = nil,
Fields = {
name = nil,
value = nil,
dimensions = {'pod_name', 'pod_namespace', 'hostname'},
hostname = nil,
pod_name = nil,
pod_namespace = nil
}
}
local k8s_namespace_msg = {
Type = 'metric',
Timestamp = nil,
Hostname = nil,
Fields = {
name = nil,
value = nil,
dimensions = {'pod_namespace', 'hostname'},
hostname = nil,
pod_namespace = nil
}
}
local k8s_pods_msg = {
Type = 'metric',
Timestamp = nil,
Hostname = nil,
Fields = {
name = nil,
value = nil,
dimensions = {'hostname'},
hostname = nil
}
}
-- inject a pod-level metric message
local function inject_pod_metric(name, value, hostname, pod_namespace, pod_name)
k8s_pod_msg.Fields.name = name
k8s_pod_msg.Fields.value = value
k8s_pod_msg.Fields.hostname = hostname
k8s_pod_msg.Fields.pod_namespace = pod_namespace
k8s_pod_msg.Fields.pod_name = pod_name
inject_message(k8s_pod_msg)
end
-- inject a namespace-level metric message
local function inject_namespace_metric(name, value, hostname, pod_namespace)
k8s_namespace_msg.Fields.name = name
k8s_namespace_msg.Fields.value = value
k8s_namespace_msg.Fields.hostname = hostname
k8s_namespace_msg.Fields.pod_namespace = pod_namespace
inject_message(k8s_namespace_msg)
end
-- inject a node-level metric message
local function inject_pods_metric(name, value, hostname)
k8s_pods_msg.Fields.name = name
k8s_pods_msg.Fields.value = value
k8s_pods_msg.Fields.hostname = hostname
inject_message(k8s_pods_msg)
end
-- Send a "stats" query to kubelet, and return the JSON response in a Lua table
local function send_stats_query()
local resp_body, resp_status = http.request(summary_url)
if resp_body and resp_status == 200 then
-- success
local ok, doc = pcall(cjson.decode, resp_body)
if ok then
return doc, ''
else
local err_msg = string.format('HTTP response does not contain valid JSON: %s', doc)
return nil, err_msg
end
else
-- error
local err_msg = resp_status
if resp_body then
err_msg = string.format('kubelet stats query error: [%s] %s',
resp_status, resp_body)
end
return nil, err_msg
end
end
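-- Shape of the /stats/summary response that the collection code below
-- relies on (only the fields actually read are shown; this is a sketch,
-- not the full kubelet API schema):
--
--   { "node": { "nodeName": "node1" },
--     "pods": [
--       { "podRef": { "uid": "...", "name": "...", "namespace": "..." },
--         "containers": [
--           { "name": "...",
--             "cpu":    { "time": "<RFC3339>", "usageCoreNanoSeconds": 0 },
--             "memory": { "time": "<RFC3339>", "usageBytes": 0,
--                         "workingSetBytes": 0, "majorPageFaults": 0,
--                         "pageFaults": 0 } } ],
--         "network": { "time": "<RFC3339>", "rxBytes": 0, "txBytes": 0,
--                      "rxErrors": 0, "txErrors": 0 } } ] }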
-- Collect cpu statistics for a container
local function collect_container_cpu_stats(container_cpu, prev_stats, curr_stats)
local cpu_usage
if container_cpu then
local cpu_scrape_time = date_time.rfc3339:match(container_cpu['time'])
curr_stats.cpu = {
scrape_time = date_time.time_to_ns(cpu_scrape_time),
usage = container_cpu['usageCoreNanoSeconds']
}
if prev_stats and prev_stats.cpu then
local time_diff = curr_stats.cpu.scrape_time - prev_stats.cpu.scrape_time
if time_diff > 0 then
cpu_usage = 100 *
(curr_stats.cpu.usage - prev_stats.cpu.usage) / time_diff
end
end
end
return cpu_usage
end
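-- Worked example of the formula above: usageCoreNanoSeconds is cumulative
-- CPU time in nanoseconds and scrape_time is in nanoseconds too, so their
-- ratio is a number of cores. If usage grew by 5e8 ns over a 1e9 ns
-- interval, cpu_usage = 100 * 5e8 / 1e9 = 50, i.e. half a core; a pod
-- saturating two cores reports 200, hence values above 100 on multicore
-- nodes.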
-- Collect memory statistics for a container
local function collect_container_memory_stats(container_memory, prev_stats, curr_stats)
local memory_usage, major_page_faults, page_faults, working_set
if container_memory then
memory_usage = container_memory['usageBytes']
working_set = container_memory['workingSetBytes']
local memory_scrape_time = date_time.rfc3339:match(container_memory['time'])
curr_stats.memory = {
scrape_time = date_time.time_to_ns(memory_scrape_time),
major_page_faults = container_memory['majorPageFaults'],
page_faults = container_memory['pageFaults']
}
if prev_stats and prev_stats.memory then
local time_diff = curr_stats.memory.scrape_time - prev_stats.memory.scrape_time
if time_diff > 0 then
major_page_faults = 1e9 *
(curr_stats.memory.major_page_faults -
prev_stats.memory.major_page_faults) / time_diff
page_faults = 1e9 *
(curr_stats.memory.page_faults -
prev_stats.memory.page_faults) / time_diff
end
end
end
return memory_usage, major_page_faults, page_faults, working_set
end
-- Collect statistics for a container
local function collect_container_stats(container, prev_stats, curr_stats)
-- cpu stats
local cpu_usage =
collect_container_cpu_stats(container['cpu'], prev_stats, curr_stats)
-- memory stats
local memory_usage, major_page_faults, page_faults, working_set =
collect_container_memory_stats(container['memory'], prev_stats, curr_stats)
return cpu_usage, memory_usage, major_page_faults, page_faults, working_set
end
-- Collect statistics for a group of containers
local function collect_containers_stats(containers, prev_stats, curr_stats)
local aggregated_cpu_usage, aggregated_memory_usage,
aggregated_major_page_faults, aggregated_page_faults,
aggregated_working_set
for _, container in ipairs(containers) do
local container_name = container['name']
curr_stats[container_name] = {}
local container_prev_stats
if prev_stats then
container_prev_stats = prev_stats[container_name]
end
local cpu_usage, memory_usage, major_page_faults, page_faults, working_set =
collect_container_stats(container,
container_prev_stats, curr_stats[container_name])
if cpu_usage then
aggregated_cpu_usage = (aggregated_cpu_usage or 0) + cpu_usage
end
if memory_usage then
aggregated_memory_usage = (aggregated_memory_usage or 0) + memory_usage
end
if major_page_faults then
aggregated_major_page_faults = (aggregated_major_page_faults or 0) +
major_page_faults
end
if page_faults then
aggregated_page_faults = (aggregated_page_faults or 0) + page_faults
end
if working_set then
aggregated_working_set = (aggregated_working_set or 0) + working_set
end
end
return aggregated_cpu_usage, aggregated_memory_usage,
aggregated_major_page_faults, aggregated_page_faults,
aggregated_working_set
end
-- Collect statistics for a pod
local function collect_pod_stats(pod, prev_stats, curr_stats)
curr_stats.containers = {}
local containers_prev_stats
if prev_stats then
containers_prev_stats = prev_stats.containers
end
-- collect cpu and memory containers stats
local cpu_usage, memory_usage, major_page_faults, page_faults, working_set =
collect_containers_stats(pod['containers'] or {},
containers_prev_stats, curr_stats.containers)
-- collect network stats
local rx_bytes, tx_bytes, rx_errors, tx_errors
local pod_network = pod['network']
if pod_network then
local network_scrape_time = date_time.rfc3339:match(pod_network['time'])
curr_stats.network = {
scrape_time = date_time.time_to_ns(network_scrape_time),
rx_bytes = pod_network['rxBytes'],
tx_bytes = pod_network['txBytes'],
rx_errors = pod_network['rxErrors'],
tx_errors = pod_network['txErrors']
}
if prev_stats and prev_stats.network then
local time_diff = curr_stats.network.scrape_time -
prev_stats.network.scrape_time
if time_diff > 0 then
rx_bytes = 1e9 *
(curr_stats.network.rx_bytes -
prev_stats.network.rx_bytes) / time_diff
tx_bytes = 1e9 *
(curr_stats.network.tx_bytes -
prev_stats.network.tx_bytes) / time_diff
rx_errors = 1e9 *
(curr_stats.network.rx_errors -
prev_stats.network.rx_errors) / time_diff
tx_errors = 1e9 *
(curr_stats.network.tx_errors -
prev_stats.network.tx_errors) / time_diff
end
end
end
return cpu_usage, memory_usage, major_page_faults, page_faults, working_set,
rx_bytes, tx_bytes, rx_errors, tx_errors
end
-- Collect statistics for a group of pods
local function collect_pods_stats(node_name, pods, prev_stats, curr_stats)
local pods_count_by_ns = {}
local pods_stats_by_ns = {}
local pods_count_total = 0
local pods_cpu_usage = 0
local pods_memory_usage = 0
local pods_major_page_faults = 0
local pods_page_faults = 0
local pods_working_set = 0
local pods_rx_bytes = 0
local pods_tx_bytes = 0
local pods_rx_errors = 0
local pods_tx_errors = 0
for _, pod in ipairs(pods) do
local pod_ref = pod['podRef']
local pod_uid = pod_ref['uid']
local pod_name = pod_ref['name']
local pod_namespace = pod_ref['namespace']
curr_stats[pod_uid] = {}
local pod_cpu_usage,
pod_memory_usage,
pod_major_page_faults,
pod_page_faults,
pod_working_set,
pod_rx_bytes,
pod_tx_bytes,
pod_rx_errors,
pod_tx_errors = collect_pod_stats(
pod, prev_stats[pod_uid], curr_stats[pod_uid])
if pod_cpu_usage then
-- inject k8s_pod_cpu_usage metric
inject_pod_metric('k8s_pod_cpu_usage',
pod_cpu_usage, node_name, pod_namespace, pod_name)
if not pods_stats_by_ns[pod_namespace] then
pods_stats_by_ns[pod_namespace] = {cpu_usage = pod_cpu_usage}
else
pods_stats_by_ns[pod_namespace].cpu_usage =
(pods_stats_by_ns[pod_namespace].cpu_usage or 0) + pod_cpu_usage
end
pods_cpu_usage = pods_cpu_usage + pod_cpu_usage
end
if pod_memory_usage then
-- inject k8s_pod_memory_usage metric
inject_pod_metric('k8s_pod_memory_usage',
pod_memory_usage, node_name, pod_namespace, pod_name)
if not pods_stats_by_ns[pod_namespace] then
pods_stats_by_ns[pod_namespace] = {memory_usage = pod_memory_usage}
else
pods_stats_by_ns[pod_namespace].memory_usage =
(pods_stats_by_ns[pod_namespace].memory_usage or 0) + pod_memory_usage
end
pods_memory_usage = pods_memory_usage + pod_memory_usage
end
if pod_major_page_faults then
-- inject k8s_pod_major_page_faults metric
inject_pod_metric('k8s_pod_major_page_faults',
pod_major_page_faults, node_name, pod_namespace, pod_name)
if not pods_stats_by_ns[pod_namespace] then
pods_stats_by_ns[pod_namespace] = {major_page_faults = pod_major_page_faults}
else
pods_stats_by_ns[pod_namespace].major_page_faults =
(pods_stats_by_ns[pod_namespace].major_page_faults or 0) + pod_major_page_faults
end
pods_major_page_faults = pods_major_page_faults + pod_major_page_faults
end
if pod_page_faults then
-- inject k8s_pod_page_faults metric
inject_pod_metric('k8s_pod_page_faults',
pod_page_faults, node_name, pod_namespace, pod_name)
if not pods_stats_by_ns[pod_namespace] then
pods_stats_by_ns[pod_namespace] = {page_faults = pod_page_faults}
else
pods_stats_by_ns[pod_namespace].page_faults =
(pods_stats_by_ns[pod_namespace].page_faults or 0) + pod_page_faults
end
pods_page_faults = pods_page_faults + pod_page_faults
end
if pod_working_set then
-- inject k8s_pod_working_set metric
inject_pod_metric('k8s_pod_working_set',
pod_working_set, node_name, pod_namespace, pod_name)
if not pods_stats_by_ns[pod_namespace] then
pods_stats_by_ns[pod_namespace] = {working_set = pod_working_set}
else
pods_stats_by_ns[pod_namespace].working_set =
(pods_stats_by_ns[pod_namespace].working_set or 0) + pod_working_set
end
pods_working_set = pods_working_set + pod_working_set
end
if pod_rx_bytes then
-- inject k8s_pod_rx_bytes metric
inject_pod_metric('k8s_pod_rx_bytes',
pod_rx_bytes, node_name, pod_namespace, pod_name)
if not pods_stats_by_ns[pod_namespace] then
pods_stats_by_ns[pod_namespace] = {rx_bytes = pod_rx_bytes}
else
pods_stats_by_ns[pod_namespace].rx_bytes =
(pods_stats_by_ns[pod_namespace].rx_bytes or 0) + pod_rx_bytes
end
pods_rx_bytes = pods_rx_bytes + pod_rx_bytes
end
if pod_tx_bytes then
-- inject k8s_pod_tx_bytes metric
inject_pod_metric('k8s_pod_tx_bytes',
pod_tx_bytes, node_name, pod_namespace, pod_name)
if not pods_stats_by_ns[pod_namespace] then
pods_stats_by_ns[pod_namespace] = {tx_bytes = pod_tx_bytes}
else
pods_stats_by_ns[pod_namespace].tx_bytes =
(pods_stats_by_ns[pod_namespace].tx_bytes or 0) + pod_tx_bytes
end
pods_tx_bytes = pods_tx_bytes + pod_tx_bytes
end
if pod_rx_errors then
-- inject k8s_pod_rx_errors metric
inject_pod_metric('k8s_pod_rx_errors',
pod_rx_errors, node_name, pod_namespace, pod_name)
if not pods_stats_by_ns[pod_namespace] then
pods_stats_by_ns[pod_namespace] = {rx_errors = pod_rx_errors}
else
pods_stats_by_ns[pod_namespace].rx_errors =
(pods_stats_by_ns[pod_namespace].rx_errors or 0) + pod_rx_errors
end
pods_rx_errors = pods_rx_errors + pod_rx_errors
end
if pod_tx_errors then
-- inject k8s_pod_tx_errors metric
inject_pod_metric('k8s_pod_tx_errors',
pod_tx_errors, node_name, pod_namespace, pod_name)
if not pods_stats_by_ns[pod_namespace] then
pods_stats_by_ns[pod_namespace] = {tx_errors = pod_tx_errors}
else
pods_stats_by_ns[pod_namespace].tx_errors =
(pods_stats_by_ns[pod_namespace].tx_errors or 0) + pod_tx_errors
end
pods_tx_errors = pods_tx_errors + pod_tx_errors
end
if not pods_count_by_ns[pod_namespace] then
pods_count_by_ns[pod_namespace] = 1
else
pods_count_by_ns[pod_namespace] = pods_count_by_ns[pod_namespace] + 1
end
pods_count_total = pods_count_total + 1
end
for pod_namespace, namespace_stats in pairs(pods_stats_by_ns) do
if namespace_stats.cpu_usage then
-- inject k8s_namespace_cpu_usage metric
inject_namespace_metric('k8s_namespace_cpu_usage',
namespace_stats.cpu_usage, node_name, pod_namespace)
end
if namespace_stats.memory_usage then
-- inject k8s_namespace_memory_usage metric
inject_namespace_metric('k8s_namespace_memory_usage',
namespace_stats.memory_usage, node_name, pod_namespace)
end
if namespace_stats.major_page_faults then
-- inject k8s_namespace_major_page_faults metric
inject_namespace_metric('k8s_namespace_major_page_faults',
namespace_stats.major_page_faults, node_name, pod_namespace)
end
if namespace_stats.page_faults then
-- inject k8s_namespace_page_faults metric
inject_namespace_metric('k8s_namespace_page_faults',
namespace_stats.page_faults, node_name, pod_namespace)
end
if namespace_stats.working_set then
-- inject k8s_namespace_working_set metric
inject_namespace_metric('k8s_namespace_working_set',
namespace_stats.working_set, node_name, pod_namespace)
end
if namespace_stats.rx_bytes then
-- inject k8s_namespace_rx_bytes metric
inject_namespace_metric('k8s_namespace_rx_bytes',
namespace_stats.rx_bytes, node_name, pod_namespace)
end
if namespace_stats.tx_bytes then
-- inject k8s_namespace_tx_bytes metric
inject_namespace_metric('k8s_namespace_tx_bytes',
namespace_stats.tx_bytes, node_name, pod_namespace)
end
if namespace_stats.rx_errors then
-- inject k8s_namespace_rx_errors metric
inject_namespace_metric('k8s_namespace_rx_errors',
namespace_stats.rx_errors, node_name, pod_namespace)
end
if namespace_stats.tx_errors then
-- inject k8s_namespace_tx_errors metric
inject_namespace_metric('k8s_namespace_tx_errors',
namespace_stats.tx_errors, node_name, pod_namespace)
end
end
for pod_namespace, pods_count in pairs(pods_count_by_ns) do
-- inject k8s_pods_count metric
inject_namespace_metric('k8s_pods_count',
pods_count, node_name, pod_namespace)
end
-- inject k8s_pods_count_total metric
inject_pods_metric('k8s_pods_count_total', pods_count_total, node_name)
-- inject k8s_pods_cpu_usage metric
inject_pods_metric('k8s_pods_cpu_usage', pods_cpu_usage, node_name)
-- inject k8s_pods_memory_usage metric
inject_pods_metric('k8s_pods_memory_usage', pods_memory_usage, node_name)
-- inject k8s_pods_major_page_faults metric
inject_pods_metric('k8s_pods_major_page_faults', pods_major_page_faults, node_name)
-- inject k8s_pods_page_faults metric
inject_pods_metric('k8s_pods_page_faults', pods_page_faults, node_name)
-- inject k8s_pods_working_set metric
inject_pods_metric('k8s_pods_working_set', pods_working_set, node_name)
-- inject k8s_pods_rx_bytes metric
inject_pods_metric('k8s_pods_rx_bytes', pods_rx_bytes, node_name)
-- inject k8s_pods_tx_bytes metric
inject_pods_metric('k8s_pods_tx_bytes', pods_tx_bytes, node_name)
-- inject k8s_pods_rx_errors metric
inject_pods_metric('k8s_pods_rx_errors', pods_rx_errors, node_name)
-- inject k8s_pods_tx_errors metric
inject_pods_metric('k8s_pods_tx_errors', pods_tx_errors, node_name)
end
-- Function called at every ticker interval. It queries the kubelet "stats"
-- API, performs aggregations, and injects metric messages.
function process_message()
local doc, err_msg = send_stats_query()
if not doc then
-- inject a k8s_check "failure" metric
k8s_check_msg.Fields.value = 0
k8s_check_msg.Fields.hostname = kubelet_stats_node
inject_message(k8s_check_msg)
return -1, err_msg
end
local pods = doc['pods']
if not pods then
return -1, "no pods in kubelet stats response"
end
local curr_stats = {}
collect_pods_stats(doc['node']['nodeName'], pods, pods_stats, curr_stats)
pods_stats = curr_stats
-- inject a k8s_check "success" metric
k8s_check_msg.Fields.value = 1
k8s_check_msg.Fields.hostname = kubelet_stats_node
inject_message(k8s_check_msg)
return 0
end

View File

@ -1,185 +0,0 @@
-- Copyright 2015-2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
local cjson = require 'cjson'
local string = require 'string'
local table = require 'table'
local utils = require 'stacklight.utils'
local constants = require 'stacklight.constants'
local read_message = read_message
local assert = assert
local ipairs = ipairs
local pcall = pcall
local M = {}
setfenv(1, M) -- Remove external access to contain everything in the module
local function read_field(msg, name)
return msg.Fields[name]
end
function read_status(msg)
return read_field(msg, 'value')
end
function read_source(msg)
return read_field(msg, 'source')
end
function read_hostname(msg)
return read_field(msg, 'hostname')
end
function read_cluster(msg)
return read_field(msg, 'cluster')
end
function extract_alarms(msg)
local ok, payload = pcall(cjson.decode, msg.Payload)
if not ok or not payload.alarms then
return nil
end
return payload.alarms
end
-- return a human-readable message from an alarm table
-- for instance: "CPU load too high (WARNING, rule='last(load_midterm)>=5', current=7)"
function get_alarm_for_human(alarm)
local metric
if #(alarm.fields) > 0 then
local fields = {}
for _, field in ipairs(alarm.fields) do
fields[#fields+1] = field.name .. '="' .. field.value .. '"'
end
metric = string.format('%s[%s]', alarm.metric, table.concat(fields, ','))
else
metric = alarm.metric
end
local host = ''
if alarm.hostname then
host = string.format(', host=%s', alarm.hostname)
end
return string.format(
"%s (%s, rule='%s(%s)%s%s', current=%.2f%s)",
alarm.message,
alarm.severity,
alarm['function'],
metric,
alarm.operator,
alarm.threshold,
alarm.value,
host
)
end
function alarms_for_human(alarms)
local alarm_messages = {}
local hint_messages = {}
for _, v in ipairs(alarms) do
if v.tags and v.tags.dependency_level and v.tags.dependency_level == 'hint' then
hint_messages[#hint_messages+1] = get_alarm_for_human(v)
else
alarm_messages[#alarm_messages+1] = get_alarm_for_human(v)
end
end
if #hint_messages > 0 then
alarm_messages[#alarm_messages+1] = "Other related alarms:"
end
for _, v in ipairs(hint_messages) do
alarm_messages[#alarm_messages+1] = v
end
return alarm_messages
end
local alarms = {}
-- append an alarm to the list of pending alarms
-- the list is sent when inject_afd_metric is called
function add_to_alarms(status, fn, metric, fields, tags, operator, value, threshold, window, periods, message)
local severity = constants.status_label(status)
assert(severity)
alarms[#alarms+1] = {
severity=severity,
['function']=fn,
metric=metric,
fields=fields or {},
tags=tags or {},
operator=operator,
value=value,
threshold=threshold,
window=window or 0,
periods=periods or 0,
message=message
}
end
function get_alarms()
return alarms
end
function reset_alarms()
alarms = {}
end
-- inject an AFD event into the pipeline
function inject_afd_metric(msg_type, metric_name, cluster_name, logical_name,
state, hostname)
local payload
if #alarms > 0 then
payload = utils.safe_json_encode({alarms=alarms})
reset_alarms()
if not payload then
return
end
else
-- because cjson encodes empty tables as objects instead of arrays
payload = '{"alarms":[]}'
end
local msg = {
Type = msg_type,
Payload = payload,
Fields = {
name = metric_name,
value = state,
hostname = hostname,
cluster = cluster_name,
source = logical_name,
dimensions = {'cluster', 'hostname', 'source'},
}
}
local err_code, err_msg = utils.safe_inject_message(msg)
if err_code ~= 0 then
return nil, err_msg
end
return msg
end
MATCH = 1
NO_MATCH = 2
NO_DATA = 3
MISSING_DATA = 4
return M
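-- Usage sketch (assumes the module is importable as 'stacklight.afd' and
-- that its dependencies (cjson, stacklight.utils, stacklight.constants)
-- are available):
--
--   local afd = require 'stacklight.afd'
--   afd.get_alarm_for_human{
--       message = 'CPU load too high', severity = 'WARNING',
--       ['function'] = 'last', metric = 'load_midterm', fields = {},
--       operator = '>=', threshold = 5, value = 7,
--   }
--   -- -> "CPU load too high (WARNING, rule='last(load_midterm)>=5', current=7.00)"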

View File

@ -1,224 +0,0 @@
-- Copyright 2015-2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
local assert = assert
local ipairs = ipairs
local pairs = pairs
local string = string
local setmetatable = setmetatable
local table_utils = require 'stacklight.table_utils'
local constants = require 'stacklight.constants'
local afd = require 'stacklight.afd'
local Rule = require 'stacklight.afd_rule'
local SEVERITIES = {
warning = constants.WARN,
critical = constants.CRIT,
down = constants.DOWN,
unknown = constants.UNKW,
okay = constants.OKAY,
}
local Alarm = {}
Alarm.__index = Alarm
setfenv(1, Alarm) -- Remove external access to contain everything in the module
function Alarm.new(alarm)
local a = {}
setmetatable(a, Alarm)
a._metrics_list = nil
a.name = alarm.name
a.description = alarm.description
if alarm.trigger.logical_operator then
a.logical_operator = string.lower(alarm.trigger.logical_operator)
else
a.logical_operator = 'or'
end
a.severity_str = string.upper(alarm.severity)
a.severity = SEVERITIES[string.lower(alarm.severity)]
assert(a.severity ~= nil)
a.skip_when_no_data = false
if alarm.no_data_policy then
if string.lower(alarm.no_data_policy) == 'skip' then
a.skip_when_no_data = true
else
a.no_data_severity = SEVERITIES[string.lower(alarm.no_data_policy)]
end
else
a.no_data_severity = constants.UNKW
end
assert(a.skip_when_no_data or a.no_data_severity ~= nil)
a.rules = {}
a.initial_wait = 0
for _, rule in ipairs(alarm.trigger.rules) do
local r = Rule.new(rule)
a.rules[#a.rules+1] = r
local wait = r.window * r.periods
if wait > a.initial_wait then
a.initial_wait = wait * 1e9
end
end
a.start_time_ns = 0
return a
end
-- return the Set of metrics used by the alarm
function Alarm:get_metrics()
if not self._metrics_list then
self._metrics_list = {}
for _, rule in ipairs(self.rules) do
if not table_utils.item_find(rule.metric, self._metrics_list) then
self._metrics_list[#self._metrics_list+1] = rule.metric
end
end
end
return self._metrics_list
end
-- return a list of field names used for the metric
-- (can have duplicate names)
function Alarm:get_metric_fields(metric_name)
local fields = {}
for _, rule in ipairs(self.rules) do
if rule.metric == metric_name then
for k, _ in pairs(rule.fields) do
fields[#fields+1] = k
end
for _, g in ipairs(rule.group_by) do
fields[#fields+1] = g
end
end
end
return fields
end
function Alarm:has_metric(metric)
return table_utils.item_find(metric, self:get_metrics())
end
-- dispatch datapoint in datastores
function Alarm:add_value(ts, metric, value, fields)
for _, rule in ipairs(self.rules) do
if rule.metric == metric then
rule:add_value(ts, value, fields)
end
end
end
-- convert fields to fields map
-- {foo="bar"} --> {name="foo", value="bar"}
local function convert_field_list(fields)
local named_fields = {}
for name, value in pairs(fields or {}) do
named_fields[#named_fields+1] = {name=name, value=value}
end
return named_fields
end
-- Return the state of the alarm and a list of alert details.
--
-- The list is empty when the state is nil or OKAY; otherwise each entry
-- looks like:
-- {
-- {
-- value = <current value>,
-- fields = <metric fields table>,
-- message = <string>,
-- },
-- }
function Alarm:evaluate(ns)
local state = constants.OKAY
local matches = 0
local all_alerts = {}
local function add_alarm(rule, value, message, fields)
all_alerts[#all_alerts+1] = {
severity = self.severity_str,
['function'] = rule.fct,
metric = rule.metric,
operator = rule.relational_operator,
threshold = rule.threshold,
window = rule.window,
periods = rule.periods,
value = value,
fields = fields,
message = message
}
end
local one_unknown = false
local msg
for _, rule in ipairs(self.rules) do
local eval, context_list = rule:evaluate(ns)
if eval == afd.MATCH then
matches = matches + 1
msg = self.description
elseif eval == afd.MISSING_DATA then
msg = 'No datapoints have been received over the last ' .. rule.observation_window .. ' seconds'
one_unknown = true
elseif eval == afd.NO_DATA then
msg = 'No datapoints have ever been received'
one_unknown = true
end
for _, context in ipairs(context_list) do
add_alarm(rule, context.value, msg,
convert_field_list(context.fields))
end
end
if self.logical_operator == 'and' then
if one_unknown then
if self.skip_when_no_data then
state = nil
else
state = self.no_data_severity
end
elseif #self.rules == matches then
state = self.severity
end
elseif self.logical_operator == 'or' then
if matches > 0 then
state = self.severity
elseif one_unknown then
if self.skip_when_no_data then
state = nil
else
state = self.no_data_severity
end
end
end
if state == nil or state == constants.OKAY then
all_alerts = {}
end
return state, all_alerts
end
function Alarm:set_start_time(ns)
self.start_time_ns = ns
end
function Alarm:is_evaluation_time(ns)
local delta = ns - self.start_time_ns
if delta >= self.initial_wait then
return true
end
return false
end
return Alarm
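-- Shape of the alarm definition table accepted by Alarm.new(), based on
-- the fields read above (names and values are illustrative):
--
--   {
--     name = 'cpu-critical',
--     description = 'CPU load too high',
--     severity = 'critical',       -- one of the SEVERITIES keys
--     no_data_policy = 'skip',     -- or a severity name; default is UNKW
--     trigger = {
--       logical_operator = 'or',   -- 'and' or 'or' (the default)
--       rules = {
--         {
--           metric = 'load_midterm',
--           ['function'] = 'last', -- see afd_rule.lua
--           relational_operator = '>=',
--           threshold = 5,
--           window = 60,
--           periods = 2,
--           fields = {},
--           group_by = {},
--         },
--       },
--     },
--   }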

View File

@ -1,118 +0,0 @@
-- Copyright 2015-2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
local pairs = pairs
local ipairs = ipairs
local table_utils = require 'stacklight.table_utils'
local constants = require 'stacklight.constants'
local Alarm = require 'stacklight.afd_alarm'
local all_alarms = {}
local M = {}
setfenv(1, M) -- Remove external access to contain everything in the module
-- return a list of field names required for the metric
function get_metric_fields(metric_name)
local fields = {}
for name, alarm in pairs(all_alarms) do
local mf = alarm:get_metric_fields(metric_name)
if mf then
for _, field in pairs(mf) do
if not table_utils.item_find(field, fields) then
fields[#fields+1] = field
end
end
end
end
return fields
end
-- return the list of alarms interested in a metric
function get_interested_alarms(metric)
local interested_alarms = {}
for _, alarm in pairs(all_alarms) do
if alarm:has_metric(metric) then
interested_alarms[#interested_alarms+1] = alarm
end
end
return interested_alarms
end
function add_value(ts, metric, value, fields)
local interested_alarms = get_interested_alarms(metric)
for _, alarm in ipairs (interested_alarms) do
alarm:add_value(ts, metric, value, fields)
end
end
function reset_alarms()
all_alarms = {}
end
function evaluate(ns)
local global_state
local all_alerts = {}
for _, alarm in pairs(all_alarms) do
if alarm:is_evaluation_time(ns) then
local state, alerts = alarm:evaluate(ns)
global_state = constants.max_status(state, global_state)
for _, a in ipairs(alerts) do
all_alerts[#all_alerts+1] = { state=state, alert=a }
end
-- stop at the first alarm that raises a state other than OKAY/UNKW
if global_state ~= constants.UNKW and global_state ~= constants.OKAY then
break
end
end
end
return global_state, all_alerts
end
function get_alarms()
return all_alarms
end
function get_alarm(alarm_name)
for _, a in ipairs(all_alarms) do
if a.name == alarm_name then
return a
end
end
end
function load_alarm(alarm)
local A = Alarm.new(alarm)
all_alarms[#all_alarms+1] = A
end
function load_alarms(alarms)
for _, alarm in ipairs(alarms) do
load_alarm(alarm)
end
end
local started = false
function set_start_time(ns)
for _, alarm in ipairs(all_alarms) do
alarm:set_start_time(ns)
end
started = true
end
function is_started()
return started
end
return M
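-- End-to-end flow sketch, mirroring how the AFD plugin drives this module:
--
--   local A = require 'stacklight.afd_alarms'
--   A.load_alarms(all_alarms)   -- list of alarm definition tables
--   A.set_start_time(0)
--   -- feed datapoints as messages arrive:
--   A.add_value(ts, 'load_midterm', 7, {})
--   -- periodically evaluate the alarms whose initial wait has elapsed:
--   local state, alerts = A.evaluate(ns)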

View File

@ -1,102 +0,0 @@
-- Copyright 2015-2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
local string = require 'string'
local table = require 'table'
local utils = require 'stacklight.utils'
local consts = require 'stacklight.constants'
local afd = require 'stacklight.afd'
local M = {}
setfenv(1, M)
local statuses = {}
local annotation_msg = {
Type = 'metric',
Fields = {
name = 'annotation',
dimensions = {'cluster', 'source', 'hostname'},
value_fields = {'title', 'tags', 'text'},
title = nil,
tags = nil,
text = nil,
cluster = nil,
source = nil,
hostname = nil,
}
}
function inject_afd_annotation(msg)
local previous
local text
local title
local source = afd.read_source(msg)
local status = afd.read_status(msg)
local hostname = afd.read_hostname(msg)
local alarms = afd.extract_alarms(msg)
local cluster = afd.read_cluster(msg)
if not source or not status or not hostname or not alarms or not cluster then
return -1
end
if not statuses[source] then
statuses[source] = {}
end
previous = statuses[source]
text = table.concat(afd.alarms_for_human(alarms), '<br />')
-- build the title
if not previous.status and status == consts.OKAY then
-- don't send an annotation when we detect a new cluster which is OKAY
return 0
elseif not previous.status then
title = string.format('General status is %s',
consts.status_label(status))
elseif previous.status ~= status then
title = string.format('General status %s -> %s',
consts.status_label(previous.status),
consts.status_label(status))
-- TODO(pasquier-s): generate an annotation when the set of alarms has
-- changed. The commented-out code below generated an annotation whenever
-- at least one value associated with an alarm changed, which led to far
-- too many annotations with alarms monitoring the CPU usage for instance.
-- elseif previous.text ~= text then
-- title = string.format('General status remains %s',
-- consts.status_label(status))
else
-- nothing has changed since the last message
return 0
end
annotation_msg.Fields.title = title
annotation_msg.Fields.tags = source
annotation_msg.Fields.text = text
annotation_msg.Fields.source = source
annotation_msg.Fields.hostname = hostname
annotation_msg.Fields.cluster = cluster
-- store the last status and alarm text for future messages
previous.status = status
previous.text = text
return utils.safe_inject_message(annotation_msg)
end
return M

View File

@ -1,279 +0,0 @@
-- Copyright 2015-2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
local circular_buffer = require 'circular_buffer'
local stats = require 'lsb.stats'
local setmetatable = setmetatable
local ipairs = ipairs
local pairs = pairs
local math = require 'math'
local string = string
local table = table
local assert = assert
local type = type
-- StackLight libs
local table_utils = require 'stacklight.table_utils'
local constants = require 'stacklight.constants'
local afd = require 'stacklight.afd'
local matching = require 'stacklight.value_matching'
local MIN_WINDOW = 10
local MIN_PERIOD = 1
local SECONDS_PER_ROW = 5
local Rule = {}
Rule.__index = Rule
setfenv(1, Rule) -- Remove external access to contain everything in the module
function Rule.new(rule)
local r = {}
setmetatable(r, Rule)
local win = MIN_WINDOW
if rule.window and rule.window + 0 > 0 then
win = rule.window + 0
end
r.window = win
local periods = MIN_PERIOD
if rule.periods and rule.periods + 0 > 0 then
periods = rule.periods + 0
end
r.periods = periods
r.relational_operator = rule.relational_operator
r.metric = rule.metric
r.fields = rule.fields or {}
-- build field matching
r.field_matchers = {}
for f, expression in pairs(r.fields) do
r.field_matchers[f] = matching.new(expression)
end
r.fct = rule['function']
r.threshold = rule.threshold + 0
r.value_index = rule.value or nil -- Can be nil
-- build unique rule id
local arr = {r.metric, r.fct, r.window, r.periods}
for f, v in table_utils.orderedPairs(r.fields or {}) do
arr[#arr+1] = string.format('(%s=%s)', f, v)
end
r.rule_id = table.concat(arr, '/')
r.group_by = rule.group_by or {}
r.cbuf_size = math.ceil(r.window * r.periods / SECONDS_PER_ROW)
r.ids_datastore = {}
r.datastore = {}
r.observation_window = math.ceil(r.window * r.periods)
return r
end
function Rule:get_datastore_id(fields)
if #self.group_by == 0 or fields == nil then
return self.rule_id
end
local arr = {}
arr[#arr + 1] = self.rule_id
for _, g in ipairs(self.group_by) do
arr[#arr + 1] = fields[g]
end
return table.concat(arr, '/')
end
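-- Example: for a rule {metric='rabbitmq_messages', ['function']='avg',
-- window=60, periods=1, fields={}}, rule_id is 'rabbitmq_messages/avg/60/1'.
-- With group_by={'queue'}, a datapoint carrying fields={queue='nova'} is
-- stored under 'rabbitmq_messages/avg/60/1/nova': one datastore (and one
-- circular buffer) per queue, each evaluated against the threshold
-- separately. (Metric and field names here are illustrative.)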
function Rule:fields_accepted(fields)
if not fields then
fields = {}
end
local matched_fields = 0
local no_match_on_fields = true
for f, expression in pairs(self.field_matchers) do
no_match_on_fields = false
for k, v in pairs(fields) do
if k == f then
if expression:matches(v) then
matched_fields = matched_fields + 1
else
return false
end
end
end
end
return no_match_on_fields or matched_fields > 0
end
function Rule:get_circular_buffer()
local fct
if self.fct == 'min' or self.fct == 'max' then
fct = self.fct
else
fct = 'sum'
end
local cbuf = circular_buffer.new(self.cbuf_size, 1, SECONDS_PER_ROW)
cbuf:set_header(1, self.metric, fct, fct)
return cbuf
end
-- Store datapoints in the circular buffer, creating the buffer if it does
-- not exist yet. value can be a table, in which case self.value_index
-- selects the element to use.
function Rule:add_value(ts, value, fields)
if not self:fields_accepted(fields) then
return
end
if type(value) == 'table' then
value = value[self.value_index]
end
if value == nil then
return
end
local data
local uniq_field_id = self:get_datastore_id(fields)
if not self.datastore[uniq_field_id] then
self.datastore[uniq_field_id] = {
fields = self.fields,
cbuf = self:get_circular_buffer()
}
if #self.group_by > 0 then
self.datastore[uniq_field_id].fields = fields
end
self:add_datastore(uniq_field_id)
end
data = self.datastore[uniq_field_id]
if self.fct == 'avg' then
data.cbuf:add(ts, 1, value)
else
data.cbuf:set(ts, 1, value)
end
end
function Rule:add_datastore(id)
if not table_utils.item_find(id, self.ids_datastore) then
self.ids_datastore[#self.ids_datastore+1] = id
end
end
function Rule:compare_threshold(value)
return constants.compare_threshold(value, self.relational_operator, self.threshold)
end
local function isnumber(value)
return value ~= nil and not (value ~= value)
end
local available_functions = {last=true, avg=true, max=true, min=true, sum=true,
variance=true, sd=true, diff=true}
-- Evaluate the rule against the collected datapoints.
-- Returns two values: a match status (afd.MATCH, afd.NO_MATCH, afd.NO_DATA
-- or afd.MISSING_DATA) and a context list ({value=v, fields=<field table>}).
--
-- Examples:
-- afd.MATCH, { {value=100, fields={queue='nova'}}, {value=120, fields={queue='neutron'}} }
-- afd.NO_MATCH, {}
-- with 2 special cases:
-- - no datapoint was ever received:
-- afd.NO_DATA, { {value=-1, fields=<rule fields>} }
-- - datapoints stopped arriving for a metric:
-- afd.MISSING_DATA, { {value=-1, fields=<metric fields>} }
-- Note that the MISSING_DATA state can lead to false positives: when the
-- monitored entity has been renamed or deleted (a filesystem, for
-- instance), it is normal to stop receiving datapoints for it.
function Rule:evaluate(ns)
local fields = {}
local one_match, one_no_match, one_missing_data = false, false, false
for _, id in ipairs(self.ids_datastore) do
local data = self.datastore[id]
if data then
local cbuf_time = data.cbuf:current_time()
-- If no datapoint was received within the observation window, data is no
-- longer arriving and the rule cannot be computed.
if ns - cbuf_time > self.observation_window * 1e9 then
one_missing_data = true
fields[#fields+1] = {value = -1, fields = data.fields}
else
assert(available_functions[self.fct])
local result
if self.fct == 'last' then
local last
local t = ns
while (not isnumber(last)) and t >= ns - self.observation_window * 1e9 do
last = data.cbuf:get(t, 1)
t = t - SECONDS_PER_ROW * 1e9
end
if isnumber(last) then
result = last
else
one_missing_data = true
fields[#fields+1] = {value = -1, fields = data.fields}
end
elseif self.fct == 'diff' then
local first, last
local t = ns
while (not isnumber(last)) and t >= ns - self.observation_window * 1e9 do
last = data.cbuf:get(t, 1)
t = t - SECONDS_PER_ROW * 1e9
end
if isnumber(last) then
t = ns - self.observation_window * 1e9
while (not isnumber(first)) and t <= ns do
first = data.cbuf:get(t, 1)
t = t + SECONDS_PER_ROW * 1e9
end
end
if not isnumber(last) or not isnumber(first) then
one_missing_data = true
fields[#fields+1] = {value = -1, fields = data.fields}
else
result = last - first
end
else
local values = data.cbuf:get_range(1)
result = stats[self.fct](values)
end
if result then
local m = self:compare_threshold(result)
if m then
one_match = true
fields[#fields+1] = {value=result, fields=data.fields}
else
one_no_match = true
end
end
end
end
end
if one_match then
return afd.MATCH, fields
elseif one_missing_data then
return afd.MISSING_DATA, fields
elseif one_no_match then
return afd.NO_MATCH, {}
else
return afd.NO_DATA, {{value=-1, fields=self.fields}}
end
end
return Rule

View File

@ -1,78 +0,0 @@
-- Copyright 2015-2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
local M = {}
setfenv(1, M) -- Remove external access to contain everything in the module
-- The status values were chosen to match with the Grafana constraints:
-- OKAY => green
-- WARN & UNKW => orange
-- CRIT & DOWN => red
OKAY=0
WARN=1
UNKW=2
CRIT=3
DOWN=4
local STATUS_LABELS = {
[OKAY]='OKAY',
[WARN]='WARN',
[UNKW]='UNKNOWN',
[CRIT]='CRITICAL',
[DOWN]='DOWN'
}
function status_label(v)
return STATUS_LABELS[v]
end
local STATUS_WEIGHTS = {
[UNKW]=0,
[OKAY]=1,
[WARN]=2,
[CRIT]=3,
[DOWN]=4
}
function max_status(val1, val2)
if not val1 then
return val2
elseif not val2 then
return val1
elseif STATUS_WEIGHTS[val1] > STATUS_WEIGHTS[val2] then
return val1
else
return val2
end
end
function compare_threshold(value, op, threshold)
local rule_matches = false
if op == '==' or op == 'eq' then
rule_matches = value == threshold
elseif op == '!=' or op == 'ne' then
rule_matches = value ~= threshold
elseif op == '>=' or op == 'gte' then
rule_matches = value >= threshold
elseif op == '>' or op == 'gt' then
rule_matches = value > threshold
elseif op == '<=' or op == 'lte' then
rule_matches = value <= threshold
elseif op == '<' or op == 'lt' then
rule_matches = value < threshold
end
return rule_matches
end
return M
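-- Quick illustration of the helpers above:
--
--   status_label(CRIT)            -- -> 'CRITICAL'
--   max_status(UNKW, OKAY)        -- -> OKAY (UNKW has the lowest weight)
--   compare_threshold(7, '>=', 5) -- -> true
--   compare_threshold(7, 'lt', 5) -- -> false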

View File

@ -1,88 +0,0 @@
-- Copyright 2015-2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
local cjson = require 'cjson'
local inject_message = inject_message
local read_message = read_message
local string = string
local pcall = pcall
local M = {}
setfenv(1, M) -- Remove external access to contain everything in the module
-- Return the value and index of the last field with a given name.
function read_field(name)
local i = -1
local value = nil
local variable_name = string.format('Fields[%s]', name)
repeat
local tmp = read_message(variable_name, i + 1)
if tmp == nil then
break
end
value = tmp
i = i + 1
until false
return value, i
end
-- Extract value(s) from the message. The value can be either a scalar value
-- or a table for multi-value metrics. Returns nil and an error message on
-- failure. The "tags" argument is optional; it is used for sanity checks.
function read_values(tags)
if not tags then
tags = {}
end
local value
local value_fields, value_fields_index = read_field('value_fields')
if value_fields ~= nil then
if tags['value_fields'] ~= nil and value_fields_index == 0 then
return nil, 'index of field "value_fields" should not be 0'
end
local i = 0
value = {}
repeat
local value_key = read_message(
'Fields[value_fields]', value_fields_index, i)
if value_key == nil then
break
end
local value_val, value_index = read_field(value_key)
if value_val == nil then
return nil, string.format('field "%s" is missing', value_key)
end
if tags[value_key] ~= nil and value_index == 0 then
return nil, string.format(
'index of field "%s" should not be 0', value_key)
end
value[value_key] = value_val
i = i + 1
until false
else
local value_index
value, value_index = read_field('value')
if value == nil then
-- "value" is a required field
return nil, 'field "value" is missing'
end
if tags['value'] ~= nil and value_index == 0 then
return nil, 'index of field "value" should not be 0'
end
end
return value, ''
end
return M
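To make the multi-value path concrete, a hedged sketch of what read_values() returns (read_message() is a Heka/Hindsight sandbox builtin, so this only runs inside a sandbox; the field layout below is hypothetical):
-- For a metric message shaped like:
--   Fields[value_fields] = {'min', 'max'}
--   Fields[min] = 0.1
--   Fields[max] = 9.5
local message = require 'stacklight.message'
local value, err = message.read_values()
-- value == {min = 0.1, max = 9.5}, err == ''
-- For a single-value metric (only Fields[value] set), value is a scalar.
-- On a malformed message, value is nil and err describes the problem,
-- e.g. 'field "min" is missing'.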

View File

@ -1,34 +0,0 @@
-- Copyright 2015-2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
local l = require 'lpeg'
l.locale(l)
local tonumber = tonumber
local M = {}
setfenv(1, M) -- Remove external access to contain everything in the module
function anywhere (patt)
return l.P {
patt + 1 * l.V(1)
}
end
sp = l.space
-- Pattern used to match a number (note: l.xdigit also matches hexadecimal digits)
Number = l.P"-"^-1 * l.xdigit^1 * (l.S(".,") * l.xdigit^1 )^-1 / tonumber
return M
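A short sketch of how these patterns behave (module path illustrative):
local l = require 'lpeg'
local patterns = require 'stacklight.patterns'
-- Number converts the matched text with tonumber().
print(patterns.Number:match('42.5'))   --> 42.5
print(patterns.Number:match('-7'))     --> -7
-- anywhere() finds a pattern at any position in the subject; with no
-- captures, match() returns a position (a number) on success, nil otherwise.
local has_error = patterns.anywhere(l.P'ERROR')
print(has_error:match('2016-06-01 ERROR something broke') ~= nil)  --> true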

View File

@ -1,83 +0,0 @@
-- Copyright 2015-2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
local table = require 'table'
local ipairs = ipairs
local pairs = pairs
local type = type
local M = {}
setfenv(1, M) -- Remove external access to contain everything in the module
-- return the position (index) of an item in a list, nil if not found
function item_pos(item, list)
if type(list) == 'table' then
for i, v in ipairs(list) do
if v == item then
return i
end
end
end
end
-- return true if an item is present in the list, false otherwise
function item_find(item, list)
return item_pos(item, list) ~= nil
end
-- from http://lua-users.org/wiki/SortedIteration
function __genOrderedIndex( t )
local orderedIndex = {}
for key in pairs(t) do
table.insert( orderedIndex, key )
end
table.sort( orderedIndex )
return orderedIndex
end
function orderedNext(t, state)
    -- Equivalent of the next() function, but returns the keys in
    -- alphabetical order. We use a temporary ordered key table that is
    -- stored in the table being iterated.
    local key = nil
if state == nil then
-- the first time, generate the index
t.__orderedIndex = __genOrderedIndex( t )
key = t.__orderedIndex[1]
else
-- fetch the next value
for i = 1,table.getn(t.__orderedIndex) do
if t.__orderedIndex[i] == state then
key = t.__orderedIndex[i+1]
end
end
end
if key then
return key, t[key]
end
-- no more value to return, cleanup
t.__orderedIndex = nil
return
end
function orderedPairs(t)
    -- Equivalent of the pairs() function on tables, but iterates over the
    -- keys in alphabetical order.
return orderedNext, t, nil
end
return M
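A minimal sketch of the helpers above (module path illustrative):
local table_utils = require 'stacklight.table_utils'
print(table_utils.item_pos('b', {'a', 'b', 'c'}))    --> 2
print(table_utils.item_find('z', {'a', 'b', 'c'}))   --> false
-- orderedPairs() iterates keys alphabetically, unlike pairs().
for k, v in table_utils.orderedPairs({zeta = 1, alpha = 2, mu = 3}) do
    print(k, v)   --> alpha 2, then mu 3, then zeta 1
end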

View File

@ -1,46 +0,0 @@
-- Copyright 2015-2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
local cjson = require 'cjson'
local inject_message = inject_message
local read_message = read_message
local string = string
local pcall = pcall
local M = {}
setfenv(1, M) -- Remove external access to contain everything in the module
-- Encode a Lua variable as JSON without raising an exception if the encoding
-- fails for some reason (for instance, when the encoded buffer exceeds the
-- sandbox limit)
function safe_json_encode(v)
local ok, data = pcall(cjson.encode, v)
if not ok then
return
end
return data
end
-- Call inject_message() wrapped by pcall()
function safe_inject_message(msg)
local ok, err_msg = pcall(inject_message, msg)
if not ok then
return -1, err_msg
else
return 0
end
end
return M
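A hedged sketch of the JSON helper (requires the cjson module; inject_message() exists only inside a sandbox, so only the encoder is exercised here; the module path is illustrative):
local utils = require 'stacklight.utils'
print(utils.safe_json_encode({name = 'cpu_idle', value = 97.2}))
--> {"name":"cpu_idle","value":97.2}   (key order may vary)
-- On an encoding failure (e.g. a value cjson cannot serialize, such as a
-- function), it returns nil instead of raising, so callers can skip the
-- message and carry on.
print(utils.safe_json_encode({f = print}) == nil)   --> true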

View File

@ -1,171 +0,0 @@
-- Copyright 2016 Mirantis, Inc.
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
local l = require "lpeg"
l.locale(l)
local pcall = pcall
local string = require 'string'
local patterns = require 'stacklight.patterns'
local error = error
local setmetatable = setmetatable
local tonumber = tonumber
local C = l.C
local P = l.P
local S = l.S
local V = l.V
local Ct = l.Ct
local Cc = l.Cc
local Optional_space = patterns.sp^0
local Only_spaces = patterns.sp^1 * -1
local function space(pat)
return Optional_space * pat * Optional_space
end
local EQ = P'=='
local NEQ = P'!='
local GT = P'>'
local LT = P'<'
local GTE = P'>='
local LTE = P'<='
local MATCH = P'=~'
local NO_MATCH = P'!~'
local OR = P'||'
local AND = P'&&'
local function get_operator(op)
if op == '' then
return '=='
end
return op
end
local numerical_operator = (EQ + NEQ + LTE + GTE + GT + LT )^-1 / get_operator
local sub_numerical_expression = space(numerical_operator) * patterns.Number * Optional_space
local is_plain_numeric = (sub_numerical_expression * ((OR^1 + AND^1) * sub_numerical_expression)^0) * -1
local quoted_string = (P'"' * C((P(1) - (P'"'))^1) * P'"' + C((P(1) - patterns.sp)^1))
local string_operator = (EQ + NEQ + MATCH + NO_MATCH)^-1 / get_operator
local sub_string_expression = space(string_operator) * quoted_string * Optional_space
local is_plain_string = (sub_string_expression * ((OR^1 + AND^1) * sub_string_expression)^0) * -1
local numerical_expression = P {
'OR';
AND = Ct(Cc('and') * V'SUB' * space(AND) * V'AND' + V'SUB'),
OR = Ct(Cc('or') * V'AND' * space(OR) * V'OR' + V'AND'),
SUB = Ct(sub_numerical_expression)
} * -1
local string_expression = P {
'OR';
AND = Ct(Cc('and') * V'SUB' * space(AND) * V'AND' + V'SUB'),
OR = Ct(Cc('or') * V'AND' * space(OR) * V'OR' + V'AND'),
SUB = Ct(sub_string_expression)
} * -1
local is_complex = patterns.anywhere(EQ + NEQ + LTE + GTE + GT + LT + MATCH + NO_MATCH + OR + AND)
local function eval_tree(tree, value)
local match = false
if type(tree[1]) == 'table' then
match = eval_tree(tree[1], value)
else
local operator = tree[1]
if operator == 'and' or operator == 'or' then
match = eval_tree(tree[2], value)
for i=3, #tree, 1 do
local m = eval_tree(tree[i], value)
if operator == 'or' then
match = match or m
else
match = match and m
end
end
else
local matcher = tree[2]
if operator == '==' then
return value == matcher
elseif operator == '!=' then
return value ~= matcher
elseif operator == '>' then
return value > matcher
elseif operator == '<' then
return value < matcher
elseif operator == '>=' then
return value >= matcher
elseif operator == '<=' then
return value <= matcher
elseif operator == '=~' then
local ok, m = pcall(string.find, value, matcher)
return ok and m ~= nil
elseif operator == '!~' then
local ok, m = pcall(string.find, value, matcher)
return ok and m == nil
end
end
end
return match
end
local MatchExpression = {}
MatchExpression.__index = MatchExpression
setfenv(1, MatchExpression) -- Remove external access to contain everything in the module
function MatchExpression.new(expression)
local r = {}
setmetatable(r, MatchExpression)
if is_complex:match(expression) then
r.is_plain_numeric_exp = is_plain_numeric:match(expression) ~= nil
if r.is_plain_numeric_exp then
r.tree = numerical_expression:match(expression)
elseif is_plain_string:match(expression) ~= nil then
r.tree = string_expression:match(expression)
end
if r.tree == nil then
error('Invalid expression: ' .. expression)
end
else
if expression == '' or Only_spaces:match(expression) then
error('Expression is empty')
end
r.is_simple_equality_matching = true
end
r.expression = expression
return r
end
function MatchExpression:matches(value)
if self.is_simple_equality_matching then
return self.expression == value or
tonumber(self.expression) == value or
tonumber(value) == self.expression
end
if self.is_plain_numeric_exp then
value = tonumber(value)
if value == nil then
return false
end
end
return eval_tree(self.tree, value)
end
return MatchExpression
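A usage sketch of the expression matcher (module path illustrative):
local MatchExpression = require 'stacklight.match_expression'
-- Plain numeric expressions, combinable with || and &&:
local m = MatchExpression.new('>= 80 && < 95')
print(m:matches(90))     --> true
print(m:matches('90'))   --> true (non-numeric values are coerced with tonumber)
print(m:matches(99))     --> false
-- String expressions support ==, !=, =~ (Lua pattern) and !~:
local s = MatchExpression.new('=~ "nova%-api"')
print(s:matches('nova-api'))   --> true
-- Without an operator the expression is a simple equality match:
print(MatchExpression.new('5'):matches(5))   --> true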

View File

@ -1,71 +0,0 @@
local M = {}
setfenv(1, M) -- Remove external access to contain everything in the module
local alarms = {
{
['name'] = 'cpu-critical',
['description'] = 'The CPU usage is too high',
['severity'] = 'critical',
['trigger'] = {
['logical_operator'] = 'or',
['rules'] = {
{
['metric'] = 'intel.procfs.cpu.idle_percentage',
['fields'] = {
['cpuID'] = 'all'
},
['relational_operator'] = '<=',
['threshold'] = '5',
['window'] = '120',
['periods'] = '0',
['function'] = 'avg',
},
{
['metric'] = 'intel.procfs.cpu.iowait_percentage',
['fields'] = {
['cpuID'] = 'all'
},
['relational_operator'] = '>=',
['threshold'] = '35',
['window'] = '120',
['periods'] = '0',
['function'] = 'avg',
},
},
},
},
{
['name'] = 'cpu-warning',
['description'] = 'The CPU usage is high',
['severity'] = 'warning',
['trigger'] = {
['logical_operator'] = 'or',
['rules'] = {
{
['metric'] = 'intel.procfs.cpu.idle_percentage',
['fields'] = {
['cpuID'] = 'all'
},
['relational_operator'] = '<=',
['threshold'] = '15',
['window'] = '120',
['periods'] = '0',
['function'] = 'avg',
},
{
['metric'] = 'intel.procfs.cpu.iowait_percentage',
['fields'] = {
['cpuID'] = 'all'
},
['relational_operator'] = '>=',
['threshold'] = '25',
['window'] = '120',
['periods'] = '0',
['function'] = 'avg',
},
},
},
},
}
return alarms
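Each rule reduces its metric over `window` seconds with `function` and compares the result against `threshold` using `relational_operator`; a hedged sketch of that final comparison, reusing compare_threshold() from the constants module shown earlier (the averaging itself is done by the AFD plugin, and the value below is hypothetical):
local consts = require 'stacklight.constants'
local avg_idle = 4.2   -- hypothetical 120s average of idle_percentage
if consts.compare_threshold(avg_idle, '<=', 5) then
    print(consts.status_label(consts.CRIT))   --> 'CRITICAL'
end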

View File

@ -1,3 +0,0 @@
Package: *
Pin: origin "mirror.fuel-infra.org"
Pin-Priority: 500

View File

@ -1,207 +0,0 @@
--
-- Inspired from the lua_sandbox Postgres Output Example
-- https://github.com/mozilla-services/lua_sandbox/blob/f1ee9eb/docs/heka/output.md#example-postgres-output
--
local os = require 'os'
local http = require 'socket.http'
local message = require 'stacklight.message'
--local write = require 'io'.write
--local flush = require 'io'.flush
local influxdb_host = read_config('host') or error('influxdb host is required')
local influxdb_port = read_config('port') or error('influxdb port is required')
local batch_max_lines = read_config('batch_max_lines') or 3000
assert(batch_max_lines > 0, 'batch_max_lines must be greater than zero')
local db = read_config("database") or error("database config is required")
local write_url = string.format('http://%s:%d/write?db=%s', influxdb_host, influxdb_port, db)
local query_url = string.format('http://%s:%s/query', influxdb_host, influxdb_port)
local database_created = false
local buffer = {}
local buffer_len = 0
local function escape_string(str)
return tostring(str):gsub("([ ,])", "\\%1")
end
local function encode_scalar_value(value)
if type(value) == "number" then
        -- Always send numbers as formatted floats so that InfluxDB keeps
        -- accepting them if a field switches from integers to floats
        -- between points in time.
return string.format("%.6f", value)
elseif type(value) == "string" then
-- string values need to be double quoted
return '"' .. value:gsub('"', '\\"') .. '"'
elseif type(value) == "boolean" then
return '"' .. tostring(value) .. '"'
end
end
local function encode_value(value)
if type(value) == "table" then
local values = {}
for k, v in pairs(value) do
table.insert(
values,
string.format("%s=%s", escape_string(k), encode_scalar_value(v))
)
end
return table.concat(values, ',')
else
return "value=" .. encode_scalar_value(value)
end
end
local function write_batch()
assert(buffer_len > 0)
local body = table.concat(buffer, '\n')
local resp_body, resp_status = http.request(write_url, body)
if resp_body and resp_status == 204 then
-- success
buffer = {}
buffer_len = 0
return resp_body, ''
else
-- error
local err_msg = resp_status
if resp_body then
err_msg = string.format('influxdb write error: [%s] %s',
resp_status, resp_body)
end
return nil, err_msg
end
end
local function create_database()
-- query won't fail if database already exists
local body = string.format('q=CREATE DATABASE %s', db)
local resp_body, resp_status = http.request(query_url, body)
if resp_body and resp_status == 200 then
-- success
return resp_body, ''
else
-- error
local err_msg = resp_status
if resp_body then
err_msg = string.format('influxdb create database error [%s] %s',
resp_status, resp_body)
end
return nil, err_msg
end
end
-- Create a line for the current message. Return nil and an error string
-- if the message is invalid.
local function create_line()
local tags = {}
local dimensions, dimensions_index = message.read_field('dimensions')
if dimensions then
local i = 0
repeat
local tag_key = read_message('Fields[dimensions]', dimensions_index, i)
if tag_key == nil then
break
end
-- skip the plugin_running_on dimension
if tag_key ~= 'plugin_running_on' then
local variable_name = string.format('Fields[%s]', tag_key)
local tag_val = read_message(variable_name, 0)
if tag_val == nil then
                    -- the dimension is advertised in the "dimensions" field
-- but there is no field for it, so we consider the
-- entire message as invalid
return nil, string.format('dimension "%s" is missing', tag_key)
end
tags[escape_string(tag_key)] = escape_string(tag_val)
end
i = i + 1
until false
end
if tags['dimensions'] ~= nil and dimensions_index == 0 then
return nil, 'index of field "dimensions" should not be 0'
end
local name, name_index = message.read_field('name')
if name == nil then
-- "name" is a required field
return nil, 'field "name" is missing'
end
if tags['name'] ~= nil and name_index == 0 then
return nil, 'index of field "name" should not be 0'
end
local value, err_msg = message.read_values(tags)
if value == nil then
return nil, err_msg
end
local tags_array = {}
for tag_key, tag_val in pairs(tags) do
table.insert(tags_array, string.format('%s=%s', tag_key, tag_val))
end
return string.format('%s,%s %s %d',
escape_string(name),
table.concat(tags_array, ','),
encode_value(value),
string.format('%d', read_message('Timestamp'))), ''
end
function process_message()
if not database_created then
local ok, err_msg = create_database()
if not ok then
return -3, err_msg -- retry
end
database_created = true
end
local line, err_msg = create_line()
if line == nil then
-- the message is not valid, skip it
return -2, err_msg -- skip
end
buffer_len = buffer_len + 1
buffer[buffer_len] = line
if buffer_len > batch_max_lines then
local ok, err_msg = write_batch()
if not ok then
buffer[buffer_len] = nil
buffer_len = buffer_len - 1
-- recreate database on retry
if string.match(err_msg, 'database not found') then
database_created = false
end
return -3, err_msg -- retry
end
return 0
end
return -4 -- batching
end
function timer_event(ns)
if buffer_len > 0 then
local ok, _ = write_batch()
if ok then
update_checkpoint()
end
end
end
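The payload sent by write_batch() follows the InfluxDB line protocol (measurement,tags fields timestamp); a hedged illustration of what create_line() would emit for a simple metric message (all values hypothetical):
-- For a message with:
--   Fields[name]       = 'cpu_idle'
--   Fields[dimensions] = {'hostname'}
--   Fields[hostname]   = 'node-1'
--   Fields[value]      = 97.2
--   Timestamp          = 1466000000000000000 (nanoseconds)
-- create_line() produces the line-protocol string:
--   cpu_idle,hostname=node-1 value=97.200000 1466000000000000000
-- Spaces and commas in names are escaped, and numeric field values are
-- always formatted as floats (see encode_scalar_value() above).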

View File

@ -1 +0,0 @@
deb http://mirror.fuel-infra.org/mos-repos/ubuntu/snapshots/9.0-latest/ mos9.0-proposed main

View File

@ -1,18 +0,0 @@
FROM {{ image_spec("base-tools") }}
MAINTAINER {{ maintainer }}
# NOTE(elemoine): the InfluxDB package is downloaded from dl.influxdata.com.
# Do we want to host the package instead?
RUN gpg \
--keyserver hkp://ha.pool.sks-keyservers.net \
--recv-keys 05CE15085FC09D18E99EFB22684A14CF2582E0C5 \
&& curl https://dl.influxdata.com/influxdb/releases/influxdb_{{ influxdb_version }}_amd64.deb.asc -o /tmp/influxdb.deb.asc \
&& curl https://dl.influxdata.com/influxdb/releases/influxdb_{{ influxdb_version }}_amd64.deb -o /tmp/influxdb.deb \
&& gpg --batch --verify /tmp/influxdb.deb.asc /tmp/influxdb.deb \
&& dpkg -i /tmp/influxdb.deb \
&& chown -R influxdb: /etc/influxdb \
&& usermod -a -G microservices influxdb \
&& rm -f /tmp/influxdb.deb.asc /tmp/influxdb.deb
USER influxdb

View File

@ -1,11 +0,0 @@
FROM {{ image_spec("base-tools") }}
MAINTAINER {{ maintainer }}
RUN curl https://download.elastic.co/kibana/kibana/kibana-{{ kibana_version }}-amd64.deb -o /tmp/kibana.deb \
&& dpkg -i /tmp/kibana.deb \
&& rm -f /tmp/kibana.deb
RUN usermod -a -G microservices kibana \
&& chown -R kibana: /opt/kibana
USER kibana

View File

@ -1,16 +0,0 @@
FROM {{ image_spec("base-tools") }}
MAINTAINER {{ maintainer }}
# Install Snap
ADD install.sh /tmp/
RUN mkdir -p /etc/snap/auto \
&& bash /tmp/install.sh /etc/snap/auto \
&& rm /tmp/install.sh \
&& useradd --user-group snap \
&& usermod -a -G microservices snap \
&& chown -R snap: /etc/snap \
&& apt-get purge -y --auto-remove \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
USER snap

View File

@ -1,58 +0,0 @@
#!/bin/bash
set -e
AUTO_DISCOVERY_PATH="$1"
#
# Install Snap and Snap plugins
#
# Snap release, platform and architecture
RELEASE=v0.15.0-beta-126-g9a05d66
PLATFORM=linux
ARCH=amd64
TDIR=/tmp/snap
# Binary storage service URI components
PROTOCOL=https
HOST=bintray.com
BASEURL="mirantis/snap/download_file?file_path="
mkdir -p $TDIR
# Retrieve archived binaries and extract them in temporary location
for a in snap snap-plugins; do
f="${a}-${RELEASE}-${PLATFORM}-${ARCH}.tar.gz"
# -L required due to potential successive redirections
curl -s -k -L -o $TDIR/$f ${PROTOCOL}://${HOST}/${BASEURL}$f
tar zxCf $TDIR $TDIR/$f --exclude '*mock[12]'
done
# Copy retrieved binaries excluding demo plugins
install --owner=root --group=root --mode=755 $TDIR/snap-${RELEASE}/bin/* $TDIR/snap-${RELEASE}/plugin/* /usr/local/bin
# Make the plugins auto-loadable by the snap framework
for f in /usr/local/bin/snap-plugin*; do
ln -s $f $AUTO_DISCOVERY_PATH
done
# Update some permissions for plugins which require privileged access to the filesystem
#
# the processes snap plugin accesses files like /proc/1/io which
# only the root user can read
#
# the smart snap plugin accesses files in /host-proc and /host-dev (/proc and /dev
# from the host) which also requires root user access
#
for f in snap-plugin-collector-processes snap-plugin-collector-smart; do
chmod u+s /usr/local/bin/$f
done
#
# Clean up
#
# NOTE: BUILD_DEPS is not set anywhere in this script; if it is unset, this
# purge only performs the --auto-remove step.
apt-get purge -y --auto-remove $BUILD_DEPS
apt-get clean
rm -rf /var/lib/apt/lists/*
rm -rf $TDIR
exit 0

View File

@ -1,36 +0,0 @@
dsl_version: 0.1.0
service:
name: cron
kind: DaemonSet
containers:
- name: cron
image: cron
volumes:
- name: mysql-logs
path: "/var/log/ccp/mysql"
type: host
readOnly: False
- name: rabbitmq-logs
path: "/var/log/ccp/rabbitmq"
type: host
readOnly: False
- name: keystone-logs
path: "/var/log/ccp/keystone"
type: host
readOnly: False
- name: horizon-logs
path: "/var/log/ccp/horizon"
type: host
readOnly: False
daemon:
command: cron -f
files:
- logrotate.conf
- logrotate-services.conf
files:
logrotate.conf:
path: /etc/logrotate.conf
content: cron-logrotate-global.conf.j2
logrotate-services.conf:
path: /etc/logrotate.d/logrotate-services.conf
content: cron-logrotate-services.conf.j2

View File

@ -1,11 +0,0 @@
filename = "afd.lua"
-- log_level = 7
message_matcher = "TRUE"
ticker_interval = 10
{% raw %}
afd_type = "node"
afd_file = "{{ afd_file }}"
afd_cluster_name = "{{ afd_cluster_name }}"
afd_logical_name = "{{ afd_logical_name }}"
{% endraw %}
hostname = "{{ node_name }}"

View File

@ -1,79 +0,0 @@
alarms:
- name: 'root-fs-warning'
description: 'The root filesystem free space is low'
severity: 'warning'
enabled: 'true'
trigger:
rules:
- metric: 'intel.procfs.filesystem.space_percent_free'
fields:
filesystem: 'rootfs'
relational_operator: '<'
threshold: 10
window: 60
periods: 0
function: min
- name: 'root-fs-critical'
description: 'The root filesystem free space is too low'
severity: 'critical'
enabled: 'true'
trigger:
rules:
- metric: 'intel.procfs.filesystem.space_percent_free'
fields:
filesystem: 'rootfs'
relational_operator: '<'
threshold: 5
window: 60
periods: 0
function: min
- name: 'cpu-critical'
description: 'The CPU usage is too high'
severity: 'critical'
trigger:
logical_operator: 'or'
rules:
- metric: 'intel.procfs.cpu.idle_percentage'
fields:
cpuID: 'all'
relational_operator: '<='
threshold: '5'
window: '120'
periods: '0'
function: 'avg'
- metric: 'intel.procfs.cpu.iowait_percentage'
fields:
cpuID: 'all'
relational_operator: '>='
threshold: '35'
window: '120'
periods: '0'
function: 'avg'
- name: 'cpu-warning'
description: 'The CPU usage is high'
severity: 'warning'
trigger:
logical_operator: 'or'
rules:
- metric: 'intel.procfs.cpu.idle_percentage'
fields:
cpuID: 'all'
relational_operator: '<='
threshold: '15'
window: '120'
periods: '0'
function: 'avg'
- metric: 'intel.procfs.cpu.iowait_percentage'
fields:
cpuID: 'all'
relational_operator: '>='
threshold: '25'
window: '120'
periods: '0'
function: 'avg'
node_cluster_alarms:
system:
alarms:
rootfs: ['root-fs-critical', 'root-fs-warning']
cpu: ['cpu-critical', 'cpu-warning']

View File

@ -1,12 +0,0 @@
{{ cron.rotate.interval }}
rotate {{ cron.rotate.days }}
copytruncate
compress
delaycompress
notifempty
missingok
minsize {{ cron.rotate.minsize }}
maxsize {{ cron.rotate.maxsize }}
include /etc/logrotate.d

View File

@ -1,12 +0,0 @@
{% set services = [
'mysql',
'rabbitmq',
'keystone',
'horizon',
]
%}
{% for service in services %}
"/var/log/ccp/{{ service }}/*.log"
{}
{% endfor %}

View File

@ -1,27 +0,0 @@
configs:
kibana:
port:
cont: 5601
heka:
max_procs: 2
service_pattern: "^k8s_(.-)%..*"
hindsight_heka_tcp_port:
cont: 5565
influxdb:
database: "ccp"
host: "influxdb"
password: ""
port:
cont: 8086
user: ""
snap:
log_level: 3
cron:
rotate:
interval: "daily"
days: 6
minsize: "1M"
maxsize: "100M"
versions:
influxdb_version: "0.13.0"
kibana_version: "4.6.1"

View File

@ -1,40 +0,0 @@
#!/bin/bash
GRAFANA_URL="http://{{ grafana.user }}:{{ grafana.password }}@{{ address('grafana') }}:{{ grafana.port.cont }}"
echo "Waiting for Grafana to come up..."
until curl --fail --output /dev/null --silent ${GRAFANA_URL}/api/org; do
printf "."
sleep 2
done
echo -e "Grafana is up and running.\n"
echo "Creating InfluxDB datasource..."
curl -i -XPOST -H "Accept: application/json" -H "Content-Type: application/json" "${GRAFANA_URL}/api/datasources" -d '
{
"name": "CCP InfluxDB",
"type": "influxdb",
"access": "proxy",
"isDefault": true,
"url": "'"{{ address('influxdb', influxdb.port, with_scheme=True) }}"'",
"password": "'"{{ influxdb.password }}"'",
"user": "'"{{ influxdb.user }}"'",
"database": "'"{{ influxdb.database }}"'"
}'
if [ $? -ne 0 ]; then
echo "Can not create InfluxDB datasource"
exit 1
fi
echo -e "InfluxDB datasource was successfully created.\n"
echo "Importing default dashboards..."
for dashboard in /tmp/*.dashboard.json; do
echo -e "\tImporting ${dashboard}..."
curl -i -XPOST --data "@${dashboard}" -H "Accept: application/json" -H "Content-Type: application/json" "${GRAFANA_URL}/api/dashboards/db"
if [ $? -ne 0 ]; then
echo "Error importing ${dashboard}"
exit 1
fi
echo -e "\tDone"
done
echo -e "Default dashboards succesfully imported\n"

View File

@ -1,10 +0,0 @@
[docker_logs_decoder]
type = "MultiDecoder"
subs = ['openstack_log_decoder', 'ovs_log_decoder']
cascade_strategy = "first-wins"
log_sub_errors = false
[docker_log_input]
type = "DockerLogInput"
decoder = "docker_logs_decoder"
log_decode_failures = false

View File

@ -1,16 +0,0 @@
[elasticsearch_json_encoder]
type = "ESJsonEncoder"
index = {% raw %}"%{Type}-%{%Y.%m.%d}"{% endraw %}
es_index_from_timestamp = true
fields = ["Timestamp", "Type", "Logger", "Severity", "Payload", "Pid", "Hostname", "DynamicFields"]
[elasticsearch_output]
type = "ElasticSearchOutput"
server = "{{ address('elasticsearch', elasticsearch.port, with_scheme=True) }}"
message_matcher = "Type == 'log'"
encoder = "elasticsearch_json_encoder"
use_buffering = true
[elasticsearch_output.buffering]
max_buffer_size = 1073741824 # 1024 * 1024 * 1024
max_file_size = 134217728 # 128 * 1024 * 1024
full_action = "block"

View File

@ -1,10 +0,0 @@
[hekad]
maxprocs = {{ heka.max_procs }}
[debug_output]
type = "LogOutput"
message_matcher = "Fields[payload_name] == 'debug'"
encoder = "rst_encoder"
[rst_encoder]
type = "RstEncoder"

View File

@ -1,13 +0,0 @@
[horizon_apache_log_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/os_horizon_apache_log.lua"
[horizon_apache_log_decoder.config]
access_log_pattern = '%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b %D \"%{Referer}i\" \"%{User-Agent}i\"'
[horizon_apache_logstreamer_input]
type = "LogstreamerInput"
decoder = "horizon_apache_log_decoder"
log_directory = "/var/log/ccp/horizon"
file_match = 'horizon-(?P<Service>.+)\.log\.?(?P<Seq>\d*)$'
priority = ["^Seq"]
differentiator = ["horizon-", "Service"]

View File

@ -1,13 +0,0 @@
[keystone_apache_log_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/os_keystone_apache_log.lua"
[keystone_apache_log_decoder.config]
apache_log_pattern = '%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b %D \"%{Referer}i\" \"%{User-Agent}i\"'
[keystone_apache_logstreamer_input]
type = "LogstreamerInput"
decoder = "keystone_apache_log_decoder"
log_directory = "/var/log/ccp/keystone"
file_match = 'keystone-(?P<Service>.+)\.log\.?(?P<Seq>\d*)$'
priority = ["^Seq"]
differentiator = ["keystone-", "Service"]

View File

@ -1,11 +0,0 @@
[mariadb_log_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/os_mysql_log.lua"
[mariadb_logstreamer_input]
type = "LogstreamerInput"
decoder = "mariadb_log_decoder"
log_directory = "/var/log/ccp/mysql"
file_match = 'mysql\.log\.?(?P<Seq>\d*)$'
priority = ["^Seq"]
differentiator = ['mysql']

View File

@ -1,6 +0,0 @@
[openstack_log_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/os_openstack_log.lua"
[openstack_log_decoder.config]
heka_service_pattern = "{{ heka.service_pattern }}"

View File

@ -1,6 +0,0 @@
[ovs_log_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/os_ovs.lua"
[ovs_log_decoder.config]
heka_service_pattern = "{{ heka.service_pattern }}"

View File

@ -1,18 +0,0 @@
[rabbitmq_log_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/os_rabbitmq_log.lua"
[rabbitmq_log_splitter]
type = "RegexSplitter"
delimiter = '\n\n(=[^=]+====)'
delimiter_eol = false
deliver_incomplete_final = true
[rabbitmq_logstreamer_input]
type = "LogstreamerInput"
decoder = "rabbitmq_log_decoder"
splitter = "rabbitmq_log_splitter"
log_directory = "/var/log/ccp/rabbitmq"
file_match = '(?P<Service>rabbitmq.*)\.log\.?(?P<Seq>\d*)$'
priority = ["^Seq"]
differentiator = ["Service"]

View File

@ -1,22 +0,0 @@
output_path = "/var/lib/hindsight/output"
sandbox_load_path = "/var/lib/hindsight/load"
sandbox_run_path = "/var/lib/hindsight/run"
analysis_lua_path = "/usr/lib/x86_64-linux-gnu/luasandbox/modules/?.lua;/opt/ccp/lua/modules/?.lua"
analysis_lua_cpath = "/usr/lib/x86_64-linux-gnu/luasandbox/modules/?.so"
io_lua_path = analysis_lua_path .. ";/usr/lib/x86_64-linux-gnu/luasandbox/io_modules/?.lua"
io_lua_cpath = analysis_lua_cpath .. ";/usr/lib/x86_64-linux-gnu/luasandbox/io_modules/?.so"
hostname = "{{ node_name }}"
input_defaults = {
-- output_limit = 64 * 1024
-- memory_limit = 8 * 1024 * 1024
-- instruction_limit = 1e6
-- preserve_data = false
-- ticker_interval = 0
}
analysis_defaults = {
}
output_defaults = {
}

View File

@ -1,9 +0,0 @@
filename = "afd.lua"
log_level = 7
message_matcher = "TRUE"
ticker_interval = 10
afd_type = "node"
afd_file = "afd_node_default_cpu_alarms"
afd_cluster_name = "default"
afd_logical_name = "cpu"
hostname = "{{ node_name }}"

View File

@ -1,6 +0,0 @@
filename = "heka_tcp.lua"
address = "localhost"
port = {{ hindsight_heka_tcp_port.cont }}
-- the heka_tcp plugin is a "Continuous" plugin, so instruction_limit
-- must be set to zero
instruction_limit = 0

View File

@ -1,8 +0,0 @@
filename = "influxdb_tcp.lua"
host = "influxdb"
port = {{ influxdb.port.cont }}
database = "{{ influxdb.database }}"
batch_max_lines = 3000
message_matcher = "TRUE"
ticker_interval = 10
log_level = 7

View File

@ -1,4 +0,0 @@
filename = "kubelet_stats.lua"
kubelet_stats_port = 10255
kubelet_stats_node = "{{ node_name }}"
ticker_interval = 10

View File

@ -1,5 +0,0 @@
filename = "prune_input.lua"
ticker_interval = 60
input = true
analysis = true
exit_on_stall = false

View File

@ -1,17 +0,0 @@
reporting-disabled = true
[meta]
dir = "/var/lib/influxdb/meta"
[data]
engine = "tsm1"
dir = "/var/lib/influxdb/data"
wal-dir = "/var/lib/influxdb/wal"
[admin]
enabled = true
[http]
auth-enabled = false # FIXME(elemoine)
bind-address = "{{ network_topology["private"]["address"] }}:{{ influxdb.port.cont }}"
log-enabled = false

View File

@ -1,62 +0,0 @@
# Kibana is served by a back end server. This controls which port to use.
port: {{ kibana.port.cont }}
# The host to bind the server to.
host: "{{ network_topology["private"]["address"] }}"
# The Elasticsearch instance to use for all your queries.
elasticsearch_url: "{{ address('elasticsearch', elasticsearch.port, with_scheme=True) }}"
# preserve_elasticsearch_host true will send the hostname specified in `elasticsearch`. If you set it to false,
# then the host you use to connect to *this* Kibana instance will be sent.
elasticsearch_preserve_host: true
# Kibana uses an index in Elasticsearch to store saved searches, visualizations
# and dashboards. It will create a new index if it doesn't already exist.
kibana_index: ".kibana"
# If your Elasticsearch is protected with basic auth, these are the user credentials
# used by the Kibana server to perform maintenance on the kibana_index at startup. Your Kibana
# users will still need to authenticate with Elasticsearch (which is proxied through
# the Kibana server)
# kibana_elasticsearch_username: user
# kibana_elasticsearch_password: pass
# The default application to load.
# kibana.defaultAppId: "dashboard/Main"
# Time in milliseconds to wait for responses from the back end or elasticsearch.
# This must be > 0
request_timeout: 300000
# Time in milliseconds for Elasticsearch to wait for responses from shards.
# Set to 0 to disable.
shard_timeout: 0
# Set to false to have a complete disregard for the validity of the SSL
# certificate.
verify_ssl: false
# If you need to provide a CA certificate for your Elasticsearch instance, put
# the path of the pem file here.
# ca: /path/to/your/CA.pem
# SSL for outgoing requests from the Kibana Server (PEM formatted)
# ssl_key_file: /path/to/your/server.key
# ssl_cert_file: /path/to/your/server.crt
# Set the path to where you would like the process id file to be created.
# pid_file: /var/run/kibana.pid
# Plugins that are included in the build, and no longer found in the plugins/ folder
bundled_plugin_ids:
- plugins/dashboard/index
- plugins/discover/index
- plugins/doc/index
- plugins/kibana/index
- plugins/markdown_vis/index
- plugins/metric_vis/index
- plugins/settings/index
- plugins/table_vis/index
- plugins/vis_types/index
- plugins/visualize/index

File diff suppressed because it is too large

View File

@ -1,108 +0,0 @@
{
"version": 1,
"schedule": {
"type": "simple",
"interval": "10s"
},
"max-failures": 5,
"workflow": {
"collect": {
"config": {
"/intel": {
"proc_path": "/host-proc"
},
"/intel/disk": {
"dev_path": "/host-dev"
}
},
"tags": {
"/intel": {
"hostname": "{{ node_name }}"
}
},
"metrics": {
"/intel/disk/smart/*": {},
"/intel/procfs/cpu/*/idle_percentage": {},
"/intel/procfs/cpu/*/iowait_percentage": {},
"/intel/procfs/cpu/*/irq_percentage": {},
"/intel/procfs/cpu/*/nice_percentage": {},
"/intel/procfs/cpu/*/softirq_percentage": {},
"/intel/procfs/cpu/*/steal_percentage": {},
"/intel/procfs/cpu/*/system_percentage": {},
"/intel/procfs/cpu/*/user_percentage": {},
"/intel/procfs/filesystem/*/inodes_free": {},
"/intel/procfs/filesystem/*/inodes_reserved": {},
"/intel/procfs/filesystem/*/inodes_used": {},
"/intel/procfs/filesystem/*/space_free": {},
"/intel/procfs/filesystem/*/space_reserved": {},
"/intel/procfs/filesystem/*/space_used": {},
"/intel/procfs/filesystem/*/inodes_percent_free": {},
"/intel/procfs/filesystem/*/inodes_percent_reserved": {},
"/intel/procfs/filesystem/*/inodes_percent_used": {},
"/intel/procfs/filesystem/*/space_percent_free": {},
"/intel/procfs/filesystem/*/space_percent_reserved": {},
"/intel/procfs/filesystem/*/space_percent_used": {},
"/intel/procfs/filesystem/*/device_name": {},
"/intel/procfs/filesystem/*/device_type": {},
"/intel/procfs/disk/*/merged_read": {},
"/intel/procfs/disk/*/merged_write": {},
"/intel/procfs/disk/*/octets_read": {},
"/intel/procfs/disk/*/octets_write": {},
"/intel/procfs/disk/*/ops_read": {},
"/intel/procfs/disk/*/ops_write": {},
"/intel/procfs/disk/*/time_read": {},
"/intel/procfs/disk/*/time_write": {},
"/intel/procfs/iface/*/bytes_recv": {},
"/intel/procfs/iface/*/bytes_sent": {},
"/intel/procfs/iface/*/compressed_recv": {},
"/intel/procfs/iface/*/compressed_sent": {},
"/intel/procfs/iface/*/drop_recv": {},
"/intel/procfs/iface/*/drop_sent": {},
"/intel/procfs/iface/*/errs_recv": {},
"/intel/procfs/iface/*/errs_sent": {},
"/intel/procfs/iface/*/fifo_recv": {},
"/intel/procfs/iface/*/fifo_sent": {},
"/intel/procfs/iface/*/frame_recv": {},
"/intel/procfs/iface/*/frame_sent": {},
"/intel/procfs/iface/*/multicast_recv": {},
"/intel/procfs/iface/*/multicast_sent": {},
"/intel/procfs/iface/*/packets_recv": {},
"/intel/procfs/iface/*/packets_sent": {},
"/intel/procfs/load/min1": {},
"/intel/procfs/load/min5": {},
"/intel/procfs/load/min15": {},
"/intel/procfs/load/min1_rel": {},
"/intel/procfs/load/min5_rel": {},
"/intel/procfs/load/min15_rel": {},
"/intel/procfs/load/runnable_scheduling": {},
"/intel/procfs/load/existing_scheduling": {},
"/intel/procfs/meminfo/buffers": {},
"/intel/procfs/meminfo/cached": {},
"/intel/procfs/meminfo/mem_free": {},
"/intel/procfs/meminfo/mem_used": {},
"/intel/procfs/processes/dead": {},
"/intel/procfs/processes/parked": {},
"/intel/procfs/processes/running": {},
"/intel/procfs/processes/sleeping": {},
"/intel/procfs/processes/stopped": {},
"/intel/procfs/processes/tracing": {},
"/intel/procfs/processes/waiting": {},
"/intel/procfs/processes/wakekill": {},
"/intel/procfs/processes/waking": {},
"/intel/procfs/processes/zombie": {},
"/intel/procfs/swap/all/cached_bytes": {},
"/intel/procfs/swap/all/free_bytes": {},
"/intel/procfs/swap/io/in_pages_per_sec": {},
"/intel/procfs/swap/io/out_pages_per_sec": {},
"/intel/procfs/swap/all/used_bytes": {}
},
"publish": [{
"plugin_name": "heka",
"config": {
"host": "localhost",
"port": {{ hindsight_heka_tcp_port.cont }}
}
}]
}
}
}

View File

@ -1,5 +0,0 @@
log_level: {{ snap.log_level }}
control:
plugin_load_timeout: 15
plugin_trust_level: 0
auto_discover_path: /etc/snap/auto

File diff suppressed because it is too large

View File

@ -1,69 +0,0 @@
dsl_version: 0.1.0
service:
name: heka
kind: DaemonSet
containers:
- name: heka
image: heka
volumes:
- name: docker-sock
type: host
path: /run/docker.sock
- name: mysql-logs
path: "/var/log/ccp/mysql"
type: host
readOnly: True
- name: rabbitmq-logs
path: "/var/log/ccp/rabbitmq"
type: host
readOnly: True
- name: keystone-logs
path: "/var/log/ccp/keystone"
type: host
readOnly: True
- name: horizon-logs
path: "/var/log/ccp/horizon"
type: host
readOnly: True
daemon:
command: hekad --config=/etc/heka
dependencies:
- elasticsearch
files:
- heka-global.toml
- heka-elasticsearch.toml
- heka-mariadb.toml
- heka-openstack.toml
- heka-rabbitmq.toml
- heka-ovs.toml
- heka-dockerlogs.toml
- heka-keystone.toml
- heka-horizon.toml
files:
heka-global.toml:
path: /etc/heka/heka-global.toml
content: heka-global.toml.j2
heka-elasticsearch.toml:
path: /etc/heka/heka-elasticsearch.toml
content: heka-elasticsearch.toml.j2
heka-mariadb.toml:
path: /etc/heka/heka-mariadb.toml
content: heka-mariadb.toml.j2
heka-openstack.toml:
path: /etc/heka/heka-openstack.toml
content: heka-openstack.toml.j2
heka-rabbitmq.toml:
path: /etc/heka/heka-rabbitmq.toml
content: heka-rabbitmq.toml.j2
heka-ovs.toml:
path: /etc/heka/heka-ovs.toml
content: heka-ovs.toml.j2
heka-dockerlogs.toml:
path: /etc/heka/heka-dockerlogs.toml
content: heka-dockerlogs.toml
heka-keystone.toml:
path: /etc/heka/heka-keystone.toml
content: heka-keystone.toml.j2
heka-horizon.toml:
path: /etc/heka/heka-horizon.toml
content: heka-horizon.toml.j2

View File

@ -1,43 +0,0 @@
dsl_version: 0.1.0
service:
name: influxdb
ports:
- {{ influxdb.port }}
containers:
- name: influxdb
image: influxdb
daemon:
command: influxd -config /etc/influxdb/influxdb.conf
files:
- influxdb.conf
# {% if grafana is defined and grafana.enable %}
post:
- name: stacklight-grafana-configure
command: /opt/ccp/bin/grafana-configure.sh
type: single
dependencies:
- grafana
files:
- grafana-configure.sh
- kubernetes-dashboard
- system-dashboard
# {% endif %}
volumes:
- name: influxdb-data
type: empty-dir
path: /var/lib/influxdb
files:
influxdb.conf:
path: /etc/influxdb/influxdb.conf
content: influxdb.conf.j2
perm: "0600"
grafana-configure.sh:
path: /opt/ccp/bin/grafana-configure.sh
content: grafana-configure.sh.j2
perm: "0755"
kubernetes-dashboard:
path: /tmp/kubernetes.dashboard.json
content: kubernetes.dashboard.json
system-dashboard:
path: /tmp/system.dashboard.json
content: system.dashboard.json

View File

@ -1,18 +0,0 @@
dsl_version: 0.1.0
service:
name: kibana
ports:
- {{ kibana.port }}
containers:
- name: kibana
image: kibana
daemon:
command: /opt/kibana/bin/kibana
dependencies:
- elasticsearch
files:
- kibana.yml
files:
kibana.yml:
path: /opt/kibana/config/kibana.yml
content: kibana.yml.j2

View File

@ -1,94 +0,0 @@
dsl_version: 0.1.0
service:
name: stacklight-collector
kind: DaemonSet
containers:
- name: hindsight
image: hindsight
pre:
- name: service-bootstrap
type: local
command: /opt/ccp/bin/bootstrap-hindsight.sh /var/lib/hindsight
daemon:
command: /usr/bin/hindsight /etc/hindsight/hindsight.cfg
files:
- hindsight.cfg
- heka-tcp.cfg
- prune-input.cfg
- influxdb-tcp.cfg
- kubelet-stats.cfg
volumes:
- name: hindsight
type: empty-dir
path: /var/lib/hindsight
- name: stacklight-alarms
type: empty-dir
path: /opt/ccp/lua/modules/stacklight_alarms
- name: snap
image: snap
privileged: true
daemon:
command: snapd --config /etc/snap/snap.conf
files:
- snap.conf
- snap-task.json
volumes:
- name: proc
type: host
path: /proc
mount-path: /host-proc
- name: dev
type: host
path: /dev
mount-path: /host-dev
- name: alarm-manager
image: alarm-manager
daemon:
command: /opt/ccp/bin/alarm-manager.py -w /etc/alarm-manager
files:
- alarms.yaml
- lua-cfg-template.j2
volumes:
- name: hindsight
type: empty-dir
path: /var/lib/hindsight
- name: stacklight-alarms
type: empty-dir
path: /opt/ccp/lua/modules/stacklight_alarms
files:
hindsight.cfg:
path: /etc/hindsight/hindsight.cfg
content: hindsight.cfg.j2
perm: "0600"
heka-tcp.cfg:
path: /var/lib/hindsight/run/input/heka_tcp.cfg
content: hindsight_heka_tcp.cfg.j2
perm: "0600"
prune-input.cfg:
path: /var/lib/hindsight/run/input/prune_input.cfg
content: hindsight_prune_input.cfg
perm: "0600"
influxdb-tcp.cfg:
path: /var/lib/hindsight/run/output/influxdb_tcp.cfg
content: hindsight_influxdb_tcp.cfg.j2
perm: "0600"
kubelet-stats.cfg:
path: /var/lib/hindsight/run/input/kubelet_stats.cfg
content: hindsight_kubelet_stats.cfg.j2
perm: "0600"
snap.conf:
path: /etc/snap/snap.conf
content: snap.conf.j2
perm: "0600"
snap-task.json:
path: /etc/snap/auto/task.json
content: snap-task.json.j2
perm: "0600"
alarms.yaml:
path: /etc/alarm-manager/alarms.yaml
content: alarms.yaml
perm: "0600"
lua-cfg-template.j2:
path: /etc/alarm-manager/templates/alarm_manager_lua_config_template.cfg.j2
content: alarm_manager_lua_config_template.cfg.j2
perm: "0600"

View File

@ -1,89 +0,0 @@
#!/usr/bin/python3
# Copyright 2016 Mirantis, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
import argparse
import sys
import glob
import os
import json
class Action(argparse.Action):
def __call__(self, parser, namespace, values, option_string=None):
path = values
if os.path.isdir(path):
path = "{}/*.json".format(path)
elif os.path.isfile(path):
pass
else:
raise ValueError("'{}' no such file or directory".format(path))
setattr(namespace, self.dest, path)
parser = argparse.ArgumentParser(
formatter_class=argparse.RawDescriptionHelpFormatter,
description="""
Format JSON file with ordered keys
Remove sections:
templating.list[].current
templating.list[].options
Override time entry to {"from": "now-1h", "to": "now"}
Enable sharedCrosshair
Increment the version
WARNING: this script modifies all manipulated files in place.
If a DIRECTORY is provided, all files with suffix '.json' will be modified.""")
parser.add_argument('path',
action=Action,
help="Path to JSON file or directory "
"including .json files")
path = parser.parse_args().path
for f in glob.glob(path):
print('Processing {}...'.format(f))
data = None
absf = os.path.abspath(f)
with open(absf) as _in:
data = json.load(_in)
dashboard = data.get('dashboard')
if not dashboard:
print('Malformed JSON: no "dashboard" key')
sys.exit(1)
for k, v in dashboard.items():
if k == 'annotations':
for anno in v.get('list', []):
anno['datasource'] = 'CCP InfluxDB'
if k == 'templating':
variables = v.get('list', [])
for o in variables:
if o['type'] == 'query':
o['options'] = []
o['current'] = {}
o['refresh'] = 1
dashboard['time'] = {'from': 'now-1h', 'to': 'now'}
dashboard['sharedCrosshair'] = True
dashboard['refresh'] = '1m'
dashboard['id'] = None
dashboard['version'] = dashboard.get('version', 0) + 1
with open(absf, 'w') as out:
json.dump(data, out, indent=2, sort_keys=True)
print('Done processing {}.'.format(f))

View File

@ -1,5 +0,0 @@
#!/bin/bash
set -ex
workdir=$(dirname $0)
yamllint -c $workdir/yamllint.yaml $(find . -not -path '*/\.*' -type f -name '*.yaml')

View File

@ -1,21 +0,0 @@
extends: default
rules:
braces:
max-spaces-inside: 1
comments:
level: error
comments-indentation:
level: warning
document-end:
present: no
document-start:
level: error
present: no
empty-lines:
max: 1
max-start: 0
max-end: 0
line-length:
level: warning
max: 120

27
tox.ini
View File

@ -1,27 +0,0 @@
[tox]
minversion = 1.6
envlist = linters,bashate,py34,py27,pep8
skipsdist = True
[testenv:pep8]
commands = flake8 {posargs}
[testenv:venv]
commands = {posargs}
[testenv:linters]
deps = yamllint
commands =
{toxinidir}/tools/yamllint.sh
[testenv:bashate]
deps = bashate>=0.2
whitelist_externals = bash
commands = bash -c "find {toxinidir} -type f -name '*.sh' -not -path '*/.tox/*' -print0 | xargs -0 bashate -v"
[flake8]
# E123, E125 skipped as they are invalid PEP-8; H102 (license header check) is also ignored.
show-source = True
ignore = E123,E125,H102
builtins = _
exclude=.venv,.git,.tox,dist,doc,*openstack/common*,*lib/python*,*egg,build