Added a call to stop all react scripts, by passing all known routing
keys to the react_killer function. Then stop the threadpool executor.
Finally, raise a known exception to stop the watchdog thread. This
throws an ugly traceback, but shuts the engine down
gracefully.
Also made minor changes to the example react.json, to change the log
format. Knowing the time a log was printed is useful.
Change-Id: Ibed06f79547312d188feb499f937eb5390d60c3e
When watchdog detects that repair script(s) have been killed, get
a list of scripts to nuke and pass to stop_repair_scripts. Then,
get its routing key(s), and send a message from a special user to
any queue listening on those keys.
Modified an example repair script to show how it could be killed,
but we need a more concrete way than that. For now, messages from
'react_killer' will raise the RepairStoppedException, which will
stop react scripts.
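
A minimal sketch of the kill path described above. The message layout (a
dict with a 'From' key) and the handle_message name are assumptions made
for illustration; only 'react_killer' and RepairStoppedException come
from this commit.

```python
class RepairStoppedException(Exception):
    """Raised inside a react script when it is asked to stop."""


KILLER_USER = 'react_killer'  # special sender named in this commit


def handle_message(body):
    """Process one message; the 'From' key is an assumed message layout."""
    if body.get('From') == KILLER_USER:
        # A message from the special user means "stop this react script".
        raise RepairStoppedException('stop requested by %s' % KILLER_USER)
    return 'handled %s' % body.get('payload')
```

A react script would let this exception propagate out of its consume
loop, which ends the script.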
Modified the example engine cfg to have some details about the
kombu connection to use.
Implements blueprint kill-repair-scripts
Change-Id: I67e15e9b9ebb5d36c5cb0e01995bc95f7a73b3dd
Added a parser and function to unregister repair scripts. Remove
the repair script from the backend repair cfg, and watchdog will
catch it in the engine.
Change-Id: I7b93ca7e5eb4430b7c9502c8dd84af75b2a9fae3
We use this field when disabling an engine. Stay clear of possible
KeyErrors by adding this field when registering an engine.
Change-Id: Iacca4f99be018b5147b6a00f82e7c772ec88a8f3
Allow users to remove audit scripts through a simple CLI call. The
unregister-audit call will remove an audit script from audit.cfg,
thus stopping the execution of all but the currently scheduled audit
scripts.
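
A rough sketch of what such a call could look like. The -n/--name flag,
the function names, and the in-memory dict standing in for audit.cfg are
all hypothetical; only the unregister-audit command name comes from this
commit.

```python
import argparse


def unregister_script(registered, name):
    """Return a copy of the cfg mapping without `name`; raise if unknown."""
    if name not in registered:
        raise KeyError('script %r is not registered' % name)
    return {k: v for k, v in registered.items() if k != name}


def build_parser():
    parser = argparse.ArgumentParser(prog='entropy')
    subparsers = parser.add_subparsers(dest='command')
    unregister = subparsers.add_parser('unregister-audit')
    # The -n/--name flag is an assumption, not the project's actual option.
    unregister.add_argument('-n', '--name', required=True)
    return parser
```

The real command would write the reduced mapping back to audit.cfg.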
Partially implements blueprint simple api
Change-Id: I1d5328d87b607c2f5cfdaebb7448a11673f38d48
Rename scheduler_parser to start_engine parser, which makes more sense.
Also add the -p (purge) option that an earlier commit needs.
Change-Id: I5058e6f60804166e71fd5fefb2e298d27630b3d8
Add a call to stop an engine. This function just sets the enabled field
to false, and psutil terminates the process (equivalent to what we
do now). Also add psutil to requirements.
Change-Id: Idc3edb2bf1c9ed55d7d77973c59e3d3562e2ad8b
Add a field called enabled in the engine cfg file that keeps track of
files. This can be used to keep track of stopped engines, and
monitored using watchdog/similar. Also added some more checks at
engine creation time.
Partially Closes-bug:1309406
Change-Id: I1c365c2c438e6ed0a44413e1d09c69d3fab7ab7b
Abstract out cfg file operations in the register workflow. Changed
the get_driver code in engine to be static, so we can call it from
main too.
Added a new function in file_backend to return the config file given
the script type, e.g. return audit_cfg for audit.
Added a new function in file_backend to replace check_duplicate, that
returns True if a script is already registered.
Added a couple of string variables in base.py
The function get_cfg_file, when using a db, will actually return a
table. So this belongs in the backend, the code refactor here ensures
this function is not called in the main() code.
Raise errors instead of returning None in some backend functions.
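
The shape of these backend helpers can be sketched as follows. The class
and method names are hypothetical stand-ins, not the actual file_backend
code; the sketch only shows the "look up by script type, raise on bad
input, return True when already registered" behaviour described above.

```python
class SketchFileBackend(object):
    """Illustrative stand-in for the file backend; not the project's class."""

    def __init__(self, audit_cfg, repair_cfg):
        self._cfg_files = {'audit': audit_cfg, 'repair': repair_cfg}
        self._registered = {'audit': set(), 'repair': set()}

    def get_script_cfg(self, script_type):
        """Return the cfg file for a script type, raising on bad input."""
        try:
            return self._cfg_files[script_type]
        except KeyError:
            # Raise instead of returning None so callers fail loudly.
            raise ValueError('unknown script type: %r' % script_type)

    def add_script(self, script_type, name):
        self.get_script_cfg(script_type)  # validates script_type
        self._registered[script_type].add(name)

    def is_registered(self, script_type, name):
        """Return True if `name` is already registered for this type."""
        self.get_script_cfg(script_type)
        return name in self._registered[script_type]
```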
Completes blueprint backend-abstraction
Change-Id: I20d6bd46caf56c750e4b1193a6f5d00ce4e930f6
Added code to load the file backend as a driver in engine __init__,
and a function in FileBackend to create cfg files.
Also added extra field in engine.cfg to specify what kind of backend
to use.
Change-Id: I6d3f24d4f676c72c94afff2c4c7f54a35cf1d4b1
If those files don't exist, create them at startup. Otherwise, adding the
first audit or repair script fails because the cfg file doesn't exist.
Change-Id: I9143f59364167e98f69616351b4ac1df8ebd4ff8
Try to follow the format
name1:
    arg1:
    arg2:
name2:
    arg1:
    arg2:
Changes in several places to reflect the new format.
Change-Id: I182bbb701ac0e1885078f9ec3789fcff799acf5a
Store pid along with other details. Change some code to take care of the new
format, including the utils function that reads yaml.
Henceforth, try to keep yaml files in the project in the format:
name1:
    arg1:
    arg2:
name2:
    arg1:
    arg2:
Also wrote a function to write yaml in utils.py
Change-Id: I838ec927a439ac1aeea88ba0b1d71fc782777204
There were some bugs in the commit to create queues dynamically. The
engine now creates queues that are needed, and passes to react scripts.
Also made some fixes to example code, added some config files. They
contain usernames, but should be simple enough to modify and test
Change-Id: Ife1977b3f8d669024fd853b6691300b5dd4fd73f
Correct logging: reset loggers in engine.py, and set up your own.
Added functions to set up logging in engine.py and the Audit class,
which can be used by audit scripts. Added a reset_logger function in utils.
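
One way the reset-then-set-up pattern could look, using only the standard
logging module. The signatures, the setup_logging name, and the format
string are assumptions; only the reset_logger name comes from this commit.

```python
import logging


def reset_logger(log):
    """Detach and close every handler on `log` so callers can add their own."""
    for handler in list(log.handlers):
        log.removeHandler(handler)
        handler.close()
    return log


def setup_logging(name, stream=None,
                  fmt='%(asctime)s %(name)s %(message)s'):
    """Reset the named logger, then attach a single stream handler."""
    log = reset_logger(logging.getLogger(name))
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(fmt))
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    return log
```

Resetting first means calling setup_logging twice does not leave
duplicate handlers behind.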
Expose __main__.py as a CLI on packaging. So after installing
the entropy package, you can call commands like
    entropy register-audit
    entropy start-engine
from anywhere in your machine. Made changes to setup.cfg to make
the main function an entry point.
Moved the engine_cfg file to /tmp/engines.cfg. That way, even though
it is hardcoded, it's at least fairly uniform across machines.
Change-Id: I704bf5e4635ffc539d7a73c5f84ef4bf8b2e801e
Set logging handlers in each file, instead of a global one
Move CLI output to stdout
Remove one hardcoded value
Change-Id: I0d1bfcbd642bdc43547bf177bed53c32eaf956b9
Remain consistent with other options and just pass a name and
config file to register a new engine.
Changed the cmdline call start-engine a bit, and some changes
to account for the difference in input.
Change-Id: Idc73528dc39d79d530a5a5c901761ef59ab13f33
We need to separate out engine code and audit and repair scripts if
pypi distribution is required. This is handled in this commit.
Further commits will remove the two variables hardcoded currently,
and expose __main__ as a cmdline script on installation.
Use imp for module finding and loading, use full path to script in
audit and repair cfg files.
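
For illustration, loading a script from its full path can be sketched as
below. This commit uses imp; the sketch swaps in importlib.util, the
current standard-library counterpart, and the function name is
hypothetical.

```python
import importlib.util
import os


def load_module_from_path(path):
    """Load the Python file at `path` and return the module object."""
    # Derive the module name from the file name, e.g. vm_count.py -> vm_count.
    name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError('cannot load module from %r' % path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

Storing the full path in the cfg files is what lets a loader like this
work no matter where the engine is started from.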
Move audit and repair scripts to examples. Make some changes to hardcoded
stuff to account for this
Update gitignore
Change-Id: I50831003c6f7272967dbeb5c558b76b0183c91be
Added an audit script with functions to flavor-list, boot and
delete vms. Use CLI for now, can switch to novaclient later.
Use paramiko to ssh onto a host with access to the cluster,
then run nova list, delete the vms we created, and boot a new
one.
Added a new queue for this audit/repair pair, dynamically creating
queues or sharing them is TODO.
Added react script, removed the ability to call from commandline.
Can show off the feature without that.
Added templates for conf files
Change-Id: I3fe70534573aa70bd9407e18dcbe11e0e784595c
Remove usage of globals.py
Rewrite register-audit and register-repair to avoid using globals.
Get cfg file from engine name and script type instead.
Remove cfg files from git
Change-Id: I8ee119b4ebf55fa18ff4f6a83c0859ddc6699c5f
Moved all major functionalities, like adding audit/repair scripts,
the watchdog handler, etc to the engine class. There is still some
cleanup to do, like getting rid of some references to the globals
variables, which will come in part III, in progress.
Realized that we need full names to at least the audit.cfg and
repair.cfg files, removing all the cfg files from git, because
it makes the repo look ugly with usernames in them. Will add in
sample cfg files, even though they shouldn't really be created
manually.
Verified that the code is now in the same state as before we used
a scheduler class, i.e. no difference in expected behavior.
Change-Id: If9eeb9201ac6dd30705c3246c304b304054dc577
Move start-scheduler logic into a class (in engine.py). Call to
start-engine() will create a new engine class, and run a watchdog
thread on the cfg files associated with that engine.
In followup commits, will also start scheduler in the constructor,
and move more code into the class. The existing globals file can
be deleted at that point.
Change-Id: I3a547a538fecaabdb84c927df9439c30119bf74f
While trying to migrate stuff to a class I noticed some functions
that could be in utils.
Removed a join statement that shouldn't be there because we use
futures now.
Change-Id: Iab457a4b34ff176a6e39f1a02ba9b5377602e652
The right thing to do is use a thread pool, which is the like-for-like
replacement for what was going on earlier.
Use all_futures to track all currently started scripts. We will use
this later to kill threads, etc.
Change-Id: I5274a381cb0ff8744cb1efee265c7d1e74895098
Currently we use threading, and do thread.join() to start audit/repair
scripts. Using futures and executors makes the code simpler to read. Note
that it might not necessarily improve performance. I set the max number of
workers arbitrarily to 8, but we can work out a way to set this number
correctly.
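
A small sketch of the executor-based approach. The submit_scripts and
run_audit names are placeholders invented for this example; the worker
count of 8 is the arbitrary value mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor, wait

MAX_WORKERS = 8  # arbitrary value from this commit


def submit_scripts(executor, scripts):
    """Submit each (callable, kwargs) pair and return the futures."""
    return [executor.submit(func, **kwargs) for func, kwargs in scripts]


def run_audit(name):
    # Placeholder for an audit script's entry point (hypothetical).
    return 'audit %s done' % name


if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        futures = submit_scripts(executor,
                                 [(run_audit, {'name': 'vm_count'})])
        wait(futures)  # replaces the explicit thread.join() calls
        print([f.result() for f in futures])
```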
Change-Id: I3c9f3194753c79d57204b49c7ec2444fd454bfc7
Newly added jobs will change either audit.cfg or repair.cfg,
which are watched by watchdog. Call the right function to add
the job to currently running ones.
Minor typo fixes to cfg files for scripts.
Use utils.load_yaml instead of yaml.load_all. That function
uses safe_load_all, so better security-wise
Change-Id: Ib8137a0a1d9a3b960d9c64f9c6424709d57b8747
There is some redundant code in start_scheduler() when starting
scripts. Write start_scripts() to address this.
Change-Id: I1f6b9a66c7385bbb066cefdf0cfb4cb176d84805
Watch cfg files for changes; when one changes, call the right callback.
Move audit.cfg and repair.cfg into cfg/ for easier
monitoring using watchdog.
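
The "call the right callback" step can be sketched without watchdog
itself. The make_dispatcher name and the callbacks mapping are
hypothetical; in a watchdog handler, the path passed in would presumably
be the src_path of the modification event.

```python
import os


def make_dispatcher(callbacks):
    """Map cfg file names to callbacks and return an on-change function."""
    def on_change(path):
        callback = callbacks.get(os.path.basename(path))
        if callback is None:
            return None  # not a file we watch
        return callback(path)
    return on_change
```

Keeping both cfg files in one cfg/ directory means a single directory
watch covers them, and the dispatcher picks the callback by file name.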
Change-Id: Iace75f36f0bfb5b83fe53c7d63b110f10534808f
Store list of running audits and repairs, will aid in other things
later on, like preventing duplicates, adding scripts at runtime,
etc.
Removed needless import in vm_count
Change-Id: I9ed811783e5bc4a7799e8a5f73a4e55a15fdfee4
vm_count.py gets number of vms running in a cluster,
react will throw an error if it's above a limit.
Not adding vm_count.json to git, similar to audit.json,
contains api and compute hostnames in addition.
Don't use string interpolation for every log message
Add audit conf files to gitignore
Remove stevedore stuff, not using now, can add back later if needed
Use libvirt bindings to talk to hypervisor. Only one hypervisor
for now, changes soon for multiple hypervisors.
Change-Id: I843e3600a62cb6698526b3498358e4b90121ba1a
Load modules dynamically, allowing better control over audit/react scripts
Change code structure a bit (put audit scripts in audit/ dir, react scripts
in repair/ dir)
Enable stevedore, for audit/react scripts installed with the package
Remove all homedir references
Change-Id: I7351d6b7cd9ca5ba9cfa9526dfbefbfecacc3dc8
Use register-audit to register audit script, register-repair to
register repair script. start-scheduler to start all the react
scripts and then schedule audit scripts.
Added audit.cfg and repair.cfg files to store registered scripts,
this will help restart after failure.
Added globals.py to store global variables
Removed validate_cfg function that wasn't doing anything
Change-Id: Id9140d2665e5710e6ffe2ed707135ff9a30ccdff
As described in https://pypi.python.org/pypi/pause ,
the pause library has higher precision than the sleep lib,
and uses machine timestamp instead of counters.
Change-Id: I0f1135757ef8d1ed6e4eb203b84632ef5ec91977
Move entropy.py -> __main__.py so that it can just
be used by running $ python entropy (inside the
entropy root folder).
Also use yaml loading instead of json loading since
yaml allows for comments inside the file (yaml is
a superset of json).
Removed import json from __main__.py to keep pep8 happy
Also changed react.py to use a per module logger instead
of root logger
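
The per-module logger pattern, as a minimal sketch; the react function
here is a placeholder, not the actual contents of react.py.

```python
import logging

# Per-module logger: named after the module instead of using the root logger.
LOG = logging.getLogger(__name__)


def react(message):
    """Log the incoming message through the module's own logger."""
    LOG.info('received: %s', message)
    return message
```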
Change-Id: I5eb24319dee4f04891878c6e61cc4d7835b14d34