Commit Graph

43 Commits

Author SHA1 Message Date
Pranesh Pandurangan c79d12c645 Finish changes to stop an engine
Added a call to stop all react scripts, by passing all known routing
keys to the react_killer function. Then stop the threadpool executor.
Finally, raise a known exception to stop the watchdog thread. This
will throw an ugly traceback, but will shut down the engine
gracefully.

Also made minor changes to the example react.json, to change the log
format. Knowing the time a log was printed is useful.

Change-Id: Ibed06f79547312d188feb499f937eb5390d60c3e
2014-07-14 16:47:43 -07:00
Pranesh Pandurangan 5b647e5f79 Add logic to stop repair scripts
When watchdog detects that repair script(s) have been killed, get
a list of scripts to nuke and pass to stop_repair_scripts. Then,
get its routing key(s), and send a message from a special user to
any queue listening on those keys.

Modified an example repair script to show how it could be killed,
but we need a more concrete way than that. For now, messages from
'react_killer' will raise the RepairStoppedException, which will
stop react scripts

Modified the example engine cfg to have some details about the
kombu connection to use.

Implements blueprint kill-repair-scripts
Change-Id: I67e15e9b9ebb5d36c5cb0e01995bc95f7a73b3dd
2014-06-20 11:41:23 -07:00
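
The kill path described in the commit above (a message from the special 'react_killer' user raises RepairStoppedException, which stops the react script) could look roughly like the sketch below. It is a stdlib-only illustration, not the project's code: messages are plain dicts rather than kombu messages, and the "sender" and "body" keys, handle_message and consume are hypothetical names chosen for the example.

```python
class RepairStoppedException(Exception):
    """Raised when a repair script is asked to stop."""


KILLER_SENDER = "react_killer"


def handle_message(message):
    # Assumption: the sender is carried in a "sender" key. The real
    # project receives kombu messages, whose layout may differ.
    if message.get("sender") == KILLER_SENDER:
        raise RepairStoppedException("stop requested")
    return "handled %s" % message.get("body")


def consume(messages):
    # Process messages in order until the killer message arrives.
    handled = []
    try:
        for message in messages:
            handled.append(handle_message(message))
    except RepairStoppedException:
        handled.append("stopped")
    return handled


log = consume([
    {"sender": "audit", "body": "vm_count"},
    {"sender": "react_killer"},
    {"sender": "audit", "body": "never processed"},
])
```

Messages queued after the killer message are never handled, which is the behaviour the commit message describes for a stopped react script.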
Pranesh Pandurangan 7a6999c9eb Add an unregister repair option
Added a parser and function to unregister repair scripts. Remove
the repair script from the backend repair cfg, and watchdog will
catch it in the engine.

Change-Id: I7b93ca7e5eb4430b7c9502c8dd84af75b2a9fae3
2014-06-19 01:18:59 -07:00
Jenkins e2f7101b0b Merge "When creating an engine, add an enabled field" 2014-06-17 00:53:32 +00:00
Pranesh Pandurangan f3c51ab67d When creating an engine, add an enabled field
We use this field when disabling an engine. Stay clear of possible
KeyErrors by adding this field when registering an engine.

Change-Id: Iacca4f99be018b5147b6a00f82e7c772ec88a8f3
2014-06-12 23:08:41 -07:00
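
A minimal sketch of why the field is added at registration time, assuming the engine registry is a dict of dicts (the real project keeps it in a cfg file). The function names and the "pid" key layout are hypothetical.

```python
def register_engine(engines, name, pid):
    # Writing "enabled" here means later readers can rely on it.
    engines[name] = {"pid": pid, "enabled": True}


def disable_engine(engines, name):
    engines[name]["enabled"] = False


def is_enabled(engines, name):
    # Would raise KeyError for an entry registered without the field.
    return engines[name]["enabled"]


engines = {}
register_engine(engines, "demo", 1234)
enabled_after_register = is_enabled(engines, "demo")
disable_engine(engines, "demo")
enabled_after_disable = is_enabled(engines, "demo")
```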
Pranesh Pandurangan 473e2febc0 Add an audit unregister option
Allow users to remove audit scripts through a simple CLI call. The
unregister-audit call will remove an audit script from audit.cfg,
thus stopping the execution of all but the currently scheduled audit
scripts.

partially implements blueprint simple api
Change-Id: I1d5328d87b607c2f5cfdaebb7448a11673f38d48
2014-06-11 03:07:58 +00:00
Pranesh Pandurangan b1ca5a4c91 Refactor some parser code
Rename scheduler_parser to start_engine parser to make more sense,
also add the -p (purge) option that an earlier commit needs.

Change-Id: I5058e6f60804166e71fd5fefb2e298d27630b3d8
2014-06-11 03:07:23 +00:00
Pranesh Pandurangan 47e35d1d2c Add a stop engine call
Add a call to stop engine, this function just sets the enabled field
to false, and psutil terminates the process (equivalent to what we
do now). Also add psutil to requirements

Change-Id: Idc3edb2bf1c9ed55d7d77973c59e3d3562e2ad8b
2014-06-10 20:06:05 -07:00
pran1990 81a9042ff1 Add some more checks to engine creation
Add a field called enabled in the engine cfg file that keeps track of
files. This can be used to keep track of stopped engines, and
monitored using watchdog/similar. Also added some more checks at
engine creation time.

Partially Closes-bug:1309406
Change-Id: I1c365c2c438e6ed0a44413e1d09c69d3fab7ab7b
2014-06-10 19:20:55 -07:00
Pranesh Pandurangan 0be7808705 Move register-audit/repair code to backend
Abstract out cfg file operations in the register workflow. Changed
the get_driver code in engine to be static, so we can call it from
main too.

Added a new function in file_backend to return the config file given
the script type. Eg. return audit_cfg for audit.

Added a new function in file_backend to replace check_duplicate, that
returns True if a script is already registered.

Added a couple of string variables in base.py

The function get_cfg_file, when using a db, will actually return a
table. So this belongs in the backend, the code refactor here ensures
this function is not called in the main() code.

Raise errors instead of returning None in some backend functions

Completes blueprint backend-abstraction
Change-Id: I20d6bd46caf56c750e4b1193a6f5d00ce4e930f6
2014-06-09 19:50:56 -07:00
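
The backend functions described in the commit above (look up the config file for a script type, report whether a script is already registered, and raise instead of returning None) might be shaped like this sketch. Only get_cfg_file is named in the commit message. The class layout, is_registered, register and the in-memory sets standing in for file contents are assumptions made for illustration.

```python
class FileBackend:
    """Illustrative file backend; not the project's implementation."""

    def __init__(self, audit_cfg, repair_cfg):
        self._cfg_files = {"audit": audit_cfg, "repair": repair_cfg}
        # In-memory stand-in for the contents of the cfg files.
        self._registered = {"audit": set(), "repair": set()}

    def get_cfg_file(self, script_type):
        # Raise on unknown types rather than returning None.
        try:
            return self._cfg_files[script_type]
        except KeyError:
            raise ValueError("unknown script type: %s" % script_type)

    def is_registered(self, script_type, name):
        self.get_cfg_file(script_type)  # validates script_type
        return name in self._registered[script_type]

    def register(self, script_type, name):
        if self.is_registered(script_type, name):
            raise ValueError("%s is already registered" % name)
        self._registered[script_type].add(name)


backend = FileBackend("audit.cfg", "repair.cfg")
backend.register("audit", "vm_count")
```

Because callers only see these methods, a database backend could return a table name from get_cfg_file without any change to the calling code.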
pran1990 d4fcac48f1 Move cfg file creation to driver
Added code to load the file backend as a driver in engine __init__,
and a function in FileBackend to create cfg files.

Also added extra field in engine.cfg to specify what kind of backend
to use.

Change-Id: I6d3f24d4f676c72c94afff2c4c7f54a35cf1d4b1
2014-06-02 20:31:14 +00:00
pran1990 0f5954359c Move some file creation code from main to utils
As part of engine creation, we create files for audit and repair cfg.
Move this to utils.

Change-Id: I9d5075cb854ab5585fddcb58013ffa2530add970
2014-06-02 13:29:19 -07:00
pran1990 9e68d9cdd1 Add some structure for backend abstraction
Add some backend placeholders.

implements blueprint backend-abstraction
Change-Id: Ia6bd3020d2f666ee317e1cf89ae2f3e10e6977aa
2014-05-30 08:41:29 +00:00
pran1990 1d38e2ca2b Create audit and repair cfg at startup
If those files don't exist, create them at startup. Else adding the first audit
or repair script fails because the cfg file doesn't exist.

Change-Id: I9143f59364167e98f69616351b4ac1df8ebd4ff8
2014-05-19 21:31:11 +00:00
pran1990 38520d41d8 Change format of audit and react cfg files
Try to follow the format
name1:
    arg1:
    arg2:
name2:
    arg1:
    arg2:

Changes in several places to reflect the new format.
Change-Id: I182bbb701ac0e1885078f9ec3789fcff799acf5a
2014-05-18 00:19:33 -07:00
pran1990 ef2b443dac Change format of engine configuration file
Store pid along with other details. Change some code to take care of the new
format, including the utils function that reads yaml.

Henceforth, try to keep yaml files in the project in the format:
name1:
    arg1:
    arg2:
name2:
    arg1:
    arg2:

Also wrote a function to write yaml in utils.py

Change-Id: I838ec927a439ac1aeea88ba0b1d71fc782777204
2014-05-17 22:40:46 -07:00
pran1990 479662ad48 More logging fixes, and queue work
There were some bugs in the commit to create queues dynamically. The
engine now creates queues that are needed, and passes to react scripts.

Also made some fixes to example code, added some config files. They
contain usernames, but should be simple enough to modify and test

Change-Id: Ife1977b3f8d669024fd853b6691300b5dd4fd73f
2014-04-27 14:08:26 -07:00
pran1990 24983c6fc6 Make entropy suitable for pypi distribution, part 2
Correct logging: reset loggers in engine.py, and set up your own.
Added functions to set up logging in engine.py and the Audit class,
which can be used by audit scripts. Added a reset_logger function in utils

Expose __main__.py as a CLI on packaging. So after installing
the entropy package, you can call commands like
entropy register-audit
entropy start-engine
from anywhere in your machine. Made changes to setup.cfg to make
the main function an entry point.

Moved the engine_cfg file to /tmp/engines.cfg. That way, even though
it is hardcoded, it's at least fairly uniform across machines.

Change-Id: I704bf5e4635ffc539d7a73c5f84ef4bf8b2e801e
2014-04-15 23:03:04 -07:00
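
Exposing a main function as a command usually pairs a console-script entry point with an argparse parser that has one subcommand per action. The sketch below shows only the parser side. The subcommand names come from the commit message above, while the -n/--name flag and the return format are hypothetical and not taken from the project.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="entropy")
    subparsers = parser.add_subparsers(dest="command")

    audit = subparsers.add_parser("register-audit")
    audit.add_argument("-n", "--name", required=True)  # hypothetical flag

    start = subparsers.add_parser("start-engine")
    start.add_argument("-n", "--name", required=True)  # hypothetical flag
    return parser


def main(argv=None):
    # argv=None makes argparse read sys.argv, which is what an
    # installed entry point relies on.
    args = build_parser().parse_args(argv)
    return "%s:%s" % (args.command, args.name)


parsed = main(["start-engine", "--name", "demo"])
```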
pran1990 92b73b563b Make entropy suitable for pypi distribution, part 1
Set logging handlers in each file, instead of a global one
Move CLI output to stdout
Remove one hardcoded value

Change-Id: I0d1bfcbd642bdc43547bf177bed53c32eaf956b9
2014-04-14 17:52:26 -07:00
pran1990 e58f59c858 Use a config file to register engines
Remain consistent with other options and just pass a name and
config file to register a new engine.

Changed the cmdline call start-engine a bit, and some changes
to account for the difference in input.

Change-Id: Idc73528dc39d79d530a5a5c901761ef59ab13f33
2014-04-13 16:39:55 -07:00
pran1990 e4fd9aa1f2 Make entropy suitable for pypi distribution, part 0
We need to separate out engine code and audit and repair scripts if
pypi distribution is required. This is handled in this commit.

Further commits will remove the two variables hardcoded currently,
and expose __main__ as a cmdline script on installation.

Use imp for module finding and loading, use full path to script in
audit and repair cfg files.

Move audit and repair scripts to examples. Make some changes to hardcoded
stuff to account for this

Update gitignore
Change-Id: I50831003c6f7272967dbeb5c558b76b0183c91be
2014-04-10 13:08:51 -07:00
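
Loading a script from its full path, as the commit above does with imp, can be sketched as follows. Note the swap: imp was removed in Python 3.12, so this example uses importlib.util instead, which is not what the commit itself used. The helper name load_module_from_path and the throwaway script are made up for the example.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(module_name, path):
    # Build a module spec from the file path, then execute the module.
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Write a throwaway script so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    script_path = os.path.join(tmp, "example_audit.py")
    with open(script_path, "w") as handle:
        handle.write("def main():\n    return 'audit ran'\n")
    audit = load_module_from_path("example_audit", script_path)
    outcome = audit.main()
```

Storing the full path in the cfg files is what makes this work from any working directory.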
pran1990 090fc4dd21 Cleanup logging
Change inappropriate usages of LOG.error and LOG.warning.

Change-Id: I7964aaabc5533e6f438c56bde6817ba6a939b283
2014-03-27 15:08:27 -07:00
pran1990 b65a8a4d12 Add vmbooter audit and react scripts
Added an audit script with functions to flavor-list, boot and
delete vms. Use CLI for now, can switch to novaclient later.

Use paramiko to ssh onto a host with access to the cluster,
then run nova list, delete the vms we created, and boot a new
one.

Added a new queue for this audit/repair pair, dynamically creating
queues or sharing them is TODO.

Added react script, removed the ability to call from commandline.
Can show off the feature without that.

Added templates for conf files

Change-Id: I3fe70534573aa70bd9407e18dcbe11e0e784595c
2014-03-26 19:44:38 -07:00
pran1990 a7a4d5feee Do not start scheduler in constructor
Set up the engine object in the constructor, and start it in a
separate run function

Change-Id: Ic688c3e8059f18e328735b6dd2a55ae86745fc50
2014-03-17 16:53:21 -07:00
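
The split described in the commit above (set up in the constructor, start in a separate run function) is sketched below with a hypothetical, much reduced Engine class. The attributes and return value are illustrative only.

```python
class Engine:
    def __init__(self, name):
        # Only record configuration here; nothing is started yet.
        self.name = name
        self.running = False

    def run(self):
        # Starting work is an explicit, separate step.
        self.running = True
        return "engine %s started" % self.name


engine = Engine("demo")
state_before_run = engine.running
message = engine.run()
```

Keeping construction free of side effects lets callers create and inspect an engine object without starting any scheduling.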
pran1990 aa5e2e9775 Remove unused global
The variable entropy_engine doesn't really have to be a global

Change-Id: I26ea9b356d5de99a4657e55a9a6e503e5fb8833f
2014-03-17 16:52:49 -07:00
pran1990 616e6c69d0 Move code into a scheduler class, part III
Remove usage of globals.py

Rewrite register-audit and register-repair to avoid using globals.
Get cfg file from engine name and script type instead.

Remove cfg files from git

Change-Id: I8ee119b4ebf55fa18ff4f6a83c0859ddc6699c5f
2014-03-16 23:09:24 -07:00
pran1990 3ac2fde405 Move code into a scheduler class, part II
Moved all major functionalities, like adding audit/repair scripts,
the watchdog handler, etc to the engine class. There is still some
cleanup to do, like getting rid of some references to the globals
variables, which will come in part III, in progress.

Realized that we need full names to at least the audit.cfg and
repair.cfg files, removing all the cfg files from git, because
it makes the repo look ugly with usernames in them. Will add in
sample cfg files, even though they shouldn't really be created
manually.

Verified that the code is now in the same state as before we used
a scheduler class, ie no difference in expected behavior

Change-Id: If9eeb9201ac6dd30705c3246c304b304054dc577
2014-03-16 16:59:08 -07:00
pran1990 925f45cf45 Move code into a scheduler class: part 1
Move start-scheduler logic into a class (in engine.py). Call to
start-engine() will create a new engine class, and run a watchdog
thread on the cfg files associated with that engine.

In followup commits, will also start scheduler in the constructor,
and move more code into the class. The existing globals file can
be deleted at that point.

Change-Id: I3a547a538fecaabdb84c927df9439c30119bf74f
2014-03-14 00:44:42 -07:00
pran1990 adaeaf3c05 Clean up code a bit
While trying to migrate stuff to a class I noticed some functions
that could be in utils.

Removed a join statement that shouldn't be there because we use
futures now.

Change-Id: Iab457a4b34ff176a6e39f1a02ba9b5377602e652
2014-03-13 17:34:29 -07:00
pran1990 5c290a4b3b Use a thread pool, not process pool
The right thing to do is to use a thread pool, which is the like-for-like
replacement for what was going on earlier.

Use all_futures to track all currently started scripts; this will be used
to kill threads, etc. later.

Change-Id: I5274a381cb0ff8744cb1efee265c7d1e74895098
2014-03-12 12:33:46 -07:00
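
A minimal sketch of the pattern in the commit above, using concurrent.futures from the standard library. The all_futures name comes from the commit message. run_script, start_scripts, the script names and the worker count of 8 (borrowed from the earlier ProcessPoolExecutor commit) are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def run_script(name):
    # Stand-in for an audit or repair script; a real one would do I/O.
    return "%s finished" % name


def start_scripts(executor, names, all_futures):
    # Submit each script and remember its future so it can be inspected
    # (or cancelled, if it has not started yet) later on.
    for name in names:
        all_futures.append(executor.submit(run_script, name))


all_futures = []
with ThreadPoolExecutor(max_workers=8) as executor:
    start_scripts(executor, ["audit_a", "repair_b"], all_futures)
    wait(all_futures)

results = [future.result() for future in all_futures]
```

Because futures are appended in submission order, results lines up with the list of script names regardless of which thread finishes first.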
pran1990 ebf5c34f9b Fix some bugs
load_yaml needs a string, not a file handle

Change-Id: I8743ac1dfa3730cf53c329700888c014c86a68f7
2014-03-11 19:34:22 -07:00
pran1990 a1360a4953 Use ProcessPoolExecutor instead of threading
Currently we use threading, and do thread.join() to start audit/repair
scripts. Using futures and executors makes the code simpler to read. Note
that it might not necessarily improve performance. I set the max number of
workers arbitrarily to 8, but we can work out a way to set this number
correctly.

Change-Id: I3c9f3194753c79d57204b49c7ec2444fd454bfc7
2014-03-11 17:46:15 -07:00
Debo~ Dutta bf2e38f03b test commit - fix typo
Change-Id: I264869b5bd3edecfdf2346fe8f453700f6622ec5
2014-03-04 00:31:20 -08:00
pran1990 d509cbeb55 Add jobs at runtime
Newly added jobs will change either audit.cfg or repair.cfg,
which are watched by watchdog. Call the right function to add
the job to currently running ones.

Minor typo fixes to cfg files for scripts.

Use utils.load_yaml instead of yaml.load_all. That function
uses safe_load_all, so better security-wise

Change-Id: Ib8137a0a1d9a3b960d9c64f9c6424709d57b8747
2014-02-24 19:33:48 -08:00
pran1990 75fc1bd888 Restructure code a bit
There is some redundant code in start_scheduler() when starting
scripts. Write start_scripts() to address this.

Change-Id: I1f6b9a66c7385bbb066cefdf0cfb4cb176d84805
2014-02-24 19:25:12 -08:00
pran1990 02dceddfa7 Introduce watchdog into entropy
Watch cfg files for changes and, when one changes, call the right callback.
Move audit.cfg and repair.cfg into cfg/ for easier
monitoring using watchdog.

Change-Id: Iace75f36f0bfb5b83fe53c7d63b110f10534808f
2014-02-24 19:15:09 -08:00
pran1990 52bff736c9 Store list of running audits and repairs
Store list of running audits and repairs, will aid in other things
later on, like preventing duplicates, adding scripts at runtime,
etc.

Removed needless import in vm_count
Change-Id: I9ed811783e5bc4a7799e8a5f73a4e55a15fdfee4
2014-02-24 18:48:50 -08:00
pran1990 bffddde2f9 Add an example audit/repair script
vm_count.py gets number of vms running in a cluster,
react will throw an error if it's above a limit.

Not adding vm_count.json to git, similar to audit.json,
contains api and compute hostnames in addition.

Don't use extrapolation for every log message
Add audit conf files to gitignore

Remove stevedore stuff, not using now, can add back later if needed

Use libvirt bindings to talk to hypervisor. Only one hypervisor
for now, changes soon for multiple hypervisors.

Change-Id: I843e3600a62cb6698526b3498358e4b90121ba1a
2014-02-10 18:08:26 -08:00
pran1990 1c9c640da3 Remove some hardcoding
Use a new module field in audit.json to specify which module to
load from the scheduler

Change-Id: I6d9f846be3e379b5179da740697962a01f591d21
2014-02-07 00:49:46 -08:00
pran1990 5fc67635ad Enable stevedore and dynamic loading
Load modules dynamically, allowing better control over audit/react scripts
Change code structure a bit (put audit scripts in audit/ dir, react scripts
in repair/ dir)
Enable stevedore, for audit/react scripts installed with the package
Remove all homedir references

Change-Id: I7351d6b7cd9ca5ba9cfa9526dfbefbfecacc3dc8
2014-02-03 00:01:15 -08:00
pran1990 ee0cc7d4c7 Change code structure a bit
Use register-audit to register audit script, register-repair to
register repair script. start-scheduler to start all the react
scripts and then schedule audit scripts.

Added audit.cfg and repair.cfg files to store registered scripts,
this will help restart after failure.

Added globals.py to store global variables

Removed validate_cfg function that wasn't doing anything

Change-Id: Id9140d2665e5710e6ffe2ed707135ff9a30ccdff
2014-01-03 16:29:22 -08:00
pran1990 a16b654009 Use pause library for sleeping
As described in https://pypi.python.org/pypi/pause ,
the pause library has higher precision than the sleep lib,
and uses machine timestamp instead of counters.

Change-Id: I0f1135757ef8d1ed6e4eb203b84632ef5ec91977
2013-12-28 19:43:57 -08:00
Joshua Harlow 31413508a9 Small adjustments
Move entropy.py -> __main__.py so that it can just
be used by running $ python entropy (inside the
entropy root folder).

Also use yaml loading instead of json loading since
yaml allows for comments inside the file (yaml is
a superset of json).

Removed import json from __main__.py to keep pep8 happy

Also changed react.py to use a per module logger instead
of root logger

Change-Id: I5eb24319dee4f04891878c6e61cc4d7835b14d34
2013-12-16 23:47:17 -08:00