Support timeout for stats capture cron job

In this charm we run a cron job to check rabbitmq status, and the
commands it runs can fail or hang if, e.g., rabbit is not healthy.
Currently the cron job never times out and could hang forever, so we
add a new config option, 'cron-timeout' (default 300 seconds). When
the timeout is reached a SIGINT is sent to the stats capture script,
and if it fails to exit within a further 10s a SIGKILL is sent.
We also fix logging so that all output goes to syslog local0.notice.

Change-Id: I0bb8780c5cc64a24384648f00c8068d5d666d28c
Closes-Bug: 1716854
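The escalation described above (SIGINT at the time limit, then SIGKILL 10 seconds later if the program is still running) is what `timeout -k 10s -s SIGINT` does in the new cron line. Purely as an illustration, the following Python sketch approximates the same sequence with `subprocess`. `run_with_timeout` is a hypothetical helper, not part of the charm, and it signals only the direct child, so it may differ from GNU `timeout` in how grandchildren are handled.

```python
import signal
import subprocess


def run_with_timeout(cmd, timeout, kill_after=10.0):
    """Hypothetical helper: run `cmd`, send SIGINT after `timeout` seconds,
    then SIGKILL if it is still alive `kill_after` seconds after the SIGINT.

    Returns the child's return code (negative if it died from a signal).
    """
    proc = subprocess.Popen(cmd)
    try:
        # Normal case: the command finishes within the time limit.
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        pass
    # Time limit reached: ask the process to stop, like `timeout -s SIGINT`.
    proc.send_signal(signal.SIGINT)
    try:
        return proc.wait(timeout=kill_after)
    except subprocess.TimeoutExpired:
        # Still running after the grace period: force it, like `-k 10s`.
        proc.kill()
        return proc.wait()
```

On POSIX systems a negative return code such as `-signal.SIGINT` or `-signal.SIGKILL` tells you which signal finally ended the child.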
Zhang Hua 2017-09-26 14:45:24 +08:00
parent a68b912cf5
commit ff4da882a2
2 changed files with 13 additions and 1 deletion

@@ -83,6 +83,14 @@ options:
     description: |
       Cron schedule used to generate rabbitmq stats. To disable,
       either unset this config option or set it to an empty string ('').
+  cron-timeout:
+    type: int
+    default: 300
+    description: |
+      Time limit, in seconds, for the rabbitmq stats capture command run
+      from cron. Once the timeout is reached, a SIGINT is sent to the
+      program; if it has not exited 10 seconds after that, a SIGKILL is
+      sent.
   queue_thresholds:
     type: string
     default: "[['\\*', '\\*', 100, 200]]"
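A consequence of the option description: with the default `cron-timeout` of 300, a stuck run receives SIGINT at 300 s and, at the latest, SIGKILL at about 310 s after it starts. If the stats cron schedule fires more often than that, a stuck run could still be alive when the next one starts. The helpers below are hypothetical and only sketch that arithmetic; the 300 s and 600 s intervals in the checks are example values, not the charm's defaults.

```python
# Matches the `-k 10s` grace period used in the cron line.
KILL_AFTER_SECONDS = 10


def worst_case_runtime(cron_timeout, kill_after=KILL_AFTER_SECONDS):
    """Approximate upper bound, in seconds, on one stats run: SIGINT at
    `cron_timeout`, SIGKILL `kill_after` seconds later (hypothetical helper)."""
    return cron_timeout + kill_after


def may_overlap(interval_seconds, cron_timeout):
    """True if a run killed only at the last moment could still be running
    when cron starts the next one (hypothetical helper)."""
    return worst_case_runtime(cron_timeout) > interval_seconds
```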

@@ -109,6 +109,8 @@ STATS_CRONFILE = '/etc/cron.d/rabbitmq-stats'
 STATS_DATAFILE = os.path.join(RABBIT_DIR, 'data',
                               '{}_queue_stats.dat'
                               ''.format(rabbit.get_unit_hostname()))
+CRONJOB_CMD = ("{schedule} root timeout -k 10s -s SIGINT {timeout} "
+               "{command} 2>&1 | logger -p local0.notice\n")
 INITIAL_CLIENT_UPDATE_KEY = 'initial_client_update_done'
@@ -590,7 +592,9 @@ def update_nrpe_checks():
           os.path.join(NAGIOS_PLUGINS, 'check_rabbitmq_queues.py'))
     if config('stats_cron_schedule'):
         script = os.path.join(SCRIPTS_DIR, 'collect_rabbitmq_stats.sh')
-        cronjob = "{} root {}\n".format(config('stats_cron_schedule'), script)
+        cronjob = CRONJOB_CMD.format(schedule=config('stats_cron_schedule'),
+                                     timeout=config('cron-timeout'),
+                                     command=script)
         rsync(os.path.join(charm_dir(), 'scripts',
                            'collect_rabbitmq_stats.sh'), script)
         write_file(STATS_CRONFILE, cronjob)
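The commit message says the time limit applies "when set", but `CRONJOB_CMD.format(...)` would render the literal string `None` if `config('cron-timeout')` ever returned no value, which `timeout` would reject. A defensive variant could look like the sketch below. `build_cronjob` and `CRONJOB_NO_TIMEOUT_CMD` are hypothetical names, not part of the charm; `CRONJOB_CMD` is copied from the diff. A timeout of 0 is passed through, since GNU `timeout` treats a duration of 0 as disabling the limit.

```python
# Copied from the diff above.
CRONJOB_CMD = ("{schedule} root timeout -k 10s -s SIGINT {timeout} "
               "{command} 2>&1 | logger -p local0.notice\n")
# Hypothetical fallback: no time limit, but output still goes to syslog.
CRONJOB_NO_TIMEOUT_CMD = ("{schedule} root {command} 2>&1 | "
                          "logger -p local0.notice\n")


def build_cronjob(schedule, timeout, command):
    """Return an /etc/cron.d line for `command`, or None when the schedule
    is empty (stats capture disabled). Hypothetical helper."""
    if not schedule:
        return None
    if timeout is None:
        # Option unset: run without timeout(1) instead of rendering "None".
        return CRONJOB_NO_TIMEOUT_CMD.format(schedule=schedule,
                                             command=command)
    timeout = int(timeout)
    if timeout < 0:
        raise ValueError("cron-timeout must be >= 0, got {}".format(timeout))
    return CRONJOB_CMD.format(schedule=schedule, timeout=timeout,
                              command=command)
```

For example, `build_cronjob("*/5 * * * *", 300, "/path/to/collect_rabbitmq_stats.sh")` (placeholder schedule and path) returns `*/5 * * * * root timeout -k 10s -s SIGINT 300 /path/to/collect_rabbitmq_stats.sh 2>&1 | logger -p local0.notice` followed by a newline. Keeping the trailing newline matters, because cron may treat a final crontab entry without one as broken.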