Notification Engine

This engine reads alarms from Kafka and then notifies the customer using their configured notification method.

Architecture

There are four processing steps, separated by queues implemented with Python multiprocessing. The steps are:

Read alarms from Kafka. - KafkaConsumer class
Determine the notification type for each alarm by reading from MySQL. - AlarmProcessor class
Send the notification. - NotificationProcessor class
Update Vertica and Kafka to record that the notifications were sent. - SentNotificationProcessor class
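The AlarmProcessor step maps an alarm to the notification methods configured for it. A minimal sketch of that lookup, assuming a hypothetical schema (an `alarm_action` table linking alarms to rows in a `notification_method` table) and using an in-memory SQLite database as a stand-in for MySQL so it runs without a server. With a MySQL DB-API driver the query would go through a cursor and typically use `%s` placeholders instead of `?`.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Notification:
    alarm_id: str
    notification_type: str  # e.g. "email"; selects which notifier class sends it
    address: str


def notifications_for_alarm(conn, alarm_id):
    """Return one Notification per method configured for alarm_id (hypothetical schema)."""
    rows = conn.execute(
        "SELECT nm.type, nm.address "
        "FROM alarm_action aa JOIN notification_method nm ON nm.id = aa.action_id "
        "WHERE aa.alarm_id = ? ORDER BY nm.id",
        (alarm_id,),
    ).fetchall()
    return [Notification(alarm_id, ntype, address) for ntype, address in rows]


def demo_connection():
    """In-memory SQLite database standing in for MySQL, filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE notification_method (id INTEGER PRIMARY KEY, type TEXT, address TEXT);
        CREATE TABLE alarm_action (alarm_id TEXT, action_id INTEGER);
        INSERT INTO notification_method VALUES (1, 'email', 'ops@example.com');
        INSERT INTO notification_method VALUES (2, 'webhook', 'http://example.com/hook');
        INSERT INTO alarm_action VALUES ('alarm-1', 1), ('alarm-1', 2);
        """
    )
    return conn


if __name__ == "__main__":
    for n in notifications_for_alarm(demo_connection(), "alarm-1"):
        print(n)
```

An alarm with several configured methods yields several Notification objects, each of which flows independently through the rest of the pipeline.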

There are three internal queues:

alarms - alarms read from Kafka are added to this queue. Consists of Alarm objects.
notifications - notifications to be sent are added to this queue. Consists of Notification objects.
sent_notifications - notifications that have been sent are added here. Consists of Notification objects.
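Together, the four steps and three queues form a linear pipeline. The following is a minimal sketch of that shape, assuming simple stage functions (hypothetical names, not the engine's classes) that pass items downstream and shut down when they receive a `None` sentinel. It uses `threading` and `queue.Queue` so it runs anywhere; the engine itself uses multiprocessing, where `multiprocessing.Process` and `multiprocessing.Queue` support the same `put`/`get` pattern, although the callables passed in must then be picklable on platforms that spawn rather than fork.

```python
import queue
import threading

STOP = None  # sentinel passed down the pipeline to shut each stage down


def kafka_consumer(raw_messages, alarms):
    # Stand-in for KafkaConsumer: put each alarm on the alarms queue.
    for msg in raw_messages:
        alarms.put(msg)
    alarms.put(STOP)


def alarm_processor(alarms, notifications, lookup):
    # Stand-in for AlarmProcessor: one notification per configured method.
    while (alarm := alarms.get()) is not STOP:
        for method in lookup(alarm):
            notifications.put((alarm, method))
    notifications.put(STOP)


def notification_processor(notifications, sent, send):
    # Stand-in for NotificationProcessor: send, then hand off as sent.
    while (item := notifications.get()) is not STOP:
        send(*item)
        sent.put(item)
    sent.put(STOP)


def sent_notification_processor(sent, record):
    # Stand-in for SentNotificationProcessor: record each sent notification.
    while (item := sent.get()) is not STOP:
        record(item)


def run_pipeline(raw_messages, lookup, send, record):
    alarms, notifications, sent = queue.Queue(), queue.Queue(), queue.Queue()
    stages = [
        threading.Thread(target=kafka_consumer, args=(raw_messages, alarms)),
        threading.Thread(target=alarm_processor, args=(alarms, notifications, lookup)),
        threading.Thread(target=notification_processor, args=(notifications, sent, send)),
        threading.Thread(target=sent_notification_processor, args=(sent, record)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
```

Because each stage has exactly one worker and the queues are FIFO, alarms leave the pipeline in the order they arrived; running several workers on one queue would trade that ordering for throughput.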

Notification classes inherit from the abstract notification class and implement their specific notification method.
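A minimal sketch of that pattern with Python's `abc` module. `AbstractNotifier`, `PrintNotifier`, `build_registry`, and `dispatch` are hypothetical names, not the engine's actual classes; the assumption is that each subclass declares the notification type it handles, so the type read from MySQL can select the implementation.

```python
from abc import ABC, abstractmethod


class AbstractNotifier(ABC):
    """Base class every notification method implements (hypothetical name)."""

    type = None  # notification type string this class handles, e.g. "email"

    @abstractmethod
    def send(self, address: str, message: str) -> bool:
        """Deliver message to address; return True on success."""


class PrintNotifier(AbstractNotifier):
    """Toy notifier that records messages instead of delivering them."""

    type = "print"

    def __init__(self):
        self.sent = []

    def send(self, address, message):
        self.sent.append((address, message))
        return True


def build_registry(*notifiers):
    # Map each notification type to the instance that handles it.
    return {n.type: n for n in notifiers}


def dispatch(registry, notification_type, address, message):
    notifier = registry.get(notification_type)
    if notifier is None:
        raise ValueError(f"unsupported notification type: {notification_type}")
    return notifier.send(address, message)
```

Adding a new method (SMS, a chat webhook) then means writing one subclass and registering it; the pipeline stages do not change. Python refuses to instantiate `AbstractNotifier` itself, or any subclass that forgets to implement `send`.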

High Availability

HA is handled by using multiple partitions within Kafka. When multiple notification engines are running, the partitions are spread among them; as engines die or restart, the partitions are reassigned among the engines still running.
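Kafka's consumer-group coordinator performs the actual assignment. The function below is only a simplified model of it, similar in spirit to Kafka's round-robin assignor, to show what "reshuffling" means; `assign_partitions` and the engine names are hypothetical.

```python
def assign_partitions(partitions, engines):
    """Spread partitions round-robin over the live engines (simplified rebalance model)."""
    live = sorted(engines)
    assignment = {engine: [] for engine in live}
    if not live:
        return assignment  # no engines running: nothing is consumed
    for i, partition in enumerate(sorted(partitions)):
        assignment[live[i % len(live)]].append(partition)
    return assignment


if __name__ == "__main__":
    print(assign_partitions(range(6), ["engine-a", "engine-b", "engine-c"]))
    # engine-b dies: its partitions are redistributed across the survivors
    print(assign_partitions(range(6), ["engine-a", "engine-c"]))
```

One consequence holds for the real system too: an engine beyond the partition count receives no partitions, so the number of partitions caps how many engines can share the work.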

The final step, recording in Vertica that an alarm notification was sent and then updating the Kafka offset, may not run if the engine fails catastrophically. This would result in duplicate notifications, which is an acceptable failure mode: it is better to send a notification twice than not at all.
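Sending first and committing the Kafka offset afterwards gives at-least-once delivery. A minimal simulation, assuming a hypothetical in-memory `FakeLog` in place of a Kafka partition and treating the Vertica write plus offset commit as a single step: a crash after a send but before the commit makes the restarted engine resend that alarm, while no alarm is skipped.

```python
class FakeLog:
    """Hypothetical stand-in for one Kafka partition and its committed offset."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.committed = 0  # offset a restarted engine resumes from


def run_once(log, sent, crash_after_sending=None):
    """Process from the committed offset; optionally crash right after sending one offset.

    Returns True if the run finished, False if it crashed.
    """
    for offset in range(log.committed, len(log.messages)):
        sent.append(log.messages[offset])  # notification sent
        if offset == crash_after_sending:
            return False  # crash: Vertica write and offset commit never happen
        log.committed = offset + 1  # record in Vertica, then commit the offset
    return True


if __name__ == "__main__":
    log, sent = FakeLog(["alarm-1", "alarm-2", "alarm-3"]), []
    run_once(log, sent, crash_after_sending=1)  # crashes after sending alarm-2
    run_once(log, sent)  # restart resumes from the last committed offset
    print(sent)  # ['alarm-1', 'alarm-2', 'alarm-2', 'alarm-3']
```

Committing before sending would flip the trade-off to at-most-once: a crash in the same window would silently drop alarm-2 instead of repeating it.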

It is assumed the notification engine will be run by a process supervisor which will restart it in case of a failure.
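In practice the supervisor is upstart, but the behaviour relied on here amounts to a restart loop. A minimal sketch, assuming a hypothetical `supervise` helper that reruns a command until it exits cleanly or a restart budget is used up; real supervisors add features such as limits on how often a crashing process is respawned.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, backoff_seconds=0.0):
    """Run cmd, restarting it whenever it exits non-zero. Returns the number of runs."""
    runs = 0
    while True:
        runs += 1
        returncode = subprocess.run(cmd).returncode
        if returncode == 0 or runs > max_restarts:
            return runs
        time.sleep(backoff_seconds)  # brief pause before restarting


if __name__ == "__main__":
    # A child that always fails is started once plus two restarts.
    print(supervise([sys.executable, "-c", "raise SystemExit(1)"], max_restarts=2))
```

Combined with the at-least-once ordering above, a restart only risks duplicate notifications, never lost ones.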

Operation

The YAML config file is read from '/etc/mon/notification.yaml' by default. The process runs via an upstart script.
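A minimal sketch of resolving and loading the config file, assuming the default path can be overridden by the first command-line argument (a hypothetical convention) and that PyYAML is installed. `yaml.safe_load` is PyYAML's loader that builds only plain Python objects, which is the safe choice for a config file.

```python
import sys

DEFAULT_CONFIG = "/etc/mon/notification.yaml"


def config_path(argv):
    """Return the config path: the first CLI argument if given (assumed), else the default."""
    return argv[1] if len(argv) > 1 else DEFAULT_CONFIG


def load_config(path):
    import yaml  # PyYAML, a third-party package (pip install pyyaml)

    with open(path) as f:
        return yaml.safe_load(f)


if __name__ == "__main__":
    print(config_path(sys.argv))
```

Importing PyYAML inside `load_config` keeps the path logic usable even where the package is missing.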

Future Considerations

How fast is the MySQL database, and how much load do we put on it? Initially I think it makes the most sense to read the notification details for each alarm, but eventually I may want to cache that information.
I am starting with a single KafkaConsumer and a single SentNotificationProcessor; depending on load, this may need to scale.
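If caching the notification details becomes necessary, one option is a small time-to-live cache in front of the MySQL lookup. A minimal sketch, assuming a hypothetical `TTLCache` wrapper around a fetch function that queries MySQL; the injectable `clock` exists only to make expiry easy to test.

```python
import time


class TTLCache:
    """Cache fetch(key) results for ttl_seconds (hypothetical helper)."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, time it was fetched)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]  # fresh entry: skip the database
        value = self._fetch(key)  # missing or expired: query again
        self._entries[key] = (value, now)
        return value
```

The trade-off is staleness: a customer's changed notification method may go unnoticed for up to `ttl_seconds`. With multiprocessing, each process holding a cache keeps its own copy, so the database still sees one query per key per process per TTL window.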