author     Witold Bedyk <witold.bedyk@est.fujitsu.com>   2018-06-05 15:32:09 +0200
committer  Witold Bedyk <witold.bedyk@est.fujitsu.com>   2018-06-19 15:19:00 +0200
commit     20d655774440595d06675fc1ccff34b3f3a4321c (patch)
tree       879165c4475b68781f8d79a6441d2e0f2d4c448f
parent     086428009c90a4048d80a8b8dae86c7f024360c8 (diff)
Convert README to reStructuredText
* Add PyPI validation check for README.rst [1]
* Add docutils to test-requirements.txt
* Add lower bound for jira

[1] https://docs.openstack.org/project-team-guide/project-setup/python.html#running-the-style-checks

Change-Id: I5d90ccb1b919c4bab66b468a8ddb714ffc5f1635
Story: 2001980
Task: 20013
Notes (review):
    Code-Review+1: Denis Poisson <denis.poisson@ts.fujitsu.com>
    Code-Review+2: Dobroslaw Zybort <dobroslaw.zybort@ts.fujitsu.com>
    Workflow+1: Dobroslaw Zybort <dobroslaw.zybort@ts.fujitsu.com>
    Verified+2: Zuul
    Submitted-by: Zuul
    Submitted-at: Thu, 05 Jul 2018 06:50:02 +0000
    Reviewed-on: https://review.openstack.org/572393
    Project: openstack/monasca-notification
    Branch: refs/heads/master
-rw-r--r--   README.md               109
-rw-r--r--   README.rst              139
-rw-r--r--   lower-constraints.txt     1
-rw-r--r--   setup.cfg                 4
-rw-r--r--   test-requirements.txt     1
-rw-r--r--   tox.ini                   1
6 files changed, 144 insertions, 111 deletions
diff --git a/README.md b/README.md
deleted file mode 100644
index d17ce85..0000000
--- a/README.md
+++ /dev/null
@@ -1,109 +0,0 @@
-Team and repository tags
-========================
-
-[![Team and repository tags](https://governance.openstack.org/tc/badges/monasca-notification.svg)](https://governance.openstack.org/tc/reference/tags/index.html)
-
-<!-- Change things from this point on -->
-
-# Notification Engine
-
-This engine reads alarms from Kafka and then notifies the customer using their configured notification method.
-Multiple notification and retry engines can run in parallel up to one per available Kafka partition. Zookeeper
-is used to negotiate access to the Kafka partitions whenever a new process joins or leaves the working set.
-
-# Architecture
-The notification engine generates notifications using the following steps:
-1. Reads Alarms from Kafka, with no auto commit. - KafkaConsumer class
-2. Determine notification type for an alarm. Done by reading from mysql. - AlarmProcessor class
-3. Send Notification. - NotificationProcessor class
-4. Successful notifications are added to a sent notification topic. - NotificationEngine class
-5. Failed notifications are added to a retry topic. - NotificationEngine class
-6. Commit offset to Kafka - KafkaConsumer class
-
-The notification engine uses three Kafka topics:
-1. alarm_topic: Alarms inbound to the notification engine.
-2. notification_topic: Successfully sent notifications.
-3. notification_retry_topic: Unsuccessful notifications.
-
-A retry engine runs in parallel with the notification engine and gives any
-failed notification a configurable number of extra chances at succeess.
-
-The retry engine generates notifications using the following steps:
-1. Reads Notification json data from Kafka, with no auto commit. - KafkaConsumer class
-2. Rebuild the notification that failed. - RetryEngine class
-3. Send Notification. - NotificationProcessor class
-4. Successful notifictions are added to a sent notification topic. - RetryEngine class
-5. Failed notifications that have not hit the retry limit are added back to the retry topic. - RetryEngine class
-6. Failed notifications that have hit the retry limit are discarded. - RetryEngine class
-6. Commit offset to Kafka - KafkaConsumer class
-
-The retry engine uses two Kafka topics:
-1. notification_retry_topic: Notifications that need to be retried.
-2. notification_topic: Successfully sent notifications.
-
-## Fault Tolerance
-When reading from the alarm topic no committing is done. The committing is done only after processing. This allows
-the processing to continue even though some notifications can be slow. In the event of a catastrophic failure some
-notifications could be sent but the alarms not yet acknowledged. This is an acceptable failure mode, better to send a
-notification twice than not at all.
-
-The general process when a major error is encountered is to exit the daemon which should allow the other processes to
-renegotiate access to the Kafka partitions. It is also assumed the notification engine will be run by a process
-supervisor which will restart it in case of a failure. This way any errors which are not easy to recover from are
-automatically handled by the service restarting and the active daemon switching to another instance.
-
-Though this should cover all errors there is risk that an alarm or set of alarms can be processed and notifications
-sent out multiple times. To minimize this risk a number of techniques are used:
-
-- Timeouts are implemented with all notification types.
-- An alarm TTL is utilized. Any alarm older than the TTL is not processed.
-
-# Operation
-Yaml config file by default is in '/etc/monasca/notification.yaml', a sample is in this project.
-
-## Monitoring
-statsd is incorporated into the daemon and will send all stats to statsd server launched by monasca-agent.
-Default host and port points at **localhost:8125**.
-
-- Counters
-  - ConsumedFromKafka
-  - AlarmsFailedParse
-  - AlarmsNoNotification
-  - NotificationsCreated
-  - NotificationsSentSMTP
-  - NotificationsSentWebhook
-  - NotificationsSentPagerduty
-  - NotificationsSentFailed
-  - NotificationsInvalidType
-  - AlarmsFinished
-  - PublishedToKafka
-- Timers
-  - ConfigDBTime
-  - SendNotificationTime
-
-# Future Considerations
-- More extensive load testing is needed
-  - How fast is the mysql db? How much load do we put on it. Initially I think it makes most sense to read notification
-    details for each alarm but eventually I may want to cache that info.
-  - How expensive are commits to Kafka for every message we read? Should we commit every N messages?
-  - How efficient is the default Kafka consumer batch size?
-  - Currently we can get ~200 notifications per second per NotificationEngine instance using webhooks to a local
-    http server. Is that fast enough?
-  - Are we putting too much load on Kafka at ~200 commits per second?
-
-# License
-
-Copyright (c) 2014 Hewlett-Packard Development Company, L.P.
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
-implied.
-See the License for the specific language governing permissions and
-limitations under the License.
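
The consume/process/commit pipeline described in the README above (and kept in the
new README.rst below) can be pictured with a short, illustrative sketch. This is not
code from this repository: it assumes a kafka-python style consumer and producer, and
the build_notification() and send() helpers are hypothetical stand-ins for the
AlarmProcessor and NotificationProcessor classes.

::

    # Illustrative sketch of the notification engine loop, not the real implementation.
    from kafka import KafkaConsumer, KafkaProducer


    def build_notification(alarm):
        # Hypothetical stand-in for AlarmProcessor: determine the notification
        # type for the alarm (the real engine reads this from MySQL).
        return {'alarm': alarm, 'type': 'webhook'}


    def send(notification):
        # Hypothetical stand-in for NotificationProcessor: deliver the
        # notification and raise on failure.
        pass


    consumer = KafkaConsumer('alarm_topic',
                             bootstrap_servers='localhost:9092',
                             group_id='notification-engine',
                             enable_auto_commit=False)   # commit only after processing
    producer = KafkaProducer(bootstrap_servers='localhost:9092')

    for message in consumer:
        alarm = message.value                            # 1. read alarm, no auto commit
        notification = build_notification(alarm)         # 2. look up notification type
        if notification is not None:
            try:
                send(notification)                       # 3. send the notification
                producer.send('notification_topic', message.value)        # 4. sent topic
            except Exception:
                producer.send('notification_retry_topic', message.value)  # 5. retry topic
        consumer.commit()                                # 6. commit offset last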
diff --git a/README.rst b/README.rst
new file mode 100644
index 0000000..a3aeec3
--- /dev/null
+++ b/README.rst
@@ -0,0 +1,139 @@
+Team and repository tags
+========================
+
+|Team and repository tags|
+
+.. raw:: html
+
+   <!-- Change things from this point on -->
+
+Notification Engine
+===================
+
+This engine reads alarms from Kafka and then notifies the customer using
+the configured notification method. Multiple notification and retry
+engines can run in parallel, up to one per available Kafka partition.
+Zookeeper is used to negotiate access to the Kafka partitions whenever a
+new process joins or leaves the working set.
+
+Architecture
+============
+
+The notification engine generates notifications using the following
+steps:
+
+1. Read Alarms from Kafka, with no auto commit. -
+   monasca\_common.kafka.KafkaConsumer class
+2. Determine notification type for an alarm. Done by reading from mysql. - AlarmProcessor class
+3. Send notification. - NotificationProcessor class
+4. Add successful notifications to a sent notification topic. - NotificationEngine class
+5. Add failed notifications to a retry topic. - NotificationEngine class
+6. Commit offset to Kafka - KafkaConsumer class
+
+The notification engine uses three Kafka topics:
+
+1. alarm\_topic: Alarms inbound to the notification engine.
+2. notification\_topic: Successfully sent notifications.
+3. notification\_retry\_topic: Failed notifications.
+
+A retry engine runs in parallel with the notification engine and gives
+any failed notification a configurable number of extra chances at
+success.
+
+The retry engine generates notifications using the following steps:
+
+1. Read notification json data from Kafka, with no auto commit. - KafkaConsumer class
+2. Rebuild the notification that failed. - RetryEngine class
+3. Send notification. - NotificationProcessor class
+4. Add successful notifications to a sent notification topic. - RetryEngine class
+5. Add failed notifications that have not hit the retry limit back to the retry topic. -
+   RetryEngine class
+6. Discard failed notifications that have hit the retry limit. - RetryEngine class
+7. Commit offset to Kafka. - KafkaConsumer class
+
+The retry engine uses two Kafka topics:
+
+1. notification\_retry\_topic: Notifications that need to be retried.
+2. notification\_topic: Successfully sent notifications.
+
+Fault Tolerance
+---------------
+
+When reading from the alarm topic, no committing is done. The committing
+is done only after processing. This allows the processing to continue
+even though some notifications can be slow. In the event of a
+catastrophic failure some notifications could be sent but the alarms
+have not yet been acknowledged. This is an acceptable failure mode,
+better to send a notification twice than not at all.
+
+The general process when a major error is encountered is to exit the
+daemon which should allow the other processes to renegotiate access to
+the Kafka partitions. It is also assumed that the notification engine
+will be run by a process supervisor which will restart it in case of a
+failure. In this way, any errors which are not easy to recover from are
+automatically handled by the service restarting and the active daemon
+switching to another instance.
+
+Though this should cover all errors, there is the risk that an alarm or
+a set of alarms can be processed and notifications are sent out multiple
+times. To minimize this risk a number of techniques are used:
+
+- Timeouts are implemented for all notification types.
+- An alarm TTL is utilized. Any alarm older than the TTL is not
+  processed.
+
+Operation
+=========
+
+``oslo.config`` is used for handling configuration options. A sample
+configuration file ``etc/monasca/notification.conf.sample`` can be
+generated by running:
+
+::
+
+   tox -e genconfig
+
+Monitoring
+----------
+
+StatsD is incorporated into the daemon and will send all stats to the
+StatsD server launched by monasca-agent. Default host and port points to
+**localhost:8125**.
+
+- Counters
+
+  - ConsumedFromKafka
+  - AlarmsFailedParse
+  - AlarmsNoNotification
+  - NotificationsCreated
+  - NotificationsSentSMTP
+  - NotificationsSentWebhook
+  - NotificationsSentPagerduty
+  - NotificationsSentFailed
+  - NotificationsInvalidType
+  - AlarmsFinished
+  - PublishedToKafka
+
+- Timers
+
+  - ConfigDBTime
+  - SendNotificationTime
+
+Future Considerations
+=====================
+
+- More extensive load testing is needed:
+
+  - How fast is the mysql db? How much load do we put on it. Initially I
+    think it makes most sense to read notification details for each alarm
+    but eventually I may want to cache that info.
+  - How expensive are commits to Kafka for every message we read? Should
+    we commit every N messages?
+  - How efficient is the default Kafka consumer batch size?
+  - Currently we can get ~200 notifications per second per
+    NotificationEngine instance using webhooks to a local http server. Is
+    that fast enough?
+  - Are we putting too much load on Kafka at ~200 commits per second?
+
+.. |Team and repository tags| image:: https://governance.openstack.org/tc/badges/monasca-notification.svg
+   :target: https://governance.openstack.org/tc/reference/tags/index.html
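
The counters and timers listed in the Monitoring section above can be illustrated with
a small sketch. The daemon itself reports to the StatsD endpoint launched by
monasca-agent; the snippet below only shows the general pattern using the generic
``statsd`` client and is not code from this repository.

::

    # Illustration of the monitoring hooks; metric names are taken from the README.
    import statsd

    client = statsd.StatsClient('localhost', 8125)   # default monasca-agent StatsD endpoint

    client.incr('ConsumedFromKafka')                 # one alarm read from Kafka

    with client.timer('ConfigDBTime'):               # time the notification-config lookup
        pass  # read notification configuration from the database here

    with client.timer('SendNotificationTime'):       # time the actual delivery
        pass  # send the notification here

    client.incr('NotificationsSentWebhook')          # bump the per-method counter
    client.incr('PublishedToKafka')                  # result published back to Kafka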
diff --git a/lower-constraints.txt b/lower-constraints.txt
index 2065e59..6509c4b 100644
--- a/lower-constraints.txt
+++ b/lower-constraints.txt
@@ -4,6 +4,7 @@ bandit==1.4.0
 configparser==3.5.0
 coverage==4.0
 debtcollector==1.2.0
+docutils==0.11
 extras==1.0.0
 fixtures==3.0.0
 flake8==2.5.5
diff --git a/setup.cfg b/setup.cfg
index f1fcccd..d850df6 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -8,7 +8,7 @@ classifier=
     License :: OSI Approved :: Apache Software License
     Topic :: System :: Monitoring
 keywords = openstack monitoring email
-description-file = README.md
+description-file = README.rst
 home-page = https://github.com/stackforge/monasca-notification
 license = Apache
 
@@ -35,5 +35,5 @@ universal = 1
 
 [extras]
 jira_plugin =
-  jira
+  jira>=1.0.3
   Jinja2>=2.10 # BSD License (3 clause)
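
The ``jira_plugin`` extra above now carries a lower bound for ``jira``. Its
dependencies are pulled in through the usual pip extras syntax, for example from a
checkout (illustrative):

::

   pip install .[jira_plugin]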
diff --git a/test-requirements.txt b/test-requirements.txt
index 73c420e..c0cf4e9 100644
--- a/test-requirements.txt
+++ b/test-requirements.txt
@@ -15,3 +15,4 @@ testrepository>=0.0.18 # Apache-2.0/BSD
 SQLAlchemy!=1.1.5,!=1.1.6,!=1.1.7,!=1.1.8,>=1.0.10 # MIT
 PyMySQL>=0.7.6 # MIT License
 psycopg2>=2.6.2 # LGPL/ZPL
+docutils>=0.11 # OSI-Approved Open Source, Public Domain
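
docutils, added above as a test requirement, is what actually parses the README during
the new validation step. A rough standalone equivalent of that check could look like
the following sketch (illustrative only, not how the gate invokes it):

::

    # Illustrative strict reStructuredText check; the change itself relies on
    # `python setup.py check --restructuredtext --strict`, which performs the
    # equivalent parse through docutils.
    from docutils.core import publish_doctree
    from docutils.utils import SystemMessage

    with open('README.rst') as readme:
        source = readme.read()

    try:
        # halt_level=2 promotes reST warnings to hard failures.
        publish_doctree(source, settings_overrides={'halt_level': 2,
                                                    'report_level': 2})
        print('README.rst parses cleanly')
    except SystemMessage as error:
        print('README.rst would be rejected: %s' % error)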
diff --git a/tox.ini b/tox.ini
index b763daf..65716aa 100644
--- a/tox.ini
+++ b/tox.ini
@@ -43,6 +43,7 @@ basepython = python3
 commands =
   {[testenv:flake8]commands}
   {[testenv:bandit]commands}
+  python setup.py check --restructuredtext --strict
 
 [testenv:venv]
 basepython = python3
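
With the line added above, the strict README check runs alongside the existing flake8
and bandit commands in that tox environment; it can also be invoked on its own from a
checkout once docutils is installed:

::

   python setup.py check --restructuredtext --strict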