Upgrade Apache Kafka client
Include the URL of your story:
https://storyboard.openstack.org/#!/story/2003705
Currently, all Python Monasca components use a copy of the kafka-python library in version 0.9.5 (released on Feb 16, 2016) [1]. This specification describes the process of upgrading the Apache Kafka client to confluent-kafka-python [2]. The upgrade will improve performance and reliability. Staying on the old, frozen client version is also unacceptable from a security standpoint.
Problem description
The use of KeyedProducer and SimpleConsumer in the kafka-python library has been deprecated as of version 1.0.0 [3]. Further use of this code poses a security risk. Additionally, profiling of monasca-persister has shown that most of the time is spent consuming Kafka messages [4]. Upgrading the Kafka client therefore has considerable potential to improve overall Monasca performance.
Proposed change
The wiki page hosted by the Apache Software Foundation lists the available Python clients [5]. There are currently three actively maintained and supported clients: confluent-kafka-python, kafka-python and pykafka. Several benchmarks [6], [7] have shown that the client maintained by Confluent is both the fastest and the most complete.
Using an asynchronous producer yields a significant performance improvement (~50x). Sending messages asynchronously requires more care to avoid duplicating the persisted data, but the performance gain justifies it.
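The asynchronous mode can be illustrated with a minimal sketch. It assumes the `produce`, `poll` and `flush` methods of the confluent-kafka-python `Producer` and its `callback(err, msg)` delivery report convention. The names `DeliveryTracker`, `publish_async` and `build_producer`, the JSON encoding and the broker address are hypothetical and only serve the illustration; this is not the planned Monasca implementation.

```python
import json


class DeliveryTracker:
    """Hypothetical helper that records the outcome of each async delivery."""

    def __init__(self):
        self.delivered = 0
        self.failed = []

    def __call__(self, err, msg):
        # The client calls delivery callbacks as callback(err, msg);
        # err is None when the broker acknowledged the message.
        if err is None:
            self.delivered += 1
        else:
            self.failed.append((msg.key(), msg.value()))


def publish_async(producer, topic, messages, tracker):
    """Queue (key, payload) pairs without blocking, then wait for the reports."""
    for key, payload in messages:
        value = json.dumps(payload).encode("utf-8")
        try:
            producer.produce(topic, key=key, value=value, on_delivery=tracker)
        except BufferError:
            # Local queue is full: serve callbacks to free space, retry once.
            producer.poll(1.0)
            producer.produce(topic, key=key, value=value, on_delivery=tracker)
        # Serve delivery callbacks for messages that are already acknowledged.
        producer.poll(0)
    # Block until every queued message was delivered or reported as failed.
    producer.flush()
    return tracker


def build_producer(bootstrap_servers="localhost:9092"):
    """Create a real producer; the broker address is an assumed default."""
    from confluent_kafka import Producer  # needs librdkafka at runtime
    return Producer({"bootstrap.servers": bootstrap_servers})
```

After `publish_async` returns, `tracker.failed` holds the messages that need further handling, which is the point where duplicate avoidance has to be decided.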
confluent-kafka-python is also the only client that supports Apache Avro serialization, which reduces the size of messages and thus further speeds up communication.
The proposed change includes using:
- confluent-kafka-python library
- in asynchronous mode
Code changes will affect the following components:
- monasca-common
- monasca-{log,events}-api
- monasca-persister
- monasca-notification
- monasca-transform
Java components (monasca-thresh and monasca-persister) are out of scope of this specification. Upgrading the client in these components should be handled separately.
This client has an external dependency on librdkafka, a finely tuned C client.
Alternatives
- pykafka
- new version of kafka-python
- use synchronous mode
Data model impact
No data model impact.
REST API impact
No REST API impact.
Security impact
This change will improve security by removing the deprecated and unmaintained code.
Other end user impact
No end user impact.
Performance Impact
This change should dramatically improve the performance of the complete solution. In particular, the performance of monasca-persister and monasca-api is expected to improve.
Other deployer impact
New libraries should be packaged and deployed:
- confluent-kafka-python
- librdkafka
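A deployer could confirm that both packages are present with a small check like the one below. It is a sketch that relies on the `version()` and `libversion()` functions of the `confluent_kafka` module, each returning a (string, integer) tuple; the function name `kafka_client_versions` is hypothetical.

```python
def kafka_client_versions():
    """Return the installed client and librdkafka versions, or None."""
    try:
        import confluent_kafka
    except ImportError:
        # Either the Python package or the librdkafka shared library is missing.
        return None
    return {
        "confluent-kafka-python": confluent_kafka.version()[0],
        "librdkafka": confluent_kafka.libversion()[0],
    }


print(kafka_client_versions())
```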
Developer impact
confluent-kafka-python has to be used instead of kafka-python in all affected components.
Implementation
Assignee(s)
- Primary assignee: witek
- Other contributors: <>
Work Items
- remove code using pykafka
- remove pykafka from requirements and lower-constraints
- add confluent-kafka-python to global-requirements
- implement common routines in monasca-common
- use new code in:
  - monasca-{log,events}-api
  - monasca-persister
  - monasca-notification
  - monasca-transform
- delete old deprecated code
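As an illustration of what a common consumer routine in monasca-common might look like, the sketch below polls a batch, hands it to a handler and commits offsets only after the handler returns. It assumes the `poll` and `commit` methods of the confluent-kafka-python `Consumer` and the `value()` and `error()` methods of its messages. The function names, the batch size and the policy of skipping error events are assumptions and not part of this specification.

```python
def consume_batch(consumer, max_messages=100, timeout=1.0):
    """Collect up to max_messages payloads; stop early when the topic is idle."""
    batch = []
    while len(batch) < max_messages:
        msg = consumer.poll(timeout)
        if msg is None:
            break  # nothing arrived within the timeout
        if msg.error():
            # Assumption: error events are skipped; real code would log them.
            continue
        batch.append(msg.value())
    return batch


def process_batch(consumer, handler, max_messages=100, timeout=1.0):
    """Pass one batch to handler and commit only after it succeeded."""
    batch = consume_batch(consumer, max_messages, timeout)
    if batch:
        handler(batch)
        # Synchronous commit of the consumed positions. If handler raises,
        # nothing is committed and the batch is read again after a restart.
        consumer.commit(asynchronous=False)
    return len(batch)


def build_consumer(topic, group_id, bootstrap_servers="localhost:9092"):
    """Create a real consumer; the broker address is an assumed default."""
    from confluent_kafka import Consumer  # needs librdkafka at runtime
    consumer = Consumer({
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        "enable.auto.commit": False,  # offsets are committed by process_batch
    })
    consumer.subscribe([topic])
    return consumer
```

Committing after the handler gives at-least-once processing, so the handler (for example the database write in monasca-persister) still has to tolerate a batch being delivered twice.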
Dependencies
New packages have to be built for:
- confluent-kafka-python
- librdkafka
Testing
We should test the implementation using the existing integration tests (tempest). Additionally, we should test the scenario in which the producer fails to receive a response from Kafka for some of the messages in a bulk. The implementation must not create duplicate entries in the database in that case.
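The partial failure scenario could be exercised in a unit test with a test double along these lines. `FlakyProducer` and `send_with_retry` are hypothetical names; the double only mimics the `produce`/`flush` calls and the `callback(err, msg)` convention of the real client. The sketch covers the case where the broker did not store the failed messages. A lost response for a message that was in fact written cannot be detected this way and would need deduplication on the consuming side.

```python
class _Msg:
    """Minimal stand-in for a Kafka message passed to delivery callbacks."""

    def __init__(self, value):
        self._value = value

    def value(self):
        return self._value


class FlakyProducer:
    """Hypothetical test double: rejects chosen payloads on the first try."""

    def __init__(self, fail_once):
        self.fail_once = set(fail_once)
        self.log = []          # payloads the simulated broker accepted
        self._pending = []

    def produce(self, topic, value=None, key=None, on_delivery=None):
        self._pending.append((value, on_delivery))

    def flush(self, timeout=None):
        pending, self._pending = self._pending, []
        for value, callback in pending:
            if value in self.fail_once:
                self.fail_once.discard(value)
                callback("simulated delivery error", _Msg(value))
            else:
                self.log.append(value)
                callback(None, _Msg(value))
        return 0


def send_with_retry(producer, topic, payloads, max_attempts=3):
    """Send a bulk and re-send only the payloads whose delivery failed."""
    to_send = list(payloads)
    for _ in range(max_attempts):
        failed = []

        def on_delivery(err, msg, failed=failed):
            if err is not None:
                failed.append(msg.value())

        for payload in to_send:
            producer.produce(topic, value=payload, on_delivery=on_delivery)
        producer.flush()
        if not failed:
            return []
        to_send = failed
    return to_send  # payloads that never got through


producer = FlakyProducer(fail_once=[b"m2", b"m4"])
bulk = [b"m1", b"m2", b"m3", b"m4"]
undelivered = send_with_retry(producer, "metrics", bulk)
# Every message is stored exactly once despite the partial failure.
assert undelivered == []
assert sorted(producer.log) == bulk
```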
Once implemented, the change should be verified by running the following tests on the complete stack:
- stress
- endurance
- performance
Documentation Impact
No documentation impact.
References
History
Release Name | Description |
---|---|
Stein | Introduced |
[3] https://github.com/dpkp/kafka-python/blob/master/docs/changelog.rst#100-feb-15-2016
[4] http://git.openstack.org/cgit/openstack/monasca-persister/commit/?id=a7112fd30bd545dd850e0e267dcceb9ea27551ad
[5] https://cwiki.apache.org/confluence/display/KAFKA/Clients#Clients-Python
[6] https://github.com/monasca/monasca-perf/blob/master/kafka_python_client_perf/monascaInvestigationKafkaPythonAPIs.md
[7] http://activisiongamescience.github.io/2016/06/15/Kafka-Client-Benchmarking/