Upgrade Apache Kafka client

Currently, all Python Monasca components use a copy of the `kafka-python`
library in version 0.9.5 (released on Feb 16, 2016). This specification
describes the process of upgrading the Apache Kafka client to
`confluent-kafka-python`. The upgrade will improve performance and
reliability. Continuing to use an old, frozen client version is also
unacceptable from a security standpoint.

Change-Id: I59f3effcdba39199d61d70a201d8e760840d3627
Story: 2003705
Task: 26360
Witold Bedyk 2018-09-10 07:04:00 -06:00 committed by Doug Szumski
parent c13558ba09
commit a4674a5e7f
6 changed files with 223 additions and 0 deletions

@ -27,6 +27,7 @@ Here you can find the specs, and spec template, for each release:
   specs/queens/index
   specs/rocky/index
   specs/stein/index

There are also some approved backlog specifications that are looking for
owners:

@ -0,0 +1 @@
../../../../specs/stein/approved

@ -0,0 +1 @@
../../../../specs/stein/implemented

@ -0,0 +1,26 @@
=============================
Monasca Stein Specifications
=============================

Template:

.. toctree::
   :maxdepth: 1

   Specification Template (Stein release) <template>

Stein implemented specs:

.. toctree::
   :glob:
   :maxdepth: 1

   .. implemented/*

Stein approved (but not implemented) specs:

.. toctree::
   :glob:
   :maxdepth: 1

   approved/*

@ -0,0 +1 @@
../../../../specs/stein-template.rst

@ -0,0 +1,193 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

===========================
Upgrade Apache Kafka client
===========================

Story: https://storyboard.openstack.org/#!/story/2003705


Currently, all Python Monasca components use a copy of the `kafka-python`
library in version 0.9.5 (released on Feb 16, 2016) [1]_. This specification
describes the process of upgrading the Apache Kafka client to
`confluent-kafka-python` [2]_. The upgrade will improve performance and
reliability. Continuing to use an old, frozen client version is also
unacceptable from a security standpoint.


Problem description
===================

The `KeyedProducer` and `SimpleConsumer` classes of the `kafka-python` library
have been deprecated since version 1.0.0 [3]_. Further use of this code poses a
security risk. Additionally, profiling of ``monasca-persister`` has shown that
most of its time is spent consuming Kafka messages [7]_. Upgrading the Kafka
client therefore has great potential to improve overall Monasca performance.


Proposed change
===============

The wiki page hosted by the Apache Software Foundation lists the available
Python clients [4]_. Three clients are currently actively maintained and
supported: `confluent-kafka-python`, `kafka-python` and `pykafka`. Several
benchmarks [5]_, [6]_ have shown that the client maintained by Confluent is
both the fastest and the most complete.

Using the asynchronous producer improves performance significantly (~50x).
Sending messages asynchronously requires more care to avoid duplicating the
persisted data, but the performance gain justifies the effort.

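
The asynchronous mode can be illustrated with the minimal sketch below. It is
not the Monasca implementation: the function name ``produce_async`` is
hypothetical, and ``producer`` is assumed to be a ``confluent_kafka.Producer``
(any object with the same ``produce()``, ``poll()`` and ``flush()`` methods
works, which keeps the sketch testable without a broker). Messages are queued
without waiting for the broker; delivery reports arrive later through the
``on_delivery`` callback, so failed messages have to be tracked explicitly.

.. code-block:: python

   import logging

   LOG = logging.getLogger(__name__)


   def produce_async(producer, topic, values, flush_timeout=10.0):
       """Queue values for asynchronous delivery (hypothetical helper).

       Returns a tuple (failed, undelivered): the payloads whose delivery
       report contained an error, and the number of messages still queued
       after flush().
       """
       failed = []

       def on_delivery(err, msg):
           # Invoked from poll()/flush() when the broker acknowledges the
           # message or delivery finally fails (e.g. message.timeout.ms).
           if err is not None:
               LOG.warning("Delivery to %s failed: %s", topic, err)
               failed.append(msg.value())

       for value in values:
           while True:
               try:
                   producer.produce(topic, value, on_delivery=on_delivery)
                   break
               except BufferError:
                   # Local queue is full: serve delivery reports to free space.
                   producer.poll(1.0)
           # Serve delivery reports of earlier messages without blocking.
           producer.poll(0)

       # Wait for outstanding delivery reports, at most flush_timeout seconds.
       undelivered = producer.flush(flush_timeout)
       return failed, undelivered

Only the payloads in ``failed`` would be candidates for a retry; re-sending the
whole batch would duplicate every message that had already been acknowledged.
A non-zero ``undelivered`` count means that some delivery reports did not
arrive within the timeout, so the outcome of those messages is unknown.
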

`confluent-kafka-python` is also the only client that supports Apache Avro
serialization, which reduces the size of messages and thus further speeds up
communication.

The proposed change includes:

* using the `confluent-kafka-python` library
* using the client in asynchronous mode

Code changes will affect the following components:

* monasca-common
* monasca-{log,events}-api
* monasca-persister
* monasca-notification
* monasca-transform

The Java components (`monasca-thresh` and the Java implementation of
`monasca-persister`) are out of the scope of this specification. Upgrading the
client in these components should be handled separately.

This client depends on `librdkafka`, a finely tuned C client library.


Alternatives
------------

* `pykafka`
* a newer version of `kafka-python`
* using the synchronous mode


Data model impact
-----------------

No data model impact.

REST API impact
---------------

No REST API impact.

Security impact
---------------

This change will improve security by removing deprecated and unmaintained
code.

Other end user impact
---------------------

No end user impact.


Performance Impact
------------------

This change should dramatically improve the performance of the complete
solution. In particular, the performance of `monasca-persister` and
`monasca-api` is expected to improve.

Other deployer impact
---------------------

New libraries have to be packaged and deployed:

* `confluent-kafka-python`
* `librdkafka`

Developer impact
----------------

`confluent-kafka-python` has to be used instead of `kafka-python` in all
affected components.


Implementation
==============

Assignee(s)
-----------

Primary assignee:
  witek

Other contributors:
  <>

Work Items
----------

* remove code using `pykafka`
* remove `pykafka` from requirements and lower-constraints
* add `confluent-kafka-python` to global-requirements
* implement common routines in `monasca-common`
* use the new code in:

  * monasca-{log,events}-api
  * monasca-persister
  * monasca-notification
  * monasca-transform

* delete the old, deprecated code

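
A common consumer routine for `monasca-common` could, for example, look like
the sketch below. All names in it (``consume_batches``, ``handle_batch``,
``should_stop``) are hypothetical, and ``consumer`` is assumed to be a
``confluent_kafka.Consumer`` created with ``'enable.auto.commit': False`` and
already subscribed to its topics. Offsets are committed synchronously only
after a batch has been handled, which gives at-least-once delivery: after a
crash, the uncommitted batch is consumed again rather than lost.

.. code-block:: python

   import logging

   LOG = logging.getLogger(__name__)


   def _handle_and_commit(consumer, handle_batch, batch):
       # Persist first, then commit: a crash between the two steps causes a
       # re-read of the batch (possible duplicates) instead of data loss.
       handle_batch(batch)
       # Without arguments, commit() stores the current positions, i.e. the
       # offsets just past every message returned by poll() so far.
       consumer.commit(asynchronous=False)


   def consume_batches(consumer, handle_batch, batch_size=100,
                       poll_timeout=1.0, should_stop=lambda: False):
       """Poll messages and pass their payloads to handle_batch in batches."""
       batch = []
       while not should_stop():
           msg = consumer.poll(poll_timeout)
           if msg is None:
               # Nothing arrived within poll_timeout: flush a partial batch.
               if batch:
                   _handle_and_commit(consumer, handle_batch, batch)
                   batch = []
               continue
           if msg.error():
               # Error events (e.g. partition EOF) carry no payload; this
               # sketch only logs them.
               LOG.warning("Kafka consumer error: %s", msg.error())
               continue
           batch.append(msg.value())
           if len(batch) >= batch_size:
               _handle_and_commit(consumer, handle_batch, batch)
               batch = []
       # Handle whatever is left when asked to stop.
       if batch:
           _handle_and_commit(consumer, handle_batch, batch)

Committing after the write, rather than relying on auto-commit, trades
possible duplicates for no data loss, which is why the persisted writes also
need to tolerate re-delivered messages.
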

Dependencies
============

New packages have to be built for:

* `confluent-kafka-python`
* `librdkafka`

Testing
=======

The implementation should be tested with the existing integration tests
(tempest). Additionally, we should test the scenario in which the producer does
not receive a response from Kafka for some of the messages in a batch. In this
case, duplicate entries must not be created in the database.

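
The lost-response scenario can be modelled without a real cluster, as in the
following sketch. Everything in it is hypothetical test scaffolding:
``LossyBroker`` stands in for Kafka, appending every message to its log but
losing the acknowledgement of selected messages once, and ``persist`` stands
in for the database write. The sketch shows why retrying alone cannot prevent
duplicates: a message whose acknowledgement was lost is already in the topic,
so the retry stores it a second time. It assumes that duplicates are instead
absorbed by keyed, idempotent writes (InfluxDB, for example, overwrites the
field values of a point with the same measurement, tag set and timestamp).

.. code-block:: python

   class LossyBroker:
       """Fake broker: keeps every message, but loses the first ack of some."""

       def __init__(self, lose_ack_for):
           self.log = []                   # what actually landed in the topic
           self.lose_ack_for = set(lose_ack_for)

       def send(self, message):
           self.log.append(message)
           if message in self.lose_ack_for:
               self.lose_ack_for.discard(message)  # only the first ack is lost
               return False                # the producer sees an error
           return True


   def publish_with_retry(broker, messages, max_attempts=3):
       """Re-send only unacknowledged messages; return those never acked."""
       pending = list(messages)
       for _ in range(max_attempts):
           pending = [m for m in pending if not broker.send(m)]
           if not pending:
               break
       return pending


   def persist(database, messages):
       """Keyed write: the same key overwrites instead of duplicating.

       A real metric identity would also include its dimensions.
       """
       for name, timestamp, value in messages:
           database[(name, timestamp)] = value


   metrics = [("cpu.idle_perc", 1, 97.0), ("cpu.idle_perc", 2, 95.5)]
   broker = LossyBroker(lose_ack_for={metrics[0]})
   assert publish_with_retry(broker, metrics) == []

   database = {}
   persist(database, broker.log)
   # The topic holds a duplicate, but the database does not.
   assert len(broker.log) == 3
   assert len(database) == 2
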

The implementation should be followed by executing the following tests on the
complete stack:

* stress
* endurance
* performance

Documentation Impact
====================

No documentation impact.


References
==========

.. [1] https://github.com/dpkp/kafka-python/releases/tag/v0.9.5

.. [2] https://github.com/confluentinc/confluent-kafka-python

.. [3] https://github.com/dpkp/kafka-python/blob/master/docs/changelog.rst#100-feb-15-2016

.. [4] https://cwiki.apache.org/confluence/display/KAFKA/Clients#Clients-Python

.. [5] https://github.com/monasca/monasca-perf/blob/master/kafka_python_client_perf/monascaInvestigationKafkaPythonAPIs.md

.. [6] http://activisiongamescience.github.io/2016/06/15/Kafka-Client-Benchmarking/

.. [7] http://git.openstack.org/cgit/openstack/monasca-persister/commit/?id=a7112fd30bd545dd850e0e267dcceb9ea27551ad


History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Stein
     - Introduced