Whilst working on the Reproducible Builds effort [0], we noticed that
python-oslo.messaging could not be built reproducibly.
This is because the documentation captures the hostname of the build
system.
[0] https://reproducible-builds.org/
This patch uses sample_default from oslo.config to fix this.
Please accept this patch to fix oslo.messaging, magnum, and zaqar
(at least, probably others).
Change-Id: Ie8717e182f709cbddd645a356789e262b47646d3
The description of the rabbit_stream_fanout option is incorrect: it
actually reuses the description of quorum queues. So we need to fix it
with a correct description of stream queues.
Closes-Bug: #2058616
Change-Id: I614280c656f7d5fe9043abee93218a9907c395ff
Signed-off-by: frankming <chen27508959@outlook.com>
When an IPv6 address is used for the host, the hostaddr should be
formatted as [<address>]:<port> instead of <address>:<port>. This
ensures the correct format is used.
Closes-Bug: 1907702
Change-Id: I6f4a453a69e942d5b2d66ffeca6960b85c8bc721
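The bracketing rule can be sketched with a hypothetical helper (not the
actual oslo.messaging function):

```python
def format_hostaddr(host, port):
    # Bracket IPv6 literals so the colon before the port is
    # unambiguous: "[::1]:5672" rather than "::1:5672".
    if ':' in host:
        return '[%s]:%d' % (host, port)
    return '%s:%d' % (host, port)
```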
When waiting for a message in a queue, queue.get(block=True) prevents
the heartbeats from being sent at the correct interval.
So instead of blocking the thread, do a loop using a StopWatch timer
until the timeout is reached.
Closes-Bug: #2035113
Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
Change-Id: Ie5cf5d2bd281508bcd2db1409f18ad96b0822639
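The loop described above can be sketched as follows (names are
illustrative; the real code uses oslo.utils' StopWatch rather than a
monotonic deadline):

```python
import queue
import time

def get_with_heartbeats(q, timeout, send_heartbeat, poll=0.1):
    # Wait in short non-blocking slices so heartbeats keep
    # flowing, instead of one blocking q.get(timeout=timeout).
    deadline = time.monotonic() + timeout
    while True:
        try:
            return q.get(block=True, timeout=poll)
        except queue.Empty:
            send_heartbeat()
            if time.monotonic() >= deadline:
                raise
```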
When an agent reconnected to a rabbitmq server, it would start
consuming messages from the last offset available in the stream.
This could cause important messages to be lost.
With this patch, oslo_messaging will keep track of the last consumed
offset and resume reading from that point.
Related-bug: #2031497
Change-Id: I449008829b0c0a1a759c211b83f7a99d9c7f2c0d
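A minimal sketch of the idea (illustrative class, not the actual oslo
code; the exact x-stream-offset values RabbitMQ accepts are assumed
here):

```python
class StreamOffsetTracker:
    # Remember the last consumed stream offset so that, on
    # reconnect, consumption resumes from it instead of from
    # the broker's latest offset.
    def __init__(self):
        self.last_offset = None

    def on_message(self, offset):
        self.last_offset = offset

    def consume_arguments(self):
        if self.last_offset is None:
            # First connection: no history to restore yet.
            return {'x-stream-offset': 'last'}
        # Reconnection: replay from just after the last one seen.
        return {'x-stream-offset': self.last_offset + 1}
```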
It would be helpful if "Timed out waiting for <service>" log messages
at least specified which `reply_q` was being waited on.
Example without the reply_q:
```
12228 2020-09-14 14:56:37.187 7 WARNING nova.conductor.api
[req-1e081db6-808b-4af1-afc1-b87db7839394 - - - - -] Timed out waiting for
nova-conductor. Is it running? Or did this service start before
nova-conductor? Reattempting establishment of nova-conductor connection...:
oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to
message ID 1640e7ef6f314451ba9a75d9ff6136ad
```
Example after adding the reply_q:
```
12228 2020-09-14 14:56:37.187 7 WARNING nova.conductor.api
[req-1e081db6-808b-4af1-afc1-b87db7839394 - - - - -] Timed out waiting for
nova-conductor. Is it running? Or did this service start before
nova-conductor? Reattempting establishment of nova-conductor connection...:
oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply
(reply_2882766a63b540dabaf7d019cf0c0cda)
to message ID 1640e7ef6f314451ba9a75d9ff6136ad
```
This makes it easier to debug and observe whether something went
wrong with a reply queue.
Change-Id: Ied2c881c71930dc631919113adc00112648f9d72
Closes-Bug: #1896925
This introduces "stream" queues for fanout, so all components relying
on fanout can use the same stream, lowering the number of queues
needed and leveraging the new "stream" queue type from rabbitmq.
Closes-Bug: #2031497
Change-Id: I5056a19aada9143bcd80aaf064ced8cad441e6eb
Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
The current fake driver does not properly clean up the fake RPC exchange
between tests.
This means that if a test invokes code that makes an RPC request using
the fake driver without consuming the RPC message, another test may
receive this request, making it fail.
This issue was found while working on a Cinder patch and was worked
around there with Change-Id
I52ee4b345b0a4b262e330a9a89552cd216eafdbe.
This patch fixes the source of the problem by clearing the exchange
class dictionary in the FakeExchangeManager during the FakeDriver
cleanup.
Change-Id: If82c2175cf7242b80509d180cdf92323c0f4c43b
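The leak comes from class-level state; a simplified sketch of the
problem and the fix (illustrative classes, not the real driver code):

```python
class FakeExchangeManager:
    # Class-level dict: shared by every instance, so exchanges
    # (and any unconsumed messages) leak from one test to the
    # next unless explicitly cleared.
    _exchanges = {}

    def get_exchange(self, name):
        return self._exchanges.setdefault(name, [])


class FakeDriver:
    def send(self, exchange, msg):
        FakeExchangeManager().get_exchange(exchange).append(msg)

    def cleanup(self):
        # The fix: reset the shared state between tests.
        FakeExchangeManager._exchanges.clear()
```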
The purpose of this change is to introduce an optional mechanism to keep
queue names consistent across service restarts.
Oslo messaging already re-uses the queues while running, but the
queues are created with a random name at startup.
This change proposes an option named use_queue_manager (defaulting to
False, so the behavior is unchanged) that can be set to True to switch
to consistent naming based on the hostname and process name.
Related-bug: #2031497
Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
Change-Id: I2acdef4e03164fdabcb50fb98a4ac14b1aefda00
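Assuming the option lands in the usual rabbit driver section, opting in
might look like:

```ini
[oslo_messaging_rabbit]
# Stable queue names derived from hostname and process name
# instead of random names (default is false).
use_queue_manager = true
```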
Add a new flag rabbit_transient_quorum_queue to enable the use of quorum
for transient queues (reply_ and _fanout_).
This helps OpenStack services a lot in not failing (and in recovering)
when a rabbit node has an issue.
Related-bug: #2031497
Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
Change-Id: Icee5ee6938ca7c9651f281fb835708fc88b8464f
When rabbit is failing for a specific quorum queue, the only thing to
do is to delete the queue (as per the rabbit doc, see [1]).
So, to avoid the RPC service staying broken until an operator
eventually fixes it manually, catch any INTERNAL ERROR (code 541) and
trigger the deletion of the failed queues under those conditions.
On the next queue declare (triggered by various retries), the queue
will be created again and the service will recover by itself.
Closes-Bug: #2028384
Related-bug: #2031497
[1] https://www.rabbitmq.com/quorum-queues.html#availability
Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
Change-Id: Ib8dba833542973091a4e0bf23bb593aca89c5905
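The recovery path can be sketched like this (names are illustrative,
not the actual oslo functions):

```python
INTERNAL_ERROR = 541  # AMQP reply code seen for a broken quorum queue

def recover_queue(reply_code, queue_name, delete_queue):
    # On INTERNAL ERROR, delete the failed quorum queue so the
    # next declare, issued by the normal retry logic, recreates
    # it and the service recovers without operator action.
    if reply_code == INTERNAL_ERROR:
        delete_queue(queue_name)
        return True  # the declare should be retried
    return False
```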
When an operator relies on rabbitmq policies, there is no point in
setting the queue TTL in the config.
Moreover, using policies is much simpler, as you don't need to
delete/recreate the queues to apply a new parameter (see [1]).
So, allowing the transient queue TTL to be set to 0 permits creating
the queue without the x-expires parameter, so that only the policy
applies.
[1] https://www.rabbitmq.com/parameters.html#policies
Related-bug: #2031497
Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
Change-Id: I34bad0f6d8ace475c48839adc68a023dd0c380de
Kombu recommends running heartbeat_check every second, but we hold a
lock around the kombu connection. To avoid holding that lock too long
(most of the time it does nothing except wait for the events to
drain), we run heartbeat_check and retrieve the server heartbeat
packets only twice as often as the minimum required for heartbeats to
work, i.e. every:
heartbeat_timeout / heartbeat_rate / 2.0
Because of this, we are not sending the heartbeat frames at correct
intervals. E.G.
If heartbeat_timeout=60 and heartbeat_rate=2, the AMQP protocol
expects a frame to be sent every 30 sec.
With the current heartbeat_check implementation, heartbeat_check will be
called every:
heartbeat_timeout / heartbeat_rate / 2.0 = 60 / 2 / 2.0 = 15
Which will result in the following frame flow:
T+0 --> do nothing (60/2 > 0)
T+15 --> do nothing (60/2 > 15)
T+30 --> do nothing (60/2 > 30)
T+45 --> send a frame (60/2 < 45)
...
With heartbeat_rate=3, the heartbeat_check will be executed more often:
heartbeat_timeout / heartbeat_rate / 2.0 = 60 / 3 / 2.0 = 10
Frame flow:
T+0 --> do nothing (60/3 > 0)
T+10 --> do nothing (60/3 > 10)
T+20 --> do nothing (60/3 > 20)
T+30 --> send a frame (60/3 < 30)
...
Now we send the frames at the correct intervals.
Closes-bug: #2008734
Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
Change-Id: Ie646d254faf5e45ba46948212f4c9baf1ba7a1a8
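The two wake-up intervals quoted above can be checked directly with a
small sketch of the formula:

```python
def check_interval(heartbeat_timeout, heartbeat_rate):
    # Interval at which the driver wakes up to run kombu's
    # heartbeat_check (the formula quoted in the text).
    return heartbeat_timeout / heartbeat_rate / 2.0

# heartbeat_timeout=60, heartbeat_rate=2: wake every 15s, but a
# frame is due by T+30, so the first frame only goes out at the
# T+45 wake-up. With heartbeat_rate=3 the wake-up drops to 10s.
```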
Previously the two values were the same; this caused us to always
exceed the timeout limit ACK_REQUEUE_EVERY_SECONDS_MAX, which resulted
in various code paths never being traversed due to premature timeout
exceptions.
Also apply min/max bounds to kombu_reconnect_delay so it doesn't
exceed ACK_REQUEUE_EVERY_SECONDS_MAX and break things again.
Closes-Bug: #1993149
Change-Id: I103d2aa79b4bd2c331810583aeca53e22ee27a49
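The bounding is a plain clamp; a sketch (the bounds here are
illustrative, not the actual constants' values):

```python
def clamp(value, lower, upper):
    # Keep kombu_reconnect_delay within [lower, upper] so it can
    # never exceed ACK_REQUEUE_EVERY_SECONDS_MAX again.
    return max(lower, min(value, upper))
```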
When enabling heartbeat_in_pthread, we restored the "threading" python
library from eventlet back to the original one in RabbitDriver, but we
forgot to do the same in AMQPDriverBase (RabbitDriver is a subclass of
AMQPDriverBase).
We also need to use the original "queue" module so that queues do not
use greenthreads either.
Related-bug: #1961402
Related-bug: #1934937
Closes-bug: #2009138
Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
Change-Id: I34ea0d1381e934297df2f793e0d2594ef8254f00
In [1] a typo was made in variable names. To prevent even further
awkwardness around variable naming, we fix the typo and publish a
release note for those already using the variables in their deployments.
[1] https://review.opendev.org/c/openstack/oslo.messaging/+/831058
Change-Id: Icc438397c11521f3e5e9721f85aba9095e0831c2
A recent oslo.messaging patch [1], not yet merged, which aims to update
the test runtime for antelope, led us to the following error:
```
qdrouterd: Python: ModuleNotFoundError: No module named 'qpid_dispatch'
```
Neither debian nor ubuntu in their latest releases have any binary
built for the qpid backend, not even a 3rd-party one. Only qpid
proton, the client lib, is available.
To solve this issue, these changes propose to deprecate the AMQP1
driver, which is the one based on qpid and proton, and to remove the
related functional tests.
The AMQP1 driver doesn't seem to be widely used.
[1] https://review.opendev.org/c/openstack/oslo.messaging/+/856643
Closes-Bug: 1992587
Change-Id: Id2ca9cd9ee8b8dbdd14dcd00ebd8188d20ea18dc
As was reported in the related bug some time ago, setting that
option to True for nova-compute can break it, as it is a non-wsgi
service.
We also noticed the same problems with randomly stuck non-wsgi services,
e.g. neutron agents, and the same issue can probably happen with
any other non-wsgi service.
To avoid that, this patch changes the default value of that config
option to False.
Together with [1], it effectively reverts the change done in [2] some
time ago.
[1] https://review.opendev.org/c/openstack/oslo.messaging/+/800621
[2] https://review.opendev.org/c/openstack/oslo.messaging/+/747395
Related-Bug: #1934937
Closes-Bug: #1961402
Change-Id: I85f5b9d1b5d15ad61a9fcd6e25925b7eeb8bf6e7
In impl_kafka, _produce_message is run in a tpool.execute
context but it was also calling logging functions.
This could cause subsequent calls to logging functions to
deadlock.
This patch moves the logging calls out of the tpool.execute scope.
Change-Id: I81167eea0a6b1a43a88baa3bc383af684f4b1345
Closes-bug: #1981093
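A minimal sketch of the pattern, with a stand-in for eventlet's
tpool.execute (the real call dispatches the function to a native OS
thread; function names here are illustrative):

```python
import logging

LOG = logging.getLogger(__name__)

def run_in_native_thread(func, *args):
    # Stand-in for eventlet.tpool.execute, which runs func in a
    # native OS thread under an eventlet hub.
    return func(*args)

def produce_message(send, topic, message):
    # Log in the green thread, before and after the native-thread
    # hand-off, never inside it, to avoid the logging deadlock.
    LOG.debug("sending message to topic %s", topic)
    result = run_in_native_thread(send, topic, message)
    LOG.debug("message sent to topic %s", topic)
    return result
```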
The quorum queue type adds features that did not exist before or were
not handled in rabbitmq; the following link shows some of them:
https://blog.rabbitmq.com/posts/2020/04/rabbitmq-gets-an-ha-upgrade/
The options below control the quorum queue and ensure the stability of
the quorum system:
x-max-in-memory-length
x-max-in-memory-bytes
x-delivery-limit
These control the memory usage and handle message poisoning.
Closes-Bug: #1962348
Change-Id: I570227d6102681f4f9d8813ed0d7693a1160c21d
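A sketch of how these arguments could be assembled at declaration time
(illustrative helper, not the actual oslo code; here 0 means "leave the
limit unset"):

```python
def quorum_queue_arguments(max_in_memory_length=0,
                           max_in_memory_bytes=0,
                           delivery_limit=0):
    # Queue arguments must be passed when the queue is declared;
    # quorum settings cannot be applied afterwards by policy.
    args = {'x-queue-type': 'quorum'}
    if max_in_memory_length:
        args['x-max-in-memory-length'] = max_in_memory_length
    if max_in_memory_bytes:
        args['x-max-in-memory-bytes'] = max_in_memory_bytes
    if delivery_limit:
        args['x-delivery-limit'] = delivery_limit
    return args
```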
https://www.rabbitmq.com/quorum-queues.html
The quorum queue is a modern queue type for RabbitMQ implementing a
durable, replicated FIFO queue based on the Raft consensus algorithm. It
is available as of RabbitMQ 3.8.0.
Quorum queues cannot be set by policy, so this must be done when
declaring the queue.
To declare a quorum queue, set the x-queue-type queue argument to quorum
(the default is classic). This argument must be provided by a client at
queue declaration time; it cannot be set or changed using a policy. This
is because a policy definition or the applicable policy can be changed
dynamically, but the queue type cannot: it must be specified at
declaration time.
It is good for oslo.messaging to add support for this type of queue,
which has multiple advantages over mirroring.
If quorum queues are set, mirrored queues will be ignored.
Closes-Bug: #1942933
Change-Id: Id573e04c287e034e50626daf6e18a34735d45251
The rabbit backend now applies the [oslo_messaging_notifications]retry,
[oslo_messaging_rabbit]rabbit_retry_interval, rabbit_retry_backoff and
rabbit_interval_max configuration parameters when trying to establish
the connection to the message bus during notification sending.
This patch also clarifies the differences between the behavior
of the kafka and the rabbit drivers in this regard.
Closes-Bug: #1917645
Change-Id: Id4ccafc95314c86ae918336e42cca64a6acd4d94
A precondition failed exception related to the durable exchange
config may be triggered when a control exchange is shared
between services and the services try to create it with
configs that differ from each other. RabbitMQ will reject
the services that try to create it with a configuration
that differs from the one used first.
This kind of exception is not currently handled, and services
can fail without any way to deal with this kind of issue.
These changes catch this kind of exception and analyze whether it is
related to the durable config. In that case we try to re-declare
the failing exchange/queue as non-durable.
This problem can be easily reproduced by running a local RabbitMQ
server.
By setting the config below (sample.conf):
```
[DEFAULT]
transport_url = rabbit://localhost/
[OSLO_MESSAGING_RABBIT]
amqp_durable_queues = true
```
And by running our simulator twice:
```
$ tox -e venv -- python tools/simulator.py -d rpc-server -w 40
$ tox -e venv -- python tools/simulator.py --config-file ./sample.conf -d rpc-server -w 40
```
The first one will create a default non durable control exchange.
The second one will create the same default control exchange but as
durable.
Closes-Bug: #1953351
Change-Id: I27625b468c428cde6609730c8ab429c2c112d010
Currently this is how reconnect works:
- pyngus detects failure and invokes callback
Controller.connection_failed() which in turn calls
Controller._handle_connection_loss()
- The first thing that _handle_connection_loss does is to set
self.addresser to None (important later)
- Then it defers _do_reconnect after a delay (normally 1 second)
- (1 second passes)
- _do_reconnect calls _hard_reset which resets the controller state
However, there is a race here. This can happen:
- The above, up until it defers and waits for 1 second
- Controller.send() is invoked on a task
- A new Sender is created, and critically because self.reply_link
still exists and is active, we call sender.attach and pass in
self.addresser. Remember _handle_connection_loss sets
self.addresser to None.
- Eventually Sender.attach throws an AttributeError because it
attempts to call addresser.resolve() but addresser is None
The reason this happens is that although the connection is dead,
the controller state is still half-alive, because _hard_reset hasn't
been called yet since it's deferred one second in _do_reconnect.
The fix here is to move _hard_reset out of _do_reconnect and directly
into _handle_connection_loss. The eventloop is woken up immediately
to process _hard_reset but _do_reconnect is still deferred as before
so as to retain the desired reconnect backoff behavior.
Closes-Bug: #1941652
Change-Id: Ife62a7d76022908f0dc6a77f1ad607cb2fbd3e8f
In some circumstances, services can be executed outside of mod_wsgi and
in a monkey-patched environment. In this context we need to leave
users the possibility to execute the heartbeat in a green thread.
The heartbeat_in_pthread option was tagged as deprecated a few months
ago and planned for a future removal. These changes drop this
deprecation to allow enabling green threads if needed.
Closes-Bug: #1934937
Change-Id: Iee2e5a6f7d71acba70bbc857f0bd7d83e32a7b8c