1317 Commits

Author SHA1 Message Date
Zuul e978eb920f Merge "Make oslo.messaging, magnum, and zaqar reproducible." 2024-04-12 13:24:34 +00:00
Zuul 41fc2a2d35 Merge "Use StopWatch timer when waiting for message" 2024-04-09 13:47:33 +00:00
Thomas Goirand dc55d64df9 Make oslo.messaging, magnum, and zaqar reproducible.
Whilst working on the Reproducible Builds effort [0], we noticed that
python-oslo.messaging could not be built reproducibly.

This is because the documentation captures the hostname of the build
system.

[0] https://reproducible-builds.org/

This patch uses sample_default from oslo.config to fix this.
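A minimal sketch of why a sample default helps, using only the standard library. `Opt` and `render_sample` are hypothetical stand-ins, not oslo.config's classes. The runtime default still comes from the build host, but the generated documentation shows a fixed placeholder, so builds on different machines render identical text.

```python
import socket
from dataclasses import dataclass
from typing import Optional


@dataclass
class Opt:
    """Hypothetical stand-in for a config option (not oslo.config's Opt)."""
    name: str
    default: str
    sample_default: Optional[str] = None


def render_sample(opt: Opt) -> str:
    """Render a sample-config line, preferring the stable sample_default."""
    shown = opt.sample_default if opt.sample_default is not None else opt.default
    return f"#{opt.name} = {shown}"


# Without a sample default, the rendered docs embed the build host's name.
unstable = Opt("host", default=socket.gethostname())
# With a sample default, every build renders the same placeholder.
stable = Opt("host", default=socket.gethostname(), sample_default="<hostname>")

print(render_sample(unstable))  # varies from machine to machine
print(render_sample(stable))    # always the same placeholder line
```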

Please accept this patch to fix oslo.messaging, magnum, and zaqar
(at least, probably others).

Change-Id: Ie8717e182f709cbddd645a356789e262b47646d3
2024-04-09 11:49:38 +02:00
Zuul 986cd4ab24 Merge "Fix incorrect desc of rabbit_stream_fanout option" 2024-03-27 20:08:09 +00:00
Zuul 63f95e92c3 Merge "kafka: Fix invalid hostaddr format for IPv6 address" 2024-03-27 12:02:40 +00:00
frankming ede60d7a83 Fix incorrect desc of rabbit_stream_fanout option
The description of the rabbit_stream_fanout option is incorrect: it
reuses the description of quorum queues. Fix it with a correct stream
queue description.

Closes-Bug: #2058616
Change-Id: I614280c656f7d5fe9043abee93218a9907c395ff
Signed-off-by: frankming <chen27508959@outlook.com>
2024-03-21 15:53:56 +08:00
Takashi Kajinami b0e28a1603 kafka: Fix invalid hostaddr format for IPv6 address
When an IPv6 address is used for the host, the hostaddr should be
formatted as [<address>]:<port> instead of <address>:<port>. This
change ensures the correct format is used.
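A minimal sketch of the bracketing rule, using the standard `ipaddress` module. `format_hostaddr` is a hypothetical helper name, not the Kafka driver's actual function.

```python
import ipaddress


def format_hostaddr(host: str, port: int) -> str:
    """Return host:port, wrapping IPv6 literals in brackets."""
    try:
        is_ipv6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        # Not an IP literal (e.g. a DNS name): leave it unbracketed.
        is_ipv6 = False
    if is_ipv6:
        return f"[{host}]:{port}"
    return f"{host}:{port}"


print(format_hostaddr("192.0.2.10", 9092))     # 192.0.2.10:9092
print(format_hostaddr("2001:db8::1", 9092))    # [2001:db8::1]:9092
print(format_hostaddr("kafka.example", 9092))  # kafka.example:9092
```

Without the brackets, the port cannot be told apart from the last group of an IPv6 address.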

Closes-Bug: 1907702
Change-Id: I6f4a453a69e942d5b2d66ffeca6960b85c8bc721
2024-02-20 19:09:56 +09:00
Arnaud Morin b62208a54c Use StopWatch timer when waiting for message
When waiting for a message in a queue, queue.get(block=True) prevents
the heartbeats from being sent at the correct interval.

So instead of blocking the thread, loop using a StopWatch timer until
the timeout is reached.
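A minimal sketch of the polling pattern, assuming a plain `queue.Queue`. `time.monotonic()` stands in for oslo.utils' StopWatch, and `get_with_deadline` and its `poll` parameter are hypothetical. Each short wait returns control to the loop, so other work (such as heartbeats) gets a chance to run before the overall timeout expires.

```python
import queue
import time


def get_with_deadline(q: "queue.Queue", timeout: float, poll: float = 0.05):
    """Wait up to `timeout` seconds for an item, waking every `poll` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise queue.Empty
        try:
            # Never block longer than one poll interval (or what is left).
            return q.get(block=True, timeout=min(poll, remaining))
        except queue.Empty:
            # Hook point: a real implementation could yield to other
            # threads or run periodic tasks here before polling again.
            continue


q = queue.Queue()
q.put("reply")
print(get_with_deadline(q, timeout=1.0))  # reply
```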

Closes-Bug: #2035113

Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
Change-Id: Ie5cf5d2bd281508bcd2db1409f18ad96b0822639
2024-02-16 14:50:49 +01:00
Guillaume Espanel 5988c7bf14 Restore read stream queues from last known offset
When an agent reconnected to a rabbitmq server, it would start
consuming messages from the last offset available in the stream.

This could cause important messages to be lost.

With this patch, oslo_messaging keeps track of the last consumed
offset and resumes reading from that point.
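A minimal sketch of offset tracking, assuming the broker reports each delivery's position in an `x-stream-offset` header and accepts the same name as a consumer argument (as RabbitMQ streams do). The `StreamOffsetTracker` class is hypothetical. Starting from `"next"` before anything has been consumed is an assumption of this sketch, not necessarily what oslo.messaging does.

```python
class StreamOffsetTracker:
    """Hypothetical helper remembering the last consumed stream offset."""

    def __init__(self):
        self.last_offset = None

    def on_message(self, headers: dict) -> None:
        # RabbitMQ stream deliveries carry their offset in this header.
        self.last_offset = headers["x-stream-offset"]

    def consumer_arguments(self) -> dict:
        """Arguments to use when (re)subscribing to the stream."""
        if self.last_offset is None:
            # Nothing consumed yet: only read messages published from now on.
            return {"x-stream-offset": "next"}
        # Resume right after the last message that was processed.
        return {"x-stream-offset": self.last_offset + 1}


tracker = StreamOffsetTracker()
tracker.on_message({"x-stream-offset": 41})
print(tracker.consumer_arguments())  # {'x-stream-offset': 42}
```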

Related-bug: #2031497

Change-Id: I449008829b0c0a1a759c211b83f7a99d9c7f2c0d
2024-02-12 17:55:45 +01:00
Hervé Beraud 97d457f0af Display the reply queue's name in timeout logs
It would be helpful if "Timed out waiting for <service>" log messages at least
specified which `reply_q` was being waited on.

Example without the reply_q:

```
12228 2020-09-14 14:56:37.187 7 WARNING nova.conductor.api
[req-1e081db6-808b-4af1-afc1-b87db7839394 - - - - -] Timed out waiting for
nova-conductor.  Is it running? Or did this service start before
nova-conductor?  Reattempting establishment of nova-conductor connection...:
oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to
message ID 1640e7ef6f314451ba9a75d9ff6136ad
```

Example after adding the reply_q:

```
12228 2020-09-14 14:56:37.187 7 WARNING nova.conductor.api
[req-1e081db6-808b-4af1-afc1-b87db7839394 - - - - -] Timed out waiting for
nova-conductor.  Is it running? Or did this service start before
nova-conductor?  Reattempting establishment of nova-conductor connection...:
oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply
(reply_2882766a63b540dabaf7d019cf0c0cda)
to message ID 1640e7ef6f314451ba9a75d9ff6136ad
```

It could help us debug more easily and observe whether something went
wrong with a reply queue.

Change-Id: Ied2c881c71930dc631919113adc00112648f9d72
Closes-Bug: #1896925
2024-02-06 15:17:46 +01:00
Takashi Kajinami 4f4c2772da Bump hacking (again)
The previous attempt did not update the version in the pre-commit config,
so the old version was still used by the pep8 target.

Change-Id: Idf8c7d99f7c6aeb0244d58e85524ba1f039195d8
2024-01-26 01:10:57 +09:00
Zuul a417b425a0 Merge "Add an option to use rabbitmq stream for fanout queues" 2024-01-19 15:24:44 +00:00
Jay Faulkner 800c58826e Utilize the new RequestContext redacted_copy method
We now expect context objects to support returning a redacted copy of
themselves.

As a related cleanup, entirely removed the practice of using
dictionaries to represent contexts in unit tests and the logging driver.

As part of developing this change, I discovered code in Glance (and
potentially other services) which explicitly passes {} in lieu of a
context when notifying, so we now properly handle dictionaries as
contexts.

To ensure the required method is available, require oslo.context 5.3.0
or newer.
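A minimal sketch of accepting either a context object or a plain dict. `context_for_notification` and `FakeContext` are hypothetical. The sketch assumes a context exposes `redacted_copy()` and `to_dict()`, and that dicts are simply copied. The real handling in oslo.messaging may differ.

```python
from typing import Any, Dict, Union


def context_for_notification(ctxt: Union[Dict[str, Any], Any]) -> Dict[str, Any]:
    """Return a serializable view of a context, redacted when possible."""
    if isinstance(ctxt, dict):
        # Some callers pass {} (or another dict) in lieu of a context.
        return dict(ctxt)
    # Context objects are asked for a redacted copy before serialization.
    return ctxt.redacted_copy().to_dict()


class FakeContext:
    """Hypothetical context exposing redacted_copy() and to_dict()."""

    def __init__(self, user_id, auth_token):
        self.user_id = user_id
        self.auth_token = auth_token

    def redacted_copy(self):
        return FakeContext(self.user_id, auth_token=None)

    def to_dict(self):
        return {"user_id": self.user_id, "auth_token": self.auth_token}


print(context_for_notification({}))  # {}
print(context_for_notification(FakeContext("u1", "secret")))
```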

Change-Id: I894f38cc83c98d3e8d48b59864c0c7c2d27e7dcd
2024-01-16 12:08:20 -08:00
julien.cosmao e95f334459 Add an option to use rabbitmq stream for fanout queues
This introduces "stream" queues for fanout, so all components
relying on fanout can use the same stream, lowering the number of queues
needed and leveraging the new "stream" queue type from rabbitmq.

Closes-Bug: #2031497

Change-Id: I5056a19aada9143bcd80aaf064ced8cad441e6eb
Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
2024-01-15 09:23:36 +01:00
Zuul 6ad1ccf89c Merge "Add QManager to amqp driver" 2024-01-12 19:42:37 +00:00
Zuul 875506fff0 Merge "Enable use of quorum queues for transient messages" 2024-01-11 22:19:53 +00:00
Gorka Eguileor f65607fa48 Fix clearing of the fake RPC Exchange
The current fake driver does not properly clean up the fake RPC exchange
between tests.

This means that if a test invokes code that makes an RPC request using
the fake driver, without consuming the RPC message, then another test
may receive this request, making it fail.

This issue was found while working on a Cinder patch and was worked
around there with Change-Id
I52ee4b345b0a4b262e330a9a89552cd216eafdbe.

This patch fixes the source of the problem by clearing the exchange
class dictionary in the FakeExchangeManager during the FakeDriver
cleanup.
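A minimal sketch of why a class-level dictionary leaks state between tests. It is modeled loosely on the description above. The class shape and the `get_exchange`/`cleanup` methods are hypothetical, not the fake driver's actual code.

```python
class FakeExchangeManager:
    """Hypothetical model of a fake driver's exchange registry.

    `_exchanges` is a *class* attribute, so every instance (and therefore
    every test) shares it unless something explicitly clears it.
    """
    _exchanges = {}

    def get_exchange(self, name):
        return self._exchanges.setdefault(name, [])

    @classmethod
    def cleanup(cls):
        # Clear in place so every reference to the shared dict sees it.
        cls._exchanges.clear()


# "Test A" publishes a request but never consumes it.
FakeExchangeManager().get_exchange("rpc").append({"method": "ping"})
# Without cleanup, "test B" would see the leftover message.
print(FakeExchangeManager().get_exchange("rpc"))  # [{'method': 'ping'}]
FakeExchangeManager.cleanup()
print(FakeExchangeManager().get_exchange("rpc"))  # []
```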

Change-Id: If82c2175cf7242b80509d180cdf92323c0f4c43b
2023-11-15 12:26:47 +01:00
Arnaud Morin 4614132ad0 Add QManager to amqp driver
The purpose of this change is to introduce an optional mechanism to keep
queue names consistent across service restarts.
Oslo messaging already reuses the queues while running, but the
queues are created with a random name at startup.

This change proposes an option named use_queue_manager (defaulting to
False, so the behavior is unchanged) that can be set to True to switch to
consistent naming based on hostname and process name.
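A minimal sketch of deterministic queue naming. The `QueueNamer` class and the `<prefix>_<hostname>:<processname>_<n>` format are hypothetical illustrations, not the QManager's actual scheme. A real implementation would also need to keep names unique across worker processes that share a process name, which this sketch ignores.

```python
import os
import socket
import sys


class QueueNamer:
    """Hypothetical sketch: names depend only on host, process and a counter.

    A restarted service therefore re-creates (and reuses) the same queues
    instead of leaving randomly named ones behind.
    """

    def __init__(self, hostname=None, processname=None):
        self.hostname = hostname or socket.gethostname()
        self.processname = processname or os.path.basename(sys.argv[0])
        self.counter = 0

    def get(self, prefix: str) -> str:
        self.counter += 1
        return f"{prefix}_{self.hostname}:{self.processname}_{self.counter}"


print(QueueNamer("compute-1", "nova-compute").get("reply"))
# reply_compute-1:nova-compute_1
```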

Related-bug: #2031497

Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
Change-Id: I2acdef4e03164fdabcb50fb98a4ac14b1aefda00
2023-11-12 00:08:20 +01:00
Arnaud Morin 989dbb8aad Enable use of quorum queues for transient messages
Add a new flag rabbit_transient_quorum_queue to enable the use of quorum
queues for transient queues (reply_ and _fanout_).

This greatly helps OpenStack services avoid failing because of (and
recover from) a rabbit node issue.

Related-bug: #2031497

Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
Change-Id: Icee5ee6938ca7c9651f281fb835708fc88b8464f
2023-11-12 00:08:20 +01:00
Arnaud Morin 8e3c523fd7 Auto-delete the failed quorum rabbit queues
When rabbit is failing for a specific quorum queue, the only thing to
do is to delete the queue (as per the rabbit documentation, see [1]).

So, to avoid the RPC service being broken until an operator eventually
does a manual fix, catch any INTERNAL ERROR (code 541) and trigger
the deletion of the failed queues under those conditions.
On the next queue declare (triggered by the various retries), the queue
will be created again and the service will recover by itself.
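A minimal sketch of the recovery loop. It uses 541, the AMQP 0-9-1 "internal-error" reply code. `ChannelError`, `FakeChannel` and `declare_with_recovery` are hypothetical stand-ins. The real driver works with kombu/py-amqp exceptions and channels.

```python
INTERNAL_ERROR = 541  # AMQP 0-9-1 reply code for "internal-error"


class ChannelError(Exception):
    """Hypothetical channel error carrying an AMQP reply code."""

    def __init__(self, code, message=""):
        super().__init__(message)
        self.code = code


def declare_with_recovery(channel, queue_name, arguments, retries=2):
    """Declare a queue; on INTERNAL_ERROR, delete it and retry."""
    for attempt in range(retries + 1):
        try:
            return channel.queue_declare(queue_name, arguments=arguments)
        except ChannelError as exc:
            if exc.code != INTERNAL_ERROR or attempt == retries:
                raise
            # The quorum queue is broken: drop it so the next declare
            # re-creates it from scratch.
            channel.queue_delete(queue_name)


class FakeChannel:
    """Fails declares with 541 until the queue is deleted once."""

    def __init__(self):
        self.broken = True
        self.deleted = []

    def queue_declare(self, name, arguments=None):
        if self.broken:
            raise ChannelError(INTERNAL_ERROR, "queue unavailable")
        return name

    def queue_delete(self, name):
        self.deleted.append(name)
        self.broken = False
```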

Closes-Bug: #2028384
Related-bug: #2031497

[1] https://www.rabbitmq.com/quorum-queues.html#availability

Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
Change-Id: Ib8dba833542973091a4e0bf23bb593aca89c5905
2023-11-12 00:08:20 +01:00
Arnaud Morin f23f3276c4 Allow creating transient queues with no expire
When an operator relies on rabbitmq policies, there is no point in setting
the queue TTL in the config.
Moreover, using policies is much simpler, as you don't need to
delete/recreate the queues to apply a new parameter (see [1]).
So, allowing the transient queue TTL to be set to 0 permits the creation
of the queue without the x-expires argument, so that only the policy
applies.
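A minimal sketch of how the declare arguments could be built. It also covers the quorum option for transient queues from the rabbit_transient_quorum_queue change. `x-expires` (in milliseconds) and `x-queue-type` are RabbitMQ queue arguments. The `transient_queue_arguments` helper and its parameters are hypothetical.

```python
def transient_queue_arguments(ttl_seconds: int, quorum: bool = False) -> dict:
    """Build queue-declare arguments for a transient (reply_/fanout) queue.

    A TTL of 0 omits x-expires entirely, leaving expiry to broker policies.
    """
    args = {}
    if ttl_seconds > 0:
        # RabbitMQ expects x-expires in milliseconds.
        args["x-expires"] = ttl_seconds * 1000
    if quorum:
        args["x-queue-type"] = "quorum"
    return args


print(transient_queue_arguments(1800))            # {'x-expires': 1800000}
print(transient_queue_arguments(0))               # {}
print(transient_queue_arguments(0, quorum=True))  # {'x-queue-type': 'quorum'}
```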

[1] https://www.rabbitmq.com/parameters.html#policies

Related-bug: #2031497

Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
Change-Id: I34bad0f6d8ace475c48839adc68a023dd0c380de
2023-11-12 00:08:20 +01:00
Arnaud Morin 3438726dd0 Add some logs when sending RPC messages
Related-bug: #2031497

Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
Change-Id: I7d0c318624d3d02182392ca3f06eed04d4133728
2023-11-12 00:08:11 +01:00
Zuul f455edd601 Merge "Bump bandit and make oslo.messaging compatible with latest rules" 2023-10-20 13:47:33 +00:00
Zuul 38c86a93ad Merge "Set default heartbeat_rate to 3" 2023-10-11 13:29:33 +00:00
Jay Faulkner c1b606f77e Add is_admin to safe fields list for notifications
We encountered bug 2037312 in unit tests when attempting to get this
change rolled out. Heat apparently will attempt to set is_admin using
policy logic if it's not passed in for a new context; this breaks because
the context being requested doesn't have all the information needed to
exercise the policy logic.

is_admin is just a bool and is not sensitive, so the easiest route forward
is to add it to the safe list.

Closes-bug: 2037312
Change-Id: I78b08edfcb8115cddd7de9c6c788c0a57c8218a8
2023-09-25 17:51:32 +00:00
Zuul 3485301b18 Merge "Only allow safe context fields in notifications" 2023-08-17 12:43:26 +00:00
Zuul fa15630041 Merge "Deprecate the amqp1 driver and Remove qpid functional tests" 2023-08-13 10:36:35 +00:00
Jay Faulkner 1b315615e7 Only allow safe context fields in notifications
Publishing a fully hydrated context object in a notification would give
someone with access to that notification the ability to impersonate the
original actor through inclusion of sensitive fields.

Now, instead, we pare down the context object to the bare minimum before
passing it for serialization in notification workflows.
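A minimal sketch of paring a context down to an allow-list before serialization. The `SAFE_FIELDS` set here is a hypothetical example (it includes is_admin, as in the follow-up fix), and `pare_down_context` is not oslo.messaging's function. The real list lives in oslo.messaging and may differ.

```python
# Hypothetical allow-list for illustration only.
SAFE_FIELDS = frozenset({"user_id", "project_id", "request_id", "is_admin"})


def pare_down_context(ctxt: dict) -> dict:
    """Keep only allow-listed keys before a context is serialized."""
    return {k: v for k, v in ctxt.items() if k in SAFE_FIELDS}


full = {
    "user_id": "u1",
    "project_id": "p1",
    "request_id": "req-123",
    "is_admin": False,
    "auth_token": "secret",  # would let a reader impersonate the caller
}
print(pare_down_context(full))
```

An allow-list fails safe: a new sensitive field added to contexts later is dropped by default instead of leaking.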

Related-bug: 2030976
Change-Id: Ic94323658c89df1c1ff32f511ca23502317d0f00
2023-08-11 13:07:54 -07:00
Arnaud Morin 36fb5bceab Set default heartbeat_rate to 3
Kombu recommends running heartbeat_check every second, but we use a lock
around the kombu connection. So, to avoid holding this lock too often when,
most of the time, it does nothing except wait for events to drain, we
run heartbeat_check and retrieve the server heartbeat packet only twice
as often as the minimum required for heartbeats to work:
    heartbeat_timeout / heartbeat_rate / 2.0

Because of this, we are not sending the heartbeat frames at the correct
intervals. For example:

If heartbeat_timeout=60 and rate=2, the AMQP protocol expects a frame to
be sent every 30 sec.

With the current heartbeat_check implementation, heartbeat_check will be
called every:
    heartbeat_timeout / heartbeat_rate / 2.0 = 60 / 2 / 2.0 = 15
Which will result in the following frame flow:
    T+0  --> do nothing (60/2 >= 0)
    T+15 --> do nothing (60/2 >= 15)
    T+30 --> do nothing (60/2 >= 30)
    T+45 --> send a frame (60/2 < 45)
    ...

With heartbeat_rate=3, the heartbeat_check will be executed more often:
    heartbeat_timeout / heartbeat_rate / 2.0 = 60 / 3 / 2.0 = 10
Frame flow:
    T+0  --> do nothing (60/3 >= 0)
    T+10 --> do nothing (60/3 >= 10)
    T+20 --> do nothing (60/3 >= 20)
    T+30 --> send a frame (60/3 < 30)
    ...

Now we are sending the frame at the correct intervals.
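The two frame flows can be reproduced with a small simulation. It models only the scheduling described in this message: a check runs every heartbeat_timeout / rate / 2.0 seconds, and a frame is sent only once the elapsed time strictly exceeds heartbeat_timeout / rate. `first_send_time` is a hypothetical helper, not kombu's code.

```python
def first_send_time(heartbeat_timeout: float, rate: float, horizon: float = 120.0):
    """Return the first check time at which a heartbeat frame is sent."""
    interval = heartbeat_timeout / rate / 2.0  # how often the check runs
    threshold = heartbeat_timeout / rate       # elapsed time needed to send
    t = 0.0
    while t <= horizon:
        if t > threshold:  # strictly greater, as in the frame flows above
            return t
        t += interval
    return None


print(first_send_time(60, 2))  # 45.0 (later than the expected 30 s)
print(first_send_time(60, 3))  # 30.0 (matches the expected 30 s)
```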

Closes-bug: #2008734

Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
Change-Id: Ie646d254faf5e45ba46948212f4c9baf1ba7a1a8
2023-08-08 15:23:59 +02:00
Hervé Beraud ee13e53614 Bump bandit and make oslo.messaging compatible with latest rules
- Apply a timeout to requests calls to avoid uncontrolled
  resource consumption (CWE-400) [1].
- Ignore CWE-377 [2].

[1] https://cwe.mitre.org/data/definitions/400.html
[2] https://cwe.mitre.org/data/definitions/377.html

Change-Id: Ic558ad392424a25b5fd9a10749163d8427159eda
2023-05-17 11:06:34 +02:00
Andrew Bogott 0602d1a10a Increase ACK_REQUEUE_EVERY_SECONDS_MAX to exceed default kombu_reconnect_delay
Previously the two values were the same; this caused us
to always exceed the timeout limit ACK_REQUEUE_EVERY_SECONDS_MAX,
which resulted in various code paths never being traversed
due to premature timeout exceptions.

Also apply min/max values to kombu_reconnect_delay so it doesn't
exceed ACK_REQUEUE_EVERY_SECONDS_MAX and break things again.

Closes-Bug: #1993149
Change-Id: I103d2aa79b4bd2c331810583aeca53e22ee27a49
2023-04-20 15:27:58 -05:00
Arnaud Morin fd2381c723 Disable greenthreads for RabbitDriver "listen" connections
When enabling heartbeat_in_pthread, we were restoring the original
"threading" Python library (instead of eventlet's) in RabbitDriver, but
we forgot to do the same in AMQPDriverBase (RabbitDriver is a subclass
of AMQPDriverBase).

We also need to use the original "queue" module so that queues do not
use greenthreads either.

Related-bug: #1961402
Related-bug: #1934937
Closes-bug: #2009138

Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
Change-Id: I34ea0d1381e934297df2f793e0d2594ef8254f00
2023-03-03 11:24:27 +01:00
Dmitriy Rabotyagov 115cfb5b7c Fix typo in quorum-related variables for RabbitMQ
In [1] a typo was made in variable names. To prevent further
awkwardness regarding variable naming, we fix the typo and publish a
release note for those already using the variables in their deployments.

[1] https://review.opendev.org/c/openstack/oslo.messaging/+/831058

Change-Id: Icc438397c11521f3e5e9721f85aba9095e0831c2
2023-02-14 15:20:00 +00:00
Tobias Urdin 687dea2e65 Support overriding class for get_rpc_* helper functions
We currently do not support overriding the class being
instantiated in the RPC helper functions; this adds that
support so that projects that define their own classes
inheriting from oslo.messaging classes can use the helpers.

For example, neutron utilizes code from neutron-lib that
has its own RPCClient implementation that inherits from
oslo.messaging. In order to use, for example, the
get_rpc_client helper, they need support for overriding
the class being returned. The alternative would be to
modify the internal _manual_load variable, which seems
counter-productive to extending the API provided to
consumers.
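A minimal sketch of a helper with an overridable class. `RPCClient`, `get_rpc_client` and `CustomRPCClient` below are simplified stand-ins, not oslo.messaging's definitions. The `client_cls` parameter name is an assumption; check the oslo.messaging API reference for the real signature.

```python
class RPCClient:
    """Hypothetical base client standing in for oslo.messaging's."""

    def __init__(self, transport, target, **kwargs):
        self.transport = transport
        self.target = target
        self.options = kwargs


def get_rpc_client(transport, target, client_cls=RPCClient, **kwargs):
    """Build an RPC client, letting callers substitute a subclass."""
    if not issubclass(client_cls, RPCClient):
        raise TypeError("client_cls must be a subclass of RPCClient")
    return client_cls(transport, target, **kwargs)


class CustomRPCClient(RPCClient):
    """A library-defined subclass, as a project like neutron-lib might have."""


client = get_rpc_client("transport", "target", client_cls=CustomRPCClient, timeout=5)
print(type(client).__name__)  # CustomRPCClient
```

Keeping construction inside the helper means future logic (such as choosing an async client) can be added in one place without callers instantiating classes directly.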

Change-Id: Ie22f2ee47a4ca3f28a71272ee1ffdb88aaeb7758
2023-01-23 08:40:37 +00:00
Zuul 9f710ce6cd Merge "Remove logging from ProducerConnection._produce_message" 2022-12-21 07:46:22 +00:00
Zuul bd73f14fd2 Merge "Warn when we force creating a non durable exchange" 2022-12-20 20:12:47 +00:00
Zuul 2e81fac973 Merge "Implement get_rpc_client function" 2022-12-01 18:45:46 +00:00
Zuul b3c666ff34 Merge "Force creating non durable control exchange when a precondition failed" 2022-11-16 09:27:05 +00:00
Tobias Urdin 4ead7cb2dc Implement get_rpc_client function
We already expose functions to handle the instantiation
of classes such as RPCServer and RPCTransport, but the
same was never done for RPCClient, so the API is
inconsistent in its enforcement.

This adds a get_rpc_client function that should be used
instead of instantiating the RPCClient class directly, to
be more consistent.

This also allows more logic to be handled inside the function
in the future, for example if an async client is implemented,
as the investigation in [1] has shown.

[1] https://review.opendev.org/c/openstack/oslo.messaging/+/858936

Change-Id: Ia4d1f0497b9e2728bde02f4ff05fdc175ddffe66
2022-10-25 11:42:40 +00:00
Hervé Beraud b83b87d49e Warn when we force creating a non durable exchange
Add warning logs so that users can detect the fallback from durable
to non-durable exchanges.

Change-Id: Iabce0986fae6ed8838f1f94496b5994fc19cc5ef
2022-10-18 14:17:19 +02:00
Hervé Beraud 0f63c227f5 Deprecate the amqp1 driver and Remove qpid functional tests
A recent oslo.messaging patch [1], not yet merged, which aims to update the
test runtime for Antelope, led us to the following error:

```
qdrouterd: Python: ModuleNotFoundError: No module named 'qpid_dispatch'
```

Neither Debian nor Ubuntu in their latest releases has any binary
built for the qpid backend, not even 3rd party. Only qpid proton,
the client lib, is available.

To solve this issue, these changes propose to deprecate the AMQP1 driver,
which is the one based on qpid and proton, and to remove the
related functional tests.

The AMQP1 driver doesn't seem to be widely used.

[1] https://review.opendev.org/c/openstack/oslo.messaging/+/856643

Closes-Bug: 1992587
Change-Id: Id2ca9cd9ee8b8dbdd14dcd00ebd8188d20ea18dc
2022-10-18 11:27:46 +02:00
Zuul 4979f304dd Merge "update hacking pin to support flake8 3.8.3" 2022-08-30 19:45:57 +00:00
Slawek Kaplonski e44f286ebc Change default value of "heartbeat_in_pthread" to False
As was reported in the related bug some time ago, setting that
option to True for nova-compute can break it, as it's a non-wsgi service.
We also noticed the same problems with non-wsgi services, such as
neutron agents, randomly getting stuck, and the same issue can probably
happen with any other non-wsgi service.

To avoid that, this patch changes the default value of that config option
to False.
Together with [1], it effectively reverts the change made in [2] some time
ago.

[1] https://review.opendev.org/c/openstack/oslo.messaging/+/800621
[2] https://review.opendev.org/c/openstack/oslo.messaging/+/747395

Related-Bug: #1934937
Closes-Bug: #1961402

Change-Id: I85f5b9d1b5d15ad61a9fcd6e25925b7eeb8bf6e7
2022-08-16 14:14:29 +00:00
Guillaume Espanel 43f2224aac Remove logging from ProducerConnection._produce_message
In impl_kafka, _produce_message is run in a tpool.execute
context but it was also calling logging functions.
This could cause subsequent calls to logging functions to
deadlock.

This patch moves the logging calls out of the tpool.execute scope.
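A minimal sketch of the structural change: the offloaded function does only blocking work, and all logging happens in the caller. To stay within the standard library, this swaps eventlet's `tpool.execute` for `concurrent.futures.ThreadPoolExecutor`, so it shows the shape of the fix but does not reproduce the eventlet deadlock. `produce` and `_produce_message` are hypothetical.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

LOG = logging.getLogger(__name__)
_pool = ThreadPoolExecutor(max_workers=1)


def _produce_message(topic: str, payload: bytes) -> int:
    # Runs in the worker thread: blocking work only, no logging calls.
    return len(payload)  # stand-in for a blocking producer call


def produce(topic: str, payload: bytes) -> int:
    """Offload the blocking call, and log only from the calling thread."""
    LOG.debug("Producing message to topic %s", topic)
    size = _pool.submit(_produce_message, topic, payload).result()
    LOG.debug("Produced %d bytes to topic %s", size, topic)
    return size
```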

Change-Id: I81167eea0a6b1a43a88baa3bc383af684f4b1345
Closes-bug: #1981093
2022-08-03 17:35:16 +02:00
Zuul 4186386748 Merge "Add quorum queue control configurations" 2022-06-13 17:14:16 +00:00
Sean Mooney cde68026eb update hacking pin to support flake8 3.8.3
This change updates the max version of hacking
to 4.1.0 to allow pre-commit to work with the
flake8 3.8.3 release, and corrects one new error that was
raised as a result.

Change-Id: I3a0242208f411b430db0e7429e2c773f45b3d301
2022-05-23 14:39:56 +00:00
Zuul ca498b61c0 Merge "Add EXTERNAL as rabbit login method" 2022-04-27 15:16:44 +00:00
Zuul 64888bd05a Merge "Add a new option to enforce the OpenSSL FIPS mode" 2022-04-26 14:15:36 +00:00
hamza alqtaishat 821197b947 Add EXTERNAL as rabbit login method
As explained in the link below, kombu has a login method called EXTERNAL:
https://docs.celeryq.dev/projects/kombu/en/latest/_modules/kombu/connection.html

The EXTERNAL login method is not listed as a choice in the Rabbit driver.

As explained in the RabbitMQ documentation
https://www.rabbitmq.com/access-control.html
for authentication using client TLS (x.509) certificate data,
clients must be configured to use the EXTERNAL mechanism.

Closes-Bug: #1970276
Change-Id: I5c38d3a3cafd49f8abc031e36bc595f32a8631d2
2022-04-25 22:33:12 +00:00
hamza alqtaishat 8932ad237b Add quorum queue control configurations
The quorum queue type adds features that did not exist before or were not
handled in rabbitmq; the following link shows some of them:
https://blog.rabbitmq.com/posts/2020/04/rabbitmq-gets-an-ha-upgrade/

The options below control the quorum queue and ensure the stability of
the quorum system:
x-max-in-memory-length
x-max-in-memory-bytes
x-delivery-limit

They control memory usage and handle message poisoning.
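A minimal sketch of turning such settings into queue-declare arguments. The argument names are the RabbitMQ ones listed above. The `quorum_queue_arguments` helper and the convention that 0 means "not set" are assumptions of this sketch.

```python
def quorum_queue_arguments(max_in_memory_length: int = 0,
                           max_in_memory_bytes: int = 0,
                           delivery_limit: int = 0) -> dict:
    """Build declare arguments for a quorum queue; 0 means "not set"."""
    args = {"x-queue-type": "quorum"}
    if max_in_memory_length > 0:
        args["x-max-in-memory-length"] = max_in_memory_length
    if max_in_memory_bytes > 0:
        args["x-max-in-memory-bytes"] = max_in_memory_bytes
    if delivery_limit > 0:
        # Messages redelivered more than this many times are dropped (or
        # dead-lettered), which stops a "poison" message from looping.
        args["x-delivery-limit"] = delivery_limit
    return args


print(quorum_queue_arguments(delivery_limit=5))
# {'x-queue-type': 'quorum', 'x-delivery-limit': 5}
```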

Closes-Bug: #1962348
Change-Id: I570227d6102681f4f9d8813ed0d7693a1160c21d
2022-04-06 19:46:40 +00:00