Presently novajoin has no way of differentiating between hosts and
hostnames. As a result, it is possible for a host to be inadvertently
deleted in certain conditions.
This fix aims to resolve this and other join/delete edge cases by
passing the instance-id (server uuid) from nova along in the
description field sent to IdM. We can then use this description
and id to ensure we delete only the hosts we meant to.
Overview of changes:
- Persist nova instance-id in IdM's Description field
- Update join logic to handle hosts with old Description field
- Update join logic to cause nova deploy failure when attempting to
add a host with a hostname that is already enrolled
- Add new DuplicateInstanceError exception type
- Add new DeleteInstanceIdMismatch exception type
- Add inline comments documenting code flow
- Add IPAClient add_host docstrings for clarity
Change-Id: I676bac162a6ec35366c506bdb660cf3913131afd
We are having a hard time keeping track of which operations
correspond to which request. This patch adds the ability to track
operations in the notifier with the message_id of the notification
being processed. This message_id (which is generated by oslo) is
a uuid.
For the server, we could also set the message_id to the request_id
of the python-requests object received, but this is already
logged as part of the server logs.
Change-Id: Ie8b885a2b5cba6684e92c49eed4a99d24621402e
Debugging is confusing when the same names are used for methods
in two different controllers. Fixing this to more accurately
reflect what's going on.
Change-Id: I3740cd3ae81776cb1ecf066e617e615d880dc2e8
Right now, the backoff mechanism is broken when the backoff is
set to something non-zero: you end up in a state where you
retry ad infinitum, leading to inconsistent behavior.
This change fixes the mechanism so that you only get a fixed number
of retries. You can choose (through a new config parameter) to allow
backoff (or not).
To restore some of the old behavior, the default for the connect_retries
parameter has been increased from 2 to 4, and the max backoff time has
been decreased from 1024 to 512 seconds. It's unlikely that we'd ever
reach that backoff time without a large number of retries, but 1024
seems too long.
There is also a new exception that is thrown when the connection
fails. This will result in clean 500 errors in the novajoin-server,
and some log messages for the notifier.
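A minimal sketch of the bounded retry behavior described above. The
names here (call_with_retries, IPAConnectionError, connect_retries,
backoff_enabled, MAX_BACKOFF) are illustrative, not novajoin's actual
identifiers or configuration options:

```python
import time

MAX_BACKOFF = 512  # seconds; illustrative cap matching the new default


class IPAConnectionError(Exception):
    """Raised once all retries are exhausted (hypothetical name)."""


def call_with_retries(func, connect_retries=4, backoff_enabled=True):
    # Retry a fixed number of times instead of looping forever; the
    # exponential backoff between attempts is optional.
    delay = 2
    for attempt in range(connect_retries + 1):
        try:
            return func()
        except IOError:
            if attempt == connect_retries:
                # Retries exhausted: surface an error (mapped to a 500
                # in the server, a log message in the notifier).
                raise IPAConnectionError("IPA connection failed")
            if backoff_enabled:
                time.sleep(min(delay, MAX_BACKOFF))
                delay *= 2
```

The key property is that the loop always terminates: with backoff
disabled the call is simply attempted connect_retries + 1 times.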
Change-Id: I10547fbde8966c8694346ed8c054e627bee2ee51
This patch adds logic to handle compact service metadata that
has been split into multiple lines to avoid hitting the metadata
size limit.
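As a rough illustration of the reassembly (the numbered key-naming
scheme and the function name below are assumptions for this sketch,
not necessarily novajoin's actual metadata format):

```python
import json


def collect_compact_services(metadata):
    # Hypothetical reassembly: a value that exceeded the metadata size
    # limit is assumed to have been split into numbered keys
    # (compact_services_0, compact_services_1, ...) which are joined
    # back together before JSON decoding. Lexicographic sort keeps
    # order correct for fewer than ten chunks.
    parts = sorted(
        (key, value) for key, value in metadata.items()
        if key.startswith('compact_services')
    )
    blob = ''.join(value for _, value in parts)
    return json.loads(blob) if blob else None
```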
Co-Authored-By: Grzegorz Grasza <xek@redhat.com>
Change-Id: Ida39f5768c67f982b2fe316f6fae4988a74c8534
Support nova versioned notifications. Unversioned notifications
are still supported and remain the default. The CI is configured to test
versioned notifications, and both implementations use the same methods.
Because of this, testing versioned notifications also covers
unversioned notifications, since the execution path flows through both.
Change-Id: If028afa9e9fbcb344786cd287605e0d9af5d3c01
This adds support for creating and removing DNS A records when
floating IPs are associated and disassociated in neutron.
novajoin-install and functional tests are enhanced to test it.
Change-Id: I82c83ad9e8c84ddfd4ecfc4d5c3b31a418af97a7
This implements adding additional services via the metadata interface by
reacting to the compute.instance.update notifications. This effectively
covers the case where an already enrolled node with some services
has new services added.
Note that this still requires folks to remove services manually if
they're no longer used.
Another important thing to note is that this doesn't yet cover updates
from non-enrolled deployments to enrolling them and adding services.
Related-Bug: #1715295
Change-Id: I48ab94a184657f6730281740935a05143abbc499
DeprecationWarning: Using function/method 'oslo_messaging.get_transport()'
is deprecated: use get_rpc_transport or get_notification_transport
Change-Id: I6d940c89a2dc580996a3f4dd308c483b0e43589b
Notifications can be multi-threaded, but each thread was sharing
the same IPAClient instance, which was causing contention in the
retry code (and likely the ccache).
Move the IPA object closer to execution so each notification will
have its own instance. This will mean more kinit activity but
each one will be isolated and be able to handle expired tickets,
IPA being down, network issues, etc.
Add backoff code on failures so we don't spam the IPA server with
retries. It is a doubling backoff from 2 to 1024 seconds.
NOTE: novajoin-server will not use this backoff code. It will
continue to use the retries configuration setting. This is because
the server has a limited window in which to respond, so unlike the
notifier it cannot retry indefinitely.
Change-Id: Ia18d3f97f7549c89dcf4e6f014f44c3fcebc919f
Check ipa_enroll in the instance and image metadata to see if
enrollment was requested before attempting to delete a host
from IPA.
Change-Id: Iccc833db47da09b97a16c02e5b184b8f9e1d97d1
If we have our own queue, we don't need to requeue the notifications. On
the other hand, requeuing would then lead to a lot of messages lingering
on the queue, which we would need to clean up manually. Consuming the
messages without requeuing saves us from this issue.
Change-Id: I31af8f74e2115e1fe3a01e5ae55e050c1f24704e
Using the default notifications topic results in race conditions with
other services that consume these, where this will sporadically work
depending on who gets the notification first. Making this configurable
allows us to avoid these issues by having a novajoin specific queue.
Change-Id: I86b2a9df317e8f9a877d7619a2918e50d58b7c85
These methods are used to decide whether a given host or service
can be removed or not. We want to avoid deleting hosts that
manage other services or services being managed by other hosts.
Change-Id: I45f3d49e53a3f1bdeed149ae0790d820c87a9a58
If compact_services is not set in the instance metadata
then there is no need to call handle_compact_services()
Change-Id: I4c7d098cd2ced4f7903fb048122a924304585347
This will use the IPA domain by default. If a domain is specified
then it will be used.
For testing, if there is no config option and IPA is not available
then the domain 'test' will be mocked.
Change-Id: I21b66c61830c49a79094927f42c22c76ec1b447c
This was requested by the puppet team for inclusion in the
puppet-nova module. It does make a certain amount of sense for
all nova-type configuration to exist in a single place.
oslo.config isn't really made to handle this type of thing. If
project='nova' and prog='join' it would still find
/etc/nova/nova.conf first and try to load that instead. So I
had to duplicate some of the oslo.config code and hard code
nova and join in order to simulate the previous behavior but
work for my purposes.
The original location, /etc/join/join.conf, is the final
fallback to support any existing installations.
flake8 is quite a bit more picky and discovered a lot of
issues.
Don't let missing configuration blow things up, so that unit
tests can run.
Hacky workaround for ipalib/ipapython being missing from PyPI
The sqlite cache wouldn't work in an HA system, as one can never
tell which controller is going to receive the request. This would
require a shared or replicating database backend, which is simply
overkill for this.
Instead, let the last request win. With each request for metadata,
generate a new OTP and update the value in the host record in
IPA.
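A sketch of the last-request-wins idea, where modify_host stands in
for a hypothetical wrapper around IPA's host-mod command (not the
actual novajoin IPAClient API):

```python
import uuid


def refresh_otp(ipa_client, hostname):
    # Last-request-wins: instead of caching OTPs in sqlite (which
    # breaks under HA, since any controller may serve the request),
    # mint a fresh one-time password on every metadata request and
    # store it on the IPA host entry, overwriting any previous value.
    otp = uuid.uuid4().hex
    ipa_client.modify_host(hostname, otp=otp)
    return otp
```

Whichever controller handled the most recent metadata request leaves
the OTP that the instance will actually use to enroll.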