After reviewing reports of multiple CCache files cropping up in logs,
we found an issue in the way novajoin initializes and updates
cache files containing keytabs. The result was numerous extra cache
files being created and overwritten.
With this change we ensure that the credentials cache is properly
shared across workers and that when new credentials are being
created, the cache files are locked to avoid potential conflicts.
Updates DEBUG level logging to include useful cache troubleshooting
breadcrumbs.
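A minimal sketch of the locking idea, not the actual novajoin code.
It assumes a POSIX host (fcntl), and the helper names locked() and
refresh_ccache() as well as the ".lock" sidecar file are hypothetical:

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def locked(path):
    """Hold an exclusive advisory lock tied to `path` for the block."""
    # Hypothetical convention: lock a sidecar file so the lock survives
    # the cache file itself being rewritten.
    with open(path + ".lock", "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def refresh_ccache(path, fetch_credentials):
    """Write fresh credentials to `path` while holding the lock."""
    with locked(path):
        data = fetch_credentials()
        with open(path, "wb") as cache:
            cache.write(data)
    return path
```

Workers that share one cache path would then serialize on the same
lock instead of each creating its own cache file.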
Change-Id: I07e0004f77e0d52ab2a2707c5fe50f48f718b717
Co-Authored-By: Ade Lee <alee@redhat.com>
python-nss does not exist (and is not needed) in RHEL8.
We need to conditionally import nss to avoid errors in RHEL8.
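A conditional import of this kind can be sketched as follows. The
nss_available() helper is hypothetical and only illustrates how
callers can check for the optional module:

```python
try:
    import nss.nss as nss  # shipped by python-nss where it is packaged
except ImportError:
    nss = None  # e.g. RHEL8, where python-nss does not exist


def nss_available():
    """Report whether the optional nss module could be imported."""
    return nss is not None
```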
Change-Id: I699fbfab4c2106f24260c99905b1bd40a8e683a8
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1758771
Presently, when novajoin fails to make a connection with the IPA
server, for any reason, it will immediately re-attempt to make
the connection when the backoff is unset (it is off by default).
As a result, any timing-related issue that is the source of the
connection failure will likely result in no connection at all.
This change adds a new configuration option, retry_delay, which
will halt subsequent connection attempts for N seconds where N
is the retry_delay. By default this is set to 5 seconds, mirroring
internal ipalib behavior[1].
[1] - https://github.com/freeipa/freeipa/blob/master/ipalib/install/kinit.py#L29-L30
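A rough sketch of the retry_delay behaviour described above. The
function name, the retries parameter and the injectable sleep are
assumptions made for illustration; only retry_delay and its 5 second
default come from this change:

```python
import time


def connect_with_retry(connect, retries=4, retry_delay=5, sleep=time.sleep):
    """Call `connect` until it succeeds, waiting between attempts."""
    last_error = None
    for attempt in range(retries):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            # Wait before the next attempt, but not after the last one.
            if attempt < retries - 1:
                sleep(retry_delay)
    if last_error is None:
        raise ValueError("retries must be at least 1")
    raise last_error
```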
Change-Id: Iec96e4bd6643c0a657c8db424cc72deb10f170bd
Presently novajoin has no way of differentiating between hosts and
hostnames. As a result, it is possible for a host to be inadvertently
deleted in certain conditions.
This fix aims to resolve this and other join/delete edge cases by
passing the instance-id (server uuid) from nova along in the
description field that is passed to IdM. We can use this
description and id to ensure we delete only the hosts we meant to.
Overview of changes:
- Persist nova instance-id in IdM's Description field
- Update join logic to handle hosts with old Description field
- Update join logic to cause nova deploy failure when attempting to
add a host with a hostname that is already enrolled
- Add new DuplicateInstanceError exception type
- Add new DeleteInstanceIdMismatch exception type
- Add inline comments documenting code flow
- Add IPAClient add_host docstrings for clarity
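The delete-side check can be sketched roughly as below. This is an
illustration only: the JSON layout of the Description field, the
"instance-id" key and the helper names are assumptions, as is the
choice to let deletes through for legacy descriptions that carry no
id. The exception class is a stand-in for the new exception type:

```python
import json


class DeleteInstanceIdMismatch(Exception):
    """Stand-in for the new exception type named above."""


def stored_instance_id(description):
    """Return the instance-id recorded in a description, or None."""
    try:
        data = json.loads(description)
    except (TypeError, ValueError):
        return None  # old-style free-text description
    if isinstance(data, dict):
        return data.get("instance-id")
    return None


def ensure_deletable(description, instance_id):
    """Refuse to delete a host enrolled by a different instance."""
    stored = stored_instance_id(description)
    # Assumption: hosts without a recorded id are treated as deletable.
    if stored is not None and stored != instance_id:
        raise DeleteInstanceIdMismatch(
            "host belongs to instance %s, not %s" % (stored, instance_id))
    return True
```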
Change-Id: I676bac162a6ec35366c506bdb660cf3913131afd
While debugging nova-compute logs, it was noted that error messages
were not being populated; only the fault_name was. Update the
response we hand back to Nova to contain the message within the
'error' key of the returned object.
Change-Id: I2e0f415a512e53261b1e366cd75b310dd06eec27
Exception message had an incorrect string format that would result
in a TypeError being raised if/when this exception was caught.
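For illustration, this is the general class of bug. The message text
and function names are hypothetical, not the ones from the patch:

```python
def broken_message(hostname):
    # Two placeholders but only one value: evaluating this raises
    # TypeError instead of producing the intended message.
    return "Unable to enroll %s: %s" % hostname


def fixed_message(hostname, reason):
    # Named placeholders make a missing value easier to spot.
    return "Unable to enroll %(hostname)s: %(reason)s" % {
        "hostname": hostname, "reason": reason}
```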
Change-Id: I676631d79394e512371a8367f84b91761e983faa
* Fix cloud-init error message when OTP is missing
* Add a log message in novajoin-server
Change-Id: Ib299269c564744af6a5fcded9195d27be1c14ce7
Related-Bug: 1836529
We are having a hard time keeping track of which operations
correspond to which request. This patch adds the ability to track
operations in the notifier with the message_id of the notification
being processed. This message_id (which is generated by oslo) is
a UUID.
For the server, we could also set the message_id to the request_id
of the python-requests object received, but this is already
logged as part of the server logs.
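One way to attach such an id to every log line is a standard library
LoggerAdapter. This is a sketch under assumptions: the adapter class,
the logger name and the idea that the id arrives in a plain dict are
all hypothetical and may differ from what the notifier does:

```python
import logging


class MessageIdAdapter(logging.LoggerAdapter):
    """Prefix each log message with the notification's message_id."""

    def process(self, msg, kwargs):
        return "[%s] %s" % (self.extra["message_id"], msg), kwargs


def logger_for(metadata, base=None):
    """Build a logger bound to the message_id found in `metadata`."""
    base = base or logging.getLogger("novajoin.notifier")
    return MessageIdAdapter(
        base, {"message_id": metadata.get("message_id", "-")})
```

With this, logger_for(metadata).info("deleting host") would emit a
line starting with the id in square brackets, so operations can be
grouped per notification.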
Change-Id: Ie8b885a2b5cba6684e92c49eed4a99d24621402e
Debugging is confusing when the same names are used for methods
in two different controllers. Fixing this to more accurately
reflect what's going on.
Change-Id: I3740cd3ae81776cb1ecf066e617e615d880dc2e8
Right now, the backoff mechanism is broken when the backoff is
set to something non-zero. Basically, you go into this state where
you retry ad infinitum, leading to inconsistent behavior.
This change fixes the mechanism so that you only get a fixed number
of retries. You can choose (through a new config parameter) to allow
backoff (or not).
To restore some of the old behavior, the default for the connect_retries
parameter has been increased from 2 to 4, and the max backoff time has
been decreased from 1024 to 512 seconds. It's unlikely that we'd ever
reach that backoff time without a large number of retries, but 1024
seems too long.
There is also a new exception that is raised when the connection
fails. This will result in nice 500 errors in the novajoin-server,
and some log messages for the notifier.
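The bounded schedule can be pictured with the sketch below. The
doubling from a 1 second start and the function and parameter names
are assumptions; only the retry count of 4 and the 512 second cap
come from this change:

```python
def backoff_delays(retries=4, max_backoff=512, use_backoff=True):
    """Return the wait (in seconds) applied before each retry."""
    delays = []
    delay = 1  # assumed starting value
    for _ in range(retries):
        delays.append(delay if use_backoff else 0)
        # Double the wait each time, but never exceed the cap.
        delay = min(delay * 2, max_backoff)
    return delays
```

Because the list has exactly `retries` entries, the loop consuming it
terminates instead of retrying forever.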
Change-Id: I10547fbde8966c8694346ed8c054e627bee2ee51
We get an instance ID directly from nova, which calls our API;
consequently we don't need to call back to nova to double check
if the instance ID really exists.
Additionally, defer calling keystone and glance APIs to the moment
that the retrieved objects are really needed.
Change-Id: I64a20c88229490690798aaf75ca0d96d98032467
In TripleO and devstack alike, service users are part of the "service"
project, but TripleO doesn't have a "service" role. So let's depend on
the project to enforce policy. This way it will still work out of the
box with TripleO.
Change-Id: I01cf7b38904bb0311658348dcdc0b0efd4f36c0e
Closes-Bug: #1812844
* Add default policy for handling the create request.
* Allow it to be accessed only by nova service.
* Remove unused code copied from cinder.
Change-Id: Ieaa407f27c6774d1fd17850a9571de5554360bae
This fixes ModuleNotFoundError: No module named 'StringIO',
raised in Python 3 functional tests. We also patch paramiko
on Python 3, since we use it in functional tests.
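A common way to make such an import work on both interpreters is
shown below. This is a generic pattern and not necessarily the one
used in this patch:

```python
try:
    from StringIO import StringIO  # Python 2 module name
except ImportError:
    from io import StringIO  # Python 3 location


def read_back(text):
    """Round-trip `text` through an in-memory text buffer."""
    return StringIO(text).read()
```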
Change-Id: I357dd9c3ec7c0a76d31b7f94ec0e844d9bdcb5c5
This patch adds logic to handle compact service metadata that
has been split into multiple lines to avoid hitting the metadata
size limit.
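A sketch of how the split entries could be merged back together. The
key names "compact_services" and "compact_service_<NAME>" and the JSON
encoding of the values are assumptions made for this example:

```python
import json

SPLIT_PREFIX = "compact_service_"  # assumed per-service key prefix


def collect_compact_services(metadata):
    """Return {service: [networks]} from single-key or split metadata."""
    if "compact_services" in metadata:
        return json.loads(metadata["compact_services"])
    # Split form: one metadata key per service, each holding a JSON list.
    return {
        key[len(SPLIT_PREFIX):]: json.loads(value)
        for key, value in metadata.items()
        if key.startswith(SPLIT_PREFIX)
    }
```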
Co-Authored-By: Grzegorz Grasza <xek@redhat.com>
Change-Id: Ida39f5768c67f982b2fe316f6fae4988a74c8534
In CI we get a random ResponseNotReady exception,
which is caused by the server closing the keepalive socket.
This will close and retry the connection.
This patch adds this reconnect in a second place that was missed.
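The close-and-retry pattern looks roughly like this with the standard
http.client module. The helper name and the retries parameter are
hypothetical, and the real code may use a different client:

```python
import http.client


def request_with_reconnect(conn, method, url, retries=1):
    """Send a request, reconnecting if the keepalive socket went stale."""
    for attempt in range(retries + 1):
        try:
            conn.request(method, url)
            return conn.getresponse()
        except http.client.ResponseNotReady:
            # Drop the stale socket; HTTPConnection reopens it on the
            # next request() call.
            conn.close()
            if attempt == retries:
                raise
```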
Change-Id: I745aea8dcb51598ca7d7a371dce66c7dd6ae8005
This patch also moves the novajoin-install and novajoin-ipa-setup
scripts to the default python scripts directory. This is because
there is no other way to fixup the #! line for python3, apart from
modifying setup.py, which is managed by the global requirements repo.
Change-Id: I21ccb475905feebdb91aa158ce3845744b2f0a5f
Support nova versioned notifications. Unversioned notifications
are still supported and the default. The CI is configured to test
versioned notifications, and both implementations use the same methods.
Because of this, testing versioned notifications also covers
unversioned notifications, since the execution path flows through both.
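As an illustration of how both formats can feed the same methods, the
sketch below normalizes the two payload shapes. The field names
follow nova's notification formats as best understood here and should
be treated as assumptions; the function name is hypothetical:

```python
def extract_instance(event_type, payload):
    """Return (instance_id, hostname) for either notification format."""
    if event_type.startswith("compute."):
        # Unversioned, e.g. "compute.instance.create.end"
        return payload["instance_id"], payload["hostname"]
    # Versioned, e.g. "instance.create.end": fields are wrapped in
    # "nova_object.data".
    data = payload["nova_object.data"]
    return data["uuid"], data["host_name"]
```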
Change-Id: If028afa9e9fbcb344786cd287605e0d9af5d3c01
This adds support for creating and removing DNS A records when
floating IPs are associated and disassociated in neutron.
novajoin-install and functional tests are enhanced to test it.
Change-Id: I82c83ad9e8c84ddfd4ecfc4d5c3b31a418af97a7
A basic test to check that a spawned instance
will be added to and then deleted from FreeIPA.
This also fixes the novajoin-install script to
work by default on devstack.
Change-Id: Id7e940360ade74d605fef9004c6a5454790c55a4
In CI we get a random ResponseNotReady exception,
which is caused by the server closing the keepalive socket.
This will close and retry the connection.
Change-Id: I28e51450cbfea8bf7a18e5783355b68f806eb999
Since novajoin integrates Nova with FreeIPA, functional tests
won't be able to run without FreeIPA. Therefore, we want to run
integration tests together with functional tests.
Change-Id: I93a3ef03b8bf2141710602fd8ba5f01098767fe3
In freeipa f62a0fdb904d2a4bb1961847e240dbb6df3b0b67 the IPA
client library was modified to remove the log_manager. This patch
fixes the novajoin code for all versions of IPA.
See rhbz#1644747
Co-Authored-By: Juan Antonio Osorio Robles <jaosorior@redhat.com>
Co-Authored-By: Grzegorz Grasza <xek@redhat.com>
Change-Id: I2da12bedfc8790ebd1005c98f2e05953d127b3b9
Change: Id107000b3a667f5724331e281912560cff6f92f0 implemented
caching in the IPAClient. We need to store the OTP in the cache
and return the cached OTP, not the one generated on the join
request in case there is a cache hit, since we do not update
the OTP in FreeIPA when the host is in the cache.
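In sketch form, with a hypothetical OtpCache class and an enroll
callback standing in for the FreeIPA call:

```python
class OtpCache:
    """Remember the OTP that was actually registered for each host."""

    def __init__(self):
        self._otps = {}

    def add_host(self, hostname, otp, enroll):
        cached = self._otps.get(hostname)
        if cached is not None:
            # Cache hit: FreeIPA still holds the old OTP, so hand that
            # one back and ignore the freshly generated value.
            return cached
        enroll(hostname, otp)
        self._otps[hostname] = otp
        return otp
```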
Closes-Bug: #1796415
Change-Id: Ic19ee7c2228d275397bc4be04432126fd2f228ec
This adds two caches: one for hosts and another one for services. The
service cache also contains which hosts are managing the service.
This was done in order to reduce the calls to FreeIPA and to try to make
novajoin slightly more efficient.
Note that this was only added to the "add" functions, and the delete
functions merely update the cache. This is because checking for hosts
managing a group would require the cache to be consistent between all
the processes (and novajoin could be run in several), and for this the
best thing would be to use a distributed cache. This being the first
attempt, we leave this functionality out of the scope for this patch.
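The service cache can be pictured as a per-process mapping from
service principal to the hosts managing it. The class, method and
callback names below are hypothetical stand-ins for the IPAClient
calls:

```python
class ServiceCache:
    """Per-process cache of services and the hosts that manage them."""

    def __init__(self, add_service, add_manager):
        self._add_service = add_service
        self._add_manager = add_manager
        self._managed_by = {}  # principal -> set of hostnames

    def add_service_host(self, principal, hostname):
        hosts = self._managed_by.get(principal)
        if hosts is None:
            self._add_service(principal)
            hosts = self._managed_by[principal] = set()
        if hostname not in hosts:
            self._add_manager(principal, hostname)
            hosts.add(hostname)

    def forget_host(self, hostname):
        # Deletes only update the cache; they never skip a remote call.
        for hosts in self._managed_by.values():
            hosts.discard(hostname)
```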
Change-Id: Id107000b3a667f5724331e281912560cff6f92f0
This implements adding additional services via the metadata interface by
reacting on the compute.instance.update notifications. This effectively
covers updates from already enrolled nodes with some services towards
adding new services.
Note that this still requires folks to remove services manually if
they're no longer used.
Another important thing to note is that this doesn't yet cover updates
from non-enrolled deployments to enrolling them and adding services.
Related-Bug: #1715295
Change-Id: I48ab94a184657f6730281740935a05143abbc499