Commit Graph

46 Commits

Author SHA1 Message Date
Harry Rybacki a71617627a Add locks to cache and cleanup kinit logic
After reviewing reports of multiple CCache cropping up in logs, we
found an issue in the way novajoin is initiating and updating
cache files containing keytabs. The result was numerous extra cache
files being created and overwritten.

With this change we ensure that the credentials cache is properly
shared across workers and that when new credentials are being
created, the cache files are locked to avoid potential conflicts.

Updates DEBUG level logging to include useful cache troubleshooting
breadcrumbs.

Change-Id: I07e0004f77e0d52ab2a2707c5fe50f48f718b717
Co-Authored-By: Ade Lee <alee@redhat.com>
2019-12-09 13:21:15 -05:00
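
The locking described in this commit can be sketched roughly as follows. This is a minimal illustration, not novajoin's code: it assumes a POSIX host, and `locked_cache`, `refresh_cache` and the `.lock` file naming are hypothetical names chosen for the example.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def locked_cache(ccache_path):
    """Hold an exclusive lock while the credentials cache is rewritten."""
    lock_path = ccache_path + ".lock"  # hypothetical lock-file naming
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield ccache_path
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def refresh_cache(ccache_path, new_contents):
    """Stand-in for the kinit step: replace the cache file under the lock."""
    with locked_cache(ccache_path) as path:
        with open(path, "w") as cache:
            cache.write(new_contents)
```

Workers that all go through `refresh_cache` with the same path take turns rewriting one shared file instead of each creating its own.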
Harry Rybacki c9299d5c37 Add configurable delay to connection retry attempts with IPA
Presently, when novajoin fails to make a connection with the IPA
server, for any reason, it will immediately re-attempt to make
the connection when the backoff is unset (it is off by default).
As a result, any timing-related issue that is the source of the
connection problem will likely result in no connection at all.

This change adds a new configuration option, retry_delay, which
will halt subsequent connection attempts for N seconds where N
is the retry_delay. By default this is set to 5 seconds, mirroring
internal ipalib behavior[1].

[1] - https://github.com/freeipa/freeipa/blob/master/ipalib/install/kinit.py#L29-L30

Change-Id: Iec96e4bd6643c0a657c8db424cc72deb10f170bd
2019-10-21 10:21:51 -04:00
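
A fixed delay between attempts, as this commit describes, might look like the sketch below. Only the `retry_delay` name and its 5 second default come from the commit message; `connect_with_retries`, the `connect` callable, the `retries` default and the injectable `sleep` are assumptions made for illustration.

```python
import time


def connect_with_retries(connect, retries=4, retry_delay=5, sleep=time.sleep):
    """Call connect() up to `retries` times, pausing retry_delay seconds between tries."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt < retries:
                sleep(retry_delay)  # wait before the next attempt
    raise last_error
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.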
Harry Rybacki 4f1353bb1f Handle add_host and delete_host cases more robustly
Presently novajoin has no way of differentiating between hosts and
hostnames. As a result, it is possible for a host to be inadvertently
deleted in certain conditions.

This fix aims to resolve this and other join/delete edge cases by
passing the instance-id (server uuid) from nova along in the
description field that is passed to IdM. We can use this
description and id to ensure we delete only the hosts we meant to.

Overview of changes:
- Persist nova instance-id in IdM's Description field
- Update join logic to handle hosts with old Description field
- Update join logic to cause nova deploy failure when attempting to
  add a host with a hostname that is already enrolled
- Add new DuplicateInstanceError exception type
- Add new DeleteInstanceIdMismatch exception type
- Add inline comments documenting code flow
- Update IPAClient add_host docstrings for clarity

Change-Id: I676bac162a6ec35366c506bdb660cf3913131afd
2019-09-12 09:26:39 -04:00
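
The delete-side check described above could be sketched like this. The exception name comes from the commit message; everything else is an assumption. In particular, the `instance-id: <uuid>` marker is a hypothetical Description format, and letting old-style descriptions through is a guess at one reasonable policy, not necessarily what novajoin does.

```python
import re

# Hypothetical marker; the real Description format used by novajoin is not shown here.
_INSTANCE_ID = re.compile(r"instance-id: (?P<id>[0-9a-f-]{36})")


class DeleteInstanceIdMismatch(Exception):
    """The host is enrolled under a different nova instance-id."""


def stored_instance_id(description):
    """Return the instance-id embedded in a description, or None for old-style text."""
    match = _INSTANCE_ID.search(description or "")
    return match.group("id") if match else None


def ensure_can_delete(description, instance_id):
    """Raise unless the host's recorded instance-id matches the one being deleted."""
    recorded = stored_instance_id(description)
    # Assumption: descriptions written before this change carry no id and are let through.
    if recorded is not None and recorded != instance_id:
        raise DeleteInstanceIdMismatch(
            "host belongs to %s, not %s" % (recorded, instance_id))
```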
Zuul 0bcb4ec41b Merge "Fix error message when OTP is missing, add logging" 2019-09-11 20:32:33 +00:00
Harry Rybacki 5b6b8607ec Log statement missing string replacement arg
Include missing arg

Change-Id: Ic494e58fc3b1f74f574e0dd8255ddc36ad2249c9
2019-09-05 15:46:40 -04:00
Grzegorz Grasza 956cc87cc1 Fix error message when OTP is missing, add logging
* Fix cloud-init error message when OTP is missing
* Add a log message in novajoin-server

Change-Id: Ib299269c564744af6a5fcded9195d27be1c14ce7
Related-Bug: 1836529
2019-08-30 14:57:13 +00:00
Ade Lee ade787b90c Add debug messages
We are having a hard time keeping track of which operations
correspond to which request.  This patch adds the ability to track
operations in the notifier with the message_id of the notification
being processed.  This message_id (which is generated by oslo) is
a uuid.

For the server, we could also set the message_id to the request_id
of the python-requests object received, but this is already
logged as part of the server logs.

Change-Id: Ie8b885a2b5cba6684e92c49eed4a99d24621402e
2019-08-22 07:54:17 +00:00
Ade Lee 6ed30c9476 Fix backoff mechanism
Right now, the backoff mechanism is broken when the backoff is
set to something non-zero.  Basically, you go into this state where
you retry ad infinitum, leading to inconsistent behavior.

This change fixes the mechanism so that you only get a fixed number
of retries.  You can choose (through a new config parameter) to allow
backoff (or not).

To restore some of the old behavior, the default for the connect_retries
parameter has been increased from 2 to 4, and the max backoff time has
been decreased from 1024 to 512 seconds.  It's unlikely that we'd ever
reach that backoff time without a large number of retries, but 1024
seems too long.

And there is a new exception that is thrown when the connection
fails.  This will result in nice 500 errors in the novajoin-server,
and some log messages for the notifier.

Change-Id: I10547fbde8966c8694346ed8c054e627bee2ee51
2019-08-20 11:02:07 -04:00
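
To make the bounded behaviour concrete, the sketch below computes the wait before each retry. The retry count of 4 and the 512 second cap come from this commit message, and the doubling from 2 seconds is taken from the earlier backoff commit in this log; the function name and its parameters are hypothetical.

```python
def backoff_delays(retries=4, backoff=True, initial=2, maximum=512):
    """Return the wait in seconds before each of a fixed number of retries."""
    delays = []
    delay = initial
    for _ in range(retries):
        delays.append(delay if backoff else 0)  # no waiting when backoff is off
        delay = min(delay * 2, maximum)         # double, but never past the cap
    return delays
```

Because the list has exactly `retries` entries, the loop that consumes it cannot retry forever, whether or not backoff is enabled.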
Grzegorz Grasza fe512714e2 Fix for ipalib 4.7.2 (Fedora 29)
This fixes u"unknown command 'b'xxx''" errors.

Change-Id: I155bd37e7007fce4e083f8e5f7c4a3511a44ae4a
2019-01-29 15:12:56 +01:00
Zuul 59b91ceebf Merge "Test compact_services metadata" 2018-12-19 20:50:58 +00:00
Grzegorz Grasza f5aab5544d Test compact_services metadata
This tests new and old formats, as well as instance metadata updates.

Change-Id: Ie7b3bcdbb98bb2786000207b72e7b289d5051b8f
2018-12-19 20:39:53 +01:00
Grzegorz Grasza f8036d01a5 Reconnect on httplib.ResponseNotReady
In CI we get a random ResponseNotReady exception,
which is caused by the server closing the keepalive socket.
This will close and retry the connection.
This patch adds this reconnect in a second place that was missed.

Change-Id: I745aea8dcb51598ca7d7a371dce66c7dd6ae8005
2018-12-03 16:38:53 +01:00
Grzegorz Grasza ed1838b7af Fix errors preventing novajoin to start on Python 3
This patch also moves the novajoin-install and novajoin-ipa-setup
scripts to the default python scripts directory. This is because
there is no other way to fixup the #! line for python3, apart from
modifying setup.py, which is managed by the global requirements repo.

Change-Id: I21ccb475905feebdb91aa158ce3845744b2f0a5f
2018-11-26 17:55:15 +01:00
Grzegorz Grasza 4d997dddc6 Support for associating and disassociating neutron floating IPs
This adds support for creating and removing DNS A records when
floating IPs are associated and disassociated in neutron.
novajoin-install and functional tests are enhanced to test it.

Change-Id: I82c83ad9e8c84ddfd4ecfc4d5c3b31a418af97a7
2018-11-22 15:40:05 +01:00
Grzegorz Grasza fe72231faa Test OpenStack server instance enrollment
A basic test to check that a spawned instance
will be added to and then deleted from FreeIPA.
This also fixes the novajoin-install script to
work by default on devstack.

Change-Id: Id7e940360ade74d605fef9004c6a5454790c55a4
2018-11-20 20:01:06 +01:00
Grzegorz Grasza e8ced3d13c Reconnect on httplib.ResponseNotReady
In CI we get a random ResponseNotReady exception,
which is caused by the server closing the keepalive socket.
This will close and retry the connection.

Change-Id: I28e51450cbfea8bf7a18e5783355b68f806eb999
2018-11-13 12:33:20 +01:00
Harald Jensås 96ab6fd525 Fix - Invalid ipaotp returned if host in cache
Change: Id107000b3a667f5724331e281912560cff6f92f0 implemented
caching in the IPAClient. We need to store the OTP in the cache
and return the cached OTP, not the one generated on the join
request in case there is a cache hit, since we do not update
the OTP in FreeIPA when the host is in the cache.

Closes-Bug: #1796415
Change-Id: Ic19ee7c2228d275397bc4be04432126fd2f228ec
2018-10-06 01:01:32 +02:00
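
The fix amounts to returning the OTP stored with the cached host rather than a freshly generated one. A minimal sketch, assuming an in-memory dictionary and a hypothetical `enroll` callback standing in for the FreeIPA call:

```python
import uuid


class HostCache:
    """Remember the OTP that was actually sent to the directory for each host."""

    def __init__(self):
        self._otps = {}

    def add_host(self, hostname, enroll):
        """Return the OTP that is valid for hostname, enrolling it only once."""
        if hostname in self._otps:
            # Cache hit: the directory still holds the first OTP, so return that one.
            return self._otps[hostname]
        otp = uuid.uuid4().hex
        enroll(hostname, otp)  # hypothetical stand-in for the FreeIPA host add
        self._otps[hostname] = otp
        return otp
```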
Juan Antonio Osorio Robles 6ce780fc90 Add basic service and host caches
This adds two caches: one for hosts and another one for services. The
service cache also contains which hosts are managing the service.

This was done in order to reduce the calls to FreeIPA and to try to make
novajoin slightly more efficient.

Note that this was only added to the "add" functions, and the delete
functions merely update the cache. This is because checking for hosts
managing a group would require the cache to be consistent between all
the processes (and novajoin could be run in several), and for this the
best thing would be to use a distributed cache. This being the first
attempt, we leave this functionality out of the scope for this patch.

Change-Id: Id107000b3a667f5724331e281912560cff6f92f0
2018-02-09 17:36:28 +02:00
Rob Crittenden 70173f38ee Fix IPA v4.5.0 import issue with kinit_keytab
Centralize ipalib_imported so that all primary IPA imports are
done in one place.

The issue was that IPA imports were being done in two places
one of which was simpler and not aware of the import errors of
the other.

The symptom was that if one of the moved IPA v4.5.0 imports
failed then this was handled properly in ipa.py but in util.py
where only ipalib.api is imported the import would succeed.

This mismatch meant that api.finalize() wouldn't have been
called in ipa.py so any references to api.env.* in util.py
would fail.

Change-Id: I6016892630510b816721cea5b20c8d6e7f8d34fa
2017-09-11 20:04:14 -04:00
Rob Crittenden 1eae7a670b Make the IPA connection code more robust for notifications
Notifications can be multi-threaded but each thread was sharing
the same IPAClient instance which was causing contention in the
retry code (and likely the ccache).

Move the IPA object closer to execution so each notification will
have its own instance. This will mean more kinit activity but
each one will be isolated and be able to handle expired tickets,
IPA being down, network issues, etc.

Add backoff code on failures so we don't spam the IPA server with
retries. It is a by-two backoff from 2 - 1024 seconds.

NOTE: novajoin-server will not use this backoff code. It will
continue to use the retries configuration setting. This is because
we know there is a limited window to respond so cannot infinitely
do a retry unlike notifications.

Change-Id: Ia18d3f97f7549c89dcf4e6f014f44c3fcebc919f
2017-03-14 12:19:26 -04:00
Rob Crittenden 886dae3ad8 Delete DNS entries manually instead of relying on updatedns
IPA < 4.4.0 will fail if updatedns is True and no DNS entries
exist when deleting a host, so we disabled updatedns for the
subhost case.

This can result in leftover DNS entries.

So instead, drop updatedns for all delete cases and make a call
to dnsrecord-del and delete all DNS entries for a hostname
that way instead.

Change-Id: I6650c99811001adcb4417c3369d5df077b06d765
2017-03-14 12:19:26 -04:00
Rob Crittenden 54b9ddd0b2 Use custom split_principal method rather than the IPA version(s)
I intended to use upstream code for splitting the principal but
the semantics were just too different so I'm using a version
similar to the IPA v4.0.0 which does some extra validation that
we probably don't need but will split a principal without requiring
a REALM.

Change-Id: Ib6c8c6bab7694380c85dda230e7ec018c90d44d1
2017-02-20 10:39:40 -05:00
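
Splitting a principal without requiring a realm could be sketched as below. This is an illustration of the idea only; it is neither the novajoin method nor the IPA one, and the return shape and validation are assumptions.

```python
def split_principal(principal):
    """Split 'service/hostname[@REALM]' into (service, hostname, realm or None)."""
    name, _, realm = principal.partition("@")
    service, sep, hostname = name.partition("/")
    if not sep or not service or not hostname:
        raise ValueError("expected service/hostname[@REALM]: %r" % principal)
    return service, hostname, realm or None
```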
Jenkins 1a5bd2fce3 Merge "Implement service_has_hosts and host_has_services" 2017-02-10 16:44:39 +00:00
Jenkins 12dfff8aa6 Merge "Refactor the connection code to be more robust" 2017-02-06 22:01:22 +00:00
Juan Antonio Osorio Robles 955d554426 Remove Force argument from the delete-service call
This is not available for this call.

Change-Id: I88f03c17a0abe80d0637b424c5fb1d9e2f206ccc
2017-02-03 17:51:18 +00:00
Rob Crittenden 3e4c66d84c Refactor the connection code to be more robust
Remove some duplicated code, handle KerberosError and make
get_host_and_realm() a top-level method.

Add integration test for IPA connection code. It must be run
manually against a current installation at the moment.

Make the default IPA retry two so we actually do a retry

Change-Id: I6c0b52ed964851eff43c2d3fb209a90f9b1539e4
2017-02-01 10:30:22 -05:00
Rob Crittenden 8078c6161f Add compatibility for IPA 4.4 which requires TGT for API
IPA 4.4 added thin client capabilities. This is done by downloading
the call schema from the IPA server and is done during the
finalize() step. This requires a TGT.

So we need to ensure that a kinit is done before finalize() is
called both in the standalone installer and in the ipa code.

Change-Id: Id87b83cb945c946cf78c425aae19c311d900249a
2017-01-18 15:23:11 -05:00
Rob Crittenden 674781d049 Implement service_has_hosts and host_has_services
These methods are used to decide whether a given host or service
can be removed or not. We want to avoid deleting hosts that
manage other services or services being managed by other hosts.

Change-Id: I45f3d49e53a3f1bdeed149ae0790d820c87a9a58
2017-01-17 19:42:40 +00:00
Ade Lee b98afddf53 Add code to remove compact and managed hosts on delete notification
Change-Id: Ideb455a1b2723b22425ec17fa89f26918693c52b
2017-01-05 11:34:36 -05:00
Ade Lee 743ac85c96 Add batch operations
With all the managed services being added to the controllers,
the operations with IPA were taking too long and the nova metadata
read timeout was expiring.  One solution is to add the services
operations in a single batch, reducing round trip and re-auth time.

Change-Id: I78cacb1deb876185857e80154cf9fbec5b7d65d1
2016-12-21 10:16:14 +02:00
Juan Antonio Osorio Robles eeebeea447 Use nova metadata to create extra hosts/services
Co-Authored-By: Ade Lee <alee@redhat.com>
Change-Id: If2ae3c987fa0a073db1e88112809494afb6f9901
2016-12-16 08:50:53 +02:00
Rob Crittenden 28a010e3ae Make domain an optional configuration option
This will use the IPA domain by default. If a domain is specified
then it will be used.

For testing, if there is no config option and IPA is not available
then the domain 'test' will be mocked.

Change-Id: I21b66c61830c49a79094927f42c22c76ec1b447c
2016-12-15 16:57:29 -05:00
Rob Crittenden ef2c9baa36 Address issues found by pep8, pylint and unit tests
flake8 is quite a bit more picky and discovered a lot of
issues.

Don't let missing configuration blow things up in order to be
able to run unit tests.

Hacky workaround for missing ipalib/ipapython in PyPy
2016-11-09 19:52:38 +00:00
Rob Crittenden 24bf47dd6d Let the last metadata fetch win in setting the OTP
It appears that the metadata can be retrieved multiple times
when standing up a new instance. This is problematic for OTP
because we only have the clear value during the add phase so
if the host already exists we blow up.

I had been using a cache to store the value based on instance_id
but it wouldn't scale or work with HA without using a real
database backend. So instead the last update wins.
2016-10-07 10:31:36 -04:00
Rob Crittenden 0cc9fb7259 Updated fix for renewing credentials
The previous fix was incomplete. The exception handling was
wrong so it wasn't being caught and even if it had been the
old connection was still "open" which caused creating a new
one to throw its own exception.
2016-10-06 17:49:12 -04:00
Rob Crittenden 3b7f3d852e Catch the TicketExpired exception and renew credentials 2016-10-04 10:53:37 -04:00
Rob Crittenden e0eb3eed51 Address more issues reported by pylint
This does not bring it to 100% passing but it gets a lot closer.
2016-09-20 09:02:32 -04:00
Rob Crittenden 7fb07ad480 Add target for pep8 and lint, fix initial batch of issues
There are still quite a few lint errors to address but none
of them seems critical.
2016-09-20 09:02:32 -04:00
Rob Crittenden acaf13220b Fix PEP8 issue 2016-09-20 09:02:32 -04:00
Rob Crittenden ef7b5da50d Specify IPA API version 2.146 to act as a 4.2.0 client
This version was picked because services as members of roles
was added in this version.
2016-09-19 17:41:28 -04:00
Rob Crittenden dc7189d262 Drop the custom IPA request handler and use the IPA library
Use ipalib to handle the RPC rather than python-requests. This
is more robust and is a whole lot less work. It also solves
the problem of re-using credentials and better handling stale
ccache.
2016-09-14 11:06:25 -04:00
Rob Crittenden fb3fdffa5a Remove sqlite3 cache and get ipa_enroll from image metadata
Since I have credentials I don't need a cache to determine an
instance hostname so do away with that.

This will also allow enrollment if an image has the ipa_enroll
property set to True.

Finally, don't re-enforce ipa_enroll in the IPA code, that should
happen beforehand.
2016-08-29 17:30:51 -04:00
Rob Crittenden ded8b5e2f7 Don't set OS distro or version to a value of None
When fetching the os_* values from metadata the default was
None, which could add that as a string in the value sent to IPA.
2016-08-25 17:09:55 -04:00
Rob Crittenden bee74b53a8 Use auth token passed in to fetch image metadata from glance
The nova team doesn't want to pass in system_metadata which
contained the image metadata because the format/naming may
change over time. They'd prefer the caller fetch the image
metadata.

Added that, using the current authenticated token.

At some point will need to add the ability to generate a
token using some special service user given that some/most/all
requests to metadata will be unauthenticated.
2016-07-18 21:42:46 +00:00
Rob Crittenden 215674d542 Set IPA domain, fix errors caught in integration testing
Set the IPA domain in join.conf so hostnames will get the
IPA domain, instance_name + domain.

Don't blow up if metadata or system_metadata comes in as None.

Add some missing variable definitions caught by pylint.

Read join.conf in the notify server as well.

Re-order the kinit in the installation script to not fail
if the user has no pre-existing ticket.

Don't copy join.conf and api-paste.ini into
/usr/share/novajoin.
2016-07-07 19:41:54 +00:00
Rob Crittenden 1c51140028 Initial commit of REST/notification services
This is based heavily on the WSGI code in cinder.

There are two services: a REST service and a notification
listener.

Currently both log only to stdout.

The configuration file join.conf controls the REST service.

nova configuration should look like this (assuming the REST
service is running on the nova compute host).

vendordata_providers = StaticJSON, DynamicJSON
vendordata_dynamic_targets = 'join@http://127.0.0.1:9999/v1/'
vendordata_driver = nova.api.metadata.vendordata_http.HTTPFileVendorData
vendordata_dynamic_connect_timeout = 5
vendordata_dynamic_read_timeout = 30
vendordata_jsonfile_path = /etc/nova/cloud-config.json

For the notification service like this:

notification_driver = messaging
notification_topic = notifications
notify_on_state_change = vm_state

Authentication is disabled in api-paste.ini for now.
2016-07-05 19:53:11 +00:00