Provide a new ComputeManager for Ironic

Ironic is internally clustered, and that causes some friction
with Nova, which assumes that each n-cpu instance is the only
one responsible for any given VM. While this is going to be
addressed in the long term, at the moment we need to make it
possible to run Ironic's nova-compute in an HA setup.

The short-term solution proposed is to run 2+ nova-compute
services, each of which reports the same hostname (e.g. ironic).
This works but has some caveats - one of which is that
_init_instance will now race between different nova-compute
instances starting up at the same time. A custom ComputeManager
permits us to address that without prejudice to future long-term
solutions.
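
As an illustrative sketch only, each clustered nova-compute might
share a nova.conf fragment along these lines. The host option is a
real Nova option of this era; the driver path is an assumption based
on the ironic tree layout and is not taken from this commit:

    [DEFAULT]
    # Every nova-compute in the cluster reports the same hostname,
    # so the scheduler sees one logical "ironic" compute service.
    host=ironic
    # Hypothetical path - adjust to wherever the Ironic driver
    # actually lives in your tree.
    compute_driver=ironic.nova.virt.ironic.driver.IronicDriver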

Relatedly, TripleO needs Ironic to permit service startup before
keystone is initialised, which the removal of API calls during
init_host permits - and as there are no API calls needed for
correct behaviour of the Ironic driver, this is straightforward :).

See https://etherpad.openstack.org/p/ironic-nova-friction for
more discussion.

Change-Id: I68d46c4da8715df03c3a88393b55665dc57045a3
Closes-Bug: #1295503
Robert Collins 2014-03-25 11:28:03 +13:00
parent fadfa2fd80
commit a719472d94
3 changed files with 88 additions and 15 deletions

@@ -703,12 +703,6 @@
 # with Identity API Server. (integer value)
 #http_request_max_retries=3
 
-# Allows to pass in the name of a fake http_handler callback
-# function used instead of httplib.HTTPConnection or
-# httplib.HTTPSConnection. Useful for unit testing where
-# network is not available. (string value)
-#http_handler=<None>
-
 # Single shared secret with the Keystone configuration used
 # for bootstrapping a Keystone installation, or otherwise
 # bypassing the normal authentication process. (string value)
@@ -746,20 +740,24 @@
 # value)
 #signing_dir=<None>
 
-# If defined, the memcache server(s) to use for caching (list
-# value)
+# Optionally specify a list of memcached server(s) to use for
+# caching. If left undefined, tokens will instead be cached
+# in-process. (list value)
 # Deprecated group/name - [DEFAULT]/memcache_servers
 #memcached_servers=<None>
 
-# In order to prevent excessive requests and validations, the
-# middleware uses an in-memory cache for the tokens the
-# Keystone API returns. This is only valid if memcache_servers
-# is defined. Set to -1 to disable caching completely.
-# (integer value)
+# In order to prevent excessive effort spent validating
+# tokens, the middleware caches previously-seen tokens for a
+# configurable duration (in seconds). Set to -1 to disable
+# caching completely. (integer value)
 #token_cache_time=300
 
-# Value only used for unit testing (integer value)
-#revocation_cache_time=1
+# Determines the frequency at which the list of revoked tokens
+# is retrieved from the Identity service (in seconds). A high
+# number of revocation events combined with a low cache
+# duration may significantly reduce performance. (integer
+# value)
+#revocation_cache_time=300
 
 # (optional) if defined, indicate whether token data should be
 # authenticated or authenticated and encrypted. Acceptable
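
As a usage sketch (not part of this change), the caching options
documented above could be combined like this in the
[keystone_authtoken] section; the memcached address is a placeholder:

    [keystone_authtoken]
    # Cache validated tokens in memcached instead of in-process.
    memcached_servers=127.0.0.1:11211
    # Keep validated tokens for 300 seconds; -1 disables caching.
    token_cache_time=300
    # Fetch the revoked-token list from Keystone every 300 seconds.
    revocation_cache_time=300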


@@ -0,0 +1,75 @@
# coding=utf-8
#
# Copyright 2014 Red Hat, Inc.
# Copyright 2013 Hewlett-Packard Development Company, L.P.
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

"""Short term workaround for friction in the Nova compute manager with Ironic.

https://etherpad.openstack.org/p/ironic-nova-friction contains current design
work. The goal here is to generalise the areas where n-c talking to a clustered
hypervisor has issues, and long term fold them into the main ComputeManager.
"""

from nova.compute import manager
import nova.context


class ClusteredComputeManager(manager.ComputeManager):

    def init_host(self):
        """Initialization for a clustered compute service."""
        self.driver.init_host(host=self.host)
        # Not used currently.
        # context = nova.context.get_admin_context()
        # instances = instance_obj.InstanceList.get_by_host(
        #     context, self.host, expected_attrs=['info_cache'])

        # defer_iptables_apply is moot for clusters - no local iptables
        # if CONF.defer_iptables_apply:
        #     self.driver.filter_defer_apply_on()

        self.init_virt_events()

        # try:
        #     # evacuation is moot for a clustered hypervisor
        #     # checking that instance was not already evacuated to other host
        #     self._destroy_evacuated_instances(context)
        #     # Don't run _init_instance until we solve the partitioning
        #     # problem - with N n-cpu's all claiming the same hostname,
        #     # running _init_instance here would lead to race conditions
        #     # where each runs _init_instance concurrently.
        #     for instance in instances:
        #         self._init_instance(context, instance)
        # finally:
        #     # defer_iptables_apply is moot for clusters - no local iptables
        #     if CONF.defer_iptables_apply:
        #         self.driver.filter_defer_apply_off()

    def pre_start_hook(self):
        """After the service is initialized, but before we fully bring
        the service up by listening on RPC queues, make sure to update
        our available resources (and indirectly our available nodes).
        """
        # This is an optimisation to immediately advertise resources, but
        # the periodic task will update them eventually anyway, so ignore
        # errors as they may be transient (e.g. the scheduler isn't
        # available...). XXX(lifeless) this applies to all ComputeManagers
        # and once feature freeze is over we should push this to nova
        # directly.
        try:
            self.update_available_resource(nova.context.get_admin_context())
        except Exception:
            pass
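
A hedged sketch of how this class might be wired in: at the time of
this commit, Nova's service framework loaded the manager class named
by the compute_manager option (since removed from Nova), and the
module path below assumes the new file lives at
ironic/nova/compute/manager.py:

    [DEFAULT]
    compute_manager=ironic.nova.compute.manager.ClusteredComputeManager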