openstack/nova - nova - OpenDev: Free Software Needs Free Tools

Commit Graph

Author	SHA1	Message	Date
Zuul	6bd99eb2ea	Merge "Correctly reset instance task state in rebooting hard"	2024-03-20 13:34:22 +00:00
John Garbutt	947bb5f641	Make compute node rebalance safer Many bugs around nova-compute rebalancing are focused around problems when the compute node and placement resources are deleted, and sometimes they never get re-created. To limit this class of bugs, we add a check to ensure a compute node is only ever deleted when it is known to have been deleted in Ironic. There is a risk this might leave orphaned compute nodes and resource providers that need manual clean up because users do not want to delete the node in Ironic, but are removing it from nova management. But on balance, it seems safer to leave these cases up to the operator to resolve manually, and collect feedback on how to better help those users. blueprint ironic-shards Change-Id: I2bc77cbb77c2dd5584368563dc4250d71913906b	2024-02-25 13:25:27 -08:00
Zuul	bdd7daffbb	Merge "Check if destination can support the src mdev types"	2024-02-23 15:46:01 +00:00
Ghanshyam Mann	0c1e1ccf03	HyperV: Remove RDP console API RDP console was only for HyperV driver so removing the API. As API url stay same (because same used for other console types API), RDP console API will return 400. Cleaning up the related config options as well as moving its API ref to obsolete seciton. Keeping RPC method to avoid error when old controller is used with new compute. It can be removed in next RPC version bump. Change-Id: I8f5755009da4af0d12bda096d7a8e85fd41e1a8c	2024-02-13 12:24:38 -08:00
Pierre-Samuel Le Stang	aa3e8fef7b	Correctly reset instance task state in rebooting hard When a user ask for a reboot hard of a running instance while nova compute is unavailable (service stopped or host down) it might happens under certain conditions that the instance stays in rebooting_hard task_state after nova-compute start again. This patch aims to fix that. Closes-Bug: #1999674 Change-Id: I170e390fe4e467898a8dc7df6a446f62941d49ff	2024-02-06 10:04:01 +01:00
Sylvain Bauza	489aab934c	Check if destination can support the src mdev types Now that the source knows that both the computes support the right libvirt version, it passes to the destination the list of mdevs it has for the instance. By this change, we'll verify if the types of those mdevs are actually supported by the destination. On the next change, we'll pass the destination mdevs back to the source. Partially-Implements: blueprint libvirt-mdev-live-migrate Change-Id: Icb52fa5eb0adc0aa6106a90d87149456b39e79c2	2024-02-05 15:10:55 +01:00
Zuul	ef08dcb5f3	Merge "Reproducer test of bug #1999674 "	2023-12-20 03:50:36 +00:00
Zuul	17b7aa3926	Merge "[codespell] fix typos in tests"	2023-12-13 23:46:08 +00:00
Sean Mooney	2232ca95f2	[codespell] fix typos in tests this mainly fixes typos in the tests and one type in an exception message. some addtional items are added to the dict based on our usage of vars in test but we could remove them later by doing minor test updates. They are intentionally not fixed in this commit to limit scope creep. Change-Id: Iacfbb0a5dc8ffb0857219c8d7c7a7d6e188f5980	2023-10-03 11:08:55 +01:00
Dan Smith	abbac59e33	Sanity check that new hosts have no instances If we are starting up and we think we are a new host (i.e. no pre- existing service record was found for us), we expect to have no instances on our hypervisor. If that is not the case, it is likely that we got pointed at a new fresh database or the wrong database (i.e. a different cell from our own). In that case, we should abort startup to avoid creating new service, compute node, and resource provider records. This is a sort of follow-on additional improvement related to work done in blueprint stable-compute-uuid. Change-Id: Id817c51c90485119270f3b7f3c52858f42100637	2023-10-02 12:35:00 -07:00
Zuul	11843f249c	Merge "Fix pep8 errors with new hacking"	2023-09-22 17:34:40 +00:00
Sylvain Bauza	36a5740e2a	Revert "Make compute node rebalance safter" This reverts commit `772f5a1ae4`. Change-Id: I20e78dfafe19fc1e7dc7344238c01cb585f744dc	2023-09-13 19:24:20 +02:00
John Garbutt	772f5a1ae4	Make compute node rebalance safter Many bugs around nova-compute rebalancing are focused around problems when the compute node and placement resources are deleted, and sometimes they never get re-created. To limit this class of bugs, we add a check to ensure a compute node is only ever deleted when it is known to have been deleted in Ironic. There is a risk this might leave orphaned compute nodes and resource providers that need manual clean up because users do not want to delete the node in Ironic, but are removing it from nova management. But on balance, it seems safer to leave these cases up to the operator to resolve manually, and collect feedback on how to better help those users. blueprint ironic-shards Change-Id: I7cd9e5ab878cea05462cac24de581dca6d50b3c3	2023-08-31 17:21:15 +00:00
Zuul	e5ee5e035c	Merge "Add compute_id to Instance object"	2023-07-27 01:38:06 +00:00
Zuul	54de747c25	Merge "Add dest_compute_id to Migration object"	2023-07-26 02:04:49 +00:00
Zuul	e02c5f0e7a	Merge "Populate ComputeNode.service_id"	2023-07-14 22:41:39 +00:00
Zuul	1fe8c4becb	Merge "Fix failed count for anti-affinity check"	2023-06-07 14:35:52 +00:00
Zuul	fc8951efb9	Merge "Process unlimited exceptions raised by unplug_vifs"	2023-06-07 14:16:10 +00:00
Yusuke Okada	56d320a203	Fix failed count for anti-affinity check The late anti-affinity check runs in the compute manager to avoid parallel scheduling requests to invalidate the anti-affinity server group policy. When the check fails the instance is re-scheduled. However this failure counted as a real instance boot failure of the compute host and can lead to de-prioritization of the compute host in the scheduler via BuildFailureWeigher. As the late anti-affinity check does not indicate any fault of the compute host itself it should not be counted towards the build failure counter. This patch adds new build results to handle this case. Closes-Bug: #1996732 Change-Id: I2ba035c09ace20e9835d9d12a5c5bee17d616718 Signed-off-by: Yusuke Okada <okada.yusuke@fujitsu.com>	2023-06-06 10:15:16 +02:00
Dan Smith	625fb569a7	Add compute_id to Instance object This adds the compute_id field to the Instance object and adds checks in save() and create() to make sure we no longer update node without also updating compute_id. Related to blueprint compute-object-ids Change-Id: I0740a2e4e09a526da8565a18e6761b4dbdc4ec0b	2023-05-31 07:13:16 -07:00
Dan Smith	70516d4ff9	Add dest_compute_id to Migration object This makes us store the compute_id of the destination node in the Migration object. Since resize/cold-migration changes the node affiliation of an instance to the destination node from the source node, we need a positive record of the node id to be used. The destination node set this to its own node when creating the migration, and it is used by the source node when the switchover happens. Because the migration may be backleveled for an older node involved in that process and thus saved or passed without this field, this adds a compatibility routine that falls back to looking up the node by host/nodename. Related to blueprint compute-object-ids Change-Id: I362a40403d1094be36412f5f7afba00da8af8301	2023-05-31 07:13:16 -07:00
Dan Smith	afad847e4d	Populate ComputeNode.service_id The ComputeNode object already has a service_id field that we stopped using a while ago. This moves us back to the point where we set it when creating new ComputeNode records, and also migrates existing records when they are loaded. The resource tracker is created before we may have created the service record, but is updated afterwards in the pre_start_hook(). So this adds a way for us to pass the service_ref to the resource tracker during that hook so that it is present before the first time we update all of our ComputeNode records. It also makes sure to pass the Service through from the actual Service manager instead of looking it up again to make sure we maintain the tight relationship and avoid any name-based ambiguity. Related to blueprint compute-object-ids Change-Id: I5e060d674b6145c9797c2251a2822106fc6d4a71	2023-05-31 07:06:34 -07:00
Artom Lifshitz	faa1e64e5b	Fix pep8 errors with new hacking Hacking has bumped the version of flake8 that it's using to 5.0 in its 6.0.1 release. This turns up quite a few pep8 errors lurking in our code. Fix them. Needed-by: https://review.opendev.org/c/openstack/hacking/+/874516 Change-Id: I3b9c7f9f5de757f818ec358c992ffb0e5f3e310f	2023-04-28 08:34:52 -04:00
Dan Smith	82deb0ce4b	Stop ignoring missing compute nodes in claims The resource tracker will silently ignore attempts to claim resources when the node requested is not managed by this host. The misleading "self.disabled(nodename)" check will fail if the nodename is not known to the resource tracker, causing us to bail early with a NopClaim. That means we also don't do additional setup like creating a migration context for the instance, claim resources in placement, and handle PCI/NUMA things. This behavior is quite old, and clearly doesn't make sense in a world with things like placement. The bulk of the test changes here are due to the fact that a lot of tests were relying on this silent ignoring of a mismatching node, because they were passing node names that weren't even tracked. This change makes us raise an error if this happens so that we can actually catch it, and avoid silently continuing with no resource claim. Change-Id: I416126ee5d10428c296fe618aa877cca0e8dffcf	2023-04-24 15:26:52 -07:00
Alexey Stupnikov	dacae335e4	Process unlimited exceptions raised by unplug_vifs Currently compute manager's _cleanup_allocated_networks() method expects NotImplementedError or exception.NovaException when calling self.driver.unplug_vifs. In reality, other class of exception could be raised. It could break the Nova Compute flow and leave instance in inconsistent state. This patch switches from exception.NovaException to all kinds of exceptions. Closes-bug: #2015092 Change-Id: Icaf3cc93edfea97ee4fa497bdeb5f7d631c8ae55	2023-04-03 17:37:14 +02:00
Zuul	d443f8e4c4	Merge "Transport context to all threads"	2023-02-27 15:11:25 +00:00
Dan Smith	d892905904	Protect against a deleted node id file If we are starting up for the first time, we expect to generate and write a node_uuid file if one does not exist. If we are upgrading, we expect to do the same. However, if we are starting up not after an upgrade and not for the first time, a missing compute_id file is an error, and we should abort. Because of the additional check that this adds, which is called from a variety of places that don't have the stable compute node singleton stubbed to make it happy, this mocks the check for any test that does not specifically aim to exercise it. Related to blueprint stable-compute-uuid Change-Id: If83ce14b96e7d84ae38eba9d798754557d5abdfd	2023-02-01 09:20:59 -08:00
Dan Smith	72370a188c	Check our nodes for hypervisor_hostname changes When we are loading our ComputeNode objects by UUID according to what the virt driver reported, we can sanity check the DB records against the virt driver's hostname. This covers the case where our CONF.host has not changed but the hostname reported by the virt driver has, assuming we can find the ComputeNode object(s) that match our persistent node uuid. Related to blueprint stable-compute-uuid Change-Id: I41635210d7d6f46b437b06d2570a26a80ed8676a	2023-02-01 09:20:59 -08:00
Zuul	ac6f895361	Merge "compute: enhance compute evacuate instance to support target state"	2023-02-01 11:49:38 +00:00
Sahid Orentino Ferdjaoui	8c2e765989	compute: enhance compute evacuate instance to support target state Related to the bp/allowing-target-state-for-evacuate. This change is extending compute API to accept a new argument targetState. The targetState argument when set will force state of an evacuated instance to the destination host. Signed-off-by: Sahid Orentino Ferdjaoui <sahid.ferdjaoui@industrialdiscipline.com> Change-Id: I9660d42937ad62d647afc6be965f166cc5631392	2023-01-31 11:29:01 +01:00
Dan Smith	e258164f5a	Detect host renames and abort startup Except on ironic, if we are able to determine that our locally- configured node has a hostname other than what we expect, we should abort startup. Since we currently depend on the loose name-based association of nodes, services, and instances, we need to make sure we do not startup in an inconsistent configuration. Related to blueprint stable-compute-uuid Change-Id: I595b27a57516cffe1f172cf2fb736e1b11373a1d	2023-01-30 12:46:36 -08:00
Dan Smith	23c5f3d585	Make resource tracker use UUIDs instead of names This makes the resource tracker look up and create ComputeNode objects by uuid instead of nodename. For drivers like ironic that already provide 'uuid' in the resources dict, we can use that. For those that do not, we force the uuid to be the locally-persisted node uuid, and use that to find/create the ComputeNode object. A (happy) side-effect of this is that if we find a deleted compute node object that matches that of our hypervisor, we undelete it instead of re-creating one with a new uuid, which may clash with our old one. This means we remove some of the special-casing of ironic rebalance, although the tests for that still largely stay the same. Change-Id: I6a582a38c302fd1554a49abc38cfeda7c324d911	2023-01-30 10:53:44 -08:00
Dan Smith	53a925ff0f	Persist existing node uuids locally If we start up from an older service version, we need to look at the database to determine what our node is and write it locally. This leap of faith will be the last time we need to look up our node based on our hostname. Instead of relying on this, operators can (ideally) pre-populate the compute node identities for maximum safety. Change-Id: I178700bf5ba172603593b1c9972520d05face849	2023-01-20 08:07:06 -08:00
Dan Smith	cf111d1001	Pass service ref to init_host(), if exists This just adds the service ref to init_host() so that we can pass it (if it exists) at startup. On the first run, this will be None, so we know that we don't have an existing service record and thus do not need to do any migration. Related to blueprint stable-compute-uuid Change-Id: I1491c9b234ef0c262b5ed01d4c138eba8dedff77	2023-01-20 07:22:02 -08:00
Pierre-Samuel Le Stang	e5766446e5	Reproducer test of bug #1999674 This commit aims to show that an active instance which is hard rebooted while nova compute is unavailable (service or host down) stays in task_state REBOOTING_HARD after nova compute is available again. Related-Bug: #1999674 Change-Id: Ic672ce509ca21715f74931b8a6c6990b1c20ce30	2023-01-05 17:23:17 +01:00
Zuul	1c46c4e9e5	Merge "Store allocated RP in InstancePCIRequest"	2022-12-21 19:20:09 +00:00
Balazs Gibizer	f86f1800f0	Store allocated RP in InstancePCIRequest After the scheduler selected a target host and allocated an allocation candidate that is passed the filters nova need to make sure that PCI claim will allocate the real PCI devices from the RP which is allocated in placement. Placement returns the request group - provider mapping for each allocation candidate so nova can map which InstancePCIRequest was fulfilled from which RP in the selected allocation candidate. This mapping is then recorded in the InstancePCIRequest object and used during the PCI claim to filter for PCI pools that can be used to claim PCI devices from. blueprint: pci-device-tracking-in-placement Change-Id: I18bb31e23cc014411db68c31317ed983886d1a8e	2022-12-21 16:17:34 +01:00
John Garbutt	8a476061c5	Ironic: retry when node not available After a baremetal instance is deleted, and its allocation is removed in placement, the ironic node might start cleaning. Eventually nova will notice and update the inventory to be reserved. During this window, a new instance may have already picked this ironic node. When that race happens today the build fails with an error: "Failed to reserve node ..." This change tries to ensure the remaining alternative hosts are attempted before aborting the build. Clearly the race is still there, but this makes it less painful. Related-Bug: #1974070 Change-Id: Ie5cdc17219c86927ab3769605808cb9d9fa9fa4d	2022-12-15 16:33:43 +00:00
Sean Mooney	8449b7caef	[compute] always set instance.host in post_livemigration This change add a new _post_live_migration_update_host function that wraps _post_live_migration and just ensures that if we exit due to an exception instance.host is set to the destination host. when we are in _post_live_migration the guest has already started running on the destination host and we cannot revert. Sometimes admins or users will hard reboot the instance expecting that to fix everything when the vm enters the error state after the failed migrations. Previously this would end up recreating the instance on the source node leading to possible data corruption if the instance used shared storage. Change-Id: Ibc4bc7edf1c8d1e841c72c9188a0a62836e9f153 Partial-Bug: #1628606	2022-10-07 12:23:13 +00:00
Balazs Gibizer	e43bf900dc	Gracefully ERROR in _init_instance if vnic_type changed If the vnic_type of a bound port changes from "direct" to "macvtap" and then the compute service is restarted then during _init_instance nova tries to plug the vif of the changed port. However as it now has macvtap vnic_type nova tries to look up the netdev of the parent VF. Still that VF is consumed by the instance so there is no such netdev on the host OS. This error killed the compute service at startup due to unhandled exception. This patch adds the exception handler, logs an ERROR and continue initializing other instances on the host. Also this patch adds a detailed ERROR log when nova detects that the vnic_type changed during _heal_instance_info_cache periodic. Closes-Bug: #1981813 Change-Id: I1719f8eda04e8d15a3b01f0612977164c4e55e85	2022-09-08 09:19:16 +02:00
Rajat Dhasmana	30aab9c234	Add support for volume backed server rebuild This patch adds the plumbing for rebuilding a volume backed instance in compute code. This functionality will be enabled in a subsequent patch which adds a new microversion and the external support for requesting it. The flow of the operation is as follows: 1) Create an empty attachment 2) Detach the volume 3) Request cinder to reimage the volume 4) Wait for cinder to notify success to nova (via external events) 5) Update and complete the attachment Related blueprint volume-backed-server-rebuild Change-Id: I0d889691de1af6875603a9f0f174590229e7be18	2022-08-31 16:38:37 +05:30
Balazs Gibizer	0d526d1f4b	Reject PCI dependent device config The PCI tracking in placement does not support the configuration where both a PF and its children VFs are configured for nova usage. This patch adds logic to detect and reject such configuration. To be able to kill the service if started with such config special exception handling is added for the update_available_resource code path, similarly how a failed reshape is handled. blueprint: pci-device-tracking-in-placement Change-Id: I708724465d2afaa37a65c231c64da88fc8b458eb	2022-08-25 10:00:10 +02:00
Balazs Gibizer	14e68ac6e9	Rename [pci]passthrough_whitelist to device_spec A later patch in the pci-device-tracking-in-placement work will extend the existing [pci]passthrough_whitelist config syntax. So we take the opportunity here to deprecate the old non inclusive passthrough_whitelist name and introduce a better one. All the usage of CONF.pci.passthrough_whitelist is now changed over to the new device_spec config. Also the in tree documentation is updated accordinly. However the nova code still has a bunch of references to the "whitelist" terminology. That will be handled in subsequent patches. blueprint: pci-device-tracking-in-placement Change-Id: I843032e113642416114f169069eebf6a56ed78dd	2022-08-10 17:08:35 +02:00
Fabian Wiesel	646fc51732	Transport context to all threads The nova.utils.spawn and spawn_n methods transport the context (and profiling information) to the newly created threads. But the same isn't done when submitting work to thread-pools in the ComputeManager. The code doing that for spawn and spawn_n is extracted to a new function and called to submit the work to the thread-pools. Closes-Bug: #1962574 Change-Id: I9085deaa8cf0b167d87db68e4afc4a463c00569c	2022-08-04 17:36:23 +05:30
Balazs Gibizer	f8cf050a13	Remove double mocking In py310 unittest.mock does not allow to mock the same function twice as the second mocking will fail to autospec the Mock object created by the first mocking. This patch manually fixes the double mocking. Fixed cases: 1) one of the mock was totally unnecessary so it was removed 2) the second mock specialized the behavior of the first generic mock. In this case the second mock is replaced with the configuration of the first mock 3) a test case with two test steps mocked the same function for each step with overlapping mocks. Here the overlap was removed to have the two mock exists independently The get_connection injection in the libvirt functional test needed a further tweak (yeah I know it has many already) to act like a single mock (basically case #2) instead of a temporary re-mocking. Still the globalness of the get_connection mocking warrant the special set / reset logic there. Change-Id: I3998d0d49583806ac1c3ae64f1b1fe343cefd20d	2022-08-02 15:31:15 +02:00
Stephen Finucane	89ef050b8c	Use unittest.mock instead of third party mock Now that we no longer support py27, we can use the standard library unittest.mock module instead of the third party mock lib. Most of this is autogenerated, as described below, but there is one manual change necessary: nova/tests/functional/regressions/test_bug_1781286.py We need to avoid using 'fixtures.MockPatch' since fixtures is using 'mock' (the library) under the hood and a call to 'mock.patch.stop' found in that test will now "stop" mocks from the wrong library. We have discussed making this configurable but the option proposed isn't that pretty [1] so this is better. The remainder was auto-generated with the following (hacky) script, with one or two manual tweaks after the fact: import glob for path in glob.glob('nova/tests/*/.py', recursive=True): with open(path) as fh: lines = fh.readlines() if 'import mock\n' not in lines: continue import_group_found = False create_first_party_group = False for num, line in enumerate(lines): line = line.strip() if line.startswith('import ') or line.startswith('from '): tokens = line.split() for lib in ( 'ddt', 'six', 'webob', 'fixtures', 'testtools' 'neutron', 'cinder', 'ironic', 'keystone', 'oslo', ): if lib in tokens[1]: create_first_party_group = True break if create_first_party_group: break import_group_found = True if not import_group_found: continue if line.startswith('import ') or line.startswith('from '): tokens = line.split() if tokens[1] > 'unittest': break elif tokens[1] == 'unittest' and ( len(tokens) == 2 or tokens[4] > 'mock' ): break elif not line: break if create_first_party_group: lines.insert(num, 'from unittest import mock\n\n') else: lines.insert(num, 'from unittest import mock\n') del lines[lines.index('import mock\n')] with open(path, 'w+') as fh: fh.writelines(lines) Note that we cannot remove mock from our requirements files yet due to importing pypowervm unit test code in nova unit tests. This library still uses the mock lib, and since we are importing test code and that lib (correctly) only declares mock in its test-requirements.txt, mock would not otherwise be installed and would cause errors while loading nova unit test code. [1] https://github.com/testing-cabal/fixtures/pull/49 Change-Id: Id5b04cf2f6ca24af8e366d23f15cf0e5cac8e1cc Signed-off-by: Stephen Finucane <stephenfin@redhat.com>	2022-08-01 17:46:26 +02:00
Sean Mooney	cd2c2f359b	ignore deleted server groups in validation This change simply catches the exception raised when we lookup a servergroup via a hint and the validation upcall is enabled. Change-Id: I858b4da35382a9f4dcf88f4b6db340e1f34eb82d Closes-Bug: #1890244	2022-06-21 19:13:55 +01:00
Sean Mooney	84a84f7f2f	add repoducer test for bug 1890244 This change adds a test to simulate validating a instnace group policy where the group has been deleted but is still referenced in the scheduler hint. Change-Id: I803e6286a773d9e53639ab0cd449fc72bb3be613 Related-Bug: #1890244	2022-06-21 12:04:20 +01:00
Zuul	93a65f06df	Merge "Record SRIOV PF MAC in the binding profile"	2022-06-13 17:25:14 +00:00
Rajesh Tailor	2521810e55	Fix typos This change fixes some of the typos in unit tests as well as in nova code-base. Change-Id: I209bbb270baf889fcb2b9a4d1ce0ab4a962d0d0e	2022-05-30 17:40:00 +05:30

1 2 3 4 5 ...

911 Commits