When resizing an instance, the flavors returned may not meet the
image's minimum memory requirement: resize ignores the image's
minimum memory limit. The resize can then complete successfully, but
the instance fails to start because the memory is too small to run
the operating system.
Related-Bug: 2007968
Change-Id: I132e444eedc10b950a2fc9ed259cd6d9aa9bed65
Previously, live migrations completely ignored CPU power management.
This patch makes sure that we correctly:
* Power up the cores on the destination during pre_live_migration, as
we need them powered up before the instance starts on the
destination.
* If the live migration is successful, power down the vacated cores on
the source.
* In case of a rollback, power down the cores previously powered up on
pre_live_migration.
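As a rough illustration of the bookkeeping above, here is a minimal sketch; `CpuPowerManager` and its methods are hypothetical stand-ins for the libvirt driver's real CPU power management code:

```python
# A minimal sketch of the power-state bookkeeping described above;
# CpuPowerManager and its methods are hypothetical stand-ins for the
# libvirt driver's actual CPU power management code.
class CpuPowerManager:
    def __init__(self, online=None):
        self.online = set(online or ())

    def power_up(self, cores):
        # Destination: called during pre_live_migration, before the
        # instance starts running there.
        self.online |= set(cores)

    def power_down(self, cores):
        # Source: called after a successful migration to power down
        # the vacated cores; also used on rollback to undo power_up.
        self.online -= set(cores)

dest = CpuPowerManager()
dest.power_up({2, 3})          # pre_live_migration on the destination
src = CpuPowerManager({0, 1})
src.power_down({0, 1})         # migration succeeded: vacate source cores
```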
Closes-bug: 2056613
Change-Id: I787bd7807950370cd865f29b95989d489d4826d0
This addresses a TODO to remove the delete-attachment call from
refresh after the remove_volume_connection call.
The remove volume connection process itself deletes the attachment
when the delete_attachment flag is passed.
Bumps the RPC API version.
Change-Id: I03ec3ee3ee1eeb6563a1dd6876094a7f4423d860
Many bugs around nova-compute rebalancing are focused around
problems when the compute node and placement resources are
deleted, and sometimes they never get re-created.
To limit this class of bugs, we add a check to ensure a compute
node is only ever deleted when it is known to have been deleted
in Ironic.
There is a risk this might leave orphaned compute nodes and
resource providers that need manual cleanup because users
do not want to delete the node in Ironic, but are removing it
from nova management. But on balance, it seems safer to leave
these cases up to the operator to resolve manually, and collect
feedback on how to better help those users.
blueprint ironic-shards
Change-Id: I2bc77cbb77c2dd5584368563dc4250d71913906b
The destination looks at the source's mdev types and returns its own
mdevs of the same types. We also reserve them in an internal dict and
make sure we clean up this dict if the live migration aborts.
Partially-Implements: blueprint libvirt-mdev-live-migrate
Change-Id: I4a7e5292dd3df63943bd9f01803fa933e0466014
The RDP console was only used by the HyperV driver, so the API is
removed. As the API URL stays the same (it is shared with the other
console type APIs), the RDP console API will now return 400.
The related config options are cleaned up and the API reference is
moved to the obsolete section.
The RPC method is kept to avoid errors when an old controller is used
with a new compute; it can be removed in the next RPC version bump.
Change-Id: I8f5755009da4af0d12bda096d7a8e85fd41e1a8c
When a user asks for a hard reboot of a running instance while
nova-compute is unavailable (service stopped or host down), under
certain conditions the instance can stay in the rebooting_hard
task_state after nova-compute starts again. This patch fixes that.
Closes-Bug: #1999674
Change-Id: I170e390fe4e467898a8dc7df6a446f62941d49ff
Now that the source knows that both the computes support the right
libvirt version, it passes to the destination the list of mdevs it has
for the instance. With this change, we verify whether the types of
those mdevs are actually supported by the destination.
On the next change, we'll pass the destination mdevs back to the
source.
Partially-Implements: blueprint libvirt-mdev-live-migrate
Change-Id: Icb52fa5eb0adc0aa6106a90d87149456b39e79c2
We only need to verify that the BDM has an attachment ID and that it
is present in both the Nova and Cinder DBs.
For test coverage, tests are added for a BFV server with different
BDM source types.
Closes-Bug: 2048154
Closes-Bug: 2048184
Change-Id: Icffcbad27d99a800e3f285565c0b823f697e388c
In the previous patch we changed the ordering of operations during
post_live_migration() to minimize guest networking downtime by
activating destination host port bindings as soon as possible.
Review of that patch led to the realization that exceptions during
notification sending can prevent the port binding activation from
happening. Instead of handling that in a localized try/except, this
patch implements a general best_effort kwarg to our two notification
sending helpers to allow callers to indicate that any exceptions
during notification sending should not be fatal.
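The shape of such a kwarg can be sketched like this; the real helpers live in Nova's compute utilities, and `_emit` is a stand-in for the actual notification emission:

```python
# Illustrative sketch of the best_effort kwarg described above; the
# real notification helpers live in Nova's compute utilities and
# _emit is a stand-in for the actual emission code.
import logging

LOG = logging.getLogger(__name__)

def _emit(instance_uuid, action):
    raise RuntimeError("messaging backend down")

def notify_about_instance_action(instance_uuid, action, best_effort=False):
    try:
        _emit(instance_uuid, action)
    except Exception:
        if not best_effort:
            raise
        # The caller asked that notification failures not be fatal,
        # e.g. so port binding activation still happens afterwards.
        LOG.exception("Failed to send notification")

# With best_effort=True the caller can proceed despite the failure.
notify_about_instance_action("uuid-1", "live_migration", best_effort=True)
```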
Change-Id: I01a15d6fffe98816ae019e67dc72784299fedfd3
Previously, we first called Cinder to clean up source volume
connections, then Neutron to activate destination port bindings.
This means that, if Cinder or the storage backend were slow, the live
migrated instance would be without network connectivity during that
entire process.
This patch immediately activates the port bindings on the destination
(and thus gives the instance network connectivity). We just need to
get the old network_info first, in order to use it in notifications
and to stash it for the later call to
driver.post_live_migration_at_source().
This is a smaller and safer change than the parallelization attempt in
the subsequent patch, so it's done in its own patch because it might
be backportable, and would help with network downtime during live
migration.
To avoid any potential data leaks, we want to be certain to cleanup
volumes on the source. To that end we wrap the code that is being
moved before the source volume cleanup code in a try/finally block
in order to prevent any uncaught exception from blocking the cleanup.
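The ordering and the try/finally guarantee can be sketched as follows, with hypothetical helper names:

```python
# Minimal sketch of the ordering and try/finally described above;
# both helper names are hypothetical.
events = []

def activate_dest_port_bindings():
    events.append("ports_activated")
    raise RuntimeError("notification hiccup")  # simulate a failure

def cleanup_source_volumes():
    events.append("volumes_cleaned")

try:
    try:
        # Moved earlier to restore network connectivity sooner.
        activate_dest_port_bindings()
    finally:
        # Must always run, or source volume connections could leak.
        cleanup_source_volumes()
except RuntimeError:
    pass
```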
Change-Id: I700943723a32e732e3e3be825f3fd44a9f923a0b
This change adds the pre-commit config and tox targets to run
codespell both independently and via the pep8 target.
It also corrects all the remaining typos in the codebase as
detected by codespell.
Change-Id: Ic4fb5b3a5559bc3c43aca0a39edc0885da58eaa2
This bumps the version of flake8 and resolves some erroneous failures in
f-strings. A number of new E721 (do not compare types) errors are
picked up, all of which are addressed.
Change-Id: I7a1937b107ff3af8d1e5fe23fc32b120ef4697f7
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
By I0d889691de1af6875603a9f0f174590229e7be18 we broke rebuild for Yoga
or older computes.
By I9660d42937ad62d647afc6be965f166cc5631392 we broke rebuild for Zed
computes.
Fixing this by making the parameters optional.
Change-Id: I0ca04045f8ac742e2b50490cbe5efccaee45c5c0
Closes-Bug: #2040264
When we added the all_cells flag to this method we just hacked it
into place, leaving a big chunk of the method nested inside a
conditional. This refactors that chunk out into a helper, and also
corrects a naming error that was very confusing when reading the code
(a variable named "service" which was actually a list of services).
Change-Id: I41ff076864dce9ed826922f6609536ea4545a181
While debugging a field issue recently, we determined that computes
had been pointed at cell0 and created service and node records there.
This makes us warn during service list if we find compute services
in cell0 to tip off operators that they have a configuration problem.
Change-Id: Id95c0d02cc34348623b01997fcd1930628d48ccc
This mainly fixes typos in the tests and one typo in an exception
message.
Some additional items are added to the dict based on our usage of
vars in tests, but we could remove them later with minor test
updates. They are intentionally not fixed in this commit to limit
scope creep.
Change-Id: Iacfbb0a5dc8ffb0857219c8d7c7a7d6e188f5980
If we are starting up and we think we are a new host (i.e. no pre-
existing service record was found for us), we expect to have no
instances on our hypervisor. If that is not the case, it is likely
that we got pointed at a new fresh database or the wrong database
(i.e. a different cell from our own). In that case, we should abort
startup to avoid creating new service, compute node, and resource
provider records.
This is a sort of follow-on additional improvement related to
work done in blueprint stable-compute-uuid.
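The sanity check described above can be sketched like this; the function and exception names are illustrative, not Nova's actual ones:

```python
# Hedged sketch of the startup check described above; the function
# and exception names are illustrative, not Nova's actual ones.
class InvalidConfiguration(Exception):
    pass

def check_new_host_sanity(is_new_host, driver_instances):
    if is_new_host and driver_instances:
        # A brand-new service record plus pre-existing instances on
        # the hypervisor means we are probably pointed at a fresh or
        # wrong database; abort before creating any records.
        raise InvalidConfiguration(
            "Found %d instances but no service record; aborting "
            "startup" % len(driver_instances))

check_new_host_sanity(True, [])          # genuinely new host: fine
check_new_host_sanity(False, ["inst"])   # known host with instances: fine
```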
Change-Id: Id817c51c90485119270f3b7f3c52858f42100637
This change ensures we only try to clean up dangling BDMs if
Cinder is installed and reachable.
Closes-Bug: #2033752
Change-Id: I0ada59d8901f8620fd1f3dc20d6be303aa7dabca
Many bugs around nova-compute rebalancing are focused around
problems when the compute node and placement resources are
deleted, and sometimes they never get re-created.
To limit this class of bugs, we add a check to ensure a compute
node is only ever deleted when it is known to have been deleted
in Ironic.
There is a risk this might leave orphaned compute nodes and
resource providers that need manual cleanup because users
do not want to delete the node in Ironic, but are removing it
from nova management. But on balance, it seems safer to leave
these cases up to the operator to resolve manually, and collect
feedback on how to better help those users.
blueprint ironic-shards
Change-Id: I7cd9e5ab878cea05462cac24de581dca6d50b3c3
On reboot, check the instance's volume status on the Cinder side.
Verify that the volume exists and Cinder has an attachment ID;
otherwise delete the BDM data from the Nova DB, and vice versa.
Existing test cases are updated to use CinderFixture while rebooting,
as reboot calls get_all_attachments.
Implements: blueprint https://blueprints.launchpad.net/nova/+spec/cleanup-dangling-volume-attachments
Closes-Bug: 2019078
Change-Id: Ieb619d4bfe0a6472aefb118b58283d7ad8d24c29
This patch concerns the window when a VM is being unshelved: the
compute manager sets the task_state to spawning, claims resources for
the VM, and then calls driver.spawn(). At that point the instance is
in vm_state SHELVED_OFFLOADED, task_state spawning.
If a new update_available_resource periodic job starts now, it
collects all the instances assigned to the node to calculate resource
usage. However, the calculation assumed that a VM in
SHELVED_OFFLOADED state does not need a resource allocation on the
node (presumably being removed from the node as it is offloaded) and
deleted the resource claim.
As a result, we ended up with the VM spawned successfully but having
lost its resource claim on the node.
This patch changes what we do in vm_state SHELVED_OFFLOADED,
task_state spawning: we no longer delete the resource claim in this
state and keep tracking the resource in stats.
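The changed state handling can be sketched as follows; the constant values mirror Nova's vm_states/task_states, but this is an illustration only:

```python
# Sketch of the changed state handling described above; the constant
# values mirror Nova's vm_states/task_states, illustration only.
SHELVED_OFFLOADED = "shelved_offloaded"
SPAWNING = "spawning"

def keeps_resource_claim(vm_state, task_state):
    # Previously every SHELVED_OFFLOADED VM was assumed to hold no
    # resources on the node; now the spawning (unshelving) case
    # keeps its claim.
    if vm_state == SHELVED_OFFLOADED:
        return task_state == SPAWNING
    return True
```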
Change-Id: I8c9944810c09d501a6d3f60f095d9817b756872d
Closes-Bug: #2025480
The late anti-affinity check runs in the compute manager to avoid
parallel scheduling requests to invalidate the anti-affinity server
group policy. When the check fails the instance is re-scheduled.
However, this failure was counted as a real instance boot failure of
the compute host and could lead to de-prioritization of the compute
host in the scheduler via the BuildFailureWeigher. As the late
anti-affinity check does not indicate any fault of the compute host
itself, it should not count towards the build failure counter.
This patch adds new build results to handle this case.
Closes-Bug: #1996732
Change-Id: I2ba035c09ace20e9835d9d12a5c5bee17d616718
Signed-off-by: Yusuke Okada <okada.yusuke@fujitsu.com>
This migrates *existing* Instance records with incomplete compute_id
fields, where necessary, while updating node resources. This is done
separately from the earlier patch to do new instances just to
demonstrate in CI that we're still able to run in such a partially-
migrated state.
Related to blueprint compute-object-ids
Change-Id: Ie18342deeddb7f58564953e6b46ea0b0d7495595
This adds the compute_id field to the Instance object and adds
checks in save() and create() to make sure we no longer update node
without also updating compute_id.
Related to blueprint compute-object-ids
Change-Id: I0740a2e4e09a526da8565a18e6761b4dbdc4ec0b
This makes us store the compute_id of the destination node in the
Migration object. Since resize/cold-migration changes the node
affiliation of an instance *to* the destination node *from* the source
node, we need a positive record of the node id to be used. The
destination node sets this to its own node when creating the migration,
and it is used by the source node when the switchover happens.
Because the migration may be backleveled for an older node involved
in that process and thus saved or passed without this field, this
adds a compatibility routine that falls back to looking up the node
by host/nodename.
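The compatibility routine can be sketched like this; the field and helper names here are hypothetical:

```python
# Illustrative sketch of the compatibility routine described above;
# the field and helper names are hypothetical.
def get_dest_node_id(migration, lookup_by_host_nodename):
    if migration.get("dest_compute_id") is not None:
        return migration["dest_compute_id"]
    # The migration passed through an older node that dropped the
    # new field; fall back to the host/nodename lookup.
    return lookup_by_host_nodename(
        migration["dest_compute"], migration["dest_node"])

nodes = {("host2", "node2"): 42}
old_style = {"dest_compute_id": None,
             "dest_compute": "host2", "dest_node": "node2"}
```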
Related to blueprint compute-object-ids
Change-Id: I362a40403d1094be36412f5f7afba00da8af8301