Commit Graph

80 Commits

Author SHA1 Message Date
Stephen Finucane 43a5f3984e db: Remove layer of indirection
We don't have another ORM to contend with here. Simplify
'heat.db.sqlalchemy' to 'heat.db'.

Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Change-Id: Id1db6c0ff126859f436c6c9b1187c250f38ebb62
2023-03-25 12:02:27 +09:00
Zane Bitter 5c326c22df Simplify logic in retrigger_check_resource()
The node to retrigger (cleanup or update) depends only on whether the
update node appears in the new traversal's graph, not on what type of
node in the old traversal was blocking the new one. Simplify the logic
and remove the unused parameter.

Also use the ConvergenceNode named tuple instead of raw tuples
everywhere.

Change-Id: I00aecb2b4b52d3d759446f22c69891fb85c4c735
2020-04-30 10:51:45 -04:00
Zane Bitter 38614a78c1 Add unit test for nested stack cancel
Test that when cancelling a nested stack, its children also get
cancelled.

Change-Id: Icfd4ef1654dd141d17541bed48fee412001efdec
2019-10-29 23:18:13 -04:00
Zane Bitter e63778efc9 Eliminate client race condition in convergence delete
Previously when doing a delete in convergence, we spawned a new thread to
start the delete. This was to ensure the request returned without waiting
for potentially slow operations like deleting snapshots and stopping
existing workers (which could have caused RPC timeouts).

The result, however, was that the stack was not guaranteed to be
DELETE_IN_PROGRESS by the time the request returned. In the case where a
previous delete had failed, a client request to show the stack issued soon
after the delete had returned would likely show the stack status as
DELETE_FAILED still. Only a careful examination of the updated_at timestamp
would reveal that this corresponded to the previous delete and not the one
just issued. In the case of a nested stack, this could leave the parent
stack effectively undeletable. (Since the updated_at time is not modified
on delete in the legacy path, we never checked it when deleting a nested
stack.)

To prevent this, change the order of operations so that the stack is first
put into the DELETE_IN_PROGRESS state before the delete_stack call returns.
Only after the state is stored, spawn a thread to complete the operation.

Since there is no stack lock in convergence, this gives us the flexibility
to cancel other in-progress workers after we've already written to the
Stack itself to start a new traversal.

The previous patch in the series means that snapshots are now also deleted
after the stack is marked as DELETE_IN_PROGRESS. This is consistent with
the legacy path.

Change-Id: Ib767ce8b39293c2279bf570d8399c49799cbaa70
Story: #1669608
Task: 23174
2018-07-30 20:48:28 -04:00
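The reordering this commit describes can be sketched as follows. This is an illustrative Python sketch, not Heat's actual code; the class and function names are hypothetical.

```python
import threading

class FakeStack:
    """Stands in for a Heat stack; state_set models the DB write."""
    def __init__(self):
        self.state = 'DELETE_FAILED'   # a previous delete failed

    def state_set(self, state):
        self.state = state

def delete_stack(stack, slow_cleanup):
    # Persist DELETE_IN_PROGRESS first, so a client that shows the stack
    # right after this call returns sees the correct status.
    stack.state_set('DELETE_IN_PROGRESS')
    # Only then spawn a thread for the potentially slow operations
    # (deleting snapshots, stopping existing workers).
    t = threading.Thread(target=slow_cleanup)
    t.start()
    return t

stack = FakeStack()
done = []
t = delete_stack(stack, lambda: done.append(True))
assert stack.state == 'DELETE_IN_PROGRESS'  # visible immediately
t.join()
assert done == [True]                       # slow work still completes
```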
Zane Bitter 6a176a270c Use a namedtuple for convergence graph nodes
The node key in the convergence graph is a (resource id, update/!cleanup)
tuple. Sometimes it would be convenient to access the members by name, so
convert to a namedtuple.

Change-Id: Id8c159b0137df091e96f1f8d2312395d4a5664ee
2017-09-26 16:46:17 -04:00
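The conversion can be illustrated with a minimal sketch; the field names below are assumptions, not necessarily the ones Heat uses.

```python
import collections

# Graph nodes were (resource id, update/!cleanup) tuples; a namedtuple
# keeps tuple behaviour while allowing access by name.
ConvergenceNode = collections.namedtuple('ConvergenceNode',
                                         ['rsrc_id', 'is_update'])

node = ConvergenceNode(rsrc_id=42, is_update=True)

assert node.rsrc_id == 42          # access by name...
assert node[1] is True             # ...or by index, as before
assert node == (42, True)          # still compares equal to a raw tuple
```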
Jenkins b4a1ad2bd5 Merge "Avoid creating two Stacks when loading Resource" 2017-08-14 12:23:17 +00:00
ricolin 552f94b928 Add converge flag in stack update for observing on reality
Add a converge parameter to the stack update API and RPC call,
allowing a stack update to trigger observation of reality. It is
triggered by an API call with a converge argument (with a True or
False value). This flag also works for resources within nested
stacks.
Implements bp get-reality-for-resources

Change-Id: I151b575b714dcc9a5971a1573c126152ecd7ea93
2017-08-07 05:39:29 +00:00
Zane Bitter 960f626c24 Avoid creating two Stacks when loading Resource
When load()ing a Resource in order to check it, we must load its definition
from whatever version of the template it was created or last updated with.
Previously we created a second Stack object with that template in order to
obtain the resource definition. Since all we really need in order to obtain
this is the StackDefinition, create just that instead.

Change-Id: Ia05983c3d1b838d2e28bb5eca38d13e83ccaf368
Implements: blueprint stack-definition
2017-07-21 10:44:51 -04:00
Jenkins 224a83821a Merge "Fix _retrigger_replaced in convergence worker" 2017-07-21 10:53:03 +00:00
Zane Bitter 33a16aa7a8 Log unhandled exceptions in worker
RPC calls to the worker use 'cast', so nothing is listening to find out the
result. If an exception occurs we will never hear about it. This change
logs such unhandled exceptions as errors.

Change-Id: I51365a9dee8fd4eff85e77d3e42bf33be814a22c
Partial-Bug: #1703043
2017-07-10 16:43:38 -04:00
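The pattern can be sketched with a decorator. This is an illustrative sketch under assumed names; check_resource here is a hypothetical handler, not Heat's actual implementation.

```python
import functools
import logging

LOG = logging.getLogger(__name__)

# Since 'cast' RPC calls have no caller waiting for a result, wrap
# handlers so any unhandled exception is at least logged as an error
# instead of vanishing with the worker thread.
def log_exceptions(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            LOG.exception('Unhandled exception in %s', func.__name__)
    return wrapper

@log_exceptions
def check_resource(resource_id):  # hypothetical handler name
    raise RuntimeError('boom')

check_resource(1)  # logs the error; does not propagate
```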
ricolin 6d7506c690 Fix _retrigger_replaced in convergence worker
Fix missing argument in _retrigger_replaced when calling
CheckResource.
Closes-Bug: #1702487

Change-Id: Idc81b50fcc7036aa90f1489a348572ef03aa3381
2017-07-06 10:25:23 +08:00
Zane Bitter 5681e237c5 Avoid creating new resource with old template
If a traversal is interrupted by a fresh update before a particular
resource is created, then the resource is left stored in the DB with the
old template ID. While an update always uses the new template, a create
assumes that the template ID in the DB is correct. Since the resource has
never been created, the new traversal will create it using the old
template.

To resolve this, detect the case where the resource has not been created
yet and we are about to create it and the traversal ID is still current,
and always use the new resource definition in that case.

Change-Id: Ifa0ce9e1e08f86b30df00d92488301ea05b45b14
Closes-Bug: #1663745
2017-06-05 23:14:19 -04:00
liyi 8f10215ffd Remove log translations
Log messages are no longer being translated. This removes all use of
the _LE, _LI, and _LW translation markers to simplify logging and to
avoid confusion with new contributions.

See:
http://lists.openstack.org/pipermail/openstack-i18n/2016-November/002574.html
http://lists.openstack.org/pipermail/openstack-dev/2017-March/113365.html

Change-Id: Ieec8028305099422e1b0f8fc84bc90c9ca6c694f
2017-03-25 17:11:50 +08:00
Zane Bitter bc4fde4dce Add a NodeData class to roll up resource data
Formalise the format for the output data from a node in the convergence
graph (i.e. resource reference ID, attributes, &c.) by creating an object
with an API rather than ad-hoc dicts.

Change-Id: I7a705b41046bfbf81777e233e56aba24f3166510
Partially-Implements: blueprint stack-definition
2017-02-24 10:10:26 -05:00
Thomas Herve 84067dba88 Remove db.api wrapper
The db.api module provides a useless indirection to the only
implementation we ever had, sqlalchemy. Let's use that directly instead
of the wrapper.

Change-Id: I80353cfed801b95571523515fd3228eae45c96ae
2016-12-13 09:40:29 +01:00
Jenkins bcf3889774 Merge "Cleanup service usage" 2016-11-22 13:14:09 +00:00
Crag Wolfe 892a4eac36 Do not load templates in stop_traversal
When iterating through nested stacks in stop_traversal, there is no
need to load or process templates.

Change-Id: If2795cff4a9e7052e2186c811cdcd3e9451f9ff6
2016-11-07 11:27:21 -08:00
Thomas Herve 34f6ff920e Cleanup service usage
oslo_service Service usage in the engine was slightly wrong: we
inherited from the base class without using its threadgroup, and we also
inherited from it in utility classes that were not real services. This
cleans those up.

Change-Id: I0f902afb2b4fb03c579d071f9b502e3108aa460a
2016-11-03 07:59:10 +01:00
zhufl 5c74723f5e Add missing %s in print message
This adds the missing %s to a print message.

Change-Id: Ibfc88c579442c38b5c58babae358d113c85c4172
2016-09-21 10:58:35 +08:00
Jenkins 07808e280a Merge "Re-trigger on update-replace" 2016-09-20 23:26:40 +00:00
Anant Patil 99b055b423 Re-trigger on update-replace
It was found that the interleaving of locks when an update-replace of a
resource is needed is the reason the new traversal was not triggered.

Consider the order of events below:
1. A server is being updated. The worker locks the server resource.
2. A rollback is triggered because someone cancelled the stack.
3. As part of rollback, new update using old template is started.
4. The new update tries to take the lock but it has already been
acquired in (1). The new update now expects that when the old
resource is done, it will re-trigger the new traversal.
5. The old update decides to create a new resource for replacement. The
replacement resource is initiated for creation, a check_resource RPC
call is made for new resource.
6. A worker, possibly in another engine, receives the call and then it
bails out when it finds that there is a new traversal initiated (from
2). Now, there is no progress from here because it is expected (from 4)
that there will be a re-trigger when the old resource is done.

This change takes care of re-triggering the new traversal from worker
when it finds that there is a new traversal and an update-replace. Note
that this issue will not be seen when there is no update-replace
because the old resource will finish (either fail or complete) and in
the same thread it will find the new traversal and trigger it.

Closes-Bug: #1625073
Change-Id: Icea5ba498ef8ca45cd85a9721937da2f4ac304e0
2016-09-20 11:58:24 +00:00
Anant Patil bc2e136fe3 Cancel traversal of nested stack
The stack cancel update would halt the parent stack from propagating,
but the nested stacks kept going until they either failed or completed.
This is not desired: the cancel update should stop all the nested
stacks from moving further, though it shouldn't abruptly stop the
currently running workers.

Change-Id: I3e1c58bbe4f92e2d2bfea539f3d0e861a3a7cef1
Co-Authored-By: Zane Bitter <zbitter@redhat.com>
Closes-Bug: #1623201
2016-09-15 10:30:58 -04:00
Anant Patil 2e281df428 Fix sync point delete
When a resource failed, the stack state was set to FAILED and the
current traversal was set to an empty string. The actual traversal was
lost and there was no way to delete the sync points belonging to that
traversal.

This change keeps the current traversal when you do a state set, so that
later you can delete the sync points belonging to it. Also, the current
traversal is set to empty when the stack has failed and there is no need
to rollback.

Closes-Bug: #1618155

Change-Id: Iec3922af92b70b0628fb94b7b2d597247e6d42c4
2016-09-14 17:04:22 +05:30
Anant Patil 873a40851d Convergence: basic framework for cancelling workers
Implements mechanism to cancel existing workers (in_progress resources).
The stack-cancel-update request lands in one of the engines, and if
there are any workers in that engine which are working for the stack,
they are cancelled first and then other engines are requested to cancel
the workers.

Change-Id: I464c4fdb760247d436473af49448f7797dc0130d
2016-09-10 09:22:36 +02:00
Zane Bitter 9c79ee4d69 Add interrupt points for convergence check-resource operations
This allows a convergence operation to be cancelled at an appropriate point
(i.e. between steps in a task) by sending a message to a queue.

Note that there's no code yet to actually cancel any operations
(specifically, sending a cancel message to the stack will _not_ cause the
check_resource operations to be cancelled under convergence).

Change-Id: I9469c31de5e40334083ef1dd20243f2f6779549e
Related-Bug: #1545063
Co-Authored-By: Anant Patil <anant.patil@hpe.com>
2016-08-26 11:02:45 +00:00
Anant Patil 084d0eb20f Convergence cancel update implementation
Implements:
(1) stack-cancel-update <stack_id> will start another update using the
previous template/environment. We'll start rolling back; in-progress
resources will be allowed to complete normally.
(2) stack-cancel-update <stack_id> --no-rollback will set the
traversal_id to None so no further resources will be updated;
in-progress resources will be allowed to complete normally.

Change-Id: I46ebdebb130be7410abe3e0b62f85da9856287b6
2016-08-23 17:01:57 +05:30
Anant Patil 459086f984 Convergence: Cancel message
Implements a cancel message sending mechanism.

A cancel message is sent to heat engines working on the stack.

Change-Id: I3b529addbd02a79364f7f2a041fc87d5019dd5d9
Partial-Bug: #1533176
2016-07-05 07:52:03 +00:00
Jenkins 98b5f3b79c Merge "Convergence: Refactor worker" 2016-05-12 07:13:23 +00:00
Rabi Mishra 51d913a30d Check for worker_service initialization
When stopping the engine, check whether the worker_service is
initialized before stopping it.

Change-Id: I876c2cef4bf6589b9bc45f58b5cd52ed0323c9e9
Closes-Bug: #1572851
2016-04-25 08:33:30 -05:00
Anant Patil 829e80d06e Convergence: Refactor worker
Refactor the worker service; move the check resource code to its own
class in another file and keep the convergence worker RPC API clean.

This refactor will help us contain the convergence logic in a separate
class file instead of in the RPC API. The RPC service class should only
have the APIs it implements.

Change-Id: Ie9cf4daba7e6bf61f4cac3388494e8c9efefa4d7
2016-04-22 12:52:16 +00:00
Jenkins 9d03183ab5 Merge "Use EntityNotFound instead of SyncPointNotFound" 2016-03-30 07:35:50 +00:00
Anant Patil afd08e07b5 Convergence: Avoid cache when resolving input data
While constructing input-data for building the cache, the resource
attributes must resolve without hitting the cache again. It is
unnecessary to look into the cache when resolving attributes of a
freshly baked resource.

Change-Id: I0893c17d87c687ca5cf370c4443f471160bd2f3c
2016-03-08 06:54:06 +00:00
Thomas Herve c4f8db9681 Add function tests for event sinks
Add a new functional test using Zaqar as a target for event sinks. This
fixes the behavior when convergence is on.

Change-Id: I4bbdec55b98d0a261168229540a411d423e9406d
2016-02-22 09:41:13 +00:00
ricolin 0c8d9145da Use EntityNotFound instead of SyncPointNotFound
Unify the NotFound exception with EntityNotFound.

Change-Id: I0c69596eb332b768a606c7b11ef768c4a1404d2e
Depends-On: I782c372723f188bab38656e5b7cc401d23808ffb
2016-01-17 06:19:52 +00:00
Anant Patil b84417b6ce Convergence: Pick resource from dead engine worker
When an engine worker crashes or is restarted, the resources being
provisioned in it remain in the IN_PROGRESS state. The next stack update
should pick up these resources and work on them. The implementation is to
set the status of the resource to FAILED and re-trigger check_resource.

Change-Id: Ib7fd73eadd0127f8fae47881b59388b31131daf4
Closes-Bug: #1501161
2016-01-06 16:01:08 +05:30
Rakesh H S 24d265327e Convergence: Re-trigger failed resource for latest traversal
Presently, when a resource of a previous traversal completes its action
successfully, we re-trigger this resource for the latest traversal
(since the latest traversal will be waiting for its completion).

However, if a resource of a previous traversal fails, we do not
re-trigger it, which leaves the latest traversal waiting endlessly.

This patch re-triggers the resource for latest traversal even when
the resource fails.

Change-Id: I9f70878ad7f1ff7c2facb950e496681425b54fc4
Partial-Bug: #1512343
2015-11-26 09:46:08 +00:00
Anant Patil 634c24ecfe Convergence: Concurrency subtle issues
To avoid certain concurrency related issues, the DB update API needs to
be given the traversal ID of the stack intended to be updated. By making
this change, we can avoid having the following at all the places:

    if current_traversal != stack.current_traversal:
        return

The check for the current traversal should be implicit, as part of the
stack's store and state_set methods, where self.current_traversal should
be used as the expected traversal to be updated. All the state changes
or updates to the stack object in the DB go through this implicit check
(using update ... where).

When stack updates are triggered, the current traversal should be backed
up as the previous traversal, a new traversal should be generated, and
the stack should be stored in the DB with the expected traversal set to
the previous traversal. This will ensure that no two updates can
simultaneously succeed on the same stack with the same traversal ID.
This was one of our primary goals.

Following example cases describe the issues we encounter:

1. When 2 updates, U1 and U2 try to update a stack concurrently:

    1. Current traversal(CT) is X
    2. U1 loads stack with CT=X
    3. U2 loads stack with CT=X
    4. U2 stores the stack and updates CT=Y
    5. U1 stores the stack and updates the CT=Z

    Both the updates have succeeded, and both would be running until
    one of the workers checks stack.current_traversal == current_traversal
    and bails out.

    Ideally, U1 should have failed: only one should be allowed in case
    of concurrent update. When both U1 and U2 pass X as the expected
    traversal ID of the stack, then this problem is solved.

2. A resource R is being provisioned for stack with current traversal
   CT=X:

    1. A new update U is issued; it loads the stack with CT=X.
    2. Resource R fails and loads the stack with CT=X to mark it as FAILED.
    3. Update U updates the stack with CT=Y and goes ahead with sync_point
       etc., marks stack as UPDATE_IN_PROGRESS
    4. Resource R marks the stack as UPDATE_FAILED, which to the user means
       that update U has failed, but it is actually still in progress.

    With this patch, when Resource R fails, it will supply CT=X as
    expected traversal to be updated and will eventually fail because
    update U with CT=Y has taken over.

Partial-Bug: #1512343
Change-Id: I6ca11bed1f353786bb05fec62c89708d98159050
2015-11-26 09:45:49 +00:00
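The update...where pattern described above can be sketched with SQLite; the schema and function names are illustrative, not Heat's actual ones. The WHERE clause makes the traversal check atomic with the write, so only one of two concurrent updates can succeed.

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE stack (id INTEGER PRIMARY KEY, traversal TEXT)')
db.execute("INSERT INTO stack VALUES (1, 'X')")

def stack_update(stack_id, new_traversal, expected_traversal):
    # Only writes if the stored traversal still matches the one the
    # caller loaded; a lost race means zero rows affected.
    cur = db.execute(
        'UPDATE stack SET traversal = ? WHERE id = ? AND traversal = ?',
        (new_traversal, stack_id, expected_traversal))
    return cur.rowcount == 1

# U1 and U2 both loaded the stack at traversal X; only one may win.
assert stack_update(1, 'Y', 'X') is True    # U2 wins
assert stack_update(1, 'Z', 'X') is False   # U1 fails: traversal is now Y
```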
Rakesh H S 77c11d037c Convergence: Load resource stack with correct template
When loading a resource, load the stack with the template of the
resource. The appropriate stack needs to be assigned to the resource
(resource.stack), else resource actions will fail.

Co-Authored-By: Anant Patil <anant.patil@hp.com>
Partial-Bug: #1512343

Change-Id: Ic4526152c8fd027049514b71554036321a61efd2
2015-11-26 14:05:21 +05:30
Peter Razumovsky 2da170c435 Fix [H405] pep rule in heat/engine
Fix [H405] rule in heat/engine python
files.

Implements bp docstring-improvements

Change-Id: Iaa1541eb03c4db837ef3a0e4eb22393ba32e270f
2015-09-21 14:51:46 +03:00
Rakesh H S 1956ddd2a6 Convergence: Store resource status in cache data
Fix failing convergence gate functional tests
- store resource uuid, action, status in cache data. Most of the code
requires the resource to have proper status and uuid to work.
- initialize rsrc._data to None so that the resource data is fetched
from the db the first time.

Change-Id: I7309c7da8fe1ce3e1c7e3d3027dea2e400111015
Co-Authored-By: Anant Patil <anant.patil@hp.com>
Partial-Bug: #1492116
Closes-Bug: #1495094
2015-09-14 17:29:18 +05:30
Oleksii Chuprykov f1b2d9add5 Move Resource exceptions to common module (4)
It is convenient to have all exceptions in the exception module.
It also reduces namespace cluttering of the resource module and
decreases the number of dependencies in other modules (we do not need
to import resource in some cases now).
The UpdateInProgress exception is moved in this patch.

Change-Id: If694c264639bbce5334e1e6e7403b225ce1d3aee
2015-09-04 11:24:47 +00:00
Oleksii Chuprykov 4e2cfb991a Move Resource exceptions to common module (1)
It is convenient to have all exceptions in the exception module.
It also reduces namespace cluttering of the resource module and
decreases the number of dependencies in other modules (we do not need
to import resource in some cases now).
The UpdateReplace exception is moved in this patch.

Change-Id: Ief441ca2022a0d50e88d709d1a062631479715b7
2015-09-04 14:23:53 +03:00
Angus Salkeld dd0859a080 Convergence: add support for the path_component
store the attr name and path so attributes don't get shadowed
e.g. get_attr: [res1, attr_x, show]
     get_attr: [res1, attr_x, something]

Change-Id: I724e91b32776aa5813d2b821c2062424e0635a69
2015-09-01 12:53:05 +05:30
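A minimal sketch of the idea, with hypothetical names: cache attribute values under the full (name, path) key rather than the bare attribute name, so different paths of the same attribute don't shadow each other.

```python
cache = {}

def cache_attr(name, path, value):
    # Key on the whole path, not just the attribute name, so that
    # [attr_x, show] and [attr_x, something] are distinct entries.
    cache[(name, tuple(path))] = value

cache_attr('attr_x', ['show'], 'a')
cache_attr('attr_x', ['something'], 'b')

assert cache[('attr_x', ('show',))] == 'a'
assert cache[('attr_x', ('something',))] == 'b'   # not shadowed
```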
Angus Salkeld 881e4d051a Convergence: input_data physical_resource_id -> reference_id
1. we are caching the result of FnGetRefId which can be the name
2. cache_data_resource_attribute() was trying to access "attributes"
   instead of "attrs".

Change-Id: I59d55dcee2af521924fdb5da14e012dcc7b4dd3f
2015-08-18 12:06:36 +10:00
Anant Patil b5968ef068 Convergence: Implementation of timeout
The resource provisioning work is distributed among heat engines, so the
timeout also has to be distributed and brought down to resource-level
granularity.

Thus,
1. Before invoking check_resource on a resource, ensure that the stack
has not timed out.
2. Pass the remaining amount of time to the resource converge method so
that it can raise a timeout exception if it cannot finish in the
remaining time.

Once a timeout exception is raised by a resource converge method, the
corresponding stack is marked as FAILED with "Timed out" as the failure
reason. Then, if rollback is enabled on the stack, it is triggered.

Change-Id: Id1806d546c67505137f57f72d5b463dc229a666d
2015-08-07 10:05:30 +05:30
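The two steps above can be sketched as a single helper; this is an illustrative sketch with assumed names, not Heat's actual code.

```python
import time

def remaining_time(stack_started_at, timeout_secs):
    # Called before each check_resource: fail fast if the stack has
    # already timed out, otherwise return the budget left so the
    # resource's converge method can bound its own work.
    elapsed = time.monotonic() - stack_started_at
    left = timeout_secs - elapsed
    if left <= 0:
        raise TimeoutError('Timed out')  # stack marked FAILED upstream
    return left

start = time.monotonic()
assert remaining_time(start, 3600) > 0      # plenty of budget left
try:
    remaining_time(start - 7200, 3600)      # started two hours ago
    assert False, 'expected TimeoutError'
except TimeoutError:
    pass
```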
Jenkins ae7cb9bfcb Merge "Convergence: Refactor convergence dependency" 2015-08-04 11:26:22 +00:00
Sirushti Murugesan 5d1027a135 Convergence: Do create operation only if action is INIT
All resources that are new will have an INIT state. Instead
of having a complex strategy to decide whether the resource
should be created or updated, just check for the action
to see if it is in the INIT state or not. If it is not, then
always trigger the update workflow.

Also, this fixes a bug where we triggered a create for a resource
without a resource id that should originally have been updated: it was
in UPDATE_FAILED, which was the unhandled case.

Change-Id: I3f2318fecfe76592e8b54e9c09fdf1614197e83f
2015-08-03 19:12:27 +05:30
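The simplified decision described above can be sketched in a few lines; the helper name and action strings are illustrative assumptions.

```python
INIT = 'INIT'

def choose_workflow(action):  # hypothetical helper, not Heat's code
    # A resource that has never been created still has the INIT action;
    # anything else (including a failed update) routes to update.
    return 'create' if action == INIT else 'update'

assert choose_workflow('INIT') == 'create'
assert choose_workflow('UPDATE') == 'update'
# The previously unhandled case: a failed update with no physical
# resource id still goes through update, not create.
assert choose_workflow('UPDATE_FAILED') == 'update'
```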
Anant Patil cd3931c635 Convergence: Refactor convergence dependency
A new property is added to fetch convergence dependencies from the
stack.

Change-Id: If2eb29f9222f21390513fad5702dc4718d5c4165
2015-08-01 04:20:56 +00:00
Angus Salkeld d23ebb6065 Convergence: clarify what "data" is
Mostly in the worker we have arguments called "data"; it is not clear
whether these are serialized or not (and whether they have adopt data
in them).

1. split adopt data out (add RPC support for the new argument)
2. name arguments "resource_data" for deserialized data
3. name arguments "rpc_data" for serialized data
4. make sure all data into client.check_resource() is serialized

Change-Id: Ie6bd0e45d2857d3a23235776c2b96cce02cb711a
2015-08-01 04:19:33 +00:00
Angus Salkeld 14897230fb Clean up the worker service logging
1. remove the duplication between service.py and worker.py
2. use the topic, version & engine_id when logging

Change-Id: I2b7dfbbe1d5a68a9f1739ab53ba5c08691b495e1
2015-08-01 04:19:15 +00:00