stacktach

Commit Graph

Author	SHA1	Message	Date
OpenDev Sysadmins	fda6435ff3	OpenDev Migration Patch This commit was bulk generated and pushed by the OpenDev sysadmins as a part of the Git hosting and code review systems migration detailed in these mailing list posts: http://lists.openstack.org/pipermail/openstack-discuss/2019-March/003603.html http://lists.openstack.org/pipermail/openstack-discuss/2019-April/004920.html Attempts have been made to correct repository namespaces and hostnames based on simple pattern matching, but it's possible some were updated incorrectly or missed entirely. Please reach out to us via the contact information listed at https://opendev.org/ with any questions you may have.	2019-04-19 19:50:32 +00:00
Monsyne Dragon	02350960d0	Pin librabbitmq due to compilation error. Librabbitmq 2.0.0 has build issues in some environments; this is apparently due to a bug in librabbitmq's C extension. Pin to the previous version (1.6.1) until bug is resolved. Change-Id: I060fbeee176434bfa8e1041fe4b0caaac68992c5	2018-08-14 00:01:31 +00:00
ishamibrahim	865c1aa10d	A python script to clean the stacktach DB. This adds data on instances that are missing or deleted in Nova DB to the stacktach DB instance_deletes table. CHANGE-1 : removing DB connection method from utils.py and checking if an entry in the stacktach_instancedeletes already exists to avoid duplicates. The DB connection method has also been removed from the utils.py file. Change-Id: I7b3fbf41b8ef54e9e6bdb53046c96c2824f64613	2017-09-14 21:13:31 +05:30
Monsyne Dragon	69e04b1c3a	Add Version number in logs at start. Report the version number on startup of verifier and worker daemons to make debugging of deployments easier. Change-Id: Ib6f9008ab103a67d958004e7151f30065daa5a3d	2017-09-05 20:15:02 +00:00
Isham Ibrahim	89ff15391b	VIRT-2874: Using terminated_at value instead of deleted_at from the request to use as the deleted_at field in the DB. Unit test changes also included PHASE-II Changed test cases since the current ones were logically incorrect. Change-Id: Idb3d42a245711fc72c32ab66b395a7aa67e1bb87	2017-07-20 19:27:51 +05:30
Isham Ibrahim	b7aa9c8980	VIRT-2985: Continuing the loop for a batch update of the `exists` pingback from yagi in case of django.ObjectNotExist and django.MultipleObjectExist errors. unit tests for batch update containing django errors have also been updated. SECOND ITERATION - Added another mock patch to fake Django DB transactions. Change-Id: Ia953d9b5393ee315c6296f0f0e9d98ff4c456e0c	2017-07-12 15:49:15 +05:30
Jenkins	d6acee808d	Merge "JIRA VIRT-2986. Added an exponential back off strategy for RabbitMQ connection failure in the callback method of the nova verifier."	2017-06-01 22:26:35 +00:00
Isham Ibrahim	a80b25fcce	VIRT-2996: Handling malformed notifications sent by Nova to stacktach. Before: Stacktach threw an exception and was unable to log the message. It later tried to process the same erroneous notification repeatedly. After: Stacktach catches the malformed notification and logs the message. It then acknowledges the message to RabbitMQ so that it is removed from the queue. Change-Id: I7a33816a7ce4660513b047a7e54c3223a63c8cb3	2017-05-30 20:19:34 +05:30
Isham Ibrahim	1b1a5cc4c9	JIRA VIRT-2986. Added an exponential back off strategy for RabbitMQ connection failure in the callback method of the nova verifier. Change-Id: I8a1c0d14a28e3f0f8f46ba15f14b84dc35fe10ee	2017-05-11 13:15:02 +05:30
Jenkins	0c4b90f1c3	Merge "Update .gitreview for new namespace"	2017-05-08 15:09:52 +00:00
Monsyne Dragon	f5a03f1afe	Fixes for stacktach verifier processes Fix memory usage for verifiers. Events to verify were being loaded from the db into an in-memory fifo queue to spool to worker processes. This was not being limited, resulting in a large amount of memory being used if events were read from the DB faster than they were being processed. This change pauses the loading of events if the in-memory queue grows larger than specified batchsize. Also, verifier child processes were not handling signals (like SIGTERM) properly, resulting in them not shutting down properly. Added proper signal handling. Change-Id: Ife25ca07398acf111f4388071b5f2e4eafeecb05	2016-07-21 22:54:48 +00:00
Monsyne Dragon	0c8ee8fc40	Reset 'Verifying' notifications on verifier start. If the stacktach verifier crashes, notifications 'in-flight' can be stuck in 'verifying' status. This change flips those back to 'pending' so they get processed. Change-Id: Ie4aabed0c4991429a3e18e3b28813917d822867a	2016-03-07 18:05:00 +00:00
Monsyne Dragon	5ac182ac4e	Properly log verifier exceptions. If an exception is thrown in the verifier child process for a specific exchange, log it properly. Was simply printing to stdout, which goes nowhere for daemon processes. Change-Id: I528ad08e70d7bdf03e9a8e1d8abe45d09f2eb476	2016-03-04 21:07:17 +00:00
Monsyne Dragon	03cd412254	Fix missing import in verifier start script. Replace accidentally removed datetime import. Change-Id: I2476d0eb15aca37e01c522e950a23886ce70eff0	2016-02-29 21:45:09 +00:00
Monsyne Dragon	c389b8f8a9	Add watchdog for verifier processes. Add a watchdog in parent process to check verifier child processes, and restart if needed. Change-Id: Icacc4c046a8f4ba949499780cdc4724c9fd54fba	2016-02-25 16:36:02 +00:00
Jeremy Stanley	eb178e736e	Update .gitreview for new namespace Change-Id: I6bd00fd50a2b9a8bbfa6e5561656c1c5377bba61	2015-10-17 22:39:03 +00:00
Min Pae	2b4535c8a2	fixing syntax error on return line Change-Id: Ia28ecf3887374ba4a1c82ca2aa555537991962a6	2015-07-12 17:48:58 -07:00
Monsyne Dragon	82aca32286	Automatically check and restart worker processes. Make the parent worker process automatically restart hung or dead child processes. The parent will check all the child processes every 30 sec to make sure they are still running. If not they will be restarted. Also child processes update a heartbeat timestamp periodically while processing messages. If the parent detects that that timestamp hasn't been updated in a configurable amount of time (default 600sec) it terminates the old process and spins up a new one. Change-Id: I28ffbe64391d04a6e85b7e197393352ee1e978b0	2015-07-09 16:50:55 +00:00
Monsyne Dragon	2fa78b9309	Fix ordering bug processing updates. Occasionally, out of order notifications received from a resize operation would incorrectly produce a verification error, because the launched_at time is changed several times during the operation. Fix this to keep the last chronological launched_at time in the operation, not the last received. Also fix nondeterministic multiprocessing bug that was occasionally causing unittests to hang. Change-Id: Iba8b0bbd0cb8b2b063335ca9ab0ad95cf127087a	2015-06-16 21:31:40 +00:00
Monsyne Dragon	359e1b91ae	Bump Django version to avoid memory leak Django 1.5 has a memory leak (as mentioned in the 1.5.1 release notes: https://www.djangoproject.com/weblog/2013/mar/28/django-151/ ) Bump django requirement to >= 1.5.1 to avoid blowing out memory on stacktach worker processes. Change-Id: If05e05f0c12083bbdc624f1be1461509b10f5011	2015-04-08 19:00:06 +00:00
Monsyne Dragon	3abc36d02c	Fix instance_type_id not always being populated. Fix to make sure instance_type_id is always populated on InstanceUsage, even if the compute.instance.create operation is split across multiple requests. Change-Id: Ic6243e8d5156d0e49a8fa1748a6a152724f01a14	2014-12-17 23:33:07 +00:00
Josh Kearney	fea828ab35	Bump the minimum required version of kombu to 3.0.23. This fixes the bug mentioned here: https://groups.google.com/forum/#!topic/celery-users/2SU8mieMyvE Also add .gitreview for StackForge. Change-Id: Ia3e8ed732e703a7d295ebdf059b4b01a47056d63	2014-10-10 13:18:02 -05:00
Sandy Walsh	7c591f2b57	Fix ordering problem causing tests to fail randomly Iterates the dictionary in the same order when building the mox as the code uses when performing the operations. Also changed the UUIDs to make them a little more distinguishing. Change-Id: I2c43e7f85e1b2655a46c24dc209386fe7fb48fa4	2014-09-15 20:13:39 +00:00
Monsyne Dragon	38590dd5c1	Fix nondeterministic test ordering bug. Fix a unit test bug. Fix was in rackerlabs repo, but somehow never made it to stackforge. Change-Id: Ie8a056e553f7385b335771ede1de1c2a8c01ae7a	2014-08-26 16:01:14 +00:00
Jenkins	b6235230ec	Merge "RM8278: Fixed glance_usage_audit report"	2014-08-13 18:25:56 +00:00
Manali Latkar	4ea5a86036	RM8278: Fixed glance_usage_audit report 1.Fixed unpacking values bug. 2.Corrected audit period from 86399.000001 to 86400 so that it picks up correct status counts. Change-Id: Ieea1f451e6db72aaa6d83696134986a0288bedb1	2014-08-12 16:04:23 +05:30
Manali Latkar	7d77f59478	handling the exception in case no instanceusage with specific instance and launched_at is present Change-Id: I7eb6d0b76aa41b0f26ab4b81fe033ddd527fc331	2014-08-11 16:38:49 +05:30
Jenkins	f0f8d2e9ca	Merge "set config-filename such that verifier can start Fixed a bug that was introduced that prevents verifier from starting without special environment variables et"	2014-08-05 03:53:10 +00:00
Phillip Moore	51cb63b40c	set config-filename such that verifier can start Fixed a bug that was introduced that prevents verifier from starting without special environment variables et Change-Id: I5b69851c688d4fdc43343422df58ec3edde67b0e	2014-07-31 21:23:42 +00:00
Monsyne Dragon	3e4eb35653	Fix db reconnect issue under django 1.6+ In django 1.6 and above, Django's orm layer will not auto-reconnect after it loses the connection to the mysql server until you manually close the database connection. (see: https://code.djangoproject.com/ticket/21597) This is an issue for persistent connections, as MySQL will timeout inactive connections, and any loss of the db connection will cause the stacktach worker to simply repeat the error "MySQL server has gone away" until restarted. This fix will allow the stacktach worker to properly reconnect. Change-Id: I0b0bc75b7e21fd183f3b0e7a55d727ff98d6f02b	2014-07-25 19:38:21 +00:00
Anuj Mathur	bea3a75a35	Nova usage report fix Gracefully handled case where rawdata entry does not exist for request_id while generating nova usage audit report. Change-Id: I675b2b5e9c4be70d45fc2385f1b448c159610f56	2014-05-30 17:03:54 +05:30
Priyanka Agrawal	655d9acf0a	Added try catch for request_id null When the request_id is null, there was an exception generated. Currently setting the deployment to None for the reports in case the request_id is null Change-Id: Idde2178d217ac16f1b3e275c730e3fce68ba9f1b	2014-05-27 12:31:03 +05:30
Sandy Walsh	8a0f06ac79	Freshen up with latest from RackerLabs (and include tox.ini) Added instance hours report Initial version of report to calculate unit hours used for nova instances Breakdown by flavor, flavor class, account/billing types and by tenant. Moved license so script has shebang as the first line Add tenant info cache. Refactor Instance hr report. Added cache table for basic tenant info for reports. Refactor instance_hours report to use table. Improve performance of tenant info update. use bulk sql operations to speed up the tenant info update, as it's taking ~40s/1000 tenants to update on a decent machine. Fix some tests broken by rebase. Fix unittests broken by rebase. Also, renumber migration due to collision. Add Apache license header to new files. Fixed bug with fetching deployment information in reconciler. Reverted old method for fetching current usage's deployment and added new method to fetch latest deployment information for a request_id. Made the field mismatch error message more readable Refactored nova and glance verifier tests the exists are updated with 201 send_status as part of stacktach down repair mechanism Revert "Fixed bug with fetching deployment information in" Revert "Adding host and deployment info to missing exists entries in the nova usage audit" Revert "Added column headers for host and deployment in json reports" Only log ERROR on last retry fixed the wrong status name for sent_failed variable in audit report fixing documentation for urls that are not available for glance deprecating stacky urls (usage, deletes, exists) that are not used anymore Revert "Revert "Added column headers for host and deployment in json reports"" Revert "Revert "Adding host and deployment info to missing exists entries in the nova usage audit"" Revert "Revert "Fixed bug with fetching deployment information in"" Cell and compute info added for verification failures as well. If that is not present (request_id is not populated for an InstanceUsage entry), the cells display '-' Add tox support for move to stackforge Add tox support for move to stackforge Change-Id: Id94c2a7f1f9061e972e90c3f54e39c9dec11943b	2014-05-08 15:58:03 -03:00
Thomas Maddox	6325c1ab5f	Merge pull request #314 from SandyWalsh/apache Switched to Apache licensing	2014-03-28 10:31:28 -05:00
Sandy Walsh	eae07eecac	Switched to Apache licensing	2014-03-28 11:47:16 -03:00
Manali Latkar	fd5810301f	Merge pull request #313 from anujm/add_columns_to_response_header Added column headers for host and deployment in json reports	2014-03-28 12:13:08 +05:30
Anuj Mathur	00bcb90117	Added column headers for host and deployment in json reports	2014-03-28 11:09:42 +05:30
anujm	e2adb255b7	Merge pull request #312 from manalilatkar/add_host_and_deployment_in_audit RM 5495: adding host and deployment info to missing exists entries in the nova usage audit	2014-03-25 20:09:13 +05:30
Manali Latkar	60be387b97	Adding host and deployment info to missing exists entries in the nova usage audit	2014-03-24 15:31:44 +05:30
anujm	b646d6150e	Merge pull request #309 from manalilatkar/os_distro_optional Making the type and null check for os_version optional according to import image_type	2014-03-19 21:17:30 +05:30
Manali Latkar	e2f99d8ae4	Making the type and null check for os_version optional according to import image_type	2014-03-19 17:21:40 +05:30
Sandy Walsh	da5cd479cb	Merge pull request #308 from yottatsa/debian Simple debianization patch	2014-03-19 08:39:31 -03:00
Vladimir Eremin	1193543757	basic debianization	2014-03-19 15:19:09 +04:00
Vladimir Eremin	24c54efcda	basic debianization	2014-03-19 14:49:04 +04:00
anujm	259400cc92	Merge pull request #306 from manalilatkar/store_AH_event_id modified api to store AH event id along with send_status sent by yagi	2014-03-18 20:59:16 +05:30
Manali Latkar	b1db1eb8f3	modified api to store AH event id along with send_status sent by yagi	2014-03-14 16:40:25 +05:30
anujm	9d1a93a49e	Merge pull request #281 from TelekomLabs/log_decode_errors log decode errors, do not ack.	2014-03-14 15:56:43 +05:30
Bernhard K. Weisshuhn	4f60f245cf	log decode errors, do not ack.	2014-03-14 10:34:56 +01:00
anujm	1c7cbab784	Merge pull request #305 from anujm/add_logging_for_rabbitmq Added logging on RabbitMQ connection error and revival	2014-03-13 20:26:17 +05:30
Anuj Mathur	c2aa435869	Added logging on RabbitMQ connection error and revival	2014-03-13 15:26:25 +05:30

792 Commits