Commit Graph

792 Commits

Author SHA1 Message Date
OpenDev Sysadmins fda6435ff3 OpenDev Migration Patch
This commit was bulk generated and pushed by the OpenDev sysadmins
as a part of the Git hosting and code review systems migration
detailed in these mailing list posts:

http://lists.openstack.org/pipermail/openstack-discuss/2019-March/003603.html
http://lists.openstack.org/pipermail/openstack-discuss/2019-April/004920.html

Attempts have been made to correct repository namespaces and
hostnames based on simple pattern matching, but it's possible some
were updated incorrectly or missed entirely. Please reach out to us
via the contact information listed at https://opendev.org/ with any
questions you may have.
2019-04-19 19:50:32 +00:00
Monsyne Dragon 02350960d0 Pin librabbitmq due to compilation error.
Librabbitmq 2.0.0 has build issues in some environments, apparently
due to a bug in librabbitmq's C extension.

Pin to the previous version (1.6.1) until bug is resolved.

Change-Id: I060fbeee176434bfa8e1041fe4b0caaac68992c5
2018-08-14 00:01:31 +00:00
ishamibrahim 865c1aa10d A python script to clean the stacktach DB. This adds data on instances that are missing or deleted in Nova DB to the stacktach DB instance_deletes table.
CHANGE-1: Removed the DB connection method from utils.py, and added a check for an existing entry in stacktach_instancedeletes to avoid duplicates.

Change-Id: I7b3fbf41b8ef54e9e6bdb53046c96c2824f64613
2017-09-14 21:13:31 +05:30
Monsyne Dragon 69e04b1c3a Add Version number in logs at start.
Report the version number on startup of verifier and worker daemons
to make debugging of deployments easier.

Change-Id: Ib6f9008ab103a67d958004e7151f30065daa5a3d
2017-09-05 20:15:02 +00:00
Isham Ibrahim 89ff15391b VIRT-2874: Using terminated_at value instead of deleted_at from the request to use as the deleted_at field in the DB. Unit test changes also included
PHASE-II Changed test cases since the current ones were logically incorrect.

Change-Id: Idb3d42a245711fc72c32ab66b395a7aa67e1bb87
2017-07-20 19:27:51 +05:30
Isham Ibrahim b7aa9c8980 VIRT-2985: Continuing the loop for a batch update of the `exists` pingback from yagi in case of django.ObjectNotExist and django.MultipleObjectExist errors. unit tests for batch update containing django errors have also been updated.
SECOND ITERATION - Added another mock patch to fake Django DB transactions.

Change-Id: Ia953d9b5393ee315c6296f0f0e9d98ff4c456e0c
2017-07-12 15:49:15 +05:30
Jenkins d6acee808d Merge "JIRA VIRT-2986. Added an exponential back off strategy for RabbitMQ connection failure in the callback method of the nova verifier." 2017-06-01 22:26:35 +00:00
Isham Ibrahim a80b25fcce VIRT-2996: Handling malformed notifications sent by Nova to stacktach.
Before: Stacktach threw an exception and was unable to log the message. It then tried to process the same erroneous notification repeatedly.
After: Stacktach catches the malformed notification and logs the message. It then acknowledges the message to RabbitMQ so that it is removed from the queue.

Change-Id: I7a33816a7ce4660513b047a7e54c3223a63c8cb3
2017-05-30 20:19:34 +05:30
Isham Ibrahim 1b1a5cc4c9 JIRA VIRT-2986. Added an exponential back off strategy for RabbitMQ connection failure in the callback method of the nova verifier.
Change-Id: I8a1c0d14a28e3f0f8f46ba15f14b84dc35fe10ee
2017-05-11 13:15:02 +05:30
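Illustration (not taken from the commit): a minimal sketch of an exponential back-off loop of the kind the message above describes. The names `backoff_delays` and `call_with_backoff`, and the default base, factor, cap and retry count, are all assumptions made for this example; the actual verifier code may be structured quite differently.

```python
import time


def backoff_delays(base=1.0, factor=2.0, cap=60.0, retries=6):
    """Yield exponentially growing delays, each capped at `cap` seconds."""
    delay = base
    for _ in range(retries):
        yield min(delay, cap)
        delay *= factor


def call_with_backoff(connect, exceptions=(ConnectionError,),
                      sleep=time.sleep, **kwargs):
    """Call `connect` until it succeeds, sleeping longer after each failure.

    `connect` is a hypothetical zero-argument callable standing in for
    whatever opens the RabbitMQ connection. The last error is re-raised
    once all retries are used up.
    """
    last_error = None
    for delay in backoff_delays(**kwargs):
        try:
            return connect()
        except exceptions as exc:
            last_error = exc
            sleep(delay)
    raise last_error
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a caller would normally leave it at `time.sleep`.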
Jenkins 0c4b90f1c3 Merge "Update .gitreview for new namespace" 2017-05-08 15:09:52 +00:00
Monsyne Dragon f5a03f1afe Fixes for stacktach verifier processes
Fix memory usage for verifiers. Events to verify were being loaded
from the db into an in-memory fifo queue to spool to worker processes.
This was not being limited, resulting in a large amount of memory
being used if events were read from the DB faster than they were
being processed. This change pauses the loading of events if the
in-memory queue grows larger than specified batchsize.

Also, verifier child processes were not handling signals (like SIGTERM)
properly, resulting in them not shutting down properly.
Added proper signal handling.

Change-Id: Ife25ca07398acf111f4388071b5f2e4eafeecb05
2016-07-21 22:54:48 +00:00
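Illustration (not taken from the commit): the memory fix described above amounts to pausing the loader whenever the in-memory queue is longer than the batch size. The sketch below shows that idea in a single process; `spool`, `fetch_batch` and `process` are hypothetical names, and the real verifier hands events to worker processes rather than calling a function inline.

```python
from collections import deque


def spool(fetch_batch, process, batchsize=3):
    """Drain events through a bounded in-memory FIFO.

    `fetch_batch(n)` returns up to `n` events (an empty list when none are
    left) and `process(event)` handles one event. Returns the largest
    queue length seen, to show that the backlog stays bounded.
    """
    queue = deque()
    exhausted = False
    high_water = 0
    while queue or not exhausted:
        # Pause loading while the backlog is larger than batchsize.
        if not exhausted and len(queue) <= batchsize:
            batch = fetch_batch(batchsize)
            if batch:
                queue.extend(batch)
            else:
                exhausted = True
        high_water = max(high_water, len(queue))
        if queue:
            process(queue.popleft())
    return high_water
```

With this rule the queue can never hold more than twice `batchsize` events, regardless of how fast the database can be read.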
Monsyne Dragon 0c8ee8fc40 Reset 'Verifying' notifications on verifier start.
If the stacktach verifier crashes, notifications
'in-flight' can be stuck in 'verifying' status.

This change flips those back to 'pending' so they get
processed.

Change-Id: Ie4aabed0c4991429a3e18e3b28813917d822867a
2016-03-07 18:05:00 +00:00
Monsyne Dragon 5ac182ac4e Properly log verifier exceptions.
If an exception is thrown in the verifier child process for a
specific exchange, log it properly.

Was simply printing to stdout, which goes nowhere for daemon processes.

Change-Id: I528ad08e70d7bdf03e9a8e1d8abe45d09f2eb476
2016-03-04 21:07:17 +00:00
Monsyne Dragon 03cd412254 Fix missing import in verifier start script.
Replace accidentally removed datetime import.

Change-Id: I2476d0eb15aca37e01c522e950a23886ce70eff0
2016-02-29 21:45:09 +00:00
Monsyne Dragon c389b8f8a9 Add watchdog for verifier processes.
Add a watchdog in parent process to check verifier child processes,
and restart if needed.

Change-Id: Icacc4c046a8f4ba949499780cdc4724c9fd54fba
2016-02-25 16:36:02 +00:00
Jeremy Stanley eb178e736e Update .gitreview for new namespace
Change-Id: I6bd00fd50a2b9a8bbfa6e5561656c1c5377bba61
2015-10-17 22:39:03 +00:00
Min Pae 2b4535c8a2 fixing syntax error on return line
Change-Id: Ia28ecf3887374ba4a1c82ca2aa555537991962a6
2015-07-12 17:48:58 -07:00
Monsyne Dragon 82aca32286 Automatically check and restart worker processes.
Make the parent worker process automatically restart
hung or dead child processes.

The parent will check all the child processes every 30 sec
to make sure they are still running. If not they will be restarted.

Also child processes update a heartbeat timestamp periodically
while processing messages. If the parent detects that that timestamp
hasn't been updated in a configurable amount of time (default 600sec)
it terminates the old process and spins up a new one.

Change-Id: I28ffbe64391d04a6e85b7e197393352ee1e978b0
2015-07-09 16:50:55 +00:00
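Illustration (not taken from the commit): the restart decision described above combines a liveness check with a heartbeat-age check. The sketch below isolates that decision as a pure function; the `children` mapping layout and the name `children_to_restart` are assumptions for this example, and only the 600-second default comes from the commit message.

```python
import time


def children_to_restart(children, now=None, heartbeat_timeout=600):
    """Return the names of children that are dead or look hung.

    `children` maps a name to a dict with `alive` (bool) and
    `last_heartbeat` (seconds since the epoch). A child counts as hung
    when its heartbeat is older than `heartbeat_timeout` seconds.
    """
    now = time.time() if now is None else now
    restart = []
    for name, state in sorted(children.items()):
        if not state["alive"]:
            restart.append(name)
        elif now - state["last_heartbeat"] > heartbeat_timeout:
            restart.append(name)
    return restart
```

A parent loop would call this on each periodic check (every 30 seconds, per the message), then terminate and respawn whatever it returns.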
Monsyne Dragon 2fa78b9309 Fix ordering bug processing updates.
Occasionally, out of order notifications received from a resize operation
would incorrectly produce a verification error, because the launched_at time
is changed several times during the operation.
Fix this to keep the last chronological launched_at time in the operation,
not the last received.

Also fix nondeterministic multiprocessing bug that was occasionally causing
unittests to hang.

Change-Id: Iba8b0bbd0cb8b2b063335ca9ab0ad95cf127087a
2015-06-16 21:31:40 +00:00
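Illustration (not taken from the commit): keeping the chronologically latest `launched_at` rather than the most recently received one comes down to a max over timestamps. The helper name `latest_launched_at` is hypothetical, and the real fix works on stored usage records rather than bare values.

```python
from datetime import datetime


def latest_launched_at(current, incoming):
    """Return the later of two launched_at values; `current` may be None."""
    if current is None:
        return incoming
    return max(current, incoming)


# Notifications from one resize operation, arriving out of order.
received = [
    datetime(2015, 6, 1, 12, 0, 5),
    datetime(2015, 6, 1, 12, 0, 9),
    datetime(2015, 6, 1, 12, 0, 7),  # arrives last but is not the latest
]

launched_at = None
for value in received:
    launched_at = latest_launched_at(launched_at, value)
```

After the loop, `launched_at` holds the 12:00:09 value even though the 12:00:07 notification arrived last.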
Monsyne Dragon 359e1b91ae Bump Django version to avoid memory leak
Django 1.5 has a memory leak (as mentioned in the 1.5.1 release notes:
https://www.djangoproject.com/weblog/2013/mar/28/django-151/ )

Bump django requirement to >= 1.5.1 to avoid blowing out memory on stacktach
worker processes.

Change-Id: If05e05f0c12083bbdc624f1be1461509b10f5011
2015-04-08 19:00:06 +00:00
Monsyne Dragon 3abc36d02c Fix instance_type_id not always being populated.
Fix to make sure instance_type_id is always populated
on InstanceUsage, even if the compute.instance.create
operation is split across multiple requests.

Change-Id: Ic6243e8d5156d0e49a8fa1748a6a152724f01a14
2014-12-17 23:33:07 +00:00
Josh Kearney fea828ab35 Bump the minimum required version of kombu to 3.0.23. This fixes
the bug mentioned here:

https://groups.google.com/forum/#!topic/celery-users/2SU8mieMyvE

Also add .gitreview for StackForge.

Change-Id: Ia3e8ed732e703a7d295ebdf059b4b01a47056d63
2014-10-10 13:18:02 -05:00
Sandy Walsh 7c591f2b57 Fix ordering problem causing tests to fail randomly
Iterates the dictionary in the same order when building the mox
as the code uses when performing the operations.

Also changed the UUIDs to make them a little more
distinguishable.

Change-Id: I2c43e7f85e1b2655a46c24dc209386fe7fb48fa4
2014-09-15 20:13:39 +00:00
Monsyne Dragon 38590dd5c1 Fix nondeterministic test ordering bug.
Fix a unit test bug. Fix was in rackerlabs repo, but somehow never
made it to stackforge.

Change-Id: Ie8a056e553f7385b335771ede1de1c2a8c01ae7a
2014-08-26 16:01:14 +00:00
Jenkins b6235230ec Merge "RM8278: Fixed glance_usage_audit report" 2014-08-13 18:25:56 +00:00
Manali Latkar 4ea5a86036 RM8278: Fixed glance_usage_audit report
1.Fixed unpacking values bug.
2.Corrected audit period from 86399.000001 to 86400 so that it
picks up correct status counts.

Change-Id: Ieea1f451e6db72aaa6d83696134986a0288bedb1
2014-08-12 16:04:23 +05:30
Manali Latkar 7d77f59478 handling the exception in case no instanceusage with specific instance and launched_at is present
Change-Id: I7eb6d0b76aa41b0f26ab4b81fe033ddd527fc331
2014-08-11 16:38:49 +05:30
Jenkins f0f8d2e9ca Merge "set config-filename such that verifier can start Fixed a bug that was introduced that prevents verifier from starting without special environment variables et" 2014-08-05 03:53:10 +00:00
Phillip Moore 51cb63b40c set config-filename such that verifier can start
Fixed a bug that was introduced that prevents verifier from starting
without special environment variables et

Change-Id: I5b69851c688d4fdc43343422df58ec3edde67b0e
2014-07-31 21:23:42 +00:00
Monsyne Dragon 3e4eb35653 Fix db reconnect issue under django 1.6+
Django's ORM layer will not auto-reconnect after it loses
the connection to the mysql server, until you manually close the
database connection in django 1.6 and above.

(see: https://code.djangoproject.com/ticket/21597)

This is an issue for persistent connections, as MySQL will time out
inactive connections, and any loss of the db connection will cause
the stacktach worker to simply repeat the error
"MySQL server has gone away" until restarted.

This fix will allow the stacktach worker to properly reconnect.

Change-Id: I0b0bc75b7e21fd183f3b0e7a55d727ff98d6f02b
2014-07-25 19:38:21 +00:00
Anuj Mathur bea3a75a35 Nova usage report fix
Gracefully handled case where rawdata entry does
not exist for request_id while generating nova usage
audit report.

Change-Id: I675b2b5e9c4be70d45fc2385f1b448c159610f56
2014-05-30 17:03:54 +05:30
Priyanka Agrawal 655d9acf0a Added try catch for request_id null
When the request_id is null, there was an exception generated.
Currently setting the deployment to None for the reports
in case the request_id is null

Change-Id: Idde2178d217ac16f1b3e275c730e3fce68ba9f1b
2014-05-27 12:31:03 +05:30
Sandy Walsh 8a0f06ac79 Freshen up with latest from RackerLabs (and include tox.ini)
Added instance hours report

Initial version of report to calculate unit hours used
for nova instances

Breakdown by flavor, flavor class, account/billing types and by tenant.

Moved license so script has shebang as the first line
Add tenant info cache.
Refactor Instance hr report.
Added cache table for basic tenant info for reports.
Refactor instance_hours report to use table.
Improve performance of tenant info update.

use bulk sql operations to speed up the tenant info update,
as it's taking ~40s/1000 tenants to update on a decent machine.

Fix some tests broken by rebase. Fix unittests broken by
rebase. Also, renumber migration due to collision.

Add Apache license header to new files.

Fixed bug with fetching deployment information in
reconciler. Reverted old method for fetching
current usage's deployment and added new method to
fetch latest deployment information for
a request_id.

Made the field mismatch error message more readable
Refactored nova and glance verifier tests

the exists are updated with 201 send_status as part of stacktach down repair mechanism

Revert "Fixed bug with fetching deployment information in"

Revert "Adding host and deployment info to missing exists entries in the nova usage audit"

Revert "Added column headers for host and deployment in json reports"

Only log ERROR on last retry

fixed the wrong status name for sent_failed variable in audit report

fixing documentation for urls that are not available for glance

deprecating stacky urls (usage, deletes, exists) that are not
used anymore

Revert "Revert "Added column headers for host and deployment in json reports""

Revert "Revert "Adding host and deployment info to missing exists entries in the nova usage audit""

Revert "Revert "Fixed bug with fetching deployment information in""

Cell and compute info added for verification failures as well.
If that is not present (request_id is not populated for an
InstanceUsage entry), the cells display '-'

Add tox support for move to stackforge

Add tox support for move to stackforge

Change-Id: Id94c2a7f1f9061e972e90c3f54e39c9dec11943b
2014-05-08 15:58:03 -03:00
Thomas Maddox 6325c1ab5f Merge pull request #314 from SandyWalsh/apache
Switched to Apache licensing
2014-03-28 10:31:28 -05:00
Sandy Walsh eae07eecac Switched to Apache licensing 2014-03-28 11:47:16 -03:00
Manali Latkar fd5810301f Merge pull request #313 from anujm/add_columns_to_response_header
Added column headers for host and deployment in json reports
2014-03-28 12:13:08 +05:30
Anuj Mathur 00bcb90117 Added column headers for host and deployment in json reports 2014-03-28 11:09:42 +05:30
anujm e2adb255b7 Merge pull request #312 from manalilatkar/add_host_and_deployment_in_audit
RM 5495: adding host and deployment info to missing exists entries in the nova usage audit
2014-03-25 20:09:13 +05:30
Manali Latkar 60be387b97 Adding host and deployment info to missing exists entries in the nova usage audit 2014-03-24 15:31:44 +05:30
anujm b646d6150e Merge pull request #309 from manalilatkar/os_distro_optional
Making the type and null check for os_version optional according to import image_type
2014-03-19 21:17:30 +05:30
Manali Latkar e2f99d8ae4 Making the type and null check for os_version optional according to import image_type 2014-03-19 17:21:40 +05:30
Sandy Walsh da5cd479cb Merge pull request #308 from yottatsa/debian
Simple debianization patch
2014-03-19 08:39:31 -03:00
Vladimir Eremin 1193543757 basic debianization 2014-03-19 15:19:09 +04:00
Vladimir Eremin 24c54efcda basic debianization 2014-03-19 14:49:04 +04:00
anujm 259400cc92 Merge pull request #306 from manalilatkar/store_AH_event_id
modified api to store AH event id along with send_status sent by yagi
2014-03-18 20:59:16 +05:30
Manali Latkar b1db1eb8f3 modified api to store AH event id along with send_status sent by yagi 2014-03-14 16:40:25 +05:30
anujm 9d1a93a49e Merge pull request #281 from TelekomLabs/log_decode_errors
log decode errors, do not ack.
2014-03-14 15:56:43 +05:30
Bernhard K. Weisshuhn 4f60f245cf log decode errors, do not ack. 2014-03-14 10:34:56 +01:00
anujm 1c7cbab784 Merge pull request #305 from anujm/add_logging_for_rabbitmq
Added logging on RabbitMQ connection error and revival
2014-03-13 20:26:17 +05:30
Anuj Mathur c2aa435869 Added logging on RabbitMQ connection error and revival 2014-03-13 15:26:25 +05:30