Librabbitmq 2.0.0 has build issues in some environments, apparently
due to a bug in librabbitmq's C extension.
Pin to the previous version (1.6.1) until the bug is resolved.
Change-Id: I060fbeee176434bfa8e1041fe4b0caaac68992c5
CHANGE-1: Remove the DB connection method from utils.py, and check whether an entry already exists in stacktach_instancedeletes to avoid duplicates.
Change-Id: I7b3fbf41b8ef54e9e6bdb53046c96c2824f64613
Report the version number on startup of verifier and worker daemons
to make debugging of deployments easier.
Change-Id: Ib6f9008ab103a67d958004e7151f30065daa5a3d
Before: Stacktach threw an exception and was unable to log the message. It then tried to process the same erroneous notification repeatedly.
After: Stacktach catches the malformed notification and logs the message. It then acknowledges the message to RabbitMQ so that it is removed from the queue.
Change-Id: I7a33816a7ce4660513b047a7e54c3223a63c8cb3
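The catch, log, and acknowledge behaviour described above could look roughly like the following sketch. The names `handle_message` and `FakeMessage` are hypothetical and are not taken from the Stacktach code; the only assumption about the broker message is that it exposes an `ack()` method.

```python
import json
import logging

logger = logging.getLogger("worker_sketch")


class FakeMessage:
    """Hypothetical stand-in for a broker message; records whether ack() ran."""

    def __init__(self, body):
        self.body = body
        self.acked = False

    def ack(self):
        self.acked = True


def handle_message(message, process):
    """Process one notification and always ack it, so a bad one is not redelivered."""
    try:
        process(json.loads(message.body))
        return True
    except Exception:
        # Log the malformed notification instead of letting the exception escape.
        logger.exception("Dropping malformed notification: %r", message.body)
        return False
    finally:
        # Acknowledge in every case so the message leaves the queue.
        message.ack()


seen = []
good = FakeMessage('{"event_type": "compute.instance.exists"}')
bad = FakeMessage("not json")
ok_good = handle_message(good, seen.append)
ok_bad = handle_message(bad, seen.append)
```

Acknowledging in the `finally` clause is one way to guarantee that a malformed message is removed rather than retried forever.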
Fix memory usage for verifiers. Events to verify were being loaded
from the db into an in-memory fifo queue to spool to worker processes.
This was not being limited, resulting in a large amount of memory
being used if events were read from the DB faster than they were
being processed. This change pauses the loading of events if the
in-memory queue grows larger than specified batchsize.
Also, verifier child processes were not handling signals (like SIGTERM)
properly, so they did not shut down cleanly.
Added proper signal handling.
Change-Id: Ife25ca07398acf111f4388071b5f2e4eafeecb05
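A minimal sketch of the bounded spooling idea, assuming a hypothetical `spool_events` helper and a single-process loop (the real verifier hands events to worker processes). Here loading pauses as soon as the queue reaches `batchsize`; the exact threshold used by the actual change is not stated above.

```python
from collections import deque


def spool_events(pending, dispatch, batchsize):
    """Feed events to `dispatch` while holding at most `batchsize` in memory."""
    queue = deque()
    source = iter(pending)
    exhausted = False
    high_water = 0
    while not exhausted or queue:
        # Load only while the queue is below the limit; otherwise pause loading.
        while not exhausted and len(queue) < batchsize:
            try:
                queue.append(next(source))
            except StopIteration:
                exhausted = True
        high_water = max(high_water, len(queue))
        if queue:
            dispatch(queue.popleft())
    # Report the largest queue size observed, to show the bound held.
    return high_water


processed = []
peak = spool_events(range(10), processed.append, batchsize=3)
```

The returned high-water mark makes it easy to check that the in-memory queue never outgrew the configured batch size.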
If the stacktach verifier crashes, notifications
'in-flight' can be stuck in 'verifying' status.
This change flips those back to 'pending' so they get
processed.
Change-Id: Ie4aabed0c4991429a3e18e3b28813917d822867a
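The status reset can be illustrated with a small sketch. The table name `exists_records`, the helper `reset_in_flight`, and the use of an in-memory SQLite database are assumptions made for the example; Stacktach itself goes through Django models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exists_records (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO exists_records (status) VALUES (?)",
    [("pending",), ("verifying",), ("verified",), ("verifying",)],
)


def reset_in_flight(connection):
    """Flip rows stranded in 'verifying' back to 'pending'; return rows changed."""
    cursor = connection.execute(
        "UPDATE exists_records SET status = 'pending' WHERE status = 'verifying'"
    )
    connection.commit()
    return cursor.rowcount


reset_count = reset_in_flight(conn)
statuses = [
    row[0] for row in conn.execute("SELECT status FROM exists_records ORDER BY id")
]
```

Running such a reset once at verifier startup would put any records abandoned by a crash back into the queue of work.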
If an exception is thrown in the verifier child process for a
specific exchange, log it properly.
Was simply printing to stdout, which goes nowhere for daemon processes.
Change-Id: I528ad08e70d7bdf03e9a8e1d8abe45d09f2eb476
Make the parent worker process automatically restart
hung or dead child processes.
The parent will check all the child processes every 30 sec
to make sure they are still running. If not they will be restarted.
Also child processes update a heartbeat timestamp periodically
while processing messages. If the parent detects that that timestamp
hasn't been updated in a configurable amount of time (default 600sec)
it terminates the old process and spins up a new one.
Change-Id: I28ffbe64391d04a6e85b7e197393352ee1e978b0
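The parent's check can be sketched as a pure function. The dictionary layout and the name `children_to_restart` are hypothetical; only the 30 sec check interval and the 600 sec default timeout come from the description above.

```python
CHECK_INTERVAL = 30        # seconds between parent checks
HEARTBEAT_TIMEOUT = 600    # default heartbeat staleness limit, in seconds


def children_to_restart(children, now, timeout=HEARTBEAT_TIMEOUT):
    """Return names of children that are dead or whose heartbeat is stale.

    `children` maps a name to a dict with `alive` (bool) and `heartbeat`
    (epoch seconds of the last update); this layout is an assumption.
    """
    stale = []
    for name, state in sorted(children.items()):
        dead = not state["alive"]
        hung = now - state["heartbeat"] > timeout
        if dead or hung:
            stale.append(name)
    return stale


now = 10_000.0
children = {
    "nova": {"alive": True, "heartbeat": now - 5},
    "glance": {"alive": True, "heartbeat": now - 601},
    "monitor": {"alive": False, "heartbeat": now - 1},
}
restart = children_to_restart(children, now)
```

In a real parent loop this function would run every `CHECK_INTERVAL` seconds, and each returned child would be terminated (if still running) and replaced with a new process.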
Occasionally, out of order notifications received from a resize operation
would incorrectly produce a verification error, because the launched_at time
is changed several times during the operation.
Fix this to keep the last chronological launched_at time in the operation,
not the last received.
Also fix nondeterministic multiprocessing bug that was occasionally causing
unittests to hang.
Change-Id: Iba8b0bbd0cb8b2b063335ca9ab0ad95cf127087a
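The launched_at rule can be shown in a few lines. The helper name `latest_launched_at` and the use of Decimal epoch seconds are assumptions made for this sketch.

```python
from decimal import Decimal


def latest_launched_at(current, incoming):
    """Keep the chronologically latest launched_at, regardless of arrival order."""
    if current is None:
        return incoming
    if incoming is None:
        return current
    return max(current, incoming)


# Notifications arrive out of order: the newest timestamp is received first.
received = [
    Decimal("1400000300.5"),
    Decimal("1400000100.0"),
    Decimal("1400000200.0"),
]
launched_at = None
for stamp in received:
    launched_at = latest_launched_at(launched_at, stamp)
```

Simply storing the last received value would have left `launched_at` at 1400000200.0 here, which is the kind of mismatch that produced the spurious verification errors.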
Django 1.5 has a memory leak (as mentioned in the 1.5.1 release notes:
https://www.djangoproject.com/weblog/2013/mar/28/django-151/ )
Bump django requirement to >= 1.5.1 to avoid blowing out memory on stacktach
worker processes.
Change-Id: If05e05f0c12083bbdc624f1be1461509b10f5011
Fix to make sure instance_type_id is always populated
on InstanceUsage, even if the compute.instance.create
operation is split across multiple requests.
Change-Id: Ic6243e8d5156d0e49a8fa1748a6a152724f01a14
Iterates the dictionary in the same order when building the mox
as the code uses when performing the operations.
Also changed the UUIDs to make them a little easier to
distinguish.
Change-Id: I2c43e7f85e1b2655a46c24dc209386fe7fb48fa4
1. Fixed unpacking values bug.
2. Corrected audit period from 86399.000001 to 86400 so that it
   picks up correct status counts.
Change-Id: Ieea1f451e6db72aaa6d83696134986a0288bedb1
Fixed a recently introduced bug that prevented the verifier from starting
without special environment variables set.
Change-Id: I5b69851c688d4fdc43343422df58ec3edde67b0e
In Django 1.6 and above, the ORM layer will not auto-reconnect after it
loses the connection to the MySQL server until you manually close the
database connection.
(see: https://code.djangoproject.com/ticket/21597)
This is an issue for persistent connections, as MySQL will time out
inactive connections, and any loss of the db connection will cause
the stacktach worker to simply repeat the error
"MySQL server has gone away" until restarted.
This fix will allow the stacktach worker to properly reconnect.
Change-Id: I0b0bc75b7e21fd183f3b0e7a55d727ff98d6f02b
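A minimal sketch of the close-and-retry pattern, written without a Django dependency so it can run on its own. `ConnectionLost`, `run_with_reconnect`, and `flaky_query` are hypothetical names; in a Django project the `close_connection` callable would plausibly be `django.db.connection.close` and the exception `django.db.OperationalError`, but that mapping is an assumption, not a description of the actual patch.

```python
class ConnectionLost(Exception):
    """Stand-in for the driver's 'MySQL server has gone away' error."""


def run_with_reconnect(operation, close_connection, retries=1):
    """Run `operation`; on a lost connection, close the stale handle and retry."""
    attempt = 0
    while True:
        try:
            return operation()
        except ConnectionLost:
            if attempt >= retries:
                raise
            attempt += 1
            # Closing the stale connection lets a fresh one be opened on next use.
            close_connection()


calls = {"query": 0, "close": 0}


def flaky_query():
    """Fail until the stale connection has been closed once."""
    calls["query"] += 1
    if calls["close"] == 0:
        raise ConnectionLost("MySQL server has gone away")
    return "ok"


def close_connection():
    calls["close"] += 1


result = run_with_reconnect(flaky_query, close_connection)
```

Bounding the retries keeps a genuinely unreachable database from turning into an endless loop.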
Gracefully handled case where rawdata entry does
not exist for request_id while generating nova usage
audit report.
Change-Id: I675b2b5e9c4be70d45fc2385f1b448c159610f56
When the request_id was null, an exception was generated.
The deployment is now set to None for the reports
when the request_id is null.
Change-Id: Idde2178d217ac16f1b3e275c730e3fce68ba9f1b
Added instance hours report
Initial version of report to calculate unit hours used
for nova instances
Breakdown by flavor, flavor class, account/billing types and by tenant.
Moved license so script has shebang as the first line
Add tenant info cache.
Refactor Instance hr report.
Added cache table for basic tenant info for reports.
Refactor instance_hours report to use table.
Improve performance of tenant info update.
Use bulk SQL operations to speed up the tenant info update,
as it's taking ~40s/1000 tenants to update on a decent machine.
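As an illustration of why batching helps, the sketch below writes all tenants with one `executemany` call in a single transaction instead of one statement per tenant. The `tenant_info` schema, the helper `bulk_update_tenants`, and the SQLite-specific `INSERT OR REPLACE` are assumptions for the example; the real change targets Stacktach's own database through Django.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tenant_info (tenant TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO tenant_info VALUES ('t1', 'old name')")


def bulk_update_tenants(connection, tenants):
    """Upsert all tenants with one batched statement inside one transaction."""
    with connection:
        connection.executemany(
            "INSERT OR REPLACE INTO tenant_info (tenant, name) VALUES (?, ?)",
            tenants,
        )


bulk_update_tenants(conn, [("t1", "Alpha"), ("t2", "Beta"), ("t3", "Gamma")])
rows = conn.execute(
    "SELECT tenant, name FROM tenant_info ORDER BY tenant"
).fetchall()
```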
Fix some tests broken by rebase. Fix unittests broken by
rebase. Also, renumber migration due to collision.
Add Apache license header to new files.
Fixed bug with fetching deployment information in
reconciler. Reverted old method for fetching
current usage's deployment and added new method to
fetch latest deployment information for
a request_id.
Made the field mismatch error message more readable
Refactored nova and glance verifier tests
The exists are updated with a send_status of 201 as part of the stacktach down repair mechanism
Revert "Fixed bug with fetching deployment information in"
Revert "Adding host and deployment info to missing exists entries in the nova usage audit"
Revert "Added column headers for host and deployment in json reports"
Only log ERROR on last retry
Fixed the wrong status name for the sent_failed variable in the audit report
Fix documentation for URLs that are not available for glance
Deprecate stacky URLs (usage, deletes, exists) that are no longer
used
Revert "Revert "Added column headers for host and deployment in json reports""
Revert "Revert "Adding host and deployment info to missing exists entries in the nova usage audit""
Revert "Revert "Fixed bug with fetching deployment information in""
Cell and compute info added for verification failures as well.
If that is not present (request_id is not populated for an
InstanceUsage entry), the cells display '-'
Add tox support for move to stackforge
Change-Id: Id94c2a7f1f9061e972e90c3f54e39c9dec11943b