This commit updates the hacking version to 1.1.x and fixes the
related pep8 issues.
Also added pycodestyle to test-requirements.
Story: 2004930
Task: 29318
Co-Authored-By: Akhil Jain <akhil.jain@india.nec.com>
Change-Id: Id3ad30d23b902ee6f7277f7ec20d7d523df232f6
* To find the oldest and newest quantities for calculating a rate,
the Spark DataFrame is converted to a Spark RDD, which makes it
easy to do a group by and sort. The syntax to pull a value from a
map column in an RDD is rdd.map_column['key'], which is different
from dataframe.map_column.key; using the DataFrame-style dotted
access on the RDD was causing a "'dict' object has no attribute"
exception (see the sketch after this list).
* Renamed first_attempt_at_spark_test.py to
test_first_attempt_at_spark, because none of the tests
in that file were being run.
* Changed the pre-transform and transform specs
used by test_first_attempt_at_spark
to the newer 'dimensions#field_name' notation that was introduced
in the Rocky release.
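A minimal PySpark sketch of the difference (the column and key names
here are illustrative, not the actual spec fields):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    # "dimensions" is inferred as a map<string,string> column
    df = spark.createDataFrame([({"quantity": "1.5"},)], ["dimensions"])

    # DataFrame style: dotted access works on a map column
    df.select(df.dimensions.quantity).show()

    # RDD style: each row holds a plain Python dict, so the key must
    # be subscripted; dotted access raises
    # AttributeError: 'dict' object has no attribute 'quantity'
    print(df.rdd.map(lambda row: row.dimensions["quantity"]).collect())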
Story: 2005449
Task: 30501
Change-Id: I6adeee54fe261c535372b8f5f3580e7d3261259b
New releases of oslo.config support a 'mutable' parameter to Opts.
With Icec3e664f3fe72614e373b2938e8dee53cf8bc5e, oslo.service provides
an option that lets services tell oslo.service they want
mutate_config_files to be called, by passing a parameter.
This commit makes use of that option. It allows monasca_transform to
benefit from I1e7a69de169cc85f4c09954b2f46ce2da7106d90, where the
'debug' option (owned by oslo.log) is made mutable: we should be able
to turn debug logging on and off by changing the config.
tc goal:
https://governance.openstack.org/tc/goals/rocky/enable-mutable-configuration.html
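A hedged sketch of the two pieces involved (the option name and the
use of a bare Service subclass are assumptions for illustration, not
the actual monasca_transform code):

    from oslo_config import cfg
    from oslo_service import service

    CONF = cfg.CONF
    # An Opt declared with mutable=True can be changed at runtime by
    # mutate_config_files() without restarting the service.
    CONF.register_opts([cfg.BoolOpt('verbose_stats', default=False,
                                    mutable=True)])

    class TransformService(service.Service):
        pass

    # restart_method='mutate' asks oslo.service to call
    # mutate_config_files() (e.g. on SIGHUP) instead of restarting.
    launcher = service.launch(CONF, TransformService(),
                              restart_method='mutate')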
Change-Id: I86571df78014a810ffa881ceceeddfc5193c9ca5
Remove the unused field service_id from the pre-transform spec.
The original purpose of service_id was to identify the service
that is generating the metric, but this information should
be provided by the source, as a dimension, rather
than being assigned a value in the pre-transform spec.
Change-Id: I223eb2296df438b139e3d9b5aaf4b1b679f70797
Depends-on: I81a35e048e6bd5649c6b3031ac2722be6a309088
Story: 2001815
Task: 12556
* Removed unused fields "event_status", "event_version",
"record_type", "mount", "device", "pod_name", "container_name",
"app", "interface", "deployment" and "daemon_set"
from record_store data. It is no longer required to add
new dimension, meta or value_meta fields to the record store data;
instead use the special notation, e.g. "dimensions#", to refer to
any dimension field in the incoming metric.
* Refactored to eliminate the need to add any new metric.dimensions
field in multiple places, e.g. to the record store and
instance usage dataframe schemas and to all generic
aggregation components. Added a new Map type column
called "extra_data_map" to the instance usage data format to
store any new fields. The Map type column eliminates the
need to add new columns to instance usage data.
* Allow users to reference any fields in "meta",
"metric.dimensions" and "metric.value_meta"
for aggregation in "aggregation_group_by_list" or
"setter_group_by_list" using "dimensions#{$field_name}",
"meta#{$field_name}" or "value_meta#{$field_name}"
(see the example after this list).
* Updated generic aggregation components and data formats docs.
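As an illustration, a hypothetical group-by fragment (not an actual
spec shipped with the project) can now reference incoming dimension,
meta and value_meta fields directly:

    aggregation_group_by_list = ["host", "metric_id",
                                 "dimensions#pod_name",
                                 "dimensions#namespace",
                                 "meta#tenantId"]
    setter_group_by_list = ["dimensions#deployment",
                            "value_meta#backlog_size"]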
Change-Id: I81a35e048e6bd5649c6b3031ac2722be6a309088
Story: 2001815
Task: 19605
* set the maximum line length to 100
* cleaned up the code for pep8
Change-Id: Iab260a4e77584aae31c0596f39146dd5092b807a
Signed-off-by: Amir Mofakhar <amofakhar@op5.com>
The following changes were required:
1.)
By default the pre-built distribution
for Spark 2.2.0 is compiled with Scala 2.11.
monasca-transform requires Spark compiled with
Scala 2.10 since we use Spark Streaming to
pull data from Kafka, and the version of Kafka
is compatible with Scala 2.10.
The recommended way is to compile Spark
with Scala 2.10, but for the purposes of the
devstack plugin, changes were made to pull the
required jars from mvn directly.
(see SPARK_JARS and SPARK_JAVA_LIB variables in
settings)
All jars get moved to
<SPARK_HOME>/assembly/target/assembly/
target/scala_2.10/jars/
Note: <SPARK_HOME>/jars gets renamed
to <SPARK_HOME>/jars_original.
spark-submit defaults to assembly location
if <SPARK_HOME>/jars directory is missing.
2.) Updated the start up scripts for the spark
worker and spark master with a new env variable
SPARK_SCALA_VERSION=2.10. Also updated the
PYTHONPATH variable to add the new
py4j-0.10.4-src.zip file
3.) Made some changes to replace pyspark
function calls which were deprecated and removed in Spark 2.0
Change-Id: I8f8393bb91307d55f156b2ebf45225a16ae9d8f4
The 'message' attribute has been deprecated and removed
in Python 3.
For more details, please check:
https://www.python.org/dev/peps/pep-0352/
Change-Id: Ieaf6196fad7aa5e98ba6d6f5cea6f5f413fd4b69
Check periodically whether the host continues to be
the leader once elected. Failing to check
might lead to a situation where the host
has lost leadership but is not aware of it.
If the host is no longer the leader, it stands
down as leader, stops any spark-submit
processes running on the node and resets the state
of the transform thread.
Removed the --supervise option when invoking
spark-submit, to turn off the built-in driver
management.
Added some hardening to better catch exceptions
in the main transform service thread and in the
periodic leader check function, so that the
threads don't die when they encounter
an unhandled exception.
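A rough sketch of the periodic check (the class and method names are
hypothetical, not the actual monasca-transform code):

    import time

    def leader_check_loop(coordinator, transform_thread, interval=60):
        """Periodically verify leadership; stand down if it was lost."""
        while not transform_thread.stopped:
            try:
                if (transform_thread.is_leader
                        and not coordinator.am_i_leader()):
                    transform_thread.is_leader = False
                    transform_thread.stop_spark_submit()
                    transform_thread.reset_state()
            except Exception:
                # hardening: never let an unhandled exception kill the
                # checker thread
                pass
            time.sleep(interval)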
Change-Id: If2e13e3ed6cb30b3d7fa5f1b440c4c39b87692be
refresh_monasca_script.sh is useful for
development in a devstack environment.
Enhanced the script to
* start and stop the monasca-transform process
running in a screen session
* add more hardening to catch errors
and exit if any command fails
* add debugging statements to help track
down any errors when the script is run.
Change-Id: Idab02d555eed192d8242c870017955b935532c3d
With this change the pre hourly processor, which does the
hourly aggregation (second stage) and writes the
final aggregated metrics to the metrics topic in kafka,
now accounts for any early arriving metrics.
This change, along with two previous changes
to the pre hourly processor that added
1.) configurable late metrics slack time
(https://review.openstack.org/#/c/394497/), and
2.) batch filtering
(https://review.openstack.org/#/c/363100/),
will make sure all late arriving or early
arriving metrics for an hour are aggregated
appropriately.
Also made an improvement in the MySQL offsets
handling to call delete excess revisions only once.
Change-Id: I919cddf343821fe52ad6a1d4170362311f84c0e4
Changed the devstack environment vagrant box and
also renamed the devstack VM to 'devstack'
from 'pg-tips'.
Also fixed all the tests that were broken when
they were moved from tests/unit to tests/functional
with this review:
https://review.openstack.org/#/c/400237/
Updated the devstack README with a section called
"Development workflow for monasca-transform" with
steps developers can take to develop and run
tests.
Change-Id: I11678148ba2bcb96eb3e2a522176683dc8bca30a
kafka_python 0.9.5 was moved to monasca-common.
The upstream community wants to move to a
newer version of kafka-python, which has a number of
performance problems.
See https://review.openstack.org/#/c/420579/
and
https://review.openstack.org/#/c/424840/
Monasca Transform
uses the kafka-python library to write aggregated
metrics to kafka as well as to read offset information
in the case of hourly aggregation. Since the long term
plan is to move to pykafka in the future, we will
have to investigate whether that functionality
is available.
Change-Id: I831c9e259b3d7b92fb2834193034e15b62c80c37
There is a desire to use Monasca Transform to aggregate
kubernetes metrics. This change is a start in that
direction. The transformation specs in the tests
folder now include some representative aggregations
of some kubernetes metrics.
This commit also includes some changes to get
first_attempt_at_spark_test.py working again
after being moved from the unit test folder to the
functional test folder.
Change-Id: I038ecaf42e67d5c994980991232a2a8428a4f4e3
Prevent creating a new spark sql context object with every batch.
Profiling of the java heap for the driver indicated that there is a
steady increase (~12MB over 5 days) of
org.apache.spark.sql.execution.metric.LongSQLMetricValue
and org.apache.spark.sql.execution.ui.SQLTaskMetrics with
each batch execution. These are used by the spark streaming
ui and were not being garbage collected.
See https://issues.apache.org/jira/browse/SPARK-17381
which describes a similar issue.
This change, along with setting
spark.sql.ui.retainedExecutions to a low number in
spark-defaults.conf, will reduce the gradual increase in heap
size.
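A minimal sketch of the fix pattern for a PySpark streaming job (the
RDD handling shown is illustrative):

    from pyspark.sql import SQLContext

    def process_batch(rdd):
        # Reuse a single SQLContext per SparkContext instead of
        # constructing a new one for every batch, so the SQL/UI metric
        # objects are not accumulated on the driver heap.
        sql_context = SQLContext.getOrCreate(rdd.context)
        df = sql_context.createDataFrame(rdd)
        df.count()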
Also made a change to catch the unhandled MemberNotJoined exception
because of which the transform service thread went into
an unresponsive state.
Change-Id: Ibf244cbfc00a90ada66f492b473719c25fa17fd2
Fixed a bug where the hourly aggregation would run at every iteration
if the hour is zero (midnight), because zero is falsy.
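A minimal illustration of the falsy-zero pitfall (variable names are
illustrative):

    hour = 0  # midnight

    if not hour:      # buggy: True when hour == 0, so the aggregation
        pass          # re-runs on every iteration

    if hour is None:  # fixed: only True when the hour is genuinely unset
        pass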
Change-Id: I9652f02aea30f3ddb6f154db716aa4057455be06
For logging the exception message: e.message has been
deprecated. The preferred way is to call str(e).
More details: https://www.python.org/dev/peps/pep-0352/
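For example:

    try:
        raise ValueError("connection lost")
    except ValueError as e:
        # e.message is gone in Python 3; str(e) works in both 2 and 3
        print(str(e))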
Change-Id: I27b6a7b1f5e336df3cd618684cedfd01c840c99f
The Pre Hourly processor fails if the offsets recorded in the
kafka_offsets table no longer exist in kafka.
This change deletes those offsets from the kafka_offsets
table, so that the pre hourly processor can resume
processing with the next run.
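A tiny sketch of the recovery idea (the helper and its arguments are
hypothetical, not the actual monasca-transform code):

    def expired_partitions(saved_offsets, earliest_in_kafka):
        """Return partitions whose saved offset is older than anything
        Kafka still retains; their rows are deleted from kafka_offsets
        so the next run starts from the earliest available offset."""
        return [partition for partition, offset in saved_offsets.items()
                if offset < earliest_in_kafka.get(partition, 0)]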
Change-Id: I017c271e630fdf6de05a73b3bfcb14f5ed18615f
Added a configuration option to allow the pre-hourly transformation
to be done at a specified period past the hour. This includes a check
to ensure that, if processing has not yet been done for the hour but
is overdue, it is done at the earliest opportunity.
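A hedged sketch of the scheduling check (the option name, helper and
its arguments are made up for illustration, not the actual
configuration keys):

    from datetime import datetime, timedelta

    def should_run_pre_hourly(now, last_processed, minute_past_hour=10):
        """last_processed is the start of the last hour that was
        aggregated, or None if nothing has been processed yet."""
        current_hour = now.replace(minute=0, second=0, microsecond=0)
        previous_hour = current_hour - timedelta(hours=1)
        if last_processed is not None and last_processed >= previous_hour:
            return False  # the previous hour is already aggregated
        overdue = (last_processed is None or
                   last_processed < previous_hour - timedelta(hours=1))
        if overdue:
            return True   # missed earlier hours: run at the earliest time
        return now.minute >= minute_past_hour  # normal schedule

    print(should_run_pre_hourly(datetime.utcnow(), None))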
Change-Id: I8882f3089ca748ce435b4e9a92196a72a0a8e63f
Removing the unique metric count aggregation
since it causes data from Kafka to be pulled
and processed twice and leads to an unnecessary
increase in the overall time required to process
a batch.
Change-Id: I2046f95709232979dfd590d5293c803cac05bbb2
This needs to be the admin project id, so for devstack it needs to be
written to the configuration file once the users/projects etc. are
created and identifiable.
Add a similar process to the refresh script.
Correct the configuration property name to 'project' rather than
using the old nomenclature 'tenant'.
Change-Id: Ib9970ffacf5ee0f7f006722038a1db8024c1385e
The metric filter allows metrics to be filtered before aggregation,
for example to exclude metrics from certain environments or nodes
from being included where they are outside the scope of data
aggregation. This is a powerful feature but not appropriate for
all scenarios (e.g. devstack). So default the filter to empty.
Change-Id: Icb790a0ec41133bfac54244aae8782a5cc665186
Made changes such that debug-level log entries are written to
the application log noting which aggregated metrics are submitted
during pre-hourly and hourly processing.
Change-Id: I64c6a18233614fe680aa0b084570ee7885f316e5
The hourly aggregation for storage.objects.size_agg
should have been a sum rather than an average.
Change-Id: Icf018a24c5de0efb67faeeee1418bad5064a39e7
Removed the calls to the ceiling function on utilization
metrics aggregation such that they now are exact values (i.e.,
not rounded up to the next integral value).
Change-Id: I9813b94acb051f6754da2d559090318010f86e57
The aggregation is now correctly based on
swiftlm.diskusage.host.val.avail (instead of
incorrectly being based on swiftlm.diskusage.host.val.size).
Change-Id: If17853e166c050cefbf390791a8696ce520fca96
Restored aggregations that went missing when
we transitioned to the upstream Monasca-Transform OpenStack repo.
Specifically, the missing aggregations were those for the
nova.vm.cpu.total_allocated and nova.vm.mem.total_allocated_mb
source metrics.
This set of changes also includes the resolution of a couple of
pre-existing pep8 errors.
Change-Id: I84bf19b674aeadcd0d27799a887d0b89d0381550
Add properties to conf file to allow configuration of
SSL for the database connection. Done for both the python
and java connection strings.
Change-Id: I4c3d25c3f8f12eae801a6a818bf4ac7acd93d2dc