monasca-transform

Commit Graph

Author	SHA1	Message	Date
Witek Bedyk	811acd76c9	Remove project content on master branch This is step 2b of repository deprecation process as described in [1]. Project deprecation has been announced here [2]. [1] https://docs.openstack.org/project-team-guide/repository.html#step-2b-remove-project-content [2] http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016814.html Depends-On: https://review.opendev.org/751983 Change-Id: I83bb2821d64a4dddd569ff9939aa78d271834f08	2020-09-15 10:12:44 +02:00
Ashwin Agate	963f774818	Fix Swift Rate Calculation * To find the oldest and newest quantities for calculating rate, Spark Data Frame is converted to a Spark RDD, which makes it easy to do a group by and sort. The syntax to pull the value from a map column type in an RDD is rdd.map_column['key'], which is different from dataframe.map_column.key; using the latter was causing a "'dict' object has no attribute" exception * Renamed first_attempt_at_spark_test.py to test_first_attempt_at_spark, because all the tests in that file were not being run. * Changed the pre transform and transform specs which were being used by test_first_attempt_at_spark to the newer 'dimensions#field_name' notation that was introduced in the Rocky release. Story: 2005449 Task: 30501 Change-Id: I6adeee54fe261c535372b8f5f3580e7d3261259b	2019-04-14 07:41:14 -07:00
Ashwin Agate	fbad704cc2	Remove service_id from pre-transform spec Remove unused field service_id from pre-transform spec. service_id's original purpose was to identify the service that is generating the metric, but this information should be provided by the source as a dimension, rather than assigning its value in the pre-transform spec. Change-Id: I223eb2296df438b139e3d9b5aaf4b1b679f70797 Depends-on: I81a35e048e6bd5649c6b3031ac2722be6a309088 Story: 2001815 Task: 12556	2018-07-17 15:39:24 -07:00
Ashwin Agate	0cf08c45c5	Cleanup pre transform and transform specs * Removed unused fields "event_status", "event_version", "record_type", "mount", "device", "pod_name", "container_name" "app", "interface", "deployment" and "daemon_set" from record_store data. Now it is not required to add new dimension, meta or value_meta fields to record store data instead use special notation, e.g. "dimension#" to refer to any dimension field in the incoming metric. * Refactor and eliminate need to add any new metric.dimensions field in multiple places e.g. add to record store and instance usage dataframe schema and in all generic aggregation components. Added a new Map type column called "extra_data_map" to store any new fields, in instance usage data format. Map type column eliminates the need to add new columns to instance usage data. * Allow users to define any fields in "meta", "metric.dimensions" and "metric.value_meta" fields for aggregation in "aggregation_group_by_list" or "setter_group_by_list" using "dimensions#{$field_name}" or "meta#{$field_name}" or "value_meta#{$field_name}" * Updated generic aggregation components and data formats docs. Change-Id: I81a35e048e6bd5649c6b3031ac2722be6a309088 Story: 2001815 Task: 19605	2018-05-29 16:35:33 -07:00
Ashwin Agate	022bd11a4d	Switch to using Spark version 2.2.0 Following changes were required: 1.) By default the pre-built distribution for Spark 2.2.0 is compiled with Scala 2.11. monasca-transform requires Spark compiled with Scala 2.10 since we use spark streaming to pull data from Kafka and the version of Kafka is compatible with Scala 2.10. The recommended way is to compile Spark with Scala 2.10, but for purposes of devstack plugin made changes to pull the required jars from mvn directly. (see SPARK_JARS and SPARK_JAVA_LIB variables in settings) All jars get moved to <SPARK_HOME>/assembly/target/assembly/ target/scala_2.10/jars/ Note: <SPARK_HOME>/jars gets renamed to <SPARK_HOME>/jars_original. spark-submit defaults to assembly location if <SPARK_HOME>/jars directory is missing. 2.) Updated start up scripts for spark worker and spark master with a new env variable SPARK_SCALA_VERSION=2.10. Also updated PYTHONPATH variable to add new py4j-0.10.4-src.zip file 3.) Some changes to adhere to deprecated pyspark function calls which were removed in Spark 2.0 Change-Id: I8f8393bb91307d55f156b2ebf45225a16ae9d8f4	2017-08-21 11:18:22 -07:00
Flint Calvin	d8f283c378	Started adding kubernetes metrics aggregation There is a desire to use Monasca Transform to aggregate kubernetes metrics. This change is a start in that direction. The transformation specs in the tests folder now include some representative aggregations of some kubernetes metrics. This commit also includes some changes to get first_attempt_at_spark_test.py working again after being moved from the unit test folder to the functional test folder. Change-Id: I038ecaf42e67d5c994980991232a2a8428a4f4e3	2017-03-02 15:38:25 +00:00
David C Kennedy	8ac3250aef	Remove metric filters from transform-spec The metric filter allows metrics to be filtered before aggregation, for example to exclude metrics from certain environments or nodes from being included where they are outside the scope of data aggregation. This is a powerful feature but not appropriate for all scenarios (e.g. devstack). So default the filter to empty. Change-Id: Icb790a0ec41133bfac54244aae8782a5cc665186	2016-11-02 15:58:07 +00:00
Flint Calvin	87a8960467	Changed hourly storage.objects.size_agg operation The hourly aggregation for storage.objects.size_agg should have been a sum rather than an average. Change-Id: Icf018a24c5de0efb67faeeee1418bad5064a39e7	2016-09-14 17:13:26 +00:00
Flint Calvin	3cdb0d1687	Made corrections such that swiftlm.diskusage.rate_agg is now correctly based on swiftlm.diskusage.host.val.avail (instead of incorrectly being based on swiftlm.diskusage.host.val.size). Change-Id: If17853e166c050cefbf390791a8696ce520fca96	2016-09-06 20:52:49 +00:00
Flint Calvin	d4f791e9ef	Added aggregation of storage-objects.size. Change-Id: Iafd5ff8a8faf958ac9bada9981e8cd419d82ed2e	2016-08-18 20:09:30 +00:00
Flint Calvin	615e52d5cd	Modifications to make rate calculations work with two-stage aggregation. Change-Id: I8c7b6112a04ba378ba1911a342cb97e8c388ebc6	2016-08-09 16:33:34 +00:00
Flint Calvin	accbacb19e	Reintroduced some aggregations which were apparently lost when we transitioned to the upstream Monasca-Transform OpenStack repo. Specifically, the missing aggregations were those for the nova.vm.cpu.total_allocated and nova.vm.mem.total_allocated_mb source metrics. This set of changes also includes the resolution of a couple pre-existing pep8 errors. Change-Id: I84bf19b674aeadcd0d27799a887d0b89d0381550	2016-08-04 15:13:43 +00:00
Flint Calvin	a9775506cb	Removed 'device' as an expected dimension on incoming Swift metrics (since it is no longer included in them). Change-Id: Ide8a463b8678aec38857e6376118a09588b98e0a	2016-07-28 16:09:27 +00:00
darfed	bb83b30dc1	Add TLS/SSL capability to database connection Add properties to conf file to allow configuration of SSL for the database connection. Done for both the python and java connection strings. Change-Id: I4c3d25c3f8f12eae801a6a818bf4ac7acd93d2dc	2016-07-28 10:12:10 +01:00
Jenkins	3bf98a894b	Merge "Added filter capability for transform specs."	2016-07-19 03:02:55 +00:00
Flint Calvin	c7128b0136	Added filter capability for transform specs. Change-Id: Ie5b456039c9810da19c1699cc7d5a44277496843	2016-07-18 22:24:05 +00:00
Ashwin Agate	90b20bfd41	Change to monasca-common simport Use monasca-common simport library Closes-Bug: #1596331 Change-Id: I695d6db9c5c49c0120e73b76ea75f7a30222419d	2016-07-09 19:04:19 +00:00
Ashwin Agate	00b874a6b3	Two stage transformation Breaking down the aggregation into two stages. The first stage aggregates raw metrics frequently and is implemented as a Spark Streaming job which aggregates metrics at a configurable time interval (defaults to 10 minutes) and writes the intermediate aggregated data, or instance usage data to new "metrics_pre_hourly" kafka topic. The second stage is implemented as a batch job using Spark Streaming createRDD direct stream batch API, which is triggered by the first stage only when first stage runs at the top of the hour. Also enhanced kafka offsets table to keep track of offsets from two stages along with streaming batch time, last time version row got updated and revision number. By default it should keep last 10 revisions to the offsets for each application. Change-Id: Ib2bf7df6b32ca27c89442a23283a89fea802d146	2016-06-28 13:47:50 +00:00
Flint Calvin	d8e73f3bde	Added several Swift aggregations (including a new usage component for calculating rate changes). Also fixed some pep8 issues. Change-Id: I46685d39ace663595aa524f04d8d35a71c9432c3	2016-06-21 19:44:02 +00:00
Flint Calvin	c7aabb6927	Added aggregation for vm.mem.used_mb and swiftlm.diskusage.host.val.size. Also renamed disk.allocation to vm.disk.allocation and resolved a problem with resource_id not being found for certain aggregations. Change-Id: Iad82d149e7a04ed1e0ecfe936b90acfff1dca13e	2016-06-13 22:38:46 +00:00
Flint Calvin	11e8bac2cf	Added aggregation for cpu.total_logical_cores and cpu.utilized_logical_cores by host. Change-Id: Ib6c6def9f882ad0fb010494a5561225c37882a07	2016-06-06 15:12:31 +00:00
Flint Calvin	e4ade60711	Implemented aggregation for disk.allocation. Also set the apache download source to use the archive site to ensure that the dependency package does not disappear. Also brought the vagrant environment inline with monasca-api (i.e., use the same values for private network, add substitution for kafka brokers ip address to the conf). Also parameterised dependency sources (i.e., added settings to parameterise the maven and apache repositories for the devstack plugin). Change-Id: If9f0e2ed16bbfcd62152d29e5c7c86f5d555f9aa	2016-06-01 15:56:26 +00:00
Ashwin Agate	8f61dd95a9	monasca-transform initial commit The monasca-transform is a new component in Monasca that aggregates and transforms metrics. monasca-transform is a Spark based data driven aggregation engine which collects, groups and aggregates existing individual Monasca metrics according to business requirements and publishes new transformed (derived) metrics to the Monasca Kafka queue. Since the new transformed metrics are published as any other metric in Monasca, alarms can be set and triggered on the transformed metric, just like any other metric. Co-Authored-By: Flint Calvin <flint.calvin@hp.com> Co-Authored-By: David Charles Kennedy <david.c.kennedy@hpe.com> Co-Authored-By: Ashwin Agate <ashwin.agate@hp.com> Implements: blueprint monasca-transform Change-Id: I0e67ac7a4c9a5627ddaf698855df086d55a52d26	2016-05-26 00:10:37 +00:00

23 Commits