monasca-transform

Commit Graph

Author	SHA1	Message	Date
Witek Bedyk	811acd76c9	Remove project content on master branch This is step 2b of repository deprecation process as described in [1]. Project deprecation has been anounced here [2]. [1] https://docs.openstack.org/project-team-guide/repository.html#step-2b-remove-project-content [2] http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016814.html Depends-On: https://review.opendev.org/751983 Change-Id: I83bb2821d64a4dddd569ff9939aa78d271834f08	2020-09-15 10:12:44 +02:00
Witold Bedyk	8257dd9f42	Fix PEP8 tests for Python 3 * relace long with int * replace execfile with exec Change-Id: If98949fa5f49091fbf11c95a65302a4f844538c9 Story: 2003240 Task: 26827	2018-10-01 11:21:23 +02:00
Ashwin Agate	f43d3498c4	Update monasca-transform devstack plugin * set "publish_region" config value from variable "$REGION_NAME" * set default "publish_kafka_project_id" to "mini-mon" instead of "admin" * update the systemd service unit file since that file is created by devstack's "run_process" function - set KillMode to 'control-group', since monasca-tranform generates several child process * remove monasca-transform.service file since its now being generated by devstack plugin. Change-Id: I6654b7973f8502d4805d25c96b2038291e398552 Story: 2001815 Task: 14328	2018-04-17 13:05:35 -07:00
Ashwin Agate	022bd11a4d	Switch to using Spark version 2.2.0 Following changes were required: 1.) By default the pre-built distribution for Spark 2.2.0 is compiled with Scala 2.11. monasca-transform requires Spark compiled with Scala 2.10 since we use spark streaming to pull data from Kafka and the version of Kafka is compatible with Scala 2.10. The recommended way is to compile Spark with Scala 2.10, but for purposes of devstack plugin made changes to pull the required jars from mvn directly. (see SPARK_JARS and SPARK_JAVA_LIB variables in settings) All jars get moved to <SPARK_HOME>/assembly/target/assembly/ target/scala_2.10/jars/ Note: <SPARK_HOME>/jars gets renamed to <SPARK_HOME>/jars_original. spark-submit defaults to assembly location if <SPARK_HOME>/jars directory is missing. 2.) Updated start up scripts for spark worker and spark master with a new env variable SPARK_SCALA_VERSIOn=2.10. Also updated PYTHONPATH variable to add new py4j-0.10.4-src.zip file 3.) Some changes to adhere to deprecated pyspark function calls which were removed in Spark 2.0 Change-Id: I8f8393bb91307d55f156b2ebf45225a16ae9d8f4	2017-08-21 11:18:22 -07:00
Ashwin Agate	8feb6db47d	Monasca Transform devstack rotate spark logs in /var/run/spark Enable spark configuration to rotate logs in /var/run/spark directory and keep minimum set of files on devstack environment. Change-Id: I6d2c46d4ec53475e49f5bff6f2b82dccc0d01bbf	2017-04-20 12:57:23 -07:00
agateaaa	2da390414e	Hourly aggregation account for early arrving metrics With this change pre hourly processor which does the hourly aggregation (second stage) and writes the final aggregated metrics to metris topic in kafka now accounts for any early arriving metrics. This change along with two previous changes to pre hourly processor that added 1.) configurable late metrics slack time (https://review.openstack.org/#/c/394497/), and 2.) batch filtering (https://review.openstack.org/#/c/363100/) will make sure all late arriving or early arriving metrics for an hour are aggregated appropriately. Also made improvement in MySQL offset to call delete excess revisions only once. Change-Id: I919cddf343821fe52ad6a1d4170362311f84c0e4	2017-04-17 15:29:34 -07:00
Joseph Davis	a64f1247a8	Change Spark version to 1.6.3 We have seen several instances where Spark 1.6.1 over time continues to consume more and more resources. The change Ibf244cbfc00a90ada66f492b473719c25fa17fd2 was not enough alone to curb this growth, but the new version of Spark has shown better behavior. Related changes will also need to be done in any installer, such as Ansible. Change-Id: Ib6b1220cf0186def115846c8cf71684bb2d6e8c7	2017-03-14 14:31:43 -07:00
agatea	1579d8b9e5	Reuse existing spark sql context Prevent creating a new spark sql context object with every batch. Profiling of java heap for the driver indicated that there is a steady increase (~12MB over 5 days) of org.apache.spark.sql.execution.metric.LongSQLMetricValue and org.apache.spark.sql.execution.ui.SQLTaskMetrics with each batch execution. These are used by the spark streaming ui and were not being garbage collected. See https://issues.apache.org/jira/browse/SPARK-17381 with a similar issue. This change along with setting spark.sql.ui.retainedExecutions to a low number in sparks-defaults.conf will reduce gradual increase in heap size. Also made a change to catch unhandled MemberNotJoined exception because of whichthe transform service thread went into a unresponsive state. Change-Id: Ibf244cbfc00a90ada66f492b473719c25fa17fd2	2017-02-27 14:06:53 -08:00
David C Kennedy	88e04a3096	Fix ci The devstack plugin carries usage of `sudo -u` which doesn't seem to work in the ci environment. Replace it with sudo followed by appropriate permissions changes. Use ${DEST} instead of literal /opt/stack to fit with gate usage. Enabled monasca-api plugin in the settings and the required monasca services along with zookeeper. Change-Id: I6effede4ac9a2faf1c44eff9cd96bbf9c924d703	2016-12-15 07:55:17 +00:00
David C Kennedy	0a73dc9110	Refactored vagrant environment for xenial Change-Id: I6cf6452f68bb143710ece4e80e2b216a07eaf07b	2016-11-25 10:52:55 +00:00
David C Kennedy	26e53336d4	Add configurable amnesty period for late metrics Added configuration option to allow the pre-hourly transformation to be done at a specified period past the hour. This includes a check to ensure that if not done yet for the hour but overdue processing is done at the earliest time. Change-Id: I8882f3089ca748ce435b4e9a92196a72a0a8e63f	2016-11-22 13:03:52 +00:00
David C Kennedy	8a6e619f72	Populate the project id for kafka publish This needs to be the admin project id so for devstack this needs to be written to the configuration file once the users/projects etc are created and identifiable. Add a similar process to the refresh script. Correct the configuration property name to 'project' rather than using the old nomencature 'tenant'. Change-Id: Ib9970ffacf5ee0f7f006722038a1db8024c1385e	2016-11-11 15:24:22 +00:00
David C Kennedy	501297ab03	Enable cleanup for spark-worker The tmpfs that is /run on devstack can fill up quickly, enable the cleanup. Change-Id: I687204c2713e0edc075d57159e8842de8e519710	2016-11-08 09:57:07 +00:00
Tomasz Trębski	698856b383	Use database password from variable There is a variable that tells what is the database (mysql) password. However plugin.sh is using a hardcoded password. Commits provides using DATABASE_PASSWORD variable (the same one as devstack is using) + defines a variable for m-transform user - MONASCA_TRANSFORM_DB_PASSWORD Change-Id: I9fc8296ef31b22564f2cf1536e51ab3abc8c9dc9	2016-10-25 14:55:47 +01:00
Flint Calvin	0ea79c0305	Added aggregation results to application log Made changes such that debug-level log entries are written to the application log noting which aggregated metrics are submitted during pre-hourly and hourly processing. Change-Id: I64c6a18233614fe680aa0b084570ee7885f316e5	2016-09-23 18:24:20 +00:00
Flint Calvin	bf2e42b3e0	Made changes to prevent multiple metrics in the same batch. Change-Id: Iec9935c21d8b65bf79067d4a009859c898b75993	2016-08-31 18:18:25 +00:00
David C Kennedy	b3f85e134e	Change jdbc driver to drizzle drizzle is BSD licensed compared with the GPLv2 mysql-connector from oracle. Change-Id: Ie203a285cf031e5c29d6d7673fab437cfe8acca4	2016-07-08 18:29:24 +00:00
darfed	9c95206bac	Corrected log file name The log file was being duplicated at monasca-transform.log and monasca_transform.log. Fixed this to be set simply at monasca-transform.log. Change-Id: I6a63737c569b06a271e11b880675edadfbdcc250	2016-06-30 21:59:19 +01:00
Ashwin Agate	00b874a6b3	Two stage transformation Breaking down the aggregation into two stages. The first stage aggregates raw metrics frequently and is implemented as a Spark Streaming job which aggregates metrics at a configurable time interval (defaults to 10 minutes) and writes the intermediate aggregated data, or instance usage data to new "metrics_pre_hourly" kafka topic. The second stage is implemented as a batch job using Spark Streaming createRDD direct stream batch API, which is triggered by the first stage only when first stage runs at the top of the hour. Also enhanced kafka offsets table to keep track of offsets from two stages along with streaming batch time, last time version row got updated and revision number. By default it should keep last 10 revisions to the offsets for each application. Change-Id: Ib2bf7df6b32ca27c89442a23283a89fea802d146	2016-06-28 13:47:50 +00:00
Flint Calvin	d8e73f3bde	Added several Swift aggregations (including a new usage component for calculating rate changes). Also fixed some pep8 issues. Change-Id: I46685d39ace663595aa524f04d8d35a71c9432c3	2016-06-21 19:44:02 +00:00
David C Kennedy	02b23741a5	Removed spark-events from config Allow spark to configure the location of spark-events. Add spark events log config to spark-defaults in devstack plugin. Move spark events logging to /var/log/spark/events for devstack plugin. Set group permissions to ensure spark events log directory is group, but not world writable. Change-Id: I26aef23a9a801a02a20e14899e1c89b10556e4d4	2016-06-20 10:15:02 +01:00
David C Kennedy	05c36ab8e5	Allow configurable SPARK_HOME Spark could feasibly be installed in any location so we should allow SPARK_HOME to be specified in the conf file and that value used in the spark-submit carried out in the transform service invocation. Change-Id: I4d25ccaa0e271eeb783d186666cdc8aaf131097c	2016-06-03 16:55:23 +01:00
Flint Calvin	e4ade60711	Implemented aggregation for disk.allocation. Also set the apache download source to use the archive site to ensure that the dependency package does not disappear. Also brought the vagrant environment inline with monasca-api (i.e., use the same values for private network, add substitution for kafka brokers ip address to the conf). Also parameterised dependency sources (i.e., added settings to parameterise the maven and apache repositories for the devstack plugin). Change-Id: If9f0e2ed16bbfcd62152d29e5c7c86f5d555f9aa	2016-06-01 15:56:26 +00:00
Ashwin Agate	8f61dd95a9	monasca-transform initial commit The monasca-transform is a new component in Monasca that aggregates and transforms metrics. monasca-transform is a Spark based data driven aggregation engine which collects, groups and aggregates existing individual Monasca metrics according to business requirements and publishes new transformed (derived) metrics to the Monasca Kafka queue. Since the new transformed metrics are published as any other metric in Monasca, alarms can be set and triggered on the transformed metric, just like any other metric. Co-Authored-By: Flint Calvin <flint.calvin@hp.com> Co-Authored-By: David Charles Kennedy <david.c.kennedy@hpe.com> Co-Authored-By: Ashwin Agate <ashwin.agate@hp.com> Implements: blueprint monasca-transform Change-Id: I0e67ac7a4c9a5627ddaf698855df086d55a52d26	2016-05-26 00:10:37 +00:00

24 Commits