* To find the oldest and newest quantities
for calculating the rate, the Spark DataFrame
is converted to a Spark RDD, which makes it
easy to group and sort. The syntax to pull
a value from a map column type in an
RDD is rdd.map_column['key'], which
is different from dataframe.map_column.key
and was causing a "'dict' object has no attribute"
exception.
* Renamed first_attempt_at_spark_test.py to
test_first_attempt_at_spark, because none of the tests
in that file were being run.
* Changed the pre-transform and transform specs
which were being used by test_first_attempt_at_spark
to the newer 'dimensions#field_name' notation that was
introduced in the Rocky release.
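
A minimal sketch of the group, sort and oldest/newest lookup described
in the first item, written in plain Python so it runs without Spark.
The record layout and the names map_column, timestamp and quantity are
assumptions for illustration, not the actual monasca-transform schema.
Each record stands in for a row seen inside an RDD, where a map column
arrives as a plain dict.

```python
from collections import defaultdict


def oldest_and_newest(records, group_key):
    """Group records by a key held in the map column, sort each group by
    timestamp and return {group: (oldest_quantity, newest_quantity)}."""
    groups = defaultdict(list)
    for record in records:
        # Inside an RDD the map column is a dict, so index it with ['key'].
        # Attribute access (record["map_column"].host) raises AttributeError.
        groups[record["map_column"][group_key]].append(record)

    result = {}
    for key, rows in groups.items():
        rows.sort(key=lambda row: row["timestamp"])
        result[key] = (rows[0]["quantity"], rows[-1]["quantity"])
    return result


# Hypothetical sample data.
records = [
    {"timestamp": 30, "quantity": 80.0, "map_column": {"host": "a"}},
    {"timestamp": 10, "quantity": 100.0, "map_column": {"host": "a"}},
    {"timestamp": 20, "quantity": 50.0, "map_column": {"host": "b"}},
]

print(oldest_and_newest(records, "host"))
```

The rate itself would then be derived from each (oldest, newest) pair.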
Story: 2005449
Task: 30501
Change-Id: I6adeee54fe261c535372b8f5f3580e7d3261259b
Remove unused field service_id from pre-transform spec.
service_id's original purpose was to identify the service
that is generating the metric, but this information should
be provided by the source as a dimension, rather
than by assigning its value in the pre-transform spec.
Change-Id: I223eb2296df438b139e3d9b5aaf4b1b679f70797
Depends-on: I81a35e048e6bd5649c6b3031ac2722be6a309088
Story: 2001815
Task: 12556
* Removed unused fields "event_status", "event_version",
"record_type", "mount", "device", "pod_name", "container_name",
"app", "interface", "deployment" and "daemon_set"
from record_store data. It is no longer required to add
new dimension, meta or value_meta fields to record store data;
instead use special notation, e.g. "dimensions#", to refer to any
dimension field in the incoming metric.
* Refactored to eliminate the need to add any new
metric.dimensions field in multiple places, e.g. to the record
store and instance usage dataframe schemas and to all generic
aggregation components. Added a new Map type column
called "extra_data_map" to the instance usage data format
to store any new fields. A Map type column eliminates the
need to add new columns to instance usage data.
* Allow users to define any fields in "meta",
"metric.dimensions" and "metric.value_meta" fields
for aggregation in "aggregation_group_by_list" or
"setter_group_by_list" using "dimensions#{$field_name}"
or "meta#{$field_name}" or "value_meta#{$field_name}"
* Updated generic aggregation components and data formats docs.
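
A minimal sketch of how the "dimensions#", "meta#" and "value_meta#"
notation above could be resolved against an incoming metric. The
helper names resolve_field and group_by_key are hypothetical and are
not the monasca-transform implementation; the envelope layout follows
the "meta", "metric.dimensions" and "metric.value_meta" fields named
above, and the sample values are made up.

```python
def resolve_field(envelope, field_spec):
    """Return the value referenced by a '<section>#<field_name>' spec,
    or None when the field is absent from the incoming metric."""
    section, _, field_name = field_spec.partition("#")
    metric = envelope.get("metric") or {}
    sources = {
        "dimensions": metric.get("dimensions") or {},
        "meta": envelope.get("meta") or {},
        "value_meta": metric.get("value_meta") or {},
    }
    if not field_name or section not in sources:
        raise ValueError("unsupported field spec: %r" % field_spec)
    return sources[section].get(field_name)


def group_by_key(envelope, group_by_list):
    """Build a tuple key from a list of field specs."""
    return tuple(resolve_field(envelope, spec) for spec in group_by_list)


# Hypothetical incoming metric.
envelope = {
    "meta": {"tenantId": "t1", "region": "r1"},
    "metric": {
        "name": "pod.cpu.total_time",
        "dimensions": {"pod_name": "web-1", "namespace": "default"},
        "value_meta": {"unit": "s"},
    },
}

print(group_by_key(envelope, ["dimensions#pod_name", "meta#tenantId"]))
```

With this approach a new dimension only has to be named in the
group-by list, not added to a schema.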
Change-Id: I81a35e048e6bd5649c6b3031ac2722be6a309088
Story: 2001815
Task: 19605
Following changes were required:
1.) By default the pre-built distribution
for Spark 2.2.0 is compiled with Scala 2.11.
monasca-transform requires Spark compiled with
Scala 2.10, since we use Spark Streaming to
pull data from Kafka and the version of Kafka
in use is compatible with Scala 2.10.
The recommended way is to compile Spark
with Scala 2.10, but for the purposes of the devstack
plugin, changes were made to pull the required jars
from Maven directly
(see SPARK_JARS and SPARK_JAVA_LIB variables in
settings).
All jars get moved to
<SPARK_HOME>/assembly/target/assembly/
target/scala_2.10/jars/
Note: <SPARK_HOME>/jars gets renamed
to <SPARK_HOME>/jars_original.
spark-submit defaults to assembly location
if <SPARK_HOME>/jars directory is missing.
2.) Updated startup scripts for the Spark
worker and Spark master with a new env variable
SPARK_SCALA_VERSION=2.10. Also updated the
PYTHONPATH variable to add the new
py4j-0.10.4-src.zip file.
3.) Some changes to replace deprecated pyspark
function calls which were removed in Spark 2.0.
Change-Id: I8f8393bb91307d55f156b2ebf45225a16ae9d8f4
There is a desire to use Monasca Transform to aggregate
kubernetes metrics. This change is a start in that
direction. The transformation specs in the tests
folder now include some representative aggregations
of some kubernetes metrics.
This commit also includes some changes to get
first_attempt_at_spark_test.py working again
after being moved from the unit test folder to the
functional test folder.
Change-Id: I038ecaf42e67d5c994980991232a2a8428a4f4e3
The metric filter allows metrics to be filtered before aggregation,
for example to exclude metrics from certain environments or nodes
that are outside the scope of data aggregation. This is a powerful
feature but not appropriate for all scenarios (e.g. devstack), so
the filter now defaults to empty.
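
A rough sketch of the behaviour described above, where an empty filter
lets every metric through to aggregation. The rule format (a dict of
dimension name to value) and the function name apply_metric_filter are
assumptions for illustration only and do not reflect the actual filter
syntax.

```python
def apply_metric_filter(metrics, exclude_rules=()):
    """Return the metrics that match none of the exclusion rules.

    A metric matches a rule when every dimension pair in the rule is
    present in the metric. With no rules (the default) nothing is
    excluded.
    """
    def matches(metric, rule):
        dims = metric.get("dimensions", {})
        return bool(rule) and all(dims.get(k) == v for k, v in rule.items())

    return [m for m in metrics
            if not any(matches(m, rule) for rule in exclude_rules)]


# Hypothetical sample metrics.
metrics = [
    {"name": "cpu.idle_perc", "dimensions": {"hostname": "node1", "env": "prod"}},
    {"name": "cpu.idle_perc", "dimensions": {"hostname": "node2", "env": "test"}},
]

kept_default = apply_metric_filter(metrics)
kept_filtered = apply_metric_filter(metrics, [{"env": "test"}])
print(len(kept_default), len(kept_filtered))
```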
Change-Id: Icb790a0ec41133bfac54244aae8782a5cc665186
The hourly aggregation for storage.objects.size_agg
should have been a sum rather than an average.
Change-Id: Icf018a24c5de0efb67faeeee1418bad5064a39e7
correctly based on swiftlm.diskusage.host.val.avail (instead of
incorrectly being based on swiftlm.diskusage.host.val.size).
Change-Id: If17853e166c050cefbf390791a8696ce520fca96
we transitioned to the upstream Monasca-Transform OpenStack repo.
Specifically, the missing aggregations were those for the
nova.vm.cpu.total_allocated and nova.vm.mem.total_allocated_mb
source metrics.
This set of changes also includes the resolution of a couple
of pre-existing pep8 errors.
Change-Id: I84bf19b674aeadcd0d27799a887d0b89d0381550
Add properties to conf file to allow configuration of
SSL for the database connection. Done for both the python
and java connection strings.
Change-Id: I4c3d25c3f8f12eae801a6a818bf4ac7acd93d2dc
Breaking down the aggregation into two stages.
The first stage aggregates raw metrics frequently and is
implemented as a Spark Streaming job which
aggregates metrics at a configurable time interval
(defaults to 10 minutes) and writes the intermediate
aggregated data, or instance usage data,
to a new "metrics_pre_hourly" Kafka topic.
The second stage is implemented
as a batch job using the Spark Streaming createRDD
direct stream batch API, which is triggered by the
first stage only when the first stage runs at the
top of the hour.
Also enhanced the Kafka offsets table to keep track
of offsets from the two stages along with the streaming
batch time, the last time the version row was updated
and the revision number. By default it should keep the
last 10 revisions of the offsets for each
application.
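
The two-stage flow above can be sketched in plain Python, without
Spark or Kafka. This is only an illustration under stated assumptions:
timestamps are epoch seconds, the aggregation operation is a sum, and
the function names pre_hourly, hourly, is_top_of_hour and
trim_revisions are hypothetical.

```python
from collections import defaultdict

INTERVAL_SECONDS = 600   # first stage interval, 10 minutes by default
HOUR_SECONDS = 3600
MAX_REVISIONS = 10       # offset revisions kept per application


def pre_hourly(raw_metrics, interval=INTERVAL_SECONDS):
    """Stage 1: sum raw (timestamp, value) pairs into interval buckets."""
    buckets = defaultdict(float)
    for timestamp, value in raw_metrics:
        buckets[timestamp - timestamp % interval] += value
    return sorted(buckets.items())


def is_top_of_hour(batch_time, interval=INTERVAL_SECONDS):
    """True when this batch is the first one to run in a new hour."""
    return batch_time % HOUR_SECONDS < interval


def hourly(intermediate):
    """Stage 2: sum the intermediate buckets into hourly totals."""
    hours = defaultdict(float)
    for timestamp, value in intermediate:
        hours[timestamp - timestamp % HOUR_SECONDS] += value
    return sorted(hours.items())


def trim_revisions(revisions, max_revisions=MAX_REVISIONS):
    """Keep only the most recent offset revisions."""
    return revisions[-max_revisions:]


raw = [(0, 1.0), (30, 2.0), (600, 3.0), (3599, 4.0), (3600, 5.0)]
intermediate = pre_hourly(raw)
print(intermediate)
if is_top_of_hour(3600):
    print(hourly(intermediate))
```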
Change-Id: Ib2bf7df6b32ca27c89442a23283a89fea802d146
swiftlm.diskusage.host.val.size. Also renamed disk.allocation to
vm.disk.allocation and resolved a problem with resource_id not
being found for certain aggregations.
Change-Id: Iad82d149e7a04ed1e0ecfe936b90acfff1dca13e
apache download source to use the archive site to ensure that
the dependency package does not disappear. Also brought the
vagrant environment in line with monasca-api (i.e., use the
same values for private network, add substitution for kafka
brokers ip address to the conf). Also parameterised
dependency sources (i.e., added settings to parameterise the
maven and apache repositories for the devstack plugin).
Change-Id: If9f0e2ed16bbfcd62152d29e5c7c86f5d555f9aa
monasca-transform is a new component in Monasca that
aggregates and transforms metrics.
monasca-transform is a Spark based data driven aggregation
engine which collects, groups and aggregates existing individual
Monasca metrics according to business requirements and publishes
new transformed (derived) metrics to the Monasca Kafka queue.
Since the new transformed metrics are published as any other
metric in Monasca, alarms can be set and triggered on the
transformed metric, just like any other metric.
Co-Authored-By: Flint Calvin <flint.calvin@hp.com>
Co-Authored-By: David Charles Kennedy <david.c.kennedy@hpe.com>
Co-Authored-By: Ashwin Agate <ashwin.agate@hp.com>
Implements: blueprint monasca-transform
Change-Id: I0e67ac7a4c9a5627ddaf698855df086d55a52d26