* Removed unused fields "event_status", "event_version",
"record_type", "mount", "device", "pod_name", "container_name",
"app", "interface", "deployment" and "daemon_set"
from record_store data. It is no longer necessary to add
new dimension, meta or value_meta fields to record store data;
instead, use the special notation, e.g. "dimension#", to refer to any
dimension field in the incoming metric.
* Refactored to eliminate the need to add any new metric.dimensions
field in multiple places, e.g. to the record store and
instance usage dataframe schemas and to all generic
aggregation components. Added a new Map type column
called "extra_data_map" to store any new fields in the
instance usage data format. The Map type column eliminates the
need to add new columns to instance usage data.
* Allow users to define any fields in "meta",
"metric.dimensions" and "metric.value_meta" fields
for aggregation in "aggregation_group_by_list" or
"setter_group_by_list" using "dimensions#{$field_name}",
"meta#{$field_name}" or "value_meta#{$field_name}".
* Updated generic aggregation components and data formats docs.
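The "extra_data_map" and "dimensions#" ideas above can be illustrated with a minimal, self-contained Python sketch. Plain dicts stand in for Spark rows, and the field names, spec names and the aggregate() helper are simplified assumptions for illustration, not the actual monasca-transform schema or API:

```python
# Sketch: records carry a generic "extra_data_map", so new incoming
# metric fields need no schema change; a group-by list can reference
# them with the "dimensions#<name>" notation at aggregation time.
from collections import defaultdict

def resolve_field(record, field):
    # "dimensions#pod_name" is looked up in the record's extra_data_map;
    # plain names are looked up as ordinary top-level fields.
    if "#" in field:
        return record["extra_data_map"].get(field)
    return record.get(field)

def aggregate(records, group_by_list, value_field="quantity"):
    # Sum value_field per group key built from the group-by list.
    groups = defaultdict(float)
    for rec in records:
        key = tuple(resolve_field(rec, f) for f in group_by_list)
        groups[key] += rec[value_field]
    return dict(groups)

records = [
    {"metric_name": "pod.cpu.total_time", "quantity": 2.0,
     "extra_data_map": {"dimensions#pod_name": "web-1"}},
    {"metric_name": "pod.cpu.total_time", "quantity": 3.0,
     "extra_data_map": {"dimensions#pod_name": "web-1"}},
    {"metric_name": "pod.cpu.total_time", "quantity": 5.0,
     "extra_data_map": {"dimensions#pod_name": "db-1"}},
]

result = aggregate(records, ["metric_name", "dimensions#pod_name"])
```

Because the per-metric fields live in one Map-typed column, adding a new dimension to an incoming metric changes only the data, not the dataframe schema or the aggregation components.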
Change-Id: I81a35e048e6bd5649c6b3031ac2722be6a309088
Story: 2001815
Task: 19605
With this change the pre-hourly processor, which does the
hourly aggregation (second stage) and writes the
final aggregated metrics to the metrics topic in Kafka,
now accounts for any early arriving metrics.
This change, along with two previous changes
to the pre-hourly processor that added
1.) configurable late metrics slack time
(https://review.openstack.org/#/c/394497/), and
2.) batch filtering
(https://review.openstack.org/#/c/363100/),
ensures that all late arriving or early
arriving metrics for an hour are aggregated
appropriately.
Also improved the MySQL offsets component so that
the deletion of excess revisions is called only once.
Change-Id: I919cddf343821fe52ad6a1d4170362311f84c0e4
There is a desire to use Monasca Transform to aggregate
Kubernetes metrics. This change is a start in that
direction. The transformation specs in the tests
folder now include some representative aggregations
of some Kubernetes metrics.
This commit also includes some changes to get
first_attempt_at_spark_test.py working again
after being moved from the unit test folder to the
functional test folder.
Change-Id: I038ecaf42e67d5c994980991232a2a8428a4f4e3
Added a configuration option to allow the pre-hourly transformation to
be done at a specified period past the hour. This includes a check to
ensure that if processing has not yet been done for the hour and is
overdue, it is done at the earliest opportunity.
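The kind of check involved can be sketched as follows. The function name, arguments and exact scheduling rules here are illustrative assumptions, not the actual monasca-transform configuration or code:

```python
# Sketch: decide whether the pre-hourly transformation should run now,
# given a configured minutes-past-the-hour offset and the time of the
# last run. Illustrative logic only.
from datetime import datetime, timedelta

def should_run_pre_hourly(now, last_run, minutes_past_hour=10):
    hour_start = now.replace(minute=0, second=0, microsecond=0)
    scheduled = hour_start + timedelta(minutes=minutes_past_hour)
    prev_scheduled = scheduled - timedelta(hours=1)
    if last_run is not None and last_run >= scheduled:
        return False  # already ran at/after this hour's scheduled time
    if now >= scheduled:
        return True   # at/past the configured period past the hour
    # Before this hour's slot: run early only if the previous
    # hour's slot was missed (i.e. processing is overdue).
    return last_run is None or last_run < prev_scheduled
```

With minutes_past_hour=10 this runs once per hour at HH:10, and an instance that missed its slot (e.g. the service was down) runs as soon as it is next consulted rather than waiting for the following HH:10.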
Change-Id: I8882f3089ca748ce435b4e9a92196a72a0a8e63f
This needs to be the admin project id, so for devstack it needs to be
written to the configuration file once the users/projects etc. are
created and identifiable.
Add a similar process to the refresh script.
Correct the configuration property name to 'project' rather than using
the old nomenclature 'tenant'.
Change-Id: Ib9970ffacf5ee0f7f006722038a1db8024c1385e
Made changes such that debug-level log entries are written to
the application log noting which aggregated metrics are submitted
during pre-hourly and hourly processing.
Change-Id: I64c6a18233614fe680aa0b084570ee7885f316e5
Add properties to the conf file to allow configuration of
SSL for the database connection. This is done for both the Python
and Java connection strings.
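A hedged sketch of what such properties might look like in the service conf file; the section and property names here are illustrative assumptions, not the exact ones added by this change:

```ini
[database]
# illustrative property names only
use_ssl = True
ca_file = /path/to/ssl_certs/ca-chain.pem
```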
Change-Id: I4c3d25c3f8f12eae801a6a818bf4ac7acd93d2dc
The log file was being duplicated at monasca-transform.log and
monasca_transform.log. Fixed this to be set simply at
monasca-transform.log.
Change-Id: I6a63737c569b06a271e11b880675edadfbdcc250
Breaking down the aggregation into two stages.
The first stage aggregates raw metrics frequently and is
implemented as a Spark Streaming job which
aggregates metrics at a configurable time interval
(defaults to 10 minutes) and writes the intermediate
aggregated data, or instance usage data,
to a new "metrics_pre_hourly" Kafka topic.
The second stage is implemented
as a batch job using the Spark Streaming createRDD
direct stream batch API, which is triggered by the
first stage only when the first stage runs at the
top of the hour.
Also enhanced the Kafka offsets table to keep track
of offsets from the two stages along with the streaming
batch time, the last time the version row was updated,
and the revision number. By default it keeps the
last 10 revisions of the offsets for each
application.
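The revision-keeping behaviour described above can be sketched with a small stand-alone example, with SQLite standing in for MySQL. The table layout and column names are simplified assumptions for illustration, not the actual monasca-transform schema:

```python
# Sketch: an offsets table that keeps a bounded history of revisions
# per application, pruning everything older than the newest
# MAX_REVISIONS rows in a single DELETE statement.
import sqlite3

MAX_REVISIONS = 10

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE kafka_offsets (
        app_name TEXT,
        revision INTEGER,
        topic TEXT,
        until_offset INTEGER,
        batch_time TEXT,
        last_updated TEXT
    )""")

def add_revision(app, revision, topic, until_offset, batch_time):
    conn.execute(
        "INSERT INTO kafka_offsets VALUES (?, ?, ?, ?, ?, ?)",
        (app, revision, topic, until_offset, batch_time, batch_time))
    # Delete excess revisions once, keeping only the newest N for this app.
    conn.execute("""
        DELETE FROM kafka_offsets
        WHERE app_name = ?
          AND revision NOT IN (
              SELECT revision FROM kafka_offsets
              WHERE app_name = ?
              ORDER BY revision DESC LIMIT ?)""",
        (app, app, MAX_REVISIONS))

for rev in range(1, 16):   # 15 revisions written; only the last 10 survive
    add_revision("pre_hourly_processor", rev, "metrics_pre_hourly",
                 rev * 100, "2018-01-01 %02d:00:00" % rev)

rows = conn.execute(
    "SELECT MIN(revision), MAX(revision), COUNT(*) "
    "FROM kafka_offsets").fetchone()
```

Keeping a short revision history lets the processor recover or replay from a recent known-good offset without the table growing without bound.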
Change-Id: Ib2bf7df6b32ca27c89442a23283a89fea802d146
Spark could feasibly be installed in any location, so we should
allow SPARK_HOME to be specified in the conf file and that
value used in the spark-submit carried out in the transform
service invocation.
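For illustration, such a property might look roughly like this in the conf file; the section and property names are assumptions, not necessarily the exact ones used by the service:

```ini
[service]
# illustrative: points the transform service's spark-submit
# at the Spark install location
spark_home = /opt/spark/current
```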
Change-Id: I4d25ccaa0e271eeb783d186666cdc8aaf131097c
monasca-transform is a new component in Monasca that
aggregates and transforms metrics.
monasca-transform is a Spark based data driven aggregation
engine which collects, groups and aggregates existing individual
Monasca metrics according to business requirements and publishes
new transformed (derived) metrics to the Monasca Kafka queue.
Since the new transformed metrics are published like any other
metric in Monasca, alarms can be set and triggered on the
transformed metrics, just like on any other metric.
Co-Authored-By: Flint Calvin <flint.calvin@hp.com>
Co-Authored-By: David Charles Kennedy <david.c.kennedy@hpe.com>
Co-Authored-By: Ashwin Agate <ashwin.agate@hp.com>
Implements: blueprint monasca-transform
Change-Id: I0e67ac7a4c9a5627ddaf698855df086d55a52d26