This commit updates the hacking version to 1.1.x and fixes the
related pep8 issues.
Also added pycodestyle to test-requirements.
Story: 2004930
Task: 29318
Co-Authored-By: Akhil Jain <akhil.jain@india.nec.com>
Change-Id: Id3ad30d23b902ee6f7277f7ec20d7d523df232f6
* To find the oldest and newest quantities for calculating a rate,
the Spark DataFrame is converted to a Spark RDD, which makes it
easy to do a group by and sort. The syntax to pull a value from a
map column in an RDD is rdd.map_column['key'], which is different
from dataframe.map_column.key; using the DataFrame-style dotted
access on the RDD was causing a "'dict' object has no attribute"
exception (see the sketch after this list).
* Renamed first_attempt_at_spark_test.py to
test_first_attempt_at_spark, because none of the tests
in that file were being run.
* Changed the pre-transform and transform specs
used by test_first_attempt_at_spark
to the newer 'dimensions#field_name' notation that was introduced
in the Rocky release.
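A minimal PySpark sketch of the difference (the column and key names
here are illustrative, not the actual spec fields):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    # "dimensions" is inferred as a map<string,string> column
    df = spark.createDataFrame([({"quantity": "1.5"},)], ["dimensions"])

    # DataFrame style: dotted access works on a map column
    df.select(df.dimensions.quantity).show()

    # RDD style: each row holds a plain Python dict, so the key must
    # be subscripted; dotted access raises
    # AttributeError: 'dict' object has no attribute 'quantity'
    print(df.rdd.map(lambda row: row.dimensions["quantity"]).collect())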
Story: 2005449
Task: 30501
Change-Id: I6adeee54fe261c535372b8f5f3580e7d3261259b
New releases of oslo.config support a 'mutable' parameter to Opts.
With Icec3e664f3fe72614e373b2938e8dee53cf8bc5e, oslo.service provides
an option that lets services tell oslo.service they want
mutate_config_files to be called, by passing a parameter.
This commit makes use of that option. It allows monasca_transform to
benefit from I1e7a69de169cc85f4c09954b2f46ce2da7106d90, where the
'debug' option (owned by oslo.log) is made mutable: we should be able
to turn debug logging on and off by changing the config.
tc goal:
https://governance.openstack.org/tc/goals/rocky/enable-mutable-configuration.html
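A hedged sketch of the two pieces involved (the option name and the
use of a bare Service subclass are assumptions for illustration, not
the actual monasca_transform code):

    from oslo_config import cfg
    from oslo_service import service

    CONF = cfg.CONF
    # An Opt declared with mutable=True can be changed at runtime by
    # mutate_config_files() without restarting the service.
    CONF.register_opts([cfg.BoolOpt('verbose_stats', default=False,
                                    mutable=True)])

    class TransformService(service.Service):
        pass

    # restart_method='mutate' asks oslo.service to call
    # mutate_config_files() (e.g. on SIGHUP) instead of restarting.
    launcher = service.launch(CONF, TransformService(),
                              restart_method='mutate')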
Change-Id: I86571df78014a810ffa881ceceeddfc5193c9ca5
Remove the unused field service_id from the pre-transform spec.
The original purpose of service_id was to identify the service
that is generating the metric, but this information should
be provided by the source, as a dimension, rather
than being assigned a value in the pre-transform spec.
Change-Id: I223eb2296df438b139e3d9b5aaf4b1b679f70797
Depends-on: I81a35e048e6bd5649c6b3031ac2722be6a309088
Story: 2001815
Task: 12556
* Removed unused fields "event_status", "event_version",
"record_type", "mount", "device", "pod_name", "container_name",
"app", "interface", "deployment" and "daemon_set"
from record_store data. It is no longer required to add
new dimension, meta or value_meta fields to the record store data;
instead use the special notation, e.g. "dimensions#", to refer to
any dimension field in the incoming metric.
* Refactored to eliminate the need to add any new metric.dimensions
field in multiple places, e.g. to the record store and
instance usage dataframe schemas and to all generic
aggregation components. Added a new Map type column
called "extra_data_map" to the instance usage data format to
store any new fields. The Map type column eliminates the
need to add new columns to instance usage data.
* Allow users to reference any fields in "meta",
"metric.dimensions" and "metric.value_meta"
for aggregation in "aggregation_group_by_list" or
"setter_group_by_list" using "dimensions#{$field_name}",
"meta#{$field_name}" or "value_meta#{$field_name}"
(see the example after this list).
* Updated generic aggregation components and data formats docs.
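As an illustration, a hypothetical group-by fragment (not an actual
spec shipped with the project) can now reference incoming dimension,
meta and value_meta fields directly:

    aggregation_group_by_list = ["host", "metric_id",
                                 "dimensions#pod_name",
                                 "dimensions#namespace",
                                 "meta#tenantId"]
    setter_group_by_list = ["dimensions#deployment",
                            "value_meta#backlog_size"]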
Change-Id: I81a35e048e6bd5649c6b3031ac2722be6a309088
Story: 2001815
Task: 19605
* set the maximum line length to 100
* cleaned up the code for pep8
Change-Id: Iab260a4e77584aae31c0596f39146dd5092b807a
Signed-off-by: Amir Mofakhar <amofakhar@op5.com>
The following changes were required:
1.)
By default the pre-built distribution
for Spark 2.2.0 is compiled with Scala 2.11.
monasca-transform requires Spark compiled with
Scala 2.10 since we use Spark Streaming to
pull data from Kafka, and the version of Kafka
is compatible with Scala 2.10.
The recommended way is to compile Spark
with Scala 2.10, but for the purposes of the
devstack plugin, changes were made to pull the
required jars from mvn directly.
(see SPARK_JARS and SPARK_JAVA_LIB variables in
settings)
All jars get moved to
<SPARK_HOME>/assembly/target/assembly/
target/scala_2.10/jars/
Note: <SPARK_HOME>/jars gets renamed
to <SPARK_HOME>/jars_original.
spark-submit defaults to assembly location
if <SPARK_HOME>/jars directory is missing.
2.) Updated the start up scripts for the spark
worker and spark master with a new env variable
SPARK_SCALA_VERSION=2.10. Also updated the
PYTHONPATH variable to add the new
py4j-0.10.4-src.zip file
3.) Made some changes to replace pyspark
function calls which were deprecated and removed in Spark 2.0
Change-Id: I8f8393bb91307d55f156b2ebf45225a16ae9d8f4
The 'message' attribute has been deprecated and removed
in Python 3.
For more details, please check:
https://www.python.org/dev/peps/pep-0352/
Change-Id: Ieaf6196fad7aa5e98ba6d6f5cea6f5f413fd4b69
Check periodically whether the host continues to be
the leader once elected. Failing to check
might lead to a situation where the host
has lost leadership but is not aware of it.
If the host is no longer the leader, it stands
down as leader, stops any spark-submit
processes running on the node and resets the state
of the transform thread.
Removed the --supervise option when invoking
spark-submit, to turn off the built-in driver
management.
Added some hardening to better catch exceptions
in the main transform service thread and in the
periodic leader check function, so that the
threads don't die when they encounter
an unhandled exception.
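A rough sketch of the periodic check (the class and method names are
hypothetical, not the actual monasca-transform code):

    import time

    def leader_check_loop(coordinator, transform_thread, interval=60):
        """Periodically verify leadership; stand down if it was lost."""
        while not transform_thread.stopped:
            try:
                if (transform_thread.is_leader
                        and not coordinator.am_i_leader()):
                    transform_thread.is_leader = False
                    transform_thread.stop_spark_submit()
                    transform_thread.reset_state()
            except Exception:
                # hardening: never let an unhandled exception kill the
                # checker thread
                pass
            time.sleep(interval)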
Change-Id: If2e13e3ed6cb30b3d7fa5f1b440c4c39b87692be
refresh_monasca_script.sh is useful for
development in a devstack environment.
Enhanced the script to
* start and stop the monasca-transform process
running in a screen session
* add more hardening to catch errors
and exit if any command fails
* add debugging statements to help track
down any errors when the script is run.
Change-Id: Idab02d555eed192d8242c870017955b935532c3d
With this change the pre hourly processor, which does the
hourly aggregation (second stage) and writes the
final aggregated metrics to the metrics topic in kafka,
now accounts for any early arriving metrics.
This change, along with two previous changes
to the pre hourly processor that added
1.) configurable late metrics slack time
(https://review.openstack.org/#/c/394497/), and
2.) batch filtering
(https://review.openstack.org/#/c/363100/),
will make sure all late arriving or early
arriving metrics for an hour are aggregated
appropriately.
Also made an improvement in the MySQL offsets
handling to call delete excess revisions only once.
Change-Id: I919cddf343821fe52ad6a1d4170362311f84c0e4
Changed the devstack environment vagrant box and
also renamed the devstack VM to 'devstack'
from 'pg-tips'.
Also fixed all the tests that were broken when
they were moved from tests/unit to tests/functional
with this review:
https://review.openstack.org/#/c/400237/
Updated the devstack README with a section called
"Development workflow for monasca-transform" with
steps developers can take to develop and run
tests.
Change-Id: I11678148ba2bcb96eb3e2a522176683dc8bca30a
kafka_python 0.9.5 was moved to monasca-common.
The upstream community wants to move to a
newer version of kafka-python, which has a number of
performance problems.
See https://review.openstack.org/#/c/420579/
and
https://review.openstack.org/#/c/424840/
Monasca Transform
uses the kafka-python library to write aggregated
metrics to kafka as well as to read offset information
in the case of hourly aggregation. Since the long term
plan is to move to pykafka in the future, we will
have to investigate whether that functionality
is available.
Change-Id: I831c9e259b3d7b92fb2834193034e15b62c80c37
There is a desire to use Monasca Transform to aggregate
kubernetes metrics. This change is a start in that
direction. The transformation specs in the tests
folder now include some representative aggregations
of some kubernetes metrics.
This commit also includes some changes to get
first_attempt_at_spark_test.py working again
after being moved from the unit test folder to the
functional test folder.
Change-Id: I038ecaf42e67d5c994980991232a2a8428a4f4e3
Prevent creating a new spark sql context object with every batch.
Profiling of the java heap for the driver indicated that there is a
steady increase (~12MB over 5 days) of
org.apache.spark.sql.execution.metric.LongSQLMetricValue
and org.apache.spark.sql.execution.ui.SQLTaskMetrics with
each batch execution. These are used by the spark streaming
ui and were not being garbage collected.
See https://issues.apache.org/jira/browse/SPARK-17381
which describes a similar issue.
This change, along with setting
spark.sql.ui.retainedExecutions to a low number in
spark-defaults.conf, will reduce the gradual increase in heap
size.
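A minimal sketch of the fix pattern for a PySpark streaming job (the
RDD handling shown is illustrative):

    from pyspark.sql import SQLContext

    def process_batch(rdd):
        # Reuse a single SQLContext per SparkContext instead of
        # constructing a new one for every batch, so the SQL/UI metric
        # objects are not accumulated on the driver heap.
        sql_context = SQLContext.getOrCreate(rdd.context)
        df = sql_context.createDataFrame(rdd)
        df.count()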
Also made a change to catch the unhandled MemberNotJoined exception
because of which the transform service thread went into
an unresponsive state.
Change-Id: Ibf244cbfc00a90ada66f492b473719c25fa17fd2
Fixed a bug where the hourly aggregation would run at every iteration
if the hour is zero (midnight), because zero is falsy.
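A minimal illustration of the falsy-zero pitfall (variable names are
illustrative):

    hour = 0  # midnight

    if not hour:      # buggy: True when hour == 0, so the aggregation
        pass          # re-runs on every iteration

    if hour is None:  # fixed: only True when the hour is genuinely unset
        pass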
Change-Id: I9652f02aea30f3ddb6f154db716aa4057455be06
For logging the exception message: e.message has been
deprecated. The preferred way is to call str(e).
More details: https://www.python.org/dev/peps/pep-0352/
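For example:

    try:
        raise ValueError("connection lost")
    except ValueError as e:
        # e.message is gone in Python 3; str(e) works in both 2 and 3
        print(str(e))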
Change-Id: I27b6a7b1f5e336df3cd618684cedfd01c840c99f
The Pre Hourly processor fails if the offsets recorded in the
kafka_offsets table no longer exist in kafka.
This change deletes those offsets from the kafka_offsets
table, so that the pre hourly processor can resume
processing with the next run.
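A tiny sketch of the recovery idea (the helper and its arguments are
hypothetical, not the actual monasca-transform code):

    def expired_partitions(saved_offsets, earliest_in_kafka):
        """Return partitions whose saved offset is older than anything
        Kafka still retains; their rows are deleted from kafka_offsets
        so the next run starts from the earliest available offset."""
        return [partition for partition, offset in saved_offsets.items()
                if offset < earliest_in_kafka.get(partition, 0)]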
Change-Id: I017c271e630fdf6de05a73b3bfcb14f5ed18615f
Added a configuration option to allow the pre-hourly transformation
to be done at a specified period past the hour. This includes a check
to ensure that, if processing has not yet been done for the hour but
is overdue, it is done at the earliest opportunity.
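A hedged sketch of the scheduling check (the option name, helper and
its arguments are made up for illustration, not the actual
configuration keys):

    from datetime import datetime, timedelta

    def should_run_pre_hourly(now, last_processed, minute_past_hour=10):
        """last_processed is the start of the last hour that was
        aggregated, or None if nothing has been processed yet."""
        current_hour = now.replace(minute=0, second=0, microsecond=0)
        previous_hour = current_hour - timedelta(hours=1)
        if last_processed is not None and last_processed >= previous_hour:
            return False  # the previous hour is already aggregated
        overdue = (last_processed is None or
                   last_processed < previous_hour - timedelta(hours=1))
        if overdue:
            return True   # missed earlier hours: run at the earliest time
        return now.minute >= minute_past_hour  # normal schedule

    print(should_run_pre_hourly(datetime.utcnow(), None))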
Change-Id: I8882f3089ca748ce435b4e9a92196a72a0a8e63f
Removing the unique metric count aggregation
since it causes data from Kafka to be pulled
and processed twice and leads to an unnecessary
increase in the overall time required to process
a batch.
Change-Id: I2046f95709232979dfd590d5293c803cac05bbb2
This needs to be the admin project id, so for devstack it needs to be
written to the configuration file once the users/projects etc. are
created and identifiable.
Add a similar process to the refresh script.
Correct the configuration property name to 'project' rather than
using the old nomenclature 'tenant'.
Change-Id: Ib9970ffacf5ee0f7f006722038a1db8024c1385e
The metric filter allows metrics to be filtered before aggregation,
for example to exclude metrics from certain environments or nodes
from being included where they are outside the scope of data
aggregation. This is a powerful feature but not appropriate for
all scenarios (e.g. devstack). So default the filter to empty.
Change-Id: Icb790a0ec41133bfac54244aae8782a5cc665186
Made changes such that debug-level log entries are written to
the application log noting which aggregated metrics are submitted
during pre-hourly and hourly processing.
Change-Id: I64c6a18233614fe680aa0b084570ee7885f316e5
The hourly aggregation for storage.objects.size_agg
should have been a sum rather than an average.
Change-Id: Icf018a24c5de0efb67faeeee1418bad5064a39e7
Removed the calls to the ceiling function on utilization
metrics aggregation such that they now are exact values (i.e.,
not rounded up to the next integral value).
Change-Id: I9813b94acb051f6754da2d559090318010f86e57
The aggregation is now correctly based on
swiftlm.diskusage.host.val.avail (instead of
incorrectly being based on swiftlm.diskusage.host.val.size).
Change-Id: If17853e166c050cefbf390791a8696ce520fca96
Restored aggregations that went missing when
we transitioned to the upstream Monasca-Transform OpenStack repo.
Specifically, the missing aggregations were those for the
nova.vm.cpu.total_allocated and nova.vm.mem.total_allocated_mb
source metrics.
This set of changes also includes the resolution of a couple of
pre-existing pep8 errors.
Change-Id: I84bf19b674aeadcd0d27799a887d0b89d0381550
Add properties to conf file to allow configuration of
SSL for the database connection. Done for both the python
and java connection strings.
Change-Id: I4c3d25c3f8f12eae801a6a818bf4ac7acd93d2dc