Merge "Add Telemetry Test Plan."

Jenkins 2016-12-06 06:04:59 +00:00 committed by Gerrit Code Review
commit 37cecc15dc
1 changed file with 184 additions and 0 deletions


@@ -0,0 +1,184 @@
.. _telemetry_scale:
===========================================================
Telemetry Services resource consumption/scalability testing
===========================================================
:status: **draft**
:version: 1.0
:Abstract:
This document describes how scalability and performance testing is conducted
on an OpenStack cloud, with a focus on the OpenStack Telemetry Services. The
current test cases cover the Telemetry Services' collection and processing of
metrics; further test cases can be added to scale and performance test other
aspects of the OpenStack Telemetry Services.
Test Plan
=========
Characterize the resource consumption and application performance of the
OpenStack Telemetry Services on an OpenStack cloud as the workload increases
over time. As the workload increases, measure System Performance Metrics (CPU,
Memory, Disk, IO) and Application Performance Metrics (responsiveness, health,
utilization, functionality) until the desired load is reached or a
system/application failure occurs.
Test Environment
----------------
Preparation
^^^^^^^^^^^
Ideally, each test run is performed on a newly deployed cloud for
repeatability. The cloud deployment should be documented for each test
case/run, as the deployment sets many configuration values that impact
performance.
Environment description
^^^^^^^^^^^^^^^^^^^^^^^
The environment description includes the hardware specifications, software
versions, tunings, and configuration of the OpenStack cloud under test.
Hardware
~~~~~~~~
List details of hardware for each node type here.
Deployment node (Undercloud)
+-----------+------------------------------------------------------------+
| Parameter | Value |
+-----------+------------------------------------------------------------+
| model | ex. Dell PowerEdge r610 |
+-----------+------------------------------------------------------------+
| CPU | ex. 2xIntel(R) Xeon(R) X5650 @ 2.67GHz (12Cores/24Threads) |
+-----------+------------------------------------------------------------+
| Memory | ex. 64GiB (@1333MHz) |
+-----------+------------------------------------------------------------+
| Disk | ex. 4 x 146GiB 15K SAS Drives in RAID 0 |
+-----------+------------------------------------------------------------+
| Network | ex. 2x1Gb/s Broadcom, 2x10Gb/s Intel X520 |
+-----------+------------------------------------------------------------+
Controller
+-----------+------------------------------------------------------------+
| Parameter | Value |
+-----------+------------------------------------------------------------+
| model | ex. Dell PowerEdge r610 |
+-----------+------------------------------------------------------------+
| CPU | ex. 2xIntel(R) Xeon(R) X5650 @ 2.67GHz (12Cores/24Threads) |
+-----------+------------------------------------------------------------+
| Memory | ex. 64GiB (@1333MHz) |
+-----------+------------------------------------------------------------+
| Disk | ex. 4 x 146GiB 15K SAS Drives in RAID 0 |
+-----------+------------------------------------------------------------+
| Network | ex. 2x1Gb/s Broadcom, 2x10Gb/s Intel X520 |
+-----------+------------------------------------------------------------+
Compute
+-----------+------------------------------------------------------------+
| Parameter | Value |
+-----------+------------------------------------------------------------+
| model | ex. Dell PowerEdge r610 |
+-----------+------------------------------------------------------------+
| CPU | ex. 2xIntel(R) Xeon(R) X5650 @ 2.67GHz (12Cores/24Threads) |
+-----------+------------------------------------------------------------+
| Memory | ex. 64GiB (@1333MHz) |
+-----------+------------------------------------------------------------+
| Disk | ex. 4 x 146GiB 15K SAS Drives in RAID 0 |
+-----------+------------------------------------------------------------+
| Network | ex. 2x1Gb/s Broadcom, 2x10Gb/s Intel X520 |
+-----------+------------------------------------------------------------+
Additional hardware for testing/monitoring/results:
- Performance Monitoring Host (Carbon/Graphite/Grafana)
- Performance Results Host (ElasticSearch/Kibana)
Software
~~~~~~~~
Record the versions of the Linux kernel, base operating system (ex. CentOS
7.3), OpenStack release (ex. Newton), OpenStack packages, testing
harness/framework, and any other pertinent software.
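A minimal sketch of a script that captures this metadata is shown below. It
assumes an RPM-based host (adjust the package query otherwise) and is a
starting point only, not a prescribed tool.

.. code-block:: python

    import json
    import platform
    import subprocess

    def gather_versions():
        """Collect kernel, OS, and OpenStack-related package versions."""
        info = {
            "kernel": platform.release(),
            "os": platform.platform(),
        }
        # Assumes an RPM-based distribution with the usual package names.
        packages = subprocess.check_output(
            "rpm -qa | egrep 'gnocchi|ceilometer|nova|swift|ceph' | sort",
            shell=True)
        info["openstack_packages"] = packages.decode().splitlines()
        return info

    if __name__ == "__main__":
        print(json.dumps(gather_versions(), indent=2))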
Tuning/Configuration
~~~~~~~~~~~~~~~~~~~~
Record the deployed configuration, including, but not limited to, the
following (a sketch for capturing some of these values follows the list):
- Number of Gnocchi metricd processes
- Number of API processes/threads
- Whether the API is deployed in httpd (if so, include the httpd configuration options)
- Storage backend (file, swift, ceph)
- Ceilometer polling interval
- Worker/process counts of other services (Nova, Neutron, ...)
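A minimal sketch of reading a few of these values with Python's configparser
is shown below; the file path and option names (metricd workers, API workers,
storage driver) are assumptions based on common gnocchi.conf layouts and
should be verified against the deployed configuration.

.. code-block:: python

    import configparser

    def read_option(path, section, option, default="<default>"):
        """Return a single option from an INI-style configuration file."""
        parser = configparser.RawConfigParser()
        parser.read(path)
        if parser.has_option(section, option):
            return parser.get(section, option)
        return default

    if __name__ == "__main__":
        gnocchi_conf = "/etc/gnocchi/gnocchi.conf"  # assumed location
        print("metricd workers:", read_option(gnocchi_conf, "metricd", "workers"))
        print("api workers    :", read_option(gnocchi_conf, "api", "workers"))
        print("storage driver :", read_option(gnocchi_conf, "storage", "driver"))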
System Performance Monitoring
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Record system performance metrics in a separate metrics
collection/storage/analysis system. A suggested setup is a separate machine
running Carbon, Graphite, and Grafana, with dashboards for monitoring system
resource utilization. To push metrics into the TSDB, collectd should be
installed on all monitored machines (deployment node, controllers, and
computes).
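Independently of collectd, the Carbon endpoint can be sanity checked by
pushing a metric over its plaintext protocol. The sketch below assumes the
monitoring host is reachable as ``graphite.example.com`` on the default
plaintext port 2003; both are placeholders.

.. code-block:: python

    import socket
    import time

    CARBON_HOST = "graphite.example.com"  # placeholder monitoring host
    CARBON_PORT = 2003                    # default Carbon plaintext port

    def send_metric(path, value, timestamp=None):
        """Send one datapoint using Carbon's "path value timestamp" format."""
        timestamp = int(timestamp or time.time())
        line = "%s %s %d\n" % (path, value, timestamp)
        sock = socket.create_connection((CARBON_HOST, CARBON_PORT))
        try:
            sock.sendall(line.encode("ascii"))
        finally:
            sock.close()

    if __name__ == "__main__":
        send_metric("telemetry_test.sanity.value", 1)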
Test Diagram
~~~~~~~~~~~~
Attach a test diagram showing the test topology.
Test Case 1
-----------
Description
^^^^^^^^^^^
Boot 50 persisting instances every 1200 seconds until 1000 instances are
booted and running in the OpenStack cloud.
Parameters
#. Number of instances to boot per period (ex. 50)
#. Time to wait between boot periods (ex. 1200 seconds)
#. Maximum number of instances desired for the test (ex. 1000)
**Depending upon the available hardware, the above parameters will need to be adjusted.**
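A minimal sketch of the boot loop using openstacksdk is shown below; the cloud
name, image, and flavor are placeholders, and it assumes a matching entry in
``clouds.yaml`` on the machine driving the test.

.. code-block:: python

    import time

    import openstack  # openstacksdk

    INSTANCES_PER_PERIOD = 50  # instances to boot per period
    PERIOD_SECONDS = 1200      # time to wait between boot periods
    MAX_INSTANCES = 1000       # maximum number of instances for the test

    # "perf-cloud", "cirros", and "m1.tiny" are placeholder names.
    conn = openstack.connect(cloud="perf-cloud")

    booted = 0
    while booted < MAX_INSTANCES:
        for index in range(INSTANCES_PER_PERIOD):
            conn.create_server(
                name="telemetry-scale-%04d" % (booted + index),
                image="cirros",
                flavor="m1.tiny",
                wait=True,
            )
        booted += INSTANCES_PER_PERIOD
        if booted < MAX_INSTANCES:
            time.sleep(PERIOD_SECONDS)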
Stopping/Failure Conditions
- Maximum number of instances reached
- Failure to boot instances
- Failure of the Telemetry Services to consume metrics
- Other service failures/errors
- System out of resources (ex. CPU 100% utilized)
Setup
^^^^^^^^
#. Deploy OpenStack Cloud
#. Install testing and monitoring tooling
#. Gather metadata on Cloud
#. Run test
Analysis
^^^^^^^^
Review the system performance metric graphs for the test duration to watch for
stopping/failure conditions. Review the testing harness output for test
failure conditions.
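The same data can also be pulled programmatically from Graphite's render API
for offline analysis. The sketch below assumes the placeholder monitoring host
from the environment description and collectd's default metric naming; adjust
the target to the actual metric paths.

.. code-block:: python

    import requests  # third-party HTTP client

    GRAPHITE_URL = "http://graphite.example.com/render"  # placeholder host

    params = {
        # Assumes collectd's write_graphite naming; adjust as needed.
        "target": "collectd.overcloud-controller-0.cpu-0.cpu-user",
        "from": "-2hours",
        "format": "json",
    }
    response = requests.get(GRAPHITE_URL, params=params)
    response.raise_for_status()

    for series in response.json():
        values = [v for v, _ in series["datapoints"] if v is not None]
        if values:
            print(series["target"], "max:", max(values),
                  "avg:", sum(values) / len(values))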
List of performance metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Performance
- CPU utilization
- Memory utilization
- Disk IO utilization
- Per-Process CPU/Memory/IO (Gnocchi, Ceilometer, Nova, Swift, Ceph ...)
- Time required to boot instances
- Responsiveness of the Gnocchi/Ceilometer APIs and other services
Failure Conditions
- Errors in log files (Gnocchi, Ceilometer, Nova, Swift, ...)
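A simple scan of the service logs for errors and tracebacks can be scripted as
well; the log locations below are assumptions and should be adjusted to the
actual deployment.

.. code-block:: python

    import glob
    import re

    # Assumed log locations; adjust for the deployment under test.
    LOG_GLOBS = [
        "/var/log/gnocchi/*.log",
        "/var/log/ceilometer/*.log",
        "/var/log/nova/*.log",
        "/var/log/swift/*.log",
    ]
    PATTERN = re.compile(r"\b(ERROR|CRITICAL|Traceback)\b")

    for log_glob in LOG_GLOBS:
        for path in glob.glob(log_glob):
            with open(path, errors="replace") as handle:
                hits = sum(1 for line in handle if PATTERN.search(line))
            if hits:
                print("%s: %d suspicious lines" % (path, hits))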