From a789aeed16d32285a2195abe3de57f3ef4e0ab0f Mon Sep 17 00:00:00 2001
From: Kostiantyn Danylov
Date: Wed, 5 Apr 2017 14:46:37 +0300
Subject: [PATCH] Ceph RBD test plan

Change-Id: I3611a385fd734ba699670f4a7867161d07a887d2
---
 doc/source/test_plans/ceph_rbd/index.rst | 216 +++++++++++++++++++++++
 1 file changed, 216 insertions(+)
 create mode 100644 doc/source/test_plans/ceph_rbd/index.rst

diff --git a/doc/source/test_plans/ceph_rbd/index.rst b/doc/source/test_plans/ceph_rbd/index.rst
new file mode 100644
index 0000000..c16c11e
--- /dev/null
+++ b/doc/source/test_plans/ceph_rbd/index.rst
@@ -0,0 +1,216 @@
.. _ceph_rbd_test_plan:

============================
Ceph RBD performance testing
============================

:status: **ready**
:version: 1.0

:Abstract:

    This test plan provides a set of tests for measuring the RBD performance
    of a given Ceph cluster using the Wally test tool.

Test Plan
=========

The purpose of this document is to describe the environment and the
performance test plan for benchmarking Ceph block storage (RBD).

The main goals are:

- Define the test approach, methodology and benchmarking toolset for testing
  Ceph block storage performance
- Benchmark Ceph performance for the defined scenarios

Preparation
-----------

This test plan is executed against an existing Ceph cluster. A single test VM
is created on every compute node. Before the IO load is started, the storage
devices are filled with pseudo-random data.

Execution Strategy
------------------

All tests are executed sequentially on all dedicated virtual machines. The
number of IO load threads per VM depends on the test phase. Every test starts
with a 30-second warm-up, which is not included in the test results, followed
by a 180-second load phase. At any given time a single VM per compute node
generates IO load with the given number of threads.

The block size for small-block read/write operations is chosen to be 4 KB:
using smaller blocks is not reasonable because a) most modern HDDs have a
physical sector size of 4 KB and b) the default Linux virtual memory page size
is 4 KB as well. Larger block sizes provide no additional information, since
the maximal number of I/O operations per second stays constant due to HDD
mechanics.

The block size for large-block sequential read/write operations has no
particular constraints other than being larger than the Ceph object size
(4 MB by default), so a value of 16 MB was chosen.

Test tool
---------

A new tool (`Wally`_) was developed for benchmarking Ceph performance. It uses
Flexible I/O tester (fio) as the load generator.
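
A minimal sketch of how a single load scenario from this plan maps onto a
plain fio invocation is shown below. It is for illustration only: Wally
generates and runs the fio jobs itself, and the target device ``/dev/vdb``,
the thread count and the exact JSON field layout (which differs between fio
versions) are assumptions, not part of the plan.

.. code-block:: python

    """Illustrative only: one 4 KB random-read scenario expressed as an fio run."""
    import json
    import subprocess


    def run_random_read(target="/dev/vdb", threads=10):
        """Run a 4 KB random-read load and return the parsed fio JSON output."""
        cmd = [
            "fio",
            "--name=rand_read_4k",
            "--filename={}".format(target),   # test device or file (placeholder)
            "--rw=randread",                  # random read scenario
            "--bs=4k",                        # small block size discussed above
            "--ioengine=libaio",
            "--direct=1",
            "--numjobs={}".format(threads),   # number of IO load threads
            "--time_based",
            "--ramp_time=30",                 # 30 s warm-up, excluded from results
            "--runtime=180",                  # 180 s measured load phase
            "--group_reporting",
            "--output-format=json",
        ]
        out = subprocess.run(cmd, capture_output=True, check=True, text=True)
        return json.loads(out.stdout)


    if __name__ == "__main__":
        result = run_random_read()
        print("read IOPS:", result["jobs"][0]["read"]["iops"])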

Test types
----------

The following load scenarios are selected for Ceph benchmarking:

- Average random-read IOPS for small (4 KB) blocks as a function of thread
  count
- Average random-write IOPS for small (4 KB) blocks, both in direct and in
  synchronous mode, as a function of thread count
- Average sequential-read throughput for large (16 MB) blocks as a function of
  thread count
- Average sequential-write throughput for large (16 MB) blocks as a function
  of thread count
- Maximal synchronous random-write IOPS for small (4 KB) blocks with latency
  not exceeding a predefined value
- Maximal random-read IOPS for small (4 KB) blocks with latency not exceeding
  a predefined value
- Maximal number of threads (virtual machines) that can be served by the
  storage within a given SLA

Every load scenario is executed with several different numbers of simultaneous
threads. The actual parameter values for each scenario are defined in the
"Load Description" section.

Operations with a small block size show the maximum achievable rate of IO
operations under sustained load, moving the bottleneck to the disks, while
sequential operations with large block sizes allow estimating system
performance when the network is the bottleneck.

Test Measurements and Metrics
-----------------------------

During every test run, raw metrics are collected at least once per second and
reported after the run. For every metric the report should include the median,
the 95% confidence interval and the standard deviation (a minimal calculation
sketch is given at the end of this section). Charts can be generated for
selected metrics.

The following metrics are collected on each host for all test scenarios:

- CPU usage, per core and total
- RAM utilization
- Network throughput and packets per second on both the replication and the
  public interfaces
- Throughput, IOPS and latency for each participating storage device

The following metrics are additionally collected on the test VMs, depending on
the test type:

- Random read/write tests:

  - Storage IOPS per thread
  - Storage operation latency

- Sequential read/write tests:

  - Storage throughput
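
The summary statistics described in this section (median, standard deviation
and a 95% confidence interval) require nothing beyond the Python standard
library. The sketch below is illustrative only; it uses the usual
normal-approximation interval for the mean, which is an assumption about how
the confidence interval is computed rather than a statement about Wally's
implementation.

.. code-block:: python

    """Illustrative summary of per-second samples collected during a test run."""
    import math
    import statistics


    def summarize(samples):
        """Return median, standard deviation and a 95% CI for the mean."""
        median = statistics.median(samples)
        stddev = statistics.stdev(samples)                     # sample std deviation
        mean = statistics.mean(samples)
        half_width = 1.96 * stddev / math.sqrt(len(samples))   # normal approximation
        return median, stddev, (mean - half_width, mean + half_width)


    if __name__ == "__main__":
        # e.g. per-second IOPS samples collected during the 180 s load phase
        iops = [1200, 1180, 1225, 1190, 1210, 1205]
        print(summarize(iops))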

Expected Results and Pass/Fail Criteria
---------------------------------------

Pass/Fail Criteria
~~~~~~~~~~~~~~~~~~

A test run is considered failed if one or more of its test loads does not
complete without errors.

Expected results
~~~~~~~~~~~~~~~~

No specific expected results are defined, since the purpose of this testing
effort is to create a benchmarking framework and to collect baseline data for
the described environment.

The only requirement is that the pass criteria are fulfilled.

However, a difference of more than 10% between runs of the same test scenario
should be explained. This threshold is based on test execution experience
(typical run-to-run variation is about 5%).

Load Description
----------------

- Random write in synchronous mode with a 4 KB block size; 1, 5, 10, 15, 25
  and 40 threads
- Random write in direct mode with a 4 KB block size; 1 thread
- Random read with a 4 KB block size; 1, 5, 10, 15, 25, 40, 80 and 120 threads
- Number of VMs that can be served within the SLA (4 KB block size, 60 MBps
  and 100 IOPS for read/write, 30 ms latency)
- Sequential read in direct mode with a 16 MB block size; 1, 3 and 10 threads
- Sequential write in direct mode with a 16 MB block size; 1, 3 and 10 threads

All test loads should be run with both the default and the optimal number of
placement groups.

Test Environment
----------------

Environment description
~~~~~~~~~~~~~~~~~~~~~~~

The environment description includes the hardware specification of the
servers, network parameters, the operating system and Ceph deployment
characteristics.

Hardware
^^^^^^^^

This section lists all types of hardware nodes (the table below is an
example).

+-----------+-------+----------------------------------------------------+
| Parameter | Value | Comments                                           |
+===========+=======+====================================================+
| model     |       | e.g. Supermicro X9SRD-F                            |
+-----------+-------+----------------------------------------------------+
| CPU       |       | e.g. 6 x Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz |
+-----------+-------+----------------------------------------------------+
| role      |       | e.g. compute or network                            |
+-----------+-------+----------------------------------------------------+

Network
^^^^^^^

This section lists the interfaces and network parameters. For complicated
cases it may also include a topology diagram and switch parameters (the table
below is an example).

+------------------+-------+-------------------------+
| Parameter        | Value | Comments                |
+==================+=======+=========================+
| network role     |       | e.g. provider or public |
+------------------+-------+-------------------------+
| card model       |       | e.g. Intel              |
+------------------+-------+-------------------------+
| driver           |       | e.g. ixgbe              |
+------------------+-------+-------------------------+
| speed            |       | e.g. 10G or 1G          |
+------------------+-------+-------------------------+
| MTU              |       | e.g. 9000               |
+------------------+-------+-------------------------+
| offloading modes |       | e.g. default            |
+------------------+-------+-------------------------+

Software
^^^^^^^^

This section describes the installed software (the table below is an example).

+-----------------+-------+---------------------------+
| Parameter       | Value | Comments                  |
+=================+=======+===========================+
| OS              |       | e.g. Ubuntu 16.04         |
+-----------------+-------+---------------------------+
| Ceph            |       | e.g. Jewel                |
+-----------------+-------+---------------------------+

Reports
=======

Test plan execution reports:

* :ref:`ceph_rbd_performance_results_50_osd`

.. references:

.. _Wally: https://github.com/Mirantis/disk_perf_test_tool