From ae376f2901d32f67013b303b782d8ffbc1cb58e1 Mon Sep 17 00:00:00 2001
From: ahothan
Date: Sun, 15 May 2016 23:24:37 -0700
Subject: [PATCH] Add faq

Change-Id: Ifbb9ff1ab128d2163ae8cd7985fa94054823429f
---
 doc/source/faq.rst               | 195 +++++++++++++++++++++++++++++++
 doc/source/index.rst             |   1 +
 doc/source/quickstart_docker.rst |   2 +-
 doc/source/quickstart_git.rst    |   3 +-
 doc/source/quickstart_pip.rst    |   4 +-
 kloudbuster/kb_runner_storage.py |   2 +-
 6 files changed, 200 insertions(+), 7 deletions(-)
 create mode 100644 doc/source/faq.rst

diff --git a/doc/source/faq.rst b/doc/source/faq.rst
new file mode 100644
index 0000000..a35b237
--- /dev/null
+++ b/doc/source/faq.rst
@@ -0,0 +1,195 @@
+==========================
+Frequently Asked Questions
+==========================
+
+
+KloudBuster in a nutshell?
+--------------------------
+
+A self-contained, fully automated, open-source OpenStack VM-level tenant
+network and storage scale measurement tool.
+
+Why is a tool like KloudBuster useful?
+--------------------------------------
+
+Before KloudBuster, it was very difficult and time-consuming to load an
+OpenStack cloud at a scale that reflects large deployments, with traffic on
+the data plane or the storage plane, and to measure its impact where it
+counts: at the VM application level. As a result, very few people would take
+the pain of conducting such an experiment, except perhaps on very small
+setups (such as a single rack with 2 compute nodes).
Just to give an idea: replicating manually what a 15-minute KloudBuster run
+can do on a medium-size cloud (2 racks, less than 100 nodes, 40GE links)
+would require at the very least several days of hard-to-repeat work, assuming
+the person running the test knows how to do all the small tasks needed to
+obtain similar results:
+
+- create a VM image with the appropriate test packages (it is not trivial to
+  find which packages to use)
+- create all the tenants/users/routers/resources/VMs
+- place the VMs in a way that makes sense for scale testing (such as
+  rack-based placement)
+- provision the test VMs with the right configuration
+- orchestrate the test VMs to start at the same time (this cannot be done
+  manually at that scale: hundreds of client VMs must be coordinated)
+- repatriate all the results from all the client VMs when the test is
+  finished (which can itself represent a very large amount of data)
+- consolidate all the results in a way that makes sense system-wide and is
+  easy to interpret
+- present the results in a digestible format
+
+And this is just for the simplest of KloudBuster runs. Dual-cloud scale
+testing (where one testing cloud loads the cloud under test to scale up the
+North-South traffic) requires juggling two OpenStack clouds at the same time;
+KloudBuster handles that mode by design, as easily as the single-cloud scale
+test. The latest features add support for periodic reporting (e.g. latency
+stats every 5 seconds), a server mode with RESTful API control by external
+orchestrators (required for any form of automated HA testing), and scale
+progressions (e.g. collect latency numbers from 10,000 to 200,000 clients in
+increments of 10,000 clients; at that scale, recreating every resource set
+from scratch at each iteration would take far too much time). All of these
+advanced features are clearly impossible to do manually or semi-manually.
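The orchestration fan-out in particular is what makes manual testing intractable: a single "start" signal must reach hundreds of client VMs at once. The pattern KloudBuster relies on (one publish on a Redis channel fanning out to every subscribed VM) can be sketched with a stdlib-only stand-in; `MiniBus` below is an illustrative toy, not KloudBuster code or the Redis API:

```python
import queue
import threading

class MiniBus:
    """Toy stand-in for the Redis publish/subscribe channel KloudBuster uses."""
    def __init__(self):
        self.subscribers = []
        self.lock = threading.Lock()

    def subscribe(self):
        # Each subscriber gets its own queue, like a Redis subscription.
        q = queue.Queue()
        with self.lock:
            self.subscribers.append(q)
        return q

    def publish(self, message):
        # One publish fans out to every subscriber -- no per-VM SSH session.
        with self.lock:
            for q in self.subscribers:
                q.put(message)

def client_vm(sub_queue, results, idx):
    # Each test VM blocks until the orchestrator publishes the start command.
    if sub_queue.get() == "START":
        results[idx] = "running"

bus = MiniBus()
results = {}
threads = []
for i in range(100):
    t = threading.Thread(target=client_vm, args=(bus.subscribe(), results, i))
    t.start()
    threads.append(t)

bus.publish("START")   # all 100 "VMs" start from a single message
for t in threads:
    t.join()
print(len(results))    # 100
```

This is why a pub/sub bus keeps up where per-endpoint SSH sessions do not: the orchestrator does constant work per command regardless of how many endpoints are listening.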
+
+What do you mean by comprehensive end-to-end scale testing?
+-----------------------------------------------------------
+
+You could start with a completely idle OpenStack cloud with zero resources
+and zero data plane traffic (as if you had just deployed OpenStack on it).
+Within minutes, you have a cloud that is fully loaded with tenants, users,
+routers, networks and VMs, with all network pipes filled to capacity (if the
+network architecture and configuration are tuned properly) by a massive
+amount of live HTTP traffic or storage traffic. After the test, you revert to
+the original idle state, and you have a nice HTML report with the data plane
+characterization of your cloud, or charts representing the scalability of
+your storage back end as seen from VM applications.
+
+
+How scalable is KloudBuster itself?
+-----------------------------------
+
+All the runs so far have shown bottlenecks residing in the cloud under test.
+KloudBuster is designed to scale to an extremely large number of VM endpoints
+thanks to the use of an efficient distributed key-value store and its
+associated publish/subscribe service (Redis) for the scale orchestration.
+Redis has been shown to scale to thousands of subscribers without any
+problem, while more traditional scaling tools that use SSH to control the
+endpoints have trouble keeping up past a few hundred sessions.
+
+
+General Usage Questions
+-----------------------
+
+Is there a way to prevent KloudBuster from deleting all the resources?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In cfg.scale.yaml, there is a “cleanup_resources” property which is True by
+default. Set it to False and KloudBuster will not clean up the resources
+after the run.
+
+Is there a way to clean up all lingering resources created by KloudBuster?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+All resources created by KloudBuster have a "KB\_" prefix in their name.
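For example, the selection rule amounts to a simple name-prefix filter; a minimal illustrative sketch (the resource names below are made up, not real KloudBuster output):

```python
# Illustrative only: cleanup selects anything whose name starts with "KB_".
# The resource names here are hypothetical examples.
resources = ["KB_tenant_0", "KB_router_1", "prod-db-vm", "KB_net_3"]
kb_leftovers = [name for name in resources if name.startswith("KB_")]
print(kb_leftovers)  # ['KB_tenant_0', 'KB_router_1', 'KB_net_3']
```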
The “force_cleanup” script will clean up all resources that have a name
+starting with "KB\_".
+
+How are KloudBuster VM images managed?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+KloudBuster VM images are built using OpenStack diskimage-builder (or DIB)
+and have a version (a single number). The default name of an image is
+"kloudbuster_v<version>" (e.g. "kloudbuster_v6"). Normally each KloudBuster
+application is associated with a recommended KloudBuster VM image version.
+
+This is indicated in the output of --version::
+
+    $ python kloudbuster.py --version
+    6.0.3, VM image: kloudbuster_v6
+
+In this example, the KloudBuster application is version 6.0.3 and the
+matching VM image is v6. By default KloudBuster will use the Glance image
+named "kloudbuster_v6" (this can be overridden in the configuration file).
+
+Note that if the user specifies a different VM image version in the
+configuration file, a warning will be issued to indicate that there might be
+some incompatibilities (but the run will proceed):
+
+::
+
+    2015-08-26 10:47:10.915 38100 INFO kb_runner [-] Waiting for agents on VMs to come up...
+    2015-08-26 10:47:15.918 38100 INFO kb_runner [-] 0 Succeed, 0 Failed, 1 Pending... Retry #0
+    2015-08-26 10:47:20.920 38100 INFO kb_runner [-] 1 Succeed, 0 Failed, 0 Pending... Retry #1
+    2015-08-26 10:47:20.961 38100 WARNING kb_runner [-] The VM image you are running (2.0) is not the expected version (6) this may cause some incompatibilities
+
+It is recommended to always use the matching VM image version to avoid any
+potential incompatibility.
+
+HTTP Data Plane Testing
+-----------------------
+
+How many TCP connections exactly are created, how many requests are generated and how long do the connections stay?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+KloudBuster will create the exact number of HTTP connections configured and
+will keep them active and open until the end of the scale test. There is a
+1:1 mapping between an HTTP client and a TCP connection (the same TCP
+connection is reused for all requests sent by the same client). For example,
+with 100 HTTP servers, 1,000 HTTP connections per server and 500 HTTP
+requests/sec per server, the total number of simultaneous HTTP connections
+will be 100,000 at any time during the scale test, and the total HTTP
+request rate will be 50,000 requests/sec.
+
+Why pick wrk2 to generate HTTP traffic?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This tool was chosen over many other open-source tools because it tested as
+the most scalable (highest number of connections and requests/sec per CPU)
+and provided very accurate HTTP throughput and latency results (which cannot
+be said of most other tools - see the FAQ entry on how latency is
+calculated).
+
+Storage Scale Testing
+---------------------
+
+What kinds of VM storage are supported?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+KloudBuster can measure the performance of ephemeral disks and
+Cinder-attached volumes at scale.
+
+
+Common Pitfalls and Limitations
+-------------------------------
+
+AuthorizationFailure and SSL Exception when running KloudBuster
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+::
+
+    2016-05-12 17:20:30 CRITICAL AuthorizationFailure: Authorization Failed: SSL exception connecting to https://172.29.86.5:5000/v2.0/tokens: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:765)
+
+This exception most likely indicates that the OpenStack API uses SSL and that
+the CA certificate file is missing in the openrc file used.
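For reference, an openrc for an SSL-enabled cloud would typically carry both settings; every value in this fragment is a placeholder, not a real deployment value:

```shell
# Hypothetical openrc fragment for an SSL-enabled cloud (placeholder values)
export OS_AUTH_URL=https://172.29.86.5:5000/v2.0
export OS_USERNAME=admin
export OS_PASSWORD=secret
export OS_TENANT_NAME=admin
# CA certificate file obtained from the cloud admin
export OS_CACERT=/etc/kloudbuster/my-cloud-ca.crt
```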
Check that the
+openrc file used:
+
+- has OS_AUTH_URL using https
+- has OS_CACERT either missing or pointing to an invalid or missing
+  certificate file path
+
+To fix this, have the OS_CACERT variable in your openrc file point to a
+valid certificate file (you will need to get this certificate file from the
+cloud admin).
+
+
+Creating the image with diskimage-builder fails with an "import yaml" error
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This error means that the Python PyYAML package is not installed or that
+your /etc/sudoers file is configured in a way that causes a sudo script in
+diskimage-builder to fail. To check if PyYAML is installed, run
+``pip list | grep PyYAML``. If PyYAML is installed, comment out this line in
+/etc/sudoers (use "sudo visudo" to modify that file):
+
+.. code-block:: bash
+
+    #Defaults secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
+
+
diff --git a/doc/source/index.rst b/doc/source/index.rst
index d2106c5..4f1dbce 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -19,6 +19,7 @@ Contents:
    std_scale_profile
    adv_features
    cleanup
+   faq
    development
    contributing

diff --git a/doc/source/quickstart_docker.rst b/doc/source/quickstart_docker.rst
index c0e091f..ac64e8d 100644
--- a/doc/source/quickstart_docker.rst
+++ b/doc/source/quickstart_docker.rst
@@ -37,7 +37,7 @@ file is saved under the local directory with the name "admin-openrc.sh"
 ----------------------------------------------------------
 
 If your OpenStack cloud has full access to the Internet, you can skip this step
-as KloudBuster will instruct Glance to download the KloudBuster VM inage
+as KloudBuster will instruct Glance to download the KloudBuster VM image
 directly from the OpenStack (skip to next step).
Otherwise, :ref:`download the latest kloudbuster image ` from

diff --git a/doc/source/quickstart_git.rst b/doc/source/quickstart_git.rst
index 8514de7..e58b3dd 100644
--- a/doc/source/quickstart_git.rst
+++ b/doc/source/quickstart_git.rst
@@ -61,9 +61,8 @@ First, download XCode from App Store, then execute below commands:
     $ pip install -r requirements-dev.txt
 
 If you need to run the KloudBuster Web UI you need to install coreutils
-(you can skip this step if you do not run the KloudBuster Web server):
+(you can skip this step if you do not run the KloudBuster Web server)::
 
-.. code-block:: bash
 
     $ # If you need to run KloudBuster Web UI,
     $ # coreutils needs to be installed using Homebrew.

diff --git a/doc/source/quickstart_pip.rst b/doc/source/quickstart_pip.rst
index f3f3c6f..0994e32 100644
--- a/doc/source/quickstart_pip.rst
+++ b/doc/source/quickstart_pip.rst
@@ -25,9 +25,7 @@ RHEL/Fedora/CentOS based:
 
     $ sudo yum install gcc python-devel python-pip python-virtualenv libyaml-devel
 
-MacOSX:
-
-.. code-block:: bash
+MacOSX::
 
     $ # Download the XCode command line tools from Apple App Store
     $ xcode-select --install

diff --git a/kloudbuster/kb_runner_storage.py b/kloudbuster/kb_runner_storage.py
index b4397e6..f3fbfff 100644
--- a/kloudbuster/kb_runner_storage.py
+++ b/kloudbuster/kb_runner_storage.py
@@ -118,7 +118,7 @@ class KBRunner_Storage(KBRunner):
         vm_count = active_range[1] - active_range[0] + 1\
             if active_range else len(self.full_client_dict)
         for idx, cur_config in enumerate(self.config.storage_tool_configs):
-            LOG.info("Runing test case %d of %d..." % (idx + 1, test_count))
+            LOG.info("Running test case %d of %d..." % (idx + 1, test_count))
             self.report = {'seq': 0, 'report': None}
             self.result = {}
             self.run_storage_test(active_range, dict(cur_config))