An OpenStack fault injection library
Javier Pena a8be2c6e43 Fix for Ansible >= 2.4.0
The Ansible version was capped to < 2.4.0 in [1], since 2.4.0 refactored
the Inventory and VariableManager modules.

This patch aims at adding support for Ansible >= 2.4.0, while keeping
compatibility with earlier versions.

Change-Id: I5722fc8531671e69abe90d64a4bfd988321850b5
Closes-Bug: #1724227
2017-10-18 15:26:41 +02:00

README.rst

OS-Faults

OpenStack fault-injection library

The library performs destructive actions inside an OpenStack cloud and provides an abstraction layer over different types of cloud deployments. The actions are implemented as drivers (e.g. DevStack driver, Fuel driver, Libvirt driver, IPMI driver).

Installation

Regular installation:

pip install os-faults

The library includes an optional libvirt driver. If you plan to use it, install os-faults with the extra dependencies:

pip install os-faults[libvirt]

Configuration

The cloud deployment configuration schema is an extension to the cloud config used by the os-client-config library:

cloud_config = {
    'cloud_management': {
        'driver': 'devstack',
        'args': {
            'address': 'devstack.local',
            'username': 'root',
        }
    },
    'power_managements': [
        {
            'driver': 'libvirt',
            'args': {
                'connection_uri': 'qemu+unix:///system',
            }
        },
        {
            'driver': 'ipmi',
            'args': {
                'mac_to_bmc': {
                    'aa:bb:cc:dd:ee:01': {
                        'address': '55.55.55.55',
                        'username': 'foo',
                        'password': 'bar',
                    }
                }
            }
        }
    ]
}

Establish a connection to the cloud and verify it:

import os_faults

destructor = os_faults.connect(cloud_config)
destructor.verify()

The library can also read its configuration from a file named os-faults.json, os-faults.yaml, or os-faults.yml. The path to the file can be given in the OS_FAULTS_CONFIG environment variable; otherwise the file is read from one of the default locations:
  • current directory
  • ~/.config/os-faults
  • /etc/openstack
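
The lookup described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the names `candidate_paths` and `find_config` are hypothetical, and the search order (environment variable first, then the default locations in the order listed, trying json before yaml before yml) is an assumption.

```python
import os

# File names the README lists: os-faults.{json,yaml,yml}
CONFIG_NAMES = ("os-faults.json", "os-faults.yaml", "os-faults.yml")


def candidate_paths(environ=None, cwd=None):
    """Yield possible config paths; the ordering is an assumption."""
    environ = os.environ if environ is None else environ
    cwd = os.getcwd() if cwd is None else cwd

    # An explicitly configured path is tried first.
    explicit = environ.get("OS_FAULTS_CONFIG")
    if explicit:
        yield explicit

    # Then the default locations named in the README.
    directories = (
        cwd,
        os.path.expanduser("~/.config/os-faults"),
        "/etc/openstack",
    )
    for directory in directories:
        for name in CONFIG_NAMES:
            yield os.path.join(directory, name)


def find_config(environ=None, cwd=None):
    """Return the first candidate path that exists, or None."""
    for path in candidate_paths(environ, cwd):
        if os.path.isfile(path):
            return path
    return None
```

Passing `environ` and `cwd` explicitly keeps the sketch easy to exercise without touching the real environment.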

Perform a destructive action:

destructor.get_service(name='keystone').restart()
The library operates on two types of objects:
  • service - software that runs in the cloud, e.g. nova-api
  • nodes - machines that host the cloud, e.g. a hardware server with a hostname

Simplified API

The simplified API injects faults using commands written in a human-friendly form.

A service-oriented command performs the specified action against a service on all nodes, on one random node, or on the node specified by FQDN:

<action> <service> service [on (random|one|single|<fqdn> node[s])]
Examples:
  • Restart Keystone service - restarts Keystone service on all nodes.
  • Kill nova-api service on one node - kills Nova API on one randomly picked node.
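
The service-oriented grammar above can be illustrated with a small regular-expression parser. This is only a sketch for reading the grammar, not the parser that os-faults itself uses; the function `parse_service_command` and the keys of the returned dictionary are hypothetical.

```python
import re

# <action> <service> service [on (random|one|single|<fqdn> node[s])]
SERVICE_COMMAND = re.compile(
    r"^(?P<action>\w+)\s+(?P<service>\S+)\s+service"
    r"(?:\s+on\s+(?P<target>\S+)\s+nodes?)?$",
    re.IGNORECASE,
)

# Keywords that all mean "one randomly picked node".
RANDOM_KEYWORDS = ("random", "one", "single")


def parse_service_command(command):
    """Split a service-oriented command into action, service and scope."""
    match = SERVICE_COMMAND.match(command.strip())
    if match is None:
        raise ValueError("not a service-oriented command: %r" % command)

    target = match.group("target")
    if target is None:
        scope = "all"        # no "on ..." clause: every node
    elif target.lower() in RANDOM_KEYWORDS:
        scope = "random"     # one randomly picked node
    else:
        scope = target       # anything else is treated as an FQDN

    return {
        "action": match.group("action").lower(),
        "service": match.group("service").lower(),
        "scope": scope,
    }
```

With this sketch, `parse_service_command("kill nova-api service on one node")` yields the action `kill`, the service `nova-api` and the scope `random`.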

A node-oriented command performs the specified action on the node specified by FQDN or on the set of nodes that run a service:

<action> [random|one|single|<fqdn>] node[s] [with <service> service]
Examples:
  • Reboot one node with mysql - reboots one random node with MySQL.
  • Reset node-2.domain.tld node - resets node node-2.domain.tld.

A network-oriented command is a subset of the node-oriented commands and performs a network management operation on the selected nodes:

<action> <network> network on [random|one|single|<fqdn>] node[s]
    [with <service> service]
Examples:
  • Disconnect management network on nodes with rabbitmq service - shuts down management network interface on all nodes where rabbitmq runs.
  • Connect storage network on node-1.domain.tld node - enables storage network interface on node-1.domain.tld.
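
For comparison, the two network examples above roughly correspond to the following calls in the extended API described later in this README. The wrapper functions are hypothetical helpers written for illustration, and the `network_name` keyword of `connect` is assumed by symmetry with `disconnect`.

```python
def disconnect_network_for_service(destructor, service_name, network_name):
    """Rough equivalent of:
    'Disconnect <network> network on nodes with <service> service'.
    """
    service = destructor.get_service(name=service_name)
    nodes = service.get_nodes()  # all nodes where the service runs
    nodes.disconnect(network_name=network_name)
    return nodes


def connect_network_on_node(destructor, fqdn, network_name):
    """Rough equivalent of:
    'Connect <network> network on <fqdn> node'.
    """
    nodes = destructor.get_nodes(fqdns=[fqdn])
    # Keyword name assumed to mirror disconnect().
    nodes.connect(network_name=network_name)
    return nodes
```

Here `destructor` is the object returned by `os_faults.connect(cloud_config)`.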

Extended API

1. Service actions

Get a service and restart it:

destructor = os_faults.connect(cloud_config)
service = destructor.get_service(name='glance-api')
service.restart()
Available actions:
  • start - start the service
  • terminate - terminate the service gracefully
  • restart - restart the service
  • kill - terminate the service abruptly
  • unplug - unplug the service from the network
  • plug - plug the service into the network
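
When the action name comes from user input or a test matrix, it can be checked against the list above before it is dispatched. The helper below is a hypothetical sketch; it assumes each listed action is exposed as a method of the service object that can be called without arguments, as `restart()` is in the example above.

```python
# Action names taken from the list above.
SERVICE_ACTIONS = ("start", "terminate", "restart", "kill", "unplug", "plug")


def apply_service_action(service, action):
    """Call the named action on a service object, rejecting unknown names."""
    name = action.lower()
    if name not in SERVICE_ACTIONS:
        raise ValueError("unknown service action: %r" % action)
    # Look up the method by name, e.g. service.kill, and call it.
    return getattr(service, name)()
```

For example, `apply_service_action(destructor.get_service(name='glance-api'), 'kill')` would call `kill()` on the service under these assumptions.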

2. Node actions

Get all nodes in the cloud and reboot them:

nodes = destructor.get_nodes()
nodes.reboot()
Available actions:
  • reboot - reboot all nodes gracefully
  • poweroff - power off all nodes abruptly
  • reset - reset (cold restart) all nodes
  • oom - fill the RAM of all nodes
  • disconnect - disable network with the specified name on all nodes
  • connect - enable network with the specified name on all nodes
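
Because disconnect and connect are inverse actions, a test can wrap them in a context manager so that the network is re-enabled even if the test body fails. This is a sketch of one possible pattern, not part of os-faults; `network_outage` is a hypothetical name, and the `network_name` keyword of `connect` is assumed by symmetry with `disconnect`.

```python
from contextlib import contextmanager


@contextmanager
def network_outage(nodes, network_name):
    """Disable a network on the given nodes for the duration of a block."""
    nodes.disconnect(network_name=network_name)
    try:
        yield nodes
    finally:
        # Always restore connectivity, even if the block raised.
        nodes.connect(network_name=network_name)
```

A test could then run its checks inside `with network_outage(destructor.get_nodes(), 'management'):` and rely on the network being restored afterwards.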

3. Operate with nodes

Get all nodes where a service runs, pick one of them, and reset it:

nodes = service.get_nodes()
one = nodes.pick()
one.reset()

Get nodes where l3-agent runs and disable the management network on them:

# `neutron` (a Neutron client) and `router_id` are assumed to be defined elsewhere
fqdns = neutron.l3_agent_list_hosting_router(router_id)
nodes = destructor.get_nodes(fqdns=fqdns)
nodes.disconnect(network_name='management')

4. Operate with services

Restart a service on a single node:

service = destructor.get_service(name='keystone')
nodes = service.get_nodes().pick()
service.restart(nodes)