Add overcloud scale nodes

Typical workflow:
1. Scale node
2. Validate the overcloud
3. Reduce scale by deleting the original node of the scaled type
4. Re-inventory the overcloud
5. Validate the overcloud

Note: Related to change [1], which adds the role to CAT

[1] - https://review.gerrithub.io/#/c/274987/
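In quickstart terms the whole workflow is driven through the scale_nodes playbook; the sample call shipped with the configs below looks like:

    ./deploy.sh -v --playbook scale_nodes --config-file config/scale/scale_compute.yml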

Change-Id: I0da9cce50a7a491f5370521fad3c5409c2c917b1
Harry Rybacki 2016-06-07 16:09:00 -04:00
commit 13d9e4416c (parent 12ed48ede3)
13 changed files with 509 additions and 15 deletions

README.md
Role Name
=========
An Ansible role for scaling and deleting nodes from an overcloud.
Requirements
------------
This role assumes it will be executed against a host on which a Liberty or Mitaka undercloud and overcloud have already been deployed.
**Note:** The ansible-role-tripleo-overcloud-validate role must be accessible.
Role Variables
--------------
**Note:** Make sure to include all environment files and options from your [initial Overcloud creation](https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html/Director_Installation_and_Usage/sect-Scaling_the_Overcloud.html#sect-Adding_Compute_or_Ceph_Storage_Nodes). This includes the same scale parameters for non-Compute nodes.
- artosn_scale_nodes: <true> -- when true, scale the overcloud nodes
- artosn_delete_original_node: <false> -- when true, delete the original node of the type that was scaled
- artosn_working_dir: <'/home/stack'> -- working directory for the role; assumes the stackrc file is present at this location
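A minimal sketch of setting these variables inline when calling the role (values mirror the defaults, with the delete step switched on):

    - name: Scale overcloud nodes and delete the original
      hosts: undercloud
      roles:
        - role: ansible-role-tripleo-overcloud-scale-nodes
          artosn_scale_nodes: true
          artosn_delete_original_node: true
          artosn_working_dir: /home/stack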
Dependencies
------------
1. [ansible-role-tripleo-overcloud-validate](https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-validate)
Example Playbook
----------------
1. Sample playbook to call the role

    - name: Scale overcloud nodes
      hosts: undercloud
      roles:
        - ansible-role-tripleo-overcloud-scale-nodes
2. Sample config file to scale from one compute node to two compute nodes on the overcloud

    control_memory: 6144
    compute_memory: 6144
    undercloud_memory: 8192
    undercloud_vcpu: 2
    overcloud_nodes:
      - name: control_0
        flavor: control
      - name: compute_0
        flavor: compute
      - name: compute_1
        flavor: compute
      - name: compute_2
        flavor: compute
    tempest: false
    pingtest: true
    deploy_timeout: 60

    # General deployment info
    libvirt_args: "--libvirt-type qemu"
    flavor_args: >-
      --control-flavor {{flavor_map.control
      if flavor_map is defined and 'control' in flavor_map else 'oooq_control'}}
      --compute-flavor {{flavor_map.compute
      if flavor_map is defined and 'compute' in flavor_map else 'oooq_compute'}}
      --ceph-storage-flavor {{flavor_map.ceph
      if flavor_map is defined and 'ceph' in flavor_map else 'oooq_ceph'}}
    timeout_args: "--timeout {{ deploy_timeout }}"

    # Pulled this out so we can hand these configs to the openstack overcloud node delete command
    scale_extra_configs: "-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e ~/network-environment.yaml"
    scale_extra_args: "--{{ node_to_scale }}-scale {{ final_scale_value }} --neutron-network-type vxlan --neutron-tunnel-types vxlan {{ scale_extra_configs }} --ntp-server pool.ntp.org"

    # Scale deployment info
    node_to_scale: compute    # Type of node to scale
    initial_scale_value: 1    # Initial number of nodes to deploy
    final_scale_value: 2      # Total number of nodes of this type after the scale

    # Scale deployment arguments
    scale_args: >-
      {{ libvirt_args }}
      {{ flavor_args }}
      {{ timeout_args }}
      {{ scale_extra_args }}
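With this config, and assuming no flavor_map is defined, the scale script rendered from templates/scale-deployment.j2 runs roughly the following command. This is an illustrative rendering, not captured output:

    openstack overcloud deploy --templates \
        --libvirt-type qemu \
        --control-flavor oooq_control \
        --compute-flavor oooq_compute \
        --ceph-storage-flavor oooq_ceph \
        --timeout 60 \
        --compute-scale 2 \
        --neutron-network-type vxlan --neutron-tunnel-types vxlan \
        -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
        -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
        -e ~/network-environment.yaml \
        --ntp-server pool.ntp.org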
License
-------
Apache
Author Information
------------------
RDO-CI Team

configs/scale_ceph.yml
---
control_memory: 6144
compute_memory: 6144
ceph_memory: 8192
undercloud_memory: 8192
undercloud_vcpu: 2
overcloud_nodes:
  - name: control_0
    flavor: control
  - name: compute_0
    flavor: compute
  - name: ceph-storage_0
    flavor: ceph
  - name: ceph-storage_1
    flavor: ceph
tempest: false
pingtest: true
deploy_timeout: 60

# General deployment info
libvirt_args: "--libvirt-type qemu"
flavor_args: >-
  --control-flavor {{flavor_map.control
  if flavor_map is defined and 'control' in flavor_map else 'oooq_control'}}
  --compute-flavor {{flavor_map.compute
  if flavor_map is defined and 'compute' in flavor_map else 'oooq_compute'}}
  --ceph-storage-flavor {{flavor_map.ceph
  if flavor_map is defined and 'ceph' in flavor_map else 'oooq_ceph'}}
timeout_args: "--timeout {{ deploy_timeout }}"
extra_args: "--ceph-storage-scale 1 --neutron-network-type vxlan --neutron-tunnel-types vxlan -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e ~/network-environment.yaml --ntp-server pool.ntp.org"

# Pulled this out so we can hand these configs to the openstack overcloud node delete command
scale_extra_configs: "-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e ~/network-environment.yaml"
scale_extra_args: "--{{ node_to_scale_deployment_arg }}-scale {{ final_scale_value }} --neutron-network-type vxlan --neutron-tunnel-types vxlan {{ scale_extra_configs }} --ntp-server pool.ntp.org"

# Scale deployment info
node_to_scale: ceph                         # Type of node to scale
node_to_scale_deployment_arg: ceph-storage  # Argument name used by the deploy command for this node type
initial_scale_value: 1                      # Initial number of nodes to deploy
final_scale_value: 2                        # Total number of nodes of this type after the scale

# Initial deployment arguments
deploy_args: >-
  {{ libvirt_args }}
  {{ flavor_args }}
  {{ timeout_args }}
  {{ extra_args }}

# Scale deployment arguments
scale_args: >-
  {{ libvirt_args }}
  {{ flavor_args }}
  {{ timeout_args }}
  {{ scale_extra_args }}

# Sample call
# ./deploy.sh -v --playbook scale_nodes --config-file config/scale/scale_ceph.yml

configs/scale_compute.yml
---
control_memory: 6144
compute_memory: 6144
undercloud_memory: 8192
undercloud_vcpu: 2
overcloud_nodes:
  - name: control_0
    flavor: control
  - name: compute_0
    flavor: compute
  - name: compute_1
    flavor: compute
  - name: compute_2
    flavor: compute
tempest: false
pingtest: true
deploy_timeout: 60

# General deployment info
libvirt_args: "--libvirt-type qemu"
flavor_args: >-
  --control-flavor {{flavor_map.control
  if flavor_map is defined and 'control' in flavor_map else 'oooq_control'}}
  --compute-flavor {{flavor_map.compute
  if flavor_map is defined and 'compute' in flavor_map else 'oooq_compute'}}
  --ceph-storage-flavor {{flavor_map.ceph
  if flavor_map is defined and 'ceph' in flavor_map else 'oooq_ceph'}}
timeout_args: "--timeout {{ deploy_timeout }}"
extra_args: "--compute-scale 1 --neutron-network-type vxlan --neutron-tunnel-types vxlan -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e ~/network-environment.yaml --ntp-server pool.ntp.org"

# Pulled this out so we can hand these configs to the openstack overcloud node delete command
scale_extra_configs: "-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e ~/network-environment.yaml"
scale_extra_args: "--{{ node_to_scale }}-scale {{ final_scale_value }} --neutron-network-type vxlan --neutron-tunnel-types vxlan {{ scale_extra_configs }} --ntp-server pool.ntp.org"

# Scale deployment info
node_to_scale: compute    # Type of node to scale
initial_scale_value: 1    # Initial number of nodes to deploy
final_scale_value: 2      # Total number of nodes of this type after the scale

# Initial deployment arguments
deploy_args: >-
  {{ libvirt_args }}
  {{ flavor_args }}
  {{ timeout_args }}
  {{ extra_args }}

# Scale deployment arguments
scale_args: >-
  {{ libvirt_args }}
  {{ flavor_args }}
  {{ timeout_args }}
  {{ scale_extra_args }}

# Sample call
# ./deploy.sh -v --playbook scale_nodes --config-file config/scale/scale_compute.yml

defaults/main.yml
---
# defaults file for ansible-role-tripleo-overcloud-scale-nodes
artosn_scale_nodes: true
artosn_delete_original_node: false
artosn_working_dir: /home/stack
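These defaults sit at the bottom of Ansible's variable precedence, so they can be flipped at run time. A sketch, assuming the playbook is invoked directly with ansible-playbook (quickstart's deploy.sh wraps this):

    # extra-vars override both these defaults and the inline role params
    # set in playbooks/scale_nodes.yml
    ansible-playbook playbooks/scale_nodes.yml -e artosn_delete_original_node=false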

playbooks/scale_nodes.yml
---
################
# Deploy Nodes #
################

# This is the playbook used by the `quickstart.sh` script.

# The [provision.yml](provision.yml.html) playbook is responsible for
# creating an inventory entry for our `virthost` and for creating an
# unprivileged user on that host for use by our virtual environment.
- include: provision.yml
  tags:
    - provision

# The `environment/setup` role performs any tasks that require `root`
# access on the target host.
- name: Install libvirt packages and configure networks
  hosts: virthost
  tags:
    - environment
  roles:
    - environment/setup

# The `libvirt/setup` role creates the undercloud and overcloud
# virtual machines.
- name: Setup undercloud and overcloud vms
  hosts: virthost
  gather_facts: yes
  roles:
    - libvirt/teardown
    - libvirt/setup

# Add the undercloud node to the generated inventory.
- name: Rebuild inventory
  hosts: localhost
  vars:
    inventory: undercloud
  roles:
    - tripleo-inventory

# DEPLOY ALL THE THINGS! Depending on the currently selected set of
# tags, this will deploy the undercloud, deploy the overcloud, and
# perform some validation tests on the overcloud.
- name: Install undercloud and deploy overcloud
  hosts: undercloud
  gather_facts: no
  roles:
    - tripleo/undercloud
    - tripleo/overcloud

# Add the overcloud node to the generated inventory.
- name: Rebuild inventory
  hosts: undercloud
  gather_facts: yes
  vars:
    inventory: all
  roles:
    - tripleo-inventory

###############
# Scale Nodes #
###############

# Scale nodes w/o delete
- name: Scale overcloud nodes
  hosts: undercloud
  roles:
    - { role: tripleo-overcloud-scale, artosn_scale_nodes: true, artosn_delete_original_node: false }

# Delete the original node of the type that was scaled - ensure the overcloud validates after reducing scale
- name: Delete original node of type scaled
  hosts: undercloud
  roles:
    - { role: tripleo-overcloud-scale, artosn_scale_nodes: false, artosn_delete_original_node: true }

# NOTE(hrybacki): inventory regeneration and overcloud validation must be completed in a second playbook. The
# deleted node is removed from the hosts file. However, it still exists in memory and will cause the
# 'ansible-role-tripleo-inventory: regenerate ssh config' task to fail when attempting to access non-existent host vars.

---
# NOTE(hrybacki): inventory regeneration and overcloud validation must be completed in a second playbook. The
# deleted node is removed from the hosts file. However, it still exists in memory and will cause the
# 'ansible-role-tripleo-inventory: regenerate ssh config' task to fail when attempting to access non-existent host vars.

# Re-inventory the overcloud
- name: Inventory the overcloud
  hosts: undercloud
  gather_facts: no
  roles:
    - tripleo-inventory

# Validate the overcloud
- name: Validate the overcloud post-delete-node
  hosts: undercloud
  gather_facts: no
  roles:
    - tripleo-overcloud-validate

setup.cfg
[files]
data_files =
    usr/local/share/ansible/roles/tripleo-overcloud-scale/defaults = defaults/*
    usr/local/share/ansible/roles/tripleo-overcloud-scale/handlers = handlers/*
    usr/local/share/ansible/roles/tripleo-overcloud-scale/meta = meta/*
    usr/local/share/ansible/roles/tripleo-overcloud-scale/tasks = tasks/*
    usr/local/share/ansible/roles/tripleo-overcloud-scale/templates = templates/*
    usr/local/share/ansible/roles/tripleo-overcloud-scale/tests = tests/*
    usr/local/share/ansible/roles/tripleo-overcloud-scale/vars = vars/*
    playbooks = playbooks/*
    config/general_config/ = configs/*
[wheel]
universal = 1

tasks/delete-original-node.yml
---
# Delete the scaled node
- name: Check the overcloud heat stack-list state
  shell: >
    source "{{ artosn_working_dir }}"/stackrc;
    heat stack-list
  register: heat_stack_list_result

- name: Verify the overcloud is in a complete state
  fail: msg='Overcloud heat stack is not in a complete state'
  when: heat_stack_list_result.stdout.find('COMPLETE') == -1

- name: Get id for the overcloud stack
  shell: >
    source "{{ artosn_working_dir }}"/stackrc;
    heat stack-list | grep overcloud | sed -e 's/|//g' | awk '{print $1}'
  register: overcloud_id

- name: Register uuid of original "{{ node_to_scale }}" node
  shell: >
    source "{{ artosn_working_dir }}"/stackrc;
    nova list | grep -m 1 "{{ node_to_scale }}" | sed -e 's/|//g' | awk '{print $1}'
  register: node_id_to_delete

- name: Register the name of the original "{{ node_to_scale }}" node
  shell: >
    source "{{ artosn_working_dir }}"/stackrc;
    nova list | grep -m 1 "{{ node_to_scale }}" | sed -e 's/|//g' | awk '{print $2}'
  register: node_name_to_delete

- name: Display node name to be deleted
  debug: msg={{ node_name_to_delete.stdout }}

- name: Copy delete node script to undercloud
  template:
    src: delete-node.j2
    dest: "{{ artosn_working_dir }}/delete-node.sh"
    mode: 0755

- name: Delete node by id
  shell: >
    cat "{{ artosn_working_dir }}"/delete-node.sh;
    "{{ artosn_working_dir }}"/delete-node.sh &> delete_node_scale_console.log;

# Verify the delete was successful
- name: Poll heat stack-list to determine when node delete is complete
  shell: >
    source "{{ artosn_working_dir }}"/stackrc;
    heat stack-list
  register: heat_stack_list_result
  until: heat_stack_list_result.stdout.find("COMPLETE") != -1
  retries: 20
  delay: 90

- name: Determine the post scale node count
  shell: >
    source "{{ artosn_working_dir }}/stackrc";
    nova list | grep "{{ node_to_scale }}" | cut -f2- -d':' | wc -l
  register: post_scale_node_count

- name: Remove deleted hosts from the host file
  shell: >
    sed -i '/{{ node_name_to_delete.stdout }}/d' {{ local_working_dir }}/hosts
  delegate_to: localhost

- name: Check that post delete node count is correct
  fail: msg="Overcloud nova list does not show expected number of {{ node_to_scale }} services"
  when: post_scale_node_count.stdout != "{{ initial_scale_value }}"

tasks/main.yml
---
# tasks file for ansible-role-tripleo-overcloud-scale-nodes
- include: pre-scale.yml
  when: artosn_scale_nodes
  tags:
    - pre-overcloud-scale-nodes

- include: scale-nodes.yml
  when: artosn_scale_nodes
  tags:
    - overcloud-scale-nodes

# Optionally delete the original node of type scaled
- include: delete-original-node.yml
  when: artosn_delete_original_node
  tags:
    - post-overcloud-scale-nodes-delete
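Because each include is tagged, the optional delete can be skipped without editing the role. A sketch, again assuming a direct ansible-playbook invocation:

    # run the scale but skip the delete-original-node include via its tag
    ansible-playbook playbooks/scale_nodes.yml --skip-tags post-overcloud-scale-nodes-delete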

tasks/pre-scale.yml
---
# Prep for scale
- name: Determine initial number of node(s) that will be scaled
  shell: >
    source "{{ artosn_working_dir }}/stackrc";
    nova list | grep "{{ node_to_scale }}" | cut -f2- -d':' | wc -l
  register: initial_node_count

- name: Register uuid of original "{{ node_to_scale }}" node
  shell: >
    source "{{ artosn_working_dir }}"/stackrc;
    nova list | grep -m 1 "{{ node_to_scale }}" | sed -e 's/|//g' | awk '{print $1}'
  register: node_id_to_delete

- name: Register the name of the original "{{ node_to_scale }}" node
  shell: >
    source "{{ artosn_working_dir }}"/stackrc;
    nova list | grep -m 1 "{{ node_to_scale }}" | sed -e 's/|//g' | awk '{print $2}'
  register: node_name_to_delete

- name: Register pre-scale nova list
  shell: >
    source "{{ artosn_working_dir }}/stackrc";
    nova list
  register: pre_scale_nova_list

- name: Display pre-scale nova list
  debug: msg={{ pre_scale_nova_list.stdout_lines }}
  when: pre_scale_nova_list is defined

- name: Copy scale deployment template to undercloud
  template:
    src: scale-deployment.j2
    dest: "{{ artosn_working_dir }}/scale-deployment.sh"
    mode: 0755

tasks/scale-nodes.yml
---
# Do the scale
- name: Call scale deployment script
  shell: >
    source "{{ artosn_working_dir }}/stackrc";
    "{{ artosn_working_dir }}"/scale-deployment.sh &> overcloud_deployment_scale_console.log;

- name: Poll heat stack-list to determine when node scale is complete
  shell: >
    source "{{ artosn_working_dir }}"/stackrc;
    heat stack-list
  register: heat_stack_list_result
  until: heat_stack_list_result.stdout.find("COMPLETE") != -1
  retries: 20
  delay: 90

- name: Register post-scale nova list
  shell: >
    source "{{ artosn_working_dir }}/stackrc";
    nova list
  register: post_scale_nova_list

- name: Display post-scale nova list
  debug: msg={{ post_scale_nova_list.stdout_lines }}
  when: post_scale_nova_list is defined

# Verify the scale
- name: Determine the post scale node count
  shell: >
    source "{{ artosn_working_dir }}/stackrc";
    nova list | grep "{{ node_to_scale }}" | cut -f2- -d':' | wc -l
  register: post_scale_node_count

- name: Check that post scale node count is correct
  fail: msg="Overcloud nova list does not show expected number of {{ node_to_scale }} services"
  when: post_scale_node_count.stdout != "{{ final_scale_value }}"

templates/delete-node.j2
#! /bin/bash
source ./stackrc
openstack overcloud node delete --debug --stack {{ overcloud_id.stdout }} --templates {{ scale_extra_configs }} {{ node_id_to_delete.stdout }}
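With the variables registered in tasks/delete-original-node.yml, the rendered delete-node.sh would look roughly like this; the stack and node UUIDs are hypothetical placeholders:

    #! /bin/bash
    source ./stackrc
    # --stack takes the overcloud heat stack id; the trailing argument
    # is the nova server id of the node being removed
    openstack overcloud node delete --debug \
        --stack 11111111-aaaa-bbbb-cccc-000000000001 \
        --templates \
        -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
        -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
        -e ~/network-environment.yaml \
        22222222-aaaa-bbbb-cccc-000000000002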

templates/scale-deployment.j2
#!/bin/bash
# Simple overcloud scale script

set -eux

# Source in undercloud credentials.
source {{ artosn_working_dir }}/stackrc

# Wait until there are hypervisors available.
while true; do
    count=$(openstack hypervisor stats show -c count -f value)
    if [ $count -gt 0 ]; then
        break
    fi
done

deploy_status=0

# Scale the overcloud!
openstack overcloud deploy --templates {{ scale_args }} \
    ${DEPLOY_ENV_YAML:+-e $DEPLOY_ENV_YAML} || deploy_status=1

# We don't always get a useful error code from the openstack deploy command,
# so check `heat stack-list` for a CREATE_FAILED status.
if heat stack-list | grep -q 'CREATE_FAILED'; then
    deploy_status=1
    for failed in $(heat resource-list \
        --nested-depth 5 overcloud | grep FAILED |
        grep 'StructuredDeployment ' | cut -d '|' -f3)
    do heat deployment-show $failed > failed_deployment_$failed.log
    done
fi

exit $deploy_status