Adding collection of misc tools and scripts

Mike Dorman 2014-10-07 10:29:10 -06:00
parent 634710542e
commit 92b369044f
16 changed files with 509 additions and 0 deletions

ansible/LICENSE.txt Normal file

@@ -0,0 +1,19 @@
Copyright (c) 2014 Go Daddy Operating Company, LLC
Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.

ansible/README.md Normal file

@@ -0,0 +1,101 @@
ansible-playbooks
=================
Go Daddy Ansible playbooks for managing OpenStack infrastructure.
Also available publicly at https://github.com/godaddy/openstack-ansible
This assumes your baseline Ansible config is at `/etc/ansible`, and this repo is cloned
to `/etc/ansible/playbooks`. (Specifically, the playbooks assume the path to the tasks directory is `/etc/ansible/playbooks/tasks`,
so if you clone this repo somewhere else, you'll need to adjust that.)
Patches/comments/complaints welcomed and encouraged! Create an issue or PR here.
Usage Details
-------------
Playbooks are "shebanged" with `#!/usr/bin/env ansible-playbook --forks 50`, so you can actually
just run them directly from the command line.
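
For example, with a hypothetical `compute` host group, a playbook can be run straight from the repo checkout
(roughly equivalent to calling `ansible-playbook --forks 50` on it explicitly):

```bash
cd /etc/ansible/playbooks
./patching.yaml -k -K -e "hosts=compute"
```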
We have the concept of "worlds", which correspond to dev, test, prod, etc., servers. We use
the world terminology to avoid confusion with the Puppet environment setting (which, for us,
corresponds to a branch in our [openstack-puppet](https://github.com/godaddy/openstack-puppet) repo.) So when you see references to the world
variable, that's what it is. Right now only a couple playbooks utilize that, so for the most
part you can probably use these without worrying about defining a world variable.
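
For the couple of playbooks that do consume it, `world` is passed like any other extra variable; a sketch (the
playbook, group name, and value here are only illustrative):

```bash
./puppet-run.yaml -k -K -e "world=test hosts=app-servers"
```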
puppet-run.yaml and r10k-deploy.yaml are fairly specific to our environment, and are probably
mostly irrelevant unless you're also using our openstack-puppet repo for Puppet configuration.
Basic usage for the other playbooks follows.
### template-prestage.yaml
This one is really cool, compliments of [krislindgren](http://github.com/krislindgren). It sets up a
BitTorrent swarm, seeded by the machine running Glance, to distribute a Glance image out to any number
of nova-compute nodes very quickly. So when we roll new gold images each month, we can use this to
"push" them out to all compute nodes, and avoid the first provision penalty of waiting for the image
to transfer the first time.
Caveats:
* Only works when the Glance backend is on a traditional (local or network-based) filesystem. Almost
certainly this does not work, and may not even make sense, for Swift-backed Glance.
* Firewalls or other traffic filters need to allow the BitTorrent ports through, and among, the Glance
server and all compute nodes.
* Assumes Glance images are stored at `/var/lib/glance/images` on the Glance server.
* There are some situations where this cannot be run multiple times, if the tracker or other BitTorrent
processes are still running. So use caution, and YMMV.
* This is done completely outside the scope of Glance and Nova. There is no Keystone authentication or
access controls. You must have ssh and sudo access to all machines involved for this to work.
Usage:
./template-prestage.yaml -k -K -e "image_uuid=<uuid> image_sha1=<sha1> image_md5=<md5> tracker_host=<glance server> hosts_to_update=<compute host group>"
* _image_uuid_: UUID of the image to prestage (from `nova image-list` or `glance image-list`)
* _image_sha1_: SHA1 sum of the image_uuid (this can be gotten by running: `echo -n "<image_uuid>" | sha1sum | awk '{print $1}'` on any Linux box)
* _image_md5_: MD5 sum of the image file itself (this can be gotten by running: `md5sum /var/lib/glance/images/<image_uuid> | awk '{print $1}'` on the Glance server; see the example below)
* _tracker_host_: This is the Glance server host that runs the tracker
* _hosts_to_update_: This is the host group to place the image onto (a list of compute nodes)
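
For example, using the sample values from the playbook header (UUID, tracker host, and host group), the two
checksums can be computed and the playbook invoked like this:

```bash
IMAGE_UUID=41009dbd-52f5-4972-b65f-c429b1d42f5f                            # image to prestage
IMAGE_SHA1=$(echo -n "$IMAGE_UUID" | sha1sum | awk '{print $1}')           # run anywhere
IMAGE_MD5=$(md5sum /var/lib/glance/images/$IMAGE_UUID | awk '{print $1}')  # run on the Glance server
./template-prestage.yaml -k -K -e "image_uuid=$IMAGE_UUID image_sha1=$IMAGE_SHA1 image_md5=$IMAGE_MD5 tracker_host=api01 hosts_to_update=compute"
```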
### copy-hiera-eyaml-keys.yaml
Copies public and private keys for hiera-eyaml from the source/ansible client machine, to hosts, at
`/etc/pki/tls/private/hiera-eyaml-{public,private}_key.pkcs.pem`
./copy-hiera-eyaml-keys.yaml -k -K -e "srcdir=<srcdir> hosts=<host group>"
* _srcdir_: Source directory on the ansible client machine where the hiera-eyaml-public_key.pkcs7.pem and hiera-eyaml-private_key.pkcs7.pem keys can be found
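
For instance (the source directory and host group here are purely illustrative):

```bash
./copy-hiera-eyaml-keys.yaml -k -K -e "srcdir=/root/hiera-eyaml-keys hosts=app-servers"
```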
### patching.yaml
This is a simple playbook to run `yum -y update --skip-broken` on all machines.
./patching.yaml -k -K -e "hosts=<host group>"
This also runs the remove-old-kernels.yaml task first, which removes any kernel packages from the
system that are neither (1) the currently running kernel, nor (2) the default boot kernel in GRUB; a rough sketch of that check follows.
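
The task itself isn't reproduced in this README, but its selection logic is roughly the following (a sketch that
assumes a RHEL/CentOS host with `grubby` available, and that only prints what would be removed):

```bash
#!/bin/bash
# Sketch only: list kernel packages that are neither the running kernel nor the GRUB default.
RUNNING="kernel-$(uname -r)"
DEFAULT="kernel-$(basename "$(grubby --default-kernel)" | sed 's/^vmlinuz-//')"
for pkg in $(rpm -q kernel); do
  if [ "$pkg" != "$RUNNING" ] && [ "$pkg" != "$DEFAULT" ]; then
    echo "candidate for removal: $pkg"   # the real task would remove it (e.g. via yum remove)
  fi
done
```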
### updatepackages.yaml
Similar to patching.yaml, but this one allows for a specification of exactly which package(s) to update.
./updatepackages.yaml -k -K -e "puppet=<true|false> package=<package spec> hosts=<host group>"
* _puppet_: If true, will also run the puppet-run.yaml task after updating the packages. Default false.
* _package_: Specification of which package(s) to update; wildcards are valid. Default: `*`
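
For example, to update only the Nova packages (package pattern and host group are illustrative) and skip the Puppet run:

```bash
./updatepackages.yaml -k -K -e "puppet=false package=openstack-nova* hosts=compute"
```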
### restartworld.yaml
Restarts some (or all) openstack services on hosts. Note that this uses the `tools/restartworld.sh` script
from the [godaddy/openstack-puppet](https://github.com/godaddy/openstack-puppet) repo, so you may want to look at that before trying to use this playbook.
This is also somewhat specific to our environment, as far as how we group services together (most of
them run on the "app" class of server.) So results and usefulness may vary.
./restartworld.yaml -k -K -e "class=<server class> hosts=<host group> service=<service class>"
* _class_: Server class to define what services to restart. Recognized options are app, network, and compute. See the `restartworld.sh` script referenced above for which services are on which class of server. You may want to define a group var for this (that's what we did) to automatically map servers to their appropriate server class value.
* _hosts_: Hosts on which to perform the service restarts
* _service_: Generally, the OpenStack project name for the services to restart. Recognized options are nova, keystone, glance, ceilometer, heat, neutron, spice, els, and world (where "world" means all services).
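
As an illustration, restarting the Nova services on the app-class servers (the host group name is hypothetical) would look like:

```bash
./restartworld.yaml -k -K -e "class=app hosts=app-servers service=nova"
```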

@@ -0,0 +1,14 @@
#!/usr/bin/env ansible-playbook -f 50
---
# This playbook requires the following variables:
# hosts - this is the host(s) that you are trying to run on. any host in the hosts file is valid
#
# Author: Mike Dorman <mdorman@godaddy.com>
#
# Usage:
# ansible-playbook disable-glance-quota.yaml -k -K --extra-vars "hosts=glance-servers"
- hosts: '{{ hosts }}'
sudo: yes
tasks:
- include: ../tasks/turn-off-glance-quota.yaml

@@ -0,0 +1,15 @@
#!/usr/bin/env ansible-playbook -f 50
---
# This playbook requires the following variables:
# hosts - this is the host(s) that you are trying to run on. any host in the hosts file is valid
# value - value to set user_storage_quota to. Default 21474836480 (20 GB)
#
# Author: Mike Dorman <mdorman@godaddy.com>
#
# Usage:
# ansible-playbook enable-glance-quota.yaml -k -K --extra-vars "hosts=glance-servers value=21474836480"
- hosts: '{{ hosts }}'
sudo: yes
tasks:
- include: ../tasks/turn-on-glance-quota.yaml

@@ -0,0 +1,18 @@
#!/usr/bin/env ansible-playbook --forks 50
---
#
# This playbook supports the following variables:
#
# Author: Kris Lindgren <klindgren@godaddy.com>
#
# hosts - host(s)/group(s) on which to run this playbook (REQUIRED)
#
# Example:
# To run without modifying the playbook, pass the variables to ansible-playbook on the command line:
# ansible-playbook run-puppet.yaml -k -K --extra-vars "hosts=compute-servers"
#
- hosts: '{{ hosts }}'
sudo: yes
tasks:
- include: /etc/ansible/playbooks/tasks/orphaned-vms.yaml

@@ -0,0 +1,42 @@
#!/usr/bin/env ansible-playbook --forks 50
---
# Reboot a series of compute nodes in a rolling fashion, verifying all VMs come back up on
# each node before going on to reboot the next group.
#
# Author: Kris Lindgren <klindgren@godaddy.com>
#
# This playbook requires the following variables:
# api_server - this is the server that runs the nova-api instance that we will use to get a list of the running VMs on compute nodes
# hosts - this is the host group to perform the rolling reboot on; typically *-compute
# This playbook also accepts the following *OPTIONAL* variables:
# reboot_parallelism (5) - How many hosts to reboot at once
# reboot_check_port - This is the port to check to see if the server has come back online (22)
# wait_delay - This is how long to wait between checks (120 seconds)
# wait_timeout - This is the maximum time to wait until we move on (1200 seconds)
# pause_for_host_boot - This is the time to wait for the host to fully restart (3 minutes)
# Example:
# ansible-playbook compute-rolling-reboot.yaml -k -K --extra-vars "api_server=api01 hosts=compute"
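# With the optional variables overridden (the values below are only illustrative):
# ansible-playbook compute-rolling-reboot.yaml -k -K --extra-vars "api_server=api01 hosts=compute reboot_parallelism=10 wait_delay=60 wait_timeout=900 pause_for_host_boot=5"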
- hosts: '{{ hosts }}'
sudo: yes
serial: "{{ reboot_parallelism | default('5') }}"
tasks:
- name: Gather list of all running vm's on the host
shell: source /root/keystonerc_admin; nova list --all-tenants --status Active --host {{inventory_hostname}} --fields host,OS-EXT-SRV-ATTR:instance_name,status | grep ACTIVE | awk -F" | " '{if(a[$4]){a[$4]=a[$4]","$2"+"$6} else { a[$4]=$2"+"$6}} END {for (i in a) { print i":"a[i]} }'
register: running_vms
delegate_to: '{{ api_server }}'
- include: ../tasks/rolling-reboot.yaml
- name: ensure that nova-compute is started
service: name=openstack-nova-compute state=started
register: novacompute
- name: Verify running vm's are still running
shell: rc=$(echo "0"); vmlist=$( echo "{{running_vms.stdout }}" | grep {{inventory_hostname }} |cut -d":" -f2,2 |awk -F"," '{for (i=1; i<=NF; i++) print $i}'); virshlist=$( virsh list | grep running | awk '{print $2}'); for i in $vmlist; do vm=$( echo $i | cut -d"+" -f2,2 ); tmp=$( echo "$virshlist" | grep $vm); if [ $? -eq 1 ]; then uuid=$( echo "$i" | cut -d"+" -f1,1); echo "$uuid"; rc=$(echo "1"); fi; done; if [ "$rc" == "1" ]; then false; else true; fi
register: vms_not_running
when: novacompute.state == "started"
- debug: msg="{{vms_not_running}}"
when: vms_not_running.rc == 1

@@ -0,0 +1,112 @@
#!/usr/bin/env ansible-playbook --forks 50
---
# Distribute/prestage Glance image out to many compute nodes at once using BitTorrent
#
# Author: Kris Lindgren <klindgren@godaddy.com>
#
# Sets up a BitTorrent swarm, seeded by the machine running Glance, to distribute a Glance image out to any number
# of nova-compute nodes very quickly. So when we roll new gold images each month, we can use this to
# "push" them out to all compute nodes, and avoid the first provision penalty of waiting for the image
# to transfer the first time.
#
# Caveats:
# * Only works when the Glance backend is on a traditional (local or network-based) filesystem. Almost
# certainly this does not work, and may not even make sense, for Swift-backed Glance.
# * Firewalls or other traffic filters need to allow the BitTorrent ports through, and among, the Glance
# server and all compute nodes.
# * Assumes Glance images are stored at `/var/lib/glance/images` on the Glance server.
# * There are some situations where this cannot be run multiple times, if the tracker or other BitTorrent
# processes are still running. So use caution, and YMMV.
# * This is done completely outside the scope of Glance and Nova. There is no Keystone authentication or
# access controls. You must have ssh and sudo access to all machines involved for this to work.
#
# This playbook requires the following variables:
# image_uuid - this can be gotten from the output of either nova image-list or glance image-list for the image you want to prestage
# image_sha1 - this can be gotten by running: echo -n "<image_uuid>" | sha1sum | awk '{print $1}' on any Linux box
# image_md5 - this can be gotten by running: md5sum /var/lib/glance/images/<image_uuid> | awk '{print $1}' on the glance server
# tracker_host - this is the host that runs the tracker (also this is the same host in the first and second play)
# hosts_to_update - this is the host group to place the image onto typically *-compute
# To run with different UUIDs without changing the playbook, invoke ansible-playbook with the following:
# ansible-playbook template-prestage.yaml -k -K --extra-vars "image_uuid=41009dbd-52f5-4972-b65f-c429b1d42f5f image_sha1=1b8cddc7825df74e19d0a621ce527a0272541c35 image_md5=41d45920d859a2d5bd4d1ed98adf7668 tracker_host=api01 hosts_to_update=compute"
- hosts: '{{ tracker_host }}'
sudo: yes
vars:
image_uuid: '{{ image_uuid }}'
tracker_host: '{{ tracker_host }}'
tasks:
- name: install ctorrent client
yum: name=ctorrent state=present
- name: install opentracker-ipv4
yum: name=opentracker-ipv4 state=present
- name: make sane
shell: "killall -9 opentracker-ipv4 | true; killall -9 ctorrent | true;"
- name: Start Tracker
command: "{{item}}"
with_items:
- /usr/bin/opentracker-ipv4 -m -p 6969 -P 6969 -d /var/opentracker
- name: Create bittorrent file
command: "{{item}}"
with_items:
- mkdir -p /var/www/html/torrent
- rm -rf /var/www/html/torrent/{{ image_uuid }}.torrent
- /usr/bin/ctorrent -t -s /var/www/html/torrent/{{ image_uuid }}.torrent -u http://{{ tracker_host }}:6969/announce -c Testfile /var/lib/glance/images/{{ image_uuid }}
- name: Seed Bittorrent file
command: /usr/bin/ctorrent -d -U 50000 -s /var/lib/glance/images/{{ image_uuid }} /var/www/html/torrent/{{ image_uuid }}.torrent
- hosts: '{{hosts_to_update}}'
sudo: yes
vars:
image_uuid: '{{ image_uuid }}'
image_sha1: '{{ image_sha1 }}'
image_md5: '{{ image_md5 }}'
tracker_host: '{{ tracker_host }}'
tasks:
- name: install ctorrent client
yum: name=ctorrent state=present
- name: Check if image exists
stat: path=/var/lib/nova/instances/_base/{{ image_sha1 }}
register: image
- name: make sane
shell: "killall -9 ctorrent | true; iptables -D INPUT -p tcp --dport 2704:2706 -j ACCEPT | true"
when: image.stat.exists == False
- name: Download Torrent File and run torrent
command: "{{item}}"
with_items:
- /sbin/iptables -I INPUT -p tcp --dport 2704:2706 -j ACCEPT
- /usr/bin/wget http://{{ tracker_host }}/torrent/{{ image_uuid }}.torrent
- /usr/bin/ctorrent -e 0 -m 10 -U 30000 -D 80000 -p 2706 -s /var/lib/elsprecachedir/{{ image_uuid }} {{ image_uuid }}.torrent
when: image.stat.exists == False
- name: ensure md5sum matches
shell: "md5sum /var/lib/elsprecachedir/{{ image_uuid }} | grep {{ image_md5 }}"
when: image.stat.exists == False
- name: Convert image to raw file
command: "{{item}}"
with_items:
- /usr/bin/qemu-img convert -f qcow2 -O raw /var/lib/elsprecachedir/{{ image_uuid }} /var/lib/nova/instances/_base/{{ image_sha1 }}
- /bin/chown nova:qemu /var/lib/nova/instances/_base/{{ image_sha1 }}
- /bin/chmod 644 /var/lib/nova/instances/_base/{{ image_sha1 }}
when: image.stat.exists == False
- name: Cleanup
shell: "/sbin/iptables -D INPUT -p tcp --dport 2704:2706 -j ACCEPT | true; rm -rf {{ image_uuid }}*; rm -rf /var/lib/elsprecachedir/{{ image_uuid }}; killall -9 ctorrent | true"
when: image.stat.exists == False
- hosts: '{{ tracker_host }}'
sudo: yes
vars:
image_uuid: '{{ image_uuid }}'
tasks:
- name: Kill tracker and ctorrent and remove torrent file
shell: "killall -9 ctorrent | true ; killall -9 opentracker-ipv4 | true; rm -rf /var/www/html/torrent/{{ image_uuid }}"

@@ -0,0 +1,3 @@
- name: Running orphaned VM cleanup script
shell: "tools/remove-deleted-orphans.sh"

@@ -0,0 +1,21 @@
---
- name: Rebooting Server
shell: sleep 2 && /sbin/shutdown -r now &
tags: reboot
- name: Waiting for port to go down from server reboot
wait_for: host={{ inventory_hostname }} port={{ reboot_check_port | default('22') }} timeout={{ wait_timeout | default('1200') }} state=stopped
connection: local
sudo: false
tags: reboot
- name: Waiting for port to come back after reboot
wait_for: host={{ inventory_hostname }} port={{ reboot_check_port | default('22') }} delay={{ wait_delay | default('120') }} timeout={{ wait_timeout | default('1200') }} state=started
connection: local
sudo: false
tags: reboot
- name: pausing to make sure host is fully booted
pause: minutes={{ pause_for_host_boot | default('3') }}
tags: reboot

@@ -0,0 +1,4 @@
- name: Removing user_storage_quota setting from glance-api.conf
shell: "sed -r -i 's/^[[:space:]]*user_storage_quota/#user_storage_quota/g' /etc/glance/glance-api.conf"
- service: name=openstack-glance-api state=restarted

@@ -0,0 +1,3 @@
- name: Adding user_storage_quota setting to glance-api.conf
shell: "sed -r -i '0,/^#?[[:space:]]*user_storage_quota/s/^#?[[:space:]]*user_storage_quota[[:space:]]*=[[:space:]]*[[:digit:]]+/user_storage_quota = {{ value | default('21474836480') }}/' /etc/glance/glance-api.conf"
- service: name=openstack-glance-api state=restarted

libvirt/cleanup-orphaned-vms.sh Executable file

@@ -0,0 +1,35 @@
#!/usr/bin/env bash
#
# Run this script on a compute node to cleanup/remove any orphaned KVM VMs that were left behind by something.
# Run with --noop to do a dry run and not actually delete anything
#
# To populate the UUIDS value below, run the following command on a Nova api server to get list of VM UUIDs that are known to OpenStack:
# nova list --all-tenants | awk '{print $2;}' | grep -E '^[0-9a-f]+' | tr '\n' '|' | sed -r 's/\|$/\n/'
# Then paste in the results for UUIDS below OR define it in the environment before running this script.
#
# Author: Kris Lindgren <klindgren@godaddy.com>
#
#UUIDS=""
if [ -z "$UUIDS" ]; then
echo "UUIDS value not defined"
exit 1
fi
for i in `virsh list --all | grep -E '^ [0-9-]+' | awk '{print $2;}'` ; do
virsh dumpxml $i | grep "source file" | grep -E "$UUIDS" >/dev/null
if [ $? -ne 0 ]; then
echo -n "+ $i is NOT known to OpenStack, removing managedsave info... "
[ ! -z "$1" ] && virsh managedsave-remove $i 1>/dev/null 2>&1
echo -n "destroying VM... "
[ ! -z "$1" ] && virsh destroy $i 1>/dev/null 2>&1
echo -n "undefining VM... "
[ ! -z "$1" ] && virsh undefine $i 1>/dev/null 2>&1
echo DONE
else
echo "* $i is known to OpenStack, not removing."
fi
done
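
A usage sketch for the script above (run as root on a compute node; dropping `--noop` performs the actual cleanup):

```bash
# On a Nova API node, build the pipe-separated UUID list (this is the command from the header comment):
UUIDS=$(nova list --all-tenants | awk '{print $2;}' | grep -E '^[0-9a-f]+' | tr '\n' '|' | sed -r 's/\|$/\n/')
# Paste/export that value on the compute node, then do a dry run first:
export UUIDS
./cleanup-orphaned-vms.sh --noop
```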

@@ -0,0 +1,45 @@
#!/bin/bash
#
# This script looks at the configured VMs and checks that each one's disk still exists.
# If not, it removes the VM from libvirt. This fixes the Nova errors about VMs whose disks are missing.
#
# Author: Kris Lindgren <klindgren@godaddy.com>
#
removeorphan(){
local domain
local tmp
domain=$1
tmp=$( virsh destroy $domain )
tmp=$( virsh undefine $domain )
tmp=$(virsh list --all | grep $domain )
if [ $? -eq 1 ]
then
tmp=$( ps auxwwwf | grep $domain | grep -v grep )
if [ $? -eq 1 ]
then
return 0
fi
fi
return 1
}
for i in /etc/libvirt/qemu/*.xml
do
disklocation=$( grep /var/lib/nova/instances $i | grep disk | cut -d"'" -f2,2)
if [ ! -e $disklocation ]
then
orphan=$(echo $i | cut -d"/" -f5,5 | cut -d"." -f1,1)
echo "$orphan does not have a disk located at: $disklocation"
echo "This is an orphan of openstack... stopping the orphaned vm."
removeorphan $orphan
if [ $? -eq 0 ]
then
echo "Domain $orphan has been shutdown and removed"
else
echo "Domain $orphan has *NOT* been shutdown and removed"
fi
fi
done

@@ -0,0 +1,25 @@
#!/bin/bash
#
# A quick and dirty script to backfill empty config drive
# images to VMs that don't already have a config drive.
#
# This is a workaround for the config drive bug described
# at https://bugs.launchpad.net/nova/+bug/1356534
#
# Author: Mike Dorman <mdorman@godaddy.com>
#
cd /root
mkdir -p blank
mkisofs -o blank.iso blank/ >/dev/null 2>&1
rmdir blank
for i in `ls /var/lib/nova/instances | grep -E '[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}'`; do
ls -l /var/lib/nova/instances/$i/disk.config
if [ ! -s /var/lib/nova/instances/$i/disk.config ]; then
echo "$i config drive doesn't exist, or is size zero."
cp -f /root/blank.iso /var/lib/nova/instances/$i/disk.config
chown qemu:qemu /var/lib/nova/instances/$i/disk.config
fi
done

nova/list-vms-by-host.sh Normal file

@@ -0,0 +1,20 @@
#!/bin/bash
#
# Outputs a tab-delimited list of all VMs with these fields:
# [Hypervisor Host] [UUID] [Status] [IP Address] [Name]
#
# Author: Mike Dorman <mdorman@godaddy.com>
#
for i in `nova list --all-tenants | grep -v '^+-' | grep -v '^| ID' | awk '{print $2 "," $4 "," $6;}'`; do
ID=`echo $i | cut -d, -f 1`
NAME=`echo $i | cut -d, -f 2`
STATUS=`echo $i | cut -d, -f 3`
SHOW=`nova show ${ID}`
HV=`echo "${SHOW}" | grep OS-EXT-SRV-ATTR:host | awk '{print $4;}'`
IP=`echo "${SHOW}" | grep " network" | awk '{print $5;}'`
echo -e "${HV}\t${ID}\t${STATUS}\t${IP}\t${NAME}"
done
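
Because the output is tab-delimited, it is easy to filter; for example (the hypervisor name is hypothetical):

```bash
./list-vms-by-host.sh | awk -F'\t' '$1 == "compute01" {print $2, $5}'   # UUID and name of each VM on compute01
```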

@@ -0,0 +1,32 @@
#!/bin/bash
#
# Lists VMs which have been orphaned from their tenant (i.e. the tenant
# was removed, but VMs were still in the tenant.)
#
# Author: Kris Lindgren <klindgren@godaddy.com>
#
echo "THIS SCRIPT NEED TO HAVE keystonerc sourced to work"
sleep 5
echo "Getting a list of vm's from nova..."
novavmsraw=$( nova list --all-tenants --fields name,tenant_id,user_id )
echo "done."
echo "Getting a list of tenants from keystone...."
keystoneraw=$( keystone tenant-list )
echo "done."
novatenants=$( echo "$novavmsraw" | awk '{print $6}' | sort | uniq | grep -v Tenant )
echo "Starting to list vm's that are no longer attached to a tenant..."
echo "Fields are:"
echo "| VM ID | VM Name | Tenant Id | User Id |"
for i in $novatenants
do
tmp=$( echo "$keystoneraw" | grep $i )
if [ $? -eq 0 ]
then
continue
else
vms=$( echo "$novavmsraw" | grep $i )
echo "$vms"
fi
done