164 Commits

Author SHA1 Message Date
Jeremy Stanley 601e4a4a55 Transition to Rackspace API keys
Rackspace is requiring multi-factor authentication for all users
beginning 2024-03-26. Enabling MFA on our accounts will immediately
render password-based authentication inoperable for the API. In
preparation for this switch, add new cloud entries for the provider
which authenticate by API key so that we can test and move more
smoothly between the two while we work out any unanticipated kinks.

Change-Id: I787df458aa048ad80e246128085b252bb5888285
2024-03-05 19:31:09 +00:00
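The new cloud entries are configuration. For context, here is a sketch of the token request body an API-key login sends. It assumes Rackspace's identity v2.0 `RAX-KSKEY:apiKeyCredentials` extension, where the key replaces the usual `passwordCredentials` block. The function name and placeholder values are hypothetical.

```python
import json


def rax_apikey_auth_body(username: str, api_key: str) -> dict:
    """Token request body for API-key (not password) authentication.

    Assumption: Rackspace's identity v2.0 extension expects the key under
    "RAX-KSKEY:apiKeyCredentials" in place of "passwordCredentials".
    """
    return {
        "auth": {
            "RAX-KSKEY:apiKeyCredentials": {
                "username": username,
                "apiKey": api_key,
            }
        }
    }


# Placeholder values only; real keys belong in a protected config file.
print(json.dumps(rax_apikey_auth_body("example-user", "EXAMPLE-KEY"), indent=2))
```

The body would be POSTed to the identity service's tokens endpoint. Because no password is involved, enabling MFA on the account does not break it.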
Jeremy Stanley b44cae0233 Check launched server for x86-64-v2/sse4_2 support
The "UBI" that the latest Keycloak images are based on has a glibc
compiled to only work on x86-64-v2 systems, and in some regions we
seem to sometimes get hypervisors reporting older processor
architectures where it won't work. Check CPU flags for sse4_2
support as an indicator, and abort launching if it's not present.

Change-Id: Ib0f482a939f94e801c82f3583e0a58dc4ca1f35c
2024-02-08 18:42:20 +00:00
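Below is a sketch of the flag check. It assumes the script has already fetched the text of /proc/cpuinfo from the new server, for example over its SSH connection. The function names are hypothetical, and sse4_2 is used only as a proxy for x86-64-v2, as the change describes.

```python
from __future__ import annotations


def cpu_flags(cpuinfo: str) -> set[str]:
    """Collect the feature flags from /proc/cpuinfo-style text."""
    flags: set[str] = set()
    for line in cpuinfo.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def supports_x86_64_v2(cpuinfo: str) -> bool:
    """Treat sse4_2 as the indicator; abort the launch when this is False."""
    return "sse4_2" in cpu_flags(cpuinfo)


sample = "processor\t: 0\nflags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2 popcnt\n"
print(supports_x86_64_v2(sample))  # True
```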
Ian Wienand 530e14e32b launch: fix RAX rdns command-line tool
I'm not sure this ever worked ... plumb the arguments through
correctly.

Change-Id: Ibf2e90bddb831b3671b24c48a8f19b0285978d1e
2023-04-19 10:45:11 +10:00
Ian Wienand ff45b12412 launch: further DNS cleanups
As pointed out by clarkb in review of prior change
I06995027a4b80133bdac91c263d7a92fd495493b the hostname handling here
is a bit wonky.

Use "host" instead of "hostname" since we use that in the rest of the
file.  The print_dns() function doesn't need a cloud argument, remove
it.  The print_sshfp_records was incorrectly splitting the host/domain
for nodes like "mirror.rax.iad.openstack.org" -- simply pass only the
host for the bind record from the print_dns() function.

Change-Id: I3d851902ef52588a69294b02e22f4b4667454629
2023-04-14 10:31:06 +10:00
Ian Wienand 03de935048 launch: refactor to work
This was never consistently showing the host key and sshfp records
upon launch.

Upon digging, a number of things are going wrong.

The socket.create_connection() check isn't waiting for the host to be
up properly.  This means the keyscans were not working, and we'd get
blank return values [1].  We have an ssh_connect() routine; rework it
to use that to probe.  We add a close method to the sshclient so we
can shut it down too.

I don't know why the inventory output was in dns.py, as it's not
really DNS.  Move it to the main launch_node.py, and simplify it by
using f-strings.  While we're here, delimit the output a bit more
and make white-space more consistent.

This allows us to simplify dns.py and make it so it handles multiple
domains.

Since we're actually waiting for ssh to be up now, the keyscan works
better and this outputs the information we want.  A sample of this is

  https://paste.opendev.org/show/b1MjiTvYr4E03GTeP56w/

[1] ssh-keyscan has a very short timeout, and just returns blank if it
    doesn't get a response to its probes.  We weren't checking its
    return code.

Change-Id: I06995027a4b80133bdac91c263d7a92fd495493b
2023-04-14 07:05:37 +10:00
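Here is a minimal sketch of a delimited inventory snippet built with f-strings. The function name, the rule character and the YAML keys are hypothetical, not the script's actual output format.

```python
from __future__ import annotations


def inventory_snippet(name: str, ipv4: str, ipv6: str | None = None) -> str:
    """Return a delimited YAML snippet to paste into the inventory.

    Hypothetical layout: ansible_host uses IPv4, and the public
    addresses are recorded alongside it.
    """
    rule = "-" * 60
    lines = [
        rule,
        "Add the following to the inventory:",
        "",
        f"    {name}:",
        f"      ansible_host: {ipv4}",
        f"      public_v4: {ipv4}",
    ]
    if ipv6:
        lines.append(f"      public_v6: {ipv6}")
    lines.append(rule)
    return "\n".join(lines)


print(inventory_snippet("mirror01.example.org", "192.0.2.10", "2001:db8::10"))
```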
Clark Boylan 276d31d2dd Fix rax reverse DNS setup in launch
The launch node script didn't call the method in dns.py to set up rax
reverse dns. This means we didn't attempt to set up reverse dns at all
even when booting in rax. Fix this by calling the appropriate method.

Note we do some refactoring too in order to keep dns.py in the business
of dealing only with forward dns. All of the rax reverse dns content is
kept in rax_rdns.py.

Change-Id: I86091f4e5c56d38bb2d25b983f8b77ec1cd5b7b5
2023-04-03 17:01:43 -07:00
Zuul b61c9edf66 Merge "launch: add a probe for ssh after reboot" 2023-03-14 00:26:44 +00:00
Ian Wienand 04c3e4a0cc launch: add a probe for ssh after reboot
We do more operations after rebooting the host (getting ssh keys,
etc.).  Put in a small loop to wait for it to reappear.

Change-Id: Ibfaf530bba8f84bc5a6110e3dd7e7c73be7d5f4f
2023-03-03 09:19:24 +11:00
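Below is a generic sketch of such a wait loop. `probe` is a hypothetical stand-in for whatever connection attempt the script makes, for instance a wrapper around its ssh_connect() routine that returns True on success. The injectable `sleep` and `clock` exist only to make the loop testable.

```python
from __future__ import annotations

import time
from typing import Callable


def wait_for(probe: Callable[[], bool],
             timeout: float = 300.0,
             interval: float = 10.0,
             sleep: Callable[[float], None] = time.sleep,
             clock: Callable[[], float] = time.monotonic) -> None:
    """Call probe() until it returns True; raise TimeoutError after `timeout` s."""
    deadline = clock() + timeout
    while True:
        try:
            if probe():
                return
        except OSError:
            # Refused or reset connections are expected while the host reboots.
            pass
        if clock() >= deadline:
            raise TimeoutError(f"host did not come back within {timeout}s")
        sleep(interval)
```

Catching only OSError keeps genuine bugs in the probe visible instead of retrying them until the timeout.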
Zuul 662ddd8ef6 Merge "launch: add ssh keys to inventory" 2022-12-21 23:09:43 +00:00
Zuul 815a7728b1 Merge "launch: remove local mode for sshfp records" 2022-12-21 23:09:41 +00:00
Zuul 09a377232a Merge "launch: Automatically do RAX rdns updates when launching nodes" 2022-12-21 23:09:39 +00:00
Ian Wienand 87115f512c launch: add ssh keys to inventory
When bringing up a new server, scan the ssh-keys of the remote IP and
add them automatically to the inventory output.

c.f. I4863425d5b784d0cdf118e1252414ca78fd24179

Change-Id: I2120fd476aa89e207ab76a1fc0faeeb5a0fb55ce
2022-12-01 11:29:36 +11:00
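Scanning keys only helps if a blank scan is treated as a failure; ssh-keyscan can time out quietly. The sketch below runs the scan and raises if the exit status is non-zero or nothing came back. The helper names are hypothetical. `-D`, the SSHFP output mode mentioned elsewhere in this log, is optional here.

```python
from __future__ import annotations

import subprocess


def keyscan_command(host: str, sshfp: bool = False) -> list[str]:
    """Hypothetical helper: plain host keys, or SSHFP records with -D."""
    return ["ssh-keyscan", "-D", host] if sshfp else ["ssh-keyscan", host]


def run_checked(cmd: list[str], timeout: float = 60.0) -> str:
    """Run a scan command and refuse to return silently blank output."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with {result.returncode}: {result.stderr.strip()}")
    if not result.stdout.strip():
        raise RuntimeError(f"{cmd[0]} returned no keys")
    return result.stdout
```

Usage would look like `run_checked(keyscan_command("192.0.2.10"))`, run only after the host is known to be accepting SSH.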
Ian Wienand 8fa64482dd launch: remove local mode for sshfp records
The "non-local" mode was added to this for the old Bionic based bridge
node, whose version of ssh-keyscan didn't have "-D", so we had to
actually log into the remote host to query its keys.

Now that this runs on a Jammy node, we can remove this and just use the
remote probe.  We don't have to worry about compatibility of this
tool, so I've just removed these bits.

Change-Id: Ie8254a965597db5695ff1613fc4ebf8cc26f3a25
2022-12-01 11:29:24 +11:00
Ian Wienand 20d2643f74 launch: Automatically do RAX rdns updates when launching nodes
On the old bridge node we had some unmanaged venvs with a very old,
now unmaintained RAX DNS API interaction tool.

Adding the RDNS entries is fairly straightforward, and this small
tool is mostly a copy of some of the bits for our dns api backup tool.
It really just comes down to getting a token and making a post request
with the name/ip addresses.

When the cloud the node is launched on is identified as RAX, this will
automatically add the PTR records for the IPv4 and IPv6 addresses.  It also
has an entrypoint to be called manually.

This is added and hacked in, along with a config file for the
appropriate account (I have added these details on bridge).

I've left the update of openstack.org DNS entries as a manual
procedure.  Although they could be set automatically with small
updates to the tool (just a different POST) -- details like CNAMEs,
etc. and the relatively few servers we start in the RAX-managed DNS
domains mean I think it's easier to just do it manually via the web UI.
The output comment is updated.

Change-Id: I8a42afdd00be2595ca73819610757ce5d4435d0a
2022-12-01 11:26:32 +11:00
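The RDNS tool comes down to getting a token and making one POST. The sketch builds only the request body. It assumes the shape documented for the Rackspace Cloud DNS v1.0 `rdns` call: a POST to `.../v1.0/<account>/rdns` with an `X-Auth-Token` header, one PTR record per address, and a link naming the server. The function name, TTL default and example URL are hypothetical; this is not the tool's actual code.

```python
from __future__ import annotations


def ptr_request_body(hostname: str, addresses: list[str], server_href: str,
                     ttl: int = 3600) -> dict:
    """Build a reverse-DNS (PTR) request body for a Rackspace server.

    Assumed to match the Cloud DNS v1.0 "rdns" call: one PTR record per
    address, plus a link identifying the Cloud Servers instance.
    """
    return {
        "recordsList": {
            "records": [
                {"name": hostname, "type": "PTR", "data": ip, "ttl": ttl}
                for ip in addresses
            ]
        },
        "link": {
            "content": "",
            "href": server_href,
            "rel": "cloudServersOpenStack",
        },
    }


body = ptr_request_body("mirror01.example.org", ["192.0.2.10", "2001:db8::10"],
                        "https://servers.example.invalid/v2/123/servers/abc")
```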
Jeremy Stanley a31bd1a8fd Improve launch-node deps and fix script bugs
The version of python-cinderclient needs to be constrained to before
the point at which it dropped volume API v2 support (which happened
in 8.0.0). If this is pinned back, latest openstackclient can be
installed and used for Rackspace volume operations without issue.
Make sure we install new enough OpenStackSDK so it doesn't try to
pass unsupported networking options in server create calls to
Rackspace too.

The script itself had a couple of issues once rehomed, the first
being it was looking for Ansible playbooks relative to its former
path in the repository rather than its installed location in the
venv, so make that path configurable but have it default to the
absolute path to those on the bridge now. Also, the script really
wanted to clear the ansible cache, but when that path doesn't exist
(as is currently the case on the bridge), it aborts rather than
continuing, so wrap that call in a try/except.

While we're here, update our default server image from focal to
jammy.

Change-Id: I103c7799ebe319d2d8b3fb626d7804387d3e8a60
2022-11-30 01:53:14 +00:00
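Below is a sketch of the "clear the cache but don't abort if it isn't there" fix. It assumes the cache is a plain directory (the path is whatever the bridge configures; none is assumed here). It catches only the missing-path case rather than every exception, so real errors such as permission problems still surface.

```python
import shutil


def clear_fact_cache(path: str) -> bool:
    """Delete an Ansible fact-cache directory if it exists.

    Returns True when something was removed; a missing directory is not
    an error, so the launch carries on either way.
    """
    try:
        shutil.rmtree(path)
    except FileNotFoundError:
        return False
    return True
```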
Ian Wienand ed7083ed88 launch-node : make into a small package
This turns launch-node into an installable package.  This is not meant
for distribution, we just encapsulate the installation in a virtualenv
on the bastion host.  Small updates to documentation and simple
testing are added (also remove some spaces to make test_bridge.py
consistent).

Change-Id: Ibcb4774114d73600753ca155ed277d775964bc79
2022-11-21 16:29:22 +11:00
Jeremy Stanley 25dc84fecf Update launch-node's default from bionic to focal
We only use Ubuntu 20.04 LTS (Focal) for new servers now.

Change-Id: I357a8c35ff608e43031bef64a58eefca3cd651e4
2021-10-29 16:42:48 +00:00
Clark Boylan 0d7c02f132 Better swap alignment
We ran into this when fixing the zuul02 swap partition. Essentially
parted complained that our alignments weren't optimal. After some
googling the Internet said that using multiples of 8 tends to be safe.
We shifted the offset from 1MB to 8MB to start the partition and the
warnings went away.

Add this change into make_swap.sh to automate this for future servers.

Change-Id: Iad3ef40cf2c1e064482d49bd722c3de4354ec74d
2021-05-17 15:03:05 -07:00
Clark Boylan 5e43926b5e Fix min swap value in make_swap.sh
We just discovered that a number of new servers have rather small swap
sizes. It appears this snuck in via change 782898 which tries to bound
the max swap size to 8GB. Unfortunately the input to parted expects MB
so we make a swap size of 8MB instead of 8GB.

Bump the min value to 8192 to fix this.

Change-Id: I76b5b7dd8ac76c2ecbab9064bcdf956394b3a770
2021-05-14 14:09:11 -07:00
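make_swap.sh is a shell script. The sketch below transcribes only the sizing arithmetic from the swap changes into Python: an 8 MiB start offset and a cap of 8192 in MiB, not 8. It assumes swap is sized to match RAM before capping, which is an assumption, not something stated here. The function names and device path are hypothetical. Writing explicit `MiB` suffixes for parted makes the unit impossible to misread.

```python
from __future__ import annotations

SWAP_START_MIB = 8          # start at 8 MiB rather than 1 MiB (alignment)
SWAP_MAX_MIB = 8 * 1024     # 8 GiB cap, expressed in MiB: 8192, not 8


def swap_partition(mem_mib: int) -> tuple[int, int]:
    """Return (start, end) in MiB for a swap partition sized like RAM, capped."""
    size = min(mem_mib, SWAP_MAX_MIB)
    return SWAP_START_MIB, SWAP_START_MIB + size


def parted_args(device: str, mem_mib: int) -> list[str]:
    """argv for creating the swap partition with explicit MiB units."""
    start, end = swap_partition(mem_mib)
    return ["parted", "--script", device, "mkpart", "primary", "linux-swap",
            f"{start}MiB", f"{end}MiB"]


print(parted_args("/dev/xvde", 65536))
```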
Clark Boylan bc82cc3e90 Handle focal's insistence we don't use root in launch-node.py
It seems newer focal images force you to use the ubuntu user account. We
already have code that attempts to fallback to ubuntu, but we were not
properly catching the error when root fails which caused the whole
script to fail.

Address this by catching the exception, logging a message, then
continuing to the next possible option. If no possible options are found
we raise an exception already which handles the worst case situation.

Change-Id: Ie6013763daff01063840abce193050b33120a7a2
2021-04-21 16:51:46 -07:00
Ian Wienand 2e629bfb96 launch-node : cap to 8gb swap
If you're donated a really nice, big server from a friendly provider
like Vexxhost, you need to cap the amount of swap you make or you fill
up the entire root disk.

Change-Id: Ide965f7df8db84a6bbfe3294c9c5b85f0dd7367f
2021-03-25 16:34:15 +11:00
Clark Boylan 5f4b5000c8 Fix sshfp record printing
Previously if you ran `sshfp.py foo.opendev.org x.y.z.a` it would spit
out records that look like:

  foo.opendev.org IN SSHFP 1 1 stuffstuffstuff

The problem with this is when you copy this output into the zone file
the lack of a terminating '.' means the record will actually be for
foo.opendev.org.opendev.org.

We address this by splitting on '.' and taking the first element. This
will still be broken for hosts named foo.bar.opendev.org but for now is
a decent improvement.

Change-Id: Ib12f66c30e20a62d14d0d0ddd485e28f7f7ab518
2021-03-05 12:18:13 -08:00
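Two ways to avoid the doubled suffix are sketched below, with hypothetical helper names. One strips the zone suffix instead of keeping only the first label, which also handles names like foo.bar.opendev.org. The other emits an absolute name with a trailing dot, so the zone origin is not appended.

```python
def zone_relative_name(fqdn: str, zone: str) -> str:
    """Owner name to write in `zone`'s file for the host `fqdn`."""
    fqdn = fqdn.rstrip(".")
    zone = zone.rstrip(".")
    if fqdn == zone:
        return "@"  # the zone apex
    suffix = "." + zone
    if not fqdn.endswith(suffix):
        raise ValueError(f"{fqdn} is not in zone {zone}")
    return fqdn[: -len(suffix)]


def absolute_name(fqdn: str) -> str:
    """A trailing dot stops the zone origin from being appended."""
    return fqdn.rstrip(".") + "."


print(zone_relative_name("foo.bar.opendev.org", "opendev.org"))  # foo.bar
```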
Zuul 6fc894b26b Merge "Wait for ipv6 addrs when launching nodes" 2020-09-22 19:39:14 +00:00
Zuul eabd2e3aac Merge "launch-node: get sshfp entries from the host" 2020-09-22 19:39:12 +00:00
Clark Boylan 2f9b31a93f Wait for ipv6 addrs when launching nodes
When launching new nodes with launch-node.py we need to wait for ipv6
addresses to configure prior to running ping6 sanity checks. The reason
for this is some clouds rely on router advertisements to configure ipv6
addrs on VMs. These happen periodically and the VM may not have its ipv6
address configured yet when we try to ping6 otherwise.

Change-Id: I77515fec481e4146765630cd230dd3c2c296958f
2020-09-04 14:25:59 -07:00
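One way to express the "wait until IPv6 is configured" condition is sketched below. It assumes you can list the VM's addresses, for example parsed from `ip -6 addr` output. A link-local fe80::/10 address exists as soon as the interface is up, so only a non-link-local, non-loopback IPv6 address counts; that is the kind a router advertisement produces later. The function name is hypothetical.

```python
from __future__ import annotations

import ipaddress


def has_routable_ipv6(addresses: list[str]) -> bool:
    """True once an IPv6 address other than link-local/loopback is present.

    Accepts bare addresses or CIDR notation such as "2001:db8::5/64".
    """
    for addr in addresses:
        ip = ipaddress.ip_address(addr.split("/")[0])
        if ip.version == 6 and not (ip.is_link_local or ip.is_loopback):
            return True
    return False
```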
Ian Wienand 96dbd1a34e launch: move old scripts out of top-level
These don't make any sense in the top-level these days.

Once upon a time we used to use these as node scripts to bring up
testing nodes (I think).  The important thing is they're not used now.

Change-Id: Iffa6c6bee647f1a242e9e71241d829c813f2a3e7
2020-09-03 09:55:42 +10:00
Ian Wienand e819c26cad launch-node: get sshfp entries from the host
It turns out bionic ssh-keyscan doesn't have the "-D" option to produce the
sshfp records; switch to logging in and getting these via "ssh-keygen
-r" on the host.

Change-Id: Icb6efd7c4fd9623af24e58c69f8a188a4c1fb4c9
2020-08-20 15:10:01 +10:00
Ian Wienand 6494ed0275 Add OE mirror to inventory
This has been restarted.  While we're here, fix the path to the
inventory in the launch output.

Change-Id: I4d78d9eb2ee365e47850c68c36d475a468dc6064
2020-08-06 09:42:39 +10:00
Ian Wienand 3cbb877d43 launch-node : add sshfp records
Add a tool to scan a host and generate the sshfp records to go into
dns.  Hook this into the DNS print out from the node launcher.

Change-Id: I686287c3c081debeb6a230e2a3e7b48e5720c65a
2020-08-04 01:04:37 +00:00
Zuul 43f4121f4c Merge "Change launch scripts to python3 shebangs" 2020-06-15 21:14:26 +00:00
Monty Taylor 3ffeba5a20 Fix launch-node to work with the new inventory reorg
We moved some of these, but didn't catch the spots in launch-node
where we reference them.

Change-Id: I5939fc0c3cc5f49a99d99f91bca12186a5be2652
2020-06-11 17:22:23 -05:00
Clark Boylan 9da01bfbeb Change launch scripts to python3 shebangs
As part of our audit to find out what needs to be ported from python2 to
python3 I've discovered that launch-node is already all python3 (because
it runs on bridge) but the shebangs still pointed to `python`. Update
them to reduce confusion while we do the audit and potentially
uplift/port things.

Change-Id: I9a4c9397a1bc9a8b39c60b92ce58c77c0cb3f7f0
2020-06-08 16:05:11 -07:00
Ian Wienand 9f9ef66451 Use ipv4 in server launch inventory output
Follow-on to switch to ipv4 by default with
Ifea55a923453c4c2af20151512fa95549342d1f5.

Change-Id: I3b028b0ba3e8482421e8fd4c77e0a0d0f6b0be27
2020-05-22 09:50:30 +10:00
Monty Taylor ebae022d07 Use project-config from zuul instead of direct clones
We use project-config for gerrit, gitea and nodepool config. That's
cool, because we can clone that from zuul too and make sure that each
prod run we're doing runs with the contents of the patch in question.

Introduce a flag file that can be touched in /home/zuulcd that will
block zuul from running prod playbooks. By default, if the file is
there, zuul will wait for an hour before giving up.

Rename zuulcd to zuul

To better align prod and test, name the zuul user zuul.

Change-Id: I83c38c9c430218059579f3763e02d6b9f40c7b89
2020-04-15 12:29:33 -05:00
Ian Wienand c54efaeeaa launch-node.py : use new(?) image name
The "PVHVM" image appears to have disappeared from RAX, replaced with
a "Cloud" image.

Maybe I haven't looked in the right place, but I can't find any info
on if, why or when this was updated.  But I started a server with the
"Cloud" image and it seems the same as the PVHVM image to me; hdparm
showed read speeds the same as an older server and dd writes to a file
were the same speed (recorded below for posterity).

 ianw@nb04:~$ dd if=/dev/zero of=testfile bs=1G count=1 oflag=direct
 1+0 records in
 1+0 records out
 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5.21766 s, 206 MB/s

 ianw@nb04:~$ sudo hdparm -Tt /dev/xvda
 /dev/xvda:
 Timing cached reads:   16428 MB in  1.99 seconds = 8263.05 MB/sec
 Timing buffered disk reads: 752 MB in  3.00 seconds = 250.65 MB/sec

From looking at dmesg it has

 [    0.000000] DMI: Xen HVM domU, BIOS 4.1.5 11/28/2013
 [    0.000000] Hypervisor detected: Xen HVM
 [    0.000000] Xen version 4.1.
 [    0.000000] Xen Platform PCI: I/O protocol version 1
 [    0.000000] Netfront and the Xen platform PCI driver have been compiled for this kernel: unplug emulated NICs.
 [    0.000000] Blkfront and the Xen platform PCI driver have been compiled for this kernel: unplug emulated disks.

which, if [1] is anything to go by, suggests it is in PVHVM mode
anyway.

tl;dr seems like the image name changed.

[1] https://xen-orchestra.com/blog/debian-pvhvm-vs-pv/

Change-Id: I4ff14e7e36f59a9487c32fdc6940e8b8a93459e6
2020-03-18 16:54:44 +11:00
Zuul a514f098ac Merge "launch-node.py : make sure new inventory comes last" 2020-02-20 22:37:59 +00:00
Monty Taylor 99a52a9c52 Make small tweaks to launch node README
First of all, we're using RST syntax, so rename it to README.rst.

More importantly, remove mentions of puppetmaster - and puppet in
general, as they are distracting. When reading the file, my eyes
scanned and hit puppetmaster and I almost skipped the section with the
assumption it was out of date.

Change-Id: I294bf17084be7dad46e075ad2a3ef2674276c018
2020-02-12 08:42:32 -06:00
Ian Wienand 8980905319 launch-node.py : make sure new inventory comes last
If you happen to be booting a replacement host, you don't want ansible
to pick up the current host from the current inventory.  Put the new
server's inventory last in the list so it overrides any before it.

Change-Id: I3f1edfb95924dae0256f969bc740f1141e291c25
2020-02-07 14:09:00 +11:00
Zuul b0ea150b89 Merge "Correct emergency file reference in launch script" 2019-07-31 23:22:38 +00:00
Clark Boylan b1de301261 Use public_v4 addr when ignoring ipv6
In our launch node script we have the option to ignore ipv6 to deal with
clouds like ovh that report an ipv6 address but don't actually provide
that data to the instance so it cannot configure ipv6. When we ignore
ipv6 we should not try to use the ipv6 address at all.

Use the public_v4 address in this case when writing out an ansible
inventory to run the base.yaml playbook when launching the node.
Otherwise we could use ipv6 which doesn't work.

Change-Id: I2ce5cc0db9852d3426828cf88965819f88b3ebd5
2019-07-30 15:00:53 -07:00
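Below is a sketch of the address choice for the generated inventory as this change describes it: never hand Ansible the IPv6 address when IPv6 is being ignored. A later change in this log makes IPv4 the default anyway. The function name and arguments are hypothetical.

```python
from __future__ import annotations


def ansible_host(public_v4: str | None, public_v6: str | None,
                 ignore_ipv6: bool) -> str:
    """Pick the address to write into the launch-time inventory."""
    if ignore_ipv6 or not public_v6:
        if not public_v4:
            raise ValueError("no usable address for the new server")
        return public_v4
    return public_v6
```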
Jeremy Stanley 4c04ad5436 Correct emergency file reference in launch script
The launch script is referring to the wrong path for the emergency
inventory. Also correct the references in the sysadmin guide and
update the example for using it.

Change-Id: I80bdbd440ec451bcd6fb1a3eb552ffda32407c44
2019-07-26 14:55:32 +00:00
Ian Wienand f673b71466 launch-node.py : add option to skip ipv6 address checks
As noted inline, this needs to be skipped on OVH (and I always forget,
and debug this over and over when launching a mirror node there :).

Change-Id: I07780e29f5fef75cdbab3b504f278387ddc4b13f
2019-06-26 18:28:28 +10:00
Clark Boylan 4e9fab65b7 Check spamhaus pbl when launching new servers
Add a reminder to launch node script to check the spamhaus pbl when
launching a new server.

Change-Id: I1daaccfb0b90fb46b29c035f8f4fd5788dffe627
2019-06-11 13:12:44 -07:00
Andreas Jaeger 15a5806bce Follow opendev renames
The sandbox repos moved from openstack-dev to opendev, the
zone-opendev.org and zone-zuul-ci.org as well.

Follow the rename in this repo.

Depends-On: https://review.opendev.org/657277
Change-Id: I31097568e8791cc49c623fc751bcc575268ad148
2019-05-30 16:00:30 +02:00
Ian Wienand d86d1d8796 launch.py : fix typo calling legacy dns print function
Change-Id: Ia33c93320497adeffd3ea4e812f11115a6570f28
2019-05-20 13:37:07 +10:00
Ian Wienand 87d2cea6a7 launch.py: Fix inventory list
This was introduced with Ia67e65d25a1d961b619aa445303015fd577dee57

Passing "-i file1,file2,file.." makes Ansible think that the inventory
argument is a list of hostnames.  Separate out the "-i" flags so it
reads each file as desired.

Change-Id: I92c9a74de6552968da6c919074d84f2911faf4d4
2019-05-20 13:09:40 +10:00
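Below is a sketch of an ansible-playbook invocation consistent with the inventory-related changes in this log. It uses one `-i` per source, because a single comma-joined `-i` value is read as a host list. The freshly written inventory goes last so it overrides earlier sources. It also passes `--flush-cache` and limits the run with `-l`. The function name and example paths are hypothetical.

```python
from __future__ import annotations


def playbook_command(playbook: str, host: str, inventories: list[str]) -> list[str]:
    """Build an ansible-playbook argv with one -i per inventory source.

    Put the new server's inventory last in `inventories`; later sources
    take precedence over earlier ones.
    """
    cmd = ["ansible-playbook", "--flush-cache"]
    for inv in inventories:
        cmd += ["-i", inv]  # never "-i a,b": that is parsed as a host list
    cmd += ["-l", host, playbook]
    return cmd


print(playbook_command("base.yaml", "new01.example.org",
                       ["/etc/ansible/hosts", "/tmp/launch/inventory.yaml"]))
```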
Ian Wienand 86d0d78255 Add --flush-cache to launch.py ansible
I managed to leave off the "--image" flag for a Xenial host, so the
script created a Bionic host by default.  I let that play out, deleted
the host and tried again with the correct image, but what ended up
happening was the fact cache thought this new host was Bionic, and
several ansible roles therefore ran thinking this too, and we ended up
with a bad Xenial/Bionic mashup.

Clear the cache on node launch to avoid this sort of thing again.

I have launched a node with this new option, and it worked.

Change-Id: Ie37f562402bed3846f27fbdd4441b5f4dcec7eb2
2019-03-19 17:09:41 +11:00
Monty Taylor eb6c3c2f1a Add global inventory to launch_node
Passing the -i to the jobdir means we're overriding the inventory.
This means variables that come from the /etc/ansible vars, like
sysadmins, are missing.

Add the global inventory to the command line for ansible-playbook.
We have --limit specified from '-l' - so we should still only run
on the host in question.

Change-Id: Ia67e65d25a1d961b619aa445303015fd577dee57
2019-03-08 17:53:28 +00:00
Monty Taylor e4c4d108f5 Print yaml inventory instructions
We need to also add servers to the inventory. Print a snippet to
add.

Change-Id: I630cc9f68b570b517eba81f23b603d84a019b20a
2019-02-28 18:21:20 +00:00
Monty Taylor ecbe164bae Clean up boot-from-volume volumes on error
When we're booting boot-from-volume servers and there are errors,
we leave the root volume around. Clean up after ourselves.

Change-Id: I6341cdbf21d659d043592f92ddf8ecf6be997802
2019-02-28 17:20:21 +00:00
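Below is a sketch of the cleanup pattern. The three callables are hypothetical stand-ins for the real cloud calls (for example via openstacksdk): create the root volume, boot from it, and delete it again if the boot raises. The original error is re-raised afterwards.

```python
from typing import Callable


def boot_with_cleanup(create_volume: Callable[[], str],
                      boot_server: Callable[[str], str],
                      delete_volume: Callable[[str], None]) -> str:
    """Boot from a fresh root volume, deleting the volume if the boot fails."""
    volume_id = create_volume()
    try:
        return boot_server(volume_id)
    except Exception:
        delete_volume(volume_id)  # don't leak the root volume
        raise
```

Re-raising keeps the launch failure visible to the operator; the cleanup only prevents orphaned volumes from accumulating.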