Commit Graph

69 Commits

Clark Boylan c12ac287cc Remove unused nl0X.openstack.org config files
Once the opendev launchers are handling these duties and these servers
have all been removed from the system-config inventory we can go ahead
and land this change to clean up the unused config files.

Change-Id: I9792620eea81a07b6cbbfee37c08807114d2b390
2021-03-16 14:50:53 -07:00
Clark Boylan 2bc6b1e5d9 Flip nl02-04.openstack.org to nl02-04.opendev.org
Once these new servers are up and running in a happy idle state we are
clear to flip the configs around so the new focal servers take over node
provisioning duties. This change makes that happen.

Change-Id: I6ad57218805e28b555e1e3a0dc959ee4f00428cc
2021-03-16 14:49:40 -07:00
Carlos Goncalves ab394b20c7 Add nested-virt-centos-8-stream label
Change [1] added nested virtualization labels for Ubuntu Bionic and
CentOS 7, and change [2] for CentOS 8. This patch extends that to CentOS
8 Stream.

[1] https://review.opendev.org/#/c/683431/
[2] https://review.opendev.org/#/c/738161/

Change-Id: Ie8b532184d0cddad63e876a30ef521103dc11b84
2020-12-16 09:29:41 +01:00
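For reference, a nodepool label like this is declared once in the top-level labels list and then attached to a provider pool. A minimal sketch, assuming illustrative provider, flavor, and diskimage names (the actual OpenDev config may differ in detail):

```yaml
# Hypothetical nodepool.yaml fragment; the provider, flavor, and
# diskimage names are illustrative, not the exact OpenDev values.
labels:
  - name: nested-virt-centos-8-stream
    min-ready: 0

providers:
  - name: example-nested-virt-provider   # e.g. an OVH region
    pools:
      - name: main
        labels:
          - name: nested-virt-centos-8-stream
            diskimage: centos-8-stream
            flavor-name: nested-virt-flavor   # assumed flavor name
```

Jobs then request the label by name in their nodeset, and the launcher only satisfies it from pools that list it.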
Carlos Goncalves 374b24a4bd CentOS 8 Stream initial deployment
Change-Id: I0b04d92de6287b78bc422dd177403570eb18294a
2020-10-03 10:38:34 +00:00
Clark Boylan 45e8b6e3de Remove fedora-30 from nodepool launchers
This serves as a sanity check that we don't have any fedora-30 usage
hiding somewhere. If this goes in safely then we can remove the image
from the builders.

Change-Id: I09b21e812081f5855a069ca8ab1eedadf090c1b8
2020-09-25 12:03:26 -07:00
Lee Yarwood 91da9f92f6 Add Fedora 32 builds
Change-Id: I0acf6c581cd22a8835928ef643d4237f66ca6181
2020-09-10 13:45:08 +10:00
Artom Lifshitz 357e2ef12e Add nested-virt-ubuntu-focal label
Ubuntu Focal has a newer libvirt version than Bionic (6.0.0 vs 4.0.0).
By adding a Focal-flavored nested-virt label, features made possible
by a more recent libvirt version can be tested in the gate.
Specifically, whitebox-tempest-plugin tests Nova's hw_video_type image
property. Support for the 'none' value was added in libvirt 4.6.0.

Change-Id: Id48fff64d13c258d9f22908debfad86c5f089bf5
Needed-by: https://review.opendev.org/#/c/742014/
2020-07-27 09:02:45 -04:00
Carlos Goncalves 2cd18a7772 Add nested-virt-centos-8 label
Change [1] added nested virtualization labels for Ubuntu Bionic and
CentOS 7. This patch extends that to CentOS 8.

Additionally, we extend nl04 to include these labels too, as OVH is a
nested-virt-enabled nodepool provider.

[1] https://review.opendev.org/#/c/683431/

Change-Id: Ibf5ac5fa0371cc70dbe58806d147568278afcfea
2020-06-26 11:13:07 +02:00
Clark Boylan 60f352bfce Use infra-root-keys-2020-05-13 in nodepool
This should only be landed once we've landed the dependency and
confirmed all clouds have the new key value.

This does our semi-regular key rotation.

Depends-On: https://review.opendev.org/727865
Change-Id: Ic55c96ad5dd867b70fa52c396e792d5a2e2e0470
2020-05-18 08:52:19 -07:00
Jeremy Stanley 53d810d8da Revert "Temporarily disable OVH"
New mirror servers have been built, so turn our utilization of these
regions back on again.

This reverts commit 37d292ee74.

Change-Id: Id86b578cec163e264c93fbbbda32a2cc4603492a
2020-05-12 19:19:57 +00:00
James E. Blair 37d292ee74 Temporarily disable OVH
We have lost contact with our mirrors and seem unable to access
our account.

Change-Id: If29ca8a06759a3871a46826caaed1d56dda86603
2020-05-11 06:36:43 -07:00
Dr. Jens Harbott d14b65cf46 Launch focal nodes
Focal images were built with [0] and the result looks successful, so
let's start launching them.

[0] https://review.opendev.org/720719

Change-Id: I2b825178df230d13d75e782c60dd247e6d65ac8b
2020-04-26 11:56:28 +00:00
Ian Wienand 378161469b Add Fedora 31 to launchers
Images are built and seem ready to go.

Change-Id: I9677af7e15d8b6c561f9f2dc5ed968a6708dd93e
2020-04-03 08:34:52 +11:00
Andreas Jaeger dc09dffbe2 Remove Fedora 29 from nodepool
All jobs using Fedora 29 have been removed, so we can now remove it
from nodepool and thus from OpenDev.

Depends-On: https://review.opendev.org/711969
Change-Id: I75c0713d164c29a47db9a0cdfc43fadb370e81f8
2020-03-10 10:50:37 +01:00
Andreas Jaeger 398adb791f Bye, Bye, Trusty
This removes trusty from the repo and thus from OpenDev.

Afterwards the AFS volume mirror.wheel.trustyx64 can be deleted.

Depends-On: https://review.opendev.org/702771
Depends-On: https://review.opendev.org/702818
Change-Id: I3fa4c26b0c8aeacf1af76f9046ea98edb2fcdbd0
2020-01-19 16:00:55 +01:00
Clark Boylan 88dcfdb488 Remove opensuse-150
The opensuse-150 image is being removed as the 15.0 release is EOL.
Similar to CentOS, the expectation is that users keep up to date with
minor releases. For this we have the opensuse-15 image which should be
used instead.

Depends-On: https://review.opendev.org/#/c/682844/
Change-Id: I8db99f8f2fd4b1b7b9a5e06148ca2dc185ed682b
2019-11-12 12:06:49 -08:00
Ian Wienand b4daf4e47b Remove fedora-28 nodes
The time has come to retire fedora-28; remove from configuration.

Change-Id: Ic0b4b065a217dcfaa8c230cda53114793e93b803
2019-10-23 09:38:40 +11:00
James E. Blair 19380c9b59 Remove opensuse-423 from nodepool
This label/image is no longer supported.

Change-Id: Ic48d7db4c6b7b1dd2589118c630f45deee19730f
2019-10-16 07:50:22 +02:00
Ian Wienand 0a13b1dd92 CentOS 8 initial deployment
This is for initial build and debug purposes.

Change-Id: I5bbb83bf313f13c216ca3e031d7a8e3dc8ba7cc7
2019-10-09 17:02:01 +11:00
Ian Wienand e38efe3a84 Add Fedora 30 nodes
Fedora 30 is supported with diskimage-builder's recently tagged
2.27.0 release.

Change-Id: Ic0316a0e755b67181ae74c06f712e351ea485de7
2019-09-09 16:56:02 +10:00
Arnaud Morin e2100145fe Reduce OVH BHS1 pool size to 120
OVH is having issues on some hypervisors in the OSF aggregate.
As a result, the other hypervisors are under heavy load and some
instances are not able to boot correctly.
To avoid this, I propose to reduce the number of instances to 120 for
a while.

Change-Id: Ic5f4b279e7222e9ec242aeb80e69612d2e6ef70f
Signed-off-by: Arnaud Morin <arnaud.morin@gmail.com>
2019-06-20 10:37:27 +02:00
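In nodepool, this kind of reduction is a one-line change to the pool's max-servers cap, which bounds how many instances the launcher will keep running in that provider. A sketch, with the surrounding keys assumed and most of them omitted:

```yaml
# Illustrative fragment only; other required provider settings
# (cloud credentials, diskimages, labels, ...) are omitted.
providers:
  - name: ovh-bhs1
    region-name: BHS1
    pools:
      - name: main
        max-servers: 120   # lowered while OVH investigates the hypervisors
```

Because the cap is per pool, lowering it throttles only this region; demand shifts to the other providers.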
Dirk Mueller eb81a56ac1 Add openSUSE 15.1 to the nodepool building as opensuse-15
We can do this cleanup now because of a policy change with openSUSE
Leap 15: minor releases like 15.1 and 15.2 are backwards compatible,
much like minor releases in CentOS 7.x. As such, we can build an
opensuse-15 image in the CI and update all jobs to use it, reducing
the ongoing effort of maintaining the openSUSE builds.

Depends-On: https://review.opendev.org/#/c/660137/

Change-Id: I2b1f21fb6e01558c8cee27de116dfc857a1a1c91
2019-06-14 17:08:40 +02:00
Arnaud Morin 56a21b2622 Remove glean config for OVH nodes
OVH updated its infra in order to have correct network_data.json given
by metadata API and / or config drive.

So the glean override for this is no longer needed.

Change-Id: Id97aceb78019b7b71bc231778d7ea7e0f3964e0d
Signed-off-by: Arnaud Morin <arnaud.morin@corp.ovh.com>
2019-06-11 16:55:28 +02:00
Thomas Goirand 7bd7b5f044 Add a Debian Buster image.
Change-Id: Ia55d240fa8ce1d19ef69608e68c47e02776f15ca
2019-05-21 13:47:04 +02:00
Andreas Jaeger 6eedb831d3 Revert "Revert "Revert "nodepool: pause ovh"""
This reverts commit 9467a1e51b.

With a different nameserver in use, everything should be fine again.

Change-Id: Icd388dd5b96526c10bd4452a2c1d9f83f656edc6
2019-04-26 16:28:09 +00:00
Jeremy Stanley 9467a1e51b Revert "Revert "nodepool: pause ovh""
This reverts commit 1911815832.

Seems the DNS lookup failures are continuing, based on recent job
logs.

Change-Id: I55690b005eb1a393041f93f2512c783f59bec6d2
2019-04-24 19:30:13 +00:00
Mohammed Naser 1911815832 Revert "nodepool: pause ovh"
This reverts commit 8d8755bb4b1fdc6566cb4d39c524a0229685c9db.

Change-Id: Ic9a5ea87bb8eefeb4217a739de063a1a5a4f1990
2019-04-18 19:00:18 -04:00
Mohammed Naser 054c0a0712 nodepool: pause ovh
OVH seems to be seeing network issues, such as Ansible timing out
trying to connect to machines and VMs failing to reach 1.1.1.1 to
resolve hosts such as git.openstack.org on node startup.

Change-Id: Id4af1ec98899afd1f2e55ad7b7bd397ceca43a62
2019-04-18 19:00:17 -04:00
Ian Wienand 1fdd6945d6 Revert "Add CentOS NetworkManager testing node"
This reverts commit 32e63aa0c8 (and
the small follow-on fix 0eeb4395d1).

The base CentOS node has been switched to NetworkManager support.

Change-Id: Ic254273afdf0637194b608b781ea9e3ff4bd73a3
2019-01-14 09:42:59 +11:00
Arnaud Morin 62e10ef2b0 Enable back GRA1 on OVH cloud
We updated the kernel on the aggregate, so we don't have the memory
leak issue anymore.
We can safely re-enable GRA1 in nodepool.

Change-Id: Ie1d4e188c352d427e2e2113daedc38c1eea2e92a
Signed-off-by: Arnaud Morin <arnaud.morin@corp.ovh.com>
2019-01-11 11:18:44 +01:00
Ian Wienand 32e63aa0c8 Add CentOS NetworkManager testing node
This enables NetworkManager control of interfaces on a new centos7-nm
node type.  This is intended only for short-term initial testing.

Change-Id: I43318f33d206c28e1f06ac7a8f07c3fb8c8f0626
2019-01-10 10:03:31 +11:00
Ian Wienand 1c650ef2c7 Add Fedora 29 nodes

Depends-On: https://review.openstack.org/618671
Change-Id: Icea5721c295d31a7efc953bd71fa914727c56d08
2019-01-09 15:53:57 +11:00
Arnaud Morin feda25de9d Set OVH GRA1 region in maintenance mode
I recently applied a new kernel on BHS1; if everything is fine with
that, I propose to apply the same one to GRA1, which will help fix
some timeout errors.

Change-Id: I489f8b84871c18f2dad079cae5b53fb1a504f1bd
Signed-off-by: Arnaud Morin <arnaud.morin@corp.ovh.com>
2018-12-20 08:29:14 +01:00
Clark Boylan 7942c19f22 Use OVH BHS1 again
Set ovh-bhs1 max-servers to 150. OVH (thank you amorin) have debugged
and corrected a memory leak there that we believe to be the cause of the
test node slowness.

Frickler and I have run fio tests on VMs running on each hypervisor in
the region and they look happy. We've also run spot tests of devstack
and tempest which also appear happy.

Change-Id: If6fd5a6194a9996e8b031f74918f373dc7bbe758
2018-12-18 07:59:16 -08:00
Jens Harbott 55d145c34e Disable ovh bhs1
We are seeing excessive job timeouts in this region[0]; disable it
until we can get more stable turnaround again.

[0] https://ethercalc.openstack.org/jg8f4p7jow5o

Change-Id: I7969cca2cdd99526294a4bf7a0f44f059823dae7
2018-12-07 14:06:27 +00:00
Clark Boylan a5088837e2 Halve bhs1 max-servers value
We are debugging slow nodes in bhs1. Looking at dstat data we clearly
have some jobs that end up spending a lot of cpu time in sys and wai
columns while other similar jobs do not.

One thought was that this is due to an unhappy hypervisor or two, but
amorin has dug in and found that these slow jobs run on multiple unique
hypervisors implying that isn't likely.

My next thought is that we are our own noisy neighbors. Reducing the
max-servers should improve things if we are indeed our own noisy
neighbors.

Change-Id: Idd7804778a141d38da38b739294c6c6a62016053
2018-12-06 14:04:47 -08:00
Arnaud Morin 7671cc88f5 Slightly reduce the number of instances on BHS1
I'd like to isolate one host from the aggregate, but to do that
properly it's better to reduce the number of instances nodepool is
trying to boot; this will avoid useless "no valid host found" errors.

Change-Id: Iddbfba1c3093e9f128c41db91d6b5b3e1d467ce8
Signed-off-by: Arnaud Morin <arnaud.morin@corp.ovh.com>
2018-12-05 09:01:56 +01:00
Jeremy Stanley dfda58e203 Revert "Temporarily disable ovh-bhs1 in nodepool"
This reverts commit 3f40af4296.

Can be approved once the slow disk performance in this region is
resolved.

Change-Id: Idda585116ae9dc09b55f6794ab5ee7bda47f455a
2018-11-30 17:38:54 +00:00
Jeremy Stanley 3f40af4296 Temporarily disable ovh-bhs1 in nodepool
We've gotten reports of frequent slow job runs in the BHS1 region
leading to job timeouts. Further investigation indicates these
instances top out around ~10-15MB/sec for contiguous writes to their
rootfs while instances booted from the same image and flavor in GRA1
see 250MB/sec or better with the same write patterns. Disable BHS1
in nodepool for now while we work with OVH staff to see if they can
determine the root cause.

Change-Id: I8b9a79b64dd7da6d3a33f24797ca597bd2426c86
2018-11-30 17:33:50 +00:00
Jeremy Stanley 970987e3ce Revert "Halve ovh-bhs1 max-servers temporarily"
This reverts commit 521d1ceafe. Merge
once testing of the CPU contention theory has concluded.

Change-Id: Ia15f6f943bab530e8b6fd96a2c57d091d60e3193
2018-11-23 15:30:52 +00:00
Jeremy Stanley 521d1ceafe Halve ovh-bhs1 max-servers temporarily
We've gotten reports of frequent slow job runs in the BHS1 region
leading to job timeouts and OVH staff have confirmed we're running a
CPU oversubscription ratio of 2:1 there, so try dropping our
utilization by half to confirm whether this could be due to CPU
contention during peak load.

Change-Id: If7e5f3c0dec71813f5bcb974a0217dc031801115
2018-11-23 15:25:10 +00:00
Clark Boylan 57eaa73695 Switch nodepool launchers to use new zk cluster
This should happen at the same time as we switch the zuul scheduler over
to the new zk cluster and after the nodepool builders have populated
image data on the new zk cluster.

This gets us off the old nodepool.o.o server and onto newer HA cluster.

Change-Id: I9cea03f726d4acb21ad5584f8db7a4d15bc556db
2018-10-22 09:23:12 -07:00
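For context, the launchers locate ZooKeeper through a zookeeper-servers list in nodepool.yaml, so cutting over to the new HA cluster amounts to swapping that list. A sketch with assumed hostnames (the real member names and any chroot are not shown in this change):

```yaml
# Illustrative nodepool.yaml fragment; the hostnames below are
# assumptions, not the actual OpenDev cluster members.
zookeeper-servers:
  - host: zk01.openstack.org   # hypothetical new cluster member
    port: 2181
  - host: zk02.openstack.org
    port: 2181
  - host: zk03.openstack.org
    port: 2181
```

Listing all members lets the client fail over if one node of the HA cluster is down.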
Ian Wienand c64c3d6f0f Restore full OVH-GRA1 quota
This is a follow-on to Id01f85fcee150f9360f508b09003a8d0043155bd to
restore the full quota.

Change-Id: Iec483a37f711f12fbb8ae6fe3299aabe4f621ac4
2018-10-19 16:01:07 +02:00
Ian Wienand 529b912c80 Revert "Disable ovh-gra1"
This partially reverts commit
bfdd3e6a42.

After fruitful discussions with amorin in IRC, we have nodes working
again in this region.  This applies a small load for us to monitor
for a while.  A follow-on will do a full revert so we don't forget.

Story: #2004090
Task: #27492
Change-Id: Id01f85fcee150f9360f508b09003a8d0043155bd
2018-10-18 09:41:14 +00:00
Ian Wienand bfdd3e6a42 Disable ovh-gra1
As described in the story/task, this region is currently not working.

Change-Id: Ief7b68b45537e7fc8791905d3039d35942636368
Story: #2004090
Task: #27492
2018-10-16 17:34:09 +11:00
Clark Boylan 9a3fc0c1e2 Revert "Disable OVH BHS1 region"
This reverts commit 19e7cf09d9.

The issues in OVH BHS1 around networking configuration have been worked
around with updates to glean and configuration to the labels in zuul.
New images are in place for each supported image in BHS1. We can go
ahead and start using this region again.

I have manually tested this by booting an ubuntu-xenial node with
glean_ignore_interfaces='True' set in metadata, and the networking comes
up as expected using DHCP. The mirror in that region is reachable from
this test node.

Change-Id: I29746686217a62709c4afc6656d95829ace6fb3b
2018-09-25 14:01:27 -07:00
Clark Boylan 22fb41c763 Glean config on OVH nodes
Instruct glean via metadata properties to ignore the config drive
network_data.json interface data on OVH and instead fall back to DHCP.
This is necessary because post upgrade OVH config drive
network_data.json provides inaccurate network configuration details and
DHCP is actually what is needed there for working l2 networking.

Change-Id: I51f16d34a96ee8d964e8b540ce5113a662a56f6d
2018-09-25 09:28:03 -07:00
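Nodepool can pass such metadata to instances via a pool-level instance-properties mapping, which becomes server metadata that glean reads from the metadata API or config drive. A minimal sketch, with the pool layout assumed:

```yaml
# Illustrative fragment; the pool layout is an assumption. The
# instance-properties dict is attached to every server booted from
# this pool as instance metadata, where glean picks it up.
providers:
  - name: ovh-bhs1
    pools:
      - name: main
        instance-properties:
          glean_ignore_interfaces: 'True'   # skip network_data.json, use DHCP
```

Setting it per provider pool means only OVH nodes get the workaround; other clouds keep using their config drive network data.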
Ian Wienand 19e7cf09d9 Disable OVH BHS1 region
This reverts commit 756a8f43f7, which
was where we re-enabled OVH BHS1 after maintenance.  I strongly
suspect that this has something to do with the issues ...

It appears that VMs in BHS1 cannot communicate with the mirror.

From a sample host 158.69.64.62 to mirror01.bhs1.ovh.openstack.org

---
 root@ubuntu-bionic-ovh-bhs1-0002154210:~# ip addr
 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
 2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:1b:4b:32 brd ff:ff:ff:ff:ff:ff
    inet 158.69.64.62/19 brd 158.69.95.255 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe1b:4b32/64 scope link
       valid_lft forever preferred_lft forever

 root@ubuntu-bionic-ovh-bhs1-0002154210:~# traceroute -n mirror01.bhs1.ovh.openstack.org
 traceroute to mirror01.bhs1.ovh.openstack.org (158.69.80.87), 30 hops max, 60 byte packets
  1  158.69.64.62  2140.650 ms !H  2140.627 ms !H  2140.615 ms !H

 root@ubuntu-bionic-ovh-bhs1-0002154210:~# ping mirror01.bhs1.ovh.openstack.org
 PING mirror01.bhs1.ovh.openstack.org (158.69.80.87) 56(84) bytes of data.
 From ubuntu-bionic-ovh-bhs1-0002154210 (158.69.64.62) icmp_seq=1 Destination Host Unreachable
 From ubuntu-bionic-ovh-bhs1-0002154210 (158.69.64.62) icmp_seq=2 Destination Host Unreachable
 From ubuntu-bionic-ovh-bhs1-0002154210 (158.69.64.62) icmp_seq=3 Destination Host Unreachable
 --- mirror01.bhs1.ovh.openstack.org ping statistics ---
 4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3049ms
---

However, *external* access to the mirror host and all other hosts
seems fine.  It appears to be an internal OVH BHS1 networking issue.

I have raised ticket #9721374795 with OVH about this issue.  It needs
to be escalated, so it is currently pending (further details should
come to infra-root@openstack.org).

In the meantime, all jobs in the region are failing.  Disable it
until we have a solution.

Change-Id: I748ca1c10d98cc2d7acf2e1821d4d0f886db86eb
2018-09-20 15:55:45 +10:00
Andreas Jaeger 756a8f43f7 Revert "Revert "Revert "OVH BHS1 Maintenance" - 2018-09-19 1200UTC""
Enable OVH BHS1 again.

This reverts commit d74c51b0a5.

Change-Id: Ie3c24efb3e9a753d027dc680ab6a26c6a1934159
2018-09-19 13:18:20 +00:00
Andreas Jaeger d74c51b0a5 Revert "Revert "OVH BHS1 Maintenance" - 2018-09-19 1200UTC"
OVH is not ready yet.

This reverts commit d610f9b6b2.

Change-Id: I8365d0def2c1bcb1ca16889092f2267c374942df
2018-09-19 12:54:19 +00:00