Commit Graph

306 Commits

Author SHA1 Message Date
Michal Nasiadka ed699b0c9a Drop k8s_fedora_atomic_v1 driver
Change-Id: I3551ae244ecf99f67a9b142c964c020a5fae70a3
2024-02-27 16:35:35 +00:00
Spyros c1c9942f8b fcos-k8s: Update to v1.22
* change rbac.authorization.k8s.io/v1beta1 to v1
  * update metrics-server
* change storage.k8s.io/v1beta1 to v1
* drop kubelet-https
* update to FCOS 35

story: 2009828
task: 44416

Signed-off-by: Spyros <strigazi@gmail.com>
Change-Id: I24b89366a4a8e8bc4c90f6a85ef6de2ac77dae1d
2022-02-03 13:59:32 +00:00
Grzegorz Bialas 9643abc9ae Upgrade to calico_tag=v3.21.2
Additionally, use the fixed subnet CIDR for IP_AUTODETECTION_METHOD, which
is supported from v3.16.x onwards.

Story: 2007256
Task: 42017

Change-Id: Iaa25cd5054cec5482f01d90e2cd150bcd9700dbe
2022-01-21 08:50:15 +00:00
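The IP_AUTODETECTION_METHOD change can be sketched as follows. This assumes calico-node's `cidr=<CIDR>` autodetection method, and `ip_autodetection_method` is a hypothetical helper name, not the template code from the commit:

```python
import ipaddress


def ip_autodetection_method(fixed_subnet_cidr: str) -> str:
    """Build a calico-node IP_AUTODETECTION_METHOD value (hypothetical helper).

    Assumes the "cidr=<CIDR>" method, which Calico supports from v3.16.x.
    """
    # Normalise the CIDR; strict=False tolerates host bits (10.0.0.5/24 -> 10.0.0.0/24).
    network = ipaddress.ip_network(fixed_subnet_cidr, strict=False)
    return f"cidr={network}"


print(ip_autodetection_method("10.0.0.0/24"))  # cidr=10.0.0.0/24
```

Pinning autodetection to the cluster's own subnet is intended to stop calico-node from picking an address on an unrelated interface when a node has more than one.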
Jake Yip 679a174a0a Refix --registry-enabled
This fixes an issue with --registry-enabled that was previously fixed [1]
but was lost during a later refactoring [2].

[1] Change Ib93a7c0f761d047da3408703a5cf4208821acb33
[2] Change Ibbed59bc135969174a20e5243ff8464908801a23

Task: 41306
Story: 2008383
Change-Id: I76fedd34edec55f5a906a96672529ed15775f5da
2021-11-25 12:41:18 +00:00
Zuul 042d2ad144 Merge "Update traefik options" 2021-04-27 11:01:11 +00:00
Zuul bc6ec3ab63 Merge "[hca] Use wallaby-stable-1 as default HCA tag" 2021-04-09 20:24:43 +00:00
Zuul c07628bca6 Merge "Support hyperkube_prefix label" 2021-04-07 19:09:49 +00:00
Diogo Guerra b4016783d5 Update traefik options
* Traefik version updated from v1.7.19 to v1.7.28
* Force secure connections to use TLSv1.2 or greater

Change-Id: I65561358113952e3f60dc488b35ee8fa8f8da740
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
2021-03-26 18:08:29 +01:00
Bharat Kunwar 1de9b140f4 Download correct cri-containerd-cni tarball
In I05cbd1ec62e9a68c68a1666ff62f20138bf8c731, the fedora_coreos_v1 driver was
missed in the version bump. This PS bumps it to 1.4.4 for both the
fedora_coreos_v1 and fedora_atomic_v1 drivers.

Story: 2008451
Task: 42098

Change-Id: I22b698cd925dcf4f10805ae9493b77ddc9709f3f
2021-03-25 10:50:26 +01:00
Bharat Kunwar 7be7a5a123 [hca] Use wallaby-stable-1 as default HCA tag
Additionally:
- update syntax for compatibility with Ansible 2.9+.
- explicitly check for "not found" to prevent rebuild due to
  other types of errors, e.g. "pull rate limit".

Story: 2007264
Task: 42009

Change-Id: I68ca057e500ea293bde398288432a67eb758af25
2021-03-09 11:46:49 +00:00
Bharat Kunwar fc1f27a569 Support hyperkube_prefix label
Additionally for k8s_fedora_coreos_v1 driver:
* Introduce hyperkube_prefix which defaults to k8s.gcr.io/
* Bump default kube_tag to v1.18.16

Story: 1668998
Task: 41791

Change-Id: I38b8df45a00f1a2a1604059b8329d1dd762e05cd
2021-02-18 13:18:56 +00:00
Zuul f6dafb5084 Merge "Make kubelet and kube-proxy use the secure port" 2021-02-10 10:00:18 +00:00
Diogo Guerra ea64468ab3 3. Configure monitoring apps path based endpoints
* Add the monitoring_ingress_enabled magnum label to set up ingress with
path-based routing for all the configured services
{alertmanager,grafana,prometheus}. When using it, the
cluster_root_domain_name magnum label must also be set to define the base
path where these services are available.
* Add the cluster_basic_auth_secret magnum label to configure basic auth
on the unprotected services {alertmanager,prometheus}. This only takes
effect when app access is routed through the ingress.
* Set the services' logFormat to json for easier machine log parsing.

task: 39477
story: 2006765

Depends-On: Ieb90605182626869528349a7fdeed65061914bcb
Change-Id: Ie0e7000e0d94b2037f2c398fa67a2a2b7e256bc3
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
2021-02-05 15:52:52 +00:00
Diogo Guerra 37497ccf5b 1. Configurable prometheus monitoring persistent storage
* Add the metrics_retention_days magnum label, allowing the user to specify
how many days the prometheus server retains scraped metrics (default: 14)
* Add the metrics_retention_size magnum label, allowing the user to specify
the maximum prometheus server metrics storage size in GiB (default: 14)
* Add metrics_scrape_interval, allowing the user to specify the prometheus
scrape interval in seconds (default: 30)
* Add metrics_storage_class_name, allowing the user to specify the
storageClass used for external retention, so that data persists across
pod fail-over

task: 39509
story: 2006765

Change-Id: I42117837e8e3cd03f3cb723df4d73692ead0d169
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
2021-02-05 15:52:33 +00:00
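The three numeric labels map naturally onto Prometheus duration and size strings. The sketch below assumes they end up as the `retention`, `retentionSize` and `scrapeInterval` fields of a prometheus-operator Prometheus resource; that mapping and the `prometheus_settings` name are illustrative, not the chart values the commit actually sets:

```python
# Defaults quoted in the commit message; Magnum labels arrive as strings.
DEFAULTS = {
    "metrics_retention_days": "14",
    "metrics_retention_size": "14",   # GiB
    "metrics_scrape_interval": "30",  # seconds
}


def prometheus_settings(labels: dict) -> dict:
    """Translate the monitoring labels into Prometheus-style values (sketch)."""
    def positive_int(name: str) -> int:
        value = int(labels.get(name, DEFAULTS[name]))
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")
        return value

    return {
        "retention": f"{positive_int('metrics_retention_days')}d",
        # Prometheus size units are powers of two, so "GB" here means GiB.
        "retentionSize": f"{positive_int('metrics_retention_size')}GB",
        "scrapeInterval": f"{positive_int('metrics_scrape_interval')}s",
    }


print(prometheus_settings({"metrics_retention_days": "7"}))
# {'retention': '7d', 'retentionSize': '14GB', 'scrapeInterval': '30s'}
```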
Theodoros Tsioutsias f46923cc5e Allow nodegroups with node_count equal to 0
This change allows users to create clusters and nodegroups with
node_count equal to 0. Also adds support for resizing existing
nodegroups to 0.

Change-Id: Id63459d0fe9836e678bb7569f23d29eabc225e9e
story: 2007851
task: 40145
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
2021-02-04 13:07:18 +00:00
Spyros Trigazis d11f4e8393 Make kubelet and kube-proxy use the secure port
Create certificates for kubelet and kube-proxy on control-plane
nodes similar to worker nodes.  Use the secure kube-apiserver
port on control-plane nodes.

story: 2008524
task: 41602

Change-Id: Ibeb32a24ca25914cab32c63a9ccafaf711148a84
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
2021-01-15 12:27:54 +00:00
Feilong Wang 8bdf0e76c6 Update containerd version and tarball URL
1. Update default containerd version to 1.4.3
2. Fix the redirect issue of containerd tarball download

story: 2008451

Change-Id: I05cbd1ec62e9a68c68a1666ff62f20138bf8c731
2021-01-05 09:35:44 +00:00
Mohammed Naser 2c63aca8c6 Stop using delete_on_termination for BFV instances
When delete_on_termination is used and booting the instance fails on the
first attempt, the second attempt will fail with Heat. The reason is that
with delete_on_termination set to True, Nova deletes the volume when Heat
deletes the ERROR'd instance, so the follow-up boot fails with an error
along the lines of "unable to find volume". This masks the real failure
from the user (which could potentially be a quota issue).

With this patch, we no longer set this and instead use the default of
false. This does not leak volumes: when the stack is deleted, Heat
deletes the resources in the right order, making sure the volume
disappears eventually.

Change-Id: I362cea7bf57825035d13d234d0181a2b1fca5743
2020-08-26 20:53:06 -04:00
Bharat Kunwar ffed883959 [k8s-atomic] Support master_lb_allowed_cidrs in template
In I157a3b01d169e550e79b94316803fde8ddf77b03, support for
master_lb_allowed_cidrs was introduced, but only for the fedora coreos
driver. However, this parameter is also supplied to fedora atomic
clusters, whose template does not expect it. As a result, cluster
creation fails due to this backward incompatibility. This PS addresses
the issue.

Task: 40632
Story: 2007478

Change-Id: Ia781288f7aa35146582b10d5762aa05e3b107dce
2020-08-07 15:26:24 +00:00
Bharat Kunwar 799563eb61 Remove shebang from scripts
Without this, heat container agents using kubectl version
1.18.x (e.g. ussuri-dev) fail because they do not have the correct
KUBECONFIG in the environment.

Task: 39938
Story: 2007591

Change-Id: Ifc212478ae09c658adeb6ba4c8e8afc8943e3977
2020-06-16 20:53:07 +00:00
Zuul 52690900a7 Merge "Fix label fixed_network_cidr" 2020-06-11 11:20:37 +00:00
Feilong Wang 001b9c6101 Fix label fixed_network_cidr
Currently the label `fixed_network_cidr` is not handled correctly: whether
or not the label is set, the default value '10.0.0.0/24' is used for the
fixed network. This patch fixes that and renames the label to
`fixed_subnet_cidr` to reduce confusion. The new behaviour is:
1. If the label `fixed_subnet_cidr` is set but no fixed subnet is passed
   in, a new subnet is created with the given CIDR.
2. If the user passes in a fixed subnet, the label `fixed_subnet_cidr`
   is overridden with the CIDR of the given subnet.

Task: 39847
Story: 2007712

Change-Id: Id05e36696bf85297a556fcd959ed897fe47b7354
2020-06-11 13:54:59 +12:00
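The two rules for `fixed_subnet_cidr` amount to a simple precedence order. A minimal sketch, using a hypothetical helper name; the final fallback to '10.0.0.0/24' when neither value is given is an assumption, since the message only names that value as the old default:

```python
from typing import Optional

# Default CIDR quoted in the commit message.
DEFAULT_FIXED_SUBNET_CIDR = "10.0.0.0/24"


def resolve_fixed_subnet_cidr(label_cidr: Optional[str],
                              given_subnet_cidr: Optional[str]) -> str:
    """Pick the CIDR for the cluster's fixed subnet (hypothetical helper)."""
    # Rule 2: a subnet passed in by the user wins; its CIDR overrides the label.
    if given_subnet_cidr:
        return given_subnet_cidr
    # Rule 1: no subnet passed in, so a new one is created with the label CIDR.
    if label_cidr:
        return label_cidr
    # Assumption: with neither set, fall back to the old default.
    return DEFAULT_FIXED_SUBNET_CIDR


print(resolve_fixed_subnet_cidr("10.1.0.0/24", None))             # 10.1.0.0/24
print(resolve_fixed_subnet_cidr("10.1.0.0/24", "172.16.0.0/24"))  # 172.16.0.0/24
```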
Bharat Kunwar 81d0699c4c [hca] Pin fedora to 32 until new greenlet release
Eventlet, used by many OpenStack packages, depends on greenlet, which does
not yet have a pip release that supports Python 3.9 (the default Python
version on Fedora 33). Therefore, pin Fedora to version 32 until a new
greenlet release that includes the required fix [0] is cut.

Also update the default heat_container_agent_tag to victoria-dev.

[0] https://github.com/python-greenlet/greenlet/pull/161

Change-Id: Ice75ae880925cd15c096eb6d1cdabf7f802bccde
Story: 2007264
Task: 39941
2020-06-03 08:55:30 +00:00
Bharat Kunwar a79f8f52f9 [k8s] Use Helm v3 by default
- Refactor the helm installer to use a single meta chart install job
  and config, which use the Helm v3 client.
- Use upstream helm client binary instead of using helm-client container
  maintained by us. To verify checksum, helm_client_sha256 label is
  introduced for helm_client_tag (or alternatively for URL specified
  using new helm_client_url label).
- Default helm_client_tag=v3.2.1.
- Default tiller_tag=v2.16.7, tiller_enabled=false.

Story: 2007514
Task: 39295

Change-Id: I9b9633c81afb08b91576a9a4d3c5a0c445e0cee4
2020-05-26 15:23:14 +00:00
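Checking the downloaded client against `helm_client_sha256` is a plain SHA-256 comparison. A minimal sketch; the default URL built by `helm_client_url` assumes upstream Helm tarballs on get.helm.sh and is not taken from the commit, and both function names are hypothetical:

```python
import hashlib


def helm_client_url(tag: str) -> str:
    """Hypothetical default URL for an upstream Helm client tarball."""
    # Assumption: upstream Helm release tarballs are served from get.helm.sh.
    return f"https://get.helm.sh/helm-{tag}-linux-amd64.tar.gz"


def verify_sha256(data: bytes, expected_hex: str) -> None:
    """Raise if the downloaded bytes do not match helm_client_sha256."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_hex.strip().lower():
        raise ValueError(f"sha256 mismatch: expected {expected_hex}, got {actual}")


print(helm_client_url("v3.2.1"))
# https://get.helm.sh/helm-v3.2.1-linux-amd64.tar.gz
verify_sha256(b"abc", "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad")
```

Failing loudly on a mismatch is what lets the upstream binary replace the helm-client container maintained by the project.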
Diogo Guerra 7ab504d1e2 Scrape internal kubernetes components
* Scrape metrics from kube-{controller-manager,scheduler}
* Disable PrometheusRule for etcd
* Extra configurations for future https updates

task: 38702
story: 2006765

Change-Id: Idace486e0e173e7beea0d5376e9edfd461377021
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
2020-05-20 18:24:10 +02:00
Bharat Kunwar 3179921f0c [k8s] Deprecate in-tree Cinder
- Deprecate in-tree Cinder volume driver for removal in X cycle in
  favour of out-of-tree Cinder CSI plugin for Kubernetes.
- Set cinder_csi_enabled to True by default from V cycle.
- Add unit test for in-tree Cinder deprecation.
- Add missing unit tests for the recent docker_storage_driver deprecation.

Change-Id: I6f033049b5ff18c19866637efc8cf964272097f5
Story: 2007048
Task: 37873
2020-05-19 08:43:58 +00:00
Zuul 76155d09db Merge "Update nginx-ingress to v1.36.3 and 0.32.0 tag" 2020-05-13 22:32:49 +00:00
Zuul 715a27dcb7 Merge "Update prometheus monitoring chart and images" 2020-05-12 23:01:33 +00:00
Spyros Trigazis 063a65e441 Update nginx-ingress to v1.36.3 and 0.32.0 tag
* remove user since it is controlled in the chart
  and changed from 33 to 101
* use the latest chart v1.36.3 from stable
* use latest 0.32.0 controller image

story: 2006945
task: 39747

Change-Id: I6df49929cb8890f534afde185d56b7b6d70c691e
Signed-off-by: Spyros Trigazis <strigazi@gmail.com>
2020-05-12 22:56:49 +00:00
Zuul 5ada350502 Merge "[k8s] Upgrade k8s dashboard version to v2.0.0" 2020-05-01 14:20:42 +00:00
Spyros Trigazis 40f40b7772 k8s: Use the same kubectl version as API
In the heat-agent we use kubectl to install
several deployments, it is better if we use
matching versions of kubectl and apiserver
to minimize errors. Additionally, the
heat-agent won't need kubectl anymore.

story: 2007591
task: 39536

Change-Id: If8f6d84efc70606ac0d888c084c82d8c7eff54f8
Signed-off-by: Spyros Trigazis <strigazi@gmail.com>
2020-04-24 17:11:13 +00:00
Feilong Wang b4965416b1 [k8s] Upgrade k8s dashboard version to v2.0.0
Heapster has been deprecated for a while and the new k8s dashboard
2.0.0 version supports metrics-server now. So it's time to upgrade
the default k8s dashboard to v2.0.0.

Task: 39101
Story: 2007256

Change-Id: I02f8cb77b472142f42ecc59a339555e60f5f38d0
2020-04-24 16:34:36 +12:00
Diogo Guerra 62a4b8ba09 Update prometheus monitoring chart and images
Features:
* Add the cluster_uuid label to the metrics exported via prometheus federation

Updates:
* prometheus-operator chart tag bumped to 8.12.13
* Apply container_infra_prefix to the prometheusOperator images that were missing it

task: 39540
task: 39541
story: 2006765

Change-Id: I76bca268bf4e0b8c253f112c5665bd2b43fc8d44
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
2020-04-23 17:59:57 +02:00
Diogo Guerra 06659759f1 [k8s] Introduce helm_client_tag label.
Added label helm_client_tag to allow user to specify helm client
container version.

Task: 39294
Story: 2007514

Change-Id: I5d1cf238511951ac4a1849ca66b74dc747865391
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
2020-04-17 12:52:08 +00:00
Zuul 3b9f06726d Merge "Add selinux_mode label" 2020-04-10 00:09:32 +00:00
Spyros Trigazis dd4b79263f Support calico v3.3.6
For backwards compatibility support calico
v3.3.6 as well. The control flow is managed
in the heat templates.

Story: 2007256
task: 39280

Change-Id: Id61dbdaf09cde35fdd532e3fff216934c1ef4dff
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
2020-04-06 13:19:46 +00:00
Bharat Kunwar fd80e1989f Add selinux_mode label
Fedora Atomic default: permissive
Fedora CoreOS default: enforcing

Story: 2007413
Task: 39033

Change-Id: Ibc1e02098155ac95bb35fcea5f21cc380bdf0d03
Signed-off-by: Bharat Kunwar <brtknr@bath.edu>
2020-03-28 17:57:25 +00:00
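The per-OS defaults can be sketched as a lookup with a label override. The `os_distro` keys, the accepted set of modes and the `selinux_mode` helper are assumptions for illustration; only the two defaults come from the commit message:

```python
# Per-OS defaults quoted in the commit message; the os_distro keys are assumed.
SELINUX_DEFAULTS = {"fedora-atomic": "permissive", "fedora-coreos": "enforcing"}
# Standard SELinux modes; whether the label accepts all three is an assumption.
VALID_MODES = {"enforcing", "permissive", "disabled"}


def selinux_mode(os_distro: str, labels: dict) -> str:
    """Return the SELinux mode for a node (hypothetical helper)."""
    mode = labels.get("selinux_mode", SELINUX_DEFAULTS[os_distro])
    if mode not in VALID_MODES:
        raise ValueError(f"invalid selinux_mode: {mode!r}")
    return mode


print(selinux_mode("fedora-coreos", {}))                             # enforcing
print(selinux_mode("fedora-atomic", {"selinux_mode": "enforcing"}))  # enforcing
```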
Bharat Kunwar 2864fc57d4 Use cluster name for fixed_network instead of private
At present, when a fixed_network is not specified, it is given the name
"private" by default. When multiple clusters are created, we end up with
multiple networks that all share the same name. This PS makes it easier
to see which cluster the resources belong to by using the cluster name.

Story: 2007460
Task: 39139

Change-Id: I7f8028b716f9a9eced17d85ca2e46e2b1e34875f
2020-03-24 06:38:38 +00:00
Zuul 69f5892a4f Merge "Update default calico_ipv4pool" 2020-03-17 22:57:39 +00:00
Feilong Wang d342fc0ad9 Update default calico_ipv4pool
The current default Calico IPv4 CIDR 192.168.0.0/16 is too common and
has caused IP conflicts in production. This patch proposes replacing it
with a less common CIDR range.

Task: 39052
Story: 2007426

Change-Id: I13aa0c58bf168bc069edf1d5c0187f89011fffdb
2020-03-16 22:33:10 +00:00
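The conflict described above is an overlap between the pod pool and networks the cluster must reach. A minimal sketch of detecting it with Python's `ipaddress` module; the existing networks are hypothetical, and the new default CIDR is not assumed here because the message does not name it:

```python
import ipaddress


def overlapping_networks(pool_cidr: str, existing_cidrs: list) -> list:
    """Return the existing CIDRs that would clash with the Calico IPv4 pool."""
    pool = ipaddress.ip_network(pool_cidr)
    return [cidr for cidr in existing_cidrs
            if pool.overlaps(ipaddress.ip_network(cidr))]


# Hypothetical networks a cluster might have to coexist with.
existing = ["192.168.1.0/24", "10.0.0.0/24"]
print(overlapping_networks("192.168.0.0/16", existing))  # ['192.168.1.0/24']
```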
Zuul 305a0095ff Merge "Add cinder_csi_enabled label" 2020-03-16 06:43:47 +00:00
Feilong Wang d61dd1d5b5 [k8s] Support post install manifest URL
A new config option `post_install_manifest_url` is added to support
installing a cloud provider/vendor specific manifest after the k8s
cluster has booted. It is a URL pointing to the manifest file. For
example, a cloud admin can put their specific storageclass into this
file, and it will be set up automatically after the cluster is created.

Task: 35798
Story: 2006209

Change-Id: Ib5a2c5cd7970085db941f189613e175f622aea3f
2020-03-05 20:30:12 +13:00
Bharat Kunwar 9565984fd9 Add cinder_csi_enabled label
Add support for out of tree Cinder CSI. This is installed when the
cinder_csi_enabled=true label is added. This will allow us to eventually
deprecate in-tree Cinder.

story: 2007048
task: 37868

Change-Id: I8305b9f8c9c37518ec39198693adb6f18542bf2e
Signed-off-by: Bharat Kunwar <brtknr@bath.edu>
2020-02-21 10:24:36 +00:00
Spyros Trigazis de21e0431a Add opt-in containerd support
New labels:
container_runtime, containerd or fallback to host-docker
containerd_version, taken from https://github.com/containerd/containerd/releases
containerd_tarball_url, e.g. https://storage.googleapis.com/cri-containerd-release/cri-containerd-1.2.4.linux-amd64.tar.gz
containerd_tarball_sha256, sha256 of the above tarball

story: 2007317
task: 38823

Change-Id: I6c6599cdee61f508bd2a5e4c454da3125a256753
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
2020-02-20 15:47:40 +00:00
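How the new labels might combine can be sketched as below. The URL pattern is copied from the example in the message, but deriving it from containerd_version is an assumption (a later commit changes the tarball URL), and both helpers are hypothetical:

```python
# Example URL pattern taken from the commit message; deriving it from the
# version this way is an assumption for illustration.
URL_TEMPLATE = ("https://storage.googleapis.com/cri-containerd-release/"
                "cri-containerd-{version}.linux-amd64.tar.gz")


def container_runtime(labels: dict) -> str:
    """Opt-in: only "containerd" selects containerd, anything else is host-docker."""
    return "containerd" if labels.get("container_runtime") == "containerd" else "host-docker"


def containerd_tarball_url(labels: dict) -> str:
    """Prefer an explicit containerd_tarball_url, else build one from the version."""
    if labels.get("containerd_tarball_url"):
        return labels["containerd_tarball_url"]
    return URL_TEMPLATE.format(version=labels["containerd_version"])


labels = {"container_runtime": "containerd", "containerd_version": "1.2.4"}
print(container_runtime(labels))  # containerd
print(containerd_tarball_url(labels))
```

The downloaded tarball would then be checked against containerd_tarball_sha256 before use.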
Zuul 16ea8b6397 Merge "Fix api-cert-manager=true blocking cluster creation" 2020-02-03 17:53:15 +00:00
Zuul 454b0f55ec Merge "[k8s] Fix volumes availability zone issue" 2020-01-27 21:42:13 +00:00
Diogo Guerra 1ecec95b8c Fix api-cert-manager=true blocking cluster creation
In the current release, cert-api-manager runs in kubecluster.yaml [1],
but in kubemaster.yaml [2] the script [3] expects the ca.key file to exist
(if cert_api_manager_enabled=true); otherwise it blocks.
This file (ca.key), in turn, is created only when enable-cert-api-manager.sh runs [4].

So we have a deadlock, and the call to enable-cert-api-manager.sh needs to
move into kubemaster.yaml.

[1] https://github.com/openstack/magnum/blob/master/magnum/drivers/k8s_fedora_atomic_v1/templates/kubecluster.yaml#L1158-L1161
[2] https://github.com/openstack/magnum/blob/master/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml#L760
[3] https://github.com/openstack/magnum/blob/master/magnum/drivers/common/templates/kubernetes/fragments/enable-services-master.sh#L12-L16
[4] https://github.com/openstack/magnum/blob/master/magnum/drivers/common/templates/kubernetes/fragments/enable-cert-api-manager.sh#L11

Another issue is that the chown of this file (ca.key) does not work. Moving
the call to enable-cert-api-manager.sh into kubemaster.yaml makes cluster
creation fail because of an error [7] in [5]. A cluster created in stein [6]
shows that the file is owned by root:root. Knowing this, we can comment
out [5] for now.

[5] https://github.com/openstack/magnum/blob/master/magnum/drivers/common/templates/kubernetes/fragments/enable-cert-api-manager.sh#L13
[6] http://paste.openstack.org/show/788534/
[7] http://paste.openstack.org/show/788537/

Change-Id: Ibee2df435c3f7c34bff74e9146fb28d8367124b1
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
2020-01-17 14:29:36 +01:00
Zuul 7f8ffe7d7b Merge "Support verifying the digest for hyperkube image" 2020-01-16 04:22:21 +00:00
Feilong Wang a0e62df093 [k8s] Fix volumes availability zone issue
In a multi-AZ environment, if Nova doesn't support cross-AZ volume mounts,
cluster creation may fail because of a block device mapping error. This
patch fixes the issue by passing in the AZ information when creating
volumes for etcd, docker and the node root disk.

Task: 38131
Story: 2007097

Change-Id: I39c99259abc84cbbee50ac1a827e9349ede6593c
2020-01-16 12:41:26 +13:00
Diogo Guerra 355c71924b Add calico_ipv4pool_ipip label
IPIP Mode to use for the IPv4 POOL created at start up
allowed_values: ["Always", "CrossSubnet", "Never", "Off"]
default: "Off"

Change-Id: Ib834a1f86a6db408047cc8f86fc7744d16d83904
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
2020-01-09 14:22:23 +01:00
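The allowed values and default translate directly into a label check. A minimal sketch with a hypothetical helper name; the comparison is case-sensitive, matching the spelling of the allowed values in the message:

```python
# Allowed values and default quoted in the commit message.
ALLOWED_IPIP_MODES = ("Always", "CrossSubnet", "Never", "Off")
DEFAULT_IPIP_MODE = "Off"


def calico_ipip_mode(labels: dict) -> str:
    """Validate the calico_ipv4pool_ipip label (hypothetical helper)."""
    mode = labels.get("calico_ipv4pool_ipip", DEFAULT_IPIP_MODE)
    if mode not in ALLOWED_IPIP_MODES:
        raise ValueError(
            f"calico_ipv4pool_ipip must be one of {ALLOWED_IPIP_MODES}, got {mode!r}")
    return mode


print(calico_ipip_mode({}))                                       # Off
print(calico_ipip_mode({"calico_ipv4pool_ipip": "CrossSubnet"}))  # CrossSubnet
```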