Commit Graph

140 Commits

Author SHA1 Message Date
Michal Nasiadka ed699b0c9a Drop k8s_fedora_atomic_v1 driver
Change-Id: I3551ae244ecf99f67a9b142c964c020a5fae70a3
2024-02-27 16:35:35 +00:00
Grzegorz Bialas 9643abc9ae Upgrade to calico_tag=v3.21.2
Additionally, use fixed subnet CIDR for IP_AUTODETECTION_METHOD
supported from v3.16.x onwards.

Story: 2007256
Task: 42017

Change-Id: Iaa25cd5054cec5482f01d90e2cd150bcd9700dbe
2022-01-21 08:50:15 +00:00
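
As an illustration of the subnet-CIDR autodetection described in the commit above, here is a minimal sketch of how such a value could be derived. The helper name `autodetection_method` is hypothetical, and the `cidr=<CIDR>` value format is an assumption about Calico's syntax. This is not the code from the patch.

```python
import ipaddress


def autodetection_method(fixed_subnet_cidr: str) -> str:
    """Build an IP_AUTODETECTION_METHOD value from a subnet CIDR.

    Hypothetical helper; the "cidr=" prefix is assumed, not taken from the patch.
    """
    # strict=False normalises inputs that have host bits set (10.0.0.5/24).
    network = ipaddress.ip_network(fixed_subnet_cidr, strict=False)
    return f"cidr={network.with_prefixlen}"
```

Normalising the CIDR first means a malformed subnet raises `ValueError` early instead of producing an unusable setting.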
Zuul c07628bca6 Merge "Support hyperkube_prefix label" 2021-04-07 19:09:49 +00:00
Bharat Kunwar fc1f27a569 Support hyperkube_prefix label
Additionally for k8s_fedora_coreos_v1 driver:
* Introduce hyperkube_prefix which defaults to k8s.gcr.io/
* Bump default kube_tag to v1.18.16

Story: 1668998
Task: 41791

Change-Id: I38b8df45a00f1a2a1604059b8329d1dd762e05cd
2021-02-18 13:18:56 +00:00
Zuul f6dafb5084 Merge "Make kubelet and kube-proxy use the secure port" 2021-02-10 10:00:18 +00:00
Diogo Guerra ea64468ab3 3. Configure monitoring apps path based endpoints
* Add monitoring_ingress_enabled magnum label to set up ingress with
path based routing for all the configured services
{alertmanager,grafana,prometheus}. When using this,
cluster_root_domain_name magnum label must be used to set up the base path
where these services are available.
* Add cluster_basic_auth_secret magnum label to configure basic auth
on unprotected services {alertmanager and prometheus}. This is only
in effect when app access is routed by ingress.
* Set services logFormat to json to enable easier machine log parsing.

task: 39477
story: 2006765

Depends-On: Ieb90605182626869528349a7fdeed65061914bcb
Change-Id: Ie0e7000e0d94b2037f2c398fa67a2a2b7e256bc3
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
2021-02-05 15:52:52 +00:00
Diogo Guerra 37497ccf5b 1. Configurable prometheus monitoring persistent storage
* Add metrics_retention_days magnum label allowing user to specify
prometheus server scraped metrics retention days (default: 14)
* Add metrics_retention_size magnum label allowing user to specify
prometheus server metrics storage maximum size in GiB (default: 14)
* Add metrics_scrape_interval allowing user to specify prometheus
scrape frequency in seconds (default: 30)
* Add metrics_storage_class_name allowing user to specify the
storageClass to use as external retention for pod fail-over data
persistency

task: 39509
story: 2006765

Change-Id: I42117837e8e3cd03f3cb723df4d73692ead0d169
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
2021-02-05 15:52:33 +00:00
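
The defaults listed in the commit above can be pictured as a simple merge of user labels over default values. The sketch below is only illustrative: `resolve_metrics_labels` is a hypothetical helper, and treating label values as strings is an assumption.

```python
# Defaults as stated in the commit message; string values are an assumption.
METRICS_DEFAULTS = {
    "metrics_retention_days": "14",
    "metrics_retention_size": "14",   # GiB
    "metrics_scrape_interval": "30",  # seconds
}

# This label has no default in the commit message, so it is optional here.
OPTIONAL_LABELS = ("metrics_storage_class_name",)


def resolve_metrics_labels(labels: dict) -> dict:
    """Return the metrics labels with defaults filled in (hypothetical helper)."""
    resolved = dict(METRICS_DEFAULTS)
    for key, value in labels.items():
        if key in METRICS_DEFAULTS or key in OPTIONAL_LABELS:
            resolved[key] = value
    return resolved
```

Unrelated labels are ignored, so the helper can be handed the full label dictionary of a cluster.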
Spyros Trigazis d11f4e8393 Make kubelet and kube-proxy use the secure port
Create certificates for kubelet and kube-proxy on control-plane
nodes similar to worker nodes.  Use the secure kube-apiserver
port on control-plane nodes.

story: 2008524
task: 41602

Change-Id: Ibeb32a24ca25914cab32c63a9ccafaf711148a84
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
2021-01-15 12:27:54 +00:00
Mohammed Naser 2c63aca8c6 Stop using delete_on_termination for BFV instances
When using delete_on_termination and the booting of the instance fails
on the first attempt, the second attempt will fail with Heat.  The
reason is that with delete_on_termination set to True, Nova will delete
the volume when Heat deletes the ERROR'd instance and it will then
cause the follow-up boot to fail with an error along the lines of
"unable to find volume", which masks the real failure from the user (which
could potentially be a quota issue).

With this patch, we no longer set this and instead use the default of
false.  This will not mean we will leak volumes because when we delete
the stack, Heat will do all the right things and delete them in order,
making sure the volume disappears eventually.

Change-Id: I362cea7bf57825035d13d234d0181a2b1fca5743
2020-08-26 20:53:06 -04:00
Bharat Kunwar 799563eb61 Remove shebang from scripts
Without this, heat container agents using kubectl version
1.18.x (e.g. ussuri-dev) fail because they do not have the correct
KUBECONFIG in the environment.

Task: 39938
Story: 2007591

Change-Id: Ifc212478ae09c658adeb6ba4c8e8afc8943e3977
2020-06-16 20:53:07 +00:00
Bharat Kunwar a79f8f52f9 [k8s] Use Helm v3 by default
- Refactor helm installer to use a single meta chart install job
  and config which use Helm v3 client.
- Use upstream helm client binary instead of using helm-client container
  maintained by us. To verify checksum, helm_client_sha256 label is
  introduced for helm_client_tag (or alternatively for URL specified
  using new helm_client_url label).
- Default helm_client_tag=v3.2.1.
- Default tiller_tag=v2.16.7, tiller_enabled=false.

Story: 2007514
Task: 39295

Change-Id: I9b9633c81afb08b91576a9a4d3c5a0c445e0cee4
2020-05-26 15:23:14 +00:00
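
The checksum verification mentioned in the commit above amounts to comparing a SHA-256 digest of the downloaded client against the value of the helm_client_sha256 label. A minimal sketch follows, assuming the binary is already in memory; `verify_sha256` is a hypothetical name and this is not the installer's actual code.

```python
import hashlib
import hmac


def verify_sha256(data: bytes, expected_hex: str) -> bool:
    """Check downloaded bytes against an expected SHA-256 hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    # Normalise case and whitespace before comparing the two digests.
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

An installer would typically refuse to unpack or run the client when this returns False.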
Zuul 715a27dcb7 Merge "Update prometheus monitoring chart and images" 2020-05-12 23:01:33 +00:00
Zuul 5ada350502 Merge "[k8s] Upgrade k8s dashboard version to v2.0.0" 2020-05-01 14:20:42 +00:00
Spyros Trigazis 40f40b7772 k8s: Use the same kubectl version as API
In the heat-agent we use kubectl to install
several deployments, it is better if we use
matching versions of kubectl and apiserver
to minimize errors. Additionally, the
heat-agent won't need kubectl anymore.

story: 2007591
task: 39536

Change-Id: If8f6d84efc70606ac0d888c084c82d8c7eff54f8
Signed-off-by: Spyros Trigazis <strigazi@gmail.com>
2020-04-24 17:11:13 +00:00
Feilong Wang b4965416b1 [k8s] Upgrade k8s dashboard version to v2.0.0
Heapster has been deprecated for a while and the new k8s dashboard
2.0.0 version supports metrics-server now. So it's time to upgrade
the default k8s dashboard to v2.0.0.

Task: 39101
Story: 2007256

Change-Id: I02f8cb77b472142f42ecc59a339555e60f5f38d0
2020-04-24 16:34:36 +12:00
Diogo Guerra 62a4b8ba09 Update prometheus monitoring chart and images
Features:
* Add to prometheus federation exported metrics the cluster_uuid label

Updates:
* prometheus-operator chart tag bumped to 8.12.13
* Update container_infra_prefix to missing prometheusOperator images

task: 39540
task: 39541
story: 2006765

Change-Id: I76bca268bf4e0b8c253f112c5665bd2b43fc8d44
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
2020-04-23 17:59:57 +02:00
Diogo Guerra 06659759f1 [k8s] Introduce helm_client_tag label.
Added label helm_client_tag to allow user to specify helm client
container version.

Task: 39294
Story: 2007514

Change-Id: I5d1cf238511951ac4a1849ca66b74dc747865391
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
2020-04-17 12:52:08 +00:00
Bharat Kunwar fd80e1989f Add selinux_mode label
Fedora Atomic default: permissive
Fedora CoreOS default: enforcing

Story: 2007413
Task: 39033

Change-Id: Ibc1e02098155ac95bb35fcea5f21cc380bdf0d03
Signed-off-by: Bharat Kunwar <brtknr@bath.edu>
2020-03-28 17:57:25 +00:00
Zuul 305a0095ff Merge "Add cinder_csi_enabled label" 2020-03-16 06:43:47 +00:00
Feilong Wang d61dd1d5b5 [k8s] Support post install manifest URL
A new config option `post_install_manifest_url` is added to support
installing a cloud provider/vendor specific manifest after the k8s
cluster has booted. It is a URL pointing to the manifest file. For
example, a cloud admin can put their specific storageclass into
this file, and it will be set up automatically after the cluster
is created.

Task: 35798
Story: 2006209

Change-Id: Ib5a2c5cd7970085db941f189613e175f622aea3f
2020-03-05 20:30:12 +13:00
Bharat Kunwar 9565984fd9 Add cinder_csi_enabled label
Add support for out of tree Cinder CSI. This is installed when the
cinder_csi_enabled=true label is added. This will allow us to eventually
deprecate in-tree Cinder.

story: 2007048
task: 37868

Change-Id: I8305b9f8c9c37518ec39198693adb6f18542bf2e
Signed-off-by: Bharat Kunwar <brtknr@bath.edu>
2020-02-21 10:24:36 +00:00
Spyros Trigazis de21e0431a Add opt-in containerd support
New labels:
container_runtime, containerd or fallback to host-docker
containerd_version, taken from https://github.com/containerd/containerd/releases
containerd_tarball_url, eg https://storage.googleapis.com/cri-containerd-release/cri-containerd-1.2.4.linux-amd64.tar.gz
containerd_tarball_sha256, sha256 of the above tarball

story: 2007317
task: 38823

Change-Id: I6c6599cdee61f508bd2a5e4c454da3125a256753
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
2020-02-20 15:47:40 +00:00
Zuul 16ea8b6397 Merge "Fix api-cert-manager=true blocking cluster creation" 2020-02-03 17:53:15 +00:00
Diogo Guerra 1ecec95b8c Fix api-cert-manager=true blocking cluster creation
In the current release, cert-api-manager runs in kubecluster.yaml [1],
but in kubemaster.yaml [2] the script [3] expects the ca.key file to
exist (if cert_api_manager_enabled=true); otherwise it blocks.
This file (ca.key), in turn, is created only when enable-cert-api-manager.sh runs [4].

So we have a deadlock.
To resolve it, the call to enable-cert-api-manager.sh needs to move into kubemaster.yaml.

[1] https://github.com/openstack/magnum/blob/master/magnum/drivers/k8s_fedora_atomic_v1/templates/kubecluster.yaml#L1158-L1161
[2] https://github.com/openstack/magnum/blob/master/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml#L760
[3] https://github.com/openstack/magnum/blob/master/magnum/drivers/common/templates/kubernetes/fragments/enable-services-master.sh#L12-L16
[4] https://github.com/openstack/magnum/blob/master/magnum/drivers/common/templates/kubernetes/fragments/enable-cert-api-manager.sh#L11

On a separate issue, the chown of this file (ca.key) is not working. Moving the
call into kubemaster.yaml makes cluster creation fail because of
an error [7] in [5]. If we check a cluster created in Stein [6], we notice that
the file is owned by root:root. Knowing this, we can comment out [5] for now.

[5] https://github.com/openstack/magnum/blob/master/magnum/drivers/common/templates/kubernetes/fragments/enable-cert-api-manager.sh#L13
[6] http://paste.openstack.org/show/788534/
[7] http://paste.openstack.org/show/788537/

Change-Id: Ibee2df435c3f7c34bff74e9146fb28d8367124b1
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
2020-01-17 14:29:36 +01:00
Feilong Wang a0e62df093 [k8s] Fix volumes availability zone issue
For a multi AZ env, if Nova doesn't support cross AZ volume mount,
then the cluster creation may fail because of block device mapping
error. The patch fixes this issue by passing in the AZ information
when creating volumes for etcd, docker and the node root disk.

Task: 38131
Story: 2007097

Change-Id: I39c99259abc84cbbee50ac1a827e9349ede6593c
2020-01-16 12:41:26 +13:00
Diogo Guerra 355c71924b Add calico_ipv4pool_ipip label
IPIP Mode to use for the IPv4 POOL created at start up
allowed_values: ["Always", "CrossSubnet", "Never", "Off"]
default: "Off"

Change-Id: Ib834a1f86a6db408047cc8f86fc7744d16d83904
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
2020-01-09 14:22:23 +01:00
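
The allowed values and default given in the commit above suggest a straightforward validation step. The sketch below shows one way to express it; `resolve_ipip_mode` is a hypothetical helper and is not taken from the driver.

```python
# Values and default as listed in the commit message.
ALLOWED_IPIP_MODES = ("Always", "CrossSubnet", "Never", "Off")
DEFAULT_IPIP_MODE = "Off"


def resolve_ipip_mode(labels: dict) -> str:
    """Return the calico_ipv4pool_ipip value, falling back to the default."""
    mode = labels.get("calico_ipv4pool_ipip", DEFAULT_IPIP_MODE)
    if mode not in ALLOWED_IPIP_MODES:
        raise ValueError(
            f"calico_ipv4pool_ipip must be one of {ALLOWED_IPIP_MODES}, got {mode!r}"
        )
    return mode
```

The comparison is case-sensitive in this sketch; whether the real validation is case-sensitive is not stated in the commit.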
Feilong Wang ad2ef4962c Fix proxy issue for k8s fedora drivers
The recent large changes to support k8s rolling upgrade introduced a
regression that broke the proxy function for image downloading. This
patch fixes it for both the fedora atomic driver and the fedora coreos
driver.

Task: 37784
Story: 2007005

Change-Id: I11113d69629e1a97a58e5270f67c7404292b45c3
2019-12-20 09:40:00 +13:00
Diogo Guerra df52f9c9ea [k8s] Update metrics-server
Magnum allows the use of CONTAINER_INFRA_PREFIX to specify a local
repository from which we can pull container images. This repository
defaults to the upstream one that is specified in the metrics helm
chart.

* This patch allows for the usage of CONTAINER_INFRA_PREFIX to
correctly configure the pull of the metric-server container image
from the specified repo.
* Add label metrics_server_chart_tag to allow user to specify
stable/metrics-server chart tag to use
* Add label metrics_server_enabled to allow enable/disable of
component (defaults: true)

Story: 2004816
Task: 37390

Change-Id: Idc315937a82317b76349bbe8466d900d00194953
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
2019-12-16 13:06:24 +01:00
Zuul 1af2826dd9 Merge "Add prometheus-adapter" 2019-12-11 14:17:30 +00:00
Bharat Kunwar 1ad4a9d0a0 [k8s] Add heapster_enabled label
Story: 2004816
Task: 37654

Change-Id: Icd7f380d87672c00257e34df385d81e1c3e36ddf
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
2019-12-11 11:40:47 +00:00
Diogo Guerra 354575804f Add prometheus-adapter
This will install the prometheus-adapter stable
helm chart. Requires monitoring_enabled=true.

The chart version can be configured using
prometheus_adapter_chart_tag and an option is
available to overwrite the default configuration
rules for a user defined ConfigMap referenced
by using prometheus_adapter_configmap label.

story: 2006765
task: 37278

Change-Id: I5b86f4455f88c8dbeac6e56942e1ca55f1d1726c
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
2019-12-10 13:54:39 +01:00
Bharat Kunwar 7d6e344f1a Add nginx_ingress_controller_chart_tag
Additionally, bump up the chart version to 1.24.7, without which the
ingress controller fails to deploy on 1.16.x.

Additionally, bump up nginx_ingress_controller_tag version to 0.26.1.
This is to ensure that we are running an up to date nginx ingress
controller with fixes for known CVEs.

Story: 2006853
Task: 37444

Change-Id: Ibf045a06d19b02095e19d9a21d14a91a39a3751c
2019-11-24 11:24:33 +00:00
Spyros Trigazis aa6b3bbeba k8s_fedora: Add use_podman label
Choose whether system containers etcd, kubernetes and the heat-agent will be
installed with podman or atomic. This label is relevant for k8s_fedora drivers.

k8s_fedora_atomic_v1 defaults to use_podman=false, meaning atomic will be used
pulling containers from docker.io/openstackmagnum. use_podman=true is accepted
as well, which will pull containers from k8s.gcr.io.

k8s_fedora_coreos_v1 defaults and accepts only use_podman=true.

Fix upgrade for k8s_fedora_coreos_v1 and magnum-cordon systemd unit.

Task: 37242
Story: 2005201

Change-Id: I0d5e4e059cd4f0458746df7c09d2fd47c389c6a0
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
2019-10-23 10:43:52 +00:00
Fei Long Wang 09f85f3746 [fedora-atomic][k8s] Support operating system upgrade
Along with the kubernetes version upgrade support we just released, we're
adding the support to upgrade the operating system of the k8s cluster
(including master and worker nodes). It's an inplace upgrade leveraging the
atomic/ostree upgrade capability.

Story: 2002210
Task: 33607

Change-Id: If6b9c054bbf5395c30e2803314e5695a531c22bc
2019-10-18 14:44:27 +00:00
Theodoros Tsioutsias 113fdc44b2 ng-12: Label nodegroup nodes
With this change each node will be labeled with the following:
* --node-labels=magnum.openstack.org/role=${NODEGROUP_ROLE}
* --node-labels=magnum.openstack.org/nodegroup=${NODEGROUP_NAME}

Change-Id: Ic410a059b19a1252cdf6eed786964c5c7b03d01c
2019-10-16 11:53:44 +00:00
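
The two labels in the commit above follow a fixed pattern, which can be sketched as a small helper that renders the kubelet arguments. The function name `node_label_args` is hypothetical; in the templates the values come from the ${NODEGROUP_ROLE} and ${NODEGROUP_NAME} variables.

```python
def node_label_args(nodegroup_role: str, nodegroup_name: str) -> list:
    """Render the two --node-labels arguments for a nodegroup node."""
    return [
        f"--node-labels=magnum.openstack.org/role={nodegroup_role}",
        f"--node-labels=magnum.openstack.org/nodegroup={nodegroup_name}",
    ]
```

For example, a worker in a nodegroup named `gpu-pool` (an invented name) would get `magnum.openstack.org/role=worker` and `magnum.openstack.org/nodegroup=gpu-pool`.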
Stanislav Dmitriev cd054f20ac Change the order of resource creation
Resource creation order in kubernetes templates for Fedora Atomic
was changed to avoid neutron bug https://bugs.launchpad.net/neutron/+bug/1845360
Floating IP should be assigned to network port after instance creation

Change-Id: Ib7e0503d475d7cd3164a116c3a0325c4ae417a0a
Story: 2006631
Task: 36844
2019-10-01 18:29:05 +00:00
Zuul 60d2485d83 Merge "[fedora atomic k8s] Add boot from volume support" 2019-09-20 11:21:33 +00:00
Zuul 83569e8394 Merge "calico: drop calico_cni_tag" 2019-09-20 11:08:53 +00:00
Mohammed Naser cfe2753fd3 [fedora atomic k8s] Add boot from volume support
Support boot from volume for all Kubernetes nodes (master and worker)
so that users can create a large root volume, which can be more
flexible than using docker_volume_size. Users can also specify the
volume type to take advantage of high performance storage, e.g.
NVMe.

A new label etcd_volume_type is added as well so that users can
set the volume type for the etcd volume.

If the boot_volume_type or etcd_volume_type are not passed by labels,
Magnum will try to read them from config option
default_boot_volume_type and default_etcd_volume_type. A random
volume type from Cinder will be used if those options are not set.

Task: 30374
Story: 2005386

Co-Authorized-By: Feilong Wang<flwang@catalyst.net.nz>

Change-Id: I39dd456bfa285bf06dd948d11c86867fc03d5afb
2019-09-20 05:00:29 +00:00
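
The fallback order described in the commit above (label, then config option, then a random Cinder volume type) can be sketched as follows. `pick_volume_type` and its parameters are hypothetical, and the list of Cinder types is passed in rather than fetched from an API.

```python
import random


def pick_volume_type(label_value, config_default, cinder_types, rng=random):
    """Choose a volume type: label first, then config, then a random Cinder type."""
    if label_value:
        return label_value
    if config_default:
        return config_default
    if not cinder_types:
        raise ValueError("no volume type given and Cinder reports none")
    # Last resort, as described in the commit message: any available type.
    return rng.choice(list(cinder_types))
```

The same function would be called once for boot_volume_type and once for etcd_volume_type, each with its own label and config default.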
Bharat Kunwar e84cc4c975 Convert network UUID to name required for OCCM
Sometimes, the fixed_network value gets rendered as UUID. However OCCM's
internal-network-name requires the network name, it does not support
UUID. This patch introduces a new parameter called fixed_network_name
which converts fixed_network UUID to name if it is UUID-like.

Story: 2005333
Task: 36313

Change-Id: I3453bc0dbea285687d39c9782685cb1f2a3ecd39
2019-08-25 22:16:42 +00:00
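
The "UUID-like" check in the commit above can be illustrated with the standard library. This is only a sketch: `is_uuid_like` and `fixed_network_name` are hypothetical stand-ins, and the `names_by_id` mapping replaces whatever network lookup the patch performs.

```python
import uuid


def is_uuid_like(value) -> bool:
    """Return True if value parses as a UUID in plain or dashed hex form."""
    try:
        parsed = str(uuid.UUID(value)).replace("-", "")
        return parsed == value.replace("-", "").lower()
    except (TypeError, ValueError, AttributeError):
        return False


def fixed_network_name(fixed_network: str, names_by_id: dict) -> str:
    """Map a UUID-like fixed_network to its name; pass names through unchanged."""
    if is_uuid_like(fixed_network):
        return names_by_id[fixed_network]
    return fixed_network
```

A value that is already a name, such as `private`, is returned as-is, which keeps the helper safe to call on every cluster.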
Zuul 04fd0470ad Merge "k8s: stop introspecting instance name" 2019-08-08 19:50:58 +00:00
Mohammed Naser 2f2d05c826 k8s: stop introspecting instance name
We kept introspecting the name of the instance with the assumption
that the network always existed under .novalocal

This is not always the case: certain settings inside Neutron can
change this, leading to failing deploys.

With this change, we pass the instance name directly to the cluster
and therefore we always have the accurate name.

Task: 36160
Story: 2006371

Change-Id: I2ba32844b822ffc14da043e6ef7d071bb62a22ee
2019-08-07 21:24:06 +00:00
Zuul f1cf3d0b38 Merge "Support auto_healing_controller" 2019-08-06 08:40:25 +00:00
Bharat Kunwar 425fb0fa32 Add network config to stabilise multi-NIC scenario
When there is more than one NIC attached to an instance, openstack cloud
provider returns a random InternalIP back to the host resulting in instability
with API server which only talks to a default interface.

This patch incorporates the changes made in
https://github.com/kubernetes/cloud-provider-openstack/pull/444 which enables
OpenStack Cloud Controller Manager (OCCM) to respect the
`internal-network-name` in cloud-config file which ensures that InternalIP
remains stable.

Uses a separate cloud-config file for OCCM to ensure in-tree Cinder volumes
remain compatible.

Change-Id: Idfa52ed2d512e7dc383a556371e896205dd542f9
Story: 2005333
Task: 30271
2019-07-29 09:07:26 +00:00
Lingxian Kong 52155f0e76 Support auto_healing_controller
This patch allows the user to choose the auto-healing service by
introducing a new label 'auto_healing_controller', currently, 'draino'
and 'magnum-auto-healer'[1] are supported. 'draino' is the default value
for backward compatibility.

[1]: https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/using-magnum-auto-healer.md

Change-Id: I7ff14837a8d7d360b72c8f40733e84c88c4269d4
2019-07-24 17:52:33 +12:00
Zuul 1963fce81a Merge "Add npd_enabled label" 2019-07-10 00:35:48 +00:00
Diogo Guerra 41b83cef43 [k8s] Update prometheus monitoring helm based configuration
* prometheus-operator chart version upgraded from 0.1.31 to 5.12.3
* Fix an issue where when using Feature Gate Priority the scheduler
would evict the prometheus monitoring node-exporter pods
* Fix an issue where intensive CPU utilization would make the
metrics fail intermittently or completely
* Prometheus resources are now calculated based on the MAX_NODE_COUNT
requested
* Change the sampling rate from the standard 30s to 1 minute (Rollback)
* Add the missing tiller CONTAINER_INFRA_PREFIX variable to the ConfigMap
* Add label prometheus_operator_chart_tag to enable the user to
specify the stable/prometheus-operator chart to use
* Fix breaking changes on CoreDNS metrics introduced by
8fb27da2fc
* Fix Grafana dashboard not showing data.


Change-Id: If42873cd6668c07e4e911e4eef5e4ae2232be66f
Task: 30777
Task: 30779
Story: 2005588
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
2019-06-25 10:07:55 +00:00
Diogo Guerra 10a5996e32 Add npd_enabled label
Change-Id: Id3c5fdda6424d1a51f2e60ae26ca3069d93e00ee
Story: 2004782
Task: 34192
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
2019-06-20 19:01:42 +02:00
Mohammed Naser cd26be16c6 calico: drop calico_cni_tag
This variable was not being used anywhere so it was an extra
parameter that served no purpose.

Change-Id: I7ae84ab6683530d95a8bca51487558b381f9cef2
2019-06-18 16:36:22 -04:00
Feilong Wang 05c27f2d73 [k8s][fedora atomic] Rolling upgrade support
Rolling upgrade is an important feature for a managed k8s service.
At this stage, two use cases will be covered:

1. Upgrade base operating system
2. Upgrade k8s version

Known limitation: When doing operating system upgrade, there is no
chance to call kubectl drain to evict pods on that node.

Task: 30185
Story: 2002210

Change-Id: Ibbed59bc135969174a20e5243ff8464908801a23
2019-06-07 14:48:08 +12:00