* Add monitoring_ingress_enabled magnum label to set up ingress with
path based routing for all the configured services
{alertmanager,grafana,prometheus}. When using this,
cluster_root_domain_name magnum label must be used to setup base path
where this services are available.
* Add cluster_basic_auth_secret magnum label to configure basic auth
on unprotected services {alertmanager and prometheus}. This is only
in effect when app access is routed by ingress.
* Set services logFormat to json to enable easier machine log parsing.
task: 39477
story: 2006765
Depends-On: Ieb90605182626869528349a7fdeed65061914bcb
Change-Id: Ie0e7000e0d94b2037f2c398fa67a2a2b7e256bc3
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
* Add metrics_retention_days magnum label allowing user to specify
prometheus server scraped metrics retention days (default: 14)
* Add metrics_retention_size magnum label allowing user to specify
prometheus server metrics storage maximum size in Gib (default: 14)
* Add metrics_scrape_interval allowing user to specify prometheus
scrape frequency in seconds (default: 30)
* Add metrics_storage_class_name allowing user to specify the
storageClass to use as external retention for pod fail-over data
persistency
task: 39509
story: 2006765
Change-Id: I42117837e8e3cd03f3cb723df4d73692ead0d169
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
Create certificates for kubelet and kube-proxy on control-plane
nodes similar to worker nodes. Use the secure kube-apiserver
port on control-plane nodes.
story: 2008524
task: 41602
Change-Id: Ibeb32a24ca25914cab32c63a9ccafaf711148a84
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
When using delete_on_termination and the booting of the instance fails
on the first attempt, the second attempt will fail with Heat. The
reason is that with delete_on_termination set to True, Nova will delete
the volume when Heat deletes the ERROR'd instance and it will then
result in the follow-up boot to fail with an error along the line of
unable to find volume, which masks the real failure from the user (which
could potentialy be aquota issue).
With this patch, we no longer set this and instead use the default of
false. This will not mean we will leak volumes because when we delete
the stack, Heat will do all the right things and delete them in order,
making sure the volume disappears eventually.
Change-Id: I362cea7bf57825035d13d234d0181a2b1fca5743
Without this, heat container agents using kubectl version
1.18.x (e.g. ussuri-dev) fail because they do not have the correct
KUBECONFIG in the environment.
Task: 39938
Story: 2007591
Change-Id: Ifc212478ae09c658adeb6ba4c8e8afc8943e3977
- Refactor helm installer to use a single meta chart install job
install job and config which use Helm v3 client.
- Use upstream helm client binary instead of using helm-client container
maintained by us. To verify checksum, helm_client_sha256 label is
introduced for helm_client_tag (or alternatively for URL specified
using new helm_client_url label).
- Default helm_client_tag=v3.2.1.
- Default tiller_tag=v2.16.7, tiller_enabled=false.
Story: 2007514
Task: 39295
Change-Id: I9b9633c81afb08b91576a9a4d3c5a0c445e0cee4
In the heat-agent we use kubectl to install
several deployments, it is better if we use
matching versions of kubectl and apiserver
to minimize errors. Additionally, the
heat-agent won't need kubectl anymore.
story: 2007591
task: 39536
Change-Id: If8f6d84efc70606ac0d888c084c82d8c7eff54f8
Signed-off-by: Spyros Trigazis <strigazi@gmail.com>
Heapster has been deprecated for a while and the new k8s dashboard
2.0.0 version supports metrics-server now. So it's time to upgrade
the default k8s dashboard to v2.0.0.
Task: 39101
Story: 2007256
Change-Id: I02f8cb77b472142f42ecc59a339555e60f5f38d0
A new config option `post_install_manifest_url` is added to support
installing cloud provider/vendor specific manifest after booted
the k8s cluster. It's an URL pointing to the manifest file. For
example, cloud admin can set their specific storageclass into
this file, then it will be automatically setup after created
the cluster.
Task: 35798
Story: 2006209
Change-Id: Ib5a2c5cd7970085db941f189613e175f622aea3f
Add support for out of tree Cinder CSI. This is installed when the
cinder_csi_enabled=true label is added. This will allow us to eventually
deprecate in-tree Cinder.
story: 2007048
task: 37868
Change-Id: I8305b9f8c9c37518ec39198693adb6f18542bf2e
Signed-off-by: Bharat Kunwar <brtknr@bath.edu>
For a multi AZ env, if Nova doesn't support cross AZ volume mount,
then the cluster creation may fail because of block device mapping
error. The patch fixes this issue by passing in the AZ information
when creating volumes for etcd, docker and the node root disk.
Task: 38131
Story: 2007097
Change-Id: I39c99259abc84cbbee50ac1a827e9349ede6593c
IPIP Mode to use for the IPv4 POOL created at start up
allowed_values: ["Always", "CrossSubnet", "Never", "Off"]
default: "Off"
Change-Id: Ib834a1f86a6db408047cc8f86fc7744d16d83904
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
Due to the big changes recently to support k8s rolling upgrade, a
regression issue was introduced which is broken the proxy function
for image downloading. This patch fixes it for both fedor atomic
driver and fedora coreos driver.
Task: 37784
Story: 2007005
Change-Id: I11113d69629e1a97a58e5270f67c7404292b45c3
Magnum allows to use CONTAINER_INFRA_PREFIX to specify a local
repository from which we can pull container images. This repository
defaults to the upstream one that is specified in the metrics helm
chart.
* This patch allows for the usage of CONTAINER_INFRA_PREFIX to
correctly configure the pull of the metric-server container image
from the specified repo.
* Add label metrics_server_chart_tag to allow user to specify
stable/metrics-server chart tag to use
* Add label metrics_server_enabled to allow enable/disable of
component (defaults: true)
Story: 2004816
Task: 37390
Change-Id: Idc315937a82317b76349bbe8466d900d00194953
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
This will install the prometheus-adapter stable
helm chart. Requires monitoring_enabled=true.
The chart version can be configured using
prometheus_adapter_chart_tag and an option is
available to overwrite the default configuration
rules for a user defined ConfigMap referenced
by using prometheus_adapter_configmap label.
story: 2006765
task: 37278
Change-Id: I5b86f4455f88c8dbeac6e56942e1ca55f1d1726c
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
Additioanlly, bumping up the Chart version to 1.24.7 without which the
ingress controller fails to deploy on 1.16.x.
Additionally, bump up nginx_ingress_controller_tag version to 0.26.1.
This is to ensure that we are running an up to date nginx ingress
controller with fixes for known CVEs.
Story: 2006853
Task: 37444
Change-Id: Ibf045a06d19b02095e19d9a21d14a91a39a3751c
Choose whether system containers etcd, kubernetes and the heat-agent will be
installed with podman or atomic. This label is relevant for k8s_fedora drivers.
k8s_fedora_atomic_v1 defaults to use_podman=false, meaning atomic will be used
pulling containers from docker.io/openstackmagnum. use_podman=true is accepted
as well, which will pull containers by k8s.gcr.io.
k8s_fedora_coreos_v1 defaults and accepts only use_podman=true.
Fix upgrade for k8s_fedora_coreos_v1 and magnum-cordon systemd unit.
Task: 37242
Story: 2005201
Change-Id: I0d5e4e059cd4f0458746df7c09d2fd47c389c6a0
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
Along with the kubernetes version upgrade support we just released, we're
adding the support to upgrade the operating system of the k8s cluster
(including master and worker nodes). It's an inplace upgrade leveraging the
atomic/ostree upgrade capability.
Story: 2002210
Task: 33607
Change-Id: If6b9c054bbf5395c30e2803314e5695a531c22bc
With this change each node will be labeled with the following:
* --node-labels=magnum.openstack.org/role=${NODEGROUP_ROLE}
* --node-labels=magnum.openstack.org/nodegroup=${NODEGROUP_NAME}
Change-Id: Ic410a059b19a1252cdf6eed786964c5c7b03d01c
Resource creation order in kubernetes templates for Fedora Atomic
was changed to avoid neutron bug https://bugs.launchpad.net/neutron/+bug/1845360
Floating IP should be assigned to network port after instance creation
Change-Id: Ib7e0503d475d7cd3164a116c3a0325c4ae417a0a
Story: 2006631
Task: 36844
Support boot from volume for Kubernetes all nodes (master and worker)
so that user can create a big size root volume, which could be more
flexible than using docker_volume_size. And user can specify the
volume type so that user can leverage high performance storage, e.g.
NVMe etc.
And a new label etcd_volme_type is added as well so that user can
set volume type for etcd volume.
If the boot_volume_type or etcd_volume_type are not passed by labels,
Magnum will try to read them from config option
default_boot_volume_type and default_etcd_volume_type. A random
volume type from Cinder will be used if those options are not set.
Task: 30374
Story: 2005386
Co-Authorized-By: Feilong Wang<flwang@catalyst.net.nz>
Change-Id: I39dd456bfa285bf06dd948d11c86867fc03d5afb
Sometimes, the fixed_network value gets rendered as UUID. However OCCM's
internal-network-name requires the network name, it does not support
UUID. This patch introduces a new parameter called fixed_network_name
which converts fixed_network UUID to name if it is UUID-like.
Story: 2005333
Task: 36313
Change-Id: I3453bc0dbea285687d39c9782685cb1f2a3ecd39
We kept introspecting the name of the instance with the assumption
that the network always existed under .novalocal
This is not always the case, with certain variables changed inside
Neutron it is possible to control this, therefore, leading in failing
deploys.
With this change, we pass the instance name directly to the cluster
and therefore we always have the accurate name.
Task: 36160
Story: 2006371
Change-Id: I2ba32844b822ffc14da043e6ef7d071bb62a22ee
When there is more than one NIC attached to an instance, openstack cloud
provider returns a random InternalIP back to the host resulting in instability
with API server which only talks to a default interface.
This patch incorporates the changes made in
https://github.com/kubernetes/cloud-provider-openstack/pull/444 which enables
OpenStack Cloud Controller Manager (OCCM) to respect the
`internal-network-name` in cloud-config file which ensures that InternalIP
remains stable.
Uses a separate cloud-config file for OCCM to ensure in-tree Cinder volumes
remain compatible.
Change-Id: Idfa52ed2d512e7dc383a556371e896205dd542f9
Story: 2005333
Task: 30271
* prometheus-operator chart version upgraded from 0.1.31. to 5.12.3
* Fix an issue where when using Feature Gate Priority the scheduler
would evict the prometheus monitoring node-exporter pods
* Fix an issue where intensive CPU utilization would make the
metrics fail intermitently or completly fail
* Prometheus resources are now calculated based on the MAX_NODE_COUNT
requested
* Change the sampling rate from the standard 30s to 1 minute (Rollback)
* Add the missing tiller CONTAINER_INFRA_PREFIX variable to the ConfigMap
* Add label prometheus_operator_chart_tag to enable the user to
specify the stable/prometheus-operator chart to use
* Fix breaking changes on CoreDNS metrics introduced by
8fb27da2fc
* Fix Graphana dashboard not showing data.
Change-Id: If42873cd6668c07e4e911e4eef5e4ae2232be66f
Task: 30777
Task: 30779
Story: 2005588
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
Rolling ugprade is an important feature for a managed k8s service,
at this stage, two user cases will be covered:
1. Upgrade base operating system
2. Upgrade k8s version
Known limitation: When doing operating system upgrade, there is no
chance to call kubectl drain to evict pods on that node.
Task: 30185
Story: 2002210
Change-Id: Ibbed59bc135969174a20e5243ff8464908801a23