Commit Graph

79 Commits

Author SHA1 Message Date
Michal Nasiadka ed699b0c9a Drop k8s_fedora_atomic_v1 driver
Change-Id: I3551ae244ecf99f67a9b142c964c020a5fae70a3
2024-02-27 16:35:35 +00:00
Jake Yip 679a174a0a Refix --registry-enabled
This fixes an issue with --registry-enabled that was previously fixed [1] but
somehow dropped after a refactoring [2]

[1] Change Ib93a7c0f761d047da3408703a5cf4208821acb33
[2] Change Ibbed59bc135969174a20e5243ff8464908801a23

Task: 41306
Story: 2008383
Change-Id: I76fedd34edec55f5a906a96672529ed15775f5da
2021-11-25 12:41:18 +00:00
Bharat Kunwar fc1f27a569 Support hyperkube_prefix label
Additionally for k8s_fedora_coreos_v1 driver:
* Introduce hyperkube_prefix which defaults to k8s.gcr.io/
* Bump default kube_tag to v1.18.16

Story: 1668998
Task: 41791

Change-Id: I38b8df45a00f1a2a1604059b8329d1dd762e05cd
2021-02-18 13:18:56 +00:00
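
As a rough sketch of how the two labels from the hyperkube_prefix commit above might combine into an image reference: the function name `hyperkube_image` and the image name `hyperkube` are assumptions made for illustration, and only the two default values come from the commit message.

```shell
# Hypothetical helper, not part of Magnum: compose an image reference from
# the hyperkube_prefix and kube_tag labels, falling back to the defaults
# named in the commit message when a label is empty or unset.
hyperkube_image() {
    prefix="${1:-k8s.gcr.io/}"
    tag="${2:-v1.18.16}"
    # The image name "hyperkube" is an assumption for this sketch.
    printf '%shyperkube:%s\n' "$prefix" "$tag"
}
```

Called with a private registry prefix, for example `hyperkube_image "registry.example.com/mirror/" "v1.18.2"`, the helper keeps the trailing slash convention that the default prefix uses.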
Mohammed Naser 2c63aca8c6 Stop using delete_on_termination for BFV instances
When using delete_on_termination and the booting of the instance fails
on the first attempt, the second attempt will fail with Heat.  The
reason is that with delete_on_termination set to True, Nova will delete
the volume when Heat deletes the ERROR'd instance, which then causes
the follow-up boot to fail with an error along the lines of
unable to find volume. That masks the real failure from the user (which
could potentially be a quota issue).

With this patch, we no longer set this and instead use the default of
false.  This does not mean we will leak volumes, because when we delete
the stack, Heat will do all the right things and delete them in order,
making sure the volume disappears eventually.

Change-Id: I362cea7bf57825035d13d234d0181a2b1fca5743
2020-08-26 20:53:06 -04:00
Bharat Kunwar 799563eb61 Remove shebang from scripts
Without this, heat container agents using kubectl version
1.18.x (e.g. ussuri-dev) fail because they do not have the correct
KUBECONFIG in the environment.

Task: 39938
Story: 2007591

Change-Id: Ifc212478ae09c658adeb6ba4c8e8afc8943e3977
2020-06-16 20:53:07 +00:00
Spyros Trigazis 40f40b7772 k8s: Use the same kubectl version as API
In the heat-agent we use kubectl to install
several deployments. It is better if we use
matching versions of kubectl and apiserver
to minimize errors. Additionally, the
heat-agent won't need kubectl anymore.

story: 2007591
task: 39536

Change-Id: If8f6d84efc70606ac0d888c084c82d8c7eff54f8
Signed-off-by: Spyros Trigazis <strigazi@gmail.com>
2020-04-24 17:11:13 +00:00
Bharat Kunwar fd80e1989f Add selinux_mode label
Fedora Atomic default: permissive
Fedora CoreOS default: enforcing

Story: 2007413
Task: 39033

Change-Id: Ibc1e02098155ac95bb35fcea5f21cc380bdf0d03
Signed-off-by: Bharat Kunwar <brtknr@bath.edu>
2020-03-28 17:57:25 +00:00
Spyros Trigazis de21e0431a Add opt-in containerd support
New labels:
container_runtime, containerd or fallback to host-docker
containerd_version, taken from https://github.com/containerd/containerd/releases
containerd_tarball_url, eg https://storage.googleapis.com/cri-containerd-release/cri-containerd-1.2.4.linux-amd64.tar.gz
containerd_tarball_sha256, sha256 of the above tarball

story: 2007317
task: 38823

Change-Id: I6c6599cdee61f508bd2a5e4c454da3125a256753
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
2020-02-20 15:47:40 +00:00
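
The four labels from the containerd commit above could be assembled roughly as follows. This is a sketch under assumptions: the helper name `containerd_labels` is hypothetical, the tarball is assumed to be already downloaded locally, and the output is the comma-separated `key=value` form that label options usually take.

```shell
# Hypothetical helper: print a comma-separated label string for opting in
# to containerd. The sha256 is computed from a local copy of the tarball.
containerd_labels() {
    version="$1"   # value for containerd_version
    url="$2"       # value for containerd_tarball_url
    tarball="$3"   # local path of the downloaded tarball
    sha="$(sha256sum "$tarball" | cut -d' ' -f1)"
    printf 'container_runtime=containerd,containerd_version=%s,containerd_tarball_url=%s,containerd_tarball_sha256=%s\n' \
        "$version" "$url" "$sha"
}
```

Computing the checksum from the file that was actually downloaded, rather than copying it from a release page, keeps the containerd_tarball_sha256 label consistent with the tarball the nodes will fetch.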
Feilong Wang a0e62df093 [k8s] Fix volumes availability zone issue
For a multi AZ env, if Nova doesn't support cross AZ volume mount,
then the cluster creation may fail because of block device mapping
error. The patch fixes this issue by passing in the AZ information
when creating volumes for etcd, docker and the node root disk.

Task: 38131
Story: 2007097

Change-Id: I39c99259abc84cbbee50ac1a827e9349ede6593c
2020-01-16 12:41:26 +13:00
Feilong Wang ad2ef4962c Fix proxy issue for k8s fedora drivers
Due to the recent big changes to support k8s rolling upgrade, a
regression was introduced which broke the proxy function
for image downloading. This patch fixes it for both the fedora atomic
driver and the fedora coreos driver.

Task: 37784
Story: 2007005

Change-Id: I11113d69629e1a97a58e5270f67c7404292b45c3
2019-12-20 09:40:00 +13:00
Spyros Trigazis aa6b3bbeba k8s_fedora: Add use_podman label
Choose whether system containers etcd, kubernetes and the heat-agent will be
installed with podman or atomic. This label is relevant for k8s_fedora drivers.

k8s_fedora_atomic_v1 defaults to use_podman=false, meaning atomic will be used,
pulling containers from docker.io/openstackmagnum. use_podman=true is accepted
as well, which will pull containers from k8s.gcr.io.

k8s_fedora_coreos_v1 defaults and accepts only use_podman=true.

Fix upgrade for k8s_fedora_coreos_v1 and magnum-cordon systemd unit.

Task: 37242
Story: 2005201

Change-Id: I0d5e4e059cd4f0458746df7c09d2fd47c389c6a0
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
2019-10-23 10:43:52 +00:00
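
The per-driver defaults described in the use_podman commit above can be summarised in a small sketch. The function name `resolve_use_podman` is hypothetical and this is not Magnum's validation code; it only restates the accepted values from the commit message.

```shell
# Hypothetical helper: resolve the effective use_podman value for a driver.
# Prints the value on success; returns non-zero for a rejected combination.
resolve_use_podman() {
    driver="$1"
    requested="${2:-}"   # empty means the label was not set
    case "$driver" in
        k8s_fedora_atomic_v1)
            # Defaults to false; both true and false are accepted.
            case "${requested:-false}" in
                true|false) printf '%s\n' "${requested:-false}" ;;
                *) return 1 ;;
            esac
            ;;
        k8s_fedora_coreos_v1)
            # Defaults to true and accepts only true.
            case "${requested:-true}" in
                true) printf 'true\n' ;;
                *) return 1 ;;
            esac
            ;;
        *) return 1 ;;
    esac
}
```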
Fei Long Wang 09f85f3746 [fedora-atomic][k8s] Support operating system upgrade
Along with the kubernetes version upgrade support we just released, we're
adding support for upgrading the operating system of the k8s cluster
(including master and worker nodes). It's an in-place upgrade leveraging the
atomic/ostree upgrade capability.

Story: 2002210
Task: 33607

Change-Id: If6b9c054bbf5395c30e2803314e5695a531c22bc
2019-10-18 14:44:27 +00:00
Theodoros Tsioutsias 113fdc44b2 ng-12: Label nodegroup nodes
With this change each node will be labeled with the following:
* --node-labels=magnum.openstack.org/role=${NODEGROUP_ROLE}
* --node-labels=magnum.openstack.org/nodegroup=${NODEGROUP_NAME}

Change-Id: Ic410a059b19a1252cdf6eed786964c5c7b03d01c
2019-10-16 11:53:44 +00:00
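
One way the node labels from the ng-12 commit above might be used is to select nodes by nodegroup or role. The helper names below are hypothetical; only the two label keys come from the commit message.

```shell
# Hypothetical helpers: build label selectors for the two node labels
# that each nodegroup node carries after this change.
nodegroup_selector() { printf 'magnum.openstack.org/nodegroup=%s\n' "$1"; }
role_selector() { printf 'magnum.openstack.org/role=%s\n' "$1"; }
```

With access to the cluster, a selector can be passed to kubectl, for example `kubectl get nodes -l "$(nodegroup_selector default-worker)"`, where `default-worker` is a placeholder nodegroup name.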
Stanislav Dmitriev cd054f20ac Change the order of resource creation
Resource creation order in kubernetes templates for Fedora Atomic
was changed to avoid neutron bug https://bugs.launchpad.net/neutron/+bug/1845360.
The floating IP should be assigned to the network port after instance creation.

Change-Id: Ib7e0503d475d7cd3164a116c3a0325c4ae417a0a
Story: 2006631
Task: 36844
2019-10-01 18:29:05 +00:00
Mohammed Naser cfe2753fd3 [fedora atomic k8s] Add boot from volume support
Support boot from volume for all Kubernetes nodes (master and worker)
so that users can create a large root volume, which can be more
flexible than using docker_volume_size. Users can also specify the
volume type to leverage high performance storage, e.g.
NVMe etc.

A new label etcd_volume_type is added as well so that users can
set the volume type for the etcd volume.

If the boot_volume_type or etcd_volume_type are not passed by labels,
Magnum will try to read them from config option
default_boot_volume_type and default_etcd_volume_type. A random
volume type from Cinder will be used if those options are not set.

Task: 30374
Story: 2005386

Co-Authorized-By: Feilong Wang<flwang@catalyst.net.nz>

Change-Id: I39dd456bfa285bf06dd948d11c86867fc03d5afb
2019-09-20 05:00:29 +00:00
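
The fallback order for volume types described in the boot from volume commit above (label, then config option, then a random Cinder volume type) can be sketched as below. The function name `resolve_volume_type` is hypothetical and the list of Cinder types is passed in by the caller; Magnum's real implementation is not shown in this log.

```shell
# Hypothetical helper: pick a volume type using the fallback order from the
# commit message. Arguments: label value, config default, then any number
# of available Cinder volume types.
resolve_volume_type() {
    label_value="$1"      # e.g. the boot_volume_type label, may be empty
    config_default="$2"   # e.g. default_boot_volume_type, may be empty
    shift 2
    if [ -n "$label_value" ]; then
        printf '%s\n' "$label_value"
    elif [ -n "$config_default" ]; then
        printf '%s\n' "$config_default"
    elif [ "$#" -gt 0 ]; then
        # Neither is set: pick one of the remaining arguments at random.
        printf '%s\n' "$@" | shuf -n 1
    else
        return 1
    fi
}
```

Because the last fallback is random, setting either the label or the config option is the only way to get a predictable volume type.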
Zuul 04fd0470ad Merge "k8s: stop introspecting instance name" 2019-08-08 19:50:58 +00:00
Mohammed Naser 2f2d05c826 k8s: stop introspecting instance name
We kept introspecting the name of the instance with the assumption
that the network always existed under .novalocal

This is not always the case. Certain variables inside Neutron
control this, and changing them leads to failing
deploys.

With this change, we pass the instance name directly to the cluster
and therefore we always have the accurate name.

Task: 36160
Story: 2006371

Change-Id: I2ba32844b822ffc14da043e6ef7d071bb62a22ee
2019-08-07 21:24:06 +00:00
Lingxian Kong 52155f0e76 Support auto_healing_controller
This patch allows the user to choose the auto-healing service by
introducing a new label 'auto_healing_controller'. Currently, 'draino'
and 'magnum-auto-healer'[1] are supported. 'draino' is the default value
for backward compatibility.

[1]: https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/using-magnum-auto-healer.md

Change-Id: I7ff14837a8d7d360b72c8f40733e84c88c4269d4
2019-07-24 17:52:33 +12:00
Diogo Guerra 10a5996e32 Add npd_enabled label
Change-Id: Id3c5fdda6424d1a51f2e60ae26ca3069d93e00ee
Story: 2004782
Task: 34192
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
2019-06-20 19:01:42 +02:00
Feilong Wang 05c27f2d73 [k8s][fedora atomic] Rolling upgrade support
Rolling upgrade is an important feature for a managed k8s service.
At this stage, two use cases will be covered:

1. Upgrade base operating system
2. Upgrade k8s version

Known limitation: When doing operating system upgrade, there is no
chance to call kubectl drain to evict pods on that node.

Task: 30185
Story: 2002210

Change-Id: Ibbed59bc135969174a20e5243ff8464908801a23
2019-06-07 14:48:08 +12:00
Zuul 4bd3d1cd8c Merge "Fix registry on k8s_fedora_atomic" 2019-04-17 08:48:28 +00:00
Feilong Wang 75fab6ff37 [fedora_atomic] Support auto healing for k8s
Using Node Problem Detector, Draino and AutoScaler to support
auto healing for K8s cluster. Users can use a new label
"auto_healing_enabled" to turn it on or off.

Meanwhile, a new label "auto_scaling_enabled" is also introduced
to enable the capability to let the k8s cluster auto scale based
on its workload.

Task: 28923
Story: 2004782

Change-Id: I25af2a72a7a960205929374d2300bd83d4d20960
2019-04-17 14:47:39 +12:00
Adolfo R. Brandes 00522c5ba2 Fix registry on k8s_fedora_atomic
This fixes an issue with --registry-enabled in k8s_fedora_atomic where
the registry container fails to start in the minion due to two missing
heat parameters: TRUSTEE_USERNAME and TRUSTEE_DOMAIN_ID.

Change-Id: Ib93a7c0f761d047da3408703a5cf4208821acb33
Task: 23067
Story: 2003033
2019-04-12 11:42:43 -03:00
Spyros Trigazis 2ab874a5be [k8s] Make flannel self-hosted
Similar to calico, deploy flannel as a DS.
Flannel can use the kubernetes API to store
data, so it doesn't need to contact the etcd
server directly anymore.

This patch drops two relatively large files for
flannel's config, flannel-config-service.sh and
write-flannel-config.sh. All required config is
in the manifests.

Additional options to the controller manager:
--allocate-node-cidrs=true and --cluster-cidr.

Change-Id: I4f1129e155e2602299394b5866165260f4ea0df8
story: 2002751
task: 24870
2019-03-05 18:33:45 +01:00
Feilong Wang 20d03919fb Return instance ID of worker node
Return the nova instance UUID of worker nodes in kubeminion
templates. We will be able to remove resources from the
ResourceGroups based on nova instance uuid.

Backstory:
In heat a ResourceGroup creates a stack of depth 2. ResourceGroups
support removal policies to declare which resources must be removed.
This can be done by passing the index of the resource or the stack_id
of the nested stack. If a stack update call receives a list of
indices (eg [0, 5, 3]) or nested stack uuid (eg [uuidA, uuidB]), it
will remove the corresponding nested stacks.

In magnum's heat templates, a nested stack logically represents a
nova compute instance which is a cluster node. Using composition in
heat, we can change the way a resource group references the nested
stacks. This proposes to use the nova instance uuid as
'OS::stack_id'.

With this change, an external consumer of the stack (the cluster
autoscaler or an actual user) can remove resources from the
ResourceGroup using the nova instance uuid or resource index. Without
this change, a user or system (which typically knows the name,
server uuid or ip) would have to find which nested stack a
kubernetes node belongs to, resulting in multiple calls to heat.

The end result of this patch can be verified like this:
nested_stack_id=$(openstack stack resource show <STACK_ID_OR_NAME> kube_minions -c physical_resource_id -f value)
openstack stack show "${nested_stack_id}"

Task: 29664
Story: 2005054

Change-Id: I6d776f62d640c72b3228460392b92df94fe56fe6
2019-02-27 10:46:41 +01:00
Spyros Trigazis b2a6a7715a [k8s_fedora] Add heat-agent to worker nodes
Start/Install heat agent in worker nodes.

task: 29140
story: 2002210
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>

Change-Id: If39d0dff3432ba132b8b56eb21b5aae80ba52450
2019-02-13 09:36:33 +00:00
Feilong Wang b6936894c4 Fix prometheus monitoring
There are 2 changes included in this patch:

1. Using cluster ip instead of fixed ip for grafana service to
make sure the address is reachable.

2. Move node exporter to prometheus-monitoring namespace and
make it as a DaemonSet to collect metrics from master node.

Task: 28468
Story: 2004590

Change-Id: I9090c6dc4b38e1a1466c4c3a6a827d95c089fb41
2019-01-17 11:10:54 +13:00
Spyros Trigazis c98e9525c7 Add heat_container_agent_tag label
Add heat_container_agent_tag label to allow users to select the
heat-agent tag. Stein default: stein-dev

story: 2003992
task: 26936

Change-Id: I6a8d8dbb2ec7bd4b7d01fa7cd790a8966ea88f73
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
2018-10-24 10:40:55 +02:00
Spyros Trigazis 4f121e50c5 [k8s] Add proxy to master and set cluster-cidr
1. pods with host network can not reach coredns or any svc or resolve
their own hostname
2. If webhooks are deployed in the cluster, the apiserver needs to
contact them, which means kube-proxy is required in the master node with
the cluster-cidr set.

Change-Id: Icb8e7c3b8c75a3ab087c818c8580c0c8a9111d30
story: 2003460
task: 24719
2018-08-17 09:54:56 +02:00
Feilong Wang feed29d7ed Using cgroupfs as default cgroup-driver
This is part of the fixes we have recently been doing for k8s v1.11.1. When
testing k8s v1.11.1, we found some small but annoying issues:

1. cgroup-driver with systemd is not working well with Fedora Atomic, so
   we're going to use cgroupfs as the default cgroup-driver.
2. The $ char needs to be escaped in wc-notify-master.sh

Task: 23223
Story: 2003103

Change-Id: I995f5b82abadfdb7f78f7c098ac7a7f1e5c34fd3
2018-08-08 09:27:33 +00:00
Spyros Trigazis 974399a912 k8s_fedora: Add cloud_provider_enabled label
Add 'cloud_provider_enabled' label for the k8s_fedora_atomic
driver. Defaults to true. For specific kubernetes versions if
'cinder' is selected as a 'volume_driver', it is implied that
the cloud provider will be enabled since they are combined.

The motivation for this change is that in environments with
high load to the OpenStack APIs, users might want to disable
the cloud provider.

story: 1775358
task: 1775358

Change-Id: I2920f699654af1f4ba45644ab60a04a3f70918fe
2018-07-13 09:39:08 +02:00
Zuul 9375dc2ae5 Merge "Rename scripts" 2018-07-10 13:48:36 +00:00
Feilong Wang cff4823168 Rename scripts
Scripts are the core of Magnum for COE deployment. To be more
clear and consistent, two changes are proposed in this patch:

1. Rename network related scripts to xxx-flannel-xxx given they
are all for flannel and now we have a calico driver.

2. Add .sh to some scripts to be consistent with others.

Change-Id: I97f3e53b4b43648a4896193fb4ce469dbf42c611
2018-07-10 06:02:20 +12:00
Zuul 3d136642b5 Merge "Revert "Rename scripts"" 2018-07-05 14:40:49 +00:00
Spyros Trigazis 97f086c19f Revert "Rename scripts"
This reverts commit 591a2dc94a.

Change-Id: I38cd4b2d745b811f83480cd298ceadb86898cdf0
2018-07-05 07:57:46 +00:00
Zuul 1eb1f35a75 Merge "Add option to specify Cgroup driver for Kubelet" 2018-06-28 07:49:39 +00:00
Feilong Wang 591a2dc94a Rename scripts
Scripts are the core of Magnum for COE deployment. To be more
clear and consistent, two changes are proposed in this patch:

1. Rename network related scripts to xxx-flannel-xxx given they
are all for flannel and now we have a calico driver.

2. Add .sh to some scripts to be consistent with others.

Change-Id: I1a8dfe21d4ff0c58f7f52ebea05c9b22dff16bf0
2018-06-27 13:40:30 +12:00
Bharat Kunwar ec58c23361 Add option to specify Cgroup driver for Kubelet
This patch allows specification of the Cgroup driver for the Kubelet service.
The necessity of this patch was realised after upgrading Docker to the
new community edition (17.3+), which defaults to the `cgroupfs` Cgroup
driver, whereas Fedora Atomic (version 27) comes with Docker
1.13. The Cgroup drivers need to be identical for the two
services, Docker and Kubelet, to be able to work together.

Story: 2002533
Task: 22079
Change-Id: Ia4b38a63ede59e18c8edb01e93acbb66f1e0b0e4
2018-06-12 12:31:14 +01:00
Lingxian Kong 2cc57c5386 Use Octavia for LoadBalancer type service
In an OpenStack deployment with the Octavia service enabled, Octavia
should be used not only for master nodes high availability, but
also for the k8s LoadBalancer type service implementation.

Change-Id: Ib61f59507510253794a4780a91e49aa6682c8039
Closes-Bug: #1770133
2018-05-30 15:36:24 +12:00
Zuul 095b0146bb Merge "k8s: allow passing extra options to kube daemons" 2018-02-22 19:43:45 +00:00
Ricardo Rocha 4efb58b28d k8s: allow passing extra options to kube daemons
Define a set of new labels to pass additional options to the kubernetes
daemons - kubelet_options, kubeapi_options, kubescheduler_options,
kubecontroller_options, kubeproxy_options.

In all cases the default value is "", meaning no extra options are
passed to the daemons.

Change-Id: Idabe33b1365c7530edc53d1a81dee3c857a4ea47
Closes-Bug: #1701223
2018-02-22 15:54:46 +00:00
Spyros Trigazis d95ba4d1ff Run etcd and flanneld in a system container
In Fedora Atomic 27 etcd and flanneld are removed from the base image.
Install them as system containers.

* update docker-storage configuration
* add etcd and flannel tags as labels

Change-Id: I2103c7c3d50f4b68ddc11abff72bc9e3f22839f3
Closes-Bug: #1735381
2018-02-22 12:30:27 +00:00
Feilong Wang 838b8daf6e Support calico as network driver
Adding calico as a Kubernetes network driver to support the network
policy of Kubernetes. Network policy is a very important feature
for k8s production use. See more information about k8s network
policy at [1] and [2]; for calico, please refer to [3] and [4].

[1] https://kubernetes.io/docs/concepts/services-networking/network-policies/
[2] http://blog.kubernetes.io/2017/10/enforcing-network-policies-in-kubernetes.html
[3] https://www.projectcalico.org/calico-network-policy-comes-to-kubernetes/
[4] https://cloudplatform.googleblog.com/2017/09/network-policy-support-for-kubernetes-with-calico.html

Closes-Bug: #1746379

Change-Id: I135a46cd32a67d73d8e64ac5bbc4debfae6c4568
2018-02-21 14:47:54 +13:00
Ricardo Rocha 53d386dc01 Add label availability_zone
Add a new label 'availability_zone' allowing users to specify the AZ
the nodes should be deployed in. Only one AZ can be passed for this
first implementation.

Change-Id: I9e55d7631191fffa6cc6b9bebbeb4faf2497815b
Partially-Implements: blueprint magnum-availability-zones
2018-02-05 15:03:59 +00:00
Feilong Wang be0609ce88 Support soft-anti-affinity policy for nodes
Currently, there is no guarantee that all nodes of one cluster are
created on different compute hosts. So it would be nice if we can create
a server group and set it with anti-affinity policy to get better HA
for the cluster. This patch proposes to create a server group for master
and minion nodes with soft-anti-affinity policy by default.

Closes-Bug: #1737802

Change-Id: Icc7a73ef55296a58bf00719ca4d1cdcc304fab86
2018-01-24 07:13:48 +13:00
Spyros Trigazis 65dfb2009f Add openstack_ca_file configuration option
In the drivers section of magnum.conf add openstack_ca_file.
This file is expected to be a CA Certificate OR CA bundle
which will be passed to every node and installed
in the host's CA bundle.

Update devstack plugin to use the ssl bundle if tls-proxy is
enabled.

Install the CA for drivers:
k8s_coreos_v1
k8s_fedora_atomic_v1
k8s_fedora_ironic_v1
mesos_ubuntu_v1
swarm_fedora_atomic_v1
swarm_fedora_atomic_v2

Add doc in troubleshooting-guide.

Add release notes.

Closes-Bug: #1580704
Partially-Implements: blueprint heat-agent
Change-Id: Id48fbea187da667a5e7334694c3ec17c8e2504db
2018-01-17 14:58:56 +00:00
yatin 192dc8b1fb [k8s] Add missing verify_ca in minion_wc_notify
Change-Id: I1db23b88097fae77377cce5c56e176e9296f76a2
Partial-Bug: #1663757
2018-01-16 10:54:27 +00:00
Ricardo Rocha 28fff8006a Make docker_storage_driver a str instead of enum
Allow any value to be passed on the docker_storage_driver field by turning it
into a StringField (was EnumField), and remove the constraints limiting the
values to 'devicemapper' and 'overlay'.

Change the docker storage setup to have a generic setup for all drivers with
the exception of 'devicemapper', which keeps its own specific storage config
function. For all others, do the same we already did for overlay (with two
cases for usage of a cinder volume or not) and simply set the storage driver
in the docker configuration to the value provided in the cluster template.

Change-Id: I9aa8f232ce64ece4d439c0a476f463820a499617
Closes-Bug: #1722522
2017-12-14 14:41:09 +00:00
Zuul 86bd89bc43 Merge "k8s_atomic: Add server to kubeconfig" 2017-11-24 09:34:01 +00:00
Kirsten G b07b6f34d5 Add verify_ca configuration parameter
Added configuration parameter, verify_ca, to magnum.conf with default
value of True. This parameter is passed to the heat templates to
indicate whether the cluster nodes validate the Certificate Authority
when making requests to the OpenStack APIs (Keystone, Magnum, Heat).
This configuration parameter can be set to False to disable CA
validation.

Co-Authored-By: Vijendar Komalla <vijendar.komalla@rackspace.com>

Change-Id: Iab02cb1338b811dac0c147378dbd0e63c83f0413
Partial-Bug: #1663757
2017-11-21 10:25:32 -08:00