This fixes an issue with --registry-enabled that was previously fixed [1] but
somehow dropped after a refactoring [2]
[1] Change Ib93a7c0f761d047da3408703a5cf4208821acb33
[2] Change Ibbed59bc135969174a20e5243ff8464908801a23
Task: 41306
Story: 2008383
Change-Id: I76fedd34edec55f5a906a96672529ed15775f5da
When using delete_on_termination and the booting of the instance fails
on the first attempt, the second attempt will fail with Heat. The
reason is that with delete_on_termination set to True, Nova will delete
the volume when Heat deletes the ERROR'd instance and it will then
result in the follow-up boot to fail with an error along the line of
unable to find volume, which masks the real failure from the user (which
could potentialy be aquota issue).
With this patch, we no longer set this and instead use the default of
false. This will not mean we will leak volumes because when we delete
the stack, Heat will do all the right things and delete them in order,
making sure the volume disappears eventually.
Change-Id: I362cea7bf57825035d13d234d0181a2b1fca5743
Without this, heat container agents using kubectl version
1.18.x (e.g. ussuri-dev) fail because they do not have the correct
KUBECONFIG in the environment.
Task: 39938
Story: 2007591
Change-Id: Ifc212478ae09c658adeb6ba4c8e8afc8943e3977
In the heat-agent we use kubectl to install
several deployments, it is better if we use
matching versions of kubectl and apiserver
to minimize errors. Additionally, the
heat-agent won't need kubectl anymore.
story: 2007591
task: 39536
Change-Id: If8f6d84efc70606ac0d888c084c82d8c7eff54f8
Signed-off-by: Spyros Trigazis <strigazi@gmail.com>
For a multi AZ env, if Nova doesn't support cross AZ volume mount,
then the cluster creation may fail because of block device mapping
error. The patch fixes this issue by passing in the AZ information
when creating volumes for etcd, docker and the node root disk.
Task: 38131
Story: 2007097
Change-Id: I39c99259abc84cbbee50ac1a827e9349ede6593c
Due to the big changes recently to support k8s rolling upgrade, a
regression issue was introduced which is broken the proxy function
for image downloading. This patch fixes it for both fedor atomic
driver and fedora coreos driver.
Task: 37784
Story: 2007005
Change-Id: I11113d69629e1a97a58e5270f67c7404292b45c3
Choose whether system containers etcd, kubernetes and the heat-agent will be
installed with podman or atomic. This label is relevant for k8s_fedora drivers.
k8s_fedora_atomic_v1 defaults to use_podman=false, meaning atomic will be used
pulling containers from docker.io/openstackmagnum. use_podman=true is accepted
as well, which will pull containers by k8s.gcr.io.
k8s_fedora_coreos_v1 defaults and accepts only use_podman=true.
Fix upgrade for k8s_fedora_coreos_v1 and magnum-cordon systemd unit.
Task: 37242
Story: 2005201
Change-Id: I0d5e4e059cd4f0458746df7c09d2fd47c389c6a0
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
Along with the kubernetes version upgrade support we just released, we're
adding the support to upgrade the operating system of the k8s cluster
(including master and worker nodes). It's an inplace upgrade leveraging the
atomic/ostree upgrade capability.
Story: 2002210
Task: 33607
Change-Id: If6b9c054bbf5395c30e2803314e5695a531c22bc
With this change each node will be labeled with the following:
* --node-labels=magnum.openstack.org/role=${NODEGROUP_ROLE}
* --node-labels=magnum.openstack.org/nodegroup=${NODEGROUP_NAME}
Change-Id: Ic410a059b19a1252cdf6eed786964c5c7b03d01c
Resource creation order in kubernetes templates for Fedora Atomic
was changed to avoid neutron bug https://bugs.launchpad.net/neutron/+bug/1845360
Floating IP should be assigned to network port after instance creation
Change-Id: Ib7e0503d475d7cd3164a116c3a0325c4ae417a0a
Story: 2006631
Task: 36844
Support boot from volume for Kubernetes all nodes (master and worker)
so that user can create a big size root volume, which could be more
flexible than using docker_volume_size. And user can specify the
volume type so that user can leverage high performance storage, e.g.
NVMe etc.
And a new label etcd_volme_type is added as well so that user can
set volume type for etcd volume.
If the boot_volume_type or etcd_volume_type are not passed by labels,
Magnum will try to read them from config option
default_boot_volume_type and default_etcd_volume_type. A random
volume type from Cinder will be used if those options are not set.
Task: 30374
Story: 2005386
Co-Authorized-By: Feilong Wang<flwang@catalyst.net.nz>
Change-Id: I39dd456bfa285bf06dd948d11c86867fc03d5afb
We kept introspecting the name of the instance with the assumption
that the network always existed under .novalocal
This is not always the case, with certain variables changed inside
Neutron it is possible to control this, therefore, leading in failing
deploys.
With this change, we pass the instance name directly to the cluster
and therefore we always have the accurate name.
Task: 36160
Story: 2006371
Change-Id: I2ba32844b822ffc14da043e6ef7d071bb62a22ee
Rolling ugprade is an important feature for a managed k8s service,
at this stage, two user cases will be covered:
1. Upgrade base operating system
2. Upgrade k8s version
Known limitation: When doing operating system upgrade, there is no
chance to call kubectl drain to evict pods on that node.
Task: 30185
Story: 2002210
Change-Id: Ibbed59bc135969174a20e5243ff8464908801a23
Using Node Problem Detector, Draino and AutoScaler to support
auto healing for K8s cluster, user can use a new label
"auto_healing_enabled' to turn on/off it.
Meanwhile, a new label "auto_scaling_enabled" is also introduced
to enable the capability to let the k8s cluster auto scale based
its workload.
Task: 28923
Story: 2004782
Change-Id: I25af2a72a7a960205929374d2300bd83d4d20960
This fixes an issue with --registry-enabled in k8s_fedora_atomic where
the registry container fails to start in the minion due to two missing
heat parameters: TRUSTEE_USERNAME and TRUSTEE_DOMAIN_ID.
Change-Id: Ib93a7c0f761d047da3408703a5cf4208821acb33
Task: 23067
Story: 2003033
Similar to calico, deploy flannel as a DS.
Flannel can use the kubernetes API to store
data, so it doesn't need to contact the etcd
server directly anymore.
This patch drops to relatively large files for
flannel's config, flannel-config-service.sh and
write-flannel-config.sh. All required config is
in the manifests.
Additional options to the controller manager:
--allocate-node-cidrs=true and --cluster-cidr.
Change-Id: I4f1129e155e2602299394b5866165260f4ea0df8
story: 2002751
task: 24870
Return the nova instance UUID of worker nodes in kubeminion
templates. We will be able to remove resources from the
ResourceGroups based on nova instance uuid.
Backstory:
In heat a ResourceGroup creates a stack of depth 2. ResourceGroups
support removal policies to declare which resources must be removed.
This can be done by passing the index of the resource or the stack_id
of the nested stack. If a stack update call receives a list of
indices (eg [0, 5, 3]) or nested stack uuid (eg [uuidA, uuidB]), it
will remove the corresponding nested stacks.
In magnum's heat templates, a nested stack logically represents a
nova compute instance which is a cluster node. Using composition in
heat, we can change the way a resources group references the nested
stacks. This proposes to use the nova instance uuid as
'OS::stack_id'.
With this change, an external consumer of the stack (the cluster
autoscaler or an actual user) can remove resources from the
ResourceGroup using the nova instance uuid or resource index. Without
this change, a user or system (which typically knows the name,
server uuid or ip) would have to find in which nested stack a
kubernetes node belongs too. Resulting multiple call to heat.
The end result of this patch can be verified like this:
nested_stack_id=$(openstack stack resource show <STACK_ID_OR_NAME> kube_minions -c physical_resource_id -f value)
openstack stack show "${nested_stack_id}"
Task: 29664
Story: 2005054
Change-Id: I6d776f62d640c72b3228460392b92df94fe56fe6
There are 2 changes included in this patch:
1. Using cluster ip instead of fixed ip for grafana service to
make sure the address is reachable.
2. Move node exporter to prometheus-monitoring namespace and
make it as a DaemonSet to collect metrics from master node.
Task: 28468
Story: 2004590
Change-Id: I9090c6dc4b38e1a1466c4c3a6a827d95c089fb41
1. pods with host network can not reach coredns or any svc or resolve
their own hostname
2. If webhooks are deployed in the cluster, the apiserver needs to
contact them, which means kube-proxy is required in the master node with
the cluster-cidr set.
Change-Id: Icb8e7c3b8c75a3ab087c818c8580c0c8a9111d30
story: 2003460
task: 24719
This is a part of fixes for k8s v1.11.1 recently we're doing. When
testing the k8s v1.11.1, we just found some small but annoying issues:
1. cgroup-driver with systemd not working well with Fedora Atomic, so
we're going to use cgroupfs as the default cgroup-driver.
2. The $ char need to be escaped wc-notify-master.sh
Task: 23223
Story: 2003103
Change-Id: I995f5b82abadfdb7f78f7c098ac7a7f1e5c34fd3
Add 'cloud_provider_enabled' label for the k8s_fedora_atomic
driver. Defaults to true. For specific kubernetes versions if
'cinder' is selected as a 'volume_driver', it is implied that
the cloud provider will be enabled since they are combined.
The motivation for this change is that in environments with
high load to the OpenStack APIs, users might want to disable
the cloud provider.
story: 1775358
task: 1775358
Change-Id: I2920f699654af1f4ba45644ab60a04a3f70918fe
Scripts are the core of Magnum for COE deployment. To be more
clear and consistent, two changes proposed in this patch:
1. Rename network related script to xxx-flannel-xxx given they
are all for flannel and now we have calico driver.
2. Adding .sh for some scripts to be consistent with others.
Change-Id: I97f3e53b4b43648a4896193fb4ce469dbf42c611
Scripts are the core of Magnum for COE deployment. To be more
clear and consistent, two changes proposed in this patch:
1. Rename network related script to xxx-flannel-xxx given they
are all for flannel and now we have calico driver.
2. Adding .sh for some scripts to be consistent with others.
Change-Id: I1a8dfe21d4ff0c58f7f52ebea05c9b22dff16bf0
This patch allows specification of Cgroup driver for Kubelet service.
The necessity of this patch was realised after upgrading Docker to the
new community edition (17.3+) which defaults to `cgroupfs` Cgroup
driver but on the other hand, Fedora Atomic (version 27) comes with
1.13. Cgroup drivers for Docker need to be identical for the two
services, Docker and Kubelet, need to be able to work together.
Story: 2002533
Task: 22079
Change-Id: Ia4b38a63ede59e18c8edb01e93acbb66f1e0b0e4
In the OpenStack deployment with Octavia service enabled, the octavia
service should be used not only for master nodes high availability, but
also for k8s LoadBalancer type service implementation as well.
Change-Id: Ib61f59507510253794a4780a91e49aa6682c8039
Closes-Bug: #1770133
Define a set of new labels to pass additional options to the kubernetes
daemons - kubelet_options, kubeapi_options, kubescheduler_options,
kubecontroller_options, kubeproxy_options.
In all cases the default value is "", meaning no extra options are
passed to the daemons.
Change-Id: Idabe33b1365c7530edc53d1a81dee3c857a4ea47
Closes-Bug: #1701223
In Fedora Atomic 27 etcd and flanneld are removed from the base image.
Install them as a system containers.
* update docker-storage configuration
* add etcd and flannel tags as labels
Change-Id: I2103c7c3d50f4b68ddc11abff72bc9e3f22839f3
Closes-Bug: #1735381
Add a new label 'availability_zone' allowing users to specify the AZ
the nodes should be deployed in. Only one AZ can be passed for this
first implementation.
Change-Id: I9e55d7631191fffa6cc6b9bebbeb4faf2497815b
Partially-Implements: blueprint magnum-availability-zones
Currently, there is no guarantee to make sure all nodes of one cluster are
created on different compute hosts. So it would be nice if we can create
a server group and set it with anti-affinity policy to get a better HA
for cluster. This patch is proposing to create a server group for master
and minion nodes with soft-anti-affinity policy by default.
Closes-Bug: #1737802
Change-Id: Icc7a73ef55296a58bf00719ca4d1cdcc304fab86
In the drivers section of magnum.conf add openstack_ca_file.
This file is expected to be a CA Certificate OR CA bundle
which will be passed on every node and it will be installed
on the host's CA bundle.
Update devstack plugin to use the ssl bundle if tls-proxy is
enabled.
Install the CA for drivers:
k8s_coreos_v1
k8s_fedora_atomic_v1
k8s_fedora_ironic_v1
mesos_ubuntu_v1
swarm_fedora_atomic_v1
swarm_fedora_atomic_v2
Add doc in troubleshooting-guide.
Add release notes.
Closes-Bug: #1580704
Partially-Implements: blueprint heat-agent
Change-Id: Id48fbea187da667a5e7334694c3ec17c8e2504db
Allow any value to be passed on the docker_storage_driver field by turning it
into a StringField (was EnumField), and remove the constraints limiting the
values to 'devicemapper' and 'overlay'.
Change the docker storage setup to have a generic setup for all drivers with
the exception of 'devicemapper', which keeps its own specific storage config
function. For all others, do the same we already did for overlay (with two
cases for usage of a cinder volume or not) and simply set the storage driver
in the docker configuration to the value provided in the cluster template.
Change-Id: I9aa8f232ce64ece4d439c0a476f463820a499617
Closes-Bug: #1722522
Added configuration parameter, verify_ca, to magnum.conf with default
value of True. This parameter is passed to the heat templates to
indicate whether the cluster nodes validate the Certificate Authority
when making requests to the OpenStack APIs (Keystone, Magnum, Heat).
This configuration parameter can be set to False to disable CA
validation.
Co-Authored-By: Vijendar Komalla <vijendar.komalla@rackspace.com>
Change-Id: Iab02cb1338b811dac0c147378dbd0e63c83f0413
Partial-Bug: #1663757