Allow ClusterTemplate to explicitly specify a driver to use for creating
Clusters.
This is initially sourced from the image property 'magnum_driver', but
may be improved to be specified via client in the future.
Falls back to old driver discovery using (coe, server_type, os) tuple to
keep existing behaviour.
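The selection order described above (an explicit 'magnum_driver' image property first, then the legacy (coe, server_type, os) lookup) can be sketched as follows. This is a minimal illustration, not Magnum's code: select_driver and the LEGACY_DRIVERS registry, including its single entry, are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical registry keyed by the legacy (coe, server_type, os) tuple.
LEGACY_DRIVERS: Dict[Tuple[str, str, str], str] = {
    ("kubernetes", "vm", "fedora-coreos"): "k8s_fedora_coreos_v1",
}


def select_driver(image_properties: Dict[str, str],
                  coe: str, server_type: str, os_distro: str) -> str:
    """Return the driver name for a ClusterTemplate (illustrative only)."""
    # 1. An explicit 'magnum_driver' image property wins.
    explicit: Optional[str] = image_properties.get("magnum_driver")
    if explicit:
        return explicit
    # 2. Otherwise fall back to the old tuple-based discovery.
    try:
        return LEGACY_DRIVERS[(coe, server_type, os_distro)]
    except KeyError:
        raise LookupError(
            f"no driver for ({coe}, {server_type}, {os_distro})") from None
```

Because the explicit property is checked first, existing templates whose images lack the property keep resolving through the old tuple path.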
Change-Id: I9e206b589951a02360d3cef0282a9538236ef53b
The cluster conductor creates trusts for all drivers but does not clean
them up; until now, the Heat driver has performed this cleanup.
This change moves the lifecycle of trust and certificate creation
to the Conductor, so drivers do not need to clean up resources they
didn't create.
Change-Id: I2b3e99589d2d3069191d0727406601f0647a9722
This is part of the steps to remove usage of six library, which is no
longer needed since python 2 support was removed.
Change-Id: I9a750de4f1ba7017c9dfd67dbf87be138421d017
Heat stack SoftwareConfig is unable to provide a reliable upgrade
experience, so is being disabled. More details in code comments.
A Cluster API driver provides a way forward for Magnum to support
upgrades again and to implement upgrade_cluster.
Change-Id: Ibea354ebfe36e8d689a95c30820709ec2b633964
In Change I523a4a85867f82d234ba1f3e6fad8b8cd2291182, the pep8 test was
accidentally dropped.
Fix up code so that pep8 passes.
In addition, the following change has been added here to unbreak CI:
Add WebTest as an indirect test dependency
Pecan has made webtest an optional dependency for testing only [1].
Since it is still used for testing we need to add it to our
test-requirements.txt.
[1]: https://github.com/pecan/pecan/pull/140
Change-Id: I2f85adb4ef29a43389897c201e6152fd4c7be9d6
We depend on the Kubernetes Python client for several things such as
health checks & metrics polling. Those are both run inside periodic
jobs which spawn in greenthreads.
The Kubernetes client uses its own thread pools, which seem to use
native thread pools and cause several different deadlocks when it
comes to logging. Since we don't make extensive use of the Kubernetes
API and we want something that doesn't use any thread pools, we can
use a simple wrapper around Requests.
This patch drops the dependency and refactors all the code to use
this simple mechanism instead, which should reduce the overall
dependency list and avoid the deadlock issues present in the
upstream client.
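A thread-pool-free wrapper of this kind can be very small. The sketch below is an assumption-laden illustration: the patch uses Requests, but to stay self-contained this version swaps in the standard library's urllib.request, and the class name SimpleK8sClient and its methods are hypothetical.

```python
import json
import ssl
import urllib.request
from typing import Any, Dict, Optional


class SimpleK8sClient:
    """Minimal, synchronous Kubernetes API wrapper (illustrative sketch).

    Each call is one blocking HTTPS request; the class starts no thread
    pools of its own.
    """

    def __init__(self, api_address: str, ca_file: Optional[str] = None,
                 cert_file: Optional[str] = None,
                 key_file: Optional[str] = None,
                 timeout: float = 10.0) -> None:
        self.api_address = api_address.rstrip("/")
        self.ca_file = ca_file
        self.cert_file = cert_file
        self.key_file = key_file
        self.timeout = timeout

    def url(self, path: str) -> str:
        """Join the API address and a resource path."""
        return f"{self.api_address}/{path.lstrip('/')}"

    def _ssl_context(self) -> ssl.SSLContext:
        # Verify the API server with the cluster CA and authenticate
        # with a client certificate, if one is configured.
        ctx = ssl.create_default_context(cafile=self.ca_file)
        if self.cert_file:
            ctx.load_cert_chain(self.cert_file, self.key_file)
        return ctx

    def get(self, path: str) -> Dict[str, Any]:
        """GET a resource and decode the JSON response."""
        req = urllib.request.Request(
            self.url(path), headers={"Accept": "application/json"})
        with urllib.request.urlopen(req, context=self._ssl_context(),
                                    timeout=self.timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))

    def list_nodes(self) -> Dict[str, Any]:
        return self.get("/api/v1/nodes")
```

Health checks and metrics polling would then call methods such as list_nodes() directly from the periodic job.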
Change-Id: If0b7c96cb77bba0c79a678c9885622f1fe0f7ebc
This change allows users to create clusters and nodegroups with
node_count equal to 0. Also adds support for resizing existing
nodegroups to 0.
Change-Id: Id63459d0fe9836e678bb7569f23d29eabc225e9e
story: 2007851
task: 40145
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@cern.ch>
We are currently creating a new transport for each API
call. This patch changes that so that each worker
can reuse the same transport for multiple requests.
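The "create once, reuse per worker" idea can be sketched with a lazily initialised, process-wide holder. LazyShared is a hypothetical helper. In Magnum the factory would build the messaging transport, which is not reproduced here.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyShared(Generic[T]):
    """Create an expensive object on first use and reuse it afterwards."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._instance: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        if self._instance is None:
            with self._lock:
                # Re-check under the lock so concurrent first callers
                # do not each create their own instance.
                if self._instance is None:
                    self._instance = self._factory()
        return self._instance
```

Every API call in a worker then asks the holder for the transport instead of constructing a new one.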
Story: 2008494
Task: 41544
Change-Id: I11a24f035a9d66a536e5e58328084ee08f0c6285
Now the k8s cluster owner can rotate the CA certificate to regenerate
the cluster CA; the service account keys and the certificates of all
nodes are regenerated as well. Cluster users need to get a new
kubeconfig to access the Kubernetes API. This function is only
supported by the Fedora CoreOS driver.
To test this patch with python-magnumclient, you need the patch
https://review.opendev.org/#/c/724243/; otherwise, you will see
an error about "not enough values to unpack", even though the CA cert
rotate request has been processed correctly on the Magnum server side.
Task: 39580
Story: 2005201
Change-Id: I4ae12f928e4f49b99732fba097371692cb35d9ee
The removed warning was meant as a suggestion to
developers to implement a scale_manager. However,
returning None for unimplemented classes is a
deliberate choice. Remove this log line from
production sites; developers can always decide
not to return None.
story: 2007803
task: 40064
Change-Id: I85f0c89081007fbbbfe00c7cbeebf0ad837cedf5
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
Lower the log level of a warning for a missing output to debug.
This log line appears repeatedly on successful cluster deletion,
creation failure (for unrelated reasons) and nodegroup creation
failure (again for unrelated reasons, e.g. timeout). It is
triggered when multiple Magnum conductors all query the stack
status in Heat. Additionally, this warning is not an
indication of a malfunction in a cluster or a failure, so it is
useful only for debugging. Finally, add the cluster id, cluster
status and stack id to have more context.
story: 2007636
task: 40062
Change-Id: Ie44b1d13899d77bd2a5d5b1e6107c384277788b9
Signed-off-by: Spyros Trigazis <spyridon.trigazis@cern.ch>
The original design of the k8s cluster health status was for the
health status to be updated by the Magnum control plane. However,
that doesn't work when the cluster is private. This patch supports
updating the k8s cluster health status via the Magnum cluster
update API by a third-party service, so that a controller (e.g.
magnum-auto-healer) running inside the k8s cluster can call
the Magnum update API to update the cluster health status.
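The sketch below shows the request body such a controller might build. It assumes that the cluster update API accepts a JSON Patch (RFC 6902) document and that the fields are named health_status and health_status_reason; health_status_patch is a hypothetical helper and the reason contents are made up.

```python
import json
from typing import Any, Dict, List


def health_status_patch(status: str,
                        reason: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build a JSON Patch body for a cluster update call (sketch).

    Assumes the update API takes JSON Patch and exposes the fields
    health_status and health_status_reason.
    """
    return [
        {"op": "replace", "path": "/health_status", "value": status},
        {"op": "replace", "path": "/health_status_reason",
         "value": reason},
    ]


# Hypothetical payload an in-cluster controller might send.
body = json.dumps(health_status_patch("UNHEALTHY",
                                      {"node-1.Ready": "False"}))
```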
Task: 38583
Story: 2007242
Change-Id: Ie7189d328c4038403576b0324e7b0e8a9b305a5e
The repo is Python 3 now, so update hacking to version 3.0, which
supports Python 3.
Fix problems found.
Update local hacking checks for new flake8.
Remove hacking and friends from lower-constraints; they are not needed
for co-installing.
Change-Id: I926efaef501f190e78da9cab40c1e94203277258
Adds support for upgrading nodegroups. All non-default nodegroups
are allowed to be upgraded using the CT set in the cluster. The
only label that gets upgraded for now is kube_tag. All other labels
in the new cluster_template are ignored.
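The label rule above amounts to a selective merge. The function upgraded_labels and the sample label values are hypothetical and only illustrate the "kube_tag only" behaviour.

```python
from typing import Dict, Tuple

# Only these labels are taken from the new cluster template when a
# non-default nodegroup is upgraded; all others stay as they are.
UPGRADABLE_LABELS: Tuple[str, ...] = ("kube_tag",)


def upgraded_labels(current: Dict[str, str],
                    new_template: Dict[str, str]) -> Dict[str, str]:
    result = dict(current)  # never mutate the nodegroup's own labels
    for key in UPGRADABLE_LABELS:
        if key in new_template:
            result[key] = new_template[key]
    return result
```

Keeping the allowed keys in a tuple makes it easy to extend the set later without touching the merge logic.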
Change-Id: Icade1a70f160d5ec1c0e6f06ee642e29fe9b02ff
This adds the changes needed at the API and conductor level to support
creating, updating and deleting nodegroups.
Change-Id: I4ad60994ad6b4cb9cac18129557e1e87e61ae98c
Since each nodegroup will be one independent stack, we have to add
more fields to the table and object in order to track each stack
contained in the cluster. This adds the stack_id, version, status and
status_reason fields to the nodegroup object.
Change-Id: I6d36b2d3bc6476efbef6a9f702ffc73cfa0fab8c
Magnum sends notifications such as cluster create, but they contain no
details about the cluster, such as the cluster UUID. Notifications
from other OpenStack projects contain full detailed information
(e.g. instance UUID in Nova instance create notification).
Detailed notifications are important for other OpenStack
projects like Searchlight or third party projects that cache
information regarding OpenStack objects or have custom actions
running on notification. Caching systems can efficiently update
a single object (e.g. a cluster), while without detailed notifications
they need to periodically retrieve the whole object list, which is
inefficient.
Change-Id: I820fbe0659222ba31baf43ca09d2bbb0030ed61f
Story: #2006297
Task: 36009
To enable rolling upgrades of Kubernetes clusters, this
patch proposes a new API, /upgrade, to support upgrading the
base operating system of nodes, the version of Kubernetes and even
add-ons running on the k8s cluster:
POST <ClusterID>/actions/upgrade
And the post body will be:
{
"cluster_template": "dd9cc5ed-3a2b-11e9-9233-fa163e46bcc2",
"max_batch_size": 1,
"nodegroup": "production_group"
}
Co-Authored-By: Feilong Wang <flwang@catalyst.net.nz>
Task: 30168
Story: 2002210
Change-Id: Ia168877778aa0d473383eb06b1c8a16dc06b0576
This commit removes the fields node_addresses, master_addresses,
node_count and master_count from the cluster object, since this info
will be stored in the nodegroups. It also provides a way
to adapt existing clusters to the new schema.
story: 2005266
Change-Id: Iaf2cef3cc50b956c9b6d7bae13dbb716ae54eaf7
This changes the existing cluster APIs and the cluster conductor to
take into consideration nodegroups:
* create: now creates the default nodegroups for the cluster
* update: updates the default nodegroups of the cluster
* delete: also deletes the nodegroups that belong to the cluster
* cluster_resize: takes into account the nodegroup provided by the API
story: 2005266
Change-Id: I5478c83ca316f8f09625607d5ae9d9f3c02eb65a
An OpenStack driver for the Kubernetes Cluster Autoscaler is being
proposed to support autoscaling when running a k8s cluster on top of
OpenStack. However, there is currently no way in Magnum to let
an external consumer control which node will be removed. The
alternative is calling the Heat API directly, but that is clearly
not a good solution and it confuses the k8s community. So with
this patch, we're going to add a new API:
POST <ClusterID>/actions/resize
And the post body will be:
{
"node_count": 3,
"nodes_to_remove": ["dd9cc5ed-3a2b-11e9-9233-fa163e46bcc2"],
"nodegroup": "production_group"
}
The API works in a declarative way. For example, if there
are 3 nodes in the cluster now, a user can send an API request
like the one above. Magnum will call Heat to first remove the node
dd9cc5ed-3a2b-11e9-9233-fa163e46bcc2, then bring the node
count back to 3 again.
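The declarative semantics can be illustrated by computing what a request implies for the current node list. plan_resize is a hypothetical helper, and the "shrink_by" case (a lower node_count without explicit nodes_to_remove) is an assumption about how leftover nodes would be handled.

```python
from typing import Dict, List, Sequence, Union


def plan_resize(current_nodes: Sequence[str], node_count: int,
                nodes_to_remove: Sequence[str]
                ) -> Dict[str, Union[List[str], int]]:
    """Compute what a declarative resize request implies (sketch)."""
    if node_count < 0:
        raise ValueError("node_count must be >= 0")
    remove_set = set(nodes_to_remove)
    unknown = remove_set - set(current_nodes)
    if unknown:
        raise ValueError(f"unknown nodes: {sorted(unknown)}")
    kept = [n for n in current_nodes if n not in remove_set]
    return {
        # Nodes named explicitly are removed first ...
        "remove": list(nodes_to_remove),
        # ... then the group is grown back to the requested size ...
        "add": max(node_count - len(kept), 0),
        # ... or shrunk further, with the orchestrator picking nodes.
        "shrink_by": max(len(kept) - node_count, 0),
    }
```

For the example above (3 nodes, node_count 3, one node named), the plan removes that node and adds one replacement.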
Task: 29563
Story: 2005052
Change-Id: I7e36ce82c3f442976cc498153950b19c56a1759f
We are writing to files opened in text mode ('w+'), so binary data
has to be decoded before writing.
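In Python 3, a file opened with 'w+' accepts only str, so writing bytes raises TypeError. A minimal sketch of the fix, assuming UTF-8 data such as a PEM certificate; write_text is a hypothetical helper name.

```python
import os
import tempfile
from typing import Union


def write_text(path: str, data: Union[bytes, str]) -> None:
    """Write data to a text-mode file, decoding bytes first."""
    # Text mode only accepts str; passing bytes would raise TypeError.
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    with open(path, "w+") as f:
        f.write(data)


# Example: a PEM header returned as bytes by some library call.
with tempfile.TemporaryDirectory() as tmp:
    write_text(os.path.join(tmp, "ca.crt"),
               b"-----BEGIN CERTIFICATE-----\n")
```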
Task: 29577
Story: 2005057
Change-Id: I034d0230c3022e701111bdc71f0af43da1852c3c
kubernetes-client has patched this [1]. To retain backwards
compatibility, we can use **kwargs to handle async/async_req arguments
[1]: b10c7b6a17
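Since Python 3.7, async is a reserved keyword and cannot be written as a literal keyword argument, which is why the client now uses async_req. Routing the flag through **kwargs accepts both spellings. The sketch below assumes a hypothetical wrapper list_pods; a real wrapper would forward the call to the client.

```python
from typing import Any, Dict, Tuple


def resolve_async_flag(kwargs: Dict[str, Any]) -> bool:
    """Pop the async flag from kwargs under either spelling."""
    if "async_req" in kwargs:
        kwargs.pop("async", None)  # the new name wins if both are given
        return bool(kwargs.pop("async_req"))
    return bool(kwargs.pop("async", False))


def list_pods(namespace: str, **kwargs: Any) -> Tuple[str, bool]:
    # Hypothetical wrapper: returns what it would forward to the client.
    run_async = resolve_async_flag(kwargs)
    return namespace, run_async


# 'async' can still be passed on Python 3.7+ by unpacking a dict:
# list_pods("default", **{"async": True})
```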
Change-Id: I8e738b4f99091786dd76e081bffa36ef5ab70085
Calling the Kubernetes native API to update the cluster health status
so that it can be used for cluster auto healing.
Task: 24593
Story: 2002742
Change-Id: Ia76eeeb2f1734dff38d9660c804d7d2d0f65b9fb
Cleaning up comments and logging to make sure they properly adhere
to OpenStack standards.
* Consistently use """ instead of ''' for comments.
* Always use lazy interpolation for logging parameters.
* Fixed a bad log line in cert_manager.
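With the standard logging module, lazy interpolation means passing the values as separate arguments instead of pre-formatting the string. The logger then formats the message only if the record is actually handled. The function report is a hypothetical example.

```python
import logging

LOG = logging.getLogger(__name__)


def report(cluster_id: str, node_count: int) -> None:
    # Preferred: arguments are stored on the LogRecord and interpolated
    # only when the record is emitted.
    LOG.debug("Cluster %s has %d nodes", cluster_id, node_count)

    # Avoid: the string is formatted eagerly, even if DEBUG is disabled.
    # LOG.debug("Cluster %s has %d nodes" % (cluster_id, node_count))
```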
Change-Id: I547f5dfa61609a899aef9b1470be8d8a6d8e4b81
Added the configuration parameter temp_cache_dir to magnum.conf, with
a default value of "/var/lib/magnum/certificate-cache". This local
directory will hold cached cluster TLS credentials that are generated
during periodic tasks, to reduce load as the number of clusters
increases. If temp_cache_dir does not exist, the certificates
will be created as tempfiles.
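The fallback behaviour can be sketched as follows. cert_path and the per-cluster subdirectory layout are hypothetical. Only the "use the cache directory if it exists, otherwise a tempfile" rule comes from the text above.

```python
import os
import tempfile
from typing import Optional


def cert_path(cluster_uuid: str, name: str,
              temp_cache_dir: Optional[str]) -> str:
    """Return where a cluster TLS file should be written (sketch)."""
    if temp_cache_dir and os.path.isdir(temp_cache_dir):
        # Cached per cluster so later periodic tasks can reuse it.
        cluster_dir = os.path.join(temp_cache_dir, cluster_uuid)
        os.makedirs(cluster_dir, exist_ok=True)
        return os.path.join(cluster_dir, name)
    # Fallback: a throwaway temporary file that the caller cleans up.
    fd, path = tempfile.mkstemp(suffix="-" + name)
    os.close(fd)
    return path
```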
Closes-Bug: #1659545
Change-Id: I8808c4098a7c8d22dbfc841142c9f9c8b976dde1
This commit introduces a new '/federations'
endpoint to the Magnum API, as well as its controllers,
entities and conductor handlers.
This corresponds to the first phase of the
federation-api spec. Please refer to [1] for more
details.
[1] https://review.openstack.org/#/c/489609/
Change-Id: I662ac2d6ddec07b50712109541486fd26c5d21de
Partially-Implements: blueprint federation-api
Because of several small, connected patches for the
Fedora Atomic driver, this patch combines 4 smaller patches.
Patch 1:
k8s: Do not start kubelet and kube-proxy on master
Patch [1] missed removing kubelet and kube-proxy from
enable-services-master.sh, so they are started if they
exist in the image; otherwise, the script fails.
https://review.openstack.org/#/c/533593/
Closes-Bug: #1726482
Patch 2:
k8s: Set require-kubeconfig when needed
As of Kubernetes 1.8 [1], --require-kubeconfig is deprecated, and
in Kubernetes 1.9 it is removed.
Add --require-kubeconfig only for k8s <= 1.8.
[1] https://github.com/kubernetes/kubernetes/issues/36745
Closes-Bug: #1718926
https://review.openstack.org/#/c/534309/
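The version gate can be illustrated in Python. The actual change lives in the driver's shell scripts and is not reproduced here. parse_k8s_version, kubelet_args and the kubeconfig path are hypothetical.

```python
from typing import List, Tuple


def parse_k8s_version(tag: str) -> Tuple[int, int]:
    """Parse tags such as 'v1.8.4' or '1.10.2' into (major, minor)."""
    major, minor = tag.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def kubelet_args(kube_tag: str) -> List[str]:
    # Hypothetical kubeconfig path, for illustration only.
    args = ["--kubeconfig=/etc/kubernetes/kubelet.conf"]
    # --require-kubeconfig was deprecated in 1.8 and removed in 1.9.
    if parse_k8s_version(kube_tag) <= (1, 8):
        args.append("--require-kubeconfig")
    return args
```

Comparing integer tuples rather than strings matters here: as strings, "1.10" sorts before "1.8".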
Patch 3:
k8s_fedora: Add RBAC configuration
* Make certificates and kubeconfigs compatible
with NodeAuthorizer [1].
* Add CoreDNS roles and rolebindings.
* Create the system:kube-apiserver-to-kubelet ClusterRole.
* Bind the system:kube-apiserver-to-kubelet ClusterRole to
the kubernetes user.
* Remove creation of the kube-system namespace; it is created
by default.
* Update client cert generation in the conductor to meet
Kubernetes' requirements.
* Add --insecure-bind-address=127.0.0.1 to work on
multi-master too. The controller manager on each
node needs to contact the apiserver (on the same node)
on 127.0.0.1:8080.
[1] https://kubernetes.io/docs/admin/authorization/node/
Closes-Bug: #1742420
Depends-On: If43c3d0a0d83c42ff1fceffe4bcc333b31dbdaab
https://review.openstack.org/#/c/527103/
Patch 4:
k8s_fedora: Update coredns config to pass e2e
To pass the e2e conformance tests, CoreDNS needs to
be configured with the 'verified' POD-MODE. Otherwise, pods
won't be resolvable [1].
[1] https://github.com/coredns/coredns/tree/master/plugin/kubernetes
https://review.openstack.org/#/c/528566/
Closes-Bug: #1738633
Change-Id: Ibd5245ca0f5a11e1d67a2514cebb2ffe8aa5e7de
After [1], jobs return a false SUCCESS status due
to a wrong EXIT_CODE.
After [2], the kubernetes client was updated to v4.0.0 and
no longer contains ConfigurationObject, so we need to create
an instance of the Configuration class.
Also, don't use local to create a variable, as local
can only be used inside a function.
[1] https://review.openstack.org/#/c/526618/
[2] https://review.openstack.org/#/c/528406
Change-Id: Ida5aac40b234a358b2a13b2e51a41d0242031ebb
For a really long time, we maintained our very own Python client
generated from Kubernetes swagger JSON files. Now the Kubernetes
community is making a concerted effort to organize an official Python
client (also generated from swagger) for everyone to use. So let us
switch over from our python-k8sclient to the community-driven
Python client. I have ported all of our end-to-end tests and got
them working in the kubernetes client-python project upstream, so we
should be protected from regressions.
Implements: blueprint replace-k8sclient-with-upstream-kubernetes-client
Depends-On: I72359f2b811392008eb5267812bf343797b1553a
Change-Id: Ib81a69cfdc25198e259e3b3d4081c92c01fd1bc5
This commit addresses multiple potential vulnerabilities in
Magnum. It makes the following changes:
* Permissions for /etc/sysconfig/heat-params inside Magnum
created instances are tightened to 0600 (used to be 0755).
* Certificate retrieval is modified to work without the need
for a Keystone trust.
* The cluster's Keystone trust id is only passed into
instances for clusters where that is actually needed. This
prevents the trustee user from consuming the trust in cases
where it is not needed.
* The configuration setting trust/cluster_user_trust (False by
default) is introduced. It needs to be explicitly enabled
by the cloud operator to allow clusters that need the
trust_id to be passed into instances to work. Without this
setting, attempts to create such clusters will fail.
Please note that none of these changes apply to existing
clusters. They will have to be deleted and rebuilt to benefit
from these changes.
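The permission change for /etc/sysconfig/heat-params happens inside the instances. The Python sketch below only illustrates the general technique of creating a secret-bearing file as 0600 so that it is never world-readable. write_private_file is a hypothetical helper, and the sketch is POSIX-only because it uses os.fchmod.

```python
import os


def write_private_file(path: str, content: str) -> None:
    """Write a file readable and writable only by its owner (0600)."""
    # The mode passed to os.open applies only when the file is created
    # (and is masked by the umask), so force it with fchmod before any
    # data is written; this also tightens a pre-existing 0755 file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    os.fchmod(fd, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(content)
```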
Change-Id: I643d408cde0d6e30812cf6429fb7118184793400
This will give admins a way to revoke access to an existing cluster
once a user has been granted access.
Bumped the API microversion to 1.5 for the new endpoint.
Deprecated policy certificate:get in favor of certificate:get_ca for
clarity and consistency.
Depends-On: Ie960464e45445e195e75b91e8d65a4046eb21e93
Implements: blueprint revoke-cluster-cert
Change-Id: Ief28bef3a79f212acf4166e443a96e5419fbb757
* Add osprofiler WSGI middleware. This middleware is used for 2 things:
1) It checks that the person who wants to trace is trusted and knows
the secret HMAC key.
2) It starts tracing in case of proper trace headers
and adds the first WSGI trace point, with info about the HTTP request.
* Add initialization of osprofiler at service start.
Currently that includes oslo.messaging notifier instance creation
to send Ceilometer backend notifications.
* Traces HTTP/RPC/DB API calls
Demo: https://hieulq.github.io/cluster-create-false-new-html.html
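The "trusted caller knows the secret HMAC key" check can be sketched with the standard hmac module. This sketch does not reproduce osprofiler's actual header format or digest choice. The function names and the use of SHA-256 are assumptions.

```python
import hashlib
import hmac


def sign_trace_info(trace_info: bytes, hmac_key: bytes) -> str:
    """Signature a trusted client would attach to its trace headers."""
    return hmac.new(hmac_key, trace_info, hashlib.sha256).hexdigest()


def is_trusted(trace_info: bytes, signature: str, hmac_key: bytes) -> bool:
    """Start tracing only if the caller knows the shared HMAC key."""
    expected = sign_trace_info(trace_info, hmac_key)
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(expected, signature)
```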
Co-Authored-By: Hieu LE <hieulq@vn.fujitsu.com>
Implements: blueprint osprofiler-support-in-magnum
Change-Id: I7d68995aab81d365433950aada078ef1fcd5469b
Following up the cluster drivers implementation, move the scale managers
to the driver level. This change is needed to add the driver field
properly.
Change-Id: Ia854f2354c51b5fa47095bb4cb118416f3f01a33
Implements: blueprint bay-drivers
Following the changes for cluster-drivers, move COE-specific monitors
to the driver level. This change is needed to add the driver field
properly.
Change-Id: Id4658b8f7400bf3c86c8ff81756fb33d1211a0b3
Implements: blueprint bay-drivers
This commit fixes the incorrect behavior of the cluster create workflow.
Now a DB record with status CREATE_IN_PROGRESS is created right after
the related API request.
Change-Id: I11692c4126823d49672ba5172fa45774bf0ce544
Closes-bug: #1640729