We can easily imagine a user frustrated by their Pod not getting deleted
and opting to remove the finalizer from the Pod. If the cause of the
deletion delay was the kuryr-controller being down, we end up with an
orphaned KuryrPort. At the moment this causes crashes, which it
obviously shouldn't. Moreover, we should figure out how to clean up the
Neutron port if that happens. This commit does so as explained below.
1. KuryrPort on_present() will trigger its deletion when it detects that
   the Pod no longer exists.
2. Turns out the security_groups parameter passed to release_vif() was
   never used. I removed it from the drivers and got rid of the
   get_security_groups() call from on_finalize() as it's no longer
   necessary.
3. When we cannot get the Pod in KuryrPort on_finalize(), we attempt to
   gather the info required to clean up the KuryrPort and "mock" a Pod
   object. As a precaution, any error from release_vif() is ignored in
   that case to make sure a failed cleanup does not bring the system
   down.
Change-Id: Iaf48296ff28394823f68d58362bcc87d38a2cd42
This got decided at the PTG. The code is old, not maintained, not tested
and most likely doesn't work anymore. Moreover, it gave us a hard
dependency on grpcio and protobuf, which are fairly problematic in
Python and gave us all sorts of headaches.
Change-Id: I0c8c91cdd3e1284e7a3c1e9fe04b4c0fbbde7e45
There are still some mentions of KuryrNetPolicy around our code. In
this patch we remove them (with one exception: the spec for the
originally designed network policy CRD), just to avoid confusion with
the currently used KuryrNetworkPolicy.
Change-Id: Ie9bb46467a249e1c0ada3a9810c4fff59fd57757
A port cleanup function was already introduced, so in this patch a new
function is introduced for cleaning up unused networks and subnets for
the whole deployment, using the identifier from the description.
Change-Id: Ia11e8953fab3f9cd8f6598f9da94daae324b1bf8
The KuryrNet CRD was already replaced by KuryrNetwork, although there
are some spots where it is still mentioned - mostly docs and log
messages. In this commit we get rid of it once and for all.
Change-Id: I20345a1f4d4288534d620f0bd2196fc77ee795e9
Sometimes it happens that during port creation there are quota
violations, so a port can be created without tags and is left just
hanging there. This periodic task removes all such dead ports.
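The core of such a periodic task can be sketched as below; the port dict layout and the tag value are illustrative assumptions, not the exact Neutron objects Kuryr handles:

```python
# Illustrative sketch of the dead-port detection: ports that ended up
# without any of the deployment's tags (e.g. after a quota violation
# mid-creation) are treated as dead and can be removed.

def find_dead_ports(ports, required_tags):
    """Return ports carrying none of the deployment's tags."""
    required = set(required_tags)
    return [p for p in ports if not required & set(p.get('tags', []))]
```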
Change-Id: If646cb3bf00aca387c769fafe2f73a8194642f69
Recent versions of cri-o and containerd pass K8S_POD_UID as a CNI
argument, alongside K8S_POD_NAMESPACE and K8S_POD_NAME. As the latter
two variables cannot be used to safely identify a pod in the API
(a StatefulSet recreates pods with the same name), we were prone to race
conditions in the CNI code that we could only work around. The end
effect was mostly IP conflicts.
Now that the UID argument is passed, we're able to compare the UID from
the request with the one in the API to make sure we're wiring the
correct pod. This commit implements that by moving the check to the
code actually waiting for the pod to appear in the registry. If
K8S_POD_UID is missing from the CNI request, an API call to retrieve
the Pod is used as a fallback.
We also know that this check doesn't work for static pods, so the CRD
and controller needed to be updated to include information on whether
the pod is static in the KuryrPort spec, so that we can skip the check
for static pods without needing to fetch the Pod from the API.
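The decision logic above can be sketched as follows. The registry entry layout and key names are assumptions for illustration; the real check lives in kuryr-kubernetes' CNI daemon code:

```python
# Sketch of the UID check: skip it for static pods, compare UIDs when
# the runtime passed K8S_POD_UID, and signal "unknown" (None) when it
# didn't, so the caller can fall back to fetching the Pod from the API.

def pod_matches(cni_args, registry_entry):
    """Decide whether a CNI request refers to the pod in the registry."""
    if registry_entry.get('is_static'):
        # Static pods can't be matched by API UID, so skip the check.
        return True
    request_uid = cni_args.get('K8S_POD_UID')
    if request_uid is None:
        # Older runtime not passing the UID: caller must fall back to
        # an API lookup of the Pod.
        return None
    return request_uid == registry_entry['pod_uid']
```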
Closes-Bug: 1963677
Change-Id: I5ef6a8212c535e90dee049a579c1483644d56db8
It is possible to add info about which component created an Event, and
this patch makes sure we do it. This should show up in `kubectl
describe`, directly attributing each event to Kuryr.
Change-Id: If954d62010b43e8a92ac2cb3140fc434e5d477a0
Some minor wording fixes, making sure we're not logging errors that
happen when we try to create events on Namespace termination, and a
safeguard in case somebody puts garbage into the ownerReferences we
depend on.
Change-Id: Ia2d6f41c0b9f969ea01ae78a16036cd4715de703
Also adds a function for getting an object out of ownerReferences or by
querying the Kubernetes API to get the original object out of a CRD.
Change-Id: I17e1672a46496bc6a51f3d28fbfbd1adea9ca249
Ensures Namespaces are only handled when there is at least one Pod on
the pod network present in them, to avoid creating Networks and Subnets
that are never really used.
Depends-On: https://review.opendev.org/c/openstack/kuryr-tempest-plugin/+/813404
Change-Id: Idbc1e2868c54eb1151d2498a0324731fe099592e
When a Pod has status Completed it has finished its job, so it's
expected that Kuryr will recycle the Neutron ports associated with it
to be used by other Pods. However, Kuryr considers Completed Pods as
not scheduled and the event is skipped. This commit fixes the issue by
ensuring that Completed Pods' ports are recycled before checking for
scheduled Pods.
Closes-Bug: #1945680
Change-Id: I23e7901fb016d59fa76762d1f24fee47974e72df
Includes a new Gauge metric that records the number of members of load
balancers considered critical. The metric is labeled with the load
balancer name and the pool name. Also includes an Enum with the current
state of the lb.
Change-Id: Id89bb48d86588f4d2a28ab91963e0b84843cbd6f
This change adds help strings for the cache parameters, so that
descriptions of these parameters are included in the .conf file
generated by oslo-config-generator.
Change-Id: I9d7082fc42b5dbd67f7214810719158d93f5f88d
Due to a coding error I not only switched "used" with "limit" in our
quota check messages but also made it impossible for the check to fail.
This commit fixes both issues.
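A minimal form of the fixed check could look like the sketch below; the function name and message format are hypothetical, and treating -1 as unlimited follows the usual Neutron quota convention:

```python
# Sketch of a quota check that can actually fail, with the message in
# used/limit order. A limit of -1 is taken to mean "unlimited", per the
# usual Neutron convention.

def has_quota(used, limit):
    """Return False (and log) when usage has reached the limit."""
    if limit == -1:
        return True
    if used >= limit:
        print(f"Quota exceeded: {used}/{limit} used")
        return False
    return True
```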
Closes-Bug: 1927241
Change-Id: I6e6a396d0e0467ec424bb403064a19cb4f1a586e
For the simple case when an operator wants to open connections to all
the namespaces in the cluster, i.e.:

  kind: NetworkPolicy
  apiVersion: networking.k8s.io/v1
  metadata:
    name: networkpolicy-example
  spec:
    podSelector: {}
    policyTypes:
    - Egress
    - Ingress
    egress:
    - to:
      - namespaceSelector: {}

there was a false assumption that we need to open it without any
restriction, while the truth is that all we need to do is open egress
traffic to all the namespaces within the cluster.
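The corrected interpretation can be sketched as below; the function, the namespace-to-CIDR mapping and the CIDR values are hypothetical illustrations, not Kuryr's actual driver code:

```python
# Sketch of the fixed semantics: an empty namespaceSelector matches
# every namespace *in the cluster*, so the generated egress rules should
# cover the namespaces' subnet CIDRs, not 0.0.0.0/0.

def egress_cidrs_for_selector(namespace_cidrs, selector):
    """Map a namespaceSelector to the list of CIDRs to open."""
    if selector == {}:
        # All namespaces in the cluster -- NOT the whole world.
        return sorted(namespace_cidrs.values())
    # A real implementation would filter namespaces by label here.
    raise NotImplementedError
```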
Change-Id: Ibea039fa9c3b46b83e99237ce2ceb03f02d50727
Closes-Bug: 1915008
We mostly assumed that trunk ports are only used by Kuryr in an
OpenStack env, but sometimes that's not true. This commit adds some checks
to make sure we list trunk ports in a smarter way (checking if they
match the worker_nodes_subnets) and operate on them in a safer way
(checking if they even have IPs).
Change-Id: I3257e263b53bb9f38946ca9cff6a1be5448dec00
Closes-Bug: 1914631
In order to support OpenShift's ability to run its nodes in various
OpenStack subnets in a dynamic way, this commit introduces the
OpenShiftNodesSubnets and MachineHandler. The idea is that
MachineHandler is responsible for watching the OpenShift Machine objects
and calling the driver. The driver will then save and serve a list of
current worker nodes subnets.
Change-Id: Iae3a5d011abaeab4aa97d6aa7153227c6f85b93c
This commit deprecates `[pod_vif_nested]worker_nodes_subnet` in favor of
`[pod_vif_nested]worker_nodes_subnets` that will accept a list instead.
All the code using the deprecated option is updated to expect a list
and iterate over the possible nodes' subnets.
Change-Id: I7671fb06863d58b58905bec43555d8f21626f640
Because Kubernetes stops propagating the metadata.selfLink field in
release 1.20 and removes it in release 1.21, we need to adapt and
calculate the selfLink equivalent ourselves.
In this patch a new function is introduced which returns the API path
for a provided resource.
We also need to deal with lists of resources, since for some reason CRs
carry information regarding apiVersion and kind, while for core
resources the apiVersion is only on top of the list object, the kind is
something like *List, and the objects within 'items' have neither
apiVersion nor kind.
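A simplified version of such a path-building helper is sketched below; the function name is hypothetical, and the pluralization (lowercase kind + "s") is a naive stand-in for what a real helper must handle properly:

```python
# Sketch of a selfLink replacement: build the API path for a resource
# from its apiVersion, kind and metadata. Core ("v1") resources live
# under /api, everything else under /apis/<group>/<version>.

def get_res_link(resource):
    """Return the API path (selfLink equivalent) for a resource."""
    api_version = resource['apiVersion']
    prefix = 'api' if api_version == 'v1' else 'apis'
    # Naive pluralization; real code needs a kind -> plural mapping.
    kind = resource['kind'].lower() + 's'
    meta = resource['metadata']
    namespace = meta.get('namespace')
    if namespace:
        return (f"/{prefix}/{api_version}/namespaces/{namespace}/"
                f"{kind}/{meta['name']}")
    return f"/{prefix}/{api_version}/{kind}/{meta['name']}"
```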
Implements: blueprint selflink
Change-Id: I721a46ea0379382f7eb2e13c59bd193314f37e7f
The targetRef field is optional for the Endpoints object, and when
services without selectors are created that field is more likely not to
be specified. This commit ensures Kuryr properly wires Endpoints
without targetRef.
Change-Id: Ib43e88aafd9e0907556a0e740990a6acbd173fb0
When an lb transitions to ERROR or the IP in the Service spec differs
from the lb VIP, the lb is released but the CRD doesn't get updated,
causing Not Found exceptions when handling the creation of other load
balancer resources. This commit fixes the issue by ensuring the clean
up of the status field happens upon lb release. It also adds protection
in case we still find a nonexistent lb on the CRD.
Closes-Bug: 1894758
Change-Id: I484ece6a7b52b51d878f724bd4fad0494eb759d6
This is another attempt at getting the useless tracebacks out of logs
for any level higher than debug. In particular:
* All pool logs regarding "Kuryr-controller not yet ready to *" are now
on debug level.
* The ugly ResourceNotReady raised from _populate_pool is now suppressed
if the method is run from eventlet.spawn(), which prevents that
exception being logged by eventlet.
* ResourceNotReady will only print namespace/name or name, not the full
resource representation.
* ConnectionResetError is suppressed at the Retry handler level, just as
any other K8sClientError.
Change-Id: Ic6e6ee556f36ef4fe3429e8e1e4a2ddc7e8251dc
In the newly added CRD, KuryrPort, we noticed that the vifs key, which
is now under the 'spec' object, is rather a thing that should be
represented as the CRD status.
In this patch we propose to move the vifs data under the status key.
Depends-On: I2cb66e25534e44b79f660b10498086aa88ad805c
Change-Id: I71385799775f9f9cc928e4d39a0fd443c98b53c6
On upgrade from a version using annotations on Endpoints and Services
objects to save information about created Octavia resources, to a
version where that information lives in the KuryrLoadBalancer CRD, we
need to make sure that data is converted. Otherwise we can end up with
doubled load balancers.
This commit makes sure data is converted before we try processing any
Service or Endpoints resource that has annotations.
Change-Id: I01ee5cedc7af8bd02283d065cd9b6f4a94f79888
This commit adds support for the creation of load balancers, listeners,
members and pools using the CRD; it also fills in the status field of
the CRD.
Depends-On: https://review.opendev.org/#/c/743214/
Change-Id: I42f90c836397b0d71969642d6ba31bfb49786a43
Till now, we were using pod annotations to store information regarding
the state of the VIFs associated with a pod. This alone has its own
issues and is prone to inconsistency in case of controller failures.
In this patch we propose a new CRD called KuryrPort for storing the
information about VIFs.
Depends-On: If639b63dcf660ed709623c8d5f788026619c895c
Change-Id: I1e76ea949120f819dcab6d07714522a576e426f2
This commit attempts to tweak and simplify the exponential backoff that
we use by making the default interval 1 instead of 3 (so that it won't
grow that fast), capping the default maximum wait at 60 seconds (so
that we won't wait e.g. more than 2 minutes as a backoff waiting for a
pod to become active) and introducing a small jitter instead of the
fully random choice of time that we had.
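The tweaked backoff can be modeled roughly as below; the function name and the jitter fraction are assumptions for illustration, not Kuryr's exact code:

```python
import random


# Rough model of the backoff described above: base interval 1, capped at
# 60 seconds, with a small symmetric jitter instead of a fully random
# wait. The 20% jitter fraction is an illustrative assumption.

def backoff(attempt, interval=1, maximum=60, jitter=0.2,
            rand=random.random):
    """Exponential backoff: interval * 2**attempt, capped, +/- jitter."""
    delay = min(interval * (2 ** attempt), maximum)
    # Spread waits a little so concurrent retries don't synchronize.
    return delay * (1 + jitter * (2 * rand() - 1))
```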
Change-Id: Iaf7abb1a82d213ba0aeeec5b5b17760b1622c549
Our logs are awful and this commit attempts to fix some issues with
them:
* Make sure we always indicate why a readiness or liveness probe
fails.
* Suppress INFO logs from werkzeug (so that we don't see every probe
call on INFO level).
* Remove logging of successful probe checks.
* Make watcher restart logs less scary and include more cases.
* Add backoff to watcher restarts so that we don't spam logs when K8s
API is briefly unavailable.
* Add warnings for low quotas.
* Suppress some long logs on K8s healthz failures - we don't need full
message from K8s printed twice.
I also refactored CNI and controller health probes servers to make sure
they're not duplicating code.
Change-Id: Ia3db4863af8f28cfbaf2317042c8631cc63d9745
When IPv6 and Network Policy are enabled we must ensure the
amphora SG is updated with sg rules using IPv6.
Implements: blueprint kuryr-ipv6-support
Change-Id: Id89b6c02e85d7faa75be6182c9d82ee7f32ff909
Current deployments of the OpenShift platform with Kuryr CNI
on real OpenStack installations (multi-project environments)
are crashing because the kuryr-controller cannot reach the
READY state.
This is due to inaccurate quota calculations in the readiness
process and unscalable fetching of objects from the Neutron API
to count and compare them with limits.
This commit ensures accurate quota calculation for the installation
project during the readiness checks and removes the heavy
Neutron API calls. It will dramatically speed up readiness checks.
Change-Id: Ia5e90d6bd5a8d30d0596508abd541e1508dc23ec
Closes-Bug: 1864327
Our hacking module is ancient and makes Python 3.6's f-strings fail the
PEP8 check. This commit bumps hacking to a newer version and fixes the
violations it found.
Change-Id: If8769f7657676d71bcf84c08108e728836071425
Add DPDK support for nested K8s pods. The patch includes a new VIF
driver on the controller and a new CNI binding driver.
This patch introduces a dependency on os-vif 1.12.0, since it
introduces a new VIF type.
Change-Id: I6be9110192f524325e24fb97d905faff86d0cfef
Implements: blueprint nested-dpdk-support
Co-Authored-By: Kural Ramakrishnan <kuralamudhan.ramakrishnan@intel.com>
Co-Authored-By: Marco Chiappero <marco.chiappero@intel.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Danil Golov <d.golov@samsung.com>
The OpenStack SDK is a framework which provides a consistent and
complete interface to an OpenStack cloud. As we already use it for the
Octavia integration, in this patch we propose to use it for other parts
of Kuryr as well.
Implements: blueprint switch-to-openstacksdk
Change-Id: Ia87bf68bdade3e6f7f752d013f4bdb29bfa5d444
This patch set increases the timeout to wait for resources to be
created/deleted. This is needed to better support spikes without
restarting the kuryr-controller. This patch also ensures that
future retry events do not affect the kuryr-controller if they are
retried once the related resources are already deleted, i.e., the
on_delete event was executed before one of the retries.
Closes-Bug: 1847753
Change-Id: I725ba22f0babf496af219a37e42e1c33b247308a
LoadBalancerHandler._get_lbaas_spec is identical to
utils.get_lbaas_spec, which is used by LBaaSSpecHandler, so we can
reuse the function from utils instead of duplicating code.
Note that the passed k8s object is an Endpoints object in the case of
LoadBalancerHandler and a Service in the case of LBaaSSpecHandler (but
the same annotation is used in both cases, so a common function should
be enough).
Signed-off-by: Yash Gupta <y.gupta@samsung.com>
It can take a while for a pod to have annotations and a hostIP
defined. It is also possible that a pod is deleted before
the state is set, causing a NotFound k8s exception. Lastly,
a service might be missing the lbaas_spec annotation, causing
the event handled on the endpoints to crash. All these scenarios
can be avoided by raising a resource-not-ready exception, which
allows the operation to be retried.
Change-Id: I5476cd4261a6118dbb388d7238e83169439ffe0d