
Kubernetes Architecture

We explain how we set up our Kubernetes service.

Our Kubernetes service is based on the Talos Linux distribution from SideroLabs. In general, it can be described as “Vanilla Kubernetes”, but we have pre-selected certain components to make getting started as easy as possible for you.

We explain the most important components below, but refer you to the official documentation for details.

This service is currently in early-access mode. Contact us to request access.

The most important Kubernetes components

Each component is described below, followed by an example of its function.

[Figure: Components of Kubernetes. Source: https://kubernetes.io/docs/concepts/overview/components/]

Kube API Server

The Kube API server is the central management interface of Kubernetes, which coordinates the communication between all Kubernetes components. The server processes API requests, performs validations and ensures that the desired states of the cluster resources are achieved.

A developer sends a request to the Kube API server to create a new web application in the form of a deployment. The Kube API server receives this request, validates it and saves the configuration in the ETCD.
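As an illustration of such a request, the following is a minimal Deployment manifest for a hypothetical web application (the name, image and replica count are placeholders); applying it with kubectl sends exactly this kind of request to the Kube API server.

```yaml
# Minimal example Deployment (name and image are placeholders)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
        - name: webapp
          image: nginx:1.25   # placeholder container image
          ports:
            - containerPort: 80
```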

ETCD

ETCD is the consistent key-value store used by Kubernetes to store all cluster data. It stores the entire cluster status, including configurations, secrets and information about the current status of pods and other resources.

ETCD stores information about the new deployment, including the number of replicas, container images and configurations. It acts as a central database for the current status of the cluster.

Kube Scheduler

The Kube Scheduler is responsible for assigning pods to nodes in the cluster. It checks the requirements and restrictions of the pods, such as resource requirements and affinity rules, and then decides on which node the pods are best executed in order to ensure efficient resource utilization and load distribution.

The Kube scheduler recognizes the new pod request stored in the ETCD. It evaluates the available nodes based on the resource requirements and the affinity rules of the pod and decides on which node the pod should be placed.
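For illustration, the pod spec fragment below shows the kind of information the scheduler evaluates: resource requests and a node affinity rule. The label key and values are placeholders, not values used by our service.

```yaml
# Example pod spec with scheduling hints (label key and values are placeholders)
apiVersion: v1
kind: Pod
metadata:
  name: webapp-pod
spec:
  containers:
    - name: webapp
      image: nginx:1.25
      resources:
        requests:
          cpu: "250m"        # the scheduler only places the pod on nodes with this much free CPU
          memory: "256Mi"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: topology.kubernetes.io/zone   # placeholder affinity rule
                operator: In
                values:
                  - zone-a
```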

Kube Controller Manager 

The Kube Controller Manager is a central component of Kubernetes that manages and coordinates various controller processes. These controllers monitor the state of the cluster, respond to changes and ensure that the desired configuration of the cluster resources is maintained, for example by ensuring that the correct number of pods are running or that the nodes are functioning correctly.

The Kube Controller Manager monitors the status of the new pods. If a pod is not functioning properly or a node fails, the controller starts new pods to ensure the desired number of replicas.

Kubelet

The kubelet is the agent that runs on each node of the Kubernetes cluster and is responsible for running and managing the pods on that node. It communicates with the kube-apiserver to obtain pod specifications, launches containers via a container runtime such as containerd and continuously monitors their state to ensure they are running correctly.

On the selected node, the kubelet monitors the incoming pod definitions from the kube-apiserver. It starts the containers according to the specifications and monitors their status to ensure that they function as expected.

Kube Proxy

The Kube Proxy is a network component of Kubernetes that runs on each node and is responsible for network communication. It manages the network rules and ensures that service requests are correctly forwarded to the appropriate pods by enabling load balancing and network routing between the different components of the cluster.

The Kube proxy ensures that requests to the web application service are correctly forwarded to the pods. It manages the network rules and ensures load balancing between the application's various pods.
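For example, a Service like the one below (name and ports are placeholders) defines the routing rules that kube-proxy translates into forwarding and load-balancing rules on each node.

```yaml
# Example Service that load balances traffic across the webapp pods
apiVersion: v1
kind: Service
metadata:
  name: webapp
spec:
  selector:
    app: webapp          # matches the pod labels from the Deployment example above
  ports:
    - port: 80           # Service port
      targetPort: 80     # container port
```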

Container Runtime

The Container Runtime is the software that executes and manages containers on a node. It is responsible for creating, executing, terminating and deleting containers and ensures that containers are executed according to the specifications in the Kubernetes cluster.

In our example, the container runtime (containerd on Talos Linux) pulls the web application's container image on the selected node and runs the containers as instructed by the kubelet, according to their specifications.

The components and distributions selected by Xelon

Below you will find the components that we have selected as the basis for our Kubernetes service. The components not listed are Vanilla Kubernetes components. Our goal is to offer a Kubernetes service that is as standardized as possible, because after all, standardization is the reason why Kubernetes has prevailed over alternatives.

Operating system - Talos Linux

The operating system itself is not directly a Kubernetes component, but it is the basis for the Kubernetes service. For the Kubernetes nodes, we rely on Talos Linux, a minimal operating system that was developed specifically for the operation of Kubernetes. 

Many Kubernetes nodes run on Ubuntu, RHEL or minimal container operating systems such as “MicroOS” from SUSE or “CoreOS” from Fedora/Redhat. Most of these distributions focus on removing all unnecessary applications.

Talos Linux from SideroLabs goes a step further: it consists of a Linux kernel plus only the binaries necessary to run containers. In addition, the system partition is immutable and is replaced by a new one during an upgrade.

KubePrism

KubePrism is a simple tool from SideroLabs that increases the availability of the control plane from within the cluster itself. Essentially, it runs a local load balancer on each node that forwards Kube API server requests to a healthy endpoint, preferring the endpoint with the lowest latency. For more details, please refer to the official documentation.
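As a sketch, KubePrism is enabled via the Talos machine configuration; the fragment below shows the relevant feature block. Port 7445 is the value commonly shown in the Talos documentation and is an assumption here, not a Xelon-specific setting.

```yaml
# Talos machine configuration fragment enabling KubePrism (sketch)
machine:
  features:
    kubePrism:
      enabled: true
      port: 7445   # local load balancer port on each node (assumed default)
```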

Container Network Interface - Cilium

As the CNI (Container Network Interface) we have chosen Cilium. It is currently not yet a standard installation option on the Talos Linux distribution, but there is already plenty of documentation that simplifies the installation. We chose Cilium for two main reasons. The first is performance: Cilium uses eBPF to route packets between nodes, and eBPF consumes very few CPU and memory resources compared to traditional tools such as iptables/nftables. The second reason is observability: with Hubble and other integrated tools, the network traffic between containers and services is no longer black magic, but becomes clear and traceable, even for team colleagues who did not set up the cluster themselves.

As a bonus tip, we would like to point out that Cilium Ingress (Envoy) is also pre-installed in the cluster, which also makes Layer 7 traffic understandable and transparent.
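As a sketch, an Ingress resource that uses the pre-installed Cilium ingress controller could look like the following; the host name and backend service are placeholders. The ingressClassName "cilium" selects the Envoy-based Cilium ingress.

```yaml
# Example Ingress using the Cilium (Envoy) ingress controller; host and backend are placeholders
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webapp
spec:
  ingressClassName: cilium
  rules:
    - host: webapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: webapp
                port:
                  number: 80
```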

Xelon - Cloud Controller Manager

The Kube Controller Manager is a standard component of the Kubernetes cluster that can be extended with a CCM (Cloud Controller Manager). We have written our own extension, which is pre-installed in the cluster. This enables the creation of Services of type LoadBalancer, in simple terms a public endpoint, without having to configure a load balancer manually.
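For illustration, creating a Service of type LoadBalancer such as the one below (name and ports are placeholders) is sufficient; the pre-installed Xelon Cloud Controller Manager then provisions the public endpoint, for example on one of the virtual IPs described further down.

```yaml
# Example Service of type LoadBalancer; the Xelon CCM provisions the public endpoint
apiVersion: v1
kind: Service
metadata:
  name: webapp-public
spec:
  type: LoadBalancer
  selector:
    app: webapp
  ports:
    - port: 443          # placeholder ports
      targetPort: 8443
```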

Xelon - Container Storage Interface

Kubernetes was originally used almost exclusively for stateless applications; the reason for this was the “complicated” storage management. As soon as an application can run on multiple servers, all servers must have access to the same storage. The CSI (Container Storage Interface) is a standard for attaching volumes to containers. Here too, our own Xelon CSI is pre-installed in the cluster.
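As a sketch, a PersistentVolumeClaim like the one below requests a volume through the pre-installed Xelon CSI driver. The storage class name "xelon" is an assumption for illustration only; check the storage classes available in your cluster (kubectl get storageclass) for the actual name.

```yaml
# Example PersistentVolumeClaim; the storage class name is an assumption, not a documented value
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: webapp-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: xelon   # placeholder, use the storage class provided in your cluster
```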

Xelon Kubernetes Architecture

Below you can see how our service is structured.

[Figure: Xelon Kubernetes architecture]

Gateway & Loadbalancer

At Xelon, security is our top priority, which is why all Kubernetes nodes are located in a LAN that is isolated from the Internet by the load balancers, which also act as a Layer 3 firewall. Each Kubernetes cluster has at least one load balancer (the primary load balancer), which additionally performs the function of a gateway.

When “Production” mode is active, two load balancers are created instead of one; they are made fail-safe by virtual IPs and HAProxy.

Each load balancer cluster (productive and non-productive) requires a dedicated /29 WAN network. A /29 provides eight addresses: the network ID, the upstream gateway and the broadcast address are not usable, two addresses are dedicated to the load balancers, and the remaining three serve as virtual IPs. Below is an example network with the IPs and their functions.

Example network: 45.131.171.40/29

45.131.171.40 (Network ID): not usable.
45.131.171.41 (Xelon WAN upstream gateway): not usable.
45.131.171.42 (Dedicated IP of the primary load balancer): used by the primary load balancer as its own primary IP.
45.131.171.43 (Dedicated IP of the secondary load balancer, failover): used by the secondary load balancer as its own primary IP.
45.131.171.44 (#1 virtual IP address): used by the first load balancer cluster of each Kubernetes cluster as the endpoint for the Kube API (6443), the Talos API (50000) and the Cilium Ingress Controller (80, 443). For each additional load balancer cluster, this address can also be used as a load balancer IP.
45.131.171.45 (#2 virtual IP address): usable IP address for Kubernetes Services of type LoadBalancer.
45.131.171.46 (#3 virtual IP address): usable IP address for Kubernetes Services of type LoadBalancer.
45.131.171.47 (Broadcast address): not usable.

Kubernetes Nodes

Our Kubernetes nodes are divided into two types: control plane nodes and worker nodes. However, this is only a logical subdivision for better overview and scalability.

The control plane nodes are always in the first pool, which is available in two modes: “productive” and “non-productive”. In non-productive mode, only one control plane node is provided. If it is restarted or becomes unavailable, the Kube API is no longer functional; containers and workloads are not recreated and cannot be managed until the control plane is available again. Workloads that are already running on the worker nodes are not affected by such an interruption, provided there are no other dependencies. Scaling from non-productive mode to productive mode is possible, but not in the opposite direction. In productive mode, the etcd datastore is actively synchronized between the control plane nodes, so this mode tolerates the failure of one control plane node.

Worker nodes can be scaled as required, but it is not possible to reduce the size of hard disks.

Node Pool

As already mentioned, we use the principle of node pools, which are a logical grouping of nodes. The aim is for all nodes in a pool to have the same amount of resources, which simplifies the calculation of fault tolerances.