Kubernetes Architecture

Kubernetes is based on a client-server architecture with control plane ("master") nodes and worker nodes. The key element in this architecture is the cluster. So what is it?

Imagine a production complex with a large number of buildings and premises. One of the buildings is administrative (headquarters). It houses managers who plan, coordinate, and control all production processes. They make important decisions, for example, where and when to start new production or how to distribute resources. Other buildings in the complex are production facilities (factory 1, 2, ...) and are engaged in manufacturing products.

A cluster is a complex of servers or virtual machines that can be compared to a production complex with a large number of buildings and premises (servers or virtual machines). A cluster consists of nodes, each of which performs a specific function and affects overall productivity.

In the diagram below, you can see worker nodes Node 1 and Node 2. Worker nodes (physical or virtual machines) are responsible for running applications: containers with programs run on these nodes, performing specific tasks, from data processing to serving web requests.

The Control Plane (also called Master Nodes) is responsible for planning, distributing work, monitoring the state of production, and solving problems (similar to the administrative building). It is the brain of Kubernetes that manages the entire cluster. This is where decisions are made (e.g., where to place applications), cluster state tracking and responding to changes happens. The Control Plane includes important components:

  • the API server, also called kube-apiserver;
  • the scheduler (kube-scheduler);
  • the Controller Manager (kube-controller-manager);
  • a key-value data store called etcd.

💡 Loss of the Control Plane can lead to significant outages. You will learn later how to avoid this.
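In many clusters (for example, those set up with kubeadm), these components themselves run as pods in the `kube-system` namespace, so you can list them with kubectl. A sketch, assuming access to a running cluster:

```shell
# List the control plane pods (requires a running cluster and kubectl access)
kubectl get pods -n kube-system

# Typical pod names include kube-apiserver-<node>, kube-scheduler-<node>,
# kube-controller-manager-<node>, and etcd-<node>
```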

Control Plane components can run on any computer in the cluster, but they are usually run together on one or more dedicated machines. This avoids communication delays and simplifies coordination between components when managing the cluster.

It is also best practice not to run user containers on these machines, so the control plane components have sufficient resources. If user containers run on the same machine as the control plane, the two can end up competing for resources.
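In practice this best practice is often enforced with a taint on control plane nodes, which tells the scheduler not to place ordinary pods there. A sketch, assuming a kubeadm-style cluster (the node name is hypothetical):

```shell
# Inspect the taints on a control plane node (node name "master-1" is an example)
kubectl describe node master-1 | grep Taints
# kubeadm clusters typically show: node-role.kubernetes.io/control-plane:NoSchedule

# Re-apply the taint manually if it was removed
kubectl taint nodes master-1 node-role.kubernetes.io/control-plane:NoSchedule
```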

Control Plane Components

Now let's take a closer look at the master node components.

kube-apiserver (API Server)

The first component is kube-apiserver, also called the API Server.

API Server is the "main gateway" for all operations in the cluster. In essence, this gateway is similar to a REST API server, which serves as the central interface for cluster management and handles receiving and processing all requests from users, administrators, developers, and external agents. API Server also validates requests and ensures security through authentication, authorization, and access control.

If the number of requests grows rapidly, additional replicas of the API Server can be run to handle the load: horizontal scaling is one of its key properties. Like most Kubernetes components, the API Server typically runs in a container.
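Because the API Server exposes a REST interface, every kubectl command is ultimately an HTTP request to it. A sketch, assuming cluster access:

```shell
# Send a request to an API path directly through kubectl
kubectl get --raw /api/v1/namespaces/default/pods

# The same endpoint can be reached with plain curl through a local proxy,
# which handles authentication for you
kubectl proxy --port=8001 &
curl http://localhost:8001/api/v1/namespaces/default/pods
```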

💡 API Server is the only component that directly interacts with etcd, acting as an intermediary interface for all other control plane agents.

etcd

The second component is etcd, which is a distributed and strongly consistent key-value store. This store holds all critically important information about the cluster state.

💡 etcd can be compared to a safe that stores the configuration and state of all cluster resources. Everything that happens in the cluster must be recorded and saved! This is the function etcd performs.

All new data is written to the data store by appending, while outdated data is regularly compacted to optimize storage volume. Only the API server has direct access to etcd. This configuration ensures centralized management and control.

To manage etcd, etcdctl is used, which performs various functions. One of the most important functions of etcdctl is the ability to create "snapshots" of the cluster state. These snapshots, like photographs, capture the state of the store at a specific point in time. They can be used to restore the cluster state after failures or problems, reducing the risk of losing important data.
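Taking a snapshot with etcdctl can be sketched as follows; the endpoint and certificate paths below are examples matching kubeadm defaults and will differ in other setups:

```shell
# Save a snapshot of the etcd keyspace (paths and endpoint are examples)
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /var/backups/etcd-snapshot.db

# Verify the snapshot metadata (revision, total keys, size)
ETCDCTL_API=3 etcdctl snapshot status /var/backups/etcd-snapshot.db --write-out=table
```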

Usually a high-availability cluster has 3 or more master nodes, and the number of master nodes should be odd so that a majority can always be formed.

In essence, etcd is also a cluster. For example, there are 3 etcd nodes, one of which is the leader, and the others are followers. At startup, the cluster independently determines who will perform which role. In this case, 3 nodes are needed for stability and data integrity about operations and states within the cluster.

When a write request comes to etcd, it is automatically redirected to the cluster leader. The leader appends the change to its log but does not yet commit it.

At the same time, the leader replicates the entry to the remaining etcd nodes in the cluster. It then waits until the entry has been written by a majority of nodes, and only then commits it and marks the operation as completed. Once the value is committed, the leader sends a write confirmation to the client that issued the write command.
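You can see which member is currently the leader with etcdctl. A sketch, assuming a three-member cluster with example endpoints and that the client certificates are already configured:

```shell
# The IS LEADER column shows the current leader among the three members
ETCDCTL_API=3 etcdctl \
  --endpoints=https://10.0.0.1:2379,https://10.0.0.2:2379,https://10.0.0.3:2379 \
  endpoint status --write-out=table
```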

kube-scheduler

The next component is kube-scheduler, or simply the scheduler. You may have guessed from the name what this component is responsible for: kube-scheduler decides where work should run in the cluster. Work here means running containers, which requires the following resources:

  • CPU;
  • RAM;
  • Disk Space.

The scheduler analyzes the needs and the current state of the cluster and finds the best node for running the containers. It receives information about both from the API Server. After selecting the best node, the scheduler passes the result back to the API Server, which then delegates workload deployment to other control plane agents.

💡 When choosing a node, the scheduler considers various factors, including physical resources and various constraints that you can configure.
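The needs the scheduler works from are the resource requests declared in the pod spec. A minimal sketch with illustrative names and values:

```yaml
# The scheduler looks for a node with enough free capacity to satisfy
# the "requests" values; "limits" cap what the container may consume.
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          cpu: "250m"      # a quarter of a CPU core
          memory: "128Mi"
        limits:
          cpu: "500m"
          memory: "256Mi"
  nodeSelector:            # an optional constraint the scheduler must honor
    disktype: ssd
```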

kube-controller-manager

kube-controller-manager is responsible for maintaining the normal (desired) state of the cluster. It carefully monitors the cluster's operation and, if something goes wrong, launches controllers and operators to fix the situation.

There are different types of operators and controllers, each of which monitors a certain aspect of cluster operation. For example, the node controller is responsible for monitoring and responding to outages in node operation.

kube-controller-manager combines all controllers into a single process for efficient management. This allows easy configuration of different controllers for stable cluster operation and compliance with established requirements and rules.

The essence of controller operation comes down to reproducing the desired state. Controllers run in an infinite loop that constantly checks whether the desired cluster state matches the current state. If these states do not match, the corresponding controller launches synchronization mechanisms to resolve the problem.

cloud-controller-manager

cloud-controller-manager is similar to kube-controller-manager but runs controllers specific to the cloud provider. If you run Kubernetes on your own hardware, there will be no cloud-controller-manager in the cluster.

Like kube-controller-manager, cloud-controller-manager combines several controllers that run as a single process. To improve performance or tolerate failures, more than one copy of such processes can be run.

Worker Nodes

The real magic of Kubernetes happens in worker nodes, where applications run in containers. To understand worker nodes in more detail, you need to familiarize yourself with pods.

A pod is a collection of one or more containers scheduled together. This collection can be started, stopped, or rescheduled simultaneously.

💡 A pod is often compared to basic building blocks. It is the smallest deployable unit that can be scheduled in Kubernetes, and it is the only object in Kubernetes that runs containers.

Now that you know what a pod is, it's time to return to the Worker Node.

For uninterrupted operation, each worker node contains:

  • kubelet — an agent that runs on each worker node and monitors that everything is working properly.

For monitoring, kubelet receives information from the master node about what work needs to be done and which pods need to be created. Taking the received information into account, kubelet starts, stops, and manages containers. It also continuously monitors the state of pods and sends reports to the master node.

  • kube-proxy — an agent responsible for network interaction. It balances the load by distributing traffic between pods and also monitors compliance with network rules. kube-proxy is responsible for communications within the cluster.

Desired State

Desired State is a description of how the system should function. This description defines how many copies of the application should be running, what network resources they should use, what storage volumes they should connect, and much more.

In other words, the desired state is a plan that describes how the system should work.
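This plan is written down in configuration files. A common example is a Deployment; the names and image below are illustrative:

```yaml
# A Deployment declares desired state: "keep 3 identical replicas
# of this pod running at all times."
apiVersion: apps/v1
kind: Deployment
metadata:
  name: weather-web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: weather-web
  template:
    metadata:
      labels:
        app: weather-web
    spec:
      containers:
        - name: web
          image: nginx:1.25
          ports:
            - containerPort: 80
```

If one of the three pods dies, the controller manager notices the mismatch with the declared state and starts a replacement.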

Using kube-controller-manager, Kubernetes constantly checks whether everything is working according to the described requirements. When something breaks or works incorrectly, Kubernetes fixes it according to the plan.

When a container fails, Kubernetes automatically creates a new container to replace it. If the application configuration is updated, Kubernetes makes the appropriate changes while keeping the system in a functional state. This ensures continuous application operation, even in case of errors.

If autoscaling is configured and a web application requires additional resources, Kubernetes automatically scales the application. When the system load decreases, it scales back down. This guarantees that the application always has enough resources for efficient operation, but does not waste resources when they are not needed.
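Such automatic scaling is opt-in: it is typically configured with a HorizontalPodAutoscaler, which also requires a metrics source such as metrics-server. A sketch with illustrative names and thresholds:

```yaml
# Keep CPU utilization of the target Deployment near 70% by adjusting
# the replica count between 2 and 10. Names and values are examples.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: weather-web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: weather-web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```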

This approach is useful for modern web applications because it ensures stability, scalability, and self-healing. Configuration files are used to describe the desired state.

Stable Network and Services

Each pod in Kubernetes receives a unique IP address that operates at the cluster level. This means there is no need to specifically create links between pods, and in most cases, you won't need to configure container port mapping to host ports, since they can communicate directly through these IP addresses. This greatly simplifies network management.

Agents on the node, such as Kubelet (which manages pods on the node) or system daemons, can communicate with all pods on that same node. This allows efficient resource management and monitoring of pod states.

All containers inside one pod share the pod's network namespace. That is, they use the IP and MAC address of the pod itself, not their own. To communicate with each other, containers within one pod can use localhost instead of any IP.
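A sketch of this shared namespace: a pod with two containers, where a sidecar reaches the main container over localhost. The names and images are illustrative:

```yaml
# Both containers share one network namespace, so the sidecar can poll
# the web container at localhost without knowing any pod IP.
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  containers:
    - name: web
      image: nginx:1.25
      ports:
        - containerPort: 80
    - name: sidecar
      image: curlimages/curl:8.8.0
      # Polls the web container over localhost every 5 seconds
      command: ["sh", "-c", "while true; do curl -s http://localhost:80/ > /dev/null; sleep 5; done"]
```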

Communication Between Pods

Kubernetes uses a virtual network for its components and provides each pod with its own unique IP address within this virtual network. Communication between pods occurs using IP addresses.

Imagine that the first pod contains a Python server that shows the weather on a web page, and the second pod contains a service with an API for generating predictions. To show the weather and the predictions, the weather service makes a request to the predictions service, and as its address it simply uses the IP of the second pod.

This scheme may look a bit confusing. Fortunately, knowing pod IPs is almost never necessary. For this, there are network objects called Services.

Services

Kubernetes provides easy scaling capabilities. Depending on the load and needs, the number of pods can increase or decrease. At such moments, a pod's IP address may change, or an additional pod with a new IP may be added. Tracking all these changes manually is very difficult. Kubernetes solves this problem with Service.

A Service, as a k8s network object, has a permanent address that can be used to access a group of pods. The address will be correct even if individual pods are shut down or replaced with new ones. This guarantees that the application will remain accessible at one address, regardless of internal changes in the cluster. Most often, a service is accessed not by its IP address but by its network name.

Service allows the use of different access types:

  • internal access (ClusterIP);
  • external access (NodePort, LoadBalancer);
  • mapping to external resources (ExternalName).
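A minimal sketch of a ClusterIP Service (the default type) for the predictions example above; all names are illustrative:

```yaml
# The Service gives a stable name and IP to all pods matching the selector.
apiVersion: v1
kind: Service
metadata:
  name: predictions
spec:
  type: ClusterIP
  selector:
    app: predictions      # routes to every pod carrying this label
  ports:
    - port: 80            # port the Service listens on
      targetPort: 8000    # port the container inside the pod listens on
```

Other pods in the same namespace can then reach the service by its DNS name, for example `http://predictions/`, without knowing any pod IPs.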

Service provides flexibility in deployment and interaction with applications, allowing adaptation to different needs and usage scenarios. Thanks to Service, you can easily find and connect to services in the cluster, which simplifies network management and increases the efficiency of interaction between services.

In combination with network policies, you can also restrict who may access the pods behind a Service. This provides an additional layer of security and control, protecting applications from unauthorized access and attacks.