In this tutorial, we are going to discuss pod networking in Kubernetes. So far we have set up several Kubernetes master and worker nodes and configured networking between them, so they are all on a network and can reach each other.
We also made sure the firewall and network security groups are configured correctly to allow the Kubernetes control plane components to reach each other.
Assume that we have also set up all the Kubernetes control plane components, such as the kube-apiserver, the etcd servers, the kubelets and so on, and we are finally ready to deploy our applications.
But before we can do that, there is something we must address. We talked about the network that connects the nodes together, but there is also another layer of networking that is crucial to the cluster's functioning: the networking at the POD layer.
Our Kubernetes cluster is soon going to have a large number of PODs and services running on it. How are these PODs addressed? How do they communicate with each other? How do you access the services running on these PODs, both internally from within the cluster and externally from outside it?
These are challenges that Kubernetes expects you to solve. As of today, Kubernetes does not come with a built-in solution for this; it expects you to implement a networking solution that solves these challenges. However, Kubernetes has laid out, clearly, the requirements for POD networking.
Let’s take a look at what they are.
- Kubernetes expects every POD to get its own unique IP address.
- Every POD should be able to communicate with every other POD within the same node using that IP address.
- Every POD should be able to communicate with every other POD on other nodes as well, using the same IP address.
It doesn't care what IP address that is or what range or subnet it belongs to. As long as you can implement a solution that takes care of automatically assigning IP addresses and establishing connectivity between the PODs on a node, as well as PODs on different nodes, without having to configure any NAT rules, you are good. So how do you implement a model that solves these requirements?
Now, there are many networking solutions available out there that do this. But we have already discussed networking concepts such as routing, IP address management, namespaces and CNI.
So let’s try to use that knowledge to solve this problem by ourselves first. This will help in understanding how other solutions work.
I know there is a bit of repetition, but I'm trying to relate the same concept and approach all the way from plain network namespaces on Linux to Kubernetes.
So we have a three-node cluster. It doesn't matter which one is the master and which are workers; they all run pods, either for management or workload purposes.
As far as networking is concerned we’re going to consider all of them as the same. So first let’s plan what we’re going to do.
The nodes are part of an external network and have IP addresses in the 192.168.1.x series. Node1 is assigned 192.168.1.11, node2 192.168.1.12 and node3 192.168.1.13.
Assigning IP Address
Next, when containers are created, Kubernetes creates network namespaces for them. To enable communication between them, we attach these namespaces to a network. But what network?
We've discussed bridge networks that can be created within nodes to attach namespaces to. So we create a bridge network on each node and then bring it up.
Now it's time to assign an IP address to the bridge interfaces, or networks. But which IP address? We decide that each bridge network will be on its own subnet. Choose any private IP address range, say 10.244.1.0/24, 10.244.2.0/24 and 10.244.3.0/24.
Next, we set the IP address for the bridge interface. So we have built our base. The remaining steps are to be performed for each container, every time a new container is created, so we write a script for it.
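On each node, this bridge setup might look like the following (the bridge name v-net-0 is an illustrative choice, and each bridge takes the first address in its node's subnet):

```
node1$ ip link add v-net-0 type bridge
node1$ ip link set dev v-net-0 up
node1$ ip addr add 10.244.1.1/24 dev v-net-0

node2$ ip link add v-net-0 type bridge
node2$ ip link set dev v-net-0 up
node2$ ip addr add 10.244.2.1/24 dev v-net-0

node3$ ip link add v-net-0 type bridge
node3$ ip link set dev v-net-0 up
node3$ ip addr add 10.244.3.1/24 dev v-net-0
```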
Now, you don't need to know any kind of complicated scripting. It's just a file that has all the commands we will be using, and we can run it multiple times, for each container, going forward.
Attach a container
To attach a container to the network, we need a pipe, or virtual network cable. We create that using the ip link add command. Don't focus on the options, as they are similar to what we saw in previous tutorials; assume they vary depending on the inputs. We then attach one end to the container and the other to the bridge using the ip link set command. We then assign an IP address using the ip addr command and add a route to the default gateway. But what IP do we add?
We either manage that ourselves or store that information in some kind of database. For now, we will assume it is 10.244.1.2, a free IP in the subnet.
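One simple way to manage this yourself is a small helper that tracks assigned IPs in a file, a crude version of what an IPAM plugin does for you. This is only a sketch; the file path and the subnet are illustrative assumptions:

```shell
#!/bin/sh
# get_free_ip: naive IP management sketch. Hands out the next unused address
# in 10.244.1.0/24 and records assignments in a file so the same IP is never
# given out twice. IP_DB is an illustrative path, not a real Kubernetes file.
IP_DB="${IP_DB:-/tmp/assigned-ips.txt}"

get_free_ip() {
    touch "$IP_DB"
    for i in $(seq 2 254); do               # .1 is reserved for the bridge
        candidate="10.244.1.$i"
        if ! grep -qx "$candidate" "$IP_DB"; then
            echo "$candidate" >> "$IP_DB"   # record the assignment
            echo "$candidate"
            return 0
        fi
    done
    return 1                                # subnet exhausted
}
```

Calling `get_free_ip` twice would hand out 10.244.1.2 and then 10.244.1.3.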
Finally, we bring up the interface. We then run the same script, this time for the second container with its own information, and get that container connected to the network as well.
The two containers can now communicate with each other. We copy the script to the other nodes and run it there to assign IP addresses and connect those containers to their own internal networks.
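Put together, the per-container script sketched above might look like this. The bridge name v-net-0 and the gateway 10.244.1.1 are assumptions carried over from the bridge setup on node1; set DRY_RUN=echo to print the commands instead of executing them, since real runs need root and an existing bridge:

```shell
#!/bin/sh
# Sketch of the per-container attach script (names are illustrative).

attach_container() {
    RUN="${DRY_RUN:-}"   # set DRY_RUN=echo to print instead of execute
    ns="$1"              # network namespace created for the container
    ip_addr="$2"         # a free IP from this node's subnet, e.g. 10.244.1.2

    # Create the virtual cable (veth pair)
    $RUN ip link add "veth-$ns" type veth peer name "veth-$ns-br"
    # Attach one end to the container's namespace and the other to the bridge
    $RUN ip link set "veth-$ns" netns "$ns"
    $RUN ip link set "veth-$ns-br" master v-net-0
    # Assign the IP address and add a route to the default gateway
    $RUN ip -n "$ns" addr add "$ip_addr/24" dev "veth-$ns"
    $RUN ip -n "$ns" route add default via 10.244.1.1
    # Bring up the interfaces
    $RUN ip -n "$ns" link set "veth-$ns" up
    $RUN ip link set "veth-$ns-br" up
}

# Example: attach_container blue 10.244.1.2
```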
So we have solved the first part of the challenge: the pods all get their own unique IP address and are able to communicate with each other on their own nodes. The next part is to enable them to reach PODs on other nodes.
Say, for example, the pod at 10.244.1.2 on Node1 wants to ping pod 10.244.2.2 on Node2. As of now, the first pod has no idea where the address 10.244.2.2 is, because it is on a different network than its own, so the traffic is routed to Node1's IP, which is set to be the default gateway.
Add route in the routing table
Node1 doesn't know either, since 10.244.2.2 is on a private network on Node2. So we add a route to Node1's routing table to route traffic to 10.244.2.2 via the second node's IP, 192.168.1.12. Once the route is added, the pod is able to ping across.
Similarly, we configure routes on all hosts to all other hosts with information about the respective networks within them.
```
node1$ ip route add 10.244.2.2 via 192.168.1.12
node1$ ip route add 10.244.3.2 via 192.168.1.13
node2$ ip route add 10.244.1.2 via 192.168.1.11
node2$ ip route add 10.244.3.2 via 192.168.1.13
node3$ ip route add 10.244.1.2 via 192.168.1.11
node3$ ip route add 10.244.2.2 via 192.168.1.12
```
Now, this works fine in this simple setup, but it will require a lot more configuration as and when your underlying network architecture gets more complicated.
Instead of configuring routes on each server, a better solution is to do that on a router, if you have one in your network, and point all hosts to use it as their default gateway. That way you can easily manage routes to all networks in the router's routing table.
With that, the individual virtual networks we created on each node, 10.244.1.0/24, 10.244.2.0/24 and 10.244.3.0/24, now form a single large network with the address 10.244.0.0/16.
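For example, if every host uses the router as its default gateway, the only per-node routes to maintain are on the router itself:

```
router$ ip route add 10.244.1.0/24 via 192.168.1.11
router$ ip route add 10.244.2.0/24 via 192.168.1.12
router$ ip route add 10.244.3.0/24 via 192.168.1.13
```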
It’s time to tie everything together. We performed a number of manual steps to get the environment ready with the bridge networks and routing tables.
We then wrote a script that can be run for each container that performs the necessary steps required to connect each container to the network and we executed the script manually.
Of course, we don't want to do that manually in large environments where thousands of PODs are created every minute. So how do we run the script automatically when a pod is created on Kubernetes? That's where CNI comes in, acting as the middleman.
Container Network Interface (CNI)
CNI tells Kubernetes: this is how you should call a script as soon as you create a container. And CNI tells us: this is how your script should look.
So we need to modify the script a little to meet CNI's standards. It should have an ADD section that takes care of adding a container to the network, and a DEL section that takes care of deleting the container's interface from the network, freeing the IP address, and so on.
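Reworked to CNI's expectations, the script might be organized like this sketch. A real CNI plugin also reads a JSON network configuration on stdin and prints a JSON result; here we only show the ADD/DEL dispatch, driven by the CNI_COMMAND environment variable the runtime sets. The interface names, bridge name and IP are illustrative, and DRY_RUN=echo again avoids needing root:

```shell
#!/bin/sh
# CNI-style sketch of net-script: the runtime invokes the plugin with the
# requested action in the CNI_COMMAND environment variable (ADD or DEL).

cni() {
    RUN="${DRY_RUN:-}"
    ns="$1"   # container's network namespace (illustrative argument)

    case "$CNI_COMMAND" in
    ADD)
        # Same steps as before: create veth pair, attach, assign IP, bring up
        $RUN ip link add "veth-$ns" type veth peer name "veth-$ns-br"
        $RUN ip link set "veth-$ns" netns "$ns"
        $RUN ip link set "veth-$ns-br" master v-net-0
        $RUN ip -n "$ns" addr add 10.244.1.2/24 dev "veth-$ns"
        $RUN ip -n "$ns" link set "veth-$ns" up
        ;;
    DEL)
        # Delete the container's interface; removing one end of a veth pair
        # removes the other end too. The IP should also be returned to the pool.
        $RUN ip -n "$ns" link del "veth-$ns"
        ;;
    *)
        echo "unsupported CNI_COMMAND: $CNI_COMMAND" >&2
        return 1
        ;;
    esac
}
```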
So our script is ready. The kubelet on each node is responsible for creating containers.
Whenever a container is created, the kubelet looks at the CNI configuration passed to it as a command line argument when it was started and identifies our script's name. It then looks in the CNI bin directory to find our script, and executes it with the add command and the name and namespace ID of the container. Our script then takes care of the rest.
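As a sketch of what the kubelet reads: CNI configuration files conventionally live in /etc/cni/net.d, with plugin binaries in the CNI bin directory (typically /opt/cni/bin). If our script were installed there as net-script, a minimal config file might look like this, where the name and version values are illustrative and type must match the binary's name:

```json
{
  "cniVersion": "0.3.1",
  "name": "mynet",
  "type": "net-script"
}
```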