
Common Questions About Containerd

1. Containerd-shim process leakage

  • It has been observed that containerd-shim processes can leak in a k8s cluster that uses Docker, especially when Pods are frequently created, deleted, or restarted.
  • A leaked containerd-shim process can even cause docker inspect on the affected container to hang, which in turn leads to kubelet PLEG timeout errors. Using the coredns Pod as an example, here is how to check for a containerd-shim process leak. As shown in the example output below, under normal circumstances each containerd-shim process has one actual working child process; when that child process disappears, the containerd-shim process exits automatically. If a containerd-shim process has no child process, it has leaked (a small helper script for spotting such processes is sketched after the example output).

To handle the situation of containerd-shim process leakage, you can follow the steps below:

  • Identify the PID of the leaked containerd-shim process and run kill pid. There is no need to add the -9 flag here; a plain kill handles the situation in most cases.
  • After confirming that the containerd-shim process has exited, check whether Docker and kubelet have returned to normal.
  • Note that kubelet may have been blocked by Docker and unable to carry out many pending operations. Once Docker recovers, those operations may all run at once and briefly drive up the node load, so consider restarting kubelet and Docker before and after the cleanup.
Example output for the normal (non-leaked) case, using the coredns Pod:

[root@xxxx ~]# docker ps |grep coredns-8f7c8b477-snmpq
ee404991798d   uhub.surfercloud.com/uk8s/coredns                        "/coredns -conf /etc…"   4 minutes ago   Up 4 minutes   k8s_coredns_coredns-8f7c8b477-snmpq_kube-system_26da4954-3d8e-4f67-902d-28689c45de37_0
b592e7f9d8f2   uhub.surfercloud.com/google_containers/pause-amd64:3.2  "/pause"                 4 minutes ago   Up 4 minutes   k8s_POD_coredns-8f7c8b477-snmpq_kube-system_26da4954-3d8e-4f67-902d-28689c45de37_0
[root@xxxx ~]# ps aux |grep ee404991798d
root     10386  0.0  0.2 713108 10628 ?      Sl   11:12   0:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id ee404991798d70cb9c3c7967a31b3bc2a50e56b072f2febf604004f5a3382ce2 -address /run/containerd/containerd.sock
root     12769  0.0  0.0 112724  2344 pts/0  S+   11:16   0:00 grep --color=auto ee404991798d
[root@xxxx ~]# ps -ef |grep 10386
root     10386      1  0 11:12 ?        00:00:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id ee404991798d70cb9c3c7967a31b3bc2a50e56b072f2febf604004f5a3382ce2 -address /run/containerd/containerd.sock
root     10421  10386  0 11:12 ?        00:00:00 /coredns -conf /etc/coredns/Corefile
root     12822  12398  0 11:17 pts/0    00:00:00 grep --color=auto 10386

Here the containerd-shim process (PID 10386) still has its working child process /coredns (PID 10421), so it has not leaked.
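
To make the check repeatable, below is a minimal shell sketch (an illustration, not official UK8S tooling) that lists containerd-shim processes without any child process, i.e. candidates for the leak described above. It assumes procps-style ps and pgrep are available on the node; verify each PID yourself before killing anything.

# List containerd-shim processes that have no child process (possible leaks).
for pid in $(pgrep -f containerd-shim); do
  if [ -z "$(ps --ppid "$pid" -o pid=)" ]; then
    echo "containerd-shim $pid has no child process, possible leak"
    # kill "$pid"                     # uncomment after verifying; a plain kill is enough, no -9 needed
  fi
done
# systemctl restart docker kubelet   # optional, if you choose to restart the services afterwards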

2. Failure of kubelet to connect to containerd in 1.19.5 clusters

In 1.19.5 clusters, nodes may become NotReady. Inspecting the kubelet logs, you will find a large number of Error while dialing dial unix:///run/containerd/containerd.sock entries. This is a known bug in version 1.19.5: when containerd restarts, kubelet loses its connection to containerd, and the connection can only be restored by restarting kubelet. You can find the details in the official k8s issue.
If you encounter this problem, restarting kubelet will restore the connection. Note that UK8S no longer supports creating 1.19.5 clusters; if your cluster is running version 1.19.5, you can upgrade it to version 1.19.10.
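
For reference, a minimal recovery sketch on an affected node, assuming kubelet runs as a systemd unit (adjust the unit name and time window to your environment):

# Confirm the lost-connection errors in the kubelet logs
journalctl -u kubelet --since "1 hour ago" | grep "Error while dialing"
# Restart kubelet to re-establish the connection to containerd, then check its status
systemctl restart kubelet
systemctl status kubelet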