Troubleshooting "Orphaned Pod Found - but Volume Paths Are Still Present on Disk"
Problem Description
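Today one of our Kubernetes worker nodes showed an abnormal status (NotReady). I first logged in to the node and checked the status of the Kubelet and Docker processes; both appeared to be fine.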
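The status check above can be sketched as a small helper. This is a minimal sketch that assumes both daemons run as systemd units named kubelet and docker (the original does not say how they are managed); check_services is a hypothetical helper name.

```shell
# Report whether each named systemd unit is active.
# Assumption: kubelet and docker are managed by systemd under these unit names.
check_services() {
  local svc rc=0
  for svc in "$@"; do
    if systemctl is-active --quiet "$svc"; then
      echo "$svc: active"
    else
      echo "$svc: NOT active"
      rc=1
    fi
  done
  # Non-zero if at least one unit is not active.
  return "$rc"
}
```

Run it as `check_services kubelet docker`. As in this incident, both units can report active while the node is still NotReady, so the next place to look is the logs.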
Then I checked the system log (/var/log/messages) and found the following error:
Dec 31 12:44:16 docker18 kubelet: E1231 12:44:16.634146 707301 kubelet_volumes.go:128] Orphaned pod "356a8df1-0b4e-11e9-8afe-fa163e75de2b" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
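The Pod ID needed for the cleanup is the quoted UID in that log line. A minimal sketch for pulling it out of the log follows; extract_orphaned_pod_uids is a hypothetical helper name, and the pattern assumes the message format shown above.

```shell
# Print the unique pod UIDs named in kubelet "Orphaned pod" log lines.
# Reads the file given as $1, or stdin when no argument is passed.
extract_orphaned_pod_uids() {
  grep -oE 'Orphaned pod "[0-9a-f-]{36}"' "${1:--}" \
    | cut -d'"' -f2 \
    | sort -u
}
```

For example, `extract_orphaned_pod_uids /var/log/messages`. Because the Kubelet prints only one such line per pass ("There were a total of 1 errors similar to this"), more orphaned Pods may show up after the first one is cleaned.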
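Diagnosis
From the error message we can infer that this worker node has an orphaned Pod whose volume is still mounted, which prevents the Kubelet from reclaiming and cleaning up the orphaned Pod as it normally would.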
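Note: in this Kubelet message, an orphaned Pod is one whose directory still exists under the Kubelet's pods directory on the node even though the Kubelet no longer tracks that Pod (for example, because it has already been deleted from the cluster). It does not mean a bare Pod that no controller has adopted.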
A Google search for related information confirms this:
https://github.com/kubernetes/kubernetes/issues/60987
https://github.com/kubernetes/kubernetes/pull/68616
"While meet Orphan Pod, kubelet will clean up it and its directorys (cleanupOrphanedPodDirs);"
Resolution
1. First, use the Pod ID to look up the mount entries for the Pod's volumes:
mount -l | grep 356a8df1-0b4e-11e9-8afe-fa163e75de2b
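2. To prevent data loss, umount that mount point first; otherwise the rm -r in step 3 would descend into the mounted volume and delete its contents: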
umount /data/kubelet/pods/356a8df1-0b4e-11e9-8afe-fa163e75de2b/volumes/kubernetes.io~cephfs/pvc-fac32543-f3ab-11e8-acec-fa163e75de2b
3. Delete the Pod's metadata directory on the worker node:
rm -r /data/kubelet/pods/356a8df1-0b4e-11e9-8afe-fa163e75de2b
4. Check whether the Kubernetes worker node is healthy again:
kubectl get nodes
OK, the worker node is back to normal :)
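Since removing the directory while a volume is still mounted would delete the volume's data, steps 1 to 3 can be wrapped in a guard. The following is a minimal sketch, not a vetted tool: remove_orphaned_pod_dir, KUBELET_ROOT and MOUNTS_FILE are hypothetical names introduced here, and /data/kubelet is simply the Kubelet root directory used on this node (the upstream default is /var/lib/kubelet).

```shell
# Kubelet root directory; /data/kubelet matches the paths in this post.
KUBELET_ROOT="${KUBELET_ROOT:-/data/kubelet}"
# Table of current mounts; overridable so the guard can be dry-run (assumption of this sketch).
MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"

# Remove an orphaned pod directory only if nothing is mounted beneath it.
# Returns 1 if the directory does not exist, 2 if mounts remain.
remove_orphaned_pod_dir() {
  local pod_dir="$KUBELET_ROOT/pods/$1"
  if [ -z "$1" ] || [ ! -d "$pod_dir" ]; then
    echo "no such pod directory: $pod_dir" >&2
    return 1
  fi
  # Field 2 of the mounts table is the mount point; refuse if any starts with pod_dir/.
  if awk -v p="$pod_dir/" 'index($2, p) == 1 { found = 1 } END { exit !found }' "$MOUNTS_FILE"; then
    echo "volumes still mounted under $pod_dir; umount them first" >&2
    return 2
  fi
  rm -r -- "$pod_dir"
}
```

Calling `remove_orphaned_pod_dir 356a8df1-0b4e-11e9-8afe-fa163e75de2b` before the umount in step 2 would return 2 and leave everything in place; after the umount it removes the directory.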