Environment information:
RKE2 version:
[root@testnode3 ~]# rke2 -v
rke2 version v1.31.4+rke2r1 (5142beec71f7a61804840df5b434c2fd7137ce82)
go version go1.22.9 X:boringcrypto
Node CPU architecture, operating system, and version:
[root@testnode3 ~]# uname -a
Linux testnode3 3.10.0-1160.119.1.el7.x86_64 #1 SMP Tue Jun 4 14:43:51 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Cluster configuration:
[root@testnode3 ~]# kubectl get nodes
NAME        STATUS   ROLES                       AGE    VERSION
testnode1   Ready    control-plane,etcd,master   201d   v1.31.4+rke2r1
testnode2   Ready    control-plane,etcd,master   201d   v1.31.4+rke2r1
testnode3   Ready    control-plane,etcd,master   201d   v1.31.4+rke2r1
testnode4   Ready    <none>                      200d   v1.31.4+rke2r1
testnode5   Ready    <none>                      200d   v1.31.4+rke2r1
testnode6   Ready    <none>                      200d   v1.31.4+rke2r1
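Describe the bug:
After the cluster has been running for a little over a month, a large number of netns files accumulate on testnode1, testnode2, and testnode3:
ls /var/run/netns | wc -l reports a count in the tens of thousands.
On testnode4, testnode5, and testnode6 the count is only a dozen or so.
Once this happens, the ip command fails: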
[root@testnode1 ~]# ip addr
Cannot bind netlink socket: Argument list too long
# The event shown in Rancher is:
Warning FailedCreatePodSandBox 86s (x1582 over 5h44m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create network namespace for sandbox "fae7442ee1affb3e2caa12c1130a465799bc77eff99e6adb13fa32052fcf10ad": failed to setup netns: failed to create namespace: cannot allocate memory
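Deploying new services, deleting pods, and similar operations all fail, and the affected pods stay stuck in ContainerCreating.
At that point, rebooting all of the servers (reboot) restores normal operation.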
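
To catch the buildup before ip and pod sandbox creation start failing, the count described above could be checked periodically on each node. The following is only a minimal sketch, not part of RKE2: the function names count_netns and check_netns and the default limit of 1000 are hypothetical choices made for illustration, and the directory defaults to /var/run/netns as in this report.

```shell
#!/bin/sh
# Sketch only: count_netns, check_netns and the limit of 1000 are
# hypothetical names/values, not part of RKE2 or any CNI plugin.

# Print the number of entries directly under a netns directory.
count_netns() {
    dir="${1:-/var/run/netns}"
    # One output line per entry; a missing directory yields 0.
    find "$dir" -mindepth 1 -maxdepth 1 2>/dev/null | wc -l | tr -d ' '
}

# Report the count and return non-zero when it exceeds the limit.
check_netns() {
    dir="${1:-/var/run/netns}"
    limit="${2:-1000}"
    n=$(count_netns "$dir")
    if [ "$n" -gt "$limit" ]; then
        echo "WARN $(uname -n): $n netns entries in $dir (limit $limit)"
        return 1
    fi
    echo "OK $(uname -n): $n netns entries in $dir"
}
```

After sourcing the file, running check_netns /var/run/netns 1000 on each node prints one line per node and makes it easy to see which nodes are accumulating entries. It only reports the count and does not delete anything.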
Steps to reproduce:
- Command used to install RKE2:
Expected behavior:
Actual behavior:
Logs:
attle-fleet-local-system fleet-agent-0 2/2 Running 0 51d
cattle-fleet-system fleet-cleanup-gitrepo-jobs-29537280-6h5v7 0/1 ContainerCreating 0 6d5h
cattle-fleet-system fleet-controller-86475cfcb5-vzfzj 3/3 Running 133 (51d ago) 208d
cattle-fleet-system gitjob-84689865c8-7gc5f 1/1 Running 34 (51d ago) 156d
cattle-provisioning-capi-system capi-controller-manager-7b644f4667-b7lcs 1/1 Running 94 (2d17h ago) 156d
cattle-system rancher-7bbb49f74f-lbr7r 1/1 Running 35 (51d ago) 208d
cattle-system rancher-webhook-865df66cd6-xbhng 1/1 Running 3 (51d ago) 208d
cattle-system system-upgrade-controller-8fbf9cf-bb4wj 1/1 Running 4 (51d ago) 208d
cert-manager cert-manager-56d4c7dfb7-m7srk 1/1 Running 19 (51d ago) 156d
cert-manager cert-manager-cainjector-6dc54dcd78-8ln78 1/1 Running 25 (51d ago) 208d
cert-manager cert-manager-webhook-5d74598b49-lq8j7 1/1 Running 2 (51d ago) 156d
fleet-default rke2-machineconfig-cleanup-cronjob-29532965-c6874 0/1 Completed 0 9d
fleet-default rke2-machineconfig-cleanup-cronjob-29534405-rp24d 0/1 Completed 0 8d
fleet-default rke2-machineconfig-cleanup-cronjob-29535845-tcc4r 0/1 Completed 0 7d5h
fleet-default rke2-machineconfig-cleanup-cronjob-29537285-t92gp 0/1 ContainerCreating 0 6d5h
fleet-default rke2-machineconfig-cleanup-cronjob-29538725-z67s5 0/1 ContainerCreating 0 5d5h
fleet-default rke2-machineconfig-cleanup-cronjob-29540165-27krx 0/1 ContainerCreating 0 4d5h
fleet-default rke2-machineconfig-cleanup-cronjob-29541605-w672p 0/1 ContainerCreating 0 3d5h
fleet-default rke2-machineconfig-cleanup-cronjob-29543045-ng7gn 0/1 ContainerCreating 0 2d5h
fleet-default rke2-machineconfig-cleanup-cronjob-29544485-w67fs 0/1 ContainerCreating 0 29h
fleet-default rke2-machineconfig-cleanup-cronjob-29545925-r5rxr 0/1 ContainerCreating 0 5h47m
k