RKE2 k8s: netns leak on kernel 3.10

Environment information:
RKE2 version:

[root@testnode3 ~]# rke2 -v
rke2 version v1.31.4+rke2r1 (5142beec71f7a61804840df5b434c2fd7137ce82)
go version go1.22.9 X:boringcrypto

Node CPU architecture, OS, and version:

[root@testnode3 ~]# uname -a
Linux testnode3 3.10.0-1160.119.1.el7.x86_64 #1 SMP Tue Jun 4 14:43:51 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Cluster configuration:

[root@testnode3 ~]# kubectl get nodes
NAME        STATUS   ROLES                       AGE    VERSION
testnode1   Ready    control-plane,etcd,master   201d   v1.31.4+rke2r1
testnode2   Ready    control-plane,etcd,master   201d   v1.31.4+rke2r1
testnode3   Ready    control-plane,etcd,master   201d   v1.31.4+rke2r1
testnode4   Ready    <none>                      200d   v1.31.4+rke2r1
testnode5   Ready    <none>                      200d   v1.31.4+rke2r1
testnode6   Ready    <none>                      200d   v1.31.4+rke2r1

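Problem description:

After the cluster has been running for a little over a month, a large number of netns files accumulate on testnode1, testnode2, and testnode3:
ls /var/run/netns | wc -l reports tens of thousands of entries on each of them.
On testnode4, testnode5, and testnode6 the count is only a dozen or so.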
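To track how quickly the entries accumulate, the manual ls check above could be scripted and run periodically on each node. A minimal Python sketch, assuming the same /var/run/netns directory; the function names count_netns and looks_leaked and the 1000-entry threshold are hypothetical choices for illustration, not part of RKE2 or containerd:

```python
import os

# Directory where the netns files in this report accumulate
NETNS_DIR = "/var/run/netns"


def count_netns(path: str = NETNS_DIR) -> int:
    """Count entries in path, like `ls path | wc -l` (hidden names skipped, as ls does)."""
    try:
        return sum(1 for name in os.listdir(path) if not name.startswith("."))
    except FileNotFoundError:
        # Directory absent: no named network namespaces on this node
        return 0


def looks_leaked(count: int, threshold: int = 1000) -> bool:
    """Hypothetical heuristic: healthy nodes here hold only a dozen or so entries."""
    return count > threshold
```

Logging count_netns() once a day on testnode1 through testnode6 would show whether the growth on the control-plane nodes is steady or tied to specific events.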

After that, ip commands fail:

[root@testnode1 ~]# ip addr
Cannot bind netlink socket: Argument list too long
# The corresponding event shown in Rancher:
 Warning  FailedCreatePodSandBox  86s (x1582 over 5h44m)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create network namespace for sandbox "fae7442ee1affb3e2caa12c1130a465799bc77eff99e6adb13fa32052fcf10ad": failed to setup netns: failed to create namespace: cannot allocate memory

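Deploying new services, deleting pods, and similar operations all fail; the affected pods stay stuck in ContainerCreating.
Rebooting all servers at this point restores normal operation.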

Steps to reproduce:

  • Command used to install RKE2:

Expected result:

Actual result:

Logs
attle-fleet-local-system         fleet-agent-0                                           2/2     Running             0                51d
cattle-fleet-system               fleet-cleanup-gitrepo-jobs-29537280-6h5v7               0/1     ContainerCreating   0                6d5h
cattle-fleet-system               fleet-controller-86475cfcb5-vzfzj                       3/3     Running             133 (51d ago)    208d
cattle-fleet-system               gitjob-84689865c8-7gc5f                                 1/1     Running             34 (51d ago)     156d
cattle-provisioning-capi-system   capi-controller-manager-7b644f4667-b7lcs                1/1     Running             94 (2d17h ago)   156d
cattle-system                     rancher-7bbb49f74f-lbr7r                                1/1     Running             35 (51d ago)     208d
cattle-system                     rancher-webhook-865df66cd6-xbhng                        1/1     Running             3 (51d ago)      208d
cattle-system                     system-upgrade-controller-8fbf9cf-bb4wj                 1/1     Running             4 (51d ago)      208d
cert-manager                      cert-manager-56d4c7dfb7-m7srk                           1/1     Running             19 (51d ago)     156d
cert-manager                      cert-manager-cainjector-6dc54dcd78-8ln78                1/1     Running             25 (51d ago)     208d
cert-manager                      cert-manager-webhook-5d74598b49-lq8j7                   1/1     Running             2 (51d ago)      156d
fleet-default                     rke2-machineconfig-cleanup-cronjob-29532965-c6874       0/1     Completed           0                9d
fleet-default                     rke2-machineconfig-cleanup-cronjob-29534405-rp24d       0/1     Completed           0                8d
fleet-default                     rke2-machineconfig-cleanup-cronjob-29535845-tcc4r       0/1     Completed           0                7d5h
fleet-default                     rke2-machineconfig-cleanup-cronjob-29537285-t92gp       0/1     ContainerCreating   0                6d5h
fleet-default                     rke2-machineconfig-cleanup-cronjob-29538725-z67s5       0/1     ContainerCreating   0                5d5h
fleet-default                     rke2-machineconfig-cleanup-cronjob-29540165-27krx       0/1     ContainerCreating   0                4d5h
fleet-default                     rke2-machineconfig-cleanup-cronjob-29541605-w672p       0/1     ContainerCreating   0                3d5h
fleet-default                     rke2-machineconfig-cleanup-cronjob-29543045-ng7gn       0/1     ContainerCreating   0                2d5h
fleet-default                     rke2-machineconfig-cleanup-cronjob-29544485-w67fs       0/1     ContainerCreating   0                29h
fleet-default                     rke2-machineconfig-cleanup-cronjob-29545925-r5rxr       0/1     ContainerCreating   0                5h47m
k