Error when deploying k3s on CentOS 8

Environment:
K3s version:
v1.24.6+k3s1

Node CPU architecture, OS, and version:
Linux g610-03 4.18.0-348.7.1.el8_5.x86_64 #1 SMP Wed Dec 22 13:25:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Cluster configuration:
3 servers

Description:
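Installation failed on the third node. The log shows that k3s tried to open /sys/fs/cgroup/kubepods.slice/cpu.weight and got "no such file or directory". Checking with ls confirms that the file really does not exist.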

Steps to reproduce:

Expected result:

Actual result:

Additional context / logs:
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.338463 43137 server.go:1177] "Started kubelet"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.338668 43137 server.go:150] "Starting to listen" address="0.0.0.0" port=10250
Nov 09 15:07:19 g610-03 k3s[43137]: E1109 15:07:19.338677 43137 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache" mountpoint="/v>
Nov 09 15:07:19 g610-03 k3s[43137]: E1109 15:07:19.338698 43137 kubelet.go:1298] "Image garbage collection failed once. Stats initialization may not have completed yet" err="invalid capacity 0 on image file>
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.339029 43137 fs_resource_analyzer.go:67] "Starting FS ResourceAnalyzer"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.339084 43137 volume_manager.go:289] "Starting Kubelet Volume Manager"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.339123 43137 desired_state_of_world_populator.go:145] "Desired state populator starts to run"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.340210 43137 server.go:410] "Adding debug handlers to kubelet server"
Nov 09 15:07:19 g610-03 k3s[43137]: E1109 15:07:19.344348 43137 nodelease.go:49] "Failed to get node when trying to set owner ref to the node lease" err="nodes \"g610-03\" not found" node="g610-03"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.363865 43137 kubelet_network_linux.go:76] "Initialized protocol iptables rules." protocol=IPv4
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.373162 43137 cpu_manager.go:213] "Starting CPU manager" policy="static"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.373175 43137 cpu_manager.go:214] "Reconciling" reconcilePeriod="10s"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.373186 43137 state_mem.go:36] "Initialized new in-memory state store"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.373325 43137 state_mem.go:88] "Updated default CPUSet" cpuSet="0-63"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.373336 43137 state_mem.go:96] "Updated CPUSet assignments" assignments=map
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.373708 43137 memory_manager.go:168] "Starting memorymanager" policy="None"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.373728 43137 state_mem.go:35] "Initializing new in-memory state store"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.373857 43137 state_mem.go:75] "Updated machine memory state"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.380422 43137 kubelet_network_linux.go:76] "Initialized protocol iptables rules." protocol=IPv6
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.380436 43137 status_manager.go:161] "Starting to sync pod status with apiserver"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.380456 43137 kubelet.go:1986] "Starting kubelet main sync loop"
Nov 09 15:07:19 g610-03 k3s[43137]: E1109 15:07:19.380481 43137 kubelet.go:2010] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has>
Nov 09 15:07:19 g610-03 k3s[43137]: E1109 15:07:19.406023 43137 cgroup_manager_linux.go:473] cgroup manager.Set failed: open /sys/fs/cgroup/kubepods.slice/cpu.weight: no such file or directory
Nov 09 15:07:19 g610-03 k3s[43137]: E1109 15:07:19.406079 43137 kubelet.go:1378] "Failed to start ContainerManager" err="failed to initialize top level QOS containers: root container [kubepods] doesn't exis>
Nov 09 15:07:19 g610-03 systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
Nov 09 15:07:19 g610-03 systemd[1]: k3s.service: Failed with result 'exit-code'.

[root@g610-03 ~]# ls -l /sys/fs/cgroup/kubepods.slice/cpu.*
-rw-r--r-- 1 root root 0 Nov 9 11:33 /sys/fs/cgroup/kubepods.slice/cpu.pressure
-r--r--r-- 1 root root 0 Nov 9 11:33 /sys/fs/cgroup/kubepods.slice/cpu.stat
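For context on why cpu.weight is missing: in cgroup v2 the cpu.* control files such as cpu.weight only appear in a child cgroup when its parent lists cpu in cgroup.subtree_control, whereas cpu.stat and cpu.pressure exist regardless. The listing above therefore suggests the cpu controller was never enabled for kubepods.slice. Below is a minimal Python sketch, assuming a v2 hierarchy at /sys/fs/cgroup with kubepods.slice directly under the root (as in the log), that reports where cpu drops out; the function names are hypothetical helpers, not part of k3s.

```python
# Sketch: check whether the cgroup v2 "cpu" controller reaches kubepods.slice.
# Assumes a unified (v2) hierarchy mounted at /sys/fs/cgroup; the helper
# names are hypothetical and not part of k3s or the kubelet.
from pathlib import Path
from typing import Dict, Set


def parse_controllers(text: str) -> Set[str]:
    """Split a cgroup.controllers / cgroup.subtree_control line into names."""
    return set(text.split())


def read_controllers(path: Path) -> Set[str]:
    """Read a controller list file; treat a missing file as an empty list."""
    try:
        return parse_controllers(path.read_text())
    except FileNotFoundError:
        return set()


def cpu_controller_report(root: Path, child: str = "kubepods.slice") -> Dict[str, bool]:
    child_dir = root / child
    return {
        # Controllers the kernel makes available at the hierarchy root.
        "cpu_in_root_controllers": "cpu" in read_controllers(root / "cgroup.controllers"),
        # Controllers the root enables for its children; cpu.weight only
        # shows up in a child when "cpu" is listed here.
        "cpu_in_root_subtree_control": "cpu" in read_controllers(root / "cgroup.subtree_control"),
        # Controllers actually available inside the child cgroup.
        "cpu_in_child_controllers": "cpu" in read_controllers(child_dir / "cgroup.controllers"),
        "child_has_cpu_weight": (child_dir / "cpu.weight").exists(),
    }


if __name__ == "__main__":
    for name, ok in cpu_controller_report(Path("/sys/fs/cgroup")).items():
        print(f"{name}: {ok}")
```

Running it on a working node and on g610-03 and comparing the output would show whether cpu is missing from the root's subtree_control only on the failing node. That is what I would expect here, but it has not been confirmed in this thread.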

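Most likely the OS you chose uses cgroup v2 by default, and your kubelet parameters have not been adjusted to work with it.
You can try switching the OS back to cgroup v1; see: Runtime metrics | Docker Documentation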

Switching to cgroup v1 did fix it. What's strange is that the first two nodes also use cgroup v2, yet k3s runs fine on them.
[user ~]$ mount|grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
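Since the working nodes also report cgroup2 at /sys/fs/cgroup, the hierarchy mode alone does not explain the difference, but it is still worth confirming on all three nodes. A small Python sketch that classifies the kernel mount table (/proc/self/mounts, the same data that mount prints) using systemd's unified / hybrid / legacy terminology; cgroup_mode is a hypothetical helper name.

```python
# Sketch: classify the cgroup layout from the kernel's mount table.
# Labels follow systemd's terminology (unified / hybrid / legacy); the
# function itself is a hypothetical helper, not a k3s API.
CGROUP_ROOT = "/sys/fs/cgroup"


def cgroup_mode(mounts_text: str) -> str:
    """Return "unified", "hybrid", "legacy" or "unknown" for /proc/self/mounts text."""
    fstypes = {}
    for line in mounts_text.splitlines():
        fields = line.split()  # device, mountpoint, fstype, options, dump, pass
        if len(fields) < 3:
            continue
        mountpoint, fstype = fields[1], fields[2]
        if mountpoint == CGROUP_ROOT or mountpoint.startswith(CGROUP_ROOT + "/"):
            fstypes[mountpoint] = fstype
    if fstypes.get(CGROUP_ROOT) == "cgroup2":
        return "unified"  # pure cgroup v2, as in the mount output above
    if "cgroup" in fstypes.values():
        # v1 controllers are mounted; a cgroup2 mount beside them is systemd's hybrid mode.
        return "hybrid" if "cgroup2" in fstypes.values() else "legacy"
    return "unknown"


if __name__ == "__main__":
    try:
        with open("/proc/self/mounts") as f:
            print(cgroup_mode(f.read()))
    except OSError as exc:
        print(f"cannot read mount table: {exc}")
```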

cgroup v2 is in fact supported; your nodes probably differ in some subtle configuration detail that hasn't been identified yet.

I ran into exactly this problem recently. Uninstalling rtkit fixed it: dnf remove rtkit
( rtkit.x86_64 : Realtime Policy and Watchdog Daemon )
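A possible explanation for the rtkit fix, which this thread does not confirm: the Linux kernel's cgroup v2 documentation states that, on kernels built with CONFIG_RT_GROUP_SCHED, the cpu controller can only be enabled when all realtime (RT) processes are in the root cgroup, and rtkit exists to hand out realtime scheduling. If an RT thread sits in a non-root cgroup, cpu cannot be enabled for the subtree, which would leave cpu.weight absent. The Python sketch below, assuming a pure cgroup v2 host with procfs at /proc, lists such threads; all names in it are hypothetical.

```python
# Sketch: list realtime (SCHED_FIFO / SCHED_RR) threads that live outside the
# root cgroup on a cgroup v2 host. Assumption: per the kernel's cgroup v2 docs,
# such threads block enabling the cpu controller when CONFIG_RT_GROUP_SCHED is set.
# Helper names are hypothetical; run as root to see every thread.
from pathlib import Path
from typing import List, Optional, Tuple

SCHED_FIFO = 1  # policy values as reported in /proc/<pid>/stat
SCHED_RR = 2


def sched_policy_from_stat(stat_text: str) -> int:
    # Field 2 (comm) is parenthesised and may contain spaces, so split only
    # after the last ')'. The remaining fields start at field 3 (state), and
    # proc(5) documents policy as field 41, i.e. index 41 - 3 = 38.
    rest = stat_text[stat_text.rfind(")") + 2:].split()
    return int(rest[38])


def v2_cgroup_path(cgroup_text: str) -> Optional[str]:
    # On cgroup v2 the membership line has the form "0::<path>".
    for line in cgroup_text.splitlines():
        if line.startswith("0::"):
            return line[3:]
    return None


def realtime_outside_root(proc: Path = Path("/proc")) -> List[Tuple[int, int, str, str]]:
    """Return (pid, tid, comm, cgroup path) for RT threads not in "/"."""
    found = []
    try:
        pids = [p for p in proc.iterdir() if p.name.isdigit()]
    except OSError:
        return found  # no procfs available (e.g. not Linux)
    for pid_dir in pids:
        try:
            tasks = list((pid_dir / "task").iterdir())
        except OSError:
            continue  # process exited while scanning
        for task in tasks:
            try:
                stat = (task / "stat").read_text()
                cgroup = (task / "cgroup").read_text()
            except OSError:
                continue  # thread exited or file not readable
            if sched_policy_from_stat(stat) not in (SCHED_FIFO, SCHED_RR):
                continue
            path = v2_cgroup_path(cgroup)
            if path is not None and path != "/":
                comm = stat[stat.find("(") + 1:stat.rfind(")")]
                found.append((int(pid_dir.name), int(task.name), comm, path))
    return found


if __name__ == "__main__":
    for pid, tid, comm, path in realtime_outside_root():
        print(f"pid={pid} tid={tid} comm={comm} cgroup={path}")
```

Stopping or removing whatever owns the listed threads (rtkit-daemon in the report above) and then restarting k3s would be one way to test this theory; an empty list does not rule out other causes.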

References