Error when deploying k3s on CentOS 8

bruce · Nov 9, 2022 03:29

Environment:
K3s version:
v1.24.6+k3s1

Node CPU architecture, OS, and version:
Linux g610-03 4.18.0-348.7.1.el8_5.x86_64 #1 SMP Wed Dec 22 13:25:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Cluster configuration:
3 servers

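Describe the bug:
Installation fails on the third node. The log says it cannot open /sys/fs/cgroup/kubepods.slice/cpu.weight because the file does not exist, and ls confirms that the file is indeed missing.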

Steps to reproduce:

Command used to install K3s:
curl -sLS https://rancher-mirror.oss-cn-beijing.aliyuncs.com/k3s/k3s-install.sh | \
  INSTALL_K3S_MIRROR=cn \
  K3S_URL='https://172.27.100.181:6443' \
  K3S_TOKEN='bdcaeb4471ead2a445a6a1fc531ec401' \
  INSTALL_K3S_EXEC='server --server https://172.27.100.181:6443 --tls-san 172.27.100.181 --tls-san 172.27.100.182 --tls-san 172.27.100.183 --node-external-ip 172.27.100.183 --kubelet-arg cpu-manager-policy=static --kubelet-arg kube-reserved=cpu=2 --kubelet-arg system-reserved=memory=4Gi --cluster-cidr 10.42.0.0/16' \
  INSTALL_K3S_VERSION='v1.24.6+k3s1' \
  sh -

Expected behavior:

Actual behavior:

Additional context / logs:
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.338463 43137 server.go:1177] "Started kubelet"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.338668 43137 server.go:150] "Starting to listen" address="0.0.0.0" port=10250
Nov 09 15:07:19 g610-03 k3s[43137]: E1109 15:07:19.338677 43137 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache" mountpoint="/v>
Nov 09 15:07:19 g610-03 k3s[43137]: E1109 15:07:19.338698 43137 kubelet.go:1298] "Image garbage collection failed once. Stats initialization may not have completed yet" err="invalid capacity 0 on image file>
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.339029 43137 fs_resource_analyzer.go:67] "Starting FS ResourceAnalyzer"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.339084 43137 volume_manager.go:289] "Starting Kubelet Volume Manager"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.339123 43137 desired_state_of_world_populator.go:145] "Desired state populator starts to run"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.340210 43137 server.go:410] "Adding debug handlers to kubelet server"
Nov 09 15:07:19 g610-03 k3s[43137]: E1109 15:07:19.344348 43137 nodelease.go:49] "Failed to get node when trying to set owner ref to the node lease" err="nodes \"g610-03\" not found" node="g610-03"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.363865 43137 kubelet_network_linux.go:76] "Initialized protocol iptables rules." protocol=IPv4
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.373162 43137 cpu_manager.go:213] "Starting CPU manager" policy="static"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.373175 43137 cpu_manager.go:214] "Reconciling" reconcilePeriod="10s"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.373186 43137 state_mem.go:36] "Initialized new in-memory state store"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.373325 43137 state_mem.go:88] "Updated default CPUSet" cpuSet="0-63"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.373336 43137 state_mem.go:96] "Updated CPUSet assignments" assignments=map
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.373708 43137 memory_manager.go:168] "Starting memorymanager" policy="None"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.373728 43137 state_mem.go:35] "Initializing new in-memory state store"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.373857 43137 state_mem.go:75] "Updated machine memory state"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.380422 43137 kubelet_network_linux.go:76] "Initialized protocol iptables rules." protocol=IPv6
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.380436 43137 status_manager.go:161] "Starting to sync pod status with apiserver"
Nov 09 15:07:19 g610-03 k3s[43137]: I1109 15:07:19.380456 43137 kubelet.go:1986] "Starting kubelet main sync loop"
Nov 09 15:07:19 g610-03 k3s[43137]: E1109 15:07:19.380481 43137 kubelet.go:2010] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has>
Nov 09 15:07:19 g610-03 k3s[43137]: E1109 15:07:19.406023 43137 cgroup_manager_linux.go:473] cgroup manager.Set failed: open /sys/fs/cgroup/kubepods.slice/cpu.weight: no such file or directory
Nov 09 15:07:19 g610-03 k3s[43137]: E1109 15:07:19.406079 43137 kubelet.go:1378] "Failed to start ContainerManager" err="failed to initialize top level QOS containers: root container [kubepods] doesn't exis>
Nov 09 15:07:19 g610-03 systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
Nov 09 15:07:19 g610-03 systemd[1]: k3s.service: Failed with result 'exit-code'.

[root@g610-03 ~]# ls -l /sys/fs/cgroup/kubepods.slice/cpu.*
-rw-r--r-- 1 root root 0 Nov 9 11:33 /sys/fs/cgroup/kubepods.slice/cpu.pressure
-r--r--r-- 1 root root 0 Nov 9 11:33 /sys/fs/cgroup/kubepods.slice/cpu.stat
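In cgroup v2, a child cgroup only gets a controller's interface files (such as cpu.weight) when that controller is listed in its parent's cgroup.subtree_control; files like cpu.stat and cpu.pressure exist regardless. The ls output above, with only cpu.pressure and cpu.stat, therefore suggests the cpu controller was not enabled for the children of /sys/fs/cgroup. A minimal sketch, assuming a cgroup v2 host; check_cpu_delegation is a hypothetical helper, not part of k3s:

```shell
# check_cpu_delegation (hypothetical helper): report whether the cpu
# controller is delegated to the children of ROOT and whether
# ROOT/SLICE/cpu.weight exists. Defaults match the failing node's paths.
check_cpu_delegation() {
    root=${1:-/sys/fs/cgroup}
    slice=${2:-kubepods.slice}
    # cgroup.subtree_control lists controllers enabled for children,
    # e.g. "cpuset cpu io memory pids"; -w keeps "cpuset" from matching "cpu".
    if grep -qw cpu "$root/cgroup.subtree_control" 2>/dev/null; then
        echo "cpu enabled in $root/cgroup.subtree_control"
    else
        echo "cpu NOT enabled in $root/cgroup.subtree_control"
    fi
    # cpu.weight only appears in a child once cpu is enabled in its parent.
    if [ -e "$root/$slice/cpu.weight" ]; then
        echo "$slice/cpu.weight present"
    else
        echo "$slice/cpu.weight missing"
    fi
}
```

On the failing node, `check_cpu_delegation /sys/fs/cgroup kubepods.slice` would be expected to report cpu as not enabled, consistent with the missing file.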

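niusmallnan · Nov 10, 2022 01:55

Most likely the OS you chose uses cgroup v2 by default, and your kubelet arguments are not set up to be compatible with it.
You could try rolling the OS back to cgroup v1; see: Runtime metrics | Docker Documentation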
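CentOS 8 / EL8 boots with cgroup v1 by default, so a node that mounts cgroup2 was most likely booted with systemd.unified_cgroup_hierarchy=1 on its kernel command line. A minimal sketch, assuming a grubby-managed EL8 host: the hypothetical helper unified_hierarchy_param reports what a given command line sets, and the comments show the grubby invocation commonly used to switch hierarchies (a reboot is needed either way).

```shell
# unified_hierarchy_param (hypothetical helper): print the value of
# systemd.unified_cgroup_hierarchy in a kernel command line string:
# the value after "=", "present" for a bare flag, or "unset" if absent.
# If the parameter appears more than once, the last occurrence is printed.
unified_hierarchy_param() (
    set -f  # no globbing while word-splitting the command line (subshell)
    val=unset
    for arg in $1; do
        case "$arg" in
            systemd.unified_cgroup_hierarchy=*) val=${arg#*=} ;;
            systemd.unified_cgroup_hierarchy) val=present ;;
        esac
    done
    echo "$val"
)

# Usage on a node:
#   unified_hierarchy_param "$(cat /proc/cmdline)"
#
# Fall back to cgroup v1, then reboot:
#   grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
# Return to cgroup v2 later, then reboot:
#   grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"
```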

bruce · Nov 10, 2022 02:49

Switching to cgroup v1 did fix it. The strange part is that the first two nodes are also on cgroup v2, yet k3s runs fine on them:
[user ~]$ mount|grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
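To confirm the hierarchy on every node in the same way, a small sketch with a hypothetical helper, cgroup_mode, classifies the setup from /proc/mounts. It reads /proc/mounts rather than mount output because the field layout differs ("cgroup2 on /sys/fs/cgroup type cgroup2" versus "cgroup2 /sys/fs/cgroup cgroup2 ...").

```shell
# cgroup_mode (hypothetical helper): read /proc/mounts-style lines on stdin
# and print v2 (unified), hybrid, v1 (legacy) or unknown.
cgroup_mode() {
    awk '
        $3 == "cgroup2" && $2 == "/sys/fs/cgroup" { unified = 1 }
        $3 == "cgroup2" && $2 != "/sys/fs/cgroup" { v2_elsewhere = 1 }
        $3 == "cgroup" { legacy = 1 }
        END {
            if (unified) print "v2"
            else if (legacy && v2_elsewhere) print "hybrid"
            else if (legacy) print "v1"
            else print "unknown"
        }'
}

# Usage on each node:
#   cgroup_mode < /proc/mounts
```

Since all three nodes are reported to use cgroup v2, each would be expected to print v2, which is why the difference has to be looked for elsewhere.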

niusmallnan · Nov 11, 2022 05:06

cgroup v2 is actually supported as well; there may just be a subtle difference in your nodes' configuration that hasn't been found yet.
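One hedged way to look for that subtle difference is to compare what the root cgroup delegates on a working node and on the failing node, for example by reading /sys/fs/cgroup/cgroup.subtree_control on each. The hypothetical helper missing_controllers prints every controller in the first list that the second list lacks; if the failing node simply lacks cpu there, that would match the missing cpu.weight.

```shell
# missing_controllers (hypothetical helper): print each controller named in
# $1 that does not appear in $2. Both arguments are space-separated lists,
# such as the contents of cgroup.subtree_control.
missing_controllers() {
    for c in $1; do
        # Pad with spaces so "cpu" does not match inside "cpuset".
        case " $2 " in
            *" $c "*) ;;
            *) echo "$c" ;;
        esac
    done
}

# Usage (node-01 is a hypothetical name for a working node):
#   good=$(ssh node-01 cat /sys/fs/cgroup/cgroup.subtree_control)
#   bad=$(ssh g610-03 cat /sys/fs/cgroup/cgroup.subtree_control)
#   missing_controllers "$good" "$bad"
```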

panda · Jun 5, 2024 02:13

I recently ran into this problem as well; uninstalling rtkit fixed it: dnf remove rtkit
( rtkit.x86_64 : Realtime Policy and Watchdog Daemon )
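A plausible explanation for why removing rtkit helps comes from the kernel's cgroup v2 documentation (Documentation/admin-guide/cgroup-v2.rst): cgroup v2 does not yet control realtime processes, so the cpu controller can only be enabled when all realtime processes are in the root cgroup (newer revisions scope this to kernels built with CONFIG_RT_GROUP_SCHED). rtkit-daemon typically runs some of its threads with realtime scheduling inside its own service cgroup, which would keep cpu from being enabled for kubepods.slice. A sketch, assuming procps ps and a cgroup v2 host; both function names are hypothetical:

```shell
# in_root_cgroup (hypothetical helper): succeed if the given
# /proc/<pid>/cgroup file places the task in the cgroup v2 root ("0::/").
in_root_cgroup() {
    grep -qx '0::/' "$1"
}

# list_rt_outside_root (hypothetical helper): print realtime threads
# (SCHED_FIFO = FF, SCHED_RR = RR in ps's cls column) outside the root cgroup.
# Only meaningful on a pure cgroup v2 host.
list_rt_outside_root() {
    # -L lists threads; empty "=" headers suppress the header line.
    ps -eLo tid=,cls=,comm= | while read -r tid cls comm; do
        case "$cls" in
            FF|RR) ;;
            *) continue ;;
        esac
        f=/proc/$tid/cgroup
        [ -r "$f" ] || continue
        in_root_cgroup "$f" || echo "$tid $cls $comm $(cat "$f")"
    done
}
```

Kernel threads with realtime priority normally sit in the root cgroup and are skipped; a line naming rtkit-daemon would support the explanation above.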

Reference:

github.com/canonical/microk8s: "microk8s v1.26 daemon-kubelite crashloop with cgroup error" (opened by jpalpant on 02 Apr 2023 06:04 UTC, closed 10 Apr 2023 17:26 UTC)

#### Summary
I am running microk8s v1.26 inside of WSL2 (Ubuntu 22.04) via the snap. After some unknown period of time and system updates, I noticed that microk8s wasn't behaving correctly. microk8s inspect said services were starting, but I eventually noticed that daemon-kubelite was crashlooping with this message:

```
Apr 01 22:49:00 windows-node-01 microk8s.daemon-kubelite[12820]: E0401 22:49:00.771376 12820 cgroup_manager_linux.go:472] cgroup manager.Set failed: openat2 /sys/fs/cgroup/kubepods/cpu.weight: no such file or directory
Apr 01 22:49:00 windows-node-01 microk8s.daemon-kubelite[12820]: E0401 22:49:00.771456 12820 kubelet.go:1466] "Failed to start ContainerManager" err="failed to initialize top level QOS containers: root container [kubepods] doesn't exist"
Apr 01 22:49:01 windows-node-01 systemd[1]: snap.microk8s.daemon-kubelite.service: Main process exited, code=exited, status=1/FAILURE
Apr 01 22:49:01 windows-node-01 systemd[1]: snap.microk8s.daemon-kubelite.service: Failed with result 'exit-code'.
Apr 01 22:49:01 windows-node-01 systemd[1]: snap.microk8s.daemon-kubelite.service: Consumed 2.087s CPU time.
Apr 01 22:49:01 windows-node-01 systemd[1]: snap.microk8s.daemon-kubelite.service: Scheduled restart job, restart counter is at 31.
```

Any advice on what I could look into to track this down? I expect my machine is misconfigured, but it could be something others run into. I don't have any experience with cgroups but am happy to pull more information or logs if it's helpful.

```
$ uname -a
Linux windows-node-01 5.15.90.1-microsoft-standard-WSL2 #1 SMP Fri Jan 27 02:56:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.2 LTS
Release: 22.04
Codename: jammy
```

#### What Should Happen Instead?
daemon-kubelite should start and run normally.

#### Reproduction Steps
Unfortunately no, I'm not sure how I got into this situation except possibly standard Windows updates.

#### Introspection Report
[inspection-report-20230401_225828.tar.gz](https://github.com/canonical/microk8s/files/11131231/inspection-report-20230401_225828.tar.gz)

#### Can you suggest a fix?

#### Are you interested in contributing with a fix?