Node creation fails on a new RKE2-based k8s cluster created from Rancher

Environment information:
RKE2 version:

Kubernetes version
v1.26.7+rke2r1
Cloud provider

Default - RKE2 Embedded

Node CPU architecture, operating system, and version:

Linux node37 5.4.132-1.el7.elrepo.x86_64 #1 SMP Wed Jul 14 07:42:43 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
Cluster configuration:

Problem description:

After creating a new RKE2-based k8s cluster from Rancher, creating nodes in the new cluster fails:

I created an RKE2 cluster with Rancher and ran the rancher-system-agent install on the new RKE2 node. rancher-system-agent reports errors on startup, and the new [rke2-k8s] cluster shows: Rkecontrolplane was already initialized but no etcd machines exist that have plans, indicating the etcd plane has been entirely replaced. Restoration from etcd snapshot is required.

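Registering new nodes with rke2-k8s: adding a single node failed, and later adding up to 3 new nodes (with the etcd, Control Plane, and Worker roles) did not succeed either.

Registration command:

Steps to reproduce:

  • Command used to install RKE2: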

curl --insecure -fL https://rancher.sslip.io/system-agent-install.sh | sudo sh -s - --server https://rancher.sslip.io --label 'cattle.io/os=linux' --token hmdwcg7xsp4zgd92nsn7rr7brp8czjgcsjmkmcpqxsrvh4lw7jshdn --ca-checksum 6d64872a7ba85277adacc6327dd8b331837a6d5b520bc272a0ef7b8b49426bf2 --etcd --controlplane --worker

Expected result:

Actual result:

Logs

Aug 25 11:09:19 gvm systemd: rancher-system-agent.service holdoff time over, scheduling restart.
Aug 25 11:09:19 gvm kubelet: I0825 11:09:19.053065 1629 factory.go:120] Factory “docker” was unable to handle container “/system.slice/rancher-system-agent.service”
Aug 25 11:09:19 gvm kubelet: I0825 11:09:19.053088 1629 factory.go:109] Error trying to work out if we can handle /system.slice/rancher-system-agent.service: /system.slice/rancher-system-agent.service not handled by systemd handler
Aug 25 11:09:19 gvm kubelet: I0825 11:09:19.053094 1629 factory.go:120] Factory “systemd” was unable to handle container “/system.slice/rancher-system-agent.service”
Aug 25 11:09:19 gvm kubelet: I0825 11:09:19.053106 1629 factory.go:116] Using factory “raw” for container “/system.slice/rancher-system-agent.service”
Aug 25 11:09:19 gvm kubelet: I0825 11:09:19.053290 1629 manager.go:1011] Added container: “/system.slice/rancher-system-agent.service” (aliases: , namespace: “”)
Aug 25 11:09:19 gvm kubelet: I0825 11:09:19.053425 1629 handler.go:325] Added event &{/system.slice/rancher-system-agent.service 2023-08-25 11:09:19.050026768 +0800 CST containerCreation {}}
Aug 25 11:09:19 gvm kubelet: I0825 11:09:19.053466 1629 container.go:464] Start housekeeping for container “/system.slice/rancher-system-agent.service”
Aug 25 11:09:19 gvm rancher-system-agent: time=“2023-08-25T11:09:19+08:00” level=info msg=“Rancher System Agent version v0.3.3 (9e827a5) is starting”
Aug 25 11:09:19 gvm rancher-system-agent: time=“2023-08-25T11:09:19+08:00” level=info msg=“Using directory /var/lib/rancher/agent/work for work”
Aug 25 11:09:19 gvm rancher-system-agent: time=“2023-08-25T11:09:19+08:00” level=info msg=“Starting remote watch of plans”
Aug 25 11:09:19 gvm rancher-system-agent: time=“2023-08-25T11:09:19+08:00” level=fatal msg=“error while connecting to Kubernetes cluster: the server has asked for the client to provide credentials”
Aug 25 11:09:19 gvm systemd: rancher-system-agent.service: main process exited, code=exited, status=1/FAILURE
Aug 25 11:09:19 gvm kubelet: I0825 11:09:19.086654 1629 manager.go:1068] Destroyed container: “/system.slice/rancher-system-agent.service” (aliases: , namespace: “”)
Aug 25 11:09:19 gvm kubelet: I0825 11:09:19.086687 1629 handler.go:325] Added event &{/system.slice/rancher-system-agent.service 2023-08-25 11:09:19.086678888 +0800 CST m=+78248.990757545 containerDeletion {}}
Aug 25 11:09:19 gvm systemd: Unit rancher-system-agent.service entered failed state.
Aug 25 11:09:19 gvm systemd: rancher-system-agent.service failed.
Aug 25 11:09:24 gvm systemd: rancher-system-agent.service holdoff time over, scheduling restart.
Aug 25 11:09:24 gvm kubelet: I0825 11:09:24.302375 1629 factory.go:120] Factory “docker” was unable to handle container “/system.slice/rancher-system-agent.service”
Aug 25 11:09:24 gvm kubelet: I0825 11:09:24.302422 1629 factory.go:109] Error trying to work out if we can handle /system.slice/rancher-system-agent.service: /system.slice/rancher-system-agent.service not handled by systemd handler
Aug 25 11:09:24 gvm kubelet: I0825 11:09:24.302430 1629 factory.go:120] Factory “systemd” was unable to handle container “/system.slice/rancher-system-agent.service”
Aug 25 11:09:24 gvm kubelet: I0825 11:09:24.302437 1629 factory.go:116] Using factory “raw” for container “/system.slice/rancher-system-agent.service”
Aug 25 11:09:24 gvm kubelet: I0825 11:09:24.302589 1629 manager.go:1011] Added container: “/system.slice/rancher-system-agent.service” (aliases: , namespace: “”)
Aug 25 11:09:24 gvm kubelet: I0825 11:09:24.302683 1629 handler.go:325] Added event &{/system.slice/rancher-system-agent.service 2023-08-25 11:09:24.300910467 +0800 CST containerCreation {}}
Aug 25 11:09:24 gvm kubelet: I0825 11:09:24.302709 1629 container.go:464] Start housekeeping for container “/system.slice/rancher-system-agent.service”
Aug 25 11:09:24 gvm rancher-system-agent: time=“2023-08-25T11:09:24+08:00” level=info msg=“Rancher System Agent version v0.3.3 (9e827a5) is starting”
Aug 25 11:09:24 gvm rancher-system-agent: time=“2023-08-25T11:09:24+08:00” level=info msg=“Using directory /var/lib/rancher/agent/work for work”
Aug 25 11:09:24 gvm rancher-system-agent: time=“2023-08-25T11:09:24+08:00” level=info msg=“Starting remote watch of plans”
Aug 25 11:09:24 gvm rancher-system-agent: time=“2023-08-25T11:09:24+08:00” level=fatal msg=“error while connecting to Kubernetes cluster: the server has asked for the client to provide credentials”
Aug 25 11:09:24 gvm systemd: rancher-system-agent.service: main process exited, code=exited, status=1/FAILURE
Aug 25 11:09:24 gvm kubelet: I0825 11:09:24.336306 1629 manager.go:1068] Destroyed container: “/system.slice/rancher-system-agent.service” (aliases: , namespace: “”)
Aug 25 11:09:24 gvm kubelet: I0825 11:09:24.336332 1629 handler.go:325] Added event &{/system.slice/rancher-system-agent.service 2023-08-25 11:09:24.336325158 +0800 CST m=+78254.240403812 containerDeletion {}}
Aug 25 11:09:24 gvm systemd: Unit rancher-system-agent.service entered failed state.
Aug 25 11:09:24 gvm systemd: rancher-system-agent.service failed.

Rancher Server setup:
Rancher version: 2.7.5
Installation option: Helm Chart
RKE2 version: v1.26.5+rke2r1

Could someone more experienced please take a look at this issue? :pray:

You can check the rancher-system-agent logs, then follow the link below to go through the other RKE2 logs and find out where things are going wrong.

The rancher-system-agent client is failing to start. Building on KSD's reply, you can also check /var/lib/rancher/rke2/agent/logs/kubelet.log on the downstream cluster host to troubleshoot further.
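
For what it's worth, here is a rough sketch of how the two log checks suggested above might look on the node. The helper name fatal_lines is made up for illustration, and I am assuming the agent logs to the systemd journal under the unit name rancher-system-agent (as the log excerpt in the question suggests) and that the kubelet log is at the path mentioned above. The example commands are left as comments because they need root on the downstream node.

```shell
#!/bin/sh
# Hypothetical helper (not part of Rancher or RKE2): reads log text on stdin
# and keeps only the fatal-level lines, which is where rancher-system-agent
# reports why it exited. "|| true" keeps the exit status 0 when nothing matches.
fatal_lines() {
    grep 'level=fatal' || true
}

# Example usage on the downstream node (assumed unit name and log path):
#   journalctl -u rancher-system-agent --no-pager -n 200 | fatal_lines
#   tail -n 100 /var/lib/rancher/rke2/agent/logs/kubelet.log
```

In the log posted above, this kind of filter would leave only the "the server has asked for the client to provide credentials" lines, which is probably the part worth chasing first.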