Cluster nodes show ERROR in the Rancher UI

Rancher Server setup

  • Rancher version: 2.10.2 Community
  • Installation option (Docker install/Helm Chart): Docker install
  • Online or air-gapped deployment: air-gapped (offline)

Downstream cluster information

  • Kubernetes version: k3s v1.31.4+k3s1
  • Cluster Type: Downstream
    • Custom

User information

  • Role of the logged-in user: Administrator

Host operating system:
SLES 15 SP6

Problem description:
In the Rancher UI, the node status is shown as ERROR with the message: Error applying plan – check rancher-system-agent.service logs on node for more information. However, kubectl get node shows all nodes as Ready.

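Steps to reproduce:
The message appeared after the cluster was powered on. Rebooting the system, restarting the rancher-system-agent service, and similar steps did not help.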
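Both journal excerpts in the logs (PID 1630 after the reboot, PID 26309 after the service restart) report the same plan checksum, c96e029e8f8f806e751a35a37259a4f3e44571ce348916f44886be5f7d518991, which suggests that each restart re-evaluates the same already-failed plan instead of starting fresh. A minimal sketch for checking this on a node, assuming a POSIX shell with grep, sed, sort, uniq and awk; plan_failure_summary is a hypothetical helper, not part of Rancher. It groups the "Maximum failure threshold exceeded" lines by checksum and counts them.

```shell
#!/bin/sh
# Hypothetical helper (not part of Rancher): summarize the
# "Maximum failure threshold exceeded" errors from rancher-system-agent.
# Reads unwrapped journal lines on stdin and prints one line per distinct
# plan checksum: "<count> <checksum> failures=<n> threshold=<m>".
plan_failure_summary() {
  grep 'Maximum failure threshold exceeded' |
    # Keep only checksum, failure count and threshold from each message.
    sed -n 's/.*checksum value of \([0-9a-f]*\), (failures: \([0-9]*\), threshold: \([0-9]*\)).*/\1 failures=\2 threshold=\3/p' |
    # Count identical (checksum, failures, threshold) combinations.
    sort | uniq -c |
    # Strip the leading padding that uniq -c adds.
    awk '{print $1, $2, $3, $4}'
}
```

Run it against complete journal output rather than terminal-wrapped text, for example: journalctl -u rancher-system-agent.service --no-pager | plan_failure_summary. A single checksum with a growing count means the agent keeps skipping one and the same plan; a different checksum would mean Rancher has delivered a different plan.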

Result:

Expected result:

Screenshots:


Additional context:

Logs
rke21-3:~ # systemctl status rancher-system-agent
● rancher-system-agent.service - Rancher System Agent
     Loaded: loaded (/etc/systemd/system/rancher-system-agent.service; enabled; preset: disabled)
     Active: active (running) since Thu 2025-06-05 16:05:11 CST; 11s ago
       Docs: https://www.rancher.com
   Main PID: 26309 (rancher-system-)
      Tasks: 9
        CPU: 113ms
     CGroup: /system.slice/rancher-system-agent.service
             └─26309 /opt/rancher-system-agent/bin/rancher-system-agent sentinel

Jun 05 16:05:11 rke21-3 systemd[1]: Started Rancher System Agent.
Jun 05 16:05:11 rke21-3 rancher-system-agent[26309]: time="2025-06-05T16:05:11+08:00" level=info msg="Rancher System Agent version v0.3.11 (b8c28d0) is starting"
Jun 05 16:05:11 rke21-3 rancher-system-agent[26309]: time="2025-06-05T16:05:11+08:00" level=info msg="Using directory /var/lib/rancher/agent/work for work"
Jun 05 16:05:11 rke21-3 rancher-system-agent[26309]: time="2025-06-05T16:05:11+08:00" level=info msg="Starting remote watch of plans"
Jun 05 16:05:11 rke21-3 rancher-system-agent[26309]: time="2025-06-05T16:05:11+08:00" level=info msg="Starting /v1, Kind=Secret controller"
Jun 05 16:05:11 rke21-3 rancher-system-agent[26309]: time="2025-06-05T16:05:11+08:00" level=info msg="Detected first start, force-applying one-time instruction set"
Jun 05 16:05:11 rke21-3 rancher-system-agent[26309]: time="2025-06-05T16:05:11+08:00" level=error msg="[K8s] Maximum failure threshold exceeded for plan with checksum value of c96e029e8f8f806e751a35a37259a4f3e44571ce348916f44886be5f7d518991, (failures: 1, threshold: 1)"
Jun 05 16:05:16 rke21-3 rancher-system-agent[26309]: time="2025-06-05T16:05:16+08:00" level=error msg="[K8s] Maximum failure threshold exceeded for plan with checksum value of c96e029e8f8f806e751a35a37259a4f3e44571ce348916f44886be5f7d518991, (failures: 1, threshold: 1)"
Jun 05 16:05:21 rke21-3 rancher-system-agent[26309]: time="2025-06-05T16:05:21+08:00" level=error msg="[K8s] Maximum failure threshold exceeded for plan with checksum value of c96e029e8f8f806e751a35a37259a4f3e44571ce348916f44886be5f7d518991, (failures: 1, threshold: 1)"

rke21-3:~ # journalctl -xeu rancher-system-agent
Jun 05 16:00:53 rke21-3 systemd[1]: Started Rancher System Agent.
░░ Subject: A start job for unit rancher-system-agent.service has finished successfully
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░ 
░░ A start job for unit rancher-system-agent.service has finished successfully.
░░ 
░░ The job identifier is 262.
Jun 05 16:00:54 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:00:54+08:00" level=info msg="Rancher System Agent version v0.3.11 (b8c28d0) is starting"
Jun 05 16:00:54 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:00:54+08:00" level=info msg="Using directory /var/lib/rancher/agent/work for work"
Jun 05 16:00:54 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:00:54+08:00" level=info msg="Starting remote watch of plans"
Jun 05 16:00:54 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:00:54+08:00" level=info msg="Starting /v1, Kind=Secret controller"
Jun 05 16:00:54 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:00:54+08:00" level=info msg="Detected first start, force-applying one-time instruction set"
Jun 05 16:00:54 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:00:54+08:00" level=error msg="[K8s] Maximum failure threshold exceeded for plan with checksum value of c96e029e8f8f806e751a35a37259a4f3e44571ce348916f44886be5f7d518991, (failures: 1, threshold: 1)"
Jun 05 16:00:54 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:00:54+08:00" level=info msg="[K8s] updated plan secret fleet-default/custom-763c9046db2b-machine-plan with feedback"
Jun 05 16:00:54 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:00:54+08:00" level=error msg="[K8s] Maximum failure threshold exceeded for plan with checksum value of c96e029e8f8f806e751a35a37259a4f3e44571ce348916f44886be5f7d518991, (failures: 1, threshold: 1)"
Jun 05 16:00:54 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:00:54+08:00" level=info msg="[K8s] updated plan secret fleet-default/custom-763c9046db2b-machine-plan with feedback"
Jun 05 16:00:54 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:00:54+08:00" level=error msg="[K8s] Maximum failure threshold exceeded for plan with checksum value of c96e029e8f8f806e751a35a37259a4f3e44571ce348916f44886be5f7d518991, (failures: 1, threshold: 1)"
Jun 05 16:00:59 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:00:59+08:00" level=error msg="[K8s] Maximum failure threshold exceeded for plan with checksum value of c96e029e8f8f806e751a35a37259a4f3e44571ce348916f44886be5f7d518991, (failures: 1, threshold: 1)"
Jun 05 16:01:04 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:01:04+08:00" level=error msg="[K8s] Maximum failure threshold exceeded for plan with checksum value of c96e029e8f8f806e751a35a37259a4f3e44571ce348916f44886be5f7d518991, (failures: 1, threshold: 1)"
Jun 05 16:01:04 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:01:04+08:00" level=info msg="[K8s] updated plan secret fleet-default/custom-763c9046db2b-machine-plan with feedback"
Jun 05 16:01:04 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:01:04+08:00" level=error msg="[K8s] Maximum failure threshold exceeded for plan with checksum value of c96e029e8f8f806e751a35a37259a4f3e44571ce348916f44886be5f7d518991, (failures: 1, threshold: 1)"
Jun 05 16:01:09 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:01:09+08:00" level=error msg="[K8s] Maximum failure threshold exceeded for plan with checksum value of c96e029e8f8f806e751a35a37259a4f3e44571ce348916f44886be5f7d518991, (failures: 1, threshold: 1)"
Jun 05 16:01:09 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:01:09+08:00" level=info msg="[K8s] updated plan secret fleet-default/custom-763c9046db2b-machine-plan with feedback"
Jun 05 16:01:09 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:01:09+08:00" level=error msg="[K8s] Maximum failure threshold exceeded for plan with checksum value of c96e029e8f8f806e751a35a37259a4f3e44571ce348916f44886be5f7d518991, (failures: 1, threshold: 1)"
Jun 05 16:01:14 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:01:14+08:00" level=error msg="[K8s] Maximum failure threshold exceeded for plan with checksum value of c96e029e8f8f806e751a35a37259a4f3e44571ce348916f44886be5f7d518991, (failures: 1, threshold: 1)"
Jun 05 16:01:19 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:01:19+08:00" level=error msg="[K8s] Maximum failure threshold exceeded for plan with checksum value of c96e029e8f8f806e751a35a37259a4f3e44571ce348916f44886be5f7d518991, (failures: 1, threshold: 1)"
Jun 05 16:01:24 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:01:24+08:00" level=error msg="[K8s] Maximum failure threshold exceeded for plan with checksum value of c96e029e8f8f806e751a35a37259a4f3e44571ce348916f44886be5f7d518991, (failures: 1, threshold: 1)"
Jun 05 16:01:29 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:01:29+08:00" level=error msg="[K8s] Maximum failure threshold exceeded for plan with checksum value of c96e029e8f8f806e751a35a37259a4f3e44571ce348916f44886be5f7d518991, (failures: 1, threshold: 1)"
Jun 05 16:01:34 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:01:34+08:00" level=error msg="[K8s] Maximum failure threshold exceeded for plan with checksum value of c96e029e8f8f806e751a35a37259a4f3e44571ce348916f44886be5f7d518991, (failures: 1, threshold: 1)"
Jun 05 16:01:39 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:01:39+08:00" level=error msg="[K8s] Maximum failure threshold exceeded for plan with checksum value of c96e029e8f8f806e751a35a37259a4f3e44571ce348916f44886be5f7d518991, (failures: 1, threshold: 1)"
Jun 05 16:01:44 rke21-3 rancher-system-agent[1630]: time="2025-06-05T16:01:44+08:00" level=error msg="[K8s] Maximum failure threshold exceeded for plan with checksum value of c96e029e8f8f806e751a35a37259a4f3e44571ce348916f44886be5f7d518991, (failures: 1, threshold: 1)"
Jun 05 16:01:44 rke21-3 rancher-system-agent[1630]: time="2025-06-05T1


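I restored the cluster from an etcd snapshot taken 9 days earlier, and the cluster now looks normal in the UI. However, I still don't understand what this error means.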
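The log message spells out the mechanics: "failures: 1, threshold: 1" means the agent has recorded one failed attempt at this plan and tolerates at most one, so with the count already equal to the threshold it reports the threshold as exceeded and, judging by the repeated entries, does not apply the plan again. The lines "updated plan secret fleet-default/custom-763c9046db2b-machine-plan with feedback" show that this feedback is written back to a secret on the Rancher side, which would plausibly explain why rebooting the node or restarting the service did not clear the error. The sketch below is a simplified model of that check, inferred only from the log wording; plan_should_apply is a hypothetical name and this is not the agent's actual code.

```shell
#!/bin/sh
# Simplified model (assumption based on the log text, not Rancher source):
# a plan whose recorded failure count has reached the threshold is skipped.
# $1 = recorded failure count, $2 = failure threshold.
# Prints the decision and returns 0 (apply) or 1 (skip).
plan_should_apply() {
  if [ "$1" -ge "$2" ]; then
    # "failures: 1, threshold: 1" in the logs falls into this branch.
    echo "skip: failures=$1 >= threshold=$2"
    return 1
  fi
  echo "apply: failures=$1 < threshold=$2"
  return 0
}
```

For the state in the logs, plan_should_apply 1 1 prints "skip: failures=1 >= threshold=1". The original cause of the first failure is not in the journal excerpts above; it may be visible in the feedback stored in the plan secret, e.g. via kubectl -n fleet-default get secret custom-763c9046db2b-machine-plan -o yaml on the Rancher local cluster. The names of the fields inside that secret are not shown in this post.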

You can log in to the affected host node and check the error logs:

journalctl -xef -u rancher-system-agent.service
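In that command, -f follows the journal live and -x adds the systemd catalog text (the ░░ lines in the logs above). Because the agent repeats the same error every five seconds, a condensed view can be easier to read. A minimal sketch, assuming a POSIX shell and awk; agent_errors is a hypothetical helper that keeps only level=error entries and prints, for each distinct message, how many times it occurred plus its first and last timestamp.

```shell
#!/bin/sh
# Hypothetical helper (not part of Rancher): condense rancher-system-agent
# journal output to one line per distinct error message:
# "<count> <first time> <last time> <message>".
agent_errors() {
  awk '
    / level=error / {
      # Extract the value of time="..." from the logrus-style line.
      t = $0; sub(/.*time="/, "", t); sub(/".*/, "", t)
      # Extract the value of msg="..." (runs to the final quote).
      m = $0; sub(/.* msg="/, "", m); sub(/"$/, "", m)
      # Remember the order in which messages first appear.
      if (!(m in first)) { first[m] = t; order[++n] = m }
      last[m] = t
      count[m]++
    }
    END {
      for (i = 1; i <= n; i++) {
        m = order[i]
        printf "%d %s %s %s\n", count[m], first[m], last[m], m
      }
    }
  '
}
```

Because awk prints its summary only at end of input, feed it a finished dump rather than the -f stream, for example: journalctl -u rancher-system-agent.service --no-pager | agent_errors.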

Yes, the logs I posted above already include the journalctl -xeu output.