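Rancher Server Setup
- Rancher version: 2.7.5
- Installation option (Docker install/Helm Chart): Helm Chart
- If Helm Chart, type and version of the Local cluster (RKE1, RKE2, k3s, EKS, etc.): RKE2
- Online or air-gapped deployment: online deployment; the network was disconnected after the deployment completed
Downstream Cluster Information
- Kubernetes version: v1.26.7+rke2r1
- Cluster Type (Local/Downstream): custom RKE2
- If Downstream, what type of cluster? (Custom/Imported or Hosted, etc.):
User Information
- What is the role of the logged-in user? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): admin
- If custom, custom permission set: admin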
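**Host OS:** Rocky Linux 9
**Problem description:** The upstream and downstream clusters were themselves healthy, and every machine was Ready. After one machine was rebooted, it reported `error applying plan -- check rancher-system-agent.service logs on node for more information`. Restarting rancher-system-agent.service on that machine did not help. I then restarted rancher-system-agent.service on other, healthy machines, and each of them failed in the same way after the restart.
Looking at the rancher-system-agent log, it seems the agent tries to run a script that exits unexpectedly: msg="[Applyinator] Command sh [-c run.sh] finished with err: <nil> and exit code: 127". I do not know what the script contains.
Looking at the Rancher log, my suspicion is that Rancher has to reach https://api.github.com/repos/rancher/ui-plugin-charts/commits/main on the public internet to fetch a template. Since the network is currently disconnected, could that be what breaks communication between the upstream and downstream clusters?
One important change to note: the installation was done online, and the external network was disconnected only afterwards.
How can I recover while the external network stays disconnected?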
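For reference, exit code 127 is the shell's conventional status for a command that could not be found, which matches the `run.sh: command not` text in the agent's stderr line. Below is a minimal sketch that reproduces the same status outside the agent. It assumes an empty scratch directory can stand in for a work directory where `run.sh` was never extracted; the `scratch` and `rc` names are only for illustration.

```shell
# Assumption: an empty scratch directory stands in for the agent's work
# directory after the installer image was not extracted (no run.sh present).
scratch=$(mktemp -d)

# Run the same command line the agent logs: sh -c run.sh
rc=0
(cd "$scratch" && sh -c run.sh) 2>/dev/null || rc=$?

echo "exit code: $rc"
rmdir "$scratch"
```

If this is what happens on the node, the script may never have started at all, and the image that should provide it may not have been extracted. That is a guess from the log, not something I have confirmed.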
**Steps to reproduce:** systemctl restart rancher-system-agent.service
**Result:** As shown in the screenshot: Error applying plan -- check rancher-system-agent.service logs on node for more information
**Expected result:** The node returns to Ready after the restart instead of reporting Error applying plan -- check rancher-system-agent.service logs on node for more information
**Screenshots:**


Additional context:
rancher-system-agent.service log
The rancher-system-agent.service log is as follows:
-- The start-up result is done.
Jan 15 11:06:01 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:01+02:00" level=info msg="Rancher System Agent version v0.3.3 (9e827a5) is starting"
Jan 15 11:06:01 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:01+02:00" level=info msg="Using directory /var/lib/rancher/agent/work for work"
Jan 15 11:06:01 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:01+02:00" level=info msg="Starting remote watch of plans"
Jan 15 11:06:01 5gaocsrv08v rancher-system-agent[1123]: E0115 11:06:01.817488 1123 memcache.go:206] couldn't get resource list for management.cattle.io/v3:
Jan 15 11:06:01 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:01+02:00" level=info msg="Starting /v1, Kind=Secret controller"
Jan 15 11:06:01 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:01+02:00" level=info msg="Detected first start, force-applying one-time instruction set"
Jan 15 11:06:01 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:01+02:00" level=info msg="[Applyinator] Applying one-time instructions for plan with checksum c107eaf92a9a2f86106ca657dab3e777>
Jan 15 11:06:01 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:01+02:00" level=info msg="[Applyinator] Extracting image rancher/system-agent-installer-rke2:v1.26.7-rke2r1 to directory /var/>
Jan 15 11:06:01 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:01+02:00" level=info msg="Using private registry config file at /etc/rancher/agent/registries.yaml"
Jan 15 11:06:01 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:01+02:00" level=info msg="Pulling image index.docker.io/rancher/system-agent-installer-rke2:v1.26.7-rke2r1"
Jan 15 11:06:32 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:32+02:00" level=info msg="[Applyinator] Running command: sh [-c run.sh]"
Jan 15 11:06:32 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:32+02:00" level=info msg="[c107eaf92a9a2f86106ca657dab3e777feb4728239c254e5c9e54a70316d765a_0:stderr]: sh: run.sh: command not>
Jan 15 11:06:32 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:32+02:00" level=info msg="[Applyinator] Command sh [-c run.sh] finished with err: <nil> and exit code: 127"
Jan 15 11:06:32 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:32+02:00" level=error msg="error executing instruction 0: <nil>"
Jan 15 11:06:32 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:32+02:00" level=info msg="[Applyinator] No image provided, creating empty working directory /var/lib/rancher/agent/work/202601>
Jan 15 11:06:32 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:32+02:00" level=info msg="[Applyinator] Running command: sh [-c rke2 etcd-snapshot list --etcd-s3=false 2>/dev/null]"
Jan 15 11:06:32 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:32+02:00" level=info msg="[c107eaf92a9a2f86106ca657dab3e777feb4728239c254e5c9e54a70316d765a_0:stdout]: Name >
Jan 15 11:06:32 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:32+02:00" level=info msg="[c107eaf92a9a2f86106ca657dab3e777feb4728239c254e5c9e54a70316d765a_0:stdout]: etcd-snapshot-5gaocsrv0>
Jan 15 11:06:32 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:32+02:00" level=info msg="[c107eaf92a9a2f86106ca657dab3e777feb4728239c254e5c9e54a70316d765a_0:stdout]: etcd-snapshot-5gaocsrv0>
Jan 15 11:06:32 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:32+02:00" level=info msg="[c107eaf92a9a2f86106ca657dab3e777feb4728239c254e5c9e54a70316d765a_0:stdout]: etcd-snapshot-5gaocsrv0>
Jan 15 11:06:32 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:32+02:00" level=info msg="[c107eaf92a9a2f86106ca657dab3e777feb4728239c254e5c9e54a70316d765a_0:stdout]: etcd-snapshot-5gaocsrv0>
Jan 15 11:06:32 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:32+02:00" level=info msg="[c107eaf92a9a2f86106ca657dab3e777feb4728239c254e5c9e54a70316d765a_0:stdout]: etcd-snapshot-5gaocsrv0>
Jan 15 11:06:32 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:32+02:00" level=info msg="[Applyinator] Command sh [-c rke2 etcd-snapshot list --etcd-s3=false 2>/dev/null] finished with err:>
Jan 15 11:06:33 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:33+02:00" level=info msg="[K8s] updated plan secret fleet-default/custom-0e52d8d78b68-machine-plan with feedback"
Jan 15 11:06:33 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:33+02:00" level=info msg="[K8s] updated plan secret fleet-default/custom-0e52d8d78b68-machine-plan with feedback"
Jan 15 11:06:59 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:06:59+02:00" level=info msg="[K8s] updated plan secret fleet-default/custom-0e52d8d78b68-machine-plan with feedback"
Jan 15 11:07:39 5gaocsrv08v rancher-system-agent[1123]: time="2026-01-15T11:07:39+02:00" level=info msg="[K8s] updated plan secret fleet-default/custom-0e52d8d78b68-machine-plan with feedback"
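One thing that stands out in the agent log is the pause between the `Pulling image` line and the `Running command: sh [-c run.sh]` line. The rough sketch below flags such pauses. The `find_gaps` helper name and the 10-second threshold are my own choices, not anything from Rancher, and the two sample lines are shortened copies of the log lines above.

```shell
# Print every log line that arrives more than LIMIT seconds after the previous
# timestamped line. Assumes the "Mon DD HH:MM:SS host ..." prefix shown in the
# log above and ignores midnight rollover.
find_gaps() {
  awk -v limit="$1" '
    $3 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
      split($3, t, ":")
      now = t[1] * 3600 + t[2] * 60 + t[3]
      if (seen && now - prev > limit) {
        printf "%ds gap before: %s\n", now - prev, $0
      }
      prev = now
      seen = 1
    }'
}

# Two lines from the log above, shortened to the message text.
sample='Jan 15 11:06:01 5gaocsrv08v rancher-system-agent[1123]: msg="Pulling image index.docker.io/rancher/system-agent-installer-rke2:v1.26.7-rke2r1"
Jan 15 11:06:32 5gaocsrv08v rancher-system-agent[1123]: msg="[Applyinator] Running command: sh [-c run.sh]"'

gaps=$(printf '%s\n' "$sample" | find_gaps 10)
echo "$gaps"
```

On the node, the same function could be fed from `journalctl -u rancher-system-agent.service --no-pager`. A long pause right after the pull might mean the pull from index.docker.io timed out without internet access, but the truncated log lines do not confirm this.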
Rancher log
The Rancher log from the upstream cluster is as follows:
2026-01-15T11:43:41.836946549+02:00 2026/01/15 09:43:41 [ERROR] error syncing 'rancher-ui-plugins': handler helm-clusterrepo-download: Repo [https://github.com/rancher/ui-plugin-charts] is not accessible: Get "https://api.github.com/repos/rancher/ui-plugin-charts/commits/main": context deadline exceeded (Client.Timeout exceeded while awaiting headers), requeuing
2026-01-15T11:44:11.898578831+02:00 2026/01/15 09:44:11 [ERROR] error syncing 'rancher-ui-plugins': handler helm-clusterrepo-download: Repo [https://github.com/rancher/ui-plugin-charts] is not accessible: Get "https://api.github.com/repos/rancher/ui-plugin-charts/commits/main": context deadline exceeded (Client.Timeout exceeded while awaiting headers), requeuing
2026-01-15T11:44:41.965941846+02:00 2026/01/15 09:44:41 [ERROR] error syncing 'rancher-ui-plugins': handler helm-clusterrepo-download: Repo [https://github.com/rancher/ui-plugin-charts] is not accessible: Get "https://api.github.com/repos/rancher/ui-plugin-charts/commits/main": context deadline exceeded (Client.Timeout exceeded while awaiting headers), requeuing
2026-01-15T11:44:54.236991471+02:00 2026/01/15 09:44:54 [INFO] [planner] rkecluster fleet-default/dataops: waiting: failing bootstrap machine(s) custom-0e52d8d78b68: error applying plan -- check rancher-system-agent.service logs on node for more information
2026-01-15T11:45:12.050467713+02:00 2026/01/15 09:45:12 [ERROR] error syncing 'rancher-ui-plugins': handler helm-clusterrepo-download: Repo [https://github.com/rancher/ui-plugin-charts] is not accessible: Get "https://api.github.com/repos/rancher/ui-plugin-charts/commits/main": context deadline exceeded (Client.Timeout exceeded while awaiting headers), requeuing
2026-01-15T11:45:23.422451039+02:00 2026/01/15 09:45:23 [ERROR] Failed to handle tunnel request from remote address 10.42.0.33:52820: response 400: cluster not found
2026-01-15T11:45:23.561901002+02:00 2026/01/15 09:45:23 [INFO] [planner] rkecluster fleet-default/dataops: waiting: failing bootstrap machine(s) custom-0e52d8d78b68: error applying plan -- check rancher-system-agent.service logs on node for more information
2026-01-15T11:45:23.640625846+02:00 2026/01/15 09:45:23 [INFO] Updating TLS secret for cattle-system/serving-cert (count: 28): map[field.cattle.io/projectId:local:p-nfkwv listener.cattle.io/cn-10.42.0.15:10.42.0.15 listener.cattle.io/cn-10.42.0.21:10.42.0.21 listener.cattle.io/cn-10.42.0.22:10.42.0.22 listener.cattle.io/cn-10.42.0.33:10.42.0.33 listener.cattle.io/cn-10.42.0.6:10.42.0.6 listener.cattle.io/cn-10.42.0.7:10.42.0.7 listener.cattle.io/cn-10.42.0.8:10.42.0.8 listener.cattle.io/cn-10.42.1.10:10.42.1.10 listener.cattle.io/cn-10.42.1.11:10.42.1.11 listener.cattle.io/cn-10.42.1.13:10.42.1.13 listener.cattle.io/cn-10.42.1.2:10.42.1.2 listener.cattle.io/cn-10.42.1.3:10.42.1.3 listener.cattle.io/cn-10.42.1.4:10.42.1.4 listener.cattle.io/cn-10.42.1.5:10.42.1.5 listener.cattle.io/cn-10.42.1.6:10.42.1.6 listener.cattle.io/cn-10.42.1.9:10.42.1.9 listener.cattle.io/cn-10.42.2.12:10.42.2.12 listener.cattle.io/cn-10.42.2.13:10.42.2.13 listener.cattle.io/cn-10.42.2.14:10.42.2.14 listener.cattle.io/cn-10.42.2.15:10.42.2.15 listener.cattle.io/cn-10.42.2.2:10.42.2.2 listener.cattle.io/cn-10.42.2.5:10.42.2.5 listener.cattle.io/cn-10.42.2.6:10.42.2.6 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-5gaocsrv03v.ppa.olp.gr:5gaocsrv03v.ppa.olp.gr listener.cattle.io/cn-localhost:localhost listener.cattle.io/cn-rancher.cattle-system:rancher.cattle-system listener.cattle.io/fingerprint:SHA1=62C127697858F65F7516B37B3035E2AF5CD1A1D1]
2026-01-15T11:45:23.659057733+02:00 2026/01/15 09:45:23 [INFO] Active TLS secret cattle-system/serving-cert (ver=765007789) (count 28): map[field.cattle.io/projectId:local:p-nfkwv listener.cattle.io/cn-10.42.0.15:10.42.0.15 listener.cattle.io/cn-10.42.0.21:10.42.0.21 listener.cattle.io/cn-10.42.0.22:10.42.0.22 listener.cattle.io/cn-10.42.0.33:10.42.0.33 listener.cattle.io/cn-10.42.0.6:10.42.0.6 listener.cattle.io/cn-10.42.0.7:10.42.0.7 listener.cattle.io/cn-10.42.0.8:10.42.0.8 listener.cattle.io/cn-10.42.1.10:10.42.1.10 listener.cattle.io/cn-10.42.1.11:10.42.1.11 listener.cattle.io/cn-10.42.1.13:10.42.1.13 listener.cattle.io/cn-10.42.1.2:10.42.1.2 listener.cattle.io/cn-10.42.1.3:10.42.1.3 listener.cattle.io/cn-10.42.1.4:10.42.1.4 listener.cattle.io/cn-10.42.1.5:10.42.1.5 listener.cattle.io/cn-10.42.1.6:10.42.1.6 listener.cattle.io/cn-10.42.1.9:10.42.1.9 listener.cattle.io/cn-10.42.2.12:10.42.2.12 listener.cattle.io/cn-10.42.2.13:10.42.2.13 listener.cattle.io/cn-10.42.2.14:10.42.2.14 listener.cattle.io/cn-10.42.2.15:10.42.2.15 listener.cattle.io/cn-10.42.2.2:10.42.2.2 listener.cattle.io/cn-10.42.2.5:10.42.2.5 listener.cattle.io/cn-10.42.2.6:10.42.2.6 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-5gaocsrv03v.ppa.olp.gr:5gaocsrv03v.ppa.olp.gr listener.cattle.io/cn-localhost:localhost listener.cattle.io/cn-rancher.cattle-system:rancher.cattle-system listener.cattle.io/fingerprint:SHA1=62C127697858F65F7516B37B3035E2AF5CD1A1D1]
2026-01-15T11:45:28.424302356+02:00 2026/01/15 09:45:28 [ERROR] Failed to serve peer connection 10.42.1.11: websocket: close 1006 (abnormal closure): unexpected EOF
2026-01-15T11:45:28.430715304+02:00 2026/01/15 09:45:28 [INFO] error in remotedialer server [400]: read tcp 10.42.2.15:443->10.42.1.11:36462: use of closed network connection
2026/01/15 09:45:28 [ERROR] Failed to handle tunnel request from remote address 10.42.0.33:52832: response 400: cluster not found
2026-01-15T11:45:33.432347830+02:00 2026/01/15 09:45:33 [INFO] Handling backend connection request [10.42.1.11]
2026-01-15T11:45:33.449408992+02:00 2026/01/15 09:45:33 [ERROR] Failed to handle tunnel request from remote address 10.42.0.33:48662: response 400: cluster not found
2026-01-15T11:45:37.949602056+02:00 2026/01/15 09:45:37 [ERROR] Error during subscribe websocket: close sent
2026-01-15T11:45:38.033470825+02:00 2026/01/15 09:45:38 [ERROR] Error during subscribe websocket: close sent
2026-01-15T11:45:38.453272877+02:00 2026/01/15 09:45:38 [ERROR] Failed to handle tunnel request from remote address 10.42.0.33:48668: response 400: cluster not found
2026-01-15T11:45:42.154007852+02:00 2026/01/15 09:45:42 [ERROR] error syncing 'rancher-ui-plugins': handler helm-clusterrepo-download: Repo [https://github.com/rancher/ui-plugin-charts] is not accessible: Get "https://api.github.com/repos/rancher/ui-plugin-charts/commits/main": context deadline exceeded (Client.Timeout exceeded while awaiting headers), requeuing
2026-01-15T11:45:43.457185058+02:00 2026/01/15 09:45:43 [ERROR] Failed to handle tunnel request from remote address 10.42.0.33:40390: response 400: cluster not found
2026-01-15T11:45:48.459608667+02:00 2026/01/15 09:45:48 [ERROR] Failed to handle tunnel request from remote address 10.42.0.33:40400: response 400: cluster not found
2026-01-15T11:45:50.324930886+02:00 2026/01/15 09:45:50 [INFO] Adding peer wss://10.42.0.33/v3/connect, 10.42.0.33
2026-01-15T11:45:50.330940384+02:00 2026/01/15 09:45:50 [INFO] Stopping cluster agent for local
2026-01-15T11:45:50.349049378+02:00 2026/01/15 09:45:50 [INFO] Shutting down /v1, Kind=Node workers
2026-01-15T11:45:50.349065692+02:00 2026/01/15 09:45:50 [INFO] Shutting down rbac.authorization.k8s.io/v1, Kind=Role workers
2026-01-15T11:45:50.349068259+02:00 2026/01/15 09:45:50 [INFO] Shutting down apiregistration.k8s.io/v1, Kind=APIService workers
2026-01-15T11:45:50.349070391+02:00 2026/01/15 09:45:50 [INFO] Shutting down rbac.authorization.k8s.io/v1, Kind=ClusterRole workers
2026-01-15T11:45:50.349072662+02:00 2026/01/15 09:45:50 [INFO] Shutting down /v1, Kind=Namespace workers
2026-01-15T11:45:50.349080846+02:00 2026/01/15 09:45:50 [INFO] Shutting down /v1, Kind=Secret workers
2026-01-15T11:45:50.349083261+02:00 2026/01/15 09:45:50 [INFO] Shutting down rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding workers
2026-01-15T11:45:50.349085271+02:00 2026/01/15 09:45:50 [INFO] Shutting down /v1, Kind=ServiceAccount workers
2026-01-15T11:45:50.349087626+02:00 2026/01/15 09:45:50 [INFO] Shutting down /v1, Kind=ConfigMap workers
2026-01-15T11:45:50.349089643+02:00 2026/01/15 09:45:50 [INFO] Shutting down /v1, Kind=LimitRange workers
2026-01-15T11:45:50.349091682+02:00 2026/01/15 09:45:50 [INFO] Shutting down /v1, Kind=ResourceQuota workers
2026-01-15T11:45:50.349096915+02:00 2026/01/15 09:45:50 [INFO] Shutting down rbac.authorization.k8s.io/v1, Kind=RoleBinding workers
2026-01-15T11:45:50.381985170+02:00 2026/01/15 09:45:50 [INFO] Starting cluster controllers for local
2026-01-15T11:45:50.645653628+02:00 2026/01/15 09:45:50 [INFO] Starting cluster agent for local [owner=false]
2026-01-15T11:45:50.645672829+02:00 2026/01/15 09:45:50 [INFO] Starting rbac.authorization.k8s.io/v1, Kind=ClusterRole controller
2026-01-15T11:45:50.645676044+02:00 2026/01/15 09:45:50 [INFO] Starting rbac.authorization.k8s.io/v1, Kind=RoleBinding controller
2026-01-15T11:45:50.645678533+02:00 2026/01/15 09:45:50 [INFO] Starting rbac.authorization.k8s.io/v1, Kind=Role controller
2026-01-15T11:45:50.645680943+02:00 2026/01/15 09:45:50 [INFO] Starting /v1, Kind=Namespace controller
2026-01-15T11:45:50.645683425+02:00 2026/01/15 09:45:50 [INFO] Starting /v1, Kind=Secret controller
2026-01-15T11:45:50.645685941+02:00 2026/01/15 09:45:50 [INFO] Starting /v1, Kind=ServiceAccount controller
2026-01-15T11:45:50.645690983+02:00 2026/01/15 09:45:50 [INFO] Starting rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding controller
2026-01-15T11:45:51.053640024+02:00 2026/01/15 09:45:51 [INFO] [planner] rkecluster fleet-default/dataops: waiting: failing bootstrap machine(s) custom-0e52d8d78b68: error applying plan -- check rancher-system-agent.service logs on node for more information
2026/01/15 09:45:53 [INFO] Handling backend connection request [10.42.0.33]
2026/01/15 09:46:11 [INFO] [planner] rkecluster fleet-default/dataops: waiting: failing bootstrap machine(s) custom-0e52d8d78b68: error applying plan -- check rancher-system-agent.service logs on node for more information
2026/01/15 09:46:12 [ERROR] error syncing 'rancher-ui-plugins': handler helm-clusterrepo-download: Repo [https://github.com/rancher/ui-plugin-charts] is not accessible: Get "https://api.github.com/repos/rancher/ui-plugin-charts/commits/main": context deadline exceeded (Client.Timeout exceeded while awaiting headers), requeuing
2026/01/15 09:46:42 [ERROR] error syncing 'rancher-ui-plugins': handler helm-clusterrepo-download: Repo [https://github.com/rancher/ui-plugin-charts] is not accessible: Get "https://api.github.com/repos/rancher/ui-plugin-charts/commits/main": context deadline exceeded (Client.Timeout exceeded while awaiting headers), requeuing
2026/01/15 09:47:12 [ERROR] error syncing 'rancher-ui-plugins': handler helm-clusterrepo-download: Repo [https://github.com/rancher/ui-plugin-charts] is not accessible: Get "https://api.github.com/repos/rancher/ui-plugin-charts/commits/main": context deadline exceeded (Client.Timeout exceeded while awaiting headers), requeuing
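To see whether the GitHub errors and the planner message are really the same problem, a rough tally of the message kinds in the Rancher log can help. The sketch below is only an illustration: the `summarize_rancher_log` name is hypothetical, the three patterns are taken from the excerpt above, and the sample lines are shortened copies of lines from it.

```shell
# Count how often each kind of message appears in a Rancher log read from stdin.
summarize_rancher_log() {
  awk '
    /helm-clusterrepo-download/ { repo++ }
    /\[planner\]/               { planner++ }
    /cluster not found/         { tunnel++ }
    END {
      printf "clusterrepo=%d planner=%d tunnel=%d\n", repo, planner, tunnel
    }'
}

# Shortened copies of three lines from the excerpt above.
sample='[ERROR] error syncing rancher-ui-plugins: handler helm-clusterrepo-download: Repo [https://github.com/rancher/ui-plugin-charts] is not accessible
[INFO] [planner] rkecluster fleet-default/dataops: waiting: failing bootstrap machine(s) custom-0e52d8d78b68: error applying plan
[ERROR] Failed to handle tunnel request from remote address 10.42.0.33:52820: response 400: cluster not found'

summary=$(printf '%s\n' "$sample" | summarize_rancher_log)
echo "$summary"
```

In the excerpt, the ui-plugin-charts errors are logged by the `helm-clusterrepo-download` handler, while the `error applying plan` lines come from `[planner]`. That separation might mean the GitHub timeout is a side effect of being offline and not the cause of the failing plan, but I have not verified this.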