Rancher server reports errors for a managed downstream cluster

Rancher Server Setup

  • Rancher version: 2.10
  • Installation option (Docker install/Helm Chart):
    • rancher-server installed via rke up
    • Online deployment

Downstream cluster information

  • Kubernetes version: v1.31.2+rke2r1
  • Cluster Type (Local/Downstream):
    • Downstream
    • RKE2 cluster created from the rancher-server UI console

User information

  • Role of the logged-in user: Admin

Host operating system:

  • Ubuntu 22.04.5 LTS

Problem description:

  • rancher-server reports the following error for the managed cluster: "Cluster health check failed: Failed to communicate with API server during namespace check: Get "https://10.43.0.1:443/api/v1/namespaces/kube-system?timeout=45s": context deadline exceeded"
  • Checks of rancher-server and of the RKE2 cluster show nothing else wrong. The system log on the RKE2 cluster node contains the following errors:
    "Feb 26 01:08:39 ip-10-1-10-2 rancher-system-agent[2620123]: W0226 01:08:39.867256 2620123 reflector.go:462] pkg/mod/github.com/rancher/client-go@v1.29.3-rancher1/tools/cache/reflector.go:229: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
    Feb 26 01:08:41 ip-10-1-10-2 rancher-system-agent[2620123]: W0226 01:08:41.915355 2620123 reflector.go:539] pkg/mod/github.com/rancher/client-go@v1.29.3-rancher1/tools/cache/reflector.go:229: failed to list *v1.Secret: Get "https://rancher.abcd.com/api/v1/namespaces/fleet-default/secrets?fieldSelector=metadata.name%3Dcustom-1851018effa4-machine-plan&resourceVersion=44082975": dial tcp 172.25.11.11:443: connect: no route to host"
  • On the RKE2 cluster node, "ping rancher.abcd.com" succeeds, and "telnet 172.25.11.11 443" connects normally.
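
The manual ping and telnet checks above can be repeated in one script that takes the dial target directly from the agent log line. What follows is a minimal sketch, not a Rancher tool: the function names `dial_target`, `resolved_ips`, `tcp_reachable`, and `check` are hypothetical, and it assumes the logged address is IPv4, as in the log above. It checks whether the IP the agent tried still matches what DNS returns for the Rancher hostname, and whether a TCP connection to it succeeds from this node.

```python
import re
import socket

# Matches the Go net error fragment "dial tcp <ipv4>:<port>:" seen in the agent log.
DIAL_RE = re.compile(r"dial tcp (\d{1,3}(?:\.\d{1,3}){3}):(\d+):")


def dial_target(log_line):
    """Return (ip, port) from a 'dial tcp' error, or None if the line has none."""
    m = DIAL_RE.search(log_line)
    return (m.group(1), int(m.group(2))) if m else None


def resolved_ips(hostname, port=443):
    """Return the set of addresses the local resolver currently gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def tcp_reachable(ip, port, timeout=5.0):
    """Attempt a TCP connect; return (True, '') or (False, error text)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True, ""
    except OSError as exc:
        return False, str(exc)


def check(log_line, hostname):
    """Compare the logged dial target with current DNS and test the TCP path."""
    target = dial_target(log_line)
    if target is None:
        return None
    ip, port = target
    ok, err = tcp_reachable(ip, port)
    return {
        "logged_target": target,
        "dns_matches_log": ip in resolved_ips(hostname, port),
        "tcp_ok": ok,
        "error": err,
    }
```

Run on the affected node, `check(line, "rancher.abcd.com")` with the "no route to host" line would show whether the path that failed at 01:08:41 works now. If it does while the cluster still shows as unhealthy, that fits the suspicion that the network recovered but the connection was never re-established.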

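Investigation progress:

  • The current suspicion is that a network problem disrupted rancher-server's management of the RKE2 cluster, and that for some unknown reason the connection did not recover after the network came back. Without re-importing the RKE2 cluster, how can rancher-server be made to re-trigger the cluster health check?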

Steps to reproduce:

Result:

Expected result:

Screenshots:

Additional context:

Logs
Feb 26 01:08:39 ip-10-1-10-2 rancher-system-agent[2620123]: W0226 01:08:39.867256 2620123 reflector.go:462] pkg/mod/github.com/rancher/client-go@v1.29.3-rancher1/tools/cache/reflector.go:229: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
Feb 26 01:08:41 ip-10-1-10-2 rancher-system-agent[2620123]: W0226 01:08:41.915355 2620123 reflector.go:539] pkg/mod/github.com/rancher/client-go@v1.29.3-rancher1/tools/cache/reflector.go:229: failed to list *v1.Secret: Get "https://rancher.abcd.com/api/v1/namespaces/fleet-default/secrets?fieldSelector=metadata.name%3Dcustom-1851018effa4-machine-plan&resourceVersion=44082975": dial tcp 172.25.11.11:443: connect: no route to host
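
To see whether the agent kept logging these errors after the network recovered, the log lines can be grouped by cause with the time each cause was last seen. This is a rough sketch only. It assumes the syslog-style prefix shown in the two lines above; the `CAUSES` list and the `summarize` function are hypothetical names chosen for illustration, and the list of causes is not exhaustive.

```python
import re
from collections import Counter

# Syslog-style prefix as it appears above: "Feb 26 01:08:39 <host> rancher-system-agent[<pid>]: <msg>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"rancher-system-agent\[\d+\]:\s+(?P<msg>.*)$"
)

# Error fragments to look for (assumed list, extend as needed).
CAUSES = (
    "no route to host",
    "client connection lost",
    "context deadline exceeded",
    "connection refused",
)


def summarize(lines):
    """Return {cause: (count, last_timestamp)} for each cause found in the lines."""
    counts = Counter()
    last_seen = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a rancher-system-agent line in the expected format
        for cause in CAUSES:
            if cause in m.group("msg"):
                counts[cause] += 1
                last_seen[cause] = m.group("ts")
    return {cause: (counts[cause], last_seen[cause]) for cause in counts}
```

Passing the node's agent log to `summarize` (for example, an open file or `sys.stdin`) gives one entry per cause. If the last "no route to host" timestamp predates the network recovery and no newer errors appear, the agent on the node is probably no longer failing, and the stale state is more likely on the Rancher side.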