After creating a cluster, Rancher keeps showing "[Disconnected] Cluster agent is not connected"

zhiqiangking · July 7, 2022, 10:56

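Rancher Server setup

Rancher version: 2.6.4
Installation option (Docker install/Helm Chart): Helm Chart
- If installed via Helm Chart, provide the Local cluster type (RKE1, RKE2, k3s, EKS, etc.) and version: RKE2
Online or air-gapped deployment:
Rancher is deployed in high availability: three servers form the cluster using RKE2.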

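Downstream cluster information

Kubernetes version: 1.22.7
Cluster Type (Local/Downstream):
- If Downstream, what type of cluster? (Custom/Imported or Hosted, etc.):
  Custom RKE cluster, Kubernetes version v1.22.7. All nodes are healthy, but the cluster is stuck in the Waiting state.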
  
  
  In addition, the cluster that was created incorrectly cannot be deleted; clicking Delete does nothing.
  

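User information

What is the role of the logged-in user? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom):
- If Custom, the custom permission set:

Problem description:

Steps to reproduce:

Result:

Expected result:

Screenshots:

Additional context: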

Logs

The kube_api_auth container on the master node logs the following:
E0708 00:36:13.919547       1 reflector.go:139] pkg/mod/github.com/rancher/client-go@v0.21.0-rancher.1/tools/cache/reflector.go:168: Failed to watch *v1.ClusterRepo: failed to list *v1.ClusterRepo: the server could not find the requested resource (get clusterrepos.meta.k8s.io)
E0708 00:37:11.254996       1 reflector.go:139] pkg/mod/github.com/rancher/client-go@v0.21.0-rancher.1/tools/cache/reflector.go:168: Failed to watch *v1.ClusterRepo: failed to list *v1.ClusterRepo: the server could not find the requested resource (get clusterrepos.meta.k8s.io)
E0708 00:37:44.889567       1 reflector.go:139] pkg/mod/github.com/rancher/client-go@v0.21.0-rancher.1/tools/cache/reflector.go:168: Failed to watch *v1.ClusterRepo: failed to list *v1.ClusterRepo: the server could not find the requested resource (get clusterrepos.meta.k8s.io)
E0708 00:38:24.806544       1 reflector.go:139] pkg/mod/github.com/rancher/client-go@v0.21.0-rancher.1/tools/cache/reflector.go:168: Failed to watch *v1.ClusterRepo: failed to list *v1.ClusterRepo: the server could not find the requested resource (get clusterrepos.meta.k8s.io)

# agent logs
time="2022-07-07T08:08:00Z" level=info msg="Option etcd=false"
time="2022-07-07T08:08:00Z" level=info msg="Option controlPlane=false"
time="2022-07-07T08:08:00Z" level=info msg="Option worker=false"
time="2022-07-07T08:08:00Z" level=info msg="Option requestedHostname=master03"
time="2022-07-07T08:08:00Z" level=info msg="Option dockerInfo={F5SL:MHBN:DFNI:OAMY:PTPN:KTIG:QWFJ:7OSK:SM57:TJ7S:UX3M:JQLI 12 10 0 2 5 overlay2 [[Backing Filesystem xfs] [Supports d_type true] [Native Overlay Diff true] [userxattr false]] [] {[local] [bridge host ipvlan macvlan null overlay] [] [awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog]} true true true true true true true true true true true true false 90 true 108 2022-07-07T16:08:00.44657978+08:00 json-file systemd 1 0 3.10.0-1160.el7.x86_64 CentOS Linux 7 (Core) 7 linux x86_64 https://index.docker.io/v1/ 0xc00167e150 8 12428435456 [] /data/docker    master03 [] false 20.10.17   map[io.containerd.runc.v2:{runc [] <nil>} io.containerd.runtime.v1.linux:{runc [] <nil>} runc:{runc [] <nil>}] runc {  inactive false  [] 0 0 <nil> []} false  docker-init {10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1} {v1.1.2-0-ga916309 v1.1.2-0-ga916309} {de40ad0 de40ad0} [name=seccomp,profile=default]  [] []}"
time="2022-07-07T08:08:00Z" level=info msg="Connecting to wss://rancher.sinopharm.com/v3/connect with token starting with hg7cnpts5x24w7qm6qz24q4f6ql"
time="2022-07-07T08:08:00Z" level=info msg="Connecting to proxy" url="wss://rancher.sinopharm.com/v3/connect"
time="2022-07-07T08:08:00Z" level=info msg="Starting plan monitor, checking every 120 seconds"
time="2022-07-07T08:08:15Z" level=info msg="Removing unmanaged agent /eager_hypatia(bdb281a1ad4e7dc7eca257fa6098de4e0ba7b8945134fa81f28b83a5f4683c06)"

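niusmallnan · July 8, 2022, 09:45

Many people have asked about cluster-agent-is-not-connected on the forum, so try searching for it. Those threads cover most of the troubleshooting methods.

As for the cluster that cannot be deleted, reproduction steps may be needed to confirm the cause. In the meantime, you can find the cluster's CRD entry in the local cluster and delete it manually. Deletion is usually stuck on a finalizer, which you need to remove by hand.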
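
A minimal sketch of the finalizer removal described above, written as a small Python helper that only builds the kubectl command rather than running it. It assumes the stuck object is a `clusters.management.cattle.io` resource in the local cluster; the cluster ID `c-abc12` is hypothetical, so look up the real one first (for example with `kubectl get clusters.management.cattle.io`). Clearing finalizers skips whatever cleanup they were guarding, so treat this as a last resort.

```python
import json
import shlex
from typing import List

# Assumption: the stuck downstream cluster is stored in the local cluster
# as a clusters.management.cattle.io object with an ID such as "c-xxxxx".
RESOURCE = "clusters.management.cattle.io"


def finalizer_patch_command(cluster_id: str) -> List[str]:
    """Build a kubectl command that clears metadata.finalizers.

    A JSON merge patch that sets the field to null removes it, which lets
    a pending delete complete.
    """
    patch = json.dumps({"metadata": {"finalizers": None}})
    return ["kubectl", "patch", RESOURCE, cluster_id, "--type=merge", "-p", patch]


if __name__ == "__main__":
    # "c-abc12" is a hypothetical cluster ID used only for illustration.
    print(shlex.join(finalizer_patch_command("c-abc12")))
```

The printed command is meant to be reviewed and then run with a kubeconfig that points at the local (Rancher management) cluster, not at the downstream cluster.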

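zcjwsrf · February 22, 2023, 05:10

The agent cannot reach the Rancher server, possibly because the server's name is only defined in a local hosts file. Setting up a DNS server should fix it.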
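
One way to test this suggestion is to check name resolution and TCP reachability from the place where the agent actually runs. The sketch below takes its URL from the agent log earlier in the thread; the function names are illustrative and not part of Rancher. A plausible reason a hosts-file entry might not help, which is an assumption rather than something confirmed in this thread, is that the cluster agent runs as a pod and resolves names through cluster DNS, which generally does not read the node's /etc/hosts.

```python
import socket
from urllib.parse import urlsplit

# Default ports for the URL schemes an agent is likely to use.
DEFAULT_PORTS = {"wss": 443, "https": 443, "ws": 80, "http": 80}


def endpoint_from_url(url):
    """Return (hostname, port) for a URL, filling in the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port


def check_endpoint(url, timeout=5.0):
    """Resolve the host, then attempt a TCP connection; return a summary string."""
    host, port = endpoint_from_url(url)
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"DNS lookup failed for {host}: {exc}"
    addrs = sorted({info[4][0] for info in infos})
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return f"{host} resolves to {addrs}; TCP port {port} is reachable"
    except OSError as exc:
        return f"{host} resolves to {addrs}; TCP port {port} is not reachable: {exc}"
```

Calling `print(check_endpoint("wss://rancher.sinopharm.com/v3/connect"))` on the node and again inside a pod on the downstream cluster would show whether the two environments resolve the name differently.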

moge1997 · December 10, 2024, 09:34

Is there a problem with using the local hosts file?

asdfghhong · July 22, 2025, 00:48

In principle it should be fine: as long as the domain resolves and the network is reachable, it works. It feels like