Rancher keeps showing "[Disconnected] Cluster agent is not connected" after creating a cluster

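Rancher Server Setup

  • Rancher version: 2.6.4
  • Installation option (Docker install/Helm Chart): Helm Chart
    • If Helm Chart, the type (RKE1, RKE2, k3s, EKS, etc.) and version of the Local cluster: RKE2
  • Online or air-gapped deployment:
    Rancher is deployed in high-availability mode; three servers form the cluster using RKE2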

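Downstream Cluster Information

  • Kubernetes version: 1.22.7
  • Cluster Type (Local/Downstream):

User Information

  • What is the role of the logged-in user? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom):
    • If custom, the custom permission set: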

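Problem description:

Steps to reproduce:

Result:

Expected result:

Screenshots:

Additional context:

Logs
The kube_api_auth container on the master node shows the following logs: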
E0708 00:36:13.919547       1 reflector.go:139] pkg/mod/github.com/rancher/client-go@v0.21.0-rancher.1/tools/cache/reflector.go:168: Failed to watch *v1.ClusterRepo: failed to list *v1.ClusterRepo: the server could not find the requested resource (get clusterrepos.meta.k8s.io)
E0708 00:37:11.254996       1 reflector.go:139] pkg/mod/github.com/rancher/client-go@v0.21.0-rancher.1/tools/cache/reflector.go:168: Failed to watch *v1.ClusterRepo: failed to list *v1.ClusterRepo: the server could not find the requested resource (get clusterrepos.meta.k8s.io)
E0708 00:37:44.889567       1 reflector.go:139] pkg/mod/github.com/rancher/client-go@v0.21.0-rancher.1/tools/cache/reflector.go:168: Failed to watch *v1.ClusterRepo: failed to list *v1.ClusterRepo: the server could not find the requested resource (get clusterrepos.meta.k8s.io)
E0708 00:38:24.806544       1 reflector.go:139] pkg/mod/github.com/rancher/client-go@v0.21.0-rancher.1/tools/cache/reflector.go:168: Failed to watch *v1.ClusterRepo: failed to list *v1.ClusterRepo: the server could not find the requested resource (get clusterrepos.meta.k8s.io)

# Agent logs
time="2022-07-07T08:08:00Z" level=info msg="Option etcd=false"
time="2022-07-07T08:08:00Z" level=info msg="Option controlPlane=false"
time="2022-07-07T08:08:00Z" level=info msg="Option worker=false"
time="2022-07-07T08:08:00Z" level=info msg="Option requestedHostname=master03"
time="2022-07-07T08:08:00Z" level=info msg="Option dockerInfo={F5SL:MHBN:DFNI:OAMY:PTPN:KTIG:QWFJ:7OSK:SM57:TJ7S:UX3M:JQLI 12 10 0 2 5 overlay2 [[Backing Filesystem xfs] [Supports d_type true] [Native Overlay Diff true] [userxattr false]] [] {[local] [bridge host ipvlan macvlan null overlay] [] [awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog]} true true true true true true true true true true true true false 90 true 108 2022-07-07T16:08:00.44657978+08:00 json-file systemd 1 0 3.10.0-1160.el7.x86_64 CentOS Linux 7 (Core) 7 linux x86_64 https://index.docker.io/v1/ 0xc00167e150 8 12428435456 [] /data/docker    master03 [] false 20.10.17   map[io.containerd.runc.v2:{runc [] <nil>} io.containerd.runtime.v1.linux:{runc [] <nil>} runc:{runc [] <nil>}] runc {  inactive false  [] 0 0 <nil> []} false  docker-init {10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1} {v1.1.2-0-ga916309 v1.1.2-0-ga916309} {de40ad0 de40ad0} [name=seccomp,profile=default]  [] []}"
time="2022-07-07T08:08:00Z" level=info msg="Connecting to wss://rancher.sinopharm.com/v3/connect with token starting with hg7cnpts5x24w7qm6qz24q4f6ql"
time="2022-07-07T08:08:00Z" level=info msg="Connecting to proxy" url="wss://rancher.sinopharm.com/v3/connect"
time="2022-07-07T08:08:00Z" level=info msg="Starting plan monitor, checking every 120 seconds"
time="2022-07-07T08:08:15Z" level=info msg="Removing unmanaged agent /eager_hypatia(bdb281a1ad4e7dc7eca257fa6098de4e0ba7b8945134fa81f28b83a5f4683c06)"

Many people have asked about cluster-agent-is-not-connected on the forum, so try searching for it. The existing threads cover most of the troubleshooting methods.

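As for the cluster that cannot be deleted, some reproduction steps may be needed to confirm the cause. In the meantime, you can find the corresponding cluster CRD entry in the local cluster and delete it manually. Deletion usually gets stuck on a finalizer, which has to be removed by hand.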
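
The manual finalizer cleanup could be sketched roughly like this. It is only a sketch under assumptions: kubectl is pointed at the local (Rancher management) cluster, the stuck object is a `clusters.management.cattle.io` resource, and `c-xxxxx` is a placeholder rather than a real cluster ID. The helper name `build_finalizer_patch_cmd` is hypothetical. It only builds the command so you can review it before running anything.

```python
import json
import shlex


def build_finalizer_patch_cmd(cluster_id, resource="clusters.management.cattle.io"):
    """Return the kubectl argv that clears metadata.finalizers on one object.

    Hypothetical helper: the resource name and cluster_id are assumptions,
    so list the objects with `kubectl get <resource>` first to confirm them.
    """
    # In a JSON merge patch, setting a field to null removes that field.
    patch = json.dumps({"metadata": {"finalizers": None}})
    return ["kubectl", "patch", resource, cluster_id, "--type=merge", "-p", patch]


if __name__ == "__main__":
    # "c-xxxxx" is a placeholder, not a real cluster ID.
    print(shlex.join(build_finalizer_patch_cmd("c-xxxxx")))
```

Clearing finalizers skips whatever cleanup the controller was waiting on, so it is worth checking first which finalizer is actually blocking the deletion.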

The agent cannot reach the Rancher server, possibly because the hostname is only defined in a local hosts file. Setting up a DNS server should fix it.
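
One way to check the hosts-file theory is sketched below. This is a rough sketch, not an official diagnostic: both helper names are hypothetical. The idea is that if the Rancher hostname appears in the node's hosts file, the node can resolve it, while the agent container may not be able to, because containers usually do not share the node's /etc/hosts.

```python
import socket


def hosts_file_names(hosts_text):
    """Return the set of hostnames defined in hosts-file formatted text."""
    names = set()
    for line in hosts_text.splitlines():
        # Strip trailing comments and skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # The first field is the IP address; the rest are names and aliases.
        names.update(name.lower() for name in fields[1:])
    return names


def resolves(hostname, port=443):
    """Return True if the system resolver (hosts file or DNS) finds hostname."""
    try:
        socket.getaddrinfo(hostname, port)
        return True
    except socket.gaierror:
        return False
```

For example, on the node you could read /etc/hosts, pass its contents to `hosts_file_names`, and check whether `rancher.sinopharm.com` (the hostname from the agent log) is in the result. Running `resolves("rancher.sinopharm.com")` from inside the agent container then shows whether the name resolves there as well.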