Rancher 2.6.8: downstream cluster registration fails with [Disconnected] Cluster agent is not connected

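Rancher Server setup

  • Rancher version: v2.6.8 / v2.6.7 / v2.6.8-patch1
  • Installation option (Docker install/Helm Chart): Helm Chart
    • If installed via Helm Chart, provide the type (RKE1, etc.) and version of the local cluster: RKE version v1.3.14, local cluster v1.23.8
  • Online or air-gapped deployment: air-gapped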

Downstream cluster information

  • Kubernetes version: v1.23.8
  • Cluster Type (Local/Downstream):
    • If Downstream, what type of cluster? (Custom/Imported or Hosted, etc.): Custom

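User information: admin

  • What is the role of the logged-in user? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): Admin
    • If Custom, the custom permission set: Default Admin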

Host OS: CentOS 7.9, kernel 4.19.12-1.el7.elrepo.x86_64

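Problem description: Kubernetes deploys normally and the local cluster is healthy, except that helm-operation runs normally at first and then its status changes to NotReady. Creating a downstream cluster fails with [Disconnected] Cluster agent is not connected.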


Steps to reproduce: I tried v2.6.8, v2.6.7, and v2.6.8-patch1, and every version produces the same error. The certificate is self-signed.

Result: [Disconnected] Cluster agent is not connected. The cluster stays at "Waiting for API to be available" and registration never completes.

Expected result:

Screenshots:

Additional context:

Logs

I don't know which logs to check to analyze this problem. How can it be solved?



You can check the rancher server logs, as well as the cluster-agent logs highlighted in your screenshot.
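As a rough sketch of how both sets of logs could be pulled in one step: the helper below assumes the Rancher pods carry the label app=rancher and the agent pods carry app=cattle-cluster-agent (worth confirming with `kubectl -n cattle-system get pods --show-labels`). The function name `collect_logs` and the default of 200 lines are illustrative, not something from this thread.

```shell
#!/usr/bin/env bash
# Sketch: print recent logs from every pod in cattle-system that matches a label selector.
# Assumption: the selectors app=rancher and app=cattle-cluster-agent match your pods.
collect_logs() {
  local selector="$1"            # label selector, e.g. app=rancher
  local tail_lines="${2:-200}"   # number of lines per container (hypothetical default)
  kubectl -n cattle-system logs -l "$selector" --all-containers --prefix --tail="$tail_lines"
}

# Usage (nothing runs automatically):
#   collect_logs app=rancher                # against the local (Rancher) cluster
#   collect_logs app=cattle-cluster-agent   # against the downstream cluster
```

Because the downstream cluster is not yet reachable through Rancher, the second call has to be run with a kubeconfig that talks to the downstream API server directly.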

kubectl -n cattle-system logs -f rancher-6496df4785-ps76z

2022/09/29 02:58:23 [INFO] Handling backend connection request [c-cshzg:m-996677f3dd4b]
2022/09/29 02:58:23 [INFO] Handling backend connection request [c-cshzg:m-1f954fa9d278]
2022/09/29 02:58:27 [INFO] Handling backend connection request [c-cshzg:m-1f954fa9d278]
2022/09/29 02:58:27 [INFO] error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF
W0929 02:58:40.155771 34 reflector.go:443] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: watch of *v1.Namespace ended with: an error on the server (“unable to decode an event from the watch stream: tunnel disconnect”) has prevented the request from succeeding
W0929 02:58:40.169450 34 reflector.go:443] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: watch of *v1.Role ended with: an error on the server (“unable to decode an event from the watch stream: tunnel disconnect”) has prevented the request from succeeding
W0929 02:58:40.172987 34 reflector.go:443] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: watch of *v1.ServiceAccount ended with: an error on the server (“unable to decode an event from the watch stream: tunnel disconnect”) has prevented the request from succeeding
W0929 02:58:40.173361 34 reflector.go:443] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: watch of *v1.RoleBinding ended with: an error on the server (“unable to decode an event from the watch stream: tunnel disconnect”) has prevented the request from succeeding
W0929 02:58:40.173530 34 reflector.go:443] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: watch of *v1.ClusterRoleBinding ended with: an error on the server (“unable to decode an event from the watch stream: tunnel disconnect”) has prevented the request from succeeding
W0929 02:58:40.173576 34 reflector.go:443] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: watch of *v1.ClusterRole ended with: an error on the server (“unable to decode an event from the watch stream: tunnel disconnect”) has prevented the request from succeeding
2022/09/29 02:58:43 [INFO] error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF
W0929 02:58:43.781167 34 reflector.go:443] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: watch of *v1.ClusterRoleBinding ended with: an error on the server (“unable to decode an event from the watch stream: tunnel disconnect”) has prevented the request from succeeding
W0929 02:58:43.781253 34 reflector.go:443] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: watch of *v1.ServiceAccount ended with: an error on the server (“unable to decode an event from the watch stream: tunnel disconnect”) has prevented the request from succeeding
W0929 02:58:43.781747 34 reflector.go:443] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: watch of *v1.Secret ended with: an error on the server (“unable to decode an event from the watch stream: tunnel disconnect”) has prevented the request from succeeding
I0929 02:59:27.006631 34 trace.go:205] Trace[309410732]: “Reflector ListAndWatch” name:pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168 (29-Sep-2022 02:58:41.161) (total time: 45845ms):
Trace[309410732]: —“Objects listed” error: 45844ms (02:59:27.006)
Trace[309410732]: [45.845074035s] [45.845074035s] END
W0929 03:01:36.027108 34 warnings.go:80] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0929 03:08:53.033736 34 warnings.go:80] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+

######################################

kubectl -n cattle-system logs -f rancher-6496df4785-c4s5z

W0929 04:28:30.205454 34 transport.go:288] Unable to cancel request for *client.addQuery
W0929 04:28:30.205748 34 transport.go:288] Unable to cancel request for *client.addQuery
W0929 04:28:30.208320 34 transport.go:288] Unable to cancel request for *client.addQuery
W0929 04:28:30.209432 34 transport.go:288] Unable to cancel request for *client.addQuery
W0929 04:28:30.209774 34 transport.go:288] Unable to cancel request for *client.addQuery
W0929 04:28:30.210637 34 transport.go:288] Unable to cancel request for *client.addQuery

######################################

kubectl -n cattle-system logs -f rancher-6496df4785-dktjz

I0929 02:59:25.348671 34 trace.go:205] Trace[1883157715]: “Reflector ListAndWatch” name:pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168 (29-Sep-2022 02:58:41.437) (total time: 43911ms):
Trace[1883157715]: —“Objects listed” error: 43911ms (02:59:25.348)
Trace[1883157715]: [43.911241697s] [43.911241697s] END
I0929 02:59:25.354980 34 trace.go:205] Trace[673008646]: “Reflector ListAndWatch” name:pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168 (29-Sep-2022 02:58:41.693) (total time: 43661ms):
Trace[673008646]: —“Objects listed” error: 43661ms (02:59:25.354)
Trace[673008646]: [43.661319094s] [43.661319094s] END
2022/09/29 03:03:40 [ERROR] error syncing ‘cattle-fleet-system/helm-operation-42sts’: handler helm-operation: pods “helm-operation-42sts” not found, requeuing
W0929 03:28:20.462905 34 transport.go:288] Unable to cancel request for *client.addQuery
W0929 03:28:20.653063 34 transport.go:288] Unable to cancel request for *client.addQuery
W0929 03:28:20.657948 34 transport.go:288] Unable to cancel request for *client.addQuery

###########
[rancher@k8s44 deploy_rancher]$ docker logs -f fb0bf3fa4e2e
Error: No such container: fb0bf3fa4e2e

[rancher@k8s44 deploy_rancher]$ docker ps -a |grep agent
ea2a262d123d fe7f7b7c3cac "run.sh" 5 minutes ago Exited (1) 3 minutes ago k8s_cluster-register_cattle-cluster-agent-7697bb7794-cj72h_cattle-system_e9996f0b-74fd-49fb-ae42-14af370c378d_28
a0e4f3ff19fa yyimgs.com/library/rancher/rancher-agent:v2.6.7 "run.sh --no-registe…" 3 hours ago Exited (0) 3 hours ago share-mnt
e15dfaf9e08c fe7f7b7c3cac "run.sh" 3 hours ago Up 3 hours k8s_agent_cattle-node-agent-zkthd_cattle-system_4bf7bbc5-955e-4059-8f6f-6e814e52151d_0
c47204d07332 yyimgs.com/library/rancher/mirrored-pause:3.6 "/pause" 3 hours ago Up 3 hours k8s_POD_cattle-node-agent-zkthd_cattle-system_4bf7bbc5-955e-4059-8f6f-6e814e52151d_0
cfb7b6c7bfa2 yyimgs.com/library/rancher/mirrored-pause:3.6 "/pause" 3 hours ago Up 3 hours k8s_POD_cattle-cluster-agent-7697bb7794-cj72h_cattle-system_e9996f0b-74fd-49fb-ae42-14af370c378d_0
[rancher@k8s44 deploy_rancher]$ docker logs -f ea2a262d123d
INFO: Environment: CATTLE_ADDRESS=10.42.0.6 CATTLE_CA_CHECKSUM=6cbd59f484941df6580d08227d9c329d7dc043685d76f9c1612a0d1130965308 CATTLE_CLUSTER=true CATTLE_CLUSTER_AGENT_PORT=tcp://10.43.177.17:80 CATTLE_CLUSTER_AGENT_PORT_443_TCP=tcp://10.43.177.17:443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_ADDR=10.43.177.17 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PORT=443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_PORT_80_TCP=tcp://10.43.177.17:80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_ADDR=10.43.177.17 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PORT=80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_SERVICE_HOST=10.43.177.17 CATTLE_CLUSTER_AGENT_SERVICE_PORT=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTP=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTPS_INTERNAL=443 CATTLE_CLUSTER_REGISTRY=yyimgs.com/library CATTLE_FEATURES=embedded-cluster-api=false,fleet=false,monitoringv1=false,multi-cluster-management=false,multi-cluster-management-agent=true,provisioningv2=false,rke2=false CATTLE_INGRESS_IP_DOMAIN=sslip.io CATTLE_INSTALL_UUID=86d83395-c05f-4b24-8ea6-92fa894232f0 CATTLE_INTERNAL_ADDRESS= CATTLE_IS_RKE=true CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-7697bb7794-cj72h CATTLE_SERVER=https://yyk8s.com CATTLE_SERVER_VERSION=v2.6.7
INFO: Using resolv.conf: nameserver 10.43.0.10 search cattle-system.svc.cluster.local svc.cluster.local cluster.local options ndots:5
ERROR: https://yyk8s.com/ping is not accessible (Failed to connect to yyk8s.com port 443: Connection timed out)

INFO: Using resolv.conf: nameserver 10.43.0.10 search cattle-system.svc.cluster.local svc.cluster.local cluster.local options ndots:5
ERROR: https://yyk8s.com/ping is not accessible (Failed to connect to yyk8s.com port 443: Connection timed out)
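The agent error above is a connection timeout rather than a name resolution failure, which suggests that yyk8s.com resolved to some address but port 443 there was not reachable from the pod. A minimal sketch for repeating the same probe by hand follows. It relies on the Rancher /ping endpoint answering with the body `pong`; the function name `check_rancher_ping` and the 5-second timeout are illustrative choices.

```shell
#!/usr/bin/env bash
# Sketch: probe the Rancher /ping endpoint, similar to the check the agent performs.
# -k skips certificate verification (the certificate here is self-signed),
# so this tests reachability only, not trust in the CA.
check_rancher_ping() {
  local server="$1"   # e.g. https://yyk8s.com
  local body
  if body="$(curl -sk --connect-timeout 5 "${server}/ping")" && [ "$body" = "pong" ]; then
    echo "OK: ${server}/ping is reachable"
  else
    echo "FAIL: ${server}/ping is not reachable; check the name-to-IP mapping and port 443"
    return 1
  fi
}

# Usage (nothing runs automatically):
#   check_rancher_ping https://yyk8s.com
```

Running this on the downstream node only shows what the host sees. The agent pod resolves names through cluster DNS (nameserver 10.43.0.10 in the log), so an entry in the node's /etc/hosts does not automatically apply inside the pod.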

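The same installation procedure works on 2.5.16: cluster registration at least completes, and after patching cattle-cluster-agent with the IP-to-domain mapping everything works. The domain-to-IP mapping could also be set when creating the cluster, and that worked too. In 2.6.8 there is no longer a setting for mapping the domain to an IP, or perhaps I just haven't found the right technique.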


  1. On a control plane node, run the following command to generate the kubeconfig file for the downstream cluster:
docker run --rm --net=host -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro --entrypoint bash $(docker inspect $(docker images -q --filter=label=io.cattle.agent=true) --format='{{index .RepoTags 0}}' | tail -1) -c 'kubectl --kubeconfig /etc/kubernetes/ssl/kubecfg-kube-node.yaml get configmap -n kube-system full-cluster-state -o json | jq -r .data.\"full-cluster-state\" | jq -r .currentState.certificatesBundle.\"kube-admin\".config | sed -e "/^[[:space:]]*server:/ s_:.*_: \"https://127.0.0.1:6443\"_"' > kubeconfig_admin.yaml
  2. Then follow the link below to set up the domain name mapping.
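One common way to add such a mapping is a hostAliases entry on the agent's pod template, applied with the kubeconfig generated in step 1. The sketch below only builds the patch document. The helper name `build_host_alias_patch` is hypothetical, and 192.0.2.10 is a placeholder address, since the real Rancher server IP is not given in this thread.

```shell
#!/usr/bin/env bash
# Sketch: build a patch that adds a hostAliases entry (IP -> hostname) to a pod template.
# The IP passed in is an assumption; substitute the address your Rancher server is reachable at.
build_host_alias_patch() {
  local ip="$1" hostname="$2"
  printf '{"spec":{"template":{"spec":{"hostAliases":[{"ip":"%s","hostnames":["%s"]}]}}}}\n' "$ip" "$hostname"
}

# Usage (nothing runs automatically), with the kubeconfig from step 1:
#   kubectl --kubeconfig kubeconfig_admin.yaml -n cattle-system \
#     patch deployment cattle-cluster-agent \
#     --patch "$(build_host_alias_patch 192.0.2.10 yyk8s.com)"
```

The same patch shape should also fit the cattle-node-agent DaemonSet (`patch daemonset cattle-node-agent`) if the node agents need the mapping as well.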

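Because the cluster never registered successfully, I kept trying to figure out what to adjust. The reply from the Rancher expert cleared up my confusion. Thank you! ksd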