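Importing an EKS cluster into Rancher: Cluster agent is not connected

Rancher version: 2.7.6

Installation option (Docker install/Helm Chart): Docker install

If installed via Helm Chart, provide the Local cluster type (RKE1, RKE2, k3s, EKS, etc.) and version: k3s

Online or air-gapped deployment:

Online, deployed with docker-compose

Downstream cluster information

Kubernetes version: 1.30

Cluster Type (Local/Downstream):

If Downstream, what type of cluster? (Custom/Imported or Hosted, etc.): Custom

User information: admin

What is the role of the logged-in user? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): Admin

If Custom, the custom permission set: Default Admin

Host OS: Amazon Linux 2023

Problem description:

The local cluster is healthy, but the downstream cluster never registers with the Rancher server. The UI shows [Disconnected] Cluster agent is not connected

Server log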


rancher | 2025/02/26 03:22:19 [ERROR] error syncing 'c-m-2gdnwxzd/machine-6pblw': handler node-controller-sync: Operation cannot be fulfilled on nodes.management.cattle.io "machine-6pblw": StorageError: invalid object, Code: 4, Key: /registry/management.cattle.io/nodes/c-m-2gdnwxzd/machine-6pblw, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: cc3706d9-3940-46be-9f36-ece5dfba186d, UID in object meta: , requeuing

rancher | 2025/02/26 03:22:20 [ERROR] error syncing 'c-m-2gdnwxzd/machine-c75qs': handler node-controller-sync: Operation cannot be fulfilled on nodes.management.cattle.io "machine-c75qs": StorageError: invalid object, Code: 4, Key: /registry/management.cattle.io/nodes/c-m-2gdnwxzd/machine-c75qs, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: f184c0c2-4cc0-4b05-8e7f-2a2bf07b6fcb, UID in object meta: , requeuing

rancher | 2025/02/26 03:22:20 [ERROR] error syncing 'c-m-2gdnwxzd/machine-ckm8q': handler node-controller-sync: Operation cannot be fulfilled on nodes.management.cattle.io "machine-ckm8q": StorageError: invalid object, Code: 4, Key: /registry/management.cattle.io/nodes/c-m-2gdnwxzd/machine-ckm8q, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: e172a685-b2d4-4839-92ed-223750cd208d, UID in object meta: , requeuing

rancher | 2025/02/26 03:22:20 [ERROR] error syncing 'c-m-2gdnwxzd/machine-cphdf': handler node-controller-sync: Operation cannot be fulfilled on nodes.management.cattle.io "machine-cphdf": StorageError: invalid object, Code: 4, Key: /registry/management.cattle.io/nodes/c-m-2gdnwxzd/machine-cphdf, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 153f5d09-9dba-4d10-889e-1893dbd951e6, UID in object meta: , requeuing

rancher | 2025/02/26 03:22:20 [ERROR] error syncing 'c-m-2gdnwxzd/machine-gj2n5': handler node-controller-sync: Operation cannot be fulfilled on nodes.management.cattle.io "machine-gj2n5": StorageError: invalid object, Code: 4, Key: /registry/management.cattle.io/nodes/c-m-2gdnwxzd/machine-gj2n5, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: a5fd3747-6e3e-4c53-8a20-8fa156feb61c, UID in object meta: , requeuing

rancher | 2025/02/26 03:22:20 [ERROR] error syncing 'c-m-2gdnwxzd/machine-ksfz9': handler node-controller-sync: Operation cannot be fulfilled on nodes.management.cattle.io "machine-ksfz9": StorageError: invalid object, Code: 4, Key: /registry/management.cattle.io/nodes/c-m-2gdnwxzd/machine-ksfz9, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: aad54221-a917-4c88-809b-6f85d122264b, UID in object meta: , requeuing

rancher | 2025/02/26 03:22:20 [ERROR] error syncing 'c-m-2gdnwxzd/machine-qhxw5': handler node-controller-sync: Operation cannot be fulfilled on nodes.management.cattle.io "machine-qhxw5": StorageError: invalid object, Code: 4, Key: /registry/management.cattle.io/nodes/c-m-2gdnwxzd/machine-qhxw5, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 0ebf5821-6584-4a00-a9ee-5701cf472810, UID in object meta: , requeuing

rancher | 2025/02/26 03:22:20 [ERROR] error syncing 'c-m-2gdnwxzd/machine-xrgqx': handler node-controller-sync: Operation cannot be fulfilled on nodes.management.cattle.io "machine-xrgqx": StorageError: invalid object, Code: 4, Key: /registry/management.cattle.io/nodes/c-m-2gdnwxzd/machine-xrgqx, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: b1a387a8-a8d4-4805-bd3e-cb87e04be32d, UID in object meta: , requeuing

rancher | 2025/02/26 03:22:22 [ERROR] error syncing 'p-qggff/creator-project-owner': handler mgmt-auth-prtb-controller: projects.management.cattle.io "c-m-2gdnwxzd/p-qggff" not found, requeuing

rancher | 2025/02/26 03:22:22 [ERROR] error syncing 'p-hrtg9/creator-project-owner': handler mgmt-auth-prtb-controller: projects.management.cattle.io "c-m-2gdnwxzd/p-hrtg9" not found, requeuing

rancher | 2025/02/26 03:22:25 [INFO] [mgmt-auth-prtb-controller] Updating owner label for roleBinding crb-6t3x5zmwjp

rancher | 2025/02/26 03:22:25 [INFO] [mgmt-auth-prtb-controller] Deleting roleBinding crb-6t3x5zmwjp

Agent log


INFO: Environment: CATTLE_ADDRESS=192.168.7.174 CATTLE_CA_CHECKSUM= CATTLE_CLUSTER=true CATTLE_CLUSTER_AGENT_PORT=tcp://172.20.44.34:80 CATTLE_CLUSTER_AGENT_PORT_443_TCP=tcp://172.20.44.34:443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_ADDR=172.20.44.34 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PORT=443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_PORT_80_TCP=tcp://172.20.44.34:80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_ADDR=172.20.44.34 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PORT=80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_SERVICE_HOST=172.20.44.34 CATTLE_CLUSTER_AGENT_SERVICE_PORT=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTP=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTPS_INTERNAL=443 CATTLE_CLUSTER_REGISTRY= CATTLE_INGRESS_IP_DOMAIN=sslip.io CATTLE_INSTALL_UUID=e38f6482-f19a-4384-82af-6fb2e0c301ac CATTLE_INTERNAL_ADDRESS= CATTLE_IS_RKE=false CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-6c76b4df48-9q8xb CATTLE_RANCHER_WEBHOOK_MIN_VERSION= CATTLE_RANCHER_WEBHOOK_VERSION=2.0.5+up0.3.5 CATTLE_SERVER=https://rancher.plt.corp.certik.com CATTLE_SERVER_VERSION=v2.7.6

INFO: Using resolv.conf: search cattle-system.svc.cluster.local svc.cluster.local cluster.local ec2.internal nameserver 172.20.0.10 options ndots:5

INFO: https://rancher.plt.corp.certik.com/ping is accessible

INFO: rancher.plt.corp.certik.com resolves to 18.214.199.108 34.237.147.16

time="2025-02-26T03:20:43Z" level=info msg="Listening on /tmp/log.sock"

time="2025-02-26T03:20:43Z" level=info msg="Rancher agent version v2.7.6 is starting"

time="2025-02-26T03:20:43Z" level=info msg="EnsureSecretForServiceAccount: waiting for secret [cattle-token-fbrnl] to be populated with token"

time="2025-02-26T03:20:43Z" level=info msg="Connecting to wss://rancher.plt.corp.certik.com/v3/connect/register with token starting with klfm96tzlcr2b9z8rjw6jvtnjdf"

time="2025-02-26T03:20:43Z" level=info msg="Connecting to proxy" url="wss://rancher.plt.corp.certik.com/v3/connect/register"

I checked the network, and the agent can reach the server. The log also shows that it resolves the server's IP and can send API requests.
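One thing the check above does not cover: the `/ping` line in the agent log is a plain HTTPS request, while the tunnel itself is a WebSocket to `wss://.../v3/connect/register`. If a load balancer or proxy sits in front of the Docker install (an assumption, the post does not say), it could pass `/ping` and still drop the upgrade. Below is a rough Python sketch of a handshake probe using only the standard library. The helper names are made up for illustration, and the probe sends no agent credentials, so Rancher is not expected to accept the upgrade; the point is only to see whether the request gets any HTTP answer back or hangs.

```python
import base64
import os
import socket
import ssl


def build_upgrade_request(host: str, path: str) -> bytes:
    """Build a minimal HTTP/1.1 WebSocket upgrade request (RFC 6455 handshake)."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def status_code(response_head: bytes) -> int:
    """Return the status code from an HTTP status line, or -1 if unparseable."""
    first_line = response_head.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = first_line.split()
    return int(parts[1]) if len(parts) >= 2 and parts[1].isdigit() else -1


def probe(host: str, path: str = "/v3/connect/register", timeout: float = 10.0) -> int:
    """Send the upgrade request over TLS (port 443) and return the status code.

    Raises socket.timeout / OSError if nothing answers within `timeout`.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, 443), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(build_upgrade_request(host, path))
            return status_code(tls.recv(4096))
```

Running `probe("rancher.plt.corp.certik.com")` from a pod in the downstream cluster would be the closest match to what the agent does. A timeout or a proxy-generated error page would point at the path in front of Rancher rather than at Rancher itself.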

My guess is that Rancher 2.7.6 does not support K8s 1.30. See the support matrix: https://www.suse.com/suse-rancher/support-matrix/all-supported-versions/rancher-v2-7-6/

I originally used 2.10. I suspected a version problem, so I downgraded, but the problem is still the same.

I'll switch back to 2.10 and reproduce it for you.

Agent log

kubectl -n cattle-system logs -f cattle-cluster-agent-7d9c899f4-wfcp2
INFO: Environment: CATTLE_ADDRESS=192.168.7.9 CATTLE_CA_CHECKSUM= CATTLE_CLUSTER=true CATTLE_CLUSTER_AGENT_PORT=tcp://172.20.44.34:80 CATTLE_CLUSTER_AGENT_PORT_443_TCP=tcp://172.20.44.34:443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_ADDR=172.20.44.34 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PORT=443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_PORT_80_TCP=tcp://172.20.44.34:80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_ADDR=172.20.44.34 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PORT=80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_SERVICE_HOST=172.20.44.34 CATTLE_CLUSTER_AGENT_SERVICE_PORT=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTP=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTPS_INTERNAL=443 CATTLE_CLUSTER_REGISTRY= CATTLE_INGRESS_IP_DOMAIN=sslip.io CATTLE_INSTALL_UUID=3efbe6f6-4900-4fe7-ae47-349c6b1f6b22 CATTLE_INTERNAL_ADDRESS= CATTLE_IS_RKE=false CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-7d9c899f4-wfcp2 CATTLE_RANCHER_PROVISIONING_CAPI_VERSION=105.1.0+up0.6.0 CATTLE_RANCHER_WEBHOOK_VERSION=105.0.2+up0.6.3 CATTLE_SERVER=https://rancher.plt.corp.certik.com CATTLE_SERVER_VERSION=v2.10.2
INFO: Using resolv.conf: search cattle-system.svc.cluster.local svc.cluster.local cluster.local ec2.internal nameserver 172.20.0.10 options ndots:5
INFO: https://rancher.plt.corp.certik.com/ping is accessible
INFO: rancher.plt.corp.certik.com resolves to 34.237.147.16 18.214.199.108
time="2025-02-26T05:14:52Z" level=info msg="Listening on /tmp/log.sock"
time="2025-02-26T05:14:52Z" level=info msg="Rancher agent version v2.10.2 is starting"
time="2025-02-26T05:14:52Z" level=error msg="unable to read CA file from /etc/kubernetes/ssl/certs/serverca: open /etc/kubernetes/ssl/certs/serverca: no such file or directory"
time="2025-02-26T05:14:52Z" level=info msg="Connecting to wss://rancher.plt.corp.certik.com/v3/connect/register with token starting with 6vdfqxgm8kks9hk7d9vsfp5fvcz"
time="2025-02-26T05:14:52Z" level=info msg="Connecting to proxy" url="wss://rancher.plt.corp.certik.com/v3/connect/register"
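The `cattle-cluster-agent` output above packs every setting into one long `INFO: Environment:` line, which makes it easy to miss that `CATTLE_CA_CHECKSUM` is empty. A small Python sketch to pull out the fields worth comparing between the 2.7.6 and 2.10.2 runs (the function name is made up, and the sample line is shortened from the log above):

```python
def parse_agent_environment(line: str) -> dict[str, str]:
    """Split the agent's 'INFO: Environment:' line into a name -> value dict."""
    prefix = "INFO: Environment:"
    if not line.startswith(prefix):
        raise ValueError("not an agent environment line")
    env = {}
    for token in line[len(prefix):].split():
        # Each token is NAME=value; an empty value (NAME=) is kept as "".
        name, _, value = token.partition("=")
        env[name] = value
    return env


# Shortened copy of the environment line from the 2.10.2 agent log.
SAMPLE = (
    "INFO: Environment: CATTLE_CA_CHECKSUM= CATTLE_CLUSTER=true "
    "CATTLE_SERVER=https://rancher.plt.corp.certik.com "
    "CATTLE_SERVER_VERSION=v2.10.2"
)
env = parse_agent_environment(SAMPLE)
for name in ("CATTLE_SERVER", "CATTLE_SERVER_VERSION", "CATTLE_CA_CHECKSUM"):
    print(f"{name}={env.get(name, '<missing>')!r}")
```

An empty `CATTLE_CA_CHECKSUM` may simply mean no private CA was handed to the agent, which could also explain the `unable to read CA file` line. That is my reading of the log, not something the output confirms.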

Server log

rancher  | 2025/02/26 05:14:52 [INFO] Starting cluster controllers for c-m-2mh9xz9c
rancher  | 2025/02/26 05:14:54 [INFO] Starting /v1, Kind=ServiceAccount controller
rancher  | 2025/02/26 05:14:54 [INFO] Starting /v1, Kind=Secret controller
rancher  | 2025/02/26 05:14:54 [INFO] Starting apiregistration.k8s.io/v1, Kind=APIService controller
rancher  | 2025/02/26 05:14:54 [INFO] Starting rbac.authorization.k8s.io/v1, Kind=Role controller
rancher  | 2025/02/26 05:14:54 [INFO] Starting /v1, Kind=Namespace controller
rancher  | 2025/02/26 05:14:54 [INFO] Starting rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding controller
rancher  | 2025/02/26 05:14:54 [INFO] Starting /v1, Kind=ResourceQuota controller
rancher  | 2025/02/26 05:14:54 [INFO] Starting rbac.authorization.k8s.io/v1, Kind=RoleBinding controller
rancher  | 2025/02/26 05:14:54 [INFO] Starting /v1, Kind=LimitRange controller
rancher  | 2025/02/26 05:14:54 [INFO] Starting rbac.authorization.k8s.io/v1, Kind=ClusterRole controller
rancher  | 2025/02/26 05:14:54 [INFO] Updating clusterRole cluster-owner because of rules difference with roleTemplate Cluster Owner (cluster-owner).
rancher  | 2025/02/26 05:14:54 [INFO] Updating clusterRole project-owner because of rules difference with roleTemplate Project Owner (project-owner).
rancher  | 2025/02/26 05:14:54 [INFO] Starting cluster agent for c-m-2mh9xz9c [owner=true]
rancher  | 2025/02/26 05:14:54 [INFO] Starting /v1, Kind=Node controller
rancher  | 2025/02/26 05:14:54 [INFO] Updating clusterRole cluster-owner because of rules difference with roleTemplate Cluster Owner (cluster-owner).
rancher  | 2025/02/26 05:14:54 [INFO] Updating clusterRole project-owner because of rules difference with roleTemplate Project Owner (project-owner).
rancher  | 2025/02/26 05:14:54 [INFO] Updating clusterRole project-owner because of rules difference with roleTemplate Project Owner (project-owner).
rancher  | 2025/02/26 05:14:54 [INFO] Creating clusterRoleBinding for project access to global resource for subject user-2bqgz role create-ns.
rancher  | 2025/02/26 05:14:54 [INFO] Creating clusterRoleBinding User user-2bqgz Role cluster-owner
rancher  | 2025/02/26 05:14:54 [INFO] Creating clusterRoleBinding for project access to global resource for subject user-2bqgz role p-799s6-namespaces-edit.
rancher  | 2025/02/26 05:14:54 [INFO] Creating clusterRoleBinding User user-2bqgz Role cluster-owner
rancher  | 2025/02/26 05:14:54 [INFO] Creating clusterRoleBinding for project access to global resource for subject user-2bqgz role project-owner-promoted.
rancher  | 2025/02/26 05:14:54 [INFO] Updating clusterRoleBinding crb-2o6y2uapqa for project access to global resource for subject user-2bqgz role create-ns.
rancher  | 2025/02/26 05:14:54 [INFO] Creating clusterRoleBinding for project access to global resource for subject user-2bqgz role p-sbjlb-namespaces-edit.
rancher  | 2025/02/26 05:14:54 [INFO] Updating clusterRoleBinding crb-umaungjbix for project access to global resource for subject user-2bqgz role project-owner-promoted.
rancher  | 2025/02/26 05:14:54 [INFO] Updating clusterRoleBinding crb-umaungjbix for project access to global resource for subject user-2bqgz role project-owner-promoted.
rancher  | 2025/02/26 05:14:54 [INFO] EnsureSecretForServiceAccount: waiting for secret [cattle-impersonation-system:cattle-impersonation-user-2bqgz-token-2qm7n] for service account [cattle-impersonation-system:cattle-impersonation-user-2bqgz] to be populated with token
rancher  | 2025/02/26 05:14:54 [INFO] Rolling back ServiceAccount secret for [cattle-impersonation-system:cattle-impersonation-user-2bqgz-token-bhhzb]
rancher  | 2025/02/26 05:14:54 [INFO] Rolling back ServiceAccount secret for [cattle-impersonation-system:cattle-impersonation-user-2bqgz-token-qc4cw]
rancher  | 2025/02/26 05:14:54 [INFO] EnsureSecretForServiceAccount: got the service account token for service account [cattle-impersonation-system:cattle-impersonation-user-2bqgz] in 25.800538ms
rancher  | 2025/02/26 05:14:55 [INFO] Created machine for node [ip-10-0-1-191.ec2.internal]
rancher  | 2025/02/26 05:14:56 [INFO] Created machine for node [ip-10-0-2-104.ec2.internal]
rancher  | 2025/02/26 05:14:56 [INFO] Creating user for principal system://c-m-2mh9xz9c
rancher  | 2025/02/26 05:14:56 [INFO] Created machine for node [ip-10-0-2-209.ec2.internal]
rancher  | 2025/02/26 05:14:56 [INFO] Created machine for node [ip-10-0-2-217.ec2.internal]
rancher  | 2025/02/26 05:14:56 [INFO] Creating globalRoleBindings for u-nyayow5sfx
rancher  | 2025/02/26 05:14:56 [INFO] Created machine for node [ip-10-0-3-168.ec2.internal]
rancher  | 2025/02/26 05:14:56 [INFO] Created machine for node [ip-10-0-3-196.ec2.internal]
rancher  | 2025/02/26 05:14:56 [INFO] Created machine for node [ip-10-0-3-246.ec2.internal]
rancher  | 2025/02/26 05:14:56 [INFO] Created machine for node [ip-10-0-3-85.ec2.internal]
rancher  | 2025/02/26 05:14:56 [INFO] Creating new GlobalRoleBinding for GlobalRoleBinding grb-qg7xr
rancher  | 2025/02/26 05:14:56 [INFO] [mgmt-auth-grb-controller] Creating clusterRoleBinding for globalRoleBinding grb-qg7xr for user u-nyayow5sfx with role cattle-globalrole-user
rancher  | 2025/02/26 05:14:56 [INFO] Creating system token for u-nyayow5sfx, token: agent-u-nyayow5sfx
rancher  | 2025/02/26 05:14:56 [INFO] [mgmt-auth-crtb-controller] Creating clusterRoleBinding for membership in cluster c-m-2mh9xz9c for subject u-nyayow5sfx
rancher  | 2025/02/26 05:14:56 [INFO] [mgmt-auth-crtb-controller] Creating roleBinding for subject u-nyayow5sfx with role cluster-owner in namespace c-m-2mh9xz9c
rancher  | 2025/02/26 05:14:56 [INFO] [mgmt-auth-crtb-controller] Creating roleBinding for subject u-nyayow5sfx with role cluster-owner in namespace p-799s6
rancher  | 2025/02/26 05:14:56 [INFO] [mgmt-auth-crtb-controller] Creating roleBinding for subject u-nyayow5sfx with role cluster-owner in namespace p-sbjlb
rancher  | 2025/02/26 05:14:56 [INFO] Creating clusterRoleBinding User u-nyayow5sfx Role cluster-owner
rancher  | 2025/02/26 05:14:56 [INFO] EnsureSecretForServiceAccount: waiting for secret [cattle-impersonation-system:cattle-impersonation-u-nyayow5sfx-token-w9zz5] for service account [cattle-impersonation-system:cattle-impersonation-u-nyayow5sfx] to be populated with token
rancher  | 2025/02/26 05:14:56 [INFO] EnsureSecretForServiceAccount: got the service account token for service account [cattle-impersonation-system:cattle-impersonation-u-nyayow5sfx] in 19.018242ms
rancher  | 2025/02/26 05:19:05 [ERROR] Error during subscribe websocket: close sent
rancher  | 2025/02/26 05:19:06 [ERROR] 2025/02/26 05:19:06 http: superfluous response.WriteHeader call from github.com/rancher/rancher/pkg/version.(*versionHandler).ServeHTTP (version.go:53)
rancher  | 2025/02/26 05:19:21 [INFO] certificate CN=dynamic,O=dynamic signed by CN=dynamiclistener-ca@1740546471,O=dynamiclistener-org: notBefore=2025-02-26 05:07:51 +0000 UTC notAfter=2026-02-26 05:19:21 +0000 UTC
rancher  | 2025/02/26 05:19:21 [INFO] Updating TLS secret for cattle-system/tls-rancher-internal (count: 3): map[field.cattle.io/projectId:local:p-vzqzw listener.cattle.io/cn-10.43.80.112:10.43.80.112 listener.cattle.io/cn-172.23.0.2:172.23.0.2 listener.cattle.io/fingerprint:SHA1=1F23CE7409A10292A2221A5814EA53D52CCE8ED9]
rancher  | 2025/02/26 05:19:21 [INFO] Active TLS secret cattle-system/tls-rancher-internal (ver=6712) (count 3): map[field.cattle.io/projectId:local:p-vzqzw listener.cattle.io/cn-10.43.80.112:10.43.80.112 listener.cattle.io/cn-172.23.0.2:172.23.0.2 listener.cattle.io/fingerprint:SHA1=1F23CE7409A10292A2221A5814EA53D52CCE8ED9]
rancher  | 2025/02/26 05:19:21 [INFO] Updating TLS secret for cattle-system/tls-rancher-internal (count: 3): map[field.cattle.io/projectId:local:p-vzqzw listener.cattle.io/cn-10.43.80.112:10.43.80.112 listener.cattle.io/cn-172.23.0.2:172.23.0.2 listener.cattle.io/fingerprint:SHA1=1F23CE7409A10292A2221A5814EA53D52CCE8ED9]
rancher  | 2025/02/26 05:19:21 [ERROR] 2025/02/26 05:19:21 http: TLS handshake error from 10.42.0.5:55624: remote error: tls: bad certificate
rancher  | 2025/02/26 05:19:21 [ERROR] 2025/02/26 05:19:21 http: TLS handshake error from 10.42.0.5:55648: remote error: tls: bad certificate
rancher  | 2025/02/26 05:19:23 [ERROR] defaultSvcAccountHandler: Sync: error handling default ServiceAccount of namespace key=cattle-fleet-local-system, err=Operation cannot be fulfilled on namespaces "cattle-fleet-local-system": the object has been modified; please apply your changes to the latest version and try again
rancher  | 2025/02/26 05:19:23 [ERROR] defaultSvcAccountHandler: Sync: error handling default ServiceAccount of namespace key=cattle-fleet-local-system, err=Operation cannot be fulfilled on namespaces "cattle-fleet-local-system": the object has been modified; please apply your changes to the latest version and try again
rancher  | 2025/02/26 05:19:23 [INFO] namespaceHandler: addProjectIDLabelToNamespace: adding label field.cattle.io/projectId=p-vzqzw to namespace=cattle-fleet-local-system
rancher  | 2025/02/26 05:19:23 [ERROR] namespaceHandler: Sync: error adding project id label to namespace err=Operation cannot be fulfilled on namespaces "cattle-fleet-local-system": the object has been modified; please apply your changes to the latest version and try again
rancher  | 2025/02/26 05:19:23 [INFO] namespaceHandler: addProjectIDLabelToNamespace: adding label field.cattle.io/projectId=p-vzqzw to namespace=cattle-fleet-local-system


The machines show up, but the cluster still shows Cluster agent is not connected
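Since the nodes are listed but the cluster stays disconnected, it may help to look at the conditions Rancher records on the cluster object instead of only the UI banner. The sketch below assumes the object was dumped to JSON on the Rancher side with something like `kubectl get clusters.management.cattle.io c-m-2mh9xz9c -o json`, and that its conditions sit under `status.conditions`. The sample data is hypothetical and not taken from this cluster.

```python
import json


def failing_conditions(cluster_json: str) -> list[tuple[str, str, str]]:
    """Return (type, status, message) for every condition whose status is not "True"."""
    cluster = json.loads(cluster_json)
    conditions = cluster.get("status", {}).get("conditions", [])
    return [
        (c.get("type", ""), c.get("status", ""), c.get("message", ""))
        for c in conditions
        if c.get("status") != "True"
    ]


# Hypothetical input shaped like the kubectl JSON output; not real data.
SAMPLE = json.dumps({
    "status": {
        "conditions": [
            {"type": "Ready", "status": "True"},
            {"type": "Connected", "status": "False",
             "message": "Cluster agent is not connected"},
        ]
    }
})
for ctype, status, message in failing_conditions(SAMPLE):
    print(f"{ctype}={status}: {message}")
```

The timestamps on whichever conditions come back as not "True" can then be matched against the server and agent logs above.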


I tried another open-source tool, and it could import the cluster.

Which tool is that? Is it any good?

The specific cause is as described above: