Rancher 2.6 reports errors at runtime, looking for help

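Rancher Server setup

  • Rancher version: rancher2.6.3
  • Installation option (Docker install/Helm Chart):
  • Online or air-gapped deployment: online Docker deployment

Downstream cluster information

  • Kubernetes version: v1.21.14-rancher1-1
  • Cluster Type (Local/Downstream):
    • If Downstream, what type of cluster? (Custom/Imported or Hosted, etc.):

User information

  • What is the role of the logged-in user? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): Admin
    • If Custom, the custom permission set:

**Host operating system:** CentOS Linux release 7.9.2009 (Core)

**Problem description:** Rancher reports errors, and a custom cluster cannot be created from the UI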

Steps to reproduce:
Docker version: 23.0.1
Command run: docker run -d --privileged --restart=unless-stopped -p 80:80 -p 443:443 rancher/rancher:latest

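Result:
The UI is reachable, but creating a cluster from the UI never succeeds. The UI keeps showing "Cluster must have at least one etcd plane host: please specify one or more etcd in cluster config", and the Rancher server log keeps printing errors. My guess is that a fault in Rancher itself is preventing the downstream cluster from being created (older Rancher versions do not show this problem).

Expected result:
A downstream cluster can be created from the UI and services can be deployed to it.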

Screenshots:

Additional context:

Logs:
2023/02/16 01:27:57 [INFO] [certificates] Generating kube-etcd-192-168-52-100 certificate and key
2023/02/16 01:27:57 [INFO] cluster [c-chccb] provisioning: Successfully Deployed state file at [management-state/rke/rke-737886933/cluster.rkestate]
2023/02/16 01:27:57 [INFO] cluster [c-chccb] provisioning: Building Kubernetes cluster
2023/02/16 01:27:57 [INFO] cluster [c-chccb] provisioning: [dialer] Setup tunnel for host [192.168.52.100]
2023/02/16 01:27:57 [INFO] cluster [c-chccb] provisioning: [network] Deploying port listener containers
2023/02/16 01:27:57 [INFO] Pulling image [rancher/rke-tools:v0.1.87] on host [192.168.52.100], try #1
2023/02/16 01:28:12 [ERROR] Failed to install system chart fleet: pod cattle-system/helm-operation-59wxn failed, watch closed
2023/02/16 01:28:20 [ERROR] error syncing 'validating-webhook-configuration': handler need-a-cert: services "webhook-service" not found, requeuing
2023/02/16 01:28:20 [ERROR] error syncing 'mutating-webhook-configuration': handler need-a-cert: services "webhook-service" not found, requeuing
2023/02/16 01:28:50 [ERROR] error syncing 'validating-webhook-configuration': handler need-a-cert: services "webhook-service" not found, requeuing
2023/02/16 01:28:50 [ERROR] error syncing 'mutating-webhook-configuration': handler need-a-cert: services "webhook-service" not found, requeuing
2023/02/16 01:29:12 [ERROR] Failed to install system chart fleet-crd: pod cattle-system/helm-operation-w4527 failed, watch closed
2023/02/16 01:29:20 [ERROR] error syncing 'validating-webhook-configuration': handler need-a-cert: services "webhook-service" not found, requeuing
2023/02/16 01:29:20 [ERROR] error syncing 'mutating-webhook-configuration': handler need-a-cert: services "webhook-service" not found, requeuing
2023/02/16 01:29:50 [ERROR] error syncing 'validating-webhook-configuration': handler need-a-cert: services "webhook-service" not found, requeuing
2023/02/16 01:29:50 [ERROR] error syncing 'mutating-webhook-configuration': handler need-a-cert: services "webhook-service" not found, requeuing
2023/02/16 01:30:12 [ERROR] Failed to install system chart fleet: pod cattle-system/helm-operation-f8692 failed, watch closed
2023/02/16 01:30:20 [ERROR] error syncing 'validating-webhook-configuration': handler need-a-cert: services "webhook-service" not found, requeuing
2023/02/16 01:30:20 [ERROR] error syncing 'mutating-webhook-configuration': handler need-a-cert: services "webhook-service" not found, requeuing
2023/02/16 01:30:50 [ERROR] error syncing 'validating-webhook-configuration': handler need-a-cert: services "webhook-service" not found, requeuing
2023/02/16 01:30:50 [ERROR] error syncing 'mutating-webhook-configuration': handler need-a-cert: services "webhook-service" not found, requeuing
2023/02/16 01:31:12 [ERROR] Failed to install system chart fleet-crd: pod cattle-system/helm-operation-m6ffp failed, watch closed





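  1. Which roles did you select for the node when you created the cluster?
  2. Has a cluster been installed on this node before? If so, you can use the uninstall script to wipe the node's data and then add the node again. See: How do I clean up a node? | Rancher documentation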

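I selected all three roles when creating the cluster (etcd, worker, controlplane), because this is the master node.
Nothing was installed on it before. I also tried a fresh machine and got the same error.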

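Docker 23.x was released only recently and we have not fully tested it yet, so we do not recommend using it for now.
You can follow progress here: Validate Docker 23.0.x · Issue #40417 · rancher/rancher · GitHub

Also, I recommend against using rancher/rancher:latest. Pin a specific version tag instead, for example rancher/rancher:v2.6.10.
Many registry mirror services in China cache images and never refresh them, so the latest tag can stay stuck on a very old version.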
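
To see which release a mirror actually served for the latest tag, one option is to read the version from the pulled image's environment. This is a minimal sketch: it assumes the rancher/rancher image sets a CATTLE_SERVER_VERSION environment variable, and the function names are made up for illustration.

```shell
#!/bin/sh
# Extract the Rancher version from KEY=VALUE environment lines on stdin.
# Assumption: the image exposes its version as CATTLE_SERVER_VERSION.
rancher_version_from_env() {
    sed -n 's/^CATTLE_SERVER_VERSION=//p'
}

# Print the version baked into a locally pulled image (requires Docker).
image_rancher_version() {
    docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$1" \
        | rancher_version_from_env
}

# Pinned-tag variant of the original command (tag taken from the advice above):
#   docker run -d --privileged --restart=unless-stopped -p 80:80 -p 443:443 rancher/rancher:v2.6.10
```

For example, `image_rancher_version rancher/rancher:latest` should print the release the mirror handed out. If it is far behind the current release, the mirror's cache is stale.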

I just tried Docker version 19.03.9 and got the same errors:
Failed to install system chart rancher-webhook: pod cattle-system/helm-operation-w9fxk failed, watch closed
2023/02/16 05:59:27 [ERROR] error syncing 'mutating-webhook-configuration': handler need-a-cert: services "webhook-service" not found, requeuing
2023/02/16 05:59:27 [ERROR] error syncing 'validating-webhook-configuration': handler need-a-cert: services "webhook-service" not found, requeuing
2023/02/16 05:59:52 [ERROR] Failed to install system chart fleet: pod cattle-system/helm-operation-nnqjm failed, watch closed
2023/02/16 06:00:52 [ERROR] Failed to install system chart fleet-crd: pod cattle-system/helm-operation-z2llx failed, watch closed
2023/02/16 06:01:27 [ERROR] error syncing 'mutating-webhook-configuration': handler need-a-cert: services "webhook-service" not found, requeuing
2023/02/16 06:01:27 [ERROR] error syncing 'validating-webhook-configuration': handler need-a-cert: services "webhook-service" not found, requeuing
2023/02/16 06:01:52 [ERROR] Failed to install system chart rancher-webhook: pod cattle-system/helm-operation-9nnb6 failed, watch closed

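After investigating, we found:

  1. SELinux was not disabled on CentOS 7. We recommend disabling it.
  2. docker info prints some warning messages. We recommend looking them up and resolving them.
  3. A bad mirror was configured, so pulling rancher/rancher:latest fetched the 2.6.3 image.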
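
The three findings above can be checked on a host roughly as follows. This is only a sketch: the function names are hypothetical, getenforce is present only where the SELinux tools are installed, and the warning filter assumes docker info prefixes such lines with "WARNING".

```shell
#!/bin/sh
# Keep only the warning lines from `docker info` output supplied on stdin.
docker_info_warnings() {
    grep '^WARNING' || true
}

# Run the three checks on a host (requires Docker).
check_host() {
    # 1. SELinux mode: Enforcing, Permissive, or Disabled.
    if command -v getenforce >/dev/null 2>&1; then
        echo "SELinux: $(getenforce)"
    fi
    # 2. Warnings reported by the Docker daemon (they may go to stderr).
    docker info 2>&1 | docker_info_warnings
    # 3. Registry mirrors the daemon is configured to use.
    docker info --format '{{json .RegistryConfig.Mirrors}}'
}
```

Run `check_host` on the Rancher host and on each node. An unexpected mirror URL in the last line is the first thing to compare against a known-good registry.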

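Summary:
The error logs above do not actually stop you from creating a cluster. The root cause was the configured mirror: it was either wrong or very slow at serving images, so the images needed to create the RKE cluster could never be pulled. After switching to a working mirror, the cluster was created successfully.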

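Other:
When creating a downstream cluster, you can view the Rancher logs from within Rancher to follow the installation progress of the downstream cluster.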
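
For a Docker install, the same progress lines can also be followed from the host. A minimal sketch, assuming the log format shown earlier in this thread ("cluster [ID] provisioning: ..."); the function names are hypothetical and the container name is whatever docker ps reports on your host.

```shell
#!/bin/sh
# Keep only the provisioning lines for one cluster ID from Rancher log text on stdin.
provisioning_lines() {
    grep -F "cluster [$1] provisioning:" || true
}

# Follow a running Rancher container's log and show progress for one cluster
# (requires Docker). $1 = container name or ID, $2 = cluster ID such as c-chccb.
follow_provisioning() {
    docker logs -f "$1" 2>&1 | provisioning_lines "$2"
}
```

Usage would look like `follow_provisioning <container> c-chccb`. If the output stalls at an image pull line for a long time, that points at the registry or mirror rather than at the webhook or fleet errors.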