Rancher cluster management reports "Error while applying agent YAML, it will be retried automatically: exit status 1"

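Rancher Server setup

  • Rancher version: 2.6.9
  • Installation option (Docker install):
    • If installed via Helm Chart, provide the Local cluster type (RKE1, RKE2, k3s, EKS, etc.) and version:
  • Online or air-gapped deployment:
    Air-gapped (offline) deployment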

Downstream cluster information

  • Kubernetes version: 1.23.6
  • Cluster Type (Local/Downstream):
    • If Downstream, what type of cluster? (Custom/Imported or Hosted, etc.):
      Custom

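User information

  • What is the role of the logged-in user? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): admin
    • If Custom, the custom permission set: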

Host OS: CentOS 7.9

Problem description: After the custom cluster installation completes, this error appears.

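Steps to reproduce: The error appears after every installation, across several attempts. I also changed the max-pods value in the cluster settings, because these are physical machines and I need to run more pods.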

Result:

Expected result: no error message is shown.

Screenshots:

Other context:

Logs
Rancher was started with docker run; the Rancher logs are below:

2024/05/30 11:27:39 [INFO] Handling backend connection request [stv-cluster-c-bq26s]
2024/05/30 11:27:42 [INFO] Handling backend connection request [c-bq26s]
2024/05/30 11:27:42 [INFO] error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF
2024/05/30 11:27:42 [INFO] error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF
W0530 11:27:42.934085      71 reflector.go:443] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: tunnel disconnect") has prevented the request from succeeding
W0530 11:27:42.934146      71 reflector.go:443] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: watch of *v1.Namespace ended with: an error on the server ("unable to decode an event from the watch stream: tunnel disconnect") has prevented the request from succeeding
W0530 11:27:42.939059      71 reflector.go:325] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: failed to list *v1.APIService: Get "https://10.10.1.113:6443/apis/apiregistration.k8s.io/v1/apiservices?resourceVersion=1471950": tunnel disconnect
E0530 11:27:42.939111      71 reflector.go:139] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: Failed to watch *v1.APIService: failed to list *v1.APIService: Get "https://10.10.1.113:6443/apis/apiregistration.k8s.io/v1/apiservices?resourceVersion=1471950": tunnel disconnect
2024/05/30 11:27:43 [ERROR] Failed to handle tunnel request from remote address 10.10.1.126:63593: response 401: failed authentication
2024/05/30 11:27:43 [INFO] Handling backend connection request [stv-cluster-c-bq26s]
2024/05/30 11:28:03 [ERROR] error syncing 'c-bq26s': handler cluster-deploy: Error while applying agent YAML, it will be retried automatically: exit status 1, clusterrole.rbac.authorization.k8s.io/proxy-clusterrole-kubeapiserver unchanged clusterrolebinding.rbac.authorization.k8s.io/proxy-role-binding-kubernetes-master unchanged namespace/cattle-system unchanged serviceaccount/cattle unchanged clusterrolebinding.rbac.authorization.k8s.io/cattle-admin-binding unchanged secret/cattle-credentials-ade1a45 unchanged clusterrole.rbac.authorization.k8s.io/cattle-admin unchanged deployment.apps/cattle-cluster-agent unchanged daemonset.apps/cattle-node-agent unchanged daemonset.apps/kube-api-auth unchanged service/cattle-cluster-agent unchanged Error from server (BadRequest): error when creating "./management-state/tmp/yaml-787991680": Secret in version "v1" cannot be handled as a Secret: illegal base64 data at input byte 103 , requeuing

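I have not seen this problem before. The likely cause is that a Secret you modified contains characters that are not valid standard base64.

Normally, if you only changed the max-pods value, this problem is very unlikely to occur.

You could edit the cluster again and set max-pods back to its default to see whether the error goes away, or post everything you have changed so far.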
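
To narrow down which value is at fault, a small check like the one below may help. It is only a sketch: the function names `first_invalid_base64_offset` and `find_bad_secret_values` and the sample data are made up for illustration, and it assumes you have already collected the `data` entries of the suspect Secret into a plain dictionary. The check is stricter than some decoders (for example, it treats line breaks as invalid), so the offset it reports may not match the "input byte 103" in the log exactly.

```python
import base64
import string

# Standard base64 alphabet (RFC 4648), without the '=' padding character.
_B64_ALPHABET = set(string.ascii_letters + string.digits + "+/")


def first_invalid_base64_offset(value: str):
    """Return the offset of the first character that breaks standard base64, or None."""
    stripped = value.rstrip("=")
    padding = len(value) - len(stripped)
    for offset, char in enumerate(stripped):
        if char not in _B64_ALPHABET:
            return offset
    # More than two '=' characters, or a total length that is not a
    # multiple of 4, is also rejected; report where the padding starts.
    if padding > 2 or len(value) % 4 != 0:
        return len(stripped)
    return None


def find_bad_secret_values(data: dict) -> dict:
    """Map each key whose value is not valid base64 to the offending offset."""
    bad = {}
    for key, value in data.items():
        offset = first_invalid_base64_offset(value)
        if offset is not None:
            bad[key] = offset
    return bad


if __name__ == "__main__":
    # Hypothetical Secret data: one properly encoded value and one
    # plain-text value that was placed under `data` by mistake.
    sample = {
        "token": base64.b64encode(b"example-token").decode("ascii"),
        "url": "https://rancher.example.invalid",
    }
    print(find_bad_secret_values(sample))  # {'url': 5}
```

In the sample, the plain-text URL is flagged at offset 5 because `:` is not part of the base64 alphabet. Any value you typed or pasted into a Secret as plain text would be flagged the same way.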