Rancher cluster management reports "Error while applying agent YAML, it will be retried automatically: exit status 1"

jzyuan2008 · May 30, 2024, 03:43

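Rancher Server setup

Rancher version: 2.6.9
Installation option (Docker install):
- If installed via Helm Chart, provide the type (RKE1, RKE2, k3s, EKS, etc.) and version of the Local cluster:
Online or offline deployment:
Offline (air-gapped) deployment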

Downstream cluster information

Kubernetes version: 1.23.6
Cluster Type (Local/Downstream):
- If Downstream, what type of cluster is it? (Custom/Imported/Hosted, etc.):
  Custom

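User information

What is the role of the logged-in user? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): admin
- If Custom, the custom permission set:

Host operating system: CentOS 7.9

Problem description: After the custom cluster installation finishes, this error appears.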

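Steps to reproduce: The error has appeared after every one of several installations. I also changed the max-pod value here:

These are physical machines, so I need to run more pods.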

Result:

Expected result: No error message is reported.

Screenshot:

Additional context:

Logs

Rancher is started with docker run. The Rancher log follows:

2024/05/30 11:27:39 [INFO] Handling backend connection request [stv-cluster-c-bq26s]
2024/05/30 11:27:42 [INFO] Handling backend connection request [c-bq26s]
2024/05/30 11:27:42 [INFO] error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF
2024/05/30 11:27:42 [INFO] error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF
W0530 11:27:42.934085      71 reflector.go:443] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: tunnel disconnect") has prevented the request from succeeding
W0530 11:27:42.934146      71 reflector.go:443] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: watch of *v1.Namespace ended with: an error on the server ("unable to decode an event from the watch stream: tunnel disconnect") has prevented the request from succeeding
W0530 11:27:42.939059      71 reflector.go:325] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: failed to list *v1.APIService: Get "https://10.10.1.113:6443/apis/apiregistration.k8s.io/v1/apiservices?resourceVersion=1471950": tunnel disconnect
E0530 11:27:42.939111      71 reflector.go:139] pkg/mod/github.com/rancher/client-go@v1.24.0-rancher1/tools/cache/reflector.go:168: Failed to watch *v1.APIService: failed to list *v1.APIService: Get "https://10.10.1.113:6443/apis/apiregistration.k8s.io/v1/apiservices?resourceVersion=1471950": tunnel disconnect
2024/05/30 11:27:43 [ERROR] Failed to handle tunnel request from remote address 10.10.1.126:63593: response 401: failed authentication
2024/05/30 11:27:43 [INFO] Handling backend connection request [stv-cluster-c-bq26s]
2024/05/30 11:28:03 [ERROR] error syncing 'c-bq26s': handler cluster-deploy: Error while applying agent YAML, it will be retried automatically: exit status 1, clusterrole.rbac.authorization.k8s.io/proxy-clusterrole-kubeapiserver unchanged clusterrolebinding.rbac.authorization.k8s.io/proxy-role-binding-kubernetes-master unchanged namespace/cattle-system unchanged serviceaccount/cattle unchanged clusterrolebinding.rbac.authorization.k8s.io/cattle-admin-binding unchanged secret/cattle-credentials-ade1a45 unchanged clusterrole.rbac.authorization.k8s.io/cattle-admin unchanged deployment.apps/cattle-cluster-agent unchanged daemonset.apps/cattle-node-agent unchanged daemonset.apps/kube-api-auth unchanged service/cattle-cluster-agent unchanged Error from server (BadRequest): error when creating "./management-state/tmp/yaml-787991680": Secret in version "v1" cannot be handled as a Secret: illegal base64 data at input byte 103 , requeuing

ksd · May 30, 2024, 06:56

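I have not seen this problem before. The likely cause is that a secret you modified contains characters that are not valid standard base64.

Normally, if you only changed the max-pod value, this problem is very unlikely to occur.

Try editing the cluster again and setting max-pod back to its default value to see whether it recovers, or post everything you have changed so far.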
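
To narrow down which value trips the "illegal base64 data" error, one option is to run each entry of the Secret's `data` map through a strict base64 check. The following is a minimal sketch in Python, not Rancher's own validation: the helper names `first_illegal_base64_offset` and `check_secret_data` are hypothetical, and the offset it prints is only an approximation of the byte position that the API server reports.

```python
import base64
import binascii
import string
from typing import Dict, Optional

# Characters allowed by standard (non-URL-safe) base64, excluding '=' padding.
B64_ALPHABET = set(string.ascii_letters + string.digits + "+/")


def first_illegal_base64_offset(value: str) -> Optional[int]:
    """Return the index of the first character outside the standard
    base64 alphabet (ignoring trailing '=' padding), or None."""
    body = value.rstrip("=")
    for index, char in enumerate(body):
        if char not in B64_ALPHABET:
            return index
    return None


def check_secret_data(data: Dict[str, str]) -> Dict[str, str]:
    """Map each key whose value is not strict base64 to a short reason."""
    problems = {}
    for key, value in data.items():
        offset = first_illegal_base64_offset(value)
        if offset is not None:
            problems[key] = f"illegal character {value[offset]!r} at offset {offset}"
            continue
        try:
            # validate=True rejects anything outside the alphabet instead of
            # silently discarding it; bad padding also raises here.
            base64.b64decode(value, validate=True)
        except binascii.Error as exc:
            problems[key] = f"not decodable: {exc}"
    return problems


if __name__ == "__main__":
    # Illustrative values only, not taken from the cluster in this thread.
    sample = {
        "token": base64.b64encode(b"hello").decode(),
        "broken": "aGVs bG8=",  # contains a space
    }
    for key, reason in check_secret_data(sample).items():
        print(f"{key}: {reason}")
```

Feeding it the `data` section of the Secret that was edited (for example, parsed from the YAML you pasted into the cluster configuration) should show whether a stray space, newline, or plain-text value ended up where base64 was expected.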