Rancher Server setup
- Rancher version: 2.6.8
- Installation option (Docker install/Helm Chart): Docker
Downstream cluster information
- Kubernetes version: 1.23.10
- Cluster Type (Local/Downstream): Downstream
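- If Downstream, what type of cluster? (Custom/Imported or Hosted, etc.): Custom RKE
User information
- What is the role of the logged-in user? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom):
- If custom, custom permission set:
Host OS: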
CentOS 7.9
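Problem description:
After updating the SSL certificate and CA certificate chain on the Rancher 2.6.8 Docker server, I followed this gist (https://gist.github.com/superseb/076f20146e012f1d4e289f5bd1bd4971) and recreated the cluster-agent and node-agent in the cattle-system namespace.
On the control plane node I ran: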
kubectl -n cattle-system delete daemonset.apps/cattle-node-agent deployment.apps/cattle-cluster-agent
curl --insecure -sfL https://mydomain.net/v3/import/mycode_c-sp9kw.yaml | kubectl apply -f -
However, the Rancher UI still shows the downstream RKE cluster as: [Disconnected] Cluster agent is not connected
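One check that may help narrow this down is comparing the CA checksum the agents were deployed with against the CA chain the server currently serves. The following is a minimal sketch, not a confirmed diagnosis: it assumes the agent deployment carries a CATTLE_CA_CHECKSUM environment variable, that the server exposes its CA chain at /v3/settings/cacerts, and that jq is installed. The function names file_sha256 and ca_checksum_matches are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Print the SHA-256 hex digest of a file (first field of sha256sum output).
file_sha256() {
    sha256sum "$1" | awk '{print $1}'
}

# Return 0 if the digest of file $1 equals the expected checksum $2, else 1.
ca_checksum_matches() {
    [ "$(file_sha256 "$1")" = "$2" ]
}

# Example usage (assumed endpoint and env var name; run on a node with kubectl access):
#   curl --insecure -sfL https://mydomain.net/v3/settings/cacerts | jq -r .value > cacerts.pem
#   agent_sum=$(kubectl -n cattle-system get deployment cattle-cluster-agent \
#     -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="CATTLE_CA_CHECKSUM")].value}')
#   ca_checksum_matches cacerts.pem "$agent_sum" && echo "checksum matches" || echo "checksum differs"
```

If the two values differ, the agents were likely deployed from a manifest generated before the new CA chain took effect, and re-downloading the import manifest would be the next thing to try.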
Steps to reproduce:
Result:
Expected result:
Screenshots:
Additional context:
Logs
The cattle-cluster-agent pod is running normally, and its log shows no errors:
time="2023-01-07T17:11:16Z" level=info msg="Listening on /tmp/log.sock"
time="2023-01-07T17:11:16Z" level=info msg="Rancher agent version v2.6.8 is starting"
time="2023-01-07T17:11:16Z" level=info msg="Connecting to wss://mydomain.net/v3/connect/register with token starting with nfg78vndnq9zvh5cqsm4tvrffht"
time="2023-01-07T17:11:16Z" level=info msg="Connecting to proxy" url="wss://mydomain.net/v3/connect/register"
The node-agent pod is running normally, and its log shows no errors:
level=info msg="Connecting to wss://mydomain.net/v3/connect with token starting with nfg78vndnq9zvh5cqsm4tvrffht"
level=info msg="Connecting to proxy" url="wss://mydomain.net/v3/connect"
level=info msg="Starting plan monitor, checking every 120 seconds"
The Rancher server log contains some errors (roughly three kinds, which should be unrelated to this certificate update):
[ERROR] error syncing 'c-sp9kw/p-2kqvq': handler system-image-upgrade-controller: upgrade cluster c-sp9kw system service alerting failed: template system-library-rancher-monitoring incompatible with rancher version or cluster's [c-sp9kw] kubernetes version, requeuing
2023/01/07 18:22:45 [ERROR] Failed to handle tunnel request from remote address 10.231.227.113:25364: response 400: cluster not found
2023/01/07 18:22:45 [ERROR] Failed to handle tunnel request from remote address 10.231.227.113:25366: response 400: cluster not found
2023/01/07 18:22:46 [ERROR] error syncing 'system-library': handler system-image-upgrade-catalog-controller: upgrade cluster local system service alerting failed: template system-library-rancher-monitoring incompatible with rancher version or cluster's [local] kubernetes version, handler system-image-upgrade-catalog-controller: upgrade cluster c-sp9kw system service alerting failed: template system-library-rancher-monitoring incompatible with rancher version or cluster's [c-sp9kw] kubernetes version, requeuing