Rancher 2.5.16: etcd fails to register after certificate renewal

The Rancher certificate expires tomorrow, so I followed the official documentation and ran the following commands:
1. exec into the Rancher server container
kubectl --insecure-skip-tls-verify -n kube-system delete secrets k3s-serving
kubectl --insecure-skip-tls-verify delete secret serving-cert -n cattle-system
rm -f /var/lib/rancher/k3s/server/tls/dynamic-cert.json
exit
2. Run the following command to refresh the parameters
curl --insecure -sfL https://172.17.80.215/v3 # Rancher address
3. Restart rancher-server

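After the restart, the login page keeps showing:
The current cluster is Unavailable... Features that interact directly with the API will not be available until the API is ready.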

Rancher logs:
2023/09/14 09:31:26 [ERROR] failed on subscribe namespacedBasicAuth: Get "https://172.17.80.216:6443/api/v1/secrets?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-bwnnc] agent to connect
2023/09/14 09:31:53 [ERROR] failed on subscribe namespacedCertificate: Get "https://172.17.80.216:6443/api/v1/secrets?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-bwnnc] agent to connect
2023/09/14 09:32:07 [INFO] Stopping cluster agent for c-bwnnc
2023/09/14 09:32:07 [ERROR] failed to start cluster controllers c-bwnnc: context canceled
W0914 09:33:02.501034 8 warnings.go:80] extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
2023/09/14 09:34:18 [INFO] Stopping cluster agent for c-bwnnc
2023/09/14 09:34:18 [ERROR] failed to start cluster controllers c-bwnnc: context canceled
2023-09-14 09:34:44.545854 I | mvcc: store.index: compact 233691551
2023-09-14 09:34:44.602587 I | mvcc: finished scheduled compaction at 233691551 (took 56.039501ms)
2023/09/14 09:35:36 [ERROR] error syncing 'c-bwnnc': handler cluster-deploy: Get "https://172.17.80.216:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": waiting for cluster [c-bwnnc] agent to connect, requeuing
2023/09/14 09:36:27 [INFO] Stopping cluster agent for c-bwnnc
2023/09/14 09:36:27 [ERROR] failed to start cluster controllers c-bwnnc: context canceled
2023/09/14 09:38:19 [INFO] Stopping cluster agent for c-bwnnc
2023/09/14 09:38:19 [ERROR] failed to start cluster controllers c-bwnnc: context canceled
W0914 09:39:07.340037 8 warnings.go:80] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
2023-09-14 09:39:44.548704 I | mvcc: store.index: compact 233693409
2023-09-14 09:39:44.604408 I | mvcc: finished scheduled compaction at 233693409 (took 54.767808ms)
2023/09/14 09:40:22 [INFO] Stopping cluster agent for c-bwnnc
2023/09/14 09:40:22 [ERROR] failed to start cluster controllers c-bwnnc: context canceled
W0914 09:42:11.501855 8 warnings.go:80] extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
2023/09/14 09:42:31 [INFO] Stopping cluster agent for c-bwnnc
2023/09/14 09:42:31 [ERROR] failed to start cluster controllers c-bwnnc: context canceled
2023/09/14 09:44:41 [INFO] Stopping cluster agent for c-bwnnc
2023/09/14 09:44:41 [ERROR] failed to start cluster controllers c-bwnnc: context canceled
2023-09-14 09:44:44.551494 I | mvcc: store.index: compact 233695267
2023-09-14 09:44:44.607465 I | mvcc: finished scheduled compaction at 233695267 (took 55.242018ms)
2023/09/14 09:46:49 [INFO] Stopping cluster agent for c-bwnnc
2023/09/14 09:46:49 [ERROR] failed to start cluster controllers c-bwnnc: context canceled
W0914 09:48:00.340817 8 warnings.go:80] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
2023/09/14 09:48:56 [INFO] Stopping cluster agent for c-bwnnc
2023/09/14 09:48:56 [ERROR] failed to start cluster controllers c-bwnnc: context canceled
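The log above repeats the same few messages, so it can help to tally them before digging further. A minimal sketch, assuming the container is named rancher as in the docker run command in this thread; the function name summarize_rancher_errors is made up for illustration and is not part of Rancher:

```shell
#!/bin/sh
# Hypothetical helper (not a Rancher tool): reads Rancher log lines on stdin
# and prints one line per distinct [ERROR] message, most frequent first.
summarize_rancher_errors() {
    grep -F '[ERROR]' \
        | sed 's|^[0-9/]* [0-9:]* \[ERROR\] ||' \
        | sort | uniq -c | sort -rn
}

# Usage (assumes the container name "rancher"):
#   docker logs --since 30m rancher 2>&1 | summarize_rancher_errors
```

If "waiting for cluster [...] agent to connect" and "failed to start cluster controllers" dominate the output, Rancher itself is running and is waiting for the downstream cluster-agent, so the agent's logs are the next place to look.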

Please also check the cluster-agent logs.

Rancher was started with docker run; the start command is:
docker run -d --privileged --restart=unless-stopped -p 80:80 -p 443:443 --name rancher -v /data/certs:/container/certs -v /data/log/rancher/auditlog:/var/log/auditlog -v /data/rancher:/var/lib/rancher -e SSL_CERT_DIR="/container/certs" -e CATTLE_AGENT_IMAGE="registry.cn-hangzhou.aliyuncs.com/rancher/rancher-agent:v2.5.16" registry.cn-hangzhou.aliyuncs.com/rancher/rancher:v2.5.16

I checked the cluster-agent logs: there is no output at all.

I checked the Rancher logs: they show that it cannot connect to kube-apiserver.

I checked the kube-apiserver logs: they show updates being sent to etcd.
ccResolverWrapper: sending update to cc: {[{https://172.17.80.216:2379 0 }] }
I0915 02:08:30.405363 1 clientconn.go:948] ClientConn switching balancer to "pick_first"

I checked the etcd logs.

You were looking at the logs of the pause container, so of course they are empty. Find the actual container with: docker ps -a | grep cluster-agent
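For anyone hitting the same empty-log confusion: on a Docker-based node, each pod shows up as at least two containers, and the pause (sandbox) one never logs anything. A rough sketch for picking out the real agent container, assuming the usual dockershim naming where pause containers start with k8s_POD_; the function name agent_containers is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: from a list of container names on stdin, keep the
# cluster-agent containers and drop the pause (sandbox) containers, which
# dockershim names with the prefix "k8s_POD_".
agent_containers() {
    grep -F 'cluster-agent' | grep -v '^k8s_POD_'
}

# Usage on the node that runs the agent:
#   docker ps -a --format '{{.Names}}' | agent_containers
#   docker logs --tail 100 <container name printed above>
```

The name that remains is the container whose logs are worth reading.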

curl --insecure -sfL https://180.19.0.2/v3

Then watch the cluster-agent logs again.

It still prints the same error.

What server-url did you set when you registered the cluster?

rc.mingya.com.cn

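OK. On the node that runs cluster-agent, run: curl --insecure -sfL https://172.17.80.215/v3 and send me a screenshot.

Start watching the Rancher logs before you run that command. If it succeeds, a log entry about the certificate being updated will appear.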

After running curl --insecure -sfL https://172.17.80.215/v3 on the node that runs cluster-agent, the screenshot shows the Rancher logs:

After running the command, the screenshot shows the cluster-agent logs:

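That is very strange, I have not run into this before.

Which address do you use to access the Rancher UI?

I access it through rc.mingya.com.cn.

Which address does that map to...

Also, what command do you use to run Rancher?

We have an internal DNS server; rc.mingya.com.cn maps to 172.17.80.215.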

172.17.80.215 rc.mingya.com.cn
Every server in the Rancher cluster has this entry in its hosts file.
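Since the agent connects to the server-url by name, it may be worth confirming on each node that the hosts entry really resolves as expected. A small sketch, assuming a standard hosts-file layout (address first, then names); the function name hosts_lookup is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print the address that a hosts-format file maps a
# name to. $1 = host name, $2 = hosts file (defaults to /etc/hosts).
hosts_lookup() {
    awk -v name="$1" '
        { sub(/#.*/, "") }    # drop comments; awk re-splits the fields
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}

# Usage on each node:
#   hosts_lookup rc.mingya.com.cn    # expected here: 172.17.80.215
#   getent hosts rc.mingya.com.cn    # what the system resolver returns
```

Note that this only checks the node itself. Containers may resolve names differently from the host, so the same name should also be checked from inside the agent container if the two results disagree with what the agent logs show.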

Rancher is started with docker run; the start command is:
docker run -d --privileged --restart=unless-stopped -p 80:80 -p 443:443 --name rancher -v /data/certs:/container/certs -v /data/log/rancher/auditlog:/var/log/auditlog -v /data/rancher:/var/lib/rancher -e SSL_CERT_DIR="/container/certs" -e CATTLE_AGENT_IMAGE="registry.cn-hangzhou.aliyuncs.com/rancher/rancher-agent:v2.5.16" registry.cn-hangzhou.aliyuncs.com/rancher/rancher:v2.5.16