Restoring a backup from the UI fails

Rancher Server setup

  • Rancher version:
  • Installation option (Docker install/Helm Chart): Helm Chart
    • If Helm Chart, the type (RKE1, RKE2, k3s, EKS, etc.) and version of the local cluster: k3s
  • Online or air-gapped deployment: Air-gapped

Downstream cluster information

  • Kubernetes version:
  • Cluster Type (Local/Downstream): Downstream
    • If Downstream, what type of cluster? (Custom/Imported/Hosted, etc.): Custom

User information

  • What is the role of the logged-in user? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): Admin
    • If Custom, the custom permission set:

**Host operating system:** CentOS 7

**Issue description:** In the UI I clicked restore to roll back to the previous day's backup, and it immediately failed with this error:

Failed to start backup server on all etcd nodes: [Failed to run backup server container, container logs: time="2022-09-08T01:48:19Z" level=fatal msg="listen tcp 0.0.0.0:2379: bind: address already in use" Failed to run backup server container, container logs: time="2022-09-08T01:48:26Z" level=fatal msg="listen tcp 0.0.0.0:2379: bind: address already in use" ]
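The restore failed because the backup server container could not bind `0.0.0.0:2379` (the etcd client port) on the etcd nodes, meaning something was still holding that port. A minimal Python sketch for confirming this on a node, assuming Python 3 is available on the host; `port_in_use` is a hypothetical helper, not part of Rancher or RKE. It only reports whether the port is taken, not which process holds it.

```python
import errno
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding (host, port) fails with EADDRINUSE.

    Hypothetical diagnostic helper; not part of Rancher or RKE.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        # Go's net package (which etcd uses) sets SO_REUSEADDR on Unix
        # listeners, so do the same to avoid false positives from
        # connections lingering in TIME_WAIT.
        probe.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # other errors (e.g. permissions) are not "in use"
    return False
```

On an etcd node, `port_in_use(2379)` returning `True` before a restore would suggest the old etcd container (or another process) is still listening, which matches the `bind: address already in use` message above.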

I rebooted one of the etcd nodes and then checked what was running:

[root@server149033 ~]# docker ps
CONTAINER ID   IMAGE                                               COMMAND                  CREATED       STATUS                          PORTS     NAMES
ee316bb5a473   172.22.149.31:18888/rancher/rancher-agent:v2.5.14   "run.sh --no-registe…"   5 weeks ago   Restarting (1) 41 seconds ago             share-mnt

This is the only container, and it restarts endlessly. Its logs show:

time="2022-09-08T01:34:09Z" level=fatal msg="Certificate chain is not complete, please check if all needed intermediate certificates are included in the server certificate (in the correct order) and if the cacerts setting in Rancher either contains the correct CA certificate (in the case of using self signed certificates) or is empty (in the case of using a certificate signed by a recognized CA). Certificate information is displayed above. error: Get \"https://xxyf.rancher.com\": x509: certificate signed by unknown authority"
INFO: Arguments: --no-register --only-write-certs --node-name server149033 --server https://xxyf.rancher.com --token REDACTED
INFO: Environment: CATTLE_ADDRESS=172.22.149.33 CATTLE_AGENT_CONNECT=true CATTLE_INTERNAL_ADDRESS= CATTLE_NODE_NAME=server149033 CATTLE_SERVER=https://xxyf.rancher.com CATTLE_TOKEN=REDACTED CATTLE_WRITE_CERT_ONLY=true
INFO: Using resolv.conf: nameserver 114.114.114.114
INFO: https://xxyf.rancher.com/ping is accessible
INFO: xxyf.rancher.com resolves to 172.22.149.32 172.22.149.146
time="2022-09-08T01:35:09Z" level=info msg="Listening on /tmp/log.sock"
time="2022-09-08T01:35:09Z" level=info msg="Rancher agent version 52a8de7b6-dirty is starting"
time="2022-09-08T01:35:09Z" level=info msg="Option customConfig=map[address:172.22.149.33 internalAddress: label:map[] roles:[] taints:[]]"
time="2022-09-08T01:35:09Z" level=info msg="Option etcd=false"
time="2022-09-08T01:35:09Z" level=info msg="Option controlPlane=false"
time="2022-09-08T01:35:09Z" level=info msg="Option worker=false"
time="2022-09-08T01:35:09Z" level=info msg="Option requestedHostname=server149033"
time="2022-09-08T01:35:10Z" level=info msg="Certificate details from https://xxyf.rancher.com"
time="2022-09-08T01:35:10Z" level=info msg="Certificate #0 (https://xxyf.rancher.com)"
time="2022-09-08T01:35:10Z" level=info msg="Subject: CN=xxyf.rancher.com,C=CN"
time="2022-09-08T01:35:10Z" level=info msg="Issuer: CN=cattle-ca,C=CN"
time="2022-09-08T01:35:10Z" level=info msg="IsCA: false"
time="2022-09-08T01:35:10Z" level=info msg="DNS Names: [xxyf.rancher.com]"
time="2022-09-08T01:35:10Z" level=info msg="IPAddresses: <none>"
time="2022-09-08T01:35:10Z" level=info msg="NotBefore: 2022-07-28 09:09:14 +0000 UTC"
time="2022-09-08T01:35:10Z" level=info msg="NotAfter: 2032-07-25 09:09:14 +0000 UTC"
time="2022-09-08T01:35:10Z" level=info msg="SignatureAlgorithm: SHA256-RSA"
time="2022-09-08T01:35:10Z" level=info msg="PublicKeyAlgorithm: RSA"
time="2022-09-08T01:35:10Z" level=info msg="Certificate #1 (https://xxyf.rancher.com)"
time="2022-09-08T01:35:10Z" level=info msg="Subject: CN=cattle-ca,C=CN"
time="2022-09-08T01:35:10Z" level=info msg="Issuer: CN=cattle-ca,C=CN"
time="2022-09-08T01:35:10Z" level=info msg="IsCA: true"
time="2022-09-08T01:35:10Z" level=info msg="DNS Names: <none>"
time="2022-09-08T01:35:10Z" level=info msg="IPAddresses: <none>"
time="2022-09-08T01:35:10Z" level=info msg="NotBefore: 2022-07-28 09:09:13 +0000 UTC"
time="2022-09-08T01:35:10Z" level=info msg="NotAfter: 2032-07-25 09:09:13 +0000 UTC"
time="2022-09-08T01:35:10Z" level=info msg="SignatureAlgorithm: SHA256-RSA"
time="2022-09-08T01:35:10Z" level=info msg="PublicKeyAlgorithm: RSA"
time="2022-09-08T01:35:10Z" level=fatal msg="Certificate chain is not complete, please check if all needed intermediate certificates are included in the server certificate (in the correct order) and if the cacerts setting in Rancher either contains the correct CA certificate (in the case of using self signed certificates) or is empty (in the case of using a certificate signed by a recognized CA). Certificate information is displayed above. error: Get \"https://xxyf.rancher.com\": x509: certificate signed by unknown authority"
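The log shows the server presenting a two-certificate chain that ends at the self-signed `cattle-ca`, which the agent rejects with `x509: certificate signed by unknown authority`. One way to test that chain outside the agent is a verified TLS handshake against a specific CA file. A minimal sketch, assuming Python 3; `check_tls_chain` is a hypothetical helper, not a Rancher tool.

```python
import socket
import ssl
from typing import Optional, Tuple


def check_tls_chain(host: str, port: int = 443, cafile: Optional[str] = None,
                    timeout: float = 5.0) -> Tuple[bool, str]:
    """Try a verified TLS handshake and report (ok, detail).

    Hypothetical diagnostic helper. With cafile=None the system trust store
    is used; otherwise only the CA certificates in cafile are trusted.
    """
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # server_hostname enables SNI and checks the cert's DNS names.
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, f"verified ({tls.version()})"
    except ssl.SSLCertVerificationError as exc:
        # Same class of failure as "certificate signed by unknown authority".
        return False, f"verification failed: {exc.verify_message}"
    except OSError as exc:
        # DNS failures, refused connections, timeouts and other TLS errors.
        return False, f"connection failed: {exc}"
```

For example, `check_tls_chain("xxyf.rancher.com", cafile="/root/cattle-ca.pem")`, where the path is a placeholder for a file holding the `cattle-ca` certificate. If that succeeds while the agent still fails, the `cacerts` setting named in the fatal message is a likely place to look.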

Steps to reproduce:

Result:

Expected result:

Screenshots:

Additional context:

Logs


I'm truly, truly at my wits' end. I just restored a backup and it brought down the entire cluster. Great!

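I ran into this too. After I changed the server-url on my cluster, it reported that the API was unavailable, so I restarted Docker on one of the master nodes.
The UI then reported that the etcd connection on that node had failed, so I restored an etcd backup from the UI.
At that point all the etcd and API containers on all three master nodes were gone. The cluster has now completely collapsed and is totally unusable...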