Rancher upgrade fails because of a certificate problem

Rancher Server setup

  • Rancher version: upgrading from v2.5.5 to v2.5.12
  • Installation option (Docker install):
    single-node Docker install

Downstream cluster information

  • Kubernetes version:
    hosted AWS EKS v1.19

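Problem description: I upgraded Rancher from v2.5.5 to v2.5.12 following the official documentation.
After the upgrade finished I started the container and found that Rancher was unusable. The log shows:
Waiting for server to become available: Get "https://127.0.0.1:6443/version?timeout=15m0s": x509 certificate signed by unknown authority
Restarting Rancher produces the same error.
Has anyone run into this problem?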

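In a single-node Docker install, the rancher-server container starts an embedded k3s that serves as the local cluster.
First check whether this embedded k3s started properly. Its log is at /var/lib/rancher/k3s.log inside the rancher-server container.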
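
A minimal sketch of that check, assuming the log has first been copied out of the container with docker cp. The function name summarize_k3s_log and the container placeholder are illustrative, not part of Rancher:

```shell
# Hypothetical helper: report whether a copied k3s.log exists and how many
# lines in it mention x509 certificate errors.
summarize_k3s_log() {
  log_file="$1"
  if [ ! -f "$log_file" ]; then
    echo "missing"
    return 1
  fi
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  x509_count=$(grep -c 'x509' "$log_file" || true)
  echo "x509_lines=$x509_count"
}

# Example usage (replace the placeholder with your container name or ID):
#   docker cp <rancher-container>:/var/lib/rancher/k3s.log ./k3s.log
#   summarize_k3s_log ./k3s.log
```

A result of "missing" suggests the embedded k3s never wrote a log at that path, which is worth reporting as-is.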

I found that there is no k3s.log under /var/lib/rancher/.

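Also, the Rancher version I am running now restarts itself frequently. Is there a way to troubleshoot that? I see this in the log:
failed to find access control: ClusterUnavailable 503: cannot determine access, cluster is unavailable. I am not sure whether it is related.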

The container was started with -v /opt/rancher:/var/lib/rancher, but the /opt/rancher directory is empty, so I cannot see any logs.
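
One way to confirm that the bind mount is really attached to the running container is to list its mounts with docker inspect. This is only a sketch: has_rancher_mount is a hypothetical helper and the container name is a placeholder.

```shell
# Hypothetical helper: reads "source -> destination" lines on stdin and
# succeeds if any of them targets /var/lib/rancher.
has_rancher_mount() {
  grep -q -- '-> /var/lib/rancher$'
}

# Example usage (replace the placeholder with your container name or ID):
#   docker inspect -f '{{range .Mounts}}{{println .Source "->" .Destination}}{{end}}' <rancher-container> \
#     | has_rancher_mount && echo "mounted" || echo "not mounted"
```

If the mount is missing, the container is writing to its own filesystem or an anonymous volume, which would explain an empty host directory.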

Are you sure it was installed in single-docker mode? This is my environment:

docker run -d --restart=unless-stopped   -p 80:80 -p 443:443   --privileged   -v /opt/rancher:/var/lib/rancher rancher/rancher:v2.5.13

root@ip-172-31-17-197:/opt/rancher# ls -ahl
total 4.0M
drwxr-xr-x 4 root root 4.0K Apr 21 09:26 .
drwxr-xr-x 6 root root 4.0K Apr 26 09:35 ..
drwxr-xr-x 5 root root 4.0K Apr 21 09:26 k3s
-rw-r--r-- 1 root root 4.0M May 13 02:09 k3s.log
drwx------ 8 root root 4.0K Apr 21 09:31 management-state

root@ip-172-31-17-197:/opt/rancher# tail -f k3s.log
I0513 01:44:27.740414      38 controller.go:609] quota admission added evaluator for: etcdbackups.management.cattle.io
W0513 01:49:24.227155      38 watcher.go:207] watch chan error: etcdserver: mvcc: required revision has been compacted
W0513 01:53:06.204934      38 watcher.go:207] watch chan error: etcdserver: mvcc: required revision has been compacted
W0513 01:53:49.676369      38 watcher.go:207] watch chan error: etcdserver: mvcc: required revision has been compacted
W0513 01:54:08.406959      38 watcher.go:207] watch chan error: etcdserver: mvcc: required revision has been compacted
W0513 01:59:22.140489      38 watcher.go:207] watch chan error: etcdserver: mvcc: required revision has been compacted
W0513 02:00:29.355740      38 watcher.go:207] watch chan error: etcdserver: mvcc: required revision has been compacted
W0513 02:05:29.472004      38 watcher.go:207] watch chan error: etcdserver: mvcc: required revision has been compacted
W0513 02:08:09.939703      38 watcher.go:207] watch chan error: etcdserver: mvcc: required revision has been compacted
W0513 02:09:16.251164      38 watcher.go:207] watch chan error: etcdserver: mvcc: required revision has been compacted

I can see k3s.log now. It still reports the same problem as before, and startup appears to fail.

I could not reproduce this problem. My steps were:

  1. Start a single-node Rancher server v2.5.5
docker run -itd -p 80:80 -p 443:443 --privileged rancher/rancher:v2.5.5
  2. Create a downstream cluster and a test workload
  3. Follow the documented steps to upgrade the Rancher server to v2.5.12
root@ip-172-31-10-217:~# docker ps
CONTAINER ID   IMAGE                    COMMAND           CREATED          STATUS          PORTS                                                                      NAMES
e6c531465159   rancher/rancher:v2.5.5   "entrypoint.sh"   20 minutes ago   Up 20 minutes   0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   zealous_knuth

root@ip-172-31-10-217:~# docker stop e6c531465159
e6c531465159

root@ip-172-31-10-217:~# docker create --volumes-from zealous_knuth --name rancher-data rancher/rancher:v2.5.5
07ef82a41d5766db6abe1b4e0f12adeff6b5530be6e369be33bd3194afa54af6

root@ip-172-31-10-217:~# docker pull rancher/rancher:v2.5.12

root@ip-172-31-10-217:~# docker run -d --volumes-from rancher-data \
  --restart=unless-stopped \
  -p 80:80 -p 443:443 \
  --privileged \
  rancher/rancher:v2.5.12

After Rancher v2.5.12 started, I accessed the cluster through the UI and it displayed normally.
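
The upgrade sequence above can be written as a dry-run sketch that only prints the commands, so they can be reviewed before anything is executed. print_upgrade_plan is a hypothetical helper. The backup line is not part of the steps above; it is an added assumption, with an illustrative archive name, for keeping a copy of /var/lib/rancher before the new version starts.

```shell
# Dry-run sketch: prints the upgrade commands instead of running them.
# Arguments (all placeholders): old container name, old tag, new tag.
print_upgrade_plan() {
  old_container="$1"
  old_tag="$2"
  new_tag="$3"
  echo "docker stop $old_container"
  echo "docker create --volumes-from $old_container --name rancher-data rancher/rancher:$old_tag"
  # Assumed extra step (not in the steps above): archive the data volume.
  echo "docker run --volumes-from rancher-data -v \"\$PWD:/backup\" --rm busybox tar zcvf /backup/rancher-data-backup-$old_tag.tar.gz /var/lib/rancher"
  echo "docker pull rancher/rancher:$new_tag"
  echo "docker run -d --volumes-from rancher-data --restart=unless-stopped -p 80:80 -p 443:443 --privileged rancher/rancher:$new_tag"
}

print_upgrade_plan zealous_knuth v2.5.5 v2.5.12
```

Comparing the printed commands with what was actually run on the failing host may make a difference in volumes or flags easier to spot.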

@hyj-github You can compare against my steps to find what differs in your setup.