RKE2 cluster installed successfully, but it no longer starts after the virtual machines are rebooted

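RKE2, deployed on virtual machines, latest version, installed with the official script
1 server, deployed following the official instructions
2 agents, deployed following the official instructions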

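Problem description:
The first deployment ran successfully and I could deploy services on Kubernetes without any problems. After rebooting all the machines, the cluster no longer works properly.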

Steps to reproduce:

Result:

Expected result:

Screenshots:

Additional context:

Logs
  1. Server logs
    Apr 23 04:13:11 fplserver1 rke2[3573]: {"level":"warn","ts":"2022-04-23T04:13:11.103Z","caller":"etcdserver/server.go:1130","msg":"failed to revoke lease","lease-id":"004e8050328adb29","error":"etcdserver: too many requests"}
    Apr 23 04:13:11 fplserver1 rke2[3573]: {"level":"warn","ts":"2022-04-23T04:13:11.103Z","caller":"etcdserver/server.go:1130","msg":"failed to revoke lease","lease-id":"004e8050328ae13c","error":"etcdserver: too many requests"}
    Apr 23 04:13:11 fplserver1 rke2[3573]: {"level":"warn","ts":"2022-04-23T04:13:11.103Z","caller":"etcdserver/server.go:1130","msg":"failed to revoke lease","lease-id":"004e8050328adc1d","error":"etcdserver: too many requests"}
    Apr 23 04:13:11 fplserver1 rke2[3573]: {"level":"warn","ts":"2022-04-23T04:13:11.103Z","caller":"etcdserver/server.go:1130","msg":"failed to revoke lease","lease-id":"004e8050328aea45","error":"etcdserver: too many requests"}
    Apr 23 04:13:11 fplserver1 rke2[3573]: {"level":"warn","ts":"2022-04-23T04:13:11.103Z","caller":"etcdserver/server.go:1130","msg":"failed to revoke lease","lease-id":"004e8050328afd68","error":"etcdserver: too many requests"}
    Apr 23 04:13:11 fplserver1 rke2[3573]: {"level":"warn","ts":"2022-04-23T04:13:11.103Z","caller":"etcdserver/server.go:1130","msg":"failed to revoke lease","lease-id":"004e8050328ad548","error":"etcdserver: too many requests"}
    Apr 23 04:13:11 fplserver1 rke2[3573]: {"level":"warn","ts":"2022-04-23T04:13:11.103Z","caller":"etcdserver/server.go:1130","msg":"failed to revoke lease","lease-id":"004e8050328ae349","error":"etcdserver: too many requests"}
    Apr 23 04:13:11 fplserver1 rke2[3573]: {"level":"warn","ts":"2022-04-23T04:13:11.103Z","caller":"etcdserver/server.go:1130","msg":"failed to revoke lease","lease-id":"004e8050328b009d","error":"etcdserver: too many requests"}
    Apr 23 04:13:11 fplserver1 rke2[3573]: {"level":"warn","ts":"2022-04-23T04:13:11.103Z","caller":"etcdserver/server.go:1130","msg":"failed to revoke lease","lease-id":"004e8050328b1103","error":"etcdserver: too many requests"}
    Apr 23 04:13:15 fplserver1 rke2[3573]: {"level":"warn","ts":"2022-04-23T04:13:15.608Z","caller":"etcdserver/server.go:1130","msg":"failed to revoke lease","lease-id":"004e8050328b3ca6","error":"etcdserver: too many requests"}
    Apr 23 04:13:15 fplserver1 rke2[3573]: {"level":"warn","ts":"2022-04-23T04:13:15.609Z","caller":"etcdserver/server.go:1130","msg":"failed to revoke lease","lease-id":"004e8050328b3c49","error":"etcdserver: too many requests"}
    Apr 23 04:13:16 fplserver1 rke2[3573]: {"level":"warn","ts":"2022-04-23T04:13:16.882Z","logger":"etcd-client","caller":"v3@v3.5.1-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0010e0700/127.0.0.1:2399","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
    Apr 23 04:13:16 fplserver1 rke2[3573]: time="2022-04-23T04:13:16Z" level=info msg="Failed to test temporary data store connection: context deadline exceeded"

  2. Agent logs
    p 127.0.0.1:38164->127.0.0.1:6444: read: connection reset by peer"
    Apr 23 04:13:32 fplserver2 rke2[6660]: time="2022-04-23T04:13:32Z" level=error msg="failed to get CA certs: Get "https://127.0.0.1:6444/cacerts": read tcp 127.0.0.1:38172->127.0.0.1:6444: read: connection reset by peer"
    Apr 23 04:13:34 fplserver2 rke2[6660]: time="2022-04-23T04:13:34Z" level=error msg="failed to get CA certs: Get "https://127.0.0.1:6444/cacerts": read tcp 127.0.0.1:38180->127.0.0.1:6444: read: connection reset by peer"
    Apr 23 04:13:36 fplserver2 rke2[6660]: time="2022-04-23T04:13:36Z" level=error msg="failed to get CA certs: Get "https://127.0.0.1:6444/cacerts": read tcp 127.0.0.1:38188->127.0.0.1:6444: read: connection reset by peer"
    Apr 23 04:13:38 fplserver2 rke2[6660]: time="2022-04-23T04:13:38Z" level=error msg="failed to get CA certs: Get "https://127.0.0.1:6444/cacerts": read tcp 127.0.0.1:38196->127.0.0.1:6444: read: connection reset by peer"
    Apr 23 04:13:40 fplserver2 rke2[6660]: time="2022-04-23T04:13:40Z" level=error msg="failed to get CA certs: Get "https://127.0.0.1:6444/cacerts": read tcp 127.0.0.1:38204->127.0.0.1:6444: read: connection reset by peer"

The logs indicate that the agent is not connected, so you could look at the Rancher server and Rancher agent logs.

I don't think Rancher needs checking. This is an RKE2 cluster I installed standalone; I only imported it into Rancher afterwards.

The logs are there, above the screenshot; click the arrow to expand them.

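Nearly all of the errors in the logs you posted come from etcd. Insufficient disk performance may be causing I/O latency, which together with too many read requests could be keeping the cluster from starting. You could search Google for the etcd log messages.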
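
If you want a rough number for the disk before digging further, a small fsync probe like the one below can help. This is only a sketch, not an official etcd or RKE2 tool: the function names `measure_fsync_latency` and `percentile`, the 2 KiB block size, and the 200-write sample are all my own choices for illustration. The 10 ms figure is the commonly cited etcd guideline for the 99th percentile of WAL fsync duration; treat it as a rule of thumb rather than a hard limit.

```python
import math
import os
import sys
import tempfile
import time


def measure_fsync_latency(directory, writes=200, block_size=2048):
    """Write small blocks to a temp file in `directory`, fsync after each
    write, and return the fsync latencies in seconds, sorted ascending."""
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        block = b"\0" * block_size
        for _ in range(writes):
            os.write(fd, block)
            start = time.perf_counter()
            os.fsync(fd)  # force the data to disk, roughly what etcd does for its WAL
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return sorted(latencies)


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


if __name__ == "__main__":
    # Assumption: pass a directory on the same disk as the etcd data;
    # without an argument the system temp directory is probed instead.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    samples = measure_fsync_latency(target)
    p99_ms = percentile(samples, 99) * 1000
    print(f"fsync p99 on {target}: {p99_ms:.2f} ms")
    if p99_ms > 10:
        print("p99 is above the ~10 ms etcd guideline; the disk may be too slow")
```

Run it as root on the server node and point it at a directory on the disk that holds the etcd data. A result well above 10 ms would be consistent with the slow-disk theory, though it does not prove that the disk is the only cause.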