Certificate errors when adding a node to a custom downstream cluster created in Rancher

Rancher Server setup

  • Rancher version: v2.11.0
  • Installation option (Docker install/Helm Chart): Helm Chart
    • If Helm Chart, the Local cluster type (RKE1, RKE2, k3s, EKS, etc.) and version: k3s v1.32.3+k3s1
  • Online or air-gapped deployment: air-gapped

Downstream cluster information

  • Kubernetes 版本: v1.32.3+rke2r1
  • Cluster Type (Local/Downstream): Downstream
    • If Downstream, what type of cluster? (Custom/Imported/Hosted, etc.): Custom

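User information

  • What is the role of the logged-in user? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): Admin
    • If Custom, the custom permission set: Admin

Host operating system: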
Anolis OS 8.10 arm64
Kernel: 5.10.134-18.an8.aarch64

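Problem description:
When I add a node to a custom downstream cluster created in Rancher, the agent reports that certificate files do not exist. How can this be resolved?
Question: how do I configure credentials for a private image registry? Which directory should the configuration file go in, and what format does it use?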

 cat /etc/rancher/rke2/registries.yaml 
{"configs":{},"mirrors":null}

I filled the file in by hand, but when I ran the node registration command again, it was reset to the content shown above.
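For reference, a minimal sketch of the registries.yaml format that RKE2 reads, assuming the registry host from the logs below and basic-auth credentials. The username and password are placeholders, and the commented-out `tls` section is only needed if the registry uses a private CA. On a node provisioned through Rancher, this file appears to be generated from the cluster's registry settings, which would explain why manual edits are reverted on registration.

```yaml
# /etc/rancher/rke2/registries.yaml (sketch; credentials are placeholders)
configs:
  "ctrimages.zylab.com":
    auth:
      username: example-user   # placeholder, not from the original post
      password: example-pass   # placeholder, not from the original post
    # tls:
    #   ca_file: /path/to/registry-ca.crt   # hypothetical path
```

On a standalone RKE2 node, the rke2-server or rke2-agent service has to be restarted for changes to this file to take effect.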

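Steps to reproduce:
1. Create a cluster using RKE2.
2. Basic node configuration is complete, and the custom domain name is reachable.
3. Copy the registration command and run it on the node: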

curl --insecure -fL https://acloud.zylab.com/system-agent-install.sh | sudo  sh -s - --server https://acloud.zylab.com --label 'cattle.io/os=linux' --token 6pg8gdbsvt6sfz9fdsf2ccrdfj84hct74lnsvj6kjlglr55xp2cfhl --ca-checksum a22b08fbba6c6f3949e27df3deadfa84e8cf1dd56ed3f3c77dcfe67d5c319913 --etcd --controlplane --worker

Result:
configuring bootstrap node(s) custom-7491d081ac7f: error applying plan – check rancher-system-agent.service and rke2-server.service logs on node for more information, waiting for agent to check in and apply initial plan

Expected result:
The node registers successfully.

Screenshots:

Additional context:

Logs
Viewing the logs with journalctl -xe shows the following errors:

6月 11 15:00:20 worker01.zylab.com rancher-system-agent[1420]: time="2025-06-11T15:00:20+08:00" level=info msg="Pulling image ctrimages.zylab.com/rancher/system-agent-installer-rke2:v1.32.3-rke2r1"
6月 11 15:00:20 worker01.zylab.com rancher-system-agent[1420]: time="2025-06-11T15:00:20+08:00" level=warning msg="Failed to get image from endpoint: GET https://ctrimages.zylab.com/v2/rancher/system-agent-installer-rke2/manifests/v1.32.3-rke2r1: UNAUTHORIZED: unauthorized to access repository: rancher/system-agent-installer-rke2, action: pull: unauthorized to access repository: rancher/system-agent-installer-rke2, action: pull"
6月 11 15:00:20 worker01.zylab.com rancher-system-agent[1420]: time="2025-06-11T15:00:20+08:00" level=error msg="error while staging: all endpoints failed: GET https://ctrimages.zylab.com/v2/rancher/system-agent-installer-rke2/manifests/v1.32.3-rke2r1: UNAUTHORIZED: unauthorized to access repository: rancher/system-agent-installer-rke2, action: pull: unauthorized to access repository: rancher/system-agent-installer-rke2, action: pull: failed to get image ctrimages.zylab.com/rancher/system-agent-installer-rke2:v1.32.3-rke2r1"
6月 11 15:00:20 worker01.zylab.com rancher-system-agent[1420]: time="2025-06-11T15:00:20+08:00" level=error msg="error executing instruction 0: all endpoints failed: GET https://ctrimages.zylab.com/v2/rancher/system-agent-installer-rke2/manifests/v1.32.3-rke2r1: UNAUTHORIZED: unauthorized to access repository: rancher/system-agent-installer-rke2, action: pull: unauthorized to access repository: rancher/system-agent-installer-rke2, action: pull: failed to get image ctrimages.zylab.com/rancher/system-agent-installer-rke2:v1.32.3-rke2r1"
6月 11 15:00:21 worker01.zylab.com rancher-system-agent[1420]: time="2025-06-11T15:00:21+08:00" level=error msg="error loading x509 client cert/key for probe kube-apiserver (/var/lib/rancher/rke2/server/tls/client-kube-apiserver.crt//var/lib/rancher/rke2/server/tls/client-kube-apiserver.key): open /var/lib/rancher/rke2/server/tls/client-kube-apiserver.crt: no such file or directory"
6月 11 15:00:21 worker01.zylab.com rancher-system-agent[1420]: time="2025-06-11T15:00:21+08:00" level=error msg="error loading CA cert for probe (kube-apiserver) /var/lib/rancher/rke2/server/tls/server-ca.crt: open /var/lib/rancher/rke2/server/tls/server-ca.crt: no such file or directory"
6月 11 15:00:21 worker01.zylab.com rancher-system-agent[1420]: time="2025-06-11T15:00:21+08:00" level=error msg="error while appending ca cert to pool for probe kube-apiserver"
6月 11 15:00:21 worker01.zylab.com rancher-system-agent[1420]: time="2025-06-11T15:00:21+08:00" level=error msg="error loading CA cert for probe (kube-scheduler) /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: open /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: no such file or directory"
6月 11 15:00:21 worker01.zylab.com rancher-system-agent[1420]: time="2025-06-11T15:00:21+08:00" level=error msg="error while appending ca cert to pool for probe kube-scheduler"
6月 11 15:00:21 worker01.zylab.com rancher-system-agent[1420]: time="2025-06-11T15:00:21+08:00" level=error msg="error loading CA cert for probe (kube-controller-manager) /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt: open /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt: no such file or directory"
6月 11 15:00:21 worker01.zylab.com rancher-system-agent[1420]: time="2025-06-11T15:00:21+08:00" level=error msg="error while appending ca cert to pool for probe kube-controller-manager"
6月 11 15:00:21 worker01.zylab.com rancher-system-agent[1420]: time="2025-06-11T15:00:21+08:00" level=info msg="[K8s] updated plan secret fleet-default/custom-95d0315478c5-machine-plan with feedback"
6月 11 15:00:21 worker01.zylab.com rancher-system-agent[1420]: time="2025-06-11T15:00:21+08:00" level=error msg="error loading CA cert for probe (kube-controller-manager) /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt: open /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt: no such file or directory"
6月 11 15:00:21 worker01.zylab.com rancher-system-agent[1420]: time="2025-06-11T15:00:21+08:00" level=error msg="error while appending ca cert to pool for probe kube-controller-manager"
6月 11 15:00:21 worker01.zylab.com rancher-system-agent[1420]: time="2025-06-11T15:00:21+08:00" level=error msg="error loading CA cert for probe (kube-scheduler) /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: open /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: no such file or directory"
6月 11 15:00:21 worker01.zylab.com rancher-system-agent[1420]: time="2025-06-11T15:00:21+08:00" level=error msg="error while appending ca cert to pool for probe kube-scheduler"
6月 11 15:00:21 worker01.zylab.com rancher-system-agent[1420]: time="2025-06-11T15:00:21+08:00" level=error msg="error loading x509 client cert/key for probe kube-apiserver (/var/lib/rancher/rke2/server/tls/client-kube-apiserver.crt//var/lib/rancher/rke2/server/tls/client-kube-apiserver.key): open /var/lib/rancher/rke2/server/tls/client-kube-apiserver.crt: no such file or directory"
6月 11 15:00:21 worker01.zylab.com rancher-system-agent[1420]: time="2025-06-11T15:00:21+08:00" level=error msg="error loading CA cert for probe (kube-apiserver) /var/lib/rancher/rke2/server/tls/server-ca.crt: open /var/lib/rancher/rke2/server/tls/server-ca.crt: no such file or directory"
6月 11 15:00:21 worker01.zylab.com rancher-system-agent[1420]: time="2025-06-11T15:00:21+08:00" level=error msg="error while appending ca cert to pool for probe kube-apiserver"


“Failed to get image from endpoint: GET https://ctrimages.zylab.com/v2/rancher/system-agent-installer-rke2/manifests/v1.32.3-rke2r1: UNAUTHORIZED: unauthorized to access repository: rancher/system-agent-installer-rke2, action: pull: unauthorized to access repository: rancher/system-agent-installer-rke2, action: pull”

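The node being added cannot pull the image from the registry. It may be an HTTPS problem. Once this is fixed, the errors that follow it should go away; the missing certificate files are most likely a side effect, since they would only exist after RKE2 has been installed on the node.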

Thanks, it is solved.
Solution: add authentication credentials for the image registry in the cluster's registry settings in the Rancher UI.
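As a rough sketch of what that UI setting corresponds to, the cluster object (viewable through "Edit as YAML") would carry a registry entry along these lines. The Secret name `registry-auth-example` is hypothetical; it is assumed to be a basic-auth Secret in the same namespace as the cluster object (the logs above show `fleet-default`) holding the registry username and password.

```yaml
# Fragment of the provisioning.cattle.io/v1 Cluster object (sketch, not the poster's actual config)
spec:
  rkeConfig:
    registries:
      configs:
        ctrimages.zylab.com:
          # Hypothetical Secret name; the Secret holds the registry credentials
          authConfigSecretName: registry-auth-example
```

Rancher then writes the resulting registries.yaml to each node as part of the node plan, so the file does not need to be edited by hand.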

:+1:

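However, Rancher v2.11.0 has one issue, and I wonder whether you have run into it. I built the local cluster from two k3s hosts and deployed three Rancher pods. After installation, CPU usage on both k3s hosts stays high.
Each k3s host has 2 cores and 8 GB of memory.
I previously ran Rancher v2.5.5 without this problem, and Rancher v2.6.13 did not have it either.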