How to add nodes to a Kubernetes cluster that was provisioned through the Rancher UI, after the cluster has been deleted from the UI

Rancher Server setup

  • Rancher version:
  • Installation option (Docker install/Helm Chart):
    • If Helm Chart, provide the Local cluster type (RKE1, RKE2, k3s, EKS, etc.) and version:
  • Online or air-gapped deployment:

Downstream cluster information

  • Kubernetes version:
  • Cluster Type (Local/Downstream):
    • If Downstream, what type of cluster? (Custom/Imported or Hosted, etc.):

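User information

  • What is the role of the logged-in user? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom):
    • If Custom, the custom permission set: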

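Host OS:

Problem description:
A Kubernetes cluster was provisioned through the Rancher UI. After the cluster was deleted from the UI, how can nodes be added to that cluster?
Steps to reproduce:

Result:

Expected result:

Screenshots:

Additional context:

Logs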


It depends on what type of cluster the downstream cluster is.

The downstream cluster is a custom cluster provisioned with RKE2.

Then just follow the official RKE2 documentation to add nodes. See: Quick Start | RKE2

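Adding agent nodes by following the official documentation works fine, but when adding a server node (rke2-server), the service fails to start. The error output is: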
Oct 22 21:17:30 rancher-04 systemd[1]: Failed to start Rancher Kubernetes Engine v2 (server).
Oct 22 21:17:30 rancher-04 systemd[1]: Unit rke2-server.service entered failed state.
Oct 22 21:17:30 rancher-04 systemd[1]: rke2-server.service failed.
Oct 22 21:17:35 rancher-04 systemd[1]: rke2-server.service holdoff time over, scheduling restart.
Oct 22 21:17:35 rancher-04 systemd[1]: Starting Rancher Kubernetes Engine v2 (server)…
Oct 22 21:17:35 rancher-04 sh[19836]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Oct 22 21:17:35 rancher-04 sh[19836]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
Oct 22 21:17:35 rancher-04 rke2[19844]: time="2024-10-22T21:17:35+08:00" level=warning msg="not running in CIS mode"
Oct 22 21:17:35 rancher-04 rke2[19844]: time="2024-10-22T21:17:35+08:00" level=info msg="Applying Pod Security Admission Configuration"
Oct 22 21:17:35 rancher-04 rke2[19844]: time="2024-10-22T21:17:35+08:00" level=info msg="Starting rke2 v1.29.8+rke2r1 (92e522acb6da8cfb1319bd68b5e15f2e4e04f8ff)"
Oct 22 21:17:35 rancher-04 rke2[19844]: time="2024-10-22T21:17:35+08:00" level=info msg="Managed etcd cluster not yet initialized"
Oct 22 21:17:35 rancher-04 rke2[19844]: time="2024-10-22T21:17:35+08:00" level=fatal msg="starting kubernetes: preparing server: failed to validate server configuration: not authorized"

As shown above, the token in /etc/rancher/rke2/config.yaml was taken from /var/lib/rancher/rke2/server/node-token on the existing server node. I have not found how to resolve "failed to validate server configuration: not authorized". Any help is appreciated.

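I can't guess from this. Please list the detailed steps you took, along with the relevant configuration and environment information.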

1. Install RKE2 on the node to be added (from local air-gap artifacts):
[root@prod-rancher-04 rke2]# INSTALL_RKE2_ARTIFACT_PATH=/root/rke2 sh install.sh
[INFO] staging local checksums from /root/rke2/sha256sum-amd64.txt
[INFO] staging zst airgap image tarball from /root/rke2/rke2-images.linux-amd64.tar.zst
[INFO] staging tarball from /root/rke2/rke2.linux-amd64.tar.gz
[INFO] verifying airgap tarball
[INFO] installing airgap tarball to /var/lib/rancher/rke2/agent/images
[INFO] verifying tarball
[INFO] unpacking tarball file to /usr/local

2. Token from the first server node:
[root@prod-k8smaster-03 server]# cat /var/lib/rancher/rke2/server/node-token
K10a9df1f1dbbb431d01deab9f4ca8e93a9634d15e50f7493f2f046b6593245a51f::server:rxjscfp6bsvq6qdtthpzh8vlb77fkw86zp9mqgbpgxsdtpgwvzfpxr
3. Configuration file on the server node to be added:
[root@prod-rancher-04 rke2]# cat /etc/rancher/rke2/config.yaml
server: https://172.25.2.75:9345
token: K10a9df1f1dbbb431d01deab9f4ca8e93a9634d15e50f7493f2f046b6593245a51f::server:rxjscfp6bsvq6qdtthpzh8vlb77fkw86zp9mqgbpgxsdtpgwvzfpxr
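
The token in the config above has a visible structure, and it can help to look at its pieces separately when debugging. The following is a minimal shell sketch, not an RKE2 tool: split_token is a hypothetical helper, and the layout it assumes (a leading K10... part, then "::", a user name, ":", and a secret) is only what can be read off the token pasted in this thread.

```shell
#!/bin/sh
# split_token (hypothetical helper): print the parts of a join token.
# Assumed layout, taken from the token shown in this thread:
#   <K10...part>::<user>:<secret>
split_token() {
    token=$1
    case $token in
        *::*:*)
            ca_part=${token%%::*}     # everything before the first "::"
            cred_part=${token#*::}    # everything after the first "::"
            printf 'ca_hash=%s\n' "$ca_part"
            printf 'user=%s\n' "${cred_part%%:*}"
            printf 'password=%s\n' "${cred_part#*:}"
            ;;
        *)
            # No "::" separator: treat the whole string as a bare secret.
            printf 'ca_hash=\nuser=\npassword=%s\n' "$token"
            ;;
    esac
}
```

Calling split_token with the value of the token key would print the three parts on separate lines, which makes it easier to compare them with what the existing server expects.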

4. Start the rke2-server service:
[root@prod-rancher-04 bin]# ./rke2 server --config /etc/rancher/rke2/config.yaml
WARN[0000] not running in CIS mode
INFO[0000] Applying Pod Security Admission Configuration
INFO[0000] Starting rke2 v1.29.8+rke2r1 (92e522acb6da8cfb1319bd68b5e15f2e4e04f8ff)
INFO[0000] Managed etcd cluster not yet initialized
FATA[0000] starting kubernetes: preparing server: failed to validate server configuration: not authorized

5. Logs on the existing server node:
Oct 23 21:04:26 prod-k8smaster-03 rke2[29211]: time="2024-10-23T21:04:26+08:00" level=error msg="Failed to authenticate request from 172.25.2.79:40678: invalid username/password combination"
Oct 23 21:04:26 prod-k8smaster-03 rke2[29211]: time="2024-10-23T21:04:26+08:00" level=error msg="Sending HTTP 401 response to 172.25.2.79:40678: not authorized"
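
When several nodes are joining at once, it can be useful to see which clients the existing server is rejecting. Below is a rough shell sketch under stated assumptions: failed_join_clients is a hypothetical helper name, and the pattern it matches is simply the "Failed to authenticate request from <ip>:<port>:" message shown in the log above, so it may miss lines that are worded differently.

```shell
#!/bin/sh
# failed_join_clients (hypothetical helper): read log lines on stdin and
# print the unique client IPs whose requests failed authentication.
failed_join_clients() {
    sed -n 's/.*Failed to authenticate request from \([0-9.]*\):[0-9]*: .*/\1/p' | sort -u
}
```

On the existing server this could be fed from the journal, for example: journalctl -u rke2-server --no-pager | failed_join_clients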

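6. Background for the steps above: the cluster was deployed as a custom RKE2 cluster from the Rancher UI. Because the cluster was deleted from the UI, server nodes can no longer be registered there, so I am using this method to add a server node and start the rke2-server service. (Note: agent nodes join successfully this way.)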

Try changing it to the following:

server: https://172.25.2.75:9345
token: rxjscfp6bsvq6qdtthpzh8vlb77fkw86zp9mqgbpgxsdtpgwvzfpxr

Oct 25 10:45:13 prod-rancher-04 systemd[1]: Starting Rancher Kubernetes Engine v2 (server)…
Oct 25 10:45:13 prod-rancher-04 sh[18870]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Oct 25 10:45:13 prod-rancher-04 sh[18870]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
Oct 25 10:45:13 prod-rancher-04 rke2[18877]: time="2024-10-25T10:45:13+08:00" level=warning msg="not running in CIS mode"
Oct 25 10:45:13 prod-rancher-04 rke2[18877]: time="2024-10-25T10:45:13+08:00" level=info msg="Applying Pod Security Admission Configuration"
Oct 25 10:45:13 prod-rancher-04 rke2[18877]: time="2024-10-25T10:45:13+08:00" level=info msg="Starting rke2 v1.29.8+rke2r1 (92e522acb6da8cfb1319bd68b5e15f2e4e04f8ff)"
Oct 25 10:45:13 prod-rancher-04 rke2[18877]: time="2024-10-25T10:45:13+08:00" level=warning msg="Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash. Use the full token from the server's node-token file to enable Cluster CA validation."
Oct 25 10:45:13 prod-rancher-04 rke2[18877]: time="2024-10-25T10:45:13+08:00" level=info msg="Managed etcd cluster not yet initialized"
Oct 25 10:45:13 prod-rancher-04 rke2[18877]: time="2024-10-25T10:45:13+08:00" level=warning msg="Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash. Use the full token from the server's node-token file to enable Cluster CA validation."
Oct 25 10:45:13 prod-rancher-04 rke2[18877]: time="2024-10-25T10:45:13+08:00" level=fatal msg="starting kubernetes: preparing server: failed to validate server configuration: not authorized"
Oct 25 10:45:13 prod-rancher-04 systemd[1]: rke2-server.service: main process exited, code=exited, status=1/FAILURE
Oct 25 10:45:13 prod-rancher-04 systemd[1]: Failed to start Rancher Kubernetes Engine v2 (server).
Oct 25 10:45:13 prod-rancher-04 systemd[1]: Unit rke2-server.service entered failed state.
Oct 25 10:45:13 prod-rancher-04 systemd[1]: rke2-server.service failed.

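I tried this. The log now warns that the token is incomplete (it does not include a CA hash), and startup still fails with the same "not authorized" error.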
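
Since two token forms have now been tried, a quick check of which form a given config file actually contains may save a restart cycle. This is only a sketch under assumptions: config_token and token_kind are hypothetical helpers, the "full" form is recognised purely by the K10...:: prefix seen earlier in this thread, and the sed expression handles only a plain unquoted top-level "token:" line, not general YAML.

```shell
#!/bin/sh
# config_token (hypothetical helper): print the value of the first
# top-level "token:" line in the given config file.
config_token() {
    sed -n 's/^token:[[:space:]]*//p' "$1" | head -n 1
}

# token_kind (hypothetical helper): print "full" if the token carries the
# K10...:: prefix seen in this thread, otherwise "short".
token_kind() {
    case $1 in
        K10*::*:*) echo full ;;
        *)         echo short ;;
    esac
}
```

For example, token_kind "$(config_token /etc/rancher/rke2/config.yaml)" would report which form the node is currently configured with. It says nothing about whether the secret itself is the one the existing server accepts for server nodes.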