Importing a k8s cluster into Rancher fails

Rancher Server setup

  • Rancher version: rancher 2.5.17
  • Installation option (Docker install/Helm Chart):
    • If installed via Helm Chart, provide the type (RKE1, RKE2, k3s, EKS, etc.) and version of the Local cluster:
  • Online or air-gapped deployment:
    I installed Rancher with docker run:
    #!/bin/bash
    # Run rancher

    docker run -itd --name rancher \
               -p 80:80 -p 443:443 \
               --restart=unless-stopped \
               -v /data/rancher-server:/var/lib/rancher \
               -v /var/log/rancher/auditlog:/var/log/auditlog \
               -e AUDIT_LEVEL=3 \
               --privileged \
               rancher/rancher:v2.5.17 --no-cacerts

[root@sealos-node02 rancher]# sh run-rancher.sh

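Downstream cluster information

  • Kubernetes version: v1.21.0
  • Cluster Type (Local/Downstream):
    • If Downstream, what type of cluster? (custom/imported or hosted, etc.):

A k8s cluster that I deployed separately and then imported into Rancher.

User information

  • What is the role of the logged-in user? (admin/cluster owner/cluster member/project owner/project member/custom):
    • If custom, the custom permission set: admin

Host operating system:

Problem description:

Steps to reproduce:
Run the cluster import:

2. Create the clusterrolebinding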
[root@localhost sealos]# grep -i user /etc/kubernetes/kubelet.conf
user: system:node:sealos.hub
users:
user:
[root@localhost sealos]# kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user system:node:sealos.hub

3. Install the agent
[root@localhost sealos]# curl --insecure -sfL https://192.168.92.32/v3/import/8cqtdjzgddd2hr45546cbxs5cp5s9ks2kl4z5s5ghsp7kjbggcdsw5_c-lkhrd.yaml | kubectl apply -f -
clusterrole.rbac.authorization.k8s.io/proxy-clusterrole-kubeapiserver created
clusterrolebinding.rbac.authorization.k8s.io/proxy-role-binding-kubernetes-master created
namespace/cattle-system created
serviceaccount/cattle created
clusterrolebinding.rbac.authorization.k8s.io/cattle-admin-binding created
secret/cattle-credentials-4375f45 created
clusterrole.rbac.authorization.k8s.io/cattle-admin created
deployment.apps/cattle-cluster-agent created

Result:

[root@localhost sealos]# kubectl logs -f cattle-cluster-agent-689fd565cb-rhgxv -n cattle-system -f
INFO: Environment: CATTLE_ADDRESS=100.123.220.4 CATTLE_CA_CHECKSUM= CATTLE_CLUSTER=true CATTLE_CLUSTER_REGISTRY= CATTLE_FEATURES= CATTLE_INGRESS_IP_DOMAIN=sslip.io CATTLE_INSTALL_UUID=be11a475-b7da-4338-8edf-74da767066a5 CATTLE_INTERNAL_ADDRESS= CATTLE_IS_RKE=false CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-689fd565cb-rhgxv CATTLE_SERVER=https://192.168.92.32 CATTLE_SERVER_VERSION=v2.5.17
INFO: Using resolv.conf: nameserver 10.96.0.10 search cattle-system.svc.cluster.local svc.cluster.local cluster.local hub options ndots:5
INFO: https://192.168.92.32/ping is accessible
time="2023-09-08T09:05:42Z" level=info msg="Listening on /tmp/log.sock"
time="2023-09-08T09:05:42Z" level=info msg="Rancher agent version v2.5.17 is starting"
time="2023-09-08T09:05:42Z" level=info msg="Certificate details from https://192.168.92.32"
time="2023-09-08T09:05:42Z" level=info msg="Certificate #0 (https://192.168.92.32)"
time="2023-09-08T09:05:42Z" level=info msg="Subject: CN=dynamic,O=dynamic"
time="2023-09-08T09:05:42Z" level=info msg="Issuer: CN=dynamiclistener-ca,O=dynamiclistener-org"
time="2023-09-08T09:05:42Z" level=info msg="IsCA: false"
time="2023-09-08T09:05:42Z" level=info msg="DNS Names: [localhost rancher.cattle-system]"
time="2023-09-08T09:05:42Z" level=info msg="IPAddresses: [127.0.0.1 172.17.0.2 192.168.92.32]"
time="2023-09-08T09:05:42Z" level=info msg="NotBefore: 2023-09-01 08:40:58 +0000 UTC"
time="2023-09-08T09:05:42Z" level=info msg="NotAfter: 2024-08-31 08:45:53 +0000 UTC"
time="2023-09-08T09:05:42Z" level=info msg="SignatureAlgorithm: ECDSA-SHA256"
time="2023-09-08T09:05:42Z" level=info msg="PublicKeyAlgorithm: ECDSA"
time="2023-09-08T09:05:42Z" level=fatal msg="Certificate chain is not complete, please check if all needed intermediate certificates are included in the server certificate (in the correct order) and if the cacerts setting in Rancher either contains the correct CA certificate (in the case of using self signed certificates) or is empty (in the case of using a certificate signed by a recognized CA). Certificate information is displayed above. error: Get \"https://192.168.92.32\": x509: certificate signed by unknown authority"

Expected result:

Screenshots:

Additional context:



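This may be caused by a mismatch between the Rancher version and the K8s version you are using. You can check the corresponding support matrix; see:

Does a given Rancher version only support the four default k8s versions?
When creating a cluster in the Rancher UI, we pick the k8s version from a drop-down. Is it the case that only the K8s versions listed in that drop-down are supported?

Does the version of an imported k8s cluster have to match the Rancher version?

I have upgraded Rancher to v2.6.0. Importing k8s 1.21.0 still fails.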

[root@localhost sealos]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
sealos.hub Ready control-plane,master 59m v1.21.0

rancher: v2.6.0

[root@localhost sealos]# kubectl logs cattle-cluster-agent-6dcbfff7cb-98thw -n cattle-system -f
INFO: Environment: CATTLE_ADDRESS=100.123.220.5 CATTLE_CA_CHECKSUM= CATTLE_CLUSTER=true CATTLE_CLUSTER_AGENT_PORT=tcp://10.96.3.24:80 CATTLE_CLUSTER_AGENT_PORT_443_TCP=tcp://10.96.3.24:443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_ADDR=10.96.3.24 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PORT=443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_PORT_80_TCP=tcp://10.96.3.24:80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_ADDR=10.96.3.24 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PORT=80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_SERVICE_HOST=10.96.3.24 CATTLE_CLUSTER_AGENT_SERVICE_PORT=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTP=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTPS_INTERNAL=443 CATTLE_INTERNAL_ADDRESS= CATTLE_IS_RKE=false CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-6dcbfff7cb-98thw CATTLE_SERVER=https://192.168.92.32
INFO: Using resolv.conf: nameserver 10.96.0.10 search cattle-system.svc.cluster.local svc.cluster.local cluster.local hub options ndots:5
INFO: https://192.168.92.32/ping is accessible
time="2023-09-11T03:39:30Z" level=info msg="Listening on /tmp/log.sock"
time="2023-09-11T03:39:30Z" level=info msg="Rancher agent version v2.6.0 is starting"
time="2023-09-11T03:39:30Z" level=info msg="Certificate details from https://192.168.92.32"
time="2023-09-11T03:39:30Z" level=info msg="Certificate #0 (https://192.168.92.32)"
time="2023-09-11T03:39:30Z" level=info msg="Subject: CN=dynamic,O=dynamic"
time="2023-09-11T03:39:30Z" level=info msg="Issuer: CN=dynamiclistener-ca,O=dynamiclistener-org"
time="2023-09-11T03:39:30Z" level=info msg="IsCA: false"
time="2023-09-11T03:39:30Z" level=info msg="DNS Names: [localhost rancher.cattle-system]"
time="2023-09-11T03:39:30Z" level=info msg="IPAddresses: [127.0.0.1 172.17.0.2 192.168.92.32]"
time="2023-09-11T03:39:30Z" level=info msg="NotBefore: 2023-09-01 08:40:58 +0000 UTC"
time="2023-09-11T03:39:30Z" level=info msg="NotAfter: 2024-09-10 03:18:17 +0000 UTC"
time="2023-09-11T03:39:30Z" level=info msg="SignatureAlgorithm: ECDSA-SHA256"
time="2023-09-11T03:39:30Z" level=info msg="PublicKeyAlgorithm: ECDSA"
time="2023-09-11T03:39:30Z" level=fatal msg="Certificate chain is not complete, please check if all needed intermediate certificates are included in the server certificate (in the correct order) and if the cacerts setting in Rancher either contains the correct CA certificate (in the case of using self signed certificates) or is empty (in the case of using a certificate signed by a recognized CA). Certificate information is displayed above. error: Get \"https://192.168.92.32\": x509: certificate signed by unknown authority"

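In theory, other versions can also be used, but problems may occur; the versions listed in the support matrix are better supported.

This is a certificate problem. Why did you add the --no-cacerts flag? Unless there is a special reason, clear the data directory, remove this flag, and run it again.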
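
For anyone checking the same thing: in both agent logs above, the environment line shows `CATTLE_CA_CHECKSUM=` with no value. As far as I understand it, that means the agent was not given a Rancher CA to pin and validates the server against the CA bundle in its own image. The served certificate is issued by `dynamiclistener-ca`, which no public bundle contains, so that would explain the x509 error. A minimal sketch for pulling that value out of the agent log; the function names are hypothetical helpers, not part of Rancher:

```shell
#!/bin/sh
# Read an agent "INFO: Environment: ..." line on stdin and print the value
# of CATTLE_CA_CHECKSUM (prints an empty line when it is set to nothing).
ca_checksum_from_env_line() {
    tr ' ' '\n' | sed -n 's/^CATTLE_CA_CHECKSUM=//p' | head -n 1
}

# Describe what an empty or non-empty checksum implies (my reading, an assumption).
describe_trust_mode() {
    if [ -z "$1" ]; then
        echo "empty: agent has no Rancher CA to pin and relies on its system trust store"
    else
        echo "set: agent is expected to verify the Rancher CA against this checksum"
    fi
}
```

Usage would be something like `kubectl logs <agent-pod> -n cattle-system | grep 'INFO: Environment:' | ca_checksum_from_env_line`, then passing the result to `describe_trust_mode`.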


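That works. I removed the --no-cacerts flag and the import succeeded.
But in production I use my own TLS certificate. Is there any way to solve this?
This is how I run it in production: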

#!/bin/bash
# Run rancher

docker run -itd --name rancher \
           -p 80:80 \
           -p 443:443 \
           --restart=unless-stopped \
           -v ${PWD}/rancher-data:/var/lib/rancher \
           -v /var/log/rancher/auditlog:/var/log/auditlog \
           -v ${PWD}/tls.crt:/etc/rancher/ssl/cert.pem \
           -v ${PWD}/tls.key:/etc/rancher/ssl/key.pem \
           -e AUDIT_LEVEL=3 \
           --privileged \
           rancher/rancher:v2.5.17 --no-cacerts 

The TLS certificate is one that browsers trust.

Is there any way to solve this when I import a k8s cluster?
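
One thing that may be worth ruling out here: the agent's fatal message asks whether all needed intermediate certificates are included in the server certificate. A browser trusting the certificate does not necessarily prove that, since browsers may obtain missing intermediates on their own while the agent may not (this is my assumption about the difference). A rough sketch that counts the certificates in the PEM file mounted as cert.pem; `count_pem_certs` is a hypothetical helper name:

```shell
#!/bin/sh
# Print how many PEM certificate blocks the given file contains.
# A count of 1 suggests only the leaf certificate is present and the
# intermediate(s) from the CA may still need to be appended after it.
count_pem_certs() {
    grep -c -e '-----BEGIN CERTIFICATE-----' "$1"
}
```

For example, `count_pem_certs ./tls.crt` on the file from the script above. This only counts blocks; it does not check their order or whether they actually chain to each other.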

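Where did your certificate come from?

My company bought it.

Try going to Global Settings -> server-url, changing the Rancher address to the domain name that matches your certificate, and then importing again.

In production, Rancher's domain name is the one the certificate was issued for.

What I mean is that in production Rancher is also started with the "--no-cacerts" flag, and importing a cluster fails there too.

In the test environment, I removed the "--no-cacerts" flag when starting Rancher, and the import succeeded.

The log reports an incomplete certificate chain. Regardless of whether it is production or test, since you install Rancher with a certificate, if you have not added the corresponding allowed IPs to the certificate, you should use the domain name to access Rancher and to import downstream clusters; otherwise it will certainly fail.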
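
The agent log already prints which names and IPs the served certificate covers (the "DNS Names" and "IPAddresses" lines), so one way to check this point is to compare them with the host part of CATTLE_SERVER. A minimal sketch, assuming the log format shown earlier in this thread; `san_covers_host` is a hypothetical helper, and it does exact string matching only, so wildcard entries are not handled:

```shell
#!/bin/sh
# Read an agent log on stdin and succeed (exit 0) if the given host appears
# in the "DNS Names: [...]" or "IPAddresses: [...]" lists of the certificate.
san_covers_host() {
    host="$1"
    grep -E '(DNS Names|IPAddresses): \[' \
        | sed -e 's/.*\[//' -e 's/\].*//' \
        | tr ' ' '\n' \
        | grep -Fxq -e "$host"
}
```

For example: `kubectl logs <agent-pod> -n cattle-system | san_covers_host 192.168.92.32 && echo covered`. If the address used for the import is not listed, switching server-url and the import command to a name the certificate does cover is the thing to try.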