Air-gapped Helm CLI install of Rancher 2.8.5 on an RKE2-based Kubernetes cluster: node creation fails when provisioning a new cluster

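I. Deploy Rancher 2.8.5
Environment:
IP: 192.168.8.30 (single-node RKE2), used to host Rancher 2.8.5. OS: CentOS 7.9
IP: 192.168.8.31, the node to be joined to the custom cluster created by Rancher. OS: CentOS 7.9

A Harbor image registry is deployed separately on the internal network and is served over plain HTTP. The Rancher 2.8.5 images were collected and pushed to this private registry (steps omitted).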

1.1 Install RKE2

mkdir -p /etc/rancher/rke2/
cat >/etc/rancher/rke2/config.yaml <<EOF
node-name: rancher-30
tls-san: 192.168.8.30
system-default-registry: "registry.cn-hangzhou.aliyuncs.com"
kube-proxy-arg:
  - "proxy-mode=ipvs"
  - "ipvs-strict-arp=true"
EOF
cat /etc/rancher/rke2/config.yaml

Install RKE2

curl -sfL http://rancher-mirror.rancher.cn/rke2/install.sh | INSTALL_RKE2_MIRROR=cn INSTALL_RKE2_VERSION=v1.28.15+rke2r1 sh -

Start the service

systemctl enable --now rke2-server.service  # enable at boot and start immediately

Environment variables

# vim /etc/profile.d/rke2.sh
export PATH=$PATH:/var/lib/rancher/rke2/bin/
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml

source /etc/profile

1.2 Install Rancher 2.8.5

Install Helm and add the Helm chart repositories

mkdir tools
cd tools
wget https://get.helm.sh/helm-v3.14.2-linux-amd64.tar.gz
tar -xzvf helm-v3.14.2-linux-amd64.tar.gz
cp linux-amd64/helm /usr/local/bin/
helm version
rm -rf linux-amd64
# Latest: 
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest

# Stable:
helm repo add rancher-stable https://releases.rancher.com/server-charts/stable

# Output:
"rancher-stable" has been added to your repositories

# Update the Helm repositories
helm repo update
# List the Helm repositories
helm repo list

helm fetch rancher-stable/rancher --version=v2.8.5

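Generate a self-signed SSL certificate in one step

The script comes from: "Generate a self-signed SSL certificate" | Rancher documentation

Generate the certificate files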

bash create_self-signed-cert.sh --ssl-domain=rancher-30.lalalajr.com --ssl-size=2048 --ssl-date=36500

Add the TLS secrets

# Create the namespace: cattle-system
cd
kubectl create namespace cattle-system

## Create the secret for the self-signed Ingress certificate
kubectl -n cattle-system create secret tls tls-rancher-ingress \
  --cert=/root/ssl/tls.crt \
  --key=/root/ssl/tls.key

## Create the secret for the self-signed CA certificate
kubectl -n cattle-system create secret generic tls-ca \
  --from-file=cacerts.pem=/root/ssl/cacerts.pem

Install Rancher

helm install rancher ./rancher-2.8.5.tgz \
 --namespace cattle-system \
 --set hostname=rancher-30.lalalajr.com \
 --set rancherImage=registry-70.lalalajr.com/rancher/rancher \
 --set ingress.tls.source=secret \
 --set privateCA=true \
 --set systemDefaultRegistry=registry-70.lalalajr.com \
 --set useBundledSystemChart=true 

Record configured on the DNS server:

192.168.8.30 rancher-30.lalalajr.com

Open rancher-30.lalalajr.com in a browser
and set the admin password

II. Create a custom cluster with Rancher 2.8.5

Create the cluster on existing nodes using RKE2/K3s


Node error: rkecontrolplane was already initialized but no etcd machines exist that have plans, indicating the etcd plane has been entirely replaced. Restoration from etcd snapshot is required.


Node k8s-master-31 is joined to this cluster
Private registry configuration on the node:

[root@k8s-master-31 ~]# cat /etc/rancher/rke2/registries.yaml
mirrors:
  registry-70.lalalajr.com:
    endpoint:
      - "http://registry-70.lalalajr.com"
configs:
  "http://registry-70.lalalajr.com":
    auth:
      username: rke2
      password: B64O1ed7POH[Gc63Y3oS
[root@k8s-master-31 ~]# 
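
For comparison, the registry examples in the K3s/RKE2 documentation key the `configs` section by the bare registry host (optionally with a port) rather than by a URL with a scheme. The sketch below writes a file in that shape. It is only an illustration: `write_registries_yaml` is a hypothetical helper, the host and credentials in the usage line are placeholders, and whether the scheme-prefixed key shown above is honored was not verified here. Note also that the agent log later in this post reads its registry settings from /etc/rancher/agent/registries.yaml, so the file on the node may not be the only one that matters.

```shell
#!/bin/sh
# Hypothetical helper (not part of RKE2 or Rancher): write a registries.yaml
# whose "configs" entry is keyed by the bare registry host, with the plain-HTTP
# endpoint listed only under "mirrors".
# Arguments: output path, registry host, username, password.
# Assumption: the password contains no double quote or backslash, since it is
# placed inside a double-quoted YAML string without escaping.
write_registries_yaml() {
  out="$1"; host="$2"; user="$3"; pass="$4"
  cat >"$out" <<EOF
mirrors:
  "$host":
    endpoint:
      - "http://$host"
configs:
  "$host":
    auth:
      username: "$user"
      password: "$pass"
EOF
}

# Example (placeholder values, written to a scratch file for review):
#   write_registries_yaml /tmp/registries.yaml registry.example.internal demo 'changeme'
```

Writing to a scratch path first makes it possible to diff the result against the live file before replacing anything.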

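Check the rancher-system-agent service log on the downstream cluster node, which should contain useful error messages, and check the Rancher server log as well.

1. rancher-system-agent service log on the downstream cluster node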

[root@k8s-master-31 ~]# systemctl status rancher-system-agent
● rancher-system-agent.service - Rancher System Agent
   Loaded: loaded (/etc/systemd/system/rancher-system-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2024-11-29 17:09:09 CST; 3 days ago
     Docs: https://www.rancher.com
 Main PID: 2269 (rancher-system-)
   CGroup: /system.slice/rancher-system-agent.service
           └─2269 /usr/local/bin/rancher-system-agent sentinel

Dec 01 01:47:23 k8s-master-31 rancher-system-agent[2269]: Trace[1115559670]: [12m0.10976892s] ...ND
Dec 01 01:47:23 k8s-master-31 rancher-system-agent[2269]: E1201 01:47:23.296855    2269 reflecto...
Dec 01 01:58:29 k8s-master-31 rancher-system-agent[2269]: W1201 01:58:29.951377    2269 reflecto...
Dec 01 02:00:30 k8s-master-31 rancher-system-agent[2269]: W1201 02:00:30.381867    2269 reflecto...
Dec 01 03:03:02 k8s-master-31 rancher-system-agent[2269]: W1201 03:03:02.330587    2269 reflecto...
Dec 01 03:03:02 k8s-master-31 rancher-system-agent[2269]: I1201 03:03:02.330679    2269 trace....):
Dec 01 03:03:02 k8s-master-31 rancher-system-agent[2269]: Trace[1101545984]: ---"Objects listed"...
Dec 01 03:03:02 k8s-master-31 rancher-system-agent[2269]: Trace[1101545984]: [1h2m30.574663985...ND
Dec 01 03:03:02 k8s-master-31 rancher-system-agent[2269]: E1201 03:03:02.330702    2269 reflecto...
Dec 01 03:06:20 k8s-master-31 rancher-system-agent[2269]: W1201 03:06:20.297402    2269 reflecto...
Hint: Some lines were ellipsized, use -l to show in full.
[root@k8s-master-31 ~]# journalctl -fu rancher-system-agent
-- Logs begin at Fri 2024-11-29 23:18:12 CST. --
Dec 01 01:47:23 k8s-master-31 rancher-system-agent[2269]: Trace[1115559670]: [12m0.10976892s] [12m0.10976892s] END
Dec 01 01:47:23 k8s-master-31 rancher-system-agent[2269]: E1201 01:47:23.296855    2269 reflector.go:148] pkg/mod/github.com/rancher/client-go@v1.27.4-rancher1/tools/cache/reflector.go:231: Failed to watch *v1.Secret: failed to list *v1.Secret: an error on the server ("<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body>\r\n<center><h1>502 Bad Gateway</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>") has prevented the request from succeeding (get secrets.meta.k8s.io)
Dec 01 01:58:29 k8s-master-31 rancher-system-agent[2269]: W1201 01:58:29.951377    2269 reflector.go:456] pkg/mod/github.com/rancher/client-go@v1.27.4-rancher1/tools/cache/reflector.go:231: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: stream error: stream ID 217; INTERNAL_ERROR; received from peer") has prevented the request from succeeding
Dec 01 02:00:30 k8s-master-31 rancher-system-agent[2269]: W1201 02:00:30.381867    2269 reflector.go:456] pkg/mod/github.com/rancher/client-go@v1.27.4-rancher1/tools/cache/reflector.go:231: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: stream error: stream ID 221; INTERNAL_ERROR; received from peer") has prevented the request from succeeding
Dec 01 03:03:02 k8s-master-31 rancher-system-agent[2269]: W1201 03:03:02.330587    2269 reflector.go:533] pkg/mod/github.com/rancher/client-go@v1.27.4-rancher1/tools/cache/reflector.go:231: failed to list *v1.Secret: an error on the server ("<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body>\r\n<center><h1>502 Bad Gateway</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>") has prevented the request from succeeding (get secrets.meta.k8s.io)
Dec 01 03:03:02 k8s-master-31 rancher-system-agent[2269]: I1201 03:03:02.330679    2269 trace.go:219] Trace[1101545984]: "Reflector ListAndWatch" name:pkg/mod/github.com/rancher/client-go@v1.27.4-rancher1/tools/cache/reflector.go:231 (01-Dec-2024 02:00:31.755) (total time: 3750574ms):
Dec 01 03:03:02 k8s-master-31 rancher-system-agent[2269]: Trace[1101545984]: ---"Objects listed" error:an error on the server ("<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body>\r\n<center><h1>502 Bad Gateway</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>") has prevented the request from succeeding (get secrets.meta.k8s.io) 3750574ms (03:03:02.330)
Dec 01 03:03:02 k8s-master-31 rancher-system-agent[2269]: Trace[1101545984]: [1h2m30.574663985s] [1h2m30.574663985s] END
Dec 01 03:03:02 k8s-master-31 rancher-system-agent[2269]: E1201 03:03:02.330702    2269 reflector.go:148] pkg/mod/github.com/rancher/client-go@v1.27.4-rancher1/tools/cache/reflector.go:231: Failed to watch *v1.Secret: failed to list *v1.Secret: an error on the server ("<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body>\r\n<center><h1>502 Bad Gateway</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>") has prevented the request from succeeding (get secrets.meta.k8s.io)
Dec 01 03:06:20 k8s-master-31 rancher-system-agent[2269]: W1201 03:06:20.297402    2269 reflector.go:456] pkg/mod/github.com/rancher/client-go@v1.27.4-rancher1/tools/cache/reflector.go:231: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: stream error: stream ID 407; INTERNAL_ERROR; received from peer") has prevented the request from succeeding
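
The excerpt above shows two recurring symptoms: `502 Bad Gateway` responses from nginx, and watch streams ending with `INTERNAL_ERROR`. When the journal is long, a small filter can show how often each one occurs. This is a minimal sketch; `summarize_agent_log` is a hypothetical helper name, and it counts matching lines only.

```shell
#!/bin/sh
# Hypothetical helper: read rancher-system-agent log lines on stdin and count
# how many lines mention each of the two symptoms seen in the excerpt above.
summarize_agent_log() {
  awk '
    /502 Bad Gateway/                { bad_gateway++ }
    /stream error: .*INTERNAL_ERROR/ { stream_err++ }
    END {
      printf "502_bad_gateway=%d stream_internal_error=%d\n", bad_gateway, stream_err
    }
  '
}

# Example usage on the node:
#   journalctl -u rancher-system-agent --no-pager | summarize_agent_log
```

A high count of 502 lines points at the path between the node and the Rancher ingress rather than at the agent itself, which is why the Rancher server log is the next thing to read.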

2. Rancher server log

[root@rancher-30 ~]# kubectl -n cattle-system logs -f rancher-6bbb576fc7-q9td6
Doing /etc/rancher/ssl
2024/11/30 19:00:41 [INFO] Rancher version v2.8.5 (7af1354e9) is starting
2024/11/30 19:00:41 [INFO] Rancher arguments {ACMEDomains:[] AddLocal:true Embedded:false BindHost: HTTPListenPort:80 HTTPSListenPort:443 K8sMode:auto Debug:false Trace:false NoCACerts:false AuditLogPath:/var/log/auditlog/rancher-api-audit.log AuditLogMaxage:10 AuditLogMaxsize:100 AuditLogMaxbackup:10 AuditLevel:0 Features: ClusterRegistry:}
2024/11/30 19:00:41 [INFO] Listening on /tmp/log.sock
2024/11/30 19:00:41 [INFO] Running in clustered mode with ID 10.42.0.21, monitoring endpoint cattle-system/rancher
2024/11/30 19:00:41 [INFO] Applying CRD features.management.cattle.io
2024/11/30 19:00:42 [INFO] Updating embedded CRD clusterroletemplatebindings.management.cattle.io
2024/11/30 19:00:42 [INFO] Updating embedded CRD globalroles.management.cattle.io
2024/11/30 19:00:42 [INFO] Updating embedded CRD globalrolebindings.management.cattle.io
2024/11/30 19:00:42 [INFO] Updating embedded CRD projects.management.cattle.io
2024/11/30 19:00:42 [INFO] Updating embedded CRD projectroletemplatebindings.management.cattle.io
2024/11/30 19:00:42 [INFO] Updating embedded CRD roletemplates.management.cattle.io
2024/11/30 19:00:43 [INFO] Applying CRD navlinks.ui.cattle.io
2024/11/30 19:00:43 [INFO] Applying CRD podsecurityadmissionconfigurationtemplates.management.cattle.io
2024/11/30 19:00:43 [INFO] Applying CRD clusters.management.cattle.io
2024/11/30 19:00:43 [INFO] Applying CRD apiservices.management.cattle.io
2024/11/30 19:00:43 [INFO] Applying CRD clusterregistrationtokens.management.cattle.io
2024/11/30 19:00:43 [INFO] Applying CRD settings.management.cattle.io
2024/11/30 19:00:43 [INFO] Applying CRD preferences.management.cattle.io
2024/11/30 19:00:43 [INFO] Applying CRD features.management.cattle.io
2024/11/30 19:00:43 [INFO] Applying CRD clusterrepos.catalog.cattle.io
2024/11/30 19:00:43 [INFO] Applying CRD operations.catalog.cattle.io
2024/11/30 19:00:43 [INFO] Applying CRD apps.catalog.cattle.io
2024/11/30 19:00:44 [INFO] Applying CRD fleetworkspaces.management.cattle.io
2024/11/30 19:00:44 [INFO] Applying CRD managedcharts.management.cattle.io
2024/11/30 19:00:44 [INFO] Applying CRD clusters.provisioning.cattle.io
2024/11/30 19:00:44 [INFO] Applying CRD clusters.provisioning.cattle.io
2024/11/30 19:00:44 [INFO] Applying CRD rkeclusters.rke.cattle.io
2024/11/30 19:00:44 [INFO] Applying CRD rkecontrolplanes.rke.cattle.io
2024/11/30 19:00:45 [INFO] Applying CRD rkebootstraps.rke.cattle.io
2024/11/30 19:00:45 [INFO] Applying CRD rkebootstraptemplates.rke.cattle.io
2024/11/30 19:00:45 [INFO] Applying CRD rkecontrolplanes.rke.cattle.io
2024/11/30 19:00:45 [INFO] Applying CRD custommachines.rke.cattle.io
2024/11/30 19:00:45 [INFO] Applying CRD etcdsnapshots.rke.cattle.io
2024/11/30 19:00:45 [INFO] Applying CRD clusters.cluster.x-k8s.io
2024/11/30 19:00:45 [INFO] Applying CRD machinedeployments.cluster.x-k8s.io
2024/11/30 19:00:45 [INFO] Applying CRD machinehealthchecks.cluster.x-k8s.io
2024/11/30 19:00:45 [INFO] Applying CRD machines.cluster.x-k8s.io
2024/11/30 19:00:45 [INFO] Applying CRD machinesets.cluster.x-k8s.io
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=Token controller
2024/11/30 19:01:07 [INFO] Starting rbac.authorization.k8s.io/v1, Kind=ClusterRole controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=MultiClusterApp controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=NodeTemplate controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=Setting controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=Node controller
2024/11/30 19:01:07 [INFO] Starting /v1, Kind=Namespace controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=RoleTemplate controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=APIService controller
2024/11/30 19:01:07 [INFO] Starting rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding controller
2024/11/30 19:01:07 [INFO] Starting project.cattle.io/v3, Kind=App controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=CatalogTemplateVersion controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=RkeAddon controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=Catalog controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=ClusterRoleTemplateBinding controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=DynamicSchema controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=GlobalRoleBinding controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=RkeK8sServiceOption controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=ProjectCatalog controller
2024/11/30 19:01:07 [INFO] Starting API controllers
2024/11/30 19:01:07 [INFO] Starting catalog.cattle.io/v1, Kind=ClusterRepo controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=ClusterTemplateRevision controller
2024/11/30 19:01:07 [INFO] Starting apiextensions.k8s.io/v1, Kind=CustomResourceDefinition controller
2024/11/30 19:01:07 [INFO] Starting cluster.x-k8s.io/v1beta1, Kind=Machine controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=Feature controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=CatalogTemplate controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=NodeDriver controller
2024/11/30 19:01:07 [INFO] Starting rbac.authorization.k8s.io/v1, Kind=RoleBinding controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=GroupMember controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=ClusterRegistrationToken controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=AuthConfig controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=GlobalDns controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=ClusterCatalog controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=Preference controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=UserAttribute controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=User controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=PodSecurityPolicyTemplateProjectBinding controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=Project controller
2024/11/30 19:01:07 [INFO] Starting /v1, Kind=Secret controller
2024/11/30 19:01:07 [INFO] Starting /v1, Kind=ServiceAccount controller
2024/11/30 19:01:07 [INFO] Starting provisioning.cattle.io/v1, Kind=Cluster controller
2024/11/30 19:01:07 [INFO] Starting /v1, Kind=Endpoints controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=RkeK8sSystemImage controller
2024/11/30 19:01:07 [INFO] Starting rbac.authorization.k8s.io/v1, Kind=Role controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=Cluster controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=KontainerDriver controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=Group controller
I1130 19:01:07.266042      40 leaderelection.go:250] attempting to acquire leader lease kube-system/cattle-controllers...
2024/11/30 19:01:07 [INFO] Starting rke.cattle.io/v1, Kind=RKEBootstrap controller
2024/11/30 19:01:07 [INFO] Starting /v1, Kind=ConfigMap controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=GlobalRole controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=PodSecurityPolicyTemplate controller
2024/11/30 19:01:07 [INFO] Adding peer wss://10.42.0.22/v3/connect, 10.42.0.22
2024/11/30 19:01:07 [INFO] Adding peer wss://10.42.0.23/v3/connect, 10.42.0.23
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=MultiClusterAppRevision controller
2024/11/30 19:01:07 [INFO] Starting apiregistration.k8s.io/v1, Kind=APIService controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=NodePool controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=ProjectRoleTemplateBinding controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=ClusterTemplate controller
2024/11/30 19:01:07 [ERROR] Failed to connect to peer wss://10.42.0.22/v3/connect [local ID=10.42.0.21]: dial tcp 10.42.0.22:443: connect: connection refused
2024/11/30 19:01:07 [ERROR] Failed syncing peers [{10.42.0.21 [10.42.0.22 10.42.0.23] true false}]: failed to start user controllers for cluster c-m-s4nlmmqq: ClusterUnavailable 503: cluster not found
2024/11/30 19:01:07 [ERROR] error syncing '_all_': handler user-controllers-controller: userControllersController: failed to set peers for key _all_: failed to start user controllers for cluster c-m-s4nlmmqq: ClusterUnavailable 503: cluster not found, requeuing
2024/11/30 19:01:07 [ERROR] error syncing '_all_': handler user-controllers-controller: userControllersController: failed to set peers for key _all_: failed to start user controllers for cluster c-m-s4nlmmqq: ClusterUnavailable 503: cluster not found, requeuing
2024/11/30 19:01:07 [INFO] Starting cluster controllers for local
2024/11/30 19:01:07 [ERROR] error syncing '_all_': handler user-controllers-controller: userControllersController: failed to set peers for key _all_: failed to start user controllers for cluster c-m-s4nlmmqq: ClusterUnavailable 503: cluster not found, requeuing
2024/11/30 19:01:07 [INFO] Starting /v1, Kind=Secret controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=Cluster controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=GroupMember controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=Group controller
2024/11/30 19:01:07 [INFO] Starting /v1, Kind=ConfigMap controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=Token controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=User controller
2024/11/30 19:01:07 [INFO] Starting management.cattle.io/v3, Kind=UserAttribute controller
2024/11/30 19:01:07 [ERROR] error syncing '_all_': handler user-controllers-controller: userControllersController: failed to set peers for key _all_: failed to start user controllers for cluster c-m-s4nlmmqq: ClusterUnavailable 503: cluster not found, requeuing
2024/11/30 19:01:07 [INFO] Active TLS secret cattle-system/serving-cert (ver=500335) (count 8): map[field.cattle.io/projectId:local:p-42zvc listener.cattle.io/cn-10.42.0.21:10.42.0.21 listener.cattle.io/cn-10.42.0.22:10.42.0.22 listener.cattle.io/cn-10.42.0.23:10.42.0.23 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-localhost:localhost listener.cattle.io/cn-rancher-30.lakalajr.com:rancher-30.lakalajr.com listener.cattle.io/cn-rancher.cattle-system:rancher.cattle-system listener.cattle.io/fingerprint:SHA1=0EB6575726DE013C97E0EA471A29E6088D029D56]
2024/11/30 19:01:07 [INFO] Listening on :443

2. Rancher server log (continued)

2024/11/30 19:01:38 [INFO] Watching metadata for rke.cattle.io/v1, Kind=RKEControlPlane
2024/11/30 19:01:38 [INFO] Watching metadata for management.cattle.io/v3, Kind=ClusterRegistrationToken
2024/11/30 19:01:38 [INFO] Watching metadata for storage.k8s.io/v1, Kind=VolumeAttachment
2024/11/30 19:01:38 [INFO] Watching metadata for crd.projectcalico.org/v1, Kind=IPPool
2024/11/30 19:01:38 [INFO] Watching metadata for management.cattle.io/v3, Kind=CatalogTemplateVersion
2024/11/30 19:01:38 [INFO] Watching metadata for management.cattle.io/v3, Kind=KontainerDriver
2024/11/30 19:01:38 [INFO] Watching metadata for /v1, Kind=Secret
2024/11/30 19:01:38 [INFO] Watching metadata for crd.projectcalico.org/v1, Kind=HostEndpoint
2024/11/30 19:01:38 [INFO] Watching metadata for node.k8s.io/v1, Kind=RuntimeClass
2024/11/30 19:01:38 [INFO] Watching metadata for management.cattle.io/v3, Kind=Preference
2024/11/30 19:01:38 [INFO] Watching metadata for management.cattle.io/v3, Kind=DynamicSchema
2024/11/30 19:01:38 [INFO] Watching metadata for management.cattle.io/v3, Kind=PodSecurityPolicyTemplate
2024/11/30 19:01:38 [INFO] Watching metadata for crd.projectcalico.org/v1, Kind=GlobalNetworkPolicy
2024/11/30 19:01:38 [INFO] Watching metadata for ui.cattle.io/v1, Kind=NavLink
2024/11/30 19:01:38 [INFO] Watching metadata for provisioning.cattle.io/v1, Kind=Cluster
2024/11/30 19:01:38 [INFO] Watching metadata for rke.cattle.io/v1, Kind=CustomMachine
2024/11/30 19:01:38 [INFO] Watching metadata for management.cattle.io/v3, Kind=GlobalDnsProvider
2024/11/30 19:01:38 [INFO] Watching metadata for crd.projectcalico.org/v1, Kind=ClusterInformation
2024/11/30 19:01:38 [INFO] Watching metadata for management.cattle.io/v3, Kind=ProjectAlert
2024/11/30 19:01:38 [INFO] Watching metadata for /v1, Kind=ConfigMap
2024/11/30 19:01:38 [INFO] Watching metadata for crd.projectcalico.org/v1, Kind=BGPConfiguration
2024/11/30 19:01:38 [INFO] Watching metadata for events.k8s.io/v1, Kind=Event
2024/11/30 19:01:38 [INFO] Watching metadata for fleet.cattle.io/v1alpha1, Kind=Content
2024/11/30 19:01:38 [INFO] Watching metadata for management.cattle.io/v3, Kind=ProjectNetworkPolicy
2024/11/30 19:01:38 [INFO] Watching metadata for groupsnapshot.storage.k8s.io/v1alpha1, Kind=VolumeGroupSnapshot
2024/11/30 19:01:38 [INFO] Watching metadata for management.cattle.io/v3, Kind=Feature
2024/11/30 19:01:38 [INFO] Watching metadata for management.cattle.io/v3, Kind=Cluster
2024/11/30 19:01:38 [INFO] Watching metadata for management.cattle.io/v3, Kind=Catalog
2024/11/30 19:01:38 [INFO] Watching metadata for management.cattle.io/v3, Kind=Token
2024/11/30 19:01:38 [INFO] Watching metadata for crd.projectcalico.org/v1, Kind=BlockAffinity
2024/11/30 19:01:38 [INFO] Watching metadata for flowcontrol.apiserver.k8s.io/v1beta3, Kind=FlowSchema
2024/11/30 19:01:38 [INFO] Watching metadata for fleet.cattle.io/v1alpha1, Kind=ImageScan
2024/11/30 19:01:38 [INFO] Watching metadata for rbac.authorization.k8s.io/v1, Kind=ClusterRole
2024/11/30 19:01:38 [INFO] Watching metadata for snapshot.storage.k8s.io/v1, Kind=VolumeSnapshot
2024/11/30 19:01:38 [INFO] Watching metadata for cluster.x-k8s.io/v1beta1, Kind=MachineDeployment
2024/11/30 19:01:38 [INFO] Watching metadata for storage.k8s.io/v1, Kind=CSIDriver
2024/11/30 19:01:38 [INFO] Watching metadata for apps/v1, Kind=ControllerRevision
2024/11/30 19:01:38 [INFO] Watching metadata for crd.projectcalico.org/v1, Kind=IPReservation
2024/11/30 19:01:38 [INFO] Watching metadata for management.cattle.io/v3, Kind=NodePool
2024/11/30 19:01:38 [INFO] Watching metadata for management.cattle.io/v3, Kind=Notifier
2024/11/30 19:01:38 [INFO] Watching metadata for discovery.k8s.io/v1, Kind=EndpointSlice
2024/11/30 19:01:38 [INFO] Watching metadata for management.cattle.io/v3, Kind=Setting
2024/11/30 19:01:38 [INFO] Watching metadata for management.cattle.io/v3, Kind=NodeTemplate
2024/11/30 19:01:38 [INFO] Watching metadata for batch/v1, Kind=CronJob
2024/11/30 19:01:38 [INFO] Watching metadata for networking.k8s.io/v1, Kind=Ingress
2024/11/30 19:01:38 [INFO] Watching metadata for project.cattle.io/v3, Kind=AppRevision
2024/11/30 19:01:38 [INFO] Watching metadata for management.cattle.io/v3, Kind=ComposeConfig
2024/11/30 19:01:38 [INFO] Watching metadata for management.cattle.io/v3, Kind=Template
2024/11/30 19:01:38 [INFO] Watching metadata for management.cattle.io/v3, Kind=Group
2024/11/30 19:01:38 [INFO] Watching metadata for management.cattle.io/v3, Kind=RkeK8sServiceOption
2024/11/30 19:01:38 [INFO] Watching metadata for management.cattle.io/v3, Kind=ProjectMonitorGraph
2024/11/30 19:01:38 [INFO] Watching metadata for fleet.cattle.io/v1alpha1, Kind=GitRepo
2024/11/30 19:01:38 [INFO] Watching metadata for rke-machine.cattle.io/v1, Kind=DigitaloceanMachine
2024/11/30 19:01:38 [INFO] Watching metadata for k3s.cattle.io/v1, Kind=ETCDSnapshotFile
2024/11/30 19:01:38 [INFO] Watching metadata for flowcontrol.apiserver.k8s.io/v1beta3, Kind=PriorityLevelConfiguration
2024/11/30 19:01:38 [INFO] Watching metadata for snapshot.storage.k8s.io/v1, Kind=VolumeSnapshotContent
2024/11/30 19:01:38 [INFO] Watching metadata for storage.k8s.io/v1, Kind=CSIStorageCapacity
2024/11/30 19:01:48 [ERROR] error syncing '_all_': handler user-controllers-controller: userControllersController: failed to set peers for key _all_: failed to start user controllers for cluster c-m-s4nlmmqq: ClusterUnavailable 503: cluster not found, requeuing
2024/11/30 19:02:07 [ERROR] Failed to serve peer connection 10.42.0.23: read tcp 10.42.0.21:48160->10.42.0.23:443: i/o timeout
2024/11/30 19:02:18 [ERROR] error syncing '_all_': handler user-controllers-controller: userControllersController: failed to set peers for key _all_: failed to start user controllers for cluster c-m-s4nlmmqq: ClusterUnavailable 503: cluster not found, requeuing
2024/11/30 19:02:48 [ERROR] error syncing '_all_': handler user-controllers-controller: userControllersController: failed to set peers for key _all_: failed to start user controllers for cluster c-m-s4nlmmqq: ClusterUnavailable 503: cluster not found, requeuing
2024/11/30 19:02:55 [INFO] Adding peer wss://10.42.0.22/v3/connect, 10.42.0.22
2024/11/30 19:02:55 [INFO] Adding peer wss://10.42.0.23/v3/connect, 10.42.0.23
2024/11/30 19:02:55 [ERROR] Failed syncing peers [{10.42.0.21 [10.42.0.22 10.42.0.23] true false}]: failed to start user controllers for cluster c-m-s4nlmmqq: ClusterUnavailable 503: cluster not found
2024/11/30 19:02:55 [INFO] Handling backend connection request [10.42.0.23]
2024/11/30 19:02:55 [INFO] Handling backend connection request [10.42.0.22]
2024/11/30 19:03:18 [ERROR] error syncing '_all_': handler user-controllers-controller: userControllersController: failed to set peers for key _all_: failed to start user controllers for cluster c-m-s4nlmmqq: ClusterUnavailable 503: cluster not found, requeuing
2024/11/30 19:03:48 [ERROR] error syncing '_all_': handler user-controllers-controller: 
2024/11/30 19:06:20 [ERROR] Failed to serve peer connection 10.42.0.23: websocket: close 1006 (abnormal closure): unexpected EOF
2024/11/30 19:06:20 [INFO] error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF
2024/11/30 19:06:25 [ERROR] Failed to connect to peer wss://10.42.0.23/v3/connect [local ID=10.42.0.21]: dial tcp 10.42.0.23:443: connect: connection refused
2024/11/30 19:06:28 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:stale GroupVersion discovery: metrics.k8s.io/v1beta1]
2024/11/30 19:06:30 [ERROR] Failed to connect to peer wss://10.42.0.23/v3/connect [local ID=10.42.0.21]: dial tcp 10.42.0.23:443: connect: connection refused
2024/11/30 19:06:35 [ERROR] Failed to connect to peer wss://10.42.0.23/v3/connect [local ID=10.42.0.21]: dial tcp 10.42.0.23:443: connect: connection refused
E1130 19:06:28.534311      40 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: stale GroupVersion discovery: metrics.k8s.io/v1beta1
2024/11/30 19:06:48 [ERROR] error syncing '_all_': handler user-controllers-controller: userControllersController: failed to set peers for key _all_: failed to start user controllers for cluster c-m-s4nlmmqq: ClusterUnavailable 503: cluster not found, requeuing
2024/11/30 19:07:02 [INFO] Handling backend connection request [10.42.0.23]
2024/11/30 19:07:18 [ERROR] error syncing '_all_': handler user-controllers-controller: userControllersController: failed to set peers for key _all_: failed to start user controllers for cluster c-m-s4nlmmqq: ClusterUnavailable 503: cluster not found, requeuing
2024/11/30 19:58:18 [ERROR] error syncing '_all_': handler user-controllers-controller: userControllersController: failed to set peers for key _all_: failed to start user controllers for cluster c-m-s4nlmmqq: ClusterUnavailable 503: cluster not found, requeuing
2024/12/03 01:18:19 [ERROR] error syncing '_all_': handler user-controllers-controller: userControllersController: failed to set peers for key _all_: failed to start user controllers for cluster c-m-s4nlmmqq: ClusterUnavailable 503: cluster not found, requeuing
2024/12/03 01:20:19 [ERROR] error syncing '_all_': handler user-controllers-controller: userControllersController: failed to set peers for key _all_: failed to start user controllers for cluster c-m-s4nlmmqq: ClusterUnavailable 503: cluster not found, requeuing
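
The repeated `ClusterUnavailable 503: cluster not found` errors all name the same cluster ID. A quick way to pull the distinct IDs out of a long server log is sketched below; `unavailable_cluster_ids` is a hypothetical helper, and the pattern assumes IDs of the `c-m-...` form seen in this log.

```shell
#!/bin/sh
# Hypothetical helper: read Rancher server log lines on stdin and print the
# distinct cluster IDs that appear in "ClusterUnavailable" errors.
unavailable_cluster_ids() {
  grep -o 'for cluster c-m-[a-z0-9]*: ClusterUnavailable' \
    | awk '{ sub(/:$/, "", $3); print $3 }' \
    | sort -u
}

# Example usage on the Rancher host (pod name taken from the command above):
#   kubectl -n cattle-system logs rancher-6bbb576fc7-q9td6 | unavailable_cluster_ids
# The IDs can then be compared with the objects Rancher knows about:
#   kubectl get clusters.management.cattle.io
```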

After investigation, the cause turned out to be an image missing from Harbor. The log is as follows:

Dec  3 15:10:36 k8s-master-31 rancher-system-agent: time="2024-12-03T15:10:36+08:00" level=info msg="Using private registry config file at /etc/rancher/agent/registries.yaml"
Dec  3 15:10:36 k8s-master-31 rancher-system-agent: time="2024-12-03T15:10:36+08:00" level=info msg="Pulling image registry-70.lakalajr.com/rancher/system-agent-installer-rke2:v1.28.15-rke2r1"
Dec  3 15:10:36 k8s-master-31 rancher-system-agent: time="2024-12-03T15:10:36+08:00" level=warning msg="Failed to get image from endpoint: GET http://registry-70.lakalajr.com/v2/rancher/system-agent-installer-rke2/manifests/v1.28.15-rke2r1: NOT_FOUND: artifact rancher/system-agent-installer-rke2:v1.28.15-rke2r1 not found"
Dec  3 15:10:36 k8s-master-31 rancher-system-agent: time="2024-12-03T15:10:36+08:00" level=warning msg="Failed to get image from endpoint: Get \"https://registry-70.lakalajr.com/v2/\": dial tcp 192.168.0.70:443: connect: connection refused"
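
The failing request in the log is a plain registry manifest lookup, so the same check can be done before provisioning a cluster. The sketch below builds the manifest URL in the shape seen in the log, given a registry base URL and an RKE2 version. It assumes the image tag is the RKE2 version with `+` replaced by `-` (as in `v1.28.15-rke2r1` above); both function names are hypothetical, and the registry URL and credentials in the usage lines are placeholders.

```shell
#!/bin/sh
# Hypothetical helpers for checking that the system-agent installer image for
# a given RKE2 version is present in the private registry.

# Convert an RKE2 version such as "v1.28.15+rke2r1" into the image tag form
# "v1.28.15-rke2r1" ("+" is not allowed in image tags).
installer_tag() {
  printf '%s\n' "$1" | tr '+' '-'
}

# Build the manifest URL in the same shape as the failing GET in the log.
# Arguments: registry base URL (with scheme), RKE2 version.
installer_manifest_url() {
  printf '%s/v2/rancher/system-agent-installer-rke2/manifests/%s\n' \
    "$1" "$(installer_tag "$2")"
}

# Example usage (whether basic auth is accepted depends on the Harbor setup):
#   url=$(installer_manifest_url http://registry.example.internal 'v1.28.15+rke2r1')
#   curl -s -o /dev/null -w '%{http_code}\n' -u USER:PASSWORD "$url"
# A 404 here would correspond to the NOT_FOUND warning in the log above.
```

Running this check for the Kubernetes version selected in the Rancher UI, before registering the node, would surface a missing image without waiting for the agent to fail.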