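The operating system is CentOS 7.9 and the Docker version is 20.10.17.
Below are some of the steps I used to install Rancher with Helm. The certificate is issued by a publicly trusted CA: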
kubectl create namespace cattle-system
kubectl create -n cattle-system \
  secret tls tls-rancher-ingress \
  --cert=./tls.crt \
  --key=./tls.key
helm install rancher rancher/ \
  --version 2.11.3 \
  --namespace cattle-system \
  --set ingress.tls.source=secret \
  --set hostname=cfc-rancher.quickegret.com \
  --set systemDefaultRegistry=newharbor.brightoilonline.com \
  --set rancherImage=newharbor.brightoilonline.com/rancher/rancher \
  --set busyboxImage=newharbor.brightoilonline.com/rancher/busybox \
  --set postDelete.image.repository=newharbor.brightoilonline.com/rancher/shell \
  --set useBundledSystemChart=true \
  --set replicas=3
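Everything above worked. I put nginx in front as a load balancer and added a record on the DNS server; after that, both my local machine and the VMs could reach the Rancher UI, and all pods in the local cluster are running normally.
Before creating the custom cluster, I changed agent-tls-mode in Rancher's global settings to System Store, then ran the registration commands on the master and worker nodes: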
CATTLE_AGENT_VAR_DIR="/data/rancher/agent" curl -fL https://cfc-rancher.quickegret.com/system-agent-install.sh | sudo CATTLE_AGENT_VAR_DIR="/data/rancher/agent" sh -s - --server https://cfc-rancher.quickegret.com --label 'cattle.io/os=linux' --token 6x6g6b7xgw9n58wmdhx7lmhfxwwp4gh8lbd8nq95rlp97m2x9v6fpc --etcd --controlplane
CATTLE_AGENT_VAR_DIR="/data/rancher/agent" curl -fL https://cfc-rancher.quickegret.com/system-agent-install.sh | sudo CATTLE_AGENT_VAR_DIR="/data/rancher/agent" sh -s - --server https://cfc-rancher.quickegret.com --label 'cattle.io/os=linux' --token 6x6g6b7xgw9n58wmdhx7lmhfxwwp4gh8lbd8nq95rlp97m2x9v6fpc --worker
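With agent-tls-mode set to System Store, my understanding is that the agent relies on each node's OS trust store to verify the Rancher certificate. A minimal sketch for confirming that from a node is below. The function names are hypothetical helpers, not part of Rancher, and the /ping path is assumed to be served by this Rancher install.

```shell
# Hypothetical helpers for checking that a node trusts the Rancher certificate.

# Build the URL to probe; the /ping endpoint is an assumption about this install.
rancher_ping_url() {
  printf 'https://%s/ping\n' "$1"
}

# curl verifies the chain against the OS trust store by default (no -k),
# so a non-zero exit or a non-zero ssl_verify_result points at a trust problem.
check_rancher_tls() {
  curl -sS -o /dev/null \
    -w 'http_code=%{http_code} ssl_verify_result=%{ssl_verify_result}\n' \
    "$(rancher_ping_url "$1")"
}
```

Running `check_rancher_tls cfc-rancher.quickegret.com` on the master and the worker would show whether each node validates the chain on its own. Since the install command's own curl already succeeded without -k, this is mainly a confirmation step.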
Errors on the master node:
9月 15 17:32:35 kl-cfc-k8s-master01 rancher-system-agent[29002]: time=“2025-09-15T17:32:35+08:00” level=info msg=“[c0b9577a4206396eb53911cbf6539a34b227a163afb4c2e4af759418880c316e_0:stderr]: + ‘[’ ‘’ = true ‘]’”
9月 15 17:32:35 kl-cfc-k8s-master01 rancher-system-agent[29002]: time=“2025-09-15T17:32:35+08:00” level=info msg=“[c0b9577a4206396eb53911cbf6539a34b227a163afb4c2e4af759418880c316e_0:stderr]: + ‘[’ server = server ‘]’”
9月 15 17:32:35 kl-cfc-k8s-master01 rancher-system-agent[29002]: time=“2025-09-15T17:32:35+08:00” level=info msg=“[c0b9577a4206396eb53911cbf6539a34b227a163afb4c2e4af759418880c316e_0:stderr]: + systemctl is-active --quiet rke2-agent”
9月 15 17:32:35 kl-cfc-k8s-master01 rancher-system-agent[29002]: time=“2025-09-15T17:32:35+08:00” level=info msg=“[c0b9577a4206396eb53911cbf6539a34b227a163afb4c2e4af759418880c316e_0:stderr]: + systemctl enable rke2-server”
9月 15 17:32:35 kl-cfc-k8s-master01 rancher-system-agent[29002]: time=“2025-09-15T17:32:35+08:00” level=info msg=“[c0b9577a4206396eb53911cbf6539a34b227a163afb4c2e4af759418880c316e_0:stderr]: + ‘[’ ‘’ = true ‘]’”
9月 15 17:32:35 kl-cfc-k8s-master01 rancher-system-agent[29002]: time=“2025-09-15T17:32:35+08:00” level=info msg=“[c0b9577a4206396eb53911cbf6539a34b227a163afb4c2e4af759418880c316e_0:stderr]: + ‘[’ true = true ‘]’”
9月 15 17:32:35 kl-cfc-k8s-master01 rancher-system-agent[29002]: time=“2025-09-15T17:32:35+08:00” level=info msg=“[c0b9577a4206396eb53911cbf6539a34b227a163afb4c2e4af759418880c316e_0:stderr]: + systemctl --no-block restart rke2-server”
9月 15 17:32:35 kl-cfc-k8s-master01 rancher-system-agent[29002]: time=“2025-09-15T17:32:35+08:00” level=info msg=“[Applyinator] Command sh [-c run.sh] finished with err: and exit code: 0”
9月 15 17:32:36 kl-cfc-k8s-master01 rancher-system-agent[29002]: time=“2025-09-15T17:32:36+08:00” level=info msg=“[K8s] updated plan secret fleet-default/custom-d871646d0e33-machine-plan with feedback”
9月 15 17:32:36 kl-cfc-k8s-master01 rancher-system-agent[29002]: time=“2025-09-15T17:32:36+08:00” level=error msg=“[K8s] received secret to process that was older than the last secret operated on. (956265 vs 956817)”
9月 15 17:32:36 kl-cfc-k8s-master01 rancher-system-agent[29002]: time=“2025-09-15T17:32:36+08:00” level=error msg=“error syncing ‘fleet-default/custom-d871646d0e33-machine-plan’: handler secret-watch: secret received was too old, requeuing”
9月 15 17:32:41 kl-cfc-k8s-master01 rancher-system-agent[29002]: time=“2025-09-15T17:32:41+08:00” level=error msg=“[K8s] received secret to process that was older than the last secret operated on. (956265 vs 956817)”
9月 15 17:32:41 kl-cfc-k8s-master01 rancher-system-agent[29002]: time=“2025-09-15T17:32:41+08:00” level=error msg=“error syncing ‘fleet-default/custom-d871646d0e33-machine-plan’: handler secret-watch: secret received was too old, requeuing”
9月 15 17:34:33 kl-cfc-k8s-master01 rancher-system-agent[29002]: W0915 17:34:33.567180 29002 reflector.go:492] pkg/mod/k8s.io/client-go@v0.32.2/tools/cache/reflector.go:251: watch of *v1.Secret ended with: an error on the server (“unable to decode an event from the watch stream: stream error: stream ID 13; INTERNAL_ERROR; received from peer”) has prevented the request from succeeding
9月 15 17:34:33 kl-cfc-k8s-master01 rancher-system-agent[29002]: time=“2025-09-15T17:34:33+08:00” level=info msg=“[K8s] updated plan secret fleet-default/custom-d871646d0e33-machine-plan with feedback”
9月 15 17:35:34 kl-cfc-k8s-master01 rancher-system-agent[29002]: W0915 17:35:34.846465 29002 reflector.go:492] pkg/mod/k8s.io/client-go@v0.32.2/tools/cache/reflector.go:251: watch of *v1.Secret ended with: an error on the server (“unable to decode an event from the watch stream: stream error: stream ID 25; INTERNAL_ERROR; received from peer”) has prevented the request from succeeding
Errors on the worker node:
9月 15 17:30:40 kl-cfc-k8s-worker01 systemd[1]: Started Rancher System Agent.
9月 15 17:30:40 kl-cfc-k8s-worker01 rancher-system-agent[2747]: time=“2025-09-15T17:30:40+08:00” level=info msg=“Rancher System Agent version v0.3.12 (e4876a6) is starting”
9月 15 17:30:40 kl-cfc-k8s-worker01 rancher-system-agent[2747]: time=“2025-09-15T17:30:40+08:00” level=info msg=“Using directory /data/rancher/agent/work for work”
9月 15 17:30:40 kl-cfc-k8s-worker01 rancher-system-agent[2747]: time=“2025-09-15T17:30:40+08:00” level=info msg=“Starting remote watch of plans”
9月 15 17:30:41 kl-cfc-k8s-worker01 rancher-system-agent[2747]: time=“2025-09-15T17:30:41+08:00” level=info msg=“Starting /v1, Kind=Secret controller”
9月 15 17:31:41 kl-cfc-k8s-worker01 rancher-system-agent[2747]: W0915 17:31:41.043659 2747 reflector.go:492] pkg/mod/k8s.io/client-go@v0.32.2/tools/cache/reflector.go:251: watch of *v1.Secret ended with: an error on the server (“unable to decode an event from the watch stream: stream error: stream ID 5; INTERNAL_ERROR; received from peer”) has prevented the request from succeeding
Errors from the rancher container:
2025/09/15 10:05:30 [ERROR] watcher channel closed:
2025/09/15 10:05:40 [INFO] starting imperative api cert rotator
2025/09/15 10:05:40 [INFO] imperative api APIService cert updated
2025/09/15 10:07:13 [INFO] [planner] rkecluster fleet-default/kuailu-cfc: configuring bootstrap node(s) custom-d871646d0e33: waiting for probes: etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kubelet
2025/09/15 10:07:13 [ERROR] error syncing ‘c-m-g77dsbbd’: handler cluster-deploy: cluster context c-m-g77dsbbd is unavailable, requeuing
2025/09/15 10:07:16 [ERROR] error syncing ‘all’: handler user-controllers-controller: userControllersController: failed to set peers for key all: failed to start user controllers for cluster c-m-g77dsbbd: ClusterUnavailable 503: cluster not found, requeuing
[root@kl-cfc-rancher01 tmp]# kubectl logs -n cattle-system ^C app=rancher
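After several rounds of troubleshooting I still have not found a workable fix. One thing I noticed: the time in the rancher container is 8 hours behind the VM's.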
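One possible reading of the 8-hour gap, offered as an assumption rather than a diagnosis: the agent log lines carry a +08:00 offset, while the rancher container timestamps carry no offset and may simply be printed in UTC. A quick sketch with GNU date (the version shipped on CentOS) converts an agent timestamp into the container log's format for comparison:

```shell
# Timestamp copied from the agent log (node local time, UTC+8).
agent_ts='2025-09-15T17:32:35+08:00'

# Convert to UTC using the date layout seen in the rancher container log.
# Requires GNU date (-d). That the container logs in UTC is an assumption.
utc_ts=$(date -u -d "$agent_ts" '+%Y/%m/%d %H:%M:%S')
echo "$utc_ts"
```

If the converted value lines up with the container entries from the same moment, the gap is only a display difference. Running `date -u` on the VM and inside the container compares the actual clocks.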