Rancher Server Setup
- Rancher version: v2.5.9
- Installation option (Docker install/Helm Chart): Docker install
- If Helm Chart, the type of the Local cluster (RKE1, RKE2, k3s, EKS, etc.) and its version: RKE: v1.1.11
- Online or air-gapped deployment: air-gapped
Downstream Cluster Information
- Kubernetes version: client (1.22) and server (1.18)
- Cluster Type (Local/Downstream): local
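- If Downstream, what type of cluster? (Custom/Imported or Hosted, etc.):
User Information
- What is the role of the logged-in user? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): admin
- If Custom, the custom permission set:
Host OS: CentOS 7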
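Problem description: The cluster was healthy yesterday, and no operations or changes were made to it. After logging in to the Rancher UI today, the cluster is in an error state with this message: Failed to ensure monitoring project name: failed to find "cattle-prometheus" Namespace: Get "https://10.43.0.1:443/api/v1/namespaces/cattle-prometheus": waiting for cluster [c-swd8k] agent to connect; waiting on cluster-scoped-gc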
Steps to reproduce:
Result:
Expected result:
**Screenshots:**
Additional context:
Logs
I0429 00:54:56.119792 54 request.go:645] Throttling request took 1.034102129s, request: GET:https://127.0.0.1:6444/apis/management.cattle.io/v3/rkek8sserviceoptions?limit=500&resourceVersion=0
I0429 00:54:56.693654 54 shared_informer.go:240] Waiting for caches to sync for garbage collector
I0429 00:54:57.176689 54 shared_informer.go:247] Caches are synced for resource quota
I0429 00:54:57.206887 54 shared_informer.go:247] Caches are synced for resource quota
2024/04/29 00:54:58 [ERROR] error syncing 'system-library': handler catalog: Update failed: fatal: unable to access 'https://git.rancher.io/system-charts/': gnutls_handshake() failed: Error in the pull function.
: exit status 128, requeuing
I0429 00:54:58.494414 54 shared_informer.go:247] Caches are synced for garbage collector
2024/04/29 00:54:58 [ERROR] error syncing 'helm3-library': handler catalog: Update failed: fatal: unable to access 'https://git.rancher.io/helm3-charts/': gnutls_handshake() failed: Error in the pull function.
: exit status 128, requeuing
2024/04/29 00:54:58 [ERROR] error syncing 'library': handler catalog: Update failed: fatal: unable to access 'https://git.rancher.io/charts/': gnutls_handshake() failed: Error in the pull function.
: exit status 128, requeuing
I0429 00:54:58.554887 54 shared_informer.go:247] Caches are synced for garbage collector
I0429 00:54:58.554959 54 garbagecollector.go:137] Garbage collector: all resource monitors have synced. Proceeding to collect garbage
2024/04/29 00:55:01 [ERROR] failed on subscribe replicationController: Get "https://10.43.0.1:443/api/v1/replicationcontrollers?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:02 [ERROR] failed on subscribe replicaSet: Get "https://10.43.0.1:443/apis/apps/v1/replicasets?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:03 [ERROR] failed on subscribe serviceMonitor: Get "https://10.43.0.1:443/apis/monitoring.coreos.com/v1/servicemonitors?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:05 [ERROR] failed on subscribe alertmanager: Get "https://10.43.0.1:443/apis/monitoring.coreos.com/v1/alertmanagers?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:06 [ERROR] failed on subscribe job: Get "https://10.43.0.1:443/apis/batch/v1/jobs?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:07 [ERROR] failed on subscribe daemonSet: Get "https://10.43.0.1:443/apis/apps/v1/daemonsets?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:07 [ERROR] failed on subscribe configMap: Get "https://10.43.0.1:443/api/v1/configmaps?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:07 [ERROR] failed on subscribe statefulSet: Get "https://10.43.0.1:443/apis/apps/v1/statefulsets?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:07 [ERROR] failed on subscribe ingress: Get "https://10.43.0.1:443/apis/extensions/v1beta1/ingresses?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:07 [ERROR] failed on subscribe service: Get "https://10.43.0.1:443/api/v1/services?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:08 [ERROR] failed on subscribe prometheus: Get "https://10.43.0.1:443/apis/monitoring.coreos.com/v1/prometheuses?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:09 [ERROR] error syncing 'library': handler catalog: Update failed: fatal: unable to access 'https://git.rancher.io/charts/': gnutls_handshake() failed: Error in the pull function.
: exit status 128, requeuing
2024/04/29 00:55:09 [ERROR] error syncing 'system-library': handler catalog: Update failed: fatal: unable to access 'https://git.rancher.io/system-charts/': gnutls_handshake() failed: Error in the pull function.
: exit status 128, requeuing
2024/04/29 00:55:09 [ERROR] error syncing 'helm3-library': handler catalog: Update failed: fatal: unable to access 'https://git.rancher.io/helm3-charts/': gnutls_handshake() failed: Error in the pull function.
: exit status 128, requeuing
2024/04/29 00:55:10 [ERROR] failed on subscribe dnsRecord: Get "https://10.43.0.1:443/api/v1/services?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:10 [ERROR] failed on subscribe prometheusRule: Get "https://10.43.0.1:443/apis/monitoring.coreos.com/v1/prometheusrules?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:10 [ERROR] failed on subscribe deployment: Get "https://10.43.0.1:443/apis/apps/v1/deployments?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:10 [ERROR] failed on subscribe virtualService: Get "https://10.43.0.1:443/apis/networking.istio.io/v1alpha3/virtualservices?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:10 [ERROR] failed on subscribe cronJob: Get "https://10.43.0.1:443/apis/batch/v1beta1/cronjobs?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:11 [ERROR] failed on subscribe gateway: Get "https://10.43.0.1:443/apis/networking.istio.io/v1alpha3/gateways?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:12 [ERROR] failed on subscribe destinationRule: Get "https://10.43.0.1:443/apis/networking.istio.io/v1alpha3/destinationrules?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:12 [ERROR] failed on subscribe pod: Get "https://10.43.0.1:443/api/v1/pods?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:13 [ERROR] failed on subscribe namespacedDockerCredential: Get "https://10.43.0.1:443/api/v1/secrets?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:14 [ERROR] failed on subscribe persistentVolumeClaim: Get "https://10.43.0.1:443/api/v1/persistentvolumeclaims?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:25 [ERROR] error syncing 'c-swd8k': handler cluster-deploy: Get "https://10.43.0.1:443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": waiting for cluster [c-swd8k] agent to connect, requeuing
time="2024-04-29T00:55:30.515409722Z" level=info msg="Cluster-Http-Server 2024/04/29 00:55:30 http: TLS handshake error from 10.42.0.34:59750: remote error: tls: bad certificate"
E0429 00:55:30.524220 54 authentication.go:53] Unable to authenticate the request due to an error: [invalid bearer token, Token has been invalidated]
2024/04/29 00:55:30 [ERROR] error syncing 'library': handler catalog: Update failed: fatal: unable to access 'https://git.rancher.io/charts/': gnutls_handshake() failed: Error in the pull function.
: exit status 128, requeuing
2024/04/29 00:55:30 [ERROR] error syncing 'system-library': handler catalog: Update failed: fatal: unable to access 'https://git.rancher.io/system-charts/': gnutls_handshake() failed: Error in the pull function.
: exit status 128, requeuing
2024/04/29 00:55:31 [ERROR] error syncing 'helm3-library': handler catalog: Update failed: fatal: unable to access 'https://git.rancher.io/helm3-charts/': gnutls_handshake() failed: Error in the pull function.
: exit status 128, requeuing
2024/04/29 00:55:31 [ERROR] failed on subscribe replicationController: Get "https://10.43.0.1:443/api/v1/replicationcontrollers?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:32 [ERROR] failed on subscribe replicaSet: Get "https://10.43.0.1:443/apis/apps/v1/replicasets?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:36 [ERROR] failed on subscribe statefulSet: Get "https://10.43.0.1:443/apis/apps/v1/statefulsets?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:37 [ERROR] failed on subscribe daemonSet: Get "https://10.43.0.1:443/apis/apps/v1/daemonsets?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:37 [ERROR] failed on subscribe job: Get "https://10.43.0.1:443/apis/batch/v1/jobs?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:40 [ERROR] failed on subscribe deployment: Get "https://10.43.0.1:443/apis/apps/v1/deployments?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:40 [ERROR] failed on subscribe cronJob: Get "https://10.43.0.1:443/apis/batch/v1beta1/cronjobs?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:55:46 [ERROR] failed on subscribe namespacedSshAuth: Get "https://10.43.0.1:443/api/v1/secrets?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
2024/04/29 00:56:01 [ERROR] error syncing 'library': handler catalog: Update failed: fatal: unable to access 'https://git.rancher.io/charts/': gnutls_handshake() failed: Error in the pull function.
: exit status 128, requeuing
2024/04/29 00:56:01 [ERROR] error syncing 'system-library': handler catalog: Update failed: fatal: unable to access 'https://git.rancher.io/system-charts/': gnutls_handshake() failed: Error in the pull function.
: exit status 128, requeuing
2024/04/29 00:56:01 [ERROR] error syncing 'helm3-library': handler catalog: Update failed: fatal: unable to access 'https://git.rancher.io/helm3-charts/': gnutls_handshake() failed: Error in the pull function.
: exit status 128, requeuing
2024/04/29 00:56:07 [ERROR] error syncing 'c-swd8k': handler cluster-deploy: Get "https://10.43.0.1:443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": waiting for cluster [c-swd8k] agent to connect, requeuing
2024/04/29 00:56:10 [INFO] Stopping cluster agent for c-swd8k
2024/04/29 00:56:10 [ERROR] failed to start cluster controllers c-swd8k: context canceled
2024/04/29 00:56:17 [ERROR] failed on subscribe namespacedSecret: Get "https://10.43.0.1:443/api/v1/secrets?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-swd8k] agent to connect
time="2024-04-29T00:56:30.539847234Z" level=info msg="Cluster-Http-Server 2024/04/29 00:56:30 http: TLS handshake error from 10.42.0.34:60076: remote error: tls: bad certificate"
E0429 00:56:30.550754 54 authentication.go:53] Unable to authenticate the request due to an error: [invalid bearer token, Token has been invalidated]
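
The log above repeats a small number of distinct messages many times. To make it easier to triage, here is a minimal sketch that groups lines by the error text visible in the log. It is not part of Rancher: the category names and the `classify` and `summarize` helpers are hypothetical, and the patterns are copied only from messages that appear above.

```python
import re
from collections import Counter

# Hypothetical category names; each pattern matches text seen in the log above.
PATTERNS = {
    "catalog_sync_tls": re.compile(r"gnutls_handshake\(\) failed"),
    "agent_not_connected": re.compile(
        r"waiting for cluster \[[^\]]+\] agent to connect"
    ),
    "bad_certificate": re.compile(r"tls: bad certificate"),
    "token_invalidated": re.compile(r"Token has been invalidated"),
}


def classify(line: str) -> str:
    """Return the first matching category name, or 'other'."""
    for name, pattern in PATTERNS.items():
        if pattern.search(line):
            return name
    return "other"


def summarize(lines) -> Counter:
    """Count non-blank lines per category for any iterable of strings."""
    return Counter(classify(line) for line in lines if line.strip())
```

One possible use is to save the container log to a file, pass the open file object to `summarize`, and print `most_common()` on the result. Continuation lines such as `: exit status 128, requeuing` fall into `other`, because they carry no error text of their own.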