Downstream k8s cluster reports an error on one control-plane node: Error Get "https://x.x.x.x:6443/api/v1/namespaces?timeout=45s": tunnel disconnect

Rancher Server setup

  • Rancher version: v2.5.7
  • Installation option (Helm Chart): helm install rancher rancher-stable/rancher --version=2.5.7 --namespace cattle-system --set hostname=xxxx.com --set ingress.tls.source=rancher --set replicas=2 --set auditLog.hostPath=/var/log/auditlog --set auditLog.level=1 --set auditLog.maxBackups=10 --set auditLog.maxSize=100
    • If installed via Helm Chart, provide the Local cluster type (k3s) and version: v1.19.5+k3s2
  • Online deployment

Downstream cluster information

  • Kubernetes version: v1.19.4
  • Cluster Type (Downstream):
    - Custom

User information

  • What is the role of the logged-in user? (Admin):
    • If custom, the custom permission set:

**Host OS:** CentOS 7

**Problem description:** The downstream k8s cluster reports an error on one control-plane node: Error Get "https://x.x.x.x:6443/api/v1/namespaces?timeout=45s": tunnel disconnect

**Screenshot:**

[details="Rancher logs"]

![image|690x277](upload://gm846fha8dZbifw6Y9cs8WqIoTh.png)
2023/11/07 07:09:07 [ERROR] Unknown error: Get "https://x.x.x.x:6443/apis/extensions/v1beta1/namespaces/saas/ingresses?timeout=45s": context deadline exceeded
2023/11/07 07:09:07 [ERROR] failed on subscribe replicationController: Get "https://x.x.x.x:6443/api/v1/replicationcontrollers?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": tunnel disconnect
E1107 07:09:07.506359       8 reflector.go:139] pkg/mod/github.com/rancher/client-go@v1.20.0-rancher.1/tools/cache/reflector.go:168: Failed to watch *v1.Role: Get "https://x.x.x.x:6443/apis/rbac.authorization.k8s.io/v1/roles?allowWatchBookmarks=true&resourceVersion=12647598&timeout=7m8s&timeoutSeconds=428&watch=true": tunnel disconnect
E1107 07:09:07.506648       8 request.go:1011] Unexpected error when reading response body: tunnel disconnect
I1107 07:09:07.507570       8 trace.go:205] Trace[242591981]: "Reflector ListAndWatch" name:pkg/mod/github.com/rancher/client-go@v1.20.0-rancher.1/tools/cache/reflector.go:168 (07-Nov-2023 07:08:08.638) (total time: 58869ms):
Trace[242591981]: [58.86945847s] [58.86945847s] END
E1107 07:09:07.507586       8 reflector.go:139] pkg/mod/github.com/rancher/client-go@v1.20.0-rancher.1/tools/cache/reflector.go:168: Failed to watch *v1.ClusterRole: failed to list *v1.ClusterRole: Get "https://x.x.x.x:6443/apis/rbac.authorization.k8s.io/v1/clusterroles?resourceVersion=12647598": tunnel disconnect
I1107 07:09:07.507854       8 trace.go:205] Trace[1510293002]: "Reflector ListAndWatch" name:pkg/mod/github.com/rancher/client-go@v1.20.0-rancher.1/tools/cache/reflector.go:168 (07-Nov-2023 07:08:09.772) (total time: 57735ms):
Trace[1510293002]: [57.735502618s] [57.735502618s] END
E1107 07:09:07.507865       8 reflector.go:139] pkg/mod/github.com/rancher/client-go@v1.20.0-rancher.1/tools/cache/reflector.go:168: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://x.x.x.x:6443/api/v1/services?resourceVersion=12647597": tunnel disconnect
I1107 07:09:07.509666       8 trace.go:205] Trace[157177730]: "Reflector ListAndWatch" name:pkg/mod/github.com/rancher/client-go@v1.20.0-rancher.1/tools/cache/reflector.go:168 (07-Nov-2023 07:08:10.010) (total time: 57499ms):
Trace[157177730]: [57.499281259s] [57.499281259s] END
E1107 07:09:07.509732       8 reflector.go:139] pkg/mod/github.com/rancher/client-go@v1.20.0-rancher.1/tools/cache/reflector.go:168: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: Get "https://x.x.x.x:6443/api/v1/endpoints?resourceVersion=12666265": tunnel disconnect
I1107 07:09:07.509977       8 trace.go:205] Trace[403112844]: "Reflector ListAndWatch" name:pkg/mod/github.com/rancher/client-go@v1.20.0-rancher.1/tools/cache/reflector.go:168 (07-Nov-2023 07:08:06.233) (total time: 61276ms):
Trace[403112844]: [1m1.276573236s] [1m1.276573236s] END
E1107 07:09:07.510012       8 reflector.go:139] pkg/mod/github.com/rancher/client-go@v1.20.0-rancher.1/tools/cache/reflector.go:168: Failed to watch *v1.Pod: failed to list *v1.Pod: unexpected error when reading response body. Please retry. Original error: tunnel disconnect
2023/11/07 07:09:07 [ERROR] failed on subscribe service: Get "https://x.x.x.x:6443/api/v1/services?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": tunnel disconnect
2023/11/07 07:09:07 [ERROR] failed on subscribe pod: Get "https://x.x.x.x:6443/api/v1/pods?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": tunnel disconnect

[/details]
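
When a log like the one above scrolls by quickly, it can help to tally which requests fail and why. The following is only a rough sketch: the regular expression is written against the line shapes visible in this excerpt, and the function name `tally_failures` is made up for illustration. Lines that do not contain a `Get "https://..."` request are ignored.

```python
import re
from collections import Counter

# Matches the request path and the trailing reason in lines such as:
#   ... Get "https://x.x.x.x:6443/api/v1/services?resourceVersion=0": tunnel disconnect
# The pattern is an assumption based on the log excerpt in this thread.
REQUEST_RE = re.compile(
    r'Get "https://[^/"]+(?P<path>/[^?"]*)[^"]*": (?P<reason>.+)$'
)


def tally_failures(lines):
    """Count failed API requests per (reason, path) pair."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            key = (match.group("reason").strip(), match.group("path"))
            counts[key] += 1
    return counts


# Two lines copied from the log above, used as sample input.
sample = [
    '2023/11/07 07:09:07 [ERROR] Unknown error: Get "https://x.x.x.x:6443/apis/extensions/v1beta1/namespaces/saas/ingresses?timeout=45s": context deadline exceeded',
    '2023/11/07 07:09:07 [ERROR] failed on subscribe pod: Get "https://x.x.x.x:6443/api/v1/pods?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": tunnel disconnect',
]

for (reason, path), n in tally_failures(sample).most_common():
    print(f"{n:4d}  {reason}  {path}")
```

If almost every resource type fails at the same moment with `tunnel disconnect`, as it does here, that points more toward the connection between Rancher and the cluster than toward any single API being unhealthy.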

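In practice, curl to that address from other nodes in the cluster succeeds.

When the cluster is accessed without going through Rancher's API, it works normally, but the Rancher UI and API show the error above for the cluster.

After restoring an earlier backup, the apiserver reports: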
v1beta1.metrics.k8s.io failed with: Operation cannot be fulfilled on apiservices.apiregistration.k8s.io "v1beta1.metrics.k8s.io": the object has been modified; please apply your changes to the latest version and try again
W1107 09:30:46.956208 1 handler_proxy.go:102] no RequestInfo found in the context
E1107 09:30:46.956243 1 controller.go:116] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
v1beta1.metrics.k8s.io failed with: Operation cannot be fulfilled on apiservices.apiregistration.k8s.io "v1beta1.metrics.k8s.io": the object has been modified; please apply your changes to the latest version and try again

Errors in the apiserver log: unable to encode watch object *v1.WatchEvent: write tcp 10.16.69.176:6443->172.16.0.130:38404: write: broken pipe (&streaming.encoder{writer:(*http.response)(0xc01d6ba540), encoder:(*versioning.codec)(0xc01d2b9540), buf:(*bytes.Buffer)(0xc017debbf0)})
E1108 07:02:11.428899 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
E1108 07:02:11.429081 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}

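Does this problem occur continuously, or only occasionally?

Continuously. It recovers for a few seconds and then immediately becomes abnormal again for several minutes.

Only a few of the clusters managed by Rancher have this problem; the clusters deployed earlier are all still fine.

The affected cluster currently cannot be used through Rancher.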

For reference: "Rancher 2.4.4 集群unavailabe" (Rancher 2.4.4 cluster unavailable) - #3, by ksd

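Thank you very much, but upgrading the version is not easy for us: many of our API integrations would have to be redeveloped, and without testing we cannot rashly upgrade the production environment.

The problem is solved. It was caused by packet loss on the public network between Rancher and the downstream k8s cluster. Once the network issue was fixed, everything returned to normal.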
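
For anyone who hits the same symptom, here is a minimal sketch of a way to get a first impression of link quality, assuming Python 3 is available on a downstream node. It repeatedly opens TCP connections to a host and port and counts failures. The function name `probe_tcp` and its parameters are made up for this example, and the target (the Rancher hostname on port 443) is an assumption about where the cluster agent connects. This does not measure packet loss directly: lost packets typically show up here as timeouts or unusually slow connects, and tools such as ping or mtr give a more direct view.

```python
import socket
import time


def probe_tcp(host, port, attempts=20, timeout=3.0, pause=0.5):
    """Open `attempts` TCP connections and summarize failures and latency."""
    latencies = []
    failures = 0
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # create_connection raises OSError (including timeouts) on failure.
            with socket.create_connection((host, port), timeout=timeout):
                latencies.append(time.monotonic() - start)
        except OSError:
            failures += 1
        time.sleep(pause)
    return {
        "attempts": attempts,
        "failures": failures,
        "failure_rate": failures / attempts if attempts else 0.0,
        # None when no connection succeeded at all.
        "max_latency_s": max(latencies, default=None),
    }


# Example call (hypothetical target; replace with your Rancher hostname):
# print(probe_tcp("xxxx.com", 443))
```

Running it from a downstream node toward the Rancher server, rather than only between nodes inside the cluster, matters here: curl between cluster nodes worked fine in this thread while the path over the public network was the one losing packets.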