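Problem description
A shard suddenly became unavailable. Checking the k8s nodes at that time showed that disk I/O on the node had spiked, which drove up the load on the k8s node, and the node went NotReady.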
Steps to reproduce
Expected result
Logs
Trace[29640460]: [154.805505ms] [154.805505ms] END
W0809 06:40:14.045503 1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0809 06:40:14.045549 1 reflector.go:442] github.com/longhorn/longhorn-manager/k8s/pkg/client/informers/externalversions/factory.go:117: watch of *v1beta2.Node ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0809 06:40:14.045568 1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0809 06:40:14.045630 1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0809 06:40:14.045639 1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.CSIDriver ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0809 06:40:14.045727 1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.DaemonSet ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0809 06:40:14.045750 1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0809 06:40:14.045756 1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.Pod ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0809 06:40:14.045809 1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.Deployment ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
I0809 06:40:14.155551 1 trace.go:205] Trace[376176241]: "DeltaFIFO Pop Process" ID:default/tw591-react-web-master,Depth:178,Reason:slow event handlers blocking the queue (09-Aug-2023 06:40:13.019) (total time: 1135ms):
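For anyone triaging a longer copy of this log, here is a minimal sketch (not part of Longhorn or client-go) that tallies the reflector "client connection lost" warnings per second and per watched resource type. The function name `summarize_watch_failures` and the regular expression are assumptions made for illustration; the two sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches klog reflector warnings such as:
#   W0809 06:40:14.045503 1 reflector.go:442] ... watch of *v1.ConfigMap ended with: ...
# The pattern is an assumption based only on the lines quoted in this report.
WATCH_RE = re.compile(
    r"^W\d{4} (?P<time>\d{2}:\d{2}:\d{2})\.\d+ .*?"
    r"watch of \*(?P<kind>[\w.]+) ended with: .*?"
    r"http2: client connection lost"
)


def summarize_watch_failures(lines):
    """Return (count per second, set of resource kinds per second)."""
    per_second = Counter()
    kinds = {}
    for line in lines:
        m = WATCH_RE.search(line.strip())
        if m is None:
            continue  # not a reflector "connection lost" warning
        second = m.group("time")
        per_second[second] += 1
        kinds.setdefault(second, set()).add(m.group("kind"))
    return per_second, kinds


# Two lines taken verbatim from the log excerpt in this report.
SAMPLE = [
    'W0809 06:40:14.045503 1 reflector.go:442] '
    'k8s.io/client-go/informers/factory.go:134: watch of *v1.ConfigMap '
    'ended with: an error on the server ("unable to decode an event from '
    'the watch stream: http2: client connection lost") has prevented the '
    'request from succeeding',
    'W0809 06:40:14.045549 1 reflector.go:442] '
    'github.com/longhorn/longhorn-manager/k8s/pkg/client/informers/'
    'externalversions/factory.go:117: watch of *v1beta2.Node ended with: '
    'an error on the server ("unable to decode an event from the watch '
    'stream: http2: client connection lost") has prevented the request '
    'from succeeding',
]

if __name__ == "__main__":
    counts, kinds = summarize_watch_failures(SAMPLE)
    for second in sorted(counts):
        print(second, counts[second], sorted(kinds[second]))
```

In the excerpt above, all nine reflector warnings fall within the same second (06:40:14). That would be consistent with a single connection to the API server dropping rather than nine unrelated failures, but this is only an inference from the log.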
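Environment
- Longhorn version: 1.3.3
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Helm
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: k8s 1.19.9
- Number of management nodes in the cluster: 2
- Number of worker nodes in the cluster: 15
- Node config
- OS type and version: CentOS Linux 7 (Core)
- CPU per node: 64
- Memory per node: 128
- Disk type (e.g. SSD/NVMe): SSD
- Network bandwidth between the nodes: 1000M
- Underlying infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Baremetal (physical machines)
- Number of Longhorn volumes in the cluster: 8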