Longhorn node I/O spikes and the node becomes NotReady

Problem description

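A shard (replica) suddenly became unavailable. Looking at the k8s node status at that time, we found that I/O on the node had spiked, which drove the node load up and caused the node to become NotReady.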

Steps to reproduce



Expected result

Logs

Trace[29640460]: [154.805505ms] [154.805505ms] END
W0809 06:40:14.045503       1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0809 06:40:14.045549       1 reflector.go:442] github.com/longhorn/longhorn-manager/k8s/pkg/client/informers/externalversions/factory.go:117: watch of *v1beta2.Node ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0809 06:40:14.045568       1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0809 06:40:14.045630       1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0809 06:40:14.045639       1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.CSIDriver ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0809 06:40:14.045727       1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.DaemonSet ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0809 06:40:14.045750       1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0809 06:40:14.045756       1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.Pod ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0809 06:40:14.045809       1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.Deployment ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
I0809 06:40:14.155551       1 trace.go:205] Trace[376176241]: "DeltaFIFO Pop Process" ID:default/tw591-react-web-master,Depth:178,Reason:slow event handlers blocking the queue (09-Aug-2023 06:40:13.019) (total time: 1135ms):
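
To make the excerpt above easier to read, here is a minimal sketch that counts the reflector "watch ... ended" warnings per resource type. The helper name `count_watch_failures` and the regular expression are illustrative assumptions; they are not part of Longhorn or client-go, and the pattern is only derived from the lines shown in this report.

```python
import re
from collections import Counter

# Matches client-go reflector warnings of the form seen in this report:
#   W0809 ... reflector.go:442] <informer path>: watch of *v1.ConfigMap ended with: ...
# The pattern is an assumption based only on the log lines above.
WATCH_ENDED = re.compile(r"reflector\.go:\d+\].*?watch of \*(\S+) ended with")


def count_watch_failures(lines):
    """Count 'watch ended' warnings per resource type (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = WATCH_ENDED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Three lines copied from the log excerpt in this report.
sample = [
    'W0809 06:40:14.045503       1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding',
    'W0809 06:40:14.045549       1 reflector.go:442] github.com/longhorn/longhorn-manager/k8s/pkg/client/informers/externalversions/factory.go:117: watch of *v1beta2.Node ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding',
    'I0809 06:40:14.155551       1 trace.go:205] Trace[376176241]: "DeltaFIFO Pop Process" ID:default/tw591-react-web-master,Depth:178,Reason:slow event handlers blocking the queue (09-Aug-2023 06:40:13.019) (total time: 1135ms):',
]

if __name__ == "__main__":
    for resource, count in sorted(count_watch_failures(sample).items()):
        print(f"{resource}: {count}")
```

Run against the full longhorn-manager log, a summary like this would show whether every informer lost its watch at the same instant, as the timestamps in the excerpt suggest.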

Environment

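  • Longhorn version: 1.3.3
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Helm
  • Kubernetes distribution (e.g. RKE/K3s/EKS/OpenShift) and version: k8s 1.19.9
    • Number of management nodes in the cluster: 2
    • Number of worker nodes in the cluster: 15
  • Node configuration
    • OS type and version: CentOS Linux 7 (Core)
    • CPU per node: 64
    • Memory per node: 128
    • Disk type (e.g. SSD/NVMe): SSD
    • Network bandwidth between nodes: 1000M
  • Underlying infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): bare metal
  • Number of Longhorn volumes in the cluster: 8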

Additional context