A Deployment mounting a PVC intermittently hits "timed out waiting for the condition", which makes the mount slow. What is the cause?

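Problem description

When a Deployment mounts a PVC, it intermittently reports "timed out waiting for the condition", which slows down the mount. I checked the related logs and they look normal. The problem also occurs when node resource utilization is low.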

Steps to reproduce

Expected result

Logs

Events:
  Type     Reason                  Age    From                     Message
  ----     ------                  ----   ----                     -------
  Normal   Scheduled               8m4s   default-scheduler        Successfully assigned default/atp660-85f67bd846-l476h to rke01
  Normal   SuccessfulAttachVolume  7m54s  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-4e75b6d0-b733-434e-b914-7c6836c17952"
  Warning  FailedMount             6m1s   kubelet                  Unable to attach or mount volumes: unmounted volumes=[volume], unattached volumes=[volume default-token-vqkqv]: timed out waiting for the condition
  Normal   Pulled                  5m45s  kubelet                  Container image "10.129.1.25:1603/atp/out_docker:v1.2" already present on machine
  Normal   Created                 5m45s  kubelet                  Created container instance
  Normal   Started                 5m45s  kubelet                  Started container instance
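
To see where the time goes, the Age column above can be converted into offsets from the first event. This is only a rough sketch: the ages are copied by hand from the table, kubectl rounds them, and the helper names (`age_to_seconds`, `offsets_from_first`) are made up for this illustration.

```python
import re

# Event ages copied from the Events table above (time elapsed since each event).
EVENTS = [
    ("Scheduled", "8m4s"),
    ("SuccessfulAttachVolume", "7m54s"),
    ("FailedMount", "6m1s"),
    ("Pulled", "5m45s"),
    ("Created", "5m45s"),
    ("Started", "5m45s"),
]

UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def age_to_seconds(age: str) -> int:
    """Convert a kubectl-style age such as '8m4s' into seconds."""
    parts = re.findall(r"(\d+)([dhms])", age)
    # Reject strings that contain anything besides number+unit pairs.
    if not parts or "".join(n + u for n, u in parts) != age:
        raise ValueError(f"unrecognized age: {age!r}")
    return sum(int(n) * UNIT_SECONDS[u] for n, u in parts)


def offsets_from_first(events):
    """Return (reason, seconds elapsed since the first event) for each event."""
    first = age_to_seconds(events[0][1])
    return [(reason, first - age_to_seconds(age)) for reason, age in events]


if __name__ == "__main__":
    for reason, offset in offsets_from_first(EVENTS):
        print(f"{reason:<24} +{offset}s")
```

With these values the attach succeeds about 10 s after scheduling, FailedMount is reported about 123 s after scheduling, and the container starts about 139 s after scheduling, so almost all of the delay falls between the successful attach and the mount on the node.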


[longhorn-instance-manager] time="2023-01-10T01:55:25Z" level=info msg="Process Manager: prepare to create process pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369"
[longhorn-instance-manager] time="2023-01-10T01:55:25Z" level=info msg="Process Manager: created process pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:25Z" level=info msg="Starting with replicas [\"tcp://10.42.14.191:10030\" \"tcp://10.42.7.214:10105\"]"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:25Z" level=info msg="Connecting to remote: 10.42.14.191:10030"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:25Z" level=info msg="Opening: 10.42.14.191:10030"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:25Z" level=info msg="Connecting to remote: 10.42.7.214:10105"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:25Z" level=info msg="Opening: 10.42.7.214:10105"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:25Z" level=info msg="Adding backend: tcp://10.42.14.191:10030"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:25Z" level=info msg="Adding backend: tcp://10.42.7.214:10105"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:25Z" level=info msg="Get backend tcp://10.42.14.191:10030 revision counter 269877623"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:25Z" level=info msg="Get backend tcp://10.42.7.214:10105 revision counter 269877623"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:25Z" level=info msg="device pvc-4e75b6d0-b733-434e-b914-7c6836c17952: SCSI device /dev/longhorn/pvc-4e75b6d0-b733-434e-b914-7c6836c17952 shutdown"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] go-iscsi-helper: tgtd is already running
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:26Z" level=info msg="go-iscsi-helper: found available target id 3"
tgtd: device_mgmt(246) sz:110 params:path=/var/run/longhorn-pvc-4e75b6d0-b733-434e-b914-7c6836c17952.sock,bstype=longhorn,bsopts=size=536870912000
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:26Z" level=info msg="New data socket connection established"
[longhorn-instance-manager] time="2023-01-10T01:55:26Z" level=info msg="wait for gRPC service of process pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369 to start at localhost:10002"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:26Z" level=info msg="default: automatically rescan all LUNs of all iscsi sessions"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:26Z" level=info msg="Creating device /dev/longhorn/pvc-4e75b6d0-b733-434e-b914-7c6836c17952 8:48"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:26Z" level=info msg="device pvc-4e75b6d0-b733-434e-b914-7c6836c17952: SCSI device sdd created"
[longhorn-instance-manager] time="2023-01-10T01:55:27Z" level=info msg="wait for gRPC service of process pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369 to start at localhost:10002"
[longhorn-instance-manager] time="2023-01-10T01:55:27Z" level=info msg="Process pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369 has started at localhost:10002"
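
As a quick check of how long the engine start took in the log above, the `time="..."` fields can be compared. This is a minimal sketch that uses only the first and last instance-manager lines copied from the log; `log_span_seconds` is a hypothetical helper name. Whether this log window belongs to the same slow mount as the events is not established here.

```python
import re
from datetime import datetime

# First and last instance-manager lines copied from the log above.
LOG = '''
[longhorn-instance-manager] time="2023-01-10T01:55:25Z" level=info msg="Process Manager: prepare to create process pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369"
[longhorn-instance-manager] time="2023-01-10T01:55:27Z" level=info msg="Process pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369 has started at localhost:10002"
'''

TIME_RE = re.compile(r'time="([^"]+)"')


def log_span_seconds(text: str) -> float:
    """Seconds between the earliest and latest time="..." stamp in the text."""
    stamps = [
        datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ")
        for raw in TIME_RE.findall(text)
    ]
    if not stamps:
        raise ValueError("no timestamps found")
    return (max(stamps) - min(stamps)).total_seconds()


if __name__ == "__main__":
    print(log_span_seconds(LOG))
```

For these two lines the span is 2 seconds, which matches the impression that the Longhorn engine side looks normal in this log.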

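Environment

  • Longhorn version:
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Rancher Catalog App
  • Kubernetes distribution (e.g. RKE/K3s/EKS/OpenShift) and version: RKE
    • Number of management nodes in the cluster: 3
    • Number of worker nodes in the cluster: 15
  • Node config
    • OS type and version: Red Hat 7.9
    • CPU per node: 104
    • Memory per node: 320
    • Disk type (e.g. SSD/NVMe): SAS
    • Network bandwidth between the nodes: 10G
  • Underlying infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): bare metal
  • Number of Longhorn volumes in the cluster: 300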

Additional context