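Problem description
A deployment that mounts a PVC intermittently hits "timed out waiting for the condition", which slows down the mount. I checked the relevant logs and they look normal. The problem also occurs when node resource utilization is low.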
Steps to reproduce
Expected result
Logs
Events:
Type     Reason                  Age    From                     Message
----     ------                  ----   ----                     -------
Normal   Scheduled               8m4s   default-scheduler        Successfully assigned default/atp660-85f67bd846-l476h to rke01
Normal   SuccessfulAttachVolume  7m54s  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-4e75b6d0-b733-434e-b914-7c6836c17952"
Warning  FailedMount             6m1s   kubelet                  Unable to attach or mount volumes: unmounted volumes=[volume], unattached volumes=[volume default-token-vqkqv]: timed out waiting for the condition
Normal   Pulled                  5m45s  kubelet                  Container image "10.129.1.25:1603/atp/out_docker:v1.2" already present on machine
Normal   Created                 5m45s  kubelet                  Created container instance
Normal   Started                 5m45s  kubelet                  Started container instance
[longhorn-instance-manager] time="2023-01-10T01:55:25Z" level=info msg="Process Manager: prepare to create process pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369"
[longhorn-instance-manager] time="2023-01-10T01:55:25Z" level=info msg="Process Manager: created process pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:25Z" level=info msg="Starting with replicas [\"tcp://10.42.14.191:10030\" \"tcp://10.42.7.214:10105\"]"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:25Z" level=info msg="Connecting to remote: 10.42.14.191:10030"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:25Z" level=info msg="Opening: 10.42.14.191:10030"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:25Z" level=info msg="Connecting to remote: 10.42.7.214:10105"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:25Z" level=info msg="Opening: 10.42.7.214:10105"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:25Z" level=info msg="Adding backend: tcp://10.42.14.191:10030"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:25Z" level=info msg="Adding backend: tcp://10.42.7.214:10105"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:25Z" level=info msg="Get backend tcp://10.42.14.191:10030 revision counter 269877623"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:25Z" level=info msg="Get backend tcp://10.42.7.214:10105 revision counter 269877623"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:25Z" level=info msg="device pvc-4e75b6d0-b733-434e-b914-7c6836c17952: SCSI device /dev/longhorn/pvc-4e75b6d0-b733-434e-b914-7c6836c17952 shutdown"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] go-iscsi-helper: tgtd is already running
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:26Z" level=info msg="go-iscsi-helper: found available target id 3"
tgtd: device_mgmt(246) sz:110 params:path=/var/run/longhorn-pvc-4e75b6d0-b733-434e-b914-7c6836c17952.sock,bstype=longhorn,bsopts=size=536870912000
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:26Z" level=info msg="New data socket connection established"
[longhorn-instance-manager] time="2023-01-10T01:55:26Z" level=info msg="wait for gRPC service of process pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369 to start at localhost:10002"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:26Z" level=info msg="default: automatically rescan all LUNs of all iscsi sessions"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:26Z" level=info msg="Creating device /dev/longhorn/pvc-4e75b6d0-b733-434e-b914-7c6836c17952 8:48"
[pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369] time="2023-01-10T01:55:26Z" level=info msg="device pvc-4e75b6d0-b733-434e-b914-7c6836c17952: SCSI device sdd created"
[longhorn-instance-manager] time="2023-01-10T01:55:27Z" level=info msg="wait for gRPC service of process pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369 to start at localhost:10002"
[longhorn-instance-manager] time="2023-01-10T01:55:27Z" level=info msg="Process pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369 has started at localhost:10002"
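As a rough check on the observation that the Longhorn logs look normal, the timestamps in the log lines above can be compared programmatically. The following is only a minimal sketch, assuming Python 3.8+ and that each relevant line carries a time="..." field in the format shown above; the function name log_span_seconds is hypothetical and not part of Longhorn or Kubernetes.

```python
import re
from datetime import datetime

# Matches the time="...Z" field seen in the Longhorn log lines above.
TIME_RE = re.compile(r'time="(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z"')


def log_span_seconds(lines):
    """Return the seconds between the earliest and latest timestamp, or None.

    Lines without a time="..." field (e.g. the tgtd line) are skipped.
    """
    stamps = [
        datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
        for line in lines
        if (m := TIME_RE.search(line))
    ]
    if not stamps:
        return None
    return (max(stamps) - min(stamps)).total_seconds()


# First and last instance-manager lines from the log excerpt above.
sample = [
    '[longhorn-instance-manager] time="2023-01-10T01:55:25Z" level=info '
    'msg="Process Manager: prepare to create process '
    'pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369"',
    '[longhorn-instance-manager] time="2023-01-10T01:55:27Z" level=info '
    'msg="Process pvc-4e75b6d0-b733-434e-b914-7c6836c17952-e-dd202369 '
    'has started at localhost:10002"',
]

print(log_span_seconds(sample))  # 2.0
```

For the excerpt above this gives about 2 seconds from process creation to the engine process reporting as started, which is much shorter than the roughly two minutes between SuccessfulAttachVolume and FailedMount in the events.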
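Environment
- Longhorn version:
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Rancher Catalog App
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: rke
- Number of management nodes in the cluster: 3
- Number of worker nodes in the cluster: 15
- Node config
- OS type and version: redhat 7.9
- CPU per node: 104
- Memory per node: 320
- Disk type (e.g. SSD/NVMe): SAS
- Network bandwidth between the nodes: 10G
- Underlying infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): bare metal
- Number of Longhorn volumes in the cluster: 300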