kubelet cannot find the local volume when using a local PV

Rancher Server Setup

  • Rancher version: 2.6.4
  • Installation option (Docker install/Helm Chart): Helm chart
    • If Helm Chart, the local cluster type (RKE1, RKE2, k3s, EKS, etc.) and version:
      RKE1: RKE version 1.3.9, Kubernetes version 1.22.7
  • Online or air-gapped installation:
    Air-gapped installation

Downstream Cluster Information

  • Kubernetes version: 1.22.7
  • Cluster Type (Local/Downstream):
    • If Downstream, what type of cluster? (Custom/Imported/Hosted, etc.):
      Custom RKE cluster

User Information

  • What is the role of the logged-in user? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom):
    • If custom, what is the custom permission set:
      admin role

Problem description:
A local volume was created with a local PV. When the Pod is started, it reports that the volume backing the PV cannot be found. Following the official guidance I updated cluster.yaml, but the same error is still reported.
The YAML files are as follows:

[rancher@fotileappmaster01 ~]$ cat pv-local.yaml 
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-local
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /home/rancher/k8s/localpv
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - 10.11.111.137

[rancher@fotileappmaster01 ~]$ cat local-storageclass.yaml 
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

[rancher@fotileappmaster01 ~]$ cat pvc-local.yaml 
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-local
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: local-storage

[rancher@fotileappmaster01 ~]$ cat pod-local-pv.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: nginxdemo
spec:
  hostname: nginxdemo
  volumes:
    - name: pvc-local-pv
      persistentVolumeClaim:
        claimName: pvc-local
  containers:
    - name: nginx
      image: docker.io/nginx:alpine
      imagePullPolicy: IfNotPresent
      volumeMounts:
      - name: pvc-local-pv
        mountPath: /usr/share/nginx/html

[rancher@fotileappmaster01 rke2-cluster]$ cat config.yml
nodes:
  - address: 10.11.111.134 # air-gapped environment node IP
    internal_address: 10.11.111.134  # node internal IP
    user: rancher
    role: ["controlplane", "etcd"]
    ssh_key_path: /home/rancher/.ssh/id_rsa
  - address: 10.11.111.135 # air-gapped environment node IP
    internal_address: 10.11.111.135 # node internal IP
    user: rancher
    role: ["controlplane", "etcd"]
    ssh_key_path: /home/rancher/.ssh/id_rsa
  - address: 10.11.111.136 # air-gapped environment node IP
    internal_address: 10.11.111.136 # node internal IP
    user: rancher
    role: ["controlplane", "etcd"]
    ssh_key_path: /home/rancher/.ssh/id_rsa
  - address: 10.11.111.137 # air-gapped environment node IP
    internal_address: 10.11.111.137 # node internal IP
    user: rancher
    role: ["worker"]
    ssh_key_path: /home/rancher/.ssh/id_rsa
  - address: 10.11.111.138 # air-gapped environment node IP
    internal_address: 10.11.111.138 # node internal IP
    user: rancher
    role: ["worker"]
    ssh_key_path: /home/rancher/.ssh/id_rsa
  - address: 10.11.111.139 # air-gapped environment node IP
    internal_address: 10.11.111.139 # node internal IP
    user: rancher
    role: ["worker"]
    ssh_key_path: /home/rancher/.ssh/id_rsa
network:
  plugin: calico
  options: {}
  mtu: 0
  node_selector: {}
  update_strategy: null
  tolerations: []
private_registries:
  - url: harbor.tkg.com # private registry address
    user: admin
    password: "P@ssw0rd"
    is_default: true
services:
  kubelet: 
    extra_binds:
    -- "/home/rancher/k8s/localpv:/home/rancher/k8s/localpv"
    - "/usr/libexec/kubernetes/kubelet-plugins:/usr/libexec/kubernetes/kubelet-plugins:z"

Steps to reproduce:

./rke up --update-only --config config.yml
kubectl apply -f pv-local.yaml
kubectl apply -f local-storageclass.yaml
kubectl apply -f pvc-local.yaml
kubectl apply -f pod-local-pv.yaml

Result:

[rancher@fotileappmaster01 rke2-cluster]$ cat config.rkestate | grep extraBinds -C 5             
        "scheduler": {
          "image": "harbor.tkg.com/rancher/hyperkube:v1.22.7-rancher1"
        },
        "kubelet": {
          "image": "harbor.tkg.com/rancher/hyperkube:v1.22.7-rancher1",
          "extraBinds": [
            "/home/rancher/k8s/localpv:/home/rancher/k8s/localpv",
            "/usr/libexec/kubernetes/kubelet-plugins:/usr/libexec/kubernetes/kubelet-plugins:z"
          ],
          "clusterDomain": "cluster.local",
          "infraContainerImage": "harbor.tkg.com/rancher/mirrored-pause:3.6",
--
        "scheduler": {
          "image": "harbor.tkg.com/rancher/hyperkube:v1.22.7-rancher1"
        },
        "kubelet": {
          "image": "harbor.tkg.com/rancher/hyperkube:v1.22.7-rancher1",
          "extraBinds": [
            "/home/rancher/k8s/localpv:/home/rancher/k8s/localpv",
            "/usr/libexec/kubernetes/kubelet-plugins:/usr/libexec/kubernetes/kubelet-plugins:z"
          ],
          "clusterDomain": "cluster.local",
          "infraContainerImage": "harbor.tkg.com/rancher/mirrored-pause:3.6",
[rancher@fotileappmaster01 ~]$ kubectl get pv
NAME       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM               STORAGECLASS    REASON   AGE
pv-local   5Gi        RWO            Delete           Bound    default/pvc-local   local-storage            15m
pvdemo     1Gi        RWO,RWX        Delete           Bound    default/pvcdemo                              84m

[rancher@fotileappmaster01 ~]$ kubectl get pvc
NAME        STATUS   VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS    AGE
pvc-local   Bound    pv-local   5Gi        RWO            local-storage   15m
pvcdemo     Bound    pvdemo     1Gi        RWO,RWX                        84m
[rancher@fotileappmaster01 ~]$ kubectl get sc
NAME            PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-storage   kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  15m
[rancher@fotileappmaster01 ~]$ kubectl get po
NAME                          READY   STATUS              RESTARTS        AGE
deploydemo-77b64b85bb-vf6kz   1/1     Running             0               84m
nginx-6fdfb68959-8wv6w        1/1     Running             1 (7h33m ago)   12d
nginx-6fdfb68959-kplrs        1/1     Running             1 (7h34m ago)   12d
nginx-6fdfb68959-w47c4        1/1     Running             1 (7h34m ago)   12d
nginxdemo                     0/1     ContainerCreating   0               15m


[rancher@fotileappmaster01 ~]$ kubectl describe po nginxdemo
.....
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  pvc-local-pv:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc-local
    ReadOnly:   false
  kube-api-access-qfkxv:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                  From               Message
  ----     ------       ----                 ----               -------
  Normal   Scheduled    15m                  default-scheduler  Successfully assigned default/nginxdemo to 10.11.111.137
  Warning  FailedMount  6m44s                kubelet            Unable to attach or mount volumes: unmounted volumes=[pvc-local-pv], unattached volumes=[kube-api-access-qfkxv pvc-local-pv]: timed out waiting for the condition
  Warning  FailedMount  2m15s (x5 over 13m)  kubelet            Unable to attach or mount volumes: unmounted volumes=[pvc-local-pv], unattached volumes=[pvc-local-pv kube-api-access-qfkxv]: timed out waiting for the condition
  Warning  FailedMount  74s (x15 over 15m)   kubelet            MountVolume.NewMounter initialization failed for volume "pv-local" : path "/home/rancher/k8s/localpv" does not exist

Expected result:
The Pod can mount the local PV normally.

Screenshots:

Additional context:

Logs

Before discussing the problem itself, one request: could you improve the markdown formatting? This question contains a lot of YAML and logs, and without proper markdown it is very hard to read.

Regarding the problem itself, I suspect the issue is here: the PV's nodeSelectorTerms. On RKE cluster nodes, the kubernetes.io/hostname label is generally in the ip-xx-xx-xx-xx format, whereas you are using the bare IP.
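
A quick way to check what that label actually is on your nodes (plain kubectl; nothing here is specific to your environment):

kubectl get nodes --show-labels | grep -o 'kubernetes.io/hostname=[^,]*'

Whatever is printed there is the value the PV's nodeAffinity has to match exactly.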

Also, Rancher has an implementation that is a bit more elegant than the native Kubernetes local StorageClass: it provisions volumes for PVCs dynamically, so you don't have to manage PVs by hand. See: GitHub - rancher/local-path-provisioner: Dynamically provisioning persistent local storage with Kubernetes. That said, because the kubelet runs in a container, you still need to set extra_binds.
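
A sketch of installing it; the manifest path is taken from the project's README, so verify the path and version tag against the repo first:

kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml
kubectl get sc    # a "local-path" StorageClass should appear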

This editor really isn't very friendly :expressionless:

Regarding the problem itself, I suspect the issue is here: the PV's nodeSelectorTerms. On RKE cluster nodes, the kubernetes.io/hostname label is generally in the ip-xx-xx-xx-xx format, whereas you are using the bare IP.

》》 kubectl get nodes shows the IP addresses, and the same test works fine on other Kubernetes platforms.

Also, Rancher has an implementation that is a bit more elegant than the native Kubernetes local StorageClass: it provisions volumes for PVCs dynamically, so you don't have to manage PVs by hand. See: GitHub - rancher/local-path-provisioner: Dynamically provisioning persistent local storage with Kubernetes. That said, because the kubelet runs in a container, you still need to set extra_binds.

》》 How is the RKE cluster configuration supposed to be updated? Edit cluster.yaml and then run rke up --update-only --config cluster.yaml? I already updated it that way and it still doesn't work :face_exhaling:. I have to go into the kubelet container and mount the directory by hand.

On the problem node, run docker inspect kubelet and check whether the extra_binds directories were actually set.
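
For reference, a sketch of listing the kubelet container's bind mounts; if RKE applied the extra_binds, they should appear under HostConfig.Binds in the inspect output:

docker inspect kubelet --format '{{json .HostConfig.Binds}}'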

I tried this in my environment and had no problem. Some other things to check:

  1. Has the local directory backing the local PV (localpv) actually been created on the node?
  2. Another difference from your setup: I did not add the extra .../kubelet-plugins extra_binds entry.
  3. I see you set RWX on pvcdemo; as I understand it, local-storage cannot do ReadWriteMany, so you could try removing it.

[rancher@fotileappworker01 localpv]$ pwd
/home/rancher/k8s/localpv
[rancher@fotileappworker01 localpv]$ docker inspect kubelet | grep -i extra
            "ExtraHosts": null,

At first I hadn't added the .../kubelet-plugins bind either and it didn't work then, too. After rke up there's nothing else that needs restarting, right?

I see you set RWX on pvcdemo; as I understand it, local-storage cannot do ReadWriteMany, so you could try removing it.
》》》 So just use ReadWriteOnce?

By the way, that PV is backed by a separately attached disk that was formatted and then mounted to that directory; I'm not sure whether that matters.

This thread is a bit chaotic; several problems seem to be mixed into one discussion, which makes it hard to answer.
Combined with your question in the Office Hour, let me describe how I use this; hopefully it helps.

Base environment: Rancher v2.6.6, RKE K8s v1.23.7
local-path-provisioner: v0.0.22

By default, local-path-provisioner maps every PVC to the host directory /opt/local-path-provisioner.

After the RKE deployment is finished and local-path-provisioner is installed, create a PVC:
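
A minimal sketch of such a PVC, assuming the default StorageClass name local-path installed by the provisioner (the PVC name is just an example):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-path-pvc   # example name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 1Gi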

Deploy a simple Deployment that mounts this PVC at the container directory /data:
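
A minimal sketch of such a Deployment, reusing the example PVC name from the sketch above:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: local-path-demo   # example name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: local-path-demo
  template:
    metadata:
      labels:
        app: local-path-demo
    spec:
      containers:
        - name: nginx
          image: nginx:alpine
          volumeMounts:
            - name: data
              mountPath: /data            # container path from the description above
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: local-path-pvc     # the example PVC from the sketch above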

I can see the volume under the host directory /opt/local-path-provisioner, and when I write data from inside the container, the two stay in sync:
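
A sketch of how this can be verified, assuming the example Deployment name above; the per-volume subdirectory under /opt/local-path-provisioner has a generated name, so adjust accordingly:

# on the node that hosts the volume
ls /opt/local-path-provisioner/
# write a file from inside the container, then look for it on the host
kubectl exec deploy/local-path-demo -- sh -c 'echo hello > /data/test.txt'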

BTW: because local-path-provisioner is a dynamic PV mechanism, you don't actually need to create the PV manually.

One more note: RKE1 deploys the kubelet in a container, so when a pod volume uses subPath you need extra_binds to go with it; otherwise it just works as-is.
Reference: volume hostpath with subpath · Issue #14836 · rancher/rancher · GitHub
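
For reference, a minimal sketch of the subPath case that issue covers, reusing the pvc-local claim from this thread; the Pod name is just an example:

apiVersion: v1
kind: Pod
metadata:
  name: subpath-demo   # example name
spec:
  containers:
    - name: nginx
      image: nginx:alpine
      volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
          subPath: html   # using subPath is what requires the extra_binds workaround on RKE1
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: pvc-local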

I'm not using local-path-provisioner; I'm using a native StorageClass with volumeBindingMode: WaitForFirstConsumer enabled. Using hostPath or a plain PV directly works fine.

Environment: Rancher 2.6.4, RKE cluster
The PV, PVC, StorageClass, and Pod YAML files are as follows:

cat local-storageclass.yaml 
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer


cat pv-local.yaml 
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-local
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /localtest
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - 10.11.111.25


cat pvc-local.yaml 
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-local
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: local-storage
        


cat pod-local-pv.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: nginxdemo
spec:
  hostname: nginxdemo
  volumes:
    - name: pvc-local-pv
      persistentVolumeClaim:
        claimName: pvc-local
  containers:
    - name: nginx
      image: nginx:alpine
      imagePullPolicy: IfNotPresent
      volumeMounts:
      - name: pvc-local-pv
        mountPath: /localtest

Error: