K3s registries.yaml configuration does not take effect, and certs.d is not created

Environment:
K3s version:
k3s version v1.24.8+k3s1 (648004e4)
go version go1.18.8

Node CPU architecture, operating system, and version:
arm64, Ubuntu 20.04

Cluster configuration:
1 server, 2 agents

Problem description:
I configured a private image registry through registries.yaml, but for some reason the configuration does not take effect.

Steps to reproduce:
Install K3s v1.24.8+k3s1 offline.
Create registries.yaml under /etc/rancher/k3s/ on every node.
Contents of registries.yaml:

nvidia@node166:~/Downloads/k3sTest$ cat /etc/rancher/k3s/registries.yaml
mirrors:
  docker.io:
    endpoint:
      - "http://192.168.5.130:1119"
  harbor.crrc.com:
    endpoint:
      - "https://192.168.5.130:1119"
configs:
  harbor.crrc.com:
    auth:
      username: admin
      password: Harbor12345
    tls:
      cert_file: /home/nvidia/Downloads/certs/harbor.crrc.com.crt
      key_file: /home/nvidia/Downloads/certs/harbor.crrc.com.key
      ca_file: /home/nvidia/Downloads/certs/ca.crt

Actual result:
After restarting the service with systemctl restart k3s, running sudo crictl info | grep -A 5 "registry" shows:

"registry": {
      "configPath": "",
      "mirrors": null,
      "configs": null,
      "auths": null,
      "headers": null

Trying to pull the image with sudo k3s crictl pull harbor.crrc.com:1119/test/registry gives:

E1125 07:11:04.613471   45861 remote_image.go:238] "PullImage from image service failed" err="rpc error: code = Unknown desc = failed to pull and unpack image \"harbor.crrc.com:1119/test/registry:latest\": failed to resolve reference \"harbor.crrc.com:1119/test/registry:latest\": pulling from host harbor.crrc.com:1119 failed with status code [manifests latest]: 401 Unauthorized" image="harbor.crrc.com:1119/test/registry"
FATA[0000] pulling image: rpc error: code = Unknown desc = failed to pull and unpack image "harbor.crrc.com:1119/test/registry:latest": failed to resolve reference "harbor.crrc.com:1119/test/registry:latest": pulling from host harbor.crrc.com:1119 failed with status code [manifests latest]: 401 Unauthorized

The containerd configuration file is as follows:

nvidia@node166:~/Downloads/k3sTest$ sudo cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml

version = 2

[plugins."io.containerd.internal.v1.opt"]
  path = "/var/lib/rancher/k3s/agent/containerd"
[plugins."io.containerd.grpc.v1.cri"]
  stream_server_address = "127.0.0.1"
  stream_server_port = "10010"
  enable_selinux = false
  enable_unprivileged_ports = true
  enable_unprivileged_icmp = true
  sandbox_image = "rancher/mirrored-pause:3.6"

[plugins."io.containerd.grpc.v1.cri".containerd]
  snapshotter = "overlayfs"
  disable_snapshot_annotations = true
  default_runtime_name = "nvidia-container-runtime"

[plugins."io.containerd.grpc.v1.cri".cni]
  bin_dir = "/var/lib/rancher/k3s/data/03319a42bd191a541dd2fb18e572bf84e43905984afb83f1aca41e70cf220067/bin"
  conf_dir = "/var/lib/rancher/k3s/agent/etc/cni/net.d"


[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-container-runtime]
  runtime_type = "io.containerd.runtime.v1.linux"
  runtime_engine = "/usr/bin/nvidia-container-runtime"








[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."nvidia"]
  runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."nvidia".options]
  BinaryName = "/usr/bin/nvidia-container-runtime"
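
The config.toml dump above contains no registry table at all. One way to check this mechanically is sketched below. It assumes that this K3s version renders registries.yaml into `registry` tables under the CRI plugin in config.toml (newer releases write a certs.d directory instead); the function name `registry_tables` is hypothetical and not part of any K3s tooling.

```python
import re


def registry_tables(config_toml_text: str) -> list[str]:
    """Return the TOML table headers that belong to the CRI registry settings."""
    # Collect every "[...]" table header that sits on its own line.
    headers = re.findall(r"^\s*\[(.+)\]\s*$", config_toml_text, flags=re.MULTILINE)
    # Keep only headers under the CRI plugin's registry table.
    marker = '"io.containerd.grpc.v1.cri".registry'
    return [h for h in headers if marker in h]


if __name__ == "__main__":
    # Path taken from the thread; reading it usually requires root.
    path = "/var/lib/rancher/k3s/agent/etc/containerd/config.toml"
    with open(path, encoding="utf-8") as f:
        tables = registry_tables(f.read())
    print(tables or "no registry tables found in config.toml")
```

An empty result for the file shown above would be consistent with registries.yaml never having been rendered into the containerd configuration on this node.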

Try changing it to this:

nvidia@node166:~/Downloads/k3sTest$ cat /etc/rancher/k3s/registries.yaml
mirrors:
  docker.io:
    endpoint:
      - "http://192.168.5.130:1119"
  harbor.crrc.com:
    endpoint:
      - "https://192.168.5.130:1119"
configs:
  "192.168.5.130:1119":
    auth:
      username: admin
      password: Harbor12345
    tls:
      cert_file: /home/nvidia/Downloads/certs/harbor.crrc.com.crt
      key_file: /home/nvidia/Downloads/certs/harbor.crrc.com.key
      ca_file: /home/nvidia/Downloads/certs/ca.crt

I made the change and restarted the K3s cluster, but it still does not work.

nvidia@node166:~/Downloads/k3sTest$ cat /etc/rancher/k3s/registries.yaml
mirrors:
  docker.io:
    endpoint:
      - "http://192.168.5.130:1119"
  harbor.crrc.com:
    endpoint:
      - "https://192.168.5.130:1119"
configs:
  "192.168.5.130:1119":
    auth:
      username: admin
      password: Harbor12345
    tls:
      cert_file: /home/nvidia/Downloads/certs/harbor.crrc.com.crt
      key_file: /home/nvidia/Downloads/certs/harbor.crrc.com.key
      ca_file: /home/nvidia/Downloads/certs/ca.crt

The corresponding pod.yaml is as follows:

apiVersion: v1
kind: Pod
metadata:
  name: testpod
  namespace: default
  labels:
    app: myapp
    environment: dev
spec:
  nodeSelector:
    kubernetes.io/hostname: node166
  containers:
  - name: mycontainer
    image: harbor.crrc.com:1119/test/registry
    imagePullPolicy: Always

The pod deployment log is as follows:

nvidia@node166:~/Downloads/k3sTest$ kubectl describe pods
Name:         testpod
Namespace:    default
Priority:     0
Node:         node166/192.168.5.166
Start Time:   Wed, 27 Nov 2024 09:08:13 +0000
Labels:       app=myapp
              environment=dev
Annotations:  <none>
Status:       Pending
IP:           10.42.0.30
IPs:
  IP:  10.42.0.30
Containers:
  mycontainer:
    Container ID:
    Image:          harbor.crrc.com:1119/test/registry
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-v65lk (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-v65lk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/hostname=node166
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 4s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 4s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  3m47s                  default-scheduler  Successfully assigned default/testpod to node166
  Normal   Pulling    2m20s (x4 over 3m46s)  kubelet            Pulling image "harbor.crrc.com:1119/test/registry"
  Warning  Failed     2m19s (x4 over 3m46s)  kubelet            Failed to pull image "harbor.crrc.com:1119/test/registry": rpc error: code = Unknown desc = failed to pull and unpack image "harbor.crrc.com:1119/test/registry:latest": failed to resolve reference "harbor.crrc.com:1119/test/registry:latest": pulling from host harbor.crrc.com:1119 failed with status code [manifests latest]: 401 Unauthorized
  Warning  Failed     2m19s (x4 over 3m46s)  kubelet            Error: ErrImagePull
  Warning  Failed     2m7s (x6 over 3m46s)   kubelet            Error: ImagePullBackOff
  Normal   BackOff    112s (x7 over 3m46s)   kubelet            Back-off pulling image "harbor.crrc.com:1119/test/registry"

There are two options; you can try either one:
Option 1:
Change the container image to: harbor.crrc.com/test/registry

Option 2:
Change registries.yaml to:

nvidia@node166:~/Downloads/k3sTest$ cat /etc/rancher/k3s/registries.yaml
mirrors:
  docker.io:
    endpoint:
      - "http://192.168.5.130:1119"
  "harbor.crrc.com:1119":
    endpoint:
      - "https://192.168.5.130:1119"
configs:
  "192.168.5.130:1119":
    auth:
      username: admin
      password: Harbor12345
    tls:
      cert_file: /home/nvidia/Downloads/certs/harbor.crrc.com.crt
      key_file: /home/nvidia/Downloads/certs/harbor.crrc.com.key
      ca_file: /home/nvidia/Downloads/certs/ca.crt

These two options conflict with each other, so apply only one of them.
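
Both options aim to make the registry host in the image reference equal to a key under mirrors. The sketch below illustrates that matching. It is a simplified approximation of how a registry host is split off an image reference (the first path component counts as a host if it contains "." or ":" or is "localhost"), not the actual containerd or K3s code, and the helper names are hypothetical.

```python
def registry_host(image: str) -> str:
    """Return the registry host part of an image reference (simplified rule)."""
    first, sep, _ = image.partition("/")
    looks_like_host = "." in first or ":" in first or first == "localhost"
    if sep and looks_like_host:
        return first
    # References without an explicit host default to Docker Hub.
    return "docker.io"


def mirror_key_for(image: str, mirrors: dict):
    """Return the mirrors key that matches the image's registry host, or None."""
    host = registry_host(image)
    return host if host in mirrors else None


# Mirror keys from the original registries.yaml and from option 2 (values omitted).
original_mirrors = {"docker.io": {}, "harbor.crrc.com": {}}
option2_mirrors = {"docker.io": {}, "harbor.crrc.com:1119": {}}

print(mirror_key_for("harbor.crrc.com:1119/test/registry", original_mirrors))  # None
print(mirror_key_for("harbor.crrc.com/test/registry", original_mirrors))       # harbor.crrc.com
print(mirror_key_for("harbor.crrc.com:1119/test/registry", option2_mirrors))   # harbor.crrc.com:1119
```

Under this simplified rule, the original image name with port 1119 matches no mirror key, so the pull goes straight to harbor.crrc.com:1119 without the configured endpoint or credentials.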

Thanks for your reply. I tried both options and neither had any effect. sudo crictl info | grep -A 5 "registry" still shows the result below, so it seems K3s is not generating the containerd configuration from registries.yaml.

 "registry": {
      "configPath": "",
      "mirrors": null,
      "configs": null,
      "auths": null,
      "headers": null
    },

Could this be because the K3s version is too old?

You can ignore that output. Current K3s versions no longer expose the registry configuration through crictl info; see: https://github.com/k3s-io/k3s/issues/9626

Also, you say neither option works, but what errors do they actually report?

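Thanks for your reply. The K3s version we use is 1.24.8, an old release from two years ago.
Option 1 appears to fail because authentication cannot be established (the log shows an x509 certificate mismatch):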

Events:
  Type     Reason   Age                    From     Message
  ----     ------   ----                   ----     -------
  Normal   Killing  5m4s                   kubelet  Container mycontainer definition changed, will be restarted
  Normal   BackOff  4m37s (x2 over 5m4s)   kubelet  Back-off pulling image "harbor.crrc.com/test/registry"
  Warning  Failed   4m37s (x70 over 23h)   kubelet  Error: ImagePullBackOff
  Normal   Pulling  3m35s (x4 over 5m4s)   kubelet  Pulling image "harbor.crrc.com/test/registry"
  Warning  Failed   3m35s (x4 over 5m4s)   kubelet  Failed to pull image "harbor.crrc.com/test/registry": rpc error: code = Unknown desc = failed to pull and unpack image "harbor.crrc.com/test/registry:latest": failed to resolve reference "harbor.crrc.com/test/registry:latest": failed to do request: Head "https://harbor.crrc.com/v2/test/registry/manifests/latest": x509: certificate is valid for aefe17bf2dcb830be36eb2742c08eb14.a8faa192c3ae0e3652b28d267adf6952.traefik.default, not harbor.crrc.com
  Warning  Failed   3m35s (x11 over 23h)   kubelet  Error: ErrImagePull

Option 2 reports the same error as before, a 401 Unauthorized (not logged in):


Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  2s    default-scheduler  Successfully assigned default/testpod to node166
  Normal   Pulling    1s    kubelet            Pulling image "harbor.crrc.com:1119/test/registry"
  Warning  Failed     1s    kubelet            Failed to pull image "harbor.crrc.com:1119/test/registry": rpc error: code = Unknown desc = failed to pull and unpack image "harbor.crrc.com:1119/test/registry:latest": failed to resolve reference "harbor.crrc.com:1119/test/registry:latest": pulling from host harbor.crrc.com:1119 failed with status code [manifests latest]: 401 Unauthorized
  Warning  Failed     1s    kubelet            Error: ErrImagePull
  Normal   BackOff    1s    kubelet            Back-off pulling image "harbor.crrc.com:1119/test/registry"
  Warning  Failed     1s    kubelet            Error: ImagePullBackOff

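Upstream seems to have changed this behavior only after version 1.26. We also deployed K3s 1.30.x on another cluster, and there the same sudo crictl info | grep -A 5 "registry" command does show the registry configuration: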

"registry": {
      "configPath": "/var/lib/rancher/k3s/agent/etc/containerd/certs.d",
      "mirrors": null,
      "configs": {
        "192.168.5.130:1119": {
          "auth": {

The registries.yaml on that cluster is as follows:

mirrors:
 docker-registry:
   endpoint:
     - "http://registry.cube.local:5000"
 "192.168.5.130:1119":
   endpoint:
     - "http://192.168.5.130:1119"
 "harbor.crrc.com:1119":
   endpoint:
     - "http://192.168.5.130:1119"
configs:
 "192.168.5.130:1119":
   auth:
     username: admin
     password: Harbor12345
   tls:
     cert_file: /home/sgq/Downloads/certs/harbor.crrc.com.crt
     key_file: /home/sgq/Downloads/certs/harbor.crrc.com.key
     ca_file: /home/sgq/Downloads/certs/ca.crt

With this configuration, images can be pulled from the private Harbor registry normally.
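
In the working configuration, the key under configs is the host:port of the mirror endpoint, not the name used in the image reference. The sketch below lists mirror endpoints that have no matching configs entry. It assumes credentials and TLS settings are looked up by endpoint host, operates on a plain dict that mirrors the YAML structure, and uses a hypothetical function name.

```python
from urllib.parse import urlsplit


def endpoints_without_config(registries: dict) -> list[str]:
    """Return endpoint hosts (host:port) that have no entry under configs."""
    configs = registries.get("configs") or {}
    missing = []
    for mirror in (registries.get("mirrors") or {}).values():
        for endpoint in mirror.get("endpoint", []):
            host = urlsplit(endpoint).netloc  # e.g. "192.168.5.130:1119"
            if host not in configs and host not in missing:
                missing.append(host)
    return missing


# The first registries.yaml from this thread, written as a dict (auth/tls omitted).
first_attempt = {
    "mirrors": {
        "docker.io": {"endpoint": ["http://192.168.5.130:1119"]},
        "harbor.crrc.com": {"endpoint": ["https://192.168.5.130:1119"]},
    },
    "configs": {"harbor.crrc.com": {}},
}

print(endpoints_without_config(first_attempt))  # ['192.168.5.130:1119']
```

An endpoint in the returned list is not necessarily an error, since an anonymous registry needs no credentials, but for a Harbor project that requires login it would mean the username and password are never sent to that endpoint.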