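rke2-server fails to start on Ubuntu 18.04; containerd does not come up

Environment:
RKE2 version: v1.22.15+rke2r1

Node CPU architecture, OS, and version: Ubuntu 18.04

Cluster configuration:

Problem description:

rke2-server never finishes starting. The logs show that /run/k3s/containerd/containerd.sock cannot be found, and containerd fails to start.
Steps to reproduce:

  • Commands used to install RKE2: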

curl -sfL https://rancher-mirror.oss-cn-beijing.aliyuncs.com/rke2/install.sh | INSTALL_RKE2_MIRROR=cn INSTALL_RKE2_VERSION=v1.22.15+rke2r1 sh -
mkdir -p /etc/rancher/rke2
cat << EOF > /etc/rancher/rke2/config.yaml
tls-san:
  # keepalived VIP
  - 192.168.1.20
node-name: "node01"
node-label:
  - "node01=Master"
EOF
systemctl enable rke2-server.service
systemctl start rke2-server.service
journalctl -fu rke2-server

Expected result:

Actual result:

Logs:

Oct 26 14:33:51 node01 rke2[18913]: time=“2022-10-26T14:33:51+08:00” level=info msg=“Waiting for etcd server to become available”
Oct 26 14:33:51 node01 rke2[18913]: time=“2022-10-26T14:33:51+08:00” level=info msg=“Waiting for API server to become available”
Oct 26 14:34:01 node01 rke2[18913]: time=“2022-10-26T14:34:01+08:00” level=info msg=“Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = “transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: no such file or directory””
Oct 26 14:34:11 node01 rke2[18913]: {“level”:“warn”,“ts”:“2022-10-26T14:34:11.325+0800”,“logger”:“etcd-client”,“caller”:“v3@v3.5.4-k3s1/retry_interceptor.go:62”,“msg”:“retrying of unary invoker failed”,“target”:“etcd-endpoints://0xc00066ba40/127.0.0.1:2379”,“attempt”:0,“error”:“rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = “transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused””}
Oct 26 14:34:11 node01 rke2[18913]: time=“2022-10-26T14:34:11+08:00” level=info msg=“Failed to test data store connection: context deadline exceeded”
Oct 26 14:34:21 node01 rke2[18913]: time=“2022-10-26T14:34:21+08:00” level=info msg=“Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = “transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: no such file or directory””

More log output follows:
-- Unit rke2-server.service has begun starting up.
Oct 26 14:59:14 node01 sh[19279]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Oct 26 14:59:14 node01 sh[19279]: /bin/sh: 1: /usr/bin/systemctl: not found
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=warning msg=“not running in CIS mode”
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=info msg=“Starting rke2 v1.22.15+rke2r1 (30c66ca6420dbab813baed87beede5c239ccde2c)”
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=info msg=“Managed etcd cluster initializing”
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=info msg=“Starting etcd for new cluster”
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=info msg=“Running kube-apiserver --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=https://kubernetes.default.svc.cluster.local,rke2 --authorization-mode=Node,RBAC --bind-address=0.0.0.0 --cert-dir=/var/lib/rancher/rke2/server/tls/temporary-certs --client-ca-file=/var/lib/rancher/rke2/server/tls/client-ca.crt --egress-selector-config-file=/var/lib/rancher/rke2/server/etc/egress-selector-config.yaml --enable-admission-plugins=NodeRestriction,PodSecurityPolicy --enable-aggregator-routing=true --encryption-provider-config=/var/lib/rancher/rke2/server/cred/encryption-config.json --etcd-cafile=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt --etcd-certfile=/var/lib/rancher/rke2/server/tls/etcd/client.crt --etcd-keyfile=/var/lib/rancher/rke2/server/tls/etcd/client.key --etcd-servers=https://127.0.0.1:2379 --feature-gates=JobTrackingWithFinalizers=true --insecure-port=0 --kubelet-certificate-authority=/var/lib/rancher/rke2/server/tls/server-ca.crt --kubelet-client-certificate=/var/lib/rancher/rke2/server/tls/client-kube-apiserver.crt --kubelet-client-key=/var/lib/rancher/rke2/server/tls/client-kube-apiserver.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --profiling=false --proxy-client-cert-file=/var/lib/rancher/rke2/server/tls/client-auth-proxy.crt --proxy-client-key-file=/var/lib/rancher/rke2/server/tls/client-auth-proxy.key --requestheader-allowed-names=system:auth-proxy --requestheader-client-ca-file=/var/lib/rancher/rke2/server/tls/request-header-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/var/lib/rancher/rke2/server/tls/service.key --service-account-signing-key-file=/var/lib/rancher/rke2/server/tls/service.key 
--service-cluster-ip-range=10.43.0.0/16 --service-node-port-range=30000-32767 --storage-backend=etcd3 --tls-cert-file=/var/lib/rancher/rke2/server/tls/serving-kube-apiserver.crt --tls-private-key-file=/var/lib/rancher/rke2/server/tls/serving-kube-apiserver.key”
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=info msg=“Running kube-scheduler --authentication-kubeconfig=/var/lib/rancher/rke2/server/cred/scheduler.kubeconfig --authorization-kubeconfig=/var/lib/rancher/rke2/server/cred/scheduler.kubeconfig --bind-address=127.0.0.1 --kubeconfig=/var/lib/rancher/rke2/server/cred/scheduler.kubeconfig --profiling=false --secure-port=10259”
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=info msg=“Running kube-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/var/lib/rancher/rke2/server/cred/controller.kubeconfig --authorization-kubeconfig=/var/lib/rancher/rke2/server/cred/controller.kubeconfig --bind-address=127.0.0.1 --cluster-cidr=10.42.0.0/16 --cluster-signing-kube-apiserver-client-cert-file=/var/lib/rancher/rke2/server/tls/client-ca.crt --cluster-signing-kube-apiserver-client-key-file=/var/lib/rancher/rke2/server/tls/client-ca.key --cluster-signing-kubelet-client-cert-file=/var/lib/rancher/rke2/server/tls/client-ca.crt --cluster-signing-kubelet-client-key-file=/var/lib/rancher/rke2/server/tls/client-ca.key --cluster-signing-kubelet-serving-cert-file=/var/lib/rancher/rke2/server/tls/server-ca.crt --cluster-signing-kubelet-serving-key-file=/var/lib/rancher/rke2/server/tls/server-ca.key --cluster-signing-legacy-unknown-cert-file=/var/lib/rancher/rke2/server/tls/client-ca.crt --cluster-signing-legacy-unknown-key-file=/var/lib/rancher/rke2/server/tls/client-ca.key --configure-cloud-routes=false --controllers=*,-service,-route,-cloud-node-lifecycle --feature-gates=JobTrackingWithFinalizers=true --kubeconfig=/var/lib/rancher/rke2/server/cred/controller.kubeconfig --profiling=false --root-ca-file=/var/lib/rancher/rke2/server/tls/server-ca.crt --secure-port=10257 --service-account-private-key-file=/var/lib/rancher/rke2/server/tls/service.key --service-cluster-ip-range=10.43.0.0/16 --use-service-account-credentials=true”
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=info msg=“Running cloud-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/var/lib/rancher/rke2/server/cred/cloud-controller.kubeconfig --authorization-kubeconfig=/var/lib/rancher/rke2/server/cred/cloud-controller.kubeconfig --bind-address=127.0.0.1 --cloud-provider=rke2 --cluster-cidr=10.42.0.0/16 --configure-cloud-routes=false --kubeconfig=/var/lib/rancher/rke2/server/cred/cloud-controller.kubeconfig --node-status-update-frequency=1m0s --port=0 --profiling=false”
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=info msg=“Tunnel server egress proxy mode: disabled”
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=info msg=“Server node token is available at /var/lib/rancher/rke2/server/token”
Oct 26 14:59:14 node01 rke2[19295]: time="2022-10-26T14:59:14+08:00" level=info msg="To join server node to cluster: rke2 server -s https://192.168.1.10:9345 -t ${SERVER_NODE_TOKEN}"
Oct 26 14:59:14 node01 rke2[19295]: time="2022-10-26T14:59:14+08:00" level=info msg="Agent node token is available at /var/lib/rancher/rke2/server/agent-token"
Oct 26 14:59:14 node01 rke2[19295]: time="2022-10-26T14:59:14+08:00" level=info msg="To join agent node to cluster: rke2 agent -s https://192.168.1.10:9345 -t ${AGENT_NODE_TOKEN}"
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=info msg=“Wrote kubeconfig /etc/rancher/rke2/rke2.yaml”
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=info msg=“Run: rke2 kubectl”
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=info msg=“Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = “transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: no such file or directory””
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=info msg=“certificate CN=node01 signed by CN=rke2-server-ca@1666765461: notBefore=2022-10-26 06:24:21 +0000 UTC notAfter=2023-10-26 06:59:14 +0000 UTC”
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=info msg=“certificate CN=system:node:node01,O=system:nodes signed by CN=rke2-client-ca@1666765461: notBefore=2022-10-26 06:24:21 +0000 UTC notAfter=2023-10-26 06:59:14 +0000 UTC”
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=info msg=“Module overlay was already loaded”
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=info msg=“Module nf_conntrack was already loaded”
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=info msg=“Module br_netfilter was already loaded”
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=info msg=“Module iptable_nat was already loaded”
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=info msg=“Checking local image archives in /var/lib/rancher/rke2/agent/images for Docker Hub
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=warning msg=“Failed to load runtime image Docker Hub from tarball: no local image available for Docker Hub not found in any file in /var/lib/rancher/rke2/agent/images: image not found”
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=info msg=“Checking local image archives in /var/lib/rancher/rke2/agent/images for Docker Hub
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=warning msg=“Failed to load runtime image Docker Hub from tarball: no local image available for Docker Hub not found in any file in /var/lib/rancher/rke2/agent/images: image not found”
Oct 26 14:59:14 node01 rke2[19295]: time=“2022-10-26T14:59:14+08:00” level=info msg=“Pulling runtime image Docker Hub
Oct 26 14:59:27 node01 rke2[19295]: time=“2022-10-26T14:59:27+08:00” level=info msg=“Creating directory /var/lib/rancher/rke2/data/v1.22.15-rke2r1-5565860d3e73/bin”
Oct 26 14:59:27 node01 rke2[19295]: time=“2022-10-26T14:59:27+08:00” level=info msg=“Extracting file bin/containerd to /var/lib/rancher/rke2/data/v1.22.15-rke2r1-5565860d3e73/bin/containerd”
Oct 26 14:59:34 node01 rke2[19295]: time=“2022-10-26T14:59:34+08:00” level=info msg=“Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = “transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: no such file or directory””
Oct 26 14:59:44 node01 rke2[19295]: time=“2022-10-26T14:59:44+08:00” level=info msg=“Waiting for etcd server to become available”
Oct 26 14:59:44 node01 rke2[19295]: {“level”:“warn”,“ts”:“2022-10-26T14:59:44.513+0800”,“logger”:“etcd-client”,“caller”:“v3@v3.5.4-k3s1/retry_interceptor.go:62”,“msg”:“retrying of unary invoker failed”,“target”:“etcd-endpoints://0xc0009cf180/127.0.0.1:2379”,“attempt”:0,“error”:“rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = “transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused””}
Oct 26 14:59:44 node01 rke2[19295]: time=“2022-10-26T14:59:44+08:00” level=info msg=“Failed to test data store connection: context deadline exceeded”
Oct 26 14:59:44 node01 rke2[19295]: time=“2022-10-26T14:59:44+08:00” level=info msg=“Waiting for API server to become available”
Oct 26 14:59:54 node01 rke2[19295]: time=“2022-10-26T14:59:54+08:00” level=info msg=“Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = “transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: no such file or directory””
Oct 26 15:00:14 node01 rke2[19295]: time=“2022-10-26T15:00:14+08:00” level=info msg=“Waiting for etcd server to become available”
Oct 26 15:00:14 node01 rke2[19295]: time=“2022-10-26T15:00:14+08:00” level=info msg=“Waiting for API server to become available”
Oct 26 15:00:14 node01 rke2[19295]: time=“2022-10-26T15:00:14+08:00” level=info msg=“Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = “transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: no such file or directory””
Oct 26 15:00:19 node01 rke2[19295]: {“level”:“warn”,“ts”:“2022-10-26T15:00:19.514+0800”,“logger”:“etcd-client”,“caller”:“v3@v3.5.4-k3s1/retry_interceptor.go:62”,“msg”:“retrying of unary invoker failed”,“target”:“etcd-endpoints://0xc0009cf180/127.0.0.1:2379”,“attempt”:0,“error”:“rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = “transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused””}
Oct 26 15:00:19 node01 rke2[19295]: time=“2022-10-26T15:00:19+08:00” level=info msg=“Failed to test data store connection: context deadline exceeded”
Oct 26 15:00:34 node01 rke2[19295]: time=“2022-10-26T15:00:34+08:00” level=info msg=“Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = “transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: no such file or directory””
Oct 26 15:00:44 node01 rke2[19295]: time=“2022-10-26T15:00:44+08:00” level=info msg=“Waiting for etcd server to become available”
Oct 26 15:00:44 node01 rke2[19295]: time=“2022-10-26T15:00:44+08:00” level=info msg=“Waiting for API server to become available”
Oct 26 15:00:54 node01 rke2[19295]: {“level”:“warn”,“ts”:“2022-10-26T15:00:54.516+0800”,“logger”:“etcd-client”,“caller”:“v3@v3.5.4-k3s1/retry_interceptor.go:62”,“msg”:“retrying of unary invoker failed”,“target”:“etcd-endpoints://0xc0009cf180/127.0.0.1:2379”,“attempt”:0,“error”:“rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = “transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused””}
Oct 26 15:00:54 node01 rke2[19295]: time=“2022-10-26T15:00:54+08:00” level=info msg=“Failed to test data store connection: context deadline exceeded”
Oct 26 15:00:54 node01 rke2[19295]: time=“2022-10-26T15:00:54+08:00” level=info msg=“Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = “transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: no such file or directory””
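
The repeated "Waiting for cri connection" messages mean that rke2 cannot reach containerd's socket, so the etcd and API server pods never get started. A minimal sketch for checking this by hand follows. The socket path comes from the log above; the containerd log path is an assumption (the usual RKE2 default) and may differ on your node, and `socket_state` is a hypothetical helper name.

```shell
#!/bin/sh
# Classify a path: "socket" if it is a Unix socket, "not-a-socket" if
# something else exists there, "missing" if nothing exists at that path.
socket_state() {
    if [ -S "$1" ]; then
        echo "socket"
    elif [ -e "$1" ]; then
        echo "not-a-socket"
    else
        echo "missing"
    fi
}

# Socket path taken from the rke2 log lines above.
SOCK=/run/k3s/containerd/containerd.sock
# Assumption: default location of containerd's own log on an RKE2 node.
LOG=/var/lib/rancher/rke2/agent/containerd/containerd.log

echo "CRI socket $SOCK: $(socket_state "$SOCK")"

# containerd's own log usually says why it exited, which the rke2 log does not.
if [ -r "$LOG" ]; then
    tail -n 50 "$LOG"
else
    echo "no readable containerd log at $LOG"
fi

# Check whether any containerd process is running at all.
if pgrep -x containerd >/dev/null 2>&1; then
    echo "a containerd process is running"
else
    echo "no containerd process found"
fi
```

If the socket is reported as missing and no containerd log exists either, containerd most likely never started, which would point at the extracted binaries rather than at the configuration.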

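Your install script is configured to use the China mirror. That mirror syncs from Docker Hub with a delay, has no SLA, and is maintained purely by community volunteers.
You could try another version, for example v1.22.13+rke2r1, or switch to the main Docker Hub registry, which has the most complete set of versions.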
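
Both suggestions can be sketched as variations of the original install command. This is only a sketch: it assumes that "switching to the main registry" means running the upstream install script at https://get.rke2.io without INSTALL_RKE2_MIRROR, and `rke2_install_cmd` is a hypothetical helper that prints the command instead of running it, so you can review it first.

```shell
#!/bin/sh
# Print (do not run) an RKE2 install command for a given version.
#   $1: RKE2 version, e.g. v1.22.13+rke2r1
#   $2: "cn" for the China mirror script from the original post;
#       anything else (or empty) for the upstream script (assumption).
rke2_install_cmd() {
    version="$1"
    source_name="${2:-upstream}"
    if [ "$source_name" = "cn" ]; then
        printf '%s\n' "curl -sfL https://rancher-mirror.oss-cn-beijing.aliyuncs.com/rke2/install.sh | INSTALL_RKE2_MIRROR=cn INSTALL_RKE2_VERSION=$version sh -"
    else
        printf '%s\n' "curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION=$version sh -"
    fi
}

# Option 1: stay on the China mirror but try the other version.
rke2_install_cmd 'v1.22.13+rke2r1' cn
# Option 2: keep the version but install from upstream.
rke2_install_cmd 'v1.22.15+rke2r1'
```

Copy the printed line into a root shell once you are happy with it, then restart rke2-server and watch `journalctl -fu rke2-server` as before.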

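Problem solved: after replacing Ubuntu 18.04 with Ubuntu 20.04, installation and startup work normally. My preliminary guess is that the older kernel or OS version caused the archive extraction to fail.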
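
For anyone comparing a failing node with a working one, here is a rough sketch that records the kernel and OS version and checks whether the containerd binary was actually extracted. The binary path is copied from the "Extracting file bin/containerd" log line above; `os_release_field` and `binary_state` are hypothetical helper names, and no particular minimum kernel version is assumed.

```shell
#!/bin/sh
# Read one KEY=value field from an os-release style file, stripping quotes.
os_release_field() {
    file="$1"
    key="$2"
    sed -n "s/^${key}=//p" "$file" | tr -d '"' | head -n 1
}

# Report whether a path exists and is executable.
binary_state() {
    if [ -x "$1" ]; then
        echo "executable"
    elif [ -e "$1" ]; then
        echo "present-not-executable"
    else
        echo "missing"
    fi
}

echo "kernel: $(uname -r)"
if [ -r /etc/os-release ]; then
    echo "os: $(os_release_field /etc/os-release PRETTY_NAME)"
fi

# Path taken from the extraction log line; the hash suffix is specific
# to this RKE2 release and will differ for other versions.
BIN=/var/lib/rancher/rke2/data/v1.22.15-rke2r1-5565860d3e73/bin/containerd
echo "containerd binary: $(binary_state "$BIN")"
```

A "missing" or "present-not-executable" result on the 18.04 node would be consistent with the extraction guess, but it does not prove it.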