After removing two masters from an RKE2 cluster and adding two masters back, the helm-install-rke2-canal and helm-install-rke2-coredns Jobs in the kube-system namespace keep restarting

Environment:
RKE2 version:

rke2 -v

rke2 version v1.24.10+rke2r1 (1ccdce2571291649b9414af1f269f645c3fe4002)
go version go1.19.5 X:boringcrypto

Node CPU architecture, operating system, and version:

uname -a

Linux iZwz9hifjgcz508mj7fsqcZ 5.4.0-125-generic #141-Ubuntu SMP Wed Aug 10 13:42:03 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Cluster configuration:

kubectl get nodes

NAME                      STATUS   ROLES                       AGE     VERSION
izwz9hifjgcz508mj7fsqcz   Ready    control-plane,etcd,master   33d     v1.24.10+rke2r1
master-01                 Ready    control-plane,etcd,master   3d11h   v1.24.11+rke2r1
master-02                 Ready    control-plane,etcd,master   3d11h   v1.24.11+rke2r1
worker-03                 Ready    <none>                      3d14h   v1.24.11+rke2r1
worker-04                 Ready    <none>                      3d14h   v1.24.11+rke2r1
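
The node list mixes two versions (one node on v1.24.10+rke2r1, four on v1.24.11+rke2r1). A minimal sketch for spotting that kind of skew from `kubectl get nodes --no-headers` output; `node_version_counts` is a hypothetical helper name, and it assumes VERSION is the last column, as in the listing above.

```shell
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin
# and print each distinct VERSION (the last column) with its node count.
node_version_counts() {
  awk '{ count[$NF]++ } END { for (v in count) print count[v], v }' | sort -k2
}

# Usage against a live cluster (not executed here):
#   kubectl get nodes --no-headers | node_version_counts
```

More than one line of output means the nodes are not all on the same version.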

Problem description:
After removing two masters from the RKE2 cluster and then adding two masters back, the helm-install-rke2-canal and helm-install-rke2-coredns Jobs in the kube-system namespace keep restarting. Their logs show "Error: UPGRADE FAILED: chart requires kubeVersion: >= v1.24.11 which is incompatible with Kubernetes v1.24.10+rke2r1"

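Steps to reproduce:
Drain the pods from one master, remove that master, and add a new master. Then drain the pods from a second master, remove it, and add another new master.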
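
The exact commands used are not given in this post. As a rough sketch of one replacement round, the helper below only prints the commands instead of running them. `replace_master_plan` and the node name `old-master` are hypothetical, the join configuration (server URL, token) for the new node is left out, and pinning the installer with `INSTALL_RKE2_VERSION` is an assumption about how one might keep the new master on the same version as the rest of the cluster.

```shell
# Hypothetical helper: print (do not run) the commands for replacing one master.
# $1 = node to remove, $2 = RKE2 version to pin on the replacement node.
replace_master_plan() {
  local node="$1" version="$2"
  # Evict workloads from the node, then remove it from the cluster.
  echo "kubectl drain ${node} --ignore-daemonsets --delete-emptydir-data"
  echo "kubectl delete node ${node}"
  # Run on the replacement machine; join configuration is omitted here.
  echo "curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION=${version} sh -"
}

replace_master_plan old-master "v1.24.10+rke2r1"
```

Without a pinned version the installer may pick a newer release than the existing nodes run, which would match the v1.24.10 / v1.24.11 mix shown in the node list.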

Expected result:
The cluster runs stably and everything reports a healthy status.

Actual result:
The cluster itself works, but two Job pods in the kube-system namespace keep restarting.

kubectl get po -n kube-system

NAME                                                    READY   STATUS             RESTARTS         AGE
cloud-controller-manager-izwz9hifjgcz508mj7fsqcz        1/1     Running            14 (23d ago)     33d
cloud-controller-manager-master-01                      1/1     Running            0                3d11h
cloud-controller-manager-master-02                      1/1     Running            0                3d11h
etcd-izwz9hifjgcz508mj7fsqcz                            1/1     Running            7 (23d ago)      33d
etcd-master-01                                          1/1     Running            0                3d11h
etcd-master-02                                          1/1     Running            0                3d11h
helm-install-rke2-canal-jqr6t                           0/1     CrashLoopBackOff   983 (3m8s ago)   3d11h
helm-install-rke2-coredns-4qr9z                         0/1     CrashLoopBackOff   984 (3m3s ago)   3d11h
helm-install-rke2-ingress-nginx-767sv                   0/1     Completed          1                3d11h
helm-install-rke2-metrics-server-jbffv                  0/1     Completed          4                3d11h
kube-apiserver-izwz9hifjgcz508mj7fsqcz                  1/1     Running            7 (23d ago)      33d
kube-apiserver-master-01                                1/1     Running            0                3d11h
kube-apiserver-master-02                                1/1     Running            0                3d11h
kube-controller-manager-izwz9hifjgcz508mj7fsqcz         1/1     Running            13 (23d ago)     33d
kube-controller-manager-master-01                       1/1     Running            0                3d11h
kube-controller-manager-master-02                       1/1     Running            0                3d11h
kube-proxy-izwz9hifjgcz508mj7fsqcz                      1/1     Running            7 (23d ago)      33d
kube-proxy-master-01                                    1/1     Running            0                3d11h
kube-proxy-master-02                                    1/1     Running            0                3d11h
kube-proxy-worker-03                                    1/1     Running            0                3d14h
kube-proxy-worker-04                                    1/1     Running            0                3d15h
kube-scheduler-izwz9hifjgcz508mj7fsqcz                  1/1     Running            7 (23d ago)      33d
kube-scheduler-master-01                                1/1     Running            0                3d11h
kube-scheduler-master-02                                1/1     Running            0                3d11h
nfs-client-provisioner-6b6c4968c8-wgqws                 1/1     Running            1 (3d11h ago)    13d
rke2-canal-44s4s                                        2/2     Running            0                3d15h
rke2-canal-crl4q                                        2/2     Running            15 (23d ago)     33d
rke2-canal-kw5cq                                        2/2     Running            0                3d14h
rke2-canal-ppbdf                                        2/2     Running            0                3d11h
rke2-canal-rq686                                        2/2     Running            0                3d11h
rke2-coredns-rke2-coredns-58fd75f64b-bz6fk              1/1     Running            0                3d11h
rke2-coredns-rke2-coredns-58fd75f64b-httwv              1/1     Running            0                3d11h
rke2-coredns-rke2-coredns-autoscaler-768bfc5985-d8v2z   1/1     Running            0                3d11h
rke2-ingress-nginx-controller-4n464                     1/1     Running            0                3d10h
rke2-ingress-nginx-controller-g7gcm                     1/1     Running            0                3d10h
rke2-ingress-nginx-controller-jd4q4                     1/1     Running            0                3d10h
rke2-ingress-nginx-controller-k7swd                     1/1     Running            0                3d10h
rke2-ingress-nginx-controller-q4c7x                     1/1     Running            0                3d10h
rke2-metrics-server-74f878b999-2ckxt                    1/1     Running            0                3d11h

Logs:

kubectl logs helm-install-rke2-canal-jqr6t -n kube-system
if [[ ${KUBERNETES_SERVICE_HOST} =~ .*:.* ]]; then
    echo "KUBERNETES_SERVICE_HOST is using IPv6"
    CHART="${CHART//%\{KUBERNETES_API\}%/[${KUBERNETES_SERVICE_HOST}]:${KUBERNETES_SERVICE_PORT}}"
else
    CHART="${CHART//%\{KUBERNETES_API\}%/${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}}"
fi

set +v -x

+ [[ true != \t\r\u\e ]]
+ [[ '' == \1 ]]
+ [[ '' == \v\2 ]]
+ [[ -f /config/ca-file.pem ]]
+ [[ -n '' ]]
+ shopt -s nullglob
+ helm_content_decode
+ set -e
+ ENC_CHART_PATH=/chart/rke2-canal.tgz.base64
+ CHART_PATH=/tmp/rke2-canal.tgz
+ [[ ! -f /chart/rke2-canal.tgz.base64 ]]
+ base64 -d /chart/rke2-canal.tgz.base64
+ CHART=/tmp/rke2-canal.tgz
+ set +e
+ [[ install != \d\e\l\e\t\e ]]
+ helm_repo_init
+ grep -q -e 'https?://'
+ [[ helm_v3 == \h\e\l\m_\v\3 ]]
+ [[ /tmp/rke2-canal.tgz == stable/* ]]
+ [[ -n '' ]]
+ helm_update install --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16
+ [[ helm_v3 == \h\e\l\m_\v\3 ]]
++ helm_v3 ls --all -f '^rke2-canal$' --namespace kube-system --output json
++ jq -r '"\(.[0].app_version),\(.[0].status)"'
++ tr '[:upper:]' '[:lower:]'
+ LINE=v3.24.5,deployed
+ IFS=,
+ read -r INSTALLED_VERSION STATUS _
+ VALUES=
+ [[ install = \d\e\l\e\t\e ]]
+ [[ v3.24.5 =~ ^(|null)$ ]]
+ [[ deployed =~ ^(pending-install|pending-upgrade|pending-rollback)$ ]]
+ [[ deployed == \d\e\p\l\o\y\e\d ]]
+ echo 'Already installed rke2-canal'
Already installed rke2-canal
+ [[ helm_v3 == \h\e\l\m_\v\3 ]]
+ helm_v3 mapkubeapis rke2-canal --namespace kube-system
2023/04/03 02:21:19 Release 'rke2-canal' will be checked for deprecated or removed Kubernetes APIs and will be updated if necessary to supported API versions.
2023/04/03 02:21:19 Get release 'rke2-canal' latest version.
2023/04/03 02:21:19 Check release 'rke2-canal' for deprecated or removed APIs...
2023/04/03 02:21:19 Finished checking release 'rke2-canal' for deprecated or removed APIs.
2023/04/03 02:21:19 Release 'rke2-canal' has no deprecated or removed APIs.
2023/04/03 02:21:19 Map of release 'rke2-canal' deprecated or removed APIs to supported versions, completed successfully.
+ echo 'Upgrading helm_v3 chart'
+ echo 'Upgrading rke2-canal'
+ shift 1
Upgrading rke2-canal
+ helm_v3 upgrade --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16 rke2-canal /tmp/rke2-canal.tgz
Error: UPGRADE FAILED: chart requires kubeVersion: >= v1.24.11 which is incompatible with Kubernetes v1.24.10+rke2r1
+ exit

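"Error: UPGRADE FAILED: chart requires kubeVersion: >= v1.24.11 which is incompatible with Kubernetes v1.24.10+rke2r1"
Does this mean the cluster has to be upgraded? If I don't upgrade, what would be the impact of deleting these two Jobs? Could someone who knows please advise?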

Your master nodes aren't even running the same version. That's wild.

Got it. The cause is that the pods were scheduled onto the node running the lower version.
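
For anyone hitting the same message: the failure is a version constraint check. The chart declares kubeVersion >= v1.24.11, while the Kubernetes version reported in the error was v1.24.10+rke2r1. Here is a minimal sketch of that comparison in shell, assuming GNU `sort -V` is available. `version_ge` is a hypothetical helper, and Helm's own semver constraint handling is more involved than this.

```shell
# Hypothetical helper: succeed when $1 (cluster version) >= $2 (chart minimum).
# Build metadata such as "+rke2r1" and a leading "v" are stripped first.
version_ge() {
  local have="${1%%+*}" want="${2%%+*}"
  have="${have#v}"
  want="${want#v}"
  # sort -V orders version strings; if the minimum sorts first (or is equal),
  # the cluster version satisfies the constraint.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n1)" = "$want" ]
}

# The two versions from the error message in this thread.
if version_ge "v1.24.10+rke2r1" "v1.24.11"; then
  echo "compatible"
else
  echo "incompatible"
fi
```

With the versions from the error this takes the "incompatible" branch. Passing "v1.24.11+rke2r1" as the cluster version satisfies the check.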

What problems can this kind of version difference cause?