CoreDNS keeps restarting in a cluster deployed with RKE2 + Cilium under Rancher 2.6.6

Environment information:
RKE2 version: v1.23.8+rke2r1

Node CPU architecture, OS, and version: Linux k8s-master01 5.15.0-46-generic #49-Ubuntu SMP Thu Aug 4 18:03:25 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Cluster configuration: RKE2 + Cilium with the default configuration, no changes; currently only a single node is deployed.

Problem description: With Rancher 2.6.6, CoreDNS keeps restarting in a cluster deployed with RKE2 + Cilium.

Steps to reproduce:

  • Command used to install RKE2: Rancher 2.6.6 default configuration

Expected result:

Actual result:

Logs

CoreDNS logs:
[ERROR] plugin/errors: 2 6296581614096806887.6541633590852796495. HINFO: read udp 10.42.0.136:37265->192.168.123.231:53: i/o timeout
[ERROR] plugin/errors: 2 6296581614096806887.6541633590852796495. HINFO: read udp 10.42.0.136:35435->192.168.123.232:53: i/o timeout
[ERROR] plugin/errors: 2 6296581614096806887.6541633590852796495. HINFO: read udp 10.42.0.136:36955->192.168.123.232:53: i/o timeout
[ERROR] plugin/errors: 2 6296581614096806887.6541633590852796495. HINFO: read udp 10.42.0.136:35306->192.168.123.232:53: i/o timeout
[ERROR] plugin/errors: 2 6296581614096806887.6541633590852796495. HINFO: read udp 10.42.0.136:38333->192.168.123.231:53: i/o timeout
[ERROR] plugin/errors: 2 6296581614096806887.6541633590852796495. HINFO: read udp 10.42.0.136:45850->192.168.123.231:53: i/o timeout
[ERROR] plugin/errors: 2 6296581614096806887.6541633590852796495. HINFO: read udp 10.42.0.136:49269->192.168.123.232:53: i/o timeout
[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://10.43.0.1:443/version": dial tcp 10.43.0.1:443: i/o timeout
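
The errors above show that the CoreDNS pod (10.42.0.136) cannot reach either the upstream resolvers (192.168.123.231/232) or the in-cluster API service (10.43.0.1), which points at a pod-network (CNI) problem rather than at CoreDNS itself. A quick way to confirm this from inside the pod network is a throwaway debug pod; a minimal sketch (pod name and image are arbitrary, any image that ships curl and nslookup will do):

# Throwaway debug pod on the pod network (name/image are my own choice)
kubectl run nettest --rm -it --image=nicolaka/netshoot --restart=Never -- bash

# Inside the pod: is the in-cluster API service reachable at all?
curl -k --connect-timeout 5 https://10.43.0.1:443/version

# Inside the pod: does resolution work, both via CoreDNS and directly
# against one of the upstream resolvers CoreDNS forwards to?
nslookup kubernetes.default.svc.cluster.local
nslookup www.rancher.com 192.168.123.231

If these fail from the debug pod as well, the datapath is broken for all pods and CoreDNS is only crash-looping because its probes never succeed.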

coredns-autoscaler logs:
E0818 09:50:10.392747 1 autoscaler_server.go:108] Error while getting cluster status: timed out waiting for the condition
E0818 09:50:16.398360 1 reflector.go:125] pkg/mod/k8s.io/client-go@v0.0.0-20190718183610-8e956561bbf5/tools/cache/reflector.go:98: Failed to list *v1.Node: Get "https://10.43.0.1:443/api/v1/nodes": dial tcp 10.43.0.1:443: i/o timeout
E0818 09:50:20.392188 1 autoscaler_server.go:108] Error while getting cluster status: timed out waiting for the condition
E0818 09:50:30.391773 1 autoscaler_server.go:108] Error while getting cluster status: timed out waiting for the condition

kube-proxy logs:
root@k8s-master01:~# kubectl -n kube-system logs kube-proxy-k8s-master01
I0818 09:57:19.512690 1 server.go:225] "Warning, all flags other than --config, --write-config-to, and --cleanup are deprecated, please begin using a config file ASAP"
E0818 09:57:19.645567 1 proxier.go:643] "Failed to read builtin modules file, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" err="open /lib/modules/5.15.0-46-generic/modules.builtin: no such file or directory" filePath="/lib/modules/5.15.0-46-generic/modules.builtin"
I0818 09:57:20.152647 1 proxier.go:653] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs"
I0818 09:57:20.156850 1 proxier.go:653] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs_rr"
I0818 09:57:20.161322 1 proxier.go:653] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs_wrr"
I0818 09:57:20.163988 1 proxier.go:653] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs_sh"
I0818 09:57:20.166459 1 proxier.go:653] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="nf_conntrack"
time="2022-08-18T09:57:20Z" level=warning msg="Running modprobe ip_vs failed with message: ``, error: exit status 1"
E0818 09:57:20.500220 1 node.go:152] Failed to retrieve node info: Get "https://127.0.0.1:6443/api/v1/nodes/k8s-master01": dial tcp 127.0.0.1:6443: connect: connection refused
E0818 09:57:21.684808 1 node.go:152] Failed to retrieve node info: Get "https://127.0.0.1:6443/api/v1/nodes/k8s-master01": dial tcp 127.0.0.1:6443: connect: connection refused
E0818 09:57:27.493909 1 node.go:152] Failed to retrieve node info: nodes "k8s-master01" is forbidden: User "system:kube-proxy" cannot get resource "nodes" in API group "" at the cluster scope
I0818 09:57:31.934576 1 node.go:163] Successfully retrieved node IP: 192.168.123.201
I0818 09:57:31.934647 1 server_others.go:138] "Detected node IP" address="192.168.123.201"
I0818 09:57:32.798258 1 server_others.go:206] "Using iptables Proxier"
I0818 09:57:32.798343 1 server_others.go:213] "kube-proxy running in dual-stack mode" ipFamily=IPv4
I0818 09:57:32.798366 1 server_others.go:214] "Creating dualStackProxier for iptables"
I0818 09:57:32.798419 1 server_others.go:491] "Detect-local-mode set to ClusterCIDR, but no IPv6 cluster CIDR defined, , defaulting to no-op detect-local for IPv6"
I0818 09:57:32.834726 1 server.go:656] "Version info" version="v1.23.8+rke2r1"
I0818 09:57:33.187033 1 config.go:317] "Starting service config controller"
I0818 09:57:33.187123 1 shared_informer.go:240] Waiting for caches to sync for service config
I0818 09:57:33.187131 1 config.go:226] "Starting endpoint slice config controller"
I0818 09:57:33.187158 1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
I0818 09:57:33.287604 1 shared_informer.go:247] Caches are synced for service config
I0818 09:57:33.287930 1 shared_informer.go:247] Caches are synced for endpoint slice config
root@k8s-master01:~# lsmod | grep ip_vs
ip_vs_sh 16384 0
ip_vs_wrr 16384 0
ip_vs_rr 16384 0
ip_vs 176128 6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
nf_conntrack 172032 6 xt_conntrack,nf_nat,xt_nat,xt_CT,xt_MASQUERADE,ip_vs
nf_defrag_ipv6 24576 4 nf_conntrack,xt_socket,xt_TPROXY,ip_vs
libcrc32c 16384 6 nf_conntrack,nf_nat,btrfs,nf_tables,raid456,ip_vs
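
The lsmod output shows the IPVS/conntrack modules are present on the host, so the kube-proxy module warnings above can likely be ignored (as the messages themselves say), and kube-proxy ends up using the iptables proxier normally. Since Service traffic to 10.43.0.1:443 still times out from pods, the next thing I would check is the Cilium agent itself. A sketch, assuming the default rke2-cilium install where the agent runs as the "cilium" DaemonSet in kube-system with the standard k8s-app=cilium label:

# Agent pod status on each node
kubectl -n kube-system get pods -l k8s-app=cilium -o wide

# Health / datapath summary from inside the agent (DaemonSet name assumed to be "cilium")
kubectl -n kube-system exec ds/cilium -- cilium status --verbose

# Endpoint view: the coredns pod should be listed and "ready"
kubectl -n kube-system exec ds/cilium -- cilium endpoint list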

One more note: the OS is Ubuntu 22.04; I want to deploy Cilium on a 5.10+ kernel.

Rancher has not yet officially announced support for Ubuntu 22.04, so generally speaking you are very likely to run into compatibility issues.

I occasionally use 22.04 myself. When deploying the Cilium plugin there, I use this temporary workaround: kubeadm cluster not working against Ubuntu 22.04 and RHEL 9 · Issue #20125 · cilium/cilium · GitHub
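
For context, a common Cilium gotcha on newer systemd-based distributions is strict reverse-path filtering overriding Cilium's own per-interface settings, which can drop pod traffic. A minimal sketch of the usual sysctl override (file name and values are assumptions on my part; follow the linked issue and the Cilium docs for the exact fix recommended for Ubuntu 22.04):

# Relax rp_filter so Cilium-managed traffic is not dropped
# (globs/values are an assumption, not taken verbatim from the linked issue)
cat <<'EOF' > /etc/sysctl.d/99-override-cilium-rp-filter.conf
net.ipv4.conf.lxc*.rp_filter = 0
net.ipv4.conf.all.rp_filter = 0
EOF
systemctl restart systemd-sysctl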

Thanks. I have rolled back to Ubuntu 20.04 and upgraded the kernel to 5.15 to test Cilium.