CoreDNS keeps restarting in a cluster deployed with RKE2 + Cilium under Rancher 2.6.6

Environment information:
RKE2 version: v1.23.8+rke2r1

Node CPU architecture, OS, and version: Linux k8s-master01 5.15.0-46-generic #49-Ubuntu SMP Thu Aug 4 18:03:25 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Cluster configuration: RKE2 + Cilium with the default configuration, no changes; currently only a single node is deployed.

Problem description: With Rancher 2.6.6, CoreDNS keeps restarting in a cluster deployed with RKE2 + Cilium.

Steps to reproduce:

  • Command used to install RKE2: Rancher 2.6.6 default configuration

Expected result:

Actual result:

Logs

CoreDNS logs:
[ERROR] plugin/errors: 2 6296581614096806887.6541633590852796495. HINFO: read udp 10.42.0.136:37265->192.168.123.231:53: i/o timeout
[ERROR] plugin/errors: 2 6296581614096806887.6541633590852796495. HINFO: read udp 10.42.0.136:35435->192.168.123.232:53: i/o timeout
[ERROR] plugin/errors: 2 6296581614096806887.6541633590852796495. HINFO: read udp 10.42.0.136:36955->192.168.123.232:53: i/o timeout
[ERROR] plugin/errors: 2 6296581614096806887.6541633590852796495. HINFO: read udp 10.42.0.136:35306->192.168.123.232:53: i/o timeout
[ERROR] plugin/errors: 2 6296581614096806887.6541633590852796495. HINFO: read udp 10.42.0.136:38333->192.168.123.231:53: i/o timeout
[ERROR] plugin/errors: 2 6296581614096806887.6541633590852796495. HINFO: read udp 10.42.0.136:45850->192.168.123.231:53: i/o timeout
[ERROR] plugin/errors: 2 6296581614096806887.6541633590852796495. HINFO: read udp 10.42.0.136:49269->192.168.123.232:53: i/o timeout
[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://10.43.0.1:443/version": dial tcp 10.43.0.1:443: i/o timeout
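
The errors above show that the CoreDNS pod (10.42.0.136) cannot reach either the upstream resolvers (192.168.123.231/232) or the in-cluster API service (10.43.0.1), which points at a pod-network (CNI) problem rather than at CoreDNS itself. A quick way to confirm this from inside the pod network is a throwaway debug pod; a minimal sketch (pod name and image are arbitrary, any image that ships curl and nslookup will do):

# Throwaway debug pod on the pod network (name/image are my own choice)
kubectl run nettest --rm -it --image=nicolaka/netshoot --restart=Never -- bash

# Inside the pod: is the in-cluster API service reachable at all?
curl -k --connect-timeout 5 https://10.43.0.1:443/version

# Inside the pod: does resolution work, both via CoreDNS and directly
# against one of the upstream resolvers CoreDNS forwards to?
nslookup kubernetes.default.svc.cluster.local
nslookup www.rancher.com 192.168.123.231

If these fail from the debug pod as well, the datapath is broken for all pods and CoreDNS is only crash-looping because its probes never succeed.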

coredns-autoscaler logs:
E0818 09:50:10.392747 1 autoscaler_server.go:108] Error while getting cluster status: timed out waiting for the condition
E0818 09:50:16.398360 1 reflector.go:125] pkg/mod/k8s.io/client-go@v0.0.0-20190718183610-8e956561bbf5/tools/cache/reflector.go:98: Failed to list *v1.Node: Get "https://10.43.0.1:443/api/v1/nodes": dial tcp 10.43.0.1:443: i/o timeout
E0818 09:50:20.392188 1 autoscaler_server.go:108] Error while getting cluster status: timed out waiting for the condition
E0818 09:50:30.391773 1 autoscaler_server.go:108] Error while getting cluster status: timed out waiting for the condition

kube-proxy logs:
root@k8s-master01:~# kubectl -n kube-system logs kube-proxy-k8s-master01
I0818 09:57:19.512690 1 server.go:225] "Warning, all flags other than --config, --write-config-to, and --cleanup are deprecated, please begin using a config file ASAP"
E0818 09:57:19.645567 1 proxier.go:643] "Failed to read builtin modules file, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" err="open /lib/modules/5.15.0-46-generic/modules.builtin: no such file or directory" filePath="/lib/modules/5.15.0-46-generic/modules.builtin"
I0818 09:57:20.152647 1 proxier.go:653] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs"
I0818 09:57:20.156850 1 proxier.go:653] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs_rr"
I0818 09:57:20.161322 1 proxier.go:653] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs_wrr"
I0818 09:57:20.163988 1 proxier.go:653] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs_sh"
I0818 09:57:20.166459 1 proxier.go:653] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="nf_conntrack"
time="2022-08-18T09:57:20Z" level=warning msg="Running modprobe ip_vs failed with message: ``, error: exit status 1"
E0818 09:57:20.500220 1 node.go:152] Failed to retrieve node info: Get "https://127.0.0.1:6443/api/v1/nodes/k8s-master01": dial tcp 127.0.0.1:6443: connect: connection refused
E0818 09:57:21.684808 1 node.go:152] Failed to retrieve node info: Get "https://127.0.0.1:6443/api/v1/nodes/k8s-master01": dial tcp 127.0.0.1:6443: connect: connection refused
E0818 09:57:27.493909 1 node.go:152] Failed to retrieve node info: nodes "k8s-master01" is forbidden: User "system:kube-proxy" cannot get resource "nodes" in API group "" at the cluster scope
I0818 09:57:31.934576 1 node.go:163] Successfully retrieved node IP: 192.168.123.201
I0818 09:57:31.934647 1 server_others.go:138] "Detected node IP" address="192.168.123.201"
I0818 09:57:32.798258 1 server_others.go:206] "Using iptables Proxier"
I0818 09:57:32.798343 1 server_others.go:213] "kube-proxy running in dual-stack mode" ipFamily=IPv4
I0818 09:57:32.798366 1 server_others.go:214] "Creating dualStackProxier for iptables"
I0818 09:57:32.798419 1 server_others.go:491] "Detect-local-mode set to ClusterCIDR, but no IPv6 cluster CIDR defined, , defaulting to no-op detect-local for IPv6"
I0818 09:57:32.834726 1 server.go:656] "Version info" version="v1.23.8+rke2r1"
I0818 09:57:33.187033 1 config.go:317] "Starting service config controller"
I0818 09:57:33.187123 1 shared_informer.go:240] Waiting for caches to sync for service config
I0818 09:57:33.187131 1 config.go:226] "Starting endpoint slice config controller"
I0818 09:57:33.187158 1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
I0818 09:57:33.287604 1 shared_informer.go:247] Caches are synced for service config
I0818 09:57:33.287930 1 shared_informer.go:247] Caches are synced for endpoint slice config
root@k8s-master01:~# lsmod | grep ip_vs
ip_vs_sh 16384 0
ip_vs_wrr 16384 0
ip_vs_rr 16384 0
ip_vs 176128 6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
nf_conntrack 172032 6 xt_conntrack,nf_nat,xt_nat,xt_CT,xt_MASQUERADE,ip_vs
nf_defrag_ipv6 24576 4 nf_conntrack,xt_socket,xt_TPROXY,ip_vs
libcrc32c 16384 6 nf_conntrack,nf_nat,btrfs,nf_tables,raid456,ip_vs
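
The lsmod output shows the IPVS/conntrack modules are present on the host, so the kube-proxy module warnings above can likely be ignored (as the messages themselves say), and kube-proxy ends up using the iptables proxier normally. Since Service traffic to 10.43.0.1:443 still times out from pods, the next thing I would check is the Cilium agent itself. A sketch, assuming the default rke2-cilium install where the agent runs as the "cilium" DaemonSet in kube-system with the standard k8s-app=cilium label:

# Agent pod status on each node
kubectl -n kube-system get pods -l k8s-app=cilium -o wide

# Health / datapath summary from inside the agent (DaemonSet name assumed to be "cilium")
kubectl -n kube-system exec ds/cilium -- cilium status --verbose

# Endpoint view: the coredns pod should be listed and "ready"
kubectl -n kube-system exec ds/cilium -- cilium endpoint list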

One more note: the OS is Ubuntu 22.04; I want to deploy Cilium on a 5.10+ kernel.

Rancher has not yet officially announced support for Ubuntu 22.04, so generally speaking you are very likely to run into compatibility issues.

I occasionally use 22.04 myself. When deploying the Cilium plugin there, I use this temporary workaround: kubeadm cluster not working against Ubuntu 22.04 and RHEL 9 · Issue #20125 · cilium/cilium · GitHub
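
For context, a common Cilium gotcha on newer systemd-based distributions is strict reverse-path filtering overriding Cilium's own per-interface settings, which can drop pod traffic. A minimal sketch of the usual sysctl override (file name and values are assumptions on my part; follow the linked issue and the Cilium docs for the exact fix recommended for Ubuntu 22.04):

# Relax rp_filter so Cilium-managed traffic is not dropped
# (globs/values are an assumption, not taken verbatim from the linked issue)
cat <<'EOF' > /etc/sysctl.d/99-override-cilium-rp-filter.conf
net.ipv4.conf.lxc*.rp_filter = 0
net.ipv4.conf.all.rp_filter = 0
EOF
systemctl restart systemd-sysctl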

Thanks. I have rolled back to Ubuntu 20.04 and upgraded the kernel to 5.15 to test Cilium.