After adding new agent nodes to a K3s cluster, web services (80/443) cannot be reached through any agent node

Problem description

I'm a university student, majoring in CS but not in DevOps; k3s is a personal hobby. I set up this personal k3s cluster on 2023-01-23 and it has been running stably for 469 days. However, every time new nodes join the cluster, every agent loses access to its own http://127.0.0.1, while the server is unaffected. In previous cases the problem went away after some operation I can no longer pinpoint; this time, after two new nodes joined, the problem appeared again, and restarting klipper and k3s-agent did not help.

Right now the CDN origin points entirely at the server's IP, so the agents failing to serve web traffic has no business impact, but I would still like to get to the bottom of it, hence this post.

Cluster configuration

The cluster has 1 server and 5 agents, 4 of which are currently in use; another 3 agents were removed from the cluster earlier.

  • k3s-sh1-new (server)
  • k3s-sh-tx-2
  • k3s-sh-tx-3
  • k3s-ty-nasvm-1 (currently NotReady, SchedulingDisabled)
  • k3s-sh-ali-3 (the two newly joined nodes)
  • k3s-sh-ali-4

The cluster spans several cities, so OpenVPN (ovpn) was chosen as the underlay network. Mirroring the cluster roles, the k3s server node is also the ovpn server, and the agent nodes are ovpn clients. The ovpn subnet is 10.150.0.0/16, and every machine uses the tun0 interface it creates.

I have confirmed that the underlay network itself is fine (for example, every machine can reach every other machine on port 10250 and other ports).
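The checks themselves were nothing fancy, roughly the following (the addresses are just my own node IPs from the install commands below; substitute yours):

# the ovpn tunnel interface should carry a 10.150.0.0/16 address on every machine
ip addr show tun0

# every node should be able to reach every other node's kubelet port over the tunnel
nc -zv 10.150.0.1 10250
nc -zv 10.150.0.45 10250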

Worth mentioning: after each new node joins, to make sure the network keeps working I manually restart the ovpn service on all machines, which causes roughly a one-second interruption. After the restart I verify again that all machines can still reach one another.

Except for k3s-ty-nasvm-1, which is not running, every machine has correct node-name, node-ip and node-external-ip settings, and all of those IPs are reachable.

On the networking side, the only change is that traefik is disabled and replaced with ingress-nginx; flannel, klipper and coredns are untouched. IPv4 single stack. The flannel backend is the default vxlan, and klipper runs in its default nft mode.
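For completeness, both defaults can be confirmed on any node; this is just the generic way to check, nothing specific to my setup (klipper's nft mode also shows up in its container log further down):

# the flannel.1 interface only exists when the vxlan backend is active
ip -d link show flannel.1

# the subnet file flannel writes at start-up (also visible in the agent journal below)
cat /run/flannel/subnet.env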

The server's datastore is MySQL.

As for provisioning, every machine in the cluster runs Ubuntu 22.04 and was set up with exactly the same script; only the k3s startup parameters differ. The startup parameters for each machine are listed below.

Attached is a screenshot of k get no -o wide:

Environment information

K3s version

k3s version v1.27.9+k3s1 (2c249a39)
go version go1.20.12

Confirmed that this output is identical on every node.

Node CPU architecture, OS and version

Linux (node name) 5.15.0-58-generic #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

All nodes run Ubuntu 22.04 with kernel 5.15.0; only the minor version numbers differ.

Steps to reproduce

Commands used to install K3s

server:

curl -sfL https://rancher-mirror.rancher.cn/k3s/k3s-install.sh | INSTALL_K3S_VERSION=v1.27.9+k3s1 INSTALL_K3S_MIRROR=cn sh -s - server --token=(token) --datastore-endpoint="mysql://root:(MySQL password)@tcp(10.150.0.1:3306)/" --node-name k3s-sh1-new --node-external-ip (public IP) --advertise-address 10.150.0.1 --node-ip 10.150.0.1 --flannel-iface tun0 --disable traefik

agent (using k3s-sh-tx-3 as an example):

curl -sfL https://rancher-mirror.rancher.cn/k3s/k3s-install.sh | INSTALL_K3S_VERSION=v1.27.9+k3s1 INSTALL_K3S_MIRROR=cn sh -s - agent --server https://10.150.0.1:6443 --token=(token) --node-name k3s-sh-tx-3 --node-external-ip (public IP) --node-ip 10.150.0.45 --flannel-iface tun0

Expected result

On any agent node, we expect:

# curl http://127.0.0.1
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

Actual result

Only the server node produces the output above; every agent node times out.

Troubleshooting steps

Based on the following observations:

  1. On ports other than 80/443, all nodes can reach one another
  2. The server's web service has always worked fine
  3. The agents' web services also worked fine before the new nodes joined

I ruled out problems at the ovpn layer and everything in the chain from ingress-nginx onward, leaving klipper, flannel, or k3s itself as suspects.

Output of k describe service ingress-nginx-controller --namespace=ingress-nginx:

Name:                     ingress-nginx-controller
Namespace:                ingress-nginx
Labels:                   app=ingress-nginx
                          app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=ingress-nginx
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=ingress-nginx
                          app.kubernetes.io/part-of=ingress-nginx
                          app.kubernetes.io/version=1.7.0
                          helm.sh/chart=ingress-nginx-4.6.0
Annotations:              meta.helm.sh/release-name: ingress-nginx
                          meta.helm.sh/release-namespace: ingress-nginx
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.43.102.77
IPs:                      10.43.102.77
LoadBalancer Ingress:     (public IPs of all machines)
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  32696/TCP
Endpoints:                10.42.0.58:80
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  30240/TCP
Endpoints:                10.42.0.58:443
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

Accessing 127.0.0.1:32696 via the NodePort gives the same result: only the server returns the 404.


Restarting klipper and restarting k3s-agent on the agent nodes did not solve the problem. Restarting the k3s server might fix it, but for now I'd like to find a solution that doesn't require restarting the server.

I've looked through iptables -L without getting anywhere; the only thing I noticed is that the server's iptables output is nearly twice as long as an agent's, and I don't know whether that is expected or a sign that something is off.
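By "length" I just mean the rule count; the rough comparison I ran on each node was:

# dump and count the rules on a node; the server had nearly twice as many lines as an agent
iptables-save | wc -l
iptables-save -t nat | wc -l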

The cluster also runs Rancher, but I poked around and didn't see anything that looked suspicious there.

Since the problem occurs before traffic ever reaches ingress-nginx, the usual in-container network debugging tricks don't really apply, and I'm out of ideas for how to investigate further. Any suggestions would be much appreciated, and thanks in advance. If any logs are missing, I'm happy to add them.


Logs

Full container log from restarting klipper on k3s-sh-ali-3 (an agent):

+ trap exit TERM INT
+ BIN_DIR=/sbin
+ check_iptables_mode
+ set +e
+ lsmod
+ grep nf_tables
+ '[' 0 '=' 0 ]
+ mode=nft
+ set -e
+ info 'nft mode detected'
+ echo '[INFO] ' 'nft mode detected'
+ set_nft
+ ln -sf /sbin/xtables-nft-multi /sbin/iptables
nf_tables             266240 858 nft_limit,nft_chain_nat,nft_compat,nft_counter
nfnetlink              20480  5 nfnetlink_log,nf_conntrack_netlink,ip_set,nft_compat,nf_tables
libcrc32c              16384  5 nf_nat,nf_conntrack,nf_tables,btrfs,raid456
[INFO]  nft mode detected
+ ln -sf /sbin/xtables-nft-multi /sbin/iptables-save
+ ln -sf /sbin/xtables-nft-multi /sbin/iptables-restore
+ ln -sf /sbin/xtables-nft-multi /sbin/ip6tables
+ start_proxy
+ echo 0.0.0.0/0
+ grep -Eq :
+ iptables -t filter -I FORWARD -s 0.0.0.0/0 -p TCP --dport 80 -j ACCEPT
+ echo 10.43.102.77
+ grep -Eq :
+ cat /proc/sys/net/ipv4/ip_forward
+ '[' 1 '==' 1 ]
+ iptables -t filter -A FORWARD -d 10.43.102.77/32 -p TCP --dport 80 -j DROP
+ iptables -t nat -I PREROUTING -p TCP --dport 80 -j DNAT --to 10.43.102.77:80
+ iptables -t nat -I POSTROUTING -d 10.43.102.77/32 -p TCP -j MASQUERADE
+ '[' '!' -e /pause ]
+ mkfifo /pause

Full journal from restarting k3s-agent:

May 07 04:28:24 sh-ali-3 systemd[1]: Starting Lightweight Kubernetes...
░░ Subject: A start job for unit k3s-agent.service has begun execution
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit k3s-agent.service has begun execution.
░░
░░ The job identifier is 11168.
May 07 04:28:24 sh-ali-3 sh[299968]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
May 07 04:28:24 sh-ali-3 systemd[1]: k3s-agent.service: Found left-over process 287014 (containerd-shim) in control group while starting unit. Ignoring.
May 07 04:28:24 sh-ali-3 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 07 04:28:24 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:24+08:00" level=info msg="Starting k3s agent v1.27.9+k3s1 (2c249a39)"
May 07 04:28:24 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:24+08:00" level=info msg="Adding server to load balancer k3s-agent-load-balancer: 10.150.0.1:6443"
May 07 04:28:24 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:24+08:00" level=info msg="Running load balancer k3s-agent-load-balancer 127.0.0.1:6444 -> [10.150.0.1:6443] [default: 10.150.0.1:6443]"
May 07 04:28:24 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:24+08:00" level=warning msg="Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash. Use the full token from the server's node-token file to enable Cluster CA validation."
May 07 04:28:26 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:26+08:00" level=info msg="Module overlay was already loaded"
May 07 04:28:26 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:26+08:00" level=info msg="Module nf_conntrack was already loaded"
May 07 04:28:26 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:26+08:00" level=info msg="Module br_netfilter was already loaded"
May 07 04:28:26 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:26+08:00" level=info msg="Module iptable_nat was already loaded"
May 07 04:28:26 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:26+08:00" level=info msg="Module iptable_filter was already loaded"
May 07 04:28:26 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:26+08:00" level=info msg="Logging containerd to /var/lib/rancher/k3s/agent/containerd/containerd.log"
May 07 04:28:26 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:26+08:00" level=info msg="Running containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd"
May 07 04:28:27 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:27+08:00" level=info msg="containerd is now running"
May 07 04:28:27 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:27+08:00" level=info msg="Getting list of apiserver endpoints from server"
May 07 04:28:27 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:27+08:00" level=info msg="Updated load balancer k3s-agent-load-balancer default server address -> 10.150.0.1:6443"
May 07 04:28:27 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:27+08:00" level=info msg="Connecting to proxy" url="wss://10.150.0.1:6443/v1-k3s/connect"
May 07 04:28:27 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:27+08:00" level=info msg="Running kubelet --address=0.0.0.0 --allowed-unsafe-sysctls=net.ipv4.ip_forward,net.ipv6.conf.all.forwarding --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=systemd --client-ca-file=/var/lib/rancher/k3s/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --feature-gates=CloudDualStackNodeIPs=true --healthz-bind-address=127.0.0.1 --hostname-override=k3s-sh-ali-3 --kubeconfig=/var/lib/rancher/k3s/agent/kubelet.kubeconfig --node-ip=10.150.0.61 --node-labels= --pod-infra-container-image=rancher/mirrored-pause:3.6 --pod-manifest-path=/var/lib/rancher/k3s/agent/pod-manifests --read-only-port=0 --resolv-conf=/run/systemd/resolve/resolv.conf --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/k3s/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/k3s/agent/serving-kubelet.key"
May 07 04:28:27 sh-ali-3 k3s[299972]: Flag --cloud-provider has been deprecated, will be removed in 1.25 or later, in favor of removing cloud provider code from Kubelet.
May 07 04:28:27 sh-ali-3 k3s[299972]: Flag --containerd has been deprecated, This is a cadvisor flag that was mistakenly registered with the Kubelet. Due to legacy concerns, it will follow the standard CLI deprecation timeline before being removed.
May 07 04:28:27 sh-ali-3 k3s[299972]: Flag --pod-infra-container-image has been deprecated, will be removed in a future release. Image garbage collector will get sandbox image information from CRI.
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.791764  299972 server.go:198] "--pod-infra-container-image will not be pruned by the image garbage collector in kubelet and should also be set in the remote runtime"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.794560  299972 server.go:410] "Kubelet version" kubeletVersion="v1.27.9+k3s1"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.795072  299972 server.go:412] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.803072  299972 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/var/lib/rancher/k3s/agent/client-ca.crt"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.803857  299972 server.go:657] "--cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.804819  299972 container_manager_linux.go:265] "Container manager verified user specified cgroup-root exists" cgroupRoot=[]
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.805357  299972 container_manager_linux.go:270] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: KubeletOOMScoreAdj:-999 ContainerRuntime: CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:systemd KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] CPUManagerPolicy:none CPUManagerPolicyOptions:map[] TopologyManagerScope:container CPUManagerReconcilePeriod:10s ExperimentalMemoryManagerPolicy:None ExperimentalMemoryManagerReservedMemory:[] PodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms TopologyManagerPolicy:none ExperimentalTopologyManagerPolicyOptions:map[]}
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.805843  299972 topology_manager.go:136] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.806178  299972 container_manager_linux.go:301] "Creating device plugin manager"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.806505  299972 state_mem.go:36] "Initialized new in-memory state store"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.810782  299972 kubelet.go:405] "Attempting to sync node with API server"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.810809  299972 kubelet.go:298] "Adding static pod path" path="/var/lib/rancher/k3s/agent/pod-manifests"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.810843  299972 kubelet.go:309] "Adding apiserver pod source"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.810865  299972 apiserver.go:42] "Waiting for node sync before watching apiserver pods"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.812460  299972 kuberuntime_manager.go:257] "Container runtime initialized" containerRuntime="containerd" version="v1.7.11-k3s2.27" apiVersion="v1"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.813388  299972 server.go:1163] "Started kubelet"
May 07 04:28:27 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:27+08:00" level=info msg="Annotations and labels have already set on node: k3s-sh-ali-3"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.816044  299972 fs_resource_analyzer.go:67] "Starting FS ResourceAnalyzer"
May 07 04:28:27 sh-ali-3 k3s[299972]: E0507 04:28:27.818733  299972 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache" mountpoint="/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs"
May 07 04:28:27 sh-ali-3 k3s[299972]: E0507 04:28:27.819186  299972 kubelet.go:1400] "Image garbage collection failed once. Stats initialization may not have completed yet" err="invalid capacity 0 on image filesystem"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.825159  299972 server.go:162] "Starting to listen" address="0.0.0.0" port=10250
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.826733  299972 volume_manager.go:284] "Starting Kubelet Volume Manager"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.827234  299972 server.go:461] "Adding debug handlers to kubelet server"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.829076  299972 ratelimit.go:65] "Setting rate limiting for podresources endpoint" qps=100 burstTokens=10
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.830763  299972 desired_state_of_world_populator.go:145] "Desired state populator starts to run"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.834627  299972 kubelet_network_linux.go:63] "Initialized iptables rules." protocol=IPv4
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.836823  299972 kubelet_network_linux.go:63] "Initialized iptables rules." protocol=IPv6
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.836880  299972 status_manager.go:207] "Starting to sync pod status with apiserver"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.836909  299972 kubelet.go:2257] "Starting kubelet main sync loop"
May 07 04:28:27 sh-ali-3 k3s[299972]: E0507 04:28:27.836999  299972 kubelet.go:2281] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
May 07 04:28:27 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:27+08:00" level=info msg="Starting flannel with backend vxlan"
May 07 04:28:27 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:27+08:00" level=info msg="Flannel found PodCIDR assigned for node k3s-sh-ali-3"
May 07 04:28:27 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:27+08:00" level=info msg="The interface tun0 with ipv4 address 10.150.0.61 will be used by flannel"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.899150  299972 kube.go:145] Waiting 10m0s for node controller to sync
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.901348  299972 kube.go:489] Starting kube subnet manager
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.910909  299972 cpu_manager.go:214] "Starting CPU manager" policy="none"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.911321  299972 cpu_manager.go:215] "Reconciling" reconcilePeriod="10s"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.911712  299972 state_mem.go:36] "Initialized new in-memory state store"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.912293  299972 state_mem.go:88] "Updated default CPUSet" cpuSet=""
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.912791  299972 state_mem.go:96] "Updated CPUSet assignments" assignments=map[]
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.913061  299972 policy_none.go:49] "None policy: Start"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.915077  299972 memory_manager.go:169] "Starting memorymanager" policy="None"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.915124  299972 state_mem.go:35] "Initializing new in-memory state store"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.915438  299972 state_mem.go:75] "Updated machine memory state"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.920669  299972 manager.go:471] "Failed to read data from checkpoint" checkpoint="kubelet_internal_checkpoint" err="checkpoint is not found"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.923338  299972 plugin_manager.go:118] "Starting Kubelet Plugin Manager"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.927915  299972 kubelet_node_status.go:70] "Attempting to register node" node="k3s-sh-ali-3"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.944304  299972 kube.go:510] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.42.1.0/24]
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.944383  299972 kube.go:510] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.42.2.0/24]
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.944396  299972 kube.go:510] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.42.7.0/24]
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.944424  299972 kube.go:510] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.42.4.0/24]
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.944434  299972 kube.go:510] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.42.0.0/24]
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.944442  299972 kube.go:510] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.42.3.0/24]
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.954202  299972 kubelet_node_status.go:108] "Node was previously registered" node="k3s-sh-ali-3"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.954368  299972 kubelet_node_status.go:73] "Successfully registered node" node="k3s-sh-ali-3"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.972785  299972 kuberuntime_manager.go:1460] "Updating runtime config through cri with podcidr" CIDR="10.42.1.0/24"
May 07 04:28:27 sh-ali-3 k3s[299972]: I0507 04:28:27.973758  299972 kubelet_network.go:61] "Updating Pod CIDR" originalPodCIDR="" newPodCIDR="10.42.1.0/24"
May 07 04:28:28 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:28+08:00" level=info msg="Starting the netpol controller version v2.0.0-20230925161250-364f994b140b, built on 2023-12-27T15:00:35Z, go1.20.12"
May 07 04:28:28 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:28+08:00" level=info msg="k3s agent is up and running"
May 07 04:28:28 sh-ali-3 systemd[1]: Started Lightweight Kubernetes.
░░ Subject: A start job for unit k3s-agent.service has finished successfully
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit k3s-agent.service has finished successfully.
░░
░░ The job identifier is 11168.
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.018872  299972 network_policy_controller.go:164] Starting network policy controller
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.056334  299972 network_policy_controller.go:176] Starting network policy controller full sync goroutine
May 07 04:28:28 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:28+08:00" level=info msg="Running kube-proxy --cluster-cidr=10.42.0.0/16 --conntrack-max-per-core=0 --conntrack-tcp-timeout-close-wait=0s --conntrack-tcp-timeout-established=0s --healthz-bind-address=127.0.0.1 --hostname-override=k3s-sh-ali-3 --kubeconfig=/var/lib/rancher/k3s/agent/kubeproxy.kubeconfig --proxy-mode=iptables"
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.338067  299972 server.go:226] "Warning, all flags other than --config, --write-config-to, and --cleanup are deprecated, please begin using a config file ASAP"
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.367849  299972 node.go:141] Successfully retrieved node IP: 10.150.0.61
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.367909  299972 server_others.go:110] "Detected node IP" address="10.150.0.61"
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.374106  299972 server_others.go:192] "Using iptables Proxier"
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.374142  299972 server_others.go:199] "kube-proxy running in dual-stack mode" ipFamily=IPv4
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.374154  299972 server_others.go:200] "Creating dualStackProxier for iptables"
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.374187  299972 server_others.go:484] "Detect-local-mode set to ClusterCIDR, but no IPv6 cluster CIDR defined, defaulting to no-op detect-local for IPv6"
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.374220  299972 proxier.go:253] "Setting route_localnet=1 to allow node-ports on localhost; to change this either disable iptables.localhostNodePorts (--iptables-localhost-nodeports) or set nodePortAddresses (--nodeport-addresses) to filter loopback addresses"
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.375159  299972 server.go:658] "Version info" version="v1.27.9+k3s1"
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.375211  299972 server.go:660] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.378127  299972 config.go:188] "Starting service config controller"
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.378173  299972 shared_informer.go:311] Waiting for caches to sync for service config
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.378208  299972 config.go:97] "Starting endpoint slice config controller"
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.378215  299972 shared_informer.go:311] Waiting for caches to sync for endpoint slice config
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.379184  299972 config.go:315] "Starting node config controller"
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.379225  299972 shared_informer.go:311] Waiting for caches to sync for node config
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.478791  299972 shared_informer.go:318] Caches are synced for service config
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.478791  299972 shared_informer.go:318] Caches are synced for endpoint slice config
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.479284  299972 shared_informer.go:318] Caches are synced for node config
May 07 04:28:28 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:28+08:00" level=info msg="Tunnel authorizer set Kubelet Port 10250"
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.812187  299972 apiserver.go:52] "Watching apiserver"
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.824903  299972 topology_manager.go:212] "Topology Admit Handler" podUID=8323c327-8d8e-4435-8254-f7357cd7fd49 podNamespace="kube-system" podName="svclb-ingress-nginx-controller-1146ef43-7k9lq"
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.831094  299972 desired_state_of_world_populator.go:153] "Finished populating initial desired state of world"
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.835205  299972 reconciler.go:41] "Reconciler: start to sync state"
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.899775  299972 kube.go:152] Node controller sync successful
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.899896  299972 vxlan.go:141] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
May 07 04:28:28 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:28+08:00" level=info msg="Wrote flannel subnet file to /run/flannel/subnet.env"
May 07 04:28:28 sh-ali-3 k3s[299972]: time="2024-05-07T04:28:28+08:00" level=info msg="Running flannel backend."
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.945101  299972 iptables.go:290] generated 3 rules
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.945502  299972 vxlan_network.go:65] watching for new subnet leases
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.945947  299972 subnet.go:159] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xa2a0200, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xa960041, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x31, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x31, 0x61, 0x3a, 0x36, 0x31, 0x3a, 0x66, 0x38, 0x3a, 0x31, 0x39, 0x3a, 0x33, 0x65, 0x3a, 0x62, 0x36, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.946658  299972 subnet.go:159] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xa2a0700, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xa960029, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x31, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x61, 0x61, 0x3a, 0x30, 0x32, 0x3a, 0x63, 0x35, 0x3a, 0x63, 0x36, 0x3a, 0x64, 0x31, 0x3a, 0x38, 0x38, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.946733  299972 subnet.go:159] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xa2a0400, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xa96002d, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x31, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x61, 0x61, 0x3a, 0x30, 0x31, 0x3a, 0x39, 0x34, 0x3a, 0x32, 0x31, 0x3a, 0x63, 0x30, 0x3a, 0x33, 0x35, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.947333  299972 subnet.go:159] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xa2a0000, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xa960001, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x31, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x65, 0x61, 0x3a, 0x65, 0x33, 0x3a, 0x31, 0x32, 0x3a, 0x38, 0x62, 0x3a, 0x66, 0x39, 0x3a, 0x37, 0x32, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.947762  299972 subnet.go:159] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xa2a0300, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xa960035, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x31, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x65, 0x61, 0x3a, 0x36, 0x61, 0x3a, 0x39, 0x31, 0x3a, 0x64, 0x31, 0x3a, 0x64, 0x30, 0x3a, 0x34, 0x35, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.949349  299972 iptables.go:290] generated 7 rules
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.972555  299972 iptables.go:283] bootstrap done
May 07 04:28:28 sh-ali-3 k3s[299972]: I0507 04:28:28.989370  299972 iptables.go:283] bootstrap done

Today I tested the various endpoints of the LB Service; in every case the server could reach them and no agent could.

The cluster has only one Service of type LoadBalancer: ingress-nginx/ingress-nginx-controller.

The URLs tested were:

  • ClusterIP: http://10.43.102.77:80
  • Endpoint: http://10.42.0.58:80
  • NodePort: http://127.0.0.1:32696

curl against all three URLs works fine on the server:

# curl http://127.0.0.1:32696
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

But on the agents all of them time out.

Having run out of viable ideas, today I decided to just restart the server. After systemctl restart k3s, the problem was gone.

Not satisfying... but there was no other option. It looks like every time a new node joins I'll have to do this kind of restart. Fortunately such a restart only affects flannel and not containerd, so services only blip briefly rather than going down.

Today, while rolling out a service update, I found the problem is back... even though I haven't touched the cluster at all in the past few months.

So the problem statement has now become: "after some period of time, or after node changes in the cluster, web services cannot be reached through any agent node".

Exhausting... I'll dig into it properly tomorrow.


Judging from the output above, the ingress-nginx in use is v1.7.0? That version doesn't support Kubernetes v1.27 yet, I believe.
According to the support matrix, v1.27 is only supported from ingress-nginx v1.7.1 onward.
How about trying to upgrade ingress-nginx? I'm running K3s v1.27.9 with ingress-nginx v1.11.2 without any issues.
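If it was installed via Helm, the upgrade would be something along these lines (release name, namespace and chart version here are assumptions based on the describe output above, so double-check them against your own values first):

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
# chart 4.11.x ships controller v1.11.x
helm upgrade ingress-nginx ingress-nginx/ingress-nginx --namespace ingress-nginx --version 4.11.2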


Thanks for the reply (someone finally replied after four months >_<)

The ingress-nginx version is indeed quite old (it has never been updated since installation). I had ruled ingress-nginx out earlier because the server node could reach 80/443 just fine; in hindsight that was careless.

I'll give it a try later; thanks in advance!

Did upgrading the ingress-nginx version end up solving this problem?

Somehow another three months have passed... I finally carved out a bit of time at two in the morning to reply.

I just spent three or four hours upgrading ingress-nginx to the latest v1.11.3 (the ingress-nginx Helm chart is a mess; many options documented simply don't take effect), then went through another round of checks, and found the following:


1. It's not just 127.0.0.1 on the agents that is unreachable; the Pods running on the agents are themselves unreachable

In other words, Pods running on the agents cannot be reached from inside the cluster at all; put differently, any workload scheduled onto an agent has been unreachable this whole time. As a direct consequence, several of my domains (referred to as target-domain.com below) haven't actually been working for the past six months; luckily it seems nobody was using those services anyway (sweat).
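A quick way to confirm this from inside the cluster, rather than via 127.0.0.1 on a host, is to curl a Pod scheduled on an agent from a throwaway Pod (the target IP here is one of the upstream Pod IPs from the log below; the image is only an example):

kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- curl -m 5 http://10.42.4.5:80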

Opening https://target-domain.com directly in a browser, after waiting about a minute the request times out and Nginx returns 504 Gateway Timeout, while the following appears in the ingress-nginx log:

2024/11/09 17:56:25 [error] 26#26: *1486 upstream timed out (110: Operation timed out) while connecting to upstream, client: 10.42.0.106, server: target-domain.com, request: "GET /favicon.ico HTTP/2.0", upstream: "http://10.42.4.5:80/favicon.ico", host: "target-domain.com", referrer: "https://target-domain.com/"
2024/11/09 17:56:30 [error] 26#26: *1486 upstream timed out (110: Operation timed out) while connecting to upstream, client: 10.42.0.106, server: target-domain.com, request: "GET /favicon.ico HTTP/2.0", upstream: "http://10.42.7.70:80/favicon.ico", host: "target-domain.com", referrer: "https://target-domain.com/"
2024/11/09 17:56:35 [error] 26#26: *1486 upstream timed out (110: Operation timed out) while connecting to upstream, client: 10.42.0.106, server: target-domain.com, request: "GET /favicon.ico HTTP/2.0", upstream: "http://10.42.4.5:80/favicon.ico", host: "target-domain.com", referrer: "https://target-domain.com/"
10.42.0.106 - - [09/Nov/2024:17:56:35 +0000] "GET /favicon.ico HTTP/2.0" 504 562 "https://target-domain.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0" 61 15.002 [sh1-legacy-maven-80] [] 10.42.4.5:80, 10.42.7.70:80, 10.42.4.5:80 0, 0, 0 5.001, 5.000, 5.001 504, 504, 504 3b52ec2137de7f8d41d5f2da12042775

You can see that ingress-nginx timed out reaching both Pod IPs behind this service (10.42.4.5:80 and 10.42.7.70:80), with the error upstream timed out (110: Operation timed out). This seems to indicate that:

  1. ingress-nginx can probably be ruled out
  2. Pod-to-Pod traffic inside the cluster is broken

So suspicion falls back on klipper, flannel, or k3s itself.


2. Not all agents have this problem

A quick recap of the cluster:

  • k3s-sh1-new (server)
  • k3s-sh-tx-2
  • k3s-sh-tx-3
  • k3s-sh-ali-3 (the two newly joined nodes)
  • k3s-sh-ali-4

Going through everything this round, it turns out that the newly joined k3s-sh-ali-3 and k3s-sh-ali-4 surprisingly do not have this problem at all. In other words, only k3s-sh-tx-2 and k3s-sh-tx-3 are affected.

Which is really odd, because all agents were provisioned with exactly the same script...

That narrows the variables I can isolate down to the following:

a. I slipped while copy-pasting and the agents' provisioning scripts aren't actually identical

This would be the most plausible explanation, but the provisioning scripts really are identical and the k3s install is a single command, so the odds of a mistake here are low

b. The old agents (570 days) have the problem, the new agents (192 days) do not

Would network connectivity really correlate with how many days an agent has been running...

c. A specific OS/kernel version triggers the problem

But every machine, server included, runs the same major versions (22.04 + 5.15); differing point releases seem unlikely to cause this...

d. The two Tencent Cloud machines are broken while the two Alibaba Cloud machines are fine

Could the root cause really be related to Tencent Cloud? Hard to imagine...


So all in all, differences in the node environments seem unlikely to be the cause, and the problem is more likely still in klipper, flannel, or k3s...


3. Restarting the server's k3s service with systemctl restart k3s no longer fixes the problem

Since some services were already unreachable from the public internet, I went ahead and ran systemctl restart k3s, hoping for temporary relief. But the problem persisted after the k3s server restarted... which means I currently have no workaround at all, other than scheduling the Pods I need onto the two new Alibaba Cloud nodes.


This is starting to feel more and more like voodoo... rough...


On my problematic node I ran iptables -F && iptables -X && iptables -Z && iptables -F -t nat && iptables -X -t nat && iptables -Z -t nat && docker restart kube-proxy, then restarted ingress-nginx, and the problem was solved. I don't know what caused it, but it hasn't come back since.

Are you using Tencent Cloud lightweight application servers?
If so, it's most likely an iptables problem. Tencent's lightweight servers install a large number of iptables rules; Alibaba's lightweight servers don't do that.

Problem solved.

TL;DR

The problem appears after upgrading k3s (or rather, the Flannel CNI that ships with it). When you hit it, either one of the following two operations fixes it permanently:

  1. iptables -t nat -D CNI-HOSTPORT-DNAT 1
  2. Delete the ingress pod on the broken node (for my cluster that means svclb-ingress-nginx-controller; see the sketch below)
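For option 2 it is just an ordinary pod deletion; the svclb DaemonSet recreates the pod right away (the pod name below is the one from the agent journal earlier and will differ per node):

# find the svclb pod on the broken node, then delete it
kubectl -n kube-system get pods -o wide | grep svclb-ingress-nginx-controller
kubectl -n kube-system delete pod svclb-ingress-nginx-controller-1146ef43-7k9lq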

Related issue

(Yes, apparently nobody has filed this as a k3s issue, but there is one for RKE)


Troubleshooting process

Since the previous round of troubleshooting had hit a dead end, this time I started straight from the suggestions of the two experts above:

Both pointed to iptables as the root cause. @jamper had run into this problem before and solved it by flushing iptables; on machines that exist solely to run Kubernetes this is safe to do, and all of my agent nodes fall into that category. So we can run his command with a small modification:

iptables -F && iptables -X && iptables -Z && iptables -F -t nat && iptables -X -t nat && iptables -Z -t nat && systemctl restart k3s-agent

The reason for going straight to systemctl restart k3s-agent is that in k3s, kube-proxy runs inside the k3s process itself, so restarting all of k3s covers it.
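(This is easy to verify: there is no standalone kube-proxy process on a k3s node, and the kube-proxy flags only appear inside the k3s-agent journal, as in the log above:)

pgrep -a kube-proxy || echo "no standalone kube-proxy process"
journalctl -u k3s-agent | grep "Running kube-proxy"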

Before running it, just to be safe, I dumped the full pre-flush iptables state:

iptables -L --line-numbers > filter.txt
iptables -L -tnat --line-numbers > nat.txt

After running the command, the problem was solved.

At this point, dump iptables once more:

iptables -L --line-numbers > filter-new.txt
iptables -L -tnat --line-numbers > nat-new.txt

Then drop the files into VSCode and diff them:
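(A plain diff works just as well if no editor is handy:)

diff filter.txt filter-new.txt
diff nat.txt nat-new.txt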

Now it's obvious at a glance. With a bit of inference, we can conclude:

During some k3s upgrade, k3s did not remove the old version's CNI-HOSTPORT-DNAT rules, which caused the problem described above.

Feeding the keyword CNI-HOSTPORT-DNAT into a search engine quickly turns up the related issue (it's an RKE issue, but both are SUSE products, so close enough!), along with the standard fix.

CNI-HOSTPORT-DNAT is a chain used by an older version of the CNI port-mapping plugin. After upgrading to a newer k3s these rules are obsolete, but k3s does not remove them on its own, and they end up blocking connections to the ingress pod. Deleting the rule by hand fixes it. I haven't tried whether restarting the ingress pod also works, but according to the issue it does.
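If you want to look before deleting, the stale chain can be listed first; rule 1 is the one the command in the TL;DR removes:

# inspect the chain left behind by the old CNI portmap plugin
iptables -t nat -L CNI-HOSTPORT-DNAT --line-numbers
# then delete the offending rule by its number
iptables -t nat -D CNI-HOSTPORT-DNAT 1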

Final tally

  • From the original post to a complete fix: 213 days

  • This particular problem was not caused by Tencent Cloud, but iptables really did contain some useless rules written by Tencent Cloud, and flushing iptables this time cleared those out as well

  • The problem was not voodoo after all. I'm glad there is a clear, definitive answer in the end... if the bug had appeared for no discernible reason and gone away just as mysteriously, that truly would have been a waste of life. Solving it took the better part of a year, but I picked up a lot of knowledge along the way (especially how modern kube-proxy and CNI plugins use iptables), so it was worth it. No part of life was wasted!


Finally, thanks to everyone who replied above, @jacie @jamper @lchuanqi, and to the other forum members who kept an eye on this little problem. Special thanks to @jamper and @lchuanqi for pointing at iptables as the root cause. Thank you all!
