Environment:
K3s version: v1.21.14+k3s1
k3s version v1.21.14+k3s1 (982252d7)
go version go1.16.10
Node CPU architecture, OS, and version:
Linux test 4.19.27 #1 SMP Wed Oct 21 16:07:58 CST 2020 x86_64 x86_64 x86_64 GNU/Linux
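Cluster configuration:
1 server, 4 agents
Problem description:
I deployed the k3s agent from the binary on the test node. As soon as the k3s agent process starts, the network connection between the test node and the master node drops.
Steps to reproduce:
- Command used to install K3s:
nohup ./k3s agent --server https://192.168.10.11:6443 --token passwd --kube-proxy-arg proxy-mode=ipvs --docker &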
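Expected result:
The node becomes Ready.
Actual result:
After the k3s agent process starts, the test node loses network connectivity to the master node.
The master is unreachable over both telnet and ICMP.
Rebooting the test node restores connectivity.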
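A check that may help narrow this down, offered as a sketch rather than a diagnosis: in ipvs mode kube-proxy binds service addresses to a dummy interface named kube-ipvs0, so it is worth confirming whether the server address from the command above (192.168.10.11) ends up bound locally on the test node once the agent starts. Nothing in the logs confirms that this is the cause. The helper name ip_bound_in is hypothetical; it only parses the output of ip -o -4 addr show.

```shell
# ip_bound_in (hypothetical helper): reads `ip -o -4 addr show` output on stdin
# and returns 0 if the given IPv4 address appears as a bound address, 1 otherwise.
ip_bound_in() {
  target="$1"
  # In the one-line (-o) format, field 4 is ADDRESS/PREFIX, e.g. 10.43.0.1/32.
  awk -v ip="$target" '
    { split($4, a, "/"); if (a[1] == ip) found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Usage on the test node, from a local console after the agent has started:
#   ip -o -4 addr show dev kube-ipvs0 | ip_bound_in 192.168.10.11 \
#     && echo "server address is bound on kube-ipvs0"
```

If the address does show up there, the output of ip -o -4 addr show dev kube-ipvs0 taken before and after starting the agent would be useful to attach here.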
Additional context / logs:
Logs
E1129 18:12:01.541340 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:01.642158 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:01.742370 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:01.843371 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:01.903425 15084 kubelet.go:1870] "Skipping pod synchronization" err="container runtime status check may not have completed yet"
E1129 18:12:01.943866 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:02.044678 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:02.145608 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:02.246322 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:02.346519 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:02.447529 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:02.549245 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:02.650232 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:02.651440 15084 csi_plugin.go:295] Failed to initialize CSINode: error updating CSINode annotation: timed out waiting for the condition; caused by: nodes "rscu001001" not found
E1129 18:12:02.750353 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:02.851112 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:02.951634 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:03.052200 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:03.152796 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:03.253610 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:03.354346 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:03.455023 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:03.503658 15084 kubelet.go:1870] "Skipping pod synchronization" err="container runtime status check may not have completed yet"
E1129 18:12:03.506692 15084 node.go:161] Failed to retrieve node info: nodes "rscu001001" not found
E1129 18:12:03.556020 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:03.656934 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:03.757344 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:03.857939 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:03.958343 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:04.058857 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:04.159264 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:04.247870 15084 csi_plugin.go:295] Failed to initialize CSINode: error updating CSINode annotation: timed out waiting for the condition; caused by: nodes "rscu001001" not found
E1129 18:12:04.259894 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:04.360355 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:04.461444 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:04.562068 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:04.662552 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:04.763435 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:04.863604 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:04.964492 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:05.065435 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:05.165768 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:05.266306 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
I1129 18:12:05.290050 15084 kubelet_node_status.go:71] "Attempting to register node" node="rscu001001"
I1129 18:12:05.291476 15084 cpu_manager.go:199] "Starting CPU manager" policy="none"
I1129 18:12:05.291517 15084 cpu_manager.go:200] "Reconciling" reconcilePeriod="10s"
I1129 18:12:05.291542 15084 state_mem.go:36] "Initialized new in-memory state store"
I1129 18:12:05.292296 15084 state_mem.go:88] "Updated default CPUSet" cpuSet=""
I1129 18:12:05.292354 15084 state_mem.go:96] "Updated CPUSet assignments" assignments=map[]
I1129 18:12:05.292366 15084 policy_none.go:44] "None policy: Start"
I1129 18:12:05.297928 15084 manager.go:602] "Failed to read data from checkpoint" checkpoint="kubelet_internal_checkpoint" err="checkpoint is not found"
I1129 18:12:05.297979 15084 container_manager_linux.go:1015] "CPUAccounting not enabled for process" pid=14181
I1129 18:12:05.298025 15084 container_manager_linux.go:1018] "MemoryAccounting not enabled for process" pid=14181
I1129 18:12:05.298275 15084 container_manager_linux.go:1015] "CPUAccounting not enabled for process" pid=15084
I1129 18:12:05.298299 15084 container_manager_linux.go:1018] "MemoryAccounting not enabled for process" pid=15084
E1129 18:12:05.298300 15084 eviction_manager.go:255] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"rscu001001\" not found"
I1129 18:12:05.298523 15084 plugin_manager.go:114] "Starting Kubelet Plugin Manager"
E1129 18:12:05.367082 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:05.467593 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
E1129 18:12:05.567715 15084 kubelet.go:2291] "Error getting node" err="node \"rscu001001\" not found"
I1129 18:12:05.651705 15084 kubelet_node_status.go:74] "Successfully registered node" node="rscu001001"
I1129 18:12:05.768778 15084 kuberuntime_manager.go:1044] "Updating runtime config through cri with podcidr" CIDR="10.42.3.0/24"
I1129 18:12:05.769229 15084 docker_service.go:363] "Docker cri received runtime config" runtimeConfig="&RuntimeConfig{NetworkConfig:&NetworkConfig{PodCidr:10.42.3.0/24,},}"
I1129 18:12:05.769405 15084 kubelet_network.go:76] "Updating Pod CIDR" originalPodCIDR="" newPodCIDR="10.42.3.0/24"
time="2023-11-29T18:12:05.786084128+08:00" level=info msg="Failed to update node rscu001001: Operation cannot be fulfilled on nodes \"rscu001001\": the object has been modified; please apply your changes to the latest version and try again"
time="2023-11-29T18:12:05.804022082+08:00" level=info msg="Failed to update node rscu001001: Operation cannot be fulfilled on nodes \"rscu001001\": the object has been modified; please apply your changes to the latest version and try again"
time="2023-11-29T18:12:05.820562475+08:00" level=info msg="Failed to update node rscu001001: Operation cannot be fulfilled on nodes \"rscu001001\": the object has been modified; please apply your changes to the latest version and try again"
time="2023-11-29T18:12:05.839131616+08:00" level=info msg="Failed to update node rscu001001: Operation cannot be fulfilled on nodes \"rscu001001\": the object has been modified; please apply your changes to the latest version and try again"
time="2023-11-29T18:12:05.865678484+08:00" level=info msg="labels have been set successfully on node: rscu001001"
time="2023-11-29T18:12:05.865811602+08:00" level=info msg="Starting flannel with backend vxlan"
time="2023-11-29T18:12:05.886758437+08:00" level=info msg="Flannel found PodCIDR assigned for node rscu001001"
time="2023-11-29T18:12:05.887704459+08:00" level=info msg="The interface eth0 with ipv4 address 192.168.46.6 will be used by flannel"
I1129 18:12:05.890410 15084 kube.go:120] Waiting 10m0s for node controller to sync
I1129 18:12:05.890454 15084 kube.go:378] Starting kube subnet manager
time="2023-11-29T18:12:06.053478569+08:00" level=info msg="Starting the netpol controller"
I1129 18:12:06.053555 15084 network_policy_controller.go:151] Starting network policy controller
I1129 18:12:06.090726 15084 network_policy_controller.go:162] Starting network policy controller full sync goroutine
I1129 18:12:06.773188 15084 reconciler.go:157] "Reconciler: start to sync state"
I1129 18:12:06.891041 15084 kube.go:127] Node controller sync successful
I1129 18:12:06.891165 15084 vxlan.go:138] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
I1129 18:12:06.938079 15084 kube.go:345] Skip setting NodeNetworkUnavailable
time="2023-11-29T18:12:06.939531703+08:00" level=info msg="Wrote flannel subnet file to /run/flannel/subnet.env"
time="2023-11-29T18:12:06.939570048+08:00" level=info msg="Running flannel backend."
I1129 18:12:06.939589 15084 vxlan_network.go:61] watching for new subnet leases
I1129 18:12:06.941690 15084 iptables.go:217] Some iptables rules are missing; deleting and recreating rules
I1129 18:12:06.941731 15084 iptables.go:241] Deleting iptables rule: -s 10.42.0.0/16 -m comment --comment flanneld forward -j ACCEPT
I1129 18:12:06.942840 15084 iptables.go:241] Deleting iptables rule: -d 10.42.0.0/16 -m comment --comment flanneld forward -j ACCEPT
I1129 18:12:06.944043 15084 iptables.go:229] Adding iptables rule: -s 10.42.0.0/16 -m comment --comment flanneld forward -j ACCEPT
I1129 18:12:06.946844 15084 iptables.go:229] Adding iptables rule: -d 10.42.0.0/16 -m comment --comment flanneld forward -j ACCEPT
I1129 18:12:07.920479 15084 node.go:172] Successfully retrieved node IP: 192.168.46.6
I1129 18:12:07.920531 15084 server_others.go:141] Detected node IP 192.168.46.6
I1129 18:12:07.927300 15084 server_others.go:207] kube-proxy running in dual-stack mode, IPv4-primary
I1129 18:12:07.927361 15084 server_others.go:275] Using ipvs Proxier.
I1129 18:12:07.927399 15084 server_others.go:277] creating dualStackProxier for ipvs.
W1129 18:12:07.927435 15084 server_others.go:502] detect-local-mode set to ClusterCIDR, but no IPv6 cluster CIDR defined, , defaulting to no-op detect-local for IPv6
W1129 18:12:07.927835 15084 proxier.go:449] IPVS scheduler not specified, use rr by default
W1129 18:12:07.928129 15084 proxier.go:449] IPVS scheduler not specified, use rr by default
W1129 18:12:07.928167 15084 ipset.go:113] ipset name truncated; [KUBE-6-LOAD-BALANCER-SOURCE-CIDR] -> [KUBE-6-LOAD-BALANCER-SOURCE-CID]
W1129 18:12:07.928191 15084 ipset.go:113] ipset name truncated; [KUBE-6-NODE-PORT-LOCAL-SCTP-HASH] -> [KUBE-6-NODE-PORT-LOCAL-SCTP-HAS]
I1129 18:12:07.928465 15084 server.go:647] Version: v1.21.14+k3s1
I1129 18:12:07.929294 15084 config.go:224] Starting endpoint slice config controller
I1129 18:12:07.929320 15084 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
I1129 18:12:07.929290 15084 config.go:315] Starting service config controller
I1129 18:12:07.929365 15084 shared_informer.go:240] Waiting for caches to sync for service config
I1129 18:12:07.942028 15084 iptables.go:217] Some iptables rules are missing; deleting and recreating rules
I1129 18:12:07.942073 15084 iptables.go:241] Deleting iptables rule: -s 10.42.0.0/16 -d 10.42.0.0/16 -m comment --comment flanneld masq -j RETURN
I1129 18:12:07.943326 15084 iptables.go:241] Deleting iptables rule: -s 10.42.0.0/16 ! -d 224.0.0.0/4 -m comment --comment flanneld masq -j MASQUERADE
I1129 18:12:07.944653 15084 iptables.go:241] Deleting iptables rule: ! -s 10.42.0.0/16 -d 10.42.3.0/24 -m comment --comment flanneld masq -j RETURN
I1129 18:12:07.945645 15084 iptables.go:241] Deleting iptables rule: ! -s 10.42.0.0/16 -d 10.42.0.0/16 -m comment --comment flanneld masq -j MASQUERADE
I1129 18:12:07.946625 15084 iptables.go:229] Adding iptables rule: -s 10.42.0.0/16 -d 10.42.0.0/16 -m comment --comment flanneld masq -j RETURN
I1129 18:12:07.948562 15084 iptables.go:229] Adding iptables rule: -s 10.42.0.0/16 ! -d 224.0.0.0/4 -m comment --comment flanneld masq -j MASQUERADE
I1129 18:12:07.950649 15084 iptables.go:229] Adding iptables rule: ! -s 10.42.0.0/16 -d 10.42.3.0/24 -m comment --comment flanneld masq -j RETURN
I1129 18:12:07.952625 15084 iptables.go:229] Adding iptables rule: ! -s 10.42.0.0/16 -d 10.42.0.0/16 -m comment --comment flanneld masq -j MASQUERADE
W1129 18:12:07.998141 15084 warnings.go:70] discovery.k8s.io/v1beta1 EndpointSlice is deprecated in v1.21+, unavailable in v1.25+; use discovery.k8s.io/v1 EndpointSlice
W1129 18:12:08.020956 15084 warnings.go:70] discovery.k8s.io/v1beta1 EndpointSlice is deprecated in v1.21+, unavailable in v1.25+; use discovery.k8s.io/v1 EndpointSlice
I1129 18:12:08.030050 15084 shared_informer.go:247] Caches are synced for endpoint slice config
I1129 18:12:08.129519 15084 shared_informer.go:247] Caches are synced for service config
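Most of the agent log above is the same kubelet message repeated roughly every 100 ms until the node registers. A small filter such as the following sketch can condense klog-formatted lines by message. The function name summarize_klog and the file name agent.log are hypothetical, and lines in other formats (for example the time=... lines) are skipped.

```shell
# summarize_klog (hypothetical helper): reads log lines on stdin, keeps only
# klog-style lines (severity letter + MMDD, time, PID, file:line]), drops those
# four header fields, and prints each distinct message with its count.
summarize_klog() {
  awk '/^[IWEF][0-9][0-9][0-9][0-9] / {
         $1 = $2 = $3 = $4 = ""   # blank the header fields
         sub(/^ +/, "")           # trim the leading spaces left behind
         print
       }' | sort | uniq -c | sort -rn
}

# Usage, assuming the agent output was saved to agent.log:
#   summarize_klog < agent.log | head
```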
k3s server logs
11月 08 07:10:52 master sh[1868]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
11月 08 07:10:56 master k3s[2206]: time="2023-11-08T07:10:56.089259217+08:00" level=info msg="Starting k3s v1.21.14+k3s1 (982252d7)"
11月 08 07:10:56 master k3s[2206]: time="2023-11-08T07:10:56.164480118+08:00" level=info msg="Configuring sqlite3 database connection pooling: maxIdleConns=2,
11月 08 07:10:56 master k3s[2206]: time="2023-11-08T07:10:56.164587288+08:00" level=info msg="Configuring database table schema and indexes, this may take a m
11月 08 07:10:56 master k3s[2206]: time="2023-11-08T07:10:56.178125472+08:00" level=info msg="Database tables and indexes are up to date"
11月 08 07:10:56 master k3s[2206]: time="2023-11-08T07:10:56.461955396+08:00" level=info msg="Kine listening on unix://kine.sock"
11月 08 07:10:56 master k3s[2206]: time="2023-11-08T07:10:56.509392850+08:00" level=info msg="Reconciling bootstrap data between datastore and disk"
11月 08 07:10:56 master k3s[2206]: time="2023-11-08T07:10:56.824972873+08:00" level=info msg="Running kube-apiserver --advertise-port=8081 --allow-privileged=
11月 08 07:10:56 master k3s[2206]: Flag --insecure-port has been deprecated, This flag has no effect now and will be removed in v1.24.
11月 08 07:10:56 master k3s[2206]: I1108 07:10:56.882105 2206 server.go:656] external host was not specified, using 192.168.50.1
11月 08 07:10:56 master k3s[2206]: I1108 07:10:56.882799 2206 server.go:195] Version: v1.21.14+k3s1
11月 08 07:10:56 master k3s[2206]: time="2023-11-08T07:10:56.892610878+08:00" level=info msg="Running kube-scheduler --address=127.0.0.1 --bind-address=127.0.
11月 08 07:10:56 master k3s[2206]: time="2023-11-08T07:10:56.892733584+08:00" level=info msg="Waiting for API server to become available"
11月 08 07:10:56 master k3s[2206]: time="2023-11-08T07:10:56.894068468+08:00" level=info msg="Running kube-controller-manager --address=127.0.0.1 --allocate-n
11月 08 07:10:56 master k3s[2206]: time="2023-11-08T07:10:56.895405386+08:00" level=info msg="Running cloud-controller-manager --allocate-node-cidrs=true --bi
11月 08 07:10:56 master k3s[2206]: time="2023-11-08T07:10:56.938009764+08:00" level=info msg="Node token is available at /var/lib/rancher/k3s/server/token"
11月 08 07:10:56 master k3s[2206]: time="2023-11-08T07:10:56.938335600+08:00" level=info msg="To join node to cluster: k3s agent -s https://0.0.0.0:8081 -t ${
11月 08 07:10:56 master k3s[2206]: I1108 07:10:56.955499 2206 shared_informer.go:240] Waiting for caches to sync for node_authorizer
11月 08 07:10:56 master k3s[2206]: time="2023-11-08T07:10:56.959069454+08:00" level=info msg="Wrote kubeconfig /etc/rancher/k3s/k3s.yaml"
11月 08 07:10:56 master k3s[2206]: time="2023-11-08T07:10:56.959139253+08:00" level=info msg="Run: k3s kubectl"
11月 08 07:10:56 master k3s[2206]: I1108 07:10:56.975419 2206 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following orde
11月 08 07:10:56 master k3s[2206]: I1108 07:10:56.975457 2206 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following or
11月 08 07:10:56 master k3s[2206]: I1108 07:10:56.978234 2206 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following orde
11月 08 07:10:56 master k3s[2206]: I1108 07:10:56.978268 2206 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following or
11月 08 07:10:57 master k3s[2206]: {"level":"warn","ts":"2023-11-08T07:10:57.005+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary in
11月 08 07:10:57 master k3s[2206]: I1108 07:10:57.068266 2206 instance.go:283] Using reconciler: lease
11月 08 07:10:57 master k3s[2206]: I1108 07:10:57.138386 2206 rest.go:130] the default service ipfamily for this cluster is: IPv4
11月 08 07:10:57 master k3s[2206]: time="2023-11-08T07:10:57.213458621+08:00" level=info msg="certificate CN=master signed by CN=k3s-server-ca@1698355188: not
11月 08 07:10:57 master k3s[2206]: time="2023-11-08T07:10:57.218747437+08:00" level=info msg="certificate CN=system:node:master,O=system:nodes signed by CN=k3
11月 08 07:10:57 master k3s[2206]: time="2023-11-08T07:10:57.255557524+08:00" level=info msg="Module overlay was already loaded"
11月 08 07:10:57 master k3s[2206]: time="2023-11-08T07:10:57.362734589+08:00" level=info msg="Module br_netfilter was already loaded"
11月 08 07:10:57 master k3s[2206]: time="2023-11-08T07:10:57.463681671+08:00" level=info msg="Set sysctl 'net/netfilter/nf_conntrack_max' to 2097152"
11月 08 07:10:57 master k3s[2206]: time="2023-11-08T07:10:57.463786685+08:00" level=info msg="Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established'