Help: Rancher 2.5.9 keeps restarting frequently

RKE version:
Rancher 2.5.9
Docker version: (docker version, docker info)
20.10.7
Operating system and kernel: (cat /etc/os-release, uname -r)
CentOS 7.9, kernel 3.10.0-1160.36.2.el7.x86_64
Host type and provider: (VirtualBox/Bare-metal/AWS/GCE/DO)

cluster.yml file:

Steps to reproduce:

Results:

......
goroutine 2828 [select]:
github.com/rancher/k3s/vendor/k8s.io/apimachinery/pkg/util/wait.WaitFor(0xc01152e7e0, 0xc00fc85670, 0xc00cb70360, 0x0, 0x0)
        /go/src/github.com/rancher/k3s/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:539 +0x11d
github.com/rancher/k3s/vendor/k8s.io/apimachinery/pkg/util/wait.PollUntil(0xdf8475800, 0xc00fc85670, 0xc008641380, 0x0, 0x0)
        /go/src/github.com/rancher/k3s/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:492 +0xc5
github.com/rancher/k3s/vendor/k8s.io/apimachinery/pkg/util/wait.PollImmediateUntil(0xdf8475800, 0xc00fc85670, 0xc008641380, 0xb, 0xc0123abf48)
        /go/src/github.com/rancher/k3s/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:511 +0xb3
created by github.com/rancher/k3s/vendor/k8s.io/apiserver/pkg/server/dynamiccertificates.(*DynamicCertKeyPairContent).Run
        /go/src/github.com/rancher/k3s/vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/dynamic_serving_content.go:137 +0x2b3

goroutine 2829 [select, 44 minutes]:
github.com/rancher/k3s/vendor/k8s.io/apimachinery/pkg/util/wait.contextForChannel.func1(0xc008641380, 0xc00fc85690, 0x4c77280, 0xc009a2c400)
        /go/src/github.com/rancher/k3s/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:279 +0xbd
created by github.com/rancher/k3s/vendor/k8s.io/apimachinery/pkg/util/wait.contextForChannel
        /go/src/github.com/rancher/k3s/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:278 +0x8c

goroutine 2830 [select]:
github.com/rancher/k3s/vendor/k8s.io/apimachinery/pkg/util/wait.poller.func1.1(0xc00cb70420, 0xdf8475800, 0x0, 0xc00cb703c0)
        /go/src/github.com/rancher/k3s/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:588 +0x17b
created by github.com/rancher/k3s/vendor/k8s.io/apimachinery/pkg/util/wait.poller.func1
        /go/src/github.com/rancher/k3s/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:571 +0x8c

goroutine 2017 [chan receive, 44 minutes]:
github.com/rancher/k3s/vendor/k8s.io/apiserver/pkg/server/dynamiccertificates.(*DynamicServingC
E0301 09:01:02.239998      35 leaderelection.go:325] error retrieving resource lock kube-system/kube-scheduler: Get "https://127.0.0.1:6444/api/v1/namespaces/kube-system/endpoints/kube-scheduler?timeout=10s": context deadline exceeded
I0301 09:01:02.240286      35 resource_quota_controller.go:291] Shutting down resource quota controller
I0301 09:01:02.240340      35 certificate_controller.go:130] Shutting down certificate controller "csrsigning-kubelet-serving"
I0301 09:01:02.240370      35 certificate_controller.go:130] Shutting down certificate controller "csrsigning-kubelet-client"
I0301 09:01:02.240374      35 garbagecollector.go:146] Shutting down garbage collector controller
I0301 09:01:02.240617      35 certificate_controller.go:130] Shutting down certificate controller "csrsigning-kube-apiserver-client"
I0301 09:01:02.240629      35 certificate_controller.go:130] Shutting down certificate controller "csrsigning-legacy-unknown"
I0301 09:01:02.240835      35 dynamic_serving_content.go:145] Shutting down csr-controller::/var/lib/rancher/k3s/server/tls/client-ca.crt::/var/lib/rancher/k3s/server/tls/client-ca.key
I0301 09:01:02.240842      35 dynamic_serving_content.go:145] Shutting down csr-controller::/var/lib/rancher/k3s/server/tls/client-ca.crt::/var/lib/rancher/k3s/server/tls/client-ca.key
I0301 09:01:02.240852      35 dynamic_serving_content.go:145] Shutting down csr-controller::/var/lib/rancher/k3s/server/tls/client-ca.crt::/var/lib/rancher/k3s/server/tls/client-ca.key
I0301 09:01:02.240862      35 dynamic_serving_content.go:145] Shutting down csr-controller::/var/lib/rancher/k3s/server/tls/client-ca.crt::/var/lib/rancher/k3s/server/tls/client-ca.key
I0301 09:01:02.240870      35 cleaner.go:91] Shutting down CSR cleaner controller
I0301 09:01:02.240902      35 cronjob_controller.go:100] Shutting down CronJob Manager
I0301 09:01:02.241507      35 horizontal.go:180] Shutting down HPA controller
I0301 09:01:02.241550      35 node_ipam_controller.go:171] Shutting down ipam controller
I0301 09:01:02.241560      35 endpointslicemirroring_controller.go:224] Shutting down EndpointSliceMirroring controller
I0301 09:01:02.241575      35 stateful_set.go:158] Shutting down statefulset controller
I0301 09:01:02.241584      35 deployment_controller.go:165] Shutting down deployment controller
I0301 09:01:02.241590      35 pv_controller_base.go:319] Shutting down persistent volume controller
I0301 09:01:02.241605      35 serviceaccounts_controller.go:129] Shutting down service account controller
I0301 09:01:02.241634      35 node_lifecycle_controller.go:589] Shutting down node controller
I0301 09:01:02.241645      35 job_controller.go:160] Shutting down job controller
I0301 09:01:02.241650      35 endpointslice_controller.go:253] Shutting down endpoint slice controller
I0301 09:01:02.241660      35 replica_set.go:194] Shutting down replicaset controller
I0301 09:01:02.241666      35 attach_detach_controller.go:361] Shutting down attach detach controller
I0301 09:01:02.241672      35 ttl_controller.go:130] Shutting down TTL controller
I0301 09:01:02.241674      35 clusterroleaggregation_controller.go:161] Shutting down ClusterRoleAggregator
I0301 09:01:02.241680      35 pv_protection_controller.go:95] Shutting down PV protection controller
I0301 09:01:02.241678      35 namespace_controller.go:212] Shutting down namespace controller
I0301 09:01:02.241686      35 gc_controller.go:100] Shutting down GC controller
I0301 09:01:02.241691      35 disruption.go:348] Shutting down disruption controller
I0301 09:01:02.241702      35 daemon_controller.go:299] Shutting down daemon sets controller
I0301 09:01:02.241707      35 expand_controller.go:315] Shutting down expand controller
I0301 09:01:02.241712      35 replica_set.go:194] Shutting down replicationcontroller controller
E0301 09:01:02.241911      35 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
E0301 09:01:02.241960      35 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
I0301 09:01:02.242358      35 range_allocator.go:184] Shutting down range CIDR allocator
I0301 09:01:02.242375      35 pvc_protection_controller.go:122] Shutting down PVC protection controller
I0301 09:01:02.242385      35 endpoints_controller.go:201] Shutting down endpoint controller
I0301 09:01:02.558968      35 event.go:291] "Event occurred" object="" kind="Endpoints" apiVersion="v1" type="Normal" reason="LeaderElection" message="c2c66f6c1fee_0c1d1c69-271a-453b-867f-730a16b5eaad stopped leading"
E0301 09:01:03.151887      35 controller.go:178] failed to update node lease, error: Put "https://127.0.0.1:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/local-node?timeout=10s": context deadline exceeded
I0301 09:01:03.152221      35 event.go:291] "Event occurred" object="" kind="Endpoints" apiVersion="v1" type="Normal" reason="LeaderElection" message="c2c66f6c1fee_afc82884-81f3-4bc7-bc12-ba86a19b5e00 stopped leading"
I0301 09:01:03.152669      35 trace.go:205] Trace[1384632010]: "GuaranteedUpdate etcd3" type:*coordination.Lease (01-Mar-2023 09:00:53.130) (total time: 10021ms):
Trace[1384632010]: [10.021872699s] [10.021872699s] END
I0301 09:01:03.153069      35 trace.go:205] Trace[1772204812]: "Update" url:/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/local-node,user-agent:k3s/v1.19.8+k3s1 (linux/amd64) kubernetes/95fc76b,client:127.0.0.1 (01-Mar-2023 09:00:53.130) (total time: 10022ms):
Trace[1772204812]: [10.022502816s] [10.022502816s] END
I0301 09:01:03.228462      35 leaderelection.go:278] failed to renew lease kube-system/kube-scheduler: timed out waiting for the condition
F0301 09:01:03.228602      35 server.go:199] leaderelection lost
2023-03-01 09:01:03.634899 W | etcdserver: read-only range request "key:\"/registry/minions/\" range_end:\"/registry/minions0\" count_only:true " with result "error:context canceled" took too long (481.905009ms) to execute
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2023-03-01 09:01:03.634946 W | etcdserver: read-only range request "key:\"/registry/jobs/\" range_end:\"/registry/jobs0\" limit:500 " with result "error:context canceled" took too long (10.419729397s) to execute
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2023-03-01 09:01:03.635089 W | etcdserver: read-only range request "key:\"/registry/namespaces/default\" " with result "error:context canceled" took too long (10.507019865s) to execute
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2023-03-01 09:01:03.635119 W | etcdserver: read-only range request "key:\"/registry/configmaps/kube-system/k3s\" " with result "error:context canceled" took too long (12.19467652s) to execute
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2023-03-01 09:01:03.635342 W | etcdserver: read-only range request "key:\"/registry/secrets/\" range_end:\"/registry/secrets0\" count_only:true " with result "error:context canceled" took too long (3.891377171s) to execute
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2023-03-01 09:01:03.635361 W | etcdserver: read-only range request "key:\"/registry/configmaps/fleet-system/gitjob\" " with result "error:context canceled" took too long (10.23865501s) to execute
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2023-03-01 09:01:03.635593 W | etcdserver: read-only range request "key:\"/registry/configmaps/kube-system/cattle-controllers\" " with result "error:context canceled" took too long (12.198797729s) to execute
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2023-03-01 09:01:03.635876 W | etcdserver: read-only range request "key:\"/registry/poddisruptionbudgets/\" range_end:\"/registry/poddisruptionbudgets0\" count_only:true " with result "error:context canceled" took too long (7.127710817s) to execute
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2023-03-01 09:01:03.635956 W | etcdserver: read-only range request "key:\"/registry/storageclasses/\" range_end:\"/registry/storageclasses0\" count_only:true " with result "error:context canceled" took too long (4.756664149s) to execute
2023-03-01 09:01:03.635968 W | etcdserver: read-only range request "key:\"/registry/volumeattachments/\" range_end:\"/registry/volumeattachments0\" count_only:true " with result "error:context canceled" took too long (6.200227375s) to execute
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2023-03-01 09:01:03.636418 W | etcdserver: read-only range request "key:\"/registry/apiextensions.k8s.io/customresourcedefinitions/\" range_end:\"/registry/apiextensions.k8s.io/customresourcedefinitions0\" count_only:true " with result "error:context canceled" took too long (9.396489151s) to execute
2023-03-01 09:01:03.636448 W | etcdserver: read-only range request "key:\"/registry/clusterroles/\" range_end:\"/registry/clusterroles0\" count_only:true " with result "error:context canceled" took too long (9.2233696s) to execute
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2023-03-01 09:01:03.636678 W | etcdserver: read-only range request "key:\"/registry/apiregistration.k8s.io/apiservices/\" range_end:\"/registry/apiregistration.k8s.io/apiservices0\" count_only:true " with result "error:context canceled" took too long (3.288246886s) to execute
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2023-03-01 09:01:03.638221 W | etcdserver: read-only range request "key:\"/registry/fleet.cattle.io/clusterregistrations/\" range_end:\"/registry/fleet.cattle.io/clusterregistrations0\" count_only:true " with result "error:context canceled" took too long (485.738184ms) to execute
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2023-03-01 09:01:03.638644 W | etcdserver: read-only range request "key:\"/registry/fleet.cattle.io/clusterregistrationtokens/\" range_end:\"/registry/fleet.cattle.io/clusterregistrationtokens0\" count_only:true " with result "error:context canceled" took too long (1.081180112s) to execute
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2023-03-01 09:01:03.640864 W | etcdserver: read-only range request "key:\"/registry/fleet.cattle.io/bundlenamespacemappings/\" range_end:\"/registry/fleet.cattle.io/bundlenamespacemappings0\" count_only:true " with result "error:context canceled" took too long (11.439674347s) to execute
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2023-03-01 09:01:03.641231 W | etcdserver: read-only range request "key:\"/registry/management.cattle.io/settings/\" range_end:\"/registry/management.cattle.io/settings0\" count_only:true " with result "error:context canceled" took too long (6.967171723s) to execute
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2023-03-01 09:01:03.642540 W | etcdserver: read-only range request "key:\"/registry/management.cattle.io/catalogtemplates/\" range_end:\"/registry/management.cattle.io/catalogtemplates0\" count_only:true " with result "error:context canceled" took too long (969.987971ms) to execute
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2023-03-01 09:01:03.644920 W | etcdserver: read-only range request "key:\"/registry/management.cattle.io/clusteralertgroups/\" range_end:\"/registry/management.cattle.io/clusteralertgroups0\" count_only:true " with result "error:context canceled" took too long (10.040408909s) to execute
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2023-03-01 09:01:03.644996 W | etcdserver: read-only range request "key:\"/registry/management.cattle.io/templates/\" range_end:\"/registry/management.cattle.io/templates0\" count_only:true " with result "error:context canceled" took too long (10.21421616s) to execute
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2023-03-01 09:01:03.645306 W | etcdserver: read-only range request "key:\"/registry/management.cattle.io/composeconfigs/\" range_end:\"/registry/management.cattle.io/composeconfigs0\" count_only:true " with result "error:context canceled" took too long (9.093468172s) to execute
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2023-03-01 09:01:03.645799 W | etcdserver: read-only range request "key:\"/registry/management.cattle.io/clustermonitorgraphs/\" range_end:\"/registry/management.cattle.io/clustermonitorgraphs0\" count_only:true " with result "error:context canceled" took too long (4.625231034s) to execute
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2023-03-01 09:01:03.646012 W | etcdserver: read-only range request "key:\"/registry/management.cattle.io/templateversions/\" range_end:\"/registry/management.cattle.io/templateversions0\" count_only:true " with result "error:context canceled" took too long (5.943559173s) to execute
WARNING: 2023/03/01 09:01:03 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
E0301 09:01:03.648896       8 leaderelection.go:325] error retrieving resource lock kube-system/cattle-controllers: Get "https://127.0.0.1:6443/api/v1/namespaces/kube-system/configmaps/cattle-controllers?timeout=15m0s": dial tcp 127.0.0.1:6443: connect: connection refused
W0301 09:01:03.653051       8 reflector.go:437] pkg/mod/github.com/rancher/client-go@v1.20.0-rancher.1/tools/cache/reflector.go:168: watch of *summary.SummarizedObject ended with: very short watch: pkg/mod/github.com/rancher/client-go@v1.20.0-rancher.1/tools/cache/reflector.go:168: Unexpected watch close - watch lasted less than a second and no items received
2023/03/01 09:01:03 httputil: ReverseProxy read error during body copy: unexpected EOF
2023/03/01 09:01:03 httputil: ReverseProxy read error during body copy: unexpected EOF
2023/03/01 09:01:03 [FATAL] k3s exited with: exit status 255
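From the end of the log, it looks like the embedded k3s loses leader election because requests to the local apiserver time out and etcd range requests take 10+ seconds, after which k3s exits with status 255 and the Rancher container restarts. These are the checks I can run on the host to rule out memory pressure or slow disk I/O; this is just a rough sketch assuming Rancher runs as the usual single Docker container, and the container ID below is a placeholder:

# find the Rancher container (image tag assumed to be rancher/rancher:v2.5.9)
docker ps -a --filter ancestor=rancher/rancher:v2.5.9

# was the container OOM-killed, or did it exit on its own?
docker inspect <container-id> --format '{{.State.OOMKilled}} {{.State.ExitCode}} {{.State.FinishedAt}}'

# kernel-level OOM events
dmesg -T | grep -i -E 'oom|killed process'

# disk latency on the volume backing the container data (etcd is very sensitive to fsync latency)
iostat -x 1 10
df -h /var/lib/docker

# memory/CPU pressure while the problem reproduces
docker stats --no-stream

If iostat shows multi-second await on the volume holding the etcd data, that alone would explain the 10s+ range requests and the lost leader leases.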

Reference: K3s crashes unexpectedly with [FATAL] k3s exited with: exit status 255