RKE fails to create a Kubernetes cluster

RKE version:
1.4.5
Docker version: (docker version, docker info)

Docker version 20.10.24, build 297e128

Client:
Context: default
Debug Mode: false

Server:
Containers: 3
Running: 2
Paused: 0
Stopped: 1
Images: 30
Server Version: 20.10.24
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
Default Runtime: runc
Init Binary: docker-init
containerd version: 2806fc1057397dbaeefbea0e4e17bddfbd388f38
runc version: v1.1.5-0-gf19387a6
init version: de40ad0
Security Options:
seccomp
Profile: default
Kernel Version: 4.19.90-25.25.v2101.ky10.aarch64
Operating System: Kylin Linux Advanced Server V10 (Sword)
OSType: linux
Architecture: aarch64
CPUs: 64
Total Memory: 254.1GiB
Name: k8s-master-01
ID: 3ZUX:PET5:NH6Y:JGXO:C2ZO:NHD4:ZDVS:6KRE:2RDG:SQ45:JL36:RBL3
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

Operating system and kernel: (cat /etc/os-release, uname -r)
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Sword)"
ID="kylin"
VERSION_ID="V10"
PRETTY_NAME="Kylin Linux Advanced Server V10 (Sword)"
ANSI_COLOR="0;31"
4.19.90-25.25.v2101.ky10.aarch64

Host type and provider: (VirtualBox/Bare-metal/AWS/GCE/DO)

cluster.yml file:
nodes:
- address: 172.16.153.10
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - etcd
  hostname_override: k8s-master-01
  user: rke
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: /home/rke/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints:
- address: 172.16.153.11
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - etcd
  hostname_override: k8s-master-02
  user: rke
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: /home/rke/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints:
- address: 172.16.153.12
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - etcd
  hostname_override: k8s-master-03
  user: rke
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: /home/rke/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints:
- address: 172.16.153.30
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: k8s-worker-01
  user: rke
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: /home/rke/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints:
- address: 172.16.153.31
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: k8s-worker-02
  user: rke
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: /home/rke/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints:
- address: 172.16.153.32
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: k8s-worker-03
  user: rke
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: /home/rke/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints:
services:
  etcd:
    image: ""
    extra_args: {}
    extra_args_array: {}
    extra_binds:
    extra_env:
    win_extra_args: {}
    win_extra_args_array: {}
    win_extra_binds:
    win_extra_env:
    external_urls:
    ca_cert: ""
    cert: ""
    key: ""
    path: ""
    uid: 0
    gid: 0
    snapshot: null
    retention: ""
    creation: ""
    backup_config: null
  kube-api:
    image: ""
    extra_args: {}
    extra_args_array: {}
    extra_binds:
    extra_env:
    win_extra_args: {}
    win_extra_args_array: {}
    win_extra_binds:
    win_extra_env:
    service_cluster_ip_range: 10.43.0.0/16
    service_node_port_range: ""
    pod_security_policy: false
    pod_security_configuration: ""
    always_pull_images: false
    secrets_encryption_config: null
    audit_log: null
    admission_configuration: null
    event_rate_limit: null
  kube-controller:
    image: ""
    extra_args: {}
    extra_args_array: {}
    extra_binds:
    - '/etc/localtime:/etc/localtime'
    extra_env:
    win_extra_args: {}
    win_extra_args_array: {}
    win_extra_binds:
    win_extra_env:
    cluster_cidr: 10.42.0.0/16
    service_cluster_ip_range: 10.43.0.0/16
  scheduler:
    image: ""
    extra_args: {}
    extra_args_array: {}
    extra_binds:
    extra_env:
    win_extra_args: {}
    win_extra_args_array: {}
    win_extra_binds:
    win_extra_env:
  kubelet:
    image: ""
    extra_args: {}
    extra_args_array: {}
    extra_binds:
    extra_env:
    win_extra_args: {}
    win_extra_args_array: {}
    win_extra_binds:
    win_extra_env:
    cluster_domain: cluster.local
    infra_container_image: ""
    cluster_dns_server: 10.43.0.10
    fail_swap_on: false
    generate_serving_certificate: false
  kubeproxy:
    image: ""
    extra_args: {}
    extra_args_array: {}
    extra_binds:
    extra_env:
    win_extra_args: {}
    win_extra_args_array: {}
    win_extra_binds:
    win_extra_env:
network:
  plugin: calico
  options: {}
  mtu: 0
  node_selector: {}
  update_strategy: null
  tolerations:
authentication:
  strategy: x509
  sans:
  webhook: null
addons: ""
addons_include:
system_images:
  etcd: rancher/mirrored-coreos-etcd:v3.5.6
  alpine: rancher/rke-tools:v0.1.88
  nginx_proxy: rancher/rke-tools:v0.1.88
  cert_downloader: rancher/rke-tools:v0.1.88
  kubernetes_services_sidecar: rancher/rke-tools:v0.1.88
  kubedns: rancher/mirrored-k8s-dns-kube-dns:1.22.8
  dnsmasq: rancher/mirrored-k8s-dns-dnsmasq-nanny:1.22.8
  kubedns_sidecar: rancher/mirrored-k8s-dns-sidecar:1.22.8
  kubedns_autoscaler: rancher/mirrored-cluster-proportional-autoscaler:1.8.6
  coredns: rancher/mirrored-coredns-coredns:1.9.4
  coredns_autoscaler: rancher/mirrored-cluster-proportional-autoscaler:1.8.6
  nodelocal: rancher/mirrored-k8s-dns-node-cache:1.22.10
  kubernetes: rancher/hyperkube:v1.25.9-rancher2
  flannel: rancher/mirrored-flannelcni-flannel:v0.19.2
  flannel_cni: rancher/flannel-cni:v0.3.0-rancher7
  calico_node: rancher/mirrored-calico-node:v3.24.1
  calico_cni: rancher/calico-cni:v3.24.1-rancher1
  calico_controllers: rancher/mirrored-calico-kube-controllers:v3.24.1
  calico_ctl: rancher/mirrored-calico-ctl:v3.24.1
  calico_flexvol: rancher/mirrored-calico-pod2daemon-flexvol:v3.24.1
  canal_node: rancher/mirrored-calico-node:v3.24.1
  canal_cni: rancher/calico-cni:v3.24.1-rancher1
  canal_controllers: rancher/mirrored-calico-kube-controllers:v3.24.1
  canal_flannel: rancher/mirrored-flannelcni-flannel:v0.19.2
  canal_flexvol: rancher/mirrored-calico-pod2daemon-flexvol:v3.24.1
  weave_node: weaveworks/weave-kube:2.8.1
  weave_cni: weaveworks/weave-npc:2.8.1
  pod_infra_container: rancher/mirrored-pause:3.7
  ingress: rancher/nginx-ingress-controller:nginx-1.5.1-rancher2
  ingress_backend: rancher/mirrored-nginx-ingress-controller-defaultbackend:1.5-rancher1
  ingress_webhook: rancher/mirrored-ingress-nginx-kube-webhook-certgen:v1.1.1
  metrics_server: rancher/mirrored-metrics-server:v0.6.2
  windows_pod_infra_container: rancher/mirrored-pause:3.7
  aci_cni_deploy_container: noiro/cnideploy:5.2.3.6.1d150da
  aci_host_container: noiro/aci-containers-host:5.2.3.6.1d150da
  aci_opflex_container: noiro/opflex:5.2.3.6.1d150da
  aci_mcast_container: noiro/opflex:5.2.3.6.1d150da
  aci_ovs_container: noiro/openvswitch:5.2.3.6.1d150da
  aci_controller_container: noiro/aci-containers-controller:5.2.3.6.1d150da
  aci_gbp_server_container: noiro/gbp-server:5.2.3.6.1d150da
  aci_opflex_server_container: noiro/opflex-server:5.2.3.6.1d150da
ssh_key_path: ~/.ssh/id_rsa
ssh_cert_path: ""
ssh_agent_auth: false
authorization:
  mode: rbac
  options: {}
ignore_docker_version: null
enable_cri_dockerd: null
kubernetes_version: ""
private_registries:
ingress:
  provider: ""
  options: {}
  node_selector: {}
  extra_args: {}
  dns_policy: ""
  extra_envs:
  extra_volumes:
  extra_volume_mounts:
  update_strategy: null
  http_port: 0
  https_port: 0
  network_mode: ""
  tolerations:
  default_backend: null
  default_http_backend_priority_class_name: ""
  nginx_ingress_controller_priority_class_name: ""
  default_ingress_class: null
cluster_name: ""
cloud_provider:
  name: ""
prefix_path: ""
win_prefix_path: ""
addon_job_timeout: 0
bastion_host:
  address: ""
  port: ""
  user: ""
  ssh_key: ""
  ssh_key_path: ""
  ssh_cert: ""
  ssh_cert_path: ""
  ignore_proxy_env_vars: false
monitoring:
  provider: ""
  options: {}
  node_selector: {}
  update_strategy: null
  replicas: null
  tolerations:
  metrics_server_priority_class_name: ""
restore:
  restore: false
  snapshot_name: ""
rotate_encryption_key: false
dns: null

Steps to reproduce:
rke --debug up --config cluster.yml
Result:
FATA[0051] [[network] Host [172.16.153.10] is not able to connect to the following ports: [172.16.153.11:2379, 172.16.153.11:2380, 172.16.153.12:2379, 172.16.153.12:2380]. Please check network policies and firewall rules]

The firewalls are all disabled. Each host can reach the ports locally, but the other nodes cannot.
docker ps
81f1a60b7c1b rancher/rke-tools:v0.1.88 “/docker-entrypoint.…” 22 minutes ago Up 22 minutes 80/tcp, 0.0.0.0:6443->1337/tcp rke-cp-port-listener
4056d7c05fa4 rancher/rke-tools:v0.1.88 “/docker-entrypoint.…” 22 minutes ago Up 22 minutes 80/tcp, 0.0.0.0:2379->1337/tcp, 0.0.0.0:2380->1337/tcp rke-etcd-port-listener
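
The two rke-tools containers above are RKE's temporary port listeners: during the network check, RKE publishes the etcd/controlplane ports on each host and then dials them from the other hosts. The same check can be reproduced by hand while the listeners are still running; a minimal sketch, assuming nc (ncat) is installed on the hosts:

# Run on k8s-master-01 (172.16.153.10), the host the error says cannot reach its peers
nc -zv -w 3 172.16.153.11 2379   # etcd client port on k8s-master-02
nc -zv -w 3 172.16.153.11 2380   # etcd peer port on k8s-master-02
nc -zv -w 3 172.16.153.12 2379
nc -zv -w 3 172.16.153.12 2380

If these time out while nc -zv -w 3 127.0.0.1 2379 succeeds on each target host, traffic is being dropped somewhere between the hosts (iptables rules left behind by Docker or earlier CNI attempts, a hypervisor-level security group, or routing), even with firewalld off.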

After I deleted the cluster and re-ran the command several times, a new error appeared.
DEBU[0078] [healthcheck] Failed to check http://localhost:10248/healthz for service [kubelet] on host [172.16.153.12]: Get "http://localhost:10248/healthz": Unable to access the service on localhost:10248. The service might be still starting up. Error: ssh: rejected: connect failed (connection refused), try #2

DEBU[0120] [healthcheck] Failed to check http://localhost:10256/healthz for service [kube-proxy] on host [172.16.153.30]: Get "http://localhost:10256/healthz": Unable to access the service on localhost:10256. The service might be still starting up. Error: ssh: rejected: connect failed (connection refused), try #7
DEBU[0120] [healthcheck] Failed to check http://localhost:10256/healthz for service [kube-proxy] on host [172.16.153.32]: Get "http://localhost:10256/healthz": Unable to access the service on localhost:10256. The service might be still starting up. Error: ssh: rejected: connect failed (connection refused), try #7
DEBU[0125] [healthcheck] Failed to check http://localhost:10256/healthz for service [kube-proxy] on host [172.16.153.31]: Get "http://localhost:10256/healthz": Unable to access the service on localhost:10256. The service might be still starting up. Error: ssh: rejected: connect failed (connection refused), try #8
FATA[0294] [ "k8s-worker-02" not found]
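
When the health checks fail like this, it helps to inspect the service containers directly on an affected node rather than relying only on the rke output. A minimal sketch (run on 172.16.153.31, for example; kubelet and kube-proxy are the container names RKE creates):

docker ps -a --filter name=kubelet --filter name=kube-proxy
docker logs --tail 50 kubelet
docker logs --tail 50 kube-proxy

If the containers are restarting, their logs usually name the real cause (cgroup driver mismatch, a missing kernel module, and so on).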

Do you mean the ports are not reachable between nodes? If so, you should check firewall rules, security groups, SELinux, and similar policies.

I have checked all of them: the firewall and SELinux are disabled on every node. A sketch of how this can be confirmed per node is below.
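
systemctl is-active firewalld                # expect: inactive
getenforce                                   # expect: Disabled or Permissive
sudo iptables -S | grep -E 'DROP|REJECT'     # any output means rules still drop traffic

Note that Docker and earlier CNI attempts insert their own iptables rules, so a disabled firewalld does not guarantee an empty ruleset.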

DEBU[0125] [healthcheck] Failed to check http://localhost:10256/healthz for service [kube-proxy] on host [172.16.153.31]: Get "http://localhost:10256/healthz": Unable to access the service on localhost:10256. The service might be still starting up. Error: ssh: rejected: connect failed (connection refused), try #8
This is the error being reported. From the message it looks like the node is accessing itself, yet when I manually run curl http://localhost:10256/healthz on the node, it does return a response.
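
Keep in mind that rke does not run curl on the node itself: it dials localhost:10256 through an SSH tunnel from the machine running rke up, so "ssh: rejected: connect failed" means the SSH channel was refused, and a successful local curl does not rule that out. RKE requires TCP forwarding to be enabled in sshd; a quick check on each node (sketch):

sudo sshd -T | grep -i allowtcpforwarding    # must print: allowtcpforwarding yes

If it prints no, set AllowTcpForwarding yes in /etc/ssh/sshd_config and restart sshd.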