cattle-cluster-agent cannot connect to Rancher

Deployment plan:

hostname  server IP      role                         config.yaml                        notes
master1   192.168.16.2   rke2 server node             (no server entry)                  server nodes load-balanced through nginx
master2   192.168.16.3   rke2 server node             server: https://192.168.1.7:9435   server nodes load-balanced through nginx
master3   192.168.16.4   rke2 server node             server: https://192.168.1.7:9435   server nodes load-balanced through nginx
node1     192.168.16.5   rke2 agent node              server: https://192.168.1.7:9435
node2     192.168.16.6   rke2 agent node              server: https://192.168.1.7:9435
other1    192.168.16.68  nginx + single-node Rancher                                     nginx LB for the rke2 servers; Rancher single node in Docker

The cluster starts up normally after deployment.

The single-node Rancher and nginx are deployed on the same host, and the master nodes were deployed with the nginx service's IP as their server address. The problem is that with this deployment, the master nodes' cattle-cluster-agent cannot connect to Rancher. The exact errors:

time="2024-03-11T01:21:06Z" level=info msg="Connecting to wss://192.168.16.68:8443/v3/connect/register with token starting with 6jdnv22qfcj7zf5g4xjjxh6r69h"
time="2024-03-11T01:21:06Z" level=info msg="Connecting to proxy" url="wss://192.168.16.68:8443/v3/connect/register"
time="2024-03-11T01:21:06Z" level=error msg="Failed to connect to proxy. Response status: 400 - 400 Bad Request. Response body: cluster not found" error="websocket: bad handshake"
time="2024-03-11T01:21:06Z" level=error msg="Remotedialer proxy error" error="websocket: bad handshake"

How should this be handled? Is it an rke2 bug? If a master node is installed with its server address set to the IP of an already-running master node, cattle-cluster-agent connects to Rancher normally; when the master node's config.yaml server address is the proxy server's address, cattle-cluster-agent cannot connect to Rancher.
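
A quick way to confirm which Rancher URL the agent is actually dialing, and to pull its recent errors (a sketch, assuming the standard manifests Rancher generates for imported clusters):

# Rancher URL the agent was registered with (CATTLE_SERVER env var)
kubectl -n cattle-system get deploy cattle-cluster-agent \
  -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="CATTLE_SERVER")].value}'

# Recent agent logs, where the "bad handshake" errors above come from
kubectl -n cattle-system logs -l app=cattle-cluster-agent --tail=20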

I don't quite follow this part; could you describe it again?

When installing the second and third master nodes, if the server address in config.yaml is set to the first master's address, cattle-cluster-agent connects to Rancher normally. If instead the server address is set to the nginx proxy server's address, cattle-cluster-agent cannot connect to Rancher. The nginx proxy address is the connection address for all masters (a config.yaml sketch for the joining masters follows the nginx example below). Example nginx load-balancer configuration:

events {}
stream {
  upstream k3s_servers {
    server 10.10.10.50:6443;
    server 10.10.10.51:6443;
    server 10.10.10.52:6443;
  }
  server {
    listen 6443;
    proxy_pass k3s_servers;
  }
}
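
For context: the server address discussed here lives in /etc/rancher/rke2/config.yaml on each joining node. A minimal sketch for master2/master3 pointing at the load balancer (addresses taken from the table above; the token value is whatever the first server generated):

# /etc/rancher/rke2/config.yaml on master2 / master3 (sketch)
server: https://192.168.1.7:9435
token: <cluster-join-token>
# rke2's HA docs recommend adding the LB address to the server cert's SANs
tls-san:
  - 192.168.1.7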

Command to import the rke2 cluster into Rancher, executed on an rke2 master node:

curl --insecure -sfL https://192.168.16.68:8443/v3/import/2wt5lkqn45rb2k56cf6jtpzr6d2dq29bk85g5hgt8h7q4vv6d5vntw_c-m-5cks6m6v.yaml | kubectl apply -f -
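
After applying the import manifest, the agent rollout can be watched like this (deployment name and label are the ones Rancher's import manifest creates):

kubectl -n cattle-system rollout status deploy/cattle-cluster-agent
kubectl -n cattle-system get pods -l app=cattle-cluster-agent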

I still think it is a problem with your nginx configuration. You can compare against mine:

#user nginx;
load_module /usr/lib/nginx/modules/ngx_stream_module.so;
worker_processes 4;
worker_rlimit_nofile 40000;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;

# Load dynamic modules. See /usr/share/doc/nginx/README.dynamic.
include /usr/share/nginx/modules/*.conf;

events {
    worker_connections 8192;
}

stream {
    # rke2 supervisor/registration port; joining nodes connect here
    upstream backend {
        least_conn;
        server 192.168.205.85:9345 max_fails=3 fail_timeout=5s;
        server 192.168.205.86:9345 max_fails=3 fail_timeout=5s;
        server 192.168.205.87:9345 max_fails=3 fail_timeout=5s;
    }

    # This server accepts all traffic to port 9345 and passes it to the upstream.
    # Notice that the upstream name and the proxy_pass need to match.
    server {
        listen 9345;
        proxy_pass backend;
    }

    # Kubernetes API server port
    upstream rancher_api {
        least_conn;
        server 192.168.205.85:6443 max_fails=3 fail_timeout=5s;
        server 192.168.205.86:6443 max_fails=3 fail_timeout=5s;
        server 192.168.205.87:6443 max_fails=3 fail_timeout=5s;
    }

    server {
        listen 6443;
        proxy_pass rancher_api;
    }
}

I could not reproduce the problem with this configuration; the cluster imports normally. Note that, unlike the earlier example, it also load-balances port 9345 (the rke2 supervisor/registration port), not just 6443.
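
One way to sanity-check both listeners end to end from a joining node (a sketch; <lb-ip> stands for your load balancer, -k because the certs are self-signed; /ping is the health endpoint the rke2/k3s supervisor answers with "pong"):

# supervisor/registration port through the LB
curl -kv https://<lb-ip>:9345/ping
# kube-apiserver port through the LB (healthz is readable anonymously by default)
curl -k https://<lb-ip>:6443/healthz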