Adding a cluster to Rancher multi-cluster management fails with "websocket: bad handshake"

Rancher Server setup

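  • Rancher version: v2.4.5
  • Installation option (Docker install/Helm Chart):
    • If installed via Helm Chart, provide the type (RKE1, RKE2, k3s, EKS, etc.) and version of the Local cluster:
  • Online or offline (air-gapped) deployment: offline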

Downstream cluster information

  • Kubernetes version: 1.18.16
  • Cluster Type (Local/Downstream):
    • If Downstream, what type of cluster is it? (Custom/Imported/Hosted, etc.):

User information

  • What is the role of the logged-in user? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom):
    • If Custom, the custom permission set:

**Host operating system:** CentOS 7.8

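Problem description:
I am adding an already-built cluster to Rancher multi-cluster management through Rancher and deploying the cluster agent.
The YAML file is as follows: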


---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: proxy-clusterrole-kubeapiserver
rules:
- apiGroups: [""]
  resources:
  - nodes/metrics
  - nodes/proxy
  - nodes/stats
  - nodes/log
  - nodes/spec
  verbs: ["get", "list", "watch", "create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: proxy-role-binding-kubernetes-master
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: proxy-clusterrole-kubeapiserver
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: kube-apiserver
---
apiVersion: v1
kind: Namespace
metadata:
  name: cattle-system

---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: cattle
  namespace: cattle-system

---

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: cattle-admin-binding
  namespace: cattle-system
  labels:
    cattle.io/creator: "norman"
subjects:
- kind: ServiceAccount
  name: cattle
  namespace: cattle-system
roleRef:
  kind: ClusterRole
  name: cattle-admin
  apiGroup: rbac.authorization.k8s.io

---

apiVersion: v1
kind: Secret
metadata:
  name: cattle-credentials-5f2a67e
  namespace: cattle-system
type: Opaque
data:
  url: "aHR0cHM6Ly9yYW5jaGVyLmRldi5jb20="
  token: "OGx6MnY2a3BnNzdsdm1jOXFtdGxqNHhxazI3cnhoZnE5ZDZtcmJ3dnFwOWd4MmJienZ0Zjd2"
  namespace: "bG9jYWw="

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cattle-admin
  labels:
    cattle.io/creator: "norman"
rules:
- apiGroups:
  - '*'
  resources:
  - '*'
  verbs:
  - '*'
- nonResourceURLs:
  - '*'
  verbs:
  - '*'

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cattle-cluster-agent
  namespace: cattle-system
spec:
  selector:
    matchLabels:
      app: cattle-cluster-agent
  template:
    metadata:
      labels:
        app: cattle-cluster-agent
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                - key: beta.kubernetes.io/os
                  operator: NotIn
                  values:
                    - windows
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: node-role.kubernetes.io/controlplane
                operator: In
                values:
                - "true"
          - weight: 1
            preference:
              matchExpressions:
              - key: node-role.kubernetes.io/etcd
                operator: In
                values:
                - "true"
      serviceAccountName: cattle
      tolerations:
      - operator: Exists
      containers:
        - name: cluster-register
          imagePullPolicy: IfNotPresent
          env:
          - name: CATTLE_FEATURES
            value: ""
          - name: CATTLE_SERVER
            value: "https://rancher.dev.com"
          - name: CATTLE_CA_CHECKSUM
            value: "28f07020f4aa2ec1be3cd63041721e070427cc274ce743fb85e9d5527500c840"
          - name: CATTLE_CLUSTER
            value: "true"
          - name: CATTLE_K8S_MANAGED
            value: "true"
          image: registry.ce.inc/taxera/rancher-agent:v2.4.5
          volumeMounts:
          - name: cattle-credentials
            mountPath: /cattle-credentials
            readOnly: true
      hostAliases:
      - hostnames:
        - rancher.dev.com
        ip: 10.126.25.242
      dnsPolicy: ClusterFirst
      volumes:
      - name: cattle-credentials
        secret:
          secretName: cattle-credentials-5f2a67e
          defaultMode: 320

---

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cattle-node-agent
  namespace: cattle-system
spec:
  selector:
    matchLabels:
      app: cattle-agent
  template:
    metadata:
      labels:
        app: cattle-agent
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                - key: beta.kubernetes.io/os
                  operator: NotIn
                  values:
                    - windows
      hostNetwork: true
      serviceAccountName: cattle
      tolerations:
      - operator: Exists
      containers:
      - name: agent
        image: registry.ce.inc/taxera/rancher-agent:v2.4.5
        imagePullPolicy: IfNotPresent
        env:
        - name: CATTLE_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: CATTLE_SERVER
          value: "https://rancher.dev.com"
        - name: CATTLE_CA_CHECKSUM
          value: "28f07020f4aa2ec1be3cd63041721e070427cc274ce743fb85e9d5527500c840"
        - name: CATTLE_CLUSTER
          value: "false"
        - name: CATTLE_K8S_MANAGED
          value: "true"
        - name: CATTLE_AGENT_CONNECT
          value: "true"
        volumeMounts:
        - name: cattle-credentials
          mountPath: /cattle-credentials
          readOnly: true
        - name: k8s-ssl
          mountPath: /etc/kubernetes
        - name: var-run
          mountPath: /var/run
        - name: run
          mountPath: /run
        - name: docker-certs
          mountPath: /etc/docker/certs.d
        securityContext:
          privileged: true
      hostAliases:
      - hostnames:
        - rancher.dev.com
        ip: 10.126.25.242
      volumes:
      - name: k8s-ssl
        hostPath:
          path: /etc/kubernetes
          type: DirectoryOrCreate
      - name: var-run
        hostPath:
          path: /var/run
          type: DirectoryOrCreate
      - name: run
        hostPath:
          path: /run
          type: DirectoryOrCreate
      - name: cattle-credentials
        secret:
          secretName: cattle-credentials-5f2a67e
          defaultMode: 320
      - hostPath:
          path: /etc/docker/certs.d
          type: DirectoryOrCreate
        name: docker-certs
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
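The `data` values of the `cattle-credentials-5f2a67e` Secret are base64-encoded, so it is not obvious from the YAML which server and cluster the agent registers with. A minimal Python sketch that decodes them (the helper name `decode_secret_data` is hypothetical). Decoding gives `url` = `https://rancher.dev.com`, a `token` identical to the one printed in the node-agent log, and `namespace` = `local`, which lines up with the `local/m-922845cc343e` object named in the node-agent error.

```python
import base64

# Data fields copied verbatim from the cattle-credentials-5f2a67e Secret above.
SECRET_DATA = {
    "url": "aHR0cHM6Ly9yYW5jaGVyLmRldi5jb20=",
    "token": "OGx6MnY2a3BnNzdsdm1jOXFtdGxqNHhxazI3cnhoZnE5ZDZtcmJ3dnFwOWd4MmJienZ0Zjd2",
    "namespace": "bG9jYWw=",
}


def decode_secret_data(data: dict[str, str]) -> dict[str, str]:
    """Base64-decode every value of a Kubernetes Secret's data map as UTF-8 text."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


if __name__ == "__main__":
    for key, value in decode_secret_data(SECRET_DATA).items():
        print(f"{key}: {value}")
```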

Steps to reproduce:

Result:
The following error is reported:
websocket: bad handshake
Expected result:

Screenshots:

Other context:

Logs

The node-agent reports the following errors:

time="2022-10-16T14:11:10Z" level=error msg="Remotedialer proxy error" error="websocket: bad handshake"
time="2022-10-16T14:11:20Z" level=info msg="Connecting to wss://rancher.dev.com/v3/connect with token 8lz2v6kpg77lvmc9qmtlj4xqk27rxhfq9d6mrbwvqp9gx2bbzvtf7v"
time="2022-10-16T14:11:20Z" level=info msg="Connecting to proxy" url="wss://rancher.dev.com/v3/connect"
time="2022-10-16T14:11:20Z" level=error msg="Failed to connect to proxy. Response status: 200 - 200 OK. Response body: node.management.cattle.io \"local/m-922845cc343e\" not found" error="websocket: bad handshake"
time="2022-10-16T14:11:20Z" level=error msg="Remotedialer proxy error" error="websocket: bad handshake"
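`websocket: bad handshake` on its own says little: the WebSocket client reports it when the server answers the connect request without upgrading the connection. The preceding `Failed to connect to proxy` line carries the server's actual status and body. Here that is `200 OK` with `node.management.cattle.io "local/m-922845cc343e" not found`, i.e. the request reached Rancher and was rejected there rather than failing at the network level. A Python sketch that pulls those fields out of such lines; the names `parse_proxy_failure` and `ProxyFailure` are hypothetical, and the regex is written only against the log format shown above.

```python
import re
from typing import NamedTuple, Optional


class ProxyFailure(NamedTuple):
    status: int   # HTTP status the Rancher server answered with
    reason: str   # status text, e.g. "200 OK"
    body: str     # response body explaining the rejection


# Matches the "Failed to connect to proxy" lines printed before the
# generic "websocket: bad handshake" error.
_FAILURE_RE = re.compile(
    r"Response status: (?P<status>\d{3}) - (?P<reason>[^.]+)\. "
    r'Response body: (?P<body>.*)" error="websocket: bad handshake"'
)


def parse_proxy_failure(line: str) -> Optional[ProxyFailure]:
    """Return the server's status and body from one agent log line, or None."""
    match = _FAILURE_RE.search(line)
    if match is None:
        return None
    body = match.group("body").replace('\\"', '"')  # undo the log's \" escaping
    return ProxyFailure(int(match.group("status")), match.group("reason"), body)
```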


  1. How was the downstream cluster created?
  2. Before this import, had the downstream cluster ever been imported into Rancher for management?

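1. The downstream cluster was also deployed with RKE.
2. Before this import, the Rancher deployment worked without problems. Later, after deleting the pods, I used the Rancher UI of another cluster to add the existing cluster as a new one, modified the YAML, and ran it on the new cluster, which produced the error.
3. The certificate was updated to trust the new cluster's IP.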
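Since certificate trust came up: the agents pin the Rancher CA through `CATTLE_CA_CHECKSUM`, which is the SHA-256 checksum of the CA certificate configured in Rancher. A Python sketch for comparing a CA file against that value, assuming you have the PEM saved exactly as the Rancher server serves it (any whitespace difference changes the hash); the helper names and the file name in the usage comment are hypothetical.

```python
import hashlib

# Value of CATTLE_CA_CHECKSUM from the Deployment and DaemonSet above.
EXPECTED_CA_CHECKSUM = "28f07020f4aa2ec1be3cd63041721e070427cc274ce743fb85e9d5527500c840"


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


def ca_matches(pem_bytes: bytes, expected: str = EXPECTED_CA_CHECKSUM) -> bool:
    """True if the PEM bytes hash to the expected checksum (case-insensitive)."""
    return sha256_hex(pem_bytes) == expected.strip().lower()


# Usage (hypothetical file name):
#   from pathlib import Path
#   print(ca_matches(Path("rancher-ca.pem").read_bytes()))
```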

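I suspect this cluster was previously imported into another Rancher, and when it was imported again into a different Rancher, leftover data caused the error.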

You can try deleting deployment/cattle-cluster-agent and then importing the cluster again.
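That suggestion amounts to two kubectl steps run against the downstream cluster. A Python sketch that only prints the commands unless `dry_run=False`; the helper names are hypothetical, and `import.yaml` is a placeholder for the registration YAML generated by the Rancher you are importing into.

```python
import subprocess


def reimport_commands(import_yaml: str) -> list[list[str]]:
    """kubectl steps: remove the old cluster agent, then apply the new import YAML."""
    return [
        # --ignore-not-found keeps a re-run from failing if the Deployment is already gone.
        ["kubectl", "-n", "cattle-system", "delete", "deployment",
         "cattle-cluster-agent", "--ignore-not-found"],
        ["kubectl", "apply", "-f", import_yaml],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("+", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Placeholder file name; this call only prints the commands.
run(reimport_commands("import.yaml"))
```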

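If the problem persists, you can try rancher/rancher-cleanup on GitHub to delete the resources Rancher created. I have not read that script carefully, though, so it may be risky; I recommend trying it in a test environment first.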

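When adding a node in Rancher, selecting only the Worker role does not work; the node can only be added when at least both the Control Plane and Worker roles are selected. My Rancher is rancher:v2.5.11, running on a host with kernel 4.19.90-23.8.v2101.ky10.aarch64. Could someone help explain what is happening? The logs are as follows: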
time="2024-03-11T08:25:01Z" level=info msg="Listening on /tmp/log.sock"
time="2024-03-11T08:25:01Z" level=info msg="Rancher agent version v2.5.11 is starting"
time="2024-03-11T08:25:01Z" level=info msg="Option customConfig=map[address:172.16.16.29 internalAddress: label:map roles:[worker] taints:]"
time="2024-03-11T08:25:01Z" level=info msg="Option etcd=false"
time="2024-03-11T08:25:01Z" level=info msg="Option controlPlane=false"
time="2024-03-11T08:25:01Z" level=info msg="Option worker=true"
time="2024-03-11T08:25:01Z" level=info msg="Option requestedHostname=worker1"
time="2024-03-11T08:25:01Z" level=info msg="Connecting to wss://172.16.16.9:10443/v3/connect/register with token w234123123123123123"
time="2024-03-11T08:25:01Z" level=info msg="Connecting to proxy" url="wss://172.16.16.9:10443/v3/connect/register"
time="2024-03-11T08:25:01Z" level=error msg="Failed to connect to proxy. Response status: 400 - 400 Bad Request. Response body: Operation cannot be fulfilled on nodes.management.cattle.io "m-123123123": the object has been modified; please apply your changes to the latest version and try again" error="websocket: bad handshake"
time="2024-03-11T08:25:01Z" level=error msg="Remotedialer proxy error" error="websocket: bad handshake"
time="2024-03-11T08:25:11Z" level=info msg="Connecting to wss://172.16.16.9:10443/v3/connect/register with token 123123123123123"
time="2024-03-11T08:25:11Z" level=info msg="Connecting to proxy" url="wss://172.16.16.9:10443/v3/connect/register"
time="2024-03-11T08:25:11Z" level=info msg="Starting plan monitor, checking every 15 seconds"
time="2024-03-11T08:25:26Z" level=warning msg="Unable to read certificate kube-ca: open /etc/kubernetes/ssl/kube-ca.pem: no such file or directory"
time="2024-03-11T08:25:40Z" level=info msg="Option worker=true"
time="2024-03-11T08:25:40Z" level=info msg="Option requestedHostname=xc-worker1"
time="2024-03-11T08:25:40Z" level=info msg="Option customConfig=map[address:172.16.16.29 internalAddress: label:map roles:[worker] taints:]"
time="2024-03-11T08:25:40Z" level=info msg="Option etcd=false"
time="2024-03-11T08:25:40Z" level=info msg="Option controlPlane=false"
time="2024-03-11T08:25:42Z" level=info msg="Option controlPlane=false"
time="2024-03-11T08:25:42Z" level=info msg="Option worker=true"
time="2024-03-11T08:25:42Z" level=info msg="Option requestedHostname=xc-worker1"
time="2024-03-11T08:25:42Z" level=info msg="Option customConfig=map[address:172.16.16.29 internalAddress: label:map roles:[worker] taints:]"
time="2024-03-11T08:25:42Z" level=info msg="Option etcd=false"
time="2024-03-11T08:25:43Z" level=info msg="Plan monitor checking 120 seconds"
time="2024-03-11T08:27:45Z" level=info msg="Plan monitor checking 120 seconds"
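The `Option ...` lines record what the agent registered with, and they are easy to lose among the repeats. A Python sketch that collects the distinct values each option took (the helper name `collect_options` is hypothetical; the regex accepts both straight and typographic quotes because forum copies of logs often contain the latter). Applied to the log above it would show only `roles:[worker]` with `controlPlane=false`, and `requestedHostname` appearing as both `worker1` and `xc-worker1`.

```python
import re

# msg="Option key=value", tolerating typographic quotes from pasted logs.
_OPTION_RE = re.compile(r'msg=["\u201c]Option (?P<key>\w+)=(?P<value>[^"\u201d]*)["\u201d]')


def collect_options(lines: list[str]) -> dict[str, list[str]]:
    """Map each agent Option to the distinct values it took, in first-seen order."""
    seen: dict[str, list[str]] = {}
    for line in lines:
        match = _OPTION_RE.search(line)
        if match is None:
            continue  # not an Option line
        values = seen.setdefault(match.group("key"), [])
        if match.group("value") not in values:
            values.append(match.group("value"))
    return seen
```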