Authorized Cluster Endpoint (ACE) not working

Rancher Server Setup

  • Rancher version: v2.8.3
  • Installation option (Docker install/Helm Chart):
    • If Helm Chart, the Local cluster type (RKE1, RKE2, k3s, EKS, etc.) and version: RKE2
  • Online or air-gapped deployment: online

Downstream Cluster Information

  • Kubernetes version: v1.26.15+rke2r1
  • Cluster Type (Local/Downstream): Downstream
    • If Downstream, what type of cluster? (Custom/Imported or Hosted, etc.): Custom

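User Information

  • What is the role of the logged-in user? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): Admin
    • If Custom, the custom permission set:

Host operating system: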

# cat /etc/redhat-release 
Rocky Linux release 8.9 (Green Obsidian)

# uname -a
Linux sg-dev-yk-k8s-master-01-rke2 5.4.195-1.el8.elrepo.x86_64 #1 SMP Tue May 17 15:52:40 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

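Problem description:

After enabling the Authorized Cluster Endpoint, access through the generated per-node contexts fails as follows: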

CURRENT   NAME                                  CLUSTER                               AUTHINFO   NAMESPACE
          yk-dev                                yk-dev                                yk-dev     
          yk-dev-sg-dev-yk-k8s-master-01-rke2   yk-dev-sg-dev-yk-k8s-master-01-rke2   yk-dev     
          yk-dev-sg-dev-yk-k8s-master-02-rke2   yk-dev-sg-dev-yk-k8s-master-02-rke2   yk-dev     
*         yk-dev-sg-dev-yk-k8s-master-03-rke2   yk-dev-sg-dev-yk-k8s-master-03-rke2   yk-dev

[root@sg-dev-yk-k8s-master-01-rke2 spadm]# /var/lib/rancher/rke2/bin/kubectl --kubeconfig yk-dev.yaml --context yk-dev-sg-dev-yk-k8s-master-03-rke2 get node
E0417 12:13:24.416990 1133467 memcache.go:265] couldn't get current server API group list: the server has asked for the client to provide credentials
E0417 12:13:24.419895 1133467 memcache.go:265] couldn't get current server API group list: the server has asked for the client to provide credentials
E0417 12:13:24.422547 1133467 memcache.go:265] couldn't get current server API group list: the server has asked for the client to provide credentials
E0417 12:13:24.425318 1133467 memcache.go:265] couldn't get current server API group list: the server has asked for the client to provide credentials
E0417 12:13:24.428502 1133467 memcache.go:265] couldn't get current server API group list: the server has asked for the client to provide credentials
error: You must be logged in to the server (the server has asked for the client to provide credentials)

The output above is without an FQDN configured.

With an FQDN configured and resolving to one of the master nodes:

[root@sg-dev-yk-k8s-master-01-rke2 spadm]# /var/lib/rancher/rke2/bin/kubectl --kubeconfig yk-dev-fqdn.yaml config get-contexts
CURRENT   NAME          CLUSTER       AUTHINFO   NAMESPACE
          yk-dev        yk-dev        yk-dev     
*         yk-dev-fqdn   yk-dev-fqdn   yk-dev     

[root@sg-dev-yk-k8s-master-01-rke2 spadm]# /var/lib/rancher/rke2/bin/kubectl --kubeconfig yk-dev.yaml get node
E0417 11:25:19.482585 1080477 memcache.go:265] couldn't get current server API group list: Get "https://rancher-ace-yk-dev.ab.aaa/api?timeout=32s": dial tcp 10.65.23.14:443: connect: connection refused
E0417 11:25:19.490030 1080477 memcache.go:265] couldn't get current server API group list: Get "https://rancher-ace-yk-dev.ab.aaa/api?timeout=32s": dial tcp 10.65.23.14:443: connect: connection refused
E0417 11:25:19.496368 1080477 memcache.go:265] couldn't get current server API group list: Get "https://rancher-ace-yk-dev.ab.aaa/api?timeout=32s": dial tcp 10.65.23.14:443: connect: connection refused
E0417 11:25:19.541515 1080477 memcache.go:265] couldn't get current server API group list: Get "https://rancher-ace-yk-dev.ab.aaa/api?timeout=32s": dial tcp 10.65.23.14:443: connect: connection refused
E0417 11:25:19.547848 1080477 memcache.go:265] couldn't get current server API group list: Get "https://rancher-ace-yk-dev.ab.aaa/api?timeout=32s": dial tcp 10.65.23.14:443: connect: connection refused
The connection to the server rancher-ace-yk-dev.ab.aaa was refused - did you specify the right host or port?
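
The "connection refused" error above means nothing accepted the TCP connection on 10.65.23.14:443. The server URL in the kubeconfig carries no explicit port, so kubectl dials 443, while an RKE2 kube-apiserver listens on 6443 by default. An FQDN that resolves straight to a master node, with no load balancer in between, may therefore simply be hitting a closed port. The following is a minimal Python sketch for checking this from the client machine. `port_is_open` and `probe` are hypothetical helper names, and which port the FQDN should front depends on how the endpoint is set up, so treat this as a diagnostic aid rather than a fix.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS resolution failures.
        return False


def probe(host: str, ports=(443, 6443)) -> dict:
    """Check each candidate port and return {port: reachable}."""
    return {port: port_is_open(host, port) for port in ports}
```

Calling `probe("rancher-ace-yk-dev.ab.aaa")` from the machine that runs kubectl shows which of the two ports accepts connections. If only 6443 is reachable, the FQDN is reaching the node but not on the port the kubeconfig uses.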

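Steps to reproduce:

Result:

Expected result:

After enabling the Authorized Cluster Endpoint (ACE), the cluster should be reachable through the FQDN or through the generated per-master-node contexts (mainly as a fallback for when Rancher is unavailable).

Screenshots:

Additional context:

This is the content of the kube-api-authn-webhook.yaml file that was generated automatically on each master node: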

# cat /var/lib/rancher/rke2/kube-api-authn-webhook.yaml 

apiVersion: v1
kind: Config
clusters:
- name: Default
  cluster:
    insecure-skip-tls-verify: true
    server: http://127.0.0.1:6440/v1/authenticate
users:
- name: Default
  user:
    insecure-skip-tls-verify: true
current-context: webhook
contexts:
- name: webhook
  context:
    user: Default
    cluster: Default
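
The file above points kube-apiserver at the local kube-api-auth webhook. With Kubernetes webhook token authentication, the API server POSTs a TokenReview object containing the bearer token to that URL and reads `status.authenticated` from the reply. Under that assumption, the webhook can be exercised directly on a master node, bypassing kubectl. The sketch below is illustrative only: `build_token_review` and `review_token` are hypothetical names, the `v1beta1` API version is an assumption (kube-api-auth may expect a different one), and the token must be taken from the ACE kubeconfig.

```python
import json
import urllib.request


def build_token_review(token: str,
                       api_version: str = "authentication.k8s.io/v1beta1") -> dict:
    """Build the TokenReview body that a webhook authenticator receives."""
    return {"apiVersion": api_version, "kind": "TokenReview",
            "spec": {"token": token}}


def review_token(url: str, token: str, timeout: float = 5.0) -> bool:
    """POST a TokenReview to the webhook and return status.authenticated."""
    body = json.dumps(build_token_review(token)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/json"}, method="POST")
    # The webhook listens on loopback, so ignore any proxy environment variables.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req, timeout=timeout) as resp:
        reply = json.load(resp)
    return bool(reply.get("status", {}).get("authenticated", False))
```

On a master node, `review_token("http://127.0.0.1:6440/v1/authenticate", token)` returning `False` (or raising an HTTP error) for a token that Rancher issued would suggest the rejection happens inside kube-api-auth rather than in kubectl or the kubeconfig.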

The kube-api-auth service also appears to be running normally:

Below is a short excerpt of the kube-api-auth log:

Log
W0420 02:38:12.700964       1 reflector.go:533] pkg/mod/github.com/rancher/client-go@v1.27.4-rancher1/tools/cache/reflector.go:231: failed to list *v3.ClusterAuthToken: the server could not find the requested resource (get clusterauthtokens.meta.k8s.io)
2024-04-20T10:38:12.701353462+08:00 E0420 02:38:12.701033       1 reflector.go:148] pkg/mod/github.com/rancher/client-go@v1.27.4-rancher1/tools/cache/reflector.go:231: Failed to watch *v3.ClusterAuthToken: failed to list *v3.ClusterAuthToken: the server could not find the requested resource (get clusterauthtokens.meta.k8s.io)
W0420 02:38:35.283078       1 reflector.go:533] pkg/mod/github.com/rancher/client-go@v1.27.4-rancher1/tools/cache/reflector.go:231: failed to list *v3.ClusterUserAttribute: the server could not find the requested resource (get clusteruserattributes.meta.k8s.io)
2024-04-20T10:38:35.283440431+08:00 E0420 02:38:35.283161       1 reflector.go:148] pkg/mod/github.com/rancher/client-go@v1.27.4-rancher1/tools/cache/reflector.go:231: Failed to watch *v3.ClusterUserAttribute: failed to list *v3.ClusterUserAttribute: the server could not find the requested resource (get clusteruserattributes.meta.k8s.io)
W0420 02:38:53.693848       1 reflector.go:533] pkg/mod/github.com/rancher/client-go@v1.27.4-rancher1/tools/cache/reflector.go:231: failed to list *v3.ClusterAuthToken: the server could not find the requested resource (get clusterauthtokens.meta.k8s.io)
2024-04-20T10:38:53.694139747+08:00 E0420 02:38:53.693907       1 reflector.go:148] pkg/mod/github.com/rancher/client-go@v1.27.4-rancher1/tools/cache/reflector.go:231: Failed to watch *v3.ClusterAuthToken: failed to list *v3.ClusterAuthToken: the server could not find the requested resource (get clusterauthtokens.meta.k8s.io)
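
The excerpt above repeats the same two failures: kube-api-auth cannot list `*v3.ClusterAuthToken` or `*v3.ClusterUserAttribute` because the server "could not find the requested resource". If kube-api-auth relies on those objects to validate ACE tokens, that would be consistent with the credentials error seen from kubectl, although the log alone does not prove it. When the log is long, a small summary makes the pattern easier to see. This is a minimal Python sketch that only assumes the line format shown above; `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches e.g. "failed to list *v3.ClusterAuthToken: <reason> (get <resource>)"
FAILED_LIST = re.compile(
    r"[Ff]ailed to list \*(?P<kind>[\w.]+): (?P<reason>.+?) "
    r"\(get (?P<resource>[\w.]+)\)"
)


def summarize(lines) -> Counter:
    """Count list failures per (kind, resource) across reflector log lines."""
    counts = Counter()
    for line in lines:
        match = FAILED_LIST.search(line)
        if match:
            counts[(match.group("kind"), match.group("resource"))] += 1
    return counts
```

Feeding the saved log lines to `summarize` gives one counter entry per failing resource type, which shows at a glance whether only these two kinds are affected.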

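Is nobody here???

Was the cluster created successfully first and then modified to enable the Authorized Cluster Endpoint?

I have tried several times. First I enabled ACE when creating the cluster and ran into the problem: the fqdn context in the kubeconfig could not retrieve cluster information. Then I changed the ACE configuration, removed the FQDN, and switched to the per-node contexts, and hit the same problem.
@ksd

I will try to reproduce it later.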

:+1: :+1: :+1: :+1:

If there is a solution, please share it here as well.