fleet-agent on the downstream cluster fails to start after upgrading Rancher from 2.7.5 to 2.7.6

Rancher Server Setup

  • Rancher version: 2.7.6
  • Installation option (Docker install/Helm Chart): Helm Chart install; the Local cluster is k3s v1.26.5+k3s1
    • If Helm Chart, provide the Local cluster type (RKE1, RKE2, k3s, EKS, etc.) and version:
  • Online or air-gapped deployment: online

Downstream Cluster Information

  • Kubernetes version: k8s-v1.25.3
  • Cluster Type (Local/Downstream):
    • If Downstream, what type of cluster? (Custom/Imported or Hosted, etc.): Imported

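User Information

  • What is the role of the logged-in user? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): Admin
    • If Custom, the custom permission set: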

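Host OS:
Ubuntu 22.04
Problem description:
After upgrading Rancher from 2.7.5 to 2.7.6, the fleet-agent pod on the downstream cluster fails to start and logs a panic. The fleet-agent on the local cluster starts normally with no errors. Neither the local cluster nor the downstream cluster has Fleet features in use.
Steps to reproduce:
I restarted the fleet-agent pod on the local cluster along the way and it stayed healthy every time. Restarting the fleet-agent pod on the downstream cluster produces the same error: the container fails 2 to 3 minutes after it starts.
Result:
The fleet-agent pod on the downstream cluster runs for 2 to 3 minutes, then fails and restarts automatically.
Expected result:
fleet-agent should start normally, with no errors and no automatic restarts.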
Screenshots:



Additional context:

Logs
I1026 09:54:14.631776       1 leaderelection.go:248] attempting to acquire leader lease cattle-fleet-system/fleet-agent-lock...
I1026 09:55:01.549676       1 leaderelection.go:258] successfully acquired lease cattle-fleet-system/fleet-agent-lock
time="2023-10-26T09:55:01Z" level=warning msg="Cannot find fleet-agent secret, running registration"
panic: assignment to entry in nil map

goroutine 100 [running]:
github.com/rancher/fleet/internal/cmd/agent/register.createAgentSecret({0x2b21500, 0xc0007aab40}, {0x0, 0x0}, {0x2b2dd90, 0xc0001afb10}, 0xc0001a3b80)
	/go/src/github.com/rancher/fleet/internal/cmd/agent/register/register.go:174 +0x3dc
github.com/rancher/fleet/internal/cmd/agent/register.runRegistration({0x2b21500, 0xc0007aab40}, {0x2b2dd90?, 0xc0001afb10?}, {0xc00005800a, 0x13}, {0x0, 0x0})
	/go/src/github.com/rancher/fleet/internal/cmd/agent/register/register.go:118 +0x1af
github.com/rancher/fleet/internal/cmd/agent/register.tryRegister({0x2b21500, 0xc0007aab40}, {0xc00005800a, 0x13}, {0x0, 0x0}, 0x1?)
	/go/src/github.com/rancher/fleet/internal/cmd/agent/register/register.go:81 +0x325
github.com/rancher/fleet/internal/cmd/agent/register.Register({0x2b21500, 0xc0007aab40}, {0xc00005800a, 0x13}, {0x0, 0x0}, 0x0?)
	/go/src/github.com/rancher/fleet/internal/cmd/agent/register/register.go:53 +0x97
github.com/rancher/fleet/internal/cmd/agent.start.func1({0x2b21500, 0xc0007aab40})
	/go/src/github.com/rancher/fleet/internal/cmd/agent/start.go:58 +0x9e
created by github.com/rancher/wrangler/pkg/leader.run.func1
	/go/pkg/mod/github.com/rancher/wrangler@v1.1.1/pkg/leader/leader.go:58 +0x98
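
For readers unfamiliar with the panic message in the trace above: in Go, "assignment to entry in nil map" is raised when code writes a key into a map that was declared but never initialized. The sketch below reproduces that runtime error in isolation and shows the usual guard. It is not the Fleet source; the `secret` type and the `setUnsafe`, `setSafe`, and `panicMessage` helpers are hypothetical names made up for illustration, and what register.go:174 actually writes to is an assumption the trace does not confirm.

```go
package main

import "fmt"

// secret is a hypothetical stand-in for any object that carries a map field.
// It is NOT the type used by fleet-agent.
type secret struct {
	Data map[string][]byte
}

// setUnsafe writes to the map without checking it.
// If s.Data is nil, Go panics with "assignment to entry in nil map".
func setUnsafe(s *secret, key string, value []byte) {
	s.Data[key] = value
}

// setSafe initializes the map on first use, then writes to it.
func setSafe(s *secret, key string, value []byte) {
	if s.Data == nil {
		s.Data = make(map[string][]byte)
	}
	s.Data[key] = value
}

// panicMessage runs fn and returns the panic message, or "" if fn did not panic.
func panicMessage(fn func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	fn()
	return ""
}

func main() {
	// A zero-value secret has a nil Data map, so the unguarded write panics.
	fmt.Println(panicMessage(func() { setUnsafe(&secret{}, "k", []byte("v")) }))

	// The guarded write succeeds on the same zero-value input.
	s := &secret{}
	setSafe(s, "k", []byte("v"))
	fmt.Println(string(s.Data["k"]))
}
```

Because the panic happens inside the registration path on every start, restarting the pod cannot clear it, which is consistent with the restart loop described in this report.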

See [BUG] Fleet-agent panics on k3s node driver cluster and doesn't recover · Issue #43012 · rancher/rancher · GitHub

Downgrade fleet to version 0.7.1.

Downgrading fleet to 0.7.1 resolved the issue described above.