Continuous Delivery problem in 2.6.7

Rancher Server setup

  • Rancher version: 2.6.7
  • Installation option (Docker install/Helm Chart): Helm Chart
    • If installed with the Helm Chart, provide the Local cluster type (RKE1, RKE2, k3s, EKS, etc.) and version: RKE1
  • Online or air-gapped deployment: online

Downstream cluster information

  • Kubernetes version: v1.20.15
  • Cluster Type (Local/Downstream): Downstream
    • If Downstream, what type of cluster? (Custom/Imported or Hosted, etc.): Custom

User information

  • What is the role of the logged-in user? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): Admin

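Host OS:
CentOS 7.9
Problem description:
I added a new workload to the git repository. Because of a bug in the application, the workload's service fails to start, and since then no later commit to git can be deployed to the downstream cluster. Deleting the faulty workload from both the downstream cluster and git does not clear the error either. The error log points to a sync problem.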
Steps to reproduce:

Result:

Expected result:

Screenshots:

Additional context:

Logs
I0905 08:37:59.928409 1 leaderelection.go:248] attempting to acquire leader lease cattle-fleet-system/fleet-agent-lock...
2022-09-05 16:37:59 I0905 08:37:59.977327 1 leaderelection.go:258] successfully acquired lease cattle-fleet-system/fleet-agent-lock
2022-09-05 16:38:00 time="2022-09-05T08:38:00Z" level=info msg="Starting /v1, Kind=ConfigMap controller"
2022-09-05 16:38:00 time="2022-09-05T08:38:00Z" level=info msg="Starting /v1, Kind=ServiceAccount controller"
2022-09-05 16:38:00 time="2022-09-05T08:38:00Z" level=info msg="Starting /v1, Kind=Secret controller"
2022-09-05 16:38:00 time="2022-09-05T08:38:00Z" level=info msg="Starting /v1, Kind=Node controller"
2022-09-05 16:38:00 time="2022-09-05T08:38:00Z" level=info msg="Starting fleet.cattle.io/v1alpha1, Kind=BundleDeployment controller"
2022-09-05 16:38:00 time="2022-09-05T08:38:00Z" level=info msg="getting history for release fleet-agent-c-qv"
2022-09-05 16:38:00 time="2022-09-05T08:38:00Z" level=info msg="getting history for release fleet-agent-c-qv"
2022-09-05 16:38:00 time="2022-09-05T08:38:00Z" level=info msg="preparing upgrade for manifests"
2022-09-05 16:38:01 time="2022-09-05T08:38:01Z" level=info msg="getting history for release manifests"
2022-09-05 16:38:01 time="2022-09-05T08:38:01Z" level=info msg="getting history for release fleet-agent-c-qv"
2022-09-05 16:38:01 time="2022-09-05T08:38:01Z" level=error msg="error syncing 'cluster-dev-c-qv-87dadc322f7b/manifests': handler bundle-deploy: another operation (install/upgrade/rollback) is in progress, requeuing"
2022-09-05 16:38:02 time="2022-09-05T08:38:02Z" level=info msg="getting history for release fleet-agent-c-qv"
2022-09-05 16:38:02 time="2022-09-05T08:38:02Z" level=info msg="getting history for release fleet-agent-c-qv"
2022-09-05 16:38:02 time="2022-09-05T08:38:02Z" level=info msg="preparing upgrade for manifests"
2022-09-05 16:38:02 time="2022-09-05T08:38:02Z" level=info msg="getting history for release manifests"
2022-09-05 16:38:02 time="2022-09-05T08:38:02Z" level=info msg="getting history for release fleet-agent-c-qv"
2022-09-05 16:38:02 time="2022-09-05T08:38:02Z" level=error msg="error syncing 'cluster-dev-c-qv-87dadc322f7b/manifests': handler bundle-deploy: another operation (install/upgrade/rollback) is in progress, requeuing"
2022-09-05 16:38:03 time="2022-09-05T08:38:03Z" level=info msg="getting history for release fleet-agent-c-qv"
2022-09-05 16:38:03 time="2022-09-05T08:38:03Z" level=info msg="getting history for release fleet-agent-c-qv"
2022-09-05 16:38:03 time="2022-09-05T08:38:03Z" level=info msg="preparing upgrade for manifests"
2022-09-05 16:38:03 time="2022-09-05T08:38:03Z" level=info msg="getting history for release manifests"
2022-09-05 16:38:04 time="2022-09-05T08:38:04Z" level=info msg="getting history for release fleet-agent-c-qv"
2022-09-05 16:38:04 time="2022-09-05T08:38:04Z" level=error msg="error syncing 'cluster-dev-c-qv-87dadc322f7b/manifests': handler bundle-deploy: another operation (install/upgrade/rollback) is in progress, requeuing"
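The same `error syncing` line repeats about once a second. To confirm that only one BundleDeployment is stuck, a small script can group the error lines by key. This is a minimal sketch: the regular expression is only an assumption fitted to the log lines above, and `summarize_sync_errors` is a hypothetical helper, not part of Fleet.

```python
import re
from collections import Counter

# Assumed shape of the fleet-agent error lines shown in this post.
ERROR_RE = re.compile(
    r"level=error msg=\"error syncing '(?P<key>[^']+)': "
    r"handler (?P<handler>[^:]+): (?P<reason>[^\"]+)\""
)


def summarize_sync_errors(lines):
    """Count error lines per (bundle deployment key, handler, reason)."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[(match.group("key"), match.group("handler"), match.group("reason"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: kubectl logs ... | python summarize.py
    for (key, handler, reason), n in summarize_sync_errors(sys.stdin).items():
        print(f"{n}x {key} [{handler}]: {reason}")
```

In the excerpt above, every error line has the key `cluster-dev-c-qv-87dadc322f7b/manifests` and the handler `bundle-deploy`.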

I'm not sure how to reproduce this, but try Force Update in the UI.

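I have tried Force Update on both the Git Repos and the Cluster, and I also deleted the bundle and bundleDeployment by hand; it still doesn't work. I'll try recreating the git repos.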

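I see the same problem on the local cluster. I noticed it after upgrading from Rancher 2.6.6 to 2.6.7. The local cluster runs nothing business-related, so the error should have no impact there, and its fleet-agent log shows the same error. I suspect the problem lies somewhere between fleet-agent and fleet-controller. Is there a fix? Thanks!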


Try this solution: fleet-agent in state `ErrApplied` with the following reason: `another operation (install/upgrade/rollback) is in progress` · Issue #637 · rancher/fleet · GitHub

I deleted the corresponding Helm secret, and it is now running normally.
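For anyone who hits the same error: Helm 3 stores each release revision as a Secret named `sh.helm.release.v1.<release>.v<revision>` with a `status` label. The "another operation is in progress" message appears when a revision is left in a `pending-*` state. The sketch below shows how the secret to delete could be picked out. It is a rough illustration, not the exact steps used in this thread: `stuck_release_secrets` is a hypothetical helper and the sample data is made up. The release name `manifests` comes from the log above.

```python
import re

# Helm statuses that mean an install/upgrade/rollback never finished.
PENDING_STATUSES = {"pending-install", "pending-upgrade", "pending-rollback"}

# Helm 3 release secrets are named sh.helm.release.v1.<release>.v<revision>.
SECRET_NAME_RE = re.compile(r"^sh\.helm\.release\.v1\.(?P<release>.+)\.v(?P<rev>\d+)$")


def stuck_release_secrets(secrets, release):
    """Return names of pending-* release secrets for `release`, oldest first.

    `secrets` is an iterable of (secret_name, status_label) pairs, e.g. built
    from the output of:
        kubectl get secrets -A -l owner=helm,name=<release> --show-labels
    """
    stuck = []
    for name, status in secrets:
        match = SECRET_NAME_RE.match(name)
        if match and match.group("release") == release and status in PENDING_STATUSES:
            stuck.append((int(match.group("rev")), name))
    return [name for _, name in sorted(stuck)]


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    example = [
        ("sh.helm.release.v1.manifests.v1", "superseded"),
        ("sh.helm.release.v1.manifests.v2", "deployed"),
        ("sh.helm.release.v1.manifests.v3", "pending-upgrade"),
        ("sh.helm.release.v1.fleet-agent-c-qv.v1", "deployed"),
    ]
    for name in stuck_release_secrets(example, "manifests"):
        print(name)
```

Once the pending secret is deleted with `kubectl delete secret <name> -n <namespace>`, the agent's next retry should no longer be blocked by the unfinished revision.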