Rancher 2.6.12: the bundled K3s keeps launching helm pods

Rancher Server setup

  • Rancher version: 2.6.12
  • Installation option (Docker install): started with docker run
  • Online or air-gapped deployment: online

User information

  • What is the role of the logged-in user? Administrator

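Host operating system:
CentOS 7.9
Problem description:
I started Rancher's bundled local K3s cluster with docker run rancher-server. After a while I noticed that the cluster keeps launching helm pods. It stops once it reaches 57 of them, possibly because some limit has been hit. All of these pods are in a failed state, as the screenshots show. The cause is that the image rancher/shell:v0.1.19 cannot be pulled, yet the image is present both on the host and inside the container. Is there any way to fix this? I think I ran into a similar problem with my earlier 2.5.x deployments, but some servers never show it, and I still do not know what triggers it.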

Screenshots:



Other context:

[details="Logs"]

2024/10/03 08:10:20 [ERROR] error syncing 'c-jsv2d': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:10:25 [ERROR] error syncing 'local': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:10:42 [ERROR] Failed to install system chart rancher-webhook: pod cattle-system/helm-operation-cj692 failed, watch closed
2024/10/03 08:11:44 [ERROR] error syncing 'cattle-fleet-system/helm-operation-qpdq2': handler helm-operation: Operation cannot be fulfilled on operations.catalog.cattle.io "helm-operation-qpdq2": StorageError: invalid object, Code: 4, Key: /registry/catalog.cattle.io/operations/cattle-fleet-system/helm-operation-qpdq2, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 3d77d443-df9c-4dec-b491-44f45b511e98, UID in object meta: , requeuing
2024/10/03 08:11:45 [ERROR] Failed to install system chart fleet-crd: pod cattle-system/helm-operation-bz2jv failed, watch closed
2024/10/03 08:12:20 [ERROR] error syncing 'c-jsv2d': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:12:25 [ERROR] error syncing 'local': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:12:47 [ERROR] Failed to install system chart fleet: pod cattle-system/helm-operation-rl78q failed, watch closed
2024/10/03 08:13:09 [ERROR] Error during subscribe websocket: close sent
2024/10/03 08:13:09 [ERROR] Error during subscribe websocket: close sent
2024/10/03 08:13:49 [ERROR] Failed to install system chart rancher-webhook: pod cattle-system/helm-operation-qpc74 failed, watch closed
2024/10/03 08:14:20 [ERROR] error syncing 'c-jsv2d': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:14:25 [ERROR] error syncing 'local': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:14:52 [ERROR] Failed to install system chart fleet-crd: pod cattle-system/helm-operation-f6b99 failed, watch closed
2024/10/03 08:15:54 [ERROR] Failed to install system chart fleet: pod cattle-system/helm-operation-rrch7 failed, watch closed
2024/10/03 08:16:20 [ERROR] error syncing 'c-jsv2d': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:16:25 [ERROR] error syncing 'local': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:16:56 [ERROR] Failed to install system chart rancher-webhook: pod cattle-system/helm-operation-k68rd failed, watch closed
2024/10/03 08:17:58 [ERROR] Failed to install system chart fleet-crd: pod cattle-system/helm-operation-9vf86 failed, watch closed
2024/10/03 08:18:20 [ERROR] error syncing 'c-jsv2d': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:18:25 [ERROR] error syncing 'local': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:19:00 [ERROR] Failed to install system chart fleet: pod cattle-system/helm-operation-wj2xd failed, watch closed
2024/10/03 08:20:02 [ERROR] Failed to install system chart rancher-webhook: pod cattle-system/helm-operation-4gjth failed, watch closed
2024/10/03 08:20:20 [ERROR] error syncing 'c-jsv2d': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:20:25 [ERROR] error syncing 'local': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:21:04 [ERROR] Failed to install system chart fleet-crd: pod cattle-system/helm-operation-57xrq failed, watch closed
2024/10/03 08:22:06 [ERROR] Failed to install system chart fleet: pod cattle-system/helm-operation-4xjn5 failed, watch closed
2024/10/03 08:22:20 [ERROR] error syncing 'c-jsv2d': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:22:25 [ERROR] error syncing 'local': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:23:08 [ERROR] Failed to install system chart rancher-webhook: pod cattle-system/helm-operation-df4jz failed, watch closed
2024/10/03 08:24:10 [ERROR] Failed to install system chart fleet-crd: pod cattle-system/helm-operation-7ql5c failed, watch closed
2024/10/03 08:24:20 [ERROR] error syncing 'c-jsv2d': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:24:25 [ERROR] error syncing 'local': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:25:12 [ERROR] Failed to install system chart fleet: pod cattle-system/helm-operation-7mvc6 failed, watch closed
2024/10/03 08:26:14 [ERROR] Failed to install system chart rancher-webhook: pod cattle-system/helm-operation-s6t2k failed, watch closed
2024/10/03 08:26:20 [ERROR] error syncing 'c-jsv2d': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:26:25 [ERROR] error syncing 'local': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:27:16 [ERROR] Failed to install system chart fleet-crd: pod cattle-system/helm-operation-dq7hs failed, watch closed
2024/10/03 08:28:18 [ERROR] Failed to install system chart fleet: pod cattle-system/helm-operation-7d2ps failed, watch closed
2024/10/03 08:28:20 [ERROR] error syncing 'c-jsv2d': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:28:25 [ERROR] error syncing 'local': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:29:21 [ERROR] Failed to install system chart rancher-webhook: pod cattle-system/helm-operation-rmzkf failed, watch closed
2024/10/03 08:30:20 [ERROR] error syncing 'c-jsv2d': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:30:23 [ERROR] Failed to install system chart fleet-crd: pod cattle-system/helm-operation-w82bv failed, watch closed
2024/10/03 08:30:25 [ERROR] error syncing 'local': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:31:25 [ERROR] Failed to install system chart fleet: pod cattle-system/helm-operation-4q4zx failed, watch closed
2024/10/03 08:32:20 [ERROR] error syncing 'c-jsv2d': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:32:26 [ERROR] error syncing 'local': handler global-admin-cluster-sync: failed to get GlobalRoleBinding for 'globaladmin-user-8wrnn': %!!(MISSING)w(<nil>), requeuing
2024/10/03 08:32:27 [ERROR] Failed to install system chart rancher-webhook: pod cattle-system/helm-operation-55zgg failed, watch closed
2024/10/03 08:33:29 [ERROR] Failed to install system chart fleet-crd: pod cattle-system/helm-operation-kq4sk failed, watch closed
[/details]
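
To see at a glance which system charts are stuck in the retry loop, the log above can be tallied per chart. This is only a minimal sketch: the function name `count_failed_charts` is made up for illustration, and the container name `rancher` in the usage comment is an assumption.

```shell
#!/bin/sh
# Tally "Failed to install system chart" errors per chart.
# Reads Rancher server log lines on stdin and prints "<count> <chart>" rows,
# most frequent first. (count_failed_charts is a hypothetical helper name.)
count_failed_charts() {
    # Keep only the chart name that follows the fixed error prefix,
    # then count how often each name occurs.
    sed -n 's/.*Failed to install system chart \([^:]*\):.*/\1/p' \
        | sort | uniq -c | sort -rn
}

# Example usage (the container name "rancher" is an assumption):
#   docker logs rancher 2>&1 | count_failed_charts
```

In the excerpt above the same three charts (rancher-webhook, fleet-crd, fleet) fail in rotation roughly once a minute, which matches the steady growth of failed helm-operation pods.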


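Here is what is going on: your problem is clearly caused by the image not being pulled.

You can see the image on the host, but your Rancher was started as a container via docker run. That container embeds a K3s cluster, which is what Rancher runs on, so the images used by the K3s inside the container are isolated from the images on the host.

To fix this, load the missing image into the K3s cluster inside the Docker container: save the image as a tar archive and place it in /var/lib/rancher/k3s/agent/images/, which should be enough. See: https://docs.k3s.io/installation/airgap#prepare-the-images-directory-and-airgap-image-tarball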
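
The steps described above could look roughly like the sketch below. It is not a verified procedure for this setup: the container name `rancher`, the scratch path `/tmp/rancher-shell.tar` and the `run` dry-run helper are all assumptions made for illustration. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: copy an image that exists on the host into the K3s images
# directory inside the Rancher container.
RANCHER_CONTAINER="${RANCHER_CONTAINER:-rancher}"  # assumption: your container name or ID
IMAGE="${IMAGE:-rancher/shell:v0.1.19}"            # image named in this thread
TARBALL="${TARBALL:-/tmp/rancher-shell.tar}"       # hypothetical scratch path on the host
IMAGES_DIR=/var/lib/rancher/k3s/agent/images       # directory named in the reply above
DRY_RUN="${DRY_RUN:-1}"                            # 1 = print commands only

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

load_image_into_k3s() {
    # Export the image from the host's Docker daemon.
    run docker save -o "$TARBALL" "$IMAGE"
    # Make sure the target directory exists inside the container.
    run docker exec "$RANCHER_CONTAINER" mkdir -p "$IMAGES_DIR"
    # Copy the tarball into the container.
    run docker cp "$TARBALL" "$RANCHER_CONTAINER:$IMAGES_DIR/"
    # K3s imports tarballs from this directory when it starts,
    # so restart the container to trigger the import.
    run docker restart "$RANCHER_CONTAINER"
}

load_image_into_k3s
```

Note that the tarball lives in the container's filesystem unless /var/lib/rancher is mounted from the host, so it would need to be copied again if the container is recreated.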

Container: helm
The image can now be obtained this way, but the helm container inside the pod fails on startup. The log is as follows:

helm upgrade --force-adopt=true --history-max=5 --install=true --namespace=cattle-fleet-system --reset-values=true --timeout=5m0s --values=/home/shell/helm/values-fleet-100.2.3-up0.5.3.yaml --version=100.2.3+up0.5.3 --wait=true fleet /home/shell/helm/fleet-100.2.3-up0.5.3.tgz

Thu, Oct 10 2024 5:37:59 pm Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress

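Check whether this is the same problem: what error are the many failing helm-operation pods reporting?

It is the same problem.
Under Installed Apps there is a fleet app whose status is pending upgrade.

The error log is as follows: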
helm upgrade --history-max=5 --install=true --namespace=cattle-fleet-system --timeout=10m0s --values=/home/shell/helm/values-fleet-crd-100.2.3-up0.5.3.yaml --version=100.2.3+up0.5.3 --wait=true fleet-crd /home/shell/helm/fleet-crd-100.2.3-up0.5.3.tgz

Mon, Oct 14 2024 3:35:59 pm checking 12 resources for changes

Mon, Oct 14 2024 3:35:59 pm Patch CustomResourceDefinition "bundles.fleet.cattle.io" in namespace

Mon, Oct 14 2024 3:35:59 pm Patch CustomResourceDefinition "bundledeployments.fleet.cattle.io" in namespace

Mon, Oct 14 2024 3:35:59 pm Patch CustomResourceDefinition "bundlenamespacemappings.fleet.cattle.io" in namespace

Mon, Oct 14 2024 3:35:59 pm Patch CustomResourceDefinition "clustergroups.fleet.cattle.io" in namespace

Mon, Oct 14 2024 3:35:59 pm Patch CustomResourceDefinition "clusters.fleet.cattle.io" in namespace

Mon, Oct 14 2024 3:35:59 pm Patch CustomResourceDefinition "clusterregistrationtokens.fleet.cattle.io" in namespace

Mon, Oct 14 2024 3:35:59 pm Patch CustomResourceDefinition "gitrepos.fleet.cattle.io" in namespace

Mon, Oct 14 2024 3:35:59 pm Patch CustomResourceDefinition "clusterregistrations.fleet.cattle.io" in namespace

Mon, Oct 14 2024 3:35:59 pm Patch CustomResourceDefinition "gitreporestrictions.fleet.cattle.io" in namespace

Mon, Oct 14 2024 3:35:59 pm Patch CustomResourceDefinition "contents.fleet.cattle.io" in namespace

Mon, Oct 14 2024 3:35:59 pm Patch CustomResourceDefinition "imagescans.fleet.cattle.io" in namespace

Mon, Oct 14 2024 3:35:59 pm Patch CustomResourceDefinition "gitjobs.gitjob.cattle.io" in namespace

Mon, Oct 14 2024 3:35:59 pm beginning wait for 12 resources with timeout of 10m0s

Mon, Oct 14 2024 3:35:59 pm Release "fleet-crd" has been upgraded. Happy Helming!

Mon, Oct 14 2024 3:35:59 pm NAME: fleet-crd

Mon, Oct 14 2024 3:35:59 pm LAST DEPLOYED: Mon Oct 14 07:35:58 2024

Mon, Oct 14 2024 3:35:59 pm NAMESPACE: cattle-fleet-system

Mon, Oct 14 2024 3:35:59 pm STATUS: deployed

Mon, Oct 14 2024 3:35:59 pm REVISION: 3

Mon, Oct 14 2024 3:35:59 pm TEST SUITE: None

Mon, Oct 14 2024 3:35:59 pm

Mon, Oct 14 2024 3:35:59 pm ---------------------------------------------------------------------

Mon, Oct 14 2024 3:35:59 pm SUCCESS: helm upgrade --history-max=5 --install=true --namespace=cattle-fleet-system --timeout=10m0s --values=/home/shell/helm/values-fleet-crd-100.2.3-up0.5.3.yaml --version=100.2.3+up0.5.3 --wait=true fleet-crd /home/shell/helm/fleet-crd-100.2.3-up0.5.3.tgz

Mon, Oct 14 2024 3:35:59 pm ---------------------------------------------------------------------

Mon, Oct 14 2024 3:35:59 pm helm upgrade --history-max=5 --install=true --namespace=cattle-fleet-system --timeout=10m0s --values=/home/shell/helm/values-fleet-100.2.3-up0.5.3.yaml --version=100.2.3+up0.5.3 --wait=true fleet /home/shell/helm/fleet-100.2.3-up0.5.3.tgz

Mon, Oct 14 2024 3:35:59 pm Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress
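
Helm reports "another operation (install/upgrade/rollback) is in progress" when the latest revision of a release is recorded in a pending state, which fits the "pending upgrade" status seen for fleet. Helm 3 stores each revision as a Secret labelled with owner=helm, name=<release> and status=<status>, so the stuck record can be located with a label selector. The sketch below only lists such records. The container name `rancher`, the availability of kubectl inside that container, and the helper names are assumptions; by default it just prints the command.

```shell
#!/bin/sh
# Sketch: list Helm release revisions of "fleet" that are stuck in a
# pending state, by querying the Secrets Helm 3 uses as release storage.
RANCHER_CONTAINER="${RANCHER_CONTAINER:-rancher}"  # assumption: your container name or ID
NAMESPACE="${NAMESPACE:-cattle-fleet-system}"      # namespace from the helm command above
RELEASE="${RELEASE:-fleet}"                        # release reported as pending upgrade
DRY_RUN="${DRY_RUN:-1}"                            # 1 = print commands only

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Build a label selector that matches pending revisions of one release.
pending_selector() {
    printf 'owner=helm,name=%s,status in (pending-install,pending-upgrade,pending-rollback)' "$1"
}

list_pending_revisions() {
    # Assumes kubectl is available inside the Rancher container.
    run docker exec "$RANCHER_CONTAINER" \
        kubectl get secrets -n "$NAMESPACE" -l "$(pending_selector "$RELEASE")"
}

list_pending_revisions
```

A common way out of this state is to roll the release back to its last deployed revision with helm rollback, but whether that is appropriate here is not confirmed in this thread.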