Error when upgrading Rancher to v2.6.8

Rancher Server setup

  • Rancher version: v2.6.7
  • Installation option (Docker install/Helm Chart):
    • If installed via Helm Chart, provide the type (RKE1, RKE2, k3s, EKS, etc.) and version of the local cluster:
      helm v3.10.0
      rke1 v1.3.15
  • Online or air-gapped deployment:

Downstream cluster information

  • Kubernetes version:
  • Cluster Type (Local/Downstream):
    • If Downstream, what type of cluster is it? (Custom/Imported/Hosted, etc.):

User information

  • What is the role of the logged-in user? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom):
    • If Custom, the custom permission set:

Host operating system:

Problem description:
Upgrading Rancher from v2.6.7 to v2.6.8 via Helm fails with an error.
Steps to reproduce:
helm upgrade rancher rancher-latest/rancher \
  --namespace cattle-system \
  -f values.yaml \
  --version=v2.6.8
Result:
Error: UPGRADE FAILED: create: failed to create: Internal error occurred: failed calling webhook "rancher.cattle.io": failed to call webhook: Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/mutation?timeout=10s": context deadline exceeded
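The error says the Kubernetes API server gave up (after the 10s timeout in the URL) while posting to the rancher-webhook Service. One quick check is whether that Service has any ready endpoints at all. The sketch below assumes the Service name and namespace taken from the webhook URL (rancher-webhook in cattle-system); the function names are hypothetical, and the thread never confirmed this as the cause.

```shell
# Sketch: report whether the rancher-webhook Service has ready endpoints.
# Service name/namespace are taken from the webhook URL in the helm error.
webhook_endpoints() {
  kubectl -n cattle-system get endpoints rancher-webhook \
    -o jsonpath='{.subsets[*].addresses[*].ip}'
}

check_webhook_endpoints() {
  local ips
  ips=$(webhook_endpoints)
  if [ -z "$ips" ]; then
    echo "rancher-webhook has no ready endpoints"
    return 1
  fi
  echo "rancher-webhook endpoints: $ips"
}
```

If endpoints are listed and the pod is Running, the timeout points more toward reachability between the API server and the webhook pod than toward the webhook itself.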

Expected result:

Screenshots:

Other context:

Logs


On the local cluster, run kubectl get pods -A to check whether all pods have started. If any have not, look at their logs.
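That check can be scripted roughly as follows. This is a sketch that assumes the default column layout of `kubectl get pods -A --no-headers` (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE); the function name unhealthy_pods is hypothetical.

```shell
# Sketch: list pods that are not Running/Completed, or Running but not fully ready.
unhealthy_pods() {
  kubectl get pods -A --no-headers | awk '
    {
      split($3, r, "/")            # READY column, e.g. "1/1"
      if ($4 == "Completed") next  # finished Job pods are expected
      if ($4 != "Running" || r[1] != r[2]) print $1 "/" $2 " " $3 " " $4
    }'
}
```

An empty result means every pod is Running with all containers ready (or has Completed).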


All pods are running normally. Previous upgrades done the same way all went fine; this is the first time I have hit this problem.

Check the logs of the rancher-webhook workload in the cattle-system namespace.

The logs all look like this:

E0928 09:54:48.517677 1 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
time="2022-09-28T18:40:08Z" level=info msg="Sleeping for 15 seconds then applying webhook config"
time="2022-09-28T18:40:08Z" level=info msg="Updating TLS secret for cattle-webhook-tls (count: 1): map[listener.cattle.io/cn-rancher-webhook.cattle-system.svc:rancher-webhook.cattle-system.svc listener.cattle.io/fingerprint:SHA1=47D6FC44FC665B092C2786D0D446494575EA04FD]"
E0928 18:40:19.228457 1 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0928 18:40:19.275936 1 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0928 18:40:29.243909 1 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
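Every "failed to sync schemas" line says the aggregated API metrics.k8s.io/v1beta1 is not answering, which usually means the APIService behind it (normally registered by metrics-server) is unavailable. Whether this is related to the webhook timeout was not established in this thread; it is simply the one error the logs do show. A sketch for checking it, assuming the standard APIService name v1beta1.metrics.k8s.io (the function names are hypothetical):

```shell
# Sketch: read the Available condition of the metrics.k8s.io APIService.
metrics_api_available() {
  kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}'
}

check_metrics_api() {
  case "$(metrics_api_available)" in
    True) echo "metrics.k8s.io/v1beta1 is available" ;;
    *)    echo "metrics.k8s.io/v1beta1 is NOT available; check metrics-server"; return 1 ;;
  esac
}
```

If it reports NOT available, `kubectl describe apiservice v1beta1.metrics.k8s.io` shows the reason in the condition message.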

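I have a long-running environment where I routinely perform similar upgrades, and I have never hit this problem. There may be something particular about your environment, but I have no specific leads.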

You could try redeploying rancher-webhook and then running helm upgrade again.
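A minimal sketch of that sequence, assuming the Deployment is named rancher-webhook (inferred from the pod name rancher-webhook-576c5b6859-cdp5h shown later in the thread) and reusing the helm arguments from the reproduction steps. The function name and the 180s timeout are arbitrary choices.

```shell
# Sketch: restart rancher-webhook, wait for the rollout, then retry the upgrade.
redeploy_webhook_and_upgrade() {
  kubectl -n cattle-system rollout restart deployment/rancher-webhook || return 1
  # Do not retry helm until the new webhook pod is ready
  kubectl -n cattle-system rollout status deployment/rancher-webhook --timeout=180s || return 1
  helm upgrade rancher rancher-latest/rancher \
    --namespace cattle-system \
    -f values.yaml \
    --version=v2.6.8
}
```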

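Alternatively, if you are comfortable with Kubernetes admission webhooks, you could delete the webhook configuration first; it is regenerated automatically when you redeploy the rancher-webhook workload.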
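A sketch of that cleanup, under a loud assumption: that both the mutating and the validating configuration are named rancher.cattle.io, matching the webhook name in the helm error. Check with `kubectl get mutatingwebhookconfiguration,validatingwebhookconfiguration` before deleting anything. Rancher's admission checks are off until the configuration is re-applied. The function name is hypothetical.

```shell
# Sketch: delete Rancher's admission webhook configurations, then restart
# rancher-webhook so it re-applies them (its log shows
# "Sleeping for 15 seconds then applying webhook config" on startup).
# ASSUMPTION: both configurations are named rancher.cattle.io.
reset_rancher_webhook_config() {
  for kind in mutatingwebhookconfiguration validatingwebhookconfiguration; do
    kubectl delete "$kind" rancher.cattle.io --ignore-not-found || return 1
  done
  kubectl -n cattle-system rollout restart deployment/rancher-webhook
}
```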

I have tried many approaches, but I still get the same error and cannot upgrade.


[root@acc-rha-01 rha]# kubectl logs -f rancher-webhook-576c5b6859-cdp5h -n cattle-system
time="2022-10-11T07:52:26Z" level=info msg="Rancher-webhook version dev (HEAD) is starting"
time="2022-10-11T07:52:42Z" level=info msg="Active TLS secret cattle-webhook-tls (ver=244605661) (count 1): map[listener.cattle.io/cn-rancher-webhook.cattle-system.svc:rancher-webhook.cattle-system.svc listener.cattle.io/fingerprint:SHA1=BA4DCF03B5AA72868735A3C166DFCCB9BDADF3A4]"
time="2022-10-11T07:52:42Z" level=info msg="Listening on :9443"
E1011 07:52:42.869655 1 memcache.go:196] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1011 07:52:42.886380 1 memcache.go:101] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
time="2022-10-11T07:52:43Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=RoleBinding controller"
time="2022-10-11T07:52:43Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=Role controller"
time="2022-10-11T07:52:43Z" level=info msg="Starting /v1, Kind=Secret controller"
time="2022-10-11T07:52:43Z" level=info msg="Sleeping for 15 seconds then applying webhook config"
time="2022-10-11T07:52:43Z" level=info msg="Starting apiextensions.k8s.io/v1, Kind=CustomResourceDefinition controller"
time="2022-10-11T07:52:43Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=ClusterRole controller"
time="2022-10-11T07:52:43Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding controller"
time="2022-10-11T07:52:43Z" level=info msg="Starting management.cattle.io/v3, Kind=GlobalRole controller"
time="2022-10-11T07:52:43Z" level=info msg="Starting management.cattle.io/v3, Kind=RoleTemplate controller"
time="2022-10-11T07:52:43Z" level=info msg="Starting apiregistration.k8s.io/v1, Kind=APIService controller"
time="2022-10-11T07:52:43Z" level=info msg="Updating TLS secret for cattle-webhook-tls (count: 1): map[listener.cattle.io/cn-rancher-webhook.cattle-system.svc:rancher-webhook.cattle-system.svc listener.cattle.io/fingerprint:SHA1=BA4DCF03B5AA72868735A3C166DFCCB9BDADF3A4]"
E1011 07:52:44.419788 1 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1011 07:52:44.420568 1 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1011 07:52:44.468740 1 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request

After restoring from a backup archive, I was able to complete the upgrade.