Certificates expired, some K3s nodes are not active, and container "cluster-register" is not available

Rancher Server Setup

  • Rancher version: 2.5.11
  • Installation option (Docker install/Helm Chart): Docker
  • Online or air-gapped deployment: online

Downstream Cluster Information

  • Kubernetes version: v1.20.14+k3s1
  • Cluster Type (Local/Downstream): imported
    • If Downstream, what type of cluster? (Custom/Imported or Hosted, etc.):

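User Information

  • What is the role of the logged-in user? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): Admin
    • If Custom, the custom permission set:

Host OS: CentOS 7.6
Problem description:
The Rancher UI can be reached normally, but services cannot be deployed.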

Steps to reproduce:

Result:

Expected result:

Screenshots:


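Additional context:
A manual check found the service was failing. After reviewing the logs, I re-ran the certificate renewal steps from the reference post.

Logs
Output of "systemctl status k3s":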
[root@k3s-server-192-168-1-216 ~]#systemctl status k3s
● k3s.service - Lightweight Kubernetes
   Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2023-03-02 19:36:01 CST; 5 months 26 days ago
     Docs: https://k3s.io
  Process: 1042 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
  Process: 1020 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
  Process: 875 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited, status=0/SUCCESS)
 Main PID: 1051 (k3s-server)
    Tasks: 179
   Memory: 1.0G
   CGroup: /system.slice/k3s.service
           └─1051 /usr/local/bin/k3s server

Aug 28 11:57:32 k3s-server-192-168-1-216 k3s[1051]: E0828 11:57:32.780187    1051 authentication.go:53] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid: current time 2023-08-28T11:57:32+08:00 is after 2023-08-...
Aug 28 11:57:32 k3s-server-192-168-1-216 k3s[1051]: E0828 11:57:32.843737    1051 authentication.go:53] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid: current time 2023-08-28T11:57:32+08:00 is after 2023-08-...
Aug 28 11:57:32 k3s-server-192-168-1-216 k3s[1051]: E0828 11:57:32.844476    1051 authentication.go:53] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid: current time 2023-08-28T11:57:32+08:00 is after 2023-08-...
Aug 28 11:57:32 k3s-server-192-168-1-216 k3s[1051]: E0828 11:57:32.847344    1051 authentication.go:53] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid: current time 2023-08-28T11:57:32+08:00 is after 2023-08-...
Aug 28 11:57:32 k3s-server-192-168-1-216 k3s[1051]: E0828 11:57:32.850214    1051 authentication.go:53] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid: current time 2023-08-28T11:57:32+08:00 is after 2023-08-...
Aug 28 11:57:32 k3s-server-192-168-1-216 k3s[1051]: E0828 11:57:32.852875    1051 authentication.go:53] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid: current time 2023-08-28T11:57:32+08:00 is after 2023-08-...
Aug 28 11:57:33 k3s-server-192-168-1-216 k3s[1051]: E0828 11:57:33.260446    1051 authentication.go:53] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid: current time 2023-08-28T11:57:33+08:00 is after 2023-04-...
Aug 28 11:57:34 k3s-server-192-168-1-216 k3s[1051]: E0828 11:57:34.392932    1051 authentication.go:53] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid: current time 2023-08-28T11:57:34+08:00 is after 2023-08-...
Aug 28 11:57:35 k3s-server-192-168-1-216 k3s[1051]: E0828 11:57:35.657267    1051 authentication.go:53] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid: current time 2023-08-28T11:57:35+08:00 is after 2023-04-...
Aug 28 11:57:35 k3s-server-192-168-1-216 k3s[1051]: E0828 11:57:35.682038    1051 authentication.go:53] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid: current time 2023-08-28T11:57:35+08:00 is after 2023-08-...
Hint: Some lines were ellipsized, use -l to show in full.
k3s.log
W0828 03:06:43.538114      78 dispatcher.go:134] Failed calling webhook, failing closed rancherauth.cattle.io: failed calling webhook "rancherauth.cattle.io": Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/validation?timeout=10s": x509: certificate has expired or is not yet valid: current time 2023-08-28T03:06:43Z is after 2023-03-01T07:23:24Z
W0828 03:06:43.539790      78 dispatcher.go:134] Failed calling webhook, failing closed rancherauth.cattle.io: failed calling webhook "rancherauth.cattle.io": Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/validation?timeout=10s": x509: certificate has expired or is not yet valid: current time 2023-08-28T03:06:43Z is after 2023-03-01T07:23:24Z
W0828 03:06:43.541083      78 dispatcher.go:134] Failed calling webhook, failing closed rancherauth.cattle.io: failed calling webhook "rancherauth.cattle.io": Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/validation?timeout=10s": x509: certificate has expired or is not yet valid: current time 2023-08-28T03:06:43Z is after 2023-03-01T07:23:24Z
W0828 03:06:43.546962      78 dispatcher.go:134] Failed calling webhook, failing closed rancherauth.cattle.io: failed calling webhook "rancherauth.cattle.io": Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/validation?timeout=10s": x509: certificate has expired or is not yet valid: current time 2023-08-28T03:06:43Z is after 2023-03-01T07:23:24Z
W0828 03:06:43.548422      78 dispatcher.go:134] Failed calling webhook, failing closed rancherauth.cattle.io: failed calling webhook "rancherauth.cattle.io": Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/validation?timeout=10s": x509: certificate has expired or is not yet valid: current time 2023-08-28T03:06:43Z is after 2023-03-01T07:23:24Z
W0828 03:06:43.549580      78 dispatcher.go:134] Failed calling webhook, failing closed rancherauth.cattle.io: failed calling webhook "rancherauth.cattle.io": Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/validation?timeout=10s": x509: certificate has expired or is not yet valid: current time 2023-08-28T03:06:43Z is after 2023-03-01T07:23:24Z
W0828 03:09:23.087963      78 watcher.go:207] watch chan error: etcdserver: mvcc: required revision has been compacted
W0828 03:12:40.637719      78 watcher.go:207] watch chan error: etcdserver: mvcc: required revision has been compacted
W0828 03:13:40.113305      78 watcher.go:207] watch chan error: etcdserver: mvcc: required revision has been compacted
W0828 03:14:14.155640      78 watcher.go:207] watch chan error: etcdserver: mvcc: required revision has been compacted

Rancher container log:
Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
, requeuing
W0828 03:21:33.515387       8 warnings.go:80] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
2023/08/28 03:23:23 [INFO] Creating globalRoleBindings for u-35wgz6uyjc
2023/08/28 03:23:23 [INFO] Creating globalRoleBindings for u-5zbiooz4q3
2023/08/28 03:23:23 [INFO] Creating globalRoleBindings for u-l27gg4sfyk
2023/08/28 03:23:23 [INFO] [mgmt-project-rbac-create] Creating creator projectRoleTemplateBinding for user user-l4tn8 for project p-vklgd
2023/08/28 03:23:23 [INFO] [mgmt-project-rbac-create] Creating creator projectRoleTemplateBinding for user user-l4tn8 for project p-rbgds
2023/08/28 03:23:23 [INFO] [mgmt-project-rbac-create] Creating creator projectRoleTemplateBinding for user user-l4tn8 for project p-x6lrk
2023/08/28 03:23:23 [ERROR] error syncing 'c-mftj7/p-vklgd': handler pipeline-controller: Internal error occurred: failed calling webhook "rancherauth.cattle.io": Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/validation?timeout=10s": x509: certificate has expired or is not yet valid: current time 2023-08-28T03:23:23Z is after 2023-03-01T07:23:24Z, handler mgmt-project-rbac-create: Internal error occurred: failed calling webhook "rancherauth.cattle.io": Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/validation?timeout=10s": x509: certificate has expired or is not yet valid: current time 2023-08-28T03:23:23Z is after 2023-03-01T07:23:24Z, requeuing
2023/08/28 03:23:23 [ERROR] error syncing 'c-mftj7/p-rbgds': handler pipeline-controller: Internal error occurred: failed calling webhook "rancherauth.cattle.io": Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/validation?timeout=10s": x509: certificate has expired or is not yet valid: current time 2023-08-28T03:23:23Z is after 2023-03-01T07:23:24Z, handler mgmt-project-rbac-create: Internal error occurred: failed calling webhook "rancherauth.cattle.io": Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/validation?timeout=10s": x509: certificate has expired or is not yet valid: current time 2023-08-28T03:23:23Z is after 2023-03-01T07:23:24Z, requeuing
2023/08/28 03:23:23 [ERROR] error syncing 'c-mftj7/p-x6lrk': handler pipeline-controller: Internal error occurred: failed calling webhook "rancherauth.cattle.io": Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/validation?timeout=10s": x509: certificate has expired or is not yet valid: current time 2023-08-28T03:23:23Z is after 2023-03-01T07:23:24Z, handler mgmt-project-rbac-create: Internal error occurred: failed calling webhook "rancherauth.cattle.io": Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/validation?timeout=10s": x509: certificate has expired or is not yet valid: current time 2023-08-28T03:23:23Z is after 2023-03-01T07:23:24Z, requeuing
W0828 03:24:12.286246       8 warnings.go:80] extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
2023-08-28 03:24:20.048509 I | mvcc: store.index: compact 288931156
2023-08-28 03:24:20.109068 I | mvcc: finished scheduled compaction at 288931156 (took 58.792803ms)
2023/08/28 03:25:18 [ERROR] error syncing 'rancher-partner-charts': handler helm-clusterrepo-ensure: git -C /var/lib/rancher-data/local-catalogs/v2/rancher-partner-charts/8f17acdce9bffd6e05a58a3798840e408c4ea71783381ecd2e9af30baad65974 reset --hard FETCH_HEAD error: exit status 128, detail: fatal: Unable to create '/var/lib/rancher-data/local-catalogs/v2/rancher-partner-charts/8f17acdce9bffd6e05a58a3798840e408c4ea71783381ecd2e9af30baad65974/.git/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
, handler helm-clusterrepo-download: git -C /var/lib/rancher-data/local-catalogs/v2/rancher-partner-charts/8f17acdce9bffd6e05a58a3798840e408c4ea71783381ecd2e9af30baad65974 reset --hard HEAD error: exit status 128, detail: fatal: Unable to create '/var/lib/rancher-data/local-catalogs/v2/rancher-partner-charts/8f17acdce9bffd6e05a58a3798840e408c4ea71783381ecd2e9af30baad65974/.git/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
, requeuing
2023/08/28 03:25:23 [ERROR] error syncing 'rancher-charts': handler helm-clusterrepo-ensure: git -C /var/lib/rancher-data/local-catalogs/v2/rancher-charts/4b40cac650031b74776e87c1a726b0484d0877c3ec137da0872547ff9b73a721 reset --hard FETCH_HEAD error: exit status 128, detail: fatal: Unable to create '/var/lib/rancher-data/local-catalogs/v2/rancher-charts/4b40cac650031b74776e87c1a726b0484d0877c3ec137da0872547ff9b73a721/.git/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
, handler helm-clusterrepo-download: git -C /var/lib/rancher-data/local-catalogs/v2/rancher-charts/4b40cac650031b74776e87c1a726b0484d0877c3ec137da0872547ff9b73a721 reset --hard HEAD error: exit status 128, detail: fatal: Unable to create '/var/lib/rancher-data/local-catalogs/v2/rancher-charts/4b40cac650031b74776e87c1a726b0484d0877c3ec137da0872547ff9b73a721/.git/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
, requeuing
W0828 03:25:29.002931       8 warnings.go:80] extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
Output of "kubectl -n cattle-system logs -l app=cattle-cluster-agent":
[root@k3s-server-192-168-1-216 ~]#kubectl -n cattle-system logs -l app=cattle-cluster-agent
Error from server (BadRequest): container "cluster-register" in pod "cattle-cluster-agent-86f97944-nd82m" is not available
[root@k3s-server-192-168-1-216 ~]#kubectl -n cattle-system logs -l app=cattle-cluster-agent
time="2023-08-28T03:06:37Z" level=info msg="Watching metadata for monitoring.coreos.com/v1, Kind=Prometheus"
time="2023-08-28T03:06:37Z" level=info msg="Watching metadata for policy/v1beta1, Kind=PodSecurityPolicy"
time="2023-08-28T03:06:37Z" level=info msg="Watching metadata for coordination.k8s.io/v1, Kind=Lease"
time="2023-08-28T03:06:37Z" level=info msg="Watching metadata for /v1, Kind=Namespace"
time="2023-08-28T03:06:37Z" level=info msg="Watching metadata for storage.k8s.io/v1, Kind=StorageClass"
time="2023-08-28T03:06:37Z" level=info msg="Watching metadata for /v1, Kind=LimitRange"
time="2023-08-28T03:06:37Z" level=info msg="Watching metadata for helm.cattle.io/v1, Kind=HelmChartConfig"
time="2023-08-28T03:07:46Z" level=error msg="error syncing 'rancher-partner-charts': handler helm-clusterrepo-ensure: git -C /var/lib/rancher-data/local-catalogs/v2/rancher-partner-charts/8f17acdce9bffd6e05a58a3798840e408c4ea71783381ecd2e9af30baad65974 fetch origin 2b3f1f1c928b67eada993370101611e29d4c9d87 error: exit status 128, detail: error: RPC failed; curl 56 GnuTLS recv error (-54): Error in the pull function.\nfatal: The remote end hung up unexpectedly\nfatal: early EOF\nfatal: index-pack failed\n, requeuing"
W0828 03:13:41.806002      53 warnings.go:80] extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
W0828 03:13:56.344393      53 warnings.go:80] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
Error from server (BadRequest): container "cluster-register" in pod "cattle-cluster-agent-86f97944-jh7r2" is not available
Output of "kubectl get no":
[root@k3s-server-192-168-1-216 ~]#kubectl get no
NAME                       STATUS     ROLES                  AGE    VERSION
k3s-agent-192-168-1-171    NotReady   <none>                 369d   v1.20.14+k3s1
k3s-agent-192-168-1-224    NotReady   <none>                 503d   v1.20.14+k3s1
k3s-server-192-168-1-216   Ready      control-plane,master   503d   v1.20.14+k3s1
k3s-agent-192-168-1-243    Ready      <none>                 503d   v1.20.14+k3s1

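I don't have a complete answer yet. Since yours is an imported K3s cluster and two of its nodes are NotReady, set Rancher aside for now and fix the K3s problems first.

Judging from your logs, the control-plane node has expired certificates, so you can first renew them manually with the commands below: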

kubectl --insecure-skip-tls-verify -n kube-system delete secrets k3s-serving
rm -rf /var/lib/rancher/k3s/server/tls/dynamic-cert.json
service k3s restart
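
A more cautious variant of the "rm -rf" step above is to move the file aside rather than delete it, so it can be restored if the restart goes wrong. The following is only a sketch: the function name stash_file and the backup directory are hypothetical and not part of K3s.

```shell
# Move a file into a backup directory under a timestamped name
# instead of deleting it. Prints the backup path on success.
# stash_file is an illustrative helper, not a K3s command.
stash_file() {
    src=$1
    backup_dir=$2
    [ -e "$src" ] || { echo "nothing to back up: $src" >&2; return 1; }
    mkdir -p "$backup_dir" || return 1
    dest="$backup_dir/$(basename "$src").$(date +%Y%m%d%H%M%S)"
    mv "$src" "$dest" && echo "$dest"
}
```

For example, "stash_file /var/lib/rancher/k3s/server/tls/dynamic-cert.json /root/k3s-tls-backup" could stand in for the rm command, where /root/k3s-tls-backup is an assumed location.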

Then go on to check the logs of the two NotReady worker nodes. On CentOS they should be written to syslog.

I ran the commands above, but it still reports an expired certificate:

Aug 28 18:25:25 k3s-server-192-168-1-216 k3s[51735]: E0828 18:25:25.786055   51735 authentication.go:53] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid: current time 2023-08-28T18:25:25+08:00 is after 2023-04...
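
The status output truncates each line, so the full expiry timestamps are cut off. The truncated lines earlier in this thread show two different prefixes (2023-08 and 2023-04), which suggests more than one certificate is involved. A minimal sketch for counting the distinct expiry timestamps in a log, assuming the "is after <timestamp>" wording seen in these messages; list_expiry_dates is a hypothetical helper name:

```shell
# Read log lines on stdin and print "<count> <timestamp>" for every
# distinct timestamp that follows the words "is after".
list_expiry_dates() {
    grep -o 'is after [0-9T:Z+-]*' \
        | sed 's/^is after //' \
        | sort \
        | uniq -c \
        | awk '{ print $1, $2 }'
}
```

It could be fed from the full journal, for example "journalctl -u k3s --no-pager | list_expiry_dates", or from a saved file such as k3s.log.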

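The problem is now solved. Here is a summary for everyone's reference.

Problem 1: some K3s nodes are in the NotReady state
Solution: uninstall K3s on the NotReady nodes and reinstall it. If the online install fails, follow the official air-gap installation documentation, and take care to specify the container runtime. The commands are: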

wget https://get.k3s.io
mv index.html install.sh
chmod +x install.sh
INSTALL_K3S_SKIP_DOWNLOAD=true K3S_NODE_NAME=${NODE_NAME} K3S_URL=https://${SERVER_PUBLIC_IP}:6443 K3S_TOKEN=${K3S_SERVER_TOKEN} INSTALL_K3S_EXEC=--docker ./install.sh

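Problem 2: after renewing the K3s certificates, they are still reported as expired
Solution: find all certificates on every K3s node and check by hand whether any has expired. If so, back the expired ones up to another directory, then copy in or generate new ones. The commands to list the certificates: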

for i in /var/lib/rancher/k3s/server/tls/*.crt; do echo "$i"; openssl x509 -enddate -noout -in "$i"; done
for i in /var/lib/rancher/k3s/agent/*.crt; do echo "$i"; openssl x509 -enddate -noout -in "$i"; done
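
To avoid comparing dates by eye, the check can be left to openssl itself. This is a sketch rather than the procedure used in this thread: it relies on the "-checkend" option of "openssl x509", which exits non-zero when a certificate expires within the given number of seconds. check_cert_dir is a hypothetical helper name.

```shell
# check_cert_dir DIR [SECONDS]
# Print "OK <file>" or "EXPIRING <file>" for each *.crt in DIR.
# SECONDS defaults to 0, which flags certificates that have already expired.
# Returns 1 if at least one certificate was flagged (or could not be read).
check_cert_dir() {
    cert_dir=$1
    within=${2:-0}
    status=0
    for crt in "$cert_dir"/*.crt; do
        [ -e "$crt" ] || continue   # the glob matched nothing
        if openssl x509 -checkend "$within" -noout -in "$crt" >/dev/null 2>&1; then
            echo "OK $crt"
        else
            echo "EXPIRING $crt"
            status=1
        fi
    done
    return "$status"
}
```

For example, "check_cert_dir /var/lib/rancher/k3s/server/tls" lists already-expired server certificates, and "check_cert_dir /var/lib/rancher/k3s/agent 2592000" lists agent certificates that expire within 30 days.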

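Problem 3: the container runtime was not specified correctly during the offline K3s install, so images could not be pulled when deploying services
Solution: follow the official documentation and add this parameter to the install command: INSTALL_K3S_EXEC=--docker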
