Rancher warning: component controller-manager is unhealthy

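ryan · March 28, 2022, 09:06

The page shows the component as unhealthy.

Rancher version: v2.5.12
TKE (Tencent Cloud Kubernetes) k8s version: v1.18

Tencent Cloud's reply: Why controller-manager and scheduler show as Unhealthy · TKE Handbook
Managed clusters simply behave this way, and there is no fix for it.

Could anyone tell me how to resolve this in Rancher? I don't have access to the master nodes.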

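leo · March 28, 2022, 09:29

This happens because kube-controller-manager and kube-scheduler do not have their monitoring port enabled. You can comment out the port setting (- --port=0) in /etc/kubernetes/manifests/kube-controller-manager.yaml and /etc/kubernetes/manifests/kube-scheduler.yaml so that the status shows as healthy.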
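The edit described above can be sketched as a small text transformation. This is only an illustration for self-managed clusters where the static pod manifests are editable; the function name comment_out_port_zero and the sample manifest fragment are hypothetical, and it assumes the flag appears on its own list line exactly as "- --port=0".

```python
import re

# Matches a YAML list entry that is exactly "- --port=0", keeping its indentation.
PORT_ZERO = re.compile(r"^(\s*)-\s+--port=0\s*$")


def comment_out_port_zero(manifest_text):
    """Return manifest_text with every '- --port=0' line commented out.

    Lines that are already commented out do not match and are left untouched,
    so running the function twice gives the same result.
    """
    out = []
    for line in manifest_text.splitlines():
        match = PORT_ZERO.match(line)
        if match:
            out.append(f"{match.group(1)}# - --port=0")
        else:
            out.append(line)
    trailing_newline = "\n" if manifest_text.endswith("\n") else ""
    return "\n".join(out) + trailing_newline


# Hypothetical fragment of a kube-scheduler static pod manifest.
SAMPLE_MANIFEST = (
    "spec:\n"
    "  containers:\n"
    "  - command:\n"
    "    - kube-scheduler\n"
    "    - --leader-elect=true\n"
    "    - --port=0\n"
)

print(comment_out_port_zero(SAMPLE_MANIFEST))
```

On a managed cluster such as TKE the manifests are not reachable, so this only applies when you control the master nodes.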

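ryan · March 28, 2022, 09:38

Tencent Cloud says that on their side this is already set to 0, that is, commented out. Their explanation is that in a managed cluster the apiserver does not run on the same node as controller-manager and scheduler, which causes this status. Is there any other way to solve it?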

ryan · March 28, 2022, 09:41

https://tencentcloudcontainerteam.github.io/tke-handbook/tke/why-controller-manager-and-scheduler-unhealthy.html

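niusmallnan · March 29, 2022, 02:05

There is no good fix; for now you can only ignore it. It is just a notice in the UI and does not affect usage.
Rancher uses the standard Kubernetes API to check component status, but that API has gradually been deprecated as K8s evolved. K8s has not provided a replacement mechanism and does not plan to offer this kind of check interface, because K8s distributions are now very fragmented.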
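To make the "standard API" concrete: the ComponentStatus list (what kubectl get componentstatuses -o json returns) carries a Healthy condition per component. The sketch below shows how such a response could be reduced to a list of unhealthy components. It is not Rancher's actual code; the function name and the sample payload are hand-written assumptions that mimic the shape of the v1 ComponentStatusList.

```python
import json


def unhealthy_components(component_status_list):
    """Return {name: message} for components whose Healthy condition is not "True"."""
    problems = {}
    for item in component_status_list.get("items", []):
        name = item["metadata"]["name"]
        healthy = [c for c in item.get("conditions", []) if c.get("type") == "Healthy"]
        if not healthy:
            problems[name] = "no Healthy condition reported"
        elif healthy[0].get("status") != "True":
            problems[name] = healthy[0].get("message", "")
    return problems


# Hand-written sample in the shape of a v1 ComponentStatusList (an assumption,
# not captured from a real cluster).
SAMPLE = json.loads("""
{
  "apiVersion": "v1",
  "kind": "ComponentStatusList",
  "items": [
    {"metadata": {"name": "scheduler"},
     "conditions": [{"type": "Healthy", "status": "True", "message": "ok"}]},
    {"metadata": {"name": "controller-manager"},
     "conditions": [{"type": "Healthy", "status": "False",
                     "message": "dial tcp 127.0.0.1:10252: connect: connection refused"}]},
    {"metadata": {"name": "etcd-0"},
     "conditions": [{"type": "Healthy", "status": "True",
                     "message": "{\\"health\\":\\"true\\"}"}]}
  ]
}
""")

print(unhealthy_components(SAMPLE))
```

Any UI built on this API will flag the component whenever the apiserver itself cannot reach it, regardless of whether the component is actually running.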

github.com/kubernetes/enhancements

Deprecate ComponentStatus

opened 10:32PM - 16 Feb 18 UTC

closed 03:09PM - 14 Sep 20 UTC

rphillips

sig/cluster-lifecycle sig/api-machinery stage/alpha kind/feature

# Feature Description

- One-line feature description (can be used as a release …note): Deprecate ComponentStatus APIs and kubectl componentstatus functionality
- Primary contact (assignee): @rphillips
- Responsible SIGs: @kubernetes/sig-cluster-lifecycle-feature-requests
- Design proposal link (community repo):
- Link to e2e and/or unit tests:
- Issue https://github.com/kubernetes/kubernetes/issues/19570
- Reviewer(s) - (for LGTM) recommend having 2+ reviewers (at least one from code-area OWNERS file) agreed to review. Reviewers from multiple companies preferred: /cc @justinsb @hongchaodeng
- Approver (likely from SIG/area to which feature belongs):
- Feature target (which target equals to which milestone):
  - Alpha release target (x.y): add code to deprecate kubectl get cs. 1.10 add code to deprecate the APIs
  - Beta release target (x.y): change documentation and commandline deprecation to beta
  - Stable release target (x.y): Remove kubectl componentstatus functionality and APIs

ComponentStatus is functionality to get the health of kubernetes components: etcd, controller manager, and scheduler. The [code](https://github.com/kubernetes/kubernetes/blob/master/pkg/registry/core/rest/storage_core.go#L240-L277) attempts to query controller manager and scheduler at a static (127.0.0.1) address and fixed port. This requires the components to be run alongside the API server, which might not necessarily be the case in all installations (see: https://github.com/kubernetes/kubernetes/issues/19570#issuecomment-354812863). In addition, the code queries etcd servers for their health which could be out of scope of kubernetes, or problematic to query from a networking standpoint as well. We could add registration of the controller manager and scheduler (ip+port), like we do with the Lease Endpoint Reconciler for API servers directly within the storage-api (etcd), but this was a stop-gap solution.
This proposal is to deprecate the ComponentStatus API and cli, and eventually remove them around the 1.12-1.13 release.
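The static-address check described in the issue can be approximated with a short probe. This is a sketch, not the apiserver's implementation: the function names are hypothetical, and the ports are assumptions based on the legacy insecure defaults (10251 for the scheduler, as used later in this thread, and 10252 for the controller manager).

```python
import http.client
import urllib.error
import urllib.request

# Assumed legacy insecure health endpoints, reachable only from the same host.
LOCAL_ENDPOINTS = {
    "scheduler": "http://127.0.0.1:10251/healthz",
    "controller-manager": "http://127.0.0.1:10252/healthz",
}

# Opener that ignores proxy environment variables, since the target is localhost.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url, timeout=2.0):
    """Return (healthy, detail) for a single /healthz URL."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace").strip()
            return resp.status == 200, body
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        # Connection refused, timeouts and HTTP errors all count as unhealthy.
        return False, str(exc)


def check_local_components(endpoints=LOCAL_ENDPOINTS):
    """Probe every endpoint and return {component: (healthy, detail)}."""
    return {name: probe(url) for name, url in endpoints.items()}
```

Calling check_local_components() on a machine that does not host the scheduler and controller manager reports both as unhealthy even when they are running fine elsewhere, which is the situation in a managed control plane.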

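Rancher's approach is to probe component health through the standard API by default and then suppress the error for certain distributions; see:

Globally, TKE is not yet a widely adopted distribution, so the detector does not include it.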

ryan · March 29, 2022, 06:28

Is there a way to ignore this warning on the page?

leo · March 29, 2022, 08:44

After making the change in my environment, the status shows as healthy:

Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE                         ERROR
scheduler            Healthy   ok                              
controller-manager   Healthy   ok                              
etcd-0               Healthy   {"health":"true","reason":""}

What I changed:
kube-scheduler.yaml: - --port=0 changed to - --port=10251
kube-controller-manager: - --port=0
Note: - --port=0 was not commented out.
After restarting with systemctl restart kubelet, the status shows as healthy.

ksd · March 29, 2022, 09:29

But his cluster is on a public cloud, so he cannot change the control-plane component parameters.

ryan · March 30, 2022, 02:38

It is a managed public-cloud cluster and I have no admin access. Awkward.

yulaoshi · August 26, 2022, 08:13

May I ask whether you know where this configuration file is located?

yulaoshi · August 26, 2022, 08:35

Could someone tell me where this configuration file is? I deployed with Docker.