How do I recover a missing rancher-agent?

catterlai · June 19, 2023, 06:08

I'm on Rancher 2.5.16. After I rebooted the server, the rancher-agent disappeared on its own. How can I recreate it?

ksd · June 19, 2023, 06:15

rancher-agent is just a pod, and it basically never disappears on its own. You can check in the downstream cluster with the following commands:

kubectl get pods -n cattle-system
kubectl get deployment -n cattle-system

catterlai · June 19, 2023, 06:21

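Mine is a custom-installed cluster. We had a power outage, disk I/O was a bit high afterwards, so I rebooted again, and then the agent was gone. Do you know how to create a new rancher-agent?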

catterlai · June 19, 2023, 06:27

Rancher now keeps reporting this: 023/06/19 06:27:10 [ERROR] failed on subscribe replicationController: Get "https://192.168.30.127:6443/api/v1/replicationcontrollers?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-kk9dl] agent to connect

ksd · June 19, 2023, 06:31

On the corresponding downstream cluster node, run docker ps -a to find the container ID of the cluster-agent, then use docker logs to see why it did not start.
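
A minimal sketch of that lookup, assuming the agent container's name contains cluster-agent (the exact name on your node may differ) and that the last 50 log lines are enough to see the failure. The function names find_cluster_agent and show_cluster_agent_logs are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read "ID|NAME|STATUS" lines on stdin and print the IDs
# of containers whose name contains "cluster-agent" (an assumed naming pattern).
find_cluster_agent() {
  awk -F '|' '$2 ~ /cluster-agent/ { print $1 }'
}

# Print the tail of the logs for every matching container, running or stopped.
show_cluster_agent_logs() {
  docker ps -a --format '{{.ID}}|{{.Names}}|{{.Status}}' \
    | find_cluster_agent \
    | while read -r id; do
        echo "== container $id =="
        docker logs --tail 50 "$id" 2>&1
      done
}
```

Running show_cluster_agent_logs on the downstream node prints one section per matching container, including exited ones, which is usually where the startup error shows up.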

catterlai · June 19, 2023, 08:07

I'm running Rancher 2.5.16, but it creates a 2.5.12 agent. If I delete it, it comes back automatically and then stops after a while. Here is its log:
INFO: https://192.168.50.56:8089/ping is accessible
time="2023-06-19T07:27:12Z" level=info msg="Listening on /tmp/log.sock"
time="2023-06-19T07:27:12Z" level=info msg="Rancher agent version v2.5.12 is starting"
time="2023-06-19T07:27:12Z" level=info msg="Option worker=false"
time="2023-06-19T07:27:12Z" level=info msg="Option requestedHostname=dev-master"
time="2023-06-19T07:27:12Z" level=info msg="Option customConfig=map[address:192.168.30.127 internalAddress: label:map roles: taints:]"
time="2023-06-19T07:27:12Z" level=info msg="Option etcd=false"
time="2023-06-19T07:27:12Z" level=info msg="Option controlPlane=false"
time="2023-06-19T07:27:14Z" level=info msg="attempting to stop the share-mnt container so it can reboot on startup"
INFO: Arguments: --no-register --only-write-certs --node-name dev-master --server https://192.168.50.56:8089 --token REDACTED --ca-checksum 8d3250d06ee58485ef0041a55b65c5f4b8cbd846340692a62ecb516134d1cafa
INFO: Environment: CATTLE_ADDRESS=192.168.30.127 CATTLE_AGENT_CONNECT=true CATTLE_INTERNAL_ADDRESS= CATTLE_NODE_NAME=dev-master CATTLE_SERVER=https://192.168.50.56:8089 CATTLE_TOKEN=REDACTED CATTLE_WRITE_CERT_ONLY=true
INFO: Using resolv.conf: nameserver 61.139.2.69 nameserver 114.114.114.114
INFO: https://192.168.50.56:8089/ping is accessible
time="2023-06-19T07:33:35Z" level=info msg="Rancher agent version v2.5.12 is starting"
time="2023-06-19T07:33:35Z" level=info msg="Option etcd=false"
time="2023-06-19T07:33:35Z" level=info msg="Option controlPlane=false"
time="2023-06-19T07:33:35Z" level=info msg="Option worker=false"
time="2023-06-19T07:33:35Z" level=info msg="Option requestedHostname=dev-master"
time="2023-06-19T07:33:35Z" level=info msg="Option customConfig=map[address:192.168.30.127 internalAddress: label:map roles: taints:]"
time="2023-06-19T07:33:35Z" level=info msg="Listening on /tmp/log.sock"
time="2023-06-19T07:33:35Z" level=info msg="attempting to stop the share-mnt container so it can reboot on startup"
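
To confirm which agent version the downstream cluster is actually configured to run, one option is to read the image from the agent workloads. This is only a sketch: it assumes the workloads are named cattle-cluster-agent (Deployment) and cattle-node-agent (DaemonSet) in cattle-system, and that the image reference ends in a :tag. image_tag and agent_version are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: strip everything up to the last ":" of an image
# reference. Assumes the reference ends in ":tag".
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Print the tag of the first container image of a workload in cattle-system.
agent_version() {
  kind="$1"; name="$2"
  img=$(kubectl -n cattle-system get "$kind" "$name" \
        -o jsonpath='{.spec.template.spec.containers[0].image}') || return 1
  image_tag "$img"
}

# Example calls (workload names are assumptions; adjust them to what
# `kubectl get deployment,daemonset -n cattle-system` lists):
#   agent_version deployment cattle-cluster-agent
#   agent_version daemonset  cattle-node-agent
```

If the printed tag differs from the Rancher server version, that matches the 2.5.12 versus 2.5.16 mismatch described above and is worth mentioning when asking for help.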

ksd · June 19, 2023, 08:10

There is nothing abnormal in this log. Check the other containers and see whether any of them keep restarting.
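
One way to spot containers that keep restarting is Docker's per-container restart counter. A sketch, assuming the containers of interest were started with a Docker restart policy; containers that the kubelet recreates for pods get new IDs instead, so for those the RESTARTS column of kubectl get pods -A is the better view. The helper names and the threshold of 3 are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: read "NAME RESTART_COUNT" lines on stdin and print
# those whose count is at least the threshold (default 3, an arbitrary choice).
frequent_restarters() {
  min="${1:-3}"
  awk -v min="$min" '$2 + 0 >= min { print $1, $2 }'
}

# Collect name and restart count for every container on this node.
list_restart_counts() {
  for id in $(docker ps -aq); do
    docker inspect --format '{{.Name}} {{.RestartCount}}' "$id"
  done
}

# Example: list_restart_counts | frequent_restarters 3
```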

catterlai · June 19, 2023, 08:31

The cluster shows this error: PLEG is not healthy: pleg was last seen active 30m43.517937112s ago; threshold is 3m0s
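
For narrowing this down before searching, a small sketch that lists nodes that are not Ready (which typically goes along with this message) and counts how often the kubelet logged it. It assumes the kubelet runs as a Docker container named kubelet, which may not hold for every install; both function names are illustrative.

```shell
#!/bin/sh
# Read `kubectl get nodes --no-headers` output on stdin and print nodes whose
# STATUS column does not start with "Ready" (e.g. NotReady).
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1, $2 }'
}

# Count log lines on stdin that contain the PLEG health message.
# grep -c exits non-zero when the count is 0, so the exit status is ignored.
pleg_count() {
  grep -c 'PLEG is not healthy' || true
}

# Example calls:
#   kubectl get nodes --no-headers | not_ready_nodes
#   docker logs --since 1h kubelet 2>&1 | pleg_count   # container name assumed
```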

ksd · June 19, 2023, 08:36

Then search for it on Baidu or Google; you will find answers.

catterlai · June 19, 2023, 08:54

One more question: I have a Rancher backup and an etcd backup. Can they be restored to a new cluster?

ksd · June 19, 2023, 09:13

In theory, yes.