How do I recover a missing rancher-agent?

I'm on Rancher 2.5.16. After I rebooted the server, the rancher-agent disappeared on its own. How can I recreate it?

rancher-agent is just a pod, and it basically never disappears on its own. You can check on the downstream cluster with the following commands:

kubectl get pods -n cattle-system
kubectl get deployment -n cattle-system

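My cluster is a custom-installed one. We had a power outage, and IO was a bit high afterwards. After I rebooted, the agent was gone. Do you know how to create a new rancher-agent?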

Rancher now keeps reporting this:
2023/06/19 06:27:10 [ERROR] failed on subscribe replicationController: Get "https://192.168.30.127:6443/api/v1/replicationcontrollers?resourceVersion=0&timeout=30m0s&timeoutSeconds=1800&watch=true": waiting for cluster [c-kk9dl] agent to connect

On the corresponding downstream cluster node, run docker ps -a to find the ID of the cluster-agent container, then use docker logs to see why it did not start.
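The two steps above can be wrapped in a small helper. This is only a sketch: the function name show_agent_logs and the default name pattern "agent" are assumptions, so adjust the pattern to whatever container names docker ps -a actually shows on your node.

```shell
#!/bin/sh
# Sketch: print the tail of the logs for every container whose name
# matches a pattern. The default pattern "agent" is an assumption.
show_agent_logs() {
    pattern="${1:-agent}"   # substring to match against container names
    lines="${2:-50}"        # number of log lines to show per container

    # -a includes stopped containers; --format prints "ID NAME" per line.
    docker ps -a --filter "name=$pattern" --format '{{.ID}} {{.Names}}' |
    while read -r id name; do
        echo "=== $name ($id) ==="
        # Container output may go to stderr, so merge both streams.
        docker logs --tail "$lines" "$id" 2>&1 </dev/null
    done
}
```

Calling show_agent_logs agent 100 on the node would print the last 100 log lines of each matching container, including ones that have already exited.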

I'm running Rancher 2.5.16, but it creates a 2.5.12 agent. After I delete it, it reappears automatically and then stops a little while later. Here is its log:
INFO: https://192.168.50.56:8089/ping is accessible
time="2023-06-19T07:27:12Z" level=info msg="Listening on /tmp/log.sock"
time="2023-06-19T07:27:12Z" level=info msg="Rancher agent version v2.5.12 is starting"
time="2023-06-19T07:27:12Z" level=info msg="Option worker=false"
time="2023-06-19T07:27:12Z" level=info msg="Option requestedHostname=dev-master"
time="2023-06-19T07:27:12Z" level=info msg="Option customConfig=map[address:192.168.30.127 internalAddress: label:map roles: taints:]"
time="2023-06-19T07:27:12Z" level=info msg="Option etcd=false"
time="2023-06-19T07:27:12Z" level=info msg="Option controlPlane=false"
time="2023-06-19T07:27:14Z" level=info msg="attempting to stop the share-mnt container so it can reboot on startup"
INFO: Arguments: --no-register --only-write-certs --node-name dev-master --server https://192.168.50.56:8089 --token REDACTED --ca-checksum 8d3250d06ee58485ef0041a55b65c5f4b8cbd846340692a62ecb516134d1cafa
INFO: Environment: CATTLE_ADDRESS=192.168.30.127 CATTLE_AGENT_CONNECT=true CATTLE_INTERNAL_ADDRESS= CATTLE_NODE_NAME=dev-master CATTLE_SERVER=https://192.168.50.56:8089 CATTLE_TOKEN=REDACTED CATTLE_WRITE_CERT_ONLY=true
INFO: Using resolv.conf: nameserver 61.139.2.69 nameserver 114.114.114.114
INFO: https://192.168.50.56:8089/ping is accessible
time="2023-06-19T07:33:35Z" level=info msg="Rancher agent version v2.5.12 is starting"
time="2023-06-19T07:33:35Z" level=info msg="Option etcd=false"
time="2023-06-19T07:33:35Z" level=info msg="Option controlPlane=false"
time="2023-06-19T07:33:35Z" level=info msg="Option worker=false"
time="2023-06-19T07:33:35Z" level=info msg="Option requestedHostname=dev-master"
time="2023-06-19T07:33:35Z" level=info msg="Option customConfig=map[address:192.168.30.127 internalAddress: label:map roles: taints:]"
time="2023-06-19T07:33:35Z" level=info msg="Listening on /tmp/log.sock"
time="2023-06-19T07:33:35Z" level=info msg="attempting to stop the share-mnt container so it can reboot on startup"
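To compare the agent version in a log like this against the server version without reading it by eye, something like the following could be used. It is a sketch only: both function names are made up for illustration, the pattern only handles plain vX.Y.Z versions, and whether a version mismatch is actually the cause here is not established in this thread.

```shell
#!/bin/sh
# Sketch: read agent log text on stdin and print the first version found
# in a "Rancher agent version vX.Y.Z is starting" line.
agent_version_from_log() {
    sed -n 's/.*Rancher agent version \(v[0-9][0-9.]*\) is starting.*/\1/p' |
    head -n 1
}

# Sketch: compare the version found on stdin with the expected one.
# Returns 0 on a match and 1 otherwise.
check_agent_version() {
    expected="$1"
    found="$(agent_version_from_log)"
    if [ "$found" = "$expected" ]; then
        echo "match: $found"
    else
        echo "mismatch: agent=$found expected=$expected"
        return 1
    fi
}
```

For example, docker logs <container-id> 2>&1 | check_agent_version v2.5.16 would report a mismatch for the log above, where <container-id> stands for the agent container's ID.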

This log shows nothing abnormal. Check the other containers and see whether any of them are restarting repeatedly.
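One way to spot such containers quickly is to filter docker ps by status. This is a minimal sketch: the function name list_unstable_containers is made up, and exited containers are not necessarily a problem (some finish normally), so treat the output as a starting point rather than a diagnosis.

```shell
#!/bin/sh
# Sketch: list containers that are currently restarting or have exited.
# Several --filter flags with the same key are combined with OR.
list_unstable_containers() {
    docker ps -a \
        --filter "status=restarting" \
        --filter "status=exited" \
        --format '{{.Names}}\t{{.Status}}'
}
```

Running list_unstable_containers a few times, a minute or so apart, shows whether the same container keeps reappearing with a fresh "Restarting" or "Exited ... seconds ago" status.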

The cluster has this error: PLEG is not healthy: pleg was last seen active 30m43.517937112s ago; threshold is 3m0s
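PLEG is the kubelet's Pod Lifecycle Event Generator, and this message means the kubelet has not completed a container relist within the threshold. A rough sketch for checking whether a node still reports it, assuming kubectl can reach the downstream cluster (the function name pleg_status is made up for illustration):

```shell
#!/bin/sh
# Sketch: report whether a node's description mentions an unhealthy PLEG.
pleg_status() {
    node="$1"
    # Capture the output first so an unreachable API server is reported
    # as an error rather than as "no warning".
    if ! desc="$(kubectl describe node "$node" 2>&1)"; then
        echo "could not query node $node: $desc"
        return 1
    fi
    if echo "$desc" | grep -i 'PLEG is not healthy'; then
        return 0
    fi
    echo "no PLEG warning reported for $node"
}
```

For example, pleg_status dev-master would check the node named in the agent log above.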

Then search for it on Baidu or Google; you will find information about it.

One more question: I have a Rancher backup and an etcd backup. Can they be restored to a new cluster?

In theory, yes.