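After restarting Rancher, the agent and service report errors and fail to start

ThinkBlue1991 · 2024-11-21 10:45

Rancher Server Setup

Rancher version: v2.7.1
Installation option (Docker install/Helm Chart): Docker install
- If installed via Helm Chart, provide the type (RKE1, RKE2, k3s, EKS, etc.) and version of the Local cluster:
Online or air-gapped deployment: online

Downstream Cluster Information

Kubernetes version: v1.24.17
Cluster Type (Local/Downstream): Local
- If Downstream, what type of cluster? (Custom/Imported or Hosted, etc.):

User Information

What is the role of the logged-in user? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): Admin
- If Custom, the custom permission set:

Host OS: CentOS 7.6

Problem description: After Rancher was deployed and Docker was restarted, rancher-agent and rancher-server can no longer start, but k8s itself is running normally.

Steps to reproduce:
After the installation completes, restart the Docker service.

Result:

rancher-server logs: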

2024/11/21 10:28:38 [INFO] Stopping cluster agent for c-tzm6v
2024/11/21 10:28:38 [ERROR] failed to start cluster controllers c-tzm6v: context canceled
2024/11/21 10:29:49 [ERROR] error syncing 'c-tzm6v': handler cluster-deploy: Get "https://172.17.110.83:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing
2024/11/21 10:30:47 [INFO] Stopping cluster agent for c-tzm6v
2024/11/21 10:30:47 [ERROR] failed to start cluster controllers c-tzm6v: context canceled
2024/11/21 10:32:33 [ERROR] error syncing 'c-tzm6v': handler cluster-deploy: Get "https://172.17.110.83:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing
2024/11/21 10:32:45 [INFO] Stopping cluster agent for c-tzm6v
2024/11/21 10:32:45 [ERROR] failed to start cluster controllers c-tzm6v: context canceled
2024/11/21 10:34:44 [INFO] Stopping cluster agent for c-tzm6v
2024/11/21 10:34:44 [ERROR] failed to start cluster controllers c-tzm6v: context canceled
2024/11/21 10:35:19 [ERROR] error syncing 'c-tzm6v': handler cluster-deploy: Get "https://172.17.110.83:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing
2024/11/21 10:36:59 [INFO] Stopping cluster agent for c-tzm6v
2024/11/21 10:36:59 [ERROR] failed to start cluster controllers c-tzm6v: context canceled
2024/11/21 10:38:03 [ERROR] error syncing 'c-tzm6v': handler cluster-deploy: Get "https://172.17.110.83:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing
2024/11/21 10:38:57 [INFO] Stopping cluster agent for c-tzm6v
2024/11/21 10:38:57 [ERROR] failed to start cluster controllers c-tzm6v: context canceled
2024/11/21 10:40:45 [ERROR] error syncing 'c-tzm6v': handler cluster-deploy: Get "https://172.17.110.83:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing
2024/11/21 10:41:13 [INFO] Stopping cluster agent for c-tzm6v
2024/11/21 10:41:13 [ERROR] failed to start cluster controllers c-tzm6v: context canceled
2024/11/21 10:43:13 [INFO] Stopping cluster agent for c-tzm6v
2024/11/21 10:43:13 [ERROR] failed to start cluster controllers c-tzm6v: context canceled
2024/11/21 10:43:28 [ERROR] error syncing 'c-tzm6v': handler cluster-deploy: Get "https://172.17.110.83:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing

State of the cattle-system namespace in k8s:

# kubectl  get pod -n cattle-system
NAME                                    READY   STATUS             RESTARTS           AGE
cattle-cluster-agent-5d68fb6f84-dlzbw   0/1     CrashLoopBackOff   3991 (4m14s ago)   25d
cattle-cluster-agent-5d68fb6f84-kdwtt   0/1     CrashLoopBackOff   3991 (38s ago)     25d
cattle-node-agent-46vb4                 0/1     CrashLoopBackOff   7287 (3m56s ago)   25d
cattle-node-agent-d9tc7                 0/1     CrashLoopBackOff   2753 (4m53s ago)   25d
cattle-node-agent-zvkb4                 1/1     Running            1 (14d ago)        25d
kube-api-auth-sph98                     1/1     Running            1 (14d ago)        25d

Logs of an agent in the CrashLoopBackOff state:

# kubectl logs -f  cattle-node-agent-46vb4  -n cattle-system
INFO: Environment: CATTLE_ADDRESS=172.17.110.83 CATTLE_AGENT_CONNECT=true CATTLE_CA_CHECKSUM=59d47373e00f43a89a159a7ad4d39422ee453bde24c638879144c3e2cc8c47ac CATTLE_CLUSTER=false CATTLE_CLUSTER_AGENT_PORT=tcp://10.43.134.61:80 CATTLE_CLUSTER_AGENT_PORT_443_TCP=tcp://10.43.134.61:443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_ADDR=10.43.134.61 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PORT=443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_PORT_80_TCP=tcp://10.43.134.61:80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_ADDR=10.43.134.61 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PORT=80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_SERVICE_HOST=10.43.134.61 CATTLE_CLUSTER_AGENT_SERVICE_PORT=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTP=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTPS_INTERNAL=443 CATTLE_INGRESS_IP_DOMAIN=sslip.io CATTLE_INSTALL_UUID=b34f3bbf-c3bc-4774-9028-0da1faa24044 CATTLE_INTERNAL_ADDRESS= CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=app1 CATTLE_SERVER=https://172.17.110.83:443 CATTLE_SERVER_VERSION=v2.7.1
INFO: Using resolv.conf: nameserver 211.137.160.5
ERROR: https://172.17.110.83:443/ping is not accessible (The requested URL returned error: 404)

Logs of the agent in the Running state:

0.83 because it doesn't contain any IP SANs"
time="2024-11-21T10:48:51Z" level=error msg="Remotedialer proxy error" error="x509: cannot validate certificate for 172.17.110.83 because it doesn't contain any IP SANs"
time="2024-11-21T10:49:01Z" level=info msg="Connecting to wss://172.17.110.83:443/v3/connect with token starting with mmb5nhsscrwrjhs8z2l6h8jhs6c"
time="2024-11-21T10:49:01Z" level=info msg="Connecting to proxy" url="wss://172.17.110.83:443/v3/connect"
time="2024-11-21T10:49:01Z" level=error msg="Failed to connect to proxy. Empty dialer response" error="x509: cannot validate certificate for 172.17.110.83 because it doesn't contain any IP SANs"
time="2024-11-21T10:49:01Z" level=error msg="Remotedialer proxy error" error="x509: cannot validate certificate for 172.17.110.83 because it doesn't contain any IP SANs"
time="2024-11-21T10:49:11Z" level=info msg="Connecting to wss://172.17.110.83:443/v3/connect with token starting with mmb5nhsscrwrjhs8z2l6h8jhs6c"
time="2024-11-21T10:49:11Z" level=info msg="Connecting to proxy" url="wss://172.17.110.83:443/v3/connect"
time="2024-11-21T10:49:11Z" level=error msg="Failed to connect to proxy. Empty dialer response" error="x509: cannot validate certificate for 172.17.110.83 because it doesn't contain any IP SANs"
time="2024-11-21T10:49:11Z" level=error msg="Remotedialer proxy error" error="x509: cannot validate certificate for 172.17.110.83 because it doesn't contain any IP SANs"

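ksd · 2024-11-21 11:07

This log entry points to the cause. From that host, check whether the corresponding IP and port are reachable.

ThinkBlue1991 · 2024-11-21 13:21

172.17.110.83 is the IP address of the host running rancher-server, and 443 is rancher-server's port.

The current situation is:

rancher-server accesses port 6443 of the k8s kube-apiserver to get the status of the cattle-node-agent DaemonSet in the cattle-system namespace,

and cattle-node-agent in turn accesses rancher-server for its status.

This feels like a circular dependency, with each side querying the other's status.

ksd · 2024-11-22 01:15

First, find out why cluster-agent cannot reach port 443 of the Rancher server.

ThinkBlue1991 · 2024-11-22 01:28

Below are the logs printed by rancher-server: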


[root@app1 ~]# docker ps |grep 443
33d2887f20d4   registry.cn-hangzhou.aliyuncs.com/rancher/rancher:v2.7.1                "entrypoint.sh"           3 weeks ago   Up 13 days                      0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   rancher
[root@app1 ~]# docker logs -f --tail=10 rancher
2024/11/22 01:20:18 [ERROR] failed to start cluster controllers c-tzm6v: context canceled
2024/11/22 01:20:35 [ERROR] error syncing 'c-tzm6v': handler cluster-deploy: Get "https://172.17.110.83:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing
2024/11/22 01:22:16 [INFO] Stopping cluster agent for c-tzm6v
2024/11/22 01:22:16 [ERROR] failed to start cluster controllers c-tzm6v: context canceled
2024/11/22 01:23:16 [ERROR] error syncing 'c-tzm6v': handler cluster-deploy: Get "https://172.17.110.83:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing
2024/11/22 01:24:17 [INFO] Stopping cluster agent for c-tzm6v
2024/11/22 01:24:17 [ERROR] failed to start cluster controllers c-tzm6v: context canceled
2024/11/22 01:26:02 [INFO] Stopping cluster agent for c-tzm6v
2024/11/22 01:26:02 [ERROR] failed to start cluster controllers c-tzm6v: context canceled
2024/11/22 01:26:03 [ERROR] error syncing 'c-tzm6v': handler cluster-deploy: Get "https://172.17.110.83:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing

Does this mean rancher-server has not come up?

ThinkBlue1991 · 2024-11-22 01:36

I entered the rancher-server container and ran curl; the output is below:

33d2887f20d4:/var/lib/rancher # curl  https://127.0.0.1:443/ping -k
pong33d2887f20d4:/var/lib/rancher # curl  https://127.0.0.1:443 -k
{"type":"collection","links":{"self":"https://127.0.0.1/"},"actions":{},"pagination":{"limit":1000,"total":4},"sort":{"order":"asc","reverse":"https://127.0.0.1/?order=desc"},"resourceType":"apiRoot","data":[{"apiVersion":{"group":"meta.cattle.io","path":"/meta","version":"v1"},"baseType":"apiRoot","links":{"apiRoots":"https://127.0.0.1/meta/apiroots","root":"https://127.0.0.1/meta","schemas":"https://127.0.0.1/meta/schemas","self":"https://127.0.0.1/meta","subscribe":"https://127.0.0.1/meta/subscribe"},"type":"apiRoot"},{"apiVersion":{"group":"management.cattle.io","path":"/v3","version":"v3"},"baseType":"apiRoot","links":{"authConfigs":"https://127.0.0.1/v3/authconfigs","catalogs":"https://127.0.0.1/v3/catalogs","cisBenchmarkVersions":"https://127.0.0.1/v3/cisbenchmarkversions","cisConfigs":"https://127.0.0.1/v3/cisconfigs","cloudCredentials":"https://127.0.0.1/v3/cloudcredentials","clusterAlertGroups":"https://127.0.0.1/v3/clusteralertgroups","clusterAlertRules":"https://127.0.0.1/v3/clusteralertrules","clusterAlerts":"https://127.0.0.1/v3/clusteralerts","clusterCatalogs":"https://127.0.0.1/v3/clustercatalogs","clusterMonitorGraphs":"https://127.0.0.1/v3/clustermonitorgraphs","clusterRegistrationTokens":"https://127.0.0.1/v3/clusterregistrationtokens","clusterRoleTemplateBindings":"https://127.0.0.1/v3/clusterroletemplatebindings","clusterScans":"https://127.0.0.1/v3/clusterscans","clusterTemplateRevisions":"https://127.0.0.1/v3/clustertemplaterevisions","clusterTemplates":"https://127.0.0.1/v3/clustertemplates","clusters":"https://127.0.0.1/v3/clusters","composeConfigs":"https://127.0.0.1/v3/composeconfigs","dynamicSchemas":"https://127.0.0.1/v3/dynamicschemas","etcdBackups":"https://127.0.0.1/v3/etcdbackups","features":"https://127.0.0.1/v3/features","fleetWorkspaces":"https://127.0.0.1/v3/fleetworkspaces","globalDnsProviders":"https://127.0.0.1/v3/globaldnsproviders","globalDnses":"https://127.0.0.1/v3/globaldnses","globalRoleBindings":"https://127.0.0.1/v3/gl
obalrolebindings","globalRoles":"https://127.0.0.1/v3/globalroles","groupMembers":"https://127.0.0.1/v3/groupmembers","groups":"https://127.0.0.1/v3/groups","kontainerDrivers":"https://127.0.0.1/v3/kontainerdrivers","ldapConfigs":"https://127.0.0.1/v3/ldapconfigs","managementSecrets":"https://127.0.0.1/v3/managementsecrets","monitorMetrics":"https://127.0.0.1/v3/monitormetrics","multiClusterAppRevisions":"https://127.0.0.1/v3/multiclusterapprevisions","multiClusterApps":"https://127.0.0.1/v3/multiclusterapps","nodeDrivers":"https://127.0.0.1/v3/nodedrivers","nodePools":"https://127.0.0.1/v3/nodepools","nodeTemplates":"https://127.0.0.1/v3/nodetemplates","nodes":"https://127.0.0.1/v3/nodes","notifiers":"https://127.0.0.1/v3/notifiers","podSecurityPolicyTemplateProjectBindings":"https://127.0.0.1/v3/podsecuritypolicytemplateprojectbindings","podSecurityPolicyTemplates":"https://127.0.0.1/v3/podsecuritypolicytemplates","preferences":"https://127.0.0.1/v3/preferences","principals":"https://127.0.0.1/v3/principals","projectAlertGroups":"https://127.0.0.1/v3/projectalertgroups","projectAlertRules":"https://127.0.0.1/v3/projectalertrules","projectAlerts":"https://127.0.0.1/v3/projectalerts","projectCatalogs":"https://127.0.0.1/v3/projectcatalogs","projectMonitorGraphs":"https://127.0.0.1/v3/projectmonitorgraphs","projectNetworkPolicies":"https://127.0.0.1/v3/projectnetworkpolicies","projectRoleTemplateBindings":"https://127.0.0.1/v3/projectroletemplatebindings","projects":"https://127.0.0.1/v3/projects","rancherUserNotifications":"https://127.0.0.1/v3/rancherusernotifications","rkeAddons":"https://127.0.0.1/v3/rkeaddons","rkeK8sServiceOptions":"https://127.0.0.1/v3/rkek8sserviceoptions","rkeK8sSystemImages":"https://127.0.0.1/v3/rkek8ssystemimages","roleTemplates":"https://127.0.0.1/v3/roletemplates","root":"https://127.0.0.1/v3","samlTokens":"https://127.0.0.1/v3/samltokens","self":"https://127.0.0.1/v3","settings":"https://127.0.0.1/v3/settings","subscribe":"https://127.
0.0.1/v3/subscribe","templateVersions":"https://127.0.0.1/v3/templateversions","templates":"https://127.0.0.1/v3/templates","tokens":"https://127.0.0.1/v3/tokens","users":"https://127.0.0.1/v3/users"},"type":"apiRoot"},{"apiVersion":{"group":"cluster.cattle.io","path":"/v3/cluster","version":"v3"},"baseType":"apiRoot","links":{"apiServices":"https://127.0.0.1/v3/cluster/apiservices","namespaces":"https://127.0.0.1/v3/cluster/namespaces","persistentVolumes":"https://127.0.0.1/v3/cluster/persistentvolumes","root":"https://127.0.0.1/v3/cluster","self":"https://127.0.0.1/v3/cluster","storageClasses":"https://127.0.0.1/v3/cluster/storageclasses","subscribe":"https://127.0.0.1/v3/cluster/subscribe"},"type":"apiRoot"},{"apiVersion":{"group":"project.cattle.io","path":"/v3/project","version":"v3"},"baseType":"apiRoot","links":{"alertmanagers":"https://127.0.0.1/v3/project/alertmanagers","appRevisions":"https://127.0.0.1/v3/project/apprevisions","apps":"https://127.0.0.1/v3/project/apps","basicAuths":"https://127.0.0.1/v3/project/basicauths","certificates":"https://127.0.0.1/v3/project/certificates","configMaps":"https://127.0.0.1/v3/project/configmaps","cronJobs":"https://127.0.0.1/v3/project/cronjobs","daemonSets":"https://127.0.0.1/v3/project/daemonsets","deployments":"https://127.0.0.1/v3/project/deployments","dnsRecords":"https://127.0.0.1/v3/project/dnsrecords","dockerCredentials":"https://127.0.0.1/v3/project/dockercredentials","horizontalPodAutoscalers":"https://127.0.0.1/v3/project/horizontalpodautoscalers","ingresses":"https://127.0.0.1/v3/project/ingresses","jobs":"https://127.0.0.1/v3/project/jobs","namespacedBasicAuths":"https://127.0.0.1/v3/project/namespacedbasicauths","namespacedCertificates":"https://127.0.0.1/v3/project/namespacedcertificates","namespacedDockerCredentials":"https://127.0.0.1/v3/project/namespaceddockercredentials","namespacedSecrets":"https://127.0.0.1/v3/project/namespacedsecrets","namespacedServiceAccountTokens":"https://127.0.0.1/v3/proj
ect/namespacedserviceaccounttokens","namespacedSshAuths":"https://127.0.0.1/v3/project/namespacedsshauths","persistentVolumeClaims":"https://127.0.0.1/v3/project/persistentvolumeclaims","pods":"https://127.0.0.1/v3/project/pods","prometheusRules":"https://127.0.0.1/v3/project/prometheusrules","prometheuses":"https://127.0.0.1/v3/project/prometheuses","replicaSets":"https://127.0.0.1/v3/project/replicasets","replicationControllers":"https://127.0.0.1/v3/project/replicationcontrollers","root":"https://127.0.0.1/v3/project","secrets":"https://127.0.0.1/v3/project/secrets","self":"https://127.0.0.1/v3/project","serviceAccountTokens":"https://127.0.0.1/v3/project/serviceaccounttokens","serviceMonitors":"https://127.0.0.1/v3/project/servicemonitors","services":"https://127.0.0.1/v3/project/services","sshAuths":"https://127.0.0.1/v3/project/sshauths","statefulSets":"https://127.0.0.1/v3/project/statefulsets","subscribe":"https://127.0.0.1/v3/project/subscribe","workloads":"https://127.0.0.1/v3/project/workloads"},"type":"apiRoot"}]}
33d2887f20d4:/var/lib/rancher # exit

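Does this mean something goes wrong when Docker forwards requests on port 443?

ksd · 2024-11-22 01:42

Can you access Rancher from a browser right now? If you can, then it has come up fine.

As for the cluster agent reaching Rancher, just test it with telnet directly from the cluster agent's host.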
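
The reachability test suggested above can also be done at the HTTP level, since an open port alone does not prove that Rancher is the process answering. A minimal sketch, assuming curl is installed on the agent host; the function name check_ping is hypothetical, and the address is the CATTLE_SERVER value shown in the agent log:

```shell
#!/bin/sh
# Probe the /ping endpoint; a healthy Rancher server answers with the body "pong".
# -k skips certificate verification, -s hides progress output, -m 5 caps the wait at 5 seconds.
check_ping() {
    body=$(curl -ks -m 5 "$1/ping") || return 1
    [ "$body" = "pong" ]
}

# Usage, run on each agent host:
# check_ping https://172.17.110.83:443 && echo "rancher answers" || echo "rancher does not answer on this address"
```

If the call fails on a host while the same request succeeds inside the rancher container, something between the host port and the container is intercepting the traffic.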

ThinkBlue1991 · 2024-11-22 01:50

It cannot be accessed from the browser; the error is: 404

ksd · 2024-11-22 02:25

That is an nginx page. Check whether nginx is running on this server and has taken over Rancher's port.

ThinkBlue1991 · 2024-11-22 02:34

nginx was indeed running, but I have already disabled nginx's listen 443.

[root@app1 nginx]# docker ps |grep 443
33d2887f20d4   registry.cn-hangzhou.aliyuncs.com/rancher/rancher:v2.7.1                "entrypoint.sh"           3 weeks ago   Up 13 days                     0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   rancher
[root@app1 nginx]# ss -tlnp|grep 443
LISTEN 0      32768        0.0.0.0:443        0.0.0.0:*    users:(("docker-proxy",pid=141106,fd=4))
LISTEN 0      32768              *:6443             *:*    users:(("kube-apiserver",pid=2178420,fd=7))
[root@app1 nginx]# ps -aux|grep 141106
root      141106  0.0  0.0 1604400 9384 ?        Sl   11月08   0:00 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 443 -container-ip 172.18.0.2 -container-port 443

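Judging by the docker and ps output, port 443 should belong to rancher-server. rancher-server does not use nginx behind it, does it?

ksd · 2024-11-22 05:38

No, it does not use nginx. Did you install Rancher and the downstream cluster on the same host?

ThinkBlue1991 · 2024-11-22 05:54

What does "downstream cluster" refer to?

I currently have three nodes: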

node1 rancher-server rancher-agent
node2 rancher-agent
node3 rancher-agent

ksd · 2024-11-22 07:00

Then the problem is on node1. Rancher uses port 443, and the ingress controller on the downstream cluster's nodes also uses port 443. After the restart, the ingress controller's 443 took over Rancher's 443, which is why you cannot reach Rancher.
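
One way to see which component actually receives traffic for port 443 on node1 is to list the NAT rules for that port. This is a sketch under an assumption the thread does not confirm: that the cluster's CNI implements hostPort with iptables DNAT rules. If so, such a rule can redirect a connection before docker-proxy ever sees it, which would be consistent with ss still showing docker-proxy listening on 443. The helper name dnat_rules_for_port is hypothetical.

```shell
#!/bin/sh
# Read rules in iptables-save format on stdin and print the DNAT rules
# whose TCP destination port is exactly the given port (443 does not match 6443).
dnat_rules_for_port() {
    grep -E -- "--dport $1( |\$)" | grep -- '-j DNAT'
}

# Usage on node1, as root:
# iptables-save -t nat | dnat_rules_for_port 443
```

If more than one rule is printed, the target address of each rule shows whether it points at the rancher container or at an ingress controller pod.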

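ThinkBlue1991 · 2024-11-22 07:21

It really was the ingress controller.

nginx-ingress-controller took over the ports directly through hostPort.

A question: ingress-nginx is a namespace created by default when Rancher was set up. What is it for? Can it be deleted? Or can the DaemonSet's configuration be changed?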

kubectl describe ds nginx-ingress-controller -n ingress-nginx
Name:           nginx-ingress-controller
Selector:       app=ingress-nginx
Node-Selector:  <none>
Labels:         app.kubernetes.io/component=controller
                app.kubernetes.io/instance=ingress-nginx
                app.kubernetes.io/name=ingress-nginx
                app.kubernetes.io/version=1.5.1
Annotations:    deprecated.daemonset.template.generation: 1
                field.cattle.io/publicEndpoints:
                  [{"nodeName":":app1","addresses":["172.17.110.83"],"port":80,"protocol":"TCP","podName":"ingress-nginx:nginx-ingress-controller-lmfnq","al...
Desired Number of Nodes Scheduled: 3
Current Number of Nodes Scheduled: 3
Number of Nodes Scheduled with Up-to-date Pods: 3
Number of Nodes Scheduled with Available Pods: 3
Number of Nodes Misscheduled: 0
Pods Status:  3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app=ingress-nginx
                    app.kubernetes.io/component=controller
                    app.kubernetes.io/instance=ingress-nginx
                    app.kubernetes.io/name=ingress-nginx
  Service Account:  ingress-nginx
  Containers:
   controller:
    Image:       registry.cn-hangzhou.aliyuncs.com/rancher/nginx-ingress-controller:nginx-1.5.1-rancher2
    Ports:       80/TCP, 443/TCP, 8443/TCP
    Host Ports:  80/TCP, 443/TCP, 0/TCP
    Args:
      /nginx-ingress-controller
      --election-id=ingress-controller-leader-nginx
      --controller-class=k8s.io/ingress-nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
      --udp-services-configmap=$(POD_NAMESPACE)/udp-services
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
      --watch-ingress-without-class=true
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:        (v1:metadata.name)
      POD_NAMESPACE:   (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
  Volumes:
   webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-admission
    Optional:    false
Events:          <none>

ksd · 2024-11-22 07:31

Which namespace?

ThinkBlue1991 · 2024-11-22 09:22

This one: ingress-nginx

ThinkBlue1991 · 2024-11-23 00:53

Thanks @ksd

The problem is solved. The root cause was that the DaemonSet [nginx-ingress-controller] in the ingress-nginx namespace created by Kubernetes maps hostPort 443:

...
        name: controller
        ports:
        - containerPort: 80
          hostPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          hostPort: 443
          name: https
          protocol: TCP
        - containerPort: 8443
          name: webhook
          protocol: TCP
        readinessProbe:
...

Solutions:

1. Edit nginx-ingress-controller and remove the hostPort: 443 and hostPort: 80 settings

kubectl edit ds  nginx-ingress-controller -n ingress-nginx

2. When creating rancher-server, map ports 80 and 443 to other host ports, for example 8080 and 4443

docker run -d --restart=unless-stopped \
  -p 8080:80 -p 4443:443 \
  --privileged \
  -e CATTLE_SYSTEM_DEFAULT_REGISTRY=registry.cn-hangzhou.aliyuncs.com \
  --name rancher \
  registry.cn-hangzhou.aliyuncs.com/rancher/rancher:v2.7.1
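
For the hostPort removal step above, a non-interactive alternative to kubectl edit might look like the following. It is a sketch, not a verified procedure: the container index 0 and the port indexes 0 (http) and 1 (https) are assumptions taken from the YAML excerpt in this thread, and the function name remove_ingress_hostports is hypothetical.

```shell
#!/bin/sh
# JSON Patch that drops hostPort from the first two port entries of the first container.
# Indexes 0 (http) and 1 (https) are assumptions based on the YAML excerpt in this thread.
HOSTPORT_PATCH='[
  {"op": "remove", "path": "/spec/template/spec/containers/0/ports/0/hostPort"},
  {"op": "remove", "path": "/spec/template/spec/containers/0/ports/1/hostPort"}
]'

# Apply the patch to the DaemonSet named in the thread.
remove_ingress_hostports() {
    kubectl patch ds nginx-ingress-controller -n ingress-nginx \
        --type=json -p "$HOSTPORT_PATCH"
}

# Usage: remove_ingress_hostports
```

Without hostPort, ingress traffic no longer reaches the controller through ports 80 and 443 of the nodes, and whatever tooling manages this DaemonSet may restore the original spec later, so both points are worth checking before relying on this change.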

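One more question: after rancher-server has been started, can the container's port mapping still be changed? And after the change, how do the other agents discover the server? @ksd

ksd · 2024-11-25 01:03

That amounts to changing Rancher's IP address. It is very complicated to do and not recommended; you would be better off reinstalling from scratch.

ThinkBlue1991 · 2024-11-25 01:34

OK, understood, thanks.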