Rancher Server setup
- Rancher version (RKE2 node versions from kubectl get nodes):
k8s-agent-01 Ready 369d v1.27.8+rke2r1
k8s-server-01 Ready control-plane,etcd,master 374d v1.27.8+rke2r1
k8s-server-02 Ready control-plane,etcd,master 374d v1.27.8+rke2r1
k8s-server-03 Ready control-plane,etcd,master 374d v1.27.8+rke2r1
- Node CPU / kernel version:
Linux k8s-server-01 5.10.178 #1 SMP Thu Jul 13 08:45:43 UTC 2023 x86_64 GNU/Linux
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
- Online or offline deployment: offline (air-gapped)
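Problem description:
After installation, RKE2 keeps /var/lib/rancher/rke2 on the system disk by default. The system disk is small, so we need to move this directory to a data disk. After the migration we restarted the rke2-server / rke2-agent services. The cluster came back up, but it looks as if the underlying container image files were damaged.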
Steps to reproduce:
1. Cluster with 3 server nodes + 1 agent node. Stop the RKE2 services: systemctl stop rke2-server ; systemctl stop rke2-agent
2. Back up the current RKE2 directory: mv /var/lib/rancher/rke2/* /var/lib/rancher/rke2_bak
3. Mount the data disk: mount /dev/nvme1n1 /var/lib/rancher/rke2/
4. Copy the backup into the mounted directory: cp -r /var/lib/rancher/rke2_bak/* /var/lib/rancher/rke2/
5. Start the RKE2 services: systemctl start rke2-server ; systemctl start rke2-agent
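For comparison, step 4 could be done with a copy that keeps file metadata, which is what the cp -ra suspicion under "Additional context" points to. This is a minimal sketch assuming GNU coreutils on the nodes; migrate_dir is a hypothetical helper name, and it has not been verified that preserving metadata alone avoids every symptom reported here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of RKE2): copy the contents of one data
# directory into another while keeping ownership, permission bits, timestamps,
# symlinks, hard links and extended attributes.
# In GNU coreutils, cp -a is equivalent to cp -dR --preserve=all.
set -euo pipefail

migrate_dir() {
  local src="$1" dst="$2"
  mkdir -p "$dst"
  # "$src/." copies the *contents* of src (dotfiles included) into dst,
  # instead of creating dst/<basename of src>.
  cp -a "$src/." "$dst/"
}
```

Usage, with rke2-server / rke2-agent stopped and the data disk already mounted: migrate_dir /var/lib/rancher/rke2_bak /var/lib/rancher/rke2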
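Result:
1. In some application namespaces, the root filesystem of some pods was empty.
2. Some pods failed to start because of permission errors. The most important of these is used as an example (logs are pasted below).
3. Every pod whose startup depends on the network plugin failed to start.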
Expected result:
After the services restart, the cluster is healthy, application pods are unaffected, and all other pods run normally.
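Additional context:
1. Right after the failure, we compared the rke2 directory with rke2_bak and found that the permissions of some directories and files no longer matched. We suspect the cause is using cp -r instead of cp -ra (without -a, cp does not preserve ownership, timestamps, hard links or extended attributes).
2. The application pods whose root filesystem was empty recovered after being restarted. We suspect this was caused by the image files having moved to a different directory.
3. The network plugin is canal + multus.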
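The comparison in item 1 of the additional context can be made repeatable by dumping path, mode and owner for both trees and diffing the listings. A minimal sketch assuming bash plus GNU find and diff; tree_meta and compare_trees are hypothetical names. It only covers permission bits, numeric owner and file type, not extended attributes or file capabilities.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: list every entry of a directory tree as
# "<relative path><TAB><octal mode> <uid>:<gid> <type>" and diff two listings.
# Extended attributes / file capabilities are NOT compared here.
set -euo pipefail

tree_meta() {
  # GNU find: %P = path relative to $1, %m = octal permission bits,
  # %U/%G = numeric uid/gid, %y = file type (f, d, l, ...).
  find "$1" -mindepth 1 -printf '%P\t%m %U:%G %y\n' | LC_ALL=C sort
}

compare_trees() {
  # Exit status 0 = identical metadata, 1 = differences
  # ("<" lines come from $1, ">" lines from $2).
  diff <(tree_meta "$1") <(tree_meta "$2")
}
```

Usage: compare_trees /var/lib/rancher/rke2_bak /var/lib/rancher/rke2 > rke2-meta.diff. On a populated RKE2 data directory the output can be very large.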
cat /etc/rancher/rke2/config.yaml
tls-san:
- k8s-server-01
- k8s-server-02
- k8s-server-03
node-name: k8s-server-01
# bind-address: 0.0.0.0
# data-dir: /var/lib/rancher/rke2
# cluster-cidr: 10.42.0.0/16
# service-cidr: 10.43.0.0/16
# service-node-port-range: 30000-32767
# cluster-domain: cluster.local
bind-address: 172.29.71.11
node-ip: 172.29.71.11
cni:
- multus
- canal
[rke2-ingress-nginx-controller-pmt96] [Pod startup failure log]
2025-01-24T17:27:27.754575571+08:00 stderr F E0124 09:27:27.754455 7 main.go:157] "unexpected error obtaining NGINX version" err="fork/exec /usr/bin/nginx: permission denied"
2025-01-24T17:27:27.754597953+08:00 stdout F -------------------------------------------------------------------------------
2025-01-24T17:27:27.754653035+08:00 stdout F NGINX Ingress controller
2025-01-24T17:27:27.754660472+08:00 stdout F Release: nginx-1.9.3-hardened1
2025-01-24T17:27:27.754666549+08:00 stdout F Build: git-1d7cec346
2025-01-24T17:27:27.754672379+08:00 stdout F Repository: https://github.com/rancher/ingress-nginx.git
2025-01-24T17:27:27.754677974+08:00 stdout F N/A
2025-01-24T17:27:27.754684948+08:00 stdout F -------------------------------------------------------------------------------
2025-01-24T17:27:27.754690686+08:00 stdout F
2025-01-24T17:27:27.754864771+08:00 stderr F W0124 09:27:27.754812 7 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2025-01-24T17:27:27.754972379+08:00 stderr F I0124 09:27:27.754930 7 main.go:205] "Creating API client" host="https://10.43.0.1:443"
2025-01-24T17:27:27.761434634+08:00 stderr F I0124 09:27:27.761360 7 main.go:249] "Running in Kubernetes cluster" major="1" minor="27" git="v1.27.8+rke2r1" state="clean" commit="66fee42707cd7f5a89f1987f7cb81b02dd19161c" platform="linux/amd64"
2025-01-24T17:27:27.875345439+08:00 stderr F I0124 09:27:27.875265 7 main.go:101] "SSL fake certificate created" file="/etc/ingress-controller/ssl/default-fake-certificate.pem"
2025-01-24T17:27:27.900138016+08:00 stderr F I0124 09:27:27.900040 7 ssl.go:536] "loading tls certificate" path="/usr/local/certificates/cert" key="/usr/local/certificates/key"
2025-01-24T17:27:27.913263125+08:00 stderr F I0124 09:27:27.913181 7 nginx.go:260] "Starting NGINX Ingress controller"
2025-01-24T17:27:27.950592885+08:00 stderr F I0124 09:27:27.950500 7 event.go:298] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"starmap", Name:"rke2-ingress-nginx-controller", UID:"92fe7550-ff92-4374-9d0d-e3da135e4269", APIVersion:"v1", ResourceVersion:"22034247", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap starmap/rke2-ingress-nginx-controller
2025-01-24T17:27:29.017475752+08:00 stderr F I0124 09:27:29.017372 7 store.go:440] "Found valid IngressClass" ingress="starmap/ingress-starmap-adminapp" ingressclass="nginx"
2025-01-24T17:27:29.017689454+08:00 stderr F I0124 09:27:29.017581 7 event.go:298] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"starmap", Name:"ingress-starmap-adminapp", UID:"7b7a90a8-07b1-460d-98d6-5620b6b91ebb", APIVersion:"networking.k8s.io/v1", ResourceVersion:"155049111", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
2025-01-24T17:27:29.018221017+08:00 stderr F I0124 09:27:29.018167 7 store.go:440] "Found valid IngressClass" ingress="starmap/ingress-starmap-adminweb" ingressclass="nginx"
2025-01-24T17:27:29.018345959+08:00 stderr F I0124 09:27:29.018286 7 event.go:298] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"starmap", Name:"ingress-starmap-adminweb", UID:"3dbe8680-2246-4f65-a923-471cf68ae8d3", APIVersion:"networking.k8s.io/v1", ResourceVersion:"155049110", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
2025-01-24T17:27:29.115101026+08:00 stderr F I0124 09:27:29.115017 7 nginx.go:303] "Starting NGINX process"
2025-01-24T17:27:29.115131435+08:00 stderr F I0124 09:27:29.115023 7 leaderelection.go:245] attempting to acquire leader lease starmap/rke2-ingress-nginx-leader...
2025-01-24T17:27:29.115712597+08:00 stderr F F0124 09:27:29.115657 7 nginx.go:421] NGINX error: fork/exec /usr/bin/nginx: permission denied
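The fatal error is fork/exec /usr/bin/nginx: permission denied. Among other causes, exec fails with "permission denied" when the file lacks execute permission for the calling user or when a parent directory lacks search permission, so one thing worth checking is the mode and owner of that binary and its parent directories inside containerd's snapshot store. A minimal sketch: the snapshot root below assumes RKE2's default containerd location with the overlayfs snapshotter, and show_path_perms is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: for every <snapshot-id>/fs/<rel> found under a
# snapshot root, print "<octal mode> <uid>:<gid> <path>" for the file and each
# parent directory up to that snapshot's fs/ directory.
set -euo pipefail

# Assumption: RKE2's default containerd root with the overlayfs snapshotter.
SNAP_ROOT="${SNAP_ROOT:-/var/lib/rancher/rke2/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots}"

show_path_perms() {
  local root="$1" rel="$2" f d
  # In find -path patterns "*" also matches "/", so the file is found at any depth.
  find "$root" -path "*/fs/$rel" -type f -print0 |
    while IFS= read -r -d '' f; do
      d="$f"
      while :; do
        stat -c '%a %u:%g %n' "$d"
        # Stop at the snapshot's fs/ directory (or at / as a safety net).
        if [[ "$(basename "$d")" == fs || "$d" == / ]]; then break; fi
        d="$(dirname "$d")"
      done
    done
}
```

Usage on the affected node: show_path_perms "$SNAP_ROOT" usr/bin/nginx. Numeric uid:gid is printed because container users usually do not exist in the host's /etc/passwd.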
[rke2-ingress-nginx-controller-x2vjt] [kubectl describe output showing the network plugin failure]
[root@k8s-server-01 ~]# kubectl describe po -n starmap rke2-ingress-nginx-controller-x2vjt
Name:                 rke2-ingress-nginx-controller-x2vjt
Namespace:            starmap
Priority:             0
Service Account:      rke2-ingress-nginx
Node:                 k8s-server-02/172.29.71.12
Start Time:           Fri, 24 Jan 2025 17:34:16 +0800
Labels:               app.kubernetes.io/component=controller
                      app.kubernetes.io/instance=rke2-ingress-nginx
                      app.kubernetes.io/managed-by=Helm
                      app.kubernetes.io/name=rke2-ingress-nginx
                      app.kubernetes.io/part-of=rke2-ingress-nginx
                      app.kubernetes.io/version=1.9.3
                      controller-revision-hash=dc8b796d5
                      helm.sh/chart=rke2-ingress-nginx-4.8.200
                      pod-template-generation=6
Annotations:          <none>
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        DaemonSet/rke2-ingress-nginx-controller
Containers:
  rke2-ingress-nginx-controller:
    Container ID:
    Image:         registry.ibdp.webray.com.cn:51808/rancher/nginx-ingress-controller:nginx-1.9.3-hardened1
    Image ID:
    Ports:         80/TCP, 443/TCP, 8443/TCP
    Host Ports:    80/TCP, 443/TCP, 0/TCP
    Args:
      /nginx-ingress-controller
      --election-id=rke2-ingress-nginx-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/rke2-ingress-nginx-controller
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
      --watch-ingress-without-class=true
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:      100m
      memory:   90Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       rke2-ingress-nginx-controller-x2vjt (v1:metadata.name)
      POD_NAMESPACE:  starmap (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cnrwb (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rke2-ingress-nginx-admission
    Optional:    false
  kube-api-access-cnrwb:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason                  Age                   From               Message
  ----     ------                  ----                  ----               -------
  Normal   Scheduled               13m                   default-scheduler  Successfully assigned starmap/rke2-ingress-nginx-controller-x2vjt to k8s-server-02
  Warning  FailedCreatePodSandBox  13m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "2459c7418fefbd98f6123bcbbcd8406908ccf06c5bc79d0d70261eaa351e5294": plugin type="multus" name="multus-cni-network" failed (add): Multus: [starmap/rke2-ingress-nginx-controller-x2vjt/f2acc921-df9c-46ab-bdc7-675e8d14722f]: error getting pod: Unauthorized
  Warning  FailedCreatePodSandBox  13m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "effea7e5faca9a93b57207776ba8d59b3324fcf8b233beedddae8c443ffff5e3": plugin type="multus" name="multus-cni-network" failed (add): Multus: [starmap/rke2-ingress-nginx-controller-x2vjt/f2acc921-df9c-46ab-bdc7-675e8d14722f]: error getting pod: Unauthorized
  Warning  FailedCreatePodSandBox  12m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "cec68ff07fb9ba49d11e41b5f7511d72e2357dee3499e5ac7f06fabd8798c732": plugin type="multus" name="multus-cni-network" failed (add): Multus: [starmap/rke2-ingress-nginx-controller-x2vjt/f2acc921-df9c-46ab-bdc7-675e8d14722f]: error getting pod: Get "https://[10.43.0.1]:443/api/v1/namespaces/starmap/pods/rke2-ingress-nginx-controller-x2vjt?timeout=1m0s": dial tcp 10.43.0.1:443: connect: connection timed out
  Warning  FailedCreatePodSandBox  12m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a2dc15bee830df99350f42e8ba6dd75b526d17ef61be02858a23cdd8ae43fd77": plugin type="multus" name="multus-cni-network" failed (add): Multus: [starmap/rke2-ingress-nginx-controller-x2vjt/f2acc921-df9c-46ab-bdc7-675e8d14722f]: error getting pod: Unauthorized
  Warning  FailedCreatePodSandBox  12m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "9c88c9a811541bc9ae4729f637e567aa834b5ba1de4bad87261f5fde7483711c": plugin type="multus" name="multus-cni-network" failed (add): Multus: [starmap/rke2-ingress-nginx-controller-x2vjt/f2acc921-df9c-46ab-bdc7-675e8d14722f]: error getting pod: Unauthorized
  Warning  FailedCreatePodSandBox  12m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "1ead31a1e2f563a7647a0144735afe11d50fb22190295942eb8dc7999dd0b474": plugin type="multus" name="multus-cni-network" failed (add): Multus: [starmap/rke2-ingress-nginx-controller-x2vjt/f2acc921-df9c-46ab-bdc7-675e8d14722f]: error getting pod: Get "https://[10.43.0.1]:443/api/v1/namespaces/starmap/pods/rke2-ingress-nginx-controller-x2vjt?timeout=1m0s": dial tcp 10.43.0.1:443: connect: connection timed out
  Warning  FailedCreatePodSandBox  11m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4d268a1a7f902beb34f708589dde576be079c67f0a1f10761601105514703194": plugin type="multus" name="multus-cni-network" failed (add): Multus: [starmap/rke2-ingress-nginx-controller-x2vjt/f2acc921-df9c-46ab-bdc7-675e8d14722f]: error getting pod: Get "https://[10.43.0.1]:443/api/v1/namespaces/starmap/pods/rke2-ingress-nginx-controller-x2vjt?timeout=1m0s": dial tcp 10.43.0.1:443: connect: connection timed out
  Warning  FailedCreatePodSandBox  11m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "c95deb1fba26f07bfe413798d8c97c559fd1f52586117fbecd955e8244ae0316": plugin type="multus" name="multus-cni-network" failed (add): Multus: [starmap/rke2-ingress-nginx-controller-x2vjt/f2acc921-df9c-46ab-bdc7-675e8d14722f]: error getting pod: Unauthorized
  Warning  FailedCreatePodSandBox  11m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "e829cae93364a48e09e464a2b0a6ff43654cee343b0e83c5c3d2a81172163a7f": plugin type="multus" name="multus-cni-network" failed (add): Multus: [starmap/rke2-ingress-nginx-controller-x2vjt/f2acc921-df9c-46ab-bdc7-675e8d14722f]: error getting pod: Unauthorized
  Warning  FailedCreatePodSandBox  3m20s (x31 over 11m)  kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "dda93492ac045b933557567e322e6cfc5fcc29e26c4cb747c3b578df42ad85b8": plugin type="multus" name="multus-cni-network" failed (add): Multus: [starmap/rke2-ingress-nginx-controller-x2vjt/f2acc921-df9c-46ab-bdc7-675e8d14722f]: error getting pod: Get "https://[10.43.0.1]:443/api/v1/namespaces/starmap/pods/rke2-ingress-nginx-controller-x2vjt?timeout=1m0s": dial tcp 10.43.0.1:443: connect: connection timed out