DevOps

CKA Example Reminder - 36. Worker Node Failure

Vince_rf 2025. 1. 11. 22:57

Fix the issue on node01.

controlplane ~ ➜  ssh node01
Welcome to Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-1072-gcp x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/pro

This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.
Last login: Sat Jan 11 13:16:45 2025 from 192.168.231.130

node01 ~ ➜  kubectl get all
E0111 13:18:35.407504    6889 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: the server could not find the requested resource"
E0111 13:18:35.410076    6889 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: the server could not find the requested resource"
E0111 13:18:35.412402    6889 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: the server could not find the requested resource"
E0111 13:18:35.414786    6889 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: the server could not find the requested resource"
E0111 13:18:35.417138    6889 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: the server could not find the requested resource"
Error from server (NotFound): the server could not find the requested resource


First, check the status of kubelet.


node01 ~ ➜  systemctl status kubelet
○ kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: inactive (dead) since Sat 2025-01-11 13:14:54 UTC; 6min ago
       Docs: https://kubernetes.io/docs/
    Process: 2578 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $>
   Main PID: 2578 (code=exited, status=0/SUCCESS)

Jan 11 13:12:34 node01 kubelet[2578]: I0111 13:12:34.339103    2578 reconciler_common.go:245>
Jan 11 13:12:34 node01 kubelet[2578]: I0111 13:12:34.339124    2578 reconciler_common.go:245>
Jan 11 13:12:34 node01 kubelet[2578]: I0111 13:12:34.339143    2578 reconciler_common.go:245>
Jan 11 13:12:34 node01 kubelet[2578]: I0111 13:12:34.339162    2578 reconciler_common.go:245>
Jan 11 13:12:34 node01 kubelet[2578]: I0111 13:12:34.339182    2578 reconciler_common.go:245>
Jan 11 13:12:34 node01 kubelet[2578]: I0111 13:12:34.339202    2578 reconciler_common.go:245>
Jan 11 13:12:34 node01 kubelet[2578]: I0111 13:12:34.445047    2578 swap_util.go:54] "Runnin>
Jan 11 13:12:36 node01 kubelet[2578]: I0111 13:12:36.419197    2578 pod_startup_latency_trac>
Jan 11 13:12:37 node01 kubelet[2578]: I0111 13:12:37.733425    2578 kubelet_node_status.go:4>
Jan 11 13:12:39 node01 kubelet[2578]: I0111 13:12:39.431964    2578 pod_startup_latency_trac>

The log lines come out truncated, so let's use the --no-pager option along with -l to show full lines.

node01 ~ ✖ systemctl status kubelet --no-pager -l
○ kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: inactive (dead) since Sat 2025-01-11 13:14:54 UTC; 8min ago
       Docs: https://kubernetes.io/docs/
    Process: 2578 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=0/SUCCESS)
   Main PID: 2578 (code=exited, status=0/SUCCESS)

Jan 11 13:12:34 node01 kubelet[2578]: I0111 13:12:34.339103    2578 reconciler_common.go:245] "operationExecutor.VerifyControllerAttachedVolume started for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/8cc9821e-eb5d-45f4-8012-e2654b1f4647-xtables-lock\") pod \"kube-proxy-jpx5p\" (UID: \"8cc9821e-eb5d-45f4-8012-e2654b1f4647\") " pod="kube-system/kube-proxy-jpx5p"
Jan 11 13:12:34 node01 kubelet[2578]: I0111 13:12:34.339124    2578 reconciler_common.go:245] "operationExecutor.VerifyControllerAttachedVolume started for volume \"lib-modules\" (UniqueName: \"kubernetes.io/host-path/8cc9821e-eb5d-45f4-8012-e2654b1f4647-lib-modules\") pod \"kube-proxy-jpx5p\" (UID: \"8cc9821e-eb5d-45f4-8012-e2654b1f4647\") " pod="kube-system/kube-proxy-jpx5p"
Jan 11 13:12:34 node01 kubelet[2578]: I0111 13:12:34.339143    2578 reconciler_common.go:245] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-fgqr7\" (UniqueName: \"kubernetes.io/projected/8cc9821e-eb5d-45f4-8012-e2654b1f4647-kube-api-access-fgqr7\") pod \"kube-proxy-jpx5p\" (UID: \"8cc9821e-eb5d-45f4-8012-e2654b1f4647\") " pod="kube-system/kube-proxy-jpx5p"
Jan 11 13:12:34 node01 kubelet[2578]: I0111 13:12:34.339162    2578 reconciler_common.go:245] "operationExecutor.VerifyControllerAttachedVolume started for volume \"cni\" (UniqueName: \"kubernetes.io/host-path/58f9b110-9b9a-49d4-9269-07c440c8a776-cni\") pod \"kube-flannel-ds-8f554\" (UID: \"58f9b110-9b9a-49d4-9269-07c440c8a776\") " pod="kube-flannel/kube-flannel-ds-8f554"
Jan 11 13:12:34 node01 kubelet[2578]: I0111 13:12:34.339182    2578 reconciler_common.go:245] "operationExecutor.VerifyControllerAttachedVolume started for volume \"flannel-cfg\" (UniqueName: \"kubernetes.io/configmap/58f9b110-9b9a-49d4-9269-07c440c8a776-flannel-cfg\") pod \"kube-flannel-ds-8f554\" (UID: \"58f9b110-9b9a-49d4-9269-07c440c8a776\") " pod="kube-flannel/kube-flannel-ds-8f554"
Jan 11 13:12:34 node01 kubelet[2578]: I0111 13:12:34.339202    2578 reconciler_common.go:245] "operationExecutor.VerifyControllerAttachedVolume started for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/58f9b110-9b9a-49d4-9269-07c440c8a776-xtables-lock\") pod \"kube-flannel-ds-8f554\" (UID: \"58f9b110-9b9a-49d4-9269-07c440c8a776\") " pod="kube-flannel/kube-flannel-ds-8f554"
Jan 11 13:12:34 node01 kubelet[2578]: I0111 13:12:34.445047    2578 swap_util.go:54] "Running under a user namespace - tmpfs noswap is not supported"
Jan 11 13:12:36 node01 kubelet[2578]: I0111 13:12:36.419197    2578 pod_startup_latency_tracker.go:104] "Observed pod startup duration" pod="kube-system/kube-proxy-jpx5p" podStartSLOduration=3.419179267 podStartE2EDuration="3.419179267s" podCreationTimestamp="2025-01-11 13:12:33 +0000 UTC" firstStartedPulling="0001-01-01 00:00:00 +0000 UTC" lastFinishedPulling="0001-01-01 00:00:00 +0000 UTC" observedRunningTime="2025-01-11 13:12:36.418864096 +0000 UTC m=+4.318463660" watchObservedRunningTime="2025-01-11 13:12:36.419179267 +0000 UTC m=+4.318778804"
Jan 11 13:12:37 node01 kubelet[2578]: I0111 13:12:37.733425    2578 kubelet_node_status.go:488] "Fast updating node status as it just became ready"
Jan 11 13:12:39 node01 kubelet[2578]: I0111 13:12:39.431964    2578 pod_startup_latency_tracker.go:104] "Observed pod startup duration" pod="kube-flannel/kube-flannel-ds-8f554" podStartSLOduration=6.431947264 podStartE2EDuration="6.431947264s" podCreationTimestamp="2025-01-11 13:12:33 +0000 UTC" firstStartedPulling="0001-01-01 00:00:00 +0000 UTC" lastFinishedPulling="0001-01-01 00:00:00 +0000 UTC" observedRunningTime="2025-01-11 13:12:39.431826929 +0000 UTC m=+7.331426463" watchObservedRunningTime="2025-01-11 13:12:39.431947264 +0000 UTC m=+7.331546782"

The service is inactive (dead), and no errors stand out in the logs.

Let's start kubelet.

node01 ~ ✖ systemctl start kubelet

node01 ~ ➜  systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Sat 2025-01-11 13:25:22 UTC; 4s ago
       Docs: https://kubernetes.io/docs/
   Main PID: 10504 (kubelet)
      Tasks: 20 (limit: 77143)
     Memory: 27.5M
     CGroup: /system.slice/kubelet.service
             └─10504 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/k>

Jan 11 13:25:23 node01 kubelet[10504]: I0111 13:25:23.304148   10504 kubelet_node_status.go:111] "Node was previously r>
Jan 11 13:25:23 node01 kubelet[10504]: I0111 13:25:23.304224   10504 kubelet_node_status.go:75] "Successfully registere>
Jan 11 13:25:24 node01 kubelet[10504]: I0111 13:25:24.028180   10504 apiserver.go:52] "Watching apiserver"
Jan 11 13:25:24 node01 kubelet[10504]: I0111 13:25:24.039340   10504 desired_state_of_world_populator.go:154] "Finished>
Jan 11 13:25:24 node01 kubelet[10504]: I0111 13:25:24.093474   10504 reconciler_common.go:245] "operationExecutor.Verif>
Jan 11 13:25:24 node01 kubelet[10504]: I0111 13:25:24.093526   10504 reconciler_common.go:245] "operationExecutor.Verif>
Jan 11 13:25:24 node01 kubelet[10504]: I0111 13:25:24.093552   10504 reconciler_common.go:245] "operationExecutor.Verif>
Jan 11 13:25:24 node01 kubelet[10504]: I0111 13:25:24.093585   10504 reconciler_common.go:245] "operationExecutor.Verif>
Jan 11 13:25:24 node01 kubelet[10504]: I0111 13:25:24.093606   10504 reconciler_common.go:245] "operationExecutor.Verif>
Jan 11 13:25:24 node01 kubelet[10504]: I0111 13:25:24.093626   10504 reconciler_common.go:245] "operationExecutor.Verif>


node01 ~ ➜  exit
logout
Connection to node01 closed.

controlplane ~ ➜  kubectl get node
NAME           STATUS   ROLES           AGE   VERSION
controlplane   Ready    control-plane   14m   v1.31.0
node01         Ready    <none>          14m   v1.31.0

Back on controlplane, confirm that node01 is Ready.
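One note on this first fix: the unit was already enabled (Loaded: ... enabled), so a plain start was enough. Had it been disabled, it would also need enabling so it comes back after a reboot; a minimal sketch:

systemctl enable --now kubelet   # enable at boot and start immediately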




Fix the issue with the cluster.

controlplane ~ ➜  kubectl get all -A
NAMESPACE      NAME                                       READY   STATUS    RESTARTS   AGE
kube-flannel   pod/kube-flannel-ds-2vclv                  1/1     Running   0          17m
kube-flannel   pod/kube-flannel-ds-8f554                  1/1     Running   0          16m
kube-system    pod/coredns-77d6fd4654-nc68j               1/1     Running   0          17m
kube-system    pod/coredns-77d6fd4654-wk6mq               1/1     Running   0          17m
kube-system    pod/etcd-controlplane                      1/1     Running   0          17m
kube-system    pod/kube-apiserver-controlplane            1/1     Running   0          17m
kube-system    pod/kube-controller-manager-controlplane   1/1     Running   0          17m
kube-system    pod/kube-proxy-7brg6                       1/1     Running   0          17m
kube-system    pod/kube-proxy-jpx5p                       1/1     Running   0          16m
kube-system    pod/kube-scheduler-controlplane            1/1     Running   0          17m

NAMESPACE     NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE
default       service/kubernetes   ClusterIP   172.20.0.1    <none>        443/TCP                  17m
kube-system   service/kube-dns     ClusterIP   172.20.0.10   <none>        53/UDP,53/TCP,9153/TCP   17m

NAMESPACE      NAME                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-flannel   daemonset.apps/kube-flannel-ds   2         2         1       2            1           <none>                   17m
kube-system    daemonset.apps/kube-proxy        2         2         1       2            1           kubernetes.io/os=linux   17m

NAMESPACE     NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/coredns   2/2     2            2           17m

NAMESPACE     NAME                                 DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/coredns-77d6fd4654   2         2         2       17m

controlplane ~ ➜  kubectl get node
NAME           STATUS     ROLES           AGE   VERSION
controlplane   Ready      control-plane   17m   v1.31.0
node01         NotReady   <none>          16m   v1.31.0

node01 has a problem again.

Check the kubelet status on node01.

controlplane ~ ➜  ssh node01
Welcome to Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-1072-gcp x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/pro

This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.
Last login: Sat Jan 11 13:18:33 2025 from 192.168.231.130

node01 ~ ➜  systemctl status kubelet --no-pager -l
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: activating (auto-restart) (Result: exit-code) since Sat 2025-01-11 13:30:38 UTC; 6s ago
       Docs: https://kubernetes.io/docs/
    Process: 13789 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
   Main PID: 13789 (code=exited, status=1/FAILURE)


Use journalctl to dig in further.

node01 ~ ➜  journalctl -u kubelet --no-pager -l
Jan 11 13:31:29 node01 kubelet[14332]: E0111 13:31:29.796849   14332 run.go:72] "command failed" err="failed to construct kubelet dependencies: unable to load client CA file /etc/kubernetes/pki/WRONG-CA-FILE.crt: open /etc/kubernetes/pki/WRONG-CA-FILE.crt: no such file or directory"

It's a problem with the client CA certificate (.crt) file path.

Go to /var/lib/kubelet and inspect the config file.

node01 /var/lib/kubelet ➜  cat config.yaml 

apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/WRONG-CA-FILE.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupDriver: cgroupfs
clusterDNS:
- 172.20.0.10
clusterDomain: cluster.local
containerRuntimeEndpoint: ""
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMaximumGCAge: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
logging:
  flushFrequency: 0
  options:
    json:
      infoBufferSize: "0"
    text:
      infoBufferSize: "0"
  verbosity: 0
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
resolvConf: /run/systemd/resolve/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s




Found the incorrect value in clientCAFile.

Fix the path (see the sketch below), then restart kubelet.
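A minimal fix sketch, assuming the real client CA is the kubeadm default /etc/kubernetes/pki/ca.crt (worth verifying with ls first):

ls /etc/kubernetes/pki/                                  # confirm ca.crt actually exists here
sed -i 's|WRONG-CA-FILE.crt|ca.crt|' /var/lib/kubelet/config.yaml
grep clientCAFile /var/lib/kubelet/config.yaml           # expect: /etc/kubernetes/pki/ca.crt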

node01 /var/lib/kubelet ➜  systemctl restart kubelet && systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Sat 2025-01-11 13:35:24 UTC; 12ms ago
       Docs: https://kubernetes.io/docs/
   Main PID: 16993 (kubelet)
      Tasks: 6 (limit: 77143)
     Memory: 1.9M
     CGroup: /system.slice/kubelet.service
             └─16993 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfi>


node01 /var/lib/kubelet ➜  exit
logout
Connection to node01 closed.

controlplane ~ ➜  kubectl get node
NAME           STATUS   ROLES           AGE   VERSION
controlplane   Ready    control-plane   23m   v1.31.0
node01         Ready    <none>          23m   v1.31.0

Back on controlplane, confirm that node01 is Ready.




Fix the issue with the cluster.

controlplane ~ ➜  kubectl get all -A
NAMESPACE      NAME                                       READY   STATUS    RESTARTS   AGE
kube-flannel   pod/kube-flannel-ds-2vclv                  1/1     Running   0          25m
kube-flannel   pod/kube-flannel-ds-8f554                  1/1     Running   0          25m
kube-system    pod/coredns-77d6fd4654-nc68j               1/1     Running   0          25m
kube-system    pod/coredns-77d6fd4654-wk6mq               1/1     Running   0          25m
kube-system    pod/etcd-controlplane                      1/1     Running   0          25m
kube-system    pod/kube-apiserver-controlplane            1/1     Running   0          25m
kube-system    pod/kube-controller-manager-controlplane   1/1     Running   0          25m
kube-system    pod/kube-proxy-7brg6                       1/1     Running   0          25m
kube-system    pod/kube-proxy-jpx5p                       1/1     Running   0          25m
kube-system    pod/kube-scheduler-controlplane            1/1     Running   0          25m

NAMESPACE     NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE
default       service/kubernetes   ClusterIP   172.20.0.1    <none>        443/TCP                  25m
kube-system   service/kube-dns     ClusterIP   172.20.0.10   <none>        53/UDP,53/TCP,9153/TCP   25m

NAMESPACE      NAME                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-flannel   daemonset.apps/kube-flannel-ds   2         2         1       2            1           <none>                   25m
kube-system    daemonset.apps/kube-proxy        2         2         1       2            1           kubernetes.io/os=linux   25m

NAMESPACE     NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/coredns   2/2     2            2           25m

NAMESPACE     NAME                                 DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/coredns-77d6fd4654   2         2         2       25m

controlplane ~ ➜  kubectl get node
NAME           STATUS     ROLES           AGE   VERSION
controlplane   Ready      control-plane   25m   v1.31.0
node01         NotReady   <none>          25m   v1.31.0

node01 has a problem again.

Go to node01 and check the kubelet status.

node01 ~ ➜  systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Sat 2025-01-11 13:36:40 UTC; 1min 21s ago
       Docs: https://kubernetes.io/docs/
   Main PID: 17866 (kubelet)
      Tasks: 23 (limit: 77143)
     Memory: 28.2M
     CGroup: /system.slice/kubelet.service
             └─17866 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfi>

Jan 11 13:37:54 node01 kubelet[17866]: E0111 13:37:54.238767   17866 reflector.go:158] "Unhandled Error" err="k>
Jan 11 13:37:54 node01 kubelet[17866]: E0111 13:37:54.727279   17866 event.go:368] "Unable to write event (may >
Jan 11 13:37:56 node01 kubelet[17866]: E0111 13:37:56.218128   17866 controller.go:145] "Failed to ensure lease>
Jan 11 13:37:56 node01 kubelet[17866]: I0111 13:37:56.465277   17866 kubelet_node_status.go:72] "Attempting to >
Jan 11 13:37:56 node01 kubelet[17866]: E0111 13:37:56.466430   17866 kubelet_node_status.go:95] "Unable to regi>
Jan 11 13:37:57 node01 kubelet[17866]: W0111 13:37:57.080597   17866 reflector.go:561] k8s.io/client-go/informe>
Jan 11 13:37:57 node01 kubelet[17866]: E0111 13:37:57.080681   17866 reflector.go:158] "Unhandled Error" err="k>
Jan 11 13:37:58 node01 kubelet[17866]: W0111 13:37:58.055602   17866 reflector.go:561] k8s.io/client-go/informe>
Jan 11 13:37:58 node01 kubelet[17866]: E0111 13:37:58.055669   17866 reflector.go:158] "Unhandled Error" err="k>
Jan 11 13:38:00 node01 kubelet[17866]: E0111 13:38:00.729048   17866 eviction_manager.go:285] "Eviction manager




Check the kubelet logs.



node01 ~ ➜  journalctl -u kubelet --no-pager -l --lines 5
Jan 11 13:40:40 node01 kubelet[17866]: E0111 13:40:40.740250   17866 eviction_manager.go:285] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"node01\" not found"
Jan 11 13:40:44 node01 kubelet[17866]: E0111 13:40:44.112368   17866 event.go:368] "Unable to write event (may retry after sleeping)" err="Post \"https://controlplane:6553/api/v1/namespaces/default/events\": dial tcp 192.168.231.130:6553: connect: connection refused" event="&Event{ObjectMeta:{node01.1819a6dd443cce8f  default    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:Node,Namespace:,Name:node01,UID:node01,APIVersion:,ResourceVersion:,FieldPath:,},Reason:InvalidDiskCapacity,Message:invalid capacity 0 on image filesystem,Source:EventSource{Component:kubelet,Host:node01,},FirstTimestamp:2025-01-11 13:36:40.590855823 +0000 UTC m=+0.115143600,LastTimestamp:2025-01-11 13:36:40.590855823 +0000 UTC m=+0.115143600,Count:1,Type:Warning,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:kubelet,ReportingInstance:node01,}"
Jan 11 13:40:44 node01 kubelet[17866]: E0111 13:40:44.262173   17866 controller.go:145] "Failed to ensure lease exists, will retry" err="Get \"https://controlplane:6553/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/node01?timeout=10s\": dial tcp 192.168.231.130:6553: connect: connection refused" interval="7s"
Jan 11 13:40:44 node01 kubelet[17866]: I0111 13:40:44.526864   17866 kubelet_node_status.go:72] "Attempting to register node" node="node01"
Jan 11 13:40:44 node01 kubelet[17866]: E0111 13:40:44.528138   17866 kubelet_node_status.go:95] "Unable to register node with API server" err="Post \"https://controlplane:6553/api/v1/nodes\": dial tcp 192.168.231.130:6553: connect: connection refused" node="node01"

A connection refused error.

Communication with https://controlplane:6553/ seems to be failing.

First, we should check whether these requests are even reaching the kube-apiserver.
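Before hopping back to controlplane, a quick reachability sketch from node01 itself (assuming curl is installed; -k skips certificate verification since we only care whether the port answers):

curl -k https://controlplane:6553/healthz   # expect: connection refused, matching the kubelet logs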

node01 ~ ➜  exit
logout
Connection to node01 closed.

controlplane ~ ➜  netstat -tnlp | grep kube
tcp        0      0 127.0.0.1:10249         0.0.0.0:*               LISTEN      4634/kube-proxy     
tcp        0      0 127.0.0.1:10248         0.0.0.0:*               LISTEN      4169/kubelet        
tcp        0      0 127.0.0.1:10257         0.0.0.0:*               LISTEN      3224/kube-controlle 
tcp        0      0 127.0.0.1:10259         0.0.0.0:*               LISTEN      3790/kube-scheduler 
tcp6       0      0 :::6443                 :::*                    LISTEN      3457/kube-apiserver 
tcp6       0      0 :::8888                 :::*                    LISTEN      4324/kubectl        
tcp6       0      0 :::10256                :::*                    LISTEN      4634/kube-proxy     
tcp6       0      0 :::10250                :::*                    LISTEN      4169/kubelet  


The API server is listening on port 6443, not 6553.
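Another way to confirm this on a kubeadm cluster is to check the kube-apiserver static pod manifest on controlplane; a sketch (if no --secure-port flag is set, the API server uses the default, 6443):

grep -- '--secure-port' /etc/kubernetes/manifests/kube-apiserver.yaml   # no match means the default port, 6443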

controlplane ~ ➜  ssh node01
Welcome to Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-1072-gcp x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/pro

This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.
Last login: Sat Jan 11 13:37:58 2025 from 192.168.231.130

Back on node01 again,

check /etc/kubernetes/kubelet.conf.

node01 /etc/kubernetes ➜  cat kubelet.conf 

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURCVENDQWUyZ0F3SUJBZ0lJZFBTT2J6YVVPY293RFFZSktvWklodmNOQVFFTEJRQXdGVEVUTUJFR0ExVUUKQXhNS2EzVmlaWEp1WlhSbGN6QWVGdzB5TlRBeE1URXhNekEyTXpkYUZ3MHpOVEF4TURreE16RXhNemRhTUJVeApFekFSQmdOVkJBTVRDbXQxWW1WeWJtVjBaWE13Z2dFaU1BMEdDU3FHU0liM0RRRUJBUVVBQTRJQkR3QXdnZ0VLCkFvSUJBUURtRkRUSGQ0dmxzdXdud0dMYW1Sc09GNjIxblJMNW1oaWRFbjBsUlkxMjBhcU96ekNjbGRDTWEzMkMKMmFRYUhFZWZnQlpwRnJPR0FlWXdlUnNIQ25zSzVhSEVrdmJBaWRMMjZpRUo5cVlQTEROb0Z5YmUvY3BBRklwSwoyalNjTmtJbnRrVzdNTWtvWU9yaUUwdzRTRVRWazNCMnNCNWRlL2FtZGhORTJXakY0VlhkVXRCYUZlSDdDdVBHCkg2UFhSY1hGS2o4VzkxOEhDdDNhSStzOXpncld5R09ZM1QrTVVjaDR3SHRncEk0STN0NUpSaXNtN0xsdjgvZnEKYVMrQWh2bEFyRUg2YW1VbmdGVUowazRhMklBL2QvSDhKOEZyVmdGUmZYZnpianhrZjArS1FURzNVVHhwN2ljeQpkaEJJNWt6VTVveVRHL2VIY3ZDZHUrNnNJcTBYQWdNQkFBR2pXVEJYTUE0R0ExVWREd0VCL3dRRUF3SUNwREFQCkJnTlZIUk1CQWY4RUJUQURBUUgvTUIwR0ExVWREZ1FXQkJRelZkUW5DR00rZWxrME42Q24xa3ZIbEZPbzNUQVYKQmdOVkhSRUVEakFNZ2dwcmRXSmxjbTVsZEdWek1BMEdDU3FHU0liM0RRRUJDd1VBQTRJQkFRQUQxWHYwazlpMApmL2pwVjRVcVlaRUpFTFBJczIrSHBQSXZYaEhtSEcvbFkzSHgwZldQVmkxRTBNK0JZMVFXTStDWHZqU2NjVDlSCmxBRXpXaWprWFF2K2Y5eW12VEVoQ25WREU0ZzR2dkhLR282dHlRWVFVZmtCTldVUWltcDRmNE1UaTNRSkNOTnMKTzJSdUlTYVRtYlNlR2QwTnNCOFBqMkhuR3NraVFOZmxYaXZxNks3V0h1dC9PK0x1TU82akIxRnI5TWV3dldGNQpRSXBNdU5oa2dyTUVCM0p4VG43VFA3T0pJclp2M3JSU1lGSHJoM3JwcDluU2pYKzJCUjR2d05TT2c0QmV2TjloCmxrVkRlTzNyQm5ZT1RKRkxSYUgvQ2g1bTVJWThpZWFOaDM0ZHovKy96MkhvZjJLOCtaSGowbi9nTDExTVFJYmUKcWV5eTQ3Zk9uR1QyCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
    server: https://controlplane:6553
  name: default-cluster
contexts:
- context:
    cluster: default-cluster
    namespace: default
    user: default-auth
  name: default-context
current-context: default-context
kind: Config
preferences: {}
users:
- name: default-auth
  user:
    client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
    client-key: /var/lib/kubelet/pki/kubelet-client-current.pem




Fix the 6553 port in the server field (see the sketch below), then restart kubelet.
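A one-liner sketch for the edit, given that the API server was confirmed on port 6443 above:

sed -i 's|https://controlplane:6553|https://controlplane:6443|' /etc/kubernetes/kubelet.conf
grep 'server:' /etc/kubernetes/kubelet.conf   # expect: https://controlplane:6443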

node01 ~ ➜  systemctl restart kubelet && systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Sat 2025-01-11 13:46:28 UTC; 29ms ago
       Docs: https://kubernetes.io/docs/
   Main PID: 22997 (kubelet)
      Tasks: 9 (limit: 77143)
     Memory: 7.9M
     CGroup: /system.slice/kubelet.service
             └─22997 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfi

node01 ~ ➜  exit
logout
Connection to node01 closed.

controlplane ~ ➜  kubectl get node
NAME           STATUS   ROLES           AGE   VERSION
controlplane   Ready    control-plane   34m   v1.31.0
node01         Ready    <none>          34m   v1.31.0

Back on controlplane, confirm that node01 is Ready.





Optional)

The difference between /var/lib/kubelet/config.yaml and /etc/kubernetes/kubelet.conf

https://ysh94.tistory.com/170

 

Both files configure kubelet, but they play different roles. /var/lib/kubelet/config.yaml is the KubeletConfiguration: it controls how kubelet itself behaves (authentication and authorization, cgroup driver, cluster DNS, static pod path, and so on). /etc/kubernetes/kubelet.conf is a kubeconfig: it tells kubelet how to reach the API server (server URL, CA data, client certificate). That is exactly why, in this exercise, the broken CA path was in config.yaml while the broken API server port was in kubelet.conf.
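To see how the two files are wired together on a kubeadm node, you can look at the kubeadm systemd drop-in (the path matches the Drop-In line in the status output above; exact contents vary by version):

cat /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
# Typically sets, among others:
#   KUBELET_KUBECONFIG_ARGS: --bootstrap-kubeconfig=... --kubeconfig=/etc/kubernetes/kubelet.conf
#   KUBELET_CONFIG_ARGS:     --config=/var/lib/kubelet/config.yaml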