Kubernetes pods can fail in many ways, and each failure state tells a different story. Whether you are facing CrashLoopBackOff, ImagePullBackOff, Pending, OOMKilled, or other error states, knowing how to systematically diagnose and fix these issues is essential for any Kubernetes operator. This guide walks you through the most common pod failure states, the kubectl commands to diagnose them, and proven strategies to resolve each one.
## Prerequisites
- A running Kubernetes cluster (v1.24 or later recommended)
- kubectl installed and configured with cluster access
- Basic understanding of Kubernetes objects (pods, deployments, services)
- Permissions to read pods, events, and node resources in your target namespace
- Familiarity with container concepts (images, registries, resource limits)
## Understanding Pod Lifecycle and States
Before diving into troubleshooting, it helps to understand the Kubernetes pod lifecycle. A pod moves through several phases:
| Phase | Description |
|---|---|
| Pending | Pod accepted by the cluster but one or more containers are not yet running |
| Running | Pod bound to a node and all containers started |
| Succeeded | All containers terminated successfully (exit code 0) |
| Failed | All containers terminated and at least one exited with an error |
| Unknown | Pod state cannot be determined, usually due to node communication failure |
Within these phases, containers can enter specific waiting states that indicate what went wrong. These are the status messages you see in kubectl get pods output — and they are your first diagnostic clue.
## Common Pod Failure States

### CrashLoopBackOff
CrashLoopBackOff is the most common pod failure you will encounter. It means the container starts, crashes, and Kubernetes keeps restarting it with increasing delays (10s, 20s, 40s, up to 5 minutes).
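The delay schedule is plain doubling with a cap, which can be sketched in a few lines of shell (an illustration of the documented 10s-to-5m progression, not Kubernetes source code):

```shell
# CrashLoopBackOff restart delays: double after each crash, capped at 300s (5 minutes).
delay=10
for restart in 1 2 3 4 5 6 7; do
  echo "restart ${restart}: wait ${delay}s"
  delay=$((delay * 2))
  [ "$delay" -gt 300 ] && delay=300
done
```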
Common causes:
- Application error causing immediate exit (missing config, unhandled exception)
- Missing environment variables or mounted secrets
- Incorrect command or entrypoint in the container spec
- Health check (liveness probe) failing too aggressively
- Dependency on a service that is not available
Diagnostic commands:
```shell
# Check the pod status and restart count
kubectl get pods -o wide

# View the last container's logs
kubectl logs <pod-name> --previous

# Check events for the pod
kubectl describe pod <pod-name>
```
The --previous flag is critical — without it, you may get empty or partial logs because the new container instance just started. The describe output shows the Last State with the exit code, which tells you whether the process crashed (exit code 1) or was killed (exit code 137 for OOM, 143 for SIGTERM).
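Those exit codes follow the standard Unix convention of 128 plus the signal number, which you can verify in any local shell:

```shell
# A process killed by signal N exits with status 128 + N.
sh -c 'kill -KILL $$' || kill_code=$?   # SIGKILL (9)  -> 128 + 9  = 137, what the OOM killer sends
sh -c 'kill -TERM $$' || term_code=$?   # SIGTERM (15) -> 128 + 15 = 143, normal shutdown signal
echo "SIGKILL exit: $kill_code, SIGTERM exit: $term_code"
```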
### ImagePullBackOff
ImagePullBackOff means Kubernetes cannot download the container image from the registry. The pod stays in this state, retrying with exponential backoff.
Common causes:
- Typo in the image name or tag
- Image tag does not exist (e.g., referencing latest when only versioned tags are pushed)
- Missing or expired imagePullSecrets
- Private registry with no credentials configured
- Network policy or firewall blocking access to the registry
- Registry rate limits (Docker Hub throttling)
Diagnostic commands:
```shell
# Check the exact image reference
kubectl describe pod <pod-name> | grep -A 5 "Image:"

# Look for pull errors in events
kubectl get events --field-selector involvedObject.name=<pod-name>

# Verify imagePullSecrets exist
kubectl get secrets -n <namespace>
```
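When the cause is registry credentials, the fix is a pull secret referenced from the pod spec. A minimal sketch — the pod name, image, registry, and secret name are all placeholders:

```yaml
# Illustrative only: substitute your own registry, image tag, and secret name.
apiVersion: v1
kind: Pod
metadata:
  name: payment-service
spec:
  imagePullSecrets:
    - name: regcred                 # must exist in the same namespace as the pod
  containers:
    - name: app
      image: registry.example.com/payment-service:1.4.2   # pin a version, avoid latest
      imagePullPolicy: IfNotPresent # reuse the node's cached image when present
```

The referenced secret is typically created with kubectl create secret docker-registry regcred --docker-server=<registry> --docker-username=<user> --docker-password=<password> in the pod's namespace.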
### Pending
A Pending pod means the scheduler has not yet assigned it to a node. This can persist indefinitely if the underlying issue is not resolved.
Common causes:
- Insufficient CPU or memory across all nodes
- Node selectors, taints, or affinities that no node satisfies
- PersistentVolumeClaim (PVC) not bound — no matching PersistentVolume available
- ResourceQuota exceeded in the namespace
- Too many pods on the cluster (max-pods limit on nodes)
Diagnostic commands:
```shell
# Check why the pod is pending
kubectl describe pod <pod-name>

# View scheduler events
kubectl get events --sort-by='.lastTimestamp' -n <namespace>

# Check node resources
kubectl describe nodes | grep -A 5 "Allocated resources"

# Check PVC status if the pod uses volumes
kubectl get pvc -n <namespace>
```
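Several of these causes come down to what the pod spec asks of the scheduler. A hypothetical spec showing the fields the scheduler evaluates — all names and values are placeholders:

```yaml
# Sketch only: requests, labels, and taint keys must match your cluster.
apiVersion: v1
kind: Pod
metadata:
  name: example-schedulable
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0.0
      resources:
        requests:              # the scheduler places pods based on requests, not limits
          cpu: 250m
          memory: 256Mi
  nodeSelector:                # every key here must match a label on some node
    kubernetes.io/os: linux
  tolerations:                 # required only if the target nodes carry matching taints
    - key: dedicated
      operator: Equal
      value: batch
      effect: NoSchedule
```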
### OOMKilled
OOMKilled (exit code 137) means the Linux kernel’s Out-Of-Memory killer terminated the container because it exceeded its memory limit.
Common causes:
- Memory limit set too low for the application
- Memory leak in the application
- JVM heap size not aligned with container memory limit
- Sidecar containers consuming shared memory
- Loading large datasets into memory
Diagnostic commands:
```shell
# Check the termination reason
kubectl describe pod <pod-name> | grep -A 3 "Last State"

# View current memory usage (requires metrics-server)
kubectl top pod <pod-name>

# Check configured limits
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].resources}'
```
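A common fix is to raise the memory limit and, for JVM workloads, tie the heap size to it. A sketch with illustrative values — the image name, sizes, and heap percentage are placeholders to tune for your application:

```yaml
# Illustrative starting point, not production-ready numbers.
apiVersion: v1
kind: Pod
metadata:
  name: example-memory-tuned
spec:
  containers:
    - name: app
      image: registry.example.com/java-app:1.0.0
      resources:
        requests:
          memory: 512Mi
        limits:
          memory: 1Gi               # the OOM killer fires when usage hits this boundary
      env:
        - name: JAVA_TOOL_OPTIONS   # keep the JVM heap safely below the container limit
          value: "-XX:MaxRAMPercentage=75.0"
```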
### Other Failure States
| State | Meaning | Typical Fix |
|---|---|---|
| CreateContainerConfigError | Missing ConfigMap or Secret | Verify referenced ConfigMaps and Secrets exist |
| RunContainerError | Container runtime failure | Check security context, volume mounts, and node container runtime logs |
| Evicted | Node under resource pressure | Check node disk/memory pressure conditions and set proper resource requests |
| Init:Error | Init container failed | Check init container logs with kubectl logs <pod> -c <init-container> |
| Terminating (stuck) | Finalizers blocking deletion | Check finalizers with kubectl get pod -o json and remove if safe |
## Diagnosing Issues with kubectl
The diagnosis workflow follows a consistent pattern regardless of the failure type:
### Step 1: Get the Big Picture

```shell
# All pods in the namespace with status
kubectl get pods -n <namespace> -o wide

# Recent events sorted by time
kubectl get events --sort-by='.lastTimestamp' -n <namespace> | tail -20
```
### Step 2: Deep Dive into the Pod

```shell
# Full pod description with events
kubectl describe pod <pod-name> -n <namespace>

# Container logs (current instance)
kubectl logs <pod-name> -n <namespace>

# Container logs (previous crashed instance)
kubectl logs <pod-name> -n <namespace> --previous

# Logs from a specific container in a multi-container pod
kubectl logs <pod-name> -c <container-name> -n <namespace>
```
### Step 3: Check the Node

```shell
# Node conditions (disk pressure, memory pressure, PID pressure)
kubectl describe node <node-name> | grep -A 10 "Conditions"

# Resource allocation on the node
kubectl describe node <node-name> | grep -A 20 "Allocated resources"
```
### Step 4: Interactive Debugging

```shell
# Exec into a running container
kubectl exec -it <pod-name> -- /bin/sh

# Use an ephemeral debug container (K8s 1.23+)
kubectl debug -it <pod-name> --image=busybox --target=<container-name>

# Run a standalone debug pod (its own pod, same cluster network) for connectivity tests
kubectl run debug --rm -it --image=busybox -- /bin/sh
```
## kubectl Commands Comparison Table
| Failure State | First Command | Key Information |
|---|---|---|
| CrashLoopBackOff | kubectl logs <pod> --previous | Application error output before crash |
| ImagePullBackOff | kubectl describe pod <pod> | Image name, pull errors, secret references |
| Pending | kubectl describe pod <pod> | Scheduler failure reason in Events section |
| OOMKilled | kubectl describe pod <pod> | Last State termination reason and exit code |
| CreateContainerConfigError | kubectl get configmap,secret -n <ns> | Missing referenced resources |
| Evicted | kubectl describe node <node> | Node resource pressure conditions |
| Init:Error | kubectl logs <pod> -c <init-container> | Init container failure logs |
## Real-World Scenario
You manage a production cluster running a microservices application. After a deployment, the payment-service pod keeps restarting and shows CrashLoopBackOff. Here is how you diagnose it:
```
$ kubectl get pods -n production
NAME                              READY   STATUS             RESTARTS   AGE
payment-service-7d4f8b9c6-x2k9m   0/1     CrashLoopBackOff   5          8m
api-gateway-5c8f7d6b4-h3j7n       1/1     Running            0          2d
user-service-6b7c8d9e5-m4n8p      1/1     Running            0          2d
```
You check the previous container’s logs:
```
$ kubectl logs payment-service-7d4f8b9c6-x2k9m --previous
2026-02-28 10:15:03 ERROR: Failed to connect to database
ConnectionRefused: tcp://db-service:5432
2026-02-28 10:15:03 FATAL: Cannot start without database connection. Exiting.
```
The application requires a database connection at startup, but db-service is unreachable. You check the service:
```
$ kubectl get svc db-service -n production
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
db-service   ClusterIP   10.96.45.123   <none>        5432/TCP   30d

$ kubectl get endpoints db-service -n production
NAME         ENDPOINTS   AGE
db-service   <none>      30d
```
No endpoints — the database pod is down. You find it was evicted due to node disk pressure:
```
$ kubectl get pods -n production | grep db
db-postgresql-0   0/1   Evicted   0   30d
```
The fix: clear disk space on the node (or add more nodes), restart the database pod, and the payment service recovers automatically. You also add a startupProbe with a generous timeout so the payment service waits for the database instead of immediately crashing.
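A startup probe like the following keeps the liveness probe from killing the container while it waits on its dependencies — note that the application must also retry its connection rather than exit. The port and timing values here are illustrative placeholders:

```yaml
# Illustrative container-spec fragment: adjust port, period, and threshold to your app.
startupProbe:
  tcpSocket:
    port: 8080
  periodSeconds: 10
  failureThreshold: 30   # allows up to 300s for dependencies like the database
livenessProbe:           # only takes over once the startup probe has succeeded
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 15
```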
## Gotchas and Edge Cases

- Exit code 137 vs 143: Exit code 137 means the container was killed by SIGKILL (usually OOMKilled). Exit code 143 means SIGTERM (graceful shutdown). Do not confuse them — 137 requires memory investigation, 143 is usually normal during rollouts.
- CrashLoopBackOff delay: Kubernetes uses exponential backoff up to 5 minutes between restarts. If you fix the issue, you may still need to wait or delete the pod to restart it immediately.
- ImagePullPolicy: Always: If your pod spec uses imagePullPolicy: Always (the default for latest tags), every pod restart triggers an image pull. This can cause ImagePullBackOff if the registry is temporarily unreachable, even though the image was previously cached on the node.
- Resource requests vs limits: Pods are scheduled based on requests, not limits. A pod requesting 100Mi but limited to 500Mi can be OOMKilled at 500Mi even if the node has 2Gi free — the limit is enforced regardless of node capacity.
- Multi-container pods: In a pod with sidecars, kubectl logs defaults to the first container. Always specify -c <container-name> when debugging multi-container pods.
- Ephemeral storage evictions: Even if CPU and memory are fine, high ephemeral storage usage (logs, temp files) can trigger eviction. Check with kubectl describe node under Conditions.
- PVC in wrong availability zone: In cloud environments, a PVC bound to a volume in us-east-1a cannot be mounted by a pod scheduled to a node in us-east-1b. The pod stays Pending with no obvious error.
- DNS resolution lag: Newly created services may not resolve immediately. If your container crashes because it cannot resolve a service name, add a startup delay or retry logic instead of relying on instant DNS propagation.
## Summary

- CrashLoopBackOff means your container keeps crashing — check logs with --previous to see the error before the crash
- ImagePullBackOff indicates an image pull failure — verify image name, tag, and registry credentials
- Pending means the scheduler cannot place the pod — check resource availability, PVC status, and node affinity rules
- OOMKilled (exit code 137) means the container exceeded its memory limit — increase limits or optimize memory usage
- Always start with kubectl describe pod and kubectl get events to understand the failure context
- Use kubectl debug for ephemeral containers when you need interactive troubleshooting without modifying the pod spec
- Set proper resource requests, limits, and probes to prevent many common pod failures before they happen