Expected Behavior and the CrashLoopBackOff Error

When deploying an application to Kubernetes (K8s), the expected behavior is that the Pod transitions from Pending to Running and that its containers stay up. However, one of the most common and notorious errors you will encounter is CrashLoopBackOff.

This status means that a container inside the Pod starts, crashes almost immediately, and Kubernetes keeps restarting it with exponentially increasing delays (back-offs) between attempts: 10s, 20s, 40s, and so on, capped at 5 minutes.

Unlike a Pending state (which implies an infrastructure or scheduling issue), a CrashLoopBackOff explicitly tells you that the node has scheduled the Pod, pulled the image, and started the container, but the process inside either exited on its own or was killed.
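You can usually spot the state with a quick pod listing; the pod name shown in the illustrative output below is hypothetical:

```shell
# List pods and watch the STATUS and RESTARTS columns;
# a crash-looping pod shows a climbing restart count.
kubectl get pods -n <namespace>

# Illustrative output:
# NAME                      READY   STATUS             RESTARTS   AGE
# my-app-6d4cf56db6-x7k2p   0/1     CrashLoopBackOff   5          4m
```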

Prerequisites

Before diving into the troubleshooting steps, ensure you have:

  • A running Kubernetes Cluster (Minikube, EKS, GKE, AKS, or bare metal).
  • The kubectl command-line tool installed and configured to point to your cluster.
  • The exact name of the Pod exhibiting the error and the namespace it resides in.

Root Causes of CrashLoopBackOff

Since CrashLoopBackOff is a symptom rather than the exact cause, we have to look for the underlying reason. The most common culprits include:

  1. Application Panic/Fatal Error: The code encounters an unhandled exception or missing dependency immediately upon startup (e.g., failing to connect to a database) and explicitly exits.
  2. Missing Configuration/Secrets: The application expects an environment variable mapped from a ConfigMap or Secret that does not exist or has a typo.
  3. Liveness Probe Failures: Your livenessProbe fails repeatedly, so Kubernetes treats the container as unhealthy and kills it, only to restart it and kill it again.
  4. OOMKilled (Out of Memory): The container tries to allocate more memory than its assigned limits.memory allows, prompting the Linux kernel to terminate the process.
  5. Invalid Entrypoint/Command: The command specified in the YAML or the Dockerfile’s CMD/ENTRYPOINT is syntactically incorrect, lacks permissions, or exits immediately because it’s not a foreground process.
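As a minimal sketch of cause 2 (all names here are hypothetical), consider an application that reads DATABASE_URL at startup. If the referenced Secret key exists but holds a wrong or empty value, the container starts and then exits, producing a CrashLoopBackOff; a Secret that is missing entirely, by contrast, typically surfaces as CreateContainerConfigError instead:

```yaml
env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: db-credentials   # hypothetical Secret name
        key: url               # app exits at startup if this value is wrong or empty
```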

Step-by-Step Solution

1. View the Pod Description

The first command you should run is describe. This will give you the exact Exit Code of the crash.

kubectl describe pod <pod-name> -n <namespace>

Scroll down to the Containers section and look at the Last State block:

    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 27 Feb 2026 10:00:00 GMT
      Finished:     Fri, 27 Feb 2026 10:00:02 GMT

Common Exit Codes:

  • Exit Code 1: General application error (look at your code logs).
  • Exit Code 2: Misuse of shell built-ins (check your Dockerfile command).
  • Exit Code 126: Command invoked cannot execute (permissions error).
  • Exit Code 128: Invalid exit argument.
  • Exit Code 137: SIGKILL (128 + 9) — most commonly OOMKilled after hitting the memory limit, but also seen when repeated liveness probe failures force a kill.
  • Exit Code 255: Exit status out of range (usually a fatal initialization failure).
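If you prefer not to scroll through the full describe output, the same fields can be pulled directly with a JSONPath query. This assumes a single-container Pod; adjust the index for multi-container Pods:

```shell
# Exit code of the last terminated container instance
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}{"\n"}'

# Termination reason (e.g. Error, OOMKilled)
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'
```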

2. Check the Previous Logs

Because the container is actively crashing and restarting, a plain kubectl logs may return nothing if you catch the container just as it spins up. Instead, use the --previous flag to fetch the logs from the last terminated instance, captured just before it died:

kubectl logs <pod-name> -n <namespace> --previous

This is highly effective for catching stack traces, missing database bindings, or “File Not Found” errors that caused the panic.
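Two useful variations: if the Pod runs multiple containers, name the crashing one explicitly with -c, and use -f to stream output live across restarts:

```shell
# Logs from the previous instance of a specific container
kubectl logs <pod-name> -n <namespace> -c <container-name> --previous

# Follow the current instance live to watch it crash in real time
kubectl logs <pod-name> -n <namespace> -f
```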

3. Check for OOMKilled

If the Exit Code is 137 or the describe output literally says OOMKilled, the solution is to increase your container’s memory limits.

Edit your deployment YAML to bump the memory limit:

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"  # <-- Increase this value
    cpu: "500m"

Apply the changes using kubectl apply -f deployment.yaml.
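As an alternative to editing the manifest, kubectl set resources can bump the limit on the live Deployment — keeping in mind that the next kubectl apply of the original YAML will overwrite the change:

```shell
# One-off bump of the memory limit on a running Deployment
kubectl set resources deployment/<deployment-name> -n <namespace> \
  --limits=memory=512Mi
```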

4. Overriding the Entrypoint to Debug (Alternative Solution)

If the log output is cryptic or non-existent (e.g., immediate script exit due to permissions), a great way to debug safely is to override the command with a sleeping shell. This forces the container to stay alive long enough for you to enter it.

Temporarily modify your Deployment manifest:

containers:
  - name: my-crashing-app
    image: my-registry/my-app:v1
    command: ["sleep", "3600"] # Forces the pod to stay alive

Apply the manifest. The Pod will now transition to Running and stay there. Then, safely exec into it:

kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

Once inside, you can manually run your application script (npm start, python app.py, etc.) and observe the error in real-time, inspect files, or verify environment variables.
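Once you have a shell, a few quick checks cover most failure modes; the /app path and entrypoint.sh name below are placeholders for your own layout:

```shell
env | sort                           # confirm expected environment variables are set
ls -l /app                           # check ownership/permissions on the app directory
/app/entrypoint.sh; echo "exit: $?"  # run the real entrypoint and capture its exit code
```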


Prevention

To reduce the occurrence of CrashLoopBackOff in your production clusters, implement the following best practices:

  • Implement Readiness and Liveness Probes Properly: Give the livenessProbe a large enough initialDelaySeconds so the application has time to boot before probing begins.
  • Use Init Containers: If your app depends on a database or external service being available, use an initContainer to wait for that service before starting the main application, effectively preventing the panic.
  • Validate Configuration Dependencies: Ensure your CI/CD pipeline validates that ConfigMaps and Secrets referenced in your Deployments actually exist before applying the manifests.
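The first two practices above can be sketched in a single Pod spec fragment; the health endpoint, port, and service names are assumptions to adapt to your application:

```yaml
spec:
  initContainers:
    - name: wait-for-db
      image: busybox:1.36
      # Block Pod startup until the database answers on its port
      command: ['sh', '-c', 'until nc -z my-database 5432; do sleep 2; done']
  containers:
    - name: my-app
      image: my-registry/my-app:v1
      livenessProbe:
        httpGet:
          path: /healthz        # assumes the app exposes a health endpoint
          port: 8080
        initialDelaySeconds: 30 # give the app time to boot before probing starts
        periodSeconds: 10
        failureThreshold: 3     # consecutive failures before a restart
```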

Summary

  • CrashLoopBackOff indicates a container repeatedly starts and exits, with Kubernetes backing off between restarts.
  • Use kubectl describe pod to locate the Exit Code.
  • Use kubectl logs --previous to read the stack trace of the last crashed iteration.
  • Exit Code 137 usually means memory limits were hit (OOMKilled).
  • Override the deployment command to sleep if you need to exec into the pod manually to debug filesystem or permission issues.