Docker Container Debugging

Master the art of Docker container debugging with this comprehensive guide. Learn advanced techniques for troubleshooting container issues in production environments, from analyzing logs and interactive debugging to resolving networking problems, optimizing resource usage, and implementing automated debugging workflows.

Advanced Docker Container Debugging Techniques

Introduction to Container Debugging Challenges

Containerized applications present unique debugging challenges compared to traditional deployments. Docker’s isolation mechanisms—while beneficial for security and portability—add complexity when troubleshooting. Common challenges include:

  1. Limited visibility into container internals
  2. Ephemeral nature of containers
  3. Layered filesystem complexity
  4. Network abstraction complications
  5. Resource constraint issues

This comprehensive guide provides structured approaches and advanced techniques for debugging Docker containers in development and production environments.

Foundational Debugging Workflow

Before diving into specific techniques, let’s establish a methodical debugging workflow:

  1. Identify symptoms: Define what’s wrong specifically
  2. Gather information: Collect logs, states, and metrics
  3. Form hypotheses: Develop theories about potential causes
  4. Test systematically: Verify each hypothesis
  5. Implement solution: Apply fixes and verify results
  6. Document findings: Record the issue and solution

Following this workflow ensures a structured approach rather than random troubleshooting.

Essential Container Inspection Techniques

Analyzing Container Logs

Container logs are your first line of defense when debugging. Docker provides several ways to access logs:

# Basic log retrieval
docker logs container_name

# Follow logs in real-time 
docker logs -f container_name

# Show timestamps
docker logs --timestamps container_name

# Show logs since a specific time
docker logs --since 2023-01-01T00:00:00 container_name

# Show only the last N lines
docker logs --tail 100 container_name

For multi-container applications using Docker Compose:

# View logs for all services
docker-compose logs

# View logs for specific services
docker-compose logs service1 service2

# Follow logs for specific services
docker-compose logs -f service1

Advanced Log Analysis

For complex logging setups:

# Filter logs using grep
docker logs container_name | grep ERROR

# Extract logs to a file for analysis
docker logs container_name > container_logs.txt

# View logs with detailed formatting
docker logs container_name --details

Interactive Container Debugging

When logs aren’t enough, interactive debugging inside the container is essential:

# Start an interactive shell in a running container
docker exec -it container_name /bin/bash

# For containers without bash
docker exec -it container_name /bin/sh

# Run a specific command in the container
docker exec container_name ps aux

If your container has already crashed or won’t start:

# Start a container with the same image but override the entrypoint
docker run --rm -it --entrypoint /bin/bash image_name

# For containers in a Docker Compose setup
docker-compose run --rm --entrypoint /bin/bash service_name

Working with Minimal Container Images

Alpine and distroless images often lack debugging tools. Add them temporarily:

# For Alpine-based images
docker exec -it container_name /bin/sh
apk add --no-cache curl procps lsof htop strace

# For distroless images
# Use a multi-stage build with debugging tools for development

Example Dockerfile for a debuggable distroless container:

FROM golang:1.21 as builder
WORKDIR /app
COPY . .
RUN go build -o /app/myapp

FROM gcr.io/distroless/base-debian12 as production
COPY --from=builder /app/myapp /
CMD ["/myapp"]

FROM debian:12-slim as debug
RUN apt-get update && apt-get install -y curl procps lsof strace htop
COPY --from=builder /app/myapp /
CMD ["/myapp"]

# Use production target by default
# Override with --target=debug for debugging builds

Advanced Container Inspection

Get detailed information about your container’s configuration and state:

# Basic container inspection
docker inspect container_name

# Filter specific fields 
docker inspect --format='{{.State.Status}}' container_name
docker inspect --format='{{.NetworkSettings.IPAddress}}' container_name
docker inspect --format='{{.Config.Env}}' container_name

# Check resource usage
docker stats container_name

For examining mounts, environment variables, and network settings:

# List mounts
docker inspect --format='{{range .Mounts}}{{.Source}} -> {{.Destination}}{{println}}{{end}}' container_name

# List environment variables
docker inspect --format='{{range .Config.Env}}{{println .}}{{end}}' container_name

# Check network settings
docker inspect --format='{{json .NetworkSettings}}' container_name | jq

Network Troubleshooting

Networking issues are among the most common Docker problems. Here’s how to diagnose them:

Inspecting Container Networking

# List all Docker networks
docker network ls

# Inspect a specific network
docker network inspect bridge

# Find which network a container is connected to
docker inspect --format='{{range $net,$v := .NetworkSettings.Networks}}{{$net}}{{end}}' container_name

Testing Network Connectivity

From within a container:

# Install networking tools if needed
apt-get update && apt-get install -y iputils-ping curl netcat-openbsd dnsutils

# Check DNS resolution
nslookup service_name
dig service_name

# Test TCP connectivity
nc -zv service_name 80

# Check routing
traceroute service_name

From the host:

# Test connectivity to a container
docker exec container_name ping -c 4 service_name

# Check if a port is exposed correctly
docker port container_name

# Verify port bindings
netstat -tuln | grep LISTEN

Common Network Issues and Solutions

  1. DNS Resolution Problems

    Symptom: Container can’t resolve other service names

    Debugging:

    docker exec container_name cat /etc/resolv.conf
    docker exec container_name nslookup service_name
    

    Solution: Add custom DNS or use the --dns flag

    docker run --dns 8.8.8.8 image_name
    
  2. Port Binding Conflicts

    Symptom: Container fails to start with “port already in use” error

    Debugging:

    sudo lsof -i :80
    netstat -tuln | grep 80
    

    Solution: Change the host port mapping

    docker run -p 8080:80 image_name
    
  3. Network Mode Issues

    Symptom: Container can’t communicate with specific networks

    Debugging:

    docker network inspect bridge
    docker inspect container_name
    

    Solution: Connect container to the correct network

    docker network connect custom_network container_name
    

Advanced Network Diagnostics

For complex networking issues, use specialized containers:

# Run a network diagnostics container
docker run --rm -it --network container:target_container nicolaka/netshoot

# Capture network traffic
docker run --rm -it --network container:target_container nicolaka/netshoot tcpdump -i any port 80

Resource and Performance Debugging

Resource constraints often cause container instability. Here’s how to identify and resolve them:

Analyzing Resource Usage

# View real-time container stats
docker stats container_name

# Check resource limits
docker inspect --format='{{.HostConfig.Resources}}' container_name

For detailed process information inside the container:

docker exec container_name top
docker exec container_name ps aux
docker exec container_name free -m
docker exec container_name df -h

Diagnosing CPU Issues

Symptom: High CPU usage or throttling

Debugging:

# Check current CPU usage
docker stats container_name --no-stream

# Find CPU-intensive processes in the container
docker exec container_name top -b -n 1 | sort -k 9 -r | head

# Install and use htop for better visibility
docker exec -it container_name sh -c "apt-get update && apt-get install -y htop && htop"

Solutions:

  • Increase CPU limits: docker run --cpus=2 image_name
  • Optimize application code for CPU usage
  • Add CPU affinity: docker run --cpuset-cpus="0,1" image_name

Resolving Memory Problems

Symptom: Container crashes with Out-of-Memory (OOM) errors

Debugging:

# Check if container was killed by OOM
docker inspect container_name | grep OOMKilled

# Analyze memory usage
docker stats container_name --no-stream

# Check memory details inside container
docker exec container_name cat /proc/meminfo

Solutions:

  • Increase memory limits: docker run --memory=2g image_name
  • Add swap limit: docker run --memory=1g --memory-swap=2g image_name
  • Fix memory leaks in application code

Investigating I/O Bottlenecks

Symptom: Slow disk operations

Debugging:

# Check disk I/O stats
docker stats container_name

# Use iostat in the container
docker exec container_name sh -c "apt-get update && apt-get install -y sysstat && iostat -dx 1 10"

Solutions:

  • Use volume mounts for high I/O workloads: docker run -v /host/data:/data image_name
  • Consider tmpfs for temporary files: docker run --tmpfs /tmp:rw,noexec,nosuid,size=1g image_name
  • Set I/O limits: docker run --device-write-bps /dev/sda:1mb image_name

Docker Engine and Host-Level Debugging

Sometimes the issue lies with the Docker engine itself rather than individual containers:

Docker Daemon Logs

Check the Docker daemon logs for system-wide issues:

# For systemd-based systems
journalctl -u docker.service

# For non-systemd systems
cat /var/log/docker.log

Docker Events

Monitor Docker events to see what’s happening:

# Watch Docker events in real-time
docker events

# Filter events by type
docker events --filter type=container

# Filter events for a specific container
docker events --filter container=container_name

Docker Info and System Diagnostics

# Get Docker system information
docker info

# Check Docker disk usage
docker system df -v

# Run Docker diagnostics
docker system info

Diagnosing Common Host-Level Issues

  1. Docker Storage Driver Problems

    Symptom: “No space left on device” errors despite having disk space

    Debugging:

    docker info | grep "Storage Driver"
    df -h /var/lib/docker
    

    Solution: Clean up unused Docker resources

    docker system prune -a
    
  2. Docker Daemon Crashes

    Symptom: All containers stop unexpectedly

    Debugging:

    systemctl status docker
    journalctl -u docker.service -n 100
    

    Solution: Restart Docker and investigate host system issues

    systemctl restart docker
    

Deep Dive Debugging with Docker API

For programmatic debugging, use the Docker API directly:

# Get API version
curl --unix-socket /var/run/docker.sock http://localhost/version

# List containers
curl --unix-socket /var/run/docker.sock http://localhost/containers/json | jq

# Inspect container details
curl --unix-socket /var/run/docker.sock http://localhost/containers/container_id/json | jq

Image Layer Debugging

Container issues often stem from image problems:

Analyzing Image Layers

# View image history
docker history image_name

# Analyze image layers in detail
docker inspect image_name

# View intermediate layers
docker images --all

Using Container Diff Tools

# Install container-diff
curl -LO https://storage.googleapis.com/container-diff/latest/container-diff-linux-amd64
chmod +x container-diff-linux-amd64
sudo mv container-diff-linux-amd64 /usr/local/bin/container-diff

# Compare image differences
container-diff analyze image1 image2 --type=file

Security Debugging and Auditing

Security issues can cause container instability or unauthorized behavior:

Scanning Container Images

# Using Trivy
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock aquasec/trivy image image_name

# Using Clair
docker run --rm -p 5432:5432 -p 6060:6060 quay.io/coreos/clair

Auditing Container Runtime

# Inspect container capabilities
docker inspect --format='{{.HostConfig.CapAdd}}' container_name

# Check seccomp profile
docker inspect --format='{{.HostConfig.SecurityOpt}}' container_name

Using Docker Bench Security

# Run Docker Bench Security
docker run --rm -it \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /usr/bin/docker:/usr/bin/docker \
  -v /var/lib/docker:/var/lib/docker \
  -v /etc/docker:/etc/docker \
  -v /etc/systemd/system/docker.service.d:/etc/systemd/system/docker.service.d \
  -v /etc:/host/etc \
  -v /lib/systemd/system/docker.service:/lib/systemd/system/docker.service \
  --label docker_bench_security \
  docker/docker-bench-security

Debugging Multi-Container Applications

Docker Compose environments require special attention:

Service Dependency Issues

Symptom: Services start in wrong order or fail to connect

Debugging:

# Check the service dependency graph
docker-compose config --services

# Follow logs from all services
docker-compose logs -f

Solution: Define dependencies in docker-compose.yml

services:
  app:
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy

Environment Variable Problems

Symptom: Service can’t connect to related services

Debugging:

# Check environment variables
docker-compose exec service_name env | sort

# Verify .env file loading
docker-compose config

Solution: Define environment variables properly

services:
  app:
    environment:
      DB_HOST: db
      REDIS_HOST: redis

Automated Debugging Techniques

For production environments, automated debugging tools help:

Using Docker Healthchecks

Define healthchecks in your Dockerfile:

HEALTHCHECK --interval=5s --timeout=3s --retries=3 \
  CMD curl -f http://localhost/health || exit 1

Or in docker-compose.yml:

services:
  app:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/health"]
      interval: 5s
      timeout: 3s
      retries: 3
      start_period: 10s

Implementing Debugging Sidecars

Add debugging sidecars to production pods:

services:
  app:
    # Main application configuration
    
  debug-sidecar:
    image: nicolaka/netshoot
    network_mode: "service:app"
    depends_on:
      - app
    command: ["tail", "-f", "/dev/null"]  # Keep container running

Setting Up Container Monitoring

# Run cAdvisor for container monitoring
docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  gcr.io/cadvisor/cadvisor:latest

Practical Debugging Examples

Let’s go through some real-world debugging scenarios:

Example 1: Container Exits Immediately

Symptom: Container starts and exits immediately

Debugging Process:

  1. Check the exit code:

    docker inspect container_name --format='{{.State.ExitCode}}'
    
  2. View the last few log lines:

    docker logs container_name
    
  3. Try running with an interactive shell to see what’s happening:

    docker run --rm -it --entrypoint /bin/sh image_name
    

Common Solutions:

  • Fix the entrypoint script
  • Ensure foreground process doesn’t exit
  • Add proper signal handling
  • Check for missing dependencies

Example 2: Web Application Returns 502 Bad Gateway

Symptom: Nginx or other proxy returns 502 Bad Gateway

Debugging Process:

  1. Check if the application container is running:

    docker ps | grep app_container
    
  2. Verify the application logs:

    docker logs app_container
    
  3. Test internal connectivity:

    docker exec proxy_container curl -v http://app_container:8080
    
  4. Check the network configuration:

    docker network inspect network_name
    

Common Solutions:

  • Ensure the application is listening on the correct interface (0.0.0.0 vs localhost)
  • Verify the port configuration
  • Check for firewall or security group issues
  • Validate the proxy configuration

Example 3: Container Memory Leak

Symptom: Container memory usage increases over time until OOM kill

Debugging Process:

  1. Confirm OOM is occurring:

    docker inspect container_name | grep OOMKilled
    
  2. Monitor memory usage:

    docker stats container_name
    
  3. Take memory snapshots at intervals:

    docker exec container_name sh -c "apt-get update && apt-get install -y python3-pip && pip3 install memory_profiler && python3 -m memory_profiler my_app.py"
    

Common Solutions:

  • Fix memory leaks in application code
  • Increase container memory limits
  • Implement proper garbage collection
  • Consider using a memory-optimized language for critical components

Best Practices for Container Debugging

Adopt these practices to make debugging easier:

1. Design for Debuggability

  • Include health endpoints in applications
  • Build with proper logging
  • Version your images properly
  • Use multi-stage builds with debug targets

2. Implement Proper Logging

  • Output logs to stdout/stderr
  • Use structured logging (JSON)
  • Include relevant context in log entries
  • Set appropriate log levels

3. Create Debugging Images

For production debugging, create special debug images:

FROM production-image AS debug
USER root
RUN apt-get update && apt-get install -y \
    curl wget telnet netcat-openbsd dnsutils \
    procps lsof strace tcpdump htop vim
USER appuser

4. Use Debugging Init Process

For complex containers, use an init process:

docker run --init -it image_name

Or in Dockerfile:

FROM alpine:3.18
RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["my_application"]

5. Leverage tmpfs for Debugging Data

Use tmpfs for debugging artifacts:

docker run --tmpfs /debug:rw,exec,size=100m image_name

Conclusion: A Systematic Approach

Debugging Docker containers requires a systematic approach and the right tools. By following the techniques in this guide, you can efficiently diagnose and resolve even the most complex container issues.

Remember these key principles:

  1. Start with logs and basic inspection
  2. Isolate the problem domain (app, container, network, or host)
  3. Use the right tools for each situation
  4. Document your findings for future reference

Docker’s containerization adds complexity but also provides powerful isolation that helps pinpoint issues. With practice, you’ll develop intuition about where to look first and which techniques to apply for different classes of problems.

Further Reading