AWS Elastic Kubernetes Service (EKS) uses the Amazon VPC Container Network Interface (CNI) plugin as its default networking solution. While the basic setup is straightforward, enterprise production environments require advanced networking configurations to optimize IP address utilization, implement fine-grained security controls, and ensure high performance at scale. This comprehensive guide explores advanced VPC CNI features, custom networking patterns, and production-ready configurations for enterprise EKS deployments.

Understanding AWS VPC CNI Architecture

VPC CNI Fundamentals

The Amazon VPC CNI plugin enables Kubernetes pods to have the same IP address inside the pod as they do on the VPC network. This native integration provides several advantages:

# VPC CNI DaemonSet Architecture Overview
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: aws-node
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: aws-node
  template:
    metadata:
      labels:
        k8s-app: aws-node
    spec:
      serviceAccountName: aws-node
      hostNetwork: true
      tolerations:
      - operator: Exists
      containers:
      - name: aws-node
        image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.15.4
        env:
        - name: AWS_VPC_K8S_CNI_LOGLEVEL
          value: "DEBUG"
        - name: MY_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: ENABLE_IPv4
          value: "true"
        - name: ENABLE_IPv6
          value: "false"
        securityContext:
          privileged: true
        resources:
          requests:
            cpu: 25m
            memory: 40Mi

IP Address Management (IPAM)

The VPC CNI uses two primary components for IP address management:

  1. IPAMD (IP Address Management Daemon): Manages ENI attachment and IP allocation
  2. CNI Plugin: Assigns IP addresses to pods during creation
// Simplified IPAMD logic for IP allocation
type IPAMContext struct {
    awsClient     *ec2wrapper.EC2Wrapper
    k8sClient     kubernetes.Interface
    maxENI        int
    maxIPsPerENI  int
    warmIPTarget  int
    minimumIPTarget int
    warmENITarget int
}

func (c *IPAMContext) AllocateIPAddress() (string, error) {
    // Check available IPs in warm pool
    if availableIP := c.getAvailableIP(); availableIP != "" {
        return availableIP, nil
    }

    // If no IPs available, attempt to allocate new ENI or IPs
    if err := c.increaseIPPool(); err != nil {
        return "", err
    }

    return c.getAvailableIP(), nil
}

func (c *IPAMContext) increaseIPPool() error {
    // Determine if we need a new ENI or can add IPs to existing
    if c.shouldAddNewENI() {
        return c.attachENI()
    }
    return c.allocateIPsToENI()
}

Advanced VPC CNI Configuration

Custom Networking Mode

Custom networking allows you to specify different subnets for pod networking, separating pod IPs from node IPs:

# Enable custom networking
apiVersion: v1
kind: ConfigMap
metadata:
  name: amazon-vpc-cni
  namespace: kube-system
data:
  enable-custom-networking: "true"
  eni-config-label-def: "failure-domain.beta.kubernetes.io/zone"
# ENIConfig for custom subnet in us-west-2a
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: us-west-2a
spec:
  subnet: subnet-0123456789abcdef0
  securityGroups:
  - sg-0123456789abcdef0
  - sg-0fedcba9876543210
---
# ENIConfig for us-west-2b
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: us-west-2b
spec:
  subnet: subnet-0fedcba9876543210
  securityGroups:
  - sg-0123456789abcdef0
  - sg-0fedcba9876543210
---
# ENIConfig for us-west-2c
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: us-west-2c
spec:
  subnet: subnet-0abcdef0123456789
  securityGroups:
  - sg-0123456789abcdef0
  - sg-0fedcba9876543210

IP Prefix Delegation

IP prefix delegation allows each ENI to receive a /28 prefix (16 IP addresses) instead of individual secondary IPs:

# Enable prefix delegation
apiVersion: v1
kind: ConfigMap
metadata:
  name: amazon-vpc-cni
  namespace: kube-system
data:
  enable-prefix-delegation: "true"
  warm-prefix-target: "1"
  warm-ip-target: "5"
# Verify prefix delegation status
kubectl get daemonset aws-node -n kube-system -o yaml | grep ENABLE_PREFIX_DELEGATION

# Check node's IP allocation
kubectl describe node <node-name> | grep "vpc.amazonaws.com"

# View ENI prefixes using AWS CLI
aws ec2 describe-network-interfaces \
  --filters "Name=attachment.instance-id,Values=i-0123456789abcdef0" \
  --query 'NetworkInterfaces[*].{ID:NetworkInterfaceId,Prefixes:Ipv4Prefixes}'

Security Groups for Pods

Security groups for pods enable pod-level network security policies using native AWS security groups:

# Enable security groups for pods
apiVersion: v1
kind: ConfigMap
metadata:
  name: amazon-vpc-cni
  namespace: kube-system
data:
  enable-pod-eni: "true"
# SecurityGroupPolicy for specific deployment
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: database-app-sgp
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: database-client
      tier: backend
  securityGroups:
    groupIds:
    - sg-0123456789abcdef0  # Database access security group
    - sg-0fedcba9876543210  # Application security group
---
# Example deployment using security group policy
apiVersion: apps/v1
kind: Deployment
metadata:
  name: database-client
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: database-client
      tier: backend
  template:
    metadata:
      labels:
        app: database-client
        tier: backend
    spec:
      containers:
      - name: app
        image: myapp:latest
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"

Production Optimization Strategies

Warm IP Pool Configuration

Optimize IP address allocation for predictable scaling:

# Advanced warm pool configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: amazon-vpc-cni
  namespace: kube-system
data:
  # Minimum number of IPs to keep available
  warm-ip-target: "10"

  # Minimum total IPs to maintain
  minimum-ip-target: "20"

  # Number of warm ENIs to keep attached
  warm-eni-target: "1"

  # Maximum number of ENIs per node
  max-eni: "8"

  # Enable IP cooldown period before release
  enable-ip-cooldown: "true"

  # Cooldown period in seconds
  ip-cooldown-period: "30"
# Calculate optimal warm pool settings
#!/bin/bash

# Instance type specific values
INSTANCE_TYPE="m5.2xlarge"
MAX_ENI=4
MAX_IPS_PER_ENI=15

# Application specific values
PODS_PER_NODE=50
SCALE_UP_PODS=20
SCALE_UP_TIME=60  # seconds

# Calculate warm IP target
# Should cover burst scaling + buffer
WARM_IP_TARGET=$((SCALE_UP_PODS + 10))

# Calculate minimum IP target
# Should cover normal operation
MINIMUM_IP_TARGET=$((PODS_PER_NODE + WARM_IP_TARGET))

echo "Recommended settings for ${INSTANCE_TYPE}:"
echo "warm-ip-target: ${WARM_IP_TARGET}"
echo "minimum-ip-target: ${MINIMUM_IP_TARGET}"
echo "warm-eni-target: 1"

# Apply configuration
kubectl set env daemonset aws-node -n kube-system \
  WARM_IP_TARGET=${WARM_IP_TARGET} \
  MINIMUM_IP_TARGET=${MINIMUM_IP_TARGET} \
  WARM_ENI_TARGET=1

SNAT Policy Configuration

Configure Source Network Address Translation for optimal egress traffic routing:

# Disable SNAT for pods using VPC CIDR
apiVersion: v1
kind: ConfigMap
metadata:
  name: amazon-vpc-cni
  namespace: kube-system
data:
  # Disable SNAT to preserve source IP
  aws-vpc-k8s-cni-externalsnat: "true"

  # Exclude SNAT for specific CIDRs
  aws-vpc-k8s-cni-exclude-snat-cidrs: "10.0.0.0/8,172.16.0.0/12"
# Verify SNAT configuration
kubectl exec -n kube-system ds/aws-node -- \
  sh -c 'cat /var/log/aws-routed-eni/plugin.log | grep SNAT'

# Test connectivity with source IP preservation
kubectl run test-pod --image=nicolaka/netshoot -it --rm -- bash

# Inside pod, check source IP
curl ifconfig.me

# Verify iptables rules on node
sudo iptables -t nat -L AWS-SNAT-CHAIN-0 -n -v

Network Policy Integration

Combine VPC CNI with Calico for advanced network policies:

# Install Calico for network policy enforcement
kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/master/config/master/calico-operator.yaml
kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/master/config/master/calico-crs.yaml
# NetworkPolicy using Calico
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: database-network-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: postgres
      tier: database
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: database-client
    - namespaceSelector:
        matchLabels:
          name: production
    ports:
    - protocol: TCP
      port: 5432
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: postgres
          tier: database
    ports:
    - protocol: TCP
      port: 5432
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53

Advanced Troubleshooting

IP Allocation Issues

# Check IPAMD logs
kubectl logs -n kube-system -l k8s-app=aws-node --tail=100 -c aws-node

# View current IP allocation
kubectl get pods -n kube-system -l k8s-app=aws-node -o wide

# Describe node for IP information
kubectl describe node <node-name> | grep -A 10 "Allocatable"

# Check for IP exhaustion
aws ec2 describe-instances \
  --instance-ids i-0123456789abcdef0 \
  --query 'Reservations[*].Instances[*].NetworkInterfaces[*].[NetworkInterfaceId,PrivateIpAddresses]'

# Force IP garbage collection
kubectl delete pod -n kube-system -l k8s-app=aws-node

ENI Attachment Problems

# Verify ENI limits
aws ec2 describe-instance-types \
  --instance-types m5.2xlarge \
  --query 'InstanceTypes[*].NetworkInfo.MaximumNetworkInterfaces'

# Check ENI attachment status
aws ec2 describe-network-interfaces \
  --filters "Name=attachment.instance-id,Values=i-0123456789abcdef0" \
  --query 'NetworkInterfaces[*].[NetworkInterfaceId,Status,Attachment.Status]'

# Review IAM permissions
aws iam get-role-policy \
  --role-name eks-node-role \
  --policy-name AmazonEKS_CNI_Policy

# Enable introspection server for debugging
kubectl set env daemonset aws-node -n kube-system ENABLE_POD_ENI=true
kubectl set env daemonset aws-node -n kube-system POD_SECURITY_GROUP_ENFORCING_MODE=standard

# Access introspection endpoint
kubectl port-forward -n kube-system ds/aws-node 61679:61679
curl http://localhost:61679/v1/enis

Performance Monitoring

# Prometheus ServiceMonitor for VPC CNI metrics
apiVersion: v1
kind: Service
metadata:
  name: aws-node-metrics
  namespace: kube-system
  labels:
    k8s-app: aws-node
spec:
  selector:
    k8s-app: aws-node
  ports:
  - name: metrics
    port: 61678
    targetPort: 61678
    protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: aws-node
  namespace: kube-system
  labels:
    k8s-app: aws-node
spec:
  selector:
    matchLabels:
      k8s-app: aws-node
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
# Key VPC CNI metrics to monitor

# IP address allocation rate
rate(awscni_ip_allocation_total[5m])

# Available IPs per node
awscni_total_ip_addresses - awscni_assigned_ip_addresses

# ENI allocation errors
rate(awscni_eni_allocation_error_count[5m])

# Pod ENI allocation latency
histogram_quantile(0.99, rate(awscni_pod_eni_allocation_duration_seconds_bucket[5m]))

# IP cooldown queue size
awscni_ip_cooldown_queue_size

# Reconcile errors
rate(awscni_reconcile_error_count[5m])

Production Best Practices

Subnet Planning

# Subnet allocation strategy for EKS with custom networking
# VPC CIDR: 10.0.0.0/16

# Node subnets (smaller, for instance ENIs)
# AZ A: 10.0.0.0/24  (256 addresses)
# AZ B: 10.0.1.0/24  (256 addresses)
# AZ C: 10.0.2.0/24  (256 addresses)

# Pod subnets (larger, for pod IPs with prefix delegation)
# AZ A: 10.0.16.0/20 (4096 addresses)
# AZ B: 10.0.32.0/20 (4096 addresses)
# AZ C: 10.0.48.0/20 (4096 addresses)

# Calculate required IPs for pod subnet
pods_per_node = 110
nodes_per_az = 50
growth_factor = 2.0

required_ips = pods_per_node * nodes_per_az * growth_factor
# = 110 * 50 * 2.0 = 11,000 IPs

# /20 provides 4,096 IPs per AZ
# Total across 3 AZs: 12,288 IPs (sufficient with some buffer)

Security Hardening

# IAM role for VPC CNI with least privilege
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-node
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/eks-vpc-cni-role
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:AssignPrivateIpAddresses",
        "ec2:AttachNetworkInterface",
        "ec2:CreateNetworkInterface",
        "ec2:DeleteNetworkInterface",
        "ec2:DescribeInstances",
        "ec2:DescribeTags",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeInstanceTypes",
        "ec2:DetachNetworkInterface",
        "ec2:ModifyNetworkInterfaceAttribute",
        "ec2:UnassignPrivateIpAddresses"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateTags"
      ],
      "Resource": "arn:aws:ec2:*:*:network-interface/*",
      "Condition": {
        "StringEquals": {
          "ec2:CreateAction": "CreateNetworkInterface"
        }
      }
    }
  ]
}

High Availability Configuration

# Configure VPC CNI for high availability
apiVersion: v1
kind: ConfigMap
metadata:
  name: amazon-vpc-cni
  namespace: kube-system
data:
  # Enable prefix delegation for higher pod density
  enable-prefix-delegation: "true"

  # Maintain extra capacity for quick scaling
  warm-prefix-target: "1"
  warm-ip-target: "10"
  minimum-ip-target: "20"

  # Enable IPv6 for dual-stack support
  enable-ipv6: "false"

  # Configure introspection for monitoring
  enable-pod-eni: "true"
  enable-network-policy-controller: "true"

  # Optimize for reliability
  max-eni: "8"
  enable-ip-cooldown: "true"
  ip-cooldown-period: "30"
# Deploy VPC CNI with high availability
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: aws-node
  namespace: kube-system
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 10%
  template:
    spec:
      priorityClassName: system-node-critical
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64
                - arm64
      containers:
      - name: aws-node
        resources:
          requests:
            cpu: 25m
            memory: 40Mi
          limits:
            memory: 200Mi
        livenessProbe:
          httpGet:
            path: /healthz
            port: 61679
          initialDelaySeconds: 60
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /readyz
            port: 61679
          initialDelaySeconds: 10
          periodSeconds: 10
EOF

Conclusion

AWS VPC CNI provides powerful networking capabilities for EKS clusters, but realizing its full potential requires careful configuration and understanding of advanced features. By implementing custom networking, IP prefix delegation, and security groups for pods, you can build highly scalable and secure Kubernetes environments on AWS.

Key takeaways:

  • Use custom networking to separate pod and node IP spaces for better IP management
  • Enable prefix delegation to support higher pod density per node
  • Implement security groups for pods for fine-grained network security
  • Optimize warm IP pools based on your scaling patterns
  • Monitor VPC CNI metrics for proactive issue detection
  • Plan subnet allocation carefully for future growth

The configurations and patterns presented in this guide are battle-tested in production environments and provide a solid foundation for enterprise EKS deployments. Regular monitoring, testing, and optimization ensure your networking infrastructure can support your application’s evolving requirements.