Managing digital certificates at scale across distributed Kubernetes environments presents significant operational challenges for enterprise organizations. cert-manager transforms certificate lifecycle management from a manual, error-prone process into an automated, policy-driven system that ensures security compliance and operational efficiency. This comprehensive guide demonstrates enterprise-grade PKI implementation patterns, advanced automation strategies, and production-ready security practices for mission-critical infrastructure.

Executive Summary

Enterprise PKI management requires sophisticated automation to handle certificate provisioning, renewal, revocation, and compliance across complex infrastructure landscapes. cert-manager provides native Kubernetes integration for certificate lifecycle automation, supporting multiple Certificate Authorities, validation methods, and deployment patterns. This implementation guide covers advanced PKI architectures, security hardening, compliance frameworks, and operational excellence patterns for production environments managing thousands of certificates across multi-cluster deployments.

Understanding Enterprise PKI Requirements

Certificate Lifecycle Management

Modern enterprise environments require comprehensive certificate management addressing:

  1. Automated Provisioning: On-demand certificate generation with policy enforcement
  2. Lifecycle Automation: Automated renewal, revocation, and replacement processes
  3. Compliance Monitoring: Audit trails, compliance reporting, and policy violations
  4. Security Integration: HSM integration, key escrow, and cryptographic standards
  5. Operational Excellence: Monitoring, alerting, and disaster recovery procedures

PKI Architecture Patterns

Hierarchical PKI Structure:

Root CA (Offline, Air-gapped)
├── Intermediate CA 1 (Production)
│   ├── Service Certificates
│   └── Client Certificates
├── Intermediate CA 2 (Development)
│   ├── Development Services
│   └── Testing Certificates
└── Intermediate CA 3 (Infrastructure)
    ├── Kubernetes Components
    └── Network Equipment

cert-manager Installation and Configuration

Enterprise Deployment Architecture

Deploy cert-manager with high availability and security hardening:

# cert-manager-values.yaml
installCRDs: true
namespace: cert-manager

replicaCount: 3

image:
  repository: quay.io/jetstack/cert-manager-controller
  tag: v1.13.2
  pullPolicy: IfNotPresent

serviceAccount:
  create: true
  automountServiceAccountToken: false

securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  runAsGroup: 1000
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL

podSecurityContext:
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 1000m
    memory: 1Gi

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - cert-manager
        topologyKey: kubernetes.io/hostname

nodeSelector:
  kubernetes.io/os: linux

tolerations:
- key: node-role.kubernetes.io/master
  operator: Exists
  effect: NoSchedule

prometheus:
  enabled: true
  servicemonitor:
    enabled: true
    prometheusInstance: default
    targetPort: 9402
    path: /metrics
    interval: 60s
    scrapeTimeout: 30s
    labels:
      app.kubernetes.io/component: monitoring

webhook:
  replicaCount: 3
  image:
    repository: quay.io/jetstack/cert-manager-webhook
    tag: v1.13.2

  resources:
    requests:
      cpu: 50m
      memory: 64Mi
    limits:
      cpu: 200m
      memory: 256Mi

  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 1000
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false

  networkPolicy:
    enabled: true

cainjector:
  replicaCount: 2
  image:
    repository: quay.io/jetstack/cert-manager-cainjector
    tag: v1.13.2

  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 1Gi

  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 1000
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false

Installation with Security Hardening

Deploy cert-manager with comprehensive security configuration:

# Create namespace with security labels
kubectl create namespace cert-manager

kubectl label namespace cert-manager \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/warn=restricted

# Add Helm repository
helm repo add jetstack https://charts.jetstack.io
helm repo update

# Install cert-manager with security-focused values
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --values cert-manager-values.yaml \
  --wait \
  --timeout 300s

# Verify installation
kubectl get pods -n cert-manager
kubectl get customresourcedefinitions | grep cert-manager
kubectl get validatingwebhookconfigurations | grep cert-manager

Network Policy Implementation

Secure cert-manager communication:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: cert-manager-controller
  namespace: cert-manager
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: cert-manager
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: monitoring
    ports:
    - protocol: TCP
      port: 9402  # Metrics
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: cert-manager-webhook
    ports:
    - protocol: TCP
      port: 6060  # Health checks
  egress:
  - to: []
    ports:
    - protocol: TCP
      port: 443   # HTTPS to CAs
    - protocol: TCP
      port: 53    # DNS
    - protocol: UDP
      port: 53    # DNS
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: TCP
      port: 6443  # Kubernetes API
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: cert-manager-webhook
  namespace: cert-manager
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: cert-manager-webhook
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector: {}
    ports:
    - protocol: TCP
      port: 10250  # Webhook
  egress:
  - to:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: cert-manager
    ports:
    - protocol: TCP
      port: 6060
  - to: []
    ports:
    - protocol: TCP
      port: 53    # DNS
    - protocol: UDP
      port: 53    # DNS

Certificate Authority Configuration

Private CA Hierarchy

Establish enterprise-grade CA infrastructure:

# Root CA Secret (imported from secure offline system)
apiVersion: v1
kind: Secret
metadata:
  name: root-ca-secret
  namespace: cert-manager
type: Opaque
data:
  tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0t...  # Root CA certificate
  tls.key: LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0t...   # Root CA private key (encrypted)
---
# Production Intermediate CA
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: production-ca-issuer
spec:
  ca:
    secretName: production-intermediate-ca
---
# Self-signed issuer for creating intermediate CAs
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
# Production Intermediate CA Certificate
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: production-intermediate-ca
  namespace: cert-manager
spec:
  secretName: production-intermediate-ca
  isCA: true
  commonName: "Production Intermediate CA"
  subject:
    organizationalUnits:
    - "IT Department"
    organizations:
    - "ACME Corporation"
    countries:
    - "US"
    localities:
    - "San Francisco"
    provinces:
    - "California"
  issuerRef:
    name: selfsigned-issuer
    kind: ClusterIssuer
    group: cert-manager.io
  duration: 17520h  # 2 years
  renewBefore: 1440h  # 60 days
  keyAlgorithm: RSA
  keySize: 4096
  keyUsages:
  - cert sign
  - crl sign
  - digital signature
  - key encipherment

ACME Integration with Let’s Encrypt

Configure automated public certificate management:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: certificates@company.com
    privateKeySecretRef:
      name: letsencrypt-production-key

    # Enable External Account Binding for enterprise accounts
    externalAccountBinding:
      keyID: "your-eab-key-id"
      keySecretRef:
        name: letsencrypt-eab-secret
        key: secret

    solvers:
    # DNS-01 solver for wildcard certificates
    - dns01:
        route53:
          region: us-west-2
          accessKeyID: AKIAIOSFODNN7EXAMPLE
          secretAccessKeySecretRef:
            name: route53-credentials
            key: secret-access-key
      selector:
        dnsZones:
        - "company.com"
        - "*.company.com"

    # HTTP-01 solver for single domain certificates
    - http01:
        ingress:
          class: nginx
          podTemplate:
            spec:
              nodeSelector:
                kubernetes.io/os: linux
              tolerations:
              - key: node-role.kubernetes.io/master
                operator: Exists
                effect: NoSchedule
      selector:
        dnsNames:
        - "api.company.com"
        - "app.company.com"
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: certificates@company.com
    privateKeySecretRef:
      name: letsencrypt-staging-key
    solvers:
    - dns01:
        route53:
          region: us-west-2
          accessKeyID: AKIAIOSFODNN7EXAMPLE
          secretAccessKeySecretRef:
            name: route53-credentials
            key: secret-access-key
    - http01:
        ingress:
          class: nginx

Vault Integration

Integrate with HashiCorp Vault for enterprise PKI:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: vault-issuer
spec:
  vault:
    server: https://vault.company.com:8200
    path: pki/sign/kubernetes
    auth:
      kubernetes:
        mountPath: /v1/auth/kubernetes
        role: cert-manager
        secretRef:
          name: vault-token
          key: token
    caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0t...  # Vault CA bundle
---
# Vault service account and secret
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vault-auth
  namespace: cert-manager
---
apiVersion: v1
kind: Secret
metadata:
  name: vault-token
  namespace: cert-manager
  annotations:
    kubernetes.io/service-account.name: vault-auth
type: kubernetes.io/service-account-token

Advanced Certificate Provisioning Patterns

Application-Specific Certificate Templates

Create standardized certificate templates for different application types:

# Web Service Certificate Template
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: web-service-template
  namespace: default
  annotations:
    cert-manager.io/cluster-issuer: "production-ca-issuer"
spec:
  secretName: web-service-tls
  issuerRef:
    name: production-ca-issuer
    kind: ClusterIssuer
  commonName: "web.company.com"
  dnsNames:
  - "web.company.com"
  - "www.web.company.com"
  duration: 8760h    # 1 year
  renewBefore: 720h  # 30 days
  subject:
    organizationalUnits: ["Web Services"]
    organizations: ["ACME Corporation"]
  keyAlgorithm: RSA
  keySize: 2048
  keyUsages:
  - digital signature
  - key encipherment
  - server auth
  secretTemplate:
    labels:
      app.kubernetes.io/component: web-service
      cert-manager.io/certificate-type: server
    annotations:
      cert-manager.io/common-name: "web.company.com"
      cert-manager.io/certificate-template: "web-service"
---
# API Service Certificate Template
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-service-template
  namespace: default
spec:
  secretName: api-service-tls
  issuerRef:
    name: production-ca-issuer
    kind: ClusterIssuer
  commonName: "api.company.com"
  dnsNames:
  - "api.company.com"
  - "*.api.company.com"
  ipAddresses:
  - "10.0.1.100"
  uris:
  - "spiffe://company.com/api-service"
  duration: 4380h    # 6 months
  renewBefore: 360h  # 15 days
  subject:
    organizationalUnits: ["API Services"]
    organizations: ["ACME Corporation"]
  keyAlgorithm: ECDSA
  keySize: 256
  keyUsages:
  - digital signature
  - key agreement
  - server auth
  - client auth
  secretTemplate:
    labels:
      app.kubernetes.io/component: api-service
      cert-manager.io/certificate-type: mutual-tls

Automated Certificate Injection

Implement automated certificate injection for applications:

# Certificate injection using init containers
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-application
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-application
  template:
    metadata:
      labels:
        app: web-application
    spec:
      initContainers:
      - name: cert-init
        image: alpine:3.18
        command:
        - sh
        - -c
        - |
          # Wait for certificate to be ready
          until [ -f /certs/tls.crt ] && [ -f /certs/tls.key ]; do
            echo "Waiting for certificates..."
            sleep 5
          done

          # Validate certificate
          openssl verify -CAfile /certs/ca.crt /certs/tls.crt || exit 1

          # Set proper permissions
          chmod 600 /certs/tls.key
          chmod 644 /certs/tls.crt

          echo "Certificates ready"
        volumeMounts:
        - name: certs
          mountPath: /certs

      containers:
      - name: web-app
        image: nginx:1.25-alpine
        ports:
        - containerPort: 443
          name: https
        volumeMounts:
        - name: certs
          mountPath: /etc/nginx/certs
          readOnly: true
        - name: nginx-config
          mountPath: /etc/nginx/nginx.conf
          subPath: nginx.conf

      volumes:
      - name: certs
        secret:
          secretName: web-service-tls
          defaultMode: 0600
      - name: nginx-config
        configMap:
          name: nginx-tls-config
---
# Nginx configuration with TLS
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-tls-config
data:
  nginx.conf: |
    events {
        worker_connections 1024;
    }

    http {
        server {
            listen 443 ssl http2;
            server_name web.company.com;

            ssl_certificate /etc/nginx/certs/tls.crt;
            ssl_certificate_key /etc/nginx/certs/tls.key;

            ssl_protocols TLSv1.2 TLSv1.3;
            ssl_ciphers ECDHE+AESGCM:ECDHE+CHACHA20:DHE+AESGCM:DHE+CHACHA20:!aNULL:!SHA1:!WEAK;
            ssl_prefer_server_ciphers off;

            location / {
                return 200 "Certificate-secured application\n";
                add_header Content-Type text/plain;
            }

            location /health {
                return 200 "healthy\n";
                add_header Content-Type text/plain;
            }
        }
    }

Certificate Policy and Governance

Policy-Driven Certificate Management

Implement comprehensive certificate policies:

# Certificate policy using OPA Gatekeeper
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: certificatepolicy
spec:
  crd:
    spec:
      names:
        kind: CertificatePolicy
      validation:
        properties:
          allowedIssuers:
            type: array
            items:
              type: string
          maxDuration:
            type: string
          requiredKeyUsages:
            type: array
            items:
              type: string
          allowedAlgorithms:
            type: array
            items:
              type: string

  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package certificatepolicy

      violation[{"msg": msg}] {
        input.review.object.kind == "Certificate"
        issuer := input.review.object.spec.issuerRef.name
        not issuer in input.parameters.allowedIssuers
        msg := sprintf("Issuer '%v' is not in allowed list: %v", [issuer, input.parameters.allowedIssuers])
      }

      violation[{"msg": msg}] {
        input.review.object.kind == "Certificate"
        duration := input.review.object.spec.duration
        max_duration := input.parameters.maxDuration
        duration_seconds := time.parse_duration_ns(duration) / 1000000000
        max_seconds := time.parse_duration_ns(max_duration) / 1000000000
        duration_seconds > max_seconds
        msg := sprintf("Certificate duration '%v' exceeds maximum allowed '%v'", [duration, max_duration])
      }

      violation[{"msg": msg}] {
        input.review.object.kind == "Certificate"
        algorithm := input.review.object.spec.keyAlgorithm
        not algorithm in input.parameters.allowedAlgorithms
        msg := sprintf("Key algorithm '%v' is not allowed. Permitted algorithms: %v", [algorithm, input.parameters.allowedAlgorithms])
      }
---
apiVersion: config.gatekeeper.sh/v1beta1
kind: CertificatePolicy
metadata:
  name: production-cert-policy
spec:
  match:
  - apiGroups: ["cert-manager.io"]
    kinds: ["Certificate"]
    namespaces: ["production"]
  parameters:
    allowedIssuers:
    - "production-ca-issuer"
    - "letsencrypt-production"
    maxDuration: "8760h"  # 1 year maximum
    requiredKeyUsages:
    - "digital signature"
    - "key encipherment"
    allowedAlgorithms:
    - "RSA"
    - "ECDSA"

Certificate Compliance Monitoring

Implement comprehensive compliance monitoring:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: certificate-compliance
  namespace: cert-manager
spec:
  groups:
  - name: certificate.compliance
    rules:
    - alert: CertificateExpiringWithoutAutoRenewal
      expr: |
        (certmanager_certificate_expiration_timestamp_seconds - time()) / 86400 < 30
        and on (name, namespace) (certmanager_certificate_renewal_timestamp_seconds == 0)
      for: 24h
      labels:
        severity: warning
        compliance: certificate-lifecycle
      annotations:
        summary: "Certificate {{ $labels.name }} in {{ $labels.namespace }} expires in < 30 days without auto-renewal"

    - alert: CertificateUsingWeakAlgorithm
      expr: |
        certmanager_certificate_info{algorithm="RSA",key_size!~"2048|4096"}
        or certmanager_certificate_info{algorithm="ECDSA",key_size!~"256|384"}
      for: 0s
      labels:
        severity: critical
        compliance: cryptographic-standards
      annotations:
        summary: "Certificate {{ $labels.name }} uses weak cryptographic algorithm"

    - alert: CertificateExcessiveDuration
      expr: |
        (certmanager_certificate_expiration_timestamp_seconds - certmanager_certificate_not_before_timestamp_seconds) / 86400 > 365
      for: 0s
      labels:
        severity: warning
        compliance: certificate-duration
      annotations:
        summary: "Certificate {{ $labels.name }} has duration > 365 days"

High Availability and Disaster Recovery

Multi-Cluster Certificate Management

Implement certificate synchronization across clusters:

# Certificate replication controller
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cert-replicator
  namespace: cert-manager
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cert-replicator
  template:
    metadata:
      labels:
        app: cert-replicator
    spec:
      serviceAccountName: cert-replicator
      containers:
      - name: replicator
        image: cert-replicator:v1.2.0
        env:
        - name: SOURCE_KUBECONFIG
          value: "/etc/kubeconfig/primary/config"
        - name: TARGET_CLUSTERS
          value: "secondary,tertiary"
        - name: SYNC_INTERVAL
          value: "300s"
        - name: CERTIFICATE_SELECTOR
          value: "cert-manager.io/replicate=true"
        volumeMounts:
        - name: primary-kubeconfig
          mountPath: /etc/kubeconfig/primary
        - name: secondary-kubeconfig
          mountPath: /etc/kubeconfig/secondary
        - name: tertiary-kubeconfig
          mountPath: /etc/kubeconfig/tertiary
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
          limits:
            cpu: 200m
            memory: 256Mi
      volumes:
      - name: primary-kubeconfig
        secret:
          secretName: primary-cluster-access
      - name: secondary-kubeconfig
        secret:
          secretName: secondary-cluster-access
      - name: tertiary-kubeconfig
        secret:
          secretName: tertiary-cluster-access

Backup and Recovery Procedures

Implement comprehensive backup strategies:

#!/bin/bash
# cert-manager-backup.sh

BACKUP_DIR="/backup/cert-manager/$(date +%Y%m%d-%H%M%S)"
NAMESPACES=("cert-manager" "default" "production")

mkdir -p "$BACKUP_DIR"

# Backup cert-manager configuration
echo "Backing up cert-manager resources..."
kubectl get clusterissuers,certificates,certificaterequests -o yaml > "$BACKUP_DIR/cert-manager-resources.yaml"

# Backup certificate secrets
echo "Backing up certificate secrets..."
for ns in "${NAMESPACES[@]}"; do
    kubectl get secrets -n "$ns" -l cert-manager.io/certificate-name -o yaml > "$BACKUP_DIR/cert-secrets-$ns.yaml"
done

# Backup CA certificates and keys (encrypted)
echo "Backing up CA certificates..."
kubectl get secrets -n cert-manager -l cert-manager.io/ca-certificate=true -o yaml > "$BACKUP_DIR/ca-certificates.yaml"

# Create backup manifest
cat << EOF > "$BACKUP_DIR/backup-manifest.yaml"
apiVersion: v1
kind: ConfigMap
metadata:
  name: backup-manifest
  namespace: cert-manager
data:
  timestamp: "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  version: "$(kubectl version --short --client)"
  cert-manager-version: "$(kubectl get deployment cert-manager -n cert-manager -o jsonpath='{.spec.template.spec.containers[0].image}')"
  clusters: |
$(kubectl config get-contexts --no-headers | awk '{print "    - " $2}')
EOF

# Encrypt backup if GPG key available
if command -v gpg &> /dev/null && [ -n "$BACKUP_GPG_KEY" ]; then
    echo "Encrypting backup..."
    tar czf - "$BACKUP_DIR" | gpg --trust-model always --encrypt --armor \
        --recipient "$BACKUP_GPG_KEY" > "$BACKUP_DIR.tar.gz.gpg"
    rm -rf "$BACKUP_DIR"
    echo "Encrypted backup created: $BACKUP_DIR.tar.gz.gpg"
else
    echo "Backup created: $BACKUP_DIR"
fi

Security Hardening and Compliance

RBAC Configuration

Implement principle of least privilege:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: cert-manager-restricted
  namespace: cert-manager
automountServiceAccountToken: false
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cert-manager-controller-restricted
rules:
# Certificate management
- apiGroups: ["cert-manager.io"]
  resources: ["certificates", "certificaterequests", "orders", "challenges"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
- apiGroups: ["cert-manager.io"]
  resources: ["certificates/status", "certificaterequests/status"]
  verbs: ["update", "patch"]
- apiGroups: ["cert-manager.io"]
  resources: ["clusterissuers", "issuers"]
  verbs: ["get", "list", "watch"]

# Secret management (restricted to cert-manager labeled secrets)
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  resourceNames: []  # Will be restricted by validating webhook

# Event creation
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "patch"]

# Ingress for HTTP-01 challenges (restricted)
- apiGroups: ["networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["get", "list", "watch", "create", "delete", "update"]
  resourceNames: ["cm-acme-http-solver-*"]

# Pod creation for DNS-01 challenges (restricted)
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["create", "delete", "get", "list", "watch"]
  resourceNames: ["cm-acme-dns-solver-*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cert-manager-controller-restricted
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cert-manager-controller-restricted
subjects:
- kind: ServiceAccount
  name: cert-manager-restricted
  namespace: cert-manager

Security Monitoring

Implement comprehensive security monitoring:

apiVersion: v1
kind: ConfigMap
metadata:
  name: falco-cert-manager-rules
  namespace: falco
data:
  cert_manager_rules.yaml: |
    - rule: Unauthorized Certificate Access
      desc: Detect unauthorized access to certificate secrets
      condition: >
        k8s_audit and ka.verb in (get, list) and
        ka.target.resource=secrets and
        ka.target.name contains "tls" and
        not ka.user.name in (cert-manager, system:serviceaccount:cert-manager:cert-manager)
      output: >
        Unauthorized access to certificate secret
        (user=%ka.user.name verb=%ka.verb secret=%ka.target.name namespace=%ka.target.namespace)
      priority: WARNING
      tags: [k8s, security, certificates]

    - rule: Certificate Manipulation
      desc: Detect direct manipulation of certificate resources
      condition: >
        k8s_audit and ka.verb in (create, update, patch, delete) and
        ka.target.resource=certificates and
        not ka.user.name in (cert-manager, system:serviceaccount:cert-manager:cert-manager)
      output: >
        Direct certificate manipulation detected
        (user=%ka.user.name verb=%ka.verb cert=%ka.target.name namespace=%ka.target.namespace)
      priority: WARNING
      tags: [k8s, security, certificates]

    - rule: CA Certificate Access
      desc: Detect access to CA certificates
      condition: >
        k8s_audit and ka.verb in (get, list) and
        ka.target.resource=secrets and
        ka.target.name contains "ca" and
        not ka.user.name in (cert-manager, system:serviceaccount:cert-manager:cert-manager)
      output: >
        CA certificate access detected
        (user=%ka.user.name verb=%ka.verb secret=%ka.target.name namespace=%ka.target.namespace)
      priority: CRITICAL
      tags: [k8s, security, ca-certificates]

Monitoring and Observability

Comprehensive Metrics Collection

Deploy advanced monitoring for certificate lifecycle:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cert-manager-detailed
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: cert-manager
  namespaceSelector:
    matchNames:
    - cert-manager
  endpoints:
  - port: tcp-prometheus-servicemonitor
    interval: 30s
    path: /metrics
    honorLabels: true
    metricRelabelings:
    - sourceLabels: [__name__]
      regex: 'certmanager_certificate_.*'
      targetLabel: __name__
      replacement: '${1}'
    - sourceLabels: [name, namespace]
      separator: '/'
      targetLabel: certificate_fqn
      replacement: '${1}'

Grafana Dashboard for Certificate Management

{
  "dashboard": {
    "title": "Enterprise Certificate Management",
    "tags": ["cert-manager", "pki", "security"],
    "templating": {
      "list": [
        {
          "name": "namespace",
          "type": "query",
          "query": "label_values(certmanager_certificate_info, exported_namespace)",
          "includeAll": true,
          "allValue": ".*"
        },
        {
          "name": "issuer",
          "type": "query",
          "query": "label_values(certmanager_certificate_info{exported_namespace=~\"$namespace\"}, issuer_name)",
          "includeAll": true
        }
      ]
    },
    "panels": [
      {
        "title": "Certificate Overview",
        "type": "stat",
        "targets": [
          {
            "expr": "count(certmanager_certificate_info{exported_namespace=~\"$namespace\",issuer_name=~\"$issuer\"})",
            "legendFormat": "Total Certificates"
          }
        ]
      },
      {
        "title": "Certificate Expiration Timeline",
        "type": "graph",
        "targets": [
          {
            "expr": "sort_desc((certmanager_certificate_expiration_timestamp_seconds{exported_namespace=~\"$namespace\"} - time()) / 86400)",
            "legendFormat": "{{name}} ({{exported_namespace}})"
          }
        ]
      },
      {
        "title": "Certificate Renewal Status",
        "type": "table",
        "targets": [
          {
            "expr": "certmanager_certificate_info{exported_namespace=~\"$namespace\",issuer_name=~\"$issuer\"}",
            "format": "table",
            "instant": true
          }
        ]
      },
      {
        "title": "Failed Certificate Requests",
        "type": "graph",
        "targets": [
          {
            "expr": "increase(certmanager_certificate_request_conditions{condition=\"Failed\"}[5m])",
            "legendFormat": "{{name}} - Failed Requests"
          }
        ]
      },
      {
        "title": "ACME Challenge Success Rate",
        "type": "stat",
        "targets": [
          {
            "expr": "rate(certmanager_acme_client_request_count{status=\"success\"}[5m]) / rate(certmanager_acme_client_request_count[5m]) * 100",
            "legendFormat": "Success Rate %"
          }
        ]
      }
    ]
  }
}

Troubleshooting and Operations

Diagnostic Tools and Scripts

Comprehensive troubleshooting toolkit:

#!/bin/bash
# cert-manager-diagnostics.sh

echo "=== cert-manager Component Status ==="
kubectl get pods -n cert-manager
kubectl get deployments -n cert-manager

echo -e "\n=== Certificate Status Overview ==="
kubectl get certificates --all-namespaces

echo -e "\n=== Failed Certificate Requests ==="
kubectl get certificaterequests --all-namespaces --field-selector status.phase=Failed

echo -e "\n=== ACME Orders Status ==="
kubectl get orders --all-namespaces

echo -e "\n=== Certificate Expiration Check ==="
kubectl get certificates --all-namespaces -o json | \
jq -r '.items[] | select(.status.notAfter != null) |
"\(.metadata.namespace)/\(.metadata.name): expires \(.status.notAfter)"' | \
while read line; do
    expiry=$(echo $line | cut -d: -f2 | sed 's/expires //')
    days_left=$(( ($(date -d "$expiry" +%s) - $(date +%s)) / 86400 ))
    if [ $days_left -lt 30 ]; then
        echo "⚠️  $line ($days_left days left)"
    else
        echo "✅ $line ($days_left days left)"
    fi
done

echo -e "\n=== Recent Events ==="
kubectl get events --all-namespaces --field-selector involvedObject.apiVersion=cert-manager.io/v1 --sort-by='.lastTimestamp' | tail -20

echo -e "\n=== Webhook Configuration ==="
kubectl get validatingwebhookconfigurations cert-manager-webhook -o yaml

echo -e "\n=== Certificate Controller Logs ==="
kubectl logs -n cert-manager -l app.kubernetes.io/name=cert-manager --tail=50

Common Issues and Resolutions

Certificate Renewal Failures:

# Check certificate status and events
kubectl describe certificate problem-cert -n production

# Force certificate renewal
kubectl annotate certificate problem-cert -n production \
  cert-manager.io/force-renew=$(date +%s)

# Check certificate request details
kubectl get certificaterequest -n production --sort-by='.metadata.creationTimestamp'

ACME Challenge Failures:

# Check ACME order status
kubectl describe order -n production

# Verify DNS propagation for DNS-01 challenges
dig TXT _acme-challenge.example.com

# Test HTTP-01 challenge endpoint
curl -v http://example.com/.well-known/acme-challenge/test

Conclusion

Enterprise PKI management with cert-manager provides automated, scalable, and secure certificate lifecycle management for complex Kubernetes environments. This implementation demonstrates comprehensive patterns that ensure operational excellence, security compliance, and business continuity for mission-critical infrastructure.

Key benefits of this enterprise cert-manager implementation include:

  • Automated Lifecycle: Eliminates manual certificate management processes
  • Policy Enforcement: Ensures compliance with organizational security standards
  • Multi-CA Support: Integrates with various Certificate Authorities and PKI systems
  • High Availability: Provides redundancy and disaster recovery capabilities
  • Security Hardening: Implements comprehensive security controls and monitoring
  • Operational Excellence: Delivers comprehensive monitoring and troubleshooting capabilities

Regular security audits, certificate inventory reviews, and disaster recovery testing ensure the continued effectiveness of the PKI infrastructure. Consider implementing additional security measures such as Hardware Security Modules (HSMs), certificate transparency logging, and advanced threat detection for high-security environments.

The patterns demonstrated here provide a solid foundation for implementing enterprise-grade certificate management that scales from hundreds to thousands of certificates across multiple clusters and environments.