Enterprise PKI Management with cert-manager: Automated Certificate Lifecycle in Production Kubernetes
Managing digital certificates at scale across distributed Kubernetes environments presents significant operational challenges for enterprise organizations. cert-manager transforms certificate lifecycle management from a manual, error-prone process into an automated, policy-driven system that ensures security compliance and operational efficiency. This comprehensive guide demonstrates enterprise-grade PKI implementation patterns, advanced automation strategies, and production-ready security practices for mission-critical infrastructure.
Executive Summary
Enterprise PKI management requires sophisticated automation to handle certificate provisioning, renewal, revocation, and compliance across complex infrastructure landscapes. cert-manager provides native Kubernetes integration for certificate lifecycle automation, supporting multiple Certificate Authorities, validation methods, and deployment patterns. This implementation guide covers advanced PKI architectures, security hardening, compliance frameworks, and operational excellence patterns for production environments managing thousands of certificates across multi-cluster deployments.
Understanding Enterprise PKI Requirements
Certificate Lifecycle Management
Modern enterprise environments require comprehensive certificate management addressing:
- Automated Provisioning: On-demand certificate generation with policy enforcement
- Lifecycle Automation: Automated renewal, revocation, and replacement processes
- Compliance Monitoring: Audit trails, compliance reporting, and detection of policy violations
- Security Integration: HSM integration, key escrow, and cryptographic standards
- Operational Excellence: Monitoring, alerting, and disaster recovery procedures
PKI Architecture Patterns
Hierarchical PKI Structure:
Root CA (Offline, Air-gapped)
├── Intermediate CA 1 (Production)
│   ├── Service Certificates
│   └── Client Certificates
├── Intermediate CA 2 (Development)
│   ├── Development Services
│   └── Testing Certificates
└── Intermediate CA 3 (Infrastructure)
    ├── Kubernetes Components
    └── Network Equipment
cert-manager Installation and Configuration
Enterprise Deployment Architecture
Deploy cert-manager with high availability and security hardening:
# cert-manager-values.yaml
installCRDs: true
namespace: cert-manager
replicaCount: 3
image:
  repository: quay.io/jetstack/cert-manager-controller
  tag: v1.13.2
  pullPolicy: IfNotPresent
serviceAccount:
  create: true
  # The controller needs an API token to talk to the cluster. Set this to
  # false only if you project a token into the pod yourself.
  automountServiceAccountToken: true
# Pod-level security context
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  runAsGroup: 1000
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault
# Container-level security context
containerSecurityContext:
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 1000m
    memory: 1Gi
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                  - cert-manager
          topologyKey: kubernetes.io/hostname
nodeSelector:
  kubernetes.io/os: linux
tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
prometheus:
  enabled: true
  servicemonitor:
    enabled: true
    prometheusInstance: default
    targetPort: 9402
    path: /metrics
    interval: 60s
    scrapeTimeout: 30s
    labels:
      app.kubernetes.io/component: monitoring
webhook:
  replicaCount: 3
  image:
    repository: quay.io/jetstack/cert-manager-webhook
    tag: v1.13.2
  resources:
    requests:
      cpu: 50m
      memory: 64Mi
    limits:
      cpu: 200m
      memory: 256Mi
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 1000
  containerSecurityContext:
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false
  networkPolicy:
    enabled: true
cainjector:
  replicaCount: 2
  image:
    repository: quay.io/jetstack/cert-manager-cainjector
    tag: v1.13.2
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 1Gi
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 1000
  containerSecurityContext:
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false
Installation with Security Hardening
Deploy cert-manager with comprehensive security configuration:
# Create namespace with security labels
kubectl create namespace cert-manager
kubectl label namespace cert-manager \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/warn=restricted
# Add Helm repository
helm repo add jetstack https://charts.jetstack.io
helm repo update
# Install cert-manager with security-focused values
# (chart version pinned to match the image tags in the values file)
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --version v1.13.2 \
  --values cert-manager-values.yaml \
  --wait \
  --timeout 300s
# Verify installation
kubectl get pods -n cert-manager
kubectl get customresourcedefinitions | grep cert-manager
kubectl get validatingwebhookconfigurations | grep cert-manager
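The verification commands above only list resources; a scripted check can block until every component has actually rolled out. The following is a minimal sketch, assuming the Helm release is named cert-manager (which yields the three Deployment names used here). The function name wait_for_cert_manager is hypothetical.

```shell
#!/bin/bash
# Wait for each cert-manager Deployment to finish rolling out.
# Assumption: Helm release name "cert-manager", which produces these Deployment names.
wait_for_cert_manager() {
  local ns="${1:-cert-manager}" timeout="${2:-300s}" deploy
  for deploy in cert-manager cert-manager-webhook cert-manager-cainjector; do
    # Stop at the first component that fails to become ready in time
    kubectl rollout status "deployment/${deploy}" -n "$ns" --timeout="$timeout" || return 1
  done
}
# Usage: wait_for_cert_manager cert-manager 300s
```

Returning on the first failure keeps the output focused on the component that is actually stuck.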
Network Policy Implementation
Secure cert-manager communication:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: cert-manager-controller
  namespace: cert-manager
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: cert-manager
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # Label set automatically on every namespace
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 9402 # Metrics
    - from:
        - podSelector:
            matchLabels:
              # The Helm chart labels webhook pods "webhook"
              app.kubernetes.io/name: webhook
      ports:
        - protocol: TCP
          port: 6060 # Health checks
  egress:
    - to: []
      ports:
        - protocol: TCP
          port: 443 # HTTPS to CAs
        - protocol: TCP
          port: 53 # DNS
        - protocol: UDP
          port: 53 # DNS
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 6443 # Kubernetes API
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: cert-manager-webhook
  namespace: cert-manager
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: webhook
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 10250 # Webhook
  egress:
    - to:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: cert-manager
      ports:
        - protocol: TCP
          port: 6060
    - to: []
      ports:
        - protocol: TCP
          port: 53 # DNS
        - protocol: UDP
          port: 53 # DNS
        # The webhook also needs the Kubernetes API to manage its serving certificate
        - protocol: TCP
          port: 443
        - protocol: TCP
          port: 6443
Certificate Authority Configuration
Private CA Hierarchy
Establish enterprise-grade CA infrastructure:
# Root CA Secret (imported from secure offline system)
apiVersion: v1
kind: Secret
metadata:
  name: root-ca-secret
  namespace: cert-manager
type: Opaque
data:
  tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0t... # Root CA certificate
  tls.key: LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0t... # Root CA private key (encrypted)
---
# Production Intermediate CA
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: production-ca-issuer
spec:
  ca:
    secretName: production-intermediate-ca
---
# Self-signed issuer for creating intermediate CAs
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
# Production Intermediate CA Certificate
# Note: as issued by selfsigned-issuer, this certificate is self-signed.
# To chain it to the root above, point issuerRef at a CA issuer backed
# by root-ca-secret instead.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: production-intermediate-ca
  namespace: cert-manager
spec:
  secretName: production-intermediate-ca
  isCA: true
  commonName: "Production Intermediate CA"
  subject:
    organizationalUnits:
      - "IT Department"
    organizations:
      - "ACME Corporation"
    countries:
      - "US"
    localities:
      - "San Francisco"
    provinces:
      - "California"
  issuerRef:
    name: selfsigned-issuer
    kind: ClusterIssuer
    group: cert-manager.io
  duration: 17520h # 2 years
  renewBefore: 1440h # 60 days
  # cert-manager.io/v1 uses privateKey and usages
  # (keyAlgorithm, keySize and keyUsages belong to older API versions)
  privateKey:
    algorithm: RSA
    size: 4096
  usages:
    - cert sign
    - crl sign
    - digital signature
    - key encipherment
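The duration and renewBefore values in the intermediate CA manifest are plain hour counts, so the renewal point can be checked with simple arithmetic: renewal is attempted renewBefore ahead of expiry. Below is a minimal sketch, assuming both values are whole hours with an h suffix as in the manifest. The helper names are hypothetical.

```shell
#!/bin/bash
# Convert an hours-only duration such as "17520h" to whole days.
# Assumption: the input is a whole number of hours followed by "h".
hours_to_days() {
  local hours="${1%h}"
  echo $(( hours / 24 ))
}

# Day after issuance on which renewal is attempted:
# (duration - renewBefore), expressed in days.
renewal_day() {
  local duration="${1%h}" renew_before="${2%h}"
  echo $(( (duration - renew_before) / 24 ))
}

hours_to_days 17520h        # prints 730 (about 2 years)
hours_to_days 1440h         # prints 60
renewal_day 17520h 1440h    # prints 670
```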
ACME Integration with Let’s Encrypt
Configure automated public certificate management:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: certificates@company.com
    privateKeySecretRef:
      name: letsencrypt-production-key
    # External Account Binding is only needed for ACME CAs that require it;
    # Let's Encrypt does not, so drop this block when using the server above
    externalAccountBinding:
      keyID: "your-eab-key-id"
      keySecretRef:
        name: letsencrypt-eab-secret
        key: secret
    solvers:
      # DNS-01 solver for wildcard certificates
      - dns01:
          route53:
            region: us-west-2
            accessKeyID: AKIAIOSFODNN7EXAMPLE
            secretAccessKeySecretRef:
              name: route53-credentials
              key: secret-access-key
        selector:
          dnsZones:
            - "company.com"
            - "*.company.com"
      # HTTP-01 solver for single domain certificates
      - http01:
          ingress:
            class: nginx
            podTemplate:
              spec:
                nodeSelector:
                  kubernetes.io/os: linux
                tolerations:
                  - key: node-role.kubernetes.io/master
                    operator: Exists
                    effect: NoSchedule
        selector:
          dnsNames:
            - "api.company.com"
            - "app.company.com"
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: certificates@company.com
    privateKeySecretRef:
      name: letsencrypt-staging-key
    solvers:
      - dns01:
          route53:
            region: us-west-2
            accessKeyID: AKIAIOSFODNN7EXAMPLE
            secretAccessKeySecretRef:
              name: route53-credentials
              key: secret-access-key
      - http01:
          ingress:
            class: nginx
Vault Integration
Integrate with HashiCorp Vault for enterprise PKI:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: vault-issuer
spec:
  vault:
    server: https://vault.company.com:8200
    path: pki/sign/kubernetes
    auth:
      kubernetes:
        mountPath: /v1/auth/kubernetes
        role: cert-manager
        secretRef:
          name: vault-token
          key: token
    caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0t... # Vault CA bundle
---
# Vault service account and secret
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vault-auth
  namespace: cert-manager
---
apiVersion: v1
kind: Secret
metadata:
  name: vault-token
  namespace: cert-manager
  annotations:
    kubernetes.io/service-account.name: vault-auth
type: kubernetes.io/service-account-token
Advanced Certificate Provisioning Patterns
Application-Specific Certificate Templates
Create standardized certificate templates for different application types:
# Web Service Certificate Template
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: web-service-template
  namespace: default
  annotations:
    cert-manager.io/cluster-issuer: "production-ca-issuer"
spec:
  secretName: web-service-tls
  issuerRef:
    name: production-ca-issuer
    kind: ClusterIssuer
  commonName: "web.company.com"
  dnsNames:
    - "web.company.com"
    - "www.web.company.com"
  duration: 8760h # 1 year
  renewBefore: 720h # 30 days
  subject:
    organizationalUnits: ["Web Services"]
    organizations: ["ACME Corporation"]
  # cert-manager.io/v1 field names: privateKey and usages
  privateKey:
    algorithm: RSA
    size: 2048
  usages:
    - digital signature
    - key encipherment
    - server auth
  secretTemplate:
    labels:
      app.kubernetes.io/component: web-service
      cert-manager.io/certificate-type: server
    annotations:
      # cert-manager reserves the cert-manager.io/ annotation prefix on issued
      # Secrets and records the common name there itself, so a custom
      # (example) prefix is used here
      pki.company.com/certificate-template: "web-service"
---
# API Service Certificate Template
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-service-template
  namespace: default
spec:
  secretName: api-service-tls
  issuerRef:
    name: production-ca-issuer
    kind: ClusterIssuer
  commonName: "api.company.com"
  dnsNames:
    - "api.company.com"
    - "*.api.company.com"
  ipAddresses:
    - "10.0.1.100"
  uris:
    - "spiffe://company.com/api-service"
  duration: 4380h # 6 months
  renewBefore: 360h # 15 days
  subject:
    organizationalUnits: ["API Services"]
    organizations: ["ACME Corporation"]
  privateKey:
    algorithm: ECDSA
    size: 256
  usages:
    - digital signature
    - key agreement
    - server auth
    - client auth
  secretTemplate:
    labels:
      app.kubernetes.io/component: api-service
      cert-manager.io/certificate-type: mutual-tls
Automated Certificate Injection
Implement automated certificate injection for applications:
# Certificate injection using init containers
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-application
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-application
  template:
    metadata:
      labels:
        app: web-application
    spec:
      initContainers:
        - name: cert-init
          image: alpine:3.18
          command:
            - sh
            - -c
            - |
              # Wait for certificate to be ready
              until [ -f /certs/tls.crt ] && [ -f /certs/tls.key ]; do
                echo "Waiting for certificates..."
                sleep 5
              done
              # alpine:3.18 does not ship the openssl CLI; install it first
              apk add --no-cache openssl
              # Validate certificate against the issuing CA
              openssl verify -CAfile /certs/ca.crt /certs/tls.crt || exit 1
              # File modes come from the volume's defaultMode; Secret volumes
              # are read-only, so chmod cannot be used here
              echo "Certificates ready"
          volumeMounts:
            - name: certs
              mountPath: /certs
              readOnly: true
      containers:
        - name: web-app
          image: nginx:1.25-alpine
          ports:
            - containerPort: 443
              name: https
          volumeMounts:
            - name: certs
              mountPath: /etc/nginx/certs
              readOnly: true
            - name: nginx-config
              mountPath: /etc/nginx/nginx.conf
              subPath: nginx.conf
      volumes:
        - name: certs
          secret:
            secretName: web-service-tls
            defaultMode: 0600
        - name: nginx-config
          configMap:
            name: nginx-tls-config
---
# Nginx configuration with TLS
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-tls-config
data:
  nginx.conf: |
    events {
      worker_connections 1024;
    }
    http {
      server {
        listen 443 ssl http2;
        server_name web.company.com;
        ssl_certificate /etc/nginx/certs/tls.crt;
        ssl_certificate_key /etc/nginx/certs/tls.key;
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers ECDHE+AESGCM:ECDHE+CHACHA20:DHE+AESGCM:DHE+CHACHA20:!aNULL:!SHA1:!WEAK;
        ssl_prefer_server_ciphers off;
        location / {
          return 200 "Certificate-secured application\n";
          add_header Content-Type text/plain;
        }
        location /health {
          return 200 "healthy\n";
          add_header Content-Type text/plain;
        }
      }
    }
Certificate Policy and Governance
Policy-Driven Certificate Management
Implement comprehensive certificate policies:
# Certificate policy using OPA Gatekeeper
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: certificatepolicy
spec:
  crd:
    spec:
      names:
        kind: CertificatePolicy
      validation:
        openAPIV3Schema:
          type: object
          properties:
            allowedIssuers:
              type: array
              items:
                type: string
            maxDuration:
              type: string
            requiredKeyUsages:
              type: array
              items:
                type: string
            allowedAlgorithms:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package certificatepolicy

        # Issuer must be on the allow list
        violation[{"msg": msg}] {
          input.review.object.kind == "Certificate"
          issuer := input.review.object.spec.issuerRef.name
          not issuer_allowed(issuer)
          msg := sprintf("Issuer '%v' is not in allowed list: %v", [issuer, input.parameters.allowedIssuers])
        }

        # Requested duration must not exceed the maximum
        violation[{"msg": msg}] {
          input.review.object.kind == "Certificate"
          duration := input.review.object.spec.duration
          max_duration := input.parameters.maxDuration
          duration_seconds := time.parse_duration_ns(duration) / 1000000000
          max_seconds := time.parse_duration_ns(max_duration) / 1000000000
          duration_seconds > max_seconds
          msg := sprintf("Certificate duration '%v' exceeds maximum allowed '%v'", [duration, max_duration])
        }

        # Key algorithm (spec.privateKey.algorithm in cert-manager.io/v1) must be allowed
        violation[{"msg": msg}] {
          input.review.object.kind == "Certificate"
          algorithm := input.review.object.spec.privateKey.algorithm
          not algorithm_allowed(algorithm)
          msg := sprintf("Key algorithm '%v' is not allowed. Permitted algorithms: %v", [algorithm, input.parameters.allowedAlgorithms])
        }

        # Membership helpers, written without the "in" keyword so that no
        # future.keywords import is needed
        issuer_allowed(issuer) {
          input.parameters.allowedIssuers[_] == issuer
        }

        algorithm_allowed(algorithm) {
          input.parameters.allowedAlgorithms[_] == algorithm
        }
---
# Constraints live in the constraints.gatekeeper.sh API group
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: CertificatePolicy
metadata:
  name: production-cert-policy
spec:
  match:
    kinds:
      - apiGroups: ["cert-manager.io"]
        kinds: ["Certificate"]
    namespaces: ["production"]
  parameters:
    allowedIssuers:
      - "production-ca-issuer"
      - "letsencrypt-production"
    maxDuration: "8760h" # 1 year maximum
    requiredKeyUsages:
      - "digital signature"
      - "key encipherment"
    allowedAlgorithms:
      - "RSA"
      - "ECDSA"
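The duration rule in the policy compares a requested duration against maxDuration. The same comparison can be run client-side before a manifest is submitted. This is only a sketch, assuming both values are whole hours with an h suffix such as 8760h; Go-style compound durations like 1h30m are not handled. The function name duration_within_max is hypothetical.

```shell
#!/bin/bash
# Succeed when a requested duration does not exceed the policy maximum.
# Assumption: both values are whole hours with an "h" suffix (e.g. "8760h").
duration_within_max() {
  local requested="${1%h}" maximum="${2%h}"
  [ "$requested" -le "$maximum" ]
}

duration_within_max 4380h 8760h && echo "4380h allowed"
duration_within_max 17520h 8760h || echo "17520h rejected"
```

A pre-check like this does not replace admission control; it only gives earlier feedback in a pipeline.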
Certificate Compliance Monitoring
Implement comprehensive compliance monitoring:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: certificate-compliance
  namespace: cert-manager
spec:
  groups:
    - name: certificate.compliance
      rules:
        - alert: CertificateExpiringWithoutAutoRenewal
          expr: |
            (certmanager_certificate_expiration_timestamp_seconds - time()) / 86400 < 30
            and on (name, namespace) (certmanager_certificate_renewal_timestamp_seconds == 0)
          for: 24h
          labels:
            severity: warning
            compliance: certificate-lifecycle
          annotations:
            summary: "Certificate {{ $labels.name }} in {{ $labels.namespace }} expires in < 30 days without auto-renewal"
        - alert: CertificateUsingWeakAlgorithm
          expr: |
            certmanager_certificate_info{algorithm="RSA",key_size!~"2048|4096"}
            or certmanager_certificate_info{algorithm="ECDSA",key_size!~"256|384"}
          for: 0s
          labels:
            severity: critical
            compliance: cryptographic-standards
          annotations:
            summary: "Certificate {{ $labels.name }} uses weak cryptographic algorithm"
        - alert: CertificateExcessiveDuration
          expr: |
            (certmanager_certificate_expiration_timestamp_seconds - certmanager_certificate_not_before_timestamp_seconds) / 86400 > 365
          for: 0s
          labels:
            severity: warning
            compliance: certificate-duration
          annotations:
            summary: "Certificate {{ $labels.name }} has duration > 365 days"
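The first alert boils down to one calculation: seconds until expiry divided by 86400, compared against a 30-day threshold. The sketch below reproduces that arithmetic with Unix epoch values so the threshold can be sanity-checked outside Prometheus. The helper names are hypothetical, and integer division is used where Prometheus uses floating point.

```shell
#!/bin/bash
# Whole days between "now" and a certificate's expiry, both as Unix epoch seconds.
# The second argument defaults to the current time.
days_until_expiry() {
  local expiry_epoch="$1" now_epoch="${2:-$(date +%s)}"
  echo $(( (expiry_epoch - now_epoch) / 86400 ))
}

# Mirror the alert threshold: succeed when fewer than 30 days (or $3 days) remain.
expires_soon() {
  [ "$(days_until_expiry "$1" "$2")" -lt "${3:-30}" ]
}
# Usage: expires_soon "$(date -d 2025-01-01T00:00:00Z +%s)" && echo "renew now"
```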
High Availability and Disaster Recovery
Multi-Cluster Certificate Management
Implement certificate synchronization across clusters:
# Certificate replication controller
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cert-replicator
  namespace: cert-manager
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cert-replicator
  template:
    metadata:
      labels:
        app: cert-replicator
    spec:
      serviceAccountName: cert-replicator
      containers:
        - name: replicator
          image: cert-replicator:v1.2.0
          env:
            - name: SOURCE_KUBECONFIG
              value: "/etc/kubeconfig/primary/config"
            - name: TARGET_CLUSTERS
              value: "secondary,tertiary"
            - name: SYNC_INTERVAL
              value: "300s"
            - name: CERTIFICATE_SELECTOR
              value: "cert-manager.io/replicate=true"
          volumeMounts:
            - name: primary-kubeconfig
              mountPath: /etc/kubeconfig/primary
            - name: secondary-kubeconfig
              mountPath: /etc/kubeconfig/secondary
            - name: tertiary-kubeconfig
              mountPath: /etc/kubeconfig/tertiary
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 200m
              memory: 256Mi
      volumes:
        - name: primary-kubeconfig
          secret:
            secretName: primary-cluster-access
        - name: secondary-kubeconfig
          secret:
            secretName: secondary-cluster-access
        - name: tertiary-kubeconfig
          secret:
            secretName: tertiary-cluster-access
Backup and Recovery Procedures
Implement comprehensive backup strategies:
#!/bin/bash
# cert-manager-backup.sh
set -euo pipefail

BACKUP_DIR="/backup/cert-manager/$(date +%Y%m%d-%H%M%S)"
NAMESPACES=("cert-manager" "default" "production")
mkdir -p "$BACKUP_DIR"

# Backup cert-manager configuration (all namespaces, not just the current one)
echo "Backing up cert-manager resources..."
kubectl get clusterissuers,issuers,certificates,certificaterequests \
  --all-namespaces -o yaml > "$BACKUP_DIR/cert-manager-resources.yaml"

# Backup certificate secrets
echo "Backing up certificate secrets..."
for ns in "${NAMESPACES[@]}"; do
  kubectl get secrets -n "$ns" -l cert-manager.io/certificate-name -o yaml > "$BACKUP_DIR/cert-secrets-$ns.yaml"
done

# Backup CA certificates and keys (encrypted)
echo "Backing up CA certificates..."
kubectl get secrets -n cert-manager -l cert-manager.io/ca-certificate=true -o yaml > "$BACKUP_DIR/ca-certificates.yaml"

# Create backup manifest
# (--short was removed from recent kubectl releases, so the client version
# is read from JSON output instead; requires jq)
cat << EOF > "$BACKUP_DIR/backup-manifest.yaml"
apiVersion: v1
kind: ConfigMap
metadata:
  name: backup-manifest
  namespace: cert-manager
data:
  timestamp: "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  version: "$(kubectl version --client -o json | jq -r '.clientVersion.gitVersion')"
  cert-manager-version: "$(kubectl get deployment cert-manager -n cert-manager -o jsonpath='{.spec.template.spec.containers[0].image}')"
  clusters: |
$(kubectl config get-contexts -o name | sed 's/^/    - /')
EOF

# Encrypt backup if GPG key available
if command -v gpg &> /dev/null && [ -n "${BACKUP_GPG_KEY:-}" ]; then
  echo "Encrypting backup..."
  tar czf - "$BACKUP_DIR" | gpg --trust-model always --encrypt --armor \
    --recipient "$BACKUP_GPG_KEY" > "$BACKUP_DIR.tar.gz.gpg"
  rm -rf "$BACKUP_DIR"
  echo "Encrypted backup created: $BACKUP_DIR.tar.gz.gpg"
else
  echo "Backup created: $BACKUP_DIR"
fi
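A backup is only useful if it is complete, so it helps to check the output directory before relying on it. The sketch below is a hypothetical helper, verify_backup_dir, which assumes an unencrypted backup directory with the file names written by the script above. It reports any expected file that is missing or empty.

```shell
#!/bin/bash
# Check that a backup directory contains the files the backup script writes
# and that none of them is empty. Prints each problem; returns non-zero on any.
# Arguments: <backup-dir> [namespace ...]
verify_backup_dir() {
  local dir="$1" status=0 files ns file
  shift
  files="cert-manager-resources.yaml ca-certificates.yaml backup-manifest.yaml"
  for ns in "$@"; do
    files="$files cert-secrets-${ns}.yaml"
  done
  for file in $files; do
    if [ ! -s "${dir}/${file}" ]; then
      echo "missing or empty: ${file}"
      status=1
    fi
  done
  return "$status"
}
# Usage (the directory name is illustrative):
#   verify_backup_dir /backup/cert-manager/20240101-000000 cert-manager default production
```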
Security Hardening and Compliance
RBAC Configuration
Implement principle of least privilege:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cert-manager-restricted
  namespace: cert-manager
automountServiceAccountToken: false
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cert-manager-controller-restricted
rules:
  # Certificate management
  - apiGroups: ["cert-manager.io"]
    resources: ["certificates", "certificaterequests"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  # ACME orders and challenges live in their own API group
  - apiGroups: ["acme.cert-manager.io"]
    resources: ["orders", "challenges"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: ["cert-manager.io"]
    resources: ["certificates/status", "certificaterequests/status"]
    verbs: ["update", "patch"]
  - apiGroups: ["cert-manager.io"]
    resources: ["clusterissuers", "issuers"]
    verbs: ["get", "list", "watch"]
  # Secret management. RBAC cannot filter by label, and an empty
  # resourceNames list restricts nothing, so the field is omitted; use
  # namespaced Roles where tighter scoping is required
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  # Event creation
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "patch"]
  # Ingress for HTTP-01 challenges. resourceNames does not support
  # wildcards and cannot constrain create or list, so it is omitted
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch", "create", "delete", "update"]
  # Solver pods and services for HTTP-01 challenges (DNS-01 creates no pods)
  - apiGroups: [""]
    resources: ["pods", "services"]
    verbs: ["create", "delete", "get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cert-manager-controller-restricted
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cert-manager-controller-restricted
subjects:
  - kind: ServiceAccount
    name: cert-manager-restricted
    namespace: cert-manager
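Least-privilege rules are easy to get subtly wrong, so it is worth asking the API server what the restricted ServiceAccount can actually do. The sketch below wraps kubectl auth can-i with impersonation. The helper name sa_can is hypothetical, and the caller needs permission to impersonate ServiceAccounts.

```shell
#!/bin/bash
# Ask the API server whether the restricted ServiceAccount may perform an action.
# Arguments: <verb> <resource> [namespace, default cert-manager]
sa_can() {
  local verb="$1" resource="$2" ns="${3:-cert-manager}"
  kubectl auth can-i "$verb" "$resource" -n "$ns" \
    --as="system:serviceaccount:cert-manager:cert-manager-restricted"
}
# Usage:
#   sa_can get certificates.cert-manager.io production
#   sa_can delete certificates.cert-manager.io production
```

With only the ClusterRole above bound, the first query should be allowed and the second denied, since the role grants no delete verb on certificates.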
Security Monitoring
Implement comprehensive security monitoring:
apiVersion: v1
kind: ConfigMap
metadata:
  name: falco-cert-manager-rules
  namespace: falco
data:
  cert_manager_rules.yaml: |
    - rule: Unauthorized Certificate Access
      desc: Detect unauthorized access to certificate secrets
      condition: >
        k8s_audit and ka.verb in (get, list) and
        ka.target.resource=secrets and
        ka.target.name contains "tls" and
        not ka.user.name in (cert-manager, system:serviceaccount:cert-manager:cert-manager)
      output: >
        Unauthorized access to certificate secret
        (user=%ka.user.name verb=%ka.verb secret=%ka.target.name namespace=%ka.target.namespace)
      priority: WARNING
      tags: [k8s, security, certificates]
    - rule: Certificate Manipulation
      desc: Detect direct manipulation of certificate resources
      condition: >
        k8s_audit and ka.verb in (create, update, patch, delete) and
        ka.target.resource=certificates and
        not ka.user.name in (cert-manager, system:serviceaccount:cert-manager:cert-manager)
      output: >
        Direct certificate manipulation detected
        (user=%ka.user.name verb=%ka.verb cert=%ka.target.name namespace=%ka.target.namespace)
      priority: WARNING
      tags: [k8s, security, certificates]
    - rule: CA Certificate Access
      desc: Detect access to CA certificates
      condition: >
        k8s_audit and ka.verb in (get, list) and
        ka.target.resource=secrets and
        ka.target.name contains "ca" and
        not ka.user.name in (cert-manager, system:serviceaccount:cert-manager:cert-manager)
      output: >
        CA certificate access detected
        (user=%ka.user.name verb=%ka.verb secret=%ka.target.name namespace=%ka.target.namespace)
      priority: CRITICAL
      tags: [k8s, security, ca-certificates]
Monitoring and Observability
Comprehensive Metrics Collection
Deploy advanced monitoring for certificate lifecycle:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cert-manager-detailed
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: cert-manager
  namespaceSelector:
    matchNames:
      - cert-manager
  endpoints:
    - port: tcp-prometheus-servicemonitor
      interval: 30s
      path: /metrics
      honorLabels: true
      metricRelabelings:
        # The capture group is required: without it ${1} expands to an
        # empty string and the metric name is lost
        - sourceLabels: [__name__]
          regex: '(certmanager_certificate_.*)'
          targetLabel: __name__
          replacement: '${1}'
        - sourceLabels: [name, namespace]
          separator: '/'
          targetLabel: certificate_fqn
          replacement: '${1}'
Grafana Dashboard for Certificate Management
{
"dashboard": {
"title": "Enterprise Certificate Management",
"tags": ["cert-manager", "pki", "security"],
"templating": {
"list": [
{
"name": "namespace",
"type": "query",
"query": "label_values(certmanager_certificate_info, exported_namespace)",
"includeAll": true,
"allValue": ".*"
},
{
"name": "issuer",
"type": "query",
"query": "label_values(certmanager_certificate_info{exported_namespace=~\"$namespace\"}, issuer_name)",
"includeAll": true
}
]
},
"panels": [
{
"title": "Certificate Overview",
"type": "stat",
"targets": [
{
"expr": "count(certmanager_certificate_info{exported_namespace=~\"$namespace\",issuer_name=~\"$issuer\"})",
"legendFormat": "Total Certificates"
}
]
},
{
"title": "Certificate Expiration Timeline",
"type": "graph",
"targets": [
{
"expr": "sort_desc((certmanager_certificate_expiration_timestamp_seconds{exported_namespace=~\"$namespace\"} - time()) / 86400)",
"legendFormat": "{{name}} ({{exported_namespace}})"
}
]
},
{
"title": "Certificate Renewal Status",
"type": "table",
"targets": [
{
"expr": "certmanager_certificate_info{exported_namespace=~\"$namespace\",issuer_name=~\"$issuer\"}",
"format": "table",
"instant": true
}
]
},
{
"title": "Failed Certificate Requests",
"type": "graph",
"targets": [
{
"expr": "increase(certmanager_certificate_request_conditions{condition=\"Failed\"}[5m])",
"legendFormat": "{{name}} - Failed Requests"
}
]
},
{
"title": "ACME Challenge Success Rate",
"type": "stat",
"targets": [
{
"expr": "sum(rate(certmanager_http_acme_client_request_count{status=~\"2..\"}[5m])) / sum(rate(certmanager_http_acme_client_request_count[5m])) * 100",
"legendFormat": "Success Rate %"
}
]
}
]
}
}
Troubleshooting and Operations
Diagnostic Tools and Scripts
Comprehensive troubleshooting toolkit:
#!/bin/bash
# cert-manager-diagnostics.sh (requires jq and GNU date)
echo "=== cert-manager Component Status ==="
kubectl get pods -n cert-manager
kubectl get deployments -n cert-manager

echo -e "\n=== Certificate Status Overview ==="
kubectl get certificates --all-namespaces

echo -e "\n=== Failed Certificate Requests ==="
# CertificateRequests report failure through the Ready condition, not a
# status.phase field, and custom resources cannot be filtered with
# --field-selector on status, so filter client-side
kubectl get certificaterequests --all-namespaces -o json | \
  jq -r '.items[]
    | select(any(.status.conditions[]?; .type == "Ready" and .reason == "Failed"))
    | "\(.metadata.namespace)/\(.metadata.name)"'

echo -e "\n=== ACME Orders Status ==="
kubectl get orders --all-namespaces

echo -e "\n=== Certificate Expiration Check ==="
kubectl get certificates --all-namespaces -o json | \
  jq -r '.items[] | select(.status.notAfter != null) |
    "\(.metadata.namespace)/\(.metadata.name): expires \(.status.notAfter)"' | \
  while read -r line; do
    # Take everything after "expires "; splitting on ":" would truncate
    # the timestamp, which itself contains colons
    expiry="${line##*expires }"
    days_left=$(( ($(date -d "$expiry" +%s) - $(date +%s)) / 86400 ))
    if [ "$days_left" -lt 30 ]; then
      echo "⚠️ $line ($days_left days left)"
    else
      echo "✅ $line ($days_left days left)"
    fi
  done

echo -e "\n=== Recent Events ==="
kubectl get events --all-namespaces --field-selector involvedObject.apiVersion=cert-manager.io/v1 --sort-by='.lastTimestamp' | tail -20

echo -e "\n=== Webhook Configuration ==="
kubectl get validatingwebhookconfigurations cert-manager-webhook -o yaml

echo -e "\n=== Certificate Controller Logs ==="
kubectl logs -n cert-manager -l app.kubernetes.io/name=cert-manager --tail=50
Common Issues and Resolutions
Certificate Renewal Failures:
# Check certificate status and events
kubectl describe certificate problem-cert -n production
# Force certificate renewal (uses the cmctl CLI; cert-manager has no
# force-renew annotation)
cmctl renew problem-cert -n production
# Check certificate request details
kubectl get certificaterequest -n production --sort-by='.metadata.creationTimestamp'
ACME Challenge Failures:
# Check ACME order status
kubectl describe order -n production
# Verify DNS propagation for DNS-01 challenges
dig TXT _acme-challenge.example.com
# Test HTTP-01 challenge endpoint
curl -v http://example.com/.well-known/acme-challenge/test
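When checking DNS-01 propagation, the TXT record to query is _acme-challenge prefixed to the certificate's DNS name; for a wildcard name the leading "*." is dropped first. A small sketch of that mapping, using a hypothetical helper named acme_challenge_record:

```shell
#!/bin/bash
# Build the DNS-01 TXT record name for a certificate DNS name.
# A wildcard name is validated at its base domain, so a leading "*." is stripped.
acme_challenge_record() {
  local name="${1#\*.}"
  echo "_acme-challenge.${name}"
}
# Usage: dig +short TXT "$(acme_challenge_record '*.company.com')"
```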
Conclusion
Enterprise PKI management with cert-manager provides automated, scalable, and secure certificate lifecycle management for complex Kubernetes environments. This implementation demonstrates comprehensive patterns that ensure operational excellence, security compliance, and business continuity for mission-critical infrastructure.
Key benefits of this enterprise cert-manager implementation include:
- Automated Lifecycle: Eliminates manual certificate management processes
- Policy Enforcement: Ensures compliance with organizational security standards
- Multi-CA Support: Integrates with various Certificate Authorities and PKI systems
- High Availability: Provides redundancy and disaster recovery capabilities
- Security Hardening: Implements comprehensive security controls and monitoring
- Operational Excellence: Delivers comprehensive monitoring and troubleshooting capabilities
Regular security audits, certificate inventory reviews, and disaster recovery testing ensure the continued effectiveness of the PKI infrastructure. Consider implementing additional security measures such as Hardware Security Modules (HSMs), certificate transparency logging, and advanced threat detection for high-security environments.
The patterns demonstrated here provide a solid foundation for implementing enterprise-grade certificate management that scales from hundreds to thousands of certificates across multiple clusters and environments.