Longhorn Advanced Operations
This guide covers advanced Longhorn operations for production environments: performance tuning, backups and monitoring, troubleshooting, high availability, and disaster recovery.
Performance Tuning
Storage Performance Optimization
1. Disk Configuration
apiVersion: longhorn.io/v1beta1
kind: Node
metadata:
  name: worker-1
spec:
  disks:
    nvme0:
      path: /mnt/nvme0
      allowScheduling: true
      storageReserved: 10737418240  # 10 GiB; the field is an integer byte count
      tags: ["ssd", "fast"]
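Longhorn's Node resource defines storageReserved as an integer number of bytes, so a human-friendly figure such as 10 GiB has to be converted before it goes into the manifest. A minimal shell sketch of that conversion follows; gib_to_bytes is a hypothetical helper name, not part of Longhorn.

```shell
#!/bin/sh
# Convert a whole number of GiB to bytes (1 GiB = 1024 * 1024 * 1024 bytes).
# gib_to_bytes is an illustrative helper, not a Longhorn command.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}
```

For example, gib_to_bytes 10 prints 10737418240, the value to use for a 10 GiB reservation.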
2. Volume Settings
apiVersion: longhorn.io/v1beta1
kind: Volume
metadata:
  name: high-performance
spec:
  numberOfReplicas: 3
  frontend: blockdev
  engineImage: longhornio/longhorn-engine:v1.4.0
  diskSelector: ["ssd"]
  nodeSelector: ["storage"]
Network Optimization
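# Node configuration for dedicated storage network
# (illustrative annotation; Longhorn's own storage-network setting expects a Multus NetworkAttachmentDefinition)
apiVersion: v1
kind: Node
metadata:
  annotations:
    storage.network: "192.168.10.0/24"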
Advanced Features
1. Backup Configuration
apiVersion: longhorn.io/v1beta1
kind: BackupTarget
metadata:
  name: s3-backup
spec:
  backupTargetURL: s3://your-bucket@us-east-1/
  credentialSecret: aws-credentials
2. Recurring Jobs
apiVersion: longhorn.io/v1beta1
kind: RecurringJob
metadata:
  name: daily-backup
spec:
  cron: "0 0 * * *"
  task: "backup"
  groups: ["default"]
  retain: 7
  concurrency: 2
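A recurring job assigned to a group only applies to volumes that belong to that group. Membership is typically expressed as a volume label of the form recurring-job-group.longhorn.io/<group>=enabled; treat that key as an assumption to verify against your Longhorn version. The sketch below builds the label and estimates how far back the retained backups reach. Both function names are hypothetical.

```shell
#!/bin/sh
# Build the volume label that opts a volume into a recurring-job group.
# Assumption: the label key format is recurring-job-group.longhorn.io/<group>.
recurring_job_group_label() {
  echo "recurring-job-group.longhorn.io/$1=enabled"
}

# Hours of history covered by a job that runs every interval_hours
# and keeps the most recent "retain" backups.
retention_window_hours() {
  interval_hours=$1
  retain=$2
  echo $(( interval_hours * retain ))
}
```

With the daily job above (retain: 7), retention_window_hours 24 7 prints 168, that is, one week of backups. The label could then be applied with something like kubectl -n longhorn-system label volumes.longhorn.io my-volume "$(recurring_job_group_label default)", where my-volume is a placeholder.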
Monitoring Setup
1. Prometheus Integration
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: longhorn-prometheus
spec:
  selector:
    matchLabels:
      app: longhorn-manager
  endpoints:
  - port: manager
2. Custom Metrics
# Grafana Dashboard Configuration
{
  "datasource": "Prometheus",
  "fieldConfig": {
    "defaults": {
      "custom": {},
      "mappings": [],
      "thresholds": {
        "mode": "absolute",
        "steps": []
      }
    },
    "overrides": []
  },
  "panels": [
    {
      "datasource": "Prometheus",
      "fieldConfig": {
        "defaults": {
          "custom": {}
        },
        "overrides": []
      },
      "targets": [
        {
          "expr": "longhorn_volume_actual_size_bytes",
          "interval": "",
          "legendFormat": "",
          "refId": "A"
        }
      ],
      "title": "Volume Actual Size",
      "type": "gauge"
    }
  ]
}
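The panel above plots absolute size only. A ratio against provisioned capacity is often easier to read and alert on; in PromQL that would be something like 100 * longhorn_volume_actual_size_bytes / longhorn_volume_capacity_bytes, assuming the capacity metric is exposed by your Longhorn version. The same arithmetic as a shell sketch (usage_percent is a hypothetical helper):

```shell
#!/bin/sh
# Integer percentage of provisioned capacity actually consumed by a volume.
# Arguments: actual size in bytes, capacity in bytes.
usage_percent() {
  actual=$1
  capacity=$2
  # Guard against division by zero and nonsensical input.
  if [ "$capacity" -le 0 ]; then
    echo "capacity must be positive" >&2
    return 1
  fi
  echo $(( actual * 100 / capacity ))
}
```

For instance, usage_percent 536870912 1073741824 prints 50 for a 1 GiB volume holding 512 MiB of data.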
Troubleshooting
1. Volume Recovery
# Check volume state
kubectl -n longhorn-system get volumes
# Force delete a stuck volume by clearing its finalizers (last resort; this skips Longhorn's own cleanup)
kubectl -n longhorn-system patch volumes stuck-volume \
  --type='json' -p='[{"op": "replace", "path": "/metadata/finalizers", "value":[]}]'
# Recover replica (confirm the subcommand exists in your release with `longhorn-manager --help`)
kubectl -n longhorn-system exec -it longhorn-manager-xxx -- \
  longhorn-manager replica-rebuild volume-name
2. Node Recovery
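# Check Longhorn node status (plain "nodes" resolves to the Kubernetes Node resource)
kubectl -n longhorn-system get nodes.longhorn.io
# Cordon node for maintenance
kubectl cordon worker-1
# Evacuate volumes: stop scheduling on the Longhorn node and request replica eviction
kubectl -n longhorn-system patch nodes.longhorn.io worker-1 --type=merge \
  -p '{"spec": {"allowScheduling": false, "evictionRequested": true}}'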
High Availability Configuration
1. Volume Replication
apiVersion: longhorn.io/v1beta1
kind: Volume
metadata:
  name: ha-volume
spec:
  numberOfReplicas: 3
  replicaAutoBalance: "best-effort"
  dataLocality: "best-effort"
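With numberOfReplicas set to 3, a fully written volume consumes three times its size in raw disk space across the cluster, and its data stays available as long as at least one healthy replica remains. A rough capacity sketch under those assumptions follows; the function names are hypothetical, and thin provisioning and snapshots will make real usage differ.

```shell
#!/bin/sh
# Raw space, in GiB, needed to hold every replica of a fully written volume.
raw_footprint_gib() {
  size_gib=$1
  replicas=$2
  echo $(( size_gib * replicas ))
}

# Number of replica losses a volume can absorb while keeping one healthy copy.
tolerable_replica_failures() {
  replicas=$1
  echo $(( replicas - 1 ))
}
```

A 100 GiB volume with three replicas gives raw_footprint_gib 100 3 = 300 and tolerable_replica_failures 3 = 2.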
2. Node Affinity Rules
apiVersion: v1
kind: Pod
metadata:
  name: storage-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: storage
            operator: In
            values:
            - "true"
Disaster Recovery
1. Backup Strategy
# Volume backup configuration
apiVersion: longhorn.io/v1beta1
kind: VolumeBackup
metadata:
  name: critical-backup
spec:
  snapshotName: snapshot-1
  volume: critical-volume
2. Recovery Process
# Restore from backup
kubectl -n longhorn-system create -f - <<EOF
apiVersion: longhorn.io/v1beta1
kind: Volume
metadata:
  name: restored-volume
spec:
  fromBackup: backupstore:///backup-name
EOF
Performance Testing
1. FIO Benchmarking
apiVersion: batch/v1
kind: Job
metadata:
  name: fio-test
spec:
  template:
    spec:
      restartPolicy: Never  # Jobs require Never or OnFailure
      containers:
      - name: fio
        image: nixery.dev/shell/fio
        command:
        - /bin/sh
        - -c
        - |
          # --directory points fio at the mounted Longhorn volume;
          # --direct=1 bypasses the page cache so the disk is what gets measured
          fio --name=randwrite --directory=/data --ioengine=libaio --iodepth=1 \
            --rw=randwrite --bs=4k --direct=1 --size=512M \
            --numjobs=2 --runtime=240 --group_reporting
        volumeMounts:
        - name: test-vol
          mountPath: /data
      volumes:
      - name: test-vol
        persistentVolumeClaim:
          claimName: test-pvc
2. Results Analysis
# Collect performance metrics
kubectl -n longhorn-system exec -it \
  $(kubectl -n longhorn-system get pod -l app=longhorn-manager -o jsonpath='{.items[0].metadata.name}') \
  -- longhorn-manager info volume
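fio reports IOPS and bandwidth separately, but for a fixed block size they are two views of the same number: throughput equals IOPS multiplied by block size. A small sketch for sanity-checking results from the 4k job above follows. iops_to_mibps is a hypothetical helper, and the IOPS figure in the usage note is made up for illustration, not a measured result.

```shell
#!/bin/sh
# Convert an IOPS figure at a given block size (in KiB) to whole MiB/s.
# Integer arithmetic, so fractional MiB/s are truncated.
iops_to_mibps() {
  iops=$1
  block_kib=$2
  echo $(( iops * block_kib / 1024 ))
}
```

For example, iops_to_mibps 2560 4 prints 10: 2560 random 4 KiB writes per second amount to 10 MiB/s.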
Best Practices
Storage Configuration
- Use dedicated storage nodes
- Implement proper backup strategies
- Monitor disk usage regularly
Performance
- Use SSD or NVMe disks for latency-sensitive volumes
- Choose a replica count that matches the failure tolerance you need
- Set resource requests and limits for storage-heavy workloads
Maintenance
- Run regular health checks
- Schedule backups
- Plan updates ahead of time
Conclusion
Understanding advanced Longhorn operations is crucial for:
- Optimal performance
- Reliable disaster recovery
- Effective troubleshooting
- Production readiness