Migrating from RKE1 to RKE2 Using Longhorn DR Volumes: A Kubernetes-Native Approach
With RKE1 reaching its end-of-life on July 31, 2025, organizations running Longhorn for persistent storage have a powerful built-in option for migration to RKE2. Longhorn’s Disaster Recovery (DR) volumes provide a Kubernetes-native approach that synchronizes data directly between clusters for efficient, low-downtime migration.
Why Use Longhorn DR Volumes for RKE1 to RKE2 Migration?
Longhorn DR volumes offer unique advantages for cross-cluster data migration:
- Native Kubernetes Integration: Fully integrated with Kubernetes, requiring no external systems.
- Incremental Replication: Only changed data blocks are transferred, minimizing network overhead.
- Simple Management: Direct volume-to-volume replication without intermediate storage.
- Minimal Downtime: Continuous replication allows for rapid cutover with minimal data loss.
- Application Consistency: Support for consistent backups of multi-volume applications.
- Built-in Health Monitoring: Automated verification of replication status and health.
Prerequisites for Migration
Before beginning the migration process, ensure you have:
- Longhorn Running on Both Clusters: Installed and operational in both RKE1 and RKE2 clusters.
- Network Connectivity: The clusters must be able to communicate on Longhorn’s replication ports.
- Matching Longhorn Versions: Ideally the same version on both source and target.
- Deployed RKE2 Cluster: A functioning RKE2 cluster with sufficient resources.
- Resource Mapping: Plan for namespace and storage class mapping between clusters.
Step-by-Step Migration Process
1. Install and Configure Longhorn on Both Clusters
If Longhorn isn’t already installed on both clusters, install it using Helm:
# On both RKE1 and RKE2 clusters
helm repo add longhorn https://charts.longhorn.io
helm repo update
kubectl create namespace longhorn-system
helm install longhorn longhorn/longhorn --namespace longhorn-system
Verify the installation is healthy:
kubectl -n longhorn-system get pods
All pods should show as Running with a 1/1 ready status.
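If you want to script this readiness check rather than eyeball it, a generic kubectl wait works as a rough sketch (the timeout value is arbitrary):
# Wait until all Longhorn pods report Ready (adjust the timeout to your environment)
kubectl -n longhorn-system wait --for=condition=Ready pods --all --timeout=300s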
2. Set Up External Access for Inter-Cluster Communication
Enable the Longhorn backend services to communicate between clusters. This typically requires:
- Setting up a LoadBalancer or NodePort service for the source cluster:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: longhorn-backend-external
  namespace: longhorn-system
spec:
  selector:
    app: longhorn-manager
  ports:
    - port: 9500
      targetPort: 9500
      name: manager
  type: LoadBalancer
EOF
- Get the external IP or hostname:
kubectl -n longhorn-system get svc longhorn-backend-external
Note the EXTERNAL-IP for later use.
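Before proceeding, it is worth confirming that the RKE2 cluster can actually reach this endpoint. A minimal sketch, assuming the EXTERNAL-IP placeholder from above and that the Longhorn manager API responds on the /v1 path:
# Run a throwaway pod in the RKE2 cluster and probe the exposed Longhorn manager API
kubectl run longhorn-reach-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -s -o /dev/null -w "%{http_code}\n" http://EXTERNAL-IP:9500/v1
# An HTTP 200 response indicates the endpoint is reachable from the target cluster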
3. Create and Set Up a Disaster Recovery Volume in RKE2
For each persistent volume in your RKE1 cluster that you need to migrate:
Create a DR volume in your RKE2 cluster through the Longhorn UI:
- Access the Longhorn UI in your RKE2 cluster
- Navigate to “Volume” page
- Click “Create Volume”
- Name it appropriately (e.g., dr-[original-volume-name])
- Set a size matching or larger than the source volume
- Select the “Disaster Recovery Volume” option
- Click “Create”
Configure the DR volume to point to the source volume:
- In the Longhorn UI, locate the newly created DR volume
- Click the “Enable Disaster Recovery” button
- In the dialog:
- Enter the external URL of your RKE1 Longhorn (e.g., http://EXTERNAL-IP:9500)
- Select the source volume from RKE1
- Set the replication schedule (e.g., */5 * * * * for every 5 minutes)
- Click “OK”
Repeat this process for each volume you need to migrate.
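If you prefer a declarative workflow, a DR (standby) volume can also be defined as a Longhorn Volume custom resource. The sketch below is an assumption-laden alternative to the UI steps: it presumes a backup target is configured and the source volume's backup URL is known, and the names, size, replica count, and URL are placeholders to adapt:
# DR (standby) volume defined as a Longhorn Volume CR in the RKE2 cluster
apiVersion: longhorn.io/v1beta2
kind: Volume
metadata:
  name: dr-original-volume-name
  namespace: longhorn-system
spec:
  size: "10737418240"   # size in bytes, matching or larger than the source
  numberOfReplicas: 3
  frontend: blockdev
  standby: true         # marks this as a Disaster Recovery volume
  fromBackup: "s3://backup-bucket@us-east-1/?backup=backup-xxxx&volume=original-volume-name"  # placeholder backup URL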
4. Monitor the Initial Synchronization
Watch the synchronization progress through the Longhorn UI:
- Navigate to the DR volume in the RKE2 Longhorn UI
- The “Last Backup” field will update when the first sync completes
- Check the “Last Backup At” timestamp to confirm regular updates
You can also check via CLI:
kubectl -n longhorn-system get volumes.longhorn.io -o custom-columns=NAME:.metadata.name,STATE:.status.state,ROBUSTNESS:.status.robustness
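To confirm that incremental syncs keep landing without opening the UI, you can also read the volume's last backup field directly (the volume name is a placeholder):
# Print the most recent backup applied to the DR volume
kubectl -n longhorn-system get volumes.longhorn.io dr-source-volume-name \
  -o jsonpath='{.status.lastBackup}{"\n"}'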
5. Create a PVC from the DR Volume
Once the initial synchronization is complete, create a PVC from the DR volume:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: migrated-pvc-name
  namespace: your-app-namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi # Match the source volume size
  selector:
    matchLabels:
      longhornvolume: dr-source-volume-name # Label of the DR volume
EOF
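Binding by selector requires a PersistentVolume carrying the matching label. If the PV does not already exist (the Longhorn UI can create PV/PVC pairs for you), a minimal static PV sketch looks like this; the volume name, size, and filesystem type are placeholders:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: dr-source-volume-name
  labels:
    longhornvolume: dr-source-volume-name # must match the PVC selector above
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: longhorn
  csi:
    driver: driver.longhorn.io            # Longhorn CSI driver
    volumeHandle: dr-source-volume-name   # name of the Longhorn volume
    fsType: ext4
EOF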
6. Prepare for Application Migration
- Export the application manifests from your RKE1 cluster:
kubectl get deployment,statefulset,service,configmap -n your-app-namespace -o yaml > app-manifests.yaml
- Modify the manifests to reference the new PVCs:
- Update PVC names in volumes sections
- Adjust any cluster-specific configurations
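For straightforward cases, the PVC rename can be handled with a simple search-and-replace over the exported manifests; the names here are placeholders, and the output file matches the one applied in the next step:
# Point workloads at the migrated PVC (review the diff before applying)
sed 's/old-pvc-name/migrated-pvc-name/g' app-manifests.yaml > modified-app-manifests.yaml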
7. Execute the Migration
For a planned migration with minimal downtime:
- Stop the application in RKE1:
kubectl scale deployment/your-app --replicas=0 -n your-app-namespace
Trigger a final synchronization:
- In the Longhorn UI, select the DR volume
- Click “Take Disaster Recovery Backup Now”
- Wait for completion (monitor in the UI)
Activate the DR volume in RKE2:
- In the Longhorn UI, select the DR volume
- Click “Activate Disaster Recovery Volume”
- This converts the DR volume to a regular Longhorn volume
Deploy the application in RKE2:
kubectl apply -f modified-app-manifests.yaml
- Verify the application is running correctly in RKE2:
kubectl get pods -n your-app-namespace
kubectl describe pvc migrated-pvc-name -n your-app-namespace
Advanced Configuration Options
Volume Groups for Multi-Volume Applications
For applications with multiple related volumes, use Longhorn’s volume groups to ensure consistency:
Create a volume group in the Longhorn UI:
- Navigate to “Volume” page
- Select multiple volumes by checking their boxes
- Click “Create Group”
- Name the group (e.g., app-name-group)
Configure DR synchronization at the group level:
- Select the group
- Click “Take Group Backup”
- Configure a schedule for the entire group
Fine-Tuning Replication Schedule
Optimize replication frequency based on data change rate and network constraints:
- High Change Rate Data: Use frequent schedules like */5 * * * * (every 5 minutes)
- Lower Change Rate Data: Consider hourly schedules like 0 * * * *
- Before Migration: Switch to more frequent replication to minimize data loss
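On the source cluster, the backup cadence that feeds the DR volumes can also be managed declaratively with Longhorn's RecurringJob resource. A sketch, assuming the v1beta2 API and the app-name-group volume group created earlier; the job name, cron expression, and retention are placeholders:
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: dr-feed-backup
  namespace: longhorn-system
spec:
  name: dr-feed-backup
  task: backup           # create backups that DR volumes restore incrementally
  cron: "*/5 * * * *"    # every 5 minutes; relax this for low-change data
  groups:
    - app-name-group
  retain: 12
  concurrency: 1
  labels: {}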
Network Bandwidth Management
Control replication bandwidth to prevent network congestion:
Configure global settings in the Longhorn UI:
- Navigate to “Setting” > “General”
- Adjust “Backup Concurrent Limit” (default: 5)
Or modify the CRD directly:
kubectl -n longhorn-system edit settings.longhorn.io backup-concurrent-limit
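The same setting can be changed non-interactively, which is handy for automation; a sketch, assuming the setting stores its value as a string:
# Lower the concurrent backup limit without opening an editor
kubectl -n longhorn-system patch settings.longhorn.io backup-concurrent-limit \
  --type merge -p '{"value":"2"}'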
Handling Large Volumes
For very large volumes (>1TB):
- Consider setting up dedicated network paths for replication traffic
- Increase the initial sync window to allow complete synchronization
- Use the backupstoragerecurring CRD to set a custom timeout:
apiVersion: longhorn.io/v1beta1
kind: BackupStorageRecurring
metadata:
  name: dr-large-volume-settings
  namespace: longhorn-system
spec:
  backupStorageSpec:
    backupTargetSpec:
      address: target-address
      credentialSecret: longhorn-backup-target
  recurringJobSelector:
    include: []
    exclude: []
  jobRecurringConfigs:
    - name: large-volume-backup
      task: dr-backup
      groups: []
      concurrency: 1
      retain: 10
      labels: {}
      schedule: "*/30 * * * *"
      timeoutSeconds: 14400 # 4 hours
Best Practices and Troubleshooting
Pre-Migration Preparation
- Test Run: Perform a test migration with non-critical workloads
- Resource Planning: Ensure RKE2 nodes have sufficient storage capacity
- Version Compatibility: Verify Longhorn version compatibility between clusters
- Network Testing: Validate network connectivity between clusters before migration
Common Issues and Solutions
Failed Synchronization
Issue: DR volume shows error or fails to sync
Solution: Check network connectivity and Longhorn manager logs:
kubectl -n longhorn-system logs -l app=longhorn-manager
Look for error messages related to the DR volume and address any connectivity issues.
Volume Remains in “RestoreInProgress” State
Issue: After activation, the volume stays in “RestoreInProgress”
Solution: Check for restore issues and manually force completion if needed:
kubectl -n longhorn-system describe volumes.longhorn.io volume-name
If stuck and data is verified as complete:
kubectl -n longhorn-system edit volumes.longhorn.io volume-name
# Change spec.restoreInitiated to false
Incorrect PVC Binding
Issue: PVC remains in Pending state after DR volume activation
Solution: Verify PV labels and PVC selector:
kubectl get pv -o wide
kubectl describe pvc problem-pvc
Ensure the PVC selector labels match the PV labels.
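If the labels simply do not match, relabeling the PV is usually enough to let the PVC bind; the names below are placeholders:
# Show PV labels, then add the label the PVC selector expects
kubectl get pv --show-labels
kubectl label pv dr-source-volume-name longhornvolume=dr-source-volume-name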
Performance Issues During Replication
Issue: Slow replication or degraded performance
Solution: Adjust concurrent backup limits and check node resource utilization:
kubectl -n longhorn-system edit settings.longhorn.io backup-concurrent-limit
Reduce the value if nodes are experiencing resource pressure.
Post-Migration Steps
After successful migration:
Clean Up Source Resources:
- Once the migration is confirmed successful, clean up the source volumes:
kubectl -n app-namespace delete pvc old-pvc-name
Update DNS and Access Points:
- Redirect traffic to services in the new RKE2 cluster
- Update any ingress configurations
Configure Regular Backups:
- Set up regular backup schedules for the new volumes
- Consider retaining the DR configuration for potential rollback needs
Performance Tuning:
- Optimize Longhorn settings in the RKE2 cluster after migration
- Consider enabling Longhorn node monitoring to guide further tuning
Conclusion
Migrating from RKE1 to RKE2 using Longhorn DR volumes provides a Kubernetes-native approach that leverages your existing storage infrastructure. The key advantages include incremental synchronization, application consistency, and minimal downtime during the cutover phase.
By following this guide, you can efficiently migrate your stateful workloads to RKE2 while maintaining data integrity and minimizing operational disruption. Longhorn’s DR capability transforms what could be a complex migration challenge into a manageable, controlled process.