Migrating from RKE1 to RKE2 is an essential transition for organizations relying on Rancher-managed Kubernetes clusters. With RKE1 reaching end-of-life (EOL) on July 31, 2025, moving to RKE2 ensures ongoing support, security updates, and performance improvements.

Why Migrate from RKE1 to RKE2?

RKE1 will no longer receive security patches or updates beyond its EOL date. According to the official SUSE announcement, RKE1 support ends July 31, 2025.

Delaying migration increases operational risk due to:

  • Lack of security updates
  • Compatibility issues with future Kubernetes versions
  • Missing out on critical performance improvements

Here are the key advantages of moving to RKE2:

  • Improved Security: SELinux support, FIPS compliance, and Pod Security Standards.
  • Better Performance: RKE2 runs containerd directly instead of going through Docker, which reduces runtime overhead and improves resource utilization.
  • Long-Term Stability: RKE2 aligns closely with upstream Kubernetes for better future compatibility.
  • Seamless Rancher Integration: Multi-cluster management with built-in rolling upgrades.

Beyond RKE1 EOL: Other Reasons for Cluster Migration

While RKE1’s EOL is a pressing concern, there are other scenarios where cluster migration becomes necessary:

Moving Into and Out of the Cloud

Organizations frequently move workloads between cloud and on-premises environments for cost savings, compliance, performance, and vendor flexibility.

Common Challenges:

  • Networking Differences: VPC configurations, CNI plugins, and ingress controllers need reconfiguration
  • Cloud Storage Differences: Persistent volumes are backed by provider-specific storage (e.g., AWS EBS vs. Azure Disks), so data usually has to be copied rather than reattached
  • IAM & Security Policies: RBAC and firewall rules require updates

Example Use Cases:

  • AWS EKS → RKE2 on-prem for cost control and compliance
  • Self-managed Kubernetes → managed services (EKS, AKS, GKE)
  • Hybrid & multi-cloud scaling for resilience

Disaster Recovery (DR) & High Availability

Migration is also part of ensuring business continuity, whether you maintain failover clusters or run workloads across multiple regions.

Benefits:

  • Minimize downtime during failures
  • Protection against outages (cloud, network, hardware)
  • Regulatory compliance with business continuity requirements

Key Challenges:

  • Keeping stateful applications in sync
  • Failover orchestration using DNS, load balancers, or BGP
  • Storage and data replication across environments

Foundational Changes & Infrastructure Upgrades

Major infrastructure changes often require migration rather than in-place upgrades.

Common Scenarios:

  • Adopting new Kubernetes architectures
  • Improving performance and scalability
  • Enhancing security and compliance
  • Switching container runtimes (Docker → containerd)
  • Upgrading storage solutions

Choosing the Right Migration Strategy

Migration isn’t a one-size-fits-all approach. The right strategy depends on several factors:

  • Timeline: How quickly do you need to complete the migration?
  • Risk Tolerance: Can you afford downtime, or do you need a gradual transition?
  • Team Involvement: Will this be admin-driven, or do app teams need control?
  • Cluster Differences: Are you making minimal changes or a major infrastructure shift?

Below are the three common migration strategies to consider:

1. Lift-and-Shift (Fastest, but Riskier)

What is Lift-and-Shift?

  • You as the cluster admin move all workloads from one cluster to another in one big move
  • Little to no changes are made to applications or configurations
  • Best when workloads are compatible with the new cluster

Pros:

  • Fastest migration method - Everything moves at once
  • Minimal app team involvement - Admin-driven process
  • Works well when clusters are nearly identical (same Kubernetes version, storage, etc.)

Cons:

  • Higher risk of failures - No gradual testing phase
  • Potential downtime - Some workloads may need to restart in the new cluster
  • Infrastructure differences may require post-move fixes

2. Rolling Migration (Balanced Approach)

What is Rolling Migration?

  • You as the cluster admin move applications one at a time in coordination with app teams
  • Small to medium-size changes to applications may be made to better utilize the new environment
  • Each app team tests and validates their services in the new cluster before fully migrating

Pros:

  • Minimized risk - Applications are moved gradually with validation
  • App teams validate their own workloads - Less troubleshooting after migration
  • No major downtime - Old cluster stays online while workloads migrate

Cons:

  • Slower migration process - Requires coordination with multiple teams
  • Potential inconsistencies - If teams don’t migrate in sync, dependencies may break
  • Higher resource costs - Both clusters run in parallel during migration

3. Phased Migration (Most Flexible, Requires App Team Cooperation)

What is Phased Migration?

  • You as the cluster admin build a new cluster and inform app teams that they need to migrate
  • Responsibility is on app teams to move their workloads when ready
  • Original cluster stays online until everything is moved, then decommissioned

Pros:

  • Less work for cluster admins - App teams handle their own migrations
  • Flexibility - Teams move on their own timeline, reducing coordination pressure
  • Great for major infrastructure changes - Teams can refactor if needed before moving

Cons:

  • Unpredictable timeline - Some teams may delay migration, leaving two clusters running longer
  • Potential inconsistencies - If teams don’t migrate in a structured way, dependencies may break
  • May require temporary workarounds - Cross-cluster communication might be needed during migration

Migration Methods – Choosing the Right Approach

Different workloads and environments require different migration techniques. When selecting a method, consider:

  • Are your workloads stateless or stateful?
  • Do you need a fast migration or a controlled process?
  • How critical is data consistency?
  • What’s your team’s expertise with various migration tools?

1. YAML Export/Import

  • How It Works:

    • Export workloads using:
      kubectl get <resource-type> -n <namespace> -o yaml > backup.yaml
      
    • Apply them in the new cluster with:
      kubectl apply -f backup.yaml
      
  • Pros:

    • Fast and simple, no extra tools required
    • Good for stateless workloads (Deployments, Services, ConfigMaps)
  • Cons:

    • No Persistent Volume (PV) migration, must move storage separately
    • Manual and error-prone, requires careful dependency handling
  • Best for:

  • Open Source Tool:
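
As a rough sketch of the export/import flow described above, the two helper functions below export a few stateless resource kinds from one namespace and apply them to the new cluster. The function names, the context names, and the choice of resource kinds are assumptions for illustration only; adjust them to your environment.

```shell
#!/bin/sh
# Sketch only: export a few stateless resource kinds from one namespace in the
# source cluster, then apply them to the target cluster.
# The context names used in the example at the bottom are hypothetical.

# export_namespace <context> <namespace> <outdir>
export_namespace() {
  ctx="$1"; ns="$2"; outdir="$3"
  mkdir -p "$outdir" || return 1
  for kind in configmaps services deployments; do
    # One file per kind keeps diffs and partial re-applies manageable.
    kubectl --context "$ctx" get "$kind" -n "$ns" -o yaml > "$outdir/$kind.yaml" || return 1
  done
}

# import_namespace <context> <namespace> <dir>
# Assumes the namespace already exists in the target cluster.
import_namespace() {
  ctx="$1"; ns="$2"; dir="$3"
  # Validate against the target API server first; a server-side dry run
  # persists nothing.
  kubectl --context "$ctx" apply -n "$ns" -f "$dir" --dry-run=server || return 1
  kubectl --context "$ctx" apply -n "$ns" -f "$dir"
}

# Example (not run automatically):
#   export_namespace rke1-prod my-app ./export/my-app
#   import_namespace rke2-prod my-app ./export/my-app
```

Exported manifests still carry cluster-specific fields such as status and a Service's clusterIP, so review the files before applying them. The dry run is there to surface that kind of problem before anything is created.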

2. DR-Syncer

  • How It Works:

    • Replicates Deployments, Services, ConfigMaps, Secrets, Persistent Volumes across clusters
    • Ensures scheduled syncing for seamless migration
  • Pros:

    • Purpose-built for Kubernetes migrations/DR – Handles both workloads and PVs
    • Minimizes downtime – Keeps namespaces and data synchronized
    • More efficient than manual YAML exports – Reduces human error
  • Cons:

    • Requires setup & configuration
    • May need cluster connectivity – Ensure network policies allow cross-cluster syncs
    • Requires similar cluster setup – Target cluster should match source
    • Target cluster must have storage configured for PV replication
  • Best for:

    • Stateless and stateful applications that need ongoing replication between clusters
  • Open Source Tool:

3. Backup and Restore Tools

  • How It Works:

    • Backup workloads in the old cluster
    • Restore them in the new cluster, including Persistent Volumes
  • Pros:

    • Works across cloud and on-prem clusters
    • Backs up all workloads including PVs, RBAC, and secrets
  • Cons:

    • Requires object storage (AWS S3, MinIO, Azure Blob)
    • May be slow for large clusters with many Persistent Volumes
    • Some solutions require paid licenses
  • Best for:

    • Full-cluster migrations needing persistent storage and security settings
    • Backup and disaster recovery strategies
  • Detailed guides available for:

    • Velero - Open-source Kubernetes backup/restore with plugin architecture
    • CloudCasa - Cloud-based backup solution with comprehensive resource coverage
    • Kasten K10 - Application-centric Kubernetes data management platform
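
To make the backup-then-restore flow concrete, here is a minimal sketch using the Velero CLI as one example of the tools listed above. It assumes Velero is already installed in both clusters and that both point at the same object storage location; the function names, context names, and backup name are hypothetical.

```shell
#!/bin/sh
# Sketch only: back up one namespace in the old cluster and restore it in the
# new one with Velero. Whether volume data is included depends on how Velero's
# snapshot or file-system backup is configured in your installation.

# backup_namespace <source-context> <backup-name> <namespace>
backup_namespace() {
  velero --kubecontext "$1" backup create "$2" --include-namespaces "$3" --wait
}

# restore_backup <target-context> <backup-name>
restore_backup() {
  velero --kubecontext "$1" restore create --from-backup "$2" --wait
}

# Example (not run automatically):
#   backup_namespace rke1-prod my-app-migration my-app
#   restore_backup   rke2-prod my-app-migration
```

Backing up one namespace at a time keeps each restore small enough to validate before moving on, which fits the rolling and phased strategies described earlier.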

4. Redeploy (GitOps)

  • How It Works:

    • Update the target cluster in your pipelines to reflect the new environment
    • Deploy a fresh environment in the new cluster using Helm, Kustomize, or GitOps (ArgoCD, Flux)
    • Migrate data separately using snapshots, database replication, or manual restores
  • Pros:

    • Ensures a clean deployment, avoiding legacy config issues
    • Best for infrastructure upgrades or Kubernetes version changes
  • Cons:

    • No automatic PV migration, must handle database and storage manually
    • Takes more time, especially for complex applications
    • Requires applications to be fully defined as code (IaC/GitOps)
  • Best for:

    • Organizations following Infrastructure-as-Code (IaC) or GitOps practices
    • Teams migrating to declarative deployments for better reproducibility
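
If your applications are packaged as Helm charts, the redeploy step can be as small as pointing the same release at the new cluster's kubeconfig context. The sketch below assumes a chart path and a per-cluster values file that are purely illustrative; data migration still has to happen separately.

```shell
#!/bin/sh
# Sketch only: install (or upgrade) a Helm release in the new cluster.
# All names passed in the example are hypothetical.

# redeploy_release <context> <namespace> <release> <chart> <values-file>
redeploy_release() {
  ctx="$1"; ns="$2"; release="$3"; chart="$4"; values="$5"
  # "upgrade --install" is idempotent: it installs on the first run and
  # upgrades on later runs, so the same pipeline step can be re-run safely.
  helm upgrade --install "$release" "$chart" \
    --kube-context "$ctx" --namespace "$ns" --create-namespace \
    --values "$values" --wait --timeout 10m
}

# Example (not run automatically):
#   redeploy_release rke2-prod web web-app ./charts/web-app values-rke2.yaml
```

Keeping cluster-specific settings (storage class, ingress class, hostnames) in a separate values file is one way to avoid carrying RKE1-era configuration into the new environment.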

5. Cattle-Drive for Rancher Resources

  • How It Works:

    • Migrates Rancher-specific objects from source to target cluster
    • Includes Projects, Namespaces, Rancher Permissions, Cluster Apps, and Catalog Repos
  • Pros:

    • Automates the migration of Rancher resources between clusters
    • Preserves project structure and access controls
  • Cons:

    • Does not migrate your applications
    • Limited to Rancher-specific resources
  • Best for:

    • Use with redeployment migrations where you don’t want to manually recreate Projects and permissions
  • Open Source Tool:

Data Migration Methods

Migrating persistent data is crucial to maintaining application stability. Here are the recommended approaches:

Longhorn DR Volumes

  • How It Works:

    • Longhorn’s Disaster Recovery (DR) volumes sync with a backup cluster on a scheduled basis
    • Uses incremental restores to minimize transfer time
    • DR volume is created from a volume’s backup in the backupstore
    • Scheduled backup intervals determine how frequently data is updated
  • Pros:

    • Scheduled Data Syncing – Uses periodic snapshots and incremental restoration
    • Faster Recovery vs. Full Backup Restores – Avoids recovering entire volumes from scratch
    • Built-in with Longhorn – No additional tools required for Longhorn users
  • Cons:

    • Not real-time replication – Data is only as current as the last scheduled backup
    • DR volumes cannot take snapshots or backups until they are activated
    • Recovery Point Objective (RPO) depends on backup frequency
  • Best for:

    • Organizations already using Longhorn for persistent storage
    • Detailed guide on migrating using Longhorn DR volumes

pv-migrate

  • How It Works:

    • CLI tool that migrates Persistent Volume Claims (PVCs) across namespaces, clusters, or storage backends
    • Uses rsync over SSH, moving data via bind mounts, LoadBalancer services, or port-forwarding depending on the strategy
    • Supports multiple migration strategies, automatically selecting the most efficient method
  • Pros:

    • Works across namespaces, clusters, and storage backends – Not tied to a specific CSI driver
    • Secure migrations – Uses SSH and rsync for encrypted data transfer
    • Multiple migration strategies – Falls back to different approaches when needed
    • Highly customizable – Configure rsync/SSH images, affinity, and network settings
  • Cons:

    • Requires storage compatibility – Target storage class must support expected access modes
    • Live data requires careful handling – Works best for pre-migration syncing
    • Networking considerations – Cross-cluster migrations require proper network connectivity
  • Best for:

  • Open Source Tool:

Backup and Restore Solutions

  • Backup and restore solutions that work across cloud and on-prem environments

  • Pros: Support full-cluster backups, including PVCs, RBAC, and custom resources

  • Cons: Slower for large clusters, requires object storage (e.g., AWS S3)

  • Detailed guides available for:

    • Velero - Open-source backup/restore tool
    • CloudCasa - SaaS Kubernetes backup solution
    • Kasten K10 - Enterprise data management platform

Common Migration Failures & Troubleshooting

Even with careful planning, migrations can encounter issues. Here are common problems and their solutions:

1. Missing Critical Cluster Services

Issue: After migration, applications fail due to missing dependencies like cert-manager, monitoring, or GitOps tools.

Fix:

  • Ensure required cluster services are installed first (cert-manager, Prometheus, ArgoCD)
  • Deploy cluster-wide services before migrating workloads
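
One way to confirm that cluster services are in place before moving workloads is to check for the CRDs they install. The helper below is a sketch under that assumption; the function name and context name are hypothetical, and the CRD list should be replaced with whatever your applications actually depend on.

```shell
#!/bin/sh
# Sketch only: report which required CRDs exist in the target cluster.
# Returns non-zero if any are missing, so it can gate a pipeline step.

# check_required_crds <context> <crd-name>...
check_required_crds() {
  ctx="$1"; shift
  missing=0
  for crd in "$@"; do
    if kubectl --context "$ctx" get crd "$crd" >/dev/null 2>&1; then
      echo "ok      $crd"
    else
      echo "MISSING $crd"
      missing=1
    fi
  done
  return "$missing"
}

# Example (not run automatically):
#   check_required_crds rke2-prod \
#     certificates.cert-manager.io \
#     servicemonitors.monitoring.coreos.com \
#     applications.argoproj.io
```

A CRD being present does not prove the controller behind it is healthy, so treat this as a first gate rather than a full readiness check.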

2. Forgetting Cluster-Scoped Resources

Issue: Applications fail to start because ClusterRoles, ClusterRoleBindings, or CRDs are missing.

Fix:

  • Export and apply CRDs before migrating workloads:
    kubectl get crd -o yaml > crds.yaml
    kubectl apply -f crds.yaml
    
  • Ensure RBAC rules (ClusterRoleBindings, ClusterRoles) are migrated properly
  • List cluster-wide resources with:
    kubectl api-resources --verbs=list --namespaced=false
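
Custom resources can only be created once their CRDs are registered, so it can help to wait for that explicitly between applying CRDs and applying workloads. The following sketch wraps the export and apply commands above and adds that wait; the function name and context names are assumptions.

```shell
#!/bin/sh
# Sketch only: copy CRDs from the source cluster to the target cluster and
# block until the target API server reports them as Established.

# migrate_crds <source-context> <target-context> [file]
migrate_crds() {
  src="$1"; dst="$2"; file="${3:-crds.yaml}"
  kubectl --context "$src" get crd -o yaml > "$file" || return 1
  # Review the file before this step: it contains every CRD in the source
  # cluster, including ones the target distribution may install itself.
  kubectl --context "$dst" apply -f "$file" || return 1
  kubectl --context "$dst" wait --for=condition=Established crd --all --timeout=60s
}

# Example (not run automatically):
#   migrate_crds rke1-prod rke2-prod ./crds.yaml
```

Very large CRDs can exceed the annotation size limit that client-side apply relies on; if you hit that error, adding --server-side to the apply command is a common workaround.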
    

3. Secrets Not Stored Externally

Issue: Applications crash because Secrets were lost during migration.

Fix:

  • Externalize secrets using Vault, AWS Secrets Manager, or the External Secrets Operator
  • Backup secrets before migration:
    kubectl get secrets -A -o yaml > secrets-backup.yaml
    
  • Restore secrets manually or via GitOps after migration
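
A plain dump of every Secret also captures service account tokens that are only meaningful in the old cluster, and the resulting file holds base64-encoded (not encrypted) data. The sketch below is one possible refinement of the backup command above: it filters those tokens out and writes the file with owner-only permissions. The function name and context name are hypothetical.

```shell
#!/bin/sh
# Sketch only: back up Secrets from all namespaces, skipping service account
# tokens, into a file readable only by the current user.

# backup_secrets <context> <output-file>
backup_secrets() {
  ctx="$1"; out="$2"
  (
    # umask applies only inside this subshell and only to newly created files.
    umask 077
    kubectl --context "$ctx" get secrets -A \
      --field-selector 'type!=kubernetes.io/service-account-token' \
      -o yaml > "$out"
  )
}

# Example (not run automatically):
#   backup_secrets rke1-prod ./secrets-backup.yaml
```

Treat the output file like any other credential store: keep it off shared disks and delete it once the secrets have been restored or externalized.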

4. CNI Changes Impact Network Policies

Issue: A different CNI (Calico, Cilium, etc.) can change network policies, causing communication failures.

Fix:

  • Check existing network policies before migration:
    kubectl get networkpolicy -A
    
  • Verify pod-to-pod and pod-to-service communication is allowed
  • Update network policies to match the new CNI’s behavior before migration
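
To verify pod-to-service communication in the new cluster, one option is to launch a short-lived pod in the application's namespace and request the service from inside it. This is a sketch that assumes the busybox image can be pulled in your environment; the function name, context name, and URL are hypothetical.

```shell
#!/bin/sh
# Sketch only: run a throwaway pod in the given namespace and try to reach a
# URL from inside the cluster. The pod is removed when the command finishes.

# check_service_reachable <context> <namespace> <url>
check_service_reachable() {
  ctx="$1"; ns="$2"; url="$3"
  # $$ (the shell's PID) keeps the pod name unique across concurrent runs.
  kubectl --context "$ctx" run "nettest-$$" -n "$ns" \
    --image=busybox:1.36 --restart=Never --rm -i --quiet \
    --command -- wget -q -T 5 -O /dev/null "$url"
}

# Example (not run automatically):
#   check_service_reachable rke2-prod my-app http://my-app-api.my-app.svc:8080/healthz \
#     && echo reachable || echo blocked
```

Because the test pod carries no application labels, it is subject to any default-deny policy in the namespace but not to policies that select your workloads by label. Adding matching labels via the --labels flag of kubectl run would make the check closer to what real pods experience.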

Best Practices for a Smooth Migration

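  • Pre-flight validation: Run kubectl get all -A on both clusters and compare the results. Note that "all" covers only common workload types and omits ConfigMaps, Secrets, Ingresses, PVCs, and custom resources, so check those separately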
  • Test migration in staging: Never migrate production workloads without a test run
  • Use GitOps for consistency: Store and redeploy cluster-wide resources via ArgoCD or Flux
  • Document dependencies: Ensure all external services, cluster-scoped resources, and security policies are accounted for
  • Inventory Resources: Run kubectl api-resources on both clusters to identify potential CRD compatibility issues
  • Resource Planning: Ensure RKE2 nodes have sufficient capacity for all workloads
  • Version Compatibility: Verify compatibility of operators and controllers between clusters
  • Network Testing: Validate network connectivity between clusters before migration
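
The resource inventory step can be partly automated by comparing the API resource lists of the two clusters. The sketch below prints resource types that the source cluster serves and the target does not, which usually points at a CRD or operator that still needs installing. The function name and context names are assumptions.

```shell
#!/bin/sh
# Sketch only: list API resource types present in the source cluster but
# absent from the target cluster.

# diff_api_resources <source-context> <target-context>
diff_api_resources() {
  src="$1"; dst="$2"
  tmp=$(mktemp -d) || return 1
  # comm needs both inputs sorted the same way, so pin the locale.
  kubectl --context "$src" api-resources -o name | LC_ALL=C sort > "$tmp/src"
  kubectl --context "$dst" api-resources -o name | LC_ALL=C sort > "$tmp/dst"
  # -23 suppresses lines unique to the target and lines common to both.
  LC_ALL=C comm -23 "$tmp/src" "$tmp/dst"
  rm -rf "$tmp"
}

# Example (not run automatically):
#   diff_api_resources rke1-prod rke2-prod
```

This compares resource names only. A type that exists in both clusters can still be served at different API versions, so operator and controller version checks remain a separate step.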

Conclusion

Migrating from RKE1 to RKE2 is a critical step to ensure your Kubernetes clusters remain secure, performant, and supported. With RKE1 reaching end-of-life in 2025, organizations need to plan their transition strategy now.

By understanding the different migration strategies and choosing the right migration method for your specific workloads, you can transition seamlessly with minimal downtime. The key is thorough preparation, testing, and addressing common challenges before they impact your production environment.

We’ve created detailed guides for several migration methods to help you through the process:

For further discussion, feel free to connect with me at support.tools or check out my book Rancher Deep Dive for in-depth insights into Kubernetes and Rancher management.


Additional Resources