Kubernetes Master Class: A Seamless Approach to Rancher & Kubernetes Upgrades
Upgrading your Rancher and Kubernetes clusters requires a strategic approach to minimize downtime and ensure seamless operation. This guide covers rules, backup operations, planning steps, and upgrade procedures for Rancher and Kubernetes environments using RKE1 and RKE2.
High-Level Rules for Upgrades
Create an upgrade plan
- Using the rules listed below, create a plan for all of your upgrades and follow it.
- You might need several intermediate upgrades to reach the latest version.
- Use the Rancher Upgrade Tool to help you pick the right version(s) to upgrade to.
Don’t Rush Upgrades:
- Allow at least 24 hours between upgrades to ensure the stability of each component.
- Give yourself plenty of time for backups, testing, monitoring, and rollback. I recommend a change window of at least 4 hours for each upgrade.
Don’t Stack Upgrades:
- Avoid upgrading Rancher, Kubernetes, and Docker/Containerd in one session to reduce risk. Perform them sequentially.
Backups Are Mandatory:
- Take etcd snapshots and use the Rancher Backup Operator to ensure quick recovery in case of failure.
Upgrade Order:
- Follow the sequence: Rancher → Kubernetes → Docker/Containerd → Operating System.
Pause CI/CD Pipelines:
- Halt pipelines using the Rancher API to prevent conflicts during upgrades.
Test in Non-Production Environments:
- Always validate upgrades in a lab environment before deploying to production.
Review Release Notes and Support Matrix:
- Check the Rancher release notes and Kubernetes support matrix to avoid issues with version incompatibilities.
Monitor and Verify:
- Continuously monitor the health of nodes and pods after each upgrade to ensure everything is running smoothly.
Planning Your Upgrade
1. Backup Plan
- Take etcd snapshots and Rancher backups before starting any upgrades.
2. Prepare a Change Control Plan
- Scheduled Windows:
- Rancher upgrade: 30 minutes (+30 minutes for rollback)
- Kubernetes upgrade: 60 minutes (or longer for large clusters)
- Effect and Impact:
- Rancher upgrades: Only management functions are affected; running workloads remain unaffected.
- Kubernetes upgrades: May cause short network blips as ingress controllers restart.
3. Maintenance Window Recommendations
- Rancher Upgrade: No strict window, but pause CI/CD pipelines.
- Kubernetes Local Cluster: Prefer quiet hours to minimize disruptions.
- Downstream Clusters: Use a maintenance window to avoid impact on production workloads.
Rancher Upgrade Procedure
Rancher Backup Operator: The Key to Seamless Upgrades
The Rancher Backup Operator automates backup and restore operations, ensuring you can recover quickly in case of a failed upgrade.
Install the Rancher Backup Operator
Add the Backup Helm Repository:
helm repo add rancher-backup https://charts.rancher.io
helm repo update
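Install the Backup Operator (the CRD chart goes first, then the operator itself):
helm install rancher-backup-crd rancher-backup/rancher-backup-crd \
  --namespace cattle-resources-system --create-namespace
helm install rancher-backup rancher-backup/rancher-backup \
  --namespace cattle-resources-system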
Verify Installation:
kubectl get pods -n cattle-resources-system
Step 1: Backup Rancher with the Backup Operator
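Create a Backup Resource (save it as rancher-backup.yaml). Backup resources are cluster-scoped, so no namespace is set; resourceSetName tells the operator which resources to include:
apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: rancher-backup
spec:
  resourceSetName: rancher-resource-set
  storageLocation:
    s3:
      credentialSecretName: s3-credentials
      credentialSecretNamespace: cattle-resources-system
      bucketName: rancher-backups
      folder: daily-backup
      region: <bucket-region>
      endpoint: s3.amazonaws.com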
Create an S3 Secret for Backup Storage:
kubectl create secret generic s3-credentials \
  --namespace cattle-resources-system \
  --from-literal=accessKey=<your-access-key> \
  --from-literal=secretKey=<your-secret-key>
Apply the Backup Resource:
kubectl apply -f rancher-backup.yaml
Check Backup Status:
kubectl get backups -n cattle-resources-system
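The status check above can be turned into a small polling helper so the upgrade does not start before the backup has actually been written. This is a minimal sketch, not part of the operator: the function name wait_for_backup and its retry parameters are hypothetical, and it assumes the operator records the archive name in the Backup's .status.filename field, which is worth confirming on your operator version.

```shell
# Sketch: poll a Backup resource until the operator reports an archive file name.
# ASSUMPTION: the archive name is published in .status.filename.

wait_for_backup() {
  local name="$1"            # Backup resource name, e.g. rancher-backup
  local attempts="${2:-30}"  # number of polls before giving up
  local delay="${3:-10}"     # seconds to wait between polls
  local filename=""
  local i=1

  while [ "$i" -le "$attempts" ]; do
    # An empty result means the backup has not completed yet (or has failed).
    filename="$(kubectl get backups.resources.cattle.io "$name" \
      -o jsonpath='{.status.filename}' 2>/dev/null || true)"
    if [ -n "$filename" ]; then
      echo "$filename"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done

  echo "backup $name reported no file name after $attempts attempts" >&2
  return 1
}
```

For example, `wait_for_backup rancher-backup 30 10` polls for up to five minutes and prints the archive name once it appears. Recording that name in the change ticket makes the archive easy to find in the bucket later.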
Step 2: Upgrade Rancher
Update Helm Repositories:
helm repo update
helm fetch rancher-stable/rancher
Upgrade Rancher with Helm:
helm upgrade --install rancher rancher-stable/rancher \
  --namespace cattle-system \
  --set hostname=rancher.example.com \
  --version 2.9.2
Verify the Upgrade:
kubectl -n cattle-system rollout status deploy/rancher
kubectl get pods -n cattle-system -o wide
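Before running the Helm upgrade, it can help to record what the current release looks like, since any value not passed again on the command line falls back to the chart default. The sketch below saves the user-supplied values and the revision history; the function name and the output directory are hypothetical, while the release name rancher and the namespace cattle-system are taken from the commands above.

```shell
# Sketch: save the current Rancher release values and history before upgrading.
# The function name and default output directory are illustrative choices.

snapshot_rancher_release() {
  local outdir="${1:-./rancher-pre-upgrade}"
  mkdir -p "$outdir" || return 1

  # User-supplied values only; this file can be passed back with -f on upgrade.
  helm get values rancher --namespace cattle-system -o yaml \
    > "$outdir/values.yaml" || return 1

  # Revision history records which chart version was deployed before the upgrade.
  helm history rancher --namespace cattle-system \
    > "$outdir/history.txt" || return 1

  echo "$outdir"
}
```

Calling `snapshot_rancher_release ./pre-2.9.2` leaves a values.yaml that can be supplied with `-f` to the upgrade command instead of repeating each `--set` flag.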
Kubernetes Upgrade Procedure (RKE1/RKE2)
Step 1: Take an etcd Snapshot
RKE1:
rke etcd snapshot-save --config cluster.yaml --name pre-upgrade-$(date '+%Y%m%d%H%M%S')
RKE2 (run on a server node; snapshots are written to /var/lib/rancher/rke2/server/db/snapshots by default):
rke2 etcd-snapshot save --name pre-upgrade-$(date '+%Y%m%d%H%M%S')
Step 2: Update the Kubernetes Version
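RKE1: edit cluster.yaml and set the target version:
kubernetes_version: "v1.28.0-rancher1-1"
Run rke config --list-version --all to see which versions your RKE binary supports.
RKE2: there is no equivalent setting in config.yaml. The Kubernetes version is determined by the RKE2 release installed on each node, so decide on the target release now and install it in the next step.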
Step 3: Perform the Upgrade
RKE1:
rke up --config cluster.yaml
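RKE2 (run on each node, server nodes first, one node at a time):
curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION=<rke2-version> sh -
systemctl restart rke2-server
Replace <rke2-version> with the target release in the form v1.28.x+rke2rN. Keep any other INSTALL_RKE2_ variables you used at install time (for example INSTALL_RKE2_TYPE=agent on agent nodes), and restart rke2-agent instead of rke2-server on agent nodes.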
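Whichever tool performs the upgrade, one way to confirm that it reached every node is to compare each kubelet's reported version with the target. The following is a rough sketch; the function name nodes_not_at_version is hypothetical, and it assumes that any distribution suffix after a plus sign (such as +rke2r1) should be ignored when comparing.

```shell
# Sketch: list nodes whose kubelet does not yet report the target version.
# Prints "name<TAB>version" for each lagging node; prints nothing when all match.

nodes_not_at_version() {
  local target="$1"   # plain Kubernetes version, e.g. v1.28.0

  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' |
    awk -F '\t' -v target="$target" '
      NF {
        v = $2
        sub(/[+].*/, "", v)            # drop a suffix such as +rke2r1
        if (v != target) print $1 "\t" $2
      }'
}
```

An empty result from `nodes_not_at_version v1.28.0` suggests every node has been upgraded; any lines printed identify nodes that still need attention.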
Verifying and Rolling Back
Verify the Upgrade
Check the health of nodes and pods:
kubectl get nodes -o wide
kubectl get pods --all-namespaces -o wide | grep -v 'Running\|Completed'
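The two checks above can be combined into a single pass/fail helper for use at the end of a change window. This is only a sketch under stated assumptions: the function name verify_cluster_health and the default timeout are hypothetical, and the pod filter relies on the default column order of kubectl get pods --all-namespaces (namespace, name, ready, status). Unlike the grep above, it also flags pods that show Running but have containers that are not ready.

```shell
# Sketch: fail if nodes are not Ready or pods look unhealthy after an upgrade.

verify_cluster_health() {
  local timeout="${1:-300s}"
  local unhealthy

  # Block until every node reports Ready, or give up after the timeout.
  kubectl wait --for=condition=Ready nodes --all --timeout="$timeout" || return 1

  # Columns: NAMESPACE NAME READY STATUS ...
  # Skip Completed pods, report any other non-Running status,
  # and report Running pods whose ready count is below the total.
  unhealthy="$(kubectl get pods --all-namespaces --no-headers 2>/dev/null | awk '
    $4 == "Completed" { next }
    $4 != "Running"   { print; next }
    { split($3, ready, "/"); if (ready[1] != ready[2]) print }
  ')"

  if [ -n "$unhealthy" ]; then
    echo "$unhealthy"
    return 1
  fi
  echo "all nodes Ready and all pods healthy"
}
```

A non-zero exit status from `verify_cluster_health 600s` is a signal to stop and investigate before moving on to the next upgrade in the plan.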
Roll Back with Rancher Backup Operator
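Create a Restore Resource (save it as rancher-restore.yaml). Set backupFilename to the archive name reported in the Backup's status:
apiVersion: resources.cattle.io/v1
kind: Restore
metadata:
  name: rancher-restore
spec:
  backupFilename: <backup-file-name>.tar.gz
  storageLocation:
    s3:
      credentialSecretName: s3-credentials
      credentialSecretNamespace: cattle-resources-system
      bucketName: rancher-backups
      folder: daily-backup
      region: <bucket-region>
      endpoint: s3.amazonaws.com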
Apply the Restore Resource:
kubectl apply -f rancher-restore.yaml
Monitor the Restore:
kubectl get restores -n cattle-resources-system
Conclusion
Upgrading Rancher and Kubernetes clusters requires careful planning, regular backups, and thorough testing. Using the Rancher Backup Operator ensures fast recovery from failures. By following the outlined rules, backup strategies, and upgrade procedures, you can minimize disruptions and keep your clusters secure and stable.