Kubernetes Master Class: A Seamless Approach to Rancher & Kubernetes Upgrades // Support Tools

Upgrading your Rancher and Kubernetes clusters requires a strategic approach to minimize downtime and ensure seamless operation. This guide covers rules, backup operations, planning steps, and upgrade procedures for Rancher and Kubernetes environments using RKE1 and RKE2.

High-Level Rules for Upgrades

Create an Upgrade Plan:
- Using the rules listed below, create a plan for all of your upgrades and follow it.
- You might need several upgrades to reach the latest version.
- Use the Rancher Upgrade Tool to help you pick the right version(s) to upgrade to.
Don’t Rush Upgrades:
- Allow at least 24 hours between upgrades to ensure the stability of each component.
- Give yourself plenty of time for backups, testing, monitoring, and rollback. I recommend a change window of at least 4 hours for each upgrade.
Don’t Stack Upgrades:
- Avoid upgrading Rancher, Kubernetes, and Docker/Containerd in one session to reduce risk. Perform them sequentially.
Backups Are Mandatory:
- Take ETCD snapshots and use the Rancher Backup Operator to ensure quick recovery in case of failure.
Upgrade Order:
- Follow the sequence: Rancher → Kubernetes → Docker/Containerd → Operating System.
Pause CI/CD Pipelines:
- Halt pipelines using the Rancher API to prevent conflicts during upgrades.
Test in Non-Production Environments:
- Always validate upgrades in a lab environment before deploying to production.
Review Release Notes and Support Matrix:
- Check the Rancher release notes and the Rancher support matrix, which lists the Kubernetes versions each Rancher release supports, to avoid version incompatibilities.
Monitor and Verify:
- Continuously monitor the health of nodes and pods after each upgrade to ensure everything is running smoothly.
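The rule about needing several upgrades applies most to Kubernetes itself, because its version skew policy does not allow skipping minor versions on the control plane. As a minimal sketch, the `k8s_upgrade_path` helper below (hypothetical, not part of Rancher or RKE) lists the minor-version hops between two versions. It ignores patch levels, does not handle major-version changes, and does not replace the Rancher Upgrade Tool or the support matrix when you choose exact versions.

```shell
# Hypothetical helper (not part of Rancher or Kubernetes tooling).
# Prints every Kubernetes minor version to pass through, one minor per hop,
# when going from CURRENT to TARGET. Accepts forms like 1.28, v1.28.3 or
# v1.28.0-rancher1-1; only the major and minor numbers are used.
k8s_upgrade_path() {
  cur=${1#v}; tgt=${2#v}
  cur_major=${cur%%.*}; cur_rest=${cur#*.}; cur_minor=${cur_rest%%.*}
  tgt_major=${tgt%%.*}; tgt_rest=${tgt#*.}; tgt_minor=${tgt_rest%%.*}
  if [ "$cur_major" != "$tgt_major" ]; then
    echo "k8s_upgrade_path: major version changes are not handled" >&2
    return 1
  fi
  hops=""
  m=$((cur_minor + 1))
  while [ "$m" -le "$tgt_minor" ]; do
    hops="${hops:+$hops }$cur_major.$m"   # append, space-separated
    m=$((m + 1))
  done
  echo "$hops"
}
```

For example, `k8s_upgrade_path 1.25 1.28` prints `1.26 1.27 1.28`: three separate Kubernetes upgrades, each of which deserves its own change window.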

Planning Your Upgrade

1. Backup Plan

Take ETCD snapshots and Rancher backups before starting any upgrades.

2. Prepare a Change Control Plan

Scheduled Windows:
- Rancher upgrade: 30 minutes (+30 minutes for rollback)
- Kubernetes upgrade: 60 minutes (or longer for large clusters)
Effect and Impact:
- Rancher upgrades: Only management functions are affected; running workloads remain unaffected.
- Kubernetes upgrades: May cause short network blips as ingress controllers restart.

3. Maintenance Window Recommendations

Rancher Upgrade: No strict window, but pause CI/CD pipelines.
Kubernetes Local Cluster: Prefer quiet hours to minimize disruptions.
Downstream Clusters: Use a maintenance window to avoid impact on production workloads.

Rancher Upgrade Procedure

Rancher Backup Operator: The Key to Seamless Upgrades

The Rancher Backup Operator automates backup and restore operations, ensuring you can recover quickly in case of a failed upgrade.

Install the Rancher Backup Operator

Add the Backup Helm Repository:

helm repo add rancher-backup https://charts.rancher.io
helm repo update

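Install the Backup Operator (the CRD chart must be installed before the operator chart):

helm install rancher-backup-crd rancher-backup/rancher-backup-crd \
--namespace cattle-resources-system --create-namespace
helm install rancher-backup rancher-backup/rancher-backup \
--namespace cattle-resources-system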

Verify Installation:

kubectl get pods -n cattle-resources-system

Step 1: Backup Rancher with the Backup Operator

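Create a Backup Resource and save it as rancher-backup.yaml. Backup objects are cluster-scoped, so no namespace is set. resourceSetName selects which resources are backed up; rancher-resource-set is the default ResourceSet shipped with the operator chart (confirm the names in your version with kubectl get resourcesets.resources.cattle.io):

apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: rancher-backup
spec:
  resourceSetName: rancher-resource-set
  storageLocation:
    s3:
      bucketName: rancher-backups
      folder: daily-backup
      endpoint: s3.amazonaws.com
      credentialSecretName: s3-credentials
      credentialSecretNamespace: cattle-resources-system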
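If you take a fresh backup before every upgrade, giving each Backup object its own name keeps earlier backups intact instead of re-applying the same object. A minimal sketch, assuming a POSIX shell: `make_backup_manifest` is a hypothetical helper, the bucket, folder, endpoint and secret names are the placeholders from the example above, and `rancher-resource-set` is assumed to be the ResourceSet your chart version installs.

```shell
# Hypothetical helper: print a one-time Backup manifest with a timestamped
# name (pre-upgrade-YYYYMMDDHHMMSS). Arguments: BUCKET and FOLDER, which
# default to the placeholder values used in this guide.
make_backup_manifest() {
  name="pre-upgrade-$(date '+%Y%m%d%H%M%S')"
  cat <<EOF
apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: ${name}
spec:
  resourceSetName: rancher-resource-set
  storageLocation:
    s3:
      bucketName: ${1:-rancher-backups}
      folder: ${2:-daily-backup}
      endpoint: s3.amazonaws.com
      credentialSecretName: s3-credentials
      credentialSecretNamespace: cattle-resources-system
EOF
}
```

Run `make_backup_manifest rancher-backups daily-backup > rancher-backup.yaml`, then apply the file with kubectl apply. Use the generated name wherever rancher-backup appears in the status and restore commands.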

Create an S3 Secret for Backup Storage:

kubectl create secret generic s3-credentials \
--namespace cattle-resources-system \
--from-literal=accessKey=<your-access-key> \
--from-literal=secretKey=<your-secret-key>

Apply the Backup Resource:
```
kubectl apply -f rancher-backup.yaml
```

Check Backup Status:

kubectl get backups.resources.cattle.io rancher-backup
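A backup can take a few minutes, so it helps to wait for it to finish before starting the upgrade. The following is a minimal polling sketch. It assumes the operator marks a finished Backup (or Restore) with a `Ready` condition set to `True`; `wait_for_ready` and its 600-second default timeout are illustrative choices, not operator features.

```shell
# Hypothetical helper: poll a rancher-backup custom resource until its Ready
# condition is True, or give up after LIMIT seconds (default 600).
# RESOURCE is the plural name: backups or restores.
wait_for_ready() {
  resource=$1; name=$2; limit=${3:-600}
  waited=0
  while [ "$waited" -lt "$limit" ]; do
    status=$(kubectl get "${resource}.resources.cattle.io" "$name" \
      -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null)
    if [ "$status" = "True" ]; then
      echo "${resource}/${name} is ready"
      return 0
    fi
    sleep 10
    waited=$((waited + 10))
  done
  echo "timed out waiting for ${resource}/${name}" >&2
  return 1
}
```

`wait_for_ready backups rancher-backup` returns non-zero on timeout, which lets a change script stop before the upgrade begins. The same helper works with `restores` during a rollback.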

Step 2: Upgrade Rancher

Update Helm Repositories (optionally pull the target chart so you can review it):

helm repo update
helm pull rancher-stable/rancher --version 2.9.2

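Upgrade Rancher with Helm, reusing the values from the current release so that settings such as the hostname, TLS source, or replica count are not reset to chart defaults:

helm get values rancher -n cattle-system -o yaml > values.yaml
helm upgrade rancher rancher-stable/rancher \
--namespace cattle-system \
-f values.yaml \
--version 2.9.2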
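Helm does not stop you from pointing `--version` at an older chart, but moving Rancher backwards that way is not a rollback; the rollback path in this guide is restoring a backup. A minimal guard sketch, with hypothetical `version_lt` and `check_rancher_target` helpers that assume plain X.Y.Z versions (a leading `v` is stripped):

```shell
# Hypothetical helpers (not part of Helm or Rancher).
# version_lt A B: succeed if dotted version A is lower than B.
# Missing parts count as 0; pre-release suffixes are not handled.
version_lt() {
  awk -v a="${1#v}" -v b="${2#v}" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      if (x[i] + 0 < y[i] + 0) exit 0
      if (x[i] + 0 > y[i] + 0) exit 1
    }
    exit 1   # equal versions are not lower
  }'
}

# check_rancher_target CURRENT TARGET: refuse a Helm upgrade that would move
# Rancher to an older version.
check_rancher_target() {
  if version_lt "$2" "$1"; then
    echo "refusing: $2 is older than installed $1; roll back by restoring a backup" >&2
    return 1
  fi
  echo "ok: $1 -> $2"
}
```

Read the installed version from the APP VERSION column of `helm list -n cattle-system`, then run, for example, `check_rancher_target v2.8.5 2.9.2` before `helm upgrade`.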

Verify the Upgrade:

kubectl -n cattle-system rollout status deploy/rancher
kubectl get pods -n cattle-system -o wide

Kubernetes Upgrade Procedure (RKE1/RKE2)

Step 1: Take an ETCD Snapshot

RKE1:

rke etcd snapshot-save --config cluster.yaml --name pre-upgrade-$(date '+%Y%m%d%H%M%S')

RKE2 (run as root on a server node; snapshots are written to /var/lib/rancher/rke2/server/db/snapshots by default):

rke2 etcd-snapshot save --name pre-upgrade-$(date '+%Y%m%d%H%M%S')
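If you manage both distributions, a small wrapper keeps snapshot names consistent. A sketch under these assumptions: `take_etcd_snapshot` and `snapshot_name` are hypothetical helpers, RKE1 snapshots are taken from the directory that holds the cluster file, and RKE2 snapshots are taken as root on a server node with `rke2 etcd-snapshot save`.

```shell
# Hypothetical helpers: build a timestamped snapshot name and call the
# snapshot command that matches the distribution.
snapshot_name() {
  echo "pre-upgrade-$(date '+%Y%m%d%H%M%S')"
}

take_etcd_snapshot() {
  case "$1" in
    rke1) rke etcd snapshot-save --config "${2:-cluster.yaml}" --name "$(snapshot_name)" ;;
    rke2) rke2 etcd-snapshot save --name "$(snapshot_name)" ;;
    *) echo "usage: take_etcd_snapshot rke1 [cluster.yaml] | rke2" >&2; return 2 ;;
  esac
}
```

For example, `take_etcd_snapshot rke1 cluster.yaml` on the RKE1 admin host, or `take_etcd_snapshot rke2` on an RKE2 server node.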

Step 2: Update the Kubernetes Version

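RKE1: edit cluster.yaml and set kubernetes_version to a version your RKE binary supports (rke config --list-version --all lists them):

kubernetes_version: "v1.28.0-rancher1-1"

RKE2: config.yaml has no version key. The Kubernetes version is determined by the RKE2 release installed on each node, so the target version is chosen when you perform the upgrade.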
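For RKE1, the edit can be scripted so that a copy of the previous file is kept for rollback. A minimal sketch; `set_rke1_version` is hypothetical and assumes `kubernetes_version` already exists as a top-level line in the file.

```shell
# Hypothetical helper: set kubernetes_version in an RKE1 cluster file,
# keeping the previous contents in FILE.bak. Avoids sed -i, whose flags
# differ between GNU and BSD sed.
set_rke1_version() {
  file=$1; version=$2
  grep -q '^kubernetes_version:' "$file" || {
    echo "no top-level kubernetes_version key in $file" >&2
    return 1
  }
  cp "$file" "$file.bak"
  sed "s|^kubernetes_version:.*|kubernetes_version: \"$version\"|" "$file.bak" > "$file"
}
```

`set_rke1_version cluster.yaml v1.28.0-rancher1-1` rewrites the key and leaves the old file as cluster.yaml.bak.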

Step 3: Perform the Upgrade

RKE1:
```
rke up --config cluster.yaml
```
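RKE2 (standalone install; upgrade server nodes one at a time, then agent nodes):
```
# Run on each node; INSTALL_RKE2_VERSION is a placeholder for the target release
curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION="v1.28.x+rke2rN" sh -
systemctl restart rke2-server   # rke2-agent on agent (worker) nodes
```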
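Once the rollout finishes, every node should be Ready and report the new kubelet version. A minimal filter sketch; `nodes_behind` is hypothetical and relies on the default column order of `kubectl get nodes --no-headers` (NAME, STATUS, ROLES, AGE, VERSION).

```shell
# Hypothetical filter: read `kubectl get nodes --no-headers` on stdin and
# print nodes that are not Ready or whose VERSION does not start with the
# expected prefix (for example v1.28.).
nodes_behind() {
  awk -v want="$1" '$2 != "Ready" || index($5, want) != 1 { print $1, $2, $5 }'
}
```

`kubectl get nodes --no-headers | nodes_behind v1.28.` prints nothing when all nodes are done. Cordoned nodes (Ready,SchedulingDisabled) are listed as well, which is a useful reminder to uncordon them.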

Verifying and Rolling Back

Verify the Upgrade

Check the health of nodes and pods:

kubectl get nodes -o wide
kubectl get pods --all-namespaces -o wide | grep -v 'Running\|Completed'
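The grep filter above hides pods that are Running but not fully ready (for example 0/1). A slightly stricter sketch, the hypothetical `unhealthy_pods` filter, parses the READY and STATUS columns of `kubectl get pods --all-namespaces --no-headers` instead.

```shell
# Hypothetical filter: read `kubectl get pods --all-namespaces --no-headers`
# on stdin (NAMESPACE NAME READY STATUS ...) and print pods that are neither
# Completed nor Running with all containers ready.
unhealthy_pods() {
  awk '{
    split($3, r, "/")                         # READY column, e.g. "1/2"
    if ($4 == "Completed") next
    if ($4 == "Running" && r[1] == r[2]) next
    print $1 "/" $2 " " $3 " " $4
  }'
}
```

Usage: `kubectl get pods --all-namespaces --no-headers | unhealthy_pods`.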

Roll Back with Rancher Backup Operator

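Create a Restore Resource and save it as rancher-restore.yaml. Restore objects are cluster-scoped, and backupFilename must be the tarball name the backup produced, which you can read with kubectl get backups.resources.cattle.io rancher-backup -o jsonpath='{.status.filename}':

apiVersion: resources.cattle.io/v1
kind: Restore
metadata:
  name: rancher-restore
spec:
  backupFilename: <backup-filename>.tar.gz
  storageLocation:
    s3:
      bucketName: rancher-backups
      folder: daily-backup
      endpoint: s3.amazonaws.com
      credentialSecretName: s3-credentials
      credentialSecretNamespace: cattle-resources-system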
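Because a Restore needs the exact tarball name (backupFilename) rather than the Backup object's name, the lookup and the manifest can be combined in one step. A sketch assuming the operator records that name in the Backup's `status.filename` field and reusing the placeholder S3 settings; `make_restore_manifest` is hypothetical.

```shell
# Hypothetical helper: read status.filename from a Backup and print a
# Restore manifest that points at that tarball.
make_restore_manifest() {
  backup=${1:-rancher-backup}
  file=$(kubectl get backups.resources.cattle.io "$backup" -o jsonpath='{.status.filename}')
  if [ -z "$file" ]; then
    echo "backup $backup has no status.filename yet" >&2
    return 1
  fi
  cat <<EOF
apiVersion: resources.cattle.io/v1
kind: Restore
metadata:
  name: restore-${backup}
spec:
  backupFilename: ${file}
  storageLocation:
    s3:
      bucketName: rancher-backups
      folder: daily-backup
      endpoint: s3.amazonaws.com
      credentialSecretName: s3-credentials
      credentialSecretNamespace: cattle-resources-system
EOF
}
```

Usage: `make_restore_manifest rancher-backup > rancher-restore.yaml`.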

Apply the Restore Resource:
```
kubectl apply -f rancher-restore.yaml
```

Monitor the Restore:

kubectl get restores.resources.cattle.io rancher-restore

Conclusion

Upgrading Rancher and Kubernetes clusters requires careful planning, regular backups, and thorough testing. The Rancher Backup Operator gives you a fast recovery path if an upgrade fails. By following the outlined rules, backup strategies, and upgrade procedures, you can minimize disruptions and keep your clusters secure and stable.