Steve is a critical component in Rancher’s architecture, acting as a caching proxy between Rancher and downstream Kubernetes clusters. By introducing an efficient caching layer, Steve optimizes how Rancher’s UI interacts with large-scale Kubernetes environments.

Steve GitHub Repository

Rancher Components Diagram

Figure: Rancher Components Overview

What is Rancher’s Steve?

Introduction to Steve

Steve is Rancher’s Kubernetes API aggregation server. It provides a caching layer that stores and serves Kubernetes resources, reducing the load on downstream clusters and improving Rancher’s responsiveness. This is particularly beneficial in large environments with multiple clusters or high-frequency API calls.

How Steve Powers Rancher Agents and Server Pods

Steve’s Role in the Cattle-Cluster-Agent

The cattle-cluster-agent plays a vital role in syncing data between Rancher and downstream clusters. When the cattle-cluster-agent starts, Steve immediately begins to populate its cache by pulling all relevant Kubernetes resources from the downstream cluster.

1. Initial Cache Population

Upon startup, the cattle-cluster-agent connects to the downstream Kubernetes cluster and retrieves a comprehensive set of resources, including but not limited to:

  • Nodes
  • Pods
  • Deployments
  • Services
  • ConfigMaps
  • Secrets
  • Custom Resources (CRDs)

This initial fetch ensures that the agent has a complete view of the downstream cluster. Steve stores these resources in its local cache, which is then synchronized with the Rancher leader pod.

The Rancher leader pod uses the same caching mechanism to keep its state in sync with the cattle-cluster-agent. This allows Rancher to serve API requests efficiently without making frequent direct calls to the downstream Kubernetes API.

2. Continuous Synchronization with Watch Handlers

Once the initial cache is populated, Steve sets up watch handlers for each resource type. These handlers maintain a persistent connection to the downstream Kubernetes API, listening for real-time updates. Watch handlers ensure that any changes in the downstream cluster are immediately reflected in Steve’s cache.

Key events monitored by the watch handlers include:

  • Resource Creation: New pods, services, or other resources are added to the cache.
  • Resource Updates: Changes in existing resources, such as updated configurations or status, are applied to the cache.
  • Resource Deletion: Removed resources are purged from the cache.

This ensures Rancher’s state remains accurate and up-to-date without excessive API calls.

3. Leader Election and Cache Distribution

Steve operates in conjunction with Rancher’s leader election process. In a multi-pod Rancher deployment, only the leader pod synchronizes directly with the cattle-cluster-agent. Once the leader pod’s cache is updated, other Rancher pods query the leader’s cache for resource data, ensuring consistency across the deployment.

This design improves system resilience and scalability by distributing the load and avoiding redundant API requests.

Why Use Steve?

Steve provides several advantages:

Reduced API Load

By caching resources locally, Steve significantly reduces the number of API calls sent to the downstream Kubernetes API server. This helps preserve the control plane’s performance, especially in large or heavily utilized clusters.

Low Latency

Since resources are retrieved from the cache rather than the Kubernetes API, Rancher can respond to API requests quickly, improving the user experience in the Rancher UI and API clients.

Real-Time Updates

The watch handlers ensure that Rancher’s cache is always in sync with the actual state of the downstream cluster, providing an accurate and real-time view of resources.

Scalability

Steve allows Rancher to efficiently manage hundreds or even thousands of clusters. Each cluster’s data is cached and synchronized independently, allowing Rancher to scale horizontally without performance degradation.

API Features of Steve

Steve exposes a /v1 API, which makes it easy to query Kubernetes resources. This API supports methods like GET, POST, PATCH, PUT, and DELETE, depending on the underlying Kubernetes resource and the user’s permissions.

Examples of API endpoints include:

  • /v1/pods: List all pods in the cluster.
  • /v1/deployments: Retrieve all deployments.
  • /v1/services/default/nginx-service: Fetch details of a specific service in the default namespace.

Steve’s API also includes query parameters for filtering, sorting, and pagination, which are especially useful for managing large datasets.

Conclusion

Steve is an indispensable part of Rancher’s architecture, providing a robust caching and synchronization mechanism between Rancher and downstream Kubernetes clusters. By reducing API load, improving response times, and ensuring real-time synchronization, Steve enables Rancher to manage clusters at scale efficiently.

For a deeper technical dive, check out the Steve GitHub Repository.