Go Microservices: Common Bugs, Pitfalls, and Solutions for Kubernetes Environments
Go has become a dominant language for building microservices, particularly those deployed on Kubernetes. Its lightweight concurrency model, strong standard library, and excellent performance make it an ideal choice for distributed systems. However, with these advantages come unique challenges and potential pitfalls that can lead to subtle bugs, memory leaks, and performance issues.
Go Microservices: Common Bugs, Pitfalls, and Solutions for Kubernetes Environments
Understanding Go-Specific Challenges in Microservices
Go’s approach to concurrency through goroutines and channels, while powerful, introduces complexity that requires careful design and understanding. In microservices architectures, where components interact asynchronously and scale independently, these challenges are amplified.
Common patterns that lead to issues include:
- Goroutine Management: Spawning goroutines without proper lifecycle management
- Channel Design: Misuse of channels leading to deadlocks or resource leaks
- Memory Management: Unexpected memory growth due to subtle reference holding
- Error Handling: Insufficient error propagation across service boundaries
- Performance Bottlenecks: Inefficient algorithms or data structures for high-volume processing
Let’s explore these issues with real-world examples and practical solutions, particularly focusing on Kubernetes environments.
Goroutine Leaks and Management
The Leaking Goroutine Problem
One of the most common issues in Go microservices is goroutine leaks, where new goroutines are created but never terminated.
Consider this example:
func handleConnection(conn net.Conn, db *sql.DB) {
go processRequests(conn, db)
}
func processRequests(conn net.Conn, db *sql.DB) {
defer conn.Close()
for {
// Read request from connection
req, err := readRequest(conn)
if err != nil {
log.Printf("Error reading request: %v", err)
return
}
// Process each request in a new goroutine
go func(request Request) {
// Query database (may block for a long time)
result, err := db.Query("SELECT * FROM data WHERE id = ?", request.ID)
if err != nil {
log.Printf("Database error: %v", err)
return
}
defer result.Close()
// Send response
sendResponse(conn, result)
}(req)
}
}
What’s wrong here?
- Each incoming request spawns a new goroutine with no limit
- If the database is slow, goroutines will pile up
- There’s no context for cancellation if the client disconnects
- The parent goroutine doesn’t track or wait for child goroutines
Solution: Worker Pools and Context Propagation
Here’s a better approach:
func handleConnection(conn net.Conn, db *sql.DB, workerPool *WorkerPool) {
ctx, cancel := context.WithCancel(context.Background())
defer cancel() // Ensure all operations are cancelled when we return
go processRequests(ctx, conn, db, workerPool)
}
func processRequests(ctx context.Context, conn net.Conn, db *sql.DB, workerPool *WorkerPool) {
defer conn.Close()
for {
select {
case <-ctx.Done():
return // Context cancelled, stop processing
default:
// Read request with timeout
conn.SetReadDeadline(time.Now().Add(5 * time.Second))
req, err := readRequest(conn)
if err != nil {
if netErr, ok := err.(net.Error); ok && netErr.Timeout() {
continue // Just a timeout, keep waiting
}
log.Printf("Error reading request: %v", err)
return
}
// Submit task to worker pool instead of spawning unlimited goroutines
err = workerPool.Submit(func() {
requestCtx, requestCancel := context.WithTimeout(ctx, 10*time.Second)
defer requestCancel()
// Use context-aware database methods
result, err := db.QueryContext(requestCtx, "SELECT * FROM data WHERE id = ?", req.ID)
if err != nil {
log.Printf("Database error: %v", err)
return
}
defer result.Close()
// Send response with deadline
conn.SetWriteDeadline(time.Now().Add(5 * time.Second))
sendResponse(conn, result)
})
if err != nil {
// Worker pool full or shutting down
log.Printf("Worker pool error: %v", err)
sendRejection(conn)
}
}
}
}
// Simple worker pool implementation
type WorkerPool struct {
tasks chan func()
wg sync.WaitGroup
}
func NewWorkerPool(maxWorkers int, queueSize int) *WorkerPool {
pool := &WorkerPool{
tasks: make(chan func(), queueSize),
}
// Start fixed number of workers
pool.wg.Add(maxWorkers)
for i := 0; i < maxWorkers; i++ {
go func() {
defer pool.wg.Done()
for task := range pool.tasks {
task()
}
}()
}
return pool
}
func (p *WorkerPool) Submit(task func()) error {
select {
case p.tasks <- task:
return nil
default:
return errors.New("worker pool queue full")
}
}
func (p *WorkerPool) Shutdown() {
close(p.tasks)
p.wg.Wait()
}
Key improvements:
- Worker Pool: Limits the number of concurrent operations
- Context Propagation: Ensures proper cancellation
- Timeouts: Prevents indefinite blocking
- Graceful Shutdown: Properly closes connections and waits for tasks
Kubernetes Considerations
When running in Kubernetes, pod termination becomes a critical consideration:
func main() {
// Initialize resources
db, err := sql.Open("mysql", dsn)
if err != nil {
log.Fatalf("Failed to connect to database: %v", err)
}
defer db.Close()
workerPool := NewWorkerPool(100, 1000)
defer workerPool.Shutdown()
server := &http.Server{
Addr: ":8080",
Handler: setupRoutes(db, workerPool),
}
// Start server
go func() {
if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
log.Fatalf("Server error: %v", err)
}
}()
// Handle graceful shutdown
signals := make(chan os.Signal, 1)
signal.Notify(signals, syscall.SIGTERM, syscall.SIGINT)
<-signals
log.Println("Shutdown signal received, exiting...")
// Create context with timeout for graceful shutdown
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
// Gracefully shut down the server
if err := server.Shutdown(ctx); err != nil {
log.Printf("Server shutdown error: %v", err)
}
// Wait for worker pool to complete outstanding tasks
workerPool.Shutdown()
log.Println("Server exited properly")
}
This pattern ensures that when Kubernetes sends a SIGTERM signal during pod termination, your service completes in-flight requests before shutting down.
Channel Misuse and Deadlocks
The Deadlock Problem
Channels are a powerful Go feature, but they’re also a common source of bugs. Consider this microservice that processes events:
func processEvents(events <-chan Event, results chan<- Result) {
for event := range events {
// Process event
result := processEvent(event)
// Send result - potential deadlock here!
results <- result
}
}
func main() {
events := make(chan Event)
results := make(chan Result)
// Start processor
go processEvents(events, results)
// Send events
for _, event := range getEvents() {
events <- event
}
close(events)
// Collect results - may never complete if processEvent panics
var allResults []Result
for result := range results {
allResults = append(allResults, result)
}
}
What’s wrong here?
- If
processEventpanics, the goroutine exits and no one closes theresultschannel - The main goroutine will deadlock waiting for results
- There’s no timeout or cancellation mechanism
- Unbuffered channels mean the producer blocks until the consumer reads
Solution: Buffered Channels, WaitGroups, and Error Handling
A more robust approach:
func processEvents(ctx context.Context, events <-chan Event) <-chan Result {
results := make(chan Result, 100) // Buffered channel
var wg sync.WaitGroup
// Launch workers
for i := 0; i < 10; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for {
select {
case event, ok := <-events:
if !ok {
return // Channel closed
}
// Process with recover to prevent goroutine exit on panic
func() {
defer func() {
if r := recover(); r != nil {
log.Printf("Recovered from panic: %v", r)
// Send error result instead of crashing
select {
case results <- Result{Error: fmt.Errorf("processing panic: %v", r)}:
case <-ctx.Done():
// Context cancelled, just exit
}
}
}()
// Process and send result
result := processEvent(event)
select {
case results <- result:
// Result sent successfully
case <-ctx.Done():
// Context cancelled, drop the result
}
}()
case <-ctx.Done():
return // Context cancelled
}
}
}()
}
// Close results channel when all processing completes
go func() {
wg.Wait()
close(results)
}()
return results
}
func main() {
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()
events := make(chan Event, 100) // Buffered channel
// Start sending events in background
go func() {
defer close(events) // Ensure channel gets closed
for _, event := range getEvents() {
select {
case events <- event:
// Event sent
case <-ctx.Done():
// Context cancelled, stop sending
return
}
}
}()
// Process events and collect results
results := processEvents(ctx, events)
var allResults []Result
for result := range results {
if result.Error != nil {
log.Printf("Error processing event: %v", result.Error)
continue
}
allResults = append(allResults, result)
}
}
Key improvements:
- Buffered Channels: Reduces blocking and improves throughput
- WaitGroup: Ensures proper synchronization between goroutines
- Panic Recovery: Prevents goroutine crashes from blocking the entire system
- Context Usage: Provides timeout and cancellation mechanisms
- Select Statements: Prevents deadlocks when sending on channels
Real-World Example: Signal Distribution Problem
A common pattern in Go microservices is to use channels to distribute signals to multiple consumers. However, this can lead to subtle bugs:
func broadcastSignal(signal chan struct{}, handlers []Handler) {
<-signal // Wait for signal
// Broadcast to all handlers - PROBLEM!
for _, handler := range handlers {
handler.Channel <- struct{}{} // Will block if any handler isn't ready
}
}
In this pattern, if any handler isn’t ready to receive, the entire broadcast blocks. A better approach:
func broadcastSignal(ctx context.Context, signal chan struct{}, handlers []Handler) {
select {
case <-signal:
// Signal received, broadcast to all handlers
case <-ctx.Done():
return // Context cancelled
}
// Use separate goroutine for each handler with timeout
var wg sync.WaitGroup
for _, handler := range handlers {
wg.Add(1)
go func(h Handler) {
defer wg.Done()
// Try to send with timeout
sendCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
defer cancel()
select {
case h.Channel <- struct{}{}:
// Successfully sent
case <-sendCtx.Done():
log.Printf("Failed to send signal to handler: timeout")
}
}(handler)
}
// Optionally wait for all broadcasts to complete
wg.Wait()
}
Memory Management Pitfalls
The Slice Reference Problem
Go’s slices hold references to underlying arrays. This can lead to unexpected memory retention:
func processLargeData(data []byte) []byte {
// Extract just a small portion of the data
return data[:20] // PROBLEM: This still references the entire original array
}
func main() {
// Read a large file into memory
largeData, _ := ioutil.ReadFile("largefile.dat") // 100MB
// Process many files, storing small results
var results [][]byte
for i := 0; i < 1000; i++ {
// This will eventually consume ~100GB of memory despite only needing ~20KB
results = append(results, processLargeData(largeData))
}
}
Solution: Copy Data When Slicing
func processLargeData(data []byte) []byte {
// Create a copy of just the portion we need
result := make([]byte, 20)
copy(result, data[:20])
return result
}
JSON Unmarshaling Memory Consumption
Another common issue occurs with JSON parsing:
func getUsers(w http.ResponseWriter, r *http.Request) {
// Read entire request body - potentially very large
body, err := ioutil.ReadAll(r.Body)
if err != nil {
http.Error(w, "Failed to read request", http.StatusBadRequest)
return
}
var request UserRequest
if err := json.Unmarshal(body, &request); err != nil {
http.Error(w, "Invalid request format", http.StatusBadRequest)
return
}
// Process request...
}
The problem is that we read the entire request body, which could be very large, before parsing it.
Solution: Stream Processing with Decoders
func getUsers(w http.ResponseWriter, r *http.Request) {
// Limit request size and parse directly from the stream
r.Body = http.MaxBytesReader(w, r.Body, 1<<20) // 1MB limit
var request UserRequest
dec := json.NewDecoder(r.Body)
if err := dec.Decode(&request); err != nil {
http.Error(w, "Invalid request format", http.StatusBadRequest)
return
}
// Process request...
}
Container Memory Management
In Kubernetes, container memory limits make proper memory management even more important. Consider setting appropriate limits in your deployment manifests:
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
spec:
replicas: 3
template:
spec:
containers:
- name: myapp
image: myapp:v1
resources:
requests:
memory: "64Mi"
cpu: "100m"
limits:
memory: "128Mi"
cpu: "200m"
Then ensure your application respects these limits:
func main() {
// Configure sensible defaults based on container memory
var maxWorkers int
var cacheSize int
// Detect memory limits (Kubernetes sets this in container)
if memLimit := os.Getenv("GOMEMLIMIT"); memLimit != "" {
// Parse value like "128MiB"
if bytes, err := parseMemoryString(memLimit); err == nil {
// Scale workers based on available memory
maxWorkers = int(bytes / (10 * 1024 * 1024)) // 10MB per worker
if maxWorkers < 2 {
maxWorkers = 2 // Minimum 2 workers
}
// Set cache size to 40% of memory limit
cacheSize = int(float64(bytes) * 0.4)
}
} else {
// Default values if not running in container
maxWorkers = 10
cacheSize = 100 * 1024 * 1024 // 100MB
}
// Initialize with dynamic settings
workerPool := NewWorkerPool(maxWorkers)
cache := NewCache(cacheSize)
// Continue with application initialization...
}
Performance Bottlenecks
The String Concatenation Problem
Inefficient string handling can significantly impact performance:
func buildReport(items []Item) string {
var report string
// Inefficient string concatenation
for _, item := range items {
report += fmt.Sprintf("ID: %s, Name: %s, Value: %f\n",
item.ID, item.Name, item.Value)
}
return report
}
Each concatenation allocates a new string, copying all previous content.
Solution: Use StringBuilder
func buildReport(items []Item) string {
var report strings.Builder
// Efficient string building
for _, item := range items {
fmt.Fprintf(&report, "ID: %s, Name: %s, Value: %f\n",
item.ID, item.Name, item.Value)
}
return report.String()
}
JSON Marshaling Performance
Default JSON marshaling is convenient but can be slow for high-throughput services:
func handleRequest(w http.ResponseWriter, r *http.Request) {
result := processRequest(r)
// Marshal and send response
responseJSON, err := json.Marshal(result)
if err != nil {
http.Error(w, "Error generating response", http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "application/json")
w.Write(responseJSON)
}
Solution: Use Custom Marshaling or Alternative Libraries
import "github.com/json-iterator/go"
var json = jsoniter.ConfigCompatibleWithStandardLibrary
func handleRequest(w http.ResponseWriter, r *http.Request) {
result := processRequest(r)
// Use faster JSON library
responseJSON, err := json.Marshal(result)
if err != nil {
http.Error(w, "Error generating response", http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "application/json")
w.Write(responseJSON)
}
Alternatively, for extremely performance-critical paths, consider custom marshaling:
type FastResponse struct {
Status string `json:"status"`
Count int `json:"count"`
Results []Item `json:"results"`
}
// MarshalJSON implements custom marshaling
func (r *FastResponse) MarshalJSON() ([]byte, error) {
var b strings.Builder
// Hand-code the JSON structure for maximum performance
b.WriteString(`{"status":"`)
b.WriteString(r.Status)
b.WriteString(`","count":`)
b.WriteString(strconv.Itoa(r.Count))
b.WriteString(`,"results":[`)
for i, item := range r.Results {
if i > 0 {
b.WriteByte(',')
}
itemJSON, err := json.Marshal(item)
if err != nil {
return nil, err
}
b.Write(itemJSON)
}
b.WriteString(`]}`)
return []byte(b.String()), nil
}
Slow Data Processing: The Hidden Copy Problem
One unexpected performance issue involves iterating through large data structures:
func processItems(items []LargeItem) {
for _, item := range items { // Each iteration copies the entire LargeItem
processItem(item)
}
}
For large structs, this creates significant copying overhead.
Solution: Use Pointers or Indexes for Iteration
func processItems(items []LargeItem) {
for i := range items { // No copy, just the index
processItem(&items[i])
}
}
// Or with pointers
func processItemPointers(items []*LargeItem) {
for _, item := range items { // Only copying pointers, not the large structs
processItem(item)
}
}
Debugging Go Microservices in Kubernetes
When bugs occur in production Kubernetes environments, having the right tooling is essential.
Effective Logging Strategies
Structured logging is critical for microservices:
import "go.uber.org/zap"
func processOrder(ctx context.Context, orderID string) error {
logger := zap.L().With(
zap.String("orderID", orderID),
zap.String("traceID", extractTraceID(ctx)),
)
logger.Info("Processing order")
// Attempt to retrieve order
order, err := getOrder(ctx, orderID)
if err != nil {
logger.Error("Failed to retrieve order",
zap.Error(err),
zap.String("errorType", reflect.TypeOf(err).String()),
)
return err
}
// Process payment
paymentResult, err := processPayment(ctx, order)
if err != nil {
logger.Error("Payment processing failed",
zap.Error(err),
zap.String("paymentMethod", order.PaymentMethod),
zap.Float64("amount", order.TotalAmount),
)
return err
}
logger.Info("Order processed successfully",
zap.String("paymentID", paymentResult.ID),
zap.Duration("processingTime", time.Since(startTime)),
)
return nil
}
Distributed Tracing
For complex microservices interactions, OpenTelemetry provides distributed tracing:
import (
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/trace"
)
func processOrder(ctx context.Context, orderID string) error {
tracer := otel.Tracer("orders-service")
ctx, span := tracer.Start(ctx, "ProcessOrder")
defer span.End()
span.SetAttributes(attribute.String("orderID", orderID))
// Add error details if things go wrong
order, err := getOrder(ctx, orderID)
if err != nil {
span.RecordError(err)
span.SetStatus(codes.Error, "Failed to retrieve order")
return err
}
// Create child span for payment processing
paymentCtx, paymentSpan := tracer.Start(ctx, "ProcessPayment")
paymentResult, err := processPayment(paymentCtx, order)
paymentSpan.End()
if err != nil {
span.RecordError(err)
span.SetStatus(codes.Error, "Payment processing failed")
return err
}
span.SetAttributes(attribute.String("paymentID", paymentResult.ID))
return nil
}
Runtime Performance Analysis
For identifying performance issues, go’s built-in profiling is invaluable:
import (
"net/http"
_ "net/http/pprof"
)
func main() {
// Start pprof server on a separate port for Kubernetes access
go func() {
http.ListenAndServe(":6060", nil)
}()
// Rest of your application...
}
Then you can use port forwarding to access the profiles:
kubectl port-forward deployment/myapp 6060:6060
And view them in your browser at http://localhost:6060/debug/pprof/ or use the go tool:
go tool pprof http://localhost:6060/debug/pprof/heap
Kubernetes-Specific Debugging
Kubernetes adds its own layer of complexity. Here are some useful approaches:
- Use init containers for debugging setup:
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp-debug
spec:
template:
spec:
initContainers:
- name: debug-tools
image: busybox
command: ['sh', '-c', 'echo "Debug mode enabled"; mkdir -p /tmp/debug; chmod 777 /tmp/debug']
volumeMounts:
- name: debug-volume
mountPath: /tmp/debug
containers:
- name: myapp
image: myapp:debug
volumeMounts:
- name: debug-volume
mountPath: /tmp/debug
volumes:
- name: debug-volume
emptyDir: {}
- Inject environment flags for verbose logging:
env:
- name: DEBUG_LEVEL
value: "trace"
- name: ENABLE_PPROF
value: "true"
- name: LOG_HTTP_REQUESTS
value: "true"
Then in your application:
func setupLogging() {
logLevel := os.Getenv("DEBUG_LEVEL")
var zapLevel zapcore.Level
switch strings.ToLower(logLevel) {
case "trace":
zapLevel = zap.DebugLevel
// Enable more verbose tracing
os.Setenv("OTEL_TRACES_SAMPLER", "always_on")
case "debug":
zapLevel = zap.DebugLevel
case "info":
zapLevel = zap.InfoLevel
default:
zapLevel = zap.InfoLevel
}
logConfig := zap.Config{
Level: zap.NewAtomicLevelAt(zapLevel),
Encoding: "json",
OutputPaths: []string{"stdout"},
ErrorOutputPaths: []string{"stderr"},
// Other config...
}
logger, _ := logConfig.Build()
zap.ReplaceGlobals(logger)
}
Prevention Strategies
While fixing bugs is important, preventing them is even better. Here are strategies to avoid common Go microservice issues.
Use Static Analysis Tools
Incorporate linters in your CI/CD pipeline:
# .github/workflows/go.yml
name: Go
on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-go@v2
with:
go-version: 1.18
- name: golangci-lint
uses: golangci/golangci-lint-action@v2
with:
version: v1.45
Configure .golangci.yml for microservice-specific checks:
linters:
enable:
- errcheck # Check for unchecked errors
- gosec # Security checks
- govet # Reports suspicious constructs
- staticcheck # Advanced static analysis
- bodyclose # Checks whether HTTP response bodies are closed
- contextcheck # Check whether context is propagated correctly
- noctx # Find places where context.Context should be used
- exhaustive # Check exhaustiveness of enum switch statements
- gocognit # Check cognitive complexity
- goconst # Find repeated constants that could be constants
- gocyclo # Check cyclomatic complexity
- godot # Check if comments end with a period
- gofmt # Check if the code is formatted
- misspell # Check for misspellings
- unconvert # Remove unnecessary type conversions
- unparam # Find unused parameters
Write Effective Tests for Concurrency Issues
Testing concurrent code is challenging. Here’s a pattern for testing goroutine leaks:
func TestNoGoroutineLeaks(t *testing.T) {
// Record starting number of goroutines
startGoroutines := runtime.NumGoroutine()
// Run the code that might leak goroutines
processRequests(testRequests)
// Allow any remaining work to finish
time.Sleep(100 * time.Millisecond)
// Check if goroutines increased
endGoroutines := runtime.NumGoroutine()
if endGoroutines > startGoroutines {
t.Errorf("Goroutine leak: started with %d, ended with %d",
startGoroutines, endGoroutines)
}
}
For testing concurrent behavior, the race detector is invaluable:
go test -race ./...
Use Context for Cancellation and Timeouts
Propagating context is critical in microservices:
func (s *Service) GetUserData(ctx context.Context, userID string) (*UserData, error) {
// Check if context is already cancelled
if ctx.Err() != nil {
return nil, ctx.Err()
}
// Create a timeout for this operation if not already set
timeoutCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
defer cancel()
// Make database call with context
user, err := s.db.GetUserWithContext(timeoutCtx, userID)
if err != nil {
if errors.Is(err, context.DeadlineExceeded) {
return nil, fmt.Errorf("database timeout: %w", err)
}
return nil, fmt.Errorf("database error: %w", err)
}
// Call another service with the same context
preferences, err := s.preferencesClient.GetPreferences(timeoutCtx, userID)
if err != nil {
return nil, fmt.Errorf("preferences service error: %w", err)
}
// Combine and return results
return &UserData{
User: user,
Preferences: preferences,
}, nil
}
Design for Failure
Assume all external calls can fail and design accordingly:
func getProductDetails(ctx context.Context, productID string) (*ProductDetails, error) {
// Try to get from cache first
if details, found := cache.Get(productID); found {
return details, nil
}
// Set a short timeout for this specific call
ctx, cancel := context.WithTimeout(ctx, 500*time.Millisecond)
defer cancel()
// Try to get from primary service
details, err := primaryClient.GetProductDetails(ctx, productID)
if err == nil {
// Success - cache and return
cache.Set(productID, details, 5*time.Minute)
return details, nil
}
// Log the error but don't fail yet
log.Printf("Primary service failed: %v", err)
// Try backup service
backupCtx, backupCancel := context.WithTimeout(context.Background(), 1*time.Second)
defer backupCancel()
details, err = backupClient.GetProductDetails(backupCtx, productID)
if err == nil {
// Success from backup - cache and return
cache.Set(productID, details, 1*time.Minute) // Shorter TTL for backup data
return details, nil
}
// Both services failed - check if we have stale data
if staleDeta, found := cache.GetStale(productID); found {
log.Printf("Returning stale data for product %s", productID)
return staleDeta, nil
}
// Complete failure - return error
return nil, fmt.Errorf("failed to get product details: %w", err)
}
Use Kubernetes Readiness and Liveness Probes
Configure probes that detect service health:
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
spec:
template:
spec:
containers:
- name: myapp
image: myapp:v1
ports:
- containerPort: 8080
readinessProbe:
httpGet:
path: /health/ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 10
livenessProbe:
httpGet:
path: /health/live
port: 8080
initialDelaySeconds: 15
periodSeconds: 20
Implement comprehensive health checks:
func setupHealthChecks(router *http.ServeMux) {
// Basic liveness check - just confirms the server is responding
router.HandleFunc("/health/live", func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
w.Write([]byte("OK"))
})
// Readiness check - confirms the service can process requests
router.HandleFunc("/health/ready", func(w http.ResponseWriter, r *http.Request) {
// Check database connection
if err := db.PingContext(r.Context()); err != nil {
log.Printf("Database connection check failed: %v", err)
w.WriteHeader(http.StatusServiceUnavailable)
w.Write([]byte("Database unavailable"))
return
}
// Check dependencies
if !checkDependencies(r.Context()) {
w.WriteHeader(http.StatusServiceUnavailable)
w.Write([]byte("Dependencies unavailable"))
return
}
w.WriteHeader(http.StatusOK)
w.Write([]byte("Ready"))
})
// Detailed health status
router.HandleFunc("/health/status", func(w http.ResponseWriter, r *http.Request) {
status := map[string]interface{}{
"status": "OK",
"version": version,
"timestamp": time.Now().Format(time.RFC3339),
"goroutines": runtime.NumGoroutine(),
"uptime": time.Since(startTime).String(),
"dependencies": checkDependencyStatuses(r.Context()),
}
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(status)
})
}
Conclusion
Go microservices in Kubernetes environments offer tremendous advantages but come with their own set of challenges. By understanding common pitfalls around goroutine management, channel usage, memory handling, and performance optimization, you can build more robust and reliable systems.
Key takeaways:
- Manage Goroutine Lifecycles: Use worker pools, WaitGroups, and contexts to prevent leaks
- Design Channels Carefully: Use buffered channels, select statements, and proper closure patterns
- Optimize Memory Usage: Be aware of slice references, streaming processing, and memory limits
- Improve Performance: Use builders for string concatenation, optimize JSON handling, and be mindful of copying
- Debug Effectively: Implement structured logging, distributed tracing, and runtime profiling
- Prevent Issues: Use static analysis, write effective tests, and design for failure
By following these practices, you can build Go microservices that are resilient, performant, and maintainable, even at scale in complex Kubernetes environments.
The code examples in this article are for illustration purposes and may require adaptation for your specific use cases.