Observability for Go Microservices in Kubernetes: Implementing Logging, Metrics, and Tracing
In today’s complex distributed systems, particularly microservices deployed on Kubernetes, understanding what’s happening inside your applications is crucial. This article provides a comprehensive guide to implementing observability in Go microservices through structured logging, metrics collection, and distributed tracing.
Observability for Go Microservices in Kubernetes
In the cloud-native world, observability has evolved from simple log aggregation and basic metrics to a comprehensive approach encompassing three pillars: logs, metrics, and traces. For Go microservices running on Kubernetes, implementing proper observability is essential for troubleshooting, performance optimization, and ensuring system reliability.
Section 1: Understanding the Observability Triad
Observability goes beyond mere monitoring. While monitoring tells you if a system is working, observability helps you understand why it isn’t working. An observable system is one that can be understood from the outside by examining its outputs.
The three pillars of observability are:
- Logs - Discrete text records of events that happened over time
- Metrics - Numeric representations of data measured over intervals of time
- Traces - Representations of a series of causally related distributed events
Why Observability Matters for Go Microservices
Go’s concurrency model and lightweight goroutines make it well suited to building microservices, but the distributed nature of a microservice architecture creates challenges:
- Request flows span multiple services
- Performance bottlenecks can be difficult to pinpoint
- Errors may propagate through the system in non-obvious ways
- Resource usage needs to be tracked across numerous instances
Let’s explore how to implement each pillar effectively in Go microservices deployed on Kubernetes.
Section 2: Structured Logging in Go Microservices
Key Logging Concepts for Microservices
Free-form text logs written to local files fall short in microservice environments, where a single request touches many short-lived pods. Instead, we need:
- Structured logging - Machine-parseable logs with consistent fields
- Contextual information - Including request IDs, service names, etc.
- Centralized aggregation - Collecting logs from all services
- Log correlation - Ability to trace requests across services
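To make "machine-parseable logs with consistent fields" concrete, here is a minimal sketch that emits one JSON log line and then reads a field back out of it, the way an aggregator would. It deliberately uses the standard library's log/slog package rather than zerolog so that it has no dependencies; the helper names logLine and hasField and the field names service and trace_id are illustrative assumptions, not part of any library.

```go
package main

import (
	"bytes"
	"encoding/json"
	"log/slog"
)

// logLine renders one structured log record as a JSON string.
// The "service" and "trace_id" field names are illustrative choices.
func logLine(service, traceID, msg string) string {
	var buf bytes.Buffer
	handler := slog.NewJSONHandler(&buf, &slog.HandlerOptions{
		// Drop the timestamp so the output is deterministic for this demo.
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if len(groups) == 0 && a.Key == slog.TimeKey {
				return slog.Attr{}
			}
			return a
		},
	})
	// With attaches the same fields to every record written by this logger.
	logger := slog.New(handler).With("service", service, "trace_id", traceID)
	logger.Info(msg)
	return buf.String()
}

// hasField reports whether a JSON log line contains key with the given string
// value, which is the kind of lookup a log aggregator performs.
func hasField(line, key, want string) bool {
	var fields map[string]any
	if err := json.Unmarshal([]byte(line), &fields); err != nil {
		return false
	}
	got, ok := fields[key].(string)
	return ok && got == want
}
```

Because every record carries the same keys, a query such as "all lines where trace_id equals abc-123" becomes a field lookup instead of a regular expression over free text.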
Implementing Structured Logging with zerolog
Among the many Go logging libraries, zerolog stands out for its performance and JSON-native approach. Here’s how to implement it:
package main
import (
"errors"
"os"
"github.com/rs/zerolog"
"github.com/rs/zerolog/log"
)
func main() {
// Configure global logger
zerolog.TimeFieldFormat = zerolog.TimeFormatUnix
zerolog.SetGlobalLevel(zerolog.InfoLevel)
// Add service name and environment to all logs
logger := log.With().
Str("service", "user-service").
Str("environment", os.Getenv("ENVIRONMENT")).
Logger()
// Replace global logger
log.Logger = logger
// Example log with structured fields
log.Info().
Str("user_id", "12345").
Str("action", "login").
Int("attempt", 1).
Msg("User login attempt")
// Log error with additional context
log.Error().
Err(errors.New("database connection failed")).
Str("db_host", "postgres-primary").
Msg("Failed to connect to database")
}
Handling Request Context and Correlation IDs
For proper request tracing, we need to propagate correlation IDs across service boundaries:
func LoggingMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// Generate or extract trace ID
traceID := r.Header.Get("X-Trace-ID")
if traceID == "" {
traceID = uuid.New().String()
}
// Add trace ID to response headers
w.Header().Set("X-Trace-ID", traceID)
// Create a request-scoped logger with trace ID
requestLogger := log.With().
Str("trace_id", traceID).
Str("method", r.Method).
Str("path", r.URL.Path).
Str("remote_addr", r.RemoteAddr).
Logger()
// Add logger to request context
ctx := requestLogger.WithContext(r.Context())
// Process the request with our new context
next.ServeHTTP(w, r.WithContext(ctx))
})
}
// Retrieve logger from context
func GetLogger(ctx context.Context) zerolog.Logger {
return log.Ctx(ctx).With().Logger()
}
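The middleware above handles the inbound side of correlation. For the ID to cross a service boundary it also has to be attached to outgoing calls. The following is one possible sketch using only the standard library. It assumes the trace ID is additionally stored in the request context under a private key; WithTraceID, TraceIDFromContext, TraceTransport, sentTraceHeader, and the orders.internal URL are all hypothetical names introduced for illustration.

```go
package main

import (
	"context"
	"net/http"
)

// traceIDKey is an unexported context key type, which avoids collisions
// with keys defined in other packages.
type traceIDKey struct{}

// WithTraceID returns a copy of ctx that carries the given trace ID.
func WithTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceIDKey{}, id)
}

// TraceIDFromContext returns the trace ID stored in ctx, or "" if none is set.
func TraceIDFromContext(ctx context.Context) string {
	id, _ := ctx.Value(traceIDKey{}).(string)
	return id
}

// TraceTransport wraps another RoundTripper and copies the trace ID from the
// request context into the X-Trace-ID header of every outgoing request.
type TraceTransport struct {
	Base http.RoundTripper // falls back to http.DefaultTransport when nil
}

func (t TraceTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	base := t.Base
	if base == nil {
		base = http.DefaultTransport
	}
	if id := TraceIDFromContext(req.Context()); id != "" {
		// A RoundTripper must not modify the caller's request, so clone it first.
		req = req.Clone(req.Context())
		req.Header.Set("X-Trace-ID", id)
	}
	return base.RoundTrip(req)
}

// roundTripperFunc adapts a plain function to http.RoundTripper.
type roundTripperFunc func(*http.Request) (*http.Response, error)

func (f roundTripperFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// sentTraceHeader demonstrates the transport: it sends one request through
// TraceTransport to a stub backend and returns the header that backend saw.
func sentTraceHeader(traceID string) string {
	var seen string
	stub := roundTripperFunc(func(r *http.Request) (*http.Response, error) {
		seen = r.Header.Get("X-Trace-ID")
		return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody}, nil
	})
	client := &http.Client{Transport: TraceTransport{Base: stub}}
	ctx := WithTraceID(context.Background(), traceID)
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://orders.internal/api/orders", nil)
	if err != nil {
		return ""
	}
	resp, err := client.Do(req)
	if err != nil {
		return ""
	}
	resp.Body.Close()
	return seen
}
```

Putting the logic in a transport rather than at each call site means any http.Client built with it propagates the ID, provided callers pass the request context through.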
Setting Up Log Collection in Kubernetes with Loki
To collect and centralize logs, we’ll use Grafana Loki, a horizontally-scalable log aggregation system:
# values.yaml for Loki Helm chart
loki:
  enabled: true
  persistence:
    enabled: true
    size: 10Gi
promtail:
  enabled: true
  config:
    snippets:
      extraScrapeConfigs: |
        - job_name: kubernetes-pods
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels:
                - __meta_kubernetes_pod_annotation_prometheus_io_scrape
              action: keep
              regex: true
            - source_labels:
                - __meta_kubernetes_pod_label_app
              target_label: app
            - source_labels:
                - __meta_kubernetes_pod_node_name
              target_label: node_name
            - source_labels:
                - __meta_kubernetes_namespace
              target_label: namespace
            - source_labels:
                - __meta_kubernetes_pod_name
              target_label: pod
Deploy Loki using Helm:
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm upgrade --install loki grafana/loki-stack --values values.yaml -n monitoring
Section 3: Metrics Collection with Prometheus
Key Metrics Concepts for Go Microservices
When instrumenting Go microservices, focus on these metric types:
- Counters - Cumulative metrics that only increase (e.g., request count)
- Gauges - Metrics that can increase and decrease (e.g., active goroutines)
- Histograms - Sample observations distributed in buckets (e.g., request duration)
- Summaries - Similar to histograms, but quantiles are calculated in the client, so they cannot be aggregated across instances
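The phrase "distributed in buckets" is easier to follow with numbers. In a Prometheus histogram each bucket is cumulative: it counts every observation less than or equal to its upper bound, and an implicit +Inf bucket counts everything. The sketch below reproduces that bookkeeping with the standard library only; cumulativeBucketCounts is a hypothetical helper written for this illustration, not a function of the Prometheus client.

```go
package main

import "sort"

// cumulativeBucketCounts mimics how a Prometheus histogram stores observations:
// each bucket counts every observation less than or equal to its upper bound,
// and a final implicit +Inf bucket counts all observations.
func cumulativeBucketCounts(upperBounds, observations []float64) []uint64 {
	// Copy before sorting so the caller's slice is left untouched.
	bounds := append([]float64(nil), upperBounds...)
	sort.Float64s(bounds)

	counts := make([]uint64, len(bounds)+1) // last slot is the +Inf bucket
	for _, v := range observations {
		// Index of the first bound >= v; equals len(bounds) when v exceeds all bounds.
		first := sort.SearchFloat64s(bounds, v)
		for i := first; i < len(counts); i++ {
			counts[i]++
		}
	}
	return counts
}
```

With upper bounds 0.1, 0.5, and 1 second and observations of 0.05, 0.3, 0.7, and 2 seconds, the buckets hold 1, 2, 3, and 4 observations respectively. Quantiles are later estimated from these counts on the server side, which is why bucket boundaries should bracket the latencies you care about.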
Implementing Prometheus Metrics in Go
Using the official Prometheus client library:
package main
import (
"net/http"
"strconv"
"time"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promhttp"
)
var (
// Define a counter for HTTP requests
httpRequestsTotal = prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "http_requests_total",
Help: "Total number of HTTP requests by status code and method",
},
[]string{"code", "method", "path"},
)
// Define a histogram for HTTP request duration
httpRequestDuration = prometheus.NewHistogramVec(
prometheus.HistogramOpts{
Name: "http_request_duration_seconds",
Help: "HTTP request duration in seconds",
Buckets: prometheus.DefBuckets,
},
[]string{"method", "path"},
)
// Define a gauge for active requests
httpActiveRequests = prometheus.NewGauge(
prometheus.GaugeOpts{
Name: "http_active_requests",
Help: "Number of active HTTP requests",
},
)
)
func init() {
// Register metrics with Prometheus
prometheus.MustRegister(httpRequestsTotal)
prometheus.MustRegister(httpRequestDuration)
prometheus.MustRegister(httpActiveRequests)
}
func MetricsMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// Track active requests
httpActiveRequests.Inc()
defer httpActiveRequests.Dec()
// Record start time
start := time.Now()
// Create a custom response writer to capture status code
rww := NewResponseWriterWrapper(w)
// Call the next handler
next.ServeHTTP(rww, r)
// Record metrics after request is processed
duration := time.Since(start).Seconds()
statusCode := rww.StatusCode
// Update request count metric.
// strconv.Itoa yields "200"; string(statusCode) would instead yield the
// Unicode character with that code point, not the decimal digits.
httpRequestsTotal.WithLabelValues(
strconv.Itoa(statusCode),
r.Method,
r.URL.Path,
).Inc()
// Update duration metric
httpRequestDuration.WithLabelValues(
r.Method,
r.URL.Path,
).Observe(duration)
})
}
func main() {
// Register metrics endpoint
http.Handle("/metrics", promhttp.Handler())
// Register application endpoints with middleware
apiHandler := http.HandlerFunc(apiFunc)
http.Handle("/api/", MetricsMiddleware(apiHandler))
// Start server
http.ListenAndServe(":8080", nil)
}
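The middleware above uses r.URL.Path as a label value. Every distinct label value creates a separate time series, so paths that embed identifiers (for example /api/users/12345) can produce a very large number of series. One possible mitigation is to normalize the path before using it as a label. The sketch below assumes identifiers are purely numeric path segments; routeLabel, isAllDigits, and the :id placeholder are hypothetical names chosen for this example.

```go
package main

import "strings"

// routeLabel collapses purely numeric path segments into ":id" so that
// /api/users/1 and /api/users/2 share a single label value.
func routeLabel(path string) string {
	segments := strings.Split(path, "/")
	for i, s := range segments {
		if isAllDigits(s) {
			segments[i] = ":id"
		}
	}
	return strings.Join(segments, "/")
}

// isAllDigits reports whether s is non-empty and consists only of ASCII digits.
func isAllDigits(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}
```

In the middleware, routeLabel(r.URL.Path) would replace r.URL.Path in both WithLabelValues calls. If your router exposes the matched route pattern, using that pattern directly is usually the more robust choice.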
Custom ResponseWriter for Status Code Tracking
type ResponseWriterWrapper struct {
http.ResponseWriter
StatusCode int
}
func NewResponseWriterWrapper(w http.ResponseWriter) *ResponseWriterWrapper {
return &ResponseWriterWrapper{w, http.StatusOK}
}
func (rww *ResponseWriterWrapper) WriteHeader(code int) {
rww.StatusCode = code
rww.ResponseWriter.WriteHeader(code)
}
Configure Prometheus in Kubernetes
Create a ServiceMonitor so that the Prometheus Operator discovers and scrapes your Go services. A ServiceMonitor selects Kubernetes Service objects by label, and the port field refers to a named port on that Service:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: go-microservices
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/part-of: go-microservices
  namespaceSelector:
    matchNames:
      - default
      - production
      - staging
  endpoints:
    - port: metrics
      path: /metrics
      interval: 15s
      scrapeTimeout: 14s
Section 4: Distributed Tracing with OpenTelemetry
Key Tracing Concepts
In distributed systems:
- A trace represents the entire journey of a request
- A span represents a unit of work within that trace
- Context propagation enables connecting spans across service boundaries
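Context propagation usually travels in an HTTP header. In the W3C Trace Context format, which the TraceContext propagator used later in this article implements, the traceparent header has four dash-separated fields: version, trace ID (32 hex characters), parent span ID (16 hex characters), and flags (2 hex characters, where the lowest bit means "sampled"). The following sketch parses that header by hand purely to show what is on the wire; parseTraceparent and isHex are hypothetical helpers, and in real services the OpenTelemetry propagator does this for you.

```go
package main

import (
	"encoding/hex"
	"strings"
)

// parseTraceparent splits a W3C Trace Context "traceparent" header of the form
// version-traceid-parentid-flags. Only version "00" is handled in this sketch.
func parseTraceparent(header string) (traceID, parentID string, sampled, ok bool) {
	parts := strings.Split(header, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return "", "", false, false
	}
	if !isHex(parts[1], 32) || !isHex(parts[2], 16) || !isHex(parts[3], 2) {
		return "", "", false, false
	}
	// All-zero trace and parent IDs are defined as invalid.
	if parts[1] == strings.Repeat("0", 32) || parts[2] == strings.Repeat("0", 16) {
		return "", "", false, false
	}
	flags, err := hex.DecodeString(parts[3])
	if err != nil {
		return "", "", false, false
	}
	// Bit 0 of the flags byte is the "sampled" flag.
	return parts[1], parts[2], flags[0]&0x01 == 1, true
}

// isHex reports whether s is exactly n lowercase hexadecimal characters.
func isHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, r := range s {
		isDigit := r >= '0' && r <= '9'
		isLowerHex := r >= 'a' && r <= 'f'
		if !isDigit && !isLowerHex {
			return false
		}
	}
	return true
}
```

A downstream service that receives this header starts its own span with the same trace ID and uses the parent ID as its span's parent, which is how spans from different processes end up in one trace.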
Implementing OpenTelemetry in Go
OpenTelemetry provides a unified API for tracing, metrics, and logging. Let’s focus on the tracing aspects:
package main
import (
"context"
"errors"
"log"
"math/rand"
"net/http"
"os"
"time"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/codes"
"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
"go.opentelemetry.io/otel/propagation"
"go.opentelemetry.io/otel/sdk/resource"
sdktrace "go.opentelemetry.io/otel/sdk/trace"
semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
"go.opentelemetry.io/otel/trace"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
)
var tracer trace.Tracer
func initTracer() func() {
// OTLP exporter
ctx := context.Background()
// Create OTLP exporter
conn, err := grpc.DialContext(ctx, "otel-collector:4317",
grpc.WithTransportCredentials(insecure.NewCredentials()),
grpc.WithBlock(),
)
if err != nil {
log.Fatalf("Failed to create gRPC connection: %v", err)
}
exporter, err := otlptracegrpc.New(ctx, otlptracegrpc.WithGRPCConn(conn))
if err != nil {
log.Fatalf("Failed to create OTLP trace exporter: %v", err)
}
// Resource with service information
res, err := resource.New(ctx,
resource.WithAttributes(
semconv.ServiceNameKey.String("user-service"),
semconv.ServiceVersionKey.String("1.0.0"),
semconv.DeploymentEnvironmentKey.String(os.Getenv("ENVIRONMENT")),
),
)
if err != nil {
log.Fatalf("Failed to create resource: %v", err)
}
// Create trace provider
tp := sdktrace.NewTracerProvider(
sdktrace.WithSampler(sdktrace.AlwaysSample()),
sdktrace.WithBatcher(exporter),
sdktrace.WithResource(res),
)
otel.SetTracerProvider(tp)
// Set global propagator for context extraction/injection
otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
propagation.TraceContext{},
propagation.Baggage{},
))
// Get tracer
tracer = tp.Tracer("user-service")
// Return cleanup function
return func() {
if err := tp.Shutdown(ctx); err != nil {
log.Printf("Error shutting down tracer provider: %v", err)
}
}
}
func TracingMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// Extract context from headers
ctx := r.Context()
ctx = otel.GetTextMapPropagator().Extract(ctx, propagation.HeaderCarrier(r.Header))
// Start a new span
ctx, span := tracer.Start(ctx, r.URL.Path,
trace.WithSpanKind(trace.SpanKindServer),
trace.WithAttributes(
semconv.HTTPMethodKey.String(r.Method),
semconv.HTTPURLKey.String(r.URL.String()),
semconv.HTTPUserAgentKey.String(r.UserAgent()),
),
)
defer span.End()
// Add trace context to response headers for debugging
traceID := span.SpanContext().TraceID().String()
w.Header().Set("X-Trace-ID", traceID)
// Process the request with tracing context
next.ServeHTTP(w, r.WithContext(ctx))
})
}
// Example of creating a child span for a database operation
func queryDatabase(ctx context.Context, query string) ([]byte, error) {
ctx, span := tracer.Start(ctx, "database.query",
trace.WithAttributes(
semconv.DBSystemKey.String("postgresql"),
semconv.DBStatementKey.String(query),
),
)
defer span.End()
// Simulate database query
// In a real app, you would perform the actual query here
time.Sleep(100 * time.Millisecond)
// Simulate an occasional error
if rand.Intn(10) == 0 {
err := errors.New("database connection error")
span.RecordError(err)
span.SetStatus(codes.Error, "Database connection failed")
return nil, err
}
return []byte("result"), nil
}
Configure OpenTelemetry Collector in Kubernetes
Deploy the OpenTelemetry Collector to receive, process, and export telemetry data:
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
spec:
  mode: deployment
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
        timeout: 10s
      memory_limiter:
        check_interval: 5s
        limit_mib: 1000
      resourcedetection:
        detectors: [env, kubernetes]
        timeout: 2s
      k8sattributes:
        extract:
          metadata:
            - k8s.namespace.name
            - k8s.pod.name
            - k8s.deployment.name
    exporters:
      logging:
        verbosity: detailed
      jaeger:
        endpoint: jaeger-collector:14250
        tls:
          insecure: true
      prometheus:
        endpoint: 0.0.0.0:8889
    service:
      pipelines:
        # Processors run in the order listed: memory_limiter first so it can
        # shed load early, batch last so enriched data is what gets batched.
        traces:
          receivers: [otlp]
          processors: [memory_limiter, k8sattributes, resourcedetection, batch]
          exporters: [logging, jaeger]
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, k8sattributes, resourcedetection, batch]
          exporters: [logging, prometheus]
Section 5: Putting It All Together - Creating Observable Go Microservices
Integrating the Three Pillars
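The three pillars are most useful when they share identifiers. The handler below attaches the trace ID to its logs, records errors on the active span, and updates metrics for the same request (processRequest, requestErrorsTotal, and requestDuration are assumed to be defined elsewhere):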
func Handler(w http.ResponseWriter, r *http.Request) {
// Get context with trace
ctx := r.Context()
// Get current span from context
span := trace.SpanFromContext(ctx)
traceID := span.SpanContext().TraceID().String()
// Get logger from context and add trace ID
logger := GetLogger(ctx).With().
Str("trace_id", traceID).
Logger()
// Log with trace correlation
logger.Info().Msg("Processing request")
// Record start time for custom metric
startTime := time.Now()
// Process request (with potential errors)
result, err := processRequest(ctx, r)
if err != nil {
// Record error in span
span.RecordError(err)
span.SetStatus(codes.Error, "Request processing failed")
// Log error with tracing context
logger.Error().Err(err).Msg("Failed to process request")
// Update error metric
requestErrorsTotal.WithLabelValues(r.URL.Path).Inc()
http.Error(w, "Internal server error", http.StatusInternalServerError)
return
}
// Record processing duration
duration := time.Since(startTime).Seconds()
requestDuration.WithLabelValues(r.URL.Path).Observe(duration)
// Log success with timing
logger.Info().
Float64("duration_seconds", duration).
Msg("Request processed successfully")
// Write response
w.Header().Set("Content-Type", "application/json")
w.Write(result)
}
Observability-Ready Kubernetes Deployment
Here’s a Kubernetes deployment manifest with observability configurations:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
  labels:
    app: user-service
    app.kubernetes.io/part-of: go-microservices
spec:
  replicas: 3
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
        app.kubernetes.io/part-of: go-microservices
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: user-service
          image: example/user-service:1.0.0
          ports:
            - containerPort: 8080
              name: http
            - containerPort: 8081
              name: metrics
          env:
            - name: ENVIRONMENT
              value: "production"
            - name: LOG_LEVEL
              value: "info"
            # The OTLP endpoint variable expects a URL with a scheme.
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://otel-collector:4317"
            - name: OTEL_SERVICE_NAME
              value: "user-service"
          resources:
            limits:
              cpu: 500m
              memory: 512Mi
            requests:
              cpu: 100m
              memory: 128Mi
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 30
            timeoutSeconds: 5
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 5
            timeoutSeconds: 3
Visualizing the Data - Grafana Dashboards
To get insights from your observability data, set up Grafana dashboards:
Log Exploration Dashboard
- Connect Grafana to Loki
- Create queries to filter logs by service, trace ID, and severity
Go Service Metrics Dashboard
- Graph request rates, errors, and duration (RED method)
- Monitor resource usage (CPU, memory, goroutines)
- Track key business metrics
Distributed Tracing Dashboard
- Connect Grafana to Jaeger
- Create service topology views
- Analyze trace spans and timing
Section 6: Advanced Observability Patterns
Contextual Logging with Trace Correlation
Enhance log messages with span IDs for precise correlation:
func LogFromContext(ctx context.Context, level zerolog.Level, msg string) {
span := trace.SpanFromContext(ctx)
spanCtx := span.SpanContext()
// Create log event with trace information
logEvent := log.WithLevel(level).
Str("trace_id", spanCtx.TraceID().String()).
Str("span_id", spanCtx.SpanID().String())
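// Parent and attribute data are not part of the trace.Span API. They are only
// readable on SDK spans, so type-assert to sdktrace.ReadOnlySpan
// (go.opentelemetry.io/otel/sdk/trace) and skip these fields otherwise.
if ro, ok := span.(sdktrace.ReadOnlySpan); ok {
// Add parent span if available
if ro.Parent().IsValid() {
logEvent = logEvent.Str("parent_id", ro.Parent().SpanID().String())
}
// Add attributes from span as log fields
for _, kv := range ro.Attributes() {
logEvent = logEvent.Interface(string(kv.Key), kv.Value.AsInterface())
}
}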
logEvent.Msg(msg)
}
Health Checks with Observability Data
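Health checks can combine dependency checks with the service's own metrics. In the handler below, checkDatabaseHealth, calculateErrorRate, and getP99Latency are assumed to be defined elsewhere: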
func HealthHandler(w http.ResponseWriter, r *http.Request) {
ctx := r.Context()
// Check database connectivity
dbHealthy := checkDatabaseHealth(ctx)
// Check error rate metric (failing if above threshold)
errorRate := calculateErrorRate()
errorRateHealthy := errorRate < 0.05 // Less than 5% errors
// Check latency metric (failing if above threshold)
p99Latency := getP99Latency()
latencyHealthy := p99Latency < 500*time.Millisecond
// Overall health status
healthy := dbHealthy && errorRateHealthy && latencyHealthy
// Create health response
health := map[string]interface{}{
"status": map[string]bool{
"healthy": healthy,
"database": dbHealthy,
"error_rate": errorRateHealthy,
"latency": latencyHealthy,
},
"metrics": map[string]interface{}{
"error_rate": errorRate,
"p99_latency_ms": p99Latency.Milliseconds(),
},
}
w.Header().Set("Content-Type", "application/json")
if !healthy {
w.WriteHeader(http.StatusServiceUnavailable)
}
json.NewEncoder(w).Encode(health)
}
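The handler above calls calculateErrorRate and getP99Latency without showing them. The sketch below illustrates the arithmetic such helpers might perform, assuming the service keeps recent request counts and latency samples in memory. The function names errorRate and percentile, the nearest-rank method, and the use of plain float64 values are assumptions made for this example.

```go
package main

import (
	"math"
	"sort"
)

// errorRate returns failed/total, treating "no traffic" as a zero error rate
// so that an idle instance is not reported as unhealthy.
func errorRate(failed, total uint64) float64 {
	if total == 0 {
		return 0
	}
	return float64(failed) / float64(total)
}

// percentile returns the nearest-rank percentile (0 < p <= 1) of samples.
// It returns 0 for an empty slice and does not modify its input.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)

	// Nearest rank: the smallest rank such that at least p of the samples
	// are less than or equal to the value at that rank.
	rank := int(math.Ceil(p * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}
```

With latency samples in milliseconds, percentile(samples, 0.99) gives a value that can be compared against the 500 ms threshold. With few samples the 99th percentile is simply the maximum, so a threshold like this is only meaningful once enough requests have been observed.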
Feature Flag Impact Analysis
Use observability data to measure the impact of feature flags:
func processWithFeatureFlag(ctx context.Context, r *http.Request) ([]byte, error) {
// Check if feature is enabled
featureEnabled := featureFlags.IsEnabled("new-algorithm")
// Add feature flag information to current span
span := trace.SpanFromContext(ctx)
span.SetAttributes(attribute.Bool("feature.new_algorithm", featureEnabled))
// Start timing
startTime := time.Now()
var result []byte
var err error
if featureEnabled {
// New algorithm path
result, err = newAlgorithm(ctx, r)
// Record metrics for new algorithm
algorithmProcessingTime.WithLabelValues("new").Observe(time.Since(startTime).Seconds())
} else {
// Legacy algorithm path
result, err = legacyAlgorithm(ctx, r)
// Record metrics for legacy algorithm
algorithmProcessingTime.WithLabelValues("legacy").Observe(time.Since(startTime).Seconds())
}
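// Record success/failure by algorithm version.
// Go has no ternary operator, so choose the label with an if statement.
if err != nil {
algorithm := "legacy"
if featureEnabled {
algorithm = "new"
}
algorithmErrors.WithLabelValues(algorithm).Inc()
}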
return result, err
}
Conclusion: Observability as a Culture
Implementing observability in Go microservices on Kubernetes is not just about tools and code. It requires cultural changes:
- Shift-left observability - Instrumenting code from the start
- SLOs and error budgets - Defining what “good” looks like
- Continuous improvement - Using observability data to drive optimizations
- Debugging mindset - Designing systems to answer “why” questions
- Democratized access - Making observability data available to all teams
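As a small worked example of the error-budget idea from the list above: an SLO of 99.9% over one million requests allows 1,000 failures, and the budget is the share of those failures not yet spent. The sketch below uses integer parts-per-million to avoid floating-point rounding; allowedFailures, budgetRemaining, and the parts-per-million representation are assumptions made for illustration.

```go
package main

// allowedFailures returns how many of totalRequests may fail while still
// meeting an SLO expressed in parts per million (999000 means 99.9%).
// The multiplication can overflow for extremely large request counts.
func allowedFailures(sloPPM, totalRequests uint64) uint64 {
	if sloPPM > 1_000_000 {
		sloPPM = 1_000_000
	}
	return totalRequests * (1_000_000 - sloPPM) / 1_000_000
}

// budgetRemaining returns the fraction of the error budget still unspent.
// A negative result means the budget has been exceeded.
func budgetRemaining(allowed, failed uint64) float64 {
	if allowed == 0 {
		return 0
	}
	return 1 - float64(failed)/float64(allowed)
}
```

For instance, with 250 failures against an allowance of 1,000, three quarters of the budget remains. Teams often use a figure like this to decide whether to keep shipping changes or to prioritize reliability work.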
By implementing the three pillars of observability (logs, metrics, and traces) in your Go microservices, you gain detailed insight into your distributed systems. This visibility enables you to diagnose issues faster, optimize performance effectively, and build more reliable applications.
When properly instrumented, your Go microservices will tell you their story through structured logs, detailed metrics, and comprehensive trace data. This observability becomes your competitive advantage in managing complex distributed systems.