Complete Guide to Observability in Go: Logging, Metrics, and Tracing with OpenTelemetry and Prometheus
Modern applications, especially those built using microservices architecture, can be challenging to debug and monitor. Observability—the ability to understand a system’s internal state by examining its outputs—has become essential for maintaining reliable services. For Go applications, implementing comprehensive observability involves three pillars: logging, metrics, and distributed tracing.
This guide provides a practical approach to implementing observability in Go applications using industry-standard tools: structured logging with Zap, metrics with Prometheus, and distributed tracing with OpenTelemetry.
Understanding Observability
Observability consists of three primary components:
- Logging: Discrete events that provide context about what’s happening in your application
- Metrics: Numerical data captured at regular intervals to monitor system health and performance
- Tracing: Tracking the flow of requests through distributed systems
Each component serves a distinct purpose, and together they offer a complete view of your application’s behavior.
Structured Logging with Zap
While Go’s standard library includes a basic logging package, structured logging libraries like Zap provide superior performance and flexibility.
Setting Up Zap Logger
First, install Zap:
go get -u go.uber.org/zap
Next, create a logger configuration:
package logger
import (
"go.uber.org/zap"
"go.uber.org/zap/zapcore"
)
var log *zap.Logger
// Initialize sets up the global logger
func Initialize(environment string) {
var config zap.Config
if environment == "production" {
config = zap.NewProductionConfig()
config.EncoderConfig.EncodeTime = zapcore.ISO8601TimeEncoder
} else {
config = zap.NewDevelopmentConfig()
config.EncoderConfig.EncodeLevel = zapcore.CapitalColorLevelEncoder
}
var err error
log, err = config.Build()
if err != nil {
panic(err)
}
}
// Get returns the global logger
func Get() *zap.Logger {
return log
}
// Sync flushes any buffered log entries
func Sync() error {
return log.Sync()
}
Using the Logger
Now you can use the logger throughout your application:
package main
import (
"yourpackage/logger"
"go.uber.org/zap"
)
func main() {
// Initialize the logger
logger.Initialize("development")
defer logger.Sync()
log := logger.Get()
// Structured logging with context
log.Info("Server starting",
zap.String("env", "development"),
zap.Int("port", 8080),
zap.String("version", "1.0.0"),
)
// Log with error context
err := startServer()
if err != nil {
log.Error("Failed to start server",
zap.Error(err),
zap.String("server_type", "http"),
)
}
}
Logging Best Practices
- Use Structured Logging: Always include relevant fields for better search and analysis
- Appropriate Log Levels: Use Debug for development information, Info for operational events, Warn for concerning situations, Error for failures, and Fatal for critical errors that require immediate attention
- Include Context: Add request IDs, user IDs, and other relevant context to connect related log entries
- Avoid Sensitive Data: Never log passwords, authentication tokens, or personal information
- Log Sparingly: Excessive logging impacts performance and creates noise
Metrics with Prometheus
Prometheus is a powerful monitoring system and time series database that integrates well with Go applications.
Setting Up Prometheus
Install the Prometheus client:
go get github.com/prometheus/client_golang/prometheus
go get github.com/prometheus/client_golang/prometheus/promauto
go get github.com/prometheus/client_golang/prometheus/promhttp
Create a metrics package:
package metrics
import (
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
)
var (
// RequestCounter tracks HTTP requests
RequestCounter = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "http_requests_total",
Help: "Total number of HTTP requests",
},
[]string{"method", "path", "status"},
)
// RequestDuration tracks latency
RequestDuration = promauto.NewHistogramVec(
prometheus.HistogramOpts{
Name: "http_request_duration_seconds",
Help: "HTTP request duration in seconds",
Buckets: prometheus.DefBuckets,
},
[]string{"method", "path", "status"},
)
// ActiveRequests tracks concurrent requests
ActiveRequests = promauto.NewGauge(
prometheus.GaugeOpts{
Name: "http_requests_active",
Help: "Number of active HTTP requests",
},
)
// DatabaseOperations tracks database operation counts
DatabaseOperations = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "database_operations_total",
Help: "Total number of database operations",
},
[]string{"operation", "table", "status"},
)
// DatabaseDuration tracks database operation latency
DatabaseDuration = promauto.NewHistogramVec(
prometheus.HistogramOpts{
Name: "database_operation_duration_seconds",
Help: "Database operation duration in seconds",
Buckets: prometheus.DefBuckets,
},
[]string{"operation", "table"},
)
)
Exposing Metrics Endpoint
In your HTTP server setup, add the Prometheus metrics endpoint:
package main
import (
"net/http"
"time"
"github.com/prometheus/client_golang/prometheus/promhttp"
"yourpackage/logger"
"yourpackage/metrics"
)
func main() {
logger.Initialize("development")
defer logger.Sync()
log := logger.Get()
// Expose metrics endpoint
http.Handle("/metrics", promhttp.Handler())
// Register your application routes
http.HandleFunc("/api/users", instrumentHandler(handleUsers))
log.Info("Starting server on :8080")
if err := http.ListenAndServe(":8080", nil); err != nil {
log.Error("Server failed", err)
}
}
// instrumentHandler wraps HTTP handlers with metrics tracking
func instrumentHandler(next http.HandlerFunc) http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
start := time.Now()
metrics.ActiveRequests.Inc()
// Custom response writer to capture status code
sw := statusWriter{ResponseWriter: w, status: http.StatusOK}
// Call the actual handler
next(&sw, r)
// Record metrics
duration := time.Since(start).Seconds()
metrics.RequestDuration.WithLabelValues(r.Method, r.URL.Path, sw.StatusString()).Observe(duration)
metrics.RequestCounter.WithLabelValues(r.Method, r.URL.Path, sw.StatusString()).Inc()
metrics.ActiveRequests.Dec()
}
}
// statusWriter wraps http.ResponseWriter to capture status codes
type statusWriter struct {
http.ResponseWriter
status int
}
func (w *statusWriter) WriteHeader(status int) {
w.status = status
w.ResponseWriter.WriteHeader(status)
}
func (w *statusWriter) Write(b []byte) (int, error) {
if w.status == 0 {
w.status = http.StatusOK
}
return w.ResponseWriter.Write(b)
}
func (w *statusWriter) StatusString() string {
return http.StatusText(w.status)
}
func handleUsers(w http.ResponseWriter, r *http.Request) {
// Handler implementation
}
Instrumenting Database Operations
For database operations, wrap your repository methods with metrics:
func (r *UserRepository) GetUser(id string) (*User, error) {
start := time.Now()
tableLabel := "users"
operationLabel := "get"
user, err := r.findByID(id)
duration := time.Since(start).Seconds()
metrics.DatabaseDuration.WithLabelValues(operationLabel, tableLabel).Observe(duration)
status := "success"
if err != nil {
status = "error"
}
metrics.DatabaseOperations.WithLabelValues(operationLabel, tableLabel, status).Inc()
return user, err
}
Key Metrics to Collect
- Request Rates: Total requests per endpoint/service
- Error Rates: Failed requests by type and endpoint
- Latency: Distribution of request durations
- Resource Utilization: CPU, memory, disk, and network usage
- Business Metrics: User signups, orders placed, or other application-specific metrics
Distributed Tracing with OpenTelemetry
OpenTelemetry provides a vendor-neutral API for distributed tracing, allowing you to trace requests as they flow through your distributed system.
Setting Up OpenTelemetry
Install the required packages:
go get go.opentelemetry.io/otel
go get go.opentelemetry.io/otel/sdk
go get go.opentelemetry.io/otel/trace
go get go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc
go get go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp
Create a tracing package:
package tracing
import (
"context"
"log"
"time"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
"go.opentelemetry.io/otel/propagation"
"go.opentelemetry.io/otel/sdk/resource"
sdktrace "go.opentelemetry.io/otel/sdk/trace"
semconv "go.opentelemetry.io/otel/semconv/v1.4.0"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
)
var tracer = otel.Tracer("yourapp")
// Initialize sets up the OpenTelemetry tracer
func Initialize(serviceName, environment, version string, collectorEndpoint string) func() {
// Create OTLP exporter
ctx := context.Background()
conn, err := grpc.DialContext(ctx, collectorEndpoint,
grpc.WithTransportCredentials(insecure.NewCredentials()),
grpc.WithBlock(),
)
if err != nil {
log.Fatalf("Failed to create gRPC connection to collector: %v", err)
}
traceExporter, err := otlptracegrpc.New(ctx, otlptracegrpc.WithGRPCConn(conn))
if err != nil {
log.Fatalf("Failed to create trace exporter: %v", err)
}
// Resource identifies your service in the trace visualization UI
res, err := resource.New(ctx,
resource.WithAttributes(
semconv.ServiceNameKey.String(serviceName),
semconv.ServiceVersionKey.String(version),
semconv.DeploymentEnvironmentKey.String(environment),
),
)
if err != nil {
log.Fatalf("Failed to create resource: %v", err)
}
// Create trace provider
bsp := sdktrace.NewBatchSpanProcessor(traceExporter)
tracerProvider := sdktrace.NewTracerProvider(
sdktrace.WithSampler(sdktrace.AlwaysSample()),
sdktrace.WithResource(res),
sdktrace.WithSpanProcessor(bsp),
)
otel.SetTracerProvider(tracerProvider)
// Set global propagator
otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
propagation.TraceContext{},
propagation.Baggage{},
))
return func() {
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
if err := tracerProvider.Shutdown(ctx); err != nil {
log.Fatalf("Failed to shutdown TracerProvider: %v", err)
}
}
}
// Tracer returns the application tracer
func Tracer() otel.Tracer {
return tracer
}
Instrumenting HTTP Handlers
Use the OpenTelemetry HTTP middleware to automatically create spans for each request:
package main
import (
"net/http"
"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
"yourpackage/logger"
"yourpackage/metrics"
"yourpackage/tracing"
)
func main() {
logger.Initialize("development")
defer logger.Sync()
log := logger.Get()
// Initialize tracing
shutdown := tracing.Initialize(
"user-service",
"development",
"1.0.0",
"otel-collector:4317",
)
defer shutdown()
// Expose metrics endpoint
http.Handle("/metrics", promhttp.Handler())
// Register your application routes with OpenTelemetry instrumentation
userHandler := http.HandlerFunc(handleUsers)
http.Handle("/api/users", otelhttp.NewHandler(
instrumentHandler(userHandler),
"users-endpoint",
otelhttp.WithMessageEvents(otelhttp.ReadEvents, otelhttp.WriteEvents),
))
log.Info("Starting server on :8080")
if err := http.ListenAndServe(":8080", nil); err != nil {
log.Error("Server failed", log.Error(err))
}
}
Manual Span Creation
For more granular control, create spans manually:
func processRequest(ctx context.Context, req *Request) (*Response, error) {
ctx, span := tracing.Tracer().Start(ctx, "processRequest")
defer span.End()
// Add attributes to span
span.SetAttributes(
attribute.String("user.id", req.UserID),
attribute.String("request.type", req.Type),
)
// Create a child span for database operation
dbCtx, dbSpan := tracing.Tracer().Start(ctx, "database.operation")
result, err := queryDatabase(dbCtx, req.Query)
if err != nil {
dbSpan.RecordError(err)
dbSpan.SetStatus(codes.Error, err.Error())
}
dbSpan.End()
if err != nil {
span.RecordError(err)
span.SetStatus(codes.Error, err.Error())
return nil, err
}
// Process the result
response := processResult(result)
return response, nil
}
Propagating Context in gRPC Calls
For gRPC clients, use OpenTelemetry interceptors:
import (
"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
"google.golang.org/grpc"
)
func newGrpcClient(target string) (*grpc.ClientConn, error) {
conn, err := grpc.Dial(
target,
grpc.WithTransportCredentials(insecure.NewCredentials()),
grpc.WithUnaryInterceptor(otelgrpc.UnaryClientInterceptor()),
grpc.WithStreamInterceptor(otelgrpc.StreamClientInterceptor()),
)
return conn, err
}
For gRPC servers:
func newGrpcServer() *grpc.Server {
return grpc.NewServer(
grpc.UnaryInterceptor(otelgrpc.UnaryServerInterceptor()),
grpc.StreamInterceptor(otelgrpc.StreamServerInterceptor()),
)
}
Integrating All Three Pillars
The most effective observability strategy integrates logging, metrics, and tracing. Here’s how to tie them together:
Connecting Logs to Traces
Add trace and span IDs to your logs:
package logger
import (
"context"
"go.opentelemetry.io/otel/trace"
"go.uber.org/zap"
)
// WithContext returns a logger with trace context information
func WithContext(ctx context.Context, baseLogger *zap.Logger) *zap.Logger {
spanContext := trace.SpanContextFromContext(ctx)
if !spanContext.IsValid() {
return baseLogger
}
return baseLogger.With(
zap.String("trace_id", spanContext.TraceID().String()),
zap.String("span_id", spanContext.SpanID().String()),
)
}
Usage:
func handleRequest(w http.ResponseWriter, r *http.Request) {
ctx := r.Context()
logger := logger.WithContext(ctx, logger.Get())
logger.Info("Processing request",
zap.String("method", r.Method),
zap.String("path", r.URL.Path),
)
// Process request...
}
Adding Metric Labels to Traces
Enrich your traces with the same labels used in metrics:
func processOrder(ctx context.Context, orderID string) error {
ctx, span := tracing.Tracer().Start(ctx, "process_order")
defer span.End()
span.SetAttributes(
attribute.String("order.id", orderID),
attribute.String("service", "order-processor"),
)
// ... process the order
return nil
}
Middleware that Ties Everything Together
Create a middleware that handles logs, metrics, and traces:
func observabilityMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
ctx := r.Context()
// Get trace information
spanContext := trace.SpanContextFromContext(ctx)
traceID := spanContext.TraceID().String()
// Start timer for metrics
start := time.Now()
metrics.ActiveRequests.Inc()
// Create a response writer wrapper to capture the status code
sw := statusWriter{ResponseWriter: w, status: http.StatusOK}
// Get logger with trace context
log := logger.WithContext(ctx, logger.Get())
// Log the incoming request
log.Info("Request started",
zap.String("method", r.Method),
zap.String("path", r.URL.Path),
zap.String("remote_addr", r.RemoteAddr),
zap.String("user_agent", r.UserAgent()),
)
// Call the actual handler
next.ServeHTTP(&sw, r)
// Calculate duration
duration := time.Since(start).Seconds()
// Record metrics
statusStr := http.StatusText(sw.status)
metrics.RequestDuration.WithLabelValues(r.Method, r.URL.Path, statusStr).Observe(duration)
metrics.RequestCounter.WithLabelValues(r.Method, r.URL.Path, statusStr).Inc()
metrics.ActiveRequests.Dec()
// Log the completed request
logLevel := zap.InfoLevel
if sw.status >= 400 {
logLevel = zap.ErrorLevel
}
log.Log(logLevel, "Request completed",
zap.Int("status", sw.status),
zap.String("status_text", statusStr),
zap.Float64("duration_seconds", duration),
)
})
}
HTTP Client with Observability
Create an HTTP client that propagates traces and records metrics:
func newHTTPClient() *http.Client {
return &http.Client{
Transport: otelhttp.NewTransport(
http.DefaultTransport,
otelhttp.WithSpanNameFormatter(func(operation string, r *http.Request) string {
return "HTTP " + r.Method + " " + r.URL.Path
}),
),
Timeout: 30 * time.Second,
}
}
func callExternalService(ctx context.Context, url string) ([]byte, error) {
req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
if err != nil {
return nil, err
}
start := time.Now()
resp, err := newHTTPClient().Do(req)
duration := time.Since(start).Seconds()
// Record metrics
status := "error"
if err == nil {
status = strconv.Itoa(resp.StatusCode)
}
metrics.ExternalServiceCalls.WithLabelValues("GET", url, status).Inc()
metrics.ExternalServiceDuration.WithLabelValues("GET", url).Observe(duration)
if err != nil {
return nil, err
}
defer resp.Body.Close()
return io.ReadAll(resp.Body)
}
Setting Up an Observability Stack
To collect and visualize your observability data, set up:
- Collector: OpenTelemetry Collector to receive, process, and export telemetry data
- Storage: Prometheus for metrics, Jaeger for traces, and Elasticsearch for logs
- Visualization: Grafana for dashboards
Sample Docker Compose Configuration
version: '3'
services:
# OpenTelemetry Collector
otel-collector:
image: otel/opentelemetry-collector-contrib:latest
command: ["--config=/etc/otel-collector-config.yaml"]
volumes:
- ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
ports:
- "4317:4317" # OTLP gRPC
- "4318:4318" # OTLP HTTP
- "8888:8888" # Metrics
depends_on:
- jaeger
- prometheus
# Jaeger
jaeger:
image: jaegertracing/all-in-one:latest
ports:
- "16686:16686" # UI
- "14250:14250" # Model
# Prometheus
prometheus:
image: prom/prometheus:latest
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
ports:
- "9090:9090"
# Grafana
grafana:
image: grafana/grafana:latest
volumes:
- ./grafana/provisioning:/etc/grafana/provisioning
ports:
- "3000:3000"
depends_on:
- prometheus
- elasticsearch
# Elasticsearch
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:7.16.2
environment:
- discovery.type=single-node
ports:
- "9200:9200"
# Kibana
kibana:
image: docker.elastic.co/kibana/kibana:7.16.2
ports:
- "5601:5601"
depends_on:
- elasticsearch
Best Practices for Go Observability
1. Consistent Naming Conventions
Use consistent naming for services, endpoints, and metrics:
- Service names: lowercase with dashes (e.g.,
user-service) - Metrics: snake_case with prefix (e.g.,
http_requests_total) - Spans: lowercase with underscores for operations (e.g.,
database_query)
2. Use Correlation IDs
Ensure every request has a unique ID that flows through all services:
func generateCorrelationID(r *http.Request) string {
id := r.Header.Get("X-Correlation-ID")
if id == "" {
id = uuid.New().String()
}
return id
}
func correlationMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
correlationID := generateCorrelationID(r)
ctx := context.WithValue(r.Context(), "correlation_id", correlationID)
w.Header().Set("X-Correlation-ID", correlationID)
next.ServeHTTP(w, r.WithContext(ctx))
})
}
3. Monitor System Resources
Include system-level metrics like CPU, memory, and disk usage:
var (
// System metrics
cpuUsage = promauto.NewGauge(prometheus.GaugeOpts{
Name: "system_cpu_usage",
Help: "Current CPU usage percentage",
})
memoryUsage = promauto.NewGauge(prometheus.GaugeOpts{
Name: "system_memory_bytes",
Help: "Current memory usage in bytes",
})
)
func recordSystemMetrics() {
go func() {
ticker := time.NewTicker(15 * time.Second)
defer ticker.Stop()
for range ticker.C {
var m runtime.MemStats
runtime.ReadMemStats(&m)
memoryUsage.Set(float64(m.Alloc))
// For full CPU metrics, consider using a library like gopsutil
}
}()
}
4. Track Custom Business Metrics
Don’t limit yourself to technical metrics; include business-level metrics:
var (
ordersProcessed = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "business_orders_processed_total",
Help: "Total number of orders processed",
},
[]string{"status", "payment_method"},
)
revenueGenerated = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "business_revenue_cents_total",
Help: "Total revenue in cents",
},
[]string{"product_category"},
)
)
func processOrder(order *Order) error {
// Process order...
// Record business metrics
ordersProcessed.WithLabelValues(order.Status, order.PaymentMethod).Inc()
revenueGenerated.WithLabelValues(order.ProductCategory).Add(float64(order.AmountCents))
return nil
}
5. Use Sampling Strategies for Production
For high-volume production systems, use sampling to reduce overhead:
// Create sampler based on environment
var sampler sdktrace.Sampler
if environment == "production" {
// Sample 10% of traces in production
sampler = sdktrace.ParentBased(
sdktrace.TraceIDRatioBased(0.1),
)
} else {
// Sample all traces in non-production
sampler = sdktrace.AlwaysSample()
}
// Use the sampler in trace provider
tracerProvider := sdktrace.NewTracerProvider(
sdktrace.WithSampler(sampler),
sdktrace.WithResource(res),
sdktrace.WithSpanProcessor(bsp),
)
Conclusion
Implementing comprehensive observability in Go applications requires careful integration of logging, metrics, and tracing. By using structured logging with Zap, metrics collection with Prometheus, and distributed tracing with OpenTelemetry, you can build a robust observability system that helps you understand your application’s behavior, troubleshoot issues, and optimize performance.
Remember that observability is not a one-time setup but an ongoing process. Continuously refine what you measure and how you visualize it based on your application’s evolving needs and the issues you encounter.
Start simple, focus on the most critical parts of your application, and gradually expand your observability coverage. The investment in good observability practices pays off many times over in reduced debugging time and improved system reliability.