Advanced Go Profiling and Tracing: Pinpointing Performance Issues in Production // Support Tools

When your Go microservices are running smoothly in development but struggling in production, you need advanced profiling and tracing techniques that work without impacting your users. This comprehensive guide shows you how to implement low-overhead continuous profiling and distributed tracing to identify performance bottlenecks in your Go applications.

Introduction

Performance issues in production often differ dramatically from what you see in development environments. In our production systems, we’ve repeatedly found that theoretical optimizations don’t always match real-world needs. The solution? Implementing robust profiling and tracing systems that can run continuously in production with minimal overhead.

This article builds on our previous exploration of optimizing Go data layers by focusing on how to identify exactly what needs optimization in the first place.

Section 1: Continuous Profiling with pprof

The standard Go profiler (pprof) is powerful but traditionally challenging to use in production due to overhead concerns. Let’s implement a low-impact continuous profiling system that automatically captures profiles during performance anomalies.

Setting Up Continuous Profiling

First, let’s add a basic pprof HTTP endpoint to your application:

import (
    "net/http"
    _ "net/http/pprof" // Import for side effects
    "log"
)

func main() {
    // Your regular application code
    
    // Start pprof server on a separate port (not exposed externally)
    go func() {
        log.Println("Starting pprof server on :6060")
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
    
    // Your application continues...
}

While this provides on-demand profiling, we need something more sophisticated for production. Let’s create an adaptive profiler that throttles itself based on system load:

package profiler

import (
    "os"
    "runtime"
    "runtime/pprof"
    "time"
    "sync"
    "fmt"
    "context"
)

type AdaptiveProfiler struct {
    // Configuration
    profileDir      string
    cpuThreshold    float64  // CPU threshold to trigger profiling (0-1)
    memThreshold    float64  // Memory threshold (0-1)
    minInterval     time.Duration
    profileDuration time.Duration
    
    // State
    lastProfile     time.Time
    mutex           sync.Mutex
    isRunning       bool
}

func NewAdaptiveProfiler(profileDir string) *AdaptiveProfiler {
    return &AdaptiveProfiler{
        profileDir:      profileDir,
        cpuThreshold:    0.70,  // Start profiling at 70% CPU
        memThreshold:    0.80,  // Start profiling at 80% memory
        minInterval:     10 * time.Minute,
        profileDuration: 30 * time.Second,
        lastProfile:     time.Time{},
    }
}

func (p *AdaptiveProfiler) Start(ctx context.Context) {
    go p.monitor(ctx)
}

func (p *AdaptiveProfiler) monitor(ctx context.Context) {
    ticker := time.NewTicker(15 * time.Second)
    defer ticker.Stop()
    
    for {
        select {
        case <-ticker.C:
            p.checkAndProfile()
        case <-ctx.Done():
            return
        }
    }
}

func (p *AdaptiveProfiler) checkAndProfile() {
    p.mutex.Lock()
    defer p.mutex.Unlock()
    
    if p.isRunning {
        return
    }
    
    // Check if we've profiled recently
    if time.Since(p.lastProfile) < p.minInterval {
        return
    }
    
    // Check system load
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    
    memUsage := float64(m.Alloc) / float64(m.Sys)
    
    // Use GOMAXPROCS as an approximation for available CPU
    cpus := runtime.GOMAXPROCS(0)
    var cpuUsage float64
    // Here you would calculate CPU usage
    // For a real implementation, you'd use something like:
    // - Read /proc/stat on Linux
    // - Use syscall.GetProcessTimes on Windows
    // For simplicity, we'll assume a function exists
    cpuUsage = getCPUUsage(cpus)
    
    // If thresholds are exceeded, profile
    if cpuUsage > p.cpuThreshold || memUsage > p.memThreshold {
        p.isRunning = true
        go p.captureProfiles()
    }
}

func getCPUUsage(cpus int) float64 {
    // Implementation depends on OS
    // Simple implementation that would need to be replaced
    return 0.5 // Placeholder
}

func (p *AdaptiveProfiler) captureProfiles() {
    timestamp := time.Now().Format("20060102-150405")
    
    // Capture CPU profile
    cpuFile, err := os.Create(fmt.Sprintf("%s/cpu-%s.pprof", p.profileDir, timestamp))
    if err != nil {
        // Log error and continue
        fmt.Printf("Error creating CPU profile: %v\n", err)
    } else {
        runtime.GC() // Run GC before profiling
        pprof.StartCPUProfile(cpuFile)
        time.Sleep(p.profileDuration) // Profile for N seconds
        pprof.StopCPUProfile()
        cpuFile.Close()
    }
    
    // Capture memory profile
    memFile, err := os.Create(fmt.Sprintf("%s/mem-%s.pprof", p.profileDir, timestamp))
    if err != nil {
        fmt.Printf("Error creating memory profile: %v\n", err)
    } else {
        runtime.GC() // Run GC before profiling
        if err := pprof.WriteHeapProfile(memFile); err != nil {
            fmt.Printf("Error writing memory profile: %v\n", err)
        }
        memFile.Close()
    }
    
    // Capture goroutine profile
    goroutineFile, err := os.Create(fmt.Sprintf("%s/goroutine-%s.pprof", p.profileDir, timestamp))
    if err != nil {
        fmt.Printf("Error creating goroutine profile: %v\n", err)
    } else {
        p := pprof.Lookup("goroutine")
        if p != nil {
            p.WriteTo(goroutineFile, 0)
        }
        goroutineFile.Close()
    }
    
    // Mark profiling complete
    p.mutex.Lock()
    p.lastProfile = time.Now()
    p.isRunning = false
    p.mutex.Unlock()
}

Use this adaptive profiler in your application:

func main() {
    // Create a profile directory
    profileDir := "/var/log/myapp/profiles"
    os.MkdirAll(profileDir, 0755)
    
    // Create and start the adaptive profiler
    profiler := NewAdaptiveProfiler(profileDir)
    profiler.Start(context.Background())
    
    // Rest of your application...
}

Analyzing Captured Profiles

To analyze the captured profiles, you can use the standard Go tools:

go tool pprof -http=:8080 /var/log/myapp/profiles/cpu-20250624-120000.pprof

For regular analysis, implement an automation script that processes profiles and generates reports:

package main

import (
    "fmt"
    "os"
    "os/exec"
    "path/filepath"
    "strings"
    "time"
)

func main() {
    profileDir := "/var/log/myapp/profiles"
    reportDir := "/var/log/myapp/reports"
    
    // Create report directory
    os.MkdirAll(reportDir, 0755)
    
    // Find all CPU profiles from the last 24 hours
    yesterday := time.Now().Add(-24 * time.Hour)
    
    err := filepath.Walk(profileDir, func(path string, info os.FileInfo, err error) error {
        if err != nil {
            return err
        }
        
        // Only process CPU profiles
        if !strings.HasPrefix(filepath.Base(path), "cpu-") {
            return nil
        }
        
        // Check if the file is recent enough
        if info.ModTime().Before(yesterday) {
            return nil
        }
        
        // Generate SVG for this profile
        svgPath := filepath.Join(reportDir, strings.TrimSuffix(filepath.Base(path), ".pprof")+".svg")
        cmd := exec.Command("go", "tool", "pprof", "-svg", path)
        svg, err := cmd.Output()
        if err != nil {
            fmt.Printf("Error generating SVG for %s: %v\n", path, err)
            return nil
        }
        
        // Write SVG to file
        err = os.WriteFile(svgPath, svg, 0644)
        if err != nil {
            fmt.Printf("Error writing SVG to %s: %v\n", svgPath, err)
        }
        
        // Also generate a text report
        txtPath := filepath.Join(reportDir, strings.TrimSuffix(filepath.Base(path), ".pprof")+".txt")
        cmd = exec.Command("go", "tool", "pprof", "-top", path)
        txt, err := cmd.Output()
        if err != nil {
            fmt.Printf("Error generating text report for %s: %v\n", path, err)
            return nil
        }
        
        // Write text report to file
        err = os.WriteFile(txtPath, txt, 0644)
        if err != nil {
            fmt.Printf("Error writing text report to %s: %v\n", txtPath, err)
        }
        
        return nil
    })
    
    if err != nil {
        fmt.Printf("Error walking profile directory: %v\n", err)
    }
}

Section 2: Distributed Tracing with OpenTelemetry

Profiling helps identify inefficient code, but distributed tracing shows you the bigger picture of your application’s behavior across services. OpenTelemetry is the industry standard for distributed tracing and observability.

Setting Up OpenTelemetry in Go

First, add the required dependencies:

import (
    "context"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    "go.opentelemetry.io/otel/propagation"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.4.0"
    "go.opentelemetry.io/otel/trace"
    "google.golang.org/grpc"
    "time"
)

Now, initialize the OpenTelemetry trace provider:

func initTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
    // Create the OTLP exporter
    conn, err := grpc.DialContext(ctx, "otel-collector:4317", grpc.WithInsecure())
    if err != nil {
        return nil, err
    }
    
    exporter, err := otlptracegrpc.New(ctx, otlptracegrpc.WithGRPCConn(conn))
    if err != nil {
        return nil, err
    }
    
    // Create a resource describing this service
    res, err := resource.New(ctx,
        resource.WithAttributes(
            semconv.ServiceNameKey.String("user-service"),
            semconv.ServiceVersionKey.String("v1.0.0"),
            attribute.String("environment", "production"),
        ),
    )
    if err != nil {
        return nil, err
    }
    
    // Create the trace provider with a batch span processor
    bsp := sdktrace.NewBatchSpanProcessor(exporter)
    tracerProvider := sdktrace.NewTracerProvider(
        sdktrace.WithResource(res),
        sdktrace.WithSpanProcessor(bsp),
        // Sample 10% of traces in production for efficient resource usage
        sdktrace.WithSampler(sdktrace.TraceIDRatioBased(0.1)),
    )
    
    // Set the global trace provider
    otel.SetTracerProvider(tracerProvider)
    
    // Set the global propagator
    otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
        propagation.TraceContext{},
        propagation.Baggage{},
    ))
    
    return tracerProvider, nil
}

Adding Tracing to Your Data Layer

Now that we have the tracing infrastructure set up, let’s add tracing to our database calls:

type TracedRepository struct {
    db     *sql.DB
    tracer trace.Tracer
}

func NewTracedRepository(db *sql.DB) *TracedRepository {
    return &TracedRepository{
        db:     db,
        tracer: otel.Tracer("data-repository"),
    }
}

func (r *TracedRepository) GetProductByID(ctx context.Context, id int) (*Product, error) {
    ctx, span := r.tracer.Start(ctx, "GetProductByID", 
        trace.WithAttributes(attribute.Int("product.id", id)))
    defer span.End()
    
    // Capture timing for the DB call
    startTime := time.Now()
    
    rows, err := r.db.QueryContext(ctx, "SELECT id, name, price FROM products WHERE id = ?", id)
    if err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return nil, err
    }
    defer rows.Close()
    
    // Record DB call duration
    span.SetAttributes(attribute.Float64("db.duration_ms", float64(time.Since(startTime).Milliseconds())))
    
    if !rows.Next() {
        span.SetAttributes(attribute.Bool("product.found", false))
        return nil, ErrProductNotFound
    }
    
    var product Product
    if err := rows.Scan(&product.ID, &product.Name, &product.Price); err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return nil, err
    }
    
    span.SetAttributes(attribute.Bool("product.found", true))
    
    // Get related data (potentially causing N+1 problem)
    if err := r.loadProductReviews(ctx, &product); err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return nil, err
    }
    
    return &product, nil
}

func (r *TracedRepository) loadProductReviews(ctx context.Context, product *Product) error {
    ctx, span := r.tracer.Start(ctx, "loadProductReviews",
        trace.WithAttributes(attribute.Int("product.id", product.ID)))
    defer span.End()
    
    // Example of a database call that might be inefficient
    rows, err := r.db.QueryContext(ctx, 
        "SELECT id, rating, comment FROM reviews WHERE product_id = ?", 
        product.ID)
    if err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return err
    }
    defer rows.Close()
    
    var reviews []Review
    for rows.Next() {
        var review Review
        if err := rows.Scan(&review.ID, &review.Rating, &review.Comment); err != nil {
            span.RecordError(err)
            span.SetStatus(codes.Error, err.Error())
            return err
        }
        reviews = append(reviews, review)
    }
    
    product.Reviews = reviews
    span.SetAttributes(attribute.Int("reviews.count", len(reviews)))
    
    return nil
}

Identifying Problems with Traces

The power of distributed tracing becomes evident when you have multiple services. Let’s look at a complete HTTP handler with tracing:

func (h *Handler) GetProductHandler(w http.ResponseWriter, r *http.Request) {
    ctx := r.Context()
    
    // Extract trace context from incoming request
    ctx = otel.GetTextMapPropagator().Extract(ctx, propagation.HeaderCarrier(r.Header))
    
    // Create a new span for this handler
    ctx, span := h.tracer.Start(ctx, "GetProductHandler")
    defer span.End()
    
    // Extract product ID from request
    productID, err := strconv.Atoi(chi.URLParam(r, "id"))
    if err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, "Invalid product ID")
        http.Error(w, "Invalid product ID", http.StatusBadRequest)
        return
    }
    
    span.SetAttributes(attribute.Int("product.id", productID))
    
    // Get product from repository
    product, err := h.repo.GetProductByID(ctx, productID)
    if err != nil {
        span.RecordError(err)
        
        if err == ErrProductNotFound {
            span.SetStatus(codes.Error, "Product not found")
            http.Error(w, "Product not found", http.StatusNotFound)
            return
        }
        
        span.SetStatus(codes.Error, "Failed to get product")
        http.Error(w, "Internal server error", http.StatusInternalServerError)
        return
    }
    
    // Get inventory from separate service
    inventory, err := h.getInventory(ctx, productID)
    if err != nil {
        span.RecordError(err)
        // Continue processing, inventory is not critical
        span.SetAttributes(attribute.Bool("inventory.available", false))
    } else {
        span.SetAttributes(attribute.Bool("inventory.available", true))
        span.SetAttributes(attribute.Int("inventory.quantity", inventory.Quantity))
        product.AvailableQuantity = inventory.Quantity
    }
    
    // Get pricing information (potentially from another service)
    pricing, err := h.getPricing(ctx, productID)
    if err != nil {
        span.RecordError(err)
        // Continue with default pricing
        span.SetAttributes(attribute.Bool("pricing.available", false))
    } else {
        span.SetAttributes(attribute.Bool("pricing.available", true))
        span.SetAttributes(attribute.Float64("pricing.discount", pricing.DiscountPercent))
        product.Price = pricing.CurrentPrice
        product.DiscountPercent = pricing.DiscountPercent
    }
    
    // Respond with JSON
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(product)
}

func (h *Handler) getInventory(ctx context.Context, productID int) (*Inventory, error) {
    ctx, span := h.tracer.Start(ctx, "getInventory")
    defer span.End()
    
    span.SetAttributes(attribute.Int("product.id", productID))
    
    url := fmt.Sprintf("http://inventory-service/products/%d/inventory", productID)
    
    // Create HTTP request with trace context
    req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
    if err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return nil, err
    }
    
    // Inject trace context into the outgoing HTTP request
    otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header))
    
    // Make HTTP request with timeout
    client := &http.Client{Timeout: 500 * time.Millisecond}
    resp, err := client.Do(req)
    if err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return nil, err
    }
    defer resp.Body.Close()
    
    if resp.StatusCode != http.StatusOK {
        err := fmt.Errorf("inventory service returned status %d", resp.StatusCode)
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return nil, err
    }
    
    var inventory Inventory
    if err := json.NewDecoder(resp.Body).Decode(&inventory); err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return nil, err
    }
    
    return &inventory, nil
}

With this tracing in place, you can visualize the complete request flow through your system, identify bottlenecks in service calls, database queries, and third-party integrations.

Section 3: Correlating Profiles with Traces

The real power comes when combining profiling with tracing. Here’s how to correlate them:

Add profile metadata to trace a specific endpoint:

func (h *Handler) GetProductHandler(w http.ResponseWriter, r *http.Request) {
    ctx := r.Context()
    
    // Start the span as before
    ctx, span := h.tracer.Start(ctx, "GetProductHandler")
    defer span.End()
    
    // Take a CPU profile sample if this is a traced request
    spanContext := span.SpanContext()
    if spanContext.IsSampled() {
        // Capture a 5-second profile
        profileFile := fmt.Sprintf("/var/log/myapp/profiles/endpoint-cpu-%s.pprof", 
            spanContext.TraceID().String())
            
        f, err := os.Create(profileFile)
        if err == nil {
            span.SetAttributes(attribute.String("cpu_profile", profileFile))
            pprof.StartCPUProfile(f)
            defer func() {
                pprof.StopCPUProfile()
                f.Close()
            }()
        }
    }
    
    // Rest of the handler as before
    // ...
}

Build a simple dashboard to correlate the two:

package main

import (
    "encoding/json"
    "fmt"
    "html/template"
    "net/http"
    "os"
    "path/filepath"
    "strings"
)

type Trace struct {
    ID        string `json:"id"`
    Name      string `json:"name"`
    Timestamp string `json:"timestamp"`
    Duration  int64  `json:"duration_ms"`
    CPUProfile string `json:"cpu_profile,omitempty"`
}

func main() {
    http.HandleFunc("/", handleRoot)
    http.HandleFunc("/traces", handleTraces)
    http.HandleFunc("/profiles/", handleProfiles)
    
    fmt.Println("Starting performance dashboard on :8080")
    http.ListenAndServe(":8080", nil)
}

func handleRoot(w http.ResponseWriter, r *http.Request) {
    tmpl := template.Must(template.New("index").Parse(`
        <!DOCTYPE html>
        <html>
        <head>
            <title>Performance Dashboard</title>
            <style>
                body { font-family: Arial, sans-serif; margin: 20px; }
                table { border-collapse: collapse; width: 100%; }
                th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
                tr:nth-child(even) { background-color: #f2f2f2; }
                th { background-color: #4CAF50; color: white; }
            </style>
            <script>
                async function loadTraces() {
                    const resp = await fetch('/traces');
                    const traces = await resp.json();
                    const tbody = document.getElementById('traces-body');
                    tbody.innerHTML = '';
                    
                    for (const trace of traces) {
                        const row = document.createElement('tr');
                        
                        const idCell = document.createElement('td');
                        idCell.textContent = trace.id;
                        row.appendChild(idCell);
                        
                        const nameCell = document.createElement('td');
                        nameCell.textContent = trace.name;
                        row.appendChild(nameCell);
                        
                        const timeCell = document.createElement('td');
                        timeCell.textContent = trace.timestamp;
                        row.appendChild(timeCell);
                        
                        const durationCell = document.createElement('td');
                        durationCell.textContent = trace.duration_ms + 'ms';
                        row.appendChild(durationCell);
                        
                        const profileCell = document.createElement('td');
                        if (trace.cpu_profile) {
                            const link = document.createElement('a');
                            link.href = '/profiles/' + trace.cpu_profile;
                            link.textContent = 'View CPU Profile';
                            link.target = '_blank';
                            profileCell.appendChild(link);
                        } else {
                            profileCell.textContent = 'N/A';
                        }
                        row.appendChild(profileCell);
                        
                        tbody.appendChild(row);
                    }
                }
                
                window.onload = loadTraces;
            </script>
        </head>
        <body>
            <h1>Performance Dashboard</h1>
            <h2>Recent Traces with Profiles</h2>
            <table>
                <thead>
                    <tr>
                        <th>Trace ID</th>
                        <th>Name</th>
                        <th>Timestamp</th>
                        <th>Duration</th>
                        <th>CPU Profile</th>
                    </tr>
                </thead>
                <tbody id="traces-body">
                    <tr><td colspan="5">Loading...</td></tr>
                </tbody>
            </table>
        </body>
        </html>
    `))
    
    tmpl.Execute(w, nil)
}

func handleTraces(w http.ResponseWriter, r *http.Request) {
    // In a real implementation, you would query your tracing backend
    // For this example, we'll simulate some traces with profile links
    
    traces := []Trace{
        {
            ID:         "d6e73400d48cfb94e716cd8b1c31f6d6",
            Name:       "GetProductHandler",
            Timestamp:  "2025-06-23T15:32:12Z",
            Duration:   327,
            CPUProfile: "endpoint-cpu-d6e73400d48cfb94e716cd8b1c31f6d6.pprof",
        },
        {
            ID:         "b39a2e5a7b1dcf4e8736a9b1c0a45e2d",
            Name:       "GetProductHandler",
            Timestamp:  "2025-06-23T15:30:45Z",
            Duration:   542,
            CPUProfile: "endpoint-cpu-b39a2e5a7b1dcf4e8736a9b1c0a45e2d.pprof",
        },
        {
            ID:         "f8e9a1b2c3d4e5f6a7b8c9d0e1f2a3b4",
            Name:       "SearchProductsHandler",
            Timestamp:  "2025-06-23T15:28:33Z",
            Duration:   892,
            CPUProfile: "endpoint-cpu-f8e9a1b2c3d4e5f6a7b8c9d0e1f2a3b4.pprof",
        },
    }
    
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(traces)
}

func handleProfiles(w http.ResponseWriter, r *http.Request) {
    profilePath := strings.TrimPrefix(r.URL.Path, "/profiles/")
    
    // Security check - ensure the path doesn't contain ".."
    if strings.Contains(profilePath, "..") {
        http.Error(w, "Invalid path", http.StatusBadRequest)
        return
    }
    
    // Path to profiles directory
    profileDir := "/var/log/myapp/profiles"
    fullPath := filepath.Join(profileDir, profilePath)
    
    // Generate an SVG visualization
    w.Header().Set("Content-Type", "image/svg+xml")
    
    // In a real implementation, you'd use the pprof tool to generate an SVG
    // Here we're simulating the output
    svg := `<?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <svg width="800" height="600" xmlns="http://www.w3.org/2000/svg">
        <rect width="800" height="600" fill="#f0f0f0"/>
        <text x="400" y="50" text-anchor="middle" font-family="Arial" font-size="24">
            CPU Profile Visualization for ` + profilePath + `
        </text>
        <text x="400" y="300" text-anchor="middle" font-family="Arial" font-size="18">
            [Actual SVG would be generated from the profile data]
        </text>
    </svg>`
    
    fmt.Fprint(w, svg)
}

Section 4: Practical Application - Finding and Fixing Real Issues

Let’s apply these techniques to identify and fix common performance problems:

1. Identifying N+1 Query Problems

The trace from our product handler might show:

GetProductHandler [328ms]
|
+-- GetProductByID [120ms]
|   |
|   +-- SQL Query: SELECT * FROM products WHERE id = ? [15ms]
|   |
|   +-- loadProductReviews [105ms]
|       |
|       +-- SQL Query: SELECT * FROM reviews WHERE product_id = ? [105ms]
|
+-- getInventory [80ms]
|
+-- getPricing [128ms]

Our trace clearly shows that loading product reviews takes 105ms, a significant portion of our request time. The pprof CPU profile might show that our database driver’s Scan method is consuming a lot of CPU time.

The solution is to optimize the query with a join:

func (r *TracedRepository) GetProductWithReviews(ctx context.Context, id int) (*Product, error) {
    ctx, span := r.tracer.Start(ctx, "GetProductWithReviews")
    defer span.End()
    
    // Use a JOIN query instead of separate queries
    query := `
        SELECT p.id, p.name, p.price, 
               r.id as review_id, r.rating, r.comment
        FROM products p
        LEFT JOIN reviews r ON p.id = r.product_id
        WHERE p.id = ?
    `
    
    rows, err := r.db.QueryContext(ctx, query, id)
    if err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return nil, err
    }
    defer rows.Close()
    
    var product *Product
    reviews := make(map[int]Review)
    
    for rows.Next() {
        var reviewID sql.NullInt64
        var rating sql.NullFloat64
        var comment sql.NullString
        
        if product == nil {
            product = &Product{
                Reviews: []Review{},
            }
            
            if err := rows.Scan(
                &product.ID, &product.Name, &product.Price,
                &reviewID, &rating, &comment,
            ); err != nil {
                span.RecordError(err)
                span.SetStatus(codes.Error, err.Error())
                return nil, err
            }
        } else {
            var productID int
            var name string
            var price float64
            
            if err := rows.Scan(
                &productID, &name, &price,
                &reviewID, &rating, &comment,
            ); err != nil {
                span.RecordError(err)
                span.SetStatus(codes.Error, err.Error())
                return nil, err
            }
        }
        
        // If we have a valid review, add it
        if reviewID.Valid {
            reviewIDInt := int(reviewID.Int64)
            if _, exists := reviews[reviewIDInt]; !exists {
                reviews[reviewIDInt] = Review{
                    ID:      reviewIDInt,
                    Rating:  rating.Float64,
                    Comment: comment.String,
                }
            }
        }
    }
    
    // Convert the map to a slice
    for _, review := range reviews {
        product.Reviews = append(product.Reviews, review)
    }
    
    return product, nil
}

After this change, our trace might look like:

GetProductHandler [215ms]
|
+-- GetProductWithReviews [30ms]
|   |
|   +-- SQL Query: SELECT products.*, reviews.* FROM products LEFT JOIN reviews... [30ms]
|
+-- getInventory [80ms]
|
+-- getPricing [105ms]

2. High Latency in Service Calls

Our trace also shows that the pricing service call takes 128ms, which is quite high. Looking at the code, we can see we’re not using connection pooling:

// Before:
client := &http.Client{Timeout: 500 * time.Millisecond}
resp, err := client.Do(req)

Let’s fix this by using a shared HTTP client with proper connection pooling:

// Global HTTP client with connection pooling
var httpClient = &http.Client{
    Timeout: 500 * time.Millisecond,
    Transport: &http.Transport{
        MaxIdleConns:        100,
        MaxIdleConnsPerHost: 20,
        IdleConnTimeout:     90 * time.Second,
    },
}

// In the getPricing function:
resp, err := httpClient.Do(req)

This simple change could reduce the service call time significantly.

3. Memory Allocation Issues

Analysis of the memory profile might show excessive allocations in our JSON serialization code. We can optimize this using a pre-allocated buffer:

// Before:
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(product)

// After:
w.Header().Set("Content-Type", "application/json")
var buf bytes.Buffer
buf.Grow(1024) // Pre-allocate 1KB
enc := json.NewEncoder(&buf)
if err := enc.Encode(product); err != nil {
    // Handle error
    return
}
w.Write(buf.Bytes())

Conclusion

Effective profiling and tracing are essential for maintaining high-performance Go applications in production. By combining continuous profiling with distributed tracing, you can identify exactly what needs to be optimized and measure the improvement after changes.

The techniques presented in this article allow you to:

Implement low-overhead continuous profiling that triggers automatically during performance issues
Set up distributed tracing to visualize request flow across services
Correlate trace data with profile data to get a complete picture
Use this information to fix common issues like N+1 queries, inefficient service calls, and memory allocation problems

Remember that optimization should always be data-driven. The combination of profiling and tracing provides the data you need to make informed decisions about where to focus your optimization efforts.

For more on optimizing Go applications, see our related articles on optimizing Go data layers and benchmarking Go goroutines.