Enterprise Linux Swap and Memory Management: Comprehensive Performance Optimization and Infrastructure Automation
Enterprise Linux environments require sophisticated swap and memory management strategies to ensure optimal performance, prevent out-of-memory conditions, and maintain system stability across thousands of servers running mission-critical workloads. This guide covers advanced swap configuration, enterprise memory optimization frameworks, automated performance tuning, and comprehensive monitoring solutions for production infrastructures.
Enterprise Memory Management Architecture
Comprehensive Swap Strategy Framework
Enterprise systems demand intelligent swap management that balances performance, reliability, and resource utilization while preventing catastrophic failures and maintaining predictable application behavior under varying load conditions.
Enterprise Memory Architecture Overview
┌─────────────────────────────────────────────────────────────────┐
│            Enterprise Memory Management Architecture            │
├─────────────────┬─────────────────┬─────────────────┬───────────┤
│ Physical Layer  │ Virtual Layer   │ Application     │ Monitoring│
├─────────────────┼─────────────────┼─────────────────┼───────────┤
│ ┌─────────────┐ │ ┌─────────────┐ │ ┌─────────────┐ │ ┌───────┐ │
│ │ RAM/DIMM    │ │ │ Page Tables │ │ │ Process Mem │ │ │Metrics│ │
│ │ NUMA Nodes  │ │ │ Swap Space  │ │ │ Shared Mem  │ │ │Alerts │ │
│ │ Memory Ctrl │ │ │ Page Cache  │ │ │ Heap/Stack  │ │ │Logs   │ │
│ │ ECC/Correct │ │ │ Huge Pages  │ │ │ Memory Maps │ │ │Trace  │ │
│ └─────────────┘ │ └─────────────┘ │ └─────────────┘ │ └───────┘ │
│                 │                 │                 │           │
│ • Hardware      │ • Kernel managed│ • App specific  │ • Real    │
│ • NUMA aware    │ • Transparent   │ • Controllable  │ • Time    │
│ • Error correct │ • Optimized     │ • Monitored     │ • Alert   │
└─────────────────┴─────────────────┴─────────────────┴───────────┘
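To make the virtual layer in the diagram more concrete: the kernel-managed quantities it lists (swap space, page cache, huge pages) are exposed through `/proc/meminfo`. The following is a minimal, standard-library-only sketch of reading them. The helper names `parse_meminfo` and `virtual_layer_summary` are hypothetical and are not part of the framework presented later in this guide.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; kB values become bytes."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(':')
        parts = rest.split()
        if not sep or not parts:
            continue  # skip blank or malformed lines
        value = int(parts[0])
        # Most fields carry a "kB" unit; plain counters such as HugePages_Total do not
        if len(parts) > 1 and parts[1] == 'kB':
            value *= 1024
        fields[name.strip()] = value
    return fields


def virtual_layer_summary(fields: Dict[str, int]) -> Dict[str, int]:
    """Pick out the virtual-layer items named in the architecture diagram."""
    return {
        'swap_used_bytes': fields.get('SwapTotal', 0) - fields.get('SwapFree', 0),
        'page_cache_bytes': fields.get('Cached', 0),
        'huge_pages_total': fields.get('HugePages_Total', 0),
    }
```

On a Linux host, passing the contents of `/proc/meminfo` to `parse_meminfo` and the result to `virtual_layer_summary` gives a quick view of how much swap and page cache are in use before any tuning is attempted.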
Memory Management Maturity Model
| Level | Swap Config | Monitoring | Optimization | Scale |
|---|---|---|---|---|
| Basic | Default swap | Manual checks | None | Single server |
| Managed | Sized swap | Basic alerts | Tuned params | 10s of servers |
| Advanced | Dynamic swap | Automated monitoring | Performance profiling | 100s of servers |
| Enterprise | Intelligent swap | Predictive analytics | ML-based optimization | 1000s+ servers |
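The step from "Default swap" to "Sized swap" in the table can be sketched as a RAM-tiered rule. The tiers below mirror the ones used by `_calculate_optimal_swap_size` in the framework later in this guide. The standalone function name `suggest_swap_mb` is hypothetical, and the thresholds are a starting heuristic rather than a universal recommendation.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def suggest_swap_mb(total_ram_bytes: int, hibernation: bool = False) -> int:
    """Return a suggested swap size in MB for a host with the given RAM."""
    if total_ram_bytes <= 2 * GIB:
        swap = total_ram_bytes * 2          # small hosts: 2x RAM
    elif total_ram_bytes <= 8 * GIB:
        swap = total_ram_bytes              # up to 8 GiB: 1x RAM
    elif total_ram_bytes <= 64 * GIB:
        swap = total_ram_bytes // 2         # up to 64 GiB: 0.5x RAM
    else:
        swap = min(32 * GIB, total_ram_bytes // 4)  # large hosts: 0.25x RAM, capped at 32 GiB
    if hibernation:
        # Hibernation needs room for the full RAM image plus some headroom
        swap = max(swap, total_ram_bytes + GIB)
    return swap // MIB
```

For example, a 16 GiB host gets 8192 MB of swap under this rule, or 17408 MB when hibernation is enabled. The framework version additionally scales the result by the observed swap usage pattern.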
Advanced Swap Management Framework
Enterprise Swap Configuration System
#!/usr/bin/env python3
"""
Enterprise Linux Swap and Memory Management Framework
"""
import os
import sys
import json
import yaml
import logging
import psutil
import asyncio
import subprocess
from typing import Dict, List, Optional, Tuple, Any, Union
from dataclasses import dataclass, asdict, field
from pathlib import Path
from enum import Enum
from datetime import datetime, timedelta
import numpy as np
from prometheus_client import Counter, Gauge, Histogram
import redis
import boto3
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
import warnings
warnings.filterwarnings('ignore')
class SwapType(Enum):
PARTITION = "partition"
FILE = "file"
ZRAM = "zram"
ZSWAP = "zswap"
DISTRIBUTED = "distributed"
class MemoryPressure(Enum):
LOW = "low"
MODERATE = "moderate"
HIGH = "high"
CRITICAL = "critical"
class WorkloadType(Enum):
DATABASE = "database"
WEBSERVER = "webserver"
COMPUTE = "compute"
CONTAINER = "container"
MIXED = "mixed"
@dataclass
class SwapConfiguration:
"""Swap configuration parameters"""
swap_type: SwapType
size_mb: int
priority: int = -1
device: Optional[str] = None
file_path: Optional[str] = None
compression_algo: Optional[str] = None
max_pool_percent: Optional[int] = None
swappiness: int = 60
vfs_cache_pressure: int = 100
min_free_kbytes: Optional[int] = None
watermark_scale_factor: int = 10
oom_kill_allocating_task: bool = False
metadata: Dict[str, Any] = field(default_factory=dict)
@dataclass
class MemoryMetrics:
"""System memory metrics"""
timestamp: datetime
total_memory: int
available_memory: int
used_memory: int
free_memory: int
cached_memory: int
buffer_memory: int
swap_total: int
swap_used: int
swap_free: int
swap_in_rate: float
swap_out_rate: float
page_fault_rate: float
memory_pressure: MemoryPressure
numa_stats: Dict[int, Dict[str, int]] = field(default_factory=dict)
process_metrics: List[Dict[str, Any]] = field(default_factory=list)
@dataclass
class PerformanceProfile:
"""System performance profile"""
workload_type: WorkloadType
avg_memory_usage: float
peak_memory_usage: float
memory_volatility: float
swap_usage_pattern: str
recommended_swap_size: int
recommended_swappiness: int
optimization_params: Dict[str, Any] = field(default_factory=dict)
class EnterpriseSwapManager:
"""Enterprise swap and memory management system"""
def __init__(self, config_path: str):
self.config = self._load_config(config_path)
self.logger = self._setup_logging()
self.redis_client = self._init_redis()
self.metrics_history: List[MemoryMetrics] = []
self.performance_model = None
# Metrics
self.swap_operations = Counter('swap_operations_total',
'Total swap operations',
['operation', 'status'])
self.memory_usage_bytes = Gauge('memory_usage_bytes',
'Memory usage in bytes',
['type'])
self.swap_usage_bytes = Gauge('swap_usage_bytes',
'Swap usage in bytes',
['device'])
self.memory_pressure_score = Gauge('memory_pressure_score',
'Memory pressure score (0-100)')
self.swap_io_rate = Gauge('swap_io_rate_bytes_per_second',
'Swap I/O rate',
['direction'])
def _load_config(self, config_path: str) -> Dict[str, Any]:
"""Load configuration from file"""
with open(config_path, 'r') as f:
return yaml.safe_load(f)
    def _setup_logging(self) -> logging.Logger:
        """Setup enterprise logging"""
        from logging.handlers import RotatingFileHandler, SysLogHandler
        logger = logging.getLogger(__name__)
        # Let everything through at the logger; each handler applies its own threshold
        logger.setLevel(logging.DEBUG)
        if logger.handlers:
            # Already configured (e.g. a second manager instance); avoid duplicate log lines
            return logger
        # Console handler
        console_handler = logging.StreamHandler()
        console_handler.setLevel(logging.INFO)
        # File handler with rotation; the directory must exist before the handler opens the file
        log_dir = '/var/log/swap-manager'
        os.makedirs(log_dir, exist_ok=True)
        file_handler = RotatingFileHandler(
            os.path.join(log_dir, 'swap-manager.log'),
            maxBytes=50*1024*1024,  # 50MB
            backupCount=10
        )
        file_handler.setLevel(logging.DEBUG)
        # Syslog handler
        syslog_handler = SysLogHandler(
            address=(self.config.get('syslog_host', 'localhost'), 514)
        )
        syslog_handler.setLevel(logging.WARNING)
        # Formatter
        formatter = logging.Formatter(
            '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
        )
        for handler in [console_handler, file_handler, syslog_handler]:
            handler.setFormatter(formatter)
            logger.addHandler(handler)
        return logger
def _init_redis(self) -> redis.Redis:
"""Initialize Redis client for caching"""
return redis.Redis(
host=self.config.get('redis_host', 'localhost'),
port=self.config.get('redis_port', 6379),
decode_responses=True
)
async def analyze_system(self) -> PerformanceProfile:
"""Analyze system to determine optimal swap configuration"""
self.logger.info("Analyzing system performance profile")
# Collect metrics over time
metrics = await self._collect_metrics_sample()
# Determine workload type
workload_type = await self._identify_workload_type(metrics)
# Calculate memory statistics
memory_stats = self._calculate_memory_statistics(metrics)
# Generate performance profile
profile = PerformanceProfile(
workload_type=workload_type,
avg_memory_usage=memory_stats['avg_usage'],
peak_memory_usage=memory_stats['peak_usage'],
memory_volatility=memory_stats['volatility'],
swap_usage_pattern=memory_stats['swap_pattern'],
recommended_swap_size=self._calculate_optimal_swap_size(memory_stats),
recommended_swappiness=self._calculate_optimal_swappiness(workload_type, memory_stats),
optimization_params=self._generate_optimization_params(workload_type, memory_stats)
)
# Store profile
self._store_performance_profile(profile)
return profile
async def _collect_metrics_sample(self, duration_minutes: int = 60) -> List[MemoryMetrics]:
"""Collect memory metrics over specified duration"""
metrics = []
interval = 10 # seconds
samples = (duration_minutes * 60) // interval
self.logger.info(f"Collecting {samples} metric samples over {duration_minutes} minutes")
for i in range(samples):
metric = await self._get_current_metrics()
metrics.append(metric)
# Update Prometheus metrics
self._update_prometheus_metrics(metric)
# Store in history
self.metrics_history.append(metric)
if len(self.metrics_history) > 10000: # Keep last 10k samples
self.metrics_history.pop(0)
await asyncio.sleep(interval)
return metrics
async def _get_current_metrics(self) -> MemoryMetrics:
"""Get current memory metrics"""
mem = psutil.virtual_memory()
swap = psutil.swap_memory()
# Get swap I/O rates
swap_stats = await self._get_swap_io_stats()
# Get NUMA statistics
numa_stats = await self._get_numa_stats()
# Get top memory consumers
process_metrics = self._get_process_memory_metrics()
# Calculate memory pressure
pressure = self._calculate_memory_pressure(mem, swap)
return MemoryMetrics(
timestamp=datetime.now(),
total_memory=mem.total,
available_memory=mem.available,
used_memory=mem.used,
free_memory=mem.free,
cached_memory=mem.cached,
buffer_memory=mem.buffers,
swap_total=swap.total,
swap_used=swap.used,
swap_free=swap.free,
swap_in_rate=swap_stats['swap_in_rate'],
swap_out_rate=swap_stats['swap_out_rate'],
page_fault_rate=swap_stats['page_fault_rate'],
memory_pressure=pressure,
numa_stats=numa_stats,
process_metrics=process_metrics
)
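    async def _get_swap_io_stats(self) -> Dict[str, float]:
        """Get swap I/O (bytes/s) and page-fault (faults/s) rates from /proc/vmstat deltas"""
        import time  # local import: only this method needs a monotonic clock
        zero = {'swap_in_rate': 0.0, 'swap_out_rate': 0.0, 'page_fault_rate': 0.0}
        try:
            with open('/proc/vmstat', 'r') as f:
                vmstat = dict(line.split() for line in f if line.strip())
            now = time.monotonic()
            # pswpin/pswpout (pages) and pgfault (events) are cumulative since boot
            current = {k: int(vmstat.get(k, 0)) for k in ('pswpin', 'pswpout', 'pgfault')}
            previous = getattr(self, '_prev_vmstat', None)
            self._prev_vmstat = (now, current)
            if previous is None:
                # First sample: no baseline yet, so no rate can be computed
                return zero
            prev_time, prev = previous
            elapsed = max(now - prev_time, 1e-6)
            page_size = os.sysconf('SC_PAGE_SIZE')
            return {
                'swap_in_rate': (current['pswpin'] - prev['pswpin']) * page_size / elapsed,
                'swap_out_rate': (current['pswpout'] - prev['pswpout']) * page_size / elapsed,
                'page_fault_rate': (current['pgfault'] - prev['pgfault']) / elapsed
            }
        except Exception as e:
            self.logger.error(f"Failed to get swap I/O stats: {e}")
            return zero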
    async def _get_numa_stats(self) -> Dict[int, Dict[str, int]]:
        """Get NUMA node memory statistics"""
        numa_stats = {}
        try:
            # Enumerate the nodes the kernel exposes; the logical/physical CPU
            # ratio measures SMT threads per core, not the number of NUMA nodes
            for node_dir in sorted(Path('/sys/devices/system/node').glob('node[0-9]*')):
                meminfo_path = node_dir / 'meminfo'
                if not meminfo_path.exists():
                    continue
                node_stats = {}
                with open(meminfo_path, 'r') as f:
                    for line in f:
                        # Lines look like: "Node 0 MemTotal:       32768 kB"
                        parts = line.split()
                        if len(parts) < 4:
                            continue
                        key = parts[2].rstrip(':')
                        if key in ('MemTotal', 'MemFree', 'MemUsed'):
                            node_stats[key.lower()] = int(parts[3]) * 1024  # kB -> bytes
                numa_stats[int(node_dir.name[len('node'):])] = node_stats
        except Exception as e:
            self.logger.debug(f"NUMA stats collection failed: {e}")
        return numa_stats
def _get_process_memory_metrics(self, top_n: int = 10) -> List[Dict[str, Any]]:
"""Get memory metrics for top N processes"""
processes = []
try:
for proc in psutil.process_iter(['pid', 'name', 'memory_info', 'memory_percent']):
try:
pinfo = proc.info
processes.append({
'pid': pinfo['pid'],
'name': pinfo['name'],
'rss': pinfo['memory_info'].rss if pinfo['memory_info'] else 0,
'vms': pinfo['memory_info'].vms if pinfo['memory_info'] else 0,
'percent': pinfo['memory_percent'] or 0
})
except (psutil.NoSuchProcess, psutil.AccessDenied):
pass
# Sort by RSS and return top N
processes.sort(key=lambda x: x['rss'], reverse=True)
return processes[:top_n]
except Exception as e:
self.logger.error(f"Failed to get process metrics: {e}")
return []
def _calculate_memory_pressure(self, mem: Any, swap: Any) -> MemoryPressure:
"""Calculate memory pressure level"""
# Calculate various pressure indicators
mem_usage_percent = (mem.used / mem.total) * 100
swap_usage_percent = (swap.used / swap.total * 100) if swap.total > 0 else 0
available_percent = (mem.available / mem.total) * 100
# Memory pressure scoring
pressure_score = 0
# Memory usage contribution
if mem_usage_percent > 95:
pressure_score += 40
elif mem_usage_percent > 90:
pressure_score += 30
elif mem_usage_percent > 80:
pressure_score += 20
elif mem_usage_percent > 70:
pressure_score += 10
# Available memory contribution
if available_percent < 5:
pressure_score += 30
elif available_percent < 10:
pressure_score += 20
elif available_percent < 20:
pressure_score += 10
# Swap usage contribution
if swap_usage_percent > 80:
pressure_score += 30
elif swap_usage_percent > 50:
pressure_score += 20
elif swap_usage_percent > 25:
pressure_score += 10
# Update metric
self.memory_pressure_score.set(pressure_score)
# Determine pressure level
if pressure_score >= 70:
return MemoryPressure.CRITICAL
elif pressure_score >= 50:
return MemoryPressure.HIGH
elif pressure_score >= 30:
return MemoryPressure.MODERATE
else:
return MemoryPressure.LOW
def _update_prometheus_metrics(self, metrics: MemoryMetrics):
"""Update Prometheus metrics"""
# Memory usage
self.memory_usage_bytes.labels(type='total').set(metrics.total_memory)
self.memory_usage_bytes.labels(type='used').set(metrics.used_memory)
self.memory_usage_bytes.labels(type='free').set(metrics.free_memory)
self.memory_usage_bytes.labels(type='available').set(metrics.available_memory)
self.memory_usage_bytes.labels(type='cached').set(metrics.cached_memory)
self.memory_usage_bytes.labels(type='buffers').set(metrics.buffer_memory)
# Swap usage
self.swap_usage_bytes.labels(device='total').set(metrics.swap_total)
self.swap_usage_bytes.labels(device='used').set(metrics.swap_used)
self.swap_usage_bytes.labels(device='free').set(metrics.swap_free)
# Swap I/O rates
self.swap_io_rate.labels(direction='in').set(metrics.swap_in_rate)
self.swap_io_rate.labels(direction='out').set(metrics.swap_out_rate)
async def _identify_workload_type(self, metrics: List[MemoryMetrics]) -> WorkloadType:
"""Identify workload type based on memory patterns"""
if not metrics:
return WorkloadType.MIXED
# Analyze memory usage patterns
memory_usage = [m.used_memory / m.total_memory for m in metrics]
swap_usage = [m.swap_used / m.swap_total if m.swap_total > 0 else 0 for m in metrics]
# Calculate statistics
avg_memory = np.mean(memory_usage)
std_memory = np.std(memory_usage)
cv_memory = std_memory / avg_memory if avg_memory > 0 else 0 # Coefficient of variation
avg_swap = np.mean(swap_usage)
        # Analyze top processes
        process_types = {}
        for metric in metrics:
            for proc in metric.process_metrics:
                # psutil may report None for the name of an inaccessible process
                name = (proc['name'] or '').lower()
                if any(db in name for db in ['mysql', 'postgres', 'mongo', 'redis', 'cassandra']):
                    process_types['database'] = process_types.get('database', 0) + 1
                elif any(web in name for web in ['nginx', 'apache', 'httpd', 'node', 'java']):
                    process_types['webserver'] = process_types.get('webserver', 0) + 1
                elif any(cont in name for cont in ['docker', 'containerd', 'runc', 'kubelet']):
                    process_types['container'] = process_types.get('container', 0) + 1
                # Names are lowercased, so 'R' can never match; match the R
                # interpreter exactly instead of by substring
                elif name in ('r', 'rscript') or any(comp in name for comp in ['python', 'matlab', 'julia']):
                    process_types['compute'] = process_types.get('compute', 0) + 1
# Determine workload type
if process_types:
dominant_type = max(process_types, key=process_types.get)
if process_types[dominant_type] > len(metrics) * 5: # Significant presence
return WorkloadType[dominant_type.upper()]
# Fallback to pattern-based detection
if avg_memory > 0.8 and cv_memory < 0.1: # High, stable memory usage
return WorkloadType.DATABASE
elif cv_memory > 0.3: # Highly variable memory usage
return WorkloadType.WEBSERVER
elif avg_swap > 0.2: # Significant swap usage
return WorkloadType.COMPUTE
else:
return WorkloadType.MIXED
def _calculate_memory_statistics(self, metrics: List[MemoryMetrics]) -> Dict[str, Any]:
"""Calculate memory usage statistics"""
if not metrics:
return {
'avg_usage': 0,
'peak_usage': 0,
'volatility': 0,
'swap_pattern': 'unknown'
}
memory_usage = [(m.used_memory / m.total_memory) * 100 for m in metrics]
swap_usage = [(m.swap_used / m.swap_total * 100) if m.swap_total > 0 else 0 for m in metrics]
# Swap pattern analysis
swap_pattern = 'minimal'
if np.mean(swap_usage) > 50:
swap_pattern = 'heavy'
elif np.mean(swap_usage) > 20:
swap_pattern = 'moderate'
elif np.std(swap_usage) > 10:
swap_pattern = 'bursty'
return {
'avg_usage': np.mean(memory_usage),
'peak_usage': np.max(memory_usage),
'volatility': np.std(memory_usage),
'swap_pattern': swap_pattern,
'p95_usage': np.percentile(memory_usage, 95),
'p99_usage': np.percentile(memory_usage, 99)
}
def _calculate_optimal_swap_size(self, memory_stats: Dict[str, Any]) -> int:
"""Calculate optimal swap size based on system profile"""
total_ram = psutil.virtual_memory().total
# Base calculation on RAM size and usage patterns
if total_ram <= 2 * (1024**3): # <= 2GB RAM
base_swap = total_ram * 2
elif total_ram <= 8 * (1024**3): # <= 8GB RAM
base_swap = total_ram
elif total_ram <= 64 * (1024**3): # <= 64GB RAM
base_swap = int(total_ram * 0.5)
else: # > 64GB RAM
base_swap = min(32 * (1024**3), int(total_ram * 0.25))
# Adjust based on usage patterns
if memory_stats['swap_pattern'] == 'heavy':
swap_size = int(base_swap * 1.5)
elif memory_stats['swap_pattern'] == 'bursty':
swap_size = int(base_swap * 1.25)
elif memory_stats['peak_usage'] > 90:
swap_size = int(base_swap * 1.25)
else:
swap_size = base_swap
# Ensure minimum swap for hibernation if configured
if self.config.get('enable_hibernation', False):
swap_size = max(swap_size, total_ram + (1024**3)) # RAM + 1GB
return swap_size // (1024**2) # Return in MB
def _calculate_optimal_swappiness(self,
workload_type: WorkloadType,
memory_stats: Dict[str, Any]) -> int:
"""Calculate optimal swappiness value"""
# Base swappiness by workload type
base_swappiness = {
WorkloadType.DATABASE: 10, # Minimize swapping for databases
WorkloadType.WEBSERVER: 30, # Moderate swapping
WorkloadType.COMPUTE: 60, # Default swapping
WorkloadType.CONTAINER: 40, # Container-friendly
WorkloadType.MIXED: 50 # Balanced
}
swappiness = base_swappiness.get(workload_type, 60)
# Adjust based on memory pressure
if memory_stats['avg_usage'] > 85:
swappiness = min(swappiness + 10, 100)
elif memory_stats['avg_usage'] < 50:
swappiness = max(swappiness - 10, 0)
# Adjust based on swap pattern
if memory_stats['swap_pattern'] == 'heavy':
swappiness = min(swappiness + 10, 100)
elif memory_stats['swap_pattern'] == 'minimal':
swappiness = max(swappiness - 10, 0)
return swappiness
def _generate_optimization_params(self,
workload_type: WorkloadType,
memory_stats: Dict[str, Any]) -> Dict[str, Any]:
"""Generate optimization parameters"""
params = {}
# VFS cache pressure
if workload_type == WorkloadType.DATABASE:
params['vfs_cache_pressure'] = 50 # Prefer caching
elif workload_type == WorkloadType.WEBSERVER:
params['vfs_cache_pressure'] = 80
else:
params['vfs_cache_pressure'] = 100
# Dirty ratio and background ratio
total_ram_gb = psutil.virtual_memory().total // (1024**3)
if total_ram_gb <= 4:
params['dirty_ratio'] = 15
params['dirty_background_ratio'] = 5
elif total_ram_gb <= 16:
params['dirty_ratio'] = 10
params['dirty_background_ratio'] = 3
else:
params['dirty_ratio'] = 5
params['dirty_background_ratio'] = 2
# Min free kbytes
params['min_free_kbytes'] = min(
int(psutil.virtual_memory().total * 0.01 / 1024), # 1% of RAM
262144 # Max 256MB
)
# Watermark scale factor
if memory_stats['volatility'] > 20:
params['watermark_scale_factor'] = 200 # More aggressive
else:
params['watermark_scale_factor'] = 100
# Zone reclaim mode
if workload_type == WorkloadType.DATABASE:
params['zone_reclaim_mode'] = 0 # Disable for databases
else:
params['zone_reclaim_mode'] = 1
# Transparent huge pages
if workload_type in [WorkloadType.DATABASE, WorkloadType.COMPUTE]:
params['transparent_hugepage'] = 'madvise'
else:
params['transparent_hugepage'] = 'always'
# OOM killer settings
params['oom_kill_allocating_task'] = 0
params['panic_on_oom'] = 0
return params
def _store_performance_profile(self, profile: PerformanceProfile):
"""Store performance profile for future reference"""
try:
# Store in Redis
key = f"performance_profile:{datetime.now().strftime('%Y%m%d_%H%M%S')}"
self.redis_client.setex(
key,
timedelta(days=30),
json.dumps(asdict(profile), default=str)
)
# Store latest profile
self.redis_client.set(
"performance_profile:latest",
json.dumps(asdict(profile), default=str)
)
except Exception as e:
self.logger.error(f"Failed to store performance profile: {e}")
async def configure_swap(self, config: SwapConfiguration) -> Dict[str, Any]:
"""Configure swap based on provided configuration"""
self.logger.info(f"Configuring swap: {config.swap_type.value}")
result = {
'status': 'pending',
'config': asdict(config),
'timestamp': datetime.now().isoformat()
}
try:
# Disable existing swap if requested
if self.config.get('disable_existing_swap', False):
await self._disable_all_swap()
# Configure based on swap type
if config.swap_type == SwapType.PARTITION:
swap_result = await self._configure_partition_swap(config)
elif config.swap_type == SwapType.FILE:
swap_result = await self._configure_file_swap(config)
elif config.swap_type == SwapType.ZRAM:
swap_result = await self._configure_zram_swap(config)
elif config.swap_type == SwapType.ZSWAP:
swap_result = await self._configure_zswap(config)
else:
raise ValueError(f"Unsupported swap type: {config.swap_type}")
result.update(swap_result)
# Apply system parameters
await self._apply_system_parameters(config)
# Verify configuration
verification = await self._verify_swap_configuration(config)
result['verification'] = verification
if verification['success']:
result['status'] = 'success'
self.swap_operations.labels(
operation='configure',
status='success'
).inc()
else:
result['status'] = 'failed'
self.swap_operations.labels(
operation='configure',
status='failure'
).inc()
except Exception as e:
self.logger.error(f"Swap configuration failed: {e}")
result['status'] = 'error'
result['error'] = str(e)
self.swap_operations.labels(
operation='configure',
status='error'
).inc()
return result
async def _disable_all_swap(self):
"""Disable all swap devices"""
self.logger.info("Disabling all swap devices")
try:
# Get current swap devices
result = subprocess.run(
['swapon', '--show', '--raw', '--noheadings'],
capture_output=True,
text=True
)
if result.returncode == 0 and result.stdout:
for line in result.stdout.strip().split('\n'):
parts = line.split()
if parts:
device = parts[0]
subprocess.run(['swapoff', device], check=True)
self.logger.info(f"Disabled swap on {device}")
except subprocess.CalledProcessError as e:
self.logger.error(f"Failed to disable swap: {e}")
raise
async def _configure_partition_swap(self, config: SwapConfiguration) -> Dict[str, Any]:
"""Configure partition-based swap"""
if not config.device:
raise ValueError("Device path required for partition swap")
self.logger.info(f"Configuring partition swap on {config.device}")
# Check if device exists
if not os.path.exists(config.device):
raise FileNotFoundError(f"Device not found: {config.device}")
# Create swap signature
subprocess.run(['mkswap', config.device], check=True)
# Enable swap with priority
cmd = ['swapon', config.device]
if config.priority >= 0:
cmd.extend(['-p', str(config.priority)])
subprocess.run(cmd, check=True)
# Update /etc/fstab
await self._update_fstab(config)
return {
'device': config.device,
'enabled': True
}
async def _configure_file_swap(self, config: SwapConfiguration) -> Dict[str, Any]:
"""Configure file-based swap"""
if not config.file_path:
config.file_path = '/var/swap/swapfile'
self.logger.info(f"Configuring file swap at {config.file_path}")
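        # Create directory if needed
        swap_dir = os.path.dirname(config.file_path)
        if swap_dir:
            os.makedirs(swap_dir, exist_ok=True)
        # Pre-create the file with mode 0600 so swapped-out pages are never world-readable
        fd = os.open(config.file_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        os.close(fd)
        # Fill the file using dd (more reliable than fallocate on some filesystems)
        subprocess.run([
            'dd', 'if=/dev/zero', f'of={config.file_path}',
            'bs=1M', f'count={config.size_mb}', 'status=progress'
        ], check=True)
        # Enforce permissions in case the file already existed with a looser mode
        os.chmod(config.file_path, 0o600)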
# Create swap signature
subprocess.run(['mkswap', config.file_path], check=True)
# Enable swap
cmd = ['swapon', config.file_path]
if config.priority >= 0:
cmd.extend(['-p', str(config.priority)])
subprocess.run(cmd, check=True)
# Update /etc/fstab
await self._update_fstab(config)
return {
'file': config.file_path,
'size_mb': config.size_mb,
'enabled': True
}
async def _configure_zram_swap(self, config: SwapConfiguration) -> Dict[str, Any]:
"""Configure ZRAM-based swap"""
self.logger.info("Configuring ZRAM swap")
# Load zram module
subprocess.run(['modprobe', 'zram'], check=True)
        # Find an unused zram device, or ask the kernel for a new one
        zram_device = None
        for sys_path in sorted(Path('/sys/block').glob('zram[0-9]*')):
            # A disksize of 0 means the device has not been initialised yet
            if (sys_path / 'disksize').read_text().strip() == '0':
                zram_device = f"/dev/{sys_path.name}"
                break
        if not zram_device:
            # hot_add is read-only: reading it creates a device and returns its id
            with open('/sys/class/zram-control/hot_add', 'r') as f:
                zram_device = f"/dev/zram{int(f.read().strip())}"
        if not os.path.exists(zram_device):
            raise RuntimeError("No available zram device found")
# Configure compression algorithm
if config.compression_algo:
algo_path = f'/sys/block/{os.path.basename(zram_device)}/comp_algorithm'
with open(algo_path, 'w') as f:
f.write(config.compression_algo)
# Set size
size_bytes = config.size_mb * 1024 * 1024
size_path = f'/sys/block/{os.path.basename(zram_device)}/disksize'
with open(size_path, 'w') as f:
f.write(str(size_bytes))
# Create swap on zram device
subprocess.run(['mkswap', zram_device], check=True)
# Enable swap
cmd = ['swapon', zram_device]
if config.priority >= 0:
cmd.extend(['-p', str(config.priority)])
subprocess.run(cmd, check=True)
# Create systemd service for persistence
await self._create_zram_service(config, zram_device)
return {
'device': zram_device,
'compression': config.compression_algo or 'lzo',
'size_mb': config.size_mb,
'enabled': True
}
async def _configure_zswap(self, config: SwapConfiguration) -> Dict[str, Any]:
"""Configure ZSWAP (compressed swap cache)"""
self.logger.info("Configuring ZSWAP")
# Enable zswap
with open('/sys/module/zswap/parameters/enabled', 'w') as f:
f.write('1')
# Configure compression algorithm
if config.compression_algo:
with open('/sys/module/zswap/parameters/compressor', 'w') as f:
f.write(config.compression_algo)
# Configure max pool percent
if config.max_pool_percent:
with open('/sys/module/zswap/parameters/max_pool_percent', 'w') as f:
f.write(str(config.max_pool_percent))
# Make persistent via kernel parameters
grub_params = []
grub_params.append('zswap.enabled=1')
if config.compression_algo:
grub_params.append(f'zswap.compressor={config.compression_algo}')
if config.max_pool_percent:
grub_params.append(f'zswap.max_pool_percent={config.max_pool_percent}')
await self._update_grub_config(grub_params)
return {
'type': 'zswap',
'enabled': True,
'compressor': config.compression_algo or 'lzo',
'max_pool_percent': config.max_pool_percent or 20
}
async def _apply_system_parameters(self, config: SwapConfiguration):
"""Apply system parameters for swap configuration"""
self.logger.info("Applying system parameters")
# Swappiness
with open('/proc/sys/vm/swappiness', 'w') as f:
f.write(str(config.swappiness))
# VFS cache pressure
with open('/proc/sys/vm/vfs_cache_pressure', 'w') as f:
f.write(str(config.vfs_cache_pressure))
# Min free kbytes
if config.min_free_kbytes:
with open('/proc/sys/vm/min_free_kbytes', 'w') as f:
f.write(str(config.min_free_kbytes))
# Watermark scale factor
wm_path = '/proc/sys/vm/watermark_scale_factor'
if os.path.exists(wm_path):
with open(wm_path, 'w') as f:
f.write(str(config.watermark_scale_factor))
# OOM killer settings
oom_path = '/proc/sys/vm/oom_kill_allocating_task'
if os.path.exists(oom_path):
with open(oom_path, 'w') as f:
f.write('1' if config.oom_kill_allocating_task else '0')
# Make persistent via sysctl
await self._update_sysctl_conf(config)
async def _update_fstab(self, config: SwapConfiguration):
"""Update /etc/fstab for swap persistence"""
        fstab_entry = None
        # A negative priority means "let the kernel assign one", matching the
        # swapon calls above, so pri= is only written for explicit priorities
        options = f"sw,pri={config.priority}" if config.priority >= 0 else "sw"
        if config.swap_type == SwapType.PARTITION:
            # Prefer the UUID, which stays stable if the device is renamed
            result = subprocess.run(
                ['blkid', '-s', 'UUID', '-o', 'value', config.device],
                capture_output=True,
                text=True
            )
            uuid = result.stdout.strip() if result.returncode == 0 else ''
            if uuid:
                fstab_entry = f"UUID={uuid} none swap {options} 0 0"
            else:
                fstab_entry = f"{config.device} none swap {options} 0 0"
        elif config.swap_type == SwapType.FILE:
            fstab_entry = f"{config.file_path} none swap {options} 0 0"
if fstab_entry:
# Check if entry already exists
with open('/etc/fstab', 'r') as f:
fstab_content = f.read()
if fstab_entry not in fstab_content:
# Backup fstab
subprocess.run(['cp', '/etc/fstab', '/etc/fstab.bak'], check=True)
# Add entry
with open('/etc/fstab', 'a') as f:
f.write(f"\n# Added by swap manager - {datetime.now()}\n")
f.write(f"{fstab_entry}\n")
async def _update_sysctl_conf(self, config: SwapConfiguration):
"""Update sysctl.conf for parameter persistence"""
sysctl_params = {
'vm.swappiness': config.swappiness,
'vm.vfs_cache_pressure': config.vfs_cache_pressure,
'vm.watermark_scale_factor': config.watermark_scale_factor,
'vm.oom_kill_allocating_task': 1 if config.oom_kill_allocating_task else 0
}
if config.min_free_kbytes:
sysctl_params['vm.min_free_kbytes'] = config.min_free_kbytes
        # Write a dedicated drop-in file instead of editing /etc/sysctl.conf
sysctl_file = '/etc/sysctl.d/99-swap-manager.conf'
with open(sysctl_file, 'w') as f:
f.write("# Swap Manager Configuration\n")
f.write(f"# Generated: {datetime.now()}\n\n")
for param, value in sysctl_params.items():
f.write(f"{param} = {value}\n")
# Apply settings
subprocess.run(['sysctl', '-p', sysctl_file], check=True)
async def _create_zram_service(self, config: SwapConfiguration, device: str):
"""Create systemd service for ZRAM persistence"""
        zram_name = os.path.basename(device)
        # Only pass -p when an explicit (non-negative) priority was requested
        swapon_cmd = f"swapon -p {config.priority} {device}" if config.priority >= 0 else f"swapon {device}"
        # Ordering note: After=multi-user.target combined with
        # WantedBy=multi-user.target would create an ordering cycle
        service_content = f"""[Unit]
Description=ZRAM Swap Device
After=local-fs.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/setup-zram-swap.sh
ExecStop=/sbin/swapoff {device}

[Install]
WantedBy=multi-user.target
"""
        # Create service file
        service_file = '/etc/systemd/system/zram-swap.service'
        with open(service_file, 'w') as f:
            f.write(service_content)
        # Create setup script
        script_content = f"""#!/bin/bash
# ZRAM Swap Setup Script
# Generated by Swap Manager
set -euo pipefail
modprobe zram
# Configure device (the algorithm must be set before disksize)
echo {config.compression_algo or 'lzo'} > /sys/block/{zram_name}/comp_algorithm
echo {config.size_mb * 1024 * 1024} > /sys/block/{zram_name}/disksize
# Create and enable swap
mkswap {device}
{swapon_cmd}
"""
script_file = '/usr/local/bin/setup-zram-swap.sh'
with open(script_file, 'w') as f:
f.write(script_content)
os.chmod(script_file, 0o755)
# Enable service
subprocess.run(['systemctl', 'daemon-reload'], check=True)
subprocess.run(['systemctl', 'enable', 'zram-swap.service'], check=True)
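    async def _update_grub_config(self, params: List[str]):
        """Update GRUB configuration for kernel parameters"""
        import re
        import shutil
        grub_file = '/etc/default/grub'
        # Backup
        shutil.copy2(grub_file, f'{grub_file}.bak')
        # Read current config
        with open(grub_file, 'r') as f:
            lines = f.readlines()
        # Debian/Ubuntu set GRUB_CMDLINE_LINUX_DEFAULT; RHEL-family systems
        # usually only set GRUB_CMDLINE_LINUX, so accept either variable
        pattern = re.compile(r'^(GRUB_CMDLINE_LINUX(?:_DEFAULT)?)="([^"]*)"')
        matches = [(i, m) for i, m in ((i, pattern.match(l)) for i, l in enumerate(lines)) if m]
        if not matches:
            self.logger.warning(f"No kernel command line variable found in {grub_file}")
            return
        # Prefer the _DEFAULT variable when both are present
        index, match = next(
            ((i, m) for i, m in matches if m.group(1).endswith('_DEFAULT')),
            matches[0]
        )
        # Compare whole tokens by key so an old value is replaced, not duplicated
        current = match.group(2).split()
        for param in params:
            key = param.split('=', 1)[0]
            current = [p for p in current if p.split('=', 1)[0] != key]
            current.append(param)
        joined = ' '.join(current)
        lines[index] = f'{match.group(1)}="{joined}"\n'
        # Write updated config
        with open(grub_file, 'w') as f:
            f.writelines(lines)
        # Regenerate the boot configuration with whichever tool the distribution ships
        if shutil.which('update-grub'):
            subprocess.run(['update-grub'], check=True)
        elif shutil.which('grub2-mkconfig'):
            # Assumption: the generated config lives at /boot/grub2/grub.cfg;
            # adjust the path for hosts with a different boot layout
            subprocess.run(['grub2-mkconfig', '-o', '/boot/grub2/grub.cfg'], check=True)
        else:
            self.logger.warning("No GRUB update tool found; regenerate the boot config manually")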
    async def _verify_swap_configuration(self, config: SwapConfiguration) -> Dict[str, Any]:
        """Verify swap configuration is active and correct"""
        verification = {
            'success': False,
            'active_swap': [],
            'parameters': {}
        }
        try:
            # Check swap status
            result = subprocess.run(
                ['swapon', '--show', '--raw', '--noheadings'],
                capture_output=True,
                text=True
            )
            if result.returncode == 0:
                for line in result.stdout.strip().split('\n'):
                    if line:
                        parts = line.split()
                        if len(parts) >= 5:
                            swap_info = {
                                'name': parts[0],
                                'type': parts[1],
                                'size': parts[2],
                                'used': parts[3],
                                'priority': parts[4]
                            }
                            verification['active_swap'].append(swap_info)
                            # Check if our swap is active
                            if config.swap_type == SwapType.PARTITION and parts[0] == config.device:
                                verification['success'] = True
                            elif config.swap_type == SwapType.FILE and parts[0] == config.file_path:
                                verification['success'] = True
                            elif config.swap_type == SwapType.ZRAM and 'zram' in parts[0]:
                                verification['success'] = True
            # Verify system parameters
            param_files = {
                'swappiness': '/proc/sys/vm/swappiness',
                'vfs_cache_pressure': '/proc/sys/vm/vfs_cache_pressure',
                'min_free_kbytes': '/proc/sys/vm/min_free_kbytes',
                'watermark_scale_factor': '/proc/sys/vm/watermark_scale_factor'
            }
            for param, path in param_files.items():
                if os.path.exists(path):
                    with open(path, 'r') as f:
                        verification['parameters'][param] = int(f.read().strip())
            # Check ZSWAP if configured
            if config.swap_type == SwapType.ZSWAP:
                with open('/sys/module/zswap/parameters/enabled', 'r') as f:
                    enabled = f.read().strip()
                verification['zswap_enabled'] = enabled == 'Y'
                verification['success'] = verification['zswap_enabled']
        except Exception as e:
            self.logger.error(f"Verification failed: {e}")
            verification['error'] = str(e)
        return verification
    async def monitor_memory_health(self) -> Dict[str, Any]:
        """Monitor memory health and provide recommendations"""
        self.logger.info("Monitoring memory health")
        # Collect current metrics
        metrics = await self._get_current_metrics()
        swap_usage_percent = (
            (metrics.swap_used / metrics.swap_total) * 100 if metrics.swap_total > 0 else 0
        )
        # Analyze health
        health_report = {
            'timestamp': datetime.now().isoformat(),
            'status': 'healthy',
            'memory_pressure': metrics.memory_pressure.value,
            'metrics': {
                'memory_usage_percent': (metrics.used_memory / metrics.total_memory) * 100,
                'swap_usage_percent': swap_usage_percent,
                'available_memory_gb': metrics.available_memory / (1024**3),
                'swap_io_rate': metrics.swap_in_rate + metrics.swap_out_rate
            },
            'issues': [],
            'recommendations': []
        }
        # Check for issues
        if metrics.memory_pressure == MemoryPressure.CRITICAL:
            health_report['status'] = 'critical'
            health_report['issues'].append("Critical memory pressure detected")
            health_report['recommendations'].append("Consider adding more RAM or increasing swap space")
        elif metrics.memory_pressure == MemoryPressure.HIGH:
            health_report['status'] = 'warning'
            health_report['issues'].append("High memory pressure detected")
            health_report['recommendations'].append("Monitor closely and consider memory optimization")
        # Check swap usage
        if swap_usage_percent > 80:
            health_report['issues'].append(f"High swap usage: {swap_usage_percent:.1f}%")
            health_report['recommendations'].append("Consider increasing swap space or optimizing memory usage")
        # Check swap I/O
        if metrics.swap_in_rate + metrics.swap_out_rate > 1000000:  # 1MB/s
            health_report['issues'].append("High swap I/O activity")
            health_report['recommendations'].append("Consider using faster storage for swap or adding more RAM")
        # Flag the top processes that hold a large share of memory (possible leaks)
        for proc in metrics.process_metrics[:5]:
            if proc['percent'] > 20:
                health_report['issues'].append(
                    f"Process {proc['name']} (PID: {proc['pid']}) using {proc['percent']:.1f}% memory"
                )
        # Machine learning predictions. _predict_memory_usage() trains the model
        # on first use and returns None until enough history has been collected,
        # so it is called unconditionally rather than only when a model exists.
        prediction = self._predict_memory_usage()
        if prediction:
            health_report['prediction'] = prediction
        return health_report
    def _predict_memory_usage(self) -> Optional[Dict[str, Any]]:
        """Predict future memory usage using ML model"""
        if not self.metrics_history or len(self.metrics_history) < 100:
            return None
        try:
            # Prepare data for prediction
            X = []
            y = []
            for i in range(len(self.metrics_history) - 1):
                features = [
                    self.metrics_history[i].used_memory / self.metrics_history[i].total_memory,
                    self.metrics_history[i].swap_used / max(self.metrics_history[i].swap_total, 1),
                    self.metrics_history[i].swap_in_rate,
                    self.metrics_history[i].swap_out_rate,
                    i  # Time component
                ]
                X.append(features)
                y.append(self.metrics_history[i + 1].used_memory / self.metrics_history[i + 1].total_memory)
            X = np.array(X)
            y = np.array(y)
            # Train simple model if not already trained
            if not self.performance_model:
                self.performance_model = LinearRegression()
                self.performance_model.fit(X, y)
            # Predict next hour
            predictions = []
            current_features = X[-1].copy()
            for i in range(6):  # 6 x 10 minutes = 1 hour
                pred = self.performance_model.predict([current_features])[0]
                predictions.append(pred * 100)  # Convert to percentage
                # Update features for next prediction
                current_features[0] = pred
                current_features[4] += 1
            return {
                'next_hour_avg': np.mean(predictions),
                'next_hour_max': np.max(predictions),
                'trend': 'increasing' if predictions[-1] > predictions[0] else 'decreasing'
            }
        except Exception as e:
            self.logger.error(f"Prediction failed: {e}")
            return None
class SwapOptimizationEngine:
    """Automated swap optimization engine"""

    def __init__(self, manager: EnterpriseSwapManager):
        self.manager = manager
        self.logger = logging.getLogger(__name__)

    async def auto_optimize(self) -> Dict[str, Any]:
        """Automatically optimize swap configuration"""
        self.logger.info("Starting automatic swap optimization")
        # Analyze system
        profile = await self.manager.analyze_system()
        # Generate optimal configuration
        optimal_config = SwapConfiguration(
            swap_type=self._determine_best_swap_type(profile),
            size_mb=profile.recommended_swap_size,
            priority=-1,
            swappiness=profile.recommended_swappiness,
            vfs_cache_pressure=profile.optimization_params.get('vfs_cache_pressure', 100),
            min_free_kbytes=profile.optimization_params.get('min_free_kbytes'),
            watermark_scale_factor=profile.optimization_params.get('watermark_scale_factor', 100),
            oom_kill_allocating_task=profile.optimization_params.get('oom_kill_allocating_task', False)
        )
        # Apply configuration
        result = await self.manager.configure_swap(optimal_config)
        return {
            'profile': asdict(profile),
            'configuration': asdict(optimal_config),
            'result': result
        }
    def _determine_best_swap_type(self, profile: PerformanceProfile) -> SwapType:
        """Determine best swap type based on profile"""
        # Check available storage
        disk_usage = psutil.disk_usage('/')
        free_space_gb = disk_usage.free / (1024**3)
        # Check for SSD
        is_ssd = self._check_ssd()
        # Decision logic
        if profile.workload_type == WorkloadType.DATABASE and is_ssd:
            # Use ZRAM for databases on SSD to minimize I/O
            return SwapType.ZRAM
        elif free_space_gb < profile.recommended_swap_size / 1024:
            # Not enough disk space, use ZRAM
            return SwapType.ZRAM
        elif is_ssd:
            # SSD available, use file swap
            return SwapType.FILE
        else:
            # HDD, check for dedicated partition
            if self._find_swap_partition():
                return SwapType.PARTITION
            else:
                return SwapType.FILE
    def _check_ssd(self) -> bool:
        """Check if root filesystem is on SSD"""
        try:
            # Simple heuristic - check rotational flag.
            # Assumes the root disk is sda; NVMe and virtio disks use other
            # names, in which case this falls through to False.
            with open('/sys/block/sda/queue/rotational', 'r') as f:
                return f.read().strip() == '0'
        except OSError:
            return False

    def _find_swap_partition(self) -> Optional[str]:
        """Find available swap partition"""
        try:
            result = subprocess.run(
                ['blkid', '-t', 'TYPE=swap', '-o', 'device'],
                capture_output=True,
                text=True
            )
            if result.returncode == 0 and result.stdout.strip():
                # blkid prints one device path per line; take the first
                return result.stdout.split()[0]
        except OSError:
            pass
        return None
async def main():
    """Main execution function"""
    import argparse
    from enum import Enum

    def _json_default(obj):
        # Serialize enums by value (e.g. "database") and anything else as a string
        return obj.value if isinstance(obj, Enum) else str(obj)

    parser = argparse.ArgumentParser(description='Enterprise Swap Manager')
    parser.add_argument('--config', default='/etc/swap-manager/config.yaml',
                        help='Configuration file path')
    parser.add_argument('--action', required=True,
                        choices=['analyze', 'configure', 'optimize', 'monitor', 'report'],
                        help='Action to perform')
    parser.add_argument('--swap-type', choices=['partition', 'file', 'zram', 'zswap'],
                        help='Swap type for configure action')
    parser.add_argument('--size', type=int, help='Swap size in MB')
    parser.add_argument('--device', help='Device path for partition swap')
    parser.add_argument('--output', default='json',
                        choices=['json', 'yaml', 'table'],
                        help='Output format')
    args = parser.parse_args()
    # Initialize manager
    manager = EnterpriseSwapManager(args.config)
    try:
        if args.action == 'analyze':
            # Analyze system
            profile = await manager.analyze_system()
            if args.output == 'json':
                print(json.dumps(asdict(profile), indent=2, default=_json_default))
            elif args.output == 'yaml':
                # Round-trip through JSON so enums are dumped as plain strings
                plain = json.loads(json.dumps(asdict(profile), default=_json_default))
                print(yaml.safe_dump(plain, default_flow_style=False))
            else:
                print(f"Workload Type: {profile.workload_type.value}")
                print(f"Average Memory Usage: {profile.avg_memory_usage:.1f}%")
                print(f"Peak Memory Usage: {profile.peak_memory_usage:.1f}%")
                print(f"Recommended Swap Size: {profile.recommended_swap_size} MB")
                print(f"Recommended Swappiness: {profile.recommended_swappiness}")
        elif args.action == 'configure':
            if not args.swap_type:
                parser.error('--swap-type required for configure action')
            # Create configuration
            config = SwapConfiguration(
                swap_type=SwapType(args.swap_type),
                size_mb=args.size or 4096,
                device=args.device
            )
            # Apply configuration
            result = await manager.configure_swap(config)
            print(json.dumps(result, indent=2, default=_json_default))
        elif args.action == 'optimize':
            # Auto-optimize
            optimizer = SwapOptimizationEngine(manager)
            result = await optimizer.auto_optimize()
            print(json.dumps(result, indent=2, default=_json_default))
        elif args.action == 'monitor':
            # Monitor health
            health = await manager.monitor_memory_health()
            if args.output == 'json':
                print(json.dumps(health, indent=2, default=_json_default))
            else:
                print(f"Status: {health['status'].upper()}")
                print(f"Memory Pressure: {health['memory_pressure']}")
                print(f"Memory Usage: {health['metrics']['memory_usage_percent']:.1f}%")
                print(f"Swap Usage: {health['metrics']['swap_usage_percent']:.1f}%")
                if health['issues']:
                    print("\nIssues:")
                    for issue in health['issues']:
                        print(f" - {issue}")
                if health['recommendations']:
                    print("\nRecommendations:")
                    for rec in health['recommendations']:
                        print(f" - {rec}")
        elif args.action == 'report':
            # Generate detailed report
            print("Generating comprehensive memory report...")
            # This would generate a detailed report
            # Implementation depends on specific requirements
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    asyncio.run(main())
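The `_check_ssd` heuristic in the listing above reads only `/sys/block/sda/queue/rotational`, so it reports "not SSD" on hosts whose disks have other names (for example `nvme0n1` or `vda`). A minimal sketch of a broader check follows. It scans every block device under sysfs instead of assuming `sda`. The names `rotational_flags` and `has_non_rotational_disk` and the `sys_block` parameter are illustrative and not part of the manager class, and mapping the root filesystem to its backing device is deliberately left out.

```python
from pathlib import Path
from typing import Dict


def rotational_flags(sys_block: Path = Path("/sys/block")) -> Dict[str, bool]:
    """Map each block device name to True if the kernel reports it as rotational."""
    flags: Dict[str, bool] = {}
    if not sys_block.is_dir():
        return flags
    for dev in sorted(sys_block.iterdir()):
        flag_file = dev / "queue" / "rotational"
        try:
            flags[dev.name] = flag_file.read_text().strip() == "1"
        except OSError:
            # No queue directory or unreadable attribute: skip this device.
            continue
    return flags


def has_non_rotational_disk(flags: Dict[str, bool]) -> bool:
    """True if at least one non-virtual device is flagged non-rotational."""
    # Assumption: loop, ram, and zram devices are virtual and should be ignored.
    skip_prefixes = ("loop", "ram", "zram")
    return any(
        not rotational
        for name, rotational in flags.items()
        if not name.startswith(skip_prefixes)
    )
```

Calling `has_non_rotational_disk(rotational_flags())` gives a coarse "some SSD is present" answer; it does not say which device holds the root filesystem.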
Enterprise Swap Management Implementation
Production Deployment Scripts
Automated Swap Configuration Script
#!/bin/bash
# enterprise-swap-deploy.sh - Enterprise swap deployment automation
set -euo pipefail
# Configuration
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
CONFIG_FILE="${CONFIG_FILE:-/etc/swap-manager/config.yaml}"
LOG_DIR="/var/log/swap-manager"
STATE_DIR="/var/lib/swap-manager"
# Create directories
mkdir -p "$LOG_DIR" "$STATE_DIR"
# Logging
LOG_FILE="$LOG_DIR/swap-deploy-$(date +%Y%m%d-%H%M%S).log"
exec 1> >(tee -a "$LOG_FILE")
exec 2>&1
log() {
echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*"
}
error() {
echo "[$(date +'%Y-%m-%d %H:%M:%S')] ERROR: $*" >&2
exit 1
}
# Check system requirements
check_requirements() {
log "Checking system requirements"
# Check for required tools
local required_tools=("python3" "swapon" "mkswap" "dd" "blkid")
for tool in "${required_tools[@]}"; do
if ! command -v "$tool" &> /dev/null; then
error "Required tool not found: $tool"
fi
done
# Check Python modules
python3 -c "import psutil, yaml, prometheus_client" || \
error "Required Python modules not installed"
# Check if running as root
if [[ $EUID -ne 0 ]]; then
error "This script must be run as root"
fi
log "Requirements check passed"
}
# Analyze system and get recommendations
analyze_system() {
log "Analyzing system configuration"
# Run analysis
local analysis_output
analysis_output=$(python3 /usr/local/bin/swap-manager.py \
--config "$CONFIG_FILE" \
--action analyze \
--output json)
# Save analysis
echo "$analysis_output" > "$STATE_DIR/last-analysis.json"
# Extract recommendations
RECOMMENDED_SWAP_SIZE=$(echo "$analysis_output" | \
python3 -c "import sys, json; print(json.load(sys.stdin)['recommended_swap_size'])")
RECOMMENDED_SWAPPINESS=$(echo "$analysis_output" | \
python3 -c "import sys, json; print(json.load(sys.stdin)['recommended_swappiness'])")
WORKLOAD_TYPE=$(echo "$analysis_output" | \
python3 -c "import sys, json; print(json.load(sys.stdin)['workload_type'])")
log "Analysis complete:"
log " Workload type: $WORKLOAD_TYPE"
log " Recommended swap size: ${RECOMMENDED_SWAP_SIZE}MB"
log " Recommended swappiness: $RECOMMENDED_SWAPPINESS"
}
# Configure optimal swap
configure_swap() {
local swap_type="${1:-auto}"
local swap_size="${2:-$RECOMMENDED_SWAP_SIZE}"
log "Configuring swap (type: $swap_type, size: ${swap_size}MB)"
# Determine swap type if auto
if [[ "$swap_type" == "auto" ]]; then
swap_type=$(determine_swap_type)
fi
case "$swap_type" in
partition)
configure_partition_swap "$swap_size"
;;
file)
configure_file_swap "$swap_size"
;;
zram)
configure_zram_swap "$swap_size"
;;
zswap)
configure_zswap
;;
*)
error "Unknown swap type: $swap_type"
;;
esac
# Apply system parameters
apply_system_parameters
# Verify configuration
verify_swap_configuration
}
# Determine best swap type
determine_swap_type() {
local swap_type="file" # Default
# Check for existing swap partition
if blkid -t TYPE=swap &> /dev/null; then
swap_type="partition"
# Check available disk space
elif [[ $(df -BG / | tail -1 | awk '{print $4}' | sed 's/G//') -lt 10 ]]; then
# Less than 10GB free, use ZRAM
swap_type="zram"
# Check if SSD
elif [[ -f /sys/block/sda/queue/rotational ]] && \
[[ $(cat /sys/block/sda/queue/rotational) -eq 0 ]]; then
# SSD detected, prefer ZRAM for databases
if [[ "$WORKLOAD_TYPE" == "database" ]]; then
swap_type="zram"
fi
fi
echo "$swap_type"
}
# Configure partition swap
configure_partition_swap() {
    local size_mb="$1"
    log "Configuring partition swap"
    # Find swap partition. blkid exits non-zero when nothing matches, which
    # would abort the script under "set -e -o pipefail" before the check below.
    local swap_device
    swap_device=$(blkid -t TYPE=swap -o device | head -1 || true)
    if [[ -z "$swap_device" ]]; then
        error "No swap partition found"
    fi
    # Disable existing swap
    swapoff -a 2>/dev/null || true
    # Setup swap
    mkswap "$swap_device"
    swapon -p -1 "$swap_device"
    # Update fstab
    update_fstab "$swap_device" "partition"
    log "Partition swap configured on $swap_device"
}
# Configure file swap
configure_file_swap() {
    local size_mb="$1"
    local swap_file="/var/swap/swapfile"
    log "Configuring file swap (${size_mb}MB)"
    # Create swap directory
    mkdir -p "$(dirname "$swap_file")"
    # Remove existing swap file if present
    if [[ -f "$swap_file" ]]; then
        swapoff "$swap_file" 2>/dev/null || true
        rm -f "$swap_file"
    fi
    # Create the file with restrictive permissions before any data is written,
    # so it is never world-readable while dd is filling it
    log "Creating swap file..."
    touch "$swap_file"
    chmod 600 "$swap_file"
    dd if=/dev/zero of="$swap_file" bs=1M count="$size_mb" status=progress
    # Setup swap
    mkswap "$swap_file"
    swapon -p -1 "$swap_file"
    # Update fstab
    update_fstab "$swap_file" "file"
    log "File swap configured at $swap_file"
}
# Configure ZRAM swap
configure_zram_swap() {
local size_mb="$1"
log "Configuring ZRAM swap (${size_mb}MB)"
# Load module
modprobe zram num_devices=1
# Configure compression algorithm
echo lz4 > /sys/block/zram0/comp_algorithm
# Set size
echo "${size_mb}M" > /sys/block/zram0/disksize
# Create swap
mkswap /dev/zram0
swapon -p 5 /dev/zram0
# Create systemd service
create_zram_service "$size_mb"
log "ZRAM swap configured"
}
# Configure ZSWAP
configure_zswap() {
log "Configuring ZSWAP"
# Enable zswap
echo 1 > /sys/module/zswap/parameters/enabled
echo lz4 > /sys/module/zswap/parameters/compressor
echo 20 > /sys/module/zswap/parameters/max_pool_percent
# Update GRUB
update_grub_for_zswap
log "ZSWAP configured (requires reboot to fully activate)"
}
# Apply system parameters
apply_system_parameters() {
log "Applying system parameters"
# Apply recommended swappiness
echo "$RECOMMENDED_SWAPPINESS" > /proc/sys/vm/swappiness
# Apply other parameters based on workload
case "$WORKLOAD_TYPE" in
database)
echo 50 > /proc/sys/vm/vfs_cache_pressure
echo 5 > /proc/sys/vm/dirty_ratio
echo 2 > /proc/sys/vm/dirty_background_ratio
;;
webserver)
echo 80 > /proc/sys/vm/vfs_cache_pressure
echo 10 > /proc/sys/vm/dirty_ratio
echo 5 > /proc/sys/vm/dirty_background_ratio
;;
*)
echo 100 > /proc/sys/vm/vfs_cache_pressure
echo 15 > /proc/sys/vm/dirty_ratio
echo 5 > /proc/sys/vm/dirty_background_ratio
;;
esac
# Calculate min_free_kbytes (1% of RAM, max 256MB)
local total_ram_kb=$(grep MemTotal /proc/meminfo | awk '{print $2}')
local min_free_kb=$((total_ram_kb / 100))
if [[ $min_free_kb -gt 262144 ]]; then
min_free_kb=262144
fi
echo "$min_free_kb" > /proc/sys/vm/min_free_kbytes
# Make persistent
cat > /etc/sysctl.d/99-swap-optimization.conf <<EOF
# Swap optimization parameters
# Generated by enterprise-swap-deploy.sh
vm.swappiness = $RECOMMENDED_SWAPPINESS
vm.vfs_cache_pressure = $(cat /proc/sys/vm/vfs_cache_pressure)
vm.dirty_ratio = $(cat /proc/sys/vm/dirty_ratio)
vm.dirty_background_ratio = $(cat /proc/sys/vm/dirty_background_ratio)
vm.min_free_kbytes = $min_free_kb
vm.watermark_scale_factor = 100
EOF
sysctl -p /etc/sysctl.d/99-swap-optimization.conf
}
# Update fstab
update_fstab() {
local device="$1"
local type="$2"
log "Updating /etc/fstab"
# Backup fstab
cp /etc/fstab "/etc/fstab.bak.$(date +%Y%m%d-%H%M%S)"
# Remove existing swap entries
sed -i '/\sswap\s/d' /etc/fstab
# Add new entry
if [[ "$type" == "partition" ]]; then
local uuid=$(blkid -s UUID -o value "$device")
echo "UUID=$uuid none swap sw,pri=-1 0 0" >> /etc/fstab
else
echo "$device none swap sw,pri=-1 0 0" >> /etc/fstab
fi
}
# Create ZRAM systemd service
create_zram_service() {
local size_mb="$1"
cat > /etc/systemd/system/zram-swap.service <<EOF
[Unit]
Description=Configure ZRAM swap device
After=multi-user.target
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/configure-zram.sh $size_mb
ExecStop=/sbin/swapoff /dev/zram0
[Install]
WantedBy=multi-user.target
EOF
cat > /usr/local/bin/configure-zram.sh <<'EOF'
#!/bin/bash
SIZE_MB=$1
modprobe zram
echo lz4 > /sys/block/zram0/comp_algorithm
echo "${SIZE_MB}M" > /sys/block/zram0/disksize
mkswap /dev/zram0
swapon -p 5 /dev/zram0
EOF
chmod +x /usr/local/bin/configure-zram.sh
systemctl daemon-reload
systemctl enable zram-swap.service
}
# Update GRUB for ZSWAP
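update_grub_for_zswap() {
    log "Updating GRUB configuration"
    # Backup
    cp /etc/default/grub "/etc/default/grub.bak.$(date +%Y%m%d-%H%M%S)"
    # Add zswap parameters only once, so repeated runs do not duplicate them
    if ! grep -q 'zswap\.enabled=' /etc/default/grub; then
        sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"/GRUB_CMDLINE_LINUX_DEFAULT="\1 zswap.enabled=1 zswap.compressor=lz4 zswap.max_pool_percent=20"/' /etc/default/grub
    fi
    # Update GRUB. update-grub is the Debian/Ubuntu wrapper; RHEL-family
    # systems use grub2-mkconfig instead.
    update-grub
}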
# Verify swap configuration
verify_swap_configuration() {
    log "Verifying swap configuration"
    # Check swap status: any output means at least one swap area is active.
    # Matching on the word "swap" would miss devices such as /dev/sda2 or /dev/zram0.
    if [[ -z "$(swapon --show --noheadings)" ]]; then
        error "No swap configured"
    fi
    # Display swap info
    log "Current swap configuration:"
    swapon --show
    # Check parameters
    log "System parameters:"
    log " Swappiness: $(cat /proc/sys/vm/swappiness)"
    log " VFS cache pressure: $(cat /proc/sys/vm/vfs_cache_pressure)"
    log " Min free kbytes: $(cat /proc/sys/vm/min_free_kbytes)"
    # Run health check
    python3 /usr/local/bin/swap-manager.py \
        --config "$CONFIG_FILE" \
        --action monitor
}
# Setup monitoring
setup_monitoring() {
log "Setting up monitoring"
# Create monitoring script
cat > /usr/local/bin/swap-monitor.sh <<'EOF'
#!/bin/bash
# Continuous swap monitoring
while true; do
# Get metrics
SWAP_USED=$(free -b | grep Swap | awk '{print $3}')
SWAP_TOTAL=$(free -b | grep Swap | awk '{print $2}')
if [[ $SWAP_TOTAL -gt 0 ]]; then
SWAP_PERCENT=$((SWAP_USED * 100 / SWAP_TOTAL))
# Alert if swap usage is high
if [[ $SWAP_PERCENT -gt 80 ]]; then
logger -p warning "High swap usage: ${SWAP_PERCENT}%"
fi
fi
sleep 60
done
EOF
chmod +x /usr/local/bin/swap-monitor.sh
# Create systemd service
cat > /etc/systemd/system/swap-monitor.service <<EOF
[Unit]
Description=Swap usage monitor
After=multi-user.target
[Service]
Type=simple
ExecStart=/usr/local/bin/swap-monitor.sh
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable swap-monitor.service
systemctl start swap-monitor.service
}
# Main execution
main() {
local action="${1:-deploy}"
case "$action" in
deploy)
check_requirements
analyze_system
configure_swap "${2:-auto}" "${3:-}"
setup_monitoring
log "Swap deployment completed successfully"
;;
analyze)
check_requirements
analyze_system
;;
optimize)
check_requirements
python3 /usr/local/bin/swap-manager.py \
--config "$CONFIG_FILE" \
--action optimize
;;
status)
swapon --show
free -h
;;
help|*)
cat <<EOF
Usage: $0 [action] [options]
Actions:
deploy [type] [size] - Deploy swap configuration
analyze - Analyze system only
optimize - Auto-optimize swap
status - Show current status
help - Show this help
Swap types:
auto - Automatically determine best type
partition - Use swap partition
file - Use swap file
zram - Use compressed RAM swap
zswap - Use kernel swap compression
Examples:
$0 deploy # Auto-deploy with recommendations
$0 deploy file 8192 # Deploy 8GB file swap
$0 deploy zram # Deploy ZRAM with recommended size
EOF
;;
esac
}
# Execute main function
main "$@"
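The verification step in the script above depends on the text output of `swapon --show`. As a cross-check, active swap areas can also be read from `/proc/swaps`, which the kernel exposes with a header line followed by five whitespace-separated columns (filename, type, size in KiB, used in KiB, priority). The sketch below parses that layout. `SwapArea` and `parse_proc_swaps` are illustrative names, not part of the deployment tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SwapArea:
    name: str
    kind: str       # "partition" or "file"
    size_kib: int
    used_kib: int
    priority: int


def parse_proc_swaps(text: str) -> List[SwapArea]:
    """Parse the contents of /proc/swaps into SwapArea records."""
    areas = []
    for line in text.splitlines()[1:]:  # first line is the column header
        parts = line.split()
        if len(parts) != 5:
            continue  # ignore blank or unexpected lines
        name, kind, size, used, prio = parts
        areas.append(SwapArea(name, kind, int(size), int(used), int(prio)))
    return areas
```

On a Linux host, `parse_proc_swaps(open("/proc/swaps").read())` returns an empty list when no swap is active, which is the condition the shell check is trying to detect.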
Memory Monitoring Dashboard
Grafana Dashboard Configuration
{
"dashboard": {
"title": "Enterprise Memory and Swap Management",
"uid": "memory-swap-dashboard",
"panels": [
{
"title": "Memory Usage Overview",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
"targets": [
{
"expr": "memory_usage_bytes{type=\"used\"} / ignoring(type) memory_usage_bytes{type=\"total\"} * 100",
"legendFormat": "Memory Usage %"
},
{
"expr": "swap_usage_bytes{device=\"used\"} / ignoring(device) swap_usage_bytes{device=\"total\"} * 100",
"legendFormat": "Swap Usage %"
}
]
},
{
"title": "Memory Pressure Score",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},
"targets": [
{
"expr": "memory_pressure_score",
"legendFormat": "Pressure Score"
}
],
"thresholds": [
{"value": 30, "color": "green"},
{"value": 50, "color": "yellow"},
{"value": 70, "color": "red"}
]
},
{
"title": "Swap I/O Activity",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 8},
"targets": [
{
"expr": "avg_over_time(swap_io_rate_bytes_per_second{direction=\"in\"}[5m])",
"legendFormat": "Swap In"
},
{
"expr": "avg_over_time(swap_io_rate_bytes_per_second{direction=\"out\"}[5m])",
"legendFormat": "Swap Out"
}
]
},
{
"title": "Top Memory Consumers",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 8},
"type": "table",
"targets": [
{
"expr": "topk(10, process_memory_rss_bytes)",
"format": "table"
}
]
}
]
}
}
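Dashboard JSON like the definition above is easy to break when edited by hand. The following sketch assumes the definition has been saved to a file and uses the hypothetical helper names `check_dashboard` and `check_dashboard_file`. It flags panels that extend past Grafana's 24-column grid and targets with an empty `expr`; it does not validate the PromQL itself.

```python
import json
from typing import Dict, List

GRID_COLUMNS = 24  # Grafana lays panels out on a 24-column grid


def check_dashboard(definition: Dict) -> List[str]:
    """Return human-readable problems found in a dashboard definition."""
    problems = []
    panels = definition.get("dashboard", {}).get("panels", [])
    if not panels:
        problems.append("dashboard has no panels")
    for panel in panels:
        title = panel.get("title", "<untitled>")
        pos = panel.get("gridPos", {})
        if pos.get("x", 0) + pos.get("w", 0) > GRID_COLUMNS:
            problems.append(f"{title}: panel exceeds {GRID_COLUMNS} columns")
        for target in panel.get("targets", []):
            if not target.get("expr", "").strip():
                problems.append(f"{title}: target has an empty expr")
    return problems


def check_dashboard_file(path: str) -> List[str]:
    """Load a dashboard JSON file and run check_dashboard on it."""
    with open(path) as f:
        return check_dashboard(json.load(f))
```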
Troubleshooting Common Issues
High Swap Usage
#!/bin/bash
# diagnose-swap-usage.sh - Diagnose high swap usage
echo "=== Swap Usage Diagnosis ==="
echo
# Current swap status
echo "Current Swap Status:"
free -h
echo
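# Top swap consumers
echo "Top Swap Consumers (name, PID, swap in kB):"
for file in /proc/[0-9]*/status; do
    if [[ -r $file ]]; then
        # Collect the three fields and print only processes that actually use swap
        awk '/^Name:/ {name=$2} /^Pid:/ {pid=$2} /^VmSwap:/ {swap=$2} END {if (swap > 0) print name, pid, swap}' "$file"
    fi
done 2>/dev/null | sort -k3 -rn | head -20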
echo
# Swap activity
echo "Recent Swap Activity:"
sar -W 1 5
echo
# Memory pressure
echo "Memory Pressure Indicators:"
cat /proc/pressure/memory
echo
# Recommendations
echo "Recommendations:"
python3 -c "
import psutil
mem = psutil.virtual_memory()
swap = psutil.swap_memory()
if swap.percent > 80:
    print('- CRITICAL: Swap usage exceeds 80%')
    print('- Consider adding more RAM or increasing swap space')
elif swap.percent > 50:
    print('- WARNING: Moderate swap usage')
    print('- Monitor closely and optimize memory usage')
if mem.available < mem.total * 0.1:
    print('- Low available memory - consider memory optimization')
"
Memory Performance Tuning
#!/bin/bash
# memory-performance-tune.sh - Tune memory performance
# Huge pages configuration for databases
configure_hugepages() {
    local pages="${1:-1024}" # Default 2GB (1024 * 2MB)
    echo "Configuring huge pages..."
    # Set number of huge pages
    echo "$pages" > /proc/sys/vm/nr_hugepages
    # Make persistent (overwrite so repeated runs do not accumulate entries)
    echo "vm.nr_hugepages = $pages" > /etc/sysctl.d/99-hugepages.conf
    # Create hugetlbfs mount unless it is already mounted
    mkdir -p /dev/hugepages
    if ! mountpoint -q /dev/hugepages; then
        mount -t hugetlbfs none /dev/hugepages
    fi
    # Add to fstab once
    if ! grep -q '^none /dev/hugepages hugetlbfs' /etc/fstab; then
        echo "none /dev/hugepages hugetlbfs defaults 0 0" >> /etc/fstab
    fi
}
# NUMA optimization
optimize_numa() {
echo "Optimizing NUMA settings..."
# Set NUMA balancing
echo 1 > /proc/sys/kernel/numa_balancing
# Configure zone reclaim
echo 0 > /proc/sys/vm/zone_reclaim_mode
# Display NUMA topology
numactl --hardware
}
# Transparent huge pages for applications
configure_thp() {
local mode="${1:-madvise}" # always, madvise, never
echo "Configuring transparent huge pages: $mode"
echo "$mode" > /sys/kernel/mm/transparent_hugepage/enabled
echo "$mode" > /sys/kernel/mm/transparent_hugepage/defrag
# Make persistent
cat >> /etc/rc.local <<EOF
echo $mode > /sys/kernel/mm/transparent_hugepage/enabled
echo $mode > /sys/kernel/mm/transparent_hugepage/defrag
EOF
}
# Main tuning
echo "=== Memory Performance Tuning ==="
echo
# Detect workload type
if pgrep -x mysqld > /dev/null || pgrep -x postgres > /dev/null; then
echo "Database workload detected"
configure_hugepages 2048 # 4GB
configure_thp "never" # Disable THP for databases
elif pgrep -x java > /dev/null; then
echo "Java application detected"
configure_hugepages 1024 # 2GB
configure_thp "madvise"
else
echo "General workload"
configure_thp "always"
fi
# NUMA optimization if available
if command -v numactl &> /dev/null; then
optimize_numa
fi
echo "Tuning complete"
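The page counts passed to `configure_hugepages` above assume 2 MB huge pages (1024 pages for 2 GB, 2048 for 4 GB). The sketch below derives the count from a target size and the `Hugepagesize` value reported in `/proc/meminfo`, so the same arithmetic holds on systems with a different default huge page size. The function names are illustrative.

```python
import math
import re


def hugepage_size_kb(meminfo_text: str) -> int:
    """Extract the default huge page size in kB from /proc/meminfo contents."""
    match = re.search(r"^Hugepagesize:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("Hugepagesize not found; huge pages may be unsupported")
    return int(match.group(1))


def hugepages_needed(target_mb: int, page_kb: int = 2048) -> int:
    """Number of huge pages required to cover target_mb, rounded up."""
    return math.ceil(target_mb * 1024 / page_kb)
```

For example, `hugepages_needed(4096)` gives the 2048 pages used for the database case, while the same 4 GB target with 1 GB pages (`page_kb=1048576`) needs only 4.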
Best Practices
1. Swap Sizing Guidelines
- Small Systems (≤2GB RAM): 2x RAM
- Medium Systems (4-8GB RAM): Equal to RAM
- Large Systems (16-64GB RAM): 0.5x RAM
- Very Large Systems (>64GB RAM): 16-32GB fixed
- With Hibernation: RAM + 10%
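The sizing guidelines above can be expressed as a small function. This is a sketch under stated assumptions rather than a definitive policy: RAM sizes that fall between the listed tiers are assigned to the next tier up (so the boundaries become 2 GB, 8 GB, and 64 GB), very large systems get the low end of the fixed range (16 GB), and the hibernation rule overrides the tiers. The name `recommended_swap_mb` is illustrative.

```python
def recommended_swap_mb(ram_mb: int, hibernation: bool = False) -> int:
    """Swap size in MB following the tiered guideline (boundaries are assumptions)."""
    if ram_mb <= 0:
        raise ValueError("ram_mb must be positive")
    if hibernation:
        # RAM + 10%, rounded up so the result is never below that target
        return ram_mb + (ram_mb + 9) // 10
    if ram_mb <= 2 * 1024:
        return ram_mb * 2        # small systems: 2x RAM
    if ram_mb <= 8 * 1024:
        return ram_mb            # medium systems: equal to RAM
    if ram_mb <= 64 * 1024:
        return ram_mb // 2       # large systems: 0.5x RAM
    return 16 * 1024             # very large systems: low end of 16-32 GB
```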
2. Swap Type Selection
- Databases: ZRAM or minimal swap
- Web Servers: File swap on SSD
- Compute Workloads: Large file/partition swap
- Containers: ZRAM with memory limits
- Virtual Machines: Disable swap in guest
3. Performance Optimization
- Monitor swap I/O patterns
- Use appropriate swappiness values
- Enable ZSWAP for compression
- Consider ZRAM for low-latency needs
- Regular memory leak detection
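Swap I/O patterns, the first item in the list above, can be derived from the `pswpin` and `pswpout` counters in `/proc/vmstat`, which count pages swapped in and out since boot. A minimal sketch of converting two samples into bytes-per-second rates follows. The helper names are illustrative, and the 4096-byte default page size is an assumption; `os.sysconf("SC_PAGE_SIZE")` reports the actual value on a given host.

```python
from typing import Dict, Tuple


def read_swap_counters(vmstat_text: str) -> Dict[str, int]:
    """Extract the pswpin/pswpout page counters from /proc/vmstat contents."""
    counters = {}
    for line in vmstat_text.splitlines():
        key, _, value = line.partition(" ")
        if key in ("pswpin", "pswpout"):
            counters[key] = int(value)
    return counters


def swap_io_rate(before: Dict[str, int], after: Dict[str, int],
                 interval_s: float, page_size: int = 4096) -> Tuple[float, float]:
    """Return (swap-in, swap-out) rates in bytes per second between two samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rate_in = (after["pswpin"] - before["pswpin"]) * page_size / interval_s
    rate_out = (after["pswpout"] - before["pswpout"]) * page_size / interval_s
    return rate_in, rate_out
```

Sampling `/proc/vmstat` twice, a known interval apart, and passing both results to `swap_io_rate` yields figures comparable to the 1 MB/s threshold used in the health check earlier in this guide.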
4. Monitoring Requirements
- Real-time memory pressure tracking
- Swap usage trends and patterns
- Per-process memory consumption
- OOM killer activity logging
- Predictive analytics for capacity
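Real-time memory pressure tracking, the first requirement above, can build on the kernel's pressure stall information (PSI) in `/proc/pressure/memory`. On kernels with PSI enabled, that file has a `some` line and a `full` line, each carrying `avg10`, `avg60`, `avg300`, and `total` fields. A minimal sketch of turning that text into numbers follows; the function name `parse_psi` is illustrative, and any alert threshold applied to the result is a site-specific choice.

```python
from typing import Dict


def parse_psi(text: str) -> Dict[str, Dict[str, float]]:
    """Parse PSI text into {"some": {...}, "full": {...}} with float values."""
    result: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result
```

For instance, `parse_psi(open("/proc/pressure/memory").read())["some"]["avg10"]` is the percentage of the last 10 seconds in which at least one task stalled on memory.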
Conclusion
Enterprise Linux swap and memory management requires sophisticated strategies that balance performance, reliability, and resource utilization. By implementing comprehensive monitoring, intelligent configuration, and automated optimization frameworks, organizations can ensure optimal memory performance across diverse workloads while preventing out-of-memory conditions and maintaining system stability.
The combination of advanced swap technologies, machine learning-based predictions, and automated management systems provides the foundation for resilient memory management in modern data centers, enabling systems to handle varying workloads efficiently while maintaining predictable performance characteristics.