Complete Linux LVM Management Guide: Advanced Storage Administration and Enterprise Best Practices
Linux Logical Volume Manager (LVM) provides enterprise-grade storage management capabilities, enabling dynamic disk space allocation, advanced snapshot functionality, and flexible storage pool management. This comprehensive guide covers LVM fundamentals, advanced thin provisioning, enterprise deployment strategies, and automation frameworks for production environments.
LVM Architecture and Fundamentals
Storage Stack Overview
LVM operates as an abstraction layer between physical storage devices and filesystem mount points:
┌─────────────────┐
│ Filesystems │ (ext4, XFS, Btrfs)
├─────────────────┤
│ Logical Volumes │ (LV)
├─────────────────┤
│ Volume Groups │ (VG)
├─────────────────┤
│ Physical Volumes│ (PV)
├─────────────────┤
│ Block Devices │ (/dev/sda, /dev/nvme0n1)
└─────────────────┘
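To make the stack concrete, here is a minimal bottom-up sketch that builds each layer on a spare disk (the device and names /dev/sdb, vg_demo, lv_demo, and /mnt/demo are placeholders for illustration):
pvcreate /dev/sdb                       # prepare the block device as a physical volume
vgcreate vg_demo /dev/sdb               # pool the PV into a volume group
lvcreate -L 10G -n lv_demo vg_demo      # carve a logical volume from the VG
mkfs.ext4 /dev/vg_demo/lv_demo          # put a filesystem on the LV
mkdir -p /mnt/demo
mount /dev/vg_demo/lv_demo /mnt/demo    # mount it like any other block device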
Core Components
Physical Volumes (PV)
- Raw storage devices prepared for LVM use
- Can be entire disks, partitions, or RAID devices
- Contain LVM metadata and physical extents
Volume Groups (VG)
- Storage pools combining multiple physical volumes
- Provide unified space management across devices
- Enable dynamic storage allocation
Logical Volumes (LV)
- Virtual partitions allocated from volume group space
- Support dynamic resizing and advanced features
- Can span multiple physical devices
Essential LVM Operations
System Information and Monitoring
# Display physical volume information
pvs # Summary view
pvdisplay # Detailed view
pvdisplay /dev/sdb1 # Specific PV details
# Volume group information
vgs # Summary view
vgdisplay # Detailed view
vgdisplay vg_data # Specific VG details
# Logical volume information
lvs # Summary view
lvdisplay # Detailed view
lvdisplay /dev/vg_data/lv_app # Specific LV details
# Complete LVM overview with size formatting
pvs --units g --separator " | " --aligned
vgs --units g --separator " | " --aligned
lvs --units g --separator " | " --aligned
Advanced Monitoring with Custom Output
# Custom LVM status display
lvs -o +lv_layout,lv_role,lv_when_full,data_percent,metadata_percent \
--separator " | " --aligned
# Monitor thin pool usage
lvs -o lv_name,lv_size,data_percent,metadata_percent,pool_lv \
--select 'lv_layout=~"thin.*"' --units g
# Physical volume space analysis
pvs -o pv_name,pv_size,pv_free,pv_used,vg_name \
--units g --separator " | " --aligned
# Volume group extent information
vgs -o vg_name,vg_size,vg_free,vg_extent_size,vg_extent_count,vg_free_count \
--units g --separator " | " --aligned
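For consumption by scripts, recent LVM releases can also emit structured output. A minimal sketch using JSON reporting (assumes an LVM version that supports --reportformat and, for the second command, jq installed):
# Machine-readable report of all logical volumes
lvs --reportformat json -o lv_name,vg_name,lv_size,data_percent --units b
# Example: list LV names from the JSON report with jq
lvs --reportformat json -o lv_name | jq -r '.report[0].lv[].lv_name'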
Physical Volume and Volume Group Management
Physical Volume Operations
Creating Physical Volumes
# Standard physical volume creation
pvcreate /dev/sdb
# Multiple devices simultaneously
pvcreate /dev/sdb /dev/sdc /dev/sdd
# Advanced options with metadata backup
pvcreate --dataalignment 1M --metadatasize 128M /dev/sdb
# Force creation (overwrite existing data)
pvcreate --force /dev/sdb
# Recreate a PV with a specific UUID (recovery use only; requires a metadata
# backup file, and LVM UUIDs are not uuidgen-format UUIDs)
pvcreate --uuid <original-pv-uuid> --restorefile /etc/lvm/archive/<vg_name>.vg /dev/sdb
Physical Volume Maintenance
# Remove physical volume from LVM
pvremove /dev/sdb
# Move data off physical volume before removal
pvmove /dev/sdb
# Move a specific physical extent range (extents 1000-2000) to another PV
pvmove /dev/sdb:1000-2000 /dev/sdc
# Resize physical volume after underlying device expansion
pvresize /dev/sdb
# Check physical volume integrity
pvck /dev/sdb
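A common use of these operations is replacing a failing disk. A short sketch of the workflow, assuming /dev/sdb is being retired from vg_data and /dev/sdf is its replacement (device names are illustrative):
pvcreate /dev/sdf            # prepare the replacement disk
vgextend vg_data /dev/sdf    # add it to the volume group
pvmove /dev/sdb /dev/sdf     # migrate all extents off the failing PV
vgreduce vg_data /dev/sdb    # drop the old PV from the VG
pvremove /dev/sdb            # wipe its LVM label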
Volume Group Management
Advanced Volume Group Creation
# Standard volume group creation
vgcreate vg_data /dev/sdb /dev/sdc
# Custom extent size (default 4MB)
vgcreate --physicalextentsize 32M vg_data /dev/sdb /dev/sdc
# Maximum logical volumes and physical volumes
vgcreate --maxlogicalvolumes 255 --maxphysicalvolumes 64 vg_data /dev/sdb
# Enable clustering support
vgcreate --clustered y vg_cluster /dev/sdb /dev/sdc
Volume Group Modification
# Add physical volumes to existing VG
vgextend vg_data /dev/sdd /dev/sde
# Remove physical volume from VG
vgreduce vg_data /dev/sde
# Remove missing/failed physical volumes
vgreduce --removemissing vg_data
# Rename volume group
vgrename vg_old vg_new
# Import/export volume groups
vgexport vg_data
vgimport vg_data
# Activate/deactivate volume groups
vgchange -a y vg_data # Activate
vgchange -a n vg_data # Deactivate
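vgexport and vgimport are typically used to move a volume group's disks between servers. A sketch of the sequence (unmount all filesystems on the VG first; vg_data is illustrative):
# On the source host
vgchange -a n vg_data    # deactivate all LVs in the VG
vgexport vg_data         # mark the VG as exported in its metadata
# ...physically move or re-zone the disks...
# On the destination host
pvscan                   # detect the exported PVs
vgimport vg_data         # clear the exported flag
vgchange -a y vg_data    # activate the LVs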
Logical Volume Advanced Management
Standard Logical Volume Operations
Creation and Basic Management
# Create logical volume with specific size
lvcreate -L 100G -n lv_app vg_data
# Create using percentage of VG space
lvcreate -l 50%VG -n lv_database vg_data
# Use all available space
lvcreate -l 100%FREE -n lv_storage vg_data
# Create with specific filesystem
lvcreate -L 50G -n lv_web vg_data
mkfs.ext4 /dev/vg_data/lv_web
# Create and mount in one operation
lvcreate -L 25G -n lv_logs vg_data
mkfs.xfs /dev/vg_data/lv_logs
mkdir -p /var/log/application
mount /dev/vg_data/lv_logs /var/log/application
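To survive a reboot, the mount created above also needs an /etc/fstab entry; a minimal sketch (the mount options shown are illustrative choices):
# Persist the mount across reboots
echo '/dev/vg_data/lv_logs /var/log/application xfs defaults,nofail 0 0' >> /etc/fstab
mount -a    # verify the entry parses and mounts cleanly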
Logical Volume Resizing
# Extend logical volume and filesystem simultaneously
lvextend -L+20G -r /dev/vg_data/lv_app
# Extend to specific size
lvextend -L 150G -r /dev/vg_data/lv_database
# Extend using percentage
lvextend -l +50%FREE -r /dev/vg_data/lv_storage
# Manual filesystem resize (if -r option unavailable)
lvextend -L+10G /dev/vg_data/lv_app
resize2fs /dev/vg_data/lv_app # For ext2/3/4
xfs_growfs /mount/point # For XFS
# Reduce logical volume (ext2/3/4 only - NOT XFS)
umount /dev/vg_data/lv_app
e2fsck -f /dev/vg_data/lv_app
resize2fs /dev/vg_data/lv_app 80G
lvreduce -L 80G /dev/vg_data/lv_app
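On current LVM releases the shrink can also be driven through lvreduce itself, which calls fsadm to resize the filesystem before reducing the LV. A sketch of the equivalent one-step form (still limited to filesystems that support shrinking, so not XFS):
umount /dev/vg_data/lv_app
lvreduce -r -L 80G /dev/vg_data/lv_app   # -r shrinks the ext4 filesystem via fsadm first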
Linear vs Striped Logical Volumes
Striping for Performance
# Create striped logical volume across multiple PVs
lvcreate -L 100G -i 3 -I 64 -n lv_fast_storage vg_data
# Parameters:
# -i 3: Stripe across 3 physical volumes
# -I 64: 64KB stripe size
# Verify striping configuration
lvdisplay -m /dev/vg_data/lv_fast_storage
# Create striped volume on specific PVs
lvcreate -L 50G -i 2 -I 128 -n lv_database vg_data /dev/sdb /dev/sdc
Mirroring for Redundancy
# Create mirrored logical volume
lvcreate -L 100G -m 1 -n lv_critical vg_data
# Mirror with specific physical volumes
lvcreate -L 50G -m 1 -n lv_important vg_data /dev/sdb /dev/sdc
# Convert existing LV to mirrored
lvconvert -m 1 /dev/vg_data/lv_app
# Remove mirroring
lvconvert -m 0 /dev/vg_data/lv_app
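Striping and mirroring can be combined, since LVM exposes the kernel MD RAID personalities directly as segment types. A sketch creating a raid10 volume (assumes at least four PVs with free space in vg_data; sizes are illustrative):
# 2 stripes, each mirrored once (raid10), with a 64KB stripe size
lvcreate --type raid10 -i 2 -m 1 -I 64 -L 100G -n lv_r10 vg_data
# Check layout and initial sync progress
lvs -a -o name,segtype,sync_percent,devices vg_data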
Thin Provisioning Management
Thin Pool Architecture
Thin provisioning enables over-allocation of storage resources, providing space efficiency and advanced snapshot capabilities.
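A quick way to see what over-allocation means in practice is to create thin volumes whose combined virtual size exceeds the pool's physical size. A minimal sketch (names and sizes are illustrative; creation options are covered in detail below):
lvcreate --type thin-pool -L 100G --thinpool pool_demo vg_data
lvcreate -V 80G --thinpool pool_demo -n thin_a vg_data
lvcreate -V 80G --thinpool pool_demo -n thin_b vg_data   # 160G of virtual space on a 100G pool
lvs -o lv_name,lv_size,data_percent vg_data              # virtual sizes vs. actual pool usage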
Thin Pool Creation and Management
# Install required tools
apt update && apt install -y thin-provisioning-tools
# Create thin pool
lvcreate --type thin-pool -L 500G --thinpool pool_main vg_data
# Create thin pool with custom chunk size
lvcreate --type thin-pool -L 1T --chunksize 1M --thinpool pool_ssd vg_data
# Place thin pool metadata on a separate (typically faster) device:
# create the data and metadata LVs first, then combine them with lvconvert
lvcreate -L 500G -n pool_main vg_data /dev/sdb
lvcreate -L 4G -n pool_main_meta vg_data /dev/nvme0n1    # assumes vg_data contains both devices
lvconvert --type thin-pool --poolmetadata vg_data/pool_main_meta vg_data/pool_main
# Display thin pool information
lvs -o +chunk_size,data_percent,metadata_percent vg_data/pool_main
Thin Volume Operations
# Create thin logical volumes
lvcreate -V 100G --thinpool pool_main -n thin_app vg_data
lvcreate -V 200G --thinpool pool_main -n thin_database vg_data
lvcreate -V 50G --thinpool pool_main -n thin_web vg_data
# Monitor thin pool usage
watch -n 5 'lvs -o lv_name,lv_size,data_percent,metadata_percent \
--select "lv_layout=~thin.*" --units g'
# Extend thin pool when approaching capacity
lvextend -L+100G vg_data/pool_main
# Extend thin pool metadata
lvextend --poolmetadatasize +1G vg_data/pool_main
Advanced Thin Pool Configuration
Automated Thin Pool Management
#!/bin/bash
# Automated thin pool monitoring and extension script
THIN_POOL="vg_data/pool_main"
DATA_THRESHOLD=80
META_THRESHOLD=70
EXTEND_SIZE="50G"
LOGFILE="/var/log/lvm-thin-monitor.log"
log_message() {
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | tee -a "$LOGFILE"
}
check_thin_pool_usage() {
local pool="$1"
local data_usage=$(lvs --noheadings -o data_percent "$pool" | tr -d ' %')
local meta_usage=$(lvs --noheadings -o metadata_percent "$pool" | tr -d ' %')
if [[ $(echo "$data_usage > $DATA_THRESHOLD" | bc -l) -eq 1 ]]; then
log_message "WARNING: Thin pool data usage at ${data_usage}%, extending by $EXTEND_SIZE"
if lvextend -L+$EXTEND_SIZE "$pool"; then
log_message "SUCCESS: Extended thin pool data by $EXTEND_SIZE"
else
log_message "ERROR: Failed to extend thin pool data"
return 1
fi
fi
if [[ $(echo "$meta_usage > $META_THRESHOLD" | bc -l) -eq 1 ]]; then
log_message "WARNING: Thin pool metadata usage at ${meta_usage}%, extending by 1G"
if lvextend --poolmetadatasize +1G "$pool"; then
log_message "SUCCESS: Extended thin pool metadata by 1G"
else
log_message "ERROR: Failed to extend thin pool metadata"
return 1
fi
fi
log_message "INFO: Thin pool usage - Data: ${data_usage}%, Metadata: ${meta_usage}%"
}
# Monitor thin pool
check_thin_pool_usage "$THIN_POOL"
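LVM also ships a built-in safety net for this: dmeventd can auto-extend monitored thin pools based on thresholds in lvm.conf, which complements or replaces the script above. A sketch of the relevant settings, with illustrative thresholds:
# In the activation { } section of /etc/lvm/lvm.conf:
#   thin_pool_autoextend_threshold = 80   # extend when the pool passes 80% full
#   thin_pool_autoextend_percent = 20     # grow it by 20% each time
# Ensure the pool is monitored by dmeventd
lvchange --monitor y vg_data/pool_main
lvs -o lv_name,seg_monitor vg_data/pool_main   # should report "monitored"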
Thin Pool Performance Optimization
# Configure thin pool for optimal performance
# LVM has no conf.d drop-in directory; merge these settings into the
# allocation and devices sections of /etc/lvm/lvm.conf instead
allocation {
    thin_pool_metadata_require_separate_pvs = 1
    thin_pool_zero = 0
    thin_pool_chunk_size_policy = "generic"
    thin_pool_chunk_size = 1024    # in KiB (1 MiB chunks)
}
devices {
    issue_discards = 1
}
# Set kernel parameters for thin provisioning
echo 'vm.vfs_cache_pressure = 50' >> /etc/sysctl.conf
echo 'vm.dirty_background_ratio = 5' >> /etc/sysctl.conf
echo 'vm.dirty_ratio = 10' >> /etc/sysctl.conf
sysctl -p
LVM Snapshot Management
Traditional LVM Snapshots
Snapshot Creation and Management
# Create traditional snapshot with specific size
lvcreate -L 10G -s -n snap_app_backup /dev/vg_data/lv_app
# Create snapshot using percentage of origin size
lvcreate -l 20%ORIGIN -s -n snap_database_maint /dev/vg_data/lv_database
# List snapshots with their origin (snapshot LVs are the ones with an origin set)
lvs -o lv_name,lv_size,origin,snap_percent --select 'origin!=""'
# Monitor snapshot copy-on-write usage across the VG
watch -n 5 'lvs -o lv_name,origin,snap_percent vg_data'
Snapshot Operations
# Mount snapshot for backup operations
mkdir -p /mnt/snapshots/app_backup
mount /dev/vg_data/snap_app_backup /mnt/snapshots/app_backup
# Create backup from snapshot
tar -czf /backup/app_$(date +%Y%m%d_%H%M%S).tar.gz \
-C /mnt/snapshots/app_backup .
# Merge snapshot back into the origin, reverting the origin to the snapshot state
# (if the origin is in use, the merge is deferred until its next activation)
umount /mnt/snapshots/app_backup
lvconvert --merge /dev/vg_data/snap_app_backup
# Remove snapshot
lvremove /dev/vg_data/snap_app_backup
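A fixed-size snapshot is invalidated once its copy-on-write space fills up, so long-running backups should either size the snapshot generously, extend it as usage climbs, or enable auto-extension. A sketch, with illustrative thresholds:
# Check how full the snapshot is and grow it before it reaches 100%
lvs -o lv_name,origin,snap_percent vg_data/snap_app_backup
lvextend -L+5G /dev/vg_data/snap_app_backup
# Or let dmeventd grow monitored snapshots automatically via lvm.conf (activation section):
#   snapshot_autoextend_threshold = 70
#   snapshot_autoextend_percent = 20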
Thin Snapshots
Advanced Thin Snapshot Operations
# Create thin snapshot (no size argument; changed blocks are allocated
# from the thin pool as the snapshot and origin diverge)
lvcreate -s -n thin_snap_app vg_data/thin_app
# Create multiple snapshots for version control
lvcreate -s -n snap_app_v1.0 vg_data/thin_app
lvcreate -s -n snap_app_v1.1 vg_data/thin_app
lvcreate -s -n snap_app_v1.2 vg_data/thin_app
# Activate a specific snapshot (thin snapshots are created with the
# "activation skip" flag, so plain -a y is ignored; use -K, or create with -kn)
lvchange -ay -K vg_data/snap_app_v1.0
# Create read-write snapshot for testing
lvcreate -s -n test_environment vg_data/thin_app
lvchange -ay -K vg_data/test_environment
mkdir -p /mnt/test_env
mount /dev/vg_data/test_environment /mnt/test_env
Automated Snapshot Management
#!/usr/bin/env python3
"""
Enterprise LVM Snapshot Management System
"""
import subprocess
import json
import datetime
import logging
from pathlib import Path
from typing import List, Dict, Optional
class LVMSnapshotManager:
def __init__(self, config_file: str = "/etc/lvm-snapshot.conf"):
self.config = self.load_config(config_file)
self.logger = logging.getLogger(__name__)
def load_config(self, config_file: str) -> Dict:
"""Load snapshot configuration"""
default_config = {
"retention_days": 7,
"max_snapshots": 10,
"snapshot_prefix": "auto_snap",
"volumes": [],
"backup_location": "/backup/snapshots"
}
try:
with open(config_file, 'r') as f:
config = json.load(f)
return {**default_config, **config}
except FileNotFoundError:
return default_config
def create_snapshot(self, volume_path: str, snapshot_name: str = None) -> bool:
"""Create LVM snapshot"""
if not snapshot_name:
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
volume_name = volume_path.split('/')[-1]
snapshot_name = f"{self.config['snapshot_prefix']}_{volume_name}_{timestamp}"
try:
# Create thin snapshot
result = subprocess.run([
'lvcreate', '-s', '-n', snapshot_name, volume_path
], capture_output=True, text=True, check=True)
self.logger.info(f"Created snapshot {snapshot_name} for {volume_path}")
return True
except subprocess.CalledProcessError as e:
self.logger.error(f"Failed to create snapshot: {e.stderr}")
return False
def list_snapshots(self, volume_path: str = None) -> List[Dict]:
"""List existing snapshots"""
try:
cmd = ['lvs', '--noheadings', '-o', 'lv_name,origin,lv_time',
'--select', 'lv_layout=snapshot', '--separator', '|']
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
snapshots = []
for line in result.stdout.strip().split('\n'):
if line.strip():
name, origin, time_created = line.strip().split('|')
snapshots.append({
'name': name.strip(),
'origin': origin.strip(),
'created': time_created.strip()
})
if volume_path:
volume_name = volume_path.split('/')[-1]
snapshots = [s for s in snapshots if s['origin'] == volume_name]
return snapshots
except subprocess.CalledProcessError as e:
self.logger.error(f"Failed to list snapshots: {e.stderr}")
return []
def cleanup_old_snapshots(self, volume_path: str) -> None:
"""Remove old snapshots based on retention policy"""
snapshots = self.list_snapshots(volume_path)
volume_name = volume_path.split('/')[-1]
vg_name = volume_path.split('/')[2]
# Sort by creation time (newest first)
snapshots.sort(key=lambda x: x['created'], reverse=True)
# Keep only the specified number of snapshots
snapshots_to_remove = snapshots[self.config['max_snapshots']:]
for snapshot in snapshots_to_remove:
try:
subprocess.run([
'lvremove', '-f', f"{vg_name}/{snapshot['name']}"
], check=True)
self.logger.info(f"Removed old snapshot: {snapshot['name']}")
except subprocess.CalledProcessError as e:
self.logger.error(f"Failed to remove snapshot {snapshot['name']}: {e}")
def backup_snapshot(self, snapshot_path: str, backup_name: str = None) -> bool:
"""Create backup from snapshot"""
if not backup_name:
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
backup_name = f"backup_{timestamp}.tar.gz"
backup_dir = Path(self.config['backup_location'])
backup_dir.mkdir(parents=True, exist_ok=True)
backup_file = backup_dir / backup_name
mount_point = Path(f"/mnt/snapshot_backup_{datetime.datetime.now().strftime('%Y%m%d_%H%M%S')}")
try:
# Create temporary mount point
mount_point.mkdir(parents=True, exist_ok=True)
# Mount snapshot
subprocess.run(['mount', snapshot_path, str(mount_point)], check=True)
# Create backup
subprocess.run([
'tar', '-czf', str(backup_file),
'-C', str(mount_point), '.'
], check=True)
self.logger.info(f"Created backup: {backup_file}")
return True
except subprocess.CalledProcessError as e:
self.logger.error(f"Backup failed: {e}")
return False
finally:
# Cleanup
try:
subprocess.run(['umount', str(mount_point)], check=False)
mount_point.rmdir()
except:
pass
def automated_snapshot_routine(self) -> None:
"""Perform automated snapshot routine for all configured volumes"""
for volume in self.config['volumes']:
volume_path = volume['path']
self.logger.info(f"Processing volume: {volume_path}")
# Create new snapshot
if self.create_snapshot(volume_path):
# Cleanup old snapshots
self.cleanup_old_snapshots(volume_path)
# Create backup if configured
if volume.get('backup', False):
snapshots = self.list_snapshots(volume_path)
if snapshots:
newest_snapshot = snapshots[0]
vg_name = volume_path.split('/')[2]
snapshot_path = f"/dev/{vg_name}/{newest_snapshot['name']}"
self.backup_snapshot(snapshot_path)
# Example configuration file
example_config = {
"retention_days": 7,
"max_snapshots": 5,
"snapshot_prefix": "auto_snap",
"backup_location": "/backup/lvm_snapshots",
"volumes": [
{
"path": "/dev/vg_data/lv_app",
"backup": True
},
{
"path": "/dev/vg_data/lv_database",
"backup": True
},
{
"path": "/dev/vg_data/lv_web",
"backup": False
}
]
}
# Usage example
if __name__ == "__main__":
logging.basicConfig(level=logging.INFO)
manager = LVMSnapshotManager()
manager.automated_snapshot_routine()
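To run the routine unattended, the script can be scheduled with cron (or a systemd timer). A minimal sketch assuming it is installed as /usr/local/sbin/lvm-snapshot-manager.py (the path and schedule are illustrative):
# Nightly snapshot run at 01:30, logging to a file
echo '30 1 * * * root /usr/bin/python3 /usr/local/sbin/lvm-snapshot-manager.py >> /var/log/lvm-snapshot.log 2>&1' \
> /etc/cron.d/lvm-snapshots
chmod 644 /etc/cron.d/lvm-snapshots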
XFS Filesystem Considerations
XFS Snapshot Mounting Issues
XFS snapshots need special handling because a snapshot carries the same filesystem UUID as its origin, and XFS refuses to mount two filesystems with the same UUID at the same time:
UUID Conflict Resolution
# Method 1: Skip the UUID check at mount time (temporary workaround)
mount -o nouuid -t xfs /dev/vg_data/snap_xfs_volume /mnt/snapshot
# Method 2: Regenerate UUID (permanent fix)
# First clear the snapshot's (possibly dirty) XFS log; xfs_admin refuses to
# change the UUID while the log holds unreplayed changes (-L discards them)
xfs_repair -L /dev/vg_data/snap_xfs_volume
# Generate new UUID
xfs_admin -U $(uuidgen) /dev/vg_data/snap_xfs_volume
# Verify new UUID
xfs_admin -u /dev/vg_data/snap_xfs_volume
# Mount normally
mount -t xfs /dev/vg_data/snap_xfs_volume /mnt/snapshot
Automated XFS Snapshot Handling
#!/bin/bash
# XFS snapshot mount helper script
mount_xfs_snapshot() {
local snapshot_device="$1"
local mount_point="$2"
local force_uuid_regen="${3:-false}"
if [[ -z "$snapshot_device" || -z "$mount_point" ]]; then
echo "Usage: mount_xfs_snapshot <device> <mount_point> [force_uuid_regen]"
return 1
fi
# Create mount point if it doesn't exist
mkdir -p "$mount_point"
# Try normal mount first
if mount -t xfs "$snapshot_device" "$mount_point" 2>/dev/null; then
echo "Successfully mounted $snapshot_device to $mount_point"
return 0
fi
# Try nouuid mount
if mount -o nouuid -t xfs "$snapshot_device" "$mount_point" 2>/dev/null; then
echo "Mounted $snapshot_device to $mount_point with nouuid option"
return 0
fi
# If force UUID regeneration is requested
if [[ "$force_uuid_regen" == "true" ]]; then
echo "Regenerating UUID for $snapshot_device"
# Repair filesystem
if xfs_repair -L "$snapshot_device"; then
echo "XFS repair completed"
else
echo "XFS repair failed"
return 1
fi
# Generate new UUID
if xfs_admin -U "$(uuidgen)" "$snapshot_device"; then
echo "UUID regenerated successfully"
else
echo "UUID regeneration failed"
return 1
fi
# Mount with new UUID
if mount -t xfs "$snapshot_device" "$mount_point"; then
echo "Successfully mounted with new UUID"
return 0
fi
fi
echo "Failed to mount XFS snapshot $snapshot_device"
return 1
}
# Example usage
# mount_xfs_snapshot /dev/vg_data/snap_xfs_app /mnt/snapshots/app_backup true
Enterprise LVM Deployment Strategies
High Availability LVM Configuration
Clustered LVM Setup
# Install cluster tools (legacy cman/clvmd stack; newer LVM releases use
# lvmlockd instead, sketched after this setup)
apt install -y cman clvm
# Configure cluster.conf
cat > /etc/cluster/cluster.conf << 'EOF'
<?xml version="1.0"?>
<cluster name="storage_cluster" config_version="1">
<cman expected_votes="3" two_node="0"/>
<clusternodes>
<clusternode name="node1" nodeid="1">
<fence>
<method name="single">
<device name="node1_fence" port="1"/>
</method>
</fence>
</clusternode>
<clusternode name="node2" nodeid="2">
<fence>
<method name="single">
<device name="node2_fence" port="2"/>
</method>
</fence>
</clusternode>
<clusternode name="node3" nodeid="3">
<fence>
<method name="single">
<device name="node3_fence" port="3"/>
</method>
</fence>
</clusternode>
</clusternodes>
</cluster>
EOF
# Enable clustered LVM
sed -i 's/locking_type = 1/locking_type = 3/' /etc/lvm/lvm.conf
systemctl enable cman clvm
systemctl start cman clvm
# Create clustered volume group
vgcreate --clustered y vg_cluster /dev/shared_storage
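The cman/clvmd stack shown above is the legacy approach; current LVM releases implement shared volume groups with lvmlockd. A sketch of the lvmlockd/sanlock variant (package and service names vary between distributions):
apt install -y lvm2-lockd sanlock             # package names are distribution-specific
# Enable lvmlockd in /etc/lvm/lvm.conf: use_lvmlockd = 1
systemctl enable --now lvmlockd sanlock
# Create and start a shared VG on storage visible to all nodes
vgcreate --shared vg_shared /dev/shared_storage
vgchange --lock-start vg_shared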
Storage Tiering with LVM
#!/bin/bash
# Automated storage tiering implementation
setup_tiered_storage() {
local vg_name="$1"
local ssd_devices="$2"
local hdd_devices="$3"
# Create volume group with mixed storage
vgcreate "$vg_name" $ssd_devices $hdd_devices
# Create cache pool on SSD
lvcreate --type cache-pool -L 100G -n cache_pool "$vg_name" $ssd_devices
# Create data volume on HDD
lvcreate -L 1T -n data_volume "$vg_name" $hdd_devices
# Attach cache to data volume
lvconvert --type cache --cachepool "$vg_name/cache_pool" "$vg_name/data_volume"
echo "Tiered storage setup completed for $vg_name"
}
# Usage example
# setup_tiered_storage "vg_tiered" "/dev/nvme0n1 /dev/nvme0n2" "/dev/sda /dev/sdb /dev/sdc"
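Once a cache is attached, its effectiveness can be checked from the reporting fields, and it can be detached again without touching the data. A sketch continuing the commented vg_tiered example (field availability varies slightly between LVM versions):
# Inspect cache utilisation and hit rates
lvs -o lv_name,cache_total_blocks,cache_used_blocks,cache_dirty_blocks,cache_read_hits,cache_read_misses vg_tiered/data_volume
# Flush and detach the cache, keeping the origin data intact
lvconvert --uncache vg_tiered/data_volume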
Performance Optimization
I/O Scheduler Configuration
# Configure I/O schedulers for different storage types
cat > /etc/udev/rules.d/60-lvm-scheduler.rules << 'EOF'
# SSD optimization
ACTION=="add|change", KERNEL=="nvme*", ATTR{queue/scheduler}="none"
ACTION=="add|change", KERNEL=="sd*", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="mq-deadline"
# HDD optimization
ACTION=="add|change", KERNEL=="sd*", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"
EOF
# Apply udev rules
udevadm control --reload-rules
udevadm trigger
LVM Performance Tuning
# Optimize LVM configuration for performance
# LVM reads /etc/lvm/lvm.conf (plus lvmlocal.conf and profiles under
# /etc/lvm/profile/); it has no conf.d drop-in directory, so merge the
# following settings into the matching sections of /etc/lvm/lvm.conf:
devices {
    # Reduce device scanning overhead
    scan = [ "/dev/disk/by-id", "/dev/mapper", "/dev/md" ]
    # Enable multipath support
    multipath_component_detection = 1
    # Optimize alignment detection for large storage arrays
    data_alignment_detection = 1
    data_alignment_offset_detection = 1
}
allocation {
    # Thin pool optimization
    thin_pool_metadata_require_separate_pvs = 1
    thin_pool_zero = 0
}
activation {
    # RAID region size and memory locking belong in the activation section
    raid_region_size = 2048
    use_mlockall = 1
}
global {
    use_aio = 1
    locking_dir = "/run/lock/lvm"
}
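After editing lvm.conf it is worth confirming that the file still parses and checking which values are actually in effect; a short sketch:
# Validate the configuration syntax
lvmconfig --validate
# Show the effective (merged) value of a specific setting
lvmconfig allocation/thin_pool_zero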
Monitoring and Alerting
Comprehensive LVM Monitoring
Prometheus Monitoring Integration
#!/usr/bin/env python3
"""
LVM Metrics Exporter for Prometheus
"""
import subprocess
import time
import re
from prometheus_client import start_http_server, Gauge, Info
class LVMMetricsExporter:
def __init__(self, port=9100):
self.port = port
# Define metrics
self.pv_size = Gauge('lvm_pv_size_bytes', 'Physical volume size in bytes', ['pv_name', 'vg_name'])
self.pv_free = Gauge('lvm_pv_free_bytes', 'Physical volume free space in bytes', ['pv_name', 'vg_name'])
self.vg_size = Gauge('lvm_vg_size_bytes', 'Volume group size in bytes', ['vg_name'])
self.vg_free = Gauge('lvm_vg_free_bytes', 'Volume group free space in bytes', ['vg_name'])
self.lv_size = Gauge('lvm_lv_size_bytes', 'Logical volume size in bytes', ['lv_name', 'vg_name'])
self.lv_data_percent = Gauge('lvm_lv_data_percent', 'Thin pool data usage percent', ['lv_name', 'vg_name'])
self.lv_metadata_percent = Gauge('lvm_lv_metadata_percent', 'Thin pool metadata usage percent', ['lv_name', 'vg_name'])
self.lvm_info = Info('lvm_info', 'LVM system information')
def collect_pv_metrics(self):
"""Collect physical volume metrics"""
try:
result = subprocess.run([
'pvs', '--noheadings', '--units', 'b', '--nosuffix',
'-o', 'pv_name,vg_name,pv_size,pv_free'
], capture_output=True, text=True, check=True)
for line in result.stdout.strip().split('\n'):
if line.strip():
parts = line.strip().split()
if len(parts) >= 4:
pv_name, vg_name, pv_size, pv_free = parts[:4]
self.pv_size.labels(pv_name=pv_name, vg_name=vg_name).set(float(pv_size))
self.pv_free.labels(pv_name=pv_name, vg_name=vg_name).set(float(pv_free))
except subprocess.CalledProcessError as e:
print(f"Error collecting PV metrics: {e}")
def collect_vg_metrics(self):
"""Collect volume group metrics"""
try:
result = subprocess.run([
'vgs', '--noheadings', '--units', 'b', '--nosuffix',
'-o', 'vg_name,vg_size,vg_free'
], capture_output=True, text=True, check=True)
for line in result.stdout.strip().split('\n'):
if line.strip():
parts = line.strip().split()
if len(parts) >= 3:
vg_name, vg_size, vg_free = parts[:3]
self.vg_size.labels(vg_name=vg_name).set(float(vg_size))
self.vg_free.labels(vg_name=vg_name).set(float(vg_free))
except subprocess.CalledProcessError as e:
print(f"Error collecting VG metrics: {e}")
def collect_lv_metrics(self):
"""Collect logical volume metrics"""
try:
result = subprocess.run([
'lvs', '--noheadings', '--units', 'b', '--nosuffix',
'-o', 'lv_name,vg_name,lv_size,data_percent,metadata_percent'
], capture_output=True, text=True, check=True)
for line in result.stdout.strip().split('\n'):
if line.strip():
parts = line.strip().split()
if len(parts) >= 3:
lv_name, vg_name, lv_size = parts[:3]
self.lv_size.labels(lv_name=lv_name, vg_name=vg_name).set(float(lv_size))
# Handle optional thin pool metrics
if len(parts) >= 4 and parts[3] != '':
data_percent = float(parts[3])
self.lv_data_percent.labels(lv_name=lv_name, vg_name=vg_name).set(data_percent)
if len(parts) >= 5 and parts[4] != '':
metadata_percent = float(parts[4])
self.lv_metadata_percent.labels(lv_name=lv_name, vg_name=vg_name).set(metadata_percent)
except subprocess.CalledProcessError as e:
print(f"Error collecting LV metrics: {e}")
def collect_all_metrics(self):
"""Collect all LVM metrics"""
self.collect_pv_metrics()
self.collect_vg_metrics()
self.collect_lv_metrics()
def start_server(self):
"""Start metrics server"""
start_http_server(self.port)
print(f"LVM metrics server started on port {self.port}")
while True:
self.collect_all_metrics()
time.sleep(30) # Collect metrics every 30 seconds
if __name__ == "__main__":
exporter = LVMMetricsExporter()
exporter.start_server()
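To keep the exporter running across reboots it can be wrapped in a systemd service. A minimal sketch assuming the script above is installed as /usr/local/bin/lvm_exporter.py (the unit name and paths are illustrative):
cat > /etc/systemd/system/lvm-exporter.service << 'EOF'
[Unit]
Description=LVM metrics exporter for Prometheus
After=network.target

[Service]
ExecStart=/usr/bin/python3 /usr/local/bin/lvm_exporter.py
Restart=on-failure
User=root

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now lvm-exporter.service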
Grafana Dashboard Configuration
{
"dashboard": {
"title": "LVM Storage Monitoring",
"panels": [
{
"title": "Volume Group Usage",
"type": "stat",
"targets": [
{
"expr": "(lvm_vg_size_bytes - lvm_vg_free_bytes) / lvm_vg_size_bytes * 100",
"legendFormat": "{{vg_name}} Usage %"
}
]
},
{
"title": "Thin Pool Data Usage",
"type": "graph",
"targets": [
{
"expr": "lvm_lv_data_percent",
"legendFormat": "{{vg_name}}/{{lv_name}} Data %"
}
]
},
{
"title": "Thin Pool Metadata Usage",
"type": "graph",
"targets": [
{
"expr": "lvm_lv_metadata_percent",
"legendFormat": "{{vg_name}}/{{lv_name}} Metadata %"
}
]
}
]
}
}
Automated Alert System
Prometheus Alerting Rules
# LVM alerting rules evaluated by Prometheus (fired alerts are routed through Alertmanager)
groups:
- name: lvm.rules
rules:
- alert: VolumeGroupHighUsage
expr: (lvm_vg_size_bytes - lvm_vg_free_bytes) / lvm_vg_size_bytes * 100 > 85
for: 5m
labels:
severity: warning
annotations:
summary: "Volume group {{ $labels.vg_name }} usage is high"
description: "Volume group {{ $labels.vg_name }} is {{ $value }}% full"
- alert: ThinPoolDataCritical
expr: lvm_lv_data_percent > 90
for: 2m
labels:
severity: critical
annotations:
summary: "Thin pool {{ $labels.vg_name }}/{{ $labels.lv_name }} data usage critical"
description: "Thin pool data usage is {{ $value }}%"
- alert: ThinPoolMetadataCritical
expr: lvm_lv_metadata_percent > 80
for: 2m
labels:
severity: critical
annotations:
summary: "Thin pool {{ $labels.vg_name }}/{{ $labels.lv_name }} metadata usage critical"
description: "Thin pool metadata usage is {{ $value }}%"
- alert: LVMExporterDown
expr: up{job="lvm-exporter"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "LVM monitoring down"
description: "LVM metrics collection is not working"
Disaster Recovery and Backup Strategies
LVM Metadata Backup
Automated Metadata Protection
#!/bin/bash
# LVM metadata backup and recovery script
BACKUP_DIR="/etc/lvm/backup"
ARCHIVE_DIR="/etc/lvm/archive"
REMOTE_BACKUP="/backup/lvm-metadata"
RETENTION_DAYS=30
backup_lvm_metadata() {
local timestamp=$(date +%Y%m%d_%H%M%S)
local backup_file="lvm_metadata_${timestamp}.tar.gz"
echo "Creating LVM metadata backup..."
# Create backup directory
mkdir -p "$REMOTE_BACKUP"
# Force metadata backup
vgcfgbackup
# Create compressed archive
tar -czf "$REMOTE_BACKUP/$backup_file" \
-C /etc/lvm backup archive
# Back up the first 4 KiB of each PV (the LVM label lives in the first few sectors)
pvs --noheadings -o pv_name | while read pv; do
pv_clean=$(echo "$pv" | tr '/' '_')
dd if="$pv" bs=512 count=8 \
of="$REMOTE_BACKUP/pv_label_${pv_clean}_${timestamp}.bin" 2>/dev/null
done
echo "Metadata backup completed: $backup_file"
}
restore_lvm_metadata() {
local vg_name="$1"
local backup_file="$2"
if [[ -z "$vg_name" || -z "$backup_file" ]]; then
echo "Usage: restore_lvm_metadata <vg_name> <backup_file>"
return 1
fi
echo "Restoring metadata for volume group: $vg_name"
# Restore from backup file
vgcfgrestore -f "$backup_file" "$vg_name"
# Activate volume group
vgchange -a y "$vg_name"
echo "Metadata restoration completed for $vg_name"
}
cleanup_old_backups() {
find "$REMOTE_BACKUP" -name "lvm_metadata_*.tar.gz" \
-mtime +$RETENTION_DAYS -delete
find "$REMOTE_BACKUP" -name "pv_label_*.bin" \
-mtime +$RETENTION_DAYS -delete
echo "Cleaned up backups older than $RETENTION_DAYS days"
}
# Automated daily backup
backup_lvm_metadata
cleanup_old_backups
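Restoring usually starts from the copies LVM itself keeps under /etc/lvm/archive. A sketch of recovering a VG after accidental metadata damage (the angle-bracket names are placeholders for the real UUID and archive file, not literal paths):
# List the archived metadata versions for the VG
vgcfgrestore --list vg_data
# If a PV was wiped, recreate it with its original UUID from the archive
pvcreate --uuid <original-pv-uuid> --restorefile /etc/lvm/archive/<vg_data_XXXXX.vg> /dev/sdb
# Restore the VG metadata and reactivate
vgcfgrestore -f /etc/lvm/archive/<vg_data_XXXXX.vg> vg_data
vgchange -a y vg_data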
Cross-Site Replication
#!/bin/bash
# LVM replication setup for disaster recovery
setup_lvm_replication() {
local source_vg="$1"
local target_host="$2"
local target_vg="$3"
# Create replication logical volume
lvcreate --type raid1 -m 1 -L 100G -n replicated_data "$source_vg"
# Setup DRBD replication
cat > /etc/drbd.d/lvm_replication.res << EOF
resource lvm_replication {
device /dev/drbd0;
disk /dev/${source_vg}/replicated_data;
meta-disk internal;
on $(hostname) {
address $(hostname -I | awk '{print $1}'):7788;
}
on ${target_host} {
address ${target_host}:7788;
}
}
EOF
# Initialize DRBD
drbdadm create-md lvm_replication
drbdadm up lvm_replication
drbdadm primary --force lvm_replication
echo "LVM replication setup completed"
}
# Monitor replication status
monitor_replication() {
while true; do
echo "=== DRBD Replication Status ==="
drbdadm status
echo "==============================="
sleep 60
done
}
This comprehensive LVM guide provides enterprise-grade storage management capabilities, enabling administrators to implement robust, scalable, and highly available storage solutions in production environments. Regular monitoring, automated management, and proper backup procedures ensure reliable storage operations across diverse infrastructure requirements.