Backup and Recovery Procedures

This document outlines the backup and recovery procedures for SyncApp.

Backup Strategy

SyncApp requires backups of the following components:

  1. PostgreSQL database
  2. Media files and uploads
  3. Configuration files
  4. Celery task state (if applicable)

Database Backups

Automated Daily Backups

A daily backup script /scripts/backup_db.sh is configured to run via cron:

#!/bin/bash
# /scripts/backup_db.sh

# Abort on the first failed command so a failed dump is never logged as a success
set -euo pipefail

# Configuration
DB_NAME="syncapp"
DB_USER="syncapp_user"
BACKUP_DIR="/var/backups/syncapp/database"
RETENTION_DAYS=14
DATE=$(date +%Y%m%d%H%M)

# Ensure backup directory exists
mkdir -p "$BACKUP_DIR"

# Create backup in PostgreSQL custom format (-Fc), which is already compressed
pg_dump -Fc -U "$DB_USER" "$DB_NAME" > "$BACKUP_DIR/syncapp_db_$DATE.dump"

# Compress backup (small additional gain; the other scripts expect the .dump.gz name)
gzip "$BACKUP_DIR/syncapp_db_$DATE.dump"

# Remove backups older than the retention period
find "$BACKUP_DIR" -name "syncapp_db_*.dump.gz" -mtime +"$RETENTION_DAYS" -delete

# Log success
echo "Database backup completed: $BACKUP_DIR/syncapp_db_$DATE.dump.gz"

Set up a cron job to run this script daily:

# Add to crontab
0 2 * * * /scripts/backup_db.sh >> /var/log/syncapp/backup.log 2>&1
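
Because cron appends all output to one log file, it can help to timestamp each step and record its exit status. The following is a minimal sketch, not part of the existing scripts; `log_step` and `run_step` are hypothetical helper names introduced here for illustration.

```shell
#!/bin/bash
# Hypothetical logging helpers (illustrative names, not part of SyncApp).

log_step() {
    # Prefix each message with a local timestamp.
    printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$*"
}

run_step() {
    # Usage: run_step "description" command [args...]
    # Runs the command, logs OK or FAILED, and returns the command's exit status.
    local description=$1
    shift
    if "$@"; then
        log_step "OK: $description"
    else
        local status=$?
        log_step "FAILED ($status): $description"
        return "$status"
    fi
}
```

A backup script could then wrap each command, for example `run_step "create backup directory" mkdir -p "$BACKUP_DIR"`, so that the log shows which step failed and when.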

Point-in-Time Recovery

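For PostgreSQL, enable WAL (Write-Ahead Log) archiving for point-in-time recovery. Archived WAL is replayed on top of a physical base backup (for example, one taken with pg_basebackup); the pg_dump files from the daily script cannot be used as the starting point.

  1. Configure postgresql.conf (changing archive_mode requires a server restart):
wal_level = replica
archive_mode = on
archive_command = 'cp %p /var/backups/syncapp/wal_archive/%f'
  2. Set up WAL archive cleanup:
#!/bin/bash
# /scripts/cleanup_wal.sh

WAL_DIR="/var/backups/syncapp/wal_archive"
RETENTION_DAYS=3

# Remove WAL files older than the retention period.
# The archive_command above stores segments uncompressed, so match all files, not "*.gz".
# Keep every segment written since the oldest base backup you may need to restore from.
find "$WAL_DIR" -type f -mtime +"$RETENTION_DAYS" -delete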
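
A plain `cp` in archive_command silently overwrites a segment that is already in the archive. The PostgreSQL documentation recommends an archive command that refuses to overwrite existing files. The sketch below shows one way to do that in a small script; the function name `archive_wal` and the script path in the comment are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical archive helper (archive_wal is an illustrative name).
# Assumed use in postgresql.conf, with a wrapper script that calls
# archive_wal "$1" "$2":
#   archive_command = '/scripts/archive_wal.sh %p %f'

archive_wal() {
    # $1: path of the WAL segment to archive (%p)
    # $2: file name of the segment (%f)
    # $3: archive directory (defaults to the directory used in this document)
    local src=$1 name=$2 dest_dir=${3:-/var/backups/syncapp/wal_archive}

    # Refuse to overwrite: a non-zero exit tells PostgreSQL that archiving failed.
    if [ -e "$dest_dir/$name" ]; then
        echo "refusing to overwrite $dest_dir/$name" >&2
        return 1
    fi

    # Copy under a temporary name, then rename, so a partial copy is never
    # mistaken for a complete segment.
    cp "$src" "$dest_dir/$name.tmp" && mv "$dest_dir/$name.tmp" "$dest_dir/$name"
}
```

PostgreSQL retries a segment whose archive command returned non-zero, so a refused overwrite shows up in the server log rather than corrupting the archive.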

Media and Static Files Backup

Media files should be backed up regularly:

#!/bin/bash
# /scripts/backup_media.sh

# Abort on the first failed command so a failed archive is never logged as a success
set -euo pipefail

MEDIA_DIR="/var/www/syncapp/media"
BACKUP_DIR="/var/backups/syncapp/media"
DATE=$(date +%Y%m%d%H%M)
RETENTION_DAYS=14

# Ensure backup directory exists
mkdir -p "$BACKUP_DIR"

# Create backup (tar strips the leading "/" from member names, so restore with -C /)
tar -czf "$BACKUP_DIR/syncapp_media_$DATE.tar.gz" "$MEDIA_DIR"

# Remove backups older than the retention period
find "$BACKUP_DIR" -name "syncapp_media_*.tar.gz" -mtime +"$RETENTION_DAYS" -delete

# Log success
echo "Media backup completed: $BACKUP_DIR/syncapp_media_$DATE.tar.gz"
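
A quick integrity check right after creating an archive catches truncated or unreadable files before they are needed. This is a minimal sketch under the assumption that archives are gzip-compressed tarballs as produced above; `verify_archive` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper (illustrative name): check that a .tar.gz archive is readable.

verify_archive() {
    # Usage: verify_archive <archive.tar.gz>
    local archive=$1

    # -s is true only if the file exists and is not empty.
    if [ ! -s "$archive" ]; then
        echo "missing or empty: $archive" >&2
        return 1
    fi

    # Listing the archive decompresses and reads it fully without extracting.
    if ! tar -tzf "$archive" > /dev/null 2>&1; then
        echo "unreadable archive: $archive" >&2
        return 1
    fi
}
```

This only proves that the archive can be read end to end; it does not compare the contents against the source directory.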

Configuration Backup

Application configuration should be backed up after any changes:

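#!/bin/bash
# /scripts/backup_config.sh

# Abort on the first failed command so a failed archive is never logged as a success
set -euo pipefail

# Directories to back up, held in an array so each path is passed to tar as one argument
CONFIG_DIRS=(/etc/syncapp /etc/nginx/sites-available /etc/supervisord/conf.d)
BACKUP_DIR="/var/backups/syncapp/config"
DATE=$(date +%Y%m%d%H%M)

# Ensure backup directory exists
mkdir -p "$BACKUP_DIR"

# Create backup (tar strips the leading "/" from member names, so restore with -C /)
tar -czf "$BACKUP_DIR/syncapp_config_$DATE.tar.gz" "${CONFIG_DIRS[@]}"

# Keep all configuration backups (they are small)
# Optionally, implement rotation if needed

# Log success
echo "Configuration backup completed: $BACKUP_DIR/syncapp_config_$DATE.tar.gz"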
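
If rotation becomes necessary, one option is to keep a fixed number of the newest archives instead of deleting by age, so that infrequent configuration changes never age out completely. The sketch below assumes GNU coreutils (`head -n -N`) and the timestamped file names used in this document; `prune_keep_newest` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical rotation helper (illustrative name): keep only the newest N files.

prune_keep_newest() {
    # Usage: prune_keep_newest <dir> <glob> <count>
    local dir=$1 pattern=$2 keep=$3

    # File names embed a %Y%m%d%H%M timestamp, so a plain sort is chronological.
    # GNU head with a negative count prints everything except the last N lines,
    # which here are the N newest files.
    find "$dir" -maxdepth 1 -type f -name "$pattern" | sort | head -n "-$keep" |
        while IFS= read -r old; do
            rm -f -- "$old"
        done
}
```

For example, `prune_keep_newest /var/backups/syncapp/config 'syncapp_config_*.tar.gz' 30` would keep the 30 most recent configuration archives.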

Off-site Backup Storage

All backups should be copied to off-site storage:

#!/bin/bash
# /scripts/offsite_backup.sh

# Abort on the first failed command so a failed upload is never logged as a success
set -euo pipefail

BACKUP_DIR="/var/backups/syncapp"
S3_BUCKET="s3://syncapp-backups"
DATE=$(date +%Y%m%d)

# Sync backups to a dated prefix in S3
aws s3 sync "$BACKUP_DIR" "$S3_BUCKET/$DATE/"

# Log success
echo "Off-site backup completed to $S3_BUCKET/$DATE/"
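
Uploading a checksum manifest alongside the backups makes it possible to confirm, after a download, that the files are byte-for-byte what was uploaded. This is a sketch assuming GNU coreutils and findutils; the function names and the manifest name `SHA256SUMS` are choices made here for illustration, not an existing SyncApp convention.

```shell
#!/bin/bash
# Hypothetical checksum helpers (illustrative names).

write_checksums() {
    # Usage: write_checksums <backup_dir>
    # Writes <backup_dir>/SHA256SUMS covering every other file below it.
    local dir=$1
    (
        # Subshell: the caller's working directory is left unchanged.
        cd "$dir" || exit 1
        # Relative paths keep the manifest valid wherever the tree is restored.
        find . -type f ! -name SHA256SUMS -print0 | sort -z |
            xargs -0 -r sha256sum > SHA256SUMS
    )
}

check_checksums() {
    # Usage: check_checksums <backup_dir>
    # Returns non-zero if any file is missing or differs from the manifest.
    local dir=$1
    ( cd "$dir" && sha256sum --quiet -c SHA256SUMS )
}
```

Calling `write_checksums "$BACKUP_DIR"` before the sync and `check_checksums` on the downloaded copy would bracket the transfer.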

Verification Procedures

Regularly verify backup integrity:

#!/bin/bash
# /scripts/verify_backups.sh

# Abort on the first failed command so a failed restore is never logged as a success
set -euo pipefail

DB_NAME="syncapp_verify"
DB_USER="syncapp_user"
LATEST_BACKUP=$(find /var/backups/syncapp/database -name "syncapp_db_*.dump.gz" | sort | tail -n 1)

if [ -z "$LATEST_BACKUP" ]; then
    echo "No database backup found" >&2
    exit 1
fi

# Create temporary database and drop it on exit, even if a later step fails
createdb -U "$DB_USER" "$DB_NAME"
trap 'dropdb -U "$DB_USER" "$DB_NAME"' EXIT

# Restore latest backup to temporary database
gunzip -c "$LATEST_BACKUP" | pg_restore -U "$DB_USER" -d "$DB_NAME"

# Run verification queries
psql -U "$DB_USER" -d "$DB_NAME" -c "SELECT COUNT(*) FROM sync_job;"
psql -U "$DB_USER" -d "$DB_NAME" -c "SELECT COUNT(*) FROM auth_user;"

# Log success
echo "Backup verification completed for $LATEST_BACKUP"

Schedule this verification weekly:

# Add to crontab
0 4 * * 0 /scripts/verify_backups.sh >> /var/log/syncapp/backup_verify.log 2>&1
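
A weekly restore test does not notice when the daily backup job has quietly stopped running. A lightweight freshness check can cover that gap. The sketch below assumes GNU find; `backup_is_fresh` is a hypothetical helper name, and the 26-hour threshold in the usage note is an illustrative choice.

```shell
#!/bin/bash
# Hypothetical freshness check (illustrative name).

backup_is_fresh() {
    # Usage: backup_is_fresh <dir> <glob> <max_age_hours>
    # Succeeds if at least one matching file was modified within the window.
    local dir=$1 pattern=$2 max_age_hours=$3
    local recent

    # -mmin -N matches files modified less than N minutes ago;
    # -quit stops at the first match.
    recent=$(find "$dir" -maxdepth 1 -type f -name "$pattern" \
        -mmin "-$((max_age_hours * 60))" -print -quit)

    [ -n "$recent" ]
}
```

For a daily job, something like `backup_is_fresh /var/backups/syncapp/database 'syncapp_db_*.dump.gz' 26 || echo "database backup is stale"` leaves a two-hour margin for slow runs.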

Recovery Procedures

Database Recovery

Full Database Restore

To restore the entire database:

#!/bin/bash
# /scripts/restore_db.sh

# Abort on the first failed command. If the restore fails, the services stay
# stopped so the application never starts against a partially restored database.
set -euo pipefail

# Configuration
DB_NAME="syncapp"
DB_USER="syncapp_user"
BACKUP_FILE=${1:-}  # Provide as argument

if [ -z "$BACKUP_FILE" ]; then
    echo "Usage: restore_db.sh <backup_file>"
    exit 1
fi

# Stop application services
systemctl stop syncapp
systemctl stop syncapp-celery
systemctl stop syncapp-celerybeat

# Drop and recreate database
dropdb -U "$DB_USER" "$DB_NAME"
createdb -U "$DB_USER" "$DB_NAME"

# Restore from backup (gzip-compressed dumps are streamed through gunzip)
if [[ $BACKUP_FILE == *.gz ]]; then
    gunzip -c "$BACKUP_FILE" | pg_restore -U "$DB_USER" -d "$DB_NAME"
else
    pg_restore -U "$DB_USER" -d "$DB_NAME" "$BACKUP_FILE"
fi

# Start application services
systemctl start syncapp
systemctl start syncapp-celery
systemctl start syncapp-celerybeat

echo "Database restored from $BACKUP_FILE"

Point-in-Time Recovery

To recover to a specific point in time, restore a physical base backup (for example, one taken with pg_basebackup) and replay archived WAL up to the target time. A pg_dump file cannot serve as the starting point.

#!/bin/bash
# /scripts/pitr_recovery.sh

# Configuration
DB_NAME="syncapp"
DB_USER="syncapp_user"
BACKUP_FILE=$1  # Base backup to restore; provide as argument
RECOVERY_TARGET_TIME=$2  # Format: '2023-05-01 14:30:00'

if [ -z "$BACKUP_FILE" ] || [ -z "$RECOVERY_TARGET_TIME" ]; then
    echo "Usage: pitr_recovery.sh <base_backup_file> <recovery_target_time>"
    echo "Example: pitr_recovery.sh <base_backup_file> '2023-05-01 14:30:00'"
    exit 1
fi

# Stop application services
systemctl stop syncapp
systemctl stop syncapp-celery
systemctl stop syncapp-celerybeat

# Write recovery settings.
# PostgreSQL 11 and earlier read these from recovery.conf in the data directory.
# PostgreSQL 12 and later refuse to start if recovery.conf exists: put the same
# settings in postgresql.conf and create an empty recovery.signal file instead.
cat > /var/lib/postgresql/data/recovery.conf << EOF
restore_command = 'cp /var/backups/syncapp/wal_archive/%f %p'
recovery_target_time = '$RECOVERY_TARGET_TIME'
EOF

# Start recovery process
# ... Additional PostgreSQL-specific steps here ...

echo "Point-in-time recovery started for time: $RECOVERY_TARGET_TIME"
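
A mistyped target time is easy to miss until recovery has already run. The following sketch validates the argument before any service is stopped. It assumes GNU date and the 'YYYY-MM-DD HH:MM:SS' format shown above; `validate_target_time` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical validation helper (illustrative name).

validate_target_time() {
    # Usage: validate_target_time 'YYYY-MM-DD HH:MM:SS'
    local target=$1
    local re='^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$'

    # Require the exact shape first, so inputs like "yesterday" are rejected.
    [[ $target =~ $re ]] || return 1

    # Let GNU date reject impossible values such as month 13.
    local target_epoch now_epoch
    target_epoch=$(date -d "$target" +%s 2>/dev/null) || return 1
    now_epoch=$(date +%s)

    # A recovery target in the future cannot be reached.
    [ "$target_epoch" -le "$now_epoch" ]
}
```

The check interprets the time in the local time zone of the machine running the script, which may differ from the PostgreSQL server's time zone setting.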

Media Files Recovery

To restore media files:

#!/bin/bash
# /scripts/restore_media.sh

MEDIA_DIR="/var/www/syncapp/media"
BACKUP_FILE=$1  # Provide as argument

if [ -z "$BACKUP_FILE" ]; then
    echo "Usage: restore_media.sh <backup_file>"
    exit 1
fi

# Extract relative to / (the archive stores paths as var/www/syncapp/media/...)
tar -xzf "$BACKUP_FILE" -C /

# Fix permissions
chown -R www-data:www-data $MEDIA_DIR

echo "Media files restored from $BACKUP_FILE"

Configuration Recovery

To restore configuration files:

#!/bin/bash
# /scripts/restore_config.sh

BACKUP_FILE=$1  # Provide as argument

if [ -z "$BACKUP_FILE" ]; then
    echo "Usage: restore_config.sh <backup_file>"
    exit 1
fi

# Extract backup to root
tar -xzf $BACKUP_FILE -C /

# Restart services to apply configuration
systemctl restart nginx
systemctl restart supervisord
systemctl restart syncapp

echo "Configuration restored from $BACKUP_FILE"

Disaster Recovery Plan

Complete System Failure

In case of complete system failure, follow these steps:

  1. Provision New Server

    # Install required packages
    apt-get update
    apt-get install -y postgresql nginx python3 python3-pip supervisor
    
  2. Restore Configuration

    # Download latest config backup from S3
    aws s3 cp s3://syncapp-backups/latest/syncapp_config_latest.tar.gz /tmp/
    
    # Restore configuration
    /scripts/restore_config.sh /tmp/syncapp_config_latest.tar.gz
    
  3. Restore Database

    # Download latest database backup from S3
    aws s3 cp s3://syncapp-backups/latest/syncapp_db_latest.dump.gz /tmp/
    
    # Restore database
    /scripts/restore_db.sh /tmp/syncapp_db_latest.dump.gz
    
  4. Restore Media Files

    # Download latest media backup from S3
    aws s3 cp s3://syncapp-backups/latest/syncapp_media_latest.tar.gz /tmp/
    
    # Restore media files
    /scripts/restore_media.sh /tmp/syncapp_media_latest.tar.gz
    
  5. Verify System

    # Run application health checks
    curl -I http://localhost/health/
    
    # Check database connectivity (dbshell opens an interactive database shell; exit it once connected)
    python3 manage.py dbshell
    
    # Validate application functionality
    python3 manage.py test_recovery
    

Partial System Failure

For component-specific failures:

Database Failure

# Check PostgreSQL status
systemctl status postgresql

# If corrupted, restore database only
/scripts/restore_db.sh /var/backups/syncapp/database/latest.dump.gz
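
The backup scripts in this document write timestamped file names and do not create a `latest.dump.gz` file or link. One way to resolve the newest backup at restore time is sketched below; `latest_backup` is a hypothetical helper name, and the approach relies on the %Y%m%d%H%M timestamp sorting chronologically.

```shell
#!/bin/bash
# Hypothetical helper (illustrative name): print the newest backup matching a glob.

latest_backup() {
    # Usage: latest_backup <dir> <glob>
    local dir=$1 pattern=$2 newest

    # Timestamped names sort chronologically, so the last line is the newest file.
    newest=$(find "$dir" -maxdepth 1 -type f -name "$pattern" | sort | tail -n 1)

    if [ -z "$newest" ]; then
        echo "no backup matching $pattern in $dir" >&2
        return 1
    fi
    printf '%s\n' "$newest"
}
```

For example: `/scripts/restore_db.sh "$(latest_backup /var/backups/syncapp/database 'syncapp_db_*.dump.gz')"`.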

Web Server Failure

# Check Nginx status
systemctl status nginx

# Restore Nginx configuration only (extract just that directory, relative to /)
tar -xzf /var/backups/syncapp/config/latest.tar.gz -C / etc/nginx/sites-available/

# Restart Nginx
systemctl restart nginx

Testing Recovery Procedures

Test recovery procedures quarterly:

#!/bin/bash
# /scripts/test_recovery.sh

# Create test environment
docker-compose -f docker-compose.test.yml up -d

# Test database recovery (-T disables TTY allocation, which cron cannot provide)
docker-compose -f docker-compose.test.yml exec -T app /scripts/restore_db.sh /var/backups/syncapp/database/latest.dump.gz

# Test media recovery
docker-compose -f docker-compose.test.yml exec -T app /scripts/restore_media.sh /var/backups/syncapp/media/latest.tar.gz

# Run integration tests
docker-compose -f docker-compose.test.yml exec -T app python manage.py test recovery

# Report results
echo "Recovery test completed at $(date)"

Schedule quarterly testing:

# Add to crontab
0 0 1 */3 * /scripts/test_recovery.sh >> /var/log/syncapp/recovery_test.log 2>&1

Documentation and Training

  1. Recovery Documentation

    Keep this document updated and ensure it's accessible offline:

    # Print physical copy
    lpr -P office_printer /path/to/backup_recovery.md
    
    # Store copy on USB drive
    cp /path/to/backup_recovery.md /mnt/usb_backup/
    
  2. Staff Training

    Conduct recovery drills with staff twice a year:

    # Schedule in company calendar
    echo "Backup recovery drill: $(date -d '6 months')" | mail -s "Schedule Recovery Drill" team@example.com
    
  3. Recovery Runbooks

    Create simplified runbooks for common scenarios:

    # Example: Database Recovery Runbook
    cat > /runbooks/database_recovery.md << EOF
    # Database Recovery Runbook
    
    1. Login to server: ssh admin@syncapp-server
    2. Run restore script: /scripts/restore_db.sh <backup_file>
    3. Verify: curl https://syncapp.example.com/health/
    EOF