Load Balancing Configuration

This document outlines the configuration for load balancing SyncApp in a production environment.

Overview

Load balancing distributes incoming traffic across multiple server instances to ensure high availability, fault tolerance, and optimal resource utilization. SyncApp's architecture supports both horizontal scaling of web servers and vertical scaling of specialized components.

Architecture Diagram

                                   ┌─────────────┐
                                   │             │
                        ┌──────────▶   CDN / WAF │
                        │          │             │
                        │          └─────────────┘
                        │                 │
                        │                 ▼
┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│             │   │             │   │             │
│   Clients   ├──▶│     DNS     ├──▶│Load Balancer│
│             │   │             │   │             │
└─────────────┘   └─────────────┘   └─────────────┘
                                          │
                                          │
                                          ▼
          ┌─────────────┬─────────────┬─────────────┐
          │             │             │             │
          │  Web App 1  │  Web App 2  │  Web App N  │
          │             │             │             │
          └─────────────┴─────────────┴─────────────┘
                 │             │             │
                 │             │             │
                 ▼             ▼             ▼
          ┌─────────────────────────────────────┐
          │                                     │
          │           Database Cluster          │
          │                                     │
          └─────────────────────────────────────┘
                            │
                            │
        ┌───────────────────┴────────────────────┐
        │                   │                    │
        ▼                   ▼                    ▼
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│             │      │             │      │             │
│   Celery    │      │   Redis     │      │   RabbitMQ  │
│   Workers   │      │   Cache     │      │   Broker    │
│             │      │             │      │             │
└─────────────┘      └─────────────┘      └─────────────┘

Load Balancing Options

1. Nginx Load Balancer

Nginx can serve as a cost-effective, high-performance load balancer for SyncApp.

Configuration Example:

# /etc/nginx/nginx.conf

user nginx;
worker_processes auto;
pid /var/run/nginx.pid;

events {
    worker_connections 4096;
    multi_accept on;
    use epoll;
}

http {
    # Basic settings
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    server_tokens off;

    # SSL settings
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_prefer_server_ciphers on;
    ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384';
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 10m;

    # Logging
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log warn;

    # Gzip
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    # Upstream servers (SyncApp web servers)
    upstream syncapp_backend {
        # IP-hash for session persistence
        ip_hash;

        # List of backend servers
        server 10.0.1.1:8000 max_fails=3 fail_timeout=30s;
        server 10.0.1.2:8000 max_fails=3 fail_timeout=30s;
        server 10.0.1.3:8000 max_fails=3 fail_timeout=30s;

        # Spare capacity. Note: the "backup" and "slow_start" parameters are
        # not supported together with ip_hash, and slow_start additionally
        # requires NGINX Plus, so this server is listed as a regular member.
        server 10.0.1.4:8000 max_fails=3 fail_timeout=30s;

        # Connection limits
        keepalive 32;
    }

    # Main server block
    server {
        listen 80;
        listen [::]:80;
        server_name syncapp.example.com;

        # Redirect HTTP to HTTPS
        location / {
            return 301 https://$host$request_uri;
        }
    }

    # HTTPS server block
    server {
        listen 443 ssl http2;
        listen [::]:443 ssl http2;
        server_name syncapp.example.com;

        # SSL certificates
        ssl_certificate /etc/nginx/ssl/syncapp.crt;
        ssl_certificate_key /etc/nginx/ssl/syncapp.key;

        # HSTS
        add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

        # Other security headers
        add_header X-Content-Type-Options "nosniff" always;
        add_header X-Frame-Options "DENY" always;
        add_header X-XSS-Protection "1; mode=block" always;
        add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline'; img-src 'self' data:;" always;

        # Load balancing (proxy to upstream)
        location / {
            proxy_pass http://syncapp_backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            # Timeout settings
            proxy_connect_timeout 5s;
            proxy_send_timeout 60s;
            proxy_read_timeout 60s;

            # WebSocket support. Note: unconditionally setting
            # Connection "upgrade" defeats the upstream keepalive above;
            # if keepalive matters, map $http_upgrade to the Connection
            # value instead.
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";

            # Buffering settings
            proxy_buffering on;
            proxy_buffer_size 16k;
            proxy_buffers 8 16k;

            # Cache control
            proxy_cache_bypass $http_cache_control;
            add_header X-Cache-Status $upstream_cache_status;
        }

        # Serve static files directly
        location /static/ {
            alias /var/www/syncapp/static/;
            expires 1d;
            add_header Cache-Control "public";
        }

        # Serve media files directly
        location /media/ {
            alias /var/www/syncapp/media/;
            expires 1d;
            add_header Cache-Control "public";
        }

        # Health checks
        location /health/ {
            access_log off;
            proxy_pass http://syncapp_backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;

            # Shorter timeouts so failing backends are detected quickly
            proxy_connect_timeout 2s;
            proxy_send_timeout 5s;
            proxy_read_timeout 5s;
        }
    }
}

Health Checks

Implement a health check endpoint in SyncApp to allow the load balancer to monitor server health:

# In views.py
from django.http import JsonResponse
from django.db import connection
from django.views.decorators.cache import never_cache
from django.conf import settings

@never_cache
def health_check(request):
    """
    Health check endpoint for load balancers.

    Checks:
    1. Database connection
    2. Redis connection (if used for cache)
    3. Celery workers (optional deeper check)
    """
    health_status = {
        'status': 'healthy',
        'database': False,
        'cache': False,
        'celery': False,
        'version': getattr(settings, 'APP_VERSION', 'unknown')  # APP_VERSION is a custom setting
    }

    # Check database
    try:
        with connection.cursor() as cursor:
            cursor.execute('SELECT 1')
            row = cursor.fetchone()
            health_status['database'] = row[0] == 1
    except Exception as e:
        health_status['database'] = False
        health_status['database_error'] = str(e)
        health_status['status'] = 'unhealthy'

    # Check cache (Redis)
    try:
        from django.core.cache import cache
        cache.set('healthcheck', 'ok', 10)
        result = cache.get('healthcheck')
        health_status['cache'] = result == 'ok'
    except Exception as e:
        health_status['cache'] = False
        health_status['cache_error'] = str(e)
        health_status['status'] = 'unhealthy'

    # Optional: Check Celery
    if request.GET.get('check_celery', '').lower() in ('true', '1', 'yes'):
        try:
            from celery.app.control import Control
            from sync.celery import app as celery_app

            control = Control(celery_app)
            ping_response = control.ping(timeout=1.0)
            health_status['celery'] = len(ping_response) > 0
            health_status['celery_workers'] = len(ping_response)
        except Exception as e:
            health_status['celery'] = False
            health_status['celery_error'] = str(e)
            health_status['status'] = 'unhealthy'
    else:
        health_status['celery'] = True  # Skip detailed check

    # Return appropriate status code
    status_code = 200 if health_status['status'] == 'healthy' else 503

    return JsonResponse(health_status, status=status_code)
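On the monitoring side, the JSON payload above is straightforward to interpret. A minimal sketch (the field names mirror the view; treating only a fully green payload as healthy is a policy choice, not something the endpoint mandates):

```python
import json

def is_healthy(payload: bytes) -> bool:
    """Interpret the health-check JSON: healthy only if every subsystem is OK."""
    data = json.loads(payload)
    return (
        data.get("status") == "healthy"
        and data.get("database") is True
        and data.get("cache") is True
    )

print(is_healthy(b'{"status": "healthy", "database": true, "cache": true}'))    # True
print(is_healthy(b'{"status": "unhealthy", "database": false, "cache": true}')) # False
```

In practice the HTTP status code (200 vs. 503) is usually enough for the load balancer itself; the body is mainly useful for dashboards and on-call debugging.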

2. HAProxy Load Balancer

HAProxy is another excellent option for load balancing with advanced health checking and traffic management capabilities.

Configuration Example:

# /etc/haproxy/haproxy.cfg

global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

    # SSL settings
    ssl-default-bind-options no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets
    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
    ssl-default-server-options no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets
    ssl-default-server-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384

    # DH parameters
    tune.ssl.default-dh-param 2048

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    timeout connect 5000
    timeout client  50000
    timeout server  50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

# Frontend for HTTP (redirect to HTTPS)
frontend http-in
    bind *:80
    mode http
    option httplog
    option forwardfor
    redirect scheme https code 301 if !{ ssl_fc }

# Frontend for HTTPS
frontend https-in
    bind *:443 ssl crt /etc/haproxy/certs/syncapp.pem
    mode http
    option httplog
    option forwardfor

    # HSTS
    http-response set-header Strict-Transport-Security "max-age=31536000; includeSubDomains"

    # Security headers
    http-response set-header X-Content-Type-Options "nosniff"
    http-response set-header X-Frame-Options "DENY"
    http-response set-header X-XSS-Protection "1; mode=block"

    # ACLs
    acl is_static path_beg /static/ /media/
    acl is_health path_beg /health/

    # Route based on ACLs
    use_backend syncapp-static if is_static
    use_backend syncapp-health if is_health
    default_backend syncapp-backend

# Backend for application servers
backend syncapp-backend
    mode http
    balance roundrobin
    option httpchk GET /health/
    http-check expect status 200

    # Session persistence with cookie
    cookie SERVERID insert indirect nocache

    # Backend servers
    server web1 10.0.1.1:8000 check cookie web1 maxconn 100 weight 10
    server web2 10.0.1.2:8000 check cookie web2 maxconn 100 weight 10
    server web3 10.0.1.3:8000 check cookie web3 maxconn 100 weight 10
    server backup1 10.0.1.4:8000 check cookie backup1 maxconn 50 weight 5 backup

    # Compression
    compression algo gzip
    compression type text/html text/plain text/css application/javascript application/json

# Backend for static files
backend syncapp-static
    mode http
    balance roundrobin

    # Static file servers (could be separate from application servers)
    server static1 10.0.2.1:80 check
    server static2 10.0.2.2:80 check

    # Caching headers
    http-response set-header Cache-Control "max-age=86400, public"
    http-response set-header Expires "%[date(86400),http_date]"

# Backend for health checks
backend syncapp-health
    mode http
    balance roundrobin
    option httpchk GET /health/
    http-check expect status 200

    # Health check servers (same as app servers)
    server web1 10.0.1.1:8000 check
    server web2 10.0.1.2:8000 check
    server web3 10.0.1.3:8000 check

# Stats page
listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
    stats admin if LOCALHOST
    stats auth admin:strongpassword

3. AWS Application Load Balancer (ALB)

For cloud deployments on AWS, the Application Load Balancer provides advanced features like content-based routing.

Configuration Example (via AWS CLI):

# Create a target group for the SyncApp instances
aws elbv2 create-target-group \
    --name syncapp-tg \
    --protocol HTTP \
    --port 8000 \
    --vpc-id vpc-0abc123 \
    --health-check-protocol HTTP \
    --health-check-path /health/ \
    --health-check-interval-seconds 30 \
    --health-check-timeout-seconds 5 \
    --healthy-threshold-count 2 \
    --unhealthy-threshold-count 2 \
    --matcher "HttpCode=200" \
    --target-type instance

# Register instances with the target group
aws elbv2 register-targets \
    --target-group-arn arn:aws:elasticloadbalancing:region:account-id:targetgroup/syncapp-tg/1234567890 \
    --targets Id=i-0123456789,Port=8000 Id=i-0987654321,Port=8000

# Create a load balancer
aws elbv2 create-load-balancer \
    --name syncapp-lb \
    --subnets subnet-0abc123 subnet-0def456 \
    --security-groups sg-0abc123 \
    --scheme internet-facing \
    --type application \
    --ip-address-type ipv4

# Create a listener for HTTPS
aws elbv2 create-listener \
    --load-balancer-arn arn:aws:elasticloadbalancing:region:account-id:loadbalancer/app/syncapp-lb/1234567890 \
    --protocol HTTPS \
    --port 443 \
    --ssl-policy ELBSecurityPolicy-TLS-1-2-2017-01 \
    --certificates CertificateArn=arn:aws:acm:region:account-id:certificate/12345678-1234-1234-1234-123456789012 \
    --default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:region:account-id:targetgroup/syncapp-tg/1234567890

# Create a listener for HTTP (redirect to HTTPS)
aws elbv2 create-listener \
    --load-balancer-arn arn:aws:elasticloadbalancing:region:account-id:loadbalancer/app/syncapp-lb/1234567890 \
    --protocol HTTP \
    --port 80 \
    --default-actions "Type=redirect,RedirectConfig={Protocol=HTTPS,Port=443,StatusCode=HTTP_301}"

Session Persistence

For applications requiring session persistence, configure your load balancer with one of these methods:

  1. Sticky Sessions: Configure the load balancer to route a user's requests to the same backend server.

    • Cookie-based: Load balancer sets a cookie to track which server to use.
    • IP-based: Uses the client's IP address to determine the server.
  2. Centralized Session Storage: Store session data in a shared backend like Redis.

    # settings.py
    CACHES = {
        'default': {
            'BACKEND': 'django_redis.cache.RedisCache',
            'LOCATION': 'redis://redis:6379/1',
            'OPTIONS': {
                'CLIENT_CLASS': 'django_redis.client.DefaultClient',
            }
        }
    }
    
    SESSION_ENGINE = 'django.contrib.sessions.backends.cache'
    SESSION_CACHE_ALIAS = 'default'
    

Auto-Scaling Configuration

AWS Auto Scaling Group

# Create a launch template
aws ec2 create-launch-template \
    --launch-template-name syncapp-launch-template \
    --version-description "Initial version" \
    --launch-template-data '{
        "ImageId": "ami-0abcdef1234567890",
        "InstanceType": "t3.large",
        "SecurityGroupIds": ["sg-0abc123"],
        "KeyName": "syncapp-key",
        "UserData": "IyEvYmluL2Jhc2gKIyBJbnN0YWxsIGRlcGVuZGVuY2llcwphcHQtZ2V0IHVwZGF0ZQphcHQtZ2V0IGluc3RhbGwgLXkgZ2l0IHB5dGhvbjMgcHl0aG9uMy1waXAgbmdpbnggc3VwZXJ2aXNvcgojIENsb25lIGFwcGxpY2F0aW9uCmdpdCBjbG9uZSBodHRwczovL2dpdGh1Yi5jb20veW91cnVzZXJuYW1lL3N5bmNhcHAuZ2l0IC9vcHQvc3luY2FwcAojIFNldCB1cCBhcHBsaWNhdGlvbgpjZCAvb3B0L3N5bmNhcHAKcGlwMybigKY="
    }'

# Create an auto scaling group
aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name syncapp-asg \
    --launch-template LaunchTemplateName=syncapp-launch-template,Version='$Latest' \
    --min-size 2 \
    --max-size 10 \
    --desired-capacity 2 \
    --vpc-zone-identifier "subnet-0abc123,subnet-0def456" \
    --target-group-arns "arn:aws:elasticloadbalancing:region:account-id:targetgroup/syncapp-tg/1234567890" \
    --health-check-type ELB \
    --health-check-grace-period 300

# Create scaling policies based on CPU utilization
aws autoscaling put-scaling-policy \
    --auto-scaling-group-name syncapp-asg \
    --policy-name cpu-scale-out \
    --policy-type TargetTrackingScaling \
    --target-tracking-configuration '{
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 70.0,
        "ScaleOutCooldown": 300,
        "ScaleInCooldown": 300
    }'

Specialized Workload Distribution

For optimal performance, separate different types of SyncApp workloads onto dedicated servers:

  1. Web Servers: Handle HTTP requests and API calls.
  2. Worker Servers: Run Celery workers for asynchronous tasks.
  3. Database Servers: Run PostgreSQL with read replicas.
  4. Cache Servers: Run Redis for caching and Celery broker.

Example Celery Worker Configuration

# celery_config.py
from kombu import Queue, Exchange

# Define task queues with routing
task_queues = (
    Queue('default', Exchange('default'), routing_key='default'),
    Queue('sync_jobs', Exchange('sync_jobs'), routing_key='sync.jobs.*'),
    Queue('data_processing', Exchange('data_processing'), routing_key='data.processing.*'),
    Queue('reporting', Exchange('reporting'), routing_key='reporting.*'),
    Queue('maintenance', Exchange('maintenance'), routing_key='maintenance.*'),
)

# Route tasks to specific queues
task_routes = {
    'sync.tasks.run_sync_job': {'queue': 'sync_jobs'},
    'sync.tasks.process_data_chunk': {'queue': 'data_processing'},
    'sync.tasks.generate_report': {'queue': 'reporting'},
    'sync.tasks.cleanup_old_data': {'queue': 'maintenance'},
}

# Start workers for specific queues (app name matches sync/celery.py):
# celery -A sync worker -Q sync_jobs -l INFO -c 4 --hostname=sync_jobs@%h
# celery -A sync worker -Q data_processing -l INFO -c 8 --hostname=data_proc@%h
# celery -A sync worker -Q reporting -l INFO -c 2 --hostname=reporting@%h
# celery -A sync worker -Q maintenance -l INFO -c 1 --hostname=maintenance@%h
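The routing table can be sanity-checked without starting a broker or worker. A small lookup helper that mirrors the exact-match `task_routes` above (the `'default'` fallback stands in for Celery's `task_default_queue` behaviour):

```python
# Mirrors the task_routes mapping defined above
task_routes = {
    'sync.tasks.run_sync_job': {'queue': 'sync_jobs'},
    'sync.tasks.process_data_chunk': {'queue': 'data_processing'},
    'sync.tasks.generate_report': {'queue': 'reporting'},
    'sync.tasks.cleanup_old_data': {'queue': 'maintenance'},
}

def queue_for(task_name: str, default: str = 'default') -> str:
    """Resolve the queue a task name is routed to, falling back to the default."""
    route = task_routes.get(task_name, {})
    return route.get('queue', default)

print(queue_for('sync.tasks.generate_report'))  # reporting
print(queue_for('sync.tasks.unrouted_task'))    # default
```

A check like this in the test suite catches tasks that silently fall through to the default queue after a rename.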

Load Balancer Monitoring

Nginx Monitoring

# Add to your nginx.conf
server {
    listen 8080;
    allow 127.0.0.1;
    allow 10.0.0.0/8;
    deny all;

    location /nginx_status {
        stub_status on;
        access_log off;
    }
}

HAProxy Monitoring

# Add to your haproxy.cfg
listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
    stats admin if LOCALHOST
    stats auth admin:strongpassword

Configure Prometheus to scrape these metrics endpoints for comprehensive monitoring.
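If you roll your own exporter rather than using an off-the-shelf one, the `stub_status` text format is simple to parse. A sketch (the sample string follows the standard `stub_status` layout):

```python
import re

def parse_stub_status(text: str) -> dict:
    """Parse nginx stub_status output into a flat metrics dict."""
    metrics = {}
    metrics['active'] = int(re.search(r'Active connections:\s+(\d+)', text).group(1))
    # The line after "server accepts handled requests" holds three counters.
    accepts, handled, requests = re.search(r'\n\s*(\d+)\s+(\d+)\s+(\d+)', text).groups()
    metrics.update(accepts=int(accepts), handled=int(handled), requests=int(requests))
    for key in ('Reading', 'Writing', 'Waiting'):
        metrics[key.lower()] = int(re.search(rf'{key}:\s+(\d+)', text).group(1))
    return metrics

sample = (
    "Active connections: 291 \n"
    "server accepts handled requests\n"
    " 16630948 16630948 31070465 \n"
    "Reading: 6 Writing: 179 Waiting: 106 \n"
)
print(parse_stub_status(sample)['active'])  # 291
```

A gap between `accepts` and `handled` signals connections dropped at the worker_connections limit, which is worth alerting on.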

Best Practices

  1. Health Checks: Implement proper health checks that verify actual application functionality, not just that the server is responding.

  2. Graceful Draining: When removing a server from rotation, allow it to finish processing current requests:

    # AWS Example
    aws elbv2 deregister-targets \
        --target-group-arn arn:aws:elasticloadbalancing:region:account-id:targetgroup/syncapp-tg/1234567890 \
        --targets Id=i-0123456789
    
    # Wait for connections to drain
    sleep 60
    
    # Then stop the instance
    aws ec2 stop-instances --instance-ids i-0123456789
    
  3. Consistent Hashing: For cache servers, use consistent hashing to minimize cache misses during scaling events.

  4. Connection Pooling: Implement database connection pooling to handle large numbers of connections:

    # Django settings.py
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.postgresql',
            'NAME': 'syncapp',
            'USER': 'syncapp_user',
            'PASSWORD': 'password',
            'HOST': 'db.example.com',
            'PORT': '5432',
            'CONN_MAX_AGE': 600,  # Keep connections open for 10 minutes
            'OPTIONS': {
                'keepalives': 1,
                'keepalives_idle': 30,
                'keepalives_interval': 10,
                'keepalives_count': 5,
            }
        }
    }
    
  5. Rate Limiting: Implement rate limiting at the load balancer level to protect from traffic spikes.

  6. SSL Termination: Terminate SSL at the load balancer to reduce CPU load on application servers.

  7. Static Content: Serve static content from a CDN rather than through the load balancer when possible.

  8. Immutable Infrastructure: Instead of updating servers in place, deploy new instances and retire the old ones.
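The consistent-hashing recommendation (item 3) can be sketched as a simple hash ring; node names and the replica count below are illustrative:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring: removing a node only remaps its own keys."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._keys = []   # sorted hash positions on the ring
        self._ring = {}   # hash position -> node name
        for node in nodes:
            self.add(node)

    def _hash(self, value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add(self, node: str) -> None:
        # Each node gets many virtual points so load spreads evenly.
        for i in range(self.replicas):
            h = self._hash(f"{node}:{i}")
            bisect.insort(self._keys, h)
            self._ring[h] = node

    def remove(self, node: str) -> None:
        for i in range(self.replicas):
            h = self._hash(f"{node}:{i}")
            self._keys.remove(h)
            del self._ring[h]

    def get(self, key: str) -> str:
        # Walk clockwise to the first node point at or after the key's hash.
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[self._keys[idx]]

ring = HashRing(["cache1", "cache2", "cache3"])
owner = ring.get("user:42")
ring.remove("cache2")  # only keys that lived on cache2 are remapped
```

With naive modulo hashing, removing one of three nodes remaps roughly two thirds of all keys; with the ring, only the departed node's share moves.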

Troubleshooting

Common Issues and Solutions

  1. Uneven Load Distribution

    • Check load balancer algorithm (round-robin, least connections, etc.)
    • Verify server weights are properly configured
    • Check for stuck connections or sessions
  2. Health Check Failures

    • Validate health check endpoints are working properly
    • Increase health check thresholds if servers are being marked unhealthy too aggressively
    • Check for resource exhaustion (CPU, memory, connections) causing health checks to fail
  3. Session Loss

    • Verify sticky sessions are properly configured
    • Check that session storage (Redis) is accessible from all servers
    • Validate cookie settings for session persistence
  4. Timeout Issues

    • Adjust timeout settings in load balancer configuration
    • Increase timeouts for long-running operations
    • Implement asynchronous processing for slow operations
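When investigating uneven distribution (issue 1), simulating the balancing algorithm helps separate configuration problems from traffic-pattern effects. A sketch of round-robin versus least-connections selection (server names and connection counts are illustrative):

```python
import itertools

servers = ["web1", "web2", "web3"]

# Round-robin: requests rotate through the server list in order.
rr = itertools.cycle(servers)
print([next(rr) for _ in range(5)])  # ['web1', 'web2', 'web3', 'web1', 'web2']

# Least-connections: each request goes to the server with the fewest
# active connections at that moment.
active = {"web1": 12, "web2": 3, "web3": 7}

def least_connections(counts: dict) -> str:
    return min(counts, key=counts.get)

print(least_connections(active))  # web2
```

Round-robin skews under long-lived connections or sticky sessions, which is when least-connections (or re-examining session affinity) usually balances better.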

Monitoring and Alerts

  1. Load Balancer Metrics to Monitor

    • Active connections
    • Request rate
    • Error rate
    • Backend server health
    • Response time
    • Rejected connections
  2. Alert Thresholds

    • High error rates (> 1% of requests)
    • Unhealthy backends (any server failing health checks)
    • High response time (> 500ms for 95th percentile)
    • High connection count (> 80% of maximum)
  3. Dashboard Example (Prometheus + Grafana)

    • Connection tracking (current, rate, total)
    • Request tracking by server
    • Error rates by server and status code
    • Response time percentiles
    • Server health status timeline
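The p95 alert threshold listed above can be evaluated directly from a window of response-time samples using only the standard library (the 500 ms value is the threshold suggested above; the sample windows are illustrative):

```python
import statistics

def p95_breached(samples_ms, threshold_ms=500.0) -> bool:
    """True if the 95th-percentile response time exceeds the alert threshold."""
    # quantiles(n=20) returns 19 cut points; index 18 is the 95th percentile.
    return statistics.quantiles(samples_ms, n=20)[18] > threshold_ms

print(p95_breached([120] * 100))             # False
print(p95_breached([120] * 95 + [900] * 5))  # True
```

Percentile alerts are more robust than averages here: a handful of slow requests barely moves the mean but shows up immediately at p95.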