Load Balancing Configuration
This document outlines the configuration for load balancing SyncApp in a production environment.
Overview
Load balancing distributes incoming traffic across multiple server instances to ensure high availability, fault tolerance, and optimal resource utilization. SyncApp's architecture supports both horizontal scaling of web servers and vertical scaling of specialized components.
Architecture Diagram
                                       ┌─────────────┐
       ┌──────────────────────────────▶│  CDN / WAF  │
       │                               └──────┬──────┘
       │                                      │
       │                                      ▼
┌──────┴──────┐    ┌─────────────┐    ┌─────────────────┐
│   Clients   ├───▶│     DNS     ├───▶│  Load Balancer  │
└─────────────┘    └─────────────┘    └────────┬────────┘
                                               │
                                               ▼
            ┌─────────────┬─────────────┬─────────────┐
            │  Web App 1  │  Web App 2  │  Web App N  │
            └──────┬──────┴──────┬──────┴──────┬──────┘
                   │             │             │
                   ▼             ▼             ▼
            ┌─────────────────────────────────────────┐
            │            Database Cluster             │
            └────────────────────┬────────────────────┘
                                 │
              ┌──────────────────┼──────────────────┐
              ▼                  ▼                  ▼
       ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
       │   Celery    │    │    Redis    │    │  RabbitMQ   │
       │   Workers   │    │    Cache    │    │   Broker    │
       └─────────────┘    └─────────────┘    └─────────────┘
Load Balancing Options
1. Nginx Load Balancer
Nginx can be used as a cost-effective and highly performant load balancer for SyncApp.
Configuration Example:
# /etc/nginx/nginx.conf

user nginx;
worker_processes auto;
pid /var/run/nginx.pid;

events {
    worker_connections 4096;
    multi_accept on;
    use epoll;
}

http {
    # Basic settings
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    server_tokens off;

    # SSL settings
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_prefer_server_ciphers on;
    ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384';
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 10m;

    # Logging
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log warn;

    # Gzip
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    # Map the Upgrade header so WebSocket requests are upgraded while
    # ordinary requests get a normal Connection header (hardcoding
    # 'Connection "upgrade"' would mishandle non-WebSocket traffic)
    map $http_upgrade $connection_upgrade {
        default upgrade;
        ''      close;
    }

    # Upstream servers (SyncApp web servers)
    upstream syncapp_backend {
        # IP-hash for session persistence
        ip_hash;

        # List of backend servers
        server 10.0.1.1:8000 max_fails=3 fail_timeout=30s;
        server 10.0.1.2:8000 max_fails=3 fail_timeout=30s;
        server 10.0.1.3:8000 max_fails=3 fail_timeout=30s;
        # Note: "backup" and "slow_start" cannot be combined with ip_hash
        # (and slow_start requires NGINX Plus), so the fourth server is a
        # regular member of the pool:
        server 10.0.1.4:8000 max_fails=3 fail_timeout=30s;

        # Connection limits
        keepalive 32;
    }

    # Main server block
    server {
        listen 80;
        listen [::]:80;
        server_name syncapp.example.com;

        # Redirect HTTP to HTTPS
        location / {
            return 301 https://$host$request_uri;
        }
    }

    # HTTPS server block
    server {
        listen 443 ssl http2;
        listen [::]:443 ssl http2;
        server_name syncapp.example.com;

        # SSL certificates
        ssl_certificate /etc/nginx/ssl/syncapp.crt;
        ssl_certificate_key /etc/nginx/ssl/syncapp.key;

        # HSTS
        add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

        # Other security headers
        add_header X-Content-Type-Options "nosniff" always;
        add_header X-Frame-Options "DENY" always;
        add_header X-XSS-Protection "1; mode=block" always;
        add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline'; img-src 'self' data:;" always;

        # Load balancing (proxy to upstream)
        location / {
            proxy_pass http://syncapp_backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            # Timeout settings
            proxy_connect_timeout 5s;
            proxy_send_timeout 60s;
            proxy_read_timeout 60s;

            # WebSocket support (see the map above)
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;

            # Buffering settings
            proxy_buffering on;
            proxy_buffer_size 16k;
            proxy_buffers 8 16k;

            # Cache control
            proxy_cache_bypass $http_cache_control;
            add_header X-Cache-Status $upstream_cache_status;
        }

        # Serve static files directly
        location /static/ {
            alias /var/www/syncapp/static/;
            expires 1d;
            add_header Cache-Control "public";
        }

        # Serve media files directly
        location /media/ {
            alias /var/www/syncapp/media/;
            expires 1d;
            add_header Cache-Control "public";
        }

        # Health checks
        location /health/ {
            access_log off;
            proxy_pass http://syncapp_backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;

            # Tighter timeouts so failures are detected quickly
            proxy_connect_timeout 2s;
            proxy_send_timeout 5s;
            proxy_read_timeout 5s;
        }
    }
}
Health Checks
Implement a health check endpoint in SyncApp to allow the load balancer to monitor server health:
# In views.py
from django.http import JsonResponse
from django.db import connection
from django.views.decorators.cache import never_cache
from django.conf import settings


@never_cache
def health_check(request):
    """
    Health check endpoint for load balancers.

    Checks:
      1. Database connection
      2. Redis connection (if used for cache)
      3. Celery workers (optional deeper check)
    """
    health_status = {
        'status': 'healthy',
        'database': False,
        'cache': False,
        'celery': False,
        'version': settings.APP_VERSION,
    }

    # Check database
    try:
        with connection.cursor() as cursor:
            cursor.execute('SELECT 1')
            row = cursor.fetchone()
            health_status['database'] = row[0] == 1
    except Exception as e:
        health_status['database'] = False
        health_status['database_error'] = str(e)
        health_status['status'] = 'unhealthy'

    # Check cache (Redis)
    try:
        from django.core.cache import cache
        cache.set('healthcheck', 'ok', 10)
        result = cache.get('healthcheck')
        health_status['cache'] = result == 'ok'
    except Exception as e:
        health_status['cache'] = False
        health_status['cache_error'] = str(e)
        health_status['status'] = 'unhealthy'

    # Optional: check Celery
    if request.GET.get('check_celery', '').lower() in ('true', '1', 'yes'):
        try:
            from celery.app.control import Control
            from sync.celery import app as celery_app
            control = Control(celery_app)
            ping_response = control.ping(timeout=1.0)
            health_status['celery'] = len(ping_response) > 0
            health_status['celery_workers'] = len(ping_response)
        except Exception as e:
            health_status['celery'] = False
            health_status['celery_error'] = str(e)
            health_status['status'] = 'unhealthy'
    else:
        health_status['celery'] = True  # Skip detailed check

    # Return appropriate status code
    status_code = 200 if health_status['status'] == 'healthy' else 503
    return JsonResponse(health_status, status=status_code)
2. HAProxy Load Balancer
HAProxy is another excellent option for load balancing with advanced health checking and traffic management capabilities.
Configuration Example:
# /etc/haproxy/haproxy.cfg

global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

    # SSL settings
    ssl-default-bind-options no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets
    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
    ssl-default-server-options no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets
    ssl-default-server-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384

    # DH parameters
    tune.ssl.default-dh-param 2048

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

# Frontend for HTTP (redirect to HTTPS)
frontend http-in
    bind *:80
    mode http
    option httplog
    option forwardfor
    redirect scheme https code 301 if !{ ssl_fc }

# Frontend for HTTPS
frontend https-in
    bind *:443 ssl crt /etc/haproxy/certs/syncapp.pem
    mode http
    option httplog
    option forwardfor

    # HSTS
    http-response set-header Strict-Transport-Security "max-age=31536000; includeSubDomains"

    # Security headers
    http-response set-header X-Content-Type-Options "nosniff"
    http-response set-header X-Frame-Options "DENY"
    http-response set-header X-XSS-Protection "1; mode=block"

    # ACLs
    acl is_static path_beg /static/ /media/
    acl is_health path_beg /health/

    # Route based on ACLs
    use_backend syncapp-static if is_static
    use_backend syncapp-health if is_health
    default_backend syncapp-backend

# Backend for application servers
backend syncapp-backend
    mode http
    balance roundrobin
    option httpchk GET /health/
    http-check expect status 200

    # Session persistence with cookie
    cookie SERVERID insert indirect nocache

    # Backend servers
    server web1 10.0.1.1:8000 check cookie web1 maxconn 100 weight 10
    server web2 10.0.1.2:8000 check cookie web2 maxconn 100 weight 10
    server web3 10.0.1.3:8000 check cookie web3 maxconn 100 weight 10
    server backup1 10.0.1.4:8000 check cookie backup1 maxconn 50 weight 5 backup

    # Compression
    compression algo gzip
    compression type text/html text/plain text/css application/javascript application/json

# Backend for static files
backend syncapp-static
    mode http
    balance roundrobin

    # Static file servers (could be separate from application servers)
    server static1 10.0.2.1:80 check
    server static2 10.0.2.2:80 check

    # Caching headers (Expires one day out, matching max-age)
    http-response set-header Cache-Control "max-age=86400, public"
    http-response set-header Expires "%[date(86400),http_date]"

# Backend for health checks
backend syncapp-health
    mode http
    balance roundrobin
    option httpchk GET /health/
    http-check expect status 200

    # Health check servers (same as app servers)
    server web1 10.0.1.1:8000 check
    server web2 10.0.1.2:8000 check
    server web3 10.0.1.3:8000 check

# Stats page
listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
    stats admin if LOCALHOST
    stats auth admin:strongpassword
3. AWS Application Load Balancer (ALB)
For cloud deployments on AWS, the Application Load Balancer provides advanced features like content-based routing.
Configuration Example (via AWS CLI):
# Create a target group for the SyncApp instances
aws elbv2 create-target-group \
    --name syncapp-tg \
    --protocol HTTP \
    --port 8000 \
    --vpc-id vpc-0abc123 \
    --health-check-protocol HTTP \
    --health-check-path /health/ \
    --health-check-interval-seconds 30 \
    --health-check-timeout-seconds 5 \
    --healthy-threshold-count 2 \
    --unhealthy-threshold-count 2 \
    --matcher "HttpCode=200" \
    --target-type instance

# Register instances with the target group
aws elbv2 register-targets \
    --target-group-arn arn:aws:elasticloadbalancing:region:account-id:targetgroup/syncapp-tg/1234567890 \
    --targets Id=i-0123456789,Port=8000 Id=i-0987654321,Port=8000

# Create a load balancer
aws elbv2 create-load-balancer \
    --name syncapp-lb \
    --subnets subnet-0abc123 subnet-0def456 \
    --security-groups sg-0abc123 \
    --scheme internet-facing \
    --type application \
    --ip-address-type ipv4

# Create a listener for HTTPS
aws elbv2 create-listener \
    --load-balancer-arn arn:aws:elasticloadbalancing:region:account-id:loadbalancer/app/syncapp-lb/1234567890 \
    --protocol HTTPS \
    --port 443 \
    --ssl-policy ELBSecurityPolicy-TLS-1-2-2017-01 \
    --certificates CertificateArn=arn:aws:acm:region:account-id:certificate/12345678-1234-1234-1234-123456789012 \
    --default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:region:account-id:targetgroup/syncapp-tg/1234567890

# Create a listener for HTTP (redirect to HTTPS)
aws elbv2 create-listener \
    --load-balancer-arn arn:aws:elasticloadbalancing:region:account-id:loadbalancer/app/syncapp-lb/1234567890 \
    --protocol HTTP \
    --port 80 \
    --default-actions "Type=redirect,RedirectConfig={Protocol=HTTPS,Port=443,StatusCode=HTTP_301}"
Session Persistence
For applications requiring session persistence, configure your load balancer with one of these methods:
Sticky Sessions: Configure the load balancer to route a user's requests to the same backend server.
- Cookie-based: Load balancer sets a cookie to track which server to use.
- IP-based: Uses the client's IP address to determine the server.
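To see why IP-based persistence is simple but fragile, here is a small illustrative sketch. This is a simplified model, not nginx's actual algorithm (nginx's `ip_hash` hashes only the first three octets of an IPv4 address):

```python
import hashlib

def pick_backend(client_ip: str, backends: list) -> str:
    """Deterministically map a client IP to a backend (naive mod-N hashing)."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return backends[int(digest, 16) % len(backends)]

backends = ["10.0.1.1:8000", "10.0.1.2:8000", "10.0.1.3:8000"]

# The same client IP always maps to the same backend...
assert pick_backend("203.0.113.7", backends) == pick_backend("203.0.113.7", backends)

# ...but adding or removing a backend changes N and remaps most clients,
# dropping their sessions unless session data is stored centrally.
```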
Centralized Session Storage: Store session data in a shared backend like Redis.
# settings.py
CACHES = {
    'default': {
        'BACKEND': 'django_redis.cache.RedisCache',
        'LOCATION': 'redis://redis:6379/1',
        'OPTIONS': {
            'CLIENT_CLASS': 'django_redis.client.DefaultClient',
        }
    }
}

SESSION_ENGINE = 'django.contrib.sessions.backends.cache'
SESSION_CACHE_ALIAS = 'default'
Auto-Scaling Configuration
AWS Auto Scaling Group
# Create a launch template
aws ec2 create-launch-template \
    --launch-template-name syncapp-launch-template \
    --version-description "Initial version" \
    --launch-template-data '{
        "ImageId": "ami-0abcdef1234567890",
        "InstanceType": "t3.large",
        "SecurityGroupIds": ["sg-0abc123"],
        "KeyName": "syncapp-key",
        "UserData": "IyEvYmluL2Jhc2gKIyBJbnN0YWxsIGRlcGVuZGVuY2llcwphcHQtZ2V0IHVwZGF0ZQphcHQtZ2V0IGluc3RhbGwgLXkgZ2l0IHB5dGhvbjMgcHl0aG9uMy1waXAgbmdpbnggc3VwZXJ2aXNvcgojIENsb25lIGFwcGxpY2F0aW9uCmdpdCBjbG9uZSBodHRwczovL2dpdGh1Yi5jb20veW91cnVzZXJuYW1lL3N5bmNhcHAuZ2l0IC9vcHQvc3luY2FwcAojIFNldCB1cCBhcHBsaWNhdGlvbgpjZCAvb3B0L3N5bmNhcHAKcGlwMybigKY="
    }'

# Create an auto scaling group
aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name syncapp-asg \
    --launch-template LaunchTemplateName=syncapp-launch-template,Version='$Latest' \
    --min-size 2 \
    --max-size 10 \
    --desired-capacity 2 \
    --vpc-zone-identifier "subnet-0abc123,subnet-0def456" \
    --target-group-arns "arn:aws:elasticloadbalancing:region:account-id:targetgroup/syncapp-tg/1234567890" \
    --health-check-type ELB \
    --health-check-grace-period 300

# Create scaling policies based on CPU utilization
aws autoscaling put-scaling-policy \
    --auto-scaling-group-name syncapp-asg \
    --policy-name cpu-scale-out \
    --policy-type TargetTrackingScaling \
    --target-tracking-configuration '{
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 70.0,
        "ScaleOutCooldown": 300,
        "ScaleInCooldown": 300
    }'
Specialized Workload Distribution
For optimal performance, separate different types of SyncApp workloads onto dedicated servers:
- Web Servers: Handle HTTP requests and API calls.
- Worker Servers: Run Celery workers for asynchronous tasks.
- Database Servers: Run PostgreSQL with read replicas.
- Cache Servers: Run Redis for caching and Celery broker.
Example Celery Worker Configuration
# celery_config.py
from kombu import Queue, Exchange

# Define task queues with routing
task_queues = (
    Queue('default', Exchange('default'), routing_key='default'),
    Queue('sync_jobs', Exchange('sync_jobs'), routing_key='sync.jobs.*'),
    Queue('data_processing', Exchange('data_processing'), routing_key='data.processing.*'),
    Queue('reporting', Exchange('reporting'), routing_key='reporting.*'),
    Queue('maintenance', Exchange('maintenance'), routing_key='maintenance.*'),
)

# Route tasks to specific queues
task_routes = {
    'sync.tasks.run_sync_job': {'queue': 'sync_jobs'},
    'sync.tasks.process_data_chunk': {'queue': 'data_processing'},
    'sync.tasks.generate_report': {'queue': 'reporting'},
    'sync.tasks.cleanup_old_data': {'queue': 'maintenance'},
}

# Start workers for specific queues:
# celery -A sync worker -Q sync_jobs -l INFO -c 4 --hostname=sync_jobs@%h
# celery -A sync worker -Q data_processing -l INFO -c 8 --hostname=data_proc@%h
# celery -A sync worker -Q reporting -l INFO -c 2 --hostname=reporting@%h
# celery -A sync worker -Q maintenance -l INFO -c 1 --hostname=maintenance@%h
Load Balancer Monitoring
Nginx Monitoring
# Add to your nginx.conf
server {
    listen 8080;
    allow 127.0.0.1;
    allow 10.0.0.0/8;
    deny all;

    location /nginx_status {
        stub_status on;
        access_log off;
    }
}
HAProxy Monitoring
# Add to your haproxy.cfg
listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
    stats admin if LOCALHOST
    stats auth admin:strongpassword
Configure Prometheus to scrape these metrics endpoints for comprehensive monitoring.
Best Practices
Health Checks: Implement proper health checks that verify actual application functionality, not just that the server is responding.
Graceful Draining: When removing a server from rotation, allow it to finish processing current requests:
# AWS Example
aws elbv2 deregister-targets \
    --target-group-arn arn:aws:elasticloadbalancing:region:account-id:targetgroup/syncapp-tg/1234567890 \
    --targets Id=i-0123456789

# Wait for connections to drain
sleep 60

# Then stop the instance
aws ec2 stop-instances --instance-ids i-0123456789
Consistent Hashing: For cache servers, use consistent hashing to minimize cache misses during scaling events.
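The idea can be sketched as a minimal hash ring with virtual nodes (illustrative only; in practice this usually comes from the cache client library, e.g. ketama-style hashing in memcached and Redis clients):

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._keys = []   # sorted hash positions on the ring
        self._ring = {}   # hash position -> node
        for node in nodes:
            self.add(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str) -> None:
        # Each node occupies `replicas` positions to smooth the distribution
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            bisect.insort(self._keys, h)
            self._ring[h] = node

    def get(self, key: str) -> str:
        # A key belongs to the first node clockwise from its hash position
        h = self._hash(key)
        idx = bisect.bisect(self._keys, h) % len(self._keys)
        return self._ring[self._keys[idx]]

ring = HashRing(["cache1:6379", "cache2:6379", "cache3:6379"])
before = {k: ring.get(k) for k in (f"session:{i}" for i in range(1000))}
ring.add("cache4:6379")
moved = sum(1 for k, v in before.items() if ring.get(k) != v)
# Only roughly a quarter of the keys move to the new node; with naive
# mod-N hashing, roughly three quarters would have been remapped.
```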
Connection Pooling: Implement database connection pooling to handle large numbers of connections:
# Django settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'syncapp',
        'USER': 'syncapp_user',
        'PASSWORD': 'password',
        'HOST': 'db.example.com',
        'PORT': '5432',
        'CONN_MAX_AGE': 600,  # Keep connections open for 10 minutes
        'OPTIONS': {
            'keepalives': 1,
            'keepalives_idle': 30,
            'keepalives_interval': 10,
            'keepalives_count': 5,
        }
    }
}
Rate Limiting: Implement rate limiting at the load balancer level to protect from traffic spikes.
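Rate limiters in nginx (`limit_req`) and HAProxy (stick-table counters) are variations on the token-bucket idea, which can be sketched in a few lines of Python:

```python
import time

class TokenBucket:
    """Token bucket: allows bursts up to `capacity`, refills at `rate` tokens/sec."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        # Refill proportionally to elapsed time, capped at capacity
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=10, capacity=5)  # 10 req/s steady, bursts of 5
results = [bucket.allow() for _ in range(10)]
# In a tight loop, the first 5 calls pass and the rest are rejected
# until the bucket refills.
```

A per-client limiter would keep one bucket per client key (IP or API token), which is exactly what nginx's `limit_req_zone` does with its shared-memory zone.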
SSL Termination: Terminate SSL at the load balancer to reduce CPU load on application servers.
Static Content: Serve static content from a CDN rather than through the load balancer when possible.
Immutable Infrastructure: Instead of updating servers in place, deploy freshly built replacements and retire the old ones.
Troubleshooting
Common Issues and Solutions
Uneven Load Distribution
- Check load balancer algorithm (round-robin, least connections, etc.)
- Verify server weights are properly configured
- Check for stuck connections or sessions
Health Check Failures
- Validate health check endpoints are working properly
- Increase health check thresholds if servers are being marked unhealthy too aggressively
- Check for resource exhaustion (CPU, memory, connections) causing health checks to fail
Session Loss
- Verify sticky sessions are properly configured
- Check that session storage (Redis) is accessible from all servers
- Validate cookie settings for session persistence
Timeout Issues
- Adjust timeout settings in load balancer configuration
- Increase timeouts for long-running operations
- Implement asynchronous processing for slow operations
Monitoring and Alerts
Load Balancer Metrics to Monitor
- Active connections
- Request rate
- Error rate
- Backend server health
- Response time
- Rejected connections
Alert Thresholds
- High error rates (> 1% of requests)
- Unhealthy backends (any server failing health checks)
- High response time (> 500ms for 95th percentile)
- High connection count (> 80% of maximum)
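As a concrete reference for the p95 threshold above, here is a nearest-rank percentile computed over raw latency samples (in production this number would come from a Prometheus query such as `histogram_quantile`, which estimates the quantile from bucketed data instead of raw samples):

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile (e.g. pct=95 for p95) of a list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical response times (ms) from one scrape window
latencies_ms = [120, 85, 300, 95, 110, 620, 101, 99, 130, 480]
p95 = percentile(latencies_ms, 95)  # 620 ms for this sample
should_alert = p95 > 500            # threshold from the list above
```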
Dashboard Example (Prometheus + Grafana)
- Connection tracking (current, rate, total)
- Request tracking by server
- Error rates by server and status code
- Response time percentiles
- Server health status timeline