Most Django deployment tutorials stop at docker run. That gap between a container that boots on your laptop and a service that survives a deploy, a health check, and a database failover is where the real work lives. This is the configuration we reach for when we put a Django application on AWS ECS Fargate for a client, written down so you can adapt it rather than rediscover it.
We use Fargate (rather than EC2-backed ECS) because we would rather not patch and scale a fleet of container hosts. You hand AWS a task definition and it finds capacity. The trade-off is that you give up host-level control and pay a small premium per vCPU-hour. For a typical Django API or admin-backed product, that trade is worth it.
The Dockerfile
Build for a small, predictable image. Use a slim Python base, install dependencies in a layer that only rebuilds when requirements.txt changes, collect static files at build time, and run as a non-root user.

FROM python:3.12-slim AS base
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PIP_NO_CACHE_DIR=1
WORKDIR /app
# System deps for psycopg and building wheels.
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential libpq-dev \
    && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
# Bake static assets into the image so the container is self-contained.
RUN python manage.py collectstatic --noinput
RUN useradd --create-home appuser
USER appuser
EXPOSE 8000
CMD ["gunicorn", "myproject.wsgi:application", "-c", "gunicorn.conf.py"]

A note on collectstatic: under a static-asset CDN you usually push the collected files to S3 and serve them from CloudFront, not from the container. We still run collectstatic in the build so the manifest exists and ManifestStaticFilesStorage can resolve hashed filenames. Where the files physically live is a separate decision from whether the manifest is present.
gunicorn configuration
Put gunicorn settings in a file rather than a long CMD line. The worker count is the setting people get wrong most often. A common starting point is 2 * CPU + 1, but that formula assumes CPU-bound work. Django request handlers spend most of their time waiting on the database, so we usually run threaded (gthread) workers sized to the task's vCPU allocation and give each worker a few threads for I/O overlap.
# gunicorn.conf.py
import os
bind = "0.0.0.0:8000"
workers = int(os.environ.get("GUNICORN_WORKERS", "3"))
threads = int(os.environ.get("GUNICORN_THREADS", "4"))
worker_class = "gthread"
# Recycle workers to bound memory growth from long-lived processes.
max_requests = 1000
max_requests_jitter = 100
# Must be shorter than the ALB idle timeout so gunicorn closes first.
# The ALB default is 60 seconds, so raise it on the load balancer or lower this value.
timeout = 60
graceful_timeout = 30
accesslog = "-"  # stdout -> CloudWatch Logs
errorlog = "-"

Sizing is something you tune against your own traffic. Start conservative, watch the task's CPU and memory in CloudWatch, and raise the worker count only when the workers are actually saturated rather than idle-waiting on the database.
The ECS task definition
The task definition is where Django meets AWS. It declares the container, its CPU and memory, the environment, secrets, and the logging driver. Here is a trimmed version of one we run.
{
  "family": "myproject-web",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::ACCOUNT:role/myproject-ecs-execution",
  "taskRoleArn": "arn:aws:iam::ACCOUNT:role/myproject-task",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/myproject:GIT_SHA",
      "portMappings": [{ "containerPort": 8000, "protocol": "tcp" }],
      "environment": [
        { "name": "DJANGO_SETTINGS_MODULE", "value": "myproject.settings.production" },
        { "name": "ALLOWED_HOSTS", "value": ".myproject.com" }
      ],
      "secrets": [
        { "name": "DATABASE_URL", "valueFrom": "arn:aws:secretsmanager:us-east-1:ACCOUNT:secret:myproject/db" },
        { "name": "SECRET_KEY", "valueFrom": "arn:aws:secretsmanager:us-east-1:ACCOUNT:secret:myproject/django" }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/myproject-web",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "web"
        }
      }
    }
  ]
}

Two decisions worth calling out. First, image tags are the Git SHA, never latest, so a rollback is just pointing the service back at the previous task definition revision. Second, DATABASE_URL and SECRET_KEY come from secrets (resolved from Secrets Manager by the execution role at launch), not from environment. Plaintext secrets in a task definition are readable by anyone with ecs:DescribeTaskDefinition.
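On the Django side, entries from both environment and secrets arrive as ordinary environment variables. The following is a minimal sketch of how a production settings module might read them. It assumes ALLOWED_HOSTS is passed as a comma-separated string, and the helper name read_runtime_config is hypothetical, used here only to keep the example self-contained.

```python
import os
from typing import Mapping


def read_runtime_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read the values the task definition injects (hypothetical helper)."""
    # Index directly so a missing secret stops the process at startup
    # instead of surfacing later as a confusing runtime error.
    secret_key = env["SECRET_KEY"]
    database_url = env["DATABASE_URL"]
    # Assumption: ALLOWED_HOSTS is comma-separated, e.g. ".myproject.com".
    allowed_hosts = [
        host.strip()
        for host in env.get("ALLOWED_HOSTS", "").split(",")
        if host.strip()
    ]
    return {
        "SECRET_KEY": secret_key,
        "DATABASE_URL": database_url,
        "ALLOWED_HOSTS": allowed_hosts,
    }
```

Failing fast on a missing key means a misconfigured task dies during startup, where the ECS service events and the deployment circuit breaker can see it, rather than after it has started taking traffic.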
ALB health checks
The Application Load Balancer needs a path that returns 200 only when the app can actually serve traffic. Do not point it at / if / runs an expensive query or redirects. Add a dedicated lightweight endpoint.

# urls.py
from django.http import JsonResponse
from django.urls import path


def healthz(request):
    return JsonResponse({"status": "ok"})


urlpatterns = [
    path("healthz", healthz),
    # ... your routes
]

In the target group, set the health check path to /healthz and the success code to 200. Keep the interval and thresholds tight enough that a bad task is pulled quickly, but not so aggressive that a brief GC pause cycles a healthy one. Account for the health check in your ALLOWED_HOSTS handling as well. The ALB hits the container by IP, so the request's Host header is an IP address rather than your domain and fails host validation. Either allow the health-check host or special-case the path before host validation runs.
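One way to special-case the path before host validation is to answer it at the WSGI layer, so the request never reaches Django's middleware. This is a sketch of that alternative, not the view-based endpoint shown in urls.py. The wrapper name with_health_check and the myproject/wsgi.py layout are assumptions.

```python
import json


def with_health_check(app, path="/healthz"):
    """Wrap a WSGI app so `path` is answered before Django sees the request."""
    body = json.dumps({"status": "ok"}).encode("utf-8")

    def wrapped(environ, start_response):
        if environ.get("PATH_INFO") == path:
            # Answered here, so ALLOWED_HOSTS validation never runs for this path.
            start_response("200 OK", [
                ("Content-Type", "application/json"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everything else goes through Django unchanged.
        return app(environ, start_response)

    return wrapped


# In myproject/wsgi.py (assumed layout):
#   application = with_health_check(get_wsgi_application())
```

The trade-off is that this confirms the process is up and the WSGI module imported, but it bypasses URL routing and middleware entirely. If you want the check to exercise more of the stack, keep the Django view and allow the health-check host instead.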
One subtlety: the ALB idle timeout must be longer than gunicorn's timeout, and gunicorn's timeout must be longer than your slowest legitimate request. If those are out of order you get truncated responses that are painful to trace.
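The ordering is easy to state and easy to break in a later change, so it can be worth checking mechanically. A minimal sketch, assuming you can supply the three values in seconds. The function name check_timeout_order is hypothetical.

```python
def check_timeout_order(slowest_request_s, gunicorn_timeout_s, alb_idle_timeout_s):
    """Return a list of problems; an empty list means the ordering holds."""
    problems = []
    if not slowest_request_s < gunicorn_timeout_s:
        problems.append("gunicorn timeout must exceed the slowest legitimate request")
    if not gunicorn_timeout_s < alb_idle_timeout_s:
        problems.append("ALB idle timeout must exceed the gunicorn timeout")
    return problems
```

For example, check_timeout_order(20, 60, 60) reports the ALB problem, because a gunicorn timeout of 60 is not strictly below an ALB idle timeout of 60, while check_timeout_order(20, 60, 75) returns an empty list.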
RDS PostgreSQL and connection handling
Use RDS for PostgreSQL and give Django persistent connections so it does not pay for a fresh TCP connection and TLS handshake on every request.
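# settings/production.py
import dj_database_url

# ssl_require=True puts {"sslmode": "require"} into OPTIONS, so add to that
# dict rather than replacing it with a new "OPTIONS" key.
_default_db = dj_database_url.config(conn_max_age=600, ssl_require=True)
_default_db.setdefault("OPTIONS", {})["connect_timeout"] = 5

DATABASES = {"default": _default_db}

conn_max_age keeps a connection alive across requests within a worker. Be deliberate here: each thread holds its own persistent connection, so the total multiplies by worker count, thread count, and task count. Check that workers x threads x tasks stays under the RDS instance's max_connections, with room left for Celery workers and one-off tasks. When you outgrow that, put PgBouncer in transaction-pooling mode in front of RDS rather than raising max_connections indefinitely.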
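The connection arithmetic is worth writing down once so it gets rechecked whenever a worker count or task count changes. A small sketch follows. The function name connection_budget and the reserved allowance of 10 are assumptions for illustration, not recommended values.

```python
def connection_budget(workers, threads, tasks, max_connections, reserved=10):
    """Worst-case persistent connections versus what RDS will accept."""
    # Django keeps one connection per thread, so every thread can hold one.
    worst_case = workers * threads * tasks
    # `reserved` is an assumed allowance for migrations, Celery, and admin sessions.
    available = max_connections - reserved
    return worst_case, available, worst_case <= available
```

With the gunicorn defaults above (3 workers, 4 threads), ten tasks against an instance with max_connections of 100 gives connection_budget(3, 4, 10, 100), which returns (120, 90, False). Five tasks returns (60, 90, True). Autoscaling raises the task count, so run the check against the service's maximum, not its current size.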
Run migrations as a separate one-off ECS task before the new service version goes live, not in the container CMD. If migrations run on every task start, a scale-up event can fire several migration attempts at once.
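A deploy script can start that one-off task with boto3 by reusing the web task definition and overriding the command. This is a sketch under stated assumptions: the cluster name, subnets, and security groups are placeholders you supply, the container is named web as in the task definition above, and assignPublicIp is DISABLED on the assumption that the task runs in private subnets with a route to ECR and Secrets Manager.

```python
def migration_task_request(cluster, task_definition, subnets, security_groups,
                           container="web"):
    """Build the arguments for a one-off `ecs.run_task` call that runs migrations."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
                "assignPublicIp": "DISABLED",  # assumption: private subnets
            }
        },
        "overrides": {
            "containerOverrides": [
                {"name": container,
                 "command": ["python", "manage.py", "migrate", "--noinput"]}
            ]
        },
    }


def run_migrations(**kwargs):
    """Start the task, wait for it to stop, and return the container exit code."""
    import boto3  # imported lazily so the builder above works without AWS access

    ecs = boto3.client("ecs")
    request = migration_task_request(**kwargs)
    started = ecs.run_task(**request)
    if not started["tasks"]:
        raise RuntimeError(f"run_task failed: {started.get('failures')}")
    task_arn = started["tasks"][0]["taskArn"]
    ecs.get_waiter("tasks_stopped").wait(cluster=request["cluster"], tasks=[task_arn])
    task = ecs.describe_tasks(cluster=request["cluster"], tasks=[task_arn])["tasks"][0]
    return task["containers"][0].get("exitCode")
```

The pipeline would then update the service only when run_migrations returns 0, which keeps the "migrations run once" guarantee in the deploy step rather than in the container.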
Where Celery fits
Celery workers run as a second ECS service from the same image, with a different command. They share the image and secrets with the web service but do not sit behind the ALB. They pull work from the broker instead.
{
  "name": "worker",
  "image": "ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/myproject:GIT_SHA",
  "command": ["celery", "-A", "myproject", "worker", "--concurrency", "4"],
  "logConfiguration": {
    "logDriver": "awslogs",
    "options": {
      "awslogs-group": "/ecs/myproject-worker",
      "awslogs-region": "us-east-1",
      "awslogs-stream-prefix": "worker"
    }
  }
}

Because the worker has no health-check endpoint, monitor it on queue depth and task failure rate in CloudWatch rather than on HTTP status. Scale the worker service on broker backlog, and scale the web service on ALB request count or CPU. They have different load shapes and should autoscale independently.
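To make "scale on broker backlog" concrete, the rule reduces to a target number of queued messages per worker task, clamped to a floor and a ceiling. The sketch below shows only that arithmetic. The function name and the default numbers are assumptions, and in practice you would more likely publish the backlog as a CloudWatch metric and let Application Auto Scaling apply an equivalent policy.

```python
import math


def desired_worker_tasks(backlog, per_task_target=50, min_tasks=1, max_tasks=10):
    """Tasks needed to keep roughly `per_task_target` queued messages per task."""
    needed = math.ceil(backlog / per_task_target)
    # Clamp so an empty queue keeps a floor and a spike cannot scale without limit.
    return max(min_tasks, min(max_tasks, needed))
```

With these assumed defaults, an empty queue keeps one task running, a backlog of 120 asks for three, and a backlog of 10,000 is capped at ten. The cap matters for the database as much as for cost, since every extra worker task also holds connections.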
Putting it together
The pieces are: an image tagged by SHA, a task definition that pulls secrets at launch, a web service behind an ALB with a real health endpoint, a worker service on the same image, and RDS with bounded persistent connections. None of it is exotic. The work is in getting the boundaries right: timeouts ordered correctly, connections counted, migrations run once.
If you would rather hand this off, our Django cloud deployment team builds and operates exactly this setup, and if your data model is the hard part we also do PostgreSQL database engineering alongside it. Tell us what you are running and we will tell you, plainly, what we would change. Start the conversation on our Django cloud deployment service page.


