Engineering Note

PgBouncer + Django/FastAPI/Flask: Setup, Performance, and 5000-Request Benchmark

A practical guide to PgBouncer with PostgreSQL in Django, FastAPI, and Flask: pooling modes, real performance effects, production config, and a 5000-request benchmark.

TL;DR

  • Default mode for APIs: pool_mode = transaction
  • Django: CONN_MAX_AGE = 0
  • SQLAlchemy: usually NullPool with PgBouncer transaction mode
  • Measure p95/p99, not only average latency

1) What PgBouncer is and why it matters

PgBouncer is a lightweight connection pooler for PostgreSQL. Your app talks to PgBouncer, and PgBouncer manages a limited number of real Postgres connections.

It helps most when:

  • traffic spikes create connection storms;
  • worker count is high;
  • Postgres hits connection limits.

Important: PgBouncer does not make SQL itself faster. It improves connection-level stability and tail latency under load.

2) Pooling modes: session / transaction / statement

pool_mode = transaction is usually best for APIs:

  • a server connection is held only for the duration of one transaction, so a few server connections can serve many clients;
  • better scaling for short transactions;
  • but application code must not rely on session state (SET, prepared statements, advisory locks, temp tables).

pool_mode = session is more compatible but less efficient under concurrency.

statement is uncommon and can break multi-statement transaction behavior.

3) What performance gains to expect

Typical improvements:

  • fewer "too many connections" errors and connect timeouts;
  • smoother p95/p99 on peaks;
  • better throughput at high concurrency.
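The gains are mostly connection math. A sketch with assumed deployment numbers (node, worker, and thread counts are illustrative, not a recommendation):

```python
# Without PgBouncer: each worker thread may hold its own Postgres connection.
web_nodes = 4
gunicorn_workers = 8      # per node
threads_per_worker = 16   # per worker

direct_connections = web_nodes * gunicorn_workers * threads_per_worker
print(direct_connections)  # 512 potential server connections

# With PgBouncer in transaction mode, Postgres only ever sees the pool:
default_pool_size = 50
reserve_pool_size = 20
max_server_connections = default_pool_size + reserve_pool_size
print(max_server_connections)  # 70
```

Postgres sized for ~70 connections behaves very differently under a spike than one asked for 512 at once, even though the SQL itself is unchanged.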

4) PgBouncer config template

; /etc/pgbouncer/pgbouncer.ini
[databases]
appdb = host=POSTGRES_HOST port=5432 dbname=appdb user=appuser password=apppass

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 5000
default_pool_size = 50
min_pool_size = 10
reserve_pool_size = 20
reserve_pool_timeout = 5
server_idle_timeout = 30
server_lifetime = 3600
ignore_startup_parameters = extra_float_digits,search_path
stats_period = 60

The userlist entry goes in a separate file, /etc/pgbouncer/userlist.txt (the file named by auth_file), not in pgbouncer.ini:

"appuser" "md5<md5passwordhash>"

5) Django + PgBouncer

# settings.py
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "appdb",
        "USER": "appuser",
        "PASSWORD": "apppass",
        "HOST": "pgbouncer",
        "PORT": "6432",
        # Let PgBouncer own connection reuse; Django opens/closes per request.
        "CONN_MAX_AGE": 0,
        # Server-side cursors break under pool_mode = transaction.
        "DISABLE_SERVER_SIDE_CURSORS": True,
        "OPTIONS": {"connect_timeout": 5},
    }
}

6) FastAPI + SQLAlchemy + PgBouncer

from fastapi import FastAPI
from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy.pool import NullPool

app = FastAPI()
engine = create_async_engine(
    "postgresql+asyncpg://appuser:apppass@pgbouncer:6432/appdb",
    poolclass=NullPool,  # PgBouncer already pools; avoid pooling twice
    pool_pre_ping=True,
    # asyncpg caches prepared statements by default; with pool_mode = transaction
    # the next query may land on a different server connection, so disable it.
    connect_args={"statement_cache_size": 0},
)

@app.get("/ping-db")
async def ping_db():
    async with engine.connect() as conn:
        await conn.execute(text("SELECT 1"))
    return {"ok": True}

7) Flask + SQLAlchemy + PgBouncer

from sqlalchemy.pool import NullPool

SQLALCHEMY_DATABASE_URI = "postgresql+psycopg2://appuser:apppass@pgbouncer:6432/appdb"
SQLALCHEMY_ENGINE_OPTIONS = {
    "poolclass": NullPool,  # PgBouncer already pools; avoid pooling twice
    "pool_pre_ping": True,
}

8) 5000-request benchmark script

import asyncio
import time
import statistics
import httpx

TOTAL = 5000
CONCURRENCY = 200
URL = "http://127.0.0.1:8000/ping-db"

async def worker(client: httpx.AsyncClient, n: int, latencies: list[float]):
    for _ in range(n):
        t0 = time.perf_counter()
        r = await client.get(URL, timeout=10.0)
        r.raise_for_status()
        latencies.append((time.perf_counter() - t0) * 1000.0)

async def main():
    per_worker = TOTAL // CONCURRENCY
    remainder = TOTAL % CONCURRENCY
    latencies: list[float] = []
    start = time.perf_counter()

    limits = httpx.Limits(max_connections=CONCURRENCY, max_keepalive_connections=CONCURRENCY)
    async with httpx.AsyncClient(limits=limits) as client:
        tasks = []
        for i in range(CONCURRENCY):
            n = per_worker + (1 if i < remainder else 0)
            tasks.append(asyncio.create_task(worker(client, n, latencies)))
        await asyncio.gather(*tasks)

    elapsed = time.perf_counter() - start
    rps = TOTAL / elapsed
    latencies.sort()

    def pct(q: float) -> float:
        # nearest-rank percentile, clamped so q close to 1.0 cannot index past the end
        return latencies[min(int(q * len(latencies)), len(latencies) - 1)]

    p50, p95, p99 = pct(0.50), pct(0.95), pct(0.99)
    avg = statistics.mean(latencies)

    print(f"TOTAL={TOTAL} CONCURRENCY={CONCURRENCY}")
    print(f"Elapsed: {elapsed:.2f}s RPS: {rps:.1f}")
    print(f"Latency ms: avg={avg:.2f} p50={p50:.2f} p95={p95:.2f} p99={p99:.2f}")

if __name__ == "__main__":
    asyncio.run(main())

Install the dependency and run the script:

pip install httpx
python bench_http.py

For a fair comparison, only switch the DB endpoint (Postgres:5432 vs PgBouncer:6432) and keep the same load profile.
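If you prefer interpolated percentiles over nearest-rank indexing, the stdlib can compute them. A small sketch (the helper name is mine, not part of the benchmark script):

```python
import statistics

def summarize(latencies_ms: list[float]) -> dict[str, float]:
    # statistics.quantiles with n=100 returns 99 cut points;
    # qs[i] approximates the (i + 1)-th percentile with interpolation.
    qs = statistics.quantiles(latencies_ms, n=100)
    return {
        "avg": statistics.mean(latencies_ms),
        "p50": qs[49],
        "p95": qs[94],
        "p99": qs[98],
    }

# Sanity check on a uniform 1..100 ms sample.
print(summarize([float(i) for i in range(1, 101)]))
```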

9) Practical production tips

  • Track p95/p99, not only average.
  • Keep transactions short with pool_mode=transaction.
  • Django: keep CONN_MAX_AGE = 0.
  • SQLAlchemy: usually NullPool to avoid double pooling.
  • Tune default_pool_size so Postgres stays stable under peak load.

Quick checklist:

  • PgBouncer metrics: SHOW POOLS;, SHOW STATS;
  • Postgres metrics: pg_stat_activity, transaction duration, active connections
  • Load test steps: 50 -> 200 -> 500 concurrency and compare p95/p99 tails
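For the Postgres side of the checklist, connection pressure can be watched with a query along these lines (the datname value is the example database from earlier):

```sql
-- Connections to appdb grouped by state; a growing "idle in transaction"
-- count is a red flag when running pool_mode = transaction.
SELECT state,
       count(*) AS connections,
       max(now() - xact_start) AS longest_transaction
FROM pg_stat_activity
WHERE datname = 'appdb'
GROUP BY state;
```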