Zalt Blog

Deep Dives into Code & Architecture

When a Database Becomes a Traffic Cop

By Mahmoud Zalt
Code Cracking · 30m read
When a database becomes a traffic cop, it’s not just about storing rows—it’s about orchestrating chaos at scale. Curious how that control loop really works?
Every production database sits at a chaotic intersection: thousands of client messages racing in, timeouts ticking, signals flying, and long-running queries trying to finish in peace. Yet from the outside, everything feels simple: we send SQL, we get rows. Somewhere in the middle, a piece of code is quietly orchestrating all of this.

In PostgreSQL, that orchestration lives in src/backend/tcop/postgres.c. We’ll treat it as a “traffic cop”: the coordinator that parses, plans, and executes queries while juggling protocol messages, transactions, and interrupts without losing its cool. I’m Mahmoud Zalt, an AI solutions architect who helps leaders turn AI into ROI, and we’ll use this file to learn how to design robust server control loops that stay predictable under load.

Where postgres.c sits

PostgreSQL is a multi-process database server. A postmaster process accepts connections and forks a backend process per client. That backend then runs the main control loop implemented in postgres.c.

postgres/
  src/
    backend/
      tcop/
        postgres.c        <- main backend loop & traffic cop
        pquery.c          <- portal query utilities
        fastpath.c        <- fast-path function calls
        utility.c         <- utility command execution
        backend_startup.c <- backend initialization helpers
      parser/
        parser.c          <- SQL parser front-end
      optimizer/
        optimizer.c       <- planner entry points
      executor/
        execMain.c        <- executor entry
      libpq/
        be-secure.c       <- backend I/O helpers

Postmaster
  └─ PostgresSingleUserMain / Backend fork
       └─ PostgresMain
            ├─ process_postgres_switches
            ├─ InitPostgres
            └─ main loop
                ├─ ReadCommand
                │    ├─ SocketBackend
                │    └─ InteractiveBackend
                └─ message handlers
                     ├─ exec_simple_query
                     ├─ exec_parse_message
                     ├─ exec_bind_message
                     ├─ exec_execute_message
                     └─ others (Describe, Close, Sync, FunctionCall)
postgres.c sits at the top of the backend stack, steering traffic to parser, planner, executor, and protocol layers.

This module does not implement SQL semantics. Instead, it:

  • Runs the main backend loop (PostgresMain)
  • Speaks the frontend/backend protocol (Query, Parse, Bind, Execute, Sync, etc.)
  • Orchestrates the query pipeline: parse → analyze → rewrite → plan → execute
  • Manages prepared statements (CachedPlanSource) and portals
  • Centralizes interrupts, signals, and timeouts (ProcessInterrupts)

The explicit query assembly line

Once you see PostgresMain as a traffic cop, its core loop looks like an assembly-line supervisor: read a message, classify it, and run it through standardized stages.

From wire message to SQL pipeline

The main loop repeatedly:

  1. Sends ReadyForQuery when idle
  2. Reads the next protocol message via ReadCommand()
  3. Dispatches based on the first byte (firstchar)
  4. For query-related messages, runs the query pipeline and manages the transaction

For the simple protocol (PqMsg_Query), that orchestration is wrapped in exec_simple_query. Conceptually, it does the following:

  • Report activity and optionally reset per-statement stats
  • Start a top-level transaction command for all statements in the message
  • Drop any prior unnamed prepared statement to reclaim memory
  • Switch to MessageContext and call pg_parse_query to get a list of RawStmt parse trees
  • Optionally log the statement based on configuration
  • Decide whether to wrap multiple statements in an implicit transaction block
  • For each RawStmt:
    • Check transaction state; reject commands when the transaction is already aborted
    • Start a new xact command and, if needed, an implicit block
    • CHECK_FOR_INTERRUPTS() at a safe point
    • Acquire a snapshot if analysis requires it
    • Run pg_analyze_and_rewrite_fixedparams to get Query trees
    • Run pg_plan_queries to get PlannedStmt nodes
    • Release the snapshot
    • Create a portal, start it, and execute via PortalRun
    • End or advance the transaction depending on what the statement did and whether more statements are coming
    • Call EndCommand to finalize the command result
  • Finish the top-level xact command
  • Handle the case of an empty parse tree list with NullCommand
  • Call check_log_duration to decide if duration (and maybe the statement) should be logged

Even without every line, the structure is clear: this is a carefully staged pipeline wrapped in transaction and logging policy.

The “assembly line” is explicitly layered:

  • Parse: pg_parse_query turns raw SQL into RawStmt nodes. It does not touch catalogs, so it can run even in aborted transactions.
  • Analyze & rewrite: pg_analyze_and_rewrite_*() takes a single raw statement and produces one or more Query trees under a fresh snapshot, then drops the snapshot.
  • Plan: pg_plan_queries() runs the planner and produces PlannedStmt nodes (or wrappers for utility commands).
  • Execute: Everything runs inside a Portal, which owns executor state and is driven by PortalRun.

Why this matters: by making each stage explicit, PostgreSQL can reason about snapshots, memory lifetimes, and error boundaries. That’s how a long-lived backend avoids “ghost” allocations and stale catalog views across thousands of queries.

Extended protocol: the same pipeline, stretched over messages

The extended query protocol takes the same stages and spreads them across several messages:

  • Parse → exec_parse_message: parse, analyze, rewrite, and store a CachedPlanSource
  • Bind → exec_bind_message: bind parameters and formats, create a Portal backed by a cached (or freshly generated) plan
  • Execute → exec_execute_message: run the portal, optionally in chunks (for cursors and pipelining)

The traffic cop now has more to track: several in-flight portals, prepared statements, and the need to resynchronize with the client after errors. postgres.c handles this by:

  • Validating message lengths and types early in SocketBackend()
  • Using flags like doing_extended_query_message and ignore_till_sync so that, after an error, it can skip messages until a Sync arrives
  • Refusing extended protocol entirely in replication mode via forbidden_in_wal_sender()

  Stage            Simple protocol                            Extended protocol
  Parse            Inline in exec_simple_query                exec_parse_message
  Bind parameters  Per execution, inside the simple pipeline  exec_bind_message
  Execute          PortalRun per statement                    exec_execute_message
  Error recovery   Abort transaction; next message is fresh   ignore_till_sync resyncs at Sync

The pipeline is the same; the control loop just has to track more state across messages and enforce stricter protocol rules.

Interrupts and timeouts as a state machine

The assembly line looks clean on paper, but real systems are noisy. Clients disconnect mid-query, admins send signals, timeouts expire, and replicas conflict with recovery. postgres.c keeps that chaos from corrupting protocol or transaction state by treating interrupts as inputs to a central state machine.

The central interrupt gate: ProcessInterrupts()

PostgreSQL’s signal handlers are deliberately simple: they set flags. Real work happens later at safe points via CHECK_FOR_INTERRUPTS(), which calls ProcessInterrupts if anything is pending. The function looks roughly like this:

void
ProcessInterrupts(void)
{
    /* Don't act while interrupts are held off or in a critical section. */
    if (InterruptHoldoffCount != 0 || CritSectionCount != 0)
        return;

    InterruptPending = false;

    if (ProcDiePending)
    {
        ProcDiePending = false;
        QueryCancelPending = false; /* ProcDie trumps QueryCancel */
        LockErrorCleanup();
        if (ClientAuthInProgress && whereToSendOutput == DestRemote)
            whereToSendOutput = DestNone;
        if (ClientAuthInProgress)
            ereport(FATAL,
                    (errcode(ERRCODE_QUERY_CANCELED),
                     errmsg("canceling authentication due to timeout")));
        else if (AmAutoVacuumWorkerProcess())
            ereport(FATAL,
                    (errcode(ERRCODE_ADMIN_SHUTDOWN),
                     errmsg("terminating autovacuum process due to administrator command")));
        ...
    }

    if (CheckClientConnectionPending)
    {
        CheckClientConnectionPending = false;
        if (!DoingCommandRead && client_connection_check_interval > 0)
        {
            if (!pq_check_connection())
                ClientConnectionLost = true;
            else
                enable_timeout_after(CLIENT_CONNECTION_CHECK_TIMEOUT,
                                     client_connection_check_interval);
        }
    }

    if (ClientConnectionLost)
    {
        QueryCancelPending = false; /* lost connection trumps QueryCancel */
        LockErrorCleanup();
        whereToSendOutput = DestNone;
        ereport(FATAL,
                (errcode(ERRCODE_CONNECTION_FAILURE),
                 errmsg("connection to client lost")));
    }

    if (QueryCancelPending && QueryCancelHoldoffCount != 0)
    {
        /* Can't cancel right now, keep the flag set. */
        InterruptPending = true;
    }
    else if (QueryCancelPending)
    {
        bool lock_timeout_occurred;
        bool stmt_timeout_occurred;

        QueryCancelPending = false;
        lock_timeout_occurred = get_timeout_indicator(LOCK_TIMEOUT, true);
        stmt_timeout_occurred = get_timeout_indicator(STATEMENT_TIMEOUT, true);

        if (lock_timeout_occurred && stmt_timeout_occurred &&
            get_timeout_finish_time(STATEMENT_TIMEOUT) < get_timeout_finish_time(LOCK_TIMEOUT))
            lock_timeout_occurred = false; /* report statement timeout instead */

        if (lock_timeout_occurred)
        {
            LockErrorCleanup();
            ereport(ERROR,
                    (errcode(ERRCODE_LOCK_NOT_AVAILABLE),
                     errmsg("canceling statement due to lock timeout")));
        }
        if (stmt_timeout_occurred)
        {
            LockErrorCleanup();
            ereport(ERROR,
                    (errcode(ERRCODE_QUERY_CANCELED),
                     errmsg("canceling statement due to statement timeout")));
        }

        if (AmAutoVacuumWorkerProcess())
        {
            LockErrorCleanup();
            ereport(ERROR,
                    (errcode(ERRCODE_QUERY_CANCELED),
                     errmsg("canceling autovacuum task")));
        }

        if (!DoingCommandRead)
        {
            LockErrorCleanup();
            ereport(ERROR,
                    (errcode(ERRCODE_QUERY_CANCELED),
                     errmsg("canceling statement due to user request")));
        }
    }

    if (pg_atomic_read_u32(&MyProc->pendingRecoveryConflicts) != 0)
        ProcessRecoveryConflictInterrupts();

    ... /* idle timeouts, stats, barriers, parallel messages ... */
}

A few design choices are worth copying:

  • Single gate: all asynchronous events route through one function. When you reason about fatal vs non-fatal paths, you go here first.
  • Precedence: some events override others (process death > query cancel; connection loss > cancel). The rules are encoded, not left to guesswork.
  • Context sensitivity: behavior depends on whether we’re reading a command (DoingCommandRead) or executing SQL. Query cancel during a read is deferred to avoid desynchronizing the protocol.
  • Timeout semantics in code: lock vs statement timeout precedence is implemented directly, including the “later deadline wins” rule.

Recovery conflicts: yielding to the primary

On hot standby replicas, user queries can conflict with recovery: pinned buffers, locks, or replication slots can block WAL replay. ProcessRecoveryConflictInterrupts() and report_recovery_conflict() decide whether to cancel the statement (ERROR) or terminate the whole session (FATAL), with detailed, user-facing messages.

This logic lives in the traffic cop layer for a reason: it doesn’t need to understand query semantics, only when client work must yield to recovery to keep replicas in sync.

Policy helpers: logging and client behavior

postgres.c is also where configuration (GUCs) turns into concrete runtime behavior. Timeouts, logging thresholds, and statistics are applied around query execution in a consistent way.

Logging duration without drowning in data

check_log_duration is a compact policy helper that decides if and how to log how long a statement took:

int
check_log_duration(char *msec_str, bool was_logged)
{
    if (log_duration || log_min_duration_sample >= 0 ||
        log_min_duration_statement >= 0 || xact_is_sampled)
    {
        long secs;
        int  usecs;
        int  msecs;
        bool exceeded_duration;
        bool exceeded_sample_duration;
        bool in_sample = false;

        TimestampDifference(GetCurrentStatementStartTimestamp(),
                            GetCurrentTimestamp(),
                            &secs, &usecs);
        msecs = usecs / 1000;

        exceeded_duration = (log_min_duration_statement == 0 ||
                             (log_min_duration_statement > 0 &&
                              (secs > log_min_duration_statement / 1000 ||
                               secs * 1000 + msecs >= log_min_duration_statement)));

        exceeded_sample_duration = (log_min_duration_sample == 0 ||
                                    (log_min_duration_sample > 0 &&
                                     (secs > log_min_duration_sample / 1000 ||
                                      secs * 1000 + msecs >= log_min_duration_sample)));

        if (exceeded_sample_duration)
            in_sample = log_statement_sample_rate != 0 &&
                (log_statement_sample_rate == 1 ||
                 pg_prng_double(&pg_global_prng_state) <= log_statement_sample_rate);

        if (exceeded_duration || in_sample || log_duration || xact_is_sampled)
        {
            snprintf(msec_str, 32, "%ld.%03d",
                     secs * 1000 + msecs, usecs % 1000);
            if ((exceeded_duration || in_sample || xact_is_sampled) && !was_logged)
                return 2;   /* log duration + statement */
            else
                return 1;   /* log duration only */
        }
    }

    return 0;
}

In words, it:

  • Computes duration in milliseconds from statement start to now
  • Checks against two thresholds: a hard minimum (log_min_duration_statement) and a sampling threshold (log_min_duration_sample)
  • Optionally samples based on log_statement_sample_rate
  • Fills msec_str and returns an enum-like integer: 0 = no logging, 1 = log duration only, 2 = log duration plus statement

This single helper is used from exec_simple_query, exec_parse_message, and exec_execute_message, ensuring that “how we decide to log” is consistent across protocol paths.

Timeouts as levers to steer clients

PostgreSQL exposes several timeouts that ultimately surface as interrupts:

  • StatementTimeout – per-statement deadline
  • IdleInTransactionSessionTimeout – kill sessions that sit idle in an open transaction
  • IdleSessionTimeout – kill completely idle sessions
  • TransactionTimeout – maximum lifetime of a transaction

The main loop arms these timers only when relevant. For example, when sending ReadyForQuery, it chooses which idle timeout to enable based on current transaction state:

if (send_ready_for_query)
{
    if (IsAbortedTransactionBlockState())
    {
        set_ps_display("idle in transaction (aborted)");
        pgstat_report_activity(STATE_IDLEINTRANSACTION_ABORTED, NULL);

        if (IdleInTransactionSessionTimeout > 0 &&
            (IdleInTransactionSessionTimeout < TransactionTimeout ||
             TransactionTimeout == 0))
        {
            idle_in_transaction_timeout_enabled = true;
            enable_timeout_after(IDLE_IN_TRANSACTION_SESSION_TIMEOUT,
                                 IdleInTransactionSessionTimeout);
        }
    }
    else if (IsTransactionOrTransactionBlock())
    {
        set_ps_display("idle in transaction");
        pgstat_report_activity(STATE_IDLEINTRANSACTION, NULL);

        if (IdleInTransactionSessionTimeout > 0 &&
            (IdleInTransactionSessionTimeout < TransactionTimeout ||
             TransactionTimeout == 0))
        {
            idle_in_transaction_timeout_enabled = true;
            enable_timeout_after(IDLE_IN_TRANSACTION_SESSION_TIMEOUT,
                                 IdleInTransactionSessionTimeout);
        }
    }
    else
    {
        set_ps_display("idle");
        pgstat_report_activity(STATE_IDLE, NULL);

        if (IdleSessionTimeout > 0)
        {
            idle_session_timeout_enabled = true;
            enable_timeout_after(IDLE_SESSION_TIMEOUT,
                                 IdleSessionTimeout);
        }
    }

    ReadyForQuery(whereToSendOutput);
    send_ready_for_query = false;
}

Later, ProcessInterrupts turns the corresponding pending flags into hard outcomes with specific SQLSTATEs:

if (IdleInTransactionSessionTimeoutPending)
{
    IdleInTransactionSessionTimeoutPending = false;
    if (IdleInTransactionSessionTimeout > 0)
    {
        INJECTION_POINT("idle-in-transaction-session-timeout", NULL);
        ereport(FATAL,
                (errcode(ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT),
                 errmsg("terminating connection due to idle-in-transaction timeout")));
    }
}

if (IdleSessionTimeoutPending)
{
    IdleSessionTimeoutPending = false;
    if (IdleSessionTimeout > 0)
    {
        INJECTION_POINT("idle-session-timeout", NULL);
        ereport(FATAL,
                (errcode(ERRCODE_IDLE_SESSION_TIMEOUT),
                 errmsg("terminating connection due to idle-session timeout")));
    }
}

Why this matters: these timeouts are resource guards and behavioral signals. Misbehaving applications that leave transactions open or sessions idle get specific error codes that operators can monitor and feed back into development.

The same layer is a natural place to define useful counters, such as:

  • backend_statement_timeout_count – how often statements hit STATEMENT_TIMEOUT
  • backend_idle_in_transaction_timeout_count – how often sessions die while idle in a transaction
  • backend_protocol_violation_count – how often we raise PROTOCOL_VIOLATION, often due to buggy clients

Patterns to steal for your own servers

Reading postgres.c as a story rather than a several-thousand-line C file surfaces a set of reusable patterns. They apply whether you’re building a database, a gRPC service, or a custom control plane.

1. Make the request pipeline explicit

  • Expose functions like parse, analyze, plan, and execute as separate steps, even if they’re always called together today.
  • Document invariants for each step (for example, “planner requires an active snapshot”).
  • In long-lived processes, scope memory per stage (PostgreSQL’s MessageContext and per-statement contexts are a strong reference).

2. Centralize protocol dispatch

  • Have a single place where you decode and validate incoming messages (e.g., a SocketBackend-style read loop plus a dispatch switch).
  • Fail fast on invalid types or sizes with clear, fatal errors; a hard disconnect is better than a desynchronized protocol.
  • Keep the main loop readable by extracting a focused dispatcher (for example, a handle_client_message-style helper) rather than expanding the switch indefinitely.

3. Treat interrupts and timeouts as first-class inputs

  • Keep signal handlers minimal; let them set flags.
  • Route all handling through one ProcessInterrupts-style function that encodes precedence and context rules.
  • Express timeout precedence as code (lock vs statement timeouts, idle vs transaction limits), not as folklore in comments.

4. Encapsulate policy: logging, privacy, behavior

  • Implement small helpers like check_log_statement and check_log_duration to decide what to log and when.
  • Use configuration-driven guards (e.g., log_parameter_max_length and similar) to prevent logs from leaking entire payloads or PII.
  • Let those helpers be the only place that knows about sampling rates and thresholds.

5. Accept some centralization, but fight monolith bloat

postgres.c shows both good patterns and inevitable trade-offs:

  • The monolithic PostgresMain switch and intertwined behaviors increase regression risk when adding new message types.
  • Global session flags like xact_started, DoingCommandRead, doing_extended_query_message, and ignore_till_sync create implicit coupling between distant code paths.
  • Protocol handling, interrupts, command-line parsing, and some GUC plumbing all live in the same file.

The suggested refactors in the upstream discussions—extracting a dedicated message dispatcher, encapsulating session state in a struct, and factoring timeout logic into helpers—are ways to keep the traffic cop’s role clear while shrinking its blast radius.

Bringing it back to your code

If you’re designing or refactoring a server today, you can apply these ideas immediately:

  1. Draw your ASCII call graph. Sketch how requests flow through your process, including where you read from the network and where you decide on timeouts and logging.
  2. Introduce a single interrupt handler. Collect cancellation, timeouts, and shutdown into a ProcessInterrupts-like function, and call it from safe points in your pipeline.
  3. Split your main loop by responsibility. Separate read_message, dispatch_message, and run_pipeline, and give each a narrow, testable contract.

The primary lesson from PostgreSQL’s traffic cop is simple: robust servers make their control loops and state transitions explicit. postgres.c keeps the protocol honest, transactions well-scoped, and interrupts under control by treating message handling, timeouts, and logging as first-class parts of the design—not afterthoughts.

If we bring that same discipline into our own services, we end up with systems that are not just fast, but also trustworthy when the intersection gets busy.

Full Source Code

Direct source from the upstream repository.

src/backend/tcop/postgres.c

postgres/postgres • master


Thanks for reading! I hope this was useful. If you have questions or thoughts, feel free to reach out.

Content Creation Process: This article was generated via a semi-automated workflow using AI tools. I prepared the strategic framework, including specific prompts and data sources. From there, the automation system conducted the research, analysis, and writing. The content passed through automated verification steps before being finalized and published without manual intervention.

Mahmoud Zalt

About the Author

I’m Zalt, a technologist with 16+ years of experience, passionate about designing and building AI systems that move us closer to a world where machines handle everything and humans reclaim wonder.

Let's connect if you're working on interesting AI projects, looking for technical advice or want to discuss anything.
