Every production database sits at a chaotic intersection: thousands of client messages racing in, timeouts ticking, signals flying, and long-running queries trying to finish in peace. Yet from the outside, everything feels simple: we send SQL, we get rows. Somewhere in the middle, a piece of code is quietly orchestrating all of this.
In PostgreSQL, that orchestration lives in src/backend/tcop/postgres.c. We’ll treat it as a “traffic cop”: the coordinator that parses, plans, and executes queries while juggling protocol messages, transactions, and interrupts without losing its cool. I’m Mahmoud Zalt, an AI solutions architect who helps leaders turn AI into ROI, and we’ll use this file to learn how to design robust server control loops that stay predictable under load.
Where postgres.c sits
PostgreSQL is a multi-process database server. A postmaster process accepts connections and forks a backend process per client. That backend then runs the main control loop implemented in postgres.c.
postgres/
  src/
    backend/
      tcop/
        postgres.c          <- main backend loop & traffic cop
        pquery.c            <- portal query utilities
        fastpath.c          <- fast-path function calls
        utility.c           <- utility command execution
        backend_startup.c   <- backend initialization helpers
      parser/
        parser.c            <- SQL parser front-end
      optimizer/
        optimizer.c         <- planner entry points
      executor/
        execMain.c          <- executor entry
      libpq/
        be-secure.c         <- backend I/O helpers
Postmaster
└─ PostgresSingleUserMain / Backend fork
   └─ PostgresMain
      ├─ process_postgres_switches
      ├─ InitPostgres
      └─ main loop
         ├─ ReadCommand
         │  ├─ SocketBackend
         │  └─ InteractiveBackend
         └─ message handlers
            ├─ exec_simple_query
            ├─ exec_parse_message
            ├─ exec_bind_message
            ├─ exec_execute_message
            └─ others (Describe, Close, Sync, FunctionCall)
postgres.c sits at the top of the backend stack, steering traffic to the parser, planner, executor, and protocol layers.

This module does not implement SQL semantics. Instead, it:
- Runs the main backend loop (PostgresMain)
- Speaks the frontend/backend protocol (Query, Parse, Bind, Execute, Sync, etc.)
- Orchestrates the query pipeline: parse → analyze → rewrite → plan → execute
- Manages prepared statements (CachedPlanSource) and portals
- Centralizes interrupts, signals, and timeouts (ProcessInterrupts)
The explicit query assembly line
Once you see PostgresMain as a traffic cop, its core loop looks like an assembly-line supervisor: read a message, classify it, and run it through standardized stages.
From wire message to SQL pipeline
The main loop repeatedly:
- Sends ReadyForQuery when idle
- Reads the next protocol message via ReadCommand()
- Dispatches based on the first byte (firstchar)
- For query-related messages, runs the query pipeline and manages the transaction (sketched below)
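In code shape, the loop is small. Here is a heavily simplified sketch; the real PostgresMain adds sigsetjmp error recovery, timeout arming, pending-notify handling, and many more message types:

/* Heavily simplified sketch of the PostgresMain loop; not the real code. */
for (;;)
{
    int         firstchar;

    if (send_ready_for_query)
    {
        ReadyForQuery(whereToSendOutput);   /* tell the client we're idle */
        send_ready_for_query = false;
    }

    DoingCommandRead = true;
    firstchar = ReadCommand(&input_message);    /* blocks on the socket */
    DoingCommandRead = false;

    CHECK_FOR_INTERRUPTS();     /* safe point: act on flags set by signals */

    switch (firstchar)
    {
        case PqMsg_Query:       /* 'Q': simple protocol */
            exec_simple_query(pq_getmsgstring(&input_message));
            send_ready_for_query = true;
            break;

        case PqMsg_Terminate:   /* 'X': client is done */
            proc_exit(0);

        /* ... Parse, Bind, Execute, Describe, Close, Sync, FunctionCall ... */
    }
}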
For the simple protocol (PqMsg_Query), that orchestration is wrapped in exec_simple_query. Conceptually, it does the following:
- Report activity and optionally reset per-statement stats
- Start a top-level transaction command for all statements in the message
- Drop any prior unnamed prepared statement to reclaim memory
- Switch to MessageContext and call pg_parse_query to get a list of RawStmt parse trees
- Optionally log the statement based on configuration
- Decide whether to wrap multiple statements in an implicit transaction block
- For each RawStmt:
  - Check transaction state; reject commands when the transaction is already aborted
  - Start a new xact command and, if needed, an implicit block
  - CHECK_FOR_INTERRUPTS() at a safe point
  - Acquire a snapshot if analysis requires it
  - Run pg_analyze_and_rewrite_fixedparams to get Query trees
  - Run pg_plan_queries to get PlannedStmt nodes
  - Release the snapshot
  - Create a portal, start it, and execute via PortalRun
  - End or advance the transaction depending on what the statement did and whether more statements are coming
  - Call EndCommand to finalize the command result
- Finish the top-level xact command
- Handle the case of an empty parse tree list with NullCommand
- Call check_log_duration to decide if duration (and maybe the statement) should be logged
Even without every line, the structure is clear: this is a carefully staged pipeline wrapped in transaction and logging policy.
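For orientation, here is a heavily condensed skeleton of that shape. Snapshot handling, error paths, logging, and result routing are all omitted; this is a sketch of the structure, not the real function:

/* Condensed skeleton of exec_simple_query; many details omitted. */
static void
exec_simple_query(const char *query_string)
{
    List       *parsetree_list;
    ListCell   *lc;

    start_xact_command();                   /* top-level transaction command */
    drop_unnamed_stmt();                    /* reclaim the unnamed statement */

    MemoryContextSwitchTo(MessageContext);  /* per-message allocations */
    parsetree_list = pg_parse_query(query_string);

    foreach(lc, parsetree_list)
    {
        RawStmt    *parsetree = lfirst_node(RawStmt, lc);
        List       *querytree_list;
        List       *plantree_list;
        Portal      portal;

        start_xact_command();
        CHECK_FOR_INTERRUPTS();

        querytree_list = pg_analyze_and_rewrite_fixedparams(parsetree,
                                                            query_string,
                                                            NULL, 0, NULL);
        plantree_list = pg_plan_queries(querytree_list, query_string,
                                        CURSOR_OPT_PARALLEL_OK, NULL);

        portal = CreatePortal("", true, true);
        /* PortalDefineQuery(...); PortalStart(...); PortalRun(...); */
        PortalDrop(portal, false);
    }

    finish_xact_command();                  /* close the top-level command */
}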
The “assembly line” is explicitly layered:
- Parse: pg_parse_query turns raw SQL into RawStmt nodes. It does not touch catalogs, so it can run even in aborted transactions.
- Analyze & rewrite: pg_analyze_and_rewrite_*() takes a single raw statement and produces one or more Query trees under a fresh snapshot, then drops the snapshot.
- Plan: pg_plan_queries() runs the planner and produces PlannedStmt nodes (or wrappers for utility commands).
- Execute: Everything runs inside a Portal, which owns executor state and is driven by PortalRun.
Why this matters: by making each stage explicit, PostgreSQL can reason about snapshots, memory lifetimes, and error boundaries. That’s how a long-lived backend avoids “ghost” allocations and stale catalog views across thousands of queries.
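For instance, inside exec_simple_query the snapshot lifetime around analysis and planning is spelled out directly. A lightly simplified fragment of that pattern:

/* Simplified: analysis and planning run under a short-lived snapshot. */
bool        snapshot_set = false;

if (analyze_requires_snapshot(parsetree))
{
    PushActiveSnapshot(GetTransactionSnapshot());
    snapshot_set = true;
}

querytree_list = pg_analyze_and_rewrite_fixedparams(parsetree, query_string,
                                                    NULL, 0, NULL);
plantree_list = pg_plan_queries(querytree_list, query_string,
                                CURSOR_OPT_PARALLEL_OK, NULL);

if (snapshot_set)
    PopActiveSnapshot();        /* no stale catalog view survives past here */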
Extended protocol: the same pipeline, stretched over messages
The extended query protocol takes the same stages and spreads them across several messages:
- Parse → exec_parse_message: parse, analyze, rewrite, and store a CachedPlanSource
- Bind → exec_bind_message: bind parameters and formats, create a Portal backed by a cached (or freshly generated) plan
- Execute → exec_execute_message: run the portal, optionally in chunks (for cursors and pipelining); an illustrative exchange follows
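An illustrative wire exchange for one parameterized statement (the statement name and parameter value here are invented for the example):

Parse     name="s1"  query="SELECT * FROM users WHERE id = $1"
Bind      portal=""  statement="s1"  params=[42]
Execute   portal=""  max_rows=0
Sync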
The traffic cop now has more to track: several in-flight portals, prepared statements, and the need to resynchronize with the client after errors. postgres.c handles this by:
- Validating message lengths and types early in SocketBackend()
- Using flags like doing_extended_query_message and ignore_till_sync so that, after an error, it can skip messages until a Sync arrives (sketched below)
- Refusing extended protocol entirely in replication mode via forbidden_in_wal_sender()
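Lightly simplified fragments of how those flags drive resynchronization in PostgresMain:

/* In the error-recovery path (the sigsetjmp block), simplified: */
if (doing_extended_query_message)
    ignore_till_sync = true;    /* client may already have queued Bind/Execute */

/* At the top of the main loop, right after reading a message: */
if (ignore_till_sync && firstchar != EOF)
    continue;                   /* drop messages until the client sends Sync */

/* In the Sync handler: */
case PqMsg_Sync:
    pq_getmsgend(&input_message);
    finish_xact_command();
    ignore_till_sync = false;   /* back in lockstep with the client */
    send_ready_for_query = true;
    break;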
| Stage | Simple protocol | Extended protocol |
|---|---|---|
| Parse | Inline in exec_simple_query | exec_parse_message |
| Bind parameters | Per execution, inside simple pipeline | exec_bind_message |
| Execute | PortalRun per statement | exec_execute_message |
| Error recovery | Abort transaction, next message starts fresh | ignore_till_sync to resync at Sync |
The pipeline is the same; the control loop just has to track more state across messages and enforce stricter protocol rules.
Interrupts and timeouts as a state machine
The assembly line looks clean on paper, but real systems are noisy. Clients disconnect mid-query, admins send signals, timeouts expire, and replicas conflict with recovery. postgres.c keeps that chaos from corrupting protocol or transaction state by treating interrupts as inputs to a central state machine.
The central interrupt gate: ProcessInterrupts()
PostgreSQL’s signal handlers are deliberately simple: they set flags. Real work happens later at safe points via CHECK_FOR_INTERRUPTS(), which calls ProcessInterrupts if anything is pending. The function looks roughly like this:
void
ProcessInterrupts(void)
{
    /* Don't act while interrupts are held off or in a critical section. */
    if (InterruptHoldoffCount != 0 || CritSectionCount != 0)
        return;
    InterruptPending = false;

    if (ProcDiePending)
    {
        ProcDiePending = false;
        QueryCancelPending = false;     /* ProcDie trumps QueryCancel */
        LockErrorCleanup();
        if (ClientAuthInProgress && whereToSendOutput == DestRemote)
            whereToSendOutput = DestNone;
        if (ClientAuthInProgress)
            ereport(FATAL,
                    (errcode(ERRCODE_QUERY_CANCELED),
                     errmsg("canceling authentication due to timeout")));
        else if (AmAutoVacuumWorkerProcess())
            ereport(FATAL,
                    (errcode(ERRCODE_ADMIN_SHUTDOWN),
                     errmsg("terminating autovacuum process due to administrator command")));
        ...
    }

    if (CheckClientConnectionPending)
    {
        CheckClientConnectionPending = false;
        if (!DoingCommandRead && client_connection_check_interval > 0)
        {
            if (!pq_check_connection())
                ClientConnectionLost = true;
            else
                enable_timeout_after(CLIENT_CONNECTION_CHECK_TIMEOUT,
                                     client_connection_check_interval);
        }
    }

    if (ClientConnectionLost)
    {
        QueryCancelPending = false;     /* lost connection trumps QueryCancel */
        LockErrorCleanup();
        whereToSendOutput = DestNone;
        ereport(FATAL,
                (errcode(ERRCODE_CONNECTION_FAILURE),
                 errmsg("connection to client lost")));
    }

    if (QueryCancelPending && QueryCancelHoldoffCount != 0)
    {
        /* Can't cancel right now, keep the flag set. */
        InterruptPending = true;
    }
    else if (QueryCancelPending)
    {
        bool        lock_timeout_occurred;
        bool        stmt_timeout_occurred;

        QueryCancelPending = false;
        lock_timeout_occurred = get_timeout_indicator(LOCK_TIMEOUT, true);
        stmt_timeout_occurred = get_timeout_indicator(STATEMENT_TIMEOUT, true);
        if (lock_timeout_occurred && stmt_timeout_occurred &&
            get_timeout_finish_time(STATEMENT_TIMEOUT) < get_timeout_finish_time(LOCK_TIMEOUT))
            lock_timeout_occurred = false;      /* report statement timeout instead */
        if (lock_timeout_occurred)
        {
            LockErrorCleanup();
            ereport(ERROR,
                    (errcode(ERRCODE_LOCK_NOT_AVAILABLE),
                     errmsg("canceling statement due to lock timeout")));
        }
        if (stmt_timeout_occurred)
        {
            LockErrorCleanup();
            ereport(ERROR,
                    (errcode(ERRCODE_QUERY_CANCELED),
                     errmsg("canceling statement due to statement timeout")));
        }
        if (AmAutoVacuumWorkerProcess())
        {
            LockErrorCleanup();
            ereport(ERROR,
                    (errcode(ERRCODE_QUERY_CANCELED),
                     errmsg("canceling autovacuum task")));
        }
        if (!DoingCommandRead)
        {
            LockErrorCleanup();
            ereport(ERROR,
                    (errcode(ERRCODE_QUERY_CANCELED),
                     errmsg("canceling statement due to user request")));
        }
    }

    if (pg_atomic_read_u32(&MyProc->pendingRecoveryConflicts) != 0)
        ProcessRecoveryConflictInterrupts();

    ...     /* idle timeouts, stats, barriers, parallel messages ... */
}
A few design choices are worth copying:
- Single gate: all asynchronous events route through one function. When you reason about fatal vs non-fatal paths, you go here first.
- Precedence: some events override others (process death > query cancel; connection loss > cancel). The rules are encoded, not left to guesswork.
- Context sensitivity: behavior depends on whether we’re reading a command (DoingCommandRead) or executing SQL. Query cancel during a read is deferred to avoid desynchronizing the protocol.
- Timeout semantics in code: lock vs statement timeout precedence is implemented directly, including the rule that when both fire, the timeout whose deadline expired first is the one reported.
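The other half of the pattern is worth seeing too. The handlers that set these flags are deliberately tiny, and the safe-point check is just a macro guarding ProcessInterrupts. Both are lightly simplified here from the PostgreSQL source:

/* Lightly simplified: the cancel handler only sets flags and wakes the latch. */
void
StatementCancelHandler(SIGNAL_ARGS)
{
    /* Don't joggle the elbow of proc_exit */
    if (!proc_exit_inprogress)
    {
        InterruptPending = true;
        QueryCancelPending = true;
    }

    /* Waken anything waiting on the process latch */
    SetLatch(MyLatch);
}

/* The safe-point check (from miscadmin.h) is just a guarded call: */
#define CHECK_FOR_INTERRUPTS() \
do { \
    if (INTERRUPTS_PENDING_CONDITION()) \
        ProcessInterrupts(); \
} while(0)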
Recovery conflicts: yielding to the primary
On hot standby replicas, user queries can conflict with recovery: pinned buffers, locks, or replication slots can block WAL replay. ProcessRecoveryConflictInterrupts() and report_recovery_conflict() decide whether to cancel the statement (ERROR) or terminate the whole session (FATAL), with detailed, user-facing messages.
This logic lives in the traffic cop layer for a reason: it doesn’t need to understand query semantics, only when client work must yield to recovery to keep replicas in sync.
Policy helpers: logging and client behavior
postgres.c is also where configuration (GUCs) turns into concrete runtime behavior. Timeouts, logging thresholds, and statistics are applied around query execution in a consistent way.
Logging duration without drowning in data
check_log_duration is a compact policy helper that decides if and how to log how long a statement took:
int
check_log_duration(char *msec_str, bool was_logged)
{
    if (log_duration || log_min_duration_sample >= 0 ||
        log_min_duration_statement >= 0 || xact_is_sampled)
    {
        long        secs;
        int         usecs;
        int         msecs;
        bool        exceeded_duration;
        bool        exceeded_sample_duration;
        bool        in_sample = false;

        TimestampDifference(GetCurrentStatementStartTimestamp(),
                            GetCurrentTimestamp(),
                            &secs, &usecs);
        msecs = usecs / 1000;

        exceeded_duration = (log_min_duration_statement == 0 ||
                             (log_min_duration_statement > 0 &&
                              (secs > log_min_duration_statement / 1000 ||
                               secs * 1000 + msecs >= log_min_duration_statement)));

        exceeded_sample_duration = (log_min_duration_sample == 0 ||
                                    (log_min_duration_sample > 0 &&
                                     (secs > log_min_duration_sample / 1000 ||
                                      secs * 1000 + msecs >= log_min_duration_sample)));

        if (exceeded_sample_duration)
            in_sample = log_statement_sample_rate != 0 &&
                (log_statement_sample_rate == 1 ||
                 pg_prng_double(&pg_global_prng_state) <= log_statement_sample_rate);

        if (exceeded_duration || in_sample || log_duration || xact_is_sampled)
        {
            snprintf(msec_str, 32, "%ld.%03d",
                     secs * 1000 + msecs, usecs % 1000);

            if ((exceeded_duration || in_sample || xact_is_sampled) && !was_logged)
                return 2;       /* log duration + statement */
            else
                return 1;       /* log duration only */
        }
    }
    return 0;
}
In words, it:
- Computes duration in milliseconds from statement start to now
- Checks against two thresholds: a hard minimum (log_min_duration_statement) and a sampling threshold (log_min_duration_sample)
- Optionally samples based on log_statement_sample_rate
- Fills msec_str and returns an enum-like integer: 0 = no logging, 1 = log duration only, 2 = log duration plus statement
This single helper is used from exec_simple_query, exec_parse_message, and exec_execute_message, ensuring that “how we decide to log” is consistent across protocol paths.
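Each call site then maps the return code onto the same two log shapes. Roughly, simplified from exec_simple_query:

/* Simplified: how a caller consumes check_log_duration's return code. */
char        msec_str[32];

switch (check_log_duration(msec_str, was_logged))
{
    case 1:
        ereport(LOG,
                (errmsg("duration: %s ms", msec_str),
                 errhidestmt(true)));
        break;
    case 2:
        ereport(LOG,
                (errmsg("duration: %s ms  statement: %s",
                        msec_str, query_string),
                 errhidestmt(true)));
        break;
}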
Timeouts as levers to steer clients
PostgreSQL exposes several timeouts that ultimately surface as interrupts:
- StatementTimeout – per-statement deadline
- IdleInTransactionSessionTimeout – kill sessions that sit idle in an open transaction
- IdleSessionTimeout – kill completely idle sessions
- TransactionTimeout – maximum lifetime of a transaction
The main loop arms these timers only when relevant. For example, when sending ReadyForQuery, it chooses which idle timeout to enable based on current transaction state:
if (send_ready_for_query)
{
    if (IsAbortedTransactionBlockState())
    {
        set_ps_display("idle in transaction (aborted)");
        pgstat_report_activity(STATE_IDLEINTRANSACTION_ABORTED, NULL);
        if (IdleInTransactionSessionTimeout > 0 &&
            (IdleInTransactionSessionTimeout < TransactionTimeout ||
             TransactionTimeout == 0))
        {
            idle_in_transaction_timeout_enabled = true;
            enable_timeout_after(IDLE_IN_TRANSACTION_SESSION_TIMEOUT,
                                 IdleInTransactionSessionTimeout);
        }
    }
    else if (IsTransactionOrTransactionBlock())
    {
        set_ps_display("idle in transaction");
        pgstat_report_activity(STATE_IDLEINTRANSACTION, NULL);
        if (IdleInTransactionSessionTimeout > 0 &&
            (IdleInTransactionSessionTimeout < TransactionTimeout ||
             TransactionTimeout == 0))
        {
            idle_in_transaction_timeout_enabled = true;
            enable_timeout_after(IDLE_IN_TRANSACTION_SESSION_TIMEOUT,
                                 IdleInTransactionSessionTimeout);
        }
    }
    else
    {
        set_ps_display("idle");
        pgstat_report_activity(STATE_IDLE, NULL);
        if (IdleSessionTimeout > 0)
        {
            idle_session_timeout_enabled = true;
            enable_timeout_after(IDLE_SESSION_TIMEOUT,
                                 IdleSessionTimeout);
        }
    }
    ReadyForQuery(whereToSendOutput);
    send_ready_for_query = false;
}
Later, ProcessInterrupts turns the corresponding pending flags into hard outcomes with specific SQLSTATEs:
if (IdleInTransactionSessionTimeoutPending)
{
    IdleInTransactionSessionTimeoutPending = false;
    if (IdleInTransactionSessionTimeout > 0)
    {
        INJECTION_POINT("idle-in-transaction-session-timeout", NULL);
        ereport(FATAL,
                (errcode(ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT),
                 errmsg("terminating connection due to idle-in-transaction timeout")));
    }
}

if (IdleSessionTimeoutPending)
{
    IdleSessionTimeoutPending = false;
    if (IdleSessionTimeout > 0)
    {
        INJECTION_POINT("idle-session-timeout", NULL);
        ereport(FATAL,
                (errcode(ERRCODE_IDLE_SESSION_TIMEOUT),
                 errmsg("terminating connection due to idle-session timeout")));
    }
}
Why this matters: these timeouts are resource guards and behavioral signals. Misbehaving applications that leave transactions open or sessions idle get specific error codes that operators can monitor and feed back into development.
The same layer is a natural place to define useful counters, such as:
- backend_statement_timeout_count – how often statements hit STATEMENT_TIMEOUT
- backend_idle_in_transaction_timeout_count – how often sessions die while idle in a transaction
- backend_protocol_violation_count – how often we raise PROTOCOL_VIOLATION, often due to buggy clients
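To be explicit: PostgreSQL does not define these counters today. A hypothetical sketch of where one would live, right next to the error path that already encodes the outcome:

/* Hypothetical: this counter is NOT in PostgreSQL; the name is invented. */
static uint64 backend_statement_timeout_count = 0;

if (stmt_timeout_occurred)
{
    backend_statement_timeout_count++;  /* hypothetical metric */
    LockErrorCleanup();
    ereport(ERROR,
            (errcode(ERRCODE_QUERY_CANCELED),
             errmsg("canceling statement due to statement timeout")));
}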
Patterns to steal for your own servers
Reading postgres.c as a story rather than a several-thousand-line C file surfaces a set of reusable patterns. They apply whether you’re building a database, a gRPC service, or a custom control plane.
1. Make the request pipeline explicit
- Expose functions like parse, analyze, plan, and execute as separate steps, even if they’re always called together today (see the sketch after this list).
- Document invariants for each step (for example, “planner requires an active snapshot”).
- In long-lived processes, scope memory per stage (PostgreSQL’s MessageContext and per-statement contexts are a strong reference).
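A minimal sketch of that shape, with invented names; none of this is PostgreSQL API:

/* Hypothetical sketch: explicit stages with documented invariants. */
typedef struct Ast Ast;         /* parsed request; built without catalog access */
typedef struct Plan Plan;       /* planned request; built under a snapshot */

static Ast  *parse_text(const char *text);  /* pure: safe even mid-error */
static Plan *analyze_and_plan(Ast *ast);    /* invariant: snapshot is active */
static void  execute_plan(Plan *plan);      /* owns executor-side resources */
static void  push_snapshot(void);
static void  pop_snapshot(void);

static void
handle_query(const char *text)
{
    Ast        *ast = parse_text(text);
    Plan       *plan;

    push_snapshot();            /* the invariant is visible at the call site */
    plan = analyze_and_plan(ast);
    pop_snapshot();

    execute_plan(plan);
}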
2. Centralize protocol dispatch
- Have a single place where you decode and validate incoming messages (e.g., a SocketBackend-style read loop plus a dispatch switch).
- Fail fast on invalid types or sizes with clear, fatal errors; a hard disconnect is better than a desynchronized protocol.
- Keep the main loop readable by extracting a focused dispatcher (for example, a handle_client_message-style helper) rather than expanding the switch indefinitely.
3. Treat interrupts and timeouts as first-class inputs
- Keep signal handlers minimal; let them set flags.
- Route all handling through one ProcessInterrupts-style function that encodes precedence and context rules.
- Express timeout precedence as code (lock vs statement timeouts, idle vs transaction limits), not as folklore in comments.
4. Encapsulate policy: logging, privacy, behavior
- Implement small helpers like check_log_statement and check_log_duration to decide what to log and when.
- Use configuration-driven guards (e.g., log_parameter_max_length and similar) to prevent logs from leaking entire payloads or PII.
- Let those helpers be the only place that knows about sampling rates and thresholds.
5. Accept some centralization, but fight monolith bloat
postgres.c shows both good patterns and inevitable trade-offs:
- The monolithic PostgresMain switch and intertwined behaviors increase regression risk when adding new message types.
- Global session flags like xact_started, DoingCommandRead, doing_extended_query_message, and ignore_till_sync create implicit coupling between distant code paths.
- Protocol handling, interrupts, command-line parsing, and some GUC plumbing all live in the same file.
The suggested refactors in the upstream discussions (extracting a dedicated message dispatcher, encapsulating session state in a struct, and factoring timeout logic into helpers) are ways to keep the traffic cop’s role clear while shrinking its blast radius. A sketch of the session-state idea:
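/* Hypothetical sketch: the scattered globals become one explicit struct. */
typedef struct SessionState
{
    bool        xact_started;                   /* top-level xact command open? */
    bool        doing_command_read;             /* blocked reading a message? */
    bool        doing_extended_query_message;   /* mid extended-protocol exchange? */
    bool        ignore_till_sync;               /* after error: skip until Sync */
} SessionState;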
Bringing it back to your code
If you’re designing or refactoring a server today, you can apply these ideas immediately:
- Draw your ASCII call graph. Sketch how requests flow through your process, including where you read from the network and where you decide on timeouts and logging.
- Introduce a single interrupt handler. Collect cancellation, timeouts, and shutdown into a ProcessInterrupts-like function, and call it from safe points in your pipeline.
- Split your main loop by responsibility. Separate read_message, dispatch_message, and run_pipeline, and give each a narrow, testable contract (sketched below).
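A hypothetical skeleton of that split; all names are invented, and SessionState is the struct sketched earlier:

/* Hypothetical sketch of a main loop split by responsibility. */
typedef struct Conn Conn;               /* opaque connection handle */

typedef struct Message
{
    char        type;                   /* first byte: message kind */
    int         len;                    /* validated payload length */
    const char *payload;
} Message;

static bool read_message(Conn *conn, Message *msg);    /* framing + validation */
static void check_interrupts(SessionState *session);   /* safe-point gate */
static void dispatch_message(SessionState *session, const Message *msg);

static void
serve_connection(Conn *conn, SessionState *session)
{
    Message     msg;

    for (;;)
    {
        if (!read_message(conn, &msg))  /* EOF or unrecoverable protocol error */
            break;

        check_interrupts(session);      /* act on pending cancels and timeouts */

        dispatch_message(session, &msg);    /* type switch; query messages run
                                             * the pipeline */
    }
}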
The primary lesson from PostgreSQL’s traffic cop is simple: robust servers make their control loops and state transitions explicit. postgres.c keeps the protocol honest, transactions well-scoped, and interrupts under control by treating message handling, timeouts, and logging as first-class parts of the design—not afterthoughts.
If we bring that same discipline into our own services, we end up with systems that are not just fast, but also trustworthy when the intersection gets busy.