We’re examining how NGINX boots itself into a multi‑process, zero‑downtime engine. NGINX is a high‑performance reverse proxy and web server used to terminate and route enormous amounts of traffic. At the center of its startup path is src/core/nginx.c, the file that owns main(), wires configuration into a process model, and quietly enables hot upgrades and CPU‑aware scaling. I'm Mahmoud Zalt, an AI solutions architect, and we’ll use this file to uncover a single lesson: treat startup as a first‑class, carefully designed system, not just glue before the “real” work.
We’ll build a mental model of this bootstrap layer, see how it implements zero‑downtime binary upgrades, how it turns a few core directives into a scalable worker model, and then translate those patterns into concrete practices for our own services.
The Stage Crew Behind NGINX
The src/core/nginx.c file is not where HTTP requests are handled; it’s the stage crew. It builds the set (configuration), arranges the props (environment, sockets, pid/lock files), invites guest performers (dynamic modules), then opens the curtain and lets other subsystems run the show.
nginx/
├── src/
│   ├── core/
│   │   ├── nginx.c          # this file: main() and core module
│   │   ├── ngx_cycle.c      # cycle creation and management
│   │   ├── ngx_log.c        # logging subsystem
│   │   ├── ngx_conf_file.c  # configuration parser
│   │   ├── ngx_os.c         # OS-specific initialization
│   │   └── ...
│   ├── http/
│   │   ├── ngx_http.c       # HTTP module entry
│   │   └── ...
│   ├── stream/
│   │   └── ...
│   └── mail/
│       └── ...
└── objs/
    └── nginx                # built binary invoking main()
nginx.c sits at the top of the core layer, orchestrating everything else.

At the center is main(). It:

- Parses CLI flags (-t, -s, -p, -g, -T, etc.).
- Initializes OS and core subsystems (errors, time, regex, SSL, CRC, slab sizes).
- Creates an initial ngx_cycle_t (the runtime configuration “universe”).
- Loads modules and parses configuration into that cycle.
- Chooses a process model (single vs. master/worker) and daemonizes if needed.
- Finally hands control to ngx_master_process_cycle() or ngx_single_process_cycle().
This startup path is also where NGINX wires in two advanced operational capabilities we often take for granted: hot upgrades with zero downtime and CPU‑aware scaling. Both are expressed as ordinary configuration and environment handling, not as special‑case hacks.
The entry point to that configuration is the core module’s directive table, which works like a programmable control panel for startup behavior:
static ngx_command_t  ngx_core_commands[] = {

    { ngx_string("daemon"),
      NGX_MAIN_CONF|NGX_DIRECT_CONF|NGX_CONF_FLAG,
      ngx_conf_set_flag_slot,
      0,
      offsetof(ngx_core_conf_t, daemon),
      NULL },

    { ngx_string("master_process"),
      NGX_MAIN_CONF|NGX_DIRECT_CONF|NGX_CONF_FLAG,
      ngx_conf_set_flag_slot,
      0,
      offsetof(ngx_core_conf_t, master),
      NULL },

    { ngx_string("worker_processes"),
      NGX_MAIN_CONF|NGX_DIRECT_CONF|NGX_CONF_TAKE1,
      ngx_set_worker_processes,
      0,
      0,
      NULL },

    ...
};
Each entry ties a directive name (like worker_processes) to:

- A scope (NGX_MAIN_CONF|NGX_DIRECT_CONF means main‑level or -g on the CLI).
- A parser (ngx_conf_set_flag_slot, ngx_conf_set_str_slot, or a custom handler).
- An offset into ngx_core_conf_t, the central configuration struct.
The rest of NGINX reaches that struct through:
ngx_core_conf_t *ccf = ngx_get_conf(cycle->conf_ctx, ngx_core_module);
and then obeys whatever it says about daemonization, master mode, worker count, pid file paths, and CPU affinity. Startup becomes data‑driven and extensible instead of being hard‑coded branches in main().
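The table-driven pattern is easy to reuse outside NGINX. Below is a minimal, hypothetical C sketch (core_conf_t, command_t, and apply_directive() are illustrative names, not NGINX API) showing how a directive table with offsetof-based slots turns parsing into data rather than hard-coded branches:

```c
#include <stddef.h>
#include <string.h>
#include <stdlib.h>

/* Hypothetical stand-in for ngx_core_conf_t. */
typedef struct {
    int  daemon;
    int  master;
    long worker_processes;
} core_conf_t;

/* One table entry: directive name, parser callback, field offset. */
typedef struct {
    const char *name;
    int       (*set)(core_conf_t *conf, size_t offset, const char *value);
    size_t      offset;
} command_t;

/* Flag parser: writes 1/0 through the offset, like ngx_conf_set_flag_slot. */
static int set_flag(core_conf_t *conf, size_t offset, const char *value)
{
    int *field = (int *) ((char *) conf + offset);

    if (strcmp(value, "on") == 0)  { *field = 1; return 0; }
    if (strcmp(value, "off") == 0) { *field = 0; return 0; }
    return -1;
}

/* Custom parser, analogous to ngx_set_worker_processes(). */
static int set_workers(core_conf_t *conf, size_t offset, const char *value)
{
    (void) offset;
    conf->worker_processes = strtol(value, NULL, 10);
    return conf->worker_processes > 0 ? 0 : -1;
}

static const command_t commands[] = {
    { "daemon",           set_flag,    offsetof(core_conf_t, daemon) },
    { "master_process",   set_flag,    offsetof(core_conf_t, master) },
    { "worker_processes", set_workers, 0                             },
};

/* Dispatch one "name value" directive through the table. */
static int apply_directive(core_conf_t *conf, const char *name, const char *value)
{
    for (size_t i = 0; i < sizeof(commands) / sizeof(commands[0]); i++) {
        if (strcmp(commands[i].name, name) == 0) {
            return commands[i].set(conf, commands[i].offset, value);
        }
    }
    return -1;  /* unknown directive */
}
```

Adding a new directive is then one new table row, which is exactly why NGINX’s startup stays extensible without touching main().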
How NGINX Swaps Binaries Without Dropping Connections
The most impressive trick implemented in nginx.c is hot upgrading the NGINX binary without downtime. The idea is to treat listening sockets as precious shared state, pass them from the old master to the new one, and coordinate the swap via the environment and pid files.
Inheriting sockets in the new binary
When a new NGINX binary starts during an upgrade, it doesn’t open fresh listening sockets. It reads a special environment variable, NGINX_VAR, which encodes file descriptors from the old master, and rehydrates its cycle->listening array from that string:
static ngx_int_t
ngx_add_inherited_sockets(ngx_cycle_t *cycle)
{
    u_char           *p, *v, *inherited;
    ngx_int_t         s;
    ngx_listening_t  *ls;

    inherited = (u_char *) getenv(NGINX_VAR);

    if (inherited == NULL) {
        return NGX_OK;
    }

    ngx_log_error(NGX_LOG_NOTICE, cycle->log, 0,
                  "using inherited sockets from \"%s\"", inherited);

    if (ngx_array_init(&cycle->listening, cycle->pool, 10,
                       sizeof(ngx_listening_t))
        != NGX_OK)
    {
        return NGX_ERROR;
    }

    for (p = inherited, v = p; *p; p++) {
        if (*p == ':' || *p == ';') {
            s = ngx_atoi(v, p - v);
            if (s == NGX_ERROR) {
                ngx_log_error(NGX_LOG_EMERG, cycle->log, 0,
                              "invalid socket number \"%s\" in " NGINX_VAR,
                              v);
                break;
            }

            v = p + 1;

            ls = ngx_array_push(&cycle->listening);
            if (ls == NULL) {
                return NGX_ERROR;
            }

            ngx_memzero(ls, sizeof(ngx_listening_t));

            ls->fd = (ngx_socket_t) s;
            ls->inherited = 1;
        }
    }

    ...
}
It validates each file descriptor, logs EMERG on malformed data, and populates the same cycle->listening structure that a cold start would. Every later subsystem works against that abstraction and doesn’t care whether sockets were created or inherited.
By converging cold start and hot upgrade on the same cycle->listening representation, NGINX keeps upgrade complexity localized to startup instead of sprinkling special‑case checks across the codebase.
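The parse loop is simple enough to sketch in isolation. This hypothetical helper (parse_inherited_fds() is not NGINX API) walks a "fd1;fd2;..." string the same way the loop above does, rejecting malformed entries:

```c
#include <stdlib.h>

/* Collect socket numbers from a delimiter-separated list such as
 * "3;4;5;", the shape of the NGINX_VAR value. Returns the number of
 * fds parsed, or -1 on a malformed entry. Hypothetical sketch, not
 * NGINX API; strtol stands in for ngx_atoi. */
static int parse_inherited_fds(const char *inherited, int *fds, int max)
{
    const char *p, *v;
    int n = 0;

    for (p = inherited, v = p; *p; p++) {
        if (*p == ':' || *p == ';') {
            char *end;
            long s = strtol(v, &end, 10);

            if (end != v + (p - v) || end != p || s < 0) {
                return -1;          /* malformed entry: bail out */
            }

            if (n < max) {
                fds[n++] = (int) s; /* would become ls->fd, inherited = 1 */
            }
            v = p + 1;
        }
    }
    return n;
}
```

As in NGINX, anything that is not a clean non-negative integer between delimiters is treated as corruption rather than silently skipped.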
Preparing the environment in the old master
On the other side, the old master has to construct that NGINX_VAR value and execute the new binary. That’s handled by ngx_exec_new_binary():
ngx_pid_t
ngx_exec_new_binary(ngx_cycle_t *cycle, char *const *argv)
{
    char             **env, *var;
    u_char            *p;
    ngx_uint_t         i, n;
    ngx_pid_t          pid;
    ngx_exec_ctx_t     ctx;
    ngx_core_conf_t   *ccf;
    ngx_listening_t   *ls;

    ngx_memzero(&ctx, sizeof(ngx_exec_ctx_t));

    ctx.path = argv[0];
    ctx.name = "new binary process";
    ctx.argv = argv;

    n = 2;
    env = ngx_set_environment(cycle, &n);
    if (env == NULL) {
        return NGX_INVALID_PID;
    }

    var = ngx_alloc(sizeof(NGINX_VAR)
                    + cycle->listening.nelts * (NGX_INT32_LEN + 1) + 2,
                    cycle->log);
    if (var == NULL) {
        ngx_free(env);
        return NGX_INVALID_PID;
    }

    p = ngx_cpymem(var, NGINX_VAR "=", sizeof(NGINX_VAR));

    ls = cycle->listening.elts;
    for (i = 0; i < cycle->listening.nelts; i++) {
        if (ls[i].ignore) {
            continue;
        }
        p = ngx_sprintf(p, "%ud;", ls[i].fd);
    }

    *p = '\0';

    env[n++] = var;

    ctx.envp = (char *const *) env;

    ccf = (ngx_core_conf_t *) ngx_get_conf(cycle->conf_ctx, ngx_core_module);

    if (ngx_rename_file(ccf->pid.data, ccf->oldpid.data) == NGX_FILE_ERROR) {
        ...
        return NGX_INVALID_PID;
    }

    pid = ngx_execute(cycle, &ctx);

    if (pid == NGX_INVALID_PID) {
        (void) ngx_rename_file(ccf->oldpid.data, ccf->pid.data);
    }

    ngx_free(env);
    ngx_free(var);

    return pid;
}
ngx_exec_new_binary() builds NGINX_VAR, swaps pid files, and execs the new binary. The sequence is deliberate and reversible:

- Build a base environment via ngx_set_environment().
- Append NGINX_VAR=fd1;fd2;...; for all non‑ignored listening sockets.
- Rename the pid file to the “old” pid before exec, so tooling can distinguish old vs. new master.
- Execute the new binary; if that fails, restore the original pid filename.
This is “swap the engine of a moving ship” done in a small, testable surface area: sockets and pid files are the only shared contracts, and both are handled with validation, logging, and a rollback path.
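The serialization step can be sketched on its own. This hypothetical build_fd_env() helper (not NGINX API) mirrors the ngx_sprintf loop above, producing the "NAME=fd1;fd2;...;" string that gets handed to the new binary:

```c
#include <stdio.h>
#include <string.h>

/* Serialize listening fds into a "NAME=fd1;fd2;...;" environment string,
 * mirroring the loop in ngx_exec_new_binary(). Returns 0 on success,
 * -1 if the buffer is too small. Hypothetical sketch, not NGINX API. */
static int build_fd_env(const char *name, const int *fds, int nfds,
                        char *buf, size_t len)
{
    size_t off = (size_t) snprintf(buf, len, "%s=", name);

    for (int i = 0; i < nfds; i++) {
        if (off >= len) {
            return -1;              /* ran out of room */
        }
        off += (size_t) snprintf(buf + off, len - off, "%d;", fds[i]);
    }

    return off < len ? 0 : -1;
}
```

Like NGINX, the sketch sizes failure explicitly rather than truncating: a half-written fd list would be worse than no upgrade at all.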
Why stuff FDs into an environment variable?
Passing file descriptors via environment looks odd, but it builds on the existing exec model: the new process already inherits open FDs. The only missing piece is a way to identify which ones are listening sockets. An environment variable is portable, inspectable, and doesn’t require a separate coordination channel or long‑lived helper process.
Observability for a rare, critical path
The hot‑upgrade path is rarely executed but high impact. The analysis suggests metrics such as:
- nginx_hot_upgrade_attempts_total – how often ngx_exec_new_binary() runs.
- nginx_inherited_sockets_count – how many sockets the new binary parsed from NGINX_VAR.
These aren’t performance metrics; they’re safety signals. If upgrades start inheriting zero sockets or failing to exec, you want alerts before users hit downtime.
Scaling Out: Workers and CPU Affinity
Hot upgrades keep NGINX continuous; worker processes and CPU affinity determine how much load it can sustain. Both are set up entirely at startup through core directives and a few helper functions.
Choosing worker count
The worker_processes directive is parsed by ngx_set_worker_processes(). It supports an auto mode that maps directly to CPU cores:
static char *
ngx_set_worker_processes(ngx_conf_t *cf, ngx_command_t *cmd, void *conf)
{
    ngx_str_t        *value;
    ngx_core_conf_t  *ccf;

    ccf = (ngx_core_conf_t *) conf;

    if (ccf->worker_processes != NGX_CONF_UNSET) {
        return "is duplicate";
    }

    value = cf->args->elts;

    if (ngx_strcmp(value[1].data, "auto") == 0) {
        ccf->worker_processes = ngx_ncpu;
        return NGX_CONF_OK;
    }

    ccf->worker_processes = ngx_atoi(value[1].data, value[1].len);

    if (ccf->worker_processes == NGX_ERROR) {
        return "invalid value";
    }

    return NGX_CONF_OK;
}
Auto‑scaling here is intentionally simple: one worker per core using ngx_ncpu. There’s no runtime feedback loop, just a clear rule applied once at startup.
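The same rule can be sketched with portable primitives. In this hypothetical resolve_workers() (not NGINX API), sysconf(_SC_NPROCESSORS_ONLN) stands in for ngx_ncpu, which NGINX fills in during OS initialization:

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Map a worker_processes-style value to a worker count: "auto" means
 * one worker per online CPU; otherwise parse the literal number.
 * Returns -1 on an invalid value, echoing "invalid value" above.
 * Hypothetical sketch, not NGINX API. */
static long resolve_workers(const char *value)
{
    if (strcmp(value, "auto") == 0) {
        long ncpu = sysconf(_SC_NPROCESSORS_ONLN);   /* stands in for ngx_ncpu */
        return ncpu > 0 ? ncpu : 1;
    }

    char *end;
    long n = strtol(value, &end, 10);

    if (end == value || *end != '\0' || n < 0) {
        return -1;                                   /* "invalid value" */
    }
    return n;
}
```

The decision happens exactly once, at parse time; nothing re-evaluates it while the server runs.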
Pinning workers to CPUs
On platforms that support CPU affinity, the worker_cpu_affinity directive lets operators specify exact masks or ask NGINX to derive them automatically. The parser:
- Accepts auto with at most one extra mask argument.
- Enforces CPU_SETSIZE as an upper bound on addressable CPUs.
- Validates that masks contain only 0, 1, and spaces.
Later, ngx_core_module_init_conf() compares the number of masks to worker_processes and, if they differ, logs a warning and falls back gracefully:
if (!ccf->cpu_affinity_auto
    && ccf->cpu_affinity_n
    && ccf->cpu_affinity_n != 1
    && ccf->cpu_affinity_n != (ngx_uint_t) ccf->worker_processes)
{
    ngx_log_error(NGX_LOG_WARN, cycle->log, 0,
                  "the number of \"worker_processes\" is not equal to "
                  "the number of \"worker_cpu_affinity\" masks, "
                  "using last mask for remaining worker processes");
}
Hard syntax errors (invalid masks) abort startup; minor semantic mismatches are tolerated with a clear WARN and a predictable default (reuse the last mask).
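The fallback rule itself is tiny. This hypothetical mask_index_for_worker() (an illustrative helper, not NGINX API) captures the "own mask if configured, otherwise reuse the last one" behavior described above:

```c
/* Given worker index `worker` and `nmasks` configured affinity masks,
 * return which mask index that worker should use: its own if one
 * exists, otherwise the last configured mask. Hypothetical sketch of
 * the fallback logic in ngx_get_cpu_affinity(). */
static unsigned mask_index_for_worker(unsigned worker, unsigned nmasks)
{
    if (nmasks == 0) {
        return 0;                         /* no affinity configured */
    }
    return worker < nmasks ? worker : nmasks - 1;
}
```

The predictability matters: given the WARN above, an operator can compute exactly which CPU set every worker will land on.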
Serving a mask to each worker
When the master forks workers, it asks ngx_get_cpu_affinity() which mask to apply for worker n:
ngx_cpuset_t *
ngx_get_cpu_affinity(ngx_uint_t n)
{
#if (NGX_HAVE_CPU_AFFINITY)
    ngx_uint_t        i, j;
    ngx_cpuset_t     *mask;
    ngx_core_conf_t  *ccf;

    static ngx_cpuset_t  result;

    ccf = (ngx_core_conf_t *) ngx_get_conf(ngx_cycle->conf_ctx,
                                           ngx_core_module);

    if (ccf->cpu_affinity == NULL) {
        return NULL;
    }

    if (ccf->cpu_affinity_auto) {
        mask = &ccf->cpu_affinity[ccf->cpu_affinity_n - 1];

        for (i = 0, j = n; /* void */ ; i++) {

            if (CPU_ISSET(i % CPU_SETSIZE, mask) && j-- == 0) {
                break;
            }

            if (i == CPU_SETSIZE && j == n) {
                /* empty mask */
                return NULL;
            }
        }

        CPU_ZERO(&result);
        CPU_SET(i % CPU_SETSIZE, &result);

        return &result;
    }

    if (ccf->cpu_affinity_n > n) {
        return &ccf->cpu_affinity[n];
    }

    return &ccf->cpu_affinity[ccf->cpu_affinity_n - 1];

#else

    return NULL;

#endif
}
For auto, it walks the base mask and assigns one CPU per worker in order. For explicit masks, it returns the nth mask or the last one as a fallback.
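The wraparound walk is easier to see on a plain bitmask. This portable sketch (nth_cpu_in_mask() is a hypothetical helper; a 64-bit word stands in for ngx_cpuset_t) picks the CPU for worker n the same way the auto branch does:

```c
#include <stdint.h>

/* Treat a 64-bit word as the base CPU mask and return the CPU index
 * worker n should be pinned to, wrapping around the set bits the way
 * the modulo walk in ngx_get_cpu_affinity() does. Returns -1 for an
 * empty mask. Hypothetical sketch, not NGINX API. */
static int nth_cpu_in_mask(uint64_t mask, unsigned n)
{
    int count = 0;
    for (int cpu = 0; cpu < 64; cpu++) {
        count += (int) ((mask >> cpu) & 1u);
    }
    if (count == 0) {
        return -1;                            /* empty mask */
    }

    unsigned target = n % (unsigned) count;   /* wraparound over set bits */
    for (int cpu = 0; cpu < 64; cpu++) {
        if (((mask >> cpu) & 1u) && target-- == 0) {
            return cpu;                       /* pin worker n here */
        }
    }
    return -1;                                /* unreachable */
}
```

With CPUs 1 and 3 allowed, worker 0 gets CPU 1, worker 1 gets CPU 3, and worker 2 wraps back to CPU 1, matching the distribution the auto mode produces.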
There is a deliberate trade‑off here: result is a static mutable buffer, which makes this helper non‑reentrant and awkward in a multithreaded world. The analysis calls this out as a code smell and suggests a future API that writes into a caller‑provided buffer instead.
Startup as an Operational Contract
NGINX’s bootstrap code doesn’t just wire processes; it defines how operators and tooling interact with the server day‑to‑day. The CLI, environment handling, and pid/lock file management together form an operational API.
CLI as a façade over startup modes
ngx_get_options() parses CLI flags into a small set of globals like ngx_test_config, ngx_dump_config, ngx_quiet_mode, and ngx_signal. main() then branches early based on those values:
| Scenario | Key flags | What main() actually does |
|---|---|---|
| Config test | -t / -T | Initialize a cycle, parse config, log success/failure, optionally dump config, then exit. |
| Signal existing master | -s stop\|quit\|reopen\|reload | Call ngx_signal_process() against the pid file, then exit; no new master/worker cycle starts. |
| Normal start | no -t, no -s | Initialize cycle, create pid/lock files, daemonize if configured, then enter master or single‑process cycle. |
Config tests stay side‑effect‑free with respect to pid files and workers, which makes them safe in CI, deployment scripts, and orchestrators. Signals are handled as a separate control path that doesn’t interleave with full initialization.
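Condensed to its essentials, that early branching is a three-way dispatch. In this hypothetical sketch (startup_mode() and the enum are illustrative names, not NGINX API), config tests are checked before signals, as in main(), and both exit before any pid or lock files are created:

```c
#include <stddef.h>

/* The three mutually exclusive startup modes selected from CLI flags. */
typedef enum { MODE_TEST, MODE_SIGNAL, MODE_SERVE } startup_mode_t;

/* test_config models ngx_test_config (-t/-T); signal_name models
 * ngx_signal (-s). Hypothetical dispatcher, not NGINX API. */
static startup_mode_t startup_mode(int test_config, const char *signal_name)
{
    if (test_config) {
        return MODE_TEST;     /* parse config, report, exit */
    }
    if (signal_name != NULL) {
        return MODE_SIGNAL;   /* signal the master via pid file, exit */
    }
    return MODE_SERVE;        /* enter master or single-process cycle */
}
```

Keeping the dispatch this explicit is what makes the test and signal paths safe to run from CI and deployment scripts.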
Environment as an explicit resource
ngx_set_environment() treats the process environment as something to own explicitly, not a global afterthought. It:
- Ensures TZ is present, adding it if needed.
- Honors env directives from config by copying named variables from the OS environment.
- Registers cleanup handlers for the environment array and any allocated variable strings.
- On exit, deliberately leaks a few bytes if environment strings might still be referenced, preferring safety over aggressive freeing.
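A minimal sketch of the "own your environment" idea, using only standard getenv/setenv. Note that ensure_tz() is hypothetical and the "UTC" fallback is this sketch's choice, not NGINX behavior (NGINX carries TZ over rather than inventing a value):

```c
#include <stdlib.h>

/* Make sure TZ is defined before forking children, instead of hoping
 * the parent environment happened to contain it. Hypothetical helper
 * illustrating the idea behind ngx_set_environment(), not NGINX API. */
static const char *ensure_tz(void)
{
    const char *tz = getenv("TZ");

    if (tz == NULL) {
        setenv("TZ", "UTC", 0);   /* add only if missing; never overwrite */
        tz = getenv("TZ");
    }
    return tz;
}
```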
Control‑plane health, not just data‑plane metrics
The analysis highlights several high‑leverage metrics you can derive from this startup layer:
- nginx_master_startup_duration_seconds – time from process start to entering the master/single cycle.
- nginx_config_reload_duration_seconds – time spent in ngx_init_cycle() when reloading.
- nginx_dynamic_module_load_failures_total – EMERG‑level failures from ngx_load_module().
These are control‑plane metrics: they describe the health of configuration parsing, dynamic module loading, and process orchestration. When they regress, the root cause is almost always at the boundaries this file manages—filesystems, ABI changes, configuration drift—rather than inside request handlers.
Design Patterns to Reuse
Stepping back from the C details, nginx.c offers a blueprint for designing startup as part of the architecture of any serious service.
- Treat startup as a designed system, not a dump of initialization calls. NGINX’s main() still lives in a single function, but conceptually it’s phased: parse options, build a core config object, initialize OS‑level subsystems, then choose a process model and enter the appropriate cycle. In your own services, make those phases explicit, ideally as separate functions or modules, and be clear about what side effects each phase is allowed to have.
- Centralize configuration in a typed core struct. The combination of ngx_core_conf_t and ngx_core_commands[] means new directives are added in one place and surfaced through a single accessor (ngx_get_conf()). If you find your startup scattered across many globals and ad‑hoc flags, introduce a core StartupConfig (or similar) and a small, declarative way of populating it.
- Design hot upgrade and reload as first‑class flows. NGINX’s zero‑downtime upgrade path (ngx_exec_new_binary() ↔ ngx_add_inherited_sockets()) is localized, reversible, and observable. If you need “restart without downtime,” give that path a clear contract: what state is handed off, how failures are detected, and how to roll back. Don’t hide it as a side effect of “restart” scripts.
- Treat OS resources as contracts with your ecosystem. Pid files, lock files, environment variables, and CPU affinity aren’t just implementation details; they’re how systemd units, Kubernetes, and shell scripts coordinate with your process. Validate them, log clearly when they change or fail, and avoid surprising behavior across reloads (for example, silently changing pid paths).
- Avoid hidden shared state in helpers. Helpers like ngx_get_cpu_affinity() that return static buffers couple callers to hidden lifetime rules. In higher‑level languages it’s usually trivial to pass output buffers or return immutable values; doing so will make your startup and orchestration code much easier to reason about and to parallelize later.
The primary lesson from NGINX’s bootstrap layer is simple but easy to ignore: startup is part of your system’s architecture. In nginx.c, that architecture is what turns a single binary into a robust, upgradeable, multi‑process engine. If we adopt the same mindset—treating initialization, upgrades, and process orchestration as first‑class concerns—we can make our own services far more predictable under change, not just under load.



