<article>
  <header>
    <h1>Bootstrapping curl’s CLI Safely</h1>
    <p>The tiniest part of a tool can decide its reliability. In curl’s case, that’s the entry point: a small file that sets the stage for everything the tool will do. I’m Mahmoud Zalt, and in this article I’ll walk you through the practical engineering behind curl’s bootstrap layer.</p>
    <p>We’ll examine <a href="https://github.com/curl/curl/blob/master/src/tool_main.c" target="_blank" rel="noopener">src/tool_main.c</a> from the <a href="https://github.com/curl/curl" target="_blank" rel="noopener">curl project</a>—the command-line tool built on top of libcurl. This file orchestrates OS-specific initialization, file descriptor hygiene, signal handling, and debug toggles before delegating the real work to <code>operate()</code>. Expect concrete takeaways on maintainability, extensibility, usability/DX, and reliability at scale.</p>
    <p>Roadmap: How It Works → What’s Brilliant → Areas for Improvement → Performance at Scale → Conclusion.</p>
  </header>

  <nav aria-label="Mini table of contents">
    <ul>
      <li><a href="#how-it-works">How It Works</a></li>
      <li><a href="#whats-brilliant">What’s Brilliant</a></li>
      <li><a href="#areas-for-improvement">Areas for Improvement</a></li>
      <li><a href="#performance-at-scale">Performance at Scale</a></li>
      <li><a href="#conclusion">Conclusion</a></li>
    </ul>
  </nav>

  <section id="how-it-works">
    <h2>How It Works</h2>
    <p>Before we can improve anything, we have to understand the flow. The entry file is a classic bootstrap—sometimes called a <dfn>composition root</dfn>—that wires together early process concerns and then hands off to the tool’s core logic.</p>

    <figure>
      <pre>curl (project root)
├─ lib/            [libcurl]
├─ src/
│  ├─ tool_operate.c   &lt;-- main delegates here
│  ├─ tool_cfgable.c
│  ├─ tool_main.c      &lt;-- this file (entry/boot)
│  ├─ tool_msgs.c
│  └─ ...
└─ docs/

Call graph (simplified):

[OS loader] -> main/wmain
  -> tool_init_stderr
  -> (Windows) GetLoadedModulePaths [when --dump-module-paths]
  -> win32_init (Windows)
  -> main_checkfds
  -> signal(SIGPIPE, SIG_IGN)
  -> memory_tracking_init
  -> globalconf_init -> operate -> globalconf_free
  -> (Windows) fflush(NULL)
  -> return/vms_special_exit</pre>
      <figcaption>Bootstrap and call graph for src/tool_main.c — the entry point for curl’s CLI tool.</figcaption>
    </figure>

    <p>In plain terms, here’s what the file does:</p>
    <ul>
      <li>Sets up stderr routing early via <code>tool_init_stderr()</code>.</li>
      <li>Handles Windows-specific initialization and a hidden diagnostic switch <code>--dump-module-paths</code> (prints loaded module paths).</li>
      <li>Ensures standard file descriptors are valid before any sockets are opened (<code>main_checkfds()</code>).</li>
      <li>Ignores <code>SIGPIPE</code> on POSIX so writes to broken pipes don’t kill the process.</li>
      <li>Optionally enables memory tracking during development builds using <code>CURL_MEMDEBUG</code> and <code>CURL_MEMLIMIT</code>.</li>
      <li>Initializes global config, calls <code>operate(argc, argv)</code>, cleans up, then exits with a mapped <code>CURLcode</code>.</li>
    </ul>

    <p>There are a few essential invariants maintained along the way:</p>
    <ul>
      <li>No tool/libcurl operations happen before <code>globalconf_init()</code>.</li>
      <li><code>operate()</code> only runs if initialization succeeds.</li>
      <li>File descriptors 0, 1, 2 are made safe before network activity begins.</li>
      <li>SIGPIPE is ignored globally to prefer error handling over abrupt termination.</li>
    </ul>

    <aside class="callout">
      <p>Windows has two entry points here: <code>main</code> and <code>wmain</code>. <code>wmain</code> handles Unicode argv on Windows; otherwise the logic is equivalent.</p>
    </aside>

    <p>Two helper routines carry a lot of practical weight: <code>main_checkfds()</code> and <code>memory_tracking_init()</code>. They’re small, but their behavior shapes reliability and developer experience.</p>

    <h3>File-descriptor hygiene</h3>
    <p>First, here’s the verbatim code curl uses to ensure the standard file descriptors exist. This matters because if stdin/stdout/stderr are closed, the first sockets created by curl could accidentally become those descriptors.</p>

    <figure>
      <figcaption>FD hygiene in tool_main.c (lines 44–63). <a href="https://github.com/curl/curl/blob/master/src/tool_main.c#L44-L63" target="_blank" rel="noopener">View on GitHub</a></figcaption>
      <pre class="language-c">static int main_checkfds(void)
{
  int fd[2];
  while((fcntl(STDIN_FILENO, F_GETFD) == -1) ||
        (fcntl(STDOUT_FILENO, F_GETFD) == -1) ||
        (fcntl(STDERR_FILENO, F_GETFD) == -1))
    if(pipe(fd))
      return 1;
  return 0;
}</pre>
    </figure>
    <p class="why">By looping until 0, 1, and 2 are occupied, the process avoids misusing network sockets as stdio. It’s a pragmatic guard against surprising environments.</p>
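    <p>To make the risk concrete, here is a minimal sketch (not curl code) of how the kernel hands out the lowest free descriptor number. The helper name <code>fd_reused_after_close</code> is hypothetical, and <code>/dev/null</code> stands in for the first socket curl would open.</p>

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: temporarily closes stdin, opens a new descriptor, and
   returns the number the kernel assigned to it (-1 on error). The original
   stdin, if there was one, is restored before returning. */
int fd_reused_after_close(void)
{
  int saved = dup(STDIN_FILENO); /* -1 when stdin is already closed */
  int got;

  if(saved >= 0)
    close(STDIN_FILENO);         /* simulate a parent that closed fd 0 */

  /* open() returns the lowest unused number, which is now 0 */
  got = open("/dev/null", O_RDONLY);
  if(got >= 0)
    close(got);

  if(saved >= 0) {
    dup2(saved, STDIN_FILENO);   /* put the real stdin back */
    close(saved);
  }
  return got;
}
```

    <p>On a POSIX system the helper returns 0: the new descriptor took over the stdin slot, which is what would happen to a socket opened in the same situation.</p>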

    <h3>Memory tracking in debug builds</h3>
    <p>When building with <code>CURLDEBUG</code>, the tool reads two environment variables to enable fine-grained memory diagnostics: <code>CURL_MEMDEBUG</code> (filename for logs) and <code>CURL_MEMLIMIT</code> (fail on nth allocation). These are invaluable for troubleshooting allocation problems in CI or local dev.</p>
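    <p>A numeric variable such as <code>CURL_MEMLIMIT</code> is best validated before it is acted on. The following is a minimal sketch of that validation, not curl’s actual implementation; the helper name <code>parse_alloc_limit</code> is hypothetical, and a caller would pass it the result of <code>getenv("CURL_MEMLIMIT")</code>.</p>

```c
#include <stdlib.h>

/* Hypothetical helper: returns the positive count encoded in `text`, or 0 when
   the value is missing, not fully numeric, or not positive. */
long parse_alloc_limit(const char *text)
{
  char *end;
  long num;

  if(!text || !*text)
    return 0;
  num = strtol(text, &end, 10);
  /* accept only input that strtol consumed completely */
  if(*end != '\0' || num <= 0)
    return 0;
  return num;
}
```

    <p>Returning 0 for anything malformed means a typo in the environment leaves the limit disabled instead of injecting failures at an unintended allocation.</p>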

    <details>
      <summary>Why a process-wide SIGPIPE ignore?</summary>
      <p>Ignoring <code>SIGPIPE</code> prevents abrupt termination when the other end of a pipe closes early. That converts a crash into a normal error path (e.g., <code>EPIPE</code>) you can handle gracefully. The trade-off is global: it applies to the entire process and any threads created later. Documenting this near the installation site helps future maintainers reason about write semantics and error handling.</p>
    </details>
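    <p>The effect of the ignore can be observed directly. This is a minimal sketch under the assumption of a POSIX system; the helper name <code>errno_from_broken_pipe_write</code> is hypothetical and appears nowhere in curl.</p>

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: writes one byte into a pipe whose read end is closed,
   with SIGPIPE ignored, and returns the errno the write produced
   (0 if the write unexpectedly succeeded, -1 if no pipe could be created). */
int errno_from_broken_pipe_write(void)
{
  int fd[2];
  int err = 0;
  void (*previous)(int) = signal(SIGPIPE, SIG_IGN);

  if(pipe(fd)) {
    err = -1;
  }
  else {
    close(fd[0]);                /* no reader is left */
    if(write(fd[1], "x", 1) < 0)
      err = errno;               /* an error return instead of a fatal signal */
    close(fd[1]);
  }
  if(previous != SIG_ERR)
    signal(SIGPIPE, previous);   /* restore the earlier disposition */
  return err;
}
```

    <p>With the signal ignored, the write fails with <code>EPIPE</code> and the caller decides what to do next. Without the ignore, the default action would terminate the process before <code>write()</code> returned.</p>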
  </section>

  <section id="whats-brilliant">
    <h2>What’s Brilliant</h2>
    <p>With the flow understood, let’s recognize the design choices that make this file robust and maintainable. These are practices you can lift into your own CLIs.</p>

    <ul>
      <li>Bootstrap done right. The entry point is a thin <em>composition root</em> that wires up process-wide concerns and delegates behavior to <code>operate()</code>. This keeps policy out of the entry layer and makes the tool easier to evolve.</li>
      <li>Platform abstraction via conditional compilation. Windows, VMS, Amiga, and POSIX flows are clearly separated. This isolates complexity and protects maintainability.</li>
      <li>Guarded debug feature flags. Memory tracking features are gated behind <code>CURLDEBUG</code> and enabled by environment variables. This yields powerful diagnostics with negligible runtime cost in production builds.</li>
      <li>FD hygiene prevents hard-to-debug misroutes. Proactively occupying descriptors 0–2 avoids a class of bugs that would only surface under unusual shells or embedding environments.</li>
      <li>Clear invariants. No libcurl usage before init; cleanup always follows a successful init; the process exit code is mapped from a strongly-typed <code>CURLcode</code>.</li>
    </ul>

    <aside class="callout">
      <p>Small but mighty: the hidden Windows diagnostic <code>--dump-module-paths</code> offers quick visibility—handy for support engineers. We’ll discuss how to make it safer and discoverable later.</p>
    </aside>

    <p>As a bootstrap, the file keeps complexity low. Per-function metrics reinforce that point: <code>main_checkfds</code> is 13 SLOC with a cyclomatic complexity of 3; <code>memory_tracking_init</code> is 24 SLOC with a cyclomatic complexity of 4; <code>main</code> is still readable at 70 SLOC. That clarity pays dividends when debugging early failures.</p>
  </section>

  <section id="areas-for-improvement">
    <h2>Areas for Improvement</h2>
    <p>Even great bootstrap code benefits from polish. Here’s a prioritized list of risks and pragmatic fixes grounded in the code.</p>

    <table>
      <thead>
        <tr>
          <th>Smell</th>
          <th>Impact</th>
          <th>Fix</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td>Use of <code>strcpy</code> on env-derived data</td>
          <td>Unsafe copy pattern; increases maintenance risk despite bounds checks.</td>
          <td>Use <code>snprintf</code> with explicit bounds and NUL-termination.</td>
        </tr>
        <tr>
          <td>Securing stdio FDs via anonymous pipes</td>
          <td>Nothing reads from the pipe, so writes to stdout/stderr can block once its buffer fills, or fail outright; behavior diverges from conventional null-device semantics.</td>
          <td>Reopen missing FDs to the platform null device (<code>/dev/null</code> or <code>NUL</code>).</td>
        </tr>
        <tr>
          <td>Global <code>SIGPIPE</code> ignore</td>
          <td>Process-wide effect can mask broken-pipe expectations down the stack.</td>
          <td>Document near the installation site; consider more localized handling in lower layers where possible.</td>
        </tr>
        <tr>
          <td>Hidden Windows diagnostic switch</td>
          <td>Undocumented behavior surprises users; may reveal sensitive path details.</td>
          <td>Document it, guard it behind a build flag, or move it under a clearly prefixed debug flag.</td>
        </tr>
      </tbody>
    </table>

    <h3>Refactor 1: Safer, bounded copy for <code>CURL_MEMDEBUG</code></h3>
    <p>Replace the <code>strcpy</code>-based copy with a bounded <code>snprintf</code> to simplify reasoning and guarantee termination.</p>

    <figure>
      <figcaption>Bounded copy refactor</figcaption>
      <pre class="language-diff">--- a/src/tool_main.c
+++ b/src/tool_main.c
@@
-    char fname[512];
-    if(strlen(env) &gt;= sizeof(fname))
-      env[sizeof(fname)-1] = '\0';
-    strcpy(fname, env);
+    char fname[512];
+    /* Copy with explicit bound and guarantee NUL-termination */
+    snprintf(fname, sizeof(fname), "%s", env);
</pre>
    </figure>
    <p class="why">This change removes an error-prone primitive and expresses the intent clearly: copy the env value into a fixed buffer, safely.</p>
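    <p><code>snprintf</code> also reports how long the full result would have been, so truncation can be detected instead of silently accepted. A minimal sketch, using a hypothetical helper name <code>copy_bounded</code> that does not exist in curl:</p>

```c
#include <stdio.h>

/* Hypothetical helper: copies `src` into `dst` (capacity `size`, which must be
   at least 1) and always NUL-terminates. Returns 1 if the value had to be
   truncated, 0 if it fit, -1 on a formatting error. */
int copy_bounded(char *dst, size_t size, const char *src)
{
  int needed = snprintf(dst, size, "%s", src);
  if(needed < 0)
    return -1;
  /* snprintf returns the length the untruncated output would have had */
  return (size_t)needed >= size ? 1 : 0;
}
```

    <p>A caller could use the return value to warn that an overlong log filename was cut short, which the <code>strcpy</code> version has no way to report.</p>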

    <h3>Refactor 2: Restore stdio using the null device</h3>
    <p>Instead of consuming anonymous pipes to occupy <dfn>FDs</dfn> 0–2, reopen any missing descriptor to the platform’s null device. This aligns behavior with Unix conventions and avoids surprising blocking.</p>

    <figure>
      <figcaption>Replace pipes with <code>/dev/null</code> (or <code>NUL</code> on Windows)</figcaption>
      <pre class="language-diff">--- a/src/tool_main.c
+++ b/src/tool_main.c
@@
-static int main_checkfds(void)
-{
-  int fd[2];
-  while((fcntl(STDIN_FILENO, F_GETFD) == -1) ||
-        (fcntl(STDOUT_FILENO, F_GETFD) == -1) ||
-        (fcntl(STDERR_FILENO, F_GETFD) == -1))
-    if(pipe(fd))
-      return 1;
-  return 0;
-}
+static int main_checkfds(void)
+{
+#ifdef _WIN32
+  const char *nul = "NUL";
+#else
+  const char *nul = "/dev/null";
+#endif
+  if(fcntl(STDIN_FILENO, F_GETFD) == -1) {
+    int n = open(nul, O_RDONLY);
+    if(n &lt; 0) return 1;
+    if(n != STDIN_FILENO) close(n);
+  }
+  if(fcntl(STDOUT_FILENO, F_GETFD) == -1) {
+    int n = open(nul, O_WRONLY);
+    if(n &lt; 0) return 1;
+    if(n != STDOUT_FILENO) close(n);
+  }
+  if(fcntl(STDERR_FILENO, F_GETFD) == -1) {
+    int n = open(nul, O_WRONLY);
+    if(n &lt; 0) return 1;
+    if(n != STDERR_FILENO) close(n);
+  }
+  return 0;
+}
</pre>
    </figure>
    <p class="why">Occupying stdio with the null device prevents deadlocks and respects how other Unix tools behave when stdout/stderr are absent.</p>

    <h3>Refactor 3: Document global <code>SIGPIPE</code> semantics</h3>
    <p>One well-placed comment can save hours of debugging for future contributors.</p>

    <figure>
      <figcaption>Make the global effect explicit</figcaption>
      <pre class="language-diff">--- a/src/tool_main.c
+++ b/src/tool_main.c
@@
-#if defined(HAVE_SIGNAL) &amp;&amp; defined(SIGPIPE)
-  (void)signal(SIGPIPE, SIG_IGN);
-#endif
+#if defined(HAVE_SIGNAL) &amp;&amp; defined(SIGPIPE)
+  /* Global process-level change: avoid termination on broken pipes.
+     Downstream writes must handle EPIPE returns explicitly. */
+  (void)signal(SIGPIPE, SIG_IGN);
+#endif
</pre>
    </figure>
    <p class="why">By stating the trade-off, we set clear expectations for all I/O that follows.</p>

    <aside class="callout">
      <p>On the Windows diagnostic switch, consider surfacing it in <code>--help</code> behind a “debug” section or a <code>--debug-*</code> prefix. That keeps the power while making intent and risks explicit.</p>
    </aside>
  </section>

  <section id="performance-at-scale">
    <h2>Performance at Scale</h2>
    <p>Although the entry point is not CPU-bound, bootstrap quality shows up in reliability and tail behavior. Here’s how to think about it operationally.</p>

    <h3>Hot paths and latency</h3>
    <ul>
      <li><code>operate(argc, argv)</code> dominates runtime (outside this file).</li>
      <li><code>main_checkfds()</code> can become a surprise hot path in environments that start processes with stdio closed.</li>
      <li>Environment parsing (<code>CURL_MEMDEBUG</code>, <code>CURL_MEMLIMIT</code>) is O(n) in the length of short strings, so its latency cost is negligible.</li>
    </ul>

    <h3>Scalability and I/O safety</h3>
    <p>When stdout/stderr are closed, the current pipe-based strategy may block writers with no consumer. Reopening to the null device eliminates that risk and aligns with conventional tooling. If you keep pipes, be sure your write paths handle <code>EPIPE</code> and that logs don’t silently stall.</p>
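    <p>The difference between the two strategies can be measured. The sketch below assumes a POSIX system and uses two hypothetical helpers, <code>pipe_capacity</code> and <code>devnull_accepts</code>; it switches the pipe to non-blocking mode only so the demonstration can observe the point where a blocking writer would stall.</p>

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: fills a pipe that nobody reads and returns how many
   bytes it accepted before a non-blocking write reported "would block"
   (-1 on error). A blocking writer would stall at that point. */
long pipe_capacity(void)
{
  int fd[2];
  char chunk[512] = {0};
  long total = 0;
  int flags;

  if(pipe(fd))
    return -1;
  flags = fcntl(fd[1], F_GETFL);
  if(flags == -1 || fcntl(fd[1], F_SETFL, flags | O_NONBLOCK) == -1)
    total = -1;
  while(total >= 0) {
    ssize_t n = write(fd[1], chunk, sizeof(chunk));
    if(n < 0) {
      if(errno != EAGAIN && errno != EWOULDBLOCK)
        total = -1;  /* unexpected failure */
      break;         /* otherwise the pipe is simply full */
    }
    total += (long)n;
  }
  close(fd[0]);
  close(fd[1]);
  return total;
}

/* Hypothetical helper: returns 1 if /dev/null accepts `bytes` bytes. */
int devnull_accepts(long bytes)
{
  char chunk[512] = {0};
  int ok = 1;
  int fd = open("/dev/null", O_WRONLY);

  if(fd < 0)
    return 0;
  while(ok && bytes > 0) {
    size_t want = bytes < (long)sizeof(chunk) ? (size_t)bytes : sizeof(chunk);
    ssize_t n = write(fd, chunk, want);
    if(n <= 0)
      ok = 0;
    else
      bytes -= (long)n;
  }
  close(fd);
  return ok;
}
```

    <p>The pipe accepts a finite, system-dependent number of bytes and then refuses more, while the null device keeps accepting writes beyond that amount.</p>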

    <h3>Observability suggestions</h3>
    <p>Bootstrap is a perfect place to emit cheap, high-signal measurements. Start with three metrics:</p>
    <ul>
      <li><code>tool.startup.duration_ms</code>: target a p95 <abbr title="Service Level Objective">SLO</abbr> under 10 ms on typical systems.</li>
      <li><code>tool.startup.stderr_fd_open</code>: boolean; verify FD 2 is valid after <code>main_checkfds()</code> returns.</li>
      <li><code>tool.env.memdebug.enabled</code>: track the rate of runs with memory tracking turned on.</li>
    </ul>
    <p>These let you detect regressions (slow startups), environment anomalies (missing stdio), and the blast radius of debug features in production.</p>
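    <p>The first two measurements need very little code. This is a minimal sketch with hypothetical helper names (<code>elapsed_ms</code>, <code>fd_is_open</code>); curl itself emits no such metrics, and how the values would be exported is left open.</p>

```c
#include <fcntl.h>
#include <time.h>

/* Hypothetical helper: milliseconds between two monotonic timestamps. */
double elapsed_ms(struct timespec start, struct timespec end)
{
  return (double)(end.tv_sec - start.tv_sec) * 1000.0 +
         (double)(end.tv_nsec - start.tv_nsec) / 1e6;
}

/* Hypothetical helper: 1 if `fd` refers to an open descriptor, else 0. */
int fd_is_open(int fd)
{
  return fcntl(fd, F_GETFD) != -1;
}
```

    <p>The timestamps would come from <code>clock_gettime(CLOCK_MONOTONIC, ...)</code>, taken once at the top of <code>main</code> and once just before the hand-off to <code>operate()</code>. <code>fd_is_open(2)</code> would supply the boolean for the stderr check.</p>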

    <h3>Testing the bootstrap</h3>
    <p>Entry-point code touches process-wide concerns that are hard to unit test. Favor integration harnesses that sandbox the environment, especially for file descriptors and signals. Here’s a minimal test harness inspired by the plan to verify FD restoration when 0–2 start closed.</p>

    <figure>
      <figcaption>Test harness (illustrative): spawn curl with 0,1,2 closed</figcaption>
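      <pre class="language-c">#include &lt;unistd.h&gt;

int main(void) {
  /* Start from a state where stdin, stdout, and stderr are all closed */
  close(STDIN_FILENO); close(STDOUT_FILENO); close(STDERR_FILENO);
  /* The variadic argument list must end with a null pointer of type char * */
  execlp("curl", "curl", "--version", (char *)NULL);
  return 127; /* reached only if exec failed */
}</pre>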
    </figure>
    <p class="why">When the caller checks the exit status, this validates that <code>main_checkfds()</code> succeeds and the process doesn’t fail with <code>CURLE_FAILED_INIT</code> even when launched without stdio.</p>

    <p>Additional high-value tests:</p>
    <ul>
      <li><strong>Memory tracking enablement:</strong> set <code>CURL_MEMDEBUG</code> to a writable path; assert the log is written and the command still succeeds.</li>
      <li><strong>Allocation-failure injection:</strong> set <code>CURL_MEMLIMIT=10</code> and expect a deterministic failure path in a debug build.</li>
      <li><strong>Windows module dump:</strong> <code>curl.exe --dump-module-paths</code> prints non-empty absolute paths and exits 0 when at least one path is listed.</li>
    </ul>

    <aside class="callout">
      <p>Trace the bootstrap as a single span: attributes like <code>platform</code>, <code>has_stdio</code>, and <code>memdebug_enabled</code> give just enough context when diagnosing startup issues.</p>
    </aside>
  </section>

  <section id="conclusion">
    <h2>Conclusion</h2>
    <p>Small files, big impact. Curl’s <a href="https://github.com/curl/curl/blob/master/src/tool_main.c" target="_blank" rel="noopener">tool_main.c</a> is a model bootstrap: cohesive, readable, and careful about the realities of cross-platform processes. A few finishing touches can make it even safer and more predictable in odd environments.</p>

    <ul>
      <li>Adopt safer copies for env-derived strings; prefer <code>snprintf</code> over <code>strcpy</code>.</li>
      <li>Restore stdio to the null device instead of consuming pipes—predictable behavior, fewer surprises.</li>
      <li>Document global effects like <code>SIGPIPE</code> ignores near the installation site.</li>
    </ul>

    <p>I hope this walkthrough helps you design reliable bootstraps in your own tools. If you’re building a CLI with platform nuance, investing in a disciplined entry layer will pay off in stability, debuggability, and developer experience.</p>
  </section>

  <section aria-label="Appendix: Supporting snippets">
    <h2 id="supporting-snippets">Supporting snippets</h2>

    <h3>Signal handling for SIGPIPE</h3>
    <figure>
      <figcaption>Install a process-wide ignore (lines 129–132). <a href="https://github.com/curl/curl/blob/master/src/tool_main.c#L129-L132" target="_blank" rel="noopener">View on GitHub</a></figcaption>
      <pre class="language-c">#if defined(HAVE_SIGNAL) &amp;&amp; defined(SIGPIPE)
  (void)signal(SIGPIPE, SIG_IGN);
#endif</pre>
    </figure>
    <p class="why">Prevents abrupt termination on broken pipes; downstream writes must check for <code>EPIPE</code> instead.</p>

    <h3>Core run sequence</h3>
    <figure>
      <figcaption>Initialize → operate → cleanup (lines 137–148). <a href="https://github.com/curl/curl/blob/master/src/tool_main.c#L137-L148" target="_blank" rel="noopener">View on GitHub</a></figcaption>
      <pre class="language-c">  /* Initialize the curl library - do not call any libcurl functions before
     this point */
  result = globalconf_init();
  if(!result) {
    /* Start our curl operation */
    result = operate(argc, argv);

    /* Perform the main cleanup */
    globalconf_free();
  }</pre>
    </figure>
    <p class="why">A clean orchestration: fail fast on init errors, delegate the work, then clean up whenever initialization succeeded.</p>
  </section>
</article>
