
Zalt Blog

Deep Dives into Code & Architecture at Scale

When a CLI Becomes an Operating System

By Mahmoud Zalt
Code Cracking
35m read

When a CLI starts feeling more like an operating system than a simple command, you know something interesting is going on. Where’s that line for your tools?

Every serious CLI starts the same way: a small script that parses args and calls a function. Then, little by little, it turns into something else entirely. In lib/npm.js, npm has crossed that line. It no longer behaves like a thin wrapper; it behaves like a tiny operating system for npm commands.

In this article, we’ll walk through how this single file builds a whole runtime around each npm invocation—handling configuration, logging, timing, workspaces, and errors—while still staying under 300 lines. I’m Mahmoud Zalt, and we’ll use it as a concrete guide for designing robust orchestration layers for our own CLIs and services.

Npm as a micro‑OS

To see why this file feels like an operating system kernel, we should first look at what it’s responsible for and what it deliberately delegates.

Project/npm-cli
└── lib/
    ├── npm.js          (this file: Npm orchestrator)
    ├── commands/
    │   ├── install.js  (example command module)
    │   ├── publish.js
    │   └── ...
    └── utils/
        ├── display.js       (Display, chalk, output formatting)
        ├── log-file.js      (log file creation/rotation, .files)
        ├── timers.js        (timing, metrics, .load/.finish/.off)
        ├── npm-usage.js     (usage text generator)
        ├── cmd-list.js      (deref command alias -> canonical)
        ├── error-message.js (getError: shapes error + report files)
        └── output-error.js  (outputError: render error to user)
High‑level structure: lib/npm.js orchestrates, everything else specializes.

Conceptually, the Npm class represents “one npm run.” It:

  • Boots the environment (config, stdout/stderr, colors, cache and logs directories).
  • Resolves which command to run (install, publish, …) via a small command registry (deref).
  • Executes that command under timers and workspace rules.
  • Shuts down cleanly, writing timing metadata and user‑friendly errors.

Why this matters: treating the orchestrator as a “micro‑OS” forces a clean separation between the runtime (process, config, logs) and the application logic (commands). That separation is what keeps this file small and maintainable in spite of its central role.

Boot sequence of an npm run

Once we see Npm as a tiny OS, the next natural question is: how does it boot? The load() method is the entrypoint, but the interesting work happens in the private #load() method it wraps.

Constructing the runtime context

Everything starts with the constructor, which wires up display and configuration. The constructor is intentionally “test friendly” but also reveals how the real runtime is expected to look.

constructor ({
  stdout = process.stdout,
  stderr = process.stderr,
  npmRoot = dirname(__dirname),
  argv = [],
  excludeNpmCwd = false,
} = {}) {
  this.#display = new Display({ stdout, stderr })
  this.#npmRoot = npmRoot
  this.config = new Config({
    npmPath: this.#npmRoot,
    definitions,
    flatten,
    nerfDarts,
    shorthands,
    argv: [...process.argv, ...argv],
    excludeNpmCwd,
  })
}

Two important design ideas are packed here:

  • Dependency injection (a pattern where you pass dependencies in instead of creating them inside) via stdout, stderr, npmRoot, and argv. This makes testing and embedding far easier.
  • Config and display are constructed once and then treated as long‑lived collaborators, not re‑created per command.

Step‑by‑step boot pipeline

The core boot sequence in #load() is essentially a scripted pipeline. Each step is wrapped in timers, so we can measure where startup time goes.

async #load () {
  await time.start('npm:load:whichnode', async () => {
    const node = await which(process.argv[0]).catch(() => {})
    if (node && node.toUpperCase() !== process.execPath.toUpperCase()) {
      log.verbose('node symlink', node)
      process.execPath = node
      this.config.execPath = node
    }
  })

  await time.start('npm:load:configload', () => this.config.load())

  if (this.config.get('versions', 'cli')) {
    this.argv = ['version']
    this.config.set('usage', false, 'cli')
  } else {
    this.argv = [...this.config.parsedArgv.remain]
  }

  const commandArg = this.argv.shift()
  const command = deref(commandArg)

  await this.#display.load({
    command,
    loglevel: this.config.get('loglevel'),
    stdoutColor: this.color,
    stderrColor: this.logColor,
    timing: this.config.get('timing'),
    unicode: this.config.get('unicode'),
    progress: this.flatOptions.progress,
    json: this.config.get('json'),
    heading: this.config.get('heading'),
  })
  process.env.COLOR = this.color ? '1' : '0'

  if (this.config.get('version', 'cli')) {
    output.standard(this.version)
    return { exec: false }
  }

  // ... cache/log directories, titles, timers, scope normalization ...
}

Let’s unpack what’s happening conceptually:

  1. Resolve the Node binary: which is used to find the canonical Node executable and normalize process.execPath. This sounds minor, but getting the exact binary right affects stack traces, help text, and some platform bugs.
  2. Load configuration: @npmcli/config reads environment, npmrc files, and CLI flags. This is expensive enough that it’s timed separately (npm:load:configload).
  3. Resolve the command: arguments are split into the raw command as typed (commandArg) and the remaining args. A deref step translates aliases into canonical names, giving a stable handle for module loading.
  4. Initialize display: the UI layer is configured with log level, color, JSON mode, unicode, progress, and heading, all derived from config and flatOptions.
  5. Short‑circuit for --version/--versions: those fast paths return early with { exec: false } to avoid unnecessary work like cache/log directory creation.

Why this matters: by explicitly scripting the boot sequence, we get a natural place to measure, to short‑circuit, and to plug in new behaviors without turning load() into a maze of conditionals.
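The shape of that boot sequence can be sketched as a tiny, hypothetical pipeline. All names here are illustrative (timed and boot are not npm functions), but the structure mirrors the real one: timed steps, then an early return for fast paths:

```javascript
// Sketch of a scripted boot pipeline. Each heavy step is timed, and cheap
// fast paths return before any expensive setup happens.
async function timed (timings, name, fn) {
  const start = process.hrtime.bigint()
  try {
    return await fn()
  } finally {
    timings[name] = Number(process.hrtime.bigint() - start) / 1e6 // milliseconds
  }
}

async function boot (argv) {
  const timings = {}
  const ctx = { argv: [...argv], timings }

  // 1. Load configuration (stand-in for @npmcli/config).
  await timed(timings, 'load:config', async () => {
    ctx.config = { loglevel: 'notice' }
  })

  // 2. Resolve the command from the remaining args.
  ctx.command = ctx.argv.shift() ?? null

  // 3. Short-circuit fast paths before heavier setup (cache dirs, log files).
  if (ctx.command === 'version') {
    return { ...ctx, exec: false }
  }
  return { ...ctx, exec: true }
}
```

Because every step lives in one linear function, adding a new phase or a new fast path is a local change rather than a new branch threaded through many methods.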

Security through careful title and argv handling

One of the more subtle parts of the boot sequence is how it sets process.title and logs arguments without leaking secrets.

time.start('npm:load:setTitle', () => {
  const { parsedArgv: { cooked, remain } } = this.config
  this.#title = ['npm'].concat(replaceInfo(remain)).join(' ').trim()
  process.title = this.#title

  this.#argvClean = replaceInfo(cooked)
  log.verbose('title', this.title)
  log.verbose('argv', this.#argvClean.map(JSON.stringify).join(' '))
})

Two points stand out:

  • Redaction first: replaceInfo from @npmcli/redact is applied before setting process.title or logging args to avoid exposing tokens or passwords in process listings or debug logs.
  • Measuring cost: setting process.title can be slow on some platforms, so it’s wrapped in a time.start span. That’s observability wired right into the core lifecycle.
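As a rough illustration of the redaction idea — these patterns are illustrative, not @npmcli/redact's actual rules — a scrubbing pass over argv might look like:

```javascript
// Hypothetical redaction helper in the spirit of replaceInfo: scrub obvious
// secrets from argv before they reach process.title or debug logs.
function replaceInfoSketch (args) {
  const TOKEN = /(npm_[A-Za-z0-9]{4})[A-Za-z0-9]+/g          // npm access tokens
  const AUTH = /(--(?:password|_authToken|otp)=)\S+/g        // credential flags
  return args.map(arg =>
    arg.replace(TOKEN, '$1***').replace(AUTH, '$1***'))
}
```

The key property is ordering: redaction runs before any sink (title, log, report file) ever sees the raw values, so there is no window where a `ps` listing exposes a token.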

Command execution as a first‑class citizen

With the runtime booted, the next responsibility of this micro‑OS is to run exactly one “userland program”: an npm command. The file uses a clean command pattern to do that.

Resolving commands by name

The static Npm.cmd method is the dispatcher. It does two things: normalization and dynamic loading.

static cmd (c) {
  const command = deref(c)
  if (!command) {
    throw Object.assign(new Error(`Unknown command ${c}`), {
      code: 'EUNKNOWNCOMMAND',
      command: c,
    })
  }
  return require(`./commands/${command}.js`)
}

We can think of deref() as the symbol table of this mini‑OS: it maps whatever the user typed to the canonical command implementation. The explicit EUNKNOWNCOMMAND error code ensures the rest of the error pipeline can treat “unknown command” as a first‑class scenario, not just a generic exception string.

This design has a trade‑off: the require() call is dynamic, which hurts static analysis and bundling, but it keeps the command set easy to extend. A natural middle ground would be a static registry: a map from command names to modules that tooling can introspect.
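One possible shape for that static-registry middle ground — sketched here with illustrative names and paths, not npm's actual code — is an explicit map from canonical command names to lazy loaders:

```javascript
// A static registry: tooling can enumerate the keys, but modules still load
// lazily. The command paths are illustrative.
const registry = new Map([
  ['install', () => require('./commands/install.js')],
  ['publish', () => require('./commands/publish.js')],
])

function resolveCommand (name) {
  const load = registry.get(name)
  if (!load) {
    // Same first-class error code as the dynamic dispatcher above.
    throw Object.assign(new Error(`Unknown command ${name}`), {
      code: 'EUNKNOWNCOMMAND',
      command: name,
    })
  }
  return load()
}
```

Bundlers can now see every possible require target at build time, while the thunks preserve the lazy-loading behavior of the dynamic version.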

Executing commands with workspace and engine semantics

The heart of execution lives in #exec(). This is where the runtime treats commands as citizens of a larger environment rather than isolated functions.

async #exec (cmd, args) {
  const Command = this.constructor.cmd(cmd)
  const command = new Command(this)

  if (!this.#command) {
    this.#command = command
    process.env.npm_command = this.command
  }

  if (this.config.get('usage')) {
    return output.standard(command.usage)
  }

  let execWorkspaces = false
  const hasWsConfig = this.config.get('workspaces') || this.config.get('workspace').length
  const implicitWs = this.config.get('workspace', 'default').length

  if (hasWsConfig && (!implicitWs || !Command.ignoreImplicitWorkspace)) {
    if (this.global) {
      throw new Error('Workspaces not supported for global packages')
    }
    if (!Command.workspaces) {
      throw Object.assign(new Error('This command does not support workspaces.'), {
        code: 'ENOWORKSPACES',
      })
    }
    execWorkspaces = true
  }

  if (command.checkDevEngines && !this.global) {
    await command.checkDevEngines()
  }

  return time.start(`command:${cmd}`, () =>
    execWorkspaces ? command.execWorkspaces(args) : command.exec(args))
}

There are several layers of behavior here:

  • Command identity: the first command to run “claims” this.#command, and process.env.npm_command is set once. Even if commands re‑enter exec() internally (like npm test delegating to run), the logical command for this run stays stable.
  • Workspace awareness: workspace config is interpreted in combination with static command flags (Command.workspaces, Command.ignoreImplicitWorkspace). The orchestrator enforces “workspaces and global don’t mix” and “don’t accidentally run workspace‑unsafe commands” centrally.
  • Engine checks: if a command exposes checkDevEngines, it will be called for non‑global runs before execution, giving a hook for version compatibility enforcement.
  • Timing as a contract: every command is timed under a span like command:install. This turns performance into an explicit part of the programming model.

Why this matters: the orchestrator owns cross‑cutting policy (workspaces, engines, timing) while each command owns its domain logic. That’s exactly what we want from a command pattern in a real‑world CLI.
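The workspace rules above could even be isolated into a pure function, which makes the policy trivially testable. This is a sketch under that assumption, not npm's actual code:

```javascript
// Pure-function version of the workspace policy: same decisions as #exec,
// but with all inputs passed explicitly. Parameter names are illustrative.
function shouldExecWorkspaces ({ hasWsConfig, implicitOnly, Command, isGlobal }) {
  // No workspace config, or only an implicit workspace that this command ignores.
  if (!hasWsConfig || (implicitOnly && Command.ignoreImplicitWorkspace)) {
    return false
  }
  if (isGlobal) {
    throw new Error('Workspaces not supported for global packages')
  }
  if (!Command.workspaces) {
    throw Object.assign(new Error('This command does not support workspaces.'), {
      code: 'ENOWORKSPACES',
    })
  }
  return true
}
```

Centralizing the decision like this means no command can half-implement the rules: a command either declares `workspaces` support statically or is rejected before it runs.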

Errors as events, not afterthoughts

So far, the story has been about happy‑path boot and execution. But the most interesting part of lib/npm.js is how it treats errors as first‑class events with their own lifecycle.

Public methods wrap the private core

Both load() and exec() follow the same pattern: they delegate to a private method and route any thrown errors through a central handler.

async load () {
  let err
  try {
    return await time.start('npm:load', () => this.#load())
  } catch (e) {
    err = e
  }
  return this.#handleError(err)
}

async exec (cmd, args = this.argv) {
  if (!this.#command) {
    let err
    try {
      await this.#exec(cmd, args)
    } catch (e) {
      err = e
    }
    return this.#handleError(err)
  } else {
    return this.#exec(cmd, args)
  }
}

This gives us a neat separation:

  • Private methods (#load, #exec) focus on doing work.
  • Public methods (load, exec) focus on boundaries: timing, error normalization, and finalization.

Enriching and reporting errors

The real power sits in #handleError() and #getError(). Together, they decide what the user sees and what gets written to disk.

async #handleError (err) {
  if (err) {
    const localPkg = await require('@npmcli/package-json')
      .normalize(this.localPrefix)
      .then(p => p.content)
      .catch(() => null)
    Object.assign(err, this.#getError(err, { pkg: localPkg }))
  }

  this.finish(err)

  if (err) {
    throw err
  }
}

Two key ideas show up here:

  • Contextual enrichment: the error is augmented with local package metadata (if available) so messages can say things like “in package my-app at version X.”
  • Always finish: regardless of success or failure, finish(err) is called to close timers and flush the final output frame.

The lower‑level shaping and file writing happens in #getError():

#getError (rawErr, opts) {
  const { files = [], ...error } = require('./utils/error-message.js').getError(rawErr, {
    npm: this,
    command: this.#command,
    ...opts,
  })

  const { writeFileSync } = require('node:fs')
  for (const [file, content] of files) {
    const filePath = `${this.logPath}${file}`
    const fileContent = `Log files:\n${this.logFiles.join('\n')}\n\n${content.trim()}\n`
    try {
      writeFileSync(filePath, fileContent)
      error.detail.push(['', `\n\nFor a full report see:\n${filePath}`])
    } catch (fileErr) {
      log.warn('', `Could not write error message to ${file} due to ${fileErr}`)
    }
  }

  outputError(error)

  return error
}

Here, error-message.js effectively returns a plan for error reporting: a structured error object plus any extra files that should be created. #getError() then applies that plan:

  • Each extra file is written synchronously with a standard header listing log file paths.
  • If a write succeeds, a “for a full report see…” snippet is appended to error.detail, which will be rendered for the user.
  • If a write fails, the failure is logged but the original error is preserved.

Why this matters: errors are treated as multi‑channel events (console + disk) with a repeatable structure, not just thrown strings. That architecture makes it much easier to build tooling around “npm failed” in the future.
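The "plan" structure generalizes nicely beyond npm. Here is a hedged sketch — field names and the ERESOLVE example are illustrative, not npm's exact shapes — of shaping an error and then applying the plan:

```javascript
// Shape a raw error into a structured object plus a list of report files.
function shapeError (rawErr) {
  const files = []
  const error = {
    code: rawErr.code ?? 'EUNKNOWN',
    summary: [['', rawErr.message]],
    detail: [],
  }
  // Errors that carry large context get their own report file.
  if (rawErr.code === 'ERESOLVE') {
    files.push(['eresolve-report.txt', rawErr.message])
  }
  return { error, files }
}

// Apply the plan: write each file, then point the user at it. A failed
// report write must never mask the original error.
function applyPlan ({ error, files }, writeFile) {
  for (const [file, content] of files) {
    try {
      writeFile(file, content)
      error.detail.push(['', `For a full report see: ${file}`])
    } catch {
      // Swallow the write failure; the original error stays intact.
    }
  }
  return error
}
```

Because the shaper is pure and the applier takes `writeFile` as a parameter, both halves can be unit-tested without ever touching the filesystem.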

Finishing the run and messaging about logs

After errors are handled (or if there was no error), finish() and exitErrorMessage() coordinate user‑facing messaging.

finish (err) {
  this.#timers.finish({
    id: this.#runId,
    command: this.#argvClean,
    logfiles: this.logFiles,
    version: this.version,
  })

  output.flush({
    [META]: true,
    json: this.loaded && this.config.get('json'),
    jsonError: jsonError(err, this),
  })
}

This is the final “frame” of output: timers are closed, and a structured JSON error object (or null) is passed to the display layer. exitErrorMessage() then tells the user whether logs were written and where to find them, with different branches for:

  • Logs exist.
  • Logs were disabled via logs-max=0.
  • Log directory couldn’t be written.

Design choices that make this work

Now that we’ve walked through boot, execution, and errors, it’s easier to spot the key architectural patterns that give this file its clarity.

1. A clear façade for the rest of the CLI

The Npm class is a classic facade (an object that provides a simplified interface to a larger subsystem). Command modules don’t need to know about @npmcli/config, timers, or log files directly; they just depend on an Npm instance with small, well‑named getters:

  • cache, prefix, bin, global, usage, logFiles, …
  • Derived paths like globalDir, localDir, globalBin, localBin.

This keeps command code focused on “what this command does” instead of “how npm sets up its environment.”

2. Template‑method style lifecycle

The pattern used for load() and exec() is very close to the Template Method pattern: a public method defines the skeleton (timing, error handling, finalization), while private methods fill in the specifics (actual loading, actual execution).

This gives us three benefits:

  • Lifecycle concerns (timing, logging) are consistent and easy to audit.
  • Implementation details can evolve without changing how callers use load() or exec().
  • Testing can focus on either the outer behavior or the inner mechanics independently by mocking collaborators.

3. Guardrails baked into getters

Many of the getters—global, dir, bin, flatOptions—encode the rules of the system in one place. For example:

get global () {
  return this.config.get('global') || this.config.get('location') === 'global'
}

get dir () {
  return this.global ? this.globalDir : this.localDir
}

Any command that wants “the directory npm should operate on” just asks for npm.dir. It can’t accidentally re‑implement the global/local decision incorrectly. The orchestrator becomes the single source of truth for these semantics.

4. One notable footgun: mutating flatOptions

Not everything is perfect. One subtle smell is that the flatOptions getter mutates this.config.flat each time it’s accessed:

get flatOptions () {
  const { flat } = this.config
  flat.nodeVersion = process.version
  flat.npmVersion = pkg.version
  if (this.command) {
    flat.npmCommand = this.command
  }
  return flat
}

This breaks the usual expectation that a getter is “read‑only.” A straightforward refactor would fix it: clone flat into a derived object and add the extra fields there. That keeps config.flat as a pure view of configuration and puts runtime additions in a separate layer.

Getter design: current vs suggested flatOptions

Version     Behavior                                                    Impact
Current     Mutates config.flat on every access                         Hidden side effects, surprising to callers
Suggested   Returns { ...flat, nodeVersion, npmVersion, npmCommand }    Getter becomes referentially transparent; config stays clean
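The suggested version can be sketched as a pure function — names like deriveFlatOptions are illustrative — that derives runtime options from a config snapshot instead of mutating it:

```javascript
// Side-effect-free alternative: spread the config snapshot into a new object
// and attach the runtime-only fields to the copy.
function deriveFlatOptions (configFlat, { nodeVersion, npmVersion, command } = {}) {
  const flat = { ...configFlat, nodeVersion, npmVersion }
  if (command) {
    flat.npmCommand = command
  }
  return flat
}
```

Callers get the same merged view as before, but repeated reads are now referentially transparent and the original config object is never touched.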

Performance and operational angles

So far we’ve treated performance and operations as side notes, but in a CLI used millions of times per day, they become central to the design. This file embeds observability directly into the orchestrator.

Hot paths and where they’re measured

The main hot paths are:

  • Boot: Npm.#load, especially config.load(), which() calls, and process.title setting.
  • Command execution: Npm.#exec, which delegates to command modules.
  • Error handling: #getError when large error reports are written synchronously.

Each of these stages is wrapped in time.start() spans with clear labels (npm:load, npm:load:configload, command:). That makes it trivial to surface metrics like:

  • npm_load_duration_seconds: how long startup takes.
  • npm_command_duration_seconds: per‑command latency, especially for popular ones like install or publish.
  • npm_error_reports_written_total: how often error reports are generated.

Why this matters: by measuring at the orchestration layer, we can track user‑perceived performance across all commands without touching each command module individually.
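As a hedged sketch of that idea — the metric names follow the examples above, and the aggregation logic is purely illustrative — span timings collected at the orchestration layer could be folded into metrics like so:

```javascript
// Turn a map of span names -> milliseconds into metric values. The span
// labels mirror the ones used in the article; nothing here is npm's API.
function spansToMetrics (spans) {
  const metrics = {}
  for (const [name, ms] of Object.entries(spans)) {
    if (name === 'npm:load') {
      metrics.npm_load_duration_seconds = ms / 1000
    } else if (name.startsWith('command:')) {
      metrics.npm_command_duration_seconds = ms / 1000
      metrics.command = name.slice('command:'.length)
    }
  }
  return metrics
}
```

Because the spans already carry stable labels, this translation layer needs no cooperation from individual commands.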

Risky but acceptable choices

The file makes a few trade‑offs that are safe in context but worth calling out so we can make informed decisions in our own systems:

  • Synchronous error writes: as mentioned, writeFileSync will block the event loop. For a CLI that’s about to exit, that’s usually fine; for a long‑running daemon, an asynchronous refactor would be essential.
  • Dynamic command requires: makes the set of commands flexible and easy to extend but complicates bundling and static analysis.
  • Strong coupling to config shape: the orchestrator knows about config.parsedArgv.remain, config.flat, globalPrefix, and more. A small adapter layer around @npmcli/config would isolate this dependency and make refactors easier.

Lessons you can apply today

Stepping back, lib/npm.js is a compact demonstration of how to turn “a script that runs some code” into a reliable, observable runtime for commands. You don’t need to be building a package manager to adopt the same patterns.

1. Treat your entrypoint as a kernel

Whether you’re designing a CLI, a background worker, or an HTTP server, give your top‑level orchestrator a clear set of responsibilities:

  • Load configuration once and expose it through small, focused getters.
  • Initialize cross‑cutting services (logging, metrics, error formatting) in one place.
  • Define a lifecycle: boot → execute → finish, and make it explicit in code.
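Here is a minimal, hypothetical kernel skeleton in that spirit — all names are illustrative, and the phase bodies are left as placeholders:

```javascript
// One class owns the boot -> execute -> finish lifecycle. Errors from any
// phase are held so that finish always runs, then rethrown.
class Kernel {
  #phases = []

  async run (commandFn) {
    let err
    try {
      await this.#phase('boot', async () => { /* config, logging, metrics */ })
      await this.#phase('execute', commandFn)
    } catch (e) {
      err = e
    }
    await this.#phase('finish', async () => { /* flush timers and logs */ })
    if (err) throw err
    return this.#phases
  }

  async #phase (name, fn) {
    this.#phases.push(name)
    await fn()
  }
}
```

The important property is that finish is unconditional, exactly like npm's finish(err): every run closes out its timers and output, whether the command succeeded or blew up.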

2. Make error handling a first‑class pipeline

Instead of throwing strings or logging ad‑hoc, build a small error pipeline:

  • Shape raw errors into structured objects (code, message, detail, files).
  • Let a single place decide how to output and persist them.
  • Always call a finish() or equivalent at the end of a run to flush timers and logs.

3. Centralize policy, decentralize behavior

Just like npm’s orchestrator owns workspace rules, process title, and color decisions, your orchestrator should own:

  • Global/local selection logic.
  • Feature flags and mode switches (JSON output, verbose logging, etc.).
  • Shared constraints (e.g., “this feature can’t be used in global mode”).

Individual commands or handlers should only need to ask for environment facts, not re‑encode global rules.

4. Avoid hidden side effects in getters

Use the flatOptions smell as a reminder: if a getter needs to compute extra information, have it return a fresh object. The only time it’s reasonable to mutate internal state from a getter is when you’re lazily initializing something that is obviously internal (for example, caching a computed regular expression).

5. Put observability at the edges

Follow npm’s lead by timing high‑level phases and key commands, not every micro‑operation:

  • Wrap startup in one span, with a few nested spans for heavy pieces like config load.
  • Wrap each user‑visible command in a command: span.
  • Expose metrics such as load_duration, command_duration, error_reports_written, and log_dir_failures.

Think of your orchestrator as the “narrator” of your system: it knows when the story starts, what chapter you’re in, and how it ends. By designing it consciously—like the Npm class does—you make every command run more predictable, more debuggable, and safer to evolve.

If you’re working on a CLI or any service with a command‑like API, try sketching your own mini‑OS: a single file or class that owns boot, execute, and finish. Use npm’s orchestrator as a reference, and then adapt the patterns to your stack and constraints.

Full Source Code

Here's the full source code of the file that inspired this article.
Read on GitHub

Thanks for reading! I hope this was useful. If you have questions or thoughts, feel free to reach out.

Content Creation Process: This article was generated via a semi-automated workflow using AI tools. I prepared the strategic framework, including specific prompts and data sources. From there, the automation system conducted the research, analysis, and writing. The content passed through automated verification steps before being finalized and published without manual intervention.

Mahmoud Zalt

About the Author

I’m Zalt, a technologist with 15+ years of experience, passionate about designing and building AI systems that move us closer to a world where machines handle everything and humans reclaim wonder.

Let's connect if you're working on interesting AI projects, looking for technical advice or want to discuss your career.
