Trust Boundaries

| Domain | Trust Level | Role |
|---|---|---|
| Kernel | Fully trusted | Enforces Landlock rules, delivers seccomp notifications, blocks unauthorized syscalls |
| Supervisor (parent) | Trusted | Receives trapped syscalls, consults approval backend, opens files, injects file descriptors |
| Sandboxed child | Untrusted | Runs the agent command under full kernel enforcement |
never_grant paths are checked before any approval backend is consulted.
When proxy mode is active, a fourth domain exists:
| Domain | Trust Level | Role |
|---|---|---|
| Network proxy | Trusted (runs in supervisor) | Filters outbound connections by host, injects credentials, enforces deny CIDRs |
The child can connect only to localhost:<port>; all other outbound TCP is blocked at the kernel level. A session token (256-bit random) prevents other localhost processes from using the proxy.
The Two-Layer Architecture
Layer 1: Landlock (the floor)
Landlock LSM provides the hard security floor. Once restrict_self() is called, the child process is permanently restricted to its initial capability set. No API exists to expand or remove the restrictions. Child processes inherit them. The only escape is a kernel exploit.
Landlock is:
- Unprivileged — any process can sandbox itself without CAP_SYS_ADMIN or root
- Irreversible — the kernel provides no undo mechanism
- Inherited — all child processes and threads inherit the ruleset
- Available — present on any kernel 5.13+ (Ubuntu 22.04+, Fedora 35+, Debian 12+)
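As a rough illustration of the availability bullet, the sketch below parses a kernel release string and compares it with 5.13. The helper names `parse_kernel_release` and `kernel_at_least` are hypothetical and not part of nono. A version check is only a heuristic: a kernel can be new enough and still be built or booted without Landlock enabled.

```python
import platform
import re


def parse_kernel_release(release):
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_at_least(release, minimum=(5, 13)):
    """Return True when the release is at or above the minimum (major, minor)."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    # On Linux, platform.release() returns the running kernel's release string.
    print(platform.release())
```

A caller would pass `platform.release()` to `kernel_at_least` and treat a `False` result as "Landlock is certainly absent", not treat `True` as proof that it is present.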
Layer 2: seccomp-notify (the gate)
seccomp user notification (SECCOMP_RET_USER_NOTIF) provides dynamic capability expansion on top of the Landlock floor. A BPF filter traps openat and openat2 syscalls before they reach Landlock and routes them to the supervisor for a decision.
The critical property: seccomp runs before Landlock. When the child calls open(), the seccomp filter fires first, suspending the syscall and notifying the supervisor. The supervisor can then:
- Deny — return EPERM to the child (the syscall never reaches Landlock)
- Approve — open the file itself and inject the fd into the child via SECCOMP_IOCTL_NOTIF_ADDFD
On approval, the child’s open() call returns a valid file descriptor. The child never executes its own openat — the supervisor’s open() is the single point of truth. The agent does not need to know nono exists; its standard file operations succeed after a brief pause during approval.
Why This Ordering Matters

Why Not Other Mechanisms
Why not SECCOMP_USER_NOTIF_FLAG_CONTINUE?
CONTINUE tells the kernel to let the child’s original syscall proceed. For syscalls that take pointer arguments (like openat, which takes a path string), this creates a TOCTOU race: the child can change the path in memory between the supervisor’s check and the kernel’s execution. The child could get the supervisor to approve /tmp/harmless and then swap the pointer to /etc/shadow before the kernel reads it.
fd injection via SECCOMP_IOCTL_NOTIF_ADDFD eliminates this entirely. The supervisor opens the file itself — whatever path it read from the child’s memory is what gets opened. The child’s memory contents after that point are irrelevant.
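The race can be illustrated with a small simulation. This is not seccomp code: a `bytearray` stands in for the child's memory, and the `mutate` callback stands in for the child rewriting its path buffer after the check. All function names are hypothetical.

```python
def check_then_reread(memory, allowed, mutate):
    """Unsafe pattern: validate the path, then let a later reader fetch it again."""
    checked = bytes(memory)
    if checked not in allowed:
        raise PermissionError(checked)
    mutate(memory)        # the child swaps its buffer after the check
    return bytes(memory)  # the later read sees the swapped path


def snapshot_then_use(memory, allowed, mutate):
    """Safe pattern: the checker acts on the exact bytes it validated."""
    snapshot = bytes(memory)
    if snapshot not in allowed:
        raise PermissionError(snapshot)
    mutate(memory)        # later changes to the buffer no longer matter
    return snapshot


def swap_to_shadow(memory):
    """Simulated attacker step: overwrite the approved path in place."""
    memory[:] = b"/etc/shadow"
```

The first function models the CONTINUE flow, where the kernel re-reads the pointer after approval. The second models fd injection, where the path the supervisor read is the path that gets opened.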
Why not mount namespaces?
A mount namespace with a minimal filesystem view would eliminate the information leak surface (the child could not stat paths outside the namespace). However:
- unshare(CLONE_NEWUSER | CLONE_NEWNS) requires unprivileged user namespaces
- Unprivileged user namespaces are disabled by default on Debian, restricted by AppArmor on Ubuntu 23.10+, and turned off in many enterprise configurations
- Building a core security boundary on a mechanism that distro maintainers are actively restricting is fragile
Why not DYLD interposition on macOS?
Transparent capability expansion on macOS would require intercepting open() calls via DYLD_INSERT_LIBRARIES. This fails for three reasons:
- SIP strips the variable — Apple’s System Integrity Protection removes DYLD_INSERT_LIBRARIES from Apple Platform Binaries (/usr/bin/env, /bin/bash, /bin/sh). Any command that routes through these interpreters loses the interposition.
- Version manager shims break the chain — Tools like pyenv and rbenv use shims that exec through SIP-protected interpreters.
- Calling convention mismatch — The variadic open(const char*, int, ...) function cannot be safely interposed on arm64 without matching the exact register layout, which causes crashes.
Seatbelt (sandbox_init()) provides the kernel enforcement layer on macOS, with the same irreversibility guarantees as Landlock on Linux.
The fd Injection Model
When the supervisor approves a request, it does not tell the child “go ahead and open it yourself.” It opens the file and hands the child a file descriptor. This distinction is fundamental to the security model.
What the supervisor does on approval
- Reads the requested path from /proc/CHILD/mem
- Validates the notification is still live (SECCOMP_IOCTL_NOTIF_ID_VALID)
- Checks the path against never_grant
- Canonicalizes the path to resolve symlinks
- Re-checks never_grant on the canonical path (a symlink from an innocuous path could point to a blocked target)
- Walks the canonical path component-by-component using openat with O_NOFOLLOW at each step (prevents symlink substitution between canonicalization and open)
- Injects the resulting fd into the child via SECCOMP_IOCTL_NOTIF_ADDFD with SECCOMP_ADDFD_FLAG_SEND
The SECCOMP_ADDFD_FLAG_SEND flag is critical: it atomically injects the fd and completes the child’s syscall in one operation. The child’s open() returns the injected fd directly.
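The path-handling steps in the approval sequence (check, canonicalize, re-check, symlink-free walk) can be sketched as follows. This is an illustrative model, not nono's Rust implementation: the function names and the `NEVER_GRANT` contents are hypothetical, reading the child's memory and the seccomp ioctls are omitted, and `os.path.realpath` stands in for whatever canonicalization the supervisor uses.

```python
import os
import tempfile

# Hypothetical deny list; the real never_grant entries come from nono's policy.
NEVER_GRANT = ("/etc/shadow",)


def is_never_grant(path, deny_list=NEVER_GRANT):
    """True when path equals a denied entry or lies beneath one."""
    return any(
        path == entry or path.startswith(entry.rstrip("/") + "/")
        for entry in deny_list
    )


def open_without_symlinks(canonical, flags=os.O_RDONLY):
    """Open an absolute path one component at a time, refusing symlinks."""
    parts = [p for p in canonical.split("/") if p]
    if not canonical.startswith("/") or not parts:
        raise ValueError("expected an absolute path to a file")
    dir_fd = os.open("/", os.O_RDONLY | os.O_DIRECTORY)
    try:
        for name in parts[:-1]:
            step = os.open(
                name, os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW, dir_fd=dir_fd
            )
            os.close(dir_fd)
            dir_fd = step
        # O_NOFOLLOW on the final component rejects a symlink swapped in late.
        return os.open(parts[-1], flags | os.O_NOFOLLOW, dir_fd=dir_fd)
    finally:
        os.close(dir_fd)


def approve_and_open(requested, deny_list=NEVER_GRANT):
    """Check, canonicalize, re-check, then open without following symlinks."""
    if is_never_grant(requested, deny_list):
        raise PermissionError(requested)
    canonical = os.path.realpath(requested)
    if is_never_grant(canonical, deny_list):
        raise PermissionError(canonical)
    return open_without_symlinks(canonical)


def demo():
    """Open a file through a symlink; the canonical target is what gets opened."""
    base = os.path.realpath(tempfile.mkdtemp())
    target = os.path.join(base, "real.txt")
    with open(target, "w") as handle:
        handle.write("hello")
    link = os.path.join(base, "link.txt")
    os.symlink(target, link)
    fd = approve_and_open(link, deny_list=())
    try:
        return os.read(fd, 5)
    finally:
        os.close(fd)
```

The second `is_never_grant` call is what catches an innocuous-looking symlink that resolves to a blocked target, and the component walk is what catches a symlink planted after canonicalization.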
What the supervisor does NOT do
- Does not pass O_CREAT — the supervisor cannot be tricked into creating files that do not exist
- Does not pass O_TRUNC — the child cannot use the supervisor as a proxy to truncate files; it receives a plain writable fd and can seek/write within the file, but truncation is an explicit operation on an fd the user approved
- Does not use SECCOMP_USER_NOTIF_FLAG_CONTINUE — the child never executes its own openat
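One way to honor the first two rules is to mask the child's requested flags before the supervisor opens anything. The sketch below assumes the flags are stripped; nono might instead reject such requests outright, and `sanitize_open_flags` is a hypothetical name.

```python
import os

# Flags the supervisor never forwards, per the list above.
STRIPPED_FLAGS = os.O_CREAT | os.O_TRUNC


def sanitize_open_flags(requested):
    """Drop O_CREAT and O_TRUNC from the flags the child asked for."""
    return requested & ~STRIPPED_FLAGS
```

A request for `O_WRONLY | O_CREAT | O_TRUNC` becomes a plain `O_WRONLY` open, so a missing file yields an error instead of being created.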
Scope of an approved fd
Once injected, the child holds the fd until it closes it. There is no revocation mechanism — this is an inherent property of Unix file descriptors. The approval grants access for the remainder of the session.
Syscall Scope
The seccomp filter traps only openat and openat2. All other syscalls pass straight through the filter with no supervisor round trip.
| Syscall | Trapped | Rationale |
|---|---|---|
| openat / openat2 | Yes | File access is the capability boundary |
| read / write / close | No | Hot path; operates on already-granted fds |
| stat / access | No | Information leak, but agents handle failures gracefully; not a data exfiltration vector |
| unlink / rename | No | Governed by Landlock; cannot affect paths outside the allowed set |
| connect / bind | No | Governed by Landlock ABI v4+ network filtering; in proxy mode, restricted to localhost proxy port |
Information leak surface
Because stat and access are not trapped, the sandboxed child can enumerate filesystem structure — file existence, types, permissions — without triggering a supervisor notification. For cooperative agents (the target use case), this is acceptable. For adversarial code, this could enable reconnaissance. The optional mount namespace layer (when available) would close this gap.
Failure Modes
| Failure | Behavior | Safety Property |
|---|---|---|
| Supervisor crashes | Notif fd closes; pending syscalls get ENOSYS; child falls back to Landlock | Safe — Landlock denies the access |
| Supervisor denies request | Child receives EPERM | Safe — explicit denial |
| never_grant match | Request rejected before approval backend is consulted | Safe — hard policy boundary |
| Rate limit exceeded | Request automatically denied | Safe — prevents prompt flooding |
| Child exits during approval | SECCOMP_IOCTL_NOTIF_ID_VALID check fails; supervisor discards the request | Safe — no orphaned state |
| Path canonicalization fails | Supervisor returns error to child | Safe — fail-closed |
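The ordering implied by the table (hard policy first, then rate limiting, then the approval backend) can be sketched as a single decision function. The sliding-window limiter, its parameters, the return strings, and the relative order of the never_grant and rate-limit checks are assumptions made for illustration.

```python
import time
from collections import deque


class RateLimiter:
    """Sliding-window limiter: at most max_requests per window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._stamps = deque()

    def allow(self):
        now = self._clock()
        # Forget requests that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self._window:
            self._stamps.popleft()
        if len(self._stamps) >= self._max:
            return False
        self._stamps.append(now)
        return True


def decide(path, never_grant, limiter, ask_backend):
    """Return a decision string; the backend is consulted only as a last step."""
    if path in never_grant:
        return "deny:never_grant"
    if not limiter.allow():
        return "deny:rate_limit"
    return "approve" if ask_backend(path) else "deny:backend"
```

Every branch that does not reach `ask_backend` ends in a denial, which is the fail-closed property the table describes.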
What If the Supervisor Is Compromised?
A reasonable question: the supervisor can open any file and inject it into the child. If an attacker compromises the supervisor, can they use it as a proxy to feed arbitrary files to the sandboxed agent? The answer depends on where the attacker is.
From the child (inside the sandbox)
The child cannot compromise the supervisor because the supervisor never runs untrusted code. The agent runs in the child. The supervisor is nono’s own Rust binary — the parent process after fork().
The child’s communication channels to the supervisor are:
- seccomp notification fd — kernel-mediated. The child cannot forge or manipulate these; the kernel generates them from trapped syscalls.
- Unix socket — length-prefixed JSON parsed by serde in memory-safe Rust. Malformed messages are rejected. Valid messages are checked against never_grant, rate-limited, and require user approval.
The child cannot ptrace the parent (blocked by yama ptrace_scope on most distributions, and the child’s own seccomp filter restricts its syscalls). There is no shared memory, no signal-based control channel, and no way to inject code into the supervisor process.
Compromising the supervisor from the child would require a memory corruption bug in nono’s Rust code (memory-safe by default, no unsafe in the IPC path) or a kernel exploit.
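For readers unfamiliar with length-prefixed framing, the sketch below shows the general shape of such a parser. The 4-byte big-endian prefix, the size cap, and the message fields are assumptions; the actual wire format is defined by nono's Rust code and may differ.

```python
import json
import struct

# Hypothetical cap on message size; rejects oversized length prefixes early.
MAX_MESSAGE_BYTES = 64 * 1024


def encode_message(payload):
    """Serialize a dict as JSON preceded by a 4-byte big-endian length."""
    body = json.dumps(payload).encode("utf-8")
    return struct.pack(">I", len(body)) + body


def decode_message(buffer):
    """Parse one framed message, raising ValueError on any malformed input."""
    if len(buffer) < 4:
        raise ValueError("truncated length prefix")
    (length,) = struct.unpack(">I", buffer[:4])
    if length > MAX_MESSAGE_BYTES:
        raise ValueError("message exceeds size cap")
    body = buffer[4:4 + length]
    if len(body) != length:
        raise ValueError("truncated body")
    # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError.
    message = json.loads(body.decode("utf-8"))
    if not isinstance(message, dict):
        raise ValueError("expected a JSON object")
    return message
```

Checking the declared length against a cap before reading the body means a hostile prefix cannot make the reader allocate or wait for an arbitrarily large payload.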
From outside (an external attacker)
The supervisor runs as the same user who invoked nono run. It is not setuid, does not run as root, and holds no elevated capabilities. It can only open files the invoking user can already open.
If an external attacker can compromise the supervisor process, they already have user-level code execution on the host. At that point, they can open the same files directly — the supervisor grants them nothing they do not already have. The supervisor is a privilege boundary in the downward direction (restricting the child), not the upward direction.
The supervisor’s external attack surface is minimal:
| Surface | Exposure |
|---|---|
| IPC socket | Anonymous socketpair() — no filesystem path, no way for external processes to connect |
| seccomp notif fd | Owned solely by the supervisor; no other process holds it |
| Approval input | Reads from /dev/tty; requires access to the user’s terminal session |
| Network | Minimal — in proxy mode, the supervisor listens on a random localhost port protected by a session token; otherwise none |
What if the supervisor has a vulnerability?
Even in an unprivileged supervisor, a memory corruption vulnerability could allow the child to escape the sandbox by hijacking the supervisor’s control flow. The question is how realistic this is.
Rust eliminates the most common vulnerability classes. Buffer overflows, use-after-free, double-free, and format string attacks are structurally impossible in safe Rust. The compiler prevents them, not programmer discipline. The IPC message parsing uses serde JSON with no manual buffer management — there is no sprintf into a stack buffer, no memcpy with an attacker-controlled length.
The unsafe surface is small and does not parse complex input. The unsafe blocks in the supervisor path are limited to libc FFI calls: openat, poll, seccomp ioctls, and SCM_RIGHTS fd passing. These are thin wrappers around syscalls with fixed-size arguments, not parsing routines operating on attacker-controlled data.
The child can only deliver a payload through two narrow channels:
| Channel | Data Format | Parser |
|---|---|---|
| Unix socket | Length-prefixed JSON | serde (memory-safe, widely audited) |
| /proc/CHILD/mem | Bounded-length path string | canonicalize + openat (standard libc) |
If an attacker finds an exploitable bug in an unsafe FFI block with a delivery mechanism from the child, they achieve user-level code execution in the supervisor. This is the same privilege level the invoking user already has — the attacker has escaped the sandbox but has not escalated privileges. This is meaningful (a sandbox escape is a real security event) but it is not the catastrophic outcome of compromising a root-level supervisor.
The risk is not zero — nothing is. But Rust’s memory safety guarantees make the traditional exploit classes structurally impossible across the vast majority of the codebase, the remaining unsafe surface is small and constrained, and the worst-case outcome is lateral movement to the user’s own privilege level rather than privilege escalation.
The key distinction
The supervisor is not a privilege escalation target because it does not hold privileges the user does not already have. This is a deliberate design choice. nono runs entirely unprivileged — no root, no CAP_SYS_ADMIN, no setuid. An architecture where the supervisor ran with elevated privileges (as some container runtimes do) would make supervisor compromise a serious escalation vector. nono avoids this by design.
Network Proxy Security Model
When --network-profile or --proxy-allow is used, nono starts an HTTP proxy in the supervisor process and restricts the child to ProxyOnly mode — only localhost:<port> is reachable from inside the sandbox.
Enforcement Layers
| Platform | Mechanism | What It Restricts |
|---|---|---|
| Linux | Landlock ABI v4+ per-port TCP rules | connect() limited to proxy port only |
| macOS | Seatbelt (allow network-outbound (remote tcp "localhost:PORT")) | All other outbound denied |
connect() to any address other than 127.0.0.1:<port> returns EPERM.
Session Token Authentication
Every proxy session generates a 256-bit random token (via getrandom). The child receives it as NONO_PROXY_TOKEN. Every request must include this token:
- CONNECT mode: Proxy-Authorization: Bearer <token>
- Reverse proxy mode: X-Nono-Token: <token>
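A minimal sketch of token generation and checking, using the two header forms listed above. The function names are hypothetical, the hex encoding of the token is an assumption, and the constant-time comparison is a common precaution that the source does not confirm nono uses.

```python
import hmac
import secrets


def new_session_token():
    """Generate 32 random bytes (256 bits) from the OS CSPRNG, hex-encoded."""
    return secrets.token_hex(32)


def token_from_headers(headers):
    """Pull the token from either header form; return None when absent."""
    lowered = {name.lower(): value for name, value in headers.items()}
    auth = lowered.get("proxy-authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):]
    return lowered.get("x-nono-token")


def is_authorized(headers, expected):
    """Compare the presented token with the session token in constant time."""
    presented = token_from_headers(headers)
    if presented is None:
        return False
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

A request with a missing or mismatched token fails `is_authorized`, which is the point at which the proxy would refuse it.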
DNS Rebinding Protection
The proxy resolves DNS itself and checks all resolved IP addresses against the deny list before connecting. This prevents attacks where:
- An attacker controls DNS for an allowed hostname
- DNS returns an RFC1918 address (e.g., 10.0.0.1) or cloud metadata (169.254.169.254)
- The proxy would connect to an internal host thinking it’s an allowed external API
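The check on resolved addresses can be sketched with the standard `ipaddress` module. The deny list below (the RFC1918 ranges plus the link-local block that contains 169.254.169.254) is an assumed example, not nono's actual default, and `first_denied_address` is a hypothetical name. DNS resolution itself is left out so the logic can be exercised on fixed inputs.

```python
import ipaddress

# Assumed example deny list: RFC1918 ranges and the link-local metadata block.
DENY_CIDRS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "169.254.0.0/16")
]


def first_denied_address(resolved, deny_cidrs=DENY_CIDRS):
    """Return the first resolved address inside a denied network, else None."""
    for text in resolved:
        address = ipaddress.ip_address(text)
        for network in deny_cidrs:
            # Membership is False when address and network IP versions differ.
            if address in network:
                return text
    return None
```

Checking every resolved address, not only the first, matters because a hostile DNS answer can mix one public address with one internal address.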
Credential Isolation
In reverse proxy mode, API credentials are loaded from the system keyring at proxy startup and stored in the supervisor’s memory as Zeroizing<String>. They are never passed to the sandboxed child:
- The child sees OPENAI_BASE_URL=http://127.0.0.1:<port>/openai — a local HTTP URL with no key
- The proxy injects Authorization: Bearer sk-... when forwarding to the upstream over TLS
- The child cannot read the credential from the proxy’s memory (separate process, no shared memory, no ptrace)
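The header handling on the forwarding path might look like the sketch below. It is a simplified model: `build_upstream_headers` is a hypothetical name, and dropping the child's own Authorization and X-Nono-Token headers before forwarding is an assumption about sensible behavior, not a documented detail.

```python
# Headers from the child that should not reach the upstream (assumed set).
DROPPED_HEADERS = ("authorization", "x-nono-token")


def build_upstream_headers(child_headers, api_key):
    """Copy the child's headers, drop local ones, and add the real credential."""
    upstream = {
        name: value
        for name, value in child_headers.items()
        if name.lower() not in DROPPED_HEADERS
    }
    # The credential exists only in the supervisor; the child never sees it.
    upstream["Authorization"] = f"Bearer {api_key}"
    return upstream
```

The function returns a new dict and leaves the child's headers untouched, so nothing containing the key flows back toward the sandbox.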
Proxy Failure Modes
| Failure | Behavior | Safety Property |
|---|---|---|
| Proxy crashes | Child loses all network access (only proxy port was allowed) | Safe — fail-closed |
| Invalid token | Proxy returns 403 | Safe — authentication enforced |
| Host not in allowlist | Proxy returns 403 | Safe — deny by default |
| DNS resolves to denied CIDR | Proxy returns 403 | Safe — rebinding protection |
| Upstream TLS failure | Proxy returns 502 | Safe — no fallback to plaintext |
macOS Model
On macOS, Seatbelt provides the kernel enforcement layer via sandbox_init(). The security properties are equivalent to Landlock:
- Irreversible once applied
- Enforced by the XNU kernel
- Inherited by child processes
- No userspace escape mechanism
Summary
The architecture optimizes for three properties simultaneously:
- Unprivileged deployment — no root, no CAP_SYS_ADMIN, no kernel configuration changes. Works on any kernel 5.14+ out of the box.
- Defense in depth — Landlock provides a hard floor that catches failures in the dynamic layer. The supervisor can only grant what it can open. never_grant provides a policy ceiling that no approval can override.
- Transparency — the sandboxed agent does not need to know about nono. Standard open() calls succeed after supervisor approval. No retries, no special APIs, no agent modifications.