Threat model

Formal threat model — what grith defends, what it doesn't, and where the trust boundaries sit.

This is the formal threat model. For the high-level conceptual overview, see The threat model (concept). This page is the reference for security reviewers.

Asset categorisation

Asset	Description	Sensitivity
Credentials at rest	API keys, SSH keys, cloud creds in user dotfiles	Critical
Source code	Project files	High
Operational data	Customer data, internal docs	High to critical
Operating system state	Files, processes, registry-equivalents	Medium-high
Network reachability	Outbound destinations and reachability state	Medium
Reputation table	Accumulated trust signals	Low if leaked; medium if tampered with
Audit log	Record of decisions made	Medium (privacy: reveals usage patterns)

Adversary types

A1: Prompt-injection attacker

Capability: controls some text the agent will read (README, search result, comment, error message).
Goal: exfiltrate credentials, deploy malicious code, pivot to other resources.
Threat: high. Most realistic attack today.

A2: Compromised dependency

Capability: controls a package or tool the agent will run.
Goal: as A1 plus persistence.
Threat: high.

A3: Compromised model

Capability: malicious model weights, MITM'd inference endpoint, or malicious provider.
Goal: as A1.
Threat: medium. Requires significant resources.

A4: Confused agent

Capability: none — agent is non-malicious but mistaken.
Goal: none. Mistakes happen.
Threat: medium. Often as costly as actual attacks.

A5: Local userland attacker

Capability: runs as the grith user.
Goal: bypass grith.
Threat: out of scope. Enforcing the userland trust boundary is the kernel's job.

A6: Local root attacker

Capability: root.
Goal: anything.
Threat: out of scope.

A7: Kernel-level attacker

Capability: kernel exploit.
Goal: anything.
Threat: out of scope.

A8: Network-layer attacker

Capability: MITM between grith and grith.ai, or between grith and the agent's provider.
Goal: license tamper / model MITM / exfil eavesdrop.
Threat: low (TLS + cert pinning). Out of scope for non-TLS variants.
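A minimal Python sketch of what certificate pinning adds on top of ordinary TLS validation. After the standard chain and hostname checks pass, the client also requires the server's leaf certificate to match a known SHA-256 fingerprint, so a MITM presenting a certificate from a compromised or coerced CA is still rejected. This page does not say which pinning scheme grith uses (leaf certificate, public key, or issuing CA); pinning the full leaf certificate is a simplifying assumption here, and the function names are hypothetical.

```python
import hashlib
import socket
import ssl


def fingerprint(der_cert: bytes) -> str:
    """SHA-256 of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def pin_matches(der_cert: bytes, pins: set) -> bool:
    """True if the certificate's fingerprint is one of the pinned values."""
    return fingerprint(der_cert) in pins


def connect_pinned(host: str, pins: set, port: int = 443) -> ssl.SSLSocket:
    """Open a TLS connection with normal chain and hostname validation,
    then additionally require the leaf certificate to match a pin."""
    ctx = ssl.create_default_context()
    sock = socket.create_connection((host, port), timeout=10)
    try:
        tls = ctx.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise
    der = tls.getpeercert(binary_form=True)
    if der is None or not pin_matches(der, pins):
        tls.close()
        raise ConnectionError(f"certificate for {host} matches no pinned fingerprint")
    return tls
```

Pinning the whole leaf certificate is the simplest scheme, but it forces a pin update on every certificate rotation. Pinning the public key (SPKI) survives re-issuance with the same key, at the cost of parsing the certificate.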

Filter coverage matrix

Attack class	Primary defence	Secondary	Notes
Read credential, send out	sensitive_path, taint, dlp, secret_scan	reputation, canary	Canary catches when others fuzz.
Read credential, leak via summary	dlp, secret_scan	egress, reputation	Model summary can defeat taint; dlp catches if content shape persists.
Pivot to other project	path_match, session_containment	sensitive_path	Zone enforcement.
Privilege escalation	capability, command	rate_limit	Hard gate via capability.
Lateral via shell	command, egress	rate_limit, behavioural	Command parser.
Persistent backdoor (file write)	path_match, sensitive_path	behavioural	Writes outside project.
Beaconing / C2	egress, reputation	behavioural, rate_limit	Adaptive trust.
Confused agent destructive	destructive_action, operation_risk	sensitive_path, path_match	Hard-deny on catastrophic ops; deny on prod-targeted.
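The matrix can also be read as a data structure. The Python sketch below encodes the table verbatim and checks, at the level of the table, the property claimed under Defence-in-depth properties (no single-filter failure compromises the pipeline): for each filter, does disabling it leave any attack class without a primary defence? The helper names are hypothetical, and the check only tests what the table claims, not how well each filter actually performs.

```python
# The filter coverage matrix, copied from the table: attack class ->
# (primary filters, secondary filters).
COVERAGE = {
    "Read credential, send out": (
        {"sensitive_path", "taint", "dlp", "secret_scan"},
        {"reputation", "canary"},
    ),
    "Read credential, leak via summary": ({"dlp", "secret_scan"}, {"egress", "reputation"}),
    "Pivot to other project": ({"path_match", "session_containment"}, {"sensitive_path"}),
    "Privilege escalation": ({"capability", "command"}, {"rate_limit"}),
    "Lateral via shell": ({"command", "egress"}, {"rate_limit", "behavioural"}),
    "Persistent backdoor (file write)": ({"path_match", "sensitive_path"}, {"behavioural"}),
    "Beaconing / C2": ({"egress", "reputation"}, {"behavioural", "rate_limit"}),
    "Confused agent destructive": (
        {"destructive_action", "operation_risk"},
        {"sensitive_path", "path_match"},
    ),
}


def single_points_of_failure(coverage):
    """Return (filter, attack_class) pairs where disabling that one filter
    leaves the attack class with no primary defence."""
    all_filters = set()
    for primary, secondary in coverage.values():
        all_filters |= primary | secondary
    gaps = []
    for f in sorted(all_filters):
        for attack, (primary, _secondary) in coverage.items():
            if primary and not (primary - {f}):
                gaps.append((f, attack))
    return gaps


def classes_covered_by(coverage, filter_name):
    """Attack classes in which filter_name appears as primary or secondary."""
    return sorted(
        attack
        for attack, (primary, secondary) in coverage.items()
        if filter_name in primary or filter_name in secondary
    )
```

For the table as written, every attack class lists at least two primary filters, so single_points_of_failure(COVERAGE) returns an empty list. Trimming a row's primary set to a single filter makes that filter show up as a gap.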

Defence-in-depth properties

Composition: every call routes through every applicable filter. No single-filter failure compromises the whole pipeline.
Determinism: filters don't depend on the model's behaviour, only on the syscall and session state.
Hard gates: canary detection and capability enforcement bypass scoring to deny unconditionally.
Fixed thresholds: every call is evaluated against the same thresholds (< 3.0 allow, > 8.0 deny). There is no call-count "cold-start" widening, so a destructive or exfiltrating operation issued early in a session is never under-scored.
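The hard gates and fixed thresholds above can be sketched as a single decision function in Python. Assumptions, none of which this page specifies: the applicable filters' outputs have already been combined into one score; the names Verdict, Evaluation, and decide are hypothetical; and MEDIATE is a placeholder label for scores from 3.0 to 8.0 inclusive, which the documented thresholds neither auto-allow nor auto-deny.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    MEDIATE = "mediate"  # placeholder label for the 3.0..8.0 band
    DENY = "deny"


ALLOW_BELOW = 3.0  # documented: score < 3.0 is allowed
DENY_ABOVE = 8.0   # documented: score > 8.0 is denied


@dataclass(frozen=True)
class Evaluation:
    score: float                        # assumed: already aggregated across filters
    canary_hit: bool = False            # hard gate
    capability_violation: bool = False  # hard gate


def decide(ev: Evaluation) -> Verdict:
    # Hard gates bypass scoring and deny unconditionally.
    if ev.canary_hit or ev.capability_violation:
        return Verdict.DENY
    # Fixed thresholds: nothing below reads call count or session age,
    # so early calls get no cold-start leniency.
    if ev.score < ALLOW_BELOW:
        return Verdict.ALLOW
    if ev.score > DENY_ABOVE:
        return Verdict.DENY
    return Verdict.MEDIATE
```

decide takes no call count or session age, which is exactly the fixed-thresholds property: a score of 8.5 is denied whether it comes from the first call in a session or the thousandth, and a canary hit is denied even with a near-zero score.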

What grith doesn't promise

Bypass impossibility: a determined attacker with enough ingenuity in prompt construction may find a sequence of calls that each auto-allow individually but together effect a leak. Mitigations: canary, behavioural, cross-session analytics (Pro).
IPC-delegated authority: enforcement is scoped to the supervised process tree. An action the tool delegates to a more-privileged peer executes outside that tree and is not intercepted per-operation, e.g. docker run -v /etc:…:rw (runs in the docker daemon as root), tmux send-keys into a pre-existing pane, or commands sent to ssh-agent, the D-Bus session bus, or X11. These vectors require pre-existing conditions (docker-group membership, an attached unrestricted multiplexer). grith mitigates the known vectors with container host-escalation scoring (see Destructive action coverage) and audit-only detection of foreign-pts writes, control-socket connects, and authority-delegating spawns, but it is not a substitute for a wall-based sandbox when the agent is fully untrusted. Its model is observe-score-mediate in your real environment, not isolation.
Future-proof signatures: secret_scan and dlp_gate have curated pattern sets. New credential formats won't be detected until patterns for them are added.
Side-channel defence: timing-based DNS exfiltration and slow-and-low patterns are partially covered by the behavioural filter but not formally guaranteed.
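To make the IPC-delegated authority item concrete, here is a rough Python sketch of recognising two of the named vectors from a spawned command's argv: a docker run that mounts a sensitive host path writable, and tmux send-keys. Everything here is an assumption for illustration (the sensitive-path list, the function names, the simplified flag parsing). It shows only how such a spawn can be recognised, not what grith does with it; the item above says grith scores the container case and treats other delegation signals (foreign-pts writes, control-socket connects, authority-delegating spawns) as audit-only.

```python
import shlex

# Hypothetical list of host paths treated as sensitive for this sketch.
SENSITIVE_HOST_PATHS = ("/etc", "/root", "/var/run/docker.sock")


def _is_under(path, prefixes):
    return any(path == p or path.startswith(p + "/") for p in prefixes)


def _docker_mounts(args):
    """Yield each -v/--volume value from the arguments after `docker run`.
    Simplification: does not stop at the image name."""
    it = iter(args)
    for arg in it:
        if arg in ("-v", "--volume"):
            value = next(it, None)
            if value is not None:
                yield value
        elif arg.startswith("--volume="):
            yield arg.split("=", 1)[1]


def delegation_findings(argv):
    """Flag argv patterns that hand an action to a more-privileged peer."""
    findings = []
    if argv[:2] == ["docker", "run"]:
        for spec in _docker_mounts(argv[2:]):
            parts = spec.split(":")
            host = parts[0]
            options = parts[2].split(",") if len(parts) > 2 else []
            writable = "ro" not in options
            if writable and _is_under(host, SENSITIVE_HOST_PATHS):
                findings.append(f"docker daemon mounts host {host} read-write")
    elif argv[:2] == ["tmux", "send-keys"]:
        findings.append("keystrokes sent into an existing tmux pane")
    return findings


if __name__ == "__main__":
    # Example container path is illustrative.
    print(delegation_findings(shlex.split("docker run -v /etc:/mnt/hostetc:rw alpine sh")))
```

Both checks are deliberately narrow. tmux global options placed before the subcommand (such as -L) make the tmux check miss, and container arguments after the image name that happen to look like -v would be misread as mounts. That brittleness is part of why the item above describes mitigation of known vectors rather than a guarantee.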
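The Future-proof signatures limitation follows from how pattern-based scanning works. A minimal Python sketch, assuming regex-style patterns: it uses two well-known public formats (AWS access key IDs and GitHub classic personal access tokens) rather than grith's actual curated set, and scan is a hypothetical name. A credential in a format nobody has written a pattern for passes straight through.

```python
import re

# Two well-known public credential formats, for illustration only;
# this is not grith's curated pattern set.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_classic_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def scan(text, patterns=PATTERNS):
    """Return the sorted names of patterns that match somewhere in text."""
    return sorted(name for name, rx in patterns.items() if rx.search(text))


if __name__ == "__main__":
    new_token = "acme_live_" + "x" * 24  # hypothetical new vendor format
    print(scan(new_token))  # not detected: no pattern exists yet
    extended = {**PATTERNS, "acme_live_key": re.compile(r"\bacme_live_[a-z0-9]{24}\b")}
    print(scan(new_token, extended))  # detected once a pattern is added
```

The gap between a vendor introducing a format and a pattern for it shipping is the window this item warns about.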