Filter overview
The 17 filters grith runs on every supervised call, organised into three phases.
Every call your agent makes is run past 17 filters, organised into three phases. Apart from the hard gates (filters whose score is listed as `DENY`), a filter's only job is to contribute a number to the composite score. The threshold table at the end of the pipeline routes the call to one of three outcomes:
- Auto-allow — composite score below `proxy.auto_allow_threshold` (default `3.0`)
- Quarantine — composite score between the thresholds; routed to the digest queue
- Auto-deny — composite score above `proxy.auto_deny_threshold` (default `8.0`)
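The routing rule above can be sketched as a small function. This is an illustrative sketch, not grith's implementation: the `Outcome` enum and `route` function are hypothetical names, and sending a score that sits exactly on a threshold to quarantine is an assumption, since the text only says "below" and "above".

```python
from enum import Enum


class Outcome(Enum):
    AUTO_ALLOW = "auto-allow"
    QUARANTINE = "quarantine"
    AUTO_DENY = "auto-deny"


def route(
    composite: float,
    auto_allow_threshold: float = 3.0,  # documented default
    auto_deny_threshold: float = 8.0,   # documented default
) -> Outcome:
    """Map a composite score to one of the three outcomes."""
    if composite < auto_allow_threshold:
        return Outcome.AUTO_ALLOW
    if composite > auto_deny_threshold:
        return Outcome.AUTO_DENY
    # Assumption: scores exactly on a threshold fall into quarantine.
    return Outcome.QUARANTINE
```

With the defaults, `route(2.9)` is auto-allowed, `route(5.0)` is quarantined, and `route(8.1)` is auto-denied.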
Filters within a phase run in parallel. Phases run in order: static → pattern → context. The per-phase latency budgets are roughly <1ms, ~3ms, and ~5ms respectively, which adds up to ~10ms for a typical call and ~15ms in the worst case.
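To make the execution order concrete, here is a minimal sketch of "parallel within a phase, sequential across phases". The thread pool, the `Filter` type alias, and `run_pipeline` are assumptions made for illustration; the source does not say how grith schedules filters internally.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical filter signature: takes the call, returns a score contribution.
Filter = Callable[[dict], float]

PHASE_ORDER = ("static", "pattern", "context")


def run_pipeline(call: dict, phases: Dict[str, List[Filter]]) -> List[float]:
    """Run each phase's filters concurrently, one phase after another."""
    scores: List[float] = []
    with ThreadPoolExecutor() as pool:
        for phase in PHASE_ORDER:
            # pool.map submits every filter in the phase at once; consuming
            # its results blocks until the whole phase has finished, so the
            # next phase cannot start early.
            scores.extend(pool.map(lambda f: f(call), phases.get(phase, [])))
    return scores
```

Because each phase waits for its slowest filter, the latency of a phase is the latency of that one filter rather than the sum of all of them.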
The 17 filters
Security filters
| # | Name | Phase | Latency budget | Score | Summary |
|---|---|---|---|---|---|
| 1 | Operation risk scoring | Static | <1ms | +1 to +3 | Assigns a baseline risk score based on the call's operation class (file_read, file_write, shell, network, etc). |
| 2 | Static path matching | Static | <1ms | +2 to +5 | Aho-Corasick scan of paths against curated denylists and allowlists. Hot — runs against every fs operation. |
| 3 | Sensitive path heuristic | Static | <1ms | +1 to +4 | Detects access to known-sensitive files: .env*, id_rsa, id_ed25519, credentials.json, .aws/, .ssh/, .kube/, and similar. |
| 4 | Allowlist / denylist | Static | <1ms | -1 to +3 | User-managed allow and deny rules. Approved entries lower the score; explicit denies raise it. Edited from the digest review UI. |
| 5 | Argument length & structure | Static | <1ms | 0 to +2 | Flags suspicious argument shapes: encoded payloads, oversized strings, shell metacharacters in unexpected positions. |
| 6 | Capability enforcement | Static | <1ms | DENY | Hard gate: enforces per-profile capability grants. Returns DENY when the active supervisor profile does not permit this operation class. |
| 7 | Secret / credential scanning | Pattern | ~3ms | +3 to +5 | Scans content for embedded credentials. 1,600+ regex patterns covering provider API keys, tokens, private keys, and bespoke credential formats. |
| 8 | Command structure analysis | Pattern | ~3ms | +2 to +4 | Parses shell commands. Catches dangerous patterns: piped curl-to-sh, sudo with arbitrary scripts, fork-bombs, base64-encoded payloads in subshells. |
| 9 | Egress policy | Pattern | ~3ms | -1 to +5 | Network destination policy. Allowlisted hosts lower the score; denylisted or unknown hosts raise it. Works together with the destination reputation filter (#12) for adaptive trust. |
| 10 | DLP gate | Pattern | ~3ms | +3 to +5 | Scans outbound network payloads for sensitive data leaving the box: credentials, PII, large file contents bundled into POST bodies. |
| 11 | Canary secret detection | Pattern | ~3ms | DENY | Hard DENY. Detects registered canary tokens (fake secrets seeded in your repo) reaching any outbound sink — definitive proof of exfiltration. |
| 12 | Destination reputation | Context | ~5ms | -1 to +4 | Per-destination trust accumulated over time. Established trusted hosts get a discount; suspicious or unfamiliar ones get a markup. |
| 13 | Behavioural anomaly | Context | ~5ms | +1 to +3 | Compares the current call against the session's historical baseline (default 200-call minimum). Sudden bursts, unusual destinations, or out-of-pattern operations contribute score. |
| 14 | Taint tracking | Context | ~5ms | 0 to +5 | Tracks data flow from sensitive sources (e.g. a .env read) through subsequent operations. Network calls or shell execs after a sensitive read inherit taint. |
| 15 | Session containment | Context | ~5ms | 0 to +5 | Enforces per-session isolation zones. A session bound to project A cannot reach project B's files or network without explicit cross-zone approval. |
| 16 | Rate limiting | Pattern | ~3ms | DENY | Per-minute caps on file writes (default 30), shell execs (20), and network requests (60). Bursts trigger queue; sustained violations trigger DENY. |
| 17 | Semantic analysis | Pattern | ~3ms | — | Stub for v0.1. Local embedding-based intent classification. Disabled by default; full implementation planned for v1.5. |
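As one concrete example of a static filter, the sensitive path heuristic (filter 3) can be approximated with glob matching on path components. This is a rough sketch only: the pattern lists contain just the names given in the table (the real list includes "and similar" entries not shown here), and `is_sensitive` is a hypothetical helper, not a grith API.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Only the examples named in the table; the real list is longer.
SENSITIVE_FILES = (".env*", "id_rsa", "id_ed25519", "credentials.json")
SENSITIVE_DIRS = (".aws", ".ssh", ".kube")


def is_sensitive(path: str) -> bool:
    """Return True if the path names, or lives under, a known-sensitive entry."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    # The final component is checked against the file-name globs.
    if any(fnmatchcase(parts[-1], pattern) for pattern in SENSITIVE_FILES):
        return True
    # Any component may be a sensitive directory, including the last one.
    return any(part in SENSITIVE_DIRS for part in parts)
```

For instance, `is_sensitive("/home/u/project/.env.local")` and `is_sensitive("/home/u/.ssh/config")` both return `True`, while `is_sensitive("src/main.py")` returns `False`. How a match translates into a score within the +1 to +4 range is not specified in the table, so the sketch stops at detection.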
Reading the table
- Phase determines when the filter runs. Static filters do cheap structural checks against pre-built indexes. Pattern filters do regex / parser work. Context filters look up state from the running session and historical baseline.
- Latency budget is the wall-clock target for the phase, not the individual filter. Filters within a phase are concurrent.
- Score is the inclusive range a filter can contribute. Negative values lower the composite score (i.e. argue for allow). A value of `DENY` means the filter is a hard gate — when it fires, the call is denied regardless of any other filter.
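The difference between a numeric contribution and a hard gate can be sketched as follows. The `DENY` sentinel and `combine` function are hypothetical, and the plain sum is a deliberate simplification: the real scoring engine applies weights and ceilings rather than adding scores naively.

```python
from typing import Iterable, Union

DENY = "DENY"  # sentinel mirroring the DENY value in the Score column


def combine(contributions: Iterable[Union[float, str]]) -> Union[float, str]:
    """Sum numeric contributions unless any filter fired a hard gate."""
    total = 0.0
    for contribution in contributions:
        if contribution == DENY:
            # A hard gate wins regardless of every other filter's score.
            return DENY
        total += contribution
    return total
```

Here `combine([1.0, 4.0, -1.0])` yields `4.0`, while `combine([-1.0, DENY, 0.0])` yields `DENY` even though the numeric contributions alone would have argued for allow.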
What's not visible in the table
- The scoring engine doesn't sum scores naively. It applies weights and ceilings from `[reputation]` (e.g. `ceiling_filter_threshold = 5.0` for capped contributions). See Composite scoring.
- The reputation system can subtract from the composite when the destination and call shape have been observed-and-approved many times — see Adaptive reputation.
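- Canary detection and capability enforcement are hard gates; they don't contribute score, they short-circuit to DENY. Rate limiting is also listed as `DENY` in the table, but it denies only on sustained violations; bursts are queued first.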
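The first point above, weights and ceilings instead of a naive sum, might look something like the sketch below. Everything about the mechanics is an assumption: the source only names `ceiling_filter_threshold = 5.0`, so treating it as a per-filter cap applied after weighting, the `weights` mapping, and the `composite` function are all hypothetical. Composite scoring documents the actual rules.

```python
from typing import Dict


def composite(
    contributions: Dict[str, float],
    weights: Dict[str, float],
    ceiling: float = 5.0,  # assumed meaning of ceiling_filter_threshold
) -> float:
    """Weighted sum in which no single filter contributes more than the ceiling."""
    total = 0.0
    for name, score in contributions.items():
        weighted = score * weights.get(name, 1.0)  # unlisted filters weigh 1.0
        # Cap the upside only; negative (allow-leaning) scores pass through.
        total += min(weighted, ceiling)
    return total
```

Under these assumptions, `composite({"dlp": 4.0, "egress": 3.0}, {"dlp": 2.0})` caps the weighted DLP score of 8.0 at 5.0 and returns `8.0` overall, so a single noisy filter cannot push a call past the auto-deny threshold on its own.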
See also
- Three-phase pipeline — why the phases are ordered this way and what each contributes
- Scoring rules — the math behind the composite
- Tuning scoring thresholds — when and how to move the default `3.0 / 8.0` thresholds
Last updated: 2026-05-14