Adaptive reputation
How grith gets quieter over time by learning which call shapes you trust.
A grith install on day one is loud. The filters know what looks suspicious but not what's typical for your setup. A week in, you've approved a hundred calls that the filters scored as ambiguous. The reputation system folds those approvals into a per-shape trust score that quietens the digest going forward.
What "reputation" tracks
The trust table keys are tuples of:
- Operation (file_read, network, shell, …)
- Destination shape (path prefix, domain, command head)
- Profile context (which profile was active)
For each key, the table holds:
- observations: how many times this shape has been seen.
- denials: how many times it was denied (manually or auto).
- trust: a smoothed score in [0, 1], roughly "how often this shape was approved".
- last_seen: used for decay.
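The key and value described above can be pictured as a small record type. This is only an illustrative sketch: the class names `ShapeKey` and `TrustEntry`, the neutral starting trust of 0.5, and the use of a Unix timestamp for `last_seen` are assumptions, not grith's actual internal layout.

```python
from dataclasses import dataclass
from typing import NamedTuple


class ShapeKey(NamedTuple):
    """Hypothetical trust-table key: one entry per call shape."""
    operation: str          # e.g. "file_read", "network", "shell"
    destination_shape: str  # path prefix, domain, or command head
    profile: str            # profile that was active for the call


@dataclass
class TrustEntry:
    """Hypothetical trust-table value."""
    observations: int = 0   # times this shape has been seen
    denials: int = 0        # times it was denied (manually or auto)
    trust: float = 0.5      # smoothed score in [0, 1]; 0.5 is an assumed neutral start
    last_seen: float = 0.0  # assumed Unix timestamp, used for decay


# The table itself is then a mapping from shape to entry.
table: dict[ShapeKey, TrustEntry] = {}
key = ShapeKey("network", "api.example.com", "default")
entry = table.setdefault(key, TrustEntry())
entry.observations += 1
```

Because the profile is part of the key, the same destination seen under two profiles would accumulate trust separately.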
A new call's filter score gets adjusted by the trust value: if trust ≥ reputation.auto_allow_trust (default 0.92) and observations ≥ reputation.auto_allow_min_observations (default 8), the composite gets a reduction of up to reputation.max_score_reduction (default 4.0).
The cap matters. A high-trust shape can still land in the digest if a different filter (DLP, canary, taint) lights it up. Trust is a discount, not a bypass.
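As a rough sketch of the gating and the cap, the function below applies a discount only when both thresholds are met. The thresholds and cap use the defaults quoted above; the function name and the linear scaling of the reduction between the trust threshold and 1.0 are assumptions, since this page only says the reduction is "up to" the cap.

```python
def apply_trust_discount(
    composite: float,
    trust: float,
    observations: int,
    auto_allow_trust: float = 0.92,
    auto_allow_min_observations: int = 8,
    max_score_reduction: float = 4.0,
) -> float:
    """Return the composite score after an (assumed) trust discount."""
    # Both gates must pass; otherwise the score is left untouched.
    if trust < auto_allow_trust or observations < auto_allow_min_observations:
        return composite
    # Assumption: the reduction grows linearly from 0 at the threshold
    # to the full cap at trust == 1.0.
    span = 1.0 - auto_allow_trust
    fraction = (trust - auto_allow_trust) / span if span > 0 else 1.0
    reduction = min(max_score_reduction, max_score_reduction * fraction)
    return composite - reduction
```

With these defaults, a fully trusted shape whose other filters contribute a composite of 12.0 still ends at 8.0, which is the sense in which trust is a discount rather than a bypass.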
How trust changes
- Approve in the digest → trust nudges up.
- Approve + learn → trust jumps faster; this is the deliberate "yes, please remember this" signal.
- Deny → trust drops, weighted by reputation.deny_weight (default 3.0, meaning denials are worth 3× as much as approvals in the smoothing).
- Auto-allow (the call was already auto-allowed by score) → trust nudges up very slightly. The system shouldn't be calibrated by easy approvals.
- Auto-deny → trust drops.
- Time → trust decays toward neutral via reputation.decay_lambda (default 0.98, applied per save cycle, default 5 minutes). Trust earned six months ago matters less than trust earned yesterday.
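One way to picture these rules is an exponential-smoothing update, sketched below. The exact smoothing grith uses is not described on this page, so the update form, the `approve_rate` and `learn_rate` values, the event names, and treating auto-deny like a manual deny are all assumptions; only `deny_weight` (3.0) and the tiny auto-allow increment (about 0.001) come from the documented settings.

```python
def update_trust(
    trust: float,
    event: str,
    deny_weight: float = 3.0,                  # documented default
    auto_allow_trust_increment: float = 0.001, # documented as roughly 0.001
    approve_rate: float = 0.05,                # hypothetical smoothing rate
    learn_rate: float = 0.25,                  # hypothetical "approve + learn" rate
) -> float:
    """Move trust toward 1.0 on approvals and toward 0.0 on denials."""
    if event == "auto_allow":
        # Easy approvals barely move the score.
        return min(1.0, trust + auto_allow_trust_increment)
    if event == "approve":
        step, target = approve_rate, 1.0
    elif event == "approve_learn":
        step, target = learn_rate, 1.0
    elif event in ("deny", "auto_deny"):
        # Denials count deny_weight times as much as an approval.
        step, target = min(1.0, approve_rate * deny_weight), 0.0
    else:
        raise ValueError(f"unknown event: {event}")
    return trust + step * (target - trust)
```

Starting from 0.5 under these assumed rates, a single approval moves trust to 0.525 while a single denial moves it to 0.425, so the drop is three times the gain.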
Why decay
Without decay, a one-time bad approval would mark a destination "trusted" forever.
Decay lets old signals fade so the trust table reflects recent behaviour, not your
worst day from a year ago. The default 0.98 per save cycle works out to a half-life
of roughly 30 days under typical usage; agents that make calls more frequently decay
faster in elapsed time.
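The relationship between the decay factor and the half-life can be worked out directly. The sketch below assumes "neutral" means 0.5 and that each decaying cycle multiplies the distance from neutral by `decay_lambda`; both function names are illustrative.

```python
import math


def decay_toward_neutral(trust: float, decay_lambda: float = 0.98,
                         neutral: float = 0.5) -> float:
    """Shrink the distance between trust and neutral by one decay step."""
    return neutral + (trust - neutral) * decay_lambda


def half_life_cycles(decay_lambda: float = 0.98) -> float:
    """Number of decay steps after which the distance from neutral halves."""
    return math.log(0.5) / math.log(decay_lambda)


cycles = half_life_cycles()  # a little over 34 decay steps for 0.98
```

At 0.98 the distance from neutral halves after about 34 decaying cycles. Mapping those cycles onto the roughly 30 days quoted above depends on how many save cycles actually apply decay under typical usage, which this page doesn't specify.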
What it can't do
The reputation system cannot promote a call past a hard gate:
- A canary detection denies, full stop, regardless of trust.
- A capability deny is enforced regardless of trust.
- The max_score_reduction cap ensures even a perfectly-trusted shape can't push the composite below 0 in the presence of other strong contributions.
The reputation system also doesn't share state between sessions started by different users (in multi-user team setups) by default — see Team sync for the Pro flow that does.
Adversarial robustness
A natural worry: what if the agent (or attacker controlling it) does a hundred benign reads to build up reputation, then sneaks one bad call through? Two mitigations:
- Only human approvals move the needle fast. Auto-allow contributions are intentionally tiny (reputation.auto_allow_trust_increment ≈ 0.001). An attacker would need years of pure auto-allow to flip a shape from neutral to fully trusted.
- Strong filters bypass reputation. Even a fully-trusted destination gets the secret scan, DLP gate, canary detection, and taint inheritance done on every call. Reputation only knocks the composite down; it doesn't make the filter blind.
The system is designed so that the only way to teach grith something significantly wrong is to keep explicitly approving wrong things in the digest. Which is a different problem.
Inspecting the trust table
grith reputation show
Lists shapes with their trust, observation count, and last-seen time. Sorted by
recency by default; --sort trust shows your most-trusted shapes (sometimes
surprising).
grith reputation reset
Wipes the table. Useful when starting fresh — for example, after switching agents or after a security audit.
Persistence
The trust table is held in memory and flushed to
~/.cache/grith/reputation/<profile>.bin every reputation.save_interval_seconds
(default 300s). Each flush is an atomic write-then-rename, so a crash mid-flush
doesn't corrupt the file. The file is binary CBOR and is safe to delete if it does get
corrupted: you'll lose accumulated trust, but nothing in the audit log or pending digest.
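For readers unfamiliar with the write-then-rename pattern, here is a minimal generic sketch of it. It is not grith's code: the function name is hypothetical and the payload is treated as opaque bytes rather than real CBOR.

```python
import os
import tempfile


def atomic_write(path: str, payload: bytes) -> None:
    """Write payload to path so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomically swap the new file into place
    except BaseException:
        # On any failure, remove the partial temp file and leave the old file alone.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the process dies before `os.replace`, the previous file is still intact and only a stray temporary file is left behind.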
See also
- Composite scoring — where the trust discount is applied
- The quarantine digest — where approvals are recorded
- grith reputation — CLI for inspecting and resetting