Performance & tuning

Measuring grith overhead and squeezing latency out of the pipeline.

grith is designed for low overhead: a typical call exits the pipeline in 8–12ms wall-clock (median). This page covers how to measure that and which knobs to turn when it matters.

Targets

Metric	Target
Per-call pipeline latency (median)	8–12ms
Per-call pipeline latency (p99)	15ms
Supervisor overhead per syscall (intercepted)	~50–100µs
Supervisor overhead per syscall (uninteresting, seccomp-fast-path)	under 1µs
Daemon idle RSS	~30MB
Daemon busy RSS	80–150MB

Measuring

Built-in: GET /proxy/status/full (IPC) returns per-filter timing histograms.

For p99 over time:

grith audit export --since 1d --format json |
  jq '[.[] | .filter_timing_ms] | sort | .[(length * 99 / 100 | floor)]'
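The same percentile can be computed offline and compared against the targets above. This is a minimal sketch, assuming the export is a JSON array of objects whose filter_timing_ms field is a single number, as the jq filter implies. The function names and sample values are illustrative and not part of grith.

```python
import json
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(export_json):
    """Return (median, p99) of filter_timing_ms from an exported JSON array."""
    events = json.loads(export_json)
    # Assumption: filter_timing_ms is numeric; events without a numeric value are skipped.
    timings = [
        e["filter_timing_ms"]
        for e in events
        if isinstance(e.get("filter_timing_ms"), (int, float))
    ]
    return percentile(timings, 50), percentile(timings, 99)


if __name__ == "__main__":
    # Made-up sample data standing in for a real export.
    sample = json.dumps([{"filter_timing_ms": t} for t in (9, 10, 11, 8, 30)])
    median, p99 = summarize(sample)
    print(f"median={median}ms p99={p99}ms")
```

With only a handful of samples the p99 is simply the slowest call, so a day of traffic gives a far more meaningful number than a short session.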

For per-filter breakdown, the trace-mode export captures filter timing per call:

grith exec --trace-syscalls-jsonl /tmp/trace.jsonl -- claude
# ... use the agent ...
grith profile audit --trace /tmp/trace.jsonl --report-timing

Common bottlenecks

1. Secret scanner on large payloads

Symptom: p99 latency exceeds 20ms on calls with large content.

Fix:

[proxy.filters.secret_scan]
max_scan_bytes = 524288     # 512KB cap (default 1MB)

Trade-off: payloads larger than the cap are sampled rather than scanned in full. Keep the cap higher than your typical legitimate payload size, so that normal traffic is always scanned in full and sample mode does not silently help exfiltration.
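Before lowering the cap, it helps to check how much of your real traffic would fall into sample mode. The sketch below assumes you already have a list of payload sizes in bytes (for example, collected from your own logs). The function name and the sizes are hypothetical.

```python
def sampled_fraction(payload_sizes, max_scan_bytes):
    """Fraction of payloads larger than the cap, i.e. sampled instead of fully scanned."""
    if not payload_sizes:
        return 0.0
    over = sum(1 for size in payload_sizes if size > max_scan_bytes)
    return over / len(payload_sizes)


if __name__ == "__main__":
    sizes = [4_096, 200_000, 600_000, 900_000]  # made-up payload sizes in bytes
    for cap in (524_288, 1_048_576):
        print(f"cap={cap}: {sampled_fraction(sizes, cap):.0%} of payloads sampled")
```

A cap that leaves a large share of legitimate payloads in sample mode trades scanning coverage for latency, which is the trade-off described above.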

2. Behavioural baseline computation

Symptom: latency spikes on certain calls after a long session.

The behavioural filter rebuilds its smoothing window periodically. Tune:

[proxy.filters.behavioural]
recompute_interval_seconds = 60     # less frequent = lower per-call cost

3. Audit log fsync

Symptom: high disk wait, even with fast SSD.

Default: fsync per audit event. For lower disk IO:

[general]
audit_fsync = "batch"        # "per-event" (default), "batch", "none"
audit_fsync_batch_ms = 100

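"none" is unsafe: events can be lost on a crash. "batch" is the right choice for high-throughput workloads; the events at risk should be limited to those written since the last fsync, roughly one audit_fsync_batch_ms window.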
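To make the trade-off concrete, here is a minimal sketch of time-based fsync batching. It illustrates the general technique only and is not grith's implementation. The class name, the injectable clock, and the fsync counter are assumptions made for the example.

```python
import os
import tempfile
import time


class BatchedAuditLog:
    """Append-only log that calls fsync at most once per batch window."""

    def __init__(self, path, batch_ms=100, clock=time.monotonic):
        self._file = open(path, "ab")
        self._batch_s = batch_ms / 1000
        self._clock = clock
        self._last_sync = clock()
        self.fsync_count = 0

    def append(self, line):
        self._file.write(line.encode("utf-8") + b"\n")
        # Events written since the last fsync can be lost on a crash.
        if self._clock() - self._last_sync >= self._batch_s:
            self.sync()

    def sync(self):
        self._file.flush()             # push Python's buffer to the OS
        os.fsync(self._file.fileno())  # ask the OS to persist it to disk
        self._last_sync = self._clock()
        self.fsync_count += 1

    def close(self):
        self.sync()
        self._file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = BatchedAuditLog(os.path.join(tmp, "audit.log"), batch_ms=100)
        for i in range(1000):
            log.append(f"event {i}")
        log.close()
        print(f"1000 events, {log.fsync_count} fsync calls")
```

A per-event policy would call sync() inside every append(), which is where the disk wait in the symptom comes from. Batching amortizes that cost over every event that lands within the window.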

4. Reputation table size

Symptom: daemon RSS climbs to 500MB+.

The reputation table accumulates without bound by default. For very long-running daemons:

[reputation]
max_entries = 50000          # cap; oldest decay first
gc_interval_seconds = 3600
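The effect of a cap can be pictured as a bounded table that drops its oldest entry once full. The sketch below is an assumption about the general shape, not grith's data structure. It evicts the least recently updated entry, whereas the actual decay policy is only described above as "oldest decay first".

```python
from collections import OrderedDict


class BoundedTable:
    """Key/score table that evicts the least recently updated entry once full."""

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max = max_entries
        self._entries = OrderedDict()

    def update(self, key, score):
        self._entries[key] = score
        self._entries.move_to_end(key)         # most recently updated goes last
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)  # drop the oldest entry

    def __contains__(self, key):
        return key in self._entries

    def __len__(self):
        return len(self._entries)


if __name__ == "__main__":
    table = BoundedTable(max_entries=2)
    for host in ("a.example", "b.example", "a.example", "c.example"):
        table.update(host, score=1.0)
    print(len(table), "b.example" in table)
```

Memory then scales with max_entries rather than with daemon uptime.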

Noise reduction knobs

These reduce the number of syscalls the pipeline has to handle (not the latency per call), which is often the bigger win:

[supervisor.noise_reduction]
ignore_read_only = true              # skip filter pipeline for reads on open fds
batch_rapid_reads = true             # coalesce rapid reads in same fd
batch_window_ms = 50                 # batching window

Both ignore_read_only and batch_rapid_reads are on by default. Disable them only for forensic recording.
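To get a feel for how much a batching window can shrink the event count, the sketch below coalesces reads on the same fd that arrive within window_ms of the first read in a batch. Whether grith anchors the window at the first read or slides it is not stated here, so treat the anchoring, the function name, and the event format as assumptions.

```python
def coalesce_reads(events, window_ms=50):
    """Group (timestamp_ms, fd) reads into batches of (fd, start_ms, count).

    A read joins the open batch for its fd if it arrives within window_ms
    of that batch's first read; otherwise it starts a new batch.
    """
    batches = []     # each item is [fd, start_ms, count]
    open_batch = {}  # fd -> index of its most recent batch
    for ts, fd in sorted(events):
        idx = open_batch.get(fd)
        if idx is not None and ts - batches[idx][1] <= window_ms:
            batches[idx][2] += 1
        else:
            open_batch[fd] = len(batches)
            batches.append([fd, ts, 1])
    return [tuple(b) for b in batches]


if __name__ == "__main__":
    reads = [(0, 3), (10, 3), (20, 4), (45, 3), (120, 3)]  # made-up trace
    batches = coalesce_reads(reads)
    print(f"{len(reads)} reads -> {len(batches)} batches: {batches}")
```

In this made-up trace, five reads collapse into three batches. A tight read loop on a single fd collapses much further, which is also why coalescing can hide behaviour you would want to see in a forensic recording.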

When grith is "slow"

If an agent feels noticeably slow under grith:

Check GET /proxy/status/full — is the filter pipeline actually the bottleneck, or is it the model latency?
Check the audit log — is the agent making 10× the syscalls it would without grith? Filters can't help with sheer syscall count.
Try noise-reduction off vs on — sometimes coalescing reads hides pathological agent behaviour worth seeing.

The cost model of grith is per-syscall, not per-second. Idle agents pay nothing. Busy agents pay in proportion to their syscall rate.
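As a rough worked example of that cost model, the sketch below multiplies syscall rates by the per-syscall supervisor overheads from the Targets table (upper bounds of 100µs intercepted, 1µs fast path). The function name and the rates are hypothetical. It covers supervisor overhead only, not the per-call pipeline latency.

```python
def overhead_ms_per_second(intercepted_per_s, fast_path_per_s,
                           intercepted_us=100.0, fast_path_us=1.0):
    """Estimated supervisor overhead, in ms per second of wall-clock time."""
    total_us = intercepted_per_s * intercepted_us + fast_path_per_s * fast_path_us
    return total_us / 1000.0


if __name__ == "__main__":
    # Idle agent: no syscalls, no cost.
    print(overhead_ms_per_second(0, 0))
    # Busy agent: 200 intercepted and 5000 fast-path syscalls per second.
    print(overhead_ms_per_second(200, 5000))
```

Under these assumptions the busy agent spends about 25ms of every second in the supervisor, and the idle agent spends none.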