7. Secret / credential scanning

1,620+ regex patterns over outbound content. The credential leak filter.


Phase	Pattern
Score range	+3 to +5
Module	`crates/grith-proxy/src/filters/secret_scan.rs`
Config file	`config/filters/secrets.toml`

The big one. Sixteen hundred-odd compiled regexes scan outbound content for credential shapes — API keys, tokens, private keys, JWTs, anything that looks like "this is the kind of thing you don't want to send out by accident".

What it catches

The pattern set is sourced from a curated upstream (similar in shape to the gitleaks and TruffleHog rule sets, but tuned for the false-positive characteristics of an agent workload). Categories:

Cloud providers — AWS, Azure, GCP, Vercel, Cloudflare, Heroku, …
CI/CD — GitHub, GitLab, CircleCI, Buildkite tokens.
SaaS APIs — Stripe, Slack, Twilio, SendGrid, Linear, Notion.
Databases — MongoDB connection strings, Postgres URIs with passwords, Redis URLs with auth.
Generic credential shapes — RSA/Ed25519 private key headers, JWTs, basic-auth URLs, S3 presigned URLs.
Cryptocurrency — wallet seeds, private keys (because exfil patterns are similar).

Each pattern has:

A score contribution (+3.0 to +5.0 based on confidence).
A label that surfaces in the audit log (e.g. aws_secret_access_key, slack_xoxb).
An optional entropy floor: the matched text must be sufficiently random (high-entropy) to count, which cuts false positives on generic-looking shapes.
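A minimal sketch of how one rule's score and entropy floor might interact. The `SecretPattern` struct, its field names, and the `score_match` helper are hypothetical, not grith's actual types. The page also does not say which entropy measure the floor uses; this assumes Shannon entropy in bits per character over the matched text, which is what gitleaks-style rules commonly use.

```rust
use std::collections::HashMap;

/// HYPOTHETICAL shape of one rule; field names are illustrative only.
#[derive(Debug, Clone)]
pub struct SecretPattern {
    /// Label surfaced in the audit log, e.g. "aws_secret_access_key".
    pub label: String,
    /// Score contribution, expected to sit in 3.0..=5.0.
    pub score: f64,
    /// Optional minimum entropy (bits per character) a match must reach.
    pub entropy_floor: Option<f64>,
}

/// Shannon entropy of `s` in bits per character: -sum(p * log2(p)) over character frequencies.
pub fn shannon_entropy(s: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    let mut total = 0usize;
    for c in s.chars() {
        *counts.entry(c).or_insert(0) += 1;
        total += 1;
    }
    if total == 0 {
        return 0.0;
    }
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

impl SecretPattern {
    /// Returns the score contribution if `matched` (the text the regex matched) clears the
    /// entropy floor, or None if the match is rejected as a likely false positive.
    pub fn score_match(&self, matched: &str) -> Option<f64> {
        match self.entropy_floor {
            Some(floor) if shannon_entropy(matched) < floor => None,
            _ => Some(self.score),
        }
    }
}
```

With a floor of 2.5 bits, a repetitive match such as `aaaaaaaa` (0 bits) is dropped, while `aB3xQ9zK` (eight distinct characters, 3 bits) keeps its score.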

When it fires

The scanner runs when a call carries content that could leave the box:

Network sends (sendto, send, sendmsg, write to a socket fd).
Shell command args (the args might be smuggling credentials).
File writes where the destination is outside the session's containment zone.

It does not run on reads. A file_read of a .env is caught by sensitive_path / path_match, not secret_scan — there's nothing to scan yet at read time. Once that data flows to a sink, secret_scan catches the content if it goes outbound.
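The trigger rules can be sketched as a function from an intercepted call to the bytes that would be scanned. The `Call` enum, its variants, and `scan_input` are hypothetical simplifications (real syscall interception carries far more context), and joining shell arguments with spaces is an assumption made only so they can be scanned as one string.

```rust
/// HYPOTHETICAL, simplified view of an intercepted call; grith's real event types will differ.
#[derive(Debug)]
pub enum Call<'a> {
    /// sendto / send / sendmsg, or write() on a socket fd.
    NetworkSend { payload: &'a [u8] },
    /// A shell command; its args might be smuggling credentials.
    ShellExec { args: &'a [&'a str] },
    /// A file write, tagged with whether the destination is inside the session's containment zone.
    FileWrite { payload: &'a [u8], inside_containment: bool },
    /// A file read: nothing has left the box yet, so there is nothing to scan.
    FileRead { path: &'a str },
}

/// Returns the bytes secret_scan would inspect for this call, or None when the filter does not run.
pub fn scan_input(call: &Call) -> Option<Vec<u8>> {
    match call {
        Call::NetworkSend { payload } => Some(payload.to_vec()),
        // Assumption: args are joined with spaces and scanned as one string.
        Call::ShellExec { args } => Some(args.join(" ").into_bytes()),
        Call::FileWrite { payload, inside_containment: false } => Some(payload.to_vec()),
        // Writes that stay inside the containment zone are skipped, and so are reads;
        // sensitive reads are the job of sensitive_path / path_match.
        Call::FileWrite { inside_containment: true, .. } | Call::FileRead { .. } => None,
    }
}
```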

Performance

The scanner compiles the rule set into a single regex automaton at startup (`regex::RegexSet`), so each piece of content is scanned in one pass rather than once per pattern. Matching time grows linearly with content length; adding patterns enlarges the automaton but does not add extra passes. For typical syscall content (a few KB), a scan takes roughly 0.5–2 ms.

For very large bodies, the scanner caps the scan length at 1 MB by default; content beyond the cap is sampled rather than scanned in full. The cap is configurable via `max_scan_bytes` in the `[proxy.filters.secret_scan]` table.
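The page does not say how content past the cap is sampled, so the sketch below assumes one simple strategy purely for illustration: scan the first `max_scan_bytes` bytes in full, then take a fixed number of evenly spaced windows from the remainder. `select_scan_windows`, `window`, and `samples` are hypothetical names and parameters, not part of grith's config.

```rust
/// HYPOTHETICAL sampling strategy: scan the first `max_scan_bytes` in full, then take
/// `samples` evenly spaced windows of `window` bytes from whatever lies beyond the cap.
pub fn select_scan_windows(body: &[u8], max_scan_bytes: usize, window: usize, samples: usize) -> Vec<&[u8]> {
    // Bodies within the cap are scanned whole.
    if body.len() <= max_scan_bytes {
        return vec![body];
    }
    let mut windows = vec![&body[..max_scan_bytes]];
    let rest = &body[max_scan_bytes..];
    if window == 0 || samples == 0 {
        return windows;
    }
    // Spread sample start offsets evenly over the range where a full window still fits.
    let span = rest.len().saturating_sub(window);
    for i in 0..samples {
        let start = if samples == 1 { 0 } else { span * i / (samples - 1) };
        let end = (start + window).min(rest.len());
        windows.push(&rest[start..end]);
    }
    windows
}
```

A credential that straddles two windows, or sits entirely between them, is missed under this strategy; that is the trade-off any sampling scheme makes for bounded scan time.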

Tuning

Most users never need to touch this filter — the defaults are tuned for low false positives. Reasons you might:

A pattern fires on legitimate content — disable the specific pattern:
```
[proxy.filters.secret_scan]
disabled_patterns = ["generic_jwt"]
```

Your team uses an internal credential format — add a custom pattern:

```
[[proxy.filters.secret_scan.custom_patterns]]
name = "acme_internal"
regex = "ACME-[A-Z0-9]{32}"
score = 4.0
```

You want everything paranoid — flip on the aggressive profile, which re-enables ~200 high-false-positive patterns that are off by default.
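The three tuning knobs compose into one active rule set. A minimal sketch, assuming (the page does not say) that `disabled_patterns` applies to custom patterns as well as built-ins and takes precedence over the aggressive profile. `Rule`, its `high_false_positive` flag, and `active_rules` are hypothetical names, not grith's API.

```rust
use std::collections::HashSet;

/// HYPOTHETICAL rule record; `high_false_positive` marks the rules that are off by default
/// and only re-enabled by the aggressive profile.
#[derive(Debug, Clone, PartialEq)]
pub struct Rule {
    pub name: String,
    pub regex: String,
    pub score: f64,
    pub high_false_positive: bool,
}

/// Builds the active rule list from the built-in set plus the aggressive profile,
/// `disabled_patterns`, and `custom_patterns`.
pub fn active_rules(builtin: &[Rule], disabled: &[&str], custom: &[Rule], aggressive: bool) -> Vec<Rule> {
    let disabled: HashSet<&str> = disabled.iter().copied().collect();
    builtin
        .iter()
        // High-false-positive built-ins are only kept under the aggressive profile.
        .filter(|r| aggressive || !r.high_false_positive)
        .chain(custom.iter())
        // Assumption: an explicit disable wins over everything else.
        .filter(|r| !disabled.contains(r.name.as_str()))
        .cloned()
        .collect()
}
```

With the defaults, disabling `generic_jwt` and adding `acme_internal` leaves the default-on built-ins minus `generic_jwt`, plus `acme_internal`; the high-false-positive rules stay out unless the aggressive profile is on.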
