30/05/2026
We benchmarked PromptPurify against 4 open-source prompt-injection guardrails. The size gap is embarrassing. Here's the comparison:
Most teams evaluating guardrails compare recall numbers. That's the wrong first question. The right first question: will your team actually ship it?
A model that needs a GPU, a sidecar, and an API budget never makes it to production. It stays in the eval spreadsheet.
Good guardrail vs Bad guardrail:
- 14 MB vs 180 MB to 7 GB
- CPU, in-process vs GPU recommended
- Single-digit ms vs network round-trip
- $0/call vs compute cost
- Ships inside your app vs runs next to it
Same inputs. Same scoring code. Same eval slice. Reproducible in 2 commands on your laptop.
1. Threshold-neutral methodology: Every model evaluated at its own published default. No model gets a home-field advantage.
2. Held-out eval: Hash-bucketed splits. The evaluation slice was never seen by our model at training time.
3. Reproduce it yourself: No cloud credits. No GPU. CPU only, 3-5 minutes.
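For readers who want to see what points 1 and 2 can look like in practice, here is a minimal sketch. It is not the benchmark's actual code: the function names, the 10-bucket layout, the choice of SHA-256, and the model names and thresholds are all assumptions made for illustration.

```python
import hashlib


def bucket(text: str, n_buckets: int = 10) -> int:
    """Map a prompt to a stable bucket from the SHA-256 of its UTF-8 bytes.

    The same text always lands in the same bucket, on any machine.
    (SHA-256 and 10 buckets are illustrative assumptions.)
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def split_of(text: str, eval_buckets: frozenset = frozenset({0})) -> str:
    """Return "eval" for prompts in the held-out buckets, else "train"."""
    return "eval" if bucket(text) in eval_buckets else "train"


def recall_at_threshold(scores, labels, threshold):
    """Recall on injection examples (label 1), flagging when score >= threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not positives:
        return 0.0
    return sum(s >= threshold for s in positives) / len(positives)


# Hypothetical models, each scored at its OWN default threshold
# rather than at one threshold tuned in favor of a single model.
default_thresholds = {"model_a": 0.5, "model_b": 0.9}
labels = [1, 1, 0]
scores = {
    "model_a": [0.9, 0.4, 0.2],
    "model_b": [0.95, 0.92, 0.1],
}
recalls = {
    name: recall_at_threshold(scores[name], labels, default_thresholds[name])
    for name in default_thresholds
}
```

Because the split depends only on the text itself, anyone re-running the split gets the same held-out slice, and a prompt cannot drift from train to eval between runs.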
The guardrail that fits in your stack is the one that actually protects your users.
https://lnkd.in/gVesqZR5
Star it. Run the bench. Tell us where it fails.