Promptfoo

Also known as: promptfoo-cli, promptfoo red team, promptfoo eval

Promptfoo
An open-source CLI and library for evaluating, testing, and red-teaming LLM applications, scanning for vulnerabilities like prompt injection, jailbreaks, and PII leaks across configurable test suites.

Promptfoo is an open-source command-line tool and library that automates red-teaming and security testing of LLM applications by scanning for vulnerabilities like prompt injection, jailbreaks, and data leaks.

What It Is

If you’re building an application on top of an LLM, you face a question that manual testing can’t answer at scale: how does this system behave when someone actively tries to break it? The attack surface for language models is wide — prompt injections, jailbreak attempts, extraction of personally identifiable information (PII), toxic outputs — and checking each scenario by hand takes more time than most teams have. Promptfoo automates that process, giving teams a structured way to probe their LLM applications for known weaknesses before real users find them.

Think of it as a structured fuzzer for language models. A traditional software fuzzer throws random inputs at a program to find crashes. Promptfoo works on the same principle, but instead of random data, it generates adversarial prompts — carefully crafted inputs designed to trick the model into violating its safety boundaries. It then scores each response against pass/fail criteria you define, producing a clear picture of where your defenses hold and where they break.

The workflow is configuration-driven. You write a YAML file (promptfooconfig.yaml) that specifies the model or API endpoint under test, the prompt templates to run, and the assertions that define acceptable output. Running promptfoo eval executes those test cases and produces a report showing what passed and what failed. For red-teaming specifically, promptfoo redteam generates adversarial probes automatically — according to Promptfoo Docs, the tool covers over fifty vulnerability categories including injection attacks, jailbreak sequences, PII extraction, and toxicity patterns.

What separates Promptfoo from ad-hoc testing is its support for recognized security frameworks. You can configure scans aligned with OWASP LLM Top 10, NIST AI RMF, or MITRE ATLAS without writing every test case from scratch. This is the difference between a scattered collection of edge-case checks and structured evidence that compliance teams and auditors can work with. Framework-aligned testing turns security verification into a repeatable workflow rather than an improvised exercise.

According to Promptfoo GitHub, the current version is 0.121.3, and the project is MIT-licensed. OpenAI announced its acquisition of Promptfoo on March 9, 2026, with the project remaining open-source.

How It’s Used in Practice

The most common way teams encounter Promptfoo is during pre-deployment security testing. A developer finishes building a chatbot, a retrieval-augmented generation (RAG) system, or an AI agent and needs to verify it won’t leak confidential data, follow injected instructions, or produce harmful content. Instead of writing test cases manually, they run promptfoo redteam init, answer a few questions about their application, and get a generated configuration targeting relevant vulnerability categories.

Once the scan finishes, Promptfoo produces an interactive report — a local web UI where you can browse each probe, see the model’s response, and check whether it passed or failed. Teams typically integrate these scans into their CI/CD pipelines (automated build-and-deploy workflows) so every code change triggers a fresh round of adversarial testing before reaching production.

Pro Tip: Start with the OWASP LLM Top 10 preset for your first red-team run. It covers the most common vulnerability categories without requiring deep security expertise, and the results give you a structured baseline you can share with your security team or include in compliance documentation.

When to Use / When Not

ScenarioUseAvoid
Pre-deployment security check for an LLM-powered feature
Quick one-off prompt test during early prototyping
Continuous red-teaming integrated into CI/CD pipelines
Testing a system with no LLM component
Compliance-aligned vulnerability scanning (OWASP, NIST)
Evaluating prompt quality or output formatting only

Common Misconception

Myth: Running Promptfoo once before launch means your LLM application is “secured.”

Reality: Red-teaming is a continuous process, not a one-time gate. Models get updated, prompts change, and new attack techniques surface regularly. A clean scan today doesn’t guarantee safety next month. Teams that treat red-teaming as a recurring CI/CD step — rather than a pre-launch checkbox — catch regressions before users discover them.

One Sentence to Remember

Promptfoo turns “we think our LLM app is safe” into “we tested it against fifty-plus vulnerability categories and here’s the evidence” — making red-teaming repeatable, auditable, and fast enough to run on every deployment.

FAQ

Q: Does Promptfoo only work for red-teaming, or can it test prompt quality too? A: It handles both. You can use it for general prompt evaluation — comparing outputs across models, scoring quality, checking format — and for dedicated red-teaming where it generates adversarial probes automatically.

Q: Do I need security expertise to run a Promptfoo red-team scan? A: No. The OWASP LLM Top 10 and NIST AI RMF presets provide ready-made vulnerability categories with pre-built probes. You configure your target endpoint, and Promptfoo handles the rest.

Q: Is Promptfoo still open-source after the OpenAI acquisition? A: Yes. The project remains MIT-licensed on GitHub. OpenAI announced the acquisition on March 9, 2026, and has not changed the licensing model or restricted community access to the codebase.

Sources

Expert Takes

Red-teaming without structure is anecdote collection. Promptfoo enforces methodology by mapping probes to recognized vulnerability taxonomies — OWASP, NIST, MITRE ATLAS — so each test run produces categorized evidence rather than a scattered list of failures. The distinction matters: structured testing reveals systemic weaknesses in safety layers, while ad-hoc probing only finds whatever you happen to stumble on.

The configuration-first design is what makes this practical. You define your target endpoint, pick a framework preset, and run a single command. The report shows exactly which categories passed and which failed, with the specific prompts and responses that triggered each result. That’s a debugging workflow, not a security ritual — you can trace every failure back to a fixable root cause.

OpenAI acquired this tool for a reason. As AI agents handle more sensitive tasks — financial transactions, medical queries, code execution — the companies deploying them need auditable proof that their systems were tested against known attack patterns. Promptfoo turns red-teaming from a specialist skill into a standard development step, and that shift changes who can ship AI products responsibly.

Automated red-teaming creates a false sense of completeness. Passing preset vulnerability categories sounds thorough, but those categories reflect yesterday’s attack research. The probes that will matter most next quarter haven’t been written yet. Teams that rely on any single tool — no matter how well-designed — risk confusing “we ran the scan” with “we understand our risks.” The scan is a floor, not a ceiling.