Content Shields API¶
Built-in content and security shields for pydantic-ai agents.
Ready-to-use shields for prompt injection detection, PII filtering, secret redaction, keyword blocking, and refusal detection.
PromptInjection (dataclass)¶
Bases: AbstractCapability[Any]
Detect and block prompt injection attempts.
Scans user input for common injection patterns across 6 categories: ignore_instructions, system_override, role_play, delimiter_injection, prompt_leaking, and jailbreak.
Example
```python
from pydantic_ai import Agent
from pydantic_ai_shields import PromptInjection

# Default: medium sensitivity
agent = Agent("openai:gpt-4.1", capabilities=[PromptInjection()])

# High sensitivity (more false positives, better detection)
agent = Agent("openai:gpt-4.1", capabilities=[PromptInjection(sensitivity="high")])

# Only check specific categories
agent = Agent("openai:gpt-4.1", capabilities=[PromptInjection(
    categories=["jailbreak", "prompt_leaking"],
)])

# Add custom patterns
agent = Agent("openai:gpt-4.1", capabilities=[PromptInjection(
    custom_patterns=[r"sudo\s+mode", r"admin\s+override"],
)])
```
Attributes:
| Name | Type | Description |
|---|---|---|
| sensitivity | Literal['low', 'medium', 'high'] | Detection sensitivity: "low" (obvious attacks only), "medium" (balanced), or "high" (aggressive, may produce false positives). |
| categories | list[str] \| None | Which injection categories to check. None = all. |
| custom_patterns | list[str] \| None | Additional regex patterns to check. |
Source code in src/pydantic_ai_shields/shields.py
PiiDetector (dataclass)¶
Bases: AbstractCapability[Any]
Detect personally identifiable information (PII) in user input.
Scans for email addresses, phone numbers, SSNs, credit card numbers, and IP addresses using regex patterns.
Example
```python
from pydantic_ai import Agent
from pydantic_ai_shields import PiiDetector

# Detect all PII types
agent = Agent("openai:gpt-4.1", capabilities=[PiiDetector()])

# Only specific types
agent = Agent("openai:gpt-4.1", capabilities=[PiiDetector(
    detect=["email", "ssn", "credit_card"],
)])

# Add custom patterns
agent = Agent("openai:gpt-4.1", capabilities=[PiiDetector(
    custom_patterns={"passport": r"[A-Z]{2}\d{7}"},
)])

# Log instead of blocking
agent = Agent("openai:gpt-4.1", capabilities=[PiiDetector(action="log")])
```
Attributes:
| Name | Type | Description |
|---|---|---|
| detect | list[str] \| None | Which PII types to detect. None = all built-in types. Built-in: "email", "phone", "ssn", "credit_card", "ip_address". |
| custom_patterns | dict[str, str] \| None | Dict of additional pattern name → regex string. |
| action | Literal['block', 'log'] | "block" raises InputBlocked, "log" allows through (check |
Source code in src/pydantic_ai_shields/shields.py
SecretRedaction (dataclass)¶
Bases: AbstractCapability[Any]
Detect and block exposure of API keys, tokens, and credentials in model output.
Scans model responses for common secret patterns including OpenAI, Anthropic, AWS, GitHub, Slack keys, JWTs, and private keys.
Example
```python
from pydantic_ai import Agent
from pydantic_ai_shields import SecretRedaction

# Block any secret in output
agent = Agent("openai:gpt-4.1", capabilities=[SecretRedaction()])

# Only check specific secret types
agent = Agent("openai:gpt-4.1", capabilities=[SecretRedaction(
    detect=["openai_key", "aws_access_key", "private_key"],
)])

# Add custom patterns
agent = Agent("openai:gpt-4.1", capabilities=[SecretRedaction(
    custom_patterns={"stripe_key": r"sk_live_[A-Za-z0-9]{24,}"},
)])
```
Attributes:
| Name | Type | Description |
|---|---|---|
| detect | list[str] \| None | Which secret types to scan for. None = all built-in types. Built-in: "openai_key", "anthropic_key", "aws_access_key", "aws_secret_key", "github_token", "slack_token", "jwt", "private_key", "generic_api_key". |
| custom_patterns | dict[str, str] \| None | Dict of additional pattern name → regex string. |
Source code in src/pydantic_ai_shields/shields.py
BlockedKeywords (dataclass)¶
Bases: AbstractCapability[Any]
Block prompts containing forbidden keywords or phrases.
Configurable keyword blocking with support for case sensitivity, whole-word matching, and regex patterns.
Example
```python
from pydantic_ai import Agent
from pydantic_ai_shields import BlockedKeywords

# Simple keyword list
agent = Agent("openai:gpt-4.1", capabilities=[BlockedKeywords(
    keywords=["competitor_name", "internal_only", "classified"],
)])

# Case-sensitive matching
agent = Agent("openai:gpt-4.1", capabilities=[BlockedKeywords(
    keywords=["SECRET", "CLASSIFIED"],
    case_sensitive=True,
)])

# Whole-word only (won't match "classification")
agent = Agent("openai:gpt-4.1", capabilities=[BlockedKeywords(
    keywords=["class", "secret"],
    whole_words=True,
)])

# Regex patterns
agent = Agent("openai:gpt-4.1", capabilities=[BlockedKeywords(
    keywords=[r"password\s*=\s*\S+"],
    use_regex=True,
)])
```
Attributes:
| Name | Type | Description |
|---|---|---|
| keywords | list[str] | List of keywords, phrases, or regex patterns to block. |
| case_sensitive | bool | Whether matching is case-sensitive (default: False). |
| whole_words | bool | Match whole words only, so "class" won't match "classification" (default: False). |
| use_regex | bool | Treat keywords as regex patterns (default: False). |
Source code in src/pydantic_ai_shields/shields.py
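The interaction of the three matching options can be sketched as a single compiled check. This `is_blocked` helper is an illustrative assumption about how the options compose (escape unless `use_regex`, wrap in `\b` when `whole_words`, drop `IGNORECASE` when `case_sensitive`), not the shield's actual code:

```python
import re

def is_blocked(
    text: str,
    keywords: list[str],
    *,
    case_sensitive: bool = False,
    whole_words: bool = False,
    use_regex: bool = False,
) -> bool:
    """Return True if any keyword matches `text` under the given options."""
    flags = 0 if case_sensitive else re.IGNORECASE
    for kw in keywords:
        # Literal keywords are escaped; regex keywords are used as written.
        pattern = kw if use_regex else re.escape(kw)
        if whole_words:
            pattern = rf"\b{pattern}\b"
        if re.search(pattern, text, flags):
            return True
    return False

print(is_blocked("the classification report", ["class"], whole_words=True))  # False
print(is_blocked("the class report", ["class"], whole_words=True))           # True
```

Note the ordering: `\b` anchors are added after escaping, so whole-word matching works for literal keywords and regex keywords alike.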
NoRefusals (dataclass)¶
Bases: AbstractCapability[Any]
Block LLM refusals — ensure the model attempts to answer.
Detects common refusal phrases in model output and raises OutputBlocked. Useful for agents that should always attempt a task rather than refuse.
Example
```python
from pydantic_ai import Agent
from pydantic_ai_shields import NoRefusals

# Default refusal patterns
agent = Agent("openai:gpt-4.1", capabilities=[NoRefusals()])

# Custom patterns
agent = Agent("openai:gpt-4.1", capabilities=[NoRefusals(
    patterns=[r"I cannot", r"I'm not able to", r"outside my scope"],
)])

# Allow partial refusals (refusal + substance)
agent = Agent("openai:gpt-4.1", capabilities=[NoRefusals(
    allow_partial=True,
    min_response_length=50,
)])
```
Attributes:
| Name | Type | Description |
|---|---|---|
| patterns | list[str] \| None | Refusal regex patterns to detect. None = built-in set of 10 patterns. |
| allow_partial | bool | If True, allow responses that contain refusal language but also have substantial content (above min_response_length). |
| min_response_length | int | Minimum character count for a "substantial" response when allow_partial is True (default: 50). |
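The `allow_partial` logic above can be sketched as follows. The two refusal patterns and the `is_refusal` helper are illustrative assumptions, not the shield's built-in set of 10 patterns or its real API:

```python
import re

# Two example refusal patterns -- the real shield ships a larger built-in set.
REFUSAL_PATTERNS = [r"\bI can(?:'|no)t\b", r"\bI'?m not able to\b"]

def is_refusal(
    text: str,
    *,
    allow_partial: bool = False,
    min_response_length: int = 50,
) -> bool:
    """Return True if `text` should be treated as a refusal."""
    matched = any(re.search(p, text, re.IGNORECASE) for p in REFUSAL_PATTERNS)
    if not matched:
        return False
    # allow_partial: a long-enough response counts as refusal + substance.
    if allow_partial and len(text) >= min_response_length:
        return False
    return True

print(is_refusal("I cannot help with that."))  # True
```

Under `allow_partial`, a response like "I can't do X directly, but here is a workaround ..." passes because the refusal phrase is accompanied by content above the length threshold.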