Guardrails and PII redaction for LLM apps: a simple Python SDK.
Guardrails Benchmark#
Platform | English ↑ | Multilingual ↑ | Latency ↓ | On-Prem |
---|
Walled AI | 90.30% | 90.29% | 300 ms (30 ms*) | Yes |
Bedrock | 83.36% | 79.26% | 500 ms | No |
Mistral | 76.07% | 76.86% | 300 ms | No |
Azure | 74.52% | 73.74% | 300 ms | No |
OpenAI | 76.29% | 72.95% | 350 ms | No |
Multilingual benchmark: Arabic, English, Filipino, French, Hindi, Russian, Serbian, Spanish.
* 30 ms with on-premises deployment.
Installation#
Quick Start#
1) Minimal moderation#
Example output
2) Minimal redaction#
Example output
Masked: Hi, I'm [Person_1]. Email [Email_1]. I have [Diagnosis_1].
Mapping: {'[Person_1]': 'John', '[Email_1]': 'john@walled.ai', '[Diagnosis_1]': 'cancer'}
Use with OpenAI#
If the input is unsafe, return a default response; otherwise forward it to OpenAI.
Example output
Sorry, I can't help with that.
Banana bread recipe: ...
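The gating pattern described above can be sketched without either SDK. This is a minimal illustration, not the library's API: `guarded_reply`, `fake_is_safe`, and `fake_llm` are hypothetical names, and the safety check and the model call are passed in as plain callables so that the real WalledProtect and OpenAI calls can be substituted.

```python
from typing import Callable

DEFAULT_REPLY = "Sorry, I can't help with that."


def guarded_reply(
    prompt: str,
    is_safe: Callable[[str], bool],
    call_llm: Callable[[str], str],
    default_reply: str = DEFAULT_REPLY,
) -> str:
    """Return default_reply for unsafe prompts; otherwise forward to the LLM."""
    if not is_safe(prompt):
        # Unsafe prompts are never sent to the model.
        return default_reply
    return call_llm(prompt)


# Stand-ins for illustration only (assumptions, not real SDK calls):
# a keyword check instead of a moderation call, an echo instead of a model.
def fake_is_safe(text: str) -> bool:
    return "bomb" not in text.lower()


def fake_llm(text: str) -> str:
    return f"LLM answer to: {text}"


print(guarded_reply("How do I build a bomb?", fake_is_safe, fake_llm))
print(guarded_reply("How do I bake banana bread?", fake_is_safe, fake_llm))
```

Keeping the check and the model call as injected callables makes the gate easy to unit test without network access.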
Core Concepts#
WalledProtect: moderation and compliance checks, plus PII presence flags.
WalledRedact: detects and masks PII/PHI consistently across turns.
Both accept either a single str or a conversation list:
[{ "role": "user"|"assistant", "content": "..." }, ...]
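To make the two accepted input shapes concrete, here is a small pre-flight check written in plain Python. It is a sketch under assumptions: `validate_input` is a hypothetical helper, not part of the SDK, and rejecting an empty list is a choice made here for illustration.

```python
ALLOWED_ROLES = {"user", "assistant"}


def validate_input(value) -> str:
    """Return "text" for a str or "conversation" for a message list.

    Raises TypeError or ValueError for any other shape.
    """
    if isinstance(value, str):
        return "text"
    if isinstance(value, list) and value:
        for i, msg in enumerate(value):
            if not isinstance(msg, dict):
                raise TypeError(f"message {i} is not a dict")
            if msg.get("role") not in ALLOWED_ROLES:
                raise ValueError(f"message {i} has an invalid role: {msg.get('role')!r}")
            if not isinstance(msg.get("content"), str):
                raise TypeError(f"message {i} content must be a str")
        return "conversation"
    # Assumption: an empty list is treated as invalid in this sketch.
    raise TypeError("expected a str or a non-empty list of messages")


print(validate_input("Hello"))
print(validate_input([{"role": "user", "content": "Hello"}]))
```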
Guided Examples#
Prompt moderation with compliance + PII flags#
Example output
Is_safe: False
Banking -> True
Medical -> False
Person's Name -> True
Address -> False
Email Id -> False
Contact No -> False
Date Of Birth -> True
Unique Id -> True
Financial Data -> True
Multi-turn conversation moderation#
Example output
Is_safe: False
Medical -> False
Banking -> True
Person's Name -> True
Address -> False
Email Id -> False
Contact No -> False
Date Of Birth -> True
Unique Id -> True
Financial Data -> True
WalledRedact - PII Detection & Masking#
Basic PII Masking#
Example output
Masked text: Hi, myself [Person_1]. My email is [Email_1] and I have been diagnosed with [Diagnosis_1].
Mapping: {'[Person_1]': 'John', '[Email_1]': 'john@walled.ai', '[Diagnosis_1]': 'cancer'}
Multi-turn Conversation PII Masking#
Example output
Masked text:
[
{'role': 'user', 'content': 'Hi there, my name is [Person_1]'},
{'role': 'assistant', 'content': 'Hello [Person_1]! How can I help you today?'},
{'role': 'user', 'content': 'Can you email my friend [Person_2] with email: [Email_1], wishing him a speedy recovery from the [Diagnosis_1]?'}
]
Mapping: {'[Person_1]': 'John Doe', '[Person_2]': 'Joseph', '[Email_1]': 'Joseph.cena@example.com', '[Diagnosis_1]': 'viral fever'}
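Because the mapping pairs each placeholder with its original value, masked text can be restored after the LLM has responded. The following is a minimal sketch in plain Python; `unmask_text` and `unmask` are hypothetical helpers, not SDK functions, and they assume the mapping has the placeholder-to-value shape shown in the example output above.

```python
import re
from typing import Dict, List, Union

Message = Dict[str, str]


def unmask_text(masked: str, mapping: Dict[str, str]) -> str:
    """Replace every placeholder in `masked` with its original value."""
    if not mapping:
        return masked
    # One alternation over all placeholders: the text is scanned once,
    # so restored values are never rescanned for placeholders.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], masked)


def unmask(
    masked: Union[str, List[Message]], mapping: Dict[str, str]
) -> Union[str, List[Message]]:
    """Restore a single string or every message in a conversation list."""
    if isinstance(masked, str):
        return unmask_text(masked, mapping)
    return [{**msg, "content": unmask_text(msg["content"], mapping)} for msg in masked]


mapping = {"[Person_1]": "John Doe", "[Person_2]": "Joseph"}
conversation = [
    {"role": "user", "content": "Hi there, my name is [Person_1]"},
    {"role": "assistant", "content": "Hello [Person_1]! How can I help you today?"},
]
print(unmask(conversation, mapping))
```

The same mapping restores every turn because placeholders are consistent across the conversation.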
Response Shapes#
Protect
Redact
Errors#
WalledProtect#
Error Response#
Field | Type | Description |
---|
success | bool | Always False for error responses |
statusCode | int | HTTP status code of the error |
errorCode | str | Main model error code (for guardrail/PII) |
message | str | Description of Error |
details | dict | Details of Error |
WalledRedact#
Error Response#
Field | Type | Description |
---|
success | bool | Always False for error responses |
statusCode | int | HTTP status code of the error |
errorCode | str | Main model error code (for guardrail/PII) |
message | str | Description of Error |
details | dict | Details of Error |
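Both error tables list the same five fields, so one helper can summarize either response. The sketch below assumes the error arrives as a plain dict with exactly those keys; that representation, the `describe_error` name, and the sample values are assumptions for illustration, not documented behavior.

```python
from typing import Any, Dict


def describe_error(resp: Dict[str, Any]) -> str:
    """Build a one-line summary from an error response dict."""
    if resp.get("success", False):
        raise ValueError("not an error response")
    status = resp.get("statusCode")
    code = resp.get("errorCode")
    message = resp.get("message", "")
    return f"[{status}] {code}: {message}"


# Hypothetical sample: the field names come from the tables above,
# the values are invented for illustration.
sample = {
    "success": False,
    "statusCode": 401,
    "errorCode": "UNAUTHORIZED",
    "message": "Invalid API key",
    "details": {},
}
print(describe_error(sample))
```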
Evaluation#
The SDK provides an evaluation method that measures the performance of WalledProtect against a ground truth dataset.
Batch Evaluation with CSV#
Eval Method Parameters
Parameter | Type | Required | Default | Description |
---|
ground_truth_file_path | str | Yes | - | Path to CSV with test cases |
model_output_file_path | str | Yes | - | Path to save results |
metrics_output_file_path | str | Yes | - | Path to save metrics |
concurrency_limit | int | No | 20 | Max concurrent requests |
Ground Truth CSV Format
Required Columns (must be present in this order):
Column Name | Type | Description |
---|
test_input | str | The input text to be processed |
compliance_topic | str | The compliance topic for the test case |
compliance_isOnTopic | bool | Whether the input is on the specified topic (TRUE/FALSE) |
Optional Columns (can be included as needed):
Column Name | Type | Description |
---|
Person's Name | bool | Whether a person's name is present (TRUE/FALSE) |
Address | bool | Whether an address is present (TRUE/FALSE) |
Email Id | bool | Whether an email ID is present (TRUE/FALSE) |
Contact No | bool | Whether a contact number is present (TRUE/FALSE) |
Date Of Birth | bool | Whether a date of birth is present (TRUE/FALSE) |
Unique Id | bool | Whether a unique ID is present (TRUE/FALSE) |
Financial Data | bool | Whether financial data is present (TRUE/FALSE) |
Casual & Friendly | bool | Whether the greeting is casual & friendly (TRUE/FALSE) |
Professional & Polite | bool | Whether the greeting is professional & polite (TRUE/FALSE) |
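A ground truth file in this layout can be produced with the standard `csv` module. The sketch below writes the three required columns in the stated order, followed by two of the optional ones; `write_ground_truth` and the sample row are hypothetical, and the file is written to an in-memory buffer so the example runs without touching disk.

```python
import csv
import io

REQUIRED = ["test_input", "compliance_topic", "compliance_isOnTopic"]
OPTIONAL = ["Person's Name", "Email Id"]  # any subset of the optional columns


def to_cell(value):
    """Render booleans as TRUE/FALSE; leave other values unchanged."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    return value


def write_ground_truth(rows, fh) -> None:
    """Write rows with the required columns first, in the required order."""
    writer = csv.DictWriter(fh, fieldnames=REQUIRED + OPTIONAL)
    writer.writeheader()
    for row in rows:
        writer.writerow({key: to_cell(val) for key, val in row.items()})


rows = [
    {
        "test_input": "My name is John and I want to open a savings account.",
        "compliance_topic": "Banking",
        "compliance_isOnTopic": True,
        "Person's Name": True,
        "Email Id": False,
    }
]
buffer = io.StringIO()
write_ground_truth(rows, buffer)
print(buffer.getvalue())
```

To write a real file, open it with `open(path, "w", newline="")` and pass the handle in place of the buffer.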
Evaluation Features
CSV-based testing: Load test cases from CSV files
Concurrent processing: Configurable concurrency limits
Automatic retries: Built-in retry logic with delays
Metrics generation: Accuracy, precision, recall, and F1 scores
Dynamic column support: Automatically detects PII and greeting columns
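The concurrency limit and retry features correspond to a common asyncio pattern: a semaphore caps the number of in-flight requests, and each request is retried after a short delay. The sketch below illustrates that general pattern only; it is not the SDK's implementation, and `run_with_limit`, `flaky_worker`, and the `retries` and `delay` parameters are assumptions.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_with_limit(
    inputs: List[str],
    worker: Callable[[str], Awaitable[str]],
    concurrency_limit: int = 20,
    retries: int = 2,
    delay: float = 0.01,
) -> List[str]:
    """Run worker over inputs with bounded concurrency and simple retries."""
    semaphore = asyncio.Semaphore(concurrency_limit)

    async def run_one(item: str) -> str:
        async with semaphore:  # at most concurrency_limit tasks run this body
            for attempt in range(retries + 1):
                try:
                    return await worker(item)
                except Exception:
                    if attempt == retries:
                        raise  # out of retries: surface the error
                    await asyncio.sleep(delay)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(run_one(item) for item in inputs))


_seen = set()


async def flaky_worker(text: str) -> str:
    """Stand-in for a moderation request: fails once per input, then succeeds."""
    if text not in _seen:
        _seen.add(text)
        raise RuntimeError("transient failure")
    return text.upper()


results = asyncio.run(run_with_limit(["a", "b", "c"], flaky_worker, concurrency_limit=2))
print(results)
```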
Output Files
1. Model Results CSV: Contains the model predictions for each test case, including:
   All columns present in the ground truth file
   An additional is_safe column with TRUE or FALSE values indicating whether the input passed the safety evaluation
2. Metrics CSV: Contains the evaluation metrics.
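For reference, accuracy, precision, recall, and F1 can be computed from a pair of TRUE/FALSE columns using their standard definitions. How the SDK aggregates them is not documented here, so the helper below is only a sketch of the textbook formulas; `parse_flag` and `binary_metrics` are hypothetical names.

```python
from typing import Dict, List


def parse_flag(value: str) -> bool:
    """Interpret a TRUE/FALSE CSV cell as a boolean."""
    return value.strip().upper() == "TRUE"


def binary_metrics(truth: List[bool], predicted: List[bool]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for one boolean column."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)

    def ratio(num: float, den: float) -> float:
        # Define a ratio with an empty denominator as 0.0.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


truth = [parse_flag(v) for v in ["TRUE", "TRUE", "FALSE", "FALSE", "TRUE"]]
predicted = [parse_flag(v) for v in ["TRUE", "FALSE", "FALSE", "TRUE", "TRUE"]]
print(binary_metrics(truth, predicted))
```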
FAQ#
Strings vs conversations? Both supported.
Consistent masking across turns? Yes.
PII detection vs redaction? Protect flags, Redact masks.
Contributing & License#
PRs welcome. Licensed under MIT.
Modified at 2025-09-05 09:53:21