The Voicegain platform supports PII redaction in two of its web APIs: Transcription and Speech Analytics.
PII Redaction in the Transcribe API
PII redaction is supported for the OFF-LINE mode of the POST /asr/transcribe/async API.
PII redaction is disabled by default. It can be enabled per request using settings.formatters in the request body. Alternatively, it can be enabled in the Context's formatters setting using the PUT /confgroup/{uuid} method. This must be the Context from which the JWT used to authenticate the /asr/transcribe/async requests is obtained.
There are two formatter types that can be used for PII Redaction:
- redact type - provides redaction for entities such as EMAIL, PHONE, SSN, CC, PERSON, and ZIP.
  To turn redaction on, include the name of the entity (e.g. EMAIL) in the parameters map (see the example below), with the value being one of:
  - full - fully masks the entity with ****
  - partial - masks part of the entity, e.g., ****-****-****-1234 or a***@g***
  - [WORD] - a word in square brackets replaces the entity, brackets included (any word can be used in place of WORD)
- regex type - provides text redaction/modification using regular expression matching. It has the following parameters:
  - pattern - (required) the regex pattern to match
  - mask - (required) full, partial, or [WORD]
  - options - (optional) a string of options. The same single-letter options apply as in the Python re library. For example, "IS" means I (ignore case) plus S (dotall).
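Because the options string reuses the single-letter flag names of Python's re module, the effect of a regex formatter can be approximated locally. The following is only a rough sketch and not Voicegain's implementation: `options_to_flags` and `apply_regex_mask` are hypothetical helper names, and only the full and [WORD] masks are handled, since the exact rules for partial masking are not spelled out here.

```python
import re


def options_to_flags(options: str) -> re.RegexFlag:
    """Translate a string of single-letter options (e.g. "IS") into re flags."""
    flags = re.RegexFlag(0)
    for letter in options:
        # The re module exposes its flags under single-letter aliases (re.I, re.S, ...).
        flag = getattr(re, letter, None)
        if not isinstance(flag, re.RegexFlag):
            raise ValueError(f"unknown regex option: {letter!r}")
        flags |= flag
    return flags


def apply_regex_mask(text: str, pattern: str, mask: str, options: str = "") -> str:
    """Local approximation of a regex formatter (assumption: full means '****')."""
    if mask == "full":
        replacement = "****"
    elif len(mask) > 2 and mask.startswith("[") and mask.endswith("]"):
        replacement = mask  # the bracketed word is used as-is, brackets included
    else:
        raise ValueError(f"mask not handled in this sketch: {mask!r}")
    # A function replacement avoids backslash processing of the replacement text.
    return re.sub(pattern, lambda m: replacement, text, flags=options_to_flags(options))


print(apply_regex_mask("Postcode 1234 ab on file",
                       "[1-9][0-9]{3}[ ]?[A-Z]{2}", "[POSTCODE]", "I"))
```

With the "I" option the upper-case character class also matches the lower-case "ab", so the whole postcode is replaced by the bracketed word.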
Here is an example value of settings.formatters:
{
  ...
  "settings": {
    "formatters": [
      {
        "type": "redact",
        "parameters": {
          "CC": "partial",
          "ZIP": "full",
          "PERSON": "[PERSON]"
        }
      },
      {
        "type": "regex",
        "parameters": {
          "pattern": "[1-9][0-9]{3}[ ]?[a-zA-Z]{2}",
          "mask": "full",
          "options": "IA"
        }
      }
    ]
  }
}
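As a rough illustration, the same settings can be assembled programmatically before being sent in the body of POST /asr/transcribe/async. The sketch below only builds and serializes the settings fragment. It makes no network call, the helper names (`redact_formatter`, `regex_formatter`) are hypothetical, and the regex formatter uses the pattern key from the parameter list above. Other fields of the request body are omitted.

```python
import json


def _check_mask(mask: str) -> None:
    """Accept the three mask forms described above: full, partial, or [WORD]."""
    if mask in ("full", "partial"):
        return
    if len(mask) > 2 and mask.startswith("[") and mask.endswith("]"):
        return
    raise ValueError(f"invalid mask: {mask!r}")


def redact_formatter(**entities: str) -> dict:
    """Build a redact formatter, e.g. redact_formatter(CC="partial")."""
    for mask in entities.values():
        _check_mask(mask)
    return {"type": "redact", "parameters": dict(entities)}


def regex_formatter(pattern: str, mask: str, options: str = "") -> dict:
    """Build a regex formatter; options is left out when empty."""
    _check_mask(mask)
    parameters = {"pattern": pattern, "mask": mask}
    if options:
        parameters["options"] = options
    return {"type": "regex", "parameters": parameters}


settings = {
    "formatters": [
        redact_formatter(CC="partial", ZIP="full", PERSON="[PERSON]"),
        regex_formatter("[1-9][0-9]{3}[ ]?[a-zA-Z]{2}", "full", "IA"),
    ]
}

# Fragment to merge into the full request body.
body_fragment = json.dumps({"settings": settings}, indent=2)
print(body_fragment)
```

Checking the mask values on the client side catches typos such as "parial" before a request is made.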
Coming Soon
The transcript audio (referenced by audio.capturedAudio) will be blanked out in places where the words get redacted.
PII Redaction in the Speech Analytics API
The Speech Analytics API supports redaction of text and audio for all the recognized NER entities. PII redaction can be enabled in the Speech Analytics Configuration that is passed to the POST /sa API request (in the saConfig field).
The Speech Analytics Configuration can be modified using the PUT /sa/config/{id} API. The relevant setting is piiRedaction, which is an array of objects, each with one to three fields:
- namedEntity - (required) the entity to be redacted. Possible values are:
  - ADDRESS - Postal address.
  - CARDINAL - Numerals that do not fall under another type.
  - CC - Credit card.
  - DATE - Absolute or relative dates or periods.
  - EMAIL - Email address.
  - EVENT - Named hurricanes, battles, wars, sports events, etc.
  - FAC - Buildings, airports, highways, bridges, etc.
  - GPE - Countries, cities, states.
  - LANGUAGE - Any named language.
  - LAW - Named documents made into laws.
  - NORP - Nationalities or religious or political groups.
  - MONEY - Monetary values, including unit.
  - ORDINAL - "first", "second", etc.
  - ORG - Companies, agencies, institutions, etc.
  - PERCENT - Percentage, including "%".
  - PERSON - People, including fictional.
  - PHONE - Phone number.
  - PRODUCT - Objects, vehicles, foods, etc. (Not services.)
  - QUANTITY - Measurements, as of weight or distance.
  - SSN - Social Security number.
  - TIME - Times smaller than a day.
  - WORK_OF_ART - Titles of books, songs, etc.
  - ZIP - Zip code (if not part of an address).
- redactTranscript - (optional) if not null, its value replaces the matched entity in the transcript. If redactTranscript is not provided or is null, the redacted text is replaced with the name of the named entity in angle brackets, e.g. <ADDRESS>.
- redactAudio - (optional) if not null, the audio for the matching entity is replaced with either silence or a beep, as specified. (NOTE: the current implementation only replaces with silence, even if beep is requested.)
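The piiRedaction array and its default replacement rule can be sketched as follows. This is an illustration under assumptions, not the service's code: `pii_entry` and `transcript_replacement` are hypothetical helpers, and the literal value "silence" for redactAudio is an assumed spelling that should be checked against the API reference.

```python
import json
from typing import Optional


def pii_entry(named_entity: str,
              redact_transcript: Optional[str] = None,
              redact_audio: Optional[str] = None) -> dict:
    """Build one piiRedaction object; optional fields are left out when None."""
    entry = {"namedEntity": named_entity}
    if redact_transcript is not None:
        entry["redactTranscript"] = redact_transcript
    if redact_audio is not None:
        entry["redactAudio"] = redact_audio
    return entry


def transcript_replacement(entry: dict) -> str:
    """Text that stands in for a matched entity, following the rule described above."""
    custom = entry.get("redactTranscript")
    # With no custom value, the entity name in angle brackets is used.
    return custom if custom is not None else f"<{entry['namedEntity']}>"


pii_redaction = [
    pii_entry("ADDRESS"),
    # "silence" is an assumed value for redactAudio.
    pii_entry("CC", redact_transcript="[CARD]", redact_audio="silence"),
]

print(json.dumps({"piiRedaction": pii_redaction}, indent=2))
for entry in pii_redaction:
    print(entry["namedEntity"], "->", transcript_replacement(entry))
```

The first entry relies on the default and would appear in the transcript as <ADDRESS>. The second supplies its own replacement text.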