Web API methods that perform speech recognition (as opposed to transcription) do not use a large vocabulary Natural Language Model (NLM) and instead require grammars to define what utterances are possible and to assign semantic meaning (tags) to the recognized utterances.
The /asr/recognize Web API supports three types of grammars:
- GRXML - standard grammars
- JJSGF - JSON-wrapped JSGF grammars
- BUILT-IN - built-in grammars provided by Voicegain
There is also a fourth grammar type, which is specific to the GREG test platform.
A single /asr/recognize request takes an array of grammars. They are all active at the same time. The name of the grammar that matched the utterance is returned in the result.
You can access the full API documentation from the Voicegain Web Console at: https://console.voicegain.ai/api-documentation#operation/asrRecognizeAsync (look for: settings.asr.grammars)
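As a rough illustration of a request carrying several grammars at once, the sketch below builds the grammars array in Python. The object shape is borrowed from the built-in grammar examples later in this article, and placing the array under settings.asr.grammars follows the pointer above. Every other request field is left out, and using the boolean grammar without parameters is an assumption, so treat this as a fragment rather than a complete request body.

```python
import json

# Two built-in grammars that would be active at the same time.
grammars = [
    {"type": "BUILT-IN", "name": "number", "parameters": {"minallowed": 0}},
    # Assumption: a built-in grammar may be listed without parameters.
    {"type": "BUILT-IN", "name": "boolean"},
]

# Assumption: the array lives under settings.asr.grammars; all remaining
# request fields (audio source and so on) are intentionally omitted.
request_fragment = {"settings": {"asr": {"grammars": grammars}}}

print(json.dumps(request_fragment, indent=2))
```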
GRXML
GRXML grammars are the XML form of SRGS (Speech Recognition Grammar Specification) grammars - the other form being ABNF.
Details of the specific GRXML syntax supported by the Voicegain platform are here.
JJSGF
JJSGF is a JSON version of JSGF (JSpeech Grammar Format). JJSGF grammars are equivalent to GRXML grammars.
You can read more about JJSGF grammars here.
Built-In Grammars
Voicegain provides a set of commonly used built-in grammars. They are implemented as dynamic GRXML grammars, with the content to be captured controlled by parameters.
When passed in an MRCP request, the built-in grammar URL for speech will be, e.g.:
builtin:grammar/number?minallowed=0
and for DTMF
builtin:dtmf/number?minallowed=0
When used in the /asr/recognize Web API, the corresponding speech grammar will be, e.g.:
{"type":"BUILT-IN","name":"number", "parameters" : {"minallowed" : 0}}
while a corresponding DTMF grammar will be:
{"type":"BUILT-IN","name":"dtmf/number", "parameters" : {"minallowed" : 0}}
Currently only the en-us language is supported in built-in grammars. Here is an example of how to specify the language for a built-in grammar:
builtin:grammar/number?language=en-us
or
{"type":"BUILT-IN","name":"number", "parameters" : {"language" : "en-us"}}
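The MRCP URL form and the Web API JSON form shown above carry the same information, so one can be derived from the other. The helper below is a minimal sketch of that mapping, not part of any Voicegain SDK. The function name is made up for illustration. Two assumptions are labeled in the code: multiple URL parameters are split on & or ; (the article only shows single-parameter URLs), and integer-looking values are sent as JSON numbers.

```python
import re


def _coerce(value: str):
    """Return an int for integer-looking values, otherwise the string itself."""
    try:
        return int(value)
    except ValueError:
        return value


def builtin_url_to_json(url: str) -> dict:
    """Convert an MRCP-style builtin: URL into a Web API grammar object."""
    scheme, _, rest = url.partition(":")
    if scheme != "builtin":
        raise ValueError(f"not a builtin grammar URL: {url!r}")
    path, _, query = rest.partition("?")
    kind, _, name = path.partition("/")
    if kind not in ("grammar", "dtmf") or not name:
        raise ValueError(f"unexpected builtin path: {path!r}")
    # Speech grammars use the bare name; DTMF grammars keep the dtmf/ prefix.
    grammar = {"type": "BUILT-IN", "name": name if kind == "grammar" else f"dtmf/{name}"}
    params = {}
    # Assumption: parameters are separated by '&' or ';'. No percent-decoding
    # is done, so a literal '+' in a value is preserved.
    for pair in filter(None, re.split(r"[&;]", query)):
        key, _, value = pair.partition("=")
        params[key] = _coerce(value)
    if params:
        grammar["parameters"] = params
    return grammar


print(builtin_url_to_json("builtin:dtmf/number?minallowed=0"))
```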
Combining grammars with large vocabulary recognition
Grammar-based recognition can be combined with large vocabulary recognition. To enable both in an API request, add a grammar entry with the special name transcribe next to your other grammars:
{"type" : "BUILT-IN", "name" : "transcribe"}
For information on how to combine the two when using MRCP, see here.
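A small sketch of how the special transcribe entry might be added to an existing grammar list on the client side. The helper name is hypothetical, and the boolean grammar in the usage line is only a stand-in for whatever grammars a request already uses.

```python
# The special entry that enables large vocabulary recognition.
TRANSCRIBE = {"type": "BUILT-IN", "name": "transcribe"}


def with_large_vocabulary(grammars: list) -> list:
    """Return a copy of the grammar list with the transcribe entry added once."""
    result = list(grammars)
    if TRANSCRIBE not in result:
        result.append(dict(TRANSCRIBE))
    return result


grammars = with_large_vocabulary([{"type": "BUILT-IN", "name": "boolean"}])
print(grammars)
```

Returning a copy keeps the caller's original list untouched, and the membership check avoids adding the entry twice.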
List of available built-in grammars
Below are all built-in grammars. Each grammar can be either speech or DTMF.
alphanumeric
It recognizes a connected alphanumeric string, such as "abc123". For example, it can be used to spell a person’s name, or to recognize a product number that contains both letters and digits. This grammar requires lowercase input.
Returns results in alphanum_str
Parameters:
- length - length of the expected alphanumeric string - cannot be used together with the other two parameters
- minlength
- maxlength
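Because length cannot be combined with minlength or maxlength, it can help to check the combination before sending a request. The following is a client-side sketch only. The helper name is hypothetical, and the extra check that minlength does not exceed maxlength is an assumption, not a documented rule.

```python
def length_parameters(length=None, minlength=None, maxlength=None) -> dict:
    """Build the parameters object, rejecting the disallowed combination."""
    if length is not None and (minlength is not None or maxlength is not None):
        raise ValueError("length cannot be used together with minlength/maxlength")
    # Assumption: a minimum above the maximum is never meaningful.
    if minlength is not None and maxlength is not None and minlength > maxlength:
        raise ValueError("minlength must not exceed maxlength")
    candidates = {"length": length, "minlength": minlength, "maxlength": maxlength}
    return {key: value for key, value in candidates.items() if value is not None}


grammar = {
    "type": "BUILT-IN",
    "name": "alphanumeric",
    "parameters": length_parameters(minlength=4, maxlength=8),
}
print(grammar)
```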
boolean
The boolean built-in grammar accepts an affirmative or negative reply from the caller: “yes”, “no”, “true”, “false”, and so on. When invoking this grammar you can use URI parameters to assign any two touchtone buttons as synonyms for yes and no. This grammar returns a MEANING with a value of “true” or “false”.
Parameters:
- y, n - DTMF keys to assign to yes and no
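The sketch below shows one way a DTMF boolean grammar object could be assembled and its MEANING converted to a Python bool. Several details are assumptions rather than documented behavior: the name dtmf/boolean is formed by analogy with dtmf/number, and the key values are passed as strings. Both helper names are hypothetical.

```python
DTMF_KEYS = set("0123456789*#")


def dtmf_boolean_grammar(yes_key: str, no_key: str) -> dict:
    """Build a boolean DTMF grammar with two distinct touchtone keys."""
    if yes_key not in DTMF_KEYS or no_key not in DTMF_KEYS:
        raise ValueError("keys must be single touchtone buttons")
    if yes_key == no_key:
        raise ValueError("yes and no need different keys")
    # Assumption: key values are sent as strings under parameters y and n.
    return {
        "type": "BUILT-IN",
        "name": "dtmf/boolean",
        "parameters": {"y": yes_key, "n": no_key},
    }


def meaning_to_bool(meaning: str) -> bool:
    """Map the MEANING value 'true' or 'false' to a Python bool."""
    if meaning == "true":
        return True
    if meaning == "false":
        return False
    raise ValueError(f"unexpected MEANING: {meaning!r}")


print(dtmf_boolean_grammar("1", "2"))
```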
ccexpdate
The ccexpdate grammar understands the expiration date on a credit card. These dates generally omit the day, listing only the month and year. The grammar returns the date in YYYYMMDD format (the date is the last day of the month). If the caller specifies the day, the grammar accepts the utterance but automatically assigns the last day in that month, even if the caller specified a different day.
Returns results in: year4, year2, month, month_by_name, MEANING
Parameters:
- maxallowed - number of months from today to include in the grammar
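To make the YYYYMMDD convention concrete, this sketch computes the value that the description above implies for a given month and year, always using the last day of the month. It illustrates the format only and is not the platform's implementation. The function name is hypothetical.

```python
import calendar


def expiration_meaning(year: int, month: int) -> str:
    """Format an expiration month as YYYYMMDD using the month's last day."""
    # monthrange returns (weekday of the first day, number of days in month).
    last_day = calendar.monthrange(year, month)[1]
    return f"{year:04d}{month:02d}{last_day:02d}"


print(expiration_meaning(2024, 2))   # leap year: 20240229
print(expiration_meaning(2025, 12))  # 20251231
```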
creditcard
The creditcard grammar understands a caller saying a credit card number, optionally preceding the number with the credit card name. It returns the MEANING text string with the recognized card number. The grammar also returns a CARDTYPE key identifying the type of credit card (visa, amex, and so on), based on the first digit in the card number. This grammar can be constrained to accept specific types of credit cards.
Parameters:
- typesallowed - allowed types of credit cards, given as a string of card names separated by +
Allowed values are: amex, discover, mastercard, visa
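A minimal sketch of building the typesallowed value from a list of card names, with a check against the four allowed values listed above. The helper name is hypothetical. Note that if the same value is placed in an MRCP URL, a generic URL-encoding routine may rewrite the + separator, so it is joined by hand here.

```python
ALLOWED_CARD_TYPES = ("amex", "discover", "mastercard", "visa")


def creditcard_grammar(*types: str) -> dict:
    """Build a creditcard grammar, optionally restricted to some card types."""
    unknown = [t for t in types if t not in ALLOWED_CARD_TYPES]
    if unknown:
        raise ValueError(f"unsupported card types: {unknown}")
    grammar = {"type": "BUILT-IN", "name": "creditcard"}
    if types:
        # Card names are joined with '+', as described for typesallowed.
        grammar["parameters"] = {"typesallowed": "+".join(types)}
    return grammar


print(creditcard_grammar("visa", "mastercard"))
```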
currency
The currency built-in grammar accepts currency amounts appropriate to the current language. The grammar can be constrained to accept only those amounts that fall within a given range.
The implementation of currency includes the following features:
- It allows values up to 999,999,999.99.
- In a DTMF grammar, a caller can press the star (*) key on the telephone keypad to indicate a decimal point.
This built-in returns an amount in the following basic format:
dollar_amount.cent_amount
If the caller does not specify an amount for one of these fields, it is assigned a value of zero. This means that the built-in can accept amounts like “six dollars” or “forty-one cents”.
Parameters:
- minallowed, maxallowed - Setting an allowed range tells the recognizer to only allow amounts inside the range you specify. Amounts outside the range are not recognized as part of the vocabulary.
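The dollar_amount.cent_amount format and the star-as-decimal-point convention can be illustrated with a short parsing sketch. The function names are hypothetical, the exact string the platform returns for a missing field is not specified above (the code simply treats an empty field as zero), and the default lower bound of 0 is an assumption.

```python
from decimal import Decimal

MAX_AMOUNT = Decimal("999999999.99")  # upper limit stated for this grammar


def parse_amount(text: str) -> Decimal:
    """Parse 'dollar_amount.cent_amount'; a missing field counts as zero."""
    dollars, _, cents = text.partition(".")
    return Decimal(f"{dollars or '0'}.{cents or '0'}")


def keys_to_amount(keys: str) -> Decimal:
    """Interpret a DTMF key sequence where '*' stands for the decimal point."""
    return parse_amount(keys.replace("*", "."))


def in_range(amount: Decimal, minallowed=Decimal("0"), maxallowed=MAX_AMOUNT) -> bool:
    """Check an amount against an allowed range (default minimum is assumed)."""
    return minallowed <= amount <= maxallowed


print(keys_to_amount("12*50"))
print(in_range(parse_amount("6"), maxallowed=Decimal("5")))
```

Decimal is used instead of float so that cent values are not affected by binary rounding.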
digit
The digit built-in grammar recognizes digit strings of up to 20 digits. It can be constrained to accept a specific list of valid strings, or a specific number of random digits. Punctuation characters such as hyphens (-), dots (.), and underscores (_) are not recognized. If spoken, they reduce recognition accuracy.
Note: It is strongly recommended that you use the length parameter or minlength and maxlength parameters to constrain this grammar, as doing so improves accuracy.
This grammar returns a MEANING key that contains a string of digits with no embedded spaces, such as “12345”.
Parameters:
- length - length of the expected digit string - cannot be used together with the other two parameters
- minlength
- maxlength
date
This date grammar recognizes the following:
- <month>
- <weekday>
- <specials> such as today, tomorrow, and yesterday
- <month> <date> <year>, the default date range is between 01/01/1900 and 12/31/2199. Any date in that range is recognized.
  The date range can be changed by specifying minallowed and/or maxallowed in requests for this date grammar. However, currently only <year> is dynamically generated based on the specified minallowed and/or maxallowed.
- <weekday> <month> <date>
- [the] <date> of <month> <year> (the same date range rules apply as for <month> <date> <year>)
Returns results in: YEAR, TWO_DIGIT_YEAR, MONTH, WEEKDAY, DAY, MEANING
Parameters:
- minallowed, maxallowed
number
The number grammar recognizes spoken numbers, that is, numbers expressed as words rather than as a digit string: “four hundred fifty-three” rather than “four five three”. This grammar accepts decimal places up to a specified limit, and can be constrained to various ranges and multipliers (tens, hundreds, thousands, and so on).
This grammar returns a MEANING consisting of digits with no embedded spaces, with or without decimals (“12345.678”).
Parameters:
- minallowed - minimum number (default 0)
- maxallowed - maximum number (default 999,999,999.99)
- maxdecimal - maximum number of decimal places (default 2, maximum 9)
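The sketch below assembles a number grammar object and rejects parameter values that contradict the limits listed above (maxdecimal at most 9, maximum not below minimum). The helper name and the dtmf flag are hypothetical conveniences. Parameters left unset are simply omitted so that the defaults stated above apply.

```python
def number_grammar(minallowed=0, maxallowed=None, maxdecimal=None, dtmf=False) -> dict:
    """Build a number grammar object for the Web API grammars array."""
    if maxdecimal is not None and not 0 <= maxdecimal <= 9:
        raise ValueError("maxdecimal must be between 0 and 9")
    if maxallowed is not None and maxallowed < minallowed:
        raise ValueError("maxallowed must not be below minallowed")
    params = {"minallowed": minallowed}
    if maxallowed is not None:
        params["maxallowed"] = maxallowed
    if maxdecimal is not None:
        params["maxdecimal"] = maxdecimal
    return {
        "type": "BUILT-IN",
        "name": "dtmf/number" if dtmf else "number",
        "parameters": params,
    }


print(number_grammar(minallowed=1, maxallowed=100, maxdecimal=0))
```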
phone
The phone built-in grammar accepts local and long distance telephone numbers, as well as variations for special numbers (for example, “911” for emergency in US English). The grammar accepts extension numbers if needed, and can limit the length of these extension numbers to a set number of digits.
This grammar returns a MEANING key containing the phone number as a string of digits with no embedded spaces or punctuation. The number of digits in the string depends on the local standards.
Returned results: phone, extension
Parameters:
- extensionallowed
- longdistanceallowed
- specialsallowed
routing-number
Routing number grammar
Returns results in: routing_number
ssn
Recognizes 9-digit US Social Security numbers.
Returns results in: ssn
time
This grammar recognizes the following:
- <qualifier> <12-hour> <minute> [<section_of_day>]
- [<qualifier>] <12-hour> <section_of_day>
- midnight
- noon
- [<qualifier>] <minute> [minute|minutes] (past|to|after|before) <12-hour>
- <24-hour> <minute>, note that <24-hour> refers to 12-23.
Returns results in: time (e.g. 1200pm), hours, minutes, meridiem (e.g. pm)
Parameters:
- minallowed, maxallowed
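For downstream processing it is often convenient to turn the returned time string into a time object. The parser below is a sketch that assumes every value follows the pattern of the single example given above (1200pm), that is, one or two hour digits, two minute digits, and am or pm. Other shapes are rejected rather than guessed. The function name is hypothetical.

```python
import re
from datetime import time

# Assumed shape: 1-2 hour digits, 2 minute digits, then am or pm.
_TIME_RE = re.compile(r"^(\d{1,2})(\d{2})(am|pm)$")


def parse_time_result(value: str) -> time:
    """Convert a value such as '1200pm' into a datetime.time."""
    match = _TIME_RE.match(value.strip().lower())
    if not match:
        raise ValueError(f"unrecognized time value: {value!r}")
    hour, minute, meridiem = int(match[1]), int(match[2]), match[3]
    if not 1 <= hour <= 12 or minute > 59:
        raise ValueError(f"time out of range: {value!r}")
    # 12am is hour 0 and 12pm is hour 12 on the 24-hour clock.
    hour24 = hour % 12 + (12 if meridiem == "pm" else 0)
    return time(hour24, minute)


print(parse_time_result("1200pm"))
```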
universal
The universal built-in grammars are:
- cancel: Recognizes a request to cancel the previous transaction.
- exit: Recognizes a request to exit the voice application.
- help: Recognizes a request for help or more information.
- operator: Recognizes a request to be transferred to a live operator.
- repeat
- wait
- goback
- startover
These universal grammars will only recognize the caller’s request, not perform it.
Results are returned in: _universal
Parameters:
- types - values to be returned separated by +
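Since the universal grammars only recognize the request, the application has to act on it. One common way to do that is a dispatch table, sketched below. It assumes the recognition result is available as a dictionary with the command name stored under _universal. The handler functions and return strings are placeholders.

```python
UNIVERSAL_COMMANDS = {
    "cancel", "exit", "help", "operator",
    "repeat", "wait", "goback", "startover",
}


def dispatch_universal(result: dict, handlers: dict):
    """Call the handler for a recognized universal command, if there is one."""
    command = result.get("_universal")
    if command is None:
        return None  # the utterance did not match a universal grammar
    if command not in UNIVERSAL_COMMANDS:
        raise ValueError(f"unknown universal command: {command!r}")
    # Raises KeyError if the application registered no handler for it.
    return handlers[command]()


handlers = {
    "help": lambda: "play help prompt",
    "operator": lambda: "transfer to agent",
}
print(dispatch_universal({"_universal": "help"}, handlers))
```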
zip
The zipcode grammar recognizes valid United States ZIP Codes in five-digit format. This grammar returns a zip text string for the code in whatever format is appropriate for the local postal services.