Web API methods that perform speech recognition (as opposed to transcription) do not use a large vocabulary Natural Language Model (NLM) and instead require grammars to define what utterances are possible and to assign semantic meaning (tags) to the recognized utterances.
The /asr/recognize Web API supports three types of grammars:
- GRXML - standard grammars
- JJSGF - JSON-wrapped JSGF grammars
- BUILT-IN - built-in grammars provided by Voicegain
There is also a fourth grammar type, which is specific to the GREG test platform.
A single /asr/recognize request takes an array of grammars, all of which are active at the same time. The name of the grammar that matched the utterance is returned in the result.
GRXML grammars are the XML form of SRGS (Speech Recognition Grammar Specification) grammars - the other form being ABNF.
Details of the specific GRXML syntax supported by the Voicegain platform are here.
JJSGF is a JSON version of JSGF (JSpeech Grammar Format). JJSGF grammars are equivalent to GRXML grammars.
You can read more about JJSGF grammars here.
Voicegain provides a set of commonly used built-in grammars. They are implemented as dynamic GRXML grammars whose content is controlled using parameters.
Below are all built-in grammars. Each grammar can be either speech or DTMF. Currently only en-us and es-es languages are supported.
It recognizes a connected alphanumeric string, such as "abc123". For example, it can be used to spell a person’s name, or to recognize a product number that contains both letters and digits. This grammar requires lowercase input.
Returns results in alphanum_str
- length - length of the expected alphanumeric string - cannot be used together with the other two parameters
The boolean built-in grammar accepts an affirmative or negative reply from the caller: “yes”, “no”, “true”, “false”, and so on. When invoking this grammar you can use URI parameters to assign any two touchtone buttons as synonyms for yes and no. This grammar returns a MEANING with a value of “true” or “false”.
- y, n - DTMF keys to assign to yes and no
The ccexpdate grammar understands the expiration date on a credit card. These dates generally omit the day, listing only the month and year. The grammar returns the date in YYYYMMDD format (the date is the last day of the month). If the caller specifies the day, the grammar accepts the utterance but automatically assigns the last day in that month, even if the caller specified a different day.
Returns results in year4, year2, month, month_by_name, MEANING
- maxallowed - number of months from today to include in the grammar
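The last-day-of-month rule described above can be illustrated with a small sketch. This is not Voicegain code: the function name `ccexpdate_meaning` is hypothetical, and the sketch only shows how a month and year map to a YYYYMMDD value under that rule.

```python
import calendar


def ccexpdate_meaning(year: int, month: int) -> str:
    """Format an expiration month as YYYYMMDD, using the last day of that month.

    Hypothetical helper for illustration; it mirrors the rule that any spoken
    day is replaced by the last day of the month.
    """
    # monthrange returns (weekday of the 1st, number of days in the month)
    last_day = calendar.monthrange(year, month)[1]
    return f"{year:04d}{month:02d}{last_day:02d}"


print(ccexpdate_meaning(2024, 2))  # 20240229 (leap-year February)
```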
The creditcard grammar understands a caller saying a credit card number, optionally preceding the number with the credit card name. It returns the MEANING text string with the recognized card number. The grammar also returns a CARDTYPE key identifying the type of credit card (visa, amex, and so on), based on the first digit in the card number. This grammar can be constrained to accept specific types of credit cards.
- typesallowed - allowed credit card types, given as a string of card names separated by +
Allowed values are: amex, discover, mastercard, visa
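As a rough sketch of the two ideas above, the snippet below builds a +-separated typesallowed value and guesses a card type from the first digit. The helper names are hypothetical, and the digit-to-type table is an assumption based on common card-network prefixes; the source only states that CARDTYPE is derived from the first digit.

```python
ALLOWED_CARD_TYPES = ("amex", "discover", "mastercard", "visa")

# Assumption: common leading digits of the major card networks.
# The actual mapping used by the grammar is not documented here.
FIRST_DIGIT_TO_TYPE = {"3": "amex", "4": "visa", "5": "mastercard", "6": "discover"}


def build_typesallowed(types):
    """Join card type names with '+', rejecting names outside the allowed list."""
    unknown = [t for t in types if t not in ALLOWED_CARD_TYPES]
    if unknown:
        raise ValueError(f"unsupported card types: {unknown}")
    return "+".join(types)


def card_type_from_number(number: str):
    """Return the assumed card type for a digit string, or None if unknown."""
    return FIRST_DIGIT_TO_TYPE.get(number[:1])


print(build_typesallowed(["visa", "amex"]))  # visa+amex
```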
The currency built-in grammar accepts currency amounts appropriate to the current language. The grammar can be constrained to accept only those amounts that fall within a given range.
The implementation of currency includes the following features:
• It allows values up to 999,999,999.99.
• In a DTMF grammar, a caller can press the star (*) key on the telephone keypad to indicate a decimal point.
This built-in returns an amount in the following basic format:
If the caller does not specify an amount for one of these fields, it is assigned a value of zero. This means that the built-in can accept amounts like “six dollars” or “forty-one cents”.
- minallowed, maxallowed - Setting an allowed range tells the recognizer to allow only amounts inside the range you specify. Amounts outside the range are not recognized as part of the vocabulary.
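The DTMF decimal-point convention and the allowed range can be mimicked on the client side, for example when testing prompts. The sketch below is an illustration only; `dtmf_to_amount` and `in_allowed_range` are hypothetical helpers, not part of the Voicegain API, and the default lower bound of zero is an assumption.

```python
from decimal import Decimal

# Upper limit stated for the currency grammar.
MAX_CURRENCY = Decimal("999999999.99")


def dtmf_to_amount(keys: str) -> Decimal:
    """Interpret a DTMF key sequence, treating '*' as the decimal point."""
    digits = keys.replace("*", "")
    if keys.count("*") > 1 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a valid DTMF amount: {keys!r}")
    return Decimal(keys.replace("*", "."))


def in_allowed_range(amount, minallowed=Decimal("0"), maxallowed=MAX_CURRENCY):
    """Check an amount against an inclusive minallowed/maxallowed range."""
    return minallowed <= amount <= maxallowed


amount = dtmf_to_amount("12*50")
print(amount, in_allowed_range(amount, Decimal("5"), Decimal("100")))
```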
The digits built-in grammar recognizes digit strings of up to 20 digits. It can be constrained to accept a specific list of valid strings, or a specific number of random digits. Punctuation characters such as hyphens (-), dots (.), and underscores (_) are not recognized. If spoken, they reduce recognition accuracy.
Note: It is strongly recommended that you use the length parameter or minlength and maxlength parameters to constrain this grammar, as doing so improves accuracy.
This grammar returns a MEANING key that contains a string of digits with no embedded spaces, such as “12345”.
- length - length of the expected digit string - cannot be used together with the other two parameters
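To make the length constraints concrete, here is a minimal sketch of a client-side check on a returned MEANING value. The function `check_digits` is hypothetical; it assumes that minlength and maxlength (named in the note above) are the two parameters that cannot be combined with length.

```python
MAX_DIGITS = 20  # the digits grammar recognizes strings of up to 20 digits


def check_digits(meaning, length=None, minlength=None, maxlength=None):
    """Return True if a digit string satisfies the given length constraints."""
    if length is not None and (minlength is not None or maxlength is not None):
        raise ValueError("length cannot be combined with minlength/maxlength")
    # Only plain ASCII digits are valid; punctuation such as '-' is rejected.
    if not (meaning.isascii() and meaning.isdigit()) or len(meaning) > MAX_DIGITS:
        return False
    if length is not None:
        return len(meaning) == length
    low = minlength if minlength is not None else 1
    high = maxlength if maxlength is not None else MAX_DIGITS
    return low <= len(meaning) <= high


print(check_digits("12345", length=5))  # True
```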
This date grammar recognizes the following:
- <specials> such as today, tomorrow, and yesterday
- <month> <date> <year>, the default date range is between 01/01/1900 and 12/31/2199. Any date in that range is recognized.
The date range can be changed by specifying minallowed and/or maxallowed in requests for this date grammar.
However, currently only <year> is dynamically generated based on the specified minallowed and/or maxallowed.
- <weekday> <month> <date>
- [the] <date> of <month> <year> refer to point 4.
Returns results in: YEAR, TWO_DIGIT_YEAR, MONTH, WEEKDAY, DAY, MEANING
- minallowed, maxallowed
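The specials and the default date range can be sketched with the standard library. This is an illustration under assumptions: the helper names are hypothetical, and how the recognizer itself resolves "today" (for example, which time zone it uses) is not covered by the text above.

```python
from datetime import date, timedelta

# Day offsets for the <specials> listed above.
SPECIAL_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}

# Default range stated for the date grammar.
DEFAULT_MIN = date(1900, 1, 1)
DEFAULT_MAX = date(2199, 12, 31)


def resolve_special(word: str, today: date) -> date:
    """Turn a special word into a calendar date relative to `today`."""
    return today + timedelta(days=SPECIAL_OFFSETS[word])


def in_date_range(d: date, minallowed=DEFAULT_MIN, maxallowed=DEFAULT_MAX) -> bool:
    """Check a date against an inclusive minallowed/maxallowed range."""
    return minallowed <= d <= maxallowed


reference = date(2024, 12, 31)
print(resolve_special("tomorrow", reference))  # 2025-01-01
```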
The number grammar recognizes spoken numbers, that is, numbers expressed as words rather than as a digit string: “four hundred fifty-three” rather than “four five three”. This grammar accepts decimal places up to a specified limit, and can be constrained to various ranges and multipliers (tens, hundreds, thousands, and so on).
This grammar returns a MEANING with digits with no embedded spaces, with or without decimals (“12345.678”).
- minAllowed Minimum number (default 0)
- maxAllowed Maximum number (default 999,999,999.99)
- maxDecimal Maximum number of decimal places (default 2, maximum 9)
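The interaction of the three parameters can be shown with a short sketch that checks a returned MEANING string against them. The function `check_number` is hypothetical, and it assumes the MEANING value is a plain decimal string such as “12345.678”.

```python
from decimal import Decimal


def check_number(meaning, min_allowed=Decimal("0"),
                 max_allowed=Decimal("999999999.99"), max_decimal=2):
    """Return True if a numeric string fits the range and decimal-place limit."""
    if not 0 <= max_decimal <= 9:
        raise ValueError("maxDecimal must be between 0 and 9")
    value = Decimal(meaning)
    # A negative exponent gives the number of digits after the decimal point.
    decimals = max(0, -value.as_tuple().exponent)
    return min_allowed <= value <= max_allowed and decimals <= max_decimal


print(check_number("12345.678"))                 # False with the default maxDecimal of 2
print(check_number("12345.678", max_decimal=3))  # True
```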
The phone built-in grammar accepts local and long distance telephone numbers, as well as variations for special numbers (for example, “911” for emergency in US English). The grammar accepts extension numbers if needed, and can limit the length of these extension numbers to a set number of digits.
This grammar returns a MEANING key containing the phone number as a string of digits with no embedded spaces or punctuation. The number of digits in the string depends on the local standards.
Returned results: phone, extension
Routing number grammar
Returns results in: routing_number
Recognizes 9-digit US Social Security numbers.
Returns results in: ssn
This grammar recognizes the following:
- <qualifier> <12-hour> <minute> [<section_of_day>]
- [<qualifier>] <12-hour> <section_of_day>
- [<qualifier>] <minute> [minute|minutes] (past|to|after|before) <12-hour>
- <24-hour> <minute>, note that <24-hour> refers to 12-23.
Returns results in: time (e.g. 1200pm), hours, minutes, meridiem (e.g. pm)
- minallowed, maxallowed
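A returned time value such as 1200pm can be split into its parts on the client side. The sketch below assumes the time string always has the form of one or two hour digits, two minute digits, and am or pm, as in the example above; other formats are not handled. The function name and the extra `hours24` field are hypothetical.

```python
import re

# Assumed layout: <12-hour><2-digit minutes><am|pm>, e.g. "1200pm" or "930am".
TIME_RE = re.compile(r"^(\d{1,2})(\d{2})(am|pm)$")


def parse_time_result(time_str: str) -> dict:
    """Split a time string into hours, minutes, meridiem, and a 24-hour value."""
    match = TIME_RE.match(time_str)
    if not match:
        raise ValueError(f"unexpected time format: {time_str!r}")
    hours, minutes, meridiem = int(match.group(1)), int(match.group(2)), match.group(3)
    if not (1 <= hours <= 12 and 0 <= minutes <= 59):
        raise ValueError(f"time out of range: {time_str!r}")
    # 12am maps to 0 and 12pm maps to 12 on a 24-hour clock.
    hours24 = hours % 12 + (12 if meridiem == "pm" else 0)
    return {"hours": hours, "minutes": minutes, "meridiem": meridiem, "hours24": hours24}


print(parse_time_result("1200pm"))
```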
The universal built-in grammars are:
- cancel: Recognizes a request to cancel the previous transaction.
- exit: Recognizes a request to exit the voice application.
- help: Recognizes a request for help or more information.
- operator: Recognizes a request to be transferred to a live operator.
These universal grammars will only recognize the caller’s request, not perform it.
Results are returned in: _universal
- types - values to be returned separated by +
The zipcode grammar recognizes valid United States ZIP Codes in five-digit format. This grammar returns a zip text string for the code in whatever format is appropriate for the local postal services.