Normally, when the /asr/recognize/async API is used, the recognizer returns once the grammar is matched and the complete timeout expires. This means that only a single recognition result can be obtained per /asr/recognize/async request. If a NOMATCH or NOINPUT is detected, the recognition terminates.
However, some use cases require the recognizer to, for example, ignore all NOMATCH results until a MATCH is found. This is what the continuous recognition option is for.
Example request
Below is a sample API request that will start continuous recognition:
{
  "sessions": [{
    "asyncMode": "REAL-TIME",
    "continuousRecognition": {
      "enable": true,
      "stopOn": ["ERROR"],
      "noCallbackFor": ["NOMATCH"]
    },
    "callback": { "uri": "http://my-host.com/recognition-result?sid={sessionId}" }
  }],
  "audio": {
    "source": { "stream": { "protocol": "WEBSOCKET" } },
    "format": "L16",
    "rate": 8000,
    "channel": "mono",
    "capture": false
  },
  "settings": {
    "asr": {
      "noInputTimeout": 5000,
      "incompleteTimeout": 3000,
      "completeTimeout": 1500,
      "grammars": [{
        "type": "JJSGF",
        "grammar": "my-zip-grammar",
        "public": {
          "root": "(<digit> {d1=rules.digit.d;}) (<digit> {d2=rules.digit.d;}) (<digit> {d3=rules.digit.d;}) (<digit> {d4=rules.digit.d;}) (<digit> {d5=rules.digit.d;}) {out.zip='zip'+d1+d2+d3+d4+d5;}"
        },
        "rules": {
          "digit": "(zero {out.d='0';}) | (one {out.d='1';}) | (two {out.d='2';}) | (three {out.d='3';}) | (four {out.d='4';}) | (five {out.d='5';}) | (six {out.d='6';}) | (seven {out.d='7';}) | (eight {out.d='8';}) | (nine {out.d='9';});"
        }
      }]
    }
  }
}
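The same request body can also be assembled programmatically. The Python sketch below rebuilds it as a dict, generating the repetitive digit and zip grammar rules with loops so that stopOn and noCallbackFor can be passed in as parameters. The function names are hypothetical. The post_request helper is an assumption as well: it presumes the body is POSTed as JSON to the /asr/recognize/async path under your API base URL with a bearer token, so check the Voicegain API reference for the actual host and authentication before relying on it.

```python
import json
import urllib.request

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def digit_rule() -> str:
    """Build the 'digit' rule: each spoken digit sets out.d to its numeral."""
    alternatives = [f"({word} {{out.d='{i}';}})" for i, word in enumerate(DIGIT_WORDS)]
    return " | ".join(alternatives) + ";"


def zip_root_rule(length: int = 5) -> str:
    """Build the root rule: `length` digits concatenated into out.zip."""
    parts = [f"(<digit> {{d{n}=rules.digit.d;}})" for n in range(1, length + 1)]
    concat = "+".join(f"d{n}" for n in range(1, length + 1))
    return " ".join(parts) + f" {{out.zip='zip'+{concat};}}"


def build_request(callback_uri: str, stop_on, no_callback_for) -> dict:
    """Return the continuous-recognition request body shown above."""
    return {
        "sessions": [{
            "asyncMode": "REAL-TIME",
            "continuousRecognition": {
                "enable": True,
                "stopOn": list(stop_on),
                "noCallbackFor": list(no_callback_for),
            },
            "callback": {"uri": callback_uri},
        }],
        "audio": {
            "source": {"stream": {"protocol": "WEBSOCKET"}},
            "format": "L16",
            "rate": 8000,
            "channel": "mono",
            "capture": False,
        },
        "settings": {
            "asr": {
                "noInputTimeout": 5000,
                "incompleteTimeout": 3000,
                "completeTimeout": 1500,
                "grammars": [{
                    "type": "JJSGF",
                    "grammar": "my-zip-grammar",
                    "public": {"root": zip_root_rule()},
                    "rules": {"digit": digit_rule()},
                }],
            }
        },
    }


def post_request(payload: dict, api_base: str, token: str) -> dict:
    """ASSUMPTION: POST the JSON body to {api_base}/asr/recognize/async
    with a bearer token and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{api_base}/asr/recognize/async",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Plain string (not an f-string), so {sessionId} is kept literally.
    body = build_request("http://my-host.com/recognition-result?sid={sessionId}",
                         stop_on=["ERROR"], no_callback_for=["NOMATCH"])
    print(json.dumps(body, indent=2))
```

Generating the rules this way keeps the grammar in sync with the digit count; the printed body should match the hand-written JSON request.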
The continuousRecognition option has three elements:
- enable: turns continuous recognition on or off (default: false, i.e. off).
- stopOn: list of events that terminate continuous recognition. Possible values are NOINPUT, NOMATCH, MATCH, and ERROR. By default, recognition stops on either MATCH or ERROR; on NOINPUT or NOMATCH, recognition continues.
- noCallbackFor: list of event types for which no callback is made.
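To make the interaction between stopOn and noCallbackFor concrete, here is a small local model of the rules listed above, written in Python. It does not call the API; it only replays a hypothetical sequence of recognition events and reports which ones would trigger a callback and which one would end recognition. Two points are assumptions rather than documented behavior: an event listed in noCallbackFor produces no callback even if it also stops recognition, and otherwise the stopping event itself does produce a callback.

```python
from typing import Iterable, List, Optional, Tuple

EVENT_TYPES = {"NOINPUT", "NOMATCH", "MATCH", "ERROR"}
DEFAULT_STOP_ON = ("MATCH", "ERROR")  # documented default for stopOn


def simulate(events: Iterable[str],
             stop_on: Optional[Iterable[str]] = None,
             no_callback_for: Iterable[str] = ()) -> Tuple[List[str], Optional[str]]:
    """Replay events under continuous recognition (enable = true).

    Returns (events that would produce a callback,
             event that stopped recognition, or None if it never stopped).
    """
    stop = set(DEFAULT_STOP_ON if stop_on is None else stop_on)
    silent = set(no_callback_for)
    callbacks: List[str] = []
    for event in events:
        if event not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {event}")
        # ASSUMPTION: noCallbackFor suppresses the callback even for a stopping event.
        if event not in silent:
            callbacks.append(event)
        if event in stop:
            return callbacks, event
    return callbacks, None


if __name__ == "__main__":
    # Settings from the example request: stopOn ERROR, no callback for NOMATCH.
    print(simulate(["NOMATCH", "MATCH", "NOMATCH", "MATCH", "ERROR"],
                   stop_on=["ERROR"], no_callback_for=["NOMATCH"]))
```

With the example request's settings, NOMATCH events are silently skipped, every MATCH is reported, and only the ERROR ends the session. With the defaults, the first MATCH ends recognition, and any earlier NOINPUT or NOMATCH is still reported by callback.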
Example Use Case
An example is a call where a voicemail is being played to the caller, and during playback we want to interpret caller commands such as stop, next, previous, save, or delete. With normal recognition, we would encounter situations where what was said is not understood. Stopping recognition on NOMATCH would not make much sense, because either (1) re-prompting would disrupt the flow of the call, or (2) restarting recognition might introduce a gap during which part of what the caller said is missed.
In a scenario like this, it is best to ignore NOMATCH and keep listening: the caller will notice that there was no response and will naturally repeat the command.
Continuous recognition settings that would work in this case are:
- enable : true
- stopOn : ERROR, MATCH
- noCallbackFor : NOINPUT, NOMATCH
Notes: (1) in this case we suggest setting noInputTimeout to a very long value so that no NOINPUT events are generated internally; (2) the application could also decide to accept NOMATCH callbacks, track them, and act on them if they become too numerous.
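A Python sketch of these settings follows, assuming the same JJSGF grammar style as the zip-code example. Everything here beyond the three continuousRecognition values is hypothetical: the command grammar and its name voicemail-commands, the one-hour noInputTimeout value (check the allowed range in the API reference), and the NoMatchTracker class. The tracker illustrates note (2): if the application keeps NOMATCH callbacks, it can count consecutive ones and react, for example by playing a help prompt, once a threshold is reached.

```python
COMMANDS = ["stop", "next", "previous", "save", "delete"]
LONG_NO_INPUT_TIMEOUT_MS = 3_600_000  # ASSUMPTION: a "very long" value; verify the API's limit


def voicemail_continuous_settings(track_nomatch: bool = False) -> dict:
    """continuousRecognition block for the voicemail use case.

    With track_nomatch=True, NOMATCH callbacks are kept (note 2).
    """
    no_callback_for = ["NOINPUT"] if track_nomatch else ["NOINPUT", "NOMATCH"]
    return {"enable": True, "stopOn": ["ERROR", "MATCH"], "noCallbackFor": no_callback_for}


def command_grammar() -> dict:
    """Hypothetical JJSGF grammar: each command word sets out.cmd."""
    alternatives = " | ".join(f"({c} {{out.cmd='{c}';}})" for c in COMMANDS) + ";"
    return {
        "type": "JJSGF",
        "grammar": "voicemail-commands",  # hypothetical grammar name
        "public": {"root": "<command> {out.cmd=rules.command.cmd;}"},
        "rules": {"command": alternatives},
    }


def asr_settings() -> dict:
    """settings.asr block: long noInputTimeout (note 1) plus the command grammar."""
    return {"noInputTimeout": LONG_NO_INPUT_TIMEOUT_MS, "grammars": [command_grammar()]}


class NoMatchTracker:
    """Counts consecutive NOMATCH callbacks; a MATCH resets the count."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self.consecutive = 0

    def on_event(self, event: str) -> bool:
        """Record one callback event; return True once the NOMATCH limit is reached."""
        if event == "NOMATCH":
            self.consecutive += 1
        elif event == "MATCH":
            self.consecutive = 0
        return self.consecutive >= self.limit

    def reset(self) -> None:
        """Call after acting on the limit, e.g. after playing a help prompt."""
        self.consecutive = 0
```

Resetting on MATCH means the tracker only reacts to a run of misunderstood utterances, not to an occasional NOMATCH between successful commands.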
Comments
It would be nice to have example code for implementations, such as a Gist or repo for solutions like this.
Or even an ecosystem page where one can post, share, and find scripts designed for the schema object model required by the Voicegain platform.
An example of a simple Web Application that shows how to use the API for Continuous Recognition is available here: platform/examples/command-grammar-web-app at master · voicegain/platform (github.com)
A Python script that illustrates continuous recognition can be found here: platform/async-real-time-grammar-continuous.py at master · voicegain/platform (github.com)