In this example, the base callback URL for the application is /speech/dynamic-test. This URL is provided by the user of the Voicegain Telephony Bot API. The service behind it can be hosted, e.g., on a Node.js server, in Python Flask, or as an AWS Lambda function.

This web service needs to implement three HTTP methods on the same URL:

POST - for the first request in a new telephone call
PUT - for all intermediate requests during the call
- Note: the csid can be included in the PUT request in one of three places (configurable in the Web Console):
  - path, e.g. PUT /speech/dynamic-test/fa8ded81-0564-4895-ac09-d9e4ced3006c?seq=2
  - query, e.g. PUT /speech/dynamic-test?csid=fa8ded81-0564-4895-ac09-d9e4ced3006c&seq=2
  - body, e.g. {"csid" : "fa8ded81-0564-4895-ac09-d9e4ced3006c", ..... }
DELETE - for the last request in the call
- The DELETE request also includes the csid
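Because the csid can arrive in any of these three places, a handler that does not want to depend on the Web Console setting can simply check all of them. Below is a minimal Python sketch using only the standard library; the function name `extract_csid` and the choice to check every location are illustrative assumptions, not part of the Voicegain API.

```python
import json
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Base callback path from this walkthrough; adjust to your own deployment.
BASE_PATH = "/speech/dynamic-test"


def extract_csid(path: str, body: bytes = b"") -> Optional[str]:
    """Return the csid of a PUT/DELETE callback, or None if it is absent.

    Voicegain puts the csid in the path, the query string or the JSON body,
    depending on the Web Console setting; this sketch checks all three.
    """
    parts = urlsplit(path)

    # Path form: /speech/dynamic-test/<csid>?seq=2
    prefix = BASE_PATH + "/"
    if parts.path.startswith(prefix) and len(parts.path) > len(prefix):
        return parts.path[len(prefix):]

    # Query form: /speech/dynamic-test?csid=<csid>&seq=2
    query = parse_qs(parts.query)
    if "csid" in query:
        return query["csid"][0]

    # Body form: {"csid": "<csid>", ...}
    if body:
        payload = json.loads(body)
        if isinstance(payload, dict) and "csid" in payload:
            return payload["csid"]
    return None
```

For example, `extract_csid("/speech/dynamic-test?csid=abc&seq=2")` returns `"abc"`.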

NOTE: the JSON parser used by the Telephony Bot App back-end currently cannot handle mixed quotes, so do not submit JSON responses containing, e.g.,

"text" : 'Welcome back'

The API documentation can be found here (you need to be logged in to the Voicegain Web Console): https://console.voicegain.ai/api-documentation#tag/aivr-callback
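Regarding the mixed-quotes note above: the simplest way to avoid the problem is to build each response as a data structure and let a JSON serializer produce the text, rather than assembling JSON strings by hand. A short Python illustration using the standard `json` module:

```python
import json

# Python accepts either quote style for string literals, but json.dumps
# always writes JSON strings with double quotes, so the output is valid JSON.
response = {"text": 'Welcome back'}
body = json.dumps(response)
print(body)  # {"text": "Welcome back"}
```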

1: New call started

Voicegain invokes your webhook: POST /speech/dynamic-test

with payload:

{
  "sid":"e02d682e-b0d8-4b08-8121-324ec5ee140a",
  "sequence":1,
  "ani":"+1817400xxxx",
  "dnis":"+1817573xxxx",
  "startTime":"2020-08-06T22:26:02.717Z",
  "logicType":"inbound",
  "media":"speech",
  "userAppData":"n/a"
}

The callback application creates its own session ID (csid) fa8ded81-0564-4895-ac09-d9e4ced3006c and sends the following response back to Voicegain (note: do not send back the entire content of the request):

{
  "csid":"fa8ded81-0564-4895-ac09-d9e4ced3006c",
  "sid":"e02d682e-b0d8-4b08-8121-324ec5ee140a",
  "sequence":1,
  "prompt":{
    "text":"Welcome to Call Recorder",
    "audioProperties":{"voice":"catherine"}
  }
}

This response instructs Voicegain to play the prompt "Welcome to Call Recorder" using the TTS voice "catherine".

Note that this is not an HTTP request that you make to the Voicegain API, but a response to an HTTP request that Voicegain made to your webhook service.
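The step above can be sketched as a small Python function that turns the POST payload into the response. It echoes only sid and sequence and adds the prompt action. Generating the csid with `uuid.uuid4()` is an assumption based on the UUID-shaped csid in the example; any unique session ID your application tracks would do. The function name `handle_new_call` is hypothetical.

```python
import uuid


def handle_new_call(request: dict) -> dict:
    """Build the response to the initial POST callback of a new call.

    Only csid, sid and sequence plus the action are returned; the rest of
    the request (ani, dnis, startTime, ...) is deliberately not echoed.
    """
    csid = str(uuid.uuid4())  # the callback application's own session id
    return {
        "csid": csid,
        "sid": request["sid"],
        "sequence": request["sequence"],
        "prompt": {
            "text": "Welcome to Call Recorder",
            "audioProperties": {"voice": "catherine"},
        },
    }
```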

2: First prompt played

After the requested prompt has finished playing, Voicegain makes the second callback.

Voicegain invokes: PUT /speech/dynamic-test/fa8ded81-0564-4895-ac09-d9e4ced3006c?seq=2

with payload:

{
  "sid":"e02d682e-b0d8-4b08-8121-324ec5ee140a",
  "events":[
    {
      "type":"output",
      "timeMsec":1780,
      "logicType":"inbound",
      "sequence":"1",
      "text":"Welcome to Call Recorder",
      "endReason":"completed",
      "method":"vui"
    }
  ]
}

The callback application responds with a request to ask a question. It sets up the built-in phone grammar in both speech and DTMF form. The "name" value is set to "phone"; it is used as the name of the variable that stores the semantic meaning returned from the grammar.

Within a question you can provide two prompts (each is optional, but you must provide at least one):

"text" - this prompt is always played in full (no barge-in). If the caller says something during this prompt, it is not captured. If you do not want a non-barge-in prompt, omit the "text" field completely.
"questionPrompt" (inside "audioResponse") - this is the barge-in-enabled question prompt. Speech recognition starts as soon as this prompt starts playing. If you do not want a barge-in prompt, omit the "questionPrompt" field completely.

NOTE: if you do not provide the "audioResponse.grammar" field then a large vocabulary transcription will be made rather than grammar-based recognition.

{
  "csid":"fa8ded81-0564-4895-ac09-d9e4ced3006c",
  "sid":"e02d682e-b0d8-4b08-8121-324ec5ee140a",
  "sequence":"2",
  "question":{
    "name":"phone",
    "text":"First the phone number.",
    "audioProperties":{"voice":"catherine"},
    "audioResponse":{
      "questionPrompt":"Say or Enter the number you want to call",
      "grammar":[
        {"type":"BUILT-IN","name":"phone"},
        {"type":"BUILT-IN","name":"dtmf/phone"}
      ]
    }
  }
}

"text" is optional, so if there is no need for a non-barge-in part, you can return this response:

{
  "csid":"fa8ded81-0564-4895-ac09-d9e4ced3006c",
  "sid":"e02d682e-b0d8-4b08-8121-324ec5ee140a",
  "sequence":"2",
  "question":{
    "name":"phone",
    "audioProperties":{"voice":"catherine"},
    "audioResponse":{
      "questionPrompt":"Say or Enter the number you want to call",
      "grammar":[
        {"type":"BUILT-IN","name":"phone"},
        {"type":"BUILT-IN","name":"dtmf/phone"}
      ]
    }
  }
}
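The rules above (at least one prompt, omit a field entirely rather than sending it empty, no grammar means large vocabulary transcription) can be captured in a small builder. This is a hedged Python sketch; `build_question` and its parameter names are hypothetical, and always including "audioResponse" simply mirrors the examples in this walkthrough.

```python
from typing import Optional


def build_question(csid: str, sid: str, sequence: str, name: str,
                   text: Optional[str] = None,
                   question_prompt: Optional[str] = None,
                   grammar: Optional[list] = None,
                   voice: str = "catherine") -> dict:
    """Build a 'question' response.

    text            -> played in full, no barge-in (field omitted when None)
    question_prompt -> barge-in enabled, recognition starts with it (omitted when None)
    grammar         -> omitted when None, which requests large vocabulary transcription
    """
    if text is None and question_prompt is None:
        raise ValueError("a question needs at least one of text or question_prompt")

    audio_response: dict = {}
    if question_prompt is not None:
        audio_response["questionPrompt"] = question_prompt
    if grammar is not None:
        audio_response["grammar"] = grammar

    question: dict = {"name": name, "audioProperties": {"voice": voice}}
    if text is not None:
        question["text"] = text
    question["audioResponse"] = audio_response
    return {"csid": csid, "sid": sid, "sequence": sequence, "question": question}


# Built-in phone grammar in speech and DTMF form, as in the examples.
PHONE_GRAMMAR = [
    {"type": "BUILT-IN", "name": "phone"},
    {"type": "BUILT-IN", "name": "dtmf/phone"},
]
```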

You can use the following SSML elements in the prompt text:

break, e.g., "Items: one <break time='3000ms'/>, two, three, four"

3: Phone number recognized

After the caller answers with the phone number, Voicegain reports the result to the callback.

Voicegain invokes: PUT /speech/dynamic-test/fa8ded81-0564-4895-ac09-d9e4ced3006c?seq=3

with payload:

{
  "sid":"e02d682e-b0d8-4b08-8121-324ec5ee140a",
  "events":[
    {
      "type":"output",
      "timeMsec":4115,
      "logicType":"inbound",
      "sequence":"2.1",
      "text":"Say or Enter the number you want to call",
      "endReason":"completed",
      "method":"vui"
    },
    {
      "type":"input",
      "timeMsec":15765,
      "logicType":"inbound",
      "sequence":"2.2",
      "name":"phone",
      "vuiAlternatives":[
        {
          "utterance":"4 6 9 4 5 1 ninety 3 8 5",
          "confidence":0.8604967594146729,
          "grammar":"phone",
          "semanticTags":{
            "input":"speech",
            "phone":"4694519385"
          }
        }
      ],
      "method":"vui",
      "vuiResult":"MATCH"
    }
  ],
  "vars":{
    "phone.input":"speech",
    "phone.phone":"4694519385"
  }
}

The callback application requests playing a "Got it" prompt:

{
  "csid":"fa8ded81-0564-4895-ac09-d9e4ced3006c",
  "sid":"e02d682e-b0d8-4b08-8121-324ec5ee140a",
  "sequence":"3",
  "prompt":{
    "text":"Got it.",
    "audioProperties":{"voice":"catherine"}
  }
}
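One way to produce this response is to check the "input" event for a MATCH and read the recognized value from "vars" (the "phone.phone" key, i.e. question name plus semantic tag). A hedged Python sketch; the function names are hypothetical, and the fallback prompt for a failed match is an invented placeholder, since this walkthrough does not cover that case.

```python
from typing import Optional


def input_matched(payload: dict, name: str) -> bool:
    """True if an 'input' event for the named question reports vuiResult MATCH."""
    return any(
        ev.get("type") == "input"
        and ev.get("name") == name
        and ev.get("vuiResult") == "MATCH"
        for ev in payload.get("events", [])
    )


def recognized_value(payload: dict, var: str) -> Optional[str]:
    """Look up a result variable such as 'phone.phone' in the payload's vars."""
    return payload.get("vars", {}).get(var)


def handle_phone_answer(payload: dict, csid: str, sequence: str) -> dict:
    """Reply with 'Got it.' when the phone question matched."""
    if input_matched(payload, "phone") and recognized_value(payload, "phone.phone"):
        text = "Got it."
    else:
        # Hypothetical fallback: the walkthrough does not cover a failed match.
        text = "Sorry, I did not get that."
    return {
        "csid": csid,
        "sid": payload["sid"],
        "sequence": sequence,
        "prompt": {"text": text, "audioProperties": {"voice": "catherine"}},
    }
```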

4: "Got it" played

After the requested prompt has finished playing, Voicegain makes the fourth callback.

Voicegain invokes: PUT /speech/dynamic-test/fa8ded81-0564-4895-ac09-d9e4ced3006c?seq=4

with payload:

{
  "sid":"e02d682e-b0d8-4b08-8121-324ec5ee140a",
  "vars":{
    "phone.input":"speech",
    "phone.phone":"4694519385"
  },
  "events":[
    {
      "type":"output",
      "timeMsec":16349,
      "logicType":"inbound",
      "sequence":"3",
      "text":"Got it.",
      "endReason":"completed",
      "method":"vui"
    }
  ]
}

The callback application takes the phone.phone value from vars and requests a transfer to that phone number:

{
  "csid":"fa8ded81-0564-4895-ac09-d9e4ced3006c",
  "sid":"e02d682e-b0d8-4b08-8121-324ec5ee140a",
  "sequence":"4",
  "transfer":{
    "prompt":{
      "text":"Dialing: <say-as interpret-as=\"telephone\" format=\"1\">4694519385</say-as>",
      "audioProperties":{"voice":"catherine"}
    },
    "phone":{"phoneNumber":"4694519385"}
  }
}
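Building this transfer response in code avoids escaping mistakes in the SSML attributes: a JSON serializer escapes the inner double quotes as \", exactly as in the example above. A hedged Python sketch; `build_transfer` is a hypothetical name, and it assumes "phone.phone" is present in vars.

```python
import json


def build_transfer(payload: dict, csid: str, sequence: str) -> dict:
    """Transfer the caller to the number stored in vars['phone.phone']."""
    number = payload["vars"]["phone.phone"]
    # SSML say-as, as in the example, so TTS reads the digits as a phone number.
    text = f'Dialing: <say-as interpret-as="telephone" format="1">{number}</say-as>'
    return {
        "csid": csid,
        "sid": payload["sid"],
        "sequence": sequence,
        "transfer": {
            "prompt": {"text": text, "audioProperties": {"voice": "catherine"}},
            "phone": {"phoneNumber": number},
        },
    }


payload = {
    "sid": "e02d682e-b0d8-4b08-8121-324ec5ee140a",
    "vars": {"phone.input": "speech", "phone.phone": "4694519385"},
}
# json.dumps escapes the quotes inside the say-as tag in the serialized body.
body = json.dumps(build_transfer(payload, "fa8ded81-0564-4895-ac09-d9e4ced3006c", "4"))
print(body)
```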

5: Transfer success

Voicegain reports the success of the transfer.

Voicegain invokes: PUT /speech/dynamic-test/fa8ded81-0564-4895-ac09-d9e4ced3006c?seq=5

with payload:

{
  "sid":"e02d682e-b0d8-4b08-8121-324ec5ee140a",
  "events":[
    {
      "type":"transfer",
      "timeMsec":24843,
      "logicType":"inbound",
      "sequence":"4.1",
      "transferType":"phone",
      "method":"vui",
      "transferDestination":"4694519385",
      "text": "Dialing: <say-as interpret-as=\"telephone\" format=\"1\">4694519385</say-as>"
    },
    {
      "type":"transfer",
      "timeMsec":27000,
      "logicType":"inbound",
      "sequence":"4.2",
      "transferType":"phone",
      "method":"vui",
      "transferDestination":"4694519385",
      "outcome":"success",
      "outcomeDetail":"SUCCESS"
    }
  ],
  "vars":{
    "phone.input":"speech",
    "phone.phone":"4694519385"
  }
}

Notice that the transfer is reported in two events: the first acknowledges that the transfer prompt was played, and the second reports the outcome of the transfer. The two events have different timeMsec values.

The callback application simply acknowledges receipt of the callback. There is nothing to request, so the response payload contains no action:

{
  "csid":"fa8ded81-0564-4895-ac09-d9e4ced3006c",
  "sid":"e02d682e-b0d8-4b08-8121-324ec5ee140a",
  "sequence":"5"
}
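Since the transfer result arrives in a separate event from the prompt acknowledgement, a handler should look for the event that carries "outcome" rather than taking the first transfer event. A hedged Python sketch of that lookup plus the no-action acknowledgement; both function names are hypothetical.

```python
from typing import Optional


def transfer_outcome(payload: dict) -> Optional[str]:
    """Return the outcome of the transfer, or None if it has not been reported.

    The first transfer event only acknowledges the transfer prompt; the result
    comes in a separate event that carries 'outcome' (and 'outcomeDetail').
    """
    for ev in payload.get("events", []):
        if ev.get("type") == "transfer" and "outcome" in ev:
            return ev["outcome"]
    return None


def acknowledge(payload: dict, csid: str, sequence: str) -> dict:
    """Response with no action: just csid, sid and sequence."""
    return {"csid": csid, "sid": payload["sid"], "sequence": sequence}
```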

6: Call disconnect

The call ends after the two bridged parties finish talking.

Voicegain invokes: DELETE /speech/dynamic-test/fa8ded81-0564-4895-ac09-d9e4ced3006c?seq=6

with payload:

{
  "sid":"e02d682e-b0d8-4b08-8121-324ec5ee140a",
  "events":[
    {
      "type":"hangup",
      "timeMsec":29591,
      "logicType":"inbound",
      "sequence":"5"
    }
  ],
  "vars":{
    "phone.input":"speech",
    "phone.phone":"4694519385"
  }
}

The callback application acknowledges the normal termination of the call:

{
  "csid":"fa8ded81-0564-4895-ac09-d9e4ced3006c",
  "sid":"e02d682e-b0d8-4b08-8121-324ec5ee140a",
  "sequence":"6",
  "termination":"normal"
}
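A DELETE handler is also a natural place to release any per-call state. In the examples, the response's "sequence" matches the seq query parameter of the callback URL, so this hedged Python sketch reads it from there (that correspondence is inferred from the examples). The in-memory `SESSIONS` store and the function names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical in-memory per-call state, keyed by csid.
SESSIONS: dict = {}


def seq_param(path: str) -> str:
    """Read the seq query parameter, e.g. '6' from '...?seq=6'."""
    return parse_qs(urlsplit(path).query)["seq"][0]


def handle_hangup(path: str, payload: dict, csid: str) -> dict:
    """Answer the final DELETE callback and discard the call's state."""
    SESSIONS.pop(csid, None)
    return {
        "csid": csid,
        "sid": payload["sid"],
        "sequence": seq_param(path),
        "termination": "normal",
    }
```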

Other Examples

Large Vocabulary Recognition

For large vocabulary recognition, simply omit the grammar parameter, as in the example below, which also illustrates the use of hints:

{
  "csid": "mobid9-lmps973g-zhhdf7-7287dtk",
  "sid": "3ca9d8a5-ce9c-4264-ac8b-7e25b3438466",
  "sequence": 1,
  "question": {
    "audioProperties": { "voice": "Catherine" },
    "text": "Hello, I am calling from Huber.",
    "audioResponse": {
      "questionPrompt": "say chicken biryiani",
      "noInputTimeout": 8000,
      "completeTimeout": 1200,
      "hints" : ["chicken biryiani:10"]
    } 
  }
}
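A hedged Python sketch of the same request: the "grammar" key is simply never set. The hint strings follow the "phrase:number" shape of the example; reading the number as a boost weight is an assumption, as is treating the timeout values (copied from the example) as milliseconds. `build_lv_question` is a hypothetical name.

```python
from typing import Dict, Optional


def build_lv_question(csid: str, sid: str, sequence: int, question_prompt: str,
                      hints: Dict[str, int], text: Optional[str] = None,
                      voice: str = "Catherine") -> dict:
    """Question without a 'grammar' field, i.e. large vocabulary recognition."""
    audio_response = {
        "questionPrompt": question_prompt,
        "noInputTimeout": 8000,   # values copied from the example (assumed ms)
        "completeTimeout": 1200,
        # "phrase:number" strings, matching the hints format in the example
        "hints": [f"{phrase}:{weight}" for phrase, weight in hints.items()],
    }
    question: dict = {"audioProperties": {"voice": voice}, "audioResponse": audio_response}
    if text is not None:
        question["text"] = text
    return {"csid": csid, "sid": sid, "sequence": sequence, "question": question}
```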

Recognition using JJSGF Grammar

Below is an example of the use of a JJSGF grammar. Note that to get the recognition result you need to give the question a name; that name becomes the name of the variable that stores the semantic result of the recognition, e.g., "vars":{"qest1":"yes"}.

{
  "csid": "mobid9-lmps973g-zhhdf7-7287dtk",
  "sid": "98ddcff8-3424-439f-b2db-99696833f678",
  "sequence": 1,
  "question": {
    "audioProperties": { "voice": "Catherine" },
    "audioResponse": {
      "name" : "qest1",
      "questionPrompt": "Hello, I am calling from Huber. say yes or no",
      "noInputTimeout": 8000,
      "completeTimeout": 1200,
      "incompleteTimeout": 2000,
      "grammar" : [
        {
          "type": "JJSGF",
          "parameters": { "tag-format": "semantics/1.0-literals" },
          "grammar": "yes_no",
          "public": {
            "root": "(<yes_phrase> {yes}) | (<no_phrase> {no})"
          },
          "rules": { 
            "yes_phrase": "[sure|ok|okay] ([<yes> [<yes>]] <yes> )",
            "yes": "(yes|yeah|yup)",
            "no_phrase": "([<no> [<no>]] <no> ) [thanks|(thank you)]",
            "no": "(no|nope)"
          }
        }
      ]
    } 
  }
}
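When several questions share similar grammars, it can help to assemble the JJSGF entry in code. This hedged Python sketch reproduces exactly the fields used in the example and reads the semantic result back from "vars" under the question's name; the helper names are hypothetical.

```python
from typing import Dict, Optional


def jjsgf_grammar(name: str, root: str, rules: Dict[str, str],
                  tag_format: str = "semantics/1.0-literals") -> dict:
    """Assemble a JJSGF grammar entry with the same fields as the example."""
    return {
        "type": "JJSGF",
        "parameters": {"tag-format": tag_format},
        "grammar": name,
        "public": {"root": root},
        "rules": rules,
    }


YES_NO = jjsgf_grammar(
    "yes_no",
    "(<yes_phrase> {yes}) | (<no_phrase> {no})",
    {
        "yes_phrase": "[sure|ok|okay] ([<yes> [<yes>]] <yes> )",
        "yes": "(yes|yeah|yup)",
        "no_phrase": "([<no> [<no>]] <no> ) [thanks|(thank you)]",
        "no": "(no|nope)",
    },
)


def question_result(payload: dict, question_name: str) -> Optional[str]:
    """Semantic result stored under the question's name, e.g. vars['qest1']."""
    return payload.get("vars", {}).get(question_name)
```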

Note that you can also use other types of grammars.

Recognition using GRXML Grammar

Below is an example of the use of a GRXML grammar. Here it is loaded from a URL, but you can also provide it inline as base64-encoded data.

{
  "csid": "mobid9-lmps973g-zhhdf7-7287dtk",
  "sid": "98ddcff8-3424-439f-b2db-99696833f678",
  "sequence": 1,
  "question": {
    "audioProperties": { "voice": "Catherine" },
    "audioResponse": {
      "name" : "qest1",
      "questionPrompt": "Hello, I am calling from Huber. say yes or no",
      "noInputTimeout": 8000,
      "completeTimeout": 1200,
      "incompleteTimeout": 2000,
      "grammar" : [
        {
          "type" : "GRXML",
          "name": "my-yes-no-grammar",
          "fromUrl" : { "url" : "http://my.host/grammar-url/my-yes-no.grxml" }
        }
      ]
    } 
  }
}

Transfer to a SIP URI

Here is an example callback response that will trigger a transfer to a SIP URI endpoint:


{
  "csid":"fa8ded81-0564-4895-ac09-d9e4ced3006c",
  "sid":"e02d682e-b0d8-4b08-8121-324ec5ee140a",
  "sequence":"4",
  "transfer":{
    "prompt":{
      "text":"Transferring via SIP INVITE",
      "audioProperties":{"voice":"catherine"}
    },
    "phone":{"phoneNumber":"sip:4bfe7c46-e3bc-4791-9748-233c@fs.acme.com:5060;transport=tcp"}
  }
}

Known issue: there is currently no way to continue the call (the telephony bot logic) after the invited SIP leg disconnects, so the invited SIP leg has to be treated as the final leg of the call. A change on our roadmap will allow the flow logic of the main bot to resume after the SIP INVITEd leg returns.

SIP REFER Transfer

Here is an example callback response that triggers a SIP REFER transfer. Note that the phone number uses the deflect: prefix instead of the sip: prefix used for SIP INVITE.


{
  "csid":"fa8ded81-0564-4895-ac09-d9e4ced3006c",
  "sid":"e02d682e-b0d8-4b08-8121-324ec5ee140a",
  "sequence":"4",
  "transfer":{
    "prompt":{
      "text":"Transferring via SIP REFER",
      "audioProperties":{"voice":"catherine"}
    },
    "phone":{"phoneNumber":"deflect:4bfe7c46-e3bc-4791-9748-233c@fs.acme.com:5060;transport=tcp"}
  }
}
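The two SIP examples differ only in the scheme prefix of "phoneNumber", so a single helper can cover both. A hedged Python sketch; `sip_transfer` and its `refer` flag are hypothetical names, and the target is passed without a scheme.

```python
def sip_transfer(csid: str, sid: str, sequence: str, sip_target: str,
                 refer: bool, prompt_text: str, voice: str = "catherine") -> dict:
    """Transfer to a SIP endpoint.

    refer=False -> SIP INVITE, phoneNumber prefixed with 'sip:'
    refer=True  -> SIP REFER, phoneNumber prefixed with 'deflect:'
    """
    scheme = "deflect:" if refer else "sip:"
    return {
        "csid": csid,
        "sid": sid,
        "sequence": sequence,
        "transfer": {
            "prompt": {"text": prompt_text, "audioProperties": {"voice": voice}},
            "phone": {"phoneNumber": scheme + sip_target},
        },
    }
```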

Disconnect

You can disconnect (hang up) a call on demand using the disconnect command, for example (the prompt is optional; if present, it is played just before the call disconnects):

{
  "csid": "mobid9-lmps973g-zhhdf7-7287dtk",
  "sid": "a70af315-4987-427e-a738-6fad6527ea74",
  "sequence": 2,
  "disconnect": {
    "reason" : "END of CALL",
    "prompt" : {
      "audioProperties": { "voice": "Catherine" },
      "text": "Goodbye, disconnecting"
    }
  }  
}
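Because the prompt is optional, a builder should leave the "prompt" key out entirely when there is nothing to say, in line with the earlier advice to omit unused fields. A hedged Python sketch; `build_disconnect` is a hypothetical name.

```python
from typing import Optional


def build_disconnect(csid: str, sid: str, sequence: int, reason: str,
                     prompt_text: Optional[str] = None,
                     voice: str = "Catherine") -> dict:
    """Hang up the call; the optional prompt is played just before disconnecting."""
    disconnect: dict = {"reason": reason}
    if prompt_text is not None:
        disconnect["prompt"] = {"audioProperties": {"voice": voice}, "text": prompt_text}
    return {"csid": csid, "sid": sid, "sequence": sequence, "disconnect": disconnect}
```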
