Example of recognition with GRXML grammar of audio streamed via websocket – Voicegain

Session setup for /asr/recognize/async request

{
  "sessions": [{
     "asyncMode": "REAL-TIME",
     "callback": { "uri" : "http://my.host/callback-url" }
  }],
  "audio": {
    "source": { "stream": { "protocol": "WEBSOCKET" } },
    "format": "PCMU",
    "rate": 8000,
    "capture" : true
  },
  "settings": {
    "asr": {
      "noInputTimeout": 15000,
      "incompleteTimeout": 5000,
      "completeTimeout": 2000,
      "grammars" : [{
        "type" : "GRXML",
        "name": "my-grammar",
        "fromUrl" : { "url" : "http://my.host/grammar-url/my-grammar" }
      }]
    }
  }
}
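
As a rough sketch, the session setup above could be submitted from Java with the JDK's built-in HTTP client (Java 11+). The `baseUrl` and `jwt` parameters are placeholders and the Bearer-token `Authorization` header is an assumption; neither is specified in this article, so check them against your account settings and the API documentation.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class RecognizeRequest {
    // baseUrl and jwt are hypothetical inputs supplied by the caller;
    // the Bearer authorization scheme is an assumption, not taken from this article.
    static HttpRequest build(String baseUrl, String jwt, String sessionSetupJson) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/asr/recognize/async"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + jwt)
                // The session setup JSON shown above is sent as the request body.
                .POST(HttpRequest.BodyPublishers.ofString(sessionSetupJson))
                .build();
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.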

This request will return a response like this:

{
  "sessions":[{
    "sessionId":"0-0kds03xqq07p4fbnnrth9np8v81l",
    "asyncMode":"REAL-TIME"
  }],
  "audio":{ 
    "stream":{
      "websocketUrl":"wss://api.ascalon.ai/v1/0/socket/3cf17a47-be6d-4b74-a073-70794f84c53a"
    },
    "capturedAudio":"bc92af2e-da47-463a-b290-9bb4d60d84a6"
  },
  "preemptible":false
}

websocketUrl is the URL to which the audio should be streamed.

In Java, for example, the code would include establishing a connection to the websocket URL:

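private void connectToWebSocket() {
  WebSocketContainer container = ContainerProvider.getWebSocketContainer();
  try {
    // 'this' must be a websocket endpoint, e.g. a class annotated with @ClientEndpoint;
    // websocketUrl must be a java.net.URI built from the value returned in the response.
    container.connectToServer(this, websocketUrl);
  } catch (DeploymentException | IOException ex) {
    ex.printStackTrace();
  }
}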

The audio data is then sent over the websocket session. Note that the data has to be sent as binary data, not as text messages:

session.getBasicRemote().sendBinary(bb);
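
The `bb` buffer above holds one piece of the audio. A minimal sketch of how a larger PCMU buffer might be cut into pieces for streaming is shown below. The `AudioChunker` class and the 20 ms chunk size are illustrative assumptions; the article does not prescribe a chunk size.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class AudioChunker {
    // PCMU (G.711 mu-law) at 8000 Hz uses one byte per sample,
    // so 20 ms of audio is 160 bytes. The 20 ms figure is an assumed choice.
    static final int CHUNK_BYTES = 160;

    // Splits the audio into consecutive buffers of at most chunkBytes bytes each.
    // The buffers are views over the original array; no audio data is copied.
    static List<ByteBuffer> split(byte[] audio, int chunkBytes) {
        if (chunkBytes <= 0) {
            throw new IllegalArgumentException("chunkBytes must be positive");
        }
        List<ByteBuffer> chunks = new ArrayList<>();
        for (int offset = 0; offset < audio.length; offset += chunkBytes) {
            int length = Math.min(chunkBytes, audio.length - offset);
            chunks.add(ByteBuffer.wrap(audio, offset, length));
        }
        return chunks;
    }
}
```

Each buffer returned by `AudioChunker.split(audio, AudioChunker.CHUNK_BYTES)` can be passed in turn to `session.getBasicRemote().sendBinary(...)`.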

Once the recognition is complete, the callback URI will receive a POST request with a payload like the one below. The format is the same as the documented response to the poll request (https://console.voicegain.ai/api-documentation#operation/asrRecognizeAsyncGet):

{ 
  "session":{
    "sessionId":"0-0kdt68dtx093llt7by8vgn0do3mt",
    "asyncMode":"REAL-TIME"
  },
  "result":{
    "status":"MATCH",
    "lastEvent":"RECOGNITION-COMPLETE",
    "alternatives":[
      {
        "utterance":"forty eight one six nine",
        "confidence":0.42858585715293884,
        "grammar":"my-grammar",
        "semanticTags":{"zip":"48169"}
      }
    ],
    "final":true
  },
  "phase":"DONE"
}
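
On the receiving side, the callback handler has to check the status and read the recognized values from this payload. The sketch below does that with a simple regular expression to stay dependency-free. The `CallbackPayload` class is hypothetical, and the pattern only handles plain string fields without escaped quotes, so a real application would more likely use a JSON library.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CallbackPayload {
    // Returns the value of the first "name":"value" string field found in the JSON text.
    // Simplification: values containing escaped quotes are not supported.
    static Optional<String> stringField(String json, String name) {
        Pattern p = Pattern.compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Returns the utterance of the first listed alternative, but only when
    // the result status is MATCH; any other status yields an empty Optional.
    static Optional<String> matchedUtterance(String json) {
        boolean matched = stringField(json, "status").filter("MATCH"::equals).isPresent();
        return matched ? stringField(json, "utterance") : Optional.empty();
    }
}
```

For the payload above, `matchedUtterance` yields the recognized utterance, and `stringField(payload, "zip")` yields the value of the `zip` semantic tag.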
