[UPDATE: this article has been simplified by removing a use case with useSTOMP:true; we have also added more details about the OkHttp3 library for working with websockets.]
Request
The transcription request has to be made to https://api.voicegain.ai/v1/asr/transcribe/async
The body of the request will be:
{
  "sessions": [
    {
      "asyncMode": "REAL-TIME",
      "websocket": {
        "adHoc": true,
        "useSTOMP": false,
        "minimumDelay": 100
      }
    }
  ],
  "audio": {
    "source": { "stream": { "protocol": "WEBSOCKET" } },
    "format": "F32",
    "channel": "mono",
    "rate": 16000,
    "capture": false
  },
  "settings": {
    "asr": {
      "noInputTimeout": 59999,
      "completeTimeout": 0
    }
  }
}
The websocket specific parameters are:
- useSTOMP - set it to false to use simple websocket messages; STOMP is intended for broadcasting
- adHoc - set to true because we are not using a predefined websocket
- minimumDelay - the purpose of a minimum delay > 0 is to handle hypothesis rewrites on the server side. A very low value of minimumDelay will result in a lot of corrections happening on the client side; if fast response is critical, set minimumDelay to 0
Some notes about the audio streaming websocket:
- the protocol must be "WEBSOCKET"
- channel must be "mono"
- capture may be set to true for debugging - if set to true then the response will contain the uuid of the captured audio - it can then be retrieved using this web method: GET https://api.voicegain.ai/v1/data/{uuid}/file
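Such a request could be submitted from Java, e.g., using java.net.http (a sketch - the JSON body string and the JWT Bearer token are placeholders you need to supply, and Bearer authorization is assumed here):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TranscribeRequest {
    public static void main(String[] args) throws Exception {
        // the JSON body shown above - e.g. loaded from a file or built with a JSON library
        String requestBody = "...";
        // placeholder - your Voicegain JWT token
        String jwt = "...";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.voicegain.ai/v1/asr/transcribe/async"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + jwt)
                .POST(HttpRequest.BodyPublishers.ofString(requestBody))
                .build();

        // the response body contains the two websocket urls described below
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}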
Response
The response will be e.g.:
{
  "sessions": [
    {
      "sessionId": "0-0kfrdm3561ujwshczv51fownlfm5",
      "asyncMode": "REAL-TIME",
      "websocket": {
        "url": "wss://api.voicegain.ai/v1/0/plain/0-0kiavyb0108h87pipnttx8vcj7x0"
      }
    }
  ],
  "audio": {
    "stream": {
      "websocketUrl": "wss://api.voicegain.ai/v1/0/socket/e5b22dc5-2a45-4525-9f12-55d4bd190e15"
    }
  }
}
Two websocket urls are returned:
- sessions[].websocket.url -- this will be used to receive the transcription results. This single-use url is used to establish the websocket connection.
- audio.stream.websocketUrl -- this will be used to stream the audio to the recognizer. The audio needs to be streamed in binary format. The format is specified in the initial request - it must be mono. For available audio formats see here.
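For example, the two urls could be extracted from the response like this (a sketch assuming the Jackson library; responseBody is the JSON string returned by the request):

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

JsonNode root = new ObjectMapper().readTree(responseBody);
// single-use url for receiving the transcription results
String resultWsUrl = root.path("sessions").get(0).path("websocket").path("url").asText();
// url for streaming the audio (binary)
String audioWsUrl = root.path("audio").path("stream").path("websocketUrl").asText();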
Using javax.websocket
In Java (using javax.websocket), the connection to the audio websocket would be established as follows, e.g.:
private void connectToWebSocket() {
    WebSocketContainer container = ContainerProvider.getWebSocketContainer();
    try {
        // websocketUrl is the URI from audio.stream.websocketUrl in the response
        container.connectToServer(this, websocketUrl);
    } catch (DeploymentException | IOException ex) {
        ex.printStackTrace();
    }
}
and then sending the data using the wss session (note - the data has to be sent as binary frames, not as text messages):
session.getBasicRemote().sendBinary(bb);
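For example, a complete audio stream could be sent in small chunks and the websocket closed when the audio ends (a sketch - streamAudio, the 3200-byte chunk size, and the InputStream source are illustrative; the audio must match the format declared in the request, i.e. raw F32, mono, 16000 Hz):

void streamAudio(Session session, InputStream audio) throws IOException {
    byte[] buf = new byte[3200];
    int n;
    while ((n = audio.read(buf)) > 0) {
        // send each chunk as a binary websocket frame
        session.getBasicRemote().sendBinary(ByteBuffer.wrap(buf, 0, n));
    }
    // closing the audio websocket tells the recognizer that the audio has ended
    session.close(new CloseReason(CloseReason.CloseCodes.NORMAL_CLOSURE, "end of audio"));
}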
The messages with results of transcription have the same format described in this Knowledge Base Article.
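For reference, a minimal javax.websocket client endpoint for receiving those result messages could look like this (a sketch - the class name and the println logging are illustrative):

import javax.websocket.ClientEndpoint;
import javax.websocket.OnMessage;
import javax.websocket.OnOpen;
import javax.websocket.Session;

// minimal endpoint for the result websocket (sessions[].websocket.url)
@ClientEndpoint
public class ResultEndpoint {

    @OnOpen
    public void onOpen(Session session) {
        System.out.println("result websocket opened");
    }

    @OnMessage
    public void onMessage(String message) {
        // each message is a JSON document with incremental transcription results
        System.out.println("result: " + message);
    }
}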
Transcription can be stopped by closing the audio streaming websocket. Alternatively, if completeTimeout is set to a value > 0, then that timeout can determine when the transcription stops.
Using OkHttp3
Alternatively, you can use the OkHttp3 v4 library, which works on both Java and Android. Once you define the listener, opening the connection is simple:
// open audio streaming websocket
OkHttpClient audioClient = new OkHttpClient.Builder()
        .readTimeout(0, TimeUnit.MILLISECONDS)
        .build();
Request audioRequest = new Request.Builder()
        .url(wssUrlStr)
        .build();
MyListener audioListener = new MyListener("audio");
WebSocket audioWebSocket = audioClient.newWebSocket(audioRequest, audioListener);
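MyListener in the snippet above is not part of OkHttp3 - it is a name used in this example. A minimal implementation could look like this (a sketch; the same class can also be used for the result websocket, where text messages will arrive):

import okhttp3.Response;
import okhttp3.WebSocket;
import okhttp3.WebSocketListener;

class MyListener extends WebSocketListener {

    private final String name;

    MyListener(String name) {
        this.name = name;
    }

    @Override
    public void onOpen(WebSocket webSocket, Response response) {
        System.out.println(name + " websocket opened");
    }

    @Override
    public void onMessage(WebSocket webSocket, String text) {
        // for the result websocket this will be a JSON message with transcription results
        System.out.println(name + " message: " + text);
    }

    @Override
    public void onClosed(WebSocket webSocket, int code, String reason) {
        System.out.println(name + " websocket closed: " + code + " " + reason);
    }

    @Override
    public void onFailure(WebSocket webSocket, Throwable t, Response response) {
        System.out.println(name + " websocket failure: " + t);
    }
}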
Sending binary data is also very simple, e.g.:
boolean send(WebSocket ws, byte[] array, int offset, int length) throws Exception {
    ByteBuffer bb = ByteBuffer.wrap(array, offset, length);
    try {
        ws.send(ByteString.of(bb));
        return true;
    } catch (IllegalStateException e) {
        System.out.println("Assuming the other side closed the Websocket");
        return false;
    }
}
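For example, a raw audio file could be streamed in chunks and the websocket closed afterwards (a sketch - the method, file name, and chunk size are illustrative; the file must contain audio in the format declared in the request):

void streamFile(WebSocket audioWebSocket, String fileName) throws Exception {
    try (InputStream in = new FileInputStream(fileName)) {
        byte[] buf = new byte[3200];
        int n;
        while ((n = in.read(buf)) > 0) {
            if (!send(audioWebSocket, buf, 0, n)) {
                break;
            }
        }
    }
    // closing the audio websocket signals the end of audio to the recognizer
    audioWebSocket.close(1000, "end of audio");
}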
Example Code
You can see this code example for more details on how to use websockets with the Voicegain API.