FreeSWITCH is a very capable telephony platform suitable for building a wide range of telephony applications. Some of those applications rely on speech-to-text conversion, for example: ACDs (automatic call distribution), IVRs, Voice-Bots, Real-Time Agent Assist, real-time conference call transcription, call monitoring, etc.
The Voicegain Speech-to-Text platform can be used with FreeSWITCH in a variety of ways.
1. mod_unimrcp for IVRs
MRCP (Media Resource Control Protocol) is a communication protocol designed to connect telephony-based IVRs and Voice Bots with speech recognizers (ASR) and speech synthesizers (TTS). The Voicegain STT platform has supported MRCP for a long time. Our ASR can be accessed using MRCP, and we support both grammar-based recognition (e.g. GRXML) and large-vocabulary transcription.
FreeSWITCH can interact with MRCP-based recognizers using the included mod_unimrcp module. Voicegain STT has been tested with mod_unimrcp and interfaces with it without problems. You can learn more about using Voicegain STT via mod_unimrcp here.
Voicegain supports MRCP both in the Cloud and on the Edge (on-prem). We will soon be releasing, as open source, a recognizer plugin for the unimrcp server that will give you even more options for deploying FreeSWITCH with Voicegain and MRCP.
2. Bridge into Voicegain Telephony Bot API
Voicegain provides a Telephony Bot API, a callback API similar in style to Twilio TwiML. You can place a call to a Voicegain endpoint either using a phone number obtained from Voicegain or using a SIP endpoint unique to your Voicegain application. When a call arrives, you get a web callback, and the response you provide determines the actions that the Voicegain platform performs, e.g. play a prompt, recognize speech, or detect DTMF.
You can learn more about this API from the following blog posts:
- Voicegain releases Telephony Bot APIs for telephony IVRs and bots
- Easy How-To: Build a Voicebot using Voicegain, RASA, and AWS Lambda
- Easy Speech IVR for Outbound Calling using Voicegain and Twilio
If you have a FreeSWITCH application and would like to recognize speech, you can bridge into the Voicegain SIP endpoint and, in a callback, specify a prompt and the type of speech capture (grammar-based or large-vocabulary). Once the recognition finishes you will get a callback, and then you can either issue a disconnect command, which transfers the call flow back to your FreeSWITCH app, or continue with additional questions and recognitions on the Voicegain platform as needed.
Below is an example of a simple interaction with 4 participants:
- Your control logic for FS application, e.g., a Lua script
- Web service that handles callbacks from the Voicegain Telephony Bot API. It has to be able to maintain session data.
- Voicegain Telephony Bot API platform
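To make the role of the callback web service more concrete, here is a minimal sketch in Python of the decision logic such a service might contain. It only illustrates the idea of keeping per-call session data and choosing the next action. The event names (`call_started`, `recognition_done`), the action names (`ask`, `disconnect`, `none`), and the dictionary fields are hypothetical placeholders invented for this example; they are not the actual Telephony Bot API request or response format.

```python
from typing import Any, Dict

# In-memory session store keyed by a session id (hypothetical). A production
# service would likely use a shared store so that state survives restarts.
SESSIONS: Dict[str, Dict[str, Any]] = {}


def handle_callback(session_id: str, event: str, result: str = "") -> Dict[str, str]:
    """Pick the next action for a call.

    All event and action names here are illustrative assumptions, not the
    real Voicegain Telephony Bot API schema.
    """
    state = SESSIONS.setdefault(session_id, {"answers": []})

    if event == "call_started":
        # First callback: ask a question and capture free-form speech.
        return {
            "action": "ask",
            "prompt": "How can I help you?",
            "capture": "large_vocabulary",
        }

    if event == "recognition_done":
        # Remember what the caller said, then hand the call back to FreeSWITCH.
        state["answers"].append(result)
        return {"action": "disconnect"}

    # Unknown event: do nothing.
    return {"action": "none"}
```

In a real deployment this function would sit behind an HTTP endpoint, and the FreeSWITCH control logic could read the stored answers once the call flow returns to it.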
3. mod_voicegain for using Voicegain ASR from FS apps/scripts
mod_voicegain is not yet generally available - please contact us if you are interested in testing it.
mod_voicegain gives you capabilities similar to using mod_unimrcp with Voicegain, but without the overhead of the MRCP protocol: mod_voicegain talks directly to Voicegain ASR.
mod_voicegain taps into the FreeSWITCH inbound audio stream and sends the audio data to Voicegain ASR in the Cloud or on the Edge. Voicegain ASR processes the audio according to the invocation parameters specified in the data argument. It then communicates the result of transcription or recognition in an Event.
mod_voicegain installs on FreeSWITCH as an application module and can be invoked as such, e.g.:
<action application="vg_asr_start" data=""/>
or from a Lua script.
Results are always returned as a FreeSWITCH event, but it is also possible to get them in a callback to the URL specified in callback.uri.
The FreeSWITCH event will be of custom type (Event-Name: CUSTOM) and Event-Subclass will be "voicegain_asr_update". The relevant payload will be in the "ASR-Response" field formatted as JSON.
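As an illustration of consuming such an event, the sketch below (in Python) filters on the event name and subclass given above and decodes the JSON carried in the "ASR-Response" field. It assumes the event headers have already been read into a dictionary, for example by an event socket client. How you obtain the headers and what the decoded JSON contains are outside the scope of this sketch, and the function name is hypothetical.

```python
import json
from typing import Any, Mapping, Optional

ASR_SUBCLASS = "voicegain_asr_update"


def extract_asr_response(headers: Mapping[str, str]) -> Optional[Any]:
    """Return the decoded ASR-Response payload.

    Returns None if the event is not a Voicegain ASR update or if the
    payload is missing or is not valid JSON.
    """
    if headers.get("Event-Name") != "CUSTOM":
        return None
    if headers.get("Event-Subclass") != ASR_SUBCLASS:
        return None

    raw = headers.get("ASR-Response")
    if raw is None:
        return None

    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None
```

Events of other types, or with a different subclass, are ignored, so the function can be applied to every event your listener receives.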
You can read more about mod_voicegain in this Knowledge Base article.
4. mod_vg_tap_ws for real-time transcription
mod_vg_tap_ws has been developed with applications like Real-Time Agent Assist in mind. These apps need access to the audio stream from a FreeSWITCH call but do not otherwise need to interact with FreeSWITCH (unlike IVR and Voice-Bots).
mod_vg_tap_ws installs as an application module and has simple commands to start and stop streaming to the Voicegain Speech-to-Text engine.
The start command can specify the following destinations:
- websocket URL(s), as returned from the POST request that starts a new speech-to-text session
- HTTP URL of your web service, which will return the websocket URL to connect to.
The transcription results are not returned to the FreeSWITCH app. Instead, they are delivered to the destination specified when starting the speech-to-text session: via websocket, polling, or callback.
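The difference between the two destination kinds can be sketched in Python as a simple check on the URL scheme. This only illustrates the distinction described above. The article does not say how mod_vg_tap_ws itself tells the two apart, and the function name and return labels are hypothetical.

```python
from urllib.parse import urlparse


def classify_destination(url: str) -> str:
    """Classify a start-command destination by its URL scheme.

    Returns "websocket" for ws:// or wss:// URLs that can be streamed to
    directly, and "lookup" for http:// or https:// URLs of a web service
    that is expected to return the websocket URL. The labels are
    illustrative only.
    """
    scheme = urlparse(url).scheme.lower()
    if scheme in ("ws", "wss"):
        return "websocket"
    if scheme in ("http", "https"):
        return "lookup"
    raise ValueError(f"unsupported destination scheme: {scheme!r}")
```

For example, `classify_destination("wss://example.invalid/stream")` returns `"websocket"`, while an `https://` URL of your own service is classified as `"lookup"`.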
You can read more about mod_vg_tap_ws here.