The Voicegain back end supports the following audio formats:
- PCMA: 8-bit A-law logarithmic - this is the G.711 A-law format
- PCMU: 8-bit u-law logarithmic - this is the G.711 u-law format
- L8: linear PCM 8-bit mono audio (signed)
- L16: linear PCM 16-bit mono audio (signed little-endian) - this is the format normally used in WAV files
- F32: linear PCM 32-bit floating point mono audio (little-endian) - audio captured from a web browser is typically in this format
- FLAC: Free Lossless Audio Codec - see https://xiph.org/flac/documentation.html
- MP3: MP3-encoded audio
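Because browser audio typically arrives as F32 while WAV files normally carry L16, a conversion between the two can be useful before sending audio. The following is a minimal sketch rather than Voicegain code: the function name `f32_to_l16` is hypothetical, the float samples are assumed to lie in the range -1.0 to 1.0, and scaling by 32767 is only one common convention.

```python
import struct

def f32_to_l16(f32_bytes: bytes) -> bytes:
    """Convert little-endian 32-bit float PCM (F32) to
    little-endian signed 16-bit PCM (L16)."""
    count = len(f32_bytes) // 4  # number of whole 4-byte samples
    samples = struct.unpack(f"<{count}f", f32_bytes[:count * 4])
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip to the assumed input range
        ints.append(int(round(s * 32767)))  # scale to the 16-bit range
    return struct.pack(f"<{count}h", *ints)
```

Any trailing bytes that do not form a complete 4-byte sample are ignored.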
Specifying Audio Parameters in API Requests
The audio field in an API request looks, for example, like this:
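A minimal sketch of such a field, built in Python and serialized to JSON. It shows only the four parameters described in this section. The example values, the integer type of rate, and the enclosing "audio" key name are assumptions, and a real request must also identify the audio source, which this section does not cover.

```python
import json

# Only parameters named in this section; all values are illustrative.
audio = {
    "format": "L16",     # one of the formats listed in the first section
    "rate": 16000,       # sample rate (integer Hz is an assumption)
    "channels": "mono",  # "mono" or "stereo", available from version 1.16.0
    "capture": True,     # ask for the processed audio to be captured
}

# The top-level "audio" key is assumed for illustration.
print(json.dumps({"audio": audio}, indent=2))
```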
If the provided audio does not have an audio header (e.g. RIFF), then format, rate, and channel(s) are required.
Even if the audio has a header, it is still advantageous to provide format, rate, and channel(s) when they are known: session setup is faster because the audio header does not need to be examined.
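The rule about headerless audio can be checked on the client side before a request is sent. The sketch below is an illustration under stated assumptions, not part of the Voicegain API: the helper names are hypothetical, only the RIFF/WAVE header is detected (other header types are treated as missing), and the newer "channels" parameter name is used.

```python
REQUIRED_WITHOUT_HEADER = ("format", "rate", "channels")

def has_riff_header(data: bytes) -> bool:
    """Return True if data starts with a RIFF container holding WAVE audio."""
    # Bytes 0-3 are "RIFF", bytes 4-7 the chunk size, bytes 8-11 "WAVE".
    return len(data) >= 12 and data[0:4] == b"RIFF" and data[8:12] == b"WAVE"

def missing_audio_params(audio: dict, first_bytes: bytes) -> list:
    """List the parameters that must still be supplied for this audio."""
    if has_riff_header(first_bytes):
        return []  # a header is present, so the parameters are optional
    return [key for key in REQUIRED_WITHOUT_HEADER if key not in audio]
```

For raw audio with only the format set, `missing_audio_params({"format": "L16"}, raw_bytes)` reports that rate and channels are still needed.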
Notes about the values:
- format - one of the formats listed in the first section
- rate - the sample rate
- channels - available from version 1.16.0 - possible values are "mono", "stereo" - specifies the number of channels in the source audio
- channel - deprecated starting from version 1.16.0 - possible values are "left", "right", "mix", "mono", "stereo" - was used to indicate both the number of channels in the audio and the channel selection for processing
- capture - if set to true, the audio that has been processed will be captured. The uuid of the captured audio will be returned in the response. The uuid can be used to download the captured audio from this URL: https://api.voicegain.ai/v1/data/<uuid>/file
Starting from version 1.16.0, channel selection for processing is done using the audioChannelSelector field. Possible values are "left", "right", "mix".
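When migrating requests that still use the deprecated channel parameter, its single value has to be split into the two newer fields. The mapping below is a sketch based on an assumption that is not stated in this document: that "left", "right", and "mix" imply stereo source audio, and that "mono" and "stereo" need no selector. The function name `split_legacy_channel` is hypothetical.

```python
from typing import Optional, Tuple

def split_legacy_channel(channel: str) -> Tuple[str, Optional[str]]:
    """Map a deprecated channel value to (channels, audioChannelSelector)."""
    if channel in ("left", "right", "mix"):
        # Assumed: selecting or mixing sides implies two-channel audio.
        return "stereo", channel
    if channel in ("mono", "stereo"):
        # Assumed: no selector is needed for these values.
        return channel, None
    raise ValueError(f"unknown channel value: {channel!r}")
```

Under these assumptions, a request with channel set to "left" would become channels "stereo" with audioChannelSelector "left".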