Supported Formats in Offline Transcription
For offline transcription, Voicegain relies on ffmpeg to convert input audio to a format suitable for the Speech-to-Text engine. Because of that, we support a wide range of audio formats in this scenario.
The complete list of supported audio formats is shown below.
8SVX exponential
8SVX fibonacci
AAC (Advanced Audio Coding)
AAC (Advanced Audio Coding) (codec aac)
AAC LATM (Advanced Audio Coding LATM syntax)
ADPCM 4X Movie
ADPCM AmuseGraphics Movie
ADPCM Argonaut Games
ADPCM CDROM XA
ADPCM Creative Technology
ADPCM Electronic Arts
ADPCM Electronic Arts Maxis CDROM XA
ADPCM Electronic Arts R1
ADPCM Electronic Arts R2
ADPCM Electronic Arts R3
ADPCM Electronic Arts XAS
ADPCM IMA AMV
ADPCM IMA Capcom's MT Framework
ADPCM IMA CRYO APC
ADPCM IMA Cunning Developments
ADPCM IMA Dialogic OKI
ADPCM IMA Duck DK3
ADPCM IMA Duck DK4
ADPCM IMA Electronic Arts EACS
ADPCM IMA Electronic Arts SEAD
ADPCM IMA Eurocom DAT4
ADPCM IMA Funcom ISS
ADPCM IMA High Voltage Software ALP
ADPCM IMA Loki SDL MJPEG
ADPCM IMA QuickTime
ADPCM IMA Radical
ADPCM IMA Simon & Schuster Interactive
ADPCM IMA Ubisoft APM
ADPCM IMA WAV
ADPCM IMA Westwood
ADPCM Microsoft
ADPCM MTAF
ADPCM Nintendo Gamecube AFC
ADPCM Nintendo Gamecube DTK
ADPCM Nintendo THP
ADPCM Nintendo THP (little-endian)
ADPCM Playstation
ADPCM Shockwave Flash
ADPCM Sound Blaster Pro 2.6-bit
ADPCM Sound Blaster Pro 2-bit
ADPCM Sound Blaster Pro 4-bit
ADPCM Yamaha
ADPCM Yamaha AICA
ADPCM Zork
ADU (Application Data Unit) MP3 (MPEG audio layer 3)
ADU (Application Data Unit) MP3 (MPEG audio layer 3) (codec mp3adu)
ALAC (Apple Lossless Audio Codec)
Amazing Studio Packed Animation File Audio
AMR-NB (Adaptive Multi-Rate NarrowBand) (codec amr_nb)
AMR-WB (Adaptive Multi-Rate WideBand) (codec amr_wb)
aptX (Audio Processing Technology for Bluetooth)
aptX HD (Audio Processing Technology for Bluetooth)
ATRAC1 (Adaptive TRansform Acoustic Coding)
ATRAC3 (Adaptive TRansform Acoustic Coding 3)
ATRAC3 AL (Adaptive TRansform Acoustic Coding 3 Advanced Lossless)
ATRAC3+ (Adaptive TRansform Acoustic Coding 3+) (codec atrac3p)
ATRAC3+ AL (Adaptive TRansform Acoustic Coding 3+ Advanced Lossless) (codec atrac3pal)
ATRAC9 (Adaptive TRansform Acoustic Coding 9)
ATSC A/52A (AC-3)
ATSC A/52A (AC-3) (codec ac3)
ATSC A/52B (AC-3, E-AC-3)
Bink Audio (DCT)
Bink Audio (RDFT)
Cook / Cooker / Gecko (RealAudio G2)
CRI HCA
DCA (DTS Coherent Acoustics) (codec dts)
Delphine Software International CIN audio
Digital Speech Standard - Standard Play mode (DSS SP)
Discworld II BMV audio
Dolby E
DPCM Gremlin
DPCM id RoQ
DPCM Interplay
DPCM Sol
DPCM Squareroot-Delta-Exact
DPCM Xan
DPCM Xilam DERF
DSD (Direct Stream Digital), least significant bit first
DSD (Direct Stream Digital), least significant bit first, planar
DSD (Direct Stream Digital), most significant bit first
DSD (Direct Stream Digital), most significant bit first, planar
DSP Group TrueSpeech
DST (Digital Stream Transfer)
EVRC (Enhanced Variable Rate Codec)
FLAC (Free Lossless Audio Codec)
G.722 ADPCM (codec adpcm_g722)
G.723.1
G.726 ADPCM (codec adpcm_g726)
G.726 ADPCM little-endian (codec adpcm_g726le)
G.729
GSM
GSM Microsoft variant
HCOM Audio
IAC (Indeo Audio Coder)
iLBC (Internet Low Bitrate Codec)
iLBC (Internet Low Bitrate Codec) (codec ilbc)
IMC (Intel Music Coder)
Interplay ACM
libgsm GSM (codec gsm)
libgsm GSM Microsoft variant (codec gsm_ms)
libopus Opus (codec opus)
libspeex Speex (codec speex)
libvorbis (codec vorbis)
LucasArts VIMA audio
MACE (Macintosh Audio Compression/Expansion) 3:1
MACE (Macintosh Audio Compression/Expansion) 6:1
MLP (Meridian Lossless Packing)
Monkey's Audio
MP1 (MPEG audio layer 1)
MP1 (MPEG audio layer 1) (codec mp1)
MP2 (MPEG audio layer 2)
MP2 (MPEG audio layer 2) (codec mp2)
MP3 (MPEG audio layer 3)
MP3 (MPEG audio layer 3) (codec mp3)
MP3onMP4
MP3onMP4 (codec mp3on4)
MPEG-4 Audio Lossless Coding (ALS) (codec mp4als)
Musepack SV7 (codec musepack7)
Musepack SV8 (codec musepack8)
Nellymoser Asao
On2 Audio for Video Codec (codec avc)
OpenCORE AMR-NB (Adaptive Multi-Rate Narrow-Band) (codec amr_nb)
OpenCORE AMR-WB (Adaptive Multi-Rate Wide-Band) (codec amr_wb)
Opus
PCM 16.8 floating point little-endian
PCM 24.0 floating point little-endian
PCM 32-bit floating point big-endian
PCM 32-bit floating point little-endian
PCM 64-bit floating point big-endian
PCM 64-bit floating point little-endian
PCM A-law / G.711 A-law
PCM Archimedes VIDC
PCM D-Cinema audio signed 24-bit
PCM mu-law / G.711 mu-law
PCM signed 16|20|24-bit big-endian for Blu-ray media
PCM signed 16|20|24-bit big-endian for DVD media
PCM signed 16-bit big-endian
PCM signed 16-bit big-endian planar
PCM signed 16-bit little-endian
PCM signed 16-bit little-endian planar
PCM signed 20-bit little-endian planar
PCM signed 24-bit big-endian
PCM signed 24-bit little-endian
PCM signed 24-bit little-endian planar
PCM signed 32-bit big-endian
PCM signed 32-bit little-endian
PCM signed 32-bit little-endian planar
PCM signed 64-bit big-endian
PCM signed 64-bit little-endian
PCM signed 8-bit
PCM signed 8-bit planar
PCM unsigned 16-bit big-endian
PCM unsigned 16-bit little-endian
PCM unsigned 24-bit big-endian
PCM unsigned 24-bit little-endian
PCM unsigned 32-bit big-endian
PCM unsigned 32-bit little-endian
PCM unsigned 8-bit
QCELP / PureVoice
QDesign Music Codec 1
QDesign Music Codec 2
RealAudio 1.0 (14.4K) (codec ra_144)
RealAudio 2.0 (28.8K) (codec ra_288)
RealAudio Lossless
RealAudio SIPR / ACELP.NET
RFC 3389 comfort noise generator
SBC (low-complexity subband codec)
SEGA CRI ADX ADPCM
Shorten
Sierra VMD audio
Sipro ACELP.KELVIN
Siren
Smacker audio (codec smackaudio)
SMPTE 302M
Sonic
TAK (Tom's lossless Audio Kompressor)
TrueHD
TTA (True Audio)
Ulead DV Audio
Vorbis
Voxware MetaSound
VQF TwinVQ
Wave synthesis pseudo-codec
WavPack
Westwood Audio (SND1) (codec westwood_snd1)
Windows Media Audio 1
Windows Media Audio 2
Windows Media Audio 9 Professional
Windows Media Audio Lossless
Windows Media Audio Voice
Xbox Media Audio 1
Xbox Media Audio 2
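If you want to check locally whether a file is likely to be accepted, one option is to run it through your own ffmpeg installation first. The sketch below builds an ffmpeg command line that decodes any input to 16-bit mono PCM WAV. The helper names and the 16000 Hz target rate are assumptions made for illustration; they are not Voicegain's documented internal settings, and a local ffmpeg build may support a different set of decoders than the one listed above.

```python
import shutil
import subprocess


def build_ffmpeg_command(src: str, dst: str, rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that decodes `src` to 16-bit mono PCM WAV.

    `rate` is an illustrative default, not a value taken from Voicegain.
    """
    return [
        "ffmpeg",
        "-nostdin",           # never wait for keyboard input
        "-y",                 # overwrite dst if it already exists
        "-i", src,            # input file; ffmpeg detects the codec itself
        "-vn",                # ignore any video stream
        "-ac", "1",           # downmix to one channel
        "-ar", str(rate),     # resample to the target rate
        "-c:a", "pcm_s16le",  # PCM signed 16-bit little-endian
        dst,
    ]


def can_decode(src: str, dst: str = "check.wav") -> bool:
    """Return True if the local ffmpeg binary decodes `src` without error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    result = subprocess.run(build_ffmpeg_command(src, dst), capture_output=True)
    return result.returncode == 0
```

A file that fails this local check may still be handled by the service, and the reverse is also possible, so treat the result as a hint only.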
Real-Time Transcription
For real-time transcription we support a smaller set of audio formats:
- PCMA: 8-bit A-law logarithmic (G.711 A-law)
- PCMU: 8-bit mu-law logarithmic (G.711 mu-law)
- L8: linear PCM 8-bit mono audio (signed)
- L16: linear PCM 16-bit mono audio (signed little-endian) - this is the format normally used in WAV files
- F32: linear PCM 32-bit floating point mono audio (little-endian) - audio captured from a web browser is typically in this format
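Both F32 and L16 are accepted, so conversion is optional. If bandwidth matters, though, browser-captured F32 samples (4 bytes each) can be converted to L16 (2 bytes each) before sending. The following is a minimal sketch using only the Python standard library; the function name is hypothetical, and it assumes the input samples are in the nominal range -1.0 to 1.0, clipping anything outside it.

```python
import struct


def f32_to_l16(f32_bytes: bytes) -> bytes:
    """Convert F32 (float32 little-endian) samples to L16 (int16 little-endian)."""
    count = len(f32_bytes) // 4              # whole 4-byte samples only
    samples = struct.unpack(f"<{count}f", f32_bytes[: count * 4])
    out = []
    for s in samples:
        s = max(-1.0, min(1.0, s))           # clip to the nominal range
        out.append(int(round(s * 32767)))    # scale to the 16-bit range
    return struct.pack(f"<{count}h", *out)
```

The output is half the size of the input, and it should then be declared with "format": "L16" rather than "F32".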
Specifying Audio Parameters in API Requests
The audio field in an API request looks like this, for example:
"audio": {
  "source": {
    "inline": {
      "data": "dGhpcyBpcyB0aGUgZGF0YSB0aGF0IGlzIHRvIGJlIGVuY29kZWQgaW4gYmFzZSA2NA=="
    }
  },
  "format": "L16",
  "rate": 16000,
  "channels": "stereo",
  "capture": false
},
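As a rough illustration, the same structure can be assembled in Python, with the raw audio bytes base64-encoded into the inline data field. The function name build_audio_field is hypothetical; only the field names come from the example above.

```python
import base64
import json


def build_audio_field(raw: bytes, fmt: str, rate: int, channels: str,
                      capture: bool = False) -> dict:
    """Assemble the "audio" object with base64-encoded inline data."""
    return {
        "source": {"inline": {"data": base64.b64encode(raw).decode("ascii")}},
        "format": fmt,
        "rate": rate,
        "channels": channels,
        "capture": capture,
    }


# Four placeholder 16-bit samples stand in for real audio bytes.
audio = build_audio_field(b"\x00\x01" * 4, "L16", 16000, "mono")
print(json.dumps({"audio": audio}, indent=2))
```

The resulting dictionary would be placed under the "audio" key of the larger request body.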
If the provided audio does not have an audio header (e.g. RIFF), then format, rate, and channels are required.
For real-time transcription, even if the audio has a header, it is still advantageous to provide format, rate, and channels when they are known. The session can then be set up faster because the audio header does not need to be examined.
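A client can apply this rule before sending a request. The sketch below checks only for a RIFF/WAVE header (the first 12 bytes of a WAV file) and reports which of the three fields are missing when no such header is found. Both function names are hypothetical, and other container formats carry different headers that this check does not recognize.

```python
def has_riff_wave_header(data: bytes) -> bool:
    """Return True if `data` starts with a RIFF header of type WAVE."""
    return len(data) >= 12 and data[0:4] == b"RIFF" and data[8:12] == b"WAVE"


def missing_audio_params(data: bytes, audio: dict) -> list[str]:
    """List the required fields absent from `audio` when `data` is headerless."""
    if has_riff_wave_header(data):
        return []  # the header can supply format, rate, and channels
    return [key for key in ("format", "rate", "channels") if key not in audio]
```

An empty list means the request satisfies the rule; for real-time sessions it may still be worth sending all three fields for faster setup.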
Notes about the values:
- format - one of the format names described in the previous section
- rate - the sample rate in Hz
- channels - possible values are "mono" and "stereo"; specifies the number of channels in the source audio
- capture - if set to true, the audio that has been processed will be captured. The uuid of the captured audio will be returned in the response. The uuid can be used to download the captured audio from this URL: https://api.voicegain.ai/v1/data/<uuid>/file
Starting with version 1.16.0, the channel to process is selected using the audioChannelSelector field. Possible values are "left", "right", and "mix".
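The allowed values and the download URL pattern described above can be captured in two small helpers. This is a sketch only: the function names are hypothetical, the check covers just the values listed in this article, and any authentication needed for the download request is not shown here.

```python
from typing import Optional
from urllib.parse import quote

BASE_URL = "https://api.voicegain.ai/v1"
CHANNELS = {"mono", "stereo"}
CHANNEL_SELECTORS = {"left", "right", "mix"}


def channel_setting_errors(channels: str,
                           selector: Optional[str] = None) -> list[str]:
    """Return a list of problems with the given channel settings."""
    errors = []
    if channels not in CHANNELS:
        errors.append(f"channels must be one of {sorted(CHANNELS)}")
    if selector is not None and selector not in CHANNEL_SELECTORS:
        errors.append(
            f"audioChannelSelector must be one of {sorted(CHANNEL_SELECTORS)}"
        )
    return errors


def captured_audio_url(audio_uuid: str) -> str:
    """Build the download URL for captured audio from its uuid."""
    if not audio_uuid:
        raise ValueError("audio_uuid must not be empty")
    # Percent-encode the uuid so it is safe as a single path segment.
    return f"{BASE_URL}/data/{quote(audio_uuid, safe='')}/file"
```

For example, captured_audio_url applied to the uuid returned in a response gives the address to fetch with an HTTP GET.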