Supported Formats in Offline Transcription

For offline transcription, Voicegain relies on ffmpeg to convert input audio to a format suitable for the Speech-to-Text engine. As a result, a wide range of audio formats is supported in this scenario.
The complete list of supported audio formats is shown below.

8SVX exponential

8SVX fibonacci

AAC (Advanced Audio Coding)

AAC (Advanced Audio Coding) (codec aac)

AAC LATM (Advanced Audio Coding LATM syntax)

ADPCM 4X Movie

ADPCM AmuseGraphics Movie

ADPCM Argonaut Games

ADPCM CDROM XA

ADPCM Creative Technology

ADPCM Electronic Arts

ADPCM Electronic Arts Maxis CDROM XA

ADPCM Electronic Arts R1

ADPCM Electronic Arts R2

ADPCM Electronic Arts R3

ADPCM Electronic Arts XAS

ADPCM IMA AMV

ADPCM IMA Capcom's MT Framework

ADPCM IMA CRYO APC

ADPCM IMA Cunning Developments

ADPCM IMA Dialogic OKI

ADPCM IMA Duck DK3

ADPCM IMA Duck DK4

ADPCM IMA Electronic Arts EACS

ADPCM IMA Electronic Arts SEAD

ADPCM IMA Eurocom DAT4

ADPCM IMA Funcom ISS

ADPCM IMA High Voltage Software ALP

ADPCM IMA Loki SDL MJPEG

ADPCM IMA QuickTime

ADPCM IMA Radical

ADPCM IMA Simon & Schuster Interactive

ADPCM IMA Ubisoft APM

ADPCM IMA WAV

ADPCM IMA Westwood

ADPCM Microsoft

ADPCM MTAF

ADPCM Nintendo Gamecube AFC

ADPCM Nintendo Gamecube DTK

ADPCM Nintendo THP

ADPCM Nintendo THP (little-endian)

ADPCM Playstation

ADPCM Shockwave Flash

ADPCM Sound Blaster Pro 2.6-bit

ADPCM Sound Blaster Pro 2-bit

ADPCM Sound Blaster Pro 4-bit

ADPCM Yamaha

ADPCM Yamaha AICA

ADPCM Zork

ADU (Application Data Unit) MP3 (MPEG audio layer 3)

ADU (Application Data Unit) MP3 (MPEG audio layer 3) (codec mp3adu)

ALAC (Apple Lossless Audio Codec)

Amazing Studio Packed Animation File Audio

AMR-NB (Adaptive Multi-Rate NarrowBand) (codec amr_nb)

AMR-WB (Adaptive Multi-Rate WideBand) (codec amr_wb)

aptX (Audio Processing Technology for Bluetooth)

aptX HD (Audio Processing Technology for Bluetooth)

ATRAC1 (Adaptive TRansform Acoustic Coding)

ATRAC3 (Adaptive TRansform Acoustic Coding 3)

ATRAC3 AL (Adaptive TRansform Acoustic Coding 3 Advanced Lossless)

ATRAC3+ (Adaptive TRansform Acoustic Coding 3+) (codec atrac3p)

ATRAC3+ AL (Adaptive TRansform Acoustic Coding 3+ Advanced Lossless) (codec atrac3pal)

ATRAC9 (Adaptive TRansform Acoustic Coding 9)

ATSC A/52A (AC-3)

ATSC A/52A (AC-3) (codec ac3)

ATSC A/52B (AC-3, E-AC-3)

Bink Audio (DCT)

Bink Audio (RDFT)

Cook / Cooker / Gecko (RealAudio G2)

CRI HCA

DCA (DTS Coherent Acoustics) (codec dts)

Delphine Software International CIN audio

Digital Speech Standard - Standard Play mode (DSS SP)

Discworld II BMV audio

Dolby E

DPCM Gremlin

DPCM id RoQ

DPCM Interplay

DPCM Sol

DPCM Squareroot-Delta-Exact

DPCM Xan

DPCM Xilam DERF

DSD (Direct Stream Digital), least significant bit first

DSD (Direct Stream Digital), least significant bit first, planar

DSD (Direct Stream Digital), most significant bit first

DSD (Direct Stream Digital), most significant bit first, planar

DSP Group TrueSpeech

DST (Digital Stream Transfer)

EVRC (Enhanced Variable Rate Codec)

FLAC (Free Lossless Audio Codec)

G.722 ADPCM (codec adpcm_g722)

G.723.1

G.726 ADPCM (codec adpcm_g726)

G.726 ADPCM little-endian (codec adpcm_g726le)

G.729

GSM

GSM Microsoft variant

HCOM Audio

IAC (Indeo Audio Coder)

iLBC (Internet Low Bitrate Codec)

iLBC (Internet Low Bitrate Codec) (codec ilbc)

IMC (Intel Music Coder)

Interplay ACM

libgsm GSM (codec gsm)

libgsm GSM Microsoft variant (codec gsm_ms)

libopus Opus (codec opus)

libspeex Speex (codec speex)

libvorbis (codec vorbis)

LucasArts VIMA audio

MACE (Macintosh Audio Compression/Expansion) 3:1

MACE (Macintosh Audio Compression/Expansion) 6:1

MLP (Meridian Lossless Packing)

Monkey's Audio

MP1 (MPEG audio layer 1)

MP1 (MPEG audio layer 1) (codec mp1)

MP2 (MPEG audio layer 2)

MP2 (MPEG audio layer 2) (codec mp2)

MP3 (MPEG audio layer 3)

MP3 (MPEG audio layer 3) (codec mp3)

MP3onMP4

MP3onMP4 (codec mp3on4)

MPEG-4 Audio Lossless Coding (ALS) (codec mp4als)

Musepack SV7 (codec musepack7)

Musepack SV8 (codec musepack8)

Nellymoser Asao

On2 Audio for Video Codec (codec avc)

OpenCORE AMR-NB (Adaptive Multi-Rate Narrow-Band) (codec amr_nb)

OpenCORE AMR-WB (Adaptive Multi-Rate Wide-Band) (codec amr_wb)

Opus

PCM 16.8 floating point little-endian

PCM 24.0 floating point little-endian

PCM 32-bit floating point big-endian

PCM 32-bit floating point little-endian

PCM 64-bit floating point big-endian

PCM 64-bit floating point little-endian

PCM A-law / G.711 A-law

PCM Archimedes VIDC

PCM D-Cinema audio signed 24-bit

PCM mu-law / G.711 mu-law

PCM signed 16|20|24-bit big-endian for Blu-ray media

PCM signed 16|20|24-bit big-endian for DVD media

PCM signed 16-bit big-endian

PCM signed 16-bit big-endian planar

PCM signed 16-bit little-endian

PCM signed 16-bit little-endian planar

PCM signed 20-bit little-endian planar

PCM signed 24-bit big-endian

PCM signed 24-bit little-endian

PCM signed 24-bit little-endian planar

PCM signed 32-bit big-endian

PCM signed 32-bit little-endian

PCM signed 32-bit little-endian planar

PCM signed 64-bit big-endian

PCM signed 64-bit little-endian

PCM signed 8-bit

PCM signed 8-bit planar

PCM unsigned 16-bit big-endian

PCM unsigned 16-bit little-endian

PCM unsigned 24-bit big-endian

PCM unsigned 24-bit little-endian

PCM unsigned 32-bit big-endian

PCM unsigned 32-bit little-endian

PCM unsigned 8-bit

QCELP / PureVoice

QDesign Music Codec 1

QDesign Music Codec 2

RealAudio 1.0 (14.4K) (codec ra_144)

RealAudio 2.0 (28.8K) (codec ra_288)

RealAudio Lossless

RealAudio SIPR / ACELP.NET

RFC 3389 comfort noise generator

SBC (low-complexity subband codec)

SEGA CRI ADX ADPCM

Shorten

Sierra VMD audio

Sipro ACELP.KELVIN

Siren

Smacker audio (codec smackaudio)

SMPTE 302M

Sonic

TAK (Tom's lossless Audio Kompressor)

TrueHD

TTA (True Audio)

Ulead DV Audio

Vorbis

Voxware MetaSound

VQF TwinVQ

Wave synthesis pseudo-codec

WavPack

Westwood Audio (SND1) (codec westwood_snd1)

Windows Media Audio 1

Windows Media Audio 2

Windows Media Audio 9 Professional

Windows Media Audio Lossless

Windows Media Audio Voice

Xbox Media Audio 1

Xbox Media Audio 2
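
The ffmpeg-based conversion mentioned at the start of this section can be pictured roughly as follows. This is only a sketch: the target of 16 kHz mono 16-bit PCM, the file names, and the helper name build_ffmpeg_command are assumptions made for illustration, not the parameters Voicegain actually uses. The ffmpeg options themselves (-i, -vn, -ac, -ar, -c:a pcm_s16le) are standard.

```python
import shlex


def build_ffmpeg_command(src_path, dst_path, rate=16000, channels=1):
    """Return an ffmpeg argument list that decodes src_path to 16-bit PCM.

    The default rate and channel count are illustrative assumptions.
    """
    return [
        "ffmpeg",
        "-i", src_path,         # input file; ffmpeg detects container and codec
        "-vn",                  # ignore any video stream
        "-ac", str(channels),   # number of output channels
        "-ar", str(rate),       # output sample rate in Hz
        "-c:a", "pcm_s16le",    # signed 16-bit little-endian PCM
        dst_path,
    ]


cmd = build_ffmpeg_command("call.mp3", "call.wav")
print(shlex.join(cmd))  # command line that could be passed to a shell
```

Building the command as a list (rather than a single string) lets it be passed directly to subprocess.run without shell quoting problems.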

Real-Time Transcription

For real-time transcription we support a smaller set of audio formats:

PCMA: 8-bit A-law logarithmic (G.711 A-law)
PCMU: 8-bit mu-law logarithmic (G.711 mu-law)
L8: linear PCM 8-bit mono audio (signed little-endian)
L16: linear PCM 16-bit mono audio (signed little-endian) - this is the format normally used in WAV files
F32: linear PCM 32-bit floating point mono audio (little-endian) - audio captured from a web browser is typically in this format
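
If audio arrives as F32 but L16 is preferred (for example, to halve the amount of data sent), the samples can be converted on the client. The sketch below assumes the float samples lie in the nominal range -1.0 to 1.0, as browser audio normally does; the function name f32_to_l16 is hypothetical and not part of any Voicegain API.

```python
import struct


def f32_to_l16(f32_bytes):
    """Convert little-endian 32-bit float PCM to signed 16-bit little-endian PCM."""
    count = len(f32_bytes) // 4
    samples = struct.unpack("<%df" % count, f32_bytes[:count * 4])
    out = []
    for s in samples:
        s = max(-1.0, min(1.0, s))         # clip anything outside the nominal range
        out.append(int(round(s * 32767)))  # scale to the 16-bit signed range
    return struct.pack("<%dh" % count, *out)


f32 = struct.pack("<4f", 0.0, 1.0, -1.0, 2.0)
l16 = f32_to_l16(f32)
print(struct.unpack("<4h", l16))
```

The converted bytes would then be sent with format set to "L16" instead of "F32".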

Specifying Audio Parameters in API Requests

The audio field in an API request looks, for example, like this:

"audio": {
  "source": {
    "inline": {
      "data": "dGhpcyBpcyB0aGUgZGF0YSB0aGF0IGlzIHRvIGJlIGVuY29kZWQgaW4gYmFzZSA2NA=="
    }
  },
  "format": "L16",
  "rate": 16000,
  "channels": "stereo",
  "capture": false
},

If the provided audio does not have an audio header (e.g. RIFF), then format, rate, and channels are required.

For real-time transcription, even if the audio has a header, it is still advantageous to provide format, rate, and channels when they are known: the audio header then does not need to be examined, which shortens session setup.
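
The rule above can be sketched in client code that assembles the audio object for inline data. The helper names has_riff_header and build_audio_field are hypothetical, and the header check only recognizes RIFF/WAVE; other header types would need their own checks. Only the field names come from the example request.

```python
import base64
import json


def has_riff_header(data):
    """Return True if the bytes start with a RIFF/WAVE header."""
    return len(data) >= 12 and data[0:4] == b"RIFF" and data[8:12] == b"WAVE"


def build_audio_field(data, fmt=None, rate=None, channels=None, capture=False):
    """Build the "audio" object; headerless audio must carry all parameters."""
    if not has_riff_header(data) and None in (fmt, rate, channels):
        raise ValueError("headerless audio requires format, rate, and channels")
    audio = {
        "source": {"inline": {"data": base64.b64encode(data).decode("ascii")}},
        "capture": capture,
    }
    # Optional when a header is present, but supplying them avoids header inspection.
    if fmt is not None:
        audio["format"] = fmt
    if rate is not None:
        audio["rate"] = rate
    if channels is not None:
        audio["channels"] = channels
    return audio


raw = b"\x00\x01" * 4  # stand-in for headerless 16-bit PCM samples
audio = build_audio_field(raw, fmt="L16", rate=16000, channels="mono")
print(json.dumps({"audio": audio}, indent=2))
```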

Notes about the values:

format - one of the format values described in the previous section
rate - the sample rate in Hz
channels - possible values are "mono" and "stereo"; specifies the number of channels in the source audio
capture - if set to true, the audio that has been processed will be captured. The uuid of the captured audio will be returned in the response. The uuid can be used to download the captured audio from this URL: https://api.voicegain.ai/v1/data/<uuid>/file
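
As a small illustration of the capture note, the download URL can be assembled from the returned uuid. The function name and the sample uuid below are made up, and any authentication the API requires for the download is not shown here.

```python
from urllib.parse import quote

API_BASE = "https://api.voicegain.ai/v1"


def captured_audio_url(uuid):
    """Return the download URL for captured audio with the given uuid."""
    # quote() guards against characters that are not valid in a URL path segment.
    return "%s/data/%s/file" % (API_BASE, quote(uuid, safe=""))


url = captured_audio_url("abc-123")  # hypothetical uuid taken from a response
print(url)
```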

Starting from version 1.16.0, channel selection for processing is done using the audioChannelSelector field. Possible values are "left", "right", and "mix".
