Speech Analytics API Overview – Voicegain

Voicegain Speech Analytics (SA) API supports a variety of analytics tasks performed on audio or on the transcript of that audio. The features supported by our SA API were chosen to support our main target use case, which is processing Call Center calls.

Things that Speech Analytics can do now (from release 1.22.0)

In the current release, only OFF-LINE Speech Analytics is supported.

Note: the list below does not include things that can also be obtained from our Transcribe API, such as the transcript, decibel values, audiozones, etc. These will, however, also be accessible from the Speech Analytics response.

Per channel analytics:

gender - likely gender of the speaker based on the voice characteristics. Currently either "male" or "female".
emotion - both totals over the entire call and a list of values computed at multiple points in the transcript. Each item contains:
- sentiment - from -1.0 (mad/angry) to +1.0 (happy/satisfied)
- mood - a map with estimated values (range 0.0 to 1.0) for the following moods: "neutral" "calm" "happy" "sad" "angry" "fearful" "disgust" "surprised"
- location - start and end time (in msec) and the index of the word
Named Entities - list of named entities recognized in the call, each with its entity type and its location in the call. Supported NER types are:
- CARDINAL - Numerals that do not fall under another type.
- DATE - Absolute or relative dates or periods.
- EVENT - Named hurricanes, battles, wars, sports events, etc.
- FAC - Buildings, airports, highways, bridges, etc.
- GPE - Countries, cities, states.
- NORP - Nationalities or religious or political groups.
- MONEY - Monetary values, including unit.
- ORDINAL - "first", "second", etc.
- ORG - Companies, agencies, institutions, etc.
- PERCENT - Percentage, including "%".
- PERSON - People, including fictional.
- QUANTITY - Measurements, as of weight or distance.
- TIME - Times smaller than a day.
Keywords - list of keywords or keyword groups recognized in the call. Keywords to be recognized can easily be configured from examples.
Profanity - this is essentially a predefined keyword group
talk metrics - such as maximum and average talk streak, talk rate, and energy
overtalk metrics - overtalk happens if this speaker starts speaking while the other speaker is already speaking.
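The overtalk definition above can be illustrated with a minimal Python sketch. It assumes each channel's speech is available as a list of (start_ms, end_ms) segments; that input format and the function name `overtalk_metrics` are hypothetical and not part of the Voicegain API. The sketch counts how often one speaker starts inside an ongoing segment of the other speaker and how long the two then talk at once.

```python
from typing import List, Tuple

# A speech segment as (start_ms, end_ms). Hypothetical input format,
# e.g. derived from per-channel word timings.
Segment = Tuple[int, int]


def overtalk_metrics(this: List[Segment], other: List[Segment]) -> dict:
    """Count how often `this` speaker starts talking while `other` is already talking.

    Also sums, for each such event, the time during which both speakers
    talk at once within the interrupted segment.
    """
    count = 0
    overlap_ms = 0
    for start, end in this:
        for o_start, o_end in other:
            # Overtalk: this speaker begins inside an ongoing segment of the other speaker.
            if o_start < start < o_end:
                count += 1
                overlap_ms += min(end, o_end) - start
                break  # at most one event per segment of this speaker
    return {"overtalk_count": count, "overtalk_ms": overlap_ms}
```

For example, with agent segments [(0, 4000), (9000, 12000)] and caller segments [(3500, 8000), (11000, 15000)], `overtalk_metrics(caller, agent)` reports two events totalling 1500 ms, while `overtalk_metrics(agent, caller)` reports none, because the agent never starts speaking while the caller is already talking. This is why overtalk is a per-channel metric: it is not symmetric.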
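To illustrate the per-channel emotion data described in the list above, a client might scan the emotion items for the most negative moment in a call. This sketch assumes a hypothetical JSON shape (`items`, `sentiment`, `mood`, `location.start`, `location.end`, `location.wordIndex`); the actual field names in the Speech Analytics response may differ, and only a subset of the eight moods is shown.

```python
import json

# Hypothetical shape of one channel's emotion data; real field names may differ.
SAMPLE = """
{
  "items": [
    {"sentiment": 0.4,
     "mood": {"neutral": 0.6, "happy": 0.3, "angry": 0.1},
     "location": {"start": 0, "end": 5200, "wordIndex": 0}},
    {"sentiment": -0.7,
     "mood": {"neutral": 0.2, "happy": 0.0, "angry": 0.8},
     "location": {"start": 5200, "end": 11800, "wordIndex": 14}}
  ]
}
"""


def most_negative_point(emotion: dict) -> dict:
    """Return the location of the lowest-sentiment item and its dominant mood."""
    worst = min(emotion["items"], key=lambda item: item["sentiment"])
    mood = worst["mood"]
    dominant = max(mood, key=mood.get)  # mood with the highest estimated value
    return {
        "start_ms": worst["location"]["start"],
        "end_ms": worst["location"]["end"],
        "sentiment": worst["sentiment"],
        "dominant_mood": dominant,
    }


print(most_negative_point(json.loads(SAMPLE)))
```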

Global analytics:

silence metrics - silence is defined as time when no channel contains speech. Note: only the Agent is assumed to be in control of the speaking time. This is a simplification, but it is difficult to determine whether any given silence was caused by the caller and was unavoidable.
word cloud frequencies - smart word cloud data with stop words removed and word variations collapsed before computing frequencies
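Because silence is defined across all channels, it can be computed by merging the speech segments of every channel and measuring the gaps between them. The following sketch assumes (start_ms, end_ms) segments per channel and a known call duration; `silence_metrics` and its `min_gap_ms` threshold for ignoring short pauses are hypothetical, not documented Voicegain parameters.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start_ms, end_ms); hypothetical input format


def silence_metrics(channels: List[List[Segment]], duration_ms: int,
                    min_gap_ms: int = 0) -> dict:
    """Measure time in [0, duration_ms] during which no channel contains speech."""
    # Merge speech from all channels into one list sorted by start time.
    speech = sorted(seg for channel in channels for seg in channel)
    gaps = []
    cursor = 0  # end of the speech covered so far
    for start, end in speech:
        if start > cursor:
            gaps.append(start - cursor)  # nobody was speaking in [cursor, start)
        cursor = max(cursor, end)
    if duration_ms > cursor:
        gaps.append(duration_ms - cursor)  # trailing silence at the end of the call
    gaps = [g for g in gaps if g >= min_gap_ms]
    return {
        "silence_ms": sum(gaps),
        "silence_count": len(gaps),
        "max_silence_ms": max(gaps, default=0),
    }
```

With agent segments [(0, 4000), (9000, 12000)], caller segments [(3500, 8000), (11000, 15000)] and a 16000 ms call, this yields two 1000 ms silences (8000 to 9000 ms and 15000 to 16000 ms). Attributing those silences to the Agent, as the note above describes, would be a separate step.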
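The word cloud pipeline (remove stop words, collapse word variations, then count) can be sketched in a few lines of Python. The stop-word list and the crude suffix-stripping `normalize` function are illustrative stand-ins chosen for this example; how Voicegain actually collapses variations is not documented here, and a production system would more likely use a full stop-word list and a lemmatizer.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "and", "are", "i", "is", "it", "of", "the", "to", "was", "you"}


def normalize(word: str) -> str:
    """Crude suffix stripping that collapses e.g. 'billed'/'bills'/'billing' to 'bill'."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least a 3-letter stem so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def word_cloud_frequencies(transcript: str, top_n: int = 10) -> list:
    """Return the top_n (term, count) pairs after stop-word removal and normalization."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(normalize(w) for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)
```

On the transcript "I was billed twice and the bills are wrong. Billing never answers.", the top entry is ("bill", 3): "billed", "bills" and "billing" are collapsed into one term, and stop words such as "the" are not counted.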

Speech Analytics features coming soon

REAL-TIME Speech Analytics will be available in the near future. Soon we also plan to release Score Card support for Speech Analytics.

Per channel analytics:

age - estimated age of the speaker based on the voice characteristics. Three possible values: "young-adult" "senior" "unknown"
phrases - list of phrases or phrase groups recognized in the call. These are identified using NLU algorithms, essentially the same ones used for identifying NLU intents. Phrases to be recognized can be configured from examples.
pitch statistics will be added to talk metrics

Supported audio

Speech Analytics API supports the following types of audio input:

2-channel (stereo) audio, as typically found in call centers, where the Caller voice is recorded in one channel and the Agent voice in the other. Some metrics, e.g., overtalk, can only be computed if the input audio is of this type.
1-channel (mono) audio with two speakers - for this audio type, diarization will be performed to separate the two speakers. The per-channel analytics will be performed after diarization. Overtalk metrics are not available for this audio type.
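Which analytics are available depends on the channel count of the input. As a minimal sketch, assuming the recording is a WAV file, Python's standard `wave` module can read the channel count before upload; the function `analytics_plan` and the flags it returns are hypothetical and simply encode the rules stated above (diarization for mono input, overtalk only for stereo).

```python
import wave


def analytics_plan(path: str) -> dict:
    """Describe how a WAV recording would be analyzed, based on its channel count."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
    if channels == 2:
        # Caller and Agent are already on separate channels; overtalk can be computed.
        return {"channels": 2, "diarization": False, "overtalk": True}
    if channels == 1:
        # Two speakers share one channel and must be separated by diarization.
        return {"channels": 1, "diarization": True, "overtalk": False}
    raise ValueError(f"unsupported channel count: {channels}")
```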
