Practical Paralinguistics: Voice Intelligence in the Enterprise

by Jeff Shukis

Real-World Applications for Voice Intelligence

“Understanding paralinguistics is critical to understanding meaning,” says Jeff Shukis, CTO of VoiceBase.

In a world where 1.7 MB of data is created every second, businesses must leverage every available data type to drive innovation. With the digital world evolving faster than we can perceive, a wide range of sources are being mined for meaning, and voice is one of the few that has long been overlooked. For the past 60 years, researchers have taken different approaches to deriving meaning from voice data.



Practical Paralinguistics 

In the 1950s, George L. Trager pioneered the concept of “paralinguistics,” which was later applied to questions such as the connection between culture and language. Today, one of its many uses is deriving meaning from voice data.

Our voices carry a wealth of data, such as pitch, tone, speed, and volume, that is often more critical to understanding what is being said than the words themselves.

To illustrate this, picture a judge going through case transcripts. The judge comes across this exchange:

 ‘Did you kill Colonel Mustard?’

 ‘Yes, I killed him’


There is nothing in this text that tells the judge whether the statement is true or false. What if the judge had access to the audio? It would likely reveal far more than the text itself: speed, tone, mood, perhaps even sarcasm. This opens up an entire pool of possibilities.

What process, therefore, can we use to capture these metrics and integrate them into our speech analytics applications? Jeff Shukis, Senior Vice President of Operations and CTO at VoiceBase, elaborates on how to integrate paralinguistics into voice analytics operations.

Before getting into the technical stuff, let’s define paralinguistics.


Paralinguistics comprises the components of spoken communication that concern how something is said, as opposed to what is said. Its elements include tone, pace, sarcasm, pitch, timing, and inflection.

During our journey in discovering the power of paralinguistics in voice analytics, we made several discoveries including:

  • It is almost impossible to harness the full power of speech analytics without paralinguistics
  • Analyzing the entire conversation makes statement-level analysis less necessary
  • Certain simple paralinguistic metrics are more stable, and deliver more value at the call level, than flashier high-value metrics such as sarcasm detection


Challenges We Faced During the Process 

The journey to blending paralinguistics into voice analytics presented several challenges, including:


  • The process was complicated and time-consuming
  • Significant funds and resources had to be invested to make it work
  • Paralinguistics remains a poorly explored field, which made the work harder

How we Made Paralinguistics Practical

As mentioned earlier, we failed to extract reliable meaning from high-value metrics because of their instability, context dependency, and cultural variation. We therefore decided to invest in simpler paralinguistic signals that still provide valuable real-world insights. This led to two outputs: Voice Features, tailored for data scientists, and conversation metrics for everyone.

Let’s Talk About VoiceFeatures (For Data Scientists)

VoiceBase believes in innovation and openness, and therefore gives data scientists the opportunity to use our voice features as inputs to models of their own, built toward their own end goals.

Most VoiceBase outputs are in JSON. For instance, the sample below shows the transcript portion of the JSON output, with the words array embedded inside it. Each entry in the words array contains vital metrics such as the word's start and stop time, its relative volume, and its dominant frequencies.

"transcript": {
  "words": [
    {
      "w": "Because",
      "s": 1880,                        // word start time, in milliseconds
      "e": 2180,                        // word end time, in milliseconds
      "v": 1.487,                       // word-level relative volume, from 0-2
      "frq": [                          // two most dominant "lowish" frequencies
        { "e": 0.931, "f": 260.942 },   // e = relative energy, from 0-1
        { "e": 0.347, "f": 521.807 }    // f = frequency in hertz
      ]
    }
  ]
}

Note: The // annotations are not part of the JSON output. They are meant to help you understand and internalize the concept.

The s and e fields represent the start and end time of the word, while v is the word-level relative volume, which falls between 0 and 2. The frq field lists the two most dominant “lowish” frequencies, which matter mainly because of their predictive power.
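To make these fields concrete, here is a minimal sketch of pulling per-word duration, volume, and the strongest frequency band out of a words array. The field names and casing follow the sample above, but the exact schema may vary by API version.

```python
import json

# A fragment shaped like the sample output above (times in milliseconds).
payload = json.loads("""
{
  "transcript": {
    "words": [
      {"w": "Because", "s": 1880, "e": 2180, "v": 1.487,
       "frq": [{"e": 0.931, "f": 260.942}, {"e": 0.347, "f": 521.807}]}
    ]
  }
}
""")

for word in payload["transcript"]["words"]:
    duration_ms = word["e"] - word["s"]                 # word length in ms
    dominant = max(word["frq"], key=lambda x: x["e"])   # band with the most energy
    print(word["w"], duration_ms, word["v"], dominant["f"])
```

For the sample word, this prints a 300 ms duration, a relative volume of 1.487, and 260.942 Hz as the dominant frequency.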

To a business person, this sounds like noise. This is where call-level metrics come in.


Conversation Metrics (For everyone)

Flipping the coin to call-level metrics opens up many possibilities, mainly for businesses. These metrics fall into several categories: talk time, rate, and overtalk metrics; talk style, tone, and volume metrics; and sentiment metrics.

Let’s see what these metrics are, and what importance they have in extracting insights from voice data. 


Talk Time, Rate, and Overtalk 

These are calculated mainly from word start and stop times. By looking at when words begin and end, you can derive pace, silence, overtalk, and other relevant information. In a call-center context this is revolutionary, helping businesses improve both the agent experience (AE) and the customer experience (CE).
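As a sketch (not a documented API), talk time, silence, talk rate, and overtalk counts can all be derived from per-word start/stop times. The word lists below are hypothetical, with times in milliseconds:

```python
# Derive call-level talk metrics from per-word (start_ms, end_ms) tuples.
def talk_metrics(words):
    """words: list of (start_ms, end_ms) tuples for one speaker's words."""
    talk_ms = sum(end - start for start, end in words)
    span_ms = words[-1][1] - words[0][0]        # first word start to last word end
    rate_wpm = len(words) * 60000 / talk_ms     # words per minute of actual speech
    return {"talk_ms": talk_ms, "silence_ms": span_ms - talk_ms, "rate_wpm": rate_wpm}

def overtalk_incidents(a, b):
    """Count word pairs where two speakers' words overlap in time."""
    return sum(1 for s1, e1 in a for s2, e2 in b if s1 < e2 and s2 < e1)

agent = [(0, 400), (500, 900), (2900, 3400)]
caller = [(850, 1400), (1500, 2800)]
print(talk_metrics(agent))                 # talk time, silence, and pace
print(overtalk_incidents(agent, caller))   # agent's second word overlaps caller's first
```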


Talk Style, Tone, and Volume metrics

This group includes changes in pitch, volume dynamics, and other related metrics, which are also very important in a call-center scenario. By comparing the first third of a call with the last third, you can track changes in these metrics and surface shifts in agent or caller emotion and tone, which helps improve the customer experience.
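One way to sketch such an intra-call change metric is the ratio of a signal's average over the last third of the call to its average over the first third. The pitch values below are hypothetical per-word estimates in hertz, not real output:

```python
# Intra-call change: mean of the last third divided by mean of the first third.
# A ratio above 1.0 means the signal rose over the course of the call.
def intra_call_change(values):
    third = max(1, len(values) // 3)
    first = sum(values[:third]) / third
    last = sum(values[-third:]) / third
    return last / first

caller_pitch = [180, 182, 179, 185, 190, 188, 205, 210, 208]  # Hz, hypothetical
change = intra_call_change(caller_pitch)
print(round(change, 3))  # above 1.0: this caller's pitch rose over the call
```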


Sentiment: Integrating Paralinguistics Into Real-life Applications


Voice Features

Voice features have many uses, but the main one is the one discussed earlier in this article: building predictive models.


Conversation Metrics

Conversation metrics help data scientists and business users alike with call analysis.


Query-Based Call Categorization

Conversation metrics can be used to enhance query-based (rule-based) call categorization, which is far less effective when it relies on transcript words alone.


Example 1

Category: ‘Caller has a problem and leaves agitated’

Match where

‘Caller’ says ANY of (‘Problem’, ‘Issue’, ‘failed’) [0s, 30s]

V1: Flag any call where the transcript meets the search criteria

Let’s check out a second example, where other metrics are added to the equation:


Example 2


Category: ‘Caller has a problem and leaves agitated’

Match where

‘Caller’ says ANY of (‘Problem’, ‘Issue’, ‘failed’) [0s, 30s]

and metrics: caller-intra-call-change-in-pitch > 1.06

and metrics: caller-intra-call-change-in-talk-rate >1.04

and metrics: caller-intra-call-change-in-relative-voice-volume-energy >1.05

and metrics: caller-overtalk-incident > 2


V2: Flag any call where the transcript meets the search criteria and the metrics show increased pitch, talk rate, and volume over the course of the call, plus at least two overtalk incidents with the agent

These examples show that combining transcript words with conversation metrics makes query-based categorization far more useful than either transcript words or conversation metrics alone.
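Example 2 can be sketched as code. The call structure, field names, and helper below are hypothetical; only the keywords, the 30-second window, and the thresholds come from the example above:

```python
# Flag a call only when a problem keyword appears in the caller's first 30
# seconds AND the paralinguistic metrics show rising agitation.
KEYWORDS = {"problem", "issue", "failed"}

def flag_agitated_caller(call):
    early_hit = any(
        w["w"].lower().strip(".,!?") in KEYWORDS and w["s"] <= 30_000  # ms
        for w in call["caller_words"]
    )
    m = call["metrics"]
    return (early_hit
            and m["change_in_pitch"] > 1.06
            and m["change_in_talk_rate"] > 1.04
            and m["change_in_volume"] > 1.05
            and m["overtalk_incidents"] > 2)

call = {
    "caller_words": [{"w": "Problem", "s": 12_000}],
    "metrics": {"change_in_pitch": 1.10, "change_in_talk_rate": 1.05,
                "change_in_volume": 1.07, "overtalk_incidents": 3},
}
print(flag_agitated_caller(call))  # True: keyword hit plus all four thresholds
```

A call matching the keywords but without the metric evidence (or vice versa) is not flagged, which is the whole point of combining the two signals.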


Wrapping up:

Paralinguistics is an important element we should take advantage of to improve our voice analytics applications and uncover the meaning embedded in what we say. That is why VoiceBase has always been dedicated to exploring the uncharted areas of paralinguistics and integrating them into real-life applications. There have been many breakthroughs, and if you would like to experience the power of paralinguistics, check out our voice intelligence products.


