It All Starts with an Accurate Automatic Transcription

by Natalie Chilton

Thanks to advances in technology such as deep learning, we find ourselves on the heels of an automated revolution: seemingly everything in our lives is becoming more automatic. According to Pew, 65 percent of Americans predict that “robots and computers will ‘definitely’ or ‘probably’ do much of the work currently done by humans” by 2065. VoiceBase is right at the forefront of this revolution with our trailblazing speech-to-text API, allowing you to accurately and automatically transcribe conversations like never before. 

Automatic transcription is at the heart of features such as automatic video captions, visual voicemail, richer voice insights, and predictive analytics, all of which help you provide a more seamless experience for your users. So, how does it work? Take a look at the features that make our speech recognition solution the best in the industry.


Once you upload a recording to the VoiceBase API, your audio is automatically returned as a fully time-aligned, highly accurate transcript in TXT, Word, RTF, or SRT format. From there, you can navigate the text with our easy-to-use click-n-play plugin: search for the term you're looking for and jump straight to the spot where it was spoken. Your recorded content is now keyword-searchable.
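To illustrate what "time-aligned and keyword-searchable" means in practice, here is a minimal sketch of searching a transcript and getting back the times at which a term was spoken. It assumes each transcribed word comes with a start time in seconds; the `Word` record and `find_term` helper are hypothetical and do not reflect the actual VoiceBase response schema.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word record; the real VoiceBase schema may differ.
    text: str
    start: float  # seconds from the beginning of the recording


def find_term(words: list[Word], term: str) -> list[float]:
    """Return the start time of every occurrence of a (possibly multi-word) term."""
    target = [t.lower() for t in term.split()]
    n = len(target)
    hits = []
    for i in range(len(words) - n + 1):
        # Compare case-insensitively and ignore trailing punctuation.
        window = [w.text.lower().strip(".,?!") for w in words[i:i + n]]
        if window == target:
            hits.append(words[i].start)
    return hits


words = [Word("Please", 0.0), Word("cancel", 0.4), Word("my", 0.9),
         Word("order.", 1.1), Word("Cancel", 5.2), Word("it", 5.6)]
print(find_term(words, "cancel"))    # [0.4, 5.2]
print(find_term(words, "my order"))  # [0.9]
```

A player can then seek to any returned time, which is the "jump right to the spot" behavior described above.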


Per Word Confidence measures the acoustic similarity between the sound in the audio recording and the word that was transcribed. With this score, you can locate key words in a transcript and see how reliably each one was recognized. This lets you study content at a more granular level and draw deeper insights from it.
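As a rough illustration of how per-word confidence scores might be used, the sketch below flags words that fall under a threshold and averages the confidence of a chosen keyword across a transcript. It assumes scores on a 0.0 to 1.0 scale; the `Word` record, the 0.6 threshold, and both helper functions are hypothetical choices for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    # Hypothetical word record; confidence is assumed to be in [0.0, 1.0].
    text: str
    start: float
    confidence: float


def low_confidence(words: list[Word], threshold: float = 0.6) -> list[Word]:
    """Return words whose confidence is below the (assumed) threshold."""
    return [w for w in words if w.confidence < threshold]


def keyword_confidence(words: list[Word], keyword: str) -> Optional[float]:
    """Average confidence across all occurrences of a keyword, or None if absent."""
    scores = [w.confidence for w in words if w.text.lower() == keyword.lower()]
    return sum(scores) / len(scores) if scores else None


words = [Word("refund", 1.2, 0.95), Word("policy", 1.6, 0.88),
         Word("Refund", 7.3, 0.55)]
print([w.text for w in low_confidence(words)])      # ['Refund']
print(round(keyword_confidence(words, "refund"), 2))  # 0.75
```

Flagged words are natural candidates for human review before a keyword hit is trusted.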


This feature lets users surface specific words or phrases by the time at which they occur in a recording. Each time stamp is paired with the recording's source URL, and this applies across all of a user's stored recordings.
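One way to picture "a time stamp combined with a source URL" is a deep link that opens a recording at the moment a word was spoken. The sketch below assumes the W3C Media Fragments temporal syntax (`#t=seconds`), which HTML5 media players commonly honor; VoiceBase's own link format may differ, and the recording URLs and `(text, start)` word tuples are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def deep_link(source_url: str, start_seconds: float) -> str:
    """Attach a Media Fragments temporal fragment (#t=...) to a recording URL."""
    parts = urlsplit(source_url)
    # Replace any existing fragment with the temporal one.
    return urlunsplit(parts._replace(fragment=f"t={start_seconds:g}"))


def links_for_term(recordings: dict[str, list[tuple[str, float]]],
                   term: str) -> list[str]:
    """Search every stored recording and return a deep link per occurrence."""
    links = []
    for url, words in recordings.items():
        for text, start in words:
            if text.lower() == term.lower():
                links.append(deep_link(url, start))
    return links


recordings = {
    "https://example.com/a.mp3": [("hello", 1.0), ("refund", 2.5)],
    "https://example.com/b.mp3": [("refund", 10.0)],
}
print(links_for_term(recordings, "refund"))
# ['https://example.com/a.mp3#t=2.5', 'https://example.com/b.mp3#t=10']
```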

VoiceBase Player

Sometimes you get comfortable with the tools you already use. VoiceBase Player lets you keep your player of choice while adding the benefits of our UI components. Enhanced features include an interactive transcript, automated keyword and topic extraction, user-defined keyword spotting, ad hoc search, and a transcript editor.


It is now easier than ever to know who said what in a recording. With Stereo Speaker ID, you can automatically label each speaker. There are several ways to set up speaker identification, and other features such as keyword extraction and keyword spotting carry these speaker labels through to their results.
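In a stereo recording, each party is often on its own channel, so a speaker label can follow from the channel a word came from. The sketch below assumes exactly that: words arrive as hypothetical `(channel, start_seconds, text)` tuples, and a caller-supplied mapping such as `{0: "Agent", 1: "Caller"}` names each channel. It merges both channels by time and collapses consecutive words into speaker turns.

```python
from itertools import groupby


def speaker_turns(words: list[tuple[int, float, str]],
                  channel_labels: dict[int, str]) -> list[tuple[str, str]]:
    """Merge words from both channels by start time and group them into turns."""
    ordered = sorted(words, key=lambda w: w[1])
    turns = []
    # groupby starts a new group whenever the channel changes.
    for channel, group in groupby(ordered, key=lambda w: w[0]):
        turns.append((channel_labels[channel], " ".join(w[2] for w in group)))
    return turns


words = [(1, 1.2, "Hi,"), (1, 1.5, "I"), (1, 1.7, "need"), (1, 1.9, "help."),
         (0, 0.0, "Thanks"), (0, 0.3, "for"), (0, 0.5, "calling."),
         (0, 2.6, "Sure.")]
for speaker, text in speaker_turns(words, {0: "Agent", 1: "Caller"}):
    print(f"{speaker}: {text}")
# Agent: Thanks for calling.
# Caller: Hi, I need help.
# Agent: Sure.
```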


This tool uses our automatic speech recognition technology to create video captions within minutes. VoiceBase supports the industry-standard closed-captioning formats SRT and DFXP, which can be used with commercial video players and video delivery systems.
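To show how a time-aligned transcript maps onto a caption format, here is a minimal sketch that groups words into SRT cues. SRT cues consist of a sequence number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, and the caption text, separated by blank lines. The `(text, start, end)` word tuples and the fixed words-per-cue grouping are simplifying assumptions; a production captioner would also consider pauses and line length.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 01:02:03,456."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(words: list[tuple[str, float, float]], max_words: int = 7) -> str:
    """Group (text, start, end) words into numbered SRT cues of at most max_words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(w[0] for w in chunk)
        cues.append(f"{len(cues) + 1}\n"
                    f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
                    f"{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)


words = [("Hello", 0.0, 0.4), ("and", 0.5, 0.7), ("welcome.", 0.8, 1.3)]
print(to_srt(words, max_words=2))
```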


You can improve automatic transcription accuracy by adding custom words to the speech recognition API. Common examples include proper nouns, company names, product names, and acronyms; adding them improves both transcription accuracy and keyword spotting.

Accurate automatic transcription of your recordings is just the foundation for the many capabilities our speech-to-text solution has to offer. Want to find out what else VoiceBase can do for you and your business? Contact us today!
