How to Get Formatted Text & Turns Transcripts

by Amelia Ortiz

At the core of any good voice analytics solution is a solid speech-to-text transcription. If your voice data isn’t being transcribed how you need it, what’s the point? There are many factors to consider when evaluating Automatic Speech Recognition (ASR) solutions: accuracy and word error rate (WER), integration capabilities, and formatting and readability.

VoiceBase operates with all of this in mind to provide users with speech-to-text results that make the most sense. The result? Two new transcription formats to provide optimal integration and readability for customers: Formatted Text and Turns.

Details on the new transcription formats and how to get them are as follows:

    Formatted Text Transcript

    For Enterprise customers, VoiceBase’s Formatted Text makes it easier than ever to read and understand your speech-to-text transcriptions. This new format includes speaker-separated turns, with each speaker role on its own line.

      How to get Formatted Text Results


      API Endpoint

      GET /v3/media/mediaId/transcript/formattedText



      Returns the transcript in the format described above, as plain text.
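As a rough illustration, the request above could be issued from Python along these lines. This is a minimal sketch, not official client code: the base URL, the bearer-token header, and the function names are assumptions made for the example, and mediaId in the endpoint is treated as a placeholder for your own media identifier.

```python
import urllib.parse
import urllib.request

# Assumption: base URL of the API; confirm it against your account documentation.
BASE_URL = "https://apis.voicebase.com"


def build_formatted_text_request(media_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the Formatted Text transcript."""
    # mediaId in the documented path is replaced by the caller's media identifier.
    safe_id = urllib.parse.quote(media_id, safe="")
    url = f"{BASE_URL}/v3/media/{safe_id}/transcript/formattedText"
    return urllib.request.Request(
        url,
        # Assumption: the API accepts a bearer token in the Authorization header.
        headers={"Authorization": f"Bearer {token}", "Accept": "text/plain"},
        method="GET",
    )


def fetch_formatted_text(media_id: str, token: str) -> str:
    """Send the request and return the plain-text transcript body."""
    request = build_formatted_text_request(media_id, token)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Splitting request construction from sending keeps the URL and headers easy to inspect before any network call is made.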

      Example of Formatted Text Transcription:

      Agent: Hi, this is Kesby Shipping Company, Bryan speaking, how can I help you?

      Customer: Hi, Brian, this is Henry. I think we spoke earlier, I got a quote from you guys and I need this. I need something shipped out, so I’m going to move forward with the company.

      Agent: OK, great. Yeah, Henry, I do remember you had those pinball machines, didn’t you?

      Customer: Yeah, the Star Wars and pinball machines shipped out.

      Agent: Very cool. OK, do you have your reference number on hand?

      Customer: Yeah, I do get the reference number is one four five

      Agent: Oh

      Customer: eight zero nine

      Agent: yes.

      Customer: three two.

      Agent: OK, got it. So this is going to five five five Stoney Way in Toronto, Canada, and it’s shipped from your business address in Monterrey. What it looks like the total here is one thousand four hundred fifty three dollars. Did you want to pay for that now with credit card?

      Customer: Yeah, that’s correct. I assume you can take credit card over the phone, right.

      Agent: Yes. Correct.

      Customer: Yeah. So have my card here. I can read the numbers if that works for you.

      Agent: Yep. Sounds good. What kind of card is it?

      Customer: This is a visa,

      Agent: OK, go ahead.

      Customer: OK. [redacted] number is [redacted] [redacted] [redacted] [redacted]
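Because each turn in the example above sits on its own line in the form "Role: utterance", the plain text is straightforward to split back into turns. The sketch below assumes that every turn starts with a role label followed by a colon and that blank lines separate turns; the function name is hypothetical, and a line without a role label is treated as a continuation of the previous turn.

```python
from typing import List, Tuple


def parse_formatted_text(transcript: str) -> List[Tuple[str, str]]:
    """Split 'Role: utterance' lines into (role, utterance) pairs."""
    turns: List[Tuple[str, str]] = []
    for raw_line in transcript.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip the blank lines between turns
        role, separator, utterance = line.partition(":")
        if not separator:
            # No "Role:" prefix: append the text to the previous turn, if any.
            if turns:
                previous_role, previous_text = turns[-1]
                turns[-1] = (previous_role, f"{previous_text} {line}")
            continue
        turns.append((role.strip(), utterance.strip()))
    return turns


sample = "Agent: OK, go ahead.\n\nCustomer: This is a visa,\n"
for role, utterance in parse_formatted_text(sample):
    print(f"{role!r} said {utterance!r}")
```

For anything beyond display purposes, the Turns JSON format is likely the better fit, since it carries the same turns in structured form.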

      Turns JSON Format

      For developers, the new Turns JSON results make it easier to integrate directly into your custom application. The Turns JSON format includes speaker roles with timestamps on individual lines, plus conversation metrics.

        How to get Turns Results


        API Endpoint

        GET /v3/media/mediaId/transcript/includeAlternateFormat=turns



        The response will include a “turns” section under “transcript”, next to the alternateFormats section.

        Example of Turns JSON Results

        "transcript": {
          "words": [ … ],
          "turns": [
            {
              "speaker": "Agent",
              "text": "Hi, this is c.s.v. shipping company. Brian speaking, how can I help you?",
              "s": 344,
              "e": 3445
            },
            {
              "speaker": "Caller",
              "text": "This is Henry a, we spoke earlier, I got a quote from you guys and I need this. I need something stepped outside. 'd The company.",
              "s": 3200,
              "e": 5000
            }
          ],
          "alternateFormats": [ … ]
        }
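Once the turns are available as a list, simple per-speaker figures can be derived from the timestamps. The sketch below is illustrative only and does not reproduce VoiceBase's own conversation metrics: it assumes each turn is an object with "speaker", "s", and "e" keys as in the example above, and that "s" and "e" are start and end offsets in milliseconds. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Turn = Dict[str, object]


def talk_time_by_speaker(turns: List[Turn]) -> Dict[str, int]:
    """Sum the duration (end minus start) of every turn, grouped by speaker."""
    totals: Dict[str, int] = defaultdict(int)
    for turn in turns:
        totals[str(turn["speaker"])] += int(turn["e"]) - int(turn["s"])
    return dict(totals)


def overlapping_turns(turns: List[Turn]) -> List[Tuple[Turn, Turn]]:
    """Return consecutive turn pairs where the next starts before the previous ends."""
    return [
        (current, following)
        for current, following in zip(turns, turns[1:])
        if int(following["s"]) < int(current["e"])
    ]


# Timestamps taken from the example above; the text fields are omitted for brevity.
example_turns = [
    {"speaker": "Agent", "s": 344, "e": 3445},
    {"speaker": "Caller", "s": 3200, "e": 5000},
]

print(talk_time_by_speaker(example_turns))
print(len(overlapping_turns(example_turns)))
```

In the example data the caller's turn starts before the agent's turn ends, so the overlap check flags that pair as possible crosstalk.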

        If you have any questions about the new transcription formats, please don’t hesitate to reach out to VoiceBase today.
