VoiceBase was conceived to do for voice what Google did for text: make it instantly searchable and shareable by creating a rich, queryable database. Our speech analytics begins with an accurate, automated transcript, which enterprises can then index and analyze to deliver actionable insights.
Facilitate Discovery
Improve User Experience
Create a Valuable Archive
- Transcription
- Stereo Speaker ID
- Custom Vocabulary
- Number Formatting
- Instant Search
- Languages

Features For Any Use Case
    curl https://apis.voicebase.com/v3/media \
      --form media=@recording.mp3 \
      --form configuration='{
          "speechModel" : {
            "language" : "en-US"
          }
        }' \
      --header "Authorization: Bearer ${TOKEN}"
Select a Language
Use the language configuration parameter to set the language. Omitting the parameter defaults the language to U.S. English (en-US).
    curl https://apis.voicebase.com/v3/definition/vocabularies/earningsCalls \
      --request PUT \
      --header "Content-Type: application/json" \
      --header "Authorization: Bearer ${TOKEN}" \
      --data '{
        "vocabularyName": "earningsCalls",
        "terms": [
          {
            "soundsLike": [ "A.F.F.O." ],
            "term": "AFFO",
            "weight": 2
          },
          {
            "soundsLike": [ "Aypack" ],
            "term": "APAC",
            "weight": 2
          },
          {
            "term": "CapEx"
          }
        ]
      }'
Add Custom Vocabulary
Accurately spot non-dictionary terms such as:
• Jargon
• Proper Nouns
• Acronyms
• Hyphenated words
• Multi-word phrases
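When a vocabulary holds many terms, the request body can be generated rather than written by hand. The Python sketch below builds a body with the same shape as the earningsCalls request shown earlier. The helper name `vocabulary_payload` and its input format (plain strings or `(term, sounds_like, weight)` tuples) are hypothetical choices for illustration, not part of any VoiceBase client library. The code only builds and prints JSON; it sends nothing.

```python
import json


def vocabulary_payload(name, terms):
    """Build a custom-vocabulary request body (hypothetical helper).

    Each entry in `terms` is either a plain string (term only) or a
    (term, sounds_like, weight) tuple.
    """
    entries = []
    for item in terms:
        if isinstance(item, str):
            # A bare term carries no pronunciation hints or weight.
            entries.append({"term": item})
            continue
        term, sounds_like, weight = item
        entries.append(
            {"soundsLike": list(sounds_like), "term": term, "weight": weight}
        )
    return {"vocabularyName": name, "terms": entries}


payload = vocabulary_payload(
    "earningsCalls",
    [("AFFO", ["A.F.F.O."], 2), ("APAC", ["Aypack"], 2), "CapEx"],
)
print(json.dumps(payload, indent=2))
```

The printed JSON can be saved to a file and passed to the `--data` option of the curl request.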
    curl https://apis.voicebase.com/v3/media \
      --form media=@recording.mp3 \
      --form configuration='{
        "ingest": {
          "stereo": {
            "left" : { "speakerName": "Customer" },
            "right": { "speakerName": "Agent" }
          }
        }
      }' \
      --header "Authorization: Bearer ${TOKEN}"
Define Channels
Recording and processing calls in stereo can significantly improve transcription accuracy and analytical insights. To realize the benefit, record each speaker on a separate channel (left or right) and provide the speaker metadata to VoiceBase when uploading the recording.
    1
    00:00:02,76 --> 00:00:05,08
    Agent: Well this is Michael thank you for calling A.B.C.

    2
    00:00:05,08 --> 00:00:07,03
    Cable services. How may I help you today.

    3
    00:00:08,28 --> 00:00:11,93
    Customer: Hi I'm calling because I'm interested in buying new cable services.

    4
    00:00:12,64 --> 00:00:16,43
    Agent: OK great let's get started.
Format for Captioning
VoiceBase can generate subtitles or closed captions for your video project: retrieve the transcript of your audio or video file in WebVTT or SubRip Text (SRT) format.
No special configuration is required. Every transcript is available in four formats: JSON word-by-word, plain text, WebVTT, and SRT.
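Once an SRT transcript has been downloaded, it can be post-processed with a few lines of code. The Python sketch below parses the sample captions into cues with start and end times and a speaker label. It rests on two assumptions that the source does not state: the digits after the comma are read as a decimal fraction of a second (so `02,76` is 2.76 s), and a caption that begins with `Name:` starts a new speaker turn that carries over to following cues. The names `parse_srt` and `to_ms` are illustrative only.

```python
import re

# Matches "HH:MM:SS,ff --> HH:MM:SS,ff"; the fraction may have 1 to 3 digits.
TIMING = re.compile(r"(\d+):(\d+):(\d+),(\d+)\s*-->\s*(\d+):(\d+):(\d+),(\d+)")

SAMPLE = """\
1
00:00:02,76 --> 00:00:05,08
Agent: Well this is Michael thank you for calling A.B.C.

2
00:00:05,08 --> 00:00:07,03
Cable services. How may I help you today.

3
00:00:08,28 --> 00:00:11,93
Customer: Hi I'm calling because I'm interested in buying new cable services.

4
00:00:12,64 --> 00:00:16,43
Agent: OK great let's get started.
"""


def to_ms(hours, minutes, seconds, fraction):
    # Assumption: the fraction is decimal, so "76" is padded to 760 ms.
    millis = int(fraction.ljust(3, "0")[:3])
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + millis


def parse_srt(text):
    cues = []
    speaker = None  # carried over when a cue has no "Name:" prefix
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if len(lines) < 3:
            continue  # need an index line, a timing line, and caption text
        match = TIMING.match(lines[1])
        if not match:
            continue
        parts = match.groups()
        caption = " ".join(lines[2:])
        label = re.match(r"([A-Za-z][\w ]*):\s+(.*)", caption)
        if label:
            speaker, caption = label.group(1), label.group(2)
        cues.append({
            "start_ms": to_ms(*parts[:4]),
            "end_ms": to_ms(*parts[4:]),
            "speaker": speaker,
            "text": caption,
        })
    return cues


cues = parse_srt(SAMPLE)
for cue in cues:
    print(cue["start_ms"], cue["end_ms"], cue["speaker"], cue["text"])
```

The speaker heuristic is deliberately simple: any caption that opens with a word followed by a colon is treated as a speaker label, so it would need tightening for transcripts where captions can start that way for other reasons.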
    {
      "speechModel" : {
        "language" : "en-US",
        "extensions" : [ "voicemail" ]
      },
      "knowledge": {
        "enableDiscovery" : false
      },
      "transcript": {
        "formatting" : {
          "enableNumberFormatting" : true
        }
      },
      "publish":{
        "callbacks": [
          {
            "url": "https://example.com/transcription",
            "method": "POST",
            "type": "analytics",
            "include":[ "transcript" ]
          }
        ]
      }
    }
Optimize for Voicemail
VoiceBase can return automatic transcripts of voicemail recordings, which can then be delivered by email or SMS. This configuration is optimized for fast, accurate voicemail transcription. It includes features that benefit shorter audio; all of them are optional.
- Disable Phrase Groups and Topic Extraction
- Enable Number Formatting
- Use Callbacks
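The three recommendations above can be checked against a configuration before it is uploaded. The Python sketch below is one way to do that; `voicemail_config_issues` is a hypothetical helper, and the mapping from "Disable Phrase Groups and Topic Extraction" to `knowledge.enableDiscovery` is inferred from the sample configuration rather than stated in the text. Because the defaults for omitted keys are not given here, the check asks for each setting to be explicit.

```python
VOICEMAIL_CONFIG = {
    "speechModel": {"language": "en-US", "extensions": ["voicemail"]},
    "knowledge": {"enableDiscovery": False},
    "transcript": {"formatting": {"enableNumberFormatting": True}},
    "publish": {
        "callbacks": [
            {
                "url": "https://example.com/transcription",
                "method": "POST",
                "type": "analytics",
                "include": ["transcript"],
            }
        ]
    },
}


def voicemail_config_issues(config):
    """Return a list of recommendations the configuration does not follow."""
    issues = []
    # Assumed mapping: discovery covers phrase groups and topic extraction.
    if config.get("knowledge", {}).get("enableDiscovery") is not False:
        issues.append("knowledge.enableDiscovery is not explicitly false")
    formatting = config.get("transcript", {}).get("formatting", {})
    if formatting.get("enableNumberFormatting") is not True:
        issues.append("enableNumberFormatting is not explicitly true")
    if not config.get("publish", {}).get("callbacks"):
        issues.append("no callback is configured under publish.callbacks")
    return issues


print(voicemail_config_issues(VOICEMAIL_CONFIG))
print(voicemail_config_issues({"speechModel": {"language": "en-US"}}))
```

Since every recommendation is optional, the returned list is best treated as advisory output rather than a validation failure.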