This installment in 3Cloud’s series on Microsoft’s Cognitive Services looks at the Speech APIs. For app and website developers, the Speech APIs provide natural language processing capabilities that add functionality and value to the customer experience.

Beyond enhancing the customer experience, organizations can use the Speech APIs to improve business practices or increase customer security. To improve business performance, an organization may use the Speech and Language APIs to transcribe call center recordings and develop a deeper understanding of product performance and customers’ concerns. To increase customer security, an organization may use the Speaker Recognition API to add a second layer of security by verifying customers’ identities via voice recognition.


All About Speech

First, an overview of the Speech APIs: these pre-trained AI speech models can hear and speak to your customers, enabling personal and convenient voice-based interactions.

Speech to Text: Transcribes spoken audio to text with standard or custom models. A custom model can be trained for specific vocabulary or unique speaking styles.
Text to Speech: Brings voice to any app by converting text to audio in near real-time, with a choice of over 75 default voices.
Speaker Recognition: Voice verification and speaker identification can identify who is speaking, providing increased security in authentication experiences for customers.
Speech Translation: Provides speech-to-speech or speech-to-text translation in 10 different languages.

For more information, see Microsoft’s Cognitive Services Speech Directory.

Develop Business Insight with Text Analytics using Speech to Text and Text Analytics APIs

Organizations with call centers can use the Custom Speech to Text API to transcribe call center recordings, which can then be explored with the Language Text Analytics API. Text analysis of call center interactions could answer questions such as ‘what are our top three product-related issues?’ or ‘what issues are of most concern to our customers?’ While analysis of call center recordings is not a new data analytics practice, using Microsoft’s Cognitive Services pre-trained AI models can make development of the data analytics pipeline faster and more robust.

Speech to Text API

Microsoft’s Speech to Text API is the speech recognition technology used by Cortana and several other Microsoft products. The Custom Speech to Text API allows an organization to build on this technology by training the model to accurately recognize terminology that is unique to the organization’s business practices, such as distinct-sounding terms for products or product functionality. The training steps for the Custom Speech to Text API may be repeated until the desired level of accuracy is reached. Once that level has been reached, the organization’s call center recordings can be transcribed and readied for text analysis.
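The article does not say how the "desired level of accuracy" is measured. One common choice for speech recognition is word error rate (WER): the number of word-level substitutions, insertions, and deletions needed to turn the model's transcript into a human-verified reference, divided by the number of reference words. The Python sketch below is an illustration under that assumption, not part of the Custom Speech tooling; the function name and the "FlexPort" product term are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: a product name the base model splits into two words.
reference = "reset the FlexPort router"
before_training = "reset the flex port router"
after_training = "reset the flexport router"

print(word_error_rate(reference, before_training))
print(word_error_rate(reference, after_training))
```

Tracking a figure like this on a fixed set of verified recordings after each training round gives a concrete stopping point for the repeat-until-accurate loop.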

[Image: Speech to Text API demo, which can be tried with your own recording]

Text Analytics API

Once call center recordings have been accurately transcribed, the Text Analytics API, part of Microsoft’s Language Cognitive Services, provides in-depth text analysis. Text analytics is an umbrella term that encompasses a wide range of practices for analyzing transcribed speech, including the identification of themes, entities, and sentiment. Themes represent the general ‘gist’ of the recording: frequently occurring patterns in the communication. Identification of entities within the text is the primary reason for using the Custom Speech to Text API, because it lets an organization learn which specific products or services are being discussed in the call center recordings. Finally, sentiment analysis, sometimes referred to as Emotion AI, identifies the positive and negative language used within the interaction.

[Image: Text Analytics API demo, which can be tried with your own transcription]

In combination, these services provide businesses with several options for the development of a highly customized and flexible data analytic pipeline for the analysis of their call center recordings.
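Once each call has been transcribed and analyzed, answering a question like ‘what are our top three product-related issues?’ is mostly an aggregation step. The Python sketch below assumes each call has already been reduced to a sentiment label and a list of entity names; that record shape, the function name, and the sample data are hypothetical and are not the Text Analytics API's actual response format.

```python
from collections import Counter


def top_issues(calls, n=3):
    """Return the n entities that appear in the most negative-sentiment calls."""
    counts = Counter()
    for call in calls:
        if call["sentiment"] == "negative":
            # Count each entity once per call so a single long call
            # cannot dominate the ranking.
            counts.update(set(call["entities"]))
    return counts.most_common(n)


# Hypothetical analysis results, one record per call.
calls = [
    {"sentiment": "negative", "entities": ["battery", "charger"]},
    {"sentiment": "negative", "entities": ["battery", "screen"]},
    {"sentiment": "positive", "entities": ["screen"]},
    {"sentiment": "negative", "entities": ["battery", "charger", "battery"]},
]

for entity, call_count in top_issues(calls):
    print(f"{entity}: mentioned in {call_count} negative calls")
```

Counting calls rather than raw mentions is a design choice made here for illustration; a team might instead weight by sentiment score or call duration.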

Enhance Customer Authentication Processes with the Speaker Verification API

Threats to identity and data privacy remain widespread, and they have shown that one-factor authentication (e.g., a basic username and password) is quite vulnerable to theft. Two-factor authentication (2FA) allows customers and organizations to increase the security of their data and identities. One form of second factor is biometric: voice recognition. The Speaker Verification capability of the Speaker Recognition API can be used to build a voice recognition system that confirms customers’ identities by their voices, and this check can take place during a routine call center interaction.

Voice recognition works by modeling a customer’s unique voiceprint. Once a customer’s voiceprint has been created, it can be saved and used whenever the customer calls again: the call center system compares the caller’s voice to the voiceprint on file. There are several options for two-factor authentication; however, voiceprint authentication has the distinct advantage that the customer does not need to know or provide additional information (e.g., confirmation codes sent via text or answers to secret questions). The Speaker Recognition API can create the voiceprint files for authentication and security processes and is robust enough to operate in quite complex acoustic environments.
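The decision logic around a voiceprint comparison can be sketched in a few lines of Python. The sketch assumes that a verification service has already compared the caller's voice to the voiceprint on file and returned a similarity score between 0.0 and 1.0; the score range, the 0.8 threshold, and all names below are hypothetical and are not taken from the Speaker Recognition API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthDecision:
    granted: bool
    reason: str


def second_factor_check(first_factor_ok: bool, voice_score: float,
                        threshold: float = 0.8) -> AuthDecision:
    """Grant access only if both the first factor and the voice match pass."""
    if not 0.0 <= voice_score <= 1.0:
        raise ValueError("voice_score must be between 0.0 and 1.0")
    if not first_factor_ok:
        return AuthDecision(False, "first factor failed")
    if voice_score < threshold:
        return AuthDecision(False, "voice did not match the voiceprint on file")
    return AuthDecision(True, "both factors passed")


# Hypothetical scores from a verification service.
print(second_factor_check(first_factor_ok=True, voice_score=0.93))
print(second_factor_check(first_factor_ok=True, voice_score=0.41))
```

In practice the threshold trades off false accepts against false rejects, so it would be tuned on real call data rather than fixed in code.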

Speaker Recognition API

[Image: Speaker Recognition API demo, which can be tried with your own recordings]

Currently, Microsoft’s Cognitive Services offers more than 20 pre-built APIs, many of which allow for customization. Individually or in combination, these APIs enable software developers and data scientists to implement powerful AI solutions that can significantly transform and improve business processes.

More to Come

3Cloud has surveyed several Microsoft Cognitive Services APIs, including search and vision. We will continue to explore the remaining Cognitive Services categories: knowledge and language. Subscribe to our blog so that you don’t miss out, and contact us if you would like to learn more about incorporating Cognitive Services into your organization’s business practices.