As is usual with many large companies, when I called an airline last week, an automated voice instructed me to select the option that best suited my request before I could speak to a human operator. However, instead of pressing buttons to move through the menus, I needed to say my choice out loud into the phone. This once again stirred my curiosity about speech recognition software. How do Siri, Cortana, and automated phone systems decipher what the user is saying? I decided to look into this.
When sounds are first recorded by the microphone, the speech recognition system turns the audio into digital signals using an analog-to-digital (A/D) converter. These signals can then be compared to the speech patterns stored in a database of words (somewhat like a dictionary) to decide what the user probably said. Rather than storing the patterns of thousands of whole words, however, the database only needs to “recognize the few dozen phonemes that the spoken languages are built on (English uses about 46, while Spanish has only about 24)”, then analyze which phonemes make up each word. (1)
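To make the phoneme idea concrete for myself, I wrote a minimal sketch of the lookup step. It assumes the audio has already been turned into a sequence of phoneme labels, and it picks the word whose stored phonemes are closest to what was heard. The tiny dictionary, the ARPAbet-style labels, and the function names are all my own assumptions for illustration, not how any real system stores its data.

```python
# Hypothetical pronunciation dictionary: word -> phoneme sequence.
# The labels are ARPAbet-style and chosen only for illustration.
PRONUNCIATIONS = {
    "yes": ("Y", "EH", "S"),
    "no": ("N", "OW"),
    "booking": ("B", "UH", "K", "IH", "NG"),
    "baggage": ("B", "AE", "G", "IH", "JH"),
}


def edit_distance(a, b):
    """Count the insertions, deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute x with y
        prev = cur
    return prev[-1]


def best_word(heard):
    """Return the dictionary word whose phonemes are closest to the heard sequence."""
    return min(PRONUNCIATIONS,
               key=lambda word: edit_distance(heard, PRONUNCIATIONS[word]))


# One phoneme is misheard ("CH" instead of "JH"), but the closest word still wins.
print(best_word(("B", "AE", "G", "IH", "CH")))
```

Matching by closest distance rather than exact equality is my own design choice here. It lets a slightly misheard phoneme still land on the intended word, which seems to fit the idea of deciding what the user "probably" said.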
This method is the basis of speech recognition, and it works well enough in an automated response system like that of the airline I called. For more complex applications like Siri and Cortana, however, statistical analysis and artificial intelligence are also involved. Speech recognition software takes feedback from the user to improve its performance over time, so that if the user corrects a mistake it made, it can avoid making similar mistakes next time. These applications also take into account the probability of different words following a given word, in order to pick the word with the highest contextual likelihood.
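The idea of weighing which word is likely to follow another can be sketched with a simple word-pair (bigram) count. This is only a toy version under my own assumptions: the four training sentences are made up, the function names are hypothetical, and real assistants use far larger data and more sophisticated models.

```python
from collections import Counter, defaultdict

# Made-up training sentences, standing in for a much larger body of text.
CORPUS = [
    "i want to book a flight",
    "i want to change a flight",
    "book a flight to boston",
    "please book a flight",
]


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for previous, following in zip(words, words[1:]):
            counts[previous][following] += 1
    return counts


def pick_word(counts, previous, candidates):
    """Among similar-sounding candidates, pick the one seen most often after `previous`."""
    # A Counter returns 0 for words it has never seen after `previous`.
    return max(candidates, key=lambda word: counts[previous][word])


counts = train_bigrams(CORPUS)

# "flight" and "fright" sound alike, but only "flight" has followed "a" in the corpus.
print(pick_word(counts, "a", ["fright", "flight"]))
```

In this sketch, user feedback could be folded in by adding each corrected sentence to the corpus and recounting, though that is my guess at a simple mechanism rather than a description of how Siri or Cortana actually learn.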
Speech recognition software is becoming increasingly prevalent, not only at large companies for answering customers’ queries, but also in “personal assistant apps” (2) that can answer random questions, fulfill your requests, or sometimes make sassy comebacks to entertain you. Computer scientists are also working to improve speech recognition systems so that they can decipher what users say with higher accuracy.
Works Cited
(1) and (2): http://www.explainthatstuff.com/voicerecognition.html









