ASR stands for Automatic Speech Recognition (also written as Automated Speech Recognition). It is a technology that converts spoken words into written text, allowing a computer to identify and process the words a person speaks into a microphone or other input device.
ASR is independent transcription software designed to convert spoken language into readable text. There are two types:
1) Directed dialogue conversations: This is the basic version of ASR. It consists of a machine interface that interacts with people by voice. The machine prompts you to respond with a specific word from a list of words and then provides a response to your request accordingly. Automated telephone banking uses this technology to enable customers to perform a wide range of financial transactions over the telephone.
2) Natural language conversation: This is a more advanced and sophisticated version of ASR. It understands the user's speech or written material and responds on the basis of the understood content. It enables people to interact with a computer using everyday language.
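The first type above, where the machine asks for one word from a fixed list, can be illustrated with a minimal sketch. It assumes the speech has already been recognized as text; the MENU entries and the prompt and respond functions are hypothetical names chosen for illustration, not part of any real banking system.

```python
# Hypothetical menu for a telephone-banking style dialogue.
# Keys are the only words the system accepts; values are canned replies.
MENU = {
    "balance": "Your balance will be read out.",
    "transfer": "Starting a funds transfer.",
    "help": "Connecting you to an agent.",
}


def prompt() -> str:
    """Build the spoken prompt that lists the allowed words."""
    return "Please say one of: " + ", ".join(MENU) + "."


def respond(recognized_word: str) -> str:
    """Map an already-recognized word to a reply, or ask again."""
    word = recognized_word.strip().lower()
    if word in MENU:
        return MENU[word]
    return "Sorry, I did not understand. " + prompt()


if __name__ == "__main__":
    print(prompt())
    print(respond("Balance"))
    print(respond("mortgage"))
```

Because the vocabulary is restricted to a short list, the recognizer only has to tell a handful of words apart, which is what makes this the simpler of the two types.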
The basic sequence of events in ASR is as follows:
1) A person speaks to the software using an input device like a microphone.
2) The input device creates a wave file of the person's words.
3) The volume of the wave file is normalized and background noise is removed.
4) The cleaned wave file is broken down into phonemes, which are the smallest units of sound. There are around 44 phonemes in English.
5) The ASR software analyzes the phonemes, starting from the first phoneme. It uses statistical probability analysis to figure out whole words before making a complete sentence.
6) Having understood the words, the ASR software responds in a meaningful way.
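Two of the steps above, volume normalization and turning phonemes into words, can be sketched in a few lines. This is only a toy illustration under stated assumptions: the audio is a plain list of sample values, the phoneme symbols are ARPAbet-style labels, and the LEXICON table with its probabilities is invented for the example. Real ASR software uses far more elaborate statistical models, and noise removal and phoneme detection are not shown here.

```python
def normalize_peak(samples, target=1.0):
    """Scale samples so the loudest one has magnitude `target`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    return [s * target / peak for s in samples]


# Hypothetical lexicon: a phoneme sequence maps to candidate words,
# each with an invented probability.
LEXICON = {
    ("HH", "AH", "L", "OW"): {"hello": 0.9, "hollow": 0.1},
    ("W", "ER", "L", "D"): {"world": 0.8, "whirled": 0.2},
}


def decode(phonemes):
    """Walk the phonemes from the first one, picking the most
    probable word for the longest phoneme run found in LEXICON."""
    words = []
    i = 0
    while i < len(phonemes):
        for j in range(len(phonemes), i, -1):
            key = tuple(phonemes[i:j])
            if key in LEXICON:
                candidates = LEXICON[key]
                words.append(max(candidates, key=candidates.get))
                i = j
                break
        else:
            i += 1  # no word starts here; skip this phoneme
    return " ".join(words)


if __name__ == "__main__":
    print(normalize_peak([1, -4, 2]))
    print(decode(["HH", "AH", "L", "OW", "W", "ER", "L", "D"]))
```

The greedy longest-match lookup stands in for the statistical probability analysis described in step 5; it chooses each word in isolation, whereas a real recognizer would also weigh how likely the words are to follow one another in a sentence.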