Getting the best out of speech to text systems

A look at how you can optimise your use of speech to text systems by utilising the software more effectively and identifying the type of microphone best suited to your specific requirements.

The accuracy of speech recognition systems has increased markedly over the past few years, with many vendors claiming accuracy rates of up to 95%. However, you can still run into significant problems if you are not using the recognition software correctly, or are not fully exploiting its capabilities.
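To put a figure such as 95% in context, here is a minimal sketch of one common way to measure accuracy: word error rate (WER), the word-level edit distance between what was said and what was transcribed, divided by the number of words spoken. Whether any particular vendor calculates its headline figure this way is an assumption, and the function name and sample sentences are purely illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from the transcript
                            curr[j - 1] + 1,      # extra word in the transcript
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Illustrative 20-word dictation with a single misrecognised word ("team" -> "theme").
reference = ("please send the quarterly sales report to the finance team before friday "
             "and copy the regional managers on the email")
hypothesis = ("please send the quarterly sales report to the finance theme before friday "
              "and copy the regional managers on the email")

wer = word_error_rate(reference, hypothesis)
accuracy = 1 - wer
print(f"WER: {wer:.2%}, accuracy: {accuracy:.2%}")
```

On this measure, 95% accuracy still means roughly one wrong word in every twenty, which is why the techniques below are worth the effort.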

Common problems

Speech recognition software can have problems with users who speak quickly, run words together or have particularly strong accents.

There can also be issues with background noise. Factors such as environmental noise or multiple speakers within range of an audio input device speaking at the same time can have an impact on accuracy.

Other issues include the complexities of jargon, where a business or sector has its own vocabulary and technical phrases. Company-specific terms that are not in general use can result in higher levels of error than you might usually expect.

Possible solutions

In terms of dictation techniques, here are a few hints that should help to deliver improved results:

  • Speak in a natural manner at a normal volume.
  • Dictate punctuation marks such as full stops and commas.
  • Avoid hesitations and long pauses.
  • Structure your sentences grammatically.
  • Make corrections where appropriate: the system should learn your corrected words and use them the next time.
  • Use automatic formatting features where possible: these can format various types of text for you.
  • Think about what you want to say before you start dictating.
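As an illustration of the punctuation tip above, the sketch below shows roughly what happens when spoken punctuation is turned into symbols. The command list and the function name are hypothetical; real dictation products define their own command sets and handle this internally.

```python
import re

# Hypothetical spoken commands; actual products define their own.
SPOKEN_PUNCTUATION = {
    "full stop": ".",
    "comma": ",",
    "question mark": "?",
}


def apply_spoken_punctuation(transcript: str) -> str:
    """Replace each spoken command, and any space before it, with its symbol."""
    text = transcript
    for phrase, symbol in SPOKEN_PUNCTUATION.items():
        pattern = r"\s*\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, symbol, text, flags=re.IGNORECASE)
    return text


raw = "thank you for your email comma I will reply on Monday full stop"
print(apply_spoken_punctuation(raw))
```

A simple substitution like this cannot tell a command from the same word used in ordinary speech, which is one reason clear, deliberate dictation helps.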

From a technology point of view, it’s also possible to improve recognition by “training” the software. The aim is to personalise the system by helping it to better understand your voice pattern, and less than an hour of training can often produce significant improvements.

Coupled with this is the ability to add company- and industry-specific terms to the vocabulary to get over the “jargon” issues discussed above.

Microphone considerations

Your voice can be captured using the microphone built into your device (computer, tablet or mobile). For the occasional dictation user who is capturing notes and short memos, this built-in microphone might well be fine. For more regular users who are consistently dictating lengthier content, however, a higher quality microphone might be appropriate. Headset microphones are generally considered to deliver the highest quality, not least because the microphone is held in front of your mouth, which tends to limit the external sound it picks up.

However, such products are not really practical for the mobile worker. In such cases the solutions can include using a handheld voice recorder to record audio directly (for subsequent uploading via a docking station) or utilising an app on your smartphone.

One of the factors to bear in mind with microphones and recorders is how the audio files are compressed prior to transmission. There are two common techniques: lossless compression, which reduces the file size without losing any audio quality, and lossy compression (typically used for MP3 files), which sacrifices some of the audio quality for the sake of a smaller file size.
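The difference between the two approaches can be shown with a short sketch. It uses a synthetic tone as a stand-in for recorded speech, general-purpose zlib compression as the lossless example, and a deliberately crude reduction from 16 bits to 8 bits per sample as the lossy example. These are assumptions chosen for simplicity: real audio codecs (including MP3) work quite differently, so treat this as an illustration of the principle only.

```python
import math
import struct
import zlib

# Synthetic stand-in for speech: 0.1 s of a 440 Hz tone, 16-bit mono at 16 kHz.
SAMPLE_RATE = 16_000
samples = [int(12_000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
           for n in range(1_600)]
pcm = struct.pack(f"<{len(samples)}h", *samples)

# Lossless: after compressing and decompressing, every byte is exactly as it was.
packed = zlib.compress(pcm)
restored = zlib.decompress(packed)
lossless_identical = restored == pcm

# Crude lossy step (not how MP3 works): keep only the top 8 bits of each sample.
reduced = bytes((s >> 8) & 0xFF for s in samples)  # half the original size
approx = [(b - 256 if b > 127 else b) << 8 for b in reduced]
max_error = max(abs(a - s) for a, s in zip(approx, samples))

print(f"original: {len(pcm)} bytes, lossless: {len(packed)} bytes, lossy: {len(reduced)} bytes")
print(f"lossless round trip identical: {lossless_identical}")
print(f"largest sample error after lossy step: {max_error}")
```

The lossless round trip returns the original audio exactly, whereas the lossy version is guaranteed to be smaller but can only approximate it. For recognition purposes, the detail discarded by heavy lossy compression is detail the software no longer has to work with.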

The final word…

Ultimately, where absolute accuracy matters, perhaps because of the nature of the sector (think legal or healthcare) or because company reputation is at stake, it is essential to check the final document before it is circulated or sent out. However, taking account of the above points should help to ensure that the quality of the speech to text transcription is high and that this final check isn’t too onerous!
