Remember Whisper, which I’ve talked about many times before? It is OpenAI’s AI tool for speech recognition, i.e., converting spoken audio into text, and it works with many languages.

Well, you can now do the same thing even faster, thanks to Distil-Whisper, a distilled version of Whisper that is 6 times faster and uses an AI model that is 49% smaller than its big brother. To top it off, Distil-Whisper stays within 1% word error rate (WER) of Whisper on out-of-distribution test sets, which is pretty impressive.
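If you are wondering what that error figure actually measures: speech recognizers are usually scored with word error rate (WER), the number of word substitutions, insertions, and deletions needed to turn the transcript into the reference text, divided by the number of reference words. Here is a minimal sketch of that metric; the function name `word_error_rate` and the lowercase-and-split normalization are my own simplifications, not the evaluation code used for Distil-Whisper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Simplified normalization (an assumption): lowercase, split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of six reference words gives a WER of 1/6.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Real benchmarks apply heavier text normalization (punctuation, numbers, spelling variants) before counting, so treat this as an illustration of the idea only.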

For long recordings, it relies on a chunked algorithm that splits the audio into segments and transcribes them in batches, which makes it 9 times faster on long audio files than OpenAI’s sequential algorithm. Let’s not mince words: this is a real revolution for those who need to process large volumes of audio data.
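To give an idea of why splitting helps: once a long recording is cut into fixed-size pieces, the pieces no longer depend on each other and can be transcribed together in a batch, whereas a sequential approach has to finish one window before it starts the next. The sketch below only computes the chunk boundaries, with a small overlap so that words cut at a boundary appear in both neighbours. The function name `chunk_bounds` and its parameters are hypothetical; this is not Distil-Whisper's actual implementation.

```python
from __future__ import annotations


def chunk_bounds(n_samples: int, chunk_len: int, overlap: int) -> list[tuple[int, int]]:
    """Return (start, end) sample indices of overlapping chunks covering the audio."""
    if chunk_len <= 0 or not 0 <= overlap < chunk_len:
        raise ValueError("need chunk_len > 0 and 0 <= overlap < chunk_len")

    step = chunk_len - overlap  # how far each new chunk advances
    bounds = []
    start = 0
    while start < n_samples:
        end = min(start + chunk_len, n_samples)
        bounds.append((start, end))
        if end == n_samples:  # the last chunk reached the end of the audio
            break
        start += step
    return bounds


# Toy numbers: 10 samples, chunks of 4 samples, 1 sample of overlap.
print(chunk_bounds(10, 4, 1))  # [(0, 4), (3, 7), (6, 10)]
```

In a real pipeline each chunk would be sent to the model as part of one batch, and the overlapping regions would then be used to stitch the partial transcripts back together.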

[Figure: architecture of the Distil-Whisper model]

For now, Distil-Whisper only handles speech recognition in English, but given how fast this field is moving, other languages will probably be supported soon.

Distil-Whisper is therefore designed to replace Whisper for English speech recognition, with five key advantages: faster inference, better robustness to noise, fewer hallucinations, suitability as the draft model in speculative decoding, and a permissive license for commercial applications. This gem of technology was trained on 22,000 hours of pseudo-labeled audio data spanning 10 different domains and more than 18,000 speakers.
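A word on speculative decoding, since it is the least obvious item in that list. The idea is that a small, fast model drafts a few tokens ahead and the large model only checks them: every drafted token the large model agrees with is kept, and the first disagreement is replaced by the large model's own choice. The output is therefore the same as what the large model would have produced alone. Below is a toy greedy version where the "models" are plain Python functions that return the next token; the names `speculative_decode`, `main`, `draft`, and `k` are hypothetical, and a real system would verify all drafted tokens in a single forward pass of the large model rather than one call per token.

```python
from __future__ import annotations

from typing import Callable

# A toy "model": takes the tokens so far and returns the next token (greedy choice).
NextToken = Callable[[list[int]], int]


def speculative_decode(main: NextToken, draft: NextToken,
                       prompt: list[int], n_new: int, k: int = 4) -> list[int]:
    """Generate n_new tokens; the draft model proposes, the main model verifies."""
    tokens = list(prompt)
    target_len = len(prompt) + n_new
    while len(tokens) < target_len:
        # 1. The small model drafts up to k tokens ahead.
        proposal: list[int] = []
        for _ in range(min(k, target_len - len(tokens))):
            proposal.append(draft(tokens + proposal))

        # 2. The large model checks the draft token by token.
        accepted: list[int] = []
        for drafted in proposal:
            expected = main(tokens + accepted)
            accepted.append(expected)  # always keep the main model's token
            if expected != drafted:    # first disagreement: discard the rest of the draft
                break
        tokens.extend(accepted)
    return tokens


# Toy usage: the draft model agrees with the main model most of the time.
def main_model(context: list[int]) -> int:
    return (sum(context) + 1) % 7


def draft_model(context: list[int]) -> int:
    return main_model(context) if len(context) % 3 else 0


print(speculative_decode(main_model, draft_model, [1, 2], n_new=10))
```

The more often the small model guesses right, the fewer sequential steps the large model has to take, which is where the speed-up comes from.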

All the documentation and usage examples are here.

The future of speech recognition looks promising!

