One hour of audio transcribed in 1 second: NVIDIA open-sources its Parakeet speech recognition model!

NVIDIA's open-source Parakeet TDT 0.6B speech recognition model has set a record: it can transcribe 60 minutes of audio in 1 second, with a word error rate of only 6.05%. The model uses the FastConformer-TDT architecture, can process audio segments of up to 24 minutes in a single pass, and supports punctuation prediction and timestamps. It is released under the CC-BY-4.0 license, has 600M parameters, and permits commercial use, but currently supports English recognition only.
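
To put the headline figures in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration of the numbers quoted above: the function names `speedup_factor` and `segments_needed` are hypothetical, and the idea of splitting a longer recording into segments of at most 24 minutes is an assumption made here, not something the announcement describes.

```python
import math


def speedup_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many seconds of audio are transcribed per second of compute."""
    return audio_seconds / processing_seconds


def segments_needed(audio_minutes: float, max_segment_minutes: float = 24.0) -> int:
    """Return how many segments a recording needs if each segment is capped.

    The 24-minute default is the single-pass limit quoted in the article;
    chunking longer audio this way is an assumption for illustration.
    """
    return math.ceil(audio_minutes / max_segment_minutes)


if __name__ == "__main__":
    # 60 minutes of audio in 1 second, as stated in the article.
    factor = speedup_factor(audio_seconds=60 * 60, processing_seconds=1.0)
    print(f"Speed-up over real time: {factor:.0f}x")

    # A 60-minute recording split into segments of at most 24 minutes.
    print(f"Segments needed for 60 minutes: {segments_needed(60)}")
```

Under these assumptions, the quoted throughput corresponds to roughly 3600 times real time, and a one-hour recording would need three passes if it had to be cut to fit the 24-minute limit.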
