Examining the Best Speech-to-Text Method for Audio Files in Podcasting

Siya Naik; Avina Almeida; Shubham Lotliker

Authors

Siya Naik Computer Engineering, Don Bosco College of Engineering, Goa University, Margao, India
Avina Almeida Computer Engineering, Don Bosco College of Engineering, Goa University, Margao, India
Shubham Lotliker Computer Engineering, Don Bosco College of Engineering, Goa University, Margao, India

Keywords:

Podcast, Subtitles, Spectral Gating, Speech-to-Text, Silero, Vosk, Mozilla DeepSpeech, SpeechRecognition, Word Error Rate, Accuracy

Abstract

Podcasting is an effective way to share insights and opinions on any topic with an audience. In-person podcasting requires both parties to be physically present at the same location, which became a major problem during the pandemic. As a result, podcasts are now often recorded on online platforms. The drawbacks of online recording are noise in the audio files and miscommunication between participants. The Spectral Gating method is used to remove the noise. This paper then compares speech-to-text approaches for converting the audio to text using several pretrained models. We ran an experiment on a set of audio files, and the SpeechRecognition pretrained model achieved the best accuracy.
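The keywords name Word Error Rate and accuracy as the comparison metrics. The following is a minimal sketch of one common way to score a model's transcript against a reference transcript. It assumes word-level Levenshtein alignment after lowercasing and whitespace splitting, and it assumes accuracy is reported as 1 - WER. The abstract states neither convention. The function names, model labels, and sample sentences are hypothetical.

```python
"""Word error rate (WER) between a reference transcript and an ASR hypothesis.

Minimal sketch: lowercases and splits on whitespace. Punctuation handling and
the accuracy = 1 - WER convention are assumptions, not taken from the paper.
"""


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (Levenshtein dynamic programming over words).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def accuracy(reference: str, hypothesis: str) -> float:
    """Hypothetical accuracy convention: 1 - WER (negative if insertions dominate)."""
    return 1.0 - word_error_rate(reference, hypothesis)


if __name__ == "__main__":
    # Hypothetical transcripts for illustration, not data from the paper.
    reference = "welcome to the podcast episode"
    outputs = {
        "model_a": "welcome to the podcast episode",
        "model_b": "welcome to podcast episodes",
    }
    for name, hyp in outputs.items():
        print(f"{name}: WER={word_error_rate(reference, hyp):.2f}")
```

Running the script prints a WER of 0.00 for model_a and 0.40 for model_b. The second result comes from one deletion ("the") plus one substitution ("episode" to "episodes") over five reference words. These figures only illustrate the metric. They are not results from the paper.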

