Dr Anton Ragni

BEng, MEng, PhD

School of Computer Science

Senior Lecturer in Speech and Language Technologies

Assessments Lead

Member of the Speech and Hearing (SpandH) research group

a.ragni@sheffield.ac.uk

Regent Court (CS)

Full contact details

Dr Anton Ragni
School of Computer Science
Regent Court (CS)
211 Portobello
Sheffield
S1 4DP

Profile

Dr Anton Ragni is a Senior Lecturer in Speech and Language Processing in the School of Computer Science at the University of Sheffield.

He graduated with BEng and MEng degrees in Information Technology from the University of Tartu, Estonia, in 2005 and 2007, respectively. He was awarded his PhD by the University of Cambridge in 2013.

From 2005 to 2008, he undertook graduate training at the Nordic Graduate School of Language Technology, and from 2007 to 2008 he was an intern in the Speech Technology Group at Toshiba Research Europe Ltd, UK. He was then a Research Associate (2013 to 2018) and a Senior Research Associate (2018 to 2019) in Speech Processing at the University of Cambridge.

His current research focuses on machine learning approaches to speech and language processing.

Research interests

Dr Anton Ragni's research interests include:

Core automatic speech recognition
Efficient and expressive speech synthesis
Spoken language translation
Information retrieval
Conversation modelling

Publications

Books

Young S, Evermann G, Gales M, Hain T, Kershaw D, Liu X, Moore G, Odell J, Ollason D, Povey D, Ragni A et al () The HTK Book (for HTK Version 3.5, documentation alpha version). Cambridge University Engineering Department.

Journal articles

Flynn R & Ragni A (2026) Beyond the Utterance: An Empirical Study of Very Long Context Speech Recognition. IEEE Transactions on Audio, Speech and Language Processing, 1-11.
Zhao M & Ragni A (2026) Decoding Order Matters in Autoregressive Speech Synthesis. CoRR, abs/2601.08450.
Mogridge R & Ragni A (2026) Minerva 2 for speech and language tasks. Computer Speech & Language, 95.
Tan X, Zhao M, Cross M & Ragni A (2025) Discrete-time diffusion-like models for speech synthesis. CoRR, abs/2509.18470.
Sun W, Tu Z & Ragni A (2024) Energy-based models for speech synthesis. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 12667-12671.
Ma Y, Øland A, Ragni A, Sette BMD, Saitis C, Donahue C, Lin C, Plachouras C, Benetos E, Quinton E, Shatri E et al (2024) Foundation Models for Music: A Survey. CoRR, abs/2408.14340.
Cross M & Ragni A (2024) What happens to diffusion model likelihood when your model is conditional? Proceedings of Machine Learning Research, 255, 1-14.
Ragni A, Gales MJF, Rose O, Knill KM, Kastanos A, Li Q & Ness PM (2022) Increasing Context for Estimating Confidence Scores in Automatic Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30, 1319-1329.
Wang Z & Ragni A (2021) Approximate Fixed-Points in Recurrent Neural Networks.
Jacobsen SA & Ragni A (2021) Continuous representations of intents for dialogue systems.
Chen X, Liu X, Wang Y, Ragni A, Wong JHM & Gales MJF (2019) Exploiting future word contexts in neural network language models for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(9), 1444-1454.
Wu C, Gales MJF, Ragni A, Karanasou P & Sim KC (2018) Improving interpretability and regularization in deep learning. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(2), 256-265.
Ragni A, Li Q, Gales MJF & Wang Y (2018) Confidence Estimation and Deletion Prediction Using Bidirectional Recurrent Neural Networks. CoRR, abs/1810.13025.
Shi-Xiong Zhang , Ragni A & Gales MJF (2010) Structured Log Linear Models for Noise Robust Speech Recognition. IEEE Signal Processing Letters, 17(11), 945-948.
Flynn R & Ragni A () Self-Train Before You Transcribe. Interspeech 2024, 2840-2844.

Book chapters

Ma Y, Yuan R, Li Y, Zhang G, Chen X, Yin H, Lin C, Benetos E, Ragni A, Gyenge N, Liu R et al (2023) On the Effectiveness of Speech Self-Supervised Learning for Music, Proceedings of the International Society for Music Information Retrieval Conference (pp. 457-465).
Nair S, Ragni A, Klejch O, Galuščáková P & Oard D (2020) Experiments with Cross-Language Speech Retrieval for Lower-Resource Languages, Lecture Notes in Computer Science (pp. 145-157). Springer International Publishing

Conference proceedings

Cassini SR, Hain T & Ragni A (2025) Emphasis Sensitivity in Speech Representations. 2025 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (pp 1-8), 6 December 2025 - 10 December 2025.
Que S & Ragni A (2025) VisualSpeech: Enhancing Prosody Modeling in TTS Using Video. Proceedings of Interspeech 2025 (pp 3778-3782). Rotterdam, The Netherlands, 17 August 2025 - 17 August 2025.
Sun W & Ragni A (2025) Score-Based Training for Energy-Based TTS Models. Interspeech 2025 (pp 5528-5532)
Cross M & Ragni A (2025) Flowing Straighter with Conditional Flow Matching for Accurate Speech Enhancement. Proceedings of Machine Learning Research, Vol. 277
Leung W-Z, Cross M, Ragni A & Goetze S (2024) Training data augmentation for dysarthric automatic speech recognition by text-to-dysarthric-speech synthesis. Proceedings of Interspeech 2024 (pp 2494-2498). Kos island, Greece, 1 September 2024 - 1 September 2024.
Mogridge R, Close G, Sutherland R, Hain T, Barker J, Goetze S & Ragni A (2024) Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users Using Intermediate ASR Features and Human Memory Models. ICASSP (pp 306-310)
Li Y, Yuan R, Zhang G, Ma Y, Chen X, Yin H, Xiao C, Lin C, Ragni A, Benetos E, Gyenge N et al (2024) MERT: Acoustic Music Understanding Model with Large-Scale Self-Supervised Training. 12th International Conference on Learning Representations, ICLR 2024
Sun W, Tu Z & Ragni A (2024) Energy-Based Models for Speech Synthesis. ICASSP (pp 12667-12671)
Flynn R & Ragni A (2024) Self-Train Before You Transcribe. INTERSPEECH
Yuan R, Ma Y, Li Y, Zhang G, Chen X, Yin H, Zhuo L, Liu Y, Huang J, Tian Z, Deng B et al (2023) MARBLE: Music Audio Representation Benchmark for Universal Evaluation. Advances in Neural Information Processing Systems (NeurIPS 2023), Vol. 36. New Orleans, USA
Ma Y, Yuan R, Li Y, Zhang G, Lin C, Chen X, Ragni A, Yin H, Benetos E, Gyenge N, Liu R et al (2023) On the effectiveness of speech self-supervised learning for music. ISMIR 2023: 24th International Society for Music Information Retrieval Conference proceedings (pp 457-465). Milan, Italy, 5 November 2023 - 5 November 2023.
Nomo Sudro P, Ragni A & Hain T (2023) Adapting pretrained models for adult to child voice conversion. 2023 31st European Signal Processing Conference (EUSIPCO) Proceedings (pp 271-275). Helsinki, Finland, 4 September 2023 - 4 September 2023.
Flynn R & Ragni A (2023) Leveraging cross-utterance context for ASR decoding. Proceedings of Interspeech 2023 (pp 1359-1363). Dublin, Ireland, 20 August 2023 - 20 August 2023.
Nicholls D, Knill K, Gales MJF, Ragni A & Ricketts P (2023) Speak & improve: L2 English speaking practice tool. Proceedings of Interspeech 2023 (pp 3669-3670). Dublin, Ireland, 20 August 2023 - 20 August 2023.
Mogridge R, Close G, Sutherland R, Goetze S & Ragni A (2023) Pre-Trained Intermediate ASR Features and Human Memory Simulation for Non-Intrusive Speech Intelligibility Prediction in the Clarity Prediction Challenge 2. The 4th Clarity Workshop on Machine Learning Challenges for Hearing Aids (Clarity-2023). https://claritychallenge.org/clarity2023-workshop/results.html, 19 August 2023 - 19 August 2023.
Li Y, Yuan R, Zhang G, Ma Y, Lin C, Chen X, Ragni A, Yin H, Hu Z, He H, Benetos E et al (2022) LV-49: MAP-Music2Vec: A Simple and Effective Baseline for Self-Supervised Music Audio Representation Learning. 23rd International Society for Music Information Retrieval Conference (ISMIR 2022). Bengaluru, India, 4 December 2022 - 4 December 2022.
Li Y, Zhang G, Yang B, Lin C, Ragni A, Wang S & Fu J (2022) HERB: Measuring hierarchical regional bias in pre-trained language models. Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022 (pp 334-346). Online, 20 November 2022 - 20 November 2022.
Kastanos A, Ragni A & Gales MJF (2020) Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp 6329-6333), 4 May 2020 - 8 May 2020.
Li Q, Ness PM, Ragni A & Gales MJF (2019) Bi-directional lattice recurrent neural networks for confidence estimation. ICASSP 2019 (pp 6755-6759). Brighton, UK, 12 May 2019 - 12 May 2019.
Ragni A, Li Q, Gales MJF & Wang Y (2019) Confidence estimation and deletion prediction using bidirectional recurrent neural networks. 2018 IEEE Spoken Language Technology Workshop (SLT) (pp 204-211). Athens, Greece, 18 December 2018 - 18 December 2018.
Oard DW, Carpuat M, Galuščáková P, Barrow J, Nair S, Niu X, Shing H-C, Xu W, Zotkina E, McKeown KR, Muresan S et al (2019) Surprise Languages: Rapid-Response Cross-Language IR. EVIA@NTCIR
Wang Y, Wong JHM, Gales MJF, Knill KM & Ragni A (2018) Sequence teacher-student training of acoustic models for automatic free speaking language assessment. 2018 IEEE Spoken Language Technology Workshop (SLT). Athens, Greece, 18 December 2018 - 21 December 2018.
Wang Y, Chen X, Gales MJF, Ragni A & Wong JHM (2018) Phonetic and graphemic systems for multi-genre broadcast transcription. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Calgary, AB, Canada, 15 April 2018 - 20 April 2018.
Chen O, Ragni A, Gales M & Chen X (2018) Active memory networks for language modeling. Proceedings of Interspeech 2018 (pp 3338-3342). Hyderabad, India, 2 September 2018 - 6 September 2018.
Knill K, Gales M, Kyriakopoulos K, Malinin A, Ragni A, Wang Y & Caines A (2018) Impact of ASR performance on free speaking language assessment. Interspeech 2018 (pp 1641-1645). Hyderabad, India, 2 September 2018 - 6 September 2018.
Ragni A & Gales M (2018) Automatic speech recognition system development in the "wild". Interspeech 2018 (pp 2217-2221). Hyderabad, India, 2 September 2018 - 6 September 2018.
Chen X, Liu X, Ragni A, Wang Y & Gales MJF (2017) Future word contexts in neural network language models. 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). Okinawa, Japan, 16 December 2017 - 20 December 2017.
Malinin A, Knill K, Ragni A, Wang Y & Gales MJF (2017) An attention based model for off-topic spontaneous spoken response detection: an initial study. 7th ISCA Workshop on Speech and Language Technology in Education (SLaTE) (pp 144-149). Stockholm, Sweden, 25 August 2017 - 25 August 2017.
Chen X, Ragni A, Liu X & Gales MJF (2017) Investigating bidirectional recurrent neural network language models for speech recognition. Proceedings of Interspeech 2017 (pp 269-273). Stockholm, Sweden, 20 August 2017 - 24 August 2017.
Knill KM, Gales MJF, Kyriakopoulos K, Ragni A & Wang Y (2017) Use of graphemic lexicons for spoken language assessment. Proceedings of Interspeech 2017 (pp 2774-2778). Stockholm, Sweden, 20 August 2017 - 24 August 2017.
Gales MJF, Knill KM & Ragni A (2017) Low-resource speech recognition and keyword-spotting. Speech and Computer : 19th International Conference, SPECOM 2017 (pp 3-19). Hatfield, UK, 12 September 2017 - 16 September 2017.
Malinin A, Ragni A, Knill K & Gales M (2017) Incorporating uncertainty into deep learning for spoken language assessment. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2 : Short Papers). Vancouver, Canada, 30 July 2017 - 4 August 2017.
Ragni A, Wu C, Gales MJF, Vasilakes J & Knill KM (2017) Stimulated training for automatic speech recognition and keyword search in limited resource conditions. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp 4830-4834). New Orleans, LA, USA, 5 March 2017 - 9 March 2017.
Ragni A, Saunders D, Zahemszky P, Vasilakes J, Gales MJF & Knill KM (2017) Morph-to-word transduction for accurate and efficient automatic speech recognition and keyword search. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp 5770-5774). New Orleans, LA, USA, 5 March 2017 - 9 March 2017.
Chen X, Ragni A, Vasilakes J, Liu X, Knill K & Gales MJF (2017) Recurrent neural network language models for keyword search. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp 5775-5779). New Orleans, LA, USA, 5 March 2017 - 9 March 2017.
Ragni A, Dakin E, Chen X, Gales MJF & Knill KM (2016) Multi-language neural network language models. Interspeech 2016. San Francisco, CA, USA, 8 September 2016 - 12 September 2016.
Yang J, Ragni A, Gales MJF & Knill KM (2016) Log-linear system combination using structured support vector machines. Interspeech 2016. San Francisco, CA, USA, 8 September 2016 - 12 September 2016.
Yang J, Zhang C, Ragni A, Gales MJF & Woodland PC (2016) System combination with log-linear models. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Shanghai, China, 20 March 2016 - 25 March 2016.
van Dalen RC, Yang J, Wang H, Ragni A, Zhang C & Gales MJF (2016) Structured discriminative models using deep neural-network features. 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) (pp 160-166). Scottsdale, AZ, USA, 13 December 2015 - 13 December 2015.
Cui J, Kingsbury B, Ramabhadran B, Sethy A, Audhkhasi K, Cui X, Kislal E, Mangu L, Nussbaum-Thom M, Picheny M, Tüske Z et al (2016) Multilingual representations for low resource speech recognition and keyword search. 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) (pp 259-266). Scottsdale, AZ, USA, 13 December 2015 - 13 December 2015.
Mendels G, Cooper E, Soto V, Hirschberg J, Gales MJF, Knill KM, Ragni A & Wang H (2015) Improving speech recognition and keyword search for low resource languages using web data. INTERSPEECH 2015 : 16th Annual Conference of the International Speech Communication Association (pp 829-833). Dresden, Germany, 6 September 2015 - 10 September 2015.
Wang H, Ragni A, Gales MJF, Knill KM, Woodland PC & Zhang C (2015) Joint decoding of tandem and hybrid systems for improved keyword spotting on low resource languages. INTERSPEECH 2015 : 16th Annual Conference of the International Speech Communication Association (pp 3660-3664). Dresden, Germany, 6 September 2015 - 10 September 2015.
Gales MJF, Knill KM & Ragni A (2015) Unicode-based graphemic systems for limited resource languages. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp 5186-5190). Brisbane, QLD, Australia, 19 April 2015 - 24 April 2015.
Ragni A, Gales MJF & Knill KM (2015) A language space representation for speech recognition. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp 4634-4638). Brisbane, QLD, Australia, 19 April 2015 - 24 April 2015.
Knill KM, Gales MJF, Ragni A & Rath SP (2014) Language independent and unsupervised acoustic models for speech recognition and keyword spotting. INTERSPEECH 2014 : 15th Annual Conference of the International Speech Communication Association. Singapore, 14 September 2014 - 18 September 2014.
Rath SP, Knill KM, Ragni A & Gales MJF (2014) Combining tandem and hybrid systems for improved speech recognition and keyword spotting on low resource languages. INTERSPEECH 2014 : 15th Annual Conference of the International Speech Communication Association (pp 835-839). Singapore, 14 September 2014 - 18 September 2014.
Ragni A, Knill KM, Rath SP & Gales MJF (2014) Data augmentation for low resource languages. INTERSPEECH 2014 : 15th Annual Conference of the International Speech Communication Association (pp 810-814). Singapore, 14 September 2014 - 18 September 2014.
Yoshioka T, Ragni A & Gales MJF (2014) Investigation of unsupervised adaptation of DNN acoustic models with filter bank input. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp 6344-6348). Florence, Italy, 4 May 2014 - 9 May 2014.
Gales MJF, Knill KM, Ragni A & Rath SP (2014) Speech recognition and keyword spotting for low-resource languages: Babel project research at CUED. Fourth International Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU-2014) (pp 16-23). St. Petersburg, Russia, 14 May 2014 - 14 May 2014.
van Dalen RC, Ragni A & Gales MJF (2013) Efficient decoding with generative score-spaces using the expectation semiring. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp 7619-7623). Vancouver, BC, Canada, 26 May 2013 - 26 May 2013.
Gales MJF, Ragni A, Zhang A & van Dalen RC (2012) Structured discriminative models for speech recognition. Symposium on Machine Learning in Speech and Language Processing (MLSLP). Portland, Oregon, USA, 14 September 2012 - 14 September 2012.
Roupakia Z, Ragni A & Gales MJF (2012) Rapid nonlinear speaker adaptation for large-vocabulary continuous speech recognition. INTERSPEECH 2012 : 13th Annual Conference of the International Speech Communication Association (pp 1784-1787). Portland, OR, USA, 9 September 2012 - 9 September 2012.
Ragni A & Gales MJF (2012) Inference algorithms for generative score-spaces. 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp 4149-4152). Kyoto, Japan, 25 March 2012 - 25 March 2012.
Ragni A & Gales MJF (2012) Derivative kernels for noise robust ASR. IEEE Workshop on Automatic Speech Recognition & Understanding (pp 119-124). Waikoloa, HI, USA, 11 December 2011 - 11 December 2011.
Ragni A & Gales MJF (2011) Structured discriminative models for noise robust continuous speech recognition. 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp 4788-4791). Prague, Czech Republic, 22 May 2011 - 22 May 2011.
Gales MJF, Ragni A, AlDamarki H & Gautier C (2010) Support vector machines for noise robust ASR. 2009 IEEE Workshop on Automatic Speech Recognition & Understanding (pp 205-210). Merano, Italy, 13 December 2009 - 13 December 2009.
Ragni A (2007) Initial experiments with Estonian speech recognition. Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007) (pp 249-252). Tartu, Estonia, 25 May 2007 - 25 May 2007.
Flynn R & Ragni A () How Much Context Does My Attention-Based ASR System Need?. Interspeech 2024 (pp 217-221)
Mogridge R & Ragni A () Learning from memory-based models. Interspeech 2024 (pp 2360-2364)

Preprints

Flynn R & Ragni A (2026) Beyond the Utterance: An Empirical Study of Very Long Context Speech Recognition, arXiv.
Zhao M & Ragni A (2026) Decoding Order Matters in Autoregressive Speech Synthesis, arXiv.
Sudro PN, Ragni A & Hain T (2025) A comparative study of generative models for child voice conversion, arXiv.
Tan X, Zhao M & Ragni A (2025) Discrete-Time Diffusion-Like Models for Speech Synthesis, arXiv.
Bartley C & Ragni A (2025) How I Built ASR for Endangered Languages with a Spoken Dictionary.
Que S & Ragni A (2025) VisualSpeech: Enhancing Prosody Modeling in TTS Using Video, arXiv.
Cross M & Ragni A (2024) What happens to diffusion model likelihood when your model is conditional?, arXiv.
Ma Y, Øland A, Ragni A, Del Sette BM, Saitis C, Donahue C, Lin C, Plachouras C, Benetos E, Shatri E, Morreale F et al (2024) Foundation Models for Music: A Survey, arXiv.
Flynn R & Ragni A (2024) Self-Train Before You Transcribe, arXiv.
Leung W-Z, Cross M, Ragni A & Goetze S (2024) Training Data Augmentation for Dysarthric Automatic Speech Recognition by Text-to-Dysarthric-Speech Synthesis, arXiv.
Mogridge R, Close G, Sutherland R, Hain T, Barker J, Goetze S & Ragni A (2024) Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users using Intermediate ASR Features and Human Memory Models, arXiv.
Sun W, Tu Z & Ragni A (2023) Energy-Based Models For Speech Synthesis, arXiv.
Ma Y, Yuan R, Li Y, Zhang G, Chen X, Yin H, Lin C, Benetos E, Ragni A, Gyenge N, Liu R et al (2023) On the Effectiveness of Speech Self-supervised Learning for Music, arXiv.
Flynn R & Ragni A (2023) Leveraging Cross-Utterance Context For ASR Decoding, arXiv.
Yuan R, Ma Y, Li Y, Zhang G, Chen X, Yin H, Zhuo L, Liu Y, Huang J, Tian Z, Deng B et al (2023) MARBLE: Music Audio Representation Benchmark for Universal Evaluation, arXiv.
Li Y, Zhang G, Yang B, Lin C, Wang S, Ragni A & Fu J (2022) HERB: Measuring Hierarchical Regional Bias in Pre-trained Language Models, arXiv.
Wang Z & Ragni A (2021) Approximate Fixed-Points in Recurrent Neural Networks, arXiv.
Jacobsen SA & Ragni A (2021) Continuous representations of intents for dialogue systems, arXiv.
Kastanos A, Ragni A & Gales M (2019) Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks, arXiv.
Li Q, Ness P, Ragni A & Gales M (2018) Bi-Directional Lattice Recurrent Neural Networks for Confidence Estimation, arXiv.
Ragni A, Li Q, Gales M & Wang Y (2018) Confidence Estimation and Deletion Prediction Using Bidirectional Recurrent Neural Networks, arXiv.
Wang Y, Chen X, Gales M, Ragni A & Wong J (2018) Phonetic and Graphemic Systems for Multi-Genre Broadcast Transcription, arXiv.
Chen X, Liu X, Ragni A, Wang Y & Gales M (2017) Future Word Contexts in Neural Network Language Models, arXiv.

Grants

Interpretable acoustic-articulatory relations in speech production, Royal Society, 10/2025 - 09/2027, as PI
Exemplar-based Expressive Speech Synthesis, EPSRC, 06/2021 - 11/2023, £218,290, as PI
Automatic voice conversion for transforming professional adult voice actors to artificial child voice actors, Innovate UK, 01/2021 - 01/2023, £173,605, as Co-PI

Professional activities and memberships

He is a member of IEEE and ISCA, and a regular reviewer for major speech and machine learning journals and conferences.

Since 2016, he has been an Officer of the ISCA Special Interest Group on Machine Learning in Speech and Language Processing. He received the Best Student Paper Award at the 2011 IEEE Workshop on Automatic Speech Recognition and Understanding for the paper “Derivative kernels for noise robust ASR”, co-authored with M. J. F. Gales.
