Kavya Manohar

kavyamanohar.com • github.com/kavyamanohar • ORCID • Hugging Face


Speech and Language Technology • Technical Education • Responsible AI


Employment Details

Machine Learning Researcher

March 2025 - Present : Adalat AI.

Computational Linguist

2023 - 2025 : Kerala University of Digital Sciences, Innovation and Technology.

Assistant Professor in Electronics and Communication Engineering.

2017 - 2019 : Government Engineering College, Palakkad.

2013 - 2017 : Aryanet Institute of Technology, Palakkad.

2012 - 2013 : Lourdes Matha College of Science and Technology, Thiruvananthapuram.

Education

PhD

APJ Abdul Kalam Technological University, 2019-2023

Thesis: Linguistic challenges in Malayalam speech recognition: Analysis and solutions.

M.Tech

University of Calicut, 2010-2012

Communication Engineering and Signal Processing from Government Engineering College, Thrissur, with a CGPA of 8.16 and an MHRD Fellowship.
Thesis: A comparative study on vector quantization techniques on the perceptual quality of wideband speech.

B.Tech

University of Kerala, 2006-2010

Electronics and Communication Engineering from Government Engineering College, Thiruvananthapuram with a CGPA of 7.6.

Research and Development

  1. Model Evaluations:

    Revealed critical gaps in OpenAI Whisper’s multilingual speech AI model evaluation routine through systematic analysis of text normalization for Indic languages, demonstrating performance overestimation due to unaccounted linguistic features. Research published at EMNLP 2024 Main Conference.

  2. Subword Tokenization for Improved Speech Recognition Model Performance:

    Developed a novel syllable-byte pair encoding (S-BPE) tokenization method that combines linguistic syllable boundaries with data-driven byte pair encoding for Malayalam speech recognition. Achieved a 16.8% word error rate (WER) reduction over the word-level baseline, and showed that the method scales well with additional text-only training data while remaining computationally efficient during inference. Published in the EURASIP Journal on Audio, Speech, and Music Processing, 2023.

  3. Enhancement to Kaldi ASR toolkit:

    Enhanced the Kaldi ASR toolkit's flexibility by implementing custom subword boundary support; the contribution was merged into the main codebase.

  4. Grapheme to Phoneme (G2P) System Development

    Created and published Mlphon, an open-source Malayalam G2P conversion library with a web interface, enabling automated pronunciation generation. It is now used in the development of automatic speech recognition and speech synthesis systems. The work is published in IEEE Access.

    Extended the language coverage of the Epitran multilingual G2P library by implementing Malayalam support.

  5. Dataset Creation and Curation

    Created the first comprehensive Malayalam pronunciation dictionary, published on Hugging Face.

    Curated and published a crowdsourced Malayalam speech corpus in collaboration with the FOSS language technology collective SMC, enabling broader ASR research for a language with 35M+ speakers.

  6. Pioneered speech recognition development for endangered languages (Malasar, Poumai Naga) by fine-tuning pretrained transformer models in close collaboration with language community members, enabling the first speech recognition capabilities for these languages. Published at ICON 2023.

  7. Boosted ASR accuracy by integrating language models with transformer acoustic models, demonstrating significant WER reductions. Models and demo published on Hugging Face. Detailed technical report and code published on personal blog.

  8. Developed and deployed a Malayalam ASR system with a web-based demo featuring client-side inference, as part of PhD research.

  9. Linguistic Typology Studies

    Published first quantitative analysis of Malayalam’s morphological complexity in comparison with Indian and European languages (TSD 2020), establishing empirical complexity metrics for cross-linguistic comparison.

  10. Digital Typography and Font Engineering

    Engineered OpenType glyph formation rules for three major Malayalam fonts (Manjari, Chilanka, Gayathri), enabling correct rendering of the complex Malayalam script on digital platforms. These fonts are now widely used across Kerala's digital ecosystem.

Publications

Book

  1. Machine Translation: Best Practices using Deep Learning and Generative AI, Ed. Elizabeth Sherly, Leena G Pillai, Kavya Manohar, John Mccarey, CRC Press, 2025 (Under Processing)

Book Chapter

  1. ASR Models from Conventional Statistical Models to Transformers and Transfer Learning, Elizabeth Sherly, Leena G Pillai, Kavya Manohar in Automatic Speech Recognition and Translation for Low Resource Languages, Wiley-Scrivener publishing, 2024.

Journal

  1. Improving speech recognition systems for the morphologically complex Malayalam language using subword tokens for language modeling. Kavya Manohar, A. R. Jayan, and Rajeev Rajan. EURASIP Journal on Audio, Speech, and Music Processing 2023, 47 (2023).

  2. Mlphon: A Multifunctional Grapheme-Phoneme Conversion Tool Using Finite State Transducers. Kavya Manohar, A. R. Jayan, and Rajeev Rajan. IEEE Access 10 (2022).

Conferences

  1. Romanized to Native Malayalam Script Transliteration Using an Encoder-Decoder Framework, Bajiyo Baiju, Kavya Manohar, Leena G. Pillai, and Elizabeth Sherly. COLING IndoNLP Workshop 2025 (Accepted)

  2. What is lost in Normalization? Exploring Pitfalls in Multilingual ASR Model Evaluations, Kavya Manohar, Leena G Pillai. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, Florida. Association for Computational Linguistics.

  3. Enhancing End-to-End Malayalam Automatic Speech Recognition with Language Model Augmentation. Kavya Manohar, Ashish Abraham, Gokul G Menon. Speech and Language Technologies for Low-Resource Languages. SPELLL 2024.

  4. Malayalam to English Named Entity Transliteration using Attention based BiLSTM, Bajiyo Baiju, Kavya Manohar, Leena G Pillai and Elizabeth Sherly. IEEE-RAICS, Recent Advances in Intelligent Computational Systems at Kothamangalam, 16-18 May 2024.

  5. Automatic Speech Recognition System for Malasar Language using Multilingual Transfer Learning, Basil K Raju, Leena G Pillai, Kavya Manohar, Elizabeth Sherly. In Proceedings of the 20th International Conference on Natural Language Processing (ICON 2023)

  6. Automatic Recognition of Continuous Malayalam Speech using Pretrained Multilingual Transformers, Kavya Manohar, Gokul G. Menon, Ashish Abraham, Rajeev Rajan, A. R. Jayan. 2023 International Conference on Intelligent Systems for Communication, IoT and Security (ICISCoIS).

  7. Syllable Subword tokens for Open Vocabulary Speech Recognition in Malayalam, Kavya Manohar, A. R. Jayan, and Rajeev Rajan. Third International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2022), Trento, Italy.

  8. Quantitative analysis of the morphological complexity of Malayalam language. Kavya Manohar, A. R. Jayan, and Rajeev Rajan. International Conference on Text, Speech, and Dialogue. Springer, Cham, Czech Republic (2020).

  9. Malayalam Orthographic Reforms. Impact on Language and Popular Culture. Kavya Manohar and Santhosh Thottingal. Proceedings of Graphemics in the 21st Century, Brest, France (2018).

  10. Spiral splines in typeface design: A case study of Manjari Malayalam typeface. Santhosh Thottingal and Kavya Manohar. Typoday Conference, Mumbai (2018).

  11. Comparative study on vector quantization codebook generation algorithms for wideband speech coding. Kavya Manohar and B. Premanand. 2012 International Conference on Green Technologies (ICGT). IEEE (2012).

Articles

  1. Indian Languages and Text Normalization - Article Published in May 2024.

  2. Natural Languages and Machine Intelligence - A Malayalam article published in Sasthragathi, June 2022.

  3. Phonetic description of Malayalam consonants - A study of the phonetic description of Malayalam consonants based on existing methods and the IPA. Published in January 2020.

  4. Language technology in the age of AI - A Malayalam article published in Janayugam, Sept 2019.

  5. Information, Entropy and Malayalam - What is the information entropy of the Malayalam language, and how is it calculated? Published on July 18, 2019.

  6. u and u: vowel signs of Malayalam - An analysis of various visual forms of Malayalam u-signs, published at Alphabettes.org.

Awards and Fellowships

  1. Diversity and Inclusion Grant from the Association for Computational Linguistics to present at EMNLP 2024.

  2. Best Paper Award for the paper “An Open Framework for the Development of Automatic Speech Recognition in Malayalam”, at Kerala Science Congress 2023 organized by Kerala State Council for Science, Technology and Environment.

  3. Junior Research Fellowship by University Grants Commission, Government of India. 2019-2023 (Ph.D.)

  4. GATE Fellowship by Ministry of Human Resource Development, Government of India, 2010-2012 (M.Tech.)

Recognitions

  1. Member, Indian Language Technologies And Products Sectional Committee LITD 20, Bureau of Indian Standards.

  2. Program Committee Member, SPELLL 2024, Third International Conference on Speech and Language Technologies.

  3. Program Committee Member, ComputEL-8, Eighth Workshop on Computational Methods for Endangered Languages

Professional Services

Reviewer for SPELLL 2024, ComputEL-8, International Conference on Emerging Technologies for Intelligent Systems (ETIS 2025).

Certifications

Courses

Summer School on Automatic Speech Recognition, IIT Guwahati, 2019.

Machine Learning by Stanford University on Coursera, 2017.

A System View of Communications: From Signals to Packets by Hong Kong University of Science and Technology on edX, 2014.

Test scores

UGC NET & Junior Research Fellowship - 2019. GATE 2010, 2012, 2013, 2015, 2019. Best All India rank: 4602/176944.

Invited Talks

  1. Invited talk on ‘Malayalam Language: As perceived by computers (കമ്പ്യൂട്ടർ മനസ്സിലാക്കുന്ന മലയാളഭാഷ)’ at the seminar series (Malayalam: Life and Praxis) hosted by the Tirur Regional Centre of Sree Sankaracharya University of Sanskrit, Kalady on February 18, 2025.
  2. Invited talk on ‘Malayalam in Unicode: Some Linguistic and Cultural Thoughts’ at Thapasam Seminar hosted by Sree Sankaracharya University of Sanskrit, Kalady on October 2, 2024.
  3. ‘Automatic Speech Recognition’ at UGC Stride Faculty Development Programme and International Conference at WMO College, Wayanad on April 28, 2023.
  4. ‘Does your computer understand spoken Malayalam?’ as part of the Language Computing webinar series organized by Kerala Sasthra Sahithya Parishad on October 14, 2022.
  5. ‘Contributing to Free and Open Source Software’ at the init.d workshop organized by the FOSS cell, Model Engineering College Thrikkakkara, Ernakulam on August 7, 2021.
  6. ‘Indic Scripts, Unicode and a Brief Introduction to Natural Language Processing’ as part of the National webinar series organized by the Department of Computer Science at Sree Sankara Vidyapeetom College, Ernakulam on October 20, 2020.
  7. ‘Towards Malayalam automatic speech recognition’, a talk hosted by Tinkerhub foundation, May 2020.
  8. ‘The Digital Representation of Malayalam’ at the Malayalam Computing Workshop organized by the Malayalam Department of the University of Kerala, on December 11, 2019.
  9. Talk on a survey conducted among my students and colleagues on their usage patterns of online learning resources. Wikimania, Hong Kong, August 2013.

Skills

18 March 2025 Kavya Manohar
sakhi.(myfirstname)@gmail.com