Kavya Manohar

kavyamanohar.com | github.com/kavyamanohar | Huggingface


Fair and Inclusive NLP Research. Technical Education. Speech and Language Computing.


Employment Details

2023 -

Computational Linguist, Kerala University of Digital Sciences Innovation and Technology, Thiruvananthapuram, Kerala, India.

7 years of teaching experience as Assistant Professor in Engineering Colleges.

2017 - 2019
Government Engineering College, Palakkad.
2013 - 2017
Aryanet Institute of Technology, Palakkad.
2012 - 2013
Lourdes Matha College of Science and Technology, Thiruvananthapuram.

Education

PhD

APJ Abdul Kalam Technological University, 2019-2023

Thesis: Linguistic challenges in Malayalam speech recognition: Analysis and solutions.

M.Tech

University of Calicut, 2010-2012

Communication Engineering and Signal Processing with a CGPA of 8.16.
Government Engineering College Thrissur with MHRD Fellowship.
Thesis: A comparative study on vector quantization techniques on the perceptual quality of wideband speech.

B.Tech

University of Kerala, 2006-2010

Electronics and Communication Engineering from Government Engineering College, Thiruvananthapuram with a CGPA of 7.6.

Research and Development

ASR Evaluations:
Analysis of text normalization in OpenAI’s Whisper ASR evaluation routine, revealing overestimation of performance and advocating alternative approaches for Indic languages (accepted at EMNLP 2024).
Speech Recognition:

Contributions

Subword Tokenization:
Study on the impact of subword tokenization algorithms on language modeling for automatic speech recognition in morphologically complex languages (EURASIP Journal 2023).
G2P Systems:

Contributions

Linguistic Complexity:
Comparative analysis on the morphological complexity of Malayalam language with respect to other Indian and European languages (TSD 2020).
Malayalam Typefaces:
OpenType engineering of glyph formation rules in the popular open source Malayalam fonts Manjari, Chilanka and Gayathri.

Publications

Book Chapter
ASR Models from Conventional Statistical Models to Transformers and Transfer Learning, Elizabeth Sherly, Leena G Pillai, Kavya Manohar in Automatic Speech Recognition and Translation for Low Resource Languages, Wiley-Scrivener publishing, 2024.
Journal

Improving speech recognition systems for the morphologically complex Malayalam language using subword tokens for language modeling. Kavya Manohar, A. R. Jayan, and Rajeev Rajan. EURASIP Journal on Audio, Speech, and Music Processing 2023, 47 (2023).

Mlphon: A Multifunctional Grapheme-Phoneme Conversion Tool Using Finite State Transducers. Kavya Manohar, A. R. Jayan, and Rajeev Rajan. IEEE Access 10 (2022).

Conferences

What is lost in Normalization? Exploring Pitfalls in Multilingual ASR Model Evaluations, Kavya Manohar, Leena G Pillai. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, Florida. Association for Computational Linguistics (Accepted)

Enhancing End-to-End Malayalam Automatic Speech Recognition with Language Model Augmentation. Kavya Manohar, Ashish Abraham, Gokul G Menon. Speech and Language Technologies for Low-Resource Languages. SPELLL 2024. (Accepted)

Malayalam to English Named Entity Transliteration using Attention based BiLSTM, Bajiyo Baiju, Kavya Manohar, Leena G Pillai and Elizabeth Sherly. IEEE-RAICS, Recent Advances in Intelligent Computational Systems at Kothamangalam, 16-18 May 2024.

Automatic Speech Recognition System for Malasar Language using Multilingual Transfer Learning, Basil K Raju, Leena G Pillai, Kavya Manohar, Elizabeth Sherly. In Proceedings of the 20th International Conference on Natural Language Processing (ICON 2023)

Automatic Recognition of Continuous Malayalam Speech using Pretrained Multilingual Transformers, Kavya Manohar, Gokul G. Menon, Ashish Abraham, Rajeev Rajan, A. R. Jayan. 2023 International Conference on Intelligent Systems for Communication, IoT and Security (ICISCoIS).

Syllable Subword tokens for Open Vocabulary Speech Recognition in Malayalam, Kavya Manohar, A. R. Jayan, and Rajeev Rajan. Third International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2022), Trento, Italy.

Quantitative analysis of the morphological complexity of Malayalam language. Kavya Manohar, A. R. Jayan, and Rajeev Rajan. International Conference on Text, Speech, and Dialogue. Springer, Cham, Czech Republic (2020).

Malayalam Orthographic Reforms. Impact on Language and Popular Culture. Kavya Manohar and Santhosh Thottingal. Proceedings of Graphemics in the 21st Century, Brest, France (2018).

Spiral splines in typeface design: A case study of Manjari Malayalam typeface. Santhosh Thottingal and Kavya Manohar. Typoday Conference, Mumbai (2018).

Comparative study on vector quantization codebook generation algorithms for wideband speech coding. Kavya Manohar and B. Premanand. 2012 International Conference on Green Technologies (ICGT). IEEE (2012).

Articles

Indian Languages and Text Normalization - Article Published in May 2024.

Natural Languages and Machine Intelligence - A Malayalam article published in Sasthragathi, June 2022.

Phonetic description of Malayalam consonants - A study of the phonetic description of Malayalam consonants based on existing methods and IPA. Published in January 2020.

Language technology in the age of AI - A Malayalam article published in Janayugam, Sept 2019.

Information, Entropy and Malayalam - What is the information entropy of the Malayalam language? How to calculate it? Published on July 18, 2019.

u and u: vowel signs of Malayalam - An analysis of various visual forms of Malayalam u-signs, published at Alphabettes.org.

Awards and Fellowships

Best Paper Award for the paper “An Open Framework for the Development of Automatic Speech Recognition in Malayalam”, at Kerala Science Congress 2023 organized by Kerala State Council for Science, Technology and Environment.

Junior Research Fellowship by University Grants Commission, Government of India. 2019-2023 (Ph.D.)

GATE Fellowship by Ministry of Human Resource Development, Government of India, 2010-2012 (M.Tech.)

Recognitions

Member, Indian Language Technologies And Products Sectional Committee LITD 20, Bureau of Indian Standards.

Certifications

Courses

Summer School on Automatic Speech Recognition, IIT Guwahati, 2019.

Machine Learning by Stanford University on Coursera, 2017.

A System View of Communications: From Signals to Packets by Hong Kong University of Science and Technology on edX, 2014.

Test scores
UGC NET & Junior Research Fellowship - 2019.
GATE 2010, 2012, 2013, 2015, 2019. Best All India rank: 4602/176944.

Invited Talks

Skills

11 October 2024 Kavya Manohar
sakhi.kavya@gmail.com