Examining Lexical Diversity with Entropy and Other Mathematical Measures

Authors

  • Ambreen Zehra Rizvi
  • Nazra Zahid Shaikh, Senior Lecturer, Department of English, Faculty of Social Sciences and Humanities, Hamdard University Main Campus, Karachi, Pakistan.

Abstract

Lexical diversity, a primary indicator of language proficiency and complexity, captures the variety of words produced in a given text or speech sample. Conventional measures such as the type-token ratio (TTR) have well-known shortcomings, which has led to more sophisticated mathematical metrics such as entropy, the Shannon diversity index, and advanced statistical models. This article explains how existing entropy-based lexical diversity measures work, covering their theoretical basis and computation, and surveys their applications in linguistics, psycholinguistics, and natural language processing (NLP). Comparative analysis shows that entropy-based measures give a more accurate indication of lexical richness than conventional methods, especially for texts of varying lengths. The article closes with suggestions for further research to enhance the measurement of lexical diversity by combining machine learning with large-scale corpus analysis.
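
As a point of reference for the measures named above, the following Python sketch (not part of the article itself; the sample sentence, function names, and use of base-2 logarithms are illustrative assumptions) shows how the type-token ratio and Shannon entropy can be computed for a small token sample.

import math
from collections import Counter

def type_token_ratio(tokens):
    # TTR: number of distinct word types divided by the total number of tokens
    return len(set(tokens)) / len(tokens)

def shannon_entropy(tokens):
    # Shannon entropy H = -sum(p * log2(p)), where p is the relative frequency of each word type
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

tokens = "the cat sat on the mat and the dog sat on the rug".split()
print(round(type_token_ratio(tokens), 3))  # 0.615 (8 types / 13 tokens)
print(round(shannon_entropy(tokens), 3))   # 2.777 bits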

Keywords: Lexical diversity, entropy, Shannon index, type-token ratio, computational linguistics.

Published

2025-05-27

How to Cite

Ambreen Zehra Rizvi, & Nazra Zahid Shaikh. (2025). Examining Lexical Diversity with Entropy and Other Mathematical Measures. Dialogue Social Science Review (DSSR), 3(5), 760–773. Retrieved from https://dialoguessr.com/index.php/2/article/view/662

Issue

Section

Articles