Examining Lexical Diversity with Entropy and Other Mathematical Measures
Abstract
Lexical diversity, a primary indicator of language proficiency and textual complexity, quantifies the variety of words produced in a given text or speech sample. Conventional measures such as the type-token ratio (TTR) are sensitive to text length, which has motivated more sophisticated mathematical metrics such as entropy, the Shannon diversity index, and advanced statistical models. This article examines existing entropy-based lexical diversity measures, covering their theoretical basis, their computation, and their applications in linguistics, psycholinguistics, and natural language processing (NLP). Comparative analysis shows that entropy-based measures give a more accurate indication of lexical richness than conventional methods, especially for texts of varying lengths. The article closes with suggestions for further research to improve the measurement of lexical diversity by combining machine learning with large-scale corpus analysis.
Keywords: Lexical diversity, entropy, Shannon index, type-token ratio, computational linguistics.
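As a minimal sketch of the measures named above, the following Python snippet computes both the type-token ratio and Shannon entropy for a tokenized sample; the function names and toy sentence are illustrative assumptions, not drawn from the article itself.

import math
from collections import Counter

def type_token_ratio(tokens):
    # Distinct word types divided by total tokens; known to
    # decrease as text length grows, which motivates the
    # entropy-based alternatives discussed in this article.
    return len(set(tokens)) / len(tokens)

def shannon_entropy(tokens):
    # Shannon entropy H = -sum(p_i * log2(p_i)) over the word
    # frequency distribution; higher values indicate a more even
    # spread of probability mass across word types.
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical toy sample, for illustration only.
sample = "the cat sat on the mat and the dog sat on the rug".split()
print(f"TTR:     {type_token_ratio(sample):.3f}")
print(f"Entropy: {shannon_entropy(sample):.3f} bits")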