TY - JOUR
T1 - Applying lexical sophistication models to wordlist development
T2 - A proof-of-concept study
AU - Nicklin, Christopher
AU - Bailey, Daniel
AU - McLean, Stuart
AU - Kim, Young Ae
AU - Kang, Hyeonah
AU - Vitta, Joseph P.
N1 - Publisher Copyright: © 2024 The Author(s)
PY - 2025/4
Y1 - 2025/4
AB - Language teaching stakeholders generally rely on frequency-derived wordlists to determine words for pedagogical purposes. However, words that are instinctively easier for many learners, such as “pizza”, occur less frequently in reference corpora than words that might be considered more difficult, such as “physics”. Furthermore, research demonstrates that modeling frequency alongside other lexical sophistication variables predicts word difficulty better than frequency alone. This study constitutes a proof-of-concept; the concept being that a lexical sophistication-based approach to wordlist construction can produce lists that outperform frequency as word difficulty predictors. The method resulted in lexical sophistication-derived difficulty scores for 14,054 of the 20,000 most frequent Corpus of Contemporary American English lemmas. When compared with other commonly used wordlists, these scores successfully addressed the “pizza/physics” problem in that “pizza” was ranked easier than “physics”, and they also displayed larger correlations with word difficulty than other lists across two linguistic domains. More importantly, the scores also performed comparably to a knowledge-based vocabulary list, but contained almost three times as many lemmas for a fraction of the time and financial costs. We envisage that the present study's methodology can be used by researchers and language teaching stakeholders to create bespoke wordlists for a range of contexts.
KW - Frequency
KW - Lexical sophistication
KW - Vocabulary
KW - Word difficulty
KW - Wordlists
UR - http://www.scopus.com/inward/record.url?scp=85212124668&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85212124668&partnerID=8YFLogxK
U2 - 10.1016/j.rmal.2024.100175
DO - 10.1016/j.rmal.2024.100175
M3 - Article
AN - SCOPUS:85212124668
SN - 2772-7661
VL - 4
JO - Research Methods in Applied Linguistics
JF - Research Methods in Applied Linguistics
IS - 1
M1 - 100175
ER -