Some resources to learn corpus-based discourse analysis

One of my students asked me for some online references to learn more about corpus-based/assisted discourse analysis. Here are five online talks.

Obesity in the News: Combining Corpus and Critical Perspectives. Online talk by Gavin Brookes at Universidad de Murcia.

Corpus linguistics and the discursive construction of migrants. Online talk by Charlotte Taylor at Universidad de Murcia.

CorpusCast with Dr Robbie Love: Professor Paul Baker on social justice.

Corpus-based discourse analysis. Online talk by Tony McEnery. LAEL webinar.

Corpus linguistics and the analysis of language ideology. Online talk by Rachelle Vessey at Universidad de Murcia.

New research on data-driven language learning (March 2023)

Allan, R. (2023). Reserved for Research? Normalising Corpus Use for School Teachers. Nordic Journal of English Studies, 22(1).

Allan, R., Walker, T., & Langum, V. (2023). Data-driven learning: Tools, approaches, and next steps. Nordic Journal of English Studies, 22(1), 1-12.

Muftah, M. (2023). Data-driven learning (DDL) activities: do they truly promote EFL students’ writing skills development? Education and Information Technologies, 1-27.

O’Keeffe, A. (2023). A Theoretical Rationale for the Importance of Patterning in Language Acquisition and the Implications for Data-driven Learning. Nordic Journal of English Studies, 22(1), 16-41.

Şahin Kızıl, A. Data‐driven learning: English as a foreign language writing and complexity, accuracy and fluency measures. Journal of Computer Assisted Learning.

New publication: An investigation of Chinese EFL learners’ acceptance of mobile dictionaries in English language learning

Zhang, D., Hennessy, S., & Pérez-Paredes, P. (2023). An investigation of Chinese EFL learners’ acceptance of mobile dictionaries in English language learning. Computer Assisted Language Learning. DOI: 10.1080/09588221.2023.2189915


Although many studies have explored the role of dictionaries in English language learning, few have investigated mobile dictionaries (MDs) from learners’ perspectives. This study aimed to explore Chinese EFL learners’ acceptance of three types of MDs: monolingual, bilingualised and bilingual. A total of 125 participants used mobile dictionaries in various English learning contexts, especially in reading comprehension and vocabulary learning. Adapted from the Technology Acceptance Model and the mobile technology evaluation framework, the questionnaire in this study addressed three key themes: (1) perceived ease of use, (2) perceived usefulness, and (3) behavioural intention to use.

Analysis shows that the bilingualised MD group reported the most positive perceptions, especially compared to the bilingual MD group. A total of 101 participants participated in semi-structured group interviews to further explore the reasons underlying their perceptions. Several factors impacting learner acceptance, from the micro to the macro level, are proposed and discussed. As an interdisciplinary study, this research fills theoretical and empirical gaps in investigating mobile-assisted language learning. It offers application designers and language teachers insights into learners’ acceptance of MDs. Moreover, it provides recommendations concerning making MDs more personalised, attractive and effective.

Exploring Part of Speech (POS)-tag sequences in a large-scale learner corpus of L2 English: A developmental perspective


This research explores the POS-tag sequences that shape the transition from upper intermediate (B2 CEFR) to near-native proficiency (C2 CEFR) in a corpus of essays (n=32,410) from the Cambridge Learner Corpus. Gilquin (2018) and others have shown that POS-tag sequences offer a holistic approach to extracting the most commonly used patterns without a starting point of an a priori set of words and word sequences. Using corpus linguistics informed by usage-based theories of language learning, this paper examines the frequency and distribution of 4-slot POS-tag sequences in L2 English writing, drawing on the taxonomy of pattern grammar (Francis et al. 1996, 1998; Hunston & Francis, 2000). Findings point to the presence of both core and emergent POS-tag sequences in learner language in the two proficiency levels analysed. These sequences point to the presence of dynamic language restructuring processes as learners become more proficient and re-evaluate their understanding of frequency and distribution in English. This paper shows evidence of how language competence increases with proficiency. The research offers new evidence to our understanding of the development of L2 writing in EFL contexts.

This is a preprint of:

Lim, J., Mark, G., Pérez-Paredes, P. & O’Keeffe, A. (2024). Exploring Part of Speech (POS)-tag sequences in a large-scale learner corpus of L2 English: A developmental perspective. Corpora, 19(1).
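
For readers new to the idea, counting 4-slot POS-tag sequences can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the paper: the function names, the toy sentences and the coarse tag labels (DET, NOUN, VERB, ADJ) are all assumptions, and a real study would start from a corpus that has already been POS-tagged.

```python
from collections import Counter
from typing import Iterator, List, Tuple


def pos_ngrams(tags: List[str], n: int = 4) -> Iterator[Tuple[str, ...]]:
    """Yield every contiguous n-slot POS-tag sequence in one tagged sentence."""
    for i in range(len(tags) - n + 1):
        yield tuple(tags[i:i + n])


def count_sequences(sentences: List[List[str]], n: int = 4) -> Counter:
    """Count n-slot POS-tag sequences across sentences (never across sentence boundaries)."""
    counts: Counter = Counter()
    for tags in sentences:
        counts.update(pos_ngrams(tags, n))
    return counts


# Hypothetical, hand-tagged toy data: one list of POS tags per sentence.
toy_sentences = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["DET", "NOUN", "VERB", "DET", "ADJ", "NOUN"],
]

counts = count_sequences(toy_sentences)

# List the sequences from most to least frequent.
for sequence, freq in counts.most_common():
    print(" ".join(sequence), freq)
```

Running the same count separately on texts from two proficiency levels and comparing the resulting frequency lists is one simple way to look for sequences that are shared by both levels and sequences that only appear at the higher one.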

5 recent books for language teachers interested in corpus linguistics, DDL & language education

Crosthwaite, P. (Ed.). (2019). Data-driven learning for the next generation: Corpora and DDL for pre-tertiary learners. Routledge. (URL)

Jablonkai, R. R., & Csomay, E. (Eds.). (2022). The Routledge Handbook of Corpora and English Language Teaching and Learning. Routledge. (URL)

Pérez-Paredes, P. (2020). Corpus Linguistics for Education. A Guide for Research. Routledge. (URL)

Timmis, I. (2015). Corpus linguistics for ELT: Research and practice. Routledge. (URL)

Viana, V. (Ed.). (2022). Teaching English with Corpora: A Resource Book. Routledge. (URL)