Data-driven learning in informal contexts? Embracing broad data-driven learning (BDDL) research

Pérez-Paredes, P. (2024) Data-driven learning in informal contexts? Embracing Broad Data-driven learning (BDDL) research. In Crosthwaite, P. (Ed.). Corpora for Language Learning: Bridging the Research-Practice Divide. Routledge.

In this chapter, I argue that it is necessary to pursue an analysis of DDL practices in the broader language learning context (Pérez-Paredes & Mark, 2022), particularly in informal contexts outside the university classroom.

We need to push the boundaries of DDL praxis and research outside the classroom if we are to gain a more comprehensive view of the contributions of DDL to language learning in the first half of the 21st century. It is essential to expand the ecological research model that has dominated DDL research so far, and which has thoroughly examined higher education (HE) contexts.

While instructed, formal language learning continues to be central to language learners’ experiences, new sites of learning and technologies emerge sometimes unexpectedly (e.g. the impact of ChatGPT at the end of 2022 was surprising, and it is probably too soon to evaluate its impact on language education).

I use the term “prototypical DDL” (Boulton, 2015) to refer to DDL that is designed by an expert in corpus linguistics and which takes place in the context of instructed second language acquisition (SLA) as part of a module or an official programme, typically in a higher education institution (HEI).

The term “broad DDL” (BDDL) refers to pedagogical natural language processing resources (P-NLPRs) for language learning (see Pérez-Paredes et al., 2018). BDDL makes use of a wide range of existing resources such as online dictionaries, text analysis and text processing tools, vocabulary-oriented websites and apps, translation services, and artificial intelligence (AI) tools for language learning across a variety of contexts, including self-directed uses.

It also involves the use of informal language learning against the backdrop of digital learning, characterized by a new ecology of reading and writing, multitasking and the emergence of a new literate social formation (Pérez-Paredes & Zhang, 2022) where communication processes are transitioning towards “dialogic interactions [less] subject to the power of institutions to set standards of knowledge, procedure, and truth based on their control of written texts” (Gee & Hayes, 2011, p. 125).

In BDDL, corpora are one of the many resources available to language learners. While some research has examined the use of Google as a web corpus and a concordancer (Sun, 2007; Sha, 2010; Pérez-Paredes et al., 2012; Boulton, 2015), this has mostly happened in instructed SLA contexts. The impact of other P-NLPRs in informal learning remains largely unexplored (see Crosthwaite & Boulton, 2023 for a discussion of some of these resources).

User-generated activity using personal devices such as phones or tablets holds the potential to inform designed activity and, most significantly, what we know about learners’ interactions with content online (Kukulska-Hulme et al., 2007). P-NLPRs have the potential to foster autonomy, personalization, induction and authenticity and may offer an alternative to prototypical DDL corpora when engaging with BDDL (Pérez-Paredes et al., 2018, 2019).

There are at least three areas that will benefit from an examination of BDDL practices in informal learning: the exploration of new sites of language learning engagement; new opportunities to increase our understanding of the cognitive processes involved in statistical language learning; and the study and analysis of the role of new corpora in informal settings.

Thanks to Carolina Tavares de Carvalho, Daniela Terenzi & Alejandro Curado Fuentes for providing their insights

New publication: An investigation of Chinese EFL learners’ acceptance of mobile dictionaries in English language learning

Zhang, D., Hennessy, S. & Pérez-Paredes, P. (2023) An investigation of Chinese EFL learners’ acceptance of mobile dictionaries in English language learning. Computer Assisted Language Learning, DOI: 10.1080/09588221.2023.2189915


Although many studies have explored the role of dictionaries in English language learning, few have investigated mobile dictionaries (MDs) from learners’ perspectives. This study aimed to explore Chinese EFL learners’ acceptance of three types of MDs: monolingual, bilingualised and bilingual. A total of 125 participants used mobile dictionaries in various English learning contexts, especially in reading comprehension and vocabulary learning. Adapted from the Technology Acceptance Model and the mobile technology evaluation framework, the questionnaire in this study addressed three key themes: (1) perceived ease of use, (2) perceived usefulness, and (3) behavioural intention to use.

Analysis shows that the bilingualised MD group reported the most positive perceptions, especially compared to the bilingual MD group. A total of 101 participants took part in semi-structured group interviews to further explore the reasons underlying their perceptions. Several factors impacting learner acceptance, from the micro to the macro level, are proposed and discussed. As an interdisciplinary study, this research fills theoretical and empirical gaps in investigating mobile-assisted language learning. It offers application designers and language teachers insights into learners’ acceptance of MDs. Moreover, it provides recommendations concerning making MDs more personalised, attractive and effective.

Free copy of our latest paper in Computer Assisted Language Learning

Our article, Language teachers’ perceptions on the use of OER language processing technologies in MALL, has just been published in the journal Computer Assisted Language Learning, Taylor & Francis Online.

50 free eprints can be downloaded from the following URL:

Get yours now!!!!


Combined with the ubiquity and constant connectivity of mobile devices, and with innovative approaches such as Data-Driven Learning (DDL), Natural Language Processing Technologies (NLPTs) as Open Educational Resources (OERs) could become a powerful tool for language learning as they promote individual and personalized learning. Using a questionnaire that was answered by language teachers (n = 230) in Spain and the UK, this research explores the extent to which OER NLPTs are currently known and used in adult foreign language learning. Our results suggest that teachers’ familiarity with and use of OER NLPTs are very low. Although online dictionaries, collocation dictionaries and spell checkers are widely known, NLPTs appear to be generally underused in foreign language teaching. It was found that teachers prefer computer-based environments over mobile devices such as smartphones and tablets and that teachers’ qualifications determine their familiarity with a wider range of OER NLPTs. This research offers insight into future applications of Language Processing Technologies as OERs in language learning.

KEYWORDS: Language learning, teachers’ perceptions, OER, MALL, natural language processing technologies, higher education

Learner Corpora in Language Testing and Assessment

Edited by Marcus Callies and Sandra Götz
University of Bremen / Justus Liebig University, Giessen
ISBN 9789027203786 
The aim of this volume is to highlight the benefits and potential of using learner corpora for the testing and assessment of L2 proficiency in both speaking and writing, reflecting the growing importance of learner corpora in applied linguistics and second language acquisition research. Identifying several desiderata for future research and practice, the volume presents a selection of original studies, covering a variety of different languages. It features studies that present very thoroughly compiled new corpus resources which are tailor-made and ready for analysis in LTA, new tools for the automatic assessment of proficiency levels, and new methods of (self-)assessment with the help of learner corpora. Other studies suggest innovative research methodologies of how proficiency can be operationalized through learner corpus data. The volume is of particular interest to researchers in (applied) corpus linguistics, learner corpus research, language testing and assessment, as well as for materials developers and language teachers.

Learner corpora in language testing and assessment: Prospects and challenges. Marcus Callies and Sandra Götz (pp. 1–10)

New corpus resources, tools and methods

* The Marburg Corpus of Intermediate Learner English (MILE). Rolf Kreyer (pp. 13–34)
* Avalingua: Natural language processing for automatic error detection. Pablo Gamallo Otero, Marcos Garcia, Iria del Río and Isaac González López (pp. 35–58)
* Data commentary in science writing: Using a small, specialized corpus for formative self-assessment practices. Lene Nordrum and Andreas Eriksson (pp. 59–84)
* First steps in assigning proficiency to texts in a learner corpus of computer-mediated communication. Tim Marchand and Sumie Akutsu (pp. 85–112)

Data-driven approaches to the assessment of proficiency

* The English Vocabulary Profile as a benchmark for assigning levels to learner corpus data. Agnieszka Leńko-Szymańska (pp. 115–140)
* A multidimensional analysis of learner language during story reconstruction in interviews. Pascual Pérez-Paredes and María Sánchez-Tornel (pp. 141–162)
* Article use and criterial features in Spanish EFL writing: A pilot study from CEFR A2 to B2 levels. María Belén Díez-Bedmar (pp. 163–190)
* Tense and aspect errors in spoken learner English: Implications for language testing and assessment. Sandra Götz (pp. 191–216)

CFP Automated Writing Evaluation in Language Teaching: Theory, Development, and Application

Call for Papers

Special Issue: CALICO Journal 33.1, 2016
Guest editors: Volker Hegelheimer, Ahmet Dursun, Zhi Li, Iowa State University

Automated Writing Evaluation in Language Teaching: Theory, Development, and Application

The first automated writing evaluation (AWE) software for assessment purposes dates back to the 1960s (Project Essay Grade, Ellis Page). Rapid advances in the fields of artificial intelligence and natural language processing in the last few decades have led to the development of more powerful scoring engines, such as e-rater developed by ETS and IntelliMetric by Vantage Learning. Recent years have seen the application of scoring engines expand to language learning and teaching purposes. Likewise, much open-source and commercial AWE software has been released for use in the language learning (L2) classroom.

Opinions on the utility of AWE tools and their potential effects on educational practices vary, as shown by two frequently cited books on AWE: Ericsson and Haswell (2006) and Shermis and Burstein (2013). While many AWE tools are impressive in terms of scoring reliability, the use of AWE for assessment purposes in writing classrooms has seen fierce discussion and opposition, as articulated in the 2004 position statement of the Conference on College Composition and Communication (CCCC). More studies are needed to evaluate AWE tools in classrooms. This special issue will bring together a variety of studies related to AWE in the context of Computer-Assisted Language Learning (CALL). The issue will cover conceptual and empirical research on AWE tool development, AWE tool classroom implementation, and resulting pedagogical implications. It will thus be of interest to AWE designers and developers, applied-linguistics researchers, and language teachers and practitioners. With an emphasis on AWE development for classroom use and its implementation, this issue will be a good complement to existing books on AWE, such as Ericsson & Haswell (2006) and Shermis & Burstein (2013).

Research articles that include a theoretical discussion and/or empirical research on the promise, challenges, and issues related to the development, implementation, or evaluation of AWE tools are invited. These articles may investigate how AWE tools provide L2 learners, language teachers, and computational linguists with opportunities and challenges to:

* promote writing proficiency development
* encourage learner autonomy
* support pedagogical practices
* incorporate theories of Second Language Acquisition
* integrate L2 writing curricula
* develop theory-based AWE tools

By bringing together a variety of researchers and practitioners who have employed qualitative, quantitative, or mixed-method methodologies in researching different AWE tools across different contexts and genres, this Special Issue will raise the awareness of researchers and practitioners regarding the use of AWE tools as part of classroom instruction. This issue is timely as new commercial and academic AWE tools are being used or introduced. The papers in this issue can both generate valuable guidance for implementation and offer suggestions for needed research on the use of AWE tools as potential language learning technologies.

It is our hope that this Special Issue will stimulate lively discussion about (1) how to approach the theory-based design and use of different AWE tools in order to best address the needs of L2 learners in different contexts, (2) whether or not to integrate AWE tools into the L2 writing curriculum and use these tools as part of classroom instruction, and (3) how to effectively coordinate a variety of existing technologies in light of learner variables, such as self-regulated learning, motivation, and learner autonomy.

In a wider sense, this Special Issue will illustrate how developers design and create AWE tools, how instructors implement these tools in their classes, and how learners use them to improve their L2 writing skills. We will thus de-mystify the development of AWE tools for pedagogical purposes and shed light on best practices for teaching L2 writing with AWE tools.

Please send inquiries and abstracts to Volker Hegelheimer before 1st August 2014. Please list CALICO Journal Special Issue in the subject line of your email. For the submission of the manuscript, follow the online submission process and refer to the Author Guidelines of CJ. During the submission process, select ‘Special Issue AWE’ as the section.


First Call for Papers: 1 June 2014
Deadline for submission of abstracts: 1 August 2014
Notification of contributors: 1 September 2014
First draft of papers to be submitted: 1 January 2015
Returned to authors for changes: 15 March 2015
Second draft of papers to be submitted: 15 June 2015
Returned to authors for final changes: 1 September 2015
Special Issue to be published: February 2016

Thanks to Mathias Schulze

Learners’ search patterns during corpus-based focus-on-form activities

This research explores the search behaviour of EFL learners (n=24) by tracking their interaction with corpus-based materials during focus-on-form activities (Observe, Search the corpus, Rewriting). One set of learners made no use of web services other than the BNC during the central Search the corpus activity while the other set resorted to other web services and/or consultation guidelines. The performance of the second group was higher, the learners’ formulation of corpus queries on the BNC was unsophisticated and the students tended to use the BNC search interface to a great extent in the same way as they used Google or similar services. Our findings suggest that careful consideration should be given to the cognitive aspects concerning the initiation of corpus searches, the role of computer search interfaces, as well as the implementation of corpus-based language learning. Our study offers a taxonomy of learner searches that may be of interest in future research.

Pérez-Paredes, P., Sánchez-Tornel, M., & Alcaraz Calero, J. M. (2012). Learners’ search patterns during corpus-based focus-on-form activities. International Journal of Corpus Linguistics, 17(4), 483-516.

Full text here.