Data-driven learning in informal contexts? Embracing broad data-driven learning (BDDL) research

Pérez-Paredes, P. (2024) Data-driven learning in informal contexts? Embracing Broad Data-driven learning (BDDL) research. In Crosthwaite, P. (Ed.). Corpora for Language Learning: Bridging the Research-Practice Divide. Routledge.

In this chapter, I argue that it is necessary to pursue an analysis of DDL practices in the broader language learning context (Pérez-Paredes & Mark, 2022), particularly in informal contexts outside the university classroom.

We need to push the boundaries of DDL praxis and research outside the classroom if we are to gain a more comprehensive view of the contributions of DDL to language learning in the first half of the 21st century. It is essential to expand the ecological research model that has dominated DDL research so far, and which has thoroughly examined higher education (HE) contexts.

While instructed, formal language learning continues to be central to language learners’ experiences, new sites of learning and new technologies sometimes emerge unexpectedly (e.g. the impact of ChatGPT at the end of 2022 was surprising, and it is probably too soon to evaluate its effect on language education).

I use the term “prototypical DDL” (Boulton, 2015) to refer to DDL that is designed by an expert in corpus linguistics and which takes place in the context of instructed second language acquisition (SLA) as part of a module or an official programme, typically in a higher education institution (HEI).

The term “broad DDL” (BDDL) refers to pedagogical natural language processing resources (P-NLPRs) for language learning (see Pérez-Paredes et al., 2018). BDDL makes use of a wide range of existing resources such as online dictionaries, text analysis and text processing tools, vocabulary-oriented websites and apps, translation services, and artificial intelligence (AI) tools for language learning across a variety of contexts, including self-directed uses.

It also involves the use of informal language learning against the backdrop of digital learning, characterized by a new ecology of reading and writing, multitasking and the emergence of a new literate social formation (Pérez-Paredes & Zhang, 2022) where communication processes are transitioning towards “dialogic interactions [less] subject to the power of institutions to set standards of knowledge, procedure, and truth based on their control of written texts” (Gee & Hayes, 2011, p. 125).

In BDDL, corpora are one of the many resources available to language learners. While some research has examined the use of Google as a web corpus and a concordancer (Sun, 2007; Sha, 2010; Pérez-Paredes et al., 2012; Boulton, 2015), this has mostly happened in instructed SLA contexts. The impact of other P-NLPRs in informal learning remains largely unexplored (see Crosthwaite & Boulton, 2023 for a discussion of some of these resources).

User-generated activity using personal devices such as phones or tablets holds the potential to inform designed activity and, most significantly, what we know about learners’ interactions with content online (Kukulska-Hulme et al., 2007). P-NLPRs have the potential to foster autonomy, personalization, induction and authenticity and may offer an alternative to prototypical DDL corpora when engaging with BDDL (Pérez-Paredes et al., 2018, 2019).

There are at least three areas that will benefit from an examination of BDDL practices in informal learning: the exploration of new sites of language learning engagement; new opportunities to increase our understanding of the cognitive processes involved in statistical language learning; and the study and analysis of the role of new corpora in informal settings.

Thanks to Carolina Tavares de Carvalho, Daniela Terenzi & Alejandro Curado Fuentes for providing their insights.

New research on Data-driven language learning March 2023

Allan, R. (2023). Reserved for Research? Normalising Corpus Use for School Teachers. Nordic Journal of English Studies, 22(1).

Allan, R., Walker, T., & Langum, V. (2023). Data-driven learning: Tools, approaches, and next steps. Nordic Journal of English Studies, 22(1), 1-12.

Muftah, M. (2023). Data-driven learning (DDL) activities: do they truly promote EFL students’ writing skills development? Education and Information Technologies, 1-27.

O’Keeffe, A. (2023). A Theoretical Rationale for the Importance of Patterning in Language Acquisition and the Implications for Data-driven Learning. Nordic Journal of English Studies, 22(1), 16-41.

Şahin Kızıl, A. Data‐driven learning: English as a foreign language writing and complexity, accuracy and fluency measures. Journal of Computer Assisted Learning.

New publication: An investigation of Chinese EFL learners’ acceptance of mobile dictionaries in English language learning

Zhang, D., Hennessy, S. & Pérez-Paredes, P. (2023). An investigation of Chinese EFL learners’ acceptance of mobile dictionaries in English language learning. Computer Assisted Language Learning. DOI: 10.1080/09588221.2023.2189915

Abstract

Although many studies have explored the role of dictionaries in English language learning, few have investigated mobile dictionaries (MDs) from learners’ perspectives. This study aimed to explore Chinese EFL learners’ acceptance of three types of MDs: monolingual, bilingualised and bilingual. A total of 125 participants used mobile dictionaries in various English learning contexts, especially in reading comprehension and vocabulary learning. Adapted from the Technology Acceptance Model and the mobile technology evaluation framework, the questionnaire in this study addressed three key themes: (1) perceived ease of use, (2) perceived usefulness, and (3) behavioural intention to use.

Analysis shows that the bilingualised MD group reported the most positive perceptions, especially compared to the bilingual MD group. A total of 101 participants took part in semi-structured group interviews to further explore the reasons underlying their perceptions. Several factors impacting learner acceptance, from the micro to the macro level, are proposed and discussed. As an interdisciplinary study, this research fills theoretical and empirical gaps in investigating mobile-assisted language learning. It offers application designers and language teachers insights into learners’ acceptance of MDs. Moreover, it provides recommendations concerning making MDs more personalised, attractive and effective.

Exploring Part of Speech (POS)-tag sequences in a large-scale learner corpus of L2 English: A developmental perspective

Abstract

This research explores the POS-tag sequences that shape the transition from upper intermediate (B2 CEFR) to near-native proficiency (C2 CEFR) in a corpus of essays (n=32,410) from the Cambridge Learner Corpus. Gilquin (2018) and others have shown that POS-tag sequences offer a holistic approach to extracting the most commonly used patterns without starting from an a priori set of words and word sequences. Using corpus linguistics informed by usage-based theories of language learning, this paper examines the frequency and distribution of 4-slot POS-tag sequences in L2 English writing, drawing on the taxonomy of pattern grammar (Francis et al. 1996, 1998; Hunston & Francis, 2000). Findings point to the presence of both core and emergent POS-tag sequences in learner language at the two proficiency levels analysed. These sequences point to the presence of dynamic language restructuring processes as learners become more proficient and re-evaluate their understanding of frequency and distribution in English. This paper shows evidence of how language competence increases with proficiency. The research contributes new evidence to our understanding of the development of L2 writing in EFL contexts.
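For readers unfamiliar with the idea of a 4-slot POS-tag sequence, the following minimal Python sketch illustrates how such sequences might be counted once texts have already been tagged. It is not the procedure used in the paper: the function names (`pos_ngrams`, `count_sequences`), the toy tag lists and the Penn Treebank-style tag labels are all assumptions made purely for illustration.

```python
from collections import Counter


def pos_ngrams(tags, n=4):
    """Return every contiguous n-slot sequence of POS tags in one text."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def count_sequences(tagged_texts, n=4):
    """Count n-slot POS-tag sequences across a collection of tagged texts."""
    counts = Counter()
    for tags in tagged_texts:
        counts.update(pos_ngrams(tags, n))
    return counts


# Hypothetical, hand-tagged toy data (not taken from any learner corpus).
essays = [
    ["DT", "NN", "IN", "DT", "NN"],        # e.g. "the end of the day"
    ["DT", "NN", "IN", "DT", "JJ", "NN"],  # e.g. "the role of the new teacher"
]

counts = count_sequences(essays)
for sequence, frequency in counts.most_common(3):
    print(" ".join(sequence), frequency)
```

Comparing counts of this kind across proficiency bands, normalised for corpus size, is one way in which frequency and distribution differences between levels could be explored.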

This is a preprint of

Lim, J., Mark, G., Pérez-Paredes, P. & O’Keeffe, A. (2024). Exploring Part of Speech (POS)-tag sequences in a large-scale learner corpus of L2 English: A developmental perspective. Corpora, 19(1).