Recent DDL research & events: 5 tips

These are exciting times for researchers in DDL, corpus linguistics and education. Several interesting pieces have just been published, along with some conference videos. Here’s my selection.

(1) Boulton, A., & Vyatkina, N. (2021). Thirty years of data-driven learning: Taking stock and charting new directions over time. Language Learning & Technology, 25(3), 66-89.

Abstract

The tools and techniques of corpus linguistics have many uses in language pedagogy, most directly with language teachers and learners searching and using corpora themselves. This is often associated with work by Tim Johns who used the term Data-Driven Learning (DDL) back in 1990. This paper examines the growing body of empirical research in DDL over three decades (1989-2019), with rigorous trawls uncovering 489 separate publications, including 117 in internationally ranked journals, all divided into five time periods. Following a brief overview of previous syntheses, the study introduces our collection, outlining the coding procedures and conversion into a corpus of over 2.5 million words. The main part of the analysis focuses on the concluding sections of the papers to see what recommendations and future avenues of research are proposed in each time period. We use manual coding and semi-automated corpus keyword analysis to explore whether those points are in fact addressed in later publications as an indication of the evolution of the field.

(2) Dr Peter Crosthwaite, The University of Queensland: Is Data Driven Learning dead? In this talk Dr Crosthwaite ****

Language is never, ever, ever random

“Language is never, ever, ever random” (Kilgarriff, 2005), not in its usage, not in its acquisition, and not in its processing. (Nick C. Ellis, 2017, p. 41)

Nick C. Ellis (2017). Cognition, Corpora, and Computing: Triangulating Research in Usage-Based Language Learning. Language Learning 67(S1), pp. 40–65

Corpus of North American Spoken English (CoNASE)

The Corpus of North American Spoken English (CoNASE), a 1.25-billion-word corpus of geolocated automatic speech-to-text transcripts, is now available in a beta version.

See http://cc.oulu.fi/~scoats/CoNASE.html for more information.

The corpus was created from 301,847 ASR transcripts from 2,572 YouTube channels, corresponding to 154,041 hours of video. The size of the corpus is 1,252,066,371 word tokens.

The channels sampled in the corpus are associated with local government entities such as town, city, or county boards and councils, school or utility districts, regional authorities such as provincial or territorial governments, or other governmental organizations.

The transcripts are primarily of recordings of public meetings, although other genres are also present. Video transcripts have been assigned exact latitude-longitude coordinates using a geocoding script.

This information was distributed through the Corpora-List by Steven Coats, University of Oulu, Finland.

To cite the corpus, please use:

Coats, Steven. 2021. Corpus of North American Spoken English (CoNASE). http://cc.oulu.fi/~scoats/CoNASE.html.

Unpacking SLA articles

This seems like an interesting YouTube channel. Florencia Henshaw (@Prof_F_Henshaw) looks at relevant SLA research papers and provides an overview of their contents and implications for research in the field. So far (end of July 2021), two episodes have been published.

Online dissemination event for the Nutcracker research project, 24-25 June 2021

NUTCRACKER: Sistema de detección, rastreo, monitorización y análisis del discurso terrorista en la Red. Funded by: MINECO. 2017-2020. FFI2016-79748-R

R&D&I Projects – State Programme for Research, Development and Innovation Oriented to the Challenges of Society.

“Nutcracker: System for Detection, Tracking, Monitoring and Analysis of the Discourse of Terror on the Net”

LINK 24 JUNE

https://oficinavirtual.ugr.es/redes/SOR/SALVEUGR/accesosala.jsp?IDSALA=22980794
Password: 657396

LINK 25 JUNE

https://oficinavirtual.ugr.es/redes/SOR/SALVEUGR/accesosala.jsp?IDSALA=22980797
Password: 466561

PIs: Prof Encarnación Hidalgo Tenorio & Prof Juan Luis Castro Peña, Universidad de Granada

Where’s home? EU citizens as migrants.

Approaches to migration, language and identity 2020 AMLI Conference

University of Sussex, Wednesday 9 – Friday 11 June 2021

Book of abstracts.


Pascual Pérez-Paredes & Elena Remigi
Universidad de Murcia / The In Limbo Project

When?

Thursday June 10, Panel A: Foregrounding migrant perspectives 11:25 UK time


Abstract

Since January 2021, UK and EU citizens can no longer exercise freedom of movement between the two areas. EU, EEA or Swiss citizens living in the UK before 31 December 2020 have been forced to apply to the EU Settlement Scheme to continue living in the UK. In practical terms, EU citizens have become a new migrant community. The 2016 Brexit referendum started a period of uncertainty, agony and frustration for both EU citizens in the UK and UK citizens in the EU that ended with the trade deal that the EU and the UK made public on 24 December 2020. The anger, the sense of betrayal (Bueltmann, 2020) and various mental health issues (Reimer, 2018; Bueltmann, 2020), however, linger on. This study uses a corpus of 200 testimonies from EU citizens in the UK to explore their feelings and reactions to Brexit and the hostile environment (Leudar et al., 2008) that emerged soon after the referendum. The In Limbo corpus of testimonies contains personal accounts by EU citizens living in Britain from 2017 until 2020. It has 81,000 tokens and 7,600 types. The collection of the data was organised by volunteers on a not-for-profit basis. The testimonies in Remigi, Martin and Sykes (2020) were chosen as the basis of our corpus.


We used keyword (Baker, 2006; Baker et al., 2008) and collocation (Baker, 2006; Pérez-Paredes, Aguado & Sánchez, 2017; Pérez-Paredes, 2020) analyses to explore the self-representation of EU citizens across four emerging areas of interest: family life, loss of identity, feeling unwelcome and representations of post-Brexit Britain, including discourses about settled status and Britishness. In order to moderate the impact of Brexit-as-a-topic in the analysis of the narratives, we used two reference corpora in our study: the Brexit corpus and the enTenTen 2015, both provided through Sketch Engine. We used Wodak’s (2001) framework of analysis of representation strategies to pin down our discussion of the discourses emerging in the testimonies. Two strategies appear to be relevant in the context of our data: predication and perspectivation. The former is used mainly when expressing feelings about the UK, while the latter is crucial to delivering the narratives discursively. While our research confirms some of the conclusions in the survey conducted by Bueltmann (2020), the combination of corpus-based CDA methods and the rich data provided through these narratives opens up further understanding of the discursive strategies used by EU citizens when resisting the anti-EU environment that was unleashed in the wake of Brexit. Our analysis provides an alternative representation of the consequences and impact of Brexit on EU migrants that is in contrast with the recent triumphalist discourse of the Tory government that misrepresents EU citizens as happily embracing the settled status scheme.

Keywords: Brexit, EU citizens, migrants, keyword analysis, representation strategies

Download the top 100 multiword key terms from the In Limbo corpus.
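
For readers new to keyword analysis, here is a minimal sketch of how key terms can be scored against a reference corpus. It uses the "simple maths" ratio of normalised frequencies that Sketch Engine documents for keyness. Whether the study relied on this exact score or on the default smoothing value is an assumption on my part, and the function name `keyness` and the two toy word lists are invented purely for illustration; they are not taken from the In Limbo corpus.

```python
from collections import Counter


def keyness(focus_tokens, ref_tokens, smoothing=1.0):
    """Score each word in the focus corpus against a reference corpus.

    Score = (fpm_focus + n) / (fpm_ref + n), where fpm is frequency
    per million tokens and n is a smoothing constant (assumed to be 1).
    """
    if not focus_tokens or not ref_tokens:
        raise ValueError("both corpora must contain at least one token")

    focus_counts = Counter(focus_tokens)
    ref_counts = Counter(ref_tokens)
    scores = {}
    for word, count in focus_counts.items():
        fpm_focus = count / len(focus_tokens) * 1_000_000
        # Counter returns 0 for words that never occur in the reference.
        fpm_ref = ref_counts[word] / len(ref_tokens) * 1_000_000
        scores[word] = (fpm_focus + smoothing) / (fpm_ref + smoothing)
    return scores


# Hypothetical toy data, for illustration only.
focus = "we feel unwelcome here and we feel lost".split()
reference = "we went here and we saw the town and the sea".split()

scores = keyness(focus, reference)
top = sorted(scores, key=scores.get, reverse=True)[:3]
print(top)
```

Words that are frequent in the focus corpus but rare or absent in the reference corpus get the highest scores, which is why the choice of reference corpus matters: comparing against a Brexit-themed corpus as well as a general web corpus helps separate what is specific to the testimonies from what is simply typical of Brexit talk.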