corpora Archives - Pérez-Paredes

CFP: Generative AI and data-driven learning in second language learning

Guest editors: Javad Zare, Kosar University of Bojnord, Iran, and Alex Boulton, Université de Lorraine, France

Language Learning & Technology has an active call for papers for a special issue, Generative AI and data-driven learning in second language learning: What the future holds, guest edited by Javad Zare and Alex Boulton.

Abstracts should be no more than 500 words and should describe the study’s aim, methodology, findings related to L2 learning outcomes, and how these findings can be used in classroom contexts to enhance L2 teaching and learning with technology. To be considered for this special issue, which will appear in volume 31, issue 2, in June 2027, please submit a title and a 500-word abstract through this online form by June 1, 2025.

Details for the call for proposals are available on our website: https://www.lltjournal.org/post/21/

5 recent books for language teachers interested in corpus linguistics, DDL & language education

Crosthwaite, P. (Ed.). (2019). Data-driven learning for the next generation: Corpora and DDL for pre-tertiary learners. Routledge. (URL)

Jablonkai, R. R., & Csomay, E. (Eds.). (2022). The Routledge Handbook of Corpora and English Language Teaching and Learning. Routledge. (URL)

Pérez-Paredes, P. (2020). Corpus Linguistics for Education. A Guide for Research. Routledge. (URL)

Timmis, I. (2015). Corpus linguistics for ELT: Research and practice. Routledge. (URL)

Viana, V. (Ed.). (2022). Teaching English with Corpora: A Resource Book. Routledge. (URL)

John Sinclair and language theory

The following is an extract from Hunston (2022, p. 256).

Hunston, S. (2022). Corpora in applied linguistics. Cambridge University Press.

Sinclair made a number of generalisations in the 1980s (Sinclair 1991, 2004; see also Francis 1993; Hoey 2005; Hunston 2002; Stubbs 2001) which might be summarised as follows:

• In describing the meanings of a word, the ‘phrases’ that the word is used in are central to that description (= there is no distinction between form and meaning).

• Those ‘phrases’ are neither fully fixed nor fully open – in fact the distinction between ‘word’ and ‘fixed phrase’ does not hold up; the boundaries of a ‘phrase’ may be indeterminate and the variation resists classification.

• Those ‘phrases’ incorporate associations between individual words that might be discussed under the heading of collocation, but the ‘phrases’ also include aspects of grammar and commonalities of meaning rather than of form (= language is not divided into lexis and grammar).

• Although we commonly think of words as having meaning, and we often talk of a word having several meanings, what actually happens is that a word occurs in several ‘phrases’ and meaning resides in the ‘phrase’ rather than the word (= unit of meaning).

• When we look at text we can observe that a lot of it can be explained as a series of units of meaning and the remainder can be explained in terms of residual grammar (= idiom principle and open-choice principle).

Phil Durrant’s talk available on YouTube

Check out Dr Durrant’s talk “Researching writing development with a corpus” on our research group’s YouTube channel: https://www.youtube.com/channel/UCKjKIIQL6u1mXD2V9ZaT-_Q

More info on the talk here.

More info on the Corpus linguistics and applied linguistics research 2021 site.

Corpus of North American Spoken English (CoNASE)

The Corpus of North American Spoken English (CoNASE), a 1.25-billion-word corpus of geolocated automatic speech-to-text transcripts, is now available in a beta version.

See http://cc.oulu.fi/~scoats/CoNASE.html for more information.

The corpus was created from 301,847 ASR transcripts from 2,572 YouTube channels, corresponding to 154,041 hours of video. The size of the corpus is 1,252,066,371 word tokens.
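To get a feel for the scale of these figures, here is a minimal Python sketch that derives a few averages from the four numbers quoted above. The function name `corpus_averages` and the dictionary keys are made up for illustration; the results are simple ratios of the published totals, not statistics reported by the corpus author.

```python
# Totals quoted in the CoNASE announcement.
TRANSCRIPTS = 301_847
CHANNELS = 2_572
HOURS = 154_041
TOKENS = 1_252_066_371


def corpus_averages(transcripts, channels, hours, tokens):
    """Return rough per-unit averages computed from corpus totals.

    Hypothetical helper for illustration only: each value is a plain
    ratio of the totals and says nothing about how the data are
    actually distributed across channels or transcripts.
    """
    return {
        "tokens_per_transcript": tokens / transcripts,
        "tokens_per_minute": tokens / (hours * 60),
        "minutes_per_transcript": hours * 60 / transcripts,
        "transcripts_per_channel": transcripts / channels,
        "hours_per_channel": hours / channels,
    }


averages = corpus_averages(TRANSCRIPTS, CHANNELS, HOURS, TOKENS)
for name, value in averages.items():
    print(f"{name}: {value:,.1f}")
```

Because these are averages over the whole corpus, individual channels and transcripts may differ widely from them.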

The channels sampled in the corpus are associated with local government entities such as town, city, or county boards and councils, school or utility districts, regional authorities such as provincial or territorial governments, or other governmental organizations.

The transcripts are primarily of recordings of public meetings, although other genres are also present. Video transcripts have been assigned exact latitude-longitude coordinates using a geocoding script.

This information was distributed through the Corpora-List by Steven Coats, University of Oulu, Finland.

To cite the corpus, please use:

Coats, Steven. 2021. Corpus of North American Spoken English (CoNASE). http://cc.oulu.fi/~scoats/CoNASE.html.

Uses of corpora in Spanish Language Teaching

Abad Castelló, Magdalena (2019). Uso de corpus lingüísticos por y para profesores de español como lengua extranjera. redELE Revista electrónica de didáctica del español lengua extranjera, 31. (URL)