Exploring Part of Speech (POS)-tag sequences in a large-scale learner corpus of L2 English: A developmental perspective

Abstract

This research explores the POS-tag sequences that shape the transition from upper intermediate (B2 CEFR) to near-native proficiency (C2 CEFR) in a corpus of essays (n=32,410) from the Cambridge Learner Corpus. Gilquin (2018) and others have shown that POS-tag sequences offer a holistic approach to extracting the most commonly used patterns without starting from an a priori set of words and word sequences. Using corpus linguistics informed by usage-based theories of language learning, this paper examines the frequency and distribution of 4-slot POS-tag sequences in L2 English writing, drawing on the taxonomy of pattern grammar (Francis et al. 1996, 1998; Hunston & Francis, 2000). Findings point to the presence of both core and emergent POS-tag sequences in learner language at the two proficiency levels analysed. These sequences suggest dynamic language restructuring processes as learners become more proficient and re-evaluate their understanding of frequency and distribution in English. This paper shows evidence of how language competence increases with proficiency. The research contributes new evidence to our understanding of the development of L2 writing in EFL contexts.
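For readers unfamiliar with the idea of a 4-slot POS-tag sequence, the following is a minimal sketch of how such sequences could be counted. It is not the procedure used in the paper: the function names, the Penn Treebank-style tags and the two toy sentences are all assumptions made for illustration, and the input is assumed to be already tokenised and tagged.

```python
from collections import Counter


def pos_ngrams(tags, n=4):
    """Return every contiguous n-slot sequence of POS tags as a tuple."""
    # A sentence shorter than n tags yields an empty list.
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def sequence_frequencies(tagged_sentences, n=4):
    """Count n-slot POS-tag sequences over sentences of (word, tag) pairs."""
    counts = Counter()
    for sentence in tagged_sentences:
        tags = [tag for _word, tag in sentence]
        # Sequences never cross a sentence boundary in this sketch.
        counts.update(pos_ngrams(tags, n))
    return counts


# Hypothetical, hand-tagged toy data (not taken from any learner corpus).
toy_sentences = [
    [("I", "PRP"), ("think", "VBP"), ("that", "IN"),
     ("the", "DT"), ("film", "NN")],
    [("We", "PRP"), ("believe", "VBP"), ("that", "IN"),
     ("a", "DT"), ("change", "NN")],
]

counts = sequence_frequencies(toy_sentences)
for sequence, frequency in counts.most_common():
    print(" ".join(sequence), frequency)
```

Because the two toy sentences share the same tag pattern, each of the two 4-slot sequences is counted twice. Comparing such counts across proficiency levels would additionally require normalising by corpus size, which this sketch leaves out.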

This is a preprint of

Lim, J., Mark, G., Pérez-Paredes, P. & O’Keeffe, A. (2024). Exploring Part of Speech (POS)-tag sequences in a large-scale learner corpus of L2 English: A developmental perspective. Corpora, 19(1).

Corpus & applied linguistics research 2022 registration link

Free online event “Corpus & applied linguistics research 2022”

Corpus linguistics and Second Language Acquisition

Prof Tony McEnery, University of Lancaster

October 5, 18:00 (Madrid time) / 17:00 (UK time)

Who’s in this corpus? Looking at language and identity with (and without) demographic metadata

Dr Gavin Brookes, University of Lancaster

October 12, 18:00 (Madrid time) / 17:00 (UK time)

Corpus linguistics and the discursive construction of migrants

Dr Charlotte Taylor, University of Sussex

October 19, 18:00 (Madrid time) / 17:00 (UK time)

A corpus-friendly analysis of fragmentary constructions in English

Prof Javier Pérez Guerra, Universidad de Vigo

October 26, 18:00 (Madrid time) / 17:00 (UK time)

Free event registration link

After registering, you will receive an email with the webinar link. The same link is used for all four talks.

You can check out the 2021 talks here:

https://www.youtube.com/channel/UCKjKIIQL6u1mXD2V9ZaT-_Q/featured

This online event is organized by the Universidad de Murcia and the E020-07 research group (Lenguajes de especialidad, corpus lingüísticos y lingüística inglesa aplicada a la ingeniería del conocimiento).

Coordination: Prof Pascual Pérez-Paredes

Online dissemination event for the Nutcracker research project, 24-25 June 2021

NUTCRACKER: Sistema de detección, rastreo, monitorización y análisis del discurso terrorista en la Red. Funded by: MINECO, 2017-2020, FFI2016-79748-R

R&D&I Projects – State Programme for Research, Development and Innovation Oriented to the Challenges of Society.

“Nutcracker: System for Detection, Tracking, Monitoring and Analysis of the Discourse of Terror on the Net”

LINK 24 JUNE

https://oficinavirtual.ugr.es/redes/SOR/SALVEUGR/accesosala.jsp?IDSALA=22980794
Password: 657396

LINK 25 JUNE

https://oficinavirtual.ugr.es/redes/SOR/SALVEUGR/accesosala.jsp?IDSALA=22980797
Password: 466561

PIs: Prof Encarnación Hidalgo Tenorio and Prof Juan Luis Castro Peña, Universidad de Granada