Exploring Part of Speech (POS)-tag sequences in a large-scale learner corpus of L2 English: A developmental perspective


This research explores the POS-tag sequences that shape the transition from upper intermediate (B2 CEFR) to near-native proficiency (C2 CEFR) in a corpus of essays (n=32,410) from the Cambridge Learner Corpus. Gilquin (2018) and others have shown that POS-tag sequences offer a holistic approach to extracting the most commonly used patterns without starting from an a priori set of words and word sequences. Using corpus linguistics informed by usage-based theories of language learning, this paper examines the frequency and distribution of 4-slot POS-tag sequences in L2 English writing, drawing on the taxonomy of pattern grammar (Francis et al., 1996, 1998; Hunston & Francis, 2000). Findings point to the presence of both core and emergent POS-tag sequences in learner language at the two proficiency levels analysed. These sequences suggest dynamic language restructuring processes as learners become more proficient and re-evaluate their understanding of frequency and distribution in English. The paper provides evidence of how language competence increases with proficiency. The research contributes new evidence to our understanding of the development of L2 writing in EFL contexts.
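To make the idea of a 4-slot POS-tag sequence concrete, here is a minimal Python sketch of how such sequences could be counted and compared across two proficiency levels. It is not the procedure used in the paper. The tag labels, the toy texts, the function names (pos_ngrams, sequence_profile) and the reading of "core" as "attested at both levels" are all assumptions made for illustration, and the input is assumed to be already POS-tagged.

```python
from collections import Counter

def pos_ngrams(tags, n=4):
    """Return every contiguous n-slot tag sequence in one tagged text."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]

def sequence_profile(texts, n=4):
    """Count n-slot POS-tag sequences over a list of tagged texts.

    Returns (counts, total) so that relative frequencies can be compared
    across sub-corpora of different sizes.
    """
    counts = Counter()
    for tags in texts:
        counts.update(pos_ngrams(tags, n))
    return counts, sum(counts.values())

# Toy, invented data: each inner list is the tag sequence of one short text.
# The tag labels are illustrative and not necessarily the paper's tagset.
b2_texts = [
    ["DT", "NN", "IN", "DT", "NN"],
    ["PRP", "VBP", "DT", "NN", "IN", "DT", "NN"],
]
c2_texts = [
    ["DT", "JJ", "NN", "IN", "DT", "NN"],
]

b2_counts, b2_total = sequence_profile(b2_texts)
c2_counts, c2_total = sequence_profile(c2_texts)

# Assumed working definitions, for illustration only:
# shared sequences are attested at both levels, c2_only only at C2.
shared = set(b2_counts) & set(c2_counts)
c2_only = set(c2_counts) - set(b2_counts)

for seq in sorted(shared):
    b2_rel = b2_counts[seq] / b2_total
    c2_rel = c2_counts[seq] / c2_total
    print(" ".join(seq), f"B2={b2_rel:.2f}", f"C2={c2_rel:.2f}")
```

Comparing relative rather than raw frequencies matters here because the sub-corpora for each level will rarely be the same size.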

This is a preprint of

Lim, J., Mark, G., Pérez-Paredes, P. & O’Keeffe, A. (2024). Exploring Part of Speech (POS)-tag sequences in a large-scale learner corpus of L2 English: A developmental perspective. Corpora, 19(1).

Corpus linguistics and instructional needs

Tyler & Ortega (2018: 317):

Quite simply, corpora are the place to look for patterns of usage. Moreover, we believe that in usage-inspired instruction L2 targets should be taught not just because they can be taught – that is, because we have a good linguistic description or can create good materials – but because corpus linguistic investigations of learner language development show them to be actual areas of instructional need.

Tyler & Ortega (2018: 318):

The diversity of learning goals just acknowledged is salutary. But it also carries the danger of encouraging a certain bifurcation of usage-inspired L2 instruction into two separate streams, one that privileges implicit and incidental learning (i.e., absorbing new patterns of language without trying hard to learn them and without knowing they are being learned) and another that revalorizes explicit knowledge, explicit teaching, and explicit learning, thus going against the grain of suspicion over explicitness in much instructed SLA in the past. However, we do not see the explicit-implicit instructional continuum as a zero-sum game. Usage-based views of language development show that the bulk of language learning happens implicitly. But much of the fine-tuning also happens explicitly with the aid of top-down, conscious processing (Ellis, 2011, 2015). It follows that learning proceeds by dynamic interactions between implicit and explicit processing.
Thus, we argue that the full range of goals for learning needs to be addressed in instructional designs. Ideally, usage-inspired L2 instruction can vary so as to offer learners diverse benefits, including more fluent and more contextually effective language use (e.g., through close attention to meaningful input- and practice-driven implicit learning), greater metacognitive self-regulation for greater autonomy and life-long learning (e.g., through induction and deduction of new understandings of language during explicit, concept-guided, top-down learning), and heightened agency in making connections between language choices and social consequences so the latter can be empowering (e.g., through ethnographic and corpus analyses of one’s and others’ communicative repertoires that make the social consequences and their language reflexes conscious).

Tyler, A. E., & Ortega, L. (2018). Usage-inspired L2 instruction: Some reflections and a heuristic. In Tyler, A. E., Ortega, L., Uno, M., & Park, H. I. (Eds.), Usage-inspired L2 instruction: Researched pedagogy. Amsterdam: John Benjamins Publishing Company, 315-321.

#CFP Symposium on Corpus Approaches to Lexicogrammar Edge Hill University


The symposium will take place on Saturday 10 June 2017 at Edge Hill University.

The focus of the Symposium is the interaction of lexis and grammar. The focus is influenced by Halliday’s view of lexis and grammar as “complementary perspectives” (1991: 32), and his conception of the two as notional ends of a continuum (lexicogrammar), in that “if you interrogate the system grammatically you will get grammar-like answers and if you interrogate it lexically you get lexis-like answers” (1992: 64).

We welcome papers reporting on corpus-based studies which examine any aspect of the interaction of lexis and grammar, or discuss methodological issues related to the corpus-based study of lexicogrammar (e.g. annotation, metrics). We are particularly interested in studies that interrogate the system lexicogrammatically to get lexicogrammatical answers. The studies may …

focus more on the lexis or grammar end of the continuum, or adopt an integrative approach.
offer different interpretations of the nature of lexicogrammar.
examine any language, or compare different languages.
examine L1 and/or L2 use.
adopt a synchronic or diachronic approach.
operate within any theoretical approach that takes into account the interaction of lexis and grammar (e.g. Construction Grammar, Lexical Grammar, Pattern Grammar, Systemic Functional Grammar, Valency Grammar).
discuss the implications of a lexicogrammatical approach for applied linguistics (e.g. lexicography, language teaching, translation, (critical) discourse studies).
develop relevant research/teaching resources.


Presentations will be allocated 35 minutes (including 10 minutes for discussion). Please send an abstract of 500 words (excluding references) to Costas Gabrielatos (gabrielc@edgehill.ac.uk). Please make sure that the abstract clearly specifies the research questions or hypotheses, the corpus and methodology, and the main findings.

The deadline for abstract submission is 12 March 2017. Abstracts will be double-blind reviewed, and decisions will be communicated by 9 April 2017.

Programme Committee
Federica Barbieri (Swansea University)
Tine Breban (University of Manchester)
Kristin Davidse (University of Leuven)
Belen Diaz-Bedmar (University of Jaén)
Eva Duran Eppler (University of Roehampton)
Lise Fontaine (Cardiff University)
Gaëtanelle Gilquin (Université catholique de Louvain)
Nick Groom (University of Birmingham)
Glenn Hadikin (University of Portsmouth)
Andrew Hardie (Lancaster University)
Sebastian Hoffmann (University of Trier)
Andrew Kehoe (Birmingham City University)
Gabriel Ozon (University of Sheffield)
Michael Pace-Sigge (University of Eastern Finland)
Magali Paquot (Université catholique de Louvain)
Pascual Pérez-Paredes (University of Cambridge)
Paul Rayson (Lancaster University)
Ute Römer (Georgia State University)
James Thomas (Masaryk University)
María Sánchez-Tornel (University of Murcia)
Benet Vincent (Coventry University)
Stefanie Wulff (University of Florida)

Participation is free. Coffee/tea and a light buffet lunch will be provided, but participants are expected to cover their travel and accommodation costs. Please note that the number of places is limited, and places will be allocated on a first-come, first-served basis.

If you have any questions, please contact Costas Gabrielatos (gabrielc@edgehill.ac.uk).

URL: https://www.edgehill.ac.uk/english/research/conferences/lxgr2017/