Learner language research beyond contrastive interlanguage analysis: rethinking epistemology


Contrastive interlanguage analysis (CIA) has allowed researchers to tap into how language learners use their L2 or L3 by examining the frequency of discrete linguistic features. The rationale behind such analysis is that L1 groups of learners show distinctive distributional features that can help researchers understand, among other things, L1-L2 interfaces, general communication features in an L2, and language development at different competence levels. Arguably, CIA has attracted limited interest outside the corpus linguistics community, as SLA research and most language education theories have generally failed to appreciate the relevance of this type of research to their own debates about language learning (Gablasova, Brezina & McEnery, 2017).

I maintain that the overemphasis on the learner’s mother tongue, the factor most discussed in learner corpus research (Paquot & Granger, 2012), may have discouraged SLA researchers from using corpora and corpus-driven findings. In this sense, Myles (2015) has suggested that SLA research and SLA theories have “more sophisticated agendas”. I will discuss two research projects that combine CIA methods with other research methods. The first study (Pérez-Paredes & Díez-Bedmar, 2018) adopts a parallel sequential design in which two methods, POS keyness (Rayson, 2008, 2009) and automatic analysis of syntactic sophistication (Kyle, 2016), query the data independently. This study sets out to characterize the writing of young Spanish EFL learners in different instructed settings by looking at naturally occurring language use in a set of essays on the same topic. A subset of the International Corpus of Crosslinguistic Interlanguage (ICCI) (Tono & Díez-Bedmar, 2014) was used for the analysis. The second study (O’Keeffe, Pérez-Paredes & Mark, 2018) adopted Ellis, Römer & O’Donnell’s (2016) usage-based approach to language acquisition and examined the development of verb argument constructions (VACs) across EFL performance levels (A2, B2, C2) in the Cambridge Learner Corpus, a 55-million-word corpus of learner exam data drawn from over 200,000 exam scripts, across 200 countries, from candidates of over 140 first language backgrounds. The use of syntactic pattern analyses allowed the researchers both to examine units of analysis that go beyond isolated lexical items and to track how VACs evolve across language development.
In this talk, I will argue that learner corpus research needs to re-focus its epistemology and strengthen the use of what I call general corpus research methods. Traditional CIA-related findings and, in particular, an over-reliance on the analysis of errors or “non-native” speaker underperformance need to be re-examined so as to go beyond the limitations of CIA and contribute to the body of data of interest to SLA researchers outside the corpus linguistics community.
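
As an illustration of the POS keyness method mentioned above, the following is a minimal sketch, assuming that keyness is computed with the log-likelihood statistic commonly used in keyword analysis. The function names, the tag labels and the toy tag sequences are hypothetical and are not taken from the studies discussed; the sketch only shows how the relative frequency of each POS tag in a learner corpus might be compared against a reference corpus.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for one item.

    a, b: frequency of the item in the study and reference corpus.
    c, d: total number of tokens in the study and reference corpus.
    """
    if c <= 0 or d <= 0:
        raise ValueError("both corpora must contain at least one token")
    # Expected frequencies if the item were equally likely in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    # Terms with a zero observed frequency contribute nothing (0 * ln 0 = 0).
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def pos_keyness(study_tags, reference_tags):
    """Rank POS tags by keyness; each row is (tag, LL score, overused flag)."""
    study, ref = Counter(study_tags), Counter(reference_tags)
    c, d = sum(study.values()), sum(ref.values())
    rows = []
    for tag in set(study) | set(ref):
        a, b = study[tag], ref[tag]
        # Overused means relatively more frequent in the study corpus.
        overused = a / c > b / d
        rows.append((tag, log_likelihood(a, b, c, d), overused))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical tag sequences standing in for two tagged sub-corpora.
learner_tags = ["NN"] * 5 + ["PRP"] * 15
reference_tags = ["NN"] * 10 + ["PRP"] * 10
for tag, score, overused in pos_keyness(learner_tags, reference_tags):
    print(tag, round(score, 2), "overused" if overused else "underused")
```

A score on its own says nothing about direction, which is why the sketch reports over- or underuse separately; in practice a significance threshold and an effect size measure would normally accompany the ranking.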

References (a selection)

Cohen, L., Manion, L., & Morrison, K. (2002). Research methods in education. Routledge.

Ellis, N. C., Römer, U. & O’Donnell, M. B. (2016). Usage-based Approaches to Language Acquisition and Processing: Cognitive and Corpus Investigations of Construction Grammar. Language Learning Monograph Series. Wiley-Blackwell.

Gablasova, D., Brezina, V. & McEnery, T. (2017). Exploring learner language through corpora: comparing and interpreting corpus frequency information. Language Learning 67(S1):130-154.

Myles, F. (2015). Second language acquisition theory and learner corpus research. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge Handbook of Learner Corpus Research (Cambridge Handbooks in Language and Linguistics, pp. 309-332). Cambridge: Cambridge University Press.

O’Keeffe, A., Pérez-Paredes, P. & Mark, G. (2018). The English Grammar Profile: Investigating Patterns of Learner Grammar Development. Presentation at the American Association for Applied Linguistics 2018 Conference, Chicago, 24-27 March.

Kyle, K. (2016). Measuring syntactic development in L2 writing: Fine grained indices of syntactic complexity and usage-based indices of syntactic sophistication. PhD Dissertation, Georgia State University. URL: http://scholarworks.gsu.edu/alesl_diss/35/ (1 February, 2018)

Paquot, M., & Granger, S. (2012). Formulaic Language in Learner Corpora. Annual Review of Applied Linguistics, 32, 130-149.

Pérez-Paredes, P. & Díez-Bedmar, B. (2018). Researching learner language through POS Keyword analysis and syntactic complexity. In S. Götz & J. Mukherjee (Eds.), Learner Corpora and Language Teaching. Studies in Corpus Linguistics Series. Amsterdam: John Benjamins.

Rayson, P. (2008). From key words to key semantic domains. International Journal of Corpus Linguistics 13(4): 519-549.

Rayson, P. (2009). Wmatrix: a web-based corpus-processing environment, Computing Department, Lancaster University. URL: http://ucrel.lancs.ac.uk/wmatrix/ (1 February, 2018)

Tono, Y. & Díez-Bedmar, B. (2014). Focus on learner writing at the beginning and intermediate stages: The ICCI corpus. International Journal of Corpus Linguistics 19(2): 163-177.

Quote 1

Several reasons can be given for why elicitation techniques are favoured in SLA research. For instance, Mackey & Gass (2005) provide the following reasons why metalinguistic data may be used in SLA research, as opposed to natural language use data: (i) the particular structure you want to investigate may not occur in natural production: it may be absent or there may not be enough instances, and, conversely, (ii) to answer your research question you may need to know what learners rule out as a possible L2 sentence: (a) presence of a particular structure/feature in the learners’ natural output does not necessarily indicate that the learners know (i.e. have a mental representation of) the structure, and (b) absence of a particular structure/feature in natural language use data does not necessarily indicate that learners do not know the structure. An additional reason is provided by Granger (2002: 6): it is difficult to control the variables that affect learner production in a non-experimental context. Additionally, L2 researchers have been typically trained in (quasi)experimental methods rather than in corpus methods, except for those studies conducted with source data from CHILDES (see Myles 2007b: 386 for a discussion). The consequence of all this is that the empirical base of SLA research tends to be relatively narrow, based on the language produced by a very limited number of subjects, which, as pointed out by Granger (2002: 6), raises questions about whether results can be generalised. But the methodological future of SLA looks promising, since some researchers are currently claiming that combining both naturalistic and experimental data is crucial to gain insight into the relation between the two types of data (e.g. Gilquin & Gries 2009).

Lozano, C., & Mendikoetxea, A. (2013). Learner corpora and Second Language Acquisition: The design and collection of CEDEL2. In A. Díaz-Negrillo, N. Ballier & P. Thompson (Eds.), Automatic Treatment and Analysis of Learner Corpus Data. Amsterdam: John Benjamins, pp. 65-100.

Quote 2

We as linguists should train ourselves specifically to be open to the evidence of long text. This is quite different from using the computer to be our servant in trying out our ideas; it is making good use of some essential differences between computers and people.

[…] I believe that we have to cultivate a new relationship between the ideas we have and the evidence that is in front of us. We are so used to interpreting very scant evidence that we are not in a good mental state to appreciate the opposite situation. With the new evidence the main difficulty is controlling and organizing it rather than getting it.

Sinclair, J. (2004: 17). Trust the Text.

Quote 3

The second set of assumptions identified by Burrell and Morgan (1979) are of an epistemological kind. These concern the very bases of knowledge – its nature and forms, how it can be acquired, and how communicated to other human beings. How one aligns oneself in this particular debate profoundly affects how one will go about uncovering knowledge of social behaviour. The view that knowledge is hard, objective and tangible will demand of researchers an observer role, together with an allegiance to the methods of natural science; to see knowledge as personal, subjective and unique, however, imposes on researchers an involvement with their subjects and a rejection of the ways of the natural scientist. To subscribe to the former is to be positivist; to the latter, anti-positivist.

Cohen et al. (6th edition)