Incorporating corpora in teaching symposium, Mittuniversitetet, Sweden

Check out the programme here.

Rethinking learning in DDL

Plenary abstract

It is now 11 years since Johansson (2009:41) claimed that more systematic studies are needed in order to test the benefits of DDL and to discuss ‘students’ problems with corpus investigation’. The meta-analyses carried out by Boulton & Cobb (2017) and Lee, Warschauer & Lee (2018) have yielded robust results on the benefits of DDL. Boulton & Cobb found average effect sizes of 1.50 for pre/posttest designs and 0.95 for control/experimental designs. Lee, Warschauer & Lee (2018) reported a medium-sized effect of corpus use on vocabulary learning. Overall, these are medium or large effect sizes that support the use of DDL for language learning. Johansson’s second claim, on students’ problems with corpus investigation, presents researchers with different angles to exploit, and can be studied using different approaches and foci. Chambers (2005:111) called for further research that looks at ‘the integration of corpora and concordancing in the language-learning environment’, and Pérez-Paredes et al. (2012:499-500) noted that few studies have examined the interaction of learners with corpora and, by extension, how learning happens. That study problematized the role of learners as researchers, as it found that only a small percentage of the searches were either effective or showed the minimum level of sophistication needed to carry out the task at hand.

DDL research has paid considerable attention to measuring the outcomes of learning, but perhaps not so much to what learning takes place and how. Lee, Warschauer & Lee (2018) noted that using ‘pre-selected, comprehensible concordance lines appears more effective in supporting their corpus-based activities’, and so does learner-friendly concordancer software specifically designed for L2 learning, something that had been pointed out in Pérez-Paredes (2010). An emphasis on technology and tools is manifest. In this talk I will examine one of those angles that need further attention: learning. I will draw on the systematic review of the uses and spread of data-driven learning (DDL) and corpora in language learning and teaching across five major CALL-related journals during the 2011–2015 period in Pérez-Paredes (2019). This review examined 32 research papers published in high-ranking CALL journals and concluded that the normalization (Bax, 2003) of corpus use in language education has only taken place in a limited number of contexts, mostly in Higher Education, where language teachers and DDL researchers take on overlapping roles. This finding echoes Boulton’s (2017) call to widen the scope of our research to a more comprehensive spectrum of learning contexts. I will outline a taxonomy of language learning focus, learners’ abilities and processes advocated in the body of research examined that can enhance our understanding of the role played by learning in future data-driven language learning theory or theories.

This talk seeks to contribute to previous work that has tried to bridge the gap between research and practice (Chambers, 2019). I will argue that DDL researchers need to move away from a technology-oriented DDL (Godwin-Jones, 2017) and pursue efforts that widen our understanding of the contributions of DDL to SLA, the contributions of SLA to DDL, and the in-depth analysis of the role of DDL in the broader language learning context, including the use of cognitive strategies (Lee, Warschauer & Lee, 2020).

Body of research examined

Pérez-Paredes, P. (2019). A systematic review of the uses and spread of corpora and data-driven learning in CALL research during 2011–2015. Computer Assisted Language Learning.

Learning targets identified (ID provided); see file.

Form awareness, morphology, error awareness, register awareness

Clause patterns / patterns




Dictionary skills

Corpus use

Linking adverbials

Parallel concordances 3

Compilation of a corpus 4

Searching frequency 5

Reading 13

Vocabulary acquisition 14

Idioms 18

Authorial stance 23

Abstract nouns 27

Passives 28

POS 30

A taxonomy of language learning targets

An agenda for a learning-driven DDL

Long quote 1: researching noticing

From Gutiérrez, A., Leder, G. C., & Boero, P. (2016). The Second Handbook of Research on the Psychology of Mathematics Education : The Journey Continues. Brill | Sense.

Noticing and Representing Pattern Structure

Pattern activities have been considered to be one of the main ways for introducing students to algebra (e.g., Ainley, Wilson, & Bills, 2003; Mason, 1996). From this perspective, algebra is about generalizing (Radford, 2006). Previous research has evidenced that visual approaches generated in tasks involving the generalization of geometric figures and numeric sequences can provide strong support for the development of algebraic expressions, variables, and the conceptual framework for functions (Healy & Hoyles, 1999). However, not all activities lead to algebraic thinking. For example, placing the emphasis on the construction of tables of values from pattern sequences can result in the development of closed-form formulas, formulas that students cannot relate to the actual physical situation from which the pattern and tables of values have been generated (e.g., Amit & Neria, 2008; Hino, 2011; Warren, 2005). This impacts on students’ ability to identify the range of equivalent expressions that can be represented by the physical situation. The patterns utilised in the 2005–2015 research encompassed both linear and quadratic functions that were represented as a string of visual figures or numbers. The activities students engaged in involved searching for the relationship between the discernable related units of the pattern (commonly called terms), and the terms’ position in the pattern. These reflect the types of activities predominantly used in current curricula to introduce young adolescent students to the notion of a variable and equivalence.

Students noticing and representing the pattern structure.

Fundamental to patterning activities is the search for mathematical regularities and structures. In this search, Rivera (2013) suggests that students are required to coordinate two abilities, their perceptual ability and their symbolic inferential ability. This coordination involves firstly noticing the commonalities in some given terms, and secondly forming a general concept by noticing the commonality to all terms (Radford, 2006; Rivera, 2013). Finally, students are required to construct and justify their inferred algebraic structure that explains a replicable regularity that could be conveyed as a formula (Rivera, 2013). At this stage the focus is no longer on the terms themselves but rather on the relations across and among them (Kaput, 1995).

Difficulties students experience in noticing pattern structure.

Emerging from the findings of this current research is that while young students are capable of noticing pattern structure and engaging in pattern generalization, they exhibit many of the difficulties found in past research with older students. As revealed in the findings of this research: young students have difficulties moving from one representational system to another such as from the figures themselves to an algebraic form that conveys the relationships between the figures (Becker & Rivera, 2007); students tend to be answer driven as they search for pattern structure (Ma, 2007); they engage in single variational thinking or recursive thinking (Becker & Rivera, 2008; Warren, 2005); they fail to understand algebraic formula (Warren, 2006; Radford, 2006); and, they have difficulties expressing the structure in everyday language (Warren, 2005). In addition, initial representations of the pattern (e.g., pictorial, verbal and symbolic) can influence students’ performance. This is particularly evident as 139 10 and 11 year-old students explored more complex patterns (Stalo, Elia, Gagatsis, Teoklitou, & Savva, 2006), with pictorial representations of patterns proving easier for students to predict terms in further positions and articulate the generalization.

Capabilities that assist students to notice structure.

Adding to the research is a delineation of the types of capabilities that assist young students to reach generalizations. The ability to see the invariant relationship between the figural cues is paramount to success (Becker & Rivera, 2006; Stalo et al., 2006). The development of specific language that assists students to describe the pattern (e.g., position, ordinal language, rows) (Warren, Miller, & Cooper, 2011; Warren, 2006) and fluency with using variables (Becker & Rivera, 2006) help students to express and justify their generalization. In addition, Becker and Rivera (2006) found that students who had facility with both figural ability and variable fluency were more capable of noticing the structure, and developing and justifying generalizations. By contrast, students who fail to generalize tend to begin with numerical strategies (e.g., guess and check) as they search for generalizations and lack the flexibility to try other approaches (Becker & Rivera, 2005). This has implications for the types of instructional practices that occur in classroom contexts. It is suggested that instruction that includes verbal, figural and numerical representations of patterns, and emphasises the connections among these representations assists students to reach generalizations (Becker & Rivera, 2006). An ability to think multiplicatively has also been shown to assist students generalize figural representations of linear patterns (Rivera, 2013).

Theories pertaining to noticing structure and reaching generalisations.

Results from Radford’s longitudinal study of 120 8th grade (typically 13–14 year-olds) students over a three year period delineated three types of generalization that emerged from the exploration of figural pattern tasks: factual; contextual; and symbolic (Radford, 2006). The first structural layer is factual: ‘it does not go beyond particular figures, like Figure 1000’. The generalization remains bound at the numerical level. Expressing a generalization as factual does not necessary mean that that is the extent of student’s capability. It may simply be that this level can answer the question posed by others or the context in which algebra is needed (Lozano, 2008). The second layer is contextual; ‘they are contextual in that they refer to contextual embodied objects’ and use language such as the figure and the next figure. Finally, symbolic generalization involves expressing a generalization through alphanumeric symbols. The suggested criteria that can be used to assist teachers to distinguish these levels of early algebraic reasoning are: the presence of entities which have the character of generality; the type of language used; and, the treatment that is applied to these objects based on the application of structural properties (Aké, Godino, Gonzato, & Wilhelmi, 2013). The latter refers to how students express this generality. Aké et al. (2013) suggest that algebraic practice involves two crucial aspects, namely, being able to use literal symbols as a general expression and relate this expression to the visual context from which it is derived. In addition, with growing patterns gesturing between the variables (e.g., pattern term, pattern quantity) in conjunction with having iconic signs to represent both variables (e.g., counters for pattern term and cards for pattern quantity) helped 7–9 year old Indigenous students to identify the pattern structure (Miller & Warren, 2015).

Some references

Boulton, A. (2017). Corpora in language teaching and learning. Language Teaching, 50(4), 483-506.

Boulton, A., & Cobb, T. (2017). Corpus Use in Language Learning: A Meta‐Analysis. Language Learning, 67(2), 348-393.

Chambers, A. (2005). Integrating corpus consultation in language studies. Language Learning & Technology 9(2): 111–125.

Chambers, A. (2019). Towards the corpus revolution? Bridging the research–practice gap. Language Teaching, 52(4), 460-475.

Crossley, S., Kyle, K., & Salsbury, T. (2016). A Usage‐Based Investigation of L2 Lexical Acquisition: The Role of Input and Output. Modern Language Journal, 100(3), 702-715.

Flowerdew, L. (2009). Applying corpus linguistics to pedagogy: A critical evaluation. International Journal of Corpus Linguistics, 14(3), 393-417.

Gillespie, J. (2020). CALL research: Where are we now? ReCALL, 32(2), 127-144. doi:10.1017/S0958344020000051

Godwin-Jones, R. (2017). Data-informed language learning. Language Learning & Technology, 21(3), 9–27.

Gutiérrez, A., Leder, G. C., & Boero, P. (2016). The Second Handbook of Research on the Psychology of Mathematics Education: The Journey Continues. Brill | Sense.

Johansson, S. (2009). Some thoughts on corpora and second-language acquisition. In Aijmer, K. (Ed.). Corpora and language teaching. John Benjamins Publishing, 33-44.

Lee, H., Warschauer, M., & Lee, J. H. (2018). The Effects of Corpus Use on Second Language Vocabulary Learning: A Multilevel Meta-analysis. Applied Linguistics.

Lee, H., Warschauer, M., & Lee, J. H. (2020). Toward the Establishment of a Data‐Driven Learning Model: Role of Learner Factors in Corpus‐Based Second Language Vocabulary Learning. The Modern Language Journal.

Leung, C., & Scarino, A. (2016). Reconceptualizing the nature of goals and outcomes in language/s education. The Modern Language Journal, 100(S1), 81-95.

Long, M. H. (1991). Focus on form: A design feature in language teaching methodology. Foreign language research in cross-cultural perspective, 2(1), 39-52.

Ortega, L. (2014). Ways forward for a bi/multilingual turn in SLA. In S. May (Ed.), The multilingual turn: Implications for SLA, TESOL, and bilingual education (pp. 32-53). New York: Routledge.

Pérez-Paredes, P. (2010). Corpus linguistics and language education in perspective: Appropriation and the possibilities scenario. In Harris, T., & Jaén, M. M. (Eds.). Corpus linguistics in language teaching. Peter Lang, 53-73.

Pérez-Paredes, P., Sánchez-Tornel, M., Alcaraz Calero, J. M., & Aguado Jiménez, P. (2011). Tracking Learners’ Actual Uses of Corpora: Guided vs Non-guided Corpus Consultation. Computer Assisted Language Learning, 24(3), 233-253.

Pérez-Paredes, P. (2019). A systematic review of the uses and spread of corpora and data-driven learning in CALL research during 2011–2015. Computer Assisted Language Learning.

Pérez-Paredes, P., Sánchez-Tornel, M., & Alcaraz Calero, J. M. (2012). Learners’ search patterns during corpus-based focus-on-form activities: A study on hands-on concordancing. International Journal of Corpus Linguistics, 17(4), 482-515.

Some references on Usage-based language learning approaches

Ellis, N. (2017). Chapter 6: Chunking in Language Usage, Learning and Change: I Don’t Know. In M. Hundt, S. Mollin, & S. E. Pfenninger (Eds.), Part III: Chunking. Cambridge University Press, 113-147.

Ellis, N. (2017). Cognition, Corpora, and Computing: Triangulating Research in Usage‐Based Language Learning. Language Learning, 67(S1), 40-65.

Ellis, N. C., & Ferreira-Junior, F. (2009). Construction Learning as a Function of Frequency, Frequency Distribution, and Function. Modern Language Journal, 93(3), 370-385.

Tyler, A. (2010). Usage-Based Approaches to Language and Their Applications to Second Language Learning. Annual Review of Applied Linguistics, 30, 270-291.

Tyler, A., & Ortega, L. (2016). Usage-based approaches to language and language learning: An introduction to the special issue. 8(3), 335-345.

Tyler, A. (2018). Nick C. Ellis, Ute Römer, Matthew Brook O’Donnell: Usage-based approaches to language acquisition and processing: Cognitive and corpus investigations of construction grammar. Cognitive Linguistics, 29(1), 155-161.

Weber, K., Christiansen, M. H., Indefrey, P., & Hagoort, P. (2018). Primed From the Start: Syntactic Priming During the First Days of Language Learning. Language Learning.

5 recent papers on Data-driven learning


Requested by one of my students, a selection of 5 recent papers on Data-driven learning and the use of corpora in language education.


Ballance, O. J. (2017). Pedagogical models of concordance use: correlations between concordance user preferences. Computer Assisted Language Learning, 30(3-4), 259-283.

Boulton, A. (2017). Corpora in language teaching and learning. Language Teaching, 50(4), 483-506.

Boulton, A., & Cobb, T. (2017). Corpus Use in Language Learning: A Meta‐Analysis. Language Learning, 67(2), 348-393.

Godwin-Jones, R. (2017). Data-informed language learning. Language Learning & Technology, 21(3), 9–27.

Lee, H., Warschauer, M., & Lee, J. H. (2018). The Effects of Corpus Use on Second Language Vocabulary Learning: A Multilevel Meta-analysis. Applied Linguistics.




DDL studies based in China HE #AAAL2018


Xiaoya Sun

Investigating the Effectiveness of a Data-driven Learning (DDL) intervention in an EFL Academic Writing Class

Tue, March 27, 1:50 to 2:20pm, Sheraton Grand Chicago, Arkansas Room
Session Submission Type: Paper



The past few decades have witnessed the emergence and development of corpus linguistics “as a powerful methodology-technology” (Lee & Swales, 2006, p. 57) with considerable potential for linguistic research and language pedagogy. In language teaching and learning, the growing applications of corpus linguistics are greatly expanding our pedagogical options and resources (Conrad, 2000; Vyatkina, 2016), as corpora provide rich language samples for teachers to develop authentic instructional materials and classroom activities (Yoon & Hirvela, 2004), and for learners to form and test their hypotheses about patterns of language use (Leech, 1997). However, corpora and corpus tools have not yet “made major inroads into language classrooms” (Yoon, 2011, p. 138), especially in EFL/ESL contexts, and the effectiveness of data-driven learning (DDL) in these contexts has not been firmly established.

This presentation reports on an experimental study that set out to investigate the effectiveness of a DDL intervention in an EFL university classroom, in comparison with a traditional teacher-directed approach, in raising learners’ awareness of hedging in English academic writing and improving their use of hedges. The study adopted a pretest-posttest-delayed test randomized control group design. Treatment for the experimental group involved hands-on experience with two carefully chosen, purpose-built online corpora, while that for the control group consisted of traditional lectures featuring dictionary work and passage-based exercises. Statistical analyses of the two groups’ performances on the three tests have yielded empirical evidence of both the affordances and limitations of the DDL activities. In addition, a questionnaire survey conducted after the intervention has received generally positive feedback from the experimental group participants towards the incorporation of corpora in classroom teaching. These findings are interpreted and discussed in terms of DDL learning principles. The presentation concludes with suggestions for future DDL applications and research in EFL teaching contexts.

A group of 24 students studying translation

Condition 1 vs Condition 2

3 writing tests + questionnaire survey on effectiveness of instructional sessions

Four 2-hour instructional sessions for each treatment condition in 3 days

Delayed post-test 2 weeks after completion

MICUSP corpus

ICNALE online: Asian learners of English

Group 1 compares hedging in MICUSP and ICNALE

Group 2 stays with MICUSP and their own writing

Hedging was quantified in terms of frequency and variation

DDL somewhat effective

Hands-on DDL less effective
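
The notes above say that hedging was quantified in terms of frequency and variation. The talk did not spell out the procedure, so the following is only a minimal sketch of one way such measures could be computed: frequency as hedges per 1,000 words and variation as the number of distinct hedge types. The hedge list, the function name `hedging_profile` and the sample sentence are all hypothetical and are not taken from the study.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny hedge list for illustration only;
# the study's actual inventory of hedges is not given in these notes.
HEDGES = ("may", "might", "could", "perhaps", "possibly",
          "seem", "seems", "suggest", "suggests", "likely")


def hedging_profile(text, hedges=HEDGES):
    """Return simple frequency and variation measures for hedges in a text."""
    hedge_set = set(hedges)
    # Naive tokenisation: lower-case alphabetic strings (apostrophes allowed).
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(token for token in tokens if token in hedge_set)
    total = sum(counts.values())
    # Frequency is normalised per 1,000 words so texts of different
    # lengths can be compared; an empty text yields 0.0.
    per_1000 = 1000 * total / len(tokens) if tokens else 0.0
    return {
        "tokens": len(tokens),
        "hedges": total,
        "per_1000_words": per_1000,
        "types": len(counts),  # variation: number of distinct hedges used
        "counts": dict(counts),
    }


sample = ("These results may suggest that learners might benefit. "
          "It seems likely that this may help.")
profile = hedging_profile(sample)
print(profile["hedges"], profile["types"], profile["per_1000_words"])
```

A single-word list like this misses multi-word hedges (for example "it is possible that") and cannot tell hedging uses of "may" or "could" from other uses, so a real analysis would need a fuller inventory and manual checking of concordance lines.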


Tanjun Liu

Evaluating the Effect of Data-driven Learning (DDL) on the Acquisition of Academic Collocations by Advanced Chinese Learners of English
Tue, March 27, 2:25 to 2:55pm, Sheraton Grand Chicago, Arkansas Room
Session Submission Type: Paper


Collocations, prefabricated multi-word combinations, are considered to be a crucial component of language competence, which indicates the central role they should play in language teaching and learning. However, collocations remain a challenge to L2 learners at different proficiency levels, and pose a particular difficulty for Chinese learners of English. Collocations have so far attracted only limited attention in the Chinese language teaching classroom. This study, therefore, focuses on the effectiveness of the teaching of academic collocations to advanced Chinese learners of English, using a specific pedagogical approach to teaching collocations, the corpus-based data-driven learning approach (DDL). DDL has been argued to offer an effective teaching method in language learning. However, large-scale quantitative studies that evaluate the effectiveness and assess the benefits of DDL in the acquisition of academic collocations, in comparison with a different method of teaching collocations, are limited in number.

This study, therefore, uses data from 120 Chinese students of English from a Chinese university and employs a quasi-experimental method, using a pre-test and post-test (including delayed test) control-group research design to compare the outcomes of using DDL and an online dictionary in teaching academic collocations to advanced Chinese learners of English. The experimental group uses #Lancsbox (Brezina, McEnery & Wattam, 2015), an innovative and user-friendly corpus tool. By comparison, the control group uses the online version of the Oxford Collocations Dictionary. The results are analysed for the differences in collocation gains within and between the two groups. Those quantitative data are supported by findings from semi-structured interviews linking learners’ results with their attitudes towards DDL. The findings contribute to our understanding of the effectiveness of DDL for teaching academic collocations and suggest that the incorporation of technology into language learning can enhance collocation knowledge.

3 groups (ca. 40 students each)

Used the Oxford Collocations Dictionary in one of the groups

Treatment: 10 weeks

Post test and delayed post-test (2 months later)

Survey + semi-structured interview

This presentation focused on the survey results and the perceptions of the learners

Positive attitudes

Rezaee et al. (2015): make students more collocation-wise