Requested by one of my students: a selection of five recent papers on data-driven learning (DDL) and the use of corpora in language education.


Ballance, O. J. (2017). Pedagogical models of concordance use: correlations between concordance user preferences. Computer Assisted Language Learning, 30(3-4), 259-283.

Boulton, A. (2017). Corpora in language teaching and learning. Language Teaching, 50(4), 483-506.

Boulton, A., & Cobb, T. (2017). Corpus Use in Language Learning: A Meta‐Analysis. Language Learning, 67(2), 348-393.

Godwin-Jones, R. (2017). Data-informed language learning. Language Learning & Technology, 21(3), 9–27.

Lee, H., Warschauer, M., & Lee, J. H. (2018). The Effects of Corpus Use on Second Language Vocabulary Learning: A Multilevel Meta-analysis. Applied Linguistics.




DDL studies based in China HE #AAAL2018


Xiaoya Sun

Investigating the Effectiveness of a Data-driven Learning (DDL) intervention in an EFL Academic Writing Class

Tue, March 27, 1:50 to 2:20pm, Sheraton Grand Chicago, Arkansas Room
Session Submission Type: Paper



The past few decades have witnessed the emergence and development of corpus linguistics “as a powerful methodology-technology” (Lee & Swales, 2006, p. 57) with considerable potential for linguistic research and language pedagogy. In language teaching and learning, the growing applications of corpus linguistics are greatly expanding our pedagogical options and resources (Conrad, 2000; Vyatkina, 2016), as corpora provide rich language samples for teachers to develop authentic instructional materials and classroom activities (Yoon & Hirvela, 2004), and for learners to form and test their hypotheses about patterns of language use (Leech, 1997). However, corpora and corpus tools have not yet “made major inroads into language classrooms” (Yoon, 2011, p. 138), especially in EFL/ESL contexts, and the effectiveness of data-driven learning (DDL) in these contexts has not been firmly established.

This presentation reports on an experimental study that set out to investigate the effectiveness of a DDL intervention in an EFL university classroom, in comparison with a traditional teacher-directed approach, in raising learners’ awareness of hedging in English academic writing and improving their use of hedges. The study adopted a pretest-posttest-delayed test randomized control group design. Treatment for the experimental group involved hands-on experience with two carefully chosen, purpose-built online corpora, while that for the control group consisted of traditional lectures featuring dictionary work and passage-based exercises. Statistical analyses of the two groups’ performances on the three tests have yielded empirical evidence of both the affordances and limitations of the DDL activities. In addition, a questionnaire survey conducted after the intervention has received generally positive feedback from the experimental group participants towards the incorporation of corpora in classroom teaching. These findings are interpreted and discussed in terms of DDL learning principles. The presentation concludes with suggestions for future DDL applications and research in EFL teaching contexts.

A group of 24 students studying translation

Condition 1 vs Condition 2

3 writing tests + questionnaire survey on effectiveness of instructional sessions

Four 2-hour instructional sessions for each treatment condition over 3 days

Delayed post-test 2 weeks after completion

MICUSP corpus

ICNALE online: Asian learners of English

Group 1 compares hedging in MICUSP and ICNALE

Group 2 stays with MICUSP and their own writing

Hedging was quantified in terms of frequency and variation

DDL somewhat effective

Hands-on DDL less effective
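The notes above say hedging was quantified in terms of frequency and variation, without giving the details. As a rough illustration only, here is a minimal Python sketch of how such measures might be computed for one piece of writing: frequency as hedges per 1,000 words, variation as the number of distinct hedge forms. The hedge list, the function name hedge_profile and the tokenisation are all assumptions made for this example, not the inventory or procedure used in the study.

```python
import re
from collections import Counter

# Hypothetical hedge inventory, for illustration only; the study's own list is not given.
HEDGES = ("may", "might", "could", "perhaps", "possibly",
          "seem", "seems", "suggest", "suggests", "likely")


def hedge_profile(text, hedges=HEDGES):
    """Return simple frequency and variation measures for hedges in a text."""
    # Crude tokenisation: lower-case runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t in hedges)
    total = sum(counts.values())
    # Frequency is normalised per 1,000 words so texts of different lengths compare.
    per_1000 = 1000 * total / len(tokens) if tokens else 0.0
    return {
        "tokens": len(tokens),
        "hedges": total,
        "per_1000_words": per_1000,
        # Variation here counts distinct word forms, so "seem" and "seems" differ.
        "types": len(counts),
        "counts": dict(counts),
    }


if __name__ == "__main__":
    sample = ("This may suggest that the effect is likely small. "
              "It might seem odd, but it may hold.")
    print(hedge_profile(sample))
```

A real analysis would need a principled hedge inventory and manual checking, since words such as "may" or "could" are not always used as hedges.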


Tanjun Liu

Evaluating the Effect of Data-driven Learning (DDL) on the Acquisition of Academic Collocations by Advanced Chinese Learners of English
Tue, March 27, 2:25 to 2:55pm, Sheraton Grand Chicago, Arkansas Room
Session Submission Type: Paper


Collocations, prefabricated multi-word combinations, are considered a crucial component of language competence, which indicates the central role they should play in language teaching and learning. However, collocations remain a challenge for L2 learners at different proficiency levels, and pose particular difficulty for Chinese learners of English. Collocations have so far attracted only limited attention in the Chinese language teaching classroom. This study, therefore, focuses on the effectiveness of teaching academic collocations to advanced Chinese learners of English using a specific pedagogical approach, corpus-based data-driven learning (DDL). DDL has been argued to offer an effective teaching method in language learning. However, large-scale quantitative studies that evaluate the effectiveness and assess the benefits of DDL in the acquisition of academic collocations, in comparison with other methods of teaching collocations, are limited in number.

This study, therefore, uses data from 120 Chinese students of English from a Chinese university and employs a quasi-experimental method, using a pre-test-and-post-test (including delayed test) control-group research design to compare the effects of DDL and online dictionary use in teaching academic collocations to advanced Chinese learners of English. The experimental group uses #LancsBox (Brezina, McEnery & Wattam, 2015), an innovative and user-friendly corpus tool. By comparison, the control group uses the online version of the Oxford Collocations Dictionary. The results are analysed for the differences in collocation gains within and between the two groups. These quantitative data are supported by findings from semi-structured interviews linking learners’ results with their attitudes towards DDL. The findings contribute to our understanding of the effectiveness of DDL for teaching academic collocations and suggest that the incorporation of technology into language learning can enhance collocation knowledge.

3 groups (ca. 40 students each)

One of the groups used the Oxford Collocations Dictionary

Treatment: 10 weeks

Post-test and delayed post-test (2 months later)

Survey + semi-structured interview

This presentation focused on the survey results and the perceptions of the learners

Positive attitudes

Rezaee et al. (2015): make students more collocation-wise




#CFP #deadline extended: edited book focusing on the use of corpora for data-driven learning with young learners


Call for chapters for an edited book focusing on the use of corpora for data-driven learning (DDL, Johns, 1991) with young (i.e. pre-tertiary) learners.

DDL, despite being a feature of corpora and language learning research for some time, has really taken off as a viable methodological approach in the last decade due to innovations in corpus query interfaces, data visualisation, open access and improved internet access/speed. However, for a number of reasons including access, resources and difficulties in convincing those outside academia of the value of DDL, the majority of studies on DDL are conducted with tertiary or adult learners, leaving DDL for younger learners (those in pre-school, primary, or secondary education) as a relatively underexplored area in the literature.

With this in mind, chapter proposals are invited that explore the use of DDL with younger learners. Studies dealing with DDL for first or second language acquisition, genre and register learning/teaching and DDL for the teaching/learning of subjects other than languages are particularly welcome. The corpora involved in any of these studies can be spoken, written or multimodal. Chapters may be empirical studies of corpus use and its effects on learning, studies that explore the perceptions of corpus use by younger learners/teachers of younger learners, or studies that make a novel contribution to theory or methodology, such as new software or corpora that deal specifically with younger learners, and new approaches in training teachers / students of younger learners in DDL techniques.

Final chapters will be approximately 6000-7500 words. Chapter proposals of 400-500 words are due mid-April 2018. The edited volume is to be published in 2019 by Routledge, part of the Taylor and Francis publishing group.

Please feel free to signal your interest or discuss your ideas by contacting the editor at p.cros@uq.edu.au.

Acceptance notification in April, with final submissions due December 2018/January 2019.

Please contact:

Dr. Peter Crosthwaite, Ph.D., FHEA
Lecturer, School of Languages and Cultures,
University of Queensland

Free copy of our latest paper in Computer Assisted Language Learning

Our article, “Language teachers’ perceptions on the use of OER language processing technologies in MALL”, has just been published in Computer Assisted Language Learning (Taylor & Francis Online).

Combined with the ubiquity and constant connectivity of mobile devices, and with innovative approaches such as Data-Driven Learning (DDL), Natural Language Processing Technologies (NLPTs) as Open Educational Resources (OERs) could become a powerful tool for language learning as they promote individual and personalized learning. Using a questionnaire that was answered by language teachers (n = 230) in Spain and the UK, this research explores the extent to which OER NLPTs are currently known and used in adult foreign language learning. Our results suggest that teachers’ familiarity with and use of OER NLPTs are very low. Although online dictionaries, collocation dictionaries and spell checkers are widely known, NLPTs appear to be generally underused in foreign language teaching. It was found that teachers prefer computer-based environments over mobile devices such as smartphones and tablets and that teachers’ qualification determines their familiarity with a wider range of OER NLPTs. This research offers insight into future applications of Language Processing Technologies as OERs in language learning.

KEYWORDS: Language learning, teachers’ perceptions, OER, MALL, natural language processing technologies, higher education

Language learning theories underpinning corpus-based pedagogy #cl2015


Lynne Flowerdew
Language learning theories underpinning corpus-based pedagogy

The noticing hypothesis (Schmidt)

Attention consciously drawn

Noticing linked to frequency counts

Implicit vs explicit learning

Constructivist learning

Learners engage in discovery learning

Inductive learning

Cognitive skills, problem solving to understand new data

Widmann et al. (2011): the more possible starting points for exploitation, the more likely it is that different learners will find a suitable one (SACODEYL project).

Sociocultural theory

What about language learning outside the classroom and incidental learning?