DDL studies based in China HE #AAAL2018


Xiaoya Sun

Investigating the Effectiveness of a Data-driven Learning (DDL) intervention in an EFL Academic Writing Class

Tue, March 27, 1:50 to 2:20pm, Sheraton Grand Chicago, Arkansas Room
Session Submission Type: Paper



The past few decades have witnessed the emergence and development of corpus linguistics “as a powerful methodology-technology” (Lee & Swales, 2006, p. 57) with considerable potential for linguistic research and language pedagogy. In language teaching and learning, the growing applications of corpus linguistics are greatly expanding our pedagogical options and resources (Conrad, 2000; Vyatkina, 2016), as corpora provide rich language samples for teachers to develop authentic instructional materials and classroom activities (Yoon & Hirvela, 2004), and for learners to form and test their hypotheses about patterns of language use (Leech, 1997). However, corpora and corpus tools have not yet “made major inroads into language classrooms” (Yoon, 2011, p. 138), especially in EFL/ESL contexts, and the effectiveness of data-driven learning (DDL) in these contexts has not been firmly established.

This presentation reports on an experimental study that set out to investigate the effectiveness of a DDL intervention in an EFL university classroom, in comparison with a traditional teacher-directed approach, in raising learners’ awareness of hedging in English academic writing and improving their use of hedges. The study adopted a pretest-posttest-delayed test randomized control group design. Treatment for the experimental group involved hands-on experience with two carefully chosen, purpose-built online corpora, while that for the control group consisted of traditional lectures featuring dictionary work and passage-based exercises. Statistical analyses of the two groups’ performances on the three tests have yielded empirical evidence of both the affordances and limitations of the DDL activities. In addition, a questionnaire survey conducted after the intervention has received generally positive feedback from the experimental group participants towards the incorporation of corpora in classroom teaching. These findings are interpreted and discussed in terms of DDL learning principles. The presentation concludes with suggestions for future DDL applications and research in EFL teaching contexts.

A group of 24 students studying translation

Condition 1 vs Condition 2

3 writing tests + questionnaire survey on effectiveness of instructional sessions

Four 2-hour instructional sessions for each treatment condition over 3 days

Delayed post-test 2 weeks after completion

MICUSP corpus

ICNALE online: Asian learners of English

Group 1 compares hedging in MICUSP and ICNALE

Group 2 stays with MICUSP and their own writing

Hedging was quantified in terms of frequency and variation
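
As a rough illustration of what quantifying hedging "in terms of frequency and variation" could look like, here is a minimal Python sketch. It assumes frequency means hedge tokens per 1,000 words and variation means the number of distinct hedge types; the hedge list and the function name `hedging_profile` are hypothetical and are not taken from the study.

```python
import re
from collections import Counter

# Hypothetical hedge list for illustration only; the study's actual
# inventory of hedging devices is not given in these notes.
HEDGES = {"may", "might", "could", "perhaps", "possibly", "seem", "seems",
          "suggest", "suggests", "likely"}

def hedging_profile(text, per=1000):
    """Return (hedges per `per` words, number of distinct hedge types)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t in HEDGES)
    total = sum(counts.values())
    freq = total / len(tokens) * per if tokens else 0.0
    return freq, len(counts)

sample = "This may suggest that the effect is perhaps smaller. It may seem odd."
freq, variation = hedging_profile(sample)
print(f"{freq:.1f} hedges per 1,000 words, {variation} distinct types")
```

A real analysis would need a principled hedge inventory and manual checking, since words such as "could" are not always used as hedges.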

DDL somewhat effective

Hands on DDL less effective


Tanjun Liu

Evaluating the Effect of Data-driven Learning (DDL) on the Acquisition of Academic Collocations by Advanced Chinese Learners of English
Tue, March 27, 2:25 to 2:55pm, Sheraton Grand Chicago, Arkansas Room
Session Submission Type: Paper


Collocations, prefabricated multi-word combinations, are considered to be a crucial component of language competence, which indicates the central role they should play in language teaching and learning. However, collocations remain a challenge to L2 learners at different proficiency levels, and a particular difficulty for Chinese learners of English. Collocations have so far attracted only limited attention in the Chinese language teaching classroom. This study, therefore, focuses on the effectiveness of the teaching of academic collocations to advanced Chinese learners of English, using a specific pedagogical approach to teaching collocations, the corpus-based data-driven learning approach (DDL). DDL has been argued to offer an effective teaching method in language learning. However, large-scale quantitative studies that evaluate the effectiveness and assess the benefits of DDL in the acquisition of academic collocations, compared with other methods of teaching collocations, remain limited in number.

This study, therefore, uses data from 120 Chinese students of English from a Chinese university and employs a quasi-experimental method, using a pre-test-and-post-test (including delayed test) control-group research design to compare the outcomes of using DDL and an online dictionary in teaching academic collocations to advanced Chinese learners of English. The experimental group uses #LancsBox (Brezina, McEnery & Wattam, 2015), an innovative and user-friendly corpus tool. By comparison, the control group uses the online version of the Oxford Collocations Dictionary. The results are analysed for the differences in collocation gains within and between the two groups. These quantitative data are supported by findings from semi-structured interviews linking learners’ results with their attitudes towards DDL. The findings contribute to our understanding of the effectiveness of DDL for teaching academic collocations and suggest that the incorporation of technology into language learning can enhance collocation knowledge.

3 groups (ca. 40 ss each)

Used the Oxford Collocations Dictionary in one of the groups

Treatment: 10 weeks

Post test and delayed post-test (2 months later)

Survey + semi-structured interview

This presentation focused on the survey results and the perceptions of the learners

Positive attitudes

Rezaee et al 2015: make students more collocation wise




Some notes #AAAL2018 colloquium on constructions in Applied Linguistics



Constructions in Applied Linguistics: Innovation and Application of Corpus-based Construction Grammar

Sun, March 25, 8:00 to 11:15am, Sheraton Grand Chicago, Colorado Room

Ute Roemer, Georgia State University
This paper presents findings from a large-scale corpus study on the development of verb patterns in second language (L2) learners of English. It follows the lead of existing usage-based studies of L2 construction acquisition while considerably expanding their scope to hundreds of constructions and over 700,000 verb tokens. Using methods from Corpus Linguistics and Natural Language Processing, the study focuses on verb-argument constructions (VACs, e.g. the ‘V n n’ or ditransitive construction) and addresses the following research questions:
1. What are the first VACs acquired by beginning L2 learners of English?
2. How does the VAC repertoire of learners develop across proficiency levels?
3. How does the distribution of verbs in VACs in learner production develop across proficiency levels?
4. What role do formulaic sequences play in the L2 acquisition of VACs?
To address these questions, data on verbs and the constructions they occur in was exhaustively extracted from a dependency-parsed cross-sectional corpus of L2 writing. The corpus is a 6-million word subset of EFCAMDAT, the Education First-Cambridge Open Language Database, consisting of over 68,000 texts produced by L1 German and L1 Spanish learners at CEFR levels A1 through C1. Using a customized Python script, we generated frequency-sorted VAC and verb-VAC lists for each level and L1 (e.g., German_A1). We also extracted recurring multi-word clusters (spans 3, 4, and 5) around the 50 most frequent verbs in EFCAMDAT, together with information on frequency and cluster association strength (Mutual Information).
We will share selected results on verb construction development across learner proficiency levels. We expect to find an increase in VAC types, growth in VAC productivity and complexity, and a development from predominantly fixed sequences to more flexible and productive ones. The resulting findings help to expand our understanding of the processes that underlie construction acquisition in an L2 context.
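
To make the "frequency-sorted VAC and verb-VAC lists for each level and L1" step more concrete, here is a minimal Python sketch. It is not the authors' script: it assumes the dependency parsing has already been done and that each verb token arrives as a hypothetical record of (L1, CEFR level, verb lemma, VAC pattern). The toy records and the function name `vac_lists` are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made records: (L1, CEFR level, verb lemma, VAC pattern).
# In the study these would come from a dependency-parsed learner corpus.
records = [
    ("German", "A1", "give", "V n n"),
    ("German", "A1", "go", "V to n"),
    ("German", "A1", "give", "V n n"),
    ("German", "A1", "send", "V n n"),
    ("Spanish", "B1", "talk", "V about n"),
]

def vac_lists(records):
    """Build frequency-sorted VAC and verb-VAC lists keyed by 'L1_level'."""
    vac_counts = defaultdict(Counter)
    verb_vac_counts = defaultdict(Counter)
    for l1, level, verb, vac in records:
        key = f"{l1}_{level}"  # e.g. "German_A1", as in the abstract
        vac_counts[key][vac] += 1
        verb_vac_counts[key][(verb, vac)] += 1
    vacs = {k: c.most_common() for k, c in vac_counts.items()}
    verb_vacs = {k: c.most_common() for k, c in verb_vac_counts.items()}
    return vacs, verb_vacs

vacs, verb_vacs = vac_lists(records)
print(vacs["German_A1"])       # VAC types sorted by token frequency
print(verb_vacs["German_A1"])  # (verb, VAC) pairs sorted by frequency
```

Comparing the length of each VAC list across levels would give one simple view of how the VAC repertoire grows with proficiency.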


Nicholas Groom, University of Birmingham, UK


Construction grammar is most strongly associated with cognitive linguistic theory and with research into language acquisition. In this paper, however, I demonstrate that construction grammar offers equally exciting opportunities to more socioculturally-oriented researchers, particularly those whose work focuses on identifying and analysing the meanings and values associated with particular discourse communities.

The potential power of construction-based approaches to sociocultural analysis was first demonstrated by Wulff et al (2007), who identified statistically significant differences in the ‘into-causative’ construction in American and British English. The paper asks why Wulff et al’s call for further research along similar lines has gone largely unheeded. It is proposed that a key reason may be that most current construction-based approaches are deductive in nature (i.e. the researcher decides which construction(s) to study), whereas socioculturally-oriented research is often exploratory in nature and thus more suited to inductive methods (i.e. where the aim is to discover which constructions are associated with a particular language variety or discourse community). The paper then proposes an adaptation of closed-class keywords analysis as a viable methodology for the inductive analysis of variety/discourse-specific constructions in large corpora. The remainder of the paper will provide a practical illustration of this approach, showing how corpus-based construction grammar can yield new insights into the relationship between phraseology (defined as preferred ways of saying) and epistemology (defined as preferred ways of knowing) in the specialized discourses of academic disciplines. The main empirical focus of the paper will be on a newly identified construction, the ‘WAY IN WHICH’ construction (as in This may have affected the way in which religious ideas were disseminated), and will draw on examples of this construction as it occurs in a large-scale corpus-based analysis of professional academic writing in the disciplinary discourses of history and literary criticism.

Closed-class keyword analysis (Groom 2010): the use of closed-class key words yields constructions of interest to the researcher
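
A minimal Python sketch of the general idea behind closed-class keyword analysis: compute keyness only for closed-class words and use the top-ranked ones as entry points for finding constructions. The notes do not say which keyness statistic was used, so the log-likelihood statistic here is an assumption, and the closed-class word list and the two toy corpora are hypothetical.

```python
import math
from collections import Counter

# Hypothetical, very partial closed-class word list for illustration.
CLOSED_CLASS = {"of", "in", "which", "the", "to", "that", "by", "as", "between"}

def log_likelihood(a, b, c, d):
    """Log-likelihood keyness: a, b = word frequency in target and
    reference corpus; c, d = total tokens in target and reference."""
    e1 = c * (a + b) / (c + d)  # expected frequency in target
    e2 = d * (a + b) / (c + d)  # expected frequency in reference
    g2 = 0.0
    if a:
        g2 += a * math.log(a / e1)
    if b:
        g2 += b * math.log(b / e2)
    return 2 * g2

def closed_class_keywords(target_tokens, reference_tokens):
    """Rank closed-class words that are relatively more frequent in target."""
    t, r = Counter(target_tokens), Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scored = []
    for w in sorted(CLOSED_CLASS):
        a, b = t[w], r[w]
        if a and a / c > b / d:
            scored.append((w, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda x: (-x[1], x[0]))

target = "the way in which ideas were disseminated in which way".split()
reference = "the cat sat on the mat by the door of the house".split()
keywords = closed_class_keywords(target, reference)
print(keywords)
```

In practice the next step would be to inspect concordance lines for each key closed-class word (for example "which") to see which recurrent sequences, such as "the way in which", account for its keyness.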

Phraseology can be repositioned as a discoursal rather than a lexicogrammatical phenomenon.

Florent Perek, University of Birmingham, UK
Amanda Patten


Identifying units of language that unite lexis and grammar as well as form and meaning offers substantial opportunities for resources for language learners. One such would be a ‘constructicon’: a listing of constructions adjusted to learner proficiency level. This paper argues for the use of two existing corpus-based descriptions of English that could be combined to form a constructicon: the grammar patterns identified as part of the COBUILD dictionary project, and the frames identified in the FrameNet project.

Grammar patterns focus on the complementation patterns commonly used with verbs, nouns and adjectives. FrameNet focuses on the roles associated with semantic frames. Although they differ in their scope and approach, both FrameNet and grammar patterns provide valency information as part of their output, but so far no attempt has been made to systematically compare and match this information. The present study fills this gap, focusing in particular on verbs. All argument realization information of verbs was extracted from FrameNet, resulting in a list of triplets of verb, semantic frame, and syntactic pattern. The FrameNet patterns were matched to the verb patterns listed in Francis et al. (1996) and the level of agreement quantified.

The paper demonstrates the use of FrameNet frames to add a semantic dimension to grammar patterns. Conversely, some verb classes defined by their occurrence with grammar patterns can help highlight relations between frames that are not recorded in FrameNet. We argue that matching FrameNet and grammar patterns can build a database of constructions, since semantically coherent sets of frames paired with the syntactic realization of frame elements qualify as form-meaning pairs. This would complement the FrameNet Constructicon project (Fillmore et al. 2012) and make it more useful for learners, by focusing on frequent constructions rather than on idiosyncratic ones.

The percentage of COBUILD patterns found in FrameNet is below 50%.
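
The matching step described in the abstract could be sketched as follows. This is a toy illustration, not the authors' procedure: it assumes the FrameNet valency information has already been converted into COBUILD-style pattern notation, and both data sets below are small hypothetical samples. The function name `match_patterns` is also invented.

```python
# Hypothetical triplets of (verb, semantic frame, syntactic pattern),
# assumed to be already normalised to COBUILD-style notation.
framenet_triplets = [
    ("give", "Giving", "V n n"),
    ("give", "Giving", "V n to n"),
    ("blame", "Judgment", "V n for n"),
]

# Hypothetical (verb, pattern) pairs standing in for the COBUILD listing.
cobuild_patterns = {
    ("give", "V n n"),
    ("give", "V n to n"),
    ("give", "V n"),
    ("blame", "V n for n"),
    ("blame", "V n on n"),
    ("pay", "V n for n"),
}

def match_patterns(triplets, cobuild):
    """Attach frames to COBUILD (verb, pattern) pairs and report coverage."""
    frames_by_pair = {}
    for verb, frame, pattern in triplets:
        frames_by_pair.setdefault((verb, pattern), set()).add(frame)
    matched = {pair: frames_by_pair[pair]
               for pair in cobuild if pair in frames_by_pair}
    coverage = len(matched) / len(cobuild) if cobuild else 0.0
    return matched, coverage

matched, coverage = match_patterns(framenet_triplets, cobuild_patterns)
print(f"{coverage:.0%} of COBUILD verb-pattern pairs have a FrameNet match")
```

Each matched pair now carries one or more frames, which is the sense in which frames add a semantic dimension to a grammar pattern.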


Stefan Th. Gries, U. California, Santa Barbara


Fifteen years ago, Stefanowitsch and Gries introduced methods of measuring the co-occurrence of lexis and construction, identifying what are now called collostructions. Typically, these measures are based on a comparison of (i) the observed co-occurrence frequency of words with other words or constructions and (ii) the corresponding co-occurrence frequencies one would expect from a chance distribution. Examples of such measures include the log-likelihood ratio, MI, or the well-known chi-squared test. Researchers have thus studied the degree to which lexical items ‘like’ to occur in a specific construction (e.g., give, tell, show can be shown to be strongly attracted to the ditransitive construction) or which of two or more functionally similar constructions lexical items prefer (e.g., the will-future is associated more with low-dynamicity and general actions such as see, find, or know whereas the going-to future is associated more with dynamic and specific actions such as do, happen, or go). The observation of such preferred co-occurrences promises considerable improvements to information made available to language learners and teachers, as well as potentially modelling language acquisition processes.

While this approach has been very widely used and quite successful, its reliance on traditional association measures presents problems. First, these measures, unlike learning and comprehension, are bidirectional. Second, they confuse the potentially different effects of frequency and association. Third, the dispersion of co-occurrences in the corpus is neglected, with the risk that seemingly high frequencies of underdispersed expressions skew the results.

In this paper, I outline remedies to these problems. I exemplify how unidirectional association measures (Delta P and the KL-divergence) better identify collostructions. I also include measures of dispersion of the collostructions, where corpus parts can be defined in terms of files or of (sub)registers. I exemplify the results for both rarer and more frequent constructions, specifically the ditransitive, the passive, and the will-future.
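
To illustrate the contrast between a bidirectional measure and a unidirectional one, here is a minimal Python sketch that computes pointwise MI and Delta P in both directions from a 2x2 co-occurrence table. The cell counts are invented, and the cell layout (a = word in construction, b = word elsewhere, c = other words in construction, d = other words elsewhere) is an assumption about how the table is set up.

```python
import math

def delta_p(a, b, c, d):
    """Return (Delta P of construction given word,
               Delta P of word given construction)."""
    cx_given_word = a / (a + b) - c / (c + d)
    word_given_cx = a / (a + c) - b / (b + d)
    return cx_given_word, word_given_cx

def pointwise_mi(a, b, c, d):
    """Pointwise mutual information (log base 2); one symmetric score."""
    n = a + b + c + d
    return math.log2(a * n / ((a + b) * (a + c)))

# Hypothetical counts for one verb and one construction.
a, b, c, d = 80, 20, 120, 9780

cx_given_word, word_given_cx = delta_p(a, b, c, d)
mi = pointwise_mi(a, b, c, d)
print(f"Delta P (construction | word): {cx_given_word:.3f}")
print(f"Delta P (word | construction): {word_given_cx:.3f}")
print(f"MI: {mi:.3f}")
```

With these made-up counts the word predicts the construction much better than the construction predicts the word, a difference that the single MI score cannot express.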

Towards a tuple-lization of corpus linguistics.

Frequency + association + dispersion
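
A minimal Python sketch of the dispersion component. The notes do not name the dispersion measure used in the talk; this sketch assumes Gries's DP (deviation of proportions), which compares the share of occurrences in each corpus part with the share of the corpus that part makes up. The part sizes and frequencies are hypothetical.

```python
def dp(part_freqs, part_sizes):
    """Deviation of proportions: 0 means perfectly even dispersion,
    values close to 1 mean the item is concentrated in few parts."""
    f = sum(part_freqs)
    n = sum(part_sizes)
    if f == 0 or n == 0:
        raise ValueError("need at least one occurrence and a non-empty corpus")
    return 0.5 * sum(abs(v / f - s / n)
                     for v, s in zip(part_freqs, part_sizes))

# Hypothetical corpus with four equally sized parts (files or registers).
sizes = [1000, 1000, 1000, 1000]
even = [5, 5, 5, 5]      # same total frequency, spread evenly
clumped = [20, 0, 0, 0]  # same total frequency, all in one part

dp_even = dp(even, sizes)
dp_clumped = dp(clumped, sizes)
print(dp_even, dp_clumped)
```

Both toy items have the same frequency of 20, so only the dispersion value separates them; reporting each item as a (frequency, association, dispersion) tuple keeps the three dimensions apart instead of folding them into one score.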


Prof Hunston:

The speaker’s internal model of language is driven by usage

Language is highly patterned

Units of meaning

Textual colligation / Grammar pattern / Local Grammar


-A unified theory of language

-Examples are easy to find but difficult to systematise

-Can the identification of constructions be systematised?

Applications: Measuring learner progress and others

Are patterns and constructions interchangeable? Hunston doesn’t think so. One pattern may represent different constructions because different meanings are possible, e.g. V n for n

www.collinsdictionary.com with the original grammar patterns will be online this week





EGP: investigating patterns of learner grammar development AAAL 2018 Chicago


The English Grammar Profile: investigating patterns of learner grammar development

Anne O’Keeffe, Mary Immaculate College, University of Limerick

Geraldine Mark, Mary Immaculate College, University of Limerick

Pascual Pérez-Paredes, University of Cambridge

Check out our handout here.


The CEFR: http://www.cambridgeenglish.org/exams-and-tests/cefr/

The English Grammar Profile: http://www.englishprofile.org/english-grammar-profile/egp-online

Cambridge Learner Corpus: https://www.sketchengine.co.uk/cambridge-learner-corpus/

Sketch Engine universal POS tags https://www.sketchengine.co.uk/universal-pos-tags/



Ellis, N. C. (2003). “Constructions, chunking, and connectionism: The emergence of second language structure”. In C. Doughty & M. H. Long (Eds.), Handbook of Second Language Acquisition (pp. 33–68). Oxford, UK: Blackwell.

Ellis, N. C. (2012). “Formulaic language and second language acquisition: Zipf and the phrasal teddy bear”. Annual Review of Applied Linguistics, 32, 17-44.

Simpson-Vlach, R., & Ellis, N. C. (2010). “An Academic Formulas List (AFL)”. Applied Linguistics, 31, 487–512.

Ellis, N. C., Römer, U. & O’Donnell, M. B. (2016). Usage-based Approaches to Language Acquisition and Processing: Cognitive and Corpus Investigations of Construction Grammar. Language Learning Monograph Series. Wiley-Blackwell.

Larsen-Freeman, D. (2006). “The emergence of complexity, fluency, and accuracy in the oral and written production of five Chinese learners of English”. Applied Linguistics, 27(4), 590–619.

Milton, J., & Meara, P. (1995). “How periods abroad affect vocabulary growth in a foreign language”. ITL Review of Applied Linguistics, (107–08), 17–34.

O’Keeffe, A., & Mark, G. (2017). “The English Grammar Profile of learner competence: Methodology and key findings”. International Journal of Corpus Linguistics, 22(4), 457-489. https://benjamins.com/#catalog/journals/ijcl.14086.oke/fulltext

Römer, U., O’Donnell, M. B., & Ellis, N. C. (2014). “Second language learner knowledge of verb–argument constructions: Effects of language transfer and typology”. The Modern Language Journal, 98(4), 952-975.

Thewissen, J. (2013). “Capturing L2 accuracy developmental patterns: Insights from an error-tagged learner corpus”. The Modern Language Journal, 97(S1), 77–101.