Corpus Linguistics #cl2015: notes and pics

Corpus Linguistics Conference 2015, University of Lancaster, UK

Thanks to @TonyMcEnery, @HardieResearch and everybody at @UCREL_Lancaster for organizing a wonderful conference.

Abstract book download:


A selection of talks and personal notes:

Learner corpus research plenary #cl2015

Multi-dimensional analysis of oral proficiency interviews #cl2015

Non-obvious meaning in CL and CADS #cl2015

Representation of benefit claimants in UK media #cl2015

Tono Linguistic feature extraction #cefr #cl2015

Language learning theories underpinning corpus-based pedagogy #cl2015

MA of L2 learner English

And some pics:

 

IMG_20150730_215443

Robert Poole (left)

IMG_20150730_215522

Ricardo Jiménez

IMG_20150730_215840

 

IMG_20150722_210050

Carlos Ordoñana (left)

IMG-20150723-WA0015

Lynne Flowerdew

IMG-20150723-WA0006

Carlos Ordoñana (left) and Yukio Tono (right)

IMG-20150724-WA0009

Discussing the representation of immigrants in the context of the LADEX project.

IMG-20150724-WA0010

Discussing the representation of immigrants in the context of the LADEX project.

IMG-20150723-WA0009

Yolanda Noguera and John Flowerdew

IMG-20150723-WA0011

Yukio Tono (middle)

IMG_20150730_215950

Yolanda Noguera and Michael Barlow

 

 

Multi-dimensional analysis of oral proficiency interviews #cl2015

 

IMG_20150723_113417

Shelley Staples; Jesse Egbert; Geoff LaFlair

A multi-dimensional comparison of oral proficiency interviews to conversation, academic and professional spoken registers

MELAB: Michigan English Language Assessment Battery; 989 OPIs in 2013

OPI used for academic and professional purposes

Only the first 5 minutes were transcribed

55 linguistic features

TagCount

Factor analysis (FA)

6-factor solution

Dimensions interpreted functionally

Dimension scores

Differences across registers (ANOVAs and post hocs)
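
The dimension-score step listed above can be sketched in a few lines. This is only a minimal sketch, assuming the usual multi-dimensional procedure: each feature rate is standardised across texts, and a text's score on a dimension is the sum of its z-scores for the positively loading features minus those for the negatively loading ones. The feature names follow Dimension 4 in these notes; the rates, the function names and the choice of population standard deviation are assumptions made for illustration, not the authors' data or code.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardise a list of rates to mean 0 and standard deviation 1."""
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd if sd else 0.0 for v in values]


def dimension_scores(rates, positive, negative=()):
    """Sum z-scores of positive features and subtract those of negative ones.

    rates maps a feature name to a list of normalised rates, one per text.
    """
    z = {feature: zscores(values) for feature, values in rates.items()}
    n_texts = len(next(iter(rates.values())))
    return [
        sum(z[f][i] for f in positive) - sum(z[f][i] for f in negative)
        for i in range(n_texts)
    ]


# Invented rates for three hypothetical texts (not data from the talk).
rates = {
    "word_length": [4.0, 4.5, 5.0],
    "prepositions": [80.0, 100.0, 120.0],
    "all_pronouns": [150.0, 100.0, 50.0],
}

scores = dimension_scores(
    rates,
    positive=["word_length", "prepositions"],
    negative=["all_pronouns"],
)
print(scores)
```

The per-text scores would then be grouped by register and compared, which is where the ANOVAs and post hoc tests come in.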

 

6 dimensions

1. Explicit stance: private verbs, that-deletion, lower rates of implicit stance than the Longman corpus

3. Speaker-centered informational vs listener-centered involvement: pro1, subject-conj.causative, nn, amplifiers,

4. Extended informational discourse: word length, prep, jj atr, that rel, negative features: all pronouns

6. Implicit stance: higher rates of implicit stance than the Longman corpus

 

 

Non-obvious meaning in CL and CADS #cl2015

IMG_20150723_100414

Plenary session: Alan Partington
Non-obvious meaning in CL and CADS: from ‘hindsight post-dictability’ to sweet serendipity

Chair: Amanda Potts

http://www3.lingue.unibo.it/blog/clb/

Introspection & intuition

Processes of inference from the linguistic trace left by speakers/writers

Shared meaning

Idiom principle

Complexity of common grammatical items

Colligation: every word primed to occur in or avoid certain grammatical positions and functions (Hoey, 2005: 13)

SiBol (Siena-Bologna) corpus of newspapers, judicial inquiries, press briefings.

Rapid language change

Corpus methodology is useful in detecting absence, not only presence

Language looks rather different when you look at a lot of it at once (Sinclair 1991)

Qualitative: anaphoric, historic, past behaviour

Quantitative: anaphoric and cataphoric; enough data with which to infer

If primed >> psychologically fixed >> reproduced

Evaluation as prototypicality: inner circle obvious, outer circle non-obvious

Prosody can depend on grammar (Louw 1993), point of view, literal vs figurative use and on field of register

Embedding is an important factor to interpret prosody

The added value of CL in discourse studies

Looking at language at different levels of abstraction: overview & close reading

Data are not sacred

Much of textual meaning is accretional

Positive cherry-picking: find counter examples

Almost all explanation in DA is informed speculation: in human science this is the closest you get to explanation

Moral panics have evolved over the years (globesity in 2015)

 

 

 

Representation of benefit claimants in UK media #cl2015

 

Ben Clarke
The ideological representation of benefit claimants in UK print media

2010 – 2014

2.3 M corpus

benefits claimant(s) search criteria

Adjectival constructions

Adjective lemmas are ranked

hard: rank 40

tough: rank 53

enTenTen13 score

‘Tough on’ is significant in the corpus

Tough patterns

Benefit claimants: scroungers

tougher conditions, curbs on

Prepositions and ideology: ‘on’ here introduces a Goal participant role (PR) in a Material process type (PT) (impacted/affected entity)
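
The notes above mention ranked adjective lemmas and an enTenTen13 score but do not say how that score was calculated. One common way to compare a focus corpus with a reference corpus is the "simple maths" keyness ratio used for keywords in Sketch Engine: frequency per million in the focus corpus plus a smoothing constant, divided by the same quantity for the reference corpus. The sketch below assumes that method. All counts and the reference corpus size are invented, the 2.3 M focus size is assumed to be in words, and the function names are hypothetical.

```python
def per_million(count, corpus_size):
    """Normalise a raw frequency to occurrences per million tokens."""
    return count * 1_000_000 / corpus_size


def simple_maths(focus_count, focus_size, ref_count, ref_size, n=1.0):
    """Keyness ratio: (focus per million + n) / (reference per million + n)."""
    focus_pm = per_million(focus_count, focus_size)
    ref_pm = per_million(ref_count, ref_size)
    return (focus_pm + n) / (ref_pm + n)


FOCUS_SIZE = 2_300_000   # assumed to be the 2.3 M corpus, counted in words
REF_SIZE = 10_000_000    # hypothetical reference corpus size

# lemma -> (count in focus corpus, count in reference corpus); invented numbers
counts = {
    "tough": (230, 500),
    "hard": (460, 4000),
}

ranked = sorted(
    ((lemma, simple_maths(f, FOCUS_SIZE, r, REF_SIZE)) for lemma, (f, r) in counts.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for lemma, score in ranked:
    print(f"{lemma}\t{score:.2f}")
```

A ratio above 1 means the lemma is relatively more frequent in the focus corpus than in the reference corpus. Raising n shifts the ranking towards more frequent words.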

 

Tono Linguistic feature extraction #cefr #cl2015

Yukio Tono

Linguistic feature extraction and evaluation using machine learning to identify “criterial” grammar constructions for the CEFR levels

IMG_20150722_160026

 

L2 learner profile

English Profile – CEFR for English

Criterial features: Hawkins & Filipovic 2012

CEFR-J RLD Project: the aim is to prepare a list of vocabulary and grammar items to be taught and assessed at each CEFR level

CEFR Coursebook Corpus

IMG_20150722_160504

Weka format (version 3.6.12)

158 features

Attribute selection
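
The notes do not record which attribute evaluator was used. Information gain is one of the evaluators Weka offers, so the sketch below assumes it: each feature is scored by how much knowing its value reduces uncertainty about the CEFR level, and the features are then ranked by that score. The feature names, values and levels are invented for illustration, and the code is plain Python rather than Weka itself.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the entropy left after splitting on a feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def rank_attributes(features, labels):
    """Return (feature, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain(vals, labels) for name, vals in features.items()}
    return sorted(gains.items(), key=lambda pair: pair[1], reverse=True)


# Four hypothetical texts with invented CEFR levels and feature values.
levels = ["A1", "A1", "B1", "B1"]
features = {
    "relative_clause": ["no", "no", "yes", "yes"],
    "plural_noun": ["yes", "no", "yes", "no"],
}

ranking = rank_attributes(features, levels)
for name, gain in ranking:
    print(f"{name}\t{gain:.3f}")
```

In this toy data the presence of a relative clause separates the two levels perfectly, while plural nouns carry no information about level. With 158 real features, a ranking of this kind would point to candidate "criterial" constructions.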