Non-obvious meaning in CL and CADS #cl2015


Plenary session: Alan Partington
Non-obvious meaning in CL and CADS: from ‘hindsight post-dictability’ to sweet serendipity

Chair: Amanda Potts

http://www3.lingue.unibo.it/blog/clb/

Introspection & intuition

Processes of inference from the linguistic trace left by speakers/writers

Shared meaning

Idiom principle

Complexity of common grammatical items

Colligation: every word primed to occur in or avoid certain grammatical positions and functions (Hoey, 2005: 13)

SiBol (Siena-Bologna) corpus of newspapers, judicial inquiries, press briefings.

Rapid language change

Corpus methodology is useful in detecting absence, not only presence

Language looks rather different when you look at a lot of it at once (Sinclair 1991)

Qualitative: anaphoric, historic, past behaviour

Quantitative: anaphoric and cataphoric; enough data from which to infer

If primed >> psychologically fixed >> reproduced

Evaluation as prototypicality: inner circle obvious, outer circle non-obvious

Prosody can depend on grammar (Louw 1993), point of view, literal vs figurative use, and the field or register

Embedding is an important factor in interpreting prosody

The added value of CL in discourse studies

Looking at language at different levels of abstraction: overview & close reading

Data are not sacred

Much of textual meaning is accretional

Positive cherry-picking: look for counterexamples

Almost all explanation in DA is informed speculation: in the human sciences this is the closest you get to explanation

Moral panics have evolved over the years (globesity in 2015)

 

 

 

A linguistic taxonomy of registers on the searchable web #cl2015

 


Doug Biber; Jesse Egbert; Mark Davies
Panel: A linguistic taxonomy of registers on the searchable web: Distribution, linguistic descriptions, and automatic register identification

Abstract book, pp. 52-54

Doug Biber

The oral-literate dimensions and the narrative dimension recur in all multi-dimensional analyses (MDA), across languages and registers

Oral-literate dimensions

3 dimensions here

Pronouns and questions, verbs, and dependent clauses are crucial markers of interactivity

These analyses show that there are major linguistic differences among the eight major user-defined register categories.

Can we automatically identify web registers?

Starting point: 150+ linguistic features as predictors

90% of the documents served as the training corpus and 10% as the test corpus

Each document was assigned to a single category

Stepwise discriminant analysis to select the strongest predictive features

10-feature model 0.34 precision

44-feature model 0.44 precision
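The evaluation set-up described above (a 90/10 train/test split, one register per document, results reported as precision) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the panel's code: the register labels and predictions are hypothetical, the stepwise discriminant analysis itself is not reproduced, and precision is computed per predicted register and then macro-averaged, which may differ from how the reported figures were obtained.

```python
import random
from collections import Counter


def split_90_10(docs, seed=0):
    """Shuffle the documents and split them into 90% training, 10% test."""
    docs = list(docs)
    random.Random(seed).shuffle(docs)
    cut = len(docs) * 9 // 10  # integer arithmetic avoids float rounding
    return docs[:cut], docs[cut:]


def precision_by_register(gold, predicted):
    """For each predicted register: correct predictions / all predictions of it."""
    predicted_counts = Counter(predicted)
    correct_counts = Counter(p for g, p in zip(gold, predicted) if g == p)
    return {reg: correct_counts[reg] / n for reg, n in predicted_counts.items()}


def macro_precision(gold, predicted):
    """Unweighted mean of the per-register precision scores."""
    scores = precision_by_register(gold, predicted)
    return sum(scores.values()) / len(scores)


# Hypothetical gold labels and classifier output for four test documents.
gold = ["narrative", "narrative", "opinion", "how-to"]
predicted = ["narrative", "opinion", "opinion", "narrative"]
print(precision_by_register(gold, predicted))  # {'narrative': 0.5, 'opinion': 0.5}
print(macro_precision(gold, predicted))        # 0.5
```

Macro-averaging gives every register equal weight, so small registers count as much as large ones; a micro-averaged figure would instead be dominated by the most frequent registers.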

 

 

Representation of benefit claimants in UK media #cl2015

 

Ben Clarke
The ideological representation of benefit claimants in UK print media

2010 – 2014

2.3M-word corpus

Search criteria: “benefits claimant(s)”

Adjectival constructions

Adjective lemmas are ranked

hard number 40

tough number 53

enTenTen13 score

“Tough on” is significant in the corpus
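The notes do not record which statistic lay behind the enTenTen13 comparison. One common way to test whether a word or phrase is significantly more frequent in a study corpus than in a reference corpus is Dunning's log-likelihood (G2), sketched below as an assumption rather than as the talk's method. The counts and the reference corpus size are hypothetical.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood (G2) for one item in a study corpus vs a reference corpus.

    Expected frequencies assume the item is equally frequent (relative to
    corpus size) in both corpora; G2 grows as the observed counts diverge.
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # 0 * log(0) is taken as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical counts: a phrase 120 times in a 2.3M-word study corpus
# vs 5,000 times in a 1-billion-word reference corpus.
g2 = log_likelihood(120, 2_300_000, 5_000, 1_000_000_000)
print(round(g2, 2), "significant at p < 0.05" if g2 > 3.84 else "not significant")
```

A G2 above 3.84 corresponds to p < 0.05 with one degree of freedom; G2 says nothing about the direction of the difference, so the relative frequencies still need to be compared.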

Tough patterns

Benefit claimants: scroungers

tougher conditions, curbs on

Prepositions and ideology: “on” here introduces the Goal participant role (PR) in a Material process type (PT), i.e. the impacted/affected entity

 

Tono Linguistic feature extraction #cefr #cl2015

Yukio Tono

Linguistic feature extraction and evaluation using machine learning to identify “criterial” grammar constructions for the CEFR levels


 

L2 learner profile

English Profile – CEFR for English

Criterial features: Hawkins & Filipovic 2012

CEFR-J RLD Project: the aim is to prepare a list of vocabulary and grammar items to be taught and assessed at each CEFR level

CEFR Coursebook Corpus


Data in Weka format (Weka 3.6.12)

158 features

Attribute selection
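The notes do not say which attribute evaluator was used. One criterion Weka offers for attribute selection is information gain (InfoGainAttributeEval combined with the Ranker search). The sketch below computes the same idea in plain Python, assuming discrete feature values; the feature names and CEFR labels are hypothetical, and Weka itself would discretise numeric features first.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy once split by the feature."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(features, labels):
    """Rank features from most to least informative about the labels."""
    scores = {name: information_gain(vals, labels) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: four learner texts, two binary features, CEFR labels.
labels = ["A1", "A1", "B1", "B1"]
features = {
    "uses_past_tense": [1, 0, 1, 0],
    "uses_relative_clause": [0, 0, 1, 1],
}
for name, score in rank_features(features, labels):
    print(f"{name}: {score:.3f}")
```

Here “uses_relative_clause” separates the two levels perfectly and gets the maximum gain, while “uses_past_tense” tells us nothing about the level and scores zero.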

 

 

 

 

 

 

Language learning theories underpinning corpus-based pedagogy #cl2015

 

Lynne Flowerdew
Language learning theories underpinning corpus-based pedagogy

The noticing hypothesis (Schmidt)

Attention consciously drawn

Noticing linked to frequency counts

Implicit vs explicit learning

Constructivist learning

Learners engage in discovery learning

Inductive learning

Cognitive skills, problem solving to understand new data
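In corpus-based pedagogy, discovery learning typically means learners inspecting concordance lines and working out patterns for themselves. A minimal key-word-in-context (KWIC) concordancer is sketched below as an illustration; the tokenisation regex and the sample text are simplifying assumptions, not a tool mentioned in the talk.

```python
import re


def kwic(text, node, width=4):
    """Return (left context, node word, right context) for each hit of `node`."""
    # Naive tokenisation: words (with an optional apostrophe part) or punctuation.
    tokens = re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


sample = "Make a decision. They made a decision quickly."
for left, node, right in kwic(sample, "decision", width=2):
    print(f"{left:>20} [{node}] {right}")
```

Aligning the node word in a central column is what lets learners spot recurring left and right neighbours (here, make/made a ___) without being told the rule.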

Widmann et al. 2011 (SACODEYL project): the more possible starting points for exploitation a resource offers, the more likely it is to suit different learners

Sociocultural theory

What about language learning outside the classroom and incidental learning?

 

Learner corpus research plenary #cl2015

Learner corpus research: a fast-growing interdisciplinary field

Sylviane Granger


 

LCR IS an interdisciplinary research field

Design: learner and task variables to control

Not only English language

Method: Contrastive Interlanguage Analysis (CIA; Granger, 1996) and computer-aided error analysis
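At its simplest, contrastive interlanguage analysis compares how often a feature occurs in learner writing and in comparable native-speaker writing, after normalising for corpus size. The sketch below shows only that first step, with hypothetical counts; labelling a difference as over- or underuse on the raw ratio alone is a simplification, since actual CIA studies also apply a significance test.

```python
import math


def per_10k(freq, corpus_size):
    """Normalise a raw frequency to occurrences per 10,000 words."""
    return freq * 10_000 / corpus_size


def compare_usage(freq_learner, size_learner, freq_native, size_native):
    """Compare normalised frequencies in a learner and a native reference corpus."""
    learner_rate = per_10k(freq_learner, size_learner)
    native_rate = per_10k(freq_native, size_native)
    if native_rate == 0:
        ratio = math.inf if learner_rate > 0 else 1.0
    else:
        ratio = learner_rate / native_rate
    if ratio > 1:
        label = "overuse"
    elif ratio < 1:
        label = "underuse"
    else:
        label = "same rate"
    return learner_rate, native_rate, ratio, label


# Hypothetical counts for one connector in two 100,000-word corpora.
print(compare_usage(30, 100_000, 10, 100_000))  # (3.0, 1.0, 3.0, 'overuse')
```

Normalising to a common base (here per 10,000 words) is what makes corpora of different sizes comparable at all.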

Wider spectrum of linguistic analysis

Interpretation: focus on transfer but this is changing; growing integration of SLA theory

Applications: few up-and-running resources but great potential

Version 3 (2016 or 2017): around 30 L1s, as opposed to 11 L1s in Version 1

Learner corpora are a powerful heuristic resource

Corpus techniques make it possible to uncover new dimensions of learner language and lead to the formulation of new research questions: the L2 phrasicon (word combinations).
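One simple, frequency-based way to start exploring word combinations in a learner corpus is to extract recurrent n-grams (lexical bundles). The sketch below is a minimal illustration, not a method described in the plenary; the tokenisation, the bundle length and the frequency threshold are all assumptions.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token sequences, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def recurrent_ngrams(text, n=3, min_freq=2):
    """N-grams occurring at least `min_freq` times, most frequent first."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(ngrams(tokens, n))
    return [(" ".join(g), c) for g, c in counts.most_common() if c >= min_freq]


sample = "On the other hand, it is clear. On the other hand, it is not."
for bundle, freq in recurrent_ngrams(sample, n=4):
    print(bundle, freq)
```

Run over a learner corpus and a native reference corpus, the two bundle lists can then be compared to see which combinations learners favour or avoid.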

Prof. Granger brings up Leech’s preface to Learner English on Computer (1998)

Gradual change from mute corpora to sound-aligned corpora

POS tagging has improved considerably

Error tagging: a wide range of error-tagging systems, including multi-layer annotation systems

Parsing of learner data (90% accuracy; Geertzen et al. 2014)

Static learner corpora vs monitor corpora

CMC learner corpus (Marchand 2015)

Granger (2009) paper on the learner research field:

Granger, Sylviane. “The contribution of learner corpora to second language acquisition and foreign language teaching.” Corpora and language teaching 33 (2009): 13.

 

CIA v2 (Granger 2015): a new model

SLA researchers are more interested in corpus data and corpus linguists are more familiar with SLA grounding

Implications are much more numerous than applications

Links with NLP: spell and grammar checking, learner feedback, native language identification, etc.

Multiple perspectives on the same resource: richer insights and more powerful tools

Phraseology

Louvain English for Academic Purposes Dictionary (LEAD)

web-based

corpus based

descriptions of cross-disciplinary academic vocabulary

1,200 lexical items organized around 18 functions (contrast, illustrate, quote, refer, etc.)

A really exciting application