4th Learner Corpus Studies in Asia and the World (LCSAW4)

Following three successful conventions, the 4th Learner Corpus Studies in Asia and the World (LCSAW4) will be held on Sunday, 29 September 2019, at Kobe University Centennial Hall in Japan.

Credits: https://www.flickr.com/photos/kobeu_pr/26947335663/in/photostream/


LCSAW4 is organized in cooperation with the ESRC-AHRC project led by Dr. Tony McEnery at Lancaster University, UK.

Invited Speakers

Tony McEnery

Patrick Rebuschat

Padraic Monaghan

Kazuya Saito

John Williams

Aaron Batty

Pascual Pérez-Paredes

Yukio Tono

Shin Ishikawa

Mariko Abe

Yasutake Ishii

Emi Izumi

Masatoshi Sugiura

LCSAW4 Poster Session CFP

Date: Sunday, September 29, 2019

Venue: Kobe University Centennial Hall

Presentation Type: Poster

Language: English

Topic: Studies related to L2 learner corpus

Publication: Online proceedings with ISSN will be published.

Submission: Please send your abstract and short bio by 20 May 2019 via http://bit.ly/lcsaw4. If you cannot access the site, please contact the organizer (iskwshin@gmail.com).

Notice of acceptance: By the end of May 2019

Full paper due: By the end of August 2019

Fee: Free

Granada 29 April 2019: some quotes

Representativeness

Balance, representativeness and comparability are ideals which corpus builders strive for but rarely, if ever, attain. In truth, the measures of balance and representativeness are matters of degree. Váradi (2001) has been critical of the failure of corpus linguists to fully define and realise a balanced and representative corpus. Even proposals, such as those of Biber (1993), to produce empirically determined representative corpora have not actually been pursued. Biber’s proposal for representativeness to be realised by measuring internal variation within a corpus – i.e. a corpus is representative if it fully captures the variability of a language – has yet to be adopted in practice. It is also only one of many potential definitions of representativeness, as Leech (2007) points out.
However, though balance and representativeness remain largely heuristic notions, decided on the basis of the judgement of linguists when they are building a corpus, this does not mean to say that the concepts are of no value. Similarly, while some corpora designed to be comparable to each other can clearly make a claim for balance and representativeness, others may only do so to a degree. Leech (2007: 141–3) usefully summarises a series of problems encountered in building comparable corpora of British English to explore diachronic variation: notably, problems relating to the evolution over time of the genres that are balanced in those corpora. The changing nature of genre makes claims of comparability when looking at diachronic variation much more tendentious than similar claims for the synchronic Brown/LOB comparison, for example.

Accountability & “selected examples”

Given what has been said about total accountability, you may wonder that analysts would ever approach a corpus seeking a single example, or a subset of carefully selected examples. Not only do some analysts do just that, in certain circumstances it may actually be the right thing to do. Indeed, in an important sense, approaching a corpus in search of a specific type of result may be entirely in line with the scientific method.

McEnery & Hardie (2012).

Positivism

Positivism was the dominant epistemological paradigm in social science from the 1930s through to the 1960s, its core argument being that the social world exists externally to the researcher, and that its properties can be measured directly through observation. In essence, positivism argues that:

• Reality consists of what is available to the senses – that is, what can be seen, smelt, touched, etc.

• Inquiry should be based upon scientific observation (as opposed to philosophical speculation), and therefore on empirical inquiry.

• The natural and human sciences share common logical and methodological principles, dealing with facts and not with values.

Hence, ideas only deserve their incorporation into knowledge if they can be put to the test of empirical experience. Positivists saw the natural sciences as progressing through the patient accumulation of facts about the world in order to produce generalizations known as scientific laws. To achieve this, the act of scientific inquiry was taken to be the accumulation of ‘brute data’ such as shape, size, motion, etc. For positivists, then, both the natural and social worlds operated within a strict set of laws, which science had to discover through empirical inquiry. This is a brief summary of positivism, but, as Bryman (1988) notes, there have been many different versions of positivism which overlap, and which rarely agreed precisely on its essential components.

Phenomenology

Phenomenology holds that any attempt to understand social reality has to be grounded in people’s experiences of that social reality. Hence, phenomenology insists that we must lay aside our prevailing understanding of phenomena and revisit our immediate experience of them in order that new meanings may emerge. Current understandings have to be ‘bracketed’ to the best of our ability to allow phenomena to ‘speak for themselves’, unadulterated by our preconceptions. The result will be new meaning, fuller meaning or renewed meaning.

Critical inquiry

It is worth having a brief overview of critical inquiry because it offers quite a different perspective to positivism and interpretivism. This critical form of research is a meta-process of investigation, which questions currently held values and assumptions and challenges conventional social structures. It invites both researchers and participants to discard what they term ‘false consciousness’ in order to develop new ways of understanding as a guide to effective action. In a Marxist sense, the critical inquiry perspective is not content to interpret the world but also to change it. The assumptions that lie beneath critical inquiry are that:

• Ideas are mediated by power relations in society.

• Certain groups in society are privileged over others and exert an oppressive force on subordinate groups.

• What are presented as ‘facts’ cannot be disentangled from ideology and the self-interest of dominant groups.

Mainstream research practices are implicated, even if unconsciously, in the reproduction of the systems of class, race and gender oppression. Those adhering to the critical inquiry perspective accuse interpretivists of adopting an uncritical stance towards the culture they are exploring, whereas the task of researchers is to call the structures and values of society into question.

Gray (2004)

Some findings

Some findings, the Tweet way….

Research methods: corpus linguistics

In this session we’ll look at some corpus linguistics methods that can be used to analyse a text or a group of texts automatically.

You can find further information and resources in my book Corpus Linguistics for Education. A Guide for Research. Routledge.

In a way, corpus linguistics could be seen as a type of content analysis that places great emphasis on the fact that language variation is highly systematic.

We'll look at ways in which frequency and word combination can reveal different patterns of use and meaning at the lexical, syntactic and semantic levels. We will examine how we can use corpus linguistics methods to look at a corpus of texts (from different or the same individuals) and at single texts, and how these compare to what is frequent in similar or identical registers or communicative situations. This way, we can find out not only what is frequent but also what is truly distinctive or central in a given text or group of texts.

Students are encouraged to download and install AntConc on their laptops:

URL: http://www.laurenceanthony.net/software/antconc/

File converter tool URL: http://www.laurenceanthony.net/software/antfileconverter/

CL research methods

There are different well-established CL methods to research language usage through the examination of naturally occurring data. These methods stress the importance of frequency and repetition across texts and corpora to create saliency. These methods can be grouped in four categories:

Analysis of keywords. These are words that are unusually frequent in corpus A when compared with corpus B. This is a quantitative method that examines the probability of finding, or not finding, a set of words in a given corpus against a reference corpus. This method is said to reduce both researchers' bias in content analysis and cherry-picking in grounded theory.
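To make the idea of keyness concrete, here is a minimal sketch in Python. It assumes log-likelihood (G2) as the keyness statistic, which is one common choice but is not named in the text above, and the function names (`tokenize`, `log_likelihood`, `keywords`) are hypothetical helpers written for this illustration. It is not meant to reproduce the exact output of AntConc or any other tool.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Very rough tokenizer (an assumption): lowercase alphabetic strings."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, c, d):
    """G2 for a word seen a times in a target corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(target_tokens, reference_tokens, top=10):
    """Rank words that are relatively more frequent in the target corpus."""
    tf, rf = Counter(target_tokens), Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scored = []
    for word, a in tf.items():
        b = rf.get(word, 0)
        if a / c > b / d:  # keep positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


if __name__ == "__main__":
    # Toy data standing in for corpus A and a reference corpus B.
    corpus_a = tokenize("Tax cuts, tax reform and tax credits for jobs")
    corpus_b = tokenize("The cat sat on the mat and the dog sat on the rug")
    for word, score in keywords(corpus_a, corpus_b, top=3):
        print(word, round(score, 2))
```

With real data, corpus A would be the texts under study (for example, one manifesto) and corpus B a larger reference corpus. The higher the score, the less likely it is that the difference in frequency is due to chance.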

Analysis of collocations. Collocations are words found within a given span (-/+ n words to the left and right) of a node word. This analysis is based on statistical tests that examine the probability of finding a word within a specific lexical context in a given corpus. There are different collocation strength measures and a variety of approaches to collocation analysis (Gries, 2013). A collocational profile of a word, or a string of words, provides a deeper understanding of the meaning of a word and its contexts of use.
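The span-based definition of collocation can be sketched in a few lines of Python. This example assumes Mutual Information (MI) as the strength measure, computed as log2(observed / expected) with the expected frequency adjusted for window size. That is only one of the several measures alluded to above, and the function name `collocates` and its parameters are hypothetical.

```python
import math
from collections import Counter


def collocates(tokens, node, span=4, min_freq=2):
    """Return (collocate, observed frequency, MI score) for words found
    within -/+ span tokens of each occurrence of the node word."""
    n = len(tokens)
    freq = Counter(tokens)
    observed = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            observed.update(left + right)
    results = []
    for word, o in observed.items():
        if o < min_freq:
            continue  # ignore rare co-occurrences, which inflate MI
        # Expected co-occurrence if words were distributed at random,
        # scaled by the total window size (span on each side).
        e = freq[node] * freq[word] * 2 * span / n
        results.append((word, o, math.log2(o / e)))
    return sorted(results, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    text = "strong tea and strong coffee but weak tea or strong tea"
    for word, o, mi in collocates(text.split(), "tea", span=1):
        print(word, o, round(mi, 2))
```

MI tends to favour rare words, which is why a minimum frequency threshold is applied here. Other measures, such as log-likelihood or t-score, would rank the same collocates differently.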

Colligation analysis. This involves the analysis of the syntagmatic patterns where words, and strings of words, tend to co-occur with other words (Hoey, 2005). Patterning stresses the relationship between a lexical item and a grammatical context, a syntactic function (e.g. postmodifiers in noun phrases) and its position in the phrase or in the clause. Potentially, every word presents distinctive local colligation patterns. Word Sketches have become a widely used way to examine patterns in corpora.

N-grams. N-gram analysis relies on a bottom-up computational approach where strings of words (although other items such as part-of-speech tags are perfectly possible) are grouped in clusters of 2, 3, 4, 5 or 6 words and their frequency is examined. Previous research on n-grams shows that different domains (topics, themes) and registers (genres) offer different preferences in terms of the n-grams most frequently used by expert users.
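The bottom-up nature of n-gram analysis is easy to see in code. The following is a minimal Python sketch that assumes whitespace-tokenized input and uses hypothetical function names (`ngrams`, `ngram_counts`). It simply slides a window of n words over the text and counts each resulting string.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous sequence of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(tokens, n):
    """Count how often each n-gram occurs in the token list."""
    return Counter(ngrams(tokens, n))


if __name__ == "__main__":
    tokens = "on the other hand on the one hand".split()
    for size in (2, 3):
        print(size, ngram_counts(tokens, size).most_common(3))
```

The same functions work on a list of part-of-speech tags instead of words, which is the alternative mentioned in the paragraph above.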

Quote 1: what is a corpus?

The word corpus is Latin for body (plural corpora). In linguistics a corpus is a collection of texts (a ‘body’ of language) stored in an electronic database. Corpora are usually large bodies of machine-readable text containing thousands or millions of words. A corpus is different from an archive in that often (but not always) the texts have been selected so that they can be said to be representative of a particular language variety or genre, therefore acting as a standard reference. Corpora are often annotated with additional information such as part-of-speech tags or to denote prosodic features associated with speech. Individual texts within a corpus usually receive some form of meta-encoding in a header, giving information about their genre, the author, date and place of publication etc. Types of corpora include specialised, reference, multilingual, parallel, learner, diachronic and monitor. Corpora can be used for both quantitative and qualitative analyses. Although a corpus does not contain new information about language, by using software packages which process data we can obtain a new perspective on the familiar (Hunston 2002: 2–3).

Baker et al. (2006). A glossary of corpus linguistics. Edinburgh: Edinburgh University Press.

Quote 2: introspection

Armchair linguistics does not have a good name in some linguistics circles. A caricature of the armchair linguist is something like this. He sits in a deep soft comfortable armchair, with his eyes closed and his hands clasped behind his head. Once in a while he opens his eyes, sits up abruptly shouting, “Wow, what a neat fact!”, grabs his pencil, and writes something down. Then he paces around for a few hours in the excitement of having come still closer to knowing what language is really like. (There isn’t anybody exactly like this, but there are some approximations.)

Charles Fillmore. Directions in Corpus Linguistics (Proceedings of Nobel Symposium 82, 1991).

Quote 3: evidence in a corpus

We as linguists should train ourselves specifically to be open to the evidence of long text. This is quite different from using the computer to be our servant in trying out our ideas; it is making good use of some essential differences between computers and people.

[…] I believe that we have to cultivate a new relationship between the ideas we have and the evidence that is in front of us. We are so used to interpreting very scant evidence that we are not in a good mental state to appreciate the opposite situation. With the new evidence the main difficulty is controlling and organizing it rather than getting it.

Sinclair. Trust the Text. (2004:17)

Quote 4: why analyse registers?

Register, genre, and style differences are fundamentally important for any student with a primary interest in language. For example, any student majoring in English, or in the study of another language like Japanese or Spanish, must understand the text varieties in that language. If you are training to become a teacher (e.g. for secondary education or for TESL), you will shortly be faced with the task of teaching your own students how to use the words and structures that are appropriate to different spoken and written tasks – different registers and genres. Other students of language are more interested in the study of literature or the creative writing of new literature, issues relating to the style perspective, since the literary effects that distinguish one novel (or poem) from the next are realized as linguistic differences.

Biber & Conrad (2009: 4)

Quote 8: sleeping furiously

Tony McEnery has outlined the reasons why corpus linguistics was largely ignored in the past, possibly because of the influence of Noam Chomsky. Prof. McEnery has placed this debate in a wider context where different stakeholders fight a paradigm war: rationalist introspection versus evidence-driven analysis.

Quote 9: epistemological adherence?

“Science is a subject that relies on measurement rather than opinion”, Brian Cox wrote in the book version of Human Universe, the BBC show. And I think he is right. Complementary research methodologies can only bring about better insights and better-informed debates.

Hands-on workshop. Corpus analysis: the basics.

Tasks

3a Run a word list

3b Run a keyword list

3c Use concord plot: explore its usefulness

3d Choose a lexical item: explore clusters

3e Choose a lexical item: explore n-grams

3f Run a collocation analysis

Download the Conservative manifesto 2017 here and the Labour 2017 manifesto here

OR

Policy paper: DFID Education Policy 2018: Get Children Learning (PDF)

OR

Brown corpus data download.

Brown corpus text categories and the texts themselves identified.

Links

UAM Corpus Tool

Representative corpora (EN)

BNC https://corpus.byu.edu

COCA https://corpus.byu.edu

Representative corpora (Register perspective)

MICUSP http://micusp.elicorpora.info

Corpus of research articles http://micusp.elicorpora.info

A list of corpora you can download.

Using NVIVO?

NVIVO node export and beyond

http://scalar.usc.edu/works/using-nvivo-an-unofficial-and-unauthorized-primer/index

Migrants here to provide maximum benefit

Today, 27/1/2019, Sajid Javid, the UK Home Secretary, set out that the Government sees immigrants as an asset to generate a “maximum benefit”.

May's thing with immigrants and freedom of movement

A couple of years ago I published research that examined how migrants were constructed both in the UK immigration legislation and in the information delivered through the UK Border Agency website. We wrote this in 2015 well before the Brexit Referendum. I read this again today and have realised how naive we were. The following is part of our conclusions:

What our results seem to suggest is that for the UK Administration, the issue of immigrant integration is not part of how immigrants are constructed in the legislation and the information that the UK immigration agencies and authorities publish and distribute. This failure to mention integration issues in the legislation is not found in other legal systems such as in Italy, where Hernández González (2016) discovered a tension between inclusion/integration and exclusion/control in the same 2007–2011 period. The language-driven evidence provided in this study corroborates that the use of the lemma ‘migrant’ in the two corpora analysed calls for a partial construction of immigrants mainly as workers who need to be tightly controlled and classified into Tiers to prevent unlawful behaviour. In doing so, migrants, an alternative word for immigrants in our research context, acquires an extremely subtle negative prosody.

Pérez-Paredes, P., Aguado, P. & Sánchez, P. (2017). Constructing immigrants in UK legislation and Administration informative texts: a corpus-driven study (2007-2011). Discourse & Society, 28(1), 81-103.