Ideology in corporate language

Ruth Breeze

Ideology in corporate language: discourse analysis using Wmatrix3

2013 annual reports from 16 leading companies in financial services, mining, food and pharmaceuticals

Parts: first part, non-technical, discursive, visually interesting

Reference corpus: at first BNC Sampler Business and BNC Informative texts, but then only BNC Business

Use of semantic categories

Three case studies: size (big), time (begin) and cause and effect

Size: Focus on growth, large, expanding, substantial. Not only adjectives are interesting here.


Ideology of cause and effect

Dynamic approach to time

Emphasis on size and importance

Salient semantic areas: investigation, tough, strong, attentive, help & give, in power, belonging to a group

Differences: only in domain or topic focus, probably with different emphasis on newness and the green economy
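
The comparison of the annual reports against a BNC reference corpus, as noted above, rests on a keyness calculation. Here is a minimal sketch of it, assuming the log-likelihood statistic, which as far as I know is the measure Wmatrix reports for key items. The notes do not record the thresholds or figures used in the talk, so the category labels, counts and corpus sizes below are invented purely for illustration.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of one item in a study corpus vs a reference corpus."""
    total_size = size_study + size_ref
    total_freq = freq_study + freq_ref
    # Expected frequencies if the item were spread evenly across both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    ll = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


def rank_categories(study_counts, size_study, ref_counts, size_ref):
    """Rank semantic categories by keyness, highest first."""
    scores = {
        category: log_likelihood(freq, size_study, ref_counts.get(category, 0), size_ref)
        for category, freq in study_counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical counts: these are NOT figures from the talk.
study_counts = {"size_big": 30, "time_beginning": 10}
ref_counts = {"size_big": 10, "time_beginning": 10}
ranked = rank_categories(study_counts, 1000, ref_counts, 1000)
for category, score in ranked:
    print(category, round(score, 2))
```

The statistic only says how surprising a difference is, not its direction, so overuse and underuse still have to be told apart by comparing relative frequencies.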



Big data and corpus linguistics

AELINCO 2015 Conference, U. Valladolid, Spain

Andrew Hardie keynote

What follows are my own notes on, and understanding of, Hardie’s keynote.

How big is big data?


Non-manual curation of the database

Must be mined or statistically summarised (manual analysis is not possible)

Pattern finding: trend modelling, data mining & machine learning

Language big data: Google n-gram

A revolutionary change for language and linguistics?

Textual big data studies done by non-linguistic specialists

Limitations of Google when used with no language training

Michel et al., “Quantitative Analysis of Culture Using Millions of Digitized Books”, Science 331 (2011). Culturomics. What is there?

Quantitative findings, otherwise pretty predictable and very much frequency counts. In actual fact, the study was not backed by any expert in corpus linguistics. Steven Pinker was involved in the paper and the whole thing was treated as if they had invented the wheel.

Borin et al. papers trying to “salvage” the whole culturomics movement from its ignorance.

New “happiness” analyses are trendy, but what do they have to offer? They come with lots of problems and shortcomings. I think that corpus analysis is becoming mainstream and is more visible in specialized journals. The price of fame?

Linguistically risibly naive research done by non-linguists


Paul Rayson keynote

Larger corpora available since Brown in the 1960s

Mura Nava’s resource. An interesting timeline of corpus analysis tools.

SAMUELS: Semantic Annotation and Mark-Up for Enhancing Lexical Searches

Overcoming problems when doing textual analysis: fused forms, archaic forms, apostrophes, and many others. Searching for words is a challenge: frequencies are split across multiple spellings.
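
The point about frequencies being split across spellings can be shown with a toy example. This is only a sketch of the general idea (map each variant to a modern form before counting), not the method used in SAMUELS or in any particular tool. The token list and the variant table are hypothetical.

```python
from collections import Counter

# Hypothetical variant table: historical spelling -> modern form.
VARIANTS = {
    "loue": "love",
    "booke": "book",
}


def normalise(token, variants=VARIANTS):
    """Lowercase a token and map a known variant spelling to its modern form."""
    lowered = token.lower()
    return variants.get(lowered, lowered)


def count_raw(tokens):
    """Frequencies of the surface forms exactly as they appear (lowercased)."""
    return Counter(token.lower() for token in tokens)


def count_normalised(tokens, variants=VARIANTS):
    """Frequencies after merging variant spellings."""
    return Counter(normalise(token, variants) for token in tokens)


# Hypothetical tokens, invented for illustration.
tokens = ["loue", "love", "Loue", "booke", "book"]
raw = count_raw(tokens)
merged = count_normalised(tokens)
print(raw)
print(merged)
```

A search for "love" in the raw counts finds one hit out of three, which is the kind of undercount that spelling normalisation is meant to prevent.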


USAS semantic tagger

Full text tagging (as opposed to trends in “textual big data” analysis).

Modern & historical taggers

Disambiguation methods are essential

Paul discusses the Historical Thesaurus of English

The whole annotation system:

[photo]


I guess this is the missing part in big data as practised by non-linguists.
















CFP Current Work in Corpus Linguistics: Working with Traditionally-conceived Corpora and Beyond

Welcome to Valladolid for CILC2015, the seventh edition of the International Conference on Corpus Linguistics, organized by AELINCO, the Spanish Association of Corpus Linguistics, and hosted this year by the Department of English and the International Centre for Lexicography (University of Valladolid, Spain). The University of Valladolid is a public university located in north-west Spain. Established in the 13th century, it is one of the oldest universities in the world. It offers over 100 degrees and 50 master's and doctoral programmes, and boasts a broad network of international relations, research centres, and sports and cultural facilities, together with a rich architectural and documentary heritage.

The conference will be held at the Palacio de Congresos “Conde Ansúrez” at the heart of Valladolid from 5 to 7 March 2015. The Conference will focus on current work in Corpus Linguistics as well as future developments of the discipline, especially in connection with the coming of age of Big Data.

Abstracts must be sent in English or Spanish and must not exceed 550 words. The time allotted for the presentation is 20 minutes (with 5 minutes for discussion). Abstracts for poster sessions will also be accepted.

The abstract proposals should be related to one of the conference panels outlined by AELINCO. Each panel has a director who shall communicate the acceptance or rejection of the proposals after the evaluation by experts.

The first circular provides information about the submission of proposals.

The deadline for submitting proposals is 1 December 2014.

For further information: