Phil Durrant’s talk available on YouTube

Check out Dr Durrant’s talk “Researching writing development with a corpus” on our research group YouTube channel.

More info on the talk here.

More info on the Corpus linguistics and applied linguistics research 2021 site.

Corpus of North American Spoken English (CoNASE)

The Corpus of North American Spoken English (CoNASE), a 1.25-billion-word corpus of geolocated automatic speech-to-text transcripts, is now available in a beta version.

URL for more information.

The corpus was created from 301,847 ASR transcripts from 2,572 YouTube channels, corresponding to 154,041 hours of video. The size of the corpus is 1,252,066,371 word tokens.
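The figures above allow a few back-of-the-envelope averages. The following is a minimal sketch that uses only the four counts reported in the announcement; the function name and dictionary keys are illustrative, not part of any CoNASE tooling.

```python
# Counts as reported in the CoNASE announcement.
TRANSCRIPTS = 301_847
CHANNELS = 2_572
HOURS = 154_041
TOKENS = 1_252_066_371


def corpus_summary():
    """Return simple averages derived from the reported corpus counts."""
    tokens_per_hour = TOKENS / HOURS
    return {
        "tokens_per_transcript": TOKENS / TRANSCRIPTS,
        "transcripts_per_channel": TRANSCRIPTS / CHANNELS,
        "minutes_per_transcript": HOURS * 60 / TRANSCRIPTS,
        "tokens_per_minute": tokens_per_hour / 60,
    }


if __name__ == "__main__":
    for name, value in corpus_summary().items():
        print(f"{name}: {value:.1f}")
```

On these numbers, the average transcript runs to a little over 4,000 tokens and about half an hour of video. These are means over the whole corpus, so they say nothing about how transcript length varies across channels.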

The channels sampled in the corpus are associated with local government entities such as town, city, or county boards and councils, school or utility districts, regional authorities such as provincial or territorial governments, or other governmental organizations.

The transcripts are primarily of recordings of public meetings, although other genres are also present. Video transcripts have been assigned exact latitude-longitude coordinates using a geocoding script.

This information was distributed through the Corpora-List by Steven Coats, University of Oulu, Finland.

To cite the corpus, please use:

Coats, Steven. 2021. Corpus of North American Spoken English (CoNASE).

Incorporating corpora in teaching symposium, Mittuniversitetet, Sweden

Check out the programme here.


Here you can find some useful resources to carry out your transcription project.

MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk. 3rd Edition. Mahwah, NJ: Lawrence Erlbaum Associates.

Brian MacWhinney (2019) Tools for Analyzing Talk. Part 1: The CHAT Transcription Format. URL:

Leech (2004): types of annotation

phonetic annotation e.g. adding information about how a word in a spoken corpus was pronounced.

prosodic annotation — again in a spoken corpus — adding information about prosodic features such as stress, intonation and pauses.

syntactic annotation e.g. adding information about how a given sentence is parsed, in terms of syntactic analysis into such units as phrases and clauses.

semantic annotation e.g. adding information about the semantic category of words — the noun cricket as a term for a sport and as a term for an insect belong to different semantic categories, although there is no difference in spelling or pronunciation.

pragmatic annotation e.g. adding information about the kinds of speech act (or dialogue act) that occur in a spoken dialogue; thus the utterance okay on different occasions may be an acknowledgement, a request for feedback, an acceptance, or a pragmatic marker initiating a new phase of discussion.

discourse annotation e.g. adding information about anaphoric links in a text, for example connecting the pronoun them and its antecedent the horses in: I’ll saddle the horses and bring them round. [an example from the Brown corpus]

stylistic annotation e.g. adding information about speech and thought presentation (direct speech, indirect speech, free indirect thought, etc.)

lexical annotation e.g. adding the identity of the lemma of each word form in a text, i.e. the base form of the word, such as would occur as its headword in a dictionary (e.g. lying has the lemma LIE).
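One way to picture these annotation types is as separate layers attached to the same word form. The sketch below is purely illustrative: the Token class, the annotate helper and the layer names are hypothetical and do not correspond to any particular annotation scheme or tool. It reuses the examples from the list above (lying with the lemma LIE, and cricket as a sport versus an insect).

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """A word form plus any number of named annotation layers (hypothetical)."""
    form: str
    annotations: dict = field(default_factory=dict)


def annotate(form, **layers):
    """Build a Token whose layers are given as keyword arguments."""
    return Token(form, dict(layers))


# Lexical annotation: the lemma of the word form.
lying = annotate("lying", lexical="LIE")

# Semantic annotation: same spelling, different semantic categories.
cricket_sport = annotate("cricket", semantic="sport")
cricket_insect = annotate("cricket", semantic="insect")

# Pragmatic annotation: one possible speech-act label for "okay".
okay = annotate("okay", pragmatic="acknowledgement")

if __name__ == "__main__":
    for token in (lying, cricket_sport, cricket_insect, okay):
        print(token.form, token.annotations)
```

Keeping each layer under its own key means a text can carry some annotation types and not others, which matches how corpora are often annotated in practice.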

Online services:


Backbone Transcriptor. URL



Metadata for corpus work:

Annotation on Sketch Engine:

TEI by example website:

Acquiring text varieties

One of the most important goals of formal schooling is teaching text varieties that might not be acquired outside of school […] Early in school, children learn to read books of many different types, including fictional stories, historical accounts of past events, and descriptions of natural phenomena. These varieties rely on different linguistic structures and patterns, and students must learn how to recognize and interpret those differences. At the same time, students must learn how to produce some of these different varieties, for example writing a narrative essay on what they did during summer vacation versus a persuasive essay on whether the school cafeteria should sell candy. The amount of explicit instruction in different text varieties varies across teachers, schools, and countries, but even at a young age, students must somehow learn to control and interpret the language of different varieties, or they will not succeed at school.

Biber & Conrad (2009:3)

Biber, D., & Conrad, S. (2009). Register, genre, and style (Cambridge Textbooks in Linguistics). Cambridge: Cambridge University Press.

Check other quotations here.