Extracting n-word phrases from large texts

This is a summary of resources posted on [Corpora-List] in early 2014.

CMU-Cambridge Statistical Language Modeling toolkit

http://mi.eng.cam.ac.uk/~prc14/toolkit.html

Sketch Engine

http://www.sketchengine.co.uk/documentation/wiki/SkE/NGrams

Laurence Anthony’s AntConc

http://www.antlab.sci.waseda.ac.jp/software.html

kfNgram

http://www.kwicfinder.com/kfNgram/kfNgramHelp.html

Colibri

Software for extracting n-grams as well as non-consecutive patterns (skipgrams). It is written in C++ for speed and memory efficiency, but comes with a Python binding for use from Python scripts. It also has a standalone CLI tool that can do what you want.

https://github.com/proycon/colibri-core

http://proycon.github.io/colibri-core/doc/

Maarten van Gompel

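To make the difference between n-grams and skipgrams mentioned in the Colibri entry concrete, here is a minimal sketch in plain Python. It does not use Colibri or any of the other tools listed above, and the function names `ngrams` and `skipgrams` are chosen here for illustration only. It also assumes one common definition of a skipgram (n tokens in their original order with at most `max_skip` tokens skipped in total), which may differ from the definition a particular tool uses.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    """Yield every run of n consecutive tokens as a tuple."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def skipgrams(tokens, n, max_skip):
    """Yield n-token patterns whose tokens may be separated by gaps.

    Each pattern starts at token i and takes its remaining n - 1 tokens,
    in order, from the next n - 1 + max_skip tokens. This is an assumed,
    generic definition; with max_skip=0 it reduces to plain n-grams.
    """
    for i in range(len(tokens)):
        window = tokens[i + 1:i + n + max_skip]
        for rest in combinations(window, n - 1):
            yield (tokens[i],) + rest


# Whitespace tokenisation is a simplification; real corpora need more care.
tokens = "to be or not to be".split()

bigram_counts = Counter(ngrams(tokens, 2))
skip_bigram_counts = Counter(skipgrams(tokens, 2, 1))

print(bigram_counts.most_common(1))  # [(('to', 'be'), 2)]
print(skip_bigram_counts[("to", "or")])  # 1: "to" and "or" with "be" skipped
```

Both functions are generators, so counts can be accumulated without holding every pattern in memory at once. For very large texts the dedicated tools above are likely to be much faster and more memory-efficient than this sketch.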

ICAME35: 2nd Call for Papers: deadline Dec 15


The Centre for Research in Applied Linguistics (CRAL) at the University of Nottingham is hosting the 35th ICAME conference.

The theme of the conference is: Corpus Linguistics, Context and Culture.

** Date: 30 April to 4 May 2014.
Venue: University of Nottingham, UK, University Park Campus.

The main conference will be opened with a talk by

Ronald Carter (University of Nottingham)

and there will be a wine reception sponsored by John Benjamins on the Wednesday night.

Keynote speakers:

Beatrix Busse (University of Heidelberg)

Susan Hunston (University of Birmingham)

Tony McEnery (University of Lancaster)

Ute Roemer (Georgia State University)

Wolfgang Teubert (University of Birmingham)

The conference aims to explore English corpus linguistics and its intersections with other fields, as well as its applications in a range of contexts of language use. We invite submissions of abstracts for papers, work-in-progress reports, posters and software demonstrations on any topic relevant to the conference theme.  Areas for submissions can include – but are not limited to:

– corpus and discourse analysis
– corpus linguistics and its theoretical implications
– diachronic corpus studies
– corpora and new media
– varieties of English
– contrastive linguistics
– sociolinguistics
– mixed methods approaches in corpus linguistics
– corpus stylistics
– corpora in English language education

** Deadline for the submission of abstracts: 15 Dec 2013

Please submit your abstract through the conference website.
Proposals for pre-conference workshops should be sent directly to the organizers at ICAME2014@nottingham.ac.uk

For more details please see the conference website:

http://www.nottingham.ac.uk/conference/fac-arts/english/icame-35/index.aspx

We are looking forward to seeing you in Nottingham in 2014.

The ICAME 35 Team
Michaela Mahlberg, Gavin Brookes, Kathy Conklin, Rachele De Felice, Dave Evans, Kat Gupta, Kevin Harvey, Tony Fisher, Lorenzo Mastropierro, Rebecca Peck, Ana Pellicer-Sánchez, Viola Wiegan

CFP American Association for Corpus Linguistics AACL 2014

The American Association for Corpus Linguistics (AACL) has issued a call for papers for its next conference, to be held September 26-28, 2014, in Flagstaff, AZ.

Abstracts
Faculty, graduate students, and independent scholars are invited to submit abstracts for 25-minute papers (20-minute presentation + 5 minutes for questions) on any aspect of corpus linguistics. Abstracts will undergo anonymous review.

Papers are welcome from a range of subfields:

– Tools and methods (corpus creation, corpus annotation, tagging and parsing, visualization of large data sets, open source corpora (philosophy and practice), software development);
– Linguistic analyses of corpora as they relate to language use (register/genre as well as lexical and grammatical variation, language varieties, parallel corpora, historical change, lexicography);
– Application (the use of corpora in language teaching and learning).

Abstract details: Submit abstracts to aacl@nau.edu by February 10, 2014.

Cover page: Author(s) name(s); Affiliation; Contact information; Paper title; Category (see above)
Abstract page: Paper title; Abstract (max. 250 words)
Format: MS Word or PDF (the latter is necessary if the abstract contains specialized fonts)

Important dates
February 10: Deadline for submission of abstracts
April 11: Notification of decisions on abstracts
September 26-28: Conference