Language corpora and the language classroom

ENG 420C: Seminar in Language (Corpus Linguistics)
TuTh 11:10AM – 12:25PM (LA 207/227)
Department of English – NAU
Flagstaff, AZ

Pérez-Paredes, P. & Díez Bedmar, B. 2010. Language corpora and the language classroom. Murcia:
Consejería de Educación de la CARM.
ISBN 978-84-692-4229-2

LingPipe

LingPipe is a suite of Java libraries for the linguistic analysis of human language.

Feature Overview

LingPipe’s information extraction and data mining tools:
track mentions of entities (e.g. people or proteins);
link entity mentions to database entries;
uncover relations between entities and actions;
classify text passages by language, character encoding, genre, topic, or sentiment;
correct spelling with respect to a text collection;
cluster documents by implicit topic and discover significant trends over time; and
provide part-of-speech tagging and phrase chunking.
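
To make one of these features concrete, here is a minimal sketch of correcting spelling with respect to a text collection. It is plain Java and does not use LingPipe's API or reproduce its models; the class name CollectionSpeller, its methods, and the distance threshold of 2 are all assumptions made for illustration. The idea shown is only that the collection itself defines what counts as a correct word: an unknown word is replaced by the closest word seen in the collection, with more frequent words preferred on ties.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class; not part of LingPipe.
class CollectionSpeller {
    // Word frequencies observed in the text collection.
    private final Map<String, Integer> counts = new HashMap<>();

    /** Adds the words of one text to the collection (ASCII letters only, a simplification). */
    public void train(String text) {
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!w.isEmpty()) {
                counts.merge(w, 1, Integer::sum);
            }
        }
    }

    /** Plain Levenshtein distance: insertions, deletions, substitutions each cost 1. */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int sub = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(sub, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    /** Returns the word itself if known, else the nearest collection word within distance 2. */
    public String correct(String word) {
        String query = word.toLowerCase(Locale.ROOT);
        if (counts.containsKey(query)) {
            return query;
        }
        String best = query;
        int bestDist = Integer.MAX_VALUE;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int d = editDistance(query, e.getKey());
            // Prefer smaller distance; break ties by higher frequency in the collection.
            if (d < bestDist || (d == bestDist && e.getValue() > bestCount)) {
                best = e.getKey();
                bestDist = d;
                bestCount = e.getValue();
            }
        }
        return bestDist <= 2 ? best : query;
    }
}
```

Trained on a small collection that contains "corpus" and "language", this sketch would map "corpuz" to "corpus" and "langauge" to "language", and would leave a string with no close neighbour unchanged.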

Architecture

LingPipe’s architecture is designed to be efficient, scalable, reusable, and robust. Highlights include:
Java API with source code and unit tests;
multi-lingual, multi-domain, multi-genre models;
training with new data for new tasks;
n-best output with statistical confidence estimates;
online training (learn-a-little, tag-a-little);
thread-safe models and decoders for concurrent-read exclusive-write (CREW) synchronization; and
character encoding-sensitive I/O.
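
Two of these highlights, online training and concurrent-read exclusive-write (CREW) synchronization, can be illustrated together. The following is a minimal sketch in plain Java, not LingPipe code: the class OnlineTagger, its methods, and the "UNK" fallback tag are hypothetical names chosen for the example, and the "model" is just a most-frequent-tag lookup. It uses the standard java.util.concurrent.locks.ReentrantReadWriteLock so that many threads may tag at once while a training update briefly takes exclusive access.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration class; not part of LingPipe.
class OnlineTagger {
    // For each word, how often each tag has been observed so far.
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Learn a little: record one (word, tag) observation under the exclusive write lock. */
    public void train(String word, String tag) {
        lock.writeLock().lock();
        try {
            counts.computeIfAbsent(word, k -> new HashMap<>()).merge(tag, 1, Integer::sum);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Tag a little: return the most frequent tag seen so far, under the shared read lock. */
    public String tag(String word) {
        lock.readLock().lock();
        try {
            Map<String, Integer> tags = counts.get(word);
            if (tags == null) {
                return "UNK"; // assumed placeholder for words never seen in training
            }
            String best = "UNK";
            int bestCount = 0;
            for (Map.Entry<String, Integer> e : tags.entrySet()) {
                if (e.getValue() > bestCount) {
                    best = e.getKey();
                    bestCount = e.getValue();
                }
            }
            return best;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Because training and tagging can be interleaved freely, the tagger's answer for a word may change as more observations arrive, which is the "learn-a-little, tag-a-little" pattern in miniature.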