English ENCOW14 web corpus now available in its first release version #ENCOW14A #corpuslinguistics

Through the corpora list
::::::::::::::::::::::::::::::::::

The English ENCOW14 web corpus is now available in its first release version, ENCOW14A (16.8 GT full corpus, 9.6 GT shuffled). The shuffled version is completely free but available only to people working in academia.

At the same time, we are making available our new Colibri² web application, hosted at webcorpora.org. It allows registered users to query the corpora or download the complete data sets. Colibri² also serves DECOW12AX (German, 8.3 GT), NLCOW14AX (Dutch, 4.7 GT), and SVCOW14AX (Swedish, 4.8 GT).

ENCOW14A was crawled in 2012 and 2014 in over 20 top-level domains and has undergone state-of-the-art deduplication, boilerplate removal, hyphenation repair, and repair of run-together sentences (texrex). It is annotated with POS (Penn/TreeTagger), lemma (TreeTagger), chunks (TreeTagger), and dependency relations (MaltParser, experimental). It contains the following metadata: URL, Last-Modified date, crawl date, country and city geolocation, a document quality score, and paragraph boilerplate scores.

Download & web access via Colibri² (free registration required):
https://webcorpora.org/

Corpus information:
http://corporafromtheweb.org/encow14/

COW is created at Freie Universität Berlin, German Grammar Group:
http://hpsg.fu-berlin.de/

All processing specific to web documents was done with texrex:
http://texrex.sourceforge.net/

ENCOW14 includes GeoLite data created by MaxMind, available from:
http://www.maxmind.com

:::::::::::

Roland Schäfer (ENCOW14/COW), Felix Bildhauer (COW)

Most Relevant NLP Journals via NLPeople

This is a question and follow-up initiated by Eduardo César Garrido Merchán in the LinkedIn NLPeople group.

Meeting of the Association for Computational Linguistics (ACL)
Transactions of the Association for Computational Linguistics (ISSN: 2307-387X)
European Chapter of the ACL (EACL)
North American Chapter of the Association for Computational Linguistics
International Conference on Computational Linguistics (COLING)
Conference on Empirical Methods in Natural Language Processing (EMNLP)
Data & Knowledge Engineering
IEEE Transactions on Knowledge and Data Engineering
Computational Linguistics
International Conference on Computational Linguistics and Intelligent Text Processing
Text REtrieval Conference (TREC)
International Joint Conference on Natural Language Processing
SIGIR
ECIR
CICLing.org

NLP conference calendar: http://www.cs.rochester.edu/~tetreaul/conferences.html

Learner Corpora in Language Testing and Assessment

Edited by Marcus Callies and Sandra Götz
University of Bremen / Justus Liebig University, Giessen
ISBN 9789027203786 
The aim of this volume is to highlight the benefits and potential of using learner corpora for the testing and assessment of L2 proficiency in both speaking and writing, reflecting the growing importance of learner corpora in applied linguistics and second language acquisition research. Identifying several desiderata for future research and practice, the volume presents a selection of original studies, covering a variety of different languages. It features studies that present very thoroughly compiled new corpus resources which are tailor-made and ready for analysis in LTA, new tools for the automatic assessment of proficiency levels, and new methods of (self-)assessment with the help of learner corpora. Other studies suggest innovative research methodologies for operationalizing proficiency through learner corpus data. The volume is of particular interest to researchers in (applied) corpus linguistics, learner corpus research, and language testing and assessment, as well as to materials developers and language teachers.
Learner corpora in language testing and assessment: Prospects and challenges
Marcus Callies and Sandra Götz
1 – 10
New corpus resources, tools and methods
The Marburg Corpus of Intermediate Learner English (MILE)
Rolf Kreyer
13 – 34
Avalingua: Natural language processing for automatic error detection
Pablo Gamallo Otero, Marcos Garcia, Iria del Río and Isaac González López
35 – 58
Data commentary in science writing: Using a small, specialized corpus for formative self-assessment practices
Lene Nordrum and Andreas Eriksson
59 – 84
First steps in assigning proficiency to texts in a learner corpus of computer-mediated communication
Tim Marchand and Sumie Akutsu
85 – 112
Data-driven approaches to the assessment of proficiency
The English Vocabulary Profile as a benchmark for assigning levels to learner corpus data
Agnieszka Leńko-Szymańska
115 – 140
A multidimensional analysis of learner language during story reconstruction in interviews
Pascual Pérez-Paredes and María Sánchez-Tornel
141 – 162
Article use and criterial features in Spanish EFL writing: A pilot study from CEFR A2 to B2 levels
María Belén Díez-Bedmar
163 – 190
Tense and aspect errors in spoken learner English: Implications for language testing and assessment
Sandra Götz
191 – 216

TELL-OP: Kick-off meeting

TELL-OP 
Transforming European Learner Language into Learning Opportunities
2014-1-ES01-KA203-004782
A KA200 Higher Education Strategic Partnership
Universidad de Murcia, January 14, 2015
Kick-off meeting agenda
Venue: Universidad de Murcia, Spain, January 14-16, 2015
Campus de La Merced, Universidad de Murcia
Plaza de la Universidad, Murcia, Spain
SALA JACOBO DE LAS LEYES
Hemeroteca Clara Campoamor, Campus La Merced
Suggested arrival date: January 13 (evening) or 14 (morning)
Suggested departure: January 16 (afternoon-evening) or 17
Wednesday 14
Slot 1 16:00-18:30
Thursday 15
Slot 2 9:30-11:30
Slot 3 12:00-13:45
Slot 4 16:00-18:30
Friday 16
Slot 5 9:30-11:30
Slot 6 12:00-13:45
Slot 1 An overview of ERASMUS+ KA200 & regulations
Slot 2 A detailed overview of TELL-OP: aims, timescale and outputs
Slot 3 Intellectual outputs 5, 6 & 7
Slot 4 Intellectual outputs 8 & 9
Slot 5 Intellectual outputs 10, 11 & 14
Slot 6 Intellectual output 12 & Multiplier event

Accommodation for international delegates

Hotel Arco S. Juan: http://www.arcosanjuan.com/en/
Plaza Ceballos, 10
30003 Murcia, Spain
