Category: corpus linguistics
English ENCOW14 web corpus now available in its first release version #ENCOW14A #corpuslinguistics
Through the corpora list
::::::::::::::::::::::::::::::::::
The English ENCOW14 web corpus is now available in its first release version, ENCOW14A (16.8 GT full corpus, 9.6 GT shuffled). The shuffled version is completely free but available only to people working in academia.
At the same time, we are making available our new Colibri² web application, hosted at webcorpora.org. It allows registered users to query the corpora or download the whole data sets. Colibri² also serves DECOW12AX (German, 8.3 GT), NLCOW14AX (Dutch, 4.7 GT), and SVCOW14AX (Swedish, 4.8 GT).
ENCOW14A was crawled in 2012 and 2014 across more than 20 top-level domains and has undergone state-of-the-art deduplication, boilerplate removal, hyphenation repair, and repair of run-together sentences (texrex). It is annotated with POS tags (Penn/TreeTagger), lemmas (TreeTagger), chunks (TreeTagger), and dependency relations (MaltParser, experimental). It contains the following metadata: URL, Last-Modified date, crawl date, country and city geolocation, a document quality score, and paragraph boilerplate scores.
Download & web access via Colibri² (free registration required):
https://webcorpora.org/
Corpus information:
http://corporafromtheweb.org/
COW is created by the German Grammar Group at Freie Universität Berlin:
http://hpsg.fu-berlin.de/
All processing specific to web documents was done with texrex:
http://texrex.sourceforge.net/
ENCOW14 includes GeoLite data created by MaxMind, available from:
http://www.maxmind.com.
:::::::::::
Roland Schäfer (ENCOW14/COW), Felix Bildhauer (COW)
Most Relevant NLP Journals via NLPeople
This is a question and follow-up initiated by Eduardo César Garrido Merchán in the LinkedIn NLPeople group.
Meeting of the Association for Computational Linguistics (ACL)
Transactions of the Association for Computational Linguistics (ISSN: 2307-387X)
European Chapter of the ACL (EACL)
North American Chapter of the Association for Computational Linguistics
International Conference on Computational Linguistics (COLING)
Conference on Empirical Methods in Natural Language Processing (EMNLP)
Data & Knowledge Engineering
IEEE Transactions on Knowledge and Data Engineering
Computational Linguistics
International Conference on Computational Linguistics and Intelligent Text Processing
Text REtrieval Conference (TREC)
International Joint Conference on Natural Language Processing
SIGIR
ECIR
CICLing.org
NLP conference calendar: http://www.cs.rochester.edu/~tetreaul/conferences.html
Learner Corpora in Language Testing and Assessment
How to Batch Convert Text Files to Other Formats in Mac via the Terminal
TELL-OP - Kick-off meeting
2014-1-ES01-KA203-004782
Plaza de la Universidad, Murcia, Spain
Accommodation for international delegates
Hotel Arco S. Juan: http://www.arcosanjuan.com/en/
Plaza Ceballos, 10
30003 Murcia, Spain