Lexicoder: automated content analysis of text

Lexicoder is Java-based, multi-platform software for the automated content analysis of text. It was developed by Lori Young and Stuart Soroka, and programmed by Mark Daku (initially at McGill University, and now at Penn, Michigan, and McGill, respectively).

The current version of the software (2.0) is freely available, for academic use only. Additions and revisions will also be released here as they become available. In addition, the Lexicoder Sentiment Dictionary, a dictionary designed to capture the sentiment of political texts, is available formatted for Lexicoder or WordStat, and is also adaptable to other content-analytic software. Work on Topic Dictionaries, based on the Policy Agendas coding scheme, is also underway.
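As a rough illustration of how a sentiment dictionary of this kind is applied, here is a minimal sketch of dictionary-based tone scoring in Python. The word lists below are invented miniatures, not the actual Lexicoder Sentiment Dictionary, which is far larger and more nuanced:

```python
import re

# Hypothetical miniature lexicons for illustration only; the real
# Lexicoder Sentiment Dictionary contains thousands of entries.
POSITIVE = {"good", "progress", "support", "succeed"}
NEGATIVE = {"bad", "crisis", "oppose", "fail"}

def net_tone(text: str) -> float:
    """Return (positive hits - negative hits) per 100 tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 100.0 * (pos - neg) / len(tokens)
```

For the sentence "The reforms made good progress despite the crisis", two positive hits and one negative hit over eight tokens yield a net tone of 12.5.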

Via the LinkedIn group The WebGenre R&D Group.

CFP Terminology & Artificial Intelligence 2015


Terminology and Artificial Intelligence 2015
4 November – 6 November 2015
University of Granada, Spain


Terminology and Artificial Intelligence (TIA) 2015 will highlight the close connection between multilingual terminology, ontologies, and the representation of specialized knowledge. Knowledge, as regarded in Terminology, is something more complex than a simple hierarchy or a thesaurus-like structure. In this sense, ontologies, understood as a shared conceptualization of a domain that can be communicated between people and/or systems, are better suited for accounting for multilinguality and contextual constraints. The link between Terminology and knowledge representation has been widely acknowledged with the advent of multilingual ontologies.

This is particularly relevant since today’s networked society has generated an increasing number of contexts where multilingualism challenges current knowledge representation methods and techniques. To meet these challenges, it is necessary to deal with semantics since information can be organized, presented, and searched, based on meaning and not just text. Ideally, this would mean that language-independent specialized knowledge could be accessed across different natural languages. There is thus the urgent need for high-quality multilingual knowledge resources that are able to bridge communication barriers, and which can be linked and shared.

Such issues can only be successfully addressed with creative collaborative solutions within disciplines, such as knowledge engineering, terminology, ontology engineering, cognitive sciences, corpus lexicology, and computational linguistics. Accordingly, the TIA 2015 Conference will provide a forum for interdisciplinary research that focuses on the intersection of different disciplines dealing with terminology, multilingualism, lexicology, ontology, and knowledge representation. Papers may address both theoretical questions and methodological aspects on these issues, as well as interdisciplinary approaches developed to facilitate convergence and co-operation in terminological aspects of importance to an increasingly multilingual society.

TIA 2015 solicits both regular papers (8 pages), which present significant work, and short papers (4 pages), which typically present work in progress or a smaller, focused contribution. Regardless of the language of the paper (English, Spanish, or French), all paper presentations will be in English. The submission deadline is 15 June 2015. See the conference webpage for more specific submission details.

1. Terminology and ontology acquisition and management
· Applying pattern recognition to enriching terminological resources
· Lexicons, thesauri and ontologies as semantic resources
· Lexicons and ontologies as means for knowledge transfer
· Reusing, standardizing and merging terminological or ontological resources
· Multilingual terminology extraction
· Multilinguality and multimodality in terminological resources
· Management of language resources
2. Terminology and knowledge representation
· Ontological semantics and linguistics
· Ontology localization
· Development of multimedia terminological resources
· Terminology alignment in parallel corpora and other lexical resources
· Representation of terms and conceptual relations in knowledge-based applications
· Comparative studies of terminological resources and/or ontological resources
· Terminological resources in the 21st century
· Harmonization of format and standards in terminological resources
3. Terminology and ontologies for applications
· Interoperability and reusability in knowledge-based tools and applications
· Models and metamodels in annotating semantic and terminological resources
· New R&D directions in terminology for industrial uses and needs
· Terminology for machine translation and natural language processing

Featured plenary speakers
Paul Buitelaar, National University of Ireland, Galway, Ireland
Ricardo Mairal Usón, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain

Pamela Faber, University of Granada
Thierry Poibeau, CNRS

The PROGRAMME COMMITTEE members are distinguished experts from all over the world.

See the TIA 2015 website: http://lexicon.ugr.es/tia2015/Submission.html

Paper submissions (long and short papers): 15 June 2015
Notification to authors: 4 September 2015
Final camera-ready paper: 24 September 2015
Conference: 4-6 November 2015

University of Granada,
Faculty of Translation and Interpreting,
18071 Granada, Spain

Contact information: termai2015@gmail.com

Message distributed via the AESLA mailing list.

Text Mining and Applications (TEMA’15)


Text Mining and Applications (TEMA’15) Track of EPIA’15

TeMA 2015 will be held at the 17th Portuguese Conference on Artificial Intelligence (EPIA 2015) taking place at the University of Coimbra, Portugal, from 8th to 11th September 2015. This track is organized under the auspices of the Portuguese Association for Artificial Intelligence (APPIA).

EPIA 2015 URL: http://epia2015.dei.uc.pt

This announcement contains:

[1] Track Description; [2] Topics of Interest; [3] Special Interests; [4] Important Dates; [5] Paper Submission and Formatting Instructions; [6] Track Fees; [7] Organizing Committee; [8] Program Committee and [9] Contacts.

[1] Track Description

Human languages are complex by nature, and efforts based on purely symbolic approaches alone have been unable to provide fully satisfying results. Text Mining and Machine Learning techniques applied to texts (raw or annotated) have brought new insights and completely shifted the approaches to Human Language Technologies. Both approaches, symbolic and statistical, when duly integrated, have shown the capability to bridge the gap between language theories and the effective use of languages, and can enable important applications in real-world heterogeneous environments such as the Web.

The most natural form of written information is raw, unstructured text. The huge amount of this kind of textual information circulating on the Internet nowadays (in an increasing number of different languages) leads us to use and investigate systems, algorithms and models for mining texts. As a consequence, Text Mining is an active research area that is continuously broadening worldwide, fostering renewed interest in languages other than the most common ones such as English, French, German and, increasingly, Chinese. This 6th biennial Text Mining and Applications track will provide, as in previous editions of the TeMA tracks within the EPIA conferences, a venue for researchers to present and share their work in intelligent computational technologies applied to written human languages. TeMA 2015 is a forum for researchers working in Human Language Technologies, i.e. Natural Language Processing (NLP), Computational Linguistics (CL), Natural Language Engineering (NLE), Text Mining (TM) and related areas.

Authors are invited to submit their papers on any of the issues identified in section [2].

[2] Topics of Interest

Topics include but are not limited to:

Text Mining:

– Language Models

– Multi-word Units

– Lexical Knowledge Acquisition

– Word and Multi-word Sense Disambiguation

– Acquisition and Usage of Language Resources

– Lexical Cohesion

– Sentiment Analysis

– Word and Multi-word Translation Extraction

– Textual Entailment

– Text Clustering and Classification

– Algorithms and Data Structures for Text Mining

– Multi-Faceted Text Analysis: Opinions, Time, Space

– Evaluation of all the previous


– Social Network Analysis

– Machine Translation

– Automatic Summarization

– Information Extraction / Intelligent Information Retrieval

– Multilingual access to multilingual Information

– E-training, E-learning and Question-Answering Systems

– Web Mining

[3] Special Interests

The evolution of the Web has drastically changed the focus of Text Mining, as most texts are now small in size. While in the past the focus was on dealing with very large corpora of long texts, the new reality is huge collections of tweets or posts on social media that contain very few words each, in a clearly multilingual environment. This is usually referred to as Big Data (here, Big Textual Data). As a consequence, new trends have recently appeared in Text Mining, for example keyword extraction, named-entity recognition, novelty detection and event identification.

Moreover, the growing interest in and quality of Wikipedia have made it possible to include knowledge on a large scale in Text Mining applications. As such, much research has focused on the correct use of knowledge bases, for example for entity information retrieval, word sense disambiguation and ephemeral clustering.
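As a toy illustration of the first of these trends, keyword extraction over collections of short texts can be sketched with a naive frequency-based approach; the stopword list and the strategy of pooling counts across posts are illustrative assumptions, far simpler than research-grade methods:

```python
import re
from collections import Counter

# Tiny illustrative stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "on", "for"}

def keywords(posts, k=3):
    """Pool token counts across many short texts (tweets, posts) and
    return the k most frequent non-stopword tokens as keywords.
    Pooling counters the sparsity of any single short text."""
    counts = Counter()
    for post in posts:
        counts.update(t for t in re.findall(r"[a-z]+", post.lower())
                      if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]
```

Aggregating over the whole collection, rather than ranking terms per post, is what makes even this crude method workable on texts of only a few words each.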

[4] Important Dates

March 9, 2015: Paper submission deadline

April 27, 2015: Notification of paper acceptance

June 1, 2015: Deadline for camera-ready versions

September 8-11, 2015: Conference dates

[5] Paper Submission and Formatting Instructions

Submissions must be original and not published elsewhere. Papers should not exceed twelve (12) pages in length and must adhere to the formatting instructions of the conference. Each submission will be peer reviewed by at least three members of the Program Committee. The reviewing process is double blind, so authors should remove names and affiliations from the submitted papers and must take reasonable care to ensure anonymity during the review process. References to the authors' own work may be included, as long as they are referred to in the third person. Acceptance will be based on the paper's significance, technical quality, clarity, relevance and originality. Accepted papers must be presented at the conference by one of the authors, and at least one author of each accepted paper must register for the conference.

All accepted papers will be published by Springer in a volume of Lecture Notes in Artificial Intelligence (LNAI).

Full-length papers cannot exceed twelve (12) pages. All papers should be prepared according to the formatting instructions of Springer LNAI series (http://www.springer.de/comp/lncs/authors.html/) and must be submitted in PDF (Adobe’s Portable Document Format) through EPIA 2015 Conference Management.

[6] Track Fees:

Track participants must register for the main EPIA 2015 conference. No extra fee is charged for attending this track.

[7] Organizing Committee:

Joaquim F. Ferreira da Silva. Universidade Nova de Lisboa, Portugal.

Vitor R. Rocio. Universidade Aberta, Portugal.

Gaël Dias. University of Caen Basse-Normandie, France.

José G. Pereira Lopes. Universidade Nova de Lisboa, Portugal.

Hugo Gonçalo Oliveira. Universidade de Coimbra, Portugal.

[8] Program Committee:

Adam Jatowt (University of Kyoto, Japan)
Adeline Nazarenko (University of Paris 13, France)
Aline Villavicencio (Universidade Federal do Rio Grande do Sul, Brazil)
Antoine Doucet (University of Caen, France)
António Branco (Universidade de Lisboa, Portugal)
Béatrice Daille (University of Nantes, France)
Belinda Maia (Universidade do Porto, Portugal)
Brigitte Grau (LIMSI, France)
Bruno Cremilleux (University of Caen, France)
Christel Vrain (Université d’Orléans, France)
Eric de La Clergerie (INRIA, France)
Gabriel Pereira Lopes (Universidade Nova de Lisboa, Portugal)
Gaël Dias (University of Caen Basse-Normandie)
Gregory Grefenstette (CEA, France)
Hugo Oliveira (Universidade de Coimbra)
Irene Rodrigues (Universidade de Évora, Portugal)
Isabelle Tellier (University of Orléans, France)
Joaquim Ferreira da Silva (Universidade Nova de Lisboa)
João Balsa (Universidade de Lisboa, Portugal)
João Magalhães (Universidade Nova de Lisboa)
Katarzyna Wegrzyn-Wolska (ESIGETEL, France)
Lucinda Carvalho (Universidade Aberta, Portugal)
Manuel Vilares Ferro (University of Vigo, Spain)
Marc Spaniol (University of Caen Basse-Normandie)
Marcelo Finger (Universidade de São Paulo, Brazil)
Maria das Graças Volpe Nunes (Universidade de São Paulo, Brazil)
Mark Lee (University of Birmingham, United Kingdom)
Mohand Boughanem (University of Toulouse III, France)
Nuno Mamede (Universidade Técnica de Lisboa, Portugal)
Nuno Marques (Universidade Nova de Lisboa, Portugal)
Pablo Gamallo (Faculdade de Filologia, Santiago de Compostela, Spain)
Paulo Quaresma (Universidade de Évora, Portugal)
Pavel Brazdil (University of Porto, Portugal)
Pierre Zweigenbaum (CNRS-LIMSI, France)
Pushpak Bhattacharyya (Indian Institute of Technology Bombay, India)
Spela Vintar (University of Ljubljana, Slovenia)
Tomaz Erjavec (Jozef Stefan Institute, Slovenia)
Vitor Jorge Rocio (Universidade Aberta, Portugal)

Most Relevant NLP Journals via NLPeople

This is a question and follow-up initiated by Eduardo César Garrido Merchán in the Linkedin NLPeople group.

Meeting of the Association for Computational Linguistics (ACL)
Transactions of the Association for Computational Linguistics (ISSN: 2307-387X)
European Chapter of the ACL (EACL)
North American Chapter of the Association for Computational Linguistics
International Conference on Computational Linguistics (COLING)
Conference on Empirical Methods in Natural Language Processing (EMNLP)
Data & Knowledge Engineering.
IEEE Transactions on Knowledge and Data Engineering
Computational Linguistics
International Conference on Computational Linguistics and Intelligent Text Processing
Text REtrieval Conference (TREC)
International Joint Conference on Natural Language Processing

NLP conference calendar: http://www.cs.rochester.edu/~tetreaul/conferences.html

Measuring linguistic complexity: A multidisciplinary perspective


All presentations are available on the workshop website.


The Linguistics Research Unit of the Institute of Language and Communication hosted a workshop on ‘Measuring linguistic complexity: A multidisciplinary perspective’ on Friday 24 April, 2015. 

The main objective of the workshop was to bring together specialists from a number of different but related fields to discuss the construct of linguistic complexity and how it is typically measured in their respective research fields.

The event was structured around keynote presentations by five distinguished scholars:

  • Philippe Blache (CNRS & Universite d’Aix-Marseille, France): Evaluating complexity in syntax: a computational model for a cognitive architecture
  • Alex Housen (Vrije Universiteit Brussel, Belgium): L2 complexity – A Difficult(y) Matter
  • Frederick J. Newmeyer (University of Washington, University of British Columbia, Simon Fraser University): The question of linguistic complexity: historical perspective
  • Advaith Siddharthan (University of Aberdeen, UK): Automatic Text Simplification and Linguistic Complexity Measurements
  • Benedikt Szmrecsanyi (KU Leuven, Belgium): Measuring complexity in contrastive linguistics and contrastive dialectology

A round table closed the workshop.

Details about the event are available on the workshop website: http://www.uclouvain.be/en-linguistic-complexity.html

The number of participants is limited. Participation is free of charge but registration is required before Friday 3rd April (via our registration form at http://www.uclouvain.be/en-505315.html). 

Thomas François (Centre de traitement automatique du langage) & Magali Paquot (Centre for English Corpus Linguistics)


A multidimensional construct: Bulté & Housen (2012:23)

Shared challenges, shared opportunities

Where is the place of theory here?

Do we need new measures? Do we need to validate existing ones?

The many facets of complexity.

Formal linguistics may be a good starting point but may not have much to offer.

Building a research community?


Release of SciSumm14, an annotated corpus for scientific summarization

From the Corpora list

SciSumm14 is an open repository with a corpus of ACL Computational Linguistics research papers and their annotations, contributed to the public by the Web IR / NLP Group at the National University of Singapore (WING-NUS).  This corpus is offered as a part of the SciSumm Shared Task in TAC 2014. The SciSumm Shared Task is organized under the BiomedSumm track. It follows the basic structure and guidelines of the Biomedical Summarization Track and adapts them for annotating and creating a corpus of training topics from computational linguistics (CL) research papers. 

The purpose behind the release of this corpus is to highlight the challenges and relevance of the scientific summarization problem, support research in automatic scientific document summarization and provide evaluation resources to push the current state of the art. This corpus offers a “community” summary of a reference paper based on its collection of citing sentences, called citances. Furthermore, each of the citances is mapped to referenced text in the reference paper and tagged with the information facet it represents.
This corpus is expected to be of interest to a broad community, including those working in computational linguistics and NLP, text summarization, discourse structure in scholarly discourse, paraphrase, textual entailment, and/or text simplification.
Dr. Kokil Jaidka (Wee Kim Wee School of Communication and Information, Nanyang Technological University), koki0001@e.ntu.edu.sg
Dr. Min-Yen Kan (Dept. of Computer Science, School of Computing, National University of Singapore), kanmy@comp.nus.edu.sg
Muthu Kumar Chandrasekaran (Dept. of Computer Science, School of Computing, National University of Singapore), muthu.chandra@comp.nus.edu.sg
Ankur Khanna (Web IR / NLP Group, National University of Singapore), khanna89ankur@gmail.com
1. The corpus was created by randomly sampling ten documents from the ACL Anthology corpus and selecting their citing papers. It is available for download at https://github.com/WING-NUS/scisumm-corpus
2. It is organized into “topic” folders. Each “topic” is a Reference Paper (RP), and the folder contains up to ten Citing Papers (CPs) that all contain citations to the RP. In each CP, the text spans (i.e., citances) that pertain to a particular citation to the RP have been identified.
3. Most text files were created from the PDF files using Adobe Acrobat. The remaining files were converted using the GATE 8.0 open source software. For more details, see the README at https://github.com/WING-NUS/scisumm-corpus
4. Inter-annotator agreement was used to assess the homogeneity and quality of the coding of citances and references, and disagreements were resolved through discussion.
5. The ACL ids and the titles of reference papers are given below:
ACL-anthology-id    Title of the paper
H89-2014    Augmenting a Hidden Markov Model for Phrase-Dependent Word Tagging
X96-1048    Overview of Results of the MUC-6 Evaluation
E03-1020    Discovering Corpus-Specific Word Senses
C90-2039    Strategic Lazy Incremental Copy Graph Unification
J00-3003    Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech
P98-1081    Improving Data Driven Wordclass Tagging by System Combination
N01-1011    A Decision Tree of Bigrams is an Accurate Predictor of Word Sense
H05-1115    Using Random Walks for Question-focused Sentence Retrieval
J98-2005    Estimation of Probabilistic Context-Free Grammars
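The topic-folder layout described above can be traversed with a few lines of Python. The file-name pattern used here (`*.txt` per topic folder) is an assumption for illustration; consult the README in the scisumm-corpus repository for the authoritative structure:

```python
from pathlib import Path

def list_topics(corpus_root: str) -> dict:
    """Map each topic folder (named by its Reference Paper's ACL id)
    to the sorted text files it contains (RP plus Citing Papers).
    The *.txt glob is an assumption about the corpus layout."""
    root = Path(corpus_root)
    topics = {}
    for topic_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        topics[topic_dir.name] = sorted(f.name for f in topic_dir.glob("*.txt"))
    return topics
```

Iterating per topic keeps each Reference Paper together with its citing papers, which mirrors how the citance annotations are organized.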