1st Intl. NLP for Informal Text - Deadline 17/4


The 1st International Workshop on Natural Language Processing for Informal Text (NLPIT 2015)
In conjunction with The International Conference on Web Engineering (ICWE 2015)
June 23, 2015, Rotterdam, The Netherlands
http://wwwhome.cs.utwente.nl/~badiehm/nlpit2015/

Overview
The rapid growth of Internet usage over the last two decades has added new challenges to understanding informal user-generated content (UGC) on the Internet. Textual UGC refers to textual posts on social media, blogs, emails, chat conversations, instant messages, forums, reviews, or advertisements that are created by end users of an online system. A large portion of the language used in textual UGC is informal. Informal text is a style of writing that disregards grammar rules and uses a mixture of abbreviations and context-dependent terms. Applying state-of-the-art Natural Language Processing approaches directly to informal text typically results in significantly degraded performance, for the following reasons: the lack of sentence structure; the lack of sufficient context; the rarely seen entities involved; the noisy, sparse content of users’ contributions; and the untrusted facts contained. The aim of this workshop is to draw researchers' attention to the opportunities and challenges involved in informal text processing. In particular, we are interested in discussing informal text modeling, normalization, mining, and understanding, in addition to the various application areas in which UGC is involved.

Topics

We invite submissions on topics that include, but are not limited to, the following core NLP approaches for informal UGC: language identification, classification, clustering, filtering, summarization, tokenization, segmentation, morphological analysis, POS tagging, parsing, named entity extraction, named entity disambiguation, relation/fact extraction, semantic annotation, sentiment analysis, language normalization, informality modeling and measuring, language generation, handling uncertainties, machine translation, ontology construction, dictionary construction, etc.

Submission

Authors are invited to submit original work that has not been submitted to another conference or workshop. Workshop submissions can be full papers or short papers. Paper length should not exceed 12 pages for full papers and 6 pages for short papers. All papers should follow Springer’s LNCS format. Papers in PDF can be submitted via the EasyChair Conference System: https://easychair.org/conferences/?conf=nlpit2015. Each submission will receive, in addition to a meta-review, at least two double-blind peer reviews. Each full paper will get 25 minutes of presentation time. Short papers will get 5 minutes of presentation time in addition to a poster. Besides papers, we also plan to have an invited talk by a renowned scientist on a topic relevant to the workshop. Workshop proceedings will be published as part of the ICWE 2015 workshop proceedings. To contact the NLPIT 2015 organization team, please send an e-mail to: nlpit2015@easychair.org.

Deadlines

– Submission deadline: April 17, 2015
– Notification deadline: May 17, 2015
– Camera-ready version: May 24, 2015
– Workshop date: June 23, 2015

Message distributed through the Corpora list

Ideology in corporate language

Ruth Breeze

Ideology in corporate language: discourse analysis using Wmatrix3

2013 Annual Reports from leading companies (16) in financial services, mining, food and pharmaceuticals

Parts: first part, non-technical, discursive, visually interesting

Reference corpus: at first the BNC Sampler Business and BNC Informative texts, but then only BNC Business

Use of semantic categories

Three case studies: size (big), time (begin) and cause and effect

Size: Focus on growth, large, expanding, substantial. Not only adjectives are interesting here.

Conclusions:

Ideology of cause and effect

Dynamic approach to time

Emphasis on size and importance

Salient semantic areas: investigation, tough, strong, attentive, help & give, in power, belonging to a group

Differences: only in domain/topic-focus, probably different stresses on newness and green economy


CFP Terminology & Artificial Intelligence 2015

TIA 2015: FIRST CALL FOR PAPERS

—————————————————–
Terminology and Artificial Intelligence 2015
4 November – 6 November 2015
University of Granada, Spain

http://lexicon.ugr.es/tia2015/Home.html

Terminology and Artificial Intelligence (TIA) 2015 will highlight the close connection between multilingual terminology, ontologies, and the representation of specialized knowledge. Knowledge, as regarded in Terminology, is something more complex than a simple hierarchy or a thesaurus-like structure. In this sense, ontologies, understood as a shared conceptualization of a domain that can be communicated between people and/or systems, are better suited for accounting for multilinguality and contextual constraints. The link between Terminology and knowledge representation has been widely acknowledged with the advent of multilingual ontologies.

This is particularly relevant since today’s networked society has generated an increasing number of contexts where multilingualism challenges current knowledge representation methods and techniques. To meet these challenges, it is necessary to deal with semantics, so that information can be organized, presented, and searched based on meaning and not just text. Ideally, this would mean that language-independent specialized knowledge could be accessed across different natural languages. There is thus an urgent need for high-quality multilingual knowledge resources that are able to bridge communication barriers, and which can be linked and shared.

Such issues can only be successfully addressed with creative collaborative solutions within disciplines, such as knowledge engineering, terminology, ontology engineering, cognitive sciences, corpus lexicology, and computational linguistics. Accordingly, the TIA 2015 Conference will provide a forum for interdisciplinary research that focuses on the intersection of different disciplines dealing with terminology, multilingualism, lexicology, ontology, and knowledge representation. Papers may address both theoretical questions and methodological aspects on these issues, as well as interdisciplinary approaches developed to facilitate convergence and co-operation in terminological aspects of importance to an increasingly multilingual society.

TIA 2015 solicits both regular papers (8 pages), which present significant work, and short papers (4 pages), which typically present work in progress or a smaller, focused contribution. Regardless of the language of the paper (English, Spanish, or French), all paper presentations will be in English. The submission deadline is June 15. See the conference webpage for more specific submission details.

TOPICS
1. Terminology and ontology acquisition and management
· Applying pattern recognition to enriching terminological resources
· Lexicons, thesauri and ontologies as semantic resources
· Lexicons and ontologies as means for knowledge transfer
· Reusing, standardizing and merging terminological or ontological resources
· Multilingual terminology extraction
· Multilinguality and multimodality in terminological resources
· Management of language resources
2. Terminology and knowledge representation
· Ontological semantics and linguistics
· Ontology localization
· Development of multimedia terminological resources
· Terminology alignment in parallel corpora and other lexical resources
· Representation of terms and conceptual relations in knowledge-based applications
· Comparative studies of terminological resources and/or ontological resources
· Terminological resources in the 21st century
· Harmonization of format and standards in terminological resources
3. Terminology and ontologies for applications
· Interoperability and reusability in knowledge-based tools and applications
· Models and metamodels in annotating semantic and terminological resources
· New R&D directions in terminology for industrial uses and needs
· Terminology for machine translation and natural language processing

Featured plenary speakers
Paul Buitelaar, National University of Ireland, Galway, Ireland
Ricardo Mairal Usón, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain

TIA 2015 CHAIRS
Pamela Faber, University of Granada
Thierry Poibeau, CNRS

The PROGRAMME COMMITTEE members are distinguished experts from all over the world.

SUBMISSION INFORMATION
See the TIA 2015 website: http://lexicon.ugr.es/tia2015/Submission.html

IMPORTANT DATES
Paper submissions (long and short papers): 15 June 2015
Notification to authors: 4 September 2015
Final camera-ready paper: 24 September 2015
Conference: 4-6 November 2015

VENUE:
University of Granada,
Faculty of Translation and Interpreting,
18071 Granada, Spain

Contact information: termai2015@gmail.com

Message distributed through the AESLA list

the logDice score in Word Sketches

The Dice score gives very good results for collocation candidates. The only problem is that the values of the Dice score are usually very small numbers. We have defined logDice to fix this problem.

Values of logDice have the following features:
– The theoretical maximum is 14, which is reached when all occurrences of X co-occur with Y and all occurrences of Y co-occur with X. Usually the value is less than 10.

– A value of 0 means there is less than 1 co-occurrence of XY per 16,000 X or 16,000 Y. We can say that negative values mean there is no statistical significance of the XY collocation.

– When comparing two scores, plus 1 point means the collocation is twice as frequent, and plus 7 points means it is roughly 100 times as frequent.

– The score does not depend on the total size of the corpus. The score combines the relative frequencies of XY in relation to X and Y.

All these characteristics are useful orientation points for any field linguist working with collocation candidate lists.

From: A Lexicographer-Friendly Association Score, by Pavel Rychlý
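
The features listed above follow from the definition of the score, which is logDice = 14 + log2(2·f_xy / (f_x + f_y)), where f_xy is the number of co-occurrences of X and Y, and f_x and f_y are the frequencies of X and Y. The following is a minimal Python sketch of that formula. The function name `log_dice` and the sample counts are illustrative assumptions, not part of Word Sketches or any Sketch Engine API.

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """Return 14 + log2(Dice), where Dice = 2 * f_xy / (f_x + f_y).

    f_xy: co-occurrence count of X and Y (hypothetical input)
    f_x, f_y: individual frequencies of X and Y (hypothetical inputs)
    """
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("all counts must be positive")
    dice = 2 * f_xy / (f_x + f_y)
    return 14 + math.log2(dice)


# Every X co-occurs with Y and vice versa: Dice = 1, so the score is the maximum.
print(log_dice(50, 50, 50))  # 14.0

# Doubling the co-occurrence count adds about one point to the score.
print(log_dice(20, 1000, 1000) - log_dice(10, 1000, 1000))
```

The corpus size never appears as an argument, which is why the score is comparable across corpora of different sizes. A score of 0 corresponds to a Dice value of 2^-14, that is, 1/16,384, which is the "1 per 16,000" threshold mentioned in the list.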

A taxonomy of learner searches in DDL


Learners’ search patterns during corpus-based focus-on-form activities: A study on hands-on concordancing

Authors: Pérez-Paredes, Pascual; Sánchez-Tornel, María; Calero, Jose M. Alcaraz
Source: International Journal of Corpus Linguistics, Volume 17, Number 4, 2012, pp. 482-515(34)
Publisher: John Benjamins Publishing Company

Abstract:
Our research explores the search behaviour of EFL learners (n=24) by tracking their interaction with corpus-based materials during focus-on-form activities (Observe, Search the corpus, Rewriting). One set of learners made no use of web services other than the BNC during the central Search the corpus activity, while the other set resorted to other web services and/or consultation guidelines. The performance of the second group was higher. The learners’ formulation of corpus queries on the BNC was unsophisticated, and the students tended to use the BNC search interface to a great extent in the same way as they used Google or similar services. Our findings suggest that careful consideration should be given to the cognitive aspects concerning the initiation of corpus searches, the role of computer search interfaces, as well as the implementation of corpus-based language learning. Our study offers a taxonomy of learner searches that may be of interest in future research.