The main RANLP conference will be preceded by two days of tutorials delivered by distinguished lecturers. We plan 3 tutorials per day, each consisting of two 90-minute sessions separated by a 15-minute break (195 minutes in total), with the following schedule:
| | 8:30 – 11:45 | 13:00 – 16:15 | 17:15 – 20:30 |
| September 10 | Preslav Nakov & Zornitsa Kozareva | Patrick Hanks | Kevin B. Cohen |
| September 11 | Inderjeet Mani | Lucia Specia & Wilker Aziz | Erhard Hinrichs |
Kevin Bretonnel Cohen (University of Colorado School of Medicine)
"Software testing and quality assurance for natural language processing"
Summary: Bugs in software can lead to retracted publications and ruined careers. The first half of this tutorial will cover standard methods in software testing, including black-box testing, white-box testing, and popular testing frameworks. The second half of this tutorial will be devoted to the special problems of testing natural language processing applications, and will show how techniques from field linguistics can be brought to bear on the problem of designing test suites for NLP applications.
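To make the black-box idea concrete, here is a minimal sketch in Python's standard `unittest` framework. The `tokenize` function is a hypothetical component under test (an assumption for illustration, not from the tutorial); black-box tests check its observable behavior against a specification, including boundary cases, without looking at the implementation.

```python
import unittest

def tokenize(text):
    """Hypothetical whitespace tokenizer under test (for illustration only)."""
    return text.split()

class TestTokenize(unittest.TestCase):
    # Black-box tests: assert on specified input/output behavior,
    # not on internal implementation details.
    def test_simple_sentence(self):
        self.assertEqual(tokenize("storms abate"), ["storms", "abate"])

    def test_empty_input(self):
        # Boundary case: empty input should yield no tokens.
        self.assertEqual(tokenize(""), [])

    def test_surrounding_whitespace(self):
        # Boundary case: leading/trailing whitespace must not produce tokens.
        self.assertEqual(tokenize("  hazard a guess "), ["hazard", "a", "guess"])

if __name__ == "__main__":
    unittest.main(argv=["tests"], exit=False)
```

White-box testing would complement this by choosing inputs to exercise specific branches inside `tokenize` itself.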
Patrick Hanks (University of the West of England, Bristol
and University of Wolverhampton)
"Practical Corpus Pattern Analysis"
One of the most important discoveries in linguistics during the past quarter of a century, since very large corpora became available, is that word use is very highly patterned and that each pattern has a distinctive meaning. In this tutorial, we shall look at some of the procedures and problems of pattern identification.
Patterns involve collocational preferences as well as valencies or syntactic structures. So how should they be categorized, stored, and associated with meanings?
A corpus allows you to look at very large numbers of uses of a word, lemma, or phrase and get an overview. When you first look at the corpus evidence in the form of a concordance for a word, the patterns begin to jump out at you:
- What do you hazard? It is theoretically possible to hazard your life or your money, but it is much more common to hazard a guess.
- What sorts of things abate? Typically, storms abate. All sorts of other things are possible, too, but they have something in common, namely that they are problematic in some way.
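The kind of concordance view described above can be sketched in a few lines of Python. This keyword-in-context (KWIC) function and the sample text are illustrative assumptions, not material from the tutorial:

```python
import re

def concordance(text, keyword, width=30):
    """Return keyword-in-context (KWIC) lines for every match of a word."""
    lines = []
    for m in re.finditer(r"\b" + re.escape(keyword) + r"\b", text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Align matches in a central column so collocational patterns stand out.
        lines.append(f"{left:>{width}} [{m.group()}] {right:<{width}}")
    return lines

sample = ("Storms abate. Prices abate when demand falls, "
          "and public anger may abate over time.")
for line in concordance(sample, "abate"):
    print(line)
```

With all occurrences aligned in one column, recurring subjects and objects of the keyword become visible at a glance, which is exactly how patterns "jump out" of a concordance.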
As we study the data more closely, we see more patterns, though it is not always easy to hit on the right level of generalization in describing them.
But then something rather alarming happens. After the regularities have been categorized (however that may be done), we are almost always left with a residue of meaningful uses of a word that are irregular. Such uses are not explicable in terms of the competence/performance distinction, still less in terms of selectional restrictions, and yet (in most cases) these irregular utterances succeed extremely well in communicating a meaning. So how should they be processed? We shall discuss the implications of all this for linguistic theory.
Erhard Hinrichs (University of Tübingen)
"WebLicht - A Service-Oriented Architecture for Multi-lingual Webservices"
Summary: This tutorial will introduce WebLicht, a service-oriented architecture for multi-lingual webservices and incremental annotation of text corpora. WebLicht currently offers webservices for a variety of European languages, including English, Finnish, French, German, Italian, Spanish, and Romanian. The tutorial will cover technical details of the REST-style architecture, its web service repository, as well as the ISO-conformant text corpus format used by WebLicht.
WebLicht is available as part of the ESFRI infrastructure projects CLARIN and D-SPIN, whose mission is to establish an integrated and interoperable research infrastructure of language resources and technology. These projects aim to overcome the current fragmentation of the field by offering a stable, persistent, accessible, and extensible infrastructure.
Zornitsa Kozareva (Information Sciences Institute, University of Southern California)
& Preslav Nakov (National University of Singapore)
"Web Knowledge Extraction and Applications"
Summary: Knowledge acquisition has been of great interest to the research community during the past decades. We will present contemporary methods for knowledge extraction from the Web, based on graph theory, paraphrases and surface markers. We will show how these methods can be applied to various NLP problems including semantic relation extraction, ontology learning, and machine translation. We will also discuss some open problems related to the ontological organization, annotation and evaluation of the harvested knowledge.
Inderjeet Mani (Children's Organization of Southeast Asia)
"Modeling Narrative Structure: Foundations of Computational Narratology"
Summary: The field of narrative (or story) understanding and generation is one of the oldest in NLP and AI, which is hardly surprising, since storytelling is such a fundamental and familiar intellectual activity. In recent years, the demands of interactive entertainment, and interest in the creation of engaging narratives with life-like characters, have provided a fresh impetus to this field. Modeling the structure of narratives requires going beyond the propositional content of sentences in the narrative discourse, representing not only the temporal structure of narrated events (i.e., their ordering and pace) but also their causal relations. The latter can provide, based on reasoning about the beliefs, goals, and plans of particular characters, the basis for the plot (which in the Aristotelian sense is a sequence of events linked by necessity or probability). This tutorial will provide an overview of the principal problems, approaches and challenges faced today in modeling various aspects of narrative structure, offering an empirically oriented and rigorous framework for assessing developments in the field. Along the way, the tutorial will introduce classical narratological concepts and their mapping to and realization in the architectures of key intelligent narrative systems.
Lucia Specia (University of Wolverhampton)
& Wilker Aziz (University of Wolverhampton)
"Fundamental and Advanced Approaches to Statistical Machine Translation"
Summary: In this tutorial we will cover the foundations of word- and phrase-based Statistical Machine Translation (SMT), from word alignment and phrase extraction to parameter estimation, decoding and evaluation. We will also introduce some recent developments in SMT, including syntax-based SMT and discriminative models. Finally, we will discuss state-of-the-art performance, commercial perspectives and open challenges for SMT.
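The word-alignment step mentioned above is classically estimated with expectation-maximization in the style of IBM Model 1. Below is a minimal self-contained sketch on a toy three-sentence German–English corpus (the corpus and iteration count are illustrative assumptions, not the tutorial's material):

```python
from collections import defaultdict

# Toy parallel corpus (source, target) pairs -- assumed data for illustration.
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]

# Initialize the translation table t(e|f) uniformly over the target vocabulary.
e_vocab = {e for _, es in corpus for e in es}
t = defaultdict(lambda: 1.0 / len(e_vocab))

for _ in range(50):  # EM iterations
    count = defaultdict(float)  # expected co-occurrence counts
    total = defaultdict(float)  # expected counts per source word
    # E-step: distribute each target word's probability mass over source words.
    for fs, es in corpus:
        for e in es:
            z = sum(t[(e, f)] for f in fs)  # normalization for this target word
            for f in fs:
                c = t[(e, f)] / z
                count[(e, f)] += c
                total[f] += c
    # M-step: re-estimate t(e|f) from the expected counts.
    for (e, f), c in count.items():
        t[(e, f)] = c / total[f]

print(round(t[("house", "haus")], 3))  # approaches 1.0 as EM converges
```

Even on this tiny corpus, co-occurrence statistics disambiguate the alignments: "haus" comes to translate almost exclusively as "house", which is the effect phrase extraction later builds on.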