The main RANLP conference will be preceded by two days of tutorials delivered by distinguished lecturers. We plan 6 tutorials, each with 150 minutes of lecture time, distributed as 90 min talk + 30 min break + 60 min talk, and scheduled so that all participants can attend all tutorials.


Monday 3/09

9:00 - 12:00: Ivana Kruijff-Korbayova, University of Saarbruecken, Information structure and its interaction with discourse semantics

13:30 - 16:30: Allan Ramsay, UMIST, Manchester, Computational aspects of discourse processing

17:00 - 20:00: Wolfgang Menzel, University of Hamburg, Constraint-based parsing

Tuesday 4/09

9:00 - 12:00: Andrei Mikheev, University of Edinburgh, Information extraction and named entity recognition

13:30 - 16:30: Kemal Oflazer, Sabanci University, Istanbul, Finite state language processing and computational morphology

17:00 - 20:00: Nicolas Nicolov, IBM Watson Research Center, Current trends in NL dialog systems


Ivana Kruijff-Korbayova, University of Saarbruecken, Information structure and its interaction with discourse semantics

Information Structure (IS) concerns how speakers/writers realize content within a sentence in a way that reflects their intentions and their formulation of the hearer/reader's attentional state. By means of a particular IS, a speaker/writer presents some parts of the sentence meaning as context-dependent, and others as context-affecting.

The goals of the course are:

Various approaches to IS exist, which use diverging terminologies. Among the most often distinguished notions in the area of IS (at various levels) are Theme-Rheme (Mathesius, Firbas, Danes, Halliday, Bolinger, Steedman), Topic-Comment (Chomsky, Jackendoff), Topic-Focus (Sgall and Hajicova, Buering), Ground(=Link+Tail)-Focus (Vallduvi), Presupposition-Focus (Chomsky, Jackendoff, Karttunen & Peters, Creswell, Selkirk, Rooth), Background-Focus (von Stechow, Krifka, Steedman), Given-New (Halliday), contextually bound-nonbound (Sgall and Hajicova). We will briefly introduce these approaches and compare the different terminologies by showing how they evolved from a common source (Mathesius) and interacted with other areas of (formal) syntax, semantics and pragmatics.

IS can be realized by various means, and often by an interplay thereof. This includes intonation (i.e., accenting, de-accenting, and phrasing), word order, syntactic constructions and morphological marking. Languages differ in the extent to which they employ these means. We will explain how IS is realized by various means in various languages (English and Czech as two very different cases will be of central interest, but other languages will be included too).

Since IS reflects the attentional state(s) of discourse participants, and attention transcends the sentence, it is also appropriate to ask whether, and if so how, IS projects beyond the sentence into discourse. We will discuss two aspects of the interaction between sentence-level IS and discourse context: (i) dependence and influence of IS on the cognitive status of discourse referents (e.g., activation/accessibility/inactivation, cf. Chafe); (ii) IS-sensitive interpretation of particular adverbials and discourse connectives, requiring access to suitable alternatives in the context (e.g., `only', `even', `too', `although/however', `otherwise').

The concern with IS and its interplay with the larger discourse context is not only justified on theoretical grounds: experience with applications such as translating telephony and interactive query-answering makes it painfully clear that a theory relating IS and discourse semantics is essential for accurate Natural Language Processing. Fortunately, formal accounts addressing these issues have started to emerge, and some are beginning to be embodied in computational models of discourse processing. This tutorial aims at providing enough theoretical background to understand and appreciate such approaches.

Unfortunately, there is no suitable (set of) coursebook(s) in which the issues addressed in the tutorial would be comprehensively handled. Knud Lambrecht's Information Structure and Sentence Form (1994, Cambridge University Press) discusses in detail a number of concepts involved in information structuring, and thus provides useful linguistic background. In addition, a set of relevant papers for further reading will be suggested during the tutorial.

No specific prerequisites, though awareness of basic syntactic, semantic and pragmatic notions would be an asset.


Allan Ramsay, UMIST, Manchester, Computational aspects of discourse processing

Linguistic theories of discourse structure have to be combined with computational models of semantics before they can be used in NLP systems. To do this, you have to have a semantic model which is open to the kinds of attitudinal information carried by discourse markers, and you have to be able to extract this information from texts, where it is often only implicit or at best encoded by structural clues. The tutorial will address the following issues:

The aim of the tutorial is to give participants an understanding of the issues that arise when you try to produce computational models that can deal with the differences between `it was given to me' and `I was given it', and between `it was given to me' and `it was given to me'. The discussion will show how to produce compositional treatments of these phenomena, and will provide links to work on linguistic acts and AI planning theory.


Halliday's Introduction to Functional Grammar provides a good background introduction to some of the ideas discussed in this tutorial and in Ivana Kruijff-Korbayová's tutorial. Kamp and Reyle's From Discourse to Logic is a well-known introduction to one version of dynamic semantics (not the one that we will be using, but close enough to be useful). Walker, Joshi and Prince's Centering Theory in Discourse deals with centering theory at length.


This tutorial will illustrate how some of the issues covered in Ivana Kruijff-Korbayová's tutorial on Information structure and its interaction with discourse semantics may be treated within a computational framework, so it may be a good idea to attend that tutorial if you are not already familiar with the relevant material. An understanding of some version of dynamic semantics would also be extremely useful.

Wolfgang Menzel, University of Hamburg, Constraint-based parsing

Constraint satisfaction techniques offer an alternative view of grammar modelling. Instead of providing a generative description of possible linguistic structures, conditions for their acceptability are specified. Because a generative backbone is almost completely avoided, the resulting parsing system is particularly well suited to dealing with unexpected (i.e. ill-formed) constructions. Different approaches can be distinguished according to

Particular emphasis is given to constraint optimization techniques and their use to develop robust parsing procedures which are error-sensitive and time-adaptive. Applications in the area of spoken language systems and foreign language tutoring systems are discussed.
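The idea of parsing by constraint optimization can be illustrated with a minimal sketch. It is not the tutorial's method: the word categories, the constraints, their weights and the exhaustive search are all assumptions chosen for brevity. Each word picks a head (or None for the root), every violated constraint adds its weight to a cost, and the cheapest assignment wins, so an ill-formed sentence still receives an analysis, just at a higher cost.

```python
from itertools import product

# Hypothetical constraints: each returns True when word i violates it.
def det_needs_noun_head(i, tagged, heads):
    return tagged[i][1] == "DET" and (
        heads[i] is None or tagged[heads[i]][1] != "NOUN")

def det_precedes_head(i, tagged, heads):
    return tagged[i][1] == "DET" and heads[i] is not None and heads[i] < i

def noun_needs_verb_head(i, tagged, heads):
    return tagged[i][1] == "NOUN" and (
        heads[i] is None or tagged[heads[i]][1] != "VERB")

def verb_is_root(i, tagged, heads):
    return tagged[i][1] == "VERB" and heads[i] is not None

# Hypothetical weights: a higher weight marks a more serious violation.
WORD_CONSTRAINTS = [
    (3, det_needs_noun_head),
    (1, det_precedes_head),
    (2, noun_needs_verb_head),
    (2, verb_is_root),
]
ROOT_PENALTY = 10  # charged for each missing or surplus root


def cost(tagged, heads):
    """Sum the weights of all constraints violated by a head assignment."""
    roots = sum(head is None for head in heads)
    total = ROOT_PENALTY * abs(roots - 1)
    for i in range(len(tagged)):
        for weight, violated in WORD_CONSTRAINTS:
            if violated(i, tagged, heads):
                total += weight
    return total


def parse(tagged):
    """Return (cost, heads) for the cheapest head assignment.

    Exhaustive search is only feasible for very short inputs, and this
    sketch does not check for cycles among the head links.
    """
    n = len(tagged)
    options = [[None] + [h for h in range(n) if h != i] for i in range(n)]
    candidates = ((cost(tagged, heads), heads) for heads in product(*options))
    return min(candidates, key=lambda pair: pair[0])


well_formed = [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")]
ill_formed = [("dog", "NOUN"), ("the", "DET"), ("barks", "VERB")]
print(parse(well_formed))  # (0, (1, 2, None))
print(parse(ill_formed))   # (1, (2, 0, None))
```

The misplaced determiner in the second sentence is still attached to its noun; the analysis merely costs more, which is the sense in which such a parser is both robust and error-sensitive.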

Andrei Mikheev, University of Edinburgh, Named Entity Recognition: Task Profile

In this tutorial I will describe the task of Named Entity Recognition (NER) and show its applicability to various tasks of NLP and IR. I will present three different paradigms for designing NER systems: list lookup, pattern-based and statistical. I will compare the advantages and disadvantages of these approaches. One of the main emphases of the tutorial will be on methods which allow NER systems to avoid heavy dependence on pre-existing resources such as lists. I will present a Document Centered Approach which tries to infer new knowledge from the document under processing and apply this knowledge during the processing itself.
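As a rough illustration of how list lookup and pattern-based recognition can be combined with a document-centered step, here is a minimal sketch. It is not the system presented in the tutorial: the gazetteer, the title pattern and the function name `recognize` are hypothetical, and the statistical paradigm is not covered. A name that is learned from an unambiguous context (here, after a title such as "Dr.") is then recognized everywhere else in the same document, even where that context is missing.

```python
import re

# Hypothetical gazetteer; a real system would use far larger lists.
GAZETTEER = {"Edinburgh": "LOCATION", "IBM": "ORGANIZATION"}

# Hypothetical contextual pattern: a title followed by a capitalized word.
TITLE_PATTERN = re.compile(r"\b(?:Mr|Mrs|Dr|Prof)\.\s+([A-Z][a-z]+)")


def recognize(document: str) -> list[tuple[int, str, str]]:
    """Return (offset, name, label) for every entity mention found."""
    # List lookup: start from the names we already know.
    known = dict(GAZETTEER)

    # Pattern-based step: learn new person names from reliable contexts.
    for match in TITLE_PATTERN.finditer(document):
        known.setdefault(match.group(1), "PERSON")

    # Document-centered step: apply everything known, including the names
    # just learned from this document, to the whole document.
    mentions = []
    for name, label in known.items():
        for match in re.finditer(rf"\b{re.escape(name)}\b", document):
            mentions.append((match.start(), name, label))
    return sorted(mentions)


text = "Dr. Smith works in Edinburgh. Smith later visited IBM."
for offset, name, label in recognize(text):
    print(offset, name, label)
```

The second mention of "Smith" has no title in front of it and appears in no list, yet it is labelled PERSON because the first mention supplied the evidence.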

Kemal Oflazer, Sabanci University, Istanbul, Finite State Language Processing and Computational Morphology

After a brief overview of the basic concepts of finite state machines, this tutorial will summarize the use of finite state methods in various stages of language processing. It will then concentrate on employing finite state machinery in computational morphology. The two main approaches, one employing two-level rules and the other a cascade of rules, will be introduced. The two-level approach will be described in sufficient detail, with examples from English, Turkish and other languages, and with special emphasis on addressing engineering issues in building real analyzers to deal with unknown words, words of foreign origin, etc. The cascade of rules approach will be presented in the context of semi-automatic bootstrapping of finite state morphological analyzers from limited annotated information provided by informants.
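The basic idea of mapping surface word forms to lexical analyses with a finite state device can be sketched as follows. This is only a toy, hand-written transducer: it implements neither two-level rules nor a rule cascade, and the states, the three-word lexicon and the tag names such as `+N+Pl` are all assumptions made for the example.

```python
# Hypothetical transducer: state -> [(surface input, lexical output, next state)].
# An empty surface input is an epsilon transition that consumes nothing.
TRANSITIONS = {
    "start": [
        ("cat", "cat", "noun"),
        ("fox", "fox", "noun_es"),
        ("walk", "walk", "verb"),
    ],
    "noun": [("", "+N+Sg", "final"), ("s", "+N+Pl", "final")],
    "noun_es": [("", "+N+Sg", "final"), ("es", "+N+Pl", "final")],
    "verb": [
        ("", "+V", "final"),
        ("s", "+V+3Sg", "final"),
        ("ed", "+V+Past", "final"),
    ],
}
FINAL_STATES = {"final"}


def analyze(surface, state="start", output=""):
    """Return every lexical analysis the transducer assigns to a surface form."""
    results = []
    # Accept when the whole surface form is consumed in a final state.
    if not surface and state in FINAL_STATES:
        results.append(output)
    # Otherwise follow each transition whose input matches the remaining string.
    for consumed, emitted, next_state in TRANSITIONS.get(state, []):
        if surface.startswith(consumed):
            results.extend(
                analyze(surface[len(consumed):], next_state, output + emitted))
    return results


print(analyze("foxes"))   # ['fox+N+Pl']
print(analyze("walked"))  # ['walk+V+Past']
print(analyze("dog"))     # [] (unknown word)
```

The empty result for "dog" shows why handling unknown words is an engineering issue in its own right: a lexicon-driven analyzer like this one simply fails on anything outside its lexicon.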

Nicolas Nicolov, IBM Watson Research Center, Current trends in NL dialog systems

In this tutorial we will look at the fundamental structures and algorithms used to build conversational agents, programs which communicate with users in natural language in order to achieve certain tasks (book a ticket, check email over the phone, find information about products and services, etc.). In the first part of the tutorial we will examine theoretical issues. In the second part we will see how the theory is applied in practical dialog systems.

No specific prerequisites required, though knowledge of production systems, planning, parsing, discourse and generation would be helpful.