Tutorials

The main RANLP conference will be preceded by two days of tutorials delivered by distinguished lecturers. We plan 6 tutorials, each with 150 minutes of talks, scheduled as a 90-minute talk, a 30-minute break and a 60-minute talk, so that all participants can attend all tutorials.

	Monday 3/09	Tuesday 4/09
9:00 - 12:00	Ivana Kruijff-Korbayova, University of Saarbruecken: Information structure and its interaction with discourse semantics	Andrei Mikheev, University of Edinburgh: Information extraction and named entity recognition
13:30 - 16:30	Allan Ramsay, UMIST, Manchester: Computational aspects of discourse processing	Kemal Oflazer, Sabanci University, Istanbul: Finite state language processing and computational morphology
17:00 - 20:00	Wolfgang Menzel, University of Hamburg: Constraint-based parsing	Nicolas Nicolov, IBM Watson Research Center: Current trends in NL dialog systems

Abstracts

Ivana Kruijff-Korbayova, University of Saarbruecken, Information structure and its interaction with discourse semantics

Information Structure (IS) concerns how speakers/writers realize content within a sentence in a way that reflects their intentions and their formulation of the hearer/reader's attentional state. By means of a particular IS, a speaker/writer presents some parts of the sentence meaning as context-dependent, and others as context-affecting.

The goals of the course are:

to introduce the notion of information structure and its interpretation;
to introduce various approaches to IS, and explain the diverging terminologies that are used by them;
to explain how IS is realized by (an interplay of) various means including intonation, word order, syntactic constructions and morphological marking;
to consider how utterance-level IS depends on the discourse context and influences interpretation at the discourse level.

Various approaches to IS exist, which use diverging terminologies. Among the most often distinguished notions in the area of IS (at various levels) are Theme-Rheme (Mathesius, Firbas, Danes, Halliday, Bolinger, Steedman), Topic-Comment (Chomsky, Jackendoff), Topic-Focus (Sgall and Hajicova, Buering), Ground(=Link+Tail)-Focus (Vallduvi), Presupposition-Focus (Chomsky, Jackendoff, Karttunen & Peters, Creswell, Selkirk, Rooth), Background-Focus (von Stechow, Krifka, Steedman), Given-New (Halliday), and contextually bound-nonbound (Sgall and Hajicova). We will briefly introduce these approaches and compare the different terminologies by showing how they evolved from a common source (Mathesius) and interacted with other areas of (formal) syntax, semantics and pragmatics.

IS can be realized by various means, and often by an interplay thereof. This includes intonation (i.e., accenting, de-accenting, and phrasing), word order, syntactic constructions and morphological marking. Languages differ in the extent to which they employ these means. We will explain how IS is realized by various means in various languages (English and Czech as two very different cases will be of central interest, but other languages will be included too).

Since IS reflects attentional state(s) of discourse participants and attention transcends the sentence, it is appropriate to ask also whether, and if so how, IS projects beyond the sentence into discourse. We will discuss two aspects of the interaction between sentence-level IS and discourse context: (i) dependence and influence of IS on the cognitive status of discourse referents (e.g., activation/accessibility/inactivation, cf. Chafe); (ii) IS-sensitive interpretation of particular adverbials and discourse connectives, requiring access to suitable alternatives in the context (e.g., `only', `even', `too', `although/however', `otherwise').

The concern with IS and its interplay with the larger discourse context is justified not only on theoretical grounds: experience with applications such as translating telephony and interactive query-answering makes it painfully clear that a theory relating IS and discourse semantics is essential for accurate Natural Language Processing. Fortunately, formal accounts addressing these issues have started to emerge, and some are beginning to be embodied in computational models of discourse processing. This tutorial aims to provide enough theoretical background to understand and appreciate such approaches.

Literature
Unfortunately, there is no suitable coursebook (or set of coursebooks) that comprehensively covers the issues addressed in the tutorial. Knud Lambrecht's Information Structure and Sentence Form (1994, Cambridge University Press) discusses in detail a number of concepts involved in information structuring, and thus provides useful linguistic background. In addition, a set of relevant papers for further reading will be suggested during the tutorial.

Prerequisites:
None, though awareness of basic syntactic, semantic and pragmatic notions would be an asset.

Allan Ramsay, UMIST, Manchester, Computational aspects of discourse processing

Linguistic theories of discourse structure have to be combined with computational models of semantics before they can be used in NLP systems. To do this, you have to have a semantic model which is open to the kinds of attitudinal information carried by discourse markers, and you have to be able to extract this information from texts, where it is often only implicit or at best encoded by structural clues. The tutorial will address the following issues:

basic dynamic semantics: anchoring, updating, conditional reference (`donkey sentences')
using higher-order logic to capture the theme/rheme distinction, and to capture the semantics of focussing
the mutual constraints between information structure and nominal form (centering and the tree-like nature of coherent discourse).

The aim of the tutorial is to give participants an understanding of the issues that arise when you try to produce computational models that can deal with the differences between `it was given to me' and `I was given it', and between `it was given to me' and `it was given to me'. The discussion will show how to produce compositional treatments of these phenomena, and will provide links to work on linguistic acts and AI planning theory.

Literature

Halliday's An Introduction to Functional Grammar provides a good background introduction to some of the ideas discussed in this tutorial and in Ivana Kruijff-Korbayová's tutorial. Kamp and Reyle's From Discourse to Logic is a well-known introduction to one version of dynamic semantics (not the one that we will be using, but close enough to be useful). Walker, Joshi and Prince's Centering Theory in Discourse deals with centering theory at length.

Prerequisites

This tutorial will illustrate how some of the issues covered in Ivana Kruijff-Korbayová's tutorial on Information structure and its interaction with discourse semantics may be treated within a computational framework, and so it may be a good idea to attend that tutorial if you are not already familiar with the relevant material. An understanding of some version of dynamic semantics would also be extremely useful.

Wolfgang Menzel, University of Hamburg, Constraint-based parsing

Constraint satisfaction techniques offer an alternative view of grammar modelling. Instead of providing a generative description of possible linguistic structures, they specify conditions for the acceptability of such structures. Because a generative backbone is almost completely avoided, the resulting parsing system is particularly well suited to dealing with unexpected (i.e. ill-formed) constructions. Different approaches can be distinguished according to

the basic structures over which constraints are defined (usually finite domains) and how they are used to implement linguistic structures (constituency or dependency trees),
the expressive power of the constraint formalism, i.e. limitations to the arity of constraints, availability of constraints of different strength and
the computational properties of the constraint solver, i.e. soundness, completeness, predictability, interruptibility, termination behaviour.

Particular emphasis is given to constraint optimization techniques and their use to develop robust parsing procedures which are error-sensitive and time-adaptive. Applications in the area of spoken language systems and foreign language tutoring systems are discussed.
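
As a rough illustration of how constraints of different strength can make a parser tolerant of ill-formed input, the following minimal sketch scores dependency structures over part-of-speech tags. It is not the system presented in the tutorial: the three soft constraints, their penalty values, the multiplicative scoring scheme and all function names are assumptions made for this example, and the exhaustive search is only feasible for very short inputs.

```python
from itertools import product

# Soft constraints for a toy dependency grammar (all hypothetical).
# heads[i] is the index of the head of word i, or None if word i is the root.


def det_under_following_noun(tags, heads, i):
    """A determiner should depend on a noun to its right."""
    if tags[i] != "DET":
        return True
    head = heads[i]
    return head is not None and head > i and tags[head] == "NOUN"


def noun_under_verb(tags, heads, i):
    """A noun should depend on a verb."""
    if tags[i] != "NOUN":
        return True
    head = heads[i]
    return head is not None and tags[head] == "VERB"


def verb_is_root(tags, heads, i):
    """A verb should be the root of the sentence."""
    return tags[i] != "VERB" or heads[i] is None


# (penalty, check): a violated constraint multiplies the score by its penalty,
# so a smaller penalty means a stronger constraint.
SOFT_CONSTRAINTS = [
    (0.5, det_under_following_noun),
    (0.2, noun_under_verb),
    (0.1, verb_is_root),
]


def is_tree(heads):
    """Hard constraint: exactly one root and no cycles."""
    if sum(head is None for head in heads) != 1:
        return False
    for start in range(len(heads)):
        node, seen = start, set()
        while heads[node] is not None:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node]
    return True


def parse(tags):
    """Try every head assignment and return the best-scoring tree."""
    options = [None] + list(range(len(tags)))
    best_heads, best_score = None, 0.0
    for heads in product(options, repeat=len(tags)):
        if not is_tree(heads):
            continue
        score = 1.0
        for penalty, check in SOFT_CONSTRAINTS:
            for i in range(len(tags)):
                if not check(tags, heads, i):
                    score *= penalty
        if score > best_score:
            best_heads, best_score = heads, score
    return best_heads, best_score
```

For the well-formed tag sequence `["DET", "NOUN", "VERB"]` the sketch finds a structure that violates no constraint. For the ill-formed order `["NOUN", "DET", "VERB"]` it still returns a tree, only with a lower score, because the determiner constraint is soft rather than hard.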

Andrei Mikheev, University of Edinburgh, Named Entity Recognition: Task Profile

In this tutorial I will describe the task of Named Entity Recognition (NER) and show its applicability to various tasks of NLP and IR. I will present three different paradigms for designing NER systems: list lookup, pattern-based and statistical. I will compare the advantages and disadvantages of these approaches. One of the main emphases of the tutorial will be on methods which allow NER systems to avoid heavy dependence on pre-existing resources such as lists. I will present a Document Centered Approach which tries to infer new knowledge from the document under processing and apply this knowledge during the processing itself.
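
The list-lookup and pattern-based paradigms, together with the idea of reusing evidence found elsewhere in the same document, can be illustrated with a minimal sketch. It is not the tutorial's system: the gazetteer, the title pattern and the function name `tag_entities` are all assumptions made for illustration, and the statistical paradigm is not shown.

```python
import re

# Hypothetical gazetteer; a real system would rely on much larger lists.
GAZETTEER = {"Edinburgh": "LOCATION", "IBM": "ORGANIZATION"}

# Hypothetical pattern: a title followed by a capitalised word suggests a person.
PERSON_PATTERN = re.compile(r"\b(?:Mr|Mrs|Dr|Prof)\.? ([A-Z][a-z]+)")


def tag_entities(text):
    """Return (token, label) pairs for every recognised entity mention."""
    tokens = re.findall(r"[A-Za-z]+", text)

    # Pattern-based step: collect names that occur in a reliable context
    # somewhere in this document.
    learned = {m.group(1): "PERSON" for m in PERSON_PATTERN.finditer(text)}

    tagged = []
    for token in tokens:
        if token in GAZETTEER:
            # List lookup: the token is in a pre-existing list.
            tagged.append((token, GAZETTEER[token]))
        elif token in learned:
            # Document-level reuse: the name was learned from this document,
            # so mentions without a title are labelled as well.
            tagged.append((token, learned[token]))
    return tagged
```

Given the text "Dr. Smith joined IBM in Edinburgh. Smith now leads a team.", the second mention of "Smith" is labelled PERSON even though it has no title and appears in no list, because the first mention supplied the evidence.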

Kemal Oflazer, Sabanci University, Istanbul, Finite State Language Processing and Computational Morphology

After a brief overview of the basic concepts of finite state machines, this tutorial will summarize the use of finite state methods in various stages of language processing. It will then concentrate on employing finite state machinery in computational morphology. The two main approaches, one employing two-level rules and the other a cascade of rules, will be introduced. The two-level approach will be described in sufficient detail, with examples from English, Turkish and other languages, and with special emphasis on addressing engineering issues in building real analyzers to deal with unknown words, words of foreign origin, etc. The cascade of rules approach will be presented in the context of semi-automatic bootstrapping of finite state morphological analyzers from limited annotated information provided by informants.
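
The cascade-of-rules idea can be illustrated with a small sketch in which each rewrite rule maps one string to another and the rules are applied in a fixed order. Rules of this kind describe regular relations, so in a real system they could be compiled into finite state transducers and composed; here they are merely simulated with regular-expression substitutions. The rules, the lexicon and the function names are assumptions made for this example, and the rule set is deliberately incomplete.

```python
import re

# Hypothetical rules for English plural formation, applied in order.
# "+" marks a morpheme boundary in the lexical form.
RULES = [
    # e-insertion after sibilants: fox+s -> foxes, church+s -> churches
    (re.compile(r"(s|x|z|ch|sh)\+(?=s$)"), r"\1e"),
    # y -> ie after a consonant: fly+s -> flies
    (re.compile(r"([^aeiou])y\+(?=s$)"), r"\1ie"),
    # finally, delete any remaining morpheme boundary: cat+s -> cats
    (re.compile(r"\+"), ""),
]

# Hypothetical stem lexicon used for analysis.
LEXICON = ["cat", "fox", "fly", "church"]


def generate(lexical):
    """Map a lexical form such as 'fox+s' to its surface form."""
    surface = lexical
    for pattern, replacement in RULES:
        surface = pattern.sub(replacement, surface)
    return surface


def analyze(surface):
    """Return all lexical forms over LEXICON that generate the surface form."""
    candidates = LEXICON + [stem + "+s" for stem in LEXICON]
    return [lexical for lexical in candidates if generate(lexical) == surface]
```

In this sketch analysis is done by generating from every candidate lexical form and comparing the results, which is only practical for a toy lexicon. A compiled transducer can be run in either direction without such enumeration.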

Nicolas Nicolov, IBM Watson Research Center, Current trends in NL dialog systems

In this tutorial we will look at the fundamental structures and algorithms used to build conversational agents: programs which communicate with users in natural language in order to achieve certain tasks (book a ticket, check email over the phone, find information about products and services, etc.). In the first part of the tutorial we will examine theoretical issues. In the second part we will see how the theory is applied in practical dialog systems.

The theory of dialog systems
- ELIZA, PARRY and the beginnings of dialog systems
- Turns and utterances
- Grounding
- Conversational implicature
- Dialog acts
- Dialog management in Conversational Agents
- NL Generation in the context of dialog systems
- Iterative user-centric design
- Evaluation of dialog systems

The practice of dialog systems
- Mutual funds - form-based dialog management; robust analysis with semantic grammars
- TRINDI architecture
- Verbmobil - appointment scheduling
- Natural Language Assistant - the anatomy of a product recommendation system
- WebGenie - information retrieval with a dialog front end
- Virtual reps
- Trends and future directions

No specific prerequisites are required, though knowledge of production systems, planning, parsing, discourse and generation would be helpful.
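
The form-based dialog management mentioned in the outline above can be sketched in a few lines. The system keeps a form with named slots, fills any slot for which the user's utterance contains a recognisable value, and asks about the first slot that is still empty. This is only an illustration under stated assumptions: the slot names, patterns, prompts and function names are invented for the example and are not taken from any of the systems listed.

```python
import re

# Hypothetical slots with very small "semantic grammar" patterns.
SLOT_PATTERNS = {
    "fund": re.compile(r"\b(growth|income|index) fund\b"),
    "amount": re.compile(r"\b(\d+) dollars\b"),
}

# Hypothetical prompts, one per slot.
PROMPTS = {
    "fund": "Which fund would you like?",
    "amount": "How much would you like to invest?",
}


def update_form(form, utterance):
    """Return a copy of the form with every matching empty slot filled."""
    updated = dict(form)
    text = utterance.lower()
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match and updated.get(slot) is None:
            updated[slot] = match.group(1)
    return updated


def next_prompt(form):
    """Return the prompt for the first empty slot, or None if the form is full."""
    for slot in SLOT_PATTERNS:
        if form.get(slot) is None:
            return PROMPTS[slot]
    return None
```

Because the patterns only look for the phrases they know, the rest of the utterance is ignored. A user may therefore answer more than was asked ("I want to put 500 dollars in the Growth fund") and fill several slots in one turn.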