International Conference
RANLP - 2007

/Recent Advances in Natural Language Processing/
September 27-29, 2007, Borovets, Bulgaria

Tutorials - September 23-25


The main RANLP conference will be preceded by three days of tutorials delivered by distinguished lecturers. We plan six half-day tutorials, each lasting 220 minutes, distributed as follows: 60 min talk + 20 min break + 60 min talk + 20 min break + 60 min talk.

Tutorial lecturers:

Elisabeth Andre (University of Augsburg)
Dimitar Kazakov (University of York)
Bernardo Magnini (FBK-irst, Trento)
Stelios Piperidis (ILSP Athens)
Frederique Segond (Xerox Research Centre Grenoble)
Karin Verspoor (Los Alamos National Laboratory)
& Kevin Bretonnel Cohen (University of Colorado School of Medicine)

Tutorial timetable is as follows:


September 23
9:00 - 12:40    Stelios Piperidis (ILSP Athens): Content Processing and Applications
12:40 - 14:00   Lunch break
14:00 - 17:40   Frederique Segond (Xerox Research Centre Grenoble): Industrial developments in NLP

September 24
9:00 - 12:40    Dimitar Kazakov (University of York): Information Retrieval
12:40 - 14:00   Lunch break
14:00 - 17:40   Elisabeth Andre (University of Augsburg): Speech-Based Multimodal Dialogue

September 25
9:00 - 12:40    Bernardo Magnini (FBK-irst, Trento): Textual entailment
12:40 - 14:00   Lunch break
14:00 - 17:40   Karin Verspoor (Los Alamos National Laboratory) & Kevin Bretonnel Cohen (University of Colorado School of Medicine): NLP and the Biomedical domain




Speech-Based Multimodal Dialogue
Elisabeth Andre, University of Augsburg

The tutorial focuses on speech-based dialogue systems that emulate aspects of human-human communication by making use of embodied conversational agents. These agents display facial expressions, gaze patterns, and head, hand and arm gestures in synchrony with their speech. While work on speech-based dialogue is usually driven by the objective of achieving a high degree of robustness and efficiency, researchers working on embodied conversational agents also need to address aspects of social communication, such as emotions and personality. The tutorial provides an overview of approaches to determining multimodal dialogue acts for a single agent as well as for agent teams conversing with one or several human users. We will discuss how dialogue management tools need to be extended to account for the challenges of multimodal multi-party dialogue. To emulate human face-to-face dialogue more closely, it is desirable to avoid asymmetries in communication channels. A specific part of the tutorial is therefore devoted to first attempts at developing perceptive agents that are able to perceive communicative feedback signals from the human conversational partner.


Information Retrieval
Dimitar Kazakov, University of York

The area of Information Retrieval (IR) studies the techniques used to detect the existence and find the whereabouts of one or more documents related to a request. This includes the services provided by search engines, but traditionally excludes Question-Answering systems. The tutorial will cover, and consistently demonstrate on working examples, the technology behind the currently used tools for information retrieval, be it online search engines or local (single machine, local net) solutions.

Textual Entailment
Bernardo Magnini, FBK-irst, Trento

The goal of identifying textual entailment - whether one piece of text can be plausibly inferred from another - has emerged in recent years as a generic core problem in Natural Language Understanding. For instance, in order to answer the question 'Who killed Kennedy?', a Question Answering system may need to recognize that 'Oswald killed Kennedy' can be inferred from 'the assassination of Kennedy by Oswald'. This challenge is at the heart of many natural language understanding tasks including Question Answering, Information Retrieval and Extraction, Machine Translation, and others that attempt to reason about and capture the meaning of linguistic expressions. The task has attracted significant interest over the last couple of years, mainly fostered by the PASCAL Recognizing Textual Entailment Challenge (RTE).

The primary goals of this tutorial are to review the framework of applied Textual Entailment and motivate it as a generic paradigm for natural language semantics. The tutorial will provide a concise overview of recent perspectives and research results and present some of the key computational approaches proposed and some of the obstacles identified by the research community in this area.

Multimedia Content Processing and Applications
Stelios Piperidis, ILSP Athens

The convergence of technological communication platforms opens up new opportunities for content generation and consumption, while enabling such content to be increasingly multimedia in nature. This tutorial will provide an introduction to multimedia content processing, focusing on the role and significance of natural language in multimedia discourse. We will briefly review the state of the art in single-media processing (speech, text, video, etc.) and discuss the problems, challenges, fall-back solutions and necessities for further advances. The role of the different media and the potential benefit from comparative analysis and fusion of single-media processing results will be discussed in the context of different applications. Current achievements and practical applications involving archived or contemporary, monolingual or multilingual multimedia content will be illustrated by working examples.



Industrial developments in NLP
Frederique Segond, Xerox Research Centre Europe

This tutorial will examine the role of NLP, and in particular of NLP research, from an industrial perspective. Specifically, using real examples from industry, it will focus on issues and concerns that must be addressed to meet the needs of potential customers for NLP technology. These include questions such as: Is industrial research different from academic research in NLP? What does industry want from NLP? Is the demand changing? What are the vertical markets? What are the technical constraints? How is NLP technology validated in an industrial context?


Natural Language Processing and the Biomedical domain
K. Bretonnel Cohen, University of Colorado School of Medicine and Karin Verspoor, Los Alamos National Laboratory

This tutorial will provide natural language processing researchers with an introduction to the field of "BioNLP" -- natural language processing in the fields of medicine and biology. This field has long roots in the history of natural language processing, and interest in it has grown rapidly in recent years. The past few years have been characterized by an unusual mixing of bioinformatics and NLP specialists at the conferences of both communities: ACL or NAACL has now hosted workshops on BioNLP every year since 2002, with excellent attendance numbers, and bioinformatics and medical informatics meetings have featured NLP papers, sessions, and SIG meetings since the late 1990s. Recent MUC-like and TREC-sponsored shared tasks have had some unusual results, and the implications of these findings should make for an interesting tutorial for the general NLP researcher.

BioNLP presents unique challenges in a number of areas, ranging from low-level processing tasks to high-level conceptual issues. Tokenization and sentence boundary detection are demonstrably different tasks in biomedical publications than in newswire text, and theoretical issues such as predicate-argument structure representation have been a topic of much discussion in recent work in the field. Despite the many challenges that are unique to biomedical text, most of the sub-topics of NLP are the subject of current research in the BioNLP community -- information retrieval, named entity recognition, information extraction, text classification, semantic role labeling, coreference resolution, question-answering, parsing, morphological analysis, and discourse analysis. Thus, there are interesting challenges in the biomedical domain for almost anyone working in natural language processing.

One unique advantage to the field of BioNLP is the wide availability of biological knowledge resources, including an enormous body of freely available text. The tutorial will include an overview of a variety of publicly available BioNLP resources, including:
    * A number of domain-specific ontologies, including the popular Gene Ontology
    * Corpora, including the popular GENIA corpus and a number of less-well-known but valuable corpora and text collections, some of them featuring full text

One potential stumbling block in the field of BioNLP is the requirement for domain knowledge. The tutorial will include a brief overview of just enough biology to enable the NLP researcher to comprehend the topics under discussion in typical biomedical texts, if not the specifics of the discussion.