Invited speakers

Textual Entailment: A Perspective on Applied Text Understanding

Ido Dagan, Bar-Ilan University, Israel

Abstract

Applied textual entailment was proposed as the task of recognizing whether the meaning or truth of one text can be expressed by, or inferred from, another text. This talk will first discuss the potential relevance of textual entailment, suggesting that it captures, in a generic way, the major semantic inferences needed by many text understanding applications. Textual entailment may also provide useful variations for some classical semantic problems, such as word sense disambiguation, ontology learning and the semantic interpretation of syntactic constructs. Interestingly, the methodologies that evolved for testing human text comprehension also seem to be "entailment-based". Altogether, this approach may promote an alternative to common text understanding practices: rather than interpreting texts into explicitly stipulated semantic representations, the focus may shift to context-sensitive modeling of entailment relationships between linguistic constructs.

In the second part of the talk, we will review the ongoing benchmarks of the PASCAL Recognising Textual Entailment Challenges and our initial research on a probabilistic setting for textual entailment and the acquisition of entailment relations.

The Generation of Referring Expressions: Past, Present and Future

Robert Dale, Macquarie University, Australia

Abstract

The task of referring expression generation is concerned with determining what semantic content should be used in a reference to an intended referent so that the hearer will be able to identify the referent. The task has been a focus of interest within natural language generation at least since the early 1980s, in part because the problem appears relatively well-defined. Over the last 25 years, a range of algorithms and approaches have been proposed and explored; and yet, even a casual analysis of real human-authored texts suggests that we have a long way to go in terms of providing an explanation for the range of real linguistic behaviour that we find. In this talk, I'll review research in the area to date, try to characterise where we are now, and point to directions for future research in the area.

NLP: An Information Extraction Perspective

Ralph Grishman, New York University

Abstract

Information extraction -- identifying specified types of events or relations from free text -- poses dual and related challenges: adapting systems to new event types, and pulling out information about these events with high accuracy. In this talk we consider how these challenges relate to some of the basic problems of natural language processing -- integrating different types of linguistic knowledge; improving reference resolution; recognizing paraphrase -- and how these problems are being addressed by recent research.

Natural Language Processing and Knowledge

Makoto Nagao, National Institute of Information and Communications Technology

Abstract

Natural language processing (NLP) requires many varieties of knowledge. Linguistic knowledge, such as grammars and dictionaries, has been developed and shared among NLP researchers for a decade or so, but general knowledge for use in NLP has not. When we consider man-machine dialogue, we have to prepare a great deal of knowledge, as well as strong inference functions such as logical inference and common sense reasoning.

In this talk I will first explain some new developments in knowledge for computational linguistics, then discuss what kind of knowledge is required for a dialogue system. Information retrieval on the Web is an important technology. The main research interest in current information retrieval is how a system can discard the huge amount of retrieved information that does not fit the retrieval purpose well, and focus on the essential information. However, we always have to be careful about the quality of information on the Web. Therefore, an important next step will be to check and indicate how reliable the information obtained from the Internet is.

Estimating the confidence degree of information is a very difficult problem. I will discuss some possible ways of estimating it. Natural language processing technologies, as well as logical and common sense reasoning, are involved in the estimation.

Linguistic Challenges for Computationalists

John Nerbonne, University of Groningen

Abstract

Even now, techniques are in common use in computational linguistics that could lead to important advances in pure linguistics, especially language acquisition and sociolinguistics, if they were applied with intelligence and persistence. Reliable techniques for assaying similarities and differences among linguistic varieties are useful not only in dialectology and sociolinguistics, but also in studies of first and second language learning and in the study of language contact. These techniques would be even more valuable if they indicated not only relative degrees of similarity, but also the direction of deviation (contamination). Given the current tendency in linguistics to wish to confront the data of language use more directly, techniques are needed which can handle large amounts of noisy data and extract reliable measures from them. The current focus in Computational Linguistics on useful applications is a very good thing, but some further attention to linguistic use of computational techniques would be very rewarding.

Dataset Profiling, and What Term Burstiness Can Tell You About Your Data

Anne de Roeck, Open University, UK

Abstract

The performance of Information Retrieval and Natural Language Processing techniques is very sensitive to the characteristics of the data on which they are used. Though well established, this knowledge has never impacted on evaluation: the literature routinely reports, and compares, experimental evaluation results without reference to the impact of the underlying datasets or collections. This in turn raises a collection of methodological and practical problems around replicability. These could be addressed if we had reliable ways of profiling datasets, using measures that highlight differences between collections. A first step is to investigate what such measures might look like.

In this talk, I will show that even standard textual datasets such as the TIPSTER collection differ in ways that challenge widely accepted assumptions about the general applicability of techniques, and that similar differences will show up between different languages. In exploring what might be suitable profiling measures, I will set out some desirable properties that such measures should have. I will then review some work on term burstiness and explore what the behaviour of some very frequent terms, and variations in burstiness patterns in the occurrence of a term, can tell us about genres and datasets.
