Tutorial schedule
The main RANLP conference will be preceded by two days of tutorials delivered by distinguished lecturers:
- Ondrej Bojar and Jindrich Helcl (Charles University)
- Noa Cruz (Virgen del Rocio Hospital, Seville)
- Veronique Hoste and Orphee De Clercq (Ghent University)
- Sanja Štajner (Mannheim University)
We plan four half-day tutorials, each lasting 220 minutes and structured as follows: 60 min talk + 20 min break + 60 min talk + 20 min break + 60 min talk.
| | Morning | Afternoon |
| --- | --- | --- |
| September 2 | 10:00 – 13:40 Tutorial 1: "Negation and Speculation Detection in Biomedical Texts" (Noa Cruz) | 15:00 – 18:40 Tutorial 2: "From Easy-to-calculate Formulas to Holistic Readability Prediction" (Veronique Hoste and Orphee De Clercq) |
| September 3 | 9:30 – 13:10 Tutorial 3: "Deep Learning for Text Simplification" (Sanja Štajner) | 15:00 – 18:40 Tutorial 4: "Deep Learning in MT / NMT" (Ondrej Bojar and Jindrich Helcl) |
Tutorials
Tutorial 1: Negation and Speculation Detection in Biomedical Texts
Noa Cruz (Virgen del Rocio Hospital, Seville)
Summary: Negation and speculation are complex expressive linguistic phenomena that have been extensively studied from a theoretical perspective. They modify the meaning of the phrases in their scope.
The amount of negative and speculative information in biomedical texts should not be underestimated. For example, 13.45% of sentences in the abstracts section of the BioScope corpus and 13.76% of sentences in the full papers section contain negations. The percentages of sentences with hedge cues in the abstracts and full papers sections of the BioScope corpus are 17.70% and 19.44%, respectively.
In addition, professionals need efficient tools for accessing the vast databases of scientific articles and clinical information available and then analysing the text in greater depth. This analysis should include negation and speculation detection, because these linguistic phenomena are used in this domain to express impressions, hypothesised explanations of experimental results, or negative findings.
This lecture is motivated by the fact that negation and speculation detection is an emerging topic that has attracted the attention of many researchers; in recent years, several challenges and shared tasks have included the extraction of these language forms. The lecture therefore aims to define negation and speculation from a Natural Language Processing perspective, to explain the need for processing these phenomena, to summarise existing research on processing negation and speculation, to provide a list of resources and tools, and to speculate about future developments in this research area. An advantage of this lecture is that it will not only provide an overview of the state of the art in negation and speculation detection, but will also introduce newly developed data sets and scripts.
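To make the task concrete, the sketch below shows a minimal, NegEx-style rule-based approach to negation scope detection: find a cue word and mark everything up to the next scope terminator as negated. The cue and terminator lists here are illustrative assumptions, not the data sets or scripts the lecture will introduce.

```python
# A minimal, NegEx-style sketch of rule-based negation scope detection.
# The cue list, terminator list, and "scope runs to the next terminator"
# heuristic are illustrative assumptions, not the lecture's resources.

NEGATION_CUES = {"no", "not", "without", "denies", "absence"}
SCOPE_TERMINATORS = {"but", "however", ";", ",", "."}

def detect_negated_tokens(tokens):
    """Mark tokens that fall inside the scope of a negation cue.

    The scope is approximated as every token after a cue up to the
    next scope-terminating word or punctuation mark.
    """
    negated = [False] * len(tokens)
    in_scope = False
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if low in NEGATION_CUES:
            in_scope = True          # open a new negation scope
            continue
        if low in SCOPE_TERMINATORS:
            in_scope = False         # a terminator closes the scope
        elif in_scope:
            negated[i] = True        # token lies inside the scope
    return negated

tokens = "The scan showed no evidence of metastasis , margins clear .".split()
print([t for t, n in zip(tokens, detect_negated_tokens(tokens)) if n])
# -> ['evidence', 'of', 'metastasis']
```

Rule-based baselines of this kind are the starting point that later machine-learning approaches in the literature improve upon.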
Tutorial 2: From Easy-to-calculate Formulas to Holistic Readability Prediction
Veronique Hoste and Orphee De Clercq (Ghent University)
Summary: Readability research has a long and rich tradition, and the central research question has always been: what is it that makes a text easy or hard to read? Whereas superficial text characteristics leading to on-the-spot readability formulas were popular until the last decade of the previous century, recent advances in computer science and natural language processing have triggered the inclusion of more intricate characteristics in present-day readability research. Despite these advances, there is still no consensus on which features are actually the best predictors of readability.
This tutorial will consist of three large parts, each with a hands-on component. In the first part, we will zoom in on existing readability corpora and explain various techniques for collecting and, more importantly, assessing readability. The focus will be on the different envisaged end users (children, second language learners, but also the general public) and assessment methods (expert rating versus crowdsourcing).
In the next part, we will briefly discuss the traditional readability formulas and quickly move on to the more complex modeling of linguistic features, which should be able to grasp lexical, syntactic, semantic and discourse information by employing state-of-the-art techniques in NLP.
The final part of the tutorial will focus on supervised machine learning methods that can be used to automatically predict readability in a regression setup (assigning an absolute readability score to a given text) or a classification setup (determining which text in a pair is easier or harder to read), and on how these techniques can be evaluated, both intrinsically and extrinsically. We finish the tutorial by discussing some of the future challenges the field is still facing. Participants are invited to bring their own data (English text material).
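To give a flavour of the "easy-to-calculate formulas" that the second part starts from, here is a small Python sketch of the classic Flesch Reading Ease score. The formula itself is standard; the vowel-group syllable counter is a rough heuristic of our own, so treat the resulting scores as approximate.

```python
import re

def count_syllables(word):
    """Crude vowel-group heuristic for English syllable counting."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text):
    """Classic Flesch Reading Ease: higher scores mean easier text.

    FRE = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    """
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / sentences)
            - 84.6 * (syllables / len(words)))

print(round(flesch_reading_ease("The cat sat on the mat."), 1))  # -> 116.1
```

Formulas like this rely only on shallow counts of words, sentences and syllables, which is precisely why the tutorial moves on to richer lexical, syntactic, semantic and discourse features.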
Tutorial 3: Deep Learning for Text Simplification
Sanja Štajner (Mannheim University)
Summary: Texts and sentences with complex syntactic structures and vocabulary can pose obstacles for many people and decrease the performance of various natural language processing (NLP) tools. Therefore, in the last 30 years, there have been many attempts at automatically simplifying them.
This tutorial aims at providing a comprehensive overview of the research in text simplification (TS) with a special focus on the latest, state-of-the-art TS systems which use deep learning.
The first part of the tutorial will introduce the motivation for automated text simplification, possible obstacles for various target populations and NLP applications, existing TS corpora, and evaluation methodologies used in TS.
The second part of the tutorial will provide an overview of the most influential TS strategies, from the rule-based systems all the way to the latest neural machine translation based systems. The strengths and weaknesses of different strategies will be highlighted and several systems with different architectures will be directly compared.
The last part of the tutorial will present, in detail, the state-of-the-art lexical TS system based on the use of word embeddings, and the state-of-the-art fully-fledged TS system using sequence-to-sequence neural networks.
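As a taste of the embedding-based lexical TS idea, the sketch below picks a substitute for a complex word by combining embedding similarity with word frequency as a crude proxy for simplicity. The toy vectors and frequency counts are invented for illustration and do not represent the actual system presented in the tutorial, which builds on pre-trained embeddings.

```python
import numpy as np

# Toy embeddings and frequencies, purely for illustration; real lexical
# simplifiers use pre-trained vectors and large frequency lists.
EMBEDDINGS = {
    "commence": np.array([0.90, 0.10, 0.30]),
    "begin":    np.array([0.88, 0.12, 0.28]),
    "start":    np.array([0.85, 0.15, 0.25]),
    "finish":   np.array([0.10, 0.90, 0.20]),
}
FREQUENCY = {"commence": 5_000, "begin": 90_000, "start": 120_000, "finish": 80_000}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def simplify(word, topn=1):
    """Return up to `topn` candidates that are semantically close to
    `word` (cosine similarity) but more frequent, i.e. likely simpler."""
    candidates = [
        (cosine(EMBEDDINGS[word], vec), cand)
        for cand, vec in EMBEDDINGS.items()
        if cand != word and FREQUENCY[cand] > FREQUENCY[word]
    ]
    return [cand for _, cand in sorted(candidates, reverse=True)[:topn]]

print(simplify("commence"))  # -> ['begin'] with these toy vectors
```

The frequency filter encodes the assumption that more frequent words are easier, a common (if imperfect) simplicity signal in the lexical simplification literature.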
Tutorial 4: Deep Learning in MT / NMT
Ondrej Bojar and Jindrich Helcl (Charles University)
Summary: Neural machine translation (NMT) has become a widely adopted approach to machine translation in the past few years.
In our tutorial, we will start with an introduction to the basics of the deep learning methods used in NMT, such as recurrent neural networks and their advanced variants (GRU and LSTM networks), and the algorithms used for their optimization.
We will then introduce NMT-specific models, such as the attention mechanism, and describe the methods used for decoding target sentences, including model ensembling and beam search.
We will go through the recent advancements in the field and discuss their impact on the state-of-the-art methods used in this year's WMT competition (http://www.statmt.org/wmt17/).
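Since beam search is named above as a core decoding method, here is a minimal, model-agnostic sketch of it. The toy next-token table stands in for a real NMT decoder's softmax output and is purely an assumption for the example.

```python
import math

def beam_search(step_fn, start_token, end_token, beam_size=3, max_len=10):
    """Generic beam search: keep the `beam_size` best partial hypotheses,
    ranked by summed log-probability, and expand until every beam emits
    `end_token` or reaches `max_len`.

    `step_fn(prefix)` must return a list of (token, probability) pairs.
    """
    beams = [([start_token], 0.0)]  # (token sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_token:      # finished beams pass through
                candidates.append((seq, score))
                continue
            for tok, prob in step_fn(seq):
                candidates.append((seq + [tok], score + math.log(prob)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(seq[-1] == end_token for seq, _ in beams):
            break
    return beams

# Toy "model": next-token distributions keyed on the last token only.
TABLE = {
    "<s>":   [("hello", 0.6), ("hi", 0.4)],
    "hello": [("world", 0.7), ("there", 0.3)],
    "hi":    [("there", 0.9), ("world", 0.1)],
    "world": [("</s>", 1.0)],
    "there": [("</s>", 1.0)],
}
best = beam_search(lambda seq: TABLE[seq[-1]], "<s>", "</s>", beam_size=2)
print(best[0][0])  # -> ['<s>', 'hello', 'world', '</s>']
```

In a real NMT decoder, `step_fn` would run the network over the full prefix (and attended source sentence) rather than a lookup table, but the search procedure is the same.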