Tutorial schedule
The main RANLP conference will be preceded by two days of tutorials delivered by distinguished lecturers:
- Ondrej Bojar and Jindrich Helcl (Charles University)
- Noa Cruz (Virgen del Rocio Hospital, Seville)
- Veronique Hoste and Orphee De Clercq (Ghent University)
- Sanja Štajner (Mannheim University)
We plan four half-day tutorials, each lasting 220 minutes and structured as follows: 60 min talk + 20 min break + 60 min talk + 20 min break + 60 min talk.
| | Morning | Afternoon |
| --- | --- | --- |
| September 2 | 10:00 – 13:40 Tutorial 1: "Negation and Speculation Detection in Biomedical Texts" (Noa Cruz) | 15:00 – 18:40 Tutorial 2: "From Easy-to-calculate Formulas to Holistic Readability Prediction" (Veronique Hoste and Orphee De Clercq) |
| September 3 | 9:30 – 13:10 Tutorial 3: "Deep Learning for Text Simplification" (Sanja Štajner) | 15:00 – 18:40 Tutorial 4: "Deep Learning in MT / NMT" (Ondrej Bojar and Jindrich Helcl) |
Tutorials
Tutorial 1: Negation and Speculation Detection in Biomedical Texts
Noa Cruz (Virgen del Rocio Hospital, Seville)
Summary: Negation and speculation are complex expressive linguistic phenomena that have been extensively studied from a theoretical perspective. They modify the meaning of the phrases in their scope.
The amount of negative and speculative information in biomedical texts should not be underestimated. For example, 13.45% of sentences in the abstracts section of the BioScope corpus and 13.76% of sentences in the full papers section contain negations. The percentages of sentences with hedge cues in the abstracts and full papers sections of the BioScope corpus are 17.70% and 19.44%, respectively.
In addition, professionals need efficient tools for accessing the vast databases of scientific articles and clinical information available and then analysing the text in greater depth. This analysis should include negation and speculation detection, because these linguistic phenomena are used in this domain to express impressions, hypothesised explanations of experimental results, or negative findings.
This lecture is motivated by the fact that negation and speculation detection is an emerging topic that has attracted the attention of many researchers; in recent years, several challenges and shared tasks have included the extraction of these language forms. The lecture therefore aims to define negation and speculation from a Natural Language Processing perspective, to explain the need for processing these phenomena, to summarise existing research on processing negation and speculation, to provide a list of resources and tools, and to speculate about future developments in this research area. An advantage of this lecture is that it will not only provide an overview of the state of the art in negation and speculation detection, but will also introduce newly developed data sets and scripts.
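To make the task concrete, the sketch below shows a minimal, NegEx-style rule-based approach to negation scope detection: find a cue word and mark everything up to the next scope terminator as negated. The cue and terminator lists here are illustrative assumptions, not the data sets or scripts the lecture will introduce.

```python
# A minimal, NegEx-style sketch of rule-based negation scope detection.
# The cue list, terminator list, and "scope runs to the next terminator"
# heuristic are illustrative assumptions, not the lecture's resources.

NEGATION_CUES = {"no", "not", "without", "denies", "absence"}
SCOPE_TERMINATORS = {"but", "however", ";", ",", "."}

def detect_negated_tokens(tokens):
    """Mark tokens that fall inside the scope of a negation cue.

    The scope is approximated as every token after a cue up to the
    next scope-terminating word or punctuation mark.
    """
    negated = [False] * len(tokens)
    in_scope = False
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if low in NEGATION_CUES:
            in_scope = True          # open a new negation scope
            continue
        if low in SCOPE_TERMINATORS:
            in_scope = False         # a terminator closes the scope
        elif in_scope:
            negated[i] = True        # token lies inside the scope
    return negated

tokens = "The scan showed no evidence of metastasis , margins clear .".split()
print([t for t, n in zip(tokens, detect_negated_tokens(tokens)) if n])
# -> ['evidence', 'of', 'metastasis']
```

Rule-based baselines of this kind are the starting point that later machine-learning approaches in the literature improve upon.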
Tutorial 2: From Easy-to-calculate Formulas to Holistic Readability Prediction
Veronique Hoste and Orphee De Clercq (Ghent University)
Summary: Readability research has a long and rich tradition, and the central research question has always been: what is it that makes a text easy or hard to read? Whereas superficial text characteristics leading to on-the-spot readability formulas were popular until the last decade of the previous century, recent advances in computer science and natural language processing have triggered the inclusion of more intricate characteristics in present-day readability research. Despite these advances, there is still no consensus on which features are actually the best predictors of readability.
This tutorial will consist of three large parts, each with a hands-on component. In the first part, we will zoom in on existing readability corpora and explain various techniques for collecting and, more importantly, assessing readability. The focus will be on the different envisaged end users (children, second language learners, but also the general public) and assessment methods (expert rating versus crowdsourcing).
In the next part, we will briefly discuss the traditional readability formulas and quickly move on to the more complex modeling of linguistic features, which should be able to grasp lexical, syntactic, semantic and discourse information by employing state-of-the-art techniques in NLP.
The final part of the tutorial will focus on supervised machine learning methods that can be used to automatically predict readability in a regression setup (assigning an absolute readability score to a given text) or a classification setup (determining which text in a pair is easier or harder to read), and on how these techniques can be evaluated, both intrinsically and extrinsically. We finish the tutorial by discussing some of the future challenges the field is still facing. Participants are invited to bring their own data (English text material).
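To give a flavour of the "easy-to-calculate formulas" that the second part starts from, here is a small Python sketch of the classic Flesch Reading Ease score. The formula itself is standard; the vowel-group syllable counter is a rough heuristic of our own, so treat the resulting scores as approximate.

```python
import re

def count_syllables(word):
    """Crude vowel-group heuristic for English syllable counting."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text):
    """Classic Flesch Reading Ease: higher scores mean easier text.

    FRE = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    """
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / sentences)
            - 84.6 * (syllables / len(words)))

print(round(flesch_reading_ease("The cat sat on the mat."), 1))  # -> 116.1
```

Formulas like this rely only on shallow counts of words, sentences and syllables, which is precisely why the tutorial moves on to richer lexical, syntactic, semantic and discourse features.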
Tutorial 3: Deep Learning for Text Simplification
Sanja Štajner (Mannheim University)
Summary: Texts and sentences with complex syntactic structures and vocabulary can pose obstacles for many people and decrease the performance of various natural language processing (NLP) tools. Therefore, in the last 30 years, there have been many attempts at automatically simplifying them.
This tutorial aims at providing a comprehensive overview of the research in text simplification (TS) with a special focus on the latest, state-of-the-art TS systems which use deep learning.
The first part of the tutorial will introduce the motivation for automated text simplification, possible obstacles for various target populations and NLP applications, existing TS corpora, and evaluation methodologies used in TS.
The second part of the tutorial will provide an overview of the most influential TS strategies, from the rule-based systems all the way to the latest neural machine translation based systems. The strengths and weaknesses of different strategies will be highlighted and several systems with different architectures will be directly compared.
The last part of the tutorial will present, in detail, the state-of-the-art lexical TS system based on the use of word embeddings, and the state-of-the-art fully-fledged TS system using sequence-to-sequence neural networks.
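As a taste of the embedding-based lexical TS idea, the sketch below picks a substitute for a complex word by combining embedding similarity with word frequency as a crude proxy for simplicity. The toy vectors and frequency counts are invented for illustration and do not represent the actual system presented in the tutorial, which builds on pre-trained embeddings.

```python
import numpy as np

# Toy embeddings and frequencies, purely for illustration; real lexical
# simplifiers use pre-trained vectors and large frequency lists.
EMBEDDINGS = {
    "commence": np.array([0.90, 0.10, 0.30]),
    "begin":    np.array([0.88, 0.12, 0.28]),
    "start":    np.array([0.85, 0.15, 0.25]),
    "finish":   np.array([0.10, 0.90, 0.20]),
}
FREQUENCY = {"commence": 5_000, "begin": 90_000, "start": 120_000, "finish": 80_000}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def simplify(word, topn=1):
    """Return up to `topn` candidates that are semantically close to
    `word` (cosine similarity) but more frequent, i.e. likely simpler."""
    candidates = [
        (cosine(EMBEDDINGS[word], vec), cand)
        for cand, vec in EMBEDDINGS.items()
        if cand != word and FREQUENCY[cand] > FREQUENCY[word]
    ]
    return [cand for _, cand in sorted(candidates, reverse=True)[:topn]]

print(simplify("commence"))  # -> ['begin'] with these toy vectors
```

The frequency filter encodes the assumption that more frequent words are easier, a common (if imperfect) simplicity signal in the lexical simplification literature.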
Tutorial 4: Deep Learning in MT / NMT
Ondrej Bojar and Jindrich Helcl (Charles University)
Summary: Neural machine translation (NMT) has become a widely adopted approach to machine translation in the past few years.
In our tutorial, we will start with an introduction to the basics of the deep learning methods used in NMT, such as recurrent neural networks and their advanced variants (GRU and LSTM networks), and the algorithms used for their optimization.
We will then introduce NMT-specific models, such as the attention mechanism, and describe the methods used for decoding target sentences, including model ensembling and beam search.
We will go through the recent advancements in the field and discuss their impact on the state-of-the-art methods used in this year's WMT competition (http://www.statmt.org/wmt17/).
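Since beam search is named above as a core decoding method, here is a minimal, model-agnostic sketch of it. The toy next-token table stands in for a real NMT decoder's softmax output and is purely an assumption for the example.

```python
import math

def beam_search(step_fn, start_token, end_token, beam_size=3, max_len=10):
    """Generic beam search: keep the `beam_size` best partial hypotheses,
    ranked by summed log-probability, and expand until every beam emits
    `end_token` or reaches `max_len`.

    `step_fn(prefix)` must return a list of (token, probability) pairs.
    """
    beams = [([start_token], 0.0)]  # (token sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_token:      # finished beams pass through
                candidates.append((seq, score))
                continue
            for tok, prob in step_fn(seq):
                candidates.append((seq + [tok], score + math.log(prob)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(seq[-1] == end_token for seq, _ in beams):
            break
    return beams

# Toy "model": next-token distributions keyed on the last token only.
TABLE = {
    "<s>":   [("hello", 0.6), ("hi", 0.4)],
    "hello": [("world", 0.7), ("there", 0.3)],
    "hi":    [("there", 0.9), ("world", 0.1)],
    "world": [("</s>", 1.0)],
    "there": [("</s>", 1.0)],
}
best = beam_search(lambda seq: TABLE[seq[-1]], "<s>", "</s>", beam_size=2)
print(best[0][0])  # -> ['<s>', 'hello', 'world', '</s>']
```

In a real NMT decoder, `step_fn` would run the network over the full prefix (and attended source sentence) rather than a lookup table, but the search procedure is the same.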