RANLP 2019

Tutorial schedule

The main RANLP conference will be preceded by two days of tutorials delivered by distinguished lecturers:

Preslav Nakov (Qatar Computing Research Institute, HBKU)
Valia Kordoni (Humboldt University of Berlin)
Antonio Miceli Barone (University of Edinburgh) and Sheila Castilho (Dublin City University)
Vlad Niculae and Tsvetomila Mihaylova (Institute of Telecommunications, Lisbon)

We plan four half-day tutorials, each lasting 220 minutes and structured as follows: 60 min talk + 20 min break + 60 min talk + 20 min break + 60 min talk.

	Morning	Afternoon
August 31	9:30 – 13:10 Tutorial 1: Fact Checking: Truth Seeking in the Age of Disinformation	15:00 – 18:40 Tutorial 2: Deep Learning for Metaphors and Idioms
September 1	9:30 – 13:10 Tutorial 3: Neural Machine Translation	15:00 – 18:40 Tutorial 4: Latent Structure Models for NLP

Tutorials

Fact Checking: Truth Seeking in the Age of Disinformation

Preslav Nakov (Qatar Computing Research Institute, HBKU)

Summary: The rise of social media has democratized content creation and has made it easy for everybody to share and spread information online. On the positive side, this has given rise to citizen journalism, thus enabling much faster dissemination of information compared to what was possible with newspapers, radio, and TV. On the negative side, stripping traditional media of their gate-keeping role has left the public unprotected against the spread of misinformation, which can now travel at breaking-news speed over the same democratic channel. This has given rise to the proliferation of false information that is typically created either (a) to attract network traffic and gain financially from showing online advertisements, e.g., as is the case with clickbait, or (b) to affect individual people's beliefs, and ultimately to influence major events such as political elections. There are strong indications that false information was weaponized at an unprecedented scale during the Brexit and the 2016 U.S. presidential campaigns, and it has been suggested that this marked the dawn of the Post-Truth Era. "Fake news", which can be defined as fabricated information that mimics news media content in form but not in organizational process or intent, became the Word of the Year for 2017, according to Collins Dictionary. Thus, limiting its spread and impact has become a major focus for computer scientists, journalists, social media companies, and regulatory authorities.

This tutorial will cover recent work on a number of related problems such as misinformation, disinformation, "fake news", rumor, and clickbait detection, fact-checking, stance, bias and propaganda detection, source reliability estimation, as well as detecting bots, trolls, and seminar users. We will also discuss recent advances in automatic generation of text, e.g., GPT-2 and Grover, of images and of videos, e.g., "deep fakes", and their implications for robojournalism and "fake news" generation.

Bio: Dr. Preslav Nakov is a Principal Scientist at the Qatar Computing Research Institute (QCRI), HBKU. His research interests include computational linguistics, "fake news" detection, fact-checking, machine translation, question answering, sentiment analysis, lexical semantics, Web as a corpus, and biomedical text processing. He received his PhD degree from the University of California at Berkeley (supported by a Fulbright grant), and he was a Research Fellow at the National University of Singapore, an honorary lecturer at Sofia University, and research staff at the Bulgarian Academy of Sciences. At QCRI, he leads the Tanbih project (http://tanbih.qcri.org), developed in collaboration with MIT, which aims to limit the effect of "fake news", propaganda and media bias by making users aware of what they are reading. Dr. Nakov is the Secretary of ACL SIGLEX and of ACL SIGSLAV, and a member of the EACL advisory board. He is a member of the editorial boards of TACL, C&SL, NLE, AI Communications, and Frontiers in AI. He is also on the Editorial Board of the Language Science Press Book Series on Phraseology and Multiword Expressions. He co-authored a Morgan & Claypool book on Semantic Relations between Nominals, two books on computer algorithms, and many research papers in top-tier conferences and journals. He also received the Young Researcher Award at RANLP'2011. He was also the first to receive the Bulgarian President's John Atanasoff award, named after the inventor of the first automatic electronic digital computer. Dr. Nakov's research has been featured by over 100 news outlets, including Forbes, Boston Globe, Aljazeera, MIT Technology Review, Science Daily, Popular Science, Fast Company, The Register, WIRED, and Engadget.

Slides

Deep Learning for Metaphors and Idioms

Valia Kordoni (Humboldt University of Berlin)

Summary: Idioms and metaphors are characteristic of all areas of human activity and of all types of discourse. Their processing is a rapidly growing area in NLP, as they pose a major challenge for NLP systems. Their omnipresence in language has been established in a number of corpus studies, and the role they play in human reasoning has also been confirmed in psychological experiments. This makes idioms and metaphors an important research area for computational and cognitive linguistics, and their automatic identification and interpretation indispensable for any semantics-oriented NLP application.

This tutorial focuses on deep learning methods for metaphor detection, classification and analysis in multilingual, multi-genre and heterogeneous big data. The tutorial also deals with metaphor processing tools for NLP applications.

Bio: PD Dr. Valia Kordoni is a Deputy Chair of Computational Linguistics at the Department of English at Humboldt University of Berlin. She is an active researcher in Language Technology (LT), Data Science and Artificial Intelligence (AI). Her research interests include multilingual Robust Natural Language Analytics, Computational Semantics, Discourse and Human Cognition Modeling, as well as Machine Learning for the automated acquisition of knowledge, especially concerning multiword units and their impact on Natural Language Processing, spoken and written. She has been the president of the ACL (Association for Computational Linguistics) SIGLEX's (Special Interest Group on Lexicon) MWE (Multiword Expressions) Group. She was the Local Chair of ACL 2016 - The 54th Annual Meeting of the Association for Computational Linguistics. She has coordinated and contributed to many projects funded by the EU, the DFG (Germany), the BMBF (Germany), the DAAD (Germany), as well as the NSF (USA), the latest of those being "TraMOOC: Translation for Massive Open Online Courses" (http://tramooc.eu), an EU-funded Horizon 2020 collaborative project aiming at providing reliable Neural Machine Translation for Massive Open Online Courses (MOOCs).

Neural Machine Translation

Antonio Miceli Barone (University of Edinburgh) and Sheila Castilho (Dublin City University)

Summary: Neural machine translation based on sequence-to-sequence neural networks has become the dominant paradigm in machine translation, achieving better accuracy and fluency than statistical approaches in most scenarios. The first part of the tutorial will describe two of the most widely applied architectures: the recurrent encoder-decoder with attention and the Transformer. We will cover details crucial for obtaining good performance such as sub-word tokenization, data augmentation by back-translation, normalization layers, model depth and optimization techniques. We will also provide an overview of recent developments such as unsupervised machine translation, context-aware translation and non-autoregressive architectures. In the second part of the tutorial, we will focus on evaluation. In both research and practice, evaluation is a complex task. In the machine translation field, evaluation involves a range of linguistic and extra-linguistic factors. Early studies on NMT quality demonstrated that, in general, it yields higher automatic evaluation metric scores than its predecessor, statistical MT (SMT) – although there are arguments that some MT automatic metrics are not fit for NMT. NMT has also been shown to provide a jump in fluency when compared with SMT. This increased fluency has quickly made NMT the preferred MT paradigm for assimilation, as is evident from the move to NMT by many major online MT providers. However, due to the novelty of NMT, little is known yet about how humans engage with NMT output. We will show the main practices for evaluation of machine translation and the importance of human evaluation when reporting the progress during the development of MT systems, as well as when evaluating their final quality.

Bio: Antonio Valerio Miceli Barone is a researcher at the University of Edinburgh. His research interests are Machine Translation, Natural Language Processing and Machine Learning. He has developed architectural improvements to recurrent neural network models, such as the BiDeep architecture for neural machine translation and regularization techniques for low-resource and domain adaptation scenarios. He has worked on European Commission projects such as TraMOOC, QT21 and Gourmet, the US IARPA project MATERIAL, and industrial collaborations with Samsung and Booking.com. Antonio received his PhD from the University of Pisa, defending a dissertation on syntax-based methods for machine translation.

Bio: Sheila Castilho graduated in Linguistics and Education from the UNIOESTE University in Brazil. She holds a joint Master's degree in Natural Language Processing from the University of Wolverhampton – UK and the University of Algarve – PT. She completed her PhD dissertation at Dublin City University in 2016. Currently, she is a post-doctoral researcher at the ADAPT Centre. She was part of the TraMOOC (H2020) project and part of the iADAATPA (CEF) project. She has authored several journal articles and book chapters on translation technology, post-editing of machine translation, user evaluation of machine translation, and translators’ perception of machine translation. She is a co-editor of the book 'Translation Quality Assessment: From Principles to Practice', published in 2018 by Springer. She is also a co-editor of the Machine Translation Journal special issue on ‘Human Factors in NMT’, to be published in 2019. Her research interests include machine translation, post-editing, machine and human translation evaluation, usability, and translation technologies.

Slides: Antonio Valerio Miceli Barone, Sheila Castilho

Latent Structure Models for NLP

Vlad Niculae and Tsvetomila Mihaylova (Institute of Telecommunications, Lisbon)

Summary: Latent structure models are a powerful tool for modeling compositional data, discovering linguistic structure, and building NLP pipelines. They are appealing for two main reasons: they allow incorporating structural bias during training, leading to more accurate models; and they allow discovering hidden linguistic structure, which provides better interpretability.

This tutorial will cover recent advances in discrete latent structure models. We discuss their motivation, potential, and limitations, then explore in detail three strategies for designing such models: gradient approximation, reinforcement learning, and end-to-end differentiable methods. We highlight connections among all these methods, enumerating their strengths and weaknesses. The models we present and analyze have been applied to a wide variety of NLP tasks, including sentiment analysis, natural language inference, language modeling, machine translation, and semantic parsing.

Examples and evaluation will be covered throughout. After attending the tutorial, a practitioner will be better informed about which method is best suited for their problem.

Bio: Vlad Niculae is a postdoc in the DeepSPIN project at the Institute of Telecommunications in Lisbon, Portugal. His research aims to bring structure and sparsity to neural network hidden layers and latent variables, using ideas from convex optimization, and motivations from natural language processing. He earned a PhD in Computer Science from Cornell University in 2018. He received the inaugural Cornell CS Doctoral Dissertation Award, and co-organized the NAACL 2019 Workshop on Structured Prediction for NLP (http://structuredprediction.github.io/SPNLP19).

Bio: Tsvetomila Mihaylova is a PhD student in the DeepSPIN project at the Institute of Telecommunications in Lisbon, Portugal, supervised by André Martins. She is working on empowering neural networks with a planning mechanism for structural search. She has a master’s degree in Information Retrieval from Sofia University, where she was also a teaching assistant in Artificial Intelligence. She is one of the organizers of a shared task at SemEval 2019.

Slides: https://deep-spin.github.io/tutorial/