The main RANLP conference will be preceded by two days of tutorials delivered by distinguished lecturers. We plan four half-day tutorials, each 220 minutes long, distributed as follows: 60 min talk + 20 min break + 60 min talk + 20 min break + 60 min talk.
| | Morning | Afternoon |
|---|---|---|
| September 5 | 10:00 – 13:40: 1. Paolo Rosso, "Author profiling in social media" | 15:00 – 18:40: 2. Leon Derczynski, "NLP for social media" |
| September 6 | 9:30 – 13:10: 3. Horacio Saggion, "An Introduction to Automatic Text Simplification" | 14:30 – 18:10: 4. Constantin Orasan, "New trends in NLP" |
Leon Derczynski (University of Sheffield)
"NLP for social media"
Summary: Social media is a crucial part of many people's everyday lives, not only as a fun and practical way to share interests and activities with geographically distributed networks of friends, but also as an important part of our business lives. Processing social media is particularly problematic for automated tools, both because it is a strong departure from the newswire tradition that many tools were developed with and evaluated against, and because of the terse, low-context language it typically comprises. This tutorial takes a detailed look at key NLP tasks (corpus annotation, linguistic pre-processing, information extraction) as applied to social media content. After a short introduction to the challenges of processing social media, we will cover key NLP algorithms adapted to such content, discuss available evaluation datasets and outline remaining challenges. The goal is to familiarise attendees with the issues involved in social media, to bring them up to date with the state of the art in social media NLP, and to give them practical tools for handling unstructured data of this kind.
Constantin Orasan (University of Wolverhampton)
"New trends in NLP"
Summary: Recent years have seen lots of changes in the field of computational linguistics, most of them due to the widespread use of the Internet and the benefits and problems it brings. The first part of this tutorial will discuss these changes and will focus on crowdsourcing and how it influenced the creation of annotated data and semantic resources.
Annotation of data employed to train and test NLP methods used to be the task of language experts who had a good understanding of the linguistic phenomena to be tackled. Given that a large number of people now have access to the Internet, crowdsourcing has become an alternative way of obtaining annotated data. The core idea of crowdsourcing is that it is possible to design tasks that can be completed by non-experts and that the outputs of these tasks can be combined to obtain high-quality linguistic annotation, which would normally be produced by experts. Examples of how crowdsourcing was employed in computational linguistics will be given.
Collaboratively constructed resources such as Wikipedia proved very useful for computational linguistics, leading to the creation of knowledge datasets and the use of linked open data (LOD) for language processing. We will show examples of methods that were used to create these data sets and research that combines LOD with NLP.
Big data is another trend in computational linguistics, as researchers rely on ever larger amounts of data to improve the results of their methods. The second part of the tutorial will introduce the MapReduce programming model and show how it has been used in language processing. Alongside this shift to processing larger quantities of data, the field has successfully applied deep learning to various tasks, improving their accuracy. An introduction to deep learning will be provided, followed by examples of how it has been applied to tasks such as learning semantic representations, sentiment analysis and machine translation evaluation.
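To give a flavour of the MapReduce model mentioned above, here is a minimal single-machine word-count sketch in Python. It is illustrative only: the tutorial itself may well use Hadoop or another distributed framework, and the function names here are our own.

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit a (word, 1) pair for every token in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each distinct word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["the cat sat", "the dog sat"]
print(reduce_phase(map_phase(docs)))  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

In a real MapReduce framework the map and reduce steps run in parallel across many machines, with the framework grouping all pairs that share a key before the reduce step; the toy version above collapses that shuffle into a single dictionary.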
Paolo Rosso (Polytechnic University of Valencia)
"Author profiling in social media"
Summary: Given a document, what are its author's traits? Author profiling distinguishes between classes of authors by studying how language is shared by groups of people. The task helps to profile aspects such as gender, age, native language, and personality type, and is of growing importance in applications in forensics, security, and marketing. For example, from a forensic linguistics perspective, one would like to be able to determine the linguistic profile of the author of a harassing text message (language used by a certain type of people) and identify certain characteristics (language as evidence). Similarly, from a marketing viewpoint, companies may be interested in knowing, on the basis of the analysis of blogs and online product reviews, the demographics of people who like or dislike their products. The focus is on author profiling in social media, since we are mainly interested in everyday language and how it reflects basic social and personality processes. At the end of the tutorial, we will describe the performance of the systems that participated in the author profiling shared task at PAN: pan.webis.de/ Finally, we will go beyond standard author profiling and also address how language is shared by a special class of people: those who are ironic.
Horacio Saggion (University Pompeu Fabra, Barcelona)
"An Introduction to Automatic Text Simplification"
Summary: Automatic text simplification as an NLP task arose from the need to make electronic textual content equally accessible to everyone. It is a complex task encompassing a number of operations applied to a text at different linguistic levels. The aim is to turn a complex text into a simplified variant, taking into consideration the specific needs of a particular target user. Automatic text simplification has traditionally had a double purpose: it can serve as a preprocessing tool for other NLP applications, and it can serve a social function, making content accessible to different users such as foreign language learners, readers with aphasia, low-literacy individuals, etc. The first attempts at text simplification were rule-based syntactic simplification systems; nowadays, however, with the availability of large parallel corpora such as the Original and Simple English Wikipedias, approaches to automatic text simplification have become more data-driven. Text simplification is a very active research topic where progress is still needed. This tutorial will give the audience a panorama of more than a decade of work in the area, emphasising the relevant social contribution that content simplification can make to the information society. The tutorial will include demonstrations of existing technologies.