Technion - Israel Institute of Technology
Technion - Israel Institute of Technology, Graduate School
M.Sc Thesis
M.Sc Student: Peled Lotem
Subject: Sarcasm Interpretation with Sentiment Based Monolingual Machine Translation
Department: Department of Industrial Engineering and Management
Supervisor: Assistant Professor Roi Reichart


Abstract

Sarcasm is a form of speech in which speakers say the opposite of what they truly mean in order to express a strong sentiment. This complex form of communication is especially common in user-generated content such as social network posts, online product reviews and blog posts. Such content is frequently used in Natural Language Processing (NLP) tasks such as sentiment analysis and opinion mining. However, sarcastic expressions, which often appear in such texts, pose a serious challenge to the algorithms that address these tasks, since these algorithms usually interpret words literally. For example, given the sarcastic tweet “I’m so happy about the results of the USA elections #sarcasm”, any algorithm that analyzes the text literally will conclude that the writer really is pleased with the outcome and is expressing a positive sentiment, whereas this is obviously not the case. In fact, literal interpretation is a challenge not only for NLP algorithms but also for certain populations, such as people with autism and Asperger’s syndrome.
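
The failure of literal analysis on the example tweet can be illustrated with a minimal sketch. The tiny lexicon and the `literal_sentiment` function below are hypothetical, invented for illustration only, and do not correspond to any particular sentiment analysis system. The sketch only shows that a word-level scorer reads the sarcastic tweet as positive.

```python
# Toy lexicon-based sentiment scorer (an illustrative assumption, not a real system).
import re

# Hypothetical word-level polarity lexicon.
LEXICON = {
    "happy": 1, "pleased": 1, "great": 1,
    "unhappy": -1, "sad": -1, "terrible": -1,
}

def literal_sentiment(text: str) -> int:
    """Sum the polarity of each known word, ignoring hashtags and context."""
    text = re.sub(r"#\w+", " ", text.lower())  # drop hashtags such as #sarcasm
    tokens = re.findall(r"[a-z']+", text)
    return sum(LEXICON.get(tok, 0) for tok in tokens)

sarcastic = "I'm so happy about the results of the USA elections #sarcasm"
interpretation = "I'm unhappy about the results of the USA elections"

# The literal scorer rates the sarcastic tweet as positive,
# although its intended meaning matches the negative interpretation.
print(literal_sentiment(sarcastic))       # positive score
print(literal_sentiment(interpretation))  # negative score
```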

Driven by these challenges, we present the novel task of Sarcasm Interpretation. We define this task as the generation of a non-sarcastic expression that conveys the same meaning as the original sarcastic one. In our work we target the interpretation of tweets marked with #sarcasm, a marker that assures us that the text was meant sarcastically. Thus, for the sarcastic tweet “I’m so happy about the results of the USA elections #sarcasm” we would like to generate a non-sarcastic interpretation such as “I’m unhappy about the results of the USA elections”. Note that any given sarcastic tweet may have many valid non-sarcastic interpretations.


In order to generate these interpretations automatically, we collect a first-of-its-kind dataset comprising sarcastic tweets and their non-sarcastic interpretations. The dataset contains 3000 sarcastic tweets (Twitter posts with #sarcasm), and the non-sarcastic interpretations were created by 10 human judges from the fields of comedy writing and paraphrasing. For each sarcastic tweet we obtained 5 non-sarcastic interpretations, resulting in 15,000 (sarcastic, non-sarcastic) pairs. In Section 3 of this work we present the data in detail, along with examples and the challenges that we encountered. The dataset, as well as the code for our algorithm, is available online on our project page: https://github.com/Lotemp/SarcasmSIGN
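
The pair count follows from expanding each tweet with its interpretations, which can be sketched as below. The in-memory layout (a dictionary from tweet to a list of interpretations), the `to_pairs` helper and the placeholder strings are assumptions made for this illustration; they are not the file format of the released dataset.

```python
# Sketch of turning per-tweet interpretations into (sarcastic, non-sarcastic) pairs.
# The data layout and the placeholder strings are assumptions for illustration.
from typing import Dict, List, Tuple

def to_pairs(dataset: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Emit one (sarcastic, non-sarcastic) pair per interpretation."""
    return [(tweet, interp)
            for tweet, interps in dataset.items()
            for interp in interps]

# Placeholder data with the sizes stated in the text: 3000 tweets, 5 interpretations each.
toy_dataset = {
    f"sarcastic tweet {i} #sarcasm": [f"interpretation {i}.{j}" for j in range(5)]
    for i in range(3000)
}

pairs = to_pairs(toy_dataset)
print(len(pairs))  # 15000
```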


We approach sarcasm interpretation as a translation task (from sarcastic language to non-sarcastic language) and utilize Machine Translation (MT) algorithms and evaluation metrics, as we present in detail in Section 5 of this work. We then present Sarcasm SIGN, our Machine Translation based Sarcasm Interpretation algorithm, which targets sentiment words, a strong characteristic of sarcastic language. We show that even though automatic MT evaluation metrics indicate similar performance for multiple sarcasm interpretation algorithms, SIGN’s interpretations are scored significantly higher by human judges for adequacy and correct sentiment. Finally, we present the challenges and nuances of sarcasm interpretation and discuss directions for future work that arise from this novel task.
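
One possible intuition for why word-overlap metrics may separate systems poorly on this task is sketched below. The metric is a simplified clipped unigram precision, used here as a stand-in for the n-gram matching behind common MT metrics; it is not necessarily one of the metrics used in this work, and the function name is hypothetical. A sarcastic tweet and its interpretation often differ in only a few sentiment words, so even an unchanged copy of the tweet overlaps heavily with the reference.

```python
# Simplified clipped unigram precision against one or more references.
# A stand-in for n-gram based MT metrics, written for illustration only.
from collections import Counter
from typing import List

def clipped_unigram_precision(candidate: str, references: List[str]) -> float:
    """Fraction of candidate words that also appear in some reference (with clipping)."""
    cand_counts = Counter(candidate.lower().split())
    if not cand_counts:
        return 0.0
    # For each word, the maximum number of times it occurs in any single reference.
    max_ref_counts: Counter = Counter()
    for ref in references:
        for word, count in Counter(ref.lower().split()).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)
    clipped = sum(min(count, max_ref_counts[word]) for word, count in cand_counts.items())
    return clipped / sum(cand_counts.values())

references = ["I'm unhappy about the results of the USA elections"]
unchanged_copy = "I'm so happy about the results of the USA elections"
good_interpretation = "I'm unhappy about the results of the USA elections"

print(clipped_unigram_precision(unchanged_copy, references))       # 0.8
print(clipped_unigram_precision(good_interpretation, references))  # 1.0
```

In this toy case the uninterpreted copy still matches 8 of its 10 words, which suggests why human judgments of adequacy and sentiment are a useful complement to overlap-based scores.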