REPORT Lebret_Idiap-RR-15-2016/IDIAP Twitter Sentiment Analysis (Almost) from Scratch Lebret, Rémi Pinheiro, Pedro H. O. Collobert, Ronan EXTERNAL http://publications.idiap.ch/attachments/reports/2015/Lebret_Idiap-RR-15-2016.pdf PUBLIC Idiap-RR-15-2016 2016 Idiap May 2016

<subfield code="a">REPORT</subfield>

</datafield>

<subfield code="a">Lebret_Idiap-RR-15-2016/IDIAP</subfield>

</datafield>

<subfield code="a">Twitter Sentiment Analysis (Almost) from Scratch</subfield>

</datafield>

<subfield code="a">Lebret, Rémi</subfield>

</datafield>

<subfield code="a">Pinheiro, Pedro H. O.</subfield>

</datafield>

<subfield code="a">Collobert, Ronan</subfield>

</datafield>

<subfield code="i">EXTERNAL</subfield>

<subfield code="u">http://publications.idiap.ch/attachments/reports/2015/Lebret_Idiap-RR-15-2016.pdf</subfield>

<subfield code="x">PUBLIC</subfield>

</datafield>

<subfield code="a">Idiap-RR-15-2016</subfield>

</datafield>

<subfield code="b">Idiap</subfield>

</datafield>

</datafield>

<subfield code="a">A popular application in Natural Language Processing (NLP) is Sentiment Analysis (SA), i.e., the task of extracting contextual polarity from a given text. The social network Twitter provides an immense amount of user-generated text in the form of tweets, each limited to a maximum of 140 characters. In this project, we plan to learn a tweet representation from publicly available tweet data in order to infer sentiment from tweets. One challenge in this task is that tweets are written by very different users, which makes the data very heterogeneous (unlike regular text, which is written in proper English). Another challenge is the large scale of the problem. We propose a deep learning sentence representation (called tweet representation), learned from user-generated data, to infer sentiment from tweets. This representation is learned from scratch (directly from the words in the tweets) over a large unlabeled corpus of tweets. We demonstrate that we achieve state-of-the-art results for SA on tweets.</subfield>

</datafield>

</record>

</collection>