|Title||Economically-efficient Sentiment Stream Analysis|
|Publication Type||Conference Proceedings|
|Year of Publication||2014|
|Authors||Roberto Lourenco, Adriano Veloso, Adriano Pereira, Wagner Meira, Renato Ferreira, Srinivasan Parthasarathy|
|Conference Name||37th International ACM SIGIR Conference on Research & Development in Information Retrieval|
|Conference Location||Gold Coast, Australia|
|Keywords||Economic Efficiency, Sentiment Analysis, Streams and Drifts|
Text-based social media channels, such as Twitter, produce torrents of opinionated data about the most diverse topics and entities. The analysis of such data (a.k.a. sentiment analysis) is quickly becoming a key feature in recommender systems and search engines. A prominent approach to sentiment analysis is based on classification techniques, that is, content is classified according to the attitude of the writer. A major challenge, however, is that Twitter follows the data stream model, and thus classifiers must operate with limited resources, including labeled data and time for building classification models. Also challenging is the fact that the sentiment distribution may change as the stream evolves. In this paper we address these challenges by proposing algorithms that select relevant training instances at each time step, so that training sets are kept small while giving the classifier the capabilities to adapt to, and to recover from, different types of sentiment drift. Providing both capabilities simultaneously, however, is a conflicting-objective problem, and our proposed algorithms employ basic notions of Economics to balance them. We analyzed events that reverberated on Twitter, and a comparison against the state of the art reveals improvements both in error reduction (up to 14%) and in training resources (reduced by orders of magnitude).
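
The instance-selection idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which notion from Economics the algorithms use, so Pareto efficiency is assumed here purely for illustration. The two utilities (`recency` for adapting to drift, `similarity` for recovering earlier patterns), the scores, and the function names are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# A candidate training message with two hypothetical utility scores:
#   recency    - higher means newer (assumed to help adapt to a drift)
#   similarity - higher means closer to the message being classified
#                (assumed to help recover patterns seen earlier)
Candidate = Tuple[str, float, float]  # (text, recency, similarity)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both utilities and strictly better on one."""
    return a[1] >= b[1] and a[2] >= b[2] and (a[1] > b[1] or a[2] > b[2])


def pareto_frontier(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Illustrative scores only; they are not data from the paper.
candidates: List[Candidate] = [
    ("old but very similar", 0.1, 0.9),
    ("new but dissimilar", 0.9, 0.2),
    ("balanced", 0.6, 0.6),
    ("old and dissimilar", 0.1, 0.1),
    ("dominated by balanced", 0.5, 0.4),
]

training_set = pareto_frontier(candidates)
for text, recency, similarity in training_set:
    print(f"{text}: recency={recency}, similarity={similarity}")
```

In this sketch the training set shrinks to the non-dominated messages, so neither objective is improved at the expense of the other. A real stream classifier would recompute the scores and the frontier at each time step.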