Much progress has been made in the field of sentiment analysis in recent years. Researchers have traditionally relied on textual data for this task, and only recently have they started investigating approaches to predict sentiment from multimedia content. With the increasing amount of data shared on social media, there is also a rapidly growing interest in approaches that work "in the wild", i.e., that are able to deal with uncontrolled conditions. In this work, we address the challenge of training a visual sentiment classifier starting from a large set of user-generated and unlabeled content. In particular, we collected more than 3 million tweets containing both text and images, and we leveraged the sentiment polarity of the textual content to train a visual sentiment classifier. To the best of our knowledge, this is the first time a cross-media learning approach has been proposed and tested in this context. We assessed the validity of our model by conducting comparative studies and evaluations on a benchmark for visual sentiment analysis. Our empirical study shows that although the text associated with each image is often noisy and weakly correlated with the image content, it can be profitably exploited to train a deep Convolutional Neural Network that effectively predicts the sentiment polarity of previously unseen images. The dataset used in our experiments, named T4SA (Twitter for Sentiment Analysis), is available on this page.
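The cross-media labeling idea above can be sketched in a few lines: the sentiment polarity predicted from a tweet's text becomes the weak training label for its image. This is a minimal illustration only; the toy lexicon-based scorer below is a hypothetical stand-in, not the actual text-sentiment pipeline used to build T4SA.

```python
# Sketch of cross-media weak labeling: the polarity of a tweet's TEXT
# is used as the training label for its IMAGE. The lexicon-based scorer
# is a toy placeholder, not the method used by the authors.

POSITIVE = {"love", "great", "happy", "awesome"}
NEGATIVE = {"hate", "awful", "sad", "terrible"}

def text_polarity(text: str) -> str:
    """Classify a tweet's text as 'pos', 'neg', or 'neu' (toy heuristic)."""
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return "neu"

def build_weakly_labeled_set(tweets):
    """Pair each tweet's image with the polarity predicted from its text."""
    return [(t["image"], text_polarity(t["text"])) for t in tweets]

tweets = [
    {"image": "img1.jpg", "text": "I love this sunset, just great"},
    {"image": "img2.jpg", "text": "awful day, I hate mondays"},
]
dataset = build_weakly_labeled_set(tweets)
# `dataset` can now be used to finetune a CNN on the images alone.
```

The resulting (image, label) pairs are noisy by construction, which is exactly the weak-supervision setting the abstract describes.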
BibTeX will go here
Hybrid-T4SA models are Hybrid-CNNs (AlexNet trained on ILSVRC12 + Places205) finetuned on our T4SA dataset, while VGG-T4SA models are VGG-19 nets trained on ILSVRC12 and then finetuned on T4SA. In FT-A models, all layers are finetuned; in FT-F models, the convolutional layers are kept fixed and only the fully-connected layers are finetuned.
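The difference between the FT-A and FT-F regimes can be illustrated with a small framework-agnostic sketch (the layer names below are hypothetical AlexNet-style placeholders, not taken from the released models):

```python
# Toy illustration of the two fine-tuning regimes described above:
# FT-A updates every layer, while FT-F freezes the convolutional
# layers and updates only the fully-connected ones.

LAYERS = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc6", "fc7", "fc8"]

def trainable_layers(mode: str) -> list:
    """Return which layers receive gradient updates under each regime."""
    if mode == "FT-A":   # finetune all layers
        return list(LAYERS)
    if mode == "FT-F":   # conv layers fixed, only FC layers finetuned
        return [name for name in LAYERS if name.startswith("fc")]
    raise ValueError(f"unknown fine-tuning mode: {mode}")

print(trainable_layers("FT-F"))  # ['fc6', 'fc7', 'fc8']
```

In a real framework this corresponds to disabling gradients on the frozen layers (e.g., setting a zero learning-rate multiplier on the convolutional blocks) before running the finetuning pass.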
You can check the performance of the available models in the Experimental Results section.
The data collection process took place from July to December 2016, lasting around 6 months in total. During this time span, we exploited Twitter's Sample API to access a random 1% sample of the stream of all globally produced tweets, discarding:
The details of the dataset are reported in the following table. You can request access to the T4SA dataset using the form below.
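The sampling-and-filtering step of the collection process can be sketched as follows. The concrete discard criteria are truncated in the text above, so the checks shown here (retweets, missing media) are illustrative placeholders; only the requirement that kept tweets carry both text and an image is stated on this page.

```python
# Sketch of the tweet-filtering step applied to the 1% sample stream.
# The discard criteria below are illustrative placeholders only.

def keep_tweet(tweet: dict) -> bool:
    """Decide whether a sampled tweet enters the dataset (toy criteria)."""
    if tweet.get("is_retweet"):            # hypothetical criterion
        return False
    if not tweet.get("media"):             # tweets must carry an image
        return False
    if not tweet.get("text", "").strip():  # ...and some text
        return False
    return True

sample = [
    {"text": "nice view!", "media": ["a.jpg"], "is_retweet": False},
    {"text": "RT old news", "media": [], "is_retweet": True},
]
kept = [t for t in sample if keep_tweet(t)]
print(len(kept))  # 1
```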
In the first table, we report the performance of the models we trained, together with that of previously published methods, on two testing datasets: the Twitter Testing Dataset and the B-T4SA test split.
In the second table, we report the accuracy of different models when performing cross-validation on the Twitter Testing Dataset.
The first three score columns refer to the Twitter Testing Dataset (2-way: pos, neg); the last column refers to the B-T4SA test set (3-way: pos, neu, neg).

| Model | 5 agree | ≥ 4 agree | ≥ 3 agree | B-T4SA test set |
|-------|---------|-----------|-----------|-----------------|
| CNN (You et al.)  | 72.2% | 68.6% | 66.7% | - |
| PCNN (You et al.) | 74.7% | 71.4% | 68.7% | - |
Twitter Testing Dataset

| Method | Training details | 5 agree | ≥ 4 agree | ≥ 3 agree |
|--------|------------------|---------|-----------|-----------|
| **Approaches without intermediate fine-tuning** | | | | |
| GCH + BoW * | - | 71.0% | 68.5% | 66.5% |
| LCH + BoW * | - | 71.7% | 69.7% | 66.4% |
| CNN ● | Custom architecture tr. on Flickr (VSO) | 78.3% | 75.5% | 71.5% |
| AlexNet ● | AlexNet tr. on ILSVRC2012 | 81.7% | 78.2% | 73.9% |
| PlaceCNN ● | AlexNet tr. on Places205 | 83.0% | - | - |
| GoogleNet ● | GoogleNet tr. on ILSVRC2012 | 86.1% | 80.7% | 78.7% |
| HybridNet ● | AlexNet tr. on (ILSVRC2012 + Places205) | 86.7% | 81.4% | 78.1% |
| VGG-19 ● | VGG-19 tr. on ILSVRC2012 | 88.1% | 83.5% | 80.0% |
| **Approaches using an intermediate fine-tuning** | | | | |
| PCNN ● | Custom architecture tr. on Flickr (VSO) + ft. on Flickr (VSO) | 77.3% | 75.9% | 72.3% |
| DeepSentiBank ○● | AlexNet tr. on ILSVRC2012 + ft. on Flickr (VSO) | 80.4% | - | - |
| MVSO [EN] ○● | DeepSentiBank ft. on MVSO-EN | 83.9% | - | - |
| Hybrid-T4SA FT-A (Ours) ● | AlexNet tr. on (ILSVRC2012 + Places205) + ft. on B-T4SA | 86.4% | 83.0% | 80.0% |
| Hybrid-T4SA FT-F (Ours) ● | AlexNet tr. on (ILSVRC2012 + Places205) + ft. on B-T4SA | 87.3% | 83.2% | 81.0% |
| VGG-T4SA FT-F (Ours) ● | VGG-19 tr. on ILSVRC2012 + ft. on B-T4SA | 88.9% | 85.7% | 81.5% |
| VGG-T4SA FT-A (Ours) ● | VGG-19 tr. on ILSVRC2012 + ft. on B-T4SA | 89.6% | 86.6% | 82.0% |

\* Approach based on low-level features · ○ Approach based on mid-level features · ● Approach based on deep learning