To resonate with viewers, many video advertisements employ creative
narrative techniques such as "Freytag's pyramid", in which a story begins
with exposition, followed by rising action, then a climax, and concludes with
a denouement. In the dramatic structure of ads in particular, the climax
depends on changes in sentiment.
We dedicate our study to understanding the dynamic structure of video ads
automatically. To achieve this, we first crowdsource climax annotations on 1,149
videos from the Video Ads Dataset, which already provides sentiment annotations.
We then use both unsupervised and supervised methods to predict the climax.
Based on the predicted peak, the low-level visual and audio cues, and
semantically meaningful context features, we build a sentiment prediction model
that outperforms the current state-of-the-art model of sentiment prediction in
video ads by 25%.
In our ablation study, we show that both using our context features and
modeling dynamics with an LSTM are crucial to the improved performance.
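The observation that climax depends on changes in sentiment suggests a simple unsupervised baseline, sketched below for illustration only. It is not necessarily the method evaluated in this study: the function names `smooth` and `predict_climax`, the per-segment `scores` input, and the moving-average `window` are all hypothetical assumptions. The sketch smooths a sequence of per-segment sentiment scores and returns the segment where the smoothed score changes the most.

```python
from typing import List, Sequence


def smooth(values: Sequence[float], window: int = 3) -> List[float]:
    """Centered moving average; the window is truncated at both edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def predict_climax(scores: Sequence[float], window: int = 3) -> int:
    """Return the index of the segment whose smoothed score differs most
    (in absolute value) from the previous segment.

    `scores` is a hypothetical per-segment sentiment signal, e.g. one
    value per second of video. Ties resolve to the earliest segment.
    """
    if len(scores) < 2:
        raise ValueError("need at least two segments")
    s = smooth(scores, window)
    # changes[k] is the jump between segment k and segment k + 1.
    changes = [abs(s[i] - s[i - 1]) for i in range(1, len(s))]
    best = max(range(len(changes)), key=changes.__getitem__)
    return best + 1


if __name__ == "__main__":
    # Toy signal with an abrupt sentiment shift partway through.
    toy_scores = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]
    print(predict_climax(toy_scores, window=1))
```

A heuristic of this kind needs no climax annotations, which makes it a possible point of comparison for a supervised predictor trained on crowdsourced labels.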
For questions regarding the dataset, please email Keren Ye at firstname.lastname@example.org.