In order to resonate with the viewers, many video advertisements explore creative narrative techniques such as "Freytag's pyramid" where a story begins with exposition, followed by rising action, then climax, concluding with denouement. In the dramatic structure of ads in particular, climax depends on changes in sentiment. We dedicate our study to understand the dynamic structure of video ads automatically. To achieve this, we first crowdsource climax annotations on 1,149 videos from the Video Ads Dataset, which already provides sentiment annotations. We then use both unsupervised and supervised methods to predict the climax. Based on the predicted peak, the low-level visual and audio cues, and semantically meaningful context features, we build a sentiment prediction model that outperforms the current state-of-the-art model of sentiment prediction in video ads by 25%. In our ablation study, we show that using our context features, and modeling dynamics with an LSTM, are both crucial factors for improved performance.






For questions regarding the dataset, please send an email to Keren Ye,