CVPR 2018 Workshop: Towards Automatic Understanding of Visual Advertisements (ADS)

Introduction

Recent advances in computer vision have set the stage for tackling increasingly challenging tasks. Many of these tasks, such as visual question answering and modeling physical forces, would until recently have been considered impossible for machines. These problems are challenging and interesting, but they all analyze images that arise naturally, i.e. ones taken by a photographer. In contrast, many images in the media, in particular image advertisements, are carefully and deliberately constructed with a specific goal in mind: to convey a particular message to the target audience. This poses the interesting challenge of inferring not the physical content of the image but its visual rhetoric. This rhetoric does rely on physical content, but that content (1) is often portrayed in non-traditional ways, (2) must be reasoned about to infer what a certain juxtaposition of objects implies, and (3) must be understood in the context of cultural phenomena.

For example, consider the "Junk deer" ad. To infer its message, the viewer must first visually recognize a deer made of junk (a rather imaginative incarnation of a deer). Next, the viewer must reason about what this implies: perhaps the deer ate trash, which would be bad for it. Thus, the ad implies that pollution is harmful to wildlife. In "Melting earth", the viewer must recognize the melting process happening to Earth. In "Straws striving towards Pepsi can", the viewer must recognize that the straws are striving towards the can, and infer that this implies the contents of the can are desirable. In "Owl and coffee", the owl symbolizes wakefulness. In "Natural cow", the ice cream is natural because the "cow" is made of natural ingredients. In "Porcelain man", the man shares the texture of a porcelain vase and is similarly fragile. In "Zebra chasing lion", the viewer is surprised to see a zebra chasing a lion rather than vice versa. In "Heavy metal fries", the viewer must recognize a cultural symbol (the devil's horns gesture popular in heavy metal culture). In "Boot crushing sandal", the viewer must "fill in" the presence of a man and a woman, and recognize an implied crushing action.


Figure: example ads (Junk deer, Melting earth, Straws striving towards Pepsi can, Owl and coffee, Natural cow, Porcelain man, Zebra chasing lion, Heavy metal fries, Boot crushing sandal)

Understanding advertisements poses many challenges and provides context for the types of problems we aim to solve in computer vision. Developing methods for automatically understanding the messages of ads requires participation from computer vision researchers with diverse backgrounds. Related topics include:

Visual persuasion, politics and social media
Metaphors in language
Motivations of humans
Recognition of objects in non-photorealistic datasets and domain adaptation
Knowledge representation for visual question answering
Modeling physical forces
Abnormality and surprise detection
Humor
Visual attention and memorability
Sentiment
Analyzing infographics, charts and comics

A large annotated dataset of image and video ads is available here. In this dataset, we provide over 64,000 ad images annotated with the topic of the ad (e.g. the product, or the topic in the case of public service announcements), the sentiment that the ad provokes, any symbolic references that the ad makes (e.g. an owl symbolizes wakefulness, ice symbolizes freshness, etc.), including bounding boxes containing the physical content that alludes symbolically to concepts outside of the ad, and questions and answers about the meaning of the ad ("What should I do according to the ad?" and "Why should I do it, according to the ad?").


Program and Speakers

Date: June 22, 2018
Location: Room 150 - DEF

Time		Speaker/Topic
9am-9:30am		welcome and brainstorming
9:30am-10am		Jiebo Luo (Univ. of Rochester)
10am-10:30am		Jesse Berent (Google) [slides]
10:30am-11am		coffee break
11am-11:30am		more brainstorming
11:30am-12pm		challenge winner talk: Mayu Otani, Yuki Iwazaki, Kota Yamaguchi [slides]
12pm-1:30pm		lunch
1:30pm-2:15pm		Jungseock Joo (UCLA) [slides]
2:15pm-3pm		Lydia Chilton (Columbia Univ.) [slides]
3pm-4:30pm		posters and coffee break:
		Emotional Style Transfer for Stock Assets. Kazuhiro Ota and Kota Yamaguchi (CyberAgent, Inc.)
		Understanding Visual Ads by Aligning Symbols and Objects using Co-Attention. Karuna Ahuja, Karan Sikka, Anirban Roy and Ajay Divakaran (SRI International)
		Interpreting Visual Metaphors in Advertising. Savvas D Petridis and Lydia B Chilton (Columbia University)
4:30pm-5:15pm		brainstorming and closing remarks


Challenge

We are running a competition prior to the workshop, with results announced at the workshop. The tentative timeline is as follows:

March 21: development set released
April 27: test phase begins
May 31: final submissions due
June: scoreboard results revealed, winners announced

Please access the competition details and data here. We look forward to your submission!


Submission

We are looking for 3-page abstracts (work in progress, unpublished or previously published work) on topics related to ad understanding (see example topics above).

Submission is now open: CVPRADS2018

Submission deadline: April 27, 2018 (extended)


Organizers

	Adriana Kovashka (Univ. of Pittsburgh)
	James Hahn (Univ. of Pittsburgh)
