Announcements
Please remember that you need to meet with me to discuss your paper presentation by Friday (for Tuesday presentations) or Monday (for Thursday presentations), and that I need to receive your slides, along with times when you can meet, by 10pm on the day before you want to meet.
Overview
Course description: In this graduate-level seminar course, we will examine recent advances in computer vision, with a focus on high-level recognition tasks. We will analyze approaches to research problems in visual recognition, and discuss future directions that these problems and approaches inspire. The course structure will combine lectures, student presentations on assigned conference and journal publications, in-class discussions, and a course project. The goal of this course is to become knowledgeable about the state of the art and the challenges in visual recognition, and to develop the skills to read critically and to write up and present research work clearly. Example topics include: object detection and recognition, action recognition, image descriptors, mid-level representations, attribute-based recognition and search, context, saliency and importance, unsupervised visual discovery, active and transfer category learning, interactive vision with a human in the loop, interactions between vision and language, big data, image search and retrieval, image and video summarization, and debugging vision systems.
Prerequisites: Basic understanding of probability and linear algebra is required. Familiarity or experience with machine learning is recommended. Each student should speak with the instructor to determine if he or she has sufficient background for this course.
Requirements
Grading will be based on the following components:
- Paper reviews (20%)
- In-class participation and discussion (20%)
- Paper presentations (20%)
- Experiment presentations (10%)
- Final project (30%)
Paper reviews
Students will be required to write a paper review for one of the two papers discussed in each class. By default, this should be the Primary paper listed for each class, but students can choose to review a Secondary paper. The Primary paper is the one on which we will focus more in class, and we will discuss the Secondary paper at a higher level. In some cases, the Schedule below will have a * next to one paper, which indicates this paper should be read first.
Reviews are due at 10pm on the day before the respective class. They should be emailed in PDF or Word document format to the instructor's email address, with subject "CS3710 Paper Review". The filename should be [first name of student]_[last name]_review_[month]_[day of the class when the paper is discussed].pdf (or .doc, .docx), e.g. adriana_kovashka_review_01_15.pdf. Make it clear which of the questions below you're addressing in each paragraph, and use one paragraph per question. Please also put your name in the document text. You can skip writing a review for a class during which you are giving a paper presentation (but not for one during which you are giving an experiment presentation).
The reviews should be no longer than one page, and should address the following questions:
- Summarize what this paper aims to do, and what its main contribution is.
- Summarize the proposed approach.
- Summarize the experimental validation of the approach.
- What are three advantages of the proposed approach?
- What are three disadvantages or weaknesses of the approach or experimental validation?
- Suggest one possible extension of this approach, i.e. one idea for future work.
- Any other thoughts, comments, or questions on this paper? (optional)
In-class participation and discussion
Students should actively engage in in-class discussions. For example, this could take the form of asking questions or making meaningful remarks and comments about the work following a paper presentation, or responding to others' questions or comments.
Paper presentations
Each student will be assigned to give about 2 presentations, each of which will cover the 2 papers for the given class. Each presentation will be about 30 minutes long (40 minutes if there is no experiment presentation that day). The presentation will be followed by a discussion capped at 15 minutes (25 minutes if there is no experiment presentation), and the presenter will be in charge of driving and moderating this discussion. Presentations should cover the following:
- What problem is each paper trying to solve?
- Why should we care about this problem? What challenges does it pose? In 1-2 sentences, what are the previous approaches to solving this problem, if any?
- At a high level, what approach does this work take to solving the problem?
- In more detail, what are the key steps in the approach?
- How do the authors set up the evaluation of their approach? What hypotheses are they trying to verify?
- What experimental outcomes do the authors find? Are the hypotheses confirmed?
- What are some weaknesses of the approach, or some directions for future work? Be ready to pose some discussion questions.
Make sure to rehearse your presentations so that they are clear and polished. Use many visuals on the slides, and use text sparingly, primarily in bulleted form. You are encouraged to browse the web for resources or slides related to this work, including original slides from the authors that include results not shown in the paper. However, always cite your sources for any slides that other authors created. Also, always use your own words on slides and during the presentation. Do not simply copy text from the paper or from other resources.
Grading rubric: Your grade for a paper presentation will be based on: (1) whether you sent me a draft of your slides and met with me to discuss these by the deadline; (2) whether you adequately, clearly and correctly addressed and explained all important points of the paper in your presentation; (3) how clear and informative your classmates found the presentation; (4) how you moderated the discussion of the paper; and (5) how well-rehearsed your presentation was.
Experiment presentations
Students will present an experimental evaluation of 1 paper covered in class. This can be either a Primary or Secondary paper. Students can volunteer to present an additional paper in a later class, in which case the better presentation grade will be used. Presentations should be no more than 15 minutes long, and will be followed by a discussion (capped at 10 minutes).
The goal of this evaluation is to examine in more detail the merits of the paper, and whether the success of the approach is robust to changes in the implementation or experimental setup. Further, how does including some parts of the full approach and removing others affect the experimental outcomes? How do various implementation choices affect the performance of the algorithm? How sensitive is the method to different parameter choices?
The student should pick one to three aspects of the paper to test, as opposed to attempting to match all results in the paper. These tests can be disjoint from what the authors of the paper chose to test. If code for the examined paper is not available, the student should implement a basic version (one that captures the core idea) of the proposed approach. If code is available, the student should cite it during the presentation. In the presentation, explain what you did, why you did it, what implementation choices you made, and what you found out.
Please look at the following for examples of experiment presentations: example 1, example 2, example 3, example 4, example 5.
Grading rubric: Your experiment presentation grade will depend on: (1) how well you motivated your choice of what to test; (2) whether your experimental setup was sensible; (3) how well you explained what you did, and what conclusions you drew; and (4) how useful your classmates found your presentation.
Final project
Students will complete a project which studies in more depth one of the topics we cover in class. For most types of projects, students can work in pairs (see the exception below). A project may later develop into a conference publication. These projects should focus on one of the following:
- an extension of one or more of the works we covered in class, including experimental evaluation
- a novel approach which addresses one of the problems covered in class, properly evaluated
- a definition of a new problem, along with detailed argumentation of why this problem is important and challenging, an approach to solve this problem, and an evaluation of this approach
- an extensive analysis and experimental evaluation of one or more of the approaches covered in class (think of this as an extended combined version of a paper and an experiment presentation)
- an extensive literature review and analysis on one of the topics covered in class (this one can only be done by students working individually)
The mid-semester project status report will describe the students' progress on the project, and any problems encountered along the way. The status report should use the CVPR LaTeX template, but can be more informal than a conference paper. The status report should include the following sections: Introduction, Related Work, Approach, and Results. In Results, include your experimental setup (this can change later). If you have results but they do not yet look great, include them anyway for the purpose of this status report.
The project presentation will describe the students' approach and their experimental findings in a clear and engaging fashion. This will be a chance to get feedback from the class before final submission of your report. Presentations will be about 15-20 minutes long.
The project final report should resemble a conference paper, with a clear problem definition and argumentation of why this problem is important, an overview of related work, a detailed explanation of the approach, and a well-motivated experimental evaluation. The report should use the CVPR LaTeX template. If the project was done with a partner, each student should document what part of the project he or she did, and how duties and tasks were divided.
All project written items should be emailed to the instructor by the respective deadline below, with the subject line "CS3710 Project". The grade breakdown and due dates for the project are:
- Project proposal (5% of course grade) - due March 6, 5pm
- Project status report (5% of course grade) - due April 3, 11:59pm
- Project presentations (10% of course grade) - April 16, 21, 23
- Project final report (10% of course grade) - due April 24, 11:59pm
Note on Academic Dishonesty
The work you turn in must be your own work. Plagiarism will cause you to fail the class and receive a disciplinary penalty. See the info above regarding referring to resources in your presentations.
Note on Disabilities
If you have a disability for which you are or may be requesting an accommodation, you are encouraged to contact both your instructor and Disability Resources and Services, 216 William Pitt Union, (412) 648-7890/(412) 383-7355 (TTY), as early as possible in the term. DRS will verify your disability and determine reasonable accommodations for this course.
Note on Medical Conditions
If you have a medical condition which will prevent you from doing a certain assignment or coming to class, you must inform the instructor of this before the deadline. You must then submit documentation of your condition within a week of the assignment deadline.
|1/06||Introduction [slides]||topic preferences due January 7, 10pm|
|1/08||Describing images with features [slides]|
Overview of Adriana's research [slides]
* Secondary: Pages 178-188, 216-220, 254-255 from Local Invariant Feature Detectors: A Survey. T. Tuytelaars and K. Mikolajczyk. Foundations and Trends in Computer Graphics and Vision, 2008. [pdf] [relevant code]
Primary: Object Recognition from Local Scale-Invariant Features. D. Lowe. ICCV 1999. [pdf] [code]
|Papers: Yan [slides]|
Primary: Video Google: A Text Retrieval Approach to Object Matching in Videos. J. Sivic and A. Zisserman. ICCV 2003. [pdf]
Secondary: The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features. K. Grauman and T. Darrell. ICCV 2005. [pdf] [code]
|Papers: Brandon [slides]|
* Secondary: Histograms of Oriented Gradients for Human Detection. N. Dalal and B. Triggs. CVPR 2005. [pdf]
Primary: A Discriminatively Trained, Multiscale, Deformable Part Model. P. Felzenszwalb, D. McAllester, and D. Ramanan. CVPR 2008. [pdf] [code]
|Papers: Chris [slides]|
* Primary: Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. S. Lazebnik, C. Schmid, and J. Ponce. CVPR 2006. [pdf] [code] [data]
Secondary: Object Bank: A High-Level Image Representation for Scene Classification and Semantic Feature Sparsification. L-J. Li, H. Su, E. Xing, and L. Fei-Fei. NIPS 2010. [pdf] [code]
|Papers: Connie [slides]; Experiment: Bhavin [slides]|
* Primary: Describing Objects by Their Attributes. A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth. CVPR 2009. [pdf] [code and data]
Secondary: Relative Attributes. D. Parikh and K. Grauman. ICCV 2011. [pdf] [code and data]
|Papers: Phuong [slides]|
|2/03||Guest lecture I -- Abhinav Shrivastava, PhD student at CMU|
Homework: Paper review for Data-driven Visual Similarity for Cross-domain Image Matching, due 10pm on 2/02.
|2/05||Guest lecture II -- David Fouhey, PhD student at CMU
Homework: Paper review for Data-Driven 3D Primitives for Single Image Understanding, due 10pm on 2/04.
Primary: From Contours to Regions: An Empirical Evaluation. P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. CVPR 2009. [pdf] [code and data]
Secondary: Constrained Parametric Min-Cuts for Automatic Object Segmentation. J. Carreira and C. Sminchisescu. CVPR 2010. [pdf] [code]
|Papers: Yingjie [slides]; Experiment: Yan [slides]|
|2/12||Mining and retrieval||
Primary: Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval. O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. CVPR 2007. [pdf] [data]
Secondary: World-scale Mining of Objects and Events from Community Photo Collections. T. Quack, B. Leibe, and L. Van Gool. CIVR 2008. [pdf]
|Papers: Bhavin [slides]|
* Secondary: Articulated Pose Estimation using Flexible Mixtures of Parts. Y. Yang and D. Ramanan. CVPR 2011. [pdf] [code]
Primary: Real-Time Human Pose Recognition in Parts from a Single Depth Image. J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake. CVPR 2011. [pdf] [slides and video]
|Papers: Jesse [slides]|
Primary: Learning Realistic Human Actions from Movies. I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. CVPR 2008. [pdf] [code] [data]
Secondary: Action Recognition from a Distributed Representation of Pose and Appearance. S. Maji, L. Bourdev, and J. Malik. CVPR 2011. [pdf] [code]
|Papers: Nils [slides]|
Primary: Scene Semantics from Long-term Observation of People. V. Delaitre, D. Fouhey, I. Laptev, J. Sivic, A. Gupta, and A. Efros. ECCV 2012. [pdf] [code and data]
Secondary: Object-Graphs for Context-Aware Category Discovery. Y. J. Lee and K. Grauman. CVPR 2010. [pdf] [code and data]
|Papers: Zitao [slides]; Experiment: Phuong [slides]|
|2/26||Groups of objects||
Primary: Finding Things: Image Parsing with Regions and Per-Exemplar Detectors. J. Tighe and S. Lazebnik. CVPR 2013. [pdf] [code]
Secondary: Recognition Using Visual Phrases. M. Sadeghi and A. Farhadi. CVPR 2011. [pdf]
|Papers: Wei [slides]|
|3/03||Unsupervised visual discovery||
Secondary: Unsupervised Discovery of Mid-Level Discriminative Patches. S. Singh, A. Gupta, and A. Efros. ECCV 2012. [pdf]
Primary: Style-aware Mid-level Representation for Discovering Visual Connections in Space and Time. Y. J. Lee, A. Efros, and M. Hebert. ICCV 2013. [pdf] [code and data]
|Papers: Bhavin [slides]; Experiment: Zitao [slides]|
|3/05||Vision and language||
Secondary: Every Picture Tells a Story: Generating Sentences for Images. A. Farhadi, M. Hejrati, A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. ECCV 2010. [pdf] [data]
Primary: Baby Talk: Understanding and Generating Simple Image Descriptions. G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. Berg, and T. Berg. CVPR 2011. [pdf]
|Papers: Yingjie [slides]||project proposal due March 6, 5pm|
|3/17||Active learning and interactive recognition||
Secondary: What's It Going to Cost You?: Predicting Effort vs. Informativeness for Multi-Label Image Annotations. S. Vijayanarasimhan and K. Grauman. CVPR 2009. [pdf] [data]
Primary: Visual Recognition with Humans in the Loop. S. Branson, C. Wah, B. Babenko, F. Schroff, P. Welinder, P. Perona, and S. Belongie. ECCV 2010. [pdf] [data]
|Papers: Yan [slides]|
|3/19||Transfer learning and adaptation||
Primary: Cross-Domain Video Concept Detection using Adaptive SVMs. J. Yang, R. Yan, and A. Hauptmann. ACM Multimedia 2007. [pdf] [code]
Secondary: Tabula Rasa: Model Transfer for Object Category Detection. Y. Aytar and A. Zisserman. ICCV 2011. [pdf]
|Papers: Jesse [slides]|
|3/24||Visualizing and debugging vision systems||
Primary: HOGgles: Visualizing Object Detection Features. C. Vondrick, A. Khosla, T. Malisiewicz, and A. Torralba. ICCV 2013. [pdf] [code]
Secondary: Finding the Weakest Link in Person Detectors. D. Parikh and C. L. Zitnick. CVPR 2011. [pdf] [data]
|Papers: Connie [slides]; Experiment: Nils [slides]|
* Secondary: Diagnosing Error in Object Detectors. D. Hoiem, Y. Chodpathumwan, and Q. Dai. ECCV 2012. [pdf] [code and data]
Primary: What is an Object? B. Alexe, T. Deselaers, and V. Ferrari. CVPR 2010. [pdf] [code]
|Papers: Brandon [slides]; Experiment: Chris [slides]|
Primary: Annotator Rationales for Visual Recognition. J. Donahue and K. Grauman. ICCV 2011. [pdf] [data]
Secondary: Assessing the Quality of Actions. H. Pirsiavash, C. Vondrick, and A. Torralba. ECCV 2014. [pdf] [code and data]
|Papers: Xinyue [slides]|
Primary: Peekaboom: A Game for Locating Objects in Images. L. von Ahn, R. Liu, and M. Blum. CHI 2006. [pdf]
Secondary: Crowdsourcing Annotations for Visual Object Detection. H. Su, J. Deng, and L. Fei-Fei. HCOMP 2012. [pdf]
|Papers: Nils [slides]||project status report due April 3, 11:59pm|
|4/07||Saliency and importance||
Primary: Learning to Predict Where Humans Look. T. Judd, K. Ehinger, F. Durand, and A. Torralba. ICCV 2009. [pdf] [code and data]
Secondary: Understanding and Predicting Importance in Images. A. Berg et al. CVPR 2012. [pdf]
|Papers: Chris [slides]|
Primary: Learning Everything about Anything: Webly-Supervised Visual Concept Learning. S. Divvala, A. Farhadi, and C. Guestrin. CVPR 2014. [pdf] [project]
Secondary: Scene Completion using Millions of Photographs. J. Hays and A. Efros. SIGGRAPH 2007. [pdf] [code and data]
|Papers: Phuong [slides]; Experiment: Jesse [slides]|
|4/14||Video: ego-centric and summarization||
Primary: Recognizing Activities of Daily Living in First-Person Camera Views. H. Pirsiavash and D. Ramanan. CVPR 2012. [pdf] [code and data] [slides]
Secondary: Nonchronological Video Synopsis and Indexing. Y. Pritch, A. Rav-Acha, and S. Peleg. PAMI 2008. [pdf]
|Papers: Connie [slides]; Experiment: Yingjie [slides]|
|4/16||Project presentations: (1) Bhavin; (2) Yan|
|4/21||Project presentations: (1) Brandon; (2) Chris|
|4/23||Project presentations: (1) Nils & Phuong; (2) Yingjie & Jesse; (3) Connie||project final report due April 24, 11:59pm|
Resources
This course was inspired by the following courses:
- Visual Recognition by Kristen Grauman, UT Austin, Fall 2012
- Advanced Topics in Computer Vision by Devi Parikh, Virginia Tech, Spring 2014
- Visual Recognition by Yong Jae Lee, UC Davis, Fall 2014
- Recognizing and Learning Object Categories by Li Fei-Fei, Rob Fergus, Antonio Torralba
- Object Recognition and Scene Understanding by Antonio Torralba
- Learning-Based Methods in Vision by Alyosha Efros
- Recognizing People, Objects, and Actions by Tamara Berg
- Attributes by Devi Parikh, Ali Farhadi, Kristen Grauman, Tamara Berg, and Abhinav Gupta
- Statistical and Structural Recognition of Human Actions by Ivan Laptev and Greg Mori
- Computer Vision: Algorithms and Applications by Richard Szeliski (available for free on author's page)
- Visual Object Recognition by Kristen Grauman and Bastian Leibe
- LIBSVM (by Chih-Chung Chang and Chih-Jen Lin)
- SVM Light (by Thorsten Joachims)
- VLFeat (feature extraction, tutorials and more, by Andrea Vedaldi)
- GIST feature extraction (by Aude Oliva and Antonio Torralba)
- Caffe (deep learning code by Yangqing Jia et al.)
- Microsoft COCO (Common Objects in Context)
- PASCAL VOC
- Caltech 256
- SUN Database
- Animals with Attributes
- Caltech-UCSD Birds 200
- INRIA Movie Actions
- Matlab tutorial
- Linear algebra review by Fei-Fei Li
- Brief machine learning intro by Aditya Khosla and Joseph Lim
- Resources list (including code and data, tutorials, and other related courses) compiled by Devi Parikh
- Recognition datasets list compiled by Kristen Grauman
- Human activity datasets list compiled by Chao-Yeh Chen