Overview
Course description: In this class, students will learn about modern computer vision. The first part of the course will cover fundamental concepts such as filtering, extracting features and describing images, grouping features, matching features across multiple views, and classification with support vector machines and neural networks. In the second part, we will cover techniques and approaches to classic tasks and topics such as object recognition and vision and language, as well as self-supervised and embodied learning. The format will include lectures, homework assignments, and a course project.
Prerequisites: CS1501 and MATH 0280 (or equivalent). The expectation is that you can program and analyze the performance of programs. Some experience with linear algebra (matrix and vector operations), basic calculus, and probability and statistics is strongly recommended.
Piazza: Sign up for it here. Please use Piazza rather than email so everyone can benefit from the discussion; you can post in such a way that only the instructor sees your name. Please try to answer each other's questions whenever possible. The best time to ask the instructor or TA questions is during office hours.
Programming languages: For homework assignments, you will use Python. For the course project, you can use any language of your choice.
Textbooks:
- Readings from:
- Computer Vision: Algorithms and Applications by Richard Szeliski (available for free on author's page)
- Visual Object Recognition by Kristen Grauman and Bastian Leibe (accessible for free from campus)
- For reference:
- Computer Vision: A Modern Approach by David Forsyth and Jean Ponce
- Computer Vision: Models, Learning, and Inference by Simon Prince (available for free on author's page)
- Pattern Recognition and Machine Learning by Christopher Bishop (free PDF)
- Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville (available for free online)
Policies
Grading
Grading will be based on the following components:
- Homework assignments (3 assignments x 15% each = 45%)
- Project (15% proposal + 15% mid-semester report + 15% presentation = 45%)
- Participation (10%)
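As a rough illustration of how the components above combine, here is a minimal Python sketch. It assumes every component, including participation, has already been converted to a percentage between 0 and 100 (the conversion from the 1-5 participation rubric is not specified in this syllabus). The letter ranges follow the rough cutoffs stated under Assignment Submission Mechanics; the function names are hypothetical, and Canvas remains the authoritative source for your running average.

```python
# Weights in percentage points, taken from the grading components listed above.
WEIGHTS = {
    "hw1": 15, "hw2": 15, "hw3": 15,
    "proposal": 15, "report": 15, "presentation": 15,
    "participation": 10,
}


def overall_score(scores):
    """Weighted average of per-component percentages (each assumed in 0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def rough_letter_range(overall):
    """Map an overall percentage to the rough letter range from the syllabus."""
    if overall > 90:
        return "A range"
    if overall > 80:
        return "B range"
    if overall > 70:
        return "C range"
    return "not specified in the syllabus"


example = {
    "hw1": 95, "hw2": 85, "hw3": 90,
    "proposal": 80, "report": 90, "presentation": 100,
    "participation": 80,
}
total = overall_score(example)
print(total, rough_letter_range(total))  # 89.0 B range
```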
Assignment Submission Mechanics
Homework, project reports and presentation slides are due at 11:59pm on the due date. You will submit your homework using Canvas, under "Assignments" and the corresponding homework ID. You should submit a single zip file with source/results/report/slides, as requested. Name the file YourFirstName_YourLastName.zip. Please comment your code! Grades will be posted on Canvas.
Note that Canvas will also contain an automatically computed running average column that you can use to gauge how you're doing in the class based on grades that are already available. Generally, overall scores over 90% map to some type of A, over 80% to B, and over 70% to C.
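One possible way to build the single zip file with the required name is sketched below in Python. The function name `make_submission` and the idea of keeping all deliverables in one folder are assumptions made for illustration, not course requirements; always check each assignment for what the archive must contain.

```python
import zipfile
from pathlib import Path


def make_submission(first_name, last_name, source_dir, out_dir="."):
    """Bundle every file under source_dir into FirstName_LastName.zip."""
    source_dir = Path(source_dir)
    archive = Path(out_dir) / f"{first_name}_{last_name}.zip"
    # Collect files before creating the archive, and skip an old archive
    # that may already sit inside the folder being zipped.
    files = sorted(
        p for p in source_dir.rglob("*")
        if p.is_file() and p.resolve() != archive.resolve()
    )
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Store paths relative to source_dir so the zip has no absolute paths.
            zf.write(path, arcname=path.relative_to(source_dir).as_posix())
    return archive

# Example call (hypothetical folder name):
# make_submission("YourFirstName", "YourLastName", "hw1")
```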
Participation
Students are expected to regularly attend the class lectures, and should actively engage in in-class discussions. Attendance will not be taken, but keep in mind that if you don't attend, you cannot participate. You can actively participate by, for example, responding to the instructor's or others' questions, asking questions or making meaningful remarks and comments about the lecture, answering others' questions on Piazza, or bringing in relevant articles you saw in the news. The grading rubric will be as follows: 1 = you attended infrequently, 2 = you attended frequently but did not speak in class, 3 = you attended frequently and spoke a few times, 5 = you attended and participated frequently, 4 = in between 3 and 5.
Hint: If you do the readings, you will be able to participate more easily, and in a more meaningful way.
Late Policy
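As a rough illustration of the accounting this section describes (a shared 72-hour free budget, then 25% of the possible credit per late day), here is a minimal Python sketch. It assumes the free budget is applied automatically in submission order and that the penalty is capped at 100%; neither detail is stated in the policy, so ask the instructor if your situation depends on them. The function name is hypothetical.

```python
import math

FREE_LATE_MINUTES = 72 * 60    # shared budget across all programming assignments
PENALTY_PER_LATE_DAY = 0.25    # fraction of total possible credit lost per late day
MINUTES_PER_DAY = 24 * 60


def late_penalties(minutes_late_per_assignment):
    """Return the penalty fraction for each assignment, in submission order."""
    remaining = FREE_LATE_MINUTES
    penalties = []
    for minutes_late in minutes_late_per_assignment:
        # Assumption: free minutes are used up before any penalty applies.
        covered = min(minutes_late, remaining)
        remaining -= covered
        excess = minutes_late - covered
        # A late day is anything from 1 minute to 24 hours, so round up.
        late_days = math.ceil(excess / MINUTES_PER_DAY)
        penalties.append(min(1.0, late_days * PENALTY_PER_LATE_DAY))
    return penalties


# 12 hours late, then 60 hours late, then 1 minute late:
print(late_penalties([12 * 60, 60 * 60, 1]))  # [0.0, 0.0, 0.25]
```

In this example the first two submissions use up the full 72-hour budget, so the third one, although only one minute late, counts as one late day.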
On your programming assignments only, you get 3 "free" late days counted in minutes, i.e., you can submit a total of 72 hours late. For example, you can submit one homework 12 hours late, and another 60 hours late. The 72-hour "budget" is total for all programming assignments, NOT per assignment. Once you've used up your free late days, you will incur a penalty of 25% from the total assignment credit possible for each late day. A late day is anything from 1 minute to 24 hours. Note this policy does not apply to components of the project.
Collaboration Policy and Academic Honesty
You will do your homework assignments individually. The work you turn in must be your own work. You are allowed to discuss the assignments with your classmates, but do not look at code they might have written for the assignments, or at their written answers. You are also not allowed to search for code on the internet, to use solutions posted online unless you are explicitly allowed to look at those, or to use Python's implementation if you are asked to write your own code. When in doubt about what you can or cannot use, ask the instructor! Plagiarism will cause you to fail the class and receive a disciplinary penalty. Please consult the University Guidelines on Academic Integrity. All project components involve group work.
Note on Disabilities
If you have a disability for which you are or may be requesting an accommodation, you are encouraged to contact both your instructor and Disability Resources and Services (DRS), 140 William Pitt Union, (412) 648-7890, drsrecep@pitt.edu, (412) 228-5347 for P3 ASL users, as early as possible in the term. DRS will verify your disability and determine reasonable accommodations for this course.
Note on Medical Conditions
If you have a medical condition which will prevent you from doing a certain assignment, you must inform the instructor of this before the deadline. You must then submit documentation of your condition within a week of the assignment deadline.
Statement on Classroom Recording
To ensure the free and open discussion of ideas, students may not record classroom lectures, discussion and/or activities without the advance written permission of the instructor, and any such recording properly approved in advance can be used solely for the student's own private use.
Project
Please enter your team information and topic here when you submit the project proposal.
The project is expected to be a new method for an existing problem or an application of techniques we studied in class (or another method) to a new problem that we have not discussed in class. Below are some tips:
- The project should include some amount of novelty. For example, you cannot just re-implement an existing paper or project. You are allowed to use existing code for known methods, but your project is expected to be a significant amount of work and not just a straight-up run of some package.
- You must show that your method is in some sense better (quantitatively) than at least some relatively recent existing methods. For example, you can show that your method achieves superior accuracy in some prediction task compared to prior methods, or that it achieves comparable accuracy but is faster. This outcome is not guaranteed to come out the way you intended during the limited timespan of a course project, so whether or not you outperform the state of the art will only be a small component of your grade. Further, if you propose a sufficiently interesting method, rather than an extremely simple method, it will be less of a problem if your method does not outperform other existing approaches to the problem.
- You are encouraged to use any external expertise you might have (e.g. biology, physics, etc.) so that your project makes the best use of areas you know well, and is as interesting as possible.
- Students are required to work in groups of three for their final project.
- A sample list of topics you can (but don't have to) choose from is provided on Canvas. The instructor has suggested these based on her familiarity with the literature, and these contain a sufficient amount of novelty, with a risk level appropriate for this course. At the very least you should read through this list carefully to get a sense of what an appropriate topic might be.
- Milestones for the project include: (1) project proposal and literature review, (2) mid-semester report, and (3) final presentation.
- You will submit a project proposal that describes what you propose to do; why what you propose to do is interesting, important, and challenging; what related prior work exists (i.e. a detailed literature review); what data and resources you are going to use; what your high-level idea of the method is; and how you plan to evaluate your method, including against what baselines. A grading rubric is provided below. Good sources for learning about what work has been done in your domain of interest are search engines, Google Scholar, and arxiv.org.
- You will also submit a mid-semester project report. The goal is to have designed your method conceptually, with as much detail as possible, and to have made substantial progress in terms of implementing this proposed method (i.e. an initial implementation is complete, you are now iterating on the design). Include a mature and detailed formulation of your method, and ideally some experimental results for your method. Also comment on the challenges you are facing or have faced.
- The final presentation will focus on your experimental validation and findings, including comparisons to relevant baselines. In your presentation, make sure to describe your motivation for the work, as well as relevant prior literature, briefly, and of course your method.
- Your proposal, report and slides (for the presentation) are due on Canvas on the indicated due date (see the schedule), at 11:59pm. Note you cannot use free-late-days for project components.
- Your grades will be based on four rubrics, available below. For the presentation, your grade will be an average of (1) your classmates' scores, and (2) the instructor's scores.
- Combining your final project for this class and another class is generally permitted, but the project proposal and final report should clearly outline what part of the work was done to get credit in this class, and the instructor should approve the proposed breakdown of work between this and another class.
- The presentation should take 12-15 minutes, with 1-2 minutes for questions. Make sure to test your laptops before class as we'll have an extremely tight schedule! The time budgeted for laptop switching is on the order of seconds.
- You are allowed to change your final presentation slides (e.g. to include new results) until the slide submission deadline, but please mark new content in yellow.
Project proposal rubric:
- What do you propose to do? (1-5 points)
- Why is what you are proposing interesting and important? (1-5 points)
- Why is it challenging? (1-5 points)
- What is the prior work in this space? Describe in detail (2-3 pages, in your own words) (1-10 points).
- How is your work novel, in the context of this prior work? (1-5)
- What is your high-level idea of how your method will work? (1-5)
- What data do you plan to use? (1-5)
- How will you evaluate the method, i.e. what metrics are you going to use, and what baselines are you going to compare to? (1-5)
- Give a (1) conservative and (2) an ambitious schedule of milestones for your project. (1-5)
Mid-semester report rubric:
- What is your proposed approach? Describe in detail (3-4 pages). (1-10)
- Why should your approach work well for this task? (1-5)
- In what ways is your proposed approach ambitious? (1-5)
- What progress have you made thus far in implementing your method? Describe in detail (1-2 pages). (1-10)
- What challenges have you encountered along the way? (1-5)
- What are your next steps? Describe in detail (1-2 pages). (1-10)
- What metrics are you going to use to evaluate your method? What datasets are you going to use? (These may have changed since the proposal stage.) What are your experimental results so far, if any? (1-5)
Presentation rubric:
- How well did the authors (presenters) explain what problem they are trying to solve? (1-5 points)
- How well did they explain why this problem is important and/or challenging? (1-5)
- How clearly was prior work described? How well did the authors explain how their proposed work is different than prior work? (1-5)
- How clearly did the authors describe their proposed approach? (1-10)
- How novel and ambitious is the proposed approach? (1-5)
- How well did the authors describe their experimental validation? How informative were the figures used? (1-5)
- Were all/most relevant experimental settings (e.g. datasets, tasks) and baselines (competitor methods) included in the experimental validation? (1-5)
- To what extent is the performance of the proposed method satisfactory? (1-5)
- How informative were the conclusions the authors drew about their method’s performance relative to other methods? How sensible was the discussion of limitations? How interesting was the discussion of future work? (1-5)
Finding a project topic:
- Suggested list on Canvas.
- Look at the datasets and tasks below.
- Look at the topics in the programs of some of the recent computer vision conferences: CVPR 2020 (with papers downloadable here) and ICCV 2019 (with papers here).
- Check out project resources and topics in a related class here.
- Read some paper abstracts on this page.
Schedule
Date | Chapter | Topic | Readings | Lecture slides | Due
1/19 | Basics | Introduction | Szeliski Sec. 1.1-1.2 | pptx pdf |
1/21 | | | | |
1/26 | | Filters | Szeliski Sec. 3.2, 10.5, 4.1.1 | pptx pdf |
1/28 | | | | |
2/2 | | Features | Szeliski Sec. 4.1; Grauman/Leibe Sec. 3, 4.2.1; feature survey Sec. 1, 3.2, 7; Lowe IJCV 2004 | pptx pdf |
2/4 | | | | |
2/9 | | Grouping and transformations | Szeliski Sec. 2.1, 3.6.1, 4.2, 4.3.2, 5.3-4, 6.1.4, 7.2, 11.1.1; Grauman/Leibe Sec. 5.1, 5.2 | pptx pdf |
2/11 | | | | |
2/16 | | | | | HW1 due
2/18 | | Classification (SVMs, CNNs) | Bishop PRML Sec. 1.1, Bishop PRML Sec. 7.1; Karpathy Module 1 and Module 2; Krizhevsky NIPS 2012, Zeiler ECCV 2014; PyTorch tutorial (Canvas) | pptx pdf |
2/25 | | | | | proposal due
3/2 | | | | |
3/4 | | | | |
3/9 | | | | |
3/11 | | | | |
3/16 | Classics | Object detection (supervised, weakly-supervised, across domains) | Szeliski Sec. 14.1, 14.4; Grauman/Leibe Sec. 8, 9, 10.2.1.1, 10.3.3, 11.1,2,5; Felzenszwalb PAMI 2010, Girshick CVPR 2014, Ren NIPS 2015, Redmon CVPR 2016, Zhou CVPR 2016, Harwath ECCV 2018, Ye ICCV 2019, Ren CVPR 2020, Peng ICCV 2019, Hoffman ICML 2018 | pptx pdf |
3/18 | | | | |
3/23 | | | | |
3/25 | | | | | HW2 due
3/30 | | Vision and language | blog1, blog2, Karpathy CVPR 2015, Venugopalan CVPR 2017, Wu CVPR 2016, Narasimhan ECCV 2018, Mao ICLR 2019, Miech CVPR 2020 | pptx pdf |
4/1 | | | | | report due
4/6 | | | | |
4/8 | Frontiers | Self-supervised and embodied learning | Doersch ICCV 2015, Jayaraman ICCV 2015, Pinto ECCV 2016, Mnih 2013, Caicedo ICCV 2015, Zhu ICRA 2017, Das CVPR 2018 | pptx pdf |
4/13 | | | | |
4/15 | | Project presentations | | |
4/20 | | | | | participation notes
4/22 | | | | | HW3 due, slides due
4/27 | | Projects discussion, postmortem | | |
Resources
This course was inspired by the following courses:
- Computer Vision by Kristen Grauman, UT Austin, Spring 2011
- Computer Vision by Derek Hoiem, UIUC, Spring 2015
- Convolutional Neural Networks for Visual Recognition by Fei-Fei Li, Andrej Karpathy, Justin Johnson, and Serena Yeung, Stanford University, Spring 2018
Datasets and tasks:
- Microsoft COCO (Common Objects in Context) (object recognition, segmentation, image description)
- ImageNet (object recognition)
- SUN Database (scenes)
- Caltech-UCSD Birds 200 (fine-grained object recognition)
- MSRC Annotations (active learning)
- Animals with Attributes (attribute-based recognition)
- a-Pascal + a-Yahoo (attribute-based recognition)
- Shoes (attribute-based search)
- INRIA Movie Actions (action recognition)
- ADL (ego-centric action recognition)
- Action Quality (evaluating action quality)
- CarDb Historical Cars (style classification of cars)
- Recognizing Image Style (photographic style classification)
- Judd gaze (visual saliency prediction)
- Visual Persuasion (predicting subtle messages in images)
- Advertisements: Images and Videos (understanding what the ad prompts the viewer to do and why)
- VQA (visual question-answering)
- Recognition datasets list compiled by Kristen Grauman
- Human activity datasets list compiled by Chao-Yeh Chen
Code and libraries:
- LIBSVM (by Chih-Chung Chang and Chih-Jen Lin)
- SVM Light (by Thorsten Joachims)
- VLFeat (feature extraction, tutorials and more, by Andrea Vedaldi)
- TensorFlow (deep learning framework by Google)
- Caffe (deep learning framework by Yangqing Jia et al.)
- PyTorch (another popular deep learning framework)
- Keras (deep learning library)