Comprehensive Exam
Topic: Deep Learning and Applications to NLP
| Category | Paper | Link |
| --- | --- | --- |
| Survey papers | Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798-1828. | [PDF] |
| | Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1), 1-127. | [PDF] |
| | Erhan, D., Bengio, Y., Courville, A., Manzagol, P.-A., Vincent, P., & Bengio, S. (2010). Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 11, 625-660. | [PDF] |
| Deep belief networks | Salakhutdinov, R. (2009). Learning deep generative models (Doctoral dissertation, University of Toronto). | [PDF] (thesis) |
| | Salakhutdinov, R., & Hinton, G. E. (2009). Deep Boltzmann machines. In International Conference on Artificial Intelligence and Statistics (pp. 448-455). | [PDF] |
| Large-scale | Le, Q. V., Ranzato, M. A., Monga, R., Devin, M., Chen, K., Corrado, G. S., ... & Ng, A. Y. (2011). Building high-level features using large scale unsupervised learning. arXiv preprint arXiv:1112.6209. | [PDF] |
| Breakthroughs | Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527-1554. | [PDF] |
| | Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2007). Greedy layer-wise training of deep networks. In J. Platt et al. (Eds.), Advances in Neural Information Processing Systems 19 (NIPS 2006) (pp. 153-160). MIT Press. | [PDF] |
| | Ranzato, M., Poultney, C., Chopra, S., & LeCun, Y. (2007). Efficient learning of sparse representations with an energy-based model. In J. Platt et al. (Eds.), Advances in Neural Information Processing Systems (NIPS 2006). MIT Press. | [PDF] |
| | Lee, H., Battle, A., Raina, R., & Ng, A. (2006). Efficient sparse coding algorithms. In Advances in Neural Information Processing Systems (pp. 801-808). | [PDF] |
| | Vincent, P., Larochelle, H., Bengio, Y., & Manzagol, P.-A. (2008). Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning (pp. 1096-1103). ACM. | [PDF] |
| Deep learning in NLP: word embeddings | Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research (JMLR), 12, 2493-2537. | [PDF] |
| | Mnih, A., & Hinton, G. (2007). Three new graphical models for statistical language modelling. In Proceedings of the 24th International Conference on Machine Learning (pp. 641-648). ACM. | [PDF] |
| | Turian, J., Ratinov, L., & Bengio, Y. (2010). Word representations: A simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (pp. 384-394). Association for Computational Linguistics. | [PDF] |
| | Mikolov, T., Karafiát, M., Burget, L., Černocký, J., & Khudanpur, S. (2010). Recurrent neural network based language model. In INTERSPEECH (pp. 1045-1048). | [PDF] |
| | Mikolov, T., Yih, W. T., & Zweig, G. (2013). Linguistic regularities in continuous space word representations. In Proceedings of NAACL-HLT (pp. 746-751). | [PDF] |
| | Arisoy, E., Sainath, T. N., Kingsbury, B., & Ramabhadran, B. (2012). Deep neural network language models. In Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT (pp. 20-28). Association for Computational Linguistics. | [PDF] |
| | Bordes, A., Glorot, X., Weston, J., & Bengio, Y. (2012). Joint learning of words and meaning representations for open-text semantic parsing. In AISTATS 2012. | [PDF] |
| Sentiment classification | Glorot, X., Bordes, A., & Bengio, Y. (2011). Domain adaptation for large-scale sentiment classification: A deep learning approach. In Proceedings of the 28th International Conference on Machine Learning (ICML-11). | [PDF] |
| | Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011). Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (pp. 142-150). Association for Computational Linguistics. | [PDF] |
| | Socher, R., Pennington, J., Huang, E. H., Ng, A. Y., & Manning, C. D. (2011). Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 151-161). Association for Computational Linguistics. | [PDF] |
| Spoken dialogue system | Henderson, M., Thomson, B., & Young, S. (2013). Deep neural network approach for the dialog state tracking challenge. In Proceedings of the SIGDIAL 2013 Conference. | [PDF] |
| | Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A. R., Jaitly, N., ... & Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6), 82-97. | [PDF] |
| Paraphrase detection | Socher, R., Huang, E. H., Pennington, J., Manning, C. D., & Ng, A. (2011). Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In Advances in Neural Information Processing Systems (pp. 801-809). | [PDF] |
| Parsing | Socher, R., Bauer, J., Manning, C. D., & Ng, A. Y. (2013). Parsing with compositional vector grammars. In Proceedings of the ACL Conference. | [PDF] |