ECE 403/503
Optimization for Machine Learning
Department of Electrical and Computer Engineering
University of Victoria
Online link: http://www.ece.uvic.ca/~wslu/403/index.html
Office: EOW 427
E-mail: [email protected]
May 2019
Instructor
Lei Zhao
Phone: 778-967-2019
E-mail: [email protected]
URL: http://studentweb.uvic.ca/~leizhao

Office Hours
Days: Wednesdays
Time: 15:00 – 17:00
Location: EOW 419
Lectures
Days: Tuesday, Wednesday, Friday
Time: 1:30 – 2:20 pm
Location: Engineering Computer Science Building (ECS) A108
Labs for ECE 403 & 503
B01: Thu, 2:30 – 5:20 pm, ELW 326; Sep 26, Oct 10, Oct 31, Nov 21
Required Text
Course Notes
Assessment
Assignments: 10%
Labs (for ECE 403): 15%
Labs & Project (for ECE 503): 15%
Mid-term exam: 20% (Wednesday, October 30, 13:30 – 14:20)
Final exam: 55%
Other Things to Note
• Course web site: http://studentweb.uvic.ca/~leizhao/403.html.
• About assignments:
- There will be 7 assignments.
- Only paper copies are accepted.
- Please clearly print your name and V-number on the first page of your work.
- Submit your assignment by 4:00 pm on the due date. There are two submission options:
◊ Hand in your work to the instructor in class; or
◊ Drop it into the drop box.
References
Requirements
Software
Chapter 1 Optimization Problems in Machine Learning
1.1 Optimization Problems
1.2 Types of Data Sets
1.3 Feature Extraction
1.4 Examples of Unsupervised Learning
1.5 Examples of Supervised Learning
Problems
Chapter 2 Basic Concepts and Optimization Algorithms
2.1 Gradient and Hessian
2.2 Optimality Conditions
2.3 Gradient Descent Algorithm
2.4 Accelerated Gradient Algorithms
2.5 Stochastic Gradient Algorithms
2.6 Newton Algorithm
2.7 Quasi-Newton Algorithms
Problems
Chapter 3 Applications in Machine Learning
3.1 Classification by Minimizing Hinge Loss
3.2 Softmax Regression
3.3 Kernel Methods
Problems
Appendix A Basic Principles and Techniques of Linear Algebra
Appendix B Elements of Probability Theory
References
[1] L. Bottou, F. E. Curtis, and J. Nocedal, “Optimization methods for machine learning,” SIAM Review, vol. 60, no. 2, pp. 223–311, 2018.
[2] Y. LeCun, C. Cortes, and C. J. C. Burges, “The MNIST database,” see the web link:
http://yann.lecun.com/exdb/mnist/
[3] UCI Machine Learning, http://archive.ics.uci.edu/ml, University of California Irvine, School of
Information and Computer Science.
[4] C. M. Bishop, Neural Networks for Pattern Recognition, Clarendon Press, Oxford, 1995.
[5] G. H. Golub and C. F. Van Loan, Matrix Computations, 2nd ed., Johns Hopkins University Press, 1989.
[6] D. D. Lee and H. S. Seung, “Learning the parts of objects by non-negative matrix factorization,” Nature, vol. 401, pp. 788–791, Oct. 1999.
[7] N. Gillis, “The why and how of nonnegative matrix factorization,” in Regularization, Optimization, Kernels, and Support Vector Machines, J. A. K. Suykens, M. Signoretto, and A. Argyriou, Eds., pp. 257–291, CRC Press, 2014.
[8] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2nd ed., Springer, 2009.
[9] A. Antoniou and W.-S. Lu, Practical Optimization – Algorithms and Engineering Applications, Springer, 2007.
[10] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
[11] Y. Nesterov, “A method for solving a convex programming problem with rate of convergence $O(1/k^2)$,” Soviet Math. Doklady, vol. 269, no. 3, pp. 543–547, 1983 (in Russian).
[12] Y. Nesterov, Introductory Lectures on Convex Optimization — A Basic Course, Norwell, MA: Kluwer Academic Publishers, 2004.
[13] M. Grant and S. Boyd, CVX: MATLAB Software for Disciplined Convex Programming, version 2.0, Sept. 2013, available online: http://cvxr.com/cvx
[14] C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
[15] W. N. Street, W. H. Wolberg, and O. L. Mangasarian, “Nuclear feature extraction for breast tumor diagnosis,” in IS&T/SPIE Int. Symp. Electronic Imaging: Science and Technology, vol. 1905, pp. 861–870, San Jose, CA, 1993.
[16] O. L. Mangasarian, W. N. Street, and W. H. Wolberg, “Breast cancer diagnosis and prognosis via linear programming,” AAAI Tech. Report SS-94-01, 1994.
• Requirements
(a) The mathematical background required for this course includes
◊ basic linear algebra (MATH 101)
◊ calculus (MATH 100, 101, and 200)
◊ basic probability theory (STAT 254)
(b) Software
◊ Familiarity with MATLAB is essential for understanding the course material, doing homework, performing laboratory experiments, and conducting projects. Occasionally we will use CVX, a MATLAB package for disciplined convex programming. Assuming MATLAB is properly installed on your PC, you can download CVX and install and use it as a MATLAB toolbox.
◊ To download CVX, go to http://cvxr.com/cvx/download/ , read the instructions carefully, choose an appropriate download option, and follow the installation steps exactly; a minimal setup sketch is given after this list. An informative user guide for version 2.1 of the software is available at http://web.cvxr.com/cvx/doc/
◊ Both MATLAB and CVX are available on the PCs in the laboratory room. It is a good idea to install these packages on your personal computer as well, for the convenience of using them elsewhere.
Chapter 1 Optimization Problems in Machine Learning
1.1 Optimization Problems
The world is increasingly quantified, and we are told that we now live in a “big data” era in which search engines, recommendation platforms, and speech and image recognition software have become an indispensable part of modern society. Built on statistical techniques, and capitalizing on increasingly powerful computing resources and the availability of datasets of immense size, machine learning (ML) has risen dramatically, playing a key role in many recent technological advances and delivering superior performance as well as societal, economic, and scientific impacts [1].
From a technical perspective, ML focuses on the development of computer programs that can access data and use it to learn for themselves. Learning begins with observations or data, such as examples, direct experience, or instruction; from these, the program looks for patterns and learns to make better decisions in the future, often on previously unseen data. In short, ML allows computers to learn automatically, without human intervention or assistance, and to adjust their actions accordingly.
One of the pillars of ML is mathematical optimization, which, in this context, involves the numerical computation of parameters for a system designed to make decisions based on available data. During the optimization process, these parameters are chosen to be optimal with respect to a given learning problem [1]. ML applications involve a large variety of data sizes, types, and structures, as well as a range of criteria for performance evaluation. As a result, the problem sizes, formulations, and solution techniques can vary widely, from standard least-squares to highly nonlinear and nonconvex optimization methods. Nevertheless, practically all optimization problems in ML applications fit into the general formulation
\begin{align}
&\text{minimize} && f(\mathbf{x}) \tag{1.1a}\\
&\text{subject to:} && a_i(\mathbf{x}) = 0 \quad \text{for } i = 1, 2, \ldots, p \tag{1.1b}\\
& && c_j(\mathbf{x}) \ge 0 \quad \text{for } j = 1, 2, \ldots, q \tag{1.1c}
\end{align}
where $f(\mathbf{x})$ is the objective (or loss) function, $a_i(\mathbf{x}) = 0$ for $i = 1, 2, \ldots, p$ are the equality constraints, and $c_j(\mathbf{x}) \ge 0$ for $j = 1, 2, \ldots, q$ are the inequality constraints. According to (1.1), optimization is an algorithmic procedure for identifying a best set of parameters, collected in the vector $\mathbf{x}$, that reduces the loss (measured by the function $f(\mathbf{x})$) to a minimum while meeting all the requirements that $\mathbf{x}$ is required to satisfy. This course is about basic concepts and optimization methods that are found particularly useful for ML applications. Our primary goal is to address the following two questions:
1. How do optimization problems arise in ML applications?
2. What are the most successful optimization methods for ML problems of various scales?
The first question is addressed in the rest of this chapter, while the second is taken up in the chapters that follow.
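To make formulation (1.1) concrete, the following MATLAB/CVX sketch solves one small instance of it: a least-squares loss minimized subject to one equality constraint and a set of inequality constraints. The data A and b below are made up purely for illustration:

    % A small instance of formulation (1.1), solved with CVX.
    % The data are arbitrary and serve only to illustrate the setup.
    A = [1 2; 3 4; 5 6];                 % data matrix
    b = [7; 8; 9];                       % observation vector
    cvx_begin
        variable x(2)                    % the parameter vector x
        minimize( sum_square(A*x - b) )  % f(x): least-squares loss, as in (1.1a)
        subject to
            sum(x) - 1 == 0;             % a_1(x) = 0: equality constraint, as in (1.1b)
            x >= 0;                      % c_j(x) >= 0: inequality constraints, as in (1.1c)
    cvx_end
    disp(x)                              % the computed optimal parameters

Because this loss is convex and the constraints are affine, the instance is a convex problem and CVX returns its global minimizer. The general formulation (1.1) also covers nonconvex problems, which CVX cannot handle and which motivate the iterative algorithms of Chapter 2.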