## Recent developments in optimization methods and machine learning applications

Humboldt University, Berlin, winter semester 2019/2020.

The course concentrates mostly on the theoretical aspects of first-order methods in optimization, e.g. gradient descent, accelerated gradient descent, mirror descent, and AdaGrad. These methods are experiencing a renaissance, motivated by today's big-data world, which leads to large-scale problems that do not require very precise solutions. One of the main assumptions in the course will be convexity of the objective function, and some background from convex analysis will be given. Another cornerstone is randomness of different kinds. In one case the randomness is present in the problem itself, e.g. in stochastic optimization, for which stochastic gradient descent is usually applied. In the other case, randomness is added to a deterministic problem by introducing randomization; here one can also apply stochastic gradient descent, as well as the more sophisticated randomized coordinate descent or stochastic variance reduction methods. Non-convex optimization will also be discussed, as many optimization problems in machine learning are non-convex.
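As a concrete illustration of two of the methods named above, the following sketch compares plain gradient descent with stochastic gradient descent on a synthetic least-squares problem. This is a minimal example under assumed data (a random matrix `A` and a noiseless right-hand side `b`), not code from the course materials; the step-size choices follow the standard theory (step at most 1/L for gradient descent, where L is the Lipschitz constant of the gradient).

```python
import numpy as np

# Synthetic smooth convex problem (hypothetical data, for illustration only):
# minimize f(x) = 0.5 * ||A x - b||^2 over x in R^d.
rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true  # noiseless, so x_true is a minimizer

def grad(x):
    # Full gradient of f at x: A^T (A x - b)
    return A.T @ (A @ x - b)

def gradient_descent(x0, step, iters):
    # Plain gradient descent with a constant step size; the standard theory
    # requires step <= 1/L, with L the largest eigenvalue of A^T A.
    x = x0.copy()
    for _ in range(iters):
        x = x - step * grad(x)
    return x

def sgd(x0, step, iters):
    # Stochastic gradient descent: each step uses the gradient of one
    # randomly sampled summand f_i(x) = 0.5 * n * (a_i^T x - b_i)^2 / n,
    # an unbiased estimate (up to scaling) of the full gradient.
    x = x0.copy()
    for _ in range(iters):
        i = rng.integers(n)
        x = x - step * (A[i] @ x - b[i]) * A[i]
    return x

L = np.linalg.eigvalsh(A.T @ A).max()
x_gd = gradient_descent(np.zeros(d), 1.0 / L, 500)
x_sgd = sgd(np.zeros(d), 0.01, 5000)
```

With a constant step size, gradient descent converges to the minimizer on this smooth strongly convex instance, while SGD with a constant step would in general only reach a noise-dominated neighborhood; here the data are noiseless, so both iterates approach `x_true`.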

Prerequisites: Multivariate calculus, Linear algebra, Basic probability

Exam: 20.02.2020, WIAS Mohrenstraße 39, room 212.

Lecture contents

- Lecture 1 + Seminar 1, 18.10.19
  - [2], Chapter 1 and Sections 2.2.1-2.2.3
  - [3] or [4], Sections 1.1.2-1.1.3
- Lecture 2, 25.10.19
  - [2], Sections 2.2.5, 2.3.1-2.3.4
  - [3] or [4], Section 1.2.1
- Lecture 3, 01.11.19
  - [2], Section 2.4
  - Nesterov's smoothing, Theorem 1
  - [3] or [4], Section 3.2.1
- Lecture 4, 08.11.19
  - [2], Sections 3.1.1-3.1.2
  - Composite Objective Mirror Descent, Lemma 1, Theorem 2
- Lecture 5, 15.11.19
  - [2], Section 3.2
  - Switching subgradient scheme
- Lecture 6, 22.11.19
  - [3] or [4], Sections 2.1.1-2.1.4
  - [2], Section 3.1.3
  - A. Nemirovski, Lecture Notes on Information-Based Complexity of Convex Programming
- Lecture 7, 29.11.19
  - Y. Nesterov, slide 19 on the Dual Gradient Method
  - [2], Section 3.1.4
  - M. Jaggi, Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization
- Lecture 8, 06.12.19
  - Accelerated Gradient Method
- Lecture 9, 13.12.19
  - [2], Section 3.5
- Lecture 10, 10.01.20
  - [2], Sections 3.8, 3.8.1
  - A. Nemirovski, Prox-Method with Rate of Convergence O(1/t) for Variational Inequalities with Lipschitz Continuous Monotone Operators and Smooth Convex-Concave Saddle Point Problems
- Lecture 11, 17.01.20
  - [5], Sections 5.1, 5.2
- Lecture 12, 24.01.20
  - [5], Section 5.3, Question 11 on p. 85; Sections 4.2, 4.3

Homework

- Lecture 1
- Lecture 2
- Lecture 3
- Lecture 4
- Lecture 5
- Lecture 6
- Lecture 8
- Lecture 9
- Lecture 10
- Lecture 11
- Lecture 12

Literature:

[1] **A. Ben-Tal, A. Nemirovski**, *Lectures on Modern Convex Optimization*

[2] **G. Lan**, *Lectures on Optimization Methods for Machine Learning*

[3] **Yu. Nesterov**, *Introductory Lectures on Convex Optimization. A Basic Course*, Springer-Verlag US, 2004

[4] **Yu. Nesterov**, *Lectures on Convex Optimization*, Springer, 2018

[5] **John C. Duchi**, *Introductory Lectures on Stochastic Optimization*, 2016