{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "title : \"Optimization\"\n", "subtitle : \n", "author : Paul Schrimpf\n", "date : `j using Dates; print(Dates.today())`\n", "bibliography: \"opt.bib\"\n", "---\n", "\n", "\"Creative\n",
This work is licensed under a Creative\n", "Commons Attribution-ShareAlike 4.0 International License.\n", "\n", "### About this document {-}\n", "\n", "This document was created using Weave.jl. The code is available in\n", "[the course github\n", "repository](https://bitbucket.org/paulschrimpf/econ526). The same\n", "document generates both static webpages and associated jupyter\n", "notebooks. This is meant to accompany [the lecture notes for 526](http://faculty.arts.ubc.ca/pschrimpf/526/526.html). \n", "\n", "$$\n", "\\def\\indep{\\perp\\!\\!\\!\\perp}\n", "\\def\\Er{\\mathrm{E}}\n", "\\def\\R{\\mathbb{R}}\n", "\\def\\En{{\\mathbb{E}_n}}\n", "\\def\\Pr{\\mathrm{P}}\n",
"\\newcommand{\\norm}[1]{\\left\\Vert {#1} \\right\\Vert}\n", "\\newcommand{\\abs}[1]{\\left\\vert {#1} \\right\\vert}\n", "\\DeclareMathOperator*{\\argmax}{arg\\,max}\n", "\\DeclareMathOperator*{\\argmin}{arg\\,min}\n", "\\def\\inprob{\\,{\\buildrel p \\over \\rightarrow}\\,} \n", "\\def\\indist{\\,{\\buildrel d \\over \\rightarrow}\\,} \n", "$$\n", "\n", "# Introduction\n", "\n",
"The goal of this notebook is to give you some familiarity with numeric\n", "optimization. Understanding the code in this notebook will require\n", "some knowledge of Julia. \n", "\n", "Therefore, you might find the following resources about\n", "Julia useful:\n", "\n", "- [Julia manual](https://docs.julialang.org/)\n", "- [QuantEcon](https://lectures.quantecon.org/jl/)\n", "- [List of learning resources](https://julialang.org/learning/)\n", "\n", "\n",
"Numeric optimization is important because many (most) models cannot be\n", "fully solved analytically. Numeric results can be used to complement\n", "analytic ones. Numeric optimization plays a huge role in econometrics. \n", "\n", "In our notes on optimization, we focused on maximization problems\n", "because utility and profit maximization are arguably the most\n", "fundamental optimization problems in economics. In this notebook, we\n", "will focus on minimization problems, following the convention in\n", "mathematics, engineering, and most numerical libraries. It is easy\n", "to convert between minimization and maximization, and we hope that\n", "this does not lead to any confusion.\n", "\n", "# Heuristic searches\n", "\n",
"The simplest type of optimization algorithm is a heuristic\n", "search. Consider the problem: \n", "\n", "$$\n", "\\min_x f(x)\n", "$$\n", "\n", "with $f:\\R^n \\to \\R$. Heuristic search algorithms consist of:\n", "\n", "1. Evaluate $f$ at a collection of points. \n", "2. Generate a new candidate point, $x^{new}$. Replace a point\n", "   in the current collection with $x^{new}$ if $f(x^{new})$ is small enough. \n", "3. Stop when the function value stops decreasing and/or the collection of\n", "   points becomes too close together. \n", " \n",
"There are many variants of such algorithms with different ways of\n", "generating new points, deciding whether to accept the new point, and\n", "deciding when to stop. Here is a simple implementation and animation\n", "of the above idea. In the code below, new points are drawn randomly\n", "from a normal distribution, and a new point is accepted whenever\n", "$f(x^{new})$ is smaller than the worst existing function value."
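, "\n", "\n", "For comparison, once you have run the implementation below, you can check its\n", "answer against a library routine. Here is a minimal sketch using Optim.jl's\n", "Nelder-Mead simplex (discussed further below); it assumes the Optim.jl package\n", "is installed and is not part of the course code:\n", "\n",
"```julia\n", "using Optim\n", "\n", "rosenbrock(x) = (1.0 - x[1])^2 + (x[2] - x[1]^2)^2      # same objective as below\n", "res = optimize(rosenbrock, [-2.0, 3.0], NelderMead())   # a heuristic search\n", "Optim.minimizer(res), Optim.minimum(res)                # compare with minrandomsearch\n", "```\n"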
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2.3032762485211047e-8, [0.9998834413084575, 0.9996697013783358], 78, \"Convergence.\", Animation(\"/tmp/jl_M6WsCz\", [\"000001.png\", \"000002.png\", \"000003.png\", \"000004.png\", \"000005.png\", \"000006.png\", \"000007.png\", \"000008.png\", \"000009.png\", \"000010.png\" … \"000069.png\", \"000070.png\", \"000071.png\", \"000072.png\", \"000073.png\", \"000074.png\", \"000075.png\", \"000076.png\", \"000077.png\", \"000078.png\"]))" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using Distributions # julia functions come in packages, we make them available with using\n", "# to install a package type `!add Distributions`\n", "# or `using Pkg; Pkg.add(\"Distributions\")`\n", "using Plots\n", "\n", "# Lines starting with # are comments.\n", "\n", "# The following block is a docstring. It is what would be displayed if\n", "# you enter `?minrandomsearch\n", "\"\"\"\n", " minrandomsearch(f, x0, npoints; var0=1.0, ftol = 1e-6,\n", " vtol = 1e-4, maxiter = 1000,\n", " vshrink=0.9, xrange=[-2., 3.],\n", " yrange=[-2.,6.])\n", " \n", " Find the minimum of function `f` by random search. \n", " \n", " Creates an animation illustrating search progress.\n", " \n", " Inputs:\n", " \n", " - `f` function to minimizie\n", " - `x0` starting value\n", " - `npoints` number of points in cloud\n", " - `var0` initial variance of points\n", " - `ftol` convergence tolerance for function value. Search terminates if both function change is less than ftol and variance is less than vtol.\n", " - `vtol` convergence tolerance for variance. Search terminates if both function change is less than ftol and variance is less than vtol.\n", " - `maxiter` maximum number of iterations\n", " - `vshrink` after every 100 iterations with no function improvement, the variance is reduced by this factor\n", " - `xrange` range of x-axis in animation\n", " - `yrange` range of y-axis in animation\n", " - `animate` whether to create animation\n", "\n", " Output:\n", "\n", " - `(fmin, xmin, iter, info, anim)` tuple consisting of minimal function\n", " value, minimizer, number of iterations, convergence info, and an animation\n", "\n", "\"\"\"\n", "function minrandomsearch(f, x0, npoints; var0=1.0, ftol = 1e-6,\n", " vtol = 1e-4, maxiter = 1000,\n", " vshrink=0.9, xrange=[-2., 3.],\n", " yrange=[-2.,6.], animate=true)\n", " var = var0 # current variance for search\n", " oldbest = Inf # smallest function value\n", " xbest = x0 # x with smallest function vale\n", " newbest = f(xbest) \n", " iter = 0 # number of iterations\n", " noimprove = 0 # number of iterations with no improvement\n", "\n", " animate = (animate && length(x0)==2)\n", "\n", " if animate\n", " # make a contour plot of the function we're minimizing. 
This is for\n", "        # illustrating; you wouldn't have this normally\n", "        x = range(xrange[1],xrange[2], length=100)\n", "        y = range(yrange[1],yrange[2], length=100)\n", "        c = contour(x,y,(x,y) -> log(f([x,y])))\n", "        anim = Animation()\n", "    else\n", "        anim = nothing\n", "    end\n", "    \n",
"    while ((oldbest - newbest > ftol || var > vtol) && iter<=maxiter)\n", "        oldbest = newbest\n", "        x = rand(MvNormal(xbest, var),npoints)\n", "\n", "        if animate\n", "            # plot the search points\n", "            p = deepcopy(c)\n", "            scatter!(p, x[1,:], x[2,:], markercolor=:black, markeralpha=0.5, legend=false, xlims=xrange, ylims=yrange)\n", "        end\n", "        \n",
"        fval = mapslices(f,x, dims=[1])\n", "        (newbest, i) = findmin(fval)\n", "        if (newbest > oldbest)\n", "            noimprove+=1\n", "            newbest=oldbest\n", "        else\n", "            xbest = x[:,i[2]]\n", "            noimprove = 0\n", "        end\n", "\n", "        if animate\n", "            # plot the best point so far\n", "            scatter!(p, [xbest[1]],[xbest[2]], markercolor=:red, legend=false)\n", "        end\n", "        \n", "        if (noimprove > 10) # shrink var\n", "            var *= vshrink\n", "        end\n", "\n", "        if animate\n", "            frame(anim) # add frame to animation\n", "        end\n", "        \n", "        iter += 1\n", "    end\n",
"    if (iter>maxiter)\n", "        info = \"Maximum iterations reached\"\n", "    else\n", "        info = \"Convergence.\"\n", "    end\n", "    return(newbest, xbest, iter, info, anim)\n", "end\n", "\n", "\"\"\"\n", "     banana(a,b)\n", "     \n", "  Returns the Rosenbrock function with parameters a, b.\n", "\"\"\"\n", "function banana(a,b)\n", "  x->(a-x[1])^2+b*(x[2]-x[1]^2)^2\n", "end\n", "f = banana(1.0,1.0)\n", "\n", "x0 = [-2.0, 3.0]\n", "result = minrandomsearch(f, x0, 20, var0=0.1, vshrink=0.5, vtol=1e-3 )" ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "┌ Info: Saved animation to \n", "│ fn = /home/paul/econ526/01-optimization/julia/randsearch.gif\n", "└ @ Plots /home/paul/.julia/packages/Plots/h3o4c/src/animation.jl:95\n" ] }, { "data": { "text/html": [ "" ], "text/plain": [ "Plots.AnimatedGif(\"/home/paul/econ526/01-optimization/julia/randsearch.gif\")" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import Base64:stringmime\n", "gif(result[5], \"randsearch.gif\", fps=5)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "![random search](./randsearch.gif \"Random search\")\n", "\n", "There are many other heuristic search algorithms. A popular\n", "deterministic one is the Nelder-Mead simplex. Popular heuristic\n", "search algorithms that include some randomness include simulated\n", "annealing and particle swarm. Each of the three algorithms just\n", "mentioned is available in\n", "[Optim.jl](https://julianlsolvers.github.io/Optim.jl/stable/#algo/nelder_mead/). These\n", "heuristic searches have the advantage that they require only function values\n", "(as opposed to also requiring gradients or hessians, see\n", "below). Some heuristic algorithms, like simulated annealing, can be\n", "shown to converge to a global (instead of local) minimum under\n", "appropriate assumptions. Compared to algorithms that use more\n", "information, heuristic algorithms tend to require many more function\n", "evaluations. \n", "\n", "# Gradient descent\n", "\n",
"Gradient descent is an iterative algorithm to find a local minimum. As\n", "the name suggests, it consists of descending toward a minimum in the\n", "direction opposite the gradient. At each step, you start at some $x$ and\n", "compute $x_{new}$:\n", "\n", "1. 
Given current $x$, compute $x_{new} = x - \\gamma Df_{x}$\n", "2. Adjust $\\gamma$ depending on whether $f(x_{new}) < f(x)$\n", "3. Repeat until $\\norm{Df_{x}}$, $\\norm{x-x_{new}}$, and/or\n", "   $\\abs{f(x)-f(x_{new})}$ small." ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "graddescent" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [
"using ForwardDiff      # automatic differentiation for gradients and hessians\n", "using LinearAlgebra    # for norm\n", "\n", "\"\"\"\n", "    graddescent(f, x0; γ0=1.0, ftol = 1e-6,\n", "                xtol = 1e-4, gtol = 1e-6, maxiter = 1000,\n", "                xrange=[-2., 3.],\n", "                yrange=[-2.,6.], animate=true)\n", "\n", "  Find the minimum of function `f` by gradient descent.\n", " \n", "  Inputs:\n", " \n", "  - `f` function to minimize, must be compatible with ForwardDiff\n", "  - `x0` starting value\n", "  - `γ0` initial step size\n", "  - `ftol` tolerance for function value\n", "  - `xtol` tolerance for x\n", "  - `gtol` tolerance for gradient. Convergence requires meeting all three tolerances.\n", "  - `maxiter` maximum iterations\n", "  - `xrange` x-axis range for animation\n", "  - `yrange` y-axis range for animation\n", "  - `animate` whether to create animation\n", "\n", "  Output:\n", "\n", "  - `(fmin, xmin, iter, info, anim)` tuple consisting of minimal function\n", "     value, minimizer, number of iterations, convergence info, and animation\n", "\n", "\"\"\"\n",
"function graddescent(f, x0; γ0=1.0, ftol = 1e-6,\n", "                     xtol = 1e-4, gtol = 1e-6, maxiter = 1000,\n", "                     xrange=[-2., 3.],\n", "                     yrange=[-2.,6.], animate=true)\n", "    fold = f(x0)\n", "    xold = x0\n", "    xchange=Inf\n", "    fchange=Inf\n", "    γ = γ0      # current step size\n", "    iter = 0\n", "    stuck=0\n", "    improve=0\n", "\n", "    animate = animate && length(x0)==2\n", "\n", "    if animate\n", "        # make a contour plot of the function we're minimizing. This is for\n", "        # illustrating; you wouldn't have this normally\n", "        c = contour(range(xrange[1],xrange[2], length=100),\n", "                    range(yrange[1],yrange[2], length=100),\n", "                    (x,y) -> log(f([x,y])))\n", "        anim = Animation()\n", "    else\n", "        anim = nothing\n", "    end\n", "    g = ForwardDiff.gradient(f,xold)\n", "    \n",
"    while(iter < maxiter && ((xchange>xtol) || (fchange>ftol) || (stuck>0)\n", "                             || norm(g)>gtol) )\n", "        g = ForwardDiff.gradient(f,xold)\n", "        x = xold - γ*g\n", "        fnew = f(x)\n", "\n", "        if animate\n", "            scatter!(c, [xold[1]],[xold[2]], markercolor=:red, legend=false,\n", "                     xlims=xrange, ylims=yrange) \n", "            quiver!(c, [xold[1]],[xold[2]], quiver=([-γ*g[1]],[-γ*g[2]]), legend=false,\n", "                    xlims=xrange, ylims=yrange)\n", "            frame(anim)\n", "        end\n", "        \n",
"        if (fnew>=fold)\n", "            γ*=0.5\n", "            improve = 0 \n", "            stuck += 1\n", "            if (stuck>=10)\n", "                break\n", "            end\n", "        else\n", "            stuck = 0\n", "            improve += 1\n", "            if (improve>5)\n", "                γ *= 2.0\n", "                improve=0\n", "            end\n", "            xold = x\n", "            fold = fnew\n", "        end\n", "        xchange = norm(x-xold)\n", "        fchange = abs(fnew-fold)\n", "        iter += 1\n", "    end\n",
"    if (iter >= maxiter)\n", "        info = \"Maximum iterations reached\"\n", "    elseif (stuck>0)\n", "        info = \"Failed to improve for \" * string(stuck) * \" iterations.\"\n", "    else\n", "        info = \"Convergence.\"\n", "    end\n", "    return(fold, xold, iter, info, anim) \n", "end" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "┌ Info: Saved animation to \n", "│ fn = /home/paul/econ526/01-optimization/julia/graddescent.gif\n", "└ @ Plots /home/paul/.julia/packages/Plots/h3o4c/src/animation.jl:95\n" ] }, { "data": { "text/html": [ "" ], "text/plain": [ "Plots.AnimatedGif(\"/home/paul/econ526/01-optimization/julia/graddescent.gif\")" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result = graddescent(f, x0)\n", "gif(result[5], \"graddescent.gif\", fps=5)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "![gradient descent](./graddescent.gif \"Gradient descent\")\n", "\n", "\n", "Although it is an appealing and intuitive idea, the example above\n", "illustrates that gradient descent can perform surprisingly poorly in\n", "some cases. Nonetheless, gradient descent is useful for some\n", "problems. Notably, (stochastic) gradient descent is used to fit neural\n", "networks, where the dimension of `x` is so large that computing the\n", "inverse hessian in (quasi) Newton's method is prohibitively time\n", "consuming. \n", "\n", "# Newton's method\n", "\n", "Newton's method and its variations are often the most efficient\n", "minimization algorithms. Newton's method updates $x$ by minimizing a\n", "second order approximation to $f$. Specifically:\n", "\n", "1. Given $x$, set $x_{new} = x - (D^2f_x)^{-1} Df_x$\n", "2. Repeat until $\\norm{Df_{x}}$, $\\norm{x-x_{new}}$, and/or\n", "   $\\abs{f(x)-f(x_{new})}$ small."
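, "\n", "\n", "To make step 1 concrete, here is a minimal sketch of a single Newton update\n", "computed with ForwardDiff, reusing the `f` and `x0` defined earlier. It only\n", "illustrates the update; the full implementation with convergence checks follows\n", "below:\n", "\n",
"```julia\n", "using ForwardDiff, LinearAlgebra\n", "\n", "g = ForwardDiff.gradient(f, x0)      # Df at x0\n", "H = ForwardDiff.hessian(f, x0)       # D^2 f at x0\n", "xnew = x0 - H \\\\ g                    # one Newton step: x - inv(H)*Df\n", "norm(ForwardDiff.gradient(f, xnew))  # has the gradient gotten smaller?\n", "```\n"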
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "newton" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"\"\"\n", " newton(f, x0;ftol = 1e-6,\n", " xtol = 1e-4, gtol=1-6, maxiter = 1000, \n", " xrange=[-2., 3.],\n", " yrange=[-2.,6.], animate=true)\n", "\n", " Find the minimum of function `f` by Newton's method.\n", " \n", " Inputs:\n", " \n", " - `f` function to minimizie, must be compatible with ForwardDiff\n", " - `x0` starting value\n", " - `ftol` tolerance for function value\n", " - `xtol` tolerance for x\n", " - `gtol` tolerance for gradient. Convergence requires meeting all three tolerances.\n", " - `maxiter` maximum iterations\n", " - `xrange` x-axis range for animation\n", " - `yrange` y-axis range for animation\n", " - `animate` whether to create animation\n", "\n", " Output:\n", "\n", " - `(fmin, xmin, iter, info, anim)` tuple consisting of minimal function\n", " value, minimizer, number of iterations, convergence info, and animation\n", "\n", "\"\"\"\n", "function newton(f, x0; γ0=1.0, ftol = 1e-6,\n", " xtol = 1e-4, gtol=1-6, maxiter = 1000, \n", " xrange=[-2., 3.],\n", " yrange=[-2.,6.], animate=true)\n", " fold = f(x0)\n", " xold = x0\n", " xchange=Inf\n", " fchange=Inf\n", " iter = 0\n", " stuck=0\n", "\n", " animate=animate && length(x0)==2\n", "\n", " if animate\n", " # make a contour plot of the function we're minimizing. This is for\n", " # illustrating; you wouldn't have this normally\n", " c = contour(range(xrange[1],xrange[2], length=100),\n", " range(yrange[1],yrange[2], length=100),\n", " (x,y) -> log(f([x,y])))\n", " anim = Animation()\n", " end\n", " \n", " g = ForwardDiff.gradient(f,xold)\n", " \n", " while(iter < maxiter && ((xchange>xtol) || (fchange>ftol) || (stuck>0)\n", " || norm(g)>gtol) )\n", " g = ForwardDiff.gradient(f,xold)\n", " H = ForwardDiff.hessian(f,xold)\n", " Δx = - inv(H)*g\n", " x = xold + Δx\n", " fnew = f(x)\n", "\n", " if animate\n", " scatter!(c, [xold[1]],[xold[2]], markercolor=:red, legend=false,\n", " xlims=xrange, ylims=yrange) \n", " quiver!(c, [xold[1]],[xold[2]], quiver=([Δx[1]],[Δx[2]]), legend=false,\n", " xlims=xrange, ylims=yrange)\n", " frame(anim)\n", " end\n", " \n", " if (fnew>=fold)\n", " stuck += 1\n", " if (stuck>=10)\n", " break\n", " end\n", " else\n", " stuck = 0\n", " xold = x\n", " fold = fnew\n", " end\n", " xchange = norm(x-xold)\n", " fchange = abs(fnew-fold)\n", " iter += 1\n", " end\n", " if (iter >= maxiter)\n", " info = \"Maximum iterations reached\"\n", " elseif (stuck>0)\n", " info = \"Failed to improve for \" * string(stuck) * \" iterations.\"\n", " else\n", " info = \"Convergence.\"\n", " end\n", " return(fold, xold, iter, info, anim) \n", "end" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "┌ Info: Saved animation to \n", "│ fn = /home/paul/econ526/01-optimization/julia/newton.gif\n", "└ @ Plots /home/paul/.julia/packages/Plots/h3o4c/src/animation.jl:95\n" ] }, { "data": { "text/html": [ "" ], "text/plain": [ "Plots.AnimatedGif(\"/home/paul/econ526/01-optimization/julia/newton.gif\")" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result = newton(f, x0)\n", "gif(result[5], \"newton.gif\", fps=5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![newton](./newton.gif \"Newton's method\")\n", "\n", "\n", "Newton's method tends to take relatively few 
iterations to converge\n", "for well-behaved functions. It does have the disadvantage that the hessian\n", "and its inverse can be time consuming to compute, especially when the\n", "dimension of $x$ is large. Newton's method can be unstable for\n", "functions that are not well approximated by their second order\n", "expansion. This problem can be mitigated by combining Newton's method\n", "with a line search or trust region. \n", "\n", "## Line search\n", "\n",
"Line searches consist of approximately minimizing $f$ along a given\n", "direction instead of updating $x$ with a fixed step size. For Newton's\n", "method, instead of setting $x_{new} = x - (D^2f_x)^{-1} Df_x$, set \n", "$x_{new} \\approx \\argmin_{\\delta} f(x - \\delta (D^2f_x)^{-1} Df_x)$ where\n", "$\\delta$ is a scalar. This one dimensional problem can be solved\n", "fairly quickly. Line search can also be combined with gradient\n", "descent. \n", "\n", "## Trust region\n", "\n",
"Instead of setting \n", "$$\n", "x_{new} = x - (D^2f_x)^{-1} Df_x =\n", "\\argmin_{\\tilde{x}} f(x) + Df_x (\\tilde{x} - x) + \\frac{1}{2}\n", "(\\tilde{x}-x)^T D^2 f_x (\\tilde{x} - x)\n", "$$\n", "to the unconstrained minimizer of a local second order approximation,\n", "trust region methods introduce a region near $x$ where the\n", "approximation is trusted, and set\n", "$$\n", "x_{new} = \\argmin_{\\tilde{x} \\in TR(x)} f(x) + Df_x (\\tilde{x} - x) + \\frac{1}{2}\n", "(\\tilde{x}-x)^T D^2 f_x (\\tilde{x} - x).\n", "$$\n", "Often $TR(x) = \\{\\tilde{x} : \\norm{x - \\tilde{x}} < r\\}$. The radius\n", "of the trust region is then increased or decreased depending on\n", "$f(x_{new})$. \n", "\n", "## Quasi-Newton \n", "\n",
"Quasi-Newton methods (in particular the BFGS algorithm) are probably\n", "the most commonly used nonlinear optimization algorithms. Quasi-Newton\n", "methods are similar to Newton's method, except instead of evaluating\n", "the hessian directly, quasi-Newton methods build an approximation to\n", "the hessian from repeated evaluations of $Df_x$ at different $x$.\n", "\n", "Optim.jl contains all the algorithms mentioned above. [Their advice on\n", "the choice of algorithm is worth\n", "following](https://julianlsolvers.github.io/Optim.jl/stable/#user/algochoice/). \n", "\n", "## Details matter in practice\n", "\n",
"In each of the algorithms above, we were somewhat cavalier with\n", "details like how to adjust step sizes and trust regions and what it\n", "means to approximately minimize during a line search. In practice\n", "these details can be quite important for how long an algorithm takes\n", "and whether it succeeds or fails. Different implementations of\n", "algorithms have different details. Often the details can be adjusted\n", "through some options. It can be worthwhile to try multiple\n", "implementations and options to get the best performance. \n", "\n", "\n", "# Constrained optimization\n", "\n",
"Constrained optimization is a bit harder than unconstrained optimization, but it\n", "uses similar ideas. For simple bound constraints, like $x\\geq 0$, it is\n", "often easiest to simply transform to an unconstrained problem by\n", "optimizing over $y = \\log(x)$ instead. \n", "\n", "For problems with equality constraints, one can apply Newton's method\n", "to the first order conditions. \n", "\n", "The difficult case is when there are inequality constraints. Just like\n", "when solving analytically, the difficulty is figuring out which\n", "constraints bind and which do not. 
\n", "For inequality constraints, we will consider problems written in the form:\n", "$$\n", "\\min_{x \\in \\R^n} f(x) \\text{ s.t. } c(x) \\geq 0 \n", "$$\n", "\n", "## Interior Point Methods\n", "\n", "Interior point methods circumvent the problem of figuring out which\n", "constraints bind by approaching the optimum from the interior of the\n", "feasible set. To do this, the interior point method applies Newton's\n", "method to a modified version of the first order condition. The\n", "unmodified first order conditions can be written\n", "$$\n", "\\begin{align*}\n", "0 = & Df_x - \\lambda^T Dc_x \\\\\n", "0 = & \\lambda_i c_i(x) \\\\\n", "\\lambda \\geq & 0 \\\\\n", "c(x) \\geq & 0\n", "\\end{align*}\n", "$$\n", "A difficulty with these conditions is that solving them can require\n", "guessing and checking which combinations of constraints bind and which\n", "do not. Interior point methods get around this problem by beginning\n", "with an interior $x$ and $\\lambda$ such that $\\lambda>0$ and\n", "$c(x)>0$. They are then updated by applying Newton's method to the\n", "equations\n", "$$\n", "\\begin{align*}\n", "0 = & Df_x - \\lambda^T Dc_x \\\\\n", "\\mu = & \\lambda_i c_i(x) \\\\\n", "\\end{align*}\n", "$$\n", "where there is now a $\\mu$ in place of $0$ in the second equation. $x$\n", "and $\\lambda$ are updated according to Newton's method for this system\n", "of equations. In particular, \n", "$x_{new} = x + \\Delta_x$ and $\\lambda_{new}= \\lambda + \\Delta_\\lambda$, where\n", "$$\n", "\\begin{align*}\n", "\\begin{pmatrix} - ( Df_x - \\lambda^T Dc_x) \\\\\n", "\\mu 1_m - diag(c(x)) \\lambda \n", "\\end{pmatrix} = \\begin{pmatrix} \n", " D^2 f_x - D^2 (\\lambda c)_x & -Dc_x^T \\\\\n", " \\lambda Dc_x & diag(c(x)) \n", "\\end{pmatrix} \\begin{pmatrix}\n", "\\Delta_x \\\\\n", "\\Delta_\\lambda\n", "\\end{pmatrix}\n", "\\end{align*}\n", "$$\n", "Over iterations $\\mu$ is gradually decreased toward\n", "$0$. Here is one simple implementation." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "constraint (generic function with 1 method)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"\"\"\n", " interiorpoint(f, x0, c; tol=1e-4, maxiter = 1000,\n", " μ0 = 1.0, μfactor = 0.2,\n", " xrange=[-2., 3.],\n", " yrange=[-2.,6.], animate=true)\n", "\n", " Find the minimum of function `f` subject to `c(x) >= 0` using a\n", " primal-dual interior point method.\n", " \n", " Inputs:\n", " \n", " - `f` function to minimizie\n", " - `x0` starting value. Must have c(x0) > 0\n", " - `c` constraint function. Must return an array.\n", " - `tol` convergence tolerance\n", " - `μ0` initial μ\n", " - `μfactor` how much to decrease μ by\n", " - `xrange` range of x-axis for animation\n", " - `yrange` range of y-axis for animation\n", " - `animate` whether to create an animation (if true requires length(x)==2)\n", " - `verbosity` higher values result in more printed output during search. 0 for no output, any number > 0 for some. 
\n", " \n", " Output:\n", "\n", " - `(fmin, xmin, iter, info, animate)` tuple consisting of minimal function\n", " value, minimizer, number of iterations, and convergence info\n", "\n", "\"\"\"\n", "function interiorpoint(f, x0, c; tol=1e-4, maxiter = 1000,\n", " μ0 = 1.0, μfactor = 0.2,\n", " xrange=[-2., 3.],\n", " yrange=[-2.,6.], animate=true, verbosity=0)\n", " fold = f(x0)\n", " xold = x0\n", " all(c(x0).>0) || error(\"interiorpoint requires a starting value that strictly satisfies all constraints\")\n", " μ = μ0\n", " λ = μ./c(x0)\n", " xchange=Inf\n", " fchange=Inf\n", " iter = 0\n", " μiter = 0\n", " stuck=0\n", "\n", " animate = animate && length(x0)==2\n", " if animate\n", " # make a contour plot of the function we're minimizing. This is for\n", " # illustrating; you wouldn't have this normally\n", " ct = contour(range(xrange[1],xrange[2], length=100), \n", " range(yrange[1],yrange[2], length=100),\n", " (x,y) -> log(f([x,y])))\n", " plot!(ct, xrange, 2.5 .- xrange) # add constraint \n", " anim = Animation()\n", " end\n", " L(x,λ) = f(x) - λ'*c(x)\n", " foc = [ForwardDiff.gradient(x->L(x,λ),xold); λ.*c(xold)]\n", " while(iter < maxiter && ((xchange>tol) || (fchange>tol) || (stuck>0)\n", " || norm(foc)>tol || μ>tol) )\n", " # Calculate the direction for updating x and λ\n", " Dc = ForwardDiff.jacobian(c, xold)\n", " cx = c(xold)\n", " foc = ForwardDiff.gradient(x->L(x,λ),xold)\n", " H = ForwardDiff.hessian(x->L(x,λ),xold)\n", " Δ = [H -Dc'; λ'*Dc diagm(cx)] \\ [-foc; μ .- cx.*λ]\n", "\n", " # Find a step size such that λ>=0 and c(x)>=0\n", " # The details here could surely be improved\n", " α = 1.0\n", " acceptedstep = false\n", " λold = copy(λ)\n", " x = copy(xold)\n", " while (α > 1e-10)\n", " x = xold + α*Δ[1:length(xold)]\n", " λ = λold + α*Δ[(length(xold)+1):length(Δ)]\n", " if (all(λ.>=0) && all(c(x).>=0))\n", " acceptedstep=true\n", " break\n", " end\n", " α *= 0.5\n", " end\n", " if !acceptedstep\n", " stuck = 1\n", " break\n", " end\n", " fnew = f(x)\n", "\n", " if (animate)\n", " scatter!(ct, [xold[1]],[xold[2]], markercolor=:red, legend=false,\n", " xlims=xrange, ylims=yrange) \n", " quiver!(ct, [xold[1]],[xold[2]], quiver=([α*Δ[1]],[α*Δ[2]]), legend=false,\n", " xlims=xrange, ylims=yrange)\n", " frame(anim)\n", " end\n", "\n", " xchange = norm(x-xold)\n", " fchange = abs(fnew-fold)\n", " μiter += 1\n", "\n", " # update μ (the details here could also be improved) \n", " foc = ForwardDiff.gradient(x->L(x,λ),x)\n", " if (μiter>10 || (norm(foc)< μ && λ'*c(x)<10*μ)) \n", " μ *= μfactor\n", " μiter = 0\n", " end\n", " \n", " xold = x\n", " fold = fnew\n", " if verbosity>0\n", " print(\"Iter $iter: f=$fnew, λ=$λ, c(x)=$(c(x)), μ=$μ, norm(foc)=$(norm(foc))\\n\")\n", " end\n", " iter += 1 \n", " end\n", " if (iter >= maxiter)\n", " info = \"Maximum iterations reached\"\n", " elseif (stuck>0)\n", " info = \"Failed to find feasible step for \" * string(stuck) * \" iterations.\"\n", " else\n", " info = \"Convergence.\"\n", " end\n", " return(fold, xold, iter, info, anim) \n", "end\n", "\n", "\"\"\"\n", " banana(a,b)\n", " \n", " Returns the Rosenbrock function with parameters a, b.\n", "\"\"\"\n", "function banana(a,b)\n", " x->(a-x[1])^2+b*(x[2]-x[1]^2)^2\n", "end\n", "f = banana(1.0,1.0)\n", "\n", "x0 = [3.0, 0.0]\n", "\n", "function constraint(x)\n", " [x[1] + x[2] - 2.5]\n", "end" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "┌ Info: Saved animation to \n", "│ fn = 
/home/paul/econ526/01-optimization/julia/ip.gif\n", "└ @ Plots /home/paul/.julia/packages/Plots/h3o4c/src/animation.jl:95\n" ] }, { "data": { "text/html": [ "" ], "text/plain": [ "Plots.AnimatedGif(\"/home/paul/econ526/01-optimization/julia/ip.gif\")" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result = interiorpoint(f, x0, constraint; maxiter=100)\n", "gif(result[5], \"ip.gif\", fps=5)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "![interior point](./ip.gif \"Interior point\")\n", "\n", "\n", "Optim.jl includes an interior point method. IPOPT is another popular\n", "implementation. As above, the details of the algorithm can be\n", "important in practice. It can be worthwhile to experiment with\n", "different methods for updating $\\mu$, using a more sophisticated line\n", "search or trust region, and perhaps replacing the computation of the\n", "hessian with a quasi-Newton approximation. \n", "\n", "It has been proven that interior point methods converge relatively\n", "quickly for convex optimization problems. \n", "\n", "## Sequential quadratic programming\n", "\n",
"Sequential quadratic programming relies on the fact that there are\n", "efficient methods to compute the solution to quadratic programs ---\n", "optimization problems with quadratic objective functions and linear\n", "constraints. We can then solve a more general optimization problem by\n", "solving a sequence of quadratic programs that approximate the original problem.\n", "\n", "Sequential quadratic programming is like a constrained version of\n", "Newton's method. Given a current $x$ and $\\lambda$, the new $x$ solves\n", "$$\n", "\\begin{align*}\n", "x_{new} \\in \\argmin_{\\tilde{x}} & f(x) + Df_x (\\tilde{x} - x) +\n", "\\frac{1}{2} (\\tilde{x}-x)^T (D^2 f_x - D^2 (\\lambda^T c)_x) (\\tilde{x} - x) \\\\\n", " \\text{ s. t. } & c(x) + Dc_{x} (\\tilde{x} - x) \\geq 0\n", "\\end{align*}\n", "$$\n", "and the new $\\lambda$ is set to the value of the multipliers for this\n", "problem. \n", "\n",
"This quadratic program (an optimization problem with a quadratic\n", "objective function and linear constraints) can be solved fairly\n", "efficiently if $(D^2 f_x - D^2 (\\lambda^T c)_x)$ is positive\n", "semi-definite.[^QP] \n", "\n", "[^QP]: Most convex program solvers are designed to accept\n", "    semidefinite programs instead of quadratic programs. Fortunately,\n", "    a [quadratic program can be re-written as a semidefinite\n", "    program](https://math.stackexchange.com/q/2256243). A solver such\n", "    as SCS, ECOS, or Mosek can then be used. Moreover, Convex.jl\n", "    will automatically take care of any necessary transformation.\n", " \n", "One could also incorporate a trust region or line search into the\n", "above algorithm. Here is one simple implementation." ] },
{ "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "sequentialquadratic" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using Convex, ECOS\n",
"\"\"\"\n", "    sequentialquadratic(f, x0, c; tol=1e-4, maxiter = 1000,\n", "                        trustradius=1.0, xrange=[-2., 3.],\n", "                        yrange=[-2.,6.], animate=true, verbosity=1)\n", "\n", " \n", "  Find the minimum of function `f` subject to `c(x) >= 0` by sequential\n", "  quadratic programming.\n", " \n", "  Inputs:\n", " \n", "  - `f` function to minimize\n", "  - `x0` starting value\n", "  - `c` constraint function. 
Must return an array.\n", " - `tol` convergence tolerance\n", " - `trustradisu` initial trust region radius\n", " - `xrange` range of x-axis for animation\n", " - `yrange` range of y-axis for animation\n", " - `animate` whether to create an animation (if true requires length(x)==2)\n", " - `verbosity` higher values result in more printed output during search. 0 for no output, any number > 0 for some. \n", " \n", " Output:\n", "\n", " - `(fmin, xmin, iter, info, animate)` tuple consisting of minimal function\n", " value, minimizer, number of iterations, and convergence info\n", "\n", "\"\"\"\n", "function sequentialquadratic(f, x0, c; tol=1e-4, maxiter = 1000,\n", " trustradius=1.0,\n", " xrange=[-2., 3.],\n", " yrange=[-2.,6.], animate=true, verbosity=1)\n", " fold = f(x0)\n", " xold = x0\n", " xchange=Inf\n", " fchange=Inf\n", " iter = 0\n", " μiter = 0\n", " stuck=0\n", "\n", " animate = animate && length(x0)==2\n", " if animate\n", " # make a contour plot of the function we're minimizing. This is for\n", " # illustrating; you wouldn't have this normally\n", " ct = contour(range(xrange[1],xrange[2], length=100), \n", " range(yrange[1],yrange[2], length=100),\n", " (x,y) -> log(f([x,y])))\n", " plot!(ct, xrange, 2.5 .- xrange) # add constraint \n", " anim = Animation()\n", " end\n", " Dc = ForwardDiff.jacobian(c,xold)\n", " Df = ForwardDiff.gradient(f,xold)\n", " λ = (Dc*Dc') \\ Dc*Df\n", " println(λ)\n", " L(x,λ) = f(x) - λ'*c(x)\n", " foc = [ForwardDiff.gradient(x->L(x,λ),xold); λ.*c(xold)]\n", " fold = f(xold)\n", " negsquared(x) = x < 0 ? x^2 : zero(x)\n", " merit(x) = f(x) + sum(negsquared.(c(x)))\n", " while(iter < maxiter && ((xchange>tol) || (fchange>tol) || (stuck>0)\n", " || norm(foc)>tol) )\n", " Df = ForwardDiff.gradient(f,xold)\n", " Dc = ForwardDiff.jacobian(c, xold)\n", " cx = c(xold)\n", " H = ForwardDiff.hessian(x->L(x,λ),xold)\n", "\n", " # set up and solve our QP\n", " Δ = Variable(length(xold))\n", " problem = minimize(Df'*Δ + quadform(Δ,H), [cx + Dc*Δ >= 0; norm(Δ)<=trustradius])\n", " solve!(problem, ECOSSolver(verbose=verbosity-1))\n", " λ .= problem.constraints[1].dual\n", " xnew = xold .+ Δ.value\n", "\n", " if (animate)\n", " scatter!(ct, [xold[1]],[xold[2]], markercolor=:red, legend=false,\n", " xlims=xrange, ylims=yrange) \n", " quiver!(ct, [xold[1]],[xold[2]], quiver=([Δ.value[1]],[Δ.value[2]]), legend=false,\n", " xlims=xrange, ylims=yrange)\n", " frame(anim)\n", " end\n", "\n", "\n", " # decide whether to accept new point and whether to adjust trust region\n", " if (merit(xnew) < merit(xold))\n", " xold = xnew\n", " stuck = 0\n", " foc = [ForwardDiff.gradient(x->L(x,λ),xold); λ.*c(xold)]\n", " if (problem.constraints[2].dual>1e-4) # trust region binding\n", " trustradius *= 3/2\n", " end\n", " else\n", " stuck += 1\n", " trustradius *= 2/3\n", " if (stuck>=20)\n", " break\n", " end\n", " end\n", " \n", "\n", " xchange = norm(xnew-xold)\n", " fchange = abs(f(xnew)-f(xold))\n", "\n", " if true\n", " print(\"Iter $iter: f=$(f(xold)), λ=$λ, c(x)=$(c(xold)), TR=$trustradius, norm(foc)=$(norm(foc))\\n\")\n", " end\n", " iter += 1 \n", " end\n", " if (iter >= maxiter)\n", " info = \"Maximum iterations reached\"\n", " elseif (stuck>0)\n", " info = \"Failed to find feasible step for \" * string(stuck) * \" iterations.\"\n", " else\n", " info = \"Convergence.\"\n", " end\n", " return(f(xold), xold, iter, info, anim) \n", "end" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ 
"[-1.0]\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "┌ Warning: Problem status Infeasible; solution may be inaccurate.\n", "└ @ Convex /home/paul/.julia/packages/Convex/6NNC8/src/solution.jl:51\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Iter 0: f=1.0, λ=[5.744129812962317e7], c(x)=[-2.5], TR=0.6666666666666666, norm(foc)=2.8722813232690143\n", "Iter 1: f=1.0, λ=[8.82630652026277e7], c(x)=[-2.5], TR=0.4444444444444444, norm(foc)=2.8722813232690143\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "┌ Warning: Problem status Infeasible; solution may be inaccurate.\n", "└ @ Convex /home/paul/.julia/packages/Convex/6NNC8/src/solution.jl:51\n", "┌ Warning: Problem status Infeasible; solution may be inaccurate.\n", "└ @ Convex /home/paul/.julia/packages/Convex/6NNC8/src/solution.jl:51\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Iter 2: f=5.038440570604318, λ=[1.1753817640813076e8], c(x)=[-0.07212807174958691], TR=0.6666666666666666, norm(foc)=1.6644012806823975e8\n", "Iter 3: f=1.6448771839550929, λ=[2.416203521589231e-9], c(x)=[0.48372602542892373], TR=1.0, norm(foc)=8.727700659672426\n", "Iter 4: f=0.5840086806647102, λ=[4.4551946746239154e-10], c(x)=[0.6757581967033381], TR=1.0, norm(foc)=4.6137298556725534\n", "Iter 5: f=0.23916925350447604, λ=[7.569576370538181e-11], c(x)=[0.4993830486552775], TR=1.0, norm(foc)=2.5160755086622277\n", "Iter 6: f=0.09743531200157135, λ=[9.38238253913904e-10], c(x)=[0.19871922912166484], TR=1.0, norm(foc)=1.40581442153938\n", "Iter 7: f=0.040343557701675914, λ=[0.02685022552990332], c(x)=[4.2504577635327223e-10], TR=1.0, norm(foc)=0.7553410661599463\n", "Iter 8: f=0.027527315271240967, λ=[0.08320080939959254], c(x)=[1.3213163896352853e-10], TR=1.0, norm(foc)=0.3652756468851557\n", "Iter 9: f=0.024131472985838866, λ=[0.08679517745067436], c(x)=[8.819052155217832e-10], TR=1.0, norm(foc)=0.1831946208733258\n", "Iter 10: f=0.023256002358673446, λ=[0.08778281083823443], c(x)=[2.1765655944250284e-10], TR=1.0, norm(foc)=0.09173401109091146\n", "Iter 11: f=0.023033667836213356, λ=[0.08804971131022993], c(x)=[2.176051783209232e-9], TR=1.0, norm(foc)=0.04591272206539734\n", "Iter 12: f=0.02297758351219684, λ=[0.08811530260655176], c(x)=[2.1821664475396574e-9], TR=1.0, norm(foc)=0.022967012802199437\n", "Iter 13: f=0.022963508642272278, λ=[0.08812480456673913], c(x)=[3.764233369452086e-10], TR=1.0, norm(foc)=0.011500723092885471\n", "Iter 14: f=0.0229599744412804, λ=[0.08812974655873822], c(x)=[1.3661960451827326e-10], TR=1.0, norm(foc)=0.005756385618588163\n", "Iter 15: f=0.022959088150398085, λ=[0.0881314107961055], c(x)=[4.56474413823571e-10], TR=1.0, norm(foc)=0.002880464885383854\n", "Iter 16: f=0.02295886732972351, λ=[0.0881263017672415], c(x)=[1.0752962964488688e-9], TR=1.0, norm(foc)=0.001456169724223254\n", "Iter 17: f=0.022958810562596636, λ=[0.08812727182962203], c(x)=[3.745093124507548e-10], TR=1.0, norm(foc)=0.0007266128616736104\n", "Iter 18: f=0.022958797295923792, λ=[0.08812916326165048], c(x)=[9.805600775791845e-10], TR=1.0, norm(foc)=0.0003909086741240457\n", "Iter 19: f=0.022958793324829984, λ=[0.08812981693441531], c(x)=[1.730438015101754e-10], TR=1.0, norm(foc)=0.00020768034064370404\n", "Iter 20: f=0.022958792149509473, λ=[0.08812993623802398], c(x)=[1.063815702195825e-10], TR=1.0, norm(foc)=0.00010132460936763705\n", "Iter 21: f=0.02295879187493447, λ=[0.08813033423408839], c(x)=[1.4879342202789303e-10], TR=1.0, norm(foc)=4.905856373221587e-5\n" ] }, { "name": "stderr", 
"output_type": "stream", "text": [ "┌ Info: Saved animation to \n", "│ fn = /home/paul/econ526/01-optimization/julia/sqp.gif\n", "└ @ Plots /home/paul/.julia/packages/Plots/h3o4c/src/animation.jl:95\n" ] }, { "data": { "text/html": [ "" ], "text/plain": [ "Plots.AnimatedGif(\"/home/paul/econ526/01-optimization/julia/sqp.gif\")" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x0 = [0.0, 0.0]\n", "result = sequentialquadratic(f, x0, constraint; maxiter=100)\n", "gif(result[5], \"sqp.gif\", fps=5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![sqp](./sqp.gif \"Sequential quadratic programming\")\n", "\n", "Compared to interior point methods, sequential quadratic programming\n", "has the advantage of not needing a feasible point to begin. Like\n", "Newton's method, sequential quadratic programming has local quadratic\n", "convergence. A downside of sequential quadratic programming is that\n", "solving the quadratic program at each step can take considerably\n", "longer than solving the system of linear equations that interior point\n", "methods and Newton methods require. \n", "\n", "\n", "## SLQP active Set \n", "\n", "SLQP active set methods use a linear approximation to the optimization\n", "problem to decide which constraints are \"active\" (binding). In each\n", "iteration, a linear approximation to the original problem is first\n", "solved. The constraints that bind in linear approximation are then\n", "assumed to bind in the full problem, and we take a Newton step\n", "accordingly. \n", "\n", "\n", "## Augmented Lagrangian\n", "\n", "Augmented Lagragian methods convert a constrained minimization problem\n", "to an unconstrained problem by adding a penalty that increases with\n", "the constraint violation to the objective function." ] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.2.0", "language": "julia", "name": "julia-1.2" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.2.0" } }, "nbformat": 4, "nbformat_minor": 2 }