
Retrospective Approximation for Smooth Stochastic Optimization
We consider stochastic optimization problems where a smooth (and potentially nonconvex) objective is to be minimized using a stochastic first-order oracle. These types of problems arise in many settings, from simulation optimization to deep learning. We present Retrospective Approximation (RA) as a universal sequential sample-average approximation (SAA) paradigm in which, during each iteration k, a sample-path approximation problem is implicitly generated using an adapted sample size M_k and solved (with prior solutions as "warm starts") to an adapted error tolerance ϵ_k, using a "deterministic method" such as the line-search quasi-Newton method. The principal advantage of RA is that it decouples optimization from stochastic approximation, allowing the direct adoption of existing deterministic algorithms without modification and thus mitigating the need to redesign algorithms for the stochastic context. A second advantage is the obvious manner in which RA lends itself to parallelization. We identify conditions on {M_k, k ≥ 1} and {ϵ_k, k ≥ 1} that ensure almost sure convergence and convergence in L_1 norm, along with optimal iteration and work complexity rates. We illustrate the performance of RA with line-search quasi-Newton on an ill-conditioned least squares problem, as well as on an image classification problem using a deep convolutional neural network.
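The RA outer loop described in the abstract (adapted sample size M_k, adapted tolerance ϵ_k, warm starts, a deterministic inner solver) can be sketched in a minimal form. The sketch below assumes a synthetic noisy least-squares oracle and geometric schedules for M_k and ϵ_k, and substitutes plain gradient descent with Armijo backtracking line search for the paper's line-search quasi-Newton inner solver; all names and constants are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5                                    # problem dimension (illustrative)
x_true = rng.normal(size=d)              # ground truth for the synthetic oracle

def sample_path(m):
    """Draw m samples from the stochastic oracle: noisy linear measurements."""
    A = rng.normal(size=(m, d))
    b = A @ x_true + 0.1 * rng.normal(size=m)
    return A, b

def solve_saa(f, grad, x, tol, max_iter=500):
    """Deterministic inner solver: gradient descent with Armijo backtracking
    line search, run until the SAA gradient norm drops below tol."""
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        t, fx, gg = 1.0, f(x), g @ g
        while f(x - t * g) > fx - 0.5 * t * gg:   # Armijo sufficient decrease
            t *= 0.5
        x = x - t * g
    return x

def retrospective_approximation(x0, outer=6, m0=32, eps0=0.1):
    """RA outer loop: at iteration k, implicitly generate an SAA problem with
    sample size M_k and solve it to tolerance eps_k, warm-started at the
    previous solution. Geometric schedules for M_k and eps_k are assumptions."""
    x = np.asarray(x0, dtype=float)
    for k in range(outer):
        m_k = m0 * 2**k                  # adapted sample size M_k
        eps_k = eps0 * 0.5**k            # adapted error tolerance eps_k
        A, b = sample_path(m_k)          # sample-path approximation problem
        f = lambda x: 0.5 * np.mean((A @ x - b) ** 2)
        grad = lambda x: A.T @ (A @ x - b) / len(b)
        x = solve_saa(f, grad, x, eps_k) # warm start from previous iterate
    return x

x_hat = retrospective_approximation(np.zeros(d))
print(np.linalg.norm(x_hat - x_true))    # distance to ground truth; small here
```

Because the inner solver is a black box, swapping the gradient-descent routine for any deterministic quasi-Newton or line-search method leaves the outer RA loop unchanged, which is the decoupling the abstract emphasizes.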