sfista

Solvers‎ > ‎

General Description:

S-FISTA (smoothed FISTA) method for solving problems of the form

$\min_{x} f(x)+\lambda_G g(Ax)+\lambda_H h(x)$

Assumptions:

all functions are convex
f - smooth
g - real-valued and proximable
h - proper closed and proximable
A - linear transformation
$\lambda_H,\lambda_G$ - positive scalars

Available Oracles:

function value of f
gradient of f
function value of g
prox of a positive constant times g
function value of h
prox of a positive constant times h
value of the linear transformation A
value of the adjoint of A

(the function values of f,g and h are not required in the economical setting)

Usage:

out               = sfista(Ffun,Ffun_grad,Gfun,Gfun_prox,Hfun,Hfun_prox,Afun,Atfun,mu,lambda_G,lambda_H,startx,[par])
[out,fmin]        = sfista(Ffun,Ffun_grad,Gfun,Gfun_prox,Hfun,Hfun_prox,Afun,Atfun,mu,lambda_G,lambda_H,startx,[par])
[out,fmin,parout] = sfista(Ffun,Ffun_grad,Gfun,Gfun_prox,Hfun,Hfun_prox,Afun,Atfun,mu,lambda_G,lambda_H,startx,[par])

Input:

Ffun - function handle for f

Ffun_grad - function handle for the gradient of f

Gfun - function handle for g

Gfun_prox - function handle for the proximal mapping of g times a positive constant
Hfun - function handle for h (if h is extended real-valued, it is enough to specify the operation of h on its domain)
Hfun_prox - function handle for the proximal mapping of h times a positive constant
Afun - function handle for the linear transformation A
Atfun - function handle for the adjoint of the linear transformation A
mu - positive smoothing parameter
lambda_G - positive scalar penalty for the function g
lambda_H - positive scalar penalty for the function h
startx - starting vector
par - struct containing different values required for the operation of sfista

 field of par  possible values  default  description
  max_iter   positive integer  1000 maximal amount of iterations
eco_flag true/false   false true - economic version in which
function values are not computed 
 print_flag true/false   true true - internal printing is enabled
 monotone_flag  true/false  false true - a monotone version of FISTA is applied
 Lstart positive real  1 an initial value for the Lipschitz constant
 const_flag  true/false  false true - constant stepsize is used
false - backtracking is employed
 regret_flag  true/false  false true - Lipschitz constant is divided by eta
       in the next iteration
false - otherwise
 eta  real>1  2 multiplicative constant used when regretting
or backtracking
 continuation  true/false  false true  - continuation mode is used
false - otherwise
 gamma  real>1  5 multiplicative constant used to increase mu.
Relevant only in continuation mode.
 cont_cycles  positive integer  5 number of cycles in continuation mode.
Relevant only in continuation mode.
 eps  positive real  1e-5 stopping criteria tolerance (the
method terminates when

with x^k being the kth iterate vector)

Output:

out   - optimal solution (up to a tolerance)
fmin     - optimal value (up to a tolerance)
parout   - a struct containing containing additional information related to the convergence. The fields of parout are:

iterNum - number of performed iterations

funValVec - vector of all function values generated by the method
LvalVec - vector of all used Lipschitz constants (relevant when par.const_flag=false)
OrigFun - original function value (without smoothing)
CyclIter - number of performed iterations in each cycle (relevant only in continuation mode)

Method Description:

Employs FISTA on the following smoothed version of the model:

$\min_{x} f(x)+\lambda_G g_{\mu}(Ax)+\lambda_H h(x)$

where

$g_{\mu}(x)=M_g^{\mu}(x)=\min \left \{ g(u)+\frac{1}{2\mu} \|x-u\|^2\right \}$

is the Moreau envelope of g with parameter mu. The smooth part is taken as

$f(x)+\lambda_G g_{\mu}(Ax)$

and the nonsmooth component is

$\lambda_H h(x)$

Besides "continuation", ", "gamma" and "cont_cycles", all the parameters in par are related to FISTA.

References:

Amir Beck, "First Order Methods in Optimization", SIAM, 2017.
Amir Beck and Marc Teboulle, "Smoothing and First Order Methods: A Unified Framework", SIAM J. Optim. vol. 22 (2012), No. 2, 557--580.
Yuri Nesterov, "Smooth Minimization of Nonsmooth Functions", Mathematical Programming, vol. 103 (2005), No. 1, p. 127--152.

Example:

Consider the problem

$\min_x (x-a)^TQ(x-a)+0.25 \|Ax\|_2+0.4\|x\|_1$

where Q, a and A are given by

>> randn('seed',314);

>> A = randn(30,20);

>> Q = randn(20);

>> Q = Q * Q';

>> a = randn(20,1);

Note that by its construction, Q is positive definite, and therefore the problem is convex. To solve the problem, we will run sfista with

$f(x)= (x-a)^T Q (x-a),g(x)=\|x\|_2, h(x)=\|x\|_1, \lambda_G = 0.25, \lambda_H=0.4$

and with mu=0.01.

>> Afun = @(x) A*x;

>> Atfun = @(x) A'*x;

>> f=@(x) (x-a)'*Q*(x-a);

>> grad_f = @(x) 2*Q*(x-a) ;

>> g = @(x) norm(x,2);

>> h = @(x) norm(x,1);

>> mu = 0.01 ;

>> lambda_g=0.25;

>> lambda_h=0.4;

>> [xs,fun_xs,parout] = sfista( @(x) f(x) , @(x) grad_f(x),@(x) g(x),@(x,a) prox_Euclidean_norm(x,a),@(x) h(x),@(x,a) prox_l1(x,a), @(x) A*x , @(x)...

A'*x,mu,lambda_g,lambda_h,ones(20,1));

*********************

FISTA

*********************

#iter fun. val. L val.

1 270.280178 128.000000

2 156.805102 128.000000

3 97.019766 128.000000

4 65.695242 128.000000

5 47.036194 128.000000

: : :

200 11.636340 128.000000

227 11.636340 128.000000

228 11.636339 128.000000

229 11.636339 128.000000

257 11.636339 128.000000

285 11.636339 128.000000

Stopping because the norm of the difference between consecutive iterates is too small

----------------------------------

Optimal value = 11.636339

Original function value = 11.637589

Note that the "optimal value" is the function value corresponding to the best smoothed objective function value. "Original function value" is the corresponding function value without any smoothing (that is, when using g rather than g_mu). The amount of iterations employed by sfista is 299:

>> parout.iterNum

ans =

299

It is possible to use continuation, meaning that the smoothing parameter is initialized as a large value that gradually decreases to the required value. Several cycles of iterations are employed, where in each cycle, FISTA is employed on the problem with a specific smoothing parameter.

>> clear par; par.continuation=true;
>> [xs,fun_xs,parout] = sfista( @(x) f(x) , @(x) grad_f(x),@(x) g(x),@(x,a) prox_Euclidean_norm(x,a),@(x) h(x),@(x,a) prox_l1(x,a), @(x) A*x,...
@(x) A'*x,mu,lambda_g,lambda_h,ones(20,1),par);

*********************
Starting continuation cycles
Cycle 1 (mu = 6.25) required 73 iterations
Cycle 2 (mu = 1.25) required 2 iterations
Cycle 3 (mu = 0.25) required 1 iterations
Cycle 4 (mu = 0.05) required 2 iterations
Starting final cycle with mu = 0.01
*********************
FISTA
*********************
#iter       	fun. val.     	L val.
     1 	       11.636857 	  128.000000
     2 	       11.636841 	  128.000000
     3 	       11.636821 	  128.000000
     4 	       11.636799 	  128.000000
     :             :                   :
    30 	       11.636342 	  128.000000
    31 	       11.636340 	  128.000000
    32 	       11.636340 	  128.000000
    33 	       11.636339 	  128.000000
    62 	       11.636339 	  128.000000
Stopping because the norm of the difference between consecutive iterates is too small
----------------------------------
Optimal value =       11.636339

Original function value =       11.637589

The number of iterations in each of the cycle can be computed using the CycleIter field of parout

>> parout.CycleIter

ans =

73

2

1

2

74

The overall number of iterations when using continuation (152) is obviously smaller than the amount of iterations required in the version without continuation (299).