Solvers‎ > ‎

sfista

General Description:

S-FISTA (smoothed FISTA) method for solving problems of the form 
Assumptions:
  • all functions are convex
  • f   - smooth
  • g  - real-valued and proximable
  • h  - proper closed and proximable
  • A  - linear transformation
  •  - positive scalars
Available Oracles:
  • function value of f
  • gradient of f 
  • function value of g
  • prox of a positive constant times g
  • function value of h
  • prox of a positive constant times h
  • value of the linear transformation A
  • value of the adjoint of A
(the function values of f,g and h are not required in the economical setting)

Usage: 
out               = sfista(Ffun,Ffun_grad,Gfun,Gfun_prox,Hfun,Hfun_prox,Afun,Atfun,mu,lambda_G,lambda_H,startx,[par])
[out,fmin]        = sfista(Ffun,Ffun_grad,Gfun,Gfun_prox,Hfun,Hfun_prox,Afun,Atfun,mu,lambda_G,lambda_H,startx,[par])
[out,fmin,parout] = sfista(Ffun,Ffun_grad,Gfun,Gfun_prox,Hfun,Hfun_prox,Afun,Atfun,mu,lambda_G,lambda_H,startx,[par])
Input: 
Ffun            - function handle for f 
Ffun_grad   - function handle for the gradient of f
Gfun            - function handle for g
Gfun_prox   - function handle for the proximal mapping of g times a positive constant
Hfun            - function handle for h (if h is extended real-valued, it is enough to specify the operation of h on its domain)
Hfun_prox   - function handle for the proximal mapping of h times a positive constant
Afun            - function handle for the linear transformation A
Atfun           - function handle for the adjoint of the linear transformation A
mu               - positive smoothing parameter
lambda_G    - positive scalar penalty for the function g
lambda_H    - positive scalar penalty for the function h
startx          - starting vector
par              - struct containing different values required for the operation of sfista


 field of par possible values default description
max_iter  positive integer 1000maximal amount of iterations
eco_flagtrue/false  falsetrue - economic version in which function values are not computed
 print_flagtrue/false  truetrue - internal printing is enabled
 monotone_flag true/false falsetrue - a monotone version of FISTA is applied
 Lstartpositive real 1an initial value for the Lipschitz constant
 const_flag true/false falsetrue - constant stepsize is used false - backtracking is employed
 regret_flag true/false falsetrue - Lipschitz constant is divided by eta in the next iteration false - otherwise
 eta real>1 2multiplicative constant used when regretting or backtracking
 continuation true/false falsetrue - continuation mode is used false - otherwise
 gamma real>1 5multiplicative constant used to increase mu. Relevant only in continuation mode.
 cont_cycles positive integer 5number of cycles in continuation mode. Relevant only in continuation mode.
 eps positive real 1e-5stopping criteria tolerance (the method terminates when with x^k being the kth iterate vector)

Output: 
out           - optimal solution (up to a tolerance)
fmin         - optimal value (up to a tolerance)
parout      - a struct containing containing additional information related to the convergence. The fields of parout are:
                   iterNum    - number of performed iterations
funValVec - vector of all function values generated by the method
LvalVec     - vector of all used Lipschitz constants (relevant when par.const_flag=false) 
OrigFun    -  original function value (without smoothing)
CyclIter     -  number of performed iterations in each cycle (relevant only in continuation mode)


Method Description: 

Employs FISTA on the following smoothed version of the model:
where 


is the Moreau envelope of g with parameter mu. The smooth part is taken as 

and the nonsmooth component is 
Besides "continuation", ", "gamma" and "cont_cycles", all the parameters in par are related to FISTA. 

References: 


Example: 

Consider the problem
 
where Q, a and A are given by 

>> randn('seed',314);
>> A = randn(30,20);
>> Q = randn(20);
>> Q = Q * Q';
>> a = randn(20,1);

Note that by its construction, Q is positive definite, and therefore the problem is convex. To solve the problem, we will run sfista with 

and with mu=0.01.

>> Afun = @(x) A*x;
>> Atfun = @(x) A'*x;
>> f=@(x) (x-a)'*Q*(x-a);
>> grad_f = @(x) 2*Q*(x-a) ;
>> g = @(x) norm(x,2);
>> h = @(x) norm(x,1);
>> mu = 0.01 ;
>> lambda_g=0.25;
>> lambda_h=0.4;

>> [xs,fun_xs,parout] = sfista( @(x) f(x) , @(x) grad_f(x),@(x) g(x),@(x,a) prox_Euclidean_norm(x,a),@(x) h(x),@(x,a) prox_l1(x,a), @(x) A*x , @(x)...
A'*x,mu,lambda_g,lambda_h,ones(20,1));

*********************
FISTA
*********************
#iter       fun. val.     L val.
     1      270.280178  128.000000
     2      156.805102  128.000000
     3       97.019766  128.000000
     4       65.695242  128.000000
     5       47.036194  128.000000
     :               :              :
   200       11.636340  128.000000
   227       11.636340  128.000000
   228       11.636339  128.000000
   229       11.636339  128.000000
   257       11.636339  128.000000
   285       11.636339  128.000000
Stopping because the norm of the difference between consecutive iterates is too small
----------------------------------
Optimal value =       11.636339
Original function value =    11.637589  

Note that the "optimal value" is the function value corresponding to the best smoothed objective function value. "Original function value" is the corresponding function value without any smoothing (that is, when using g rather than g_mu).  The amount of iterations employed by sfista is 299:

>> parout.iterNum

ans =

   299

It is possible to use continuation, meaning that the smoothing parameter is initialized as a large value that gradually decreases to the required value. Several cycles of iterations are employed, where in each cycle, FISTA is employed on the problem with a specific smoothing parameter. 
>> clear par; par.continuation=true;
>> [xs,fun_xs,parout] = sfista( @(x) f(x) , @(x) grad_f(x),@(x) g(x),@(x,a) prox_Euclidean_norm(x,a),@(x) h(x),@(x,a) prox_l1(x,a), @(x) A*x,...
@(x) A'*x,mu,lambda_g,lambda_h,ones(20,1),par);
*********************
Starting continuation cycles
Cycle 1 (mu = 6.25) required 73 iterations
Cycle 2 (mu = 1.25) required 2 iterations
Cycle 3 (mu = 0.25) required 1 iterations
Cycle 4 (mu = 0.05) required 2 iterations
Starting final cycle with mu = 0.01
*********************
FISTA
*********************
#iter       	fun. val.     	L val.
     1 	       11.636857 	  128.000000
     2 	       11.636841 	  128.000000
     3 	       11.636821 	  128.000000
     4 	       11.636799 	  128.000000
     :             :                   :
    30 	       11.636342 	  128.000000
    31 	       11.636340 	  128.000000
    32 	       11.636340 	  128.000000
    33 	       11.636339 	  128.000000
    62 	       11.636339 	  128.000000
Stopping because the norm of the difference between consecutive iterates is too small
----------------------------------
Optimal value =       11.636339 
Original function value =       11.637589 

The number of iterations in each of the cycle can be computed using the CycleIter field of parout

>> parout.CycleIter

ans =

    73
     2
     1
     2
    74

The overall number of iterations when using continuation (152) is obviously smaller than the amount of iterations required in the version without continuation (299).