adlpmm

General Description:

Alternating direction linearized proximal method of multipliers (ADLPMM) for solving problems of the form

min { f(x) + lambda*g(A(x)) }

Assumptions:
  • all functions are convex
  • f - proper, closed and proximable
  • g - proper, closed and proximable
  • A - linear transformation
  • lambda - positive scalar
Available Oracles:
  • function value of f
  • prox of a positive constant times f
  • function value of g
  • prox of a positive constant times g
  • value of the linear transformation A
  • value of the adjoint of A
(the function values of f and g are not required in the economic setting)

Usage: 
out               = adlpmm(Ffun,Ffun_prox,Gfun,Gfun_prox,Afun,Atfun,lambda,startx,[L],[par])
[out,fmin]        = adlpmm(Ffun,Ffun_prox,Gfun,Gfun_prox,Afun,Atfun,lambda,startx,[L],[par])
[out,fmin,parout] = adlpmm(Ffun,Ffun_prox,Gfun,Gfun_prox,Afun,Atfun,lambda,startx,[L],[par])
Input: 
Ffun      - function handle for f (if f is extended real-valued, it is enough to specify the operation of f on its domain)
Ffun_prox - function handle for the proximal mapping of a positive constant times f
Gfun      - function handle for g (if g is extended real-valued, it is enough to specify the operation of g on its domain)
Gfun_prox - function handle for the proximal mapping of a positive constant times g
Afun      - function handle for the linear transformation A
Atfun     - function handle for the adjoint of the linear transformation A
lambda    - positive constant
startx    - starting vector
L         - positive constant; should be set to the squared norm of A (if the parameter is not given, then the squared norm of A is computed inefficiently)
par       - struct containing different values required for the operation of adlpmm


field of par       possible values    default   description
max_iter           positive integer   1000      maximal number of iterations
eco_flag           true/false         false     true - economic version in which function values are not computed
print_flag         true/false         true      true - internal printing is enabled
real_valued_flag   true/false         false     true - g is assumed to be real-valued (only improved function values are printed); false - otherwise
feas_check_flag    true/false         false     true - feasibility violation is printed; false - otherwise
rho                positive real      1         positive parameter used in the method
eps                positive real      1e-5      stopping criteria tolerance (the method terminates when ||x^{k+1} - x^k|| <= eps, with x^k being the kth iterate vector)
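
For instance, a call overriding some of these defaults might look as follows (a sketch only; the oracle handles and the problem data here are placeholders):

>> par.max_iter=5000;   % allow more than the default 1000 iterations
>> par.eps=1e-6;        % tighter stopping tolerance
>> [out,fmin,parout]=adlpmm(Ffun,Ffun_prox,Gfun,Gfun_prox,Afun,Atfun,lambda,startx,L,par);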

Output: 
out           - optimal solution (up to a tolerance)
fmin         - optimal value (up to a tolerance)
parout      - a struct containing additional information related to the convergence. The fields of parout are:
                   iterNum     - number of performed iterations
                   funValVec   - vector of all function values generated by the method
                   FeasValVec  - vector of all feasibility violation values (relevant only when par.feas_check_flag=true)
                   dualVec     - dual optimal vector

Method Description: 

Employs the alternating direction linearized proximal method of multipliers (ADLPMM) whose general update step is

x^{k+1} = prox_{(1/(rho*L))f}( x^k - (1/L) * A^T( A(x^k) - z^k + (1/rho) y^k ) )
z^{k+1} = prox_{(lambda/rho)g}( A(x^{k+1}) + (1/rho) y^k )
y^{k+1} = y^k + rho * ( A(x^{k+1}) - z^{k+1} )

where rho is a positive constant. The last computed vector x^{k+1} is considered as the obtained solution and the last computed y^{k+1} is the obtained dual vector.
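
In MATLAB terms, a single iteration of this scheme amounts to the following (a minimal sketch under the prox conventions of this page; x, z and y denote the current primal, auxiliary and dual iterates):

x = Ffun_prox(x-(1/L)*Atfun(Afun(x)-z+y/rho),1/(rho*L));  % linearized proximal step in x
z = Gfun_prox(Afun(x)+y/rho,lambda/rho);                  % prox step in the auxiliary variable
y = y+rho*(Afun(x)-z);                                    % multiplier (dual) update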


Example 1:

Consider the problem of minimizing the l1 norm over an affine set:

min { ||x||_1 : Ax = b }

The above model is also called basis pursuit. The problem fits the main model with f(x) = ||x||_1, g = delta_{{b}} (the indicator function of the singleton {b}), A(x) = Ax and lambda = 1.

A and b are generated by

>> randn('seed',314);
>> A=randn(20,30);
>> b=randn(20,1);

To find a solution, we can run adlpmm 

>> [out,fout,parout]=adlpmm(@(x)norm(x,1),@(x,a)prox_l1(x,a),@(x)0,@(x,a)b,@(x)A*x,@(x)A'*x,1,zeros(30,1),norm(A)^2);

*********************
adlpmm
*********************

#iter       fun. val.     feas. viol. 
     1        0.000000       3.76502
     2        2.018427       2.54728
     3        2.549455       1.58128
     4        3.242071       1.53246
     5        3.734760       1.47139
     :              :                   :
   996        4.569683   0.000517774
   997        4.569588   0.000503603
   998        4.569496   0.000493158
   999        4.569407   0.000486552
  1000        4.569322   0.000483765
----------------------------------
Optimal value =        4.569322 
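
A remark on the oracles passed in this call: Gfun_prox=@(x,a)b is the prox of the indicator function of the singleton {b} (it maps every point to b), while prox_l1 (used here as a black box) computes the prox of a times the l1 norm, that is, soft thresholding. A minimal stand-in for the latter (a sketch for illustration only) would be

>> my_prox_l1=@(x,a)sign(x).*max(abs(x)-a,0);   % prox of a*||.||_1 (soft thresholding)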

Note that we set L to be equal to the squared norm of the matrix A. If the value is not given, then adlpmm will compute the squared norm in an inefficient manner.
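
When forming norm(A) directly is too expensive, the squared norm can also be estimated cheaply by a few power iterations on A'*A using only the two oracles (a sketch; the iteration count 50 is an arbitrary choice):

>> Afun=@(x)A*x; Atfun=@(x)A'*x;
>> x=randn(30,1);
>> for k=1:50, x=Atfun(Afun(x)); x=x/norm(x); end   % power iteration on A'*A
>> L_est=norm(Afun(x))^2                            % approximates norm(A)^2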

Example 2 (low rank matrix completion)

We begin by generating a low rank 40x40 matrix 

>> randn('seed',314);
>> rand('seed',314);
>> n=40;
>> r=4;
>> X0=randn(n,r);
>> X1=randn(r,n);
>> X=X0*X1;

Obviously the matrix X has rank 4. We assume that 80% of the entries of X are known. 

>> p=0.8;
>> a=floor(n^2*p);
>> sample=randperm(n^2);
>> sample=sort(sample(1:a));
>> b=X(sample);

The vector "sample" contains the list of known indices, while "b" contains their values. To recover the "true" X given the observations vector b, we need to find a minimal rank matrix that is consistent with the observations:


where  is the linear transformation that maps a 40x40 matrix to the vector containing the values of the matrix at the components of "sample". The above optimization problem is extremely difficult since the rank function is nonconvex (and actually not even continuous). Instead, we will solve the well-known relaxation in which the rank function is replaced by the nuclear norm:


The problem also fits the main model with 


The constant L should be set to be the squared norm of the linear transformation  which is 1 in this case (since the associated matrix comprises different unit vectors). 
To solve the problem, we first construct  the following MATLAB function (written as a separate m-file) which evaluates the adjoint operator of :

function out=adjoint_A(size_matrix,sample,y)
% Adjoint of the sampling operator: returns a size_matrix(1) x size_matrix(2)
% matrix whose entries at the indices in "sample" equal y and are zero elsewhere.
out=zeros(size_matrix(1),size_matrix(2));
out(sample)=y;
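
Before invoking the solver, it is worth verifying that adjoint_A indeed implements the adjoint, i.e., that <A(X),y> = <X,adjoint_A(y)> for random test inputs (a quick hypothetical sanity check):

>> Xt=randn(n,n); yt=randn(1,a);                         % random test pair
>> lhs=Xt(sample)*yt';                                   % <A(Xt),yt>
>> rhs=Xt(:)'*reshape(adjoint_A([n,n],sample,yt),[],1);  % <Xt,adjoint_A(yt)>
>> abs(lhs-rhs)                                          % should be at machine-precision level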

We can now solve the problem using adlpmm:

>> [out,fout,parout]=adlpmm(@(X)sum(svd(X)),@(X,a)prox_nuclear(X,a),@(x)0,@(X,a)b,@(X)X(sample),@(y)adjoint_A([n,n],sample,y),1,zeros(n,n),1);

*********************
adlpmm
*********************
#iter       fun. val.     feas. viol. 
     1        0.000000       70.4765
     2      452.409114        67.089
     3      218.924288      0.754675
     4      209.889560      0.649812
     5      201.712045      0.615665
     6      194.363096      0.567603
     7      187.756982      0.538813 
     :               :             :
   158      152.281262   1.72676e-10
   159      152.281262   1.86096e-11
   160      152.281262   1.55405e-10
   161      152.281262   2.24433e-10
   162      152.281262   2.25893e-10
Stopping because the norm of the difference between consecutive iterates is too small
----------------------------------
Optimal value =      152.281262 

The obtained solution is almost a perfect reconstruction of the "true" matrix X

>> norm(X-out)

ans =

   4.4730e-10

Note that it is possible to let the solver compute an appropriate L on its own, but this is not advisable since it involves invoking a rather inefficient method for computing the norm of the transformation.

>> [out,fout,parout]=adlpmm(@(X)sum(svd(X)),@(X,a)prox_nuclear(X,a),@(x)0,@(X,a)b,@(X)X(sample),@(y)adjoint_A([n,n],sample,y),1,zeros(n,n));

*********************
adlpmm
*********************
#iter       fun. val.     feas. viol. 
     1          0.000000         70.4765 
     2        452.409114          67.089 
     3        218.924288        0.754675 
     4        209.889560        0.649812 
     5        201.712045        0.615665 
     6        194.363096        0.567603 
     7        187.756982        0.538813 
     :               :               :
   158        152.281262     1.72676e-10 
   159        152.281262     1.86096e-11 
   160        152.281262     1.55405e-10 
   161        152.281262     2.24433e-10 
   162        152.281262     2.25893e-10 
Stopping because the norm of the difference between consecutive iterates is too small
----------------------------------
Optimal value =      152.281262 

Unsurprisingly, using a larger L (e.g., 2 in the following run) might result in slower convergence.

>> [out,fout,parout]=adlpmm(@(X)sum(svd(X)),@(X,a)prox_nuclear(X,a),@(x)0,@(X,a)b,@(X)X(sample),@(y)adjoint_A([n,n],sample,y),1,zeros(n,n),2);

*********************
adlpmm
*********************
#iter       fun. val.     feas. viol. 
     1        0.000000       70.4765
     2      226.204557       2.40463
     3      341.630479       33.5476
     4      341.031920        34.386
     5      280.119641       17.6311
     6      217.335047      0.784551
     :               :             :
   305      152.281262    1.9659e-10
   306      152.281262    4.5285e-10
   307      152.281262   9.60166e-10
   308      152.281262   1.29546e-09
   309      152.281262   1.44147e-09
   310      152.281262   1.40528e-09
Stopping because the norm of the difference between consecutive iterates is too small
----------------------------------
Optimal value =      152.281262 

Example 3

Consider the problem

min { 2||x||_inf + 5||Ax + b||_1 }

where A and b are given by

>> randn('seed',766);
>> n=20;
>> A = randn(n,n) ;
>> b = randn(n,1) ;

The problem fits the main model with f(x) = 2||x||_inf, g(y) = ||y + b||_1, A(x) = Ax and lambda = 5.

We can solve the problem using adlpmm. In the following run we set a limit of 10000 iterations. Since both f and g are real-valued, it is better to call the solver with par.real_valued_flag=true: in this setting the solver outputs the vector with the smallest function value, and only iterations in which the function value actually improved are printed.

>> lambda = 5 ;
>> clear par
>> par.max_iter=10000;
>> par.real_valued_flag=true;
>> [out,fout,parout] = adlpmm(@(x) 2*norm(x,inf),@(x,a) prox_linf(x,2*a),@(x) norm(x+b,1), @(x,a) prox_l1(x+b,a)-b,...
@(x) A*x, @(x) A'*x,lambda,ones(n,1),norm(A)^2,par);

*********************
adlpmm
*********************
#iter            fun. val.     
     1      379.922993
     2      155.862412
     3      128.413247
     4       82.971548
     5       77.140573
     7       74.741988
     :               :
  3265        8.169385
  3551        8.166405
  3836        8.165786
  4692        8.165380
  8971        8.165284
----------------------------------
Optimal value =        8.165284
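
The convergence history returned in parout can be examined directly after any of these runs; for example (a usage sketch):

>> plot(parout.funValVec);
>> xlabel('iteration'); ylabel('function value');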