General Description:
Alternating direction linearized proximal method of multipliers (ADLPMM) for solving problems of the form

min_x f(x) + lambda*g(A(x))
Assumptions:
f and g are proper closed convex functions, A is a linear transformation and lambda is a positive scalar.
Available Oracles:
function values of f and g, proximal mappings of positive multiples of f and g, and evaluations of A and its adjoint
(the function values of f,g are not required in the economical setting)
Usage:
out = adlpmm(Ffun,Ffun_prox,Gfun,Gfun_prox,Afun,Atfun,lambda,startx,[L],[par])
[out,fmin] = adlpmm(Ffun,Ffun_prox,Gfun,Gfun_prox,Afun,Atfun,lambda,startx,[L],[par])
[out,fmin,parout] = adlpmm(Ffun,Ffun_prox,Gfun,Gfun_prox,Afun,Atfun,lambda,startx,[L],[par])
Input:
Ffun - function handle for f (if f is extended real-valued, it is enough to specify the operation of f on its domain)
Ffun_prox - function handle for the proximal mapping of f times a positive constant
Gfun - function handle for g (if g is extended real-valued, it is enough to specify the operation of g on its domain)
Gfun_prox - function handle for the proximal mapping of g times a positive constant
Afun - function handle for the linear transformation A
Atfun - function handle for the adjoint of the linear transformation A
lambda - positive constant
startx - starting vector
L - positive constant; should be set to the squared norm of A (if the parameter is not given, then the squared norm of A is computed inefficiently)
par - struct containing different values required for the operation of adlpmm
Output:
out - optimal solution (up to a tolerance)
fmin - optimal value (up to a tolerance)
parout - a struct containing additional information related to the convergence. The fields of parout are:
iterNum - number of performed iterations
funValVec - vector of all function values generated by the method
Method Description:
Employs the alternating direction linearized proximal method of multipliers (ADLPMM) whose general update step is

x^{k+1} = prox_{(1/(rho*L))f} ( x^k - (1/L)*A^T( A(x^k) - z^k + (1/rho)*y^k ) )
z^{k+1} = prox_{(lambda/rho)g} ( A(x^{k+1}) + (1/rho)*y^k )
y^{k+1} = y^k + rho*( A(x^{k+1}) - z^{k+1} ),

where rho is a positive constant. The last computed vector x^{k+1} is taken as the obtained solution, and the last computed y^{k+1} is the obtained dual vector.
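As an illustration only, the update step above can be translated into a short NumPy loop. The names `adlpmm_sketch`, `prox_f` and `prox_g` are hypothetical (they are not toolbox functions), and the tiny basis-pursuit instance exists only to exercise the loop:

```python
import numpy as np

def adlpmm_sketch(prox_f, prox_g, A, x0, lam, L, rho=1.0, iters=2000):
    """Illustrative linearized ADMM loop (a sketch, not the toolbox code).
    prox_f(v, c) computes argmin_u f(u) + (1/(2c))*||u - v||^2; same for prox_g."""
    x = x0.copy()
    z = A @ x
    y = np.zeros_like(z)
    for _ in range(iters):
        # linearized x-update: gradient step on the augmented term, then prox of f
        v = x - (1.0 / L) * (A.T @ (A @ x - z + y / rho))
        x = prox_f(v, 1.0 / (rho * L))
        # z-update: prox of (lam/rho)*g
        z = prox_g(A @ x + y / rho, lam / rho)
        # dual update
        y = y + rho * (A @ x - z)
    return x

# tiny basis-pursuit instance: min ||x||_1  s.t.  A x = b
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 8))
b = A @ np.array([1.0, 0, 0, -2.0, 0, 0, 0, 0])
soft = lambda v, c: np.sign(v) * np.maximum(np.abs(v) - c, 0.0)  # prox of c*||.||_1
proj_b = lambda v, c: b                                          # prox of indicator of {b}
x = adlpmm_sketch(soft, proj_b, A, np.zeros(8), lam=1.0,
                  L=np.linalg.norm(A, 2) ** 2)
print(np.linalg.norm(A @ x - b))  # feasibility violation shrinks toward 0
```

The prox constants 1/(rho*L) and lambda/rho mirror the update formulas; L should be at least the squared spectral norm of A.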
Example 1 (basis pursuit)
Consider the problem of minimizing the l1 norm over an affine space:

min ||x||_1 s.t. Ax = b.

The above model is also called basis pursuit. The problem fits the main model with f(x) = ||x||_1, g the indicator function of the set {b} (whose prox is the constant mapping to b), A(x) = Ax and lambda = 1.
A and b are generated by
>> randn('seed',314);
>> A=randn(20,30);
>> b=randn(20,1);
To find a solution, we can run adlpmm
>> [out,fout,parout]=adlpmm(@(x)norm(x,1),@(x,a)prox_l1(x,a),@(x)0,@(x,a)b,@(x)A*x,@(x)A'*x,1,zeros(30,1),norm(A)^2);
*********************
adlpmm
*********************
#iter fun. val. feas. viol.
1 0.000000 3.76502
2 2.018427 2.54728
3 2.549455 1.58128
4 3.242071 1.53246
5 3.734760 1.47139
: : :
996 4.569683 0.000517774
997 4.569588 0.000503603
998 4.569496 0.000493158
999 4.569407 0.000486552
1000 4.569322 0.000483765
----------------------------------
Optimal value = 4.569322
Note that we set L to be equal to the squared norm of the matrix A. If the value is not given, then adlpmm will compute the squared norm in an inefficient manner.
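The call above passes the toolbox's prox_l1 for the proximal mapping of the l1 norm. For reference, this prox is componentwise soft thresholding; the following self-contained NumPy sketch (`prox_l1_sketch` is an illustrative name, not the toolbox function) shows the operation:

```python
import numpy as np

def prox_l1_sketch(x, a):
    """Componentwise soft thresholding: the prox of a*||.||_1.
    An illustrative stand-in for the toolbox's prox_l1."""
    return np.sign(x) * np.maximum(np.abs(x) - a, 0.0)

p = prox_l1_sketch(np.array([3.0, -0.5, 1.2]), 1.0)
print(p)  # components 2, 0 and 0.2: entries smaller than a are thresholded to zero
```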
Example 2 (low rank matrix completion)
We begin by generating a low rank 40x40 matrix
>> randn('seed',314);
>> rand('seed',314);
>> n=40;
>> r=4;
>> X0=randn(n,r);
>> X1=randn(r,n);
>> X=X0*X1;
Obviously the matrix X has rank 4. We assume that 80% of the entries of X are known.
>> p=0.8;
>> a=floor(n^2*p);
>> sample=randperm(n^2);
>> sample=sort(sample(1:a));
>> b=X(sample);
The vector "sample" contains the list of known indices, while "b" contains their values. To recover the "true" X given the observations vector b, we need to find a minimal rank matrix that is consistent with the observations; the standard convex relaxation minimizes the nuclear norm:

min ||X||_* s.t. A(X) = b,

where ||X||_* is the nuclear norm of X (the sum of its singular values) and A is the linear transformation mapping X to the vector of its entries at the known indices, A(X) = X(sample). The problem fits the main model with f(X) = ||X||_* (computed as sum(svd(X))), g the indicator function of the set {b}, and lambda = 1.
The constant L should be set to the squared norm of the linear transformation A; since A merely selects entries, its norm equals 1, and L=1 is passed in the call below.
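A quick way to see why L=1 is correct here: the sampling operator satisfies A(A^T(y)) = y (scatter the values, then read them back), so ||A|| = 1. The following NumPy sketch (`A_op`/`A_adj` are illustrative names) checks the adjoint identity <A(X),y> = <X,A^T(y)> numerically:

```python
import numpy as np

rng = np.random.default_rng(314)
n = 6
sample = np.sort(rng.choice(n * n, size=20, replace=False))  # known linear indices

def A_op(X):
    """Forward operator: read the sampled entries."""
    return X.ravel()[sample]

def A_adj(y):
    """Adjoint: scatter y into a zero matrix at the sampled positions."""
    out = np.zeros(n * n)
    out[sample] = y
    return out.reshape(n, n)

X = rng.standard_normal((n, n))
y = rng.standard_normal(sample.size)
lhs = A_op(X) @ y            # <A(X), y>
rhs = np.sum(X * A_adj(y))   # <X, A^T(y)>
print(abs(lhs - rhs))        # ~0: A_adj is indeed the adjoint
# A_op(A_adj(y)) == y, so A*A^T is the identity and ||A|| = 1, justifying L=1.
```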
To solve the problem, we first construct the following MATLAB function (written as a separate m-file) which evaluates the adjoint operator of A:
function out=adjoint_A(size_matrix,sample,y)
out=zeros(size_matrix(1),size_matrix(2));
out(sample)=y;
We can now solve the problem using adlpmm
>> [out,fout,parout]=adlpmm(@(X)sum(svd(X)),@(X,a)prox_nuclear(X,a),@(x)0,@(X,a)b,@(X)X(sample),@(y)adjoint_A([n,n],sample,y),1,zeros(n,n),1);
*********************
adlpmm
*********************
#iter fun. val. feas. viol.
1 0.000000 70.4765
2 452.409114 67.089
3 218.924288 0.754675
4 209.889560 0.649812
5 201.712045 0.615665
6 194.363096 0.567603
7 187.756982 0.538813
: : :
158 152.281262 1.72676e-10
159 152.281262 1.86096e-11
160 152.281262 1.55405e-10
161 152.281262 2.24433e-10
162 152.281262 2.25893e-10
Stopping because the norm of the difference between consecutive iterates is too small
----------------------------------
Optimal value = 152.281262
The obtained solution is almost a perfect reconstruction of the "true" matrix X
>> norm(X-out)
ans =
   4.4730e-10
Note that it is possible to allow the solver to compute an appropriate L on its own, but this is not advisable since it involves invoking a rather inefficient method for computing the norm of the transformation.
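When L is omitted, the solver has to estimate the squared norm of the transformation from the Afun/Atfun oracles alone. A standard way to do this is power iteration on A^T A; the sketch below shows what such an estimation might look like (it is not the toolbox's actual code):

```python
import numpy as np

def sq_norm_estimate(Afun, Atfun, x0, iters=100):
    """Power iteration on A^T A: estimate ||A||^2 using only the forward and
    adjoint oracles (a sketch of what a solver could do when L is not given)."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        x = Atfun(Afun(x))       # apply A^T A
        x /= np.linalg.norm(x)   # renormalize
    return np.linalg.norm(Afun(x)) ** 2

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 30))
est = sq_norm_estimate(lambda v: A @ v, lambda v: A.T @ v,
                       rng.standard_normal(30))
print(est, np.linalg.norm(A, 2) ** 2)  # the two values agree closely
```

Each iteration costs one forward and one adjoint evaluation, which is why an omitted L is wasteful when the norm is known in closed form (as it is for the sampling operator above).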
>> [out,fout,parout]=adlpmm(@(X)sum(svd(X)),@(X,a)prox_nuclear(X,a),@(x)0,@(X,a)b,@(X)X(sample),@(y)adjoint_A([n,n],sample,y),1,zeros(n,n));
*********************
adlpmm
*********************
#iter fun. val. feas. viol.
1 0.000000 70.4765
2 452.409114 67.089
3 218.924288 0.754675
4 209.889560 0.649812
5 201.712045 0.615665
6 194.363096 0.567603
7 187.756982 0.538813
: : :
158 152.281262 1.72676e-10
159 152.281262 1.86096e-11
160 152.281262 1.55405e-10
161 152.281262 2.24433e-10
162 152.281262 2.25893e-10
Stopping because the norm of the difference between consecutive iterates is too small
----------------------------------
Optimal value = 152.281262
Unsurprisingly, using a larger L (e.g., 2 in the following run) may result in slower convergence.
>> [out,fout,parout]=adlpmm(@(X)sum(svd(X)),@(X,a)prox_nuclear(X,a),@(x)0,@(X,a)b,@(X)X(sample),@(y)adjoint_A([n,n],sample,y),1,zeros(n,n),2);
*********************
adlpmm
*********************
#iter fun. val. feas. viol.
1 0.000000 70.4765
2 226.204557 2.40463
3 341.630479 33.5476
4 341.031920 34.386
5 280.119641 17.6311
6 217.335047 0.784551
: : :
305 152.281262 1.9659e-10
306 152.281262 4.5285e-10
307 152.281262 9.60166e-10
308 152.281262 1.29546e-09
309 152.281262 1.44147e-09
310 152.281262 1.40528e-09
Stopping because the norm of the difference between consecutive iterates is too small
----------------------------------
Optimal value = 152.281262
Example 3
Consider the problem

min_x 2*||x||_inf + 5*||Ax+b||_1,

where A and b are given by
>> randn('seed',766);
>> n=20;
>> A = randn(n,n);
>> b = randn(n,1);
The problem fits the main model with f(x) = 2*||x||_inf, g(y) = ||y+b||_1, A(x) = Ax and lambda = 5.
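The solver call below obtains the prox of g(y) = ||y+b||_1 from prox_l1 through the translation rule prox_{a*g}(x) = prox_{a*||.||_1}(x+b) - b, which explains the handle @(x,a) prox_l1(x+b,a)-b. A NumPy check of this rule (`soft` is an illustrative stand-in for prox_l1; the brute-force grid search only verifies the prox definition coordinate by coordinate):

```python
import numpy as np

rng = np.random.default_rng(766)
b = rng.standard_normal(20)
x = rng.standard_normal(20)
a = 0.7  # an arbitrary positive prox constant

soft = lambda v, c: np.sign(v) * np.maximum(np.abs(v) - c, 0.0)  # prox of c*||.||_1

# translation rule: for g(y) = ||y + b||_1, prox_{a*g}(x) = prox_{a*||.||_1}(x + b) - b
p = soft(x + b, a) - b

# brute-force check of the prox definition argmin_u a*|u+b_i| + 0.5*(u-x_i)^2
grid = np.linspace(-8.0, 8.0, 160001)
brute = np.array([grid[np.argmin(a * np.abs(grid + bi) + 0.5 * (grid - xi) ** 2)]
                  for xi, bi in zip(x, b)])
print(np.max(np.abs(p - brute)))  # ~0, up to the grid resolution
```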
We can solve the problem using adlpmm. In the following run we set a limit of 10000 iterations. Since both f and g are real-valued, it is better to call the solver with par.real_valued_flag=true, since in this setting it outputs the vector with the smallest function value, and only iterations in which an actual improvement of the function value was obtained are printed.
>> lambda = 5;
>> clear par
>> par.max_iter=10000;
>> par.real_valued_flag=true;
>> [out,fout,parout] = adlpmm(@(x) 2*norm(x,inf),@(x,a) prox_linf(x,2*a),@(x) norm(x+b,1), @(x,a) prox_l1(x+b,a)-b,...
@(x) A*x, @(x) A'*x,lambda,ones(n,1),norm(A)^2,par);
*********************
adlpmm
*********************
#iter fun. val.
1 379.922993
2 155.862412
3 128.413247
4 82.971548
5 77.140573
7 74.741988
: :
3265 8.169385
3551 8.166405
3836 8.165786
4692 8.165380
8971 8.165284
----------------------------------
Optimal value = 8.165284
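With par.real_valued_flag=true the solver reports only improving iterations and returns the best point seen so far. A minimal sketch of that bookkeeping policy (`minimize_with_best_tracking` and the noisy toy step are illustrative, not the toolbox's code):

```python
import numpy as np

def minimize_with_best_tracking(step, fun, x0, iters=200):
    """Return the best point seen so far, mimicking a real_valued_flag-style
    policy: record an iteration only when it improves the function value."""
    x, best_x, best_f = x0, x0, fun(x0)
    improvements = []
    for k in range(1, iters + 1):
        x = step(x)
        f = fun(x)
        if f < best_f:                  # record only actual improvements
            best_x, best_f = x, f
            improvements.append((k, f))
    return best_x, best_f, improvements

# toy demo: a contraction with injected noise, so some iterations do not improve
rng = np.random.default_rng(1)
f = lambda x: float(x @ x)
step = lambda x: 0.9 * x + 0.05 * rng.standard_normal(x.shape)
best_x, best_f, hist = minimize_with_best_tracking(step, f, np.ones(5))
print(best_f)  # never worse than f(x0) = 5.0
```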