adlpmm

General Description:

Alternating direction linearized proximal method of multipliers (ADLPMM) for solving problems of the form

min { f(x) + lambda*g(A(x)) }

Assumptions:
  • all functions are convex
  • f - proper, closed and proximable
  • g - proper, closed and proximable
  • A - linear transformation
  • lambda - positive scalar
Available Oracles:
  • function value of f
  • prox of a positive constant times f
  • function value of g
  • prox of a positive constant times g
  • value of the linear transformation A
  • value of the adjoint of A
(the function values of f and g are not required in the economic setting)

Usage: 
out               = adlpmm(Ffun,Ffun_prox,Gfun,Gfun_prox,Afun,Atfun,lambda,startx,[L],[par])
[out,fmin]        = adlpmm(Ffun,Ffun_prox,Gfun,Gfun_prox,Afun,Atfun,lambda,startx,[L],[par])
[out,fmin,parout] = adlpmm(Ffun,Ffun_prox,Gfun,Gfun_prox,Afun,Atfun,lambda,startx,[L],[par])
Input: 
Ffun      - function handle for f (if f is extended real-valued, it is enough to specify the operation of f on its domain)
Ffun_prox - function handle for the proximal mapping of a positive constant times f
Gfun      - function handle for g (if g is extended real-valued, it is enough to specify the operation of g on its domain)
Gfun_prox - function handle for the proximal mapping of a positive constant times g
Afun      - function handle for the linear transformation A
Atfun     - function handle for the adjoint of the linear transformation A
lambda    - positive constant
startx    - starting vector
L         - positive constant; should be set to the squared norm of A (if the parameter is not given, then the squared norm of A is computed inefficiently)
par       - struct containing different values required for the operation of adlpmm


field of par       possible values    default   description
max_iter           positive integer   1000      maximal number of iterations
eco_flag           true/false         false     true - economic version in which function values are not computed
print_flag         true/false         true      true - internal printing is enabled
real_valued_flag   true/false         false     true - g is assumed to be real-valued (only improved function values are printed); false - otherwise
feas_check_flag    true/false         false     true - feasibility violation is printed; false - otherwise
rho                positive real      1         positive parameter used in the method
eps                positive real      1e-5      stopping criteria tolerance (the method terminates when ||x^{k+1} - x^k|| <= eps, with x^k being the kth iterate vector)
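
For instance, a call overriding some of these defaults might look as follows (a sketch only; the oracle handles and the problem data here are placeholders):

>> par.max_iter=5000;   % allow more than the default 1000 iterations
>> par.eps=1e-6;        % tighter stopping tolerance
>> [out,fmin,parout]=adlpmm(Ffun,Ffun_prox,Gfun,Gfun_prox,Afun,Atfun,lambda,startx,L,par);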

Output: 
out           - optimal solution (up to a tolerance)
fmin         - optimal value (up to a tolerance)
parout      - a struct containing additional information related to the convergence. The fields of parout are:
                   iterNum     - number of performed iterations
                   funValVec   - vector of all function values generated by the method
                   FeasValVec  - vector of all feasibility violation values (relevant only when par.feas_check_flag=true)
                   dualVec     - dual optimal vector

Method Description: 

Employs the alternating direction linearized proximal method of multipliers (ADLPMM) whose general update step is

x^{k+1} = prox_{(1/(rho*L))f}( x^k - (1/L) * A^T( A(x^k) - z^k + (1/rho) y^k ) )
z^{k+1} = prox_{(lambda/rho)g}( A(x^{k+1}) + (1/rho) y^k )
y^{k+1} = y^k + rho * ( A(x^{k+1}) - z^{k+1} )

where rho is a positive constant. The last computed vector x^{k+1} is considered as the obtained solution and the last computed y^{k+1} is the obtained dual vector.
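
In MATLAB terms, a single iteration of this scheme amounts to the following (a minimal sketch under the prox conventions of this page; x, z and y denote the current primal, auxiliary and dual iterates):

x = Ffun_prox(x-(1/L)*Atfun(Afun(x)-z+y/rho),1/(rho*L));  % linearized proximal step in x
z = Gfun_prox(Afun(x)+y/rho,lambda/rho);                  % prox step in the auxiliary variable
y = y+rho*(Afun(x)-z);                                    % multiplier (dual) update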


Example 1:

Consider the problem of minimizing the l1 norm over an affine set:

min { ||x||_1 : Ax = b }

The above model is also called basis pursuit. The problem fits the main model with f(x) = ||x||_1, g = delta_{{b}} (the indicator function of the singleton {b}), A(x) = Ax and lambda = 1.

A and b are generated by

>> randn('seed',314);
>> A=randn(20,30);
>> b=randn(20,1);

To find a solution, we can run adlpmm 

>> [out,fout,parout]=adlpmm(@(x)norm(x,1),@(x,a)prox_l1(x,a),@(x)0,@(x,a)b,@(x)A*x,@(x)A'*x,1,zeros(30,1),norm(A)^2);

*********************
adlpmm
*********************

#iter       fun. val.     feas. viol. 
     1        0.000000       3.76502
     2        2.018427       2.54728
     3        2.549455       1.58128
     4        3.242071       1.53246
     5        3.734760       1.47139
     :              :                   :
   996        4.569683   0.000517774
   997        4.569588   0.000503603
   998        4.569496   0.000493158
   999        4.569407   0.000486552
  1000        4.569322   0.000483765
----------------------------------
Optimal value =        4.569322 
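
A remark on the oracles passed in this call: Gfun_prox=@(x,a)b is the prox of the indicator function of the singleton {b} (it maps every point to b), while prox_l1 (used here as a black box) computes the prox of a times the l1 norm, that is, soft thresholding. A minimal stand-in for the latter (a sketch for illustration only) would be

>> my_prox_l1=@(x,a)sign(x).*max(abs(x)-a,0);   % prox of a*||.||_1 (soft thresholding)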

Note that we set L to be equal to the squared norm of the matrix A. If the value is not given, then adlpmm will compute the squared norm in an inefficient manner.
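
When forming norm(A) directly is too expensive, the squared norm can also be estimated cheaply by a few power iterations on A'*A using only the two oracles (a sketch; the iteration count 50 is an arbitrary choice):

>> Afun=@(x)A*x; Atfun=@(x)A'*x;
>> x=randn(30,1);
>> for k=1:50, x=Atfun(Afun(x)); x=x/norm(x); end   % power iteration on A'*A
>> L_est=norm(Afun(x))^2                            % approximates norm(A)^2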

Example 2 (low rank matrix completion)

We begin by generating a low rank 40x40 matrix 

>> randn('seed',314);
>> rand('seed',314);
>> n=40;
>> r=4;
>> X0=randn(n,r);
>> X1=randn(r,n);
>> X=X0*X1;

Obviously the matrix X has rank 4. We assume that 80% of the entries of X are known. 

>> p=0.8;
>> a=floor(n^2*p);
>> sample=randperm(n^2);
>> sample=sort(sample(1:a));
>> b=X(sample);

The vector "sample" contains the list of known indices, while "b" contains their values. To recover the "true" X given the observations vector b, we need to find a minimal rank matrix that is consistent with the observations:


where  is the linear transformation that maps a 40x40 matrix to the vector containing the values of the matrix at the components of "sample". The above optimization problem is extremely difficult since the rank function is nonconvex (and actually not even continuous). Instead, we will solve the well-known relaxation in which the rank function is replaced by the nuclear norm:


The problem also fits the main model with 


The constant L should be set to be the squared norm of the linear transformation  which is 1 in this case (since the associated matrix comprises different unit vectors). 
To solve the problem, we first construct  the following MATLAB function (written as a separate m-file) which evaluates the adjoint operator of :

function out=adjoint_A(size_matrix,sample,y)
% Adjoint of the sampling operator: returns a size_matrix(1) x size_matrix(2)
% matrix whose entries at the indices in "sample" equal y and are zero elsewhere.
out=zeros(size_matrix(1),size_matrix(2));
out(sample)=y;
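
Before invoking the solver, it is worth verifying that adjoint_A indeed implements the adjoint, i.e., that <A(X),y> = <X,adjoint_A(y)> for random test inputs (a quick hypothetical sanity check):

>> Xt=randn(n,n); yt=randn(1,a);                         % random test pair
>> lhs=Xt(sample)*yt';                                   % <A(Xt),yt>
>> rhs=Xt(:)'*reshape(adjoint_A([n,n],sample,yt),[],1);  % <Xt,adjoint_A(yt)>
>> abs(lhs-rhs)                                          % should be at machine-precision level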

We can now solve the problem using adlpmm:

>> [out,fout,parout]=adlpmm(@(X)sum(svd(X)),@(X,a)prox_nuclear(X,a),@(x)0,@(X,a)b,@(X)X(sample),@(y)adjoint_A([n,n],sample,y),1,zeros(n,n),1);

*********************
adlpmm
*********************
#iter       fun. val.     feas. viol. 
     1        0.000000       70.4765
     2      452.409114        67.089
     3      218.924288      0.754675
     4      209.889560      0.649812
     5      201.712045      0.615665
     6      194.363096      0.567603
     7      187.756982      0.538813 
     :               :             :
   158      152.281262   1.72676e-10
   159      152.281262   1.86096e-11
   160      152.281262   1.55405e-10
   161      152.281262   2.24433e-10
   162      152.281262   2.25893e-10
Stopping because the norm of the difference between consecutive iterates is too small
----------------------------------
Optimal value =      152.281262 

The obtained solution is almost a perfect reconstruction of the "true" matrix X

>> norm(X-out)

ans =

   4.4730e-10

Note that it is possible to let the solver compute an appropriate L on its own, but this is not advisable since it involves invoking a rather inefficient method for computing the norm of the transformation.

>> [out,fout,parout]=adlpmm(@(X)sum(svd(X)),@(X,a)prox_nuclear(X,a),@(x)0,@(X,a)b,@(X)X(sample),@(y)adjoint_A([n,n],sample,y),1,zeros(n,n));

*********************
adlpmm
*********************
#iter       fun. val.     feas. viol. 
     1          0.000000         70.4765 
     2        452.409114          67.089 
     3        218.924288        0.754675 
     4        209.889560        0.649812 
     5        201.712045        0.615665 
     6        194.363096        0.567603 
     7        187.756982        0.538813 
     :               :               :
   158        152.281262     1.72676e-10 
   159        152.281262     1.86096e-11 
   160        152.281262     1.55405e-10 
   161        152.281262     2.24433e-10 
   162        152.281262     2.25893e-10 
Stopping because the norm of the difference between consecutive iterates is too small
----------------------------------
Optimal value =      152.281262 

Unsurprisingly, using a larger L (e.g., 2 in the following run) might result in slower convergence.

>> [out,fout,parout]=adlpmm(@(X)sum(svd(X)),@(X,a)prox_nuclear(X,a),@(x)0,@(X,a)b,@(X)X(sample),@(y)adjoint_A([n,n],sample,y),1,zeros(n,n),2);

*********************
adlpmm
*********************
#iter       fun. val.     feas. viol. 
     1        0.000000       70.4765
     2      226.204557       2.40463
     3      341.630479       33.5476
     4      341.031920        34.386
     5      280.119641       17.6311
     6      217.335047      0.784551
     :               :             :
   305      152.281262    1.9659e-10
   306      152.281262    4.5285e-10
   307      152.281262   9.60166e-10
   308      152.281262   1.29546e-09
   309      152.281262   1.44147e-09
   310      152.281262   1.40528e-09
Stopping because the norm of the difference between consecutive iterates is too small
----------------------------------
Optimal value =      152.281262 

Example 3

Consider the problem

min { 2||x||_inf + 5||Ax + b||_1 }

where A and b are given by

>> randn('seed',766);
>> n=20;
>> A = randn(n,n) ;
>> b = randn(n,1) ;

The problem fits the main model with f(x) = 2||x||_inf, g(y) = ||y + b||_1, A(x) = Ax and lambda = 5.

We can solve the problem using adlpmm. In the following run we set a limit of 10000 iterations. Since both f and g are real-valued, it is better to call the solver with par.real_valued_flag=true: in this setting the solver outputs the vector with the smallest function value, and only iterations in which the function value actually improved are printed.

>> lambda = 5 ;
>> clear par
>> par.max_iter=10000;
>> par.real_valued_flag=true;
>> [out,fout,parout] = adlpmm(@(x) 2*norm(x,inf),@(x,a) prox_linf(x,2*a),@(x) norm(x+b,1), @(x,a) prox_l1(x+b,a)-b,...
@(x) A*x, @(x) A'*x,lambda,ones(n,1),norm(A)^2,par);

*********************
adlpmm
*********************
#iter            fun. val.     
     1      379.922993
     2      155.862412
     3      128.413247
     4       82.971548
     5       77.140573
     7       74.741988
     :               :
  3265        8.169385
  3551        8.166405
  3836        8.165786
  4692        8.165380
  8971        8.165284
----------------------------------
Optimal value =        8.165284
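
The convergence history returned in parout can be examined directly after any of these runs; for example (a usage sketch):

>> plot(parout.funValVec);
>> xlabel('iteration'); ylabel('function value');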