Duality Theorem for Semidefinite Programming

August 9th, 2009

I’ve been reading through Luca Trevisan’s blog posts about relaxations of the Sparsest Cut problem. He discusses three relaxations: (i) the spectral relaxation; (ii) Leighton-Rao; and (iii) Arora-Rao-Vazirani. Relaxations (i) and (iii) are semi-definite programs (SDPs); semi-definite programming is an incredibly beautiful and useful generalization of Linear Programming. This technique is at the core of the famous MAX-CUT algorithm of Goemans and Williamson and has also been used to design approximation algorithms for a large variety of combinatorial problems. As in Linear Programming, there is a Duality Theorem that allows us to give certificates for lower and upper bounds. I was curious to learn more about it, and my friend Ashwin suggested these great lecture notes by László Lovász on semi-definite programming. I was surprised that the proof of the duality theorem for SDP is almost the same as for LP, with just a few steps changed: we begin by using the Convex Separation Theorem to derive a semi-definite version of Farkas’ Lemma, and then use that version to prove the Duality Theorem in much the same way as is normally done for LP. Let’s go through the proof, but first, let’s define positive semidefinite matrices:

Definition 1 A symmetric matrix {A} is said to be positive semidefinite (denoted by {A \succeq 0}) if for all {x \in {\mathbb R}^n} we have {x^t A x \geq 0}.

It is the same as saying that all eigenvalues of {A} are non-negative, since the smallest eigenvalue of {A} is given by {\min_{\Vert x \Vert = 1} x^t A x}. We write {A \succ 0} when all eigenvalues of {A} are strictly positive, which is the same as {x^t A x > 0} for all {x \neq 0}. Also, given two matrices {A} and {B} we use the notation {A \bullet B = \sum_{ij} A_{ij} B_{ij}}, which is nothing more than their dot-product when we view the matrices as vectors in {{\mathbb R}^{n^2}}.
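As a quick aside, here is a minimal numerical sketch of these two definitions. This is just an illustration under my own assumptions (Python with numpy; the matrices are arbitrary examples, not taken from the text):

    import numpy as np

    def is_psd(A, tol=1e-9):
        # A is assumed symmetric; it is PSD iff its smallest eigenvalue is non-negative
        return np.linalg.eigvalsh(A).min() >= -tol

    def bullet(A, B):
        # A . B = sum_ij A_ij B_ij, i.e. the dot product of the matrices seen as vectors
        return float(np.sum(A * B))

    A = np.array([[2.0, -1.0], [-1.0, 2.0]])   # eigenvalues 1 and 3, so A is PSD
    B = np.array([[1.0, 0.0], [0.0, 0.0]])
    print(is_psd(A), bullet(A, B))             # True 2.0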

Definition 2 A semi-definite program (SDP) is an optimization problem in which the variable is a matrix {X}; it has the form:


\displaystyle  \begin{aligned} \min \ & C \bullet X \text{ s.t. } \\ & D_i \bullet X = d_i \quad \forall i \\ & X \succeq 0 \end{aligned} \ \ \ \ \ (1)

that is, it is a linear program with the additional restriction that the variable, viewed as a matrix, must be positive semi-definite.
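To make the definition concrete, here is how a program of the form 1 can be written down and solved numerically. This is only a sketch under my own assumptions: it uses the cvxpy package (not mentioned in the text), a single made-up constraint {D_1 = I}, {d_1 = 1}, and a made-up cost matrix {C}:

    import numpy as np
    import cvxpy as cp

    n = 2
    C = np.array([[2.0, 0.0], [0.0, 1.0]])   # made-up cost matrix
    D1, d1 = np.eye(n), 1.0                  # single constraint D_1 . X = d_1, i.e. tr(X) = 1

    X = cp.Variable((n, n), PSD=True)        # the matrix variable, constrained to X >= 0
    prob = cp.Problem(cp.Minimize(cp.trace(C @ X)),
                      [cp.trace(D1 @ X) == d1])
    prob.solve()
    print(prob.value)                        # ~1.0

For this particular data the program computes the smallest eigenvalue of {C} (compare the characterization of eigenvalues given above).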

The key fact we will use is that the set {\mathcal{P} \subseteq {\mathbb R}^{n^2}} of positive semi-definite matrices is a convex cone, i.e., if {A, B \in \mathcal{P}} then {c_1 A + c_2 B \in \mathcal{P}} for any {c_1, c_2 \geq 0}; in particular, it is a convex set. Now we use the following theorem to prove Farkas’ Lemma:

Theorem 3 (convex separation) Given two convex sets {C_1, C_2 \subseteq {\mathbb R}^n} with {C_1 \cap C_2 = \emptyset}, there are {p \in {\mathbb R}^n} and a nonzero {v \in {\mathbb R}^n} such that {v^t (x - p) \leq 0} for all {x \in C_1} and {v^t (x - p) \geq 0} for all {x \in C_2}. Moreover, if one of them is a cone, this holds with {p = 0}.

Theorem 4 (Semi-definite Farkas’ Lemma) Exactly one of the following problems has a solution:

  1. {x_1 A_1 + \hdots + x_k A_k \succ 0}, {x_i \in {\mathbb R}}
  2. {A_1 \bullet Y = 0}, …, {A_k \bullet Y = 0}, {Y \succeq 0}, {Y \neq 0}

Proof: If problem {1} doesn’t have a solution, then the convex cone {\{ \sum_i x_i A_i ; x_i \in {\mathbb R} \}} is disjoint from the (open) cone of positive definite matrices, and therefore they can be separated. This means that there is a nonzero {Y} ({Y} plays the role of {v} in the convex separation theorem) such that {\left( \sum_i x_i A_i \right) \bullet Y \leq 0} for all {x_1, \hdots, x_k \in {\mathbb R}} and {Y \bullet M \geq 0} for all positive definite {M}, hence (by continuity) for all {M \in \mathcal{P}}. Now, taking {x_i = 1} with all other coordinates zero, and then {x_i = -1} with all other coordinates zero, we easily get {A_i \bullet Y = 0}.

It remains to prove that {Y \succeq 0}. Just take {M = x x^t} for {x \in {\mathbb R}^n}: then {Y \bullet M = x^t Y x \geq 0} for all {x \in {\mathbb R}^n}, and therefore {Y \succeq 0}. \Box
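A tiny numerical illustration of the alternative in Theorem 4, with a single made-up matrix {A_1} (numpy assumed):

    import numpy as np

    A1 = np.array([[1.0, 0.0], [0.0, -1.0]])
    # No combination x1 * A1 is strictly positive definite (its eigenvalues are x1 and -x1),
    # so the lemma promises some nonzero Y >= 0 with A1 . Y = 0; the identity matrix works:
    Y = np.eye(2)
    print(np.sum(A1 * Y))                     # A1 . Y = 0.0
    print(np.linalg.eigvalsh(Y).min() >= 0)   # True: Y is positive semi-definite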

Now, we can use this to prove the duality theorem. We will use a slightly different version of Farkas’ Lemma together with an elementary result in Linear Algebra. The following version comes from applying the last theorem to the matrices:

\displaystyle \begin{bmatrix} A_1 & \\ & 0 \end{bmatrix}, \hdots, \begin{bmatrix} A_k & \\ & 0 \end{bmatrix}, \begin{bmatrix} -B & \\ & 1 \end{bmatrix}

instead of {A_1, \hdots, A_{k+1}}.

Theorem 5 (Semi-definite Farkas’ Lemma: Inhomogeneous Version) Exactly one of the following problems has a solution:

  1. {x_1 A_1 + \hdots + x_k A_k \succ B}, {x_i \in {\mathbb R}}
  2. {A_1 \bullet Y = 0}, …, {A_k \bullet Y = 0}, {B \bullet Y \geq 0}, {Y \succeq 0}, {Y \neq 0}

where {A \succ B} means {A - B \succ 0}. We need one more elementary Linear Algebra result before we can proceed to the proof of the duality theorem:

Theorem 6 Given two positive semi-definite matrices {A} and {B}, we have {tr(AB) \geq 0}, where {tr(M) = \sum_i M_{ii}}.

Proof: The trace of a matrix is invariant under change of basis, i.e. for any non-singular matrix {\Gamma}, we have that {tr(A) = tr(\Gamma^{-1} A \Gamma)}. This is very easy to see, since:

\displaystyle  \begin{aligned} tr(\Gamma^{-1} A \Gamma) & = \sum_i (\Gamma^{-1} A \Gamma)_{ii} = \sum_i \sum_j (\Gamma^{-1})_{ij} (A \Gamma)_{ji} \\ & = \sum_i \sum_j \sum_k (\Gamma^{-1})_{ij} A_{jk} \Gamma_{ki} \\ & = \sum_j \sum_k A_{jk} \left( \sum_i \Gamma_{ki} (\Gamma^{-1})_{ij} \right) \\ & = \sum_j \sum_k A_{jk} \delta_{jk} = tr(A) \end{aligned}

Now we write {A} in the basis of its eigenvectors: since {A} is symmetric, there is an orthogonal matrix {\Gamma} (so {\Gamma^t = \Gamma^{-1}}) such that {A = \Gamma D \Gamma^{t}}, where {D = diag(\lambda_1, \hdots, \lambda_n)}. Now:

\displaystyle  \begin{aligned} tr(AB) & = tr(\Gamma^{t} AB \Gamma) = tr(\Gamma^{t} A \Gamma \Gamma^{t} B \Gamma) = tr(D \Gamma^{t} B \Gamma) \\ & = \sum_{i} \lambda_i (\Gamma^{t} B \Gamma)_{ii} \geq 0 \end{aligned}

since {\lambda_i \geq 0} (because {A} is positive semi-definite) and, because {B} is positive semi-definite,

\displaystyle (\Gamma^{t} B \Gamma)_{ii} = e_i^t \Gamma^{t} B \Gamma e_i = (\Gamma e_i)^t B (\Gamma e_i) \geq 0

\Box
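A quick numerical check of Theorem 6, under my own choice of random data (numpy assumed; {A} and {B} are Gram matrices, hence positive semi-definite):

    import numpy as np

    rng = np.random.default_rng(0)
    M, N = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
    A, B = M @ M.T, N @ N.T    # Gram matrices are positive semi-definite
    print(np.trace(A @ B))     # non-negative, as Theorem 6 asserts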

Now, we are ready for the Duality Theorem:

Theorem 7 (SDP Duality) The dual of program 1 is given by:


\displaystyle  \begin{aligned} \max \ & d^t y \text{ s.t. } \\ & y_1 D_1 + \hdots + y_k D_k \preceq C \end{aligned} \ \ \ \ \ (2)

We call 1 the primal problem and 2 the dual. If {X} is feasible for 1 and {y} is feasible for 2, then {d^t y \leq C \bullet X}. Moreover, if the dual has a strictly feasible solution (i.e. one with {\sum_i y_i D_i \prec C}), then the primal optimum is attained and both problems have the same optimal value.
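Before the proof, here is what the dual looks like for the small example used earlier (again only a sketch assuming cvxpy, with the made-up data {C = diag(2,1)}, {D_1 = I}, {d_1 = 1}):

    import numpy as np
    import cvxpy as cp

    C = np.array([[2.0, 0.0], [0.0, 1.0]])
    D1, d1 = np.eye(2), 1.0

    y = cp.Variable()                     # one dual variable per primal constraint
    dual = cp.Problem(cp.Maximize(d1 * y),
                      [C - y * D1 >> 0])  # y_1 D_1 + ... + y_k D_k <= C
    dual.solve()
    print(dual.value)                     # ~1.0, matching the primal optimum

Note that this dual is strictly feasible (take {y = 0}, since {C \succ 0}), so the theorem guarantees that the two optima coincide, which is what the solver reports.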

As we can see in the theorem above, the main difference between this duality theorem and the duality theorem of linear programming is the “strictly feasible” condition. Let’s proceed to the proof. As in the LP proof, the outline is the following: first we prove weak duality ({d^t y \leq C \bullet X} for feasible {X, y}), and then we consider the dual restrictions together with {d^t y > v^*_{dual}} and apply Farkas’ Lemma:

Proof: Weak duality: If {X} is feasible for 1 and {y} is feasible for 2 then:

\displaystyle C\bullet X \geq \left( \sum_i y_i D_i \right) \bullet X = \sum_i y_i D_i\bullet X

because {C - \sum_i y_i D_i \succeq 0} and the {\bullet} of two positive semi-definite matrices is non-negative (this is Theorem 6, since {A \bullet B = tr(AB)} for symmetric {A}). Now:

\displaystyle d^t y = \sum_i y_i d_i = \sum_i y_i D_i\bullet X

using the constraints of 1. Therefore, {C \bullet X \geq d^t y}.

Strong duality: Clearly, the following problem has no solution:

\displaystyle y_1 D_1 + \hdots + y_k D_k \prec C

\displaystyle y^t d > v^*_{dual}

where {v^*_{dual}} is the optimal value of 2. Now we apply the inhomogeneous version of Farkas’ Lemma to:

\displaystyle  y_1 \begin{bmatrix} -D_1 & \\ & d_1 \end{bmatrix} + \hdots + y_k \begin{bmatrix} -D_k & \\ & d_k \end{bmatrix} \succ \begin{bmatrix} -C & \\ & v^*_{dual} \end{bmatrix}

and we get a matrix {X' = \begin{bmatrix} X & x \\ x^t & x_{n+1} \end{bmatrix}} such that:

\displaystyle  \begin{bmatrix} -D_i & \\ & d_i \end{bmatrix} \bullet X' = 0, \qquad \begin{bmatrix} -C & \\ & v^*_{dual} \end{bmatrix} \bullet X' \geq 0, \qquad X' \succeq 0

which means: {D_i \bullet X = d_i x_{n+1}} and {C \bullet X \leq v^*_{dual} x_{n+1}}. We know that {X \succeq 0} and {x_{n+1} \geq 0}. We just need to prove that {x_{n+1} > 0}. We already know it is not negative, so we need to prove it is not zero, and here we will need the hypothesis that the dual has a strictly feasible solution. If {x_{n+1} = 0}, then {X' \succeq 0} forces {x = 0}, so {X \neq 0}, and there would be a solution to:

\displaystyle X \bullet D_i = 0, \quad X \bullet C \leq 0, \quad X \succeq 0, \quad X \neq 0

and by Farkas’ Lemma (Theorem 5 applied to the matrices {-D_1, \hdots, -D_k} and {-C}), the problem:

\displaystyle \sum_i y_i D_i \prec C

would have no solution; but we know it does have one, since the dual is strictly feasible. Therefore, {x_{n+1} > 0}.

Finally, dividing by {x_{n+1}}, the matrix {\hat{X} = X / x_{n+1}} is feasible for 1 and satisfies {C \bullet \hat{X} \leq v^*_{dual}}. Combined with weak duality (which gives {C \bullet \hat{X} \geq v^*_{dual}}), this shows that {\hat{X}} attains the primal optimum and that both problems have the same optimal value.

\Box
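As a final sanity check of weak duality on the same toy data (numpy assumed; the feasible points below are hand-picked):

    import numpy as np

    C = np.array([[2.0, 0.0], [0.0, 1.0]])
    D1, d1 = np.eye(2), 1.0

    X = 0.5 * np.eye(2)   # primal feasible: D1 . X = tr(X) = 1 and X >= 0
    y1 = 0.5              # dual feasible: C - y1 * D1 has eigenvalues 1.5 and 0.5, so y1 * D1 <= C
    print(d1 * y1, "<=", np.sum(C * X))   # 0.5 <= 1.5, as weak duality guarantees

The common optimal value here is {1}, attained at {X = diag(0,1)} in the primal and {y_1 = 1} in the dual.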