### Archive

Posts Tagged ‘linear programming’

## Bounded Degree Spanning Tree and an Uncrossing Lemma

I’ve been reading about the Bounded Degree Spanning Tree problem and I thought of writing some of what I am learning here. It illustrates a beautiful technique called Iterated Rounding and uses the combinatorial idea of uncrossing. I’ll try to give a high-level idea of the argument and give references on the details. The first result of this kind (although there were previous results with weaker guarantees) was given by Goemans in Minimum Bounded Degree Spanning Trees, but the result based on iterated rounding and a subsequent improvement are due to Singh and Lau in a series of papers. A main reference is Approximating minimum bounded degree spanning trees to within one of optimal.

The problem of bounded degree spanning tree is as follows: consider a graph ${G = (V,E)}$ with edge weights and, for some set of nodes ${W \subseteq V}$, degree bounds ${b_v, v \in W}$. We want to find, among the spanning trees in which each ${v \in W}$ has degree ${\leq b_v}$, the one with minimum cost. It is clearly a hard problem, since taking all weights equal to ${1}$ and ${b_v = 2}$ for all nodes gives the Hamiltonian Path problem, which is NP-complete. So we settle for a different kind of approximation. Let OPT be the value of the optimal solution: we will show an algorithm that gives a spanning tree of cost ${\leq OPT}$ such that each node ${v}$ has degree ${\leq b_v + 2}$ (this can be improved to ${b_v + 1}$ with a more sophisticated algorithm, also based on Iterated Rounding).

As always, the first step in designing an approximation algorithm is to relax the problem to an LP. We consider the following LP:

\displaystyle \left. \begin{aligned} & \min \sum_{e\in E} c_e x_e \text{ s.t. } \\ & \qquad \left. \begin{aligned} & \sum_{e \in E} x_e = \vert V \vert - 1 \\ & \sum_{e \in E(S)} x_e \leq \vert S \vert - 1 & \forall S \subseteq V, \vert S \vert \geq 2 \\ & \sum_{e \in \delta(v)} x_e \leq b_v & \forall v \in W\\ & x_e \geq 0 & \forall e \in E \end{aligned} \right. \end{aligned} \right.

The first constraint expresses that a spanning tree has exactly ${\vert V \vert - 1}$ edges, the second prevents the formation of cycles and the third enforces the degree bounds. For ${W = \emptyset}$ we have the standard Minimum Spanning Tree problem, and for that problem the polytope is integral. With the degree bounds, we lose this nice property. We can still solve this LP using the Ellipsoid Method: the separation oracle for the constraints ${\sum_{e \in E(S)} x_e \leq \vert S \vert - 1}$ is implemented by a flow computation.
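
To make the separation problem concrete, here is a small sketch of my own (not from the papers): it checks the exponentially many cycle-elimination constraints by brute-force enumeration, which is only viable for toy instances; the flow-based oracle mentioned above replaces the enumeration in general.

```python
# Brute-force "separation oracle" for the constraints
# sum_{e in E(S)} x_e <= |S| - 1: return a violated set S, or None.
# Enumeration is exponential in |V|; in general one uses the flow
# computation mentioned in the text instead.
from itertools import combinations

def find_violated_set(n, x):
    """x maps edges (u, v), u < v, to fractional values."""
    for size in range(2, n + 1):
        for S in combinations(range(n), size):
            inside = sum(val for (u, v), val in x.items()
                         if u in S and v in S)
            if inside > size - 1 + 1e-9:
                return S
    return None

# A fractional point spreading weight 0.8 over a triangle violates the
# constraint for S = {0, 1, 2} (2.4 > 2); a spanning tree does not.
violated = find_violated_set(4, {(0, 1): 0.8, (1, 2): 0.8,
                                 (0, 2): 0.8, (2, 3): 0.6})
feasible = find_violated_set(4, {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0})
```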

Iterated Rounding

Now, let’s go ahead and solve the LP. It would be great if we had an integral solution: we would be done. Unfortunately this is not the case, but we can still hope the solution is almost integral in some sense: for example, some edges are integral and we can take them into the final solution and recurse on a smaller graph. This is not far from the truth, and that’s the main idea of iterated rounding. We will show that the support of the optimal solution ${E(x) = \{e \in E; x_e > 0\}}$ has some nice structure. Consider the following lemma:

Lemma 1 For any basic solution ${x}$ of the LP, either there is a node ${v}$ with just one incident edge in the support ${E(x)}$, or there is a node ${v \in W}$ with at most ${3}$ incident edges in the support.

If we can prove this lemma, we can solve the problem in the following way: we begin with an empty tree, then we solve the LP and look at the support ${E(x)}$. There are two possibilities according to the lemma:

• If there is a node ${v}$ with just one edge ${e = (u,v)}$ incident to it in the support, we add ${e}$ to the tree, remove ${v}$ from ${V}$, decrease ${b_u}$, set ${E = E(x)}$ (the trick is to remove in each iteration the edges of ${E}$ that are not in the support; clearly, removing those edges doesn’t hurt the objective value) and run the algorithm again. Notice that the LP solved in the recursion has value at most the value of the current LP minus ${c_e}$. So if by induction we get a spanning tree that respects the new degree bounds plus two and has value at most the new LP value, we can just add ${e}$ and we have a solution with value at most that of the original LP, respecting the original degree bounds plus two.
• Otherwise, there is a node ${v \in W}$ with at most ${3}$ incident edges in the support. We simply drop the degree bound on that vertex (i.e. remove ${v}$ from ${W}$), set ${E = E(x)}$ (again, eliminating the edges not in the support) and run the algorithm again. Since only the at most three edges currently in the support can ever become incident to ${v}$ in the rest of the computation, the final degree of ${v}$ is at most ${3}$, and since ${b_v \geq 1}$, it exceeds its original bound ${b_v}$ by at most ${2}$.
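
The loop above can be sketched in code. This is my own schematic, not the authors’ implementation: the LP solver is abstracted as a parameter `solve_lp`, and here it is exercised with a trivial stand-in that already returns an integral solution.

```python
# Schematic of the iterated rounding loop.  solve_lp is assumed to
# return an optimal basic solution of the LP on (V, E, W, b) as a
# dict edge -> value; a real implementation would call an LP solver.
def iterated_rounding(V, E, W, b, solve_lp):
    tree = []
    V, E, W, b = set(V), list(E), set(W), dict(b)
    while len(V) > 1:
        x = solve_lp(V, E, W, b)
        support = [e for e in E if x[e] > 1e-9]
        deg = lambda v: sum(1 for e in support if v in e)
        leaf = next((v for v in V if deg(v) == 1), None)
        if leaf is not None:
            # Case 1: v has a single support edge e = (u, v); take e.
            e = next(e for e in support if leaf in e)
            u = e[0] if e[1] == leaf else e[1]
            tree.append(e)
            V.remove(leaf)
            if u in b:
                b[u] -= 1
            E = [f for f in support if leaf not in f]
        else:
            # Case 2 (guaranteed by Lemma 1): some v in W has at most
            # 3 support edges; drop its degree bound.
            v = next(v for v in W if deg(v) <= 3)
            W.remove(v)
            E = support
    return tree

# Stand-in "solver" returning the integral MST of a path graph.
mock = lambda V, E, W, b: {e: 1.0 for e in E}
tree = iterated_rounding({0, 1, 2}, [(0, 1), (1, 2)], set(), {}, mock)
```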

The algorithm eventually stops, since in each iteration we have fewer edges or fewer nodes in ${W}$, and the solution is as desired. The main effort is therefore to prove the lemma. But first, let’s look at what the lemma says: it is of the following kind: “any basic solution of the LP has some nice properties, which involve the support being, at least in some place, not too big”. So it involves proving that the support is not too large. That is our next task as we try to prove the lemma. And we will be done with:

Theorem 2 The algorithm described above produces a spanning tree of cost ${\leq Z^*}$ (the LP value, and therefore ${\leq OPT}$) in which each node ${v \in W}$ has degree ${\leq b_v + 2}$.

Bounding the size of the support

We would now like to prove a result like the lemma above: that in the solution of the LP there is either a node ${v}$ with degree ${1}$ in ${E(x)}$ or a node in ${W}$ with degree ${\leq 3}$. First, suppose the opposite: that ${(V,E(x))}$ has all nodes with degree ${\geq 2}$ and all nodes in ${W}$ with degree ${\geq 4}$. This implies that we have a large number of edges in the support. From the degrees, we know that:

$\displaystyle \vert E(x) \vert \geq \frac{1}{2} \left( 2( \vert V \vert - \vert W \vert ) + 4 \vert W \vert \right) = \vert V \vert + \vert W \vert$

We want to prove that the support ${E(x)}$ of the LP solution can’t be too large. The first question is: how do we estimate the size of the support of a basic solution?

A basic solution can be represented by picking ${\vert E \vert}$ rows of the constraint matrix and making them tight. So, if we have a general ${Ax \leq b}$ LP, we pick some ${n \times n}$ submatrix ${A'}$ of ${A}$ and the basic solution is just ${x = A'^{-1} b'}$. The rows of the matrix ${A'}$ can be of three types: ${\chi_S}$, corresponding to ${\sum_{e \in E(S)} x_e \leq \vert S \vert - 1}$; ${\chi_v}$, corresponding to ${\sum_{e \in \delta(v)} x_e \leq b_v}$; or ${\delta_e}$, corresponding to ${x_e \geq 0}$. There are ${\vert E \vert}$ rows in total. The size of the support ${E(x)}$ is at most the number of rows of the form ${\chi_S, \chi_v}$ in the basic solution. Therefore, the idea for bounding the size of the support is to prove that “all basic solutions can be represented by a small number of rows of the form ${\chi_S, \chi_v}$”. And this is done using the following:

Lemma 3 Assuming ${E = E(x)}$, for any basic solution ${x}$, there is ${Z \subseteq W}$ and a family ${\mathcal{S}}$ of sets such that:

1. The restrictions correspondent to ${S \in \mathcal{S}}$ and ${v \in Z}$ are tight for ${x}$
2. ${\{ \chi_S; S \in \mathcal{S} \} \cup \{ \chi_v; v \in Z \}}$ is a linearly independent set of vectors
3. ${\vert \mathcal{S} \vert + \vert Z \vert = \vert E(x) \vert}$
4. ${\mathcal{S}}$ is a laminar family

The first 3 items are straightforward properties of basic solutions. The fourth one means that for any two sets ${S_1, S_2 \in \mathcal{S}}$, one of three things happens: ${S_1 \subseteq S_2}$, ${S_2 \subseteq S_1}$ or ${S_1 \cap S_2 = \emptyset}$. Based on the previous lemma and on the following result, which can be easily proved by induction, we will prove Lemma 1.

Lemma 4 If ${\mathcal{S}}$ is a laminar family over the set ${V}$ where each set ${S \in \mathcal{S}}$ contains at least ${2}$ elements, then ${\vert \mathcal{S} \vert \leq \vert V \vert - 1}$.
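
As a quick sanity check of the lemma (my own illustration), laminarity can be verified pairwise and the size bound compared on a maximal example:

```python
# Check the laminar condition pairwise: any two members are nested or
# disjoint.  Lemma 4 then bounds such a family (all sets of size >= 2)
# by |V| - 1 members.
def is_laminar(family):
    return all(a <= b or b <= a or not (a & b)
               for a in family for b in family)

V = set(range(6))
# A maximal laminar example over 6 elements: 5 = |V| - 1 sets.
fam = [{0, 1}, {2, 3}, {0, 1, 2, 3}, {4, 5}, {0, 1, 2, 3, 4, 5}]
crossing = [{0, 1}, {1, 2}]   # {0,1} and {1,2} cross, so not laminar
```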

Now the proof of Lemma 1 is easy. Let’s do it and then come back to prove Lemma 3. Simply see that ${\vert E(x) \vert = \vert \mathcal{S} \vert + \vert Z \vert \leq \vert V \vert - 1 + \vert W \vert}$, which contradicts ${\vert E(x) \vert \geq \vert V \vert + \vert W \vert}$.

Uncrossing argument

And now we arrive at the technical heart of the proof, which is proving Lemma 3. It says that any basic solution can be written as a “structured” basic solution. We start with any basic feasible solution. This already satisfies (1)-(3); then we need to change the representation so it satisfies condition (4) as well. We need to get rid of crossing elements, i.e., sets ${S,T \in \mathcal{S}}$ that intersect but neither contains the other.

We do that by means of the following:

Lemma 5 (Uncrossing Lemma) If ${S}$ and ${T}$ are crossing (they intersect but neither contains the other) and tight (tight in the sense that their respective constraint is tight), then ${S \cup T}$ and ${S \cap T}$ are also tight and:

$\displaystyle \chi_{S \cap T} + \chi_{S \cup T} = \chi_S + \chi_T$

This identity can be verified edge by edge in a picture of the two crossing sets.

Proof: First, we note that ${x(E(S))}$ is a supermodular function, i.e.:

$\displaystyle x(E(S)) + x(E(T)) \leq x(E(S \cap T)) + x(E(S \cup T))$

We can see that by case analysis: every edge appearing on the left side appears on the right side with at least the same multiplicity. Notice also that the inequality is strict iff there are edges of positive weight from ${S\setminus T}$ to ${T \setminus S}$. Now, we have:

\displaystyle \begin{aligned} (\vert S \vert - 1) + (\vert T \vert - 1) & = (\vert S \cap T \vert - 1) + (\vert S \cup T \vert - 1) \geq \\ & \geq x(E(S \cap T)) + x(E(S \cup T)) \geq \\ & \geq x(E(S)) + x(E(T)) = (\vert S \vert - 1) + (\vert T \vert - 1) \end{aligned}

where the first relation follows from ${\vert S \vert + \vert T \vert = \vert S \cap T \vert + \vert S \cup T \vert}$, the second is by feasibility, the third is by supermodularity and the last one is by tightness. So all hold with equality, and therefore ${S \cap T}$ and ${S \cup T}$ are tight. We also proved that:

$\displaystyle x(E(S \cap T)) + x(E(S \cup T)) = x(E(S)) + x(E(T))$

so there can be no edge from ${S\setminus T}$ to ${T \setminus S}$ in ${E(x)}$ and therefore, thinking just of edges in ${E(x)}$ we have:

$\displaystyle \chi_{S \cap T} + \chi_{S \cup T} = \chi_S + \chi_T$

$\Box$
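
The supermodularity inequality at the heart of the proof is easy to check numerically. A quick sanity experiment of my own (not part of the argument):

```python
# Numerically verify x(E(S)) + x(E(T)) <= x(E(S n T)) + x(E(S u T))
# for random nonnegative edge weights x on a complete graph.
import random

def x_E(S, x):
    # total x-weight of edges with both endpoints in S
    return sum(val for (u, v), val in x.items() if u in S and v in S)

random.seed(0)
n = 8
x = {(u, v): random.random() for u in range(n) for v in range(u + 1, n)}
ok = True
for _ in range(200):
    S = {v for v in range(n) if random.random() < 0.5}
    T = {v for v in range(n) if random.random() < 0.5}
    ok = ok and (x_E(S, x) + x_E(T, x)
                 <= x_E(S & T, x) + x_E(S | T, x) + 1e-9)
```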

Uncrossing arguments are found everywhere in combinatorics. Now, we show how the Uncrossing Lemma can be used to prove Lemma 3:

Proof: Let ${x}$ be any basic solution. It can be represented by a pair ${(Y, \mathcal{C})}$ where ${Y \subseteq W}$ and ${\mathcal{C}}$ is a family of sets. We will show that the same basic solution can be represented by ${(Y, \mathcal{L})}$, where ${\mathcal{L}}$ is a laminar family of the same size as ${\mathcal{C}}$.

Let ${\mathcal{S}}$ be the family of all sets that are tight under ${x}$ and ${\mathcal{L}}$ a maximal laminar family of tight sets in ${\mathcal{S}}$ such that ${\{\chi_S; S \in \mathcal{L} \} \cup \{\chi_v; v \in Y \}}$ is linearly independent. I claim that ${\vert \mathcal{L} \vert = dim(span(\mathcal{S}))}$.

In fact, suppose ${\vert \mathcal{L} \vert < dim(span(\mathcal{S}))}$; then there are sets of ${\mathcal{S}}$ we could add to ${\mathcal{L}}$ without violating independence – the problem is that those sets would cross some set of ${\mathcal{L}}$. Pick such an ${S \in \mathcal{S}}$ crossing as few sets of ${\mathcal{L}}$ as possible. The set ${S}$ crosses some ${T \in \mathcal{L}}$. Since both are tight, we can use the Uncrossing Lemma and get:

$\displaystyle \chi_{S \cap T} + \chi_{S \cup T} = \chi_S + \chi_T$

since ${\chi_S \notin span(\mathcal{L})}$, we can’t have simultaneously ${\chi_{S \cap T}}$ and ${\chi_{S \cup T}}$ in ${span(\mathcal{L})}$. Let’s consider two cases:

1. ${\chi_{S \cap T} \notin span(\mathcal{L})}$: then ${S \cap T}$ is a tight set with ${\chi_{S \cap T} \in span(\mathcal{S}) \setminus span(\mathcal{L})}$ that crosses fewer sets of ${\mathcal{L}}$ than ${S}$, since every set of ${\mathcal{L}}$ that crosses ${S \cap T}$ must also cross ${S}$ (because no set of ${\mathcal{L}}$ can cross ${T}$).
2. ${\chi_{S \cup T} \notin span(\mathcal{L})}$: then ${S \cup T}$ is a tight set with ${\chi_{S \cup T} \in span(\mathcal{S}) \setminus span(\mathcal{L})}$ that crosses fewer sets of ${\mathcal{L}}$ than ${S}$, since every set of ${\mathcal{L}}$ that crosses ${S \cup T}$ must also cross ${S}$.

In either case we have a contradiction with the choice of ${S}$, so we proved that ${\vert \mathcal{L} \vert = dim(span(\mathcal{S}))}$: the whole space of tight constraints can be generated by a laminar family. $\Box$

And this finishes the proof. Let’s go over all that we’ve done: we started with an LP and we wanted to prove that the support of each basic solution was not too large. We wanted that because we wanted to prove that there is either a node with degree one in the support or a node in ${W}$ with small (${\leq 3}$) degree in the support. To prove that the support is small, we showed that any basic solution has a representation in terms of a laminar family. Then we used the fact that laminar families can’t be very large. For that, we used the celebrated Uncrossing Lemma.

Note: Most of this is based on my notes from David Williamson’s Approximation Algorithms class. I spent some time thinking about this algorithm and therefore I decided to post it here.

Categories: theory Tags:

## Duality Theorem for Semidefinite Programming

I’ve been reading through Luca Trevisan’s blog posts about relaxations of the Sparsest Cut problem. He discusses three relaxations: (i) spectral relaxation; (ii) Leighton-Rao and (iii) Arora-Rao-Vazirani. (i) and (iii) are semi-definite programs (SDP), an incredibly beautiful and useful generalization of Linear Programming. This technique is the core of the famous MAX-CUT algorithm of Goemans and Williamson and has also been applied to design approximation algorithms for a large variety of combinatorial problems. As for Linear Programming, there is a Duality Theorem that allows us to give certificates of lower and upper bounds. I was curious to learn more about it and my friend Ashwin pointed me to the great lecture notes by László Lovász on semi-definite programming. I was surprised that the proof of the duality theorem for SDP is almost the same as for LP, changing just a few steps. We begin by using the Convex Separation Theorem to derive a semi-definite version of Farkas’ Lemma. We then use this version of Farkas’ Lemma to prove the Duality Theorem in a similar manner to what is normally done for LP. Let’s go through the proof, but first, let’s define positive semidefinite matrices:

Definition 1 A matrix ${A}$ is said to be positive semidefinite (denoted by ${A \succeq 0}$) if for all ${x \in {\mathbb R}^n}$ we have ${x^t A x \geq 0}$.

It is the same as saying that all eigenvalues of ${A}$ are non-negative, since the smallest eigenvalue of ${A}$ is given by ${\min_{\Vert x \Vert = 1} x^t A x}$. We write ${A \succ 0}$ when all eigenvalues of ${A}$ are strictly positive, which is the same as ${x^t A x > 0}$ for all ${x \neq 0}$. Also, given two matrices ${A}$ and ${B}$, we use the notation ${A \bullet B = \sum_{ij} A_{ij} B_{ij}}$, which is nothing more than the dot product when we view those matrices as vectors in ${{\mathbb R}^{n^2}}$. Therefore:
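
These characterizations are easy to check numerically. A small illustration of my own with numpy:

```python
import numpy as np

# A positive definite matrix: its eigenvalues are 1 and 3.
A = np.array([[2.0, -1.0], [-1.0, 2.0]])
eigs = np.linalg.eigvalsh(A)     # all positive, so A > 0

# A • B is the entrywise dot product, which equals tr(A^t B).
B = np.array([[1.0, 0.0], [0.0, 3.0]])
dot = (A * B).sum()              # 2*1 + 2*3 = 8
```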

Definition 2 A semi-definite program (SDP) is an optimization problem where the variable is a matrix ${X}$ and has the form:

\displaystyle \left. \begin{aligned} & \min C \bullet X \text{ s.t. } \\ & \qquad \left. \begin{aligned} & D_i \bullet X = d_i & \forall i \\ & X \succeq 0 \\ \end{aligned} \right. \end{aligned} \right. \ \ \ \ \ (1)

that is, it is a linear program with the additional restriction that the variable, viewed as a matrix, must be positive semi-definite.

The interesting thing is to use the fact that the set ${\mathcal{P} \subseteq {\mathbb R}^{n^2}}$ of positive semi-definite matrices is a convex cone, i.e., if ${A, B \in \mathcal{P}}$ then ${c_1 A + c_2 B \in \mathcal{P}}$ for any ${c_1, c_2 \geq 0}$. It is easy to see that this is a convex set. Now we use the following theorem to prove Farkas’ Lemma:

Theorem 3 (convex separation) Given two convex sets ${C_1, C_2 \subseteq {\mathbb R}^n}$ such that ${C_1 \cap C_2 = \emptyset}$ then there is ${p, v \in {\mathbb R}^n}$ such that ${v^t (x - p) \leq 0}$ for all ${x \in C_1}$ and ${v^t (x - p) \geq 0}$ for all ${x \in C_2}$. Besides, if one of them is a cone, it holds with ${p = 0}$.

Theorem 4 (Semi-definite Farkas’ Lemma) Exactly one of the following problems has a solution:

1. ${x_1 A_1 + \hdots + x_k A_k \succ 0}$, ${x_i \in {\mathbb R}}$
2. ${A_1 \bullet Y = 0}$, …, ${A_k \bullet Y = 0}$, ${Y \succeq 0}$

Proof: If problem ${1}$ doesn’t have a solution, then the open cone of positive definite matrices is disjoint from the convex cone ${\{ \sum_i x_i A_i ; x_i \in {\mathbb R} \}}$, and therefore they can be separated. This means there is ${Y}$ (${Y}$ plays the role of ${v}$ in the convex separation theorem) such that ${\left( \sum_i x_i A_i \right) \bullet Y \leq 0}$ for all ${x_1, \hdots, x_k \in {\mathbb R}}$ and ${Y \bullet M \geq 0}$ for all ${M \in \mathcal{P}}$. Now, taking ${x_i = 1}$ and all others ${0}$, and then ${x_i = -1}$ and all others ${0}$, we easily get ${A_i \bullet Y = 0}$.

It remains to prove that ${Y \succeq 0}$. Just take ${M = x x^t}$ for ${x \in {\mathbb R}^n}$ and then we have that ${Y \bullet M = x^t Y x \geq 0}$ for all ${x \in {\mathbb R}^n}$ and therefore, ${Y \succeq 0}$. $\Box$

Now we can use this to prove the duality theorem. We will use a slightly different version of Farkas’ Lemma together with an elementary result in Linear Algebra. The following version comes from applying the last theorem to the matrices:

$\displaystyle \begin{bmatrix} A_1 & \\ & 0 \end{bmatrix}, \hdots, \begin{bmatrix} A_k & \\ & 0 \end{bmatrix}, \begin{bmatrix} -B & \\ & 1 \end{bmatrix}$

instead of ${A_1, \hdots, A_{k+1}}$.

Theorem 5 (Semi-definite Farkas’ Lemma: Inhomogeneous Version) Exactly one of the following problems has a solution:

1. ${x_1 A_1 + \hdots + x_k A_k \succ B}$, ${x_i \in {\mathbb R}}$
2. ${A_1 \bullet Y = 0}$, …, ${A_k \bullet Y = 0}$, ${B \bullet Y \geq 0}$ , ${Y \succeq 0}$

where ${A \succ B}$ means ${A - B \succ 0}$. Now, an elementary Linear Algebra result before we can proceed to the proof of the duality theorem:

Theorem 6 Given two positive semi-definite matrices ${A}$ and ${B}$, we have ${tr(AB) \geq 0}$, where ${tr(M) = \sum_i M_{ii}}$.

Proof: The trace of a matrix is invariant under change of basis, i.e. for any non-singular matrix ${\Gamma}$, we have that ${tr(A) = tr(\Gamma^{-1} A \Gamma)}$. This is very easy to see, since:

\displaystyle \begin{aligned} tr(\Gamma^{-1} A \Gamma) & = \sum_i (\Gamma^{-1} A \Gamma)_{ii} = \sum_i \sum_j (\Gamma^{-1})_{ij} (A \Gamma)_{ji} = \\ & = \sum_i \sum_j \sum_k (\Gamma^{-1})_{ij} A_{jk} (\Gamma)_{ki} = \\ & = \sum_j \sum_k A_{jk} \left( \sum_i (\Gamma)_{ki} (\Gamma^{-1})_{ij} \right) = \\ & = \sum_j \sum_k A_{jk} \delta_{jk} = tr(A) \\ \end{aligned}

Now we can work in the basis of ${A}$‘s eigenvectors: let ${\Gamma}$ be an orthogonal matrix such that ${A = \Gamma D \Gamma^{t}}$, where ${D = diag(\lambda_1, \hdots, \lambda_n)}$. Now:

\displaystyle \begin{aligned} tr(AB) & = tr(\Gamma^{t} AB \Gamma) = tr(\Gamma^{t} A \Gamma \Gamma^{t} B \Gamma) = tr(D \Gamma^{t} B \Gamma) = \\ & = \sum_{i} \lambda_i (\Gamma^{t} B \Gamma)_{ii} \geq 0 \end{aligned}

since ${\lambda_i \geq 0}$ because the matrix is positive semi-definite and:

$\displaystyle (\Gamma^{t} B \Gamma)_{ii} = e_i^t \Gamma^{t} B \Gamma e_i = (\Gamma e_i)^t B (\Gamma e_i)$

$\Box$
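
Theorem 6 can also be sanity-checked numerically (a quick experiment of my own): matrices of the form ${M^t M}$ are positive semi-definite by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
ok = True
for _ in range(100):
    A = rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4))
    A, B = A.T @ A, B.T @ B          # both PSD by construction
    ok = ok and np.trace(A @ B) >= -1e-9
```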

Now, we are ready for the Duality Theorem:

Theorem 7 (SDP Duality) The dual of the program 1 is given by:

\displaystyle \left. \begin{aligned} & \max d^t y \text{ s.t. } \\ & \qquad \left. \begin{aligned} & y_1 D_1 + \hdots + y_k D_k \preceq C \end{aligned} \right. \end{aligned} \right. \ \ \ \ \ (2)

We call 1 the primal problem and 2 the dual. If ${X}$ is feasible for 1 and ${y}$ is feasible for 2, then ${d^t y \leq C \bullet X}$. Besides, if the dual has a strictly feasible solution (i.e. one with ${\sum_i y_i D_i \prec C}$), then the primal optimum is attained and both have the same optimal value.
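
As a tiny worked instance (my own example, not from the lecture notes), take ${C = diag(1,2)}$, a single constraint matrix ${D_1 = I}$ and ${d_1 = 1}$:

$\displaystyle \min \; diag(1,2) \bullet X \text{ s.t. } I \bullet X = 1, \; X \succeq 0$

The primal optimum is ${1}$, attained at ${X = diag(1,0)}$. The dual is ${\max y \text{ s.t. } y I \preceq diag(1,2)}$, i.e. ${y \leq 1}$, with optimum ${1}$ as well; ${y = 0}$ is strictly feasible, so the theorem applies and there is no duality gap.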

As we can see in the theorem above, the main difference between this duality theorem and the duality theorem of linear programming is the “strictly feasible” condition. Let’s proceed to the proof. As in the LP proof, the outline is the following: first we prove weak duality (${d^t y \leq C \bullet X}$ for feasible ${X,y}$), then we consider the dual restrictions together with ${d^t y > v^*_{dual}}$ and apply Farkas’ Lemma:

Proof: Weak duality: If ${X}$ is feasible for 1 and ${y}$ is feasible for 2 then:

$\displaystyle C\bullet X \geq \left( \sum_i y_i D_i \right) \bullet X = \sum_i y_i D_i\bullet X$

because ${C - \sum_i y_i D_i \succeq 0}$ and the ${\bullet}$ of two positive semi-definite matrices is non-negative. Now:

$\displaystyle d^t y = \sum_i y_i d_i = \sum_i y_i D_i\bullet X$

by the constraints of 1. Therefore, ${C\bullet X \geq d^t y}$.

Strong duality: Clearly, the following problem has no solution:

$y_1 D_1 + \hdots + y_k D_k \prec C$

$y^t d > v^*_{dual}$

where ${v^*_{dual}}$ is the optimal value of 2. Now we apply the inhomogeneous Farkas’ Lemma to:

$\displaystyle y_1 \begin{bmatrix} -D_1 & \\ & d_1 \end{bmatrix} + \hdots + y_k \begin{bmatrix} -D_k & \\ & d_k \end{bmatrix} \succ \begin{bmatrix} -C & \\ & v^*_{dual} \end{bmatrix}$

and we get a matrix ${X' = \begin{bmatrix} X & x \\ x^t & x_{n+1} \end{bmatrix}}$ such that:

$\displaystyle \begin{bmatrix} -D_i & \\ & d_i \end{bmatrix} \bullet X' = 0, \begin{bmatrix} -C & \\ & v^*_{dual} \end{bmatrix} \bullet X' \geq 0, X' \succeq 0$

which means: ${D_i \bullet X = d_i x_{n+1}}$ and ${C \bullet X \leq v^*_{dual} x_{n+1}}$. We know that ${X \succeq 0}$ and ${x_{n+1} \geq 0}$. We just need to prove that ${x_{n+1} > 0}$. We already know it is non-negative, so we need to prove it is not zero, and here we will need the hypothesis that the dual has a strictly feasible solution. If ${x_{n+1} = 0}$, then there would be a solution to:

$\displaystyle X\bullet D_i = 0, X\bullet C \leq 0, X \succeq 0$

and by Farkas’ Lemma, the problem:

$\displaystyle \sum_i y_i D_i \prec C$

would have no solution, and we know it does have a solution. Therefore, ${x_{n+1} > 0}$.

Everything else comes from combining the above: scaling, ${X / x_{n+1}}$ is primal feasible with ${C \bullet (X/x_{n+1}) \leq v^*_{dual}}$, and weak duality gives the reverse inequality, so both optimal values coincide.

$\Box$

## Consistent Labeling

This week, Igor and I are presenting the paper “Metric clustering via consistent labeling” by Krauthgamer and Roughgarden in our Algorithms Reading Group. To prepare for the presentation, I thought writing a blog post about it would be a nice idea. Consistent Labeling is a framework that allows us to represent a variety of metric problems, such as computing a separating decomposition of a metric space, a padded decomposition (both of which are the main ingredient of embeddings of metric spaces into dominating trees), sparse cover, metric triangulation and so on. Here, I’ll define the Consistent Labeling Problem, formulate it as an Integer Program and show how we can get an approximation algorithm for it by rounding its linear relaxation.

First, consider a base set ${A}$. We want to assign labels from ${L}$ to the elements of ${A}$ respecting some constraints. First, each element ${a \in A}$ has associated with it a subset ${L_a \subseteq L}$ of labels it can be assigned, and each element can receive at most ${k}$ labels. Second, there is a collection ${\mathcal{C}}$ of subsets of ${A}$ such that, for each ${S \in \mathcal{C}}$, we would like some label ${i \in L}$ to be assigned to all elements of ${S}$. For each set ${S \in \mathcal{C}}$, there is a penalty ${w_S}$ for violating this constraint.

Our goal is to find a probability distribution over labelings that minimizes the total penalty ${\sum_{S \in \mathcal{C}} w_S Pr[S \text{ is violated}]}$. Let’s formulate this as an integer program. For that, the first thing we need are decision variables: let ${x_{ia}}$ be a ${\{0,1\}}$-variable indicating whether label ${i \in L}$ is assigned to element ${a \in A}$. Let the variable ${y_{iS}}$ mean that label ${i}$ was assigned to all elements of ${S}$ and let ${z_S}$ mean that set ${S}$ is consistently labeled. The formulation maximizes the weight of consistently labeled sets (equivalently, it minimizes the total penalty):

\displaystyle \left. \begin{aligned} & \max \sum_{S \in \mathcal{C}} w_S z_S \text{ s.t. } \\ & \qquad \left\lbrace \begin{aligned} & \sum_{i \in L} x_{ia} \leq k & \forall a \in A \\ & y_{iS} \leq x_{ia} & \forall S \in \mathcal{C}, a \in S, i \in L\\ & z_S \leq \sum_{i \in L} y_{iS} & \forall S \in \mathcal{C} \\ & z_S \leq 1 & \forall S \in \mathcal{C} \\ & x_{ia} = 0 & \forall i \notin L_a \\ \end{aligned} \right. \end{aligned} \right. \ \ \ \ \ (1)

It is not hard to see that if ${x,y,z}$ are ${\{0,1\}}$-variables then the formulation corresponds to the original problem. Now, let’s relax it to a Linear Program and interpret the fractional values as probabilities. The rounding procedure we use is a generalization of the one in “Approximation algorithms for classification problems” of Kleinberg and Tardos: until every object has ${k}$ labels, repeat the following procedure: pick ${i \sim \text{Uniform}(L)}$ and ${t \sim \text{Uniform}([0,1])}$ and assign label ${i}$ to all objects with ${x_{ia} > t}$. In the end, keep the first ${k}$ labels assigned to each object.
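
The rounding procedure can be sketched in a few lines. This is my own schematic of the procedure just described; the dict `x` plays the role of the fractional LP values ${x_{ia}}$:

```python
import random

def round_labels(x, L, k, rng):
    """x: dict element -> dict label -> fractional value in [0, 1]."""
    labels = {a: [] for a in x}
    # Repeat until every element has k labels.
    while any(len(ls) < k for ls in labels.values()):
        i = rng.choice(L)          # uniform random label
        t = rng.random()           # uniform threshold in [0, 1)
        for a in x:
            if x[a].get(i, 0.0) > t and i not in labels[a]:
                labels[a].append(i)
    # Keep only the first k labels of each element.
    return {a: ls[:k] for a, ls in labels.items()}

rng = random.Random(0)
# "a" puts all its fractional mass on label 0, "b" splits it in half.
out = round_labels({"a": {0: 1.0}, "b": {0: 0.5, 1: 0.5}},
                   L=[0, 1], k=1, rng=rng)
```

Since `"a"` has ${x_{0a} = 1}$, it always ends up with label 0; `"b"` gets one of the two labels.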

What we just described is a procedure that, given a solution to the relaxed LP, produces a randomized labeling of the objects. Now we need to prove that this randomized labeling is good in expectation, that is, that ${\sum_{S \in \mathcal{C}} w_S Pr[S \text{ is violated}]}$ is comparable to the optimal single deterministic assignment. The authors prove that they are within a factor of ${2 f_{max}}$, where ${f_{max} = \max_{S \in \mathcal{C}} \vert S \vert}$.

Theorem 1 For each ${S \in \mathcal{C}}$, the probability that ${S}$ is consistently labeled is lower bounded by ${1 - \left( 1 - \frac{z_S}{k \vert S \vert} \right)^k}$.

Proof: Since we are trying to lower bound the probability of ${S}$ being consistently labeled, it suffices to consider the probability that all elements of ${S}$ receive a common label in a single iteration – let’s estimate this probability. It is:

$\displaystyle q = \sum_{i \in L} \frac{1}{\vert L \vert} Pr[S \text{ is all labeled with } i \text{ in iteration } j]$

If ${i \in L}$ is chosen in iteration ${j}$, all elements of ${S}$ are labeled with ${i}$ if ${t < x_{ia}}$ for all ${a \in S}$, so this probability is ${\min_{a \in S} x_{ia} \geq y_{iS}}$. So we have:

$\displaystyle q \geq \sum_{i \in L} \frac{1}{\vert L \vert} y_{iS} \geq \frac{z_S}{\vert L \vert}$

Now, let ${p}$ be the probability that the set ${S}$ is hit by the labeling in iteration ${j}$, i.e., that at least one element of ${S}$ receives a label. If label ${i \in L}$ is chosen, the set ${S}$ is hit if ${t < \max_{a \in S} x_{ia}}$, therefore:

$\displaystyle p = \sum_{i \in L} \frac{1}{\vert L \vert} \max_{a \in S} x_{ia} \leq \sum_{i \in L} \frac{1}{\vert L \vert} \sum_{a \in S} x_{ia}$

inverting the order of the summation, we get:

$\displaystyle p \leq \frac{1}{\vert L \vert} \sum_{a \in S} \sum_{i \in L} x_{ia} \leq \frac{1}{\vert L \vert} \sum_{a \in S} k = \frac{k \vert S \vert}{\vert L \vert}$

The probability that ${S}$ is consistently labeled is at least the probability that all of ${S}$ receives a common label in a single iteration before the set is hit ${k}$ times. In one iteration, three things may happen: either the set is not hit, or it is hit but not consistently labeled, or it is consistently labeled. One can picture this as a Markov chain counting how many times the set has been hit, with an absorbing transition for the event that ${S}$ is consistently labeled.

A standard trick in these cases is to disregard the self-loops and normalize the probabilities. This way, the probability that ${S}$ is consistently labeled is at least:

$\displaystyle \frac{q}{p} \sum_{j=0}^{k-1} \left( \frac{p-q}{p} \right)^j = \frac{q}{p} \cdot \frac{ 1 - \left( \frac{p-q}{p} \right)^k }{1 - \frac{p-q}{p} } = 1 - \left( 1 - \frac{q}{p} \right)^k$

Now we just use that ${p \leq \frac{k \vert S \vert}{\vert L \vert}}$ and ${q \geq \frac{z_S}{\vert L \vert}}$, together with the fact that the bound is increasing in ${q/p}$, to obtain the desired result. $\Box$

The approximation factor of ${2 f_{max}}$ follows straight from the previous theorem by considering the following inequalities:

$\displaystyle \left( 1 - \frac{a}{k}\right)^k \leq e^{-a} \leq 1 - \frac{a}{2}$

where the second inequality is valid for ${a \in [0,1]}$, and noting that the optimal LP value bounds the value of the optimal integral labeling.

Theorem 2 The rounding procedure gives a randomized algorithm that, in expectation, achieves a ${2 f_{max}}$ approximation to the optimal consistent labeling.
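
Both inequalities above are easy to verify numerically on ${[0,1]}$, the relevant range since ${a = z_S / \vert S \vert \leq 1}$ (a quick check of my own):

```python
import math

# Check (1 - a/k)^k <= exp(-a) <= 1 - a/2 on a grid of a in [0, 1].
ok = True
for j in range(101):
    a = j / 100.0
    ok = ok and math.exp(-a) <= 1 - a / 2 + 1e-12
    for k in (1, 2, 5, 20):
        ok = ok and (1 - a / k) ** k <= math.exp(-a) + 1e-12
```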
