Posts Tagged ‘sparsest cut’

Eigenvectors and communities

July 26th, 2009 1 comment

I liked a lot the talk by Luca Trevisan in STOC’09 about the “Max Cut and the Smallest Eigenvalue”. In this paper he proposes an {.531}-approximation algorithm for MAX-CUT, which is the first approximation algorithm for MAX-CUT that doesn’t use semi-definite programming (SDP). The best possible approximation for that problem is the 0.878-approximation given by Goemans and Williamson, based on SDP (which is the best possible approximation assuming the Unique Games Conjecture).

I won’t discuss the paper in much detail here, but its heart is the very beautiful idea of analyzing properties of a graph looking at the spectrum of the matrix that defines it. For example, represent a graph {G} by its adjacent matrix {A}, i.e., {A} is an {n \times n} matrix (where {n} is the number of nodes of {G}) and {A_{ij} = 1} iff the edge {(i,j)} is in {G}. Let also {D} be the diagonal matrix for which {D_{ii}} is the degree of node {i}. The normalized adjacency matrix is given by {D^{-1/2} A D^{-1/2}}. Let {\lambda_1 \geq \lambda_2 \geq \hdots \geq \lambda_n} be its eigenvalues. It is easy to see that all the eigenvalues are in {[-1,1]}, since:

\displaystyle 0 \leq \sum_{ij} A_{ij} (x_i - x_j)^2 = 2 x^t D x - 2 x^t A x \Rightarrow \frac{x^t A x}{x^t D x} \leq 1

taking {y = D^{1/2}x} we have that: {\frac{y^t D^{-1/2} A D^{-1/2} y}{y^t y} \leq 1} for all {y \in {\mathbb R}^n}. It is not hard to see the the maximum eigenvalue is:

\displaystyle \lambda_1 = \max_{y \in {\mathbb R}^n} \frac{y^t D^{-1/2} A D^{-1/2} y}{y^t y}

To prove {\lambda_n \geq -1} it is analogous, just by analyzing {\sum_{ij} A_{ij} (x_i + y_i)^2 \geq 0}. The next natural questions is: do those bounds hold with equality?

In fact, {\lambda_1 = 1} for all graphs. It is very easy to see that, just take the vector {x \in {\mathbb R}^n} with {x_i = \sqrt{d_i}} where {d_i} is the degree of node {i}. And for {\lambda_n}, it is not difficult to see that {\lambda_n = -1} iff the graph has a bipartite connected component. It is also easy to see that if the graph has {k} connected components then {\lambda_1 = \hdots = \lambda_k = 1}.

So, given a connected graph {\lambda_2 < 1}. But what happens if {\lambda_2} is very close to {1}. Intuitively, it means that the graph is almost disconnected in the sense that it has a very sparse cut. For {d}-regular graphs (a graph where all nodes have degree {d}), this intuition is captured by Cheeger’s inequality.

\displaystyle \sqrt{2 (1-\lambda_2) } \geq h_G \geq \frac{1}{2} (1-\lambda_2)

where {h_G} is the sparsity of {G}, defined as {\min_{S \subseteq V} \frac{\vert e(S, V \setminus S) \vert}{d \min\{\vert S \vert, \vert V \setminus S \vert \}}}. In the paper, Luca Trevisan discusses an analogue os Cheeger’s Inequality for the smallest eigenvalues {\lambda_n}. Instead of measuring how close we are from a disconnected graph, we measure how close we are from a a bipartite graph.

Let’s try to get some intuition: if a {d}-regular graph has a good sparse cut, like in the picture below, we could divide it in two parts and try to get an eigenvalue close to {1} by the following way: set “almost” {1} in one part and almost {-1} in the other. Since there are very few edges crossing the cut, after being transformed by {\frac{1}{d}A} the values are almost still the same. Also if the graph is almost bipartite, we can set almost {1} in one part and almost {-1} in the other, and after applying {\frac{1}{d}A}, what was {1} becomes {-1} and vice versa.


Ok, we can use eigenvalues to have a good idea if the graph has good expansion or not, but how to use that to actually find good partitions of it? Let {x_2} be the eigenvector corresponding to {\lambda_2}. The intuition in the last paragraph tells us that maybe we should think of getting {S_1 = \{v \in V; x_2(v) < 0\}} and {S_1 = \{v \in V; x_2(v) \geq 0\}}. More generally we can substitute {0} by a threshold {t}. This gives us only {n} possible partitions to test and choose the best, instead of {2^n}.

Finding Communities in Graphs

Real world graphs like social networks have clusters of nodes that can be identified as communities. There is a lot of work in finding those communities using all sorts of methods – more references about that can be found in Jon Kleinberg’s Networks course website. Spectral techniques are a natural candidate for doing that. I did a small experiment using Orkut, the most used social network in Brasil. Let {G} be the entire graph of Orkut and {u} be the node representing my profile on {G} and let: {\partial \{u\} = \{ v; (u,v) \in E(G)\}} be the set of all my friends and {G[\partial \{u\}]} be the graph induced on my friends.

First, I crawled the web to reconstruct {G[\partial \{u\}]} (notice I am not in the graph, so I am not using the fact that I am connecting all them. I am just using the connections between them). First, I have {211} friends, which form a huge {194} nodes component, and smaller components (one of size {5}, two of size {3} and six of size {1}). Let’s forget about the smaller components and care about the huge component: in this component there are my friends from undergrad, my friends from high-school, my family and some miscellaneous friends. Although all of them are interconnected, I wanted to be able to separate them just by using the topology of the graph.

I did the following experiment: I calculated the eigenvalues and eigenvectors of {D^{-1/2} A D^{-1/2}}. Let {1 = \lambda_1 \geq \hdots \lambda_n} be the eigenvalues and {x_1, \hdots, x_n} be the eigenvectors. For each node {v \in V}, I associated with coordinates {(x_2(v), x_n(v))} in a way that a vertical cut would be a good approximation of the sparsest cut and an horizontal line would be a good approximation of the max-cut. The result is there:


After doing that I labeled the nodes of the graph with colors: my friends from IME (my undergrad university), I labeled blue. My friends from high-school, I labeled green, my family nodes, I labeled yellow and friends from other places, I labeled red. Also, there were some people that used strange characters in their names I could no parse, so I also labeled them red. I was happy to see that the algorithm efficiently separated the green from the blue nodes: clearly distinguishing between my high-school and my undergrad friends.

More about spectral methods

Luca Trevisan has a lot of very cool blog posts about Cheeger’s inequality and other interesting proofs of spectral facts. This MIT course has a lot of material about Spectral Graph Theory.

Sparsest Cut via Metric Embeddings

June 10th, 2009 No comments

A very powerful technique to design approximation algorithms is called metric embeddings. A lot of problems are easy when the graph has some special structure like a tree or a cycle or when the vertices are encoded as points in a {{\mathbb R}^n}. This gives us more structure to the problem an allows us to make geometric considerations about the object we are studying – and this is sometimes the key to find a good approximation. The basic idea is to start with a space without much structure and try to approximate it by some space with stronger geometric or topological properties. A lot has be done recently in finding cuts in graphs or solving Steiner Tree problems.

Consider the problem of finding the sparsest cut in a graph. Given a graph {G = (V,E)}, we define the sparsity of a {(S, \overline{S})} as:

\displaystyle \alpha(S) = \frac{\vert \partial S \vert}{\min\{ \vert S \vert, \vert V \setminus S \vert \}}

where {\partial S } is the set of edges with one endpoint in {S} and other in {\overline{S})}. The sparsest cut is the cut minimizing the sparsity. A related problem is the flux problem, which asks for a cut minimizing:

\displaystyle f(S) = \frac{\vert \partial S \vert}{ \vert S \vert \cdot \vert V \setminus S \vert}

If we can minimize flux, we have a {2}-approximation for sparsity, since for all {S} we have that: {\frac{\alpha(S)}{\vert V \vert} \leq f(S) \leq \frac{2 \alpha(S)}{\vert V \vert}}. If we can approximate the minimum flux, we can also approximate sparsity by the same factor multiplied by {2}.

Let’s formulate flux as an embeddings problem: we would like a mapping {\phi: V \longrightarrow \{0, 1\}} such that minimizes:

\displaystyle \frac{\sum_{(u,v) \in E} \vert \phi(u) - \phi(v) \vert }{ \sum_{(u,v) \in V^2}  \vert \phi(u) - \phi(v) \vert }

Let’s relax this a bit. Substitue {\vert \phi(u) - \phi(v) \vert} by d_{uv}. So, we want to minimize {\frac{\sum_{(u,v) \in E} d_{uv} }{\sum_{(u,v) \in V^2} d_{uv} }}. A standard trick for minimizing fractions where both numerator and denominator are linear functions on the same variables is to fix the denominator to be {1} and minimize the denominator. So, we want to minimize {\sum_{(u,v) \in E} d_{uv}} given that {\sum_{(u,v) \in V^2} d_{uv} = 1}. We also wanted {d_{uv} \in \{ 0, 1\}}. But this is still not enough, we want {d_{uv} } to be of a specific form called cut metric, i.e., there is a partition of {V} in two sets and {d_{uv} = 1} iff {u} and {v} are in different sets. We relax it a bit by just asking it to be a metric, i.e, non-negative value so that the triangle inequality holds:

\displaystyle d_{uv} \leq d_{uw} + d_{wv}

By relaxing a minimization problem we might get a solution which has actually smaller than the previous problem. This gives however a lower bound to the result. In general, finding a good lower bound is the first step in the analysis of approximation algorithms.

Theorem 1 The solution of the flux problem is lower-bounded by the solution of the following LP:

\displaystyle \left. \begin{aligned} & \min \sum_{(u,v) \in E} d_{uv} \text{ s.t. } \\ & \qquad \left\lbrace \begin{aligned} & \sum_{(u,v) \in V^2} d_{uv} = 1 \\ & d_{uv} \leq d_{uw} + d_{wv} \\ & d_{uv} \geq 0 \end{aligned} \right. \end{aligned} \right. \ \ \ \ \ (1)

The idea for approximating this is to drop the integrality constraint, solve the LP to get a general metric and the round it to obtain a cut. Suppose {\{d_{uv}\}} is the optimal solution to the LP version of the program above. How to get a cut from it?

To do it, we will use a powerful result due to Bourgain, but first, let’s discuss some basics about metrics and embeddinds. A metric space is a pair {(X,d)} where {X} is a set and {d:X^2 \longrightarrow {\mathbb R}_+}. Examples of metric spaces are:

  1. {X} is the set of vertices of a weighted graph {G} and {d(u,v)} is the distance from {u} to {v} in the graph;
  2. {X = {\mathbb R}^n} and {d} is the {L_p}-norm, i.e., {d(u,v) = ( \sum_{i=1}^n \vert u_i - v_i \vert^p )^{1/p}}, when {p = 2}, this is the Euclidian norm, when {p = 1}, it is the Manhattan distance. This space is called {L^n_p} ;
  3. {X = {\mathbb R}^n} and {d} is the {L_\infty}-norm, i.e., {d(u,v) = \max_i \vert u_i - v_i \vert}. It is called {L_\infty} norm, because it {\Vert u - v \Vert_\infty = \lim_{p \rightarrow \infty} \Vert u - v \Vert_p} where {\Vert u - v \Vert_p} denoted the distance in the {L_p} norm. This space is called {L^n_\infty}

Given two metric spaces {(X,d_X)} and {(Y, d_Y)}, we say that a mapping {\phi:X \longrightarrow Y} is a distortion {D} embedding:

\displaystyle d_X (u,v) \leq d_Y(\phi(u), \phi(v)) \leq D d_X (u,v)

to make our life easier, we use most of the times the following (not so) weaker definition: we say it is a distortion {D} embedding if there is some factor {\gamma > 0} such that:

\displaystyle \gamma d_X (u,v) \leq d_Y(\phi(u), \phi(v)) \leq D \gamma d_X (u,v)

this is the same but allowing some scaling.

Bourgain’s Theorem says that:

Theorem 2 (Bourgain) For any metric space {(X,d)} with {|X| = n} there is an embedding into {L_1^{O(\log^2 n)}} with distortion {O(\log n)}. And we can construct this embedding in poly-time using a randomized algorithm.

Supose {\phi} is such embedding. Then we have a function {\phi : V \rightarrow {\mathbb R}^{O(log^2 n)}} such that {d_{uv} \leq \Vert \phi(u) - \phi(v) \Vert \leq d_{uv} \cdot O(\log n)}. Now, this gives us {O(n \log^2 n)} cuts obtained in the following way: for each coordinate {i} of {{\mathbb R}^{O(log^2 n)}}, order the {n} points according to {\phi_i(u)} and for each {j = 1 \ldots n-1} take the cut {S} formed by the first {j} points. Let this be the cut {S_{ij}} Then, take the minimum of those cuts.

Theorem 3 The procedure described above generates a cut within a factor of {O(\log n)} to the optimal in poly-time.

To prove this, first we need some notation. Given the cut {(S_{ij}, \overline{S_{ij}})}, let {d_{S_{ij}}} be the cut-distance associated with it, i.e., {d_{S_{ij}} (u,v)} is {0} if {u} and {v} are in the same side of the cut and {1} otherwise. Let also {\{ x_{i1} \leq x_{i1} \leq \ldots \leq x_{in} \} = \{ \phi_i(v); v \in V \}}. It is not difficult to see that:

\displaystyle \vert \phi_i(u) - \phi_i(v) \vert = \sum_{j=1}^{n-1} (x_{i,j+1} - x_{i,j}) d_{S_{ij}} (u,v)

and, therefore:

\displaystyle \Vert \phi(u) - \phi(v) \Vert_1 = \sum_{i=1}^{O(\log^2 n)} \sum_{j=1}^{n-1} (x_{i,j+1} - x_{i,j}) d_{S_{ij}} (u,v)

Now, we can go ahead and prove the theorem: Let {f^*} be the optimal flux and {f} the flux obtained by the algorithm. We have that:

\displaystyle f^* \leq f = \min_{ij} \frac{ \sum_{(u,v)\in E} d_{S_{ij}(u,v)}}{ \sum_{(u,v)\in V^2} d_{S_{ij}(u,v)}} = \min_{ij} \frac{ c_{ij} \sum_{(u,v)\in E} d_{S_{ij}(u,v)}}{ c_{ij} \sum_{(u,v)\in V^2} d_{S_{ij}(u,v)}}

where {c_{ij} = x_{i,j+1} - x_{i,j}}. Then:

\displaystyle f \leq \frac{\sum_{ij} \left[ c_{ij} \sum_{(u,v)\in E} d_{S_{ij}}(u,v) \right] }{\sum_{ij} \left[ c_{ij} \sum_{(u,v)\in V^2} d_{S_{ij}(u,v)} \right] } = \frac{\sum_{(u,v)\in E} \left[ \sum_{ij} c_{ij} d_{S_{ij}}(u,v) \right] }{ \sum_{(u,v)\in V^2} \left[ \sum_{ij} c_{ij} d_{S_{ij}(u,v)} \right] }

but this is exacly the expression of {L_1}-norm, so:

\displaystyle f \leq \frac{\sum_{(u,v)\in E} \Vert \phi(u) - \phi(v) \Vert_1 }{ \sum_{(u,v)\in V^2} \Vert \phi(u) - \phi(v) \Vert_1 } \leq \frac{\sum_{(u,v)\in E} O(\log n) d_{uv} }{ \sum_{(u,v)\in V^2} d_{uv} }

which is smaller than {f^* O(\log n)} since this is the solution to the LP, which is a relaxed version of the original problem. And this finishes the proof.

This is one of the first applications to metric embeddings. It was introduced in a paper by Linal, London and Rabinovich. There are very nice resources on the web about metric embeddins, as the course by Anupam Gupta at CMU, the course by Michael Goemans at MIT and the course by Tim Roughgarden at Stanford.

This is also an area full of exciting open problems. It is known, for example, that it is hard to decide if a given metric is embeddable in {L_1} or not. But given a metric that we know that is isometrically embeddable in {L_1}, how to find the embedding? This is an open problem. In fact, it is unknown even how to approximate this embedding by a factor better then {\log n}, which is given by Bougain’s Theorem.