Home > theory > Minimum average cost cycle and TSP

## Minimum average cost cycle and TSP

After some time, I did again Code Jam – well, not again, this is the first time I do Code Jam, but there is a while I don’t do Programming Competitions. Back in my undergrad I remember all the fun I had with my ACM-ICPC team solving problems and discussing algorithms problems. Actually, ICPC was what made me interested in Algorithms and Theory of Computing for the first time. I was remembering that not only because Code Jam because I came across a nice problem whose solution I learned in programming competitions, specifically a technique I learned to solve this problem.

Let’s formulate the problem in a more abstract way: Given a graph ${G = (V,E)}$ and two functions: a cost function ${c:E \rightarrow {\mathbb R}_+}$ and a benefit function ${b:E \rightarrow {\mathbb R}_+}$, we define the cost-benefit of a set of edges as ${\frac{b(S)}{c(S)}}$. Now, consider those two questions:

Question I: Find the spanning tree of maximum (minimum) cost-benefit.

Question II: Find the cycle of maximum (minimum) cost-benefit.

The solution of those uses binary search. If we can answer the following query: given ${\beta > 0}$, is there a cycle (spanning tree) of cost-benefit smaller (larger) than ${\beta}$? We either state there is no such tree (cycle) or exhibit that. How can we solve this? It is simple: consider the graph ${G}$ with edge weights given by ${b_e - \beta c_e}$. Then there is a cycle (spanning tree) of cost benefit ${\leq \beta}$ if and only if there is a cycle (spanning tree) in this graph with transformed weights with negative total weight. Finding a cycle with negative weight is easy and can be done, for example, using Bellman Ford’s algorithm. Finding a spanning tree with negative weights can be done using any minimal spanning tree algorithm, as Kruskal, Prim or Boruvka.

Taking ${c(e) = 1}$ for all ${e \in E}$ we can find using binary search, the cycle with smallest average length, i.e., smallest ${b(C) / \vert C \vert}$ where ${\vert C \vert}$ is the number of edges in the cycle.

Asymmetric Travelling Salesman Problem

We can use this trick just described to design an ${O(\log n)}$-approximation to the asymmetric TSP problem. Consider we have ${n}$ nodes in ${V}$ and a function ${c: V \times V \rightarrow {\mathbb R}_+}$, not necessarily symmetric, such that the triangular inequality holds, i.e., ${c(i,j) \leq c(i,k) + c(k,j)}$. A TSP tour is an ordering ${\pi: \{1, \hdots, n\} \rightarrow V}$ and has total cost: $\displaystyle c(\pi) = \sum_{j=1}^n c(\pi_j, \pi_{j+1})$

where ${\pi_{n+1} = \pi_1}$. Let OPT be the cost of the optimal tour. It is NP-complete to calculate the optimal, but consider the following approximation algorithm: find the cycle with smallest average cost. Then remove all the nodes in that cycle except one, in the remaining graph find again the cycle of smallest average cost and remove all nodes except one. Continue doing that until there is just one node left. Taking all those cycles together, we have a strongly connected Eulerian graph (in-degrees are equal to out-degrees) for each node). I claim that the total weight of edges ${E'}$ in this Eulerian graph is: $\displaystyle c(E') \leq 2 \mathcal{H}_n \cdot OPT$

where ${\mathcal{H}_n = \sum_{j=1}^n \frac{1}{j} = O(\log n)}$ is the harmonic number. Now, since we have this graph we can find an Eulerian tour and transform it into a TSP tour shortcutting when necessary (triangle inequality guarantees that shortcutting doesn’t decrease the cost of the tour). So, we just need to prove the claim.

In fact, it is not hard to see that after removing some nodes, the optimal tour is still ${\leq OPT}$, where ${OPT}$ is the tour of smallest cost for all nodes. To see this, just take the original tour and shortcut it, for example, if the original tour passed through a sequence of nodes ${i_1, \hdots, i_p}$ but nodes ${i_2, \hdots, i_{p-1}}$ then by triangle inequality: $\displaystyle c(i_1, i_p) \leq \sum_{j=1}^{p-1} c(i_j, i_{j+1})$

so we can just substitute the edges ${(i_1, i_2), \hdots, (i_{p-1}, i_p)}$ by ${(i_1, i_p)}$. Now, suppose we do ${k}$ iterations and in the beginning of the ${j^{th}}$ iteration there are ${n_j}$ nodes left. So, clearly the average length of the cycle we picked in the algorithm is ${\leq \frac{OPT}{n_j}}$ and therefore, if ${C_1, \hdots C_k}$ are the cycles chosen, we have: $\displaystyle c(E') = \sum_{j=1}^k c(C_j) \leq \sum_{j=1}^k \frac{OPT}{n_j} (n_j - n_{j+1} + 1)$

since: $\displaystyle \frac{n_j - n_{j+1} + 1}{n_j} \leq \frac{1}{n_j} + \frac{1}{n_j - 1} + \hdots + \frac{1}{n_{j+1}}$

we plug those two expressions together and we get the claim.

Categories: theory Tags: