Alon Krymgand

The Christofides Algorithm

Approximating the Metric TSP Problem: An Introduction to Combinatorial Optimization.
  • May 20th 2023
  • 13 min read
  • #SoME3
  • Algorithms
  • Programming

The traveling salesman problem (or TSP for short) is a well-studied problem in computer science and graph theory: Given a graph G=(V,E)G=(V, E) and a cost function on the edges c:ER+c: E \to \R^+, what is the cheapest cycle that visits all vertices of the graph?

A demonstration of the TSP Problem: The distance between any two vertices is the Euclidean distance between them. Showing the steps of Christofides' algorithm to calculate a 1.5-approximation.

Finding such cycle is very useful on its own, but it also has many applications and usages in other problems. An obvious use case would be planning the route for package delivery service at the beginning of the day, such that all packages are delivered and the minimum amount of fuel is used. Another more interesting application would be the design of a network or circuit board, aiming to optimize the efficiency and data transition times, while minimizing the size and costs. TSP also appears as a sub-problem in many other fields that may not seem related at all at first, such as DNA sequencing and astronomy.[1]

TSP is HARD.

Theoretical computer scientists love to study problems and the relations between them, in terms of their complexity. This field of computer science is called computational complexity theory. It turns out that the TSP problem is hard (NP-Hard). Not only that, the problem of determining if a given cycle (which visits all vertices of the graph) is the cheapest cycle, is hard too (NP-Complete). And if that wasn’t enough, it has been proven that TSP can’t be approximated (1+ϵ1+\epsilon approximation in polynomial time), even with “easier” versions of TSP when we allow some constraints on the structure of the graph and the cost function. This fact is known as the inapproximability of the TSP problem.[2]

So, is all hope lost?

Well, not exactly. In this article, I will present the Christofides Algorithm[3], which is a classic approximation algorithm in the field of combinatorial optimization. The algorithm does not solve the general TSP problem (because it’s hard!), but rather the Metric TSP Problem, which is an easier variant of the problem that provides some constraints on the input graph and cost function. Since we are talking in the context of graph theory, I present those constraints in their graphic definition:

  1. The graph GG is a clique, meanings that there is an edge between every two vertices in VV. A formal way to say that would be that G=(V,V×V)G=(V, V \times V), where ×\times denotes the cartesian product of two groups.

  2. The costs function cc satisfies the triangle inequality: c(α,β)+c(β,γ)c(α,γ)c(\alpha, \beta) + c(\beta, \gamma) \ge c(\alpha, \gamma) when α,β,γ\alpha, \beta, \gamma are vertices V\in V.

Although it is not as general as the non-metric TSP problem, Metric TSP is still a very interesting problem with many applications. In fact, most of the applications I’ve listed in the introduction are actually cases where solving the Metric TSP is sufficient. For example, it is not difficult to prove that the following two distance functions between two points on a plane p1=(x1,y1)p_1 = (x_1, y_1) and p2=(x2,y2)p_2 = (x_2, y_2) satisfy the inequality:

  • The Manhattan[4] distance: x2x1+y2y1|x_2-x_1| + |y_2-y_1|

  • The Euclidian[5] distance: (x2x1)2+(y2y1)2\sqrt{(x_2-x_1)^2 + (y_2-y_1)^2}

Throughout this article, I will use the phrases cost, weight or distance interchangeably. All of them refer to the same function cc. The cost of a collection of edges (cost of a path or a cycle) is defined as the sum of costs over all edges in the collection.

Approximation Algorithms

In the field of theoretical computer science, and especially in combinatorial optimization, we usually try to solve hard (NP-Hard) problems using approximation algorithms. I won’t dive too deeply into computational complexity here, but in general, when (and if) we know that a certain optimization problem is hard to solve exactly in polynomial time, we can instead solve it approximately. Such a solution may not be the best solution, but we would still want to prove some bound on “how good the solution is with respect to the optimal solution”. There are multiple ways that one can do so, but in this article, we will use an approximation factor.[6]

We say that an algorithm is an α\alpha-approximation algorithm (factor α\alpha approximation) for a problem if and only if for every instance of the problem, our algorithm will find a solution within a factor of α\alpha of the optimum solution.

Since TSP is a minimization problem (we want to find a TSP cycle with minimal cost), then α1\alpha\ge1, and such algorithm will always find a cycle whose total cost is at most α\alpha times the cost of the optimal TSP cycle.[7]

Cycle Shortcutting

One very useful property that we will use in the algorithm analysis is that under the metric setting, any two consecutive edges in a path v1v2v3v_1 \to v_2 \to v_3 can be shortened into a shorter path v1v3v_1 \to v_3 by skipping over the middle vertex. This is possible since the graph is complete, so an edge v1v3v_1 \to v_3 always exists, and because of the triangle inequality, we know that c(v1,v2)+c(v2,v3)c(v1,v3)c(v_1, v_2) + c(v_2, v_3) \ge c(v_1, v_3).

From now on, given a graph GG we will denote an optimal TSP Cycle of it with CC^*. Such a path is an ordered set of edges of the graph, where each edge is directed, and every two adjacent edges in the cycle are of the form ik,kji \to k, k \to j, for some i,j,kVi, j, k \in V (notably, adjacent edges share a vertex). Additionally, of course, the first and last vertices are the same.

The cycle CC^* must visit every vertex at least once (by definition), so it must contain at least n=Vn = |V| edges. On the other hand, if CC^* contains more than nn edges, then there exists a vertex that appears twice in CC^*, and one of those appearances can be skipped, resulting in a shorter (or equally weighted) cycle. Hence, there must exist an optimal TSP cycle CC^* with exactly nn edges.

Bounding TSP with MST

It turns out that the cost of the minimum spanning tree TT of a graph GG is actually a lower bound on CC^*, meaning that c(C)c(T)c(C^*) \ge c(T). This is because removing any edge ee from CC^* results in a tree whose edges are C{e}C^* \setminus \{e\}. This tree spans all vertices of the original graph and thus is a spanning tree. By definition, the weight of TT is minimal across all spanning trees of GG, and thus c(C{e})c(T)c(C^* \setminus \{e\}) \ge c(T). From that we easily show that the cost of CC^* is at least c(T)c(T):

c(C)c(C)c(e)=c(C{e})c(T)c(C^*) \ge c(C^*) - c(e) = c(C^* \setminus \{e\}) \ge c(T)

Unlike TSP, finding the MST of a graph is considered easy, and there are multiple efficient, popular algorithms that do that (Prim’s, Kruskal’s, Borůvka’s). This is very promising! We now have a building block (MST) that we can easily compute, which resembles a TSP cycle very well and is guaranteed to have a lower cost than the optimal TSP.

A weighted graph with his minimum spanning tree highlighted

An Approximation Using MSTs

Using this fact, we can now show a fairly straightforward 22-approximation for the Metric TSP problem:

  1. Find the MST of G=(V,E)G=(V,E), and denote it as T=(V,F)T=(V, F).
  2. Construct a new graph T=(V,F)T' = (V, F') where FF' is the multiset that contains every edge of FF exactly twice. In other words, TT' is the tree TT where all of its edges are duplicated.
  3. Find an Eulerian cycle of TT' and denote it as CC.
  4. Shortcut the cycle CC to a cycle CC' containing exactly nn unique vertices.

Correctness

It is known that an Eulerian cycle can be found in a connected graph if and only if the degree of all vertices is even. Since TT' is the tree TT where each edge is duplicated, the degree of all vertices of TT' must be even. Besides that, every Eulerian cycle of a connected graph with vertices of even degree is a valid TSP Cycle since it must visit all vertices. We apply shortcutting only if it leaves the cycle a valid TSP Cycle, and thus the algorithm yields a valid TSP Cycle as required.

Approximation Factor

c(C)c(C)2c(T)2c(C)c(C') \le c(C) \le 2\cdot c(T) \le 2 \cdot c(C^*)

Hence this simple approximation algorithm actually returns a cycle that costs at most twice as the optimal TSP cycle.

Runtime Analysis

Since both finding an MST and an Eulerian cycle of a graph are easy, polynomial problems, and TT, TT', CC, and CC' are all of polynomial size relative to the size of the input graph GG, the resulting algorithm is polynomial as well; I will leave it to the reader to provide a strict bound.

The Problem of the Naïve Approximation

We have the MST as a lower bound on the optimal TSP Cycle. The problem is that the MST is not a cycle; by duplicating the edges of the tree and finding the Eulerian cycle in the new graph TT', we convert the tree, in a fairly clever manner, into a cycle as desired.

Sadly, this conversion costs a lot in the approximation factor. As mentioned before, all that is required from TT' is that all vertices of it are of even degree. By duplicating all edges of TT we obviously guarantee that, but do we have to duplicate all edges of the MST? Why not leave even vertices untouched, and modify only the bad vertices? Additionally, vertices with degree 5 (for example) would be of degree 10 after duplication, but degree 6 would suffice, as that is also even.

The Handshaking Lemma

The Handshaking lemma states that in any undirected graph, the number of vertices of odd degree is even. The following is a sketch of the proof; Consider a general undirected graph G=(V,E)G=(V,E). Since every edge in a graph contributes exactly twice to the degree of some two vertices in the graph, we get that

vVdeg(v)=2E\sum_{v \in V} \deg(v) = 2 \left| E \right|

meaning that the sum of degrees across all vertices in the graph is even. If we partition VV into two sets V=VevenVoddV =V_\text{even} \sqcup V_\text{odd} , then we get

vVodddeg(v)+vVevendeg(v)even=2Eeven\sum_{v \in V_\text{odd}}{ \deg(v) } + \underbrace{\sum_{v\in V_\text{even}}{\deg(v)}}_\text{even} = \underbrace{2 \left| E \right|}_\text{even}

Hence vVodddeg(v)\sum_{v \in V_\text{odd}} \deg(v) must be even too. By definition, all elements in the sum are even, meaning that Vodd|V_\text{odd}| must be even, as required.[8]

Recall that in the naïve algorithm, we have the MST and we want to convert it to a graph with even vertices only. By the Handshaking lemma, we know that the number of bad vertices (odd degree vertices) must be even, which is a crucial observation towards Christofides’ Algortihm.

Finally… Christofides’ Algorithm

Christofides’ algorithm is an elegant and revolutionary algorithm that provides a 1.51.5-approximation for the Metric TSP problem. In other words, it outputs a cycle of the graph where the cost of the cycle (which is defined by the sum of all costs of edges in the cycle), is at most 50% more than the cost of the shortest cycle possible.

A demonstration of the TSP Problem: The distance between any two vertices is the Euclidean distance between them. Showing the steps of Christofides' algorithm to calculate a 1.5-approximation.

The algorithm steps follow:

  1. Find the MST of G=(V,E)G=(V,E), and denote it as T=(V,F)T=(V, F).
  2. Denote with UVU \subseteq V the set of vertices with odd degree in relation to TT.
  3. Find a Minimum Weight Perfect Matching (MPM) of vertices UU, and denote such matching with JJ.
  4. Construct a new graph T=(V,FJ)T' = (V, F \cup J), merging the MST with the MPM, where FJF \cup J is a multiset that may contain the same edge more than once.
  5. Find an Eulerian cycle of TT' and denote it as CC.
  6. Shortcut the cycle CC to a cycle CC' containing exactly nn unique vertices.

Correctness

The key to the correctness of the algorithm is the handshaking lemma, which states that in all undirected graphs, the number of vertices of an odd degree is even. This means that a perfect matching JJ of the odd degree vertices UU always exists since the graph is a clique.

The matching JJ adds exactly one edge endpoint to all vertices of odd degree in the MST TT, making the union of the MST and MPM TT' an undirected graph with even degree vertices only. This, as discussed before, is sufficient for finding an Eulerian cycle CC in TT', which is a valid TSP cycle. The way we apply shortcutting keeps the cycle a TSP cycle and thus the algorithm yields a valid TSP Cycle as required.

Approximation Factor

Consider an optimal TSP cycle CC^*. By shortcutting, we can transform such cycle to a new cycle of even length CUC^*_U, that visits only vertices of odd degree of TT (meaning that they are U\in U) and skip over vertices of even degree. We only use shortcuts to get to CUC^*_U from CC^*, and hence c(CU)c(C)c(C^*_U) \le c(C^*).

An example of a cycle C' of length 8 with the matching M1 highlighted

We can now partition the cycle CUC^*_U into two different matchings M1,M2M_1, M_2 of vertices U\in U. Without loss of generality, denote U={v1,v2,...,vk}U=\{v_1, v_2, ..., v_k\} where kk is even. Then,

M1={(v1,v2),(v3,v4),...,(vk1,vk)}M2={(v2,v3),(v4,v5),...,(vk,v1)}M_1 = \{ (v_1, v_2), (v_3, v_4), ..., (v_{k-1}, v_k) \} \\ M_2 = \{ (v_2, v_3), (v_4, v_5), ..., (v_k, v_1) \}

By this construction, we get that the edges of the cycle CUC^*_U are exactly M1M2M_1 \sqcup M_2. Since both M1M_1 and M2M_2 are perfect matchings of all vertices U\in U, they both serve as upper bounds on the minimum weight perfect matching JJ of those vertices. We get that:

c(C)c(CU)=c(M1)+c(M2)2c(J)c(C^*) \ge c(C^*_U) = c(M_1) + c(M_2) \ge 2 \cdot c(J)

Hence we get that c(J)c(C)2c(J) \le {c(C^*) \over 2}. Since the algorithm yields an Eulerian cycle CC of edges FF of the MST TT joined with JJ, we get

c(C)=c(F)+c(J)1.5c(C)c(C) = c(F) + c(J) \le 1.5 \cdot c(C^*)

Hence the algorithm yields a cycle CC which costs at most 1.51.5 times the cost of the optimal TSP cycle CC^*, meaning that the algorithm has an approximation factor of 1.51.5 as required.

Runtime Analysis

The analysis is similar to the naïve algorithm analysis, with the addition of finding the minimum weight perfect matching. Using Edmond’s Blossom algorithm, it is possible to find the minimum weight perfect matching in a polynomial O(n3)\mathcal{O}(n^3) time. I won’t dive into Edmond’s algorithm in this article; It turns out that finding the matching is actually the bottleneck of Christofides’ algorithm, and it too runs in O(n3)\mathcal{O}(n^3).

Christofides in Practice

When I first saw Christofides’ algorithm, I thought to myself that the 1.5 approximation factor is pretty 💩! I mean, if I was the head of some delivery company, I would surely not be happy to know that the route planning algorithm can be improved by 33%. The practical implications of how close we can approximation TSP is huge in the real world.

Well, remember that 1.5 is an upper bound on the approximation ratio. In practice, especially on large dense graphs, the algorithm yields a cycle that is not that far from the optimal cycle on average. Try it yourself! play with the simulation above, and try to get it to output a cycle that is clearly not close to the optimal. Note that since computing the actual optimal TSP cycle is exponentially hard, comparing the approximation to the optimal TSP cycle is emitted from the simulation.[9]

Does a better approximation exist?

Christofides published his paper in 1976, and for almost 50 years (!) it was the best-known approximation algorithm for the Metric TSP problem. In 2020 however, a paper that consists of 90 pages was published with the following abstract: “For some ϵ>1036\epsilon > 10^{-36} we give a randomized 3/2ϵ3/2 - \epsilon approximation algorithm for metric TSP.”[10]

Although the algorithm is much more complex (I did not even try to understand it), and the approximation improvement is clearly not sufficient, it actually proves that 3/23/2 is not the lower bound of the approximation factor. This gives hope that an efficient and elegant algorithm exists whose factor is a magnitude better than 1.51.5, and we just haven’t discovered it yet. The paper was published at STOC’21 and for this reason, received the best paper award.

References and Further Reading

I was heavily inspired to write this post after a lecture about the topic of Prof. Feldman Moran in the course Combinatorial Optimization at the University of Haifa, spring 2023. Proofreading was done by Almog Ben Chen. Thanks!

  1. Travelling salesman problem Wikipedia
  2. New Inapproximability Bounds for TSP Marek Karpinski, Michael Lampis, Richard Schmied
  3. Worst-case analysis of a new heuristic for the traveling salesman problem Nicos Christofides
  4. Taxicab geometry Wikipedia
  5. Euclidean distance Wikipedia
  6. Approximation Algorithms Wikipedia
  7. Discrete Mathematics and Algorithms (Lecture notes) Reza Zadeh
  8. Handshaking Lemma Wikipedia
  9. A Probabilistic Analysis of Christofides' Algorithm Markus Blaser, Konstatinos Panagiotou, B. V. Raghavendra Rao
  10. A (Slightly) Improved Approximation Algorithm for Metric TSP Anna R. Karlin, Nathan Klein, Shayan Oveis Gharan