Traveling Salesman Problem¶
The Traveling Salesman Problem (TSP) is quite an interesting math problem. It simply asks: Given a list of cities and the distances between them, what is the shortest possible path that visits each city exactly once and returns to the origin city?
It is a very simple problem to describe and yet very difficult to solve. TSP is known to be NP-hard and a brute-force solution can be incredibly expensive computationally. Even with just $200$ cities, with the brute-force method you have this many possible permutations to check:
import math
math.factorial(200)
That's actually a lot more than the total number of atoms in the universe!
Here's an obligatory xkcd for this:
from IPython.display import Image
Image(url='https://imgs.xkcd.com/comics/travelling_salesman_problem.png')
Tesla Superchargers¶
To make the TSP even more exciting, let's make the salesman visit the awesome Tesla Superchargers. As of this writing there are $196$ superchargers in the US, and that number is quickly growing. Let's look at what the optimal path looks like for going through these superchargers as a concrete TSP example.
Optimal Path for Supercharger Traveling¶
I'll go through how I obtained the results in the later sections, but first I'd like to present the optimal path that I found below. You can toggle the display for the superchargers and the optimal path by clicking on the checkboxes.
The optimal path looks pretty awesome, right?
Solving the TSP¶
There are numerous heuristics and approximate solutions for TSP and that is on its own a vast topic. An approximate solution called Christofides's algorithm is provably within $1.5$ times of the optimum. One can also use simulated annealing or genetic algorithms to find solutions that are very close to optimal in most cases.
But here I'm most interested in finding the exact optimum, since we don't have that many nodes, and the distance metric (symmetric geometric distance) is relatively simple. After surveying the literature and searching online, I found the Concorde TSP solver that can find the exact optimal path (instead of approximations) using branch-and-bound algorithms. The basic idea is that when the algorithm branches out to search for the optimum, many of the permutations can actually be safely cut short if it is impossible for a branch to result in a value better than a known better solution. This kind of method has been shown to be the most effective for finding the exact optimum for TSP.
Fetching Coordinates¶
So first we need to find all the supercharger locations. One possible way to do that is to get a list of addresses for them and then geocode the addresses into coordinates. However it turns out that some of the superchargers are in remote places that aren't easily specified by a street address. They are more conveniently specified by latitudes and longitudes.
Luckily the Tesla website contains references to coordinates of all the supercharger locations. We can use simple regular expressions and BeautifulSoup
to parse the pages.
%matplotlib inline
import matplotlib.pyplot as plt
from IPython.core.pylabtools import figsize
figsize(15, 5)
from bs4 import BeautifulSoup
import re
import requests
import numpy as np
import pandas as pd
# get the list of superchargers in the US
url = 'http://www.teslamotors.com/findus/list/superchargers/United+States'
rv = requests.get(url)
content = rv.text
# get link to each supercharger, each page contains the supercharger's coordinates
sc_page_urls = re.findall('(/findus/location/supercharger/\w+)', content)
# get the cooridnates (latitude, longitude) for each supercharger
sc_names = []
sc_coors = {}
for sc_page_url in sc_page_urls:
url = 'http://www.teslamotors.com' + sc_page_url
rv = requests.get(url)
soup = BeautifulSoup(rv.text)
sc_name = soup.find('h1').text
sc_names.append(sc_name)
directions_link = soup.find('a', {'class': 'directions-link'})['href']
lat, lng = directions_link.split('=')[-1].split(',')
lat, lng = float(lat), float(lng)
sc_coors[sc_name] = {'lat': lat, 'lng': lng}
# sort the names
sc_names = sorted(sc_names)
coords = pd.DataFrame.from_dict(sc_coors).T.reindex(sc_names)
coords.head()
Computing Geodesic Distances¶
Now that we've gather all the coordinates, we can start to compute distances. Here is a function that computes the distance between two points on earth specified by latitude-longitude pairs. This function is based on the code on John D. Cook's excellent blog post related to this topic.
def distance_on_earth(lat1, long1, lat2, long2, radius=6378.388):
"""
Compute distance between two points on earth specified by latitude/longitude.
The earth is assumed to be a perfect sphere of given radius. The radius defaults
to 6378.388 kilometers. To convert to miles, divide by 1.60934
Reference
---------
Adopted from John D. Cook's blog post:
http://www.johndcook.com/blog/python_longitude_latitude/
"""
# Convert latitude and longitude to spherical coordinates in radians.
degrees_to_radians = np.pi / 180.0
# phi = 90 - latitude
phi1 = (90.0 - lat1) * degrees_to_radians
phi2 = (90.0 - lat2) * degrees_to_radians
# theta = longitude
theta1 = long1 * degrees_to_radians
theta2 = long2 * degrees_to_radians
# Compute spherical distance from spherical coordinates.
cos = (np.sin(phi1) * np.sin(phi2)* np.cos(theta1 - theta2) +
np.cos(phi1) * np.cos(phi2))
arc = np.arccos(cos)
rv = arc * radius
return rv
Note that we are making the simplifying assumptions that the Earth is a perfect sphere, and that the distance is a simple Euclidean distance, instead of a driving distance. Although one can certainly plug in a different distance metric and follow the same procedure outlined here. (Update: see part two for the results using driving distances)
We can now compute the distances between all pairs of supercharger locations:
# get distances between all pairs
mile_in_km = 1.60934
distances = {}
for i in range(len(sc_names)):
a = sc_names[i]
distances[a] = {}
for j in range(len(sc_names)):
b = sc_names[j]
if j == i:
distances[a][b] = 0.
elif j > i:
distances[a][b] = distance_on_earth(coords.ix[a, 'lat'],
coords.ix[a, 'lng'],
coords.ix[b, 'lat'],
coords.ix[b, 'lng'])
else:
distances[a][b] = distances[b][a]
distances = pd.DataFrame(distances) / mile_in_km
One interesting thing to note is that, for each supercharger in the US, on average there's another one less than $60$ miles away. That's pretty nice.
closest_distances = distances[distances > 0].min()
ax = closest_distances.hist(bins=25)
ax.set_title('histogram of distances to closest superchargers')
ax.set_ylabel('number of superchargers')
ax.set_xlabel('miles')
closest_distances.describe()
Using the Concorde TSP Solver¶
Now we are ready to use the Concorde TSP solver. To use Concorde, you'll need to download a few things and make sure you have a working C compiler. You can find the detailed steps here. I compiled it on OSX Yosemite without issues.
Information about the input/output files for Concorde can be found here. In our particular case, the input file to Concorde can be generated as follows:
# create input file for Concorde TSP solver
sc_id = 0
output = ''
for sc_name in sc_names:
output += '%d %f %f\n' % (sc_id, sc_coors[sc_name]['lat'], sc_coors[sc_name]['lng'])
sc_id += 1
header = """NAME : TTS
COMMENT : Traveling Tesla Salesman
TYPE : TSP
DIMENSION : %d
EDGE_WEIGHT_TYPE : GEOM
NODE_COORD_SECTION
""" % sc_id
with open('sc.tsp', 'w') as output_file:
output_file.write(header)
output_file.write(output)
This creates a .tsp
file that the concorde
executable can process directly, and it outputs the solution in a .sol
file in the same directory where the executable is:
# after running the Concorde executable, parse the output file
solution = []
f = open('../../../TSP/concorde/TSP/sc.sol', 'r')
for line in f.readlines():
tokens = line.split()
solution += [int(c) for c in tokens]
f.close()
assert solution[0] == len(sc_names)
solution = solution[1:] # first number is just the dimension
assert len(solution) == len(sc_names)
Now we have the optimal path!
optimal_path = []
for solution_id in solution:
optimal_path.append(sc_names[solution_id])
# connect back to the starting node
optimal_path.append(sc_names[solution[0]])
optimal_path = pd.Series(optimal_path)
optimal_path.head()
optimal_path.tail()
We can also easily find the total length of the path:
# compute total distance in optimal path
total = 0
for i in range(len(optimal_path) - 1):
total += distances.ix[optimal_path[i], optimal_path[i + 1]]
total
So the total is almost $16,000$ miles, not an easy trip for the salesman!
Finally, we can combine all the results and use the Google Maps API to create the visualization in the earlier section.
Updates¶
- [06/17/2015] Really surprised by the media coverage: Fortune, Popular Mechanics, The Verge, Nautilus. Thank you all for the interests and feedback
- [06/19/2015] Updated to include two new superchargers (total is now 196)
- [06/19/2015] In the previous version of this post, there were numerical precision issues that caused sub-optimal behavior in some subsections. It has been resolved with special thanks to Prof. Bill Cook
- [06/23/2015] By popular demand, I've added a part two of this blog post using driving distances and driving times in addition to the simplified straight line distances.