Random Points

Left Tesla

After an incredible 6.5 years of working at Tesla, I left in March this year. It has been a life-changing experience to work with and learn from so many brilliant people there.

I plan to take the rest of the year off to focus on …

3 Easy Steps to Set Up Pyspark

Starting with Spark 2.2, it is now super easy to set up pyspark.

  1. Download Spark

    Download the spark tarball from the Spark website and untar it:

    $ tar zxvf spark-2.2.0-bin-hadoop2.7.tgz

  2. Install pyspark

    If you use conda, simply do:

    $ conda install pyspark

    or if you prefer pip …

Easily Profile Python Code in Jupyter

line_profiler is an excellent tool that can help you quickly profile your Python code and find where the performance bottlenecks are. In this blog post I will walk through a simple example and share a few tips about using this tool within the Jupyter notebook.

Installation

To install line_profiler …

Now Working At Tesla

It's been an incredible year with lots of changes. Partly because of my previous blog posts, I'm now working as a software engineer at Tesla. I started at the end of August and it's been an amazingly rewarding experience so far.

I had been a fan of the company for …

The Traveling Tesla Salesman (Part 2)

In part one of this blog post I looked at a simplified distance metric: the straight-line (i.e. great-circle) distance between two points on Earth. A popular question has been: how about using the actual driving distances? Let's have some fun with this here in part two.

I will actually look at two additional metrics here: driving distances and driving times. Both are available via the Google Directions API.
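For reference, the straight-line distance over the Earth's surface mentioned above (usually called the great-circle distance) can be sketched with the haversine formula. This is one common way to compute it and not necessarily the code used in part one; the function name `great_circle_km` and the mean Earth radius of 6371 km are assumptions made for this illustration.

```python
import math

# Assumption: mean Earth radius in kilometers (the Earth is not a perfect sphere).
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Two points on the equator, 90 degrees of longitude apart:
# a quarter of the way around the globe.
quarter = great_circle_km(0.0, 0.0, 0.0, 90.0)
```

Driving distances and driving times are generally longer than this value and need not be symmetric, which is why they are treated as separate metrics.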

The Traveling Tesla Salesman

Traveling Salesman Problem

The Traveling Salesman Problem (TSP) is quite an interesting math problem. It simply asks: Given a list of cities and the distances between them, what is the shortest possible path that visits each city exactly once and returns to the origin city?

It is a very simple problem to describe and yet very difficult to solve. TSP is known to be NP-hard and a brute-force solution can be incredibly expensive computationally. Even with just $200$ cities, with the brute-force method you have this many possible permutations to check:

In [1]:
import math
math.factorial(200)
Out[1]:
788657867364790503552363213932185062295135977687173263294742533244359449963403342920304284011984623904177212138919638830257642790242637105061926624952829931113462857270763317237396988943922445621451664240254033291864131227428294853277524242407573903240321257405579568660226031904170324062351700858796178922222789623703897374720000000000000000000000000000000000000000000000000
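To make the brute-force idea concrete, here is a minimal sketch that tries every ordering of a tiny, made-up set of cities. The distance matrix `DIST` and the helper names are hypothetical. The sketch fixes the starting city, so it checks $(n-1)!$ orderings; the growth is still factorial, which is why this approach is hopeless beyond a handful of cities.

```python
import itertools
import math


def tour_length(order, dist):
    """Total length of the closed tour that visits cities in `order` and returns to the start."""
    return sum(dist[a][b] for a, b in zip(order, order[1:] + order[:1]))


def brute_force_tsp(dist):
    """Check every tour that starts at city 0 and return the shortest one."""
    n = len(dist)
    best_order, best_len = None, math.inf
    for perm in itertools.permutations(range(1, n)):
        order = (0,) + perm
        length = tour_length(order, dist)
        if length < best_len:
            best_order, best_len = order, length
    return best_order, best_len


# Hypothetical symmetric distances between 4 cities (illustration only).
DIST = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

best_order, best_len = brute_force_tsp(DIST)
print(best_order, best_len)
```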

Computing Sample Variance: Why Divide by N - 1?

Variance Estimation

In statistics we know that the mean and variance of a population $Y$ are defined to be:

\begin{equation} \left\{ \begin{aligned} \text{Mean}(Y) &= \mu = \frac{1}{N} \sum_{i=1}^{N} Y_i \\ \text{Var}(Y) &= \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \mu)^2 \\ \end{aligned} \right. \end{equation}

where $N$ is the size of the population.
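As a small sanity check of these two definitions, the sketch below computes the population mean and variance by hand for a made-up population and compares the variance with the standard library's `statistics.pvariance`. The data and the function name `population_mean_and_variance` are assumptions for illustration.

```python
import statistics


def population_mean_and_variance(values):
    """Mean and variance of a full population, dividing by N in both cases."""
    n = len(values)
    mu = sum(values) / n
    var = sum((y - mu) ** 2 for y in values) / n
    return mu, var


# Hypothetical population of N = 8 values.
population = [2, 4, 4, 4, 5, 5, 7, 9]

mu, var = population_mean_and_variance(population)
# statistics.pvariance also divides by N, so the two results should agree.
library_var = statistics.pvariance(population)
print(mu, var, library_var)
```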

Can Integer Operations Overflow in Python?

Integer representations

Integers are typically represented in memory as a base-2 bit pattern, and in Python the built-in function bin can be used to inspect that:

In [1]:
bin(19)
Out[1]:
'0b10011'
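A few related built-ins are handy alongside bin. The sketch below, with a hypothetical helper named `describe_bits`, shows how to get the bit pattern without the `0b` prefix, how many bits it needs, and how to turn the pattern back into an integer.

```python
def describe_bits(n):
    """Return the binary digits of a non-negative int and the number of bits needed."""
    bits = format(n, "b")     # same digits as bin(n), without the '0b' prefix
    width = n.bit_length()    # number of bits needed to represent n
    return bits, width


bits, width = describe_bits(19)
print(bits, width)

# int() with base 2 parses the bit pattern back into the original integer.
round_trip = int(bits, 2)
print(round_trip)
```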

If the number of bits used is fixed, the range of integers that can be represented would …

Simple Proof for Kraft's Inequality

In graduate school I came up with an original proof for Kraft's inequality. The proofs I could find in textbooks tend to be more complicated and less intuitive to me, so I'd like to share my proof here.
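For reference, the standard statement of the inequality is as follows. The notation is an assumption chosen for this note and may differ from the one used in the proof: a prefix code over an alphabet of size $D$ with codeword lengths $l_1, \dots, l_n$ must satisfy

\begin{equation} \sum_{i=1}^{n} D^{-l_i} \le 1 \end{equation}

For example, the binary prefix code $\{0, 10, 110, 111\}$ has lengths $1, 2, 3, 3$, and $2^{-1} + 2^{-2} + 2^{-3} + 2^{-3} = 1$.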

Prefix Codes

First it is important to understand the concept of …