Random Points

3 Easy Steps to Set Up Pyspark

Starting with Spark 2.2, it is now super easy to set up pyspark.

  1. Download Spark

    Download the Spark tarball from the Spark website and untar it:

    $ tar zxvf spark-2.2.0-bin-hadoop2.7.tgz

  2. Install pyspark

    If you use conda, simply do:

    $ conda install pyspark

    or if you prefer pip …

Now Working At Tesla

It's been an incredible year with lots of changes. Partly because of my previous blog posts, I'm now working as a software engineer at Tesla. I started at the end of August and it's been an amazingly rewarding experience so far.

I had been a fan of the company for …

The Traveling Tesla Salesman (Part 2)

In part one of this blog post I looked at a simplified distance metric: the straight-line (i.e. great-circle) distance between two points on Earth. A popular question has been: what about using the actual driving distances? Let's have some fun with that here in part two.
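For reference, here is a minimal sketch of a great-circle metric. It assumes a spherical Earth with a mean radius of 6371 km and uses the haversine formula, which is not necessarily what part one used. The function name `great_circle_km` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Central angle in radians, scaled by the radius to get an arc length
    return 2 * EARTH_RADIUS_KM * math.atan2(math.sqrt(a), math.sqrt(1 - a))

# A quarter of the way around the equator is (pi / 2) * R, about 10007.5 km.
print(round(great_circle_km(0, 0, 0, 90), 1))
```

Driving distances can only be longer than this, since no road is shorter than the arc over the sphere.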

I will actually look at two additional metrics here: driving distances and driving times. Both are available via the Google Directions API.

The Traveling Tesla Salesman

Traveling Salesman Problem

The Traveling Salesman Problem (TSP) is quite an interesting math problem. It simply asks: Given a list of cities and the distances between them, what is the shortest possible path that visits each city exactly once and returns to the origin city?

It is a very simple problem to describe and yet very difficult to solve. TSP is known to be NP-hard and a brute-force solution can be incredibly expensive computationally. Even with just $200$ cities, with the brute-force method you have this many possible permutations to check:


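To make the brute-force cost concrete, here is a minimal Python sketch. The 4-city symmetric distance matrix is a made-up example, not data from this post, and `brute_force_tsp` is a hypothetical helper name. Fixing city 0 as the start, it still has to try all $(n-1)!$ orderings of the remaining cities, which is why this approach is hopeless well before $200$ cities.

```python
import math
from itertools import permutations

def tour_length(tour, dist):
    """Length of a closed tour that returns to its starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def brute_force_tsp(dist):
    """Try every ordering of cities 1..n-1 after a fixed start at city 0."""
    n = len(dist)
    best_tour, best_len = None, math.inf
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        length = tour_length(tour, dist)
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len

# Hypothetical symmetric distances between 4 cities
dist = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
print(brute_force_tsp(dist))  # ((0, 1, 3, 2), 80)
```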
Computing Sample Variance: Why Divide by N - 1?

Variance Estimation

In statistics we know that the mean and variance of a population $Y$ are defined to be:

\begin{equation} \left\{ \begin{aligned} \text{Mean}(Y) &= \mu = \frac{1}{N} \sum_{i=1}^{N} Y_i \\ \text{Var}(Y) &= \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \mu)^2 \\ \end{aligned} \right. \end{equation}

where $N$ is the size of the population.
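The $N - 1$ divisor matters when estimating $\sigma^2$ from a sample rather than the whole population. Here is a small exact check under stated assumptions: a toy population $\{1, 2, 3\}$ and samples of size $n = 2$ drawn with replacement (i.i.d.). Averaging over all nine equally likely samples, dividing by $n$ underestimates $\sigma^2 = 2/3$, while dividing by $n - 1$ recovers it exactly. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def mean(xs):
    """Exact mean as a Fraction."""
    return Fraction(sum(xs), len(xs))

def var_divide_by(xs, d):
    """Sum of squared deviations from the mean, divided by d."""
    m = mean(xs)
    return sum((Fraction(x) - m) ** 2 for x in xs) / d

population = [1, 2, 3]
sigma2 = var_divide_by(population, len(population))  # population variance: divide by N

n = 2
samples = list(product(population, repeat=n))  # all 9 equally likely samples (with replacement)
avg_biased = sum(var_divide_by(s, n) for s in samples) / len(samples)
avg_unbiased = sum(var_divide_by(s, n - 1) for s in samples) / len(samples)
print(sigma2, avg_biased, avg_unbiased)  # 2/3 1/3 2/3
```

The biased average equals $\frac{n-1}{n}\sigma^2$, which is exactly the factor that dividing by $n - 1$ undoes.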

Can Integer Operations Overflow in Python?

Integer representations

Integers are typically represented in memory as a base-2 bit pattern, and in Python the built-in function bin() can be used to inspect that:
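A minimal illustration using only built-ins. Note that bin() prints negative numbers as a sign plus the magnitude, not as a two's-complement pattern.

```python
# Inspect the base-2 pattern of a few integers with the built-in bin().
for n in (5, 255, 1024, -5):
    print(n, bin(n), n.bit_length())
# 5 0b101 3
# 255 0b11111111 8
# 1024 0b10000000000 11
# -5 -0b101 3

# int(..., 2) goes the other way, parsing a base-2 string.
print(int("101", 2))  # 5
```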


If the number of bits used is fixed, the range of integers that can be represented would …
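As a sketch of what a fixed width implies, the helpers below compute the unsigned and two's-complement signed ranges for a given number of bits, and simulate wraparound by masking to that width. These are hypothetical helpers that only model fixed-width arithmetic; they say nothing about how Python's own int type behaves.

```python
def unsigned_range(bits):
    """Smallest and largest unsigned integer representable in `bits` bits."""
    return 0, 2 ** bits - 1

def signed_range(bits):
    """Smallest and largest two's-complement integer representable in `bits` bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def wrap_signed(value, bits):
    """Simulate fixed-width overflow: keep the low `bits` bits, reinterpret as two's complement."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value

print(signed_range(8))        # (-128, 127)
print(unsigned_range(8))      # (0, 255)
print(wrap_signed(127 + 1, 8))  # -128: one past the maximum wraps to the minimum
```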

Simple Proof for Kraft's Inequality

In graduate school I came up with an original proof of Kraft's inequality. The proofs I could find in textbooks tend to be more complicated and less intuitive to me, so I'd like to share my proof here.

Prefix Codes

First it is important to understand the concept of …
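For a binary code with codeword lengths $l_1, \ldots, l_m$, Kraft's inequality states that a prefix code satisfies $\sum_i 2^{-l_i} \le 1$. The sketch below, with hypothetical helper names, checks whether a set of codewords is prefix-free (no codeword is a prefix of another) and computes the Kraft sum exactly. It uses the fact that, after lexicographic sorting, any codeword that is a prefix of another is also a prefix of its immediate successor.

```python
from fractions import Fraction

def is_prefix_free(codewords):
    """True if no codeword is a prefix of another (duplicates also fail)."""
    words = sorted(codewords)
    # Words sharing a prefix w sit in a contiguous run right after w in sorted order.
    return all(not words[i + 1].startswith(words[i]) for i in range(len(words) - 1))

def kraft_sum(codewords, alphabet_size=2):
    """Exact value of sum(alphabet_size ** -len(w)) over all codewords."""
    return sum(Fraction(1, alphabet_size ** len(w)) for w in codewords)

code = ["0", "10", "110", "111"]
print(is_prefix_free(code), kraft_sum(code))  # True 1
```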

Fibonacci Numbers in Python

Fibonacci numbers

The Fibonacci numbers are defined recursively by the following difference equation:

\begin{equation} \left\{ \begin{aligned} F_{n} & = F_{n-1} + F_{n-2} \\ F_1 & = 1 \\ F_0 & = 0 \\ \end{aligned} \right. \end{equation}

It is easy to compute the first few elements in the sequence:

$0, 1, 1, 2, 3, 5, 8, 13, 21, 34, \cdots$
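Two minimal sketches of computing the sequence in Python (the function names are hypothetical). The first transcribes the recurrence directly and relies on functools.lru_cache to avoid exponential recomputation. The second walks the recurrence forward in $O(n)$ time and constant space.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_recursive(n):
    """Direct transcription of F_n = F_{n-1} + F_{n-2}, memoized."""
    if n < 2:
        return n  # F_0 = 0, F_1 = 1
    return fib_recursive(n - 1) + fib_recursive(n - 2)

def fib_iterative(n):
    """Carry the pair (F_k, F_{k+1}) forward n times."""
    a, b = 0, 1  # F_0, F_1
    for _ in range(n):
        a, b = b, a + b
    return a

print([fib_iterative(n) for n in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```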