# 3 Easy Steps to Set Up Pyspark

Starting with Spark 2.2, it is now super easy to set up pyspark.

$ tar zxvf spark-2.2.0-bin-hadoop2.7.tgz

2. Install pyspark

If you use conda, simply do:

$ conda install pyspark

or if you prefer pip …

# Tips for Running TensorFlow with GPU Support on AWS

In this blog post I will discuss how to get TensorFlow working on the AWS p2 instances, along with some tips about configurations and optimizations. I will assume you are familiar with the basics of AWS, and focus on how to set up TensorFlow with GPU support on AWS.

# Easily Profile Python Code in Jupyter

line_profiler is an excellent tool that can help you quickly profile your Python code and find where the performance bottlenecks are. In this blog post I will walk through a simple example and share a few tips about using this tool within the Jupyter notebook.

## Installation

To install line_profiler with Anaconda …

# Now Working At Tesla

It's been an incredible year with lots of changes. Partly because of my previous blog posts, I'm now working as a software engineer at Tesla. I started at the end of August and it's been an amazingly rewarding experience so far.

I had been a fan of the company for …

# The Traveling Tesla Salesman (Part 2)

In part one of this blog post I looked at a simplified distance metric: the straight-line (i.e. great-circle) distance between two points on Earth. A popular question has been: how about using the actual driving distances? Let's have some fun with this here in part two.
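For reference, here is a minimal sketch of how a great-circle distance can be computed with the haversine formula. The function name `great_circle_km` and the mean Earth radius of 6371 km are assumptions made for illustration; part one may well have computed it differently.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, for illustration only


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine term: depends only on the angular separation of the two points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

A quarter of the way around the equator, for example `great_circle_km(0, 0, 0, 90)`, comes out to one quarter of the circumference implied by the assumed radius.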

I will actually look at two additional metrics here: driving distances and driving times. Both are available via the Google Directions API.

# The Traveling Tesla Salesman

## Traveling Salesman Problem

The Traveling Salesman Problem (TSP) is quite an interesting math problem. It simply asks: Given a list of cities and the distances between them, what is the shortest possible path that visits each city exactly once and returns to the origin city?

It is a very simple problem to describe and yet very difficult to solve. TSP is known to be NP-hard and a brute-force solution can be incredibly expensive computationally. Even with just $200$ cities, with the brute-force method you have this many possible permutations to check:

math.factorial(200)
788657867364790503552363213932185062295135977687173263294742533244359449963403342920304284011984623904177212138919638830257642790242637105061926624952829931113462857270763317237396988943922445621451664240254033291864131227428294853277524242407573903240321257405579568660226031904170324062351700858796178922222789623703897374720000000000000000000000000000000000000000000000000
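To make the brute-force idea concrete, here is a minimal sketch that tries every ordering of a handful of cities. The function name `brute_force_tsp` and the toy planar coordinates are hypothetical; it uses plain Euclidean distance (`math.dist`, Python 3.8+) rather than distances on Earth, and it is only meant to show why the cost explodes.

```python
import itertools
import math


def brute_force_tsp(points):
    """Return (length, tour) of the shortest closed tour over 2-D points."""
    n = len(points)
    best_len, best_tour = math.inf, None
    # Fix city 0 as the origin and permute the rest: (n - 1)! orderings to check.
    for perm in itertools.permutations(range(1, n)):
        tour = (0,) + perm
        # Sum each leg, including the final leg back to the origin.
        length = sum(
            math.dist(points[tour[k]], points[tour[(k + 1) % n]]) for k in range(n)
        )
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour


# Hypothetical example: the four corners of a unit square, listed out of order.
square = [(0, 0), (1, 1), (1, 0), (0, 1)]
length, tour = brute_force_tsp(square)
```

For the unit square the shortest tour walks the perimeter, so `length` is 4. Every extra city multiplies the number of orderings, which is why this approach stops being practical very quickly.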

# Computing Sample Variance: Why Divide by N - 1?

## Variance Estimation

In statistics we know that the mean and variance of a population $Y$ are defined to be:

$$\left\{ \begin{aligned} \text{Mean}(Y) &= \mu = \frac{1}{N} \sum_{i=1}^{N} Y_i \\ \text{Var}(Y) &= \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \mu)^2 \end{aligned} \right.$$

where $N$ is the size of the population.
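As a quick illustration, the population formulas above can be written out directly and compared against the standard library's `statistics` module, which offers both the divide-by-$N$ version (`pvariance`) and the divide-by-$N - 1$ version (`variance`). The helper name `population_variance` and the data values are made up for this sketch.

```python
import statistics


def population_variance(values):
    """Variance with a 1/N factor, following the population definition."""
    n = len(values)
    mu = sum(values) / n
    return sum((y - mu) ** 2 for y in values) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values with mean 5

pop_var = population_variance(data)      # divides by N
lib_pop_var = statistics.pvariance(data) # also divides by N
sample_var = statistics.variance(data)   # divides by N - 1
```

Here the squared deviations sum to 32, so the population variance is 32 / 8 = 4, while the sample variance is 32 / 7, slightly larger.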

# Can Integer Operations Overflow in Python?

## Integer representations

Integers are typically represented in memory as a base-2 bit pattern, and in Python the built-in function bin can be used to inspect that:

In [1]:
bin(19)

Out[1]:
'0b10011'
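To see where that bit pattern comes from, here is a small sketch that builds it by repeated division by 2. The helper name `to_bits` is hypothetical and it only handles non-negative integers.

```python
def to_bits(n):
    """Binary digits of a non-negative integer, most significant bit first."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next least significant bit
        n //= 2
    return "".join(reversed(digits))


bits = to_bits(19)        # same digits that bin(19) shows after the '0b' prefix
round_trip = int(bits, 2) # int() with base 2 converts the digits back
```

The built-in `int.bit_length()` reports how many bits the pattern needs, which is 5 for 19.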

If the number of bits used is fixed, the range of integers that can be represented would …

# Simple Proof for Kraft's Inequality

In graduate school I came up with an original proof for Kraft's inequality. The proofs I could find in textbooks tend to be more complicated and less intuitive to me, so I'd like to share my proof here.

## Prefix Codes

First it is important to understand the concept of …

# Fibonacci Numbers in Python

## Fibonacci numbers

The Fibonacci numbers are defined recursively by the following difference equation:

$$\left\{ \begin{aligned} F_{n} & = F_{n-1} + F_{n-2} \\ F_1 & = 1 \\ F_0 & = 0 \end{aligned} \right.$$

It is easy to compute the first few elements in the sequence:

$0, 1, 1, 2, 3, 5, 8, 13, 21, 34, \ldots$
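A minimal iterative sketch of the definition, assuming the indexing above ($F_0 = 0$, $F_1 = 1$). The function name `fib` is chosen for illustration, and this is just one of several ways to compute the sequence.

```python
def fib(n):
    """Return F_n for n >= 0 by stepping the recurrence forward n times."""
    a, b = 0, 1  # (F_0, F_1)
    for _ in range(n):
        a, b = b, a + b  # slide the window: (F_k, F_{k+1}) -> (F_{k+1}, F_{k+2})
    return a


first_ten = [fib(i) for i in range(10)]
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Because Python integers have arbitrary precision, the same loop works unchanged for large $n$.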