## Integer representations¶

Integers are typically represented in memory as a base-2 bit pattern, and in python the built-in function `bin`

can be used to inspect that:

```
bin(19)
```

If the number of bits used is fixed, the range of integers that can be represented would be fixed and can potentially overflow. That is the case for many languages such as C/C++.

In python, integers have arbitrary precision and therefore we can represent an arbitrarily large range of integers (only limited by memory available).

```
2 ** 200
```

However as I'll explain in this post, one still needs to be careful with precision issues especially when using the pydata stack (numpy/pandas).

## Can integers overflow in python?¶

Short answers:

- No
- if the operations are done in pure python, because python integers have arbitrary precision

- Yes
- if the operations are done in the pydata stack (numpy/pandas), because they use C-style fixed-precision integers

## Arbitrary precision¶

So how do python integers achieve arbitrary precision?

In python 2, there are actually two integers types: `int`

and `long`

, where `int`

is the C-style fixed-precision integer and `long`

is the arbitrary-precision integer. Operations are automatically promoted to `long`

if `int`

is not sufficient, so there's no risk of overflowing. In python 3, `int`

is the only integer type and it is arbitrary-precision.

To see a bit under the hood, let's examine how much the storage size changes for different integers in python.

```
%matplotlib inline
import matplotlib.pyplot as plt
from IPython.core.pylabtools import figsize
figsize(15, 5)
```

```
import numpy as np
import pandas as pd
```

Here we consider a list of integers going from $2^{0}$ to $2^{159}$, and we use `sys.getsizeof`

to inspect how many bytes are actually used to store the integer:

```
import sys
int_sizes = {}
for i in range(160):
int_sizes[i] = sys.getsizeof(2 ** i)
int_sizes = pd.Series(int_sizes)
```

Plotting the results:

```
ax = int_sizes.plot(ylim=[0, 60])
ax.set_ylabel('number of bytes')
ax.set_xlabel('integer (log scale)')
ax.set_title('number of bytes used to store an integer (python 3.4)')
ax.set_xticks(range(0, 160, 10))
labels = ['$2^{%d}$' % x for x in range(0, 160, 10)]
ax.set_xticklabels(labels)
ax.tick_params(axis='x', labelsize=18)
ax.tick_params(axis='y', labelsize=12)
```

```
int_sizes[29:31].head()
```

We can see that it takes 28 bytes before we get to $2^{30}$ where python allocates 4 more bytes to store larger integers. Certainly not the most compact representation, as a raw 64-bit array (i.e. 8 bytes) could do the job with fixed-precision. However we get the benefits of arbitrary precision and many others in python.

Also as we can expect, the storage increases roughly logarithmically as the integer gets larger. And interestly, it looks like python bumps the storage size at $2^{30}$, $2^{60}$, $2^{90}$, and so on.

## Be Careful with Overflows in numpy¶

In a lot of situations we would prefer to use the pydata stack (numpy/scipy/pandas) for computation over pure python. It is important to note that overflows can occur, because the data structures under the hood are fixed-precision. Here we have a numpy array of integers

```
a = np.array([2**63 - 1, 2**63 - 1], dtype=int)
a
```

```
a.dtype
```

This is a 64-bit integer and therefore $2^{63}-1$ is actually the largest integer it can hold. Adding 1 to the array will *silently* cause an overflow

```
a + 1
```

similary, we'd need to be careful with `np.sum`

:

```
a.sum()
```

On the other hand, `np.mean`

actually computes by first converting all inputs into `float`

, so the overflow won't happen at this value yet

```
a.mean()
```