numba

numba is the first tool we will explore to accelerate the code.

numba is a “just-in-time” compiler. It works at the level of a function, by compiling that function to machine code just before it is executed (just in time!).

The machine code is executed directly on the processor, bypassing the Python virtual machine and therefore running more quickly. The inputs are passed to the compiled function, everything executes as machine code, and the results are then passed back to Python.

numba.jit

You use numba by marking functions that you want to be “just-in-time” (jit) compiled using the numba.jit decorator. For example, here is a very simple function that calculates the square root of an array of numbers;

import math
import numpy as np

def calculate_roots(numbers):
    num_vals = len(numbers)

    # create a float32 array to hold the results
    result = np.zeros(num_vals, "f")

    # take the square root of each input value in turn
    for i in range(0, num_vals):
        result[i] = math.sqrt(numbers[i])

    return result

Let’s see how long this takes to calculate 10 million square roots. We’ll do this by asking numpy to generate an array of 10 million random numbers between 0 and 500.

numbers = 500.0 * np.random.rand(10000000)

Now let’s time the function using timeit

timeit(calculate_roots(numbers))

On my computer I get this result;

1.42 s ± 25.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So, it takes ~1.4 seconds to calculate 10 million square roots.

We can speed this up by asking numba to jit our calculate_roots function. We do this by adding the @numba.jit() decorator to the function, e.g.

import numba

@numba.jit()
def calculate_roots(numbers):
    num_vals = len(numbers)
    result = np.zeros(num_vals, "f")

    for i in range(0, num_vals):
        result[i] = math.sqrt(numbers[i])

    return result

Now let's time the function…

timeit(calculate_roots(numbers))

On my computer I get this result;

5.16 ms ± 35.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

It now takes ~5 milliseconds(!) to calculate 10 million square roots. This is almost 300 times faster, just by adding a single @numba.jit() decorator to the top of the function.

Exercise 1

Add a @numba.jit() decorator to the calculate_scores function.

Answer to the above exercise

Caching the results of just-in-time compilation

Part of the runtime of the script is the just-in-time compilation of the calculate_scores function. The more complex the function, the longer it takes numba to compile it to machine code.
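As a rough sketch, you can see this compilation cost for yourself by timing the very first call to the jitted calculate_roots function separately from a second call (this assumes a fresh session, so the function has not been compiled yet);

import time

start = time.time()
calculate_roots(numbers)   # first call - includes jit compilation
print("first call:", time.time() - start, "s")

start = time.time()
calculate_roots(numbers)   # second call - machine code already compiled
print("second call:", time.time() - start, "s")

The exact numbers will vary, but the first call should be noticeably slower because it includes the time taken to compile the function.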

You can cache the result of compilation by passing cache=True to the decorator, e.g.

@numba.jit(cache=True)

You will still pay the cost of compilation the first time you run your script. But subsequent runs will load the cached machine code and will use that (thereby avoiding compiling the code again).
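For example, applying this to the calculate_roots function from above (in a script, so that numba has a file alongside which to store the cache), the decorator line is the only change;

import math
import numba
import numpy as np

@numba.jit(cache=True)
def calculate_roots(numbers):
    num_vals = len(numbers)
    result = np.zeros(num_vals, "f")

    for i in range(0, num_vals):
        result[i] = math.sqrt(numbers[i])

    return result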

Exercise 2

Make the change to cache the results of JIT compilation in your copy of slow.py.

Exercise 3

The line

(ids, varieties, data) = slow.load_and_parse_data(5)

loads only 5% of the data. This is because, before numba, processing more than 5% of the data took too long. Now that you have accelerated the code, increase this to 100% of the data, e.g.

(ids, varieties, data) = slow.load_and_parse_data(100)

Answer to the above exercises