# numba

numba is the first tool we will explore to accelerate the code.

numba is a “just-in-time” compiler. It works at the level of a function, by compiling that function to machine code just before it is executed (just in time!).

The machine code is executed directly on the processor, bypassing the Python virtual machine and therefore running faster. The inputs are passed into the compiled function, everything executes as machine code, and the results are passed back to Python.
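To make this concrete: numba compiles a separate machine-code specialisation for each combination of argument types it encounters, and compilation happens at the first call. Here is a minimal sketch (the `add` function is just an illustration, not part of the worked example in the next section) that uses the `signatures` attribute of a jitted function to show this:

```python
import numba

@numba.jit()
def add(x, y):
    return x + y

# the first call with a new combination of argument types
# triggers compilation for exactly those types
add(1, 2)      # compiles an integer specialisation
add(1.0, 2.5)  # compiles a second, floating-point specialisation

# each compiled specialisation is recorded on the jitted function
print(add.signatures)  # e.g. [(int64, int64), (float64, float64)]
```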

# numba.jit

You use numba by marking the functions that you want to be “just-in-time” (jit) compiled with the numba.jit decorator. For example, here is a very simple function that calculates the square roots of an array of numbers:

```python
import math
import numpy as np

def calculate_roots(numbers):
    num_vals = len(numbers)
    result = np.zeros(num_vals, "f")

    for i in range(0, num_vals):
        result[i] = math.sqrt(numbers[i])

    return result
```

Let’s see how long this takes to calculate 10 million square roots. We’ll do this by asking numpy to generate an array of 10 million random numbers between 0 and 500.

```python
numbers = 500.0 * np.random.rand(10000000)
```

Now let’s time the function using timeit:

```python
%timeit calculate_roots(numbers)
```

On my computer I get this result:

```
1.42 s ± 25.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

So, it takes ~1.4 seconds to calculate 10 million square roots.

We can speed this up by asking numba to jit our calculate_roots function. We do this by adding the @numba.jit() decorator to the function, e.g.

```python
import numba

@numba.jit()
def calculate_roots(numbers):
    num_vals = len(numbers)
    result = np.zeros(num_vals, "f")

    for i in range(0, num_vals):
        result[i] = math.sqrt(numbers[i])

    return result
```

Now let’s time the function…

```python
%timeit calculate_roots(numbers)
```

On my computer I get this result:

```
5.16 ms ± 35.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

It now takes ~5 milliseconds(!) to calculate 10 million square roots. That is almost 300 times faster, just from adding a single @numba.jit() decorator to the function.
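Note that the time reported by %timeit reflects the already-compiled function: the first call triggers compilation, and the repeated calls in the timing loop mean that one-off cost disappears into the warm-up. If you want to see the compiled speed of a single call yourself, here is a minimal sketch using time.perf_counter with an explicit warm-up call:

```python
import time

# warm-up call: this triggers the just-in-time compilation
calculate_roots(numbers)

# time a single call of the already-compiled function
start = time.perf_counter()
calculate_roots(numbers)
elapsed = time.perf_counter() - start
print(f"compiled call took {elapsed * 1000:.2f} ms")
```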

# Exercise 1

Add a @numba.jit() decorator to the calculate_scores function.

• Using the timeit function, measure how long the function now takes to complete. How many times faster is the function compared to before you added the @numba.jit() decorator?

• Now measure how long the total script takes to run, using, e.g., the time command on macOS/Linux, or Measure-Command on Windows. How many times faster is the script compared to before you added the @numba.jit() decorator?

• Does the speed up of the function match the speed up of the overall script?

# Caching the results of just-in-time compilation

Part of the runtime of the script is the just-in-time compilation of the calculate_scores function. The more complex the function, the longer numba takes to analyse it and compile it to machine code.

You can cache the result of compilation by passing cache=True to the decorator, e.g.

```python
@numba.jit(cache=True)
```

You will still pay the cost of compilation the first time you run your script. But subsequent runs will load the cached machine code and will use that (thereby avoiding compiling the code again).
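Applied to the calculate_roots function from earlier, the cache=True argument is the only change needed:

```python
@numba.jit(cache=True)
def calculate_roots(numbers):
    num_vals = len(numbers)
    result = np.zeros(num_vals, "f")

    for i in range(0, num_vals):
        result[i] = math.sqrt(numbers[i])

    return result
```

numba writes the cache to disk (by default into a `__pycache__` directory alongside the source file), which is why it survives between runs of the script.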

# Exercise 2

Make the change to cache the results of JIT compilation in your copy of slow.py.

• How does this affect the runtime of your script?

# Exercise 3

The line

```python
(ids, varieties, data) = slow.load_and_parse_data(5)
```

loads only 5% of the data. This is because, before numba, processing more than 5% of the data took too long. Now that you have accelerated the code, increase this to 100% of the data, e.g.

```python
(ids, varieties, data) = slow.load_and_parse_data(100)
```

• Using timeit in your Jupyter notebook, how much longer does it take calculate_scores to run now? Is this about what you expect (twenty times longer to process twenty times the amount of data)?

• Make the change to your copy of slow.py. Does increasing the amount of data processed by twenty times change the total runtime of this script?