numba is the first tool we will explore to accelerate the code.
numba is a “just-in-time” compiler. It works at the level of a function, by compiling that function to machine code just before it is executed (just in time!).
The machine code is executed directly on the processor, thus bypassing the Python virtual machine, and therefore running quicker. The inputs to the function are passed into this function, this all executes as machine code, and then the results are passed by to Python.
numba by marking functions that you want to be “just-in-time” (jit) compiled using the
numba.jit decorator. For example, here is a very simple function that calculates the square root of an array of numbers;
import math import numpy as np def calculate_roots(numbers): num_vals = len(numbers) result = np.zeros(num_vals, "f") for i in range(0, num_vals): result[i] = math.sqrt(numbers[i]) return result
Let’s see how long this takes to calculate 10 million square roots. We’ll do this by asking
numpy to generate an array of 10 million random numbers between 0 and 500.
numbers = 500.0 * np.random.rand(10000000)
Now let’s time the function using
On my computer I get this result;
1.42 s ± 25.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
So, it takes ~1.4 seconds to calculate 10 million square roots.
We can speed this up by asking
numba to jit our
calculate_roots function. We do this by adding the
@numba.jit() decorator to the function, e.g.
import numba @numba.jit() def calculate_roots(numbers): num_vals = len(numbers) result = np.zeros(num_vals, "f") for i in range(0, num_vals): result[i] = math.sqrt(numbers[i]) return result
Now lets time the function…
On my computer I get this result;
5.16 ms ± 35.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
It now takes ~5 milliseconds(!) to calculate 10 million square roots. This is almost 300 times faster, just by adding a single
@numba.jit() to the top of the function.
@numba.jit() decorator to the
timeit function, measure how long the function now takes to complete. How many times faster is the function compared to before you added the
Now measure how long the total script takes to run, using, e.g. the
time function on MacOS/Linux, or
Measure-Command on Windows. How many times faster is the script compared to before you added the
Does the speed up of the function match the speed up of the overall script?
Part of the runtime of the script is the just-in-time compilation of the
calculate_scores function. The more complex the function, the longer it takes for
numba to create and then compile the function to machine code.
You can cache the result of compilation by passing
cache=True to the decorator, e.g.
You will still pay the cost of compilation the first time you run your script. But subsequent runs will load the cached machine code and will use that (thereby avoiding compiling the code again).
Make the change to cache the results of JIT compilation in your copy of
(ids, varieties, data) = slow.load_and_parse_data(5)
loads only 5% of the data. This is because, before numba, processing more than 5% of the data took too long. Now that you have accelerated the code, increase this to 100% of the data, e.g.
(ids, varieties, data) = slow.load_and_parse_data(100)
timeit in your Jupyter notebook, how much long does it take
calculate_scores to run now? Is this about what you expect (twenty times longer to process twenty times the amount of data)?
Make the change to your copy of
slow.py. Does increasing the amount of data processed by twenty times change the total runtime of this script?