import numba
import numpy as np

@numba.jit()
def calculate_scores(data):
    """Calculate the score for each row. This is calculated
    as the sum of pairs of values on each row where the values
    are not equal to each other, and neither are equal to -1.

    Returns
    =======

    scores : numpy array containing the scores
    """
    nrows = data.shape[0]
    ncols = data.shape[1]

    # Here is the list of scores
    scores = np.zeros(nrows)

    # Loop over all rows
    for irow in range(0, nrows):
        for i in range(0, ncols):
            for j in range(i, ncols):
                ival = data[irow, i]
                jval = data[irow, j]

                if ival != -1 and jval != -1 and ival != jval:
                    scores[irow] += 1

    return scores
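Nothing inside the function has changed; the decorator asks Numba to compile it to machine code the first time it is called. If you want to confirm that compilation has actually happened, the compiled dispatcher records the signatures it has compiled. Here is a minimal sketch (the small random array is made up purely for illustration):

import numpy as np

data = np.random.randint(-1, 5, size=(10, 8))

calculate_scores(data)              # first call triggers compilation
print(calculate_scores.signatures)  # should list one compiled signature for a 2D integer array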
import slow
(ids, varieties, data) = slow.load_and_parse_data(5)
scores = slow.calculate_scores(data)
timeit(slow.calculate_scores(data))
On my laptop I get
3.59 ms ± 9.17 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
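For a like-for-like comparison, the compiled function keeps the original, uncompiled Python function available as its py_func attribute, so you can time both versions from the same session. A sketch, assuming the same IPython session and timeit usage as above:

timeit(slow.calculate_scores(data))          # Numba-compiled version
timeit(slow.calculate_scores.py_func(data))  # original pure-Python version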
time python slow.py
The best score 21439.0 comes from pattern MDC053363.000_9241
python slow.py 0.67s user 0.21s system 149% cpu 0.587 total
Adding @numba.jit() has made the calculate_scores function almost 875 times faster on my laptop!
Adding @numba.jit() has made the overall script over 5.6 times faster on my laptop.
The overall speed-up of the script is not as large as the speed-up of the function because the script's runtime is now dominated by the other functions, plus the time taken to start up and shut down Python. There is also a non-zero cost of compiling the calculate_scores function.
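One way to see that compilation cost directly is to time the first call (which triggers compilation) against a later call. A minimal sketch using time.perf_counter, with a made-up array for illustration:

import time
import numpy as np

data = np.random.randint(-1, 5, size=(1000, 50))

start = time.perf_counter()
calculate_scores(data)          # first call: compile, then run
first = time.perf_counter() - start

start = time.perf_counter()
calculate_scores(data)          # second call: run only
second = time.perf_counter() - start

print(f"first call {first:.3f} s, second call {second:.3f} s")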
import numba
import numpy as np

@numba.jit(cache=True)
def calculate_scores(data):
    """Calculate the score for each row. This is calculated
    as the sum of pairs of values on each row where the values
    are not equal to each other, and neither are equal to -1.

    Returns
    =======

    scores : numpy array containing the scores
    """
    nrows = data.shape[0]
    ncols = data.shape[1]

    # Here is the list of scores
    scores = np.zeros(nrows)

    # Loop over all rows
    for irow in range(0, nrows):
        for i in range(0, ncols):
            for j in range(i, ncols):
                ival = data[irow, i]
                jval = data[irow, j]

                if ival != -1 and jval != -1 and ival != jval:
                    scores[irow] += 1

    return scores
On the first run, the script takes 0.682 seconds to run. This is because the function still has to be compiled.
However, subsequent runs load the cached machine code and take 0.443 seconds to run. This shows that compilation took about 0.24 seconds.
0.443 seconds is about 7.5 times faster than the original script without Numba.
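If you want to see where this cache lives, Numba normally writes it next to the source file, in the __pycache__ directory, as a pair of index/data files per compiled function. A quick way to check, assuming slow.py is in the current directory (the exact file names are an assumption and vary with the Python and Numba versions):

import glob

print(glob.glob("__pycache__/slow.*.nbi"))  # cache index files
print(glob.glob("__pycache__/slow.*.nbc"))  # cached machine code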
import slow
(ids, varieties, data) = slow.load_and_parse_data(100)
timeit(slow.calculate_scores(data))
I get
76.5 ms ± 995 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Processing 20 times as much data has taken about 21 times as long. This is about what we would expect (there is some random error in the timing).
Timing the whole script, I see that it takes 0.585 s. This is just over 100 ms more than when we processed 5% of the data. About 73 ms of this is the extra time spent in calculate_scores (76.5 ms versus 3.6 ms). The remainder is likely to be random error, plus the extra cost of parsing more data and allocating / deallocating larger memory arrays.
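Because the work grows linearly with the number of rows (and quadratically with the number of columns, which has not changed), you would expect the function's runtime to scale roughly linearly as more rows are loaded. A sketch that checks this, assuming slow.load_and_parse_data(100) loads the full data set:

import timeit
import slow

(ids, varieties, data) = slow.load_and_parse_data(100)

slow.calculate_scores(data)   # warm-up call so compilation is not timed

for fraction in (0.05, 0.25, 0.5, 1.0):
    subset = data[: int(fraction * data.shape[0])]
    t = timeit.timeit(lambda: slow.calculate_scores(subset), number=10) / 10
    print(f"{subset.shape[0]:6d} rows: {1000 * t:8.2f} ms per call")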