Exercise

import numba
import numpy as np

@numba.jit(cache=True, parallel=True)
def calculate_scores(data):
    """Calculate the score for each row. The score is the number
       of pairs of values on a row where the two values are not
       equal to each other and neither is equal to -1.

       Returns
       =======

            scores : numpy array containing the scores
    """
    nrows = data.shape[0]
    ncols = data.shape[1]

    # Here is the list of scores
    scores = np.zeros(nrows)

    # Loop over all rows
    for irow in numba.prange(0, nrows):
        for i in range(0, ncols):
            for j in range(i, ncols):
                ival = data[irow, i]
                jval = data[irow, j]

                if ival != -1 and jval != -1 and ival != jval:
                    scores[irow] += 1

    return scores
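
To make the scoring rule concrete, here is a minimal pure-Python sketch of the same pair-counting logic applied to a single row. The helper name `score_row` and the sample row are illustrative assumptions, not part of `slow.py`; the sketch uses plain lists so that it runs without numba or numpy.

```python
def score_row(row):
    """Count pairs (i, j), i < j, whose values differ and are both not -1.

    Hypothetical helper for illustration; it mirrors the two inner loops
    of calculate_scores for one row.
    """
    score = 0
    ncols = len(row)
    for i in range(ncols):
        # Starting j at i + 1 skips the i == j case, which can never
        # score because a value always equals itself.
        for j in range(i + 1, ncols):
            ival = row[i]
            jval = row[j]
            if ival != -1 and jval != -1 and ival != jval:
                score += 1
    return score


# Made-up sample row: -1 marks a value that is ignored.
sample_row = [1, 2, -1, 2]
print(score_row(sample_row))  # → 2, from columns (0, 1) and (0, 3)
```

Only the pairs at columns (0, 1) and (0, 3) count here: every pair involving the -1 is skipped, and the two 2s are equal to each other.
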

import slow

(ids, varieties, data) = slow.load_and_parse_data(5)
scores = slow.calculate_scores(data)

%timeit slow.calculate_scores(data)

(ids, varieties, data) = slow.load_and_parse_data(100)
scores = slow.calculate_scores(data)

%timeit slow.calculate_scores(data)

On my laptop I get

739 µs ± 94.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

for processing 5% of the data, and

11.6 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

for processing 100% of the data.
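
Outside IPython, a similar measurement can be sketched with the standard-library `timeit` module. The function `work` below is a made-up stand-in for `slow.calculate_scores(data)`, and the run and loop counts are assumptions chosen to mirror the "7 runs" reported above. The untimed first call plays the same role as the `scores = slow.calculate_scores(data)` line before each timing: with a numba-compiled function, the first call may include compilation or loading from the cache, so it is best kept out of the measurement.

```python
import statistics
import timeit


def work():
    """Made-up stand-in for slow.calculate_scores(data)."""
    return sum(i * i for i in range(1000))


# Warm-up call, not timed.
work()

n_runs = 7     # number of repeated measurements
n_loops = 100  # calls per measurement

# Each entry is the total time in seconds for n_loops calls.
totals = timeit.repeat(work, repeat=n_runs, number=n_loops)
per_loop = [t / n_loops for t in totals]

mean = statistics.mean(per_loop)
std = statistics.stdev(per_loop)
print(f"{mean * 1e6:.1f} µs ± {std * 1e6:.1f} µs per loop "
      f"(mean ± std. dev. of {n_runs} runs, {n_loops} loops each)")
```

The printed numbers depend on the machine, so they will not match the figures quoted above.
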

time python slow.py
The best score 21931.0 comes from pattern MDC054122.001_13602
python slow.py  0.87s user 0.42s system 254% cpu 0.508 total
