# cython

cython is the second tool we will explore to accelerate the code. Cython is older and more established than numba. It is a different way of accelerating a Python script. It is more difficult to use than numba. The advantage is that can become easier to use when you are working on larger or more complex projects.

cython is a tool that converts Python to C, and then pre-compiles that C into a module that can be called and used from Python. While you can mix cython-optimised and numba-optimised code in the same script, we recommend that you choose one or the other when embarking on accelerating a particular piece of code.

# Cython Hello World

cython has lots of excellent documentation. This documentation describes a simple “Hello World” example that demonstrates how it works.

First, create a new file called helloworld.pyx. We use the extension .pyx as this shows that this file will be converted by cython into C, and then compiled into a module.

Into this file, type the following;

#cython: language_level=3

print("Hello World!")

(the #cython: language_level=3 is a hint that we are writing a Python 3 script)

Next, we need to create a setup.py script which will convert helloworld.pyx to C, and then compile it into a binary module called helloworld that we can import and use in Python.

Create a new file called setup.py and type in the following;

try:
from setuptools import setup
except ImportError:
from distutils.core import setup

from Cython.Build import cythonize

from distutils.extension import Extension

cyslow = Extension(
"helloworld",
sources=["helloworld.pyx"],
extra_compile_args=['-O3'],
)

setup(
ext_modules = cythonize(cyslow)
)

Here we set the name of the module we will build (cyslow) and the list of source files that comprise that module (just helloworld.pyx). We also give the command line arguments that are passed to the compiler (extra_compile_args) and the linker (extra_link_args). The option -O3 means to compile and link in a fully optimised way. (note that, on Windows for some compilers, you may need to change -O3 to /O2).

The cythonize function then does all of the work of converting helloworld.pyx to C, compiling it, and then linking it as an ext_module.

The next step is to build this module from the command line, using

python setup.py build_ext --inplace

This will create a module called helloworld.{SOMETHING}.so on Linux/MacOS or helloworld.{SOMETHING}.pyd on Windows. This is the compiled C code, arranged as a Python module.

You will also see a file called helloworld.c. This is the actual C source code that cython has generated from your Python. On my computer it is 2783 lines!

You can now load this module in Python. For example, in your Jupyter notebook you can now type;

import helloworld

You should see that Hello World! is printed to the screen.

# Exercise

We have copied the functions from the original version of slow.py to a file called cyslow.pyx, and the “main” section into cyslow_main.py. We have also created a setup.py for you that will compile this module. You can download all of these using;

import urllib.request
url = "https://raw.githubusercontent.com/chryswoods/siremol.org/main/chryswoods.com/accelerating_python/code"
filename = "cyslow.pyx"
urllib.request.urlretrieve(f"{url}/{filename}", filename)
filename = "cyslow_main.py"
urllib.request.urlretrieve(f"{url}/{filename}", filename)
filename = "setup.py"
urllib.request.urlretrieve(f"{url}/{filename}", filename)

Convert cyslow.pyx to C, and then compile it into a module called cyslow using the command;

python setup.py build_ext --inplace

(note that, on Windows, for some compilers, you may need to change -O3 into /O2)

• How many lines of C have been generated in the resulting cyslow.c file?

• Import the cyslow module in your Jupyter notebook. Measure how fast each of the three functions (load_and_parse_data, calculate_scores and get_index_of_best_score) run using timeit, from loading just 5% of the data. How does this compare to how fast they ran from slow.py?

• Run the whole program, via python cyslow_main.py. Time how long this takes to run. How does this compare to the original Python version?