Testing is extremely important. Without testing, you cannot be sure that your code is doing what you think. Testing is an integral part of software development, and should be done while you are writing code, not after the code has been written.
There are two main types of tests, both of which you should include in your code.
- Runtime (sanity) tests: these are lightweight tests performed while the code is running to ensure that everything is OK, e.g. that arguments passed to a function make sense and are valid inputs.
- Correctness (unit) tests: these are heavier tests, typically written and run separately from the code, that check that the functions give the correct answers and behave in the expected way.
Runtime tests are run inside a function to ensure that the function is being called correctly with sensible (sane) arguments. For example, let's consider the following script:

    """Module containing functions used to demonstrate the need for testing"""

    def addArrays(a, b):
        """Function to add together the two passed arrays, returning
           the result."""
        c = []
        for a_, b_ in zip(a, b):
            c.append(a_ + b_)

        return c

Use `nano` to copy and paste the above script into a file called `addarrays.py`. Then open a new `ipython` session in the same directory as `addarrays.py` and type:

    from addarrays import addArrays

    c = addArrays( [1,2,3], [4,5,6] )
    print(c)

This should show that `c` is equal to `[5, 7, 9]`. Now type:

    c = addArrays( [1,2], [4,5,6] )
    print(c)

This time `c` is seen to be equal to `[5, 7]`. Is this what you expected?
The problem is that `addArrays` expects both arrays to contain the same number of items. The first array was shorter than the second, but the function did not report any error. Should it have returned `[5, 7, 6]` instead?
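The silent truncation comes from `zip`, which stops as soon as its shortest input is exhausted. A minimal sketch, using only built-ins (the variable names are illustrative), that makes this visible:

```python
# zip stops at the end of the shortest input, so the extra item in the
# longer list is silently dropped.
short = [1, 2]
longer = [4, 5, 6]

pairs = list(zip(short, longer))
print(pairs)
```

Only two pairs are produced, which is why the sum had two items and no error was raised.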
To clean the function, we need to add a runtime test that checks that both arrays have the same length. If they don't, then we need to report this back to the user using a sensible error message. We do this using an exception. Exit `ipython` and use `nano` to edit the `addarrays.py` script. Change it so that the function looks like this:

    def addArrays(a, b):
        """Function to add together the two passed arrays, returning
           the result."""

        if len(a) != len(b):
            raise ValueError("Both arrays must have the same length.")

        c = []
        for a_, b_ in zip(a, b):
            c.append(a_ + b_)

        return c

Here, we raise a `ValueError`, which indicates that something is wrong with the value of one of the arguments. A list of all Python exceptions is here. Also note that you can create your own exceptions; instructions are here (although this is beyond what we have time to cover in this course).
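As a brief aside on that last point, a custom exception can be as small as a subclass of an existing one. The sketch below is only an illustration: the name `ArrayLengthError` is hypothetical and not part of the course material, and subclassing `ValueError` is one possible design, chosen so that existing `except ValueError` blocks would still catch it.

```python
class ArrayLengthError(ValueError):
    """Hypothetical exception for arrays whose lengths do not match."""


def addArrays(a, b):
    """Copy of the function above, raising the hypothetical exception."""
    if len(a) != len(b):
        raise ArrayLengthError("Both arrays must have the same length.")

    c = []
    for a_, b_ in zip(a, b):
        c.append(a_ + b_)

    return c


try:
    addArrays([1, 2], [3, 4, 5])
except ValueError as e:
    # ArrayLengthError is a ValueError, so this block still catches it.
    caught = e
    print(type(e).__name__, "-", e)
```

The rest of this section continues with the plain `ValueError` version.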
Now when we call the function incorrectly, we get a sensible error message. Check this by opening a new `ipython` session and typing:

    from addarrays import addArrays

    c = addArrays([1,2], [3,4,5])

and you should see:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-2-fdd61ba0cb11> in <module>()
    ----> 1 addarrays.addArrays([1,2], [3,4,5])

    /path/to/addarrays.py in addArrays(a, b)
          5
          6     if len(a) != len(b):
    ----> 7         raise ValueError("Both arrays must have the same length.")
          8
          9     c = []

    ValueError: Both arrays must have the same length.

The benefit of an exception is that it provides a way for your function to test and report when something has gone wrong. If something has gone wrong, it can be reported back to the user with a sensible error message. Also, unlike just printing a message and exiting the program, exceptions provide a way to recover from errors. This is achieved using `try` blocks. For example, try typing the following:

    a = [1,2]
    b = [3,4,5]

    try:
        c = addArrays(a,b)
        print(c)
    except ValueError:
        print("Something went wrong calling addArrays")

You should see that the string `Something went wrong calling addArrays` is now printed to the screen.
A `try` block lets you try to run a piece of code. If an exception is raised, then the exception is caught in the `except` block. This can be used either to present an even cleaner error message, or to fix the problem, e.g. try typing this:

    a = [1,2]
    b = [3,4,5]

    try:
        c = addArrays(a,b)
    except ValueError:
        while len(a) < len(b):
            a.append(0)

        while len(b) < len(a):
            b.append(0)

        c = addArrays(a,b)

    print( c )

Now you will see that `c` is equal to the array `[4, 6, 5]`. Because the arrays `a` and `b` were not the same length, the first call to `addArrays` in the `try` block caused a `ValueError` exception to be raised. This was caught in the `except ValueError` block. In here, because `a` was shorter than `b`, zeroes were appended onto `a` until it had the same size as `b`. The next call of `addArrays`, inside the `except ValueError` block, was then successful, allowing `c` to be created and printed at the end.
So you can see that exceptions allow us to fix problems in the context of how the function is called. Note that it would not be appropriate to add this fix into `addArrays` itself, as `addArrays` cannot know whether or not the arrays contain numbers, or whether or not it would be appropriate to make the arrays equal by padding with zeroes. Only the code that calls `addArrays` knows the context of the call, and thus what an appropriate fix would be. Exceptions provide a way for `addArrays` to signal that a problem has occurred, and the `try` block provides the way for the caller to fix the problem.
The second set of tests are correctness (also called unit) tests. These are tests that are run on a function to check that it is giving the correct output. For example, we can test that `addArrays` is adding together numbers correctly by creating a new function to test it, e.g. in a new `ipython` session type:

    from addarrays import addArrays

    def test_add():
        a = [1,2,3]
        b = [4,5,6]
        expect = [5,7,9]
        c = addArrays(a,b)
        if c == expect:
            print("OK")
        else:
            print("BROKEN")

    test_add()

You should see that the test passed and the string `OK` was printed to the screen.
Testing manually works but is time-consuming and error-prone: we might forget to run a test. What we need is a way to collect together all of the tests and to automate them.
The first thing to do is to create a testing script for our module, which is typically called `test_MODULENAME.py`, so in our case it would be `test_addarrays.py`. Into this file, we should add all of our tests, e.g. using `nano` copy and paste in the following:

    from addarrays import addArrays

    def test_add():
        a = [1,2,3]
        b = [4,5,6]
        expect = [5,7,9]
        c = addArrays(a,b)
        assert expect == c

The only change here is that we have used `assert`. This is a statement that does nothing if the passed test is true, but that will raise an `AssertionError` exception if the test is false. We can run the test manually using ipython, e.g. in a new `ipython` session type:

    from test_addarrays import test_add

    test_add()

You should see that nothing happens, as the test passes.
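A short sketch of how `assert` behaves on its own, independent of `addArrays`. The optional message after the comma is standard Python; the values are made up for illustration.

```python
# A true condition: assert does nothing.
assert 1 + 1 == 2

# A false condition: assert raises AssertionError. The optional second
# expression becomes the error message.
try:
    assert 1 + 1 == 3, "arithmetic is broken"
except AssertionError as e:
    message = str(e)
    print("AssertionError:", message)
```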
This is still a bit manual. Fortunately, there is a package called `pytest` which automates running test scripts like this. `pytest` automatically finds, runs and reports on tests. Exit `ipython` and then, on the command line, type:

    pytest

You should see printed to the screen:

    ============================= test session starts ==============================
    platform linux -- Python 3.5.2, pytest-3.0.5, py-1.4.32, pluggy-0.4.0
    rootdir: /panfs/panasas01/training/train01, inifile:
    collected 1 items

    test_addarrays.py .

    =========================== 1 passed in 0.02 seconds ===========================

This automatically searched the directory for Python files whose names match `test_*.py` or `*_test.py`, found the functions in them whose names start with `test_`, and ran them.
You can check this by breaking the code, e.g. edit `addarrays.py` and change the function to the following (this replaces `a_ + b_` with `a_ - b_`):

    def addArrays(a, b):
        """Function to add together the two passed arrays, returning
           the result."""

        if len(a) != len(b):
            raise ValueError("Both arrays must have the same length.")

        c = []
        for a_, b_ in zip(a, b):
            c.append(a_ - b_)

        return c

Now go back to the command line and run `pytest` again, e.g.

    pytest

You should now see something like

    ============================= test session starts ==============================
    platform linux -- Python 3.5.2, pytest-3.0.5, py-1.4.32, pluggy-0.4.0
    rootdir: /panfs/panasas01/training/train01, inifile:
    collected 1 items

    test_addarrays.py F

    =================================== FAILURES ===================================
    ___________________________________ test_add ___________________________________

        def test_add():
            a = [1,2,3]
            b = [4,5,6]
            expect = [5,7,9]
            c = addArrays(a,b)
    >       assert expect == c
    E       assert [5, 7, 9] == [-3, -3, -3]
    E         At index 0 diff: 5 != -3
    E         Use -v to get the full diff

    test_addarrays.py:8: AssertionError
    =========================== 1 failed in 0.07 seconds ===========================

You can see that it has picked out the line at which the `assert` failed (marked with the `>` on the left). What follows (on lines beginning with `E`) is then a little bit of help to see why it failed. First it prints out the `assert` line again, but with the variables expanded so you can see exactly what it was comparing. Then it tells you which part of the comparison failed; in this case it found that the first elements didn't match (`5 != -3`).
When 1 + 1 = 2.0000001
One problem with testing that a calculation is correct is that computers don't do floating point arithmetic too well. For example, in a new `ipython` session type:

    expected = 0
    actual = 0.1 + 0.1 + 0.1 - 0.3
    assert expected == actual

While this may work, you will likely see that an `AssertionError` exception was raised, e.g.

    ---------------------------------------------------------------------------
    AssertionError                            Traceback (most recent call last)
    <ipython-input-3-18a1029b2615> in <module>()
    ----> 1 assert expected == actual

    AssertionError:

We can see what caused the problem by printing the value of `actual`. On my machine, I get the value `5.55111512313e-17`.
The problem is that computers are continually rounding floating point numbers. Rounding errors can accumulate during a calculation, and these can lead to seemingly wrong results, such as `0.1 + 0.1 + 0.1 - 0.3 != 0`. Rounding errors can cause problems in your code, and also cause problems when writing tests. If you are going to compare floating point numbers, then you must make the comparison to within a threshold or delta, e.g. `expected` agrees with `actual` if `abs(expected - actual) < 0.0000000000000001`. Notice the use of Python's built-in absolute value function (`abs`): it is important that you compare the absolute difference. Otherwise, whenever `actual` is greater than `expected`, the difference is negative and so smaller than the threshold, and the comparison passes however far apart the two values are, which is not what you were hoping for.
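The threshold comparison can be sketched as a small helper. The function name `within_threshold` and the delta of `1e-10` are assumptions made for illustration; a suitable delta depends on your application.

```python
def within_threshold(expected, actual, delta):
    """Return True if actual is within delta of expected (hypothetical helper)."""
    return abs(expected - actual) < delta


expected = 0
actual = 0.1 + 0.1 + 0.1 - 0.3

# Exact comparison fails because of the accumulated rounding error.
exact = (expected == actual)

# Comparison to within a threshold succeeds.
close = within_threshold(expected, actual, 1e-10)

# Without abs(), a value far above expected still "passes", because the
# difference is negative and therefore smaller than any positive threshold.
wrongly_passes = (expected - 1000.0) < 1e-10

print(exact, close, wrongly_passes)
```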
Thresholds are application-specific.
`pytest` provides an `approx` function that allows you to compare floating point numbers to within different thresholds. It does this by comparing two numbers up to a specified absolute or relative precision, e.g. type

    import pytest

    assert actual == pytest.approx(expected)

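This prints nothing, because `5.55111512313e-17` is equal to 0 to within the default tolerances of `approx`. The default relative precision is `1e-6`, but a relative tolerance is no help when the expected value is 0, so `approx` also applies a default absolute tolerance of `1e-12`, and it is this that lets the comparison pass. Now try:

    assert actual == pytest.approx(expected, abs=1e-10)
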
This again prints nothing, as `5.55111512313e-17` is equal to 0 up to 10 decimal places. Now try:

    assert actual == pytest.approx(expected, abs=1e-17)

This should now raise an `AssertionError` exception that looks like this:

    ---------------------------------------------------------------------------
    AssertionError                            Traceback (most recent call last)
    <ipython-input-44-c91bdbddaf67> in <module>()
    ----> 1 assert actual == pytest.approx(expected, abs=1e-17)

    AssertionError:

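The distinction between relative and absolute tolerance matters most when the expected value is zero. The sketch below uses the standard library's `math.isclose` rather than pytest, purely so that it runs without any extra packages; the tolerances chosen are illustrative, not recommendations.

```python
import math

actual = 0.1 + 0.1 + 0.1 - 0.3   # a tiny non-zero number

# A relative tolerance scales with the size of the values being compared,
# so next to zero it allows almost no error at all.
rel_only = math.isclose(actual, 0.0, rel_tol=1e-6)

# An absolute tolerance is a fixed allowance, so it still works near zero.
with_abs = math.isclose(actual, 0.0, rel_tol=1e-6, abs_tol=1e-12)

# Away from zero, a relative tolerance alone behaves as expected.
away_from_zero = math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-6)

print(rel_only, with_abs, away_from_zero)
```

This is why a comparison against zero needs an absolute tolerance, whichever tool performs the comparison.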
When should we test?
Testing is extremely important, and is the only way that you can check whether or not your code performs as you expect. Documentation tells the user what a function is supposed to do, while tests provide the guarantee that the function actually does it. Ideally you should write tests continually during development of software. For example, my workflow is to plan a new function, then write the documentation for the function (so that I don’t forget my plan!), then write the function, and then write tests to ensure that the function is working. I am then able to move onto the next function I need to write, safe in the knowledge that the previous functions I have written will work and will not cause obscure and difficult to find bugs. Then, as I continue to develop the software, and it is used by other people, I will discover new bugs or will receive bug reports. I then turn these bug reports into new unit tests, so that, once fixed, those bugs cannot reappear in my code.
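As a sketch of turning a bug report into a test: suppose a user had reported that arrays of different lengths were silently truncated (the problem fixed earlier in this section). A regression test pins the fixed behaviour in place. The test name is hypothetical, and `addArrays` is copied here only to keep the example self-contained; in practice you would import it from `addarrays.py`. With pytest available, `pytest.raises(ValueError)` expresses the same check more compactly.

```python
def addArrays(a, b):
    """Copy of the fixed function from addarrays.py."""
    if len(a) != len(b):
        raise ValueError("Both arrays must have the same length.")

    c = []
    for a_, b_ in zip(a, b):
        c.append(a_ + b_)

    return c


def test_mismatched_lengths_are_rejected():
    """Regression test: mismatched lengths must raise, not truncate."""
    try:
        addArrays([1, 2], [4, 5, 6])
    except ValueError:
        return  # the fixed behaviour
    raise AssertionError("addArrays accepted arrays of different lengths")


test_mismatched_lengths_are_rejected()
```

If the length check were ever removed again, this test would fail and flag the regression.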
Obviously, we can’t write tests to cover every problem, and indeed trying to write too large a test suite would cost us more time than would be worthwhile. However, you will quickly work out how much is the right amount of testing, through trial and error. There is definitely no excuse for never testing, and any effort expended in writing tests is less painful than dealing with the aftermath of either:
- A bug being discovered in your script just before you publish a paper on the results, leading you to have to delay publication or, worse, have to make a retraction.
- Or (as once happened to me) having to tell another scientist that all of their calculations have to be run again, as the script they had been using had a bug that rendered all output incorrect.
In addition, you should periodically review your tests, just as you review code, to avoid tests that:
- Pass when they should fail (false positives).
- Fail when they should pass (false negatives).
- Don’t test anything.
Also, never, ever write ‘empty’ tests, such as:

    def test_critical_correctness():
        # TODO - will complete this tomorrow!
        pass

These give a false sense of security!
- Costs time while coding, but saves time in the long run (less effort spent debugging, less effort spent recovering from bugs found just before paper publication).
- Gives confidence that code does what we want and expect it to.
- Promotes trust that code, and so research, is correct.
- Mirrors your documentation. Documentation provides the promise of what the code will do. Tests provide the proof.
One of the problems with testing is that you want to test if an action will be correct, without necessarily performing the action. For example, you may want to test that your script will correctly identify which files to remove, without actually removing those files. Or, you may want to test that your script will correctly form the command line to run an external program, without actually running that program. Mocking is the process of testing without acting. If you want to learn more about mocking, see this post.
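A minimal sketch of the file-removal case, using `unittest.mock` from the standard library. The function `remove_temp_files` and the file names are hypothetical; the point is only that `os.remove` is replaced by a mock while the test runs, so no file is touched.

```python
import os
from unittest import mock


def remove_temp_files(filenames):
    """Hypothetical script function: delete every file ending in .tmp."""
    removed = []
    for name in filenames:
        if name.endswith(".tmp"):
            os.remove(name)
            removed.append(name)
    return removed


def test_remove_temp_files():
    files = ["results.csv", "scratch.tmp", "notes.txt", "old.tmp"]

    # Replace os.remove with a mock for the duration of the block, so
    # nothing is deleted; the mock records how it was called.
    with mock.patch("os.remove") as fake_remove:
        removed = remove_temp_files(files)

    assert removed == ["scratch.tmp", "old.tmp"]
    assert fake_remove.call_count == 2
    fake_remove.assert_any_call("scratch.tmp")
    fake_remove.assert_any_call("old.tmp")


test_remove_temp_files()
```

The test checks which files would have been removed by inspecting the calls recorded on the mock, which is the "testing without acting" idea described above.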
Extend `test_addarrays.py` with more tests, e.g. a function to test that `addArrays` correctly adds together arrays of negative numbers, a function to test that `addArrays` correctly adds arrays of strings, and a function to test that `addArrays` correctly adds together empty arrays. Try to think of all of the things that could break the code. Also add a function that tests that `addArrays` correctly reports when the arrays are the wrong size, e.g.

    import pytest  # needed for pytest.raises; place at the top of the file

    def test_wrongsize():
        a = [1, 2, 3]
        b = [4, 5]
        with pytest.raises(ValueError):
            addArrays(a, b)

Also add in tests for floating point addition, using `pytest.approx`.
Run your tests with `pytest`.
If you get stuck, an example test script is here.