Property Based Testing in Python

Posted on Fri 16 October 2020 in Python • 4 min read

Building software can be challenging, but maintaining it after the initial build can be even more so. Tests that verify the software behaves as expected are crucial for building robust applications that users depend on, and being able to automate that testing is even better! There are other posts on this blog about testing, such as Introduction to Pytest & Pipenv, but this post focuses on one specific type of testing: property based testing.

Property based testing differs from conventional example-based testing in that it generates the test data that drives your tests and, even better, can help find the boundaries where the tests fail.

To demonstrate the power of property based testing, we're going to build some testing for the old faithful multiplication operator in Python.

To help with this, we are going to use a few packages:

  • pytest (testing framework)
  • hypothesis (property testing package)
  • ipytest (to enable running tests in Jupyter notebooks)

Before we dive in, let's set up ipytest and use some example-based testing to verify the multiplication operator.

In [5]:
import ipytest
import pytest

def multiply(number_1, number_2):
    return number_1 * number_2
In [6]:

def test_example():
    assert multiply(3,3) == 9
    assert multiply(5,5) == 25
    assert multiply(4,6) == 24
.                                                                        [100%]
1 passed in 0.02s

Fantastic, our examples passed the test! Now let's make sure the test fails when an expectation is wrong.

In [7]:

def test_fail_example():
    assert multiply(3,3) == 9
    assert multiply(3,5) == 150
F                                                                        [100%]
================================== FAILURES ===================================
______________________________ test_fail_example ______________________________

    def test_fail_example():
        assert multiply(3,3) == 9
>       assert multiply(3,5) == 150
E       assert 15 == 150
E        +  where 15 = multiply(3, 5)

<ipython-input-7-212df0aaa8ed>:3: AssertionError
=========================== short test summary info ===========================
FAILED - assert 15 == 150
1 failed in 0.34s

Perfect! We can see that the test fails as expected and even nicely tells us which line of code it failed on. If we had lots of these examples to test, we could simplify things with pytest's parametrize decorator.

In [8]:

# Assumed cases: the three examples from test_example
@pytest.mark.parametrize('number_1, number_2, expected', [(3, 3, 9), (5, 5, 25), (4, 6, 24)])
def test_multiply(number_1,number_2,expected):
    assert expected == multiply(number_1,number_2)
...                                                                      [100%]
3 passed in 0.02s

Is this enough testing to verify our function? We're only testing a few conditions that we expect to work, when in reality it's the cases nobody foresees that we'd most like to capture in our tests. It also raises another issue: the developer writing the tests may choose to write 2 or 2000 test cases, but neither number guarantees that the function is truly covered.

Introducing Property Based Testing

Property based testing is a form of generative testing: we don't supply specific examples with inputs and expected outputs. Instead, we define properties that should always hold and generate randomized inputs to check that they do. In addition, property based testing can shrink a failing input to find the exact boundary condition where a test fails.
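
To make "properties" a little more concrete, here is a minimal sketch of three things that should hold for any integers: multiplication is commutative, 1 is the identity, and anything times 0 is 0. It uses the given decorator and strategies from hypothesis, which the rest of this post walks through. The test names are our own, chosen for illustration, and multiply is repeated so the snippet stands alone.

```python
from hypothesis import given
import hypothesis.strategies as st


def multiply(number_1, number_2):
    return number_1 * number_2


# Property 1: the order of the arguments should not matter.
@given(st.integers(), st.integers())
def test_multiply_is_commutative(number_1, number_2):
    assert multiply(number_1, number_2) == multiply(number_2, number_1)


# Property 2: multiplying by 1 should give the original number back.
@given(st.integers())
def test_one_is_the_identity(number_1):
    assert multiply(number_1, 1) == number_1


# Property 3: multiplying by 0 should always give 0.
@given(st.integers())
def test_zero_gives_zero(number_1):
    assert multiply(number_1, 0) == 0
```

None of these tests mention a specific expected answer; they only describe how the result should relate to the inputs.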

Property based testing doesn't completely replace example-based testing; both have their uses, and property based testing has a lot of potential for effective testing. Now let's implement the same tests as above, using property based testing with hypothesis.

In [9]:
from hypothesis import given
import hypothesis.strategies as st
In [10]:

# Assumed strategies: integers for both arguments
@given(st.integers(), st.integers())
def test_multiply(number_1,number_2):
    assert multiply(number_1,number_2) == number_1 * number_2
.                                                                        [100%]
1 passed in 0.14s

Note that we've used the given decorator, which makes our test parametrized, and strategies, which describe the types of input data to generate. As the hypothesis documentation puts it, "Most things should be easy to generate and everything should be possible"; the strategies section of that documentation has more information on them.
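
Strategies can also be constrained. As a small sketch (the bounds, variable names and test names here are our own choices for illustration), st.integers accepts min_value and max_value, and st.floats can be told to skip NaN and infinity:

```python
from hypothesis import given
import hypothesis.strategies as st


def multiply(number_1, number_2):
    return number_1 * number_2


# Only generate integers that are zero or greater.
non_negative = st.integers(min_value=0)

# Only generate ordinary floats: NaN and +/- infinity are excluded.
finite_floats = st.floats(allow_nan=False, allow_infinity=False)


@given(non_negative, non_negative)
def test_non_negative_inputs_give_non_negative_product(number_1, number_2):
    assert multiply(number_1, number_2) >= 0


@given(finite_floats, finite_floats)
def test_float_multiplication_is_commutative(number_1, number_2):
    assert multiply(number_1, number_2) == multiply(number_2, number_1)
```

Narrowing a strategy like this is handy when a property is only meant to hold for part of the input space.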

Our test_multiply property test doesn't look much different from the parametrized version, so what actually changed? Let's change our multiply function so it behaves strangely and watch hypothesis shrink the failure in action. Shrinking means that whenever hypothesis finds a failure, it tries to reduce it to the absolute boundary case to help us find the potential cause. Even better, it remembers this failure for next time so it doesn't poke its head up again!

In [15]:
def bad_multiply(number_1,number_2):
    if number_1 > 30:
        return 0
    if number_2 < 0:
        return 0
    return number_1 * number_2
In [16]:

# Assumed strategies: integers for both arguments
@given(st.integers(), st.integers())
def test_bad_multiply(number_1,number_2):
    assert bad_multiply(number_1,number_2) == number_1 * number_2
F                                                                        [100%]
================================== FAILURES ===================================
______________________________ test_bad_multiply ______________________________

>   def test_bad_multiply(number_1,number_2):

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

number_1 = 31, number_2 = 1

    def test_bad_multiply(number_1,number_2):
>       assert bad_multiply(number_1,number_2) == number_1 * number_2
E       assert 0 == (31 * 1)
E        +  where 0 = bad_multiply(31, 1)

<ipython-input-16-3e2ec463c8ad>:3: AssertionError
--------------------------------- Hypothesis ----------------------------------
Falsifying example: test_bad_multiply(
    number_1=31, number_2=1,
)
=========================== short test summary info ===========================
FAILED - assert 0 == (31 * 1)
1 failed in 0.22s

Fantastic, we can see that the failure has been shrunk to number_1 being 31 and number_2 being 1. 31 is the first integer past the number_1 > 30 boundary we introduced into the multiply function, and 1 is the smallest number_2 for which the expected product is no longer 0.
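
To get a feel for what shrinking does, here is a toy shrinker written in plain Python. This is not how hypothesis shrinks internally (its real approach is considerably more sophisticated); fails, shrink_int and shrink_pair are hypothetical helpers that simply step each argument towards zero for as long as the failure persists.

```python
def bad_multiply(number_1, number_2):
    if number_1 > 30:
        return 0
    if number_2 < 0:
        return 0
    return number_1 * number_2


def fails(number_1, number_2):
    """True when bad_multiply disagrees with the real product."""
    return bad_multiply(number_1, number_2) != number_1 * number_2


def shrink_int(value, still_fails):
    """Step value towards zero, one at a time, while the failure persists."""
    step = -1 if value > 0 else 1
    while value != 0 and still_fails(value + step):
        value += step
    return value


def shrink_pair(number_1, number_2):
    """Shrink each argument in turn until neither can get any smaller."""
    changed = True
    while changed:
        smaller_1 = shrink_int(number_1, lambda v: fails(v, number_2))
        smaller_2 = shrink_int(number_2, lambda v: fails(smaller_1, v))
        changed = (smaller_1, smaller_2) != (number_1, number_2)
        number_1, number_2 = smaller_1, smaller_2
    return number_1, number_2


print(shrink_pair(5000, 812))  # (31, 1)
print(shrink_pair(7, -400))    # (1, -1)
```

The first starting point ends up at the same minimal case hypothesis reported, while the second lands on the other 'bad' condition we introduced (number_2 < 0).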

Hopefully this post has shown the power of property based testing and how it can help make software more robust for everyone!