Example-Based Testing: Enough!

Published

October 16, 2024

John Hughes, creator of QuickCheck

“Don’t write tests. Generate them.”

We, mere developers, have the liberty to write software—and eventually ship it to production—without having to go through any legal procedures concerning our algorithms and system properties. Most industries must comply with some form of regulation to sell their products or provide their services. In the software industry, it’s “cooler.” Imagine if the construction industry had the same freedom over their structural designs. Imagine the pharmaceutical industry dictating their drug formulations without oversight. Imagine the aviation and automobile industries operating without strict safety standards… Imagine if they were as “cool” as the software industry… we’d all die.

Let’s talk about the most dangerous form of testing known to mankind: example-based testing.

What is Example-Based Testing, anyway?

Example-based testing focuses on writing tests that check specific inputs against expected outputs. Sounds reasonable, right? Let’s see.

Here’s a typical example of this atrocity in action:

def test_add_numbers():
    assert add(2, 2) == 4
    assert add(0, 0) == 0
    assert add(-1, 1) == 0

You know what this test function does? It verifies exactly three cases. Congratulations, we've solved addition for 0.000000001% of all possible inputs. We should be proud of ourselves, right?

One Way to Ruin Them All

Let’s explore a few ways to destroy our code, our careers, and possibly our will to live.

1. False Confidence

We all know that incredible feeling when all our tests pass. It’s all green. Very likely, that feeling is false confidence. Example-based tests give a false sense of security: sure, our function works for the three cases we wanted to test, but what about the other infinity minus three cases?

In the code below, what happens when someone tries to divide by zero and the entire system crashes? It’s okay. At least it’s all green!

def divide(a, b):
    return a / b

def test_divide():
    assert divide(10, 2) == 5
    assert divide(0, 5) == 0
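One hedged way to surface the missing case: make the failure mode explicit instead of letting a raw ZeroDivisionError escape to whoever called us. The `safe_divide` name and the choice to raise `ValueError` are illustrative assumptions, not a prescription.

```python
def safe_divide(a, b):
    # Illustrative sketch: fail loudly with a domain-specific error
    # instead of leaking a raw ZeroDivisionError to callers.
    if b == 0:
        raise ValueError("division by zero: b must be nonzero")
    return a / b

def test_safe_divide_rejects_zero():
    # The edge case the all-green suite above never exercised.
    try:
        safe_divide(10, 0)
    except ValueError:
        pass  # the explicit failure we wanted
    else:
        raise AssertionError("expected ValueError for b == 0")
```

Note this still doesn't fix the underlying problem with example-based testing: we only thought to guard the zero case because we happened to think of it.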

2. Maintenance Nightmare

As the codebase grows, so does the test suite, and at some point, we realize we’re spending more time updating tests than actually writing new features.

def complex_function(a, b, c, d, e):
    # Some incredibly complex logic here
    pass

def test_complex_function():
    assert complex_function(1, 2, 3, 4, 5) == 42
    assert complex_function(5, 4, 3, 2, 1) == 24
    assert complex_function(0, 0, 0, 0, 0) == 0
    # ... 100 more test cases later ...

This is the art of creating a monster that will haunt us for the rest of the project’s life. Every time we change complex_function, we’ll need to update all these tests.

3. Edge Cases

Example-based testing is fantastic at hiding edge cases.

def get_nth_element(lst, n):
    return lst[n]

def test_get_nth_element():
    assert get_nth_element([1, 2, 3], 1) == 2
    assert get_nth_element(['a', 'b', 'c'], 0) == 'a'

Looks good, right? What happens when n is negative? What if it’s larger than the list? What if the list is empty? These tests don’t cover those scenarios, so they fail to reveal the potential issues.
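To see exactly what those two examples hide, we can poke the same function with the inputs the tests skipped. The behaviors below are just Python’s list-indexing semantics, nothing the original tests ever promised:

```python
def get_nth_element(lst, n):
    return lst[n]

# A negative n silently wraps around to the end of the list --
# probably not what the caller intended, and no test catches it.
assert get_nth_element([1, 2, 3], -1) == 3

# An out-of-range n blows up with an IndexError the suite never saw.
try:
    get_nth_element([1, 2, 3], 10)
except IndexError:
    pass

# The empty list fails for *every* n.
try:
    get_nth_element([], 0)
except IndexError:
    pass
```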

There’s Hope…

You must be asking: “So, what’s the alternative then?” Well, there is indeed a slightly less terrible alternative: property-based testing.

In property-based testing, instead of testing specific examples, you define properties that should always hold true for the function, and then let the testing framework throw a bunch of random inputs at it.

Here’s what it looks like using the hypothesis library in Python:

from hypothesis import given
import hypothesis.strategies as st

def absolute_value(x):
    return abs(x)

@given(st.integers())
def test_absolute_value(x):
    result = absolute_value(x)
    assert result >= 0
    assert result == x or result == -x

This test will generate a hundred random integers (hypothesis runs 100 examples by default) and check that the absolute_value function behaves correctly for all of them.
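To demystify what the framework is doing, here is a minimal hand-rolled sketch of the same idea: generate random inputs, evaluate a property, and report the first counterexample. The `check_property` helper and the deliberately buggy predicate are illustrative assumptions, not hypothesis internals.

```python
import random

def check_property(prop, trials=100, lo=-10**6, hi=10**6, seed=0):
    # Minimal sketch of what a PBT framework automates: throw random
    # integers at a property and return the first failing input.
    rng = random.Random(seed)  # seeded for reproducibility
    for _ in range(trials):
        x = rng.randint(lo, hi)
        if not prop(x):
            return x  # counterexample found
    return None  # no counterexample in `trials` attempts

# The property from the hypothesis test above, as a plain predicate.
def abs_is_valid(x):
    r = abs(x)
    return r >= 0 and (r == x or r == -x)

# The same property over a buggy "absolute value" that never negates.
def broken_abs_is_valid(x):
    r = x if x > 0 else x  # bug: forgets to flip negatives
    return r >= 0 and (r == x or r == -x)

assert check_property(abs_is_valid) is None          # property holds
assert check_property(broken_abs_is_valid) is not None  # bug caught
```

Real frameworks add a lot on top of this, most notably shrinking: when hypothesis finds a failing input, it searches for a smaller one before reporting it, so you debug from a minimal counterexample rather than a random giant one.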

Keep in mind

Is PBT perfect? No. Is it better than example-based testing? Absolutely. Still, it has some limitations:

  1. Input Space Boundaries: PBT generates many random inputs but is still limited by the bounds you define. It can’t exhaustively test all possible inputs.

  2. Property Quality: PBT’s effectiveness depends on the quality of the properties you define. Missing critical properties can lead to undetected bugs.

  3. Computational Cost: PBT can be computationally expensive, especially for complex functions or large input spaces.
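Limitation 1 cuts both ways: the bounds you choose are also how you steer the search. A hedged sketch of that in hypothesis, confining the generated integers to a range the function is actually specified for and capping the number of examples to manage limitation 3 (the `clamp` function is an illustrative stand-in):

```python
from hypothesis import given, settings
import hypothesis.strategies as st

def clamp(x, lo, hi):
    # Illustrative function under test: restrict x to [lo, hi].
    return max(lo, min(hi, x))

# Bound the input space explicitly: only integers in [-1000, 1000].
@given(st.integers(min_value=-1000, max_value=1000))
@settings(max_examples=200)  # trade thoroughness for runtime
def test_clamp_stays_in_range(x):
    assert -100 <= clamp(x, -100, 100) <= 100

# A @given-decorated test is runnable as a plain function call.
test_clamp_stays_in_range()
```

The bounds make the run fast and focused, but they are also exactly the blind spot: nothing outside [-1000, 1000] is ever exercised, which is limitation 1 in action.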

Still in Doubt?

If you still aren’t convinced, you can check out these references for more insights: