However, classic test suites written in xUnit and BDD styles run into scaling problems as soon as you want to exercise more than a few happy paths:
- It is difficult to cover many different inputs with hand-written test cases, so we stick to at most a dozen cases for a particular method.
- Every new input we want to test has a maintenance cost: each one needs its own assertions, which have to be updated in the future if the System Under Test (SUT) changes its API.
- We tend not to test external dependencies such as the language or its libraries, since we trust them to do a good job, even though their failures are our responsibility: we chose them, and we are the ones deploying the project, not the original authors, who provided the code "as is", without warranty.
sort()
Let's take as an example a sort() function, which nobody implements by hand today except in job interviews and exercises.
Assuming the input is an array (or a list, depending on your language), we can come up with several inputs for the function, as we would in a kata:
- [1, 2, 3]
- [3, 1]
- [3, 6, 5, 1, 4]
- []
- [1]
- [1, 1]
- [2, 3, 5, 6, 8, 9, 1, 3, 6, 7, 8, 9]
For each of them, we then write down the expected output by hand:
- [1, 2, 3] => [1, 2, 3]
- [3, 1] => [1, 3]
- [3, 6, 5, 1, 4] => [1, 3, 4, 5, 6]
- [] => []
- [1] => [1]
- [1, 1] => [1, 1]
- [2, 3, 5, 6, 8, 9, 1, 3, 6, 7, 8, 9] => [1, 2, 3, 3, 5, 6, 6, 7, 8, 8, 9, 9]
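For comparison, the hand-written cases above could be turned into a classic example-based test along these lines. This is a Python sketch rather than PHPUnit code, and `my_sort` is a hypothetical stand-in for the implementation under test (it simply delegates to the built-in `sorted`).

```python
# Hypothetical implementation under test; it delegates to the built-in
# sorted() so that the example runs as-is.
def my_sort(items):
    return sorted(items)


# Every case is written by hand: one input, one exact expected output.
EXAMPLES = [
    ([1, 2, 3], [1, 2, 3]),
    ([3, 1], [1, 3]),
    ([3, 6, 5, 1, 4], [1, 3, 4, 5, 6]),
    ([], []),
    ([1], [1]),
    ([1, 1], [1, 1]),
    ([2, 3, 5, 6, 8, 9, 1, 3, 6, 7, 8, 9],
     [1, 2, 3, 3, 5, 6, 6, 7, 8, 8, 9, 9]),
]


def test_sort_examples():
    for given, expected in EXAMPLES:
        actual = my_sort(given)
        assert actual == expected, f"sort({given}) = {actual}, expected {expected}"


if __name__ == "__main__":
    test_sort_examples()
    print("all examples passed")
```

Every new case means working out another expected output by hand, which is exactly the maintenance cost described above.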
Property-based testing in a nutshell
Property-based testing is an approach to testing that comes from the functional programming world. To solve the aforementioned problems (and get new, more interesting ones), it follows these steps:
- Generate a random sample of possible inputs.
- Exercise the SUT with each of them.
- Verify properties that should hold for every output, instead of comparing against precise expected values.
- (Optionally) if the verification of a property fails, shrink the input to find a minimal one that still causes the failure.
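The four steps fit in a small driver loop. The following is a minimal Python sketch of the idea, not Eris's API: `for_all`, its parameters, and the default of 100 runs are assumptions made for illustration, and step 4 (shrinking) is only marked by a comment.

```python
import random
from typing import Any, Callable, Optional


def for_all(
    generate: Callable[[random.Random], Any],
    sut: Callable[[Any], Any],
    holds: Callable[[Any, Any], bool],
    runs: int = 100,
    seed: Optional[int] = None,
) -> Optional[Any]:
    """Return the first generated input whose output breaks the property, or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        given = generate(rng)          # 1. generate a random input
        output = sut(given)            # 2. exercise the System Under Test
        if not holds(given, output):   # 3. verify a property, not an exact value
            return given               # 4. a real tool would shrink this input
    return None


if __name__ == "__main__":
    # Property: abs() never returns a negative number.
    print(for_all(lambda rng: rng.randint(-1000, 1000), abs, lambda x, y: y >= 0, seed=1))
    # → None
    # A false property ("abs() returns its input") yields a counterexample.
    print(for_all(lambda rng: rng.randint(-1000, 1000), abs, lambda x, y: y == x, seed=1))
```

Keeping `generate`, `sut` and `holds` as separate arguments mirrors the first three steps, and passing a `seed` makes a failing run reproducible.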
How does this work for the sort() function?
We can use rand() to generate an input array. This array is composed of natural numbers (Gen\nat) and is up to 100 elements long (Gen\pos(100)), since very long arrays could make our tests slow.
Then, for each of these inputs, we exercise sort() and verify a simple property of the output: its elements are in order.
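A rough Python equivalent of the generator and of the ordering property could look like this. It does not use Eris: `random_naturals` is a hypothetical stand-in for Gen\nat combined with Gen\pos(100), the upper bound of 1000 on the values is an arbitrary assumption, and `my_sort` is again a hypothetical implementation under test.

```python
import random


def my_sort(items):
    # Hypothetical SUT; swap in the implementation you want to test.
    return sorted(items)


def random_naturals(rng, max_length=100, max_value=1000):
    """Generate a list of natural numbers, at most `max_length` elements long."""
    length = rng.randint(0, max_length)
    return [rng.randint(0, max_value) for _ in range(length)]


def is_ordered(items):
    """The property: every element is <= the one after it."""
    return all(a <= b for a, b in zip(items, items[1:]))


def test_sort_orders_elements(runs=100, seed=None):
    rng = random.Random(seed)
    for _ in range(runs):
        given = random_naturals(rng)
        output = my_sort(given)
        assert is_ordered(output), f"not ordered: sort({given}) = {output}"


if __name__ == "__main__":
    test_sort_orders_elements(seed=42)
    print("ordering property held for 100 random arrays")
```

The check only compares neighboring elements, so it never needs to know the exact expected output.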
This is not the only property that sort() satisfies, but it's the first one I would specify. Other possible ones are:
- every element in the input is also in the output
- every element in the output is also in the input
- the input and output arrays have the same length.
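The three extra properties can be checked in the same style. In this Python sketch (again with a hypothetical `my_sort`), the membership properties are written with sets, and a stricter variant compares element counts with `collections.Counter`: sets plus equal length would still accept `[1, 1, 2]` "sorted" into `[1, 2, 2]`, while counting occurrences would not.

```python
import random
from collections import Counter


def my_sort(items):
    # Hypothetical SUT; replace with the implementation under test.
    return sorted(items)


def check_sort_properties(given, output):
    """Return the names of the properties that `output` violates."""
    violated = []
    if not set(given) <= set(output):
        violated.append("input element missing from output")
    if not set(output) <= set(given):
        violated.append("output element not in input")
    if len(given) != len(output):
        violated.append("length changed")
    if Counter(given) != Counter(output):
        # Stricter: same elements with the same multiplicities.
        violated.append("not a permutation of the input")
    return violated


def test_sort_keeps_elements(runs=100, seed=None):
    rng = random.Random(seed)
    for _ in range(runs):
        given = [rng.randint(0, 1000) for _ in range(rng.randint(0, 100))]
        output = my_sort(given)
        violated = check_sort_properties(given, output)
        assert not violated, f"sort({given}) = {output}: {violated}"


if __name__ == "__main__":
    test_sort_keeps_elements(seed=7)
    # The three listed properties alone miss this broken output:
    print(check_sort_properties([1, 1, 2], [1, 2, 2]))
    # → ['not a permutation of the input']
```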
How to find properties?
How do we apply property-based testing to code we actually write every day? It fits some areas of the code, such as Domain Model classes, better than others. Some rules of thumb for defining properties are:
- Look for inverse functions (e.g. addition and subtraction, or doubling an image in size and shrinking it to 50%). You can apply the inverse to the output and verify that the result equals the input.
- Relate input and output through some property that is true or false on both (e.g. in the sort() example, an element that is in one of the two arrays is also in the other).
- Define postconditions and invariants that always hold in a particular situation (e.g. in the sort() example, that the output is sorted; more generally, you can greatly restrict the possible output values of a function by stating that it is an array, that it contains only integers, and that its length is equal to the input's length).
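Each rule of thumb turns into a short boolean check. In this Python sketch, standard-library functions stand in for your own code: `base64` encoding and decoding form an inverse pair, addition and subtraction come from the text, and the built-in `sorted` plays the role of sort() for the relation and postcondition rules. The chosen functions and value ranges are assumptions for illustration.

```python
import base64
import random


def inverse_property(data: bytes) -> bool:
    # Rule 1: decoding the encoded output gives back the original input.
    return base64.b64decode(base64.b64encode(data)) == data


def add_subtract_property(a: int, b: int) -> bool:
    # Rule 1 again, with the addition/subtraction pair from the text.
    return (a + b) - b == a


def relation_property(items: list) -> bool:
    # Rule 2: membership is true or false on input and output alike.
    output = sorted(items)
    return all((x in items) == (x in output) for x in set(items) | set(output))


def postcondition_property(items: list) -> bool:
    # Rule 3: restrict the output to a list of ints as long as the input.
    output = sorted(items)
    return (isinstance(output, list)
            and all(isinstance(x, int) for x in output)
            and len(output) == len(items))


def run_all(runs: int = 100, seed: int = 0) -> None:
    rng = random.Random(seed)
    for _ in range(runs):
        data = bytes(rng.randrange(256) for _ in range(rng.randint(0, 50)))
        a, b = rng.randint(-10**6, 10**6), rng.randint(-10**6, 10**6)
        items = [rng.randint(-100, 100) for _ in range(rng.randint(0, 30))]
        assert inverse_property(data), data
        assert add_subtract_property(a, b), (a, b)
        assert relation_property(items), items
        assert postcondition_property(items), items


if __name__ == "__main__":
    run_all()
    print("all rule-of-thumb properties held")
```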
[2, 3, 5, 6, 8, 9, 1, 3, 6, 7, 8, 9] makes my test fail
Defining the valid range of inputs with generators, together with the properties to be satisfied, gives a rich description of the behavior of the System Under Test. Therefore, when a sort() implementation fails, we can work on the input to shrink it: we try to reduce its complexity and size in order to provide a minimal failing test case.
It's the same work we do when opening a bug report for someone else's code: we look for a minimal combination that triggers the bug, throwing away all the unnecessary details that would slow down fixing it.
So in property-based testing the [2, 3, 5, 6, 8, 9, 1, 3, 6, 7, 8, 9] input can probably be shrunk to [2, 3, 5, 6, 8, 9, 1, 3, 6, 7, 8] and maybe all the way down to [1, 0], depending on the bug. This process works by trying to shrink every random value that was generated, which in our case were the length of the array and the values it contained.
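Shrinking can be sketched as a greedy search: keep replacing the failing input with a smaller candidate as long as the candidate still fails. In this Python sketch the candidates either drop one element or move one value toward zero, covering the two random choices mentioned above (length and values). `buggy_sort`, which never moves the first element, is a hypothetical bug picked for illustration; real tools such as Eris use their own shrinking strategies.

```python
def buggy_sort(items):
    # Hypothetical bug: the first element is never moved.
    return items[:1] + sorted(items[1:])


def fails(items):
    """True when buggy_sort breaks the ordering property on `items`."""
    output = buggy_sort(items)
    return any(a > b for a, b in zip(output, output[1:]))


def candidates(items):
    """Smaller variants of `items`: shorter lists first, then smaller values."""
    for i in range(len(items)):
        yield items[:i] + items[i + 1:]           # drop one element
    for i, value in enumerate(items):
        if value > 0:
            for smaller in (0, value // 2, value - 1):
                yield items[:i] + [smaller] + items[i + 1:]


def shrink(items, still_fails):
    """Greedily move to the first smaller candidate that still fails."""
    current = list(items)
    improved = True
    while improved:
        improved = False
        for candidate in candidates(current):
            if still_fails(candidate):
                current = candidate
                improved = True
                break
    return current


if __name__ == "__main__":
    print(shrink([2, 3, 5, 6, 8, 9, 1, 3, 6, 7, 8, 9], fails))
    # → [1, 0]
```

Starting from the array in the heading, this particular strategy ends at `[1, 0]`: the bug needs a first element larger than some later one, and no shorter list or smaller value reproduces it.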
Testing the language
So here's some code I expect to work. This function creates a PHP DateTime instance using the native datetime extension, which is the standard in the PHP world. It starts from a year and a day number ranging from 0 to 364 (or 365 in leap years) and builds a DateTime pointing to midnight of that particular day.
Here is a property-based test for this function:
We generate two random integers in the [0, 364] range and test that the difference in seconds between the two generated DateTime objects equals 86400 seconds multiplied by the number of days between the two selected dates. A property of the input (distance) is preserved in the output in a different form (seconds instead of days).
Surprisingly, this test fails. What happened is that we triggered a bug in the DateTime object when creating it with a particular combination of format and timezone. The net effect of this bug could have been that our financial reports (showing daily revenue) would have started displaying the wrong numbers from February 29th of the next year.
Notice that the input is shrunk to the simplest possible values that trigger the problem: January 1st for one value and March 1st for the other.
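The day-distance property can be sketched in Python as well. Python's `datetime` module is not PHP's datetime extension, so the code below does not reproduce the actual bug: `buggy_date_from_year_and_day` is a hypothetical implementation that always uses non-leap month lengths, chosen only because it produces a similar symptom. Both helper names, the UTC timezone, and the shrinking strategy are assumptions for illustration.

```python
import random
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 86400
NON_LEAP_MONTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def date_from_year_and_day(year, day):
    """Midnight UTC of the 0-based `day` of `year` (hypothetical helper)."""
    return datetime(year, 1, 1, tzinfo=timezone.utc) + timedelta(days=day)


def buggy_date_from_year_and_day(year, day):
    """Hypothetical bug: always resolves `day` with non-leap month lengths."""
    month = 1
    for length in NON_LEAP_MONTHS:
        if day < length:
            break
        day -= length
        month += 1
    return datetime(year, month, day + 1, tzinfo=timezone.utc)


def distance_preserved(build, year, pair):
    """Property: seconds between the two datetimes == 86400 * (day2 - day1)."""
    day1, day2 = pair
    seconds = (build(year, day2) - build(year, day1)).total_seconds()
    return seconds == SECONDS_PER_DAY * (day2 - day1)


def shrink_pair(build, year, pair):
    """Greedily move either day toward 0 while the property still fails."""
    current = pair
    improved = True
    while improved:
        improved = False
        d1, d2 = current
        for candidate in ((0, d2), (d1, 0), (d1 // 2, d2), (d1, d2 // 2),
                          (d1 - 1, d2), (d1, d2 - 1)):
            if (min(candidate) >= 0 and candidate != current
                    and not distance_preserved(build, year, candidate)):
                current = candidate
                improved = True
                break
    return current


def minimal_failure(build, year, runs=1000, seed=None):
    """Try random pairs in [0, 364]; shrink and return the first failure, if any."""
    rng = random.Random(seed)
    for _ in range(runs):
        pair = (rng.randint(0, 364), rng.randint(0, 364))
        if not distance_preserved(build, year, pair):
            return shrink_pair(build, year, pair)
    return None


if __name__ == "__main__":
    print(minimal_failure(date_from_year_and_day, 2016, seed=1))  # None
    pair = minimal_failure(buggy_date_from_year_and_day, 2016, seed=1)
    print(pair, [buggy_date_from_year_and_day(2016, d).date() for d in pair])
```

With the correct helper the search finds nothing. With the hypothetical bug, the leap year 2016 shrinks to days 0 and 59 (in either order), which the buggy helper maps to January 1st and March 1st, while for 2015 it finds nothing at all: only leap years expose the difference.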
Eventually we found an easy workaround: a couple more lines of code avoid this behavior. Of course, we could only do that after discovering the bug.
In conclusion
Testing an application is a necessary burden for catching defects early and fixing them at an acceptable cost, instead of letting them run wild on real users. Property-based testing extends automation to the generation of inputs for the System Under Test and to the verification of results, aiming to lower the maintenance cost while increasing coverage at the same time.
Given the domain complexity handled by the datetime extension, it's doing a fantastic job, and it's developed by very competent programmers. Nevertheless, if they can slip in bugs, I'm sure my own code can, too. Property-based testing is an additional tool that can work side by side with example-based testing to uncover problems in our projects.
We named the property-based PHPUnit extension after Eris, the Greek goddess of chaos, since serious testing means attacking your code and the platform it is built on in an attempt to break it before someone else does.
References
- Eris on Packagist
- Eris examples on GitHub
- Tools from other languages that inspired Eris: QuickCheck, ScalaCheck, Erlang QuickCheck.