Property-based Testing
Typeclasses + Monads = Automatic TestingLets look at QuickCheck
- Developed by Koen Claessen and John Hughes in 2000
Ported to 40 other languages
PBT used in basically all kinds of software…
Plan
Property-based Testing
Random Test Generation
Case-Study
Property-based Testing
Don’t (only) write individual unit-tests
- only check particular input-output behavior of code
Do write properties desired of the functions
- generate random inputs
- run function
- verify (or rather, try to falsify) the property
Testing with Specifications
PBT emphasizes the importance of specifying correct behavior
Makes you think about what the code should do,
Finds corner-cases where the specification is violated (so code or spec are fixed)
Makes specs live on as machine-checked documentation.
Properties
A Property is a function that returns a Bool
prop_revapp :: [Int] -> [Int] -> Bool
prop_revapp xs ys =
reverse (xs ++ ys) == reverse xs ++ reverse ysLooks like a theorem that the programmer believes is true.
- By convention, we write the prefix
"prop_"
QUIZ
Consider the property prop_revapp
prop_revapp :: [Int] -> [Int] -> Bool
prop_revapp xs ys =
reverse (xs ++ ys) == reverse xs ++ reverse ysIt says forall xs, ys. reverse (xs ++ ys) == reverse xs ++ reverse ys.
Is the property true?
A. Yes
B. No
Question
Why specify type as [Int] -> [Int] -> Bool?
- Why not write
[a] -> [a] -> Bool?
Checking Properties
QC uses the types to generate random inputs
For that, it needs a specific type like
[Int]Cannot generate an input of type
[a]
Let’s quickCheck it!
>>> quickCheck prop_revapp
*** Failed! Falsifiable (after 6 tests and 9 shrinks):
[0]
[1]Huh? QC gives us a counterexample with xs = [0] and ys == [1]
reverse (xs ++ ys)
==> reverse ([0] ++ [1])
==> reverse ([0, 1])
==> [1, 0]but
reverse xs ++ reverse ys
==> reverse [0] ++ reverse [1]
==> [0] ++ [1]
==> [0, 1]
EXERCISE
Can you modify prop_revapp so that it is sensible and true ?
prop_revapp :: [Int] -> [Int] -> Bool
prop_revapp xs ys = reverse (xs ++ ys) == ???When you are done, we should see
>>> quickCheck prop_revapp'
+++ OK, passed 100 tests.We can run more tests by specifying that as a parameter
quickCheckN n = quickCheckWith (stdArgs {maxSuccess = n})Followed by
quickCheckN 10000 prop_revapp
+++ OK, passed 10000 tests
QuickSort
Here’s a simple sorting function (quickSort)
qsort :: (Ord a) => [a] -> [a]
qsort [] = []
qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
where
ls = [y | y <- xs, y < x] -- elems in xs < x
rs = [z | z <- xs, z > x] -- elems in xs > xSeems to work?
>>> [1,3..19] ++ [2,4..20]
[1,3,5,7,9,11,13,15,17,19,2,4,6,8,10,12,14,16,18,20]
>>> qsort ([1,3..19] ++ [2,4..20])
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
QuickCheck QuickSort
Lets check the output of qsort is ordered
isOrdered :: (Ord a) => [a] -> Bool
isOrdered :: (Ord a) => [a] -> Bool
isOrdered (x1:x2:xs) = x1 <= x2 && isOrdered (x2:xs)
isOrdered _ = Trueand use it to write a property
prop_qsort_isOrdered :: [Int] -> Bool
prop_qsort_isOrdered xs = isOrdered (qsort xs)which we can check
>>> quickCheckN 1000 prop_qsort_isOrdered
+++ OK, passed 1000 tests.
QUIZ
Lets check that the first element of the output is the smallest
qsort :: (Ord a) => [a] -> [a]
qsort [] = []
qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
where
ls = [y | y <- xs, y < x] -- elems in xs < x
rs = [z | z <- xs, z > x] -- elems in xs > x
prop_qsort_min :: [a] -> Bool
prop_qsort_min xs = head (qsort xs) == minimum xsWhat is the result of
>>> quickCheck prop_qsort_minA. Pass 100 tests
B. Fail
Properties and Assumptions
prop_qsort_min :: [a] -> Bool
prop_qsort_min xs = head (qsort xs) == minimum xOops!
>>> quickCheck prop_qsort_min
*** Failed! Exception: 'Prelude.head: empty list' (after 1 test):
[]prop_qsort_min is not true for all Int lists
- Property only makes sense if
xsis not empty
Writing specifications clarifies the assumptions
- under which a given piece of code is supposed to work.
Conditional Properties
Lets modify prop_qsort_min so equality holds if input is non-null
prop_qsort_nn_min :: [Int] -> Property
prop_qsort_nn_min xs =
not (null xs) ==> head (qsort xs) == minimum xsInstead of Bool the function’s output is Property
- a special type built into the QC library
The implies operator ==> is one of many
- that allow the construction of rich properties.
QUIZ
Lets test that our qsort is identical to a trusted reference implementation
- may be too slow to deploy but ok to use for checking correctness
Lets use the standard library’s Data.List.sort function
- (Much faster than ours… but just for illustration!)
prop_qsort_sort :: [Int] -> Bool
prop_qsort_sort xs = qsort xs == sort xsWhat is the result of
qsort :: (Ord a) => [a] -> [a]
qsort [] = []
qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
where
ls = [y | y <- xs, y < x] -- elems in xs < x
rs = [z | z <- xs, z > x] -- elems in xs > x
>>> quickCheck prop_qsort_sortA. OK after 100 tests
B. Failed! Falsifiable...
Lets Check
>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 6 tests and 3 shrinks):
[-3,-3]Oops? What?
>>> sort [-3, -3]
[-3, -3]
>>> qsort [-3, -3]
[-3]Ugh! So close, and yet … Can you spot the bug in our code?
qsort :: (Ord a) => [a] -> [a]
qsort [] = []
qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
where
ls = [y | y <- xs, y < x] -- elems in xs < x
rs = [z | z <- xs, z > x] -- elems in xs > x
Specifying No-Duplicates
We assumed that the input has no duplicates
- Values equal to
xare thrown out fromlsandrs
Is this a bug? Maybe? Maybe not?
- But at least its something we should be aware of!
Lets specify that a list has no-duplicates
noDuplicates ::(Eq a) => [a] -> Bool
noDuplicates (x:xs) = not (x `elem` xs) && noDuplicates xs
noDuplicates _ = True
Specifying a Post-Condition
We can now check that qsort outputs a list with no-duplicates
prop_qsort_distinct :: [Int] -> Bool
prop_qsort_distinct xs = noDuplicates (qsort xs)
-- >>> quickCheck prop_qsort_distinct
-- +++ OK, passed 100 tests.
Specifying a Pre-Condition
Also, qsort is identical to sort on inputs with no duplicates
prop_qsort_distinct_sort :: [Int] -> Property
prop_qsort_distinct_sort xs =
(isDistinct xs) ==> (qsort xs == sort xs)
-- >>> quickCheck prop_qsort_distinct_sort
-- +++ OK, passed 100 tests.
--
Plan
Property-based Testing
- Properties are boolean-functions
- Generate inputs, run function, check if result is
False
Random Test Generation
Case-Study
Plan
Property-based Testing
- Properties are boolean-functions
- Generate inputs, run function, check if result is
False
Test Generation
Case-Study
Test Generation
Lets notice something about quickCheck
If you run it once …
>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 6 tests and 2 shrinks):
[5,5]and if you run it again …
>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 4 tests and 1 shrink):
[1,1]The falsifying tests are different!
How is this possible?
Generators
QC defines a special Generator data type
data Gen a = MkGen (StdGen -> Int -> a)A Gen a is a function that takes as input
- a random number generator
StdGen - a “seed”
Int
and returns as output
- a value of type
a
Creating Generators
There are some functions to create generators, e.g.
choose :: (Int, Int) -> Gen Intwhich
takes a pair of
(lo, hi)returns a random generator for values between
loandhi
Running Generators
To execute the Gen we need access to the system’s “randomness”
Done via an IO “recipe”
sample' :: Gen a -> IO [a]Which
- takes a
Generator ofavalues and - returns a recipe that produces a list of (10)
avalues
We can run the recipe in ghci
>>> sample' (choose (0, 5))
[4,2,5,3,2,2,2,3,0,0,0]
EXERCISE
Lets write a function that returns a Generator over a list of elements
elements :: [a] -> Gen a
elements = ???So elements [x0,x1,...,xn] returns a Generator that randomly produces
values from x0, x1, … xn.
PROBLEM: How to combine Generators?
Suppose I have a generator of positive Int
pos :: Gen Int
pos = sample (0, 100)How can I create a generator of a pair of positive Ints?
posPair :: Gen (Int, Int)
Generator is a Monad!
You can see details here
This will let us combine generators (like combining parsers…)
QUIZ
Which of the below implements posPair :: Gen (Int, Int) ?
- given
pos :: Gen Intandsample :: (Int, Int) -> Gen Int
-- A
posPair = do { x1 <- pos; x2 <- pos; return (x1, x2) }
-- B
posPair = (pos, pos)
-- C
posPair = do { x <- pos; return (x, x) }
-- D
posPair = Gen (4, 5)
-- E
posPair = (sample (0, 100), sample (0, 100))
posPair = do { x1 <- pos; x2 <- pos; return (x1, x2) }
-- >>> sample' posPair
-- [(29,71),(48,74),(89,53),(73,93),(0,40),(71,35),(23,69),(93,49),(59,58),(27,32),(88,45)]
EXERCISE
Lets write a function that mixes a list of Generators
oneOf :: [Gen a] -> Gen a
oneOf = ???oneOf [g0,g1,...,gn] should be a generator that
randomly selects one of
g0,…gnrandomly generates a value from the chosen generator
>>> sample' (oneOf [choose (0,2), choose (10,12)])
[2,2,1,1,12,10,2,2,11,0,11]
oneOf is generalized into the frequency combinator
frequency :: [(Int, Gen a)] -> Gen awhich builds weighted mixtures of individual Generators
Types that can be Generated
QC has a class for types whose values can be randomly Generated
class Arbitrary a where
arbitrary :: Gen aT is an instance of Arbitrary if there is Gen T function
- i.e. there is a generator of
Tvalues!
randomThings :: (Arbitrary a) => IO [a]
randomThings = sample' arbitraryMany standard types have Arbitrary instances
- Users write their own instances when testing their own types
>>> randomThings :: IO [Int]
[0,-2,-2,0,-1,8,1,-14,-13,5,19]
>>> randomThings :: IO [(Int, Bool)]
[(0,True),(1,True),(0,True),(6,False),(-5,True),(4,False),(-12,False),(-8,False),(5,False),(-9,False),(-7,False)]
-
>>> randomThings :: IO [String]
["","\a","\f","\779257W\SUBA","\84573","D\ACK\365059S","9W\554735G","g\SYN~W\62120\&4&[","\NULsc\18427fy(","Q`TI \n/TH","\461027\ESCZ`u\783094\&4B\SOHT\424692"]
Plan
- Property-based Testing
- Properties are
Boolean-functions - Generate inputs, run function, check if result is
False
- Properties are
- Test Generation
Gen ais a monadic generator ofavaluesArbitraryis a class for types with generators
- Case-Study
Case Study: Compiler Optimizations
Lets use QC to test compiler optimizations
Learn how to generate structure data (programs)
Learn how to specify fancy properties (equivalence)
Using the WHILE language from your HW assignment.
WHILE: Syntax
Recall the definition of Variable and Value
data Variable = V String
data Value = IntVal Int | BoolVal Boolwhich we used to define Expression
data Expression = Var Variable | Val Value
| Plus Expression Expression
| Minus Expression Expressionand Statement
data Statement
= Assign Variable Expression
| If Expression Statement Statement
| While Expression Statement
| Sequence Statement Statement
| Skip
WHILE: Semantics
Next, we defined the behavior of programs as
functions from starting state to final state
where state was defined as a map from
VariabletoValue
type WState = M.Map Variable Valueand then you wrote a function
execute :: WState -> Statement -> WState
execute s0 stmt = execState (evalS stmt) s0(We can skip the details of evalS because you wrote it… right?)
Generating WHILE Programs
Lets write a program generator
Generating Variables
instance Arbitrary Variable where
arbitrary = do
x <- elements ['A'..'Z']
return (V [x])and we get
>>> randomThings :: IO [Variable]
[V "G",V "U",V "Z",V "F",V "Z",V "K",V "P",V "D",V "Y",V "M",V "H"]
Generating Values
instance Arbitrary Value where
arbitrary = oneOf [ do {n <- arbitrary; return (IntVal n) }
, do {b <- arbitrary; return (BoolVal b)}
]and we get
>>> randomThings :: IO [Value]
[IntVal 0,BoolVal False,BoolVal True,IntVal 3
,IntVal (-8),IntVal (-3),IntVal 1,BoolVal False
,IntVal 6,BoolVal True,BoolVal False]
Generating Expressions
instance Arbitrary Expression where
arbitrary = expr
expr, binE, baseE :: Gen Expression
expr = oneof [baseE, binE]
binE = do { o <- elements [Plus, Minus];
e1 <- expr;
e2 <- expr
return (o e1 e2) }
baseE = oneOf [ do {x <- arbitrary; return (Var x) }
, do {v <- arbitrary; return (Val v)} ]which gives us
>>> randomThings :: IO [Expression]
... -- lots of expressions!
Generating States
QC already has an way to automatically generate Maps
instance (Ord k, Arbitrary k, Arbitrary v) => Arbitrary (M.Map k v) where
arbitrary = do {kvs <- arbitrary; return (M.fromList kvs) }So for free we get a generator for WState
>>> randomThings :: IO [WState]
...
Specification: Program Equivalence
Next, lets specify the properties
Let p1 and p2 be two While programs.
Program p1 is equivalent to p2 written p1 === p2 if
(===) :: Statement -> Statement -> Property
p1 === p2 = forAll arbitrary (\st -> execute st p1 == execute st p2)That is, for all input states st
executing
p1fromstandexecuting
p2fromst
produce the same state.
QUIZ
For example, suppose that
-- X := 10; Y := 20
prog1 = Sequence
(Assign (V "X") (Val (IntVal 10)))
(Assign (V "Y") (Val (IntVal 20)))
-- Y := 20; X := 10
prog2 = Sequence
(Assign (V "Y") (Val (IntVal 20)))
(Assign (V "X") (Val (IntVal 10))
-- Y := 20; X := 20
prog3 = Sequence
(Assign (V "Y") (Val (IntVal 20)))
(Assign (V "X") (Val (IntVal 20)))then what do the following two queries return?
>>> quickCheck (prog1 === prog2)
>>> quickCheck (prog1 === prog3)A. FAIL, FAIL
B. FAIL, PASS
C. PASS, FAIL
D. PASS, PASS
Checking Compiler Optimizations
A compiler optimization can be viewed as a pair of programs
- input written by human
p_in - output optimized by compiler
p_out
An optimization is correct if p_in === p_out
- Compiler should not change behavior of the code
Checking an Optimization: Zero-Add-Elimination
Here’s a simple optimization
| In | Out |
|---|---|
X := E + 0 |
X := E |
X := E - 0 |
X := E |
Should be correct because adding 0 doesn’t change anything…
We can write this as a QC property
prop_add_zero_elim :: Variable -> Expression -> Property
prop_add_zero_elim x e =
(x `Assign` (e `Plus` Val (IntVal 0))) === (x `Assign` e)
prop_sub_zero_elim :: Variable -> Expression -> Property
prop_sub_zero_elim x e =
(x `Assign` (e `Minus` Val (IntVal 0))) === (x `Assign` e)So what does QC say?
-- >>> quickCheck prop_add_zero_elim
-- *** Failed! Falsifiable (after 1 test):
-- W
-- True
-- fromList []What’s that? Lets see is W := True equivalent to W := True + 0 ?
Forgot about those pesky boolean expressions!
If you recall W := True will just assign True to W
>>> execute M.empty ((V "W") `Assign` (Val (BoolVal True)))
fromList [(W,True)]but W := True + 0 will have a “type error” so will assign 0 to W!
>>> execute M.empty (V "W") `Assign` ((Val (BoolVal True) `Plus` Val (IntVal 0)))
fromList [(W,0)]
Fix: Restrict Optimizations to Int Expressions
Problem was expressions like True
- Caused strange behaviors due to “type errors”
Lets restrict to only integer-valued expressions
exprI, baseI, binI :: Gen Expression
exprI = oneof [baseI, binE]
baseI = oneOf [ do {x <- arbitrary; return (Var x) }
, do {n <- arbitrary; return (Val (IntVal n)) }
]
binI = do { o <- elements [Plus, Minus];
e1 <- exprI;
e2 <- exprI;
return (o e1 e2) }Now we can restrict the property to
prop_add_zero_elim' :: Variable -> Property
prop_add_zero_elim' x =
forAll intExpr (\e -> (x `Assign` (e `Plus` Val (IntVal 0))) === (x `Assign` e))
QUIZ
Consider the property
prop_add_zero_elim' :: Variable -> Property
prop_add_zero_elim' x =
forAll intExpr (\e -> (x `Assign` (e `Plus` Val (IntVal 0))) === (x `Assign` e))What will be the result of
>>> quickCheck prop_add_zero_elim'A. PASS
B. FAIL
>>> quickCheck prop_add_zero_elim'
*** Failed! Falsifiable (after 11 tests):
Z
G
fromList [(B,False),(F,-4),(G,True),(K,8),(M,True),(N,False),(R,3),(T,False),(V,True)]Oops, the counterexample is Z := G and Z := G + 0
- but now the starting state maps
(G, True)
Pesky Bool sneaked right back in …
Moral: Even simple optimizations are really tricky without types!
Try at home Can you fix this property so it passes?
Checking an Optimization: Constant-Folding-ish
Lets try another optimization that doesn’t use any arithmetic
- So should not have any problems with
Int-vs-Bool
Suppose you have two back-to-back assignments
X := E; Y := EWhy recompute E? Result is already stored in X! So optimize the above to
X := E; Y := XWe can specify this transformation as the QC property
prop_const_prop :: Variable -> Variable -> Expression -> Property
prop_const_prop x y e =
((x `Assign` e) `Sequence` (y `Assign` e))
===
((x `Assign` e) `Sequence` (y `Assign` Var x))Mighty QC, do you agree ?
>>> quickCheck prop_const_prop
*Testing> quickCheck prop_const_prop
*** Failed! Falsifiable (after 82 tests):
D
B
True + N + L + M + True + -45 + H + -9 + 70 + True + -68 + N + -29 + I + True + G + O + P + True + Q + False + False + True + True + True + True + X + I + False + 81 + -42 + False + 31 + -13 + T + 23 + True + S + True + I + M + True + True + True + Z + H + -65 + G + K + -22 + D
fromList [(A,True),(B,-72),(C,-19),(D,-34),(E,50),(F,True),(G,True),(H,-21),(I,5),(J,3),(K,True),(L,-20),(M,True),(N,-10),(O,-20),(P,False),(Q,-10),(R,-78),(S,True),(T,70),(U,False),(V,-55),(W,True),(X,True),(Y,True),(Z,-56)]
Shrinking Tests
The property fails … but the counterexample is very long!

QC has a test shrinking mechanism … also in the Arbitrary class
shrink :: a -> [a]shrink t
- takes as input a candidate
tand
- returns as output a list of smaller candidates
That QC will systematically search with to find a minimally failing test!
instance Arbitrary Expression where
arbitrary = expr
-- shrink :: Expression -> [Expression]
shrink (Plus e1 e2) = [e1, e2]
shrink (Minus e1 e2) = [e1, e2]
shrink _ = []We should just keep shrinking each Expression to its sub-Expressions.
>>> quickCheck prop_const_prop
*** Failed! Falsifiable (after 26 tests and 4 shrinks):
D
U
A + D
fromList [(D,-638),(G,256),(H,False),(K,False),(O,True),(R,True),(S,-81),(T,926)]
~~~~~
Aha! Consider the two programs
```haskell
D := A + D;
U := A + Dand
D := A + D;
U := Dare they equivalent? Pretty subtle, eh.
Recap: Property-Based Testing
- Property-based Testing
- Properties are
Boolean-functions - Generate inputs, run function, check if result is
False
- Properties are
- Test Generation
Gen ais a monadic generator ofavaluesArbitraryis a class for types with generators
- Case-Study
- Compiler optimizations are very tricky
- QC-inspired methods have found serious bugs in many compilers & databases & …
Recap: Property-Based Testing
QuickCheck is awesome!
Simple: typeclasses + monads
Useful: Can find subtle bugs or inconsistencies in your code.
Lots of literature on QC and techniques it has inspired
- Can even use QC to generate test data systems in other languages