Property-based Testing
Typeclasses + Monads = Automatic Testing
Lets look at QuickCheck
- Developed by Koen Claessen and John Hughes in 2000
Ported to 40 other languages
PBT used in basically all kinds of software…
Plan
Property-based Testing
Random Test Generation
Case-Study
Property-based Testing
Don’t (only) write individual unit-tests
- only check particular input-output behavior of code
Do write properties desired of the functions
- generate random inputs
- run function
- verify (or rather, try to falsify) the property
Testing with Specifications
PBT emphasizes the importance of specifying correct behavior
Makes you think about what the code should do,
Finds corner-cases where the specification is violated (so code or spec are fixed)
Makes specs live on as machine-checked documentation.
Properties
A Property
is a function that returns a Bool
prop_revapp :: [Int] -> [Int] -> Bool
=
prop_revapp xs ys reverse (xs ++ ys) == reverse xs ++ reverse ys
Looks like a theorem that the programmer believes is true.
- By convention, we write the prefix
"prop_"
QUIZ
Consider the property prop_revapp
prop_revapp :: [Int] -> [Int] -> Bool
=
prop_revapp xs ys reverse (xs ++ ys) == reverse xs ++ reverse ys
It says forall xs
, ys
. reverse (xs ++ ys) == reverse xs ++ reverse ys
.
Is the property true?
A. Yes
B. No
Question
Why specify type as [Int] -> [Int] -> Bool
?
- Why not write
[a] -> [a] -> Bool
?
Checking Properties
QC uses the types to generate random inputs
For that, it needs a specific type like
[Int]
Cannot generate an input of type
[a]
Let’s quickCheck
it!
>>> quickCheck prop_revapp
*** Failed! Falsifiable (after 6 tests and 9 shrinks):
0]
[1] [
Huh? QC gives us a counterexample with xs = [0]
and ys == [1]
reverse (xs ++ ys)
==> reverse ([0] ++ [1])
==> reverse ([0, 1])
==> [1, 0]
but
reverse xs ++ reverse ys
==> reverse [0] ++ reverse [1]
==> [0] ++ [1]
==> [0, 1]
EXERCISE
Can you modify prop_revapp
so that it is sensible and true ?
prop_revapp :: [Int] -> [Int] -> Bool
= reverse (xs ++ ys) == ??? prop_revapp xs ys
When you are done, we should see
>>> quickCheck prop_revapp'
+++ OK, passed 100 tests.
We can run more tests by specifying that as a parameter
= quickCheckWith (stdArgs {maxSuccess = n}) quickCheckN n
Followed by
10000 prop_revapp
quickCheckN +++ OK, passed 10000 tests
QuickSort
Here’s a simple sorting function (quickSort
)
qsort :: (Ord a) => [a] -> [a]
= []
qsort [] :xs) = qsort ls ++ [x] ++ qsort rs
qsort (xwhere
= [y | y <- xs, y < x] -- elems in xs < x
ls = [z | z <- xs, z > x] -- elems in xs > x rs
Seems to work?
>>> [1,3..19] ++ [2,4..20]
1,3,5,7,9,11,13,15,17,19,2,4,6,8,10,12,14,16,18,20]
[
>>> qsort ([1,3..19] ++ [2,4..20])
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20] [
QuickCheck QuickSort
Lets check the output of qsort
is ordered
isOrdered :: (Ord a) => [a] -> Bool
isOrdered :: (Ord a) => [a] -> Bool
:x2:xs) = x1 <= x2 && isOrdered (x2:xs)
isOrdered (x1= True isOrdered _
and use it to write a property
prop_qsort_isOrdered :: [Int] -> Bool
= isOrdered (qsort xs) prop_qsort_isOrdered xs
which we can check
>>> quickCheckN 1000 prop_qsort_isOrdered
+++ OK, passed 1000 tests.
QUIZ
Lets check that the first element of the output is the smallest
qsort :: (Ord a) => [a] -> [a]
= []
qsort [] :xs) = qsort ls ++ [x] ++ qsort rs
qsort (xwhere
= [y | y <- xs, y < x] -- elems in xs < x
ls = [z | z <- xs, z > x] -- elems in xs > x
rs
prop_qsort_min :: [a] -> Bool
= head (qsort xs) == minimum xs prop_qsort_min xs
What is the result of
>>> quickCheck prop_qsort_min
A. Pass 100 tests
B. Fail
Properties and Assumptions
prop_qsort_min :: [a] -> Bool
= head (qsort xs) == minimum x prop_qsort_min xs
Oops!
>>> quickCheck prop_qsort_min
*** Failed! Exception: 'Prelude.head: empty list' (after 1 test):
[]
prop_qsort_min
is not true for all Int
lists
- Property only makes sense if
xs
is not empty
Writing specifications clarifies the assumptions
- under which a given piece of code is supposed to work.
Conditional Properties
Lets modify prop_qsort_min
so equality holds if input is non-null
prop_qsort_nn_min :: [Int] -> Property
=
prop_qsort_nn_min xs not (null xs) ==> head (qsort xs) == minimum xs
Instead of Bool
the function’s output is Property
- a special type built into the QC library
The implies operator ==>
is one of many
- that allow the construction of rich properties.
QUIZ
Lets test that our qsort
is identical to a trusted reference implementation
- may be too slow to deploy but ok to use for checking correctness
Lets use the standard library’s Data.List.sort
function
- (Much faster than ours… but just for illustration!)
prop_qsort_sort :: [Int] -> Bool
= qsort xs == sort xs prop_qsort_sort xs
What is the result of
qsort :: (Ord a) => [a] -> [a]
= []
qsort [] :xs) = qsort ls ++ [x] ++ qsort rs
qsort (xwhere
= [y | y <- xs, y < x] -- elems in xs < x
ls = [z | z <- xs, z > x] -- elems in xs > x
rs
>>> quickCheck prop_qsort_sort
A. OK
after 100 tests
B. Failed! Falsifiable...
Lets Check
>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 6 tests and 3 shrinks):
-3,-3] [
Oops? What?
>>> sort [-3, -3]
-3, -3]
[
>>> qsort [-3, -3]
-3] [
Ugh! So close, and yet … Can you spot the bug in our code?
qsort :: (Ord a) => [a] -> [a]
= []
qsort [] :xs) = qsort ls ++ [x] ++ qsort rs
qsort (xwhere
= [y | y <- xs, y < x] -- elems in xs < x
ls = [z | z <- xs, z > x] -- elems in xs > x rs
Specifying No-Duplicates
We assumed that the input has no duplicates
- Values equal to
x
are thrown out fromls
andrs
Is this a bug? Maybe? Maybe not?
- But at least its something we should be aware of!
Lets specify that a list has no-duplicates
noDuplicates ::(Eq a) => [a] -> Bool
:xs) = not (x `elem` xs) && noDuplicates xs
noDuplicates (x= True noDuplicates _
Specifying a Post-Condition
We can now check that qsort
outputs a list with no-duplicates
prop_qsort_distinct :: [Int] -> Bool
= noDuplicates (qsort xs)
prop_qsort_distinct xs
-- >>> quickCheck prop_qsort_distinct
-- +++ OK, passed 100 tests.
Specifying a Pre-Condition
Also, qsort
is identical to sort
on inputs with no duplicates
prop_qsort_distinct_sort :: [Int] -> Property
=
prop_qsort_distinct_sort xs ==> (qsort xs == sort xs)
(isDistinct xs)
-- >>> quickCheck prop_qsort_distinct_sort
-- +++ OK, passed 100 tests.
--
Plan
Property-based Testing
- Properties are boolean-functions
- Generate inputs, run function, check if result is
False
Random Test Generation
Case-Study
Plan
Property-based Testing
- Properties are boolean-functions
- Generate inputs, run function, check if result is
False
Test Generation
Case-Study
Test Generation
Lets notice something about quickCheck
If you run it once …
>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 6 tests and 2 shrinks):
5,5] [
and if you run it again …
>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 4 tests and 1 shrink):
1,1] [
The falsifying tests are different!
How is this possible?
Generators
QC defines a special Generator data type
data Gen a = MkGen (StdGen -> Int -> a)
A Gen a
is a function that takes as input
- a random number generator
StdGen
- a “seed”
Int
and returns as output
- a value of type
a
Creating Gen
erators
There are some functions to create generators, e.g.
choose :: (Int, Int) -> Gen Int
which
takes a pair of
(lo, hi)
returns a random generator for values between
lo
andhi
Running Gen
erators
To execute the Gen
we need access to the system’s “randomness”
Done via an IO
“recipe”
sample' :: Gen a -> IO [a]
Which
- takes a
Gen
erator ofa
values and - returns a recipe that produces a list of (10)
a
values
We can run the recipe in ghci
>>> sample' (choose (0, 5))
4,2,5,3,2,2,2,3,0,0,0] [
EXERCISE
Lets write a function that returns a Gen
erator over a list of elements
elements :: [a] -> Gen a
= ??? elements
So elements [x0,x1,...,xn]
returns a Gen
erator that randomly produces
values from x0
, x1
, … xn
.
PROBLEM: How to combine Gen
erators?
Suppose I have a generator of positive Int
pos :: Gen Int
= sample (0, 100) pos
How can I create a generator of a pair of positive Int
s?
posPair :: Gen (Int, Int)
Gen
erator is a Monad!
You can see details here
This will let us combine generators (like combining parsers…)
QUIZ
Which of the below implements posPair :: Gen (Int, Int)
?
- given
pos :: Gen Int
andsample :: (Int, Int) -> Gen Int
-- A
= do { x1 <- pos; x2 <- pos; return (x1, x2) }
posPair
-- B
= (pos, pos)
posPair
-- C
= do { x <- pos; return (x, x) }
posPair
-- D
= Gen (4, 5)
posPair
-- E
= (sample (0, 100), sample (0, 100)) posPair
= do { x1 <- pos; x2 <- pos; return (x1, x2) }
posPair
-- >>> sample' posPair
-- [(29,71),(48,74),(89,53),(73,93),(0,40),(71,35),(23,69),(93,49),(59,58),(27,32),(88,45)]
EXERCISE
Lets write a function that mixes a list of Gen
erators
oneOf :: [Gen a] -> Gen a
= ??? oneOf
oneOf [g0,g1,...,gn]
should be a generator that
randomly selects one of
g0
,…gn
randomly generates a value from the chosen generator
>>> sample' (oneOf [choose (0,2), choose (10,12)])
2,2,1,1,12,10,2,2,11,0,11] [
oneOf
is generalized into the frequency
combinator
frequency :: [(Int, Gen a)] -> Gen a
which builds weighted mixtures of individual Gen
erators
Types that can be Gen
erated
QC has a class
for types whose values can be randomly Gen
erated
class Arbitrary a where
arbitrary :: Gen a
T
is an instance of Arbitrary
if there is Gen T
function
- i.e. there is a generator of
T
values!
randomThings :: (Arbitrary a) => IO [a]
= sample' arbitrary randomThings
Many standard types have Arbitrary
instances
- Users write their own instances when testing their own types
>>> randomThings :: IO [Int]
0,-2,-2,0,-1,8,1,-14,-13,5,19]
[
>>> randomThings :: IO [(Int, Bool)]
0,True),(1,True),(0,True),(6,False),(-5,True),(4,False),(-12,False),(-8,False),(5,False),(-9,False),(-7,False)]
[(-
>>> randomThings :: IO [String]
"","\a","\f","\779257W\SUBA","\84573","D\ACK\365059S","9W\554735G","g\SYN~W\62120\&4&[","\NULsc\18427fy(","Q`TI \n/TH","\461027\ESCZ`u\783094\&4B\SOHT\424692"] [
Plan
- Property-based Testing
- Properties are
Bool
ean-functions - Generate inputs, run function, check if result is
False
- Properties are
- Test Generation
Gen a
is a monadic generator ofa
valuesArbitrary
is a class for types with generators
- Case-Study
Case Study: Compiler Optimizations
Lets use QC to test compiler optimizations
Learn how to generate structure data (programs)
Learn how to specify fancy properties (equivalence)
Using the WHILE
language from your HW assignment.
WHILE: Syntax
Recall the definition of Variable
and Value
data Variable = V String
data Value = IntVal Int | BoolVal Bool
which we used to define Expression
data Expression = Var Variable | Val Value
| Plus Expression Expression
| Minus Expression Expression
and Statement
data Statement
= Assign Variable Expression
| If Expression Statement Statement
| While Expression Statement
| Sequence Statement Statement
| Skip
WHILE: Semantics
Next, we defined the behavior of programs as
functions from starting state to final state
where state was defined as a map from
Variable
toValue
type WState = M.Map Variable Value
and then you wrote a function
execute :: WState -> Statement -> WState
= execState (evalS stmt) s0 execute s0 stmt
(We can skip the details of evalS
because you wrote it… right?)
Generating WHILE Programs
Lets write a program generator
Generating Variables
instance Arbitrary Variable where
= do
arbitrary <- elements ['A'..'Z']
x return (V [x])
and we get
>>> randomThings :: IO [Variable]
V "G",V "U",V "Z",V "F",V "Z",V "K",V "P",V "D",V "Y",V "M",V "H"] [
Generating Values
instance Arbitrary Value where
= oneOf [ do {n <- arbitrary; return (IntVal n) }
arbitrary do {b <- arbitrary; return (BoolVal b)}
, ]
and we get
>>> randomThings :: IO [Value]
IntVal 0,BoolVal False,BoolVal True,IntVal 3
[IntVal (-8),IntVal (-3),IntVal 1,BoolVal False
,IntVal 6,BoolVal True,BoolVal False] ,
Generating Expressions
instance Arbitrary Expression where
= expr
arbitrary
baseE :: Gen Expression
expr, binE,= oneof [baseE, binE]
expr
= do { o <- elements [Plus, Minus];
binE <- expr;
e1 <- expr
e2 return (o e1 e2) }
= oneOf [ do {x <- arbitrary; return (Var x) }
baseE do {v <- arbitrary; return (Val v)} ] ,
which gives us
>>> randomThings :: IO [Expression]
... -- lots of expressions!
Generating States
QC already has an way to automatically generate Map
s
instance (Ord k, Arbitrary k, Arbitrary v) => Arbitrary (M.Map k v) where
= do {kvs <- arbitrary; return (M.fromList kvs) } arbitrary
So for free we get a generator for WState
>>> randomThings :: IO [WState]
...
Specification: Program Equivalence
Next, lets specify the properties
Let p1
and p2
be two While programs.
Program p1
is equivalent to p2
written p1 === p2
if
(===) :: Statement -> Statement -> Property
=== p2 = forAll arbitrary (\st -> execute st p1 == execute st p2) p1
That is, for all input states st
executing
p1
fromst
andexecuting
p2
fromst
produce the same state.
QUIZ
For example, suppose that
-- X := 10; Y := 20
= Sequence
prog1 Assign (V "X") (Val (IntVal 10)))
(Assign (V "Y") (Val (IntVal 20)))
(
-- Y := 20; X := 10
= Sequence
prog2 Assign (V "Y") (Val (IntVal 20)))
(Assign (V "X") (Val (IntVal 10))
(
-- Y := 20; X := 20
= Sequence
prog3 Assign (V "Y") (Val (IntVal 20)))
(Assign (V "X") (Val (IntVal 20))) (
then what do the following two queries return?
>>> quickCheck (prog1 === prog2)
>>> quickCheck (prog1 === prog3)
A. FAIL, FAIL
B. FAIL, PASS
C. PASS, FAIL
D. PASS, PASS
Checking Compiler Optimizations
A compiler optimization can be viewed as a pair of programs
- input written by human
p_in
- output optimized by compiler
p_out
An optimization is correct if p_in === p_out
- Compiler should not change behavior of the code
Checking an Optimization: Zero-Add-Elimination
Here’s a simple optimization
In | Out |
---|---|
X := E + 0 |
X := E |
X := E - 0 |
X := E |
Should be correct because adding 0
doesn’t change anything…
We can write this as a QC property
prop_add_zero_elim :: Variable -> Expression -> Property
=
prop_add_zero_elim x e `Assign` (e `Plus` Val (IntVal 0))) === (x `Assign` e)
(x
prop_sub_zero_elim :: Variable -> Expression -> Property
=
prop_sub_zero_elim x e `Assign` (e `Minus` Val (IntVal 0))) === (x `Assign` e) (x
So what does QC say?
-- >>> quickCheck prop_add_zero_elim
-- *** Failed! Falsifiable (after 1 test):
-- W
-- True
-- fromList []
What’s that? Lets see is W := True
equivalent to W := True + 0
?
Forgot about those pesky boolean expressions!
If you recall W := True
will just assign True
to W
>>> execute M.empty ((V "W") `Assign` (Val (BoolVal True)))
W,True)] fromList [(
but W := True + 0
will have a “type error” so will assign 0
to W
!
>>> execute M.empty (V "W") `Assign` ((Val (BoolVal True) `Plus` Val (IntVal 0)))
W,0)] fromList [(
Fix: Restrict Optimizations to Int
Expressions
Problem was expressions like True
- Caused strange behaviors due to “type errors”
Lets restrict to only integer-valued expressions
binI :: Gen Expression
exprI, baseI,= oneof [baseI, binE]
exprI
= oneOf [ do {x <- arbitrary; return (Var x) }
baseI do {n <- arbitrary; return (Val (IntVal n)) }
,
]= do { o <- elements [Plus, Minus];
binI <- exprI;
e1 <- exprI;
e2 return (o e1 e2) }
Now we can restrict the property to
prop_add_zero_elim' :: Variable -> Property
=
prop_add_zero_elim' x -> (x `Assign` (e `Plus` Val (IntVal 0))) === (x `Assign` e)) forAll intExpr (\e
QUIZ
Consider the property
prop_add_zero_elim' :: Variable -> Property
=
prop_add_zero_elim' x -> (x `Assign` (e `Plus` Val (IntVal 0))) === (x `Assign` e)) forAll intExpr (\e
What will be the result of
>>> quickCheck prop_add_zero_elim'
A. PASS
B. FAIL
>>> quickCheck prop_add_zero_elim'
*** Failed! Falsifiable (after 11 tests):
Z
G
B,False),(F,-4),(G,True),(K,8),(M,True),(N,False),(R,3),(T,False),(V,True)] fromList [(
Oops, the counterexample is Z := G
and Z := G + 0
- but now the starting state maps
(G, True)
Pesky Bool
sneaked right back in …
Moral: Even simple optimizations are really tricky without types!
Try at home Can you fix this property so it passes?
Checking an Optimization: Constant-Folding-ish
Lets try another optimization that doesn’t use any arithmetic
- So should not have any problems with
Int
-vs-Bool
Suppose you have two back-to-back assignments
X := E; Y := E
Why recompute E
? Result is already stored in X
! So optimize the above to
X := E; Y := X
We can specify this transformation as the QC property
prop_const_prop :: Variable -> Variable -> Expression -> Property
=
prop_const_prop x y e `Assign` e) `Sequence` (y `Assign` e))
((x ===
`Assign` e) `Sequence` (y `Assign` Var x)) ((x
Mighty QC, do you agree ?
>>> quickCheck prop_const_prop
*Testing> quickCheck prop_const_prop
*** Failed! Falsifiable (after 82 tests):
D
B
True + N + L + M + True + -45 + H + -9 + 70 + True + -68 + N + -29 + I + True + G + O + P + True + Q + False + False + True + True + True + True + X + I + False + 81 + -42 + False + 31 + -13 + T + 23 + True + S + True + I + M + True + True + True + Z + H + -65 + G + K + -22 + D
A,True),(B,-72),(C,-19),(D,-34),(E,50),(F,True),(G,True),(H,-21),(I,5),(J,3),(K,True),(L,-20),(M,True),(N,-10),(O,-20),(P,False),(Q,-10),(R,-78),(S,True),(T,70),(U,False),(V,-55),(W,True),(X,True),(Y,True),(Z,-56)] fromList [(
Shrinking Tests
The property fails … but the counterexample is very long!
QC has a test shrinking mechanism … also in the Arbitrary
class
shrink :: a -> [a]
shrink t
- takes as input a candidate
t
and
- returns as output a list of smaller candidates
That QC will systematically search with to find a minimally failing test!
instance Arbitrary Expression where
= expr
arbitrary
-- shrink :: Expression -> [Expression]
Plus e1 e2) = [e1, e2]
shrink (Minus e1 e2) = [e1, e2]
shrink (= [] shrink _
We should just keep shrinking each Expression
to its sub-Expressions
.
>>> quickCheck prop_const_prop
*** Failed! Falsifiable (after 26 tests and 4 shrinks):
D
U
A + D
D,-638),(G,256),(H,False),(K,False),(O,True),(R,True),(S,-81),(T,926)]
fromList [(~~~~~
Aha! Consider the two programs
```haskell
D := A + D;
U := A + D
and
D := A + D;
U := D
are they equivalent? Pretty subtle, eh.
Recap: Property-Based Testing
- Property-based Testing
- Properties are
Bool
ean-functions - Generate inputs, run function, check if result is
False
- Properties are
- Test Generation
Gen a
is a monadic generator ofa
valuesArbitrary
is a class for types with generators
- Case-Study
- Compiler optimizations are very tricky
- QC-inspired methods have found serious bugs in many compilers & databases & …
Recap: Property-Based Testing
QuickCheck is awesome!
Simple: typeclasses + monads
Useful: Can find subtle bugs or inconsistencies in your code.
Lots of literature on QC and techniques it has inspired
- Can even use QC to generate test data systems in other languages