Property-based Testing
Lets look at QuickCheck
- Developed by Koen Claessen and John Hughes in 2000
Ported to 40 other languages
PBT used in basically all kinds of software…
Plan
Property-based Testing
Random Test Generation
Case-Study
Property-based Testing
Don’t (only) write individual unit-tests
- only check particular input-output behavior of code
Do write properties desired of the functions
- generate random inputs
- run function
- verify (or rather, try to falsify) the property
Testing with Specifications
PBT emphasizes the importance of specifying correct behavior
Makes you think about what the code should do,
Finds corner-cases where the specification is violated (so code or spec are fixed)
Makes specs live on as machine-checked documentation.
Properties
A Property
is a function that returns a Bool
prop_revapp :: [Int] -> [Int] -> Bool
prop_revapp xs ys =
reverse (xs ++ ys) == reverse xs ++ reverse ys
Looks like a theorem that the programmer believes is true.
- By convention, we write the prefix
"prop_"
QUIZ
Consider the property prop_revapp
prop_revapp :: [Int] -> [Int] -> Bool
prop_revapp xs ys =
reverse (xs ++ ys) == reverse xs ++ reverse ys
It says forall xs
, ys
. reverse (xs ++ ys) == reverse xs ++ reverse ys
.
Is the property true?
A. Yes
B. No
Question
Why specify type as [Int] -> [Int] -> Bool
?
- Why not write
[a] -> [a] -> Bool
?
Checking Properties
QC uses the types to generate random inputs
For that, it needs a specific type like
[Int]
Cannot generate an input of type
[a]
Let’s quickCheck
it!
Huh? QC gives us a counterexample with xs = [0]
and ys == [1]
but
EXERCISE
Can you modify prop_revapp
so that it is sensible and true ?
When you are done, we should see
We can run more tests by specifying that as a parameter
Followed by
QuickSort
Here’s a simple sorting function (quickSort
)
qsort :: (Ord a) => [a] -> [a]
qsort [] = []
qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
where
ls = [y | y <- xs, y < x] -- elems in xs < x
rs = [z | z <- xs, z > x] -- elems in xs > x
Seems to work?
>>> [1,3..19] ++ [2,4..20]
[1,3,5,7,9,11,13,15,17,19,2,4,6,8,10,12,14,16,18,20]
>>> qsort ([1,3..19] ++ [2,4..20])
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
QuickCheck QuickSort
Lets check the output of qsort
is ordered
isOrdered :: (Ord a) => [a] -> Bool
isOrdered :: (Ord a) => [a] -> Bool
isOrdered (x1:x2:xs) = x1 <= x2 && isOrdered (x2:xs)
isOrdered _ = True
and use it to write a property
which we can check
QUIZ
Lets check that the first element of the output is the smallest
qsort :: (Ord a) => [a] -> [a]
qsort [] = []
qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
where
ls = [y | y <- xs, y < x] -- elems in xs < x
rs = [z | z <- xs, z > x] -- elems in xs > x
prop_qsort_min :: [a] -> Bool
prop_qsort_min xs = head (qsort xs) == minimum xs
What is the result of
A. Pass 100 tests
B. Fail
Properties and Assumptions
Oops!
prop_qsort_min
is not true for all Int
lists
- Property only makes sense if
xs
is not empty
Writing specifications clarifies the assumptions
- under which a given piece of code is supposed to work.
Conditional Properties
Lets modify prop_qsort_min
so equality holds if input is non-null
prop_qsort_nn_min :: [Int] -> Property
prop_qsort_nn_min xs =
not (null xs) ==> head (qsort xs) == minimum xs
Instead of Bool
the function’s output is Property
- a special type built into the QC library
The implies operator ==>
is one of many
- that allow the construction of rich properties.
QUIZ
Lets test that our qsort
is identical to a trusted reference implementation
- may be too slow to deploy but ok to use for checking correctness
Lets use the standard library’s Data.List.sort
function
- (Much faster than ours… but just for illustration!)
What is the result of
qsort :: (Ord a) => [a] -> [a]
qsort [] = []
qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
where
ls = [y | y <- xs, y < x] -- elems in xs < x
rs = [z | z <- xs, z > x] -- elems in xs > x
>>> quickCheck prop_qsort_sort
A. OK
after 100 tests
B. Failed! Falsifiable...
Lets Check
Oops? What?
Ugh! So close, and yet … Can you spot the bug in our code?
qsort :: (Ord a) => [a] -> [a]
qsort [] = []
qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
where
ls = [y | y <- xs, y < x] -- elems in xs < x
rs = [z | z <- xs, z > x] -- elems in xs > x
Specifying No-Duplicates
We assumed that the input has no duplicates
- Values equal to
x
are thrown out fromls
andrs
Is this a bug? Maybe? Maybe not?
- But at least its something we should be aware of!
Lets specify that a list has no-duplicates
noDuplicates ::(Eq a) => [a] -> Bool
noDuplicates (x:xs) = not (x `elem` xs) && noDuplicates xs
noDuplicates _ = True
Specifying a Post-Condition
We can now check that qsort
outputs a list with no-duplicates
prop_qsort_distinct :: [Int] -> Bool
prop_qsort_distinct xs = noDuplicates (qsort xs)
-- >>> quickCheck prop_qsort_distinct
-- +++ OK, passed 100 tests.
Specifying a Pre-Condition
Also, qsort
is identical to sort
on inputs with no duplicates
prop_qsort_distinct_sort :: [Int] -> Property
prop_qsort_distinct_sort xs =
(isDistinct xs) ==> (qsort xs == sort xs)
-- >>> quickCheck prop_qsort_distinct_sort
-- +++ OK, passed 100 tests.
--
Plan
- Property-based Testing
- Properties are boolean-functions
- Generate inputs, run function, check if result is
False
Random Test Generation
Case-Study
Plan
- Property-based Testing
- Properties are boolean-functions
- Generate inputs, run function, check if result is
False
Test Generation
Case-Study
Test Generation
Lets notice something about quickCheck
If you run it once …
and if you run it again …
The falsifying tests are different!
How is this possible?
Generators
QC defines a special Generator data type
A Gen a
is a function that takes as input
- a random number generator
StdGen
- a “seed”
Int
and returns as output
- a value of type
a
Creating Gen
erators
There are some functions to create generators, e.g.
which
takes a pair of
(lo, hi)
returns a random generator for values between
lo
andhi
Running Gen
erators
To execute the Gen
we need access to the system’s “randomness”
Done via an IO
“recipe”
Which
- takes a
Gen
erator ofa
values and - returns a recipe that produces a list of (10)
a
values
We can run the recipe in ghci
EXERCISE
Lets write a function that returns a Gen
erator over a list of elements
So elements [x0,x1,...,xn]
returns a Gen
erator that randomly produces values from x0
, x1
, … xn
.
PROBLEM: How to combine Gen
erators?
Suppose I have a generator of positive Int
How can I create a generator of a pair of positive Int
s?
Gen
erator is a Monad!
You can see details here
This will let us combine generators (like combining parsers…)
QUIZ
Which of the below implements posPair :: Gen (Int, Int)
?
- given
pos :: Gen Int
andsample :: (Int, Int) -> Gen Int
-- A
posPair = do { x1 <- pos; x2 <- pos; return (x1, x2) }
-- B
posPair = (pos, pos)
-- C
posPair = do { x <- pos; return (x, x) }
-- D
posPair = Gen (4, 5)
-- E
posPair = (sample (0, 100), sample (0, 100))
posPair = do { x1 <- pos; x2 <- pos; return (x1, x2) }
-- >>> sample' posPair
-- [(29,71),(48,74),(89,53),(73,93),(0,40),(71,35),(23,69),(93,49),(59,58),(27,32),(88,45)]
EXERCISE
Lets write a function that mixes a list of Gen
erators
oneOf [g0,g1,...,gn]
should be a generator that
randomly selects one of
g0
,…gn
randomly generates a value from the chosen generator
oneOf
is generalized into the frequency
combinator
which builds weighted mixtures of individual Gen
erators
Types that can be Gen
erated
QC has a class
for types whose values can be randomly Gen
erated
T
is an instance of Arbitrary
if there is Gen T
function
- i.e. there is a generator of
T
values!
Many standard types have Arbitrary
instances
- Users write their own instances when testing their own types
>>> randomThings :: IO [Int]
[0,-2,-2,0,-1,8,1,-14,-13,5,19]
>>> randomThings :: IO [(Int, Bool)]
[(0,True),(1,True),(0,True),(6,False),(-5,True),(4,False),(-12,False),(-8,False),(5,False),(-9,False),(-7,False)]
-
>>> randomThings :: IO [String]
["","\a","\f","\779257W\SUBA","\84573","D\ACK\365059S","9W\554735G","g\SYN~W\62120\&4&[","\NULsc\18427fy(","Q`TI \n/TH","\461027\ESCZ`u\783094\&4B\SOHT\424692"]
Plan
- Property-based Testing
- Properties are
Bool
ean-functions - Generate inputs, run function, check if result is
False
- Properties are
- Test Generation
Gen a
is a monadic generator ofa
valuesArbitrary
is a class for types with generators
- Case-Study
Case Study: Compiler Optimizations
Lets use QC to test compiler optimizations
Learn how to generate structure data (programs)
Learn how to specify fancy properties (equivalence)
Using the WHILE
language from your HW assignment.
WHILE: Syntax
Recall the definition of Variable
and Value
which we used to define Expression
data Expression = Var Variable | Val Value
| Plus Expression Expression
| Minus Expression Expression
and Statement
data Statement
= Assign Variable Expression
| If Expression Statement Statement
| While Expression Statement
| Sequence Statement Statement
| Skip
WHILE: Semantics
Next, we defined the behavior of programs as
functions from starting state to final state
where state was defined as a map from
Variable
toValue
and then you wrote a function
(We can skip the details of evalS
because you wrote it… right?)
Generating WHILE Programs
Lets write a program generator
Generating Variables
and we get
>>> randomThings :: IO [Variable]
[V "G",V "U",V "Z",V "F",V "Z",V "K",V "P",V "D",V "Y",V "M",V "H"]
Generating Values
instance Arbitrary Value where
arbitrary = oneOf [ do {n <- arbitrary; return (IntVal n) }
, do {b <- arbitrary; return (BoolVal b)}
]
and we get
>>> randomThings :: IO [Value]
[IntVal 0,BoolVal False,BoolVal True,IntVal 3
,IntVal (-8),IntVal (-3),IntVal 1,BoolVal False
,IntVal 6,BoolVal True,BoolVal False]
Generating Expressions
instance Arbitrary Expression where
arbitrary = expr
expr, binE, baseE :: Gen Expression
expr = oneof [baseE, binE]
binE = do { o <- elements [Plus, Minus];
e1 <- expr;
e2 <- expr
return (o e1 e2) }
baseE = oneOf [ do {x <- arbitrary; return (Var x) }
, do {v <- arbitrary; return (Val v)} ]
which gives us
Generating States
QC already has an way to automatically generate Map
s
instance (Ord k, Arbitrary k, Arbitrary v) => Arbitrary (M.Map k v) where
arbitrary = do {kvs <- arbitrary; return (M.fromList kvs) }
So for free we get a generator for WState
Specification: Program Equivalence
Next, lets specify the properties
Let p1
and p2
be two While programs.
Program p1
is equivalent to p2
written p1 === p2
if
(===) :: Statement -> Statement -> Property
p1 === p2 = forAll arbitrary (\st -> execute st p1 == execute st p2)
That is, for all input states st
executing
p1
fromst
andexecuting
p2
fromst
produce the same state.
QUIZ
For example, suppose that
-- X := 10; Y := 20
prog1 = Sequence
(Assign (V "X") (Val (IntVal 10)))
(Assign (V "Y") (Val (IntVal 20)))
-- Y := 20; X := 10
prog2 = Sequence
(Assign (V "Y") (Val (IntVal 20)))
(Assign (V "X") (Val (IntVal 10))
-- Y := 20; X := 20
prog3 = Sequence
(Assign (V "Y") (Val (IntVal 20)))
(Assign (V "X") (Val (IntVal 20)))
then what do the following two queries return?
A. FAIL, FAIL
B. FAIL, PASS
C. PASS, FAIL
D. PASS, PASS
Checking Compiler Optimizations
A compiler optimization can be viewed as a pair of programs
- input written by human
p_in
- output optimized by compiler
p_out
An optimization is correct if p_in === p_out
- Compiler should not change behavior of the code
Checking an Optimization: Zero-Add-Elimination
Here’s a simple optimization
In | Out |
---|---|
X := E + 0 |
X := E |
X := E - 0 |
X := E |
Should be correct because adding 0
doesn’t change anything…
We can write this as a QC property
prop_add_zero_elim :: Variable -> Expression -> Property
prop_add_zero_elim x e =
(x `Assign` (e `Plus` Val (IntVal 0))) === (x `Assign` e)
prop_sub_zero_elim :: Variable -> Expression -> Property
prop_sub_zero_elim x e =
(x `Assign` (e `Minus` Val (IntVal 0))) === (x `Assign` e)
So what does QC say?
-- >>> quickCheck prop_add_zero_elim
-- *** Failed! Falsifiable (after 1 test):
-- W
-- True
-- fromList []
What’s that? Lets see is W := True
equivalent to W := True + 0
?
Forgot about those pesky boolean expressions!
If you recall W := True
will just assign True
to W
but W := True + 0
will have a “type error” so will assign 0
to W
!
Fix: Restrict Optimizations to Int
Expressions
Problem was expressions like True
- Caused strange behaviors due to “type errors”
Lets restrict to only integer-valued expressions
exprI, baseI, binI :: Gen Expression
exprI = oneof [baseI, binE]
baseI = oneOf [ do {x <- arbitrary; return (Var x) }
, do {n <- arbitrary; return (Val (IntVal n)) }
]
binI = do { o <- elements [Plus, Minus];
e1 <- exprI;
e2 <- exprI;
return (o e1 e2) }
Now we can restrict the property to
prop_add_zero_elim' :: Variable -> Property
prop_add_zero_elim' x =
forAll intExpr (\e -> (x `Assign` (e `Plus` Val (IntVal 0))) === (x `Assign` e))
QUIZ
Consider the property
prop_add_zero_elim' :: Variable -> Property
prop_add_zero_elim' x =
forAll intExpr (\e -> (x `Assign` (e `Plus` Val (IntVal 0))) === (x `Assign` e))
What will be the result of
A. PASS
B. FAIL
>>> quickCheck prop_add_zero_elim'
*** Failed! Falsifiable (after 11 tests):
Z
G
fromList [(B,False),(F,-4),(G,True),(K,8),(M,True),(N,False),(R,3),(T,False),(V,True)]
Oops, the counterexample is Z := G
and Z := G + 0
- but now the starting state maps
(G, True)
Pesky Bool
sneaked right back in …
Moral: Even simple optimizations are really tricky without types!
Try at home Can you fix this property so it passes?
Checking an Optimization: Constant-Folding-ish
Lets try another optimization that doesn’t use any arithmetic
- So should not have any problems with
Int
-vs-Bool
Suppose you have two back-to-back assignments
Why recompute E
? Result is already stored in X
! So optimize the above to
We can specify this transformation as the QC property
prop_const_prop :: Variable -> Variable -> Expression -> Property
prop_const_prop x y e =
((x `Assign` e) `Sequence` (y `Assign` e))
===
((x `Assign` e) `Sequence` (y `Assign` Var x))
Mighty QC, do you agree ?
>>> quickCheck prop_const_prop
*Testing> quickCheck prop_const_prop
*** Failed! Falsifiable (after 82 tests):
D
B
True + N + L + M + True + -45 + H + -9 + 70 + True + -68 + N + -29 + I + True + G + O + P + True + Q + False + False + True + True + True + True + X + I + False + 81 + -42 + False + 31 + -13 + T + 23 + True + S + True + I + M + True + True + True + Z + H + -65 + G + K + -22 + D
fromList [(A,True),(B,-72),(C,-19),(D,-34),(E,50),(F,True),(G,True),(H,-21),(I,5),(J,3),(K,True),(L,-20),(M,True),(N,-10),(O,-20),(P,False),(Q,-10),(R,-78),(S,True),(T,70),(U,False),(V,-55),(W,True),(X,True),(Y,True),(Z,-56)]
Shrinking Tests
The property fails … but the counterexample is very long!
QC has a test shrinking mechanism … also in the Arbitrary
class
shrink t
- takes as input a candidate
t
and
- returns as output a list of smaller candidates
That QC will systematically search with to find a minimally failing test!
instance Arbitrary Expression where
arbitrary = expr
-- shrink :: Expression -> [Expression]
shrink (Plus e1 e2) = [e1, e2]
shrink (Minus e1 e2) = [e1, e2]
shrink _ = []
We should just keep shrinking each Expression
to its sub-Expressions
.
```haskell >>> quickCheck prop_const_prop *** Failed! Falsifiable (after 26 tests and 4 shrinks):
D U A + D fromList [(D,-638),(G,256),(H,False),(K,False),(O,True),(R,True),(S,-81),(T,926)] ~~~~~
Aha! Consider the two programs
and
are they equivalent? Pretty subtle, eh.
Recap: Property-Based Testing
- Property-based Testing
- Properties are
Bool
ean-functions - Generate inputs, run function, check if result is
False
- Properties are
- Test Generation
Gen a
is a monadic generator ofa
valuesArbitrary
is a class for types with generators
- Case-Study
- Compiler optimizations are very tricky
- QC-inspired methods have found serious bugs in many compilers & databases & …
Recap: Property-Based Testing
QuickCheck is awesome!
Simple: typeclasses + monads
Useful: Can find subtle bugs or inconsistencies in your code.
Lots of literature on QC and techniques it has inspired
- Can even use QC to generate test data systems in other languages