Property-based Testing









Property-based Testing

    Typeclasses + Monads = Automatic Testing

Lets look at QuickCheck

Ported to 40 other languages

PBT used in basically all kinds of software…


















Plan

  1. Property-based Testing

  2. Random Test Generation

  3. Case-Study












Property-based Testing

Don’t (only) write individual unit-tests

  • only check particular input-output behavior of code

Do write properties desired of the functions

  • generate random inputs
  • run function
  • verify (or rather, try to falsify) the property
















Testing with Specifications

PBT emphasizes the importance of specifying correct behavior

  • Makes you think about what the code should do,

  • Finds corner-cases where the specification is violated (so code or spec are fixed)

  • Makes specs live on as machine-checked documentation.
















Properties

A Property is a function that returns a Bool

prop_revapp :: [Int] -> [Int] -> Bool
prop_revapp xs ys = 
    reverse (xs ++ ys) == reverse xs ++ reverse ys

Looks like a theorem that the programmer believes is true.

  • By convention, we write the prefix "prop_"














QUIZ

Consider the property prop_revapp

prop_revapp :: [Int] -> [Int] -> Bool
prop_revapp xs ys = 
    reverse (xs ++ ys) == reverse xs ++ reverse ys

It says forall xs, ys. reverse (xs ++ ys) == reverse xs ++ reverse ys.

Is the property true?

A. Yes

B. No












Question

Why specify type as [Int] -> [Int] -> Bool?

  • Why not write [a] -> [a] -> Bool?












Checking Properties

QC uses the types to generate random inputs

  • For that, it needs a specific type like [Int]

  • Cannot generate an input of type [a]

Let’s quickCheck it!

>>> quickCheck prop_revapp
*** Failed! Falsifiable (after 6 tests and 9 shrinks):
[0]
[1]

Huh? QC gives us a counterexample with xs = [0] and ys == [1]

reverse (xs ++ ys) 
==> reverse ([0] ++ [1]) 
==> reverse ([0, 1]) 
==> [1, 0]

but

reverse xs ++ reverse ys
==> reverse [0] ++ reverse [1]
==> [0] ++ [1]
==> [0, 1]
















EXERCISE

Can you modify prop_revapp so that it is sensible and true ?

prop_revapp :: [Int] -> [Int] -> Bool
prop_revapp xs ys = reverse (xs ++ ys) == ???

When you are done, we should see

>>> quickCheck prop_revapp'
+++ OK, passed 100 tests.

We can run more tests by specifying that as a parameter

quickCheckN n = quickCheckWith (stdArgs {maxSuccess = n})

Followed by

quickCheckN 10000 prop_revapp
+++ OK, passed 10000 tests











QuickSort

Here’s a simple sorting function (quickSort)

qsort        :: (Ord a) => [a] -> [a]
qsort []     = []
qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
  where 
    ls       = [y | y <- xs, y < x]  -- elems in xs < x 
    rs       = [z | z <- xs, z > x]  -- elems in xs > x

Seems to work?

>>> [1,3..19] ++ [2,4..20]
[1,3,5,7,9,11,13,15,17,19,2,4,6,8,10,12,14,16,18,20]

>>> qsort ([1,3..19] ++ [2,4..20])
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]










QuickCheck QuickSort

Lets check the output of qsort is ordered

isOrdered :: (Ord a) => [a] -> Bool
isOrdered ::         (Ord a) => [a] -> Bool
isOrdered (x1:x2:xs) = x1 <= x2 && isOrdered (x2:xs)
isOrdered _          = True

and use it to write a property

prop_qsort_isOrdered :: [Int] -> Bool
prop_qsort_isOrdered xs = isOrdered (qsort xs)

which we can check

>>> quickCheckN 1000 prop_qsort_isOrdered
+++ OK, passed 1000 tests.












QUIZ

Lets check that the first element of the output is the smallest

qsort        :: (Ord a) => [a] -> [a]
qsort []     = []
qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
  where 
    ls       = [y | y <- xs, y < x]  -- elems in xs < x 
    rs       = [z | z <- xs, z > x]  -- elems in xs > x

prop_qsort_min :: [a] -> Bool
prop_qsort_min xs = head (qsort xs) == minimum xs

What is the result of

>>> quickCheck prop_qsort_min

A. Pass 100 tests

B. Fail















Properties and Assumptions

prop_qsort_min :: [a] -> Bool
prop_qsort_min xs = head (qsort xs) == minimum x

Oops!

>>> quickCheck prop_qsort_min
*** Failed! Exception: 'Prelude.head: empty list' (after 1 test):
[]

prop_qsort_min is not true for all Int lists

  • Property only makes sense if xs is not empty

Writing specifications clarifies the assumptions

  • under which a given piece of code is supposed to work.













Conditional Properties

Lets modify prop_qsort_min so equality holds if input is non-null

prop_qsort_nn_min    :: [Int] -> Property
prop_qsort_nn_min xs =
  not (null xs) ==> head (qsort xs) == minimum xs

Instead of Bool the function’s output is Property

  • a special type built into the QC library

The implies operator ==> is one of many

  • that allow the construction of rich properties.












QUIZ

Lets test that our qsort is identical to a trusted reference implementation

  • may be too slow to deploy but ok to use for checking correctness

Lets use the standard library’s Data.List.sort function

  • (Much faster than ours… but just for illustration!)
prop_qsort_sort    :: [Int] -> Bool
prop_qsort_sort xs =  qsort xs == sort xs

What is the result of

qsort        :: (Ord a) => [a] -> [a]
qsort []     = []
qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
  where 
    ls       = [y | y <- xs, y < x]  -- elems in xs < x 
    rs       = [z | z <- xs, z > x]  -- elems in xs > x

>>> quickCheck prop_qsort_sort

A. OK after 100 tests

B. Failed! Falsifiable...















Lets Check

>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 6 tests and 3 shrinks):
[-3,-3]

Oops? What?

>>> sort [-3, -3]
[-3, -3]

>>> qsort [-3, -3]
[-3]

Ugh! So close, and yet … Can you spot the bug in our code?

qsort        :: (Ord a) => [a] -> [a]
qsort []     = []
qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
  where 
    ls       = [y | y <- xs, y < x]  -- elems in xs < x 
    rs       = [z | z <- xs, z > x]  -- elems in xs > x












Specifying No-Duplicates

We assumed that the input has no duplicates

  • Values equal to x are thrown out from ls and rs

Is this a bug? Maybe? Maybe not?

  • But at least its something we should be aware of!

Lets specify that a list has no-duplicates

noDuplicates ::(Eq a) => [a] -> Bool
noDuplicates (x:xs) = not (x `elem` xs) && noDuplicates xs
noDuplicates _      = True










Specifying a Post-Condition

We can now check that qsort outputs a list with no-duplicates

prop_qsort_distinct :: [Int] -> Bool 
prop_qsort_distinct xs = noDuplicates (qsort xs)  

-- >>> quickCheck prop_qsort_distinct
-- +++ OK, passed 100 tests.











Specifying a Pre-Condition

Also, qsort is identical to sort on inputs with no duplicates

prop_qsort_distinct_sort :: [Int] -> Property 
prop_qsort_distinct_sort xs = 
  (isDistinct xs) ==> (qsort xs == sort xs)

-- >>> quickCheck prop_qsort_distinct_sort
-- +++ OK, passed 100 tests.
--


















Plan

  1. Property-based Testing

    • Properties are boolean-functions
    • Generate inputs, run function, check if result is False
  2. Random Test Generation

  3. Case-Study












Plan

  1. Property-based Testing

    • Properties are boolean-functions
    • Generate inputs, run function, check if result is False
  2. Test Generation

  3. Case-Study
















Test Generation

Lets notice something about quickCheck

If you run it once …

>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 6 tests and 2 shrinks):
[5,5]

and if you run it again …

>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 4 tests and 1 shrink):
[1,1]

The falsifying tests are different!

How is this possible?
















Generators

QC defines a special Generator data type

data Gen a = MkGen (StdGen -> Int -> a)

A Gen a is a function that takes as input

  • a random number generator StdGen
  • a “seed” Int

and returns as output

  • a value of type a












Creating Generators

There are some functions to create generators, e.g.

choose :: (Int, Int) -> Gen Int

which

  • takes a pair of (lo, hi)

  • returns a random generator for values between lo and hi












Running Generators

To execute the Gen we need access to the system’s “randomness”

Done via an IO “recipe”

sample' :: Gen a -> IO [a]

Which

  • takes a Generator of a values and
  • returns a recipe that produces a list of (10) a values

We can run the recipe in ghci

>>> sample' (choose (0, 5))
[4,2,5,3,2,2,2,3,0,0,0]











EXERCISE

Lets write a function that returns a Generator over a list of elements

elements :: [a] -> Gen a
elements = ???

So elements [x0,x1,...,xn] returns a Generator that randomly produces values from x0, x1, … xn.

















PROBLEM: How to combine Generators?

Suppose I have a generator of positive Int

pos :: Gen Int
pos = sample (0, 100)

How can I create a generator of a pair of positive Ints?

posPair :: Gen (Int, Int)















Generator is a Monad!

  • You can see details here

  • This will let us combine generators (like combining parsers…)














QUIZ

Which of the below implements posPair :: Gen (Int, Int) ?

  • given pos :: Gen Int and sample :: (Int, Int) -> Gen Int
-- A
posPair = do { x1 <- pos; x2 <- pos; return (x1, x2) }

-- B
posPair = (pos, pos)

-- C
posPair = do { x <- pos; return (x, x) }  

-- D
posPair = Gen (4, 5)

-- E 
posPair = (sample (0, 100), sample (0, 100))












posPair = do { x1 <- pos; x2 <- pos; return (x1, x2) }

-- >>> sample' posPair
-- [(29,71),(48,74),(89,53),(73,93),(0,40),(71,35),(23,69),(93,49),(59,58),(27,32),(88,45)]











EXERCISE

Lets write a function that mixes a list of Generators

oneOf :: [Gen a] -> Gen a
oneOf = ???

oneOf [g0,g1,...,gn] should be a generator that

  • randomly selects one of g0,…gn

  • randomly generates a value from the chosen generator

>>> sample' (oneOf [choose (0,2), choose (10,12)])
[2,2,1,1,12,10,2,2,11,0,11]










oneOf is generalized into the frequency combinator

frequency :: [(Int, Gen a)] -> Gen a

which builds weighted mixtures of individual Generators















Types that can be Generated

QC has a class for types whose values can be randomly Generated

class Arbitrary a where
  arbitrary :: Gen a

T is an instance of Arbitrary if there is Gen T function

  • i.e. there is a generator of T values!
randomThings :: (Arbitrary a) => IO [a]
randomThings = sample' arbitrary

Many standard types have Arbitrary instances

  • Users write their own instances when testing their own types
>>> randomThings :: IO [Int]
[0,-2,-2,0,-1,8,1,-14,-13,5,19]

>>> randomThings :: IO [(Int, Bool)] 
[(0,True),(1,True),(0,True),(6,False),(-5,True),(4,False),(-12,False),(-8,False),(5,False),(-9,False),(-7,False)]
-
>>> randomThings :: IO [String]
["","\a","\f","\779257W\SUBA","\84573","D\ACK\365059S","9W\554735G","g\SYN~W\62120\&4&[","\NULsc\18427fy(","Q`TI \n/TH","\461027\ESCZ`u\783094\&4B\SOHT\424692"]











Plan

  1. Property-based Testing
    • Properties are Boolean-functions
    • Generate inputs, run function, check if result is False
  2. Test Generation
    • Gen a is a monadic generator of a values
    • Arbitrary is a class for types with generators
  3. Case-Study











Case Study: Compiler Optimizations

Lets use QC to test compiler optimizations

  • Learn how to generate structure data (programs)

  • Learn how to specify fancy properties (equivalence)

Using the WHILE language from your HW assignment.
















WHILE: Syntax

Recall the definition of Variable and Value

data Variable = V String 

data Value = IntVal Int | BoolVal Bool

which we used to define Expression

data Expression = Var   Variable | Val   Value
                | Plus  Expression Expression
                | Minus Expression Expression

and Statement

data Statement
  = Assign   Variable   Expression
  | If       Expression Statement  Statement
  | While    Expression Statement
  | Sequence Statement  Statement
  | Skip














WHILE: Semantics

Next, we defined the behavior of programs as

  • functions from starting state to final state

  • where state was defined as a map from Variable to Value

type WState = M.Map Variable Value

and then you wrote a function

execute ::  WState -> Statement -> WState
execute s0 stmt = execState (evalS stmt) s0

(We can skip the details of evalS because you wrote it… right?)















Generating WHILE Programs

Lets write a program generator














Generating Variables

instance Arbitrary Variable where
  arbitrary = do
    x <- elements ['A'..'Z'] 
    return (V [x])

and we get

>>> randomThings :: IO [Variable]
[V "G",V "U",V "Z",V "F",V "Z",V "K",V "P",V "D",V "Y",V "M",V "H"]






Generating Values

instance Arbitrary Value where
  arbitrary = oneOf [ do {n <- arbitrary; return (IntVal n) }
                    , do {b <- arbitrary; return (BoolVal b)} 
                    ]

and we get

>>> randomThings :: IO [Value]
[IntVal 0,BoolVal False,BoolVal True,IntVal 3
,IntVal (-8),IntVal (-3),IntVal 1,BoolVal False
,IntVal 6,BoolVal True,BoolVal False]






Generating Expressions

instance Arbitrary Expression where
  arbitrary = expr

expr, binE, baseE :: Gen Expression
expr     = oneof [baseE, binE] 

binE  = do { o  <- elements [Plus, Minus];
             e1 <- expr; 
             e2 <- expr 
             return (o e1 e2) }

baseE = oneOf [ do {x <- arbitrary; return (Var x) }
              , do {v <- arbitrary; return (Val v)} ]

which gives us

>>> randomThings :: IO [Expression]
... -- lots of expressions! 






Generating States

QC already has an way to automatically generate Maps

instance (Ord k, Arbitrary k, Arbitrary v) =>  Arbitrary (M.Map k v) where
  arbitrary = do {kvs <- arbitrary; return (M.fromList kvs) }

So for free we get a generator for WState

>>> randomThings :: IO [WState]
...






Specification: Program Equivalence

Next, lets specify the properties

Let p1 and p2 be two While programs.

Program p1 is equivalent to p2 written p1 === p2 if

(===) ::  Statement -> Statement -> Property
p1 === p2 = forAll arbitrary (\st -> execute st p1 == execute st p2)

That is, for all input states st

  • executing p1 from st and

  • executing p2 from st

produce the same state.

QUIZ

For example, suppose that

-- X := 10; Y := 20
prog1 = Sequence 
  (Assign (V "X") (Val (IntVal 10)))
  (Assign (V "Y") (Val (IntVal 20)))

--  Y := 20; X := 10
prog2 = Sequence 
  (Assign (V "Y") (Val (IntVal 20)))
  (Assign (V "X") (Val (IntVal 10))

--  Y := 20; X := 20
prog3 = Sequence 
  (Assign (V "Y") (Val (IntVal 20)))
  (Assign (V "X") (Val (IntVal 20)))

then what do the following two queries return?

>>> quickCheck (prog1 === prog2)
>>> quickCheck (prog1 === prog3)

A. FAIL, FAIL

B. FAIL, PASS

C. PASS, FAIL

D. PASS, PASS












Checking Compiler Optimizations

A compiler optimization can be viewed as a pair of programs

  • input written by human p_in
  • output optimized by compiler p_out

An optimization is correct if p_in === p_out

  • Compiler should not change behavior of the code











Checking an Optimization: Zero-Add-Elimination

Here’s a simple optimization

In Out
X := E + 0 X := E
X := E - 0 X := E

Should be correct because adding 0 doesn’t change anything…

We can write this as a QC property

prop_add_zero_elim :: Variable -> Expression -> Property
prop_add_zero_elim x e = 
   (x `Assign` (e `Plus` Val (IntVal 0))) === (x `Assign` e) 

prop_sub_zero_elim :: Variable -> Expression -> Property
prop_sub_zero_elim x e =
  (x `Assign` (e `Minus` Val (IntVal 0))) === (x `Assign` e)

So what does QC say?















-- >>> quickCheck prop_add_zero_elim
-- *** Failed! Falsifiable (after 1 test):
-- W
-- True
-- fromList []

What’s that? Lets see is W := True equivalent to W := True + 0 ?

Forgot about those pesky boolean expressions!

If you recall W := True will just assign True to W

>>> execute M.empty ((V "W") `Assign` (Val (BoolVal True)))
fromList [(W,True)]

but W := True + 0 will have a “type error” so will assign 0 to W!

>>> execute M.empty (V "W") `Assign` ((Val (BoolVal True) `Plus` Val (IntVal 0)))
fromList [(W,0)]














Fix: Restrict Optimizations to Int Expressions

Problem was expressions like True

  • Caused strange behaviors due to “type errors”

Lets restrict to only integer-valued expressions

exprI, baseI, binI :: Gen Expression
exprI = oneof [baseI, binE] 

baseI = oneOf [ do {x <- arbitrary; return (Var x) }
              , do {n <- arbitrary; return (Val (IntVal n)) } 
              ]
binI  = do { o  <- elements [Plus, Minus];
             e1 <- exprI;
             e2 <- exprI;
             return (o e1 e2) }

Now we can restrict the property to

prop_add_zero_elim'   :: Variable -> Property
prop_add_zero_elim' x = 
  forAll intExpr (\e -> (x `Assign` (e `Plus` Val (IntVal 0))) === (x `Assign` e))













QUIZ

Consider the property

prop_add_zero_elim'   :: Variable -> Property
prop_add_zero_elim' x = 
  forAll intExpr (\e -> (x `Assign` (e `Plus` Val (IntVal 0))) === (x `Assign` e))

What will be the result of

>>> quickCheck prop_add_zero_elim'

A. PASS

B. FAIL














>>> quickCheck prop_add_zero_elim'
*** Failed! Falsifiable (after 11 tests):
Z
G
fromList [(B,False),(F,-4),(G,True),(K,8),(M,True),(N,False),(R,3),(T,False),(V,True)]

Oops, the counterexample is Z := G and Z := G + 0

  • but now the starting state maps (G, True)

Pesky Bool sneaked right back in …

Moral: Even simple optimizations are really tricky without types!

Try at home Can you fix this property so it passes?











Checking an Optimization: Constant-Folding-ish

Lets try another optimization that doesn’t use any arithmetic

  • So should not have any problems with Int-vs-Bool

Suppose you have two back-to-back assignments

X := E;   Y := E

Why recompute E? Result is already stored in X! So optimize the above to

X := E;   Y := X

We can specify this transformation as the QC property

prop_const_prop :: Variable -> Variable -> Expression -> Property
prop_const_prop x y e = 
  ((x `Assign` e) `Sequence` (y `Assign` e))
  ===
  ((x `Assign` e) `Sequence` (y `Assign` Var x))

Mighty QC, do you agree ?

>>> quickCheck prop_const_prop 
*Testing> quickCheck prop_const_prop
*** Failed! Falsifiable (after 82 tests):
D
B
True + N + L + M + True + -45 + H + -9 + 70 + True + -68 + N + -29 + I + True + G + O + P + True + Q + False + False + True + True + True + True + X + I + False + 81 + -42 + False + 31 + -13 + T + 23 + True + S + True + I + M + True + True + True + Z + H + -65 + G + K + -22 + D
fromList [(A,True),(B,-72),(C,-19),(D,-34),(E,50),(F,True),(G,True),(H,-21),(I,5),(J,3),(K,True),(L,-20),(M,True),(N,-10),(O,-20),(P,False),(Q,-10),(R,-78),(S,True),(T,70),(U,False),(V,-55),(W,True),(X,True),(Y,True),(Z,-56)]















Shrinking Tests

The property fails … but the counterexample is very long!

QC has a test shrinking mechanism … also in the Arbitrary class

shrink :: a -> [a]

shrink t

  • takes as input a candidate t and
  • returns as output a list of smaller candidates

That QC will systematically search with to find a minimally failing test!

instance Arbitrary Expression where
  arbitrary = expr

  -- shrink :: Expression -> [Expression]
  shrink (Plus e1 e2)  = [e1, e2]
  shrink (Minus e1 e2) = [e1, e2]
  shrink _             = []

We should just keep shrinking each Expression to its sub-Expressions.

>>> quickCheck prop_const_prop 
*** Failed! Falsifiable (after 26 tests and 4 shrinks):    
D
U
A + D
fromList [(D,-638),(G,256),(H,False),(K,False),(O,True),(R,True),(S,-81),(T,926)]
~~~~~

Aha! Consider the two programs

```haskell
D := A + D;   
U := A + D

and

D := A + D; 
U := D

are they equivalent? Pretty subtle, eh.












Recap: Property-Based Testing

  1. Property-based Testing
    • Properties are Boolean-functions
    • Generate inputs, run function, check if result is False
  2. Test Generation
    • Gen a is a monadic generator of a values
    • Arbitrary is a class for types with generators
  3. Case-Study
    • Compiler optimizations are very tricky
    • QC-inspired methods have found serious bugs in many compilers & databases & …












Recap: Property-Based Testing

QuickCheck is awesome!

  • Simple: typeclasses + monads

  • Useful: Can find subtle bugs or inconsistencies in your code.

Lots of literature on QC and techniques it has inspired

  • Can even use QC to generate test data systems in other languages