cse230

Property-based Testing

    Typeclasses + Monads = Automatic Testing

Lets look at QuickCheck

Developed by Koen Claessen and John Hughes in 2000

Ported to 40 other languages

JSVerify, JUNIT-quickcheck, hypothesis

PBT used in basically all kinds of software…

Plan

Property-based Testing
Random Test Generation
Case-Study

Property-based Testing

Don’t (only) write individual unit-tests

only check particular input-output behavior of code

Do write properties desired of the functions

generate random inputs
run function
verify (or rather, try to falsify) the property

Testing with Specifications

PBT emphasizes the importance of specifying correct behavior

Makes you think about what the code should do,
Finds corner-cases where the specification is violated (so code or spec are fixed)
Makes specs live on as machine-checked documentation.

Properties

A Property is a function that returns a Bool

prop_revapp :: [Int] -> [Int] -> Bool
prop_revapp xs ys = 
    reverse (xs ++ ys) == reverse xs ++ reverse ys

Looks like a theorem that the programmer believes is true.

By convention, we write the prefix "prop_"

QUIZ

Consider the property prop_revapp

prop_revapp :: [Int] -> [Int] -> Bool
prop_revapp xs ys = 
    reverse (xs ++ ys) == reverse xs ++ reverse ys

It says forall xs, ys. reverse (xs ++ ys) == reverse xs ++ reverse ys.

Is the property true?

A. Yes

B. No

Question

Why specify type as [Int] -> [Int] -> Bool?

Why not write [a] -> [a] -> Bool?

Checking Properties

QC uses the types to generate random inputs

For that, it needs a specific type like [Int]
Cannot generate an input of type [a]

Let’s quickCheck it!

>>> quickCheck prop_revapp
*** Failed! Falsifiable (after 6 tests and 9 shrinks):
[0]
[1]

Huh? QC gives us a counterexample with xs = [0] and ys == [1]

reverse (xs ++ ys) 
==> reverse ([0] ++ [1]) 
==> reverse ([0, 1]) 
==> [1, 0])

but

reverse xs ++ reverse ys
==> reverse [0] ++ reverse [1]
==> [0] ++ [1]
==> [0, 1]

EXERCISE

Can you modify prop_revapp so that it is sensible and true ?

prop_revapp :: [Int] -> [Int] -> Bool
prop_revapp xs ys = reverse (xs ++ ys) == ???

When you are done, we should see

>>> quickCheck prop_revapp'
+++ OK, passed 100 tests.

We can run more tests by specifying that as a parameter

quickCheckN n = quickCheckWith (stdArgs {maxSuccess = n})

Followed by

quickCheckN 10000 prop_revapp
+++ OK, passed 10000 tests

QuickSort

Here’s a simple sorting function (quickSort)

qsort        :: (Ord a) => [a] -> [a]
qsort []     = []
qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
  where 
    ls       = [y | y <- xs, y < x]  -- elems in xs < x 
    rs       = [z | z <- xs, z > x]  -- elems in xs > x

Seems to work?

>>> [1,3..19] ++ [2,4..20]
[1,3,5,7,9,11,13,15,17,19,2,4,6,8,10,12,14,16,18,20]

>>> qsort ([1,3..19] ++ [2,4..20])
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]

QuickCheck QuickSort

Lets check the output of qsort is ordered

isOrdered :: (Ord a) => [a] -> Bool
isOrdered ::         (Ord a) => [a] -> Bool
isOrdered (x1:x2:xs) = x1 <= x2 && isOrdered (x2:xs)
isOrdered _          = True

and use it to write a property

prop_qsort_isOrdered :: [Int] -> Bool
prop_qsort_isOrdered xs = isOrdered (qsort xs)

which we can check

>>> quickCheckN 1000 prop_qsort_isOrdered
+++ OK, passed 1000 tests.

QUIZ

Lets check that the first element of the output is the smallest

qsort        :: (Ord a) => [a] -> [a]
qsort []     = []
qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
  where 
    ls       = [y | y <- xs, y < x]  -- elems in xs < x 
    rs       = [z | z <- xs, z > x]  -- elems in xs > x

prop_qsort_min :: [a] -> Bool
prop_qsort_min xs = head (qsort xs) == minimum xs

What is the result of

>>> quickCheck prop_qsort_min

A. Pass 100 tests

B. Fail

Properties and Assumptions

prop_qsort_min :: [a] -> Bool
prop_qsort_min xs = head (qsort xs) == minimum x

Oops!

>>> quickCheck prop_qsort_min
*** Failed! Exception: 'Prelude.head: empty list' (after 1 test):
[]

prop_qsort_min is not true for all Int lists

Property only makes sense if xs is not empty

Writing specifications clarifies the assumptions

under which a given piece of code is supposed to work.

Conditional Properties

Lets modify prop_qsort_min so equality holds if input is non-null

prop_qsort_nn_min    :: [Int] -> Property
prop_qsort_nn_min xs =
  not (null xs) ==> head (qsort xs) == minimum xs

Instead of Bool the function’s output is Property

a special type built into the QC library

The implies operator ==> is one of many

that allow the construction of rich properties.

QUIZ

Lets test that our qsort is identical to a trusted reference implementation

may be too slow to deploy but ok to use for checking correctness

Lets use the standard library’s Data.List.sort function

(Much faster than ours… but just for illustration!)

prop_qsort_sort    :: [Int] -> Bool
prop_qsort_sort xs =  qsort xs == sort xs

What is the result of

qsort        :: (Ord a) => [a] -> [a]
qsort []     = []
qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
  where 
    ls       = [y | y <- xs, y < x]  -- elems in xs < x 
    rs       = [z | z <- xs, z > x]  -- elems in xs > x

>>> quickCheck prop_qsort_sort

A. OK after 100 tests

B. Failed! Falsifiable...

Lets Check

>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 6 tests and 3 shrinks):
[-3,-3]

Oops? What?

>>> sort [-3, -3]
[-3, -3]

>>> qsort [-3, -3]
[-3]

Ugh! So close, and yet … Can you spot the bug in our code?

qsort        :: (Ord a) => [a] -> [a]
qsort []     = []
qsort (x:xs) = qsort ls ++ [x] ++ qsort rs
  where 
    ls       = [y | y <- xs, y < x]  -- elems in xs < x 
    rs       = [z | z <- xs, z > x]  -- elems in xs > x

Specifying No-Duplicates

We assumed that the input has no duplicates

Values equal to x are thrown out from ls and rs

Is this a bug? Maybe? Maybe not?

But at least its something we should be aware of!

Lets specify that a list has no-duplicates

noDuplicates ::(Eq a) => [a] -> Bool
noDuplicates (x:xs) = not (x `elem` xs) && noDuplicates xs
noDuplicates _      = True

Specifying a Post-Condition

We can now check that qsort outputs a list with no-duplicates

prop_qsort_distinct :: [Int] -> Bool 
prop_qsort_distinct xs = noDuplicates (qsort xs)  

-- >>> quickCheck prop_qsort_distinct
-- +++ OK, passed 100 tests.

Specifying a Pre-Condition

Also, qsort is identical to sort on inputs with no duplicates

prop_qsort_distinct_sort :: [Int] -> Property 
prop_qsort_distinct_sort xs = 
  (isDistinct xs) ==> (qsort xs == sort xs)

-- >>> quickCheck prop_qsort_distinct_sort
-- +++ OK, passed 100 tests.
--

Plan

Property-based Testing
- Properties are boolean-functions
- Generate inputs, run function, check if result is False
Random Test Generation
Case-Study

Plan

Property-based Testing
- Properties are boolean-functions
- Generate inputs, run function, check if result is False
Test Generation
Case-Study

Test Generation

Lets notice something about quickCheck

If you run it once …

>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 6 tests and 2 shrinks):
[5,5]

and if you run it again …

>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 4 tests and 1 shrink):
[1,1]

The falsifying tests are different!

How is this possible?

Generators

QC defines a special Generator data type

data Gen a = MkGen (StdGen -> Int -> a)

A Gen a is a function that takes as input

a random number generator StdGen
a “seed” Int

and returns as output

a value of type a

Creating `Gen`erators

There are some functions to create generators, e.g.

choose :: (Int, Int) -> Gen Int

which

takes a pair of (lo, hi)
returns a random generator for values between lo and hi

Running `Gen`erators

To execute the Gen we need access to the system’s “randomness”

Done via an IO “recipe”

sample' :: Gen a -> IO [a]

Which

takes a Generator of a values and
returns a recipe that produces a list of (10) a values

We can run the recipe in ghci

>>> sample' (choose (0, 5))
[4,2,5,3,2,2,2,3,0,0,0]

EXERCISE

Lets write a function that returns a Generator over a list of elements

elements :: [a] -> Gen a
elements = ???

So elements [x0,x1,...,xn] returns a Generator that randomly produces values from x0, x1, … xn.

PROBLEM: How to combine `Gen`erators?

Suppose I have a generator of positive Int

pos :: Gen Int
pos = sample (0, 100)

How can I create a generator of a pair of positive Ints?

posPair :: Gen (Int, Int)

`Gen`erator is a Monad!

You can see details here
This will let us combine generators (like combining parsers…)

QUIZ

Which of the below implements posPair :: Gen (Int, Int) ?

given pos :: Gen Int and sample :: (Int, Int) -> Gen Int

-- A
posPair = do { x1 <- pos; x2 <- pos; return (x1, x2) }

-- B
posPair = (pos, pos)

-- C
posPair = do { x <- pos; return (x, x) }  

-- D
posPair = Gen (4, 5)

-- E 
posPair = (sample (0, 100), sample (0, 100))

posPair = do { x1 <- pos; x2 <- pos; return (x1, x2) }

-- >>> sample' posPair
-- [(29,71),(48,74),(89,53),(73,93),(0,40),(71,35),(23,69),(93,49),(59,58),(27,32),(88,45)]

EXERCISE

Lets write a function that mixes a list of Generators

oneOf :: [Gen a] -> Gen a
oneOf = ???

oneOf [g0,g1,...,gn] should be a generator that

randomly selects one of g0,…gn
randomly generates a value from the chosen generator

>>> sample' (oneOf [choose (0,2), choose (10,12)])
[2,2,1,1,12,10,2,2,11,0,11]

oneOf is generalized into the frequency combinator

frequency :: [(Int, Gen a)] -> Gen a

which builds weighted mixtures of individual Generators

Types that can be `Gen`erated

QC has a class for types whose values can be randomly Generated

class Arbitrary a where
  arbitrary :: Gen a

T is an instance of Arbitrary if there is Gen T function

i.e. there is a generator of T values!

randomThings :: (Arbitrary a) => IO [a]
randomThings = sample' arbitrary

Many standard types have Arbitrary instances

Users write their own instances when testing their own types

>>> randomThings :: IO [Int]
[0,-2,-2,0,-1,8,1,-14,-13,5,19]

>>> randomThings :: IO [(Int, Bool)] 
[(0,True),(1,True),(0,True),(6,False),(-5,True),(4,False),(-12,False),(-8,False),(5,False),(-9,False),(-7,False)]
-
>>> randomThings :: IO [String]
["","\a","\f","\779257W\SUBA","\84573","D\ACK\365059S","9W\554735G","g\SYN~W\62120\&4&[","\NULsc\18427fy(","Q`TI \n/TH","\461027\ESCZ`u\783094\&4B\SOHT\424692"]

Plan

Property-based Testing
- Properties are Boolean-functions
- Generate inputs, run function, check if result is False
Test Generation
- Gen a is a monadic generator of a values
- Arbitrary is a class for types with generators
Case-Study

Case Study: Compiler Optimizations

Lets use QC to test compiler optimizations

Learn how to generate structure data (programs)
Learn how to specify fancy properties (equivalence)

Using the WHILE language from your HW assignment.

WHILE: Syntax

Recall the definition of Variable and Value

data Variable = V String 

data Value = IntVal Int | BoolVal Bool

which we used to define Expression

data Expression = Var   Variable | Val   Value
                | Plus  Expression Expression
                | Minus Expression Expression

and Statement

data Statement
  = Assign   Variable   Expression
  | If       Expression Statement  Statement
  | While    Expression Statement
  | Sequence Statement  Statement
  | Skip

WHILE: Semantics

Next, we defined the behavior of programs as

functions from starting state to final state
where state was defined as a map from Variable to Value

type WState = M.Map Variable Value

and then you wrote a function

execute ::  WState -> Statement -> WState
execute s0 stmt = execState (evalS stmt) s0

(We can skip the details of evalS because you wrote it… right?)

Generating WHILE Programs

Lets write a program generator

Generating Variables

instance Arbitrary Variable where
  arbitrary = do
    x <- elements ['A'..'Z'] 
    return (V [x])

and we get

>>> randomThings :: IO [Variable]
[V "G",V "U",V "Z",V "F",V "Z",V "K",V "P",V "D",V "Y",V "M",V "H"]

Generating Values

instance Arbitrary Value where
  arbitrary = oneOf [ do {n <- arbitrary; return (IntVal n) }
                    , do {b <- arbitrary; return (BoolVal b)} 
                    ]

and we get

>>> randomThings :: IO [Value]
[IntVal 0,BoolVal False,BoolVal True,IntVal 3
,IntVal (-8),IntVal (-3),IntVal 1,BoolVal False
,IntVal 6,BoolVal True,BoolVal False]

Generating Expressions

instance Arbitrary Expression where
  arbitrary = expr

expr, binE, baseE :: Gen Expression
expr     = oneof [baseE, binE] 

binE  = do { o  <- elements [Plus, Minus];
             e1 <- expr; 
             e2 <- expr 
             return (o e1 e2) }

baseE = oneOf [ do {x <- arbitrary; return (Var x) }
              , do {v <- arbitrary; return (Val v)} ]

which gives us

>>> randomThings :: IO [Expression]
... -- lots of expressions!

Generating States

QC already has an way to automatically generate Maps

instance (Ord k, Arbitrary k, Arbitrary v) =>  Arbitrary (M.Map k v) where
  arbitrary = do {kvs <- arbitrary; return (M.fromList kvs) }

So for free we get a generator for WState

>>> randomThings :: IO [WState]
...

Specification: Program Equivalence

Next, lets specify the properties

Let p1 and p2 be two While programs.

Program p1 is equivalent to p2 written p1 === p2 if

(===) ::  Statement -> Statement -> Property
p1 === p2 = forAll arbitrary (\st -> execute st p1 == execute st p2)

That is, for all input states st

executing p1 from st and
executing p2 from st

produce the same state.

QUIZ

For example, suppose that

-- X := 10; Y := 20
prog1 = Sequence 
  (Assign (V "X") (Val (IntVal 10)))
  (Assign (V "Y") (Val (IntVal 20)))

--  Y := 20; X := 10
prog2 = Sequence 
  (Assign (V "Y") (Val (IntVal 20)))
  (Assign (V "X") (Val (IntVal 10))

--  Y := 20; X := 20
prog3 = Sequence 
  (Assign (V "Y") (Val (IntVal 20)))
  (Assign (V "X") (Val (IntVal 20)))

then what do the following two queries return?

>>> quickCheck (prog1 === prog2)
>>> quickCheck (prog1 === prog3)

A. FAIL, FAIL

B. FAIL, PASS

C. PASS, FAIL

D. PASS, PASS

Checking Compiler Optimizations

A compiler optimization can be viewed as a pair of programs

input written by human p_in
output optimized by compiler p_out

An optimization is correct if p_in === p_out

Compiler should not change behavior of the code

Checking an Optimization: Zero-Add-Elimination

Here’s a simple optimization

In	Out
`X := E + 0`	`X := E`
`X := E - 0`	`X := E`

Should be correct because adding 0 doesn’t change anything…

We can write this as a QC property

prop_add_zero_elim :: Variable -> Expression -> Property
prop_add_zero_elim x e = 
   (x `Assign` (e `Plus` Val (IntVal 0))) === (x `Assign` e) 

prop_sub_zero_elim :: Variable -> Expression -> Property
prop_sub_zero_elim x e =
  (x `Assign` (e `Minus` Val (IntVal 0))) === (x `Assign` e)

So what does QC say?

-- >>> quickCheck prop_add_zero_elim
-- *** Failed! Falsifiable (after 1 test):
-- W
-- True
-- fromList []

What’s that? Lets see is W := True equivalent to W := True + 0 ?

Forgot about those pesky boolean expressions!

If you recall W := True will just assign True to W

>>> execute M.empty ((V "W") `Assign` (Val (BoolVal True)))
fromList [(W,True)]

but W := True + 0 will have a “type error” so will assign 0 to W!

>>> execute M.empty (V "W") `Assign` ((Val (BoolVal True) `Plus` Val (IntVal 0)))
fromList [(W,0)]

Fix: Restrict Optimizations to `Int` Expressions

Problem was expressions like True

Caused strange behaviors due to “type errors”

Lets restrict to only integer-valued expressions

exprI, baseI, binI :: Gen Expression
exprI = oneof [baseI, binE] 

baseI = oneOf [ do {x <- arbitrary; return (Var x) }
              , do {n <- arbitrary; return (Val (IntVal n)) } 
              ]
binI  = do { o  <- elements [Plus, Minus];
             e1 <- exprI;
             e2 <- exprI;
             return (o e1 e2) }

Now we can restrict the property to

prop_add_zero_elim'   :: Variable -> Property
prop_add_zero_elim' x = 
  forAll intExpr (\e -> (x `Assign` (e `Plus` Val (IntVal 0))) === (x `Assign` e))

QUIZ

Consider the property

prop_add_zero_elim'   :: Variable -> Property
prop_add_zero_elim' x = 
  forAll intExpr (\e -> (x `Assign` (e `Plus` Val (IntVal 0))) === (x `Assign` e))

What will be the result of

>>> quickCheck prop_add_zero_elim'

A. PASS

B. FAIL

>>> quickCheck prop_add_zero_elim'
*** Failed! Falsifiable (after 11 tests):
Z
G
fromList [(B,False),(F,-4),(G,True),(K,8),(M,True),(N,False),(R,3),(T,False),(V,True)]

Oops, the counterexample is Z := G and Z := G + 0

but now the starting state maps (G, True)

Pesky Bool sneaked right back in …

Moral: Even simple optimizations are really tricky without types!

Try at home Can you fix this property so it passes?

Checking an Optimization: Constant-Folding-ish

Lets try another optimization that doesn’t use any arithmetic

So should not have any problems with Int-vs-Bool

Suppose you have two back-to-back assignments

X := E;   Y := E

Why recompute E? Result is already stored in X! So optimize the above to

X := E;   Y := X

We can specify this transformation as the QC property

prop_const_prop :: Variable -> Variable -> Expression -> Property
prop_const_prop x y e = 
  ((x `Assign` e) `Sequence` (y `Assign` e))
  ===
  ((x `Assign` e) `Sequence` (y `Assign` Var x))

Mighty QC, do you agree ?

>>> quickCheck prop_const_prop 
*Testing> quickCheck prop_const_prop
*** Failed! Falsifiable (after 82 tests):
D
B
True + N + L + M + True + -45 + H + -9 + 70 + True + -68 + N + -29 + I + True + G + O + P + True + Q + False + False + True + True + True + True + X + I + False + 81 + -42 + False + 31 + -13 + T + 23 + True + S + True + I + M + True + True + True + Z + H + -65 + G + K + -22 + D
fromList [(A,True),(B,-72),(C,-19),(D,-34),(E,50),(F,True),(G,True),(H,-21),(I,5),(J,3),(K,True),(L,-20),(M,True),(N,-10),(O,-20),(P,False),(Q,-10),(R,-78),(S,True),(T,70),(U,False),(V,-55),(W,True),(X,True),(Y,True),(Z,-56)]

Shrinking Tests

The property fails … but the counterexample is very long!

QC has a test shrinking mechanism … also in the Arbitrary class

shrink :: a -> [a]

shrink t

takes as input a candidate t and
returns as output a list of smaller candidates

That QC will systematically search with to find a minimally failing test!

instance Arbitrary Expression where
  arbitrary = expr

  -- shrink :: Expression -> [Expression]
  shrink (Plus e1 e2)  = [e1, e2]
  shrink (Minus e1 e2) = [e1, e2]
  shrink _             = []

We should just keep shrinking each Expression to its sub-Expressions.

>>> quickCheck prop_const_prop 
*** Failed! Falsifiable (after 26 tests and 4 shrinks):    
D
U
A + D
fromList [(D,-638),(G,256),(H,False),(K,False),(O,True),(R,True),(S,-81),(T,926)]
~~~~~

Aha! Consider the two programs

```haskell
D := A + D;   
U := A + D

and

D := A + D; 
U := D

are they equivalent? Pretty subtle, eh.

Recap: Property-Based Testing

Property-based Testing
- Properties are Boolean-functions
- Generate inputs, run function, check if result is False
Test Generation
- Gen a is a monadic generator of a values
- Arbitrary is a class for types with generators
Case-Study
- Compiler optimizations are very tricky
- QC-inspired methods have found serious bugs in many compilers & databases & …

Recap: Property-Based Testing

QuickCheck is awesome!

Simple: typeclasses + monads
Useful: Can find subtle bugs or inconsistencies in your code.

Lots of literature on QC and techniques it has inspired

Can even use QC to generate test data systems in other languages

Property-based Testing

Plan

Property-based Testing

Testing with Specifications

Properties

QUIZ

Question

Checking Properties

EXERCISE

QuickSort

QuickCheck QuickSort

QUIZ

Properties and Assumptions

Conditional Properties

QUIZ

Lets Check

Specifying No-Duplicates

Specifying a Post-Condition

Specifying a Pre-Condition

Plan

Plan

Test Generation

Generators

Creating Generators

Running Generators

EXERCISE

PROBLEM: How to combine Generators?

Generator is a Monad!

QUIZ

EXERCISE

Types that can be Generated

Plan

Case Study: Compiler Optimizations

WHILE: Syntax

WHILE: Semantics

Generating WHILE Programs

Generating Variables

Generating Values

Generating Expressions

Generating States

Specification: Program Equivalence

QUIZ

Checking Compiler Optimizations

Checking an Optimization: Zero-Add-Elimination

Fix: Restrict Optimizations to Int Expressions

QUIZ

Checking an Optimization: Constant-Folding-ish

Shrinking Tests

Recap: Property-Based Testing

Recap: Property-Based Testing

Creating `Gen`erators

Running `Gen`erators

PROBLEM: How to combine `Gen`erators?

`Gen`erator is a Monad!

Types that can be `Gen`erated

Fix: Restrict Optimizations to `Int` Expressions