## Property-based Testing

`Typeclasses + Monads = Automatic Testing `

Lets look at QuickCheck

- Developed by Koen Claessen and John Hughes in 2000

Ported to 40 other languages

PBT used in basically all kinds of software…

## Plan

**Property-based Testing**Random Test Generation

Case-Study

## Property-based Testing

**Don’t** (only) write individual *unit-tests*

- only check particular input-output behavior of code

**Do write** *properties* desired of the functions

*generate*random inputs*run*function*verify*(or rather, try to falsify) the property

## Testing with Specifications

PBT emphasizes the importance of **specifying correct behavior**

Makes you think about what the code

*should do*,Finds

*corner-cases*where the specification is violated (so code or spec are fixed)Makes specs live on as machine-checked

*documentation*.

## Properties

A `Property`

is a function that returns a `Bool`

```
prop_revapp :: [Int] -> [Int] -> Bool
=
prop_revapp xs ys reverse (xs ++ ys) == reverse xs ++ reverse ys
```

Looks like a *theorem* that the programmer *believes* is true.

- By convention, we write the prefix
`"prop_"`

## QUIZ

Consider the property `prop_revapp`

```
prop_revapp :: [Int] -> [Int] -> Bool
=
prop_revapp xs ys reverse (xs ++ ys) == reverse xs ++ reverse ys
```

It says *forall* `xs`

, `ys`

. `reverse (xs ++ ys) == reverse xs ++ reverse ys`

.

Is the property true?

**A.** Yes

**B.** No

## Question

Why specify type as `[Int] -> [Int] -> Bool`

?

- Why not write
`[a] -> [a] -> Bool`

?

## Checking Properties

QC uses the types to generate *random inputs*

For that, it needs a

*specific*type like`[Int]`

Cannot generate an input of type

`[a]`

Let’s `quickCheck`

it!

```
>>> quickCheck prop_revapp
*** Failed! Falsifiable (after 6 tests and 9 shrinks):
0]
[1] [
```

Huh? QC gives us a *counterexample* with `xs = [0]`

and `ys == [1]`

```
reverse (xs ++ ys)
==> reverse ([0] ++ [1])
==> reverse ([0, 1])
==> [1, 0])
```

but

```
reverse xs ++ reverse ys
==> reverse [0] ++ reverse [1]
==> [0] ++ [1]
==> [0, 1]
```

## EXERCISE

Can you modify `prop_revapp`

so that it is *sensible* and *true* ?

```
prop_revapp :: [Int] -> [Int] -> Bool
= reverse (xs ++ ys) == ??? prop_revapp xs ys
```

When you are done, we should see

```
>>> quickCheck prop_revapp'
+++ OK, passed 100 tests.
```

We can run *more* tests by specifying that as a parameter

`= quickCheckWith (stdArgs {maxSuccess = n}) quickCheckN n `

Followed by

```
10000 prop_revapp
quickCheckN +++ OK, passed 10000 tests
```

## QuickSort

Here’s a simple sorting function (`quickSort`

)

```
qsort :: (Ord a) => [a] -> [a]
= []
qsort [] :xs) = qsort ls ++ [x] ++ qsort rs
qsort (xwhere
= [y | y <- xs, y < x] -- elems in xs < x
ls = [z | z <- xs, z > x] -- elems in xs > x rs
```

Seems to work?

```
>>> [1,3..19] ++ [2,4..20]
1,3,5,7,9,11,13,15,17,19,2,4,6,8,10,12,14,16,18,20]
[
>>> qsort ([1,3..19] ++ [2,4..20])
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20] [
```

## QuickCheck QuickSort

Lets *check* the output of `qsort`

is **ordered**

```
isOrdered :: (Ord a) => [a] -> Bool
isOrdered :: (Ord a) => [a] -> Bool
:x2:xs) = x1 <= x2 && isOrdered (x2:xs)
isOrdered (x1= True isOrdered _
```

and use it to write a property

```
prop_qsort_isOrdered :: [Int] -> Bool
= isOrdered (qsort xs) prop_qsort_isOrdered xs
```

which we can check

```
>>> quickCheckN 1000 prop_qsort_isOrdered
+++ OK, passed 1000 tests.
```

## QUIZ

Lets check that the *first* element of the output is the smallest

```
qsort :: (Ord a) => [a] -> [a]
= []
qsort [] :xs) = qsort ls ++ [x] ++ qsort rs
qsort (xwhere
= [y | y <- xs, y < x] -- elems in xs < x
ls = [z | z <- xs, z > x] -- elems in xs > x
rs
prop_qsort_min :: [a] -> Bool
= head (qsort xs) == minimum xs prop_qsort_min xs
```

What is the result of

`>>> quickCheck prop_qsort_min`

**A.** Pass 100 tests

**B.** Fail

## Properties and Assumptions

```
prop_qsort_min :: [a] -> Bool
= head (qsort xs) == minimum x prop_qsort_min xs
```

Oops!

```
>>> quickCheck prop_qsort_min
*** Failed! Exception: 'Prelude.head: empty list' (after 1 test):
[]
```

`prop_qsort_min`

is not true **for all** `Int`

lists

- Property
*only*makes sense if`xs`

is not empty

Writing specifications clarifies the *assumptions*

- under which a given piece of code is supposed to work.

## Conditional Properties

Lets modify `prop_qsort_min`

so equality holds *if* input is non-null

```
prop_qsort_nn_min :: [Int] -> Property
=
prop_qsort_nn_min xs not (null xs) ==> head (qsort xs) == minimum xs
```

Instead of `Bool`

the function’s output is `Property`

- a special type built into the QC library

The *implies* operator `==>`

is one of many

- that allow the construction of rich properties.

## QUIZ

Lets test that our `qsort`

is *identical* to a *trusted reference implementation*

- may be too
*slow*to deploy but ok to use for checking correctness

Lets use the standard library’s `Data.List.sort`

function

- (
*Much*faster than ours… but just for illustration!)

```
prop_qsort_sort :: [Int] -> Bool
= qsort xs == sort xs prop_qsort_sort xs
```

What is the result of

```
qsort :: (Ord a) => [a] -> [a]
= []
qsort [] :xs) = qsort ls ++ [x] ++ qsort rs
qsort (xwhere
= [y | y <- xs, y < x] -- elems in xs < x
ls = [z | z <- xs, z > x] -- elems in xs > x
rs
>>> quickCheck prop_qsort_sort
```

**A.** `OK`

after 100 tests

**B.** `Failed! Falsifiable...`

## Lets Check

```
>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 6 tests and 3 shrinks):
-3,-3] [
```

Oops? What?

```
>>> sort [-3, -3]
-3, -3]
[
>>> qsort [-3, -3]
-3] [
```

Ugh! So close, and yet … Can you spot the bug in our code?

```
qsort :: (Ord a) => [a] -> [a]
= []
qsort [] :xs) = qsort ls ++ [x] ++ qsort rs
qsort (xwhere
= [y | y <- xs, y < x] -- elems in xs < x
ls = [z | z <- xs, z > x] -- elems in xs > x rs
```

## Specifying No-Duplicates

We assumed that the input has **no duplicates**

**Values equal to**`x`

are thrown out from`ls`

and`rs`

Is this a *bug*? Maybe? Maybe not?

- But at least its something we should be
*aware*of!

Lets specify that a list has no-duplicates

```
noDuplicates ::(Eq a) => [a] -> Bool
:xs) = not (x `elem` xs) && noDuplicates xs
noDuplicates (x= True noDuplicates _
```

## Specifying a Post-Condition

We can now check that `qsort`

**outputs** a list with no-duplicates

```
prop_qsort_distinct :: [Int] -> Bool
= noDuplicates (qsort xs)
prop_qsort_distinct xs
-- >>> quickCheck prop_qsort_distinct
-- +++ OK, passed 100 tests.
```

## Specifying a Pre-Condition

Also, `qsort`

is identical to `sort`

on **inputs with no duplicates**

```
prop_qsort_distinct_sort :: [Int] -> Property
=
prop_qsort_distinct_sort xs ==> (qsort xs == sort xs)
(isDistinct xs)
-- >>> quickCheck prop_qsort_distinct_sort
-- +++ OK, passed 100 tests.
--
```

## Plan

**Property-based Testing**- Properties are boolean-functions
- Generate inputs, run function, check if result is
`False`

Random Test Generation

Case-Study

## Plan

Property-based Testing

- Properties are boolean-functions
- Generate inputs, run function, check if result is
`False`

**Test Generation**Case-Study

## Test Generation

Lets notice something about `quickCheck`

If you run it once …

```
>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 6 tests and 2 shrinks):
5,5] [
```

and if you run it again …

```
>>> quickCheck prop_qsort_sort
*** Failed! Falsifiable (after 4 tests and 1 shrink):
1,1] [
```

The *falsifying tests* are different!

How is this possible?

## Generators

QC defines a special *Generator* data type

`data Gen a = MkGen (StdGen -> Int -> a)`

A `Gen a`

is a function that takes as *input*

- a random number generator
`StdGen`

- a “seed”
`Int`

and returns as *output*

- a
**value**of type`a`

## Creating `Gen`

erators

There are some functions to *create* generators, e.g.

`choose :: (Int, Int) -> Gen Int`

which

takes a pair of

`(lo, hi)`

returns a random generator for values between

`lo`

and`hi`

## Running `Gen`

erators

To *execute* the `Gen`

we need access to the system’s “randomness”

Done via an `IO`

“recipe”

`sample' :: Gen a -> IO [a]`

Which

- takes a
`Gen`

erator of`a`

values and - returns a
*recipe*that produces a list of (10)`a`

values

We can *run* the recipe in `ghci`

```
>>> sample' (choose (0, 5))
4,2,5,3,2,2,2,3,0,0,0] [
```

## EXERCISE

Lets write a function that returns a `Gen`

erator over a list of `elements`

```
elements :: [a] -> Gen a
= ??? elements
```

So `elements [x0,x1,...,xn]`

returns a `Gen`

erator that randomly produces values from `x0`

, `x1`

, … `xn`

.

## PROBLEM: How to *combine* `Gen`

erators?

Suppose I have a generator of positive `Int`

```
pos :: Gen Int
= sample (0, 100) pos
```

How can I create a generator of a *pair* of positive `Int`

s?

`posPair :: Gen (Int, Int)`

`Gen`

erator is a Monad!

You can see details here

This will let us

*combine*generators (like combining parsers…)

## QUIZ

Which of the below implements `posPair :: Gen (Int, Int)`

?

- given
`pos :: Gen Int`

and`sample :: (Int, Int) -> Gen Int`

```
-- A
= do { x1 <- pos; x2 <- pos; return (x1, x2) }
posPair
-- B
= (pos, pos)
posPair
-- C
= do { x <- pos; return (x, x) }
posPair
-- D
= Gen (4, 5)
posPair
-- E
= (sample (0, 100), sample (0, 100)) posPair
```

```
= do { x1 <- pos; x2 <- pos; return (x1, x2) }
posPair
-- >>> sample' posPair
-- [(29,71),(48,74),(89,53),(73,93),(0,40),(71,35),(23,69),(93,49),(59,58),(27,32),(88,45)]
```

## EXERCISE

Lets write a function that *mixes* a list of `Gen`

erators

```
oneOf :: [Gen a] -> Gen a
= ??? oneOf
```

`oneOf [g0,g1,...,gn]`

should be a generator that

randomly selects

*one of*`g0`

,…`gn`

randomly generates a value from the chosen generator

```
>>> sample' (oneOf [choose (0,2), choose (10,12)])
2,2,1,1,12,10,2,2,11,0,11] [
```

`oneOf`

is generalized into the `frequency`

combinator

`frequency :: [(Int, Gen a)] -> Gen a`

which builds *weighted mixtures* of individual `Gen`

erators

## Types that can be `Gen`

erated

QC has a `class`

for types whose values can be randomly `Gen`

erated

```
class Arbitrary a where
arbitrary :: Gen a
```

`T`

is an instance of `Arbitrary`

if there is `Gen T`

function

- i.e. there is a generator of
`T`

values!

```
randomThings :: (Arbitrary a) => IO [a]
= sample' arbitrary randomThings
```

Many standard types have `Arbitrary`

instances

- Users write their own instances when testing their own types

```
>>> randomThings :: IO [Int]
0,-2,-2,0,-1,8,1,-14,-13,5,19]
[
>>> randomThings :: IO [(Int, Bool)]
0,True),(1,True),(0,True),(6,False),(-5,True),(4,False),(-12,False),(-8,False),(5,False),(-9,False),(-7,False)]
[(-
>>> randomThings :: IO [String]
"","\a","\f","\779257W\SUBA","\84573","D\ACK\365059S","9W\554735G","g\SYN~W\62120\&4&[","\NULsc\18427fy(","Q`TI \n/TH","\461027\ESCZ`u\783094\&4B\SOHT\424692"] [
```

## Plan

- Property-based Testing
- Properties are
`Bool`

ean-functions - Generate inputs, run function, check if result is
`False`

- Properties are
- Test Generation
`Gen a`

is a monadic*generator*of`a`

values`Arbitrary`

is a class for types with generators

**Case-Study**

## Case Study: Compiler Optimizations

Lets use QC to test *compiler optimizations*

Learn how to

*generate*structure data (*programs*)Learn how to

*specify*fancy properties (*equivalence*)

Using the `WHILE`

language from your HW assignment.

## WHILE: Syntax

Recall the definition of `Variable`

and `Value`

```
data Variable = V String
data Value = IntVal Int | BoolVal Bool
```

which we used to define `Expression`

```
data Expression = Var Variable | Val Value
| Plus Expression Expression
| Minus Expression Expression
```

and `Statement`

```
data Statement
= Assign Variable Expression
| If Expression Statement Statement
| While Expression Statement
| Sequence Statement Statement
| Skip
```

## WHILE: Semantics

Next, we defined the *behavior* of programs as

functions from

*starting*state to*final*statewhere

*state*was defined as a map from`Variable`

to`Value`

`type WState = M.Map Variable Value`

and then you wrote a function

```
execute :: WState -> Statement -> WState
= execState (evalS stmt) s0 execute s0 stmt
```

(We can skip the details of `evalS`

because *you* wrote it… right?)

## Generating WHILE Programs

Lets write a *program generator*

### Generating Variables

```
instance Arbitrary Variable where
= do
arbitrary <- elements ['A'..'Z']
x return (V [x])
```

and we get

```
>>> randomThings :: IO [Variable]
V "G",V "U",V "Z",V "F",V "Z",V "K",V "P",V "D",V "Y",V "M",V "H"] [
```

### Generating Values

```
instance Arbitrary Value where
= oneOf [ do {n <- arbitrary; return (IntVal n) }
arbitrary do {b <- arbitrary; return (BoolVal b)}
, ]
```

and we get

```
>>> randomThings :: IO [Value]
IntVal 0,BoolVal False,BoolVal True,IntVal 3
[IntVal (-8),IntVal (-3),IntVal 1,BoolVal False
,IntVal 6,BoolVal True,BoolVal False] ,
```

### Generating Expressions

```
instance Arbitrary Expression where
= expr
arbitrary
baseE :: Gen Expression
expr, binE,= oneof [baseE, binE]
expr
= do { o <- elements [Plus, Minus];
binE <- expr;
e1 <- expr
e2 return (o e1 e2) }
= oneOf [ do {x <- arbitrary; return (Var x) }
baseE do {v <- arbitrary; return (Val v)} ] ,
```

which gives us

```
>>> randomThings :: IO [Expression]
... -- lots of expressions!
```

## Generating States

QC already has an way to automatically generate `Map`

s

```
instance (Ord k, Arbitrary k, Arbitrary v) => Arbitrary (M.Map k v) where
= do {kvs <- arbitrary; return (M.fromList kvs) } arbitrary
```

So for free we get a generator for `WState`

```
>>> randomThings :: IO [WState]
...
```

## Specification: Program Equivalence

Next, lets specify the **properties**

Let `p1`

and `p2`

be two *While* programs.

Program `p1`

is *equivalent to* `p2`

written `p1 === p2`

if

```
(===) :: Statement -> Statement -> Property
=== p2 = forAll arbitrary (\st -> execute st p1 == execute st p2) p1
```

That is, for all *input* states `st`

executing

`p1`

from`st`

andexecuting

`p2`

from`st`

produce the *same* state.

## QUIZ

For example, suppose that

```
-- X := 10; Y := 20
= Sequence
prog1 Assign (V "X") (Val (IntVal 10)))
(Assign (V "Y") (Val (IntVal 20)))
(
-- Y := 20; X := 10
= Sequence
prog2 Assign (V "Y") (Val (IntVal 20)))
(Assign (V "X") (Val (IntVal 10))
(
-- Y := 20; X := 20
= Sequence
prog3 Assign (V "Y") (Val (IntVal 20)))
(Assign (V "X") (Val (IntVal 20))) (
```

then what do the following two queries return?

```
>>> quickCheck (prog1 === prog2)
>>> quickCheck (prog1 === prog3)
```

**A.** FAIL, FAIL

**B.** FAIL, PASS

**C.** PASS, FAIL

**D.** PASS, PASS

## Checking Compiler Optimizations

A **compiler optimization** can be viewed as a **pair** of programs

*input*written by human`p_in`

*output*optimized by compiler`p_out`

An optimization is **correct** if `p_in === p_out`

- Compiler should not
*change behavior*of the code

## Checking an Optimization: Zero-Add-Elimination

Here’s a simple optimization

In |
Out |
---|---|

`X := E + 0` |
`X := E` |

`X := E - 0` |
`X := E` |

Should be correct because adding `0`

doesn’t change anything…

We can write this as a QC **property**

```
prop_add_zero_elim :: Variable -> Expression -> Property
=
prop_add_zero_elim x e `Assign` (e `Plus` Val (IntVal 0))) === (x `Assign` e)
(x
prop_sub_zero_elim :: Variable -> Expression -> Property
=
prop_sub_zero_elim x e `Assign` (e `Minus` Val (IntVal 0))) === (x `Assign` e) (x
```

So what does QC say?

```
-- >>> quickCheck prop_add_zero_elim
-- *** Failed! Falsifiable (after 1 test):
-- W
-- True
-- fromList []
```

What’s that? Lets see is `W := True`

equivalent to `W := True + 0`

?

Forgot about those pesky boolean expressions!

If you recall `W := True`

will just assign `True`

to `W`

```
>>> execute M.empty ((V "W") `Assign` (Val (BoolVal True)))
W,True)] fromList [(
```

but `W := True + 0`

will have a “type error” so will assign `0`

to `W`

!

```
>>> execute M.empty (V "W") `Assign` ((Val (BoolVal True) `Plus` Val (IntVal 0)))
W,0)] fromList [(
```

## Fix: Restrict Optimizations to `Int`

Expressions

Problem was expressions like `True`

- Caused strange behaviors due to “type errors”

Lets *restrict* to **only integer-valued expressions**

```
binI :: Gen Expression
exprI, baseI,= oneof [baseI, binE]
exprI
= oneOf [ do {x <- arbitrary; return (Var x) }
baseI do {n <- arbitrary; return (Val (IntVal n)) }
,
]= do { o <- elements [Plus, Minus];
binI <- exprI;
e1 <- exprI;
e2 return (o e1 e2) }
```

Now we can restrict the property to

```
prop_add_zero_elim' :: Variable -> Property
=
prop_add_zero_elim' x -> (x `Assign` (e `Plus` Val (IntVal 0))) === (x `Assign` e)) forAll intExpr (\e
```

## QUIZ

Consider the property

```
prop_add_zero_elim' :: Variable -> Property
=
prop_add_zero_elim' x -> (x `Assign` (e `Plus` Val (IntVal 0))) === (x `Assign` e)) forAll intExpr (\e
```

What will be the result of

`>>> quickCheck prop_add_zero_elim'`

**A.** PASS

**B.** FAIL

```
>>> quickCheck prop_add_zero_elim'
*** Failed! Falsifiable (after 11 tests):
Z
G
B,False),(F,-4),(G,True),(K,8),(M,True),(N,False),(R,3),(T,False),(V,True)] fromList [(
```

Oops, the counterexample is `Z := G`

and `Z := G + 0`

*but*now the*starting*state maps`(G, True)`

Pesky `Bool`

sneaked right back in …

**Moral:** Even simple optimizations are *really* tricky without types!

**Try at home** Can you fix this property so it passes?

## Checking an Optimization: Constant-Folding-ish

Lets try another optimization that doesn’t use any *arithmetic*

- So should not have any problems with
`Int`

-vs-`Bool`

Suppose you have two back-to-back assignments

`X := E; Y := E`

Why *recompute* `E`

? Result is already stored in `X`

! So optimize the above to

`X := E; Y := X`

We can specify this transformation as the QC property

```
prop_const_prop :: Variable -> Variable -> Expression -> Property
=
prop_const_prop x y e `Assign` e) `Sequence` (y `Assign` e))
((x ===
`Assign` e) `Sequence` (y `Assign` Var x)) ((x
```

Mighty QC, do you agree ?

```
>>> quickCheck prop_const_prop
*Testing> quickCheck prop_const_prop
*** Failed! Falsifiable (after 82 tests):
D
B
True + N + L + M + True + -45 + H + -9 + 70 + True + -68 + N + -29 + I + True + G + O + P + True + Q + False + False + True + True + True + True + X + I + False + 81 + -42 + False + 31 + -13 + T + 23 + True + S + True + I + M + True + True + True + Z + H + -65 + G + K + -22 + D
A,True),(B,-72),(C,-19),(D,-34),(E,50),(F,True),(G,True),(H,-21),(I,5),(J,3),(K,True),(L,-20),(M,True),(N,-10),(O,-20),(P,False),(Q,-10),(R,-78),(S,True),(T,70),(U,False),(V,-55),(W,True),(X,True),(Y,True),(Z,-56)] fromList [(
```

## Shrinking Tests

The property *fails* … but the *counterexample* is very long!

QC has a **test shrinking** mechanism … also in the `Arbitrary`

class

`shrink :: a -> [a]`

`shrink t`

- takes as
*input*a**candidate**`t`

and

- returns as
*output*a list of**smaller candidates**

That QC will systematically search with to find a **minimally** failing test!

```
instance Arbitrary Expression where
= expr
arbitrary
-- shrink :: Expression -> [Expression]
Plus e1 e2) = [e1, e2]
shrink (Minus e1 e2) = [e1, e2]
shrink (= [] shrink _
```

We should just keep shrinking each `Expression`

to its *sub*-`Expressions`

.

```
>>> quickCheck prop_const_prop
*** Failed! Falsifiable (after 26 tests and 4 shrinks):
D
U
A + D
D,-638),(G,256),(H,False),(K,False),(O,True),(R,True),(S,-81),(T,926)]
fromList [(~~~~~
Aha! Consider the two programs
```haskell
D := A + D;
U := A + D
```

and

```
D := A + D;
U := D
```

are they equivalent? Pretty subtle, eh.

## Recap: Property-Based Testing

- Property-based Testing
- Properties are
`Bool`

ean-functions - Generate inputs, run function, check if result is
`False`

- Properties are
- Test Generation
`Gen a`

is a monadic*generator*of`a`

values`Arbitrary`

is a class for types with generators

**Case-Study**- Compiler optimizations are very tricky
- QC-inspired methods have found serious bugs in many compilers & databases & …

## Recap: Property-Based Testing

QuickCheck is awesome!

**Simple:**typeclasses + monads**Useful**: Can find subtle bugs or inconsistencies in your code.

Lots of literature on QC and techniques it has inspired

- Can even use QC to generate test data systems in
*other*languages