Before we continue …
A Word from the Sponsor!
		Don't Fear MonadsThey are just a versatile abstraction, like map or fold.
Parsers
A parser is a function that
- converts unstructured data (e.g. String, array ofByte,…)
- into structured data (e.g. JSON object, Markdown, Video…)
type Parser = String -> StructuredObject
Every large software system contains a Parser
| System | Parses | 
|---|---|
| Shell Scripts | Command-line options | 
| Browsers | HTML | 
| Games | Level descriptors | 
| Routers | Packets | 
| Netflix | Video | 
| Spotify | Audio, Playlists… | 
How to build Parsers?
Two standard methods
Regular Expressions
- Doesn’t really scale beyond simple things
- No nesting, recursion
Parser Generators
- Specify grammar via rules
Expr : Var            { EVar $1       }
     | Num            { ENum $1       }
     | Expr Op Expr   { EBin $1 $2 $3 }
     | '(' Expr ')'   { $2            }
     ;- Tools like yacc,bison,antlr,happy
- convert grammar into executable function
Grammars Don’t Compose!
If we have two kinds of structured objects Thingy and Whatsit.
Thingy : rule 	{ action } 
;
Whatsit : rule  { action }
;To parse sequences of Thingy and Whatsit we must duplicate the rules
Thingies : Thingy Thingies  { ... } 
           EmptyThingy      { ... }
;
Whatsits : Whatsit Whatsits { ... }
           EmptyWhatsit     { ... }
;No nice way to reuse the sub-parsers for Whatsit and Thingy :-(
A New Hope: Parsers as Functions
Lets think of parsers directly as functions that
- Take as input a String
- Convert a part of the input into a StructuredObject
- Return the remainder unconsumed to be parsed later
data Parser a = P (String -> (a, String))A Parser a
- Converts a prefix of a String
- Into a structured object of type aand
- Returns the suffix Stringunchanged
Parsers Can Produce Many Results
Sometimes we want to parse a String like
"2 - 3 - 4"into a list of possible results
[(Minus (Minus 2 3) 4),   Minus 2 (Minus 3 4)]So we generalize the Parser type to
data Parser a = P (String -> [(a, String)])
EXERCISE
Given the definition
data Parser a = P (String -> [(a, String)])Implement a function
runParser :: Parser a -> String -> [(a, String)]
runParser p s = ???
QUIZ
Given the definition
data Parser a = P (String -> [(a, String)])Which of the following is a valid oneChar :: Parser Char
that returns the first Char from a string (if one exists)
-- A
oneChar = P (\cs -> head cs)
-- B
oneChar = P (\cs -> case cs of 
                      []   -> [('', [])] 
                      c:cs -> [c, cs])
-- C
oneChar = P (\cs -> (head cs, tail cs))
-- D
oneChar = P (\cs -> [(head cs, tail cs)])
-- E
oneChar = P (\cs -> case cs of
                      [] -> [] 
                      cs -> [(head cs, tail cs)])
Lets Run Our First Parser!
>>> runParser oneChar "hey!"
[('h', "ey")]
>>> runParser oneChar "yippee"
[('y', "ippee")]
>>> runParser oneChar ""
[]Failure to parse means result is an empty list!
EXERCISE
Your turn: Write a parser to grab first two chars
twoChar :: Parser (Char, Char)
twoChar = P (\cs -> ???) When you are done, we should get
>>> runParser twoChar "hey!"
[(('h', 'e'), "y!")]
>>> runParser twoChar "h"
[]
QUIZ
Ok, so recall
twoChar :: Parser (Char, Char)
twoChar  = P (\cs -> case cs of
                       c1:c2:cs' -> [((c1, c2), cs')]
                       _         -> [])Suppose we had some foo such that twoChar' was equivalent to twoChar
twoChar' :: Parser (Char, Char)
twoChar' = foo oneChar oneChar What must the type of foo be?
A. Parser (Char, Char)
B. Parser Char -> Parser (Char, Char)
C. Parser a -> Parser a -> Parser (a, a)
D. Parser a -> Parser b -> Parser (a, b)
E. Parser a -> Parser (a, a)
EXERCISE: A forEach Loop
Lets write a function
forEach :: [a] -> (a -> [b]) -> [b]
forEach xs f = ???such that we get the following behavior
>>> forEach [] (\i -> [i, i + 1])
[]
>>> forEach [10,20,30] (\i -> [show i, show (i+1)])
["10", "11", "20", "21", "30", "31"]
QUIZ
What does quiz evaluate to?
quiz = forEach [10, 20, 30] (\i -> 
         forEach [0, 1, 2] (\j -> 
           [i + j] 
         )
       )A. [10,20,30,0,1,2]
B. [10,0,20,1,30,2]
C. [[10,11,12], [20,21,22] [30,31,32]]
D. [10,11,12,20,21,22,30,31,32]
E. [32]
A pairP Combinator
Lets implement the above as pairP
forEach :: [a] -> (a -> [b]) -> [b]
forEach xs f = concatMap f xs 
pairP :: Parser a -> Parser b -> Parser (a, b)
pairP aP bP = P (\s -> forEach (runParser aP s) (\(a, s') ->
                         forEach (runParser bP s') (\(b, s'') -> 
                           ((a, b), s'')
                       )
                ) Now we can write
twoChar = pairP oneChar oneChar
QUIZ
What does quiz evaluate to?
twoChar = pairP oneChar oneChar
quiz    = runParser twoChar "h" A. [((h,h), "")]
B. [(h, "")]
C. [("", "")]
D. []
E. Run-time exception
Does the Parser a type remind you of something?
Lets implement the above as pairP
data Parser a = P (String -> [(a, String)])
data ST s a   = S (s -> (a, s))
Parser is a Monad!
Like a state transformer, Parser is a monad!
We need to implement two functions
returnP :: a -> Parser a
bindP   :: Parser a -> (a -> Parser b) -> Parser b
QUIZ
Which of the following is a valid implementation of returnP
data Parser a = P (String -> [(a, String)])
returnP   :: a -> Parser a
returnP a = P (\s -> [])          -- A
returnP a = P (\s -> [(a, s)])    -- B
returnP a = P (\s -> (a, s))      -- C
returnP a = P (\s -> [(a, "")])   -- D
returnP a = P (\s -> [(s, a)])    -- E
HINT: return a should just
- “produce” the parse result aand
- leave the string unconsumed.
Bind
Next, lets implement bindP
- we almost saw it as pairP
bindP :: Parser a -> (a -> Parser b) -> Parser b
bindP aP fbP = P (\s -> 
  forEach (runParser aP s) (\(a, s') -> 
    forEach (runParser (fbP a) s') (\(b, s'') ->
      [(b, s'')]
    )   
  )
)The function
- Builds the avalues out ofaP(usingrunParser)
- Builds the bvalues by callingfbP aon the remainder strings'
- Returns bvalues and the remainder strings''

The Parser Monad
We can now make Parser an instance of Monad
instance Monad Parser where
  (>>=)  = bindP
  return = returnPAnd now, let the wild rumpus start!
Parser Combinators
Lets write lots of high-level operators to combine parsers!
Here’s a cleaned up pairP
pairP :: Parser a -> Parser b -> Parser (a, b)
pairP aP bP = do 
  a <- aP
  b <- bP
  return (a, b)
Failures are the Pillars of Success!
Surprisingly useful, always fails
- i.e. returns []no successful parses
failP :: Parser a
failP = P (\_ -> [])
QUIZ
Consider the parser
satP :: (Char -> Bool) -> Parser Char
satP p = do 
  c <- oneChar
  if p c then return c else failPWhat is the value of
quiz1 = runParser (satP (\c -> c == 'h')) "hellow"
quiz2 = runParser (satP (\c -> c == 'h')) "yellow"| quiz1 | quiz2 | |
|---|---|---|
| A | [] | [] | 
| B | [('h', "ellow")] | [('y', "ellow")] | 
| C | [('h', "ellow")] | [] | 
| D | [] | [('y', "ellow")] | 
Parsing Alphabets and Numerics
We can now use satP to write
-- parse ONLY the Char c
char :: Parser Char
char c = satP (\c' -> c == c')
-- parse ANY ALPHABET 
alphaCharP :: Parser Char
alphaCharP = satP isAlpha
-- parse ANY NUMERIC DIGIT
digitChar :: Parser Char
digitChar = satP isDigit
QUIZ
We can parse a single Int digit
digitInt :: Parser Int
digitInt = do
  c <- digitChar      -- parse the Char c
  return (read [c])   -- convert Char to IntWhat is the result of
quiz1 = runParser digitInt "92"
quiz2 = runParser digitInt "cat"| quiz1 | quiz2 | |
|---|---|---|
| A | [] | [] | 
| B | [('9', "2")] | [('c', "at")] | 
| C | [(9, "2")] | [] | 
| D | [] | [('c', "at")] | 
EXERCISE
Write a function
strP :: String -> Parser String 
strP s = -- parses EXACTLY the String s and nothing elsewhen you are done, we should get the following behavior
>>> dogeP = strP "doge"
>>> runParser dogeP "dogerel"
[("doge", "rel")]
>>> runParser dogeP "doggoneit"
[]
A Choice Combinator
Lets write a combinator orElse p1 p2 such that
- returns the results of p1
or, else if those are empty
- returns the results of p2
:: Parser a -> Parser a -> Parser a
orElse p1 p2 = -- produce results of `p1` if non-empty
               -- OR-ELSE results of `p2`e.g. orElseP lets us build a parser that produces an alphabet OR a numeric character
alphaNumChar :: Parser Char
alphaNumChar = alphaChar `orElse` digitCharWhich should produce
>>> runParser alphaNumChar "cat"
[('c', "at")]
>>> runParser alphaNumChar "2cat"
[('2', "cat")]
>>> runParser alphaNumChar "230"
[('2', "30")]
QUIZ
orElse :: Parser a -> Parser a -> Parser a
orElse p1 p2 = -- produce results of `p1` if non-empty
               -- OR-ELSE results of `p2`Which of the following implements orElse?
-- a 
orElse p1 p2 = do 
  r1s <- p1
  r2s <- p2
  return (r1s ++ r2s) 
-- b
orElse p1 p2 = do 
  r1s <- p1 
  case r1s of 
    [] -> p2 
    _  -> return r1s
-- c
orElse p1 p2 = P (\cs -> 
  runParser p1 cs ++ runParser p2 cs
  )
-- d
orElse p1 p2 = P (\cs -> 
  case runParser p1 cs of
    []  -> runParser p2 cs
    r1s -> r1s
  )
An “Operator” for orElse
It will be convenient to have a short “operator” for orElse
p1 <|> p2 = orElse p1 p2
A Simple Expression Parser
Now, lets write a tiny calculator!
-- 1. First, parse the operator 
intOp      :: Parser (Int -> Int -> Int) 
intOp      = plus <|> minus <|> times <|> divide 
  where 
    plus   = do { _ <- char '+'; return (+) }
    minus  = do { _ <- char '-'; return (-) }
    times  = do { _ <- char '*'; return (*) }
    divide = do { _ <- char '/'; return div }
-- 2. Now parse the expression!
calc :: Parser Int
calc = do x  <- digitInt
          op <- intOp
          y  <- digitInt 
          return (x `op` y)When calc is run, it will both parse and calculate
>>> runParser calc "8/2"
[(4,"")]
>>> runParser calc "8+2cat"
[(10,"cat")]
>>> runParser calc "8/2cat"
[(4,"cat")]
>>> runParser calc "8-2cat"
[(6,"cat")]
>>> runParser calc "8*2cat"
[(16,"cat")]
QUIZ
What will quiz evaluate to?
calc :: Parser Int
calc = do x  <- digitInt
          op <- intOp
          y  <- digitInt 
          return (x `op` y)
quiz = runParser calc "99bottles"A. Type error
B. []
C. [(9, "9bottles")]
D. [(99, "bottles")]
E. Run-time exception
Next: Recursive Parsing
Its cool to parse individual Char …
… but way more interesting to parse recursive structures!
"((2 + 10) * (7 - 4)) * (5 + 2)"
EXERCISE: A “Recursive” String Parser
The parser string s parses exactly the string s
- fails otherwise
>>> runParser (string "mic") "mickeyMouse"
[("mic","keyMouse")]
>>> runParser (string "mic") "donald duck"
[]Lets fill in an implementation
string :: String -> Parser String
string s = ???Which library function will eliminate the recursion from string?
QUIZ: Parsing Many Times
Often we want to repeat parsing some object
-- | `manyP p` repeatedly runs `p` to return a list of [a]
manyP  :: Parser a -> Parser [a]
manyP p = m0 <|> m1
  where
    m0  = return [] 
    m1  = do { x <- p; xs <- manyP p; return (x:xs) } Recall digitChar :: Parser Char returned a single numeric Char
What will quiz evaluate to?
quiz = runParser (manyP digitChar) "123horse"A. [(""  , "1234horse")]
B. [("1" , "234horse")]
C. [("1", "23horse"), ("12", "3horse"), ("123", "horse )]
D. [("123", "horse")]
E. []
Lets fix manyP!
Run p first and only return [] if it fails …
-- | `manyP p` repeatedly runs `p` to return a list of [a]
manyP  :: Parser a -> Parser [a]
manyP p = m1 <|> m0
  where
    m0  = return [] 
    m1  = do { x <- p; xs <- manyP p; return (x:xs) } now, we can write an Int parser as
int :: Parser Int
int = do { xs <- manyP digitChar; return (read xs) }which will produce
>>> runParser oneChar "123horse"
[("123", "horse")]
>>> runParser int "123horse"
[(123, "horse")]
Parsing Arithmetic Expressions
Now we can build a proper calculator!
calc0 ::  Parser Int
calc0 = binExp <|> int 
int :: Parser Int
int = do 
  xs <- many digitChar 
  return (read xs)
binExp :: Parser Int
binExp = do
  x <- int 
  o <- intOp 
  y <- calc0 
  return (x `o` y) Works pretty well!
>>> runParser calc0 "11+22+33"
[(66,"")]
ghci> doParse calc0 "11+22-33"
[(0,"")]
QUIZ
calc0 ::  Parser Int
calc0 = binExp <|> int 
int :: Parser Int
int = do 
  xs <- many digitChar 
  return (read xs)
binExp :: Parser Int
binExp = do
  x <- int 
  o <- intOp 
  y <- calc0 
  return (x `o` y) What does quiz evaluate to?
quiz = runParser calc0 "10-5-5"A. [(0, "")]
B. []
C. [(10, "")]
D. [(10, "-5-5")]
E. [(5, "-5")]
Problem: Right-Associativity
Recall
binExp :: Parser Int
binExp = do
  x <- int 
  o <- intOp 
  y <- calc0 
  return (x `o` y)"10-5-5" gets parsed as 10 - (5 - 5) because

The calc0 parser implicitly forces each operator to be right associative
- doesn’t matter for - +,- *
- but is incorrect for - -
QUIZ
Recall
binExp :: Parser Int
binExp = do
  x <- int 
  o <- intOp 
  y <- calc0 
  return (x `o` y)What does quiz get evaluated to?
quiz = runParser calc0 "10*2+100"A. [(1020,"")]
B. [(120,"")]
C. [(120,""), (1020, "")]
D. [(1020,""), (120, "")]
E. []
The calc0 parser implicitly forces all operators to be right associative
- doesn’t matter for - +,- *
- but is incorrect for - -
Does not respect precedence!
Simple Fix: Parentheses!
Lets write a combinator that parses something within (...)
parensP :: Parser a -> Parser a
parensP p = do 
  _ <- char '(' 
  x <- p
  _ <- char ')'
  return x now we can try
calc1 :: Parser Int
calc1 = parens binExp <|> int 
binExp :: Parser Int
binExp = do
  x <- int 
  o <- intOp 
  y <- calc1 
  return (x `o` y)now the original string wont even parse
>>> runParser calc1 "10-5-5" 
[]but we can add parentheses to get the right result
>>> runParser calc1 "((10-5)-5)" 
[(0 ,"")]
>>> runParser calc1 "(10-(5-5))" 
[(10 ,"")]
>>> runParser calc1 "((10*2)+100)"
[(120, "")]
>>> runParser calc1 "(10*(2+100))"
[(1020, "")]
Left Associativity
But how to make the parser left associative
- i.e. parse “10-5-5” as (10 - 5) - 5?
Lets flip the order!
calc1      ::  Parser Int
calc1      = binExp <|> oneInt 
binExp :: Parser Int
binExp = do 
  x <- calc1 
  o <- intOp 
  y <- int
  return (x `o` y)But …
>>> runParser calc1 "2+2"
...Infinite loop! calc1 --> binExp --> calc1 --> binExp --> ...
- without consuming any input :-(
Solution: Parsing with Multiple Levels
Any expression is a sum-of-products
  10 * 20 * 30 + 40 * 50 + 60 * 70 * 80
=> 
  ((((10 * 20) * 30) + (40 * 50)) + ((60 * 70) * 80))
=>  
  ((((base * base) * base)  + (base * base)) + ((base * base) * base))
=>  
  (((prod * base) + prod)  + (prod * base))
=>  
  ((prod + prod) + prod) 
=>   
  (sum + prod)
=>
   sum 
=>   
   expr
Parsing with Multiple Levels
So lets layer our language as
  expr :== sum
  sum  :== (((prod "+" prod) "+" prod) "+" ... "+" prod)
  prod :== (((base "*" base) "*" base) "*" ... "*" base) 
  base :== "(" expr ")" ORELSE intthat is the recursion looks like
  expr = sum
  sum  = oneOrMore prod "+" 
  prod = oneOrMore base "*"
  base = "(" expr ")" <|> int No infinite loop!
- expr --> prod --> base -->* expr
- but last step - -->*consumes a- (
Parsing oneOrMore
Lets implement oneOrMore vP oP as a combinator
- vP parses a single a value
- oP parses an operator a -> a -> a
- oneOrMore vP oP parses and returns the result ((v1 o v2) o v3) o v4) o ... o vn)
But how?
- grab the first - v1using- vP
- continue by - either trying oPthenv2… and recursively continue withv1 o v2
- orElse(no more- o) just return- v1
 
- either trying 
oneOrMore :: Parser a -> Parser (a -> a -> a) -> Parser a
oneOrMore vP oP = do {v1 <- vP; continue v1}
  where 
    continue v1 = do { o <- oP; v2 <- vP; continue (v1 `o` v2) }
               <|> return v1
Implementing Layered Parser
Now we can implement the grammar
  expr = sum
  sum  = oneOrMore prod "+" 
  prod = oneOrMore base "*"
  base = "(" expr ")" <|> int simply as
expr = sum
sum  = oneOrMore prod addOp
prod = oneOrMore base mulOp
base = parens expr <|> intwhere addOp is + or - and mulOp is * or /
addOp, mulOp :: Parser (Int -> Int -> Int)
addOp = constP "+" (+) <|> constP "-" (-)
mulOp = constP "*" (*) <|> constP "/" div
constP :: String -> a -> Parser a 
constP s x = do { _ <- string s; return x }Lets make sure it works!
>>> doParse sumE2 "10-1-1"
[(8,"")]
>>> doParse sumE2 "10*2+1"
[(21,"")]
>>> doParse sumE2 "10+2*1"
[(12,"")]
Parser combinators
That was a taste of Parser Combinators
- Transferred from Haskell to many other languages.
Many libraries including Parsec used in your homework
- oneOrMore is called chainl
Read more about the theory - in these recent papers
Read more about the practice - in this recent post that I like JSON parsing from scratch
