Parser Combinators
















Before we continue …

A Word from the Sponsor!

They are just a versatile abstraction, like map or fold.















Parsers

A parser is a function that

  • converts unstructured data (e.g. String, array of Byte,…)
  • into structured data (e.g. JSON object, Markdown, Video…)












Every large software system contains a Parser

System Parses
Shell Scripts Command-line options
Browsers HTML
Games Level descriptors
Routers Packets
Netflix Video
Spotify Audio, Playlists…












How to build Parsers?

Two standard methods

Regular Expressions

  • Doesn’t really scale beyond simple things
  • No nesting, recursion

Parser Generators

  1. Specify grammar via rules
  1. Tools like yacc, bison, antlr, happy
  • convert grammar into executable function



















Grammars Don’t Compose!

If we have two kinds of structured objects Thingy and Whatsit.

To parse sequences of Thingy and Whatsit we must duplicate the rules

No nice way to reuse the sub-parsers for Whatsit and Thingy :-(













A New Hope: Parsers as Functions

Lets think of parsers directly as functions that

  • Take as input a String
  • Convert a part of the input into a StructuredObject
  • Return the remainder unconsumed to be parsed later

A Parser a

  • Converts a prefix of a String
  • Into a structured object of type a and
  • Returns the suffix String unchanged













Parsers Can Produce Many Results

Sometimes we want to parse a String like

into a list of possible results

So we generalize the Parser type to












EXERCISE

Given the definition

Implement a function












QUIZ

Given the definition

Which of the following is a valid Parser Char

  • that returns the first Char from a string (if one exists)












Lets Run Our First Parser!

Failure to parse means result is an empty list!














EXERCISE

Your turn: Write a parser to grab first two chars

When you are done, we should get














QUIZ

Ok, so recall

Suppose we had some foo such that twoChar' was equivalent to twoChar

What must the type of foo be?

A. Parser (Char, Char)

B. Parser Char -> Parser (Char, Char)

C. Parser a -> Parser a -> Parser (a, a)

D. Parser a -> Parser b -> Parser (a, b)

E. Parser a -> Parser (a, a)









EXERCISE: A forEach Loop

Lets write a function

such that we get the following behavior











QUIZ

What does quiz evaluate to?

A. [10,20,30,0,1,2]

B. [10,0,20,1,30,2]

C. [[10,11,12], [20,21,22] [30,31,32]]

D. [10,11,12,20,21,22,30,31,32]

E. [32]








A pairP Combinator

Lets implement the above as pairP

Now we can write









QUIZ

What does quiz evaluate to?

A. [((h,h), "")]

B. [(h, "")]

C. [("", "")]

D. []

E. Run-time exception













Does the Parser a type remind you of something?

Lets implement the above as pairP










Parser is a Monad!

Like a state transformer, Parser is a monad!

We need to implement two functions











QUIZ

Which of the following is a valid implementation of returnP













HINT: return a should just

  • “produce” the parse result a and
  • leave the string unconsumed.














Bind

Next, lets implement bindP

  • we almost saw it as pairP

The function

  • Builds the a values out of aP (using runParser)
  • Builds the b values by calling fbP a on the remainder string s'
  • Returns b values and the remainder string s''

The Parser Monad

We can now make Parser an instance of Monad

And now, let the wild rumpus start!















Parser Combinators

Lets write lots of high-level operators to combine parsers!

Here’s a cleaned up pairP














Failures are the Pillars of Success!

Surprisingly useful, always fails

  • i.e. returns [] no successful parses











QUIZ

Consider the parser

What is the value of

quiz1 quiz2
A [] []
B [('h', "ellow")] [('y', "ellow")]
C [('h', "ellow")] []
D [] [('y', "ellow")]











Parsing Alphabets and Numerics

We can now use satP to write













QUIZ

We can parse a single Int digit

What is the result of

quiz1 quiz2
A [] []
B [('9', "2")] [('c', "at")]
C [(9, "2")] []
D [] [('c', "at")]











EXERCISE

Write a function

when you are done, we should get the following behavior













A Choice Combinator

Lets write a combinator orElse p1 p2 such that

  • returns the results of p1

or, else if those are empty

  • returns the results of p2

e.g. orElseP lets us build a parser that produces an alphabet OR a numeric character

Which should produce
















QUIZ

Which of the following implements orElse?











An “Operator” for orElse

It will be convenient to have a short “operator” for orElse













A Simple Expression Parser

Now, lets write a tiny calculator!

When calc is run, it will both parse and calculate















QUIZ

What will quiz evaluate to?

A. Type error

B. []

C. [(9, "9bottles")]

D. [(99, "bottles")]

E. Run-time exception













Next: Recursive Parsing

Its cool to parse individual Char

… but way more interesting to parse recursive structures!











EXERCISE: A “Recursive” String Parser

The parser string s parses exactly the string s - fails otherwise

Lets fill in an implementation

Which library function will eliminate the recursion from string?













QUIZ: Parsing Many Times

Often we want to repeat parsing some object

Recall digitChar :: Parser Char returned a single numeric Char

What will quiz evaluate to?

A. [("" , "1234horse")]

B. [("1" , "234horse")]

C. [("1", "23horse"), ("12", "3horse"), ("123", "horse )]

D. [("123", "horse")]

E. []











Lets fix manyP!

Run p first and only return [] if it fails …

now, we can write an Int parser as

which will produce


















Parsing Arithmetic Expressions

Now we can build a proper calculator!

Works pretty well!

















QUIZ

What does quiz evaluate to?

A. [(0, "")]

B. []

C. [(10, "")]

D. [(10, "-5-5")]

E. [(5, "-5")]












Problem: Right-Associativity

Recall

"10-5-5" gets parsed as 10 - (5 - 5) because

The calc0 parser implicitly forces each operator to be right associative

  • doesn’t matter for +, *

  • but is incorrect for -









QUIZ

Recall

What does quiz get evaluated to?

A. [(1020,"")]

B. [(120,"")]

C. [(120,""), (1020, "")]

D. [(1020,""), (120, "")]

E. []














The calc0 parser implicitly forces all operators to be right associative

  • doesn’t matter for +, *

  • but is incorrect for -

Does not respect precedence!

















Simple Fix: Parentheses!

Lets write a combinator that parses something within (...)

now we can try

now the original string wont even parse

but we can add parentheses to get the right result













Left Associativity

But how to make the parser left associative

  • i.e. parse “10-5-5” as (10 - 5) - 5 ?

Lets flip the order!

But …

Infinite loop! calc1 --> binExp --> calc1 --> binExp --> ...

  • without consuming any input :-(



















Solution: Parsing with Multiple Levels

Any expression is a sum-of-products











Parsing with Multiple Levels

So lets layer our language as

that is the recursion looks like

No infinite loop!

  • expr --> prod --> base -->* expr

  • but last step -->* consumes a (

















Parsing oneOrMore

Lets implement oneOrMore vP oP as a combinator - vP parses a single a value - oP parses an operator a -> a -> a - oneOrMore vP oP parses and returns the result ((v1 o v2) o v3) o v4) o ... o vn)

But how?

  1. grab the first v1 using vP

  2. continue by
    • either trying oP then v2 … and recursively continue with v1 o v2
    • orElse (no more o) just return v1

















Implementing Layered Parser

Now we can implement the grammar

simply as

where addOp is + or - and mulOp is * or /

Lets make sure it works!









Parser combinators

That was a taste of Parser Combinators

Many libraries including Parsec used in your homework - oneOrMore is called chainl

Read more about the theory - in these recent papers

Read more about the practice - in this recent post that I like JSON parsing from scratch