Saturday, April 7, 2012

BNF - Backus Naur Form

Miranda and mum

Such fun.

I've just started an interesting personal project that has some fun bits to it. Yesterday I wrote a simple tokenizer that looks quite neat and tidy, so I may post that soon. But today I felt I needed a parser. Then I realised that I didn't actually need a parser, but it would be fun to write one again after all these years. But I couldn't remember Backus Naur Form or, Recursive Decent parsers... it's all a blur. So I thought I'd make some notes on BNF, then I might look at the Recursive Decent Parser, or perhaps look at EBNF before that, then see why BNF is no use representing HTML.

So I dug out my old notes and it explained who Backus was, who Naur was (or "is" perhaps). And when the BNF was made and why. Wikipedia is the place for that info.

What is Backus Naur Form (Backus Normal Form)

BNF is used to define the grammar of language. It is not the only way, but it is a good and unambiguous way. Once defined, the grammar can be taken and used to create a complier. I only ever looked at Recursive decent parsers and that was many years back. There some pretty neat advances now and modern compilers don't tend to use this method now. But it's great if you need to create your own local language. Not many of us do, but I wish I did need to.

Rules

BNF has rules. You define the rules. These can be called production rules.

symbol := alternative1 | alternative2

This means, the symbol on the LHS must be replaced by one of the alternatives on the RHS

Some examples use, ::= instead of :=
I'm not sure which is more prevalent or better. I use :=

Terminals and Symbols

We say symbols in the the previous section. Terminals are the bits of the string that are not symbols. Terminals do not have productions rules. They terminate production.

Limitation and use

It is interesting to realise that BNF cannot express all of the syntax rules. It will not state that a variable needs to be defined before it can be used, for example. It also will not be able to describe semantic meaning. All it will do, is define the structure of the language, the framework.

The first example:

<number> := <digit> | <number> <digit>
<digit>  := 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

I think the first tricky thing, or the only tricky thing, is the reading of the <number> <digit> rule.
Basically, this says that:
" a number is a digit, or a number which is followed by a digit."the second rule says, "a digit is a single character from 0, 1, 2 ... 9"

There are many other variations on the above example, here are a few:

<number> := <digit> | <number> <digit> .
<digit> := 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 .

or,

<number> := <digit> 
         | <number> <digit>
         .
<digit>  := 0 
         | 1 
         | 2 
         | 3 
         | 4 
         | 5 
         | 6 
         | 7 
         | 8 
         | 9 
         .

or perhaps

number ::= digit | number digit .
digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 .



If you think of developing this, or even understanding this, you see that "a number can be a digit" OR, "a number can be a number followed by a digit". So, in code you would have to recursively call the number function again.

Full-on example of BNF - C syntax

The C language syntax in BNF can be found here: http://www.cs.man.ac.uk/~pjj/bnf/c_syntax.bnf
It is a bit intimidating at first.


translation_unit : external_decl
   | translation_unit external_decl
   ;
external_decl  : function_definition
   | decl
   ;
function_definition : decl_specs declarator decl_list compound_stat
   |  declarator decl_list compound_stat
   | decl_specs declarator  compound_stat
   |  declarator  compound_stat
   ;
decl   : decl_specs init_declarator_list ';'
   | decl_specs   ';'
   ;
decl_list  : decl
   | decl_list decl
   ;


Other posts:
Link to a post on using BNF to create a Recursive Decent Parser
Link to a post on EBNF

No comments:

Post a Comment