Context free grammars are a concept originating in linguistics but widely used in (theoretical) computer science.
For this section, I'm going to use a bunch of mathematical notation, but I'll try to explain everything as it comes along.
A set is collection of elements where each element appears only once. The set of the number from one to four can be written as .
An alphabet is a set with a finite number of elements (the natural numbers 1, 2, 3, …, would be an example for an infinite set).
A context free grammar has four components:
- An alphabet of terminals (read: “things where the expansion terminates”)
- An alphabet of non-terminals (read: “things that can expand into other things”)
- A set of production rules
- The start symbol , an element of the set of non-terminals
It's important that the terminal and non-terminal alphabets are disjoint , i.e. there is no element that is part of both sets.
Production rules have the form . can be read as “Combine the sets and ”, is called Kleene star and can be read as “repeated zero or more times”.
The part on the right of the is called right hand side , or RHS , the part on the left left hand side or LHS .
A grammar is expanded starting with the start symbol and then replacing each non-terminal with the RHS of one of its rules.
Time for a simple example:
Starting from , the grammar expands into , then each expands into , so the end result is .
There can be multiple rules with the same right hand side , so there are (possibly infinitely) many sequences a grammar can expand to.
is the notation for an empty sequence, so when expanding , the just disappears.
Expanding this grammar, we have two options, either expand to or to .
Expanding further, we have two options again, either expand it to or to .
This can be repeated infinitely many times, so the grammar produces sequences of of any length. (Remember the Kleene star? We can use it to write that as instead).