Context free grammars are a concept originating in linguistics but widely used in (theoretical) computer science.
For this section, I'm going to use a bunch of mathematical notation, but I'll try to explain everything as it comes along.
A
set
is collection of elements
where each element appears only once.
The set of the number from one to four can be written as
.
An alphabet is a set with a finite number of elements (the natural numbers 1, 2, 3, …, would be an example for an infinite set).
A context free grammar has four components:
-
An alphabet
of terminals (read: “things where the expansion terminates”)
-
An alphabet
of non-terminals (read: “things that can expand into other things”)
-
A set
of production rules
-
The
start symbol
, an element of the set of non-terminals
It's important that the terminal and non-terminal alphabets are disjoint , i.e. there is no element that is part of both sets.
Production rules have the form
.
can be read as “Combine the sets
and
”,
is called
Kleene star
and can be read as “repeated zero or more times”.
The part on the right of the
is called
right hand side
,
or
RHS
, the part on the left
left hand side
or
LHS
.
A grammar is expanded starting with the start symbol and then replacing each non-terminal with the RHS of one of its rules.
Time for a simple example:
Starting from
, the grammar expands into
, then each
expands into
,
so the end result is
.
There can be multiple rules with the same right hand side , so there are (possibly infinitely) many sequences a grammar can expand to.
is the notation for an empty sequence,
so when expanding
, the
just disappears.
Expanding this grammar, we have two options,
either expand
to
or to
.
Expanding
further,
we have two options again,
either expand it to
or to
.
This can be repeated infinitely many times,
so the grammar produces sequences of
of any length.
(Remember the Kleene star? We can use it to write that as
instead).