Table of Contents
Counting Paths in Neo4j and Cypher explored how the Cypher graph database query language can be used to perform simple computations.
Here we'll first encode nondeterministic finite automata as graphs
and then use
MATCH
with path patterns and
reduce(...)
to find
words in the regular languages accepted by these automata.
Nondeterministic Finite Automata
Formally, a nondeterministic finite automaton (NFA)
is defined by
-
, a finite set of states
-
, the initial state
-
, the subset of accepting states
-
, a finite set of input symbols
-
, the transition function
To get an intuition for how an automaton like this can be used to
process a word
, we can keep track of the
states we can reach and apply the transition function for each symbol
we read.
At first we can only reach the initial state
.
is
the set of states we reach after reading
.
For each new symbol we read, we apply the transition function to each state that was reachable and collect the results.
A word is a member of the regular language defined by this automaton if and only if we can reach at least one accepting state after reading all symbols of the word.
Formally:
Representing Automata as Graphs
Finite automata can be expressed as directed graphs, with nodes representing the different states and relationships between nodes representing the transition function.
Nodes can have labels
:Initial
and
:Accept
while relationships have an
on
property storing the symbol
.
First Automaton, Language a*
This simple automaton has just one state that's both the initial and the accepting state.
For any symbol besides "a" the transition function will return the
empty set and the search algorithm outlined above ends with
.
1
CREATE (s1:Initial:Accept)
CREATE (s1)-[:TRANSITION {on: "a"}]->(s1)
Second Automaton, Language (ab)*
To accept the language {"", "ab", "abab", "ababab", …}, we need an extra state to indicate that we've seen an "a" and are now expecting a "b".
CREATE (sa:Initial:Accept)
CREATE (sa)-[:TRANSITION {on: "a"}]->(sb)
CREATE (sb)-[:TRANSITION {on: "b"}]->(sa)
Third Automaton, Language (a|b)*abba
So far each node had at most one relationship for each symbol. 2
For this language, we can use two "a" relationships from
(s1)
to guess
when to switch from matching (a|b)* to matching the suffix "abba".
As long as there is a sequence of guesses that leads to an accepting
state, the word will be accepted.
CREATE (s1:Initial)
CREATE (s1)-[:TRANSITION {on: "a"}]->(s1)
CREATE (s1)-[:TRANSITION {on: "a"}]->(sa)
CREATE (s1)-[:TRANSITION {on: "b"}]->(s1)
CREATE (sa)-[:TRANSITION {on: "b"}]->(sab)
CREATE (sab)-[:TRANSITION {on: "b"}]->(sabb)
CREATE (sabb)-[:TRANSITION {on: "a"}]->(s2:Accept)
Searching for Words
Once we have represented the finite automaton as a graph, we can
search for paths from the
:Initial
node to any of the
:Accept
nodes, collecting the
on
property of the relationships along the way.
MATCH path = ((:Initial)-[:TRANSITION*]->(:Accept)) WITH reduce(word = "", rel IN relationships(path) | word + rel.on) AS word RETURN word
For the language
(a|b)*abba
, this query returns the words 'abba',
'aabba', 'ababba', 'babba' and 'baabba'.
By default Neo4j does not allow duplicate edges in path patterns like this, and these five words are exactly the ones with a path from initial to accepting state that does not follow the same relationship more than once.
Cypher version 25 has
MATCH REPEATABLE ELEMENTS
that lifts this
restriction but requires a bounded quantifier.
3
MATCH REPEATABLE ELEMENTS path = ((:Initial)-[:TRANSITION]->{7}(:Accept))
WITH reduce(word = "", rel IN relationships(path) | word + rel.on) AS word
RETURN word
This query returns all eight words of length seven: 'bbbabba', 'bbaabba', 'bababba', 'baaabba', 'abbabba', 'abaabba', 'aababba' and 'aaaabba'.
Footnotes
An equivalent deterministic finite automaton would require a second "rejecting" node and additional relationships so that each node has one outgoing relationship per symbol in the alphabet
In these cases the underlying finite automata were deterministic
As long as the maximal length of the path is limited, there are finitely many results and the query will terminate.