Regular Languages in Neo4j and Cypher

cs neo4j

Table of Contents

Counting Paths in Neo4j and Cypher explored how the Cypher graph database query language can be used to perform simple computations.

Here we'll first encode nondeterministic finite automata as graphs and then use MATCH with path patterns and reduce(...) to find words in the regular languages accepted by these automata.

Nondeterministic Finite Automata

Formally, a nondeterministic finite automaton (NFA) regular_languages_in_neo4j_and_cypher_d8711c90544b4a78642b114455c7778a0105e72c.svg is defined by

  • regular_languages_in_neo4j_and_cypher_b14568830498e5f9e22e45f4299ec475e81a6d74.svg , a finite set of states
  • regular_languages_in_neo4j_and_cypher_dc97c11e9eb9e5f7027b35a7ad9b1b7dc9b17a6e.svg , the initial state
  • regular_languages_in_neo4j_and_cypher_77636213e8f4ee2c857644901c8f3e05230493b9.svg , the subset of accepting states
  • regular_languages_in_neo4j_and_cypher_9b773f651fb570e0cdcdb8177a00b5a519c9cc2c.svg , a finite set of input symbols
  • regular_languages_in_neo4j_and_cypher_e0999d9c9a66aebcb18cee71c847581929369702.svg , the transition function

To get an intuition for how an automaton like this can be used to process a word regular_languages_in_neo4j_and_cypher_518775a092ed418f62890b5fc3ba0a240af10ebd.svg , we can keep track of the states we can reach and apply the transition function for each symbol regular_languages_in_neo4j_and_cypher_45908efff8d2a7736ec0db2055c56d5d4a2b897c.svg we read.

At first we can only reach the initial state regular_languages_in_neo4j_and_cypher_2bcab63045bc8336430e70c818af49a49f63d878.svg . regular_languages_in_neo4j_and_cypher_de360fff8db0a15e46b13a1184ae194db2be0d0b.svg is the set of states we reach after reading regular_languages_in_neo4j_and_cypher_b2dccfcc35b3f5d2c0cdb9ccd6bb6a56a22d9f79.svg .

For each new symbol we read, we apply the transition function to each state that was reachable and collect the results.

A word is a member of the regular language defined by this automaton if and only if we can reach at least one accepting state after reading all symbols of the word.

Formally:

regular_languages_in_neo4j_and_cypher_4dd2362ca4613aaf86b7b311b14c7c70cccd4eae.svg

regular_languages_in_neo4j_and_cypher_8238162e4684d87895cba1e46fca2e7d292312d3.svg

Representing Automata as Graphs

Finite automata can be expressed as directed graphs, with nodes representing the different states and relationships between nodes representing the transition function.

Nodes can have labels :Initial and :Accept while relationships have an on property storing the symbol regular_languages_in_neo4j_and_cypher_eb4eb2cb5d576a4c8b2cbbe66a6ff6cfb35e85ef.svg .

First Automaton, Language a*

This simple automaton has just one state that's both the initial and the accepting state.

For any symbol besides "a" the transition function will return the empty set and the search algorithm outlined above ends with regular_languages_in_neo4j_and_cypher_8f3e725b3627adab57afdd0746e484c85855657d.svg . 1

CREATE (s1:Initial:Accept)

CREATE (s1)-[:TRANSITION {on: "a"}]->(s1)
fa_0.svg

Second Automaton, Language (ab)*

To accept the language {"", "ab", "abab", "ababab", …}, we need an extra state to indicate that we've seen an "a" and are now expecting a "b".

CREATE (sa:Initial:Accept)

CREATE (sa)-[:TRANSITION {on: "a"}]->(sb)
CREATE (sb)-[:TRANSITION {on: "b"}]->(sa)
fa_1.svg

Third Automaton, Language (a|b)*abba

So far each node had at most one relationship for each symbol. 2

For this language, we can use two "a" relationships from (s1) to guess when to switch from matching (a|b)* to matching the suffix "abba". As long as there is a sequence of guesses that leads to an accepting state, the word will be accepted.

CREATE (s1:Initial)

CREATE (s1)-[:TRANSITION {on: "a"}]->(s1)
CREATE (s1)-[:TRANSITION {on: "a"}]->(sa)
CREATE (s1)-[:TRANSITION {on: "b"}]->(s1)
CREATE (sa)-[:TRANSITION {on: "b"}]->(sab)
CREATE (sab)-[:TRANSITION {on: "b"}]->(sabb)
CREATE (sabb)-[:TRANSITION {on: "a"}]->(s2:Accept)
fa_2.svg

Searching for Words

Once we have represented the finite automaton as a graph, we can search for paths from the :Initial node to any of the :Accept nodes, collecting the on property of the relationships along the way.

MATCH path = ((:Initial)-[:TRANSITION*]->(:Accept))
WITH reduce(word = "", rel IN relationships(path) | word + rel.on) AS word
RETURN word

For the language (a|b)*abba , this query returns the words 'abba', 'aabba', 'ababba', 'babba' and 'baabba'.

By default Neo4j does not allow duplicate edges in path patterns like this, and these five words are exactly the ones with a path from initial to accepting state that does not follow the same relationship more than once.

Cypher version 25 has MATCH REPEATABLE ELEMENTS that lifts this restriction but requires a bounded quantifier. 3

MATCH REPEATABLE ELEMENTS path = ((:Initial)-[:TRANSITION]->{7}(:Accept))
WITH reduce(word = "", rel IN relationships(path) | word + rel.on) AS word
RETURN word

This query returns all eight words of length seven: 'bbbabba', 'bbaabba', 'bababba', 'baaabba', 'abbabba', 'abaabba', 'aababba' and 'aaaabba'.

Footnotes

1

An equivalent deterministic finite automaton would require a second "rejecting" node and additional relationships so that each node has one outgoing relationship per symbol in the alphabet

2

In these cases the underlying finite automata were deterministic

3

As long as the maximal length of the path is limited, there are finitely many results and the query will terminate.


If you have an idea how this page could be improved or a comment send me a mail.