Computing with Pattern Substitution

Table of Contents

In "The Art of Computer Programming", Volume 1, Section 1.1, Knuth introduces formalism a for describing algorithms based on pattern-matching and string substitutions.

There each algorithm is made up of computing_with_pattern_substitution_886ff4ecd47691acfd37e0032592d001672992c8.svg rules and works on a state of the form . ¹

computing_with_pattern_substitution_b489a49c1352fe29cb986bcb682710d519a74c44.svg and are strings over an alphabet (elements of ),
and are numbers in .

At each step of the computation, we check if computing_with_pattern_substitution_c8856838e47381e8e130ab756b4cc7e48dfbbc39.svg contains the pattern . If so, its first occurrence is replaced with and of the state is set to . Otherwise remains unchanged and is set to .

When computing_with_pattern_substitution_adf844519d2265d68fad90dc05f395655a8815cb.svg the computation terminates, is the result.

First Example

Input: computing_with_pattern_substitution_62ecb72870e99d5b11e6702d4063fb55ba97a690.svg ( times the character )
Task: Determine if is even.
Output: if is even, otherwise.

This can be accomplished using the following three rules with computing_with_pattern_substitution_978865a0ce28b9b2e452982935c95a689fc211e7.svg ( is the empty string):

Start by removing two computing_with_pattern_substitution_e657a090c3791a49addca3ff4eeda806ad85ddf0.svg s at a time until it is not possible anymore.
If a single remains output , otherwise output .

Interpreter

Writing algorithms in this style is tedious because of the need to keep track of the rule indices, using labels instead would make it much easier to write and refactor programs.

I've come up with a simple format

label1
  pattern1 substitution1 label_else1 label_then1
label2
  pattern2 substitution2 label_else2 label_then2
...

pattern and substitution can be strings or _ (the empty string computing_with_pattern_substitution_1e8133c28e06fb6d8c11097c2b2526df9a7e5ed1.svg ) and label_else/then can be one of the other labels or end , a placeholder for .

The rules from before could be rewritten as:

remove_aa
  aa _ check_remaining remove_aa
check_remaining
  a odd output_even end
output_even
  _ even end end

Below is a simple parser that converts this format to rules with indices as computing_with_pattern_substitution_e917047c993c0e5534ba1e53c8f36054165853ea.svg and (plus some bonus features like comments). ²

Rule = Struct.new(:label, :pattern, :substitution, :j_else, :j_then)

def parse(program)
  # Remove comments and empty lines
  pairs = program.lines
    .reject { |l| l.match(/^\s*#.*/) }
    .reject { |l| l.match(/^\s*$/) }
    .map(&:rstrip)
    .each_slice(2)

  # Create a mapping from label to rule index
  labels = Hash.new { |_hash, key| raise "Unknown label #{key}" }
  labels['end'] = -1
  pairs.each_with_index { |(label, _body), i| labels[label] = i }

  # Convert the (label, body) pairs to rules
  pairs.map do |label, body|
    pattern, substitution, l_else, l_then = body.lstrip.split(' ')
    substitution = '' if substitution == '_'
    pattern = '' if pattern == '_'

    Rule.new(label, pattern, substitution, labels[l_else], labels[l_then])
  end
end

The code below is a direct translation of the formal description from earlier with optional logging of each step of the computation.

def run(state, rules, logging = false)
  j, steps = 0, 0

  # Get the length of the longest rule for nicer formatting
  max_length = rules.map{ |r| r.label.length }.max
  until j == -1 do
    rule = rules[j]

    puts "#{rule.label.ljust(max_length, ' ')} | #{state}" if logging

    # Check if `state` contains the `pattern`
    if (index = state.index(rule.pattern))
      state.sub!(rule.pattern, rule.substitution)
      j = rule.j_then
    else
      j = rule.j_else
    end
    steps += 1
  end

  puts "#{'end'.ljust(max_length, ' ')} | #{state}" if logging
  puts "Steps: #{steps}" if logging
  state
end

Let's make sure it works:

def encode_input(n)
  'a' * n
end

run(encode_input(5), parse(program), true)

remove_aa       | aaaaa
remove_aa       | aaa
remove_aa       | a
check_remaining | a
end             | odd
Steps: 4

Greatest Common Divisor

One of the exercises in the book is to think of a set of rules
that calculates computing_with_pattern_substitution_eeeac07bfb6415bca26973fe19083bf455404a6f.svg given the input using a modified version of Euclid's algorithm .

Set ³
If , is the result
Set and go to step 1

After a few hours I've come up with the following solution:

Duplicate the input

copy_as
  a ce copy_bs copy_as
copy_bs
  b df move_cs_left copy_bs
move_cs_left
  ec ce move_ds_left move_cs_left
move_ds_left
  fd df move_ds_left2 move_ds_left
move_ds_left2
  ed de calc_abs_diff move_ds_left2

This converts the state to computing_with_pattern_substitution_78ba8b34785f7b94d6355e73a4d2236b774ef3a8.svg and then moves the $c$s and $d$s around to yield .

Calculate

calc_abs_diff
  cd _ test_r_zero_c calc_abs_diff
test_r_zero_c
  c c test_r_zero_d calc_min
test_r_zero_d
  d d return_r_remove_es calc_min
return_r_remove_es
  e _ return_r_convert_fa return_r_remove_es
return_r_convert_fa
  f a end return_r_convert_fa

Delete $cd$s until either computing_with_pattern_substitution_f79acf9f50443e6603ce3c33203d23ab04994563.svg or with remains.

If both test_r_zero_c and test_r_zero_d fail, computing_with_pattern_substitution_b38095cac8fadd05037a9d3640b51f29930b1d40.svg is zero and we need to return as the result by removing all $e$s and changing all $f$s to $a$s.

After this step, the state is computing_with_pattern_substitution_e453dc5acf9231f7c9c5389a20c5c9e2f4fa4120.svg or .

Calculate

If the result of the previous calculation has the form computing_with_pattern_substitution_7aafb1f9048934ac64de9d0a463a69475a312f83.svg must be greater than $m$
(there were more $d$s than $c$s), remove the $f$s and convert the $e$s to $a$s. Otherwise, remove the $e$s and convert the $f$s to $a$s.

calc_min
  d d remove_es remove_fs
remove_es
  e _ convert_fa remove_es
convert_fa
  f a convert_cb convert_fa
remove_fs
  f _ convert_ea remove_fs
convert_ea
  e a convert_cb convert_ea

After this step, the state is computing_with_pattern_substitution_e73045b27383f391a7ba3c05f73ccc400ba6a842.svg or .

Next iteration

First change the computing_with_pattern_substitution_f79acf9f50443e6603ce3c33203d23ab04994563.svg or to , then move the $a$s to the front.

convert_cb
  c b convert_db convert_cb
convert_db
  d b move_as_left convert_db
move_as_left
  ba ab copy_as move_as_left

After this step, the state is computing_with_pattern_substitution_3db57f44f8034a300ff4fb32b5c4b1c13ed5a61c.svg ( with and ) and we can jump back to the start ( copy_as ).

Output

A log of the 75 steps needed to compute computing_with_pattern_substitution_7133981acf5d4d1df8c0a19d9ac3e9929d291832.svg :

copy_as             | aabbbb
copy_as             | ceabbbb
copy_as             | cecebbbb
copy_bs             | cecebbbb
copy_bs             | cecedfdfdfdf
move_cs_left        | cecedfdfdfdf
move_cs_left        | cceedfdfdfdf
move_ds_left        | cceedfdfdfdf
move_ds_left        | cceeddffdfdf
move_ds_left        | cceeddfdffdf
move_ds_left        | cceedddfffdf
move_ds_left        | cceedddffdff
move_ds_left        | cceedddfdfff
move_ds_left        | cceeddddffff
move_ds_left2       | cceeddddffff
move_ds_left2       | ccededddffff
move_ds_left2       | ccdeedddffff
move_ds_left2       | ccdededdffff
move_ds_left2       | ccddeeddffff
move_ds_left2       | ccddededffff
move_ds_left2       | ccdddeedffff
move_ds_left2       | ccdddedeffff
move_ds_left2       | ccddddeeffff
calc_abs_diff       | ccddddeeffff
calc_abs_diff       | cdddeeffff
calc_abs_diff       | ddeeffff
test_r_zero_c       | ddeeffff
test_r_zero_d       | ddeeffff
calc_min            | ddeeffff
remove_fs           | ddeeffff
remove_fs           | ddeefff
remove_fs           | ddeeff
remove_fs           | ddeef
remove_fs           | ddee
convert_ea          | ddee
convert_ea          | ddae
convert_ea          | ddaa
convert_cb          | ddaa
convert_db          | ddaa
convert_db          | bdaa
convert_db          | bbaa
move_as_left        | bbaa
move_as_left        | baba
move_as_left        | abba
move_as_left        | abab
move_as_left        | aabb
copy_as             | aabb
copy_as             | ceabb
copy_as             | cecebb
copy_bs             | cecebb
copy_bs             | cecedfb
copy_bs             | cecedfdf
move_cs_left        | cecedfdf
move_cs_left        | cceedfdf
move_ds_left        | cceedfdf
move_ds_left        | cceeddff
move_ds_left2       | cceeddff
move_ds_left2       | ccededff
move_ds_left2       | ccdeedff
move_ds_left2       | ccdedeff
move_ds_left2       | ccddeeff
calc_abs_diff       | ccddeeff
calc_abs_diff       | cdeeff
calc_abs_diff       | eeff
test_r_zero_c       | eeff
test_r_zero_d       | eeff
return_r_remove_es  | eeff
return_r_remove_es  | eff
return_r_remove_es  | ff
return_r_convert_fa | ff
return_r_convert_fa | af
return_r_convert_fa | aa
end                 | aa

As expected the result is computing_with_pattern_substitution_7b161bafe1745efbb0e56e7d84130a1a0aa06bac.svg (or rather ).

A Better Solution

To my surprise, Knuth's solution to this problem has only five rules. What is he doing differently?

One key observation is that the step calc_abs_diff to calculate computing_with_pattern_substitution_2121c8a1c1cff3b8c77a8319a6deb5676f2f0096.svg is executed exactly times, with some careful coding both values can be calculated simultaneously.

Furthermore, computing_with_pattern_substitution_b70dc50ebf0779e60a801bef9eac6867c8d5bb52.svg , so there is no need to keep a copy of the original values around.

Implementation

Start by replacing $ab$s with computing_with_pattern_substitution_0662ab8e14cdb87d72122d7c7646d5a748035ed6.svg and moving the to the left.

start
  ab c convert_ab move_c_left
move_c_left
  ac ca start move_c_left

After this, the state is either computing_with_pattern_substitution_c03353910d440c4ca6cf5e1c572f819b94ff6a8c.svg or .

convert_ab
  a b convert_ca convert_ab
convert_ca
  c a test_r_zero convert_ca
test_r_zero
  b b end start

Then rename the $a$s to $b$s and the $c$s to $a$s. ⁴

Now the state is computing_with_pattern_substitution_fd227d8b9c9e30cb28b601006af44c191ab0aea9.svg .
If there are no $b$s in the state, must be zero, and we can jump to end because is the correct result .

In this improved version computing_with_pattern_substitution_7133981acf5d4d1df8c0a19d9ac3e9929d291832.svg only needs 22 steps:

start       | aabbbb
move_c_left | acbbb
move_c_left | cabbb
start       | cabbb
move_c_left | ccbb
start       | ccbb
convert_ab  | ccbb
convert_ca  | ccbb
convert_ca  | acbb
convert_ca  | aabb
test_r_zero | aabb
start       | aabb
move_c_left | acb
move_c_left | cab
start       | cab
move_c_left | cc
start       | cc
convert_ab  | cc
convert_ca  | cc
convert_ca  | ac
convert_ca  | aa
test_r_zero | aa
end         | aa

One Step Further: Primality Testing

Let's try to solve one more problem: given an input computing_with_pattern_substitution_62ecb72870e99d5b11e6702d4063fb55ba97a690.svg , determine whether is a prime number or not.

There are many (and easier) ways of doing this, but to show how algorithms in this formalism can be combined to solve more complicated problems, I'll use the computing_with_pattern_substitution_2e8455213b936062be01dd1060b6070166e1f432.svg procedure from the previous section.

A number computing_with_pattern_substitution_5fde1ea98712ce221077b775bb8b84a1986c8218.svg is prime if each number from to is coprime to it ( ). In our formalism the check is hard to do, instead we can count down from .

Count down a value from to
If return
At each step, if return , otherwise continue with step 1

Implementation

First, we need to handle the special case computing_with_pattern_substitution_74a186cc1bfb49e12368e4467bb127352b0cf94f.svg and copy the $a$s to create the counter .

check1
  aa aa np_clear_c copy_input
copy_input
  a cd move_d_right copy_input
move_d_right
  dc cd subtract1 move_d_right

Then we decrement computing_with_pattern_substitution_852ef67c54a3b23d9a8a6d5577477bbda3e3dcb4.svg (it starts at ) and output if the result is one.

subtract1
  dd d check_done check_done
check_done
  dd dd p_clear_c copy_c

p_clear_c and its counterpart np_clear_c clear the state and output either computing_with_pattern_substitution_4088681e528b30a08ff53cd41c349d5bba181785.svg or .

p_clear_c
  c _ p_clear_d p_clear_c
p_clear_d
  d _ p_clear_a p_clear_d
p_clear_a
  a _ p_output p_clear_a
p_output
  _ prime end end

np_clear_c
  c _ np_clear_d np_clear_c
np_clear_d
  d _ np_clear_a np_clear_d
np_clear_a
  a _ np_output np_clear_a
np_output
  _ notprime end end

Because the computing_with_pattern_substitution_2e8455213b936062be01dd1060b6070166e1f432.svg procedure destroys the state, we need to make a copy of and .

This is done by converting the state computing_with_pattern_substitution_e903f0e88e287de97be1cb7b687836815d5bf265.svg to , moving the $a$s and $b$s to the center and then renaming back to . ⁵

copy_c
  c Ca copy_d copy_c
copy_d
  d bD move_a_right copy_d
move_a_right
  aC Ca move_b_left move_a_right
move_b_left
  Db bD convert_Cc move_b_left
convert_Cc
  C c convert_Dd convert_Cc
convert_Dd
  D d start_gcd convert_Dd

Now the state has the form computing_with_pattern_substitution_5504f8d3bcf2b24034b63cc09ef1859181f4ac63.svg and we can call the procedure from the previous section. ⁶

start_gcd
  ab e convert_ab move_e_left
move_e_left
  ae ea start_gcd move_e_left
convert_ab
  a b convert_ea convert_ab
convert_ea
  e a test_r_zero convert_ea
test_r_zero
  b b test_coprime start_gcd

If we can't find the pattern computing_with_pattern_substitution_de99d11dc22c9f48a0915e627cfa8a3d0dced95b.svg afterwards, the numbers are coprime, and we can continue with , otherwise is not a prime.

test_coprime
  aa aa next np_clear_c
next
  a _ subtract1 subtract1

Even for small numbers this needs a lot of steps and I've removed some boring sections from the output:

check1       | aaaa
copy_input   | aaaa
copy_input   | cdaaa
copy_input   | cdcdaa
copy_input   | cdcdcda
copy_input   | cdcdcdcd
move_d_right | cdcdcdcd
...
move_d_right | ccccdddd
subtract1    | ccccdddd
check_done   | ccccddd
copy_c       | ccccddd
...
convert_Dd   | ccccaaaabbbddd
start_gcd    | ccccaaaabbbddd
move_e_left  | ccccaaaebbddd
move_e_left  | ccccaaeabbddd
move_e_left  | ccccaeaabbddd
move_e_left  | cccceaaabbddd
start_gcd    | cccceaaabbddd
move_e_left  | cccceaaebddd
move_e_left  | cccceaeabddd
move_e_left  | cccceeaabddd
start_gcd    | cccceeaabddd
move_e_left  | cccceeaeddd
move_e_left  | cccceeeaddd
start_gcd    | cccceeeaddd
convert_ab   | cccceeeaddd
convert_ab   | cccceeebddd
convert_ea   | cccceeebddd
convert_ea   | ccccaeebddd
convert_ea   | ccccaaebddd
convert_ea   | ccccaaabddd
test_r_zero  | ccccaaabddd
start_gcd    | ccccaaabddd
move_e_left  | ccccaaeddd
move_e_left  | ccccaeaddd
move_e_left  | cccceaaddd
start_gcd    | cccceaaddd
convert_ab   | cccceaaddd
convert_ab   | ccccebaddd
convert_ab   | ccccebbddd
convert_ea   | ccccebbddd
convert_ea   | ccccabbddd
test_r_zero  | ccccabbddd
start_gcd    | ccccabbddd
move_e_left  | ccccebddd
start_gcd    | ccccebddd
convert_ab   | ccccebddd
convert_ea   | ccccebddd
convert_ea   | ccccabddd
test_r_zero  | ccccabddd
start_gcd    | ccccabddd
move_e_left  | cccceddd
start_gcd    | cccceddd
convert_ab   | cccceddd
convert_ea   | cccceddd
convert_ea   | ccccaddd
test_r_zero  | ccccaddd
test_coprime | ccccaddd
next         | ccccaddd
subtract1    | ccccddd
check_done   | ccccdd
copy_c       | ccccdd
...
convert_Dd   | ccccaaaabbdd
start_gcd    | ccccaaaabbdd
move_e_left  | ccccaaaebdd
move_e_left  | ccccaaeabdd
move_e_left  | ccccaeaabdd
move_e_left  | cccceaaabdd
start_gcd    | cccceaaabdd
move_e_left  | cccceaaedd
move_e_left  | cccceaeadd
move_e_left  | cccceeaadd
start_gcd    | cccceeaadd
convert_ab   | cccceeaadd
convert_ab   | cccceebadd
convert_ab   | cccceebbdd
convert_ea   | cccceebbdd
convert_ea   | ccccaebbdd
convert_ea   | ccccaabbdd
test_r_zero  | ccccaabbdd
start_gcd    | ccccaabbdd
move_e_left  | ccccaebdd
move_e_left  | cccceabdd
start_gcd    | cccceabdd
move_e_left  | cccceedd
start_gcd    | cccceedd
convert_ab   | cccceedd
convert_ea   | cccceedd
convert_ea   | ccccaedd
convert_ea   | ccccaadd
test_r_zero  | ccccaadd
test_coprime | ccccaadd
np_clear_c   | ccccaadd
...
np_output    |
end          | notprime

For larger numbers and primes the results are correct, too, but I won't include the logs here.

Footnotes

In the book the parts are named differently

computing_with_pattern_substitution_baa18e4d3d3ee567444e318724948e1eff64f217.svg is used instead of to signify the end of the computation, this way we don't need to know how many rules there are

computing_with_pattern_substitution_9b0dc582ca3d58dd0be3ee48f361173ba9b0d3bd.svg is the absolute value, e.g. and

⁴

convert_ab is only needed in the case computing_with_pattern_substitution_5d0927be496261f2473f0b1c3595ba1f41963589.svg

⁵

Renaming them in the first place is necessary to avoid an endless loop in the rule

⁶

Altered slightly to use computing_with_pattern_substitution_9fac8519d59803305143aa242d933f0dd53ecd8d.svg instead of

If you have an idea how this page could be improved or a comment send me a mail.