The Power of Recursion

I wrote a few days ago about a recursive word-finding algorithm I used to solve Key Phrase ciphers. I’ve since learned that it works on simple substitution ciphers as long as word breaks are known and the words are in the reference list. Today I tried it on a Tridigital and it works there too once the blank digit is replaced with spaces.

In fact, the program is so fast that most times a sufficient solution appears on the screen in less than a second. I say sufficient, because there is usually something incorrect in the solution, typically one of the smaller words, e.g. “up” for “of” or “that” for “what.” Despite two or three of those, the overall meaning is clear and the wrong words can be easily replaced and the key determined.

Where it is slow is on those texts where the longest words are not near the top in the frequency lists, especially if they are not longer than eight letters. The program is likely to find a great many apparent solutions that are incorrect or simply finding many partially correct ones many levels deep before rejecting them.

I’ve found that tweaking the frequency lists to improve their accuracy makes a considerable difference. I’ve been using Google N-grams website to check my list data and found that even though I used Google’s data set, they don’t agree. There are several reasons for this such as Google measuring capitalized words differently from lower case and words with apostrophes as two words. I’ve been checking my list against a large text source I compiled made of dozens of books downloaded from gutenberg.org which also helps.

One thought on “The Power of Recursion”

  1. I’ve found an online source that has done the same thing I have. See http://www.norvig.com/mayzner.html. My frequencies don’t agree 100% with his or with Google N-grams for several reasons. I’ve already discussed why for N-grams. With Norvig, I believe the difference is mostly that I’ve restricted the frequencies in my data set to post 1920 whereas I don’t think Norvig did, and he doesn’t include the contractions at all. Other reasons might include whatever sorting and counting algorithms were used.

Leave a Reply

Your email address will not be published. Required fields are marked *