I wrote a few days ago about a recursive word-finding algorithm I used to solve Key Phrase ciphers. I’ve since learned that it works on simple substitution ciphers as long as word breaks are known and the words are in the reference list. Today I tried it on a Tridigital and it works there too once the blank digit is replaced with spaces.
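The post doesn't include the program itself, but the core idea of recursive word-finding on a substitution cipher with known word breaks can be sketched roughly like this. This is my own minimal reconstruction, not the author's code; the word list, cipher words, and function names are all placeholders:

```python
def pattern(word):
    """Canonical letter pattern, e.g. 'sees' -> (0, 1, 1, 0)."""
    seen = {}
    return tuple(seen.setdefault(c, len(seen)) for c in word)

def consistent(cword, pword, key):
    """Extend the cipher->plain key so cword maps to pword, or return None
    if that would contradict an existing assignment."""
    new = dict(key)
    used = set(key.values())
    for c, p in zip(cword, pword):
        if c in new:
            if new[c] != p:
                return None        # cipher letter already maps elsewhere
        elif p in used:
            return None            # plain letter already taken
        else:
            new[c] = p
            used.add(p)
    return new

def solve(cwords, key, wordlists):
    """Recursively assign a candidate plaintext word to each cipher word,
    backtracking when the substitution key becomes inconsistent."""
    if not cwords:
        return key
    cw, rest = cwords[0], cwords[1:]
    # Candidates share the cipher word's letter pattern; keeping each
    # candidate list sorted by frequency is what makes this fast.
    for pw in wordlists.get(pattern(cw), []):
        new = consistent(cw, pw, key)
        if new is not None:
            result = solve(rest, new, wordlists)
            if result is not None:
                return result
    return None
```

With a small word list grouped by letter pattern, `solve(cipher_words, {}, wordlists)` either returns a complete cipher-to-plain key or `None` after exhausting the candidates.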
In fact, the program is so fast that most of the time a sufficient solution appears on the screen in less than a second. I say sufficient because there is usually something incorrect in the solution, typically one of the smaller words, e.g. “up” for “of” or “that” for “what.” Despite two or three of those, the overall meaning is clear, and the wrong words can easily be replaced and the key determined.
Where it is slow is on texts whose longest words are not near the top of the frequency lists, especially if no word is longer than eight letters. The program is then likely to find a great many apparent solutions that are incorrect, or to pursue partially correct ones many levels deep before rejecting them.
I’ve found that tweaking the frequency lists to improve their accuracy makes a considerable difference. I’ve been using the Google Ngram Viewer website to check my list data and found that even though I used Google’s own data set, the two don’t agree. There are several reasons for this, such as Google counting capitalized words separately from lowercase ones, and counting words with apostrophes as two words. It also helps that I’ve been checking my list against a large text corpus I compiled from dozens of books downloaded from gutenberg.org.