I just made a breakthrough in solving Key Phrase ciphers. I’ve written a recursive program that finds solutions by trying words of the right length against the pattern of each ciphertext word. The program orders the ciphertext words by length and starts with the longest, working down to the shortest and eliminating combinations that create conflicts.
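For readers who want to see the shape of such a search, here is a minimal sketch in Python. It is not my actual program. It assumes the usual Key Phrase rule that each plaintext letter always enciphers to the same ciphertext letter, while one ciphertext letter may stand for several plaintext letters. The names `fits`, `solve`, and `words_by_length` are made up for this illustration.

```python
def fits(ct_word, pt_word, mapping):
    """Return the mapping extended with this word pair, or None on conflict.

    mapping goes plaintext letter -> ciphertext letter. A conflict means a
    plaintext letter would need two different ciphertext letters.
    """
    if len(ct_word) != len(pt_word):
        return None
    extended = dict(mapping)
    for p, c in zip(pt_word, ct_word):
        if extended.setdefault(p, c) != c:
            return None
    return extended


def solve(ct_words, words_by_length):
    """Yield every conflict-free list of plaintext words, in ciphertext order.

    words_by_length maps a word length to a list of candidate words; the
    candidates are tried in the order they appear in that list.
    """
    # Longest ciphertext words first: they constrain the mapping the most.
    order = sorted(range(len(ct_words)), key=lambda i: -len(ct_words[i]))
    chosen = [None] * len(ct_words)

    def recurse(k, mapping):
        if k == len(order):
            yield list(chosen)  # every word placed without conflict
            return
        i = order[k]
        for cand in words_by_length.get(len(ct_words[i]), []):
            extended = fits(ct_words[i], cand, mapping)
            if extended is not None:
                chosen[i] = cand
                yield from recurse(k + 1, extended)

    yield from recurse(0, {})
```

Because `solve` is a generator, the caller can stop after the first few solutions instead of waiting for the whole search to finish.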
I got the program working a few days ago, but when I tested it on E-6 in the ND19 issue it ran painfully slowly, too slowly for any practical use. Today I realized that the trick was to have the program try common words first rather than go through the word lists in alphabetical order. I first had to write a program that takes each separate length list, e.g. A3.txt, measures the frequency of each word numerically, sorts the list, and writes the words out in frequency order, e.g. Fq3.txt. This way the solver gets to the most likely combinations much more quickly. Once it gets a conflict-free solution, it displays the solution in the correct word order. I’ve found that it can find hundreds or thousands of “valid” solutions of this type, but the correct one is near the top.
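The reordering step can be sketched roughly as follows. This is an illustration rather than my actual program: it assumes the frequencies come from counting words in some sample text, and the function names `word_counts`, `order_by_frequency`, and `rewrite_list` are made up for the example. Only the file names A3.txt and Fq3.txt come from my setup.

```python
import re
from collections import Counter


def word_counts(corpus_text):
    """Count how often each alphabetic word occurs in a sample text."""
    return Counter(re.findall(r"[a-z]+", corpus_text.lower()))


def order_by_frequency(words, counts):
    """Most frequent words first; ties (and unseen words) fall back to A-Z."""
    return sorted(words, key=lambda w: (-counts.get(w, 0), w))


def rewrite_list(src_path, dst_path, counts):
    """Read one word per line from src_path, write them by frequency to dst_path."""
    with open(src_path) as src:
        words = [line.strip().lower() for line in src if line.strip()]
    with open(dst_path, "w") as dst:
        dst.write("\n".join(order_by_frequency(words, counts)) + "\n")


# Hypothetical usage, assuming a sample text has been loaded into `text`:
# rewrite_list("A3.txt", "Fq3.txt", word_counts(text))
```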
I tested it on E-6 and it solved it perfectly in less than 30 seconds. I’ve since tested it on two more (published below for you to solve) and in each case it solved the cipher in less than ten seconds. The biggest problem is that it doesn’t judge whether the key is a valid sentence, so it may first find several valid solutions in which one or a few words are more common than the correct plaintext words. For example, in E-6 it found solutions placing the word BY where the correct word should be UP. I may fine-tune the program to score the solutions in some fashion to address this. In the second example below, the 5th word proved to be the weak point: the program found 129 valid solutions before finding the correct one, all in a matter of seconds. As a practical matter this weakness is probably unimportant most of the time, because once the program has correctly identified the longest words, it is obvious to the user which words are valid, and it is easy to reconstruct the key by hand and finish the dubious words.
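The hand reconstruction of the key can be sketched in a few lines of Python. This is only an illustration, assuming the standard Key Phrase layout in which the 26-letter key is written under the alphabet a to z. The function name `partial_key` and the use of a dot for unknown positions are my own choices for the example.

```python
import string


def partial_key(ct_words, pt_words):
    """Rebuild as much of the 26-letter key as the word pairs reveal.

    Position 0 holds the ciphertext letter for plaintext 'a', position 1
    for 'b', and so on. Unknown positions are shown as '.'.
    """
    key = {}
    for ct, pt in zip(ct_words, pt_words):
        for c, p in zip(ct, pt):
            if key.setdefault(p, c) != c:
                raise ValueError(f"plaintext {p!r} maps to two ciphertext letters")
    return "".join(key.get(ch, ".") for ch in string.ascii_lowercase)
```

Filling in only the long words the solver got right and reading the partial key across often makes the remaining key letters, and so the dubious short words, easy to guess.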
Another weakness is its dependence on all the words being in my word lists. Proper nouns, new slang terms, hyphenated words, etc. may foul it up. I haven’t tested that type of plaintext, but I’ll be experimenting over the next few days. Okay, here are the two test encryptions I used. All the words are in normal word lists. No cribs are provided.
Test Keyphrase 1: FNENEA HNNEAEFE HGGRSHITAE EEEIHIT ENE TE IEF ENSSRIHETAHNI ETAEGGHAE TETBE ANFTFB HAE THTT NFHHA HI ELTEE
Test Keyphrase 2: HNYDIR ONIR H HWDDNDR RHND NWRWRAWN OKHO OKW YWRNWO IO H KHDDR RHNNNHRW NY OKNWW ENOOEW HINDY RINNW NNRKO DWHN