The pt is from two unrelated Gutenberg texts. My hillclimber solved this without the crib, but it took over 500 million trials. I had to let it run overnight. My analyzer identified the type correctly by a good margin.
Above are the results of running my new test on a series of plaintexts (pt) from Gutenberg.org enciphered with the cipher types shown. The number of pts was over 3000 in every case. The test clearly has some value in distinguishing cipher types based on the ct. However, as I analyzed the results, I saw that they are very similar to those of the Normor test, and for a very simple reason. Both tests tend to measure the extent to which a cipher type enciphers pt with letters that have the same or similar frequency as the pt.
The Two-Square is a good example for understanding this. Whenever both pt letters are on the same row, the resulting ct digraph is a reversal of those same two letters. Similarly, if one or both of the left and right squares use vertical keywords, the resulting digraph is likely to contain a letter from the same keyword used for that vertical (column) key. For this reason, Two-Square ct tends to be closer to normal pt frequencies. When you consider how the other ciphers are constructed, it is not difficult to see that the numbers reflect this same phenomenon.
I believe this new test is simply another way of measuring the same thing the Normor test does. The real question is whether it does so more accurately and reliably, i.e. with finer, sharper separation between types. The cts used here were enciphered with a random choice of keyword and route, but when I first invented the Normor test, I checked it on different Polybius square routes. The results were strikingly different depending on the route chosen. Again, if you consider how some routes tend to cause the ct letter to be the same letter as the pt, or from the same keyword, and others don’t, especially where two different keysquares are used, this becomes understandable.
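The same-row reversal can be seen in a small Python sketch. It assumes the horizontal Two-Square with the usual rectangle rule (first pt letter located in the left square, second in the right, ct taken from the opposite corners), squares filled by rows, and J merged into I. The keywords "example" and "keyword" and the function names are made up for illustration.

```python
# 25-letter alphabet with J merged into I (an assumed convention).
ALPHABET = "abcdefghiklmnopqrstuvwxyz"

def keysquare(keyword):
    """Build a 5x5 square by rows: keyword letters first, then the unused letters."""
    seen = []
    for ch in keyword.lower().replace("j", "i") + ALPHABET:
        if ch in ALPHABET and ch not in seen:
            seen.append(ch)
    return [seen[r * 5:(r + 1) * 5] for r in range(5)]

def locate(square, ch):
    """Return (row, column) of a letter in a square."""
    for r, row in enumerate(square):
        if ch in row:
            return r, row.index(ch)
    raise ValueError("letter not in square: " + ch)

def two_square_digraph(left, right, pair):
    """Encipher one digraph: first letter in the left square, second in the right."""
    r1, c1 = locate(left, pair[0])
    r2, c2 = locate(right, pair[1])
    # Opposite corners of the rectangle. When r1 == r2 the rectangle
    # collapses and the result is simply the pt digraph reversed.
    return right[r1][c2] + left[r2][c1]

left = keysquare("example")
right = keysquare("keyword")
print(two_square_digraph(left, right, "ad"))  # letters on different rows
print(two_square_digraph(left, right, "xw"))  # both letters on the top row
```

With these two squares, x and w both sit on the top row, so the second call returns the pt digraph reversed. That is the kind of leakage of pt letters into the ct that pulls the ct frequencies toward normal.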
Here’s a randomly generated cipher of unknown type. Two of my ID tests put the correct type on top and two put it in second place. Hill-climber got a practical solution without a crib, and I put in a crib with some obvious corrections and the hill-climber got the exact solution. Converging key search solved it without a crib. I’ll add a Caesar-shifted crib.
Some time ago BION asked anyone with a way to distinguish a Two-Square ciphertext (ct) from a Foursquare ciphertext to share it. I may have found a simple test for that. See this chart.
I ran my new test on the 10,000 examples in BION’s data set of plaintext enciphered twice, once using Two-Square and once using Foursquare. There was a significant difference in the test results for the two types. In the graph, the bars represent how many of the 10,000 ciphertexts of each type scored in the range indicated, with the X-axis numbers representing the top of the range. For example, as indicated by the bars directly over the label of 0.6, 1657 of the 10,000 Two-Square cts scored between 0.55 and 0.6 on my new test, while 530 of the Foursquare cts scored in that same range. All the ct samples here were of length 200. The results were less clear for shorter texts. Normal plaintext (i.e., not the stilted P-12 types) averaged about 0.3, depending on length. For those of length 200 or more, the average was 0.25, with very few scoring over 0.3. For my plaintext samples I used my solutions to ACA ciphers, including stilted types like the P-12s.
The test is similar to the Normor test and is simple to program, but not as simple for pencil-and-paper solvers. To compute it, count the number of appearances of every letter of the alphabet and divide each count by the length of the ciphertext. Then take the sum of the absolute differences between the observed and expected values for each letter. I used the frequency counts shown in Caxton Foster’s Cryptanalysis for Microcomputers Appendix A (English) as my baseline expected values. For example, if I counted 20 E’s in the ciphertext (using BION’s data, i.e. a 200-letter ct), that’s 10% or 0.1 of the total. The expected value in English is 0.125 for E. The difference is 0.025. Add that to the difference for every other letter, including the ones that do not appear in the ct. That sum is the result of the test. My code in Delphi is below.
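For anyone wanting to build a similar chart from their own scores, here is a minimal Python sketch of the binning described above. It assumes ranges 0.05 wide, each labeled by its top value as in the 0.55 to 0.6 example. The function name and the choice to put a score lying exactly on an edge into the lower range are my assumptions.

```python
import math
from collections import Counter

def bin_scores(scores, width=0.05):
    """Count scores per range, keyed by the top of each range."""
    counts = Counter()
    for s in scores:
        # Small tolerance so a score exactly on an edge stays in the lower range.
        index = math.ceil(s / width - 1e-9)
        counts[round(index * width, 2)] += 1
    return dict(sorted(counts.items()))

# Three scores fall in the range topped by 0.6, one in the range topped by 0.35.
print(bin_scores([0.57, 0.58, 0.31, 0.6]))
```

Running this once on the Two-Square scores and once on the Foursquare scores would give the two sets of bar heights.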
// count the letters
for i := 1 to Length(S) do
  if S[i] in ['a'..'z'] then freq[ord(S[i]),2] := freq[ord(S[i]),2]+1;
// turn the counts into proportions
for j := 97 to 122 do freq[j,2] := freq[j,2]/Length(S);
// sum the absolute differences
r := 0;
for k := 97 to 122 do r := r + abs(freq[k,1]-freq[k,2]);
The value of r is the result of the test. The [.,1] positions of the “freq” array (i.e. freq[97,1], freq[98,1] etc.) must first be populated with the expected frequencies of the letters in the target language, and the [.,2] positions must start at zero. S is the ciphertext string in lowercase, since only 'a' to 'z' are counted. The first loop counts the observed letters and populates the array [.,2] positions with the results. The second loop divides those values by the length of the ciphertext. The final loop computes the running total of all the differences. The test is so simple that someone before me must have already invented it, but, if so, it has escaped my notice.
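For readers who don't use Delphi, here is a minimal Python sketch of the same computation. Two things differ from the Delphi above and are assumptions of the sketch. First, the ENGLISH table holds commonly quoted approximate English frequencies, not the values from Foster's Appendix A, so scores will differ slightly from the ones reported here. Second, it divides by the number of letters rather than the full string length, so spaces between groups don't skew the result.

```python
# Approximate English letter frequencies (a commonly quoted table,
# NOT Foster's Appendix A; substitute your own baseline).
ENGLISH = {
    'a': 0.08167, 'b': 0.01492, 'c': 0.02782, 'd': 0.04253, 'e': 0.12702,
    'f': 0.02228, 'g': 0.02015, 'h': 0.06094, 'i': 0.06966, 'j': 0.00153,
    'k': 0.00772, 'l': 0.04025, 'm': 0.02406, 'n': 0.06749, 'o': 0.07507,
    'p': 0.01929, 'q': 0.00095, 'r': 0.05987, 's': 0.06327, 't': 0.09056,
    'u': 0.02758, 'v': 0.00978, 'w': 0.02360, 'x': 0.00150, 'y': 0.01974,
    'z': 0.00074,
}

def frequency_test(text, expected=ENGLISH):
    """Sum over a..z of |expected proportion - observed proportion|."""
    letters = [ch for ch in text.lower() if 'a' <= ch <= 'z']
    if not letters:
        raise ValueError("text contains no letters")
    n = len(letters)
    total = 0.0
    for ch in "abcdefghijklmnopqrstuvwxyz":
        observed = letters.count(ch) / n
        # Letters absent from the text still contribute their expected value.
        total += abs(expected.get(ch, 0.0) - observed)
    return total

print(frequency_test("APDQJ THMDL GDRWL JTJOL UKPIE"))
```

Passing the expected table as a parameter makes it easy to swap in Foster's counts or a table for another language.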
I didn’t intend this as a diagnostic test for type. Rather, I was hoping to solve another problem. Most cipher types, when decrypted with a wrong key, produce fewer high-frequency letters than normal plaintext, and more low-frequency letters than normal plaintext. The Normor test produces high numbers for these while the tetragram frequency scores are low. As the solution gets closer to correct, this changes to be closer to normal; the Normor test results get smaller as tetragram frequency scoring gets higher. However, some types, such as Morse-based types and the Grandpre, during hillclimbing often produce false solutions that outscore the correct solution because they produce many high-frequency letters. Since the frequency order of the letters may be close to normal, the Normor test doesn’t help. I was hoping this new test, which for now I’m just calling the frequency test, would provide a tool during hillclimbing scoring to prevent these false solutions. I tried implementing it on a Grandpre word-based hillclimbing program without success so far, but I may still try using it for that purpose. I have not incorporated it in my Analyzer yet, nor have I tested it on other cipher types besides the two shown in the chart.
Here’s a randomly generated cipher of unknown type. All ID tests got the correct type. Hill-climber solved it without a crib. I’ll put in a Caesar-shifted crib. APDQJ THMDL GDRWL JTJOL UKPIE JRWMU UMKLJ IQUYA EPPMM OJMUJ WUHQS MNCCB FEICY PQMSQ KMGFT PDPVE BKQBV AWWIW NSDLQ EIYQR KQFCN SYNVF EEKJT MNJRF HEUPC IWNJW MAUSL PODJD WMJVJ TJPEE MUAV.
Here’s another function to help identify a specific cipher type, in this case Portax.
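function Ptaxtest(S: string): integer;
var i, hi, lo: integer;
begin
  hi := 0; lo := 0;
  for i := 1 to Length(S) do
  begin
    // letters that tend to be frequent in Portax ct
    if S[i] in ['w','j','m','b'] then hi := hi+1;
    // letters that tend to be rare in Portax ct
    if S[i] in ['r','s','y','z'] then lo := lo+1;
  end;
  if lo = 0 then lo := 1; // avoid dividing by zero
  Ptaxtest := Round(20*hi/lo);
end;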
The letters w, j, m, b are more frequent in Portax ciphers because they tend to encipher the high-frequency plaintext letters on the same columns on the lower (sliding) part of the key slide. Similarly, the second set of letters (r, s, y, z) tends to be low-frequency. The ratio of the counts of these two sets, scaled by 20 and rounded, is a good indicator of a Portax. S must be in lowercase.
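The same ratio in Python, as a minimal sketch for readers without Delphi. The function name ptax_test is mine, and unlike the Delphi version it lowercases the input itself.

```python
def ptax_test(text):
    """Return 20 * (count of w, j, m, b) / (count of r, s, y, z), rounded."""
    s = text.lower()
    hi = sum(s.count(ch) for ch in "wjmb")  # tend to be frequent in Portax ct
    lo = sum(s.count(ch) for ch in "rsyz")  # tend to be rare in Portax ct
    if lo == 0:
        lo = 1  # avoid dividing by zero, as in the Delphi version
    return round(20 * hi / lo)

print(ptax_test("WRS"))  # one high letter, two low letters
```

A ct with as many letters from the first set as from the second scores 20, so higher values point toward Portax. Where to put the cutoff is something to calibrate on known examples.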