Unknown 4-23-2019

KWVYI OKSGQ MCETS LECCK NNLBD GOSAX EWGRN RMDXO SBGAW AKGBN SFUBE OLMSR HGGEL MHHRI OBXDO RAIDS HKIUE MXIEX IPEWC HUNPA YARCH

Hint: SHULRGQLQH ends: DQGFULHGDK

The pt is from two unrelated Gutenberg texts. My hillclimber solved this without the crib, but it took over 500 million trials. I had to let it run overnight. My Analyzer identified the type correctly by a good margin.

New test analysis


        BifidCM   Bifid    PlayfrFrac   Morse    Baze     2sq      4sq
Mean    0.66      0.874    0.724        0.825    0.950    0.637    0.748
Max     1.004     1.263    1.100        1.205    1.215    1.174    1.148
Min     0.303     0.549    0.331        0.483    0.719    0.291    0.39
Mode    0.685     0.928    0.859        0.779    0.880    0.73     0.758

Above are the results of running my new test on a series of plaintexts (pt) from Gutenberg.org enciphered with the cipher types shown. The number of pts was over 3000 in every case. The test clearly has some value in distinguishing cipher types based on the ct. However, as I analyzed the results, I saw that they are very similar to those of the Normor test, and for a very simple reason: both tests tend to measure the extent to which a cipher type enciphers pt with letters that have the same or similar frequency as the pt.

The Two-Square is a good example for understanding this. Whenever both pt letters are on the same row, the resulting ct digraph is a reversal of those same two letters. Similarly, if one or both of the left and right squares use vertical keywords, the resulting digraph is likely to contain a letter from the keyword used for that vertical (column) key. For this reason, Two-Square ct tends to be closer to normal pt frequencies. When you consider how the other ciphers are constructed, it is not difficult to see that the numbers reflect this same phenomenon.

I believe this new test is simply another way of measuring the same thing the Normor test does. The real question is whether it does so more accurately and reliably, i.e. with finer, sharper separation between types. The cts used here were enciphered with a randomly chosen keyword and route, but when I first invented the Normor test, I checked it on different polybius square routes, and the results were strikingly different depending on the route chosen. Again, this becomes understandable if you consider how some routes tend to make the ct letter the same as the pt letter, or a letter from the same keyword, while others don’t, especially where two different keysquares are used.

New test

BION some time ago requested that anyone with a way to distinguish a Two-Square ciphertext (ct) from a Foursquare ciphertext share it. I may have found a simple test for that. See this chart.

I ran my new test on the 10,000 examples in BION’s data set of plaintext enciphered twice, once using Two-Square and once using Foursquare. There was a significant difference in the test results for the two types. In the graph, the bars represent how many of the 10,000 ciphertexts of each type scored in the range indicated, with the X-axis numbers representing the top of the range. For example, as indicated by the bars directly over the label of 0.6, 1657 of the 10,000 Two-Square cts scored between 0.55 and 0.6 on my new test, while 530 of the Foursquare cts scored in that same range. All the ct samples here were of length 200. The results were less clear for shorter texts. Normal plaintext (i.e., not the stilted P-12 types) averaged about 0.3, depending on length. For those of length 200 or more, the average was 0.25, with very few scoring over 0.3. For my plaintext samples I used my solutions to ACA ciphers, including stilted types like the P-12s.

The test is similar to the Normor test and is simple to program, though not as simple for pencil-and-paper solvers. To compute it, count the number of appearances of every letter of the alphabet and divide each count by the length of the ciphertext. Then take the sum of the absolute differences between the observed and expected values for each letter. I used the frequency counts shown in Appendix A (English) of Caxton Foster’s Cryptanalysis for Microcomputers as my baseline expected values. For example, if I counted 20 E’s in the ciphertext (using BION’s data, i.e. a 200-letter ct), that’s 10% or 0.1 of the total. The expected value for E in English is 0.125. The difference is 0.025. Add that to the difference for every other letter, including the ones that do not appear in the ct. The total is the result of the test. My code in Delphi is below.

for j := 97 to 122 do freq[j,2] := 0;                    // zero the observed counts
for i := 1 to length(S) do
  if S[i] in ['a'..'z'] then                             // tally each ct letter
    freq[ord(S[i]),2] := freq[ord(S[i]),2] + 1;
for j := 97 to 122 do freq[j,2] := freq[j,2]/Length(S);  // counts -> relative frequencies
r := 0;
for k := 97 to 122 do r := r + abs(freq[k,1]-freq[k,2]); // sum of absolute differences

The value of r is the result of the test. The first row of the “freq” array (i.e. freq[97,1], freq[98,1], etc.) must first be populated with the expected frequencies of the letters in the target language. S is the ciphertext string. The code first zeroes the observed counts, then tallies the letters of the ciphertext into the [.,2] positions, divides those counts by the length of the ciphertext, and finally computes the running total of all the differences. The test is so simple that someone before me must have already invented it, but, if so, it has escaped my notice.
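For anyone who wants a self-contained version, here is one way to package the same computation as a function. The FreqTest name and the Expected table are just placeholders for this sketch; the values shown are rounded standard English frequencies, so substitute Foster’s Appendix A figures to reproduce my numbers exactly.

function FreqTest(const S: string): Single;
const
  // Rounded standard English letter frequencies for a..z; not Foster's
  // exact Appendix A values.
  Expected: array['a'..'z'] of Single =
    (0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061, 0.070,
     0.002, 0.008, 0.040, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060,
     0.063, 0.091, 0.028, 0.010, 0.024, 0.002, 0.020, 0.001);
var
  i: integer;
  ch: char;
  observed: array['a'..'z'] of Single;
  r: Single;
begin
  if Length(S) = 0 then begin FreqTest := 0; Exit; end;
  for ch := 'a' to 'z' do observed[ch] := 0;
  for i := 1 to Length(S) do
    if S[i] in ['a'..'z'] then            // tally each ct letter
      observed[S[i]] := observed[S[i]] + 1;
  r := 0;
  for ch := 'a' to 'z' do                 // sum |expected - observed|
    r := r + abs(Expected[ch] - observed[ch]/Length(S));
  FreqTest := r;
end;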

I didn’t intend this as a diagnostic test for type. Rather, I was hoping to solve another problem. Most cipher types, when decrypted with a wrong key, produce fewer high-frequency letters than normal plaintext, and more low-frequency letters than normal plaintext. The Normor test produces high numbers for these while the tetragram frequency scores are low. As the solution gets closer to correct, this changes to be closer to normal; the Normor test results get smaller as tetragram frequency scoring gets higher. However, some types, such as Morse-based types and the Grandpre, during hillclimbing often produce false solutions that outscore the correct solution because they produce many high-frequency letters. Since the frequency order of the letters may be close to normal, the Normor test doesn’t help. I was hoping this new test, which for now I’m just calling the frequency test, would provide a tool during hillclimbing scoring to prevent these false solutions. I tried implementing it on a Grandpre word-based hillclimbing program without success so far, but I may still try using it for that purpose. I have not incorporated it in my Analyzer yet, nor have I tested it on other cipher types besides the two shown in the chart.
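If I do get it working as a hillclimbing tool, the natural shape is a penalty term. This is only a sketch of the idea, not code from my programs: TetraScore stands in for any standard tetragram log-frequency scorer, and the weight W is a placeholder that would need tuning per cipher type.

const
  W = 1000.0;  // placeholder penalty weight, not a tuned value

// Hypothetical combined scorer: reward tetragram fit, penalize
// letter distributions far from normal English.
function CombinedScore(const Decrypt: string): Single;
begin
  CombinedScore := TetraScore(Decrypt) - W*FreqTest(Decrypt);
end;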

Portax test

Here’s another function to help identify a specific cipher type, in this case Portax.

function Ptaxtest(S: string): integer;
var
  i, hi, lo: integer;
begin
  hi := 0; lo := 0;
  for i := 1 to Length(S) do
    begin
    if S[i] in ['w','j','m','b'] then hi := hi + 1;  // over-represented in Portax ct
    if S[i] in ['r','s','y','z'] then lo := lo + 1;  // under-represented in Portax ct
    end;
  if lo = 0 then lo := 1;                            // guard against division by zero
  Ptaxtest := Round(20*hi/lo);
end;

The letters w, j, m and b are more frequent in Portax ciphers because they tend to encipher the high-frequency plaintext letters on the same columns of the lower (sliding) part of the key slide. Similarly, the second set of letters tends to be low-frequency. Computing the ratio of these two counts gives a good indicator of a Portax.
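A minimal usage sketch follows; the CT and score variables and the cutoff of 20 are placeholders, not calibrated values from my Analyzer.

score := Ptaxtest(CT);   // CT holds the lowercase ciphertext, score is an integer
// Roughly, a score above 20 means the w/j/m/b letters outnumber
// the r/s/y/z letters, which points toward Portax.
if score > 20 then
  Writeln('Portax is a candidate, score = ', score)
else
  Writeln('Portax is less likely, score = ', score);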

Bazeries test

I thought it might be useful to post some of the code I use in my Analyzer program to diagnose the type of an unknown cipher. I have a specific test for most types. The tests do not return a yes/no result, only a numerical score indicating that the type is more or less likely. Other factors like Index of Coincidence, length, Normor score, etc. also factor into the final score. Here’s the one I use for Bazeries:

function Bazetest(S: string): Single;
var
  temp, i: integer;
  ch: char;
begin
  temp := 0;
  for i := 1 to Length(S) do
    begin
    ch := S[i];
    // letters that never occur in a spelled-out numerical key (in English)
    if ch in ['b', 'c', 'g', 'k', 'l', 'm', 'p'] then temp := temp + 1;
    end; // i
  if Length(S) > 0 then
    Bazetest := 100*(temp/Length(S))  // percentage of such letters in the ct
  else
    Bazetest := 0;
end; // function Bazetest

This is written in Delphi/Pascal. Basically, what it does is count the ciphertext letters shown. They are never used in the spelled-out numerical key because they do not appear in the number words (in English). Thus most of them will be used as ciphertext substitutes for the letters in the fourth row of the plaintext polybius square, which is DIOTY. That’s the row with the highest combined frequency in the plaintext square. So the higher the number returned by this function, the more likely the cipher is a Bazeries. It doesn’t appear in this test, but elsewhere my program also weighs heavily the absence of a J in the ciphertext, especially if all 25 other letters appear. That’s true for all the polybius square types.
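The J check itself is simple to code. Here is a sketch of the idea, not the exact weighting code from my Analyzer (the function name is mine): it returns true when J is absent from the ciphertext but all 25 other letters appear.

function LooksLikePolybius(const S: string): boolean;
var
  i, j, count: integer;
  seen: array[97..122] of boolean;
begin
  for j := 97 to 122 do seen[j] := false;
  for i := 1 to Length(S) do
    if S[i] in ['a'..'z'] then seen[ord(S[i])] := true;
  count := 0;
  for j := 97 to 122 do if seen[j] then count := count + 1;
  // True when j (ord 106) never appears but the other 25 letters all do.
  LooksLikePolybius := (not seen[106]) and (count = 25);
end;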

Another unknown

This new format seems to be working OK, so I’ll put up another con. This one is a random unknown type from Gutenberg.org. Although grammatical in form, the last few words make no sense with the opening; in fact, they are the start of a sentence from a different book. My program sometimes grabs a second sentence in order to meet length standards. My Analyzer identified the correct type and my hillclimber solved it quickly.

ksupbatpnzikmpwilpvwgeinpantqvxsibhqbgdiratnziqqidvjqpgeutidyjaksysppgtnjfylnavampisyktxibpysqvgtucipprnaxsizljbne