New test

Some time ago BION asked anyone with a way to distinguish a Two-Square ciphertext (ct) from a Foursquare ciphertext to share it. I may have found a simple test for that. See this chart.

I ran my new test on the 10,000 examples in BION’s data set of plaintext enciphered twice, once using Two-Square and once using Foursquare. There was a significant difference in the test results for the two types. In the graph, the bars represent how many of the 10,000 ciphertexts of each type scored in the range indicated, with the X-axis numbers representing the top of the range. For example, as indicated by the bars directly over the label of 0.6, 1657 of the 10,000 Two-Square cts scored between 0.55 and 0.6 on my new test, while 530 of the Foursquare cts scored in that same range. All the ct samples here were of length 200. The results were less clear for shorter texts. Normal plaintext (i.e., not the stilted P-12 types) averaged about 0.3, depending on length. For those of length 200 or more, the average was 0.25, with very few scoring over 0.3. For my plaintext samples I used my solutions to ACA ciphers, including stilted types like the P-12s.

The test is similar to the Normor test and is simple to program, but not as simple for pencil-and-paper solvers. To compute it, count the number of appearances of every letter of the alphabet and divide each count by the length of the ciphertext. Then sum the absolute differences between the observed and expected values over all 26 letters. I used the frequency counts shown in Caxton Foster’s Cryptanalysis for Microcomputers Appendix A (English) as my baseline expected values. For example, if I counted 20 E’s in the ciphertext (using BION’s data, i.e. a 200-letter ct), that’s 10% or 0.1 of the total. The expected value in English is 0.125 for E. The difference is 0.025. Add that to the difference for every other letter, including the ones that do not appear in the ct. That is the result of the test. My code in Delphi is below.

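for i := 1 to Length(S) do if S[i] in ['a'..'z'] then
  freq[ord(S[i]),2] := freq[ord(S[i]),2]+1;   // count each lowercase letter (freq[.,2] must start at zero)
for j := 97 to 122 do freq[j,2] := freq[j,2]/Length(S);   // turn counts into proportions
r := 0;
for k := 97 to 122 do r := r + abs(freq[k,1]-freq[k,2]);   // sum of absolute differences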

The value of r is the result of the test. The first row of the “freq” array (i.e. freq[97,1], freq[98,1] etc.) must first be populated with the expected frequencies of the letters in the target language. S is the ciphertext string. The first line counts the observed letters and populates the array [.,2] positions with the results. The second line divides those values by the length of the ciphertext. The final line computes the running total of all the differences. The test is so simple that someone before me must have already invented it, but, if so, it has escaped my notice.
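
The same computation can be sketched in Python. This is an illustration only, not the Delphi code above: the name `frequency_test` and the `expected` dictionary are assumptions, and the caller has to supply the expected proportions (for example from Foster’s Appendix A, which is not reproduced here). Unlike the Delphi fragment, the sketch lower-cases the text and divides by the number of letters rather than the raw string length; the two agree when S holds only lowercase letters.

```python
from collections import Counter
import string


def frequency_test(text, expected):
    """Sum of |expected - observed| letter proportions over a..z.

    `expected` is a hypothetical dict mapping each lowercase letter to its
    expected proportion in the target language; it is supplied by the caller.
    """
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    if not letters:
        return 0.0  # assumption: an empty text scores 0 instead of dividing by zero
    counts = Counter(letters)
    n = len(letters)
    # Letters missing from the text still contribute their expected value.
    return sum(abs(expected.get(c, 0.0) - counts[c] / n)
               for c in string.ascii_lowercase)


# Toy baseline (NOT English frequencies): every letter expected equally often.
uniform = {c: 1 / 26 for c in string.ascii_lowercase}
print(frequency_test("aaaa", uniform))  # 25/26 for 'a' plus 25 * (1/26) for the rest
```

With a real English table in place of `uniform`, the E example above (20 E’s in 200 letters against an expected 0.125) would contribute 0.025 to the total.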

I didn’t intend this as a diagnostic test for type. Rather, I was hoping to solve another problem. Most cipher types, when decrypted with a wrong key, produce fewer high-frequency letters than normal plaintext, and more low-frequency letters than normal plaintext. The Normor test produces high numbers for these while the tetragram frequency scores are low. As the solution gets closer to correct, this changes to be closer to normal; the Normor test results get smaller as tetragram frequency scoring gets higher. However, some types, such as Morse-based types and the Grandpre, during hillclimbing often produce false solutions that outscore the correct solution because they produce many high-frequency letters. Since the frequency order of the letters may be close to normal, the Normor test doesn’t help. I was hoping this new test, which for now I’m just calling the frequency test, would provide a tool during hillclimbing scoring to prevent these false solutions. I tried implementing it on a Grandpre word-based hillclimbing program without success so far, but I may still try using it for that purpose. I have not incorporated it in my Analyzer yet, nor have I tested it on other cipher types besides the two shown in the chart.

Unknown 4-10-19

Here’s a randomly generated cipher of unknown type. All ID tests got the correct type. My hill-climber solved it without a crib, but I will include a Caesar-shifted crib.
APDQJ THMDL GDRWL JTJOL UKPIE JRWMU UMKLJ IQUYA EPPMM OJMUJ WUHQS
MNCCB FEICY PQMSQ KMGFT PDPVE BKQBV AWWIW NSDLQ EIYQR KQFCN SYNVF
EEKJT MNJRF HEUPC IWNJW MAUSL PODJD WMJVJ TJPEE MUAV.

crib: PSTB

Unknown cipher for 4-2-19

Here’s a randomly generated cipher of unknown type. All my ID tests got the correct type. Hill-climbing, my special key search, and converging key search all solved it, but they all needed the crib.

???
QADOQ JIKVM TWWJT OBTRQ JJONL CJKNX NNNQE PPKZS SFIMS PVSXL QFHRA
DIIBT ILZKD XTDLJ ADCQM JIYDI USEZD DIURB VWDXA VACKQ NRELG ADXBR
ATNGB XBATQ QGAVH MBSMT WBNKL ITRJW NDDMN LDYSC OYSXU DDLMA XSFOP
VLZAH DG.

caesar-shifted crib: JXJJRXGJYYJWUTTW
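
Because the crib is given Caesar-shifted, one way to use it is to try all 26 shifts and hand each candidate to the solver. A minimal Python sketch follows; `caesar_shift` and `candidate_cribs` are illustrative names, not part of any of the programs mentioned here.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar_shift(text, k):
    """Shift each uppercase letter k places forward, wrapping Z around to A."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) + k) % 26] if c in ALPHABET else c
        for c in text
    )


def candidate_cribs(shifted_crib):
    """Return all 26 shifts of the crib (shift 0 is the crib as given)."""
    return [caesar_shift(shifted_crib, k) for k in range(26)]


# List every candidate; a solver (or a human eye) picks the readable one.
for k, crib in enumerate(candidate_cribs("JXJJRXGJYYJWUTTW")):
    print(k, crib.lower())
```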

Portax test

Here’s another function to help identify a specific cipher type, in this case Portax.

function Ptaxtest(S: string): integer;
var i, hi, lo: integer;
begin
  hi := 0; lo := 0;
  for i := 1 to Length(S) do
    begin
    if S[i] in ['w','j','m','b'] then hi := hi+1;   // letters that tend to be frequent in Portax ct
    if S[i] in ['r','s','y','z'] then lo := lo+1;   // letters that tend to be rare in Portax ct
    end;
  if lo = 0 then lo := 1;   // avoid dividing by zero
  Ptaxtest := Round(20*hi/lo);
end;

The letters w, j, m and b are more frequent in Portax ciphertexts because they tend to encipher the high-frequency plaintext letters in the same columns on the lower (sliding) part of the key slide. Similarly, the second set of letters (r, s, y, z) tends to be low-frequency. The ratio of the counts of these two sets is a good indicator of a Portax.
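
For comparison, here is a rough Python rendering of the same ratio. It is a sketch that assumes a lowercase ciphertext string, as the Delphi function does; the name `ptax_test` is made up for the example.

```python
def ptax_test(ct):
    """20 * (count of w, j, m, b) / (count of r, s, y, z), rounded to an integer."""
    hi = sum(1 for c in ct if c in "wjmb")
    lo = sum(1 for c in ct if c in "rsyz")
    if lo == 0:
        lo = 1  # same guard against division by zero as the Delphi version
    return round(20 * hi / lo)


print(ptax_test("wjmbrs"))  # 4 high letters, 2 low letters -> 40
```

As with the other type tests, the number is only a score to weigh against other evidence, not a yes/no answer.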

Cipher where log tetragraph scoring doesn’t work.

Here’s a Numbered key cipher even longer than the ACA guidelines suggest. Without a crib it can’t be solved by log tet scoring.

Numbered key:
06 48 52 55 47 35 58 10 06 49 07 16 52 17 57 23 35 27 47 57 31 09
46 20 55 42 59 58 20 37 58 46 47 51 06 38 29 05 32 52 43 26 37 08
24 39 07 10 18 59 31 29 09 19 14 32 42 07 24 57 23 10 41 41 10 32
39 18 50 24 11 21 23 40 56 18 48 52 29 13 23 48 49 07 52 02 38 13
05 28 07 40 56 37 12 35 52 33 13 47 16 15 22 47 17 18 46 30 24 39
23 53 27 52 57 25 56 48 42 55 38 03 55 37 40 25 16 47 11 36 46 52
58 48 15 24 26 38 57 23 18 10 23 25 24 26 25 35 07 10 09 57 33 54
50 26 48 53 40 57 54 50 53 59 20 37 41 54 07 25 32 26 15 21 38 22
30 33 14 57 23 15 02 47 43 52 33 10 57 37 52 08 09 18 20 58 31 15
28 47 35 48 08 28 51 27 47 29 21 23 07 12 51 10 37 31 45 53 41 52
45 49 54 18 06 33 20 16 26.

The key length is 60 — shorter than a Grandpre key. Here’s a trial decrypt with log tet score greater than the log tet score of the solution:

nttateehnatethatentatethantehtettentheetanthenthatthethenteathtthenaeehatthatthattattataeetththettatehtthathentantahhtnathatthethattethentatahthenhetheathentataheathtththenhatththathatatthattheahethetetheenthatthehtthatthahanthen

The log tet score of the above is 982. The log tet score of the solution is 914.
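
To see why a decrypt like this can outscore the solution, here is a minimal Python sketch of log tetragraph scoring. The scale behind the 982 and 914 figures is not given here, so the table values and the `floor` parameter are hypothetical; a real table would hold log frequencies for all common English tetragraphs.

```python
def log_tet_score(text, log_table, floor=0.0):
    """Add up the log-frequency weight of every overlapping 4-letter window.

    `log_table` maps a tetragraph to its weight; windows missing from the
    table get `floor`. Both are assumptions made for this illustration.
    """
    return sum(log_table.get(text[i:i + 4], floor)
               for i in range(len(text) - 3))


# Toy table with two very common English tetragraphs and made-up weights.
toy_table = {"that": 3.0, "then": 2.5}
print(log_tet_score("thatthen", toy_table))  # windows: that hatt atth tthe then -> 5.5
```

Because the score is just a sum over windows, a string packed with repeats of the few highest-weighted tetragraphs can beat real plaintext, which is what the trial decrypt above does with the letters of "that", "then" and "the".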

My regular PH hill-climber using log pentagraph and word list scoring solved it without a crib in about 30 million trial decrypts using three hill-climbing threads.

Bazeries test

I thought it might be useful to post some of the code I use in my Analyzer program to diagnose the type of an unknown cipher. I have a specific test for most types. The tests do not return a yes/no result, only a numerical score to indicate that type as more likely or less. Other factors like Index of Coincidence, length, Normor score, etc. also factor into the final score. Here’s the one I use for Bazeries:

function Bazetest(S: string): Single;
var
  temp, i: integer;
  ch: char;
begin
  temp := 0;
  for i := 1 to Length(S) do
    begin
    ch := S[i];
    // count the ciphertext letters listed in the set
    if ch in ['b', 'c', 'g', 'k', 'l', 'm', 'p'] then temp := temp + 1;
    end; // i
  if Length(S) > 0 then Bazetest := 100*(temp/Length(S))   // percentage of the ct
                   else Bazetest := 0;
end;  // function Bazetest

This is written in Delphi/Pascal. Basically, what it does is count the ciphertext letters shown. These letters are rarely or never used in the spelled-out numerical key, since they appear in few or none of the English number words. Thus most of them will be used as ciphertext substitutes for the letters in the fourth row of the plaintext Polybius square, which is DIOTY. That’s the row with the highest combined frequency in the plaintext square. So the higher the number returned by this function, the more likely it is a Bazeries. It doesn’t appear in this test, but elsewhere my program also weighs heavily the absence of a J in the ciphertext, especially if all 25 other letters appear. That’s true for all the Polybius square types.
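
A Python sketch of the same count, for readers without Delphi. It assumes a lowercase ciphertext with spaces and punctuation already removed, and `baze_test` is just an illustrative name.

```python
def baze_test(ct):
    """Percentage of ciphertext letters that fall in the set b, c, g, k, l, m, p."""
    if not ct:
        return 0.0  # mirrors the Length(S) > 0 guard in the Delphi function
    hits = sum(1 for c in ct if c in "bcgklmp")
    return 100 * hits / len(ct)


print(baze_test("abcd"))  # 'b' and 'c' count, so 2 of 4 letters -> 50.0
```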

Grandpre con

I updated the interactive Grandpre solver on Bionsgadgets so it now has undo-redo buttons. Here’s a randomly generated Grandpre that I tested it on. My “sloppy” Grandpre hill-climber using the crib solved it quickly.

Grandpre:
16 52 32 33 68 15 57 58 33 65 88 74 75 43 41 55 52 82 73 24 17 11
52 62 22 85 52 37 25 87 44 85 57 63 41 75 62 28 73 64 62 25 44 32
87 22 74 37 61 41 75 87 46 84 16 53 27 52 72 72 36 73 82 57 25 61
12 78 51 84 88 74 37 62 84 51 15 77 57 76 52 41 37 33 15 81 72 25
15 52 75 73 58 62 42 57 31 55 66 12 33 42 68 51 46 41 75 72 25 55
74 55 57 58 82 75 16 13 75 65 74 32 76 28 75 52 15 62 23 46 11 28
57 66 36 42 73 36 57 42 52 73 35 57 43 64 57 82 26 74 75 52 75 87
31 41 33 41 57 81 61 53 63 35 37 13 51 48 36 24 25 36 37 81 87 58

crib: toleavewiththe

BION Grandpré Worksheet

Thanks for providing the Grandpré worksheet at http://bionsgadgets.appspot.com/ , BION. It’s much handier to use than the interactive one I made many years ago – I’ve never put in my time to learn to program user interfaces, and in something like this it really shows! I used yours to do the JF19 AC-1217 and you made it fun rather than tedious.

I wonder whether crib placement could use more of the available frequency information. I think it currently just looks at overlaps in the crib, but perhaps it could also usefully check frequencies. I know in a homophonic like this you can’t count on high-frequency letters much, but if you have a low-frequency like ‘F’ it’s unlikely to be one of the higher-frequency dinomes. One could also give votes up and down if other digraphs or longer are formed.
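
To make the idea concrete, here is a rough Python sketch of crib placement that checks overlaps first and then adds a frequency penalty. It is not how the worksheet works internally (I don’t know that); the names `crib_placements`, `RARE_LETTERS` and `max_rare_count` are invented for the example, and the choice of "rare" letters and the threshold are guesses that would need tuning.

```python
from collections import Counter

# Hypothetical set of low-frequency plaintext letters.
RARE_LETTERS = set("fjkqvwxz")


def crib_placements(dinomes, crib, max_rare_count=2):
    """Return (offset, penalty) pairs for every consistent crib placement.

    A placement is consistent if no dinome would have to stand for two
    different letters (one letter may still have several dinomes). The
    penalty counts rare crib letters assigned to dinomes that occur more
    than `max_rare_count` times in the whole ciphertext.
    """
    counts = Counter(dinomes)
    results = []
    for offset in range(len(dinomes) - len(crib) + 1):
        mapping = {}
        consistent = True
        for d, letter in zip(dinomes[offset:], crib):
            if mapping.setdefault(d, letter) != letter:
                consistent = False
                break
        if not consistent:
            continue
        penalty = sum(1 for d, letter in mapping.items()
                      if letter in RARE_LETTERS and counts[d] > max_rare_count)
        results.append((offset, penalty))
    # Lowest penalty first; ties keep their left-to-right order.
    return sorted(results, key=lambda r: r[1])


# Toy example: the last placement fails because 14 would be both 'b' and 'a'.
toy = "11 12 11 13 14 14".split()
print(crib_placements(toy, "aba"))  # [(0, 0), (1, 0), (2, 0)]
```

Votes for digraphs or longer strings formed by a placement could be added to the same penalty in a similar way.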

I’ve also been musing about better automated Grandpré attacks, hillclimbing on both the keys and (if any) cribs, perhaps fiddling the keys to make sure it always has a full alphabet of words off my “cromulent words” lists, again paying attention to frequencies. I haven’t tried anything automatic yet – has anyone else?