is P, then this is probably a substitute for t, and so on. Al-Kindī’s technique, known as frequency analysis, shows that it is unnecessary to check each of the billions of potential keys. Instead, it is possible to reveal the contents of a scrambled message simply by analyzing the frequency of the characters in the ciphertext.
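The core of this recipe can be sketched in a few lines of Python. This is a minimal, naive sketch, assuming the plaintext is English and the cipher is a monoalphabetic substitution. It counts each ciphertext letter, ranks the letters from most to least common, and pairs them rank for rank with English letters ordered by the frequencies in Table 1. The names `ENGLISH_BY_FREQUENCY`, `rank_letters`, `naive_key_guess` and `apply_key` are illustrative, not taken from the text.

```python
from collections import Counter
import string

# English letters ordered from most to least frequent, following Table 1.
# Letters with equal percentages (c/u, m/w, g/y, j/x, q/z) are in arbitrary order.
ENGLISH_BY_FREQUENCY = "etaoinshrdlcumwfgypbvkjxqz"


def rank_letters(text: str) -> list[str]:
    """Return the letters that occur in `text`, most frequent first (case-insensitive)."""
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    # Sort by descending count; break ties alphabetically so the result is deterministic.
    return sorted(counts, key=lambda letter: (-counts[letter], letter))


def naive_key_guess(ciphertext: str) -> dict[str, str]:
    """Pair the i-th most common ciphertext letter with the i-th most common English letter."""
    return dict(zip(rank_letters(ciphertext), ENGLISH_BY_FREQUENCY))


def apply_key(ciphertext: str, key: dict[str, str]) -> str:
    """Substitute every mapped letter; leave unmapped characters (spaces, punctuation) unchanged."""
    return "".join(key.get(ch, ch) for ch in ciphertext.lower())
```

Calling `apply_key(c, naive_key_guess(c))` on a ciphertext `c` yields only a first guess. With short texts, or with letters whose counts are nearly tied, several pairings will usually be wrong and must be corrected by hand.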
Figure 6 The first page of al-Kindī’s manuscript On Deciphering Cryptographic Messages, containing the oldest known description of cryptanalysis by frequency analysis.
However, al-Kindī’s recipe for cryptanalysis cannot be applied unconditionally, because the standard list of frequencies in Table 1 is only an average, and it will not correspond exactly to the frequencies of every text. For example, a brief message discussing the effect of the atmosphere on the movement of striped quadrupeds in Africa would not yield to straightforward frequency analysis: “From Zanzibar to Zambia and Zaire, ozone zones make zebras run zany zigzags.” In general, short texts are likely to deviate significantly from the standard frequencies, and if there are fewer than a hundred letters, then decipherment will be very difficult. On the other hand, longer texts are more likely to follow the standard frequencies, although this is not always the case. In 1969, the French author Georges Perec wrote La Disparition, a 200-page novel that did not use any words containing the letter e. Doubly remarkable is the fact that the English novelist and critic Gilbert Adair succeeded in translating La Disparition into English while still following Perec’s shunning of the letter e. Entitled A Void, Adair’s translation is surprisingly readable (see Appendix A). If the entire book were encrypted via a monoalphabetic substitution cipher, then a naive attempt to decipher it might be stymied by the complete lack of the most frequently occurring letter in the English alphabet.
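To see how far a short text can stray from the averages, the sketch below counts the letters of the zebra sentence and reports the share of z, a and e. It is a minimal illustration; the helper name `letter_counts` is illustrative, not from the text.

```python
from collections import Counter

SENTENCE = "From Zanzibar to Zambia and Zaire, ozone zones make zebras run zany zigzags."


def letter_counts(text: str) -> Counter:
    """Count alphabetic characters case-insensitively, ignoring spaces and punctuation."""
    return Counter(ch.lower() for ch in text if ch.isalpha())


counts = letter_counts(SENTENCE)
total = sum(counts.values())
for letter in ("z", "a", "e"):
    share = 100 * counts[letter] / total
    print(f"{letter}: {counts[letter]} of {total} letters = {share:.1f}%")
```

The sentence has 62 letters, of which 10 are z (about 16.1 per cent, against 0.1 per cent in Table 1). That ties z with a as the most common letter, while e appears only 5 times (about 8.1 per cent, against 12.7 per cent).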
Table 1 This table of relative frequencies is based on passages taken from newspapers and novels, and the total sample was 100,362 alphabetic characters. The table was compiled by H. Beker and F. Piper, and originally published in Cipher Systems: The Protection Of Communication.
Letter    Percentage
a         8.2
b         1.5
c         2.8
d         4.3
e         12.7
f         2.2
g         2.0
h         6.1
i         7.0
j         0.2
k         0.8
l         4.0
m         2.4
n         6.7
o         7.5
p         1.9
q         0.1
r         6.0
s         6.3
t         9.1
u         2.8
v         1.0
w         2.4
x         0.2
y         2.0
z         0.1
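Table 1 can be held as a simple lookup table, which makes it easy to score how closely any text matches typical English. The score below is the chi-squared statistic, a standard goodness-of-fit measure; it is not part of al-Kindī’s method or of the text above, and is included only as one common way to put a number on “deviates from the standard frequencies”. Lower scores mean a closer match. The names `TABLE_1` and `chi_squared` are illustrative.

```python
from collections import Counter

# Table 1 (Beker and Piper): percentage of each letter in typical English text.
TABLE_1 = {
    "a": 8.2, "b": 1.5, "c": 2.8, "d": 4.3, "e": 12.7, "f": 2.2, "g": 2.0,
    "h": 6.1, "i": 7.0, "j": 0.2, "k": 0.8, "l": 4.0, "m": 2.4, "n": 6.7,
    "o": 7.5, "p": 1.9, "q": 0.1, "r": 6.0, "s": 6.3, "t": 9.1, "u": 2.8,
    "v": 1.0, "w": 2.4, "x": 0.2, "y": 2.0, "z": 0.1,
}


def chi_squared(text: str) -> float:
    """Chi-squared distance between the letter counts of `text` and Table 1.

    For each letter, compare the observed count with the count Table 1 predicts
    for a text of the same length; a larger total means a poorer fit to English.
    """
    letters = [ch for ch in text.lower() if ch in TABLE_1]
    n = len(letters)
    if n == 0:
        raise ValueError("text contains no letters")
    observed = Counter(letters)
    score = 0.0
    for letter, percent in TABLE_1.items():
        expected = n * percent / 100
        score += (observed[letter] - expected) ** 2 / expected
    return score
```

Applied to the zebra sentence, the z term alone dominates the score, because Table 1 predicts well under one z in a text of 62 letters.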
Having described the first tool of cryptanalysis, I shall continue by giving an example of how frequency analysis is used to decipher a ciphertext. I have avoided peppering the whole book with examples of cryptanalysis, but with frequency analysis I make an exception. This is partly because frequency analysis is not as difficult as it sounds, and partly because it is the primary cryptanalytic tool. Furthermore, the example that follows provides insight into the modus operandi of the cryptanalyst. Although frequency analysis requires logical thinking, you will see that it also demands guile, intuition, flexibility and guesswork.
Cryptanalyzing a Ciphertext
PCQ VMJYPD LBYK LYSO KBXBJXWXV BXV ZCJPO EYPD KBXBJYUXJ LBJOO KCPK. CP LBO LBCMKXPV XPV IYJKL PYDBL, QBOP KBO BXV OPVOV LBO LXRO CI SX’XJMI, KBO JCKO XPV EYKKOV LBO DJCMPV ZOICJO BYS, KXUYPD: “DJOXL EYPD, ICJ X LBCMKXPV XPV CPO PYDBLK Y BXNO ZOOP JOACMPLYPD LC UCM LBO IXZROK CI FXKL XDOK XPV LBO RODOPVK CI XPAYOPL EYPDK. SXU Y SXEO KC ZCRV XK LC AJXNO X IXNCMJ CI UCMJ SXGOKLU?”
OFYRCDMO, LXROK IJCS LBO LBCMKXPV XPV CPO PYDBLK
Imagine that we have intercepted this scrambled message. The challenge is to decipher it. We know that the text is in English, and that it has been scrambled according to a monoalphabetic substitution cipher, but we have no idea of the key. Searching all possible keys is impractical, so we must apply frequency analysis. What follows is a step-by-step guide to cryptanalyzing the ciphertext, but if you feel confident then you might prefer to ignore this and attempt your own independent