cryptanalysis.
The immediate reaction of any cryptanalyst upon seeing such a ciphertext is to analyze the frequency of all the letters, which results in Table 2 . Not surprisingly, the letters vary in their frequency. The question is, can we identify what any of them represent, based on their frequencies? The ciphertext is relatively short, so we cannot slavishly apply frequency analysis. It would be naive to assume that the commonest letter in the ciphertext, O, represents the commonest letter in English, e, or that the eighth most frequent letter in the ciphertext, Y, represents the eighth most frequent letter in English, h. An unquestioning application of frequency analysis would lead to gibberish. For example, the first word PCQ would be deciphered as aov.
However, we can begin by focusing attention on the only three letters that appear more than thirty times in the ciphertext, namely O, X and P. It is fairly safe to assume that the commonest letters in the ciphertext probably represent the commonest letters in the English alphabet, but not necessarily in the right order. In other words, we cannot be sure that O = e, X = t, and P = a, but we can make the tentative assumption that:
O = e, t or a, X = e, t or a, P = e, t or a.
Table 2 Frequency analysis of enciphered message.
Letter
Frequency
Occurrences
Percentage
A
3
0.9
B
25
7.4
C
27
8.0
D
14
4.1
E
5
1.5
F
2
0.6
G
1
0.3
H
0
0.0
I
11
3.3
J
18
5.3
K
26
7.7
L
25
7.4
M
11
3.3
N
3
0.9
O
38
11.2
P
31
9.2
Q
2
0.6
R
6
1.8
S
7
2.1
T
0
0.0
U
6
1.8
V
18
5.3
W
1
0.3
X
34
10.1
Y
19
5.6
Z
5
1.5
In order to proceed with confidence, and pin down the identity of the three most common letters, O, X and P, we need a more subtle form of frequency analysis. Instead of simply counting the frequency of the three letters, we can focus on how often they appear next to all the other letters. For example, does the letter O appear before or after several other letters, or does it tend to neighbor just a few special letters? Answering this question will be a good indication of whether O represents a vowel or a consonant. If O represents a vowel it should appear before and after most of the other letters, whereas if it represents a consonant, it will tend to avoid many of the other letters. For example, the letter e can appear before and after virtually every other letter, but the letter t is rarely seen before or after b, d, g, j, k, m, q or v.
The table below takes the three most common letters in the ciphertext, O, X and P, and lists how frequently each appears before or after every letter. For example, O appears before A on 1 occasion, but never appears immediately after it, giving a total of 1 in the first box. The letter O neighbors the majority of letters, and there are only 7 that it avoids completely, represented by the 7 zeros in the O row. The letter X is equally sociable, because it too neighbors most of the letters, and avoids only 8 of them. However, the letter P is much less friendly. It tends to lurk around just a few letters, and avoids 15 of them. This evidence suggests that O and X represent vowels, while P represents a consonant.
Now we must ask ourselves which vowels are represented by O and X. They are probably e and a, the two most popular vowels in the English language, but does O = e and X = a, or does O = a and X = e? An interesting feature in the ciphertext is that the combination OO appears twice, whereas XX does not appear at all. Since the letters ee appear far more often than aa in plaintext English, it is likely that O = e and X = a.
At this point, we have confidently identified two of the letters in the ciphertext. Our conclusion that X = a is supported by the fact that X appears on its own in the ciphertext, and a is one of only two English words that consist of a single letter. The only other letter that appears on its own in the ciphertext is Y, and it seems highly likely that this represents the only other one-letter English word, which is i.