COMP 4027 Forensic and Analytical Computing

ANTIFORENSICS: The challenge of cryptography for forensic analysis

Encryption is a useful tool for keeping personal information private. There are real concerns about the loss of personal privacy in the digital world, as we leave traces everywhere we go, either online or in person. Those of you who studied INFT 3015/5017 Computer and Network Security with me last year will recall that encryption is a key technology in preventing criminals from accessing our personal information, including financial data, and in keeping criminals from making unauthorised use of any facilities, among many other desirable applications.

However, as with many other technologies (e.g. medicine <-> bioterrorism; nuclear power <-> nuclear weapons; etc.) encryption can be used by criminals just as easily as by any other people. In this lecture we look at how encryption creates a serious impediment to forensic analysis of retrieved information.

Encryption - an overview

A plaintext is some unencrypted text, such as a message.

A ciphertext is the output of an encryption algorithm after you input some plaintext.

A very simple example of how we turn a plaintext message into a ciphertext is the Caesar cipher, purportedly invented by Julius Caesar to keep his message to his generals secret from his enemies. This cipher involves shifting every letter in the message "forward" by three places in the alphabet. For letters at the very end of the alphabet, you go back to the beginning.

e.g.
a -> D
b -> E
c -> F
....
w -> Z
x -> A
y -> B
z -> C

So, for example, if we want to encrypt "hello world", we get the ciphertext "KHOOR ZRUOG".

Decrypting a message when you have the key is easy - all you have to do is apply the algorithm with the appropriate decryption key. In the above example, the key is 3, because 3 is what we add to every letter to get the ciphertext; to get the plaintext back again, we subtract 3 from every letter.
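
To make this concrete, here is a minimal Python sketch of the shift cipher described above (the function names, and the convention of writing ciphertext in upper case, are just illustrative choices):

    def caesar_encrypt(plaintext, key=3):
        """Shift each letter forward by 'key' places, wrapping from z back to a.
        Following the convention above, ciphertext is written in upper case."""
        result = []
        for ch in plaintext.lower():
            if ch.isalpha():
                shifted = (ord(ch) - ord('a') + key) % 26
                result.append(chr(ord('A') + shifted))
            else:
                result.append(ch)            # leave spaces and punctuation alone
        return ''.join(result)

    def caesar_decrypt(ciphertext, key=3):
        """Decryption is the same operation with the shift reversed."""
        return caesar_encrypt(ciphertext, -key).lower()

    print(caesar_encrypt("hello world"))     # -> KHOOR ZRUOG
    print(caesar_decrypt("KHOOR ZRUOG"))     # -> hello world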

However, deciphering the message without the key requires a little more thought. We need to know firstly what encryption algorithm is being used, and secondly what the decryption key is.

It is generally assumed (following the so-called Kerckhoffs' Principle) that the algorithm used to encrypt a message is known to the cryptanalyst. It is also obvious that the ciphertext will be known by the cryptanalyst. Hence the only real secrets are the plaintext and the encrypting/decrypting key(s). The science of recovering the plaintext and/or the key is called cryptanalysis.

Cryptanalysis - an overview

Note that there are many encryption systems that are far more secure than the shift cipher. Most of these rely on having vast numbers of possible keys to try.

In every single case, it is theoretically possible to find the key to decrypt a ciphertext. Note, however, that we say theoretically possible - encryption schemes are designed to make actually finding the key in practice as difficult as possible.

So, how is it always theoretically possible to find a decryption key?

The most obvious way to find a key is the brute-force search. This essentially means trying every possible key and seeing which one turns out to generate a sensible message. This method can be used against every single encryption technique. In the above example, if we knew we were up against a shift cipher, we would have to work out what the shift was (in this case it is 3). So we would go through all the possibilities (26 in all**) until we found the right one.
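
For the shift cipher this search is tiny: a minimal sketch in Python (assuming, as in the example above, that the ciphertext consists of upper-case letters and spaces) simply prints all 26 candidates and leaves it to the reader to spot the sensible one:

    def brute_force_shift(ciphertext):
        """Try all 26 possible shifts and print each candidate plaintext;
        a human (or a dictionary check) picks the one that reads sensibly."""
        for key in range(26):
            candidate = ''.join(
                chr(ord('a') + (ord(ch) - ord('A') - key) % 26) if ch.isalpha() else ch
                for ch in ciphertext
            )
            print("key =", key, ":", candidate)

    brute_force_shift("KHOOR ZRUOG")     # key = 3 gives "hello world"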

So, given it is possible to find the key, how does encryption make it difficult to do so?

Encryption algorithms aim to make it computationally infeasible either to find the decryption key or to recover the plaintext message without having the decryption key. This is done by having such an enormous number of possible keys that it would take far too long to try them all out with current technology and processing speeds, or at least with a desktop computer. Of course, it is theoretically possible that you could still guess the key, say if you were extremely lucky and guessed it first time. However, the probability of this happening is vanishingly small** and it is not considered a threat for modern encryption.

(Another, related approach is to make it not worth the effort to decrypt a ciphertext. For example, I was a consultant for a Cambridge telecoms company that was developing a hierarchical encryption system. The lowest-level encryption was actually relatively simple and could be broken with enough desktop computing time; however, the benefit in doing so was very small, as it only decrypted a very small amount of data, which it would probably be more efficient to purchase legitimately. Note, however, that this approach does not address the technical difficulty but rather the motivation for trying to cryptanalyse a ciphertext.)

So encryption algorithms generally rely on the need to have a vast number of keys, which is defined by the number of bits in the key (unless you use a passphrase to generate it!**).

So generally, encryption algorithms cannot guarantee that nobody can decipher the ciphertext without having the key, but we can give a reasonably accurate prediction of how long it would probably take someone to guess the key, using complexity theory.

Complexity

The difficulty of decrypting without the key can be estimated using Big O notation. Expressing the complexity of an algorithm in this notation gives you a feel for how rapidly the time needed to complete the calculation grows as the problem gets bigger - that is, we are analysing the complexity of the algorithm.

For example, if we did a brute-force search to guess the key for an algorithm where the key is n bits in length, it would take us up to 2^n guesses to try all the possibilities (note that it is of course possible to guess the key well before trying out all of the possibilities). However, if the person using the encryption decided to use a bigger key and added a single bit to the key length, we would immediately have twice as many possible keys to try - this is an O(2^n) algorithm. As you can see, a small increase in the key length results in a massive increase (a doubling for every extra bit) in the number of possible keys, thus making the job of the cryptanalyst much harder.
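
To get a feel for these numbers, here is a rough back-of-the-envelope sketch; the rate of a billion guesses per second is purely an assumption for illustration, and each extra bit of key length doubles the worst-case figures:

    GUESSES_PER_SECOND = 10**9               # assumed speed of the attacker's machine
    SECONDS_PER_YEAR = 60 * 60 * 24 * 365

    for n in (40, 56, 64, 128, 256):         # key lengths in bits
        keys = 2**n                          # size of the keyspace, O(2^n)
        years = keys / GUESSES_PER_SECOND / SECONDS_PER_YEAR
        print(f"{n:3d}-bit key: {keys:.2e} keys, worst case about {years:.2e} years")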

A brute-force search is not the only way of finding the secret key, though it applies to every encryption algorithm (although for the one-time pad it is ineffectual**). An alternative is to look at the way the algorithm is defined, because sometimes there are characteristics that allow one (theoretically at least) to find the decrypting key from the known information.

An example is the RSA encryption system, where there are two keys, the public key and the private key, which are mathematically related. A person's public key is published, and anyone wanting to send that person a secret message uses the public key to encrypt it. Because the public and private keys are mathematically related, it is theoretically possible to derive the private (decrypting) key from the public key.

The keys in RSA are derived by choosing two secret prime numbers p and q and multiplying them together to get N, then calculating phi(N) = (p - 1) x (q - 1), which is the Euler totient function of N. We choose a public key e, then calculate the multiplicative inverse of e modulo phi(N), and this is our decrypting key d. The encrypting and decrypting algorithms are the same: raise the input to the power of the key, modulo N.
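
A toy numerical sketch of this key set-up follows; the primes are deliberately tiny and chosen purely for illustration (real RSA primes are hundreds of digits long), and the modular-inverse form of pow() needs Python 3.8 or later:

    # Toy RSA key generation -- illustration only, never use such tiny primes.
    p, q = 61, 53                  # the two secret primes
    N = p * q                      # public modulus N = 3233
    phi = (p - 1) * (q - 1)        # Euler totient of N = 3120
    e = 17                         # public (encrypting) key, coprime to phi
    d = pow(e, -1, phi)            # private (decrypting) key: e^-1 mod phi = 2753

    m = 65                         # a message encoded as a number smaller than N
    c = pow(m, e, N)               # encrypt: c = m^e mod N  (= 2790)
    assert pow(c, d, N) == m       # decrypt: c^d mod N recovers the message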

Now, if we wanted to cryptanalyse this RSA algorithm, we would try to calculate the decrypting key d from the public information e and N. However, to do this we need to know either p or q, which lets us find phi(N), which in turn lets us calculate d, knowing e. This is where we find complexity hindering us - to find p or q knowing N (which is p x q, remember), we need to factorise N, yet there are no efficient algorithms for factorising numbers, and the best-known algorithms are subexponential.
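
To see the difficulty (and its limits) concretely, here is a sketch that recovers the toy private key above by naive trial division; the best-known factoring algorithms are far cleverer than this, but even they become infeasible once N is hundreds of digits long:

    def trial_division_factor(N):
        """Naive factorisation by trial division -- only practical for small N."""
        f = 2
        while f * f <= N:
            if N % f == 0:
                return f, N // f
            f += 1
        return None                # N is prime

    e, N = 17, 3233                # the public information from the toy example
    p, q = trial_division_factor(N)
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)            # the "secret" decrypting key, recovered
    print(p, q, d)                 # -> 53 61 2753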

Note this applies only to normal digital computers, and there is actually a fast factorisation algorithm developed by Peter Shor that runs in polynomial time, i.e. is relatively efficient. The drawback is that it runs on a quantum computer, and so far these are very much in experimental stages. However if a big quantum computer is built, RSA may no longer be secure. See the Wikipedia article on this algorithm.

These days, most people use encryption algorithms where finding the key, or calculating the decryption key from other information, is computationally infeasible. These are either secret-key systems like Twofish, which have very large keys used in very complex ways, or public-key systems such as RSA (based on the difficulty of factoring) and the discrete logarithm systems, whose security is based on the difficulty of finding logarithms in finite fields.

There is a famous quote from one of the most reputable cryptologists today, Hendrik Lenstra, who has said

Suppose that the cleaning lady gives p and q by mistake to the garbage collector, but that the product pq is saved. How to recover p and q? It must be felt as a defeat for mathematics that the most promising techniques are searching the garbage dump and applying memo-hypnotic techniques.
(from Ian Stewart's book, previewed at google).

What this all means for forensics

You've no doubt worked out by now that a real issue for forensic analysts is the difficulty of decrypting evidence once it is found on a device. Since cryptography is used in very many applications, much evidence that would otherwise be useful, either for working out what happened in a case or as proof in court proceedings, cannot be accessed.

Cryptography is used in many ways that pose a problem for forensic analysts:

What this means is that forensics may only catch the less clever or less well-resourced criminals. Those who either have some knowledge or hire experts will be able to use encryption well enough to hide their activities.

This leaves governments and agencies needing to deal with encryption by means other than technological ones. Usually they opt for a legislative approach, with varying success.

History of governments and how they deal with encryption

The difficulty of recovering encrypted materials has troubled governments around the world for some time. Governments often use official secrets legislation to prevent algorithms from being publicly disclosed; however, there is an active community of researchers in cryptography who independently discover and publish strong cryptographic algorithms. For example, in 1973 Clifford Cocks at GCHQ developed what is now known as the RSA algorithm, and his work was classified. However, in 1976 the idea of "public-key" cryptography was published by two researchers, Whitfield Diffie and Martin Hellman, and was then developed into RSA by Rivest, Shamir and Adleman, so the secret was public within a few years of its discovery. The UK Government did not declassify Cocks' work until 1997.

Given that the secrecy of strong encryption algorithms cannot be guaranteed, governments have since attempted to legislate aspects of their use, so that users of encryption are in breach of the law if they use strong encryption or do not surrender the key to a government agency upon request. This has not been entirely successful, as the following cases show:

Summary

We have seen that the encryption of data, in transmission or storage, poses a real problem for forensic analysts. This problem is so critical that governments resort to legislation to control the use of encryption by individuals, although with limited success. However, the drawback of a legislation-based approach is that criminals may still be able to access encryption and are unlikely to be concerned about breaking the law in this respect.

Some resources

Useful reference materials:


Last update hla 2009-03-05