by Paul Bryzek, Michael Gray, Kousha Najafi, Antonina Orlova
Go check out the original paper, Ken Thompson’s "Reflections on Trusting Trust", if you have some time. It’s a short, fun read.
The paper examines the problems with running untrusted code that manipulates other code (the compiler, the assembler, the loader, or even CPU microcode), even if its source can be examined. As an example, suppose Ken Thompson wants to insert a backdoor into the login program. He inserts a couple of lines into login.c that always let the user Ken log in.
Anyone who examines login.c can instantly see the added code, so it will be removed immediately. Ken decides to be tricky and inserts a bug into the C compiler that adds this backdoor whenever login.c is compiled. He can then remove the backdoor from login.c.
The problem is that anyone who examines the C compiler’s source, cc.c, will instantly spot the bug. Ken could distribute tainted binaries of the compiler along with clean source, but every time the compiler is updated, forked, branched, or otherwise recompiled, he’d have to be there to reinsert the bug. Ken just stole his boss’ shiny red stapler and probably won’t be around much longer, so he needs a better plan.
He decides to add a second bug to cc.c. This bug reproduces both bugs whenever the C compiler is recompiled (to do this, we need a self-reproducing program, sometimes called a quine). He then distributes the binary of this new compiler (this is similar to how real C features can be bootstrapped; see the original paper for more information), along with clean source (i.e. the source doesn't contain the bugs).
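A quine is easier to believe once you have seen one. The three lines below are a minimal Python example of our own (the paper's examples are in C): `%r` splices the string's own repr into itself and `%%` collapses to a single `%`, so running the program prints exactly its own source, comment included. Thompson's second bug needs the same trick on a larger scale, so that the trojaned compiler can re-emit its own bug code into every new compiler binary.

```python
# This program prints its own source code.
s = '# This program prints its own source code.\ns = %r\nprint(s %% s)'
print(s % s)
```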
Whenever future C compilers are compiled, both bugs will be inserted by the previous C compiler. Whenever login.c is compiled, the backdoor will be inserted. Ken Thompson now controls every UNIX machine that used a C compiler descending from his. Now that he’s taken over the world, he returns to simpler endeavors, like pretending his new shiny red stapler is a tank.
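To make the two-bug trick concrete, here is a toy model, entirely our own construction: "source files" are just names, a "binary" is a Python function, and `clean_login`, `honest_cc` and `trojan_cc` are hypothetical. The distributed compiler binary recognises login.c and splices in a backdoor, and recognises cc.c and emits a compiler that still carries both bugs, so recompiling the clean compiler source never gets rid of them. Returning itself stands in for the quine-style self-reproduction a real attack needs.

```python
PASSWORDS = {"alice": "correct horse"}

def clean_login(user: str, password: str) -> bool:
    """What a faithful compile of the clean login.c does."""
    return PASSWORDS.get(user) == password

def honest_cc(source: str):
    """A faithful compiler: returns exactly the behaviour the source describes."""
    return CLEAN_BEHAVIOUR[source]

def trojan_cc(source: str):
    """The compiler binary Ken distributes; its source, cc.c, looks clean."""
    if source == "login.c":
        honest = honest_cc(source)
        # Bug 1: splice a backdoor for "ken" into login.
        return lambda user, password: user == "ken" or honest(user, password)
    if source == "cc.c":
        # Bug 2: compiling the clean compiler source yields a compiler that
        # still contains both bugs (stand-in for the quine trick).
        return trojan_cc
    return honest_cc(source)

CLEAN_BEHAVIOUR = {"login.c": clean_login, "cc.c": honest_cc}

cc = trojan_cc
for _ in range(3):                  # three generations of recompiling clean cc.c
    cc = cc("cc.c")
login = cc("login.c")
print(login("ken", "anything"))     # True: the backdoor survives
print(honest_cc("login.c")("ken", "anything"))  # False: what the source says
```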
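Consider the encryption function E, with keys K and K^-1 and message M. A good encryption scheme satisfies the following attributes:
1. E(E(M, K), K^-1) = M: encrypting with one key and then applying E with the other key recovers the message.
2. Without K^-1, it is infeasible to recover M from E(M, K).
3. It is infeasible to derive K^-1 from K (this matters for public-key schemes, where K is published).
Note that the same algorithm E is used for both encryption and decryption. Public-key encryption is where K is globally known and K^-1 is kept secret. Secret-key encryption is where both K and K^-1 are kept secret; in this case we usually have K = K^-1.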
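As a concrete and deliberately insecure illustration of the public-key case, here is textbook RSA with tiny primes (our choice of scheme; the notes do not name one). A single function `E` serves for both encryption and decryption, and applying it with K and then K^-1, in either order, returns M.

```python
# Textbook RSA with toy parameters: NOT secure, for illustration only.
p, q = 61, 53                  # two small primes; real keys use far larger ones
n = p * q                      # public modulus (3233)
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent, coprime to phi
d = pow(e, -1, phi)            # private exponent with e*d = 1 (mod phi); Python 3.8+

K = (e, n)                     # public key K
K_inv = (d, n)                 # private key K^-1

def E(m: int, key: tuple) -> int:
    """One operation for both directions: m^exponent mod n."""
    exponent, modulus = key
    return pow(m, exponent, modulus)

M = 65                         # the message, encoded as an integer with 0 <= M < n
C = E(M, K)                    # encrypt with the public key
print(E(C, K_inv))             # decrypt with K^-1: prints 65
print(E(E(M, K_inv), K))       # the other order also gives back 65
```

Without padding and with a modulus this small, anyone can factor n, so this only demonstrates the algebra of the two keys.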
| Attack Name | Data in Attacker's Possession | Description |
|---|---|---|
| Ciphertext Only | Passively recorded ciphertexts | Attacker monitors communications without knowing anything about their contents, then attempts to break the cipher. |
| Known Plaintext | Passively recorded ciphertexts along with the corresponding plaintexts or plaintext guesses | Attacker knows or guesses the plaintext that corresponds to some ciphertext, then attempts to break the cipher. |
| Chosen Plaintext | Ciphertexts corresponding to attacker-chosen plaintexts | Attacker obtains the ciphertexts of plaintexts of their choosing, then attempts to break the cipher. |
| Related Key | The same plaintext encrypted with multiple keys | Attacker produces or otherwise obtains the same plaintext encrypted under different keys, then attempts to break those keys or the encryption algorithm itself. (This was done to break some ciphers in WW2 without any knowledge of the encryption device, using only recorded messages.) |
One vulnerability of general encryption schemes is that an attacker who gathers enough encrypted messages may be able to deduce the encryption keys.
More ciphertext gives him more info! The longer a session key lasts, the longer he has to guess it. The more text is transmitted under it, the more analysis he can perform and the more patterns he can find.
Let's pretend we were using some form of substitution cipher (these aren't good enough for real life, but one will serve as an example).
Given a large section of text, the attacker can count which ciphertext letters occur most often and match them against the letter frequencies of English. For example, knowing that E and T are the most frequently used letters in English, he can guess which ciphertext letters stand for them, then use the partially decrypted text to find further patterns until he can decrypt the whole message.
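The frequency attack fits in a few lines of Python. This is a minimal illustration under stated assumptions: a monoalphabetic substitution cipher on lowercase letters, a hypothetical `KEY` and `PLAINTEXT`, and an approximate English frequency ranking (`ENGLISH_ORDER`).

```python
import string
from collections import Counter

# Approximate ranking of English letters by frequency, most common first.
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"

def substitute(text: str, key: str) -> str:
    """Monoalphabetic substitution: the i-th letter of a..z becomes key[i]."""
    return text.translate(str.maketrans(string.ascii_lowercase, key))

def guess_mapping(ciphertext: str) -> dict:
    """Pair ciphertext letters with English letters by frequency rank."""
    counts = Counter(c for c in ciphertext if c in string.ascii_lowercase)
    ranked = [c for c, _ in counts.most_common()]
    return dict(zip(ranked, ENGLISH_ORDER))

KEY = "qwertyuiopasdfghjklzxcvbnm"      # hypothetical secret key (e -> t, t -> z)
PLAINTEXT = "the secret meeting between the three agents is set for seven"
ciphertext = substitute(PLAINTEXT, KEY)

guess = guess_mapping(ciphertext)
print(guess["t"], guess["z"])           # prints: e t
partial = ciphertext.translate(str.maketrans(guess))
print(partial)                          # only the top-ranked letters are reliably right
```

On a text this short only the first couple of guesses are trustworthy; the longer the ciphertext, the closer its letter counts get to English statistics, which is exactly why more ciphertext helps the attacker.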
In practice, public-key encryption is used to exchange session keys: keys that will be used for only one session. This lets you take advantage of the speed of secret-key encryption while easily exchanging the secret key using public-key encryption. It also limits the amount of data encrypted under the public key, which matters because public keys are generally kept for much longer. The session key is used to encrypt all the data after the secure connection is established.
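A minimal sketch of this hybrid pattern, under loud assumptions: the public-key step is textbook RSA over a modulus built from two Mersenne primes (far too small and unpadded for real use), and the "secret-key cipher" is a SHA-256 keystream XORed with the data, a stand-in we chose because Python's standard library has no block cipher. Only the short session key ever goes through the public key; the bulk data goes through the fast symmetric step.

```python
import hashlib
import secrets

# B's toy RSA key pair: textbook RSA, no padding, far too small. Illustration only.
P, Q = 2**61 - 1, 2**89 - 1               # two Mersenne primes
N = P * Q                                 # about 150 bits, enough for a 128-bit key
KB = 65537                                # B's public exponent
KB_INV = pow(KB, -1, (P - 1) * (Q - 1))   # B's private exponent

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Stand-in secret-key cipher: XOR data with SHA-256(key || block offset).
    Applying it twice with the same key returns the original data."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out += bytes(x ^ y for x, y in zip(data[offset:offset + 32], block))
    return bytes(out)

# A: choose a fresh 128-bit session key and send it under B's public key.
session_key = secrets.token_bytes(16)
wrapped_key = pow(int.from_bytes(session_key, "big"), KB, N)

# A: encrypt the (possibly long) data with the fast secret-key step.
plaintext = b"everything after the handshake uses the session key"
ciphertext = keystream_xor(session_key, plaintext)

# B: unwrap the session key with its private key, then decrypt the data.
unwrapped = pow(wrapped_key, KB_INV, N).to_bytes(16, "big")
print(keystream_xor(unwrapped, ciphertext))
```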
Another use of public-key encryption is signatures. Signatures identify a message as authentic; the message does not have to be encrypted. A signature takes advantage of the symmetry between public and private keys.
Normally we have E(M, KA), which anyone can produce but only A can read. A signature consists of E(M, KA^-1), which only A can produce but anyone can read.
Since everyone knows A's public key KA, they can decrypt anything encrypted with A's private key, KA^-1. In practice, signatures usually encrypt a hash of the message. This way, the message can be kept unencrypted, but any change to the message body will change the hash, thereby invalidating the signature.
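Hash-then-sign can be sketched with toy textbook RSA. Assumptions, stated loudly: the key sizes are insecure, there is no padding scheme, SHA-256 is our choice of hash, and the hash is reduced modulo N only so it fits the toy modulus.

```python
import hashlib

# A's toy RSA key pair (insecure parameters, no padding; illustration only).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
KA = 65537                                 # A's public exponent: everyone knows it
KA_INV = pow(KA, -1, (P - 1) * (Q - 1))    # A's private exponent: only A knows it

def H(message: bytes) -> int:
    """SHA-256 of the message, reduced below N to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Only A can compute E(H(M), KA^-1)."""
    return pow(H(message), KA_INV, N)

def verify(message: bytes, signature: int) -> bool:
    """Anyone can check that E(signature, KA) == H(M)."""
    return pow(signature, KA, N) == H(message)

msg = b"Pay Bob $10"
sig = sign(msg)
print(verify(msg, sig))                    # True: message sent in the clear, signature checks
print(verify(b"Pay Bob $1000", sig))       # False: changing the body changes the hash
```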

(Alice and Bob: Yet one more reason I'm barred from speaking at crypto conferences. Courtesy of xkcd.net)
Solution? Add A's ID to the message,
i.e.: E({A, B, Kab}, Kb)
This means that A is sending the session key Kab to B, encrypted with Kb.
Problem: This message can still be replayed, so a hacker can resend it later and receive information that B thinks is going to A. Bad!
Better Solution: Freshness
Add something to the transaction that lets B know that the message is new, not a resent message from a long time ago. This should not be implemented using the time, because a crafty hacker can change the system's clock so that the system thinks a message dated 1991 is current. We can do better.
Nonces: a unique number chosen for each message exchange. Every exchange gets a brand new number that is never repeated.
Then, if A is trying to contact B and wants to use Kab as the session key, we have the following exchange, assuming A knows B’s public key Kb and B knows A’s public key Ka.
A --> B: E ({A, B, Kab}, Kb)
A <-- B: E ({A, B, Kab, N}, Ka)
A --> B: E ({A, B, Kab, N}, Kb)
At this point, if B never reuses a nonce N, then B knows that the message exchange is fresh: N was sent encrypted under Ka, so only A could have read it and echoed it back. B therefore gets both authenticity and integrity. Moreover, if A ensures that Kab is used only for this connection, then A can be confident of both the authenticity and integrity of the messages.
(We assume our algorithm is perfect and any key is as good as any other. In reality, many algorithms have classes of weak keys, so we shouldn't let one side choose Kab by itself: it could be making the encryption easy for a third party to break without ever handing over its keys, either through a bad implementation or for more evil reasons. Instead, for example, each side could generate its own nonce, and Kab could be the hash of the two nonces concatenated.)
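To see why the nonce defeats replays, here is a symbolic simulation of the three messages. It is a sketch under loud assumptions: there is no real cryptography, `Sealed(owner, payload)` simply stands for E(payload, K_owner) and can be opened only by its owner, and `Responder`, `on_message1` and `on_message3` are hypothetical names. B remembers every nonce it has issued and accepts message 3 at most once per nonce.

```python
import secrets
from dataclasses import dataclass

@dataclass(frozen=True)
class Sealed:
    """Symbolic E(payload, K_owner): only `owner` can open it. No real crypto."""
    owner: str
    payload: tuple

def open_as(me: str, box: Sealed) -> tuple:
    """Decrypt with `me`'s private key; fails for anyone but the owner."""
    if box.owner != me:
        raise PermissionError(f"{me} cannot decrypt a message for {box.owner}")
    return box.payload

class Responder:
    """B's side of the three-message exchange."""

    def __init__(self) -> None:
        self.used_nonces = set()  # every nonce B has ever issued
        self.pending = {}         # nonce -> (claimed sender, Kab) awaiting message 3
        self.accepted = []        # session keys B has accepted

    def on_message1(self, box: Sealed) -> Sealed:
        a, b, kab = open_as("B", box)
        n = secrets.randbits(64)
        while n in self.used_nonces:  # B never chooses the same nonce twice
            n = secrets.randbits(64)
        self.used_nonces.add(n)
        self.pending[n] = (a, kab)
        return Sealed(a, (a, b, kab, n))  # E({A, B, Kab, N}, Ka)

    def on_message3(self, box: Sealed) -> bool:
        a, b, kab, n = open_as("B", box)
        if self.pending.pop(n, None) != (a, kab):
            return False  # unknown or already-used nonce: stale or replayed
        self.accepted.append(kab)
        return True

bob = Responder()
kab = secrets.token_bytes(16)
msg1 = Sealed("B", ("A", "B", kab))        # A --> B: E({A, B, Kab}, Kb)
msg2 = bob.on_message1(msg1)               # A <-- B: E({A, B, Kab, N}, Ka)
msg3 = Sealed("B", open_as("A", msg2))     # A --> B: E({A, B, Kab, N}, Kb)
print(bob.on_message3(msg3))               # True: fresh exchange accepted
print(bob.on_message3(msg3))               # False: replayed message 3 rejected

# Replaying message 1 only earns the attacker a reply it cannot open.
try:
    open_as("Mallory", bob.on_message1(msg1))
except PermissionError as err:
    print(err)
```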
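The two-nonce idea from the note can be sketched as follows, assuming the nonces have already been exchanged confidentially (for example inside the encrypted messages of the exchange above). SHA-256 is our choice of hash, the function name `derive_session_key` is hypothetical, and the nonces have a fixed length and a fixed order (A's first) so that both sides concatenate them the same way.

```python
import hashlib
import secrets

# Each side contributes its own fresh 128-bit random nonce.
na = secrets.token_bytes(16)   # chosen by A
nb = secrets.token_bytes(16)   # chosen by B

def derive_session_key(nonce_a: bytes, nonce_b: bytes) -> bytes:
    """Kab = H(Na || Nb): neither side alone controls the result."""
    return hashlib.sha256(nonce_a + nonce_b).digest()

kab_at_a = derive_session_key(na, nb)   # computed by A
kab_at_b = derive_session_key(na, nb)   # computed independently by B
print(kab_at_a == kab_at_b, len(kab_at_a))   # True 32
```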
Public-key key distribution: use public keys to build a session key.
In order for this to occur:
1. A needs to authenticate B.
2. B needs to authenticate A.
Start with 2 public key pairs:
Ka, Ka^-1.
Kb, Kb^-1.
User A:
has access to Ka, Ka^-1, Kb.
does not have access to Kb^-1.
User B:
has access to Kb, Kb^-1, Ka.
does not have access to Ka^-1.
Ka^-1 and Kb^-1 are A's and B's respective private keys, each known only to its owner.
Solution: Send A's private key encrypted with B's public key.
Problem: A does not know whether it is really B she is talking to. She may have just sent an attacker her secret key, and now the hacker (or B) will be able to impersonate A. Bad!
Better Solution: Verify the Key!
e.g. Suppose there’s an ssh program running on a Kinko’s machine that has a public key, and you are trying to log in to SEASnet to check your e-mail. The likelihood that this machine already knows the public key of the SEASnet server is very small. Therefore, the Kinko’s machine and the SEASnet server need to agree on each other’s identities. To accomplish this, they exchange public keys, and the user is then asked to verify that the key SEASnet presented is actually SEASnet's real key.
ssh on Kinko’s Machine ---> SEASnet server: Ka
ssh on Kinko’s Machine <--- SEASnet server: E({Kb, N}, Ka)
ssh on Kinko’s Machine ---> SEASnet server: E({H(password), N}, Kb)
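One common way to make the user's verification step practical is to show a short fingerprint of the offered key, which the user compares with a copy obtained out of band. The sketch below formats a SHA-256 digest the way OpenSSH displays its `SHA256:` fingerprints (base64 without padding); the key bytes are placeholders, and the idea of a fingerprint posted next to the machine is hypothetical.

```python
import base64
import hashlib

def fingerprint(public_key_blob: bytes) -> str:
    """Short, human-comparable digest of a public key (OpenSSH-style format)."""
    digest = hashlib.sha256(public_key_blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode().rstrip("=")

def user_accepts(offered_key: bytes, published_fingerprint: str) -> bool:
    """The user compares the offered key's fingerprint with a trusted copy."""
    return fingerprint(offered_key) == published_fingerprint

real_key = b"placeholder bytes of SEASnet's real host key"
forged_key = b"placeholder bytes of an attacker's key"
posted = fingerprint(real_key)      # hypothetical: obtained out of band
print(user_accepts(real_key, posted), user_accepts(forged_key, posted))  # True False
```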
Another way to verify the key is to provide a signature. A signature says that this principal definitely produced this piece of information; in practice it is usually a cryptographic hash of the information, encrypted with the signer's private key. Assuming Ka and Ka^-1 have the same length, then E(E(M, Ka), Ka^-1) = E(E(M, Ka^-1), Ka) = M.
The result is that we can get an encryption where only one party can decrypt the information, and a signature where only one party can produce the encrypted version.
e.g. Web browser binaries have a built-in set of trusted public keys, including Verisign’s public key, Kv.
A --> B: Ka
A <-- B: E({B, Kb, E(H({B, Kb}), Kv^-1)}, Ka)
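Then A decrypts the reply with Ka^-1 to get B, Kb, and the signature S = E(H({B, Kb}), Kv^-1). A computes H({B, Kb}) herself and checks that E(S, Kv) = H({B, Kb}); if so, Verisign vouches that Kb really is B's key.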
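The signature check on the certificate can be sketched with toy textbook RSA. Assumptions: insecure key sizes with no padding, SHA-256 as H, a `b"\x00"` separator to encode {B, Kb} as bytes, and hypothetical names `certify` and `check`. The outer E(..., Ka) layer is not modelled; the sketch starts after A has removed it with Ka^-1.

```python
import hashlib

# Verisign's toy RSA key pair: textbook RSA, no padding, insecure sizes.
P, Q = 2**89 - 1, 2**127 - 1                # two Mersenne primes
N_V = P * Q
KV = 65537                                  # Kv: public exponent shipped in the browser
KV_INV = pow(KV, -1, (P - 1) * (Q - 1))     # Kv^-1: known only to Verisign

def H(name: str, public_key: bytes) -> int:
    """Hash of {B, Kb}, reduced below N_V so it fits the toy modulus."""
    data = name.encode() + b"\x00" + public_key
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N_V

def certify(name: str, public_key: bytes) -> int:
    """Verisign, ahead of time: S = E(H({B, Kb}), Kv^-1)."""
    return pow(H(name, public_key), KV_INV, N_V)

def check(name: str, public_key: bytes, signature: int) -> bool:
    """A, after stripping the outer layer: is E(S, Kv) == H({B, Kb})?"""
    return pow(signature, KV, N_V) == H(name, public_key)

kb = b"hypothetical public key of B"
cert = certify("B", kb)
print(check("B", kb, cert))                   # True
print(check("B", b"attacker's key", cert))    # False: Kb was swapped
print(check("Mallory", kb, cert))             # False: the name was swapped
```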
Authorization: is the principal allowed to make the request? We model the authorization problem as a simple function: AUTH(principal, object, operation) -> Y/N. A mediator runs AUTH() before the operation and performs the operation only if it is allowed.
Two forms of authorization functions:
| Principal | Object | Operation | Allowed? |
|---|---|---|---|
| Kohler | grades.txt | write | Y |
| Cs111Student | grades.txt | write | N |
| Cs111Student | grades.txt | read | N |
| Kohler | mom.txt | write | Y |
| Cs111Student | mom.txt | read | Y |
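The table translates directly into a lookup. The sketch assumes a default-deny policy (any triple not in the table gets N), which the table itself does not state, and the names `ACCESS` and `mediate` are hypothetical.

```python
# The access table as a dictionary; anything not listed is denied (assumption).
ACCESS = {
    ("Kohler", "grades.txt", "write"): True,
    ("Cs111Student", "grades.txt", "write"): False,
    ("Cs111Student", "grades.txt", "read"): False,
    ("Kohler", "mom.txt", "write"): True,
    ("Cs111Student", "mom.txt", "read"): True,
}

def AUTH(principal: str, obj: str, operation: str) -> bool:
    """AUTH(principal, object, operation) -> Y/N, defaulting to N."""
    return ACCESS.get((principal, obj, operation), False)

def mediate(principal: str, obj: str, operation: str, action):
    """The mediator: run AUTH first, perform the operation only if allowed."""
    if not AUTH(principal, obj, operation):
        raise PermissionError(f"{principal} may not {operation} {obj}")
    return action()

print(mediate("Kohler", "grades.txt", "write", lambda: "grades updated"))
try:
    mediate("Cs111Student", "grades.txt", "write", lambda: "grades updated")
except PermissionError as err:
    print(err)
```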
Another example: the Domain Name System
DNS maps the domain name a user types in (e.g. the host name in a URL) to the corresponding IP address. It is robust in the sense that a query cannot do anything potentially dangerous, such as changing which IP address a name maps to: all a client can do is ask which address a name corresponds to. Since this information is public, there is no need to authenticate who is asking.