Version 1.5

The comp.security.pgp FAQ

Appendix II - PGP's inner magic

This part of the FAQ is © by Boudewijn W.Ch. Visser.

Version 0.99.1 - please mail me with comments, corrections etc.

The IDEA symmetrical block cipher
RSA Public-Key Cryptography
The MD5 Message Digest function

The IDEA symmetrical block cipher

Introduction

IDEA is a block cipher invented by Xuejia Lai and James Massey in 1991. A block cipher is an encryption algorithm that encrypts the data in blocks. IDEA has a block size of 64 bits, and a keylength of 128 bits. IDEA is a symmetrical algorithm, which means that the same key is used both for encryption and for decryption. The first version of it was called PES (for Proposed Encryption Algorithm) and published in 1990. When it was discovered that this version had some weaknesses against a certain attack, known as differential cryptanalysis, it was modified. This modified version was first called IPES (Improved Proposed Encryption Standard), and is what we now know as IDEA, an acronym for International Data Encryption Algorithm

How does IDEA work

IDEA consists of 8 rounds.
The datablock of 64 bits is split up in 4 16-bit parts, labeled X1, X2, X3 and X4. These are fed into IDEA. Add stand for addition modulo 2^16, Mul for Multiplication modulo 2^16+1 (and the all-zero subblock is treated as representing 2^16), and Xor for a bitwise Exclusive Or. From the 128 bit key 52 16-bit subkeys are generated, which are used in the rounds. The subkeys are labeled Zx_y, where y is the round where the key is used, and x the number of the subkey within the round. A detailed explanation of how the subkeys are derived can be found below the schematic diagram.

One round of IDEA looks like this :


          X1            X2                      X3           X4
          |             |                       |            | 
          V             V                       V            V 
   Z1_1->Mul     Z2_1->Add               Z3_1->Add    Z4_1->Mul
          |             |                       |            | 
          |             |                       |            | 
          +-------------|------>Xor<------------+            | 
          |             |        |              |            | 
          |             |        |              |            | 
          |             +--------|--->Xor<------|------------+ 
          |             |        |     |        |            | 
          |             |        |     |        |            | 
          |             |        V     V        |            | 
          |             | Z5_1->Mul-->Add       |            | 
          |             |        |     |        |            | 
          |             |        |     |        |            | 
          |             |        V     V        |            | 
          |             |       Add<--Mul<-Z6_1 |            | 
          |             |        |     |        |            | 
          V             |        |     |        V            | 
         Xor<-----------|--------|-----+------>Xor           | 
          |             V        |              |            V 
          |            Xor<------+--------------|---------->Xor 
          |             |                       |            | 
          |             +-----------\/----------+            | 
          |             ____________/\___________            | 
          |             |                       |            | 
          V             V                       V            V

And seven more of these rounds. and after those rounds the output transformation :

          |             |                       |            | 
          |             +-----------\/----------+            | 
          |             ____________/\___________            | 
          |             |                       |            | 
          V             V                       V            V 
   Z1_9->Mul     Z2_9->Add               Z3_9->Add    Z4_9->Mul 
          |             |                       |            | 
          Y1            Y2                      Y3           Y4

Key Schedule

The Z1_1,..Z6_1, Z1_2..Z6_2, ... Z1_9 Z4_9 are the subkeys for the rounds. They are derived from the 128 bit masterkey in the following way :

The 128 bit key is partitioned into 8 subblocks, that are directly used for the first 8 subkeys. (Keys Z1_1...Z1_6 Z2_1 Z2_2 ). The master key is then cyclicly rotated 25 positions to the left, and again partitioned into 8 blocks, which are used for the next 8 subkeys. This procedure is repeated until all 52 subkeys are generated.

Decryption with IDEA works exactly the same, only the subkeys are different. The subkeys for decryption are derived as follows from the encryption subkeys :

round 1  Z1_9^-1   -Z2_9   -Z3_9   Z4_9^-1   Z5_8   Z6_8
round 2  Z1_8^-1   -Z3_8   -Z2_8   Z4_8^-1   Z5_7   Z6_7
round 3  Z1_7^-1   -Z3_7   -Z2_7   Z4_7^-1   Z5_6   Z6_6 
... 
round 8  Z1_2^-1   -Z3_2   -Z2_2   Z4_2^-1   Z5_1   Z6_1
output   Z1_1^-1   -Z2_1   -Z3_1   Z4_1^-1

The Zn_m denote the same subkeys used in the encryption. The ^-1 means the multiplicative inverse (modulo 2^16+1), and the - is simply the negative of a subkey.

Security of IDEA

In general, it is very difficult to make definite pronouncements about the security of encryption algorithms. With the exception of the One Time Pad, no algorithm has any formal, mathematical proofs about its security. Some necessary (but not sufficient) conditions are known, and IDEA meets all those conditions. In addition to that, the longer an algorithm survives attacks without being broken, the more it is trusted. IDEA has now been a challenging target for cryptographers for 6 years now (1996).

Keylength

The length of the key used to encrypt data always puts an upper limit on the security of an algorithm. In principle, one could try all possible keys, and see which one correctly decrypts the data. This is called a "brute-force attack". The number of possible keys should be large enough to make this impossible. For IDEA, there are 2^128 keys, or about 3x10^38. If we had a computer capable of trying one billion keys per second (10^9/s) (this is beyond the capability of existing supercomputers) how long would it take to find one IDEA key?

There are 60x60x24x365 = 3.15x10^7 seconds in a year.

With 10^38 keys, it would take this hypothetical computer 3x10^38/(10^9 x 3x10^7) = 10^22 years to find one key.

If every person on earth had one such computer, it would 'only' take 10^22 / 5x10^9 = 2x10^12 years. This is hundreds of times longer than the age of the earth (4.5x10^9 years). So we can conclude that IDEA is quite safe from a brute-force attack.

Weak keys

Certain keys are weak when used for IDEA. Their use can be detected with little effort by a chosen plaintext attack. Although the number is these weak keys is large, the chance of choosing one of them randomly is very small, 1 in 2^96. They can be prevented by making a small change in the keyschedule of IDEA. Because the odds of choosing one of these weak keys are so small, and in addition to that it takes several chosen-plaintext encryptions to detect such a key, the risk of weak keys is negligible. I have mentioned them for completeness regarding what is known about IDEA's strengths and weaknesses.

Literature:

Bruce Schneier, Applied Cryptography 2nd edition John Wiley & Sons
Chapter three of Xuejia Lai's Ph.D. thesis, describing IDEA. Available as a postscript file on the Internet.
Weak keys for IDEA, paper by Joan Daemen, René Govaerts and Joos Vandewalle. Available as a postscript file on the Internet.

RSA Public-Key Cryptography

Introduction

RSA is one of the most well-known public-key cryptographic algorithms. The concept of public-key cryptography is rather different from the more familiar symmetric cryptography. In the case of symmetric cryptography, one key is used both for encryption and for decryption of the message. This means that in order to have an encrypted communication the key must be delivered somehow in a secure way. This can be done in person, before one of the parties leaves, or it can be done by courier, or in some other manner. It is obvious that there are many circumstances where the delivery of the key can be a great problem.

With public-key cryptography the whole situation is different. Here there are two keys, one public key and one private or secret key. The public key can be used to encrypt messages, but once the messages are encrypted with the public key, they can only be decrypted with the secret key.

Thus the public key can be transmitted in any manner, since it doesn't have to be secret. The only thing that must be kept secret is the private key, and that is easier since this key never has to be transmitted to anyone.

Public-key cryptography was invented in 1976 by Whitfield Diffie and Martin Hellman , and independently by Ralph Merkle. The basis of any public-key algorithm is a mathematical function that can easily be calculated, but that is very hard to reverse without some extra knowledge. The RSA public-key algorithm was invented in 1978 by Rivest, Shamir and Adleman, and thus derives its name from their initials. RSA relies on the difficulty of factoring a large number into into its prime components. Remember, a prime is a number that can only divided by 1 and by itself. If a number is not prime, but composite, it can be uniquely factored. For example, 55=5x11. However, when the number to be factored is not 55, but a few hundred digits long, this is a very hard problem.

How does RSA exactly work

First, we take two primes, p and q. We multiply them, and call their product n. Thus n = p x q . Now we find a number e that must be larger than 3, and relatively prime to (p-1)(q-1). This means that e and (p-1)(q-1) must have no divisors in common. For example, 15 and 32 are relatively prime. We also find a number d such that d x e = 1 mod (p-1)(q-1). We write the message M as a number, and perform :

C = M^e mod n.

C is now the encrypted message.

If we wish to decode the message, we calculate :

M = C^d mod n.

Example:

p=5, q =11, n=55, (p-1)(q-1) = 40. e must be between 3 and 40, and no common factors with 40. We choose e = 7. de = 1 mod (p-1)(q-1). d = 23, because 23*7 = 161 = 1 mod 40

Encryption: M = 2 => C = 2^7 mod 55 = 18

Decryption: M = C^d = 18 ^ 23 mod 55 = 18 ^1 * 18^2 * 18^4* 18^16 mod 55

doing the modulo-operation on each of the numbers

= 18 * 49 * 36 * 26 mod 55 = 2

Our public key consists of the numbers (e, n) and our secret key is (d, n).

The trick is that given p and q, d can easily be calculated, but without knowing p and q, d cannot be found.

The various operations that have to be done (modular exponentiation, modular inverses etc) can be done quite efficiently.

Are there enough primes? Couldn't someone just list all them?

It is no big problem to find large primes, and there are more than enough primes for everybody who wants to use PGP. Euclides already proved (around 300 B.C.) that there is an infinite number of primes, but sometimes people wonder if they don't get scarce as the numbers get large. Well, primes do get less dense as the numbers grow, but that is a very slow decline. The suggestion that is occasionally made if not somebody could list all primes below 2^256, or below 2^512 and thus readily factor PGP-keys with a 512 bit (2-256 bit primes) modulus, or even a 1024 modulus. When we see how many 256-bit primes there are, this suggestion is ridiculous.

The Prime Number Theorem states that there are on the order of N/lnN primes below the number N. With N 2^256, there are 6.5*10^74 primes below 2^256 (1.1*10^77). If we wish to know the number of primes between 2^256 and 2^255, we have to subtract the number of primes below 2^255 . There are about 3.2*10^74 primes below 2^255. And thus, there are about 3.3*10^74 primes of exactly 256 bits. As the number of particles in the Universe if estimated at around 10^80, it is obvious that running out of primes, or listing all the primes is totally out of the question even for 512 bit keys.

Finding such primes is not a very hard problem. This is a good thing, because if it was using PGP would be impossible ! To find such a prime, we just generate a random number of appropriate size, and test if it is a prime. If it isn't we try again until we succeed. There are several very fast algorithms that can test if a number is prime very quickly. The fastest ones are probabilistic. That is, they will never declare a prime number composite, but they will with a certain chance declare a composite number prime. But if we test the same number again and again against different bases, the chance that it is composite and yet not found by any of these test gets very small. There do exist exact primality tests (that prove for certain if a number is prime), but they take a lot more work. PGP first does a trial division with all primes up to 8191, and then uses the Fermat-test with bases 2, 3, 5, 7.

Digital Signatures

One of the nice things about the RSA algorithm is that it can also be used for signing a message. With a digital signature, the sender proves that it is undeniably sent by him/her. To do this with RSA, the sender simply performs the encryption with his secret exponent d. The receiver decodes the message with the public exponent e. The recipient (and everybody else who can receive the message) know that only the owner of the secret key could have made this signature. There is some danger in RSA-signing anything that is presented to you. An attacker could present you a message, that, when signed, would decode another message, or make you sign another document. This can be prevented by not signing the document itself, but the one-way hash or message digest of it. This is what PGP does. There is NO DANGER in PGP-signing a message. In addition to the extra security of signing the one-way hash or message digest of a document, and not the document itself, it is also much faster.

Security

Things that are not yet known : there are a few things that are not clear with regard to RSA (and many other public-key cryptography systems as well).

It has not been proven that breaking RSA is as hard as factoring n. Therefore, it just might be that there is some method for breaking RSA that does not require the factorization of a large number. And second, factoring itself has not been proven to be a truly hard problem. I use "hard problem" here in the mathematical sense. If time required to solve a problem grows as a linear function of the input, or as a polynomial function, a problem is considered tractable. For example, counting n objects takes in principle just n operations. But mean the workload rises as an exponential function of the number of inputs (like 2^n, or 10^n or worse) than the problem is considered intractable, because exponential functions grow so fast.

Factorization

Currently, it is possible to factor general numbers of some 130 digits long, or about 430 bits. Numbers of a special form are easier to factor. Here number of about 155 digits, like 2^512+1 can be factored.
In the case of PGP we are dealing with general numbers. It is believed that with a large effort, 512 bit keys can be factored now. The "large effort" should be seen in the order of several thousand workstations for many months .
One of the most frequently asked questions is "how long should my key be", or "how long would it take to break a <n> bit key ". Unfortunately, it is very hard to make even moderately accurate predictions about the time needed to factor a number that is much larger than those that have been factored up till now. And for predicting in the far future, we also have to take advances in computer power into account.

Just for those who wish to plug in numbers, here is the heuristic runtime formula (not proven to be correct) for the General Number Field Sieve: exp[(1.932+o(1))*(log n)^1/3* loglog(n)^(2/3)]

Log is the natural logarithm, and exp(x) = e^x = 2.718..^x

Here's a table from Applied Cryptography, referenced with an unpublished paper (as of Feb. 1995) by Andrew Odlyzko "Progress in Integer Factorization and Discrete Logarithms"

Mips years required to factor a number with the GNFS:



Bits Mips-years
 512   30,000
 768   2*10^8
1024   3*10^11
1280   1*10^14
1536   3*10^16
2048   3*10^20

Bits	Mips-years
512	30,000
768	2*10^8
1024	3*10^11
1280	1*10^14
1536	3*10^16
2048	3*10^20

In another table, making generous assumptions about the increase in computer power and assuming that factoring algorithms will be as fast the Special Number Field Sieve (which today can only be used on special numbers), Bruce Schneier gives the following keylenghts against various opponents :



Year   vs Individual vs Corporation vs Government
1995       768            1280         1536
2000       1024           1280         1536
2005       1280           1536         2048
2010       1280           1536         2048
2015       1536           2048         2048

Year	vs Individual	vs Corporation	vs Government
1995	768	1280	1536
2000	1024	1280	1536
2005	1280	1536	2048
2010	1280	1536	2048
2015	1536	2048	2048

Please note that there is lots of assumptions and handwaving in these numbers. Very few of the factorizations experts dare to make predictions beyond a few years.

Also keep in mind that there are, especially for Governments, other methods for obtaining your keys than spending billions of dollars and waiting one or more years for the factorization of your key. Burglary and electronic snooping (Tempest) are much more effective and much cheaper.

Timing Attack

Recently Paul Kocher invented an attack against most implementations of RSA. In this attack, one can deduce the secret exponent if the time it takes the processor to process a decryption can be measured very precisely. For a Pentium computer, we are talking at the microsecond level.

This attack is of little consequence for PGP, because anybody who can measure the time it takes your CPU to process a message with this accuracy, can also just copy the key from memory. The attack is no more dangerous than a virus or trojan designed to intercept your passphrase and secret key.

Literature:

Bruce Schneier, Applied Cryptography 2nd edition.
Paul Kocher: timing attack (http://www.cryptography.com/)

The following are references to online papers, mostly in Postscript. Some are highly technical. Those who still want to know more about the algorithms used will have to consult books and journals on paper.

More information about factoring and the two main algorithms used for large numbers:

Factoring Integers Using The Web And The Number Field Sieve - Arjen K. Lenstra: ftp://ftp.bellcore.com/pub/lenstra/nfs.ps.Z
The Magic Words Are Squeamish Ossifrage
Extended Abstract. Derek Atkins, Michael Graff, Arjen K. Lenstra, Paul C. Leyland. (net-wide factoring of 129 digit/426 bit number)
- ftp://ftp.bellcore.com/pub/lenstra/129.ps
- ftp://ftp.ox.ac.uk/pub/math/rsa129/rsa129.ps.gz
Factoring by Email - Arjen K. Lenstra, Mark S. Manasse: ftp://ftp.ox.ac.uk/pub/math/rsa129/fact_by_email.Z
Multiple Polynomial Quadratic Sieve (mpqs) sans_math handwaving explanation of the mpqs without the math - Paul Leyland ftp://ftp.ox.ac.uk/pub/math/rsa129/mpqs_sans_math.Z
Factoring integers with large prime variations of the quadratic sieve - H.J.J. Boender and H.J.J. te Riele: ftp://ftp.cwi.nl/pub/CWIreports/NW/NM-R9513.ps.Z
An implementation of the number field sieve - R.M. Huizing: ftp://ftp.cwi.nl/pub/CWIreports/NW//NM-R9511.ps.Z. Also: http://www.cwi.nl/cwi/publications/index.html
The Block Lanczos algorithm used to solve the HUGE matrices that arise when factoring such numbers:
A block Lanczos algorithm for finding dependencies over GF(2) - Peter L. Montgomery:
- ftp://ftp.cwi.nl/pub/pmontgom/BlockLanczos.psa4.Z (A4)
- ftp://ftp.cwi.nl/pub/pmontgom/BlockLanczos.psl.Z (letter)

The MD5 Message Digest function

Introduction

MD5 is the one-way hash function used by PGP. It was designed by Ronald Rivest. A one-way hash function is a function that operates on a message of arbitrary length, and that returns a fixed-length value. Thus: h = H(M), with H the one-way hash function, M the message and h the hash value for message M.

A simple checksum, that adds up all the all the bytes in a file and then truncates the result to last n digits, or a CRC (Cyclical Redundancy Check) are examples of hashes, but they are not one-way. In order to be one-way, the functions must also have the following properties :

Given M, it must be easy to compute h.
Given h, it must be hard to compute an M such that h=H(M)
Given M, it must be hard to find a M' such that H(M) = H(M')

A hash function is used to make a fingerprint of a document. If it was possible to find another document that has the same hash value as he first, this could be used commit fraud.

In some applications, one-wayness as defined above is not enough. The hash function must also be collision-resistant. That is, it must be hard to find two random M and M' that have the same hash-value.

In order to satisfy all the requirement, a hash value must have a certain length. It is obvious that as soon as we have documents that are longer than the output of the hash function, collisions must occur. This is called the pigeon-hole principle. If our hash function can output 4 distinct values, and we hash 5 documents, at least two documents will have the same hash value.

A related problem is posed by the 'birthday-attack'. This method is called so because it based on the often unknown fact, that if there are 23 people in the room, there's a 50% chance that two of them will have the same birthday.

This follows from the following calculation.

The chance that two people do not share their birthdays, is 364/365 The chance that three people ... is 364/365 * 363/365.

And so on, until we are certain that among 366 people at least two will the same birthday. But with 364/365 * 363/365 * ... * 343/365 ~= 0.5, we already have a 50/50 chance of two people NOT having the same birthday, and thus also a 50/50 chance of them sharing it.

It's the same with hashes. If we have a hash function with a length of n bits, we have a good chance of finding two messages with the same hash value if we try hashing 2^0.5*n random documents.

Internal workings of MD5

Now lets see how MD5 works. The message is processed in block of 512 bits. The message is first padded so that it is just 64 bits short of a multiple of 512 bits. This padding is adding a single 1 bit at the end of the message, and as many 0 bits as are required. Then the original message length (before the padding) is added as a 64 bit number.Now the message is an exact multiple of 512 bits. The blocks are processed as 16 32-bit chunks.

First 4 variables are initialized :

A=0x01234567
B=0x89abcdef 
C=0xfedcab98 
D=0x76543210

These four variables are now copied into the variables a, b, c, d and the main loop begins. This loop is run as long as there are message blocks.

                +-------------------------+
                |     Message Block       | 
                +-------------------------+
                 /     |           |     \ 
                /      |           |      \ 
               /       |           |       \ 
              /        |           |        \ 
             /         |           |         \ 
            /          |           |          \ 
      +-------+    +-------+    +-------+    +-------+
A-+---| round |--->| round |--->| round |--->| round |->Add-------------A
B-|+--|       |--->|       |--->|       |--->|       |---|->Add---------B
C-||+-|   1   |--->|   2   |--->|   3   |--->|   4   |---|---|->Add-----C
D-|||+|       |--->|       |--->|       |--->|       |---|---|---|->Add-D
  ||||+-------+    +-------+    +-------+    +-------+   |   |   |   |
  +|||---------------------------------------------------+   |   |   |
   +||-------------------------------------------------------+   |   |
    +|-----------------------------------------------------------+   |
     +---------------------------------------------------------------+

The final MD5-hash is are the A, B, C, D after the last message block is processed.

In the rounds there are 4 nonlinear functions :

F(X, Y, Z) = (X And Y) Or ( (Not X) And Z)
G(X, Y, Z) = (X And Z) Or ( Y And (Not Z))
H(X, Y, Z) = X Xor Y Xor Z
I(X, Y, Z) = Y Xor (X Or (Not Z))

One round looks like this :

 +------------------------------------------------------------------+
 |                                                                  |
 |  +----+                                                          |
 |  |    |                                                          |
 +->|  a |-------------------------+                                |
    |    |                         |                                |
    +----+                         |    M_j   t_i                   |
    |    |                         |     |     |                    |
    |  b |+------------------------|-----|-----|-----------------+  |
    |    | \                       |     |     |                 |  |
    +----+  \    __________        |     |     |                 |  |
    |    |   +->/Nonlinear \       V     V     |                 V  |
    |  c |----->| Function |---->Add-->Add-->Add-->Rotateleft->Add-+
    |    |   +->\__________/                
    +----+  /
    |    | /
    |  d |+
    |    |
    +----+

Now if we write this as :

FF(a,b,c,d,M_j,s,t_i) denotes a = b+((a+F(b,c,d) + M_j + t_i <<<s)
GG(a,b,c,d,M_j,s,t_i) denotes a = b+((a+G(b,c,d) + M_j + t_i <<<s)
HH(a,b,c,d,M_j,s,t_i) denotes a = b+((a+H(b,c,d) + M_j + t_i <<<s)
II(a,b,c,d,M_j,s,t_i) denotes a = b+((a+I(b,c,d) + M_j + t_i <<<s)

(<<<s is a rotating the bits to left over s positions)

  /* Round 1 */
  FF (a, b, c, d, M_0,  7, 0xd76aa478); /* 1 */
  FF (d, a, b, c, M_1, 12, 0xe8c7b756); /* 2 */
  FF (c, d, a, b, M_2, 17, 0x242070db); /* 3 */
  FF (b, c, d, a, M_3, 22, 0xc1bdceee); /* 4 */
  FF (a, b, c, d, M_4,  7, 0xf57c0faf); /* 5 */
  FF (d, a, b, c, M_5, 12, 0x4787c62a); /* 6 */
  FF (c, d, a, b, M_6, 17, 0xa8304613); /* 7 */
  FF (b, c, d, a, M_7, 22, 0xfd469501); /* 8 */
  FF (a, b, c, d, M_8,  7, 0x698098d8); /* 9 */
  FF (d, a, b, c, M_9, 12, 0x8b44f7af); /* 10 */
  FF (c, d, a, b, M_10,17, 0xffff5bb1); /* 11 */
  FF (b, c, d, a, M_11,22, 0x895cd7be); /* 12 */
  FF (a, b, c, d, M_12, 7, 0x6b901122); /* 13 */
  FF (d, a, b, c, M_13,12, 0xfd987193); /* 14 */
  FF (c, d, a, b, M_14,17, 0xa679438e); /* 15 */
  FF (b, c, d, a, M_15,22, 0x49b40821); /* 16 */

 /* Round 2 */
  GG (a, b, c, d, M_1,  5, 0xf61e2562); /* 17 */
  GG (d, a, b, c, M_6,  9, 0xc040b340); /* 18 */
  GG (c, d, a, b, M_11,14, 0x265e5a51); /* 19 */
  GG (b, c, d, a, M_0, 20, 0xe9b6c7aa); /* 20 */
  GG (a, b, c, d, M_5,  5, 0xd62f105d); /* 21 */
  GG (d, a, b, c, M_10, 9,  0x2441453); /* 22 */
  GG (c, d, a, b, M_15,14, 0xd8a1e681); /* 23 */
  GG (b, c, d, a, M_4, 20, 0xe7d3fbc8); /* 24 */
  GG (a, b, c, d, M_9,  5, 0x21e1cde6); /* 25 */
  GG (d, a, b, c, M_14, 9, 0xc33707d6); /* 26 */
  GG (c, d, a, b, M_3, 14, 0xf4d50d87); /* 27 */
  GG (b, c, d, a, M_8, 20, 0x455a14ed); /* 28 */
  GG (a, b, c, d, M_13, 5, 0xa9e3e905); /* 29 */
  GG (d, a, b, c, M_2,  9, 0xfcefa3f8); /* 30 */
  GG (c, d, a, b, M_7, 14, 0x676f02d9); /* 31 */
  GG (b, c, d, a, M_12,20, 0x8d2a4c8a); /* 32 */

  /* Round 3 */
  HH (a, b, c, d, M_5,  4, 0xfffa3942); /* 33 */
  HH (d, a, b, c, M_8, 11, 0x8771f681); /* 34 */
  HH (c, d, a, b, M_11,16, 0x6d9d6122); /* 35 */
  HH (b, c, d, a, M_14,23, 0xfde5380c); /* 36 */
  HH (a, b, c, d, M_1,  4, 0xa4beea44); /* 37 */
  HH (d, a, b, c, M_4, 11, 0x4bdecfa9); /* 38 */
  HH (c, d, a, b, M_7, 16, 0xf6bb4b60); /* 39 */
  HH (b, c, d, a, M_10,23, 0xbebfbc70); /* 40 */
  HH (a, b, c, d, M_13, 4, 0x289b7ec6); /* 41 */
  HH (d, a, b, c, M_0, 11, 0xeaa127fa); /* 42 */
  HH (c, d, a, b, M_3, 16, 0xd4ef3085); /* 43 */
  HH (b, c, d, a, M_6, 23,  0x4881d05); /* 44 */
  HH (a, b, c, d, M_9,  4, 0xd9d4d039); /* 45 */
  HH (d, a, b, c, M_12,11, 0xe6db99e5); /* 46 */
  HH (c, d, a, b, M_15,16, 0x1fa27cf8); /* 47 */
  HH (b, c, d, a, M_2, 23, 0xc4ac5665); /* 48 */

  /* Round 4 */
  II (a, b, c, d, M_0,  6, 0xf4292244); /* 49 */
  II (d, a, b, c, M_7, 10, 0x432aff97); /* 50 */
  II (c, d, a, b, M_14,15, 0xab9423a7); /* 51 */
  II (b, c, d, a, M_5, 21, 0xfc93a039); /* 52 */
  II (a, b, c, d, M_12, 6, 0x655b59c3); /* 53 */
  II (d, a, b, c, M_3, 10, 0x8f0ccc92); /* 54 */
  II (c, d, a, b, M_10,15, 0xffeff47d); /* 55 */
  II (b, c, d, a, M_1, 21, 0x85845dd1); /* 56 */
  II (a, b, c, d, M_8,  6, 0x6fa87e4f); /* 57 */
  II (d, a, b, c, M_15,10, 0xfe2ce6e0); /* 58 */
  II (c, d, a, b, M_6, 15, 0xa3014314); /* 59 */
  II (b, c, d, a, M_13,21, 0x4e0811a1); /* 60 */
  II (a, b, c, d, M_4,  6, 0xf7537e82); /* 61 */
  II (d, a, b, c, M_11,10, 0xbd3af235); /* 62 */
  II (c, d, a, b, M_2, 15, 0x2ad7d2bb); /* 63 */
  II (b, c, d, a, M_9, 21, 0xeb86d391); /* 64 */

The constant t_i are the integer parts of 2^32* abs(sin(i)), with i in radians.

Finally, as the last message block is processed, the numbers a, b, c, d form the MD5 hash for the file that has been processed.

Are you confused now? Well, at least the bits in the message are, and that's what it was all about.

Literature:

The reference implementation of MD5 can be found in RFC 1321.
Applied Cryptography 2nd edition, by Bruce Schneier

[ Table of Contents | About this FAQ | Glossary ]