Hash functions are widely used for many purposes, but in the world of cryptography they are the core of message authentication and data integrity functions: they reduce a message to a relatively short digest, or checksum, to which some cryptographic operation involving a secret key is then applied.

Cryptographic algorithms such as DES and AES can be used to generate a hash and a cryptographic checksum at the same time by using a secret key, but here we are going to consider algorithms specifically designed as hash functions, as widely used in the creation of digital signatures. We will look at HMAC (Hash-based Message Authentication Code) and CMAC (Cipher-based Message Authentication Code) in another article.

In the era of modern cryptography, usually considered to begin when NIST (then the National Bureau of Standards) first published the DES algorithm in 1975, a number of hash algorithms have appeared and many have quickly fallen by the wayside. People generally assume that designing an algorithm for confidentiality is the most difficult task, followed by the design of hash functions, and last of all random numbers, which anybody can create. Reality is the reverse: creating good random numbers is extremely difficult, collision-free hash functions are very difficult, and encipherment algorithms, whilst difficult, are the easiest of the set.

Professor Ron Rivest, the ‘R’ in RSA, is one of the core pioneers of modern cryptography and introduced a number of hash functions (MD2, MD4, MD5) from 1989 onwards, in addition to his RCx (Ron’s Code) set of crypto algorithms. But as fast as he produced new hash algorithms, other cryptographers, and in particular Don Coppersmith, seemed to be very good at taking them apart. Although these MD hash algorithms are still widely used today, they would be deprecated for any new system design.

Things really changed when NIST introduced SHA (Secure Hash Algorithm; the original form is sometimes referred to as SHA-0) in 1993, followed soon afterwards in 1995 by SHA-1, after the discovery of an undisclosed weakness in SHA-0. Between 1998 and 2008 a number of analysts were able to show collisions in SHA-0 with complexity much lower than the expected 2^80, eventually as little as 2^33.6.

SHA-1 is widely used in many commercial systems and is probably the most widely deployed hash function in current cryptographic practice. So how much do these collisions matter? Do you need to panic and change your system tomorrow?

Hash functions need to satisfy a number of properties, such as uniformity and output that is indistinguishable from random, to ensure there is no short cut for manipulating the hash function, but the core properties of a hash function with an n-bit output are normally defined as:

1) Collision resistance, which means that the computational complexity of finding any two messages M1 and M2 with the same hash value should not be less than 2^(n/2), which for SHA-1 would be 2^80

2) Pre-image resistance, which means that, given the hash value H1 of a message M1, the computational complexity of finding any message M2 such that hash(M2) = H1 should not be less than 2^n

3) Second pre-image resistance, which means that, given a message M1, the computational complexity of finding a different message M2 such that hash(M2) = hash(M1) should not be less than 2^n
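
The 2^(n/2) collision bound in (1) comes from the birthday paradox: with roughly the square root of the output space's worth of random messages, a collision becomes likely. A minimal sketch in Python illustrates this on a toy 24-bit hash (here an arbitrary truncation of SHA-256, chosen purely so the search finishes quickly; real SHA-1 obviously cannot be attacked this way):

```python
import hashlib
import itertools

def truncated_hash(data: bytes, bits: int = 24) -> int:
    """Toy n-bit hash: the top `bits` bits of SHA-256."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def find_collision(bits: int = 24):
    """Birthday search: a collision is expected after roughly 2^(bits/2) tries."""
    seen = {}
    for i in itertools.count():
        msg = str(i).encode()
        h = truncated_hash(msg, bits)
        if h in seen:
            return seen[h], msg, i + 1  # the two colliding messages and the work done
        seen[h] = msg

m1, m2, tries = find_collision(24)
print(f"collision after {tries} messages; 2^(24/2) = 4096 is the rough expectation")
```

For a 24-bit output the search typically finishes after a few thousand messages, not the 2^24 (about 16 million) a brute-force pre-image search would need; scale n up to 160 and the same square-root saving is exactly why 2^80, not 2^160, is the collision benchmark for SHA-1.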

When you hear that a hash function has been broken, it generally means that somebody has found a computational complexity lower than the 2^(n/2) bound for basic collision resistance (as per (1) above), and indeed this is what has happened to SHA-1.

What this means is that an attacker with SHA-1 in hand could find two effectively random messages that result in the same hash digest with a computational complexity lower than that deemed acceptable, as per the table below.

However, in a commercial environment this is not normally the relevant attack when protecting, say, financial messages with digital signatures. In fact you might argue that the practical requirement is even more stringent than the second pre-image attack referred to above, because in general the messages are structured. Before a hash digest is even checked, the message must pass tests on its structure; you cannot expect the receiver of a message to accept any old random message, it has to look right. I am not aware of any attacks on SHA-1 that would expose a practical computational complexity of less than 2^n, which for SHA-1 is 2^160 and significantly in excess of the normal requirements.

There is a second ‘however’, and that is the fact that NIST has deprecated the use of SHA-1 in favour of SHA-2, actually a family of algorithms with output sizes of 224, 256, 384 and 512 bits. There is also a SHA-3 family, but that is relatively new and, as any seasoned cryptographer will tell you, should be given a little time to be poked and prodded before adoption. It gets a little worse, because many have argued that attacks derived from those on SHA-1 may succeed against the SHA-2 family.

But what does all this mean to you and me? If we understand how the hash function is being used and it isn’t a straight collision problem (i.e. a security vulnerability only arises if somebody is able to find two constructed messages with the same hash digest), then continuing to use SHA-1 really isn’t an immediate problem. You should nevertheless be planning to change, because determining the type of collision vulnerability is not always straightforward: digital certificates, for example, are vulnerable to a straight collision attack, as has been demonstrated when using the MD5 hash function. When it comes to security, planning for change is the fundamental requirement, and if you were designing a new system or upgrading an existing one you would plan to use SHA-2 with 256 bits.

As we have said before, the important thing with any security design is being able to make changes to algorithms and keys that are transparent to the users.
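
One common way to achieve that transparency is to record the algorithm name alongside each digest, so the verifier is never hard-wired to one hash function. The sketch below is a hypothetical illustration of this pattern (the function names and message are invented for the example, not part of any standard API):

```python
import hashlib

def make_digest(message: bytes, algorithm: str = "sha256") -> tuple[str, str]:
    """Tag the digest with the algorithm that produced it."""
    return algorithm, hashlib.new(algorithm, message).hexdigest()

def verify_digest(message: bytes, algorithm: str, digest: str) -> bool:
    """Recompute with whichever algorithm the stored record names."""
    return hashlib.new(algorithm, message).hexdigest() == digest

# A legacy record made with SHA-1 still verifies...
alg, d = make_digest(b"payment instruction", "sha1")
assert verify_digest(b"payment instruction", alg, d)

# ...while new records simply default to SHA-256, with no change to the verifier.
alg, d = make_digest(b"payment instruction")
assert verify_digest(b"payment instruction", alg, d)
```

With this shape, migrating from SHA-1 to SHA-2 is a configuration change rather than a redesign, which is exactly the property the paragraph above argues for.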

**Dr David Everett**