Cryptographic Hashing Functions

A cryptographic hashing function, sometimes called a message digest function, is an algorithm that takes a series of octets as input and produces a consistent, fixed-length binary value as output. Its important property is that it is computationally infeasible to take the output value and derive the original input. The resulting output value goes by many names, including hash value, one-way hash value, and message digest.

As an example, consider the words Hello, World. If one feeds that string into SHA-1, a widely used hashing function whose name stands for "Secure Hash Algorithm 1", the output is the value 0x907D14FB3AF2B0D4F18C2D46ABE8AEDCE17367BD. Any time that string is fed into SHA-1, it reliably produces the same output value.
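As a minimal sketch of this behavior, the following Python snippet computes a SHA-1 digest with the standard hashlib module. The helper name sha1_hex is hypothetical, and the string is assumed to be encoded as UTF-8 (which is identical to ASCII for this input).

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of text (UTF-8 encoded) as uppercase hex."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest().upper()


first = sha1_hex("Hello, World")
second = sha1_hex("Hello, World")

print("0x" + first)
print(first == second)  # True: the same input always yields the same digest
print(len(first) * 4)   # 160: SHA-1 output is always 160 bits long
```

Changing even one character of the input, for example appending an exclamation mark, yields an unrelated digest.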

Immediately, you may wonder how it is true that one cannot derive the input value when, in this example, we know that 0x907D14FB3AF2B0D4F18C2D46ABE8AEDCE17367BD corresponds to Hello, World. That would be a legitimate observation, but knowing the expected value is not the same as being able to derive an input value. Nonetheless, you can see how using a hashing function to do something such as storing passwords as hash values can be dangerous if not done properly (more on that later).

Many security applications, ranging from file encryption software to secure VoIP media transmission to cryptocurrencies to authentication systems, rely heavily on hashing functions. For example, file encryption software will use a hashing function as a core component to create a Message Authentication Code (MAC) value that is used to verify the integrity of encrypted data. If you place an audio/video call over the Internet that uses encryption, a hashing function is also used to provide message integrity for each packet transmitted.

As mentioned previously, cryptographic hashing functions are often used to record credentials (e.g., passwords) of users who wish to access a given system. However, since it is possible to know that 0x907D14FB3AF2B0D4F18C2D46ABE8AEDCE17367BD corresponds to Hello, World, hackers can exploit (and have exploited) such naive schemes by creating what are known as "rainbow tables" to facilitate cracking passwords. Rainbow tables are essentially precomputed hash values for millions of words, phrases, word combinations, etc. A hacker who gains access to a password file that relies only on a hashing function can look at a stored hash value and then look up the corresponding password in a rainbow table.
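The lookup idea can be sketched in a few lines of Python. This is a deliberately simplified illustration: it uses a plain dictionary of precomputed digests, whereas real rainbow tables use a more compact chain-based structure. The candidate list and the "leaked" hash are hypothetical.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of text as lowercase hex."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hypothetical candidate list; an attacker would use millions of entries.
candidates = ["password", "letmein", "Hello, World", "qwerty"]

# Precompute once: digest -> plaintext.
lookup = {sha1_hex(word): word for word in candidates}

# An unsalted hash as it might appear in a (hypothetical) leaked password file.
stored_hash = sha1_hex("Hello, World")

recovered = lookup.get(stored_hash)
print(recovered)  # Hello, World
```

The hash function itself is never inverted here; the attacker simply recognizes a digest that was computed in advance.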

To properly use a hashing function for software like authentication systems, one needs to do more than just apply a hashing function to the input. For a password file, a good solution is to use a random "salt" value and to hash the salt along with the user's password. Rather than calling H(password), one might call H(salt || password) (here || denotes concatenation). For example, if the salt is KTOk7f8EkAXw6E8M and the user's password is Hello, World, the resulting hash would be 0xDD67E8B397C3118378D6B498878F52E83507EB4E. Every stored password would use a different salt value, thwarting any attempt to use a rainbow table successfully. Additionally, if any two users chose the same password, the unique salt values would ensure that the hash computations produce different results. Thus, it would not be obvious when comparing hash values that two passwords were, in fact, the same. Good cryptographic hash functions produce wildly different output even if only one octet of the input is changed.
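A minimal sketch of the H(salt || password) scheme follows. The function names, the 16-octet salt length, and the choice of SHA-256 as H are assumptions made for illustration; this shows the idea described above rather than a complete password storage design.

```python
import hashlib
import hmac
import secrets


def hash_password(password: str, salt: bytes) -> bytes:
    """Compute H(salt || password), with SHA-256 assumed as H."""
    return hashlib.sha256(salt + password.encode("utf-8")).digest()


def new_record(password: str) -> tuple[bytes, bytes]:
    """Create a (salt, hash) pair using a fresh random salt."""
    salt = secrets.token_bytes(16)  # assumed salt length: 16 octets
    return salt, hash_password(password, salt)


def verify(password: str, salt: bytes, stored: bytes) -> bool:
    """Recompute the salted hash and compare it in constant time."""
    return hmac.compare_digest(hash_password(password, salt), stored)


# Two users who happen to pick the same password.
alice = new_record("Hello, World")
bob = new_record("Hello, World")

print(alice[1] != bob[1])              # True: different salts, different hashes
print(verify("Hello, World", *alice))  # True
print(verify("hello, world", *alice))  # False
```

The salt is stored next to the hash in the clear. It does not need to be secret; it only needs to differ for every stored password.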

A similar scheme is used when encrypting files to ensure data integrity, often using something like an HMAC. When sending audio/video packets in a conference call, engineers will take steps to ensure that every single packet will have a distinct input such that message integrity can be guaranteed. An HMAC or other such integrity value does not prevent message tampering, but it can be reliably used to detect tampering.
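Tamper detection with an HMAC can be sketched with Python's standard hmac module. The key, the packet layout, and the helper names below are hypothetical; the sequence number stands in for whatever per-packet value a real protocol would use to keep each input distinct.

```python
import hashlib
import hmac

# Hypothetical shared secret and packet contents.
key = b"hypothetical-shared-secret"
packet = b"seq=42;payload=audio-frame"


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA-256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def is_authentic(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


tag = make_tag(key, packet)
tampered = packet.replace(b"42", b"43")  # simulate modification in transit

print(is_authentic(key, packet, tag))    # True
print(is_authentic(key, tampered, tag))  # False
```

Nothing stops the packet from being altered in transit, but a receiver holding the key can tell that the tag no longer matches.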

There are a variety of hashing functions defined, including MD5, the RIPEMD family (RIPEMD, RIPEMD-128, RIPEMD-160, RIPEMD-256, RIPEMD-320), SHA-1, the SHA-2 family (SHA-224, SHA-256, SHA-384, and SHA-512), and the SHA-3 family of functions. There are many more, but those are examples of hashing functions often encountered.
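As a small illustration, several of these functions are available through Python's hashlib module under the names used below. The sketch prints the output size of each and assumes a standard Python build; which other algorithms (such as the RIPEMD family) are available depends on the underlying crypto library.

```python
import hashlib

message = b"Hello, World"

# Output length in bits for a few functions that hashlib always provides.
sizes = {}
for name in ("sha1", "sha256", "sha512", "sha3_256"):
    digest = hashlib.new(name, message).digest()
    sizes[name] = len(digest) * 8

print(sizes)  # {'sha1': 160, 'sha256': 256, 'sha512': 512, 'sha3_256': 256}
```

Each function always produces output of the same length, regardless of how long the input is.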