Cryptographic Hashing Functions
A cryptographic hashing function, sometimes called a message digest function, is an algorithm that takes a series of octets as input and produces a fixed-length binary value as output. The same input always produces the same output, and, importantly, it is not computationally feasible to take the output value and derive the original input. The output value goes by several names, including hash value, one-way hash value, and message digest.
As an example, consider the words "Hello, World". If one feeds that string into SHA-1, a widely used hashing function whose name stands for "Secure Hash Algorithm 1", the output would be the value 0x907D14FB3AF2B0D4F18C2D46ABE8AEDCE17367BD. Anytime that string is fed into SHA-1, it will reliably produce the same output value.
Immediately, you may wonder how it can be true that one cannot derive the input value when, in this example, we know that 0x907D14FB3AF2B0D4F18C2D46ABE8AEDCE17367BD corresponds to "Hello, World". That is a legitimate observation, but knowing the hash value of one particular input is not the same as being able to derive an input from an arbitrary hash value. Nonetheless, you can see how using a hashing function to do something such as storing passwords as hash values can be dangerous if not done properly (more on that later).
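The SHA-1 computation described above can be reproduced with a few lines of Python. This is a minimal sketch using the standard hashlib module; the helper name sha1_hex is chosen here for illustration, and the string is assumed to be encoded as UTF-8 before hashing, since a hashing function operates on octets rather than on text.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as an uppercase hexadecimal string."""
    return hashlib.sha1(data).hexdigest().upper()

# Assumption: the example string is encoded as UTF-8 octets.
message = "Hello, World".encode("utf-8")

first = sha1_hex(message)
second = sha1_hex(message)

print(first)             # hexadecimal digest, without the 0x prefix
print(first == second)   # True: the same input always yields the same digest
print(len(first) * 4)    # 160: a SHA-1 digest is always 160 bits long
```

Nothing in the module runs in the opposite direction: hashlib offers no call that takes a digest and returns the input.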
Many security applications ranging from file encryption software, to secure VoIP media transmission, to cryptocurrencies, to authentication systems rely heavily on hashing functions. For example, file encryption software will use a hashing function as a core component to create a Message Authentication Code (MAC) value to ensure integrity of encrypted data. If you place an audio/video call over the Internet that uses encryption, a hashing function is also used to provide message integrity functionality for each packet transmitted.
As mentioned previously, cryptographic hashing functions are often used to record credentials (e.g., passwords) of users who wish to access a given system. However, since it is possible to know that 0x907D14FB3AF2B0D4F18C2D46ABE8AEDCE17367BD corresponds to "Hello, World", hackers can exploit, and have exploited, such naive schemes by creating what are known as "rainbow tables" to facilitate cracking passwords. Rainbow tables are basically pre-generated hash values for millions of words, phrases, word combinations, etc. Having gained access to a password file that relies only on a hashing function, a hacker can look at a stored hash value and then look up the corresponding password in a rainbow table.
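The lookup attack can be illustrated with a deliberately simplified sketch. Real rainbow tables use chains of hash and reduction steps to save space; the plain dictionary below is only a precomputed lookup table, which is enough to show why unsalted hashes are risky. The candidate word list and the function names are hypothetical.

```python
import hashlib
from typing import Optional

def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of a UTF-8 encoded string as lowercase hex."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# Hypothetical list of likely passwords; a real attacker would use millions.
candidates = ["password", "letmein", "Hello, World", "qwerty"]

# Precompute hash -> password once; every later lookup is then trivial.
lookup = {sha1_hex(word): word for word in candidates}

def crack(stored_hash: str) -> Optional[str]:
    """Return the password for stored_hash if it was precomputed, else None."""
    return lookup.get(stored_hash.lower())

stolen = sha1_hex("Hello, World")           # what an unsalted password file would hold
print(crack(stolen))                        # Hello, World
print(crack(sha1_hex("not in the list")))   # None
```

The table is built once and can be reused against every password file that stores bare hash values.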
To properly use a hashing function for software like authentication systems, one needs to do more than just apply a hashing function to the input. For a password file, a good solution is to use a random "salt" value and to hash the salt along with the user's password. Rather than calling H(password), one might call H(salt || password), where || denotes concatenation. For example, if the salt is KTOk7f8EkAXw6E8M and the user's password is "Hello, World", the resulting hash would be 0xDD67E8B397C3118378D6B498878F52E83507EB4E. The salt is not secret and is stored alongside the hash so that the same computation can be repeated when the user logs in. Every stored password would use a different salt value, thwarting any attempt to successfully use a rainbow table. Additionally, if any two users chose the same password, the unique salt values would ensure that the resulting hash computations produce different results. Thus, it would not be obvious when comparing different hash values that two passwords were, in fact, the same.
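The H(salt || password) scheme described above might be sketched as follows. The function names, the 16-octet salt length, and the UTF-8 encoding are assumptions made for illustration, and SHA-1 is used only to stay consistent with the earlier example. Production systems generally go further and use a deliberately slow password hashing function, so this should be read as a demonstration of salting rather than a recommended design.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest), where digest = SHA-1(salt || password)."""
    if salt is None:
        salt = secrets.token_bytes(16)  # fresh random salt for each stored password
    digest = hashlib.sha1(salt + password.encode("utf-8")).digest()
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the salted hash and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

# Two users pick the same password but receive different salts.
salt_a, hash_a = hash_password("Hello, World")
salt_b, hash_b = hash_password("Hello, World")

print(hash_a != hash_b)                                 # True: stored values differ
print(verify_password("Hello, World", salt_a, hash_a))  # True
print(verify_password("hello, world", salt_a, hash_a))  # False
```

Because the salt is stored next to the hash, verification only needs the submitted password and the stored pair.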
Good cryptographic hash functions produce wildly different output even if only one octet of the input is changed.
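This sensitivity to small input changes can be observed directly. The sketch below, with a hypothetical helper named bit_difference, hashes two strings that differ only in their final octet and counts how many of the 160 output bits differ. For a well-behaved hash function one would expect roughly half of them to change, although the exact count depends on the inputs.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = b"Hello, World"
modified = b"Hello, Worle"  # only the final octet differs ('d' became 'e')

digest_original = hashlib.sha1(original).digest()
digest_modified = hashlib.sha1(modified).digest()

changed = bit_difference(digest_original, digest_modified)
total = len(digest_original) * 8
print(f"{changed} of {total} output bits differ")
```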
A similar scheme is used when encrypting files to ensure data integrity, often using something like an HMAC. When sending audio/video packets in a conference call, engineers will take steps to ensure that every single packet will have a distinct input such that message integrity can be guaranteed. An HMAC or other such integrity value does not prevent message tampering, but it can be reliably used to detect tampering.
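As a rough illustration of detecting tampering with an HMAC, the sketch below uses Python's standard hmac module with SHA-256. The key, the packet layout with a sequence number, and the function names are assumptions made for this example and do not describe any particular protocol.

```python
import hashlib
import hmac
import secrets

def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA-256 tag over message using the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()

def is_authentic(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it to the received one in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)

# Hypothetical shared secret known only to sender and receiver.
key = secrets.token_bytes(32)

# Hypothetical packet; the sequence number makes each packet's input distinct.
packet = b"sequence=1;payload=audio-frame"
tag = make_tag(key, packet)

tampered = b"sequence=1;payload=AUDIO-frame"
print(is_authentic(key, packet, tag))    # True: packet arrived unmodified
print(is_authentic(key, tampered, tag))  # False: the change is detected
```

The receiver cannot stop a packet from being altered in transit, but any alteration causes the recomputed tag to differ from the one that was sent.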
There are a variety of hashing functions defined, including MD5, the RIPEMD family (RIPEMD, RIPEMD-128, RIPEMD-160, RIPEMD-256, RIPEMD-320), SHA-1, the SHA-2 family (SHA-224, SHA-256, SHA-384, and SHA-512), and the SHA-3 family of functions. There are many more, but those are examples of hashing functions often encountered.
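Several of these functions are available through Python's hashlib module, which makes it easy to compare their output lengths. The sketch below is limited to SHA-1, the SHA-2 functions listed above, and one SHA-3 variant; MD5 and RIPEMD-160 are left out because their availability can depend on how the underlying cryptographic library was built.

```python
import hashlib

# Names as understood by hashlib.new(); all are in hashlib.algorithms_guaranteed.
names = ["sha1", "sha224", "sha256", "sha384", "sha512", "sha3_256"]
message = b"Hello, World"

sizes = {}
for name in names:
    digest = hashlib.new(name, message).digest()
    sizes[name] = len(digest) * 8  # digest length in bits
    print(f"{name:>8}: {sizes[name]} bits")
```

Each function produces an output of fixed length regardless of how long the input is.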