Before diving into the details of a Cryptographic Hash Function (CHF), it is important to first understand the most primary and originating idea behind this concept (i.e., Cryptography) because Hash functions may have been here since the late 1970s (Preneel, 2010) but cryptography is as old as Julius Caesar i.e. 100 B.C. (Redhat, 2019, n.d.). In its embryonic state, what sparked as a simple yet revolutionary idea of hiding military messages from the enemies back in 100 BC, matured into a complex and intricate idea as a CHF.
In short, a CHF is an algorithm that takes variable-sized data as an input to generate a fixed-sized output. Of course, this is a simplification. To fully understand a CHF, one must first comprehend the basic idea of cryptography and some other associated terms—such as encryption and hashing.
Cryptography is the study of the ideas, methods, techniques, and strategies, that can be used to encode a message, or as a matter of fact, any piece of information or data, to hide its original and intended meaning. In primitive times, having secure communication channels was paramount for the survival of any kingdom, especially during war. To prevent the enemies from obtaining sensitive information, messages were encrypted so that even if an enemy managed to acquire the missive, they couldn’t decode it without the cipher key, at least not instantly, to get the original message. However, the encryption technique used back then was simple and is categorized as symmetric key encryption.
In this type of encryption, a single key—generally referred to as a cipher key—is used for both the encoding and decoding a message. For an example of symmetric key encryption, read the text below.
Zpv gpdvt po uif usjwibm boe mptf tjgiu pg xibu jt nptu jnqpsubot.
Did you able to make sense out of this meaningful message?
It wouldn’t be a stretch to assume that a toddler, who has just learned to write the alphabets, has written the text above. It doesn’t make sense.
However, what if I told you to shift every character one space back and then see what you get? (Hint: replace z with y, p with o and, v with u). With the cipher key revealed, one can easily decode the message to unlock the original text:
You focus on the trivial and lose sight of what is most important.
Simple, isn’t it? This was the most basic example of symmetric key encryption where we had 1 as our cipher key. We first used it to encrypt the message by shifting every character one step forward and then used this same key to decrypt the message by shifting each character one step backward.
However, as human intelligence evolved, using a single key for both encryption and decryption became trivial because the enemies were always just one key behind cracking our messages and this was exactly what happened with the notorious Enigma back in World War II. Therefore, a more sophisticated way of encryption was required to make communication even more secured.
Unlike Symmetric Encryption, Asymmetric Encryption does not rely on a single key. Rather, it uses a combination of two keys, one for encrypting the message and the other for decrypting the code. Thanks to Hollywood movies, most of us are familiar with this concept:
The protagonist visits a bank to access her locker. Two keys are required to unlock the locker. One key remains in the custody of the bank and the other one remains in the custody of the owner of the locker. Therefore, even if someone from within the bank manages to get the bank’s key, they will still be unable to open the locker because the owner’s key is also required to open the locker.
This is asymmetric encryption. By using a pair of keys, asymmetric encryption is more secure than symmetric encryption and more difficult to crack. Revolutionary technologies, such as Blockchain, use asymmetric encryption to ensure the security and privacy of the transactions.
Now that we know the basics of cryptography, encryption, and its types, we are good to dive into the hash functions. Although hashing and encryption share some similarities, it is pertinent to clarify that the two are entirely different animals. Encryption is a two-way function. Hashing, on the other hand, is a one-way function in the sense that once encrypted, a hash value cannot be decrypted.
A hash function is used to map the data of arbitrary size to generate an output of a fixed size, usually called the Hash Digest. However, if this hash function satisfies some well-established standards of security, integrity, and other conventions of similar scope, it can be called a Cryptographic Hash Function (Sobti & Ganesan, 2012).
Following are certain features of a CHF that distinguish it from a simple Hash function, which usually focuses on staying collision free only.
- Irreversible: Unlike encryption, Cryptographic Hash Functions are one-way. Once encrypted, you can never decrypt them even if you have the exact hashing algorithm that was used for the encryption. Simply put, there is no decryption in CHFs. Hence, they are irreversible.
- Collision resistant: It is probabilistically impossible to have the same output digest for two different inputs. It doesn’t matter if the inputs are differentiated by one character or thousands of characters, the output digest will be entirely different. This feature of a CHF, where even a small change results in an entirely different hash, is termed as the Avalanche Effect.
- Pre-image resistance: Even if you have the hashed digest, it is probabilistically impossible to find the input that resulted into that hash value. For example, if I have an output of 100, you can never guess the inputs with 100% surety that resulted into this output.
- Second pre-image resistance: Even if you have the input, the hash algorithm, and the output digest, you may never get exactly the same output for a different input.
- Deterministic: CHFs are deterministic in the sense that you will always get exactly the same output for a single input.
- Fast: They are insanely fast and efficient because they largely rely on bitwise operations that are quickly computable.
- We have SHA-1 (Secure Hashing Algorithm) CHF that generates a 40-character hexadecimal output digest for the input of any length. It was designed by NSA back in 1995 and was widely used until 2017 when it was theoretically proved that it is prone to length extension attacks. Since then, it has been removed from the category of CHF and is no more considered secure enough to be a CHF.
- We have SHA-2 family that has 6 different hash functions including SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256
These hash functions are being widely used to ensure the integrity of messages and documents that are shared over the internet. A sender generates a hash digest for a document which he/she sends to the receiver along with the document. The receiver can then generate the hash value of that document using the same hashing algorithm to verify if he/she gets the same hash digest or not. If both hash values are the same, then the document hasn’t been tampered with.
Similarly, most of the websites that are following modern and secured techniques to ensure the password security of the users don’t directly store the user passwords or sensitive information in their database. Instead, they store the equivalent hash values. Therefore, even in case of a hack or the database gets compromised due to any other reason, the sensitive information of users remains secure because the hackers would only get the hash values that will never result into the actual input because of the pre-image resistance of CHFs.
Preneel, B. (2010). The First 30 Years of Cryptographic Hash Functions and the NIST SHA-3 Competition. In J. Pieprzyk (Ed.), Topics in Cryptology—CT-RSA 2010 (pp. 1–14). Springer. https://doi.org/10.1007/978-3-642-11925-5_1
Redhat, 2019. (n.d.). Red Hat Customer Portal. Retrieved October 25, 2020, from https://access.redhat.com/blogs/766093/posts/1976023
Sobti, R., & Ganesan, G. (2012). Cryptographic Hash Functions: A Review. International Journal of Computer Science Issues, ISSN (Online): 1694-0814, Vol 9, 461–479.