Cryptography is the art of protecting and securing communication from potentially malicious third parties. It protects users from data breaches and ensures that their information is accessible only by intended recipients. Cryptography converts information into a generally meaningless (though sometimes meaningful) code, preventing third parties from making sense of private messages.

This conversion follows certain pre-defined steps and procedures that are collectively called an algorithm. An algorithm is a set of finite steps or a sequence of instructions that aims to solve a specific problem. In cryptography, algorithms convert inputs of an arbitrary length into outputs of a fixed-length hash, a process called cryptographic hash algorithms (“CHAs”).

Cryptographic hash algorithms have special properties that set them apart from regular hash algorithms. These key features are irreversibility, collision resistance, efficiency, and determinism.

Must-know Cryptographic Hash Algorithms

Although many CHAs are available, this article discusses some of the very first CHAs and some contemporary algorithms that are particularly efficient. Further, while this article does not focus on hashing algorithms used only for hashing passwords, it does focus on algorithms used often in cryptography for different purposes, such as digital signatures and message authentication. This article aims to provide a high-level understanding of how an individual CHA works, as well as the shortcomings of existing algorithms that have led to the development of new, more promising algorithms.

Message-Digest-5 (MD5) and Secure Hashing Algorithm-1 (SHA-1)

Message-Digest-5 (MD5) and Secure Hashing Algorithm-1 (SHA-1) are two of the most popular CHAs of their times. The numeric digit after their names represents the number of revisions previously made to the original algorithm they represent. Accordingly, MD5 is the fifth revision of the Message-Digest algorithm, which was designed by R.L. Rivest. SHA-1 is the first revision of the Secure Hashing Algorithm designed by the National Security Agency (“NSA”).

An overview of SHA-1

SHA-1 is a cryptographic hash algorithm designed by the NSA in 1995. Additionally, it is a digital signature algorithm. It produces a 160-bit long hash that was considered secure until 2017, when it was theoretically proven to be vulnerable to length extension attacks. Since then, SHA-1 is no longer considered secure and has been removed from the cryptographic hash function (“CHF”) category. However, it is still useful to understand SHA-1 and how it works.

SHA-1 bears some striking similarities to MD5 as far as its structure. This article focuses on SHA-1 because it acts as a mediator between MD5 and SHA-2. Its strengths help explain the weaknesses of MD5, and its weaknesses help explain the strengths of SHA-2.

SHA-1’s output of a 160-bit-long hash is usually rendered as a 40-digit hexadecimal code. This bit value, as its name suggests, combines 0s and 1s in what appears to be a completely random pattern, given the input. That random appearance is why these CHAs are called pseudo-random, which is one of the most important characteristics of any CHA. For example, the 40-digit hexadecimal number for “abcd” is “81fe8bfe87576c3ecb22426f8e57847382917acf.” A 40-digit hexadecimal number for “aacd” is “11b610435d3b0c855bda8ac33b9f721560d787a0.”

The beauty of this CHA is that its hash will remain the same as long as the input does not change. Even the slightest change to the input will completely alter the hash. Moreover, the length of the hash does not depend on the size of its input. It remains the same for a regular .txt file, a short video, or a complete movie. But how exactly does SHA-1 generate a hash?

Hash generation in SHA-1

SHA-1 has a compression function responsible for generating a hash. The primary purpose of this compression function is to combine two fixed-length inputs in order to produce a single fixed-length output with the same size as one of the inputs. Initially, one of the inputs will be the default internal state of SHA-1, set to a 160-bit random value, which is five 32-bit words (4-byte) each.

 

 

 

 

 

 

 

 

 

 

 

Though the default value appears random, it is not. Actually, it was initialized by some constants, based on the pre-defined standards of this algorithm. The length of this default value determines the length of the hash in SHA-1, which is why the output hash generated by SHA-1 is 160 bits long. The second input is the message. The compression function takes the message in 512-bit blocks and keeps setting an internal state until the full message is processed.

The generation of a hash can be understood from the figure below.

 

 

 

 

 

 

 

 

The internal state of SHA-1 is exactly copied in the compression function as one of its inputs. For the sake of simplicity, assume that the message is only 512 bits long. Therefore, the second input for this compression function is the 512-bit message.

Essentially, SHA-1 starts with an internal state, and then bits of the message (in 512-bit blocks) are brought in one at a time. The internal state is continually updated until no more bits of the message remain. Finally, the user may read the ending internal state—the resulting hash. Technically, the internal state is updated with the message until the message ends. This repetitive updating of the internal state with a compression function is called Merkle-Damgård construction. If the compression function works, it will inevitably make the SHA function more secure and efficient. That is primarily how the SHA family of hash functions works.

In the example case, since the message is 512 bits long, the loop will run only once before providing its output hash. However, 80 rounds of SHA compression are performed for a single run of the loop to generate this output.

 

 

 

 

 

 

 

 

But why 80 rounds? Why not 50 or 100? The logical answer to this question underscores the balance between security and efficiency. The more rounds of SHA compression, the more security, but the lower the algorithm’s efficiency. Therefore, a balance must always be maintained between the number of rounds and security. Efficiency comes at the expense of security. The designers of SHA-1 decided on 80 rounds because it was considered effective against cryptanalytic attacks. Indeed, it is particularly effective in preventing differential cryptoanalysis attacks against hashes and block ciphers.

But what if the length of the message is not a multiple of 512?

If the length of the message is not a multiple of 512, the message is either less than 512 bits long or longer than 512 bits long. For the first scenario, the message would need to be operated upon to become 512 bits long. However, for the second scenario, blocks of 512 bits would be sent to the compression function until the last block, which would be less than 512 bits long. These message blocks would be converted into a 512-bit block by implementing the concept of padding.

Suppose that a message is 48 bits long, as shown below:

  • 0010 0110 1100 0101 0010 0110 1100 0101 1100 0101

Append “1” at the very end of the message, and the 1 is followed by 0s.

  • 0010 0110 1100 0101 0010 0110 1100 0101 1100 0101 1000 0000 0000 0000….

The message should end with a 64-bit representation of the original message; 48 is equal to 0000 11. Therefore, the 512-bit message would resulting in the following:

  • 0010 0110 1100 0101 0010 0110 1100 0101 1100 0101 1000 0000 0000 0000…. 0000 11.

The message is then ready to be sent as an input for the compression function.

A quick overview of other CHAs

SHA-2

Unlike SHA-1, SHA-2 is not a single algorithm. Instead, it is a family of algorithms with six different hash functions—SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256.

SHA-256 generates a 256-bit hash or a 64-digit hexadecimal code. It is considered the most suitable choice since MD5 and SHA-1 are no longer secure. However, it is less efficient than either MD5 or SHA-1.

MD5

This message-digest algorithm generates a 128-bit hash. It was initially developed for cryptographic purposes. However, its security was compromised long ago. But it is still useful to verify data integrity against unintentional corruption. It is now widely used for primarily non-cryptographic purposes.