Cryptographic Hash Algorithms: An Introduction

Cryptographic Hash Algorithms

Cryptography is an art of protecting and securing the communication from third parties who may or may not be our well-wishers. In any case, cryptography rescues us from being a victim of data breach and ensures that the information is accessible by only the intended receivers. It warrants this by converting information into a meaningless (sometimes meaningful) code making it difficult for a third party to make sense of our intended message. This conversion is done by following certain pre-defined steps and procedures that are collectively termed as Algorithm.  An algorithm is a set of finite steps and sequence of instructions that aimed at solving a specific problem. In our case, the algorithms which are used to convert the input of an arbitrary length to an output of a fixed length hash, are called cryptographic hash algorithms (CHA).

The cryptographic hash algorithms have certain properties in addition to what a regular hash algorithm has. When the features of Irreversibility, collision resistance, efficiency, and deterministic, added to a normal hash algorithm, it no longer remains a regular hash algorithm. It now becomes a more superior hashing algorithm called cryptographic hash algorithm.

Scope of this reading:

Although there are many CHAs, we will restrict our discussion to some of the very first CHAs along with contemporary more efficient algorithms. Further narrowing the scope of this reading, we will not be going into the details of such hashing algorithms that may only be used for hashing passwords because they tend to be very slow since we need them to be secure. In short, we will be focusing on algorithms that are being used quite frequently in cryptography for different purposes such as digital signature and message authentication. We will be steering our discussion in such a manner that it will not only enable our readers to understand the working of individual CHA but it will also provide them with hints to contextualize the shortcomings of an existing algorithm that led to the development of next, more promising, algorithm.

Message Digest-5 (MD5) and Secure Hashing Algorithm-1 (SHA-1)

MD5 and SHA-1 were two of the most popular CHAs of their times. The numeric digit after their names represents their position in reference to the number of revision that was made in the original algorithm to formulate that one. Therefore, MD5 was the fifth revision of the Message Digest Algorithm that was designed by R.L. Rivest and SHA-1 was the first revision of Secure Hashing Algorithm designed by National Security Agency (NSA).

An Overview of SHA-1

SHA-1 is a cryptographic hash algorithm that was designed by National Security Agency (NSA) back in 1995 and was added into the list of Digital Signature Algorithm. It outputs a 160-bit long hash that was considered to be secure until 2017 when it was theoretically proved that it is prone to length extension attacks. Since then, SHA-1 has been removed from the category of CHF and is no more considered secure enough to be a CHF. However, leaving that security breach discussion for another time, lets understand what SHA-1 is and how it works.

SHA-1 bears striking similarities to MD4 and MD5 as far as the structure is concerned. In this article, we have decided to explore SHA-1 because it acts as a mediator between MD5 and SHA-2 in the sense that its strengths can be studied in reference to the weaknesses of MD5 and its weaknesses can be studied in reference to the strengths of SHA-2.

As mentioned earlier, SHA-1 produces an output of 160-bit long hash which is usually rendered as a 40-digits long hexadecimal code. This bit value, as the name suggests, is a combination of 0s and 1s that appear to be completely random if we observe them in combination with the given input. That is why these CHAs are called pseudo-randomand it is one of the most important characteristics of any CHA.

For example, the 40 digits hexadecimal number for “abcd” is: 81fe8bfe87576c3ecb22426f8e57847382917acf

and a 40 digits long hexadecimal number for “aacd” is: 11b610435d3b0c855bda8ac33b9f721560d787a0)

The beauty of this CHA lies in the fact that the above-mentioned hash will remain the same as long as the input is not changed. Even the slightest of the changes in the input will completely alter the hash. Moreover, the length of hash does not depend on the size of input. It will remain same for a regular .txt file, a short video, or a complete movie.

But how exactly does SHA-1 generate the hash?

Hash Generation in SHA-1

SHA-1 has a compression function that is responsible for the generation of hash.  The primary purpose of this compression function is to combine two fixed length inputs in order to produce a single fixed length output having same size as one of the inputs. Initially, one of the inputs will be the default internal state of SHA-1 and it is set to a 160-bit long random value which is five 32-bit words (4-byte) each.

Though appear random, this default value isn’t so random. Actually, it is initialized by some constants based on certain pre-defined standards of this algorithm. The length of this default value determines the length of our hash in SHA-1 and that is why the output hash generated by SHA-1 is 160-bits long.  The second input will be our message whose hash we want to generate. The compression function takes in the message in the form of a 512-bits block at a time and keeps on setting the internal state until the message is expended.

Let’s understand the generation of hash from the below figure.

 

The internal state of SHA-1 is exactly copied in the compression function as one of its inputs. For the sake of simplicity, we have assumed that our message is only 512-bits long. Therefore, the second input to this compression function will be our 512-bits long message.

Essentially, SHA-1 starts with an internal state and then we bring in bits of our message (512-bits block) one at a time and we will keep on updating the internal state until no more message is left. At the end, we just read what the internal state is and that will basically be our hash. Technically, we are just taking the internal state and keep on updating it with the message until the message ends. This repetitive updating of internal state with a compression function is called MERKEL DOWN GUARD CONSTRUCTION. If the compression function is good, it will inevitably make our SHA function more secure and efficient. That is how the SHA family of hash functions primarily works.

In our case, since the message was 512-bits long, the loop will run only once to give the output hash. However, 80 rounds of SHA compression function will be performed for single run of loop to generate this output.

But why 80 rounds? Why not 50 or 100?

The logical answer to this question underlines the importance of a balance between security and efficiency. The more the number of rounds, the more will be the security but the lesser will be the efficiency of the algorithm. Therefore, there will always be a balance that needs to be maintained between the number of rounds and the security. Efficiency comes at the expense of security. The designers of SHA-1 decided to go for 80 rounds because this number of rounds was considered good against cryptanalytic attacks. In fact, It is rather more popular for preventing differential cryptoanalysis attacks that are used for attacking hashes and block ciphers

But What if the length of our message is not a multiple of 512?

In that case, either we will have a message less than 512-bits long or a fairly longer message having a length that is not a multiple of 512. For first scenario, our message is ready to be operated upon in order to become a 512-bit long. However, for the second case, we will keep on sending blocks of 512-bits to our compression function until we are left with last block which will be less than 512-bits. This message blocks will be converted to a 512-bit block by implementing the concept of Padding.

 

Suppose that our message is a 48-bit long message shown below:

  • 0010 0110 1100 0101 0010 0110 1100 0101 1100 0101

Append 1 at the very end of your message and then this 1 will be followed by 0s.

  • 0010 0110 1100 0101 0010 0110 1100 0101 1100 0101 1000 0000 0000 0000….

The message should end with 64-bit representation of our original message. 48 is equal to 0000 11. Therefore, the 512-bit long message will be something like this:

  • 0010 0110 1100 0101 0010 0110 1100 0101 1100 0101 1000 0000 0000 0000…. 0000 11.

 

Now it is ready to be sent as an input to our compression function.

 

SHA-2

Unlike SHA-1, SHA-2 is not a single algorithm. Rather, it is a family of algorithms having six different hash functions including SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256.

SHA-256 generates a 256-bit long hash or a 64 digits long hexadecimal code. it is considered to be the most suitable choice since MD5, and SHA-1 are no more secure. However, it is less efficient than either of MD5 and SHA-1.

MD5

Message-digest algorithm generates a 128-bit hash. It was initially developed for cryptographic purposes. However, its security was compromised long ago. Though it is still useful as a checksum for verification of data integrity but that too against an unintentional corruption. Primarily, it is now widely used for non-cryptographic purposes.

Though appear random, this default value isn’t so random. Actually, it is initialized by some constants based on certain pre-defined standards of this algorithm. The length of this default value determines the length of our hash in SHA-1 and that is why the output hash generated by SHA-1 is 160-bits long.  The second input will be our message whose hash we want to generate. The compression function takes in the message in the form of a 512-bits block at a time and keeps on setting the internal state until the message is expended.

Let’s understand the generation of hash from the below figure.

 

The internal state of SHA-1 is exactly copied in the compression function as one of its inputs. For the sake of simplicity, we have assumed that our message is only 512-bits long. Therefore, the second input to this compression function will be our 512-bits long message.

Essentially, SHA-1 starts with an internal state and then we bring in bits of our message (512-bits block) one at a time and we will keep on updating the internal state until no more message is left. At the end, we just read what the internal state is and that will basically be our hash. Technically, we are just taking the internal state and keep on updating it with the message until the message ends. This repetitive updating of internal state with a compression function is called MERKEL DOWN GUARD CONSTRUCTION. If the compression function is good, it will inevitably make our SHA function more secure and efficient. That is how the SHA family of hash functions primarily works.

In our case, since the message was 512-bits long, the loop will run only once to give the output hash. However, 80 rounds of SHA compression function will be performed for single run of loop to generate this output.

But why 80 rounds? Why not 50 or 100?

The logical answer to this question underlines the importance of a balance between security and efficiency. The more the number of rounds, the more will be the security but the lesser will be the efficiency of the algorithm. Therefore, there will always be a balance that needs to be maintained between the number of rounds and the security. Efficiency comes at the expense of security. The designers of SHA-1 decided to go for 80 rounds because this number of rounds was considered good against cryptanalytic attacks. In fact, It is rather more popular for preventing differential cryptoanalysis attacks that are used for attacking hashes and block ciphers

But What if the length of our message is not a multiple of 512?

In that case, either we will have a message less than 512-bits long or a fairly longer message having a length that is not a multiple of 512. For first scenario, our message is ready to be operated upon in order to become a 512-bit long. However, for the second case, we will keep on sending blocks of 512-bits to our compression function until we are left with last block which will be less than 512-bits. This message blocks will be converted to a 512-bit block by implementing the concept of Padding.

 

Suppose that our message is a 48-bit long message shown below:

  • 0010 0110 1100 0101 0010 0110 1100 0101 1100 0101

Append 1 at the very end of your message and then this 1 will be followed by 0s.

  • 0010 0110 1100 0101 0010 0110 1100 0101 1100 0101 1000 0000 0000 0000….

The message should end with 64-bit representation of our original message. 48 is equal to 0000 11. Therefore, the 512-bit long message will be something like this:

  • 0010 0110 1100 0101 0010 0110 1100 0101 1100 0101 1000 0000 0000 0000…. 0000 11.

 

Now it is ready to be sent as an input to our compression function.

 

SHA-2

Unlike SHA-1, SHA-2 is not a single algorithm. Rather, it is a family of algorithms having six different hash functions including SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256.

SHA-256 generates a 256-bit long hash or a 64 digits long hexadecimal code. it is considered to be the most suitable choice since MD5, and SHA-1 are no more secure. However, it is less efficient than either of MD5 and SHA-1.

MD5

Message-digest algorithm generates a 128-bit hash. It was initially developed for cryptographic purposes. However, its security was compromised long ago. Though it is still useful as a checksum for verification of data integrity but that too against an unintentional corruption. Primarily, it is now widely used for non-cryptographic purposes.