Hash Collisions Explained

In computer science, a hash collision occurs when a hashing algorithm produces the same hash value for two distinct pieces of data.

Hashing algorithms are often used to detect whether a digital message was altered on its way to the recipient.

A hash sent alongside a message lets the recipient check that the message arrived unchanged, although the hash alone does not keep the message secret or stop a third party from intercepting it. In computer science, hashing is a common practice used for a variety of purposes including cryptography, data indexing, and integrity checking.

Both hashing and encryption transform data into a different format. However, encryption can be reversed with the right key, whereas hashing uses a one-way mathematical formula called a hash function to map a value of any size to a value of fixed size.

What Is a Hashing Algorithm?

A hashing algorithm is a mathematical formula that takes a given input of data and generates a value of a fixed length called a hash value. The hash value acts as a summary representation of the original value.
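The fixed-length property can be seen with a minimal sketch using Python's standard hashlib module. SHA-256 is chosen here only as an illustration, and the helper name sha256_hex is hypothetical.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

short_digest = sha256_hex("Pass1234")
long_digest = sha256_hex("Pass1234" * 1000)

# Both digests are 64 hex characters (256 bits), regardless of input length.
print(len(short_digest), len(long_digest))

# Changing a single character of the input yields a different digest.
print(sha256_hex("Pass1234") != sha256_hex("Pass1235"))
```

An 8-character input and an 8,000-character input both produce a hash value of the same size.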

For example, if a user’s password is “Pass1234,” the computer will use a hash function to map “Pass1234” to a hash value of a fixed size such as “01.” Real hash values are much longer, but the two-digit value keeps the example simple.

Think of the possible hash values as a series of numbered boxes ranging from one to one hundred, where the hash function decides which box each password belongs to. The password itself is protected because the computer only needs to store the number of the box, that is, the hash value, instead of remembering the entire string of characters.

Likewise, whenever “Pass1234” is entered, the computer hashes the input, checks whether the result matches the stored hash value, and grants access accordingly.
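The check described above can be sketched as follows. This is an illustration under stated assumptions, not how any particular system works: the salt, the PBKDF2 key-derivation function, the iteration count, and the function names hash_password and verify_password are all additions that do not appear in this article.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen for illustration only

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a fixed-length hash from the password and a per-user salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def verify_password(candidate: str, salt: bytes, stored_hash: bytes) -> bool:
    """Hash the candidate and compare it to the stored hash in constant time."""
    return hmac.compare_digest(hash_password(candidate, salt), stored_hash)

# At sign-up, only the salt and the hash are stored, never the password.
salt = os.urandom(16)
stored_hash = hash_password("Pass1234", salt)

# At login, the entered value is hashed and compared to the stored hash.
print(verify_password("Pass1234", salt, stored_hash))  # True
print(verify_password("pass", salt, stored_hash))      # False
```

The computer never needs the original password after sign-up. It only compares hash values.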

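This scheme is secure only if an attacker cannot find another input that also produces the “01” hash value. Because there are far more possible inputs than hash values, other such inputs always exist; a real hash function uses outputs long enough that finding one is impractical.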

What Is a Hash Collision?

A hash collision occurs when a hash algorithm produces the same hash value for two different input values.

For instance, a collision would occur in the above example if the hashing algorithm produced a hash value of “01” both for the “Pass1234” password and for an unrelated value such as “pass.”

If such a collision occurs, attackers can trick the computer into erroneously giving them access by logging in with any input that produces the same hash as the original password. The colliding input does not need to resemble the original password.
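To see why a hash value as short as “01” is unsafe, consider a deliberately weak toy hash with only 100 possible outputs. The function toy_hash below is hypothetical and exists only for this illustration; it reduces a SHA-256 digest modulo 100 to mimic the two-digit example. A simple search finds a colliding input almost immediately.

```python
import hashlib
from itertools import count

def toy_hash(text: str) -> str:
    """Hypothetical two-digit hash: a SHA-256 digest reduced modulo 100."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return f"{int.from_bytes(digest, 'big') % 100:02d}"

def find_collision(target: str) -> str:
    """Try "guess0", "guess1", ... until one hashes to the same value as target."""
    wanted = toy_hash(target)
    for i in count():
        candidate = f"guess{i}"
        if candidate != target and toy_hash(candidate) == wanted:
            return candidate

colliding_input = find_collision("Pass1234")

# Two different inputs, one hash value: a collision.
print(toy_hash("Pass1234"), toy_hash(colliding_input), colliding_input)
```

With 100 possible outputs, roughly one guess in a hundred collides. With a 256-bit output, the same brute-force search is not feasible.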

Types of Hash Algorithms

Ideally, a good hash function should process the input value quickly while minimizing the possibility of collision. Programmers use different types of hash algorithms depending on the level of security that they desire.

Although the Message-Digest algorithm 5 (MD5) was once one of the most popular hashing algorithms available, practical attacks can now produce MD5 collisions quickly, so it is considered broken for security purposes. The Secure Hash Algorithm families SHA-2 and SHA-3 are now the widely recommended choices.
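As a rough comparison, the sketch below prints the output size of each algorithm using Python's standard hashlib module. Output size is only one factor: MD5's weakness comes mainly from flaws in its design, not just its shorter hash value. The variable names are illustrative.

```python
import hashlib

message = b"Pass1234"

# Output size in bits for each algorithm mentioned above.
digest_bits = {
    "md5": hashlib.md5(message).digest_size * 8,
    "sha256": hashlib.sha256(message).digest_size * 8,      # a SHA-2 variant
    "sha3_256": hashlib.sha3_256(message).digest_size * 8,  # a SHA-3 variant
}

print(digest_bits)  # {'md5': 128, 'sha256': 256, 'sha3_256': 256}
```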

The best approach is to use current, well-vetted hashing algorithms so that attackers cannot feasibly find collisions or recover original inputs from hash values.