A hash collision occurs when two different inputs produce the same hash value. It is a common misconception that collisions are only ever “accidental matches.” Once an attacker can generate collisions deliberately, the assumptions behind tamper detection and digital signatures start to break. For example, if you conclude “it’s safe because the hash matches what the distributor published,” a collision-prone algorithm can let an attacker who had a hand in preparing the published file engineer a different file with the same hash value, reducing verification to a formality.
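The “does it match the published hash?” check can be sketched as follows. This is a minimal illustration that assumes the distributor publishes a SHA-256 hex digest rather than a collision-prone one; the function names `sha256_of_file` and `matches_published_digest` are hypothetical and chosen only for this example.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large downloads do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """Return True if the file's SHA-256 equals the published hex digest.

    `published_hex` is assumed to come from a channel you already trust;
    surrounding whitespace and letter case are normalized before comparing.
    """
    return sha256_of_file(path) == published_hex.strip().lower()
```

A call such as `matches_published_digest("release.tar.gz", published_hex)` (the file name is a placeholder) only tells you the file matches the digest you were given. It is worth exactly as much as the hash function's collision resistance and the trustworthiness of the place the digest came from.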

A well-known real-world case is SHA-1. In 2017, Google and CWI published two PDFs with different contents but identical SHA-1 hashes, demonstrating that “matching SHA-1” no longer implies “the same file.” In 2020, researchers went further and demonstrated chosen-prefix collisions, in which an attacker starts from two different prefixes of their own choosing and computes suffixes that make the complete messages collide. That is far more troublesome in contexts like signatures and identity, because the colliding documents can carry different, meaningful content, which undermines the shortcut idea that “if the hashes match, the documents are the same.” This is not just a lab trick; it’s a warning that systems can fail whenever they rely on hash equality as proof of sameness.

MD5 has similar issues. In 2008, researchers used chosen-prefix collisions to create a “rogue CA certificate,” a forged certificate that browsers would accept as a trusted certificate authority. Because certificates sit at the root of trust for HTTPS and many other systems, collisions are not merely a deduplication problem; they connect directly to impersonation and to tampering with distributed software. In 2012, the Flame malware was reported to have used a forged certificate produced with an MD5 collision attack, reinforcing how these weaknesses can translate into real harm.

There is an important line to draw. In settings like hash tables or cache keys, where an attacker cannot freely choose inputs, collisions are often mainly a performance concern. But in security-sensitive settings where an attacker can shape the input, such as public files, signed artifacts, update packages, and certificates, collisions become a direct security problem.
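The performance side of that line can be illustrated with a toy hash table. This is a sketch for illustration only, not how any particular library implements its tables, and the class name `ChainedTable` is made up for this example. The point is that a lookup still compares the full key after the hash matches, so a collision makes the bucket longer and the lookup slower, but it does not return the wrong value.

```python
class ChainedTable:
    """Toy hash table with separate chaining (illustrative only)."""

    def __init__(self, bucket_count=8):
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        # Keys whose hashes agree modulo the bucket count share a bucket.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:  # same key: overwrite the stored value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))  # new key: the chain grows

    def get(self, key):
        # The full key is compared, so a colliding key is never mistaken
        # for the one being looked up.
        for existing, value in self._bucket(key):
            if existing == key:
                return value
        raise KeyError(key)


table = ChainedTable(bucket_count=8)
table.put(1, "one")
table.put(9, "nine")  # 1 and 9 land in the same bucket (9 % 8 == 1)
print(table.get(1), table.get(9))  # one nine
```

File verification is different because nothing plays the role of the full-key comparison: the hash is the only thing being compared.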

The conclusion is straightforward: do not use MD5 or SHA-1 for security purposes (signatures, tamper detection, certificates). Move to SHA-256 or stronger. Also, audit your systems for designs that treat a “hash match” as evidence of correctness: release pipelines, CI, legacy archive checks, and internal distribution flows. Where needed, replace them with signature verification or layered validation. A practical step you can take today is to search your scripts and configuration for “md5” or “sha1” and replace them with SHA-256 or SHA-512. If migration is hard, start by banning new uses. This is not optional.
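The search step could look something like the sketch below. The function name `find_weak_hash_references`, the regular expression, and the list of file suffixes are assumptions made for this example; adjust them to whatever your repository actually contains. It only reports lines that mention MD5 or SHA-1. Deciding which hits are security-relevant, and replacing them, remains a manual review.

```python
import re
from pathlib import Path

# Matches "md5", "sha1", and "sha-1" in any letter case (an assumed pattern;
# it will also flag harmless mentions such as comments).
WEAK_HASH = re.compile(r"md5|sha-?1(?!\d)", re.IGNORECASE)

# File types to scan; an illustrative default, not an exhaustive list.
DEFAULT_SUFFIXES = (".sh", ".py", ".yml", ".yaml", ".cfg", ".toml")


def find_weak_hash_references(root, suffixes=DEFAULT_SUFFIXES):
    """Yield (path, line_number, line) for lines mentioning MD5 or SHA-1."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        for number, line in enumerate(text.splitlines(), start=1):
            if WEAK_HASH.search(line):
                yield path, number, line.strip()
```

Iterating over `find_weak_hash_references(".")` and printing each hit gives a first inventory to work through. The same pattern could later back a CI check that fails when a new reference appears.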
