In this article, I will explain what hash functions are, the math behind them, and the intuition about them. This is part 1 of 3 about "Math and Cryptography behind Blockchains". For other parts see below.
Blockchain math. Part 2 →
Proof of work and nonces
How can the private key serve as both, the username and password? →
Crash course on cryptography and digital signatures
What they taught us in school
Before we learn about hash functions, let's remember some stuff about regular functions from school. Here are various properties of functions.
Hash function is also just a function like the functions above. It also has its input, output. But its properties are a little more interesting:
- Very non-smooth output. Even the smallest change in input can result in a completely different output. This function's output jumps around a lot.
- One way function (non-invertible).
- The input and output are text.
- The output always has the same length. Meaning the output text always has the same number of letters.
Example hash function
Here is an example of a hash function. On top, you have the input to the function (the content), and the output (the hash value) is below it.
Try changing the content and check how the hash value (the output) changes. Notice that the hash value always has the same length. And the smallest changes in the content can lead to completely random changes in the hash value. Here are some examples of content and how the hash value for the content changes:
- Hash functions are used in Bitcoin → hafifcehi
- Hash functions are used in Litcoin → bihbidbci
- Ethereum → ijfjbchhc
Intuition
Hash functions summarize text
- You can think of hash functions as functions that summarize text.
- Given text of arbitrary length, they always produce text of short and fixed length - the summary of the input text.
- Just like the other functions, given the same input, you will always get the same output from hash functions. This means, given the same input text, you will always get the same summary.
- In addition, this summary is just one-way. Given the summary, you can't reverse-engineer the input text.
- Every input text results in a different summary.
- Another way to think about it is that hash functions produce the fingerprint of the input text. The same input text will always map to the same fingerprint but it's impossible to deduce the input text given just its fingerprint.
The summary is quickly verifiable but impossible to reverse-engineer
- The summary is quickly verifiable meaning that if you have an input text and someone claims that is the summary of that text, you can verify that claim by just passing your text through this function, i.e. by computing the output of the hash function.
- But the summary is impossible to reverse engineer since hash functions are one-way.
Hash functions lock the content
- Imagine you have some text. Like "Bitcoin will be top". And you want to send it to your friend.
- But you're worried that someone might intercept it and send a slightly modified version of your text to your friend. Imagine if someone changed "top" to "flop" (Bitcoin will be flop). That would be a disaster!
- How can you send this text to your friend knowing that it might be intercepted?
- Well, you can compute the hash value of your text and store it somewhere. The hash value of "Bitcoin will be top" is bjcbicghc.
- Now, you can send your text. When your friend receives your text, he will also compute the hash value of the received text. If it matches bjcbicghc, then that means that your text has not been tampered with. Otherwise, someone changed the text during transit.
- This is because the hash function output is very non-smooth. Even the smallest change in the input can result in a completely different output.
- So, hash functions lock the content in place. It's impossible to change the content without affecting the hash value as well. The hash value is the lock.
- This property of hash functions is used in checksums - when you download a file from a website, the website can provide the "checksum" (which is just another way of calling the hash value) of the file. After you download the file, you can pass it through the hash function and check that your computed checksum matches the checksum provided by the website. If they do, you downloaded the file safely. Otherwise, your downloaded file is corrupt.
- In some scenarios, it's also used in cloud storage to save space. Instead of storing the actual content (like a movie) which takes up a lot of space, you just store the hash value of the content, which is just a short piece of text. Later, when you need the content, you can pull it from auxiliary storage (like local storage), compute the hash value, and compare it to the hash value stored in the cloud. This way you can check for the authenticity of your movie, without actually storing the entire movie in the cloud.
Implementation
It sounds too good to be true, doesn't it? How can such a function exist?
I will give you an example of how a hash function can be implemented in practice. But it's not necessary to understand the implementation details in order to understand blockchains. You can get away by just understanding the high-level properties of hash functions explained above.
Implementation of a simple hash function:
- The function will first convert each letter in the input text to a number according to the table below. For example, given the text crypto, the function will first convert it to 99 114 121 112 116 111. (Where c→99, r→114, y→121, etc)
- It will then multiply each of the resulting numbers by the corresponding number below. 99 114 121 112 116 111 1 So, it would be , , , etc The number 53 is just a specially chosen number. It could have been another number like 73 but 53 makes the math nicer in this case.
- The function will then sum up all these products: sum=
- The function will then divide the sum by (which is really just 1000000009) and take the remainder: remainder = sum % . is also just an arbitrary number. There is no magic behind it. It could have as well been something like . (% here denotes the remainder operation. For example 36 % 5 = 1. The remainder of 36 divided by 5 is 1.)
- The remainder will be a number with at most 9 digits. Let's say it's something like 352015350. The function will finally convert each digit back to a letter again according to the table below. But it will first add 97 to each digit before looking up the letter from the table. So the number will be converted to dfcabfdfa (because 3+97=100→d, 5+97=102→f, 2+97=99→c, etc).
The resulting text: dfcabfdfa is the output - it's the hash value. (If the resulting text length is less than 9, we will add the letters to the end to make its length 9.)
Here is a picture summarizing the 5 steps
This is exactly what the function shown above (and repeated here for convenience) does. It computes the hash value of the content according to the 5 steps above. And it has all the properties of hash functions that we talked about until now.
Summary
And that’s it! To summarize, hash functions are just regular functions that have the following properties:
- They summarize any input text to a fixed-length summary, called the hash of the input.
- The summary is non-smooth. Meaning, the slightest change in input will result in a completely different summary.
- The function is one-way. Meaning it’s impossible to compute the input text given just the summary.
- The summary is quickly verifiable but impossible to reverse engineer.
- Hash functions lock the content and make it unmodifiable.
That's all a hash function is, just a regular function that has all these properties. Nothing else, nothing fancy. Very straightforward.
Hash functions are used in a lot of different places. Some of the are:
- They are also used in blockchains extensively.
References
- The hash function algorithm explained above is from this article.
Blockchain math. Part 2 →
Proof of work and nonces
How can the private key serve as both, the username and password? →
Crash course on cryptography and digital signatures
If you’re interested in learning more about blockchains, sign up below
You can also check out my blockchain course: