Hashing and its Applications
Introduction:
Hashing is that the process of converting a given key into another value. A hash function is employed to get the new value consistent with a mathematical algorithm. The results of a hash function is understood as a hash value or just, a hash.
Hashing is an improvement over Direct Access Table. The idea is to use a hash function that converts a given number or the other key to a smaller number and uses the tiny number as an index in a table called hash table.
Hashing is that the solution which will be utilized in most such situations and performs extremely well compared to above data structures like Array, Linked List, Balanced BST in practice. With hashing we get O(1) search time on the average and O(n) within the worst case.
A good hash function should have following properties:
1) Efficiently computable.
2) Keys should be uniformly distributed (Each table position equally likely for each key)
3) Uses a one way hashing algorithm, or in other words, the hash can’t be converted back to the first key.
Since a hash function gets us a little number for an enormous key, there’s an opportunity that two keys end in an equivalent value. The situation where a newly inserted key maps to an already occupied squeeze the hash table is named collision and must be handled using some collision handling technique. Chaining and Open Addressing are two methods used to resolve collisions.
Applications of Hashing:
1) Cryptographic Hash Function: In cryptography, a cryptographic hash function could be a mathematical function. The message-sending capabilities of hash functions are combined with security characteristics during a cryptographic hash function.
Cryptographic hash functions enhance traditional hash functions by adding extra safety features, making it harder to deduce the context of a message or information about recipients and senders.
In particular, cryptographic hash functions exhibit these three properties:
- They aren’t in any danger of colliding. No two input hashes should map to the identical output hash.
- They can be concealed. It should be impossible to predict a hash function’s input value from its result.
- They should be enjoyable to resolve. Selecting an input that produces a pre-defined output should be tough. As a result, the input should be chosen from a large range of possibilities.
Although the three traits listed above are desirable, they’re not always possible to realize in practice. Collisions are conceivable because to the difference in sample spaces for input hashes and output hashes.
Cryptographic hash functions are commonly employed in cryptocurrencies to anonymously transmit transaction data. Bitcoin, the primary and largest cryptocurrency, for instance, employs the SHA-256 cryptographic hash function in its algorithm. Similarly, IOTA, a Internet of Things platform, includes its own cryptographic hash function, Curl.
2) Password Verification: Storing passwords in normal text files is dangerous, which is why almost all sites store passwords as hashes. When users enter their password, a hash is applied and the result is compared to a list of hashes stored on the company’s servers. However, this is not a fool proof approach, as Collection #1 of 21 million stolen passwords discovered in 2019 shows.
Some hashing algorithms are MD5, SHA-1, SHA-2 and Blowfish.
A salt is a piece of random data that is used as an additional input to a one-way function of “scrambling” a password or passphrase. In order to add salt to the password, we add some random characters to it before applying the hash, so that the same password generates a unique string each time it is hashed, thus defeating the rainbow table attack and requiring each password to be cracked separately. Salts are usually stored with hashes, and these salts should be used when comparing passwords with hashes.
3) Signature Generation and Verification: Generating & verifying signatures is a mathematical process which is used to verify the authenticity of digital documents or messages. A valid digital signature, where the prerequisites are satisfied, gives the receiver proof that the message was created by a valid sender and that the message was not changed in transit. A digital signature process consists of three algorithms: a key generating algorithm; a signing algorithm that, generates a private key which produces a signature; and a signature verifying algorithm.
Merkle Trees is a technology which is used in cryptocurrencies, is a kind of digital signature. The hash functions used in this technology are specified in the Secure Hash Standard (SHS), FIPS 180. A digital signature generation & verifying algorithm is used in electronic mail, electronic funds transfer, electronic data interchange, software distribution, data storage, and other applications that require data integrity assurance and data origin authentication. Its applications are:
- Transaction verification using elliptic curve cryptography
- Public Key Algorithms
- Broadcast authentication schemes for wireless networks
- Identity-based aggregate signature
4) Data Integrity Checks: Data integrity check is a very important application of hashing. It is used to generate the checksums on data files. It provides assurance to the user about the correctness of the data.
Data integrity check helps the receiver in detecting any changes made to the original file. However, it does not provide any assurance about the originality. An attacker, instead of modifying the file data, can change the entire file and compute an all together new hash and send it to the receiver. This integrity check application is useful only if the user is sure about the originality of the file.
5) Compiler Operation: A programming language’s keywords are treated differently from other identifiers. The compiler keeps all of these keywords in a set, which is implemented using a hash table, to distinguish between programming language keywords (if, else, for, return, etc.) and other identifiers and to successfully compile the program.
6) Robin-Karp Algorithm: The Rabin-Karp algorithm is one of the most well-known applications of hashing. This is a string-searching technique that employs hashing to locate a certain collection of patterns in a string. Plagiarism Detection is a real-life application of this technique.
7) Linking File name and path together: When browsing files on our local system, we notice two extremely important aspects of a file: the file name and the file path. The system employs a map (file_name, file_path) that is implemented using a hash table to store the correspondence between file name and file path.
Conclusion:
Thus, hashing and hash functions are essential tools in computer security. We have learned the objectives of hashing which include data integrity and authentication. We have learned what hashing is, how it works and went over hash functions in cryptography and many other applications.
Contributors:
Dhawal Khapre
Advait Kamathe
Ketaki Kamble
Tejas Khairnar
Maithili Kharabe