Friday, April 8, 2011

Cryptography - Hash generation on Linux using MD5/SHA-1 & Applications

Yo, readers. In this post I will talk about MD5 (Message-Digest Algorithm 5) & SHA-1 (Secure Hash Algorithm 1).
Both are hashing algorithms. Now what is that? Well, a hashing algorithm takes an arbitrary block of data & returns a fixed-size bit string, so that even a small change in the data changes the hash.
Hence the data is the "Message" & the hash is the "Digest".
MD5 gives us a 128-bit hash value.
SHA-1 gives us a 160-bit digest (hash value).
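To make the fixed size & the "small change, different hash" idea concrete, here is a minimal sketch using Python's standard hashlib module. The two sample messages are just my own illustration; they differ by a single letter.

```python
import hashlib

# Two sample messages that differ by exactly one letter (illustrative data).
message = b"The quick brown fox jumps over the lazy dog"
altered = b"The quick brown fox jumps over the lazy cog"

md5_a = hashlib.md5(message).hexdigest()
md5_b = hashlib.md5(altered).hexdigest()
sha1_a = hashlib.sha1(message).hexdigest()
sha1_b = hashlib.sha1(altered).hexdigest()

# Each hex character encodes 4 bits, so the digest size follows from the length.
md5_bits = len(md5_a) * 4
sha1_bits = len(sha1_a) * 4

print("MD5  ", md5_bits, "bits:", md5_a, "vs", md5_b)
print("SHA-1", sha1_bits, "bits:", sha1_a, "vs", sha1_b)
```

The digest length stays the same no matter how long the input is, while the one-letter change gives a completely different digest.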
These MD5 or SHA hashes are used for:
a. Data integrity checks - A file transferred over the Internet can arrive changed for 2 reasons. One is a flaky connection corrupting some bits; such a change may not affect the file size but can still break the file. The other is injection of malicious code into legit data files using MITM (Man in the Middle) techniques, which is a security threat: for example, you download legit software from a trusted provider and the file gets tampered with on the way. Both cases can be detected by computing the MD5 or SHA hash of the received file & comparing it with the hash of the original file published on the software owner's website.
If the hashes don't match, the data has been changed or lost somewhere. That's why these days most software providers publish checksums for comparison at the receiving end.
b. Password hashing - Passwords are generally not stored in cleartext, so the password supplied during authentication is hashed & its hash is compared to the hash stored in the database.
c. File or data identification - Many peer to peer (p2p) file sharing applications use hashes to uniquely identify files, or let's say pieces of data, so the pieces can be gathered and put together accordingly.
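The integrity check from point (a) can be sketched in a few lines of Python. This is only a rough illustration: the helper names file_digest and matches_published are my own, and the published hash is assumed to be a hex string copied from the provider's site.

```python
import hashlib

def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Hash a file in chunks so large downloads need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path, published_hex, algorithm="sha1"):
    """Compare the received file's hash with the hash the provider published."""
    expected = published_hex.strip().lower()  # tolerate whitespace and upper case
    return file_digest(path, algorithm) == expected
```

Calling matches_published("download.iso", "<hash from the website>") would return False for a corrupted or tampered file (the filename here is hypothetical).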

Now let's do some hash computing on Linux using the following utilities.

1. md5sum - This computes the MD5 hash of the provided input. The same file contents always give the same hash; two different files could in theory give the same hash too, but the odds of that happening by accident are tiny. md5sum prints the hash & the filename on standard output.

root@ubuntu:~# cat /dev/urandom > 1.txt
^C
root@ubuntu:~# du -sh 1.txt
35M 1.txt
root@ubuntu:~# md5sum 1.txt > 1.txt.hash
root@ubuntu:~# cat 1.txt.hash
97414a6530b9b9e91634c85522fb22a7  1.txt
root@ubuntu:~# md5sum -c 1.txt.hash
1.txt: OK
root@ubuntu:~#

As you can see, the hash file is generated for a file in the same directory. Passing the hash file to md5sum with the "-c" switch automatically verifies the file's integrity. So we can compute the hashes for all the files in a directory & pass the hash file on to the other end for the integrity check. For example, all the files can be transferred using FTP and then checked against a single hash file listing the hashes & paths for the whole directory.
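The whole-directory idea can also be sketched in Python. This is a rough imitation of what md5sum and "md5sum -c" do, not their actual implementation; the function names are my own, and the hash file is assumed to live outside the directory being hashed.

```python
import hashlib
import os

def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(root, manifest_path):
    """Write one 'hash  relative/path' line per file found under root."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # keep the walk order stable
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            lines.append(f"{md5_of_file(full)}  {rel}\n")
    with open(manifest_path, "w") as out:
        out.writelines(lines)

def verify_manifest(root, manifest_path):
    """Return {relative path: True if the hash still matches, else False}."""
    results = {}
    with open(manifest_path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # 32 hex characters, a 2-character separator, then the path.
            digest, rel = line[:32], line[34:]
            full = os.path.join(root, rel)
            results[rel] = os.path.isfile(full) and md5_of_file(full) == digest
    return results
```

The sender would run write_manifest on the directory, ship the hash file along with the data, and the receiver would run verify_manifest and look for any False entries.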

root@ubuntu:~# md5sum -b 1.txt > 1.txt.binary.hash
root@ubuntu:~# md5sum 1.txt > 1.txt.text.hash
root@ubuntu:~# cat 1.txt.binary.hash 1.txt.text.hash
97414a6530b9b9e91634c85522fb22a7 *1.txt
97414a6530b9b9e91634c85522fb22a7  1.txt
root@ubuntu:~#

md5sum can also be used in binary mode with the "-b" switch. The default is text mode for reading the file while hashing. The "*" in front of the filename indicates binary mode. On GNU/Linux the two modes give the same hash, as seen above, but on systems that treat text files & binary files differently (e.g. line-ending translation) the mode can change the hash.
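Why would the mode matter on other systems? A tiny sketch, assuming the difference comes from line-ending translation: the same two lines of text hash differently once "\n" becomes "\r\n". The sample text is my own.

```python
import hashlib

# The same two lines of text with Unix and with DOS/Windows line endings.
unix_text = b"line one\nline two\n"
dos_text = unix_text.replace(b"\n", b"\r\n")

unix_md5 = hashlib.md5(unix_text).hexdigest()
dos_md5 = hashlib.md5(dos_text).hexdigest()

# The hash covers the raw bytes, so the extra "\r" bytes change the result.
print(unix_md5 == dos_md5)
```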

2. sha1sum - The sha1sum utility works the same way as md5sum, including binary mode & the integrity check. It just returns a 160-bit hash value.

root@ubuntu:~# sha1sum 1.txt > 1.txt.hash
root@ubuntu:~# cat 1.txt.hash
64ef39c2963627d9acfb6fc0cf3b50dfc016e4a8  1.txt
root@ubuntu:~# sha1sum -c 1.txt.hash
1.txt: OK
root@ubuntu:~#

3. cksum - This utility prints the CRC (Cyclic Redundancy Check) checksum of the file & also counts the bytes in it, for integrity checks.

root@ubuntu:~# cksum 1.txt > 1.txt.cksum
root@ubuntu:~# cat 1.txt.cksum
2861886857 36143104 1.txt
root@ubuntu:~#

Here the first field is the CRC checksum & the next field is the size of the file in bytes.
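For the curious, the two fields can be reproduced with a short Python sketch. It follows my reading of the POSIX cksum definition: a 32-bit CRC with polynomial 0x04C11DB7, fed the file bytes followed by the file length (least significant byte first), then bit-inverted. The function names are my own, and the bit-by-bit loop is far too slow for a 35M file; it is only meant to show the idea. Note this is not the same CRC variant as zlib.crc32.

```python
POLY = 0x04C11DB7  # CRC-32 generator polynomial used by POSIX cksum

def crc_update(crc, data):
    """Feed bytes into the CRC register, most significant bit first."""
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ POLY) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc

def posix_cksum(data):
    """Return (checksum, byte count), the two numeric fields cksum prints."""
    crc = crc_update(0, data)
    # Append the length, least significant byte first, using as few bytes as needed.
    n = len(data)
    while n:
        crc = crc_update(crc, bytes([n & 0xFF]))
        n >>= 8
    return crc ^ 0xFFFFFFFF, len(data)
```

Because the length is mixed into the checksum, a file that is cut short or padded tends to show up in both fields. A CRC only guards against accidental corruption though; it is not meant to stop deliberate tampering.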

Security focus on Hashing -
Now this is an endless topic to explore, as vulnerabilities have been found in the hashing algorithms & there are other security aspects too.
Hash computation itself is a target for brute forcing on platforms such as Nvidia's CUDA. A high-end Graphics Processing Unit can compute more than 200 million hashes per second.
Rainbow Tables, computed in advance on distributed computing architectures such as BOINC, can be used to reverse MD5 hashes. This is used to recover the cleartext password from a password hash.
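To see why precomputed hashes are a threat to password hashes, here is a toy sketch. It is a plain lookup table, not a real rainbow table (those use hash chains to save space), and the word list is made up for illustration.

```python
import hashlib

def build_lookup(candidates):
    """Precompute MD5 hash -> cleartext for every candidate password."""
    return {hashlib.md5(c.encode("utf-8")).hexdigest(): c for c in candidates}

# Hypothetical word list; real attacks use lists with millions of entries.
wordlist = ["123456", "password", "letmein", "qwerty"]
table = build_lookup(wordlist)

# A leaked, unsalted MD5 password hash is "reversed" by a single lookup.
stolen_hash = hashlib.md5(b"letmein").hexdigest()
recovered = table.get(stolen_hash)

# A password that was never precomputed stays unknown.
unknown = table.get(hashlib.md5(b"not-in-the-list").hexdigest())

print(recovered, unknown)
```

The expensive part (hashing every candidate) is done once up front, which is exactly the work that can be farmed out to GPUs or a distributed project.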

Let me know what you guys think about this.
