What is password hashing? [closed] - algorithm

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
What does it mean to hash a password?

Definition:
Hashing is the application of a function f() to a variable-sized input to produce a fixed-size output.
A => f() => X
B => f() => Y
C => f() => Z
A hash is also a one-way function, which means there is no function that reverses or undoes it. Likewise, re-applying the hash, f(f(x)), is not going to produce x again.
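As a small illustration of the definition above, here is a sketch in Python using the standard hashlib module. SHA-256 is only one possible choice of f(), and the sample inputs are made up for the demonstration.

```python
import hashlib

def f(data: bytes) -> str:
    """Hash arbitrary-length input to a fixed-size hex digest (SHA-256 here)."""
    return hashlib.sha256(data).hexdigest()

# Inputs of very different sizes, including the empty input.
inputs = [b"A", b"a much longer input " * 100, b""]
digests = [f(x) for x in inputs]

for x, d in zip(inputs, digests):
    # Every digest is 64 hex characters, whatever the input length.
    print(len(x), len(d), d[:16] + "...")

# Re-applying the hash does not give the input back.
twice = f(f(b"A").encode())
print(twice == "A")
```

The printed lengths show inputs of 1, 2000 and 0 bytes all mapping to digests of the same size.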
The Details:
A hash function can be as simple as "add 13 to the input" or complex like a Cryptographic Hash such as MD5 or SHA1. There are many things that constitute a good hash function like:
Low Cost: Easy to compute
Deterministic: If I hash the same input multiple times, I get the same output each time.
Uniformity: Inputs are distributed evenly among the possible outputs. This ties in with the Pigeonhole Principle: since there are a limited number of outputs, we want f() to spread inputs evenly across them instead of piling them into the same bucket. When two inputs produce the same output, this is known as a collision. A good hash function produces few collisions.
Hashing applied to Passwords:
The hashing of passwords is the same process as described above, however it comes with some special considerations. Many of the properties that make up a good hash function are not beneficial when it comes to passwords.
Take determinism, for example: because hashes produce a deterministic result, when two people use the same password, the hash looks the same in the password store. This is a bad thing! However, it is mitigated by something called a salt.
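To make the effect of a salt concrete, here is a minimal sketch in Python. The helper name salted_hash, the example password and the 16-byte salt length are assumptions chosen for illustration, and a single SHA-256 round is used only to keep the sketch short.

```python
import hashlib
import os

def salted_hash(password: str, salt: bytes) -> str:
    """Hash salt + password; the salt is stored alongside the result."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

password = "hunter2"  # made-up example password shared by two users

# Without a salt, two users with the same password get identical entries.
unsalted_alice = hashlib.sha256(password.encode("utf-8")).hexdigest()
unsalted_bob = hashlib.sha256(password.encode("utf-8")).hexdigest()

# With a random per-user salt, the stored values differ.
salt_alice, salt_bob = os.urandom(16), os.urandom(16)
stored_alice = salted_hash(password, salt_alice)
stored_bob = salted_hash(password, salt_bob)

print(unsalted_alice == unsalted_bob)  # identical entries
print(stored_alice == stored_bob)      # different entries
```

Verification still works because the salt is not secret: it is read back from the store and combined with the entered password before hashing.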
Uniformity on the other hand is beneficial because the desire is for the algorithm to limit collisions.
Because a hash is one-way, the input cannot be determined from the output, which is why hashing is great for passwords!

A hash function takes a block of data and returns a string such that you can't get your original block of data back.
Wikipedia Article
Hashing a password will take a clear text string and perform an algorithm on it (depending on the hash type) to get a completely different value. This value will be the same every time, so you can store the hashed password in a database and check the user's entered password against the hash.
This prevents you from storing the cleartext passwords in the database (bad idea).
Here is a list of hash functions.

A hash is simply a one-way function that takes a string or data source and creates an encrypted-looking string.
There are various hashing algorithms; the most popular is MD5, but there are many others. Many experts in the industry use the SHA256 algorithm for better security.
MD5 Hash for the words:
password is 5f4dcc3b5aa765d61d8327deb882cf99
PASSWORD is 319f4d26e3c536b5dd871bb2c52e3178
The character length of the result will be the same regardless of how many characters you try to hash. Hashes are commonly used to store passwords to prevent them from being viewed.
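The fixed output length is easy to check. The following is a small sketch using Python's hashlib; the helper name md5_hex and the 10,000-character input are made up for the example.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 digest of a string as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

for word in ["password", "PASSWORD", "a" * 10_000]:
    digest = md5_hex(word)
    # Input length varies widely; the digest length never does.
    print(len(word), len(digest), digest)
```

Note that changing only the letter case of the input produces an unrelated digest.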

Related

In Hashing, can't we find AT LEAST one original text hashing to the given hash value

I have a basic question about hashing. It is said that hashing is one-way. But if we simply reverse the steps in the program/algorithm/logic, can't we find at least one input that hashes to the given output hash value?
I found 2 related posts, but I am still not completely clear:
How is one way hashing possible?
How do one-way hash functions work? (Edited)
I have the same question as the comment to the accepted answer in the first post:
"Well, but if I want to bypass a password check it suffices to find one string that hashes to the same value as the original password". Does this comment hold water?
What you're thinking of is called "hash collisions".
And you're right to think that if one could find an efficient method to determine inputs for a given hash function that produce a desired output, this would break a lot of systems (https://en.wikipedia.org/wiki/Preimage_attack).
That's where the meat and bones of cryptographically secure hash functions come in. They are built in such a way that it is very, very difficult to find a preimage that produces a desired hash.
Over time, mathematicians and cryptologists have been chipping away at those hashes, and quite a number of hash functions that were used for securing things have been broken (MD4, MD5, SHA-1).
Also it's important to differentiate between hashes that are intended to check the integrity of messages, and hashes that are intended to protect secrets.
For integrity checking you want fast hashes, so that you can put a lot of data through them with minimal effort. MD5, SHA-1 and SHA-2 are such hashes.
For secret keeping you want slower-than-molasses hashes, so that one can't easily brute-force through dictionaries or other predictable patterns of a secret. scrypt, bcrypt, Argon2 and many-round PBKDF schemes are such hashes.
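The difference in cost between the two kinds of hash can be sketched with Python's standard library. The 200,000 iteration count, the fixed salt and the sample secret are assumptions picked for the demonstration, not recommendations.

```python
import hashlib
import time

data = b"correct horse battery staple"  # made-up secret
salt = b"example-salt"                  # fixed only to keep the demo repeatable

# One round of a fast integrity-style hash.
start = time.perf_counter()
fast = hashlib.sha256(data).digest()
fast_seconds = time.perf_counter() - start

# A deliberately slow, many-round derivation (PBKDF2 with HMAC-SHA-256).
start = time.perf_counter()
slow = hashlib.pbkdf2_hmac("sha256", data, salt, 200_000)
slow_seconds = time.perf_counter() - start

print(f"SHA-256:            {fast_seconds:.6f} s")
print(f"PBKDF2 (200k iter): {slow_seconds:.6f} s")
```

An attacker trying a dictionary of guesses pays the slow cost for every single guess, which is the point of the slow family.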
The operations in a cryptographic hash function are so complex, and there are so many of them, that reversing the function (computing at least one valid input for a given output) is computationally infeasible. It doesn't matter whether you do that reversing by hand or with the help of some sort of algorithmic solver. This is called (first) preimage resistance, and it is what cryptographers attack when a new hash function is proposed. If the hash function stands the test of time, it is considered secure.
On the other hand it is much easier to just generate a bunch of candidate passwords and run the known hash function over them to check for equality with the given output. Humans are pretty bad at generating good passwords or passphrases. Have a look at this talk.
"In Hashing, can't we find AT LEAST one original text hashing to the given hash value"
In that context, "finding" as in brute forcing the input space is easier than attacking the hash function itself.
There's a very simple way of giving a hash function that is not reversible:
int GetHashCode(byte[] myData)
{
    // Every input maps to the same value, so nothing about the input can be recovered.
    return 1;
}
This is a perfectly valid hash function, as it maps the contents of an arbitrary data set to a much smaller domain (int in this case). It satisfies the condition that the same input data gives the same output data.
It is obvious that this function is not reversible.
(Of course, this hash function is not suitable for securing anything, but that's only one application of hash functions)

Using multiple hash outputs in iterations?

Is there a known or perceived weakness to using the output of other hash algorithms as input for the next hash iteration?
Of course double hashing is not recommended, but this is not the same as double hashing.
Example:
I take a "secret" input and I hash it with SHA256, SHA384, and RIPEMD160 separately. I then combine the output of each into a single long string to use as input for a SHA512 hash. I then repeat this process a number of times.
In my mind, doing this significantly expands the length of the input into the SHA512 and essentially makes brute force even more infeasible.
Additionally, I considered using a 4th hash function merely to generate a value which could then be used to vary the length of the combined input string, by possibly discarding a few bytes in an unpredictable manner, so that the input is not a constant size. I'm not entirely sure that would be of any benefit.
Thoughts?
An answer to this question depends heavily on the attack scenario.
"Of course double hashing is not recommended, but this is not the same as double hashing."
I would say: No! If you are storing passwords using a hash function, the attack on the store will be harder, if you use multiple rounds (feeding the output of round n as input for round n+1). Bitcoin as another example uses 2 passes (see here and here). For additional info see Why hashing twice?
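Feeding the output of round n into round n+1 can be sketched as follows in Python. The function name iterated_sha256 and the round counts are assumptions for illustration; the two-pass case corresponds to the double SHA-256 mentioned for Bitcoin.

```python
import hashlib

def iterated_sha256(data: bytes, rounds: int) -> bytes:
    """Feed the output of round n in as the input of round n + 1."""
    digest = data
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest

secret = b"secret"
two_pass = iterated_sha256(secret, 2)         # double hashing
many_pass = iterated_sha256(secret, 100_000)  # costlier for an attacker to repeat per guess
print(two_pass.hex())
print(many_pass.hex())
```

Each extra round multiplies the work an attacker must do for every candidate input, while the output size stays the same.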
"[...] by possibly discarding a few bytes in an unpredictable manner, so that the input is not a constant size. I'm not entirely sure that would be of any benefit."
That counteracts the way hash functions are designed. You want the function to produce the same output using the same input. Lifting this relationship basically destroys all use from the function. You could use a random number generator instead. See also: Does the MD5 algorithm always generate the same output for the same string? or Is sha-1 hash always the same?
"In my mind, doing [...] essentially makes brute force even more infeasible."
The quoted statement is correct, but the reasoning is flawed. It makes brute force harder, because an attacker has to compute 4 functions instead of one. And she cannot use rainbow tables, because they aren't generated for your setup.
Wild guess: If you are using the mentioned setup to store and verify passwords, don't do it. Use PBKDF2 or bcrypt for that. See Password Storage Cheat Sheet
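For reference, a minimal PBKDF2 store-and-verify flow might look like the sketch below, using only Python's standard library. The function names, the 16-byte salt and the iteration count are assumptions for the example; consult current guidance for real parameter choices.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor for this sketch; tune for your hardware

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); both are stored for the user."""
    salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(derived, expected)

salt, stored = hash_password("s3cret!")
print(verify_password("s3cret!", salt, stored))
print(verify_password("wrong", salt, stored))
```

The salt and the derived key are stored together; the password itself is never stored.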

Is there a two-way hashing algorithm in PHP?

Disclaimer: I understand that a hash is not supposed to be reversible.
I've seen many people ask if there is a way to "unhash" text that is already hashed. However, I am not seeing a straight answer. Most answers state that MD5 and SHA-1 are one-way hashing algorithms, and therefore irreversible. That's great and all, but it raises the question: are all hashing algorithms one-way and irreversible?
A hash function is any function that can be used to map data of arbitrary size to data of fixed size. (source: Wikipedia)
Because the range of the input values is infinite and the number of possible distinct output values is finite, the function produces the same output for an infinite number of input values. This means a hash is a lossy function: it discards information.
Assuming one could "reverse" the hashing, they would get an infinite set of possible original values. It is still impossible to tell what was the value used to generate the hash.
In mathematical terms, a hash function is not injective and this property automatically makes it not invertible.
All of the above apply to any hash function, no matter what language or library provides it.
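The non-injective argument can be observed directly by shrinking the output space. The sketch below uses a deliberately tiny hash (the first byte of SHA-256); tiny_hash and the 1000 sample inputs are made up for the illustration.

```python
import hashlib
from collections import defaultdict

def tiny_hash(data: bytes) -> int:
    """A deliberately tiny hash: the first byte of SHA-256, so only 256 outputs exist."""
    return hashlib.sha256(data).digest()[0]

# Group 1000 distinct inputs by their hash value.
buckets = defaultdict(list)
for i in range(1000):
    buckets[tiny_hash(str(i).encode())].append(i)

# 1000 inputs into at most 256 outputs: some output must have several preimages.
output, preimages = max(buckets.items(), key=lambda kv: len(kv[1]))
print(output, preimages)
```

Given only the output value, there is no way to tell which of the listed inputs produced it, and the full set of candidates is unbounded once longer inputs are allowed.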
Not really. The one absolutely non-negotiable property of a hash function is it converts data of an arbitrary length to values of a fixed length. This means each possible result of your hashing function has infinitely many possible inputs that could produce it, making reversing the hash function to a single value impossible.
If you can place constraints on the length of your data input, then technically you could define a reversible hash function but I don't particularly see a use for it.
"... are all hashing algorithms one-way and irreversible?"
There are some real-world hash functions that can be reversed, such as the not-uncommon implementation of nominally hashing an 8, 16, 32 or 64-bit number by returning the input unchanged. Many C++ Standard Libraries, python and other languages do exactly that, as it's often good enough for use by hash tables keyed on the numbers - the extra potential for collisions must be weighed up against the time that would have been needed to generate a stronger hash, and indeed even the potential CPU-cache benefits of nearby keys hashing to nearby buckets.
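The identity-style integer hash is easy to observe in Python. This relies on CPython behaviour for small non-negative integers, which is an implementation detail rather than a language guarantee.

```python
# CPython detail (implementation behaviour, not a language guarantee):
# small non-negative integers hash to themselves.
for n in [0, 1, 42, 1000]:
    print(n, hash(n))

# Check the identity property over a small range.
identity_hashed = all(hash(n) == n for n in range(1000))
print(identity_hashed)
```

Such a hash is trivially reversible over that range, which is acceptable for hash tables but useless for protecting secrets.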
That said, your question starts...
"I've seen many people ask if there is a way to "unhash" text that is already hashed."
For very short amounts of text, such as 8-character passwords, brute force attacks using dictionaries and mutation rules (e.g. "try a dictionary word followed by each character from space (ASCII 32) through tilde (ASCII 126)", "try all combinations of replacing letters with similar-looking or -sounding numbers"...) can sometimes find the password likely used (though there's a small chance it's another password with the same hash value).
If the input wasn't based on a dictionary word or something else guessable, it's far less likely to be crackable.
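A toy version of such a dictionary attack with mutation rules might look like the following Python sketch. The word list, the substitution table and the use of unsalted SHA-256 are all assumptions chosen to keep the example small.

```python
import hashlib

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Replace letters with similar-looking digits.
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"})

def candidates(words):
    """Yield each word plus a few simple mutations of it."""
    printable = [chr(c) for c in range(32, 127)]  # space through tilde
    for word in words:
        for base in (word, word.capitalize(), word.translate(LEET)):
            yield base
            for ch in printable:
                yield base + ch  # word followed by one extra character

def crack(target_hash, words):
    """Return the first candidate whose hash matches, or None."""
    for guess in candidates(words):
        if sha256_hex(guess) == target_hash:
            return guess
    return None

wordlist = ["dragon", "monkey", "sunshine"]  # tiny made-up dictionary
leaked = sha256_hex("Monkey7")               # pretend this came from a password store
print(crack(leaked, wordlist))
print(crack(sha256_hex("x9$Lq!vT2#pZ"), wordlist))
```

The guessable password is recovered after a few hundred hash computations, while the random one is not found by these rules at all.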
For longer amounts of text, it's increasingly impractical to find any input with matching hash value, and massively less likely that any such input would actually be the one originally used to generate the hash (with longer inputs, more of them will - on average - map to any given hash value). Once the text input is dozens of times longer than the hash value, it's totally impractical (unless perhaps quantum computing develops significantly). (Note that Microsoft's C++ compiler's std::hash<std::string> only combines 10 characters evenly spaced along any string to form the hash value, so longer strings don't increase the quality of the hash, but on the other hand the hash only provides any insight at all into the max 10 characters chosen to form it).
"Most answers state that MD5 and SHA-1 are one-way hashing algorithms, and therefore irreversible."
Hashes suitable for protecting passwords (as distinct from hash table use) should inherently take a relatively long time to calculate (some goodly fraction of a second on likely hardware), so that the brute-force dictionary attacks mentioned above are prohibitively compute-intensive even for short textual strings. This helps make them practically irreversible. Even reasonable checksum-strength hash functions become hard to reverse once there are more bytes of input than there are bytes in the hash value, and rapidly become practically irreversible as the input gets larger.

Hash an integer by another integer [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I'd like a way to hash an integer using another integer. It should produce a new hashed integer. It should accept an integer input and a key, the input is then hashed by the key and produced as an integer. The method would look like hash_method(input, key). Collisions won't matter here, I am not using them for security or comparison. I'm pretty sure this is possible seeing how some security algorithms that use challenges do something similar. How would I go about this in ruby?
Hash routines are many and varied, and usually picked to match details of expected input distribution, purpose of the hash value, and speed to generate them.
However, you can make use of existing hash routines in Ruby's standard library. The output of most cryptographic hash functions is a string of bytes that can easily be interpreted as an integer. For your purpose, you just need to decide on a suitable maximum value by restricting the length.
Cryptographic hashes also have the advantage in your case that they make high quality pseudo-random functions - given two inputs that differ by only a single bit, the results will not correlate.
The HMAC construct combines a hash function with two inputs - a message and a secret. Using your input number as the message, and key as the secret allows you to use the function as-is.
There is nothing special about using Integer inputs for most standard hashing functions, which on the whole just munge bytes and ignore data types. Given that you don't seem to care about the specific values that the hash outputs, you can simply convert your numbers to String values and feed them into a standard hashing function. This is perfectly OK; there is no reason not to do this, unless you need to differentiate between 1 and "1" using the same hashing function.
Like this:
require 'openssl'
input = 25
key = 106
# HMAC-SHA1 with the key as the secret and the input as the message
full_hash = OpenSSL::HMAC.hexdigest(
  OpenSSL::Digest.new('sha1'), key.to_s, input.to_s)
# This is an unsigned 32-bit integer
result = full_hash[0..7].to_i(16)
# => 2746028024
This result is suitable for checksums, or for algorithms that want pseudo-random re-distribution of values. It has a flaw, in that the speed will not be high in cases where you need to generate many values.
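For readers outside Ruby, the same idea can be sketched in Python with the standard hmac module. The function name hash_method follows the question; the choice of SHA-1 and truncation to the first 8 hex digits mirrors the Ruby snippet above.

```python
import hashlib
import hmac

def hash_method(value: int, key: int) -> int:
    """Keyed hash of an integer, truncated to an unsigned 32-bit integer."""
    digest = hmac.new(str(key).encode(), str(value).encode(), hashlib.sha1).hexdigest()
    # The first 8 hex digits are 32 bits.
    return int(digest[:8], 16)

print(hash_method(25, 106))
print(hash_method(26, 106))
```

Neighbouring inputs give unrelated results, and the same input and key always give the same integer.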
You could make this a lot simpler if you were happy with a lower quality of randomness - you could use a linear congruential generator for example. This would likely be faster than the above, but might exhibit unwanted patterns in the output.

Does the MD5 change from encryption? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 8 years ago.
I have been researching something and I can't find an answer, or maybe I just don't understand it. Does encrypting a file WITHOUT changing its contents change the MD5 sum/hash of the file? For example, if a Word file with the same unchanged string of characters is encrypted, does the encrypted file have the same MD5 sum as the unencrypted Word file?
Yes, encrypting a file should substantially change any hash of a file.
Cryptographic hash codes are constructed such that hashing any two different strings should produce wildly different results, even if there's a close connection between the original strings. For example, the MD5 hash of "hello" is
5d41402abc4b2a76b9719d911017c592
while the MD5 hash of "hello?" is
3809718a10a0f59bcf6d4939c10fd28d
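How "wildly different" the two digests are can be measured by counting differing output bits. A short Python sketch, with the helper name md5_bits made up for the example:

```python
import hashlib

def md5_bits(text: str) -> int:
    """Return the 128-bit MD5 digest of a string as an integer."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")

a, b = md5_bits("hello"), md5_bits("hello?")
# XOR leaves a 1 wherever the two digests disagree.
differing = bin(a ^ b).count("1")
print(f"{differing} of 128 output bits differ")
```

For a well-mixed hash, roughly half of the output bits are expected to differ between two unrelated digests.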
Encrypting a file should, with good encryption, make the resulting file look statistically random. Consequently, if you were to hash an encrypted file, it should give a hash that's statistically indistinguishable from the hash of a random string. That means that the probability that you should get the same hash output would, roughly speaking, be about 1 / N, where N is the number of possible hash outputs. For even a decently good hash function, this should be astronomically small.
That depends; if you create the exact same plaintext from the ciphertext (which is the name for the encrypted plaintext) and hash that then the MD5 sum will be the same. If you just hash the ciphertext then the hash will be different.
With a cryptographically secure hash, the outputs should differ even if only a single bit of the input changes. Even though there are unlimited messages that hash to the same value, it should be computationally infeasible to find another message that produces the same hash (two messages with the same hash are called a collision).
Note that the MD5 hash function is broken. If an attacker can generate the files to be hashed then it is possible to generate two different files with the same hash. So it is very easy to create two programs that do different things but hash to the same MD5 hash. So use a hash function that has not been broken, e.g. SHA-256 or SHA-512 would be considered a good option.
Encrypting a file changes the contents that are stored on disk. The MD5 hash of a file's contents does not know (or care) whether it is encrypted or not, it just reads bytes from the disk. Since the bytes are different between plaintext and encrypted, the MD5 hash will be different.
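The behaviour described in these answers can be checked with a small Python sketch. Python's standard library has no real cipher, so toy_encrypt below is a hypothetical XOR-keystream stand-in used only to change the bytes; it is NOT secure encryption, and the key and plaintext are made up.

```python
import hashlib

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR with a SHA-256-derived keystream. A toy stand-in, NOT real encryption."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(x ^ y for x, y in zip(data, stream))

toy_decrypt = toy_encrypt  # XOR with the same keystream undoes itself

def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

plaintext = b"The same unchanged string of characters."
ciphertext = toy_encrypt(plaintext, b"example key")

print(md5_hex(plaintext))                                  # hash of the original bytes
print(md5_hex(ciphertext))                                 # different: the bytes changed
print(md5_hex(toy_decrypt(ciphertext, b"example key")))    # same as the first line
```

The hash follows the bytes: the ciphertext hashes differently, and the decrypted copy hashes exactly like the original.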
