am sorry for this question, but i was asking: when using MD5, we get a hash, so to get the password we hash all the words untill we find the same hash.
but in a key derivation algorithme such pbkdf2 or bcrypt or scrypt, what a hacker need to seek? or he will make the same algorithme to all words to get the same key derivation?
am sorry for this dumb question.
It’s the same general idea - try all the hashes - but several thousand (or million, or even billion) times slower.
Related
In a password hashing scheme, when comparing two password hashes, I know that I should use a slow equals function, one that will take the same amount of time regardless of the parameters.
I learned the importance of slow equals in "Why is the SlowEquals function important to compare hashed passwords?".
Does such a function exist in Ruby? If not, what gems can I use?
Yes, there is constant-time string comparison library in Ruby, see fast_secure_compare. But you shouldn't use it against two password hashes.
Consider such situation that when Bob tries to brute force Alice's password, what would happen?
Bob tries a password
The server hashes Bob's try
The server compares Bob's hash with Alice's hash
Since the two hashes tend to be very different even the two original passwords are similar, comparison using == will always fail at the very beginning.
On the other hand, if the two hashes only have one different character at the end, it doesn't reflect the similarity of the two original passwords, and Bob still knows nothing about Alice's password.
im working with rails and i noticed that my password_digest is different for 2 users with all other fields other than the password digest different. but i used the same password "abcd" for both..
it ended up generating these 2 different hashes
$2a$10$QyrjMQfjgGIb4ymtdKQXI.WObnWK0/CzR6yfb6tlGJy0CsVWY0GzO
$2a$10$dQSPyeQmZCzVUOXQ3rGtZONX6pwvnKSBRmsLnq1t1CsvdOTAMQlem
i thought the bcrypt gem generates the hash only based on the password field! am i wrong?
thanks :)
What you are looking at here is more than a password hash, there is a lot of metadata about the hash included in those strings. In terms of bcrypt the entire string would be considered the bcrypt hash. Here is what it includes:
$ is the delimiter in bcrypt.
The $2a$ is the bcrypt algorithm that was used.
The $10$ is the cost factor that was used. This is why bcrypt is very popular for storing hashes. Every hash has a complexity/cost associated with it, which you can think of as how quickly it will take a computer to generate this hash. This number is of course relative to the speed of computers, so as computers get faster and faster over the years it will take less and less time to generate a hash with the cost of 10. So next year you increase your cost to 11, then to 12... 13... and so on. This allows your future hashes to remain strong while keeping your older hashes still in valid. Just note that you cannot change the cost of a hash without rehashing the original string.
The $QyrjMQf... is a combination of the salt and the hash. This is a base64 encoded string.
The first 22 characters are the salt.
The remaining characters are the hash when used with the 2a algorithm, cost of 10, and the given salt. The reason for the salt is so an attacker cannot pre compute bcrypt hashes in order to avoid paying the cost of generating them.
In fact this is the answer to your original question: The reason the hashes are different is because if they were the same you would know that anytime you saw the bcrypt string $2a$10$QyrjMQfjgGIb4ymtdKQXI.WObnWK0/CzR6yfb6tlGJy0CsVWY0GzO you would know the password would be abcd. So you could just scan an databases of hashes and quickly find all of the users with the abcd password by looking up that hash.
You cannot do this with bcrypt because $2a$10$dQSPyeQmZCzVUOXQ3rGtZONX6pwvnKSBRmsLnq1t1CsvdOTAMQlem is also abcd. And there are many many many more hashes that will be the result of bcrypt('abcd'). This makes scaning a database for abcd passwords next to impossible.
bcrypt stores the salt in the password hash.
Those are two different hashes of the same password with two different salts.
When verifying the passwords, bcrypt will read the salt from the hash field, then re-compute the hash using that salt.
Is it possible to retrieve the original message from a SHA-1 encrypted message? If I have an SHA -1 encrypted message, what all paratmeters do i need to get the original message from it?
I answered a similar question already: Python SHA1 DECODE function
In short, no it is not possible. The whole point of hashing is to take some long string and turn it into a small one. Hashing is destructive and you lose data, so it is irreversible.
Also, to make things more fun, infinitely many strings have the same hash1. It is impossible to generate a unique string with a given hash unless you know more information about the input.
1: There are tons of hash functions and some may have "special" hashes that are only generated when you give a specific input to the function. Aside from those rare cases (if they even exist), every other output hash has infinitely many input strings.
http://en.wikipedia.org/wiki/Cryptographic_hash_function
it is infeasible to generate a message that has a given hash
The SHA-1 hash generate a 160-bit output from an arbitrarily sized input. As there is more possible inputs than the 2^160 possible output, there is bound to be collision, ie. different input having the same output.
This mean that it may be possible (via brute-force or by exploiting a weakness in the algorithm — none are known at the moment I think) to find a message corresponding to a given hash, but it may not be the original message.
Even if you fix the size of the input, if it is larger than 160 bits, there will be collision, and no way to invert the hash function.
Hashing is not encryption. Encryption is like shuffling the pieces of a jigsaw puzzle. Hashing is more like putting the pieces in a blender, there's no reasonable way to restore the original picture after that.
If you know the length of the original message (in multiples of 512 bits), you'll only need to test the 2^512 inputs of that size. Apply a SHA1 operation to each, and compare the result. This assumes no salting, and rather significant computational resources.
I was wondering whether md5, sha1 and anothers return unique values.
For example, sha1() for test returns a94a8fe5ccb19ba61c4c0873d391e987982fbbd3, which is 40 characters long. So, sha1 for strings larger than 40 chars must be the same (of course it's scrambled, because the given input may contain whitespaces and special chars etc.).
Due to this, when we are storing users' passwords, they can enter either their original password or some super-long one, which nobody knows.
Is this right, or do these hash algorithms provide really unique results - I'm quite sure it's hardly possible.
(Note: You're asking about hashing functions, not encryption).
It's impossible for them to be unique, by definition. They take a large input and reduce its size. It obviously follows, then, that they can't represent all the information they have compressed. So no, they don't provide "truly unique" results.
What they do provide, however, is "collision resistant" results. I.e. they try and show that two slightly different datas produce a significantly different hash.
Hashing algorithms (which is what you are referring to) do not provide unique results. What you are referring to is called the Pigeonhole Principle. The number of inputs exceeds the number of outputs, so multiple inputs must be mapped to the same output. This is why the longer the output hash the better, because there are less number of inputs mapped to an output.
Encrypting something must provide a unique results, because you can encrypt a message and decrypt it and get the same message.
SHA1 is not encryption algorithm, but a cryptographic hash function.
You are right - since it maps arbitrary long input to a fixed size hash there can be collisions. But the idea of a cryptographic hash function is to make it impossible to create such collisions "on demand". That's why we call them one-way hash functions, too.
Quote (source):
The ideal cryptographic hash function has four main or significant properties:
* it is easy to compute the hash value for any given message,
* it is infeasible to find a message that has a given hash,
* it is infeasible to modify a message without changing its hash,
* it is infeasible to find two different messages with the same hash.
Hashing algorithms never guarantee a different result for a different input. That's why hashing is always used as a one-way "encryption".
But you have to be realistic, a 160-bit hashing algorithm can have 2^160 possible combinations, which is... a lot! (1 with 48 zeroes)
These are not encryption functions, but hashing ones.
Hashing, by definition, can have two different strings collide (map to the same value) for the very reasons you mention. But that is usually not relevant because:
Cryptographic hashes (such as SHA1) try hard to make the collision probability for similar strings (very, very) low
You cannot deduce the original string from the hash.
These two mean that you cannot take a hash and easily generate one of the strings that map to it.
I understand that according to pigeonhole principle, if number of items is greater than number of containers, then at least one container will have more than one item. Does it matter which container will it be? How does this apply to MD5, SHA1, SHA2 hashes?
No it doesn't matter which container it is, and in fact this is not that important to cryptographic hashes; much more important is the birthday paradox, which says that you only need to hash sqrt(numberNeededByPigeonHolePrincipal) values, on average, before finding a collision.
Thus, the hash needs to be large enough that the square-root of the search space is too large to brute-force. The square-root-of-search-space for SHA1 is 280, and as of March 2012, no two values have ever been found with the same SHA1-hash (though I predict that will happen within the next year or two..); same with SHA2, a family of hashes which all have an even larger search-space. MD5 has been broken for a while though.
If you have more items to hash than you have slots, then you'll have hash collisions. But if you have a poor hashing algorithm, then you'll see collisions even when the items / slots ratio is very small. A good hashing algorithm (including most of the ones you'll see in the wild) will attempt to spread the resulting hashes over the entire output space as evenly as possible, and thus minimize collisions.
Note that a hash collision is not the end of the world. When used in a hash table, for instance, it just means that more than one item is stored in a slot, and the table code will have to traverse a little bit more to find or add the target item, increasing lookup time slightly.
You'll see people refer to MD5 as a "broken" hashing algorithm, when in reality, it's just a poor one to use as a cryptographic hash. It'll be better than one you build yourself.
The point of a hash function is to randomly distribute items into containers. For any good hash function, it doesn't/shouldn't "matter" which container is which as they must be indistinguishable.
This does not apply to "perfect hash" implementations which attempt to do better than random distribution — unlike the algorithms you mentioned.
As Michael mentioned, collisions happen LONG before there are as many items as slots. You must have graceful collision handling (or a perfect hash) if you want to handle the birthday paradox.
I think which application you're using the hash function for is an important distinction. Frequent collision in hashing containers, for example, can degrade performance. Frequent collision in cryptography will have far more devastating consequences (see: cryptographic hash function on Wikipedia).
Collision happens relatively easily even with "decent" hashing algorithm. For example, in Java,
String s = new String(new char[size]);
always hashes to 0. That is, all strings containing only \0 hash to 0 in Java.
As for "does it matter which container will it be?", again it depends on the application. You can design hash functions that would hash "similar" objects to nearby values. This is useful when you want to search for similar objects, for example. Just hash them all and see where they fall. In this case, collisions or near-collisions are desirable, because it groups objects that are similar.
In other applications, you want even the slightest change in the object to result in an entirely different hash value. This is the case in cryptography, for example, where you want to be as certain as possible that something has not been modified. It is far more difficult to find different objects that hash to the same value in this case.
Depending on your application, cryptographic hashes like MDA, SHA1/2 etc. may not be the ideal choice, precisely because they appear as if entirely random, thus giving you collisions as prediced by the birthday paradox. Traditionally, one reason for using simple hashes based on the remainder operation is that keys were expected to be serial numbers or similar, so that a remainder operation would sustain fewer collisions than expected at random. E.g. if the keys are the integers are 1..1000 you might have no collisions at all in a container of size 1009 if your hash function is the key mod 1009. People would sometimes hand-tune systems by carefully picking container size and hash function to achieve an even split.
Of course, if you have to worry about people maliciously choosing keys that will cause you difficulty, or an upstream system sending you very biassed keys (because e.g. it has its own hash table and decides to process all keys that hash to X at once). you may wish to use a hash based on a keyed cryptographic hash function to defend against this.