Salting passwords 101 - salt

Could someone please help me understand how salting works?
So far I understand the following:
Validate password
Generate a random string
Hash the password and the random string and concat them, then store them in the password field...
How do we store the salt, or know what it is when a user logs in? Do we store it in its own field? If we don't, how does the application figure out what the salt is? And if we do store it, doesn't it defeat the whole purpose?

The salt is combined with the password before hashing: the clear-text password and salt are concatenated, and the resulting string is hashed. With a unique random salt per user, two people with the same password will (with overwhelming probability) end up with different hashes. It also makes precomputed attacks, such as dictionary attacks using rainbow tables, much more difficult.
The salt is then stored in original/clear format along with the hash result. Then later, when you want to verify the password you would do the original process again. Combine the salt from the record with the password the user provided, hash the result, compare the hash.
You probably already know this, but it's important to remember: the salt must be generated randomly each time, and it must be different for each protected hash. A cryptographically secure random number generator is typically used to generate the salt.
So..for example:
user-password: "mypassword"
random salt: "abcdefg12345"
resulting-cleartext: "mypassword:abcdefg12345" (how you combine them is up to you. as long as you use the same combination format every time).
hash the resulting cleartext: "somestandardlengthhashbasedonalgorithm"
In your database now you would store the hash and salt used. I've seen it two ways:
method 1:
field1 - salt = "abcdefg12345"
field2 - password_hash = "somestandardlengthhashbasedonalgorithm"
method 2:
field1 - password_hash = "abcdefg12345:somestandardlengthhashbasedonalgorithm"
In either case you have to load the salt and password hash out of your database and redo the hash for comparison
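As a rough illustration of method 2, here is a minimal Python sketch. The function names (hash_password, verify_password) and the "salt:hash" layout are assumptions made for this example, and a single SHA-256 round is used only to keep it short; a real system would more likely use a deliberately slow password hash.

```python
import hashlib
import hmac
import secrets


def hash_password(password: str) -> str:
    """Return 'salt:hash' for storage in a single field (method 2)."""
    salt = secrets.token_hex(16)  # fresh random salt for every password
    digest = hashlib.sha256(f"{password}:{salt}".encode("utf-8")).hexdigest()
    return f"{salt}:{digest}"


def verify_password(password: str, stored: str) -> bool:
    """Redo the hash with the stored salt and compare the results."""
    salt, expected = stored.split(":", 1)
    candidate = hashlib.sha256(f"{password}:{salt}".encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, expected)


stored = hash_password("mypassword")
print(verify_password("mypassword", stored))  # True
print(verify_password("notmypassword", stored))  # False
```

Hashing the same password twice gives two different stored strings, because each call draws a new salt.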

salt <- random
hash <- hash(password + salt)
store hash:salt
Later
input password
look up hash:salt
hash(password+salt)
compare with stored hash
Got it?

How do we store the salt, or know what it is when a user logs in? Do we store it in its own field?
Yes.
And if we do store it, doesn't it defeat the whole purpose?
No. The purpose of a salt is not to be secret, but merely to prevent an attacker from amortizing the cost of computing rainbow tables over all sites in the world (no salt) or over all users of your site (a single salt shared by all users).

According to Practical Cryptography (Niels Ferguson and Bruce Schneier), you should use salted, stretched hashes for maximum security.
x[0] := 0
x[i] := h(x[i-1] || p || s) for i = 1, ..., r
K := x[r]
where
h is the hash (SHA-1, SHA-256, etc.)
K is the generated hashed password
p is the plaintext password
r is the number of rounds
s is the randomly generated salt
|| is the concatenation operator
The salt value is a random number that is stored with the hashed password. It does not need to remain secret.
Stretching is the act of performing the hash multiple times to make it computationally more expensive for an attacker to test many candidate passwords. r should be chosen so that the computation takes about 200-1000 ms on the user's computer. r may need to be increased as computers get faster.
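The recurrence above could be sketched in Python roughly as follows. Treating x[0] as the empty byte string, using SHA-256 for h, and the round count of 100,000 are all assumptions for this example, not values taken from the book.

```python
import hashlib
import secrets


def stretch(password: bytes, salt: bytes, rounds: int) -> bytes:
    """Compute K = x[r], where x[i] = h(x[i-1] || p || s)."""
    x = b""  # assumption: x[0] is represented as the empty byte string
    for _ in range(rounds):
        x = hashlib.sha256(x + password + salt).digest()
    return x


salt = secrets.token_bytes(16)  # stored next to K, not secret
key = stretch(b"mypassword", salt, 100_000)
print(key.hex())
```

In practice r would be tuned by timing the function on the target hardware rather than fixed in advance.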

If you're using a well-known hashing algorithm, someone could have a list of a lot of possible passwords already hashed using that algorithm and compare the items from that list with a hashed password they want to crack (dictionary attack).
If you "salt" all passwords before hashing them, these dictionaries are useless, because they'd have to be created using your salt.
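A small Python sketch of that point, with a made-up four-entry password list standing in for the attacker's dictionary (the list and helper names are hypothetical): the precomputed table finds the unsalted hash immediately but misses the salted one.

```python
import hashlib
import secrets

COMMON_PASSWORDS = ["123456", "password", "abcd", "letmein"]


def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


# The attacker builds this table once, without knowing any salt.
PRECOMPUTED = {sha256_hex(p): p for p in COMMON_PASSWORDS}


def crack(stored_hash: str):
    """Look the stored hash up in the precomputed table."""
    return PRECOMPUTED.get(stored_hash)


unsalted = sha256_hex("abcd")
salt = secrets.token_hex(16)
salted = sha256_hex("abcd" + salt)

print(crack(unsalted))  # abcd
print(crack(salted))  # None
```

The attacker can still hash the dictionary again with a known salt, but has to redo that work for every salt instead of reusing one table.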

Related

Does Ruby have a SlowEquals Function?

In a password hashing scheme, when comparing two password hashes, I know that I should use a slow equals function, one that will take the same amount of time regardless of the parameters.
I learned the importance of slow equals in "Why is the SlowEquals function important to compare hashed passwords?".
Does such a function exist in Ruby? If not, what gems can I use?
Yes, there is a constant-time string comparison library for Ruby; see fast_secure_compare. But you shouldn't need it for comparing two password hashes.
Consider what happens when Bob tries to brute-force Alice's password:
Bob tries a password
The server hashes Bob's try
The server compares Bob's hash with Alice's hash
Since the two hashes tend to be very different even if the two original passwords are similar, a comparison using == will almost always fail at the very beginning.
On the other hand, if the two hashes differ only in one character at the end, that says nothing about the similarity of the two original passwords, so Bob still learns nothing about Alice's password.
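For readers who want to see the difference between the two kinds of comparison, here is a short sketch in Python rather than Ruby. naive_equals is a hypothetical helper written for this example; hmac.compare_digest is the standard-library constant-time comparison.

```python
import hmac


def naive_equals(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: run time depends on the first mismatch."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # returns as soon as one byte differs
    return True


def constant_time_equals(a: bytes, b: bytes) -> bool:
    """Comparison whose run time does not depend on where the inputs differ."""
    return hmac.compare_digest(a, b)


print(naive_equals(b"abc123", b"abc124"))  # False
print(constant_time_equals(b"abc123", b"abc123"))  # True
```

Both functions return the same answers; they differ only in how much the timing reveals about the position of the first mismatch.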

How can you hash an email address into a zero or one with relatively even distribution?

This may be a very stupid question - apologies in advance.
I'm wondering if it's possible to generate a random number from an email address. I'm imagining something similar to how you can generate an md5 hash of an email address (or pretty much any string for that matter).
So basically such a function would allow you to generate the same random number from the same email address every time you ran it.
The application that I have in mind is to slot email addresses into an A/B test randomly. Normally the way that you would implement such a thing would be to just generate a random number for each email address and store that along with the email address in order to tag a given email as belonging to A or B.
The nice thing about a function that could generate a random number from an email is that you wouldn't have to store that association anywhere. You could run it on the fly to determine at any given time which bucket the email should fall into.
UPDATE: What I'm looking for is a hash, not a random number. So it's just a matter of figuring out how to go from something like an MD5 hash to an integer with a value of 0 or 1.
UPDATE 2: Thanks for the answers and nudging me in the right direction. So one solution in MYSQL is simply:
ASCII(SUBSTR(MD5(CONCAT(customer_email, 'salt')), 1, 1)) % 2
Yes, a hash by definition does this: given some string, it produces a value that appears random. But note that it's not really random; the same input always produces the same output. A salted hash appends a random number to the input before hashing and then stores the random number alongside the resulting hash. It will still give you the same result every time, as long as you retrieve the random number that was stored with the email.
When the generated number is the same every time, it is no longer a random number. You could derive the number from the ASCII codes of the characters in the email (for example, by summing them). But there is a catch: abc#xyz.com would then give the same value as cba#xyz.com, so you have to take care of this somehow. Things become more complex if special characters such as _ or a dot (.) are involved. Why not use the email itself as the key?
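The MySQL expression from UPDATE 2 can be mirrored in Python roughly like this. The function name ab_bucket is made up for the example, and the fixed string 'salt' is carried over from the query as-is.

```python
import hashlib


def ab_bucket(email: str, salt: str = "salt") -> int:
    """Map an email to bucket 0 or 1, the same way every time."""
    digest = hashlib.md5((email + salt).encode("utf-8")).hexdigest()
    # Parity of the ASCII code of the first hex character, as in the query.
    return ord(digest[0]) % 2


for address in ["alice@example.com", "bob@example.com", "carol@example.com"]:
    print(address, ab_bucket(address))
```

Of the 16 possible hex characters, 8 have an even ASCII code and 8 an odd one, so the two buckets should come out roughly equal in size.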

why is password hash different for 2 users with the same password?

I'm working with Rails and I noticed that my password_digest is different for 2 users. All their other fields are different too, but I used the same password, "abcd", for both.
it ended up generating these 2 different hashes
$2a$10$QyrjMQfjgGIb4ymtdKQXI.WObnWK0/CzR6yfb6tlGJy0CsVWY0GzO
$2a$10$dQSPyeQmZCzVUOXQ3rGtZONX6pwvnKSBRmsLnq1t1CsvdOTAMQlem
i thought the bcrypt gem generates the hash only based on the password field! am i wrong?
thanks :)
What you are looking at here is more than a password hash, there is a lot of metadata about the hash included in those strings. In terms of bcrypt the entire string would be considered the bcrypt hash. Here is what it includes:
$ is the delimiter in bcrypt.
The $2a$ is the version of the bcrypt algorithm that was used.
The $10$ is the cost factor that was used. This is why bcrypt is very popular for storing hashes. Every hash has a complexity/cost associated with it, which you can think of as how long it takes a computer to generate the hash; each increment of the cost doubles the work. This number is of course relative to the speed of computers, so as computers get faster over the years it will take less and less time to generate a hash with a cost of 10. So next year you increase your cost to 11, then to 12, 13, and so on. This allows your future hashes to remain strong while your older hashes stay valid. Just note that you cannot change the cost of a hash without rehashing the original string.
The $QyrjMQf... is a combination of the salt and the hash, encoded with bcrypt's own variant of base64.
The first 22 characters are the salt.
The remaining characters are the hash produced with the 2a algorithm, a cost of 10, and the given salt. The reason for the salt is so that an attacker cannot precompute bcrypt hashes in order to avoid paying the cost of generating them.
In fact this is the answer to your original question. If the hashes were always the same, then any time you saw the bcrypt string $2a$10$QyrjMQfjgGIb4ymtdKQXI.WObnWK0/CzR6yfb6tlGJy0CsVWY0GzO you would know that the password was abcd. You could just scan a database of hashes and quickly find all of the users with the abcd password by looking up that hash.
You cannot do this with bcrypt because $2a$10$dQSPyeQmZCzVUOXQ3rGtZONX6pwvnKSBRmsLnq1t1CsvdOTAMQlem is also abcd, and there are many, many more hashes that can be the result of bcrypt('abcd'). This makes scanning a database for abcd passwords next to impossible.
bcrypt stores the salt in the password hash.
Those are two different hashes of the same password with two different salts.
When verifying the passwords, bcrypt will read the salt from the hash field, then re-compute the hash using that salt.
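To make the layout concrete, here is a small Python sketch that only splits the string into its fields; it does not compute or verify any bcrypt hash. The BcryptParts type and parse_bcrypt function are names invented for this example.

```python
from typing import NamedTuple


class BcryptParts(NamedTuple):
    version: str
    cost: int
    salt: str
    digest: str


def parse_bcrypt(hash_string: str) -> BcryptParts:
    """Split '$2a$10$<22-char salt><hash>' into its fields."""
    _, version, cost, payload = hash_string.split("$")
    return BcryptParts(version, int(cost), payload[:22], payload[22:])


first = parse_bcrypt("$2a$10$QyrjMQfjgGIb4ymtdKQXI.WObnWK0/CzR6yfb6tlGJy0CsVWY0GzO")
second = parse_bcrypt("$2a$10$dQSPyeQmZCzVUOXQ3rGtZONX6pwvnKSBRmsLnq1t1CsvdOTAMQlem")
print(first.salt)  # QyrjMQfjgGIb4ymtdKQXI.
print(second.salt)  # dQSPyeQmZCzVUOXQ3rGtZO
```

The two strings share the version and cost but carry different salts, which is why the same password produces different hashes.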

I'm brainstorming for a serial number scheme. Am I doing it wrong?

serial number format:
24 octets represented by 24 hex characters plus hyphens for readability
e.g. D429-A7C5-9C15-8516-D15D-3A1C
0-15: {email+master hash}
16-19: {id}
20-23: {timestamp}
email+master hash algorithm:
generate md5 hash of user's email (32 hex characters)
generate md5 hash of undisclosed master key
xor these two hashes
remove odd bytes, reducing size to 16
e.g. D429A7C59C158516D15D3A1CB00488ED --> D2AC9181D531B08E
id:
initially 0x00000000, then incremented with each licence sold
timestamp:
timestamp generated when license is purchased
validation:
in order to register product, user must enter 1) email address and 2) serial number
generate email+master hash and verify that it matches 0-15 of serial
extract timestamp from serial and verify that it is < current timestamp and >= date first license is sold
I'm no expert on this, but there are a few things that might be problematic with this approach:
Using MD5 doesn't seem like a good idea. MD5 has known security weaknesses and someone with enough time on their hands could easily come up with some sort of hash collision. Depending on how you use the serial number, someone could easily forge a serial number that looks like it matches some other serial number. Using something from the SHA family might prevent this.
Your XOR of the user email hash with a master key isn't particularly secure - I could recover the hash of the master key easily by XORing the serial number with a hash of my own email.
Dropping every odd byte out of a secure hash breaks the guarantee that the hash is secure. In particular, any hash function with a good security guarantee usually requires that all of the bytes in the resulting hash be there in the output. As an example, I could trivially construct a secure hash function from any existing secure hash function by taking the output of that first hash, interspersing 0s in-between all the old bytes, then outputting the result. It's secure because if you could break any of the security properties of my new hash, it would be equivalent to breaking security properties of the original hash. However, if you drop all the even-numbered bytes from the new hash, you get all zeros, which isn't at all secure.
Is four bytes enough for the id? That only gives you 2^32 different ids.
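The XOR weakness can be demonstrated with a short Python sketch. It assumes the scheme works on the hex form of the MD5 hashes and keeps every other hex character, as in the D429... example from the question; all function names here are made up for the demonstration.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest().upper()


def xor_hex(a: str, b: str) -> str:
    """XOR two equal-length hex strings."""
    return format(int(a, 16) ^ int(b, 16), f"0{len(a)}X")


def drop_odd(hex_string: str) -> str:
    """Keep characters 0, 2, 4, ... (32 hex characters become 16)."""
    return hex_string[::2]


def email_master_field(email: str, master_key: str) -> str:
    """The vendor's computation of positions 0-15 of the serial."""
    return drop_odd(xor_hex(md5_hex(email), md5_hex(master_key)))


def forge_field(own_email: str, own_field: str, target_email: str) -> str:
    """Forge the field for another email from one legitimate serial."""
    leaked_master = xor_hex(own_field, drop_odd(md5_hex(own_email)))
    return xor_hex(leaked_master, drop_odd(md5_hex(target_email)))


mine = email_master_field("me@example.com", "undisclosed master key")
forged = forge_field("me@example.com", mine, "someone@example.com")
print(forged == email_master_field("someone@example.com", "undisclosed master key"))  # True
```

The forger never needs the master key itself, only the part of its hash that survives the truncation.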
Some points to add to templatetypedef's reply:
If you must combine hashes for the email and your master key, hash the concatenation of both. Even better, hash email+key+id for even "better" security in case someone purchases two or more licenses and sees the pattern.
Use a hash function that gives you only 16 bytes. If you must use MD5, any truncation is equally bad, so just take the first 16 bytes.
Your id is never used in the validation.
You will not be protected from key sharing (e.g. warez sites).
A serial number protects you from very few attacks. It's probably not worth your time and effort.
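A minimal Python sketch of the "hash email+key+id" suggestion follows. SHA-256, the NUL separators, and truncating to the first 16 hex characters are assumptions chosen for the example; serial_field and validate are hypothetical names. A keyed construction such as HMAC would be the more standard choice, but the plain concatenation is shown because that is what the points above describe.

```python
import hashlib


def serial_field(email: str, master_key: str, license_id: int) -> str:
    """Hash the concatenation of email, master key and id."""
    material = f"{email}\x00{master_key}\x00{license_id}".encode("utf-8")
    # Assumption: keep the first 16 hex characters to fit the serial layout.
    return hashlib.sha256(material).hexdigest()[:16].upper()


def validate(email: str, master_key: str, license_id: int, field: str) -> bool:
    """Recompute the field so that the id takes part in validation."""
    return serial_field(email, master_key, license_id) == field


field = serial_field("me@example.com", "undisclosed master key", 7)
print(validate("me@example.com", "undisclosed master key", 7, field))  # True
print(validate("me@example.com", "undisclosed master key", 8, field))  # False
```

Because the id is part of the hashed input, two licenses bought with the same email get unrelated fields, and changing the id in a serial makes validation fail.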

Time based hash that allows for comparison of hashed data

I'm trying to hash two different geo positions (-180.0, 60.59) and (-179.0, 80.40) to protect the geo positions from being known while allowing to know the number differences between the two hashes. I figured the answer would be having a key generated and stored in the client and having a time based key in the hash.
Cryptographic hash functions do not preserve operations; that is,
H(a) + H(b) != H(a + b)
Think of + as any operation. If a hash did preserve such an operation, that would be dangerous, because it would make finding hash collisions easy.
What you need is homomorphic encryption that supports at least one operation. An example is the Paillier cryptosystem: when you multiply the ciphertexts, you get the sum of the plaintexts.
a + b = Dec(Enc(a) * Enc(b)).
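Here is a toy Python sketch of the Paillier property, for illustration only. The primes 10007 and 10009 are far too small to be secure, the choice g = n + 1 is a common simplification, and representing coordinates as scaled non-negative integers (for example 60.59 as 6059) is an assumption of this example. It requires Python 3.8 or later for pow with a negative exponent.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Return (n, lam, mu) for toy primes p and q, using g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n
    return n, lam, mu


def encrypt(n: int, m: int) -> int:
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, lam: int, mu: int, c: int) -> int:
    n2 = n * n
    l_value = (pow(c, lam, n2) - 1) // n
    return (l_value * mu) % n


n, lam, mu = keygen(10007, 10009)
ca, cb = encrypt(n, 6059), encrypt(n, 8040)

total = decrypt(n, lam, mu, (ca * cb) % (n * n))
diff = decrypt(n, lam, mu, (cb * pow(ca, -1, n * n)) % (n * n))
print(total, diff)  # 14099 1981
```

Multiplying by the modular inverse of a ciphertext subtracts its plaintext (modulo n), which is how a difference between two encrypted values can be obtained without decrypting either one first.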

Resources