How do you know that the same key pair hasn't been generated ever before?
I guess the system you will access with your keys keeps track of the keys that have already been generated and are in use, so if the algorithm generates a duplicate, it will discard it.
If that's the case, there is a risk of someone flooding a system with "easily" generated keys until they find one that is already there, and stealing that identity.
I assume I'm wrong, since things seem to work correctly in the world so far, so why? How do you know your key pair generator hasn't created a key that is already in use?
I am planning on generating a set of public/private keys from a deterministic identifying piece of information from a person and was planning on using fingerprints.
My question, therefore, is: what is the output of a fingerprint scanner? Is there any deterministic output I could use, or is it always going to be a matter of "confidence level"? That is, do I get a number which must match the value stored in the database exactly to allow access, or do I get a number which allows access if it is "close enough" to the stored value, based on a high degree of confidence rather than an exact match?
I am quite sure the second option is the answer but just wanted to double-check. Is there any way to get some sort of deterministic output? My hope was to re-generate keys every time rather than actually storing fingerprint data. That way a wrong fingerprint would simply generate a new and useless key.
Any suggestions?
Thanks in advance.
I would advise against it for several reasons.
Fingerprints are not entirely deterministic. As suggested in ImSimplyAnna's answer, you might "round" the results in order to have a better chance of obtaining a deterministic value. But that would significantly reduce the number of possible/plausible fingerprints, and thus not meet the search-space size requirement for a cryptographic algorithm. On top of that, I suspect the entropy of such a result would be rather low compared to the requirements of modern algorithms, which are always based on high-quality random numbers.
Fingerprints are not secret: we expose them to everyone all the time, they can be revealed to an attacker at any moment, and they can be captured in a picture with a simple camera. A key must be a secret, and the only place we know we can store secrets without exposing them is our brain (which is why we use passwords).
An important feature of cryptographic keys is the ability to generate new ones if there is reason to believe the current ones might be compromised. This is not possible with fingerprints.
That is why I would advise against it. More generally, I discourage anyone (myself included) from writing their own cryptographic algorithm, because it is so easy to get wrong. It might be the easiest thing to screw up out of all the things you could write, because attackers are so vicious!
The only good approach, if you're not a skilled specialist, is to use libraries that are used all around, because they've been written by experts on the matter and have been subjected to many attacks and attempts to break them, so the ones still standing offer much better protection than anything a non-specialist could write (or, really, anything a single human could write).
You can also have a look at this question on the Crypto Stack Exchange. They also discourage the OP from using anything other than a battle-hardened algorithm or protocol.
Edit:
I am planning on generating a set of public/private keys from a deterministic identifying piece of information
Actually, it did not strike me at first (it should have), but keys MUST NOT be generated from anything that is not random. NEVER.
You have to generate them randomly. If you don't, you are already giving the attacker more information than he/she should have. Being a programmer does not make you a cryptographer. Your users' information is at stake; do not take any chances (and if you're not a cryptographer, you don't stand any).
A fingerprint scanner looks for features where the lines on the fingerprint either split or end. It then calculates the distances and angles between such features in an attempt to find a match.
Here's some more reading on the subject:
https://www.explainthatstuff.com/fingerprintscanners.html
in the section "How fingerprints are stored and compared".
The source is the best explanation I can find, but looking around some more it seems that all fingerprint scanners use some variety of that algorithm to generate data that can be matched.
Storing raw fingerprints would not only take up way more space on a database but also be a pretty significant security risk if that information was ever leaked, so it's not really done unless absolutely necessary.
Judging by that algorithm, I would assume that there is always some "confidence level". The angles and distances will never be 100% equal between scans, so there has to be some leeway to make sure a match is still found even if the finger is pressed against the scanner a bit harder or the finger is at a slightly different angle.
Based on this, I'd assume that generating a key pair based on a fingerprint would be possible, if you can figure out a way to make similar scans result in the same information. Simply rounding the angles and distances may work, but may introduce cases where two different people generate the same key pairs, or cases where different scans of the same fingerprint have a high chance of generating several different keys.
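To make that instability concrete, here is a hypothetical sketch of the "round the features, then hash" idea; the minutiae format, the step sizes, and the sample values are all assumptions for illustration, since real scanners expose vendor-specific templates.

```python
# Hypothetical sketch: quantize minutiae measurements, then hash them into key material.
import hashlib

def key_from_minutiae(minutiae, dist_step=10, angle_step=15):
    """minutiae: list of (distance, angle) pairs between detected features."""
    rounded = sorted((round(d / dist_step), round(a / angle_step))
                     for d, a in minutiae)
    return hashlib.sha256(repr(rounded).encode()).hexdigest()

# Two scans of the same finger with slightly different measurements:
scan_a = [(103.2, 41.0), (57.8, 88.5)]
scan_b = [(101.9, 43.2), (58.4, 86.9)]
print(key_from_minutiae(scan_a) == key_from_minutiae(scan_b))  # True here,
# but a measurement near a rounding boundary flips a bucket and yields a
# completely different key, which is exactly the instability described above.
```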
Let's say that IPFS becomes an evolution of the internet as we understand it today, and the entire model (mainly websites and the files they store) migrates to be as decentralized as possible.
I am not sure about the entire procedure of how IPFS works under the hood, but I understand that files are stored mainly based on their hash.
Is there a possibility that, due to the number of files a worldwide IPFS model would store (and therefore the number of hashes that would be generated), algorithms like SHA-2 reach a limit on the number of unique hashes they can generate (even knowing that 2^256 is pretty big)?
You would have to generate hashes for about 2^128 different files in order to find a single collision in a 256-bit hash by chance.
Generating that many hashes is far beyond anything that will ever be computed, so the probability of an accidental collision is vanishingly small.
It's much more likely that some problem will eventually be found in the hash function, allowing someone to create collisions on purpose.
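For a rough sense of scale, here is a back-of-the-envelope check using the standard birthday-bound approximation p ≈ n²/2^(b+1) for n hashes of b bits; the file count below is just an assumed, deliberately generous guess.

```python
# Birthday-bound estimate of an accidental SHA-256 collision.
n = 10**15                 # assumed number of distinct files, very generous
b = 256                    # hash output size in bits
p = n**2 / 2**(b + 1)      # approximate collision probability
print(p)                   # ~4.3e-48, i.e. effectively zero
```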
BCrypt hashes usually begin with some repeating symbols.
Let's say, for example, we see $2a$10 at the beginning of our hash. Every BCrypt hash begins with something similar to this.
$ is a separator
2a is in this case the version
10 is the cost factor: the number of iterations is 2 to the power of 10
My question is - why is this information in the hash?
There is no "dehashing" algorithm that might need this information in particular. When people log in, the same hash is generated using the same version and the same number of iterations, and the result is compared to what is stored in the database. This means the algorithm doesn't have a built-in comparison function that takes the hash and, based on this information (version and iterations), hashes the password to make the comparison.
Then... why is this information given away? Who uses it?
My guess is that it's so that if the version or the number of iterations has changed, our program (or whatever) will know, but... why? I mean, the algorithm needs to be configured only once, and if changes are required, then it's the company's job to make the appropriate arrangements so that it knows what version was used before and what is used now. Why is it the hash's job to remember the version and number of iterations?
Hashes get leaked every week or so, and with this information someone can easily set up their own BCrypt and run it with the same version and iteration configuration... However, if this information weren't visible in the hash and the hash became public, how would anyone set up their own BCrypt and start comparing?
Isn't it safer not to provide this information, so that if the hash alone gets leaked, nobody would know what configuration was used to create it?
It makes bcrypt forward and backwards compatible.
For example, bcrypt hashes do not all start with $2a$.
They can start with:
$2$
$2a$
$2x$
$2y$
$2b$
You need to know which version of the hash you're reading so that you handle it correctly.
Also, you need to know the number of iterations.
not every hash will use 10
not every hash will use the same cost
Why store the version and iterations? Because you have to.
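As a quick illustration, here is a minimal sketch of pulling those fields back out of a stored value; the hash string below is made up for the example.

```python
# Split a bcrypt string into the fields discussed above.
stored = "$2b$12$C6UzMDM.H6dfI/f/IKcEeO5/FtLq8Q0sW7vGi0GtHYBhZ7iLNNOpW"  # made-up example

_, version, cost, rest = stored.split("$")  # -> '2b', '12', salt+digest
salt, digest = rest[:22], rest[22:]         # the first 22 characters are the salt

print(version)  # '2b' -> algorithm variant
print(cost)     # '12' -> work factor, i.e. 2**12 rounds
```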
Also, it's an excellent idea. In the past, people used to just store a bare hash, and it was awful:
people used either no salt, or the same salt every time, because storing it was too hard
people used 1 iteration, or a hard-coded number of iterations, because storing it was too hard
BCrypt, SCrypt, and Argon2 use the extraordinarily clever idea of doing all that grunt-work for you, leaving you with only having to call one function, without weakening the security of the system in any way.
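For instance, here is a minimal sketch using the common Python bcrypt package (assumed here purely for illustration); note that the stored string carries everything verification needs.

```python
import bcrypt

password = b"correct horse battery staple"

# gensalt() picks a random salt and embeds the variant and cost in the result,
# so the returned string is the only thing you need to store.
hashed = bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))
print(hashed)  # e.g. b"$2b$12$<22-char salt><31-char digest>"

# checkpw() reads the variant, cost, and salt back out of the stored string,
# re-hashes the candidate password the same way, and compares the results.
print(bcrypt.checkpw(password, hashed))        # True
print(bcrypt.checkpw(b"wrong guess", hashed))  # False
```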
Finally, you're proposing security by obscurity. Stop that; it doesn't work.
Not having these details in the data is not unusual; in this old hack, it was the hackers who mentioned it was SHA-1. This is easy for attackers (and researchers too): they take the leaked data and simply try all kinds of common algorithms and iteration/work-factor counts against a small list of common passwords, like the phpbb list from SkullSecurity; when the inevitable terrible passwords crack, they know they've found the algorithm and break out the full-scale cracking.
Having the algorithm stored means you can transition from old to new gradually, upgrade individual users as they come in, AND have multiple variants in use at once, including transitional types.
Transitional: say you were on salted SHA-1 (bad) and are moving to PBKDF2-HMAC-SHA-512 with 200,000 iterations (good). In the middle, you bulk convert everything to PBKDF2-HMAC-SHA-512(SHA-1(salted password)), but at each user's login you move them to pure PBKDF2-HMAC-SHA-512(password).
Having the iteration count stored means that, as in the transitional case above, you can increase it over time and set different counts for different users as they log in; a sketch of the login-time upgrade follows below.
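A minimal sketch of that upgrade-at-login flow, assuming a stored string whose prefix identifies the scheme; the helper names (verify_salted_sha1, verify_pbkdf2, hash_pbkdf2) and the user object are hypothetical.

```python
def verify_and_upgrade(stored, password, user):
    if stored.startswith("$pbkdf2-sha512$"):      # already on the new scheme
        return verify_pbkdf2(stored, password)
    if stored.startswith("$sha1$"):               # legacy scheme
        if not verify_salted_sha1(stored, password):
            return False
        # The password is known to be correct, so re-hash it with the
        # strong scheme and overwrite the old record.
        user.password_hash = hash_pbkdf2(password, iterations=200_000)
        user.save()
        return True
    raise ValueError("unknown hash format")
```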
In this reddit blog post, the author talks about MD5ing their cache keys, which is the reason they find it very difficult to scale out.
Can someone tell me why one would want to MD5 cache keys? I didn't understand the reason even though they explained it as:
“A few years ago, we decided to md5 all of our cache keys. We did this because at the time memcached (which is what memcachedb is based on) could only take keys of a certain length. In fact, the version it is based on still has this limitation. MD5ing the keys was a good solution to this problem, so we thought.”
The maximum key length back then was probably shorter than it is now (currently 250 bytes, and 250 bytes is a pretty huge key name), meaning a sensible key-naming convention may not have fit, so they just used the sensible naming convention and md5'd it.
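As a small illustration of the workaround (the key name below is made up), hashing a long descriptive key yields a short fixed-length string that always fits the limit:

```python
import hashlib

long_key = "user:12345:front_page:sort=hot:locale=en_US:" + "x" * 300  # well over 250 bytes
cache_key = hashlib.md5(long_key.encode()).hexdigest()                 # always 32 hex chars
print(len(long_key), len(cache_key))
```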
We did this because at the time memcached (which is what memcachedb is based on) could only take keys of a certain length
I guess that since some keys were larger than the maximum length the server allowed, they decided to store an MD5 of the key instead.
However, I'm not sure there is a relation between this and the fact that they can't easily add new servers (since memcached also uses hashing to partition keys evenly across servers... maybe memcachedb doesn't).
I've seen it mentioned in many places that randomness is important for generating keys for symmetric and asymmetric cryptography and when using the keys to encrypt messages.
Can someone provide an explanation of how security could be compromised if there isn't enough randomness?
Randomness means unguessable input. If the input is guessable, then the output can be easily calculated, and that is bad.
For example, Debian had a long-standing bug in its SSL implementation that failed to gather enough randomness when creating a key. This resulted in the software generating only one of about 32k possible keys. It is thus easy to decrypt anything encrypted with such a key by trying all 32k possibilities, which is very fast given today's processor speeds.
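A toy illustration of why such a tiny keyspace is fatal; deriving the key from a process ID (1 to 32768) is an assumption for the example, not the actual Debian code.

```python
import hashlib

def weak_key(seed):
    # Key material derived entirely from a small, guessable seed.
    return hashlib.sha256(str(seed).encode()).digest()

target = weak_key(31337)            # the victim's key, unknown to the attacker
for pid in range(1, 32769):         # exhaust the whole "keyspace" in moments
    if weak_key(pid) == target:
        print("recovered key from seed", pid)
        break
```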
The important feature of most cryptographic operations is that they are easy to perform if you have the right information (e.g. a key) and infeasible to perform if you don't have that information.
For example, symmetric cryptography: if you have the key, encrypting and decrypting is easy. If you don't have the key (and don't know anything about its construction) then you must embark on something expensive like an exhaustive search of the key space, or a more-efficient cryptanalysis of the cipher which will nonetheless require some extremely large number of samples.
On the other hand, if you have any information about likely values of the key, your exhaustive search of the keyspace becomes much easier (or the number of samples you need for your cryptanalysis becomes much lower). For example, it is (currently) infeasible to perform 2^128 trial decryptions to discover what a 128-bit key actually is. But if you know the key material came out of a time value that you know to within a billion ticks, your search just became 340282366920938463463374607431 times easier.
To decrypt a message, you need to know the right key.
The more possible keys an attacker has to try, the harder it is to decrypt the message.
Taking an extreme example, let's say there's no randomness at all. When I generate a key to use in encrypting my messages, I'll always end up with the exact same key. No matter where or when I run the keygen program, it'll always give me the same key.
That means anyone who has access to the program I used to generate the key can trivially decrypt my messages. After all, they just have to ask it to generate a key too, and they'll get one identical to the one I used.
So we need some randomness to make it unpredictable which key you end up using. As David Schmitt mentions, Debian had a bug which made it generate only a small number of unique keys, which means that to decrypt a message encrypted by the default OpenSSL implementation on Debian, I just have to try this smaller number of possible keys. I can ignore the vast number of other valid keys, because Debian's SSL implementation will never generate those.
On the other hand, if there is enough randomness in the key generation, it's impossible to guess anything about the key. You have to try every possible bit pattern (and for a 128-bit key, that's a lot of combinations).
It has to do with some of the basic reasons for cryptography:
Make sure a message isn't altered in transit (Immutable)
Make sure a message isn't read in transit (Secure)
Make sure the message is from who it says it's from (Authentic)
Make sure the message isn't the same as one previously sent (No Replay)
etc
There's a few things you need to include, then, to make sure that the above is true. One of the important things is a random value.
For instance, if I encrypt "Too many secrets" with a key, it might come out with "dWua3hTOeVzO2d9w"
There are a few problems with this. An attacker might be able to break the encryption more easily since I'm using a very limited set of characters. Further, if I send the same message again, it's going to come out exactly the same. Lastly, an attacker could record it and send the message again, and the recipient wouldn't know that I didn't send it, even if the attacker didn't break it.
If I add some random garbage to the string each time I encrypt it, then not only does it make it harder to crack, but the encrypted message is different each time.
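Here is a minimal sketch of that idea using AES-GCM from the Python cryptography package (an assumed dependency): a fresh random nonce makes the same plaintext encrypt differently every time.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)
aes = AESGCM(key)
msg = b"Too many secrets"

def encrypt(plaintext):
    nonce = os.urandom(12)                       # fresh randomness every time
    return nonce + aes.encrypt(nonce, plaintext, None)

print(encrypt(msg) != encrypt(msg))              # True: same message, different ciphertext
```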
The other features of cryptography in the bullets above are fixed using means other than randomness (seed values, two way authentication, etc) but the randomness takes care of a few problems, and helps out on other problems.
A bad source of randomness limits the character set again, so it's easier to break, and if it's easy to guess, or otherwise limited, then the attacker has fewer paths to try when doing a brute force attack.
-Adam
A common pattern in cryptography is the following (sending text from Alice to Bob):
Take plaintext p
Generate random k
Encrypt p with k using symmetric encryption, producing crypttext c
Encrypt k with Bob's public key, using asymmetric encryption, producing x
Send c+x to Bob
Bob reverses the process, decrypting x using his private key to obtain k
The reason for this pattern is that symmetric encryption is much faster than asymmetric encryption. Of course, it depends on a good random number generator to produce k, otherwise the bad guys can just guess it.
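A minimal sketch of that pattern, assuming the Python cryptography package (RSA-OAEP to wrap the key, AES-GCM for the message); this is an illustration of the idea, not a production protocol.

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

oaep = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

bob_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
bob_public = bob_private.public_key()

# Alice's side
p = b"meet me at noon"
k = AESGCM.generate_key(bit_length=128)           # random symmetric key
nonce = os.urandom(12)
c = nonce + AESGCM(k).encrypt(nonce, p, None)     # fast symmetric encryption
x = bob_public.encrypt(k, oaep)                   # wrap k with Bob's public key

# Bob's side: unwrap k, then decrypt the message
k2 = bob_private.decrypt(x, oaep)
print(AESGCM(k2).decrypt(c[:12], c[12:], None))   # b"meet me at noon"
```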
Here's a "card game" analogy: Suppose we play several rounds of a game with the same deck of cards. The shuffling of the deck between rounds is the primary source of randomness. If we didn't shuffle properly, you could beat the game by predicting cards.
When you use a poor source of randomness to generate an encryption key, you significantly reduce the entropy (or uncertainty) of the key value. This could compromise the encryption because it makes a brute-force search over the key space much easier.
Work out this problem from Project Euler, and it will really drive home what "lots of randomness" will do for you. When I saw this question, that was the first thing that popped into my mind.
Using the method he talks about there, you can easily see what "more randomness" would gain you.
A pretty good paper that outlines why not being careful with randomness can lead to insecurity:
http://www.cs.berkeley.edu/~daw/papers/ddj-netscape.html
This describes how back in 1995 the Netscape browser's key SSL implementation was vulnerable to guessing the SSL keys because of a problem seeding the PRNG.