Why do you need lots of randomness for effective encryption? - algorithm

I've seen it mentioned in many places that randomness is important for generating keys for symmetric and asymmetric cryptography and when using the keys to encrypt messages.
Can someone provide an explanation of how security could be compromised if there isn't enough randomness?

Randomness means unguessable input. If the input is guessable, then the output can be easily calculated. That is bad.
For example, Debian had a long-standing bug in its OpenSSL package that failed to gather enough randomness when creating a key. As a result, the software could generate only one of about 32k possible keys. It is therefore easy to decrypt anything encrypted with such a key by trying all 32k possibilities, which is very fast given today's processor speeds.
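To show how quickly a small key space falls, here is a minimal sketch. It assumes a hypothetical `weak_keygen` whose only varying input is a process ID in the range 0..32767. This mimics the effect of the Debian bug; it is not the actual OpenSSL code path.

```python
import hashlib
from typing import Optional

def weak_keygen(pid: int) -> bytes:
    # Hypothetical broken generator: the key depends only on a 15-bit process ID.
    return hashlib.sha256(b"weak-seed" + pid.to_bytes(2, "big")).digest()[:16]

def recover_pid(target_key: bytes) -> Optional[int]:
    # Exhaustive search: only 32,768 candidates, which takes well under a second.
    for pid in range(32768):
        if weak_keygen(pid) == target_key:
            return pid
    return None

victim_key = weak_keygen(12345)
print(recover_pid(victim_key))  # 12345
```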

The important feature of most cryptographic operations is that they are easy to perform if you have the right information (e.g. a key) and infeasible to perform if you don't have that information.
For example, symmetric cryptography: if you have the key, encrypting and decrypting is easy. If you don't have the key (and don't know anything about its construction) then you must embark on something expensive like an exhaustive search of the key space, or a more-efficient cryptanalysis of the cipher which will nonetheless require some extremely large number of samples.
On the other hand, if you have any information on likely values of the key, your exhaustive search of the keyspace is much easier (or the number of samples you need for your cryptanalysis is much lower). For example, it is (currently) infeasible to perform 2^128 trial decryptions to discover what a 128-bit key actually is. If you know the key material came out of a time value that you know within a billion ticks, then your search just became 340282366920938463463374607431 times easier.
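The arithmetic behind that figure: if you know the key came from one of 10^9 time values, the search shrinks from 2^128 candidates to 10^9 candidates.

```python
import math

full_space = 2 ** 128        # number of possible 128-bit keys
known_window = 10 ** 9       # candidate time values: "a billion ticks"

speedup = full_space // known_window
print(speedup)                              # 340282366920938463463374607431
print(round(math.log2(known_window), 1))    # 29.9: effectively a ~30-bit key
```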

To decrypt a message, you need to know the right key.
The more possible keys you have to try, the harder it is to decrypt the message.
Taking an extreme example, let's say there's no randomness at all. When I generate a key to use in encrypting my messages, I'll always end up with the exact same key. No matter where or when I run the keygen program, it'll always give me the same key.
That means anyone who has access to the program I used to generate the key can trivially decrypt my messages. After all, they just have to ask it to generate a key too, and they get one identical to the one I used.
So we need some randomness to make it unpredictable which key you end up using. As David Schmitt mentions, Debian had a bug which made it generate only a small number of unique keys, which means that to decrypt a message encrypted by the default OpenSSL implementation on Debian, I just have to try this smaller number of possible keys. I can ignore the vast number of other valid keys, because Debian's SSL implementation will never generate those.
On the other hand, if there is enough randomness in the key generation, it's impossible to guess anything about the key. You have to try every possible bit pattern (and for a 128-bit key, that's a lot of combinations).

It has to do with some of the basic reasons for cryptography:
Make sure a message isn't altered in transit (Immutable)
Make sure a message isn't read in transit (Secure)
Make sure the message is from who it says it's from (Authentic)
Make sure the message isn't the same as one previously sent (No Replay)
etc
There are a few things you need to include, then, to make sure that the above is true. One of the important things is a random value.
For instance, if I encrypt "Too many secrets" with a key, it might come out with "dWua3hTOeVzO2d9w"
There are three problems with this. First, an attacker might be able to break the encryption more easily, since I'm using a very limited set of characters. Second, if I send the same message again, it's going to come out exactly the same. Lastly, an attacker could record it and send the message again, and the recipient wouldn't know that I didn't send it, even if the attacker didn't break it.
If I add some random garbage to the string each time I encrypt it, then not only does it make it harder to crack, but the encrypted message is different each time.
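A rough sketch of that idea, assuming a toy stream cipher built from HMAC-SHA256 in a counter-style construction (an illustration only, not something to deploy; a real system would use a vetted mode such as AES-GCM). A fresh random nonce is prepended to each ciphertext, so encrypting the same message twice gives different output. Note that the nonce alone does not stop replay; that needs the other measures mentioned below.

```python
import hashlib
import hmac
import secrets

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: HMAC-SHA256(key, nonce || counter), block after block.
    out, counter = bytearray(), 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)  # the "random garbage": fresh for every message
    ks = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))

def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    ks = _keystream(key, nonce, len(body))
    return bytes(c ^ k for c, k in zip(body, ks))

key = secrets.token_bytes(32)
c1 = encrypt(key, b"Too many secrets")
c2 = encrypt(key, b"Too many secrets")
print(c1 != c2)                                                   # True
print(decrypt(key, c1) == decrypt(key, c2) == b"Too many secrets")  # True
```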
The other features of cryptography in the list above are provided by means other than randomness (seed values, two-way authentication, etc.), but randomness takes care of a few problems and helps with others.
A bad source of randomness limits the character set again, so it's easier to break, and if it's easy to guess, or otherwise limited, then the attacker has fewer paths to try when doing a brute force attack.
-Adam

A common pattern in cryptography is the following (sending text from Alice to Bob):
Take plaintext p
Generate random k
Encrypt p with k using symmetric encryption, producing ciphertext c
Encrypt k with Bob's public key, using asymmetric encryption, producing x
Send c+x to Bob
Bob reverses the process, decrypting x using his private key to obtain k, and then decrypting c with k to recover p
The reason for this pattern is that symmetric encryption is much faster than asymmetric encryption. Of course, it depends on a good random number generator to produce k, otherwise the bad guys can just guess it.
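The steps above can be sketched end to end as follows. Everything here is a toy under stated assumptions: Bob's "RSA" key is built from two publicly known Mersenne primes with no padding (completely insecure), and the symmetric cipher is an HMAC-SHA256 keystream. A fresh random k is generated per message, which is why the keystream needs no separate nonce here.

```python
import hashlib
import hmac
import math
import secrets

# Toy RSA key for Bob (insecure: public primes, textbook RSA without padding).
P, Q = 2**521 - 1, 2**607 - 1
N, E = P * Q, 65537
PHI = (P - 1) * (Q - 1)
assert math.gcd(E, PHI) == 1
D = pow(E, -1, PHI)  # Bob's private exponent (Python 3.8+)

def keystream(k: bytes, length: int) -> bytes:
    # Toy symmetric cipher: HMAC-SHA256(k, counter) blocks, XORed with the data.
    out, counter = bytearray(), 0
    while len(out) < length:
        out += hmac.new(k, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])

def xor(data: bytes, ks: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, ks))

# Alice: random k, encrypt p with k, encrypt k with Bob's PUBLIC key (E, N).
p = b"Meet me at noon"
k = secrets.token_bytes(16)
c = xor(p, keystream(k, len(p)))
x = pow(int.from_bytes(k, "big"), E, N)

# Bob: recover k with his PRIVATE key D, then decrypt c.
k_bob = pow(x, D, N).to_bytes(16, "big")
print(xor(c, keystream(k_bob, len(c))))  # b'Meet me at noon'
```

If `k` came from a weak generator, an attacker could skip the asymmetric step entirely and just enumerate candidate values of `k` against `c`.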

Here's a "card game" analogy: Suppose we play several rounds of a game with the same deck of cards. The shuffling of the deck between rounds is the primary source of randomness. If we didn't shuffle properly, you could beat the game by predicting cards.
When you use a poor source of randomness to generate an encryption key, you significantly reduce the entropy (or uncertainty) of the key value. This could compromise the encryption because it makes a brute-force search over the key space much easier.
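A quick way to put numbers on "entropy" here: a key chosen uniformly from N equally likely values carries log2(N) bits. The 32,768 figure is the Debian example mentioned earlier; the deck of cards ties back to the shuffling analogy.

```python
import math

def entropy_bits(num_equally_likely_values: int) -> float:
    # Entropy of a uniform choice among N possibilities is log2(N) bits.
    return math.log2(num_equally_likely_values)

print(entropy_bits(32768))                          # 15.0 (a 15-bit seed)
print(entropy_bits(2 ** 128))                       # 128.0 (a properly random 128-bit key)
print(round(entropy_bits(math.factorial(52)), 1))   # 225.6 (a perfectly shuffled deck)
```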

Work out this problem from Project Euler, and it will really drive home what "lots of randomness" will do for you. When I saw this question, that was the first thing that popped into my mind.
Using the method he talks about there, you can easily see what "more randomness" would gain you.

A pretty good paper that outlines why not being careful with randomness can lead to insecurity:
http://www.cs.berkeley.edu/~daw/papers/ddj-netscape.html
This describes how, back in 1995, the Netscape browser's SSL implementation was vulnerable to attackers guessing the SSL keys because of a problem seeding the PRNG.

Related

What is the output of a fingerprint scanner? Is there any deterministic identifying information?

I am planning on generating a set of public/private keys from a deterministic identifying piece of information from a person and was planning on using fingerprints.
My question, therefore, is: what is the output of a fingerprint scanner? Is there any deterministic output I could use, or is it always going to be a matter of "confidence level"? i.e. Do I always get a "number" which, if matched exactly to the database, will allow access, or do I rather get a number which, if "close enough" to the stored value on the database, allows access, based on a high degree of confidence, rather than an exact match?
I am quite sure the second option is the answer but just wanted to double-check. Is there any way to get some sort of deterministic output? My hope was to re-generate keys every time rather than actually storing fingerprint data. That way a wrong fingerprint would simply generate a new and useless key.
Any suggestions?
Thanks in advance.
I would advise against it for several reasons.
The fingerprints are not entirely deterministic. As suggested in ImSimplyAnna's answer, you might 'round' the results to improve your chances of obtaining a deterministic result. But that would significantly reduce the number of possible/plausible fingerprints, and thus would not meet the search-space size requirement for a cryptographic algorithm. On top of that, I suspect the entropy of such a result would be rather low compared to the requirements of modern algorithms, which are always based on high-quality random numbers.
Fingerprints are not secret, we expose them to everyone all the time, and they can be revealed to an attacker at any time, and stored in a picture using a simple camera. A key must be a secret, and the only place we know we can store secrets without exposing them is our brain (which is why we use passwords).
An important feature of cryptographic keys is the possibility of generating new ones if there is reason to believe the current ones might be compromised. This is not possible with fingerprints.
That is why I would advise against it. More generally, I discourage anyone (myself included) from writing their own cryptographic algorithm, because it is so easy to get it wrong. It might be the easiest thing to get wrong, out of all the things you could write, because attackers are so vicious!
The only good approach, if you're not a skilled specialist, is to use widely used libraries, because they've been written by experts on the matter and have been subjected to many attacks and attempts to break them, so the ones still standing offer much better levels of protection than anything a non-specialist could write (or basically anything a single human could write).
You can also have a look at this question on the Cryptography Stack Exchange. They also discourage the OP from using anything other than a battle-hardened algorithm or protocol.
Edit:
I am planning on generating a set of public/private keys from a deterministic identifying piece of information
Actually, it did not strike me at first (it should have), but keys MUST NOT be generated from anything which is not random. NEVER.
You have to generate them randomly. If you don't, you give the attacker more information than you should. Being a programmer does not make you a cryptographer. Your users' information is at stake; do not take any chances (and if you're not a cryptographer, you don't stand a chance).
A fingerprint scanner looks for features where the lines on the fingerprint either split or end. It then calculates the distances and angles between such features in an attempt to find a match.
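A simplified sketch of that kind of comparison, under loud assumptions: minutiae are given as hypothetical (x, y) pixel positions already listed in matching order, and two prints "match" when each pairwise distance and angle agrees within a tolerance. Real matchers also align the prints and handle missing or extra minutiae and angle wrap-around; this only shows why the raw numbers are never identical between scans.

```python
import math

def features(minutiae):
    """Distance and angle (degrees) between each pair of minutiae points."""
    out = []
    for i in range(len(minutiae)):
        for j in range(i + 1, len(minutiae)):
            (x1, y1), (x2, y2) = minutiae[i], minutiae[j]
            out.append((math.hypot(x2 - x1, y2 - y1),
                        math.degrees(math.atan2(y2 - y1, x2 - x1))))
    return out

def match_score(a, b, dist_tol=3.0, angle_tol=5.0):
    """Fraction of feature pairs that agree within the given tolerances."""
    hits = sum(1 for (d1, t1), (d2, t2) in zip(a, b)
               if abs(d1 - d2) <= dist_tol and abs(t1 - t2) <= angle_tol)
    return hits / max(len(a), 1)

enrolled = features([(10, 10), (40, 25), (70, 60)])
rescan = features([(11, 9), (41, 26), (69, 61)])  # same finger, slightly shifted
print(rescan == enrolled)               # False: raw values differ between scans
print(match_score(enrolled, rescan))    # 1.0: but they agree within tolerance
```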
Here's some more reading on the subject:
https://www.explainthatstuff.com/fingerprintscanners.html
in the section "How fingerprints are stored and compared".
The source is the best explanation I can find, but looking around some more it seems that all fingerprint scanners use some variety of that algorithm to generate data that can be matched.
Storing raw fingerprints would not only take up way more space on a database but also be a pretty significant security risk if that information was ever leaked, so it's not really done unless absolutely necessary.
Judging by that algorithm, I would assume that there is always some "confidence level". The angles and distances will never be 100% equal between scans, so there has to be some leeway to make sure a match is still found even if the finger is pressed against the scanner a bit harder or the finger is at a slightly different angle.
Based on this, I'd assume that generating a key pair based on a fingerprint would be possible, if you can figure out a way to make similar scans result in the same information. Simply rounding the angles and distances may work, but may introduce cases where two different people generate the same key pairs, or cases where different scans of the same fingerprint have a high chance of generating several different keys.
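Both failure modes of rounding can be seen in a small sketch. It assumes a hypothetical scheme that snaps each measured angle to the nearest 5 degrees and hashes the result into a key; the angle values are made up for illustration.

```python
import hashlib
import math

def quantize(angle_deg: float, step: int = 5) -> int:
    # Snap a measured angle to the nearest multiple of `step` degrees.
    return math.floor(angle_deg / step + 0.5) * step

def key_from_angles(angles) -> str:
    # Hypothetical scheme: hash the quantized feature values into a 256-bit key.
    data = ",".join(str(quantize(a)) for a in angles).encode()
    return hashlib.sha256(data).hexdigest()

# Same finger, two scans straddling a rounding boundary (42.4 -> 40, 42.6 -> 45).
print(key_from_angles([42.4, 17.0]) == key_from_angles([42.6, 17.1]))  # False
# Two different fingers that happen to quantize identically.
print(key_from_angles([38.0, 17.0]) == key_from_angles([41.9, 16.2]))  # True
```

Making the buckets wider reduces the first problem but makes the second (and the loss of entropy) worse.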

Password hashing algorithm that will keep password safe even from supercomputers?

I was researching how MD5 is known to have collisions, so it's not secure enough. I am looking for a hashing algorithm that will take even supercomputers a long time to break. Can you tell me which hashing algorithm will keep my passwords safe for the next 20 years of supercomputing advances?
Use a key derivation function with a variable number of rounds, such as bcrypt.
The passwords you encrypt today, with a hashing difficulty that your own system can handle without slowing down, will always be vulnerable to the faster systems of 20 years in the future. But by increasing the number of rounds gradually over time you can increase the amount of work it takes to check a password in proportion with the increasing power of supercomputers. And you can apply more rounds to existing stored passwords without having to go back to the original password.
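A minimal illustration of adding rounds to an existing stored hash without the password. It uses a plain iterated SHA-256 chain as a simplified model of "variable rounds"; it is not bcrypt (which is not in Python's standard library), and the password and round counts are made-up examples.

```python
import hashlib
import os

def iterate(digest: bytes, rounds: int) -> bytes:
    # Apply SHA-256 repeatedly; every extra round adds work to each guess.
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest

def hash_password(password: str, salt: bytes, rounds: int) -> bytes:
    return iterate(hashlib.sha256(salt + password.encode()).digest(), rounds)

salt = os.urandom(16)
stored = hash_password("correct horse", salt, rounds=10_000)  # hashed years ago

# Later: strengthen the stored hash without knowing the password.
upgraded = iterate(stored, 20_000)
print(upgraded == hash_password("correct horse", salt, rounds=30_000))  # True
```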
Will it hold up for another 20 years? Difficult to say: who knows what crazy quantum crypto and password-replacement schemes we might have by then? But it certainly worked for the last 10.
Note also that entities owning supercomputers and targeting particular accounts are easily going to have enough power to throw at it that you can never protect all of your passwords. The aim of password hashing is to mitigate the damage from a database leak, by limiting the speed at which normal attackers can recover passwords, so that as few accounts as possible have already been compromised by the time you've spotted the leak and issued a notice telling everyone to change their passwords. But there is no 100% solution.
As someone else said, what you're asking is practically impossible to answer. Who knows what breakthroughs will be made in processing power over the next twenty years? Or mathematics?
In addition you aren't telling us many other important factors, including against which threat models you aim to protect. For example, are you trying to defend against an attacker getting a hold of a hashed password database and doing offline brute-forcing? An attacker with custom ASICs trying to crack one specific password? Etc.
With that said, there are things you can do to be as secure and future-proof as possible.
First of all, don't just use vanilla cryptographic hash algorithms; they aren't designed with your application in mind. Indeed they are designed for other applications with different requirements. For one thing, they are fast because speed is an important criterion for a hash function. And that works against you in this case.
Additionally some of the algorithms you mention, like MD5 or SHA1 have weaknesses (some theoretical, some practical) and should not be used.
Prefer something like bcrypt, an algorithm designed to resist brute-force attacks by being much slower than a general-purpose cryptographic hash, and whose security can be “tuned” as necessary.
Alternatively, use something like PBKDF2, which is designed to run a password through a function of your choice a configurable number of times along with a salt, which also makes brute forcing much more difficult.
Adjust the iteration count depending on your usage model, keeping in mind that the slower it is, the more security against brute-force you have.
In selecting a cryptographic hash function for PBKDF2, prefer SHA-3 or, if you can't use that, one of the long variants of SHA-2: SHA-384 or SHA-512. I'd steer clear of SHA-256, although I don't think there's an issue with it in this scenario.
In any case, use the largest and best salt you can; I'd suggest using a good cryptographically secure PRNG and never using a salt shorter than 64 bits (note that I am talking about the length of the salt generated, not the value returned).
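Putting those recommendations together, here is a minimal sketch with Python's standard `hashlib.pbkdf2_hmac`: HMAC-SHA-512, a 128-bit salt from a CSPRNG, and a constant-time comparison. The iteration count is an example value only; tune it so one hash takes an acceptable amount of time on your own hardware.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 210_000  # example value (assumption); raise it as hardware gets faster

def hash_password(password: str) -> tuple[bytes, bytes]:
    salt = secrets.token_bytes(16)  # 128-bit salt from a cryptographically secure PRNG
    dk = hashlib.pbkdf2_hmac("sha512", password.encode(), salt, ITERATIONS)
    return salt, dk

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    dk = hashlib.pbkdf2_hmac("sha512", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(dk, expected)  # constant-time comparison

salt, dk = hash_password("s3cret phrase")
print(verify_password("s3cret phrase", salt, dk))  # True
print(verify_password("wrong guess", salt, dk))    # False
```

In practice you would store the iteration count next to the salt and digest, so it can be raised later for new or re-hashed passwords.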
Will these recommendations help 20 years down the road? Who knows - I'd err on the side of caution and say "no". But if you need security for that long a timeframe, you should consider using something other than passwords.
Anyways, I hope this helps.
Here are two pedantic answers to this question:
If P = NP, there is provably no such hash function. Since it has not been proven that P != NP at the time of this writing, we cannot make any strong guarantees of that nature.
That being said, I think it's safe to say that supercomputers developed within the next 20 years will take "time" to break your hash, regardless of what it is. Even if it is in plaintext some time is required for I/O.
Thus, the answer to your question is both yes and no :)

Ideal hashing method for wide distribution of values?

As part of the rhythm game I'm working on, I'm allowing users to create and upload custom songs and notecharts. I'm thinking of hashing the songs and notecharts to uniquely identify them. Of course, I'd like as few collisions as possible; however, cryptographic strength isn't as important here as a wide, uniform range. In addition, since I'd be performing the hashes rarely, computational efficiency isn't too big of an issue.
Is this as easy as selecting a tried-and-true hashing algorithm with the largest digest size? Or are there some intricacies that I should be aware of? I'm looking at either SHA-256 or 512, currently.
All cryptographic-strength algorithms should exhibit no collisions at all. Of course, collisions necessarily exist (there are more possible inputs than possible outputs), but it should be impossible, using existing computing technology, to actually find one.
When the hash function has an output of n bits, it is possible to find a collision with work of about 2^(n/2), so in practice a hash function with less than about 140 bits of output cannot be cryptographically strong. Moreover, some hash functions have weaknesses that allow attackers to find collisions faster than that; such functions are said to be "broken". A prime example is MD5.
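The 2^(n/2) figure can be checked on a small scale. This sketch truncates SHA-256 to 32 bits (an artificial stand-in for a short hash) and feeds it distinct inputs until two collide; the number of hashes needed comes out on the order of 2^16, not 2^32.

```python
import hashlib

def truncated_hash(data: bytes, n_bits: int) -> int:
    # Keep only the top n_bits of SHA-256 to simulate a short hash function.
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - n_bits)

def find_collision(n_bits: int):
    seen = {}
    i = 0
    while True:  # terminates: by pigeonhole a collision occurs within 2**n_bits + 1 inputs
        msg = str(i).encode()
        h = truncated_hash(msg, n_bits)
        if h in seen:
            return seen[h], msg, i + 1  # two colliding inputs, hashes computed
        seen[h] = msg
        i += 1

a, b, tries = find_collision(32)
print(a != b, truncated_hash(a, 32) == truncated_hash(b, 32))  # True True
print(tries)  # typically tens of thousands, far below 2**32
```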
If you are not in a security setting, and fear only random collisions (i.e. nobody will actively try to provoke a collision; they may happen only out of pure bad luck), then a broken cryptographic hash function will be fine. The usual recommendation is then MD4. Cryptographically speaking, it is as broken as it can be, but for non-cryptographic purposes it is devilishly fast, and it provides 128 bits of output, which is enough to avoid random collisions.
However, chances are that you will not have any performance issue with SHA-256 or SHA-512. Even on a basic PC, they already process data faster than a hard disk can provide it: if you hash a file, the file reading will be the bottleneck, not the hashing. My advice would be to use SHA-256, possibly truncating its output to 128 bits (if used in a non-security situation), and consider switching to another function only if some performance-related trouble is duly noticed and measured.
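Following that advice, a short sketch that hashes an uploaded file in chunks (so large files need not fit in memory) and truncates the SHA-256 digest to 128 bits. The function name and chunk size are arbitrary choices for illustration.

```python
import hashlib

def file_id(path: str, chunk_size: int = 1 << 16) -> str:
    """SHA-256 of a file, truncated to 128 bits (32 hex characters)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()[:32]
```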
If you're using it to uniquely identify tracks, you do want a cryptographic hash: otherwise, users could deliberately create tracks that hash the same as existing tracks, and use that to overwrite them. Barring a compelling reason otherwise, SHA-1 should be perfectly satisfactory.
If cryptographic security is not a concern, then you can look at this link and this one. The fastest and simplest (to implement) would be Pearson hashing, if you are planning to compute a hash for the title/name and later do a lookup. Or you can have a look at SuperFastHash here; it is also very good for non-cryptographic use.
What's wrong with something like an md5sum? Or, if you want a faster algorithm, I'd just create a hash from the file length (mod 64K to fit in two bytes) and 32-bit checksum. That'll give you a 6-byte hash which should be reasonably well distributed. It's not overly complex to implement.
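That 6-byte scheme might look like this, assuming CRC-32 (via Python's `zlib.crc32`) as the 32-bit checksum; the answer does not name a specific checksum.

```python
import zlib

def six_byte_id(data: bytes) -> bytes:
    """2 bytes of (length mod 65536) followed by a 4-byte CRC-32."""
    length_part = (len(data) % 65536).to_bytes(2, "big")
    crc_part = zlib.crc32(data).to_bytes(4, "big")  # crc32 is unsigned in Python 3
    return length_part + crc_part

print(six_byte_id(b"notechart data").hex())
```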
Of course, as with all hashing solutions, you should monitor the collisions and change the algorithm if the cardinality gets too low. This would be true regardless of the algorithm chosen (since your users may start uploading degenerate data).
You may end up finding you're trying to solve a problem that doesn't exist (in other words, possible YAGNI).
Isn't cryptographic hashing overkill in this case, even though I understand that modern computers do this calculation pretty fast? I assume that your users will have a unique user ID. When they upload, you just need to increment a number. So you can represent them internally as userid1_song_1, userid1_song_2, etc. You can store this info in a database with that as the unique key, along with the user-specified name.
You also didn't mention the size of these songs. If they are MIDI, the file size will be small. If files are big (say 3 MB), then SHA calculations will not be instantaneous. On my Core 2 Duo laptop, sha256sum of a 3.8 MB file takes 0.25 s; sha1sum takes 0.2 s.
If you intend to use a cryptographic hash, then SHA-1 should be more than adequate and you don't need SHA-256. No collisions (though they exist) had been found at the time of writing; a practical SHA-1 collision was published in 2017. Git, Mercurial, and other distributed version control systems use SHA-1. Git is a content-based system and uses SHA-1 to find out if content has been modified.

how secure is a digital signature?

Digital signature, if I understood right, means sending the message in clear along with a hash of the message which is encrypted using a private key.
The recipient of the message calculates the hash, decrypts the received hash using the public key, then compares the two hashes for a match.
How safe is this? I mean, you can obtain the hash of the message easily and you also have the encrypted hash. How easy is it to find the private key used to create the Encrypted_hash?
Example:
Message        Hash   Encrypted_hash
------------------------------------
Hello world!   1234   abcd
Hi there       5678   xyzt
Bla bla        0987   gsdj
...
Given the Hash and the Encrypted_hash values, and enough of these messages, how easy/hard is it to find out the private key?
Because of the algorithms used to generate the keys (RSA is the typical one), the answer is essentially "impossible in any reasonable amount of time", assuming that the key is of a sufficient bit length. As long as the private key is not stolen or given away, you won't be able to recover it from just the public key and messages signed with the private key.
As linked to in Henk Holterman's answer, the RSA algorithm is built on the fact that the computations needed to recover the private key (integer factorization being one of them) are hard problems that cannot be solved in any reasonable amount of time, as far as we currently know. In other words, factoring is in NP: a proposed factorization can be checked quickly (just multiply the factors back together), but no known classical algorithm finds the factors in polynomial time, and finding them is what cracking the private key would require.
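To make the "hash encrypted with the private key" description from the question concrete, here is a toy textbook-RSA sketch. Assumptions: the key is built from two publicly known Mersenne primes (so it is trivially breakable), and there is no padding; real RSA signatures use a padding scheme such as PSS or PKCS#1 v1.5, and "encrypting the hash" is a simplification of what they do.

```python
import hashlib
import math

# Toy RSA key (insecure: public primes, no padding). Illustration only.
P, Q = 2**521 - 1, 2**607 - 1
N, E = P * Q, 65537
PHI = (P - 1) * (Q - 1)
assert math.gcd(E, PHI) == 1
D = pow(E, -1, PHI)  # private exponent (Python 3.8+)

def digest_int(message: bytes) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big")

def sign(message: bytes) -> int:
    # "Encrypt" the hash with the private key D.
    return pow(digest_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    # "Decrypt" with the public key E and compare with a fresh hash.
    return pow(signature, E, N) == digest_int(message)

sig = sign(b"Hello world!")
print(verify(b"Hello world!", sig))  # True
print(verify(b"Hello world?", sig))  # False
```

Note that the verifier needs only `(E, N)`; collecting many `(hash, signature)` pairs does not reveal `D` unless `N` can be factored.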
Ciphers developed before electronic computers were often vulnerable to "known plain-text" attack, which is essentially what is described here: if an attacker had the cipher-text and the corresponding plain-text, he could discover the key. World War II-era codes were sometimes broken by guessing at plain-text words that had been encrypted, like the locations of battles, ranks, salutations, or weather conditions.
However, the RSA algorithm used most often for digital signatures is invulnerable even to a "chosen plain-text attack" when proper padding is used (OAEP for encryption, PSS for signatures). Chosen plain-text means that the attacker can choose a message and trick the victim into encrypting it; it's usually even more dangerous than a known plain-text attack.
Anyway, a digital signature is safe by any standard. Any compromise would be due to an implementation flaw, not a weakness in the algorithm.
A digital signature says nothing about how the actual message is transferred. Could be clear text or encrypted.
And current asymmetric algorithms (public + private key) are very secure; how secure depends on the key size.
An attacker does have enough information to crack it. But it is part of the 'proof' of asymmetric encryption that that takes an impractical amount of CPU time: the method is computationally safe.
What you're talking about is known as a "known plaintext" attack. With any reasonably secure modern encryption algorithm known plaintext is of essentially no help in an attack. When you're designing an encryption algorithm, you assume that an attacker will have access to an arbitrary amount of known plaintext; if that assists the attacker, the algorithm is considered completely broken by current standards.
In fact, you normally take for granted that the attacker will not only have access to an arbitrary amount of known plaintext, but even an arbitrary amount of chosen plaintext (i.e., they can choose some text, somehow get you to encrypt it, and compare the result to the original). Again, any modern algorithm needs to be immune to this to be considered secure.
Given the Hash and the Encrypted_hash values, and enough of these messages, how easy/hard is it to find out the private key?
This is the scenario of a Known-plaintext attack: you are given many plaintext messages (the hash) and corresponding cipher texts (the encrypted hash) and you want to find out the encryption key.
Modern cryptographic algorithms are designed to withstand this kind of attack, like the RSA algorithm, which is one of the algorithms currently in use for digital signatures.
In other words, it is still extremely difficult to find out the private key. You'd either need an impossible amount of computing power, or you'd need to find a really fast algorithm for factorizing integers, but that would guarantee you lasting fame in the history of mathematics, and hence is even more difficult.
For a more detailed and thorough understanding of cryptography, have a look at the literature, like the Wikipedia pages or Bruce Schneier's Applied Cryptography.
For a perfectly designed hash it is impossible (or rather, there is no easier way than trying every possible input key).

What is currently the most secure one-way encryption algorithm?

As many will know, one-way encryption is a handy way to encrypt user passwords in databases. That way, even the administrator of the database cannot know a user's password, but will have to take a password guess, encrypt that with the same algorithm and then compare the result with the encrypted password in the database. This means that the process of figuring out the password requires massive amounts of guesses and a lot of processing power.
Seeing that computers just keep getting faster and that mathematicians are still developing these algorithms, I'm wondering which one is the most secure considering modern computing power and encryption techniques.
I've been using MD5 almost exclusively for years now, and I'm wondering if there's something more I should be doing. Should I be contemplating a different algorithm?
Another related question: How long should a field typically be for such an encrypted password? I must admit that I know virtually nothing about encryption, but I'm assuming that an MD5 hash (as an example) can be longer and would presumably take more processing power to crack. Or does the length of the field not matter at all, provided that the encrypted password fits in it in the first place?
Warning: Since this post was written in 2010, GPUs have been widely deployed to brute-force password hashes. Moderately-priced GPUs can run ten billion MD5s per second. This means that even a completely random 8-character alphanumeric password (62 possible characters) can be brute-forced in 6 hours. SHA-1 is only slightly slower; it'd take one day. Your users' passwords are much weaker, and (even with salting) will fall at a rate of thousands of passwords per second. Hash functions are designed to be fast. You don't want this for passwords. Use scrypt, bcrypt, or PBKDF2.
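The 6-hour figure follows directly from the numbers quoted above (62 possible characters, 8 positions, ten billion MD5 guesses per second):

```python
alphabet = 62                 # a-z, A-Z, 0-9
length = 8
rate = 10_000_000_000         # MD5 guesses per second, as quoted above

candidates = alphabet ** length
seconds = candidates / rate
print(f"{candidates:,} candidates, {seconds / 3600:.1f} hours")
# 218,340,105,584,896 candidates, 6.1 hours
```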
MD5 was found to be weak back in 1996, and should not be used anymore for cryptographic purposes. SHA-1 is a commonly used replacement, but has similar problems. The SHA-2 family of hash functions are the current replacement of SHA-1. The members of SHA-2 are individually referred to as SHA-224, SHA-256, SHA-384, and SHA-512.
At the time of writing, several hash functions were competing to become SHA-3, the next standardised cryptographic hashing algorithm, and none of them should have been used yet. A winner was chosen in 2012: Keccak, which is now standardised as SHA-3.
For password hashing, you may also consider using something like bcrypt. It is designed to be slow enough to make large scale brute force attacks infeasible. You can tune the slowness yourself, so it can be made slower when computers are becoming faster.
Warning: bcrypt is based on an older two-way encryption algorithm, Blowfish, for which better alternatives exist today. I do not think that the cryptographic hashing properties of bcrypt are completely understood. Someone correct me if I'm wrong; I have never found a reliable source that discusses bcrypt's properties (other than its slowness) from a cryptographic perspective.
It may be somewhat reassuring that the risk of collisions matters less for password hashing than it does for public-key cryptography or digital signatures. Using MD5 today is a terrible idea for SSL, but not equally disastrous for password hashing. But if you have the choice, simply pick a stronger one.
Using a good hash function is not enough to secure your passwords. You should hash the passwords together with salts that are long and cryptographically random. You should also help your users pick stronger passwords or pass phrases if possible. Longer always is better.
Great question! This page is a good read. In particular, the author claims that MD5 is not appropriate for hashing passwords:
The problem is that MD5 is fast. So are its modern competitors, like SHA1 and SHA256. Speed is a design goal of a modern secure hash, because hashes are a building block of almost every cryptosystem, and usually get demand-executed on a per-packet or per-message basis.
Speed is exactly what you don’t want in a password hash function.
The article then goes on to explain some alternatives, and recommends Bcrypt as the "correct choice" (his words, not mine).
Disclaimer: I have not tried Bcrypt at all. Consider this a friendly recommendation but not something I can back up with my own technical experience.
To increase password strength, you should use a wider variety of symbols. With 8-10 characters, a password becomes pretty hard to crack. Making it longer makes it more secure still, particularly if you mix numeric, alphabetic, and other characters.
SHA-1 is another hashing (one-way encryption) algorithm; it is slower, but it has a longer digest (the encoded message): 160 bits, whereas MD5 has only 128 bits.
SHA-2 is even more secure, but it is used less.
Salting the password always adds an extra level of defense:
$salt = 'asfasdfasdf0a8sdflkjasdfapsdufp';
$hashed = md5( $userPassword . $salt );
Seeing that computers just keep getting faster and that mathematicians are still developing these algorithms
RSA encryption is secure in that it relies on a really big number being hard to factor. Eventually, computers will get fast enough to factor the number in a reasonable amount of time. To stay ahead of the curve, you use a bigger number.
However, for most web sites, the purpose of hashing passwords is to make it inconvenient for someone with access to the database to read the password, not to provide security. For that purpose, MD5 is fine.[1]
The implication here is that if a malicious user gains access to your entire database, they don't need the password. (The lock on the front door won't stop me from coming in the window.)
[1] Just because MD5 is "broken" doesn't mean you can just reverse it whenever you want.
Besides being a cryptographically secure one-way function, a good hash function for password protection should be hard to brute force - i.e. slow by design. scrypt is one of the best in that area. From the homepage:
We estimate that on modern (2009) hardware, if 5 seconds are spent computing a derived key, the cost of a hardware brute-force attack against scrypt is roughly 4000 times greater than the cost of a similar attack against bcrypt (to find the same password), and 20000 times greater than a similar attack against PBKDF2.
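For reference, Python exposes scrypt as `hashlib.scrypt` (available when Python is built against OpenSSL 1.1 or later). The cost parameters below are example values, not a recommendation; memory use is roughly 128 × r × n bytes, about 16 MiB here.

```python
import hashlib
import secrets

def scrypt_hash(password: str, salt: bytes) -> bytes:
    # n = CPU/memory cost, r = block size, p = parallelism (example values).
    return hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1,
                          maxmem=64 * 1024 * 1024, dklen=32)

salt = secrets.token_bytes(16)
stored = scrypt_hash("hunter2", salt)
print(len(stored))  # 32
```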
That said, among commonly available hash functions, doing a few thousand iterations of anything from the SHA family is pretty reasonable protection for non-critical passwords.
Also, always add a salt to make it impossible to share effort for brute forcing many hashes at a time.
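A small sketch of why per-user salts matter: with one site-wide salt (the string from the PHP snippet above), two users with the same password get the same hash, so one guess tests both; with a random salt per user, each hash must be attacked separately. Plain salted SHA-256 is used here only to show the salting effect; in practice combine salts with a slow function such as bcrypt, scrypt, or PBKDF2.

```python
import hashlib
import secrets

def salted_sha256(password: str, salt: bytes) -> str:
    return hashlib.sha256(salt + password.encode()).hexdigest()

# Two users who picked the same password.
fixed_salt = b"asfasdfasdf0a8sdflkjasdfapsdufp"  # one site-wide salt
print(salted_sha256("letmein", fixed_salt) == salted_sha256("letmein", fixed_salt))  # True

salt_a, salt_b = secrets.token_bytes(16), secrets.token_bytes(16)  # per-user salts
print(salted_sha256("letmein", salt_a) == salted_sha256("letmein", salt_b))  # False
```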
NIST is currently running a contest to select a new hashing algorithm, just as they did to select the AES encryption algorithm. So the answer to this question will likely be different in a couple of years.
You can look up the submissions and study them for yourself to see if there's one that you'd like to use.
