Differences between /dev/random and /dev/urandom

I'm trying to find out the differences between the /dev/random and /dev/urandom files.
What are the differences between /dev/random and /dev/urandom?
When should I use them?
When should I not use them?

Using /dev/random may require waiting for the result, as it uses the so-called entropy pool, where random data may not be available at the moment.
/dev/urandom returns as many bytes as the user requests without waiting, and is thus considered less random than /dev/random.
As can be read from the man page:
random
When read, the /dev/random device will only return random bytes within
the estimated number of bits of noise in the entropy pool. /dev/random
should be suitable for uses that need very high quality randomness
such as one-time pad or key generation. When the entropy pool is
empty, reads from /dev/random will block until additional
environmental noise is gathered.
urandom
A read from the /dev/urandom device will not block waiting for more
entropy. As a result, if there is not sufficient entropy in the
entropy pool, the returned values are theoretically vulnerable to a
cryptographic attack on the algorithms used by the driver. Knowledge
of how to do this is not available in the current unclassified
literature, but it is theoretically possible that such an attack may
exist. If this is a concern in your application, use /dev/random
instead.
For cryptographic purposes you should really use /dev/random, because of the nature of the data it returns. The possible wait should be considered an acceptable tradeoff for the sake of security, IMO.
When you need random data fast, you should of course use /dev/urandom.
Source: Wikipedia page, man page
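To make the distinction concrete, here is a minimal sketch in C that reads 16 random bytes from /dev/urandom (the buffer size is an arbitrary choice for illustration; a read from /dev/random would look identical but could block):

    #include <stdio.h>

    int main(void)
    {
        unsigned char buf[16];

        /* /dev/urandom never blocks; /dev/random may. */
        FILE *f = fopen("/dev/urandom", "rb");
        if (f == NULL) {
            perror("fopen");
            return 1;
        }
        if (fread(buf, 1, sizeof buf, f) != sizeof buf) {
            fclose(f);
            return 1;
        }
        fclose(f);

        /* Print the bytes as hex. */
        for (size_t i = 0; i < sizeof buf; i++)
            printf("%02x", buf[i]);
        putchar('\n');
        return 0;
    }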

Always use /dev/urandom.
/dev/urandom and /dev/random use the same random number generator. They are both seeded by the same entropy pool. They both give equally random numbers of arbitrary size, and both can produce an effectively infinite amount of random numbers from only a 256-bit seed. As long as the initial seed has 256 bits of entropy, you can have an endless supply of arbitrarily long random numbers. You gain nothing from using /dev/random. The fact that there are two devices is a flaw in the Linux API.
If you are concerned about entropy, using /dev/random is not going to fix that. But it will slow down your application while not generating numbers any more random than /dev/urandom. And if you aren't concerned about entropy, why are you using /dev/random at all?
Here's a much better, in-depth explanation of why you should always use /dev/urandom: http://www.2uo.de/myths-about-urandom/
The kernel developers are discussing removing /dev/random: https://lwn.net/SubscriberLink/808575/9fd4fea3d86086f0/

What are the differences between /dev/random and /dev/urandom?
/dev/random and /dev/urandom are interfaces to the kernel's random number generator:
Reading them returns a stream of random bytes strong enough for use in cryptography
Writing to them provides the kernel with data to mix into the entropy pool
When it comes to the differences, it depends on the operating system:
On Linux, reading from /dev/random may block, which limits its use in practice considerably
On FreeBSD, there is no difference: /dev/urandom is just a symbolic link to /dev/random.
When should I use them?
When should I not use them?
It is very difficult to find a use case where you should use /dev/random over /dev/urandom.
Danger of blocking:
This is a real problem that you will have to face when you decide to use /dev/random. For one-off uses like ssh-keygen it should be OK to wait for some seconds, but for most other situations it will not be an option.
If you use /dev/random, you should open it in nonblocking mode and provide some sort of user notification if the desired entropy is not immediately available.
Security:
On FreeBSD, there is no difference anyway, but on Linux, too, /dev/urandom is considered secure for almost all practical cases (e.g., Is a rand from /dev/urandom secure for a login key? and Myths about /dev/urandom).
The situations where it could make a difference are edge cases like a fresh Linux installation. To cite from the Linux man page:
The /dev/random interface is considered a legacy interface, and /dev/urandom is preferred and sufficient in all use cases, with the exception of applications which require randomness during early boot time; for these applications, getrandom(2) must be used instead, because it will block until the entropy pool is initialized.
If a seed file is saved across reboots as recommended below (all major Linux distributions have done this since 2000 at least), the output is cryptographically secure against attackers without local root access as soon as it is reloaded in the boot sequence, and perfectly adequate for network encryption session keys.
Since reads from /dev/random may block, users will usually want to open it in nonblocking mode (or perform a read with timeout), and provide some sort of user notification if the desired entropy is not immediately available.
Recommendation
As a general rule, /dev/urandom should be used for everything except long-lived GPG/SSL/SSH keys.
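Since the man page excerpt above recommends getrandom(2) for randomness needed during early boot, here is a minimal sketch of that call on Linux (requires glibc 2.25 or later; the 32-byte key size is an arbitrary choice for illustration):

    #include <stdio.h>
    #include <sys/random.h>

    int main(void)
    {
        unsigned char key[32];

        /* getrandom() draws from the same pool as /dev/urandom, but blocks
           (once, at early boot) until the pool has been initialized. */
        ssize_t n = getrandom(key, sizeof key, 0);
        if (n != (ssize_t)sizeof key) {
            perror("getrandom");
            return 1;
        }
        /* key now holds 256 bits suitable for cryptographic use. */
        return 0;
    }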

Short answer
Use /dev/urandom
Long answer
They are both fed by the same cryptographically secure pseudorandom number generator (CSPRNG). The fact that /dev/random waits for entropy (or more specifically, waits for the system's estimate of its entropy to reach an appropriate level) only makes a difference when you are using an information-theoretically secure algorithm, as opposed to a computationally secure algorithm. The former encompasses algorithms that you probably aren't using, such as Shamir's Secret Sharing and the one-time pad. The latter contains algorithms that you actually use and care about, such as AES, RSA, and Diffie-Hellman (as implemented in OpenSSL, GnuTLS, etc.).
So it doesn't matter which device you read from, since the numbers are getting pumped out of a CSPRNG either way, and it is only "theoretically possible" to break the algorithms that you're likely using them with.
Lastly, that "theoretically possible" bit means just that. In this case, it means using all of the computing power in the world for the amount of time that the universe has existed to crack the application.
Therefore, there is pretty much no point in using /dev/random.
So use /dev/urandom.
Sources: 1, 2, 3

Related

Seed generation for each operation

I'm using the mbedtls_ctr_drbg_seed function in order to generate a seed. Should I do this before each encryption operation, or can it be done once when the program starts?
You can use a single DRBG instance for the whole program. It's meant for that. In a high-performance multi-threaded program running on a multicore machine, you might prefer one DRBG instance per thread to reduce inter-thread contention.
Per the documentation, you must call mbedtls_ctr_drbg_seed exactly once per context. This function associates the DRBG with an entropy source. The DRBG will query the entropy source function from time to time when it wants more entropy¹ (in all cases, at least once when you call the seed function).
You can see how to use the entropy and DRBG APIs of Mbed TLS in sample programs such as key_app.c.
¹ Depending on the reseed interval and the prediction resistance setting. These are secondary concerns, which matter only for recovering if the DRBG state leaks (e.g. through a memory read vulnerability or through side channels).
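To make the "seed once, reuse everywhere" point concrete, here is a minimal sketch of the usual Mbed TLS setup (the personalization string and the 32-byte output size are arbitrary choices for illustration):

    #include <stdio.h>
    #include <string.h>
    #include <mbedtls/entropy.h>
    #include <mbedtls/ctr_drbg.h>

    int main(void)
    {
        mbedtls_entropy_context entropy;
        mbedtls_ctr_drbg_context ctr_drbg;
        const char *pers = "my_app";        /* optional personalization data */
        unsigned char buf[32];

        mbedtls_entropy_init(&entropy);
        mbedtls_ctr_drbg_init(&ctr_drbg);

        /* Seed exactly once per context, at program start. */
        if (mbedtls_ctr_drbg_seed(&ctr_drbg, mbedtls_entropy_func, &entropy,
                                  (const unsigned char *)pers,
                                  strlen(pers)) != 0) {
            fprintf(stderr, "seeding failed\n");
            return 1;
        }

        /* Reuse the same DRBG for every subsequent operation. */
        for (int i = 0; i < 3; i++) {
            if (mbedtls_ctr_drbg_random(&ctr_drbg, buf, sizeof buf) != 0)
                return 1;
            /* ... use buf as key material, an IV, etc. ... */
        }

        mbedtls_ctr_drbg_free(&ctr_drbg);
        mbedtls_entropy_free(&entropy);
        return 0;
    }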

How to fill kernel entropy without X and hardware RNG?

I have a tiny embedded device running Linux but with no hardware RNG driver and without X server (no mouse, no keyboard...).
/dev/random blocks very quickly, and
cat /proc/sys/kernel/random/entropy_avail
reports very low numbers (~10).
The system handles a camera so there is a real source of entropy. How can I input entropy into the kernel?
Take a data stream from your camera, hash it using something decent like BLAKE2b or SHA-2, then feed it into /dev/random. Note that writing to /dev/random mixes the data into the pool but does not increase the entropy count; to credit entropy you need the RNDADDENTROPY ioctl, which requires root (see the sketch below).
Once the entropy count is >= 256 you are good to go.
From then on, only read from /dev/urandom.
/dev/urandom will happily spew out cryptographically secure pseudorandom data suitable for key material once the system has 256 bits of entropy available.
Running out of entropy after you've collected this amount is a myth. Use /dev/urandom, really, it's perfectly fine.
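For completeness, here is a rough sketch of the RNDADDENTROPY path mentioned above; it assumes you already have 32 hashed bytes from the camera (the hashing step is omitted), and it must run as root:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/random.h>

    int main(void)
    {
        /* Assumed input: 32 bytes of BLAKE2b/SHA-2 output over camera frames. */
        unsigned char hashed[32] = { 0 };   /* placeholder */

        /* struct rand_pool_info ends in a flexible buffer, so allocate it. */
        struct rand_pool_info *info = malloc(sizeof *info + sizeof hashed);
        if (info == NULL)
            return 1;
        info->entropy_count = 8 * sizeof hashed;  /* bits to credit */
        info->buf_size = sizeof hashed;
        memcpy(info->buf, hashed, sizeof hashed);

        int fd = open("/dev/random", O_WRONLY);
        if (fd < 0 || ioctl(fd, RNDADDENTROPY, info) < 0) {
            perror("RNDADDENTROPY");
            return 1;
        }
        close(fd);
        free(info);
        return 0;
    }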
You should give haveged a try.
It comes with most distributions, you can also install it easily on custom distributions.
It's a userspace daemon that is meant to solve your problem.
cf. man page here: https://linux.die.net/man/8/haveged

What is entropy starvation

I was lost when reading
"Knowing how Linux behaves during entropy starvation (and being able to find the cause) allows us to efficiently use our server hardware."
in a blog. I then looked up the meaning of 'entropy' in the context of Linux, but it is still not clear to me what 'entropy starvation' is, or what the sentence quoted above means.
Some applications, notably cryptography, need random data. In cryptography, it is very important that the data be truly random, or at least unpredictable (even in part) to any attacker.
To supply this data, a system keeps a pool of random data, called entropy, that it collects from various sources of randomness on the system: Precise timing of events that might be somewhat random (keys pressed by users, interrupts from external devices), noise on a microphone, or, on some processors, dedicated hardware for generating random values. The incoming somewhat-random data is mixed together to produce better quality entropy.
These sources of randomness can only supply data at certain rates. If a system is used to do a lot of work that needs random data, it can use up more random data than is available. Then software that wants random data has to wait for more to be generated or it has to accept lower-quality data. This is called entropy starvation or entropy depletion.

Random number generator /dev/random

I read that the random number generator /dev/random on Mac and Solaris includes 160 bits of entropy. What can I do if I need more entropy, for example, 200 bits? Thanks in advance
I'm not sure where you read that 160-bit estimate -- I believe that Solaris, Mac and most BSDs use a 256-bit Yarrow implementation. At any rate, the entropy pool is regularly refilled from even the smallest amount of network or disk activity. Even though /dev/random on non-Linux systems doesn't actually block "waiting for more entropy" (it's more like a supposedly higher-quality version of /dev/urandom, to which on these systems it's typically linked), nothing stops you -- if you trust, say, no more than 160 bits at a time from the device -- from "blocking and refreshing entropy" yourself: get N bits, do some disk or network I/O, get another N bits, and so forth.
And if you think your disk access is too predictable, you could go for some really bizarre sources like, say, a few of the most recent Twitter entries, if your program has internet access ;)
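If you want to try the "blocking and refreshing" pattern described above, a rough sketch might look like this (the file path, the 160-bit chunk size, and the stir() helper are arbitrary illustrations of the idea, not a vetted technique):

    #include <stdio.h>

    /* Hypothetical helper: do some disk I/O so the kernel can gather
       fresh timing noise between draws. */
    static void stir(void)
    {
        char tmp[4096];
        FILE *f = fopen("/var/log/system.log", "rb");  /* any file will do */
        if (f != NULL) {
            fread(tmp, 1, sizeof tmp, f);
            fclose(f);
        }
    }

    int main(void)
    {
        unsigned char key[25];          /* 200 bits, as in the question */
        size_t got = 0;
        FILE *rnd = fopen("/dev/random", "rb");
        if (rnd == NULL)
            return 1;

        while (got < sizeof key) {
            size_t n = sizeof key - got;
            if (n > 20)
                n = 20;                 /* trust at most 160 bits per draw */
            if (fread(key + got, 1, n, rnd) != n)
                break;
            got += n;
            stir();                     /* refresh entropy between draws */
        }
        fclose(rnd);
        return got == sizeof key ? 0 : 1;
    }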

Symmetric Encryption: Performance Questions

Does the performance of a symmetric encryption algorithm depend on the amount of data being encrypted? Suppose I have about 1000 bytes I need to send over the network rapidly. Is it better to encrypt 50 bytes of data 20 times, or 1000 bytes at once? Which will be faster? Does it depend on the algorithm used? If so, what's the highest-performing, most secure algorithm for amounts of data under 512 bytes?
The short answers are:
You want to encrypt all your data in one go. Actually, you want to hand it all over in a single function call to the encryption code, so that it can run even faster.
With a proper encryption algorithm, encryption will be substantially faster than the network itself. It takes a very bad implementation, a very old PC or a very fast network to make encryption a bottleneck.
When in doubt, use a ready-made protocol such as SSL/TLS. If given the choice of the encryption algorithm within a protocol, use AES.
Now the longer answers:
There are block ciphers and stream ciphers. A stream cipher begins with an initialization phase, in which the key is input into the system (often called the "key schedule"), and then encrypts data bytes "on the fly". With a stream cipher, the encrypted message has the same length as the input message, and encryption time is proportional to the input message length, save for the computational cost of the key schedule. We are not talking big numbers here; key schedule time is below 1 microsecond on a not-so-new PC. Yet, for optimal performance, you want to do the key schedule once, not 50 times.
Block ciphers also have a key schedule. When the key schedule has been performed, a block cipher can encrypt blocks, i.e. chunks of data of a fixed length. The block length depends on the algorithm, but it is typically 8 or 16 bytes. AES is a block cipher. In order to encrypt a "message" of arbitrary length, the block cipher must be invoked several times, and this is trickier than it seems (there are many security issues). The part which decides how those invocations are assembled together is called the chaining mode. A well-known chaining mode is called CBC. Depending on the chaining mode, there may be a need for an extra step called padding in which a few extra bytes are added to the input message, so that its length becomes compatible with the chosen chaining. Padding must be such that, upon decryption, it can be removed unambiguously. A common padding scheme is called "PKCS#5".
There is a chaining mode called "CTR" which effectively turns a block cipher into a stream cipher. It has some good points; in particular, CTR mode requires no padding, and the encrypted message will have the same length as the input message. AES with CTR mode is good.
Encryption speed will be about 100 MB/s on a typical PC (e.g. a 2.4 GHz Intel Core2, using a single core).
Warning: there are issues with regard to encrypting several messages with the same key. With chaining modes, those issues hide under the name of "IV" (as in "initial value"). With most chaining modes, the IV is a random value of the same size as the cipher block. The IV need not be secret (it is often transmitted along with the encrypted message, because the decrypting party must also know it), but it must be chosen randomly and uniformly, and each message needs a new IV. Some chaining modes (e.g. CTR) can tolerate a non-uniform IV, but only for the first message ever sent with a given key. With CBC, even the first message needs a fully random IV. If this paragraph does not make full sense to you, then, please, do not try to design an encryption protocol; it is more complex than you imagine. Instead, use an already specified protocol, such as SSL (for an encryption tunnel) or CMS (for encrypted messages). The development of such protocols was a long and painful history of attacks and countermeasures, with much grinding of teeth. Do not reenact that history...
Warning 2: if you use encryption, then you are worrying about security: there may be adverse entities bent on attacking your system. Most of the time, simple encryption will not fully deter them; by itself, (properly applied) encryption defeats only passive attackers, those who observe transmitted bytes but do not alter them. A generic attacker is also active, i.e. he removes some data bytes, moves and duplicates others, or adds extra bytes of his own design. To defeat active attackers you need more than encryption; you also need integrity checks. There again, protocols such as SSL and CMS already take care of the details.
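To tie the advice together (one key schedule, one call per message, and a fresh random IV each time), here is a minimal sketch using OpenSSL's EVP interface with AES-256 in CTR mode; the 1000-byte message mirrors the question and is otherwise an arbitrary choice:

    #include <stdio.h>
    #include <string.h>
    #include <openssl/evp.h>
    #include <openssl/rand.h>

    int main(void)
    {
        unsigned char key[32], iv[16];
        unsigned char msg[1000] = { 0 };        /* the ~1000-byte message */
        unsigned char ct[sizeof msg];
        int outlen = 0, finlen = 0;

        /* Fresh random key for the session, fresh random IV per message. */
        if (RAND_bytes(key, sizeof key) != 1 || RAND_bytes(iv, sizeof iv) != 1)
            return 1;

        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        if (ctx == NULL)
            return 1;

        /* One key schedule... */
        if (EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, iv) != 1)
            return 1;
        /* ...and one call for the whole message. */
        if (EVP_EncryptUpdate(ctx, ct, &outlen, msg, (int)sizeof msg) != 1)
            return 1;
        if (EVP_EncryptFinal_ex(ctx, ct + outlen, &finlen) != 1)
            return 1;

        printf("encrypted %d bytes\n", outlen + finlen);
        EVP_CIPHER_CTX_free(ctx);
        return 0;
    }

In a real protocol you would transmit the IV alongside the ciphertext and add integrity protection (a MAC); as the answer says, prefer SSL/TLS or CMS over rolling your own.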
AES should be a good choice.
Good implementations of AES (e.g. the one included in the openssl library) require about 10-20 CPU cycles per byte to encrypt. When you encrypt small messages then you also have to consider the time for the key setup. E.g., a typical implementation of DES requires a few thousand cycles for a key setup. I.e., you might actually finish encrypting a small message with AES before you even start encrypting with other ciphers.
Newer processors (e.g. Westmere-based CPUs) have an instruction set extension that supports AES, allowing encryption at speeds of 1.5-4 cycles per byte. That is almost impossible to beat with any other cipher.
If possible, try encrypting longer messages rather than splitting them into small pieces. The main reason is not encryption speed, but rather security: a secure encryption mode usually requires an initialization vector (IV) and a message authentication code (MAC). These will add about 32 bytes to each of your ciphertext parts, so if you divide your message into small parts, this overhead will be significant.
Symmetric encryption algorithms are typically block ciphers. For any given algorithm, the block size is fixed. You then pick from several different methods of making subsequent blocks dependent on earlier blocks (e.g. cipher block chaining) to process a stream of data. But such a mode invariably caches incoming data and submits it to the block cipher in full blocks.
So by doing 50 bytes 20 times, all you're doing is pounding on your cache logic.
If you're not operating in stream mode, then datagrams smaller than the native block size of your cipher will get significantly less protection than complete blocks, since there are fewer possible messages for an attacker to consider.
Obviously, performance depends on the amount of data as every part of the data has to be encrypted. You'll get much better information by doing a test in your specific environment (language, platform, encryption algorithm implementation) than anyone here could possibly provide out of the blue: I don't think it would take more than half an hour to set up a basic performance measurement.
As for security, you should be fine with Triple DES or AES.
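In the spirit of the "measure it yourself" suggestion above, here is a rough benchmark sketch comparing one 1000-byte call against twenty 50-byte calls (each chunk gets its own context, the worst case for small messages); the repetition count and the fixed key/IV are arbitrary choices for benchmarking only:

    #include <stdio.h>
    #include <time.h>
    #include <openssl/evp.h>

    static double now_ms(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
    }

    /* Encrypt `chunks` chunks of `chunk` bytes, redoing the key schedule
       each time (the worst case for many small messages). */
    static void encrypt_chunks(const unsigned char *key, const unsigned char *iv,
                               const unsigned char *in, size_t chunk, int chunks)
    {
        unsigned char out[1024];
        int n;
        for (int i = 0; i < chunks; i++) {
            EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
            EVP_EncryptInit_ex(ctx, EVP_aes_128_ctr(), NULL, key, iv);
            EVP_EncryptUpdate(ctx, out, &n, in + i * chunk, (int)chunk);
            EVP_CIPHER_CTX_free(ctx);
        }
    }

    int main(void)
    {
        unsigned char key[16] = { 0 }, iv[16] = { 0 }; /* fixed: benchmark only */
        unsigned char msg[1000] = { 0 };
        const int reps = 100000;

        double t0 = now_ms();
        for (int r = 0; r < reps; r++)
            encrypt_chunks(key, iv, msg, 1000, 1);   /* 1000 bytes, 1 call  */
        double t1 = now_ms();
        for (int r = 0; r < reps; r++)
            encrypt_chunks(key, iv, msg, 50, 20);    /* 50 bytes, 20 calls  */
        double t2 = now_ms();

        printf("1 x 1000 B: %.1f ms, 20 x 50 B: %.1f ms\n", t1 - t0, t2 - t1);
        return 0;
    }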
