I was lost when I read
"Knowing how Linux behaves during entropy starvation (and being able to find the cause) allows us to efficiently use our server hardware."
in a blog post. I then looked up the meaning of 'entropy' in the context of Linux, but it is still not clear to me what "entropy starvation" is, or what the sentence quoted above means.
Some applications, notably cryptography, need random data. In cryptography, it is very important that the data be truly random, or at least unpredictable (even in part) to any attacker.
To supply this data, a system keeps a pool of random data, called entropy, that it collects from various sources of randomness on the system: Precise timing of events that might be somewhat random (keys pressed by users, interrupts from external devices), noise on a microphone, or, on some processors, dedicated hardware for generating random values. The incoming somewhat-random data is mixed together to produce better quality entropy.
These sources of randomness can only supply data at certain rates. If a system is used to do a lot of work that needs random data, it can use up more random data than is available. Then software that wants random data has to wait for more to be generated or it has to accept lower-quality data. This is called entropy starvation or entropy depletion.
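On Linux you can actually watch the pool's level: the kernel exposes its entropy estimate through /proc. A minimal sketch that reads it (the 256-bit threshold in the comment is just an illustrative choice):

    #include <stdio.h>

    int main(void)
    {
        /* The kernel's current entropy estimate, in bits. */
        FILE *f = fopen("/proc/sys/kernel/random/entropy_avail", "r");
        int bits = 0;
        if (f != NULL && fscanf(f, "%d", &bits) == 1)
            printf("entropy estimate: %d bits%s\n", bits,
                   bits < 256 ? " (running low)" : "");
        if (f != NULL) fclose(f);
        return 0;
    }

When heavy consumers drain the pool faster than the sources refill it, that number stays low and blocking readers stall; that is the starvation the quoted sentence refers to.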
Related
I'm using the mbedtls_ctr_drbg_seed function in order to generate a seed. Should I do this before each encryption operation, or can it be done once when the program starts?
You can use a single DRBG instance for the whole program. It's meant for that. In a high-performance multi-threaded program running on a multicore machine, you might prefer one DRBG instance per thread to reduce inter-thread contention.
Per the documentation, you must call mbedtls_ctr_drbg_seed exactly once per context. This function associates the DRBG with an entropy source. The DRBG will query the entropy source function from time to time when it wants more entropy¹ (in all cases, at least once, when you call the seed function).
You can see how to use the entropy and DRBG APIs of Mbed TLS in sample programs such as key_app.c.
¹ Depending on the reseed interval and the prediction resistance setting. These are secondary concerns, which matter only for recovering if the DRBG state leaks (e.g. through a memory read vulnerability or through side channels).
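A minimal sketch of this pattern (the wrapper names rng_init and rng_get and the personalization string are illustrative; the Mbed TLS calls themselves are the documented API):

    #include <string.h>
    #include "mbedtls/entropy.h"
    #include "mbedtls/ctr_drbg.h"

    /* One DRBG instance for the whole program, seeded once at startup. */
    static mbedtls_entropy_context entropy;
    static mbedtls_ctr_drbg_context ctr_drbg;

    int rng_init(void)
    {
        const char *pers = "my_app";  /* optional personalization string */
        mbedtls_entropy_init(&entropy);
        mbedtls_ctr_drbg_init(&ctr_drbg);
        /* Seed exactly once per context; the DRBG reseeds itself as needed. */
        return mbedtls_ctr_drbg_seed(&ctr_drbg, mbedtls_entropy_func, &entropy,
                                     (const unsigned char *) pers, strlen(pers));
    }

    int rng_get(unsigned char *buf, size_t len)
    {
        return mbedtls_ctr_drbg_random(&ctr_drbg, buf, len);
    }

Call rng_init once when the program starts (or keep one instance per thread, as suggested above), then call rng_get wherever random bytes are needed.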
Suppose I have a program that needs an RNG.
If I were to run arbitrary operations and check the ∆t it takes to do said operations, I could generate random numbers from that.
For example:
    #include <time.h>

    struct timespec t0, t1;  /* clock_gettime as a concrete stand-in for a device timer */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (volatile int i = 0; i < 100; i++);  /* volatile keeps the compiler from optimizing this away */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double dt = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
dt will be more or less random based on many variables on the device such as battery level, transistor age, room temperature, other processes running, etc.
Now, suppose I keep generating dts and multiplying them together as I go, hundreds of times, thousands of times, millions of times; eventually I am left with a very arbitrary number based on values that were more or less randomly determined by hardware performance benchmarking.
Every time I multiply these dts together, the number of possible outputs increases exponentially, so determining what the possible outputs may be becomes perhaps an impossible task after millions of iterations, even if each individual dt value is going to be within a similar range.
A thought then occurs: if you have a very consistent device, dt may always be in the range of, say, 0.000000011, 0.000000012, 0.000000013, 0.000000014. Then the final output number, no matter how many times I iterate and multiply, will be a number of the form 0.000000011^a * 0.000000012^b * 0.000000013^c * 0.000000014^d, and that's probably easy to crack.
But then I turn to hashing: suppose that rather than multiplying each dt, I concatenate it in string form to the previous values and hash them, so every time I generate a new dt from the hardware's random environmental behavior, I hash. Then at the end I digest the hash into whatever form I need, and now the final output number can't be written in a general algebraic form.
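Concretely, the accumulate-and-hash loop I have in mind looks something like this (SHA-256 via OpenSSL is just a stand-in for whatever hash one might pick):

    #include <stdio.h>
    #include <time.h>
    #include <openssl/sha.h>  /* assumes OpenSSL; any cryptographic hash would do */

    int main(void)
    {
        SHA256_CTX ctx;
        SHA256_Init(&ctx);

        for (long round = 0; round < 1000000; round++) {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (volatile int i = 0; i < 100; i++);  /* the "arbitrary operation" */
            clock_gettime(CLOCK_MONOTONIC, &t1);
            double dt = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;

            char buf[32];
            int n = snprintf(buf, sizeof(buf), "%.17g", dt);  /* dt in string form */
            SHA256_Update(&ctx, buf, n);  /* the concatenate-and-hash step */
        }

        unsigned char digest[SHA256_DIGEST_LENGTH];
        SHA256_Final(digest, &ctx);  /* digest to whatever form is needed */
        return 0;
    }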
Will numbers generated in this form be cryptographically secure?
Using a clock potentially leaks information to an adversary. Using a microphone does too -- the adversary may have planted a bug and is hearing the same input. Best not to rely on any single source but to combine entropy inputs from multiple sources, both external to your computer and internal. By all means use internal OS entropy sources, such as /dev/urandom, but use other sources as well.
It might be worth reading the description of the Fortuna CSPRNG for ideas.
If you take enough samples, and use few enough low bits of the time difference, then maybe timing of async interrupts could eventually add up to a useful amount of entropy.
Most OS kernels will collect entropy from timing in their own interrupt handlers, as part of the source for /dev/urandom, but if you really want to roll your own instead of asking the OS for randomness, it's plausible if you're very careful with your mixing function. e.g. have a look at what the Linux kernel uses for mixing in new data into its entropy pool. It has to avoid being "hurt" by sources that aren't actually random on a given system.
Other than interrupts, performance over short times is nearly deterministic, and CPU frequency variations are quantized into not that many different frequencies.
based on many variables on the device such as ...
battery level: maybe a 2-state effect like limiting max turbo when not on AC power, and/or when the battery is low.
transistor age: no. At most an indirect effect if aged transistors use more power, leading to the CPU running hotter and dropping out of max turbo sooner. I'm not sure there's any significant effect.
room temperature: again, only possibly reducing max clock speed sooner. Unless you're wasting multiple seconds of CPU time on this, it won't have an effect even on lightweight laptops. Desktops typically have enough cooling to sustain max turbo indefinitely on a single core, especially for simple scalar code. (SIMD FMA would make a lot more heat.)
other processes running: yes, that and async interrupts that happen to come inside your timed intervals will be the main source of randomness.
Most of the factors that affect clock speed will just uniformly scale up all times, correlated between samples, not add more entropy. Clock frequency doesn't change that often; after ramping up to full speed for your benchmark loops, expect it to stay constant for multiple seconds.
I'm trying to find out the differences between the /dev/random and /dev/urandom files
What are the differences between /dev/random and /dev/urandom?
When should I use them?
When should I not use them?
Using /dev/random may require waiting for the result, as it draws from the so-called entropy pool, where random data may not be available at the moment.
/dev/urandom returns as many bytes as the user requests right away, and thus it is less random than /dev/random.
As can be read from the man page:
random
When read, the /dev/random device will only return random bytes within
the estimated number of bits of noise in the entropy pool. /dev/random
should be suitable for uses that need very high quality randomness
such as one-time pad or key generation. When the entropy pool is
empty, reads from /dev/random will block until additional
environmental noise is gathered.
urandom
A read from the /dev/urandom device will not block waiting for more
entropy. As a result, if there is not sufficient entropy in the
entropy pool, the returned values are theoretically vulnerable to a
cryptographic attack on the algorithms used by the driver. Knowledge
of how to do this is not available in the current unclassified
literature, but it is theoretically possible that such an attack may
exist. If this is a concern in your application, use /dev/random
instead.
For cryptographic purposes you should really use /dev/random because of the nature of the data it returns. The possible waiting should be considered an acceptable tradeoff for the sake of security, IMO.
When you need random data fast, you should use /dev/urandom of course.
Source: Wikipedia page, man page
Always use /dev/urandom.
/dev/urandom and /dev/random use the same random number generator. They both are seeded by the same entropy pool. They both will give an equally random number of an arbitrary size. They both can give an infinite amount of random numbers with only a 256 bit seed. As long as the initial seed has 256 bits of entropy, you can have an infinite supply of arbitrarily long random numbers. You gain nothing from using /dev/random. The fact that there are two devices is a flaw in the Linux API.
If you are concerned about entropy, using /dev/random is not going to fix that. But it will slow down your application while not generating numbers anymore random than /dev/urandom. And if you aren't concerned about entropy, why are you using /dev/random at all?
Here's a much better, more in-depth explanation of why you should always use /dev/urandom: http://www.2uo.de/myths-about-urandom/
The kernel developers are discussing removing /dev/random: https://lwn.net/SubscriberLink/808575/9fd4fea3d86086f0/
What are the differences between /dev/random and /dev/urandom?
/dev/random and /dev/urandom are interfaces to the kernel's random number generator:
Reading returns a stream of random bytes strong enough for use in cryptography
Writing to them will provide the kernel data to update the entropy pool
When it comes to the differences, it depends on the operating system:
On Linux, reading from /dev/random may block, which limits its use in practice considerably
On FreeBSD, there is no difference: /dev/urandom is just a symbolic link to /dev/random.
When should I use them?
When should I not use them?
It is very difficult to find a use case where you should use /dev/random over /dev/urandom.
Danger of blocking:
This is a real problem that you will have to face when you decide to use /dev/random. For single usages like ssh-keygen it should be OK to wait for some seconds, but for most other situations it will not be an option.
If you use /dev/random, you should open it in nonblocking mode and provide some sort of user notification if the desired entropy is not immediately available.
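A sketch of that pattern on Linux (this assumes the legacy blocking behavior of /dev/random; error handling is trimmed):

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        unsigned char key[32];
        int fd = open("/dev/random", O_RDONLY | O_NONBLOCK);
        if (fd < 0) return 1;
        ssize_t n = read(fd, key, sizeof(key));
        if (n < 0 && errno == EAGAIN) {
            /* Not enough entropy right now: notify the user instead of hanging. */
            fprintf(stderr, "entropy pool not ready, try again later\n");
        }
        close(fd);
        return 0;
    }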
Security:
On FreeBSD, there is no difference anyway, but also on Linux /dev/urandom is considered secure for almost all practical cases (e.g., Is a rand from /dev/urandom secure for a login key? and Myths about /dev/urandom).
The situations where it could make a difference are edge cases like a fresh Linux installation. To cite from the Linux man page:
The /dev/random interface is considered a legacy interface, and /dev/urandom is preferred and sufficient in
all use cases, with the exception of applications which require randomness during early boot time; for
these applications, getrandom(2) must be used instead, because it will block until the entropy pool is initialized.
If a seed file is saved across reboots as recommended below (all major Linux distributions have done this
since 2000 at least), the output is cryptographically secure against attackers without local root access as
soon as it is reloaded in the boot sequence, and perfectly adequate for network encryption session keys.
Since reads from /dev/random may block, users will usually want to open it in nonblocking mode (or perform
a read with timeout), and provide some sort of user notification if the desired entropy is not immediately available.
Recommendation
As a general rule, /dev/urandom should be used for everything except long-lived GPG/SSL/SSH keys.
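In code, following that recommendation usually just means asking the kernel directly. On modern Linux, getrandom(2) (quoted from the man page above) also covers the early-boot case; a minimal sketch:

    #include <stdio.h>
    #include <sys/random.h>  /* getrandom(), glibc 2.25+ */

    int main(void)
    {
        unsigned char key[32];
        /* Blocks only until the pool is initialized once at boot, never again. */
        if (getrandom(key, sizeof(key), 0) != (ssize_t) sizeof(key)) {
            perror("getrandom");
            return 1;
        }
        /* key now holds 256 bits drawn from the same CSPRNG as /dev/urandom. */
        return 0;
    }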
Short answer
Use /dev/urandom
Long answer
They are both fed by the same cryptographically secure pseudorandom number generator (CSPRNG). The fact that /dev/random waits for entropy (or more specifically, waits for the system's estimation of its entropy to reach an appropriate level) only makes a difference when you are using an information-theoretically secure algorithm, as opposed to a computationally secure algorithm. The former encompasses algorithms that you probably aren't using, such as Shamir's Secret Sharing and the one-time pad. The latter contains algorithms that you actually use and care about, such as AES, RSA, and Diffie-Hellman (as used by OpenSSL, GnuTLS, etc.).
So it doesn't matter if you use numbers from /dev/random, since they're getting pumped out of a CSPRNG anyway, and it is "theoretically possible" to break the algorithms that you're likely using them with in any case.
Lastly, that "theoretically possible" bit means just that. In this case, it means using all of the computing power in the world, for the amount of time that the universe has existed, to crack the application.
Therefore, there is pretty much no point in using /dev/random.
So use /dev/urandom.
I am looking for a robust, efficient data compression algorithm that I could use to provide a real-time transmission of medical data (primarily waveforms - heart rate, etc.).
I would appreciate any recommendations/links to scientific papers.
EDIT: The system will be based on a server (most probably installed within point-of-care infrastructure) and mobile devices (iOS & Android smartphones and tablets with native apps), to which the waveforms are going to be transferred. The server will gather all the data from the hospital (primarily waveform data). In my case, stability and speed are more important than latency.
That's the most detailed specification I can provide at the moment. I am going to investigate your recommendations and then test several algorithms. But I am looking for something that has been successfully implemented in a similar architecture. I am also open to any suggestions regarding server computation power or server software.
Don't think of it as real-time or as medical data - think of it as packets of data needing to be compressed for transmission (most likely in TCP packets). The details of the content only matter in the choice of compression algorithm, and even there it's not whether it's medical, but how the data is formatted/stored and what the actual data looks like. The important things are the data itself and the constraints due to the overall system (e.g. is it data gathering such as a Holter monitor, or is it real-time status reporting such as a cardiac monitor in an ICU? What kind of system is receiving the data?).
Looking at the data, is it being presented for transmission as raw binary data, or is it being received from another component or device as (for example) structured XML or HL7 with numeric values represented as text? Will compressing the original data be the most efficient option, or should it be converted down to a proprietary binary format that only covers the actual data range (are 2, 3 or 4 bytes enough to cover the range of values?)? What kind of savings could be achieved by converting, and what are the compatibility concerns (e.g. loss of HL7 compatibility)?
Choosing the absolutely best-compressing algorithm may also not be worth much additional work unless you're going to be in an extremely low-bandwidth scenario; if the data is coming from an embedded device you should be balancing compression efficiency against the capabilities and limitations of the embedded processor, toolset and surrounding system for working with it. If a custom-built compression routine saves you 5% over something already built into the tools, is it worth the extra coding and debugging time and storage space in embedded flash? Existing validated software libraries that produce "good enough" output may be preferred, particularly for medical devices.
Finally, depending on the environment you may want to sacrifice a big chunk of compression in favor of some level of redundancy, such as transmitting a sliding window of the data such that loss of any X packets doesn't result in loss of data. This may let you change protocols as well and may change how the device is configured - the difference between streaming UDP (with no retransmission of lost packets) and TCP (where the sender may need to be able to retransmit) may be significant.
And, now that I've blathered about the systems side, there's a lot of information out there on packetizing and streaming analog data, ranging from development of streaming protocols such as RTP to details of voice packetization for GSM/CDMA and VOIP. Still, the most important drivers for your decisions may end up being the toolsets available to you on the device and server sides. Using existing toolsets even if they're not the most efficient option may allow you to cut your development (and time-to-market) times significantly, and may also simplify the certification of your device/product for medical use. On the business side, spending an extra 3-6 months of software development, finding truly qualified developers, and dealing with regulatory approvals are likely to be the overriding factors.
UPDATE 2012/02/01: I just spent a few minutes looking at the XML export of a 12-lead cardiac stress EKG with a total observation time of 12+ minutes and an XML file size of ~6MB. I'm estimating that more than 25% of that file was repetitive and EXTREMELY compressible XML in the study headers, and the waveform data was comma-separated numbers in the range of -200 to 200, concentrated in the center of the range and changing slowly, with the numbers crossing the y-axis and staying on that side for a time. Assuming that most of what you want is the waveform values, for this example you'd be looking at a data rate with no compression of 4500 KB / 763 seconds, or around 5.9 KB/s (roughly 47 Kbps). Completely uncompressed and using text formatting you could run that over a "2.5G" GPRS connection with ease. On any modern wireless infrastructure the bandwidth used will be almost unnoticeable.
I still think that the stock compression libraries would eat this kind of data for lunch (subject to issues with compression headers and possibly packet headers). If you insist on doing a custom compression I'd look at sending difference values rather than raw numbers (unless your raw data is already offsets); a sketch of the idea follows below. If your data looks anything like what I'm reviewing, you could probably convert each item into a 1-byte value of -127 to +127, possibly reserving the extreme ends as "special" values used for overflow (handle those as you see fit: special representation, error, etc.). If you'd rather be slightly less efficient on transmission and insignificantly faster in processing you could instead just send each value as a signed 2-byte value, which would still use less bandwidth than the text representation, because currently every value is 2+ bytes anyway (1-4 chars per value plus separators that would no longer be needed).
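For instance, a delta encoder along those lines might look like the sketch below (the int16 sample type and the -128 escape marker are illustrative choices, not from any existing standard; the caller must size out for the 3-bytes-per-sample worst case):

    #include <stddef.h>
    #include <stdint.h>

    /* Encode samples as signed 1-byte differences; -128 (0x80) is reserved
     * as an escape meaning "delta out of range, raw 2-byte value follows". */
    size_t delta_encode(const int16_t *in, size_t n, uint8_t *out)
    {
        size_t o = 0;
        int16_t prev = 0;
        for (size_t i = 0; i < n; i++) {
            int d = in[i] - prev;
            if (d >= -127 && d <= 127) {
                out[o++] = (uint8_t)(int8_t) d;            /* common case: 1 byte */
            } else {
                out[o++] = 0x80;                           /* escape marker */
                out[o++] = (uint8_t)(in[i] & 0xFF);        /* raw value, low byte */
                out[o++] = (uint8_t)((in[i] >> 8) & 0xFF); /* high byte */
            }
            prev = in[i];
        }
        return o;  /* number of bytes written */
    }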
Basically, don't worry about the size of the data unless this is going to be running 24/7 over a heavily metered wireless connection with low caps.
There is a category of compression software which is so fast that I see no scenario in which it can't be called "real time": such algorithms are necessarily fast enough. They include LZ4, Snappy, LZO, and QuickLZ, and they reach hundreds of MB/s per CPU.
A comparison of them is available here:
http://code.google.com/p/lz4/
"Real Time compression for transmission" can also be seen as a trade-off between speed and compression ratio. More compression, even if slower, can effectively save transmission time.
A study of the "optimal trade-off" between compression and speed has been realized on this page for example : http://fastcompression.blogspot.com/p/compression-benchmark.html
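To give an idea of the API side, a minimal LZ4 round-trip might look like this (assuming the liblz4 headers are installed; the sample string is obviously a placeholder):

    #include <stdio.h>
    #include <string.h>
    #include <lz4.h>

    int main(void)
    {
        const char src[] = "heart-rate samples: 72 72 73 73 72 71 ...";
        char compressed[LZ4_COMPRESSBOUND(sizeof(src))];
        char restored[sizeof(src)];

        int csize = LZ4_compress_default(src, compressed, (int) sizeof(src),
                                         (int) sizeof(compressed));
        int rsize = LZ4_decompress_safe(compressed, restored, csize,
                                        (int) sizeof(restored));

        printf("%zu -> %d bytes, round-trip %s\n", sizeof(src), csize,
               (rsize == (int) sizeof(src) && memcmp(src, restored, rsize) == 0)
                   ? "ok" : "FAILED");
        return 0;
    }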
I tested many compression libraries, and this is my conclusion:
LZO (http://www.oberhumer.com/opensource/lzo/) is very fast when compressing large amounts of data (more than 1 MB).
Snappy (http://code.google.com/p/snappy/) is good, but requires more processing resources at decompression (better for data smaller than 1 MB).
http://objectegypt.com offers a library called IHCA, which is faster than LZO at compressing big data, offers good decompression speed, and requires no license.
Finally, you'd better write your own compression functions, because no one knows your data better than you do.
My MPI application has some processes that generate large data. Say we have N+1 processes (one for master control, the others are workers); each worker process generates large data, which is currently simply written to a normal file, named file1, file2, ..., fileN. The size of each file may be quite different. Now I need to send each fileM to the rank-M process to do the next job, so it's just like an all-to-all data transfer.
My problem is how I should use the MPI API to send these files efficiently. I used to use a Windows shared folder to transfer them, but I don't think that's a good idea.
I have thought about MPI_File and MPI_Alltoall, but these functions seem not very suitable for my case. Simple MPI_Send and MPI_Recv seem hard to use because every process needs to transfer large data, and I don't want to use a distributed file system for now.
It's not possible to answer your question precisely without a lot more data, data that only you have right now. So here are some generalities, you'll have to think about them and see if and how to apply them in your situation.
If your processes are generating large data sets they are unlikely to be doing so instantaneously. Instead of thinking about waiting until the whole data set is created, you might want to think about transferring it chunk by chunk.
I don't think that MPI_Send and _Recv (or the variations on them) are hard to use for large amounts of data. But you need to give some thought to finding the right amount to transfer in each communication between processes. With MPI it is not a simple case of there being a message startup time plus a message transfer rate which apply to all messages sent. Some IBM implementations, for example, on some of their hardware had different latencies and bandwidths for small and large messages. However, you have to figure out for yourself what the tradeoffs between bandwidth and latency are for your platform. The only general advice I would give here is to parameterise the message sizes and experiment until you maximise the ratio of computation to communication.
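To make the chunking concrete, a file transfer with plain point-to-point calls might look like the sketch below; CHUNK is exactly the sort of parameter you would tune experimentally (error handling omitted):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define CHUNK (4 * 1024 * 1024)  /* tunable message size, e.g. 4 MB */

    /* Sender: stream a file to rank dest in CHUNK-sized messages;
     * a zero-length message marks the end of the stream. */
    void send_file(const char *path, int dest)
    {
        char *buf = malloc(CHUNK);
        FILE *f = fopen(path, "rb");
        size_t n;
        while ((n = fread(buf, 1, CHUNK, f)) > 0)
            MPI_Send(buf, (int) n, MPI_BYTE, dest, 0, MPI_COMM_WORLD);
        MPI_Send(buf, 0, MPI_BYTE, dest, 0, MPI_COMM_WORLD);  /* end marker */
        fclose(f);
        free(buf);
    }

    /* Receiver: write incoming chunks from rank src until the end marker. */
    void recv_file(const char *path, int src)
    {
        char *buf = malloc(CHUNK);
        FILE *f = fopen(path, "wb");
        for (;;) {
            MPI_Status st;
            int n;
            MPI_Recv(buf, CHUNK, MPI_BYTE, src, 0, MPI_COMM_WORLD, &st);
            MPI_Get_count(&st, MPI_BYTE, &n);
            if (n == 0) break;
            fwrite(buf, 1, (size_t) n, f);
        }
        fclose(f);
        free(buf);
    }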
As an aside, one of the tests you should already have done is to measure message transfer rates for a wide range of sizes and communication patterns on your platform. That's kind of a basic shake-down test when you start work on a new system. If you don't have anything more suitable, the STREAM benchmark will help you get started.
I think that all-to-all transfers of large amounts of data are an unusual scenario in the kinds of programs for which MPI is typically used. You may want to give some serious thought to redesigning your application to avoid such transfers. Of course, only you know if that is feasible or worthwhile. From what little information you provide, it seems as if you might be implementing some kind of pipeline; in such cases the usual pattern of communication is from process 0 to process 1, process 1 to process 2, 2 to 3, etc.
Finally, if you happen to be working on a computer with shared memory (such as a multicore PC) you might think about using a shared memory approach, such as OpenMP, to avoid passing large amounts of data around.