Symmetric Encryption: Performance Questions

Does the performance of a symmetric encryption algorithm depend on the amount of data being encrypted? Suppose I have about 1000 bytes that I need to send over the network rapidly: is it better to encrypt 50 bytes of data 20 times, or 1000 bytes at once? Which will be faster? Does it depend on the algorithm used? If so, what's the highest-performing, most secure algorithm for amounts of data under 512 bytes?

The short answers are:
You want to encrypt all your data in one go. Ideally, pass it all in a single call to the encryption code, so that it can run even faster.
With a proper encryption algorithm, encryption will be substantially faster than the network itself. It takes a very bad implementation, a very old PC or a very fast network to make encryption a bottleneck.
When in doubt, use a ready-made protocol such as SSL/TLS. If given the choice of the encryption algorithm within a protocol, use AES.
Now the longer answers:
There are block ciphers and stream ciphers. A stream cipher begins with an initialization phase, in which the key is input into the system (often called the "key schedule"), and then encrypts data bytes "on the fly". With a stream cipher, the encrypted message has the same length as the input message, and encryption time is proportional to the input message length, save for the computational cost of the key schedule. We are not talking big numbers here; key schedule time is below 1 microsecond on a not-so-new PC. Yet, for optimal performance, you want to do the key schedule once, not 50 times.
Block ciphers also have a key schedule. Once the key schedule has been performed, a block cipher can encrypt blocks, i.e. chunks of data of a fixed length. The block length depends on the algorithm, but it is typically 8 or 16 bytes. AES is a block cipher. In order to encrypt a "message" of arbitrary length, the block cipher must be invoked several times, and this is trickier than it seems (there are many security issues). The part which decides how those invocations are assembled together is called the chaining mode. A well-known chaining mode is called CBC. Depending on the chaining mode, there may be a need for an extra step called padding, in which a few extra bytes are added to the input message so that its length becomes compatible with the chosen chaining mode. Padding must be such that, upon decryption, it can be removed unambiguously. A common padding scheme is called "PKCS#5" (equivalently, PKCS#7 for 16-byte blocks).
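To make the padding step concrete, here is a minimal sketch using the Python cryptography package (an assumed choice; any AES library follows the same pattern): the message is padded up to a multiple of the 16-byte AES block before CBC encryption.

```python
# Minimal AES-CBC sketch using the Python "cryptography" package (an assumed
# choice). PKCS#7 padding brings the message up to a multiple of the 16-byte block.
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)     # 256-bit AES key
iv = os.urandom(16)      # fresh random IV, one per message
message = b"roughly 1000 bytes of application data..."

padder = padding.PKCS7(128).padder()                # 128 = block size in bits
padded = padder.update(message) + padder.finalize()

encryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
ciphertext = encryptor.update(padded) + encryptor.finalize()
# len(ciphertext) is len(message) rounded up to the next multiple of 16.
```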
There is a chaining mode called "CTR" which effectively turns a block cipher into a stream cipher. It has some good points; in particular, CTR mode requires no padding, and the encrypted message will have the same length as the input message. AES with CTR mode is good.
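A corresponding sketch for CTR mode (same assumed Python package) shows that no padding step is involved and the ciphertext is exactly as long as the plaintext.

```python
# AES-CTR sketch (same assumed "cryptography" package): no padding step, and the
# ciphertext length equals the plaintext length.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)
nonce = os.urandom(16)   # counter-mode nonce: must never repeat under one key
message = b"50 bytes or 1000 bytes, the code is the same either way"

encryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
ciphertext = encryptor.update(message) + encryptor.finalize()
assert len(ciphertext) == len(message)
```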
Encryption speed will be about 100 MB/s on a typical PC (e.g. a 2.4 GHz Intel Core2, using a single core).
Warning: there are issues with regard to encrypting several messages with the same key. With chaining modes, those issues hide under the name of "IV" (as in "initial value"). With most chaining modes, the IV is a random value of the same size as the cipher block. The IV need not be secret (it is often transmitted along with the encrypted message, because the decrypting party must also know it), but it must be chosen randomly and uniformly, and each message needs a new IV. Some chaining modes (e.g. CTR) can tolerate a non-uniform IV, but only for the first message ever with a given key. With CBC, even the first message needs a fully random IV. If this paragraph does not make full sense to you, then, please, do not try to design an encryption protocol; it is more complex than you imagine. Instead, use an already specified protocol, such as SSL (for an encryption tunnel) or CMS (for encrypted messages). Development of such protocols was a long and painful history of attacks and countermeasures, with much grinding of teeth. Do not reenact that history...
Warning 2: if you use encryption, then you are worrying about security: there may be adverse entities bent on attacking your system. Most of the time, simple encryption will not fully deter them; by itself, (properly applied) encryption defeats only passive attackers, those who observe transmitted bytes but do not alter them. A generic attacker is also active, i.e. he removes some data bytes, moves and duplicates others, or adds extra bytes of his own design. To defeat active attackers you need more than encryption; you also need integrity checks. There again, protocols such as SSL and CMS already take care of the details.
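As an illustration of combining encryption with an integrity check, an AEAD mode such as AES-GCM does both in a single operation. The sketch below again assumes the Python cryptography package and is not a substitute for a full protocol such as TLS or CMS.

```python
# AES-GCM sketch: authenticated encryption provides confidentiality plus an
# integrity check in one operation (same assumed "cryptography" package).
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aead = AESGCM(key)
nonce = os.urandom(12)   # 96-bit nonce, unique for every message under this key

ciphertext = aead.encrypt(nonce, b"payload", b"optional associated data")
# Decryption verifies the authentication tag; any tampering raises InvalidTag.
plaintext = aead.decrypt(nonce, ciphertext, b"optional associated data")
```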

AES should be a good choice.
Good implementations of AES (e.g. the one included in the OpenSSL library) require about 10-20 CPU cycles per byte to encrypt. When you encrypt small messages, you also have to consider the time for the key setup. For example, a typical implementation of DES requires a few thousand cycles for its key setup; you might actually finish encrypting a small message with AES before you even start encrypting with other ciphers.
Newer processors (e.g. Westmere-based CPUs) have an instruction set that supports AES, allowing encryption at 1.5-4 cycles per byte. That is almost impossible to beat with any other cipher.
If possible, try encrypting longer messages rather than splitting them into small pieces. The main reason is not encryption speed, but rather security: a secure encryption mode usually requires an initialization vector and a message authentication code (MAC). These will add about 32 bytes to each of your ciphertext parts, so if you divide your message into small parts, this overhead becomes significant.
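A quick back-of-the-envelope calculation of that overhead, assuming roughly 32 bytes of IV plus MAC per ciphertext part as stated above:

```python
# Rough estimate of the per-part overhead mentioned above (~32 bytes of IV + MAC),
# for 1000 bytes of payload sent as one message versus 20 small parts.
payload, overhead = 1000, 32
one_message = payload + overhead        # 1032 bytes on the wire
twenty_parts = payload + 20 * overhead  # 1640 bytes on the wire
print(one_message, twenty_parts)        # splitting costs ~640 extra bytes
```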

Symmetric encryption algorithms typically are block ciphers. For any given algorithm, the block size is fixed. You then pick from several different methods of making subsequent blocks dependent on earlier blocks (e.g. cipher block chaining) to create a stream cipher. But the stream cipher invariably caches incoming data and submits it to the block cipher in full blocks.
So by doing 50 bytes 20 times, all you're doing is pounding on your cache logic.
If you're not operating in stream mode, then datagrams smaller than the native block size of your cipher will get significantly less protection than complete blocks, since there are fewer possible messages for an attacker to consider.

Obviously, performance depends on the amount of data as every part of the data has to be encrypted. You'll get much better information by doing a test in your specific environment (language, platform, encryption algorithm implementation) than anyone here could possibly provide out of the blue: I don't think it would take more than half an hour to set up a basic performance measurement.
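As a starting point for such a measurement, here is a rough timing sketch in Python (assuming the cryptography package; substitute your own language, library, cipher and message sizes):

```python
# Rough AES throughput measurement (Python "cryptography" package assumed);
# adapt the cipher, mode and message size to your own environment.
import os
import time
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, nonce = os.urandom(32), os.urandom(16)
chunk = os.urandom(1000)                 # the ~1000-byte message from the question
iterations = 100_000

encryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
start = time.perf_counter()
for _ in range(iterations):
    encryptor.update(chunk)
elapsed = time.perf_counter() - start
print(f"{iterations * len(chunk) / elapsed / 1e6:.1f} MB/s")
```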
As for security, you should be fine with Triple DES or AES.

Related

scrambling GBT data to identify counterpart

This question is FPGA-design and language agnostic. I use bidirectional gigabit optical transmission lines (GBT) for communication between two distant counterparts. The GBT frame payload is 80 bits, of which I use 64, so I have an additional 16 bits for spare usage.
The master sends an 80-bit packet every 25 ns, and the same happens in the other direction. I need to ensure that the master sends the data to a client that has specific firmware implemented, so during the communication I need to verify that the client is equipped with the firmware version required to digest the data I'm sending. The communication is not transaction-based; rather, the optical link realizes a seamless 80-bit-register-to-80-bit-register pass-through. Unfortunately, I cannot simply set the 16 spare bits at my disposal to a constant value and encode the target firmware into that constant. Such a method is quite common, yet it would not let me guarantee a 100% firmware match.
I was wondering whether there exists some sort of symmetric data scrambling that could be used to send the needed constants from slave to master over such a scrambled channel. I was also wondering whether there is some 'standard' solution for identifying two counterparts on a hardware communication channel by reasonably simple means in terms of required logic elements.
I do not want to encrypt the data. I just want to ensure that the firmwares match.
What is the recommended way to handle this?
You could simply barrel-rotate your data by one bit. This would result in essentially a Caesar cipher: not cryptographically secure, but no one is going to accidentally implement the same thing in a different firmware. Also, it gives you the ability to differentiate between 79 different firmware versions (shift by one bit, two bits, three, ...); a sketch follows this answer.
However, this seems overly complicated to me. I can't think of a time when either transmitting a version code in the 16 spare bits or exchanging version numbers at the beginning of communication in a handshake-style pattern wouldn't make more sense.
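For illustration only, the rotation idea sketched in Python (on an FPGA a fixed rotation is just a rewiring of the 80-bit bus, so the logic cost is essentially zero):

```python
# Illustrative 80-bit barrel rotation. On an FPGA a fixed rotation is just a
# rewiring of the bus; this Python version only demonstrates the idea.
WIDTH = 80
MASK = (1 << WIDTH) - 1

def rotate_left(word: int, n: int) -> int:
    n %= WIDTH
    return ((word << n) | (word >> (WIDTH - n))) & MASK

def rotate_right(word: int, n: int) -> int:
    return rotate_left(word, WIDTH - (n % WIDTH))

frame = 0x1234_5678_9ABC_DEF0_1234      # one 80-bit GBT frame
assert rotate_right(rotate_left(frame, 3), 3) == frame
```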

differences between random and urandom

I'm trying to find out the differences between the /dev/random and /dev/urandom files.
What are the differences between /dev/random and /dev/urandom?
When should I use them?
When should I not use them?
Using /dev/random may require waiting for the result, as it uses the so-called entropy pool, where random data may not be available at the moment.
/dev/urandom returns as many bytes as the user requests and is thus less random than /dev/random.
As can be read from the man page:
random
When read, the /dev/random device will only return random bytes within
the estimated number of bits of noise in the entropy pool. /dev/random
should be suitable for uses that need very high quality randomness
such as one-time pad or key generation. When the entropy pool is
empty, reads from /dev/random will block until additional
environmental noise is gathered.
urandom
A read from the /dev/urandom device will not block waiting for more
entropy. As a result, if there is not sufficient entropy in the
entropy pool, the returned values are theoretically vulnerable to a
cryptographic attack on the algorithms used by the driver. Knowledge
of how to do this is not available in the current unclassified
literature, but it is theoretically possible that such an attack may
exist. If this is a concern in your application, use /dev/random
instead.
For cryptographic purposes you should really use /dev/random because of the nature of the data it returns. The possible waiting should be considered an acceptable tradeoff for the sake of security, IMO.
When you need random data fast, you should use /dev/urandom of course.
Source: Wikipedia page, man page
Always use /dev/urandom.
/dev/urandom and /dev/random use the same random number generator. They are both seeded by the same entropy pool. They both will give equally random numbers of an arbitrary size. They both can give an effectively infinite amount of random numbers with only a 256-bit seed. As long as the initial seed has 256 bits of entropy, you can have an infinite supply of arbitrarily long random numbers. You gain nothing from using /dev/random. The fact that there are two devices is a flaw in the Linux API.
If you are concerned about entropy, using /dev/random is not going to fix that. But it will slow down your application while not generating numbers any more random than /dev/urandom. And if you aren't concerned about entropy, why are you using /dev/random at all?
Here's a much better/indepth explanation on why you should always use /dev/urandom: http://www.2uo.de/myths-about-urandom/
The kernel developers are discussing removing /dev/random: https://lwn.net/SubscriberLink/808575/9fd4fea3d86086f0/
What are the differences between /dev/random and /dev/urandom?
/dev/random and /dev/urandom are interfaces to the kernel's random number generator:
Reading returns a stream of random bytes strong enough for use in cryptography
Writing to them will provide the kernel data to update the entropy pool
When it comes to the differences, it depends on the operating system:
On Linux, reading from /dev/random may block, which limits its use in practice considerably
On FreeBSD, there is no difference: /dev/urandom is just a symbolic link to /dev/random.
When should I use them?
When should I not use them?
It is very difficult to find a use case where you should use /dev/random over /dev/urandom.
Danger of blocking:
This is a real problem that you will have to face when you decide to use /dev/random. For single usages like ssh-keygen it should be OK to wait for some seconds, but for most other situations it will not be an option.
If you use /dev/random, you should open it in nonblocking mode and provide some sort of user notification if the desired entropy is not immediately available.
Security:
On FreeBSD, there is no difference anyway, but also on Linux /dev/urandom is considered secure for almost all practical cases (e.g., "Is a rand from /dev/urandom secure for a login key?" and "Myths about /dev/urandom").
The situations where it could make a difference are edge cases like a fresh Linux installation. To cite from the Linux man page:
The /dev/random interface is considered a legacy interface, and /dev/urandom is preferred and sufficient in
all use cases, with the exception of applications which require randomness during early boot time; for
these applications, getrandom(2) must be used instead, because it will block until the entropy pool is initialized.
If a seed file is saved across reboots as recommended below (all major Linux distributions have done this
since 2000 at least), the output is cryptographically secure against attackers without local root access as
soon as it is reloaded in the boot sequence, and perfectly adequate for network encryption session keys.
Since reads from /dev/random may block, users will usually want to open it in nonblocking mode (or perform
a read with timeout), and provide some sort of user notification if the desired entropy is not immediately available.
Recommendation
As a general rule, /dev/urandom should be used for everything except long-lived GPG/SSL/SSH keys.
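For illustration, in Python both os.urandom and the secrets module draw from the kernel CSPRNG (getrandom(2) / /dev/urandom) and are the usual way to follow this recommendation:

```python
# Typical ways to get kernel-quality randomness without touching /dev/random.
# Both ultimately draw from the kernel CSPRNG (getrandom(2) / /dev/urandom).
import os
import secrets

session_key = os.urandom(32)            # 32 random bytes, fine for keys and nonces
api_token = secrets.token_urlsafe(32)   # URL-safe random token
pin = secrets.randbelow(10**6)          # uniform random integer in [0, 999999]
```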
Short answer
Use /dev/urandom
Long Answer
They are both fed by the same cryptographically secure pseudorandom number generator (CSPRNG). The fact that /dev/random waits for entropy (or more specifically, waits for the system's estimation of its entropy to reach an appropriate level) only makes a difference when you are using an information-theoretically secure algorithm, as opposed to a computationally secure algorithm. The former encompasses algorithms that you probably aren't using, such as Shamir's Secret Sharing and the one-time pad. The latter contains algorithms that you actually use and care about, such as AES, RSA, Diffie-Hellman, OpenSSL, GnuTLS, etc.
So it doesn't matter if you use numbers from /dev/random since they're getting pumped out of a CSPRNG anyway, and it is "theoretically possible" to break the algorithms that you're likely using them with anyway.
Lastly, that "theoretically possible" bit means just that. In this case, it means using all of the computing power in the world, for the amount of time that the universe has existed, to crack the application.
Therefore, there is pretty much no point in using /dev/random.
So use /dev/urandom.

What is entropy starvation

I was lost when reading
"Knowing how Linux behaves during entropy starvation (and being able to find the cause) allows us to efficiently use our server hardware."
in a blog. I then looked up 'entropy' on Wikipedia in the context of Linux, but it is still not clear to me what 'entropy starvation' is, or what the sentence quoted above means.
Some applications, notably cryptography, need random data. In cryptography, it is very important that the data be truly random, or at least unpredictable (even in part) to any attacker.
To supply this data, a system keeps a pool of random data, called entropy, that it collects from various sources of randomness on the system: Precise timing of events that might be somewhat random (keys pressed by users, interrupts from external devices), noise on a microphone, or, on some processors, dedicated hardware for generating random values. The incoming somewhat-random data is mixed together to produce better quality entropy.
These sources of randomness can only supply data at certain rates. If a system is used to do a lot of work that needs random data, it can use up more random data than is available. Then software that wants random data has to wait for more to be generated or it has to accept lower-quality data. This is called entropy starvation or entropy depletion.
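On Linux you can watch the kernel's entropy estimate directly; a small sketch (Linux-specific, and largely informational on modern kernels):

```python
# Peek at the kernel's entropy estimate on Linux (in bits). A persistently low
# value is what "entropy starvation" refers to when /dev/random used to block.
from pathlib import Path

def entropy_avail() -> int:
    """Return the kernel's current entropy estimate in bits (Linux only)."""
    return int(Path("/proc/sys/kernel/random/entropy_avail").read_text())

print(f"kernel entropy estimate: {entropy_avail()} bits")
```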

What SSL cipher suite has the least overhead?

What SSL cipher suite has the least overhead? A clearly compromised suite would be undesirable; however, there are degrees of problems. For instance, RC4 is still in the SSL 3.0 specification. What is a good recommendation for a high-traffic website? Would the cipher suite change if it wasn't being used for HTTP?
It depends on whether you are talking about network or CPU overhead.
Network overhead is about packet size. The initial handshake implies some asymmetric cryptography; the DHE cipher suites (when the server certificate is used for digital signatures only) imply a ServerKeyExchange message which will need a few hundred extra bytes compared with an RSA key exchange. This is a one-time cost, and clients will reuse sessions (continuing a previous TLS session with a symmetric-only shortened key exchange).
Also, data is exchanged by "records". A record can embed up to 16 kB worth of data. A record has a size overhead which ranges from 21 bytes (with RC4 and MD5) to 57 bytes (with a 16-byte block cipher such as AES, and SHA-1, and TLS 1.1 or later). So that's at worst about 0.35% size overhead.
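That percentage follows directly from the numbers above; a quick check:

```python
# Worst-case TLS record overhead from the figures above: up to 57 bytes of
# framing/IV/MAC/padding per record carrying up to 16 kB of application data.
record_payload = 16 * 1024
per_record_overhead = 57
print(f"{per_record_overhead / record_payload:.2%}")   # -> 0.35%
```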
CPU overhead of SSL is now quite small. Use openssl speed to get some raw figures; on my PC (a 2.4 GHz Core2 from two years ago), RC4 appears to be about twice as fast as AES, but AES is already at 160 MBytes/s, i.e. 16 times faster than 100baseT Ethernet can transmit. The integrity check (with MD5 or SHA-1) will be quite a bit faster than the encryption. So the cipher suite with the least CPU overhead should be SSL_RSA_WITH_RC4_128_MD5, but it will take some rather special kind of setup to actually notice the difference with, e.g., TLS_RSA_WITH_AES_128_CBC_SHA. Also, on some of the newer Intel processors there are AES-specific instructions, which will make AES faster than RC4 on those systems (the VIA C7 x86 clones also have some hardware acceleration for some cryptographic algorithms). RC4 may give you an extra edge in some corner cases due to its very small code, in case your application is rather heavy on code size and you run into L1 cache issues.
(As usual, for performance issues, actual measures always beat theory.)
The cipher suite with the least overhead is RSA_WITH_RC4_MD5. Note that the way RC4 is used in TLS does not render it broken, as it is for example in WEP, but its security can still be questioned. It also uses HMAC-MD5, which is not the best choice either, even though no attacks are known yet. Several web sites (unfortunately) only use that cipher suite for efficiency. If you use an Intel server with AES-NI instructions you might want to experiment with RSA_WITH_AES_128_SHA1. It is faster than RSA_WITH_RC4_MD5 on the systems I've tested.
I was searching for information about SSL/TLS and bumped into this thread. I know it is old, but I just wanted to add a few updates in case someone else ends up here.
Some ciphers offer more security and some more performance. But since this was posted, several changes to SSL/TLS, especially regarding security, have been introduced.
For good and always up-to-date cipher recommendations, check out the SSL/TLS configuration generator by Mozilla.
It is also worth noting that if you are concerned about performance, there are other aspects of the SSL connection that you could explore, such as:
OCSP stapling
Session resumption (tickets)
Session resumption (caching)
False Start (NPN needed)
HTTP/2

How much overhead does SSL impose?

I know there's no single hard-and-fast answer, but is there a generic order-of-magnitude estimate for the encryption overhead of SSL versus unencrypted socket communication? I'm talking only about the comm processing and wire time, not counting application-level processing.
Update
There is a question about HTTPS versus HTTP, but I'm interested in looking lower in the stack.
(I replaced the phrase "order of magnitude" to avoid confusion; I was using it as informal jargon rather than in the formal CompSci sense. Of course if I had meant it formally, as a true geek I would have been thinking binary rather than decimal! ;-)
Update
Per request in comment, assume we're talking about good-sized messages (range of 1k-10k) over persistent connections. So connection set-up and packet overhead are not significant issues.
Order of magnitude: zero.
In other words, you won't see your throughput cut in half, or anything like it, when you add TLS. Answers to the "duplicate" question focus heavily on application performance, and how that compares to SSL overhead. This question specifically excludes application processing, and seeks to compare non-SSL to SSL only. While it makes sense to take a global view of performance when optimizing, that is not what this question is asking.
The main overhead of SSL is the handshake. That's where the expensive asymmetric cryptography happens. After negotiation, relatively efficient symmetric ciphers are used. That's why it can be very helpful to enable SSL sessions for your HTTPS service, where many connections are made. For a long-lived connection, this "end-effect" isn't as significant, and sessions aren't as useful.
Here's an interesting anecdote. When Google switched Gmail to use HTTPS, no additional resources were required; no network hardware, no new hosts. It only increased CPU load by about 1%.
I second #erickson: the pure data-transfer speed penalty is negligible. Modern CPUs reach a crypto/AES throughput of several hundred MBit/s. So unless you are on a resource-constrained system (a mobile phone), TLS/SSL is fast enough for slinging data around.
But keep in mind that encryption makes caching and load balancing much harder. This might result in a huge performance penalty.
But connection setup is really a show stopper for many applications. On low-bandwidth, high-packet-loss, high-latency connections (a mobile device in the countryside), the additional round trips required by TLS might turn something slow into something unusable.
For example, we had to drop the encryption requirement for access to some of our internal web apps; they were next to unusable when used from China.
Assuming you don't count connection set-up (as you indicated in your update), it strongly depends on the cipher chosen. Network overhead (in terms of bandwidth) will be negligible. CPU overhead will be dominated by cryptography. On my mobile Core i5, I can encrypt around 250 MB per second with RC4 on a single core. (RC4 is what you should choose for maximum performance.) AES is slower, providing "only" around 50 MB/s. So, if you choose correct ciphers, you won't manage to keep a single current core busy with the crypto overhead even if you have a fully utilized 1 Gbit line. [Edit: RC4 should not be used because it is no longer secure. However, AES hardware support is now present in many CPUs, which makes AES encryption really fast on such platforms.]
Connection establishment, however, is different. Depending on the implementation (e.g. support for TLS False Start), it will add round trips, which can cause noticeable delays. Additionally, expensive crypto takes place on the first connection establishment (the above-mentioned CPU could only accept 14 connections per core per second if you foolishly used 4096-bit keys, and about 100 if you use 2048-bit keys). On subsequent connections, previous sessions are often reused, avoiding the expensive crypto.
So, to summarize:
Transfer on established connection:
Delay: nearly none
CPU: negligible
Bandwidth: negligible
First connection establishment:
Delay: additional round-trips
Bandwidth: several kilobytes (certificates)
CPU on client: medium
CPU on server: high
Subsequent connection establishments:
Delay: additional round-trip (not sure if one or multiple, may be implementation-dependent)
Bandwidth: negligible
CPU: nearly none
