What SSL cipher suite has the least overhead? - performance

What SSL cipher suite has the least overhead? A clearly compromised suite would be undesirable; however, there are degrees of problems. For instance, RC4 is still in the SSL 3.0 specification. What is a good recommendation for a high-traffic website? Would the cipher suite change if it wasn't being used for HTTP?

It depends on whether you mean network overhead or CPU overhead.
Network overhead is about packet size. The initial handshake involves some asymmetric cryptography; the DHE cipher suites (where the server certificate is used for digital signatures only) imply a ServerKeyExchange message, which needs a few hundred extra bytes compared with an RSA key exchange. This is a one-time cost, and clients will reuse sessions (continuing a previous TLS session with a symmetric-only, shortened key exchange).
Also, data is exchanged in "records". A record can embed up to 16 kB worth of data. A record has a size overhead ranging from 21 bytes (with RC4 and MD5) to 57 bytes (with a 16-byte block cipher such as AES, SHA-1, and TLS 1.1 or later). So that's at worst 0.34% size overhead.
CPU overhead of SSL is now quite small. Use openssl speed to get some raw figures; on my PC (a 2.4 GHz Core2 from two years ago), RC4 appears to be about twice as fast as AES, but AES already runs at 160 MB/s, i.e. 16 times faster than 100BASE-T Ethernet can transmit. The integrity check (with MD5 or SHA-1) will be quite a bit faster than the encryption. So the cipher suite with the least CPU overhead should be SSL_RSA_WITH_RC4_128_MD5, but it would take a rather special kind of setup to actually notice the difference from, e.g., TLS_RSA_WITH_AES_128_CBC_SHA. Also, some of the newer Intel processors have AES-specific instructions, which will make AES faster than RC4 on those systems (the VIA C7 x86 clones also have hardware acceleration for some cryptographic algorithms). RC4 may give you an extra edge in some corner cases due to its very small code size -- in case your application is rather heavy on code size and you run into L1 cache issues.
(As usual, for performance issues, actual measures always beat theory.)
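In that spirit, here is a minimal benchmarking sketch in Python, assuming the third-party cryptography package is installed. It is not a substitute for openssl speed, but it compares bulk AES-128-CBC encryption against the MD5 and SHA-1 digests mentioned above on whatever hardware you run it on; RC4 is omitted because many modern crypto libraries no longer ship it.

    import os, time, hashlib
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    data = os.urandom(16 * 1024 * 1024)          # 16 MB of random test data
    key, iv = os.urandom(16), os.urandom(16)

    def report(label, fn):
        start = time.perf_counter()
        fn()
        elapsed = time.perf_counter() - start
        print(f"{label:12s} {len(data) / elapsed / 1e6:8.1f} MB/s")

    aes = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    report("AES-128-CBC", lambda: aes.update(data))
    report("SHA-1",       lambda: hashlib.sha1(data).digest())
    report("MD5",         lambda: hashlib.md5(data).digest())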

The cipher suite with the least overhead is RSA_WITH_RC4_MD5. Note that the way RC4 is used in TLS does not render it broken, as it is in WEP for example, but its security can still be questioned. It also uses HMAC-MD5, which is not the best choice either, even though no attacks are known yet. Several web sites (unfortunately) use only that cipher suite for efficiency. If you use an Intel server with AES-NI instructions, you might want to experiment with RSA_WITH_AES_128_SHA1; it is faster than RSA_WITH_RC4_MD5 on the systems I've tested.
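If you want to run that kind of experiment against your own server, here is a rough client-side sketch using Python's ssl module; example.com is a placeholder, and the pinned cipher only constrains TLS 1.2 and below (TLS 1.3 suites are negotiated separately). "AES128-SHA" is the OpenSSL name for RSA_WITH_AES_128_CBC_SHA; "RC4-MD5" corresponds to RSA_WITH_RC4_128_MD5 but is absent from most modern OpenSSL builds, so expect set_ciphers() to reject it there.

    import socket, ssl, time

    HOST = "example.com"   # placeholder; point this at the server you want to test

    def handshake_with(cipher_name):
        ctx = ssl.create_default_context()
        ctx.set_ciphers(cipher_name)        # OpenSSL cipher-list syntax, TLS <= 1.2 only
        start = time.perf_counter()
        with socket.create_connection((HOST, 443)) as sock:
            with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
                elapsed = time.perf_counter() - start
                print(cipher_name, "->", tls.cipher(), f"({elapsed * 1000:.1f} ms)")

    handshake_with("AES128-SHA")            # OpenSSL name for RSA_WITH_AES_128_CBC_SHA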

I was searching for information about SSL/TLS and bumped into this one. I know the thread is old, and I just wanted to add a few updates in case someone gets lost here.
Some ciphers offer more security and some more performance. But since this was posted, several changes to SSL/TLS, especially on the security side, have been introduced.
For good and always up-to-date cipher choices, check out the SSL/TLS configuration generator by Mozilla.
It is also worth noting that if you are concerned about performance, there are other aspects of the SSL connection you could explore, such as the following (a small client-side session-resumption sketch appears after the list):
OCSP stapling
Session resumption (tickets)
Session resumption (caching)
False Start (NPN needed)
HTTP/2
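As a concrete illustration of the session-resumption items, here is a small client-side sketch using Python's ssl module; the host name is a placeholder, and with TLS 1.3 the session ticket may only arrive after the first read, so results vary by server.

    import socket, ssl

    HOST = "example.com"                  # placeholder host
    ctx = ssl.create_default_context()

    def connect(session=None):
        sock = socket.create_connection((HOST, 443))
        tls = ctx.wrap_socket(sock, server_hostname=HOST, session=session)
        print("session reused:", tls.session_reused)
        return tls

    first = connect()                     # full handshake
    second = connect(first.session)       # abbreviated handshake if the server allows it
    first.close()
    second.close()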

Related

scrambling GBT data to identify counterpart

This question is agnostic to FPGA designs and languages. I use bidirectional gigabit optical transmission links (GBT) for communication between two distant counterparts. The GBT frame payload is 80 bits; I use 64 of them, so I have an additional 16 bits for spare usage.
The master sends a packet of 80 bits every 25 ns, and the same happens in the other direction. I need to ensure that the master sends its data to a client that has a specific firmware implemented, so during communication I need to verify that the client is equipped with the firmware version required to digest the data I'm sending. The communication is not transaction-based; rather, the optical link realizes a seamless 80-bit-register to 80-bit-register pass-through. Unfortunately, I cannot simply set the 16 spare bits at my disposal to a constant value and encode the target firmware into that constant. Such a method is quite common, but it could not guarantee a 100% firmware match.
I was wondering whether there exists some sort of symmetric data scrambling that could be used to send the needed constants from slave to master over such a scrambled channel, and whether there is some 'standard' solution for identifying two counterparts in a hardware communication channel by reasonably simple means in terms of required logic elements.
I do not want to encrypt the data. I just want to assure that the firmwares match.
What is the recommended way to handle this?
You could simply barrel-rotate your data by one bit. This would result in essentially a Caesar cipher -- not cryptographically secure, but no one is going to accidentally implement the same thing in a different firmware. Also, it gives you the ability to differentiate between 79 different firmware versions (shift by one bit, two bits, three, ...).
However, this seems overly complicated to me. I can't think of a case where either transmitting a version code in the 16 spare bits or exchanging version numbers at the beginning of communication in a handshake-style pattern wouldn't make more sense.
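For what it is worth, here is a minimal sketch (in Python, purely to illustrate the logic) of the barrel-rotation idea, with the firmware version encoded as the shift amount; the frame value is a made-up example.

    WIDTH = 80                                 # GBT frame payload width in bits
    MASK = (1 << WIDTH) - 1

    def rotl(frame: int, n: int) -> int:
        """Rotate an 80-bit frame left by n bits."""
        n %= WIDTH
        return ((frame << n) | (frame >> (WIDTH - n))) & MASK

    def rotr(frame: int, n: int) -> int:
        """Inverse rotation, applied by the receiving counterpart."""
        return rotl(frame, WIDTH - (n % WIDTH))

    version = 3                                # firmware version = shift amount (1..79)
    frame = 0x1234_5678_9ABC_DEF0_AAAA         # example 80-bit payload
    assert rotr(rotl(frame, version), version) == frame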

SSL on MQ : CPU Performance

I would like to deploy SSL on an MQ server, but I would like to know whether my current CPU capacity will support SSL. (I do not have the budget to increase the number of CPU cores and MQ PVUs.)
My specs:
Windows 2003 Server SP2,
1 core of Intel Xeon CPU E5-2690 2.9GHz,
2 GB RAM,
1 Qmgr,
Linear Logging,
Persistent messages,
DQM with 5 partners,
10 sender channels,
10 receiver channels
For a month:
we exchange on average 3 million messages with our partners, for a total of 15 GB of data.
(so about 5 kB per message).
CPU usage varied on average between 20% and 40%
we had 4 peaks of 100% CPU
Do you think my system can cope with SSL using the RC4_MD5_EXPORT cipher?
Best Regards,
Pascal
I don't think it's possible to provide a definitive answer as to whether your server can cope with enabling SSL using the RC4_MD5_EXPORT cipher on your MQ channels short of trying it and assessing the impact. You may also want to take a look at the processor queue length using the Windows Performance Monitor tool to see how many processes are waiting for CPU time when the usage increases.
As your CPU provides hardware support for the AES encryption algorithm, you may want to consider using one of the AES CipherSpecs instead. This also has the advantage of providing better security, as MD5 and RC4 are fairly weak in terms of hashing and encryption respectively.
One option to consider is installing an SSL acceleration card in your server so that messages are encrypted/hashed using dedicated hardware rather than your server's CPU. This page http://www-01.ibm.com/support/knowledgecenter/#!/SSFKSJ_7.5.0/com.ibm.mq.ref.doc/q049300_.htm on the IBM Knowledge Center provides some further information and lists which cards are supported by WebSphere MQ.

Symmetric Encryption: Performance Questions

Does the performance of a symmetric encryption algorithm depend on the amount of data being encrypted? Suppose I have about 1000 bytes I need to send over the network rapidly, is it better to encrypt 50 bytes of data 20 times, or 1000 bytes at once? Which will be faster? Does it depend on the algorithm used? If so, what's the highest performing, most secure algorithm for amounts of data under 512 bytes?
The short answers are:
You want to encrypt all your data in one go. You actually want to hand it all to the encryption code in a single function call, so that it can run even faster.
With a proper encryption algorithm, encryption will be substantially faster than the network itself. It takes a very bad implementation, a very old PC or a very fast network to make encryption a bottleneck.
When in doubt, use a ready-made protocol such as SSL/TLS. If given the choice of encryption algorithm within a protocol, use AES.
Now the longer answers:
There are block ciphers and stream ciphers. A stream cipher begins with an initialization phase, in which the key is fed into the system (often called the "key schedule"), and then encrypts data bytes "on the fly". With a stream cipher, the encrypted message has the same length as the input message, and encryption time is proportional to the input message length, save for the computational cost of the key schedule. We are not talking about big numbers here; key schedule time is below a microsecond on a not-so-new PC. Yet, for optimal performance, you want to do the key schedule once, not 50 times.
Block ciphers also have a key schedule. Once the key schedule has been performed, a block cipher can encrypt blocks, i.e. chunks of data of a fixed length. The block length depends on the algorithm, but it is typically 8 or 16 bytes. AES is a block cipher. In order to encrypt a "message" of arbitrary length, the block cipher must be invoked several times, and this is trickier than it seems (there are many security issues). The part which decides how those invocations are assembled together is called the chaining mode. A well-known chaining mode is CBC. Depending on the chaining mode, there may be a need for an extra step called padding, in which a few extra bytes are added to the input message so that its length becomes compatible with the chosen chaining. Padding must be such that, upon decryption, it can be removed unambiguously. A common padding scheme is called "PKCS#5".
There is a chaining mode called "CTR" which effectively turns a block cipher into a stream cipher. It has some good points; in particular, CTR mode requires no padding, and the encrypted message has the same length as the input message. AES with CTR mode is good.
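As a minimal illustration (assuming the Python cryptography package), here is what AES in CTR mode looks like in practice: no padding step, and the ciphertext has exactly the same length as the plaintext. The 16-byte counter block must never repeat under the same key.

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key = os.urandom(16)
    nonce = os.urandom(16)                       # fresh counter block per message
    message = b"some payload of arbitrary length, no block alignment needed"

    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    ciphertext = enc.update(message) + enc.finalize()
    assert len(ciphertext) == len(message)       # same length as the input

    dec = Cipher(algorithms.AES(key), modes.CTR(nonce)).decryptor()
    assert dec.update(ciphertext) + dec.finalize() == message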
Encryption speed will be about 100 MB/s on a typical PC (e.g. a 2.4 GHz Intel Core2, using a single core).
Warning: there are issues with encrypting several messages with the same key. With chaining modes, those issues hide under the name of "IV" (for "initial value"). With most chaining modes, the IV is a random value of the same size as the cipher block. The IV need not be secret (it is often transmitted along with the encrypted message, because the decrypting party must also know it), but it must be chosen randomly and uniformly, and each message needs a new IV. Some chaining modes (e.g. CTR) can tolerate a non-uniform IV, but only for the first message ever sent with a given key. With CBC, even the first message needs a fully random IV. If this paragraph does not make full sense to you, then, please, do not try to design an encryption protocol; it is more complex than you imagine. Instead, use an already specified protocol, such as SSL (for an encryption tunnel) or CMS (for encrypted messages). Development of such protocols was a long and painful history of attacks and countermeasures, with much grinding of teeth. Do not re-enact that history...
Warning 2: if you use encryption, then you are worrying about security: there may be adverse entities bent on attacking your system. Most of the time, simple encryption will not fully deter them; by itself, (properly applied) encryption defeats only passive attackers, those who observe transmitted bytes but do not alter them. A generic attacker is also active, i.e. he removes some data bytes, moves and duplicates others, or adds extra bytes of his own design. To defeat active attackers you need more than encryption; you also need integrity checks. There again, protocols such as SSL and CMS already take care of the details.
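The safest route really is an existing protocol, but as a sketch of what "encryption plus an integrity check with a fresh IV per message" means, here is an AES-GCM example using the Python cryptography package (an authenticated-encryption mode; the 12-byte nonce is generated per message and sent along with the ciphertext):

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=128)
    aead = AESGCM(key)

    def seal(plaintext: bytes) -> bytes:
        nonce = os.urandom(12)                   # never reuse a nonce with the same key
        return nonce + aead.encrypt(nonce, plaintext, None)

    def unseal(blob: bytes) -> bytes:
        nonce, ciphertext = blob[:12], blob[12:]
        return aead.decrypt(nonce, ciphertext, None)   # raises InvalidTag if tampered with

    sealed = seal(b"hello")
    assert unseal(sealed) == b"hello"
    print("per-message overhead:", len(sealed) - 5, "bytes")   # 12-byte nonce + 16-byte tag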
AES should be a good choice.
Good implementations of AES (e.g. the one included in the openssl library) require about 10-20 CPU cycles per byte to encrypt. When you encrypt small messages then you also have to consider the time for the key setup. E.g., a typical implementation of DES requires a few thousand cycles for a key setup. I.e., you might actually finish encrypting a small message with AES before you even start encrypting with other ciphers.
Newer processors (e.g. Westmere-based CPUs) have an instruction set that supports AES, allowing encryption at speeds of 1.5-4 cycles per byte. That is almost impossible to beat with any other cipher.
If possible, try encrypting longer messages rather than splitting them into small pieces. The main reason is not encryption speed, but security: a secure encryption mode usually requires an initialization vector and a message authentication code (MAC), which together add about 32 bytes to each ciphertext part. If you divide your message into small parts, this overhead becomes significant.
Symmetric encryption algorithms are typically block ciphers. For any given algorithm, the block size is fixed. You then pick from several different methods of making subsequent blocks dependent on earlier blocks (i.e. cipher block chaining) to create a stream cipher. But the stream cipher invariably caches incoming data and submits it to the block cipher in full blocks.
So by doing 50 bytes 20 times, all you're doing is pounding on your cache logic.
If you're not operating in stream mode, then datagrams smaller than the native block size of your cipher will get significantly less protection than complete blocks, since there are fewer possible messages for an attacker to consider.
Obviously, performance depends on the amount of data as every part of the data has to be encrypted. You'll get much better information by doing a test in your specific environment (language, platform, encryption algorithm implementation) than anyone here could possibly provide out of the blue: I don't think it would take more than half an hour to set up a basic performance measurement.
As for security, you should be fine with Triple DES or AES.

What is the performance hit of using TLS with apache?

How much of a performance hit will running everything over TLS have on my server? I would assume this is completely ignorable in this day and age? I heard once that servers today can encrypt gigabytes of data per second; is that true? And if so, is it linearly scalable, so that if the top speed is 10 GB/second, encrypting 1 GB would take 0.1 seconds?
I'm not in some kind of pickle with any admin over this (yet). I'm just curious and if I can mostly ignore the hit, why not just encrypt everything?
Performance Analysis of TLS Web Servers (pdf), a paper written at Rice University, covered this topic back in 2002, and they came to this conclusion:
Apache TLS without the AXL300 served between 149 hits/sec and 259 hits/sec for the CS trace, and between 147 hits/sec and 261 hits/sec for the Amazon trace. This confirms that TLS incurs a substantial cost and reduces the throughput by 70 to 89% relative to the insecure Apache.
So without the AXL300 board, which offloads encryption, there was a reduction in throughput of 70-89% on a PIII-933MHz. However, they note in the next section that as CPU speeds increase, the throughput is expected to increase accordingly. So since 2002, you may find that there is no noticeable difference for your workload.

How much overhead does SSL impose?

I know there's no single hard-and-fast answer, but is there a generic order-of-magnitude estimate approximation for the encryption overhead of SSL versus unencrypted socket communication? I'm talking only about the comm processing and wire time, not counting application-level processing.
Update
There is a question about HTTPS versus HTTP, but I'm interested in looking lower in the stack.
(I replaced the phrase "order of magnitude" to avoid confusion; I was using it as informal jargon rather than in the formal CompSci sense. Of course if I had meant it formally, as a true geek I would have been thinking binary rather than decimal! ;-)
Update
Per request in comment, assume we're talking about good-sized messages (range of 1k-10k) over persistent connections. So connection set-up and packet overhead are not significant issues.
Order of magnitude: zero.
In other words, you won't see your throughput cut in half, or anything like it, when you add TLS. Answers to the "duplicate" question focus heavily on application performance, and how that compares to SSL overhead. This question specifically excludes application processing, and seeks to compare non-SSL to SSL only. While it makes sense to take a global view of performance when optimizing, that is not what this question is asking.
The main overhead of SSL is the handshake. That's where the expensive asymmetric cryptography happens. After negotiation, relatively efficient symmetric ciphers are used. That's why it can be very helpful to enable SSL sessions for your HTTPS service, where many connections are made. For a long-lived connection, this "end-effect" isn't as significant, and sessions aren't as useful.
Here's an interesting anecdote. When Google switched Gmail to use HTTPS, no additional resources were required; no network hardware, no new hosts. It only increased CPU load by about 1%.
I second #erickson: the pure data-transfer speed penalty is negligible. Modern CPUs reach a crypto/AES throughput of several hundred MBit/s. So unless you are on a resource-constrained system (a mobile phone), TLS/SSL is fast enough for slinging data around.
But keep in mind that encryption makes caching and load balancing much harder. This might result in a huge performance penalty.
But connection setup really is a show-stopper for many applications. On low-bandwidth, high-packet-loss, high-latency connections (a mobile device in the countryside), the additional round trips required by TLS might turn something slow into something unusable.
For example, we had to drop the encryption requirement for access to some of our internal web apps -- they were next to unusable when used from China.
Assuming you don't count connection set-up (as you indicated in your update), it strongly depends on the cipher chosen. Network overhead (in terms of bandwidth) will be negligible. CPU overhead will be dominated by cryptography. On my mobile Core i5, I can encrypt around 250 MB per second with RC4 on a single core. (RC4 is what you should choose for maximum performance.) AES is slower, providing "only" around 50 MB/s. So, if you choose the right ciphers, you won't manage to keep a single current core busy with the crypto overhead even if you have a fully utilized 1 Gbit line. [Edit: RC4 should not be used because it is no longer secure. However, AES hardware support is now present in many CPUs, which makes AES encryption really fast on such platforms.]
Connection establishment, however, is different. Depending on the implementation (e.g. support for TLS False Start), it will add round trips, which can cause noticeable delays. Additionally, expensive crypto takes place on the first connection establishment (the above-mentioned CPU could only accept 14 connections per core per second if you foolishly used 4096-bit keys, and around 100 with 2048-bit keys). On subsequent connections, previous sessions are often reused, avoiding the expensive crypto (a small measurement sketch follows the summary below).
So, to summarize:
Transfer on established connection:
Delay: nearly none
CPU: negligible
Bandwidth: negligible
First connection establishment:
Delay: additional round-trips
Bandwidth: several kilobytes (certificates)
CPU on client: medium
CPU on server: high
Subsequent connection establishments:
Delay: additional round trip (not sure if one or multiple; may be implementation-dependent)
Bandwidth: negligible
CPU: nearly none
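If you want to put numbers on this for your own connection, a rough sketch like the following separates the handshake cost from the bulk-transfer cost (Python's ssl module; example.com is a placeholder):

    import socket, ssl, time

    HOST = "example.com"                         # placeholder host
    ctx = ssl.create_default_context()

    t0 = time.perf_counter()
    sock = socket.create_connection((HOST, 443))
    tls = ctx.wrap_socket(sock, server_hostname=HOST)   # TLS handshake happens here
    t1 = time.perf_counter()

    tls.sendall(b"GET / HTTP/1.0\r\nHost: " + HOST.encode() + b"\r\n\r\n")
    received = 0
    while True:
        chunk = tls.recv(16384)
        if not chunk:
            break
        received += len(chunk)
    t2 = time.perf_counter()
    tls.close()

    print(f"handshake: {(t1 - t0) * 1000:.1f} ms")
    print(f"transfer:  {(t2 - t1) * 1000:.1f} ms for {received} bytes")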
