What is the browser's overhead to decompress a gzip server response of an average-sized web page?
Less than 1 ms? 1-3 ms? More?
I'll assume that you mean 1.3 MB uncompressed. I get about 6 ms decompression time on one core of a 2 GHz i7.
If I assume compression to 1/3 of the original size, an extra 7 Mbits or so would need to be transferred if the page were not compressed. That takes more than 6 ms on a 1 Gbit/s link, and about 700 ms on a more typical 10 Mbit/s link.
gzip is a big win for HTTP transfers.
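If you want to reproduce that kind of number on your own hardware, here is a minimal sketch (Python; the synthetic ~1.3 MB page below is just a stand-in for whatever response you actually serve) that times one gzip decompression pass:

    import gzip
    import time

    # Stand-in payload: substitute the real HTML of the page you care about.
    page = b"<html><body>" + b"lorem ipsum dolor sit amet " * 50000 + b"</body></html>"

    compressed = gzip.compress(page)  # gzip-compress the page (Python defaults to level 9)

    start = time.perf_counter()
    out = gzip.decompress(compressed)
    elapsed_ms = (time.perf_counter() - start) * 1000
    assert out == page

    print(f"uncompressed {len(page)/1e6:.2f} MB, compressed {len(compressed)/1e6:.2f} MB, "
          f"decompression {elapsed_ms:.2f} ms")

Repeating the measurement a few times and taking the minimum gives a more stable figure.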
Using the zlib implementation of gzip with default parameters:
On an internet-facing server with a 2.66 GHz quad-core Xeon CPU, the gzip compression times are less than 0.5 ms for responses up to 15 KB; 361 KB takes 4.50 ms and 1077 KB takes 13 ms.
I still consider this easily worth it, as most of our traffic heads out over Wi-Fi or 3G links, so transfer time far outweighs the server-side delay.
The times are measured with code bracketing only the calls to the gzip routines, using nanosecond-precision timers (I changed the source to implement this). I was measuring this anyway, because I was trying to determine whether caching the gzipped output was worth the memory trade-off, or whether gzip was fast enough on its own. In our case, I think we will gzip everything above about 200 bytes and aggressively cache gzipped responses, especially for larger payloads.
(@Mark Adler, thanks for writing zlib.)
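For comparison, a rough sketch of that kind of bracketed timing (Python here rather than the original server code; the payload sizes mirror the ones above, and the repetitive test data makes these times a lower bound):

    import gzip
    import time

    def time_gzip_ms(payload: bytes, level: int = 6) -> float:
        """Time a single gzip compression pass, in milliseconds."""
        start = time.perf_counter()
        gzip.compress(payload, compresslevel=level)
        return (time.perf_counter() - start) * 1000

    # Stand-in payloads; substitute representative responses from your own server.
    for size in (200, 15_000, 361_000, 1_077_000):
        payload = b"x" * size
        print(f"{size:>9,} bytes -> {time_gzip_ms(payload):6.2f} ms")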
I have never tried this but it's something I was wondering about.
If I am downloading a very big file (say 200 GB) over a very fast link (1 Gbps or even 10 Gbps), how does the OS (or whatever does this) write the downloaded file to disk at the same time, given that disks have very slow write speeds compared to my link speed?
Would the hard drive become a bottleneck in this case?
If I run iostat on my PC, it shows about 1027 kB/s (~1 MB/s) write speed, which is very slow compared with the link speeds mentioned above.
Yes, it's certainly possible for almost any part of your hardware chain, from the incoming link to your hard drive to become the bottleneck, depending on your hardware.
In the case that you are actually sustaining a download speed faster than the linear write speed of your hard drive, it could certainly become a bottleneck.
Note, however, that even most budget hard drives today1 have a linear write speed of at least 50 MB/s, which is 400 Mbps, and often closer to 100 MB/s (i.e., 800 Mbps). At an effective write speed of 800 Mbps, your drive should be able to keep up with a saturated 1 Gbps link2, at least approximately, but would certainly fall behind against a 10 Gbps link.
Now, what you were measuring with iostat isn't any kind of useful benchmark: it's telling you the actual throughput, either over the entire uptime of your host or of the active requests over some interval. Unless you are doing a large transfer during that interval, the speeds reported there have little relationship to your actual disk write speeds. There are plenty of benchmarking tools that will measure your read and write speeds directly.
A final thing to keep in mind is that most modern operating systems use a "write-back" strategy for storage writes: the writes are first sent to RAM (i.e., the file cache on Windows or the page cache on Linux) and then streamed out to the disk asynchronously. This helps hide the limited throughput of the actual disk for relatively short bursts of writes. For example, your disk may appear to have a very high throughput of 5 GB/s or more if the writes total only a few hundred MB or a few GB3, but the apparent speed will approach the true disk speed as the writes get larger, since the buffering ability of your OS disk cache is exhausted. Clearly, for a 200 GB transfer the disk cache isn't going to be able to hide the disk speed unless you have hundreds of GB of RAM.
So all that said: yes, your hard drive can certainly become a bottleneck, but likely at much higher throughputs than what you measured with iostat.
1 This increase in linear read and write speeds is mostly an artifact of increasing areal density of storage, which directly translates to increased linear read/write speeds at the same RPM. Random read/write has no such relationship, however. SSDs of reasonable capacity usually increase this by another order of magnitude, to around 1 GB/s (8 Gbps) or more.
2 Mostly because these network links have overhead which for fast links often reduces the actual payload to less than 80% of the theoretical link speed.
3 The exact values depend on your total RAM, available RAM and OS configuration.
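If you want to see the write-back effect for yourself rather than relying on iostat, here is a minimal benchmark sketch (Python; the file name and sizes are arbitrary) that compares cached writes against writes forced to the physical disk with fsync:

    import os
    import time

    CHUNK = b"\0" * (4 * 1024 * 1024)   # 4 MB per write
    TOTAL_MB = 512                      # keep this comfortably below your free disk space

    def write_speed(path: str, sync: bool) -> float:
        """Return apparent write throughput in MB/s, optionally forcing data to disk."""
        start = time.perf_counter()
        with open(path, "wb") as f:
            for _ in range(TOTAL_MB // 4):
                f.write(CHUNK)
            f.flush()
            if sync:
                os.fsync(f.fileno())    # wait until the data actually reaches the disk
        elapsed = time.perf_counter() - start
        os.remove(path)
        return TOTAL_MB / elapsed

    print(f"buffered (page cache): {write_speed('bench.tmp', sync=False):.0f} MB/s")
    print(f"fsync'd (real disk):   {write_speed('bench.tmp', sync=True):.0f} MB/s")

The first number will look inflated for exactly the reasons described above; the second is closer to what a sustained 200 GB download would actually see.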
I have recently activated gzip compression on an IIS6 web server. I use both static and dynamic compression (static level 10 and dynamic level 1). This was meant to improve server response times. However, the page seems to load more slowly since compression was activated; all my measurements in Firebug indicate this.
Has anyone else had this problem? What can be the cause?
You are doing more work on both the server and the client, so it is normal that response times increase. On low-bandwidth connections you may make it back through reduced transfer times.
If you are on a high-bandwidth connection, the compression will not have a significant impact on the transfer delay, since it is already short uncompressed. However, you will still pay 100% of the CPU penalty.
Gzipping large responses takes quite some CPU power; if the server's CPUs are already loaded, the response times might even get worse.
My advice: check the server CPU, and if the load is non-negligible, then either turn off compression or buy a bigger box. If you have a large population of users on mobile or in remote locations with poor internet connectivity, then compression may well be useful; otherwise it will make little difference.
You might also look into using a reverse proxy to reduce the load on the server.
How much bandwidth do you have between your browser and your server?
Compressing and decompressing the stream is more work, so on a fast network it may actually be slower. Is this an intranet application? You will see the most gain from compression if bandwidth is tight (either lots of traffic or a low-bandwidth connection).
How much compression helps will also depend upon what kind of content your site is delivering.
The best thing to do is to test and measure under the same conditions that your site will be under when it is in production.
Static compression works quite well because a copy of the gzipped file is placed in a temporary folder. Dynamically compressed responses, however, have to be re-gzipped every time, and unless bandwidth is a big problem I don't think it is worth it.
It used to be that disk compression was used to increase storage space at the expense of efficiency, but we were all on single-processor systems back then.
These days there are extra cores around to potentially do the decompression work in parallel with processing the data.
For I/O-bound applications (particularly read-heavy sequential data processing), it might be possible to increase throughput by only reading and writing compressed data to disk.
Does anyone have any experience to support or reject this conjecture?
Take care not to confuse disk seek times and disk read rates. It takes millions of CPU cycles (5–10 milliseconds, or 5–10 million nanoseconds) to seek to the right track on a hard drive (HDD). Once you're there, you can read tens of megabytes of data per second, assuming low fragmentation. For solid-state drives (SSDs), seek times are much lower (35,000–100,000 ns) than for HDDs.
Whether or not the data is compressed on the disk, you still have to seek. The question becomes, is (disk read time for compressed data + the decompression time) < (disk read time for uncompressed data). Decompression is relatively fast, since it amounts to replacing a short token with a longer one. In the end, it probably boils down to how well the data was compressed and how big it was in the first place. If you're reading a 2KB compressed file instead of a 5KB original, it's probably not worth it. If you're reading a 2MB compressed file instead of a 25MB original, it likely is.
Measure with a reasonable workload.
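One way to put numbers on that inequality is a sketch like the following (Python; data.txt and data.txt.gz are placeholder names for the same file stored raw and gzip-compressed). For a fair disk comparison, drop the OS page cache between runs, otherwise you are mostly measuring RAM:

    import gzip
    import time

    RAW = "data.txt"        # placeholder: the original file
    PACKED = "data.txt.gz"  # placeholder: the same file, gzip-compressed

    def timed_read_s(path: str, opener) -> float:
        """Read the whole file in 1 MB chunks and return the elapsed time in seconds."""
        start = time.perf_counter()
        with opener(path, "rb") as f:
            while f.read(1024 * 1024):
                pass
        return time.perf_counter() - start

    print(f"uncompressed read:         {timed_read_s(RAW, open):.3f} s")
    print(f"compressed read + inflate: {timed_read_s(PACKED, gzip.open):.3f} s")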
Yes! In fact, processors are so ridiculously fast now that it even makes sense for memory (IBM does this, I believe). I believe some of the current big-iron machines even do compression in the CPU cache.
Yes, this makes perfect sense. On NT-based Windows OS's it's widely accepted that sometimes enabling NTFS compression can be faster than disabling it for precisely this reason. This has been true for years and multicore should only make it more true.
I think it also depends on how aggressive your compression is vs. how I/O-bound you are.
For example, DB2's row compression feature is targeted for IO bound application: data warehouses, reporting systems, etc. It uses a dictionary-based algorithm and isn't very aggressive - resulting in 50-80% compression of data (tables, indexes in storage as well as when in memory). However - it also tends to speed queries up by around 10%.
They could have gone with much more aggressive compression, but then would have taken a performance hit.
On a modern system can local hard disk write speeds be improved by compressing the output stream?
This question derives from a case I'm working with, where a program serially generates and dumps around 1-2 GB of text logging data to a raw text file on the hard disk, and I think it is I/O-bound. Would I expect to be able to decrease runtimes by compressing the data before it goes to disk, or would the overhead of compression eat up any gain I could get? Would having an idle second core affect this?
I know this would be affected by how much CPU is being used to generate the data so rules of thumb on how much idle CPU time would be needed would be good.
I recall a video talk where someone used compression to improve read speeds for a database but IIRC compressing is a lot more CPU intensive than decompressing.
Yes, yes, yes, absolutely.
Look at it this way: take your maximum contiguous disk write speed in megabytes per second. (Go ahead and measure it, time a huge fwrite or something.) Let's say 100 MB/s. Now take your CPU speed in megahertz; let's say 3 GHz = 3000 MHz. Divide the CPU speed by the disk write speed. That's the number of cycles the CPU would otherwise spend idle, which you can spend per byte on compression. In this case 3000/100 = 30 cycles per byte.
If you had an algorithm that could compress your data to 80% of its original size, for an effective 125 MB/s write speed, you would have 24 cycles per byte to run it in, and it would basically be free because the CPU wouldn't be doing anything else anyway while waiting for the disk to churn. 24 cycles per byte = 3072 cycles per 128-byte cache line, easily achieved.
We do this all the time when reading optical media.
If you have an idle second core it's even easier. Just hand off the log buffer to that core's thread and it can take as long as it likes to compress the data since it's not doing anything else! The only tricky bit is you want to actually have a ring of buffers so that you don't have the producer thread (the one making the log) waiting on a mutex for a buffer that the consumer thread (the one writing it to disk) is holding.
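A minimal sketch of that hand-off (Python; a bounded queue stands in for the ring of buffers, and the buffer sizes and file name are my own choices):

    import gzip
    import queue
    import threading

    buffers = queue.Queue(maxsize=8)     # bounded queue plays the role of the ring

    def consumer(path: str) -> None:
        """Compress each buffer and append it to the log; runs on its own core."""
        with open(path, "ab") as f:
            while True:
                buf = buffers.get()
                if buf is None:          # sentinel: producer is finished
                    return
                # Each gzip member is valid on its own; concatenated members still
                # form a readable .gz file. zlib releases the GIL for large buffers,
                # so this genuinely runs in parallel with the producer.
                f.write(gzip.compress(buf, compresslevel=1))

    worker = threading.Thread(target=consumer, args=("app.log.gz",))
    worker.start()

    # Producer: generate log data and hand off ~1 MB buffers. put() only blocks
    # when the queue is full, which gives natural back-pressure.
    for i in range(32):
        buffers.put((b"log record %d\n" % i) * 60_000)
    buffers.put(None)
    worker.join()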
Yes, this has been true for at least 10 years. There are operating-systems papers about it. I think Chris Small may have worked on some of them.
For speed, gzip/zlib compression on lower quality levels is pretty fast; if that's not fast enough you can try FastLZ. A quick way to use an extra core is just to use popen(3) to send output through gzip.
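The popen trick is only a few lines in any language; for instance, a rough Python equivalent (the file name is a placeholder), where the external gzip process naturally lands on a spare core:

    import subprocess

    # Pipe the log stream through an external gzip process (level 1 = fastest).
    with open("output.log.gz", "wb") as out:
        gz = subprocess.Popen(["gzip", "-1", "-c"], stdin=subprocess.PIPE, stdout=out)
        for i in range(100_000):
            gz.stdin.write(b"some log record %d\n" % i)
        gz.stdin.close()
        gz.wait()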
For what it is worth, Sun's ZFS filesystem can enable on-the-fly compression to decrease the amount of disk I/O without a significant increase in overhead, as an example of this in practice.
The Filesystems and storage lab from Stony Brook published a rather extensive performance (and energy) evaluation on file data compression on server systems at IBM's SYSTOR systems research conference this year: paper at ACM Digital Library, presentation.
The results depend on the compression algorithm and settings used, the file workload, and the characteristics of your machine.
For example, in the measurements from the paper, for a textual workload in a server environment, lzop with low compression effort is faster than a plain write, but bzip2 and gzip are not.
In your specific setting, you should try it out and measure. It really might improve performance, but it is not always the case.
CPU speed has grown at a faster rate than hard drive access speed. Even back in the '80s, many compressed files could be read off the disk and decompressed in less time than it took to read the original (uncompressed) file. That will not have changed.
Generally though, these days the compression/de-compression is handled at a lower level than you would be writing, for example in a database I/O layer.
As for the usefulness of a second core: it only counts if the system will also be doing a significant amount of other work, and your program would have to be multi-threaded to take advantage of the additional CPU.
Logging the data in binary form may be a quick improvement. You'll write less to the disk and the CPU will spend less time converting numbers to text. It may not be useful if people are going to be reading the logs, but they won't be able to read compressed logs either.
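As a small illustration of the binary-logging idea (the record layout here is hypothetical), packing fields directly is both smaller and cheaper than formatting them as text:

    import struct

    # Hypothetical record: timestamp (double), event id (uint32), value (double) = 20 bytes.
    RECORD = struct.Struct("<dId")

    def log_binary(f, timestamp: float, event_id: int, value: float) -> None:
        f.write(RECORD.pack(timestamp, event_id, value))      # fixed 20 bytes, no text formatting

    def log_text(f, timestamp: float, event_id: int, value: float) -> None:
        f.write(f"{timestamp:.6f},{event_id},{value:.6f}\n".encode())  # ~40 bytes plus formatting cost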
Windows already supports File Compression in NTFS, so all you have to do is to set the "Compressed" flag in the file attributes.
You can then measure if it was worth it or not.
This depends on lots of factors and I don't think there is one correct answer. It comes down to this:
Can you compress the raw data faster than the raw write performance of your disk times the compression ratio you are achieving (or the multiple in speed you are trying to get) given the CPU bandwidth you have available to dedicate to this purpose?
Given today's relatively high write rates in the tens of MB/s, this is a pretty high hurdle to get over. To the point of some of the other answers: you would likely need easily compressible data, and you would just have to benchmark it with some reasonable experiments to find out.
As for a specific opinion (a guess!) on the point about additional cores: if you run the compression in its own thread(s) and keep the core(s) fed, then given the high compression ratio of text, it is likely such a technique would bear some fruit. But this is just a guess. In a single-threaded application alternating between disk writes and compression operations, it seems much less likely to me.
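To make that hurdle concrete, a back-of-the-envelope sketch with assumed numbers (a 60 MB/s disk and 2:1 compression on text; both figures are assumptions, not measurements):

    # Assumed figures - substitute your own measurements.
    disk_write_mb_s = 60.0     # sustained raw write speed of the disk
    compression_ratio = 2.0    # e.g. text logs compressing roughly 2:1

    # For compression to actually double your effective logging rate, the compressor
    # must consume raw data at least as fast as the disk can absorb the smaller output:
    required_input_mb_s = disk_write_mb_s * compression_ratio
    print(f"compressor must sustain >= {required_input_mb_s:.0f} MB/s of raw input")
    # Fast, low-effort compressors (gzip -1, LZO/FastLZ-class) are roughly in this
    # range on a modern core; aggressive settings like bzip2 generally are not.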
If it's just text, then compression could definitely help. Just choose a compression algorithm and settings that make the compression cheap. "gzip" is cheaper than "bzip2", and both have parameters that you can tweak to favor speed or compression ratio.
If you are I/O bound saving human-readable text to the hard drive, I expect compression to reduce your total runtime.
If you have an idle 2 GHz core, and a relatively fast 100 MB/s streaming hard drive,
halving the net logging time requires at least 2:1 compression and no more than roughly 10 CPU cycles per uncompressed byte for the compressor to ponder the data.
With a dual-pipe processor, that's (very roughly) 20 instructions per byte.
I see that LZRW1-A (one of the fastest compression algorithms) uses 10 to 20 instructions per byte, and compresses typical English text about 2:1.
At the upper end (20 instructions per byte), you're right on the edge between I/O-bound and CPU-bound. At the middle and lower end, you're still I/O-bound, so there are a few cycles available (not many) for a slightly more sophisticated compressor to ponder the data a little longer.
If you have a more typical non-top-of-the-line hard drive, or the hard drive is slower for some other reason (fragmentation, other multitasking processes using the disk, etc.)
then you have even more time for a more sophisticated compressor to ponder the data.
You might consider setting up a compressed partition, saving the data to that partition (letting the device driver compress it), and comparing the speed to your original speed.
That may take less time and be less likely to introduce new bugs than changing your program and linking in a compression algorithm.
I see a list of compressed file systems based on FUSE, and I hear that NTFS also supports compressed partitions.
If this particular machine is often IO bound,
another way to speed it up is to install a RAID array.
That would give a speedup to every program and every kind of data (even incompressible data).
For example, the popular RAID 1+0 configuration with 4 total disks gives a speedup of nearly 2x.
The nearly as popular RAID 5 configuration, with the same 4 disks, gives a speedup of nearly 3x.
It is relatively straightforward to set up a RAID array with a speed 8x the speed of a single drive.
High compression ratios, on the other hand, are apparently not so straightforward. Compression of "merely" 6.30 to one would give you a cash prize for breaking the current world record for compression (Hutter Prize).
This used to be something that could improve performance in quite a few applications way back when. I'd guess that today it's less likely to pay off, but it might in your specific circumstance, particularly if the data you're logging is easily compressible.
However, as Shog9 commented:
Rules of thumb aren't going to help you here. It's your disk, your CPU, and your data. Set up a test case and measure throughput and CPU load with and without compression - see if it's worth the tradeoff.
How much of a performance hit will running everything over TLS impose on my server? I would assume this is completely ignorable in this day and age? I heard once that servers today can encrypt gigabytes of data per second; is that true? And if so, does it scale linearly, so that if the top speed is 10 GB/second, encrypting 1 GB would take 0.1 seconds?
I'm not in some kind of pickle with any admin over this (yet). I'm just curious and if I can mostly ignore the hit, why not just encrypt everything?
Performance Analysis of TLS Web Servers (pdf), a paper written at Rice University, covered this topic back in 2002, and they came to this conclusion:
Apache TLS without the AXL300 served between 149 hits/sec and 259 hits/sec for the CS trace, and between 147 hits/sec and 261 hits/sec for the Amazon trace. This confirms that TLS incurs a substantial cost and reduces the throughput by 70 to 89% relative to the insecure Apache.
So without the AXL300 board, which offloads encryption, there was a reduction in throughput of 70-89% on a 933 MHz Pentium III. However, they note in the next section that as CPU speeds increase, the throughput is expected to increase accordingly. So since 2002, you may find that there is no noticeable difference for your workload.
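The bulk-encryption part of that claim is easy to sanity-check yourself. Here is a sketch using the third-party Python cryptography package (an assumption on my part; any AES-GCM implementation would do) that measures raw symmetric throughput. Note that this ignores the per-connection handshake, which is where much of the overhead measured in the 2002 paper came from:

    import os
    import time

    from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

    aead = AESGCM(AESGCM.generate_key(bit_length=128))
    nonce = os.urandom(12)
    payload = os.urandom(16 * 1024 * 1024)      # 16 MB of random data

    start = time.perf_counter()
    for _ in range(16):                         # 256 MB in total
        aead.encrypt(nonce, payload, None)      # nonce reuse is fine for a benchmark, never in production
    elapsed = time.perf_counter() - start

    print(f"AES-128-GCM bulk encryption: {256 / elapsed:.0f} MB/s")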