Related
I'm writing real-time data to an empty spinning disk sequentially. (EDIT: It doesn't have to be sequential, as long as I can read it back as if it was sequential.) The data arrives at a rate of 100 MB/s and the disks have an average write speed of 120 MB/s.
Sometimes (especially as free space starts to decrease) the disk speed goes under 100 MB/s depending on where on the platter the disk is writing, and I have to drop vital data.
Is there any way to write to disk in a pattern (or some other way) to ensure a constant write speed close to the average rate? Regardless of how much data there currently is on the disk.
EDIT:
Some notes on why I think this should be possible.
When usually writing to the disk, it starts in the fast portion of the platter and then writes towards the slower parts. However, if I could write half the data to the fast part and half the data to the slow part (i.e. for 1 second it could write 50MB to the fast part and 50MB to the slow part), they should meet in the middle. I could possibly achieve a constant rate?
As a programmer, I am not sure how I can decide where on the platter the data is written or even if the OS can achieve something similar.
If I had to do this on a regular Windows system, I would use a device with a higher average write speed to give me more headroom. Expecting 100MB/s average write speed over the entire disk that is rated for 120MB/s is going to cause you trouble. Spinning hard disks don't have a constant write speed over the whole disk.
The usual solution to this problem is to buffer in RAM to cover up infrequent slow downs. The more RAM you use as a buffer, the longer the span of slowness you can handle. These are tradeoffs you have to make. If your problem is the known slowdown on the inside sectors of a rotating disk, then your device just isn't fast enough.
Another thing that might help is to access the disk as directly as possible and ensure it isn't being shared by other parts of the system. Use a separate physical device, don't format it with a filesystem, write directly to the partitioned space. Yes, you'll have to deal with some of the issues a filesystem solves for you, but you also skip a bunch of code you can't control. Even then, your app could run into scheduling issues with Windows. Windows is not a RTOS, there are not guarantees as far as timing. Again this would help more with temporary slowdowns from filesystem cleanup, flushing dirty pages, etc. It probably won't help much with the "last 100GB writes at 80MB/s" problem.
If you really are stuck with a disk that goes from 120MB/s -> 80MB/s outside-to-inside (you should test with your own code and not trust the specs from the manufacture so you know what you're dealing with), then you're going to have to play partitioning games like others have suggested. On a mechanical disk, that will introduce some serious head seeking, which may eat up your improvement. To minimize seeks, it would be even more important to ensure it's a dedicated disk the OS isn't using for anything else. Also, use large buffers and write many megabytes at a time before seeking to the end of the disk. Instead of partitioning, you could write directly to the block device and control which blocks you write to. I don't know how to do this in Windows.
To solve this on Linux, I would be tempted to test mdadm's raid0 across two partitions on the same drive and see if that works. If so, then the work is done and you don't have to write and test some complicated write mechanism.
Partition the disk into two equally sized partitions. Write a few seconds worth of data alternating between the partitions. That way you get almost all of the usual sequential speed, nicely averaged. One disk seek every few seconds eats up almost no time. One seek per second reduces the usable time from 1000ms to ~990ms which is a ~1% reduction in throughput. The more RAM you can dedicate to buffering the less you have to seek.
Use more partitions to increase the averaging effect.
I fear this may be more difficult than you realize:
If your average 120 MB/s write speed is the manufacturer's value then it is most likely "optimistic" at best.
Even a benchmarked write speed is usually done on a non-partitioned/formatted drive and will be higher than what you'd typically see in actual use (how much higher is a good question).
A more important value is the drive's minimum write speed. For example, from Tom's Hardware 2013 HDD Benchmarks a drive with a 120 MB/s average has a 76 MB/s minimum.
A drive that is being used by other applications at the same time (e.g., Windows) will have a much lower write speed.
An even more important value is the drives actual measured performance. I would make a simple application similar to your use case that writes data to the drive as fast as possible until it fills the drive. Do this a few (dozen) times to get a more realistic average/minimum/maximum write speed value...it will likely be lower than you'd expect.
As you noted, even if your "real" average write speed is higher than 100 MB/s you run into issues if you run into slow write speeds just before the disk fills up, assuming you don't have somewhere else to write the data to. Using a buffer doesn't help in this case.
I'm not sure if you can actually specify a physical location to write to on the hard drive these days without getting into the drive's firmware. Even if you could this would be my last choice for a solution.
A few specific things I would look at to solve your problem:
Measure the "real" write performance of the drive to see if its fast enough. This gives you an idea of how far behind you actually are.
Put the OS on a separate drive to ensure the data drive is not being used by anything other than your application.
Get faster drives (either HDD or SDD). It is fine to use the manufacturer's write speeds as an initial guide but test them thoroughly as well.
Get more drives and put them into a RAID0 (or similar) configuration for faster write access. You'll again want to actually test this to confirm it works for you.
You could implement the strategy of alternating writes bewteen the inside and the outside by directly controlling the disk write locations. Under Windows you can open a disk like "\.\PhysicalDriveX" and control where it writes. For more info see
http://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx
First of all, I hope you are using raw disks and not a filesystem. If you're using a filesystem, you must:
Create an empty, non-sparse file that's as large as the filesystem will fit.
Obtain a mapping from the logical file positions to disk blocks.
Reverse this mapping, so that you can map from disk blocks to logical file positions. Of course some blocks are unavailable due to filesystem's own use.
At this point, the disk looks like a raw disk that you access by disk block. It's a valid assumption that this block addressing is mostly monotonous to the physical cylinder number. IOW if you increase the disk block number, the cylinder number will never decrease (or never increase -- depending on the drive's LBA to physical mapping order).
Also, note that a disk's average write speed may be given per cylinder or per unit of storage. How would you know? You need the latter number, and the only sure way to get it is to benchmark it yourself. You need to fill the entire disk with data, by repeatedly writing a zero page to the disk, going block by block, and divide the total amount of data written by the amount it took. You need to be accessing the disk or the file in the direct mode. This should disable the OS buffering for the file data, and not for the filesystem metadata (if not using a raw disk).
At this point, all you need to do is to write data blocks of sensible sizes at the two extremes of the block numbers: you need to fill the disk from both ends inwards. The size of the data blocks depends on the bandwidth wastage you can allow for seeks. You should also assume that the hard drive might seek once in a while to update its housekeeping data. Assuming a worst-case seek taking 15ms, you waste 1.5% of per-second bandwidth for each seek. Assuming you can spare no more than 5% of bandwidth, with 1 seek/s on average for the drive itself, you can seek twice per second. Thus your block size needs to be your_bandwith_per_second/2. This bandwidth is not the disk bandwidth, but the bandwidth of your data source.
Alas, if only things where this easy. It generally turns out that the bandwidth at the middle of the disk is not the average bandwidth. During your benchmark you must also take a note of write speed over smaller sections of the disk, say every 1% of the disk. This way, when writing into each section of the disk, you can figure out how to split the data between the "low" and the "high" section that you're writing to. Suppose that you're starting out at 0% and 99% positions on the disk, and the low position has a bandwidth of mean*1.5, and the high position has a bandwidth of mean*0.8, where mean is your desired mean bandwidth. You'll then need to write 100% * 1.5/(0.8+1.5) of the data into the low position, and the remainder (100% * 0.8/(0.8+1.5)) into the slower high position.
The size of your buffer needs to be larger than just the block size, since you must assume some worst-case latency for the hard drive if it hits bad blocks and needs to relocate data, etc. I'd say a 3 second buffer may be reasonable. Optionally it can grow by itself if latencies you measure while your software runs turn out higher. This buffer must be locked ("pinned") to physical memory so that it's not subject to swapping.
Another possible option is to destroke (or short stroke) a hard drive. If you start with a 4TB or larger drive and destroke it to 2TB, only the outer portions of the platters will be used, resulting in a faster throughput rate. The issue would be getting the software that issues vendor unique commands to a hard drive to destroke it.
The background is thus: next week our office will have one day with no heating, due to maintenance. Outdoor temperature is expected between 7 and 12 degrees Celcius, so it might become chilly. The portable electric heaters are too few to cater for everyone.
However, I have, in my office of about 6-8 m2, a big honkin' (3 yrs old) workstation (HP xw8600 with 3.0 GHz Quad-core Xeon) that should be able output a couple of hundred Watts of heat. Running Furmark will max out the GPU but I'm not sure how to best work the CPU.
Last time I was in a cold office I either compiled more often or just launched 4-8 DOSBox:es running Norton Commander, but I think one can do better by using SSE1-2-3-4,MMX etc, i.e. stuff that does more work per cycle.
So, what CPU instructions toggle the most transistors each cycle, and thus use cause the CPU to draw most amount of power and thus give off maximum heat?
If I had a power meter available, then I could benchmark myself, but I figure this would be a fun challenge for the SO-crowd. :)
For your specific goal, if you really want to use your system as a heat generator, you need to first make sure that the cooling system is working really well (throwing the heat out of the box). Processors today are designed to throttle themselves when they reach a critical temperature which happens when a proper heatsink is used and the processor is at TDP (Thermal Design Power is the max power for the processor using normal programs). If you have a better heat sink and good ventilation (box fan?), you can probably get beyond TDP assuming that your power supply can handle it. If you turn the fan off, you basically will hit the thermal limit right away.
To be more explicit, the individual instructions that burn the most are generally load instructions that miss in the caches and go out to memory. To guarantee misses, you'll want to allocate a chunk of memory that's bigger than the last level CPU cache and hop around that memory. The pattern of hopping in the maximum power case is a bit complex because you're trying to get the maximum number of misses outstanding at every level of the cache hierarchy simultaneously. If you have 3 levels of cache, in a given period of time, you can have more misses to the L1 than you can to the L2 than you can to the L3 than you can to the DRAM page. (And the specific design of your processor will have a total limit on misses.) Between misses, the instruction doesn't matter too much, but I'd guess that one of the SSE4 multiplies (PMULUDQ) is probably the best since on a lot of modern processors, they execute in pretty quickly and generally do a whole lot of work (compared to say an add).
The funny thing is, running the GPU may limit the amount of heat that you can generate using misses to the L3 cache since the memory may be bogged down by the GPU. In that case, you should make sure that all accesses to the L3 are hits, but that you're missing in the other levels.
For GeForce graphics, my CudaMFLOPS program (free) is quite handy for obtaining high temperatures on the graphics card. If you have an appropriate card details are in:
http://www.roylongbottom.org.uk/cuda1.htm#anchor8
I find that my tests that execute SSE instructions with data from L1 cache generally produce the highest CPU temperatures.
For cpu use Prime95. That is lightweight and will load up all cores nicely. You aren't really going to get much heat out of a 3ghz xeon though. Chips of that age are usually good for over 4ghz with average cooling, and close to 5ghz with high end water loops. With a 6-core chip # >4ghz with extra voltage added you might be hitting 200w TDP but with that system you will be lucky to get the cpu to 100w.
As for the GPU, the Heaven Benchmark is a good one for quickly getting it up to temperature. Again, unless you have a high end card a couple of hundred watts of heat is unlikely. Another alternative on AMD gpus (maybe nvidia too?) is to use crpto-currency mining software, maybe get a USB stick with a mining linux distribution installed and ready to go. You could also use Prime95 on the same rig as mining software uses very little cpu time.
I actually kept a couple of rooms warm over winter with the heat from a computer, only rarely needing extra heating. This was done with a crypto-currency mining rig, which had 4 gpus running at ~80 degrees C, 24/7, with a box fan to circulate the air round the room. That rig had a 1300W PSU. Might I suggest that instead of trying to use the computer to keep you warm, you wear more clothes?
Now the machines we are forced to use are 2GB Ram, Intel Core 2 Duo E6850 # 3GHz CPU...
The policy within the company is that everyone has the same computer no matter what and that they are on a 3 year refresh cycle... Meaning I will have this machine for the next 2 years... :S
We have been complaining like crazy but they said they want proof that upgrading the machines will provide exactly X time saving before doing anything... And with that they are only semi considering giving us more RAM...
Even when you put forward that developer resources are much more expensive than hardware, they firstly say go away, then after a while they say prove it. As far as they are concerned paying wages comes from a different bucket of money to the machines and that they don't care (i.e. the people who can replace the machines, because paying wages doesn't come from their pockets)...
So how can I prove that $X benefit will be gained by spending $Y on new hardware...
The stack I'm working with is as follows: VS 2008, SQL 2005/2008. As duties dictate we are SQL admins as well as Web/Winform/WebService Developers. So its very typical to have 2 VS sessions and at least one SQL session open at the same time.
Cheers
Anthony
Actually, the main cost for your boss is not the lost productivity. It is that his developers don't enjoy their working conditions. This leads to:
loss of motivation and productivity
more stress causing illness
external opportunities causing developers to go away
That sounds like a decent machine for your stack. Have you proven to yourself that you're going to get better performance, using real-world tests?
Check with your IT people to see if you can get the disks benchmarked, and max out the memory. Management should be more willing to take these incremental steps first.
The machine looks fine apart from the RAM.
If you want to prove this sort of thing time all the things you wait for (typically load times and compile times), add it all up and work how much it costs you to sit around. From that make some sort of guess how much time you'll save (it'll have to be a guess unless you can compare like with like, which is difficult if they won't upgrade your systems). You'll probably find that they'll make the money back on the RAM at least in next to no time - and that's before you even begin to factor in the loss of productivity from people's minds wandering whilst they wait for stuff to happen.
Unfortunately if they're skeptical then it's unlikely you can prove it to them in a quantitative way alone. Even if you came up with numbers, they'll probably question the methodology. I suggest you see if they're willing to watch a 10 minute demo (maybe call it a presentation), and show them the experience of switching between VS instances (while explaining why you'd need to switch and how often), show them the build process (again explaining why you'd need to create a build and how often), etc.
Ask them if you're allowed to bring your own hardware. If you're really convinced it would make you more productive, upgrade it yourself and when you start producing more ask for a raise or to be reimbursed.
Short of that though..
I have to ask: what else are you running? I'm not really that familiar with that stack, but it really shouldn't be that taxing. Are they forcing you to run some kind of system-slowing monitoring or antivirus app?
You'd probably have better luck convincing them to let you change that than getting them to roll out new updates.
If you really must convince them, your best bet is to benchmark your machine as accurately as you can and price out exactly what you need upgraded. Its a lot easier to get them to agree to an exact (and low) dollar amount than some open-ended upgrade
Even discussion this with them for more than five minutes will cost more than just calling out to your local PC dealer and buy the RAM out of your own pocket. Ask you project lead whether they can put it on the tab of the project as another "development tool". If (s)he can't, don't bother and cough up the
When they come complaining, then put the time of the meetings for this on their budget (since they come crying). See how long they can take this.
When we had the same issue, my boss bought better gfx cards for the whole team out of his own pockets and went to the PC guys to get each of us a second monitor. A few days later, he went again to get each of us 2GB more RAM, too.
The main cost from slow developer machines comes from the slow builds and the 'context switching', ie the time that it takes you to switch between the tasks required of you:
Firing up the second instance of VS and waiting for it to load and build
Checking out or updating a source tree
Starting up another instance of VS or checking out a clean source tree to 'have a quick look at' some bug that's been assigned
Multiple build/debug cycles to fix difficult bugs
The mental overhead in switching between different tasks, which shouldn't be underestimated
I made a case a while ago for new hardware after doing a breakdown of the amount of time that was wasted waiting for the machine to catch up. In a typical day we might need to do 2 or 3 full builds at half an hour each. The link time was around 3 minutes, and in a build/debug cycle you might do that 40 times a day. So that's 3.5 hours a day waiting for the machine. The bulk of that is in small 2 or 3 minute pockets which isn't long enough for you to context switch and do something else. It's long enough to check your mail, check stackoverflow, blow your nose and that's about it. So there's nothing else productive you can do with that time.
If you can show that a new machine will build the full project in 15 minutes and link in 1 minute then that's theoretically given you an extra 2 hours of productivity a day (or more realistically, the potential for more build cycles).
So I would get some objective timings that show how long it takes for different parts of your work cycle, then try to do comparative timings on machines with 4GB of RAM, a second drive (eg something fast like a WD Raptor), an SSD, whatever, to come up with some hard figures to support your case.
EDIT: I forgot to mention: present this as your current hardware is making you lose productivity, and put a cost on the amount of time lost by multiplying it by a typical developer hourly rate. On this basis I was able to show that a new PC would pay for itself in about month.
Take a task you do regularly that would be improved with faster hardware - ex: running the test suite, running a build, booting and shutting down a virtual machine - and measure the time it takes with current hardware and with better hardware.
Then compute the monthly, or yearly cost: how many times per month x time gained x hourly salary, and see if this is enough to make a case.
For instance, suppose you made $10,000/month, and gained 5 minutes a day with a better machine, the loss to your company per month would be around (5/60 hours lost a day) x 20 work days/month x $10,000 / 8 hours/day = $105 / month. Or about $1200/year lost because of the machine (assuming I didn't mess up the math...). Now before talking to your manager, think about whether this number is significant.
Now this is assuming that 1) you can measure the improvement, even though you don't have a better machine, and 2) while you are "wasting" your 5 minutes a day, you are not doing anything productive, which is not obvious.
For me, the cost of a slow machine is more psychological, but it's hard to quantify - after a few days of having to wait for things to happen on the PC, I begin to get cranky, which is both bad for my focus, and my co-workers!
It’s easy; hardware is cheap, developers are expensive. Throwing reasonable amounts of money at the machinery should be an absolute no brainer and if your management doesn’t understand that and won’t be guided by your professional opinion then you might be in the wrong job.
As for your machine, throw some more RAM at it and use a fast disk (have a look at how intensive VS is on disk IO using the resource monitor – it’s very hungry). Lots of people going towards 10,000 RPM or even SSD these days and they make a big difference to your productivity.
Try this; take the price of the hardware you need (say fast disk and more RAM), split it across a six month period (a reasonable time period in which to recoup the investment) and see what it’s worth in “developer time” each day. You’ll probably find it only needs to return you a few minutes a day to pay for itself. Once again, if your management can’t understand or support this then question if you’re in the right place.
Applications like Microsoft Outlook and the Eclipse IDE consume RAM, as much as 200MB. Is it OK for a modern application to consume that much memory, given that few years back we had only 256MB of RAM? Also, why this is happening? Are we taking the resources for granted?
Is it acceptable when most people have 1 or 2 gigabytes of RAM on their PCS?
Think of this - although your 200mb is small and nothing to worry about given a 2Gb limit, everyone else also has apps that take masses of RAM. Add them together and you find that the 2Gb I have very quickly gets all used up. End result - your app appears slow, resource hungry and takes a long time to startup.
I think people will start to rebel against resource-hungry applications unless they get 'value for ram'. you can see this starting to happen on servers, as virtualised systems gain popularity - people are complaining about resource requirements and corresponding server costs.
As a real-world example, I used to code with VC6 on my old 512Mb 1.7GHz machine, and things were fine - I could open 4 or 5 copies along with Outlook, Word and a web browser and my machine was responsive.
Today I have a dual-processor 2.8Ghz server box with 3Gb RAM, but I cannot realistically run more than 2 copies of Visual Studio 2008, they both take ages to start up (as all that RAM still has to be copied in and set up, along with all the other startup costs we now have), and even Word take ages to load a document.
So if you can reduce memory usage you should. Don't think that you can just use whatever bloated framework/library/practice you want with impunity.
http://en.wikipedia.org/wiki/Moore%27s_law
also:
http://en.wikipedia.org/wiki/Wirth%27s_law
There's a couple of things you need to think about.
1/ Do you have 256M now? I wouldn't think so - my smallest memory machine is 2G so a 200M application is not much of a problem.
2a/ That 200M you talk about might not be "real" memory. It may just be address space in which case it might not all be in physical memory at once. Some bits may only be pulled in to physical memory when you choose to do esoteric things.
2b/ It may also be shared between other processes (such as a DLL). This means it could be only held in physical memory as one copy but be present in the address space of many processes. That way, the usage is amortized over those many processes. Both 2a and 2b depend on where your figure of 200M actually came from (which I don't know and, running Linux, I'm unlikel to find out without you telling me :-).
3/ Even if it is physical memory, modern operating systems aren't like the old DOS or Windows 3.1 - they have virtual memory where bits of applications can be paged out (data) or thrown away completely (code, since it can always reload from the executable). Virtual memory gives you the ability to use far more memory than your actual physical memory.
Many modern apps will take advantage of the existance of more memory to cache more. Some like firefox and SQL server have explicit settings for how much memory they will use. In my opinion, it's foolish to not use available memory - what's the point of having 2GB of RAM if your apps all sit around at 10MB leaving 90% of your physical memory unused. Of course, if your app does use caching like this, it better be good at releasing that memory if page file thrashing starts, or allow the user to limit the cache size manually.
You can see the advantage of this by running a decent-sized query against SQL server. The first time you run the query, it may take 10 seconds. But when you run that exact query again, it takes less than a second - why? The query plan was only compiled the first time and cached for use later. The database pages that needed to be read were only loaded from disk the first time - the second time, they were still cached in RAM. If done right, the more memory you use for caching (until you run into paging) the faster you can re-access data. You'll see the same thing in large documents (e.g. in Word and Acrobat) - when you scroll to new areas of a document, things are slow, but once it's been rendered and cached, things speed up. If you don't have enough memory, that cache starts to get overwritten and going to the old parts of the document gets slow again.
If you can make good use of the RAM, it is your responsability to use it.
Yes, it is perfectly normal. Also something big was changed since 256MB were normal... and do not forget that before that 640Kb were supposed to be enough for everybody!
Now most software solutions are build with a garbage collector: C#, Java, Ruby, Python... everybody love them because certainly development can be faster, however there is one glitch.
The same program can be memory leak free with either manual or automatic memory deallocation. However in the second case it is likely for the memory consumption to grow. Why? In the first case memory is deallocated and kept clean immediately after something becomes useless (garbage). However it takes time and computing power to detect that automatically, hence most collectors (except for reference counting) wait for garbage to accumulate in order to make worth the cost of the exploration. The more you wait the more garbage you can sweep with the cost of one blow, but more memory is needed to accumulate that garbage. If you try to force the collector constantly, your program would spend more time exploring memory than working on your problems.
You can be completely sure than as long as programmers get more resources, they will sacrifice them using heavier tools in exchange for more freedom, abstraction and faster development.
A few years ago 256 MB was the norm for a PC, then Outlook consumed about 30 - 35 MB or so of memory, that's around 10% of the available memory, Now PC's have 2 GB or more as a norm, and outlook consumes 200 MB of memory, that's about 10% also.
The 1st conclusion: as more memory is available applications use more of it.
The 2nd conclusion: no matter what time frame you pick there are applications that are true memory hogs (like Outlook) and applications that are very efficient memory wise.
The 3rd conclusion: memory consumption of a app can't go down with time, else 640K would have been enough even today.
It completely depends on the application.
On a modern system can local hard disk write speeds be improved by compressing the output stream?
This question derives from a case I'm working with where a program serially generates and dumps around 1-2GB of text logging data to a raw text file on the hard disk and I think it is IO bound. Would I expect to be able to decrease runtimes by compressing the data before it goes to disk or would the overhead of compression eat up any gain I could get? Would having an idle second core affect this?
I know this would be affected by how much CPU is being used to generate the data so rules of thumb on how much idle CPU time would be needed would be good.
I recall a video talk where someone used compression to improve read speeds for a database but IIRC compressing is a lot more CPU intensive than decompressing.
Yes, yes, yes, absolutely.
Look at it this way: take your maximum contiguous disk write speed in megabytes per second. (Go ahead and measure it, time a huge fwrite or something.) Let's say 100mb/s. Now take your CPU speed in megahertz; let's say 3Ghz = 3000mhz. Divide the CPU speed by the disk write speed. That's the number of cycles that the CPU is spending idle, that you can spend per byte on compression. In this case 3000/100 = 30 cycles per byte.
If you had an algorithm that could compress your data by 25% for an effective 125mb/s write speed, you would have 24 cycles per byte to run it in and it would basically be free because the CPU wouldn't be doing anything else anyway while waiting for the disk to churn. 24 cycles per byte = 3072 cycles per 128-byte cache line, easily achieved.
We do this all the time when reading optical media.
If you have an idle second core it's even easier. Just hand off the log buffer to that core's thread and it can take as long as it likes to compress the data since it's not doing anything else! The only tricky bit is you want to actually have a ring of buffers so that you don't have the producer thread (the one making the log) waiting on a mutex for a buffer that the consumer thread (the one writing it to disk) is holding.
Yes, this has been true for at least 10 years. There are operating-systems papers about it. I think Chris Small may have worked on some of them.
For speed, gzip/zlib compression on lower quality levels is pretty fast; if that's not fast enough you can try FastLZ. A quick way to use an extra core is just to use popen(3) to send output through gzip.
For what it is worth Sun's filesystem ZFS has the ability to have on the fly compression enabled to decrease the amount of disk IO without a significant increase in overhead as an example of this in practice.
The Filesystems and storage lab from Stony Brook published a rather extensive performance (and energy) evaluation on file data compression on server systems at IBM's SYSTOR systems research conference this year: paper at ACM Digital Library, presentation.
The results depend on the
used compression algorithm and settings,
the file workload and
the characteristics of your machine.
For example, in the measurements from the paper, using a textual workload and a server environment using lzop with low compression effort are faster than plain write, but bzip and gz aren't.
In your specific setting, you should try it out and measure. It really might improve performance, but it is not always the case.
CPUs have grown faster at a faster rate than hard drive access. Even back in the 80's a many compressed files could be read off the disk and uncompressed in less time than it took to read the original (uncompressed) file. That will not have changed.
Generally though, these days the compression/de-compression is handled at a lower level than you would be writing, for example in a database I/O layer.
As to the usefulness of a second core only counts if the system will be also doing a significant number of other things - and your program would have to be multi-threaded to take advantage of the additional CPU.
Logging the data in binary form may be a quick improvement. You'll write less to the disk and the CPU will spend less time converting numbers to text. It may not be useful if people are going to be reading the logs, but they won't be able to read compressed logs either.
Windows already supports File Compression in NTFS, so all you have to do is to set the "Compressed" flag in the file attributes.
You can then measure if it was worth it or not.
This depends on lots of factors and I don't think there is one correct answer. It comes down to this:
Can you compress the raw data faster than the raw write performance of your disk times the compression ratio you are achieving (or the multiple in speed you are trying to get) given the CPU bandwidth you have available to dedicate to this purpose?
Given today's relatively high data write rates in the 10's of MBytes/second this is a pretty high hurdle to get over. To the point of some of the other answers, you would likely have to have easily compressible data and would just have to benchmark it with some test of reasonableness type experiments and find out.
Relative to a specific opinion (guess!?) to the point about additional cores. If you thread up the compression of the data and keep the core(s) fed - with the high compression ratio of text, it is likely such a technique would bear some fruit. But this is just a guess. In a single threaded application alternating between disk writes and compression operations, it seems much less likely to me.
If it's just text, then compression could definitely help. Just choose an compression algorithm and settings that make the compression cheap. "gzip" is cheaper than "bzip2" and both have parameters that you can tweak to favor speed or compression ratio.
If you are I/O bound saving human-readable text to the hard drive, I expect compression to reduce your total runtime.
If you have an idle 2 GHz core, and a relatively fast 100 MB/s streaming hard drive,
halving the net logging time requires at least 2:1 compression and no more than roughly 10 CPU cycles per uncompressed byte for the compressor to ponder the data.
With a dual-pipe processor, that's (very roughly) 20 instructions per byte.
I see that LZRW1-A (one of the fastest compression algorithms) uses 10 to 20 instructions per byte, and compresses typical English text about 2:1.
At the upper end (20 instructions per byte), you're right on the edge between IO bound and CPU bound. At the middle and lower end, you're still IO bound, so there is a a few cycles available (not much) for a slightly more sophisticated compressor to ponder the data a little longer.
If you have a more typical non-top-of-the-line hard drive, or the hard drive is slower for some other reason (fragmentation, other multitasking processes using the disk, etc.)
then you have even more time for a more sophisticated compressor to ponder the data.
You might consider setting up a compressed partition, saving the data to that partition (letting the device driver compress it), and comparing the speed to your original speed.
That may take less time and be less likely to introduce new bugs than changing your program and linking in a compression algorithm.
I see a list of compressed file systems based on FUSE, and I hear that NTFS also supports compressed partitions.
If this particular machine is often IO bound,
another way to speed it up is to install a RAID array.
That would give a speedup to every program and every kind of data (even incompressible data).
For example, the popular RAID 1+0 configuration with 4 total disks gives a speedup of nearly 2x.
The nearly as popular RAID 5 configuration, with same 4 total disks, gives all a speedup of nearly 3x.
It is relatively straightforward to set up a RAID array with a speed 8x the speed of a single drive.
High compression ratios, on the other hand, are apparently not so straightforward. Compression of "merely" 6.30 to one would give you a cash prize for breaking the current world record for compression (Hutter Prize).
This used to be something that could improve performance in quite a few applications way back when. I'd guess that today it's less likely to pay off, but it might in your specific circumstance, particularly if the data you're logging is easily compressible,
However, as Shog9 commented:
Rules of thumb aren't going to help
you here. It's your disk, your CPU,
and your data. Set up a test case and
measure throughput and CPU load with
and without compression - see if it's
worth the tradeoff.