Is there anyway to limit writeback rate of bcache - limit

I have installed bcache on ubuntu14.04, 100G ssd, 1T hdd.
and I had a performance test with fio randwrite.
At beginning, the speed was good, but then it slowed down.
I checked the io status with iostat, it showed that data was being written into hdd from ssd. Both ssd and hdd was busy.
This may be the reason why the speed slowed down.
What I can think of is to limit writeback rate, so that the ssd can accept more write requests.
But how to limit writeback rate?

You can use /sys/fs/bcache/*/bdev*/writeback_rate to limit it. 0 turns limitation off as far as I know, higher values increase the rate.

Related

Disk latency causing CPU spikes on EC2 instance

We are having an interesting issue where we are seeing a CPU spike on our EC2 instance and at the same time we are seeing a spike in disk latency. Here is the pattern for CPU spike
CPU spike from 50% to 100% within 30 seconds
It stays at 100% utilization for two minutes
CPU utilization is dropped from 100 to almost 0 in 10 seconds. At the same time almost disk latency is also back to normal
This issue has happened on different AWS ec2 instances a couple of times over a week and still happening. In all cases we are seeing CPU spike along with disk latency with CPU spike having a similar pattern as above.
We had put process monitoring tools to check if any particular process was occupying the CPU. That tool revealed that each of process on the ec2 instance starts taking approx twice the CPU. For eg our app server CPU utilization increases from .75% to 1.5 . Similar observation for Nginx and other processes. There was no single process occupying more than 8% CPU. We studied our traffic pattern and there is nothing unusual which can cause this. So the question is
Can increase in disk latency cause the CPU spike pattern as above or in general can disk latency result in CPU spike
Here is my bet: you are running t2 / t3 machines which are burstable instances. You can access 30% of the CPU all the time, and a credit system create a fair usage predictable mode for the 70% remaining. You earn credit by running the instance, you lose credit by going over 30% CPU usage.
You are running out of credits and then AWS reduce your access to CPU. The system goes smooth again when credits are added to your balance.
t2 and t3 doesn't have the system credit system, you can find details here: CPU Credits and baseline
You have two solutions:
Take a bigger instance, so you will have more credits per hour and better baseline or another family like c5, m5, r5 etc...
Take an unlimited mode option for your t3 instances
I would suggest faster storage. cpu aims to add up to 100%. limiting is working in this strange way that it simulates usage for "unknown" reason. Reasons can be one of those:
idle time (notice here this is what you consider FREE cpu, thats why I say it adds up to 100%)
user time (normal usage)
system time (system usage)
iowait (your case, cpu waiting for HDD/SSD to answer)
nice time (low priority processes that were not included in user time)
interupt time (external device "talk" time - could be your case if you have many usb devices etc - rather unlikely)
softirq (queued work from a processed interrupt - see above)
steal time (case that Clement is describing)
I would suggest ensuring which one is your case
you can try below to get the info:
$ sudo apt-get install sysstat
$ mpstat -P ALL 1
From here there is 2 options for you :)
EBS allows you to run IO optimized volume called "IO1" (mid price - mid speed)
Change the machine and use one in "Nitro System" (provides bare metal capabilities - that is: as if you had actual NVMe connected directly - max possible speed)
m5.2xlarge 8 37 32 GiB EBS Only $0.384 per Hour
m5d.2xlarge 8 37 32 GiB 1 x 300 NVMe SSD $0.452 per Hour
Source: Instances built on the Nitro System

how can slow hard drive handle fast download workloads

I have never tried this but it's something I was wondering about.
If I am downloading a very big file (say 200 GB) using very fast links (1 Gbps or even 10 Gbps), how does the SO (or whoever do this) writes the downloaded file at the same time in the disk, since disks have very slow write speed compared to my link speed?
Would in this case, the hard drive become a bottleneck?
If I run iostat in my PC it shows 1027 KBps ~ 1 MBps write speed, which is very slow compared with the link stated before.
Yes, it's certainly possible for almost any part of your hardware chain, from the incoming link to your hard drive to become the bottleneck, depending on your hardware.
In the case that you are actually sustaining a download speed faster than the linear write speed of your hard drive, it could certainly become a bottleneck.
Note, however, that even most budget hard drives today1 have a linear write speed of at least 50 MB/s, which is 400 Mbps, and often closer to 100 MB/s (i.e., 800 Mbps). At an effective write speed of 800 Mbps, your drive should be able to keep up with a saturated 1 Gbps link2, at least approximately, but would certainly fall behind against a 10 Gbps link.
Now, what you were measuring with iostat isn't any kind of useful benchmark - it's telling you the actual throughput for either the entire uptime of your host of active requests over some interval: unless are doing a large transfer during the interval, the speeds reported there have little relationship to your actual disk write speeds. There are plenty of benchmarking tools that will measure directly your read and write speeds.
A final think to keep in mind is that most modern operating systems use a "write-back" strategy for storage writes - the writes are first sent to RAM (i.e., the file cache on Windows or page cache on Linux), and then streamed out to the disk asynchronously. This helps hide the throughput of the actual disk for relatively short bursts of writes. For example, your disk may appear to have a very high throughput of 5 GB/s or more if writes are of a few hundred MBs or a few GBs3, but then will approach the true disk speed as they get larger, since the buffering ability of your OS disk cache will be exhausted. Clearly for a 200 GB transfer, the disk cache isn't going to be able to hide the disk speed, unless you have 100s of GBs of RAM.
So all that said, yes, your harddrive can certainly become a bottleneck,
but likely at high higher throughputs than what you measured with iostat.
1 This increase in linear read and write speeds is mostly an artifact of increasing areal density of storage, which directly translates to increased linear read/write speeds at the same RPM. Random read/write has no such relationship, however. SSDs of reasonable capacity usually increase this by another order of magnitude, to around 1 GB/s (8 Gbps) or more.
2 Mostly because these network links have overhead which for fast links often reduces the actual payload to less than 80% of the theoretical link speed.
3 The exact values depend on your total RAM, available RAM and OS configuration.

Memory performance vs Utilization

Question: How is memory (RAM) performance (read/write speeds, etc.) affected by total utilization.
Background:
I am curious if there is a performance impact for reading/writing to system memory based on the overall utilization of that memory
If performance degrades at higher utilization, what is the relationship between utilization and performance? Is this linear? Or at some point is there a significant drop in performance?
If there is a drop in performance with higher utilization, is there a point at which it becomes faster to use swap on an SSD on a SATA bus? Where does this point occur?
Outcome:
All else being equal, I'm curious if there should be a specific target for memory utilization to get the best performance from a machine as on the one hand, having more stuff in system memory should make things faster than having to read from disk, but at some point surely, the overall memory performance is materially affected by some overhead from high memory utilization right?
This sounds a bit like a superuser.com question, not stackoverflow.
Time to allocate new memory might increase slightly as the system approaches 100% full.
If you don't have any swap space, Linux will pick a processes using a lot of RAM and kill it pre-emptively when the system is approaching OOM. (google oom-killer.)
Access time to already allocated memory is not at all influenced by the fraction of total memory in use. A program that uses 1GB of memory with some specific access pattern will show the same performance on a machine with 2G vs. a machine with 16GB of RAM.
Virtual->physical mappings are defined by page tables, which by themselves could give slower performance for lookups when more memory is allocated to a process. (each process has its own page table). Again, this is not %-full dependent, simply size. However, these lookups need to be cached by the CPU hardware TLB.
See Ulrich Drepper's What Every Programmer Should Know About Memory for more background on this stuff.

PostgreSQL shared_buffers on Windows

I'm runnung 64-bit PostgreSQL 9.1 on Windows Server. I'm trying to improve its performanace especially for handling heavy writing. I used to increase shared_buffer to %25 of RAM, and scine I got 32GB RAM I decided to set shared_buffers to 8GB. While I'm searching for more info I came across this post: http://rhaas.blogspot.com/2012/03/tuning-sharedbuffers-and-walbuffers.html
It says: but not more than about 8GB on Linux or 512MB on Windows, and sometimes less.
Now I'm confused. What's the point of increasing RAM if it won't help improving PostgreSQL performance?!
The other values will be as follows:
work_mem: 160MB
maintenance_work_mem = 1920MB
checkpoint_segments = 100
checkpoint_completion_target = 0.9
checkpoint_timeout = 1h
wal_buffers = 64MB
effective_cache_size = 22GB
For a write-heavy Windows server, the most important setting is to adjust checkpoint_segments. Your value is fairly high already, but you may want to experiment with values up to 256.
From the postgresql performance tuning guide (found here):
PostgreSQL writes new transactions to the database in files called WAL segments that are 16MB in size. Every time checkpoint_segments worth of these files have been written, by default 3, a checkpoint occurs. Checkpoints can be resource intensive, and on a modern system doing one every 48MB will be a serious performance bottleneck. Setting checkpoint_segments to a much larger value improves that. Unless you're running on a very small configuration, you'll almost certainly be better setting this to at least 10, which also allows usefully increasing the completion target.
For more write-heavy systems, values from 32 (checkpoint every 512MB) to 256 (every 4GB) are popular nowadays. Very large settings use a lot more disk and will cause your database to take longer to recover, so make sure you're comfortable with both those things before large increases. Normally the large settings (>64/1GB) are only used for bulk loading. Note that whatever you choose for the segments, you'll still get a checkpoint at least every 5 minutes unless you also increase checkpoint_timeout (which isn't necessary on most systems).

RAMdisk slower than disk?

A python program I created is IO bounded. The majority of the time (over 90%) is spent in a single loop which repeats ~10,000 times. In this loop, ~100KB data is generated and written to a temporary file; it is then read back out by another program and statistics about that data collected. This is the only way to pass data into the second program.
Due to this being the main bottleneck, I thought that moving the location of the temporary file from my main HDD to a (~40MB) RAMdisk (inside of over 2GB of free RAM) would greatly increase the IO speed for this file and so reduce the run-time. However, I obtained the following results (each averaged over 20 runs):
Test data 1: Without RAMdisk - 72.7s, With RAMdisk - 78.6s
Test data 2: Without RAMdisk - 223.0s, With RAMdisk - 235.1s
It would appear that the RAMdisk is slower that my HDD.
What could be causing this?
Are there any other alternative to using a RAMdisk in order to get faster file IO?
Your operating system is almost certainly buffering/caching disk writes already. It's not surprising the RAM disk is so close in performance.
Without knowing exactly what you're writing or how, we can only offer general suggestions. Some ideas:
If you have 2 GB RAM you probably have a decent processor, so you could write this data to a filesystem that has compression. That would trade I/O operations for CPU time, assuming your data is amenable to that.
If you're doing many small writes, combine them to write larger pieces at once. (Can we see the source code?)
Are you removing the 100 KB file after use? If you don't need it, then delete it. Otherwise the OS may be forced to flush it to disk.
Can you write the data out in batches rather than one item at a time? Are you caching resources like open file handles etc or cleaning those up? Are your disk writes blocking, can you use background threads to saturate IO while not affecting compute performance.
I would look at optimising the disk writes first, and then look at faster disks when that is complete.
I know that Windows is very aggressive about caching disk data in RAM, and 100K would fit easily. The writes are going directly to cache and then perhaps being written to disk via a non-blocking write, which allows the program to continue. The RAM disk probably wouldn't support non-blocking operations because it expects those operations to be quick and not worth the bother.
By reducing the amount of memory available to programs and caching, you're going to increase the amount of disk I/O for paging even if only slightly.
This is all speculation on my part, since I'm not familiar with the kernel or drivers. I also speculate that Linux would operate similarly.
In my tests I've found that not only batch size affects overall performance, but also the nature of data itself. I've managed to get 5 times better write times compared to SSD in only one scenario: writing a 100MB chunk of pre-cooked random byte array to RAM drive. Writing more "predictable" data like letters "aaa" or current datetime yields quite opposite results - SSD is always faster or equal. So my guess is that opertating system (Win 7 in my case) does lots of caching and optimizations.
Looks like the most hindering case for RAM-drive is when you perform lots of small writes instead of a few big ones, and RAM drive shines at writing large amounts of hard-to-compress data.
I had the same mind boggling experience, and after many tries I figured it out.
When ramdisk is formatted as FAT32, then even though benchmarks shows high values, real world use is actually slower than NTFS formatted SSD.
But NTFS formatted ramdisk is faster in real life than SSD.
I join the people having problems with RAM disk speeds (only on Windows).
The SSD i have can write 30 GiB (in one big block, dump a 30GiB RAM ARRAY) with a speed of 550 MiB/s (arround 56 seconds to write 30 GiB) ... this is if the write is asked in one source code sentence.
The RAM Disk (imDisk) i have can write 30 GiB write (in one big block, dump a 30GiB RAM ARRAY) with a speed of a bit less than 100 MiB/s (arround 5 minutes and 13 seconds to write 30 GiB) ... this is if the write is asked in one source code sentence.
I had also done another RAM test: from source code do a sequential direct write (one byte per source code loop pass) to a 30GiB RAM ARRAY (i have 64GiB of RAM) and i get a speed of near 1.3GiB/s (1298 MiB per second).
Why on the hell (on Windows) RAM Disk is so slow for one BIG secuential write?
Of course that low write speed happens on RAM disks on Windows, since i tested the same 'concept' on Linux with Linux native ram disk and Linux ram disk can write at near one gigabyte per second.
Please note that i had also tested SoftPerfect and other RAM disks on Windows, RAM Disk speeds are near the same, can not write at more than one hundred megabytes per second.
Actual Windows tested: 10 & 11 (on both HOME & PRO, on 64 bits), RAM Disk format (exFAT & NTFS); since RAM disk speed was too slow i was trying to find one Windows version where RAM disk speed be normal, but found no one.
Actual Linux Kernel tested: Only 5.15.11, since Linux native RAM disk speed was normal i do not test on any other kernel.
Hope this help other people, since knowledge is the base to solve a problem.

Resources