How to measure cache performance of ZFS with a cache drive

I am trying to compare different filesystems, most of them with cache/tiered-storage features, but so far it does not seem to work as it should. (By the way, I know this might be the wrong site, but when I searched for ZFS, most SE results were on Stack Overflow, so it seemed reasonable to ask here.)
When testing ZFS, I created a single pool with a main drive/partition and another drive (an SSD) added as a cache. The main partition was around 200 GB, the SSD 120 GB. This showed up correctly in zpool status.
Then I ran the Phoronix Test Suite with iozone, or iozone separately. After some initial unfamiliarity, I settled on phoronix-test-suite run-default pts/iozone, which I then ran on just an HDD, just an SSD, and an HDD partition with the SSD as a cache, plus two laptops that have SSDs, for comparison. In the test with ZFS + cache there was virtually no difference to using just the HDD; it was really, really slow. I made sure to set the working directory to the zpool, verified that the temp file was created there, and also checked zpool iostat to make sure the pool was being used.
Now, while I might have expected lower results than a pure SSD, I would hope the cached setup would at least be only somewhat slower than the SSD, especially with an 'easy' test such as this, which just does 3 runs of reading 1 MB records from an 8 GB file and then 3 runs of writing 1 MB records to an 8 GB file.
Maybe the ZFS cache and similar mechanisms work in a way that cannot be captured by such a simple test - but then, what would be a good test to capture the benefit of the cache? And since the test file easily fits on the cache SSD, why is it not written there first and moved to the HDD in the background?
The zpool looks like this:
  pool: ztest
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ztest       ONLINE       0     0     0
          sdb7      ONLINE       0     0     0
        cache
          sdc       ONLINE       0     0     0

errors: No known data errors
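For completeness, the pool was created and monitored roughly like this (from memory, so treat it as a sketch; device names as in the status output above):

    # create the pool on the 200 GB partition and attach the SSD as an L2ARC cache device
    zpool create ztest sdb7 cache sdc

    # watch per-vdev activity, including the cache device, while a benchmark runs
    zpool iostat -v ztest 5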

Here are my guesses as to where the mismatch between expectation and reality lies:
For the read benchmark (3 runs of reading 1 MB records from an 8 GB file)
The ZFS cache device (commonly called the "L2ARC") gets populated when a block is written or read. From your description, I'm guessing that the benchmark writes the file once, then reads it sequentially 3 times. I would expect the L2ARC to make a copy of the blocks on your cache device during the first write, or at the very least when you first read the data. (Although, note that the L2ARC does not yet persist across reboots because the map of what's on disk is only stored in memory -- kind of a silly limitation but probably not what's affecting your test.)
Are you using zfs set secondarycache=all to cache all data blocks, as opposed to just metadata blocks? (Just to disambiguate / explain the naming, the primarycache property has similar settings for the in-RAM cache, aka the "ARC".)
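To check and, if needed, change this, something along these lines should work (pool/dataset name ztest taken from your zpool output; the property is inherited by child datasets):

    # show what the ARC and L2ARC are allowed to cache
    zfs get primarycache,secondarycache ztest

    # cache all data blocks on the cache device, not just metadata
    zfs set secondarycache=all ztest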
To check if the L2ARC is being used during your benchmark, you can look at arcstat data -- the stats you'll be interested in are:
"l2hits": [6, 1000, "L2ARC hits per second"],
"l2miss": [6, 1000, "L2ARC misses per second"],
With the benchmark you described, I would expect to see a very high hit rate (assuming your SSD is >8GB).
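For example, while the benchmark is running (a sketch; the exact field and option names can differ slightly between arcstat versions):

    # report ARC and L2ARC hits/misses every 5 seconds
    arcstat -f time,read,hits,miss,l2read,l2hits,l2miss,l2size 5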
For the write benchmark (3 runs of writing 1 MB records from an 8 GB file)
This will only be helped if you also add an SSD log device (commonly called the "ZIL", as you mentioned in one of the comments). I'd split your SSD into two partitions: a very small one to use as the ZIL (it only has to hold roughly 10 seconds' worth of writes, assuming you haven't tuned the filesystem), and one using the rest of the drive as an L2ARC, as sketched below.
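A rough sketch of that change, assuming the whole-disk cache device is removed first and the SSD is repartitioned into a small sdc1 and a large sdc2 (hypothetical partition names):

    # the whole SSD is currently a cache device, so detach it first
    zpool remove ztest sdc

    # after repartitioning: a small partition as the log device (ZIL/SLOG) ...
    zpool add ztest log sdc1

    # ... and the rest of the SSD as the L2ARC read cache
    zpool add ztest cache sdc2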
To address the advice you found about not using a ZIL unless you have a big beefy server: I don't think there's any reason not to use a ZIL on a small system. It ties up a little extra SSD that could have been used as read cache, but it doesn't use extra RAM or a noticeable amount of additional CPU, so effectively it should improve your write latencies and burst throughput with no adverse side effects.

Related

Find out reason for slow indexing in elasticsearch

I have written a script to bulk index a dataset with Elasticsearch. It works as intended; however, if I run the same script on the same dataset on different servers, the execution time varies. On the server equipped with an SSD, the 2 million documents finish indexing within 10 minutes, whereas on the one with a normal hard disk it takes up to an hour. Is there a diagnostic tool I can use to figure out what causes the slowdown?
Some additional information:
The script is written for Python 3 and uses the elasticsearch-py module for the bulk indexing.
Both servers run the same operating system (Ubuntu 14.04 LTS); the one with the slower hard drive has 64 GB of RAM, but the one with the SSD has half that.
You will run into index merges when that large number of records is ingested, and merging is a process heavily dependent on the speed of the underlying storage. RAM is not really that significant here - it matters more for query performance and the other work you do on the cluster. Disk latencies will add up and cause a slowdown compared to the SSD platform.
Therefore, I am not surprised by the SSD speedup. SSD storage is faster than HDD by a factor of 3-8, depending on the manufacturer. If you take into account that HDDs also need to perform positioning operations to access different parts of the storage, it is clear that simply using an SSD instead of an HDD can accelerate disk-bound applications by a factor of 10 or more.
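As a concrete check, watch the disks while the bulk indexing runs and compare merge activity between the two servers. A sketch, assuming Elasticsearch on its default localhost:9200 and the sysstat package installed for iostat:

    # per-device utilization and wait times during indexing
    iostat -x 5

    # merge and indexing statistics reported by Elasticsearch itself
    curl -s 'localhost:9200/_nodes/stats/indices/merges,indexing?pretty'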

Maximize memory for file cache or memory-based file system on Windows?

I have an application which needs to create many small files with maximum performance (less than 1% of them may ever be read later), and I want to avoid using the asynchronous file API to keep my code simple. The total size of the files written cannot be determined in advance, so I figured that to achieve maximum performance, I would need:
1. Windows to utilize all unused RAM for cache (especially for file writes), with no regard for reliability or other issues. If I have more than 1 GB of unused RAM and I create one million 1 KB files, I expect Windows to report "DONE" immediately, even if it has written nothing to disk yet.
OR
2. A memory-based file system backed by a real on-disk file system. Whenever I write files, it should first write everything in memory only and then update the on-disk file system in the background, with no delay in synchronous calls unless there isn't enough free memory. Note this is different from tmpfs or RAM-disk implementations on Windows, since those require a fixed amount of memory and use the page file when there isn't enough RAM.
For option 1, I have tested VHD files - while they do offer as much as a 50% increase in total performance under some configurations, they still flush to disk and make writes wait unnecessarily, and there appears to be no way to disable journaling or further tweak the write-caching behavior.
For option 2, I have yet to find anything similar. Does such a thing exist?

How many cores for SSIS?

I did a proof of concept for a complex transformation in SSIS. I now have performance metrics for this POC, which I created in a virtual machine with 1 GB of memory and 1 core assigned. The SSIS transformations are all file based (source and target).
Now I want to use these metrics to choose the right number of cores and amount of memory for the production environment.
What would be the right strategy to determine that, given that I know the number of files per day and the total file size per day to be transformed?
(edit) Think total transfer sizes of 100 gigabytes and 5,000 files per day!
You'd want to do two other benchmarks: 2 GB memory with 1 core, and 1 GB memory with 2 cores. A snapshot of one fairly tiny environment is difficult to extrapolate from without a couple more data points.
Also, with only 1 GB of RAM you'll want to make sure the server isn't running out of memory and paging to disk (which will skew your figures somewhat, as everything becomes reliant on disk access - and given you're already reading from disk anyway...). So make sure you know what's happening there as well.
SSIS tries to buffer as much as it can in memory for speed, so more memory is always good :-) The bigger question is what benefit extra cores will give you.
There are a number of areas that affect performance. One is the number of cores: the more cores you have, the more work can be done in parallel. This of course also depends on how you build your package - certain objects are synchronous, others asynchronous. Memory is also a factor, but it is limited to 100 MB per dataflow component.

RAMdisk slower than disk?

A Python program I created is I/O bound. The majority of the time (over 90%) is spent in a single loop which repeats ~10,000 times. In this loop, ~100 KB of data is generated and written to a temporary file; it is then read back by another program, which collects statistics about that data. This is the only way to pass data into the second program.
Since this is the main bottleneck, I thought that moving the temporary file from my main HDD to a (~40 MB) RAMdisk (with over 2 GB of RAM free) would greatly increase the I/O speed for this file and so reduce the run time. However, I obtained the following results (each averaged over 20 runs):
Test data 1: Without RAMdisk - 72.7s, With RAMdisk - 78.6s
Test data 2: Without RAMdisk - 223.0s, With RAMdisk - 235.1s
It would appear that the RAMdisk is slower than my HDD.
What could be causing this?
Are there any alternatives to a RAMdisk for getting faster file I/O?
Your operating system is almost certainly buffering/caching disk writes already. It's not surprising the RAM disk is so close in performance.
Without knowing exactly what you're writing or how, we can only offer general suggestions. Some ideas:
If you have 2 GB RAM you probably have a decent processor, so you could write this data to a filesystem that has compression. That would trade I/O operations for CPU time, assuming your data is amenable to that.
If you're doing many small writes, combine them to write larger pieces at once. (Can we see the source code?)
Are you removing the 100 KB file after use? If you don't need it, then delete it. Otherwise the OS may be forced to flush it to disk.
Can you write the data out in batches rather than one item at a time? Are you caching resources like open file handles, or cleaning them up each time? Are your disk writes blocking - could you use background threads to saturate I/O without affecting compute performance?
I would look at optimising the disk writes first, and then look at faster disks when that is complete.
I know that Windows is very aggressive about caching disk data in RAM, and 100K would fit easily. The writes are going directly to cache and then perhaps being written to disk via a non-blocking write, which allows the program to continue. The RAM disk probably wouldn't support non-blocking operations because it expects those operations to be quick and not worth the bother.
By reducing the amount of memory available to programs and caching, you're going to increase the amount of disk I/O for paging even if only slightly.
This is all speculation on my part, since I'm not familiar with the kernel or drivers. I also speculate that Linux would operate similarly.
In my tests I've found that not only the batch size but also the nature of the data itself affects overall performance. I've managed to get write times 5 times better than an SSD in only one scenario: writing a 100 MB chunk of pre-generated random bytes to the RAM drive. Writing more "predictable" data, such as repeated letters ("aaa") or the current datetime, yields quite the opposite result - the SSD is always faster or equal. So my guess is that the operating system (Windows 7 in my case) does a lot of caching and optimization.
It looks like the worst case for a RAM drive is lots of small writes instead of a few big ones, while it shines at writing large amounts of hard-to-compress data.
I had the same mind-boggling experience, and after many tries I figured it out.
When the RAMdisk is formatted as FAT32, then even though benchmarks show high values, real-world use is actually slower than an NTFS-formatted SSD.
But an NTFS-formatted RAMdisk is faster in real life than the SSD.
I join the people having problems with RAM disk speeds (only on Windows).
The SSD I have can write 30 GiB in one big block (dumping a 30 GiB RAM array) at a speed of 550 MiB/s (around 56 seconds to write 30 GiB) - that is, if the write is requested in a single statement in the source code.
The RAM disk (ImDisk) I have can write the same 30 GiB block at a bit less than 100 MiB/s (around 5 minutes and 13 seconds to write 30 GiB), again with the write requested in a single statement.
I also did another RAM test: from source code, a sequential direct write (one byte per loop pass) to a 30 GiB RAM array (I have 64 GiB of RAM) gives a speed of nearly 1.3 GiB/s (1298 MiB per second).
Why on earth is the RAM disk on Windows so slow for one big sequential write?
This low write speed seems specific to RAM disks on Windows: I tested the same concept on Linux with a native Linux RAM disk, and there it can write at nearly one gigabyte per second.
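For what it's worth, a sketch of one way to reproduce the Linux side of this test with tmpfs (mount point and sizes are just examples):

    # create a 32 GiB RAM-backed filesystem
    mkdir -p /mnt/ramdisk
    mount -t tmpfs -o size=32g tmpfs /mnt/ramdisk

    # write a 30 GiB file in 1 MiB blocks and let dd report the throughput
    dd if=/dev/zero of=/mnt/ramdisk/test.bin bs=1M count=30720 status=progress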
Please note that I also tested SoftPerfect and other RAM disks on Windows; their speeds are about the same and cannot exceed roughly one hundred megabytes per second.
Windows versions tested: 10 & 11 (both Home & Pro, 64-bit), with the RAM disk formatted as exFAT and as NTFS; since the RAM disk speed was so low, I tried to find a Windows version where it would be normal, but found none.
Linux kernel tested: only 5.15.11; since the native Linux RAM disk speed was normal, I did not test any other kernel.
Hope this helps other people, since knowledge is the basis for solving a problem.

Memory mapping of files and system cache behavior in WinXP

Our application is memory intensive and deals with reading a large number of disk files. The total load can be more than 3 GB.
There is a custom memory manager that uses memory-mapped files to read such huge amounts of data. The files are mapped into the process memory space only when needed, so the process's own memory usage stays well under control. What we observe, though, is that with memory mapping the system cache keeps growing until it occupies all available physical memory, which slows down the entire system.
My question is how to prevent the system cache from hogging physical memory. I attempted to remove the file buffering (by using FILE_FLAG_NO_BUFFERING), but then the read operations take a considerable amount of time and hurt application performance. How can I achieve scalability without sacrificing much performance? What are the common techniques used in such cases?
I don't have a good understanding of the Windows XP caching behavior; any good links explaining it would also be helpful.
I work on a file backup product, so we often run into a similar scenario, where our own file access causes the cache manager to keep data around - which can cause memory usage to spike.
By default the Windows cache manager tries to be helpful by reading ahead and keeping file data around in case it's needed again.
There are several registry keys that let you tweak the cache behavior, and some of our customers have had good results with this.
XP is unique in that it has some server capability, but by default it is optimized for desktop programs, not caching. You can enable System Cache Mode in XP, which causes more memory to be set aside for caching. This might improve performance, or you may be doing this already and it's having a negative side effect! You can read about that here.
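For reference, the System Cache Mode switch corresponds to the LargeSystemCache value in the registry; a sketch of enabling it from an administrator command prompt (back up the registry first, and a reboot is typically needed for it to take effect):

    rem favor the system file cache over process working sets (0 = programs, 1 = system cache)
    reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v LargeSystemCache /t REG_DWORD /d 1 /f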
I can't recommend a custom memory manager, but I do know that most heavyweight apps (Exchange, SQL Server) do their own caching. You can observe that by running Process Monitor.
If you want to completely prevent the cache manager from using memory to cache your files, you must disable both read and write caching:
FILE_FLAG_NO_BUFFERING and
FILE_FLAG_WRITE_THROUGH
There are other hints you can give the cache manager (random access, temporary file); read the document on Caching Behavior here.
You can still get good read performance even with caching disabled, but you're going to have to emulate the cache manager's behavior by having your own background threads doing read-aheads.
Also, I would recommend upgrading to a server-class OS; even Windows Server 2003 will give you more cache manager tuning options. And of course, if you can move to Windows 7 / Server 2008, you will get even more performance improvements with the same physical resources, because of dynamic paged/non-paged pool sizing and working-set improvements. There is a nice article on that here.
Type this into Notepad and save it as a .vbs file. Run it whenever you notice that system RAM is running low; the system cache gets cleared and the memory is returned to the free pool. I found it elsewhere on the net and am sharing it here in case it helps. It is also suggested that the amount should never exceed half your actual RAM, so if you have 1 GB of RAM, start with the following text in your .vbs file:
FreeMem = Space(240000000)   ' to clear 512 MB of RAM
FreeMem = Space(120000000)   ' to clear 256 MB of RAM
FreeMem = Space(90000000)    ' to clear 128 MB of RAM
FreeMem = Space(48000000)    ' to clear 64 MB of RAM
FreeMem = Space(20000000)    ' to clear 52 MB of RAM
