What is the opposite of "caching"? - caching

As far as I can understand, the act of caching something is moving something from a slower memory (e.g. RAM) to a faster memory (e.g. Cache), when you want to access that thing a lot of times.
What if there is something in a faster memory (e.g. RAM), but you want to store it on a slower memory (e.g. Hard Disk) because you aren't using it a lot?
Is there a word for that action? Just "storing" doesn't seem right. Couldn't find a better word on Google, just by describing my problem.

The used term for that is Swapped out or Rolled out
(https://en.wikipedia.org/wiki/Memory_paging)

Related

If we have infinite memory, then do we still be needing paging?

Paging creates illusion that each process has infinite RAM by moving pages to and from disk. So if we have infinite memory(in some hypothetical situation), do we still need Paging? If yes, then why? I faced this question in an interview.
Assuming that "infinite memory" means infinite randomly accessable memory, or RAM, we will still need paging. Although paging is often associated with the ability to swap pages in and out of RAM to a hard disk to conserve memory, this is merely one aspect of paging. Here are some other reasons to have paging:
Security. Paging is a method to enforce operating system security and memory protection by ensuring that a processes cannot access the memory of another process and that it cannot modify the resident kernel.
Multitasking. Paging aids in multitasking by virtualizing the memory space, that is, address 0xFOO in Process A can be something completely different than 0xFOO in Process B
Memory Allocation. Paging aids in memory allocation by reducing fragmentation and ensuring RAM is only allocated when accessed. What this means is that although a process needs, suppose, 100MB of continuous RAM space, this need not be continuous physically. Additionally, when a program requests 100MB of space, the operating system will tell the program it's safe to use that 100MB of space, yet it will not be actually allocated until the program uses that space to its fullest.
Admittedly, the latter would not be entirely necessary if one had infinite RAM, nonetheless; it is always good practice to be eifficient even when we are not resource constrained. It also demonstrates a use for paging that is sometimes not considered.
This is a philosophical question, so here's a philosophical answer :)
The trick in this question is that you make assumptions about the infinite memory. It's ok to say "no, no need to use paging, BUT". And follow by:
The infinite memory has to be accessible within the acceptable time limit for memory access. If it's not (because infinity takes a lot of space, and the memory sits farther away from the processing unit), there's no difference between it and the disk, both are not satisfying the readily available memory requirement, which is what caching via pages tries to solve.
Take for example Amazon's S3, which for all practical purposes is infinite. If you can rely on S3 to satisfy all your memory requirements in the sense that when you need something fetched within time x you can fetch it from S3, there's no need to page anything or even hold it in "local" memory. Just get it from S3 whenever you need it, as many times as you need it. (Obviously this would have other repercussions like cost and network, but let's ignore that for now).
Of course you can always say that optimally you want memory access to be as fast as possible, and "fast enough" is probably slower than "fastest", so local memory access would give you better performance etc.
And last, if I had to envision a memory which is infinite and has the same access time no matter how "far" the memory unit is from the fetching unit, I would have to envision a sphere where the processing unit is in the middle, so that you can't argue that one memory unit is slower than the other because of the distance. Otherwise you could say that paging would be done internally within the memory so that access is faster to the memory units that are most used (or whatever algorithm you choose to use).

On-demand paging to allow analysis of large amounts of data

I am working on an analysis tool that reads output from a process and continuously converts this to an internal format. After the "logging phase" is complete, analysis is done on the data. The data is all held in memory.
However, due to the fact that all logged information is held in memory, there is a limit on the duration of the logging. For most use cases this is ok, but it should be possible to run for longer, even if this will hurt performance.
Ideally, the program should be able to start using hard drive space in addition to RAM once the RAM usage reaches a certain limit.
This leads to my question:
Are there any existing solutions for doing this? It has to work on both Unix and Windows.
To use the disk after memory is full, we use Cache technologies such as EhCache. They can be configured with the amount of memory to use, and to overflow to disk.
But they also have smarter algorithms you can configure as needed, such as sending to disk data not used in the last 10 minutes etc... This could be a plus for you.
Without knowing more about your application it is not possible to provide a perfect answer. However it does sound a bit like you are re-inventing the wheel. Have you considered using an in-process database library like sqlite?
If you used that or similar it will take care of moving the data to and from the disk and memory and give you powerful SQL query capabilities at the same time. Even if your logging data is in a custom format if each item has a key or index of some kind a small light database may be a good fit.
This might seem too obvious, but what about memory mapped files? This does what you want and even allows a 32 bit application to use much more than 4GB of memory. The principle is simple, you allocate the memory you need (on disk) and then map just a portion of that into system memory. You could, for example, map something like 75% of the available physical memory size. Then work on it, and when you need another portion of the data, just re-map. The downside to this is that you have to do the mapping manually, but that's not necessarily bad. The good thing is that you can use more data than what fits into physical memory and into the per-process memory limit. It works really great if you actually use only part of the data at any given time.
There may be libraries that do this automatically, like the one KLE suggested (though I do not know that one). Doing it manually means you'll learn a lot about it and have more control, though I'd prefer a library if it does exactly what you want with regard to how and when the disk is being used.
This works similar on both Windows on Unix. For Windows, here is an article by Raymond Chen that shows a simple example.

Why is LRU better than FIFO?

Why is Least Recently Used better than FIFO in relation to page files?
If you mean in terms of offloading memory pages to disk - if your process is frequently accessing a page, you really don't want it to be paged to disk, even if it was the very first one you accessed. On the other hand, if you haven't accessed a memory page for several days, it's unlikely that you'll do so in the near future.
If that's not what you mean, please edit your question to give more details.
There is no single cache algorithm which will always do well because that requires perfect knowledge of the future. (And if you know where to get that...) The dominance of LRU in VM cache design is the result of a long history of measuring system behavior. Given real workloads, LRU works pretty well a very large fraction of the time. However, it isn't very hard to construct a reference string for which FIFO would have superior performance over LRU.
Consider a linear sweep through a large address space much larger than the available pageable real memory. LRU is based on the assumption that "what you've touched recently you're likely to touch again", but the linear sweep completely violates that assumption. This is why some operating systems allow programs to advise the kernel about their reference behavior - one example is "mark and sweep" garbage collection typified by classic LISP interpreters. (And a major driver for work on more modern GCs like "generational".)
Another example is the symbol table in a certain antique macro processor (STAGE2). The binary tree is searched from the root for every symbol, and the string evaluation is being done on a stack. It turned out that reducing the available page frames by "wiring down" the root page of the symbol tree and the bottom page of the stack made a huge improvement in the page fault rate. The cache was tiny, and it churned violently, always pushing out the two most frequently referenced pages because the cache was smaller than the inter-reference distance to those pages. So a small cache worked better, but ONLY because those two page frames stolen from the cache were used wisely.
The net of all this is that LRU is the standard answer because it's usually pretty good for real workloads on systems that aren't hideously overloaded (VM many times the real memory available), and that is supported by years of careful measurements. However,
you can certainly find cases where alternative behavior will be superior. This is why measuring real systems is important.
Treat the RAM as a cache. In order to be an effective cache, it needs to keep the items most likely to be requested in memory.
LRU keeps the things that were most recently used in memory. FIFO keeps the things that were most recently added. LRU is, in general, more efficient, because there are generally memory items that are added once and never used again, and there are items that are added and used frequently. LRU is much more likely to keep the frequently-used items in memory.
Depending on access patterns, FIFO can sometimes beat LRU. An Adaptive Replacement Cache is hybrid that adapts its strategy based on actual usage patterns.
According to temporal locality of reference, memory that has been accessed recently is more likely to be accessed again soon.

Algorithms for Optimization with Fast Disk Storage (SSDs)?

Given that Solid State Disks (SSDs) are decreasing in price and soon will become more prevalent as system drives, and given that their access rates are significantly higher than rotating magnetic media, what standard algorithms will gain in performance from the use of SSDs for local storage? For example, the high random read speed of SSDs makes something like a disk-based hashtable a viability for large hashstables; 4GB of disk space is readily available, which makes hashing to the entire range of a 32-bit integer viable (more for lookup than population, though, which would still take a long time); while this size of a hashtable would be prohibitive to work with with rotating media due to the access speed, it shouldn't be as much of an issue with SSDs.
Are there any other areas where the impending transition to SSDs will provide potential gains in algorithmic performance? I'd rather see reasoning as to how one thing will work rather than opinion; I don't want this to turn contentious.
Your example of hashtables is indeed the key database structure that will benefit. Instead of having to load a whole 4GB or more file into memory to probe for values, the SSD can be probed directly. The SSD is still slower than RAM, by orders of magnitude, but it's quite reasonable to have a 50GB hash table on disk, but not in RAM unless you pay big money for big iron.
An example is chess position databases. I have over 50GB of hashed positions. There is complex code to try to group related positions near each other in the hash, so I can page in 10MB of the table at a time and hope to reuse some of it for multiple similar position queries. There's a ton of code and complexity to make this efficient.
Replaced with an SSD, I was able to drop all the complexity of the clustering and just use really dumb randomized hashes. I also got an increase in performance since I only fetch the data I need from the disk, not big 10MB chunks. The latency is indeed larger, but the net speedup is significant.. and the super-clean code (20 lines, not 800+), is perhaps even nicer.
SSDs are only significantly faster for random access. Sequential access to disk they are only twice as performant as mainstream rotational drives. Many SSDs have poorer performance in many scenarios causing them to perform worse, as described here.
While SSDs do move the needle considerably, they are still much slower than CPU operations and physical memory. For your 4GB hash table example, you may be able to sustain 250+ MB/s off of an SSD for accessing random hash table buckets. For a rotational drive, you'd be lucky to break the single digit MB/s. If you can keep this 4 GB hash table in memory, you could access it on the order of gigabytes a second - much faster than even a very swift SSD.
The referenced article lists several changes MS made for Windows 7 when running on SSD's, which can give you an idea of the sort of changes you could consider making. First, SuperFetch for prefetching data off of disk is disabled - it's designed to get around slow random access times for disk which are alleviated by SSDs. Defrag is disabled, because having files scattered across the disk aren't a performance hit for SSDs.
Ipso facto, any algorithm you can think of which requires lots of random disk I/O (random being the key word, which helps to throw the principle of locality to the birds, thus eliminating the usefulness of a lot of caching that goes on).
I could see certain database systems gaining from this though. MySQL, for instance using the MyISAM storage engine (where data records are basically glorified CSVs). However, I think very large hashtables are going to be your best bet for good examples.
SSD are a lot faster for random reads, a bit for sequential reads and properly slower for writes (random or not).
So a diskbased hashtable is properly not useful with an SSD, since it now takes significantly time to update it, but searching the disk becomes (compared to a normal hdd) very cheap.
Don't kid yourself. SSDs are still a whole lot slower than system memory. Any algorithm that chooses to use system memory over the hard disk is still going to be much faster, all other things being equal.

Optimizing locations of on-disk data for sequential access

I need to store large amounts of data on-disk in approximately 1k blocks. I will be accessing these objects in a way that is hard to predict, but where patterns probably exist.
Is there an algorithm or heuristic I can use that will rearrange the objects on disk based on my access patterns to try to maximize sequential access, and thus minimize disk seek time?
On modern OSes (Windows, Linux, etc) there is absolutely nothing you can do to optimise seek times! Here's why:
You are in a pre-emptive multitasking system. Your application and all it's data can be flushed to disk at any time - user switches task, screen saver kicks in, battery runs out of charge, etc.
You cannot guarantee that the file is contiguous on disk. Doing Aaron's first bullet point will not ensure an unfragmented file. When you start writing the file, the OS doesn't know how big the file is going to be so it could put it in a small space, fragmenting it as you write more data to it.
Memory mapping the file only works as long as the file size is less than the available address range in your application. On Win32, the amount of address space available is about 2Gb - memory used by application. Mapping larger files usually involves un-mapping and re-mapping portions of the file, which won't be the best of things to do.
Putting data in the centre of the file is no help as, for all you know, the central portion of the file could be the most fragmented bit.
To paraphrase Raymond Chen, if you have to ask about OS limits, you're probably doing something wrong. Treat your filesystem as an immutable black box, it just is what it is (I know, you can use RAID and so on to help).
The first step you must take (and must be taken whenever you're optimising) is to measure what you've currently got. Never assume anything. Verify everything with hard data.
From your post, it sounds like you haven't actually written any code yet, or, if you have, there is no performance problem at the moment.
The only real solution is to look at the bigger picture and develop methods to get data off the disk without stalling the application. This would usually be through asynchronous access and speculative loading. If your application is always accessing the disk and doing work with small subsets of the data, you may want to consider reorganising the data to put all the useful stuff in one place and the other data elsewhere. Without knowing the full problem domain it's not possible to to be really helpful.
Depending on what you mean by "hard to predict", I can think of a few options:
If you always seek based on the same block field/property, store the records on disk sorted by that field. This lets you use binary search for O(log n) efficiency.
If you seek on different block fields, consider storing an external index for each field. A b-tree gives you O(log n) efficiency. When you seek, grab the appropriate index, search it for your block's data file address and jump to it.
Better yet, if your blocks are homogeneous, consider breaking them down into database records. A database gives you optimized storage, indexing, and the ability to perform advanced queries for free.
Use memory-mapped file access rather than the usual open-seek-read/write pattern. This technique works on Windows and Unix platforms.
In this way the operating system's virtual memory system will handle the caching for you. Accesses of blocks that are already in memory will result in no disk seek or read time. Writes from memory back to disk are handled automatically and efficiently and without blocking your application.
Aaron's notes are good too as they will affect initial-load time for a chunk that's not in memory. Combine that with the memory-mapped technique -- after all it's easier to reorder chunks using memcpy() than by reading/writing from disk and attempting swapouts etc.
The most simple way to solve this is to use an OS which solves that for you under the hood, like Linux. Give it enough RAM to hold 10% of the objects in RAM and it will try to keep as many of them in the cache as possible reducing the load time to 0. The recent server versions of Windows might work, too (some of them didn't for me, that's why I'm mentioning this).
If this is a no go, try this algorithm:
Create a very big file on the harddisk. It is very important that you write this in one go so the OS will allocate a continuous space on disk.
Write all your objects into that file. Make sure that each object is the same size (or give each the same space in the file and note the length in the first few bytes of of each chunk). Use an empty harddisk or a disk which has just been defragmented.
In a data structure, keep the offsets of each data chunk and how often it is accessed. When it is accessed very often, swap its position in the file with a chunk that is closer to the start of the file and which has a lesser access count.
[EDIT] Access this file with the memory-mapped API of your OS to allow the OS to effectively cache the most used parts to get best performance until you can optimize the file layout next time.
Over time, heavily accessed chunks will bubble to the top. Note that you can collect the access patterns over some time, analyze them and do the reorder over night when there is little load on your machine. Or you can do the reorder on a completely different machine and swap the file (and the offset table) when that's done.
That said, you should really rely on a modern OS where a lot of clever people have thought long and hard to solve these issues for you.
That's an interesting challenge. Unfortunately, I don't know how to solve this out of the box, either. Corbin's approach sounds reasonable to me.
Here's a little optimization suggestion, at least: Place the most-accessed items at the center of your disk (or unfragmented file), not at the start of end. That way, seeking to lesser-used data will be closer by average. Err, that's pretty obvious, though.
Please let us know if you figure out a solution yourself.

Resources