Dynamically adapting caches to available memory

Dynamically adapting caches to available memory - windows

Is there a way to implement dynamically adapting caches in userspace? I would like my programs to allocate caches that employ some fair share of the available physical memory. If the system is running out of physical memory, caches should be dropped as chosen by the program, and in no case should they be swapped out. It is preferrable that no special privilege was needed, so it is not necessary to actually lock the memory. The program should just get to know that pages are swapped out, so it is not going to use them. All in all, it should work something like caches and buffers implemented in the kernel. Can you point out general ideas and APIs how that can be done? Platforms I am interested in are Linux and Windows.

Why do you think there is any reasonable way to define "fair share"? It's not really a great UX when the application tries to know too much: far better would be to find a sensible, minimal default, and offer the user a config option to adjust it. Even better is to provide the user with stats to show how well the current-sized cache is doing - bigger isn't always better.
There is no "cooperative memory management" API in Linux - no way for the kernel to tell user-space to use less memory. The closest I can think of is that the (relatively new) memory cgroup controller can provide a "notifier" when a memory limit is reached (rather than OOM-killing the allocating process.) That's not exactly nice to use, but then again, any such interface is going to flirt with being race/deadlock-prone. Polling with mincore might work in somewhat contrived/constrained situations, but given that the app has no way to understand the changing system-wide demand for memory, it's not going to work well.

Related

Clean up after killing a thread

After reading this article https://developer.ibm.com/tutorials/l-memory-leaks/ I'm wondering is there a way to cancel thread execution and avoid memory leaks. Since my understanding is that the join functionality is releasing the allocated space. That should be possible to do also by other commands. The thing that interest me how does join releases the memory space and other functions cant? Is there a function that gives to witch thread a memory space is assigned? Can this be given out (the mapping)? I know one should not do crazy things with that since it represents an potential safety issue. But still are there ways to achieve that?

For example if I have a third party lib then I can identify its threads but I have the problem that I cannot identify allocated memory spaces in the lib, or I do not know how to do that (the lib is a binary).
If the library doesn't support that, you can't. Your understanding of the issue is slightly off. It doesn't matter who allocated the memory, it matters whether the memory still needs to be allocated or not. If the library provides some way to get to the point where the memory no longer needs to be allocated, that provided way would also provide a way to free the memory. If the library doesn't provide any way to get to the point where the memory no longer needs to be allocated, some way to free it would not be helpful.
Coding such stuff is a rabbit hole and should be done on the OS level.
Can't be done. The OS has no way to know when the code that allocated some chunk of memory still needs it and when it doesn't. Only the code that allocated the memory can possibly know that.
Posix allows canceling but not identifying the individual threads, and not all Posix functionality works on linux. Posix is just a layer over the stl stuff in the OS.
Right, so POSIX is not the place where this goes. It requires understanding of the application and so must be done at the application layer. If you need this functionality, code it. If you need it in other people's code and they don't supply it, talk to them. Presumably, if their code is decent and appropriate, it has some way to d what you need. If not, your complaint is with the code that doesn't do what you need.
My thoughts on that were that somewhere in Linux the system tracks what allocation on heap were made by the threads if some option is enabled since I know by default there is nothing.
That doesn't help. Which thread allocated memory tells you absolutely nothing about when it is no longer needed. Only the same code that decided it was needed can tell when it is no longer needed. So if this is needed in some code that allocates memory, that code must implement this. If the person who implemented that code did not provide this kind of facility, then that means they decided it wasn't needed. You may wish to ask them why they made that decision. Their answer may well surprise you.
But I see there is no answer to a serious question.
The answer is to code what you need. If it's someone else's code and they didn't code it, then they didn't think you would need it. They're most likely right. But if they're wrong, then don't use their code.

Is there any trick to deliberately swap out a page in linux kernel?

I was trying to debug some issues and I want to conjure up a scenario when physical memory page is swapped out. Is there any trick to do this?
Linux kernel: 3.10.x
Platform: arm
Thank a lot.

If you really mean "in the linux kernel" then yes. There are functions that cause a page to be swapped that you could invoke directly. See pageout() as a starting point. I suspect it would be non-trivial to get this all setup just right.
If you mean "is there a way to do it from user-space", the answer is no. Well, not directly (AFAIK anyway). Your best bet would be to not touch the page in question further, meanwhile allocate lots of other memory (this could be done in a separate process) and touch all those other pages so that the one you care about becomes least recently used and hence a candidate for paging.
Not sure how -- from user space -- you would detect that it actually had been paged though. The point of virtual memory is to hide that from you. I suppose you could have a high likelihood of knowing it had been paged after the fact by timing how long it takes to access the memory once you finally do so.

Simple toy OS memory management

I'm developing a simple little toy OS in C and assembly as an experiment, but I'm starting to worry myself with my lack of knowledge on system memory.
I've been able to compile the kernel, run it in Bochs (loaded by GRUB), and have it print "Hello, world!" Now I'm off trying to make a simple memory manager so I can start experimenting with other things.
I found some resources on memory management, but they didn't really have enough code to go off of (as in I understood the concept, but I was at a loss for actually knowing how to implement it).
I tried a few more or less complicated strategies, then settled with a ridiculously simplistic one (just keep an offset in memory and increase it by the size of the allocated object) until the need arises to change. No fragmentation control, protection, or anything, yet.
So I would like to know where I can find more information when I do need a more robust manager. And I'd also like to learn more about paging, segmentation, and other relevant things. So far I haven't dealt with paging at all, but I've seen it mentioned often in OS development sites, so I'm guessing I'll have to deal with it sooner or later.
I've also read about some form of indirect pointers, where an application holds a pointer that is redirected by the memory manager to its real location. That's quite a ways off for me, I'm sure, but it seems important if I ever want to try virtual memory or defragmentation.
And also, where am I supposed to put my memory offset? I had no idea what the best spot was, so I just randomly picked 0x1000, and I'm sure it's going to come back to me later when I overwrite my kernel or something.
I'd also like to know what I should expect performance-wise (e.g. a big-O value for allocation and release) and what a reasonable ratio of memory management structures to actual managed memory would be.
Of course, feel free to answer just a subset of these questions. Any feedback is greatly appreciated!

If you don't know about it already, http://wiki.osdev.org/ is a good resource in general, and has multiple articles on memory management. If you're looking for a particular memory allocation algorithm, I'd suggest reading up on the "buddy system" method (http://en.wikipedia.org/wiki/Buddy_memory_allocation). I think you can probably find an example implementation on the Internet. If you can find a copy in a library, it's also probably worth reading the section of The Art Of Computer Programming dedicated to memory management (Volume 1, Section 2.5).
I don't know where you should put the memory offset (to be honest I've never written a kernel), but one thing that occurred to me which might work is to place a static variable at the end of the kernel, and start allocations after that address. Something like:
(In the memory manager)
extern char endOfKernel;
... (also in the memory manager)
myOffset = &endOfKernel;
... (at the end of the file that gets placed last in the binary)
char endOfKernel;
I guess it goes without saying, but depending on how serious you get about the operating system, you'll probably want some books on operating system design, and if you're in school it wouldn't hurt to take an OS class.

If you're using GCC with LD, you can create a linker script that defines a symbol at the end of the .BSS section (which would give you the complete size of the kernel's memory footprint). Many kernels in fact use this value as a parameter for GRUB's AOUT_KLUDGE header.
See http://wiki.osdev.org/Bare_bones#linker.ld for more details, note the declaration of the ebss symbol in the linker script.

On-demand paging to allow analysis of large amounts of data

I am working on an analysis tool that reads output from a process and continuously converts this to an internal format. After the "logging phase" is complete, analysis is done on the data. The data is all held in memory.
However, due to the fact that all logged information is held in memory, there is a limit on the duration of the logging. For most use cases this is ok, but it should be possible to run for longer, even if this will hurt performance.
Ideally, the program should be able to start using hard drive space in addition to RAM once the RAM usage reaches a certain limit.
This leads to my question:
Are there any existing solutions for doing this? It has to work on both Unix and Windows.

To use the disk after memory is full, we use Cache technologies such as EhCache. They can be configured with the amount of memory to use, and to overflow to disk.
But they also have smarter algorithms you can configure as needed, such as sending to disk data not used in the last 10 minutes etc... This could be a plus for you.

Without knowing more about your application it is not possible to provide a perfect answer. However it does sound a bit like you are re-inventing the wheel. Have you considered using an in-process database library like sqlite?
If you used that or similar it will take care of moving the data to and from the disk and memory and give you powerful SQL query capabilities at the same time. Even if your logging data is in a custom format if each item has a key or index of some kind a small light database may be a good fit.

This might seem too obvious, but what about memory mapped files? This does what you want and even allows a 32 bit application to use much more than 4GB of memory. The principle is simple, you allocate the memory you need (on disk) and then map just a portion of that into system memory. You could, for example, map something like 75% of the available physical memory size. Then work on it, and when you need another portion of the data, just re-map. The downside to this is that you have to do the mapping manually, but that's not necessarily bad. The good thing is that you can use more data than what fits into physical memory and into the per-process memory limit. It works really great if you actually use only part of the data at any given time.
There may be libraries that do this automatically, like the one KLE suggested (though I do not know that one). Doing it manually means you'll learn a lot about it and have more control, though I'd prefer a library if it does exactly what you want with regard to how and when the disk is being used.
This works similar on both Windows on Unix. For Windows, here is an article by Raymond Chen that shows a simple example.

Profiling for analyzing the low level memory accesses of my program

I have to analyze the memory accesses of several programs. What I am looking for is a profiler that allow me to see which one of my programs is more memory intensive insted of computing intensive. I am very interested in the number of accesses to the L1 data cache, L2, and the main memory.
It needs to be for Linux and if it is possible only with command usage. The programming language is c++. If there is any problem with my question, such as I do not understand what you mean or we need more data please comment below.
Thank you.
Update with the solution
I have selected the answer of Crashworks as favourited because is the only one that provided something of what I was looking for. But the question is still open, if you know a better solution please answer.

It is not possible to determine all accesses to memory, since it doesn't make much sense. An access to memory could be executing next instruction (program resides in memory), or when your program reads or write a variable, so your program is almost accessing memory all the time.
What could be more interesting for you could be follow the memory usage of your program (both heap and stack). In this case you can use standard top command.
You could also monitor system calls (i.e. to write to disk or to attach/alloc a shared memory segment). In this case you should use strace command.
A more complete control to do everything would be debugging your program by means of gdb debugger. It lets you control your program such as setting breakpoints to a variable so the program is interrputed whenever it is read or written (maybe this is what you were looking for). On the other hand GDB can be tricky to learn so DDD, which is a gtk graphical frontend will help you starting out with it.
Update: What you are looking for is really low level memory access that it is not available at user level (that is the task of the kernel of the operating system). I am not sure if even L1 cache management is handled transparently by CPU and hidden to kernel.
What is clear is that you need to go as down as kernel level, so KDB, explained here o KDBG, explained here.
Update 2: It seems that Linux kernel does handle CPU cache but only L1 cache. The book Understanding the Linux Virtual Memory Manager explais how memory management of Linux kernel works. This chapter explains some of the guts of L1 cache handling.

If you are running Intel hardware, then VTune for Linux is probably the best and most full-featured tool available to you.
Otherwise, you may be obliged to read the performance-counter MSRs directly, using the perfctr library. I haven't any experience with this on Linux myself, but I found a couple of papers that may help you (assuming you are on x86 -- if you're running PPC, please reply and I can provide more detailed answers):
http://ieeexplore.ieee.org/Xplore/login.jsp?url=/iel5/11169/35961/01704008.pdf?temp=x
http://www.cise.ufl.edu/~sb3/files/pmc.pdf
In general these tools can't tell you exactly which lines your cache misses occur on, because they work by polling a counter. What you will need to do is poll the "l1 cache miss" counter at the beginning and end of each function you're interested in to see how many misses occur inside that function, and of course you may do so hierarchically. This can be simplified by eg inventing a class that records the start timer on entering scope and computes the delta on leaving scope.
VTune's instrumented mode does this for you automatically across the whole program. The equivalent AMD tool is CodeAnalyst. Valgrind claims to be an open-source cache profiler, but I've never used it myself.

Perhaps cachegrind (part of the valgrind suite) may be suitable.

Do you need something more than the unix command top will provide? This provides cpu usage and memory usage of linux programs in an easy to read presentation format.
If you need something more specific, a profiler perhaps, the software language (java/c++/etc.) will help determine which profiler is best for your situation.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio