Clean up after killing a thread - linux-kernel

After reading this article, https://developer.ibm.com/tutorials/l-memory-leaks/, I'm wondering whether there is a way to cancel thread execution and avoid memory leaks. My understanding is that join is what releases the allocated space, so it should be possible to do the same with other calls. What interests me is how join releases the memory when other functions can't. Is there a function that reports which thread a memory region is assigned to? Can this mapping be exposed? I know one should not do crazy things with it, since it is a potential safety issue. But are there ways to achieve that?

For example, if I have a third-party lib, I can identify its threads, but I cannot identify the memory it has allocated, or I do not know how to do that (the lib is a binary).
If the library doesn't support that, you can't. Your understanding of the issue is slightly off. It doesn't matter who allocated the memory, it matters whether the memory still needs to be allocated or not. If the library provides some way to get to the point where the memory no longer needs to be allocated, that provided way would also provide a way to free the memory. If the library doesn't provide any way to get to the point where the memory no longer needs to be allocated, some way to free it would not be helpful.
Coding such stuff is a rabbit hole and should be done on the OS level.
Can't be done. The OS has no way to know when the code that allocated some chunk of memory still needs it and when it doesn't. Only the code that allocated the memory can possibly know that.
POSIX allows cancelling threads but not identifying the individual threads, and not all POSIX functionality works on Linux. POSIX is just a layer over the stl stuff in the OS.
Right, so POSIX is not the place where this goes. It requires understanding of the application and so must be done at the application layer. If you need this functionality, code it. If you need it in other people's code and they don't supply it, talk to them. Presumably, if their code is decent and appropriate, it has some way to do what you need. If not, your complaint is with the code that doesn't do what you need.
My thinking was that somewhere in Linux the system tracks which heap allocations were made by which thread if some option is enabled, since I know that by default there is nothing.
That doesn't help. Which thread allocated memory tells you absolutely nothing about when it is no longer needed. Only the same code that decided it was needed can tell when it is no longer needed. So if this is needed in some code that allocates memory, that code must implement this. If the person who implemented that code did not provide this kind of facility, then that means they decided it wasn't needed. You may wish to ask them why they made that decision. Their answer may well surprise you.
But I see there is no answer to a serious question.
The answer is to code what you need. If it's someone else's code and they didn't code it, then they didn't think you would need it. They're most likely right. But if they're wrong, then don't use their code.
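For threads you own, "code what you need" can look like the following minimal sketch. It assumes POSIX threads with the default deferred cancellation. The names `worker`, `free_buffer`, `run_and_cancel_worker` and the `buffers_freed` counter are hypothetical, chosen only for illustration. The thread registers a cleanup handler for its own allocation, so a `pthread_cancel` followed by `pthread_join` does not leak it.

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical counter so the caller can observe that cleanup ran. */
static int buffers_freed = 0;

/* Cleanup handler: runs when the thread is cancelled, or when the handler
 * is popped with a non-zero argument. */
static void free_buffer(void *p)
{
    free(p);
    buffers_freed++;
}

static void *worker(void *arg)
{
    (void)arg;
    char *buf = malloc(4096);
    if (buf == NULL)
        return NULL;

    /* From here on, cancelling the thread frees buf via free_buffer. */
    pthread_cleanup_push(free_buffer, buf);
    for (;;)
        sleep(1);               /* sleep() is a cancellation point */
    pthread_cleanup_pop(1);     /* not reached; must lexically pair with push */
    return NULL;
}

/* Starts the worker, cancels it, and waits for it to finish.
 * Returns 0 if the thread ended by cancellation, 1 if it ended some other
 * way, and -1 on error. */
int run_and_cancel_worker(void)
{
    pthread_t tid;
    void *result = NULL;

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)    /* join reclaims the thread itself */
        return -1;
    return result == PTHREAD_CANCELED ? 0 : 1;
}
```

This only works because the thread's own code cooperates. A binary-only library that registers no cleanup handlers gets no benefit from it, which is the point made in the answers above.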

Related

How to measure the point in time at which a slice of data in memory was accessed?

Suppose I'm reading large chunks of data into memory and processing them sequentially. Is there a way to pinpoint when a given segment/chunk of the memory was accessed, by using some kind of system tool that will log memory address accesses?
An approach I'm considering - which doesn't rely on measurement utilities - is logging what data is being processed at any point of time, and inferring the usage based on looking at the data itself. But that is not a generic solution.
These are some of the ideas that have been brewing in my head to do what you want. Never had the time to explore these in more detail though.
The simplest method is to add a watchpoint for the address in gdb, if you need a quick-fix kind of solution.
Another way to do this is to change the protection on the pages holding the chunks of data you want to check access for. On Linux this can be done with the mprotect call: read-only (PROT_READ) catches writes, and PROT_NONE catches reads as well. This assumes you are debugging this code, as the access to the page will cause a segfault. You could possibly install a signal handler for SIGSEGV to record the access.
Another way to do the same may be to use the ptrace system call, which may be more trouble than it's worth.
If you just want to count accesses to a memory address, you can use the perf_event_open system call on newer Linux kernels. See the documentation for PERF_COUNT_HW_CACHE_OP_READ and PERF_COUNT_HW_CACHE_OP_WRITE. Note that these count cache operations in aggregate rather than accesses to one particular address. You are on your own with that one though. It may be even less worthwhile to use this method. However, since the question is marked with the performance tag, this may be what you are looking for.
If you just want a system tool, you might want to look at the perf tool and dig into the manuals to see if it can do the same thing that I described with perf_event_open. This tool is a wrapper around that system call, so I am guessing that it should have support for the functionality I mentioned in the previous point.
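The mprotect idea can be sketched as follows. This is a Linux-only debugging aid with several assumptions: a single-threaded program, one watched region, and hypothetical names (`watch_region`, `on_segv`, `access_count`, `first_access`). The region starts out as PROT_NONE; the first access faults, the handler timestamps it and re-enables access, and the faulting instruction is then retried. `mprotect` is not on the POSIX list of async-signal-safe functions, so calling it from the handler relies on it being a plain system call on Linux.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

static char *watched_base;                  /* start of the watched region */
static size_t watched_len;                  /* its length in bytes */
static volatile sig_atomic_t access_count;  /* faults seen on the region */
static struct timespec first_access;        /* time of the first fault */

static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    char *addr = (char *)info->si_addr;

    /* A fault outside the watched region is a genuine crash. */
    if (addr < watched_base || addr >= watched_base + watched_len)
        _exit(1);

    if (access_count == 0)
        clock_gettime(CLOCK_MONOTONIC, &first_access);
    access_count++;

    /* Allow the access; the faulting instruction is retried on return. */
    mprotect(watched_base, watched_len, PROT_READ | PROT_WRITE);
}

/* Maps len bytes of inaccessible memory and installs the handler.
 * Returns the region, or NULL on failure. */
char *watch_region(size_t len)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return NULL;

    void *p = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    watched_base = p;
    watched_len = len;
    return p;
}
```

Because the handler drops the protection, only the first access is recorded. To log later accesses, the region would have to be protected again after each one, which gets expensive quickly.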

Dynamically adapting caches to available memory

Is there a way to implement dynamically adapting caches in userspace? I would like my programs to allocate caches that employ some fair share of the available physical memory. If the system is running out of physical memory, caches should be dropped as chosen by the program, and in no case should they be swapped out. It is preferable that no special privilege is needed, so it is not necessary to actually lock the memory. The program should just get to know that pages are swapped out, so it is not going to use them. All in all, it should work something like caches and buffers implemented in the kernel. Can you point out general ideas and APIs how that can be done? Platforms I am interested in are Linux and Windows.
Why do you think there is any reasonable way to define "fair share"? It's not really a great UX when the application tries to know too much: far better would be to find a sensible, minimal default, and offer the user a config option to adjust it. Even better is to provide the user with stats to show how well the current-sized cache is doing - bigger isn't always better.
There is no "cooperative memory management" API in Linux - no way for the kernel to tell user-space to use less memory. The closest I can think of is that the (relatively new) memory cgroup controller can provide a "notifier" when a memory limit is reached (rather than OOM-killing the allocating process.) That's not exactly nice to use, but then again, any such interface is going to flirt with being race/deadlock-prone. Polling with mincore might work in somewhat contrived/constrained situations, but given that the app has no way to understand the changing system-wide demand for memory, it's not going to work well.
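For what the mincore polling mentioned above would look like, here is a minimal Linux sketch. The helper name `resident_pages` is hypothetical. It only reports how many pages of a mapping are resident at the moment of the call; the answer can be stale by the time the caller acts on it, which is part of why this approach is limited.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for mincore() under strict -std modes */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns how many pages of [addr, addr + len) are currently resident in
 * RAM, or -1 on error. addr must be page-aligned, as mincore() requires. */
long resident_pages(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0)
        return -1;

    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(npages);    /* one status byte per page */
    if (vec == NULL)
        return -1;

    long count = -1;
    if (mincore(addr, len, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < npages; i++)
            count += vec[i] & 1;            /* low bit set: page is resident */
    }
    free(vec);
    return count;
}
```

A cache could call this periodically on its own mapping and drop entries when the resident count falls, but it still cannot see system-wide demand for memory.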

Simple toy OS memory management

I'm developing a simple little toy OS in C and assembly as an experiment, but I'm starting to worry myself with my lack of knowledge on system memory.
I've been able to compile the kernel, run it in Bochs (loaded by GRUB), and have it print "Hello, world!" Now I'm off trying to make a simple memory manager so I can start experimenting with other things.
I found some resources on memory management, but they didn't really have enough code to go off of (as in I understood the concept, but I was at a loss for actually knowing how to implement it).
I tried a few more or less complicated strategies, then settled with a ridiculously simplistic one (just keep an offset in memory and increase it by the size of the allocated object) until the need arises to change. No fragmentation control, protection, or anything, yet.
So I would like to know where I can find more information when I do need a more robust manager. And I'd also like to learn more about paging, segmentation, and other relevant things. So far I haven't dealt with paging at all, but I've seen it mentioned often in OS development sites, so I'm guessing I'll have to deal with it sooner or later.
I've also read about some form of indirect pointers, where an application holds a pointer that is redirected by the memory manager to its real location. That's quite a ways off for me, I'm sure, but it seems important if I ever want to try virtual memory or defragmentation.
And also, where am I supposed to put my memory offset? I had no idea what the best spot was, so I just randomly picked 0x1000, and I'm sure it's going to come back to me later when I overwrite my kernel or something.
I'd also like to know what I should expect performance-wise (e.g. a big-O value for allocation and release) and what a reasonable ratio of memory management structures to actual managed memory would be.
Of course, feel free to answer just a subset of these questions. Any feedback is greatly appreciated!
If you don't know about it already, http://wiki.osdev.org/ is a good resource in general, and has multiple articles on memory management. If you're looking for a particular memory allocation algorithm, I'd suggest reading up on the "buddy system" method (http://en.wikipedia.org/wiki/Buddy_memory_allocation). I think you can probably find an example implementation on the Internet. If you can find a copy in a library, it's also probably worth reading the section of The Art Of Computer Programming dedicated to memory management (Volume 1, Section 2.5).
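The core arithmetic of the buddy system is small. The sketch below uses hypothetical helper names (`buddy_of`, `order_for`) and assumes power-of-two block sizes, with offsets measured from the start of the managed region. It is not a full allocator, just the two calculations one is built around.

```c
#include <stddef.h>

/* Offset of the buddy of the block that starts at `offset` and is
 * (1 << order) bytes long. `offset` must be a multiple of the block size.
 * Two buddies differ only in the bit that corresponds to their size. */
size_t buddy_of(size_t offset, unsigned order)
{
    return offset ^ ((size_t)1 << order);
}

/* Smallest order, not below min_order, whose block size (1 << order) can
 * hold `size` bytes. The order is capped so the shift stays well defined. */
unsigned order_for(size_t size, unsigned min_order)
{
    unsigned max_order = (unsigned)(sizeof(size_t) * 8 - 1);
    unsigned order = min_order;
    while (order < max_order && ((size_t)1 << order) < size)
        order++;
    return order;
}
```

On free, a block is merged with its buddy when the buddy is also free, and the merged block's buddy is then checked at the next order up. With one free list per order, allocation and release take a number of steps proportional to the number of orders.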
I don't know where you should put the memory offset (to be honest I've never written a kernel), but one thing that occurred to me which might work is to place a static variable at the end of the kernel, and start allocations after that address. Something like:
/* In the memory manager */
extern char endOfKernel;
/* ... also in the memory manager */
myOffset = &endOfKernel;
/* ... at the end of the file that gets placed last in the binary */
char endOfKernel;
I guess it goes without saying, but depending on how serious you get about the operating system, you'll probably want some books on operating system design, and if you're in school it wouldn't hurt to take an OS class.
If you're using GCC with LD, you can create a linker script that defines a symbol at the end of the .bss section (which would give you the complete size of the kernel's memory footprint). Many kernels in fact use this value as a parameter for GRUB's AOUT_KLUDGE header.
See http://wiki.osdev.org/Bare_bones#linker.ld for more details; note the declaration of the ebss symbol in the linker script.
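The "keep an offset and increase it" allocator from the question can be sketched like this. The names `bump_init` and `bump_alloc` are hypothetical. The start address is meant to be the end-of-kernel symbol discussed above, and the size is whatever your boot code determines to be usable.

```c
#include <stddef.h>
#include <stdint.h>

static uintptr_t heap_next;     /* next free address */
static uintptr_t heap_end;      /* one past the last usable address */

/* Hands the allocator the region [start, start + size). */
void bump_init(void *start, size_t size)
{
    heap_next = (uintptr_t)start;
    heap_end = heap_next + size;
}

/* Returns size bytes aligned to align (which must be a power of two), or
 * NULL if the region is exhausted. There is no way to free a block. */
void *bump_alloc(size_t size, size_t align)
{
    /* Round the current offset up to the requested alignment. */
    uintptr_t p = (heap_next + (align - 1)) & ~(uintptr_t)(align - 1);

    /* Reject wrap-around and requests that do not fit. */
    if (p < heap_next || p > heap_end || size > heap_end - p)
        return NULL;

    heap_next = p + size;
    return (void *)p;
}
```

In the kernel this would be initialized with something like `bump_init(&endOfKernel, usable_bytes)`, where `usable_bytes` is a placeholder for the amount of memory you know to be free. Allocation is O(1) and the bookkeeping is two words, at the cost of never reclaiming anything.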

How to detect high contention critical sections?

My application uses many critical sections, and I want to know which of them might cause high contention. I want to avoid bottlenecks, to ensure scalability, especially on multi-core, multi-processor systems.
I already found one accidentally when I noticed many threads hanging while waiting to enter a critical section when the application was under heavy load. That was rather easy to fix, but how can I detect such high-contention critical sections before they become a real problem?
I know there is a way to create a full dump and get that info from it (somehow?). But this is a rather intrusive way. Are there methods the application can use on the fly to diagnose itself for such issues?
I could use data from structure _RTL_CRITICAL_SECTION_DEBUG, but there are notes that this could be unsafe across different Windows versions: http://blogs.msdn.com/b/oldnewthing/archive/2005/07/01/434648.aspx
Can someone suggest a reliable and not too complex method to get such info?
What you're talking about makes perfect sense during testing, but isn't really feasible in production code.
I mean.. you CAN do things in production code, such as determine the LockCount and RecursionCount values (this is documented), subtract RecursionCount from LockCount and presto, you have the # of threads waiting to get their hands on the CRITICAL_SECTION object.
You may even want to go deeper. The RTL_CRITICAL_SECTION_DEBUG structure IS documented in the SDK. The only thing that ever changed regarding this structure was that some reserved fields were given names and were put to use. I mean.. it's in the SDK headers (winnt.h), documented fields do NOT change. You misunderstood Raymond's story. (He's partially at fault, he likes a sensation as much as the next guy.)
My general point is, if there's heavy lock contention in your application, you should, by all means, ferret it out. But don't ever make the code inside a critical section bigger if you can avoid it. And reading the debug structure (or even lockcount/recursioncount) should only ever happen when you're holding the object. It's fine in a debug/testing version, but it should not go into production.
There are other ways to handle concurrency besides critical sections (e.g., semaphores). One of the best ways is non-blocking synchronization. That means structuring your code so that it does not require blocking even with shared resources. You should read up on concurrency. Also, you can post a code snippet here and someone can give you advice on ways to improve your concurrency code.
Take a look at Intel Thread Profiler. It should be able to help to spot such problems.
Also, you may want to instrument your code by wrapping critical sections in a proxy that dumps data to disk for analysis. It really depends on the app itself, but it could record at least how long a thread has been waiting for the CS.

Have you ever used NSZoneMalloc() instead of malloc()?

Cocoa provides for page-aligned memory areas that it calls Memory Zones, and provides a few memory management functions that take a zone as an argument.
Let's assume you need to allocate a block of memory (not for an object, but for arbitrary data). If you call malloc(size), the buffer will always be allocated in the default zone. However, somebody may have used allocWithZone: to allocate your object in another zone besides the default. In that case, it would seem better to use NSZoneMalloc([self zone], size), which keeps your buffer and owning object in the same area of memory.
Do you follow this practice? Have you ever made use of memory zones?
Update: I think there is a tendency on Stack Overflow to respond to questions about low-level topics with a lecture about premature optimization. I understand that zones probably mattered more in 1993 on NeXT hardware than they do today, and a Google search makes it pretty clear that virtually nobody is concerned with them. I am asking anyway, to see if somebody could describe a project where they made use of memory zones.
I've written software for NeXTStep, GNUstep on Linux and Cocoa on Mac OS X, and have never needed to use custom memory zones. The condition which would suggest it as a good improvement to the software has either never arisen, or never been detected as significant.
You're absolutely right in your entire question, but in practice, nobody really uses zones. As the page you link to puts it:
In most circumstances, using the default zone is faster and more efficient than creating a separate zone.
The benefit of making your own zone is:
If a page fault occurs when trying to access one of the objects, loading the page brings in all of the related objects, which could significantly reduce the number of future page faults.
If a page fault occurs, that means that the system was recently paging things out and is therefore slow anyway, and that either your app is not responsible or the solution is in the part of your app that allocated too much memory at once in the first place.
So, basically, the question is “can you prove that you really do need to create your own zone to fix a performance problem or make your app wicked fast”, and the answer is “no”.
If you find yourself doing this, you're probably operating at a lower level than you really ought to be. The subsystem pretty much ignores them; any calls to +alloc or such will get you objects in the default zone. malloc and NSAllocateCollectable are all you need to know.
