Are there memory leaks in Linux?

This question is purely theoretical.
I was wondering whether the Linux kernel source code could have memory leaks, and how they would be debugged, considering that it is Linux itself, after all, that manages each program's memory.
I obviously understand that Linux, being written in C, has to manage its own memory allocations. What I don't understand is how we would measure the operating system's own memory leaks.
Note that this question is not Linux-specific; it also addresses the corresponding issues in Windows and MacOS X (darwin).

Quite frequently, non-mainstream drivers and the staging tree have memory leaks. Follow the LKML and you will see occasional fixes for corner-case mistakes in the networking code when handling lists of SKBs.
Due to the nature of the kernel, most of this work is code review and refactoring, but work is ongoing to build more tools:
http://www.linuxfoundation.org/en/Google_Summer_of_Code#kmemtrace_-_Kernel_Memory_Profiler
In certain cases you can use frameworks like Usermode Linux and then use conventional tools like Valgrind to attempt to peer into the running code:
http://user-mode-linux.sourceforge.net/

The implementation of malloc and free (actually of brk/sbrk, since malloc and free are implemented by libc in-process) is not magical or special - it's just code, like anything else, and there are data structures behind it that describe the mappings.
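To make that concrete, here is a tiny user-space sketch of my own (Linux, C; not from the answer above): a toy bump allocator sitting directly on sbrk, showing that allocation is ordinary bookkeeping layered over a kernel primitive.

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Toy bump allocator: grow the heap by n bytes via sbrk and hand the
   old program break back to the caller. No free list, no reuse. */
static void *bump_alloc(size_t n)
{
    void *p = sbrk((intptr_t)n);
    return p == (void *)-1 ? NULL : p;
}

int main(void)
{
    printf("program break before: %p\n", sbrk(0));
    char *buf = bump_alloc(4096);
    printf("program break after:  %p (buf=%p)\n", sbrk(0), (void *)buf);
    return 0;
}

Real allocators keep free lists and size metadata in similar user-space structures; the kernel only sees the brk/mmap calls that grow or shrink the process's mappings.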
If you want to test correct behavior, one way is to write test programs in user-space that are known to allocate then free all their memory correctly. Run the app, then check the internal memory allocation structures in kernel mode using a debugger (or better yet, make this check a debug assert on process shutdown).
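Such a known-good test program can be as simple as this sketch (purely illustrative): it allocates a predictable pattern of blocks and frees every one of them before exiting.

#include <stdlib.h>

int main(void)
{
    enum { N = 1024 };
    void *blocks[N];

    /* Allocate a predictable pattern of blocks... */
    for (int i = 0; i < N; i++)
        blocks[i] = malloc((size_t)(i + 1));

    /* ...and free every one of them before exiting. */
    for (int i = 0; i < N; i++)
        free(blocks[i]);

    return 0;
}

Anything the kernel still attributes to the process at shutdown, beyond the mappings it sets up itself, points at a bookkeeping bug on the kernel side, which is exactly what the suggested debug assert would catch.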

All software has bugs, including operating systems. Some of those bugs will result in memory leaks.
Linux has a kernel debugger to help track down these things, but one has to discover that a bug exists before one can track it down. Usually, once a bug has been discovered and can be reproduced at will, it becomes much easier to fix (relatively speaking - you still need a good coder to do the job). The hard part is finding the bugs in the first place and creating reliable test cases that demonstrate them. This is where you need a skilled QA team.
So I guess the short version of this answer is that good QA is as important as good coding.

Software patching at a billion miles

Could someone here shed some light about how NASA goes about designing their spacecraft architecture to ensure that they are able to patch bugs in the deployed code?
I have never built any “real time” type systems and this is a question that has come to mind after reading this article:
http://pluto.jhuapl.edu/overview/piPerspective.php?page=piPerspective_05_21_2010
“One of the first major things we’ll do when we wake the spacecraft up next week will be uploading almost 20 minor bug fixes and other code enhancements to our fault protection (or “autopilot response”) software.”
I've been a developer on public telephone switching system software, which has pretty severe constraints on reliability, availability, survivability, and performance that approach what spacecraft systems need. I haven't worked on spacecraft (although I did work with many former shuttle programmers while at IBM), and I'm not familiar with VxWorks, the operating system used on many spacecraft (including the Mars rovers, which have a phenomenal operating record).
One of the core requirements for patchability is that a system should be designed from the ground up for patching. This includes module structure, so that new variables can be added, and methods replaced, without disrupting current operations. This often means that both old and new code for a changed method will be resident, and the patching operation simply updates the dispatching vector for the class or module.
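As a user-space sketch of that dispatching idea (my own illustration, not how any particular flight or switching system actually does it): both the old and the new routine stay resident, and patching is just an update of the dispatch entry, which can also be rolled back.

#include <stdio.h>

/* Both the old and the new implementation stay resident; "patching"
   just swaps the entry in the module's dispatch table, so callers keep
   a consistent view and the change can be backed out. */
typedef int (*handler_fn)(int);

static int handler_v1(int x) { return x + 1; }
static int handler_v2(int x) { return x + 2; }   /* the bug-fixed version */

static handler_fn dispatch[] = { handler_v1 };

static int call_handler(int x) { return dispatch[0](x); }

int main(void)
{
    printf("before patch: %d\n", call_handler(1));
    dispatch[0] = handler_v2;            /* apply the patch */
    printf("after patch:  %d\n", call_handler(1));
    dispatch[0] = handler_v1;            /* un-patch */
    printf("rolled back:  %d\n", call_handler(1));
    return 0;
}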
It is just about mandatory that the patching (and un-patching) software is integrated into the operating system.
When I worked on telephone systems, we generally used the patching and module-replacement functions in the system to load and test our new features as well as bug fixes, long before these changes were submitted for builds. Every developer needs to be comfortable with patching and replacing modules as part of their daily work. It builds a level of trust in these components, and makes sure that the patching and replacement code is exercised routinely.
Testing is far more stringent on these systems than anything you've ever encountered on any other project. Complete and partial mock-ups of the deployment system will be readily available. There will likely be virtual machine environments as well, where the complete load can be run and tested. Test plans at all levels above unit test will be written and formally reviewed, just like formal code inspections (and those will be routine as well).
Fault tolerant system design, including software design, is essential. I don't know about spacecraft systems specifically, but something like high-availability clusters is probably standard, with the added capability to run both synchronized and unsynchronized, and with the ability to transfer information between sides during a failover. An added benefit of this system structure is that you can split the system (if necessary), reload the inactive side with a new software load, and test it in the production system without being connected to the system network or bus. When you're satisfied that the new software is running properly, you can simply failover to it.
As with patching, every developer should know how to do failovers, and should do them both during development and testing. In addition, developers should know every software update issue that can force a failover, and should know how to write patches and module replacement that avoid required failovers whenever possible.
In general, these systems are designed from the ground up (hardware, operating system, compilers, and possibly programming language) for these environments. I would not consider Windows, Mac OSX, Linux, or any unix variant, to be sufficiently robust. Part of that is realtime requirements, but the whole issue of reliability and survivability is just as critical.
UPDATE: As another point of interest, here's a blog by one of the Mars rover drivers. This will give you a perspective on the daily life of maintaining an operating spacecraft. Neat stuff!
I've never built a real-time system either, but I suspect those systems would not have a memory protection mechanism. They do not need one, since they wrote all of their own software themselves. Without memory protection, it is trivial for one program to write to another program's memory locations, and this can be used to hot-patch a running program (writing self-modifying code was a popular technique in the past; without memory protection, the same techniques used for self-modifying code can be used to modify another program's code).
Linux has been able to do minor kernel patching without rebooting for some time with Ksplice. This is necessary in situations where any downtime can be catastrophic. I've never used it myself, but I think the technique they use is basically this:
Ksplice can apply patches to the Linux kernel without rebooting the computer. Ksplice takes as input a unified diff and the original kernel source code, and it updates the running kernel in memory. Using Ksplice does not require any preparation before the system is originally booted (the running kernel does not need to have been specially compiled, for example). In order to generate an update, Ksplice must determine what code within the kernel has been changed by the source code patch. Ksplice performs this analysis at the ELF object code layer, rather than at the C source code layer.
To apply a patch, Ksplice first freezes execution of a computer so it is the only program running. The system verifies that no processors were in the middle of executing functions that will be modified by the patch. Ksplice modifies the beginning of changed functions so that they instead point to new, updated versions of those functions, and modifies data and structures in memory that need to be changed. Finally, Ksplice resumes each processor running where it left off.
(from Wikipedia)
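As a rough user-space analogy for the "modifies the beginning of changed functions" step (my sketch, x86-64 Linux only, assuming a stock GCC toolchain and no W^X hardening; this is not how Ksplice itself is implemented): overwrite a function's first bytes with a relative jump to its replacement.

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* The "old" and "new" versions of a routine; noinline keeps callers
   going through the patched entry point. */
__attribute__((noinline)) static int handler_v1(int x) { return x + 1; }
__attribute__((noinline)) static int handler_v2(int x) { return x + 2; }

/* Overwrite the first 5 bytes of `from` with "jmp rel32" to `to`. */
static void redirect(void *from, void *to)
{
    long page = sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)from & ~(uintptr_t)(page - 1);
    uintptr_t end = ((uintptr_t)from + 5 + (uintptr_t)page - 1) & ~(uintptr_t)(page - 1);

    mprotect((void *)start, end - start, PROT_READ | PROT_WRITE | PROT_EXEC);

    int32_t rel = (int32_t)((uint8_t *)to - ((uint8_t *)from + 5));
    uint8_t stub[5] = { 0xE9 };                 /* jmp rel32 opcode */
    memcpy(stub + 1, &rel, sizeof rel);
    memcpy(from, stub, sizeof stub);

    mprotect((void *)start, end - start, PROT_READ | PROT_EXEC);
}

int main(void)
{
    volatile int arg = 1;   /* keep the calls from being folded at compile time */
    printf("before patch: %d\n", handler_v1(arg));   /* 2 */
    redirect((void *)handler_v1, (void *)handler_v2);
    printf("after patch:  %d\n", handler_v1(arg));   /* 3: now routed to v2 */
    return 0;
}

The hard part Ksplice handles, per the quote, is doing this to kernel text only after stopping all CPUs and verifying that no thread is executing inside the functions being replaced.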
Well I'm sure they have simulators to test with and mechanisms for hot-patching. Take a look at the linked article below - there's a pretty good overview of the spacecraft design. Section 5 discusses the computation machinery.
http://www.boulder.swri.edu/pkb/ssr/ssr-fountain.pdf
Of note:
Redundant processors
Command switching by the uplink card that does not require processor help
Time-lagged rules
I haven't worked on spacecraft, but the machines I've worked on have all been built to have a stable idle state where it's possible to shut down the machine briefly to patch the firmware. The systems that have accommodated 'live' updates are those that were broken into interacting components, where you can bring down one segment of the system long enough to update it and the other components can continue operating as normal, as they can tolerate the temporary downtime of the serviced component.
One way you can do this is to have parallel (redundant) capabilities, such as parallel machines that all perform the same task, so that the process can be routed around the machine under service. The benefit of this approach is that you can bring it down for longer periods for more significant service, such as regular hardware preventative maintenance. Once you have this capability, supporting downtime for a firmware patch is fairly easy.
One of the approaches that's been used in the past is to use LISP.

Analogs of Intel's Cluster OpenMP

Are there analogs of Intel Cluster OpenMP? This library simulates a shared-memory machine (like an SMP or NUMA system) while running on a distributed-memory machine (like an Ethernet-connected cluster of PCs).
This library allows OpenMP programs to be started directly on a cluster.
I am searching for:
libraries that allow multithreaded programs to run on a distributed cluster,
or libraries (replacements for e.g. libgomp) that allow OpenMP programs to run on a distributed cluster,
or compilers, besides Intel C++, capable of generating cluster code from OpenMP programs.
The keyword you want to be searching for is "distributed shared memory"; there's a Wikipedia page on the subject. MOSIX, which became openMOSIX, which is now being developed as part of LinuxPMI, is the closest thing I'm aware of; but I don't have much experience with the current LinuxPMI project.
One thing you need to be aware of is that none of these systems work especially well, performance-wise. (Maybe a more optimistic way of saying it is that it's a tribute to the developers that these things work at all). You can't just abstract away the fact that accessing on-node memory is very very different from memory on some other node over a network. Even making local memory systems fast is difficult and requires a lot of hardware; you can't just hope that a little bit of software will hide the fact that you're now doing things over a network.
The performance ramifications are especially important when you consider that the OpenMP programs you might want to run are almost always going to be written assuming that memory accesses are local and thus cheap, because, well, that's what OpenMP is for. False sharing is bad enough when you're talking about different sockets accessing a common cache line - page-based false sharing across a network is just disastrous.
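To illustrate how bad false sharing gets even within a single node, here is a small sketch I'm adding (C with OpenMP, compile with something like gcc -fopenmp; timings will vary): per-thread counters packed into one cache line fight over it on every increment, while padded counters do not. A distributed-shared-memory system turns the same pattern into page-granularity traffic over the network.

#include <omp.h>
#include <stdio.h>

#define N_THREADS 4
#define ITERS 100000000L

/* Adjacent counters: all four live in the same cache line, so every
   increment invalidates that line in the other cores' caches. */
static struct { volatile long count; } hot[N_THREADS];

/* Padded counters: each one gets its own 64-byte line. */
static struct { volatile long count; char pad[64 - sizeof(long)]; } cold[N_THREADS];

int main(void)
{
    double t0 = omp_get_wtime();
    #pragma omp parallel num_threads(N_THREADS)
    {
        int id = omp_get_thread_num();
        for (long i = 0; i < ITERS; i++)
            hot[id].count++;
    }
    double t1 = omp_get_wtime();

    #pragma omp parallel num_threads(N_THREADS)
    {
        int id = omp_get_thread_num();
        for (long i = 0; i < ITERS; i++)
            cold[id].count++;
    }
    double t2 = omp_get_wtime();

    printf("shared line: %.2fs   padded: %.2fs\n", t1 - t0, t2 - t1);
    return 0;
}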
Now, it could well be that you have a very simple program with very little actual shared state, and a distributed shared memory system wouldn't be so bad -- but in that case I've got to think you'd be better off in the long run just migrating the problem away from a shared-memory-based model like OpenMP towards something that'll work better in a cluster environment anyway.

Can you start and stop boundschecker (DevPartner)?

I'm trying to use boundschecker to analyze a rather complex program. Running the program with boundschecker is almost too slow for it to be of any use since it takes me almost a day to run the program to the point in the code where I suspect the issue exists. Can anyone give me some ideas for how to inspect only certain parts of my software using boundschecker (DevPartner) in Visual Studio 2005?
Thanks in advance for all your help!
I last used BoundsChecker a few years ago, and had the same problems. With large projects, it makes everything run so slowly that it is useless. We ended up ditching it.
But we still needed some of its functionality, though, like you, not for the whole program. So we had to do it ourselves.
In our case, we mainly used it to try and track down memory leaks. If that's your objective as well, there are other options.
1. Visual Studio does a pretty good job of telling you about memory leaks when your program exits
2. It reports leaks in the order that they were created
3. It will tell you exactly where the leaked memory was created if your source files have this at the top
#ifdef _DEBUG
#undef THIS_FILE
static char THIS_FILE[]=__FILE__;
#define new DEBUG_NEW
#endif
Those help a lot, but they're often not enough. Adding that snippet everywhere isn't always feasible. If you use factory classes, knowing where memory was allocated doesn't help at all. So when all else fails, we take advantage of #2.
Add something like the following:
#define LEAK(str) {char *s = (char*)malloc(100); strcpy(s, str);}
Then, pepper your code with "LEAK("leak1");" or whatever. Run the program, and exit it. Your new leaked strings will display in Visual Studio's leak dump surrounding the existing leaks. Keep adding/moving your LEAK statements and re-running the program to narrow your search until you've pinpointed the exact location. Then fix the leak, remove your debugging leaks, and you're all set!
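If you're already on MSVC's debug CRT, a related option (my sketch, not part of the original answer) is the CRT debug-heap API: dump the outstanding blocks yourself, and once a leak report gives you a block's allocation number (the {N} tag), break on exactly that allocation during the next run.

#define _CRTDBG_MAP_ALLOC   /* report file/line for CRT malloc calls */
#include <stdlib.h>
#include <crtdbg.h>

int main(void)
{
    /* Break into the debugger when allocation number N is made; N is
       the {N} value printed for the block in a previous leak report.
       (152 here is just a placeholder.) */
    /* _CrtSetBreakAlloc(152); */

    char *leaked = (char *)malloc(64);   /* deliberately never freed */
    (void)leaked;

    /* Dump all blocks still allocated at this point. */
    _CrtDumpMemoryLeaks();
    return 0;
}

This only does anything in debug builds; in release builds these calls compile to no-ops. It relies on the allocation sequence being deterministic between runs, much like the LEAK() trick above relies on the ordering of the leak dump.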
BoundsChecker tracks all memory allocations and releases in extreme detail. It knows, for instance, that such and such a memory allocation was done from the C runtime heap, which in turn was taken from a Win32 heap, which in turn started life as memory allocated by VirtualAlloc. If the application was instrumented (FinalCheck), it also has detailed information as to which pointers reference the memory.
This is one reason (of many) why the thing is slow.
If BC were to connect to an application late, it would have none of this data built up, and would have to either (1) dig it all up at once, or (2) start guessing about things. Neither solution is very practical.
One way to lighten up BoundsChecker is to exclude from instrumentation all but the few modules you are interested in. I know that's not great, because if you knew where the leak was you wouldn't need BoundsChecker. What I usually recommend is that you use BC's Active Check mode first with only Memory Tracking enabled. You miss the API validations, but you can always rerun those separately. After you run Active Check and get clues about which modules tend to be problematic, only then do you enable instrumentation for the module or modules of interest and their dependencies.
We know Final Check is annoyingly slow, but as Mistiano correctly states, with Final Check BC keeps not only a graph of all allocated blocks but also all pointers and contexts to them. Therein lies the magic of how Final Check can nail leaks and corruptions at the point of occurrence, not just on application shutdown or fault.
Shameless plug: I work on the DevPartner team. We are releasing DPS 10.5 on February 4, 2011 with x64 application support in BC. Unlike the relatively ancient and undersold BC64 for Itanium, which only provided Active Check, DPS 10.5 provides full Final Check support for x64 applications, both for pure C++ and for native modules running in .NET processes. See microfocus.com under MF Developer for details.

Supplying 64 bit specific versions of your software

Would I expect to see any performance gain by building my native C++ Client and Server into 64 bit code?
What sort of applications benefit from having a 64 bit specific build?
I'd imagine anything that makes extensive use of long would benefit, or any application that needs a huge amount of memory (i.e. more than 2Gb), but I'm not sure what else.
Architectural benefits of Intel x64 vs. x86
larger address space
a richer register set
can link against external libraries or load plugins that are 64-bit
Architectural downside of x64 mode
all pointers (and thus many instructions too) take up 2x the memory, cutting the effective processor cache size in half in the worst case (see the sketch after this list)
cannot link against external libraries or load plugins that are 32-bit
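Here is the sketch referred to above (mine, purely illustrative): a pointer-heavy node roughly doubles in size when rebuilt for 64-bit, which is where the worst-case halving of effective cache capacity comes from.

#include <stdio.h>

/* A typical list/tree node: three pointers and a small key. */
struct node {
    struct node *next;
    struct node *prev;
    void        *payload;
    int          key;
};

int main(void)
{
    /* Commonly 16 bytes in a 32-bit build versus 32 bytes in a 64-bit
       build (8-byte pointers plus alignment padding around the int),
       so the same cache holds roughly half as many nodes. */
    printf("sizeof(struct node) = %zu\n", sizeof(struct node));
    return 0;
}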
In applications I've written, I've sometimes seen big speedups (30%) and sometimes seen big slowdowns (> 2x) when switching to 64-bit. The big speedups have happened in number crunching / video processing applications where I was register-bound.
The only big slowdown I've seen in my own code when converting to 64-bit is from a massive pointer-chasing application where one compiler made some really bad "optimizations". Another compiler generated code where the performance difference was negligible.
Benefit of porting now
Writing 64-bit-compatible code isn't that hard 99% of the time, once you know what to watch out for. Mostly, it boils down to using size_t and ptrdiff_t instead of int when referring to memory addresses (I'm assuming C/C++ code here). It can be a pain to convert a lot of code that wasn't written to be 64-bit-aware.
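As a small sketch of that habit (mine; the sizes in the comments assume a Windows-style LLP64 target): keep pointer differences in ptrdiff_t and object sizes in size_t, and the same code stays correct on both 32-bit and 64-bit builds.

#include <stddef.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* On 64-bit Windows (LLP64), int and long stay 32-bit while pointers
       and size_t become 64-bit, so stuffing either value into an int or
       a long silently truncates on large objects. */
    printf("sizeof(long)=%zu sizeof(void*)=%zu sizeof(size_t)=%zu\n",
           sizeof(long), sizeof(void *), sizeof(size_t));

    char buf[32] = "hello, sixty-four bit world";
    char *p = strchr(buf, 'w');

    ptrdiff_t offset = p - buf;      /* the right type for a pointer difference */
    size_t length = strlen(buf);     /* the right type for an object size */

    printf("offset=%td length=%zu\n", offset, length);
    return 0;
}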
Even if it doesn't make sense to make a 64-bit build for your application (it probably doesn't), it's worth the time to learn what it would take to make the build so that at least all new code and future refactorings will be 64-bit-compatible.
Before working too hard on figuring out whether there is a technical case for the 64-bit build, you must verify that there is a business case. Are your customers asking for such a build? Will it give you a definitive leg up in competition with other vendors? What is the cost for creating such a build and what business costs will be incurred by adding another item to your accounting, sales and marketing processes?
While I recognize that you need to understand the potential for performance improvements before you can get a handle on competitive advantages, I'd strongly suggest that you approach the problem from the big picture perspective. If you are a small or solo business, you owe it to yourself to do the appropriate due diligence. If you work for a larger organization, your superiors will greatly appreciate the effort you put into thinking about these questions (or will consider the whole issue just geeky excess if you seem unprepared to answer them).
With all of that said, my overall technical response would be that the vast majority of user-facing apps will see no advantage from a 64-bit build. Think about it: how many of the performance problems in your current app come from being processor-bound (or RAM-access bound)? Is there a performance problem in your current app at all? (If not, you probably shouldn't be asking this question.)
If it is a Client/Server app, my bet is that network latency contributes far more to the performance on the client side (especially if your queries typically return a lot of data). Assuming that this is a database app, how much of your performance profile is due to disk latency times on the server? If you think about the entire constellation of factors that affect performance, you'll get a better handle on whether your specific app would benefit from a 64-bit upgrade and, if so, whether you need to upgrade both sides or whether all of your benefit would derive just from the server-side upgrade.
Not much else, really. Though writing a 64-bit app can have some advantages to you, as the programmer, in some cases. A simplistic example is an application whose primary focus is interacting with the registry. As a 32-bit process, your app would not have access to large swaths of the registry on 64-bit systems.
Continuing mdbritt's comment, building for 64-bit makes far more sense [currently] if it's a server build, or if you're distributing to Linux users.
It seems that far more Windows workstations are still 32-bit, and there may not be a large customer base for a new build.
On the other hand, many server installs are 64-bit now: RHEL, Windows, SLES, etc. NOT building for them would be cutting off a lot of potential usage, in my opinion.
Desktop Linux users are also likely to be running the 64-bit versions of their favorite distro (most likely Ubuntu, SuSE, or Fedora).
The main obvious benefit of building for 64-bit, however, is that you get around the 3GB barrier for memory usage.
According to this web page, you benefit most from the extra general-purpose registers of a 64-bit CPU if you use many and/or deeply nested loops.
You can expect a gain thanks to the additional registers and the new parameter-passing convention (which is really tied to the additional registers).

Finding GDI/User resource usage from a crash dump

I have a crash dump of an application that is supposedly leaking GDI. The app is running on XP and I have no problems loading it into WinDbg to look at it. Previously we have used the Gdikdx.dll extension to look at GDI information, but this extension is not supported on XP or Vista.
Does anyone have any pointers for finding GDI object usage in WinDbg?
Alternatively, I do have access to the failing program (and its stress testing suite) so I can reproduce on a running system if you know of any 'live' debugging tools for XP and Vista (or Windows 2000 though this is not our target).
I've spent the last week working on a GDI leak finder tool. We also perform regular stress testing, and our app never lasted longer than a day's worth without stopping due to user/GDI object handle overconsumption.
My attempts have been pretty successful as far as I can tell. Of course, I spent some time beforehand looking for an alternative and quicker solution. It is worth mentioning that I had some previous semi-lucky experience with the GDILeaks tool from the MSDN article mentioned in another answer here. Not to mention that I had to solve a few problems before putting it to work, and this time it just didn't give me what I wanted, or how I wanted it. The downside of their approach is the heavyweight debugger interface (it slows down the target under investigation by orders of magnitude, which I found unacceptable). Another downside is that it did not work all the time - on some runs I simply could not get it to report or compute anything! Its complexity (judging by the amount of code) was another scare-away factor. I'm not a big fan of GUIs, as it is my belief that I'm more productive with no windows at all ;o). I also found it hard to make it find and use my symbols.
One more tool I used before setting out to write my own was leakbrowser.
Anyway, I finally settled on an iterative approach to achieve the following goals:
minor performance penalties
implementation simplicity
non-invasiveness (used for multiple products)
relying as much as possible on what was already available
I used Detours (under the non-commercial-use license) for the core functionality (it is an injectable DLL). I put JavaScript to use for automatic code generation (a 15K script generates 100K of source code; there is no way I'd code this manually, and no C preprocessor is involved!), plus a WinDbg extension for data analysis and snapshot/diff support.
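For a feel of what the core of such a tool can look like, here is a hypothetical, much-reduced sketch of my own (everything except the Detours and GDI APIs is made up): detour one GDI create/delete pair and keep a live-handle count, which a snapshot/diff pass can then compare across stress-test intervals.

#include <windows.h>
#include <detours.h>

/* Pointers to the real GDI entry points; Detours rewrites them to
   point at trampolines once the hooks are attached. */
static HBRUSH (WINAPI *Real_CreateSolidBrush)(COLORREF) = CreateSolidBrush;
static BOOL   (WINAPI *Real_DeleteObject)(HGDIOBJ)      = DeleteObject;

static volatile LONG g_live_brushes;

static HBRUSH WINAPI Hook_CreateSolidBrush(COLORREF color)
{
    InterlockedIncrement(&g_live_brushes);   /* a real tool would also record a call stack */
    return Real_CreateSolidBrush(color);
}

static BOOL WINAPI Hook_DeleteObject(HGDIOBJ h)
{
    if (GetObjectType(h) == OBJ_BRUSH)
        InterlockedDecrement(&g_live_brushes);
    return Real_DeleteObject(h);
}

/* Called from the injected DLL's initialization (hypothetical name). */
void install_gdi_hooks(void)
{
    DetourTransactionBegin();
    DetourUpdateThread(GetCurrentThread());
    DetourAttach((PVOID *)&Real_CreateSolidBrush, (PVOID)Hook_CreateSolidBrush);
    DetourAttach((PVOID *)&Real_DeleteObject, (PVOID)Hook_DeleteObject);
    DetourTransactionCommit();
}

A real version hooks the whole family of GDI and USER creation/deletion calls and stores per-handle context, but the mechanism is the same.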
To tell the long story short - after I was finished, it was a matter of a few hours to collect information during another stress test and another hour to analyze and fix the leaks.
I'll be more than happy to share my findings.
P.S. I did spend some time trying to improve on the previous work. My intention was to minimize false positives (I saw far too many of those while developing), so the tool also checks for allocation/release consistency and avoids counting allocations that are never actually leaked.
Edit: Find the tool here
There was an MSDN Magazine article from several years ago that talked about GDI leaks. It points to several different places with good information.
In WinDbg, you may also try the !poolused command for some information.
Finding resource leaks from a crash dump (post-mortem) can be difficult - if the leak always happens in the same place, from the same variable, and you're lucky, you might see the last place it was leaked, etc. It would probably be much easier with a live program running under the debugger.
You can also try using Microsoft Detours, but the license doesn't always work out. It's also a bit more invasive and advanced.
I have created a WinDbg script for that. Look at the answer to:
Command to get GDI handle count from a crash dump
To track the allocation stack you could set a ba (Break on Access) breakpoint past the last allocated GDICell object to break just at the point when another GDI allocation happens. That could be a bit complex because the address changes but it could be enough to find pretty much any leak.
