A possible method to completely free memory in Tcl? - memory-management

Is there a way to completely free memory in Tcl, meaning to release it back to the operating system, without closing the tclsh8.5 process?

If you're doing this because you're about to unload the Tcl DLL from your process, there's the Tcl_Finalize() API call. Otherwise, no, there's no way to do it by default. If you build Tcl with the PURIFY symbol defined (named for a commercial tool that does memory analysis), Tcl will use the system memory allocator instead of its own high-performance thread-aware allocator; that might be able to release memory back to the OS. Or it might not; at that point it's out of Tcl's hands entirely.
There's no script-level API for this at all. Most of the time, if a process uses a lot of memory it's either going to exit soon, going to reuse that much memory, or can rely on the OS just paging the unused memory out.
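For the embedding case mentioned above, here is a minimal sketch, assuming you are hosting Tcl from C/C++ (linked against Tcl 8.5) and are about to unload the library; it is illustrative only, not taken from the original answer:

    #include <tcl.h>

    int main() {
        Tcl_Interp *interp = Tcl_CreateInterp();
        Tcl_Eval(interp, "set x [string repeat a 1000000]");   // do some work
        Tcl_DeleteInterp(interp);

        // Tcl_Finalize() tears down all of Tcl's internal state. It is only safe
        // once no interpreters or Tcl threads remain, e.g. just before unloading
        // the Tcl DLL from the process.
        Tcl_Finalize();
        return 0;
    }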

Related

Is writing kernel memory with system call available?

I know that system calls are used to communicate between user level and kernel level.
So does that mean I can write to kernel memory with a system call?
For example, could write() be used to write to kernel memory?
But if that is possible, wouldn't it also be a big security problem?
And if I can't, why not?
Let's break your questions down one by one.
Yes, you can write to kernel memory via system calls directly if you implemented one that allowed writing to arbitrary locations in memory. Whether or why you would want something like this is another question. Something like this would pose a huge security risk: for example, any process could elevate its privileges or install rootkits in the kernel easily using your system call.
However, Linux does have an interface for reading and writing to kernel memory. Have a look at /dev/kmem. It allows you to open kernel memory as a 'file', seek around and read and write to it.
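As a rough illustration of that interface (a sketch, not production code): it assumes root privileges and a kernel built with CONFIG_DEVKMEM, and the offset used is purely hypothetical, since a real address would come from /proc/kallsyms.

    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        int fd = open("/dev/kmem", O_RDONLY);        // kernel memory exposed as a 'file'
        if (fd < 0) { perror("open /dev/kmem"); return 1; }

        off_t addr = 0x100000;                       // hypothetical kernel virtual offset
        unsigned char buf[16];
        if (lseek(fd, addr, SEEK_SET) == (off_t)-1 || read(fd, buf, sizeof buf) < 0)
            perror("seek/read");
        close(fd);
        return 0;
    }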

When does the MacOSX 'free' library call madvise, and is there any way to control it?

I've got a C++ program that is notably slower on OS X 10.8.2 than on Linux. Profiling shows that the reason is that calls to free (that result from STL operations, FWIW) are much slower on OS X, because they go and call madvise, and real time gets consumed in there.
Is there any way to modulate this behavior of OS X?
Well, yes!
I had horrible performance issues with malloc/free in Linux and started looking for a replacement.
Two options came to mind: tbbmalloc (part of Intel TBB, which is free, BTW) and Google's malloc.
After extensive testing it wasn't clear which of the two was faster, but both were significantly faster than libc's implementation.
I went with tbbmalloc since it worked more smoothly; Google's malloc had a bug that caused virtual memory to be very large (reserved but not committed), which was very bad for my app (IT daemons would kill it).
The good:
Much better performance than libc's malloc: it was 3x-300x faster in an STL-heavy app.
Simple integration. No code change; add/change one line in the executable's makefile. No change to the SOs.
The bad:
Memory checkers will not work with the replacements; for memcheck/valgrind/etc., revert to the original malloc.
The app took 10-30% more memory.
The app I developed was a CAD application that used tens of GBs, building and destroying tens of millions of various structures (lots of STL maps, vectors, hash_maps).
How to do this:
In the linker command, add -ltbbmalloc and make sure the library is in the lib search path (-L flag).

How can I access my system resources without the operating system's intermediation?

I want to access my system resources, such as the CPU, without using OS system calls.
Is there any way to make this possible?
The only way to access the hardware directly on most modern operating systems, Linux and Windows included, is via kernel code. Linux Device Drivers is an excellent starting point for writing such code on Linux, even if it is a bit dated.
Otherwise, the OS provides various I/O facilities and controls the allocation of resources to the user applications, using the system call interface. The system call interface is omnipresent in its basic concept among all operating systems that actually have some sort of separation between kernel and user code. The use of software interrupts is the standard way to implement system calls on current hardware.
You need a system call to allocate the slightest amount of memory and even to read or write a single character. Not to mention that even a program that does absolutely nothing generally needs a few system calls just to be loaded.
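To make the point concrete, here is a minimal sketch (for Linux) that writes a few characters to standard output by invoking the write(2) system call directly through syscall(); even this trivial operation still goes through the kernel:

    #include <sys/syscall.h>
    #include <unistd.h>

    int main() {
        // Bypasses the C library's buffered I/O, but not the OS: the kernel
        // still performs the actual write on the process's behalf.
        const char msg[] = "hi\n";
        syscall(SYS_write, 1, msg, sizeof msg - 1);
        return 0;
    }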
You could gain more direct access to the hardware if you used DOS or an exokernel design.
But why would you want to do that anyway? Modern hardware is far from trivial to work with directly.

Low-overhead I/O monitoring on Windows

I would like a low-overhead method of monitoring the I/O of a Windows process.
I got several useful answers to Monitoring certain system calls done by a process in Windows. The most promising was about using the Windows Performance Toolkit to get a kernel event trace. All necessary information can indeed be pulled from there, but WPT is massive overkill for what I need and consequently has a prohibitive overhead.
My idea was to implement an alternative approach to detecting C/C++ dependency graphs. Usually this is done by passing an option to the compiler (-M, for example). This works fine for compilers and tools which have such an option, but not all of them do, and those that do often implement it differently. So I implemented an alternative way of doing this on Linux, using strace to detect which files are opened. Running gcc (for example) this way has about a 50% overhead (ballpark figure), and I was hoping to find a way to do this on Windows with a similar overhead.
The xperf set of tools have two issues which prevents me from using them in this case:
There is no way to monitor file-I/O events for a single process; I have to use the kernel event trace, which traces every single process and thus generates huge amounts of data (15 MB for the time it takes to run gcc, YMMV).
As a result of having to use the kernel event trace, I have to run as administrator.
I really don't need events at the kernel level; I suppose I could manage just as well if I could just monitor, say, the Win32 API call CreateFile(), and possibly CreateProcess() if I want to catch forked processes.
Any clever ideas?
Use API hooking. Hooking NtCreateFile and a few other calls in ntdll should be enough. I've had good experience using EasyHook as a framework to do the hooking itself - it's free and open source. It even supports managed hooking (C# etc.) if you want to do that. It's quite easy to set up.
It's located at http://easyhook.codeplex.com
Edit: BTW, Detours does not allow 64-bit hooking (unless you buy a license, for a nominal price of 10,000 USD).
EasyHook does not allow native hooks across a WOW64 boundary. It does allow managed hooking across WOW64 boundaries, though.
I used Microsoft's Detours in the past to track memory allocations by intercepting particular API calls. You could use it to track CreateFile and CreateProcess.
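As a rough sketch of that approach (illustrative, not the code actually used): a Detours-based hook DLL that logs every CreateFileW call might look like the following. It must be built against detours.lib and injected into the target process; the hook function name is made up here.

    #include <windows.h>
    #include <detours.h>
    #include <cstdio>

    // Pointer to the real CreateFileW, initialised to the original entry point.
    static HANDLE (WINAPI *TrueCreateFileW)(LPCWSTR, DWORD, DWORD,
        LPSECURITY_ATTRIBUTES, DWORD, DWORD, HANDLE) = CreateFileW;

    static HANDLE WINAPI HookedCreateFileW(LPCWSTR name, DWORD access, DWORD share,
        LPSECURITY_ATTRIBUTES sa, DWORD disp, DWORD flags, HANDLE tmpl)
    {
        wprintf(L"CreateFileW: %s\n", name);   // record which file is being opened
        return TrueCreateFileW(name, access, share, sa, disp, flags, tmpl);
    }

    BOOL WINAPI DllMain(HINSTANCE, DWORD reason, LPVOID)
    {
        if (reason == DLL_PROCESS_ATTACH) {
            DetourTransactionBegin();
            DetourUpdateThread(GetCurrentThread());
            DetourAttach(&(PVOID&)TrueCreateFileW, HookedCreateFileW);
            DetourTransactionCommit();
        } else if (reason == DLL_PROCESS_DETACH) {
            DetourTransactionBegin();
            DetourUpdateThread(GetCurrentThread());
            DetourDetach(&(PVOID&)TrueCreateFileW, HookedCreateFileW);
            DetourTransactionCommit();
        }
        return TRUE;
    }

The same pattern works for CreateProcessW if you also want to catch child processes.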
It seems like Dr. Memory's System Call Tracer for Windows is exactly what I was looking for. It is basically a strace implementation for Windows.

Windows malloc replacement (e.g., tcmalloc) and dynamic crt linking

A C++ program that uses several DLLs and Qt should be equipped with a malloc replacement (like tcmalloc) for performance problems that can be verified to be caused by the Windows malloc. On Linux there is no problem, but on Windows there are several approaches, and I find none of them appealing:
1. Put new malloc in lib and make sure to link it first (Other SO-question)
This has the disadvantage that, for example, strdup will still use the old malloc, and a later free may crash the program.
2. Remove malloc from the static libcrt library with lib.exe (Chrome)
This is tested/used(?) for Chrome/Chromium, but has the disadvantage that it only works when linking the CRT statically. Static linking has the problem that if one system library is linked dynamically against msvcrt, there may be mismatches in heap allocation/deallocation. If I understand it correctly, tcmalloc could be linked dynamically such that there is a common heap for all self-compiled DLLs (which is good).
3. Patch CRT source code (Firefox)
Firefox's jemalloc apparently patches the Windows CRT source code and builds a new CRT. This again has the static/dynamic linking problem above.
One could think of using this to generate a dynamic MSVCRT, but I think this is not possible, because the license forbids providing a patched MSVCRT with the same name.
4. Dynamically patching loaded CRT at run time
Some commercial memory allocators can do such magic. tcmalloc can, too, but this seems rather ugly. It had some issues, but they have been fixed. Currently, with tcmalloc, it does not work under 64-bit Windows.
Are there better approaches? Any comments?
Q: A C++ program that is split across several DLLs should:
A) replace malloc?
B) ensure that allocation and deallocation happen in the same DLL module?
A: The correct answer is B. A C++ application design that incorporates multiple DLLs SHOULD ensure that a mechanism exists to guarantee that things allocated on the heap in one DLL are freed by the same DLL module.
Why would you split a C++ program into several DLLs anyway? By C++ program I mean that the objects and types you are dealing with are C++ templates, STL objects, classes, etc. You CAN'T pass C++ objects across DLL boundaries without either a lot of very careful design and lots of compiler-specific magic, or suffering from massive duplication of object code in the various DLLs and, as a result, an application that is extremely version sensitive. Any small change to a class definition will force a rebuild of all exes and DLLs, removing at least one of the major benefits of the DLL approach to app development.
Either stick to a straight C interface between the app and the DLLs, suffer hell, or just compile the entire C++ app as one exe.
It's a bold claim that a C++ program "should be equipped with a malloc replacement (like tcmalloc) for performance problems...."
"[In] 6 out of 8 popular benchmarks ... [real-sized applications] replacing back the custom allocator, in which people had invested significant amounts of time and money, ... with the system-provided dumb allocator [yielded] better performance. ... The simplest custom allocators, tuned for very special situations, are the only ones that can provide gains." --Andrei Alexandrescu
Most system allocators are about as good as a general purpose allocator can be. You can do better only if you have a very specific allocation pattern.
Typically, such special patterns apply only to a portion of the program, in which case, it's better to apply the custom allocator to the specific portion that can benefit than it is to globally replace the allocator.
C++ provides a few ways to selectively replace the allocator. For example, you can provide an allocator to an STL container or you can override new and delete on a class by class basis. Both of these give you much better control than any hack which globally replaces the allocator.
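For example, here is a minimal sketch of both techniques (the pool and all names are illustrative; the pool just forwards to malloc to keep it short): a class-scoped operator new/delete and a simple C++11 STL-compatible allocator, so only the allocations that matter go through the custom allocator.

    #include <cstdlib>
    #include <new>
    #include <vector>

    // Hypothetical custom pool; here it merely forwards to malloc/free.
    struct Pool {
        void* allocate(std::size_t n) { return std::malloc(n); }
        void  deallocate(void* p)     { std::free(p); }
    };
    static Pool g_pool;

    // Class-scoped operator new/delete: only Node allocations use the pool.
    struct Node {
        int value;
        static void* operator new(std::size_t n) { return g_pool.allocate(n); }
        static void  operator delete(void* p)    { g_pool.deallocate(p); }
    };

    // Minimal C++11 allocator so a single container can bypass the global heap.
    template <class T>
    struct PoolAllocator {
        using value_type = T;
        PoolAllocator() = default;
        template <class U> PoolAllocator(const PoolAllocator<U>&) {}
        T* allocate(std::size_t n) { return static_cast<T*>(g_pool.allocate(n * sizeof(T))); }
        void deallocate(T* p, std::size_t) { g_pool.deallocate(p); }
    };
    template <class T, class U>
    bool operator==(const PoolAllocator<T>&, const PoolAllocator<U>&) { return true; }
    template <class T, class U>
    bool operator!=(const PoolAllocator<T>&, const PoolAllocator<U>&) { return false; }

    int main() {
        Node* n = new Node{42};                      // goes through Node::operator new
        delete n;                                    // goes through Node::operator delete
        std::vector<int, PoolAllocator<int>> v;      // this container allocates via the pool
        v.push_back(1);
    }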
Note also that replacing malloc and free will not necessarily change the allocator used by operators new and delete. While the global new operator is typically implemented using malloc, there is no requirement that it do so. So replacing malloc may not even affect most of the allocations.
If you're using C, chances are you can wrap or replace key malloc and free calls with your custom allocator just where it matters and leave the rest of the program to use the default allocator. (If that's not the case, you might want to consider some refactoring.)
System allocators have decades of development behind them. They are stable and well-tested. They perform extremely well for general cases (in terms of raw speed, thread contention, and fragmentation). They have debugging versions for leak detection and support for tracking tools. Some even improve the security of your application by providing defenses against heap buffer overrun vulnerabilities. Chances are, the libraries you want to use have been tested only with the system allocator.
Most of the techniques to replace the system allocator forfeit these benefits. In some cases, they can even increase memory demand (because they can't be shared with the DLL runtime possibly used by other processes). They also tend to be extremely fragile in the face of changes in the compiler version, runtime version, and even OS version. Using a tweaked version of the runtime prevents your users from getting benefits of runtime updates from the OS vendor. Why give all that up when you can retain those benefits by applying a custom allocator just to the exceptional part of the program that can benefit from it?
Where does your premise "A C++ program that uses several DLLs and QT should be equipped with a malloc replacement" come from?
On Windows, if all the DLLs use the shared MSVCRT, then there is no need to replace malloc. By default, Qt builds against the shared MSVCRT DLL.
One will run into problems if they:
1) mix DLLs that link the CRT statically with DLLs that use the shared VCRT,
2) AND also free memory in a different module from the one that allocated it (i.e., free memory in a statically linked DLL that was allocated by the shared VCRT, or vice versa).
Note that adding your own ref-counted wrapper around a resource can help mitigate the problems associated with resources that need to be deallocated in particular ways (i.e., a wrapper that disposes of one type of resource via a callback to the originating DLL, a different wrapper for a resource that originates from another DLL, etc.).
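A sketch of that wrapper idea (the Widget API is hypothetical): a std::shared_ptr with a custom deleter that calls back into the originating DLL keeps allocation and deallocation in the same module, so the two CRTs never free each other's memory.

    #include <memory>

    struct Widget;                               // opaque type owned by the DLL

    // Exported by the DLL that owns the allocation (hypothetical API).
    extern "C" Widget* CreateWidget();
    extern "C" void    DestroyWidget(Widget*);

    // In the client: the deleter calls back into the originating DLL, so memory
    // is always released by the same module/CRT that allocated it.
    inline std::shared_ptr<Widget> MakeWidget() {
        return std::shared_ptr<Widget>(CreateWidget(), &DestroyWidget);
    }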
nedmalloc? Also, NB that smplayer uses a special patch to override malloc, which may be the direction you're headed in.
