I am struggling a little bit with finding how to free memory allocated by LLVM functions. For example, when I call the function Function::Create() to create an LLVM function, how can I free the memory allocated to it? Same actually applies to many LLVM functions like IRBuilder::CreateAlloca(), IRBuilder::CreateStore(), etc. Any idea?
First of all, when deleting any kind of Value, make sure it doesn't have any Users anymore. Deleting used values will, obviously, lead to errors (in the form of an assertion). This can easily be tested by calling getNumUses(), or better (read: faster) hasNUses(0).
Once you're sure your value isn't used anymore, note that different kinds of values sometimes need different ways to delete them. For your two cases:
Functions can simply be deleted by calling operator delete. This makes sure the function is removed correctly from the Module.
Instructions should be deleted by calling eraseFromParent(). Or, equivalently, by first calling removeFromParent() and then deleting it manually.
I am an embedded engineer and I have never worked with either Windows or Visual Basic.
For my current task I have to maintain and improve a test system running on Windows, written in Visual Studio in C# (which I also have no experience with).
This project uses some libraries written in Visual Basic (all legacy code), and I found a problem in there. I cannot copy the code directly here because of legal constraints, but it is something like this:
'getter()
Dim temp As Byte() = global_data
Array.Reverse(temp)
...
This is a getter function. Since temp refers to the same array as global_data, the reverse inside it also changes global_data, so the function returns a different result on each call. I get the real value only after an odd number of calls. The previous maintainer told me to call the function only once or three times... I think this is stupid, so I changed it by adding a .clone() like this:
Dim temp As Byte() = CType(global_data.Clone(), Byte())
Array.Reverse(temp)
And it worked :)
There are a lot of functions like this so I'm gonna make similar adjustments to them too.
But since I am not familiar with the dynamics of this system, I am afraid of running into a problem later. For example, can making many clones eat up my RAM? Can those clones be destroyed? If yes, do I have to destroy them? How?
Or are there any other possible problems?
And is there another way to do this?
Thanks in advance!
To answer your question: no, there is nothing wrong with calling Clone multiple times.
The cloned byte arrays will take up memory as long as they are referenced, but that isn't unique to the byte array being cloned. Presumably that cloned byte array is being passed to other methods. Once those methods have executed and nothing references the array anymore, it becomes eligible for garbage collection, and the system will take care of it. If this code runs very, very frequently, there might be more efficient approaches than allocating and eventually garbage-collecting those arrays, but you won't "break" anything by using Clone over and over.
For a variable of a basic (value) type, copying it copies the value itself, and value-type locals are allocated and released automatically along with their scope. An array, however, is a reference type even when its elements are bytes, so Clone() allocates a new array on the managed heap.
Either way, you do not have to manage that memory yourself: the garbage collector reclaims each clone once nothing references it, so calling the getter many times will not cause trouble.
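The aliasing problem from the question and the copy-based fix can be sketched in C++ as an analogy (the original code is Visual Basic, and the function names below are hypothetical). Here std::vector stands in for the byte array and a plain copy stands in for Clone(); C++ frees the copy deterministically at scope exit, whereas .NET leaves it to the garbage collector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the shared buffer called global_data in the question.
static std::vector<std::uint8_t> global_data = {1, 2, 3, 4};

// Buggy getter: reverses the shared buffer in place, so every call flips it
// and only odd-numbered calls return the intended byte order.
const std::vector<std::uint8_t>& getter_in_place() {
    std::reverse(global_data.begin(), global_data.end());
    return global_data;
}

// Fixed getter: reverses a private copy and leaves the shared buffer alone.
// The copy is released automatically when the caller is done with it.
std::vector<std::uint8_t> getter_copy() {
    std::vector<std::uint8_t> temp = global_data;  // plays the role of Clone()
    std::reverse(temp.begin(), temp.end());
    return temp;
}
```

Calling getter_copy() any number of times yields the same result, while two consecutive calls to getter_in_place() disagree with each other.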
For the sake of some user-space performance profiling, I'd like to cleanly separate the costs of allocating memory from operations that access it. The application does no over-allocation, so every page that gets mapped will be faulted in, probably in code that runs shortly after its allocation.
What I'd like to do is set some flag, environment variable, something, to tell malloc that it should uniformly do the equivalent of calling mmap(..., MAP_POPULATE) or madvise(..., MADV_WILLNEED) or just touching every page of whatever it allocated itself. I haven't found any documentation, on any platform(!), that describes a way to do this. Is there some existing technique that's utterly undocumented, up to my ability to search? Is this a fundamentally misguided or bad idea?
If I wanted to implement this myself, I'm thinking of an LD_PRELOAD including just a reimplementation of malloc that calls the underlying malloc and then does the madvise thing (to be at least somewhat agnostic to huge pages behavior). Any reason that shouldn't work?
malloc is one of the most used, yet relatively slow functions in common use. As a result, it has received a lot of optimization attention over the years. I seriously doubt that any serious implementation of malloc does anything so slow as the string parsing that would be required to check an environment variable at every call.
LD_PRELOAD is not a bad idea. Considering what you're doing, you wouldn't even need to recompile to switch between profile and release builds. If you're open to recompiling, I would suggest doing a #define malloc(size) { malloc(size); mmap(...);}. You could even do this at the compile command line via -Dmalloc=... (so long as the system malloc is not itself a define, which would overwrite the command-line one).
Another option would be to find/implement a program that uses the debug interface to intercept and redirect calls to malloc. You could theoretically do this by messing with the post-compiled (or post-load) program's import section to point to your dll/so file.
Edit: On second thought, the define might not work on every allocation, since it is often implied by the compiler (e.g. new).
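As a rough sketch of the "touch every page" variant mentioned in the question, the helper below allocates with malloc and then writes one byte per page. The name malloc_prefault is hypothetical, and this is not the LD_PRELOAD interposer itself (that would additionally need to look up the real malloc, for example via dlsym with RTLD_NEXT). Note also that writing dirties the pages, which is not the same as an madvise hint.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unistd.h>

// Hypothetical helper (not part of any libc): allocate with malloc, then
// write one byte per page so every page is faulted in immediately.
void* malloc_prefault(std::size_t size) {
    void* p = std::malloc(size);
    if (p == nullptr || size == 0) return p;

    // Query the page size; fall back to 4096 if the call fails (assumption).
    const long reported = sysconf(_SC_PAGESIZE);
    const std::size_t page =
        reported > 0 ? static_cast<std::size_t>(reported) : 4096;

    // volatile keeps the compiler from optimizing the stores away.
    volatile unsigned char* bytes = static_cast<unsigned char*>(p);
    for (std::size_t off = 0; off < size; off += page) {
        bytes[off] = 0;
    }
    // The block need not be page-aligned, so its last page may be missed
    // by the stride above; touch the final byte explicitly.
    bytes[size - 1] = 0;
    return p;
}
```

The returned pointer is released with free() as usual. Freshly malloc'ed memory has indeterminate contents, so overwriting a byte per page loses nothing, but the same trick must not be applied blindly to calloc or realloc results.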
I've written an app in LuaJIT, using a third-party GUI framework (FFI-based) + some additional custom FFI calls. The app suddenly loses part of its functionality at some point soon after being run, and I'm quite confident it's because of some unpinned objects being GC-ed. I assume they're only referenced from the C world [1], so the Lua GC thinks they're unreferenced and can free them. The problem is, I don't know which of the numerous userdata are unreferenced (unpinned) on the Lua side.
To confirm my theory, I've run the app with GC disabled, via:
collectgarbage 'stop'
and lo, with this line, the app works perfectly well long past the point where it got broken before. Obviously, it's an ugly workaround, and I'd much prefer to have the GC enabled, and the app still working correctly...
I want to find out which unpinned object (userdata, I assume) gets GCed, so I can pin it properly on the Lua side to prevent it from being GCed prematurely. Thus, my question is:
(How) can I track which userdata objects got collected when my app loses functionality?
One problem is that, AFAIK, the LuaJIT FFI already assigns custom __gc handlers, so I cannot add my own, as there can be only one per object. And anyway, the framework is too big for me to try adding __gc in each and every imaginable place in it. Also, I've already eliminated the "most obviously suspected" places in the code by removing local from some variables, thus making them part of _G, so I assume they are not GC-able. (Or is that not enough?)
[1] Specifically, WinAPI.
For now, I've added some ffi.gc() handlers to some of my objects (printing some easily visible ALL-CAPS messages), then added some eager collectgarbage() calls to try triggering the issue as soon as possible:
ffi.gc(foo, function()
print '\n\nGC FOO !!!\n\n'
end)
[...]
collectgarbage()
And indeed, this exposed some GCing I didn't expect. Specifically, it led me to discover a note in LuaJIT's FFI docs, which is almost certainly relevant in my case:
Please note that [C] pointers [...] are not followed by the garbage collector. So e.g. if you assign a cdata array to a pointer, you must keep the cdata object holding the array alive [in Lua] as long as the pointer is still in use.
Just wondering if it is possible to obtain a task for a given proc_t inside a kext.
I have tried task_for_pid() which didn't work for some reason that I don't remember.
I tried proc_task(proc_t p) from sys/proc.h but I can't load my kext since that function is not exported.
I guess that I'm doing something wrong but I can't quite figure out what. Assuming I can get the task for a process, I'd like to use some mach calls and allocate memory, write memory and whatnot but for that, I would need the task I believe.
After some research, it would appear that this is not possible.
There is proc_task() defined in proc.h, but it's under #ifdef KERNEL_PRIVATE. The KEXT will compile, albeit with a warning.
In order to use that function, you have to add com.apple.kpi.private to the list of dependencies, but even that will fail since you are most likely NOT Apple :)
Only Apple kexts may link against com.apple.kpi.private.
Anyway, the experiment was interesting in the sense that other APIs such as vm_read, vm_write etc. are not available to use inside a KEXT (which probably makes sense since they are declared in a vm_user.h and I suppose are reserved for user mode).
I'm not aware of a public direct proc_t->task_t lookup KPI, unfortunately.
However, in some cases, you might be able to get away with using current_task() and holding on to that pointer for as long as you need it. Use task_reference and task_deallocate for reference counting (but don't hold references forever obviously, otherwise they'll never be freed). You can also access the kernel's task (corresponding to process 0) anytime via the global variable kernel_task.
V8 requires a HandleScope to be declared in order to clean up any Local handles that were created within scope. I understand that HandleScope will dereference these handles for garbage collection, but I'm interested in why each Local class doesn't do the dereferencing themselves like most internal ref_ptr type helpers.
My thought is that HandleScope can do it more efficiently by dumping a large number of handles all at once rather than one by one as they would in a ref_ptr type scoped class.
Here is how I understand the documentation and the handles-inl.h source code. I, too, might be completely wrong since I'm not a V8 developer and documentation is scarce.
The garbage collector will, at times, move stuff from one memory location to another and, during one such sweep, also check which objects are still reachable and which are not. In contrast to reference-counting types like std::shared_ptr, this is able to detect and collect cyclic data structures. For all of this to work, V8 has to have a good idea about what objects are reachable.
On the other hand, objects are created and deleted quite a lot during the internals of some computation. You don't want too much overhead for each such operation. The way to achieve this is by creating a stack of handles. Each object listed in that stack is available from some handle in some C++ computation. In addition to this, there are persistent handles, which presumably take more work to set up and which can survive beyond C++ computations.
Having a stack of references requires that you use this in a stack-like way. There is no “invalid” mark in that stack. All the objects from bottom to top of the stack are valid object references. The way to ensure this is the HandleScope. It keeps things hierarchical. With reference counted pointers you can do something like this:
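#include <memory>
using std::shared_ptr;
struct Object { int id; explicit Object(int i) : id(i) {} };
shared_ptr<Object>* f() {
  shared_ptr<Object> a(new Object(1));  // object 1 dies when f returns
  shared_ptr<Object>* b = new shared_ptr<Object>(new Object(2));
  return b;                             // object 2 outlives object 1
}
void g() {
  shared_ptr<Object>* p = f();
  shared_ptr<Object> c = *p;
  delete p;  // drop the heap-allocated pointer; c now holds the last reference
}            // object 2 is destroyed here, when c goes out of scope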
Here object 1 is created first, then object 2 is created, then the function returns and object 1 is destroyed, and only later is object 2 destroyed. The key point here is that there is a point in time when object 1 is invalid but object 2 is still valid. That's what HandleScope aims to avoid.
Some other GC implementations examine the C stack and look for pointers they find there. This has a good chance of false positives, since stuff which is in fact data could be misinterpreted as a pointer. For reachability this might seem rather harmless, but when rewriting pointers since you're moving objects, this can be fatal. It has a number of other drawbacks, and relies a lot on how the low level implementation of the language actually works. V8 avoids that by keeping the handle stack separate from the function call stack, while at the same time ensuring that they are sufficiently aligned to guarantee the mentioned hierarchy requirements.
To offer yet another comparison: an object referenced by just one shared_ptr becomes collectible (and actually will be collected) once its C++ block scope ends. An object referenced by a v8::Handle will become collectible when leaving the nearest enclosing scope which did contain a HandleScope object. So programmers have more control over the granularity of stack operations. In a tight loop where performance is important, it might be useful to maintain just a single HandleScope for the whole computation, so that you won't have to access the handle stack data structure so often. On the other hand, doing so will keep all the objects around for the whole duration of the computation, which would be very bad indeed if this were a loop iterating over many values, since all of them would be kept around till the end. But the programmer has full control, and can arrange things in the most appropriate way.
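The trade-off between one scope for a whole loop and one scope per iteration can be illustrated with a toy model. This is only a sketch under my own assumptions: HandleStack, ToyHandleScope and the two peak_* functions are hypothetical names, and nothing here mirrors V8's actual internals. The point is just that a scope records the stack height on entry and truncates back to it on exit, releasing all of its handles at once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: these names are hypothetical, not V8's real data structures.
struct HandleStack {
    std::vector<int> slots;  // each slot stands in for a reference to a heap object
};

// RAII scope: remembers the stack height on entry and truncates back to it
// on exit, dropping every handle created inside the scope in one operation.
class ToyHandleScope {
public:
    explicit ToyHandleScope(HandleStack& stack)
        : stack_(stack), base_(stack.slots.size()) {}
    ~ToyHandleScope() {
        assert(stack_.slots.size() >= base_);  // scopes must nest like a stack
        stack_.slots.resize(base_);
    }
    ToyHandleScope(const ToyHandleScope&) = delete;
    ToyHandleScope& operator=(const ToyHandleScope&) = delete;

    // Create a handle in this scope and return its slot index.
    std::size_t make_handle(int object_id) {
        stack_.slots.push_back(object_id);
        return stack_.slots.size() - 1;
    }

private:
    HandleStack& stack_;
    std::size_t base_;
};

// One scope around the whole loop: handles pile up until the loop ends.
std::size_t peak_with_outer_scope(HandleStack& stack, int iterations) {
    ToyHandleScope scope(stack);
    std::size_t peak = 0;
    for (int i = 0; i < iterations; ++i) {
        scope.make_handle(i);
        if (stack.slots.size() > peak) peak = stack.slots.size();
    }
    return peak;
}

// One scope per iteration: each iteration's handles are dropped right away.
std::size_t peak_with_inner_scope(HandleStack& stack, int iterations) {
    std::size_t peak = 0;
    for (int i = 0; i < iterations; ++i) {
        ToyHandleScope scope(stack);
        scope.make_handle(i);
        if (stack.slots.size() > peak) peak = stack.slots.size();
    }
    return peak;
}
```

In this model the outer-scope variant keeps one live handle per iteration until the loop finishes, while the inner-scope variant never holds more than one, at the cost of setting up and tearing down a scope on every iteration.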
Personally, I'd make sure to construct a HandleScope:
At the beginning of every function which might be called from outside your code. This ensures that your code will clean up after itself.
In the body of every loop which might see more than three or so iterations, so that you only keep variables from the current iteration.
Around every block of code which is followed by some callback invocation, since this ensures that your stuff can get cleaned if the callback requires more memory.
Whenever I feel that something might produce considerable amounts of intermediate data which should get cleaned (or at least become collectible) as soon as possible.
In general I'd not create a HandleScope for every internal function if I can be sure that every other function calling this will already have set up a HandleScope. But that's probably a matter of taste.
Disclaimer: This may not be an official answer, more of a conjecture on my part, but the V8 documentation is hardly useful on this topic. So I may be proven wrong.
From my understanding, gained while developing various V8-based backend applications, it's a means of handling the difference between the C++ and JavaScript environments.
Imagine the following sequence, in which a self-dereferencing pointer can break the system:
JavaScript calls a C++-wrapped V8 function: let's say helloWorld()
The C++ function creates a v8::Handle with the value "hello world =x"
C++ returns the value to the v8 virtual machine
C++ function does its usual cleaning up of resources, including dereferencing of handles
Another C++ function / process, overwrites the freed memory space
V8 reads the handle : and the data is no longer the same "hell!#(#..."
And that's just the surface of the complicated inconsistency between the two. Hence, to tackle the various issues of connecting the JavaScript VM (virtual machine) to the C++ interfacing code, I believe the development team decided to simplify the issue via the following:
All variable handles are to be stored in "buckets", aka HandleScopes, to be built / compiled / run / destroyed by their respective C++ code when needed.
Additionally, all function handles are to refer only to C++ static functions (I know this is irritating), which ensures the "existence" of the function call regardless of constructors / destructors.
Think of it from a development point of view: it marks a very strong distinction between the JavaScript VM development team and the C++ integration team (Chrome dev team?), allowing both sides to work without interfering with one another.
Lastly, it could also be for the sake of simplicity when emulating multiple VMs, as V8 was originally meant for Google Chrome. A simple HandleScope creation and destruction whenever we open / close a tab makes for much easier GC management, especially in cases where you have many VMs running (one per tab in Chrome).