Serializing a CUfunction object

Is it possible to serialize a CUfunction object generated by NVRTC and save it to non-volatile storage (disk, SSD, etc.) so that it can be used again later without going through the JIT compilation process?

I have not found a way to do this. The API exposes a way to retrieve the PTX from an NVRTC compilation call, but not the resulting binary payload. You could save that PTX output and reload it later to avoid invoking NVRTC on every run, but that is about all that is possible.

Related

Golang new memory allocation

I have started programming in Go and I was wondering: when new(Object) is used, it allocates enough memory for that object, right? If so, how do I free this memory once I have finished using the object?
I ask this because in C++ when new is used on an object you can delete the object once there is no longer any need for the object to be stored.
I have been searching to see if Go does have delete or something similar to C++ but I have been unable to find anything.
Any help is much appreciated.

As you see here:
Go is fully garbage-collected and provides fundamental support for concurrent execution and communication.
So you don't have to free memory yourself.

Go has garbage collection. This means the Go runtime checks in the background if an object or any other variable is not used anymore and if this is the case, frees the memory.
Also see the Go FAQ: Why is the syntax so different from C? - Why do garbage collection? Won't it be too expensive?

In Go, unlike in C and C++, but like in Java, memory is managed automatically by a garbage collector.
There is no delete to call.
Off-topic:
in C++ when new is used on an object you can delete the object once there is no longer any need for the object to be stored.
You must delete it, otherwise you have a memory leak.

PIN_CALLER_TRACKS_DIRTY_DATA in User Mode

One possible solution to the problem described in "Why does WriteFile call ReadFile and how do I avoid it?" is to write to the file using CcPreparePinWrite and PIN_CALLER_TRACKS_DIRTY_DATA. Basically, this makes the cache manager map a file section into memory without having to read it from disk, since the entire section is assumed to be overwritten.
The PIN_CALLER_TRACKS_DIRTY_DATA flag is commonly used in cases where a file system is managing a log file that is written to but not read from. Because the existing file data will be overwritten and not read, the cache manager may return pages of zeros instead of faulting in the actual pages of file data from disk.
This is all great in theory, but it seems quite complicated to achieve in practice, especially since these are kernel-mode functions that cannot be called from a user-mode application.
Is there any way to achieve this behaviour using the regular WriteFile API? Or is there any good resource that further explains how to make use of the Cache Manager Routines?

Why do user space apps need kernel headers?

I am studying a smartphone project. During the compilation process, it installs kernel header files for the user-space build.
Why do user space apps need kernel headers?

In general, those headers are needed because userspace applications often talk to the kernel, passing some data. To do this, they have to agree on the structure of the data passed between them.
Most of the kernel headers are only needed by the libc library (if you're using one), as it usually hides all the low-level aspects from you by providing abstractions that conform to standards like POSIX (it will usually provide its own include files). Those headers will, for example, provide all the syscall numbers and the definitions of all the structures used by their arguments.
There are, however, some "custom services" provided by the kernel that are not handled by libc. One example is creating userspace programs that talk directly to some hardware drivers. That may require passing some data structures (so you need some struct definitions), knowing some magic numbers (so you need some defines), etc.
As an example, take a look at hid-example.c from kernel sources. It will, for example, call this ioctl:
struct hidraw_report_descriptor rpt_desc;
[...]
ioctl(fd, HIDIOCGRDESC, &rpt_desc);
But where did it get HIDIOCGRDESC or know the structure of struct hidraw_report_descriptor? They are of course defined in linux/hidraw.h which this application included.

What's the purpose of copy relocation?

BACKGROUND:
If an executable file has an external data reference, which is defined in a shared object, the compiler will use copy relocation and place a copy in its .bss section.
Copy relocation is detailed in this site:
http://www.shrubbery.net/solaris9ab/SUNWdev/LLM/p22.html#CHAPTER4-84604
However, my question is:
Is it possible to implement it through the GOT, just like an external data reference in a shared object? The executable can indirectly access this external symbol through its GOT entry, and this GOT entry can be filled with the real address of the symbol at run time.
I don't know why GCC doesn't implement it like this. What's the upside of copy relocation?

In languages like C and C++ addresses of objects with static storage duration qualify as address constants. It means that conceptually, at language level they are treated as if their values are "known" at compile time.
Of course, this is not the case in reality when it comes to the matter in question. To counter that, the compiler-linker-loader combination has to implement a dynamic mechanism that fully supports the language-level concept of an address constant. Intuitively, a GOT-based mechanism, being based on full run-time indirection, would be much farther away from that concept than a load-time relocation-based mechanism.
For one thing, the C language was designed as a language that requires no dynamic initialization of objects with static storage duration, i.e. conceptually there's no initializing startup code and no issues related to the order of initialization. But in a GOT-based implementation, initializing a global variable with such an address constant would require startup code to extract the actual value from the GOT and place it into the variable. Meanwhile, a relocation-based approach produces a full illusion of such a global variable beginning its life with the proper value without any startup code.
If you look at the features provided by the relocation mechanism, you will notice that they are in sync with the C specification of address constant. E.g. the final value might involve adding a fixed offset, which is intended to act as a loader-side implementation of C [] and -> operators, permissible in C address constant expressions.

Is it possible to implement it through GOT, just like the external data reference in shared object?
Yes. For this to work, you'll need to build code that is linked into the main executable with -fPIC. Since that is often less efficient (extra indirection), and usually not done, the linker has to do copy relocations instead.
More info here.

Creating a list similar to .ctors from multiple object files

I'm currently at a point where I need to link in several modules (basically ELF object files) to my main executable due to a limitation of our target (background: kernel, targeting the ARM architecture). On other targets (x86 specifically) these object files would be loaded at runtime and a specific function in them would be called. At shutdown another function would be called. Both of these functions are exposed to the kernel as symbols, and this all works fine.
When the object files are statically linked however there's no way for the kernel to "detect" their presence so to speak, and therefore I need a way of telling the kernel about the presence of the init/fini functions without hardcoding their presence into the kernel - it needs to be extensible. I thought a solution to this might be to put all the init/fini function pointers into their own section - in much the same way you'd expect from .ctors and .dtors - and call through them at the relevant time.
Note that they can't actually go into .ctors, as they require specific support to be running by the time they're called (specifically threads and memory management, if you're interested).
What's the best way of going about putting a bunch of arbitrary function pointers into a specific section? Even better - is it possible to inject arbitrary data into a section, so I could also store stuff like module name (a struct rather than a function pointer, basically). Using GCC targeted to arm-elf.

GCC attributes can be used to specify a section:
__attribute__((section("foobar")))
