How To Structure Large OpenCL Kernels? - coding-style

I have worked with OpenCL on a couple of projects, but have always written the kernel as one (sometimes rather large) function. Now I am working on a more complex project and would like to share functions across several kernels.
But the examples I can find all show the kernel as a single file (very few even call secondary functions). It seems like it should be possible to use multiple files - clCreateProgramWithSource() accepts multiple strings (and combines them, I assume) - although pyopencl's Program() takes only a single source.
So I would like to hear from anyone with experience doing this:
Are there any problems associated with multiple source files?
Is the best workaround for pyopencl to simply concatenate files?
Is there any way to compile a library of functions (instead of passing in the library source with each kernel, even if not all are used)?
If it's necessary to pass in the library source every time, are unused functions discarded (no overhead)?
Any other best practices/suggestions?
Thanks.

I don't think OpenCL has a concept of multiple source files in a program - a program is one compilation unit. You can, however, use #include and pull in headers or other .cl files at compile time.
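For instance (a minimal sketch; the file names, the -I path, and the host variables ctx, device, kernel_src and err are all assumptions), shared helpers can live in their own .cl file and be pulled in with #include:

/* common.cl - shared helper functions (hypothetical file) */
float4 scale(float4 v, float s) { return v * s; }

/* kernels.cl - pulls the shared code in at compile time */
#include "common.cl"
__kernel void do_scale(__global float4* data, float s)
{
    size_t i = get_global_id(0);
    data[i] = scale(data[i], s);
}

/* Host side (C): tell the OpenCL compiler where to find the included file. */
cl_program prog = clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, &err);
err = clBuildProgram(prog, 1, &device, "-I ./cl_include", NULL, NULL);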
You can have multiple kernels in an OpenCL program - so, after one compilation, you can invoke any of the set of kernels compiled.
Any code not used - functions, or anything statically known to be unreachable - can be assumed to be eliminated during compilation, at some minor cost to compile time.

In OpenCL 1.2 you can also compile sources separately and link the resulting objects together, using clCompileProgram() and clLinkProgram().
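Roughly like this (a sketch only; library_src, kernel_src, ctx and device are assumed to exist, and error handling is omitted):

/* Compile the shared library source and the kernel source separately... */
cl_program lib = clCreateProgramWithSource(ctx, 1, &library_src, NULL, &err);
err = clCompileProgram(lib, 1, &device, "", 0, NULL, NULL, NULL, NULL);

cl_program main_prog = clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, &err);
err = clCompileProgram(main_prog, 1, &device, "", 0, NULL, NULL, NULL, NULL);

/* ...then link the two objects into one executable program. */
cl_program inputs[] = { main_prog, lib };
cl_program linked = clLinkProgram(ctx, 1, &device, "", 2, inputs, NULL, NULL, &err);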

Related

Is it possible to find sizes of structures declared in a DLL?

We have the situation where we have a large number of inter-related DLLs that form our product. This is a very "old" product (as in it has been developed over 20 years) and has suffered in the past from different defaults for structure packing over several versions of Visual Studio.
So, in the many cases where #pragma pack has not been used in the DLL header files, but the structure alignment has been set instead in the project properties, we can have the situation where a project that imports the DLL (via its lib and header) has a different structure alignment and potentially results in mismatched structure sizes.
This is complicated by the fact that structs can be sized correctly by "accident" - e.g. if all members of the struct are unsigned int then pack(4) in the DLL and pack(2) in the importing project can work ok. Until, of course, someone amends a struct to add a bool for example.
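To make that failure mode concrete (a hypothetical struct; the sizes below assume MSVC-style layout):

#include <stdbool.h>

/* Shared header with no #pragma pack of its own. */
struct Record {
    unsigned int id;
    unsigned int count;
    bool         flag;   /* adding this member breaks the "accidental" agreement */
    unsigned int next;
};

/* With only the two unsigned ints, pack(4) and pack(2) both give sizeof == 8.
   After adding the bool: pack(4) -> sizeof == 16, pack(2) -> sizeof == 14. */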
I would like to remove all of this potential confusion by adding #pragma pack statements to the header files of all exporting modules but would first like to assess whether we have any such exposures in our current code (thinking about hard-to-track runtime bugs here). Also, it may be useful to introduce some automated checking into our build process to make sure we never hit these situations, even with third-party DLLs or LIBs.
So, my question:
Is it possible, from a compiled DLL, or its associated LIB, to determine what structure alignment was in force at the time the DLL was compiled? Similarly, is it possible to discover this for an EXE?
What I am wondering is whether there is anything in the PE format or the LIB (is that COFF?) that can be used to find this information?
UPDATE
Well, no good came from examining libs and dlls with dumpbin, so I'm going to try to get some info from the PDB files we generate from our Release builds. I found this as a starting point...
I would say that it is not possible. C++ doesn't carry type information in the compiled image (unless RTTI is enabled, and even that won't be of much help for this problem). To the compiler, a structure is nothing but a sequence of bytes; it translates variable.member into the appropriate byte offset when accessing that data.
I doubt you have the matching debugging information (i.e. PDB files) for the DLLs to look up the symbols, and even with that it is not possible to find the "packing" of a structure.
I have faced problems with structure sizes across EXEs/DLLs (having full source code), where sizeof is the only tool we could use to find the difference (and drill into nested members to find the root of the problem). Even with this technique, it is not possible to tell which packing was in effect for a particular structure.
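One partial mitigation along those lines (a sketch with hypothetical names, not a way to read the packing out of an existing binary): have each DLL export the sizes it was built with, and have importers compare against their own sizeof at startup.

/* #include "shared_structs.h"  - hypothetical header defining struct Record */
#include <assert.h>
#include <stddef.h>

/* Inside the DLL: report the size this module was compiled with. */
__declspec(dllexport) size_t GetRecordSize(void)
{
    return sizeof(struct Record);
}

/* Inside the importing module: catch a layout mismatch early. */
void check_layout(void)
{
    assert(GetRecordSize() == sizeof(struct Record));
}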

LoadLibrary from offset in a file

I am writing a scriptable game engine, for which I have a large number of classes that perform various tasks. The size of the engine is growing rapidly, and so I thought of splitting the large executable up into dll modules so that only the components that the game writer actually uses can be included. When the user compiles their game (which is to say their script), I want the correct dll's to be part of the final executable. I already have quite a bit of overlay data, so I figured I might be able to store the dll's as part of this block. My question boils down to this:
Is it possible to trick LoadLibrary into starting to read the file at a certain offset? That would save me from having to either extract the dll into a temporary file, which is not clean, or scrap the automatic inclusion of dll's altogether and simply instruct my users to package the dll's along with their games.
Initially I thought of going for the "load dll from memory" approach but rejected it on grounds of portability and simply because it seems like such a horrible hack.
Any thoughts?
Kind regards,
Philip Bennefall
You are trying to solve a problem that doesn't exist. Loading a DLL doesn't actually require any physical memory. Windows creates a memory-mapped file for the DLL content; code from the DLL only ever gets loaded when your program calls that code. Unused code doesn't require any system resources beyond reserved memory pages, and you have 2 billion bytes worth of those on a 32-bit operating system. You would have to write a lot of code to consume them all; 50 megabytes of machine code is already a very large program.
The memory mapping is also the reason you cannot make LoadLibrary() do what you want to do. There is no realistic scenario where you need to.
Look into the linker's /DELAYLOAD option to improve startup performance.
I think every solution to that task is a "horrible hack" and nothing more.
The simplest way I can see is to create your own virtual drive that presents a custom filesystem and redirects system access from one real file (the bundle of your libraries) to multiple separate DLLs - the way TrueCrypt does, for example (it's open source). Then you can use the LoadLibrary function without changes.
But the only right way I see is to change your approach and not do this at all. I think you need to create your own script interpreter and compiler, using structures, pointers and so on.
The main thing is that I don't see the benefit you get from using separate libraries. Compiled code these days does not weigh very much and can be packed very well, and any other resources can be loaded dynamically on first use. All you need to do is organize the working cycle of each component of the script engine in the right way.

What might cause a slight difference in binaries when compiled at different times using make?

I compiled my code using the make utility and got the binaries.
I compiled the code again with a few changes in the makefile (-j inserted at some points) and got a slight difference in the binaries. The difference was reported by Beyond Compare. To check further, I compiled the code again without my makefile changes and found that the binaries still differ.
Why does the same code compiled at different times result in slightly different binaries (in size and content)? How should I check whether the changes I have made are legitimate and the binaries are logically the same?
Do ask me for any further explanation.
You haven't said what you're building (C, C++ etc) but I wouldn't be surprised if it's a timestamp.
You could find out the format for the binary type you're building (which will depend on your operating system) and see whether it makes sense for there to be a timestamp in the place which is changing.
It's probably easiest to do this on a tiny sample program which will produce a very small binary, to make it easier to work out what everything means.
Object files may embed a timestamp recording when they were compiled, so you can expect a slightly different object file each time you build (this is commonly seen on Linux or Solaris toolchains). You may find the same with other object-file formats too.
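Another common and easy-to-verify source of churn, separate from the object format itself, is code that embeds the build time via the predefined macros. A trivial example:

#include <stdio.h>

int main(void)
{
    /* __DATE__ / __TIME__ are baked into the binary at compile time, so two
       otherwise identical builds differ at exactly these bytes. */
    printf("built on %s at %s\n", __DATE__, __TIME__);
    return 0;
}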

Creating a list similar to .ctors from multiple object files

I'm currently at a point where I need to link in several modules (basically ELF object files) to my main executable due to a limitation of our target (background: kernel, targeting the ARM architecture). On other targets (x86 specifically) these object files would be loaded at runtime and a specific function in them would be called. At shutdown another function would be called. Both of these functions are exposed to the kernel as symbols, and this all works fine.
When the object files are statically linked however there's no way for the kernel to "detect" their presence so to speak, and therefore I need a way of telling the kernel about the presence of the init/fini functions without hardcoding their presence into the kernel - it needs to be extensible. I thought a solution to this might be to put all the init/fini function pointers into their own section - in much the same way you'd expect from .ctors and .dtors - and call through them at the relevant time.
Note that they can't actually go into .ctors, as they require specific support to be running by the time they're called (specifically threads and memory management, if you're interested).
What's the best way of going about putting a bunch of arbitrary function pointers into a specific section? Even better - is it possible to inject arbitrary data into a section, so I could also store things like the module name (a struct rather than a function pointer, basically)? I am using GCC targeted at arm-elf.
GCC attributes can be used to specify a section:
__attribute__((section("foobar")))
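A sketch of how that could look for the module-table use case (the section name "modtab", the REGISTER_MODULE macro and the descriptor struct are all made-up names; the __start_/__stop_ symbols are provided by the GNU linker for any section whose name is a valid C identifier):

/* Descriptor stored in a custom section - arbitrary data works here,
   not just function pointers. */
struct module_desc {
    const char *name;
    void (*init)(void);
    void (*fini)(void);
};

#define REGISTER_MODULE(mod) \
    static const struct module_desc mod##_desc \
        __attribute__((section("modtab"), used)) = { #mod, mod##_init, mod##_fini }

/* GNU ld defines these for the "modtab" section when they are referenced. */
extern const struct module_desc __start_modtab[];
extern const struct module_desc __stop_modtab[];

static void run_all_inits(void)
{
    for (const struct module_desc *m = __start_modtab; m != __stop_modtab; ++m)
        m->init();
}

Each statically linked module would then place REGISTER_MODULE(audio); (or similar) next to its audio_init/audio_fini definitions, and the kernel walks the table once threads and memory management are up.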

C Runtime objects, dll boundaries

What is the best way to design a C API for dlls which deals with the problem of passing "objects" which are C-runtime dependent (FILE*, pointers returned by malloc, etc.)? For example, if two dlls are linked against different versions of the runtime, my understanding is that you cannot safely pass a FILE* from one dll to the other.
Is the only solution to use Windows-dependent APIs (which are guaranteed to work across dlls)? The C API already exists and is mature, but was designed mostly from a unix POV (and still has to work on unix, of course).
You asked for a C, not a C++ solution.
The usual method(s) for doing this kind of thing in C are:
Design the module's API to simply not require CRT objects. Get stuff passed across in raw C types - i.e. get the consumer to load the file and simply pass you the pointer, or get the consumer to pass a fully qualified file name that is opened, read, and closed internally.
An approach used by other C modules (the MS cabinet SDK and parts of the OpenSSL library come to mind, iirc): get the consuming application to pass function pointers to the initialization function. So, any API you pass a FILE* to would, at some point during initialization, have taken a pointer to a struct of function pointers matching the signatures of fread, fopen, etc. When dealing with external FILE*s, the dll always uses the passed-in functions rather than the CRT functions (see the sketch below).
With some simple tricks like this you can make your C DLL's interface entirely independent of the host's CRT - and in fact not require the host to be written in C or C++ at all.
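A minimal sketch of that second approach (all names here - io_callbacks, mylib_init - are hypothetical):

#include <stdio.h>

/* Callback table supplied by the host application at initialization. */
struct io_callbacks {
    FILE  *(*open)(const char *path, const char *mode);
    size_t (*read)(void *buf, size_t size, size_t count, FILE *stream);
    size_t (*write)(const void *buf, size_t size, size_t count, FILE *stream);
    int    (*close)(FILE *stream);
};

static struct io_callbacks g_io;   /* the DLL only ever uses these */

void mylib_init(const struct io_callbacks *io)
{
    g_io = *io;
}

/* Host side: hand the DLL the host CRT's own stdio functions.
     struct io_callbacks io = { fopen, fread, fwrite, fclose };
     mylib_init(&io);                                            */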
Neither existing answer is correct. Consider the following on Windows: you have two DLLs, each statically linked with a different version of the C/C++ standard library.
In this case, you should not pass pointers to structures created by the C/C++ standard library in one DLL to the other. The reason is that these structures may be different between the two C/C++ standard library implementations.
The other thing you should not do is free, in one DLL, a pointer that was allocated by new or malloc in the other. The heap manager may be implemented differently as well.
Note, you can use the pointers between the DLLs - they just point to memory. It is the free that is the issue.
Now, you may find that this works, but if it does, then you are just lucky. It is likely to cause you problems in the future.
One potential solution to your problem is dynamically linking to the CRT. For example, you could dynamically link to MSVCRT.DLL. That way your DLLs will always use the same CRT.
Note, I suggest that it is not a best practice to pass CRT data structures between DLLs. You might want to see if you can factor things better.
Note, I am not a Linux/Unix expert - but you will have the same issues on those OSes as well.
The problem with the different runtimes isn't solvable in general, because the FILE* struct belongs to one particular runtime on a Windows system.
But if you write a small wrapper interface you're done, and it does not really hurt.
class IFile {
public:
    virtual size_t fwrite(const void* buffer, size_t size, size_t count) = 0;
    virtual size_t fread(void* buffer, size_t size, size_t count) = 0;
    virtual void destroy() = 0;   // dispose of the object inside the dll that created it
protected:
    virtual ~IFile() {}
};

extern "C" IFile* __stdcall IFileFactory(const char* filename, const char* mode);
This is safe to pass across dll boundaries everywhere and does not really hurt.
P.S.: Be careful if you start throwing exceptions across dll boundaries. This will work quite well if you fulfill certain design criteria on Windows, but will fail on some other platforms.
If the C API exists and is mature, bypassing the CRT internally by using pure Win32 API stuff gets you half the way. The other half is making sure the DLL's user uses the corresponding Win32 API functions. This will make your API less portable, in both use and documentation. Also, even if you go this way with memory allocation, where both the CRT functions and the Win32 ones deal with void*, you're still in trouble with the file stuff - Win32 API uses handles, and knows nothing about the FILE structure.
I'm not quite sure what the limitations of FILE* are, but I assume the problem is the same as with CRT allocations across modules. MSVCRT uses Win32 internally to handle the file operations, and the underlying file handle can be used from every module within the same process. What might not work is closing a file that was opened by another module, which involves freeing the FILE structure on a possibly different CRT.
What I would do, if changing the API is still an option, is export cleanup functions for any possible "object" created within the DLL. These cleanup functions will handle the disposal of the given object in the way that corresponds to the way it was created within that DLL. This will also make the DLL absolutely portable in terms of usage. The only worry you'll have then is making sure the DLL's user does indeed use your cleanup functions rather than the regular CRT ones. This can be done using several tricks, which deserve another question...
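For example (hypothetical exports; the point is that allocation and disposal always happen inside the same module):

#include <stdio.h>
#include <stdlib.h>

/* Every object handed out by the DLL has a matching cleanup export. */
__declspec(dllexport) void *mylib_alloc(size_t size)  { return malloc(size); }
__declspec(dllexport) void  mylib_free(void *p)       { free(p); }

__declspec(dllexport) FILE *mylib_open(const char *path, const char *mode)
{
    return fopen(path, mode);
}
__declspec(dllexport) void  mylib_close(FILE *f)      { fclose(f); }

/* Callers must use mylib_free/mylib_close, never their own free()/fclose(). */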
