C Runtime objects, dll boundaries - windows

What is the best way to design a C API for dlls which deals with the problem of passing "objects" which are C runtime dependent (FILE*, pointer returned by malloc, etc...). For example, if two dlls are linked with a different version of the runtime, my understanding is that you cannot pass a FILE* from one dll to the other safely.
Is the only solution to use windows-dependent API (which are guaranteed to work across dlls) ? The C API already exists and is mature, but was designed from a unix POV, mostly (and still has to work on unix, of course).

You asked for a C, not a C++ solution.
The usual method(s) for doing this kind of thing in C are:
Design the modules API to simply not require CRT objects. Get stuff passed accross in raw C types - i.e. get the consumer to load the file and simply pass you the pointer. Or, get the consumer to pass a fully qualified file name, that is opened , read, and closed, internally.
An approach used by other c modules, the MS cabinet SD and parts of the OpenSSL library iirc come to mind, get the consuming application to pass in pointers to functions to the initialization function. So, any API you pass a FILE* to would at some point during initialization have taken a pointer to a struct with function pointers matching the signatures of fread, fopen etc. When dealing with the external FILE*s the dll always uses the passed in functions rather than the CRT functions.
With some simple tricks like this you can make your C DLLs interface entirely independent of the hosts CRT - or in fact require the host to be written in C or C++ at all.

Neither existing answer is correct: Given the following on Windows: you have two DLLs, each is statically linked with two different versions of the C/C++ standard libraries.
In this case, you should not pass pointers to structures created by the C/C++ standard library in one DLL to the other. The reason is that these structures may be different between the two C/C++ standard library implementations.
The other thing you should not do is free a pointer allocated by new or malloc from one DLL that was allocated in the other. The heap manger may be differently implemented as well.
Note, you can use the pointers between the DLLs - they just point to memory. It is the free that is the issue.
Now, you may find that this works, but if it does, then you are just luck. This is likely to cause you problems in the future.
One potential solution to your problem is dynamically linking to the CRT. For example,you could dynamically link to MSVCRT.DLL. That way your DLL's will always use the same CRT.
Note, I suggest that it is not a best practice to pass CRT data structures between DLLs. You might want to see if you can factor things better.
Note, I am not a Linux/Unix expert - but you will have the same issues on those OSes as well.

The problem with the different runtimes isn't solvable because the FILE* struct belongs
to one runtime on a windows system.
But if you write a small wrapper Interface your done and it does not really hurt.
stdcall IFile* IFileFactory(const char* filename, const char* mode);
class IFile {
virtual fwrite(...) = 0;
virtual fread(...) = 0;
virtual delete() = 0;
}
This is save to be passed accross dll boundaries everywhere and does not really hurt.
P.S.: Be careful if you start throwing exceptions across dll boundaries. This will work quiet well if you fulfill some design creterions on windows OS but will fail on some others.

If the C API exists and is mature, bypassing the CRT internally by using pure Win32 API stuff gets you half the way. The other half is making sure the DLL's user uses the corresponding Win32 API functions. This will make your API less portable, in both use and documentation. Also, even if you go this way with memory allocation, where both the CRT functions and the Win32 ones deal with void*, you're still in trouble with the file stuff - Win32 API uses handles, and knows nothing about the FILE structure.
I'm not quite sure what are the limitations of the FILE*, but I assume the problem is the same as with CRT allocations across modules. MSVCRT uses Win32 internally to handle the file operations, and the underlying file handle can be used from every module within the same process. What might not work is closing a file that was opened by another module, which involves freeing the FILE structure on a possibly different CRT.
What I would do, if changing the API is still an option, is export cleanup functions for any possible "object" created within the DLL. These cleanup functions will handle the disposal of the given object in the way that corresponds to the way it was created within that DLL. This will also make the DLL absolutely portable in terms of usage. The only worry you'll have then is making sure the DLL's user does indeed use your cleanup functions rather than the regular CRT ones. This can be done using several tricks, which deserve another question...

Related

Why do user space apps need kernel headers?

I am studying a smartphone project. During compilation process it's installing kernel header files for user space building.
Why do user space apps need kernel headers?
In general, those headers are needed because userspace applications often talk to kernel, passing some data. To do this, they have to agree on the structure of data passed between them.
Most of the kernel headers are only needed by libc library (if you're using one) as it usually hides all the lowlevel aspects from you by the providing abstractions conforming to some standards like POSIX (it will usually provide its own include files). Those headers will, for example, provide all the syscall numbers and definitions of all the structures used by their arguments.
The are, however, some "custom services" provided by kernel that are not handled by libc. One example is creating userspace programs that talk directly to some hardware drivers. That may require passing some data structures (so you need some struct definitions), knowing some magic numbers (so you need some defines), etc.
As an example, take a look at hid-example.c from kernel sources. It will, for example, call this ioctl:
struct hidraw_report_descriptor rpt_desc;
[...]
ioctl(fd, HIDIOCGRDESC, &rpt_desc);
But where did it get HIDIOCGRDESC or know the structure of struct hidraw_report_descriptor? They are of course defined in linux/hidraw.h which this application included.

Extending functionality of existing program I don't have source for

I'm working on a third-party program that aggregates data from a bunch of different, existing Windows programs. Each program has a mechanism for exporting the data via the GUI. The most brain-dead approach would have me generate extracts by using AutoIt or some other GUI manipulation program to generate the extractions via the GUI. The problem with this is that people might be interacting with the computer when, suddenly, some automated program takes over. That's no good. What I really want to do is somehow have a program run once a day and silently (i.e. without popping up any GUIs) export the data from each program.
My research is telling me that I need to hook each application (assume these applications are always running) and inject a custom DLL to trigger each export. Am I remotely close to being on the right track? I'm a fairly experienced software dev, but I don't know a whole lot about reverse engineering or hooking. Any advice or direction would be greatly appreciated.
Edit: I'm trying to manage the availability of a certain type of professional. Their schedules are stored in proprietary systems. With their permission, I want to install an app on their system that extracts their schedule from whichever system they are using and uploads the information to a central server so that I can present that information to potential clients.
I am aware of four ways of extracting the information you want, both with their advantages and disadvantages. Before you do anything, you need to be aware that any solution you create is not guaranteed and in fact very unlikely to continue working should the target application ever update. The reason is that in each case, you are relying on an implementation detail instead of a pre-defined interface through which to export your data.
Hooking the GUI
The first way is to hook the GUI as you have suggested. What you are doing in this case is simply reading off from what an actual user would see. This is in general easier, since you are hooking the WinAPI which is clearly defined. One danger is that what the program displays is inconsistent or incomplete in comparison to the internal data it is supposed to be representing.
Typically, there are two common ways to perform WinAPI hooking:
DLL Injection. You create a DLL which you load into the other program's virtual address space. This means that you have read/write access (writable access can be gained with VirtualProtect) to the target's entire memory. From here you can trampoline the functions which are called to set UI information. For example, to check if a window has changed its text, you might trampoline the SetWindowText function. Note every control has different interfaces used to set what they are displaying. In this case, you are hooking the functions called by the code to set the display.
SetWindowsHookEx. Under the covers, this works similarly to DLL injection and in this case is really just another method for you to extend/subvert the control flow of messages received by controls. What you want to do in this case is hook the window procedures of each child control. For example, when an item is added to a ComboBox, it would receive a CB_ADDSTRING message. In this case, you are hooking the messages that are received when the display changes.
One caveat with this approach is that it will only work if the target is using or extending WinAPI controls.
Reading from the GUI
Instead of hooking the GUI, you can alternatively use WinAPI to read directly from the target windows. However, in some cases this may not be allowed. There is not much to do in this case but to try and see if it works. This may in fact be the easiest approach. Typically, you will send messages such as WM_GETTEXT to query the target window for what it is currently displaying. To do this, you will need to obtain the exact window hierarchy containing the control you are interested in. For example, say you want to read an edit control, you will need to see what parent window/s are above it in the window hierarchy in order to obtain its window handle.
Reading from memory (Advanced)
This approach is by far the most complicated but if you are able to fully reverse engineer the target program, it is the most likely to get you consistent data. This approach works by you reading the memory from the target process. This technique is very commonly used in game hacking to add 'functionality' and to observe the internal state of the game.
Consider that as well as storing information in the GUI, programs often hold their own internal model of all the data. This is especially true when the controls used are virtual and simply query subsets of the data to be displayed. This is an example of a situation where the first two approaches would not be of much use. This data is often held in some sort of abstract data type such as a list or perhaps even an array. The trick is to find this list in memory and read the values off directly. This can be done externally with ReadProcessMemory or internally through DLL injection again. The difficulty lies mainly in two prerequisites:
Firstly, you must be able to reliably locate these data structures. The problem with this is that code is not guaranteed to be in the same place, especially with features such as ASLR. Colloquially, this is sometimes referred to as code-shifting. ASLR can be defeated by using the offset from a module base and dynamically getting the module base address with functions such as GetModuleHandle. As well as ASLR, a reason that this occurs is due to dynamic memory allocation (e.g. through malloc). In such cases, you will need to find a heap address storing the pointer (which would for example be the return of malloc), dereference that and find your list. That pointer would be prone to ASLR and instead of a pointer, it might be a double-pointer, triple-pointer, etc.
The second problem you face is that it would be rare for each list item to be a primitive type. For example, instead of a list of character arrays (strings), it is likely that you will be faced with a list of objects. You would need to further reverse engineer each object type and understand internal layouts (at least be able to determine offsets of primitive values you are interested in in terms of its offset from the object base). More advanced methods revolve around actually reverse engineering the vtable of objects and calling their 'API'.
You might notice that I am not able to give information here which is specific. The reason is that by its nature, using this method requires an intimate understanding of the target's internals and as such, the specifics are defined only by how the target has been programmed. Unless you have knowledge and experience of reverse engineering, it is unlikely you would want to go down this route.
Hooking the target's internal API (Advanced)
As with the above solution, instead of digging for data structures, you dig for the internal API. I briefly covered this with when discussing vtables earlier. Instead of doing this, you would be attempting to find internal APIs that are called when the GUI is modified. Typically, when a view/UI is modified, instead of directly calling the WinAPI to update it, a program will have its own wrapper function which it calls which in turn calls the WinAPI. You simply need to find this function and hook it. Again this is possible, but requires reverse engineering skills. You may find that you discover functions which you want to call yourself. In this case, as well as being able to locate the location of the function, you have to reverse engineer the parameters it takes, its calling convention and you will need to ensure calling the function has no side effects.
I would consider this approach to be advanced. It can certainly be done and is another common technique used in game hacking to observe internal states and to manipulate a target's behaviour, but is difficult!
The first two methods are well suited for reading data from WinAPI programs and are by far easier. The two latter methods allow greater flexibility. With enough work, you are able to read anything and everything encapsulated by the target but requires a lot of skill.
Another point of concern which may or may not relate to your case is how easy it will be to update your solution to work should the target every be updated. With the first two methods, it is more likely no changes or small changes have to be made. With the second two methods, even a small change in source code can cause a relocation of the offsets you are relying upon. One method of dealing with this is to use byte signatures to dynamically generate the offsets. I wrote another answer some time ago which addresses how this is done.
What I have written is only a brief summary of the various techniques that can be used for what you want to achieve. I may have missed approaches, but these are the most common ones I know of and have experience with. Since these are large topics in themselves, I would advise you ask a new question if you want to obtain more detail about any particular one. Note that in all of the approaches I have discussed, none of them suffer from any interaction which is visible to the outside world so you would have no problem with anything popping up. It would be, as you describe, 'silent'.
This is relevant information about detouring/trampolining which I have lifted from a previous answer I wrote:
If you are looking for ways that programs detour execution of other
processes, it is usually through one of two means:
Dynamic (Runtime) Detouring - This is the more common method and is what is used by libraries such as Microsoft Detours. Here is a
relevant paper where the first few bytes of a function are overwritten
to unconditionally branch to the instrumentation.
(Static) Binary Rewriting - This is a much less common method for rootkits, but is used by research projects. It allows detouring to be
performed by statically analysing and overwriting a binary. An old
(not publicly available) package for Windows that performs this is
Etch. This paper gives a high-level view of how it works
conceptually.
Although Detours demonstrates one method of dynamic detouring, there
are countless methods used in the industry, especially in the reverse
engineering and hacking arenas. These include the IAT and breakpoint
methods I mentioned above. To 'point you in the right direction' for
these, you should look at 'research' performed in the fields of
research projects and reverse engineering.

How To Structure Large OpenCL Kernels?

I have worked with OpenCL on a couple of projects, but have always written the kernel as one (sometimes rather large) function. Now I am working on a more complex project and would like to share functions across several kernels.
But the examples I can find all show the kernel as a single file (very few even call secondary functions). It seems like it should be possible to use multiple files - clCreateProgramWithSource() accepts multiple strings (and combines them, I assume) - although pyopencl's Program() takes only a single source.
So I would like to hear from anyone with experience doing this:
Are there any problems associated with multiple source files?
Is the best workaround for pyopencl to simply concatenate files?
Is there any way to compile a library of functions (instead of passing in the library source with each kernel, even if not all are used)?
If it's necessary to pass in the library source every time, are unused functions discarded (no overhead)?
Any other best practices/suggestions?
Thanks.
I don't think OpenCL has a concept of multiple source files in a program - a program is one compilation unit. You can, however, use #include and pull in headers or other .cl files at compile time.
You can have multiple kernels in an OpenCL program - so, after one compilation, you can invoke any of the set of kernels compiled.
Any code not used - functions, or anything statically known to be unreachable - can be assumed to be eliminated during compilation, at some minor cost to compile time.
In OpenCL 1.2 you link different object files together.

LoadLibrary from offset in a file

I am writing a scriptable game engine, for which I have a large number of classes that perform various tasks. The size of the engine is growing rapidly, and so I thought of splitting the large executable up into dll modules so that only the components that the game writer actually uses can be included. When the user compiles their game (which is to say their script), I want the correct dll's to be part of the final executable. I already have quite a bit of overlay data, so I figured I might be able to store the dll's as part of this block. My question boils down to this:
Is it possible to trick LoadLibrary to start reading the file at a certain offset? That would save me from having to either extract the dll into a temporary file which is not clean, or alternatively scrapping the automatic inclusion of dll's altogether and simply instructing my users to package the dll's along with their games.
Initially I thought of going for the "load dll from memory" approach but rejected it on grounds of portability and simply because it seems like such a horrible hack.
Any thoughts?
Kind regards,
Philip Bennefall
You are trying to solve a problem that doesn't exist. Loading a DLL doesn't actually require any physical memory. Windows creates a memory mapped file for the DLL content. Code from the DLL only ever gets loaded when your program calls that code. Unused code doesn't require any system resources beyond reserved memory pages. You have 2 billion bytes worth of that on a 32-bit operating system. You have to write a lot of code to consume them all, 50 megabytes of machine code is already a very large program.
The memory mapping is also the reason you cannot make LoadLibrary() do what you want to do. There is no realistic scenario where you need to.
Look into the linker's /DELAYLOAD option to improve startup performance.
I think every solution for that task is "horrible hack" and nothing more.
Simplest way that I see is create your own virtual drive that present custom filesystem and hacks system access path from one real file (compilation of your libraries) to multiple separate DLL-s. For example like TrueCrypt does (it's open-source). And than you may use LoadLibrary function without changes.
But only right way I see is change your task and don't use this approach. I think you need to create your own script interpreter and compiler, using structures, pointers and so on.
The main thing is that I don't understand your benefit from use of libraries. I think any compiled code in current time does not weigh so much and may be packed very good. Any other resources may be loaded dynamically at first call. All you need to do is to organize the working cycles of all components of the script engine in right way.

Creating a list similar to .ctors from multiple object files

I'm currently at a point where I need to link in several modules (basically ELF object files) to my main executable due to a limitation of our target (background: kernel, targeting the ARM architecture). On other targets (x86 specifically) these object files would be loaded at runtime and a specific function in them would be called. At shutdown another function would be called. Both of these functions are exposed to the kernel as symbols, and this all works fine.
When the object files are statically linked however there's no way for the kernel to "detect" their presence so to speak, and therefore I need a way of telling the kernel about the presence of the init/fini functions without hardcoding their presence into the kernel - it needs to be extensible. I thought a solution to this might be to put all the init/fini function pointers into their own section - in much the same way you'd expect from .ctors and .dtors - and call through them at the relevant time.
Note that they can't actually go into .ctors, as they require specific support to be running by the time they're called (specifically threads and memory management, if you're interested).
What's the best way of going about putting a bunch of arbitrary function pointers into a specific section? Even better - is it possible to inject arbitrary data into a section, so I could also store stuff like module name (a struct rather than a function pointer, basically). Using GCC targeted to arm-elf.
GCC attributes can be used to specify a section:
__attribute__((section("foobar")))

Resources