I want to intercept all file system access that occurs inside of dlopen(). At first, it would seem like LD_PRELOAD or -Wl,-wrap, would be viable solutions, but I have had trouble making them work due to some technical reasons:
ld.so has already mapped its own symbols by the time LD_PRELOAD is processed. It's not critical for me to intercept the initial loading, but the _dl_* worker functions are resolved at this time, so future calls go through them. I think LD_PRELOAD is too late.
Somehow malloc circumvents the issue above because the malloc() inside of ld.so does not have a functional free(); it just calls memset().
The file system worker functions, e.g. __libc_read(), contained in ld.so are static so I can't intercept them with -Wl,-wrap,__libc_read.
This might all mean that I need to build my own ld.so directly from source instead of linking it into a wrapper. The challenge there is that both libc and rtld-libc are built from the same source. I know that the macro IS_IN_rtld is defined when building rtld-libc, but how can I guarantee that there is only one copy of static data structures while still exporting a public interface function? (This is a glibc build system question, but I haven't found documentation of these details.)
Are there any better ways to get inside dlopen()?
Note: I can't use a Linux-specific solution like FUSE because this is for minimal "compute-node" kernels that do not support such things.
it would seem like LD_PRELOAD or -Wl,-wrap, would be viable solutions
The --wrap solution could not possibly be viable: it works only at (static) link time, and your ld.so and libc.so.6 and libdl.so.2 have all already been linked, so now it is too late to use --wrap.
LD_PRELOAD could have worked, except ... ld.so considers the fact that dlopen() calls open() an internal implementation detail. As such, it just calls the internal __open function directly, bypassing the PLT and, with it, your ability to interpose open.
Somehow malloc circumvents the issue
That's because libc supports users who implement their own malloc (e.g. for debugging purposes). So the call to e.g. calloc from dlopen does go through PLT, and is interposable via LD_PRELOAD.
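To make that concrete, here is a minimal sketch of such an interposer (my own example, assuming glibc, which exports __libc_calloc; build it with something like gcc -shared -fPIC interpose.c -o interpose.so and run the program with LD_PRELOAD=./interpose.so):
#include <unistd.h>
#include <stddef.h>

/* glibc's internal allocator entry point, exported from libc.so.6.
   Forwarding to it avoids the dlsym(RTLD_NEXT, ...) bootstrap problem. */
extern void *__libc_calloc(size_t nmemb, size_t size);

void *calloc(size_t nmemb, size_t size)
{
    /* Runs for every calloc call that goes through the PLT, including the
       ones dlopen() makes.  write(2) is used instead of printf to avoid
       re-entering the allocator. */
    static const char msg[] = "calloc intercepted\n";
    write(2, msg, sizeof msg - 1);
    return __libc_calloc(nmemb, size);
}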
This might all mean that I need to build my own ld.so directly from source instead of linking it into a wrapper.
What will the rebuilt ld.so do? I think you want it to call __libc_open (in libc.so.6), but that can't possibly work, for an obvious reason: it is ld.so that opens libc.so.6 in the first place (at process startup).
You could rebuild ld.so with the call to __open replaced with a call to open. That will cause ld.so to go through the PLT and expose the call to LD_PRELOAD interposition.
If you go that route, I suggest that you don't overwrite the system ld.so with your new copy (the chance of making a mistake and rendering the system unbootable is just too great). Instead, install it to e.g. /usr/local/my-ld.so, and then link your binaries with -Wl,--dynamic-linker=/usr/local/my-ld.so.
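The LD_PRELOAD side of that route would then be an ordinary open interposer, roughly like this sketch (the logging is only a placeholder; the real open is looked up lazily via dlsym(RTLD_NEXT, ...)):
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <string.h>
#include <unistd.h>

int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...);
    mode_t mode = 0;

    if (!real_open)
        real_open = dlsym(RTLD_NEXT, "open");

    /* open() only takes a mode when O_CREAT is given. */
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }

    write(2, "open intercepted: ", 18);
    write(2, path, strlen(path));
    write(2, "\n", 1);

    return real_open(path, flags, mode);
}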
Another alternative: runtime patching. This is a bit of a hack, but you can (once you gain control in main) simply scan the .text of ld.so, and look for CALL __open instructions. If ld.so is not stripped, then you can find both the internal __open, and the functions you want to patch (e.g. open_verify in dl-load.c). Once you find the interesting CALL, mprotect the page that contains it to be writable, and patch in the address of your own interposer (which can in turn call __libc_open if it needs to), then mprotect it back. Any future dlopen() will now go through your interposer.
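For what it's worth, the patch step itself might look something like this sketch (x86-64 only; it assumes you have already located a 5-byte CALL rel32 instruction at call_site by scanning, and that your interposer lives within ±2 GiB of it; the scanning and symbol lookup are not shown, and redirect_call is just my name for the helper):
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* call_site points at a CALL rel32 (opcode 0xE8) inside ld.so's .text;
   new_target is our interposer. */
static int redirect_call(uint8_t *call_site, void *new_target)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    uint8_t *page = (uint8_t *)((uintptr_t)call_site & ~(uintptr_t)(pagesz - 1));

    if (call_site[0] != 0xE8)                 /* sanity check: CALL rel32 */
        return -1;

    /* The 4-byte displacement may straddle a page boundary, so unprotect two pages. */
    if (mprotect(page, 2 * pagesz, PROT_READ | PROT_WRITE | PROT_EXEC) != 0)
        return -1;

    /* rel32 is relative to the end of the CALL instruction. */
    int32_t rel = (int32_t)((uint8_t *)new_target - (call_site + 5));
    memcpy(call_site + 1, &rel, sizeof rel);

    return mprotect(page, 2 * pagesz, PROT_READ | PROT_EXEC);
}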
Related
Please forgive me if this question is about too basic a matter.
As a programmer still fairly close to beginner level, I really wonder about this: is the underlying code of every Win API function compiled into an app at the time the app is written, or does the machine code that implements the Win APIs stay in memory as part of the OS from the time the PC boots, with apps merely using it?
All of an OS's APIs are used by many apps through function calls. So I thought that, rather than every individual app including the API machine code on its own, apps just contain the headers or signatures needed to call the APIs, and the addresses of the API machine code are mapped in when the app is launched.
I'm sorry I couldn't make this question more succinct because of my poor English. I would really like to get your insights. Thank you.
The implementation for (most) API calls is provided by the system by way of compiled modules (Portable Executable images). Application code only contains enough information so that the system can identify and load the required modules, and resolve the respective imports.
As an example consider the following code that shows a message box, waits for it to close, and then exits the program:
#include <Windows.h>

int main()
{
    ::MessageBoxW(nullptr, L"Foo", L"Bar", MB_OK);
}
Given the function signature (declared in WinUser.h, which gets pulled in from Windows.h), the compiler can almost generate a call instruction. It knows the number of arguments, their expected types, and the order and location the callee expects them in. What's missing is the actual target address inside user32.dll, which is only known after the process has been fully initialized and has had the user32.dll module mapped into its address space.
Clearly, the compiler cannot postpone code generation until after load time. It needs to generate a call instruction now. Since we know that "all problems in computer science can be solved by another level of indirection" that's what the compiler does, too: Instead of emitting a direct call instruction it generates an indirect call. The difference is that, while a direct call immediately needs to provide the target address, an indirect call can specify the address at which the target address is stored.
In x86 assembly, instead of having to say
call _MessageBoxW@16 ; uh-oh, not yet known
the compiler can conveniently delegate the call to the Import Address Table (IAT):
call dword ptr [__imp__MessageBoxW@16]
Disaster averted, we've bought us just enough time to fix things up before the code actually executes.
Once a process object is created, the system hands control to its primary thread to finish initialization. Part of that initialization is loading dependencies (such as user32.dll here). Once that has completed, the system finally knows the load address (and ultimately the addresses of imported symbols, such as _MessageBoxW@16), and can overwrite the IAT entry at address __imp__MessageBoxW@16 with the imported function's address.
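To make the indirection tangible, here is a small sketch that does by hand, at run time, roughly what the loader otherwise does through the IAT: it resolves MessageBoxW explicitly and calls it through a pointer.
#include <Windows.h>

typedef int (WINAPI *MessageBoxW_t)(HWND, LPCWSTR, LPCWSTR, UINT);

int main(void)
{
    // The loader normally does the equivalent of these two calls for us
    // while resolving the IAT during process initialization.
    HMODULE user32 = LoadLibraryW(L"user32.dll");
    if (user32 == NULL)
        return 1;

    MessageBoxW_t pMessageBoxW =
        (MessageBoxW_t)GetProcAddress(user32, "MessageBoxW");
    if (pMessageBoxW != NULL)
        pMessageBoxW(NULL, L"Foo", L"Bar", MB_OK);

    FreeLibrary(user32);
    return 0;
}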
And that is approximately how the system provides implementations for system services without requiring client applications to know where (physically) they will find them.
I'm saying "approximately" because things are somewhat more involved in reality. If that is something you'll want to learn about, I'll leave it up to Raymond Chen. He has published a series of blog entries covering this topic in far more detail:
How were DLL functions exported in 16-bit Windows?
How were DLL functions imported in 16-bit Windows?
How are DLL functions exported in 32-bit Windows?
Exported functions that are really forwarders
Rethinking the way DLL exports are resolved for 32-bit Windows
Calling an imported function, the naive way
How a less naive compiler calls an imported function
Issues related to forcing a stub to be created for an imported function
What happens when you get dllimport wrong?
Names in the import library are decorated for a reason
Why can't I GetProcAddress a function I dllexport'ed?
I would like to modify the linux kernel.
I would like to use functions from a shared library (an .so file) in file kernel/panic.c.
Unfortunately I don't know how to compile it.
When I add it to the Makefile I receive the following error:
ld: attempted static link of dynamic object.
Is there a way to put a shared library file into the Linux kernel, or do I need to recompile my library to get an object file?
It is not possible to link a shared library into kernel code (ELF shared objects are a user-space thing, loaded by ld-linux(8)...). You should consider making a kernel module instead (and using modprobe(8) to load it). Read the Loadable Kernel Module HOWTO.
Kernel modules (*.ko) are conceptually similar to shared objects (*.so), but the linking mechanism is different.
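For illustration, a minimal module sketch might look like this (built out of tree with the usual obj-m += hello.o Makefile and make -C /lib/modules/$(uname -r)/build M=$PWD modules; the name hello is of course arbitrary):
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Minimal example module");

static int __init hello_init(void)
{
    pr_info("hello: module loaded\n");    /* printk, not printf */
    return 0;
}

static void __exit hello_exit(void)
{
    pr_info("hello: module unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);
Load it with insmod ./hello.ko (or modprobe once installed) and check dmesg for the messages.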
BTW, you generally should avoid writing kernel code and should prefer coding application code. In other words, modifying the kernel is generally a bad idea and is frowned upon.
Also, the API available in kernel space is not the same as the user-space API (which includes the C standard library and POSIX functions). For example, kernel modules (and kernel code in general) don't have, so cannot call, fopen or fprintf or fork; the kernel is a freestanding C application. Also, kernel code cannot use any floating-point operations!
Userland applications interact with the kernel using the system calls listed in syscalls(2) (the libc uses them, e.g. for printf or system(3)). Kernel code (including kernel modules) cannot itself use syscalls (since they are provided by the kernel; see syscalls(2)).
Read also Advanced Linux Programming (mostly about application programming) and Operating Systems: Three Easy Pieces (to get a broader view about OSes).
I was wondering whether the APIs in kernel32.dll (or other DLLs) have subroutines.
For example, the CopyFile function should take different actions to copy a file from C: to D: than to copy one from a network share path (\\HOSTNAME\SHAREDFOLDER\FILENAME), or to trigger ODX, the new Windows Server 2012 (Hyper-V) feature.
So in the implementation of the CopyFile function there should be some if/else branches that call sub-functions, shouldn't there?
If these subroutines exist, is it possible to call them directly, and is it possible to hook them?
Thanks.
As far as I know, the current implementation of kernel32.dll calls functions in ntdll.dll. The functions in ntdll.dll then do a syscall into the kernel somehow.
To answer your question, yes, it calls subroutines, and they probably can be hooked, but most of the logic about how specifically to read from and write to filesystems in different ways is probably buried in the kernel.
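You can see that lower layer for yourself: the Nt* routines that kernel32 eventually reaches are ordinary exports of ntdll.dll. A tiny sketch (NtCreateFile is just a representative pick; actually calling Nt* functions directly is undocumented territory):
#include <Windows.h>
#include <stdio.h>

int main(void)
{
    // ntdll.dll is mapped into every process, so GetModuleHandle suffices.
    HMODULE ntdll = GetModuleHandleW(L"ntdll.dll");
    FARPROC pNtCreateFile = ntdll ? GetProcAddress(ntdll, "NtCreateFile") : NULL;

    printf("NtCreateFile is exported at %p\n", (void *)pNtCreateFile);
    return 0;
}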
Keep in mind that you're probably not supposed to be digging into the internals of these DLLs — it's best to use the public interface. Relying on implementation details makes your code more fragile and likely to break with operating system upgrades.
Is anyone hardening their code in an attempt to detect injections? For example, if someone is trying to intercept a username/password via NSUrlConnection, they could use LD_PRELOAD/DYLD_LIBRARY_PATH, provide exports for my calls into NSUrlConnection, and then forward the calls to the real NSUrlConnection.
Ali gave excellent information below, but I'm trying to determine what measures should be taken for a hostile environment, where a phone might be jailbroken. Most applications don't have to care, but one class of apps does: high-integrity software.
If you are hardening, what method(s) are you using? Is there a standard way to detect injections on Macs and iPhones? How are you defeating framework injections?
For iOS / Cocoa Touch, loading dynamic libraries is not allowed* (except for the system frameworks). To build and distribute an application through the App Store, you can only link with static libraries and system frameworks, not with dynamic libraries.
So on iOS you can't use that for code injection, and of course you can't use LD_PRELOAD either (as you don't have access to such environment variables on iOS).
Except probably on jailbroken iPhones, but people who jailbreak their iPhone have to accept that jailbreaking, by definition, lifts all the protections iOS provides against things such as injection (you can't remove the lock from your door to avoid having to use your key… and still expect to be protected against thieves robbing your house ;-))
That's the advantage of the sandboxing + code-signing + no-dylib constraints on iOS: no code injection possible.
(On OS X it is still possible anyway, in particular using DYLD_INSERT_LIBRARIES, the rough equivalent of LD_PRELOAD.)
[EDIT] Since iOS 8, iOS also allows dynamic frameworks. But as that's still sandboxed (you can only load code-signed frameworks that are inside your application bundle, and can't load frameworks that come from outside your app bundle), injection is still not possible*
*except if the user jailbreaks their phone, but that means they chose to get rid of all those protections on purpose and thus put their phone at risk; we can't crack our phone's security and still expect it to provide all the protections it used to.
This is an answer specific to UNIX-like operating systems; I apologize if it doesn't make sense for your question, but I don't know your platform well. Simply don't create a dynamically linked executable.
There are two ways I can think of to do this. Method #2 is probably best for you. They're both similar.
Important for both: the executable must be statically compiled using -static at build time.
Method 1 - static exe, manually load shared libraries by their trusted full paths
Manually dlopen() each library you need via its full path, then get the function addresses via dlsym() at runtime and assign them to function pointers in order to use them. You'll need to do this for every external function you want to use. I believe functions that are not reentrant-safe won't like this, so for those that use static variables you'll need to use the reentrant-safe versions, which end with "_r", e.g. strtok_r instead of strtok.
This will be difficult or simple depending on what your app does and how many functions you're using.
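A sketch of that pattern, using libm's cos as a stand-in for whatever functions you actually need (the library path below is only an example; use the trusted full path that is correct for your system):
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* Example path -- substitute the trusted full path on your system. */
    void *libm = dlopen("/lib/x86_64-linux-gnu/libm.so.6", RTLD_NOW);
    if (libm == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }

    double (*my_cos)(double) = (double (*)(double))dlsym(libm, "cos");
    if (my_cos == NULL) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        return 1;
    }

    printf("cos(0) = %f\n", my_cos(0.0));
    dlclose(libm);
    return 0;
}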
Method 2 - Statically link the executable, period
You can solve your subversion problem by just linking a static executable to avoid using dynamic libraries at all. This will generate a much larger exe than the dlopen()/dlsym() method. Build using the -static compile flag, and instead of using, for example, gcc bah.c -o bah -lssl, use gcc -static bah.c -o bah /usr/lib/libssl.a to use the statically compiled versions of the libraries you need instead of the dynamic shared libraries. In other words, use -static and don't use -l while building.
For either method:
Once built, use file bah to confirm the executable is statically linked, or confirm by running ldd on it.
Note that you'll need statically compiled versions of all the libraries you're linking against present on your system (these files end with .a instead of .so).
Also note that upgrading system libraries will not update your executable. If there's a new security bug in OpenSSL, you'll need to get the latest libssl.a and recompile. If you use the dlopen()/dlsym() method you won't have this problem, but you will have portability issues if symbols change across versions.
Each method has its pros and cons based on your needs.
Taking the Method 1 dlopen()/dlsym() approach makes your code more "obfuscated" and smaller, but sacrifices portability in most cases, so it probably isn't what you want. The upside is that it can benefit when security bugs are fixed system-wide.
When loading external DLLs (not under our control) via LoadLibrary, we're hitting a problem where the statically linked CRT in those DLLs is failing to allocate fiber-local storage. This is similar to MS KB 193462, except that this is FLS and there are only 128 slots.
Are there any useful ways to work around the problem? The CRT is using GetProcAddress to find FlsAlloc anyway (since that apparently never existed in XP), so does it even really need it?
(This is on Vista, where FlsAlloc actually exists; the DLLs appear to be using MSVC8)
There is frankly no solution here, short of loading fewer DLLs.
You could hook the DLL's import address table, but that will happen too late: you can only install an IAT hook once LoadLibrary returns, and the CRT initialization code probably executes in response to DLL_PROCESS_ATTACH, which will already have been delivered.
You could, I guess, find the kernel32.dll module in memory and patch the export address for GetProcAddress, or perhaps FlsAlloc, to point to your implementation. But that approach is getting seriously hackish.
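For what it's worth, the IAT patch mentioned above looks roughly like the sketch below (standard PE walking using the structures from <Windows.h>; patch_iat is just my name for the helper, and the timing problem described above still applies, since you can only call it after LoadLibrary returns):
#include <Windows.h>
#include <string.h>

/* Point "name", imported by "module" from "import_dll", at new_fn.
   Returns the previous function pointer, or NULL if the import wasn't found. */
static void *patch_iat(HMODULE module, const char *import_dll,
                       const char *name, void *new_fn)
{
    BYTE *base = (BYTE *)module;
    IMAGE_DOS_HEADER *dos = (IMAGE_DOS_HEADER *)base;
    IMAGE_NT_HEADERS *nt = (IMAGE_NT_HEADERS *)(base + dos->e_lfanew);
    IMAGE_DATA_DIRECTORY dir =
        nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT];
    if (dir.VirtualAddress == 0)
        return NULL;

    IMAGE_IMPORT_DESCRIPTOR *imp =
        (IMAGE_IMPORT_DESCRIPTOR *)(base + dir.VirtualAddress);

    for (; imp->Name != 0; ++imp) {
        if (_stricmp((const char *)(base + imp->Name), import_dll) != 0)
            continue;

        /* OriginalFirstThunk holds the names; FirstThunk is the IAT itself. */
        IMAGE_THUNK_DATA *names = (IMAGE_THUNK_DATA *)(base + imp->OriginalFirstThunk);
        IMAGE_THUNK_DATA *iat   = (IMAGE_THUNK_DATA *)(base + imp->FirstThunk);

        for (; names->u1.AddressOfData != 0; ++names, ++iat) {
            if (IMAGE_SNAP_BY_ORDINAL(names->u1.Ordinal))
                continue;                      /* imported by ordinal only */
            IMAGE_IMPORT_BY_NAME *ibn =
                (IMAGE_IMPORT_BY_NAME *)(base + names->u1.AddressOfData);
            if (strcmp((const char *)ibn->Name, name) != 0)
                continue;

            DWORD old, unused;
            void *previous = (void *)iat->u1.Function;
            VirtualProtect(&iat->u1.Function, sizeof iat->u1.Function,
                           PAGE_READWRITE, &old);
            iat->u1.Function = (ULONG_PTR)new_fn;
            VirtualProtect(&iat->u1.Function, sizeof iat->u1.Function,
                           old, &unused);
            return previous;
        }
    }
    return NULL;
}
You would then call it with "kernel32.dll" and "GetProcAddress" (or "FlsAlloc") and your replacement function right after LoadLibrary returns; as noted, by then the CRT's DLL_PROCESS_ATTACH work has already run.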