Force program to treat symlinks as real files using LD_PRELOAD

GOAL: I want to force zpaq into backing up symlinks as though they were real files, possibly by fooling it (using LD_PRELOAD or some sort of FUSE system) into thinking symlinks are actual files.
I want to create/find a library that forces programs to read symlinks
as though they were actual files, and then use LD_PRELOAD (or something
similar) to run the program in that environment.
In other words, when the program calls readdir() [or whatever], the
symlink appears as an actual file, and when the program calls open()
[or whatever], it opens the actual target file, not the symlink.
Is there any way to do this? The otherwise wonderful zpaq doesn't
support symlinks at the moment, and the files are on different drives,
so I can't use hard linking either.

So what is the problem? You seem to know about LD_PRELOAD already; all you need to do is write a library which exposes the correct functions and put it in LD_PRELOAD. This link explains the process a bit more verbosely if you need it.
The only potential issue is that calls to things in glibc don't always get linked to the symbols you might expect… for example, a call to write may actually call __write (if it doesn't get inlined to something even lower-level). Depending on your optimization level, some function calls will actually be removed completely, such as memset with a fixed length. There are also checked variants for many functions if you're using _FORTIFY_SOURCE. I don't think this should be an issue for readlink, but you're probably just going to have to try it and see.
Basically, just do it. If it doesn't work, then come back to SO if you need help debugging.

Related

Suspend program execution if syscall with specific parameters called (GDB / strace)

Is there a straightforward way with ready-at-hand tooling to suspend a traced process's execution when certain syscalls are called with specific parameters? Specifically I want to suspend program execution whenever
stat("/${SOME_PATH}")
or
readlink("/${SOME_PATH}")
are called. I aim to then attach a debugger, so that I can identify which of the hundreds of shared objects that are linked into the process is trying to access that specific path.
strace shows me the syscalls alright, and gdb does the rest. The question is, how to bring them together. This surely can be solved with custom glue-scripting, but I'd rather use a clean solution.
The problem at hand is a 3rd party toolsuite which is available only in binary form and whose distribution package completely violates the LSB/FHS and good manners and places shared objects all over the filesystem, some of which are loaded from unconfigurable paths. I'd like to identify which modules of the toolsuite try to do this and either patch the binaries or file an issue with the vendor.
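This is the approach that I use for similar conditions when debugging on Windows. Even though I think it should be possible for you too, I have not tried it with gdb on Linux.
Once you have attached to your process, set a breakpoint on the call you are interested in, which is stat in your case.
Add a condition based on esp to your breakpoint. For example, say you want to catch stat("/$te"). On 32-bit x86, the value at [esp+4] on entry is the first argument, i.e. the address of the path string, which in this case is "/$te". Comparing that pointer to a string literal with == would compare addresses rather than contents, so the condition has to be a string comparison, something like: $_streq(*(char**)($esp+4), "/$te"). It seems that you can use strcmp() in your condition too as described here. On x86-64 the first argument is passed in rdi instead of on the stack, so the condition would use (char*)$rdi.
I think something similar to this should work for you too.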

Windows: redirect ReadFile to run process and pipe its stdout

I was wondering how hard it would be to create a set-up under Windows where a regular ReadFile on certain files is redirected by the file system to actually run (e.g. ShellExecute) those files, and then the new process' stdout is used as the file content streamed back to the caller of ReadFile...
What I envision the set-up to look like, is that you can configure it to denote a certain folder as 'special', and that this extra functionality is then only available on that folder's content (so it doesn't need to be disk-wide). It might be accessible under a new drive letter, or a path parallel to the source folder; the location it is hooked up to is irrelevant to me.
To those of you that wonder if this is a classic xy problem: it might very well be ;) It's just that this idea has intrigued me, and I want to know what possibilities there are. In my particular case I want to employ it to #include content in my C++ code base, where the actual content included is being made up on the spot, different on each compile round. I could of course also create a script to create such content to include, call it as a pre-build step and leave it at that, but why choose the easy route.
Maybe there are already ready-made solutions for this? I did an extensive Google search for it, but came up empty-handed. But then I'm not sure I already know all the keywords involved to do a good search...
When coding up something myself, I think a minifilter driver might be needed intercepting ReadFile calls, but then it must at that spot run usermode apps from kernel space - not a happy marriage I assume. Or use an existing file system driver framework that allows for usermode parts, but I found the price of existing solutions to be too steep for my taste (several thousand dollars).
And I also assume that a standard file system (minifilter) driver might be required to return a consistent file size for such files, although the actual data size returned through ReadFile would of course differ on each call. Not to mention negating any buffering that takes place.
All in all I think that a create-it-yourself solution will take quite some effort, especially when you have never done Windows driver development in your life :) Although I see myself quite capable of learning up on it, the time invested will be prohibitive I think.
Another approach might be to hook ReadFile calls from the process doing the ReadFile - via IAT hooking, or via code injection. But I want this solution to work more 'out-of-the-box', i.e. all ReadFile requests for these special files trigger the correct behavior, regardless of origin. In my case I'd need to intercept my C++ compiler's (G++) behavior, but that one is called on the fly by the IDE, so I see no easy way to detect its startup and hook it up quickly before it does its ReadFiles. And besides, I only want certain files to be special in this regard; intercepting all ReadFiles for a certain process is overkill.
You want something like FUSE (which I have used with profit many times), but for Windows. Apparently there's Dokan; I've never used it, but it seems to be well known enough (and, at the very least, can be used as an inspiration to see "how it's done").

"Hiding" a system call from ltrace and strace

Is there a way to hide a system call from strace and a dynamic library call from ltrace? For example, the use of system (<stdlib.h>).
In the last class of my software construction course this semester, the instructor revealed to us that we could have gotten away with using the system library function call in many parts of the command shell project we were assigned instead of the more complicated fork, exec, readdir, stat, dup, and pipe system calls we were told to use.
The way system works, he said, is you simply pass in a string of the command you want to execute: system("cmd [flags] [args]; cmd && cmd"); and there you are.
We were not supposed to use this function, but he said he didn't check our programs for it. One way to hide its use would have been to obscure it through macro definitions and such. However, ltrace is still able to track system down when used through macros. I believe it even finds it when it's called from a separate program, like execvp("./prgrm_with_system", ...).
My chance to use it is gone, but I am really curious about whether there is a way to hide system from even ltrace.
system() doesn't do anything that's magic. It doesn't even do anything that's smart (and using it is often a code smell). It also isn't a system call in the sense that the term "syscall" refers to.
You could trivially create your own version of system() using the underlying syscalls fork() and execve(), and bypass detection with ltrace... but strace would still show those calls happening.
You also could bypass ltrace with static linking, but since syscalls are by definition for things that require the OS kernel's help, you can't do without them entirely -- so tools such as strace, sysdig, truss, dtrace, and local equivalents can't be so easily avoided (without exploiting security vulnerabilities in the OS or the tools themselves).
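To make the "create your own version" point concrete, here is a rough sketch of a system() stand-in built from fork(), execve() and waitpid(). The name my_system is made up for illustration, and the sketch leaves out the signal handling that the real system() performs.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: a stripped-down stand-in for system(). */
int my_system(const char *cmd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* fork failed */

    if (pid == 0) {
        /* Child: hand the command string to the shell, as system() does. */
        char *const argv[] = { "sh", "-c", (char *)cmd, NULL };
        execve("/bin/sh", argv, environ);
        _exit(127);                     /* only reached if execve failed */
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return status;                      /* decode with WIFEXITED/WEXITSTATUS */
}
```

With this, ltrace would no longer report a call to system, though in a dynamically linked binary it would still list fork, execve and waitpid, and strace would show the underlying syscalls either way.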

How to find stuff in the kernel

I'm doing various tasks on the Linux kernel, and I end up reading source code from time to time. I haven't really needed to change the kernel yet (I'm good with so-called "Loadable Kernel Modules"), so I didn't download the kernel source and just use http://lxr.free-electrons.com/ . Quite often I come across a function that has many implementations, and start guessing which one is the one I need.
For example, in the file Linux/virt/kvm/kvm_main.c, line 496 is a call to list_add; a click on it gives me two options: drivers/gpu/drm/radeon/mkregtable.c, line 84 and include/linux/list.h, line 60 - It's quite clear that kvm will not send me to something under "gpu", but this is not always the case. I have looked at the includes of the file, but that was not much help.
So my question: Given a file from the kernel, and a function call at line ###, what is the nicest way to find where that function call actually continues?
(I'll be happy to hear also about ways that don't include the website and/or require me to download the source code)
There are many things in the kernel that are #define'd or typedef'd, or functions reached through pointers stored in structs (the fops struct in drivers). So, there's no easy way to browse the kernel source. The lxr site helps you, but it can't go any further when you encounter any of the above constructs. The same goes for cscope/ctags. The best way though, despite you explicitly asking to avoid it, is to download the source and browse through it.
Another method would be to use kgdb and step through the code function by function, but that requires you to have some knowledge of which functions you want to step into, to save a lot of time. And last but not least, increase the kernel log level and print the logs that are accessible through dmesg. But these all require you to have the kernel source.
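To illustrate why functions stored in structs defeat a cross-referencer, here is a small userspace sketch. All names in it are made up. It only mimics the shape of a kernel ops table such as the fops struct mentioned above.

```c
#include <string.h>

/* Hypothetical stand-in for a kernel ops table. */
struct demo_ops {
    const char *name;
    int (*read)(char *buf, int len);
};

/* Two unrelated implementations of the same operation. */
static int null_read(char *buf, int len)
{
    (void)buf;
    (void)len;
    return 0;                           /* nothing to read */
}

static int zero_read(char *buf, int len)
{
    memset(buf, 0, (size_t)len);        /* fill the buffer with zeros */
    return len;
}

static const struct demo_ops null_dev = { .name = "null", .read = null_read };
static const struct demo_ops zero_dev = { .name = "zero", .read = zero_read };

/* The call site: nothing here names null_read or zero_read. */
int do_read(const struct demo_ops *ops, char *buf, int len)
{
    return ops->read(buf, len);
}
```

Clicking on read at the call site can at best lead to the struct member, because the actual target is chosen at run time. With the source downloaded, searching for assignments to that member (here, ".read =") is one way to list the candidates.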

LoadLibrary from offset in a file

I am writing a scriptable game engine, for which I have a large number of classes that perform various tasks. The size of the engine is growing rapidly, and so I thought of splitting the large executable up into dll modules so that only the components that the game writer actually uses can be included. When the user compiles their game (which is to say their script), I want the correct dll's to be part of the final executable. I already have quite a bit of overlay data, so I figured I might be able to store the dll's as part of this block. My question boils down to this:
Is it possible to trick LoadLibrary to start reading the file at a certain offset? That would save me from having to either extract the dll into a temporary file which is not clean, or alternatively scrapping the automatic inclusion of dll's altogether and simply instructing my users to package the dll's along with their games.
Initially I thought of going for the "load dll from memory" approach but rejected it on grounds of portability and simply because it seems like such a horrible hack.
Any thoughts?
Kind regards,
Philip Bennefall
You are trying to solve a problem that doesn't exist. Loading a DLL doesn't actually require any physical memory. Windows creates a memory mapped file for the DLL content. Code from the DLL only ever gets loaded when your program calls that code. Unused code doesn't require any system resources beyond reserved memory pages. You have 2 billion bytes worth of that on a 32-bit operating system. You have to write a lot of code to consume them all; 50 megabytes of machine code is already a very large program.
The memory mapping is also the reason you cannot make LoadLibrary() do what you want to do. There is no realistic scenario where you need to.
Look into the linker's /DELAYLOAD option to improve startup performance.
I think every solution for that task is a "horrible hack" and nothing more.
The simplest way that I see is to create your own virtual drive that presents a custom filesystem and maps access paths from one real file (the bundle of your libraries) to multiple separate DLLs, for example the way TrueCrypt does (it's open-source). Then you can use the LoadLibrary function without changes.
But the only right way I see is to change your task and not use this approach. I think you need to create your own script interpreter and compiler, using structures, pointers and so on.
The main thing is that I don't understand what benefit you get from using libraries. I think compiled code these days does not weigh that much and can be packed very well. Any other resources can be loaded dynamically on first use. All you need to do is organize the work cycles of all the components of the script engine in the right way.

Resources