Preventing from accessing process memory - macos

I made an example that writes into process memory using task_for_pid() and mach_vm_write().
task_for_pid(mach_task_self(), pid, &target_task);
mach_vm_write(target_task, address, '?', local_size);
Is there a way to block to access memory of the specific process from another processes like cheat engine on OS X.
How do I prevent another process from calling task_for_pid?
Not that many others come to mind except hooking.

In OS X, the calls to task_for_pid are regulated by taskgated. Basically, unless it's your task , or you're root (or, in older systems, member of procview group), you won't get that elusive task port. But if you are allowed, then you have the port, and can do basically anything you want.
Hooking won't help, since task_for_pid is a mach trap - people can call it directly using the system call interface. iOS has much tighter controls on it (thanks to AppleMobileFileIntegrity.kext). If you want to control the trap, effectively the only way of doing so is writing a small kext to do the trick for you.

Related

What is the ideal way to emulate process replacement on Windows?

So, in a feature request I filed against Node.js, I was looking for a way to replace the current Node process with another. In Linux and friends (really, any POSIX-compliant system), this is easy: use execve and friends and call it a day. But obviously, that won't work on Windows, since it only has CreateProcess (which execve and friends delegate to, complete with async behavior). And it's not like people haven't wanted to do similar, leading to numerous duplicate questions on this site. (This isn't a duplicate because it's explicitly seeking a workaround given certain constraints, not just asking for direct replacement.)
Process replacement has several facets that have to addressed:
All console I/O streams have to be forwarded to the new process.
All signals need transparently forwarded to the new process.
The data from the old process have to be destroyed, with as many resources reclaimed as possible.
All pre-existing threads and child processes should be destroyed.
All pre-existing handles should be destroyed apart from open file descriptors and named pipes/etc.
Optimally, the old process's memory should be kept to a minimum after the process is created.
For my particular use case, retaining the process ID is not important.
And for my particular case, there are a few constraints:
I can control the initial process's startup as well as the location of my "process replacement" function.
I could load arbitrary native code via add-ons at potentially any stack offset.
Implication: I can't even dream of tracking malloc calls, handles, thread manipulation, or process manipulation to track and free them all, since DLL rewriting isn't exactly practical.
I have no control over when my "process replacement" is called. It could be called through an add-on, which could've been called through either interpreted code via FFI or even another add-on recursively. It could even be called during add-on initialization.
Implication: I would have no ability to know what's in the stack, even if I perfectly instrumented my side. And rewriting all their calls and pushes is far from practical, and would just be all-around slow for obvious reasons.
So, here's the gist of what I was thinking: use something similar to a pseudo-trampoline.
Statically allocate the following:
A single pointer for the stack pointer.
MAX_PATH + 1 chars for the application path + '\0'.
MAX_PATH + 1 chars for the current working directory path + '\0'.
32768 chars for the arguments + '\0'.
32768 chars for the environment + '\0'.
On entry, set the global stack pointer reference to the stack pointer.
On "replacement":
Do relevant process cleanup and lock/release everything you can.
Set the stack pointer to the stored original global one.
Terminate each child thread.
Kill each child process.
Free each open handle.
If possible (i.e. not in a UWP program), For each heap, destroy it if it's not the default heap or the temporary heap (if it exists).
If possible, close each open handle.
If possible, walk the default heap and free each segment associated with it.
Create a new process with the statically allocated file/arguments/environment/etc. with no new window created.
Proxy all future received signals, exceptions, etc. without modification to this process somehow. The standard signals are easy, but not so much with the exceptions.
Wait for the process to end.
Return with the process's exit code.
The idea here is to use a process-based trampoline and drop the current process size to an absolute minimum while the newly created one is started.
But where I'm not very familiar with Windows, I probably made quite a few mistakes here. Also, the above seems extremely inefficient and to an extent it just feels horribly wrong for something a kernel could just release a few memory pages, deallocate a bunch of memory handles, and move some memory around for the next process.
So, to summarize, what's the ideal way to emulate process replacement on Windows with the fewest limitations?
Given that I don't understand what is actually being requested and I certainly look at things like 'execve' with a "who the hell would ever call that anyway, nothing but madness can ever result" sentiment, I nonetheless look at this problem by asking myself:
if process-a was killed and replaced by an near identical process-b - who or what would notice?
Anything that held the process id, or a handle to the process would certainly notice. This can be handled by writing a wrapper app which loads the first node process, and when prodded, kills it and loads the next. External observers see the wrapping process handles and id's unchanged.
Obviously this would cut off the stdin and stdout streams being fed into the node applications. But again, the wrapper process could get around this by passing the same set of inheritable handles to each node process launched by filling in the STARTUPINFO structure passed to CreateProcess properly.
Windows doesn't support signals, and the ones that the MS C runtime fake all deal with internal errors except one, which deals with an interactive console window being closed via ctrl-C, which the active Node.js app is sure to get anyway - or can be passed on from the wrapper as the node apps would not actually be running on the interactive console with this approach.
Other than that, everything else seems to be an internal detail of the Node.js application so shouldn't effect any 3rd party app communicating with what it thinks is a single node app via its stdin/stdout streams.

How to identify a process in Windows? Kernel and User mode

In Windows, what is the formal way of identifying a process uniquely? I am not talking about PID, which is allocated dynamically, but a unique ID or a name which is permanent to that process. I know that every program/process has a security descriptor but it seems to hold SIDs for loggedin user and group (not the process). We cannot use the path and name of executable from where the process starts as that can change.
My aim is to identify a process in the kernel mode and allow it to perform certain operation. What is the easiest and best way of doing this?
Your question is too vague to answer properly. For example how could the path possibly change (without poking around in kernel memory) after creation of a process? And yes, I am aware that one could hook into the memory-mapping process during process creation to replace the image originally destined to be loaded with another. Point is that a process is merely one instance of running a given executable. And it's not clear what exact tampering attempts you want to counter here.
But from kernel mode you do have the ability to simply use the pointer to the EPROCESS structure. No need to use the PID, although that will be unique while the process is still alive.
So assuming your process uses an IRP to communicate to the driver (whether it be WriteFile, ReadFile, DeviceIoControl or something more exotic), in order to register itself, you can use IoGetCurrentProcess to get the PEPROCESS value which will be unique to the process.
While the structure itself is not officially documented, hints can be gleaned from the "Windows Internals" book (in its various incarnations), the dt (Display Type) command in WinDbg (and friends) as well as from third-party resources on the internet (e.g. here, specific to Vista).
The process objects are kept in several linked lists. So if you know the (officially undocumented!!!) layout for a particular OS version, you may traverse the lists to get from one to the next process object (i.e. EPROCESS structure).
Cautionary notes
Make sure to reference the object of the process, by using the respective object manager routines. Otherwise you cannot be certain it's safe to both reach into these structures (which is anyway unsafe, since you cannot rely on their layout across OS versions) or to pass it to functions that expect a PEPROCESS.
As a side-note: Harry Johnston is of course right to assert that a privileged user can insert arbitrary (well almost arbitrary) code into the TCB in order to thwart your protective measures. In the end it is going to be an arms race.
Also keep in mind that similar to PIDs, theoretically the value of the PEPROCESS may be recycled. But in both cases you can simply counter this by invalidating whatever internal state you keep in your driver that allows the process to do its magic, whenever the process goes down. Using something like PsSetCreateProcessNotifyRoutine would seem to be a good method here. In order to translate your process handle from the callback to a PEPROCESS value, use ObReferenceObjectByHandle.
An alternative of countering recycling of the PID/PEPROCESS is by keeping a reference to the process object and thus keeping it in a kind of undead state (similar to not closing a handle in user mode), although the main thread may have finished.

Which mutex lock variant should I use in Linux kernel developing?

AFAIK, the mutex API was introduced to the kernel after LDD3 (Linux device drivers 3rd edition) was written so it's not described in the book.
The book describes how to use the kernel's semaphore API for mutex functionality.
It suggest to use down_interruptable() instead of down():
You do not, as a general rule,
want to use noninterruptible operations unless there truly is no alternative. Non-interruptible operations are a good way to create unkillable processes (the dreaded
“D state” seen in ps), and annoy your users [Linux Device Drivers 3rd ed]
Now. here's my question:
The mutex API has two "similar" functions:
mutex_lock_killable() an mutex_lock_interruptable(). Which one should I choose?
Use mutex_lock_interruptible() function to allow your driver to be interrupted by any signal.
This implies that your system call should be written so that it can be restarted.
(Also see ERESTARTSYS.)
Use mutex_lock_killable() to allow your driver to be interrupted only by signals that actually kill the process, i.e., when the process has no opportunity to look at the results of your system call, or even to try it again.
Use mutex_lock() when you can guarantee that the mutex will not be held for a long time.

Pipe output(stdout) from running process Win32Api

I need to get (or pipe) the output from a process that is already running, using the windows api.
Basically my application should allow the user to select a window to pipe the input from, and all input will be displayed in a console. I would also be looking on how to get a pipe on stderr later on.
Important: I did not start the process using CreateProcess() or otherwise. The process is already running, and all I have is the handle to the process (returned from GetWindowThreadProcessId()).
The cleanest way of doing this without causing any ill effects, such that may occur if you used the method Adam implied of swapping the existing stdout handle with your own, is to use hooking.
If you inject a thread into the existing application and swap calls to WriteFile with an intercepted version that will first give you a copy of what's being written (filtered by handle, source, whatever) then pass it along to the real ::WriteFile with no harm done. Or you can intercept the call higher up by only swapping out printf or whichever call it is that the software is using (some experimentation needed, obviously).
HOWEVER, Adam is spot-on when he says this isn't what you want to do. This is a last resort, so think very, very carefully before going down this line!
Came across this article from MS while searching on the topic.
http://support.microsoft.com/kb/190351
The concept of piping input and output on Unix is trivial, there seems no great reason for it to be so complex on Windows. - Karl
Whatever you're trying to do, you're doing it wrong. If you're interacting with a program for which you have the source code, create a defined interface for your IPC: create a socket, a named pipe, windows messaging, shared memory segment, COM server, or whatever your preferred IPC mechanism is. Do not try to graft IPC onto a program that wasn't intending to do IPC.
You have no control over how that process's stdout was set up, and it is not yours to mess with. It was created by its parent process and handed off to the child, and from there on out, it's in control of the child. You don't go in and change the carpets in somebody else's house.
Do not even think of going into that process, trying to CloseHandle its stdout, and CreateFile a new stdout pointing to your pipe. That's a recipe for disaster and will result in quirky behavior and "impossible" crashes.
Even if you could do what you wanted to do, what would happen if two programs did this?

How does an OS or a systems program wait for user input?

I come from the world of web programming and usually the server sets a superglobal variable through the specified method (get, post, etc) that makes available the data a user inputs into a field. Another way is to use AJAX to register a callback method to an event that the AJAX XMLhttpRequest object will initiate once notified by the browser (I'm assuming...). So I guess my question would be if there is some sort of dispatch interface that a systems programmer's code must interact with vicariously to execute in response to user input or does the programmer control the "waiting" process directly? And if there is a dispatch is there a loop structure in an OS that waits for a particular event to occur?
I was prompted to ask this question here because I'm in a basic programming logic class and the professor won't answer such a "sophisticated" question as this one. My book gives a vague pseudocode example like:
//start
sentinel_val = 'stop';
get user_input;
while (user_input not equal to sentinel_val)
{
// do something.
get user_input;
}
//stop
This example leads me to believe 1) that if no input is received from the user the loop will continue to repeat the sequence "do something" with the old or no input until the new input magically appears and then it will repeat again with that or a null value. It seems the book has tried to use the example of priming and reading from a file to convey how a program would get data from event driven input, no?
I'm confused :(
At the lowest level, input to the computer is asynchronous-- it happens via "interrupts", which is basically something external to the CPU (a keyboard controller) sending a signal to the CPU that says "stop what you're doing and accept this data". (It's complex, but this is the general idea). So the CPU stops, grabs the keystroke, and puts it in a buffer to be read, and then continues doing what it was doing before the interrupt.
Very similar things happen with inbound network traffic, and the results of reading from a disk, etc.
At a higher level, it gets more dependent on the operating system or framework that you're using.
With keyboard input, there might be a process (application, basically) that is blocked, waiting for user input. That "block" doesn't mean the computer just sits there waiting, it lets other processes run instead. But when the keyboard result comes in, it will wake up the one who was waiting for it.
From the point of view of that waiting process, they called some function "get_next_character()" and that function returned with the character. Etc.
Frankly, how all this stuff ties together is super interesting and useful to understand. :)
An OS is driven by hardware event (called interrupt). An OS does not wait for an interrupt, instead, it execute a special instruction to put the CPU a nap in a loop. If a hardware event occurs, the corresponding interrupt will be invoked.
It seems the book has tried to use the example of priming and reading from a file
to convey how a program would get data from event driven input, no?
Yes that is what the book is doing. In fact... the unix operating system is built on the idea of abstracting all input and output of any device to look like this.
In reality most operating systems and hardware make use of interrupts that jump to what we can call a sub-routine to perform the low level data read and then return control back to the operating system.
Also on most systems many of the devices work independent of the rest of the operating system and present a high level API to the operating system. For example a keyboard port (or maybe a better example is a network card) on a computer process interrupts itself and then the keyboard driver presents the operating system with a different api. You can look at standards for devices to see what these are. If you want to know the api the keyboard port presents for example you could look at the source code for the keyboard driver in a linix distro.
A basic explanation based on my understanding...
Your get user_input pseudo function is often something like readLine. That means that the function will block until the data read contains a new line character.
Below this the OS will use interrupts (this means it's not dealing with the keyboard unessesarily, but only when required) to allow it to respond when it the user hits some keys. The keyboard interrupt will cause execution to jump to a special routine which will fill an input buffer with data from the keyboard. The OS will then allow the appropriate process - generally the active one - to use readLine functions to access this data.
There's a bunch more complexity in there but that's a simple view. If someone offers a better explanation I'll willingly bow to superior knowledge.

Resources