Can sys_execve() still return with error after begin_new_exec() returns zero?


I'm using a BPF kprobe to find out when a task's UIDs, GIDs and namespaces change outside the syscalls that have the ability to change these values. For this, I update values[pid] when returning from execve(), execveat(), setns(), unshare(), set*uid(). And on entry to begin_new_exec(), I check whether the current task's values match values[pid].
In some cases the probes incorrectly report that the UID or nsproxy for a process has changed, indicating I've missed a place where I need to update the task's values.
Looking at begin_new_exec(), it replaces the task's credentials and wipes out the old executable. But after begin_new_exec() returns, load_elf_binary() can still return errors while trying to set up the new process image.
Do these late errors reach user mode? Is there a scenario where sys_execve() can fail after begin_new_exec() returns and the process (PID) is not terminated?

Related

What does ANSI mean in the LOAD_DLL_DEBUG_INFO event?

The Debug API reports DLL load events through a LOAD_DLL_DEBUG_INFO event. One of the structure's data members optionally holds the DLL's file name (lpImageName).
The character encoding of this field is described as:
If fUnicode is a nonzero value, the name string is Unicode; otherwise, it is ANSI.
Unicode presumably means UTF-16, but it's unclear which code page to use to interpret the ANSI encoding. There are multiple potential contenders (e.g. the originating process's default code page, the system's code page, the receiving process's default code page, the receiving thread's current code page, etc.).
Which code page is it?

Initially, the debug event comes in the form of a DBGUI_WAIT_STATE_CHANGE structure.
If you use the WaitForDebugEvent[Ex] API, it internally converts DBGUI_WAIT_STATE_CHANGE to DEBUG_EVENT using DbgUiConvertStateChangeStructure[Ex].
When a section (a file mapping, in Win32 terms) created with SEC_IMAGE is mapped into the process being debugged, a DbgLoadDllStateChange message is sent to the debugger, and DbgUiConvertStateChangeStructure[Ex] converts it to LOAD_DLL_DEBUG_INFO.
Note that the original DBGKM_LOAD_DLL structure contains no information about whether NamePointer is ANSI or Unicode; this is "unknown". DbgUiConvertStateChangeStructure[Ex] always hard-codes fUnicode = TRUE, so this string, if it exists, is always Unicode.
This member is strictly optional. Debuggers must be prepared to handle the case where lpImageName is NULL or *lpImageName (in the address space of the process being debugged) is NULL. Specifically, the system will never provide an image name for a create process event, and it will not likely pass an image name for the first DLL event. The system will also never provide this information in the case of debugging events that originate from a call to the DebugActiveProcess function.
Note that lpImageName is a pointer to a pointer to a string (you could say WCHAR** lpImageName). In the current implementation it always points to NT_TIB.ArbitraryUserPointer: it holds not the value of ArbitraryUserPointer but its address.
Formally: lpImageName = &ptib->ArbitraryUserPointer, where NT_TIB* ptib.
So lpImageName itself is never 0, but *lpImageName (in the target process's address space, of course) can be 0. When LdrLoadDll (or LoadLibrary) loads a DLL, before mapping the image section (the call to ZwMapViewOfSection) it sets ArbitraryUserPointer to the Unicode string passed to LdrLoadDll, as is, and restores the original value of ArbitraryUserPointer afterwards. For the create-process event and for the first DLL (ntdll), ArbitraryUserPointer is 0 at that point, and of course the value is no longer valid when the debug events are received later (the DebugActiveProcess case). So lpImageName is not reliable.
It is also interesting that for image section load and unload events (which do not always mean a DLL load/unload), (dwProcessId, dwThreadId) identify not the process/thread in which the debugging event occurred, but the process/thread that called ZwMapViewOfSection or ZwUnmapViewOfSection. In general these are different, because a section can be mapped into or unmapped from another process. This case is rare, but many debuggers (including WinDbg and the MSVC debugger) handle it incorrectly and hang on it.
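Because of this double indirection, a debugger has to read from the debuggee twice and treat a 0 at either step as "no name". The following is a portable sketch, not a Windows API: read_mem_fn is a hypothetical callback that a real debugger would implement on top of ReadProcessMemory(), the debugger and debuggee are assumed to have the same pointer size, and the string is treated as UTF-16 since fUnicode is always TRUE as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback: copy len bytes from address addr in the debuggee.
 * A real Windows debugger would wrap ReadProcessMemory() here.
 * Returns 0 on success, nonzero on failure. */
typedef int (*read_mem_fn)(void *ctx, uintptr_t addr, void *buf, size_t len);

/* image_name_addr is the lpImageName value: the debuggee address of a
 * pointer (ArbitraryUserPointer) to a NUL-terminated UTF-16 string.
 * Copies at most cap - 1 code units into out and NUL-terminates it.
 * Returns the number of units copied, or -1 if no name is available. */
static int read_image_name(read_mem_fn rd, void *ctx, uintptr_t image_name_addr,
                           uint16_t *out, size_t cap)
{
    uintptr_t name_ptr = 0;
    size_t n = 0;

    assert(rd != NULL && out != NULL);
    if (image_name_addr == 0 || cap == 0)
        return -1;
    /* First read: the pointer stored at lpImageName; it is often 0. */
    if (rd(ctx, image_name_addr, &name_ptr, sizeof name_ptr) != 0 || name_ptr == 0)
        return -1;
    /* Second read: the UTF-16 string itself, one code unit at a time. */
    while (n + 1 < cap) {
        uint16_t ch;
        if (rd(ctx, name_ptr + n * sizeof ch, &ch, sizeof ch) != 0)
            return -1;
        if (ch == 0)
            break;
        out[n++] = ch;
    }
    out[n] = 0;
    return (int)n;
}

/* Stand-in reader for testing: reads from the current process. */
static int read_local(void *ctx, uintptr_t addr, void *buf, size_t len)
{
    (void)ctx;
    memcpy(buf, (const void *)addr, len);
    return 0;
}
```

Reading one code unit per call keeps the sketch short; with ReadProcessMemory() a real debugger would more likely read in larger chunks.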

What is the purpose of the BeingDebugged flag in the PEB structure?

What is the purpose of this flag (from the OS side)?
Which functions use this flag besides IsDebuggerPresent?
Thanks a lot.

It's effectively the same, but reading the PEB doesn't require a trip through kernel mode.
More explicitly, the IsDebuggerPresent API is documented and stable; the PEB structure is not, and could, conceivably, change across versions.
Also, the IsDebuggerPresent API (or flag) only checks for user-mode debuggers; kernel debuggers aren't detected via this function.
Why put it in the PEB? It saves some time, which was more important in early versions of NT. (There are a bunch of user-mode functions that check this flag before doing some runtime validation, and will break to the debugger if set.)
If you change the PEB field to 0, then IsDebuggerPresent will also return 0, although I believe that CheckRemoteDebuggerPresent will not.

As you have found, IsDebuggerPresent reads this flag from the PEB. As far as I know, the PEB structure is not an official API but IsDebuggerPresent is, so you should stick to that layer.
The uses of this method are quite limited if you are after copy protection that prevents debugging of your app. As you have found, it is only a flag in your process space. If somebody debugs your application, all he needs to do is zero out the flag in the PEB and let your app run.
You can raise the bar by using CheckRemoteDebuggerPresent, passing in your own process handle to get an answer. This function goes into the kernel and checks for the existence of a special debug structure that is associated with your process if it is being debugged. A user-mode process cannot fake this one, but, as you know, there are always ways around it, such as simply removing your check.

IoGetDeviceObjectPointer() fails with no return status

This is my code:
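UNICODE_STRING symbol;
PFILE_OBJECT pDiskFileObject = NULL;
PDEVICE_OBJECT pDiskDeviceObject = NULL;
NTSTATUS status;
// Initialize the name directly from a constant string.
RtlInitUnicodeString(&symbol, L"\\Device\\Harddisk1\\Partition1");
KdPrint(("OSNVss:symbol is %wZ\n", &symbol));
// IoGetDeviceObjectPointer must be called at PASSIVE_LEVEL.
status = IoGetDeviceObjectPointer(&symbol,
                                  FILE_READ_DATA,
                                  &pDiskFileObject,
                                  &pDiskDeviceObject);
if (NT_SUCCESS(status)) {
    // ... use pDiskDeviceObject, then drop the reference via the file object ...
    ObDereferenceObject(pDiskFileObject);
}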
My driver is the next-lower-level driver below \\Device\\Harddisk1\\Partition1.
When I call IoGetDeviceObjectPointer, it never returns a status, and the rest of the code never runs.
When I debug this with WinDbg, it breaks in intelpm.sys.
If I change the object name to "\\Device\\Harddisk1\\Partition2" (Partition2 really exists), the call succeeds.
If I change the object name to "\\Device\\Harddisk1\\Partition3" (Partition3 does not exist), it fails with status 0xC0000034 (STATUS_OBJECT_NAME_NOT_FOUND), meaning the object name does not exist.
Does anybody know why the call with "\\Device\\Harddisk1\\Partition1" hangs without returning a status? Thanks very much!

First and foremost: what are you trying to achieve, and what driver model are you using? What bitness and which OS versions are targeted, and on which OS version does it fail? Furthermore: you are at the correct IRQL for the call and running inside a system thread, right? From which of your driver's entry points (IRP_MJ_*, DriverEntry ...) are you calling this code?
Anyway, I was re-reading the docs on this function, noting in particular this part:
The IoGetDeviceObjectPointer routine returns a pointer to the top object in the named device object's stack and a pointer to the corresponding file object, if the requested access to the objects can be granted.
and:
IoGetDeviceObjectPointer establishes a "connection" between the caller and the next-lower-level driver. A successful caller can use the returned device object pointer to initialize its own device objects. It can also be used as an argument to IoAttachDeviceToDeviceStack, IoCallDriver, and any routine that creates IRPs for lower drivers. The returned pointer is a required argument to IoCallDriver.
You don't say, but if you are doing this on a 32-bit system, it may be worthwhile tracking down what's going on with IrpTracker. However, my guess is that said "connection", or rather the request for it, somehow gets swallowed by the next-lower-level driver.
It is also hard to say what kind of driver you are writing here (and yes, this can be important).
Rather than just breaking at a particular point before or after the fact, try following the IRP as it travels down the target device object's stack.
But thinking about it, you probably aren't attached to the stack at all (for whatever reason). Could it be that you should actually be using IoGetDiskDeviceObject instead, to get the actual underlying device object (at the bottom of the stack) rather than a reference to the top-level object attached?
Last but not least: don't forget you can also ask this question over on the OSR mailing lists. There are plenty of seasoned professionals there who may have run into the exact same problem (assuming you are doing all of the things correct that I asked about).

Thanks everyone, I solved this problem. The cause was that the call was synchronous with my own IRP processing: when I call IoGetDeviceObjectPointer, it generates a new IRP (an open request, IRP_MJ_CREATE) that passes down from the higher levels. When this IRP reaches my driver, the thread that handles IRPs is the same thread that called IoGetDeviceObjectPointer, so everything deadlocks.
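The deadlock described above (a thread that services requests blocking on a request that only it can service) can be reproduced in a small user-space model. This is a sketch under assumptions, not WDK code: a single pthread worker stands in for the driver's IRP-handling thread, an odd request number stands in for the handler calling IoGetDeviceObjectPointer, and every name is hypothetical. Waiting on its own queue would hang the worker forever, so the sketch detects that case with a thread-local flag and handles the nested request inline.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* One-slot request queue serviced by a single worker thread. */
struct queue {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int pending;   /* queued request, 0 if none */
    int result;    /* result of the last completed request */
    bool done;     /* set when the queued request has completed */
    bool stop;     /* asks the worker to exit */
};

static _Thread_local bool in_worker; /* true only on the worker thread */

static int submit_and_wait(struct queue *q, int req);

/* Odd requests issue a nested request, like a handler that calls
 * something which sends a new IRP back down through the same driver. */
static int handle_request(struct queue *q, int req)
{
    if (req % 2 != 0)
        return 1 + submit_and_wait(q, req + 1);
    return req * 10;
}

/* Queue a request and block until it completes. If the caller is the
 * worker itself, blocking could never finish, so run the request inline. */
static int submit_and_wait(struct queue *q, int req)
{
    int r;

    if (in_worker)
        return handle_request(q, req);
    pthread_mutex_lock(&q->lock);
    q->pending = req;
    q->done = false;
    pthread_cond_broadcast(&q->cond);
    while (!q->done)
        pthread_cond_wait(&q->cond, &q->lock);
    r = q->result;
    pthread_mutex_unlock(&q->lock);
    return r;
}

static void *worker_main(void *arg)
{
    struct queue *q = arg;

    in_worker = true;
    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->pending == 0 && !q->stop)
            pthread_cond_wait(&q->cond, &q->lock);
        if (q->stop)
            break;
        int req = q->pending;
        q->pending = 0;
        pthread_mutex_unlock(&q->lock);
        int res = handle_request(q, req); /* runs without holding the lock */
        pthread_mutex_lock(&q->lock);
        q->result = res;
        q->done = true;
        pthread_cond_broadcast(&q->cond);
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

static void stop_worker(struct queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->stop = true;
    pthread_cond_broadcast(&q->cond);
    pthread_mutex_unlock(&q->lock);
}
```

Removing the in_worker check makes submit_and_wait(&q, 3) hang, which mirrors the symptom in the question. In the real driver the equivalent fix depends on its design; one option might be to make the IoGetDeviceObjectPointer call from a thread other than the one that processes the IRPs.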

Problem with .release behavior in file_operations

I'm dealing with a problem in a kernel module that gets data from userspace through a /proc entry.
I set open/write/release entries for my own /proc entry, and I can use it to get data from userspace without trouble.
I handle errors in the open/write functions, and they are visible to the user as open/fopen or write/fwrite/fprintf errors.
But some of the errors can only be checked at close (because that is when all the data is available). In these cases I return a nonzero value, which I assumed would somehow become the value that close or fclose returns to the user.
But whatever value I return, close behaves as if everything is fine.
To be sure, I replaced all the release() code with a simple 'return(-1);' and wrote a program that opens, writes and closes the /proc entry, and prints the return value of close (and errno). It always returns '0', whatever value I give.
The behavior is the same with 'fclose', or with the shell (echo "..." >/proc/my/entry).
Any clue about this strange behavior, which is not what many tutorials I found claim?
BTW, I'm using a RHEL5 kernel (2.6.18, Red Hat modified) on a 64-bit system.
Thanks.
Regards,
Yannick

The release() isn't allowed to cause the close() to fail.
You could require your userspace programs to call fsync() on the file descriptor before close(), if they want to find out about all possible errors; then implement your final error checking in the fsync() handler.
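On the userspace side, the fsync()-before-close() convention might look like the sketch below. The helper name write_and_check is hypothetical; it writes the whole buffer, calls fsync() so that an error returned by the module's fsync handler reaches the caller, and still checks close(). It returns 0 or a negative errno value. Note that if the /proc entry provides no fsync handler, fsync() fails with EINVAL, so the module has to implement one for this to work.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write buf to path, then fsync() so the driver's final checks can report
 * an error before the descriptor is closed.
 * Returns 0 on success or a negative errno value on the first failure. */
static int write_and_check(const char *path, const char *buf)
{
    size_t len, off = 0;
    int fd, err = 0;

    assert(path != NULL && buf != NULL);
    len = strlen(buf);
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;
    while (off < len) {
        ssize_t n = write(fd, buf + off, len - off);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted before writing anything; retry */
            err = -errno;
            break;
        }
        off += (size_t)n;
    }
    /* Errors from the module's fsync handler surface here. */
    if (err == 0 && fsync(fd) != 0)
        err = -errno;
    /* close() may still report an error of its own, so check it too. */
    if (close(fd) != 0 && err == 0)
        err = -errno;
    return err;
}
```

For example, write_and_check("/proc/my/entry", "...") would return the module's late error as a negative errno instead of silently succeeding.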

file_operations question: how do I know if a process that opened a file for writing has decided to close it?

I'm currently writing a simple "multicaster" module.
Only one process can open a proc filesystem file for writing, and the rest can open it for reading.
To do so I use the inode_operations .permission callback: I check the operation, and when I detect someone opening the file for writing I set a flag ON.
I need a way to detect when a process that opened the file for writing closes it, so I can set the flag OFF and let someone else open it for writing.
Currently, when someone opens it for writing, I save the current->pid of that process, and when the .close callback is called I check whether the calling process is the one I saved earlier.
Is there a better way to do that? Without saving the pid, perhaps by checking the files the current process has opened and their permissions...
Thanks!

No, it's not safe. Consider a few scenarios:
Process A opens the file for writing, and then fork()s, creating process B. Now both A and B have the file open for writing. When Process A closes it, you set the flag to 0 but process B still has it open for writing.
Process A has multiple threads. Thread X opens the file for writing, but Thread Y closes it. Now the flag is stuck at 1. (Remember that ->pid in kernel space is actually the userspace thread ID).
Rather than doing things at the inode level, you should be doing things in the .open and .release methods of your file_operations struct.
Your inode's private data should contain a struct file *current_writer;, initialised to NULL. In the file_operations.open method, if it's being opened for write then check the current_writer; if it's NULL, set it to the struct file * being opened, otherwise fail the open with EPERM. In the file_operations.release method, check if the struct file * being released is equal to the inode's current_writer - if so, set current_writer back to NULL.
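A user-space model of this scheme may make the logic easier to check. It is a sketch under assumptions, not kernel code: struct file_model stands in for struct file (its writable field models FMODE_WRITE in f_mode), struct entry_priv for the inode's private data, a pthread mutex for i_mutex, and entry_open/entry_release are hypothetical names for the .open and .release methods.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-in for struct file; "writable" models FMODE_WRITE in f_mode. */
struct file_model {
    int writable;
};

/* Stand-in for the inode's private data; the mutex models i_mutex. */
struct entry_priv {
    pthread_mutex_t lock;
    struct file_model *current_writer; /* NULL when nobody is writing */
};

/* Model of the .open method: at most one open file may be the writer. */
static int entry_open(struct entry_priv *p, struct file_model *f)
{
    int ret = 0;

    assert(p != NULL && f != NULL);
    if (!f->writable)
        return 0; /* readers are always allowed */
    pthread_mutex_lock(&p->lock);
    if (p->current_writer == NULL)
        p->current_writer = f; /* this open file becomes the writer */
    else
        ret = -EPERM; /* another open file already holds write access */
    pthread_mutex_unlock(&p->lock);
    return ret;
}

/* Model of the .release method. The kernel calls .release once per struct
 * file, when its last reference is dropped, so a fork()ed child or another
 * thread sharing the same open file cannot clear the writer too early. */
static void entry_release(struct entry_priv *p, struct file_model *f)
{
    pthread_mutex_lock(&p->lock);
    if (p->current_writer == f)
        p->current_writer = NULL;
    pthread_mutex_unlock(&p->lock);
}
```

Keying on the open file rather than on a pid is what makes the fork() and multi-threaded scenarios above behave correctly.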
PS: Bandan is also correct that you need locking, but using the inode's existing i_mutex should suffice to protect current_writer.

I hope I understood your question correctly: when someone wants to write to your proc file, you set a variable called flag to 1 and also save current->pid in a global variable. Then, when any close() entry point is called, you compare current->pid of that close() call with your saved value. If they match, you turn flag off. Right?
Consider this situation: process A wants to write to your proc resource, so the permission callback runs. It sees that flag is 0, so it can set it to 1 for process A. But at that moment, the scheduler finds that process A has used up its time slice and chooses a different process to run (flag is still 0!). Some time later, process B comes along wanting to write to your proc resource as well, sees that flag is 0, sets it to 1, and goes about writing to the file. Unfortunately, at this moment process A gets scheduled again, and since it already saw flag as 0 (remember, before the scheduler preempted it, flag was 0), it sets it to 1 and goes about writing to the file. End result: the data in your proc resource gets corrupted.
You should use a proper locking mechanism provided by the kernel for this type of operation, and based on your requirements I think RCU is the best fit. Have a look at the RCU locking mechanism.
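The race described above comes from doing the check and the set as two separate steps. As an illustration only, swapping in a different technique from the RCU this answer recommends, here is a user-space C11 sketch that makes check-and-set a single atomic compare-and-swap, so exactly one of several concurrent contenders can claim the writer slot. The names try_claim, release_claim and contender are hypothetical; inside the kernel, a mutex or atomic_cmpxchg() could play the same role.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Writer slot: 0 = free, 1 = claimed. */
static atomic_int writer_flag = 0;

/* Check and set in one indivisible step: only one caller can move the
 * flag from 0 to 1, so preemption cannot split the check from the set. */
static int try_claim(void)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&writer_flag, &expected, 1);
}

/* Called when the writer is done (e.g. from the release path). */
static void release_claim(void)
{
    atomic_store(&writer_flag, 0);
}

/* Counts how many contenders believed they won the slot. */
static atomic_int winners = 0;

static void *contender(void *arg)
{
    (void)arg;
    if (try_claim())
        atomic_fetch_add(&winners, 1);
    return NULL;
}
```

However the threads interleave, the count of winners is always 1, whereas the separate "read flag, then set flag" version can let two contenders both succeed.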
