Problem with .release behavior in file_operations - linux-kernel

I'm dealing with a problem in a kernel module that gets data from userspace through a /proc entry.
I set the open/write/release entries for my own /proc entry, and I can use it to get data from userspace without problems.
I handle errors in the open/write functions, and they are visible to the user as open/fopen or write/fwrite/fprintf errors.
But some of the errors can only be checked at close time (because that is when all the data is available). In these cases I return a nonzero value, which I assumed would somehow become the value that close or fclose returns to the user.
But whatever value I return, close behaves as if everything were fine.
To be sure, I replaced all the release() code with a simple 'return(-1);' and wrote a program that opens/writes/closes the /proc entry and prints close's return value (and errno). It always returns 0, whatever value I give.
The behavior is the same with fclose, or when using the shell (echo "..." > /proc/my/entry).
Any clue about this strange behavior, which contradicts what many of the tutorials I found claim?
BTW, I'm using the RHEL5 kernel (2.6.18, Red Hat modified) on a 64-bit system.
Thanks.
Regards,
Yannick

The release() isn't allowed to cause the close() to fail.
You could require your userspace programs to call fsync() on the file descriptor before close(), if they want to find out about all possible errors; then implement your final error checking in the fsync() handler.
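As an illustration, here is a minimal sketch of that approach against the 2.6.18-era file_operations (the .fsync prototype changed in later kernels; my_validate_buffer is a hypothetical stand-in for the final whole-buffer check):

#include <linux/module.h>
#include <linux/fs.h>

/* Hypothetical final check over the accumulated data. */
static int my_validate_buffer(void)
{
    return 0; /* or a negative errno on failure */
}

static int my_fsync(struct file *file, struct dentry *dentry, int datasync)
{
    /* Unlike the return value of .release, this one does reach
     * userspace, as the result of the fsync(2) call. */
    if (my_validate_buffer() < 0)
        return -EINVAL;
    return 0;
}

static const struct file_operations my_proc_fops = {
    .owner = THIS_MODULE,
    /* .open, .write and .release as before */
    .fsync = my_fsync,
};

Userspace then calls fsync(fd) after the last write and before close(fd) to observe the deferred error.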

what is the purpose of the BeingDebugged flag in the PEB structure?

What is the purpose of this flag (from the OS side)?
Which functions use this flag except isDebuggerPresent?
Thanks a lot.
It's effectively the same, but reading the PEB doesn't require a trip through kernel mode.
More explicitly, the IsDebuggerPresent API is documented and stable; the PEB structure is not, and could, conceivably, change across versions.
Also, the IsDebuggerPresent API (or flag) only checks for user-mode debuggers; kernel debuggers aren't detected via this function.
Why put it in the PEB? It saves some time, which was more important in early versions of NT. (There are a bunch of user-mode functions that check this flag before doing some runtime validation, and will break to the debugger if set.)
If you change the PEB field to 0, then IsDebuggerPresent will also return 0, although I believe that CheckRemoteDebuggerPresent will not.
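For illustration, a hedged sketch of reading the flag directly (x64 MSVC only; the PEB layout comes from winternl.h, which is exactly the undocumented surface these answers warn about):

#include <windows.h>
#include <winternl.h>
#include <intrin.h>

BYTE ReadBeingDebugged(void)
{
    /* On x64 the GS segment base is the TEB; offset 0x60 holds the
     * PEB pointer (on x86 it is fs:[0x30]). */
    PEB *peb = (PEB *)__readgsqword(0x60);
    return peb->BeingDebugged;
}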
As you have found, IsDebuggerPresent reads this flag from the PEB. As far as I know the PEB structure is not an official API, but IsDebuggerPresent is, so you should stick to that layer.
The uses of this method are quite limited if you are after copy protection to prevent debugging of your app. As you have found, it is only a flag in your process space. If somebody debugs your application, all he needs to do is zero out the flag in the PEB and let your app run.
You can raise the bar by using CheckRemoteDebuggerPresent, where you pass in your own process handle to get an answer. This method goes into the kernel and checks for the existence of a special debug structure that is associated with your process if it is being debugged. A user-mode process cannot fake this one, but you know there are always ways around it, by simply removing your check ...
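A minimal sketch contrasting the two checks discussed in this thread:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    BOOL remote = FALSE;

    /* Reads PEB->BeingDebugged in user mode; trivial to patch out. */
    printf("IsDebuggerPresent: %d\n", IsDebuggerPresent());

    /* Asks the kernel whether a debug object is attached to the
     * process; harder to fake from user mode, as noted above. */
    if (CheckRemoteDebuggerPresent(GetCurrentProcess(), &remote))
        printf("CheckRemoteDebuggerPresent: %d\n", remote);

    return 0;
}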

IoGetDeviceObjectPointer() fails with no return status

This is my code:
UNICODE_STRING symbol;
WCHAR ntNameBuffer[128];

swprintf(ntNameBuffer, L"\\Device\\Harddisk1\\Partition1");
RtlInitUnicodeString(&symbol, ntNameBuffer);
KdPrint(("OSNVss: symbol is %ws\n", symbol.Buffer));

status = IoGetDeviceObjectPointer(&symbol,
                                  FILE_READ_DATA,
                                  &pDiskFileObject,
                                  &pDiskDeviceObject);
My driver is the next-lower-level driver beneath \\Device\\Harddisk1\\Partition1.
When I call IoGetDeviceObjectPointer, it fails without ever returning a status, and the remaining code is never executed.
When I debug this with WinDbg, it breaks in intelpm.sys.
If I change the object name to "\\Device\\Harddisk1\\Partition2" (a partition that really exists), the call succeeds.
If I change the object name to "\\Device\\Harddisk1\\Partition3" (a partition that does not exist), it fails with status 0xC0000034, meaning the object name does not exist.
Does anybody know why the call fails without returning a status when I use "\\Device\\Harddisk1\\Partition1"? Thanks very much!
First and foremost: what are you trying to achieve, and what driver model are you using? What bitness and which OS versions are targeted, and on which OS version does it fail? Furthermore: you are at the correct IRQL for the call and running inside a system thread, right? From which of your driver's entry points (IRP_MJ_*, DriverEntry, ...) are you calling this code?
Anyway, I was re-reading the docs on this function, noting in particular this part:
The IoGetDeviceObjectPointer routine returns a pointer to the top object in the named device object's stack and a pointer to the corresponding file object, if the requested access to the objects can be granted.
and:
IoGetDeviceObjectPointer establishes a "connection" between the caller and the next-lower-level driver. A successful caller can use the returned device object pointer to initialize its own device objects. It can also be used as an argument to IoAttachDeviceToDeviceStack, IoCallDriver, and any routine that creates IRPs for lower drivers. The returned pointer is a required argument to IoCallDriver.
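To make the quoted contract concrete, here is a hedged sketch of the usual call-and-release pattern (not the OP's code; the partition name is just an example):

#include <ntddk.h>

static NTSTATUS OpenPartitionStack(void)
{
    UNICODE_STRING name;
    PFILE_OBJECT fileObject;
    PDEVICE_OBJECT targetDevice;
    NTSTATUS status;

    RtlInitUnicodeString(&name, L"\\Device\\Harddisk1\\Partition2");
    status = IoGetDeviceObjectPointer(&name, FILE_READ_DATA,
                                      &fileObject, &targetDevice);
    if (NT_SUCCESS(status)) {
        /* targetDevice can now be passed to IoCallDriver or
         * IoAttachDeviceToDeviceStack, as the docs describe. */

        /* Dropping the file object reference also releases the
         * implicit reference held on the device object. */
        ObDereferenceObject(fileObject);
    }
    return status;
}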
You don't say, but if you are doing this on a 32-bit system, it may be worthwhile tracking down what's going on with IrpTracker. However, my guess is that said "connection", or rather the request for it, gets somehow swallowed by the next-lower-level driver or so.
It is also hard to say what kind of driver you are writing here (and yes, this can be important).
Try not just breaking at a particular point before or after the fact, but rather following the path the IRP takes down the target device object's stack.
But thinking about it, you probably aren't attached to the stack at all (for whatever reason). Could it be that you actually should be using IoGetDiskDeviceObject instead, in order to get the actual underlying device object (at the bottom of the stack) and not a reference to the top-level object attached?
Last but not least: don't forget you can also ask this question over on the OSR mailing lists. There are plenty of seasoned professionals there who may have run into the exact same problem (assuming you are doing all of the things correct that I asked about).
Thanks everyone, I solved this problem. The cause is that the call becomes synchronous: when I call IoGetDeviceObjectPointer, it generates a new IRP (IRP_MJ_WRITE) that is passed down from the top of the stack. When this IRP reaches my driver, the thread handling it is the same thread that called IoGetDeviceObjectPointer, so it becomes a drop-dead halt (a deadlock).
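For readers hitting the same wall: one common way to break such a self-deadlock is to issue the open from a system work item, so the generated IRP is not serviced on the calling thread. A hedged sketch using the documented work-item API (this is an assumption about a viable fix, not necessarily what the OP did):

typedef struct _OPEN_CTX {
    UNICODE_STRING Name;
    PFILE_OBJECT FileObject;
    PDEVICE_OBJECT TargetDevice;
    NTSTATUS Status;
    KEVENT Done;
} OPEN_CTX;

static VOID OpenWorker(PDEVICE_OBJECT DeviceObject, PVOID Context)
{
    OPEN_CTX *ctx = (OPEN_CTX *)Context;

    UNREFERENCED_PARAMETER(DeviceObject);
    /* Runs on a system worker thread, so the IRP generated by the
     * open below is not serviced on the thread that queued us. */
    ctx->Status = IoGetDeviceObjectPointer(&ctx->Name, FILE_READ_DATA,
                                           &ctx->FileObject,
                                           &ctx->TargetDevice);
    KeSetEvent(&ctx->Done, IO_NO_INCREMENT, FALSE);
}

The caller initialises Done with KeInitializeEvent, queues OpenWorker via IoAllocateWorkItem/IoQueueWorkItem, and waits with KeWaitForSingleObject; that wait is only safe if the waiting thread itself is not needed to complete the generated IRP.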

swapoff from inside kernel code

kernel newbie here...
I'm trying to do a swapoff from inside kernel code (on a swap device at a known location, suitable for hardcoding). I found the syscall sys_swapoff, which looks fairly straightforward, so I tried just doing:
sys_swapoff("/path/to/swap/device");
but that doesn't work (it returns error -14, i.e. -EFAULT). Using ghetto-style debugging via printk, I've determined that it's erroring out on this code block in sys_swapoff:
pathname = getname(specialfile);
err = PTR_ERR(pathname);
if (IS_ERR(pathname))
    goto out;
So apparently it doesn't like something about the pathname I'm giving it. I thought maybe it was because I was passing it a string literal instead of an allocated buffer, so I tried kmallocing a buffer, strcpying the path into it, and passing it that, but that made no difference. What am I doing wrong? Is there a better way to do a swapoff from inside kernel code other than using the syscall?
Are you specifying the path to the specific numbered partition (e.g. sda1 vs sda) as part of your path? Can you provide the specific value you used?
Actually, if you're trying to do this inside kernel code: sys_swapoff expects the parameter passed in to come from userspace (getname() copies the string with userspace-access checks, which is exactly what fails with -EFAULT for a kernel pointer), so you probably need to decompose sys_swapoff and do some of that work yourself in your code.
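Under that assumption, a hedged sketch of the classic workaround on 2.6-era kernels (set_fs/KERNEL_DS no longer exist in modern kernels, and sys_swapoff must be reachable from your code, e.g. when it is built into the kernel):

#include <linux/syscalls.h>
#include <asm/uaccess.h>

static long swapoff_from_kernel(const char *path)
{
    mm_segment_t old_fs = get_fs();
    long err;

    /* Widen the address limit so getname()'s userspace-copy checks
     * accept a kernel pointer. */
    set_fs(KERNEL_DS);
    err = sys_swapoff((const char __user *)path);
    set_fs(old_fs);
    return err;
}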
There's existing code in the TuxOnIce patch set (widely used for hibernation support in various Linux distros) that enables / disables swap when needed, i.e. when creating a hibernation image and/or resuming.
What this code does (check kernel/power/tuxonice_swap.c in the patch sources) is exactly the same as you do - sys_swapoff(swapfilename); - and it's functional. So there's nothing wrong with the invocation from kernel space as such.
How sure are you about your device pathname? Have you instrumented sys_swapon() and sys_swapoff() so that they print what is actually passed when you issue swapon/swapoff commands manually on the command line? udev & friends and/or the use of an initramfs sometimes result in device pathnames that aren't universally valid.
Edit:
Since you just stated in a comment that you're attempting /dev/block/...: that might well be your cause; that path is an artifact, and just after boot these device nodes exist directly in /dev/ (e.g. /dev/mmcblk0 only becomes /dev/block/mmcblk0 later). Try /dev/zram0 and see what happens.

Some Windows API calls fail unless the string arguments are in the system memory rather than local stack

We have a massive older C++ application that we have been converting to support Unicode as well as 64 bits. The following strange thing has been happening:
Calls to registry functions and windows creation functions, like the following, have been failing:
hWnd = CreateSysWindowExW(ExStyle, ClassNameW.StringW(), Label2.StringW(), Style,
                          Posn.X(), Posn.Y(),
                          Size.X(), Size.Y(),
                          hParentWnd, (HMENU)Id,
                          AppInstance(), NULL);
ClassNameW and Label2 are instances of our own Text class which essentially uses malloc to allocate the memory used to store the string.
Anyway, when the functions fail and I call GetLastError, it returns the error code for "invalid memory access" (though I can inspect the string arguments fine in the debugger). Yet if I change the code as follows, it works perfectly:
BSTR Label2S = SysAllocString(Label2.StringW());
BSTR ClassNameWS = SysAllocString(ClassNameW.StringW());

hWnd = CreateSysWindowExW(ExStyle, ClassNameWS, Label2S, Style,
                          Posn.X(), Posn.Y(),
                          Size.X(), Size.Y(),
                          hParentWnd, (HMENU)Id,
                          AppInstance(), NULL);

SysFreeString(ClassNameWS); ClassNameWS = 0;
SysFreeString(Label2S); Label2S = 0;
So what gives? Why would the original functions work fine with the arguments in local memory, but the registry functions require SysAllocString once we use Unicode, and the window creation functions also require SysAllocString'd string arguments once we build for 64-bit? Our window procedures have all been converted to be Unicode, always, and yes, we use SetWindowLongW and call the correct default Unicode DefWindowProcW, etc. That all seems to work fine and handles and draws Unicode properly.
The documentation at http://msdn.microsoft.com/en-us/library/ms632679%28v=vs.85%29.aspx does not say anything about this. While our application is massive, we do use debug heaps and tools like Purify to check for and clean up any memory corruption. Also, at the time of this failure there is still only one main system thread, so it is not a threading issue.
So what is going on? I have read that if string arguments are marshalled anywhere or passed across process boundaries, then you have to use SysAllocString/BSTR, yet we call lots of API functions, and there is lots of code out there that calls these functions with plain local strings.
What am I missing? I have tried Googling this, as someone else must have run into this, but with little luck.
Edit 1: Our StringW function does not create any temporary objects which might go out of scope before the actual API call. The function is as follows:
class Text {
public:
    const wchar_t* StringW() const
    {
        return TextStartW;
    }

private:
    wchar_t* TextStartW; // pointer to the current start of text in DataArea
};
I have been running our application with the debug heap, memory checking, and other diagnostic tools, and found no source of memory corruption; looking at the assembly, there is no sign of temporary objects or invalid memory access.
BUT I finally figured it out:
We compile our code with /Zp1, which means byte-aligned memory layout. SysAllocString (in 64-bit) always returns a pointer aligned on an 8-byte boundary. Presumably a 32-bit ANSI C++ application goes through an API conversion layer to the underlying Unicode Windows DLLs, which would also align the pointer for you.
But if you use Unicode directly, you do not get the incidental pointer alignment that the conversion layer gives you, and if you build for 64 bits, the situation of course gets even worse.
I added a method to our Text class that shifts the string pointer so that it is aligned on an 8-byte boundary, and voilà, everything runs fine!
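A hypothetical sketch of such a shim (the actual Text method is not shown in the thread): over-allocate and round the pointer up to the next 8-byte boundary.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Returns an 8-byte-aligned copy of src; the caller must keep the
 * raw block pointer (raw_out) around to free() it later. */
wchar_t *AlignedCopyW(const wchar_t *src, void **raw_out)
{
    size_t bytes = (wcslen(src) + 1) * sizeof(wchar_t);
    unsigned char *raw = malloc(bytes + 7); /* 7 spare bytes guarantee a fit */
    if (!raw)
        return NULL;
    wchar_t *aligned = (wchar_t *)(((uintptr_t)raw + 7) & ~(uintptr_t)7);
    memcpy(aligned, src, bytes);
    *raw_out = raw;
    return aligned;
}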
Of course the Microsoft people say it must be memory corruption and I am jumping to the wrong conclusion, but there is evidence that this is not the case.
Also, if you use /Zp1 and include windows.h in a 64-bit application, the debugger will tell you sizeof(BITMAP) == 28, but calling GetObject on a bitmap will fail and tell you it needs a 32-byte structure. So I suspect that some of Microsoft's API is inherently dependent on aligned pointers, and I also know that some optimized assembly (I have seen some from Fortran compilers) takes advantage of alignment and crashes badly if you ever give it unaligned pointers.
So the moral of all of this is: don't use "funky" compiler arguments like /Zp1. In our case we have to for historical reasons, but the number of times this has bitten us...
Someone please give me a "this is useful" tick on my answer please?
Using a bit of psychic debugging, I'm going to guess that the strings in your application are pooled in a read-only section.
It's possible that CreateSysWindowExW is attempting to write to the memory passed in for the window class or title. That would explain why the calls work when the strings are allocated on the heap (SysAllocString) but not when used as constants.
The easiest way to investigate this is to use a low level debugger like windbg - it should break into the debugger at the point where the access violation occurs which should help figure out the problem. Don't use Visual Studio, it has a nasty habit of being helpful and hiding first chance exceptions.
Another thing to try is to enable appverifier on your application - it's possible that it may show something.
Calling a Windows API function does not cross the process boundary, since the various Windows DLLs are loaded into your process.
It sounds like whatever pointer StringW() is returning isn't valid when Windows tries to access it. I would look there: is it possible that the returned pointer goes out of scope and is deleted shortly after the call?
If you share some more details about your string class, that could help diagnose the problem here.

file_operations question: how do I know if a process that opened a file for writing has decided to close it?

I'm currently writing a simple "multicaster" module.
Only one process can open a proc filesystem file for writing, and the rest can open it for reading.
To do so I use the inode_operations .permission callback: I check the operation, and when I detect that someone is opening the file for writing, I set a flag ON.
I need a way to detect that a process which opened the file for writing has decided to close it, so I can set the flag OFF and someone else can open it for writing.
Currently, when someone opens for writing, I save the current->pid of that process, and when the close callback is called I check whether that process is the one I saved earlier.
Is there a better way to do that? Without saving the pid, perhaps by checking the files the current process has opened and their permissions...
Thanks!
No, it's not safe. Consider a few scenarios:
Process A opens the file for writing, and then fork()s, creating process B. Now both A and B have the file open for writing. When Process A closes it, you set the flag to 0 but process B still has it open for writing.
Process A has multiple threads. Thread X opens the file for writing, but Thread Y closes it. Now the flag is stuck at 1. (Remember that ->pid in kernel space is actually the userspace thread ID).
Rather than doing things at the inode level, you should be doing things in the .open and .release methods of your file_operations struct.
Your inode's private data should contain a struct file *current_writer;, initialised to NULL. In the file_operations.open method, if it's being opened for write then check the current_writer; if it's NULL, set it to the struct file * being opened, otherwise fail the open with EPERM. In the file_operations.release method, check if the struct file * being released is equal to the inode's current_writer - if so, set current_writer back to NULL.
PS: Bandan is also correct that you need locking, but using the inode's existing i_mutex should suffice to protect current_writer.
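A minimal sketch of the scheme described above (2.6-era locking: the inode lock is i_mutex, as noted; later kernels use inode_lock()):

#include <linux/fs.h>
#include <linux/mutex.h>

static struct file *current_writer; /* protected by the inode's i_mutex */

static int mcast_open(struct inode *inode, struct file *file)
{
    int ret = 0;

    if (file->f_mode & FMODE_WRITE) {
        mutex_lock(&inode->i_mutex);
        if (current_writer)
            ret = -EPERM;          /* somebody already holds the write side */
        else
            current_writer = file;
        mutex_unlock(&inode->i_mutex);
    }
    return ret;
}

static int mcast_release(struct inode *inode, struct file *file)
{
    mutex_lock(&inode->i_mutex);
    if (current_writer == file)    /* only the writer's last close clears it */
        current_writer = NULL;
    mutex_unlock(&inode->i_mutex);
    return 0;
}

Because .release only fires when the last reference to a struct file goes away, this also behaves correctly in the fork() and thread scenarios described above.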
I hope I understood your question correctly: when someone wants to write to your proc file, you set a variable called flag to 1 and also save current->pid in a global variable. Then, when any close() entry point is called, you check the current->pid of the close() instance and compare it with your saved value. If they match, you turn the flag off. Right?
Consider this situation: process A wants to write to your proc resource, so you check the permission callback. You see that the flag is 0, so you can set it to 1 for process A. But at that moment the scheduler finds that process A has used up its time slice and chooses a different process to run (the flag is still 0!). After some time, process B comes along wanting to write to your proc resource too, checks that the flag is 0, sets it to 1, and goes about writing to the file. Unfortunately, at this moment process A gets scheduled to run again, and since it thinks the flag is 0 (remember, before the scheduler preempted it, the flag was 0), it sets the flag to 1 and goes about writing to the file. End result: the data in your proc resource gets corrupted.
You should use a proper locking mechanism provided by the kernel for this type of operation; based on your requirements, I think RCU is best. Have a look at the RCU locking mechanism.
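The answer above suggests RCU; for a single on/off ownership flag like this one, a plain atomic test-and-set is a simpler fit and closes exactly the check-then-set window from the scenario above. A hedged sketch:

#include <linux/bitops.h>
#include <linux/errno.h>

static unsigned long writer_flag; /* bit 0 == "a writer is present" */

static int try_claim_writer(void)
{
    /* Atomic: only one caller can ever observe the 0 -> 1 transition. */
    if (test_and_set_bit(0, &writer_flag))
        return -EBUSY;
    return 0;
}

static void release_writer(void)
{
    clear_bit(0, &writer_flag);
}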
