drmDropMaster requires root privileges? - linux-kernel

Pardon for the long introduction, but I haven't seen any other questions for this on SO.
I'm playing with DRM (Direct Rendering Manager, a wrapper for Linux kernel mode setting) and I'm having difficulty understanding a part of its design.
Basically, I can open a graphics card device in my virtual terminal, set up frame buffers, and change a connector and its CRTC just fine. This results in me being able to render to the VT in a lightweight graphics mode without the need for an X server (that's what KMS is about, and in fact the X server uses it underneath).
Then I wanted to implement graceful VT switching, so that when I hit Ctrl+Alt+F3 etc., I can see my other consoles. Turns out it's easy to do by calling ioctl() with the constants from linux/vt.h and handling a couple of user signals.
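For the curious, here is a rough sketch of that setup (the helper name and the SIGUSR1/SIGUSR2 choice are arbitrary; the controlling tty is assumed to be known):

#include <fcntl.h>
#include <signal.h>
#include <sys/ioctl.h>
#include <linux/vt.h>

/* Put the terminal into VT_PROCESS mode so the kernel asks us (via
 * signals) before switching VTs instead of switching behind our back. */
static int vt_fd;

static void on_release(int sig)          /* asked to give up the VT */
{
    /* stop rendering here, then acknowledge the switch */
    ioctl(vt_fd, VT_RELDISP, 1);
}

static void on_acquire(int sig)          /* the VT is handed back to us */
{
    ioctl(vt_fd, VT_RELDISP, VT_ACKACQ);
    /* restore our mode and resume rendering here */
}

int setup_vt_switching(const char *tty_path)
{
    struct vt_mode mode = { 0 };

    vt_fd = open(tty_path, O_RDWR);
    if (vt_fd < 0)
        return -1;

    signal(SIGUSR1, on_release);
    signal(SIGUSR2, on_acquire);

    mode.mode   = VT_PROCESS;
    mode.relsig = SIGUSR1;
    mode.acqsig = SIGUSR2;
    return ioctl(vt_fd, VT_SETMODE, &mode);
}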
But then I tried to switch from my graphics program to a running X server. Bzzt! It didn't work: the X server didn't draw anything at all. After some digging I found that in the Linux kernel, only one program at a time can do kernel mode setting. So what happens is this:
I switch from X to a virtual terminal
I run my program
The program enters graphics mode with drmOpen, drmModeSetCRTC, etc. (a sketch of this sequence follows the list)
I switch back to X
X no longer has the privileges to restore its own mode.
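For reference, a minimal sketch of the modesetting sequence from step 3, assuming a framebuffer has already been created (dumb buffer plus drmModeAddFB, elided here) and that at least one connector is connected:

#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Point the first CRTC at an existing framebuffer, using the first
 * connected connector's first mode.  Error handling is minimal. */
int enter_graphics_mode(int fd, uint32_t fb_id)
{
    drmModeRes *res = drmModeGetResources(fd);
    if (!res || res->count_connectors < 1 || res->count_crtcs < 1)
        return -1;

    drmModeConnector *conn = drmModeGetConnector(fd, res->connectors[0]);
    if (!conn || conn->connection != DRM_MODE_CONNECTED || conn->count_modes < 1)
        return -1;

    /* This only succeeds if we are the current DRM master. */
    return drmModeSetCrtc(fd, res->crtcs[0], fb_id, 0, 0,
                          &conn->connector_id, 1, &conn->modes[0]);
}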
Then I found this in the Wayland source code: drmDropMaster() and drmSetMaster(). These functions are supposed to release and regain the privilege to set modes, so that the X server can continue to work, and after switching back to my program, it can take over again.
Finally the real question.
These functions require root privileges. This is the part I don't understand. I can mess with kernel modes, but I can't say "okay X11, I'm done playing, I'm giving you the access now"? Why? Or should this work in theory, and I'm just doing something wrong in my code (e.g. working with the wrong file descriptors, or whatever)?
If I try to run my program as a normal user, I get "permission denied". If I run it as root, it works fine - I can switch from X to my program and vice versa.
Why?

Yes, drmSetMaster and drmDropMaster require root privileges because they allow you to do mode setting. Otherwise, any random application could display whatever it wanted to your screen. weston handles this through a setuid launcher program. The systemd people also added functionality to systemd-logind (which runs as root) to do the drm{Set,Drop}Master calls for you. This is what enables recent X servers to run without root privileges. You could look into this if you don't mind depending on systemd.
Your post seems to suggest that you can successfully call drmModeSetCRTC without root privileges. This doesn't make sense to me. Are you sure?
It is up to display servers like X, Weston, and whatever you're working on to call drmDropMaster before invoking the VT_RELDISP ioctl, so that the next session can call drmSetMaster successfully.
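In terms of the VT-switch signal handlers from the question, that ordering looks roughly like this (drm_fd and vt_fd are assumed to be the descriptors opened at startup):

#include <sys/ioctl.h>
#include <linux/vt.h>
#include <xf86drm.h>

extern int drm_fd, vt_fd;               /* opened during initialization */

static void on_release(int sig)
{
    drmDropMaster(drm_fd);              /* let the next session become master */
    ioctl(vt_fd, VT_RELDISP, 1);        /* then acknowledge the VT switch */
}

static void on_acquire(int sig)
{
    ioctl(vt_fd, VT_RELDISP, VT_ACKACQ);
    drmSetMaster(drm_fd);               /* reclaim master status */
    /* drmModeSetCrtc(...) here to restore our framebuffer */
}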

Before digging into why it doesn't work, I had to understand how it works.
So, calling drmModeSetCRTC or drmSetMaster in libdrm really just issues an ioctl:
include/xf86drm.c
int drmSetMaster(int fd)
{
    return ioctl(fd, DRM_IOCTL_SET_MASTER, 0);
}
This is handled by the kernel. In my program the most important functions that control the display are drmModeSetCRTC and drmModeAddFB; the rest is really just diagnostics. So let's see how they're handled by the kernel. Turns out there is a big table that maps ioctl commands to their handlers:
drivers/gpu/drm/drm_ioctl.c
static const struct drm_ioctl_desc drm_ioctls[] = {
    ...
    DRM_IOCTL_DEF(DRM_IOCTL_MODE_GETCRTC, drm_mode_getcrtc, DRM_CONTROL_ALLOW|DRM_UNLOCKED),
    DRM_IOCTL_DEF(DRM_IOCTL_MODE_SETCRTC, drm_mode_setcrtc, DRM_MASTER|DRM_CONTROL_ALLOW|DRM_UNLOCKED),
    ...
    DRM_IOCTL_DEF(DRM_IOCTL_MODE_ADDFB, drm_mode_addfb, DRM_CONTROL_ALLOW|DRM_UNLOCKED),
    DRM_IOCTL_DEF(DRM_IOCTL_MODE_ADDFB2, drm_mode_addfb2, DRM_CONTROL_ALLOW|DRM_UNLOCKED),
    ...
};
This table is used by drm_ioctl(), whose most interesting part is the call to drm_ioctl_permit().
drivers/gpu/drm/drm_ioctl.c
long drm_ioctl(struct file *filp,
               unsigned int cmd, unsigned long arg)
{
    ...
    retcode = drm_ioctl_permit(ioctl->flags, file_priv);
    if (unlikely(retcode))
        goto err_i1;
    ...
}

static int drm_ioctl_permit(u32 flags, struct drm_file *file_priv)
{
    /* ROOT_ONLY is only for CAP_SYS_ADMIN */
    if (unlikely((flags & DRM_ROOT_ONLY) && !capable(CAP_SYS_ADMIN)))
        return -EACCES;

    /* AUTH is only for authenticated or render client */
    if (unlikely((flags & DRM_AUTH) && !drm_is_render_client(file_priv) &&
                 !file_priv->authenticated))
        return -EACCES;

    /* MASTER is only for master or control clients */
    if (unlikely((flags & DRM_MASTER) && !file_priv->is_master &&
                 !drm_is_control_client(file_priv)))
        return -EACCES;

    /* Control clients must be explicitly allowed */
    if (unlikely(!(flags & DRM_CONTROL_ALLOW) &&
                 drm_is_control_client(file_priv)))
        return -EACCES;

    /* Render clients must be explicitly allowed */
    if (unlikely(!(flags & DRM_RENDER_ALLOW) &&
                 drm_is_render_client(file_priv)))
        return -EACCES;

    return 0;
}
Everything makes sense so far. I can indeed call drmModeSetCrtc because I am the current DRM master. (I'm not sure why; it might be that X properly waives its rights when I switch to another VT, and that alone lets me automatically become the new DRM master once I start issuing ioctls.)
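One way to check that guess is to probe master status directly. This is only a sketch: drmIsMaster() exists only in newer libdrm releases (roughly 2.4.98 onwards), and on older versions the usual hint is a DRM_MASTER-flagged ioctl failing with EACCES:

#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <xf86drm.h>

int main(void)
{
    int fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);
    if (fd < 0)
        return 1;

    /* true if this fd became (implicit) DRM master on open */
    printf("implicit master: %s\n", drmIsMaster(fd) ? "yes" : "no");

    close(fd);
    return 0;
}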
Anyway, let's take a look at how the DRM_IOCTL_SET_MASTER and DRM_IOCTL_DROP_MASTER entries behind drmSetMaster and drmDropMaster look in the same table:
drivers/gpu/drm/drm_ioctl.c
static const struct drm_ioctl_desc drm_ioctls[] = {
    ...
    DRM_IOCTL_DEF(DRM_IOCTL_SET_MASTER, drm_setmaster_ioctl, DRM_ROOT_ONLY),
    DRM_IOCTL_DEF(DRM_IOCTL_DROP_MASTER, drm_dropmaster_ioctl, DRM_ROOT_ONLY),
    ...
};
What.
So my confusion was justified. I'm not doing anything wrong; things really are this way.
I'm under the impression that this is a serious kernel bug. Either I shouldn't be able to set the CRTC at all, or I should be able to drop/set master. In any case, revoking every non-root program's right to draw to the screen because
any random application could display whatever it wanted to your screen
is too aggressive. I, as the user, should have the freedom to control that without giving root access to the whole program and without depending on systemd, for example by chmod 0777 /dev/dri/card0 (or group management). As it stands, it looks to me like a lazy man's answer to proper permission management.

Thanks for writing this up. This is indeed the expected outcome; you don't need to look for a subtle bug in your code.
It's definitely intended that you can become the master implicitly. A dev wrote example code as initial documentation for DRM, and it does not use SetMaster. And there is a comment in the source code (now drm_auth.c) "successfully became the device master (either through the SET_MASTER IOCTL, or implicitly through opening the primary device node when no one else is the current master that time)".
DRM_ROOT_ONLY is commented as
/**
* #DRM_ROOT_ONLY:
*
* Anything that could potentially wreak a master file descriptor needs
* to have this flag set. Current that's only for the SETMASTER and
* DROPMASTER ioctl, which e.g. logind can call to force a non-behaving
* master (display compositor) into compliance.
*
* This is equivalent to callers with the SYSADMIN capability.
*/
The above requires some clarification IMO. The way logind forces a non-behaving master is not simply by calling SETMASTER for a different master - that would actually fail. First, it must call DROPMASTER on the non-behaving master. So logind is relying on this permission check, to make sure the non-behaving master cannot then race logind and call SETMASTER first.
Equally logind is assuming the unprivileged user doesn't have permission to open the device node directly. I would suspect the ability to implicitly become master on open() is some form of backwards compatibility.
Notice, if you could drop your master, you couldn't use SETMASTER to get it back. This means the point of doing so is rather limited - you can't use it to implement the traditional switching back and forth between multiple graphics servers.
There is a way you can drop the master and get it back: close the fd, and re-open it when needed. It sounds to me like this would match how old-style X (pre-DRM?) worked - wasn't it possible to switch between multiple instances of the X server, and each of them would have to completely take over the hardware? So you always had to start from scratch after a VT switch. This is not as good as being able to switch masters though; logind says
/* On DRM devices we simply drop DRM-Master but keep it open.
* This allows the user to keep resources allocated. The
* CAP_SYS_ADMIN restriction to DRM-Master prevents users from
* circumventing this. */
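For completeness, the close-and-reopen fallback mentioned above looks roughly like this (device path and state handling are heavily simplified):

#include <fcntl.h>
#include <unistd.h>

static int drm_fd = -1;

static void give_up_device(void)
{
    close(drm_fd);      /* master goes away with the fd, together with
                           every buffer and mode object we had created */
    drm_fd = -1;
}

static int take_device_back(const char *node)
{
    drm_fd = open(node, O_RDWR | O_CLOEXEC);
    return drm_fd;      /* we become implicit master again only if
                           nobody else holds master at this moment */
}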

As of Linux 5.8, drmDropMaster() no longer requires root privileges.
The relevant commit is 45bc3d26c: drm: rework SET_MASTER and DROP_MASTER perm handling.
The source code comments provide a good summary of the old and new situation:
In the olden days the SET/DROP_MASTER ioctls used to return EACCES when
CAP_SYS_ADMIN was not set. This was used to prevent rogue applications
from becoming master and/or failing to release it.
At the same time, the first client (for a given VT) is always master.
Thus in order for the ioctls to succeed, one had to explicitly run the
application as root or flip the setuid bit.
If the CAP_SYS_ADMIN was missing, no other client could become master...
EVER :-( Leading to a) the graphics session dying badly or b) a completely
locked session.
...
Here we implement the next best thing:
- ensure the logind style of fd passing works unchanged, and
- allow a client to drop/set master, iff it is/was master at a given point in time.
...

Related

MFC: Conflicting information on how to clean up after using COleDataSource::DoDragDrop()

Can someone clear up all the conflicting information?
The documentation for COleDataSource::CacheData() says:
After the call to CacheData the ptd member of lpFormatEtc and the
contents of lpStgMedium are owned by the data object, not by the
caller.
The documentation for COleDataSource::CacheGlobalData() doesn't say that.
You find code examples of how to use COleDataSource::DoDragDrop() in places like Code Project that call COleDataSource::CacheGlobalData(), then check the DoDragDrop() result and free the memory if it didn't drop:
DROPEFFECT dweffect = datasrc->DoDragDrop(DROPEFFECT_COPY);

// They say if the operation wasn't accepted, or was canceled, we
// should call GlobalFree() to clean up.
if (dweffect == DROPEFFECT_NONE) {
    GlobalFree(hgdrop);
}
You also have Q182219 (which, for some unknown reason, MS has broken along with all the good old MSKB links, so you can't find that information anymore; links all over the Internet are dead). Q182219 says something about how, after a successful move operation on an NT-based OS (e.g. Windows 10), DoDragDrop() will return DROPEFFECT_NONE, so you have to check whether things really moved.
So the questions are:
1) Should you really free the memory if the drop operation returns DROPEFFECT_NONE or would MFC handle all that?
2) Does it still return DROPEFFECT_NONE after a successful move operation, or does MFC handle that internally (or was it fixed in some Windows version)?
3) If it does return DROPEFFECT_NONE after a move, wouldn't logic like the above be a double free on memory if it was a move operation?
Extra:
You also find examples of people declaring COleDataSource mydatasource on the stack, which is WRONG. You have to allocate it like COleDataSource *mydatasource = new COleDataSource() and then, at the end, use mydatasource->ExternalRelease() to release it. (ExternalRelease() handles calling InternalRelease() if the object isn't using aggregation, i.e. a derived class that acts as a wrapper.)
TIA!!

int instruction from user space

I was under the impression that the "int" instruction on x86 is not privileged. So, I thought we should be able to execute this instruction from a user space application. But that does not seem to be the case.
I am trying to execute int from a user application on Windows. I know it may not be right to do so, but I wanted to have some fun. Windows is killing my application, though.
I think the issue is due to the condition CPL <= IOPL. Does anyone know how to get around it?
Generally the old dispatcher mechanism for user mode code to transition to kernel mode in order to invoke kernel services was implemented by int 2Eh (now replaced by sysenter). Also int 3 is still reserved for breakpoints to this very day.
Basically the kernel sets up traps for certain interrupts (I don't remember whether for all of them, though), and depending on the trap code it will perform some service for the user-mode invoker; if that is not possible, your application gets killed because it attempted a privileged operation.
Details would anyway depend on the exact interrupt you were trying to invoke. The functions DbgBreakPoint (ntdll.dll) and DebugBreak (kernel32.dll) do nothing other than invoking int 3 (or actually the specific opcode int3), for example.
Edit 1: On newer Windows versions (XP SP2 and newer, IIRC) sysenter replaces int 2Eh, as I wrote in my answer. One possible reason why it gets terminated - although you should be able to catch this via exception handling - is that you don't pass the parameters it expects on the stack. Basically the usermode part of the native API places the parameters for the system service you call onto the stack, then places the number of the service (an index into the system service dispatcher table - SSDT, sometimes SDT) into a specific register, and then calls sysenter on newer systems and int 2Eh on older systems.
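As a small illustration of the "catch it via exception handling" point, the following MSVC-specific sketch triggers int 3 from user mode and handles the resulting exception instead of letting it kill the process:

#include <windows.h>
#include <intrin.h>
#include <stdio.h>

int main(void)
{
    __try {
        __debugbreak();   /* emits the one-byte int3 opcode (0xCC) */
    }
    __except (GetExceptionCode() == EXCEPTION_BREAKPOINT
              ? EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH) {
        printf("caught breakpoint exception in user mode\n");
    }
    return 0;
}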
The minimum ring level of a given interrupt vector (which decides whether a given "int" is privileged) is based on the ring-level descriptor associated with the vector in the interrupt descriptor table.
In Windows the majority of interrupt vectors are privileged. This prevents user mode from merely calling the double-fault handler to immediately bugcheck the OS.
There are some non-privileged interrupts in Windows. Specifically:
int 1 (both the CD 01 encoding and the debug interrupt that occurs after a single-stepped instruction when EFLAGS.TF is set)
int 3 (both the CC and CD 03 encodings)
int 2E (Windows system call)
All other interrupt vectors are privileged, and invoking them from user mode raises a general-protection fault instead.
INT is a 'privilege controlled' instruction. It has to be this way for the kernel to protect itself from usermode. INT goes through the exact same trap vectors that hardware interrupts and processor exceptions go through, so if usermode could arbitrarily trigger these exceptions, the interrupt dispatching code would get confused.
If you want to trigger an interrupt on a particular vector that's not already set up by Windows, you have to modify the IDT entry for that interrupt vector with a debugger or a kernel driver. Patchguard won't let you do this from a driver on x64 versions of Windows.

Accessing a block device from a kernel module

I am interested in developing a kernel module that binds two block devices into a new block device, in such a manner that the first block device contains the data at mount time and the other is considered empty. Every write goes to the second device, so on the next mount the base filesystem remains unchanged. I know of solutions like UnionFS, but those are filesystem-based, while I want to do this a layer lower, block-based.
Can anyone tell me how I could open and read/write a block device from a kernel module? Preferably without using a userspace program for reading/writing the merged block devices. I found a similar topic here, but the answer was rather unsatisfying, because the filp_* functions are meant for reading small config files, not for (large) block device I/O.
Since the interface for creating block devices is standardized, I was thinking of direct (or almost direct) access to the functions implementing the source devices, as I will be required to export similar functions anyway. If I could do that, I would simply create some proxy functions that call the appropriate functions on the source devices. Can I somehow obtain a pointer to a gendisk structure that belongs to a different driver?
This serves only my own purposes (satisfying curiosity being the main one), so I am not worried about seriously messing up my kernel.
Or does somebody know if a module like that already exists?
The source code of the device mapper driver will suit your needs. Look at the code in the Linux source tree under drivers/md/dm-*.
You don't need to access the other device's gendisk structure, but rather its request queue. You can prepare I/O requests and push them down the other device's queue, and it will do the rest itself.
I have implemented a simple block device that opens another block device. Take a look at my post describing it:
stackbd: Stacking a block device over another block device
Here are some examples of functions that you need for accessing another device's gendisk.
The way to open another block device using its path ("/dev/"):
struct block_device *bdev_raw = lookup_bdev(dev_path);
printk("Opened %s\n", dev_path);

if (IS_ERR(bdev_raw))
{
    printk("stackbd: error opening raw device <%lu>\n", PTR_ERR(bdev_raw));
    return NULL;
}

if (!bdget(bdev_raw->bd_dev))
{
    printk("stackbd: error bdget()\n");
    return NULL;
}

if (blkdev_get(bdev_raw, STACKBD_BDEV_MODE, &stackbd))
{
    printk("stackbd: error blkdev_get()\n");
    bdput(bdev_raw);
    return NULL;
}
The simplest example of passing an I/O request from one device to another is by remapping it without modifying it. Notice in the following code that the bi_bdev entry is modified with a different device. One can also modify the block address (*bi_sector) and the data itself.
static void stackbd_io_fn(struct bio *bio)
{
    bio->bi_bdev = stackbd.bdev_raw;

    trace_block_bio_remap(bdev_get_queue(stackbd.bdev_raw), bio,
                          bio->bi_bdev->bd_dev, bio->bi_sector);

    /* No need to call bio_endio() */
    generic_make_request(bio);
}
Consider examining the code for the dm / md block devices in drivers/md - these existing drivers create a block device that stores data on other block devices.
In fact, you could probably implement your idea as another "RAID personality" in md, and thereby make use of the existing userspace tools for setting up the devices.
You know, if you're writing a GPL'd kernel module you can just call open(), read(), write(), etc. from kernel mode, right?
Of course this way has certain caveats, including requiring forking from kernel mode to create a space for your handle to live.
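A rough sketch of that approach, using the in-kernel VFS helpers rather than the literal syscalls (kernel_read() has this signature on reasonably recent kernels, roughly 4.14 onwards; older kernels use vfs_read() with set_fs()):

#include <linux/fs.h>
#include <linux/err.h>

/* Open a block device node by path and read its first 512 bytes
 * through the VFS.  Error handling is reduced to the bare minimum. */
static int read_first_sector(const char *path, char *buf)
{
    struct file *f;
    loff_t pos = 0;
    ssize_t n;

    f = filp_open(path, O_RDONLY, 0);
    if (IS_ERR(f))
        return PTR_ERR(f);

    n = kernel_read(f, buf, 512, &pos);
    filp_close(f, NULL);

    return n < 0 ? n : 0;
}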

Problem with .release behavior in file_operations

I'm dealing with a problem in a kernel module that gets data from userspace using a /proc entry.
I set the open/write/release entries for my own /proc entry, and I can use it to get data from userspace just fine.
I handle errors in the open/write functions, and they are visible to the user as open/fopen or write/fwrite/fprintf errors.
But some of the errors can only be checked at close time (because that's when all the data is available). In those cases I return something different from 0, which I assumed would in some way become the value that 'close' or 'fclose' returns to the user.
But whatever value I return, my close behaves as if everything were fine.
To be sure, I replaced all the release() code with a simple 'return -1;' and wrote a program that opens/writes/closes the /proc entry and prints the close return value (and errno). It always returns '0', whatever value I give.
The behavior is the same with 'fclose', or when using the shell (echo "..." > /proc/my/entry).
Any clue about this strange behavior, which is not what the many tutorials I found claim?
BTW I'm using the RHEL5 kernel (2.6.18, Red Hat modified) on a 64-bit system.
Thanks.
Regards,
Yannick
The release() isn't allowed to cause the close() to fail.
You could require your userspace programs to call fsync() on the file descriptor before close(), if they want to find out about all possible errors; then implement your final error checking in the fsync() handler.
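A minimal sketch of that idea, assuming a hypothetical helper my_data_is_valid() that performs the final check (the fsync prototype below is the one used by recent kernels; 2.6.18 uses int (*fsync)(struct file *, struct dentry *, int)):

#include <linux/fs.h>
#include <linux/errno.h>
#include <linux/module.h>

/* Hypothetical final check over the data written so far. */
static bool my_data_is_valid(struct file *file);

static int my_proc_fsync(struct file *file, loff_t start, loff_t end,
                         int datasync)
{
    return my_data_is_valid(file) ? 0 : -EINVAL;
}

static const struct file_operations my_proc_fops = {
    .owner = THIS_MODULE,
    .fsync = my_proc_fsync,   /* this error is visible to userspace fsync() */
    /* .open, .write, .release as before */
};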

Environment variable propagation on Windows

Is it possible to propagate the value of a Windows environment variable to already-running applications, after it has been created or modified, without having to restart those applications?
How would one do that?
Perhaps Server Fault would be a better place to post such a question?
Something like SendMessage(HWND_BROADCAST, WM_SETTINGCHANGE, 0, (LPARAM)TEXT("Environment")) is your best bet; most applications will ignore it, but Explorer should handle it (allowing applications launched afterwards to pick up the update).
If you want to go into crazy undocumented land, you could use WriteProcessMemory and update the environment block in every process you have access to.
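A small sketch of the broadcast approach; SendMessageTimeout is used here instead of SendMessage so a hung window cannot block the caller, and it assumes the new value has already been written to the registry (e.g. under HKEY_CURRENT_USER\Environment):

#include <windows.h>

/* Tell cooperating top-level windows (notably Explorer) that the
 * environment stored in the registry has changed. */
BOOL broadcast_environment_change(void)
{
    DWORD_PTR result = 0;
    return SendMessageTimeout(HWND_BROADCAST, WM_SETTINGCHANGE, 0,
                              (LPARAM)TEXT("Environment"),
                              SMTO_ABORTIFHUNG, 5000, &result) != 0;
}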
Yes, this is possible.
Method
It is involved though. I'll outline the basic steps. The detail for each step is documented in many places on the web, including Stack Overflow.
Create a helper DLL. The DLL does nothing except set the environment variables you want to set. It can do this from DllMain without causing any problems; just don't go mad with other function calls from inside DllMain. How you communicate to the DLL which variables to set and what values to set them to is left for you to decide (read a file, read from the registry, ...).
Enumerate all processes that you wish to update (toolhelp32 will help with this).
For each process you wish to update, inject your helper DLL. CreateRemoteThread() will help with this (a sketch follows below). This will fail for 2% of all apps on NT 4, rising to 5% on XP, and most likely a higher percentage on Vista/7 and the server versions.
Things you have to live with:
If you are running a 32 bit process on a 64 bit OS, CreateRemoteThread will fail to inject your DLL into 32 bit apps 100% of the time (and cannot inject into 64 bit apps anyway as that is a job for a 64 bit app).
EDIT: Turns out 100% isn't correct. But it is very hit and miss. Don't rely on it.
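The injection step itself (step 3) can be sketched roughly as follows for a single target process. It assumes the target has the same bitness as the injector and that you have sufficient access rights; error handling is trimmed:

#include <windows.h>
#include <string.h>

BOOL inject_helper_dll(DWORD pid, const char *dll_path)
{
    SIZE_T len = strlen(dll_path) + 1;
    HANDLE proc, thread;
    LPVOID remote;

    proc = OpenProcess(PROCESS_CREATE_THREAD | PROCESS_VM_OPERATION |
                       PROCESS_VM_WRITE | PROCESS_QUERY_INFORMATION,
                       FALSE, pid);
    if (!proc)
        return FALSE;

    /* Copy the DLL path into the target's address space. */
    remote = VirtualAllocEx(proc, NULL, len, MEM_COMMIT, PAGE_READWRITE);
    if (!remote || !WriteProcessMemory(proc, remote, dll_path, len, NULL)) {
        CloseHandle(proc);
        return FALSE;
    }

    /* kernel32.dll is mapped at the same base in every process of the
     * same bitness, so the local LoadLibraryA address is valid remotely. */
    thread = CreateRemoteThread(proc, NULL, 0,
                 (LPTHREAD_START_ROUTINE)GetProcAddress(
                     GetModuleHandleA("kernel32.dll"), "LoadLibraryA"),
                 remote, 0, NULL);
    if (!thread) {
        CloseHandle(proc);
        return FALSE;
    }

    WaitForSingleObject(thread, INFINITE);
    CloseHandle(thread);
    CloseHandle(proc);
    return TRUE;
}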
Don't remain resident
If you don't want your helper DLL to remain resident in the target application, return FALSE for the DLL_PROCESS_ATTACH notification.
BOOL APIENTRY DllMain(HANDLE hModule,
                      DWORD ul_reason_for_call,
                      LPVOID lpReserved)
{
    if (ul_reason_for_call == DLL_PROCESS_ATTACH)
    {
        // set our env vars here
        SetEnvironmentVariable("weebles", "wobble but they don't fall down");

        // we don't want to remain resident, our work is done
        return FALSE;
    }

    return TRUE;
}
No, I'm pretty sure that's not possible.
