How to stop a machine from sleeping/hibernating for the execution period - Windows

I have an app written (partially) in golang. As part of its operation it spawns an external process (written in C) and begins monitoring it. This external process can take many hours to complete, so I am looking for a way to prevent the machine from sleeping or hibernating while it is processing.
I would like to be able to relinquish this lock afterwards, so that the machine is allowed to sleep/hibernate once the process has finished.
I am initially targeting Windows, but a cross-platform solution would be ideal (does *nix even hibernate?).

Thanks to Anders for pointing me in the right direction - I put together a minimal example in golang (see below).
Note: polling to reset the timer seems to be the only reliable method. I found that when combining it with the continuous flag it would only take effect for approximately 30 seconds (no idea why). That said, the polling in this example is excessive; the interval could probably be increased to 10 minutes (since the minimum hibernation timeout is 15 minutes).
Also, FYI, this is a Windows-specific example:
package main

import (
    "log"
    "syscall"
    "time"
)

// Execution States
const (
    EsSystemRequired = 0x00000001
    EsContinuous     = 0x80000000
)

var pulseTime = 10 * time.Second

func main() {
    kernel32 := syscall.NewLazyDLL("kernel32.dll")
    setThreadExecStateProc := kernel32.NewProc("SetThreadExecutionState")

    pulse := time.NewTicker(pulseTime)

    log.Println("Starting keep alive poll... (silence)")
    for {
        select {
        case <-pulse.C:
            setThreadExecStateProc.Call(uintptr(EsSystemRequired))
        }
    }
}
The above is tested on Windows 7 and 10 (not tested on Windows 8 yet - presumed to work there too).
Any user request to sleep will override this method; this includes actions such as shutting the lid on a laptop (unless the power management settings are altered from their defaults).
These were sensible behaviors for my application.

On Windows, your first step is to try SetThreadExecutionState:
Enables an application to inform the system that it is in use, thereby preventing the system from entering sleep or turning off the display while the application is running
This is not a perfect solution but I assume this is not an issue for you:
The SetThreadExecutionState function cannot be used to prevent the user from putting the computer to sleep. Applications should respect that the user expects a certain behavior when they close the lid on their laptop or press the power button
The Windows 8 connected standby feature is also something you might need to consider. Looking at the power related APIs we find this description of PowerRequestSystemRequired:
The system continues to run instead of entering sleep after a period of user inactivity.
This request type is not honored on systems capable of connected standby. Applications should use PowerRequestExecutionRequired requests instead.
If you are dealing with tablets and other small devices, then you can try to call PowerSetRequest with PowerRequestExecutionRequired to prevent this, although the description of that is also not ideal:
The calling process continues to run instead of being suspended or terminated by process lifetime management mechanisms. When and how long the process is allowed to run depends on the operating system and power policy settings.
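For illustration, a minimal C++ sketch of that availability-request flow (untested; the reason string, the pairing with PowerRequestSystemRequired, and the simplified error handling are my assumptions rather than anything prescribed by the API docs):
#include <windows.h>

int wmain() {
    // Describe why we are holding the system awake (shows up in "powercfg /requests").
    REASON_CONTEXT ctx = {};
    ctx.Version = POWER_REQUEST_CONTEXT_VERSION;
    ctx.Flags = POWER_REQUEST_CONTEXT_SIMPLE_STRING;
    ctx.Reason.SimpleReasonString = const_cast<LPWSTR>(L"Long-running external job");

    HANDLE req = PowerCreateRequest(&ctx);
    if (req == INVALID_HANDLE_VALUE)
        return 1; // error handling simplified

    // Keep classic systems out of sleep; PowerRequestExecutionRequired is only
    // honored on Windows 8+ connected-standby systems.
    PowerSetRequest(req, PowerRequestSystemRequired);
    PowerSetRequest(req, PowerRequestExecutionRequired);

    // ... spawn the external process and wait for it here ...

    // Relinquish the requests so the machine may sleep/hibernate again.
    PowerClearRequest(req, PowerRequestExecutionRequired);
    PowerClearRequest(req, PowerRequestSystemRequired);
    CloseHandle(req);
    return 0;
}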
You might also want to use ShutdownBlockReasonCreate but I'm not sure if it blocks sleep/hibernate.

Related

drmDropMaster requires root privileges?

Pardon for the long introduction, but I haven't seen any other questions for this on SO.
I'm playing with DRM (Direct Rendering Manager, a wrapper for Linux kernel mode setting) and I'm having difficulty understanding a part of its design.
Basically, I can open a graphics card device in my virtual terminal, set up frame buffers, and change a connector and its CRTC just fine. This results in me being able to render to the VT in a lightweight graphics mode without the need for an X server (that's what KMS is about, and in fact the X server uses it underneath).
Then I wanted to implement graceful VT switching, so that when I hit Ctrl+Alt+F3 etc., I can see my other consoles. It turns out this is easy to do by calling ioctl() with stuff from linux/vt.h and handling some user signals.
But then I tried to switch from my graphics program to a running X server. Bzzt! It didn't work at all - the X server didn't draw anything. After some digging I found that in the Linux kernel, only one program can do kernel mode setting. So what happens is this:
I switch from X to a virtual terminal
I run my program
This program enters graphic mode with drmOpen, drmModeSetCRTC etc.
I switch back to X
X has no longer privileges to restore its own mode.
Then I found this in wayland source code: drmDropMaster() and drmSetMaster(). These functions are supposed to release and regain privileges to set modes so that X server can continue to work, and after switching back to my program, it can take it from there.
Finally the real question.
These functions require root privileges. This is the part I don't understand. I can mess with kernel modes, but I can't say "okay X11, I'm done playing, I'm giving you the access now"? Why? Or should this work in theory, and I'm just doing something wrong in my code? (e.g. work with wrong file descriptors, or whatever.)
If I try to run my program as a normal user, I get "permission denied". If I run it as root, it works fine - I can switch from X to my program and vice versa.
Why?
Yes, drmSetMaster and drmDropMaster require root privileges because they allow you to do mode setting. Otherwise, any random application could display whatever it wanted to your screen. weston handles this through a setuid launcher program. The systemd people also added functionality to systemd-logind (which runs as root) to do the drm{Set,Drop}Master calls for you. This is what enables recent X servers to run without root privileges. You could look into this if you don't mind depending on systemd.
Your post seems to suggest that you can successfully call drmModeSetCRTC without root privileges. This doesn't make sense to me. Are you sure?
It is up to display servers like X, weston, and whatever you're working on to call drmDropMaster before it invokes the VT_RELDISP ioctl, so that the next session can call drmSetMaster successfully.
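As a rough sketch of that hand-off (assuming the usual VT_PROCESS signal dance; the file descriptors, the signal choices and the omitted error handling are illustrative only, not taken from any particular project):
#include <csignal>
#include <sys/ioctl.h>
#include <linux/vt.h>
#include <xf86drm.h>

static int drm_fd; // primary node, e.g. /dev/dri/card0, opened elsewhere
static int vt_fd;  // our controlling VT, e.g. /dev/tty3, opened elsewhere

static void on_vt_release(int) {      // kernel asks us to leave the VT
    drmDropMaster(drm_fd);            // give up mode setting so X can restore its mode
    ioctl(vt_fd, VT_RELDISP, 1);      // acknowledge the switch away
}

static void on_vt_acquire(int) {      // we are being switched back to
    ioctl(vt_fd, VT_RELDISP, VT_ACKACQ);
    drmSetMaster(drm_fd);             // reclaim mode-setting rights
    // ... re-run drmModeSetCRTC here to restore our framebuffer ...
}

void install_vt_switch_handlers(void) {
    struct vt_mode mode = {};
    mode.mode   = VT_PROCESS;         // switches must be acknowledged by us
    mode.relsig = SIGUSR1;
    mode.acqsig = SIGUSR2;
    std::signal(SIGUSR1, on_vt_release);
    std::signal(SIGUSR2, on_vt_acquire);
    ioctl(vt_fd, VT_SETMODE, &mode);
}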
Before digging into why it doesn't work, I had to understand how it works.
So, drmModeSetCRTC and drmSetMaster in libdrm in reality just call ioctl:
include/xf86drm.c
int drmSetMaster(int fd)
{
    return ioctl(fd, DRM_IOCTL_SET_MASTER, 0);
}
This is handled by the kernel. In my program the most important functions that control the display are drmModeSetCRTC and drmModeAddFB; the rest is really just diagnostics. So let's see how they're handled by the kernel. It turns out there is a big table that maps ioctl events to their handlers:
drivers/gpu/drm/drm_ioctl.c
static const struct drm_ioctl_desc drm_ioctls[] = {
    ...
    DRM_IOCTL_DEF(DRM_IOCTL_MODE_GETCRTC, drm_mode_getcrtc, DRM_CONTROL_ALLOW|DRM_UNLOCKED),
    DRM_IOCTL_DEF(DRM_IOCTL_MODE_SETCRTC, drm_mode_setcrtc, DRM_MASTER|DRM_CONTROL_ALLOW|DRM_UNLOCKED),
    ...
    DRM_IOCTL_DEF(DRM_IOCTL_MODE_ADDFB, drm_mode_addfb, DRM_CONTROL_ALLOW|DRM_UNLOCKED),
    DRM_IOCTL_DEF(DRM_IOCTL_MODE_ADDFB2, drm_mode_addfb2, DRM_CONTROL_ALLOW|DRM_UNLOCKED),
    ...
};
This table is used by drm_ioctl, and its most interesting part is drm_ioctl_permit.
drivers/gpu/drm/drm_ioctl.c
long drm_ioctl(struct file *filp,
               unsigned int cmd, unsigned long arg)
{
    ...
    retcode = drm_ioctl_permit(ioctl->flags, file_priv);
    if (unlikely(retcode))
        goto err_i1;
    ...
}

static int drm_ioctl_permit(u32 flags, struct drm_file *file_priv)
{
    /* ROOT_ONLY is only for CAP_SYS_ADMIN */
    if (unlikely((flags & DRM_ROOT_ONLY) && !capable(CAP_SYS_ADMIN)))
        return -EACCES;

    /* AUTH is only for authenticated or render client */
    if (unlikely((flags & DRM_AUTH) && !drm_is_render_client(file_priv) &&
                 !file_priv->authenticated))
        return -EACCES;

    /* MASTER is only for master or control clients */
    if (unlikely((flags & DRM_MASTER) && !file_priv->is_master &&
                 !drm_is_control_client(file_priv)))
        return -EACCES;

    /* Control clients must be explicitly allowed */
    if (unlikely(!(flags & DRM_CONTROL_ALLOW) &&
                 drm_is_control_client(file_priv)))
        return -EACCES;

    /* Render clients must be explicitly allowed */
    if (unlikely(!(flags & DRM_RENDER_ALLOW) &&
                 drm_is_render_client(file_priv)))
        return -EACCES;

    return 0;
}
Everything makes sense so far. I can indeed call drmModeSetCRTC because I am the current DRM master. (I'm not sure why. This might have to do with X11 properly waiving its rights once I switch to another VT. Perhaps that alone allows me to automatically become the new DRM master once I start messing with ioctl?)
Anyway, let's take a look at the drmDropMaster and drmSetMaster definitions:
drivers/gpu/drm/drm_ioctl.c
static const struct drm_ioctl_desc drm_ioctls[] = {
    ...
    DRM_IOCTL_DEF(DRM_IOCTL_SET_MASTER, drm_setmaster_ioctl, DRM_ROOT_ONLY),
    DRM_IOCTL_DEF(DRM_IOCTL_DROP_MASTER, drm_dropmaster_ioctl, DRM_ROOT_ONLY),
    ...
};
What.
So my confusion was justified. I'm not doing anything wrong; things really are this way.
I'm under the impression that this is a serious kernel bug. Either I shouldn't be able to set the CRTC at all, or I should be able to drop/set master. In any case, revoking every non-root program's right to draw to the screen because
any random application could display whatever it wanted to your screen
is too aggressive. I, as the user, should have the freedom to control that without giving root access to the whole program and without depending on systemd, for example via chmod 0777 /dev/dri/card0 (or group management). As it is now, it looks to me like a lazy man's answer to proper permission management.
Thanks for writing this up. This is indeed the expected outcome; you don't need to look for a subtle bug in your code.
It's definitely intended that you can become the master implicitly. A dev wrote example code as initial documentation for DRM, and it does not use SetMaster. And there is a comment in the source code (now drm_auth.c) "successfully became the device master (either through the SET_MASTER IOCTL, or implicitly through opening the primary device node when no one else is the current master that time)".
DRM_ROOT_ONLY is commented as
/**
* #DRM_ROOT_ONLY:
*
* Anything that could potentially wreak a master file descriptor needs
* to have this flag set. Current that's only for the SETMASTER and
* DROPMASTER ioctl, which e.g. logind can call to force a non-behaving
* master (display compositor) into compliance.
*
* This is equivalent to callers with the SYSADMIN capability.
*/
The above requires some clarification, IMO. The way logind forces a non-behaving master into compliance is not simply by calling SETMASTER for a different master - that would actually fail. First, it must call DROPMASTER on the non-behaving master. So logind is relying on this permission check to make sure the non-behaving master cannot then race logind and call SETMASTER first.
Equally logind is assuming the unprivileged user doesn't have permission to open the device node directly. I would suspect the ability to implicitly become master on open() is some form of backwards compatibility.
Notice that even if you could drop master status, you couldn't use SETMASTER to get it back. This means the point of doing so is rather limited - you can't use it to implement the traditional switching back and forth between multiple graphics servers.
There is a way you can drop the master and get it back: close the fd, and re-open it when needed. It sounds to me like this would match how old-style X (pre-DRM?) worked - wasn't it possible to switch between multiple instances of the X server, with each of them having to completely take over the hardware? So you always had to start from scratch after a VT switch. This is not as good as being able to switch masters, though; logind says:
/* On DRM devices we simply drop DRM-Master but keep it open.
* This allows the user to keep resources allocated. The
* CAP_SYS_ADMIN restriction to DRM-Master prevents users from
* circumventing this. */
As of Linux 5.8, drmDropMaster() no longer requires root privileges.
The relevant commit is 45bc3d26c: "drm: rework SET_MASTER and DROP_MASTER perm handling".
The source code comments provide a good summary of the old and new situation:
In the olden days the SET/DROP_MASTER ioctls used to return EACCES when
CAP_SYS_ADMIN was not set. This was used to prevent rogue applications
from becoming master and/or failing to release it.
At the same time, the first client (for a given VT) is always master.
Thus in order for the ioctls to succeed, one had to explicitly run the
application as root or flip the setuid bit.
If the CAP_SYS_ADMIN was missing, no other client could become master...
EVER :-( Leading to a) the graphics session dying badly or b) a completely
locked session.
...
Here we implement the next best thing:
- ensure the logind style of fd passing works unchanged, and
- allow a client to drop/set master, iff it is/was master at a given point in time.
...

Clicking qt .app vs running .exe in terminal

I have a Qt GUI that spawns a C++11 clang server on OS X 10.8 (Xcode).
It does cryptographic proof-of-work mining of a name (single mining thread).
When I click the .app, the process takes 4 1/2 hours.
When I run the exact same executable inside the .app folder from the terminal, the process takes 30 minutes.
Question: how do I debug this?
Thank you.
====================================
Even worse:
The mining server is running in a terminal.
If I start the GUI program that connects to the server and just sends it the "mine" command (over IPC): 4 hours.
If I start a CL-UI that connects to the server and just sends it the "mine" command (over IPC): 30 minutes.
In both cases the server is mining in a tight loop. Corrupt memory? A single CPU is at 100%, as it should be... can't figure it out.
=========
This variable is used without locking...
volatile bool running = true;
Server thread:
fut = std::async(&Commissioner::generateName, &comish, name, m_priv.get_public_key() );
Server loop:
nonce_t reset = std::numeric_limits<nonce_t>::max() - 1000;
while ( running && hit < target ) {
    if ( nt.nonce >= reset )
    {
        nt.utc_sec = fc::time_point::now();
        nt.nonce = 0;
    }
    else { ++nt.nonce; }
    hit = difficulty(nt.id());
}
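For reference, a minimal sketch of how that flag could be shared safely in C++11 - std::atomic<bool> is an assumption on my part, not what the code currently does:
#include <atomic>

// volatile gives no cross-thread visibility or ordering guarantees in C++11;
// std::atomic<bool> does, without changing the shape of the loop.
std::atomic<bool> running{true};

void mining_loop() {                 // stands in for the real server loop
    while (running.load(std::memory_order_relaxed) /* && hit < target */) {
        // ... hash the next nonce ...
    }
}

void request_stop() { running.store(false); }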
The evidence is now pointing to deterministic chaotic behavior: the run time is just very sensitive to initial conditions.
The initial condition may be the timestamp data within the object that is hashed during mining.
Mods, please close.

On Windows, WSASend fails with WSAENOBUFS

On Windows XP, when I call WSASend in a loop on a non-blocking socket, it fails with WSAENOBUFS.
I have two cases here:
Case 1:
On a non-blocking socket I call WSASend. Here is the pseudo-code:
while(1)
{
    result = WSASend(...); // Buffersize 1024 bytes
    if (result == -1)
    {
        if (WSAGetLastError() == WSAENOBUFS)
        {
            // Wait for some time before calling WSASend again
            Sleep(1000);
        }
    }
}
In this case WSASend returns successfully around 88000 times. Then it fails with WSAENOBUFS and never recovers, even when retried after some time as shown in the code.
Case 2:
In order to solve this problem, I referred to this and, as suggested there, just before the above code I called setsockopt with SO_SNDBUF and set the buffer size to 0 (zero).
In this case, WSASend returns successfully around 2600 times. Then it fails. But after waiting, it succeeds again for around 2600 times, then fails.
Now I have these questions about both cases:
Case 1:
What factors determine the number 88000 here?
If the failure was because the TCP buffer was full, why didn't it recover after some time?
Case 2:
Again, what factors determine the number 2600 here?
As described in the Microsoft KB article, if it sends directly from the application buffer instead of from internal TCP buffers, why would it fail with WSAENOBUFS?
EDIT:
In the case of asynchronous sockets (on Windows XP), the behavior is even stranger. If I ignore WSAENOBUFS and continue writing to the socket, I eventually get a disconnection (WSAECONNRESET), and I'm not sure at the moment why that happens.
The values are undocumented and depend on what's installed on your machine that might sit between your application and the network driver. They're likely linked to the amount of memory in the machine. The limits (most probably non-paged pool memory and the I/O page lock limit) are likely MUCH higher on Vista and above.
The best way to deal with the problem is add application level flow control to your protocol so that you don't assume that you can just send at whatever rate you feel like. See this blog posting for details of how non-blocking and async I/O can cause resource usage to balloon and how you have no control over it unless you have your own flow control.
In summary, never assume that you can just write data to the wire as fast as you like using non-blocking/async APIs. Remember that due to how TCP/IP's internal flow control works you COULD be using an uncontrollable amount of local machine resources and the client is the only thing that has any control over how fast those resources are released back to the O/S on the server machine.
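To make that idea a bit more concrete, here is a rough C++ sketch of application-level send throttling (the class, the limit, and the completion hook are illustrative assumptions, not a drop-in implementation):
#include <winsock2.h>
#include <deque>
#include <vector>
#include <cstddef>

// Cap the number of bytes handed to WSASend that have not yet completed;
// queue the rest in the application until completions drain the counter.
class SendThrottle {
public:
    explicit SendThrottle(std::size_t maxOutstanding)
        : maxOutstanding_(maxOutstanding), outstanding_(0) {}

    // Application calls this instead of calling WSASend directly.
    void Send(SOCKET s, std::vector<char> data) {
        queue_.push_back(std::move(data));
        Pump(s);
    }

    // Call this when a send finishes (from your overlapped-completion handling,
    // or immediately after a successful plain non-blocking send).
    void OnSendComplete(SOCKET s, std::size_t bytes) {
        outstanding_ -= bytes;
        Pump(s);
    }

private:
    void Pump(SOCKET s) {
        while (!queue_.empty() && outstanding_ < maxOutstanding_) {
            std::vector<char>& front = queue_.front();
            WSABUF buf;
            buf.buf = front.data();
            buf.len = static_cast<ULONG>(front.size());
            DWORD sent = 0;
            // Overlapped/completion wiring omitted; the point is only where
            // the byte accounting happens.
            if (WSASend(s, &buf, 1, &sent, 0, NULL, NULL) == 0) {
                outstanding_ += front.size();
                queue_.pop_front();
            } else {
                break; // WSAEWOULDBLOCK or a real error: stop and retry later
            }
        }
    }

    std::size_t maxOutstanding_;
    std::size_t outstanding_;
    std::deque<std::vector<char>> queue_;
};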

How do I increase Windows interrupt latency to stress test a driver?

I have a driver and device that seem to misbehave when the user does any number of complex things (opening large Word documents, opening lots of files at once, etc.) -- but they do not reliably go wrong when any one thing is repeated. I believe this is because the driver does not handle high-interrupt-latency situations gracefully.
Is there a reliable way to increase interrupt latency on Windows XP to test this theory?
I'd prefer to write my test program in Python, but C++ and the WinAPI are also fine...
My apologies for not having a concrete answer, but an idea to explore would be to use either C++ or Cython to hook into the timer interrupt (the clock-tick one) and waste time in there. This will effectively increase latency.
I don't know if there's an existing solution, but you can create your own.
On Windows all interrupts are prioritized, so if there is driver code running at a high IRQL, your driver won't be able to service its interrupt if its own level is lower. At least it won't be able to run on the same processor.
I'd do the following:
1) Configure your driver to run on a single processor (I don't remember how to do this, but such an option definitely exists).
2) Add an I/O control code to your driver.
3) In your driver's Dispatch routine, do a busy wait at a high IRQL (more on this below).
4) Call your driver (via DeviceIoControl) to simulate the stress.
The busy wait may look something like this:
KIRQL oldIrql;
__int64 t1, t2;

KeRaiseIrql(31, &oldIrql); // 31 == HIGH_LEVEL on x86
KeQuerySystemTime((LARGE_INTEGER*) &t1);
while (1)
{
    KeQuerySystemTime((LARGE_INTEGER*) &t2);
    if (t2 - t1 > /* put the needed time interval */)
        break;
}
KeLowerIrql(oldIrql);

How to determine which task is dead?

I have an embedded system with multiple (>20) tasks running at different priorities. I also have a watchdog task that runs to check that none of the other tasks are stuck. My watchdog works: every once in a blue moon it reboots the system because a task did not check in.
How do I determine which task died?
I can't just blame the task whose last check-in is oldest, because it might have been held off by a higher-priority task that is not yielding.
Any suggestions?
A per-task watchdog requires that the higher priority tasks yield for an adequate time so that all may kick the watchdog. To determine which task is at fault, you'll have to find the one that's starving the others. You'll need to measure task execution times between watchdog checks to locate the actual culprit.
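For example, a rough sketch of that kind of accounting (the context-switch hook and tick-counter names are assumptions; wire them to whatever your RTOS actually exposes):
#include <cstdint>

enum { NUM_TASKS = 24 };

static uint32_t run_ticks[NUM_TASKS]; // time consumed since the last watchdog check
static int      current_task = 0;
static uint32_t last_switch  = 0;

extern "C" uint32_t now_ticks(void);  // assumed: free-running monotonic tick counter

// Call this from the kernel's context-switch hook.
void on_context_switch(int next_task) {
    uint32_t now = now_ticks();
    run_ticks[current_task] += now - last_switch;
    last_switch = now;
    current_task = next_task;
}

// Call this from the watchdog just before it resets: the task with the largest
// share of the period is the likely starver (the starved task is only the symptom).
// Remember to clear run_ticks[] after each successful watchdog check.
int busiest_task(void) {
    int busiest = 0;
    for (int i = 1; i < NUM_TASKS; ++i)
        if (run_ticks[i] > run_ticks[busiest])
            busiest = i;
    return busiest;
}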
Is this pre-emptive? I gather so since otherwise a watchdog task would not run if one of the others had gotten stuck.
You make no mention of the OS, but if a watchdog task can check whether a single task has not checked in, there must be separate channels of communication between each task and the watchdog.
You'll probably have to modify the watchdog to somehow dump the task number of the one that hasn't checked in and dump the task control blocks and memory so you can do a post-mortem.
Depending on the OS, this could be easy or hard.
I was also working on a watchdog reset problem over the last few weeks. Fortunately for me, the ramdump files (in an ARM development environment) contain an interrupt-handler trace buffer with the PC and SLR at each interrupt. From that trace buffer I could find out exactly which part of the code was running before the watchdog reset.
I think if you have the same kind of mechanism for storing the PC and SLR at each interrupt, then you can precisely identify the culprit task.
Depending on your system and OS, there may be different approaches. One very low-level approach I have used is to turn an LED on while each of the tasks is running. You may need to put a scope on the LEDs to see very fast task switching.
For an interrupt-driven watchdog, you'd just make the task switcher update the currently running task number each time it is changed, allowing you to identify which one didn't yield.
However, you suggest you wrote the watchdog as a task yourself, so before rebooting, surely the watchdog can identify the starved task? You can store this in memory that persists beyond a warm reboot, or send it over a debug interface. The problem with this is that the starved task is probably not the problematic one: you'll probably want to know the last few task switches (and times) in order to identify the cause.
A simplistic, back-of-the-napkin approach would be something like this:
int8_t wd_tickle[NUM_TASKS];

void taskA_main()
{
    ...
    // main loop
    while(1) {
        ...
        wd_tickle[TASKA_NUM]++;
    }
}

... tasks B, C, D... follow a similar pattern

void watchdog_task()
{
    for(int i = 0; i < NUM_TASKS; i++) {
        if(0 == wd_tickle[i]) {
            // Egads! The task didn't kick us! Reset and record the task number
        }
        wd_tickle[i] = 0; // start a fresh count for the next watchdog period
    }
}
How is your system working exactly? I always use a combination of software and hardware watchdogs. Let me explain...
My example assumes you're working with a preemptive real-time kernel and that you have watchdog support in your CPU/microcontroller. This watchdog will perform a reset if it is not kicked within a certain period of time. You want to check two things:
1) The periodic system timer ("RTOS clock") is running (if not, functions like "sleep" would no longer work and your system is unusable).
2) All threads can run within a reasonable period of time.
My RTOS (www.lieron.be/micror2k) provides the possibility to run code in the RTOS clock interrupt handler. This is the only place where you refresh the hardware watchdog, so you're sure the clock is running all the time (if not, the watchdog will reset your system).
In the idle thread (always running at the lowest priority), a "software watchdog" is refreshed. This simply means setting a variable to a certain value (e.g. 1000). In the RTOS clock interrupt (where you kick the hardware watchdog), you decrement and check this value. If it reaches 0, it means the idle thread has not run for 1000 clock ticks, and you reboot the system (which can be done by looping indefinitely inside the interrupt handler and letting the hardware watchdog reboot it).
Now for your original question. I assume the system clock keeps running, so it's the software watchdog that resets the system. In the RTOS clock interrupt handler, you can do some "statistics gathering" when the software-watchdog situation occurs. Instead of resetting the system immediately, you can record which thread is running at each clock tick (after the problem occurs) and try to find out what's going on. It's not ideal, but it will help.
Another option is to add several software watchdogs at different priorities. Have the idle thread set variable A to 1000 and have a (dedicated) medium-priority thread set variable B. In the RTOS clock interrupt handler, you check both variables. With this information you know whether the looping thread has a priority higher than "medium" or lower than "medium". If you wish, you can add a third or fourth or as many software watchdogs as you like. Worst case, add a software watchdog for each priority that's used (this will cost you as many extra threads, though).
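A bare-bones sketch of the idle-thread scheme described above (the reload value and the two HAL hooks are assumptions, not taken from any particular RTOS):
extern "C" void kick_hardware_watchdog(void); // assumed HAL call
extern "C" void record_running_thread(void);  // assumed: stash thread id/PC for post-mortem

static volatile int idle_watchdog = 1000;     // reloaded by the idle thread

// Lowest-priority thread: if it runs at all, nothing above it is hogging the CPU.
void idle_thread(void) {
    for (;;) {
        idle_watchdog = 1000;
    }
}

// RTOS clock interrupt handler: proves the system timer is alive.
void rtos_clock_isr(void) {
    if (--idle_watchdog <= 0) {
        record_running_thread();              // gather evidence about the hog
        for (;;) { }                          // stop kicking; the hardware watchdog resets us
    }
    kick_hardware_watchdog();                 // clock is alive and the idle thread still runs
}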
