CPU idle loop without Sleep

I am learning WIN32 ASM right now and I was wondering if there is something like an "idle" infinite loop that doesn't consume any resources at all. Basically I need a running process to experiment with that goes like this:
loop:
; alternative to sleep...
jmp loop
Is there something that may idle the process?

You can't have it both ways. You can either consume the CPU or you can let it do something else. You can conserve power and avoid depriving other cores of resources with rep; nop; (also known as pause), as Vlad Lazarenko suggested. But if you loop without yielding the core to another process, at least that virtual core cannot do anything else.
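A minimal C sketch of the trade-off described above, assuming a GCC- or Clang-style toolchain. The helper name spin_until_set and the CPU_RELAX macro are hypothetical, introduced only for illustration. The loop issues pause on every iteration, which is friendlier to a sibling logical core, but the thread still occupies its core until the flag changes or the spin budget runs out.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define CPU_RELAX() _mm_pause()   /* emits the pause (rep; nop) instruction */
#else
#define CPU_RELAX() ((void)0)     /* assumption: no spin hint on other targets */
#endif

/* Hypothetical helper: spin until *flag becomes true or max_spins is reached.
   Returns the number of pause iterations that were executed. */
static unsigned long spin_until_set(atomic_bool *flag, unsigned long max_spins)
{
    unsigned long spins = 0;
    while (!atomic_load_explicit(flag, memory_order_acquire) && spins < max_spins) {
        CPU_RELAX();              /* hint only: the core is still busy */
        ++spins;
    }
    return spins;
}
```

If the flag is already set the function returns 0 immediately; otherwise it burns up to max_spins iterations, which is why a blocking call is the better choice for a truly idle process.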

Note: You should never use empty loops to make the application idle. An empty loop loads the processor core to 100%.
There are several ways to make an application idle when there is no GUI activity in a Win32 environment. The main one (documented in MSDN) is to call the GetMessage function in the main loop to extract messages from the message queue.
When the message queue is empty, this function blocks, consuming almost no processor time, until a message arrives in the queue.
Below is an example using the FASM macro library:
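msg_loop:
invoke GetMessage, msg, NULL, 0, 0 ; blocks until a message arrives
cmp eax, 1
jb end_loop ; eax = 0: WM_QUIT received, leave the loop
jne msg_loop ; eax <> 1 (e.g. -1 on error): skip dispatch and try again
invoke TranslateMessage, msg
invoke DispatchMessage, msg
jmp msg_loop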
Another approach is used when you want to catch the moment the application goes idle and do some low-priority, one-time processing (for example, enabling or disabling toolbar buttons according to the state of the application).
In this case, a combination of PeekMessage and WaitMessage has to be used. PeekMessage returns immediately, even when the message queue is empty. This way you can detect the empty queue and run your idle tasks; afterwards you call WaitMessage to idle the process until a message arrives.
Here is a simplified example from my code (using FreshLib macros):
; Main message loop
Run:
invoke PeekMessageA, msg, 0, 0, 0, PM_REMOVE
test eax,eax
jz .empty
cmp [msg.message], WM_QUIT
je .terminate
invoke TranslateMessage, msg
invoke DispatchMessageA, msg
jmp Run
.empty:
call OnIdle
invoke WaitMessage
jmp Run
.terminate:
FinalizeAll
stdcall TerminateAll, 0

What do you mean by "consume resources"?
Do you just want an instruction that does nothing? If so, nop will do that, and in a spin loop you can use rep; nop (the encoding of pause) as a hint to the CPU. However, the CPU will still be busy doing work: executing the "no operation" instructions.
If you want an instruction that causes the CPU itself to stop, you are sorta-kinda out of luck: there are ways to do that, but not from userspace.

With ring 0 access (as in a kernel driver), you could use the x86 HLT opcode, but you need system-programming experience to really understand how to use it. Using HLT this way requires interrupts to be enabled (not masked) and a guarantee that an interrupt will occur (e.g. the system timer), because execution resumes at the instruction after the HLT when the interrupt handler returns.
Without ring 0 access you'll never find any x86 opcode to enter an "idle" mode...
You can only find instructions that consume less power (no memory access, no cache access, no FPU access, low ALU usage...).

Yes, some architectures support an intrinsic idle state, a.k.a. halt.
For example, on x86 there is hlt, opcode 0xf4. It is a privileged instruction, so it can be executed only in ring 0.
CPU Switches from User mode to Kernel Mode : What exactly does it do? How does it makes this transition?
How to completely suspend the processor?
A Linux userspace example I found here:
.section .rodata
greeting:
.string "Hello World\n"
.text
_start:
mov $12,%edx /* write(1, "Hello World\n", 12) */
mov $greeting,%ecx
mov $1,%ebx
mov $4,%eax /* write is syscall 4 */
int $0x80
xorl %ebx, %ebx /* Set exit status and exit */
mov $0xfc,%eax
int $0x80
hlt /* Just in case... */

Related

DOS DEBUG trace command doesn't work as I would expect

I have ASM code which prints abc using a loop. Here is my code:
;abc.com
.model small
.code
org 100h
start:
mov ah, 02h
mov dl, 'a'
mov cx, 3h
ulang:
int 21h
inc dl
loop ulang
int 20h
end start
The COM program runs normally.
The result of debug abc.com followed by -t looks like
My question is: why is there a NOP after INT 21 instead of INC DL? AFAIK it should execute INC DL and then LOOP xxxx three times, then INT 20.
When I keep pressing -t, execution goes somewhere I don't recognize until it crashes, which means it never reaches INT 20h.
This differs from debug abc.com followed by -u,
which shows INC DL and LOOP 0107, indicating the loop.
FYI:
Win 7 Ultimate SP 1 32 Bit
GUI Turbo ASM x86 3.0
Celeron Dual Core n2840

The Trace command in debug is the equivalent of the STEP INTO feature of modern day debuggers. The int instruction (like call) executes a series of instructions and then returns back to the caller. Trace will step into a software interrupt handler or a function and execute each instruction one at a time. The MSDN documentation for debug says this about Trace:
Executes one instruction and displays the contents of all registers, the status of all flags, and the decoded form of the instruction executed.
In your case you hit int 21h and jumped to the software interrupt handler's code at CS:IP 00A7:107C. If you trace through all the interrupt handler code you'd eventually reach CS:IP of 1400:0109, where the INC DL instruction is.
In order to execute a function or interrupt without stepping through each instruction associated with it, you can use the proceed command. Proceed is akin to the STEP OVER feature of modern day debuggers. The code of an interrupt handler or a function/subroutine will execute and then break on the instruction after the INT or CALL instruction.
The documentation says this about PROCEED:
When the p command transfers control from Debug to the program being tested, that program runs without interruption until the loop, repeated string instruction, software interrupt, or subroutine at the specified address is completed, or until the specified number of machine instructions have been executed. Control then returns to Debug.

Structured Exception Handler catches near-zero EIP trap differently on nearly identical machines?

I have a rather complex, but extremely well-tested assembly language x86-32 application running on variety of x86-32 and x86-64 boxes. This is a runtime system for a language compiler, so it supports the execution of another compiled binary program, the "object code".
It uses Windows SEH to catch various kinds of traps: division by zero, illegal access, ... and prints a register dump using the context information provided by Windows, that shows the state of the machine at the time of the trap. (It does lots of other stuff irrelevant to the question, such as printing a function backtrace or recovering from the division by zero as appropriate). This allows the writer of the "object code" to get some idea what went wrong with his program.
It behaves differently on two Windows 7-64 systems that are more or less identical, on what I think is an illegal memory access. The specific problem is that the "object code" (not the well-tested runtime system) somewhere stupidly loads 0x82 into EIP; that is a nonexistent page in the address space AFAIK. I expect a Windows trap through the SEH, and expect to see a register dump with EIP=00000082 etc.
On one system, I get exactly that register dump. I could show it here, but it doesn't add anything to my question. So, it is clear the SEH in my runtime system can catch this, and display the situation. This machine does not have any MS development tools on it.
On the other ("mystery") system, with the same exact binaries for runtime system and object code, all I get is the command prompt. No further output. FWIW, this machine has MS Visual Studio 2010 on it. The mystery machine is used heavily for other purposes, and shows no other funny behaviors in normal use.
I assume the behavior difference is caused by a Windows configuration somewhere, or something that Visual Studio controls. It isn't the DEP configuration in the system menu; they are both configured (vanilla) as "DEP for standard system processes". And my runtime system executable has "No (/NXCOMPAT:NO)" configured.
Both machines are i7 but different chips, 4 cores, lots of memory, different motherboards. I don't think this is relevant; surely both of these CPUs take traps the same way.
The runtime system includes the following line on startup:
SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX); // stop Windows pop-up on crashes
This was recently added to prevent the "mystery" system from showing a pop-up window, "xxx.exe has stopped working" when the crash occurs. The pop-up box behaviour doesn't happen on the first system, so all this did was push the problem into a different corner on the "mystery" machine.
Any clue where I look to configure/control this?
I provide here the SEH code I am using. It has been edited to remove a considerable amount of sanity-checking code that I claim has no effect on the apparent state seen in this code.
The top level of the runtime system generates a set of worker threads (using CreateThread) and points them at ASMGrabGranuleAndGo; each thread sets up its own SEH and branches off to a work-stealing scheduler, RunReadyGranule. To the best of my knowledge, the SEH is not changed after that; at least, the runtime system and the "object code" do not do this, but I have no idea what the underlying (e.g., standard "C") libraries might do.
Further down I provide the trap handler, TopLevelEHFilter.
Yes, it's possible the register printing machinery itself blows up, causing a second exception. I'll try to check into this again soon, but IIRC my last attempt to catch this in the debugger on the mystery machine did not pass control to the debugger; it just got me the pop-up window.
public ASMGrabGranuleAndGo
ASSUME FS:NOTHING ; cancel any assumptions made for this register
ASMGrabGranuleAndGo:
;Purpose: Entry for threads as workers in PARLANSE runtime system.
; Each thread initializes as necessary, just once,
; It then goes and hunts for work in the GranulesQ
; and start executing a granule whenever one becomes available
; install top level exception handler
; Install handler for hardware exceptions
cmp gCompilerBreakpointSet, 0
jne HardwareEHinstall_end ; if set, do not install handler
push offset TopLevelEHFilter ; push new exception handler on Windows thread stack
mov eax, [TIB_SEH] ; expected to be empty
test eax, eax
BREAKPOINTIF jne
push eax ; save link to old exception handler
mov fs:[TIB_SEH], esp ; tell Windows that our exception handler is active for this thread
HardwareEHinstall_end:
;Initialize FPU to "empty"... all integer grains are configured like this
finit
fldcw RTSFPUStandardMode
lock sub gUnreadyProcessorCount, 1 ; signal that this thread has completed its initialization
@@: push 0 ; sleep for 0 ticks
call MySleep ; give up CPU (lets other threads run if we don't have enuf CPUs)
lea esp, [esp+4] ; pop arguments
mov eax, gUnreadyProcessorCount ; spin until all other threads have completed initialization
test eax, eax
jne @b
mov gThreadIsAlive[ecx], TRUE ; signal to scheduler that this thread now officially exists
jmp RunReadyGranule
ASMGrabGranuleAndGo_end:
;-------------------------------------------------------------------------------
TopLevelEHFilter: ; catch Windows Structured Exception Handling "trap"
; Invocation:
; call TopLevelEHFilter(&ReportRecord,&RegistrationRecord,&ContextRecord,&DispatcherRecord)
; The arguments are passed in the stack at an offset of 8 (<--NUMBER FROM MS DOCUMENT)
; ESP here "in the stack" being used by the code that caused the exception
; May be either grain stack or Windows thread stack
extern exit :proc
extern syscall @RTSC_PrintExceptionName@4:near ; FASTCALL
push ebp ; act as if this is a function entry
mov ebp, esp ; note: Context block is at offset ContextOffset[ebp]
IF_USING_WINDOWS_THREAD_STACK_GOTO unknown_exception, esp ; don't care what it is, we're dead
; *** otherwise, we must be using PARLANSE function grain stack space
; Compiler has ensured there's enough room, if the problem is a floating point trap
; If the problem is illegal memory reference, etc,
; there is no guarantee there is enough room, unless the application is compiled
; with -G ("large stacks to handle exception traps")
; check what kind of exception
mov eax, ExceptionRecordOffset[ebp]
mov eax, ExceptionRecord.ExceptionCode[eax]
cmp eax, _EXCEPTION_INTEGER_DIVIDE_BY_ZERO
je div_by_zero_exception
cmp eax, _EXCEPTION_FLOAT_DIVIDE_BY_ZERO
je float_div_by_zero_exception
jmp near ptr unknown_exception
float_div_by_zero_exception:
mov ebx, ContextOffset[ebp] ; ebx = context record
mov Context.FltStatusWord[ebx], CLEAR_FLOAT_EXCEPTIONS ; clear any floating point exceptions
mov Context.FltTagWord[ebx], -1 ; Marks all registers as empty
div_by_zero_exception: ; since RTS itself doesn't do division (that traps),
; if we get *here*, then we must be running a granule and EBX for granule points to GCB
mov ebx, ContextOffset[ebp] ; ebx = context record
mov ebx, Context.Rebx[ebx] ; grain EBX has to be set for AR Allocation routines
ALLOCATE_2TOK_BYTES 5 ; 5*4=20 bytes needed for the exception structure
mov ExceptionBufferT.cArgs[eax], 0
mov ExceptionBufferT.pException[eax], offset RTSDivideByZeroException ; copy ptr to exception
mov ebx, ContextOffset[ebp] ; ebx = context record
mov edx, Context.Reip[ebx]
mov Context.Redi[ebx], eax ; load exception into thread's edi
GET_GRANULE_TO ecx
; This is Windows SEH (Structured Exception Handler... see use of Context block below!
mov eax, edx
LOOKUP_EH_FROM_TABLE ; protected by DelayAbort
TRUST_JMP_INDIRECT_OK eax
mov Context.Reip[ebx], eax
mov eax, ExceptionContinueExecution ; signal to Windows: "return to caller" (we've revised the PC to go to Exception handler)
leave
ret
TopLevelEHFilter_end:
unknown_exception:
<print registers, etc. here>

"DEP for standard system processes" won't help you; it's internally known as "OptIn". What you need is the IMAGE_DLLCHARACTERISTICS_NX_COMPAT flag set in the PE header of your .exe file, or a call to the SetProcessDEPPolicy function in kernel32.dll. SetProcessMitigationPolicy would also be good, but it isn't available until Windows 8.
There's some nice explanation on Ed Maurer's blog, which explains both how .NET uses DEP (which you won't care about) but also the system rules (which you do).
BIOS settings can also affect whether hardware NX is available.

x86 assembly (masm32) - Can I use int 21h on windows xp to print things?

Just wondering, in regards to my post Alternatives to built-in Macros, is it possible to avoid using the StdOut macro by using the int 21h windows API? Such as:
.data
msg dd 'This will be displayed'
;original macro usage:
invoke StdOut, addr msg
;what I want to know will work
push msg
int 21h ; If this does what I think it does, it should print msg
Does such a thing exist (as in using int 21h to print things), or does something like it exist, but not exactly int 21h. Or am I completely wrong.
Could someone clarify this for me?
Thanks,
Progrmr

The interrupt 21h was the entry point for MS-DOS functions.
For example to print something on stdout you have to:
mov ah, 09h ; Required ms-dos function
mov dx, msg ; Address of the text to print
int 21h ; Call the MS-DOS API entry-point
The string must be terminated with the '$' character.
But:
You cannot use interrupts in a Windows desktop application (they're available only to device drivers).
You must write a 16-bit application if you need to call MS-DOS functions.
So yes, you can't use it to print messages from a Win32 program, and nothing like it exists: you have to call OS functions to print your messages, and they are not available via interrupts.

DOS interrupts cannot be used in protected mode on Windows.
You can use the WriteFile Win32 API function to write to the console, or use the MASM macro instead.

The other answers saying that you cannot use interrupts in Windows are quite wrong. If you really want, you can (that's not recommended). At least on 32-bit x86 Windows there's the legacy int 2Eh-based interface for system calls. See e.g. this page for a bit of discussion of system call mechanisms on x86 and x86_64 Windows.
Here's a very simple example (compiled with FASM) of a program, which immediately exits on Windows 7 using int 0x2e (and crashes on most other versions):
format PE
NtTerminateProcess_Wind7=0x172
entry $
; First call terminates all threads except caller thread, see for details:
; http://www.rohitab.com/discuss/topic/41523-windows-process-termination/
mov eax, NtTerminateProcess_Wind7
mov edx, terminateParams
int 0x2e
; Second call terminates current process
mov eax, NtTerminateProcess_Wind7
mov edx, terminateParams
int 0x2e
ud2 ; crash if we failed to terminate
terminateParams:
dd 0, 0 ; processHandle, exitStatus
Do note, though, that this is an unsupported way of using Windows: the system call numbers change between Windows versions and in general can't be relied on. On this page you can see that e.g. NtCreateFile on Windows XP is system call number 0x25, while already on Windows Server 2003 this number corresponds to NtCreateEvent, and on Vista it's NtAlpcRevokeSecurityContext.
The supported (albeit sparsely documented) way of making system calls is through the functions of the Native API library, ntdll.dll.
But even if you use the Native API, "printing things" is still very version-dependent. Namely, if you have a redirect to file, you must use NtWriteFile, but when writing to a true console window, you have to use LPC, where the target process depends on Windows version.

Endless loop with assembler

OK, I'm new to PC Assembler. I'm trying to write a program, but it won't stop looping. I'm guessing the ECX register is being modified? How can I fix this? Thanks.
DATA SECTION
;
KEEP DD 0 ;temporary place to keep things
;
CODE SECTION
;
START:
MOV ECX,12
TOPOFLOOP:
PUSH -11 ;STD_OUTPUT_HANDLE
CALL GetStdHandle ;get, in eax, handle to active screen buffer
PUSH 0,ADDR KEEP ;KEEP receives output from API
PUSH 5,'bruce' ;5=length of string
PUSH EAX ;handle to active screen buffer
CALL WriteFile
XOR EAX,EAX ;return eax=0 as preferred by Windows
LOOP TOPOFLOOP
ENDLABEL:
RET

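In most x86 calling conventions, including the stdcall convention used by Windows API functions, ECX is a caller-saved register: the called function is not required to make sure the value of the register is the same when it returns as when it was called. LOOP decrements ECX and jumps while it is non-zero, so if GetStdHandle or WriteFile overwrites ECX the counter never reliably reaches zero. You have to save it somewhere safe in your own code, for example by pushing it before the calls and popping it just before the LOOP.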

Does Mutex call a system call?

CRITICAL_SECTION locking (enter) and unlocking (leave) are efficient because CS testing is performed in user space without making the kernel system call that a mutex makes. Unlocking is performed entirely in user space, whereas ReleaseMutex requires a system call.
I just read these sentences in this book.
What does "kernel system call" mean? Could you give me the function's name?
I'm an English newbie. I interpreted them like this:
CS testing doesn't use a system call.
Mutex testing uses a system call. (But I don't know the function name. Please let me know.)
CS unlocking doesn't make a system call.
Mutex unlocking requires a system call. (But I don't know the function name. Please let me know.)
Another question.
I think CRITICAL_SECTION might call WaitForSingleObject or one of its family of functions. Don't these functions require a system call? I guess they do. So the claim that CS testing doesn't use a system call seems very strange to me.

The implementation of critical sections in Windows has changed over the years, but it has always been a combination of user-mode and kernel calls.
The CRITICAL_SECTION is a structure that contains values updated in user mode, a handle to a kernel-mode object (an EVENT or something like that), and debug information.
EnterCriticalSection uses an interlocked test-and-set operation to acquire the lock. If successful, this is all that is required (almost; it also updates the owner thread). If the test-and-set operation fails to acquire the lock, a longer path is used, which usually requires waiting on a kernel object with WaitForSingleObject. If you initialized with InitializeCriticalSectionAndSpinCount, then EnterCriticalSection may spin and retry the acquire using interlocked operations in user mode.
Below is a disassembly of the "fast" / uncontended path of EnterCriticalSection in Windows 7 (64-bit) with some comments inline:
0:000> u rtlentercriticalsection rtlentercriticalsection+35
ntdll!RtlEnterCriticalSection:
00000000`77ae2fc0 fff3 push rbx
00000000`77ae2fc2 4883ec20 sub rsp,20h
; RCX points to the critical section rcx+8 is the LockCount
00000000`77ae2fc6 f00fba710800 lock btr dword ptr [rcx+8],0
00000000`77ae2fcc 488bd9 mov rbx,rcx
00000000`77ae2fcf 0f83e9b1ffff jae ntdll!RtlEnterCriticalSection+0x31 (00000000`77ade1be)
; got the critical section - update the owner thread and recursion count
00000000`77ae2fd5 65488b042530000000 mov rax,qword ptr gs:[30h]
00000000`77ae2fde 488b4848 mov rcx,qword ptr [rax+48h]
00000000`77ae2fe2 c7430c01000000 mov dword ptr [rbx+0Ch],1
00000000`77ae2fe9 33c0 xor eax,eax
00000000`77ae2feb 48894b10 mov qword ptr [rbx+10h],rcx
00000000`77ae2fef 4883c420 add rsp,20h
00000000`77ae2ff3 5b pop rbx
00000000`77ae2ff4 c3 ret
So the bottom line is that if the thread does not need to block, it will not use a system call, just an interlocked test-and-set operation. If blocking is required, there will be a system call. The release path also uses an interlocked test-and-set and may require a system call if other threads are blocked.
Compare this to a Mutex, which always requires a system call: NtWaitForSingleObject to acquire and NtReleaseMutant to release.
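The fast path described above can be sketched in portable C11. This is a toy under stated assumptions, not the real CRITICAL_SECTION layout or ntdll's code: the names toy_lock, toy_init, toy_try_enter and toy_leave are hypothetical, and a compare-and-swap stands in for the lock btr seen in the disassembly. The slow path (spinning or blocking on a kernel object) is only counted, not implemented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock; this is NOT the real CRITICAL_SECTION layout. */
typedef struct {
    atomic_int lock_bit;   /* 0 = free, 1 = held */
    atomic_int contended;  /* how many attempts missed the fast path */
} toy_lock;

static void toy_init(toy_lock *l)
{
    atomic_init(&l->lock_bit, 0);
    atomic_init(&l->contended, 0);
}

/* Fast path: one interlocked compare-and-swap, entirely in user mode.
   Returns true if the lock was free and is now held by the caller. */
static bool toy_try_enter(toy_lock *l)
{
    int expected = 0;
    if (atomic_compare_exchange_strong(&l->lock_bit, &expected, 1))
        return true;
    /* Slow-path placeholder: a real lock would spin or block in the kernel. */
    atomic_fetch_add(&l->contended, 1);
    return false;
}

/* Release: a single atomic exchange; no kernel call when nobody waits. */
static void toy_leave(toy_lock *l)
{
    int was_held = atomic_exchange(&l->lock_bit, 0);
    assert(was_held == 1);  /* leaving a lock that was not held is a bug */
    (void)was_held;
}
```

An uncontended enter/leave pair here never leaves user mode, which is the property the quoted book attributes to critical sections; only a failed toy_try_enter would need the kernel's help to wait.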

Calling into the kernel requires a transition from user mode to kernel mode, which incurs a small (but measurable) performance hit every time. The function in question is ReleaseMutex() itself.
The critical section functions are available in kernel32.dll (at least from the caller's point of view - see comments for discussion about ntdll.dll) and can often avoid making any calls into the kernel.
It is worthwhile to know that Mutex objects can be accessed from different processes at the same time. On the other hand, CRITICAL_SECTION objects are limited to one process.

To my knowledge critical sections are implemented using semaphores.
The critical section functions are implemented in NTDLL, which implements some runtime functions in user mode and passes control to the kernel for others (system call). The functions in kernel32.dll are simple function forwarders.
Mutexes on the other hand are kernel objects and require a system call as such. The kernel calls them "mutants", by the way (no joke).

Critical section calls only transition to kernel mode if there is contention and only then if they can't relieve the contention by spinning. In that case the thread blocks and calls a wait function – that's a system call.
