Windows 7 added the functions GetActiveProcessorCount and GetMaximumProcessorCount, but MSDN doesn't say much about them. As far as I can tell, both functions always return the same value, which is the number of logical CPUs in the specified processor group or the system as a whole. So what makes a processor "active" or "inactive"?
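For reference, a minimal snippet to compare the two on a live system might look like this (ALL_PROCESSOR_GROUPS asks about the machine as a whole; both functions require Windows 7):

    #define _WIN32_WINNT 0x0601  /* Windows 7, where these functions appeared */
    #include <stdio.h>
    #include <windows.h>

    int main(void)
    {
        /* ALL_PROCESSOR_GROUPS queries the system as a whole rather
           than a single processor group. */
        printf("active:  %lu\n", GetActiveProcessorCount(ALL_PROCESSOR_GROUPS));
        printf("maximum: %lu\n", GetMaximumProcessorCount(ALL_PROCESSOR_GROUPS));
        return 0;
    }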
Related
Basically, this is the same question that was asked here.
When performing kernel debugging of a machine running Windows 7 or older, with WinDbg version 6.2 and up, the debugger doesn't show anything in the registers window. Pressing the Customize... button results in a message box that reads Registers are not yet known.
At the same time, issuing the r command results in perfectly valid register values being printed out.
What is the reason for this behaviour, and can it be fixed?
TL;DR: I wrote an extension DLL that fixes the bug. Available here.
The Problem
To understand the problem, we first need to understand that WinDbg is basically just a frontend to Microsoft's Windows Symbolic Debugger Engine, implemented inside dbgeng.dll. Other frontends include the command-line kd.exe (kernel debugger) and cdb.exe (user-mode debugger).
The engine implements everything we expect from a debugger: working with symbol files, reading and writing memory and registers, setting breakpoints, etc. The engine then exposes all of this functionality through COM-like interfaces (they implement IUnknown but are not registered components). This allows us, for instance, to write our own debugger (like this person did).
Armed with this knowledge, we can now make an educated guess as to how WinDbg obtains the values of the registers on the target machine.
The engine exposes the IDebugRegisters interface for manipulating registers. This interface declares the GetValues method for retrieving the values of multiple registers in one go. But how does WinDbg know how many registers there are? That's why we have the GetNumberRegisters method.
So, to retrieve the values of all registers on the target, we'll have to do something like this (see the sketch after the list):
Call IDebugRegisters::GetNumberRegisters to get the total number of registers.
Call IDebugRegisters::GetValues with the Count parameter set to the total number of registers, the Indices parameter set to NULL, and the Start parameter set to 0.
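In code, the two calls might look roughly like this. This is a sketch using the C bindings from dbgeng.h, and it assumes regs is an IDebugRegisters pointer already obtained from the engine (e.g. via IDebugClient::QueryInterface):

    #include <stdlib.h>
    #include <windows.h>
    #include <dbgeng.h>

    /* Sketch: fetch every register value on the target in one go. */
    HRESULT GetAllRegisterValues(IDebugRegisters *regs)
    {
        ULONG count = 0;
        DEBUG_VALUE *values;
        HRESULT hr = regs->lpVtbl->GetNumberRegisters(regs, &count);
        if (FAILED(hr))
            return hr;

        values = calloc(count, sizeof(DEBUG_VALUE));
        if (!values)
            return E_OUTOFMEMORY;

        /* Indices == NULL and Start == 0 means "registers 0 .. Count-1". */
        hr = regs->lpVtbl->GetValues(regs, count, NULL, 0, values);

        /* ... use the values ... */
        free(values);
        return hr;
    }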
One tiny problem, though: the second call fails with E_INVALIDARG.
Ehm, excuse me? How can it fail? Especially puzzling is the documentation for this return value:
The value of the index of one of the registers is greater than the number of registers on the target machine.
But I just asked you how many registers there are, so how can that value be out of range? Okay, let's continue reading the docs anyway, maybe something will become clear:
If the return value is not S_OK, some of the registers still might have been read. If the target was not accessible, the return type is E_UNEXPECTED and Values is unchanged; otherwise, Values will contain partial results and the registers that could not be read will have type DEBUG_VALUE_INVALID.
(Emphasis mine.)
Aha! So maybe the engine just couldn't read one of the registers! But which one? Turns out that the engine chokes on the xcr0 register. From the Intel 64 and IA-32 Architectures Software Developer’s Manual:
Extended control register XCR0 contains a state-component bitmap that specifies the user state components that software has enabled the XSAVE feature set to manage. If the bit corresponding to a state component is clear in XCR0, instructions in the XSAVE feature set will not operate on that state component, regardless of the value of the instruction mask.
Okay, so the register controls the operation of the XSAVE instruction, which saves the state of the CPU's extended features (like XMM and AVX). According to the last comment on this page, this instruction requires some support from the operating system. Although the comment states that Windows 7 (which is what the VM I was testing on was running) does support this instruction, it seems that the issue at hand is related to the OS anyway, since everything works fine when the target is Windows 8.
Really, it's unclear whether the bug is within the debugger engine, which reports more registers than it can retrieve values for, or within WinDbg, which refuses to show any values at all if the engine fails to produce all of them.
The Solution
We could, of course, bite the bullet and just use an older version of WinDbg for debugging older Windows versions. But where's the challenge in that?
Instead, I present to you a debugger extension that solves this problem. It does so by hooking (with the help of this library) the relevant debugger engine methods and returning S_OK if the only register that failed was xcr0. Otherwise, it propagates the failure. The extension supports runtime unload, so if you experience problems you can always disable the hooks.
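The extension's actual source is at the link above, but the core idea of the hook might be sketched like this. Note that g_OriginalGetValues and g_XcrIndex are hypothetical names for this example; the real index of xcr0 could be looked up once with IDebugRegisters::GetIndexByName:

    /* Hypothetical hook installed over IDebugRegisters::GetValues. */
    static HRESULT (STDMETHODCALLTYPE *g_OriginalGetValues)(
        IDebugRegisters *, ULONG, PULONG, ULONG, PDEBUG_VALUE);
    static ULONG g_XcrIndex; /* index of xcr0, found via GetIndexByName */

    static HRESULT STDMETHODCALLTYPE HookedGetValues(
        IDebugRegisters *regs, ULONG count, PULONG indices,
        ULONG start, PDEBUG_VALUE values)
    {
        ULONG i, index;
        HRESULT hr = g_OriginalGetValues(regs, count, indices, start, values);
        if (hr != E_INVALIDARG)
            return hr;

        /* Per the docs, on this failure Values holds partial results and
           the unreadable registers are marked DEBUG_VALUE_INVALID. */
        for (i = 0; i < count; i++) {
            index = indices ? indices[i] : start + i;
            if (values[i].Type == DEBUG_VALUE_INVALID && index != g_XcrIndex)
                return hr; /* some other register failed: propagate */
        }
        return S_OK; /* only xcr0 failed: pretend everything is fine */
    }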
That's it, have fun!
I need to use a unique computer ID for the purpose of software licensing. I decided to use the CPU flags. On MSVC they are retrieved with the function __cpuid, and on GCC version 4.3 and up with the function __get_cpuid. I get an integer out of these functions, which is essentially a bit array, to be used as the unique ID.
What I'm not sure about is whether the CPU flags retrieved with the above functions can ever change. Can those flags be changed programmatically by the user? If not by a regular application, maybe through the BIOS?
Thank you.
No, an end user cannot change them, because each of the functions you listed is essentially a wrapper for an actual processor instruction, cpuid, that is provided on Intel (and Intel-compatible) chips.
So this information is 'burned' into the silicon. No user can change it.
The following resources might be helpful:
1) Wikipedia's article on CPUID
2) Code guru article (2 pages) on accessing processor info using a call to CPUID
3) Table listing many processors by stepping, family, and model numbers
OK, after some testing I can confirm that the flags from the 2nd byte of Info Type 1 do change. So I will stick with the Stepping ID, Model, Family, and Processor Type values only.
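A sketch of extracting only those fields, under the assumption that masking EAX of leaf 1 down to stepping/model/family/type (plus the extended model/family bits) is the intended ID:

    #include <stdio.h>
    #include <stdint.h>
    #if defined(_MSC_VER)
    #  include <intrin.h>
    #else
    #  include <cpuid.h>
    #endif

    /* EAX of CPUID leaf 1: stepping (bits 3:0), model (7:4), family (11:8),
       processor type (13:12), extended model (19:16), extended family (27:20).
       The mask below drops the reserved bits. */
    static uint32_t stable_cpu_signature(void)
    {
        uint32_t eax = 0;
    #if defined(_MSC_VER)
        int info[4];
        __cpuid(info, 1);
        eax = (uint32_t)info[0];
    #else
        uint32_t ebx, ecx, edx;
        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
            return 0; /* leaf 1 not supported: extremely unlikely */
    #endif
        return eax & 0x0FFF3FFFu;
    }

    int main(void)
    {
        printf("CPU signature: 0x%08X\n", (unsigned)stable_cpu_signature());
        return 0;
    }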
Hi, I've looked around for an answer to this question, and I am wondering if anyone with experience in Windows internals knows whether the kernel will ever assign a process ID that is the same as a thread ID. What I mean is: say there is a process a.exe that I have started, which has a thread with ID 123. If another process is started, for example b.exe, can its process ID be 123? In other words, do process and thread identifiers ever collide? Thanks
EDIT: It appears that process and thread IDs come from the same pool, called the PspCidTable. A hacker named Polynomial, who reviewed the Windows NT source, says the following:
The kernel needs to be able to generate a sequence of process and thread IDs that are unique across the whole system. To efficiently and safely do this, the kernel creates a pool of IDs that can be used for both processes and threads. This pool is exported in the kernel as a HANDLE_TABLE object called PspCidTable. During Phase0 startup of the system, the PspInitPhase0 function is called. This function creates a HANDLE_TABLE object using ExCreateHandleTable, which automatically populates the table with 65536 entries. Each entry is a 16-bit unsigned integer (at least it is on a 32-bit OS) stored inside a list item object that is part of a doubly linked list. Both process and thread IDs come from the PspCidTable pool.
Source for above: Stuff you (probably) didn't know about Windows
The PspCidTable still exists in Windows XP, and empirical observations on Windows 7 lead me to believe the above is still true.
Thread and process IDs come from the same pool in all versions of Windows AFAIK, but that does not mean this will be true forever. In practice it should not matter at all, since you should only pass things that you know are thread IDs to OpenThread, and vice versa.
Don't assume other things about these IDs either: they are not 16-bit. They might seem like they are on NT, but it is possible to get IDs > 0xffff. (On Win9x they are XORed with a secret and often use the full 32 bits.)
The only weird thing you should keep in the back of your mind is that on 64-bit systems they are 32-bit in user mode and pointer-sized in kernel mode (use HandleToUlong/UlongToHandle).
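A tiny illustration of that last point, using the documented helpers from basetsd.h (which windows.h pulls in):

    #include <windows.h>

    /* IDs are 32-bit DWORDs in user mode, but kernel mode stores them as
       pointer-sized HANDLEs; convert between the two views with the helpers. */
    static void id_roundtrip(void)
    {
        DWORD tid   = GetCurrentThreadId();  /* 32-bit user-mode view        */
        HANDLE wide = UlongToHandle(tid);    /* pointer-sized, zero-extended */
        DWORD back  = HandleToUlong(wide);   /* and back to 32 bits          */
        (void)back;
    }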
How to find size of a semaphore object in windows?
I tried using sizeof(), but we cannot give the name of the semaphore object as an argument to sizeof; it has to be the handle. sizeof(HANDLE) gives us the size of the handle, not the semaphore.
This is what is known as an "opaque handle". There is no way to know how big it really is, what it contains, or how any of the functions work internally. This gives Microsoft the ability to completely rewrite the implementation with each new version of Windows if they want to, without worrying about breaking existing code. It's a similar concept to having a public and private interface to a class. Since we are not working on the Windows kernel, we only get to see the public interface.
Update:
It might be possible to get a rough idea of how big they are by creating a bunch of them and monitoring what happens to your memory usage in Process Explorer. However, since there is a good chance that they live in the kernel and not in user space, they might not show up at all. In any case, there are no guarantees about any other version of Windows, past or future, including patches/service packs.
It's something "hidden" from you. You can't say how big it is. And it's a kernel object, so it probably doesn't even live in your address space. It's like asking "how big is the Process Table?", or "how many MB is Windows wasting?".
I'll add that I have made a small test on my Windows 7 32-bit machine: 100000 kernel semaphores (named X{number}, with 0 <= number < 100000): 4 MB of kernel memory and 8 MB of user space (both measured with Task Manager). That's about 40 bytes per semaphore in kernel space and 80 bytes per semaphore in user space! (This is in Win32; on 64-bit it'll probably double.)
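For anyone who wants to reproduce the measurement, the test can be sketched roughly like this (the exact numbers will of course vary by Windows version):

    #include <stdio.h>
    #include <windows.h>

    /* Create 100000 named semaphores and pause, so that kernel and user
       memory usage can be inspected in Task Manager / Process Explorer. */
    int main(void)
    {
        enum { N = 100000 };
        static HANDLE handles[N];
        char name[32];
        int i;

        for (i = 0; i < N; i++) {
            sprintf(name, "X%d", i);
            handles[i] = CreateSemaphoreA(NULL, 0, 1, name);
            if (handles[i] == NULL) {
                printf("failed at %d (error %lu)\n", i, GetLastError());
                break;
            }
        }
        getchar(); /* measure memory usage now; exiting cleans everything up */
        return 0;
    }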
I'm writing a kernel module which uses a customized print-on-screen system. Basically, each time a print occurs, the string is inserted into a linked list.
Every X seconds I need to process the list and perform some operations on the strings before printing them.
Basically I have two choices to implement such a filter:
1) A timer (which re-arms itself at the end of its callback)
2) A kernel thread which sleeps for X seconds
While the filter is performing its work nothing else can use the linked list and, of course, while a string is being inserted the filter function must wait.
AFAIK a timer runs in interrupt context, so it cannot sleep, but what about kernel threads? Can they sleep? If so, is there any reason not to use them in my project? What other solution could be used?
To summarize: my filter function has only 3 requirements:
1) Must be able to printk
2) While it is using the list, everything else trying to access the list must block until the filter function finishes execution
3) Must run every X seconds (not a realtime requirement)
kthreads are allowed to sleep. (However, not all kthreads offer sleepful execution to all clients. softirqd for example would not.)
But then again, you could also use spinlocks (and their associated cost) and do without the extra thread (that's basically what the timer approach does: it uses spin_lock_bh). It's a tradeoff really.
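To make the kthread option concrete, a minimal sketch might look like this (msg_list, msg_lock, and X_SECONDS are placeholders for whatever the module already uses):

    #include <linux/kthread.h>
    #include <linux/delay.h>
    #include <linux/spinlock.h>
    #include <linux/list.h>

    #define X_SECONDS 5 /* placeholder for the module's actual period */

    static LIST_HEAD(msg_list);       /* filled by the print path */
    static DEFINE_SPINLOCK(msg_lock); /* guards msg_list          */

    static int filter_fn(void *data)
    {
        while (!kthread_should_stop()) {
            LIST_HEAD(local);

            /* Steal the whole list under the lock, then filter and
               printk the entries without holding it. */
            spin_lock_bh(&msg_lock);
            list_splice_init(&msg_list, &local);
            spin_unlock_bh(&msg_lock);

            /* ... process and printk the entries on 'local' ... */

            ssleep(X_SECONDS); /* sleeping is fine in a kthread */
        }
        return 0;
    }

    /* Started from module init with something like:
     *   filter_task = kthread_run(filter_fn, NULL, "print_filter");
     */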
each time a print is involved the string is inserted into a linked list
I don't really know if you meant print or printk. But if you're talking about printk(), you would need to allocate memory, and then you are in trouble, because printk() may be called in an atomic context. That leaves you the option of using a circular buffer (and thus you should tolerate dropping some strings, because you might not have enough memory to save them all).
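If the circular-buffer route is taken, kfifo is the usual building block; a rough sketch (with the caveat that concurrent producers still need a lock, or kfifo_in_spinlocked()):

    #include <linux/kfifo.h>

    /* Fixed-size buffer, so nothing is allocated at print time and the
       call is safe from atomic context; overflow silently drops data. */
    static DEFINE_KFIFO(print_fifo, char, 4096); /* size: power of 2 */

    static void my_print(const char *s, unsigned int len)
    {
        /* kfifo_in() copies as much as fits and returns that count;
           here we simply accept that the rest is dropped. */
        kfifo_in(&print_fifo, s, len);
    }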
Every X seconds I need to process the list and perform some operations on the strings before printing them.
In that case, I would not even use a kernel thread: I would do the processing in print() if it's not too costly.
Otherwise, I would create a new system call:
sys_get_strings() or something, that would dump the whole linked list into userspace (and remove entries from the list as they are copied).
This way the whole behavior is controlled from userspace. You could create a daemon that calls the syscall every X seconds. You could also do all the costly processing in userspace.
You could also create a new device, say /dev/print-on-screen (sketched after this list):
dev_open would allocate the memory, and print() would no longer be a no-op, but would feed the data into the device's pre-allocated memory (in case print() is used in atomic context and such)
dev_release would throw everything out
dev_read would get you the strings
dev_write could do something on your print-on-screen system
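A skeleton of that device, using a miscdevice for brevity (all the buffer management is elided, and the names are made up for the example):

    #include <linux/fs.h>
    #include <linux/miscdevice.h>
    #include <linux/module.h>

    static ssize_t pos_read(struct file *f, char __user *buf,
                            size_t len, loff_t *off)
    {
        /* copy_to_user() the buffered strings here, removing them
           from the buffer once delivered */
        return 0;
    }

    static int pos_open(struct inode *ino, struct file *f)
    {
        /* pre-allocate the memory that print() will feed into */
        return 0;
    }

    static int pos_release(struct inode *ino, struct file *f)
    {
        /* throw everything out */
        return 0;
    }

    static const struct file_operations pos_fops = {
        .owner   = THIS_MODULE,
        .open    = pos_open,
        .release = pos_release,
        .read    = pos_read,
    };

    static struct miscdevice pos_dev = {
        .minor = MISC_DYNAMIC_MINOR,
        .name  = "print-on-screen", /* appears as /dev/print-on-screen */
        .fops  = &pos_fops,
    };

    /* misc_register(&pos_dev) in module init, misc_deregister() on exit */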