Confusing ReturnLength from Windows GetLogicalProcessorInformationEx function

Confusing ReturnLength from Windows GetLogicalProcessorInformationEx function - windows

I'm trying to use the (fairly new) GetLogicalProcessorInformationEx function in Windows. The ReturnLength it gives isn't making sense.
The older GetLogicalProcessorInformation gives reasonable results...
ReturnLength = 0;
Result = GetLogicalProcessorInformation(NULL, &ReturnLength);
printf("GLPI (%d): %d %d\n",
Result,
sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION),
ReturnLength);
Here's the output (2-core, 64-bit, Win7 box): GLPI (0): 32 416
In other words, the function will populate the buffer I pass with 416/32=13 SYSTEM_LOGICAL_PROCESSOR_INFORMATION structures.
For GetLogicalProcessorInformationEx, here's my call...
ReturnLength = 0;
Result = GetLogicalProcessorInformationEx(RelationProcessorCore,
NULL, &ReturnLength);
printf("GLPIX (%d): %d %d %d\n",
Result,
sizeof(PROCESSOR_RELATIONSHIP),
sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX),
ReturnLength);
Here's the output (2-core, 64-bit, Win7 box): GLPIX (0): 40 80 96
The Microsoft docs (http://msdn.microsoft.com/en-us/library/windows/desktop/dd405488(v=vs.85).aspx) indicate that the function will return either PROCESSOR_RELATIONSHIP or SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX structures, depending on the value of the first argument. ReturnLength suggests it isn't going to return either, though - 96 isn't divisible by sizeof(PROCESSOR_RELATIONSHIP) or sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX).
I also tried RelationAll for the first argument, and that gave a ReturnLength of 768 - also not a multiple or 40 or 80.
Can anyone shed any light?

You'll need to trust what the function returns you. Necessarily so, the structures in the union have an unpredictable size. Particularly this member of PROCESSOR_RELATIONSHIP:
GROUP_AFFINITY GroupMask[ANYSIZE_ARRAY];
The ANYSIZE_ARRAY macro is the hint, that says that the size of the GroupMask array is variable and depends on the value of the GroupCount member. Using sizeof on the structure never gives you the correct size, it will be too low. Be sure to use the returned size to allocate the storage for the struct, like this:
SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX* buf =
(SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX*)malloc(ReturnLength);
This pattern is otherwise common in C and the winapi.

Related

EnumProcessModules returns 998

I am trying to run this code in Rust (2021 version):
let module_list_size: PDWORD = ptr::null_mut();
res = winapi::um::psapi::EnumProcessModules(remote_handle, ptr::null_mut(), 0, module_list_size);
Res is well defined and the handle is valid (I checked it before) yet I'm still getting windows error 998 which is invalid access (I'm running this code as admin).
(The function exists and I imported it correctly).
Thank you in advance!

The last parameter is a pointer that indicates where to write how many bytes are needed to store all the module handles. But you're pointing at null, so it'll fail with an invalid access error when it tries to give you the result.
Instead, make a DWORD variable and pass a pointer to it:
let module_list_size: DWORD = 0;
res = winapi::um::psapi::EnumProcessModules(remote_handle, ptr::null_mut(), 0, &mut module_list_size);

Why XFetchBuffer() returns null instead of clipboard?

int N, atom;
atom = XInternAtom (display, "CLIPBOARD", false);
char *c = XFetchBuffer(display, &N, atom);
The code above supposed to get the string from the clipboard, but it only returns null. And N is 0 as well.

XFetchBuffer works with cut buffers, not with the clipboard. Cut buffers are hardly ever used these days. Note the argument XFetchBuffer accepts is not an Atom but an integer. These are not the same thing.
If you need the clipboard, you need to follow ICCCM and write lots more code.

Windows RegQueryValueEx odd return results

I am getting odd results when using RegQueryValueEx and I cannot figure out why.
This is what I had set up before making the RegQueryValueEx
DWORD dataSize;
TCHAR data[256];
The first time I call
LONG ret = RegQueryValueEx( hKey, dataKey, NULL, NULL, (LPBYTE)data, &dataSize);
ret is equal to 234 (ERROR_MORE_DATA)
But when I call the same thing on the next line
LONG ret2 = RegQueryValueEx( hKey, dataKey, NULL, NULL, (LPBYTE)data, &dataSize);
ret2 is equal to 0 (ERROR_SUCCESS)
Why would this function return ERROR_MORE_DATA the first time I call it, then return ERROR_SUCESS on the same call on the very next line?
I attempted to change TCHAR data[1024] but I got the exact same results. Any ideas?
Complete code:
for( int i=0; i<NUM_HISTORY; i++){
CString dataKey = getDataKey(i);
DWORD dataSize = 1024;
TCHAR data[1024];
LONG ret = RegQueryValueEx( hKey, dataKey, NULL, NULL, (LPBYTE)data, &dataSize);
LONG ret2 = RegQueryValueEx( hKey, dataKey, NULL, NULL, (LPBYTE)data, &dataSize);
// Breakpoint to see what ret and ret2 are equal to
int j = 0;
}

This is by design. The first call failed because you specified as size that was too small. But what you didn't count on is that it also updated your dataSize variable. To tell you how much memory to allocate so the call can succeed.
So the second call succeeded since you now specify a size that's exactly correct. But without doing the other thing you needed to do, actually make the buffer bigger. Nothing much good can happen when the call then causes a buffer overflow and corrupt your stack frame, be sure to use the /RTC compile option so you'll get a runtime error from that.
You avoided this problem by increasing the buffer size from 256 to 1024. But your code is still incorrect, your program will fail miserably if the registry value ever gets larger than 1024 bytes. Don't use a local array, use the new operator or malloc() to allocate the buffer so it can never fail like this. Or simply fail the call and declare "bad data".
Also note another bug, the dataSize is in bytes but the buffer is TCHAR, not a byte. Which is probably why you didn't corrupt the stack frame, the buffer was big enough by accident. You don't want to rely on accidents like this. Consider a helper class like CRegKey to avoid these kind of mistakes.

How to put my structure variable into CPU caches to eliminate main memory page access time? Options

It's clear that there is no explicit way or certain system calls that
help programmers to put a variable into the CPU cache.
But I think that a certain programming style or well designed
algorithm can make it possible to increase the possibilities that the
variable can be cached into the CPU caches.
Here is my example:
I want to append an 8 byte structure at the end of an array consisting
of the same type of structures, declared in the global main memory
region.
This process is continuously repeated for 4 million operations. This process takes 6 seconds, 1.5 us for each operation. I think this result tells that the two memory areas have not been cached.
I got some clues from a cache-oblivious algorithm, so I tried several
ways to enhance this. Until now, no enhancement.
I think some clever codes can reduce the elapsed time, up to 10 to 100
times. Please show me the way.
-------------------------------------------------------------------------
Appended (2011-04-01)
Damon~ thank you for your comment!
After reading your comment, I analyzed my code again, and found several things
that I missed. The following code that I attached is the abbreviated version of my original code.
To accurately measure each operation's execution time (in the original code, there are several different types of operations), I inserted the time measuring code using clock_gettime() function. I thought if I measure each operation's execution time and accumulate them, the additional cost by the main loop can be avoided.
In the original code, the time measuring code was hidden by a macro function, so I totally forgot about it.
The running time of this code is almost 6 seconds. But if I get rid of the time measuring function in the main loop, it becomes 0.1 seconds.
Since the clock_gettime() function supports very high precision (upto 1 nano second), executed on the basis of an independent thread, and also it requires very big structure,
I think the function caused the cache-out of the main memory area where the consecutive insertions are performed.
Thank you again for your comment. For further enhancement, any suggestion will be very helpful for me to optimize my code.
I think the hierachically defined structure variable might cause unnecessary time cost,
but first I want to know how much it would be, before I change it to the more C-style code.
typedef struct t_ptr {
uint32 isleaf :1, isNextLeaf :1, ptr :30;
t_ptr(void) {
isleaf = false;
isNextLeaf = false;
ptr = NIL;
}
} PTR;
typedef struct t_key {
uint32 op :1, key :31;
t_key(void) {
op = OP_INS;
key = 0;
}
} KEY;
typedef struct t_key_pair {
KEY key;
PTR ptr;
t_key_pair() {
}
t_key_pair(KEY k, PTR p) {
key = k;
ptr = p;
}
} KeyPair;
typedef struct t_op {
KeyPair keyPair;
uint seq;
t_op() {
seq = 0;
}
} OP;
#define MAX_OP_LEN 4000000
typedef struct t_opq {
OP ops[MAX_OP_LEN];
int freeOffset;
int globalSeq;
bool queueOp(register KeyPair keyPair);
} OpQueue;
bool OpQueue::queueOp(register KeyPair keyPair) {
bool isFull = false;
if (freeOffset == (int) (MAX_OP_LEN - 1)) {
isFull = true;
}
ops[freeOffset].keyPair = keyPair;
ops[freeOffset].seq = globalSeq++;
freeOffset++;
}
OpQueue opQueue;
#include <sys/time.h>
int main() {
struct timespec startTime, endTime, totalTime;
for(int i = 0; i < 4000000; i++) {
clock_gettime(CLOCK_REALTIME, &startTime);
opQueue.queueOp(KeyPair());
clock_gettime(CLOCK_REALTIME, &endTime);
totalTime.tv_sec += (endTime.tv_sec - startTime.tv_sec);
totalTime.tv_nsec += (endTime.tv_nsec - startTime.tv_nsec);
}
printf("\n elapsed time: %ld", totalTime.tv_sec * 1000000LL + totalTime.tv_nsec / 1000L);
}

YOU don't put the structure into any cache. The CPU does that automatically for you. The CPU is even more clever than that; if you access sequential memory, it will start putting things from memory into the cache before you read them.
And really, it should be common sense that for a simple bit of code like this, the time you spend on measuring is ten times more than the time to perform the code (apparently 60 times in your case).
Since you put so much confidence in clock_gettime (): I suggest you call it five times in a row and store the results, then print the differences. There's resolution, there's precision, and there's how long it takes to return the current time, which is pretty damned long.

I have been unable to force caching, but you can force memory to be uncache-able. If you have large other datastructures you might exclude these so that they will not pollute your caches. This can be done by specifying PAGE_NOCACHE for the Windows VirutalAllocXXX functions.
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366786(v=vs.85).aspx

How can I calculate the complete buffer size for GetModuleFileName?

The GetModuleFileName() takes a buffer and size of buffer as input; however its return value can only tell us how many characters is has copied, and if the size is not enough (ERROR_INSUFFICIENT_BUFFER).
How do I determine the real required buffer size to hold entire file name for GetModuleFileName()?
Most people use MAX_PATH but I remember the path can exceed that (260 by default definition)...
(The trick of using zero as size of buffer does not work for this API - I've already tried before)

The usual recipe is to call it setting the size to zero and it is guaranteed to fail and provide the size needed to allocate sufficient buffer. Allocate a buffer (don't forget room for nul-termination) and call it a second time.
In a lot of cases MAX_PATH is sufficient because many of the file systems restrict the total length of a path name. However, it is possible to construct legal and useful file names that exceed MAX_PATH, so it is probably good advice to query for the required buffer.
Don't forget to eventually return the buffer from the allocator that provided it.
Edit: Francis points out in a comment that the usual recipe doesn't work for GetModuleFileName(). Unfortunately, Francis is absolutely right on that point, and my only excuse is that I didn't go look it up to verify before providing a "usual" solution.
I don't know what the author of that API was thinking, except that it is possible that when it was introduced, MAX_PATH really was the largest possible path, making the correct recipe easy. Simply do all file name manipulation in a buffer of length no less than MAX_PATH characters.
Oh, yeah, don't forget that path names since 1995 or so allow Unicode characters. Because Unicode takes more room, any path name can be preceeded by \\?\ to explicitly request that the MAX_PATH restriction on its byte length be dropped for that name. This complicates the question.
MSDN has this to say about path length in the article titled File Names, Paths, and Namespaces:
Maximum Path Length
In the Windows API (with some
exceptions discussed in the following
paragraphs), the maximum length for a
path is MAX_PATH, which is defined as
260 characters. A local path is
structured in the following order:
drive letter, colon, backslash,
components separated by backslashes,
and a terminating null character. For
example, the maximum path on drive D
is "D:\<some 256 character path
string><NUL>" where "<NUL>" represents
the invisible terminating null
character for the current system
codepage. (The characters < > are used
here for visual clarity and cannot be
part of a valid path string.)
Note File I/O functions in the
Windows API convert "/" to "\" as part
of converting the name to an NT-style
name, except when using the "\\?\"
prefix as detailed in the following
sections.
The Windows API has many functions
that also have Unicode versions to
permit an extended-length path for a
maximum total path length of 32,767
characters. This type of path is
composed of components separated by
backslashes, each up to the value
returned in the
lpMaximumComponentLength parameter of
the GetVolumeInformation function. To
specify an extended-length path, use
the "\\?\" prefix. For example,
"\\?\D:\<very long path>". (The
characters < > are used here for
visual clarity and cannot be part of a
valid path string.)
Note The maximum path of 32,767
characters is approximate, because the
"\\?\" prefix may be expanded to a
longer string by the system at run
time, and this expansion applies to
the total length.
The "\\?\" prefix can also be used
with paths constructed according to
the universal naming convention (UNC).
To specify such a path using UNC, use
the "\\?\UNC\" prefix. For example,
"\\?\UNC\server\share", where "server"
is the name of the machine and "share"
is the name of the shared folder.
These prefixes are not used as part of
the path itself. They indicate that
the path should be passed to the
system with minimal modification,
which means that you cannot use
forward slashes to represent path
separators, or a period to represent
the current directory. Also, you
cannot use the "\\?\" prefix with a
relative path, therefore relative
paths are limited to MAX_PATH
characters as previously stated for
paths not using the "\\?\" prefix.
When using an API to create a
directory, the specified path cannot
be so long that you cannot append an
8.3 file name (that is, the directory name cannot exceed MAX_PATH minus 12).
The shell and the file system have
different requirements. It is possible
to create a path with the Windows API
that the shell user interface might
not be able to handle.
So an easy answer would be to allocate a buffer of size MAX_PATH, retrieve the name and check for errors. If it fit, you are done. Otherwise, if it begins with "\\?\", get a buffer of size 64KB or so (the phrase "maximum path of 32,767 characters is approximate" above is a tad troubling here so I'm leaving some details for further study) and try again.
Overflowing MAX_PATH but not beginning with "\\?\" appears to be a "can't happen" case. Again, what to do then is a detail you'll have to deal with.
There may also be some confusion over what the path length limit is for a network name which begins "\\Server\Share\", not to mention names from the kernel object name space which begin with "\\.\". The above article does not say, and I'm not certain about whether this API could return such a path.

Implement some reasonable strategy for growing the buffer like start with MAX_PATH, then make each successive size 1,5 times (or 2 times for less iterations) bigger then the previous one. Iterate until the function succeeds.

Using
extern char* _pgmptr
might work.
From the documentation of GetModuleFileName:
The global variable _pgmptr is automatically initialized to the full path of the executable file, and can be used to retrieve the full path name of an executable file.
But if I read about _pgmptr:
When a program is not run from the command line, _pgmptr might be initialized to the program name (the file's base name without the file name extension) or to a file name, relative path, or full path.
Anyone who knows how _pgmptr is initialized? If SO had support for follow-up questions I would posted this question as a follow up.

While the API is proof of bad design, the solution is actually very simple. Simple, yet sad it has to be this way, for it's somewhat of a performance hog as it might require multiple memory allocations. Here is some keypoints to the solution:
You can't really rely on the return value between different Windows-versions as it can have different semantics on different Windows-versions (XP for example).
If the supplied buffer is too small to hold the string, the return value is the amount of characters including the 0-terminator.
If the supplied buffer is large enough to hold the string, the return value is the amount of characters excluding the 0-terminator.
This means that if the returned value exactly equals the buffer size, you still don't know whether it succeeded or not. There might be more data. Or not. In the end you can only be certain of success if the buffer length is actually greater than required. Sadly...
So, the solution is to start off with a small buffer. We then call GetModuleFileName passing the exact buffer length (in TCHARs) and comparing the return result with it. If the return result is less than our buffer length, it succeeded. If the return result is greater than or equal to our buffer length, we have to try again with a larger buffer. Rinse and repeat until done. When done we make a string copy (strdup/wcsdup/tcsdup) of the buffer, clean up, and return the string copy. This string will have the right allocation size rather than the likely overhead from our temporary buffer. Note that the caller is responsible for freeing the returned string (strdup/wcsdup/tcsdup mallocs memory).
See below for an implementation and usage code example. I have been using this code for over a decade now, including in enterprise document management software where there can be a lot of quite long paths. The code can ofcourse be optimized in various ways, for example by first loading the returned string into a local buffer (TCHAR buf[256]). If that buffer is too small you can then start the dynamic allocation loop. Other optimizations are possible but that's beyond the scope here.
Implementation and usage example:
/* Ensure Win32 API Unicode setting is in sync with CRT Unicode setting */
#if defined(_UNICODE) && !defined(UNICODE)
# define UNICODE
#elif defined(UNICODE) && !defined(_UNICODE)
# define _UNICODE
#endif
#include <stdio.h> /* not needed for our function, just for printf */
#include <tchar.h>
#include <windows.h>
LPCTSTR GetMainModulePath(void)
{
TCHAR* buf = NULL;
DWORD bufLen = 256;
DWORD retLen;
while (32768 >= bufLen)
{
if (!(buf = (TCHAR*)malloc(sizeof(TCHAR) * (size_t)bufLen))
{
/* Insufficient memory */
return NULL;
}
if (!(retLen = GetModuleFileName(NULL, buf, bufLen)))
{
/* GetModuleFileName failed */
free(buf);
return NULL;
}
else if (bufLen > retLen)
{
/* Success */
LPCTSTR result = _tcsdup(buf); /* Caller should free returned pointer */
free(buf);
return result;
}
free(buf);
bufLen <<= 1;
}
/* Path too long */
return NULL;
}
int main(int argc, char* argv[])
{
LPCTSTR path;
if (!(path = GetMainModulePath()))
{
/* Insufficient memory or path too long */
return 0;
}
_tprintf("%s\n", path);
free(path); /* GetMainModulePath malloced memory using _tcsdup */
return 0;
}
Having said all that, I like to point out you need to be very aware of various other caveats with GetModuleFileName(Ex). There are varying issues between 32/64-bit/WOW64. Also the output is not necessarily a full, long path, but could very well be a short-filename or be subject to path aliasing. I expect when you use such a function that the goal is to provide the caller with a useable, reliable full, long path, therefor I suggest to indeed ensure to return a useable, reliable, full, long absolute path, in such a way that it is portable between various Windows-versions and architectures (again 32/64-bit/WOW64). How to do that efficiently is beyond the scope here.
While this is one of the worst Win32 APIs in existance, I wish you alot of coding joy nonetheless.

My example is a concrete implementation of the "if at first you don't succeed, double the length of the buffer" approach. It retrieves the path of the executable that is running, using a string (actually a wstring, since I want to be able to handle Unicode) as the buffer. To determine when it has successfully retrieved the full path, it checks the value returned from GetModuleFileNameW against the value returned by wstring::length(), then uses that value to resize the final string in order to strip the extra null characters. If it fails, it returns an empty string.
inline std::wstring getPathToExecutableW()
{
static const size_t INITIAL_BUFFER_SIZE = MAX_PATH;
static const size_t MAX_ITERATIONS = 7;
std::wstring ret;
DWORD bufferSize = INITIAL_BUFFER_SIZE;
for (size_t iterations = 0; iterations < MAX_ITERATIONS; ++iterations)
{
ret.resize(bufferSize);
DWORD charsReturned = GetModuleFileNameW(NULL, &ret[0], bufferSize);
if (charsReturned < ret.length())
{
ret.resize(charsReturned);
return ret;
}
else
{
bufferSize *= 2;
}
}
return L"";
}

Here is a another solution with std::wstring:
DWORD getCurrentProcessBinaryFile(std::wstring& outPath)
{
// #see https://msdn.microsoft.com/en-us/magazine/mt238407.aspx
DWORD dwError = 0;
DWORD dwResult = 0;
DWORD dwSize = MAX_PATH;
SetLastError(0);
while (dwSize <= 32768) {
outPath.resize(dwSize);
dwResult = GetModuleFileName(0, &outPath[0], dwSize);
dwError = GetLastError();
/* if function has failed there is nothing we can do */
if (0 == dwResult) {
return dwError;
}
/* check if buffer was too small and string was truncated */
if (ERROR_INSUFFICIENT_BUFFER == dwError) {
dwSize *= 2;
dwError = 0;
continue;
}
/* finally we received the result string */
outPath.resize(dwResult);
return 0;
}
return ERROR_BUFFER_OVERFLOW;
}

Windows cannot handle properly paths longer than 260 characters, so just use MAX_PATH.
You cannot run a program having path longer than MAX_PATH.

My approach to this is to use argv, assuming you only want to get the filename of the running program. When you try to get the filename from a different module, the only secure way to do this without any other tricks is described already, an implementation can be found here.
// assume argv is there and a char** array
int nAllocCharCount = 1024;
int nBufSize = argv[0][0] ? strlen((char *) argv[0]) : nAllocCharCount;
TCHAR * pszCompleteFilePath = new TCHAR[nBufSize+1];
nBufSize = GetModuleFileName(NULL, (TCHAR*)pszCompleteFilePath, nBufSize);
if (!argv[0][0])
{
// resize memory until enough is available
while (GetLastError() == ERROR_INSUFFICIENT_BUFFER)
{
delete[] pszCompleteFilePath;
nBufSize += nAllocCharCount;
pszCompleteFilePath = new TCHAR[nBufSize+1];
nBufSize = GetModuleFileName(NULL, (TCHAR*)pszCompleteFilePath, nBufSize);
}
TCHAR * pTmp = pszCompleteFilePath;
pszCompleteFilePath = new TCHAR[nBufSize+1];
memcpy_s((void*)pszCompleteFilePath, nBufSize*sizeof(TCHAR), pTmp, nBufSize*sizeof(TCHAR));
delete[] pTmp;
pTmp = NULL;
}
pszCompleteFilePath[nBufSize] = '\0';
// do work here
// variable 'pszCompleteFilePath' contains always the complete path now
// cleanup
delete[] pszCompleteFilePath;
pszCompleteFilePath = NULL;
I had no case where argv didn't contain the file path (Win32 and Win32-console application), yet. But just in case there is a fallback to a solution that has been described above. Seems a bit ugly to me, but still gets the job done.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio