Creating dialog templates in memory on 64-bit architectures

Dialog templates for DialogBoxIndirect() can also be constructed in memory. MSDN actually has very detailed instructions on how to do this (see here).
However, there is something in Microsoft's sample code that looks problematic from a 64-bit perspective. It's a function called lpwAlign() which seems to take a pointer and align it to a DWORD boundary. The function looks like this:
LPWORD lpwAlign(LPWORD lpIn)
{
ULONG ul;
ul = (ULONG)lpIn;
ul ++;
ul >>=1;
ul <<=1;
return (LPWORD)ul;
}
AFAICS, when compiled on a 64-bit system, this will cast a 64-bit pointer to a 32-bit integer, pad that integer to a multiple of 4 and then return it as a 64-bit pointer. So this looks like something that will crash as soon as pointer values higher than 2^32 are involved. So how should this code be adapted to work with 64-bit?

Replace ULONG with ULONG_PTR to avoid the 32 bit truncation.
EDIT: Also note that the sample code posted on MSDN and quoted in the OP is actually wrong because it aligns the pointer on a WORD boundary, not a DWORD boundary. It should look like this instead:
LPWORD lpwAlign(LPWORD lpIn)
{
ULONG_PTR ul;
ul = (ULONG_PTR)lpIn;
ul += 3;
ul >>= 2;
ul <<= 2;
return (LPWORD)ul;
}
(taken from here)
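The same rounding can be written in portable C++ with a mask instead of shifts. This is only a sketch: the names align_up_value and align_up are illustrative and not part of the MSDN sample, and std::uintptr_t stands in for ULONG_PTR as a pointer-sized integer.

```cpp
#include <cassert>
#include <cstdint>

// Round an address up to the next multiple of `alignment`.
// `alignment` must be a power of two (4 for a DWORD boundary).
inline std::uintptr_t align_up_value(std::uintptr_t v, std::uintptr_t alignment = 4) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (v + alignment - 1) & ~(alignment - 1);
}

// Pointer wrapper. std::uintptr_t is pointer-sized on both 32-bit and
// 64-bit builds, so no address bits are lost in the round trip.
inline void* align_up(void* p, std::uintptr_t alignment = 4) {
    return reinterpret_cast<void*>(
        align_up_value(reinterpret_cast<std::uintptr_t>(p), alignment));
}
```

An address that is already aligned is returned unchanged, which matches the behavior of the add-3, shift-right-2, shift-left-2 sequence above.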

Related

Trap memory accesses inside a standard executable built with MinGW

So my problem is this.
I have some platform-dependent code (for an embedded system) that writes to MMIO locations hardcoded at specific addresses.
I compile this code together with some management code into a standard executable, mainly for testing but also for simulation (because it takes longer to find basic bugs on the actual HW platform).
To work around the hardcoded pointers, I just redefine them to point at variables inside a memory pool. This works really well.
The problem is that some of the MMIO locations have specific hardware behavior (W1C, for example), which makes "correct" testing hard to impossible.
These are the solutions I thought of:
1 - Somehow redefine the accesses to those registers and insert an immediate function to simulate the dynamic behavior. This is not really usable, since there are various ways to write to the MMIO locations (pointers and so on).
2 - Leave the addresses hardcoded and trap the illegal access through a seg fault, find the location that triggered it, work out exactly where the access was made, handle it and return. I am not really sure how this would work (or even whether it is possible).
3 - Use some sort of emulation. This will surely work, but it defeats the whole purpose of running fast and native on a standard computer.
4 - Virtualization? This will probably take a lot of time to implement. I am not sure the gain justifies it.
Does anyone have any idea whether this can be accomplished without going too deep? Maybe there is a way to make the compiler define a memory area for which every access generates a callback. I am not really an expert in x86/gcc matters.
Edit: It seems that it is not really possible to do this in a platform-independent way, and since it will be Windows only, I will use the available API (which seems to work as expected). I found this question here:
Is set single step trap available on win 7?
I will put the whole "simulated" register file inside a number of pages, guard them, and trigger a callback from which I will extract all the necessary info, do my work, and then continue execution.
Thanks all for responding.

I think #2 is the best approach. I routinely use approach #4, but I use it to test code that is running in the kernel, so I need a layer below the kernel to trap and emulate the accesses. Since you have already put your code into a user-mode application, #2 should be simpler.
The answers to this question may provide help in implementing #2. How to write a signal handler to catch SIGSEGV?
What you really want to do, though, is to emulate the memory access and then have the segv handler return to the instruction after the access. This sample code works on Linux. I'm not sure if the behavior it is taking advantage of is undefined, though.
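#include <stdint.h>
#include <stdio.h>
#include <signal.h>
#include <ucontext.h> // ucontext_t; REG_RAX/REG_RIP need _GNU_SOURCE (g++ defines it by default)
#define REG_ADDR ((volatile uint32_t *)0x12340000f000ULL)
static uint32_t read_reg(volatile uint32_t *reg_addr)
{
    uint32_t r;
    // Pin the address to %rdi ("D") and the result to %eax ("a") so the load
    // always assembles to the 2-byte "mov (%rdi), %eax" that the handler skips.
    asm volatile("mov (%1), %0" : "=a"(r) : "D"(reg_addr));
    return r;
}
static void segv_handler(int, siginfo_t *, void *);
int main()
{
    struct sigaction action = { 0, };
    action.sa_sigaction = segv_handler;
    action.sa_flags = SA_SIGINFO; // three-argument handler that receives the context
    sigaction(SIGSEGV, &action, NULL);
    // force sigsegv
    uint32_t a = read_reg(REG_ADDR);
    printf("after segv, a = %u\n", a);
    return 0;
}
static void segv_handler(int, siginfo_t *info, void *ucontext_arg)
{
    ucontext_t *ucontext = static_cast<ucontext_t *>(ucontext_arg);
    ucontext->uc_mcontext.gregs[REG_RAX] = 1234; // emulated register value, delivered in %eax
    ucontext->uc_mcontext.gregs[REG_RIP] += 2;   // resume after the 2-byte mov
}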
The code to read the register is written in assembly to ensure that both the destination register and the length of the instruction are known.

This is what a Windows version of prl's answer could look like:
#include <stdint.h>
#include <stdio.h>
#include <windows.h>
#define REG_ADDR ((volatile uint32_t *)0x12340000f000ULL)
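static uint32_t read_reg(volatile uint32_t *reg_addr)
{
    uint32_t r;
    // Pin the address to %rdi ("D") and the result to %eax ("a") so the load
    // is always the 2-byte "mov (%rdi), %eax" that the handler skips.
    asm volatile("mov (%1), %0" : "=a"(r) : "D"(reg_addr));
    return r;
}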
static LONG WINAPI segv_handler(EXCEPTION_POINTERS *);
int main()
{
SetUnhandledExceptionFilter(segv_handler);
// force an access violation
uint32_t a = read_reg(REG_ADDR);
printf("after segv, a = %u\n", a);
return 0;
}
static LONG WINAPI segv_handler(EXCEPTION_POINTERS *ep)
{
// only handle read access violation of REG_ADDR
if (ep->ExceptionRecord->ExceptionCode != EXCEPTION_ACCESS_VIOLATION ||
ep->ExceptionRecord->ExceptionInformation[0] != 0 ||
ep->ExceptionRecord->ExceptionInformation[1] != (ULONG_PTR)REG_ADDR)
return EXCEPTION_CONTINUE_SEARCH;
ep->ContextRecord->Rax = 1234;
ep->ContextRecord->Rip += 2;
return EXCEPTION_CONTINUE_EXECUTION;
}

So, the solution (code snippet) is as follows:
First of all, I have a variable:
__attribute__ ((aligned (4096))) int g_test;
Second, inside my main function, I do the following:
AddVectoredExceptionHandler(1, VectoredHandler);
DWORD old;
VirtualProtect(&g_test, 4096, PAGE_READWRITE | PAGE_GUARD, &old);
The handler looks like this:
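LONG WINAPI VectoredHandler(struct _EXCEPTION_POINTERS *ExceptionInfo)
{
    static ULONG_PTR last_addr; /* pointer-sized, so the address is not truncated on 64-bit */
    if (ExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_GUARD_PAGE_VIOLATION) {
        /* ExceptionInformation[1] holds the address that was accessed */
        last_addr = ExceptionInfo->ExceptionRecord->ExceptionInformation[1];
        ExceptionInfo->ContextRecord->EFlags |= 0x100; /* Trap flag: single step to trigger the next one */
        return EXCEPTION_CONTINUE_EXECUTION;
    }
    if (ExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_SINGLE_STEP) {
        DWORD old;
        /* Re-arm the guard on the page containing last_addr. PAGE_MASK was not
           defined in the original snippet; this assumes it meant the low 12
           offset bits of a 4 KiB page. */
        VirtualProtect((PVOID)(last_addr & ~(ULONG_PTR)0xFFF), 4096, PAGE_READWRITE | PAGE_GUARD, &old);
        return EXCEPTION_CONTINUE_EXECUTION;
    }
    return EXCEPTION_CONTINUE_SEARCH;
}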
This is only a basic skeleton of the functionality. Basically, I guard the page on which the variable resides, and I keep linked lists that hold the callback function pointers and values for each address in question. I check that the faulting address is in my list, and then I trigger the callback.
On the first guard hit the system removes the page protection, but at that point I can call my PRE_WRITE callback and save the variable's state. Because a single step is requested through EFlags, a single-step exception follows immediately (which means that the variable has been written), and I can trigger a WRITE callback. All the data required for the operation is contained in the ExceptionInformation array.
When someone writes to that variable:
*(int *)&g_test = 1;
a PRE_WRITE followed by a WRITE is triggered.
When I do:
int x = *(int *)&g_test;
a READ is issued.
In this way I can manipulate the data flow without modifying the original source code.
Note: This is intended to be used as part of a test framework, and any performance penalty is deemed acceptable.
For example, a W1C (write 1 to clear) operation can be implemented like this:
void MYREG_hook(reg_cbk_t type)
{
/** We need to save the pre-write state
* This is safe since we are assured to be called with
* both PRE_WRITE and WRITE in the correct order
*/
static int pre;
switch (type) {
case REG_READ: /* Called pre-read */
break;
case REG_PRE_WRITE: /* Called pre-write */
pre = g_test;
break;
case REG_WRITE: /* Called after write */
g_test = pre & ~g_test; /* W1C */
break;
default:
break;
}
}
This was also possible with seg faults on illegal addresses, but I had to raise one for each read/write and keep track of a "virtual register file", so the penalty was bigger. With guard pages I can guard only specific areas of memory, or none at all, depending on the registered monitors.
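The W1C arithmetic used in the hook can be checked in isolation. The following is a minimal sketch of the register semantics only; W1CRegister and its members are hypothetical names for illustration and are not part of the framework described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a single write-1-to-clear status register.
struct W1CRegister {
    std::uint32_t value = 0;

    // "Hardware" side: latch status bits.
    void set_bits(std::uint32_t mask) { value |= mask; }

    // "Software" write: every 1 bit in `written` clears the matching
    // status bit; 0 bits leave the register untouched.
    void write(std::uint32_t written) { value &= ~written; }
};
```

In the hook above the raw store has already replaced g_test with the written value by the time REG_WRITE runs, so pre & ~g_test computes the same result as value & ~written here.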

Why can't I get process IDs greater than 65535 with NtQuerySystemInformation on Win7 64-bit?

I use NtQuerySystemInformation to get all the handle information, like this:
NtQuerySystemInformation(SystemHandleInformation, pHandleInfor, ulSize,NULL);//SystemHandleInformation = 16
The struct behind pHandleInfor is:
typedef struct _SYSTEM_HANDLE_INFORMATION
{
ULONG ProcessId;
UCHAR ObjectTypeNumber;
UCHAR Flags;
USHORT Handle;
PVOID Object;
ACCESS_MASK GrantedAccess;
} SYSTEM_HANDLE_INFORMATION, *PSYSTEM_HANDLE_INFORMATION;
It works well on XP 32-bit, but on Win7 64-bit it only returns the right PID for processes whose ID is less than 65535. The type of ProcessId in this struct is ULONG, so I would expect it to hold values above 65535. What's wrong with it? Is there another API I can use instead?

There are two enum values for NtQuerySystemInformation to get handle info:
CNST_SYSTEM_HANDLE_INFORMATION = 16
CNST_SYSTEM_EXTENDED_HANDLE_INFORMATION = 64
And correspondingly two structs: SYSTEM_HANDLE_INFORMATION and SYSTEM_HANDLE_INFORMATION_EX.
The definitions for these structs are:
struct SYSTEM_HANDLE_INFORMATION
{
    unsigned short UniqueProcessId; // only 16 bits wide
    unsigned short CreatorBackTraceIndex;
    unsigned char ObjectTypeIndex;
    unsigned char HandleAttributes; // 0x01 = PROTECT_FROM_CLOSE, 0x02 = INHERIT
    unsigned short HandleValue;
    size_t Object;
    int GrantedAccess;
};
struct SYSTEM_HANDLE_INFORMATION_EX
{
    size_t Object;
    size_t UniqueProcessId; // pointer-sized, holds the full process id
    size_t HandleValue;
    int GrantedAccess;
    short CreatorBackTraceIndex;
    short ObjectTypeIndex;
    int HandleAttributes;
    int Reserved;
};
As you can see, the first struct really can only hold 16-bit process IDs.
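To make the effect concrete, here is a small sketch of what happens when a larger process ID is squeezed into a 16-bit field. The helper name store_in_16_bits is made up for this illustration and is not part of any Windows API.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the low 16 bits of a process id, the way a USHORT-sized
// field would. Everything above bit 15 is silently dropped.
inline std::uint16_t store_in_16_bits(std::uint32_t pid) {
    return static_cast<std::uint16_t>(pid);
}
```

A PID of 70000 comes back as 70000 - 65536 = 4464, which looks like a valid ID but belongs to a different (or nonexistent) process.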
See, for example, the Process Hacker project's source file ntexapi.h for more information.
Note also that the field widths for SYSTEM_HANDLE_INFORMATION_EX in my struct definitions might differ from theirs (in my definition some field widths vary with the bitness), but I believe I tested the code under both 32-bit and 64-bit and found it to be correct.
Please recheck if necessary and let us know if you have additional info.

From Raymond Chen's article Processes, commit, RAM, threads, and how high can you go?:
I later learned that the Windows NT folks do try to keep the numerical values of process ID from getting too big. Earlier this century, the kernel team experimented with letting the numbers get really huge, in order to reduce the rate at which process IDs get reused, but they had to go back to small numbers, not for any technical reasons, but because people complained that the large process IDs looked ugly in Task Manager. (One customer even asked if something was wrong with his computer.)

BDS 2006 C++ hidden memory manager conflicts (class new / delete[] vs. AnsiString)

I have been using BDS 2006 Turbo C++ for a long time now, and some of my bigger projects (CAD/CAM, 3D gfx engines and astronomical computations) occasionally throw an exception (for example, once in 3-12 months of 24/7 heavy-duty usage). After extensive debugging I found this:
//code1:
struct _s { int i; }; // any struct
_s *s=new _s[1024]; // dynamic allocation
delete[] s; // free up memory
This code usually sits inside a template where _s can also be a class, so delete[] should work properly. But delete[] does not work properly for structs (classes look OK). No exception is thrown and the memory is freed, but it somehow damages the memory manager's allocation tables, and after this any new allocation can be wrong (new can return allocations that overlap already allocated space, or even unallocated space, hence the occasional exceptions).
I have found that if I add an empty destructor to _s, then suddenly everything seems OK:
struct _s { int i; ~_s(){} };
Well, now comes the weird part. After I applied this to my projects, I found that the AnsiString class also has bad reallocations. For example:
//code2:
int i;
_s *dat=new _s[1024];
AnsiString txt="";
// setting of dat
for (i=0;i<1024;i++) txt+="bla bla bla\r\n";
// usage of dat
delete[] dat;
In this code dat contains some useful data; later a txt string is built by appending lines, so txt must be reallocated a few times, and sometimes the dat data is overwritten by txt (even though they do not overlap; I think the temporary AnsiString needed to reallocate txt overlaps with dat).
So my questions are:
Am I doing something wrong in code1, code2 ?
Is there any way to avoid AnsiString (re)allocation errors ? (but still using it)
After extensive debugging (after posting question 2) I have found that AnsiString does not cause the problems; they only show up while it is being used. The real problem is probably in switching between OpenGL clients. I have Open/Save dialogs with a preview for vector graphics. If I disable OpenGL usage for these VCL sub-windows, the AnsiString memory management errors disappear completely. I am not sure what the problem is (an incompatibility between MFC/VCL windows or, more likely, a mistake of mine in switching contexts; I will investigate further). The OpenGL windows concerned are:
main VCL Form + OpenGL inside Canvas client area
child of main MFC Open/Save dialog + docked preview VCL Form + OpenGL inside Canvas client area
P.S.
These errors depend on the number of new/delete/delete[] calls, not on the allocated sizes.
Both the code1 and code2 errors are repeatable (for example, I have a parser that loads a complex ini file, and the error occurs on the same line as long as the ini is not changed).
I detect these errors only in big projects (plain source code > 1 MB) that combine AnsiString with templates doing internal dynamic allocations, but it is possible that they also exist in simpler projects and occur so rarely that I miss them.
Specs of the affected projects:
Win32, no-install standalone (on Win7 SP1 x64, but XP SP3 x32 behaves the same)
it does not matter whether GDI or OpenGL/GLSL is used
it does not matter whether device driver DLLs are used or not
no OCX or nonstandard VCL components
no DirectX
1-byte aligned compilation/link
does not use RTL, packages or frameworks (standalone)
Sorry for the bad English/grammar ...
Any help / conclusion / suggestion is appreciated.

After extensive debugging I finally isolated the problem.
The memory management of BDS 2006 Turbo C++ becomes corrupt after you call any delete on an already deleted pointer. For example:
BYTE *dat=new BYTE[10],*tmp=dat;
delete[] dat;
delete[] tmp;
After this the memory management is not reliable ('new' can return already allocated space).
Of course, deleting the same pointer twice is a bug on the programmer's side, but I have found the real cause of all my problems, which produces this situation without any obvious bug in the source code. See this code:
//---------------------------------------------------------------------------
class test
{
public:
int siz;
BYTE *dat;
test()
{
siz=10;
dat=new BYTE[siz];
}
~test()
{
delete[] dat; // <- add breakpoint here
siz=0;
dat=NULL;
}
test& operator = (const test& x)
{
int i;
for (i=0;i<siz;i++) if (i<x.siz) dat[i]=x.dat[i];
for ( ;i<siz;i++) dat[i]=0;
return *this;
}
};
//---------------------------------------------------------------------------
test get()
{
test a;
return a; // here call a.~test();
} // here second call a.~test();
//---------------------------------------------------------------------------
void main()
{
get();
}
//---------------------------------------------------------------------------
In function get() the destructor is called twice: once for the real a and once for its copy. Both objects hold the same dat pointer, because I forgot to create the copy constructor
test::test(test &x);
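For comparison, here is a minimal sketch of the same kind of class with a deep-copying copy constructor and assignment operator, so that every object owns and frees its own array. Buffer and make_buffer are hypothetical stand-ins for test and get(), and unlike the original operator=, this assignment resizes the target to match the source. It is written in standard C++ and has not been tried with the BDS 2006 compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

class Buffer {
public:
    std::size_t siz;
    unsigned char* dat;

    explicit Buffer(std::size_t n = 10) : siz(n), dat(new unsigned char[n]()) {}

    // Copy constructor: allocate a separate array instead of sharing x.dat.
    Buffer(const Buffer& x) : siz(x.siz), dat(new unsigned char[x.siz]) {
        std::copy(x.dat, x.dat + x.siz, dat);
    }

    // Copy assignment: build a copy first, then swap it in, so the old
    // array is released exactly once by tmp's destructor.
    Buffer& operator=(const Buffer& x) {
        if (this != &x) {
            Buffer tmp(x);
            std::swap(siz, tmp.siz);
            std::swap(dat, tmp.dat);
        }
        return *this;
    }

    ~Buffer() { delete[] dat; }
};

// Returning by value is now safe: any copy made gets its own array.
inline Buffer make_buffer() {
    Buffer a;
    a.dat[0] = 42;
    return a;
}
```

With this in place each destructor call releases a distinct allocation, so the double delete described above cannot happen through copying.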
[Edit1] further upgrades of code
OK, I have refined the initialization code for classes, structs and even templates to fix even more bug cases. Add this code to any struct/class/template and, if needed, add functionality to it:
T() {}
T(const T& a) { *this=a; }
~T() {}
T* operator = (const T *a) { *this=*a; return this; }
//T* operator = (const T &a) { ...copy... return this; }
T is the struct/class name.
The last (commented-out) operator is needed only if T uses dynamic allocations internally; if it does not, you can leave the code as is.
This also resolves other compiler issues like this one:
Too many initializers error for a simple array in bcc32
If anyone has similar problems, I hope this helps.
Also look at traceback a pointer in c++ code mmap if you need to debug your memory allocations...

How do we clear the console in assembly?

I am looking for a win32 api function that clears the console, much like the cls command
Thanks!
Devjeet

This is pretty old, but should still work. Conversion to assembly language is left as an exercise for the reader, but shouldn't be terribly difficult (most of it is just function calls, and the multiplication is trivial):
#include <windows.h>
void clear_screen(char fill = ' ') {
COORD tl = {0,0};
CONSOLE_SCREEN_BUFFER_INFO s;
HANDLE console = GetStdHandle(STD_OUTPUT_HANDLE);
GetConsoleScreenBufferInfo(console, &s);
DWORD written, cells = s.dwSize.X * s.dwSize.Y;
FillConsoleOutputCharacter(console, fill, cells, tl, &written);
FillConsoleOutputAttribute(console, s.wAttributes, cells, tl, &written);
SetConsoleCursorPosition(console, tl);
}

There is no Win32 API which directly clears the console - you need to use something like FillConsoleOutputCharacter.

Weird P/Invoke issue on Win 7 x64

I'm P/Invoking to CreateRectRgn in gdi32.dll. The normal P/Invoke signature for this function is:
[DllImport("gdi32", SetLastError=true)]
static extern IntPtr CreateRectRgn(int nLeft, int nTop, int nRight, int nBottom);
As a shortcut, I've also defined this overload:
[DllImport("gdi32", SetLastError=true)]
static extern IntPtr CreateRectRgn(RECT rc);
[StructLayout(LayoutKind.Sequential)]
struct RECT{
public int left;
public int top;
public int right;
public int bottom;
}
(Yes, I am aware of CreateRectRgnIndirect, but since I must use functions to convert between System.Drawing.Rectangle and this RECT structure, the above is more useful to me, as it doesn't involve an intermediate variable.)
This overload should work identically to the normal signature, since it should put the stack in an identical state at entry to CreateRectRgn. And indeed, on Windows XP, 32-bit, it works flawlessly. But on Windows 7, 64-bit, the function returns zero, and Marshal.GetLastWin32Error() returns 87, which is "The parameter is incorrect."
Any ideas as to what could be the problem?

Oh. The calling convention Microsoft uses on x64 is totally different from STDCALL. In the call to CreateRectRgn, the stack isn't used for the parameters at all, they're all passed in registers. When I try to pass a RECT structure, it makes a copy of the structure on the stack, and puts a pointer to this copy in a register. Therefore, this little trick won't work at all in 64-bit Windows. Now I've got to go through all my interop code and find other places I've done this and take them all out.
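The size rule behind this can be sketched in a few lines of C++. Under the Microsoft x64 convention an aggregate is passed by value in a register only when its size is 1, 2, 4 or 8 bytes; anything else is passed as a pointer to a caller-made copy. The names Rect and passed_by_pointer_on_win64 are illustrative only, and the predicate is a model of the rule, not a query of the actual compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same layout as the RECT in the question: four 32-bit integers.
struct Rect {
    std::int32_t left, top, right, bottom;
};

// Model of the Microsoft x64 rule: structs of 1, 2, 4 or 8 bytes travel
// in a register like an integer; all other sizes are passed by pointer.
constexpr bool passed_by_pointer_on_win64(std::size_t size) {
    return !(size == 1 || size == 2 || size == 4 || size == 8);
}

static_assert(sizeof(Rect) == 16, "four 32-bit fields");
static_assert(passed_by_pointer_on_win64(sizeof(Rect)),
              "a 16-byte RECT does not fit in one register");
```

So the native function receives one pointer-sized argument where it expects four integers, which is consistent with the "parameter is incorrect" error.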
