Win32 Assembly - Extern function naming (The meaning of '@') - winapi

As far as I can see, external WinAPI functions in assembly code have names like _ExitProcess@4.
What is the meaning of the @4 part, and how do I determine what number to use after the @?
I know that this has something to do with the DLL we are linking against, but in many cases it's not clear what number to use after the @, and this leads to many nasty undefined reference errors.

As Andreas H's answer said, the number after the @ is the number of bytes the function removes from the stack before it returns. This means it should be easy to determine that number, since it's also the number of bytes you need to push on the stack to call the function correctly. It should be the number of PUSH instructions before the call multiplied by 4. In most cases this will also be the number of arguments passed to the function multiplied by 4; arguments wider than 4 bytes, such as 64-bit integers or doubles, take 8 bytes each.
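To make the arithmetic concrete, here is a minimal C sketch. It assumes 32-bit x86, where every argument occupies a whole number of 4-byte stack slots. The names stdcall_arg_bytes and stdcall_decorate are hypothetical helpers, not part of any Windows or MinGW API; they only reproduce the _Name@N pattern described above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Total stack bytes for a 32-bit stdcall call (hypothetical helper).
   Each argument is rounded up to a whole number of 4-byte stack slots
   before summing, e.g. a 1-byte char still occupies 4 bytes. */
size_t stdcall_arg_bytes(const size_t *arg_sizes, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += (arg_sizes[i] + 3) / 4 * 4; /* round up to a 4-byte slot */
    }
    return total;
}

/* Writes the decorated name "_<name>@<bytes>" into buf (hypothetical helper).
   Returns the snprintf result: the untruncated length, or negative on error. */
int stdcall_decorate(char *buf, size_t buflen, const char *name,
                     const size_t *arg_sizes, size_t count)
{
    return snprintf(buf, buflen, "_%s@%zu", name,
                    stdcall_arg_bytes(arg_sizes, count));
}
```

For example, ExitProcess takes a single 4-byte UINT, which gives _ExitProcess@4, and MessageBoxA takes four 4-byte arguments (an HWND, two LPCSTRs and a UINT), which gives _MessageBoxA@16.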
If you want to double-check that you've got the right number and you have Microsoft Visual Studio installed, you can find the decorated symbol name from the Developer Command Prompt like this:
C:\> dumpbin /headers kernel32.lib | find "ExitProcess"
Symbol name : _ExitProcess@4
Name : ExitProcess
If you're using the MinGW compiler tools to link your assembly code, you can do this instead:
C:\> nm C:\MinGW\lib\libkernel32.a | find "ExitProcess"
00000000 I __imp__ExitProcess@4
00000000 T _ExitProcess@4
You'll need to replace C:\MinGW with the directory where you installed MinGW.
Since not all Windows APIs reside in the kernel32 import library, you'll need to replace kernel32 with the name of the import library given in the Windows SDK documentation for the API function you want to link to. For example, with MessageBoxA you'd need to use user32.lib with Visual Studio and libuser32.a with MinGW instead.
Note that there are a few Windows APIs that don't use the stdcall calling convention. These are functions like wsprintf that take a variable number of arguments, which the stdcall calling convention doesn't support. These functions just have an underscore _ before their names, and no @ or number after. They also require the caller to remove the arguments from the stack.

The @ symbol, like the leading underscore, is part of the function name when the stdcall calling convention is specified for the function.
The number specifies the number of bytes the function removes from the stack.
The compiler generates this number.
The suffix is added so that a function is not accidentally called with the wrong calling convention, or through a prototype in the source code that specifies the wrong number or size of arguments. The intention is to turn what would otherwise be a program crash into a link error.
See also https://msdn.microsoft.com/de-de/library/zxk0tw93.aspx

If you want to get the number to use, make sure you have _NT_SYMBOL_PATH defined to the correct value.
Like:
srv*https://msdl.microsoft.com/download/symbols
or
srv*c:\MyServerSymbols*https://msdl.microsoft.com/download/symbols
For example (in cmd.exe, the Windows command line):
set _NT_SYMBOL_PATH=srv*https://msdl.microsoft.com/download/symbols
Then use:
dumpbin /exports /symbols kernel32.lib | findstr _ExitProcess@
You'll have to be in the directory where kernel32.lib is (or pass its full path). findstr is built into Windows, so no extra tools such as grep are needed.
You could also redirect the output to a file and then view it in your editor.

Related

How to specify a "clean" name of DLL exports?

I have defined a DLL-export as follows:
__declspec(dllexport)
DWORD WINAPI DllBootstrap(LPVOID addr) {
    return 0;
}
Now, using DUMPBIN, the symbol is displayed as follows:
1 0 0001100A ?DllBootstrap@@YGKPAX@Z = @ILT+5(?DllBootstrap@@YGKPAX@Z)
And this is how the memory looks in Visual Studio:
ยก}....ReflectDLL.dll.?DllBootstrap@@YGKPAX@Z..........................................
when inspecting PIMAGE_EXPORT_DIRECTORY.AddressOfNames.
What I need is a clean symbol, i.e., DUMPBIN should output something like:
1 0 0001100A DllBootstrap
and PIMAGE_EXPORT_DIRECTORY.AddressOfNames should point to:
DllBootstrap..........................................
How can I achieve this?
WIN32 BUILDS:
As @RbMm indicated, to retain your function name as-is with no name decoration, you must use a .DEF file (and remove the __declspec(dllexport) specifier). Create a DEF file with the lines below and either specify it with the /DEF linker option or add it to your Visual Studio project, where the linker will pick it up automatically:
EXPORTS
    DllBootstrap
If you don't want to deal with an external .DEF file and you will be using the Visual C++ compiler, the simplest way to limit decoration using just code is to declare your function with 'extern "C"'. This results in decoration consisting of a leading underscore plus an "@" followed by the arguments' byte count in decimal. For example, the following code:
extern "C" __declspec(dllexport)
DWORD WINAPI DllBootstrap(LPVOID addr) {
    return 0;
}
produces an exported name of:
_DllBootstrap@4
This is how stdcall functions are decorated when C++ name mangling is disabled with 'extern "C"'. NOTE: WINAPI maps to __stdcall. If you keep 'extern "C"' and change the convention to __cdecl, you won't get any decoration at all, but module entry points should generally remain stdcall, as in your sample.
If you still want to avoid a .DEF file, there is one last hack you can employ. Add the following line to your code:
#pragma comment(linker,"/EXPORT:DllBootstrap=_DllBootstrap@4")
This passes an option to the linker that creates a new, undecorated export name mapped to the decorated symbol. It isn't very clean, since the original decorated name will still exist in your DLL, but you will get your clean exported name.
WIN64 BUILDS (UPDATE):
As Hans Passant commented, for anyone using the Visual C++ 64-bit compiler, there is only one calling convention, the x64 convention (the stdcall, cdecl, etc. keywords are ignored). While C++ name mangling still occurs under this compiler, no additional decoration is applied to the exported names. In this case, 'extern "C"' is enough when the sample is compiled as C++ code; if it is compiled as C, no modifications are necessary.

Compiling Fortran external symbols

When compiling Fortran code into object files, how does the compiler determine the symbol names?
When I use the intrinsic function "getarg", the compiler converts it into a symbol called "_getarg@12".
I looked in the external libraries and found that the symbol name inside is "_getarg@16". What is the significance of the "@[number]" at the end of "getarg"?
_name@length is highly Windows-specific name mangling applied to the names of routines that obey the stdcall calling convention (__stdcall, after the keyword used in C), a variant of the Pascal calling convention. This is the calling convention used by nearly all Win32 API functions, and if you look at the import libraries for DLLs like KERNEL32.DLL and USER32.DLL (kernel32.lib, user32.lib) you'd see that their symbols are named like this.
The _...@length decoration gives the number of bytes occupied by the routine's arguments. This is necessary because in the stdcall calling convention it is the callee that cleans up the arguments from the stack, not the caller as in the C calling convention. When the compiler generates a call to func with two 4-byte arguments, it puts a reference to _func@8 in the object code. If the real func happens to have a different number or size of arguments, its decorated name would be something different, e.g. _func@12, and hence a link error would occur. This is very useful with dynamic libraries (DLLs). Imagine that a DLL was replaced with another version where func takes one additional argument. If it weren't for the name mangling (the technical term for prepending _ and appending @length to the symbol name), the program would still call into func with the wrong arguments, and func would then increment the stack pointer by more bytes than the size of the passed argument list, thus breaking the caller. With name mangling in place, and provided the DLL exports the decorated name, the loader would not launch the executable at all, since it would not be able to resolve the reference to _func@8.
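The check described above comes down to exact string matching on decorated names. The following C sketch is a simplified, hypothetical model of that lookup: the function resolves and the symbol lists are illustrative, not a real linker or loader API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of symbol resolution: a reference binds only to a
   symbol with exactly the same name. Because the stdcall byte count is part
   of the name, "_func@8" (two 4-byte arguments) cannot bind to "_func@12". */
bool resolves(const char *reference, const char *const *symbols, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(reference, symbols[i]) == 0) {
            return true; /* exact match: the reference is satisfied */
        }
    }
    return false; /* no match: an "undefined reference" style error */
}
```

A reference to _func@8 resolves against a library that provides _func@8 but fails against one that only provides _func@12, which is the error that protects the caller from a mismatched stack cleanup.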
In your case it looks like the external library is not really intended to be used with this compiler, or you are missing some pragma or compiler option. The getarg intrinsic takes two arguments: an integer and an assumed-size character array (string). Some compilers pass the character array's length as an additional hidden argument. With 32-bit code this would result in 2 pointers and 1 integer being passed, totalling 12 bytes of arguments, hence the _getarg@12. The _getarg@16 could be, for example, a 64-bit routine with strings being passed by some kind of descriptor.
As IanH reminded me in his comment, another reason for this naming discrepancy could be that you are calling getarg with fewer arguments than expected. Fortran has this peculiar feature of "prototypeless" routine calls: Fortran compilers can generate calls to routines without actually knowing their signature, unlike in C/C++ where an explicit signature has to be supplied in the form of a function prototype. This is possible since in Fortran all arguments are passed by reference and pointers are always the same size, no matter the actual type they point to. In this particular case the stdcall name mangling plays the role of a very crude argument-checking mechanism. If it weren't for the mangling (e.g. on Linux with GNU Fortran, where such decorations are not employed, or if the default calling convention were cdecl), one could call a routine with a different number of arguments than expected, and the linker would happily link the object code into an executable that would then most likely crash at run time.
This is totally implementation dependent. You did not say which compiler you use. The (nonstandard) intrinsic can exist in several versions for different integer or character kinds. There can also be several versions of the runtime libraries for different computer architectures (e.g. 32-bit and 64-bit).

Using DLLs with NASM

I have been doing some x86 programming in Windows with NASM and I have run into some confusion. I am confused as to why I must do this:
extern _ExitProcess@4
Specifically, I am confused about the '_' and the '@4'. I know that the '@4' is the size of the stack, but why is it needed? When I looked at kernel32.dll with a hex editor, I only saw 'ExitProcess', not '_ExitProcess@4'.
I am also confused as to why C functions do not need the underscore and the stack size, such as this:
extern printf
Why don't C Functions need decorations?
My third question is "Is this the way I should be using these functions?" Right now I am linking with the actual dll files themselves.
I know that the '@4' is the size of the stack but why is it needed?
To enable the linker to report a fatal error if your compiler assumed the wrong calling convention for the function (this can happen if you forget to include header files in C and ignore all the compiler warnings or if a declaration doesn't exactly match the function in the shared library).
Why don't C Functions need decorations?
Functions that use the cdecl calling convention are decorated with a single leading underscore (so it would actually be _printf).
The reason why no parameter size is encoded into the decorated name is that the caller is responsible for both setting up and tearing down the stack, so an argument count mismatch will not be fatal for the stack setup (though the called function might still crash if it isn't given the right arguments, of course). The argument count may even be variable, as in the case of printf.
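A small, hypothetical C model makes the difference in cleanup concrete. It tracks only the net change in the stack pointer around one call, assuming 32-bit x86 where the stack grows downward; the enum and function names are illustrative, not real APIs.

```c
/* Which side removes the arguments after the call. */
typedef enum { CONV_CDECL, CONV_STDCALL } call_conv;

/* Net change in ESP around one call (0 means the stack is balanced).
   pushed_bytes:   bytes the caller pushed before the call.
   callee_expects: bytes a stdcall callee was compiled to pop (its "@N"). */
long stack_imbalance(call_conv conv, long pushed_bytes, long callee_expects)
{
    long esp = 0;
    esp -= pushed_bytes;          /* caller: push the arguments */
    if (conv == CONV_STDCALL) {
        esp += callee_expects;    /* callee: "ret N" pops what IT expects */
    } else {
        esp += pushed_bytes;      /* caller: "add esp, pushed_bytes" */
    }
    return esp;
}
```

If the caller pushes 8 bytes but a stdcall callee executes ret 12, the stack pointer ends up 4 bytes too high; under cdecl the caller removes exactly what it pushed, so the same mismatch leaves the stack balanced.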
When I looked in the kernel32.dll with a hex editor I only saw ExitProcess not _ExitProcess@4.
The mangled names are usually mapped to the actual exported names of the DLL using definition files (*.def), which are then compiled into *.lib import library files that can be used in your linker invocation. In such a definition file for kernel32.dll, the following line defines the mapping for ExitProcess:
_ExitProcess@4 = ExitProcess
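The mapping in that line can be expressed as a small C helper. This is a hypothetical sketch (split_stdcall is not a library function) that, assuming the input has the _Name@N form, recovers the undecorated export name and the argument byte count.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Splits "_Name@N" into "Name" and N (hypothetical helper).
   Returns 0 on success, -1 if the input is not in that form
   or if the output buffer is too small. */
int split_stdcall(const char *decorated, char *name, size_t name_len,
                  unsigned long *arg_bytes)
{
    const char *at = strrchr(decorated, '@');

    /* Require a leading '_', a non-empty name and a digit right after '@'. */
    if (decorated[0] != '_' || at == NULL || at == decorated + 1 ||
        !isdigit((unsigned char)at[1])) {
        return -1;
    }

    char *end;
    unsigned long n = strtoul(at + 1, &end, 10);
    if (*end != '\0') {
        return -1; /* something other than digits follows the '@' */
    }

    size_t len = (size_t)(at - (decorated + 1)); /* characters between '_' and '@' */
    if (len + 1 > name_len) {
        return -1; /* no room for the name plus its terminator */
    }

    memcpy(name, decorated + 1, len);
    name[len] = '\0';
    *arg_bytes = n;
    return 0;
}
```

Applied to _ExitProcess@4, it yields ExitProcess and 4; a cdecl name such as _printf is rejected because it carries no @N suffix.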
Is this the way I should be using these functions?
I don't know NASM very well, but the code I've seen so far usually specifies the decorated name, like in your example.
You can find more information on this excellent page about Win32 calling conventions.

Should I use "__imp__ApiName@N" or "_ApiName@N"?

I have dumped a Windows SDK .lib file (kernel32.lib) with DUMPBIN, and the output shows me that there are two "versions" of every API name, for example:
__imp__ExitProcess@4
and
_ExitProcess@4
So why are there two of the same name with different name mangling?
Let's say I want to call ExitProcess from NASM: which of them should I use when declaring it EXTERN? My practice shows me that I can call either of them, but I don't know which one is the "correct" or "preferred" one to stick with.
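I think the __imp_ version is meant to be used with __declspec(dllimport) on Visual C++ compilers. It names the import address table entry, a pointer that the loader fills in with the function's address, so code calls through it indirectly. _ExitProcess@4 names a small stub in the import library that jumps through that same pointer, which is why both end up in the same function.
You're not supposed to rely on that fact explicitly in your code; just use the original function, i.e. _ExitProcess@4.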
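The relationship between the two symbols can be modeled in plain C. In this hypothetical sketch, imp_function plays the role of __imp__ExitProcess@4 (a pointer slot the loader fills in), function_stub plays the role of _ExitProcess@4 (a stub that jumps through that slot), and simulate_loader stands in for the Windows loader; none of these are real Windows APIs. Because the __imp_ symbol names a pointer rather than code, NASM code that uses it has to call indirectly, e.g. call [__imp__ExitProcess@4].

```c
#include <stddef.h>

/* Stands in for the function's code inside the DLL. */
static int real_function(int x) { return x * 2; }

/* Import address table slot (like __imp__ExitProcess@4): NULL until load time. */
static int (*imp_function)(int) = NULL;

/* What the loader does: patch the slot with the function's real address. */
void simulate_loader(void) { imp_function = real_function; }

/* The import-library stub (like _ExitProcess@4): "jmp [slot]" as a call. */
int function_stub(int x) { return imp_function(x); }

/* Direct call to the stub, as with "call _ExitProcess@4". */
int call_via_stub(int x) { return function_stub(x); }

/* Indirect call through the slot, as with "call [__imp__ExitProcess@4]". */
int call_via_iat(int x) { return (*imp_function)(x); }
```

Both call paths reach the same function; the indirect one simply skips the extra jump through the stub.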

How to use a dll without knowing parameters?

I have a DLL that I need to make use of. I also have a program that makes calls to this DLL. I need to be able to use this DLL in another program; however, the previous programmer did not leave any documentation or source code. Is there a way I can monitor what calls are made to this DLL and what is passed?
You can't, in general. This is from the Dependency Walker FAQ:
Q: How do I view the parameter and return types of a function?
A: For most functions, this information is simply not present in the module. The Windows' module file format only provides a single text string to identify each function. There is no structured way to list the number of parameters, the parameter types, or the return type. However, some languages do something called function "decoration" or "mangling", which is the process of encoding information into the text string. For example, a function like int Foo(int, int) encoded with simple decoration might be exported as _Foo@8. The 8 refers to the number of bytes used by the parameters. If C++ decoration is used, the function would be exported as ?Foo@@YGHHH@Z, which can be directly decoded back to the function's original prototype: int Foo(int, int). Dependency Walker supports C++ undecoration by using the Undecorate C++ Functions Command.
Edit: One thing you could do, I think, is to get a disassembler and disassemble the DLL and/or the calling code, and work out from that the number and types of the arguments, and the return types. You wouldn't be able to find out the names of the arguments though.
You can hook the functions in the DLL you wish to monitor (if you know how many arguments they take).
You can use dumpbin (which is part of Visual Studio Professional and VC++ Express, or you can download the Platform SDK, or use a comparable tool from OpenWatcom C++) on the DLL to look at the 'exports' section, for example:
dumpbin /all SimpleLib.dll | more
Output would be:
Section contains the following exports for SimpleLib.dll
00000000 characteristics
4A15B11F time date stamp Thu May 21 20:53:03 2009
0.00 version
1 ordinal base
2 number of functions
2 number of names
ordinal hint RVA name
1 0 00001010 fnSimpleLib
2 1 00001030 fnSimpleLib2
Look at the ordinals: there are the two exported functions. The only thing left is to work out what parameters they take.
You can also use PE Explorer to find this out for you. Working out the parameters is a bit tricky: you would need to disassemble the binary, look for the function at its offset in the file, and then work out the parameters by looking at how it uses the SP and BP registers.
Hope this helps,
Best regards,
Tom.
