The Microsoft PE/COFF SPEC (v8, section 5.4.4) says that when a symbol has:
A storage class of IMAGE_SYM_CLASS_EXTERNAL
And a section number of 0 (IMAGE_SYM_UNDEFINED)
It's "value" field (in the symbol table) which "indicates the size".
This confuses me. In particular, I'm wondering "indicates the size of what?".
Generally, IMAGE_SYM_CLASS_EXTERNAL and IMAGE_SYM_UNDEFINED are used by CL(visual C++) to represent externs.
Why would the linker need to know, or care, about the symbol's size? Doesn't it just need to know a name, that it's an extern, and have the appropriate relocation entries set? None of this should depend on size. Now, admittedly, the compiler needs to know this, but it would get that information from a header file, not from an object file.
I've looked at some simple example externs compiled by CL, and the Value field always seems to be zero. So, it's clearly not being used to encode the size of the field.
Does anyone know what "size" the spec is referring to? Are their any scenarios where the visual studio linker might use that field, or is that blurb in the spec just nonsense? My limited brain is unable to think of any such scenarios.
Update:
Please note that it does not, at least not always, appear to be the size of the symbol. In the cases I've observed the VALUE IS ALWAYS 0, hence the question.
Mr.Wisniewski, I believe I found the answer.
I'm a student and I've tried to write my own linker.
The very first version of it can link OBJ files and dump them
to my own binary format. But soon I've realized that many C++ language
features are unsupported without LIBCMT.LIB.
So at first I've coded lib parser... and stuck while trying to link CRT.
In the second linker member of the file LIBCMT.LIB was specified that
object file crt0.obj (inside libcmt) contains symbol __acmdln (global pointer to the
command line)... but I couldn't manage to find it there! I was really frustrated...
Symbol had IMAGE_SYM_CLASS_EXTERNAL and section IMAGE_SYM_UNDEFINED, but why?
In the source file crt0.c there is a declaration:
#ifdef WPRFLAG
wchar_t *_wcmdln; /* points to wide command line */
#else /* WPRFLAG */
char *_acmdln; /* points to command line */
#endif /* WPRFLAG */
My investigation was rather long and the result is so:
C++ compiler places uninitilized data into the .bss section and marks it with IMAGE_SCN_CNT_UNINITIALIZED_DATA, but
pure C compiler behaves in a different way (libcmt was written in C).
It is linker's duty to place uninitialized data into sections.
If C compiler emits symbol without section (0) and marked as external and if
it's value field is zero, then it is declared anywhere else, but if value field is not null, that
means that given OBJ file really contains that symbol but it is not initialized.
So linker should reserve place in .bss section for it. THE PLACE OF 'VALUE' SIZE.
And when you change those lines to:
#ifdef WPRFLAG
wchar_t *_wcmdln = 0xCCCCCCCC; /* points to wide command line */
#else /* WPRFLAG */
char *_acmdln = 0xCCCCCCCC; /* points to command line */
#endif /* WPRFLAG */
There will be zero value field and both of them will be placed in .data section.
Good luck, and sorry for my bad English.
How about an extern declaration for an array that declares the size:
a.cpp:
extern int example[42];
b.cpp:
int example[13];
The fact that the linker doesn't catch this mismatch suggests however that Value isn't used. I have no easy way to see that.
It's the size of the data structure referred to by the symbol.
Basically, if the symbol is undefined, the linker can't otherwise find the size of the data structure, and therefore needs to know in advance how big it is when it's instantiated so it can deal with those issues during linkage.
You have a very exotic, but interesting question. It it correct that the only possibility to produce symbol table inside of COFF is usage of /Zd compiler switch which are supported till Visual C++ 6.0 and use the old linker switch /debugtype:coff (see http://www.debuginfo.com/articles/gendebuginfo.html#gendebuginfovc6)? Is there any possibility to produce symbol table inside of COFF with at least Visual Studio 2008?
My idea is try to produce a PE with a symbol table of storage class IMAGE_SYM_CLASS_EXTERNAL and the section number 0 (IMAGE_SYM_UNDEFINED) with respect of linker switch /FORCE (/FORCE:UNRESOLVED or /FORCE:MULTIPLE) and an unresolved symbol either by /INCLUDE:dummySymbol or by /NODEFAULTLIB. My problem is that it's not easy to produce symbol table inside of COFF. Where you receive the test PEs?
Related
I have defined a DLL-export as follows:
__declspec(dllexport)
DWORD WINAPI DllBootstrap(LPVOID addr) {
return 0;
}
Now, using DUMPBIN, the symbol is displayed as follows:
1 0 0001100A ?DllBootstrap##YGKPAX#Z = #ILT+5(?DllBootstrap##YGKPAX#Z)
And this is how the memory looks in Visual Studio:
ยก}....ReflectDLL.dll.?DllBootstrap##YGKPAX#Z..........................................
when inspecting PIMAGE_EXPORT_DIRECTORY.AddressOfNames.
What I need is a clean symbol, i.e., DUMPBIN should output something like:
1 0 0001100A DllBootstrap
and PIMAGE_EXPORT_DIRECTORY.AddressOfNames should point to:
DllBootstrap..........................................
How can I achieve this?
WIN32 BUILDS:
As #RbMm indicated, to retain your function name as-is and get no name decoration, you must use a .DEF file (and remove the __declspec(dllexport) specifier). Then create a DEF file with the line below and either specify it with the /DEF linker option or add it to your Visual Studio project and it will be picked up automatically by the linker:
EXPORTS DllBootstrap
If you don't want to deal with an external .DEF file and you will be using the Visual C++ compiler, the simplest way to limit decoration using just code is to declare your function with 'extern "C"'. This results in decoration including a preceding underscore and appends an "#" along with the argument's byte count in decimal. The following code for example:
extern "C" __declspec(dllexport)
DWORD WINAPI DllBootstrap(LPVOID addr) {
return 0;
}
produces an exported name of:
_DllBootstrap#4
This is how stdcall functions are decorated when C++ name-mangling is disabled with 'extern "C"'. NOTE: WINAPI maps to __stdcall. Retaining 'extern "C"' and changing the convention to __cdecl, you won't get any decoration whatsoever, but module entrypoints should generally remain stdcall as you have it listed in your sample.
If you still want to avoid a .DEF file, there is one last hack you can employ. Add the following line to your code:
#pragma comment(linker,"/EXPORT:DllBootstrap=_DllBootstrap#4")
This will pass an argument to the linker creating a new undecorated name symbol which maps to the decorated name. This isn't very clean as the original name will still exist in your DLL, but you will get your clean exported name.
WIN64 BUILDS (UPDATE):
As Hans Passant commented, for anyone using the Visual C++ 64-bit compiler, there is only the 64-bit calling convention (stdcall, cdecl, etc. keywords are ignored). While C++ mangling will still occur under this compiler, no additional decoration is made to the exported names. In this case, 'extern "C"' would be enough when the sample is compiled as C++ code; if compiled as C, no modifications would be necessary.
I came across an interesting error when I was trying to link to an MSVC-compiled library using MinGW while working in Qt Creator. The linker complained of a missing symbol that went like _imp_FunctionName. When I realized That it was due to a missing extern "C", and fixed it, I also ran the MSVC compiler with /FAcs to see what the symbols are. Turns out, it was __imp_FunctionName (which is also the way I've read on MSDN and quite a few guru bloggers' sites).
I'm thoroughly confused about how the MinGW linker complains about a symbol beginning with _imp, but is able to find it nicely although it begins with __imp. Can a deep compiler magician shed some light on this? I used Visual Studio 2010.
This is fairly straight-forward identifier decoration at work. The imp_ prefix is auto-generated by the compiler, it exports a function pointer that allows optimizing binding to DLL exports. By language rules, the imp_ is prefixed by a leading underscore, required since it lives in the global namespace and is generated by the implementation and doesn't otherwise appear in the source code. So you get _imp_.
Next thing that happens is that the compiler decorates identifiers to allow the linker to catch declaration mis-matches. Pretty important because the compiler cannot diagnose declaration mismatches across modules and diagnosing them yourself at runtime is very painful.
First there's C++ decoration, a very involved scheme that supports function overloads. It generates pretty bizarre looking names, usually including lots of ? and # characters with extra characters for the argument and return types so that overloads are unambiguous. Then there's decoration for C identifiers, they are based on the calling convention. A cdecl function has a single leading underscore, an stdcall function has a leading underscore and a trailing #n that permits diagnosing argument declaration mismatches before they imbalance the stack. The C decoration is absent in 64-bit code, there is (blessfully) only one calling convention.
So you got the linker error because you forgot to specify C linkage, the linker was asked to match the heavily decorated C++ name with the mildly decorated C name. You then fixed it with extern "C", now you got the single added underscore for cdecl, turning _imp_ into __imp_.
Sorry if the title is not very clear. I am using MinGW with GCC 6.3.0 to build a x86 32-bit DLL on Windows (so far). I'll spare you the details why I need hacky offsets amongst its sections accessible from code, so please do not ask if it's useful or not (because I don't want to bother explaining that).
So, if I can get the following testcase to work, I'm good. Here's my problem:
In a C++ file, I want to access a linker symbol as an absolute numeric value, not relocated, directly. Remember that I am building a 32-bit DLL which requires a .reloc section for relocations, but in this case I do NOT want relocation, in fact a relocation would screw it up completely.
Here's an example: retrieve the offset of say __imp__MessageBoxW#16 relative to __IAT_start__, in case you don't know what they are, __imp__MessageBoxW#16 is the relocated pointer to the actual function at runtime, and __IAT_start__ is a linker symbol in the default script file. Here's where it is defined:
.idata BLOCK(__section_alignment__) :
{
/* This cannot currently be handled with grouped sections.
See pe.em:sort_sections. */
KEEP (SORT(*)(.idata$2))
KEEP (SORT(*)(.idata$3))
/* These zeroes mark the end of the import list. */
LONG (0); LONG (0); LONG (0); LONG (0); LONG (0);
KEEP (SORT(*)(.idata$4))
__IAT_start__ = .;
KEEP (SORT(*)(.idata$5))
__IAT_end__ = .;
KEEP (SORT(*)(.idata$6))
KEEP (SORT(*)(.idata$7))
}
So far, no problem. Because GAS doesn't allow me to "subtract" two externally defined symbols (both symbols are defined in the linker), I have to define the symbol in the linker script, so at the end of the linker script I have this:
test_symbol = ABSOLUTE("__imp__MessageBoxW#16" - __IAT_start__);
Then in C++ I use this little inline asm to retrieve this relative difference which is supposed to be a fixed value once linked:
asm("movl $test_symbol, %0":"=r"(var));
Now var should contain that fixed number right? Wrong!
Because test_symbol is an "undefined" symbol as far as the assembler is concerned, it makes it relocated. Or I don't know why, but I tried so many things to force it to be an "absolute constant value symbol" instead of a "relocated symbol" to no avail. Even editing the linker script with many things like LD_FEATURE("SANE_EXPR") and others, doesn't work at all.
Its value is correct only if the DLL does not get relocated.
You see, either GNU LD or the assembler adds an entry in the .reloc section for that movl instruction, which is WRONG!
Is there a way to force it to treat an external/undefined symbol as a fixed CONSTANT and apply no relocation to it whatsoever? Basically, omit it from the .reloc section.
I am going crazy with this, please tell me there's something easy I overlooked, I searched for hours!
In other words, is there a way to use a Linker Symbol from within inline asm/C++ without having it relocated whatsoever? No entry to the .reloc section or anything, basically same as a constant like $1234. So if a DLL gets loaded into another base address, that constant would be the same everytime.
UPDATE: I forgot about this question but decided to bring an update, since it seems it's likely not possible as nobody even commented. For anyone else in the same boat as me, I presume this is a limitation of the COFF object format itself. In other words, external symbols are implicitly relocated, and it doesn't seem there's a way against this.
I didn't "fix" it the way I wanted, I did it in a very hacky way though. If anyone is interested, here's my ugly "hack":
First I put a special "custom" instruction in the inline assembly where I reference this external symbol from C++. This "custom" instruction holds a placeholder instruction that grabs the symbol (normal x86 asm instruction with a dummy constant, e.g. 1234) and a way to identify it. Then let GCC generate the assembly files (.S files), then I parse the assembly with a simple script and when I find that "custom" instruction I insert a label for the linker (make it .global) and at the same time add a directive to a custom "on-the-fly" generated linker script that gets included from my main linker script at the end.
This places data in a temporary section in the resulting DLL with absolute offsets to the custom instruction that I need, but without relocation.
Next, I parse the binary DLL itself, in particular that temporary section I added with all this hack. I take the offsets from there, convert them to file offsets, and modify the DLL's .text section directly where those offsets point (remember those placeholder instructions? it is replacing their immediate constants 1234 with the respective value from the linker's non-relocated constant). Then I strip the temporary section from the DLL, and it's done. Of course, all of this is done automatically by a helper program and script
It's an insane hack, but it works and it's fully automatic now that I got it going. If my assumption is correct that COFF doesn't support non-relocated external symbols, then it's really the only way to use linker constants from C++ without them being relocated, which would be a disaster.
I have been doing some x86 programming in Windows with NASM and I have run into some confusion. I am confused as to why I must do this:
extern _ExitProcess#4
Specifically I am confused about the '_' and the '#4'. I know that the '#4' is the size of the stack but why is it needed? When I looked in the kernel32.dll with a hex editor I only saw 'ExitProcess' not '_ExitProcess#4'.
I am also confused as to why C Functions do not need the underscore and the stack size such as this:
extern printf
Why don't C Functions need decorations?
My third question is "Is this the way I should be using these functions?" Right now I am linking with the actual dll files themselves.
I know that the '#4' is the size of the stack but why is it needed?
To enable the linker to report a fatal error if your compiler assumed the wrong calling convention for the function (this can happen if you forget to include header files in C and ignore all the compiler warnings or if a declaration doesn't exactly match the function in the shared library).
Why don't C Functions need decorations?
Functions that use the cdecl calling convention are decorated with a single leading (so it would actually be _printf).
The reason why no parameter size is encoded into the decorated name is that the caller is responsible for both setting up and tearing down the stack, so an argument count mismatch will not be fatal for the stack setup (though the calling function might still crash if it isn't given the right arguments, of course). It might even be possible that the argument count is variable, like in the case of printf.
When I looked in the kernel32.dll with a hex editor I only saw ExitProcess not _ExitProcess#4.
The mangled names are usually mapped to the actual exported names of the DLL using definition files (*.def), which then get compiled to *.lib import library files that can be used in your linker invocation. An example of such a definition file for kernel32.dll is this one. The following line defines the mapping for ExitProcess:
_ExitProcess#4 = ExitProcess
Is this the way I should be using these functions?
I don't know NASM very well, but the code I've seen so far usually specifies the decorated name, like in your example.
You can find more information on this excellent page about Win32 calling conventions.
I am using gSoap to generate ANSI C source code, that I would like to build within the LabWindows/CVI environment, on a Windows 7, 64 bit OS. The gSoap file stdsoap2.c includes several instances of the _setmode() function, with the following prototype:
int _setmode (int fd, int mode);
Where fd is a file descriptor, and mode is set to either _O_TEXT or _O_BINARY.
Oddly enough, even though LW/CVI contains an interface to Microsoft's SDK, this SDK does not contain a prototype to _setmode in any of its included header files, even though the help link to the SDK contains information on the function.
Is anyone aware of the method in LabWindows/CVI used to set file (or stream) translation mode to text, or binary.
Thanks,
Ryyker
Closing the loop on this question.
I could not use the only offered answer for the reason listed in my comment above.
Although I did use the SDK, it was not to select a different version of the OpenFile function, rather it was to support the use of a function that an auto-code generator used, _setmode() but that was not supported by my primary development environment (LabWindows/CVI).
So, in summary, my solution WAS to include the SDK to give me definition for _setmode as well as including the following in my non- auto-generated code:
#define _O_TEXT 0x4000 /* file mode is text (translated) */
#define _O_BINARY 0x8000 /* file mode is binary (untranslated) */
So, with the caveat that this post describes what I actually did, I am going to mark the answer #gary offered as the answer, as it was in the ball park. Thanks #gary.
It sounds like you just want to open a file as either ASCII or binary. So you should be able to replace the instances of _setmode() with the LW/CVI OpenFile() function as described here. Here's a short example reading a file as binary.
char filename = "path//to//file.ext"
int result;
result = OpenFile(filename, VAL_READ_ONLY, VAL_OPEN_AS_IS, VAL_BINARY);
if (result < 0)
// Error, notify user.
else
// No error.
Also note this warning from the page:
Caution The Windows SDK also contains an OpenFile function. If you
include windows.h and do not include formatio.h, you will get compile
errors if you call OpenFile.