What's the difference between "Import Table address" and "Import Address Table address" in Data Directories of PE? - portable-executable

Does anyone know the difference?

If you want to play with Portable Executables, there's no way around grabbing a copy of the specs.
It's been a while, but if memory serves me correctly: IT and IAT are identical, except that the IAT is filled in by the PE loader while resolving imports - but don't take my word for it, check the specs :)
EDIT:
Had a quick browse through the specs, and refreshed my memory a bit:
The Import Table is the master structure, with one entry per DLL you're importing from. Each entry contains, among other things, an Import Lookup Table (ILT) and Import Address Table (IAT) pointer (iirc these used to be called OriginalFirstThunk and FirstThunk). The ILT and IAT tables are identical on-disk, but during runtime the IAT will be filled with the memory addresses of imported functions.
The PE header IAT field probably can't be relied on 100% if you want to be able to deal with nonstandard EXEs, just like you can't depend on the start-of/size-of code and data pointers. It's best to ignore the IAT header field and parse the IT instead. Also, when parsing the IT, the ILT will be missing on some executables, which have only the IAT - older Borland linkers (iirc) were notorious for not generating the ILT.
EDIT 2: definitions
IT: Import Table (PeCoff section 6.4.1) - table of per-DLL IMAGE_IMPORT_DESCRIPTOR.
ILT: Import Lookup Table (PeCoff section 6.4.2) - table of per-import IMAGE_THUNK_DATA.
IAT: Import Address Table (PeCoff section 6.4.4) - on-disk: identical to ILT, runtime: filled with imported function memory addresses.
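To make the IT definition above concrete, here is a minimal sketch of walking the descriptor array. It assumes the usual 20-byte IMAGE_IMPORT_DESCRIPTOR layout (five little-endian DWORDs, array terminated by an all-zero entry); the `ImportDescriptor` and `parse_import_table` names and the sample bytes are made up for illustration, not taken from a real file.

```python
import struct
from typing import List, NamedTuple

class ImportDescriptor(NamedTuple):
    ilt_rva: int          # OriginalFirstThunk: RVA of the ILT (may be 0)
    timestamp: int        # TimeDateStamp
    forwarder_chain: int  # ForwarderChain
    name_rva: int         # RVA of the DLL name string
    iat_rva: int          # FirstThunk: RVA of the IAT

DESCRIPTOR = struct.Struct("<5I")  # five little-endian DWORDs, 20 bytes

def parse_import_table(data: bytes, offset: int = 0) -> List[ImportDescriptor]:
    """Read descriptors until the all-zero terminator or the end of data."""
    descriptors = []
    while offset + DESCRIPTOR.size <= len(data):
        fields = DESCRIPTOR.unpack_from(data, offset)
        if not any(fields):  # null descriptor ends the table
            break
        descriptors.append(ImportDescriptor(*fields))
        offset += DESCRIPTOR.size
    return descriptors

# Synthetic table: two DLLs, the second one without an ILT, then a terminator.
sample = (
    DESCRIPTOR.pack(0x2050, 0, 0, 0x2100, 0x2000)
    + DESCRIPTOR.pack(0, 0, 0, 0x2110, 0x2010)
    + bytes(DESCRIPTOR.size)
)
imports = parse_import_table(sample)
for d in imports:
    source = "ILT" if d.ilt_rva else "IAT (no ILT present)"
    print(f"name at RVA {d.name_rva:#x}, read thunks from the {source}")
```

A parser that wants to cope with ILT-less files can fall back to the IAT whenever `ilt_rva` is zero, as the loop at the end does.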

IMAGE_DIRECTORY_ENTRY_IMPORT eventually leads to multiple IAT thunks, which are stored in a memory region, which starts at [IMAGE_DIRECTORY_ENTRY_IAT].VirtualAddress, and has size [IMAGE_DIRECTORY_ENTRY_IAT].Size.
I guess it is useful when all the sections are loaded by default as read-only, and you can use IMAGE_DIRECTORY_ENTRY_IAT to make the IAT (but not the ILT) thunks writable.
BTW, the ILT and IAT can have different content when the DLL is bound. In that case, the IAT thunks contain the pre-calculated addresses of the imported functions.
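For reference, a rough sketch of reading the two directory entries discussed here. It assumes you have already sliced out the data directory array from the optional header (16 entries of RVA + Size, 8 bytes each); the function name and the sample values are invented for the example.

```python
import struct

# Data directory indices as laid out in the optional header.
DIRECTORY_NAMES = [
    "EXPORT", "IMPORT", "RESOURCE", "EXCEPTION", "SECURITY", "BASERELOC",
    "DEBUG", "ARCHITECTURE", "GLOBALPTR", "TLS", "LOAD_CONFIG",
    "BOUND_IMPORT", "IAT", "DELAY_IMPORT", "COM_DESCRIPTOR", "RESERVED",
]

def parse_data_directories(blob: bytes) -> dict:
    """Map directory name -> (VirtualAddress, Size) for each 8-byte entry."""
    entries = {}
    count = min(len(DIRECTORY_NAMES), len(blob) // 8)
    for index in range(count):
        rva, size = struct.unpack_from("<II", blob, index * 8)
        entries[DIRECTORY_NAMES[index]] = (rva, size)
    return entries

# Synthetic directory array: only IMPORT (index 1) and IAT (index 12) are set.
blob = bytearray(16 * 8)
struct.pack_into("<II", blob, 1 * 8, 0x2400, 0x50)
struct.pack_into("<II", blob, 12 * 8, 0x2000, 0x48)

dirs = parse_data_directories(bytes(blob))
print("import table:", dirs["IMPORT"])
print("IAT region:  ", dirs["IAT"])
```

The IMPORT entry locates the descriptor table, while the IAT entry only describes the memory range that holds all the IAT thunks.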

@snemarch is mostly right, though I think both he and the documentation are wrong that the ILT and IAT are the same on disk. I've looked through the bytes, and they are not the same.
Though, he is right about the definition and purpose of the tables.
The ILT (Import Lookup Table) is used by the Windows loader to associate the functions used by an EXE with their addresses in a DLL. Once this association is made, the address in the DLL gets written to the IAT (Import Address Table) in the EXE. After the EXE is loaded, it doesn't need the ILT anymore; when it calls a function in a DLL, it goes through the IAT.
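As a toy model of that step (not how the real loader is implemented), the sketch below treats the ILT as a list of names and the DLL's exports as a name-to-RVA map. The RVAs and the base address are made up.

```python
from typing import Dict, List

def resolve_imports(ilt_names: List[str],
                    dll_exports: Dict[str, int],
                    dll_base: int) -> List[int]:
    """Build IAT contents: one absolute address per ILT entry, in ILT order."""
    return [dll_base + dll_exports[name] for name in ilt_names]

# Hypothetical data: the ILT lists what the EXE wants, the export map says
# where each function sits relative to the DLL's load address.
ilt = ["CreateFileW", "CloseHandle"]
exports = {"CloseHandle": 0x1A2B0, "CreateFileW": 0x24C10}
iat = resolve_imports(ilt, exports, dll_base=0x76000000)

for name, address in zip(ilt, iat):
    print(f"{name}: {address:#010x}")
```

The ILT is left untouched; only the IAT ends up holding addresses, which is why the two tables differ once the image is loaded.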

The import directory entry points to the import directory table, which usually lives in .rdata. The table holds one IMAGE_IMPORT_DESCRIPTOR per DLL; each entry points to the DLL's name string, the start of the IAT portion for the imports from that DLL, and the start of the ILT portion for the imports from that DLL.
The bound import directory table is usually in the header page, and contains an IMAGE_BOUND_IMPORT_DESCRIPTOR for each bound module. Each descriptor contains a pointer to the bound module's name string (also in the header) and a timestamp, which is the timestamp of the DLL it's bound to.
The delay import table is usually in .rdata and contains an IMAGE_DELAY_IMPORT_DESCRIPTOR for each delay-loaded module. Each IMAGE_DELAY_IMPORT_DESCRIPTOR contains a timestamp and links to the module name, the delay-load IAT, the delay-load ILT, the bound delay-load IAT, and the unload delay-import table.
In dwmcore.dll, the .rdata section looks something like (in order): IAT, constant file scope variables, export directory, EAT, ELT, EOT, export function names, more constant file scope variables and strings, delay import table, delay import module names, delay ILT, delay import function names, import table, import module names, ILT, import function names, unwind info.
The delay IAT is actually at the start of .data. I'm not sure if modules share the same delay IAT/ILTs or whether they're separate. I'm not sure why delay and delay-bound imports have a separate IAT instead of using the main IAT.
The IAT contains the RVA of the function name string on disk if the function is not bound, delayed or delay bound. If it is bound then the IAT contains an address hint for the function. If it is delayed / delay bound then it contains the address of a helper function. If only hinting is used instead of binding then the IAT contains an index hint.
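For the plain (unbound, non-delayed) case, here is a small sketch of decoding a single 32-bit thunk entry as I understand the PE32 layout: if the high bit is set the import is by ordinal (low 16 bits), otherwise the low 31 bits are the RVA of a hint/name entry (a 2-byte hint followed by the name string), and a zero entry ends the list. The function name is made up.

```python
from typing import Optional, Tuple

ORDINAL_FLAG32 = 0x80000000  # high bit of a PE32 thunk: import by ordinal

def decode_thunk32(value: int) -> Tuple[str, Optional[int]]:
    """Classify one on-disk 32-bit ILT/IAT entry."""
    if value == 0:
        return ("end", None)                       # terminator of the thunk list
    if value & ORDINAL_FLAG32:
        return ("ordinal", value & 0xFFFF)         # ordinal number
    return ("hint_name_rva", value & 0x7FFFFFFF)   # RVA of hint + name string

for raw in (0x00002150, 0x80000010, 0x00000000):
    print(f"{raw:#010x} ->", decode_thunk32(raw))
```

PE32+ images use 64-bit thunks with the flag in bit 63, which this sketch does not handle.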

Related

How much of shared object is loaded to memory

Say there is a shared object file, libComponent.so, which is made up of two object files, Component_1.o and Component_2.o.
And there is an application which links to libComponent.so but only uses functions from Component_1.o.
Will the entire shared object, i.e. libComponent.so, be loaded into memory when the application runs, or just Component_1.o?
Is there a gcc option to toggle this behaviour, so that only the required symbols are loaded from a shared object?
Well, it depends on what you mean by 'loaded'.
The dynamic linker will map all of the library into the process's virtual memory space and will fill in entries in the executable's import table for each library function used with the addresses of functions in the shared library. But filling in the import table doesn't actually load from those addresses, so they won't be loaded into physical memory.
From then on, the library code will be paged into physical memory on demand when the function is called, just like any other pageable memory in the process's virtual address space. If a function is never called (directly from the application or indirectly from another library function called by the application), it won't be paged in. (Well, paging occurs with page size granularity, so you might pull in a function the application doesn't call if it's next to a function it does call. Some compilers use profile-guided optimization to place functions commonly called together next to each other to minimize the number of pages used.)
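To illustrate the page-granularity point, here is a back-of-the-envelope sketch. It assumes 4 KiB pages and a made-up layout of function offsets and sizes inside the library's code; none of these numbers come from a real library.

```python
from typing import Dict, Iterable, Set, Tuple

PAGE_SIZE = 4096  # assumed page size

def pages_touched(layout: Dict[str, Tuple[int, int]],
                  called: Iterable[str]) -> Set[int]:
    """Return the page indices covered by the functions that get called."""
    pages = set()
    for name in called:
        start, size = layout[name]
        first = start // PAGE_SIZE
        last = (start + size - 1) // PAGE_SIZE
        pages.update(range(first, last + 1))
    return pages

# Hypothetical layout: name -> (offset in the code segment, size in bytes).
layout = {
    "comp1_init": (0x0000, 0x200),
    "comp1_run": (0x0F80, 0x100),   # straddles the page 0 / page 1 boundary
    "comp2_run": (0x3000, 0x400),
}
touched = pages_touched(layout, ["comp1_init", "comp1_run"])
print(sorted(touched))
```

Only the pages holding the called functions are counted; the page with `comp2_run` would stay untouched unless something calls it.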
(Aside: if your library wasn't compiled to use position-independent code and it's loaded at its non-default base address, the linker will need to fix up addresses in the code when it's loaded, which would cause the entire library to be paged in. This could be done lazily when each page is first loaded, though I'm not sure which linkers do this.)

What's the difference between .rdata and .idata segments?

I noticed in IDA that the PE file which I analyze has not only the .rdata section but also .idata. What's the difference?
.rdata is for const data. It is the read only version of the .data segment.
.idata holds the import directory (.edata for exports). It is used by EXE's and DLL's to designate the imported and exported functions. See the PE format specification (http://msdn.microsoft.com/library/windows/hardware/gg463125) for details.
Summarizing typical segment names:
.text: Code
.data: Initialized data
.bss: Uninitialized data
.rdata: Const/read-only (and initialized) data
.edata: Export descriptors
.idata: Import descriptors
.reloc: Relocation table (for code instructions with absolute addressing when the module could not be loaded at its preferred base address)
.rsrc: Resources (icon, bitmap, dialog, ...)
.tls: __declspec(thread) data (Fails with dynamically loaded DLLs -> hard to find bugs)
As Martin Rosenau mentions, the segment names are only typical. The true segment type is specified in the segment header or is defined by usage of data stored in the segment.
In fact, the names of the segments are ignored by Windows.
There are linkers that use different segment names and it is even possible to store the Import Descriptors, Export descriptors, Resources etc. in the ".text" segment instead of using separate segments.
However it seems to be simpler to create separate sections for such metadata so most linkers will use separate sections.
This means: Sections ".idata", ".rdata", ".rsrc", ... do not contain program data (although their name ends with "data") but they contain meta information that is used by the operating system. The ".rsrc" section for example holds information about the icon that is shown when looking at the executable file in the Explorer.
".idata" contains information about all DLL files required by the program.
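Since the name is only a convention, a tool has to look at the Characteristics field of the section header to see what a section really is. A minimal sketch, using a handful of the IMAGE_SCN_* flag values; the `describe_section` helper and the sample values are just for illustration.

```python
from typing import List

# A few IMAGE_SCN_* bits from the section header's Characteristics field.
SECTION_FLAGS = {
    0x00000020: "CODE",
    0x00000040: "INITIALIZED_DATA",
    0x00000080: "UNINITIALIZED_DATA",
    0x20000000: "EXECUTE",
    0x40000000: "READ",
    0x80000000: "WRITE",
}

def describe_section(characteristics: int) -> List[str]:
    """List the known flags that are set, in the order of SECTION_FLAGS."""
    return [name for bit, name in SECTION_FLAGS.items() if characteristics & bit]

# Values commonly seen for .text, .rdata and .data respectively.
for value in (0x60000020, 0x40000040, 0xC0000040):
    print(f"{value:#010x}:", ", ".join(describe_section(value)))
```

A section named ".text" with only READ and INITIALIZED_DATA set would be treated as read-only data regardless of its name.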

Need for relocations in an exe

Why is there a need for a relocation table when every element in an exe is at a relative offset from the base of the image? I mean, even if the image gets displaced by a positive offset of, say, 0x60000, why is there a need for a relocation table, as we would be using RVAs anyway, which would be relative to the new base?
The point is that the code doesn't access the globals (global variables and function addresses) via RVAs or anything of the sort. They're accessed by their absolute address, and this address has to be changed if the executable was not loaded at its preferred address.
The relocation table consists exactly of those places. It's a table of all the places that should be adjusted by the difference of the actual base address and the preferred one.
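Here is a simplified sketch of what applying those fixups amounts to for 32-bit absolute addresses. It skips the real .reloc block format entirely and just takes a list of offsets to patch; the image bytes and addresses are invented, with the 0x60000 displacement borrowed from the question.

```python
import struct
from typing import Iterable

def apply_relocations(image: bytearray, fixup_offsets: Iterable[int],
                      preferred_base: int, actual_base: int) -> None:
    """Add (actual - preferred) to every 32-bit absolute address listed."""
    delta = actual_base - preferred_base
    for offset in fixup_offsets:
        (value,) = struct.unpack_from("<I", image, offset)
        struct.pack_into("<I", image, offset, (value + delta) & 0xFFFFFFFF)

# Toy image: one absolute pointer at offset 4, compiled for base 0x00400000.
image = bytearray(16)
struct.pack_into("<I", image, 4, 0x00401234)

apply_relocations(image, [4], preferred_base=0x00400000, actual_base=0x00460000)
patched = struct.unpack_from("<I", image, 4)[0]
print(f"{patched:#010x}")
```

If the image loads at its preferred base, the delta is zero and nothing changes, which is why the table can be ignored in that case.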
BTW, EXEs, in contrast to DLLs, usually don't contain relocation tables. This is because they're the first module to be mapped into the address space, hence they may always be loaded at their preferred address. The situation is different for DLLs, which usually do contain relocation tables.
P.S. In Windows 7 an EXE may contain a relocation table if it prefers to be loaded at a random address. It's a security feature (pitiful IMHO).
Edit:
It should be mentioned that function addresses are not always accessed by their absolute value. On x86, branching instructions (such as jmp and call) have a "short" format which works with a relative offset. Such places don't need to be mentioned in the relocation table.
For an EXE file there is no need for the relocation table because the executable image is always loaded at its preferred address. The relocation table can safely be stripped.

Import Table in PE (.exe)

I found the pointer for the "Import Table" field, which is 8 bytes in size and is divided into Virtual Address and Size. However, the value in the Virtual Address field is too big, and it is frustrating my efforts to locate the entries of the Import Table. Is the value a file offset? If so, the (.exe) file ends before reaching that offset.
The RVA (relative virtual address) of the Import Directory in the Directory Table must be valid. Perhaps your conversion of it into a physical offset is malfunctioning. Of course, that is done by traversing the section table to find the containing section. Then subtract that section's starting RVA from the target RVA. Then simply add the physical offset of the section to this result. That will give you the position within the file of the Import Directory. Conversions to and from RVAs to physical offsets may be necessary often if you are working with the on-disk file. If working with the in-memory image, sometimes protection utilities modify or destroy parts of the PE header in memory to deter dumping.
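The conversion described above can be sketched roughly like this. It assumes the section headers have already been parsed into simple records; the `Section` type, the `rva_to_offset` name and the sample section values are made up for the example.

```python
from typing import NamedTuple, Optional, Sequence

class Section(NamedTuple):
    name: str
    virtual_address: int  # RVA where the section starts in memory
    virtual_size: int     # size in memory
    raw_offset: int       # PointerToRawData: where it starts in the file
    raw_size: int         # SizeOfRawData: how many bytes exist in the file

def rva_to_offset(rva: int, sections: Sequence[Section]) -> Optional[int]:
    """Translate an RVA to a file offset, or None if no file bytes back it."""
    for section in sections:
        span = max(section.virtual_size, section.raw_size)
        if section.virtual_address <= rva < section.virtual_address + span:
            delta = rva - section.virtual_address
            if delta >= section.raw_size:
                return None  # inside the section, but only present in memory
            return section.raw_offset + delta
    return None  # RVA is not inside any section

# Hypothetical section table.
sections = [
    Section(".text", 0x1000, 0x800, 0x400, 0x800),
    Section(".rdata", 0x2000, 0x300, 0xC00, 0x400),
]
import_dir_offset = rva_to_offset(0x2010, sections)
print(f"{import_dir_offset:#x}")
```

An RVA that looks "too big" for the file is normal: with these sample values, RVA 0x2010 lands at file offset 0xC10.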
Once you get to the Import Directory you still have more work to do. A quick Google showed a better explanation than I'm likely to write here: http://sandsprite.com/CodeStuff/Understanding_imports.html
I am the author of the now 'classic' PECompact, PEBundle (now discontinued), and other PE manipulation utilities.

When does the PE file format IAT function addresses get set

I googled a bit and read http://en.wikipedia.org/wiki/Portable_Executable but I can't seem to find when the Import Address Table addresses are written. Does it happen at compilation, or when the executable is run?
It happens during runtime. Read this.
The whole point of the IAT is to allow a PE image to be loaded at an arbitrary location in the address space at run time. Since the base address is not known until run time, the IAT cannot be populated at compile time. This means that the addresses are set when the PE image is loaded into memory at run time.
Matt Pietrek's MSJ columns about the PE format are excellent references.
