I noticed in IDA that the PE file I am analyzing has not only a .rdata section but also .idata. What's the difference?
.rdata is for const data. It is the read only version of the .data segment.
.idata holds the import directory (.edata holds the exports). It is used by EXEs and DLLs to designate the imported and exported functions. See the PE format specification (http://msdn.microsoft.com/library/windows/hardware/gg463125) for details.
Summarizing typical segment names:
.text: Code
.data: Initialized data
.bss: Uninitialized data
.rdata: Const/read-only (and initialized) data
.edata: Export descriptors
.idata: Import descriptors
.reloc: Relocation table (for code instructions with absolute addressing when the module could not be loaded at its preferred base address)
.rsrc: Resources (icon, bitmap, dialog, ...)
.tls: __declspec(thread) data (Fails with dynamically loaded DLLs -> hard to find bugs)
As Martin Rosenau mentions, the segment names are only typical. The true segment type is specified in the segment header or is defined by usage of data stored in the segment.
In fact, the names of the segments are ignored by Windows.
There are linkers that use different segment names and it is even possible to store the Import Descriptors, Export descriptors, Resources etc. in the ".text" segment instead of using separate segments.
However it seems to be simpler to create separate sections for such metadata so most linkers will use separate sections.
This means that sections such as ".idata", ".edata" and ".rsrc" do not contain program data (although some of their names end with "data"); they contain meta information that is used by the operating system. The ".rsrc" section, for example, holds information about the icon that is shown when looking at the executable file in Explorer.
".idata" contains information about all DLL files required by the program.
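To illustrate the point that the section header, not the name, decides what a section is, here is a minimal Python sketch that decodes the Characteristics field of one IMAGE_SECTION_HEADER. The layout and flag values are taken from the PE specification; the sample header is made up for illustration and only a handful of flags are decoded.

```python
import struct

# IMAGE_SECTION_HEADER: Name[8], VirtualSize, VirtualAddress, SizeOfRawData,
# PointerToRawData, PointerToRelocations, PointerToLinenumbers,
# NumberOfRelocations, NumberOfLinenumbers, Characteristics (40 bytes total).
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")

# A small subset of the IMAGE_SCN_* flags.
FLAGS = {
    0x00000020: "CODE",
    0x00000040: "INITIALIZED_DATA",
    0x00000080: "UNINITIALIZED_DATA",
    0x20000000: "EXECUTE",
    0x40000000: "READ",
    0x80000000: "WRITE",
}

def describe_section(raw):
    """Decode one 40-byte section header into (name, [flag names])."""
    fields = SECTION_HEADER.unpack(raw)
    name = fields[0].rstrip(b"\0").decode("ascii", "replace")
    characteristics = fields[-1]
    return name, [label for bit, label in FLAGS.items() if characteristics & bit]

# Hypothetical header: a section *named* ".text" whose flags say
# "initialized, readable data". The loader only looks at the flags.
sample = SECTION_HEADER.pack(b".text", 0x1000, 0x1000, 0x200, 0x400,
                             0, 0, 0, 0, 0x40000040)
print(describe_section(sample))
```

A section called ".text" with these flags would be mapped readable and non-executable, regardless of its name.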
What parts of a PE file are mapped into memory by the MS loader?
From the PE documentation, I can deduce the typical format of a PE executable (see below).
I know, by inspection, that all contents of the PE file, up to and including the section headers, get mapped into memory exactly as stored on disk.
What happens next?
Is the remainder of the file also mapped (here I refer to the Image Pages part in the picture below), so that the whole file is in memory exactly like stored on disk, or is the loader more selective than that?
In the documentation, I've found the following snippet:
Another exception is that attribute certificate and debug information
must be placed at the very end of an image file, with the attribute
certificate table immediately preceding the debug section, because the
loader does not map these into memory. The rule about attribute
certificate and debug information does not apply to object files,
however.
This is really all I can find about loader behavior; it just says that these two parts must be placed last in the file, since they don't go into memory.
But if the loader loads everything except these two parts, and I set the section RVAs sufficiently high, will section data actually be duplicated in memory (once in the mapped file and once at the position specified by the RVA)?
If possible, link to places where I can read further about loading specific to MS Windows.
Finding this information is like an egg hunt, because MS always insists on using its own terminology when the COFF description uses AT&T terms.
What parts of a PE file are mapped into memory by the MS loader?
It depends.
All sections covered by a section header are mapped into the run-time address space.
However, sections that have an RVA of 0 are not mapped and thus never loaded.
Each debug directory entry identifies the location and size of a block of debug information. The RVA specified may be 0 if the debug information is not covered by a section header (i.e., it resides in the image file and is not mapped into the run-time address space). If it is mapped, the RVA is its address.
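For the parts that are mapped, memory contains the same bytes as the file on disk, but each section is placed at its RVA rather than at its file offset, so the layout in memory generally differs from the layout of the file.
Note that executables and DLLs are mapped into virtual memory, not physical memory!
As you access the executable, parts of it are paged into RAM as needed.
If a section is not accessed, it obviously does not get paged into physical RAM; it is, however, still mapped into virtual memory.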
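As a rough model of the mapping described above, the following Python sketch builds an in-memory image from file bytes: the headers are copied as-is, each section covered by a section header is placed at its RVA, and anything in the file that no section header covers is left out. This is a simplification under stated assumptions. The real loader maps pages lazily instead of copying, and the function name and sample values here are hypothetical.

```python
def map_image(file_bytes, size_of_image, size_of_headers, sections):
    """Simplified model of how a PE image is laid out in memory.

    sections: list of (rva, virtual_size, raw_offset, raw_size) tuples,
    i.e. the relevant fields of each IMAGE_SECTION_HEADER.
    """
    image = bytearray(size_of_image)          # zero-filled address range
    image[:size_of_headers] = file_bytes[:size_of_headers]
    for rva, virtual_size, raw_offset, raw_size in sections:
        # Only the bytes that exist on disk are copied; the rest of the
        # section (virtual_size > raw_size) stays zero-filled.
        count = min(raw_size, virtual_size) if virtual_size else raw_size
        image[rva:rva + count] = file_bytes[raw_offset:raw_offset + count]
    return image

# Hypothetical file: 0x200 bytes of headers, two sections, and trailing
# data (for example a certificate) that no section header covers.
file_bytes = b"H" * 0x200 + b"A" * 0x200 + b"B" * 0x200 + b"CERT"
sections = [(0x1000, 0x180, 0x200, 0x200),    # smaller in memory than on disk
            (0x2000, 0x1000, 0x400, 0x200)]   # bigger in memory than on disk
image = map_image(file_bytes, 0x3000, 0x200, sections)
print(len(image), b"CERT" in image)
```

In this model, section data appears once at its RVA and is not duplicated at its file offset, and the trailing bytes never reach memory.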
You can read up on everything you might ever want to know about PE files (and more) on MSDN.
Your quote is lifted from the documentation of the COFF file format.
The critical part is:
The rule on attribute certificate and debug information does not apply to object files.
From: https://support.microsoft.com/en-us/kb/121460
Size: Size of the optional header, which is included for executable files but not object files. An object file should have a value of 0 here.
Ergo: executable files are not object files; they are image files.
As such, the exception to the rule does not apply to them.
According to the ld manual on Output Section Description:
section [address] [(type)] :
[AT(lma)]
[ALIGN(section_align) | ALIGN_WITH_INPUT]
[SUBALIGN(subsection_align)]
[constraint]
{
output-section-command
output-section-command
...
} [>region] [AT>lma_region] [:phdr :phdr ...] [=fillexp] [,]
The address or >region stand for the VMA, i.e. the Virtual Memory Address of the output section.
The AT() or AT>lma_region stand for the LMA, i.e. the Load Memory Address of the output section.
I decided to take a closer look, using readelf -e to dump the section headers and program headers of a hello-world ELF file. The result is below:
My questions are:
Why is there no LMA in the dumped headers? How is the LMA represented in an ELF file?
What does the Addr column in the red rectangle mean? VMA?
What does the PhysAddr in the green rectangle mean?
ADD 1
So far, it seems that PhysAddr is the LMA.
Why is there no LMA in the dumped headers? How is the LMA represented in an ELF file?
First, the section headers of an ELF file have no separate LMA field. It is actually quite simple: multiple sections in an ELF file are mapped into segments. If a section is of a loadable type (PROGBITS, for example) and the segment it is mapped into is also of a loadable type (LOAD, for example), then every section within that segment is loaded into memory. Where? Simply at the VMA it was given. So for a section whose type and flags mark it as loadable, the load address is effectively represented by its VMA.
What does the addr column in the red rectangle mean?
This is directly related to your previous question. Yes, it does mean a VMA. To explain this properly, we need to understand that the ELF format was designed for architectures that support some form of memory protection or memory segmentation.
You might want to give some sections special permissions. Instead of giving every section its own memory protection, you map multiple sections into a segment and give that single segment its own memory protection.
This creates the need to map sections into segments. How would the OS loader know how to map each section into a segment, and thereby give it the appropriate memory protection? By its address.
Each section is given an address, and by those addresses, offsets and sizes the sections are mapped into a segment, which as a whole is allocated in memory and given memory protection rules that apply to all of its sections.
The only way the OS can know how to map these is by address, so yes, if the section is of a loadable type, its Addr means VMA
(at least on modern systems that use virtual memory and don't abuse the ELF file).
What does the PhysAddr mean?
As far as I know, PhysAddr is only relevant on old-fashioned architectures in which physical addressing matters to user-space programs. This field should hold the actual physical address at which the segment would sit, yet on most modern systems it is simply ignored.
I suggest you read this: http://flint.cs.yale.edu/cs422/doc/ELF_Format.pdf
Personally, when I was learning this back in the day, it helped me a lot and taught me a great deal about ELF files.
Hopefully I've helped you somehow! :)
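For reference, the VirtAddr and PhysAddr columns printed by readelf -l correspond to the p_vaddr and p_paddr fields of each program header. The Python sketch below decodes one 64-bit little-endian program header with the struct module. The sample entry is hypothetical: it assumes a linker script that used AT() so that the two addresses differ, which is the case where GNU ld records the LMA in p_paddr. On a typical Linux executable the two values are simply equal.

```python
import struct

# Elf64_Phdr, little-endian: p_type, p_flags, p_offset, p_vaddr,
# p_paddr, p_filesz, p_memsz, p_align (56 bytes).
PHDR64 = struct.Struct("<IIQQQQQQ")
PT_LOAD = 1

def parse_phdr64(raw):
    """Decode one Elf64_Phdr into a dict keyed by the ELF field names."""
    names = ("p_type", "p_flags", "p_offset", "p_vaddr",
             "p_paddr", "p_filesz", "p_memsz", "p_align")
    return dict(zip(names, PHDR64.unpack(raw)))

# Hypothetical PT_LOAD entry for data that runs at 0x20000000 (VMA)
# but is stored at 0x08001000 (LMA), as an AT() clause would produce.
raw = PHDR64.pack(PT_LOAD, 6, 0x2000, 0x20000000, 0x08001000,
                  0x100, 0x100, 0x1000)
phdr = parse_phdr64(raw)
print("VirtAddr", hex(phdr["p_vaddr"]), "PhysAddr", hex(phdr["p_paddr"]))
```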
I compiled this simple program on Ubuntu 15.10 x64
char *glob = "hello strings";
int main(void) {
    return 0;
}
and using gdb I found that "hello strings" is located in the
read/execute segment, together with the .text section.
I already know that some strings contained in the ELF header are located in the code segment,
but why are user-defined strings located in the same segment as the code?
I also tried enlarging the string to 0x1000 bytes to check
whether placing small strings with the code section is a compiler optimization, but
it was still located in the same segment as the code.
This is very interesting to me because intuitively strings should be readable, not executable.
By default, the Linux linker creates two PT_LOAD segments: a read-only one, containing .text, and a writable one containing .data (initialized data).
Your string literal resides in .rodata section.
Which of the above two segments would you like this section to go into? If it is to be read-only, then it will have to go into the same segment that contains .text, and that segment must be executable. If the section is to go into the writable segment, it will not have execute permissions, but then you would be able to write to these strings, and they would not be shared when multiple instances of your binary run.
You can see the assignment of sections to segments in the output of readelf -l a.out.
With older versions of GCC (before 4.0), you can see that adding -fwritable-strings moves the string into .data, and into non-executable segment.
The gold linker supports the --rosegment flag, which moves all read-only non-executable sections into a separate PT_LOAD segment. But that increases the number of mmap and mprotect calls that the dynamic loader has to perform, and so is not the default.
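The section-to-segment assignment that readelf -l prints can be reproduced by checking which segment's address range contains each section. The Python sketch below does that for a few made-up addresses (the values are hypothetical, chosen to resemble a small x86-64 binary) and reports the permissions each section inherits from its segment.

```python
PF_X, PF_W, PF_R = 1, 2, 4  # p_flags bits from the ELF specification

def perms(flags):
    """Render p_flags as an 'RWX'-style string."""
    return (("R" if flags & PF_R else "-") +
            ("W" if flags & PF_W else "-") +
            ("X" if flags & PF_X else "-"))

def assign(sections, segments):
    """Map each section name to the permissions of its containing segment."""
    result = {}
    for name, addr, size in sections:
        for vaddr, memsz, flags in segments:
            if vaddr <= addr and addr + size <= vaddr + memsz:
                result[name] = perms(flags)
                break
    return result

# Hypothetical layout with the two default PT_LOAD segments.
segments = [(0x400000, 0x700, PF_R | PF_X),   # read-only, executable
            (0x600E10, 0x230, PF_R | PF_W)]   # writable
sections = [(".text", 0x400430, 0x182),
            (".rodata", 0x4005C0, 0x12),
            (".data", 0x601028, 0x10)]
print(assign(sections, segments))
```

With only these two segments available, .rodata ends up with the same R-X permissions as .text, which is the behavior the question observed.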
So I have been researching the PE format for the last couple of days, and I still have a couple of questions:
Does the data section get mapped into the process' memory, or does the program read it from the disk?
If it does get mapped into its memory, how can the process acquire the offset of the section? (And other sections)
Is there any way to get the entry point of a process that has already been mapped into memory, without touching the file on disk?
Does the data section get mapped into the process' memory
Yes, although that mapping is unlikely to survive for very long: the program is apt to write to that section, which triggers a copy-on-write page copy, after which the page is backed by the paging file instead of the PE file.
how can the process acquire the offset of the section?
The linker already calculated the offsets of variables in the section. It might be relocated, common for DLLs that have an awkward base address that's already in use when the DLL gets loaded. In which case the relocation table in the PE file is used by the loader to patch the addresses in the code. The pages that contain such patched code get the same treatment as the data section, they are no longer backed by the PE file and cannot be shared between processes.
Is there any way to get the entry point of a process
The entire PE file gets mapped to memory, including its headers. So you can certainly read IMAGE_OPTIONAL_HEADER.AddressOfEntryPoint from memory without reading the file. Do keep in mind that this is painful to do for another process, since you don't have direct access to its virtual address space. You'd have to use ReadProcessMemory(), which is fairly little joy and unlikely to be faster than reading the file; the file is pretty likely to be present in the file system cache. The Address Space Layout Randomization feature is apt to give you a headache, since it is designed to make this kind of thing hard.
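Reading the entry point from the mapped headers amounts to following e_lfanew to the PE signature and then reading AddressOfEntryPoint from the optional header. A minimal Python sketch of that walk, operating on a bytes-like copy of the headers, follows. The buffer here is synthetic, and the base address is a made-up value; in practice the bytes would come from the module base of your own process or from ReadProcessMemory().

```python
import struct

def entry_point_rva(image):
    """Return AddressOfEntryPoint from PE headers held in a bytes-like object."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # 4-byte signature + 20-byte file header = 24 bytes, then
    # AddressOfEntryPoint sits 16 bytes into the optional header
    # (same offset for PE32 and PE32+).
    (rva,) = struct.unpack_from("<I", image, e_lfanew + 24 + 16)
    return rva

# Synthetic headers, just enough for the walk above.
image = bytearray(0x200)
image[0:2] = b"MZ"
struct.pack_into("<I", image, 0x3C, 0x80)        # e_lfanew
image[0x80:0x84] = b"PE\0\0"
struct.pack_into("<I", image, 0x80 + 40, 0x1234)  # AddressOfEntryPoint

base = 0x400000  # hypothetical address the module was actually loaded at
print(hex(base + entry_point_rva(image)))
```

The entry point address is the actual load base plus the RVA, so ASLR changes the base but not the value stored in the header.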
Does the data section get mapped into the process' memory, or does the program read it from the disk?
It's mapped into process' memory.
If it does get mapped into its memory, how can the process acquire the offset of the section? (And other sections)
By means of a relocation table: every reference to a global object (data or function) from the executable code, that uses direct addressing, has an entry in this table so that the loader patches the code, fixing the original offset. Note that you can make a PE file without relocation section, in which case all data and code sections have a fixed offset, and the executable has a fixed entry point.
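The patching step can be sketched in Python as follows. The block layout (page RVA, block size, then 16-bit entries with a 4-bit type and a 12-bit offset) follows the PE specification; only the 32-bit HIGHLOW type is handled, and the image contents and addresses are invented for the example.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0  # padding entry, nothing to patch
IMAGE_REL_BASED_HIGHLOW = 3   # patch a 32-bit absolute address

def apply_relocations(image, reloc_rva, reloc_size, delta):
    """Add delta to every 32-bit address listed in the base relocation table.

    image is a bytearray laid out as in memory (indexed by RVA);
    delta is the actual load address minus the preferred ImageBase.
    """
    pos, end = reloc_rva, reloc_rva + reloc_size
    while pos < end:
        page_rva, block_size = struct.unpack_from("<II", image, pos)
        if block_size < 8:
            break  # malformed block, stop rather than loop forever
        for i in range((block_size - 8) // 2):
            (entry,) = struct.unpack_from("<H", image, pos + 8 + 2 * i)
            kind, offset = entry >> 12, entry & 0xFFF
            if kind == IMAGE_REL_BASED_HIGHLOW:
                target = page_rva + offset
                (value,) = struct.unpack_from("<I", image, target)
                struct.pack_into("<I", image, target,
                                 (value + delta) & 0xFFFFFFFF)
        pos += block_size

# Hypothetical image: an instruction operand at RVA 0x1004 refers to a
# global at 0x402000, assuming the preferred base 0x400000.
image = bytearray(0x3000)
struct.pack_into("<I", image, 0x1004, 0x402000)
# One relocation block at RVA 0x2000: page 0x1000, one HIGHLOW entry
# for offset 4, plus one ABSOLUTE padding entry.
struct.pack_into("<IIHH", image, 0x2000, 0x1000, 12, 0x3004, 0x0000)

apply_relocations(image, 0x2000, 12, 0x10000000 - 0x400000)
print(hex(struct.unpack_from("<I", image, 0x1004)[0]))
```

When the module loads at its preferred base, delta is zero and nothing needs patching, which is why an image without a relocation section must be loaded at a fixed address.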
Is there any way to get the entry point of a process that has already been mapped into memory, without touching the file on disk?
Not sure, but if by "not touching" you mean not even reading the file, then you may figure it out by walking up the stack.
Yes, all sections that are described in the PE header get mapped into memory. The IMAGE_SECTION_HEADER struct tells the loader how to map it (the section can for example be much bigger in memory than on disk).
I'm not quite sure if I understand what you are asking. Do you mean how does code from the code section know where to access data in the data section? If the module loads at the preferred load address then the addresses that are generated statically by the linker are correct, otherwise the loader fixes the addresses with relocation information.
Yes, the Windows loader also loads the PE header into memory at the base address of the module. There you can find all the info that was in the file's PE header, including the entry point.
I can recommend this article for everything about the PE format, especially on relocations.
Does the data section get mapped into the process' memory, or does the
program read it from the disk?
Yes. Before execution, the operating system's loader (on Windows or Linux alike) must map everything into memory.
If it does get mapped into its memory, how can the process acquire the
offset of the section? ( And other sections )
A PE file has a well-defined structure. The loader parses that structure to obtain the relative virtual address (RVA) of each section relative to ImageBase. Also, if ASLR (address space layout randomization) is active on the system, the loader has to use relocation information to resolve those offsets.
Is there any way the get the entry point of a process that has already
been mapped into the memory, without touching the file on disk?
No. To calculate the OEP, the operating system's loader uses ImageBase plus the entry point member of the optional header structure, and when address randomization is enabled it uses the relocation table to resolve all addresses. So we can't do anything without parsing the PE file on disk.
I found the pointer for the "Import Table" field, which is 8 bytes in size and is divided into Virtual Address and Size. However, the value in the Virtual Address field is too big and is frustrating my efforts to locate the entries of the Import Table. Is the value a file offset? If so, the (.exe) file ends before reaching that offset.
The RVA (relative virtual address) of the Import Directory in the Directory Table must be valid. Perhaps your conversion of it into a physical offset is malfunctioning. That conversion is done by traversing the section table to find the containing section, subtracting that section's starting RVA from the target RVA, and then adding the physical offset of the section to the result. That gives you the position of the Import Directory within the file. Conversions between RVAs and physical offsets may be necessary often if you are working with the on-disk file. If you are working with the in-memory image, note that protection utilities sometimes modify or destroy parts of the PE header in memory to deter dumping.
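The conversion just described can be sketched in a few lines of Python. The section tuples stand in for the VirtualAddress, VirtualSize, PointerToRawData and SizeOfRawData fields of each section header; the function name and the sample table are hypothetical.

```python
def rva_to_offset(rva, sections):
    """Convert an RVA to a file offset using the section table.

    sections: list of (virtual_address, virtual_size,
                       pointer_to_raw_data, size_of_raw_data).
    """
    for virtual_address, virtual_size, raw_offset, raw_size in sections:
        span = max(virtual_size, raw_size)
        if virtual_address <= rva < virtual_address + span:
            delta = rva - virtual_address
            if delta >= raw_size:
                raise ValueError("RVA is zero-filled in memory, not on disk")
            return raw_offset + delta
    raise ValueError("RVA is not inside any section")

# Hypothetical section table: .text, then .idata.
sections = [(0x1000, 0x1800, 0x0400, 0x1A00),
            (0x3000, 0x0600, 0x1E00, 0x0600)]

# An Import Directory RVA of 0x3040 lies 0x40 bytes into .idata,
# so it is found 0x40 bytes past that section's raw data pointer.
print(hex(rva_to_offset(0x3040, sections)))
```

An RVA that looks "too big" for the file is normal: sections are spaced further apart in memory than on disk, so the RVA has to be translated before it is used as a file position.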
Once you get to the Import Directory you still have more work to do. A quick Google showed a better explanation than I'm likely to write here: http://sandsprite.com/CodeStuff/Understanding_imports.html
I am the author of the now 'classic' PECompact, PEBundle (now discontinued), and other PE manipulation utilities.