Generating a PE format executable - windows

I'm trying to generate a PE format executable; I'm at the stage where I have something that dumpbin is happy with, and as far as I can tell is not materially different from an empty program linked with Microsoft's linker, but Windows still rejects it: PE file - what's missing?
If I had some algorithm for generating a valid PE file, maybe I could hill climb from there. Here's what I've found so far:
There's plenty of documentation, sample code and tools for reading PE files, as opposed to generating them from scratch.
PE Bliss lists generation among its features, but won't compile.
Sample assembly language templates for PE file generation concentrate on minimizing size. The most promising looking one generates a file that Windows rejects even though as far as I can see it should be accepted; the one I found that did work, ironically, generates a file that Windows accepts even though as far as I can see it should be rejected, since almost every nominally essential component is missing or malformed.
Is there any sample code available that generates a correct PE file?

Here's the classic page about generating PE from scratch:
http://www.phreedom.org/research/tinype
As for the generic list of required/optional parts, see the corkami page on the PE format:
http://code.google.com/p/corkami/wiki/PE
See also the code tree for many examples of small PE files, generated completely from scratch.

Related

Repost - Question about PE sections overlapping

The thread Overlapping sections in PE format on Stack Overflow seems to be inactive.
In the PE (Portable Executable) file format, it does not seem to be specified whether overlapping sections are allowed. However, Windows appears to still execute PE files with overlapping sections, although this may lead to unexpected behavior.
Here are a few questions related to this topic:
Is the presence of overlapping sections in PE files a pattern recognized in the PE Format Specification?
What characteristics does the overlapped range have?
I have attempted to copy a section from the PE file, append it to the end of the last section, rebuild the image, and run it. This appears to work; however, I cannot confirm whether it will cause problems later. The section I appended does not interact directly with the original sections, so I cannot be sure that code which does interact with them will still function correctly.
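For reference, Microsoft's PE/COFF specification states that in an image file the section VAs must be in ascending order and adjacent, and multiples of SectionAlignment, so overlap is at least outside what the spec describes. A minimal Python sketch for checking a rebuilt image for overlaps might look like this; the `Section` tuple and `find_overlaps` are hypothetical names, the values would come from each section header's VirtualAddress and VirtualSize, and the check deliberately ignores the SectionAlignment rounding the loader applies when mapping.

```python
from typing import List, NamedTuple, Tuple

class Section(NamedTuple):
    name: str
    virtual_address: int  # RVA of the section's first byte in memory
    virtual_size: int     # VirtualSize field (not rounded up to SectionAlignment)

def find_overlaps(sections: List[Section]) -> List[Tuple[str, str]]:
    """Return pairs of section names whose [RVA, RVA + VirtualSize) ranges intersect."""
    ordered = sorted(sections, key=lambda s: s.virtual_address)
    overlaps = []
    for i, a in enumerate(ordered):
        a_end = a.virtual_address + a.virtual_size
        for b in ordered[i + 1:]:
            if b.virtual_address >= a_end:
                break  # sorted by start: no later section can overlap `a` either
            overlaps.append((a.name, b.name))
    return overlaps

# Hypothetical layout: ".copy" was placed at an RVA inside ".text".
sections = [
    Section(".text", 0x1000, 0x1800),
    Section(".rdata", 0x3000, 0x400),
    Section(".copy", 0x2000, 0x1200),
]
print(find_overlaps(sections))  # [('.text', '.copy'), ('.copy', '.rdata')]
```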

How can an executable be this small in file size?

I've been generating payloads with Metasploit and experimenting with the different templates; one of the templates you can package your payload in is exe-small. The payload I've been generating is windows/meterpreter/reverse_tcp. With the normal exe template it has a file size of around 72 KB, but exe-small outputs a payload of only 2.4 KB. Why is this? And how could I apply this to my programming?
The smallest possible PE file is just 97 bytes, and it does nothing (it just returns).
The smallest runnable executable today is 133 bytes, because Windows requires kernel32.dll to be loaded; executing a PE file with no imports is not possible.
At that size it can already download a payload from the Internet, by specifying a UNC path in the import table.
To achieve such a small executable, you have to:
write it in assembly, mainly to get rid of the C runtime
decrease the file alignment, which is 512 bytes by default
remove the DOS stub that prints the message "This program cannot be run in DOS mode"
merge some of the PE parts into the MZ header
remove the data directory
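To see why the file alignment matters, here is a rough Python calculation, assuming the headers and each section's raw data are padded to FileAlignment on disk (the PE/COFF specification requires SizeOfRawData to be a multiple of it). The helper names and the sizes in the example are hypothetical. Note that the specification only allows a FileAlignment below 512 when SectionAlignment is below the page size, in which case the two must be equal; tiny PE files rely on that.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to a multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)

def file_size(header_bytes: int, section_raw_sizes: list, file_alignment: int) -> int:
    """Approximate on-disk size: headers and each section padded to file_alignment."""
    total = align_up(header_bytes, file_alignment)
    for raw in section_raw_sizes:
        total += align_up(raw, file_alignment)
    return total

# Hypothetical image: 352 bytes of headers, 100 bytes of code, 50 bytes of data.
print(file_size(352, [100, 50], 512))  # 1536
print(file_size(352, [100, 50], 16))   # 528
```

With the default alignment, most of the file is padding; shrinking the alignment removes it without touching a single instruction.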
The full description is available in a larger research blog post called TinyPE.
For small EXEs, most of the space is typically taken up by the icon. An icon usually contains several sizes and color depths, which you could get rid of if you don't mind having an "old, rusty" icon, or no icon at all.
Signing the EXE also adds about 4 KB.
As an example for a small EXE, see never10 by grc. There is a details page which highlights the above points:
https://www.grc.com/never10/details.htm
in the last paragraph:
A final note: I'm a bit annoyed that “Never10” is as large as it is at
85 kbyte. The digital signature increases the application's size by
4k, but the high-resolution and high-color icons Microsoft now
requires takes up 56k! So without all that annoying overhead, the app
would be a respectable 25k. And, yes, of course I wrote it in
assembly language.
Disclaimer: I am not affiliated with grc in any way.
There is little need for an executable to be big, except when it contains what I call code spam: code that is not actually critical to the functionality of the program. This applies to other files too: compare a manually written HTML page with one generated by FrontPage. That's code spam.
I remember my good old DOS files, which were only a few KB in size and could perform practically any task needed in the OS. One of my .exes (actually a .com) was only 20 bytes in size.
Think of it this way: just as, in some situations, a large majority of the files in a Windows installation can be removed while the OS still functions perfectly, the same goes for .exe files. Large parts of the code are useless, serve purposes unrelated to the program's objective, or are added intentionally (see below).
The peak of this aberration is the code added nowadays to the .exe files of some games that use advanced copy protection, which can make the files dozens of MB in size. The code actually needed to run the game is practically under 10% of the total.
A file size of 72 KB, as in your example, is enough to do practically anything to a Windows OS.
To apply this to your programming, as in making very small .exes, keep things simple. Don't add unnecessary code just for looks, or because you think you might use that part of the program at some point.

What is the relationship between sections and data directories in a PE file?

I'm trying to better understand the PE format, and I'm wondering what the relationship between sections and data directories are in a PE file. Opening up a PE file I notice that they often overlap, but I'm not clear on why, or how they relate, and Microsoft's official PE file format spec doesn't really seem to make this any more clear.
I understand that the name value of a section header can be changed and so isn't a guaranteed reference to a specific block, and that as such data directories should be relied on for finding a specific block within the file.
In an example PE file I have opened I notice that the .text section has the same offset as the Import Address Table data directory header, though the IAT size is listed as 8, whilst the .text section size is 6804. In contrast the resource data directory header states that it starts at 16384, and is 1568 in length - tallying precisely with the entries for the .rsrc section. The latter makes sense to me, the former doesn't.
So what are the differing purposes of sections vs. data directories? Why do both concepts exist, and why do they sometimes overlap where it doesn't appear to make sense for them to do so?
Sections are meant to package things with "nearly" the same memory protections.
For example let's take calc.exe:
The code section here has a section protection (IMAGE_SECTION_HEADER.Characteristics) set to 0x60000020:
IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ
The .idata section (import section) has a value of 0x40000040:
IMAGE_SCN_CNT_INITIALIZED_DATA | IMAGE_SCN_MEM_READ
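A small Python sketch that decodes a Characteristics value into the flag names above; the bit values are the `IMAGE_SCN_*` constants from the PE/COFF specification (only a subset is listed), while `decode_characteristics` is a hypothetical helper name.

```python
# Subset of IMAGE_SCN_* flags from the PE/COFF specification.
IMAGE_SCN_FLAGS = {
    0x00000020: "IMAGE_SCN_CNT_CODE",
    0x00000040: "IMAGE_SCN_CNT_INITIALIZED_DATA",
    0x00000080: "IMAGE_SCN_CNT_UNINITIALIZED_DATA",
    0x02000000: "IMAGE_SCN_MEM_DISCARDABLE",
    0x10000000: "IMAGE_SCN_MEM_SHARED",
    0x20000000: "IMAGE_SCN_MEM_EXECUTE",
    0x40000000: "IMAGE_SCN_MEM_READ",
    0x80000000: "IMAGE_SCN_MEM_WRITE",
}

def decode_characteristics(value: int) -> list:
    """Return the names of the known flags set in a section's Characteristics."""
    return [name for bit, name in sorted(IMAGE_SCN_FLAGS.items()) if value & bit]

print(decode_characteristics(0x60000020))
# ['IMAGE_SCN_CNT_CODE', 'IMAGE_SCN_MEM_EXECUTE', 'IMAGE_SCN_MEM_READ']
print(decode_characteristics(0x40000040))
# ['IMAGE_SCN_CNT_INITIALIZED_DATA', 'IMAGE_SCN_MEM_READ']
```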
In some cases, the linker may decide that the same memory protections apply to different sections and merge them together (you can force this with the /MERGE linker option).
Citing Matt Pietrek from his wonderful two-part article "An In-Depth Look into the Win32 Portable Executable File Format":
If two sections have similar, compatible attributes, they can usually
be combined into a single section at link time
This is usually true if the sections share the same IMAGE_SCN_MEM_READ / IMAGE_SCN_MEM_WRITE protections: that's why in some cases you might find the import table inside the code section (even though the import table is obviously not meant to be executed). Since the code and import sections can only be read (you can't write to them), that's enough for the linker to merge them into the same section.
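The read/write rule of thumb described above can be expressed as a one-line comparison. This is a simplified Python sketch of that heuristic only, not the linker's actual merge logic; `same_rw_protection` is a hypothetical name, and 0xC0000040 is used as a typical read/write `.data` value.

```python
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000
RW_MASK = IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE

def same_rw_protection(a: int, b: int) -> bool:
    """True if two Characteristics values agree on the read and write bits."""
    return (a & RW_MASK) == (b & RW_MASK)

print(same_rw_protection(0x60000020, 0x40000040))  # True: code vs. read-only imports
print(same_rw_protection(0x60000020, 0xC0000040))  # False: code vs. writable data
```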
From the same article:
For example, it's OK to merge .rdata into .text, but you shouldn't
merge .rsrc, .reloc, or .pdata into other sections. Prior to Visual
Studio .NET, you could merge .idata into other sections. In Visual
Studio .NET, this is not allowed, but the linker often merges parts of
the .idata into other sections, such as .rdata, when doing a release
build.
AFAIK, the resource (.rsrc) and relocation (.reloc) sections are always left alone. The reason for the resource section to be left alone might be because some APIs rely on it.
On the other hand, data directories tell you where to find important parts of the PE file (imports, exports, debug data, TLS, resources, relocations, etc.), so even if different sections are merged, you can still find the relevant piece of data.
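To illustrate the relationship, here is a minimal Python sketch that finds which section contains a given data directory, using VirtualAddress/VirtualSize from the section headers. The `.rsrc` numbers come from the question; the `.text` RVA of 0x2000 is a hypothetical value, since the question only says the IAT starts where `.text` does. Function and type names are also hypothetical. Note that the Certificate Table directory (index 4) holds a file offset rather than an RVA, so it should not be looked up this way.

```python
from typing import List, NamedTuple, Optional

class Section(NamedTuple):
    name: str
    virtual_address: int  # section RVA
    virtual_size: int     # size in memory

def directory_section(sections: List[Section], rva: int, size: int) -> Optional[str]:
    """Name of the section fully containing [rva, rva + size), or None."""
    for s in sections:
        start, end = s.virtual_address, s.virtual_address + s.virtual_size
        if start <= rva < end and rva + size <= end:
            return s.name
    return None

sections = [Section(".text", 0x2000, 6804), Section(".rsrc", 16384, 1568)]
print(directory_section(sections, 0x2000, 8))     # IAT: '.text'
print(directory_section(sections, 16384, 1568))   # resources: '.rsrc'
```

A small directory sitting inside a much larger section (the IAT in `.text`) is exactly what a merged section looks like; the directory still points straight at the data.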

Stepwise description of file execution in Windows

What happens at a low level, step by step, when a program is executed in Windows? In other words, what processes take place between clicking a file and actually reaching execution?
Are you aware of any resources that might cover this topic in-depth?
I'd suggest reading this two-part MSDN article on the Win32 Portable Executable file format. It describes all the different parts of the file, which tells you a lot about what has to happen in order to load and run the executable.
The Wikipedia page on the PE format also contains useful info.
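As a rough illustration of only the very first step, here is a minimal Python sketch of the header checks any PE loader has to make before anything else: the MZ signature, e_lfanew, the "PE\0\0" signature, the COFF file header and the optional header magic, ending with the entry point address. `parse_pe_headers` is a hypothetical helper; the real Windows loader then goes on to map sections, apply base relocations, resolve imports and more, none of which this sketch attempts.

```python
import struct

def parse_pe_headers(data: bytes) -> dict:
    """Read the first fields a loader checks; raises ValueError on a malformed image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)       # offset of the NT headers
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # COFF file header: 20 bytes right after the signature.
    machine, num_sections, _, _, _, opt_size, characteristics = struct.unpack_from(
        "<HHIIIHH", data, e_lfanew + 4)
    opt = e_lfanew + 24                                        # optional header start
    (magic,) = struct.unpack_from("<H", data, opt)
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)   # AddressOfEntryPoint
    if magic == 0x10B:      # PE32: 32-bit ImageBase at offset 28
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == 0x20B:    # PE32+: 64-bit ImageBase at offset 24
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError(f"unknown optional header magic {magic:#x}")
    return {
        "machine": machine,
        "num_sections": num_sections,
        "size_of_optional_header": opt_size,
        "characteristics": characteristics,
        "magic": magic,
        "entry_point_va": image_base + entry_rva,
    }
```

Calling it on `open(path, "rb").read()` for any EXE gives the preferred virtual address where execution starts, before any relocation.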

How to map a file offset in an EXE to its PE section

I've opened up a program I wrote with ImageHlp.dll to play around with it a little, and I noticed that there seem to be large gaps in the file. As I understand it, for each PE section, the section header gives its offset in the file as PhysicalAddress, and its size as SizeOfRawData, and thus everything from PhysicalAddress to PhysicalAddress + SizeOfRawData ought to be that section. But there are large swaths of the EXE file that aren't covered by these ranges, so I must be missing something.
I know I can use ImageRVAToSection and give it an RVA address to find out which section that RVA is located in. Is there any way to do something similar with file offsets? How can I find out which PE section byte $ED178 or whatever belongs to?
Edit: Sorry, I didn't read your question carefully enough.
Looking around, I'm finding a few files like the ones you mentioned, where the section headers don't cover the entire contents of the file. Most of those I've found so far contain a debug record that's not covered. There are a few others with discrepancies I haven't been able to figure out yet, though. When/if I figure out more, I'll add it.
In How does one use VirtualAllocEx do make room for a code cave? I posted a code fragment that examines PEs currently loaded in memory. You will probably find the answer to your question if you compare the contents of a DLL in memory with its contents on disk (which is what ImageHlp.dll shows).
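One detail worth checking in the question: in winnt.h, `PhysicalAddress` is a union member that aliases `VirtualSize` (the `Misc` field), so the file offset of a section's data is `PointerToRawData`, with `SizeOfRawData` bytes stored on disk. A minimal Python sketch of the file-offset lookup, under that assumption, might look like this; the names and the example layout are hypothetical. Offsets below SizeOfHeaders belong to the headers, and bytes past every section (for example the certificate table, which its data directory locates by file offset, or appended debug data) belong to no section.

```python
from typing import List, NamedTuple, Optional

class SectionHeader(NamedTuple):
    name: str
    pointer_to_raw_data: int  # file offset of the section's data
    size_of_raw_data: int     # number of bytes of that data stored on disk

def section_for_file_offset(sections: List[SectionHeader],
                            size_of_headers: int, offset: int) -> Optional[str]:
    """Map a file offset to a section name, '(headers)', or None if uncovered."""
    if offset < size_of_headers:
        return "(headers)"
    for s in sections:
        if s.pointer_to_raw_data <= offset < s.pointer_to_raw_data + s.size_of_raw_data:
            return s.name
    return None  # overlay, certificate table, debug data, ...

# Hypothetical layout: headers padded to 0x400, then .text and .data.
sections = [SectionHeader(".text", 0x400, 0x1000), SectionHeader(".data", 0x1400, 0x200)]
print(section_for_file_offset(sections, 0x400, 0x500))    # '.text'
print(section_for_file_offset(sections, 0x400, 0xED178))  # None
```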

Resources