DOS stub in a PE file [duplicate] - windows

Lately, I analyzed some Windows executable files using a hex editor. The PE header starts at address 0x100, so there are 256 Bytes of data before the PE image actually starts. The first 256 Bytes:
I know the following about the file structure
0x00 - 0x3F: This is the MZ header (64 bytes long).
0x40 - 0x4D: These 14 bytes encode seven x86 (16 bit mode) instructions, which are used to print "This program cannot be run in DOS mode.\r\r\n" to the screen, using a DOS system call (interrupt 0x21).
0x4E - 0x78: This is the string "This program cannot be run in DOS mode.\r\r\n" with a dollar sign at the end, which tells DOS that this is the end of the string (43 bytes in total).
0x79 - 0x7F: These are NULL bytes; I guess that they are inserted for alignment.
So I know what the first 128 bytes are for. My question is: What are the next 128 bytes (0x80 - 0xFF) used for? (The PE image starts after them at 0x100.)
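The layout described in the question can be checked with a small sketch. It assumes the usual Microsoft linker stub (the 14 code bytes below are the common ones; other linkers emit different stubs), and the names STUB_CODE, INSTRUCTIONS and stub_layout are made up for this illustration.

```python
# The 14 code bytes of the usual Microsoft linker stub (assumed; other linkers differ).
STUB_CODE = bytes.fromhex("0e1fba0e00b409cd21b8014ccd21")
MESSAGE = b"This program cannot be run in DOS mode.\r\r\n$"

# (length, mnemonic) pairs annotating STUB_CODE by hand.
INSTRUCTIONS = [
    (1, "push cs"),
    (1, "pop ds"),            # DS = CS so DS:DX can address the message
    (3, "mov dx, 0x000e"),    # message offset relative to the start of the stub code
    (2, "mov ah, 0x09"),      # DOS function 09h: print '$'-terminated string
    (2, "int 0x21"),
    (3, "mov ax, 0x4c01"),    # DOS function 4Ch: exit with return code 1
    (2, "int 0x21"),
]

def stub_layout(code_start=0x40):
    """Return (file offset, hex bytes, mnemonic) rows plus the message range."""
    rows, pos = [], 0
    for length, text in INSTRUCTIONS:
        rows.append((code_start + pos, STUB_CODE[pos:pos + length].hex(), text))
        pos += length
    msg_start = code_start + pos
    msg_end = msg_start + len(MESSAGE) - 1
    return rows, (msg_start, msg_end)

rows, msg_range = stub_layout()
for offset, raw, text in rows:
    print(f"0x{offset:02X}: {raw:<8} {text}")
print("message: 0x%02X - 0x%02X" % msg_range)
```

The value loaded into DX (0x000E) is the length of the code, which is why the message starts right after the last instruction.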

It's the so-called undocumented "Rich header". It's a weakly encrypted block of data inserted by the Microsoft linker that indicates what Microsoft tools were used to make the executable. It includes version information from the object files linked, so includes information on what compilers, assemblers and other tools were used.
To decode the Rich header search for the Rich marker and then obtain the 32-bit encryption key that follows. Then working backwards from the Rich marker, XOR the key with the 32-bit values stored there until you find a decoded DanS marker. In between these two markers will be a list of pairs of 32-bit values. The first value of the pair identifies the Microsoft tool used, and the second value indicates how many linked object files were created using this tool. The upper 16-bit part of the tool id value indicates what kind of tool it was, and the lower 16-bit part identifies the build version of the tool.
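The decoding steps above can be sketched roughly as follows. This is only an illustration: decode_rich_header is a hypothetical helper, and it assumes that the DanS marker is followed by three padding dwords that decode to zero, which is what Microsoft linkers normally emit.

```python
import struct

def decode_rich_header(data):
    """Return (key, [(tool_type, build, count), ...]) or None if no Rich header."""
    rich = data.find(b"Rich")
    if rich < 0 or rich + 8 > len(data):
        return None
    key = struct.unpack_from("<I", data, rich + 4)[0]
    dans = struct.unpack("<I", b"DanS")[0]
    # Walk backwards one dword at a time, decoding until DanS appears.
    values = []
    pos = rich - 4
    while pos >= 0:
        value = struct.unpack_from("<I", data, pos)[0] ^ key
        if value == dans:
            break
        values.append(value)
        pos -= 4
    else:
        return None  # reached the start of the data without finding DanS
    values.reverse()
    # Assumption: DanS is followed by three padding dwords that decode to 0.
    values = values[3:]
    entries = []
    for i in range(0, len(values) - 1, 2):
        tool_id, count = values[i], values[i + 1]
        # Upper 16 bits: kind of tool; lower 16 bits: build version.
        entries.append((tool_id >> 16, tool_id & 0xFFFF, count))
    return key, entries
```

Passing the first few hundred bytes of an executable (for example open(path, "rb").read(0x400)) should be enough, since the block sits between the DOS stub and the PE header.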

Related

How to binary analyze a Windows exe file?

I want to binary analyze a Windows EXE file without Windows API calls (because I will do it from another OS). I want to distinguish 2 x 2 types:
Is it a windowed program or a command line program?
Is it a Win32 or a Win64 program?
I hope that there are general bit structures which I can query.
The link Microsoft PE and COFF Specification was useful, but a little tricky. Here is my result now:
Every Windows program has a DOS program block showing a text like "This program cannot be run under DOS" or a similar text. The length of the DOS block can differ, and the "real Windows program" begins after it. The offset of the Windows program is stored little-endian starting at offset 0x3c: 0x3c holds the low byte and 0x3d the next one, so 256*(0x3d) + (0x3c) gives the offset as long as it is below 65536. (The field is actually 32 bits wide and occupies 0x3c through 0x3f.)
The real Windows program begins with four bytes: "PE", followed by two null bytes. The fifth and sixth bytes are 4c 01 for a Win32 program and 64 86 for a Win64 program (the little-endian machine values 0x014c and 0x8664).
To check whether the program is text based, read the 16-bit value at offset 0x5c (counted from "PE" = 0x00). A value of 3 means text based (console), 2 means a windowed GUI program.
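A minimal sketch of these checks, readable from any OS, might look like this. The names classify_pe, MACHINES and SUBSYSTEMS are invented for the example, and only the two machine types and two subsystems mentioned above are mapped to labels.

```python
import struct

MACHINES = {0x014C: "Win32 (x86)", 0x8664: "Win64 (x64)"}
SUBSYSTEMS = {2: "GUI", 3: "console"}

def classify_pe(data):
    """Return (machine, subsystem) labels for a PE image held in `data`."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # The offset of the PE header is a 32-bit little-endian value at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # Machine follows the signature; Subsystem sits 0x5C bytes after "PE".
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    (subsystem,) = struct.unpack_from("<H", data, pe_offset + 0x5C)
    return (MACHINES.get(machine, hex(machine)),
            SUBSYSTEMS.get(subsystem, str(subsystem)))
```

Typical use would be classify_pe(open("program.exe", "rb").read()), where program.exe is a placeholder path. Values outside the two small tables are returned as raw numbers rather than guessed at.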

How can a windows executable be of only 128 bytes

Look at this post, which describes a technique for putting executable code in the first 128 bytes of a DICOM file, i.e. in the preamble section. This way the file can be read both as a DICOM file and as a PE executable.
This git repo demonstrates the same. However, it doesn't show the code; it only has the binaries.
Now my question: how can an executable fit in only 128 bytes? I understand from this, this and this SO posts that a minimal exe takes at least a few KB.
From looking at image 1 it appears pretty simple: a valid DOS header is placed in the free area while the full PE image is embedded later in the file; the author put it between two legitimate DICOM meta entries, for example. The DOS header is really short and has a field named e_lfanew which holds the file offset to IMAGE_NT_HEADERS. In other words, you don't actually need 128 bytes for the full image; you can embed it anywhere in the file as long as it doesn't interfere with DICOM. All that's needed at the start is the DOS header.
Before answering how to put an executable in 128 bytes, we need to understand a few things first.
A DICOM file must have the characters DICM (the prefix) at bytes 128-131, immediately after the 128-byte preamble, to be recognized as a DICOM file.
A Windows executable file must have the DOS header in the first 64 bytes of the file to be executable, as per the PE (Portable Executable) file format.
Combining the above two points, a new file format called PEDICOM is created, which is both a DICOM file and an executable. The PEDICOM has the architecture shown in the image above.
The PEDICOM contains both the header and content of the executable file in different sections because an entire executable can’t be fit inside 128 bytes.
Windows provides a list of structures and Win32 APIs to read/write these PE files section by section in winnt.h header.
Creating a PEDICOM file:
The DOS header (IMAGE_DOS_HEADER) has a field named e_lfanew which contains the file offset of the actual PE content. This allows an executable to be split across at least two locations in the file.
The PE header (IMAGE_NT_HEADERS) holds the number of sections, and the section table that follows it points to the sections (code, data, etc.).
Now to answer the original question: an entire executable can't be kept in 128 bytes. However, 128 bytes of data are sufficient to declare a file as executable, i.e. the DOS header and the DOS stub can be kept in the 128 bytes while the rest of the executable is kept somewhere else, in this case in a private DICOM tag, and a field in the header points to it. This makes the containing file a valid and legitimate executable.
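To illustrate how one file can satisfy both readers, here is a small sketch that inspects the start of a file. It assumes the DICOM layout of a 128-byte preamble followed by the four-byte DICM prefix; inspect_preamble is a hypothetical helper, not something from the post or the repo.

```python
import struct

def inspect_preamble(data):
    """Report how the start of a file looks to a DICOM reader and a PE loader."""
    info = {
        "dicom_prefix": data[128:132] == b"DICM",
        "mz_signature": data[:2] == b"MZ",
        "pe_offset": None,
        "pe_signature": False,
    }
    if info["mz_signature"] and len(data) >= 0x40:
        # e_lfanew (offset 0x3C) says where the real PE headers live,
        # which may be far beyond the 128-byte preamble.
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        info["pe_offset"] = pe_offset
        info["pe_signature"] = data[pe_offset:pe_offset + 4] == b"PE\0\0"
    return info
```

For a PEDICOM-style file one would expect both dicom_prefix and pe_signature to be true, with pe_offset pointing well past byte 132.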

Is there a safe way to identify MS-DOS executable?

I'm trying to identify and filter out all MS-DOS executable files from the executable files I have.
As far as I know, PE files differ from MS-DOS executables by the headers they have and MS-DOS executables lack, but for some reason some of my samples are recognized by TrID as MS-DOS even though they are PE.
I can't find any documentation on the subject, and I searched a lot.
Thanks!
The problem with identifying MS-DOS executables is that technically Windows PECOFF executables are also valid MS-DOS executables. PECOFF executables are prefixed with an "MS-DOS Stub", which is a complete MS-DOS program that in most executables prints a message like "This program cannot be run in DOS mode".
So the first thing to do is to look at the MS-DOS executable header and see if it's valid. It looks like this (from Ralf Brown's Interrupt List):
Offset  Size     Description
00h     2 BYTEs  .EXE signature, either "MZ" or "ZM" (5A4Dh or 4D5Ah)
                 (see also #01593)
02h     WORD     number of bytes in last 512-byte page of executable
04h     WORD     total number of 512-byte pages in executable (includes any
                 partial last page)
06h     WORD     number of relocation entries
08h     WORD     header size in paragraphs
0Ah     WORD     minimum paragraphs of memory required to allocate in addition
                 to executable's size
0Ch     WORD     maximum paragraphs to allocate in addition to executable's size
0Eh     WORD     initial SS relative to start of executable
10h     WORD     initial SP
12h     WORD     checksum (one's complement of sum of all words in executable)
14h     DWORD    initial CS:IP relative to start of executable
18h     WORD     offset within header of relocation table
                 40h or greater for new-format (NE,LE,LX,W3,PE,etc.) executable
1Ah     WORD     overlay number (normally 0000h = main program)
The key values to check are at offsets 00h and 18h. The two bytes at the start of the file, the signature, must be "MZ" or 5A4Dh. While "ZM" also works for MS-DOS programs, Windows requires that PECOFF executables use the more common "MZ" signature. The next thing to check is the 16-bit value at offset 18h. It needs to be greater than or equal to 40h for this to be a PECOFF executable.
If the values at offsets 00h and 18h check out, then the next thing to do is to read the 32-bit value at offset 3Ch. This contains the offset of the actual PECOFF header. You then need to check that the header starts with the signature "PE\0\0", that is, the two characters "P" and "E", followed by two 0 bytes.
Note that it's possible to find other letters at the location given at offset 3Ch, like "NE", "LE", "LX", which were used for 16-bit Windows executables, VxDs, and 32-bit OS/2 executables respectively. These other executable formats also have MS-DOS stubs and locate their real headers the same way.
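Put together, the checks described in this answer could be sketched like this. identify_executable and NEW_SIGNATURES are names invented for the example, and anything with a valid MZ/ZM signature that does not lead to a recognized new-format header is simply reported as plain MS-DOS, which is a simplification.

```python
import struct

NEW_SIGNATURES = {b"PE\0\0": "PE", b"NE": "NE", b"LE": "LE", b"LX": "LX"}

def identify_executable(data):
    """Classify a file image as 'PE', 'NE', 'LE', 'LX', 'MS-DOS', or None."""
    if len(data) < 0x1C or data[:2] not in (b"MZ", b"ZM"):
        return None
    # Offset 18h: relocation table offset, 40h or greater for new formats.
    (reloc_offset,) = struct.unpack_from("<H", data, 0x18)
    if reloc_offset < 0x40 or len(data) < 0x40:
        return "MS-DOS"
    # Offset 3Ch: file offset of the new-format header.
    (new_offset,) = struct.unpack_from("<I", data, 0x3C)
    signature = data[new_offset:new_offset + 4]
    for magic, name in NEW_SIGNATURES.items():
        if signature.startswith(magic):
            # Windows only accepts "MZ" (not "ZM") in front of a PE header.
            if name == "PE" and data[:2] != b"MZ":
                return "MS-DOS"
            return name
    return "MS-DOS"
```

Filtering a sample set for pure MS-DOS programs then comes down to keeping the files for which the function returns "MS-DOS".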

8086 Assembly / MS-DOS, passing file name from the command line

Say I have PROGRAM.ASM - I have the following in the data segment:
.data
Filename db 'file.txt', 0
Fhndl dw ?
Buffer db ?
I want 'file.txt' to be dynamic I guess? Once compiled, PROGRAM.exe needs to be able to accept a file name via the command line:
c:\> PROGRAM anotherfile.txt
EXECUTION GOES HERE
How do I enable this? Thank you in advance.
DOS stores the command line in a legacy structure called the Program Segment Prefix ("PSP"). And I do mean legacy. This structure was designed to be backwards-compatible with programs ported from CP/M.
Where's the PSP?
You know how programs built as .COM files always start with ORG 100h? The reason for that is precisely that - for .COM programs - the PSP is always stored at the beginning of the code segment (at CS:0h). The PSP is 100h (256) bytes long, occupying CS:00h through CS:0FFh, and the actual program code starts right after that (that is, at CS:100h).
The address is also conveniently available at DS:00h and ES:00h, since the key characteristic of the .COM format is that all the segment registers start with the same value (and a COM program typically never changes them).
To read the command line from a .COM program, you can read its length at CS:80h (or DS:80h, etc. as long as you haven't changed those registers). The command line starts at CS:81h and takes the rest of the PSP, ending with a Carriage Return (0Dh) as a terminator, so the command line is never more than 126 bytes long.
(And that is why the command line has been 126 bytes in DOS forever, despite the fact we all wished for years it could be made longer. Since WinNT provides a different mechanism to access the command line, the WinNT/XP/etc. command line doesn't suffer from this size limitation.)
For an .EXE program, you can't rely on CS:00h because the startup code segment can be just about anywhere in memory. However, when the program starts, DOS always stores the PSP at the base of the default data segment. So, at startup, DS:00h and ES:00h will always point to the PSP, for both .EXE and .COM programs.
If you didn't keep track of PSP address at the beginning of the program, and you change both DS and ES, you can always ask DOS to provide the segment value at any time, via INT 21h, function 62h. The segment portion of the PSP address will be returned in BX (the offset being of course 0h).
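The PSP layout described above can be modelled outside DOS. The following Python sketch is not DOS code; it only mirrors the offsets (length byte at 80h, text from 81h, CR terminator) on a 256-byte buffer, and command_tail is a made-up helper name.

```python
def command_tail(psp):
    """Extract the command tail from a 256-byte PSP image."""
    if len(psp) != 0x100:
        raise ValueError("a PSP is 100h (256) bytes long")
    length = psp[0x80]                 # byte count, excluding the CR
    tail = psp[0x81:0x81 + length]
    cr = tail.find(b"\r")
    if cr >= 0:                        # be defensive if the count is off
        tail = tail[:cr]
    return tail.decode("ascii", errors="replace")
```

For "PROGRAM anotherfile.txt" the tail normally includes the separating space, so a real program would skip leading blanks and copy the remaining characters into its own zero-terminated Filename buffer before opening the file.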

Which of the MS-DOS header fields are mandatory/optional?

The above is the complete list of MS-DOS header fields, but I don't know which of them are mandatory and which are optional, does anyone know?
If you're trying to create a PE image, e_magic (magic number) and e_lfanew (file address of new exe header) are the only mandatory fields that you have to fill in. e_lfanew should point to the IMAGE_NT_HEADERS structure.
Well, back in 2006 someone wanted to create the world's tiniest PE. For this he wrote a small PE fuzzer and used the smallest possible codebase:
return 42;
He managed to get the following PE sizes:
If you are too busy to read the entire page, here is a summary of the results:
Smallest possible PE file: 97 bytes
Smallest possible PE file on Windows 2000: 133 bytes
Smallest PE file that downloads a file over WebDAV and executes it: 133 bytes
You can check his work here:
http://www.phreedom.org/research/tinype/
He also states the required header values. These are:
e_magic
e_lfanew
Machine
NumberOfSections
SizeOfOptionalHeader
Characteristics
OptionalHeader:
Magic
AddressOfEntryPoint
ImageBase
SectionAlignment
FileAlignment
MajorSubsystemVersion
SizeOfImage
SizeOfHeaders
Subsystem
SizeOfStackCommit
SizeOfHeapReserve
For MS-DOS, all of the header fields are mandatory.
For Win9x and above, e_lfanew must be the offset from the start of the image to the start of the IMAGE_NT_HEADERS, and e_magic must be IMAGE_DOS_SIGNATURE ('MZ').
