How do I make space for my code cave in a Windows PE 32bit executable - windows

So I want to make a space for my code caves in minesweeper.exe (typical Windows XP minesweeper game, link: Minesweeper). So I modified the PE header of the file via CFF Explorer to increase size of the .text section.
I tried increasing raw size of .text segment by 1000h (new size was 3B58), but Windows was unable to locate the entry point and the game failed to launch. Then I tried increasing the size of the .rsrc section, adding a new section, increasing the image size, but none of those attempts were successful, Windows was saying that "This is not x32 executable".
So here is the question: how do I make space for my code cave? I don't want to search for empty space left by the compiler, I want to have nice and clean 1000h bytes for my code. A tutorial for that and a detailed explanation for how to do that without corrupting a game would be GREAT! (And yes, I am actually hacking a minesweeper)

You can't increase the size of a section without invalidating the following ones (typically because it invalidates offsets and addresses in those sections). This remains possible but it's extremely error prone and doesn't worth the hassle when you have a simpler solution.
Typically, you juste need to add a section at the end of the PE and jump there from the code section. There is usually a little bit of space at the end of the code section (code cave) so you can place your JMPs (or a little code stub) there to redirect to the new section. You can also add other new sections for data or new resources or whatever you want.
Note: I'm using two tools: CFF explorer as a PE browser; an hex editor.
This file is quite particular so it is a little bit harder than usual to add a new section.
Let's start!
Below is an hex view of the array of IMAGE_SECTION_HEADER:
Usually there is some room to add a new section but in this particular case, there's none... The last section header is immediately followed by something.
Judging by the content, this is probably a bound import directory, which is confirmed in CFF explorer (offset of the bound directory is 0x248):
Bound import directory are of no use today, especially with ASLR, so we can zero out the whole directory (its size is 0xA8 bytes as indicated in the previous screenshot):
You can also zero out the Bound Import directory RVA in the Data Directories although this is not strictly required:
Now, it's time to add the new section.
Add a new section
Minesweeper comes with 3 sections by default, so increment the Number of sections from 3 to 4:
Go to the sections headers and add a new section (you can do it directly in CFF explorer; I named mine, .foobar, be wary that section names are at most 8 characters and don't need to end with a NULL byte):
You need to choose two numbers:
The raw size of the new section (I picked 0x400) ; it must be a multiple of FileAlignment (which is 0x200 in this case).
The virtual size of the new section (I picked 0x1000); it must be a multiple of SectionAlignement (which is 0x1000 for this binary).
Now we" need to calculate the two other members, Virtual Address and Raw Address.
Virtual Address
Let's take an example with the first and second section.
The first section starts at virtual address 0x1000 and has a virtual Size of 0x3A56. The next section virtual address must be aligned on SectionAlignement (0x1000) so the calculation is (using python here):
>>> def round_up_multiple_of(number, multiple):
num = number + (multiple - 1)
return num - (num % multiple)
>>> hex(round_up_multiple_of(0x1000 + 0x3a56, 0x1000))
'0x5000'
Which gives 0x5000 which is right (.data section starts at virtual address 0x5000).
Now, where our last section should start?
.rsrc section starts at 0x6000 and has a size of 0x19160:
>>> hex(round_up_multiple_of(0x6000 + 0x19160, 0x1000))
'0x20000'
So it must start at virtual address 0x20000. Put that number in Virtual Address.
Raw address
(Typically this is not needed as all sections are already aligned the last section must start right at the end of the file, but we'll do it).
As a reminder, the raw address is an address in the file (not in memory).
Let's start with an example (first and second section):
The first section raw address is 0x400 and its raw size 0x3c00. FileAlignement is 0x200, thus:
>>> hex(round_up_multiple_of(0x400 + 0x3c00, 0x200))
'0x4000'
The second section should start on the file (its Raw address) at 0x4000 which is right.
Thus for our new section, the calculation is:
.rsrc section starts in the file at 0x4200
.rsrc section size on file is 0x19200
FileAligment is 0x200
The calculation is as follow:
>>> hex(round_up_multiple_of(0x4200 + 0x19200, 0x200))
'0x1d400'
Our last section starts at the raw address 0x1d400 in the file which is confirmed with an hex editor:
Final steps
One last step is required, the calculation of the SizeOfImage field in the Optional header. According to the PE specification the field is:
The size (in bytes) of the image, including all headers, as the image
is loaded in memory. It must be a multiple of SectionAlignment.
Hence the calculation can be simplified as: VirtualAddress + VirtualSize of the last section, aligned on SectionAlignment (0x1000):
>>> hex(round_up_multiple_of(0x20000 + 0x1000, 0x1000))
'0x21000'
Now, save all your modifications in CFF explorer and exit.
Adding room for the new section
The last step is to add the required bytes for the last section. As I choose a Raw size of 0x400, I insert 0x400 bytes at Raw Address (0x1d400) with an hex editor.
Save you file. If you followed all the steps it must work (tested on Win 10) as is and you can start the modified executable without any errors.
Try to experience with a different raw size for the new section if 0x400 is not enough.
Now you have a new empty section, the rest is up to you for modifying the code :)

Related

Finding and patching an instruction in a DLL

I have a (C++) program where, in one of its dll's, the following is done:
if (m_Map.GetMaxValue() >= MAX_CLASSES) {
I have two binaries of this program (compiled with various versions of Visual Studio), one where MAX_CLASSES was #define'd to 50, and one where it was 75. These binaries were made from different branches of the code and the other functionality is different as well. What I need is a version of the binary where the MAX_CLASSES was defined as 50, except with the higher limit i.e. 75.
So a sane person would change the constant in the source code of the branch I need, rebuild and go home. But, building this software is complex because it's old, the dependencies and tooling are old, etc.; plus I have issues with building the installers, and data and so on. So, I thought, how about I just patch this binary so that this one constant is changed directly in the DLL. I have vague recollections of doing similar things in the 1990's for, eh, probably 'educational' purposes.
But times have changed and I barely remember doing it, let alone how I did things back then. I opened the DLL (one where the limit is set to 75, this is the binary I have at hand - I will have to re-do this as soon as I have the actual binary with the 50 limit, so the following references 75 i.e. 0x4b for illustrating the principle) in Ghidra and after some poking around, I found the following:
18005160e 3c 4b CMP AL,0x4b
180051610 0f 82 19 JC LAB_18005172f
01 00 00
Which in the decompiler window I could link back to
if (bVar3 < 0x4b)
and some operations after that that I can map to the source code of the function I have.
Now my questions are:
how do I interpret the values above (the Ghidra output) wrt to the binary layout of the dll? When I hover over the first column value ('18005160e') in Ghidra, I get values for 'imagebase offset', 'memory block offset', 'function offset' and 'byte source offset'. Is this 'byte source offset' the physical address from the start of the dll where these instructions start? The actual value in this hover balloon is 50a0eh - is that Ghidra's notation for 0x50a0e ? I.e. does the trailing 'h' denote 'hex'?
I then tried to open the dll in a regular hex editor ('Hex Editor Neo' which I like to use to view/edit binary data files), and went to offset 0x50a0e, and looked for the values '3c 4b' around there which I didn't find. I searched for this byte sequence in the whole file, and found 7 occurrences, none of which are around 0x50a0e, leading me to think I'm misinterpreting Ghidra's 'byte source offset' here.
how do I make a 'patcher' for this? I would think what I need is a program that only does
FILE* fh = fopen('mydll.dll);
fseek(fh, 0x[magic constant]);
fwrite(fh, 0x4b);
fclose(fh);
where '0x[magic constant]' is hopefully just the value I got from Ghidra in 'byte source offset'? Or is there anything else I need to consider here? Is there any software tool that can generate a patcher program?
Thanks.
18005160e is a VA, a Virtual Address.
It is the sum of a Base Address (most likely 180000000) and an RVA, a Relative Virtual Address.
Find the Base Address of the DLL with any PE inspecting tool (e.g. CFF Explorer) or Ghidra itself.
Subtract the base address from 18005160e to the RVA. Let's say the result is 5160e.
Now you need to find which section this RVA lies in. Again use an PE inspecting tool to find the list of the sections and their RVA/Virtual start and RVA/Virtual size.
Say the RVA lies in the .text section with start at the RVA 1000.
Subtract this start RVA from the result above: 5160e - 1000 = 4160e.
This is the offset of the instruction in the .text section.
To find the offset in the file, just add the raw/offset start of the section (again you can find this with a PE inspecting tool).
Say the .text section starts at the offset 400, then 4160e + 400 = 41a0e is the offset corresponding to the VA 18005160e.
This is all PE 101.

How can i calculate the file offset of the memory virtual address of the export table?

so, i was trying to read a DLL file, everything was fine till i reach the Optional Header Data Directories, specifically its first member, the Export Table.
My problem is that i can't move the offset of my reader because the virtual address member is based on memory VA, and my reader is based on file offset. May a visual example helps:
As you can see, the loaded virtual address that this PE viewer reads at the Export Table Address from the Data Directory(Optional Header) is the value 0x00002630(lets refer to it as hex1 from now on).
However, when i click on the Export Table to see the actual content, the program does the conversion of this address from memory to file offset, redirecting me to this address as the result:
The address that it redirects me is the 0x00001a30(lets refer to it as hex2 from now on).
I did some tests on my own like dividing the hex1 per 8 because i thought it could be the transition from memory alignment which is 4096 and the file alignment which is 512 but it didn't gave me the same result as hex2. I also did some weird stuff to try to get that formula, but it gave me even more bizarre results.
So, my question would be, how could i get/calculate that file offset(hex2) if i only know the memory offset at the Data Directory(hex1)?
Assuming you are using MSVC C/C++, you first need to locate the array of IMAGE_SECTION_HEADER structures following the Optional Header. The SDK has a macro called IMAGE_FIRST_SECTION(pNtHeaders) in which you just pass the pointer of your PE header to make this process easier. It basically just skips past the optional header in memory which is where the section headers start. This macro will also work on either 32-bit or 64-bit Windows PE files.
Once you have the address of the IMAGE_SECTION_HEADER array, you loop through the structures up to FileHeader.NumberOfSections using pointer math. Each of the structures describe the relative starting of a memory address (VirtualAddress) for each of the named PE sections along with the file offset (PointerToRawData) to that section within the file you have loaded.
The size of the section WITHIN the file is SizeOfRawData. At this point, you now have everything you need to translate any given RVA to a file offset. First range check each IMAGE_SECTION_HEADER's VirtualAddress with the RVA you are looking up. I.e.:
if (uRva >= pSect->VirtualAddress && (uRva < (pSect->VirtualAddress + pSect->SizeOfRawData))
{
//found
}
If you find a matching section, you then subtract the VirtualAddress from your lookup RVA, then add the PointerToRawData offset:
uFileOffset = uRva - pSect->VirtualAddress + pSect->PointerToRawData
This results in an offset from the beginning of the file corresponding to that RVA. At this point you have translated the RVA to a file offset.
NOTE: Due to padding, incorrect PE files, etc., you may find not all RVAs will map to a location within the file at which point you might display an error message.

IDA Pro disassembly shows ? instead of hex or plain ascii in .data?

I am using IDA Pro to disassemble a Windows DLL file. At one point I have a line of code saying
mov esi, dword_xxxxxxxx
I need to know what the dword is, but double-clicking it brings me to the .data page and everything is in question marks.
How do I get the plain text that is supposed to be there?
If you see question marks in IDA, this means that there's no physical data at this location on the file (on your disk drive).
Sections in PE files have a physical size (given by the SizeOfRawData field of the section header). This physical size (on disk) might be different from the size of the section once it is mapped onto the process memory by the Windows' loader (this size is given by the VirtualSize field of the section header).
So, if the VirtualSize field is bigger than the SizeOfRawData field, a part of the section has no physical existence and it exists only in memory (once the file is mapped onto the process address space).
On most case, at program entry point, you can assume this memory is filled with 0 (but some parts of the memory might be written by the windows loader).
To get the locations where the data is being written, read or loaded you can use cross-references (xref). Here's an example :
Click on the name of the data from which you want the xref :
Then press 'x', you'll be shown all known (to ida) location where the data is used :
The second column indicates how the data is used:
r means it is read
w means it is written
o means it is loaded as a pointer

x86 segmentation, DOS, MZ file format, and disassembling

I'm disassembling "Test Drive III". It's a 1990 DOS game. The *.EXE has MZ format.
I've never dealt with segmentation or DOS, so I would be grateful if you answered some of my questions.
1) The game's system requirements mention 286 CPU, which has protected mode. As far as I know, DOS was 90% real mode software, yet some applications could enter protected mode. Can I be sure that the app uses the CPU in real mode only? IOW, is it guaranteed that the segment registers contain actual offset of the segment instead of an index to segment descriptor?
2) Said system requirements mention 1 MB of RAM. How is this amount of RAM even meant to be accessed if the uppermost 384 KB of the address space are reserved for stuff like MMIO and ROM? I've heard about UMBs (using holes in UMA to access RAM) and about HMA, but it still doesn't allow to access the whole 1 MB of physical RAM. So, was precious RAM just wasted because its physical address happened to be reserved for UMA? Or maybe the game uses some crutches like LIM EMS or XMS?
3) Is CS incremented automatically when the code crosses segment boundaries? Say, the IP reaches 0xFFFF, and what then? Does CS switch to the next segment before next instruction is executed? Same goes for SS. What happens when SP goes all the way down to 0x0000?
4) The MZ header of the executable looks like this:
signature 23117 "0x5a4d"
bytes_in_last_block 117
blocks_in_file 270
num_relocs 0
header_paragraphs 32
min_extra_paragraphs 3349
max_extra_paragraphs 65535
ss 11422
sp 128
checksum 0
ip 16
cs 8385
reloc_table_offset 30
overlay_number 0
Why does it have no relocation information? How is it even meant to run without address fixups? Or is it built as completely position-independent code consisting from program-counter-relative instructions? The game comes with a cheat utility which is also an MZ executable. Despite being much smaller (8448 bytes - so small that it fits into a single segment), it still has relocation information:
offset 1
segment 0
offset 222
segment 0
offset 272
segment 0
This allows IDA to properly disassemble the cheat's code. But the game EXE has nothing, even though it clearly has lots of far pointers.
5) Is there even such thing as 'sections' in DOS? I mean, data section, code (text) section etc? The MZ header points to the stack section, but it has no information about data section. Is data and code completely mixed in DOS programs?
6) Why even having a stack section in EXE file at all? It has nothing but zeroes. Why wasting disk space instead of just saying, "start stack from here"? Like it is done with BSS section?
7) MZ header contains information about initial values of SS and CS. What about DS? What's its initial value?
8) What does an MZ executable have after the exe data? The cheat utility has whole 3507 bytes in the end of the executable file which look like
__exitclean.__exit.__restorezero._abort.DGROUP#.__MMODEL._main._access.
_atexit._close._exit._fclose._fflush._flushall._fopen._freopen._fdopen
._fseek._ftell._printf.__fputc._fputc._fputchar.__FPUTN.__setupio._setvbuf
._tell.__MKNAME._tmpnam._write.__xfclose.__xfflush.___brk.___sbrk._brk._sbrk
.__chmod.__close._ioctl.__IOERROR._isatty._lseek.__LONGTOA._itoa._ultoa.
_ltoa._memcpy._open.__open._strcat._unlink.__VPRINTER.__write._free._malloc
._realloc.__REALCVT.DATASEG#.__Int0Vector.__Int4Vector.__Int5Vector.
__Int6Vector.__C0argc.__C0argv.__C0environ.__envLng.__envseg.__envSize
Is this some kind of debugging symbol information?
Thank you in advance for your help.
Re. 1. No, you can't be sure until you prove otherwise to yourself. One giveaway would be the presence of MOV CR0, ... in the code.
Re. 2. While marketing materials aren't to be confused with an engineering specification, there's a technical reason for this. A 286 CPU could address more than 1M of physical address space. The RAM was only "wasted" in real mode, and only if an EMM (or EMS) driver wasn't used. On 286 systems, the RAM past 640kb was usually "pushed up" to start at the 1088kb mark. The ISA and on-board peripherals' memory address space was mapped 1:1 into the 640-1024kb window. To use the RAM from the real mode needed an EMM or EMS driver. From protected mode, it was simply "there" as soon as you set up the segment descriptor correctly.
If the game actually needed the extra 384kb of RAM over the 640kb available in the real mode, it's a strong indication that it either switched to protected mode or required the services or an EMM or EMS driver.
Re. 3. I wish I remembered that. On reflection, I wish not :) Someone else please edit or answer separately. Hah, I did know it at some point in time :)
Re. 4. You say "[the code] has lots of instructions like call far ptr 18DCh:78Ch". This implies one of three things:
Protected mode is used and the segment part of the address is a selector into the segment descriptor table.
There is code there that relocates those instructions without DOS having to do it.
There is code there that forcibly relocates the game to a constant position in the address space. If the game doesn't use DOS to access on-disk files, it can remove DOS completely and take over, gaining lots of memory in the process. I don't recall whether you could exit from the game back to the command prompt. Some games where "play until you reboot".
Re. 5. The .EXE header does not "point" to any stack, there is no stack section you imply, the concept of sections doesn't exist as far as the .EXE file is concerned. The SS register value is obtained by adding the segment the executable was loaded at with the SS value from the header.
It's true that the linker can arrange sections contiguously in the .EXE file, but such sections' properties are not included in the .EXE header. They often can be reverse-engineered by inspecting the executable.
Re. 6. The SS and SP values in the .EXE header are not file pointers. The EXE file might have a part that maps to the stack, but that's entirely optional.
Re. 7. This is already asked and answered here.
Re. 8. This looks like a debug symbol list. The cheat utility was linked with the debugging information left in. You can have completely arbitrary data there - often it'd various resources (graphics, music, etc.).

Import Table in PE (.exe)

I found the pointer for the "Import Table" field. Which is 8 bytes in size and is divided into Virtual Address and Size. However the value in Virtual Address field is to big and is misleading my efforts to extract any information relating the whereabouts for entries relating to the Import Table. Is the value pointing to the offset, if so the (.exe) file finishes before reaching the desired offset.
The RVA (relative virtual address) of the Import Directory in the Directory Table must be valid. Perhaps your conversion of it into a physical offset is malfunctioning. Of course, that is done by traversing the section table to find the containing section. Then subtract that section's starting RVA from the target RVA. Then simply add the physical offset of the section to this result. That will give you the position within the file of the Import Directory. Conversions to and from RVAs to physical offsets may be necessary often if you are working with the on-disk file. If working with the in-memory image, sometimes protection utilities modify or destroy parts of the PE header in memory to deter dumping.
Once you get to the Import Directory you still have more work to do. A quick Google showed a better explanation than I'm likely to write here: http://sandsprite.com/CodeStuff/Understanding_imports.html
I am the author of the now 'classic' PECompact, PEBundle (now discontinued), and other PE manipulation utilities.

Resources