Finding the Raw entrypoint - windows

I want to be able to find out where the code appearing at the entry point comes from by looking at the PE header.
For example, this is the code at the start of my program (401000h):
00401000 >/$ 58 POP EAX ; kernel32.76E93677
00401001 |. 2D 77360100 SUB EAX,13677
00401006 |. BB 4A184000 MOV EBX,<JMP.&kernel32.VirtualProtect>
I want to know where this code comes from. How can I find it without manually scanning my file? (To complete the example, here is a hexdump from the same file; the code resides at offset 200h.)
Offset 0 1 2 3 4 5 6 7 8 9 A B C D E F
00000200 58 2D 77 36 01 00 BB 4A 18 40 00
How can I get from my virtual entry point (401000h) to the raw entry point (200h)?
I tried solving it myself of course. But I'm missing something. At first I thought:
.text[ Entrypoint (1000h) - VirtualOffset (1000d) ] = raw entrypoint
Since the file alignment is 200 and the raw entry point was at the very start of my .text section, I thought I could use this for all executables.
Solved: I made stupid mistakes when calculating the raw entry point.
.text[ Entry point - Virtual offset ] + File Alignment = Raw entry point (relative to .text section)

To locate the offset in the file yourself, you need to look at the _IMAGE_NT_HEADERS structure. From it you can get the IMAGE_OPTIONAL_HEADER, which contains
the member you are interested in, ImageBase. You can change its value with EditBin /REBASE, so there is little need to roll your own tool.
For reference, here is how you can determine the entry point with dumpbin /headers:
dumpbin /headers \Windows\bfsvc
Dump of file \Windows\bfsvc.exe
PE signature found
File Type: EXECUTABLE IMAGE
FILE HEADER VALUES
14C machine (x86)
4 number of sections
4A5BBFB3 time date stamp Tue Jul 14 01:13:55 2009
0 file pointer to symbol table
0 number of symbols
E0 size of optional header
102 characteristics
Executable
32 bit word machine
OPTIONAL HEADER VALUES
10B magic # (PE32)
9.00 linker version
DE00 size of code
2000 size of initialized data
0 size of uninitialized data
4149 entry point (01004149)
1000 base of code
F000 base of data
1000000 image base (01000000 to 01011FFF)
1000 section alignment
200 file alignment
For the entry point, the image base value is relevant: the entry point address shown in parentheses is the image base plus the entry point RVA (01000000 + 4149 = 01004149). But this is only true for images that are not ASLR enabled. For ASLR-enabled images, a random base address (1 of 128 different ones) is chosen.
The flag that indicates whether an image is ASLR enabled is the value 0x40, which is set in DLL characteristics.
8140 DLL characteristics
For svchost.exe, for example, it is set; for older programs it is generally 0.
Yours,
Alois Kraus
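As a rough illustration of the two header values discussed in the answer above, the following sketch recomputes the entry point address from the dumpbin figures (image base 1000000, entry point RVA 4149) and tests the 0x40 bit in the quoted 8140 DLL characteristics value. The function and constant names are made up for this example; in winnt.h the 0x40 bit is called IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE.

```python
# Values copied from the dumpbin output quoted above.
IMAGE_BASE = 0x1000000
ENTRY_POINT_RVA = 0x4149
DLL_CHARACTERISTICS = 0x8140

# 0x40 is the "dynamic base" (ASLR) bit named in the answer.
DYNAMIC_BASE = 0x40


def entry_point_va(image_base, entry_rva):
    """Entry point address when the image is loaded at its preferred base."""
    return image_base + entry_rva


def is_aslr_enabled(dll_characteristics):
    """True if the dynamic-base bit is set in DllCharacteristics."""
    return bool(dll_characteristics & DYNAMIC_BASE)


print(hex(entry_point_va(IMAGE_BASE, ENTRY_POINT_RVA)))  # 0x1004149
print(is_aslr_enabled(DLL_CHARACTERISTICS))              # True
```

With ASLR in effect, the same addition still applies, only with the base address the loader actually picked instead of the preferred one.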

Have a look at this thread including an answer with a detailed explanation: Calculating the file offset of a entry point in a PE file
AddressOfRawEntryPoint (in EXE file) = AddressOfEntryPoint + .text[PointerToRawData] - .text[VirtualAddress]
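A minimal sketch of that formula, applied to the numbers from the question. The entry point RVA (1000h) and the .text VirtualAddress (1000h) come from the question; the .text PointerToRawData of 200h is an assumption inferred from the hexdump, and the function name is hypothetical.

```python
def rva_to_file_offset(rva, section_virtual_address, section_pointer_to_raw_data):
    """Translate an RVA inside a section to an offset in the file on disk."""
    return rva - section_virtual_address + section_pointer_to_raw_data


entry_point_rva = 0x1000       # AddressOfEntryPoint from the question
text_virtual_address = 0x1000  # .text VirtualAddress from the question
text_pointer_to_raw = 0x200    # assumed: where the hexdump shows the code

raw_entry = rva_to_file_offset(entry_point_rva, text_virtual_address, text_pointer_to_raw)
print(hex(raw_entry))  # 0x200
```

The result only equals the file alignment here because .text is the first section and the headers fit into a single 200h block; in general it is PointerToRawData, not FileAlignment, that has to be added.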

Related

Finding and patching an instruction in a DLL

I have a (C++) program where, in one of its dll's, the following is done:
if (m_Map.GetMaxValue() >= MAX_CLASSES) {
I have two binaries of this program (compiled with various versions of Visual Studio), one where MAX_CLASSES was #define'd to 50, and one where it was 75. These binaries were made from different branches of the code and the other functionality is different as well. What I need is a version of the binary where the MAX_CLASSES was defined as 50, except with the higher limit i.e. 75.
So a sane person would change the constant in the source code of the branch I need, rebuild and go home. But, building this software is complex because it's old, the dependencies and tooling are old, etc.; plus I have issues with building the installers, and data and so on. So, I thought, how about I just patch this binary so that this one constant is changed directly in the DLL. I have vague recollections of doing similar things in the 1990's for, eh, probably 'educational' purposes.
But times have changed and I barely remember doing it, let alone how I did things back then. I opened the DLL (one where the limit is set to 75, this is the binary I have at hand - I will have to re-do this as soon as I have the actual binary with the 50 limit, so the following references 75 i.e. 0x4b for illustrating the principle) in Ghidra and after some poking around, I found the following:
18005160e 3c 4b CMP AL,0x4b
180051610 0f 82 19 JC LAB_18005172f
01 00 00
Which in the decompiler window I could link back to
if (bVar3 < 0x4b)
and some operations after that that I can map to the source code of the function I have.
Now my questions are:
How do I interpret the values above (the Ghidra output) with respect to the binary layout of the DLL? When I hover over the first column value ('18005160e') in Ghidra, I get values for 'imagebase offset', 'memory block offset', 'function offset' and 'byte source offset'. Is this 'byte source offset' the physical offset from the start of the DLL where these instructions start? The actual value in this hover balloon is 50a0eh - is that Ghidra's notation for 0x50a0e? I.e. does the trailing 'h' denote 'hex'?
I then tried to open the dll in a regular hex editor ('Hex Editor Neo' which I like to use to view/edit binary data files), and went to offset 0x50a0e, and looked for the values '3c 4b' around there which I didn't find. I searched for this byte sequence in the whole file, and found 7 occurrences, none of which are around 0x50a0e, leading me to think I'm misinterpreting Ghidra's 'byte source offset' here.
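How do I make a 'patcher' for this? I would think what I need is a program that only does
FILE *fh = fopen("mydll.dll", "r+b");      /* open for in-place binary update */
fseek(fh, 0x[magic constant], SEEK_SET);   /* jump to the byte to patch */
fputc(0x4b, fh);                           /* overwrite it with 75 */
fclose(fh);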
where '0x[magic constant]' is hopefully just the value I got from Ghidra in 'byte source offset'? Or is there anything else I need to consider here? Is there any software tool that can generate a patcher program?
Thanks.
18005160e is a VA, a Virtual Address.
It is the sum of a Base Address (most likely 180000000) and an RVA, a Relative Virtual Address.
Find the Base Address of the DLL with any PE inspecting tool (e.g. CFF Explorer) or Ghidra itself.
Subtract the base address from 18005160e to get the RVA. Let's say the result is 5160e.
Now you need to find which section this RVA lies in. Again, use a PE inspecting tool to find the list of sections and their RVA/Virtual start and RVA/Virtual size.
Say the RVA lies in the .text section, which starts at RVA 1000.
Subtract this start RVA from the result above: 5160e - 1000 = 5060e.
This is the offset of the instruction in the .text section.
To find the offset in the file, just add the raw/offset start of the section (again, you can find this with a PE inspecting tool).
Say the .text section starts at the offset 400; then 5060e + 400 = 50a0e is the offset corresponding to the VA 18005160e.
This is all PE 101.
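The steps can be sketched in a few lines. This is only an illustration under the answer's example values (base 180000000, .text at RVA 1000 with raw start 400); the section size, file contents and function names are invented, and the real values have to be read from the DLL's section table. With these inputs the arithmetic comes to 0x50a0e, which happens to equal the 50a0eh 'byte source offset' quoted in the question. Note that `3c 4b` is the opcode followed by the immediate, so the byte to change sits one past the start of the instruction.

```python
import os
import tempfile


def va_to_file_offset(va, image_base, sections):
    """sections: iterable of (virtual_address, virtual_size, pointer_to_raw_data)."""
    rva = va - image_base
    for virt_addr, virt_size, raw_ptr in sections:
        if virt_addr <= rva < virt_addr + virt_size:
            return rva - virt_addr + raw_ptr
    raise ValueError("RVA 0x%x is not inside any section" % rva)


def patch_byte(path, offset, expected, new):
    """Overwrite one byte in place, but only if it currently holds `expected`."""
    with open(path, "r+b") as fh:
        fh.seek(offset)
        current = fh.read(1)
        if current != bytes([expected]):
            raise ValueError("unexpected byte at 0x%x: %r" % (offset, current))
        fh.seek(offset)
        fh.write(bytes([new]))


# Example values from the answer; the .text size (0x60000) is invented.
sections = [(0x1000, 0x60000, 0x400)]
cmp_offset = va_to_file_offset(0x18005160E, 0x180000000, sections)
print(hex(cmp_offset))  # 0x50a0e

# Demo on a scratch file standing in for the DLL built with the limit 50 (0x32).
scratch = bytearray(cmp_offset + 2)
scratch[cmp_offset:cmp_offset + 2] = b"\x3c\x32"   # CMP AL,0x32
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(scratch)
    scratch_path = tmp.name
try:
    # The immediate operand is the second byte of the instruction.
    patch_byte(scratch_path, cmp_offset + 1, expected=0x32, new=0x4B)
    with open(scratch_path, "rb") as fh:
        fh.seek(cmp_offset)
        patched = fh.read(2)
finally:
    os.remove(scratch_path)
print(patched.hex())  # 3c4b
```

Checking the old byte before writing guards against patching the wrong offset or the wrong build of the DLL.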

Why do all elements of the relocation table have an additional offset?

The question is about loading portable executable images to a random address.
Let's take kernel32.dll as an example, loaded at 0x75A00000.
I can see that at offset 0x10e15 from the image, there is an assembler instruction, which depends on where the image is located.
address:
75A10E13
bytes:
8B 35 18 03 AE 75
command:
MOV ESI,DWORD PTR DS:[75AE0318]
It turns out that when the executable file is launched, the system has to be told that a relocation must be applied at this address.
The system looks at the relocation table, which is in the executable file, and sees the following:
base relocation table
To get the absolute address of the first element to be relocated, I do the following: I add the virtual address of the block to the address of the image, and then I add the first element of the block to the resulting number.
0x75A00000 + 0x10000 + 0x3E15 = 0x75A13E15
It's close, but always 0x3000 more than the address I expect (0x75A10E15). If I just subtract 0x3000, it works. Please help me find the answer: where does the 0x3000 for x86 come from?
Relocations in Portable Executables are resolved when the file is linked. The base relocation table you are referring to has a different function: it is used by the Windows loader when the PE cannot be loaded at the preferred ImageBase address specified by the linker, usually 0x0040_0000.
Dynamic-link libraries shipped with MS Windows are linked to ImageBase addresses that differ for each core DLL and are chosen not to collide with one another, so an executable that imports the usual combination of libraries doesn't have to relocate them.
You misinterpreted the format of the base relocation section .reloc.
The 16-bit TypeOrOffset words that follow PageRVA and BlockSize have their Base Relocation Type encoded in the four most significant bits.
For instance, the first TypeOrOffset entry in your dump, 0x3E15, has type IMAGE_REL_BASED_HIGHLOW (3) and offset 0x0E15, which is the number to be added to PageRVA.
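A short sketch of that decoding, using the numbers from the question (load address 0x75A00000, PageRVA 0x10000, first entry 0x3E15). The function name is hypothetical; the split into a 4-bit type and a 12-bit offset is the layout described above.

```python
IMAGE_REL_BASED_HIGHLOW = 3


def decode_reloc_entry(entry):
    """Split a 16-bit TypeOrOffset word into (type, offset within the page)."""
    return entry >> 12, entry & 0x0FFF


load_base = 0x75A00000   # where kernel32.dll was loaded in the question
page_rva = 0x10000       # PageRVA of the relocation block
reloc_type, offset = decode_reloc_entry(0x3E15)

target = load_base + page_rva + offset
print(reloc_type, hex(offset), hex(target))  # 3 0xe15 0x75a10e15
```

The 0x3000 that had to be subtracted is simply the type field (3) sitting in the top four bits of the entry.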

DWARF debug information: Additional byte generated in element inside debug_info

I am fixing a bug in a parser for DWARF debug information (DWARF version 2). In the process I made the following strange observation:
A bytestream was created by reading a DLL file (built from Ada sources with GNAT). At the position of a "DW_TAG_structure_type" in debug_info inside this bytestream, an additional byte with the value 1 has crept into the byte stream. As a result, all values in the FileInputStream are shifted by 1 byte.
This is what the original DIE in .debug_info looks like:
<1><3aa824>: Abbrev Number: 129 (DW_TAG_structure_type)
<3aa826> DW_AT_byte_size : 44
<3aa827> DW_AT_decl_file : 11
<3aa828> DW_AT_decl_line : 380
<3aa82a> DW_AT_artificial : 1
<3aa82b> DW_AT_sibling : <0x3aa888>
This is the corresponding scheme for the DIE in .debug_abbrev:
129 DW_TAG_structure_type [has children]
DW_AT_byte_size DW_FORM_data1
DW_AT_decl_file DW_FORM_data1
DW_AT_decl_line DW_FORM_data2
DW_AT_artificial DW_FORM_flag
DW_AT_sibling DW_FORM_ref4
DW_AT value: 0 DW_FORM value: 0
However, when I display the bytestream at this point, these values are shown:
Abbrev Number >>Strange Byte<< DW_AT_byte_size DW_AT_decl_file
81 01 2C 0B ...
(129) ?? (44) (11)
Does anyone know what this "Strange Byte" is all about?
Not really familiar with DWARF, but the DWARF 2.0 specification reads (section 7.5.3):
Following the tag encoding is a 1-byte value that determines whether a
debugging information entry using this abbreviation has child entries
or not. If the value is DW_CHILDREN_yes, the next physically
succeeding entry of any debugging information entry using this
abbreviation is the first child of the prior entry. If the 1-byte
value following the abbreviation’s tag encoding is DW_CHILDREN_no, the
next physically succeeding entry of any debugging information entry
using this abbreviation is a sibling of the prior entry. [...]
Finally, the child encoding is followed by a series of attribute
specifications. [...]
So, could this "strange byte" represent DW_CHILDREN_yes?
I'm also a little bit puzzled by the value 0x81 (129). The specification states that the tag encoding for DW_TAG_structure_type is 0x13 (which should fit in a byte), and the previous quote suggests that the tag encoding is followed by a byte that is not part of the tag encoding itself (if I understand correctly). So I would expect a stream of 0x13 0x01 (encoded tag + has child entries flag).
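One reading that would account for both observations: a DIE in .debug_info starts with its abbreviation code rather than with the tag, and DWARF stores that code as an unsigned LEB128 number, so any code above 127 takes more than one byte. The tag encoding (0x13) and the children flag belong to the matching entry in .debug_abbrev. The decoder below is a small sketch with a hypothetical function name, run on the four bytes shown in the question.

```python
def decode_uleb128(data, pos=0):
    """Decode one unsigned LEB128 value; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift   # low 7 bits carry the payload
        if not (byte & 0x80):              # high bit clear: last byte
            return result, pos
        shift += 7


stream = bytes([0x81, 0x01, 0x2C, 0x0B])   # bytes from the question
code, next_pos = decode_uleb128(stream)
print(code, next_pos)                          # 129 2
print(stream[next_pos], stream[next_pos + 1])  # 44 11
```

Under this reading 0x81 0x01 together encode the abbreviation number 129, which also fits the dump: the DIE starts at 3aa824 and DW_AT_byte_size is at 3aa826, two bytes later.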

How can I add extra space at the beginning of a PE file section? (Windows API)

I'm working on a patcher program and I want to add extra space at the beginning of the .text section. For example, if the section's raw data on disk begins with 90 90 90 EB 64 ..., I want it to begin with 00 00 00 90 90 90 90 EB 64 ... Can this be done using the Windows API in C or asm? How can it be done?
I'm using CreateFile to open the file and MapViewOfFile to map it into memory (both functions from the Windows API), and I'm working with MASM.
I know that I can increase the file size by calling the CreateFile function and setting a larger file space, but how can the .text section specifically be enlarged?
thanks!!
There is nothing in the Win32 API that handles this for you. You will have to parse the file's PE header and all of its sections yourself:
Peering Inside the PE: A Tour of the Win32 Portable Executable File Format.
Open the existing file for input and create a new file for output. Parse the input file's PE, writing out everything that precedes the .text section, then write out the extra spacing as needed, then write out the .text section and everything that follows it. And make sure you are updating any RVAs throughout the PE that refer to memory addresses within/after the extra spacing you add, since you are changing the offset of those addresses.
When finished, replace the input file with the output file (preferably after backing it up first).
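To give an idea of the bookkeeping involved, here is a partial sketch that covers only the raw file layout: it grows one section's SizeOfRawData to the next FileAlignment boundary and shifts the PointerToRawData of every section stored after it. The section values and the function names are hypothetical, and the sketch deliberately leaves out everything else mentioned above (RVAs, VirtualSize, SizeOfImage, the entry point, and any code or data that refers to addresses after the inserted bytes).

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def grow_section_raw(sections, target, extra, file_alignment):
    """sections: list of (name, PointerToRawData, SizeOfRawData).

    Returns (bytes actually added to the file, updated section list).
    """
    target_ptr, target_size = next(
        (ptr, size) for name, ptr, size in sections if name == target
    )
    new_size = align_up(target_size + extra, file_alignment)
    grow = new_size - target_size
    updated = []
    for name, ptr, size in sections:
        if name == target:
            updated.append((name, ptr, new_size))
        elif ptr > target_ptr:
            updated.append((name, ptr + grow, size))   # stored after target
        else:
            updated.append((name, ptr, size))
    return grow, updated


# Hypothetical layout: three extra bytes at the start of .text.
layout = [(".text", 0x400, 0x1000), (".rdata", 0x1400, 0x200)]
grow, new_layout = grow_section_raw(layout, ".text", 3, 0x200)
print(hex(grow))  # 0x200
print([(n, hex(p), hex(s)) for n, p, s in new_layout])
```

Even for three bytes, the file grows by a whole FileAlignment unit here, because raw section sizes and offsets have to stay aligned.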

Which of the MS-DOS header fields are mandatory/optional?

The above is the complete list of MS-DOS header fields, but I don't know which of them are mandatory and which are optional, does anyone know?
If you're trying to create a PE image, e_magic (magic number) and e_lfanew (file address of new exe header) are the only mandatory fields that you have to fill in. e_lfanew should point to the PE IMAGE_NT_HEADERS structure.
Back in 2006, someone wanted to create the world's smallest PE. For this he wrote a small PE fuzzer, with the smallest code base possible:
return 42;
He managed to get the following PE sizes:
If you are too busy to read the entire page, here is a summary of the results:
Smallest possible PE file: 97 bytes
Smallest possible PE file on Windows 2000: 133 bytes
Smallest PE file that downloads a file over WebDAV and executes it: 133 bytes
You can check his work here:
http://www.phreedom.org/research/tinype/
He also states the required header values. These are:
e_magic
e_lfanew
Machine
NumberOfSections
SizeOfOptionalHeader
Characteristics
OptionalHeader:
Magic
AddressOfEntryPoint
ImageBase
SectionAlignment
FileAlignment
MajorSubsystemVersion
SizeOfImage
SizeOfHeaders
Subsystem
SizeOfStackCommit
SizeOfHeapReserve
For MS-DOS, all of the header fields are mandatory.
For Win9x and above, e_lfanew must be the offset from the start of the image to the start of the IMAGE_NT_HEADERS, and e_magic must be IMAGE_DOS_SIGNATURE ('MZ').
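A minimal sketch of that check, run on a synthetic buffer rather than a real executable. The function name is hypothetical; the offset 0x3C for e_lfanew and the 'PE\0\0' signature are the standard values from the IMAGE_DOS_HEADER and IMAGE_NT_HEADERS definitions.

```python
import struct

IMAGE_DOS_SIGNATURE = b"MZ"
IMAGE_NT_SIGNATURE = b"PE\0\0"
E_LFANEW_OFFSET = 0x3C   # position of e_lfanew inside IMAGE_DOS_HEADER


def find_nt_headers(image):
    """Return the file offset of IMAGE_NT_HEADERS, validating both signatures."""
    if image[:2] != IMAGE_DOS_SIGNATURE:
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<i", image, E_LFANEW_OFFSET)
    if image[e_lfanew:e_lfanew + 4] != IMAGE_NT_SIGNATURE:
        raise ValueError("e_lfanew does not point at a PE signature")
    return e_lfanew


# Synthetic image: every DOS header field is zero except e_magic and e_lfanew.
image = bytearray(0x80)
image[0:2] = IMAGE_DOS_SIGNATURE
struct.pack_into("<i", image, E_LFANEW_OFFSET, 0x40)
image[0x40:0x44] = IMAGE_NT_SIGNATURE

print(hex(find_nt_headers(bytes(image))))  # 0x40
```

The other DOS header fields are never read on this path, which is consistent with only e_magic and e_lfanew being required.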

Resources