I'm reading the documentation on the PE file format and wondering which parts of a PE file's structure may differ without altering its behavior.
To clarify, suppose I have two PE builds of a calculator program: the TimeDateStamp in the COFF File Header may differ between them, but the program itself would be "equivalent".
My question is: which other fields may differ between them in the same way? Does this even make sense to ask?
Everything between the MZ signature and the PE signature except e_lfanew (changing the DOS stub might break the program under DOS, of course).
TimeDateStamp
MajorLinkerVersion, MinorLinkerVersion, MajorImageVersion and MinorImageVersion, although changing these might trigger very minor compatibility shims in Windows
CheckSum (Assuming IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY is not set and it is not a kernel driver)
SizeOfStack... and SizeOfHeap...
LoaderFlags maybe? This field is deprecated/undocumented.
In the section headers (IMAGE_SECTION_HEADER) you can most likely change the ASCII names and ...Linenumbers. You can also add Write and/or Execute to Characteristics.
There are several data areas (resources etc.) where there are timestamps you can change.
The padding between data areas.
SizeOfInitializedData and SizeOfUninitializedData can be set to 0 and maybe other values but then you start to violate the PE spec.
When you look at some of the tiny PE projects you will see that they don't include the full list of DataDirectories but this is hard to do on an existing PE. These projects often just do whatever the NT loader needs and they don't care about the PE spec.
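As a rough illustration of the TimeDateStamp and CheckSum items above, here is a small Python sketch that zeroes those two fields before comparing two images. The function names are made up for this example, it assumes well-formed input, and it handles only these two fields; the other timestamps, names and padding mentioned in the list are left alone.

```python
import struct

def volatile_field_offsets(data):
    """Return file offsets of two header fields that may differ between builds."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    return {
        # COFF header: PE signature (4) + Machine (2) + NumberOfSections (2)
        "TimeDateStamp": e_lfanew + 8,
        # Optional header starts at e_lfanew + 24; CheckSum is at offset 64
        # in both PE32 and PE32+.
        "CheckSum": e_lfanew + 24 + 64,
    }

def normalize(data):
    """Return a copy of the image with the two volatile fields zeroed."""
    buf = bytearray(data)
    for off in volatile_field_offsets(data).values():
        buf[off:off + 4] = b"\0\0\0\0"
    return bytes(buf)

def same_ignoring_volatile(a, b):
    """True if two images are byte-identical apart from the two fields above."""
    return normalize(a) == normalize(b)
```

Two builds that differ only in their link timestamp and checksum would compare equal with same_ignoring_volatile, while any other byte difference still shows up.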
I'm working on a program to do post-compilation optimization, because I've noticed there are a few special cases that gcc just doesn't optimize well, even at -O3.
Is there a library that would allow me to load an x86 binary into some data structure suitable for editing, and then write it out again? I would also want it to handle updating all the memory offsets, as the edits might change the size of the binary.
I will assume you are working with the ELF executable format. Let me know if you were thinking of PE or Mach-O instead.
There are indeed several libraries and tools for editing and modifying ELF files. Here is a short list, among others:
Elfesteem
ERESI
PatchELF
When I thought about resizing images and saving the resized copies alongside the originals on the server, I came to the following question:
// Original size
DSC_18342.jpg
// New size: Use an "x" for "times"
DSC_18342_640x480px.jpg
// New size: Use the real "×" for "times"
DSC_18342_640×480px.jpg
The point is that the name is slightly easier to read with a real × instead of an x, because the unit px already contains an x.
Question: What problems could I run into when using the × character (the HTML entity) in a file name?
Side note: I'm writing an open source, publicly available script, so the target server can be anything. I'm therefore also interested in (and will vote up) edge cases that I'm not aware of.
Thank you all!
You may have noticed that I'm aware I could simply avoid it (which I'll do anyway), but I'm interested in this issue and in learning about it, so please just take the example above as a possible case.
There are file systems that simply don't support Unicode. This may be less of a problem if you make Unicode support a requirement of your application.
Some considerations about different Unicode file systems are given in File Systems, Unicode, and Normalization.
A concluding remark (from the viewpoint of Solaris file systems) is:
Complete compatibility and seamless interoperability with
all other existing Unicode file systems appears not 100%
possible due to inherent differences.
I can imagine that there will be problems especially when migrating the application. Just storing files is probably no problem but if their names are stored in a database there might be a mismatch after migration.
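For a script that has to run on arbitrary servers, one cautious option is to fall back to a plain ASCII name. The following Python sketch is only an illustration of that idea; the function name ascii_fallback is made up here, and it deliberately handles nothing but the × character (U+00D7).

```python
import unicodedata

def ascii_fallback(name):
    """Replace the multiplication sign with 'x' and reject any other non-ASCII."""
    # Normalize first so that stored and compared names use one form.
    candidate = unicodedata.normalize("NFC", name).replace("\u00d7", "x")
    # Raises UnicodeEncodeError if any other non-ASCII character remains.
    candidate.encode("ascii")
    return candidate

name = "DSC_18342_640\u00d7480px.jpg"
# The character count and the UTF-8 byte count differ, which matters for
# file systems and databases that limit or compare names by bytes.
print(len(name), len(name.encode("utf-8")))
print(ascii_fallback(name))
```

Storing the normalized or ASCII form in the database as well keeps file names and database entries consistent after a migration.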
I'm attempting to edit a library in a hex editor, in insert mode. The main point is to rename a few entries in it. If I do it in "Overwrite" mode, everything works fine, but every time I try to add a few symbols to the end of a string in "Insert" mode, the library fails to load. Is there anything I'm missing here?
Yes, you're missing plenty. A library follows the PE/COFF format, which is quite heavy on pointers throughout the file. (E.g., towards the beginning of the file is a table which points to the location of each section in the file.)
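To make that table concrete, here is a minimal Python sketch that lists each section's stored file offset. The function name list_sections is made up for this example and it assumes a well-formed file; the point is that these stored PointerToRawData values do not change when bytes are inserted before the data they point to.

```python
import struct

def list_sections(data):
    """Return (name, PointerToRawData, SizeOfRawData) for each section header."""
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    if data[:2] != b"MZ" or data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    num_sections = struct.unpack_from("<H", data, e_lfanew + 6)[0]
    opt_size = struct.unpack_from("<H", data, e_lfanew + 20)[0]
    # The section table follows the optional header.
    table = e_lfanew + 24 + opt_size
    sections = []
    for i in range(num_sections):
        hdr = table + 40 * i  # each IMAGE_SECTION_HEADER is 40 bytes
        name = data[hdr:hdr + 8].rstrip(b"\0").decode("ascii", "replace")
        raw_size, raw_ptr = struct.unpack_from("<II", data, hdr + 16)
        sections.append((name, raw_ptr, raw_size))
    return sections
```

After an insert-mode edit, every section whose data lies behind the insertion point still carries its old offset, so the loader reads from the wrong place.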
In the case that you are editing resources, there's the potential to do it without breaking things if you make sure you correct any pointers and sizes for anything pointing to after your edits, but I doubt it'll be easy. In the case that you are editing the .text section (ie, the code), then I doubt you'll get it done, since the operands of function calls and jumps are relative locations to their position in code - you would need to update the entire code to account for edits.
One technique to overcome this is a "code cave": you replace a piece of the existing code with an explicit JMP instruction to some empty location (you can also do this at runtime, where you are able to allocate new memory), place new code of arbitrary length there, and then explicitly JMP back to just after the place you jumped from (+5 bytes, say, for the JMP opcode plus its operand).
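The relative operands involved can be worked out as follows. This is a small Python sketch of the arithmetic only, assuming 32-bit x86 and the 5-byte near JMP (opcode 0xE9 followed by a 32-bit displacement); the function names and the addresses in the usage lines are hypothetical. It does not deal with the instructions that the 5-byte JMP overwrites, which would have to be moved into the cave by hand.

```python
import struct

JMP_REL32_LEN = 5  # 1 opcode byte (0xE9) + 4 displacement bytes

def encode_jmp_rel32(src, dst):
    """Encode a near JMP placed at address src that transfers control to dst."""
    # The displacement is relative to the address of the next instruction.
    disp = dst - (src + JMP_REL32_LEN)
    return b"\xe9" + struct.pack("<i", disp)

def cave_jumps(hook_addr, cave_addr, cave_code_len):
    """Return (jump into the cave, jump back) for a hook placed at hook_addr."""
    into = encode_jmp_rel32(hook_addr, cave_addr)
    # The return jump sits right after the new code in the cave and lands
    # just past the 5 bytes that were overwritten at the hook site.
    back = encode_jmp_rel32(cave_addr + cave_code_len, hook_addr + JMP_REL32_LEN)
    return into, back

# Hypothetical addresses: hook at 0x1000, cave at 0x2000 with 0x10 bytes of code.
into, back = cave_jumps(0x1000, 0x2000, 0x10)
print(into.hex(), back.hex())
```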
Are the names you're changing them to the same length as the old names? If not, then the offsets of everything after them are shifted. And do any of the functions call one another? That could be another problem point. It'd be easier to obtain the source code (from the project's website if it's not in-house, or from the vendor if it's closed), change the names there, and then recompile it. I'm curious as to why you are changing the names anyway.
DLLs are a complex binary format (i.e. compiled code). The compiling process turns named function calls into hard-wired references to specific positions in the file ("offsets"). Therefore, if you insert characters into the middle of the file, the offsets after that point will no longer match what is actually at the position they reference, meaning that the function calls in your library will run the wrong code (if they manage to run anything at all).
Basically, the bottom line is what you're doing is always going to break stuff. If you're unlucky, it might even break it really badly and cause serious damage.
Sure - a detailed knowledge of the format, and what has to change. If you're wondering why some of your edits cause loading to fail, you are missing that knowledge.
Libraries are intended to be written by the linker for the use of the linker. They follow a well-defined format that is intended to be easy for the linker to write and read. They don't need tolerance for human input like a compiler does.
Very simply, libraries aren't intended to be modified by hex editors. It may be possible to change entries by overwriting them with names of the same length, or that may screw up an index somewhere. If you change the length of anything, you're likely breaking pointers and metadata.
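A same-length overwrite can be sketched in Python as below. The helper name overwrite_name is made up for this illustration, and padding a shorter name with NUL bytes assumes the names are stored as NUL-terminated strings; as noted above, even this can upset an index (a sorted name table, for instance), so treat it as an experiment rather than a safe procedure.

```python
def overwrite_name(data, old, new):
    """Overwrite the first occurrence of old with new without changing the file size."""
    if len(new) > len(old):
        raise ValueError("replacement is longer than the original; offsets would shift")
    pos = data.find(old)
    if pos < 0:
        raise ValueError("original name not found")
    # Pad with NUL bytes so every later byte keeps its original offset.
    padded = new + b"\0" * (len(old) - len(new))
    return data[:pos] + padded + data[pos + len(old):]
```

The size check is the whole point: the function refuses any edit that would move the bytes that follow.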
You don't give any reason for wanting to do this. If it's for fun, well, it's harder than you expected. If you have another reason, you're better off getting the source, or getting somebody who has the source to rename and rebuild.
Say there is a buggy program that contains a sprintf() and I want to change it to an snprintf() so it doesn't have a buffer overflow. How do I do that in IDA?
You really don't want to make that kind of change using information from IDA Pro.
Although IDA's disassembly is relatively high quality, it's not high quality enough to support executable rewriting. Converting a call to sprintf into a call to snprintf requires pushing a new argument onto the stack. That requires introducing a new instruction, which changes the effective address (EA) of everything that follows it in the executable image. Updating those effective addresses requires extremely high quality disassembly. In particular, you need to be able to:
Identify which addresses in the executable are data, and which ones are code
Identify which instruction operands are symbolic (address references) and which instruction operands are numeric.
IDA can't (reliably) give you that information. Also, if the executable is statically linked against the CRT, it may not contain snprintf, which would make performing the rewriting by hand VERY difficult.
There are a few potential workarounds. If there is sufficient padding available in (or after) the function making the call, you might be able to get away with rewriting only a single function. Alternatively, if you have access to object files, and those object files were compiled with the /Gy switch (assuming you are using Visual Studio), then you may be able to edit the object file. However, editing the object file may still require substantial fix-ups.
Presumably, however, if you have access to the object files you probably also have access to the source. Changing the source is probably your best bet.
I'd like to find out which of the DLLs belonging to my various installed programs have been compiled with SafeSEH and which ones haven't. Is there a tool that could give me that information? If not, what would be the best way to write something that performs that check?
Thanks in advance.
You could start with this tool, SafeSEH Dump, and examine its output. It shouldn't be too hard to run it batch-style against a list of all your DLLs. You need to create a login to download it. Here's a blog post that references SafeSEH Dump too, but the download link on that page seems dead.
Also you can use dumpbin.exe /loadconfig to look for the presence of the Safe Exception Handler table. More info here: http://www.jwsecure.com/dan/2007/07/06/the-safe-exception-handler-table/
I know this is an ancient question that I'm dredging up from the dead, but there is a programmatic solution.
First, parse the PE format. There are all sorts of solutions for this, so I won't go into it. Suffice to say that it's a bigger topic than I can cover here. If you decide to roll your own, be careful of the differences between 32-bit and 64-bit executables.
Once you've got a PE file parsed, skip the DOS header, NT signature, file header (a.k.a. COFF header) and the optional header, and finally get to the Data Directories. Each of these directories has an RVA and a size. Find the RVA and size of the Load Configuration Directory (index 10 in the list when counting from zero, i.e. the 11th entry).
Here's where we can start doing detection. If the RVA or size is zero, SafeSEH isn't enabled. If the size is anything other than 0x40, it was built with a compiler that was (probably) vulnerable to the MS12-001 SafeSEH bypass bug. Don't trust the size value though - it does not necessarily match up to the size of the data within, because of some quirks with Windows XP - see the previous link for more details.
If the RVA and size seem sensible, follow the RVA to the Load Configuration structure. Parse that, then read the SEHandlerTable and SEHandlerCount values. If the handler table pointer is null (i.e. zero) then SafeSEH is not enabled. If the handler count is zero, there are no handlers registered, even though SafeSEH might be switched on.
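The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a hardened parser: the function names are made up, it handles only 32-bit (PE32) images, it assumes the SEHandlerTable and SEHandlerCount fields sit at offsets 64 and 68 of the 32-bit Load Configuration structure, and it does not check the directory size or any other flags.

```python
import struct

LOAD_CONFIG_INDEX = 10  # zero-based index of the Load Configuration directory

def rva_to_offset(data, table, num_sections, rva):
    """Map an RVA to a file offset via the section table; None if unmapped."""
    for i in range(num_sections):
        hdr = table + 40 * i
        vsize, vaddr, raw_size, raw_ptr = struct.unpack_from("<4I", data, hdr + 8)
        if vaddr <= rva < vaddr + max(vsize, raw_size):
            return raw_ptr + (rva - vaddr)
    return None

def safeseh_status(data):
    """Return a short status string for a PE image held in data (bytes)."""
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    if data[:2] != b"MZ" or data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    num_sections = struct.unpack_from("<H", data, e_lfanew + 6)[0]
    opt_size = struct.unpack_from("<H", data, e_lfanew + 20)[0]
    opt = e_lfanew + 24
    if struct.unpack_from("<H", data, opt)[0] != 0x10B:
        return "not a PE32 image (this sketch handles 32-bit images only)"
    num_dirs = struct.unpack_from("<I", data, opt + 92)[0]
    if num_dirs <= LOAD_CONFIG_INDEX:
        return "no load config directory"
    rva, size = struct.unpack_from("<II", data, opt + 96 + 8 * LOAD_CONFIG_INDEX)
    if rva == 0 or size == 0:
        return "no load config directory"
    off = rva_to_offset(data, opt + opt_size, num_sections, rva)
    if off is None:
        return "load config RVA not mapped by any section"
    # Assumed layout: SEHandlerTable at +64, SEHandlerCount at +68.
    table, count = struct.unpack_from("<II", data, off + 64)
    if table == 0:
        return "SafeSEH not enabled"
    if count == 0:
        return "SafeSEH enabled, no handlers registered"
    return f"SafeSEH enabled, {count} handler(s)"
```

Running safeseh_status over the contents of each DLL (for example, read with open(path, "rb").read()) gives a quick first pass; anything it reports as unusual deserves a closer look with a proper PE parser.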