What's the differences between Address of Entry Point and Original Entry Point? - portable-executable

I'm learning PE files structure, but I'm confused about the concept of Address of Entry Point and Original Entry Point. I know Address of Entry Point can be calculated according to Image_Optional_Header, Does Original Entry Point do? And the code between Address of Entry Point and Original Entry Point do what?

The Original Entry Point is a concept typically referred to in reverse engineering for an executable that has been modified by some means such as being compressed (or encrypted) by a packer or infected with malware. Prior to modification, the entry-point of an executable IS the original entry point (OEP). When an executable has been modified, such as to include a stub of code that runs prior to the original code, the entry-point of the executable is changed to point to the new code. The stub then references the old entry-point when it is done. So once the stub runs, it will transfer control to the address of the original entry point so the modified program still works (or appears) to work as normal.

Related

What is the purpose of creating a symbolic link between files?

Recently I came across the os library in Python and found out about the existence of symbolic links. I would like to know what a symbolic link is, why it exists, and what are various uses of it?
I will answer this from a perspective of an *nix user (specifically Linux). If you're interested in how this relates to Windows I suggest you look for tutorials like this one. This will be a bit of a roundabout, but I find it that symbolic links or symlinks are best explained together with hard links and generic properties of a filesystem on Linux.
Links and files on Linux
As a rule of thumb, in Linux everything is treated as a file. Directories are files that contain mappings from names (paths) to inodes, which are just unique identifiers of different objects residing on your system. Basically, if I give you a name like /home/gst/mydog.png the accessing process will first look into the / directory (the root directory) where it will find information on where to find home, then opening that file it will look into it to see where gst is and finally in that file it will try to find the location of mydog.png, and if successful try do whatever it set out to do with it. Going back to directory files, the mappings they contain are called links. Which brings us to hard and symbolic links.
Hard vs Symbolic links
A hard link is just a mapping like the one we discussed previously. It points directly to a certain object. A symlink on the other hand does not point directly to an object. Rather it just saves a path to an object. For example, say that I created a symbolic link to /home/gst/mydog.png at /home/gst/Desktop/mycat.png with os.symlink("/home/gst/mydog.png", "/home/gst/Desktop/mycat.png"). When I try to open it, the name /home/gst/Desktop/mycat.png is usually resolved to /home/gst/mydog.png. By following the symlink located at /home/gst/Desktop/mycat.png I actually (try to) access an object pointed to by /home/gst/mydog.png.
If I create a hard link (for example by calling os.link) I just add entries to the relevant directory files, such that the specific name can be followed to the linked object. When I create a symbolic link I create a file that contains a path to another file (which might be another symbolic link).
More specific to your question, if I pass /home/gst/Desktop/mycat.png to os.readlink it will return /home/gst/mydog.png. This name resolution also happens when calling functions in os with an (optional) parameter follow_symlinks set to True, however, if it's set to False the name does not get resolved (for instance you'd set it to false when you want to manipulate the symlink itself not the object it points to). From the module documentation:
not following symlinks: If follow_symlinks is False, and the last element of the path to operate on is a symbolic link, the function will operate on the symbolic link itself instead of the file the link points to. (For POSIX systems, Python will call the l... version of the function.)
You can check whether or not follow_symlinks is supported on your platform using os.supports_follow_symlinks. If it is unavailable, using it will raise a NotImplementedError.
Why use hard links?
This question has already been answered here, quoting from the accepted answer:
The main advantage of hard links is that, compared to soft links, there is no size or speed penalty. Soft links are an extra layer of indirection on top of normal file access; the kernel has to dereference the link when you open the file, and this takes a small amount of time. The link also takes a small amount of space on the disk, to hold the text of the link. These penalties do not exist with hard links because they are built into the very structure of the filesystem.
I'd like to add that hard links allow for an easy method of file backup. For every file the system keeps a count of its hard links. Once this count reaches 0 the memory segment on which the file is located is marked as free, meaning that the system will eventually overwrite it with another data (effectively deleting the previous file - which doesn't happen for at least as long as a running process has an opened stream associated with the file, but that's another story). Why would that matter?
Let's say you have a huge directory full of files you'd like to manipulate somehow (rename some, delete others, etc.) and you write a script to do this for you. However, you're not completely sure that the script will work as intended and you fear it might delete some wrong files. You also don't want to copy all the files, as this would take up too much space and time. One solution is to just create a hard link for each file at some other point in the filesystem. If you delete a file in the target directory, the associated object is still available because there's another hard link associated with it. Creating that many hard links will consume much less time and space than copying all the file, yet it will give you a reasonable backup strategy.
This is not the case with symbolic links. Remember, symlinks point to other links (possibly another symlink as well) not to actual files. Hence, I might create a symlink to a file, but that it will only save the link. If the (eventual) hard link that the symlink is pointing to gets removed from the system, trying to resolve the symlink won't lead you to a file. Such symlinks are said to be "broken" or "dangling". Thus you cannot rely on symlinks to preserve access to a certain file. (Conversely, deleting a symlink does not affect the link count associated with a target file.) So what's their use?
Why use symbolic links?
You can operate on symlinks as if they were the actual files to which they pointing somewhere down the line (except deleting them). This allows you to have multiple "access points" to a file, without having excess copies (that remain up to date, since they always access the same file). If you want to replace the file that is being accessed you only need to change it once and all of the symlinks will point to it (as long as the path saved by them is not changed). However, if you have hard links to a certain file and you then replace that file with another one, you also need to replace the hard links as otherwise they'll still be pointing to the old file.
Lastly, it is not uncommon to have different filesystems mounted on the same Linux machine. That is to say, that the way data is organized and interpreted at some point in the file hierarchy (say /home/gst/fs1) can be different to how it is organized and interpreted at another point (say /home/gst/Desktop/fs2). A hard link can only reside on the same filesystem as the file it's pointing to. Whereas, a symlink can be created on one filesystem but effectively pointing to a file on another filesystem (see answers to this question).
Symbolic links, also known as soft links, are special types of files that point to other files, much like shortcuts in Windows and Macintosh aliases. The data in the target file does not appear in a symbolic link, unlike a hard link. Instead, it points to another file system entry.
Read more here:
https://kb.iu.edu/d/abbe#:~:text=A%20symbolic%20link%2C%20also%20termed,somewhere%20in%20the%20file%20system.

NTFS Journal USN_REASON_HARD_LINK_CHANGE event

I've written a program that reads the NTFS index and journal similar to what is described here:
http://ejrh.wordpress.com/2012/07/06/using-the-ntfs-journal-for-backups/
And It works fairly well.
In addition to the normal journal events USN_REASON_CLOSE, USN_REASON_FILE_CREATE, USN_REASON_FILE_DELETE etc' I'm receiving an event with reason USN_REASON_HARD_LINK_CHANGE. I'd like to be able to update the directory index according to this event but I can't find any information about it. The only documentation is:
An NTFS file system hard link is added to or removed from the file or
directory. An NTFS file system hard link, similar to a POSIX hard
link, is one of several directory entries that see the same file or
directory.
What does this mean? where was the hard-link created? or was it removed? how do I get more information about what happened?
I know this is ancient, but I stumbled upon this while researching a related problem. Here's what I found: The hard-links are a complicating factor when reading the USN. You can get journal entries describing change to a single file reference number by way of changes made through any hard-link that's been created. Generally, and to the original question, hard-links are alternative directory entries through which a single file might be accessed. Thus, all the file's characteristics are shared for each link (except for the names and parent file reference numbers). Technically, you can't tell which entry is the original and which is a link.
A subtle difference does exist, and it manifests if you query the master file table (using DeviceIOControl and Fsctl_Enum_Usn_Data). The query will return only a single representative file regardless of how many links exist. You can query for the links using NtQueryInformationFile, querying for FILE_HARD_LINK_INFORMATION. I think of the entry returned by the MFT query as the main entry and the NtQueryInformationFile-returned items as links...however, the main entry can get deleted and one of the links will get promoted...so it's only a housekeeping thought and little else.
Note that a problem arises where one of the hard-links is moved or renamed. In this case, the journal entries for the rename or move reflect the filename and parent file reference number of the affected link. The problem arises if you ask for only the summary "on-close" records. In such a case, you won't ever see the USN_REASON_RENAME_OLD_NAME record...because that USN entry never gets an associated REASON_CLOSE associated with it. Without this tidbit, you won't be able to easily determine which link's name or location was changed. You have to read the usn with ReadOnlyOnClose set to 0 in the Read_Usn_Journal_Data_V0. This is a far chattier query, but without it, you can't accurately associate the change with one link or the other.
As always with the USN, I expect you'll need to go through a bit of trial and error to get it to work right. These observations/guesses may, I hope, be helpful:
When the last hard link to a file is deleted, the file is deleted; so if the last hard link has been removed you should see USN_REASON_FILE_DELETE instead of USN_REASON_HARD_LINK_CHANGE. I believe that each reference number refers to a file (or directory, but NTFS doesn't support multiple hard links to directories AFAIK) rather than to a hard link. So immediately after the event is recorded, at least, the file reference number should still be valid, and point to another name for the file.
If the file still exists, you can look it up by reference number and use FindFirstFileNameW and friends to find the current links. Comparing this to the event record in question plus any relevant later events should give you enough information, although if multiple hard links for the same file are deleted and/or created you might not be able to reconstruct the order in which this happened, and if you don't have enough information about the prior state of the file system you might not be able to identify the deleted hard links. I don't know whether that would matter to you or not.
If the file no longer exists, you should still be able to identify it by the USN record in which it was deleted. Again, taking all relevant events into consideration, and with enough information about the prior state, you should be able to reconstruct most of what happened, if not the order.
There is some hope that we can do better than this: the file name and/or ParentFileReference number in the event record might refer to the hard link that was created or deleted, rather than to an arbitrary link to the file. In this case you'll have all the relevant information about the sequence of events except for whether any particular event was a create or a delete, which you should be able to work out by looking at the current state of the file and working backwards through the records.
I assume you've already looked for nearby change records that might contain additional information? There isn't, for example, a USN_REASON_RENAME_NEW_NAME record generated when a hard link is created or a USN_REASON_RENAME_OLD_NAME when a hard link is removed? Or paired USN_REASON_HARD_LINK_CHANGE records, one for the file, one for the directory containing the affected hard link to the file? (Wishful thinking, I expect, but it wouldn't hurt to look!)
For testing purposes, you can create hard links with the mklink command.

How to modify an executable without corrupting it?

Is there a specific null char or a sequence of bytes which would not corrupt the executable if added in FRONT of the file? I tried adding NUL (00 hex) but it corrupts the executable every time. Is there some bytecode for NOP (no operation) or something similar?
Long story short, I want to mess up a "hack" that modifies a value in memory at &process+fixed offset. Pushing the memory stack up would (or so I think) prevent it from working.
No, the PE file format that Windows executables use has a very specific header. See http://en.wikipedia.org/wiki/Portable_Executable for more details.
You can try using ASLR to make your code more resistant to in-memory patching.
You'd have to parse it completely into it's different sections and segment, modify the one you are looking for, but you won't be able to INSERT code before any code segment:
you'd better had a segment that will be executed first, then will jump to the old start segment.
At the end you will have to recreate a new complete executable file.

How do I determine where process executable code starts and ends when loaded in memory?

Say I have app TestApp.exe
While TestApp.exe is running I want a separate program to be able to read the executable code that is resident in memory. I'd like to ignore stack and heap and anything else that is tangential.
Put another way, I guess I'm asking how to determine where the memory-side equivalent of the .exe binary data on disk resides. I realize it's not a 1:1 stuffing into memory.
Edit: I think what I'm asking for is shown as Image in the following screenshot of vmmap.exe
Edit: I am able to get from memory all memory that is tagged with any protect flag of Execute* (PAGE_EXECUTE, etc) using VirtualQueryEx and ReadProcessMemory. There are a couple issues with that. First, I'm grabbing about 2 megabytes of data for notepad.exe which is a 189 kilobyte file on disk. Everything I'm grabbing has a protect flag of PAGE_EXECUTE. Second, If I run it on a different Win7 64bit machine I get the same data, only split in half and in a different order. I could use some expert guidance. :)
Edit: Also, not sure why I'm at -1 for this question. If I need to clear anything up please let me know.
Inject a DLL to the target process and call GetModuleHandle with the name of the executable. That will point to its PE header that has been loaded in the memory. Once you have this information, you can parse the PE header manually and find where .text section is located relative to the base address of the image in the memory.
no need to inject a dll
use native api hooking apis
I learned a ton doing this project. I ended up parsing the PE header and using that information to route me all over. In the end I accomplished what I set out to and I am more knowledgeable as a result.

How to map a file offset in an EXE to its PE section

I've opened up a program I wrote with ImageHlp.dll to play around with it a little, and I noticed that there seem to be large gaps in the file. As I understand it, for each PE section, the section header gives its offset in the file as PhysicalAddress, and its size as SizeOfRawData, and thus everything from PhysicalAddress to PhysicalAddress + SizeOfRawData ought to be that section. But there are large swaths of the EXE file that aren't covered by these ranges, so I must be missing something.
I know I can use ImageRVAToSection and give it an RVA address to find out which section that RVA is located in. Is there any way to do something similar with file offsets? How can I find out which PE section byte $ED178 or whatever belongs to?
Edit: Sorry, I didn't read your question carefully enough.
Doing some looking, I'm finding a few files like you mentioned, that the data in the section headers doesn't cover the entire contents of the file. Most of those I've found so far contain a debug record that's not covered. There are a few others with discrepancies I haven't been able to figure out yet though. When/if I can figure out more, I'll add it.
I posted in How does one use VirtualAllocEx do make room for a code cave? a code fragment which examine PEs current loaded in the memory. Probably you will find the answer on your question if you compare the contain of DLL in memory with the contain on the disk (which shows ImageHlp.dll).

Resources