Executable sections marked as "execute" AND "read"? - winapi

I've noticed (on Win32 at least) that in executables, code sections (.text) have the "read" access bit set, as well as the "execute" access bit. Are there any bona fide reasons for code to be reading itself instead of executing itself? I thought this was what other sections were for (such as .rdata).
(Specifically, I'm talking about IMAGE_SCN_MEM_READ.)

Sections flagged IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ are mapped into memory as PAGE_EXECUTE_READ. Image pages are backed by a copy-on-write mapping, and a section that also sets IMAGE_SCN_MEM_WRITE is mapped as PAGE_EXECUTE_WRITECOPY. Copy-on-write means that any attempt to modify the page results in a new, process-private copy of the page being created.
There are a few different reasons for needing write-copy:
Code that needs to be relocated by the loader must have this set so that the loader can do the fix-ups. This is very common.
Sections that hold both code and data would need this as well, to enable modifying process globals. Keeping code and data in a single section can save space, and possibly improve locality by having code and the globals the code uses on the same page.
Code that attempts to modify itself. I believe this is fairly rare.
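To make the flag-to-protection relationship concrete, here is a minimal sketch in portable C. The constants mirror the documented values of IMAGE_SCN_MEM_* and PAGE_*, but the names are prefixed so they cannot clash with the Windows headers, and section_protection is a made-up helper. It assumes the usual mapping for image sections; it is not the loader's actual code.

```c
#include <stdint.h>

/* Section characteristic bits (values of IMAGE_SCN_MEM_* in the PE/COFF spec). */
#define SCN_MEM_EXECUTE 0x20000000u
#define SCN_MEM_READ    0x40000000u
#define SCN_MEM_WRITE   0x80000000u

/* Page protection constants (values of PAGE_* in the Windows headers). */
#define PROT_NOACCESS          0x01u
#define PROT_READONLY          0x02u
#define PROT_WRITECOPY         0x08u
#define PROT_EXECUTE           0x10u
#define PROT_EXECUTE_READ      0x20u
#define PROT_EXECUTE_WRITECOPY 0x80u

/* Hypothetical helper: the protection an image section would typically be
 * mapped with, given its characteristics. Writable image sections are assumed
 * to be copy-on-write rather than plainly read/write. */
static uint32_t section_protection(uint32_t characteristics)
{
    int x = (characteristics & SCN_MEM_EXECUTE) != 0;
    int r = (characteristics & SCN_MEM_READ) != 0;
    int w = (characteristics & SCN_MEM_WRITE) != 0;

    if (w)
        return x ? PROT_EXECUTE_WRITECOPY : PROT_WRITECOPY;
    if (r)
        return x ? PROT_EXECUTE_READ : PROT_READONLY;
    return x ? PROT_EXECUTE : PROT_NOACCESS;
}
```

A typical .text section (execute + read) comes out as execute-read, and only adding the write bit turns it into a write-copy mapping.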

Compile-time constants, particularly for long long or double values, are often loaded with a mov register, address statement from the code segment.

The one example I can think of for a reason to read code is to allow for self modifying code. Code must necessarily be able to read itself in order to be self modifying.
Also consider the opposite side. What advantage is gained from disallowing code from reading itself? I struggled for a bit on this one but I can see no advantage gained from doing so.

Related

Force malloc to pre-fault/MAP_POPULATE/MADV_WILLNEED all allocations for an entire program/process

For the sake of some user-space performance profiling, I'd like to cleanly separate the costs of allocating memory from operations that access it. The application does no over-allocation, so every page that gets mapped will be faulted in, probably in code that runs shortly after its allocation.
What I'd like to do is set some flag, environment variable, something, to tell malloc that it should uniformly do the equivalent of calling mmap(..., MAP_POPULATE) or madvise(..., MADV_WILLNEED) or just touching every page of whatever it allocated itself. I haven't found any documentation, on any platform(!), that describes a way to do this. Is there some existing technique that's utterly undocumented, up to my ability to search? Is this a fundamentally misguided or bad idea?
If I wanted to implement this myself, I'm thinking of an LD_PRELOAD including just a reimplementation of malloc that calls the underlying malloc and then does the madvise thing (to be at least somewhat agnostic to huge pages behavior). Any reason that shouldn't work?
malloc is one of the most used, yet relatively slow functions in common use. As a result, it has received a lot of optimization attention over the years. I seriously doubt that any serious implementation of malloc does anything so slow as the string parsing that would be required to check an environment variable at every call.
LD_PRELOAD is not a bad idea. Considering what you're doing, you wouldn't even need to recompile to switch between profile and release builds. If you're open to recompiling, I would suggest a macro that redirects malloc to a wrapper function, along the lines of #define malloc(size) prefault_malloc(size), where prefault_malloc (a name made up here) calls the real malloc and then does the mmap/madvise step or touches the pages. You could even do this at the compile command line via -Dmalloc=... (so long as the system malloc is not itself a define, which would override the command-line one).
Another option would be to find/implement a program that uses the debug interface to intercept and redirect calls to malloc. You could theoretically do this by messing with the post-compiled (or post-load) program's import section to point to your dll/so file.
Edit: On second thought, the define might not work on every allocation, since it is often implied by the compiler (e.g. new).
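As a rough sketch of the "touch every page" variant, assuming a POSIX system (sysconf is used for the page size): malloc_prefaulted is a hypothetical helper name, not something any libc provides. It only shows the page-touching step, not the LD_PRELOAD plumbing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: allocate with malloc, then write one byte per page so
 * that every page of the block is faulted in before the caller uses it. */
static void *malloc_prefaulted(size_t size)
{
    unsigned char *p = malloc(size);
    if (p == NULL || size == 0)
        return p;

    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    /* volatile keeps the compiler from optimizing the stores away. */
    volatile unsigned char *touch = p;
    for (size_t off = 0; off < size; off += (size_t)page)
        touch[off] = 0;
    touch[size - 1] = 0; /* the block may not be page-aligned: cover its last page */

    return p;
}
```

An LD_PRELOAD shim would wrap the same loop around the real allocator, which is usually looked up with dlsym(RTLD_NEXT, "malloc"). Allocations made through calloc, realloc or operator new would need the same treatment.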

What does 'dirty-flag' / 'dirty-values' mean?

I see some variables named 'dirty' in some source code at work and some other code. What does it mean? What is a dirty flag?
Generally, dirty flags are used to indicate that some data has changed and needs to eventually be written to some external destination. It isn't written immediately because adjacent data may also get changed, and writing data in bulk is generally more efficient than writing individual values.
There's a deeper issue here - rather than "What does 'dirty' mean?" in the context of code, I think we should really be asking - is 'dirty' an appropriate term for what is generally intended.
'Dirty' is potentially confusing and misleading. It will suggest to many new programmers corrupt or erroneous form data. The word 'dirty' implies that something is wrong and that the data needs to be purged or removed. Something dirty is, after all, undesirable, unclean and unpleasant.
If we mean 'the form has been touched' or 'the form has been amended but the changes haven't yet been written to the server', then why not 'touched' or 'writePending' rather than 'dirty'?
That I think, is a question the programming community needs to address.
Dirty could mean a number of things, you need to provide more context. But in a very general sense a "dirty flag" is used to indicate whether something has been touched / modified.
For instance, see the usage of "dirty bit" in the context of memory management in the wiki for Page Table.
"Dirty" is often used in the context of caching, from application-level caching to architectural caching.
In general, there're two kinds of caching mechanisms: (1) write through; and (2) write back. We use WT and WB for short.
WT means that the write is done synchronously both to the cache and to the backing store. (By saying the cache and the backing store, for example, they can stand for the main memory and the disk, respectively, in the context of databases).
In contrast, for WB, initially, writing is done only to the cache. The write to the backing store is postponed until the cache blocks containing the data are about to be modified/replaced by new content.
Data that has been written to the cache but not yet to the backing store is the dirty values. When implementing a WB cache, you can set dirty bits to indicate whether a cache block contains dirty values or not.
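A toy write-back cache makes the dirty bit concrete. This is only a sketch under simplifying assumptions: a single cached block, an in-memory array standing in for the backing store, and made-up names (wb_cache, wb_write and so on).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STORE_SIZE 16 /* size of the toy backing store (arbitrary) */

struct wb_cache {
    int backing[STORE_SIZE]; /* stands in for the slow backing store */
    int value;               /* the single cached block */
    size_t addr;             /* which backing slot the block mirrors */
    bool valid;              /* does the block hold anything yet? */
    bool dirty;              /* modified in cache, not yet written back */
    unsigned writebacks;     /* how many writes reached the backing store */
};

/* Write the cached block back, but only if it is dirty. */
static void wb_flush(struct wb_cache *c)
{
    if (c->valid && c->dirty) {
        c->backing[c->addr] = c->value;
        c->dirty = false;
        c->writebacks++;
    }
}

/* Make the cache hold slot `addr`, evicting the previous block if needed. */
static void wb_load(struct wb_cache *c, size_t addr)
{
    assert(addr < STORE_SIZE);
    if (c->valid && c->addr == addr)
        return;
    wb_flush(c); /* eviction is the moment a dirty block gets written */
    c->addr = addr;
    c->value = c->backing[addr];
    c->valid = true;
    c->dirty = false;
}

/* A write only touches the cache and marks the block dirty. */
static void wb_write(struct wb_cache *c, size_t addr, int v)
{
    wb_load(c, addr);
    c->value = v;
    c->dirty = true;
}

static int wb_read(struct wb_cache *c, size_t addr)
{
    wb_load(c, addr);
    return c->value;
}
```

Writing the same slot twice and then reading a different slot results in exactly one write to the backing store, which is the saving the dirty bit buys.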

Extending functionality of existing program I don't have source for

I'm working on a third-party program that aggregates data from a bunch of different, existing Windows programs. Each program has a mechanism for exporting the data via the GUI. The most brain-dead approach would have me generate extracts by using AutoIt or some other GUI manipulation program to generate the extractions via the GUI. The problem with this is that people might be interacting with the computer when, suddenly, some automated program takes over. That's no good. What I really want to do is somehow have a program run once a day and silently (i.e. without popping up any GUIs) export the data from each program.
My research is telling me that I need to hook each application (assume these applications are always running) and inject a custom DLL to trigger each export. Am I remotely close to being on the right track? I'm a fairly experienced software dev, but I don't know a whole lot about reverse engineering or hooking. Any advice or direction would be greatly appreciated.
Edit: I'm trying to manage the availability of a certain type of professional. Their schedules are stored in proprietary systems. With their permission, I want to install an app on their system that extracts their schedule from whichever system they are using and uploads the information to a central server so that I can present that information to potential clients.
I am aware of four ways of extracting the information you want, each with its own advantages and disadvantages. Before you do anything, you need to be aware that any solution you create is not guaranteed, and is in fact very unlikely, to continue working should the target application ever update. The reason is that in each case, you are relying on an implementation detail instead of a pre-defined interface through which to export your data.
Hooking the GUI
The first way is to hook the GUI as you have suggested. What you are doing in this case is simply reading off from what an actual user would see. This is in general easier, since you are hooking the WinAPI which is clearly defined. One danger is that what the program displays is inconsistent or incomplete in comparison to the internal data it is supposed to be representing.
Typically, there are two common ways to perform WinAPI hooking:
DLL Injection. You create a DLL which you load into the other program's virtual address space. This means that you have read/write access (writable access can be gained with VirtualProtect) to the target's entire memory. From here you can trampoline the functions which are called to set UI information. For example, to check if a window has changed its text, you might trampoline the SetWindowText function. Note every control has different interfaces used to set what they are displaying. In this case, you are hooking the functions called by the code to set the display.
SetWindowsHookEx. Under the covers, this works similarly to DLL injection and in this case is really just another method for you to extend/subvert the control flow of messages received by controls. What you want to do in this case is hook the window procedures of each child control. For example, when an item is added to a ComboBox, it would receive a CB_ADDSTRING message. In this case, you are hooking the messages that are received when the display changes.
One caveat with this approach is that it will only work if the target is using or extending WinAPI controls.
Reading from the GUI
Instead of hooking the GUI, you can alternatively use WinAPI to read directly from the target windows. However, in some cases this may not be allowed. There is not much to do in this case but to try and see if it works. This may in fact be the easiest approach. Typically, you will send messages such as WM_GETTEXT to query the target window for what it is currently displaying. To do this, you will need to obtain the exact window hierarchy containing the control you are interested in. For example, say you want to read an edit control, you will need to see what parent window/s are above it in the window hierarchy in order to obtain its window handle.
Reading from memory (Advanced)
This approach is by far the most complicated but if you are able to fully reverse engineer the target program, it is the most likely to get you consistent data. This approach works by you reading the memory from the target process. This technique is very commonly used in game hacking to add 'functionality' and to observe the internal state of the game.
Consider that as well as storing information in the GUI, programs often hold their own internal model of all the data. This is especially true when the controls used are virtual and simply query subsets of the data to be displayed. This is an example of a situation where the first two approaches would not be of much use. This data is often held in some sort of abstract data type such as a list or perhaps even an array. The trick is to find this list in memory and read the values off directly. This can be done externally with ReadProcessMemory or internally through DLL injection again. The difficulty lies mainly in two prerequisites:
Firstly, you must be able to reliably locate these data structures. The problem with this is that code is not guaranteed to be in the same place, especially with features such as ASLR. Colloquially, this is sometimes referred to as code-shifting. ASLR can be defeated by using the offset from a module base and dynamically getting the module base address with functions such as GetModuleHandle. As well as ASLR, a reason that this occurs is due to dynamic memory allocation (e.g. through malloc). In such cases, you will need to find a heap address storing the pointer (which would for example be the return of malloc), dereference that and find your list. That pointer would be prone to ASLR and instead of a pointer, it might be a double-pointer, triple-pointer, etc.
The second problem you face is that it would be rare for each list item to be a primitive type. For example, instead of a list of character arrays (strings), it is likely that you will be faced with a list of objects. You would need to further reverse engineer each object type and understand its internal layout (at least be able to determine the offsets, relative to the object base, of the primitive values you are interested in). More advanced methods revolve around actually reverse engineering the vtable of objects and calling their 'API'.
You might notice that I am not able to give information here which is specific. The reason is that by its nature, using this method requires an intimate understanding of the target's internals and as such, the specifics are defined only by how the target has been programmed. Unless you have knowledge and experience of reverse engineering, it is unlikely you would want to go down this route.
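The "base plus offsets through several pointer levels" idea can still be sketched generically. The example below runs inside a single process and uses made-up structures (module_image, model, item) as a stand-in for a reverse-engineered layout. Reading another process would replace each memcpy with a ReadProcessMemory call, and the base would come from something like GetModuleHandle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Made-up layout standing in for what reverse engineering might reveal. */
struct item { int id; int value; };
struct model { int revision; struct item *current; };
struct module_image { char header[16]; struct model *model; };

/* Hypothetical helper: start at `base`; for every offset except the last,
 * add the offset and load the pointer stored there; finally add the last
 * offset. Returns NULL if a pointer along the way is NULL. */
static void *follow_chain(void *base, const ptrdiff_t *offsets, size_t count)
{
    assert(base != NULL && offsets != NULL && count > 0);

    unsigned char *p = base;
    for (size_t i = 0; i + 1 < count; i++) {
        void *next;
        memcpy(&next, p + offsets[i], sizeof next); /* one level of indirection */
        if (next == NULL)
            return NULL;
        p = next;
    }
    return p + offsets[count - 1];
}
```

With offsets { offsetof(struct module_image, model), offsetof(struct model, current), offsetof(struct item, value) } the call lands on the value field of the current item, wherever the heap happened to place it.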
Hooking the target's internal API (Advanced)
As with the above solution, instead of digging for data structures, you dig for the internal API. I briefly covered this when discussing vtables earlier. Instead of doing this, you would be attempting to find internal APIs that are called when the GUI is modified. Typically, when a view/UI is modified, instead of directly calling the WinAPI to update it, a program will have its own wrapper function which it calls, which in turn calls the WinAPI. You simply need to find this function and hook it. Again this is possible, but requires reverse engineering skills. You may find that you discover functions which you want to call yourself. In this case, as well as being able to locate the function, you have to reverse engineer the parameters it takes and its calling convention, and you will need to ensure calling the function has no side effects.
I would consider this approach to be advanced. It can certainly be done and is another common technique used in game hacking to observe internal states and to manipulate a target's behaviour, but is difficult!
The first two methods are well suited for reading data from WinAPI programs and are by far easier. The two latter methods allow greater flexibility. With enough work, you are able to read anything and everything encapsulated by the target, but this requires a lot of skill.
Another point of concern which may or may not relate to your case is how easy it will be to update your solution to work should the target ever be updated. With the first two methods, it is more likely that no changes or only small changes have to be made. With the second two methods, even a small change in source code can cause a relocation of the offsets you are relying upon. One method of dealing with this is to use byte signatures to dynamically generate the offsets. I wrote another answer some time ago which addresses how this is done.
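For a feel of what a byte-signature scan looks like, here is a minimal sketch. find_signature and the 'x'/'?' mask convention are assumptions made for illustration. The idea is to match the instruction bytes that stay stable across builds and wildcard the bytes (such as embedded addresses) that do not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the offset of the first match of `sig` in
 * `buf`, or -1 if there is none. mask[i] == 'x' means sig[i] must match
 * exactly; mask[i] == '?' means any byte is accepted at that position. */
static ptrdiff_t find_signature(const uint8_t *buf, size_t buf_len,
                                const uint8_t *sig, const char *mask,
                                size_t sig_len)
{
    assert(buf != NULL && sig != NULL && mask != NULL);

    if (sig_len == 0 || sig_len > buf_len)
        return -1;

    for (size_t i = 0; i + sig_len <= buf_len; i++) {
        size_t j = 0;
        while (j < sig_len && (mask[j] == '?' || buf[i + j] == sig[j]))
            j++;
        if (j == sig_len)
            return (ptrdiff_t)i;
    }
    return -1;
}
```

The offset found this way replaces the hard-coded one, so a rebuild that merely moves the code does not break the tool as long as the surrounding instructions are unchanged.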
What I have written is only a brief summary of the various techniques that can be used for what you want to achieve. I may have missed approaches, but these are the most common ones I know of and have experience with. Since these are large topics in themselves, I would advise you ask a new question if you want to obtain more detail about any particular one. Note that in all of the approaches I have discussed, none of them suffer from any interaction which is visible to the outside world so you would have no problem with anything popping up. It would be, as you describe, 'silent'.
This is relevant information about detouring/trampolining which I have lifted from a previous answer I wrote:
If you are looking for ways that programs detour execution of other processes, it is usually through one of two means:
Dynamic (Runtime) Detouring - This is the more common method and is what is used by libraries such as Microsoft Detours. Here is a relevant paper where the first few bytes of a function are overwritten to unconditionally branch to the instrumentation.
(Static) Binary Rewriting - This is a much less common method for rootkits, but is used by research projects. It allows detouring to be performed by statically analysing and overwriting a binary. An old (not publicly available) package for Windows that performs this is Etch. This paper gives a high-level view of how it works conceptually.
Although Detours demonstrates one method of dynamic detouring, there are countless methods used in the industry, especially in the reverse engineering and hacking arenas. These include the IAT and breakpoint methods I mentioned above. To 'point you in the right direction' for these, you should look at 'research' performed in the fields of research projects and reverse engineering.

LoadLibrary from offset in a file

I am writing a scriptable game engine, for which I have a large number of classes that perform various tasks. The size of the engine is growing rapidly, and so I thought of splitting the large executable up into dll modules so that only the components that the game writer actually uses can be included. When the user compiles their game (which is to say their script), I want the correct dll's to be part of the final executable. I already have quite a bit of overlay data, so I figured I might be able to store the dll's as part of this block. My question boils down to this:
Is it possible to trick LoadLibrary to start reading the file at a certain offset? That would save me from having to either extract the dll into a temporary file which is not clean, or alternatively scrapping the automatic inclusion of dll's altogether and simply instructing my users to package the dll's along with their games.
Initially I thought of going for the "load dll from memory" approach but rejected it on grounds of portability and simply because it seems like such a horrible hack.
Any thoughts?
Kind regards,
Philip Bennefall
You are trying to solve a problem that doesn't exist. Loading a DLL doesn't actually require any physical memory. Windows creates a memory mapped file for the DLL content. Code from the DLL only ever gets loaded when your program calls that code. Unused code doesn't require any system resources beyond reserved memory pages. You have 2 billion bytes worth of that on a 32-bit operating system. You have to write a lot of code to consume them all, 50 megabytes of machine code is already a very large program.
The memory mapping is also the reason you cannot make LoadLibrary() do what you want to do. There is no realistic scenario where you need to.
Look into the linker's /DELAYLOAD option to improve startup performance.
I think every solution for that task is a "horrible hack" and nothing more.
The simplest way that I see is to create your own virtual drive that presents a custom filesystem and redirects path accesses from one real file (the bundle of your libraries) to multiple separate DLLs, the way TrueCrypt does (it's open source). Then you can use the LoadLibrary function without changes.
But the only right way I see is to change your task and not use this approach. I think you need to create your own script interpreter and compiler, using structures, pointers and so on.
The main thing is that I don't understand what you gain from using libraries. I think compiled code these days does not weigh much and can be packed very well. Any other resources can be loaded dynamically on first use. All you need to do is organize the working cycles of all components of the script engine in the right way.

Edit library in hex editor while preserving its integrity

I'm attempting to edit a library in a hex editor, in insert mode. The main point is to rename a few entries in it. If I do it in "Overwrite" mode, everything works fine, but every time I try to add a few symbols to the end of a string in "Insert" mode, the library fails to load. Anything I'm missing here?
Yes, you're missing plenty. A library follows the PE/COFF format, which is quite heavy on pointers throughout the file. (Eg, towards the beginning of the file is a table which points to the locations of each section in the file).
In the case that you are editing resources, there's the potential to do it without breaking things if you make sure you correct any pointers and sizes for anything pointing to after your edits, but I doubt it'll be easy. In the case that you are editing the .text section (ie, the code), then I doubt you'll get it done, since the operands of function calls and jumps are relative locations to their position in code - you would need to update the entire code to account for edits.
One technique to overcome this is a "code cave", where you replace a piece of the existing code with an explicit JMP instruction to some empty location (You can do this at runtime, where you have the ability to create new memory) - where you define some new code which can be of arbitrary length - then you explicitly JMP back to where you called from (+5 bytes say for the JMP opcode + operand).
Are the names you're changing them to the same length as the old names? If not, then the offsets of everything after them are shifted. And do any of the functions call one another? That could be another problem point. It'd be easier to obtain the source code (from the project's website if it's not in-house, or from the vendor if it's closed) and change them in that, and then recompile it. I'm curious as to why you are changing the names anyway.
DLLs are a complex binary format (ie compiled code). The compiling process turns named function calls into hard-wired references to specific positions in the file ("offsets"). Therefore if you insert characters into the middle of the file, the offsets after that point will no longer match what is actually at the position they reference, meaning that the function calls in your library will run the wrong code (if they manage to run anything at all).
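To see how many stored offsets an insertion would invalidate, one can walk the section table. The sketch below assumes the file is a PE image (a DLL) and uses the documented header layout: e_lfanew at 0x3C, a 20-byte COFF header after the "PE\0\0" signature, and 40-byte section headers with PointerToRawData at offset 20. The function name is made up, and it only counts section offsets. A real file also contains RVAs, sizes and directory entries that would need fixing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: counts section headers whose PointerToRawData lies at
 * or beyond file position `pos`, i.e. the stored offsets that would point at
 * the wrong bytes if data were inserted at `pos`. Returns -1 if the buffer
 * does not look like a PE image. */
static int stale_section_offsets(const uint8_t *img, size_t len, uint32_t pos)
{
    assert(img != NULL);

    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return -1;
    uint32_t pe = rd32(img + 0x3C);              /* e_lfanew */
    if (pe > len || len - pe < 24)
        return -1;
    if (rd32(img + pe) != 0x00004550u)           /* "PE\0\0" */
        return -1;

    uint16_t nsec = rd16(img + pe + 6);          /* NumberOfSections */
    uint16_t optsz = rd16(img + pe + 20);        /* SizeOfOptionalHeader */
    size_t table = (size_t)pe + 24 + optsz;      /* first section header */
    if (table > len || (len - table) / 40 < nsec)
        return -1;

    int stale = 0;
    for (uint16_t i = 0; i < nsec; i++) {
        uint32_t raw = rd32(img + table + 40u * i + 20); /* PointerToRawData */
        if (raw != 0 && raw >= pos)
            stale++;
    }
    return stale;
}
```

Overwriting in place leaves every one of these offsets valid, which matches the observation that overwrite mode works while insert mode does not.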
Basically, the bottom line is what you're doing is always going to break stuff. If you're unlucky, it might even break it really badly and cause serious damage.
Sure - a detailed knowledge of the format, and what has to change. If you're wondering why some of your edits cause loading to fail, you are missing that knowledge.
Libraries are intended to be written by the linker for the use of the linker. They follow a well-defined format that is intended to be easy for the linker to write and read. They don't need tolerance for human input like a compiler does.
Very simply, libraries aren't intended to be modified by hex editors. It may be possible to change entries by overwriting them with names of the same length, or that may screw up an index somewhere. If you change the length of anything, you're likely breaking pointers and metadata.
You don't give any reason for wanting to do this. If it's for fun, well, it's harder than you expected. If you have another reason, you're better off getting the source, or getting somebody who has the source to rename and rebuild.
