Fix hard-coded display setting without source (24-bit, need 32-bit) - vb6

I wrote a program about 10 years ago in Visual Basic 6 which was basically a full-screen game similar to Breakout / Arkanoid but had 'demoscene'-style backgrounds. I found the program, but not the source code. Back then I hard-coded the display mode to 800x600x24, and the program crashes whenever I try to run it as a result. No virtual machine seems to support 24-bit display when the host display mode is 16/32-bit. It uses DirectX 7 so DOSBox is no use.
I've tried all sorts of decompilers and at best they give me the form names and a bunch of assembly calls which mean nothing to me. The display mode setting was a DirectX 7 call but there's no clear reference to it in the decompilation.
In this situation, are there any pointers on how I can:
pin-point the function call in the program which is setting the display mode to 800x600x24 (ResHacker maybe?) and change the value being passed to it so it sets 800x600x32
view/intercept DirectX calls being made while it's running
or if that's not possible, at least
run the program in an environment that emulates a 24-bit display
I don't need to recover the source code (as nice as it would be) so much as just want to get it running.

One technique you could try in your disassembler is to search for the constants you remember, but as the actual bytes that would be contained within the executable. I guess you used the DirectDraw SetDisplayMode call, which is a method on a COM object, so it can't be traced to/from an entry point in a DLL as easily. It takes parameters for width, height and bits per pixel and they are DWORDs (32-bit, stored little-endian), so do a search for "58 02 00 00" (600), "20 03 00 00" (800) and "18 00 00 00" (24). Hopefully that will narrow it down to what you need to change. Bear in mind that a compiler may encode a small constant such as 24 in a single byte (for example push 18h is 6A 18), so the width and height patterns are the more reliable ones to look for.
By the way, which disassembler are you using?
This approach may be complicated somewhat if your VB6 program compiled to p-code rather than native code as you'll just get a huge chunk of data that represents the program rather than useful assembler instructions.
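The byte search described above can be sketched in a few lines of Python. This is only an illustration: the helper names find_dword and candidates, the 32-byte window, and the synthetic sample bytes are assumptions made for the example, not anything taken from the actual game.

```python
import struct

def find_dword(data, value):
    """Return every offset where `value` appears as a little-endian 32-bit DWORD."""
    pattern = struct.pack("<I", value)      # e.g. 800 -> b"\x20\x03\x00\x00"
    offsets = []
    pos = data.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(pattern, pos + 1)
    return offsets

def candidates(data, width=800, height=600, window=32):
    """Offsets of `width` that have `height` within `window` bytes (hypothetical heuristic)."""
    heights = find_dword(data, height)
    return [w for w in find_dword(data, width)
            if any(abs(w - h) <= window for h in heights)]

# Synthetic stand-in for an executable: two "push imm32" instructions (opcode 0x68)
# surrounded by NOPs. For the real file, read it with open(path, "rb").read().
sample = (b"\x90" * 10
          + b"\x68" + struct.pack("<I", 600)
          + b"\x68" + struct.pack("<I", 800)
          + b"\x90" * 10)

print([hex(offset) for offset in candidates(sample)])
```

Any offset this reports is only a candidate; it still has to be checked in the disassembler before changing anything.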

Check this:
http://www.sevenforums.com/tutorials/258-color-bit-depth-display-settings.html
If your graphics card doesn't have an entry for 24-bit display, I guess hacking your code's the only possibility. That or finding an old machine to throw Windows 95 on :P.

Related

Finding and patching an instruction in a DLL

I have a (C++) program where, in one of its dll's, the following is done:
if (m_Map.GetMaxValue() >= MAX_CLASSES) {
I have two binaries of this program (compiled with various versions of Visual Studio), one where MAX_CLASSES was #define'd to 50, and one where it was 75. These binaries were made from different branches of the code and the other functionality is different as well. What I need is a version of the binary where the MAX_CLASSES was defined as 50, except with the higher limit i.e. 75.
So a sane person would change the constant in the source code of the branch I need, rebuild and go home. But, building this software is complex because it's old, the dependencies and tooling are old, etc.; plus I have issues with building the installers, and data and so on. So, I thought, how about I just patch this binary so that this one constant is changed directly in the DLL. I have vague recollections of doing similar things in the 1990's for, eh, probably 'educational' purposes.
But times have changed and I barely remember doing it, let alone how I did things back then. I opened the DLL (one where the limit is set to 75, this is the binary I have at hand - I will have to re-do this as soon as I have the actual binary with the 50 limit, so the following references 75 i.e. 0x4b for illustrating the principle) in Ghidra and after some poking around, I found the following:
18005160e 3c 4b             CMP AL,0x4b
180051610 0f 82 19 01 00 00 JC LAB_18005172f
Which in the decompiler window I could link back to
if (bVar3 < 0x4b)
and some operations after that that I can map to the source code of the function I have.
Now my questions are:
how do I interpret the values above (the Ghidra output) with respect to the binary layout of the dll? When I hover over the first column value ('18005160e') in Ghidra, I get values for 'imagebase offset', 'memory block offset', 'function offset' and 'byte source offset'. Is this 'byte source offset' the physical address from the start of the dll where these instructions start? The actual value in this hover balloon is 50a0eh - is that Ghidra's notation for 0x50a0e ? I.e. does the trailing 'h' denote 'hex'?
I then tried to open the dll in a regular hex editor ('Hex Editor Neo' which I like to use to view/edit binary data files), and went to offset 0x50a0e, and looked for the values '3c 4b' around there which I didn't find. I searched for this byte sequence in the whole file, and found 7 occurrences, none of which are around 0x50a0e, leading me to think I'm misinterpreting Ghidra's 'byte source offset' here.
how do I make a 'patcher' for this? I would think what I need is a program that only does
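/* needs #include <stdio.h> */
FILE* fh = fopen("mydll.dll", "r+b"); /* update in place, binary mode, no truncation */
fseek(fh, 0x[magic constant], SEEK_SET); /* file offset of the byte to change */
fputc(0x4b, fh); /* write the new one-byte constant */
fclose(fh);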
where '0x[magic constant]' is hopefully just the value I got from Ghidra in 'byte source offset'? Or is there anything else I need to consider here? Is there any software tool that can generate a patcher program?
Thanks.
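A patcher along the lines sketched in the question might look like this in Python. It is a minimal sketch, not a finished tool: the function name patch_byte and the demo bytes are made up for illustration, and the demo runs on a throwaway temp file rather than a real DLL. The one addition over the outline above is that it checks the existing byte before overwriting it, so a wrong offset fails loudly instead of corrupting the file.

```python
import os
import tempfile

def patch_byte(path, offset, expected, new):
    """Replace the byte at `offset` with `new`, but only if it currently equals `expected`."""
    with open(path, "r+b") as fh:           # "r+b": update in place, binary, no truncation
        fh.seek(offset)
        current = fh.read(1)
        if current != bytes([expected]):
            raise ValueError(f"unexpected byte {current.hex()} at offset {offset:#x}")
        fh.seek(offset)                      # step back over the byte just read
        fh.write(bytes([new]))

# Demo on a temp file holding "CMP AL,0x32 / JC ..." (hypothetical bytes, 0x32 = 50).
fd, demo_path = tempfile.mkstemp(suffix=".bin")
with os.fdopen(fd, "wb") as fh:
    fh.write(bytes.fromhex("3c 32 0f 82 19 01 00 00"))

patch_byte(demo_path, 1, expected=0x32, new=0x4B)   # the immediate follows the 3c opcode

with open(demo_path, "rb") as fh:
    patched = fh.read()
os.remove(demo_path)
print(patched.hex(" "))
```

Keeping a backup copy of the original DLL before patching is the obvious precaution.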
18005160e is a VA, a Virtual Address.
It is the sum of a Base Address (most likely 180000000) and an RVA, a Relative Virtual Address.
Find the Base Address of the DLL with any PE inspecting tool (e.g. CFF Explorer) or Ghidra itself.
Subtract the base address from 18005160e to get the RVA. Let's say the result is 5160e.
Now you need to find which section this RVA lies in. Again use a PE inspecting tool to find the list of the sections and their RVA/Virtual start and RVA/Virtual size.
Say the RVA lies in the .text section with start at the RVA 1000.
Subtract this start RVA from the result above: 5160e - 1000 = 5060e.
This is the offset of the instruction in the .text section.
To find the offset in the file, just add the raw/offset start of the section (again you can find this with a PE inspecting tool).
Say the .text section starts at the offset 400, then 5060e + 400 = 50a0e is the offset corresponding to the VA 18005160e. With these example values the result agrees with the 50a0eh that Ghidra reports as 'byte source offset' (the trailing h denotes hex).
This is all PE 101.
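The conversion steps above can be written down as a small Python function. This is a sketch under assumptions: the section table is typed in by hand (name, virtual start, virtual size, raw file offset) instead of being parsed from the DLL, and the virtual size of 0x60000 is invented for the example. The image base, section RVA and raw offset are the example values used in the answer.

```python
def va_to_file_offset(va, image_base, sections):
    """Translate a virtual address to a file offset using a hand-entered section table."""
    rva = va - image_base
    for name, virt_start, virt_size, raw_start in sections:
        if virt_start <= rva < virt_start + virt_size:
            # offset inside the section, then add where the section starts in the file
            return rva - virt_start + raw_start
    raise ValueError(f"RVA {rva:#x} is not inside any listed section")

# (name, virtual start RVA, virtual size, raw file offset); the size is hypothetical.
sections = [(".text", 0x1000, 0x60000, 0x400)]

offset = va_to_file_offset(0x18005160E, 0x180000000, sections)
print(hex(offset))   # 0x50a0e
```

For a real DLL the section values should come from a PE inspecting tool, since sections other than .text have different virtual and raw starts.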

How to write an executable Windows .exe manually (machine code with Hex editor)?

I'd like to know how it is possible to write something as simple as a Hello World program just by using a hex editor. I know that I could use an assembler and assembly language to do this at a near machine level, but I just want to experiment with really writing machine code in a toy example such as Hello World.
This could be a simple DOS .COM file that I can run on DOSBox. But it would be nice if someone could provide an example for an .EXE file for running it directly on my Windows PC.
This is just pure curiosity. No... I'm not thinking of writing programs directly in binary machine code (I don't even usually write assembly code, I just use C/C++ as my most low level tools most of the time). I just want to see if that's possible to do it, because probably someone had to do it in the very early days of computers.
P.S.:
I know that there are similar questions about this topic around but none provide a working example. I just want a simple example so that it can help me understand how compilers and assemblers generate an executable file. I mean... someone must have done this by hand in the past for the very first programs. Also, for the Windows EXE format there must have been someone at Microsoft that wrote the first tools to generate the format and the way that Windows itself reads it and then executes it.
There's a quite minimalistic but fully working (on Win7, too) exe on corkami/wiki/PE101, every byte of it is explained in the nice graphic. You can type it all by hand in a hex editor, but the paddings may make that a little tedious.
As for the history, yes someone at Microsoft invented the exe format (the old DOS MZ exe format) and he (or someone else at Microsoft) wrote a loader for it and a linker, which is the thing that traditionally turns the output of a compiler ("object files") into executable files. It's possible (and even likely, I would say) that the first exe programs were written by hand, after all they were only meant to test the new loader.
Later, AT&T's COFF format was extended by Microsoft to the PE format, which still has the MZ header and typically (but optionally, it's not in the corkami example, and it can be anything really) includes a small DOS program just to print the message "This program cannot be run in DOS mode".
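To see the MZ-then-PE layout mentioned above, here is a small Python sketch that checks the two signatures. The function name pe_header_offset and the synthetic 128-byte buffer are assumptions for illustration; the facts it relies on are that the file starts with "MZ" and that the 32-bit field at offset 0x3C (e_lfanew) holds the file offset of the "PE\0\0" signature.

```python
import struct

def pe_header_offset(data):
    """Return the file offset of the PE signature, or None if this is not a PE file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)   # pointer to the PE header
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        return None
    return e_lfanew

# Synthetic header, not a runnable program: just enough bytes to exercise the checks.
stub = bytearray(0x80)
stub[0:2] = b"MZ"
struct.pack_into("<I", stub, 0x3C, 0x40)
stub[0x40:0x44] = b"PE\x00\x00"

print(pe_header_offset(bytes(stub)))
```

On a real file, pass in open(path, "rb").read(); whatever sits between the MZ header and the PE signature is the optional DOS stub program.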
1) a .com file is the simplest place to start and will run in DOSBox. The file has no header: DOS loads it at offset 0x100 of its segment (the first 0x100 bytes of the segment hold the PSP, which DOS fills in), so execution starts at the very first byte of the file.
2) although true that first programs are often written and assembled by hand into machine code, we are talking about when you add two numbers save them in memory and are so happy that you take the rest of the day off. a "hello world" program that prints stuff to a video card is significantly more complicated. Now you can make a very simple one using dos system calls, and perhaps that is not what you are interested in, perhaps it is.
3) based on 2, for anything more complicated than one or a few instructions at a time for testing, even back in the 1960s or 1970s, when hand assembling a program you would write your program in assembler by hand, then assemble it to machine code, then load it. Basically learn assembly language first, then learn how to generate the machine code for it, then start typing those bytes into a hex editor. It is not the 1960s; unless you enjoy excessive pain, learn the above by writing asm, using an assembler to generate the machine code, then using a disassembler to disassemble it and examine the assembly language and the machine code side by side. That will significantly reduce the time it is going to take you to get a working program. If you worked for a chip company before there were operating systems and instruction sets, you would still take advantage of other members of the team, the chip designers, etc. for understanding how to make the machine code and arrange it. You wouldn't be coming at this with only high level language experience and doing it all on your own with a hope of success.
4) x86 is a horrible instruction set; if you don't know assembly I strongly discourage you from learning it first. Having an x86 is the worst excuse I have heard to learn x86 first. You already mentioned DOSBox, so you are already planning to emulate/simulate, so use a good instruction set and simulate it or buy that hardware (under $50, even under $20, will buy you a board with a much better instruction set). I recommend simulating/emulating first and in parallel with the hardware if you choose to buy some. If you really want an education, write your own simulator; it is not difficult at all. Perhaps invent your own instruction set.
5) none of this will help you understand what a compiler does. Knowing assembly language then disassembling the compilers output is your best path toward that knowledge, machine code is not involved, no need to actually run the programs. A compiler goes from the higher level language to a lower level language (C to asm or C++ to asm for example). Then understand what an assembler does, there are many different solutions, both due to history and due to other reasons. The typical solution today is a separate compiler, assembler and linker (your compiler calls the assembler and linker for you unless you tell it not to, the three steps are hidden from view, in fact the compile process may be more than one program that is run to complete that task). Assemblers that output a binary will have to resolve the whole program, assemblers that output to an object will leave holes in the machine code for the linker to fill in. things like branching or calling items in another object that it cannot encode until the linker places things in the binary and knows the spacing/addressing. Also accessing variables that live in other objects.
You are likely not seeing actual examples of hex editing a program because, first off, it is such a broad question that there isn't a simple answer (what operating system, what system calls or are you creating those, what file format, what hex editor, etc). Also because it is a high level question and problem; the real questions are: where do I learn assembly, where do I learn about the relationship between assembly and machine code, where do I learn about system calls (which are not an assembly question, they are unrelated to learning asm; you learn assembly language itself, then you learn to USE it as a tool to perform system calls if you cannot perform the system calls directly using a higher level language), where do I learn about executable file formats like .com, .exe, coff, elf, etc., and what is a good or easy (or some other adjective) hex editor that runs on xyz operating system or environment. Ask those questions separately and you will find the answers and examples, and once you have those answers you will know how to make a program using a hex editor, typing in machine code. A shorter answer is that you ARE seeing hex examples of complete programs when you see the disassembly of a program posted at SO; some of those are complete programs shown in hex, and if you know the file format you can simply type that stuff into a hex editor.
I make binaries by hand, but I think it's easier in assembly itself than a pure hex editor, where updating anything would be difficult.
The easiest is surely DOS COM format, which you can even type in notepad,
or at least, it's very easy even for a normal Hello World.
The EXE (non DOS format) doesn't require much either see here.
If you're trying to make a PE, you can make a TinyPE.
Most binaries should be available as PE, and EXE and COM.
Not spot on, but this tutorial should give you a better insight into how assembly maps to machine code (x86 ELF): http://timelessname.com/elfbin/ (especially look at the lower half of the page)
This page is [...] about my attempts at creating the smallest x86 ELF binary that would execute saying Hello World on Ubuntu Linux My first attempts started with C then progressed to x86 assembly and finally to a hexeditor.
It's great to analyze really small executables like these because the mapping between assembly and machine code will be easier to spot. This is also a really interesting article on the subject (not exactly related to your question though): http://www.phreedom.org/research/tinype/ (x86 PE)
I wrote an article on creating executable DOS binary files just by using the ECHO at the command prompt. No other 3rd party HEX utilities or x86 IDEs required!
The technique uses a combination of keypad ALT ASCII codes which convert OPCODES to a binary format readable directly under MSDOS. The output is a fully runnable binary *.com file.
http://colinord.blogspot.co.uk/2015/02/extreme-programming-hand-coded.html
Excerpt:
Type the following key commands at the DOS prompt remembering to hold Left ALT.
c:\>Echo LALT-178 LALT-36 LALT-180 LALT-2 LALT-205 LALT-33 LALT-205 LALT-32 > $.com
The codes above are actually opcode values describing an X86 assembly program to print a dollar sign to the screen.
Your prompt should look something similar below when finished. Press enter to build!
c:\>Echo ▓$┤☻═!═ > $.com
Run the file '$.com' and you will see a single dollar ($) character displayed on the screen.
c:\>$.com
$
c:\>
Congratulations! You just created your first hand coded executable file called $.com.
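To see why those ALT codes form a program, it helps to convert the decimal key codes to hex. The Python sketch below does only that conversion; the instruction meanings in the comments are my own reading of the standard 8086 encodings and DOS interrupt conventions, not part of the quoted article.

```python
# Decimal ALT codes from the excerpt above, in the order they are typed.
alt_codes = [178, 36, 180, 2, 205, 33, 205, 32]
program = bytes(alt_codes)

print(program.hex(" "))   # b2 24 b4 02 cd 21 cd 20

# Reading the bytes in pairs:
#   b2 24   mov dl, 0x24   ; 0x24 is the ASCII code of '$'
#   b4 02   mov ah, 0x02   ; DOS function 2: print the character in DL
#   cd 21   int 0x21       ; call DOS
#   cd 20   int 0x20       ; terminate the program
```

Whatever ECHO writes after these eight bytes is never reached, because int 0x20 ends the program first.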
You can do a disassembly and try to figure out the machine code for the opcodes you use in your assembler,
for example
org 0x100
mov dx,msg
mov ah,0x09
int 0x21
ret
msg db 'hello$'
compiled with nasm -fbin ./a.asm -o ./a.com
has ndisasm a.com deliver the following disassembly:
00000000 BA0801 mov dx,0x108
00000003 B409 mov ah,0x9
00000005 CD21 int 0x21
00000007 C3 ret
00000008 68656C push word 0x6c65
0000000B 6C insb
0000000C 6F outsw
0000000D 24 db 0x24
00000000 to 00000007 are the instructions
so you can play with the ba0801 machine code, using some hex editor, try changing it to ba0901, and only 'ello' will be printed, you can play around with your hex editor and pad stuff out with NOP, which is 0x90 in machine code, for example:
00000000: ba 50 01 90 90 90 90 90 90 90 90 90 90 90 90 90 .P..............
00000010: b4 09 90 90 90 90 90 90 90 90 90 90 90 90 90 90 ................
00000020: cd 21 90 90 90 90 90 90 90 90 90 90 90 90 90 90 .!..............
00000030: c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 ................
00000040: 71 77 65 72 74 79 75 69 61 73 64 66 67 68 6a 24 qwertyuiasdfghj$
00000050: 61 73 64 66 67 68 6a 6b 61 73 64 66 67 68 6a 24 asdfghjkasdfghj$
00000060: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- ----------------
if you save this with the extension .com you can run it in DosBox
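The same hello program can also be assembled by hand in a few lines of Python, which makes the address arithmetic explicit. This is a sketch: the variable names are mine, and it only builds the bytes in memory (writing them to a file such as hello.com with open(..., "wb") is left out so the snippet has no side effects).

```python
import struct

ORG = 0x100                      # DOS loads a .COM image at offset 0x100
message = b"hello$"              # '$' terminates the string for DOS function 9

# Everything after the 3-byte "mov dx, imm16": mov ah,9 / int 0x21 / ret
tail = bytes([0xB4, 0x09, 0xCD, 0x21, 0xC3])

# The message sits right after the code, so its address is ORG + code size.
msg_address = ORG + 3 + len(tail)            # 0x108, matching the ndisasm listing
image = b"\xBA" + struct.pack("<H", msg_address) + tail + message

print(hex(msg_address))
print(image.hex(" "))
```

Changing msg_address by one reproduces the "ba0901" experiment described above: the string then starts at the 'e'.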

C++/CLI serial port send command

I have a piece of hardware here, which communicates over a serial port. I use MS Visual C++ 2010, and I want to send a command: <-S->
I am doing this:
SerialPort^ serialPort = gcnew SerialPort(portName , 9600, Parity::None, 8, StopBits::One);
serialPort->Open();
serialPort->WriteLine("<-S->");
serialPort->Close();
But the command that goes out is <-S->., and not <-S->
(please notice the point that is attached to the outgoing command).
I use Free Serial Port Monitor to watch my incoming/outgoing data.
So how can I get rid of that point in <-S->. ?
This is what is going out:
3C 2D 53 2D 3E 0A = <-S->.
This is what I want:
3C 2D 53 2D 3E = <-S->
Thanks for help.
You are using WriteLine(), which is appending a newline (character 0x0A) to your output (which something is showing as a ., but it's not really a dot). Try Write() instead.
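The difference is easy to see by looking at the bytes each call would put on the wire. The Python sketch below does not touch a serial port at all; it only mimics the "text plus newline" behaviour described above, with new_line standing in for the line-feed that WriteLine() appends.

```python
command = "<-S->"
new_line = "\n"                       # the line feed (0x0A) appended by WriteLine()

write_line_bytes = (command + new_line).encode("ascii")   # what WriteLine() sends
write_bytes = command.encode("ascii")                     # what Write() sends

print(write_line_bytes.hex(" ").upper())   # 3C 2D 53 2D 3E 0A
print(write_bytes.hex(" ").upper())        # 3C 2D 53 2D 3E
```

The monitor shows 0x0A as a dot simply because it is not a printable character.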
It is the value of the SerialPort.NewLine property. A line-feed by default. Use Write() instead.
There's more trouble, your code will only work well when you single-step with the debugger. Without it, the Close() method will instantly purge the transmit buffer and only a random number of characters will manage to get sent, including nothing at all. Only close serial ports when your program terminates.
Since you're using C++/CLI, there's really no reason to use the horrid .NET SerialPort class. The native Win32 serial port functions are much more powerful and reliable. Turn on OVERLAPPED mode, attach an event, and you can get background I/O operations going without having to mess with the thread pool and synchronization.

CreateThread() fails on 64 bit Windows, works on 32 bit Windows. Why?

Operating System: Windows XP 64 bit, SP2.
I have an unusual problem. I am porting some code from 32 bit to 64 bit. The 32 bit code works just fine. But when I call CreateThread() for the 64 bit version the call fails. I have three places where this fails. 2 call CreateThread(). 1 calls _beginthreadex() which calls CreateThread().
All three calls fail with error code 0x3E6, "Invalid access to memory location".
The problem is all the input parameters are correct.
HANDLE h;
DWORD threadID;
h = CreateThread(0, // default security
0, // default stack size
myThreadFunc, // valid function to call
myParam, // my param
0, // no flags, start thread immediately
&threadID);
All three calls to CreateThread() are made from a DLL I've injected into the target program at the start of the program execution (this is before the program has got to the start of main()/WinMain()). If I call CreateThread() from the target program (same params) via say a menu, it works. Same parameters etc. Bizarre.
If I pass NULL instead of &threadID, it still fails.
If I pass NULL as myParam, it still fails.
I'm not calling CreateThread from inside DllMain(), so that isn't the problem. I'm confused and searching on Google etc hasn't shown any relevant answers.
If anyone has seen this before or has any ideas, please let me know.
Thanks for reading.
ANSWER
Short answer: Stack Frames on x64 need to be 16 byte aligned.
Longer answer:
After much banging my head against the debugger wall and posting responses to the various suggestions (all of which helped in some way, prodding me to try new directions) I started exploring what-ifs about what was on the stack prior to calling CreateThread(). This proved to be a red herring but it did lead to the solution.
Adding extra data to the stack changes the stack frame alignment. Sooner or later one of the tests gets you to 16 byte stack frame alignment. At that point the code worked. So I retraced my steps and started putting NULL data onto the stack rather than what I thought was the correct values (I had been pushing return addresses to fake up a call frame). It still worked - so the data isn't important, it must be the actual stack addresses.
I quickly realised it was 16 byte alignment for the stack. Previously I was only aware of 8 byte alignment for data. This Microsoft document explains all the alignment requirements.
If the stack frame is not 16 byte aligned on x64 the compiler may put large (8 byte or more) data on the wrong alignment boundaries when it pushes data onto the stack.
Hence the problem I faced - the hooking code was called with a stack that was not aligned on a 16 byte boundary.
Quick summary of alignment requirements, expressed as size : alignment
1 : 1
2 : 2
4 : 4
8 : 8
10 : 16
16 : 16
Anything larger than 8 bytes is aligned on the next power of 2 boundary.
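The size-to-alignment table above follows a "round up to the next power of two" rule, which can be expressed in one line of Python. This sketch just reproduces the table from this answer; the function name required_alignment is made up, and it is not meant as a statement of the full ABI rules.

```python
def required_alignment(size):
    """Smallest power of two that is >= size, as in the size : alignment table above."""
    if size < 1:
        raise ValueError("size must be at least 1 byte")
    return 1 << (size - 1).bit_length()

table = [(size, required_alignment(size)) for size in (1, 2, 4, 8, 10, 16)]
for size, alignment in table:
    print(f"{size} : {alignment}")
```

The interesting row is the 10-byte one: it does not fit in 8 bytes, so it gets bumped up to 16.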
I think Microsoft's error code is a bit misleading. The initial STATUS_DATATYPE_MISALIGNMENT could be expressed as a STATUS_STACK_MISALIGNMENT which would be more helpful. But then turning STATUS_DATATYPE_MISALIGNMENT into ERROR_NOACCESS - that actually disguises and misleads as to what the problem is. Very unhelpful.
Thank you to everyone that posted suggestions. Even if I disagreed with the suggestions, they prompted me to test in a wide variety of directions (including the ones I disagreed with).
I have written a more detailed description of the problem of datatype misalignment here: 64 bit porting gotcha #1! x64 Datatype misalignment.
The only reason that 64bit would make a difference is that threading on 64bit requires 64bit aligned values. If threadID isn't 64bit aligned, you could cause this problem.
Ok, that idea's not it. Are you sure it's valid to call CreateThread before main/WinMain? It would explain why it works in a menu- because that's after main/WinMain.
In addition, I'd triple-check the lifetime of myParam. CreateThread returns (this I know from experience) long before the function you pass in is called.
Post the thread routine's code (or just a few lines).
It suddenly occurs to me: Are you sure that you're injecting your 64bit code into a 64bit process? Because if you had a 64bit CreateThread call and tried to inject that into a 32bit process running under WOW64, bad things could happen.
Starting to seriously run out of ideas. Does the compiler report any warnings?
Could the bug be due to a bug in the host program, rather than the DLL? There's some other code, such as loading a DLL if you used __declspec(dllimport/dllexport), that occurs before main/WinMain. That DllMain, for example, might have a bug in it.
I ran into this issue today. And I checked every argument fed into _beginthread/CreateThread/NtCreateThread via rohitab's Windows API Monitor v2. Every argument is aligned properly (AFAIK).
So, where does STATUS_DATATYPE_MISALIGNMENT come from?
The first few lines of NtCreateThread validate parameters passed from user mode.
ProbeForReadSmallStructure (ThreadContext, sizeof (CONTEXT), CONTEXT_ALIGN);
for i386
#define CONTEXT_ALIGN (sizeof(ULONG))
for amd64
#define STACK_ALIGN (16UI64)
...
#define CONTEXT_ALIGN STACK_ALIGN
On amd64, if the ThreadContext pointer is not aligned to 16 bytes, NtCreateThread will return STATUS_DATATYPE_MISALIGNMENT.
CreateThread (actually CreateRemoteThread) allocated ThreadContext from the stack, and did nothing special to guarantee the alignment requirement is satisfied. Things will work smoothly if every piece of your code follows the Microsoft x64 calling convention, which unfortunately was not true for me.
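The alignment test that the probe applies boils down to checking the low bits of an address. Here is a minimal Python sketch of that check and of the usual "round down" fix; the helper names and the example addresses are invented for illustration, only the value 16 comes from the STACK_ALIGN definition quoted above.

```python
STACK_ALIGN = 16   # from the amd64 definition quoted above

def is_aligned(address, alignment=STACK_ALIGN):
    """True if `address` is a multiple of `alignment` (a power of two)."""
    return address & (alignment - 1) == 0

def align_down(address, alignment=STACK_ALIGN):
    """Round `address` down to the nearest multiple of `alignment`."""
    return address & ~(alignment - 1)

# Hypothetical stack pointer values: the second is 8 bytes off a 16-byte boundary.
print(is_aligned(0x12FF40), is_aligned(0x12FF48))
print(hex(align_down(0x12FF48)))
```

Hand-written hooking code would apply the equivalent of align_down to the stack pointer before calling into the API.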
PS: The same code may work on newer Windows (say Vista and newer). I didn't check though. I'm facing this issue on Windows Server 2003 R2 x64.
I'm in the business of using parallel threads under Windows for calculations. No funny business, no DLL calls, and certainly no callbacks. The following works in 32-bit Windows. I set up the stack for my calculation, well within the area reserved for my program. All relevant data about areas and start addresses is contained in a data structure that is passed to CreateThread as parameter 3. The address that is called contains a small assembler routine that uses this data structure. Indeed this routine finds the address to return to on the stack, then the address of the data structure.
There is no reason to go far into this. It just works and it calculates the number of primes below 2,000,000,000 just fine, in one thread, in two threads or in 20 threads.
Now CreateThread in 64 bits doesn't push the address of the data structure. That seems implausible so I show you the smoking gun, a dump of a debug session.
In the subwindow at the bottom right you see the stack, and there is merely the return address, amidst a sea of zeroes.
The mechanism I use to fill in parameters is portable between 32 and 64 bits.
No other call exhibits a difference between word-sizes.
Moreover why would the code address work but not the data address?
The bottom line: one would expect that CreateThread passes the data parameter on the stack in the same way in 64 bits as in 32 bits, then does a subroutine call. At the assembler level it doesn't work that way. If there are any hidden requirements on e.g. RSP that are automatically fulfilled in C++ that would be very nasty.
P.S. No there are no 16 byte alignment problems. That lies ages behind me.
Try using _beginthread() or _beginthreadex() instead, you shouldn't be using CreateThread directly.
See this previous question.

How to read / write .exe machine code manually?

I am not well acquainted to the compiler magic. The act of transforming human-readable code (or the not really readable Assembly instructions) into machine code is, for me, rocket science combined with sorcery.
I will narrow down the subject of this question to Win32 executables (.exe). When I open these files up in a specialized viewer, I can find strings (usually 16b per character) scattered at various places, but the rest is just garbage. I suppose the unreadable part (majority) is the machine code (or maybe resources, such as images etc...).
Is there any straightforward way of reading the machine code? Opening the exe as a file stream and reading it byte by byte, how could one turn these individual bytes into Assembly? Is there a straightforward mapping between these instruction bytes and the Assembly instruction?
How is the .exe written? Four bytes per instruction? More? Less? I have noticed some applications can create executable files just like that: for example, in ACD See you can export a series of images into a slideshow. But this does not necessarily have to be a SWF slideshow, ACD See is also capable of producing EXEcutable presentations. How is that done?
How can I understand what goes on inside an EXE file?
OllyDbg is an awesome tool that disassembles an EXE into readable instructions and allows you to execute the instructions one-by-one. It also tells you what API functions the program uses and if possible, the arguments that it provides (as long as the arguments are found on the stack).
Generally speaking, CPU instructions are of variable length, some are one byte, others are two, some three, some four etc. It mostly depends on the kind of data that the instruction expects. Some instructions are generalised, like "mov" which tells the CPU to move data from a CPU register to a place in memory, or vice versa. In reality, there are many different "mov" instructions, ones for handling 8-bit, 16-bit, 32-bit data, ones for moving data from different registers and so on.
You could pick up Dr. Paul Carter's PC Assembly Language Tutorial which is a free entry level book that talks about assembly and how the Intel 386 CPU operates. Most of it is applicable even to modern day consumer Intel CPUs.
The EXE format discussed here is specific to Windows. The entry point (i.e. the first executable instruction) is not at a fixed position; its address is recorded in a header field (AddressOfEntryPoint in the PE optional header). It's all kind of difficult to explain all at once, but the resources I've provided should help cure at least some of your curiosity! :)
You need a disassembler which will turn the machine code into assembly language. This Wikipedia link describes the process and provides links to free disassemblers. Of course, as you say you don't understand assembly language, this may not be very informative - what exactly are you trying to do here?
You can use debug from the command line, but that's hard.
C:\WINDOWS>debug taskman.exe
-u
0D69:0000 0E PUSH CS
0D69:0001 1F POP DS
0D69:0002 BA0E00 MOV DX,000E
0D69:0005 B409 MOV AH,09
0D69:0007 CD21 INT 21
0D69:0009 B8014C MOV AX,4C01
0D69:000C CD21 INT 21
0D69:000E 54 PUSH SP
0D69:000F 68 DB 68
0D69:0010 69 DB 69
0D69:0011 7320 JNB 0033
0D69:0013 7072 JO 0087
0D69:0015 6F DB 6F
0D69:0016 67 DB 67
0D69:0017 7261 JB 007A
0D69:0019 6D DB 6D
0D69:001A 206361 AND [BP+DI+61],AH
0D69:001D 6E DB 6E
0D69:001E 6E DB 6E
0D69:001F 6F DB 6F
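Only the first seven instructions of that listing are real code. MOV DX,000E followed by MOV AH,09 and INT 21 prints the string stored at offset 000E, so everything from 000E onward is text that debug is blindly disassembling. A short Python sketch makes this visible by decoding the same bytes as ASCII (the byte values are copied from the listing above):

```python
# Bytes at offsets 000E..001F, taken from the second column of the debug listing.
raw = bytes.fromhex("54 68 69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F")

print(raw.decode("ascii"))   # This program canno
```

It is the start of the familiar DOS stub message, cut off where the listing ends.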
The executable file you see is Microsoft's PE (Portable Executable) format. It is essentially a container, which holds some operating system specific data about a program and the program data itself split into several sections. For example code, resources, static data are stored in separate sections.
The format of the section depends on what is in it. The code section holds the machine code according to the executable target architecture. In the most common cases this is Intel x86 or AMD-64 (same as EM64T) for Microsoft PE binaries. The format of the machine code is CISC and originates back to the 8086 and earlier. The important aspect of CISC is that its instruction size is not constant, you have to start reading at the right place to get something valuable out of it. Intel publishes good manuals on the x86/x64 instruction set.
You can use a disassembler to view the machine code directly. In combination with the manuals you can guess the source code most of the time.
And then there's MSIL EXE: the .NET executables holding Microsoft's Intermediate Language. These do not contain machine specific code, but .NET CIL code. The specifications for that are available online at the ECMA.
These can be viewed with a tool such as Reflector.
The contents of the EXE file are described in Portable Executable. It contains code, data, and instructions to OS on how to load the file.
There is a 1:1 mapping between machine code and assembly. A disassembler program will perform the reverse operation.
There isn't a fixed number of bytes per instruction on i386. Some are a single byte, some are much longer.
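A toy disassembler shows both points: each opcode byte maps to one mnemonic, and the opcode decides how many bytes follow. The Python sketch below is deliberately tiny and is an illustration only: its table covers just four real 16-bit x86 opcodes (BA, B4, CD, C3) and it would fail on anything else, whereas a real disassembler has to handle prefixes, ModRM bytes and much more.

```python
# opcode byte -> (mnemonic, number of immediate bytes that follow)
OPCODES = {
    0xBA: ("mov dx, imm16", 2),
    0xB4: ("mov ah, imm8", 1),
    0xCD: ("int imm8", 1),
    0xC3: ("ret", 0),
}

def decode(code):
    """Return (offset, length, mnemonic, immediate) for each instruction in `code`."""
    result = []
    pos = 0
    while pos < len(code):
        mnemonic, imm_len = OPCODES[code[pos]]
        immediate = int.from_bytes(code[pos + 1:pos + 1 + imm_len], "little")
        result.append((pos, 1 + imm_len, mnemonic, immediate))
        pos += 1 + imm_len                   # instruction length varies per opcode
    return result

listing = decode(bytes.fromhex("BA0801B409CD21C3"))
for offset, length, mnemonic, immediate in listing:
    print(f"{offset:04x}  {length} byte(s)  {mnemonic}  {immediate:#x}")
```

Starting the decode one byte too late would produce different, mostly meaningless instructions, which is why you have to start reading at the right place.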
Just relating to this question, anyone still read things like
CD 21?
I remembered Sandra Bullock in one show, actually reading a screenful of hex numbers and figure out what the program does. Sort of like the current version of reading Matrix code.
if you do read stuff like CD 21, how do you remember the different various combinations?
Win32 exe format on MSDN
I'd suggest taking a bit of Windows C source code, building it, and starting to debug it in Visual Studio. Switch to the disassembly view and step over the commands. You can see how the C code has been compiled into machine code - and watch it run step-by-step.
If it's as foreign to you as it seems, I don't think a debugger or disassembler is going to help - you need to learn assembler programming first; study the architecture of the processor (plenty of documentation downloadable from Intel). And then since most machine code is generated by compilers, you'll need to understand how compilers generate code - the simplest way is to write lots of small programs and then disassemble them to see what your C/C++ is turned into.
A couple of books that'll help you understand:-
Reversing
Hacking: The Art of Exploitation
To get an idea, set a breakpoint on some interesting code, and then go to the CPU window.
If you are interested in more, it is easier to compile short fragments with Free Pascal using the -al parameter.
FPC allows to output the generated assembler in a multitude of assembler formats (TASM,MASM,GAS ) using the -A parameter, and you can have the original pascal code interleaved in comments (and more) for easy crossreference.
Because it is compiler generated assembler, as opposed to assembler from disassembled .exe, it is more symbolic and easier to follow.
Familiarity with low level assembly (and I mean low level assembly, not "macros" and that bull) is probably a must. If you really want to read the raw machine code itself directly, usually you would use a hex editor for that. In order to understand what the instructions do, however, most people would use a disassembler to convert that into the appropriate assembly instructions. If you're one of the minority who wants to understand the machine language itself, I think you'd want the Intel® 64 and IA-32 Architectures Software Developer's Manuals. Volume 2 specifically covers the instruction set, which relates to your query about how to read machine code itself and how assembly relates to it.
Both your curiosity and your level of understanding is exactly where I was at one point. I highly recommend Code: The Hidden Language of Computer Hardware and Software. This will not answer all of the questions you ask here but it will shed light on some of the utterly black magic aspects of computers. It's a thick book but highly readable.
ACD See is probably taking advantage of the fact that .EXE files do no error checking on file length or anything beyond the length of the expected portion of the file. Because of this, you can make an .EXE file that will open itself and load everything beyond a given point as data. This is useful because you can then make a .EXE that works on a given set of data by just tacking that data on the end of a suitably written .EXE
(I have no idea what exactly ACD See is so take that with a big grain of salt but I do know that some program are generated that way.)
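The "tack the data on the end" idea can be sketched in Python. Everything specific here is an assumption made up for the example: the 4-byte marker DATA, the length footer, and the fake 64-byte "executable". Real self-extracting programs each use their own layout; this only shows the general shape of the trick.

```python
import struct

MAGIC = b"DATA"   # hypothetical marker so the program can tell a payload is present

def append_payload(exe_bytes, payload):
    """Return exe + payload + footer (marker and little-endian payload length)."""
    return exe_bytes + payload + MAGIC + struct.pack("<I", len(payload))

def extract_payload(file_bytes):
    """Read the footer at the end of the file and return the payload, or None."""
    if file_bytes[-8:-4] != MAGIC:
        return None
    (length,) = struct.unpack("<I", file_bytes[-4:])
    return file_bytes[-8 - length:-8]

fake_exe = b"MZ" + bytes(62)                      # stand-in for a real viewer program
combined = append_payload(fake_exe, b"slide1.jpg data")

print(extract_payload(combined))
```

At run time the viewer would open its own file, read the footer from the end, and treat the bytes before it as its data.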
Every instruction is in machine code kept in a special memory area within the cpu. EARLY INTEL books gave the machine code for their instructions, so one should try to obtain such books so as to understand this. Obviously today machine code is not easily available. What would be nice is a program which can reverse hex to machine code. Or do it manually!!
tedious

Resources