Endianness of "IBM PC" byte order? - endianness

I looked at the Wikipedia article on endianness and it doesn't mention anything about this type of byte order (which is what photoshop asks me about when opening RAW files). Which is it?

IBM PC == x86 which is mentioned in the article (uses LO HI byte order a.k.a. little endian)

That Wikipedia article says that X86 is little-endian. Is there something else you're asking and I'm missing it?

Related

Multidrop Bus addressing

Currently I am reading the MDB_interface_specification( (https://namanow.org/wp-content/uploads/Multi-Drop-Bus-and-Internal-Communication-Protocol.pdf) Version 4. 3(July 2019). In Kapitel 2.3 page 34 they are talking about the Peripheral Address and I don't undrstand how the Address scheme has been built . One prototy of the address scheme look like this: 00101xxxB ( this can be 28H for example ). They upper five bits are used for addressing and the lower 3 bit are the command. If i considered this statement wih my example then the address ist 5 and the Command is 0. I am a little bit confuse can someone please explain me that?
OK. First, read this:
Then, we have value 0x00 as command to Energy Management System (huh, i never saw this kind of MBD device in the wild). MDB datasheet doesn't contain any references to this device yet, but seems it's just POLL command, device must respond to POLL with last status changes (if any) or just ACK with x100 - it's not mistake, it's 0x00 with 9th bit set. Don't read this datasheet unless you want to lose your mind. I am already read this awesome shit and put it (mostly) to hardware implementation see github repo with complete solution
Cheers.

How to get .text segment statistics in IDA Pro?

Is there any way to show how many bytes in .text segment IDA Pro treats as instructions and how many as data? Are other statistics also available (e.g. function sizes)?
Function sizes are shown in the functions list. But there is no built-in functionality for the code/data bytes counts. You will need to use scripting (IDC or Python), or write a plugin using the IDA SDK to calculate it. I'd recommend posting this question on the Hex-Rays user support forum.

For strict education purposes, what exact format of bytes/bits do modern BIOS understand?

BIOS will look in the first 512 bytes of the first sector(at least on PC BIOS, AmeriTrend, PhoenixBIOS, etc.), and any .bin file binary formatted block of bytes will be understood by BIOS, am I correct here?
I just want to ask this to be certain, and because I want to assure that I don't make mistakes when writing my operating system carefully.
The BIOS will be executing under the processor and native-architecture obviously, so once I instruct BIOS with the binary to have the processor move the bytes in to memory I can then transfer control to my software which will then instruct the processor on what it does next, right?
I just want to know if I have this right, and I assure you this isn't spam, as I'm a curious hobbyist who has C/C++, Java, C#, x86 Assembly, and some hardware-design experience as well.
EDIT PEOPLE: I also would like to know if there's a modernized format, file, or block of bytes the BIOS must be assembled/compiled to to be executed, such as a .bin.
As pst comment says, the boot sector is treated as i386 machine code.
The last 2 bytes need to match a special signature (0x55AA), but I think that is it as far as hard requirements.
The code just gets loaded and executed as is.
If you are trying to conform to MBR or GPT partition specs (so that other OS's can see your disk partitions) there is more to it, but that is another thing altogether.
There is no specific "file format" for a boot sector. The BIOS simply reads the raw bytes from the boot sector, and jumps to the first instruction. It is literally just a "block of bytes", the file extension (you keep mentioning .bin) is not relevant at all.

How to read / write .exe machine code manually?

I am not well acquainted to the compiler magic. The act of transforming human-readable code (or the not really readable Assembly instructions) into machine code is, for me, rocket science combined with sorcery.
I will narrow down the subject of this question to Win32 executables (.exe). When I open these files up in a specialized viewer, I can find strings (usually 16b per character) scattered at various places, but the rest is just garbage. I suppose the unreadable part (majority) is the machine code (or maybe resources, such as images etc...).
Is there any straightforward way of reading the machine code? Opening the exe as a file stream and reading it byte by byte, how could one turn these individual bytes into Assembly? Is there a straightforward mapping between these instruction bytes and the Assembly instruction?
How is the .exe written? Four bytes per instruction? More? Less? I have noticed some applications can create executable files just like that: for example, in ACD See you can export a series of images into a slideshow. But this does not necessarily have to be a SWF slideshow, ACD See is also capable of producing EXEcutable presentations. How is that done?
How can I understand what goes on inside an EXE file?
OllyDbg is an awesome tool that disassembles an EXE into readable instructions and allows you to execute the instructions one-by-one. It also tells you what API functions the program uses and if possible, the arguments that it provides (as long as the arguments are found on the stack).
Generally speaking, CPU instructions are of variable length, some are one byte, others are two, some three, some four etc. It mostly depends on the kind of data that the instruction expects. Some instructions are generalised, like "mov" which tells the CPU to move data from a CPU register to a place in memory, or vice versa. In reality, there are many different "mov" instructions, ones for handling 8-bit, 16-bit, 32-bit data, ones for moving data from different registers and so on.
You could pick up Dr. Paul Carter's PC Assembly Language Tutorial which is a free entry level book that talks about assembly and how the Intel 386 CPU operates. Most of it is applicable even to modern day consumer Intel CPUs.
The EXE format is specific to Windows. The entry-point (i.e. the first executable instruction) is usually found at the same place within the EXE file. It's all kind of difficult to explain all at once, but the resources I've provided should help cure at least some of your curiosity! :)
You need a disassembler which will turn the machine code into assembly language. This Wikipedia link describes the process and provides links to free disassemblers. Of course, as you say you don't understand assembly language, this may not be very informative - what exactly are you trying to do here?
You can use debug from the command line, but that's hard.
C:\WINDOWS>debug taskman.exe
-u
0D69:0000 0E PUSH CS
0D69:0001 1F POP DS
0D69:0002 BA0E00 MOV DX,000E
0D69:0005 B409 MOV AH,09
0D69:0007 CD21 INT 21
0D69:0009 B8014C MOV AX,4C01
0D69:000C CD21 INT 21
0D69:000E 54 PUSH SP
0D69:000F 68 DB 68
0D69:0010 69 DB 69
0D69:0011 7320 JNB 0033
0D69:0013 7072 JO 0087
0D69:0015 6F DB 6F
0D69:0016 67 DB 67
0D69:0017 7261 JB 007A
0D69:0019 6D DB 6D
0D69:001A 206361 AND [BP+DI+61],AH
0D69:001D 6E DB 6E
0D69:001E 6E DB 6E
0D69:001F 6F DB 6F
The executable file you see is Microsofts PE (Portable Executable) format. It is essentially a container, which holds some operating system specific data about a program and the program data itself split into several sections. For example code, resources, static data are stored in seperate sections.
The format of the section depends on what is in it. The code section holds the machine code according to the executable target architecture. In the most common cases this is Intel x86 or AMD-64 (same as EM64T) for Microsoft PE binaries. The format of the machine code is CISC and originates back to the 8086 and earlier. The important aspect of CISC is that its instruction size is not constant, you have to start reading at the right place to get something valuable out of it. Intel publishes good manuals on the x86/x64 instruction set.
You can use a disassembler to view the machine code directly. In combination with the manuals you can guess the source code most of the time.
And then there's MSIL EXE: The .NET executables holding Microsofts Intermediate Language, these do not contain machine specific code, but .NET CIL code. The specifications for that are available online at the ECMA.
These can be viewed with a tool such as Reflector.
The contents of the EXE file are described in Portable Executable. It contains code, data, and instructions to OS on how to load the file.
There is an 1:1 mapping between machine code and assembly. A disassembler program will perform the reverse operation.
There isn't a fixed number of bytes per instruction on i386. Some are a single byte, some are much longer.
Just relating to this question, anyone still read things like
CD 21?
I remembered Sandra Bullock in one show, actually reading a screenful of hex numbers and figure out what the program does. Sort of like the current version of reading Matrix code.
if you do read stuff like CD 21, how do you remember the different various combinations?
Win32 exe format on MSDN
I'd suggest taking an bit of Windows C source code and build and start debugging it in Visual Studio. Switch to the disassembly view and step over the commands. You can see how the C code has been compiled into machine code - and watch it run step-by-step.
If it's as foreign to you as it seems, I don't think a debugger or disassembler is going to help - you need to learn assembler programming first; study the architecture of the processor (plenty of documentation downloadable from Intel). And then since most machine code is generated by compilers, you'll need to understand how compilers generate code - the simplest way to write lots of small programs and then disassemble them to see what your C/C++ is turned into.
A couple of books that'll help you understand:-
Reversing
Hacking = The Art of Exploitation
To get an idea, set a breakpoint on some interesting code, and then go to the CPU window.
If you are interested in more, it is easier to compile short fragments with Free Pascal using the -al parameter.
FPC allows to output the generated assembler in a multitude of assembler formats (TASM,MASM,GAS ) using the -A parameter, and you can have the original pascal code interleaved in comments (and more) for easy crossreference.
Because it is compiler generated assembler, as opposed to assembler from disassembled .exe, it is more symbolic and easier to follow.
Familiarity with low level assembly (and I mean low level assembly, not "macros" and that bull) is probably a must. If you really want to read the raw machine code itself directly, usually you would use a hex editor for that. In order to understand what the instructions do, however, most people would use a disassembler to convert that into the appropriate assembly instructions. If you're one of the minority who wants to understand the machine language itself, I think you'd want the Intel® 64 and IA-32 Architectures Software Developer's Manuals. Volume 2 specifically covers the instruction set, which relates to your query about how to read machine code itself and how assembly relates to it.
Both your curiosity and your level of understanding is exactly where I was at one point. I highly recommend Code: The Hidden Language of Computer Hardware and Software. This will not answer all of the questions you ask here but it will shed light on some of the utterly black magic aspects of computers. It's a thick book but highly readable.
ACD See is probably taking advantage of the fact that .EXE files do no error checking on file length or anything beyond the length of the expected portion of the file. Because of this, you can make an .EXE file that will open its self and load everything beyond a given point as data. This is useful because you can then make a .EXE that works on a given set of data by just tacking that data on the end of a suitably written .EXE
(I have no idea what exactly ACD See is so take that with a big grain of salt but I do know that some program are generated that way.)
Every instruction is in machine code kept in a special memory area within the cpu. EARLY INTEL books gave the machine code for their instructions, so one should try to obtain such books so as to understand this. Obviously today machine codeis not easily available. What would be nice is a program which can reverse hex to machine code. Or do it manually _!!
tedious

monitoring file access of an old dos game

I have a bit of an unusual question. I'm running an old DOS game in dosbox under windows xp and i'm trying to determine when and where it access it's data file.
what can i use that will give me a log of all read requests made to a file? I want to know the "when", "from" and "size" of each file read.
I know my basic 8086/8088 assembly but nothing more. so if there's no shortcut tool available, a recommendation of a debugging tool / tutorial that can help me get on the right track can be great also.
the game's "below the roots", if anyone can shed some light about this game's internals, it will be a great help :)
You could try using FileMon for Windows and see what dosbox is accessing via the windows file system.
You could patch the DOSBOX source code :) Just get it to write some debug messages when the reads occur. If you set the debug level high enough it might happen anyway!
Most DOS programs use DOS interrupts. Some however use BIOS interrupts or worse.
Anyway, in case it helps, here are the file-reading DOS interrupts I know of:
FCB-oriented functions:
INT 21h, AH=14h (sequential read)
INT 21h, AH=21h (random read)
INT 21h, AH=27h (random block read, à la fread())
Handle-oriented functions:
INT 21h, AH=3Fh (sequential read)
INT 21h, AH=42h (seek)

Resources