OS development - converting logical block format to Cylinder-Head-Sector - bootloader

I am referring BrokenThorn's OS development tutorial, and currently reading the part on developing a complete first stage bootloader that loads the second stage - Bootloaders 4.
In the part of converting Logical Block Address (LBA) to Cylinder-Head-Sector (CHS) format, this is the code that is used -
LBACHS:
xor dx, dx ; prepare dx:ax for operation
div WORD [bpbSectorsPerTrack] ; divide by sectors per track
inc dl ; add 1 (obsolute sector formula)
mov BYTE [absoluteSector], dl
xor dx, dx ; prepare dx:ax for operation
div WORD [bpbHeadsPerCylinder] ; mod by number of heads (Absolue head formula)
mov BYTE [absoluteHead], dl ; everything else was already done from the first formula
mov BYTE [absoluteTrack], al ; not much else to do :)
ret
I am not able to understand the logic behind this conversion. I tried using a few sample values to walk through it and see how it works, but that got me even more confused. Can someone explain how this conversion works and the logic used ?

I am guessing that your LBA value is being stored in AX as you are performing division on some value.
As some pre-information for you, the absoluteSector is the CHS sector number, absoluteHead is the CHS head number, and absoluteTrack is the CHS cylinder number. Cylinders and tracks are the exact same thing, just a different name.
Also, the DIV operation for your code in 16-bit will take whatever is in the DX:AX register combination and divide it by some value. The remainder of the division will be in the DX register while the actual result will be in the AX register.
Next, the *X registers are 16-bit registers, where * is one of ABCD. They are made up of a low and high component, both referred to as *H and *L for high and low, respectively. For example, the DX register has DH for the upper 8 bits and DL for the lower 8 bits.
Finally, as the BYTE and WORD modifiers simply state the size of the data that will be used/transferred.
The first value you must extract is the sector number, which is obtained by diving the LBA value by the number of sectors per track. The DL register will then contain the sector number minus one. This is because counting sectors starts at 1, which is different than most values, which start at zero. To fix this, we add one to the DL register get the correct sector value. This value is stored in memory at absoluteSector.
The next value you must extract is the head number, which is obtained by dividing the result of the last DIV operation by the number of heads per cylinder. The DL register will then contain the head number, which we store at absoluteHead.
Finally, we get the track number. With the last division we already obtained the value, which is in the AL register. We then store this value at absoluteTrack.
Hope this cleared things up a little bit.
-Adrian

Related

Invalid division, swears at comma when compiling [duplicate]

In x86 assembly, most instructions have the following syntax:
operation dest, source
For example, add looks like
add rax, 10 ; adds 10 to the rax register
But mnemonics like mul and div only have a single operand - source - with the destination being hardcoded as rax. This forces you to set and keep track of the rax register anytime you want to multiply or divide, which can get cumbersome if you are doing a series of multiplications.
I'm assuming there is a technical reason pertaining to the hardware implementation for multiplication and division. Is there?
The destination register of mul and div is not rax, it's rdx:rax. This is because both instructions return two results; mul returns a double-width result and div returns both quotient and remainder (after taking a double-width input).
The x86 instruction encoding scheme only permits two operands to be encoded in legacy instructions (div and mul are old, dating back to the 8086). Having only one operand permits the bits dedicated to the other operand to be used as an extension to the opcode, making it possible to encode more instructions. As the div and mul instructions are not used too often and as one operand has to be hard-coded anyway, hard-coding the second one was seen as a good tradeoff.
This ties into the auxillary instructions cbw, cwd, cwde, cdq, cdqe, and cqo used for preparing operands to the div instruction.
Note that later, additional forms of the imul instruction were introduced, permitting the use of two modr/m operands for the common cases of single word (as opposed to double-word) multiplication (80386) and multiplication by constant (80186). Even later, BMI2 introduced a VEX-encoded mulx instruction permitting three freely chosable operands of which one source operand can be a memory operand.

How are Ethereum bytecode JUMPs and JUMPDESTs resolved?

I've been looking around for info on how Ethereum deals with jumps and jump destinations. From various blogs and the yellow paper what I found is as follows:
The operand taken by JUMP and the first of the two operands taken by JUMPI are the value the the PC is set to (assume the first stack value != 0 in the case of JUMPI).
However, looking at this contract's creation code (as opcodes) the first few opcodes/values are:
PUSH1 0x60
PUSH1 0x40
MSTORE
CALLDATASIZE
ISZERO
PUSH2 0x00f8
JUMPI
As I understand it this means that if the value pushed to the stack by ISZERO != 0 then PC will change to 0x00f8 as JUMPI takes two from the stack, checks if the second is 0 and if not sets PC to the value of its first operand.
The problem I am having is that 0x00f8 in decimal is 248. The 248th position in the contract appears to be MSTORE and not a JUMPDEST, which would cause the contract to fail in its execution as JUMP* can only point to a valid JUMPDEST.
Presumably contracts don't jump to invalid destinations on purpose?
If anyone could explain how jumps and jump destinations are resolved I would be very grateful.
In case it helps others:
The confusion arose from the EVM reading byte by byte and NOT word by word.
From the example in the question, 0x00f8 would be the 248th byte, not the 248th word.
As each opcode is 1 byte long PC is normally incremented by 1 when reading an opcode.
However in the case of a PUSH instruction, information on how many of the following bytes are to be taken as its operand is also included.
For example PUSH2 takes the 2 bytes that follow it, PUSH6 takes 6 bytes that follow it, and so on. Here PC would be incremented by 1 for the PUSH and then 2 or 6 respectively for each byte of the data used by the PUSH.
Just want to point out that there is a difference in JUMP and JUMPI.
JUMP just takes 1 element from the stack i.e. destination. Which is generally an offset in hex pushed to the stack.
JUMPI is a conditional jump that takes top 2 elements from the stack i.e. destination and condition.
In the example you gave the condition is ISZERO(checks if the top most element of the stack is 0 or not).
So if that returns true, it will JUMP to the desitnation that is the offset 0x00f8(248 in decimal).
If the condition is False, it will just increase the program counter by 1.
In the contract you mentioned, it is a JUMPDEST opcode at (Program counter)248.
The program counter depends on the opcode. How much many bytes does a opcode push into the stack,etc. e.g.
PUSH1 0x60 - PC[0]
PUSH1 0x40 - PC[2]
MSTORE - PC[4]
CALLDATASIZE- PC[5]
ISZERO - PC[6]
PUSH2 0x00f8- PC[7]
JUMPI - PC[10]
Maybe this website will give you a better understanding on opcodes https://ethervm.io/

Understanding 8086 assembler debugger

I'm learning assembler and I need some help with understanding codes in the debugger, especially the marked part.
mov ax, a
mov bx, 4
I know how above instructions works, but in the debugger I have "2EA10301" and "BB0400".
What do they mean?
The first instruction moves variable a from data segment to the ax register, but in debugger I have cs:[0103].
What do mean these brackets and these numbers?
Thanks for any help.
The 2EA10301 and BB0400 numbers are the opcodes for the two instructions highlighted.
2E is Code Segment (CS) prefix and instructs the CPU to access memory with the CS segment instead of the default DS one.
A1 is the opcode for MOV AX, moffs16 and 0301 is the immediate 0103h in little endian, the address to read from.
So 2EA10301 is mov ax, cs:[103h].
The square brackets are the preferred way to denote a memory access through one the addressing mode but some assemblers support the confusing syntax without the brackets.
As this syntax is ambiguous and less standardised across different assemblers than the other, it is discouraged.
During the assembling the assembler keeps a location counter incremented for each byte emitted (each "section"/segment has its own counter, i.e. the counter is reset at the beginning of each "section").
This gives each variable an offset that is used to access it and to craft the instruction, variables names are for the human, CPUs can only read from addresses, numbers.
This offset will later be and address in memory once the file is loaded.
The assembler, the linker and the loader cooperate, there are various tricks at play, to make sure the final instruction is properly formed in memory and that the offset is transformed into the right address.
In your example their efforts culminate in the value 103h, that is the address of a in memory.
Again, in your example, the offset, if the file is a COM (by the way, don't put variables in the execution flow), was still 103h due to the peculiar structure of the COM files.
But in general, it could have been another number.
BB is MOV r16, imm16 with the register BX. The base form is B8 with the lower 3 bits indicating the register to use, BX is denoted by a value of 3 (011b in binary) and indeed 0B8h + 3 = 0BBh.
After the opcode, again, the WORD immediate 0400 that encodes 4 in little endian.
You now are in the position to realise that the assembly source is not always fully informative, as the assemblers implement some form of syntactic sugar.
The instruction mov ax, a, identical to mov bx, 4 in its syntax and that technically is move the immediate value, constant and known at assembly time, given by the address of a into ax, is instead interpreted as move the content of a, a value present in memory and readable only with a memory access, into ax because a is known to be a variable.
This phenomenon is limited in the x86, being CISC, and more widespread in the RISC world, where the lack of commonly needed instructions is compensated with pseudo-instructions.
Well, first, assembler is x86 Assembly. The assembler is what turns the instructions into machine code.
When you disassemble programs, it probably will use the hex values (like 90 is NOP instruction or B8 to move something to AX).
Square brackets copies the memory address to which the register points to.
The hex on the side is called the address.
Everything is very simple. The command mov ax, cx: [0103] means that the value of 000Ah is loaded into the register ax. This value is taken from the code segment at 0103h. Slightly higher in the pictures you can see this value. cx: 0101 0B900A00. Accordingly, at the address 0101h to be the value 0Bh, 0102h to be the value 90h, 0103h to be the value 0Ah, 0104h to be the value 00h. It turns out that the AL register loads the value from the address 0103h equal to 0Ah. It turns out that the AH register loads the value from the address 0104h equal to 00h and it turns out ax = 000Ah. If instead of the ax command, cx: [0103] there was the ax command, cx: [0101], then ax = 900Bh or the ax command, cx: [0102], then ax = 0A90h.

8051 - PSW being set to 0X80 after CJNE

I'm pretty new to 8051 and was testing it out. After CJNE executes, it sets PSW to 0x80. Why does it do that? Below is the code. I am using the EdSim51DI simulator.
Any help would greatly be appreciated
The PSW is set to 0x80 because your first operand to the CJNE instruction is less than the second operand. Read on to better understand why.
The Program Status Word (PSW) contains status bits that reflect the current CPU state. The most significant bit (bit 7) in the PSW is the carry bit (C).
Operation: CJNE
Function: Compare and Jump If Not Equal
Syntax: CJNE operand1,operand2,reladdr
The CJNE instruction compares the value of operand1 and operand2 and branches to the indicated relative address if they are not equal. If the two operands are equal, program flow continues with the instruction following the CJNE instruction. This instruction also affects the carry flag in the PSW. The carry bit (C) is set if operand1 is less than operand2, otherwise it is cleared. This functionality allows you to use the CJNE instruction to perform a greater than/less than test for decision making purposes as demonstrated in the example below.
; The following code sample checks if the value in A is equal to, less
; than, or greater than 0x55. The NOP instructions can be replaced
; with code to handle each condition as desired.
CJNE A, #55h, CHK_LESS ; If A is not 0x55, check
LJMP EQUAL ; A is 0x55, so jump to EQUAL code
CHK_LESS: JC IS_LESS ; If carry is set, A is less than 0x55
IS_GREATER: NOP ; A is greater than 0x55
LJMP DONE
IS_LESS: NOP ; A is less than 0x55
LJMP DONE
EQUAL: NOP ; A is equal to 0x55
DONE: NOP ; Done with the comparison

Appending 0 before the hexa number

I have been instructed by my teacher to append 0 before the hexa numbers while writing instructions as some compilers search for 0 before the number in an instruction to differentiate it from a label. I am confused if the instruction already starts with a 0, what should be done in such a case?
For Example,
AND BL, 0FH
Is there a need of adding 0 before that hexa number or not? Please help me out. Thanks
EDIT:
Sorry if I had not been clearer enough before. What I meant was that in the above example, a 0 is already present, do I need to convert it to,
AND BL, 00FH
Except for the special cases like 0 or 1, I tend to encode my hex numbers with the full complement of digits just so it's easier to see what the intent is:
mov al, 09h
mov ax, 0123h
and so on.
For cases where the number starts with an alpha character (like deadbeef), I prefix it with an extra 0.
But no, it's not usually (a) necessary to do this if your hex number already begins with a digit.
In any case, I'd be putting most numbers into an equ statement rather than sprinkling magic numbers throughout the code. Would you rather see:
mov ax, 80
or:
mov ax, lines_per_screen
(a) Of course, it depends on your assembler but, from memory, all the ones I've used work this way.
No, there's no need (and including more than one leading 0 is fairly unusual).
Your example is an apt one though -- without the leading 0 to tell it this was a number, the assembler would normally interpret FH as a symbol rather than a number.

Resources