8051 - PSW being set to 0x80 after CJNE

I'm pretty new to 8051 and was testing it out. After CJNE executes, it sets PSW to 0x80. Why does it do that? Below is the code. I am using the EdSim51DI simulator.
Any help would be greatly appreciated.

The PSW is set to 0x80 because your first operand to the CJNE instruction is less than the second operand. Read on to better understand why.
The Program Status Word (PSW) contains status bits that reflect the current CPU state. The most significant bit (bit 7) of the PSW is the carry bit (C), so when only the carry is set the PSW reads as 0x80.
Operation: CJNE
Function: Compare and Jump If Not Equal
Syntax: CJNE operand1,operand2,reladdr
The CJNE instruction compares the value of operand1 and operand2 and branches to the indicated relative address if they are not equal. If the two operands are equal, program flow continues with the instruction following the CJNE instruction. This instruction also affects the carry flag in the PSW. The carry bit (C) is set if operand1 is less than operand2; otherwise it is cleared. This functionality allows you to use the CJNE instruction to perform a greater-than/less-than test for decision-making purposes, as demonstrated in the example below.
; The following code sample checks if the value in A is equal to, less
; than, or greater than 0x55. The NOP instructions can be replaced
; with code to handle each condition as desired.
CJNE A, #55h, CHK_LESS ; If A is not 0x55, check whether it is less
LJMP EQUAL ; A is 0x55, so jump to EQUAL code
CHK_LESS: JC IS_LESS ; If carry is set, A is less than 0x55
IS_GREATER: NOP ; A is greater than 0x55
LJMP DONE
IS_LESS: NOP ; A is less than 0x55
LJMP DONE
EQUAL: NOP ; A is equal to 0x55
DONE: NOP ; Done with the comparison

Related

RISC-V: how are the branch instructions calculated?

I am trying to understand how a modern CPU works. I am focused on RISC-V. There are a few types of branches:
BEQ
BNE
BLT
BGE
BLTU
BGEU
I use the Venus simulator to test this, and I am also trying to simulate it myself; so far it works, but I cannot understand how the branches are calculated.
From what I have read, the ALU has just one signal output - ZERO (apart from its math output) - which is active whenever the output is zero. But how can I determine whether the branch should be taken based only on the ZERO output? And how are the other conditions calculated?
Example code:
addi t0, zero, 9
addi t1, zero, 10
blt t0, t1, end
end:
Example of branches:
BEQ - subtract 2 numbers, if ZERO is active, branch
BNE - subtract 2 numbers, if ZERO is not active, branch
BLT - and here I am a little bit confused; should I subtract and then look at the sign bit, or what?
BGE / BGEU - and how to differentiate these? What math instructions should I use?
Yes, the ZERO output gives you equal / not-equal. You can also use XOR instead of SUB for equality comparisons if that runs faster (ready earlier in a partial clock cycle) and/or uses less power (fewer transistors switching).
Fun fact: MIPS only has eq / ne and signed-compare-against-zero branch conditions, all of which can be tested fast without carry propagation or any other cascading bits. That mattered because it checked branch conditions in the first half cycle of exec, in time to forward to fetch, keeping branch latency down to 1 cycle which the branch-delay slot hid on classic MIPS pipelines. For other conditions, like blt between two registers, you need slt and branch on that. RISC-V has true hardware instructions for blt between two registers, vs. MIPS's bltz against zero only.
Why use an ALU with only a zero output? That makes it unusable for comparisons other than exact equality.
You need other outputs to determine GT / GE / LE / LT (and their unsigned equivalents) from a subtract result.
For unsigned conditions, all you need is zero and a carry/borrow (unsigned overflow) flag.
The sign bit of the result on its own is not sufficient for signed conditions because signed overflow is possible: (-1) - (-2) = +1, and -1 > -2 (sign bit clear); but with 8-bit wraparound, 0x80 - 0x7F = +1 (sign bit also clear), yet -128 < 127. The sign bit of a number on its own is only useful when comparing against zero.
If you widen the result (by sign-extending the inputs and doing one more bit of add/sub) that makes signed overflow impossible so that 33rd bit is a signed-less-than result directly.
You can also get a signed-less-than result from signed_overflow XOR signbit instead of actually widening + adding. You might also want an ALU output for signed overflow, if RISC-V has any architectural way for software to check for signed-integer overflow.
Signed overflow can be computed by looking at the carry in and carry out of the MSB (the sign bit). If those differ, you have overflow, i.e. OF = XOR of those two carries. See also http://teaching.idallen.com/dat2343/10f/notes/040_overflow.txt for a detailed look at unsigned carry vs. signed overflow with 2-bit and 4-bit examples.
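To make that concrete, here is a minimal C sketch (an illustration, not any particular CPU's implementation) that computes ZF, CF, SF and OF for an 8-bit a - b and derives signed less-than as SF XOR OF:
#include <stdio.h>
#include <stdint.h>

/* Compute the usual ALU flags for an 8-bit subtraction a - b. */
static void sub_flags(uint8_t a, uint8_t b)
{
    uint16_t wide = (uint16_t)a - (uint16_t)b;   /* 9-bit result, bit 8 = borrow */
    uint8_t  res  = (uint8_t)wide;

    int zf = (res == 0);                         /* equal */
    int cf = (wide >> 8) & 1;                    /* borrow: unsigned a < b */
    int sf = (res >> 7) & 1;                     /* sign bit of the result */
    /* Signed overflow: operands had different signs and the result's sign
       differs from the sign of a (equivalently, carry into MSB != carry out). */
    int of = (((a ^ b) & (a ^ res)) >> 7) & 1;
    int slt = sf ^ of;                           /* signed a < b */

    printf("%4d - %4d: ZF=%d CF=%d SF=%d OF=%d  signed_lt=%d unsigned_lt=%d\n",
           (int8_t)a, (int8_t)b, zf, cf, sf, of, slt, cf);
}

int main(void)
{
    sub_flags(0xFF, 0xFE);   /* -1 - (-2): SF=0, OF=0 -> not less      */
    sub_flags(0x80, 0x7F);   /* -128 - 127: SF=0 but OF=1 -> less      */
    sub_flags(1, 2);         /* unsigned borrow: CF=1                  */
    return 0;
}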
In CPUs with a FLAGS register (e.g. x86 and ARM), those ALU outputs actually go into a special register with named bits. You can look at an x86 manual for conditional-jump instructions to see how condition names like l (signed less-than) or b (unsigned below) map to those flags:
signed conditions:
jl (aka RISC-V blt) : Jump if less (SF≠OF). That's output signbit not-equal to Overflow Flag, from a subtract / cmp
jle : Jump if less or equal (ZF=1 or SF≠OF).
jge (aka RISC-V bge) : Jump if greater or equal (SF=OF).
jg (aka RISC-V bgt) : Jump short if greater (ZF=0 and SF=OF).
If you decide to have your ALU just produce a "signed-less-than" output instead of separate SF and OF outputs, that's fine. SF==OF is just !(SF != OF).
(x86 also has some mnemonic synonyms for the same opcode, like jl = jnge. There are "only" 16 FLAGS predicates, including OF=0 alone (test for overflow, not a compare result), and the parity flag. You only care about the actual signed/unsigned compare conditions.)
If you think through some example cases, like testing that INT_MAX > INT_MIN, you'll see why these conditions make sense, as in the 8-bit example I showed above.
unsigned:
jb (aka RISC-V bltu) : Jump if below (CF=1). That's just testing the carry flag.
jae (aka RISC-V bgeu) : Jump short if above or equal (CF=0).
ja (aka RISC-V bgtu) : Jump short if above (CF=0 and ZF=0).
(Note that x86 subtract sets CF = borrow output, so 1 - 2 sets CF=1. Some other ISAs (e.g. ARM) invert the carry flag for subtract. When implementing RISC-V this will all be internal to the CPU, not architecturally visible to software.)
I don't know if RISC-V actually has all of these different branch conditions, but x86 does.
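As an illustration only (not how any real CPU decodes its condition codes), those mappings can be written out in C as predicates over the four flag bits:
#include <stdbool.h>
#include <stdio.h>

/* Flag bits after a CMP (a subtract whose result is discarded). */
struct flags { bool zf, cf, sf, of; };

static bool jl (struct flags f) { return f.sf != f.of; }             /* signed  <  (RISC-V blt)  */
static bool jge(struct flags f) { return f.sf == f.of; }             /* signed  >= (RISC-V bge)  */
static bool jg (struct flags f) { return !f.zf && f.sf == f.of; }    /* signed  >                */
static bool jb (struct flags f) { return f.cf; }                     /* unsigned < (RISC-V bltu) */
static bool jae(struct flags f) { return !f.cf; }                    /* unsigned >= (RISC-V bgeu)*/
static bool ja (struct flags f) { return !f.cf && !f.zf; }           /* unsigned >               */

int main(void)
{
    /* Flags for the 8-bit compare 0x80 - 0x7F (-128 vs 127): ZF=0, CF=0, SF=0, OF=1. */
    struct flags f = { .zf = false, .cf = false, .sf = false, .of = true };
    printf("jl=%d jge=%d jg=%d jb=%d jae=%d ja=%d\n",
           jl(f), jge(f), jg(f), jb(f), jae(f), ja(f));
    return 0;
}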
There might be simpler ways to implement a signed or unsigned comparator than doing subtraction at all.
But if you already have an add/subtract ALU and want to piggyback on that then you might just want it to generate Carry and Signed-less-than outputs as well as Zero.
That way you don't need a separate sign-flag output, or to grab the MSB of the integer result. It's just one extra XOR gate inside the ALU to combine those two things.
You don't have to do subtraction to compare two (signed or unsigned) numbers.
You can use cascaded 7485 chips, for example.
With these chips you can do all of the branch computation without doing any subtraction.
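A rough C model of that idea - compare bit by bit from the most significant end down, which is roughly what a cascaded 7485-style comparator does in hardware - assuming unsigned 8-bit inputs:
#include <stdio.h>
#include <stdint.h>

/* Magnitude comparison without subtraction: decide at the first bit
   (from the MSB down) where the two operands differ. */
static int compare_u8(uint8_t a, uint8_t b)   /* returns -1, 0 or +1 */
{
    for (int bit = 7; bit >= 0; bit--) {
        int abit = (a >> bit) & 1;
        int bbit = (b >> bit) & 1;
        if (abit != bbit)
            return abit ? 1 : -1;
    }
    return 0;   /* all bits equal */
}

int main(void)
{
    printf("%d\n", compare_u8(9, 10));    /* -1: 9 < 10  */
    printf("%d\n", compare_u8(200, 200)); /*  0: equal   */
    printf("%d\n", compare_u8(0x80, 1));  /* +1: 128 > 1 */
    return 0;
}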

Understanding 8086 assembler debugger

I'm learning assembly and I need some help understanding the codes shown in the debugger, especially the marked part.
mov ax, a
mov bx, 4
I know how the above instructions work, but in the debugger I see "2EA10301" and "BB0400".
What do they mean?
The first instruction moves the variable a from the data segment to the ax register, but in the debugger I see cs:[0103].
What do these brackets and numbers mean?
Thanks for any help.
The 2EA10301 and BB0400 numbers are the opcodes for the two instructions highlighted.
2E is Code Segment (CS) prefix and instructs the CPU to access memory with the CS segment instead of the default DS one.
A1 is the opcode for MOV AX, moffs16 and 0301 is the immediate 0103h in little endian, the address to read from.
So 2EA10301 is mov ax, cs:[103h].
The square brackets are the preferred way to denote a memory access through one of the addressing modes, but some assemblers support the confusing syntax without the brackets.
As this syntax is ambiguous and less standardised across assemblers than the bracketed form, it is discouraged.
During the assembling the assembler keeps a location counter incremented for each byte emitted (each "section"/segment has its own counter, i.e. the counter is reset at the beginning of each "section").
This gives each variable an offset that is used to access it and to craft the instruction; variable names are for humans, CPUs can only read from addresses, i.e. numbers.
This offset will later be an address in memory once the file is loaded.
The assembler, the linker and the loader cooperate (there are various tricks at play) to make sure the final instruction is properly formed in memory and that the offset is transformed into the right address.
In your example their efforts culminate in the value 103h, which is the address of a in memory.
Again, in your example the offset happened to be 103h as well, because the file is a COM file (by the way, don't put variables in the execution flow) and COM files have a peculiar structure.
But in general, it could have been another number.
BB is MOV r16, imm16 with the register BX. The base form is B8 with the lower 3 bits indicating the register to use, BX is denoted by a value of 3 (011b in binary) and indeed 0B8h + 3 = 0BBh.
After the opcode comes, again, a WORD immediate - 0400 - which encodes 4 in little endian.
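To make the encoding concrete, here is a small C sketch (purely illustrative) that builds the bytes of mov r16, imm16 as described: register number in the low 3 bits of the opcode, immediate in little-endian order:
#include <stdio.h>
#include <stdint.h>

/* Encode "mov r16, imm16": opcode is B8h + register number,
   followed by the 16-bit immediate in little-endian byte order. */
static void encode_mov_r16_imm16(uint8_t reg, uint16_t imm, uint8_t out[3])
{
    out[0] = 0xB8 + (reg & 7);      /* AX=0, CX=1, DX=2, BX=3, ... */
    out[1] = imm & 0xFF;            /* low byte first              */
    out[2] = (imm >> 8) & 0xFF;     /* then high byte              */
}

int main(void)
{
    uint8_t bytes[3];
    encode_mov_r16_imm16(3, 4, bytes);   /* BX = register 3, immediate 4 */
    printf("%02X %02X %02X\n", bytes[0], bytes[1], bytes[2]);   /* BB 04 00 */
    return 0;
}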
You now are in the position to realise that the assembly source is not always fully informative, as the assemblers implement some form of syntactic sugar.
The instruction mov ax, a is identical to mov bx, 4 in its syntax, and technically means "move the immediate value given by the address of a (a constant known at assembly time) into ax". It is instead interpreted as "move the content of a (a value present in memory and readable only with a memory access) into ax", because a is known to be a variable.
This phenomenon is limited in the x86, being CISC, and more widespread in the RISC world, where the lack of commonly needed instructions is compensated with pseudo-instructions.
Well, first, the language is x86 assembly; the assembler is the program that turns the instructions into machine code.
When you disassemble programs, the debugger will probably show the hex values (for example, 90 is the NOP instruction, and B8 moves an immediate into AX).
Square brackets denote a memory access at the address they enclose (or at the address the register inside points to).
The hex value on the side is the address.
Everything is very simple. The command mov ax, cs:[0103] means that the value 000Ah is loaded into the register ax. This value is taken from the code segment at 0103h. Slightly higher in the pictures you can see this value: cs:0101 0B900A00. Accordingly, address 0101h holds the value 0Bh, 0102h holds 90h, 0103h holds 0Ah, and 0104h holds 00h. So the AL register loads the value from address 0103h, which is 0Ah, and the AH register loads the value from address 0104h, which is 00h, giving ax = 000Ah. If instead of mov ax, cs:[0103] the command were mov ax, cs:[0101], then ax = 900Bh; with mov ax, cs:[0102], ax = 0A90h.
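The little-endian load described there can be sketched in C, using the byte values visible in the question's screenshot:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* The bytes at cs:0101..0104 from the question: 0B 90 0A 00. */
    uint8_t mem[] = { 0x0B, 0x90, 0x0A, 0x00 };

    /* A 16-bit little-endian load from offset 2 (i.e. address 0103h):
       low byte from 0103h, high byte from 0104h. */
    uint16_t ax = mem[2] | (mem[3] << 8);

    printf("AX = %04Xh\n", ax);   /* prints 000Ah */
    return 0;
}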

OF and CF status when multiplying two numbers

When two numbers are multiplied, what happens to the OF (overflow) and CF (carry) flag bits?
After multiplying two 8-bit numbers, OF and CF will be set if an overflow occurs. That's when the result is:
above 255 if you use the MUL instruction
and when the result is
above 127 if you use the IMUL instruction and the operands had the same sign or
below -128 and the operands had two different signs
The high half of the result is stored automatically in AH, so AX will be AL * [the source operand]. AH will be zeroed if no overflow occurs (for IMUL it holds the sign extension of AL).
After multiplying AX by a 16-bit number, OF and CF will be set if the result is:
above 65535 if you use the MUL instruction
and when the result is
above 32767 if you use the IMUL instruction and the operands had the same sign or
below -32768 if you use the IMUL instruction and the operands had two different signs
The high half of the result will be stored automatically in DX (the full result is in DX:AX).
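A small C sketch of the 8-bit case (an illustration of the rule above, not x86 microcode): the flags follow from whether the full 16-bit product still fits in the low byte - a plain truncation check for MUL, a sign-extension check for IMUL:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Unsigned 8-bit multiply (MUL): CF=OF=1 when the product exceeds 255,
       i.e. when the high byte (what MUL puts in AH) is nonzero. */
    uint8_t ua = 20, ub = 13;
    uint16_t uprod = (uint16_t)ua * ub;                 /* 260 -> AX */
    int mul_cf_of = (uprod >> 8) != 0;                  /* AH != 0   */
    printf("MUL:  AX=%04X CF=OF=%d\n", uprod, mul_cf_of);

    /* Signed 8-bit multiply (IMUL): CF=OF=1 when the result is outside
       -128..127, i.e. when AX is not just the sign extension of AL. */
    int8_t  sa = -20, sb = 13;
    int16_t sprod = (int16_t)sa * sb;                   /* -260 -> AX     */
    int imul_cf_of = sprod != (int16_t)(int8_t)sprod;   /* AX != sext(AL) */
    printf("IMUL: AX=%04X CF=OF=%d\n", (uint16_t)sprod, imul_cf_of);

    return 0;
}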

OS development - converting logical block format to Cylinder-Head-Sector

I am referring BrokenThorn's OS development tutorial, and currently reading the part on developing a complete first stage bootloader that loads the second stage - Bootloaders 4.
In the part of converting Logical Block Address (LBA) to Cylinder-Head-Sector (CHS) format, this is the code that is used -
LBACHS:
xor dx, dx ; prepare dx:ax for operation
div WORD [bpbSectorsPerTrack] ; divide by sectors per track
inc dl ; add 1 (absolute sector formula)
mov BYTE [absoluteSector], dl
xor dx, dx ; prepare dx:ax for operation
div WORD [bpbHeadsPerCylinder] ; mod by number of heads (absolute head formula)
mov BYTE [absoluteHead], dl ; everything else was already done from the first formula
mov BYTE [absoluteTrack], al ; not much else to do :)
ret
I am not able to understand the logic behind this conversion. I tried using a few sample values to walk through it and see how it works, but that got me even more confused. Can someone explain how this conversion works and the logic used?
I am guessing that your LBA value is being stored in AX as you are performing division on some value.
As some pre-information for you, the absoluteSector is the CHS sector number, absoluteHead is the CHS head number, and absoluteTrack is the CHS cylinder number. Cylinders and tracks are the exact same thing, just a different name.
Also, the DIV operation for your code in 16-bit will take whatever is in the DX:AX register combination and divide it by some value. The remainder of the division will be in the DX register while the actual result will be in the AX register.
Next, the *X registers are 16-bit registers, where * is one of ABCD. They are made up of a low and high component, both referred to as *H and *L for high and low, respectively. For example, the DX register has DH for the upper 8 bits and DL for the lower 8 bits.
Finally, the BYTE and WORD modifiers simply state the size of the data that will be used/transferred.
The first value you must extract is the sector number, which is obtained by dividing the LBA value by the number of sectors per track. The DL register will then contain the sector number minus one. This is because counting sectors starts at 1, unlike most values, which start at zero. To fix this, we add one to the DL register to get the correct sector value. This value is stored in memory at absoluteSector.
The next value you must extract is the head number, which is obtained by dividing the result of the last DIV operation by the number of heads per cylinder. The DL register will then contain the head number, which we store at absoluteHead.
Finally, we get the track number. With the last division we already obtained the value, which is in the AL register. We then store this value at absoluteTrack.
Hope this cleared things up a little bit.
-Adrian
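For comparison, here is the same conversion as a minimal C sketch; the parameter names sectors_per_track and heads_per_cylinder correspond to the BPB fields used in the assembly above:
#include <stdio.h>

/* LBA -> CHS, mirroring the assembly routine above:
     sector   = (LBA % sectors_per_track) + 1      (sectors count from 1)
     head     = (LBA / sectors_per_track) % heads_per_cylinder
     cylinder = (LBA / sectors_per_track) / heads_per_cylinder */
static void lba_to_chs(unsigned lba, unsigned sectors_per_track,
                       unsigned heads_per_cylinder,
                       unsigned *cyl, unsigned *head, unsigned *sector)
{
    *sector = (lba % sectors_per_track) + 1;
    unsigned temp = lba / sectors_per_track;
    *head = temp % heads_per_cylinder;
    *cyl  = temp / heads_per_cylinder;
}

int main(void)
{
    /* Example with the classic 1.44 MB floppy geometry: 18 sectors/track, 2 heads. */
    unsigned c, h, s;
    lba_to_chs(1, 18, 2, &c, &h, &s);
    printf("LBA 1  -> C=%u H=%u S=%u\n", c, h, s);   /* C=0 H=0 S=2 */
    lba_to_chs(19, 18, 2, &c, &h, &s);
    printf("LBA 19 -> C=%u H=%u S=%u\n", c, h, s);   /* C=0 H=1 S=2 */
    return 0;
}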

Usefulness of LOOPNE

I am unable to understand the usefulness of LOOPNE. Even if LOOPNE were not there and only LOOP were there, it would have done the same thing here. Please help me out.
MOV CX, 80
BACK: MOV AH,1
INT 21H
CMP AL, ' '
LOOPNE BACK
CMP is more or less a SUB instruction that discards the result and only sets flags such as ZF (the zero flag).
LOOPNE has 2 conditions to loop: cx > 0 and ZF = 0
LOOP has 1 condition to loop: cx > 0
So a normal LOOP would go through all characters, whereas LOOPNE will go through all characters or until a space is encountered, whichever comes first.
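To make the difference concrete, here is a C sketch that models what the LOOPNE loop above does (read up to 80 characters, stopping early at the first space); getchar() stands in for the INT 21h keyboard read:
#include <stdio.h>

int main(void)
{
    /* Model of the LOOPNE loop: decrement the count each iteration and
       keep going only while count != 0 AND the last compare was "not equal"
       (i.e. the character read was not a space). */
    int cx = 80;
    int ch;
    do {
        ch = getchar();            /* stands in for INT 21h, AH=1 */
        cx--;
    } while (cx != 0 && ch != ' ');

    /* A plain LOOP would only test cx != 0, so it would always read all
       80 characters regardless of whether a space was typed. */
    printf("stopped with %d iterations left, last char %d\n", cx, ch);
    return 0;
}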
LOOPNE loops when a comparison fails, and when there is a remaining nonzero iteration count (after decrementing it). This is arguably very convenient for finding an element in a linear list of known length.
There is little use for it in modern x86 CPUs.
The LOOPNE instruction is likely implemented internally in the CPU by microinstructions and thus effectively equivalent to JNE/DEC CX/JNE.
Because the CPU designers invest vast amounts of effort to optimize compare/branch/register arithmetic, the equivalent instruction sequence is likely, on a highly pipelined CPU, to execute virtually just as fast. It may actually execute slower; you'll only know by timing it. And the fact that you are confused about what it does makes it a source of coding errors.
I presently code the equivalent instruction sequence, because I got bit by a misunderstanding once. I'm not confused about CMP and JNE.
