Is the carry flag complemented when the subtrahend is converted into 2's complement in the 8085 microprocessor?

I was studying from a web course and found an example in which the subtraction operation was explained. In that example,
A = A5H, B = 9BH
and the operation SUB B was executed.
Since subtraction in the 8085 microprocessor is carried out by converting the subtrahend into its 2's complement and then adding it to the minuend, the answer obtained was A = (0000 1010)2.
A carry is clearly produced by the operation, so the CY flag, i.e., the carry flag, should be SET. But they explained it as follows:
"CY bit seems to be ‘1’. But it is complemented and then
stored. Therefore, CY bit is stored as ‘0’."
I didn't understand why the carry flag has to be complemented. Is it because the subtrahend is converted into 2's complement, or something else?

Indirectly, yes.
In order to subtract with a 'borrow' status result, as the 808x architecture requires, you add the complement of the subtrahend AND complement the carry out from the ALU to get the 'borrow' bit. Thus you complement the carry for effectively the same reason you complemented the subtrahend, but not directly because you did so.
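As a minimal sketch of that mechanism (ordinary C, not 8085 code; the variable names are just illustrative), using the values from the question:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t a = 0xA5, b = 0x9B;

    /* SUB B modelled as A + ~B + 1, the way the ALU actually performs it */
    unsigned sum = (unsigned)a + (uint8_t)~b + 1;

    uint8_t result    = (uint8_t)sum;      /* 0x0A                                    */
    int     carry_out = (sum >> 8) & 1;    /* raw ALU carry out: 1 here               */
    int     cy_flag   = !carry_out;        /* 8085 stores the complement: 0 = no borrow */

    printf("A=%02X CY=%d\n", result, cy_flag);   /* prints A=0A CY=0 */
    return 0;
}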
Some CPUs instead have a 'carry/NOT borrow' status which uses the un-complemented carry logic. See https://en.wikipedia.org/wiki/Carry_flag#Carry_flag_vs._borrow_flag.


Can overflow occur when a positive number is subtracted from a positive number resulting in a negative number?

I am working with the MIPS assembly language but am confused about the overflow aspect of arithmetic here.
Say I am subtracting 25 from 20 and end up with -5. Would this result in an overflow?
I understand that with addition, if you add two positive numbers or two negative numbers and the result has the opposite sign, then there is overflow, but I am lost when it comes to subtraction.
Let's find the examples at the extremes. I'll do it in 8-bit signed to keep it simple, but the same principles hold in 32 bits.
minuend: the smallest possible positive (non-negative) number, 0 and
subtrahend: the largest possible number, 127
When we do 0 - 127, the answer is -127 and that indeed fits in 8 bits signed. There is a borrow. In any processor the effect of the borrow is to propagate 1's throughout the upper bits, making it a negative number of the proper magnitude.
Different processors set flags differently based on this borrow: MIPS doesn't have flags, x86 will set its carry flag to indicate borrow, and other processors will set the flag to indicate carry.
In 8 bit signed numbers:
minuend: the smallest possible positive (non-negative) number, 0 and
subtrahend: the most negative possible number, -128
When we do 0 - (-128), the answer should be 128, but that cannot be represented in 8-bit signed format, so this is the example of overflow. 0 - (-127) = 127 can be represented, so no overflow there.
If we do it in 8 bits unsigned, your example of 20 - 25 = -5, but -5 cannot be represented in an unsigned format, so this is indeed overflow (or modular arithmetic, if you like that).
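A small C sketch of these cases (the helper name sub8 is just for illustration); it reports both the unsigned borrow and the signed overflow for an 8-bit subtraction:

#include <stdint.h>
#include <stdio.h>

/* Subtract two 8-bit values and report borrow (unsigned overflow) and
   two's complement (signed) overflow, the way flag logic would. */
static void sub8(uint8_t a, uint8_t b) {
    uint8_t r = (uint8_t)(a - b);
    int borrow = a < b;                              /* unsigned overflow */
    int ovf    = ((a ^ b) & (a ^ r) & 0x80) != 0;    /* signed overflow   */
    printf("%4d - %4d = %4d   borrow=%d  signed_overflow=%d\n",
           (int8_t)a, (int8_t)b, (int8_t)r, borrow, ovf);
}

int main(void) {
    sub8(0, 127);            /* 0 - 127 = -127: fits, no signed overflow    */
    sub8(0, (uint8_t)-128);  /* 0 - (-128) = "128": signed overflow         */
    sub8(20, 25);            /* 20 - 25 = -5: unsigned borrow, wraps to 251 */
    return 0;
}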
Short answer: Yes, as the 32-bit representation of -5 is FFFFFFFB.
Long answer: Depends on what you mean by "overflow".
There's signed overflow, which is when you cross the 7FFFFFFF-80000000 boundary.
And there's unsigned overflow where you cross the FFFFFFFF-00000000 boundary.
For signed arithmetic, signed overflow is undeniably a bad thing (and is considered undefined behavior in C and other languages). However, unsigned overflow is not necessarily a problem. Usually it is, but many procedures rely on it to work.
For example, imagine you have a "frame timer" variable, i.e. a 32-bit counter variable that increments by 1 during an interrupt service routine. This interrupt is tied to a real-time clock running at 60 hertz, so every 1/60th of a second the variable's value increases by 1.
Now, this variable will overflow eventually. But do we really care? No. It just wraps around back to zero again. For our purposes it's fine, since we don't really need to know exactly how long our program has been running since it began. We probably have events that occur every n ticks of the timer, but we can just use a bitmask for that. Effectively, in this case we're using unsigned overflow to say "if this value equals FFFFFFFF and we're about to add 1 to it, reset it to zero instead," which, thanks to overflow, we can implement without any additional condition checking.
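A minimal C sketch of that idea (the names frame_ticks, timer_isr and the 64-tick period are hypothetical, just to show the pattern):

#include <stdint.h>

static volatile uint32_t frame_ticks = 0;

void timer_isr(void) {          /* imagine this is called 60 times per second */
    frame_ticks++;              /* 0xFFFFFFFF + 1 simply wraps to 0           */
}

int every_64_ticks(void) {
    return (frame_ticks & 63u) == 0;   /* bitmask instead of an extra compare */
}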
The reason I bring this up is so that you understand that overflow is not always a bad thing, if it's the unsigned variety. It depends entirely on what your data is intended to represent (which is something you can't explain even to a C compiler.)

RISC-V: how are the branch instructions calculated?

I am trying to understand how a modern CPU works. I am focused on RISC-V. There are a few types of branches:
BEQ
BNE
BLT
BGE
BLTU
BGEU
I use the Venus simulator to test this, and I am also trying to simulate it myself; so far it works, but I cannot understand how the branches are calculated.
From what I have read, the ALU unit has just one signal output - ZERO (apart from its math output) which is active whenever the output is zero. But just how can I determine if the branch should be taken or not based just on the ZERO output? And how are they calculated?
Example code:
addi t0, zero, 9
addi t1, zero, 10
blt t0, t1, end
end:
Example of branches:
BEQ - subtract 2 numbers, if ZERO is active, branch
BNE - subtract 2 numbers, if ZERO is not active, branch
BLT - and here I am a little bit confused; should I subtract and then look at the sign bit, or what?
BGE / BGEU - and how to differentiate these? What math instructions should I use?
Yes, the ZERO output gives you equal / not-equal. You can also use XOR instead of SUB for equality comparisons if that runs faster (ready earlier in a partial clock cycle) and/or uses less power (fewer transistors switching).
Fun fact: MIPS only has eq / ne and signed-compare-against-zero branch conditions, all of which can be tested fast without carry propagation or any other cascading bits. That mattered because it checked branch conditions in the first half cycle of exec, in time to forward to fetch, keeping branch latency down to 1 cycle which the branch-delay slot hid on classic MIPS pipelines. For other conditions, like blt between two registers, you need slt and branch on that. RISC-V has true hardware instructions for blt between two registers, vs. MIPS's bltz against zero only.
Why use an ALU with only a zero output? That makes it unusable for comparisons other than exact equality.
You need other outputs to determine GT / GE / LE / LT (and their unsigned equivalents) from a subtract result.
For unsigned conditions, all you need is zero and a carry/borrow (unsigned overflow) flag.
The sign bit of the result on its own is not sufficient for signed conditions, because signed overflow is possible: (-1) - (-2) = +1, and -1 > -2 (sign bit clear); but with 8-bit wraparound, 0x80 - 0x7F = +1 (sign bit also clear), yet -128 < 127. The sign bit of a number on its own is only useful when comparing against zero.
If you widen the result (by sign-extending the inputs and doing one more bit of add/sub) that makes signed overflow impossible so that 33rd bit is a signed-less-than result directly.
You can also get a signed-less-than result from signed_overflow XOR signbit instead of actually widening + adding. You might also want an ALU output for signed overflow, if RISC-V has any architectural way for software to check for signed-integer overflow.
Signed overflow can be computed by looking at the carry in and carry out of the MSB (the sign bit). If those differ, you have overflow, i.e. OF = XOR of those two carries. See also http://teaching.idallen.com/dat2343/10f/notes/040_overflow.txt for a detailed look at unsigned carry vs. signed overflow with 2-bit and 4-bit examples.
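Here's a small C model of those ALU outputs for an 8-bit compare done as a - b = a + ~b + 1 (the names are illustrative; CF follows the x86 borrow convention used below):

#include <stdint.h>
#include <stdio.h>

static void alu_sub_flags(uint8_t a, uint8_t b) {
    uint8_t  nb   = (uint8_t)~b;
    unsigned full = (unsigned)a + nb + 1;               /* 9-bit result of a + ~b + 1 */
    uint8_t  r    = (uint8_t)full;
    int ZF = (r == 0);
    int SF = (r >> 7) & 1;                              /* sign bit of the result     */
    int CF = !((full >> 8) & 1);                        /* x86-style: CF = borrow     */
    int carry_into_msb = (((a & 0x7F) + (nb & 0x7F) + 1) >> 7) & 1;
    int carry_out_msb  = (full >> 8) & 1;
    int OF = carry_into_msb ^ carry_out_msb;            /* signed overflow            */
    printf("%4d vs %4d:  ZF=%d SF=%d CF=%d OF=%d  signed_lt=%d\n",
           (int8_t)a, (int8_t)b, ZF, SF, CF, OF, SF ^ OF);
}

int main(void) {
    alu_sub_flags(0xFF, 0xFE);   /* -1 vs -2:    signed_lt = 0         */
    alu_sub_flags(0x80, 0x7F);   /* -128 vs 127: OF = 1, signed_lt = 1 */
    return 0;
}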
In CPUs with a FLAGS register (e.g. x86 and ARM), those ALU outputs actually go into a special register with named bits. You can look at an x86 manual for conditional-jump instructions to see how condition names like l (signed less-than) or b (unsigned below) map to those flags:
signed conditions:
jl (aka RISC-V blt) : Jump if less (SF≠ OF). That's output signbit not-equal to Overflow Flag, from a subtract / cmp
jle : Jump if less or equal (ZF=1 or SF≠ OF).
jge (aka RISC-V bge) : Jump if greater or equal (SF=OF).
jg (aka RISC-V bgt) : Jump short if greater (ZF=0 and SF=OF).
If you decide to have your ALU just produce a "signed-less-than" output instead of separate SF and OF outputs, that's fine. SF==OF is just !(SF != OF).
(x86 also has some mnemonic synonyms for the same opcode, like jl = jnge. There are "only" 16 FLAGS predicates, including OF=0 alone (test for overflow, not a compare result), and the parity flag. You only care about the actual signed/unsigned compare conditions.)
If you think through some example cases, like testing that INT_MAX > INT_MIN you'll see why these conditions make sense, like that example I showed above for 8-bit numbers.
unsigned:
jb (aka RISC-V bltu) : Jump if below (CF=1). That's just testing the carry flag.
jae (aka RISC-V bgeu) : Jump short if above or equal (CF=0).
ja (aka RISC-V bgtu) : Jump short if above (CF=0 and ZF=0).
(Note that x86 subtract sets CF = borrow output, so 1 - 2 sets CF=1. Some other ISAs (e.g. ARM) invert the carry flag for subtract. When implementing RISC-V this will all be internal to the CPU, not architecturally visible to software.)
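As a small C sketch, the mapping from those flag outputs (computed from a - b, with the CF-is-borrow convention) onto the RISC-V branch predicates is simply:

int take_beq (int ZF)         { return ZF;       }
int take_bne (int ZF)         { return !ZF;      }
int take_blt (int SF, int OF) { return SF != OF; }   /* signed less-than     */
int take_bge (int SF, int OF) { return SF == OF; }   /* signed greater-equal */
int take_bltu(int CF)         { return CF;       }   /* unsigned: borrow set */
int take_bgeu(int CF)         { return !CF;      }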
I don't know if RISC-V actually has all of these different branch conditions, but x86 does.
There might be simpler ways to implement a signed or unsigned comparator than doing subtraction at all.
But if you already have an add/subtract ALU and want to piggyback on that then you might just want it to generate Carry and Signed-less-than outputs as well as Zero.
That way you don't need a separate sign-flag output, or to grab the MSB of the integer result. It's just one extra XOR gate inside the ALU to combine those two things.
You don't have to do subtraction to compare two (signed or unsigned) numbers.
You can use cascaded 7485 chips, for example.
With these chips you can compute all the branch conditions without doing any subtraction.
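For illustration, here is a bit-serial C model in the spirit of that cascaded comparator (scan from the MSB down; the first differing bit decides, and no subtraction is involved):

#include <stdint.h>

/* Returns -1 if a < b, 0 if equal, +1 if a > b (unsigned comparison). */
int compare_u8(uint8_t a, uint8_t b) {
    for (int bit = 7; bit >= 0; bit--) {
        int abit = (a >> bit) & 1;
        int bbit = (b >> bit) & 1;
        if (abit != bbit)
            return abit - bbit;   /* first difference, starting at the MSB, decides */
    }
    return 0;                     /* all bits equal */
}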

Theoretically, is comparison between 0 and 255 faster than 0 and 1?

From the point of view of very low-level programming, how is the comparison between two numbers performed?
Using one byte, unsigned numbers 0, 1 and 255 are written:
0 -----> 00000000
1 -----> 00000001
255 ---> 11111111
Now, what happens during the comparison between these numbers?
Thinking about it as a human who has learned basic programming, I could imagine the following algorithm for implementing ==:
b = 0
while b < 8:
    if first_number[b] != second_number[b]:
        return False
    b += 1
return True
Basically this is like comparing each bit step by step, stopping before the end if two bits are different.
Thus we note that the comparison stops at the first iteration when 0 and 255 are compared, while it stops at the last when 0 and 1 are compared.
The first comparison would be 8 times faster than the second.
In practice, I doubt that is the case. But is this theoretically true?
If not, how does the computer work?
A comparison between integers is typically implemented by the CPU as a subtraction, whose result's sign contains information about which number is bigger.
While a naive implementation of subtraction executes one bit at a time (because every bit needs to know the carry of the preceding one), typical implementations use a carry-lookahead circuit that allows the calculation of more result bits at the same time.
So, the answer is: no, every comparison takes almost the same time for every possible input.
Hardware is fundamentally different from the dominant programming paradigms in that all logic gates (or circuits in general) always do their work independently, in parallel, at all times. There is no such thing as "do this, then do that", only "do this here, feed the result into the circuit over there". If there's a circuit on the chip with input A and output B, then the circuit always, continuously, updates B in accordance with the current values of A — regardless of whether the result is needed right now "in the big picture".
Your pseudo code algorithm doesn't even begin to map to logic gates well. Instead, a comparator looks like this in Verilog (ignoring that there's a built-in == operator):
assign not_equal = (a[0] ^ b[0]) | (a[1] ^ b[1]) | ...;
Where each XOR is a separate logic gate and hence works independently from the others. The results are "reduced" with a logical or, i.e. the output is 1 if any of the XORs produces a 1 (this too does some work in parallel, but the critical path is longer than one gate). Furthermore, all these gates exist in silicon regardless of the specific bit values, and the signal has to propagate through about (1 + log w) gates for a w-bit integer. This propagation delay is again independent of the intermediate and final results.
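The same value-independence is easy to see in a C model of that comparator (every bit pair is XORed, then the differences are OR-reduced; there is no early exit):

#include <stdint.h>

int not_equal(uint32_t a, uint32_t b) {
    uint32_t diff = a ^ b;   /* one XOR per bit, all conceptually in parallel */
    return diff != 0;        /* OR-reduction of all the difference bits       */
}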
On some CPU families, equality comparison is implemented by subtracting the two numbers and comparing the result to zero (using a circuit as described above), but the same principle applies. An adder/subtracter doesn't get slower or faster depending on the values.
Not to mention that instructions in a CPU can't take less than one clock cycle anyway, so even if the hardware would finish more quickly, the next instruction still wouldn't start until the next tick.
Now, some hardware operations can take a variable amount of time, but that's because they are state machines, i.e. sequential logic. Technically one could implement the moral equivalent of your algorithm with a state machine, but nobody does that, it's harder to implement than the naive, un-optimized combinatorial circuit above, and less efficient to boot.
State machine circuits are circuits with memory: They store their current state and always compute the outputs (depending on the current state) and the next state (depending on current state and inputs) each clock cycle. On some inputs they may go through N states until they produce an output, and N+x on other inputs. ALU operations generally don't do that though. Pipeline stalls, branch mispredictions, and cache misses are common reasons one instruction takes longer than usual in some circumstances. Properly reasoning about these in a way that helps programmers write faster code is hard though: You have to take into account all the tricks and quirks of real hardware, and there are a lot of those. Empirical evidence, i.e. benchmarking a real black box CPU, is vital.
When it gets down to the assembly, the cmp instruction is used regardless of the contents of the variables.
So there is no performance difference.

I have a few questions about ALUs...

So, I've been trying to learn about computers for the last few months and really learn in detail how they work. I was learning about subtractors recently and I was wondering..
First of all, to my understanding, a subtractor uses two's complement to get a result. But why does it subtract? For example, the two's complement of 5 (0101) is 1011. But that is also a positive eleven. Even though the number gets negated, what makes the subtractor take that as a negative number instead of another positive number? If the problem was 8 - 5, what stops it from doing 8 + 11?
What makes it recognize a signed number from an unsigned number? I've heard the running program decides, but then the question would be what gives the program the ability to decide whether to add or subtract, and how that is communicated to the CPU and ALU.
Also, I've learned that ALUs use one circuit that switches back and forth between addition and subtraction. How does this circuit work? What makes it decide whether to add or subtract?
Lastly, how does this circuit switch from addition to subtraction? The only subtractor I've been shown is an adder with NOT gates attached to it. How does the circuitry differ in something that can change functions?
Subtraction is nothing other than addition with the second operand being a negative number. Two's complement is constructed so that adding a negative number simply works as expected (when ignoring overflow).
The ALU does not need to know if a number is positive or negative, but in two's complement every number with the most significant bit set to 1 is a negative number. The ALU doesn't care, because two's complement is designed so that it all just works.
Now, subtraction is just addition, so we can use the same circuits to carry out both functions. The switch you are talking about negates the second operand (negation is pretty easy in two's complement; there are a few ways to do it) and then adds the numbers.
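A minimal C sketch of that add/subtract switch (a control line sub selects inversion of the second operand plus a carry-in of 1, which together perform the two's complement negation):

#include <stdint.h>
#include <stdio.h>

uint8_t add_sub(uint8_t a, uint8_t b, int sub) {
    uint8_t b_in = sub ? (uint8_t)~b : b;    /* invert B when subtracting      */
    return (uint8_t)(a + b_in + (sub & 1));  /* the carry-in supplies the "+1" */
}

int main(void) {
    printf("8 + 5 = %u\n", add_sub(8, 5, 0));   /* 13 */
    printf("8 - 5 = %u\n", add_sub(8, 5, 1));   /* 3  */
    return 0;
}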

Overflow and Carry flags

Is it possible to add two signed 8-bit numbers together and set both the carry and overflow bits?
Per your comments, your question seems to be "is it possible to have both carry and overflow set for a two's complement add involving signed numbers?" It is. The typical implementation is to take the exclusive-OR of the carry-in to the last adder with the carry-out at the end of the chain -- hence, an overflowing addition of negative numbers will cause both the carry-out bit and the overflow bit to be set.
Here's an example, add -1 to -128:
Carry:  1 0000 0000
          1000 0000  (-128)
        + 1111 1111  (-1)
          ---------
          0111 1111  (oops, this is 127!)
Carry will be set, since the last add resulted in a carry -- and overflow will be set based on the rule above (also, note that -128 added to -1 is obviously not 127)
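A quick C check of that example (carry = the 9th bit of the raw sum; overflow = both addends negative but the truncated result positive):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t a = 0x80, b = 0xFF;                        /* -128 and -1           */
    unsigned full = (unsigned)a + b;                   /* 9-bit sum = 0x17F     */
    uint8_t  r    = (uint8_t)full;                     /* 0x7F = +127 (wrong!)  */
    int carry    = (full >> 8) & 1;                    /* carry out of bit 7: 1 */
    int overflow = (~(a ^ b) & (a ^ r) & 0x80) != 0;   /* signed overflow: 1    */
    printf("result=%d carry=%d overflow=%d\n", (int8_t)r, carry, overflow);
    return 0;
}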
You don't have access to the flags in C; even if you could get the compiler to generate code that sets them, you would have no way to use them.
You can write your own add routine in C that will return carry and overflow flags for signed 8-bit operands. If you're referring to the hardware carry and overflow bits inside the processor, no, that cannot be done portably in C.
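A minimal sketch of such a routine, assuming the caller just wants the values the flags would have (the struct layout and names are illustrative):

#include <stdint.h>

struct add8_result {
    uint8_t sum;
    int carry;      /* unsigned carry out of bit 7        */
    int overflow;   /* two's complement (signed) overflow */
};

struct add8_result add8(uint8_t a, uint8_t b) {
    struct add8_result res;
    unsigned full = (unsigned)a + b;
    res.sum      = (uint8_t)full;
    res.carry    = (full >> 8) & 1;
    res.overflow = (~(a ^ b) & (a ^ res.sum) & 0x80) != 0;
    return res;
}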

Resources