How to see the local variable in DDC-I debugger? - debugging

I am trying to see the index value of for loop in DDC-I debugger and it always shows me ERROR.
With the assembly of the same, it shows the following instruction:
cmp cr7,0,r20,r23
so it's comparing r20 and r23 but both of these registers don't hold the index value. I am not sure what is cr7 ?

In short, most embedded tool chains (including the ones you pay for) are horrible about reconstructing local/automatic variables in even lightly optimized code. A lot of them simply can't reconstruct variables that never have storage because they live in registers the whole time (loop index variables like the one you can't see are typical cases). Some even have issues with interim computation holders, and arguments (since they're almost always passed as registers).
Typical strategies might be:
Temporarily turning off optimizations around the code in question
Temporarily moving the variable in question to the global scope
Becoming proficient at reading disassembly.
This isn't a terribly practical answer, but it is surprising for a lot of people that are new to the embedded world or never had the luxury of a source level debugger on their embedded platform.

On PowerPC there are eight CR fields, cr0 to cr7. If you don't specify a CR field for a compare result the default is cr0, but in this case cr7 is specified and so the flags in field cr7 will indicate the result of the compare operation. There are 4 condition code bits in each CR field: lt, gt, eq and so. Typically the compare will be followed by a conditional branch, bc.
There is some useful info in this IBM developerWorks article: Assembly language for Power Architecture, Part 3: Programming with the PowerPC branch processor.

Related

Automatically LTORGing every 510 instructions

I have a beefy section of code that makes significant use of the constant pool. I am aware that by doing this, I must LTORG at least every 511 instructions (PC relative addressing is limited to 4k, instructions are 4 byte, and addressing is signed so absolute distance is one less than half) to ensure the constant pools are close enough to their use.
I could, of course, keep track of this myself, but this is manual and a bit of a pain (especially in the presence of macros). Are there any special features of gcc/gas (or macro tricks, etc.) that will automatically LTORG every 511 instructions for me? Ideally I'd like it to insert:
b lxxx
.ltorg
lxxx:
(Where lxxx is some unique label)
Bonus points if it only LTORGs after 510 instructions following the last instruction that uses an expression literal (that isn't in a constant pool). As an example, if there are 1024 sequential instructions that don't use expression literals, it shouldn't place an LTORG after them. But if immediately after that there is one instruction that uses an expression literal, then it will LTORG after 510 instructions (and after that the counter reset until the next instruction that uses an expression literal is reached).
Bonus bonus points if the above but it doesn't reset the counter unless the constant pool is actually used (ie. =1 doesn't use it, so it doesn't reset the counter, but =1234567689 does so an instruction using that expression literal would start/continue the countdown from 510).
edit: My initial thinking is if you wrapped every instruction in a macro that subtracts one from a global arithmetic variable (starting at 510). Once it reaches 0, an LTORG is omitted (it's unclear if you can get the assembler to give you a unique label to branch around the LTORG). You can get the "bonus points" with this solution by special casing the wrapper for LDR such that only it can restart the counter if it's negative. If there was a way of inspecting expression literals (to see if they need the constant pool), then you could special case the reset to only apply when the constant pool is used.
edit: edit: It's unclear if you can do what I've proposed generically, because label-expressions may be resolved at link time (after the assembler has run all the macros). But, I'd still be interested in a solution that works for just expressions (eg. ldr r0, =12345678).

How Does Assembly Code Generation Work?

I've been studying compiler design a lot recently. I've managed to get a strong grasp of the parsing stage, but am having a bit of trouble understanding how code generation works.
From what I've read, there seems to be 3 major steps in the code generation phase:
Instruction Selection (Greedy Tiling)
Instruction Scheduling
Register Allocation
Now, instruction scheduling is a little beyond what I'm trying to do at the moment, and I think with a bit more studying and prototyping, I can probably wrap my mind around the graph coloring algorithm for register allocation.
What stumps me is the first step, instruction selection. From what I've read about it, each instruction in a target machine language is represented by a tile; and the goal is to find the instructions that match the largest parts of the tree (hence the nickname, greedy tiling).
The thing I'm confused about is, how do you select instructions when they don't actually correspond 1:1 with the syntax tree?
Take for example, accumulator-based architectures like the Z80 or MIPs single instruction architecture. Performing even 16-bit integer arithmetic on a Z80 may require the use of the accumulator or shadow registers.
There are also some instructions that can only be used on certain registers despite them being general purpose.
Would I be right in assuming the following?
a) A tile may consist of a sequence of instructions that match a syntax tree pattern, rather than just a 1:1 match.
b) The code generator generates code for a stack-based architecture (or an architecture with infinite temporary registers) first and expands and substitutes instructions as necessary somehow during the register allocation phase.
a) A tile can emit arbitrary number of instructions. For example, if you have instruction like %x <- %y + %z, but the target machine has only two-address instructions, then a matching tile might emit the assembly sequence (destination is first operand)
mov %x, %y
add %x, %z
b) what kind of register (or const, or mem reference) is allowed as an operand to an instruction is determined by the instruction itself, hence the instruction selection phase has to work on representation with symbolic register names (pseudo registers). Register allocation phase may indeed emit addition instructions, e.g. spill/load code when a register of a required class is not available for allocation.
Check this
Survey on Instruction Selection: an Extensive and Modern Literature Review

In a Warren's Abstract Machine, how does bind work, if one of the arguments is a register?

I'm trying to create my own WAM implementation and I'm stuck at the exercise 2.4
I can't understand how to execute instruction unify_value X4 in figure 2.4.
As far as I understand, this instruction should unify Y from the program with f(W) from the query.
unify_value X4 calls unify (X4,S) where S=2 (see Figure 2.1) and a corresponding heap cell is "REF 2", and X4 is "STR 5".
Unify (Figure 2.7) should bind those values, but I do not understand how to deref a register.
"REF 2" is in the heap, "STR 5" is in a register. How do you bind something to a register?
We are talking about Warren's "New" Engine, WAM and not the Old Engine, known as PLM.
In the WAM variables are allocated in two places.
the local stack (environment stack)
the heap
Registers cannot hold variables. However, they may hold references to variables. Note that references from the heap only point into the heap.
Much related to your question is the pretty ingenious way how the WAM maintains this order and at the same time has very cheap last-call optimization. At the point in time of a (determinate) last call, the local variables that are arguments of the last call must be moved somehow. In more traditional Prolog machines like the ZIP this is an extremely laborious undertaking which essentially requires to scan the environment frame for variables still sitting in them.
The WAM however has a much better calling convention: Most variables are already in a safe place, which can be trivially analyzed during compilation. The very few remaining need an explicit PUT_UNSAFE instruction where the value is checked, and should it still be a local variable that variable is transferred onto the heap.
Consider what is a safe variable in the WAM:
All variables occurring in the head
All variables that appear as an argument of a structure
Thus only variables that appear first in a goal and in the last goal and that do not appear in some structure must have a PUT_UNSAFE. That is not that much. Further, the dynamic check may reduce the actual copying onto the heap to a minimum.
At first this PUT_UNSAFE looks like a lot of work, but never forget that the WAM permits to remove many PUTs, while the ZIP has to execute at least one instruction for each argument.
Here is a tiny typical example using GNU:
a --> b, c.
expanded to:
a(S0,S) :- b(S0,S1), c(S1,S).
and compiled using the command pl2wam to:
predicate(a/2,1,static,private,monofile,global,[
allocate(2),
get_variable(y(0),1), % S
put_variable(y(1),1), % S1
call(b/2),
put_unsafe_value(y(1),0), % S1
put_value(y(0),1), % S
deallocate,
execute(c/2)]).

tutorials on first pass and second pass of assembler

Are there good tutorials around that explain about the first and second pass of assembler along with their algorithms ? I searched a lot about them but haven't got satisfying results.
Please link the tutorials if any.
Dont know of any tutorials, there really isnt much to it.
one:
inc r0
cmp r0,0
jnz one
call fun
add r0,7
jmp more_fun
fun:
mov r1,r0
ret
more_fun:
The assembler/software, like a human is going to read the source file from top to bottom, byte 0 in the file to the end. there are no hard and fast rules as to what you complete in each pass, and it is not necessarily a pass "on the file" but a pass "on the data".
First pass:
As you read each line you parse it. You are building some sort of data structure that has the instructions in file order. When you come across a label like one:, you keep track of what instruction that was in front of or perhaps you have a marker between instructions however you choose to implement it. When you come across an instruction that uses a label you have two choices, you can right now go look for that label, and if it is a backwards looking label then you should have seen it already like the jnz one instruction. IF you have thus far been keeping track of the number and size (if variable word length) instructions you can choose to encode this instruction now if it is a relative instruction, if the instruction set uses absolute you might have to just leave a placeholder anyway.
Now the call fun and jump more_fun instructions pose a problem, when you get to these instructions you cannot resolve them at this time, you dont know if these labels are local to this file or are in another file, so you cannot encode this instruction on the first pass, you have to save it for later, and this is the reason for the second pass.
The second pass is likely to be a pass across your data structures and not actually on the file, and this is heavily implementation specific. For example you might have a one dimensional array of structures and everything is in there. You may choose to make many passes on that data for example, start one index through the array looking for unresolved labels. When you find an unresolved label, send a second index through the array looking for a label definition. If you dont find it then, application specific, does your assembler create objects to be linked later or does it create a binary does it have to have everything resolved in this one assembly to binary step? If object then you assume this is external, unless application specific, your assembler requires external labels to be defined as external. So whether or not the missing label is an error is application specific. if it is not an error then, application specific, you should encode for the longest/farthest type of branch leaving the address or distance details for the linker to fill in.
For the labels you have found you now have a rough idea on how far. Now, depending on the instruction set and/or features of your assembler, you need to make several more passes on the data. You need to start encoding the instructions, assuming you have at least one flavor of relative distance call or branch instruction, you have to decide on the first encoding pass whether to hope for the, what i assume, is a shorter/smaller instruction for the relative distance branch or assume the larger one. You cant really determine if the smaller one will reach until you get one or a few encoding passes across the instructions.
top:
...
jmp down
...
jnz top
...
down:
As you encode the jmp down, you might choose optimistically to encode it as a smaller (number of bytes/words if variable word length) relative branch leaving the distance to be determined. When you get to the jnz top, lets say it is exactly to the byte just close enough to top to encode using a relative branch. On the second pass though you have to go back and finish the jmp down you find that it wont reach, you need more bytes/words to encode it as a long branch. Now the jnz top has to become a far branch as well (causing down to move again). You have to keep passing over the instructions, computing their distance far/short until you make pass with no changes. Be careful not to get caught in an infinite loop, where one pass you get to shorten an instruction, but that causes another to lengthen, and on the next pass the lengthen one causes the other to lengthen but the second to shorten and this repeats forever.
We could go back to the top of this and in your first pass you might build more than one or several data structures, maybe as you go you build a list of found labels, and a list of missing labels. And the second pass you look through the list of missing and see if they are in the found then resolve them that way. Or maybe on the first pass, and some might argue this is a single pass assembler, when you find a label, before continuing through the file you look back to see if anyone was looking for that label (or if that label had already been defined to declare an error) I would call this a multi pass assembler because it still passes through the data many times.
And now lets make it much worse. Look at the arm instruction set as an example and any other fixed length instruction set. Your relative branches are usually encoded in one instruction, thus fixed length instruction set. A far branch normally involves a load pc from the data found at this address, meaning you really need two items the instruction, then somewhere within the relative reach of that instruction a data word containing the absolute address of where to branch. You can choose to force the user to create these, but with the ARM assemblers for example they can and will do this for you, the simplest example is:
ldr r0,=0x12345678
...
b somewhere
That syntax means load r0 with the value 0x12345678, which does not fit in an arm instruction. What the assembler does with that syntax is it tries to find a dead spot in the code within reach of that instruction where it can place the data value, then it encodes that instruction as a load from pc relative address. For example after an unconditional branch is a good place to hide data. sometimes you have to use directives like .pool to encourage or remind the assembler good places to stick this data. r0 is not the program counter r15 is and you could use r15 there to connect this to the branching discussion above.
Take a look at the assembler I created for this project http://github.com/dwelch67/lsasim, a fixed length instruction set, but I force the user to allocate the word and load from it, I dont allow the shortcut the arm assemblers tend to allow.
I hope this helps explain things. The bottom line is that you cannot resolve lables in one linear pass through the data, you have to go back and connect the dots to the forward referenced labels. And I argue you have to do many passes anyway to resolve all of the long/short encodings (unless the instruction set/syntax forces the user to explicitly specify an absolute vs relative branch, and some do rjmp vs jmp or rjmp vs ljmp, rcall vs call, etc). Making one pass on the "file" sure, not a problem. If you allow include type directives some tools will create a temporary file where it pulls all the includes in creating a single file which has no includes in it, and then the tool makes one pass on this (this is how gcc manages includes for example, save intermediate files sometime and see what files are produced)(if you report line numbers with warnings/errors then you have to manage the temp file lines vs the original file name and line.).
A good place to start is David Solomon's book, Assemblers and Loaders. It's an older book, but the information is still relevant.
You can download a PDF of the book.

Win32 EXCEPTION_INT_OVERFLOW vs EXCEPTION_INT_DIVIDE_BY_ZERO

I have a question about the EXCEPTION_INT_OVERFLOW and EXCEPTION_INT_DIVIDE_BY_ZERO exceptions.
Windows will trap the #DE errors generated by the IDIV instruction and will end up generating and SEH exception with one of those 2 codes.
The question I have is how does it differentiate between the two conditions? The information about idiv in the Intel manual indicates that it will generate #DE in both the "divide by zero" and "underflow cases".
I took a quick look at the section on the #DE error in Volume 3 of the intel manual, and the best I could gather is that the OS must be decoding the DIV instruction, loading the divisor argument, and then comparing it to zero.
That seems a little crazy to me though. Why would the chip designers not use a flag of some sort to differentiate between the 2 causes of the error? I feel like I must be missing something.
Does anyone know for sure how the OS differentiates between the 2 different causes of failure?
Your assumptions appear to be correct. The only information available on #DE is CS and EIP, which gives the instruction. Since the two status codes are different, the OS must be decoding the instruction to determine which.
I'd also suggest that the chip makers don't really need two separate interrupts for this case, since anything divided by zero is infinity, which is too big to fit into your destination register.
As for "knowing for sure" how it differentiates, all of those who do know are probably not allowed to reveal it, either to prevent people exploiting it (not entirely sure how, but jumping into kernel mode is a good place to start looking to exploit) or making assumptions based on an implementation detail that may change without notice.
Edit: Having played with kd I can at least say that on the particular version of Windows XP (32-bit) I had access to (and the processor it was running on) the nt!Ki386CheckDivideByZeroTrap interrupt handler appears to decode the ModRM value of the instruction to determine whether to return STATUS_INTEGER_DIVIDE_BY_ZERO or STATUS_INTEGER_OVERFLOW.
(Obviously this is original research, is not guaranteed by anyone anywhere, and also happens to match the deductions that can be made based on Intel's manuals.)
Zooba's answer summarizes the Windows parses the instruction to find out what to raise.
But you cannot rely on that the routine correctly chooses the code.
I observed the following on 64 bit Windows 7 with 64 bit DIV instructions:
If the operand (divisor) is a memory operand it always raises EXCEPTION_INT_DIVIDE_BY_ZERO, regardless of the argument value.
If the operand is a register and the lower dword is zero it raises EXCEPTION_INT_DIVIDE_BY_ZERO regardless if the upper half isn't zero.
Took me a day to find this out... Hope this helps.

Resources