LC-3 Operands that only use registers - lc3

Hi, I had a quick question.
Which of the following LC-3 instructions can only have operands that are in registers (i.e. cannot have immediate or memory operands)?
a. ADD b. NOT c. LD d. AND
From what I can tell, this would be NOT, since ADD and AND can take an immediate value as well as registers, while LD takes a PC offset. Is this correct?

If you look up the NOT instruction for LC-3, you'll see that the instruction takes the format:
opcode destination-reg source-reg
1001 xxx yyy 111111
or:
1001xxxyyy111111
ADD takes the form:
0001xxxyyy000zzz for register values
0001xxxyyy1zzzzz for immediate values
AND takes the form:
0101xxxyyy000zzz for register values
0101xxxyyy1zzzzz for immediate values
LD takes a register to load into and an offset to the memory:
0010xxxyyyyyyyyy
If you count the offset as an operand (it is not a register, and LD uses it to read memory), then the answer is just NOT; if you don't count it, the answer is NOT and LD.
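To make the encodings above concrete, here is a small Python sketch that classifies the operands of the four instructions from the question. The names operand_kinds and register_only are made up for illustration, and treating LD's PC offset as a memory operand is an assumption about how the question means it.

```python
# Opcodes are the top four bits of a 16-bit LC-3 instruction word.
ADD, AND, LD, NOT = 0b0001, 0b0101, 0b0010, 0b1001


def operand_kinds(word):
    """Return the operand kinds ('reg', 'imm', 'mem') used by one instruction word."""
    opcode = (word >> 12) & 0xF
    if opcode in (ADD, AND):
        # Bit 5 selects the second source: 0 = register, 1 = 5-bit immediate.
        return {"reg", "imm"} if (word >> 5) & 1 else {"reg"}
    if opcode == NOT:
        return {"reg"}
    if opcode == LD:
        # Assumption: the 9-bit PC offset counts as a memory operand.
        return {"reg", "mem"}
    raise ValueError(f"opcode {opcode:04b} is not covered by this sketch")


def register_only(opcode):
    """True if no form of this opcode takes an immediate or memory operand."""
    # Only bit 5 changes the operand kinds here, so try both settings of it.
    return all(
        operand_kinds((opcode << 12) | (bit5 << 5)) == {"reg"} for bit5 in (0, 1)
    )


for name, opcode in (("ADD", ADD), ("NOT", NOT), ("LD", LD), ("AND", AND)):
    print(name, register_only(opcode))
```

Under that assumption only NOT comes out as register-only.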

Documentation for MIPS predefined macros

When I compile C code with GCC for MIPS, the output contains code like:
daddiu $28,$28,%lo(%neg(%gp_rel(f)))
And I have trouble understanding instructions starting with %.
I found that they are called macros and predefined macros are dependent on the assembler but I couldn't find description of the macros (as %lo, %neg etc.) in the documentation of gas.
So does there exist any official documentation that explains macros used by GCC when generating MIPS code?
EDIT: The snippet of the code comes from this code.
This is a very odd instruction to find in compiled C code, since this instruction is not just using $28/$gp as a source but also updating that register, which the compiler shouldn't be doing, I would think.  That register is the global data pointer, which is set up on program start, and used by all code accessing near global variables, so it shouldn't ever change once established.  (Share a godbolt.org example, if you would.)
The functions you're referring to are for composing the address of labels that are located in global data.  Unlike x86, MIPS cannot load (or otherwise have) a 32-bit immediate in one instruction, and so it uses multiple instructions to work with 32-bit immediates, including address immediates.  A 32-bit immediate is subdivided into 2 parts: the top 16 bits are loaded using an LUI and the bottom 16 bits using an ADDI (or LW/SW instruction), forming a 2-instruction sequence.
MARS does not support these built-in functions.  Instead, it uses the pseudo instruction, la $reg, label, which is expanded by the assembler into such a sequence.  MARS also allows lw $reg, label to directly access the value of a global variable; however, that also expands to a multiple-instruction sequence (sometimes 3 instructions, of which only 2 are really necessary).
%lo computes the low 16-bits of a 32-bit address for the label of the argument to the "function".  %hi computes the upper 16-bits of same, and would be used with LUI.  Fundamentally, I would look at these "functions" as being a syntax for the assembly author to communicate to the assembler to share certain relocation information/requirements to the linker.  (In reverse, a disassembler may read relocation information and determine usage of %lo or %hi, and reflect that in the disassembly.)
I don't know %neg() or %gp_rel(), though could guess that %neg negates and %gp_rel produces the $28/$gp relative value of the label.
%lo and %hi are a bit odd in that the value of the high immediate is sometimes offset by +1; this is done when the low 16 bits will appear negative.  ADDI and LW/SW will sign extend, which will add -1 to the upper 16 bits loaded via LUI, so %hi offsets its value by +1 to compensate when that happens.  This is part of the linker's operation since it knows the full 32-bit address of the label.
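The +1 compensation is easier to see with numbers. This is only a sketch of the arithmetic in Python, not what any particular assembler or linker actually runs; hi_lo and rebuild are hypothetical helper names.

```python
def sign_extend16(value):
    """Interpret a 16-bit field as a signed number, as ADDI and LW/SW do."""
    return value - 0x10000 if value & 0x8000 else value


def hi_lo(address):
    """Split a 32-bit address into the (%hi, %lo) pair described above."""
    lo = address & 0xFFFF
    # Adding 0x8000 before shifting bumps the high half by 1 exactly when
    # the low half has its top bit set, i.e. when it will look negative.
    hi = ((address + 0x8000) >> 16) & 0xFFFF
    return hi, lo


def rebuild(hi, lo):
    """Mimic LUI followed by an instruction that sign-extends its immediate."""
    return ((hi << 16) + sign_extend16(lo)) & 0xFFFFFFFF


for address in (0x12347FFF, 0x12348000):
    hi, lo = hi_lo(address)
    print(hex(address), hex(hi), hex(lo), hex(rebuild(hi, lo)))
```

For 0x12348000 the low half is 0x8000, which sign-extends to a negative value, so the high half becomes 0x1235 instead of 0x1234 and the sum still lands on the original address.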
That generated code is super weird, and completely different from what the 32-bit version of the same compiler generates.  I added the option -msym32 and then the generated code looks like I would expect.
So, this has something to do with the large(?) memory model on MIPS 64, using a multiple instruction sequence to locate and invoke g, and swapping the $28/$gp register as part of the call.  Register $25/$t9 is somehow also involved as the generated code sources it without defining it; later, prior to where we would expect the call it sets $25.
One thing I particularly don't understand, though, is where is the actual function invocation in that sequence!  I would have expected a jalr instruction, if it's using an indirect branch because it doesn't know where g is (except as data), but there's virtually nothing but loads and stores.
There are two additional oddities in the output: one is the blank line near where the actual invocation should be (maybe those are normal, but usually don't see those inside a function) and the other is a nop that is unnecessary but might have been intended for use in the delay slot following an invocation instruction.

Translating pseudocode into machine code

For academic purposes, I am being asked to translate this statement
assign x the value 5
into a machine code made up by the author of a computer science book, called Brookshear machine code. I am given this hint:
(HINTS: Assume that the value of x is to be stored into main memory location 47.
Your program would begin by loading a value into a register. You do not need to
specify the memory locations of your program. Don't forget to end the program with
the HALT instruction.)
I am wondering if anyone knows the best way to approach this? He makes it clear to end with the halt instruction but I am unsure what exactly I should be doing.
0iii - No-operation
1RXY - Load register R with contents of location XY
2RXY - Load register R with value XY
3RXY - Store contents of register R at location XY
4iRS - Move contents of register R to register S
5RST - Add contents of registers S and T as binary numbers, place result in register R
6RST - Add contents of registers S and T as floating-point numbers, place result in register R
7RST - OR together the contents of registers S and T , place result in register R
8RST - AND together the contents of registers S and T , place result in register R
9RST - XOR together the contents of registers S and T , place result in register R
ARiZ - Rotate the contents of register R one bit to the right, Z times
BRXY - Jump to instruction at XY if contents of register R equal contents of register 0
Ciii - Halt
DRXY - Jump to instruction at XY if contents of register R are greater than contents of register 0
R,S,T - Register numbers
XY - A one-byte address or data value
Z - A half-byte value
i - Ignored when the instruction is de-coded: usually entered as 0
Above is the machine language I am expected to use.
If only there were an instruction:
EABXY - Store value XY at location AB
If that command existed, your program would be:
E4705 # store '05' at address '47'
C000 # halt
But, that instruction doesn't exist -- partly because it takes five half-byte characters, and the instructions are meant to fit into four.
So you're going to have to simulate the 'E' instruction using two steps.
You can't specify a value to put into an address directly.
There is one instruction that lets you specify a value and put it somewhere.
There is one instruction that copies a value from somewhere into an address.
That's really enough clues.
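If you want to check your attempt, a tiny interpreter for part of the table above is enough. This Python sketch is my own and covers only opcodes 1 to 5 and C; jumps are left out because the program is not placed in memory here. The demo program is a different exercise (add 3 and 4, store the sum at location 60), so it doesn't give the answer away.

```python
def run(program):
    """Run a list of 4-hex-digit instruction strings; return (registers, memory)."""
    regs = [0] * 16   # registers 0..F, one byte each
    mem = [0] * 256   # main memory, one byte per cell
    for word in program:
        op, a, b, c = (int(digit, 16) for digit in word)
        xy = (b << 4) | c              # the one-byte XY field
        if op == 0x1:                  # 1RXY: load R from location XY
            regs[a] = mem[xy]
        elif op == 0x2:                # 2RXY: load R with the value XY
            regs[a] = xy
        elif op == 0x3:                # 3RXY: store R at location XY
            mem[xy] = regs[a]
        elif op == 0x4:                # 4iRS: copy register R to register S
            regs[c] = regs[b]
        elif op == 0x5:                # 5RST: R = S + T, kept to one byte
            regs[a] = (regs[b] + regs[c]) & 0xFF
        elif op == 0xC:                # Ciii: halt
            break
        else:
            raise NotImplementedError(f"opcode {op:X} is not in this sketch")
    return regs, mem


regs, mem = run(["2103", "2204", "5312", "3360", "C000"])
print(hex(mem[0x60]))
```

Swap in your own instruction list and look at the memory cell you care about afterwards.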

What is "=qm" in extended assembler

I was looking through an Intel provided reference implementation of RDRAND instruction. The page is Intel Digital Random Number Generator (DRNG) Software Implementation Guide, and the code came from Intel Digital Random Number Generator software code examples.
The following is the relevant portion from Intel. It reads a random value and places it in val, and it sets the carry flag on success.
char rc;
unsigned int val;
__asm__ volatile(
    "rdrand %0 ; setc %1"
    : "=r" (val), "=qm" (rc)
);
// 1 = success, 0 = underflow
if(rc) {
    // use val
    ...
}
Sorry to have to ask. I don't think it was covered in GNU Extended Assembler, and searching for "=qm" is producing spurious hits.
What does the "=qm" mean in the extended assembler?
What you're looking at is an inline assembler constraint. The GCC documentation is at 6.47.3.1 Simple Constraints and 6.47.3.4 Constraints for Particular Machines under x86 family section. This one (=qm) combines three flags which indicate:
=: The operand is write-only - its previous value is not relevant.
q: The operand must be in register a, b, c, or d (it cannot be in esi, for instance).
m: The operand may be placed in memory.
qm here means the operand is a one-byte (8-bit) register or memory location, so =qm is a valid constraint for storing a 1-byte result.
See what setc wants:
http://web.itu.edu.tr/~aydineb/index_files/instr/setc.html
It takes reg8 or mem8.
In 32-bit mode, only the a, b, c and d registers (eax, ebx, ecx, edx), which are the ones q refers to, can be used, because only they have an addressable low byte (al, bl, cl, dl). m means memory, so combining q and m gives reg8 or mem8. That's what I meant.
Wow, that stumped me at first, but I searched around a bit and found out that it is a reference to the model of the processor this piece of code is meant for.
Specifically, I read that it is for the i7 Quadcore.
Is that where you got this code from?
It is a simple value indicator for a variable syntax.

More Null Free Shellcode

I need to find null-free replacements for the following instructions so I can put the following code in shellcode.
The first instruction I need to convert to null-free is:
mov ebx, str ; the string containing /dev/zero
The string str is defined in my .data section.
The second is:
mov eax,0x5a
Thanks!
Assuming what you want to learn is how assembly code is made up, what type of instruction choices ends up in assembly code with specific properties, then (on x86/x64) do the following:
Pick up Intel's instruction set reference manuals (four volumes as of this writing, I think). They contain opcode tables (instruction binary formats), and detailed lists of all allowed opcodes for a specific assembly mnemonic (instruction name).
Familiarize yourself with those and mentally divide them into two groups - those that match your expected properties (like, not containing the 'x' character ... or any other specific one), and those that don't. The 2nd category you need to eliminate from your code if they're present.
Compile your code telling the compiler not to discard compile intermediates: gcc -save-temps -c csource.c
Disassemble the object file: objdump -d csource.o
The disassembly output from objdump will contain the binary instructions (opcodes) as well as the instruction names (mnemonics), i.e. you'll see exactly which opcode format was chosen. You can now check whether any opcodes in there are from the 2nd set as per 1. above.
The creative bit of the work comes in now. When you've found an instruction in the disassembly output that doesn't match the expectations/requirements you have, look up / create a substitute (or, more often, a substitute sequence of several instructions) that gives the same end result but is only made up from instructions that do match what you need.
Go back to the compile intermediates from above, find the csource.s assembly, make changes, reassemble/relink, test.
If you want to make your assembly code standalone (i.e. not using system runtime libraries / making system calls directly), consult documentation on your operating system internals (how to make syscalls), and/or disassemble the runtime libraries that ordinarily do so on your behalf, to learn how it's done.
Since 5. is definitely homework, of the same sort like create a C for() loop equivalent to a given while() loop, don't expect too much help there. The instruction set reference manuals and experiments with the (dis)assembler are what you need here.
Additionally, if you're studying, attend lessons on how compilers work / how to write compilers - they do cover how assembly instruction selection is done by compilers, and I can well imagine it to be an interesting / challenging term project to e.g. write a compiler whose output is guaranteed to contain the character '?' (0x3f) but never '!' (0x21). You get the idea.
You mention the constant load via xor to clear plus inc and shl to get any set of bits you want.
The least fragile way I can think of to load an unknown constant (your unknown str) is to load the constant XORed with some value like 0xAAAAAAAA and then XOR that back out in a subsequent instruction. For example, to load 0x1234:
0: bb 9e b8 aa aa mov $0xaaaab89e,%ebx
5: 81 f3 aa aa aa aa xor $0xaaaaaaaa,%ebx
You could even choose the 0xAAAAAAAA to be some interesting ascii!
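Here is a rough Python sketch of that trick: it picks a mask so that neither the mask nor the masked constant contains a zero byte, then emits the two instructions as raw bytes. It assumes 32-bit x86 and uses the immediate forms mov ebx, imm32 (opcode BB) and xor ebx, imm32 (opcode 81 F3); the helper names are made up. For the small constant 0x5a, one common null-free replacement is xor eax,eax followed by mov al,0x5a, shown at the end.

```python
import struct


def null_free_mask(value):
    """Pick a 32-bit mask so the mask and value ^ mask both have no zero byte."""
    mask = bytearray()
    for byte in struct.pack("<I", value):
        # A mask byte fails only if it is 0 or equal to the value byte.
        mask.append(0x55 if byte == 0xAA else 0xAA)
    return struct.unpack("<I", bytes(mask))[0]


def load_ebx(value):
    """Machine code for: mov ebx, value ^ mask ; xor ebx, mask."""
    mask = null_free_mask(value)
    return (
        b"\xbb" + struct.pack("<I", value ^ mask)    # mov ebx, imm32
        + b"\x81\xf3" + struct.pack("<I", mask)      # xor ebx, imm32
    )


# mov eax, 0x5a assembles to B8 5A 00 00 00, which has three zero bytes.
MOV_EAX_5A = bytes.fromhex("b85a000000")
# xor eax, eax ; mov al, 0x5a leaves the same value in eax with no zero bytes.
XOR_THEN_MOV_AL = bytes.fromhex("31c0b05a")

print(load_ebx(0x1234).hex())
print(0 in MOV_EAX_5A, 0 in XOR_THEN_MOV_AL)
```

The real address of str is only known after linking, so in practice you would run something like load_ebx on the final address, or avoid absolute addresses altogether.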

(8051) Check if a single bit is set

I'm writing a program for an 8051 microcontroller. In the first part of the program I do some calculations and based on the result, I either light the LED or not (using CLR P1.7, where P1.7 is the port the LED is attached to in the microcontroller).
In the next part of the program I want to retrieve the bit, perhaps store it somewhere, and use it in an if-jump instruction like JB. How can I do that?
Also, I've seen the instruction MOV C, P1.7 in a code sample. What's the C here?
The C here is the 8051's carry flag - called that because it can be used to hold the "carry" when doing addition operations on multiple bytes.
It can also be used as a single-bit register - so (as here) where you want to move bits around, you can load it with a port value (such as P1.7) then store it somewhere else, for example:
MOV C, P1.7
MOV <bit-address>, C
Then later you can branch on it using:
JB <bit-address>, <label>
Some of the special function registers are also bit addressable. I believe it's all the ones whose address ends in 0 or 8. I don't have a reference in front of me, but you can do something like setb acc.1. That way, if you need the carry for something, you don't have to worry about pushing it and using up space on your stack.
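As a rough illustration of which locations can stand in for <bit-address>, here is a Python sketch of the standard 8051 bit-address mapping: internal RAM bytes 20h to 2Fh, plus special function registers whose address is a multiple of 8. The function name bit_address is made up for the example.

```python
def bit_address(byte_addr, bit):
    """Return the 8051 bit address for bit 0..7 of a bit-addressable byte."""
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in 0..7")
    if 0x20 <= byte_addr <= 0x2F:
        # Bit-addressable internal RAM maps to bit addresses 00h..7Fh.
        return (byte_addr - 0x20) * 8 + bit
    if 0x80 <= byte_addr <= 0xFF and byte_addr % 8 == 0:
        # SFRs at addresses ending in 0 or 8 map to 80h..FFh.
        return byte_addr + bit
    raise ValueError(f"byte {byte_addr:#04x} is not bit-addressable")


P1, ACC = 0x90, 0xE0                  # standard 8051 SFR addresses
print(hex(bit_address(P1, 7)))        # the bit behind P1.7
print(hex(bit_address(0x20, 0)))      # first free bit in internal RAM
```

A bit in the 20h to 2Fh range is a convenient place to park the LED state so that the carry flag stays free for arithmetic.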
