I have a MIPS64 binary (readelf tells me it's release 2), and using a corresponding objdump I can see that the first instruction of __start is:
1200009a0: 03e00025 move zero,ra
I do not understand this. Looking at the ISA[note], the opcode (first six bits) is 000000₂, corresponding to the "special" block with function 100101₂ (last six bits): the or instruction (ref. pg. 413). In any case, we see that move is not an instruction anyway (ref. §3.2).
However, I notice that some other instructions present in the file exist and are encoded correctly.
What's going on? Is this an error in objdump or something? How do I resolve it?
[note]Apparently MIPS64 comes in six revisions. Revisions 1–5 are mostly compatible, while release 6 changes many things. I wasn't able to find a release 2 specification, so I linked revision 5. move doesn't occur at least in releases 1, 5, or 6, which is all the specifications I could find.
Related
When I compile a C code using GCC to MIPS, it contains code like:
daddiu $28,$28,%lo(%neg(%gp_rel(f)))
And I have trouble understanding instructions starting with %.
I found that they are called macros and predefined macros are dependent on the assembler but I couldn't find description of the macros (as %lo, %neg etc.) in the documentation of gas.
So does there exist any official documentation that explains macros used by GCC when generating MIPS code?
EDIT: The snippet of the code comes from this code.
This is a very odd instruction to find in compiled C code, since this instruction is not just using $28/$gp as a source but also updating that register, which the compiler shouldn't be doing, I would think. That register is the global data pointer, which is setup on program start, and used by all code accessing near global variables, so it shouldn't ever change once established. (Share a godbolt.org example, if you would.)
The functions you're referring to are for composing the address of labels that are located in global data. Unlike x86, MIPS cannot load (or otherwise have) a 32-bit immediate in one instruction, and so it uses multiple instructions to do work with 32-bit immediates including address immediates. A 32-bit immediate is subdivided into 2 parts — the top 16-bits are loaded using an LUI and the bottom 16-bits using an ADDI (or LW/SW instruction), forming a 2 instruction sequence.
MARS does not support these built-in functions. Instead, it uses the pseudo instruction, la $reg, label, which is expanded by the assembler into such a sequence. MARS also allows lw $reg, label to directly access the value of a global variable, however, that also expands to multiple instruction sequence (sometimes 3 instructions of which only 2 are really necessary..).
%lo computes the low 16-bits of a 32-bit address for the label of the argument to the "function". %hi computes the upper 16-bits of same, and would be used with LUI. Fundamentally, I would look at these "functions" as being a syntax for the assembly author to communicate to the assembler to share certain relocation information/requirements to the linker. (In reverse, a disassembler may read relocation information and determine usage of %lo or %hi, and reflect that in the disassembly.)
I don't know %neg() or %gp_rel(), though could guess that %neg negates and %gp_rel produces the $28/$gp relative value of the label.
%lo and %hi are a bit odd in that the value of the high immediate sometimes is offset by +1 — this is done when the low 16-bits will appear negative. ADDI and LW/SW will sign extend, which will add -1 to the upper 16-bits loaded via LUI, so %hi offsets its value by +1 to compensate when that happens. This is part of the linker's operation since it knows the full 32-bit address of the label.
That generated code is super weird, and completely different from that generated by the same compiler, but 32-bit version. I added the option -msym32 and then the generated code looks like I would expect.
So, this has something to do with the large(?) memory model on MIPS 64, using a multiple instruction sequence to locate and invoke g, and swapping the $28/$gp register as part of the call. Register $25/$t9 is somehow also involved as the generated code sources it without defining it; later, prior to where we would expect the call it sets $25.
One thing I particularly don't understand, though, is where is the actual function invocation in that sequence! I would have expected a jalr instruction, if it's using an indirect branch because it doesn't know where g is (except as data), but there's virtually nothing but loads and stores.
There are two additional oddities in the output: one is the blank line near where the actual invocation should be (maybe those are normal, but usually don't see those inside a function) and the other is a nop that is unnecessary but might have been intended for use in the delay slot following an invocation instruction.
I'm trying to build bootable code for an ARM M7-based embedded system that is able to execute in place at two different locations in the QSPI, so that if one version gets corrupted, the backup version of the image can be executed in a different place.
Compiling with -fpic seems to produce a relocatable code image that is (nearly) able to execute in both places fine. However, the problem is that the data/bss the code refers to is also getting offset by the same amount - that is, the compiler is assuming that the .data and .bss segments live immediately after the .text segment, which isn't true for XIP embedded systems (where the RAM is separate).
As a result, if the original binary was linked to run at 0x60000000 (and using a fixed ram area at 0x20000000) but is then executed in place at 0x60100000 instead , the ram addresses will be shifted by 0x100000 as well (i.e. to 0x20100000), which isn't what I want at all.
Clearly, what I'd like to do is to modify gcc's behaviour so that references to the code (executing in place in two different places in the QSPI) are position-independent, while references to the .data/bss segments (in a fixed position in RAM) are position-dependent (as per normal).
Is this something that gcc can be tweaked to achieve (e.g. by some obscure linker attribute flag)? Or is this just out of its reach? Thanks!
I'm using a PIN based simulator to test some new architectural modifications. I need to test a "new" instruction with two operands (a register and a memory location) using my simulator.
Since it's tedious to use GCC Machine description to add only one instructions it seemed logical to use NOPs or Undefined Instructions. PIN would easily be able to detect a NOP instruction using INS_IsNop, but it would interfere with NOPs added naturally to the code, It also has either no operands or a single memory operand.
The only option left is to use and undefined instruction. undefined instructions would never interfere with the rest of the code, and can be detected by PIN using INS_IsInvalid.
The problem is I don't know how to add an undefined instruction (with operands) using GCC inline assembly. How do I do that?
So it turns out that x86 has an explicit "unknown instruction" (see this). gcc can produce this by simply using:
asm("ud2");
As for an undefined instruction with operands, I'm not sure what that would mean. Once you have an undefined opcode, the additional bytes are all undefined.
But maybe you can get what you want with something like:
asm(".byte 0x0f, 0x0b");
Try using a prefix that doesn't normally apply to an instruction. e.g.
rep add eax, [rsi + rax*4 - 15]
will assemble just fine. Some instruction set extensions are done this way. e.g. lzcnt is encoded as rep bsf, so it executes as bsf on older CPUs, rather than generating an illegal instruction exception. (Prefixes that don't apply are ignored, as required by the x86 ISA.)
This will let you take advantage of the assembler's ability to encode instruction operands, which as David Wohlferd notes in his answer, is a problem if you use ud2.
clang / gcc : Some inline assembly operands can be satisfied with multiple constraints, e.g., "rm", when an operand can be satisfied with a register or memory location. As an example, the 64 x 64 = 128 bit multiply:
__asm__ ("mulq %q3" : "=a" (rl), "=d" (rh) : "%0" (x), "rm" (y) : "cc")
The generated code appears to choose a memory constraint for argument 3, which would be fine if we were register starved, to avoid a spill. Obviously there's less register pressure on x86-64 than on IA32. However, the assembly snippet generated (by clang) is:
movq %rcx, -8(%rbp)
## InlineAsm Start
mulq -8(%rbp)
## InlineAsm End
Choosing a memory constraint is clearly pointless! Changing the constraint to: "r" (y), however (forcing a register) we get:
## InlineAsm Start
mulq %rcx
## InlineAsm End
as expected. These results are for clang / LLVM 3.2 (current Xcode release). The first question: Why would clang select the less efficient constraint in this case?
Secondly, there is the less widely used, comma-separated, multiple alternative constraint syntax:
"r,m" (y), which should evaluate the costs of each alternative, and choose the one that results in less copying. This appears to work, but clang simply chooses the first - as evidenced by: "m,r" (y)
I could simply drop the "m" alternative constraints, but this doesn't express the range of possible legal operands. This brings me to the second question: Have these issues been resolved or at least acknowledged in 3.3? I've tried looking through LLVM dev archives, but I'd rather solicit some answers before unnecessarily restricting constraints further, or joining project discussions, etc.
I had a response on the cfe-dev (clang front end developers' list) from one of the developers:
LLVM currently always spills "rm" constraints in order to simplify the
handling of inline asm in the backend (you can ask on llvmdev if you want
details). I don't know of any plans to fix this in the near future.
So it's clearly a 'known' issue. One of the goals of clang is to correctly handle gcc's inline assembly syntax, amongst other extensions, which it does in this case - just not very efficiently. In short, this isn't a bug, per se.
Since this isn't a bug, I'm going to continue with the "r,m" constraint syntax. I figure that this is the best compromise for now. gcc will choose the best - presumably a register where possible - and clang will force the use of a register by ignoring further options after the comma. If nothing else, it still preserves the semantic intent of the assembly statement, i.e., describing possible constraints, even if they are ignored.
A final note (20130715) : This particular example will not compile using the "r,m" constraint in a single position - we would have to supply an alternative constraint match for each, e.g.,
: "=a,a" (rl), "=d,d" (rh) : "%0,0" (x), "r,m" (y)
This is required for multiple alternative constraints with GCC. But we're getting into territory where GCC has been known to exhibit bugs in the past - whether or not this is true as of 4.8.1, I don't know. Clang works without the alternatives in the other constraints, which is incompatible with GCC syntax, and must therefore be considered a bug.
If performance is critical, use "r", otherwise, stick with "rm" and maybe clang will address this in the future, even as it benefits GCC.
I need to find null-free replacements for the following instructions so I can put the following code in shellcode.
The first instruction I need to convert to null-free is:
mov ebx, str ; the string containing /dev/zero
The string str is defined in my .data section.
The second is:
mov eax,0x5a
Thanks!
Assuming what you want to learn is how assembly code is made up, what type of instruction choices ends up in assembly code with specific properties, then (on x86/x64) do the following:
Pick up Intel's instruction set reference manuals (four volumes as of this writing, I think). They contain opcode tables (instruction binary formats), and detailed lists of all allowed opcodes for a specific assembly mnemonic (instruction name).
Familiarize yourself with those and mentally divide them into two groups - those that match your expected properties (like, not containing the 'x' character ... or any other specific one), and those that don't. The 2nd category you need to eliminate from your code if they're present.
Compile your code telling the compiler not to discard compile intermediates:gcc -save-temps -c csource.c
Disassemble the object file:objdump -d csource.o
The disassembly output from objdump will contain the binary instructions (opcodes) as well as the instruction names (mnemonics), i.e. you'll see exactly which opcode format was chosen. You can now check whether any opcodes in there are from the 2nd set as per 1. above.
The creative bit of the work comes in now. When you've found an instruction in the disassembly output that doesn't match the expectations/requirements you have, look up / create a substitute (or, more often, a substitute sequence of several instructions) that gives the same end result but is only made up from instructions that do match what you need.
Go back to the compile intermediates from above, find the csource.s assembly, make changes, reassemble/relink, test.
If you want to make your assembly code standalone (i.e. not using system runtime libraries / making system calls directly), consult documentation on your operating system internals (how to make syscalls), and/or disassemble the runtime libraries that ordinarily do so on your behalf, to learn how it's done.
Since 5. is definitely homework, of the same sort like create a C for() loop equivalent to a given while() loop, don't expect too much help there. The instruction set reference manuals and experiments with the (dis)assembler are what you need here.
Additionally, if you're studying, attend lessons on how compilers work / how to write compilers - they do cover how assembly instruction selection is done by compilers, and I can well imagine it to be an interesting / challenging term project to e.g. write a compiler whose output is guaranteed to contain the character '?' (0x3f) but never '!' (0x21). You get the idea.
You mention the constant load via xor to clear plus inc and shl to get any set of bits you want.
The least fragile way I can think of to load an unknown constant (your unknown str) is to load the constant xor with some value like 0xAAAAAAAA and then xor that back out in a subsequent instruction. For example to load 0x1234:
0: 89 1d 9e b8 aa aa mov %ebx,0xaaaab89e
6: 31 1d aa aa aa aa xor %ebx,0xaaaaaaaa
You could even choose the 0xAAAAAAAA to be some interesting ascii!