Translating pseudocode into machine code

For academic purposes, I am being asked to translate this statement
assign x the value 5
into a machine language invented by the author of a computer science book, called Brookshear machine code. I am given this hint:
(HINTS: Assume that the value of x is to be stored into main memory location 47.
Your program would begin by loading a value into a register. You do not need to
specify the memory locations of your program. Don't forget to end the program with
the HALT instruction.)
Does anyone know the best way to approach this? The hint makes it clear that the program must end with the HALT instruction, but I am unsure what exactly I should be doing.
0iii - No-operation
1RXY - Load register R with contents of location XY
2RXY - Load register R with value XY
3RXY - Store contents of register R at location XY
4iRS - Move contents of register R to register S
5RST - Add contents of registers S and T as binary numbers, place result in register R
6RST - Add contents of registers S and T as floating-point numbers, place result in register R
7RST - OR together the contents of registers S and T , place result in register R
8RST - AND together the contents of registers S and T , place result in register R
9RST - XOR together the contents of registers S and T , place result in register R
ARiZ - Rotate the contents of register R one bit to the right, Z times
BRXY - Jump to instruction at XY if contents of register R equal contents of register 0
Ciii - Halt
DRXY - Jump to instruction at XY if contents of register R are greater than contents of register 0
R,S,T - Register numbers
XY - A one-byte address or data value
Z - A half-byte value
i - Ignored when the instruction is de-coded: usually entered as 0
Above is the machine language I am expected to use.

If only there were an instruction:
EABXY - Store value XY at location AB
If that command existed, your program would be:
E4705 # store '05' at address '47'
C000 # halt
But, that instruction doesn't exist -- partly because it takes five half-byte characters, and the instructions are meant to fit into four.
So you're going to have to simulate the 'E' instruction using two steps.
You can't specify a value to put into an address directly.
There is one instruction that lets you specify a value and put it somewhere.
There is one instruction that copies a value from somewhere into an address.
That's really enough clues.
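The four-digit limit mentioned above can be made concrete with a small decoder. This is only a minimal Python sketch, not part of any Brookshear tooling: the function name decode and the dictionary keys are made up for illustration, and it assumes the RXY field layout from the table in the question.

```python
def decode(instruction: str) -> dict:
    """Split a four-hex-digit instruction into its half-byte fields."""
    if len(instruction) != 4:
        # Every instruction is two bytes, i.e. exactly four half-byte digits.
        raise ValueError("an instruction must be exactly four hex digits")
    nibbles = [int(ch, 16) for ch in instruction]
    return {
        "opcode": nibbles[0],                  # first half-byte selects the operation
        "r": nibbles[1],                       # second half-byte: register R
        "xy": (nibbles[2] << 4) | nibbles[3],  # last two half-bytes: address or value
    }


print(decode("14A3"))  # op-code 1, register 4, location 0xA3
print(decode("C000"))  # op-code 0xC (halt); the remaining digits are ignored
```

Passing the hypothetical "E4705" to decode raises ValueError, because there is no room for both a one-byte address and a one-byte value after the op-code.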

Related

Can I add a watch to a COMMON block?

I have a very large, very old, very byzantine, very undocumented set of Fortran code that I am trying to troubleshoot. It is giving me divide-by-zero problems at run time due to a section that's roughly like this:
subroutine badsub(ainput)
implicit double precision (a-h,o-z)
include 'commonincludes2.h'
x=dsqrt((r(6)-r(8))**2+(z(6)-z(8))**2)
y=ainput
w=y+x
v=2./dlog(dsqrt(w/y))
This code hits a divide by zero on the last line: x is zero, so w equals y, and thus dlog(dsqrt(1)) is zero.
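The arithmetic behind the failure can be reproduced outside the Fortran code. This is a minimal Python sketch; the function name denominator and the sample value of y are illustrative assumptions, not taken from the program.

```python
import math


def denominator(x: float, y: float) -> float:
    """Mirror the Fortran expression dlog(dsqrt(w/y)) with w = y + x."""
    w = y + x
    return math.log(math.sqrt(w / y))


y = 1875.0  # illustrative nonzero input

# When x is zero, w/y is exactly 1, so the logarithm is exactly 0
# and 2./denominator divides by zero.
print(denominator(0.0, y))

# Any positive x makes the denominator positive, so the division is defined.
print(denominator(1.0, y) > 0.0)
```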
The include file looks something like this:
common /cblk/ r(12),z(12),otherstuff
There are actually three include headers with a /cblk/ declaration, which I found by running grep -in "/cblk/" *.h *.f *.F: "commonincludes.h", "commonincludes2.h", and "commonincludes3.h". As an added bonus, the sections of memory corresponding to r and z are named x and y in "commonincludes.h", i.e. "commonincludes.h" looks like:
common /cblk/ x(12),y(12),otherstuff
My problem is, I have NO IDEA where r and z are set. I've used grep to find every place where each of the headers is included, and I don't see any place where the variables are written.
If I inspect the actual values in r and z in gdb where the error occurs, the values look reasonable: they're non-zero, not-garbage-looking vectors of real numbers. It's just that r(6) equals r(8) and z(6) equals z(8), and that causes the problem.
I need to find where z and r get written, but I can't find any instruction in the gdb documentation for attaching a watchpoint to a COMMON block. How can I find where these are written to?
I think I have figured out how to do what I'm trying to do. Because COMMON variables are allocated statically, their addresses shouldn't change from run to run. Therefore, when my program stops due to my divide-by-zero error, I'm able to find the memory address of (in this example) r(8), which is global in scope and shouldn't change on subsequent runs. I can then re-run the code with a watchpoint on that address and it will flag when the value changes anywhere in the code.
In my example, the gdb session looks like this, with process names and directories filed off to protect the guilty:
Reading symbols from myprogram...
(gdb) r
Starting program: ************
Program received signal SIGFPE, Arithmetic exception.
0x00000000004df96d in badsub (ainput=1875.0000521766287) at badsub.f:109
109 v=2./dlog(dsqrt(w/y))
(gdb) p &r(8)
$1 = (PTR TO -> ( real(kind=8) )) 0xcbf7618 <cblk_+56>
(gdb) watch *(double precision *) 0x0cbf7618
Hardware watchpoint 1: *(double precision *) 0x0cbf7618
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: *************
Hardware watchpoint 1: *(double precision *) 0x0cbf7618
Old value = 0
New value = 6.123233995736766e-17
0x00007ffff6f2be2d in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
I have confirmed from running a backtrace that this is indeed a place (presumably the first place) where my common block variable is being set.
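The <cblk_+56> in the gdb output is consistent with r(8) sitting 7 elements of 8 bytes each past the start of the block. The same arithmetic gives the address to watch for other elements. This is a minimal Python sketch; it assumes r and z are the first two members of /cblk/, as in the include line shown in the question, with no padding between them, and the helper names are made up for illustration.

```python
REAL8_BYTES = 8   # size of one double precision element
R_LENGTH = 12     # r(12) comes first in the COMMON block


def offset_of_r(i: int) -> int:
    """Byte offset of r(i) from the start of /cblk/ (Fortran indices start at 1)."""
    return (i - 1) * REAL8_BYTES


def offset_of_z(i: int) -> int:
    """Byte offset of z(i), assuming z(12) immediately follows r(12)."""
    return R_LENGTH * REAL8_BYTES + (i - 1) * REAL8_BYTES


addr_r8 = 0xCBF7618                   # address gdb reported for r(8)
base = addr_r8 - offset_of_r(8)       # start of cblk_

print(offset_of_r(8))                 # 56, matching <cblk_+56>
print(hex(base + offset_of_z(8)))     # candidate address for a watchpoint on z(8)
```

It is still safer to confirm each address with p &z(8) in gdb before setting the watchpoint, since the real block layout may differ from this assumption.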

Does the second parameter of ioremap() give the size of a register in bits? (Linux)

My NEC microcontroller has an 8-bit timer control register.
Do I need to pass 8 as the second parameter of ioremap?
After reading the spec, I got to know the following property of it.
Address |Function Register Name |Symbol |R/W Manipulatable Bits |Default Val
FFFFF590H |TMP0 control register 0 |TP0CTL0 |R/W √ √ |00H
So, I believe that the physical address at which the Timer register TP0CTL0 is mapped is 0xFFFFF590.
After reading more of the description, I learned that the register is 8 bits wide. The spec says "The TPnCTL0 register is an 8-bit register that controls the operation of TMPn."
I am remapping this register as shown below, using 0xFFFFF590 as the base address and 8 as the size because the register is 8 bits wide. Is that correct? Is the second parameter of ioremap_nocache a size in bits? Have I used the parameters correctly?
void *tp0ctl0 = ioremap_nocache(0xFFFFF590, 8);
Next, I do the following:
unsigned int val = ioread8(tp0ctl0);
val = 2;
iowrite8(val, tp0ctl0);
Please correct me if I am wrong, and let me know whether I am using the APIs correctly given the microcontroller information I have.
The size given to ioremap_* is in bytes, not bits. The purpose of this function is to map physical address space into kernel virtual address space, though, and mappings are made in whole pages, so any size greater than zero that keeps the range within a single page will be equivalent.
Given the information you provided above, ioremap_nocache(0xFFFFF590, 1) would actually be correct. But the effect of "1" versus "8" will be identical since the system page size is (no doubt) larger than both.
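The page rounding can be checked with a few lines of arithmetic. This is a minimal Python sketch that models only the rounding, not the kernel's actual implementation; the 4096-byte page size and the function name pages_covered are assumptions.

```python
PAGE_SIZE = 4096  # assumed page size; the real value depends on the platform


def pages_covered(phys_addr: int, size: int, page_size: int = PAGE_SIZE) -> list:
    """Return the page numbers touched by the byte range [phys_addr, phys_addr + size)."""
    first = phys_addr // page_size
    last = (phys_addr + size - 1) // page_size
    return list(range(first, last + 1))


# A size of 1 and a size of 8 touch exactly the same single page.
print(pages_covered(0xFFFFF590, 1) == pages_covered(0xFFFFF590, 8))  # True

# A range that straddles a page boundary needs two pages.
print(len(pages_covered(0x1FFC, 8)))  # 2
```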

What does "a GP/function address pair" mean in IA-64?

What does "a GP/function address pair" mean in Itanium C++ ABI? What does GP stand for?
Short explanation: gp is, for all practical purposes, a hidden parameter to all functions that comply with the Itanium ABI. It's a kind of this pointer to the global variables the function uses. As far as I know, no mainstream OS does it anymore.
GP stands for "global pointer". It's a base address for data statically allocated by executables, and the Itanium architecture has a register just for it.
For instance, if you had these global variables and this function in your program:
int foo;
int bar;
int baz;
int func()
{
foo++;
bar += foo;
baz *= bar / foo;
return foo + bar + baz;
}
The gp/function pair would conceptually be &foo, &func. The code generated for func would refer to gp to find where the globals are located. The compiler knows foo can be found at gp, bar can be found at gp + 4 and baz can be found at gp + 8.
Assuming func is defined in an external library, if you call it from your program, the compiler will use a sequence of instructions like this one:
save current gp value to the stack;
load code address from the pair for func into some register;
load gp value from same pair into GP;
perform indirect call to the register where we stored the code address;
restore old gp value that we saved on the stack before, resume calling function.
This makes executables fully position-independent since they don't ever store absolute addresses to data symbols, and therefore makes it possible to maintain only one instance of any executable file in memory, no matter how many processes use it (you could even load the same executable multiple times within a single process and still only have one copy of the executable code systemwide), at the cost of making function pointers a little weird. With the Itanium ABI, a function pointer is not a code address (like it is with "regular" x86 ABIs): it's an address to a gp value and a code address, since that code address might not be worth much if it can't access its global variables, just like a method might not be able to do much if it doesn't have a this pointer.
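The call sequence and the "one copy of the code, several copies of the data" property can be modelled in a few lines. This is a toy Python sketch, not how any real Itanium toolchain is implemented: the names memory, Descriptor and call are made up, and data slots are indexed by word rather than by the byte offsets 0, 4 and 8 used above.

```python
from dataclasses import dataclass
from typing import Callable, List

memory: List[int] = [0] * 16  # toy data memory, one int per slot
gp = 0                        # models the gp register


def func_code() -> int:
    """Body of func: it reaches foo, bar and baz only through gp."""
    foo, bar, baz = gp, gp + 1, gp + 2
    memory[foo] += 1
    memory[bar] += memory[foo]
    memory[baz] *= memory[bar] // memory[foo]
    return memory[foo] + memory[bar] + memory[baz]


@dataclass
class Descriptor:
    """The gp/function pair that a function pointer refers to."""
    gp_value: int
    code: Callable[[], int]


def call(desc: Descriptor) -> int:
    global gp
    saved = gp            # save the caller's gp
    code = desc.code      # load the code address from the pair
    gp = desc.gp_value    # load the gp value from the same pair
    try:
        return code()     # indirect call
    finally:
        gp = saved        # restore the caller's gp


# Two "load modules" share the same code but have separate data.
memory[4:7] = [1, 2, 3]
memory[8:11] = [10, 20, 30]
print(call(Descriptor(4, func_code)))  # 12
print(call(Descriptor(8, func_code)))  # 102
```

The code address alone is not enough to make either call: without the matching gp value, func_code would read and write the wrong slots.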
The only other ABI I know that uses this concept was the Mac OS Classic PowerPC ABI. They called those pairs "transition vectors".
Since x86_64 supports RIP-relative addressing (x86 did not have an equivalent EIP-relative addressing), it's now pretty easy to create position-independent code without having to use an additional register or having to use "enhanced" function pointers. Code and data just have to be kept at constant offsets. Therefore, this part of the Itanium ABI is probably gone for good on Intel platforms.
From the Itanium Register Conventions:
8.2 The gp Register
Every procedure that references statically-allocated data or calls another procedure requires a pointer to its data segment in the gp register, so that it can access its static data and its linkage tables. Each load module has its own data segment, and the gp register must be set correctly prior to calling any entry point within that load module.
The linkage conventions require that each load module define exactly one gp value to refer to a location within its short data segment. It is expected that this location will be chosen to maximize the usefulness of short-displacement immediate instructions for addressing scalars and linkage table entries. The DLL loader will determine the absolute value of the gp register for each load module after loading its data segment into memory.
For calls within a load module, the gp register will remain unchanged, so calls known to be local can be optimized accordingly.
For calls between load modules, the gp register must be initialized with the correct gp value for the new load module, and the calling function must ensure that its own gp value is saved and restored.
Just a comment about this quote from the other answer:
It is expected that this location will be chosen to maximize the usefulness of short-displacement immediate instructions for addressing scalars and linkage table entries.
What this is talking about: Itanium has three different ways to put a value into a register (where 'immediate' here means 'offset from the base'). You can support a full 64 bit offset from anywhere, but it takes two instructions:
// r34 has base address
movl r33 = <my immediate>
;;
add r35 = r34, r33
;;
Not only does that take 2 separate clocks, it takes 3 instruction slots across 2 bundles to make that happen.
There are two shorter versions: add14 (also known as adds) and add22 (also known as addl). The difference is in the immediate size each can handle. Each takes a single 'A' slot, iirc, and completes in a single clock.
add14 can use any register as the source and target, but can only handle immediates of up to 14 bits.
add22 can use any register as the target, but only two bits are allocated for the source, so you can only use r0, r1, r2, or r3 as the source register. r0 is not a real register: it's hardwired to 0. But using one of the other three as the base means you can address 256 times as much memory with simple offsets as you can from any other register, such as a local stacked register, which is limited to add14. Therefore, if you put your global base address into r1 (the convention), you can reach that many more offsets before having to do a separate movl and/or modify gp for the next section of code.
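The factor of 256 follows directly from the difference of 8 bits between the two immediate widths. A minimal Python sketch of the arithmetic, assuming both immediates are signed two's-complement values (the helper names are made up):

```python
def signed_range(bits: int) -> tuple:
    """Smallest and largest value of a signed two's-complement immediate."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def reachable_bytes(bits: int) -> int:
    """Number of distinct byte offsets an immediate of this width can express."""
    return 1 << bits


print(signed_range(14))                            # (-8192, 8191)
print(signed_range(22))                            # (-2097152, 2097151)
print(reachable_bytes(22) // reachable_bytes(14))  # 256
```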

More Null Free Shellcode

I need to find null-free replacements for the following instructions so I can put the following code in shellcode.
The first instruction I need to convert to null-free is:
mov ebx, str ; the string containing /dev/zero
The string str is defined in my .data section.
The second is:
mov eax,0x5a
Thanks!
Assuming what you want to learn is how assembly code is put together, and which instruction choices produce machine code with specific properties, then (on x86/x64) do the following:
1. Pick up Intel's instruction set reference manuals (four volumes as of this writing, I think). They contain opcode tables (instruction binary formats), and detailed lists of all allowed opcodes for a specific assembly mnemonic (instruction name). Familiarize yourself with those and mentally divide them into two groups: those that match your expected properties (like not containing the 'x' character, or any other specific one), and those that don't. The second category you need to eliminate from your code if they're present.
2. Compile your code, telling the compiler not to discard compile intermediates: gcc -save-temps -c csource.c
3. Disassemble the object file: objdump -d csource.o
4. The disassembly output from objdump will contain the binary instructions (opcodes) as well as the instruction names (mnemonics), i.e. you'll see exactly which opcode format was chosen. You can now check whether any opcodes in there are from the second set as per 1. above.
5. The creative bit of the work comes in now. When you've found an instruction in the disassembly output that doesn't match the expectations/requirements you have, look up or create a substitute (or, more often, a substitute sequence of several instructions) that gives the same end result but is only made up from instructions that do match what you need.
6. Go back to the compile intermediates from above, find the csource.s assembly, make changes, reassemble/relink, test.
7. If you want to make your assembly code standalone (i.e. not using system runtime libraries, but making system calls directly), consult documentation on your operating system internals (how to make syscalls), and/or disassemble the runtime libraries that ordinarily do so on your behalf, to learn how it's done.
Since 5. is definitely homework, of the same sort as "create a C for() loop equivalent to a given while() loop", don't expect too much help there. The instruction set reference manuals and experiments with the (dis)assembler are what you need here.
Additionally, if you're studying, attend lessons on how compilers work / how to write compilers - they do cover how assembly instruction selection is done by compilers, and I can well imagine it to be an interesting / challenging term project to e.g. write a compiler whose output is guaranteed to contain the character '?' (0x3f) but never '!' (0x21). You get the idea.
You mention the constant load via xor to clear plus inc and shl to get any set of bits you want.
The least fragile way I can think of to load an unknown constant (your unknown str) is to load the constant XORed with some value like 0xAAAAAAAA and then XOR that value back out in a subsequent instruction. For example, to load 0x1234:
0: bb 9e b8 aa aa mov $0xaaaab89e,%ebx
5: 81 f3 aa aa aa aa xor $0xaaaaaaaa,%ebx
You could even choose the 0xAAAAAAAA to be some interesting ascii!
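The masked constant can be computed and checked for null bytes before it goes into the assembly source. This is a minimal Python sketch of that check; the helper names are made up, and it only looks at the 32-bit immediates, not at the opcode bytes around them.

```python
import struct


def le_bytes(value: int) -> bytes:
    """Little-endian encoding of a 32-bit immediate, as it appears in the code."""
    return struct.pack("<I", value & 0xFFFFFFFF)


def has_null(value: int) -> bool:
    return 0 in le_bytes(value)


def mask_constant(value: int, mask: int = 0xAAAAAAAA) -> int:
    """Return value XOR mask, refusing any pair that would still contain a null byte."""
    masked = (value ^ mask) & 0xFFFFFFFF
    if has_null(mask) or has_null(masked):
        raise ValueError("mask leaves a null byte; pick a different mask")
    return masked


print(has_null(0x5A))              # True: the immediate encodes as 5a 00 00 00
print(hex(mask_constant(0x1234)))  # 0xaaaab89e
print(hex(mask_constant(0x5A)))    # 0xaaaaaaf0
```

A constant with a byte equal to the corresponding mask byte (0xAA here) XORs to zero in that position, which is why the function rejects it and a different mask is needed.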

(8051) Check if a single bit is set

I'm writing a program for an 8051 microcontroller. In the first part of the program I do some calculations and, based on the result, I either light the LED or not (using CLR P1.7, where P1.7 is the port pin the LED is attached to on the microcontroller).
In the next part of the program I want to retrieve the bit, perhaps store it somewhere, and use it in a conditional jump instruction like JB. How can I do that?
Also, I've seen the instruction MOV C, P1.7 in a code sample. What's the C here?
The C here is the 8051's carry flag - called that because it can be used to hold the "carry" when doing addition operations on multiple bytes.
It can also be used as a single-bit register - so (as here) where you want to move bits around, you can load it with a port value (such as P1.7) then store it somewhere else, for example:
MOV C, P1.7
MOV <bit-address>, C
Then later you can branch on it using:
JB <bit-address>, <label>
Some of the special function registers are also bit-addressable. I believe it's all the ones whose address ends in 0 or 8. I don't have a reference in front of me, but you can do something like setb 20h.1, since internal RAM bytes 20h to 2Fh are bit-addressable as well (the working registers such as r0 are not). That way, if you need the carry for something else, you don't have to worry about pushing it and using up space on your stack.
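The "ending in 0 or 8" rule and the mapping from a byte.bit name to a bit address can be written down as a small calculation. This is a minimal Python sketch assuming the standard 8051 memory map; the function name bit_address is made up for illustration.

```python
def bit_address(byte_addr: int, bit: int) -> int:
    """Translate byte_addr.bit into the 8051 bit address used by SETB, CLR, JB, MOV C."""
    if not 0 <= bit <= 7:
        raise ValueError("bit must be 0..7")
    if 0x20 <= byte_addr <= 0x2F:
        # Bit-addressable internal RAM: bit addresses 00h..7Fh.
        return (byte_addr - 0x20) * 8 + bit
    if 0x80 <= byte_addr <= 0xFF and byte_addr % 8 == 0:
        # SFRs whose address ends in 0 or 8: bit addresses 80h..FFh.
        return byte_addr + bit
    raise ValueError("this byte is not bit-addressable")


P1 = 0x90
print(hex(bit_address(P1, 7)))    # 0x97, i.e. P1.7
print(hex(bit_address(0x20, 1)))  # 0x1, a free flag bit in internal RAM
```

Byte 00h, which is R0 in register bank 0, falls in neither range, so bit_address(0x00, 1) raises ValueError.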

Resources