Inferring BRAM on Intel Cyclone 10 LP

I'm trying to port an HDL description of a regex coprocessor written for a Xilinx FPGA to a Cyclone 10 LP, to use it on the Arduino MKR Vidor 4000.
I have a problem with BRAM inference: I am trying to use the BRAM HDL description written for the Xilinx board (https://github.com/leonardo-panseri/cicero-on-vidor4000/blob/master/projects/cicero-cyclone/cicero-rtl/memories/bram.sv), but when I compile it with Quartus Prime Lite 21.1, it is synthesized as logic instead of as memory blocks. I have even tried adding the Intel 'ramstyle' attribute to force the synthesis tool to recognize it as RAM, but it seems to have no effect.
In the compilation log there are only two warnings for the file, but they seem unrelated to my issue:
Warning (10230): Verilog HDL assignment warning at bram.sv(69): truncated value with size 32 to match size of target (1)
Warning (10268): Verilog HDL information at bram.sv(61): always construct contains both blocking and non-blocking assignments
The really strange thing is that the 'ramstyle' attribute does work in other places in the project; the synthesis report for the RAM is as follows:
RAM Summary report for MKRVIDOR4000
Thu Jun 16 11:41:55 2022
Quartus Prime Version 21.1.0 Build 842 10/21/2021 SJ Lite Edition
---------------------
; Table of Contents ;
---------------------
1. Legal Notice
2. Analysis & Synthesis RAM Summary
----------------
; Legal Notice ;
----------------
Copyright (C) 2021 Intel Corporation. All rights reserved.
Your use of Intel Corporation's design tools, logic functions
and other software and tools, and any partner logic
functions, and any output files from any of the foregoing
(including device programming or simulation files), and any
associated documentation or information are expressly subject
to the terms and conditions of the Intel Program License
Subscription Agreement, the Intel Quartus Prime License Agreement,
the Intel FPGA IP License Agreement, or other applicable license
agreement, including, without limitation, that your use is for
the sole purpose of programming logic devices manufactured by
Intel and sold by Intel or its authorized distributors. Please
refer to the applicable agreement for further details, at
https://fpgasoftware.intel.com/eula.
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
; Analysis & Synthesis RAM Summary ;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+------------------+--------------+--------------+--------------+--------------+------+------+
; Name ; Type ; Mode ; Port A Depth ; Port A Width ; Port B Depth ; Port B Width ; Size ; MIF ;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+------------------+--------------+--------------+--------------+--------------+------+------+
; AXI_top:UIP|coprocessor_top:a_regex_coprocessor|topology_single:a_topology|engine_interfaced:anEngine|engine:anEngine|cache_block_directly_mapped_broadcast:a_cache|altsyncram:content_rtl_0|altsyncram_gll1:auto_generated|ALTSYNCRAM ; M9K ; Simple Dual Port ; 16 ; 48 ; 16 ; 48 ; 768 ; None ;
; AXI_top:UIP|coprocessor_top:a_regex_coprocessor|topology_single:a_topology|engine_interfaced:anEngine|engine:anEngine|fifo:gen_fifo[0].fifo_cc_id|altsyncram:content_rtl_0|altsyncram_sni1:auto_generated|ALTSYNCRAM ; M9K ; Simple Dual Port ; 32 ; 10 ; 32 ; 10 ; 320 ; None ;
; AXI_top:UIP|coprocessor_top:a_regex_coprocessor|topology_single:a_topology|engine_interfaced:anEngine|engine:anEngine|fifo:gen_fifo[1].fifo_cc_id|altsyncram:content_rtl_0|altsyncram_sni1:auto_generated|ALTSYNCRAM ; M9K ; Simple Dual Port ; 32 ; 10 ; 32 ; 10 ; 320 ; None ;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+------------------+--------------+--------------+--------------+--------------+------+------+

The file linked in the post and the file the tool is warning about are not the same: the warnings in the post refer to line numbers that are commented out in the linked file.
Also, the design applies the % and / operators to values that change at run time, which is not good practice in RTL:
if(w_valid) ram[w_addr / RATIO][w_addr % RATIO] <= w_data;
Recommendations:
Double-check that the correct file is part of the build in the local project.
Remove the % and / operators from RTL code that is evaluated at run time. It's fine to use them in expressions resolved at elaboration time, but it's better practice to avoid them in dynamic code. An exception is a case where you know the operands fall on power-of-2 boundaries. While debugging why the RAM is not inferred, remove them.
For RTL code that is not evaluated at elaboration time:
When dividing by 2^n, use a shift operator rather than /.
When taking a value modulo 2^n, use the n least significant bits of the value instead of the % operator.
When dividing by, or taking the modulus of, something that is not a power of 2, synthesis does not handle the operation well in most cases.
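The shift and mask replacements recommended here can be checked with a small software model. This is a C sketch, not RTL: RATIO and w_addr come from the quoted line, while log2_pow2, word_index and lane_index are hypothetical helper names, and RATIO is assumed to be a power of two. In SystemVerilog the same idea would be a right shift by $clog2(RATIO) and a slice of the low $clog2(RATIO) bits of w_addr.

```c
#include <assert.h>
#include <stdint.h>

/* Returns n such that ratio == 2^n. Assumes ratio is a power of two. */
unsigned log2_pow2(uint32_t ratio)
{
    assert(ratio != 0 && (ratio & (ratio - 1u)) == 0);
    unsigned n = 0;
    while ((ratio >> n) != 1u)
        n++;
    return n;
}

/* Equivalent of w_addr / RATIO: drop the low log2(RATIO) bits. */
uint32_t word_index(uint32_t w_addr, uint32_t ratio)
{
    return w_addr >> log2_pow2(ratio);
}

/* Equivalent of w_addr % RATIO: keep only the low log2(RATIO) bits.
   Only valid when ratio is a power of two. */
uint32_t lane_index(uint32_t w_addr, uint32_t ratio)
{
    return w_addr & (ratio - 1u);
}
```

In hardware both results are plain bit selections of the address, so no divider or modulus logic is needed.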

Related

Why the "bcl 20, 31, $+4" instruction for obtaining the address of the next instruction in LR? Why not "bl $+4" or "bcl 20, xx, $+4"?

I have read the latest Power ISA manual from IBM (https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0), which says [page 35; 2.4 Branch Instructions]:
Obtaining the address of the next instruction: Use the following form of Branch and Link.
bcl 20, 31, $+4
Where does this convention come from? It appears in every PowerPC instruction set manual, but is there a reason for writing it this way?
In fact, the GNU compiler uses that instruction.

Because the instruction pointer on Power isn't a general-purpose register, we use this trick to get it into LR.
$+4 is the address of the next instruction, which is the branch target.
20 is 10100 in binary, which in Figure 40 means branch always.
31 selects a condition register bit, which is ignored because the branch is always taken.
So this instruction simply continues at the next instruction and, as a side effect, leaves the address of that instruction (the one after the branch) in LR.
LR is effectively the return address (p. 37, the "if LK then LR <- CIA + 4" text) of a "subroutine" we just called and don't need to return from. On Power this is used as a quick alternative to going through the stack.
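To make the field breakdown concrete, here is a minimal C sketch that packs the fields of a conditional branch into a 32-bit instruction word. It assumes the B-form layout from the Power ISA (primary opcode 16, then BO, BI, a 14-bit word displacement, AA and LK); encode_bc is a hypothetical helper name, not something taken from the manual or from any toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Packs a B-form conditional branch (primary opcode 16).
   offset is the byte displacement to the target, e.g. 4 for "$+4". */
uint32_t encode_bc(unsigned bo, unsigned bi, int32_t offset,
                   unsigned aa, unsigned lk)
{
    assert(bo < 32 && bi < 32 && aa < 2 && lk < 2);
    /* The displacement must be word aligned and fit in 16 signed bits. */
    assert((offset & 3) == 0 && offset >= -32768 && offset <= 32764);

    return (16u << 26)                    /* primary opcode            */
         | ((uint32_t)bo << 21)           /* BO: 20 = branch always    */
         | ((uint32_t)bi << 16)           /* BI: ignored when BO = 20  */
         | ((uint32_t)offset & 0xFFFCu)   /* BD: word displacement     */
         | ((uint32_t)aa << 1)            /* AA: 0 = relative address  */
         | (uint32_t)lk;                  /* LK: 1 = write CIA+4 to LR */
}
```

Under these assumptions, encode_bc(20, 31, 4, 0, 1) gives 0x429F0005, with the LK bit being what causes LR to be written.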

What is "=qm" in extended assembler

I was looking through an Intel provided reference implementation of RDRAND instruction. The page is Intel Digital Random Number Generator (DRNG) Software Implementation Guide, and the code came from Intel Digital Random Number Generator software code examples.
The following is the relevant portion from Intel. It reads a random value and places it in val, and it sets the carry flag on success.
char rc;
unsigned int val;
__asm__ volatile(
"rdrand %0 ; setc %1"
: "=r" (val), "=qm" (rc)
);
// 1 = success, 0 = underflow
if(rc) {
// use val
...
}
Sorry to have to ask. I don't think it is covered in the GNU Extended Assembler documentation, and searching for "=qm" produces spurious hits.
What does the "=qm" mean in the extended assembler?

What you're looking at is an inline assembler constraint. The GCC documentation is at 6.47.3.1 Simple Constraints and 6.47.3.4 Constraints for Particular Machines under x86 family section. This one (=qm) combines three flags which indicate:
=: The operand is write-only - its previous value is not relevant.
q: The operand must be in register a, b, c, or d in 32-bit mode (it cannot be in esi, for instance); in 64-bit mode, any integer register qualifies.
m: The operand may be placed in memory.
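RDRAND needs hardware support, so here is a sketch that uses the same "=qm" constraint in a context that runs on any x86 CPU: capturing the carry flag of a 32-bit addition with setc. The function name add_with_carry_out is hypothetical, the inline assembly assumes GCC or Clang, and a plain C fallback is provided for other targets.

```c
#include <stdint.h>

/* Adds a and b, stores the 32-bit sum in *sum and returns the carry (0 or 1). */
unsigned char add_with_carry_out(uint32_t a, uint32_t b, uint32_t *sum)
{
    unsigned char carry;
#if defined(__i386__) || defined(__x86_64__)
    uint32_t s = a;
    __asm__ ("addl %2, %0\n\t"   /* s += b, sets CF on unsigned overflow  */
             "setc %1"           /* store CF into a byte register or memory */
             : "+r" (s), "=qm" (carry)
             : "r" (b)
             : "cc");
    *sum = s;
#else
    /* Portable fallback: unsigned wrap-around reveals the carry. */
    *sum = a + b;
    carry = (unsigned char)(*sum < a);
#endif
    return carry;
}
```

Because carry is a one-byte variable, the compiler is free to pick either a byte-addressable register or a stack slot for it, which matches the reg8/mem8 operand forms that setc accepts.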

qm most likely means a 1-byte (8-bit) register or memory operand, so =qm is a valid constraint for storing a 1-byte result.
Look at what setc accepts:
http://web.itu.edu.tr/~aydineb/index_files/instr/setc.html
reg8 and mem8
Only the a, b, c and d registers (eax, ebx, ecx, edx) that q refers to can be used, because their low bytes are addressable as al, bl, cl and dl. m means memory, so combining q and m yields reg8 or mem8.

Wow, that stumped me at first, but I searched around a bit and found out that it is a reference to the model of processor this piece of code is meant for.
Specifically, I read that it is for the i7 quad-core.
Is that where you got this code from?
It is a simple value indicator for a variable syntax.

Where is the "Zero divide" done in kernel for Arm Cortex A-9

I am looking through the kernel source code (2.6.35) for the zero-divide handling.
I inserted a division by zero into a user-space program and all threads stopped.
So I want to know: where is the "zero divide" handled in the kernel for the Arm Cortex A-9?
I am not able to find any trap for this.
Thanks

It depends on the architecture. Given the following user space code on an x86 system:
main() {
int x = 42 / 0;
}
the compiler inserts an idivl instruction into the object code. When this instruction is executed with a divisor of 0, the CPU generates a division-by-zero trap (similar to an interrupt). This invokes the divide_error trap handler inside the kernel; on x86 it is located in arch/x86/kernel/entry_32.S:
ENTRY(divide_error)
RING0_INT_FRAME
pushl_cfi $0 # no error code
pushl_cfi $do_divide_error
jmp error_code
CFI_ENDPROC
END(divide_error)
The error_code target then takes care of all necessary actions to handle the error and finally returns from the trap.
On ARM, things are different: With a few exceptions, ARM CPUs do not have a hardware division instruction (e.g. Arm Cortex A-9 does not have one). Division needs to be implemented as a library function. For the kernel, this is implemented in arch/arm/lib/lib1funcs.S where you also find the division by zero handling. For user space applications, I suppose this is implemented as a library function in the libgcc library.
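To illustrate why no hardware trap is involved when division is a library routine, here is a C sketch of the general technique: shift-and-subtract division with an explicit check for a zero divisor. This is not the code from lib1funcs.S (which is hand-written assembly) or from libgcc; soft_udiv, on_divide_by_zero and div0_calls are hypothetical names, and returning 0 after the handler is an arbitrary choice made for the sketch.

```c
#include <stdint.h>

/* Hypothetical hook standing in for whatever the runtime does on a
   zero divisor (print a message, raise a signal, ...). */
int div0_calls = 0;
void on_divide_by_zero(void) { div0_calls++; }

/* Unsigned 32-bit division done in software, one quotient bit per step. */
uint32_t soft_udiv(uint32_t n, uint32_t d)
{
    if (d == 0) {               /* the "trap" is just an ordinary branch */
        on_divide_by_zero();
        return 0;
    }

    uint32_t q = 0;
    uint64_t r = 0;             /* 64 bits so the shift cannot overflow */
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);   /* bring down the next bit */
        if (r >= d) {
            r -= d;
            q |= (uint32_t)1 << i;
        }
    }
    return q;
}
```

The zero check is ordinary code inside the routine, which is why searching the ARM exception vectors for a divide trap turns up nothing.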

More Null Free Shellcode

I need to find null-free replacements for the following instructions so I can put the following code in shellcode.
The first instruction I need to convert to null-free is:
mov ebx, str ; the string containing /dev/zero
The string str is defined in my .data section.
The second is:
mov eax,0x5a
Thanks!

Assuming what you want to learn is how assembly code is made up, what type of instruction choices ends up in assembly code with specific properties, then (on x86/x64) do the following:
1. Pick up Intel's instruction set reference manuals (four volumes as of this writing, I think). They contain opcode tables (instruction binary formats), and detailed lists of all allowed opcodes for a specific assembly mnemonic (instruction name).
2. Familiarize yourself with those and mentally divide them into two groups - those that match your expected properties (like, not containing the 'x' character ... or any other specific one), and those that don't. The 2nd category you need to eliminate from your code if they're present.
3. Compile your code telling the compiler not to discard compile intermediates: gcc -save-temps -c csource.c
4. Disassemble the object file: objdump -d csource.o
The disassembly output from objdump will contain the binary instructions (opcodes) as well as the instruction names (mnemonics), i.e. you'll see exactly which opcode format was chosen. You can now check whether any opcodes in there are from the 2nd set as per 1. above.
5. The creative bit of the work comes in now. When you've found an instruction in the disassembly output that doesn't match the expectations/requirements you have, look up / create a substitute (or, more often, a substitute sequence of several instructions) that gives the same end result but is only made up from instructions that do match what you need.
6. Go back to the compile intermediates from above, find the csource.s assembly, make changes, reassemble/relink, test.
7. If you want to make your assembly code standalone (i.e. not using system runtime libraries / making system calls directly), consult documentation on your operating system internals (how to make syscalls), and/or disassemble the runtime libraries that ordinarily do so on your behalf, to learn how it's done.
Since 5. is definitely homework, of the same sort like create a C for() loop equivalent to a given while() loop, don't expect too much help there. The instruction set reference manuals and experiments with the (dis)assembler are what you need here.
Additionally, if you're studying, attend lessons on how compilers work / how to write compilers - they do cover how assembly instruction selection is done by compilers, and I can well imagine it to be an interesting / challenging term project to e.g. write a compiler whose output is guaranteed to contain the character '?' (0x3f) but never '!' (0x21). You get the idea.

You mention the constant load via xor to clear plus inc and shl to get any set of bits you want.
The least fragile way I can think of to load an unknown constant (your unknown str) is to load the constant xor with some value like 0xAAAAAAAA and then xor that back out in a subsequent instruction. For example to load 0x1234:
0: bb 9e b8 aa aa mov $0xaaaab89e,%ebx
5: 81 f3 aa aa aa aa xor $0xaaaaaaaa,%ebx
You could even choose the 0xAAAAAAAA to be some interesting ascii!
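The choice of mask can be automated. The following C sketch searches for a repeated-byte mask such that neither the mask nor the masked constant contains a zero byte, so both immediates are null-free; has_zero_byte and find_xor_mask are hypothetical helper names, and restricting the search to repeated-byte masks is a simplification made for the example.

```c
#include <stdint.h>

/* Returns 1 if any of the four bytes of v is 0x00. */
int has_zero_byte(uint32_t v)
{
    for (int i = 0; i < 4; i++)
        if (((v >> (8 * i)) & 0xFFu) == 0)
            return 1;
    return 0;
}

/* Finds mask and masked = value ^ mask, both free of zero bytes,
   so that "mov masked; xor mask" reconstructs value.
   Returns 1 on success, 0 if no repeated-byte mask works. */
int find_xor_mask(uint32_t value, uint32_t *mask, uint32_t *masked)
{
    for (uint32_t b = 1; b <= 0xFFu; b++) {
        uint32_t m = b * 0x01010101u;     /* e.g. 0xAA -> 0xAAAAAAAA */
        if (!has_zero_byte(value ^ m)) {
            *mask = m;
            *masked = value ^ m;
            return 1;
        }
    }
    return 0;
}
```

A byte of value ^ mask is zero only when the corresponding byte of value equals the mask byte, so at most four of the 255 candidates are ruled out and the search always finds one. The same has_zero_byte check shows why a plain 32-bit immediate of 0x5a is a problem: its upper three bytes are zero.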

PPC breakpoints

How is a breakpoint implemented on PPC (On OS X, to be specific)?
For example, on x86 it's typically done with the INT 3 instruction (0xCC) -- is there an instruction comparable to this for ppc? Or is there some other way they're set/implemented?

With gdb and a function that hexdumps itself, I get 0x7fe00008. This appears to be the tw instruction:
0b01111111111000000000000000001000
011111 31
11111 TO condition flags: lt, gt, eq, logical lt, logical gt
00000 rA
00000 rB
0000000100 constant 4
0 reserved
i.e. compare r0 to r0 and trap on any result.
The GDB disassembly is simply the extended mnemonic trap
EDIT: I'm using "GNU gdb 6.3.50-20050815 (Apple version gdb-696) (Sat Oct 20 18:20:28 GMT 2007)"
EDIT 2: It's also possible that conditional breakpoints will use other forms of tw or twi if the required values are already in registers and the debugger doesn't need to keep track of the hit count.
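The manual bit breakdown of 0x7fe00008 can be reproduced with a few shifts and masks. This C sketch assumes the X-form layout used by tw (6-bit primary opcode, 5-bit TO, rA and rB fields, 10-bit extended opcode, 1 reserved bit); the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of an X-form instruction word such as tw. */
struct x_form {
    unsigned opcode;  /* primary opcode, 31 for tw            */
    unsigned to;      /* trap-on conditions                   */
    unsigned ra;      /* first register operand               */
    unsigned rb;      /* second register operand              */
    unsigned xo;      /* extended opcode, 4 for tw            */
};

struct x_form decode_x_form(uint32_t insn)
{
    struct x_form f;
    f.opcode = (insn >> 26) & 0x3Fu;
    f.to     = (insn >> 21) & 0x1Fu;
    f.ra     = (insn >> 16) & 0x1Fu;
    f.rb     = (insn >> 11) & 0x1Fu;
    f.xo     = (insn >> 1)  & 0x3FFu;
    return f;
}
```

For 0x7fe00008 this yields opcode 31, TO 31, rA 0, rB 0 and extended opcode 4, matching the hand decoding of the breakpoint word.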

Besides software breakpoints, PPC also supports hardware breakpoints, implemented via the IABR (and possibly IABR2, depending on the core version) registers. These are instruction breakpoints, but there are also data breakpoints (implemented with DABR and, possibly, DABR2). If your core supports two sets of hardware breakpoint registers (i.e. IABR2 and DABR2 are present), you can do more than just trigger on a specific address: you can specify a whole contiguous range of addresses as a breakpoint target. For data breakpoints, you can also specify whether you want them to trigger on write, or read, or any access.

Best guess is a 'tw' or 'twi' instruction.
You could dig into the source code of PPC gdb; OS X probably uses the same functionality as its FreeBSD roots.

PowerPC architectures use "traps".
http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.aixassem/doc/alangref/twi.htm

Instruction breakpoints are typically realised with the TRAP instruction or with the IABR debug hardware register.
Example implementations:
ArchLinux, Apple, Wii and Wii U.

I'm told by a reliable (but currently inebriated, so take it with a grain of salt) source that it's a zero instruction which is illegal and causes some sort of system trap.
EDIT: Made into community wiki in case my friend is so drunk that he's talking absolute rubbish :-)
