Problem with RAM on a Zynq 7020 FPGA: can someone give me advice?

Hello, I get a strange message when I try to run MAP. I configured the RAM properly and also checked that the design uses only 80% of the resources available on the board. Why do I get this message, and what can I do about it?
This is the error I get when I run the "Map" step to generate a bit file.
[Image: summary of the resources]
ERROR:Place:543 - This design does not fit into the number of slices available
in this device due to the complexity of the design and/or constraints.
Unplaced instances by type:
BLOCKRAM 77 (55.0)
Please evaluate the following:
BLOCKRAM
u_xyz2lcd_for_test/u_send_to_zedboard/dpr_2/U0/xst_blk_mem_generator/gnativeb
mg.native_blk_mem_gen/valid.cstr/ramloop[6].ram.r/v6_noinit.ram/NO_BMM_INFO.S
DP.SIMPLE_PRIM18.ram
BLOCKRAM

It simply means that your design wants to use more RAM than the device has.
I suggest you check your resource report again, specifically the amount of block memory used.
Your 80% may refer to LUTs or FFs, or you may have misread something.
There is another possibility, although it is very rare:
Your memory usage may increase in Place and Route if the tool has to split a memory over multiple blocks because of an unusual configuration.
This example may not be valid, but it tries to show what can happen:
Suppose you use bit-write enables. Synthesis thinks you have enough memory, but PAR has to use a byte for each bit, so PAR needs to split the data over more blocks and in the end runs out.
The case where I have seen this was a very complex one involving DSPs.
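To illustrate how a memory can need more blocks than its raw bit count suggests, here is a rough Python sketch. It assumes the 18 Kb block RAM aspect ratios of 7-series devices (16Kx1 up to 512x36) and a simplified model in which one memory is tiled from blocks of a single aspect ratio. The function names and the 600x40 memory are made up for the example, and the real tools may pack things differently.

```python
import math

# Assumed aspect ratios (depth, width) of one 18 Kb block RAM on a 7-series device.
RAMB18_SHAPES = [(16384, 1), (8192, 2), (4096, 4), (2048, 9), (1024, 18), (512, 36)]
RAMB18_BITS = 18 * 1024  # raw capacity of one block, parity bits included


def raw_estimate(depth, width):
    """Blocks needed if the bits could be packed perfectly (optimistic)."""
    return math.ceil(depth * width / RAMB18_BITS)


def blocks_needed(depth, width):
    """Blocks needed when the memory is tiled from a single aspect ratio."""
    return min(
        math.ceil(depth / d) * math.ceil(width / w)
        for d, w in RAMB18_SHAPES
    )


if __name__ == "__main__":
    depth, width = 600, 40  # hypothetical memory, not taken from the question
    print(raw_estimate(depth, width), blocks_needed(depth, width))
```

For the hypothetical 600x40 memory the raw bit count fits in 2 blocks, but tiling needs 3, so a design that looks fine by bit count can still run out of block RAMs.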

Related

External memory Data Copy through SPI -- Speed

No amount of experience ever seems to be enough to answer the strange issues that pop up with serial communication buses. We are trying to implement a data copy from an external flash into SRAM. Below are the details of how we have configured our system.
Controller : RH850 (D1M1), PLL speed at 60MHz
External Flash (IS25LP128)
SPI speed: 5MHz (clocks observed using oscilloscope)
Data size: 4 MB
Now, in theory, if my SPI is operating at 5 MHz it should copy 5 Mbit/s. We are trying to copy 4 MB, so essentially 32 Mbit. So in theory, our transfer should take about 7 seconds. OK, we have some implementation overhead. My driver code can accept only up to 64Kb per read call, so we chose to copy 40Kb about 100 times to achieve this, and we run this in a for loop. OK, let me add a whopping 5 seconds of overhead (sorry, RH850!), so 12 seconds in total; well, let's add some more buffer and make it a comfort zone of 15 seconds (max expected!). But when we run the code, it takes a whole 40 seconds to finish the copy. We have checked the clock and it is 5 MHz as expected, and at least the clocks are continuous.
Has anyone here faced this? Where can we look? I know I have a flash driver provided by my vendor to dig into, but before I do that, I wanted to be sure. Any help will be really appreciated.
At first glance, I can think of at least 10 things which may be responsible for this. One thing I'm sure of: this problem is complex. There is no simple "one line solution". The main suspect is the part that is not yours: the flash driver. So, isolate the "pieces" one by one and verify them, starting from the bottom.
Is there an operating system? Is DMA in use? Is there an issue with memory or resource arbitration/sharing? Are interrupts in use, or polling? Are any higher-priority jobs running? Is data read from registers or memory mapped? Does the driver use a generic SPI peripheral or a special serial flash interface (I don't know the RH850, but some microcontrollers have one)?
Your post is not precise enough, so maybe these questions will help you. What would I do? Write my own driver!
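As a sanity check on the numbers in the question, here is a small Python sketch of the arithmetic. It assumes plain single-bit SPI (one data bit per clock) and takes 4 MB to mean 4 MiB; the function names are only illustrative.

```python
def ideal_transfer_seconds(num_bytes, sck_hz):
    """Time to clock num_bytes over one data line with no gaps at all."""
    return num_bytes * 8 / sck_hz


def bus_utilization(num_bytes, sck_hz, measured_seconds):
    """Fraction of the measured time that was spent moving payload bits."""
    return ideal_transfer_seconds(num_bytes, sck_hz) / measured_seconds


if __name__ == "__main__":
    size = 4 * 1024 * 1024  # 4 MB from the question, taken as 4 MiB (assumption)
    sck = 5_000_000         # 5 MHz SPI clock from the question
    print(f"ideal: {ideal_transfer_seconds(size, sck):.2f} s")
    print(f"utilization at 40 s: {bus_utilization(size, sck, 40.0):.0%}")
```

Under these assumptions the ideal time is about 6.7 seconds, so a 40 second copy means payload bits are moving only about 17% of the time. That would point at time lost between or around the read calls rather than at the clock rate itself.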

Array of values loaded through UART in VHDL

I am working on a project in VHDL which includes multiplying matrices. I would like to be able to load data from a PC into arrays on the FPGA using UART. I am only taking my first bigger steps in VHDL and I am not sure if I am taking the right approach.
I wanted to declare an array of integer signals, and then implement a UART to receive data from the PC and load it into those signals. However, I can't use a for-loop for that, as it will be synthesised to load the data in parallel (which is impossible, because the values will be coming from the PC one after another over the serial port). And because the matrices may have various sizes, in order to assign the signals one by one I would need to write lots of specific code (which looks like bad practice to me).
Is the idea of using an array of signals and loading data into those signals through UART realizable? And if my approach is entirely wrong, how could I achieve that?
What you want is doable, but you will probably need to design a kind of hardware monitor to act as an intermediary between your UART and your storage (your array of integer signals). This hardware monitor will interpret commands coming from the UART and perform read/write operations in your storage. It will have one interface with the storage and another with the UART. You will have to define a kind of protocol, with a syntax for your commands and a sequence of operations for each command.
Example: the monitor waits for commands coming from the UART. The first received character indicates whether it is a read (0) or a write (1). The next four characters are the target address, least significant byte first. If the command is a read, the monitor reads the data at the specified address in your storage and sends it to the UART, one byte at a time, least significant byte first. If the command is a write, the address is followed by the data to write at the specified address, least significant byte first; your monitor waits until the data is received and then writes it to your storage.
Optionally, the monitor could send an exit status byte at the end of each command to indicate potential errors (protocol errors, unmapped addresses, write attempts in read-only regions...)
Of course, depending on the characteristics of your application, you will probably define a completely different protocol, simpler or more complex, but the principle will be the same.
All this is usually implemented in software and runs on a CPU that has the UART as peripheral and the storage in its memory space. But if you do not have a CPU...
Warning: this is quite complex. The UART itself is quite complex. Not sure you should start with this if you are a VHDL beginner.
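The example protocol above can be prototyped in software before writing any VHDL. The Python sketch below is only a behavioral model: the class name, the 4-byte data width, and the policy of silently dropping bad commands are assumptions, since the answer does not fix them.

```python
READ, WRITE = 0, 1
ADDR_BYTES = 4
DATA_BYTES = 4  # assumption: the answer does not specify the data width


class Monitor:
    """Byte-at-a-time command interpreter between the UART and the storage."""

    def __init__(self, depth):
        self.storage = [0] * depth
        self.rx = []  # bytes of the command received so far
        self.tx = []  # bytes queued for transmission back to the PC

    def receive(self, byte):
        """Call once for every byte delivered by the UART receiver."""
        self.rx.append(byte)
        cmd = self.rx[0]
        if cmd not in (READ, WRITE):
            self.rx.clear()  # unknown command: drop it (hypothetical policy)
            return
        needed = 1 + ADDR_BYTES + (DATA_BYTES if cmd == WRITE else 0)
        if len(self.rx) < needed:
            return  # command not complete yet, wait for more bytes
        addr = int.from_bytes(bytes(self.rx[1:1 + ADDR_BYTES]), "little")
        if addr >= len(self.storage):
            self.rx.clear()  # unmapped address: ignored in this sketch
            return
        if cmd == WRITE:
            data = int.from_bytes(bytes(self.rx[1 + ADDR_BYTES:]), "little")
            self.storage[addr] = data
        else:
            self.tx.extend(self.storage[addr].to_bytes(DATA_BYTES, "little"))
        self.rx.clear()
```

In hardware the `rx` list would become a small state machine with a byte counter, and `receive` would correspond to what happens on each "byte valid" strobe from the UART receiver.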
Your approach is not entirely wrong, but you have a software-oriented way of expressing it, which indicates you are missing the fundamentals. People with strong software backgrounds tend to think in terms of the programming language and not in terms of the actual FPGA-specific structures they want to achieve. It is important to unlearn this if you want to be successful in designing for FPGAs.
Based on what I just wrote, you should consider in what type of FPGA structure you would like to store the data. The speed, resource and power requirements govern this choice. One suitable way to store the data would be in a single Block RAM or LUTRAM, or in an array of them. Both of these structures can be inferred by using a signal of an array type in the hardware description language, which is why I said you are not entirely off track. Consult the manual of your synthesis tool to find templates for how to infer these structures. An alternative is to use a vendor IP block or to instantiate a primitive directly, but both of those methods are clumsier in my opinion.
Important parameters to consider are the total number of words you need to store, the size of a word, and the number of read/write operations per clock cycle. For a higher number of reads per cycle an array of memories must be used, since most FPGA memories only support two reads per cycle.
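As a rough illustration of the last point, the following Python sketch estimates how many copies of a memory are needed for a given number of reads per cycle. It assumes dual-port memories and that every write is broadcast to all copies so they stay identical; the function name and defaults are made up for the example.

```python
import math


def replicas_needed(reads_per_cycle, writes_per_cycle=1, ports_per_memory=2):
    """Number of identical memory copies needed to serve all reads each cycle.

    Ports used for writing are taken on every copy, because each copy
    must receive every write to stay in sync (assumption of this sketch).
    """
    read_ports = ports_per_memory - writes_per_cycle
    if read_ports <= 0:
        raise ValueError("no port left for reading")
    return math.ceil(reads_per_cycle / read_ports)
```

With one write and four reads per cycle this gives four copies; for read-only data (`writes_per_cycle=0`) both ports can read, so two copies are enough.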

Combining two wires in verilog

I'm designing a single-cycle CPU.
I have designed both the data path and the controller for this CPU.
Now I have encountered a problem.
The Instruction Memory and Data Memory have to be accessible from outside the CPU, since I need to write data to the IM and read data from the DM, and vice versa.
But the way I have designed my data path, these two memories are part of the data path.
Since writing to a memory requires an address and data, and in the data path there are already wires connected to these memories, I don't know how I should connect two wires to a single input/output.
For example, for writing to the IM, I provide the inputs "IM_address" and "IM_data_in".
But in the data path, the wires connected to the address input of this memory are outputs of other components, so I cannot assign the IM_address wire there, because it would have to be an input and an output at the same time.
I know that there is something called an "inout", but I'm not familiar with its usage, and I am also not sure that it applies to my situation.
If anybody could help me with this, I would very much appreciate it!
Thanks in advance.
Only one component can read or write any given memory location at a time. If two components ever need to access the same memory, you either need to duplicate the memory and give each component its own copy, or create an arbitration scheme to prevent both components from reading/writing at the same time.
It sounds to me like you need a multiplexer that selects who is able to write to the instruction memory at any given time. I would think, though, that you should only be writing to the instruction memory at initialization, to program your CPU. Why would other components need to access the instruction memory?
A Multiplexer, or mux for short, is able to select one of a number of inputs to a single output. The signal that does the selection needs to be set by you.
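To make the idea concrete, here is a tiny behavioral model in Python (not Verilog). The names `prog_mode` and `pc` are hypothetical; `im_address` is the external input from the question. In Verilog the same selection is usually written with the conditional operator in a continuous assignment.

```python
def mux2(select, when_0, when_1):
    """2-to-1 multiplexer: the select signal picks which input reaches the output."""
    return when_1 if select else when_0


def im_address_input(prog_mode, pc, im_address):
    """Address seen by the instruction memory.

    prog_mode = 1: the external IM_address drives the memory (programming).
    prog_mode = 0: the data path (here the program counter) drives it.
    """
    return mux2(prog_mode, pc, im_address)
```

The memory's address input then has exactly one driver, the mux output, and the controller decides through `prog_mode` which source is used.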

Design a 256x8 bit RAM using 64 rows and 32 columns programmatically using VHDL

I am new to VHDL programming and I am going to do a project on Built-In Self-Repair. In this project I am going to design RAMs of different sizes (256 B, 8 kB, 16 kB, 32 kB, etc.). Those RAMs have to be tested using BIST and then repaired. So please help me by giving an example of how to design a RAM with 'n' rows and columns.
Start by drawing a block diagram of the RAM at the level of abstraction you want (probably gate-level). Then use VHDL to describe the block diagram.
You should probably limit yourself to a behavioral description, i.e., don't expect to be able to synthesize it. Synthesis for FPGAs usually expects a register-transfer-level description, and synthesis for ASICs is not something I would recommend for a VHDL beginner.
I will assume you want to work with SRAM, since this is the simplest case. Also, let's suppose you want to model a RAM with RAM_DEPTH words, and each word is RAM_DATA_WIDTH bits wide. One possible approach is to structure your solution in three modules:
1) One module that holds the RAM bits. This module should have the typical ports for a RAM: clock, reset (optional), write_enable, data_in, data_out. Note that each RAM word should be wide enough to hold the data bits plus the parity bits, which are redundant bits that will allow you to correct any errors. You can read about Hamming codes used for memory correction here: http://bit.ly/1dKrjV5. You can see a RAM modeling example from Doulos here: http://bit.ly/1aq1tn9.
2) A second module that loops through all memory locations, fixing them as needed. This should happen right after reset. Note that this will probably take many clock cycles (at least RAM_DEPTH clock cycles). Also note that it won't be implemented as a loop in VHDL. You could implement it using a counter, then use the count value as a read address, pass the data value through an EDC function, and then write the corrected value back to the RAM module.
3) A top-level entity (optional) that instantiates modules (1) and (2) and coordinates the process. This module could have an 'init_done' pin that will be asserted after the verification and correction take place. This pin should be checked by the modules that use your RAM to know whether it is safe to start using the RAM.
To summarize, you could loop through all memory locations upon reset, fixing them as needed using an error-correcting code. After making sure all memory locations are ok, just assert an 'init_done' signal.
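The scrub pass described above can be tried out in software first. The Python sketch below is only a behavioral model: it uses a Hamming(7,4) code on 4-bit words to keep the example short (a real RAM word would need a wider code), it corrects single-bit errors only, and all function names are made up for the illustration.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # 1-based bit positions holding data
PARITY_POSITIONS = (1, 2, 4)    # 1-based bit positions holding parity


def syndrome(codeword):
    """XOR of the positions of all set bits; 0 for a valid codeword."""
    s = 0
    for pos in range(1, 8):
        if (codeword >> (pos - 1)) & 1:
            s ^= pos
    return s


def encode(nibble):
    """Place 4 data bits and add parity so that the syndrome becomes 0."""
    cw = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (nibble >> i) & 1:
            cw |= 1 << (pos - 1)
    s = syndrome(cw)
    for pos in PARITY_POSITIONS:
        if s & pos:
            cw |= 1 << (pos - 1)
    return cw


def correct(codeword):
    """Flip the bit the syndrome points at (single-bit errors only)."""
    s = syndrome(codeword)
    return codeword ^ (1 << (s - 1)) if s else codeword


def decode(codeword):
    """Extract the 4 data bits from a codeword."""
    return sum(((codeword >> (pos - 1)) & 1) << i
               for i, pos in enumerate(DATA_POSITIONS))


def scrub(ram):
    """Walk every address (the counter in hardware) and write back fixes."""
    fixed = 0
    for address in range(len(ram)):
        repaired = correct(ram[address])
        if repaired != ram[address]:
            ram[address] = repaired
            fixed += 1
    return fixed  # after this pass, 'init_done' could be asserted
```

In VHDL the `for` loop in `scrub` would turn into the address counter, with one read, one correction and one write-back per location spread over clock cycles.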

Allocate large (32 MB) contiguous region

Is it at all possible to allocate large (i.e. 32 MB) physically contiguous memory regions from kernel code at runtime (i.e. not using bootmem)? From my experiments, it seems it's not possible to get anything more than a 4 MB chunk successfully, no matter what GFP flags I use. According to the documentation I've read, GFP_NOFAIL is supposed to make kmalloc wait as long as is necessary to free the requested amount, but from what I can tell it just makes the request hang indefinitely if you request more than is available. It doesn't seem to be actively trying to free memory to fulfil the request (i.e. kswapd doesn't seem to be running). Is there some way to tell the kernel to aggressively start swapping stuff out in order to satisfy the requested allocation?
Edit: So I see from Eugene's response that it's not going to be possible to get a 32 MB region from a single kmalloc... but is there any possibility of getting it done in a more hackish kind of way? Like identifying the largest available contiguous region, then manually migrating/swapping away the data on either side of it?
Or how about something like this:
1) Grab a bunch of 4mb chunks until you're out of memory.
2) Check them all to see if any of them happen to be contiguous, if so,
combine them.
3) kfree the rest
4) goto 1)
Might that work, if given enough time to run?
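Step 2 of the idea above, checking which chunks happen to be adjacent, can be modeled outside the kernel. The Python sketch below works on a plain list of chunk base addresses; it is not kernel code, and the function names are made up for the illustration.

```python
def contiguous_runs(base_addresses, chunk_size):
    """Group equally sized chunks into runs of physically adjacent chunks.

    Returns a list of (start_address, length_in_bytes) tuples.
    """
    runs = []
    for base in sorted(base_addresses):
        if runs and runs[-1][0] + runs[-1][1] == base:
            start, length = runs[-1]
            runs[-1] = (start, length + chunk_size)  # extend the current run
        else:
            runs.append((base, chunk_size))          # start a new run
    return runs


def largest_run(base_addresses, chunk_size):
    """Longest contiguous run, or None if there are no chunks."""
    return max(contiguous_runs(base_addresses, chunk_size),
               key=lambda run: run[1], default=None)
```

This only shows the bookkeeping; whether repeating the grab-and-check cycle ever produces a 32 MB run depends entirely on how fragmented physical memory is.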
You might want to take a look at the Contiguous Memory Allocator patches. Judging from the LWN article, these patches are exactly what you need.
Mircea's link is one option; if you have an IOMMU on your device you may be able to use that to present a contiguous view over a set of non-contiguous pages in memory as well.

Resources