How to get an RGB picture into an FPGA most efficiently, using Verilog

I am trying to write Verilog code for an FPGA, where I will implement a VGA application. I use Quartus II and an Altera DE2.
At the moment, my aim is to get a 640x480 RGB image into the design at compile time (the method doesn't matter as long as it works and is efficient). The best solution I have come up with is to convert the picture into RGB hex files using MATLAB and to use $readmemh to load them into a register.
But as discussed here: verilog $readmemh takes too much time for 50x50 pixel rgb image
it takes too much time, and apparently there is no way around that with this method. It would be fine if time were the only issue, but there is also a size problem: a 640x480 image takes up most of the free space.
What I am hoping for is some Verilog system function or variable type that will take and store the picture in a different way, so that size is no longer a problem. I have checked solutions for Verilog and the Quartus web page, but I believe there should be a faster way to do this general task than writing something from scratch.
Compilation report for the 200x200 $readmemh attempt:

Based on your compilation report, I'd recommend using a block ROM (or RAM) instead of registers to store your image.
At the moment you're using distributed RAM, i.e. the memory that is available inside each of the FPGA's small logic blocks. This makes distributed RAM ideal for small memories. For large memories, however, it can cause extra wiring delays and increase synthesis time (the synthesiser needs to wire all of these blocks together).
On the other hand, a block RAM is a dedicated two-port memory containing several kilobits of RAM (the exact amount depends on your device and manufacturer). That's why you should use block RAM for large memories, and distributed RAM for FIFOs or small memories. The Cyclone IV EP4CE115F29 (available on the DE2-115) has 432 M9K memory blocks (3981312 memory bits).
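As a rough sanity check of whether a whole frame fits, the arithmetic can be sketched in Python. The block count is the figure quoted above; the colour depths tried (24 and 8 bits per pixel) are assumptions for illustration, since the question does not say how many bits per pixel are needed. This compares raw bit counts only; whether a given width and depth really fits depends on how the tool maps it onto the blocks.

```python
M9K_BLOCKS = 432           # figure quoted above for the EP4CE115F29
BITS_PER_M9K = 9 * 1024    # 9216 bits per block, 432 * 9216 = 3981312

def frame_bits(width, height, bits_per_pixel):
    """Raw number of bits needed to store one uncompressed frame."""
    return width * height * bits_per_pixel

def fits_raw(width, height, bits_per_pixel):
    """True if the raw bit count is within the total block RAM bits.

    This ignores block granularity, so it is an optimistic estimate.
    """
    return frame_bits(width, height, bits_per_pixel) <= M9K_BLOCKS * BITS_PER_M9K

def addr_width(depth):
    """Smallest ADDR_WIDTH such that 2**ADDR_WIDTH >= depth."""
    return (depth - 1).bit_length()

if __name__ == "__main__":
    for bpp in (24, 8):  # assumed colour depths
        print(bpp, "bpp:", frame_bits(640, 480, bpp), "bits,",
              "fits" if fits_raw(640, 480, bpp) else "does not fit")
    print("address bits for 640*480 words:", addr_width(640 * 480))
```

On these numbers a 24-bit 640x480 frame needs 7372800 bits, which is more than the 3981312 bits available, so some reduction in resolution or colour depth would be needed even with block RAM.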
One important thing: the READ operation is asynchronous for distributed RAM (data is read from memory as soon as the address is given, without waiting for a clock edge), but synchronous for block RAM.
An example of a single-port ROM (Quartus II Verilog template):
module single_port_rom
#(parameter DATA_WIDTH=8, parameter ADDR_WIDTH=8)
(
    input [(ADDR_WIDTH-1):0] addr,
    input clk,
    output reg [(DATA_WIDTH-1):0] q
);
    // Declare the ROM variable
    reg [DATA_WIDTH-1:0] rom[2**ADDR_WIDTH-1:0];

    // Load the ROM contents from a hex text file
    initial
    begin
        $readmemh("single_port_rom_init.txt", rom);
    end

    // Synchronous read: q is updated on the rising clock edge
    always @ (posedge clk)
    begin
        q <= rom[addr];
    end
endmodule
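One way the init file for such a ROM could be produced is sketched below in Python rather than MATLAB. It assumes each pixel is packed into a single 24-bit word (so DATA_WIDTH would be 24, not the template's default of 8) and writes one hex word per line, a layout $readmemh accepts. The function names are made up for illustration.

```python
def pack_rgb(r, g, b):
    """Pack three 8-bit channels into one 24-bit word, red in the top byte."""
    return (r << 16) | (g << 8) | b

def readmemh_text(pixels):
    """Return the file contents: one 6-digit hex word per line."""
    return "".join(f"{pack_rgb(r, g, b):06x}\n" for r, g, b in pixels)

def write_init_file(path, pixels):
    """Write the pixels (in raster order) to a $readmemh text file."""
    with open(path, "w") as f:
        f.write(readmemh_text(pixels))

if __name__ == "__main__":
    # Tiny hypothetical 2x2 image in raster order.
    demo = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (0x12, 0x34, 0x56)]
    print(readmemh_text(demo), end="")
```

The pixel at column x and row y of a 640-wide image would then sit at address y * 640 + x.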

Related

STM32F411 I need to send a lot of data by USB with high speed

I'm using an STM32F411 with the USB CDC library, and the maximum speed for this library is ~1Mb/s.
I'm creating a project where I have 8 microphones connected to ADC lines (this part works fine). I need a 16-bit signal, so I'm increasing the resolution by summing 16 consecutive samples from one line (the ADC gives only a 12-bit signal). In my project, I need 96k 16-bit samples per line, so that's 0.768M samples for all 8 lines. This signal needs 12000Kb of space, but the STM32 has only 128Kb of SRAM, so I decided to send about 120 blocks of 100Kb of data each in one second.
The conclusion is that I need ~11.72Mb/s to send this.
The problem is that I'm unable to do that because CDC USB limited me to ~1Mb/s.
The question is how to increase the USB speed to 12Mb/s on the STM32F4. I need a pointer or a library.
Or maybe should I set up "audio device" in CubeMX?
If the small b means byte in your question, the answer is: it is not possible, as your micro has FS USB, whose maximum speed is 12M bits per second.
If it means bits, your 1Mb (bit) speed assumption is wrong. But you will not reach 12M bits per second of payload transfer.
You may try to write your own class (only if b means bit), but I'm afraid you will not find a ready-made library. You will also need to write the device driver on the host computer.
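The numbers in the question can be checked with a few lines of Python. The sample counts are the asker's. The limit of 19 bulk packets of 64 bytes per 1 ms frame is an assumption taken from the commonly quoted theoretical ceiling for full-speed bulk transfers; real throughput is usually lower.

```python
# Figures taken from the question.
SAMPLES_PER_LINE_PER_S = 96_000
LINES = 8
BITS_PER_SAMPLE = 16

# Full-speed USB signalling rate: raw bits on the wire, not payload.
FS_SIGNALLING_BPS = 12_000_000

# Assumption: theoretical ceiling for full-speed bulk transfers.
BULK_PACKETS_PER_FRAME = 19
BULK_PACKET_BYTES = 64
FRAMES_PER_S = 1_000

def required_bps():
    """Payload bit rate needed to stream every sample."""
    return SAMPLES_PER_LINE_PER_S * LINES * BITS_PER_SAMPLE

def bulk_ceiling_bps():
    """Best-case bulk payload bit rate on a full-speed link."""
    return BULK_PACKETS_PER_FRAME * BULK_PACKET_BYTES * 8 * FRAMES_PER_S

if __name__ == "__main__":
    print("required payload  :", required_bps(), "bit/s")
    print("FS signalling rate:", FS_SIGNALLING_BPS, "bit/s")
    print("bulk ceiling      :", bulk_ceiling_bps(), "bit/s")
```

The required 12288000 bit/s is above even the raw signalling rate, which is why the data would have to be reduced before a full-speed link could carry it.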

What does a multiplexer do in CPU?

I designed a simple ALU, and I generated "operation codes" using a decoder. Now I'm studying multiplexers, but I can't understand what they do in a CPU or ALU.
A really simple example: if you want to fetch a data bit from memory, a multiplexer allows you to specify an address (the input code), and the memory bit will be connected to another "pin".
So say you have 256 bits of memory, and you want to connect this to an output pin: the multiplexer has 8 bits for input codes. You provide a code, say N, and bit N is connected through the logic gates to the output of the multiplexer. This multiplexer would have a total of 256 + 8 input lines.
I'm not sure how this would be implemented in more modern CPUs but you can probably see how several bit multiplexers could be stacked together and be used to fetch a byte from memory in parallel as well, and connected to say an arithmetic register to perform computations.
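A small behavioural model in Python may make this concrete. It is only a sketch: the function names are invented, and the loops stand in for what would be parallel AND/OR/NOT gates in hardware.

```python
def mux_bit(inputs, select):
    """N-to-1 multiplexer for single bits, written as AND/OR/NOT logic.

    Each input is ANDed with an enable term that is 1 only when the
    select code equals that input's index; the results are ORed together.
    """
    n_sel = (len(inputs) - 1).bit_length()  # number of select lines
    out = 0
    for index, bit in enumerate(inputs):
        enable = 1
        for k in range(n_sel):
            s = (select >> k) & 1
            enable &= s if (index >> k) & 1 else 1 - s
        out |= bit & enable
    return out

def mux_byte(words, select):
    """Eight 1-bit multiplexers side by side, sharing the same select code."""
    out = 0
    for lane in range(8):
        lane_bits = [(w >> lane) & 1 for w in words]
        out |= mux_bit(lane_bits, select) << lane
    return out

if __name__ == "__main__":
    memory = [(i * 7) % 2 for i in range(256)]  # 256 one-bit cells
    print(mux_bit(memory, 37) == memory[37])
    print(hex(mux_byte([0x11, 0x22, 0x33, 0x44], 2)))
```

With 256 inputs, `mux_bit` needs 8 select lines, matching the 256 + 8 input lines described above.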
Fun right?!

Xilinx FPGA output to output timing constraints

I have a Spartan-6/ISE design where I'm generating 8-bit data @ 70MHz to feed the FIFO of a Cypress FX3 USB3 controller. I also generate a 70MHz o/p clock and /WR strobe that clock data into the USB controller. The 70MHz is derived from halving the 140MHz system clock, divided by 2 in a process rather than using a DPLL, though the 140MHz system clock is produced using a DPLL.
I want to ensure the 8-bit data meets the set-up & hold time requirements of the USB controller and, although the data, o/p clock and /WR are derived from the 140MHz, I don't really care about their relationship to it. What I'm really concerned about is ensuring the set-up & hold times for data & /WR w.r.t the 70MHz o/p clock are within the USB controller limits.
My question is: how do I go about specifying timing constraints between FPGA outputs rather than w.r.t. the internal system clock?
Thanks
Dave

What makes a CPU architecture "X-bit"?

Warning: I'm not sure where this type of question belongs. If you know a better place for it, drop a link.
Background: Imagine you heard a sentence like this: "this computer/processor has X-bit architecture". Now, if that computer is standard, you get a lot of information, like maximum RAM capacity, maximum unsigned/signed integer value and so on... But what if the computer is not standard?
The mystery: back to the 70s and 80s, the period referred to as the "8-bit era". Wait, 8-bit? Yes. So, if a CPU architecture is 8-bit, then:
The maximum RAM capacity of the computer is exactly 256 bytes.
The maximum UInt range is from 0 to 255 and the maximum signed integer range is -128 to 127.
The maximum ROM capacity is also 256 bytes, because you have to be able to jump around?
However, it's clearly not like that. Look at some technical characteristics of game consoles of that time and you will see that those exceed the 256 limit.
Quotes (http://www.8bitcomputers.co.uk/whatbasics.html):
The Sharp PC1211 is actually a 4-bit computer but cleverly glues two together to look like 8 (a computer able to add up to 16 would not be very useful!)
So if it's a 4-bit computer, why can it manipulate 8-bit integers? And another one...
The Sinclair QL is one of those computers that actually leaves the experts arguing. In parts, it is a 16 bit computer, in some ways it is even like a 32 bit computer but it holds its memory in 8 bits.
What? So why is this mess in www.8bitcomputers.co.uk?
Generally: how is an X-bit computer defined?
The biggest data bus that it has is X bits long (then Sinclair QL is a 32-bit computer)?
The CU functions of that computer are X bits long?
It holds its memory (in registers, ROM, RAM, whatever) in 8 bits?
Other definitions?
Purpose: I think that what I am designing is a 4-bit CPU. I don't really know if it has a 4-bit architecture, because it uses double ROM address, and includes functions like "activate ALU" that take another 4 bits from register Y. I want to know if I can still call it a 4-bit CPU. That's it!
Thank you very much in advance :)
An X-bit computer (or CPU) is defined by whether its central units and registers, such as the CPU registers and the ALU, are X bits wide. The addressing doesn't matter in defining the number X. As you have mentioned, an 8-bit computer (e.g. the Motorola 68HC11; even though it is an MCU, it can still be counted as a computer with CPU, I/O and memory) can have 16-bit addressing in order to increase the RAM or memory size.
The data bus size and the register sizes of the CPU and ALU are the limiting factors in defining the number X in an X-bit computer architecture. You can get more information from http://en.wikipedia.org/wiki/Word_(computer_architecture)
An answer to your question would be: "Yes, you are designing a 4-bit CPU if the registers and data bus are 4 bits wide."
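To illustrate why the data width does not cap the address range or the size of the numbers handled, here is a hedged sketch in Python. It models an adder of a fixed width and shows an 8-bit sum built from two 4-bit additions chained through the carry, which is one plausible reading of the "glues two together" quote; the function names are invented.

```python
def add_nbit(a, b, carry_in, bits):
    """Add two values on a `bits`-wide ALU; return (result, carry_out)."""
    total = a + b + carry_in
    return total & ((1 << bits) - 1), total >> bits

def add8_on_4bit_alu(a, b):
    """8-bit addition done as two passes through a 4-bit adder."""
    low, carry = add_nbit(a & 0xF, b & 0xF, 0, 4)     # low nibbles first
    high, carry = add_nbit(a >> 4, b >> 4, carry, 4)  # high nibbles plus carry
    return (high << 4) | low, carry

def addressable_bytes(address_bits):
    """Bytes reachable with a given address width, independent of data width."""
    return 1 << address_bits

if __name__ == "__main__":
    print(add8_on_4bit_alu(0x3C, 0x5A))  # 60 + 90
    print(addressable_bytes(16))         # 8-bit CPU with 16-bit addressing
```

The same chaining is how 8-bit CPUs handle 16-bit or larger integers: more passes through the same narrow ALU, not a wider one.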

Xilinx True Dual Ported RAM with different aspect ratios on two ports

I am trying to build a RAM block in Verilog with the following configuration:
Port A: 128 bit wide, with clk_a, sees RAM block as 128 bit wide times 128 lines deep
Port B: 32 bit wide with clk_b, sees RAM block as 32 bit wide times 512 lines deep
Do not worry about READ-WRITE serialization and mutexing, I will be taking care of it with a layer above that.
Basically, the code that generates the 128 bit times 128 lines looks like:
reg [DATA_WIDTH-1:0] mem [0:2**ADDRESS_WIDTH-1];
Now, if I want it to look like 32 bit times 512 deep, how do I refactor this memory to look different (kind of like a recast in C) ? I understand that I might be able to do this with 32 bit word enable(s), but I am trying to see if there is a cleaner way to achieve this.
Let me know what you think ?
RRS
Correction: I am referring to Xilinx BRAM (BRAMs can't be 512 deep). But this is essentially a memory block with the glue logic chaining multiple BRAMs. Thanks for pointing that out!
I solved it this way:
In ISE, I was able to find "Language Templates" in one of the menus which has actual code samples. There is one with "File I/O", that one works perfectly.
You can also build a wrapper module around the dual port RAM which will change data widths on the other side. On the smaller data-width port (i.e. more address lines) you can use the lower address bits as a word select system allowing you to write to part of a memory line. This synthesizes properly for me (check your synthesis tool).
For instructions on exactly how to do this, see the Xilinx documentation. For example, http://www.xilinx.com/support/documentation/sw_manuals/xilinx13_2/xst_v6s6.pdf starting at page 217 gives explicit VHDL and Verilog examples of how to do what you are asking.
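The address mapping behind the wrapper idea can be modelled in a few lines of Python. This is a behavioural sketch only, not synthesizable code and not the Xilinx template: the class name is invented, and it assumes that word 0 of each line sits in the least significant 32 bits.

```python
LINE_BITS = 128   # port A width
WORD_BITS = 32    # port B width
WORDS_PER_LINE = LINE_BITS // WORD_BITS   # 4
DEPTH_A = 128     # lines seen by port A, so port B sees 512 words
WORD_MASK = (1 << WORD_BITS) - 1

class AsymmetricRam:
    """One storage array viewed as 128x128 on port A and 512x32 on port B."""

    def __init__(self):
        self.lines = [0] * DEPTH_A

    def write_a(self, addr, data):
        self.lines[addr] = data & ((1 << LINE_BITS) - 1)

    def read_a(self, addr):
        return self.lines[addr]

    @staticmethod
    def split_b(addr_b):
        """Upper address bits pick the line, the low 2 bits pick the word."""
        return addr_b >> 2, addr_b & (WORDS_PER_LINE - 1)

    def write_b(self, addr_b, data):
        line, word = self.split_b(addr_b)
        shift = word * WORD_BITS
        cleared = self.lines[line] & ~(WORD_MASK << shift)  # word enable
        self.lines[line] = cleared | ((data & WORD_MASK) << shift)

    def read_b(self, addr_b):
        line, word = self.split_b(addr_b)
        return (self.lines[line] >> (word * WORD_BITS)) & WORD_MASK

if __name__ == "__main__":
    ram = AsymmetricRam()
    ram.write_a(3, 0x00112233_44556677_8899AABB_CCDDEEFF)
    print(hex(ram.read_b(12)), hex(ram.read_b(15)))
```

In hardware, `split_b` corresponds to routing the low address bits of port B to a word select and the remaining bits to the line address.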

Resources