What does a multiplexer do in a CPU?

I designed a simple ALU, and I generated its "operation codes" using a decoder. Now I'm studying multiplexers, but I can't understand what they do in a CPU or ALU.

A really simple example: if you want to fetch a data bit from memory, a multiplexer allows you to specify an address (the input code), and that memory bit gets connected to another "pin".
So say you have 256 bits of memory and you want to connect one of them to an output pin; the multiplexer then has 8 input-code bits. You provide a code, say N, and bit N is connected through the logic gates to the output of the multiplexer. This multiplexer would have a total of 256 + 8 input lines.
I'm not sure how this is implemented in more modern CPUs, but you can probably see how several one-bit multiplexers could be stacked together to fetch a byte from memory in parallel as well, and connected to, say, an arithmetic register to perform computations.
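Here is a minimal Verilog sketch of the idea (the module and signal names are hypothetical, and a real memory uses address decoders rather than one giant mux): a 256-to-1 bit multiplexer with an 8-bit select code.

    module bit_mux_256to1 (
        input  wire [255:0] mem_bits, // the 256 stored bits
        input  wire [7:0]   sel,      // the 8-bit "input code" N
        output wire         out       // the selected bit appears here
    );
        // Bit number 'sel' is routed to the output; the synthesiser
        // expands this into a tree of logic gates.
        assign out = mem_bits[sel];
    endmodule

Stacking eight of these side by side (one per bit position, all sharing the same select code) would fetch a whole byte in parallel.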
Fun right?!


How to get a rgb picture into FPGA most efficiently, using verilog

I am trying to write Verilog code for an FPGA design in which I will implement a VGA application. I use Quartus II and an Altera DE2.
At the moment, my aim is to get a 640x480 RGB image into the design at compile time (the method doesn't matter as long as it works and is efficient). The best solution I came up with is to convert the picture into RGB hex files using MATLAB and use $readmemh to load them into a register.
But as discussed here: verilog $readmemh takes too much time for 50x50 pixel rgb image
it takes too much time, and apparently there is no way around that with this method. It would be fine if it were only the time, but there is also a size problem: 640x480 pretty much costs most of the free space.
What I am hoping for is some Verilog system function or data type that will take the picture and store it in a different way, so that size is no longer a problem. I have checked solutions on the Verilog and Quartus web pages, but I believe there should be a faster way to do this fairly general task than writing something from scratch.
(A compilation report for a 200x200 $readmemh attempt was attached.)
Based on your compilation report, I'd recommend using a block ROM (or RAM) instead of registers to store your image.
At the moment you're using distributed RAM, i.e. the memory that is available inside each of the FPGA's small logic blocks. That makes distributed RAM ideal for small memories, but for large memories it can add extra wiring delay and increase synthesis time (the synthesiser needs to wire up all of those blocks).
A block RAM, on the other hand, is a dedicated dual-port memory containing several kilobits (depending on your device and manufacturer) of RAM. That's why you should use block RAM for large memories and distributed RAM for FIFOs or small memories. The Cyclone IV EP4CE115F29 (available on the DE2-115) has 432 M9K memory blocks (3,981,312 memory bits).
One important thing: the read operation is asynchronous for distributed RAM (data is read as soon as the address is applied, without waiting for a clock edge) but synchronous for block RAM.
An example of a single-port ROM (Quartus II Verilog template):
module single_port_rom
#(parameter DATA_WIDTH=8, parameter ADDR_WIDTH=8)
(
    input [(ADDR_WIDTH-1):0] addr,
    input clk,
    output reg [(DATA_WIDTH-1):0] q
);

    // Declare the ROM variable
    reg [DATA_WIDTH-1:0] rom[2**ADDR_WIDTH-1:0];

    // Initialize the ROM contents from a hex file
    initial
    begin
        $readmemh("single_port_rom_init.txt", rom);
    end

    // Synchronous read, so the synthesiser can infer a block ROM
    always @ (posedge clk)
    begin
        q <= rom[addr];
    end

endmodule

AXI4 (Lite) Narrow Burst vs. Unaligned Burst Clarification/Compatibility

I'm currently writing an AXI4 master that is supposed to support AXI4 Lite (AXI4L) as well.
My AXI4 master is receiving data from a 16-bit interface. This is on a Xilinx Spartan 6 FPGA and I plan on using the EDK AXI4 Interconnect IP, which has a minimum WDATA width of 32 bits.
At first I wanted to use narrow burst, i.e. AWSIZE = x"01" (2 bytes in transfer). However, I found that Xilinx' AXI Reference Guide UG761 states "narrow bursts [are] supported but [...] not recommended." Unaligned transactions are supposed to be supported.
This had me thinking. Say I start an unaligned burst:
AWLEN = x"01" (2 beats)
AWSIZE = x"02" (4 bytes in transfer")
And do the following:
AX (32-bit word #0: send hi16)
XB (32-bit word #1: send lo16)
Where A, B are my 16 bit words that start off at an unaligned (2-byte aligned) address. X means WSTRB is deasserted for the indicated 16 bit.
Is this supported, or does this fall under the category "narrow burst" even though AWSIZE = x"02" (4 bytes in transfer) as opposed to AWSIZE = x"01" (2 bytes in transfer)?
Now, if this was just for AXI4, I would probably not care as much about this use case, because AXI4 peripherals are required to use the WSTRB signals. However, the AXI Reference Guide UG761 states "[AXI4L] Slaves interface can elect to ignore WSTRB (assume all bytes valid)."
I read here that many (but not all; and there is no list?) Xilinx AXI4L peripherals do elect to ignore WSTRB.
Does this mean that I'm essentially barred from doing narrow bursts ("not recommended") as well as unaligned bursts ("WSTRB can be ignored"), or is there an easy way to offload some of the implementation work from my master onto the interconnect, guaranteeing proper system behavior when accessing AXI4L peripherals?
Your example is not a narrow burst, and should work.
The reason narrow bursts are not recommended is that they give sub-optimal performance. Both narrow bursts and data realignment cost area and are not recommended IMHO. However, DRE has a minimal bandwidth cost, while narrow bursts have a real one: if your AXI port is 100 MHz x 32 bits, you have 3.2 Gbit/s maximum throughput; if you use 16-bit narrow bursts 50% of the time, your maximum throughput drops to 2.4 Gbit/s (32 bits x 50 MHz + 16 bits x 50 MHz). Also, I'm not sure AXI-Lite supports narrow bursts or data realignment.
Your example has two major flaws. First, it requires 3 data beats to transfer 32 bits, which is worse than a narrow burst (I don't think AXI is smart enough to cancel the last beat when WSTRB is all zeros). Second, you can't burst more than two 16-bit words at a time, which will hurt your AXI infrastructure's performance if you have a lot of data to transfer.
The best way to deal with this is to concatenate pairs of 16-bit words into 32-bit words in your block, buffer those 32-bit words, and burst them out once you have enough. That is the high-performance AXI way to do it.
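A rough sketch of that concatenate-and-buffer idea, assuming a simple valid/ready handshake on both sides (module, signal names and the endianness are mine, not from any Xilinx IP; the 32-bit side would feed your write-data buffer or AXI write channel):

    module upsize_16_to_32 (
        input  wire        clk,
        input  wire        rst,
        // 16-bit input side
        input  wire [15:0] in_data,
        input  wire        in_valid,
        output wire        in_ready,
        // 32-bit output side
        output reg  [31:0] out_data,
        output reg         out_valid,
        input  wire        out_ready
    );
        reg        have_low;   // low half-word already captured
        reg [15:0] low_half;

        // accept new input whenever the output register is free or being drained
        assign in_ready = !out_valid || out_ready;

        always @(posedge clk) begin
            if (rst) begin
                have_low  <= 1'b0;
                out_valid <= 1'b0;
            end else begin
                if (out_valid && out_ready)
                    out_valid <= 1'b0;                    // 32-bit word consumed
                if (in_valid && in_ready) begin
                    if (!have_low) begin
                        low_half <= in_data;              // first 16-bit word
                        have_low <= 1'b1;
                    end else begin
                        out_data  <= {in_data, low_half}; // second word completes 32 bits
                        out_valid <= 1'b1;
                        have_low  <= 1'b0;
                    end
                end
            end
        end
    endmodule

From there you can issue full-width, aligned bursts with WSTRB all ones, which sidesteps the WSTRB question for AXI4-Lite slaves that ignore it.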
However, since you receive data as 16-bit words, you would probably be better off using AXI-Stream, which supports 16-bit data but has no notion of addresses. You can map an AXI-Stream onto AXI4 using Xilinx's IP cores; either AXI Datamover or AXI DMA can do that. Both do the same thing (in fact, AXI DMA includes a Datamover), but AXI DMA is controlled through an AXI-Lite interface while the Datamover is controlled through additional AXI-Streams.
As a final note, the Xilinx cores never require narrow bursts or DRE. If you need DRE in AXI DMA, it's done by the AXI DMA core and not by the AXI Interconnect. Also, these cores ship as clear (unencrypted) source, so you can easily check how they operate.

How do bits become a byte? [closed]

Possibly the most basic computer question of all time, but I have not been able to find a straightforward answer and it is driving me crazy. When a computer 'reads' a byte, does it read it as a sequential series of ones and zeros one after the other, or does it somehow read all 8 ones and zeros at once?
A computer system reads data in both ways, depending on the type of operation and on how the digital system is designed. I'll explain this with a very simple example: a full adder circuit.
A full adder adds binary numbers and accounts for values carried in as well as out (Wikipedia)
Example of Parallel operation
Suppose some task requires us to add two 8-bit (1-byte) numbers and all the bits are available at the time of the addition.
In that case we can design a digital system with 8 full adders (one for each bit).
Example of Serial Operation
In some other task you observe that all 8 bits will not be available simultaneously.
Or you decide that having 8 separate adders is too costly, because you also need to implement other mathematical operations (like subtraction, multiplication and division). So instead of 8 separate units you have one unit that processes the bits individually. In this scenario we need three storage units (shift registers): two of them store the two 8-bit numbers and one stores the result. On each clock pulse a single bit is transmitted from each of the two operand registers to the full adder, which performs the addition and transfers the 1-bit result to the result shift register within that clock pulse.
The referenced figure contains some additional detail that is not needed for this thread, but study digital logic design and computer architecture if you want to go deeper into this.
(Figures: shift register; shift register operations demo.)
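To make the two styles concrete, here is a small Verilog sketch (module names and widths are mine): a parallel 8-bit ripple-carry adder built from eight full adders, and a serial adder that reuses a single full adder over eight clock cycles.

    module full_adder (input a, b, cin, output sum, cout);
        assign {cout, sum} = a + b + cin;
    endmodule

    // Parallel: eight full adders, result available in one combinational pass
    module adder8_parallel (input [7:0] a, b, output [7:0] sum, output cout);
        wire [8:0] c;
        assign c[0] = 1'b0;
        genvar i;
        generate
            for (i = 0; i < 8; i = i + 1) begin : fa
                full_adder u (.a(a[i]), .b(b[i]), .cin(c[i]), .sum(sum[i]), .cout(c[i+1]));
            end
        endgenerate
        assign cout = c[8];
    endmodule

    // Serial: one full adder plus shift registers, one bit per clock, 8 clocks per byte
    module adder8_serial (
        input            clk, load,
        input      [7:0] a_in, b_in,
        output reg [7:0] sum
    );
        reg [7:0] a, b;
        reg       carry;
        wire      s, c;
        full_adder u (.a(a[0]), .b(b[0]), .cin(carry), .sum(s), .cout(c));
        always @(posedge clk) begin
            if (load) begin
                a <= a_in; b <= b_in; carry <= 1'b0;  // load operands, clear carry
            end else begin
                a     <= a >> 1;                      // shift out the bit just added
                b     <= b >> 1;
                sum   <= {s, sum[7:1]};               // shift the result in, LSB first
                carry <= c;                           // carry feeds the next bit
            end
        end
    endmodule

The parallel version gives the answer in one pass through the gates; the serial version needs eight clocks but uses only one adder.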
This is really kind of outside the scope of Stackoverflow, but it brings back such fond memories from college.
It depends. Sometimes a computer reads bits one at a time; for example, older Ethernet uses Manchester coding on the wire. Over old parallel printer cables, however, there were 8 data pins, each signaling one bit, and an entire octet (byte) was sent at once.
In serial (one-bit-at-a-time) encodings, you're typically measuring transitions in the line or transitions against some well-defined clock source.
In parallel encodings, you're typically reading all the bits into a register at a time and latching the register.
Look up flipflops, registers, and logic gates for information on the low-level parts of this.
Bits are transmitted one at a time in serial transmission, and multiple numbers of bits in parallel transmission. A bitwise operation optionally processes bits one at a time. Data transfer rates are usually measured in decimal SI multiples of the unit bit per second (bit/s), such as kbit/s.
(Wikipedia's article on Bit)
The processor works with a defined register length: 8, 16, 32, 64 ... bits. Think of a register as a bundle of connections, one for each bit; that is the number of bits processed at once in one processor core, one register at a time. The processor has different kinds of registers, for example the private instruction register or the public data and address registers.
Think of it this way, at least at a physical level: In a transmission cable from point A to B (A and B can be anything, hard drive, CPU, RAM, USB, etc.) each wire in that cable can transmit one bit at a time. Both A and B have a clock pulsing at the same rate. On each pulse, the sender changes the amount of power going down each wire to signify the value of the new bit(s). So, the # of wires in the cable = the # of bits that can be transmitted each "pulse". (Note: This is a very simplified and theoretical explanation).
At a software level, in the CPU, you can never address anything smaller than a byte. You can "access" and manipulate specific bits by using the bitwise operators (& (AND), | (OR), << (Left Shift), >> (Right Shift), ^ (XOR)).
In hardware, the number of bits being sent each pulse is completely dependent on the hardware itself.

What makes a CPU architecture "X-bit"?

Warning: I'm not sure where this type of question belongs. If you know a better place for it, drop a link.
Background: Imagine you heard a sentence like this: "this computer/processor has X-bit architecture". Now, if that computer is standard, you get a lot of information, like maximum RAM capacity, maximum unsigned/signed integer value and so on... But what if computer is not standard?
The mystery: back in the '70s and '80s, the period referred to as the "8-bit era". Wait, 8-bit? Yes. So, if a CPU architecture is 8-bit, then:
The maximum RAM capacity of the computer is exactly 256 bytes.
The maximum unsigned integer range is 0 to 255 and the maximum signed integer range is -128 to 127.
The maximum ROM capacity is also 256 bytes, because you have to be able to jump around?
However, it's clearly not like that. Look at some technical characteristics of game consoles of that time and you will see that those exceed the 256 limit.
Quotes (http://www.8bitcomputers.co.uk/whatbasics.html):
The Sharp PC1211 is actually a 4-bit computer but cleverly glues two together to look like 8 (a computer able to add up to 16 would not be very useful!)
So if it's a 4-bit computer, why can it manipulate 8-bit integers? And another one...
The Sinclair QL is one of those computers that actually leaves the experts arguing. In parts, it is a 16 bit computer, in some ways it is even like a 32 bit computer but it holds its memory in 8 bits.
What? So why is this mess in www.8bitcomputers.co.uk?
Generally: how is an X-bit computer defined?
The biggest data bus that it has is X bits long (then Sinclair QL is a 32-bit computer)?
The CU functions of that computer are X bits long?
It holds its memory (in registers, ROM, RAM, whatever) in 8 bits?
Other definitions?
Purpose: I think that what I am designing is a 4-bit CPU. I don't really know if it has a 4-bit architecture, because it uses double ROM address, and includes functions like "activate ALU" that take another 4 bits from register Y. I want to know if I can still call it a 4-bit CPU. That's it!
Thank you very much in advance :)
Whether a computer (or CPU) is X-bit is defined by whether its central units and registers, such as the CPU datapath and ALU, are X bits wide. The addressing doesn't matter in defining the number X. As you mentioned, an 8-bit computer (e.g. the Motorola 68HC11; even though it is an MCU, it can still be counted as a computer with a CPU, I/O and memory) can have 16-bit addressing in order to increase the RAM or memory size.
The data-bus size and the register sizes of the CPU and ALU are the limiting factors that define the X in an X-bit computer architecture. You can get more information from http://en.wikipedia.org/wiki/Word_(computer_architecture)
So the answer to your question is: yes, you are designing a 4-bit CPU if its registers and data bus are 4 bits wide.

Xilinx True Dual Ported RAM with different aspect ratios on two ports

I am trying to build a RAM block in Verilog with the following configuration:
Port A: 128 bit wide, with clk_a, sees RAM block as 128 bit wide times 128 lines deep
Port B: 32 bit wide with clk_b, sees RAM block as 32 bit wide times 512 lines deep
Do not worry about READ-WRITE serialization and mutexing, I will be taking care of it with a layer above that.
Basically, the code that generates the 128 bit times 128 lines looks like:
reg [DATA_WIDTH-1:0] mem [0:2**ADDRESS_WIDTH-1];
Now, if I want it to look 32 bits wide and 512 deep, how do I refactor this memory to look different (kind of like a cast in C)? I understand that I might be able to do this with 32-bit word enables, but I am trying to see if there is a cleaner way to achieve this.
Let me know what you think?
RRS
Correction: I am referring to Xilinx BRAM (BRAMs can't be 512 deep). But this is essentially a memory block with glue logic chaining multiple BRAMs together. Thanks for pointing that out!
I solved it this way:
In ISE, I found "Language Templates" in one of the menus, which has actual code samples. There is one for "File I/O"; that one works perfectly.
You can also build a wrapper module around the dual-port RAM which changes the data width on one side. On the smaller data-width port (i.e. the one with more address lines) you can use the lower address bits as a word select, allowing you to write to part of a memory line. This synthesizes properly for me (check your synthesis tool).
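A minimal sketch of that idea (module and port names are mine; the storage is modelled in units of the narrow port, the wide port touches four consecutive words per access, and whether your tool maps this onto true dual-port block RAM depends on the synthesiser and coding style):

    module asym_tdp_ram (
        // Port A: sees 128 lines of 128 bits
        input              clk_a, we_a,
        input      [6:0]   addr_a,
        input      [127:0] din_a,
        output reg [127:0] dout_a,
        // Port B: sees 512 words of 32 bits
        input              clk_b, we_b,
        input      [8:0]   addr_b,
        input      [31:0]  din_b,
        output reg [31:0]  dout_b
    );
        // Storage declared in units of the narrow (32-bit) port
        reg [31:0] mem [0:511];

        integer i;
        always @(posedge clk_a) begin
            // Wide port reads/writes 4 consecutive 32-bit words per access
            for (i = 0; i < 4; i = i + 1) begin
                if (we_a)
                    mem[addr_a*4 + i] <= din_a[i*32 +: 32];
                dout_a[i*32 +: 32] <= mem[addr_a*4 + i];
            end
        end

        always @(posedge clk_b) begin
            // Narrow port: the low address bits effectively act as the word select
            if (we_b)
                mem[addr_b] <= din_b;
            dout_b <= mem[addr_b];
        end
    endmodule

The mapping here assumes word 0 of a line occupies bits [31:0] of the wide port; pick whichever ordering matches the rest of your design.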
For instructions on exactly how to do this, see the Xilinx documentation. For example, http://www.xilinx.com/support/documentation/sw_manuals/xilinx13_2/xst_v6s6.pdf starting at page 217 gives explicit VHDL and Verilog examples of how to do what you are asking.
