I've been trying to figure out a nice way of transposing a large amount of data in VHDL using a block ram (or similar).
Using a vector of vectors it's relatively easy, but it gets icky for large amounts of data.
I want to use a dual channel block ram so that I can write to the one block and read out the other. write in 8 bit std_logic_vectors, read out 32 bit std_logic_vectors, where the 32 bits is (for first rotation at least) the MSB for input vectors 0 - 31, then 32 - 63 all the way to 294911, then MSB-1, etc.
The case described above is my ideal scenario. Is this even possible? I can't seem to find a nice way of doing this.
In this answer, I'm assuming 18kbit Xilinx-style BRAMs. I'm most familiar with the Virtex-4, so I'm referring to UG070 for the following. The answer will map trivially to other Xilinx FPGAs, and probably other vendors' parts as well.
The first thing to note is that Virtex-4 BRAMs can be used in dual-port mode with different geometries for each port. However, with 1-bit-wide ports, these RAMs' parity bits aren't useful. Thus, an 18kbit BRAM is only effectively 16kbits here.
Next, consider the number of BRAMs you need just for storage (irrespective of your design.) 294912x8 bits maps to 144 BRAMs, which is a pretty large resource commitment. There's always a balance between throughput, design complexity, and resource requirements; if you need to squeeze every MBit/sec out of the design, maybe a BRAM-based approach is ideal. If not, you should consider whether your throughput and latency requirements allow you to use off-chip RAM instead of BRAM.
If you do plan on using BRAMs, then you should consider an array of 18x8 BRAMs. The 8 columns each store a single input bit. Each BRAM is written via a 1-bit port, and read out with a 32-bit port.
Every 8-bit write is mapped to eight one-bit writes (one write to a BRAM in each column.)
Every 32-bit read is mapped to a single 32-bit read from a single BRAM.
You should need very little sequencing logic (around a RAMB16 primitive) to get this working. The major complexity is how to map address bits between the 1-bit port and the 32-bit port.
After a bit of research, and thought, this is my answer to this problem:
Due to the nature of the block ram addressing, the ideal scenario mentioned in the OP is not possible with the current block ram addressing implementation. In order to perform a bitwise matrix transposition in the manner described, the block ram addressing would need to be able to switch between horizontal and vertical. That is, the ram must be accessible both row-wise and column-wise, and the addressing mode must be switchable in real-time. Since a bit-wise data transposition is not really a particularly "useful" transform, there wouldn't really be a reason for the implementation of such a switching scheme. Especially since the whole point of block ram is to store datain chunks of more than 1 bit, and such a transform would scramble the data.
I have discovered a way of changing my design such that 294911 x 8 bits do not need to be transformed at once, but rather done in stages using a process. This does not rely on block ram to perform the transform.
Related
I am trying to understand how to estimate FPGA resource requirement for a design/application.
Lets says Spartan 7 part has,
Logic Cells - 52160
DSP Slices - 120
Memory - 2700
How to find out number of CLB's, RAM, and Flash availability?
Lets say my design needs a SPI interface in FPGA,
How to estimate CLB, RAM and Flash requirement for this design?
Thanks
Estimation of a block of logic can be done in a couple ways. One method is to actually pen out the logic on paper and look at what registers you are planning on creating. Then you need to look at the part you working with. In this case the Spartan 7 has CLB config as below:
This is from the Xilinx UG474 7 Series document, pg 17. So now you can see the quantity of flops and memory per CLB. Once you look at the registers in the code and count up the memory in the design, you can figure out the number of CLB's. You can share memory and flops in a single CLB generally without issue, however, if you have multiple memories, quantization takes over. Two seperate memories can't occupy the same CLB generally. Also, there are other quantization effects. Memories some in perfect binary sizes, and if you build a 33 bit wide memory x 128K locations, you will really absorbes 64x128K bits of memory, where 31 bits x 128K are unused and untouchable for other uses.
The second method of estimating size is more experienced based as is practiced by larger FPGA teams where previous designs are looked at, and engineers will make basic comparisons of logic to identify previous blocks that are similar to what you are designing next. You might argue that I2C interface isn'a 100% like a SPI interface, but they are similar enough that you could say, 125% of I2C would be a good estiamte of a SPI with some margin for error. You then just throw that number into a spread sheet along with estimates for the 100 other modules that are in design and you call that the rough estimate.
If the estimate needs a second pass to make it more accurate, then you should throw a little code together and validate that it is functional enough to NOT be optimizing flops, gates and memory away and then use that to sure up the estimate. This is tougher because optimization (Read as dropping of unused flops) can happen all too easily, so you need to be certain that flops and gates are twiddle-able enough to not let them be interpreted as unused or always 1 or always 0.
To figure out the number of CLB's you can use the CLB slice configuration table above. Take the number of flops and divide by 16 (For the 7 Series devices) and this will give you the flop based CLB number. Take the memory bits, and divide each memory by 256 (again for 7 series devices) and you will get the total CLB's based on memory. At that point just take the larger of the CLB counts and that will be your CLB estimate.
I know that this question may sound stupid, but let me just explain. So...
Everyone knows that byte is 8 bits. Simple, right? But where exactly is it specified? I mean, phisically you don't really use bytes, but bits. For example drives. As I understand, it's just a reaaaaly long string of ones and zeros and NOT bytes. Sure, there are sectors, but, as far as I know, there are programmed at software level (at least in SSDs, I think). Also RAM, which is again - a long stream of ones and zeros. Another example is CPU. It doesn't process 8 bits at a time, but only one.
So where exactly is it specified? Or is it just general rule, which everyone follows? If so, could I make system (either operating system or even something at lower level) that would use, let's say, 9 bits in a byte? Or I wouldn't have to? Also - why can't you use less than a byte of memory? Or maybe you can? For example: is it possible for two applications to use the same byte (e.g. first one uses 4 bits and second one uses other 4)? And last, but not least - does computer drives really use bytes? Or is it that, for example, bits 1-8 belong to something, next to them there are some 3 random bits and bits 12-20 belong to something different?
I know that there are a lot of question and knowing answers to these questions doesn't change anything, but I was just wondering.
EDIT: Ok, I might've expressed myself not clear enough. I know that byte is just a concept (well, even bit is just a concept that we make real). I'm NOT asking why there are 8 bits in byte and why bytes exist as a term. What I'm asking is where in a computer is byte defined or if it even is defined. If bytes really are defined somewhere, at what level (a hardware level, OS level, programming language level or just at application level)? I'm also asking if computers even care about bytes (in that concept that we've made real), if they use bytes constantly (like in between two bytes, can there be some 3 random bits?).
Yes, they’re real insofaras they have a definition and a standardised use/understanding. The Wikipedia article for byte says:
The modern de-facto standard of eight bits, as documented in ISO/IEC 2382-1:1993, is a convenient power of two permitting the values 0 through 255 for one byte (2 in power of 8 = 256, where zero signifies a number as well).[7] The international standard IEC 80000-13 codified this common meaning. Many types of applications use information representable in eight or fewer bits and processor designers optimize for this common usage. The popularity of major commercial computing architectures has aided in the ubiquitous acceptance of the eight-bit size.[8] Modern architectures typically use 32- or 64-bit words, built of four or eight bytes
The full article is probably worth reading. No one set out stall 50+ years ago, banged a fist on the desk and said ‘a byte shallt be 8 bits’ but it became that way over time, with popular microprocessors being able to carry out operations on 8 bits at a time. Subsequent processor architectures carry out ops on multiples of this. While I’m sure intel could make their next chip a 100bit capable one, I think the next bitness revolution we’ll encounter will be 128
Everyone knows that byte is 8 bits?
These days, yes
But where exactly is it specified?
See above for the ISO code
I mean, phisically you don't really use bytes, but bits.
Physically we don’t use bits either, but a threshold of detectable magnetic field strength on a rust coated sheet of aluminium, or an amount of electrical charge storage
As I understand, it's just a reaaaaly long string of ones and zeros and NOT bytes.
True, everything to a computer is a really long stream of 0 and 1. What is important in defining anything else is where to stop counting this group of 0 or 1, and start counting the next group, and what you call the group. A byte is a group of 8 bits. We group things for convenience. It’s a lot more inconvenient to carry 24 tins of beer home than a single box containing 24 tins
Sure, there are sectors, but, as far as I know, there are programmed at software level (at least in SSDs, I think)
Sectors and bytes are analogous in that they represent a grouping of something, but they aren’t necessarily directly related in the way that bits and bytes are because sectors are a level of grouping on top of bytes. Over time the meaning of a sector as a segment of a track (a reference to a platter number and a distance from the centre of the platter) has changed as the march of progress has done away with positional addressing and later even rotational storage. In computing you’ll typically find that there is a base level that is hard to use, so someone builds a level of abstraction on top of it, and that becomes the new “hard to use”, so it’s abstracted again, and again.
Also RAM, which is again - a long stream of ones and zeros
Yes, and is consequently hard to use, so it’s abstracted, and abstracted again. Your program doesn’t concern itself with raising the charge level of some capacitive area of a memory chip, it uses the abstractions it has access to, and that attraction fiddles the next level down, and so on until the magic happens at the bottom of the hierarchy. Where you stop on this downward journey is largely a question of definition and arbitrary choice. I don’t usually consider my ram chips as something like ice cube trays full of electrons, or the subatomic quanta, but I could I suppose. We normally stop when it ceases to useful to solving the
Problem
Another example is CPU. It doesn't process 8 bits at a time, but only one.
That largely depends on your definition of ‘at a time’ - most of this question is about the definitions of various things. If we arbitrarily decide that ‘at a time’ is the unit block of the multiple picoseconds it takes the cpu to complete a single cycle then yes, a CPU can operate on multiple bits of information at once - it’s the whole idea of having a multiple bit cpu that can add two 32 bit numbers together and not forget bits. If you want to slice the time up so precisely that we can determine that enough charge has flowed to here but not there then you could say which bit the cpu is operating on right at this pico (or smaller) second, but it’s not useful to go so fine grained because nothing will happen until the end of the time slice the cpu is waiting for.
Suffice to say, when we divide time just enough to observe a single cpu cycle from start to finish, we can say the cpu is operating on more than one bit.
If you write at one letter per second, and I close my eyes for 2 out of every 3 seconds, I’ll see you write a whole 3 letter word “at the same time” - you write “the cat sat onn the mat” and to the observer, you generated each word simultaneously.
CPUs run cycles for similar reasons, they operate on the flow and buildup of electrical charge and you have to wait a certain amount of time for the charge to build up so that it triggers the next set of logical gates to open/close and direct the charge elsewhere. Faster CPUs are basically more sensitive circuitry; the rate of flow of charge is relatively constant, it’s the time you’re prepared to wait for input to flow from here to there, for that bucket to fill with just enough charge, that shortens with increasing MHz. Once enough charge has accumulated, bump! Something happens, and multiple things are processed “at the same time”
So where exactly is it specified? Or is it just general rule, which everyone follows?
it was the general rule, then it was specified to make sure it carried on being the general rule
If so, could I make system (either operating system or even something at lower level) that would use, let's say, 9 bits in a byte? Or I wouldn't have to?
You could, but you’d essentially have to write an adaptation(abstraction) of an existing processor architecture and you’d use nine 8bit bytes to achieve your presentation of eight 9bit bytes. You’re creating an abstraction on top of an abstraction and boundaries of basic building blocks don’t align. You’d have a lot of work to do to see the system out to completion, and you wouldn’t bother.
In the real world, if ice cube trays made 8 cubes at a time but you thought the optimal number for a person to have in the freezer was 9, you’d buy 9 trays, freeze them and make 72 cubes, then divvy them up into 8 bags, and sell them that way. If someone turned up with 9 cubes worth of water (it melted), you’d have to split it over 2 trays, freeze it, give it back.. this constant adaptation between your industry provided 8 slot trays and your desire to process 9 cubes is the adaptive abstraction
If you do do it, maybe call it a nyte? :)
Also - why can't you use less than a byte of memory? Or maybe you can?
You can, you just have to work with the limitations of the existing abstraction being 8 bits. If you have 8 Boolean values to store you can code things up so you flip bits of the byte on and off, so even though you’re stuck with your 8 cube ice tray you can selectively fill and empty each cube. If your program only ever needs 7 Booleans, you might have to accept the wastage of the other bit. Or maybe you’ll use it in combination with a regular 32 bit int to keep track of a 33bit integer value. Lot of work though, writing an adaptation that knows to progress onto the 33rd bit rather than just throw an overflow error when you try to add 1 to 4,294,967,295. Memory is plentiful enough that you’d waste the bit, and waste another 31bits using a 64bit integer to hold your 4,294,967,296 value.
Generally, resource is so plentiful these days that we don’t care to waste a few bits.. It isn’t always so, of course: take credit card terminals sending data over slow lines. Every bit counts for speed, so the ancient protocols for info interchange with the bank might well use different bits of the same byte to code up multiple things
For example: is it possible for two applications to use the same byte (e.g. first one uses 4 bits and second one uses other 4)?
No, because hardware and OS memory management these days keeps programs separate for security and stability. In the olden days though, one program could write to another program’s memory (it’s how we cheated at games, see the lives counter go down, just overwrite a new value), so in those days if two programs could behave, and one would only write to the 4 high bits and the other the 4 low bits then yes, they could have shared a byte. Access would probably be whole byte though, so each program would have to read the whole byte, only change its own bits of it, then write the entire result back
And last, but not least - does computer drives really use bytes? Or is it that, for example, bits 1-8 belong to something, next to them there are some 3 random bits and bits 12-20 belong to something different?
Probably not, but you’ll never know because you don’t get to peek to that level of abstraction enough to see the disk laid out as a sequence of bits and know where the byte boundaries are, or sector boundaries, and whether this logical sector follows that logical sector, or whether a defect in the disk surface means the sectors don’t follow on from each other. You don’t typically care though, because you treat the drive as a contiguous array of bytes (etc) and let its controller worry about where the bits are
This seems like it should be incredibly easy for me to grasp, but I am going insane trying to understand it. I understand that the computer only understands on or off, so what I am trying to understand is a situation where the computer reads more than one on or off value at a time - as in byte-addressable memory - where it would need to read 8 values of on and off to get the byte value. If the computer reads these 8 values sequentially - one after the other - then that would make sense to me. But from what I gather it reads all of these on and off states at the same time. I assume this is accomplished through the circuit design? Would someone be able to either explain this in simple terms to my simple mind or direct me to a resource that could do so? I would love to not have to be committed to the mental hospital over this - thanks.
If you mean reading the data off of a CD, it does happen sequentially. Each bit is gathered individually by a laser, but this operation is still treated as occurring all at once conceptually because it doesn't expose sufficiently fine-grained control to other parts of the computer to for it to be instructed to read just one bit. Simplified, the way this works is that a laser bounces off a mirror which depending on whether it is scratched or not in that particular place either causes the laser to bounce back and hit a photo-transistor (unscratched) setting a single bit to off, or to be deflected and not hit the photo-transistor (scratched), setting the bit to on. Even though it is loading these bits sequentially, it will always load complete bytes into its buffer before sending them to the computer's memory. Multiple bits travel from the hard drive buffer to memory at once. This is possible because the bus connecting them has multiple wires. Each wire can only hold one value at a given time, either high voltage or low (on or off). Eight adjacent wires can hold a byte at once.
A hard drive works by the same principal except rather than a laser and a scratched disk, you have a disk that is magnetized in specific locations, and a very sensitive inductor that will have current flow one way or another depending on the polarity of the accelerating magnetic field below it (i.e. the spinning disk).
Other parts of computer hardware actually do their operations in parallel. For sending information between different parts of the memory hierarchy, you can still think of this as just having multiple wires adjacent to each other.
Processing actually is built around the fact that multiple bits are being accessed at once, i.e. that there are different wires with on and off right next to each other. For example, in modern computers, a register can be thought of as 64 adjacent wires. (Calling them wires is a pretty big simplification at this point, but it is still gets across the appropriate idea.) A processor is built up of logic gates, and logic gates are made up of transistors. Transistors are a physical objects that have four places to attach wires. (1),(2),(3), and (4). A transistor allows current to flow from (1) to (2) if and only if the voltage at (3) is higher than the voltage at (4). Basically, a logic gate sets the voltage at (3) and (4) on a bunch of wires, and has a constant ON supply at (1), and (4) will only be powered if the amount of voltage at (3) and (4) matches the configuration that the logic gate is allowed to power.
From there it is pretty easy to see how input of some pre-specified size can be operated on at once.
If each of the gates in the drawing below is an OR gate, the drawing would show a new composite gate that calculated the OR of 8 bits at once instead of just two.
(calculated end -- registers get written to here)
^
^ ^ (everything that happens in here is "invisible" to the computer)
^ ^ ^ ^
a b c d e f g h (registers get read here)
Processors are built with a bunch of operations of some specified depth in number of transistors so that every 2 ^ - 32 second or so, the processor can flip a switch that allows the voltage stored at the calculated end of the gates to flow back to the registers. So that it can be used in new computations or pushed over wires back to memory.
That, in a nut shell, is what it means for a computer to deal with multiple bits at once.
I am new to VHDL programming, I am going to do a project on Built-In Self-Repair.In this project am going to design RAMs of different sizes(256 B,8kB,16kB,32kB)etc. and those rams has to be tested using BIST and then they should be repaired.So please help me by giving an example like how to design RAM with 'n' rows and columns
Start by drawing a block diagram of the RAM at the level of abstraction you want (probably gate-level). Then use VHDL to describe the block diagram.
You should probably limit yourself to a behavioral description, i.e., don't expect to be able to synthesize it. Synthesis for FPGAs usually expects a register-transfer-level description, and synthesis for ASICs is not something I would recommend for a VHDL beginner.
I will assume you want to work with SRAM, since this is the simplest case. Also, let's suppose you want to model a RAM with RAM_DEPTH words, and each word is RAM_DATA_WIDTH bits wide. One possible approach is to structure your solution in three modules:
One module that holds the RAM bits. This module should have the typical ports for a RAM: clock, reset (optional), write_enable, data_in, data_out. Note that each RAM word should be wide enought to hold the data bits, plus the parity bits, which are redundant bits that will allow yout to correct any errors. You can read about Hamming codes used for memory correction here: http://bit.ly/1dKrjV5. You can see a RAM modilng example from Doulos here: http://bit.ly/1aq1tn9.
A second module that loops through all memory locations, fixing them as needed. This should happen right after reset. Note that this will probably take many clock cycles (at least RAM_DEPTH clock cycles). Also note that it won't be implemented as a loop in VHDL. You could implement it using a counter, then use the count value as a read address, pass the data value through a an EDC function, and then write the corrected value back to the RAM module.
A top-level entity (optional), that instantiates modules (1) and (2), and coordinates the process. This module could have a 'init_done' pin that will be asserted after the verification and correction take place. This pin should be checked by the modules that use your RAM to know whether it is safe to start using the RAM.
To summarize, you could loop through all memory locations upon reset, fixing them as needed using an error-correcting code. After making sure all memory locations are ok, just assert an 'init_done' signal.
While HDDs evolve and offer more and more space on less room, why are we "sticking with" 32-bit or 64-bit?
Why can't there be a e.g.: 128-bit processor?
(This is not my homework; I'm just a student interested beyond the things they teach us in informatics)
Because the difference between 32-bit and 64-bit is astronomical - it's really the difference between 232 (a ten-digit number in the billions) and 264 (a twenty-digit number in the squillions :-).
64 bits will be more than enough for decades to come.
There's very little need for this, when do you deal with numbers that large? The current addressable memory space available to 64-bit is well beyond what any machine can handle for at least a few years...and beyond that it's probably more than any desktop will hold for quite a while.
Yes, desktop memory will continue to increase, but 4 billion times what it is now? That's going to take a while...sure we'll get to 128-bit, if the whole current model isn't thrown out before then, which I see equally as likely.
Also, it's worth noting that upgrading something from 32-bit to 64-bit puts you in a performance hole immediately in most scenarios (this is a major reason Visual Studio 2010 remains 32-bit only). The same will happen with 64-bit to 128-bit. The more small objects you have, the more pointers, which are now twice as large, that's more data to pass around to do the same thing, especially if you don't need that much addressable memory space.
When we talk about an n-bit architecture we are often conflating two rather different things:
(1) n-bit addressing, e.g. a CPU with 32-bit address registers and a 32-bit address bus can address 4 GB of physical memory
(2) size of CPU internal data paths and general purpose registers, e.g. a CPU with 32-bit internal architecture has 32-bit registers, 32-bit integer ALUs, 32-bit internal data paths, etc
In many cases (1) and (2) are the same, but there are plenty of exceptions and this may become increasingly the case, e.g. we may not need more than 64-bit addressing for the forseeable future, but we may want > 64 bits for registers and data paths (this is already the case with many CPUs with SIMD support).
So, in short, you need to be careful when you talk about, e.g. a "64-bit CPU" - it can mean different things in different contexts.
Cost. Also, what do you think the 128-bit architecture will get you? Memory addressing and such, but to handle it effectively, you need higher bandwidth buses and basically some new instruction languages that handle it. 64-bit is more than enough for addressing (18446744073709551616 bytes).
HDDs still have a bit of ground to catchup to RAM and such. They're still going to be the IO bottleneck I think. Plus, newer chips are just supporting more cores rather than making a massive change to the language.
Well, I happen to be a professional computer architect (my inventions are probably in the computer you are reading this on), and although I have not yet been paid to work on any processor with more than 64 bits of address, I know some of my friends who have been.
And I have been playing around with 128 bit architectures for fun for a few decades.
I.e. its already happening.
Actually, it has already happened to a limited extent. The HP Precision Architecture, Intel Itanium, and the higher end versions of the IBM Power line, have what I call a folded virtual memory. I have described these elsewhere, e.g. in comp.arch posts in some details, http://groups.google.com/group/comp.arch/browse_thread/thread/53a7396f56860e17/f62404dd5782f309?lnk=gst&q=folded+virtual+memory#f62404dd5782f309
I need to create a comp-arch.net wiki post for these.
But you can get the manuals for these processors and read them yourself.
E.g. you might start with a 64 bit user virtual address.
The upper 8 bits may be used to index a region table, that returns an upper 24 bits that is concatenated with the remaining 64-8=56 bits to produce an 80 bit expanded virtual address. Which is then translated by TLBs and page tables and hash lookups, as usual,
to whatever your physical address is.
Why go from 64->80?
One reason is shared libraries. You may want to have the shared libraries to stay at the same expanded virtual address in all processors, so that you cam share TLB entries. But you may be required, by your language tools, to relocate them to different user virtual addresses. Folded virtual addresses allow this.
Folded virtual addresses are not true >64 bit virtual addresses usable by the user.
For that matter, there are many proposals for >64 bit pointers: e.g. I worked on one where a pointer consisted of a 64bit address, and 64 bit lower and upper bounds, and metadata, for a total of 128 bits. Bounds checking. But, although these have >64 bit pointers or capabilities, they are not truly >64 bit virtual addresses.
Linus posts about 128 bit virtual addresses at http://www.realworldtech.com/beta/forums/index.cfm?action=detail&id=103574&threadid=103545&roomid=2
I'd also like to offer a computer architect's view of why 128bit is impractical at the moment:
Energy cost. See Bill Dally's presentations on how today, most energy in processors is spent moving data around (dissipated in the wires). However, since the most significant bits of a 128bit computation should change little, it should mitigate this problem.
Most arithmetic operations have a non-linear cost w.r.t operand size:
a. A tree multiplier has space complexity n^2, w.r.t. number of bits.
b. The delay of a hierarchical carry look ahead adder is Log[n] w.r.t number of bits (I think). So a 128bit adder will be slower than a 64bit add. Can anyone give some hard numbers (Log[n] seems very cheap) ?
Few programs use 128bit integers or quad precision floating point, and when they do, there are efficient ways to compose them from 32 or 64bit ops.
The next big thing in processor's architecture will be quantum computing. Instead of beeing just 0 or 1, a qbit has a probability of being 0 or 1.
This will lead to huge improvements in the performance of algorithm (for instance, it will be very easy to crack down any RSA private/public key).
Check http://en.wikipedia.org/wiki/Quantum_computer for more information and see you in 15 years ;-)
The main need for a 64 bit processor is to address more memory - and that is the driving force to switch to 64 bit. On 32 bit systems, you can really only address 4Gb of RAM, at least per process. 4Gb is not much.
64 bits give you an address space of several petabytes.(though, a lot of current 64 bit hardware can address "only" 48 bits - thats still enough to support 256 terrabytes of ram though).
Upping the natural integer sizes for a processor does not automatically make it "better" though. There are tradeoffs. With 128bit you'd need twice as much storage(registers/ram/caches/etc.) compared to 64 bit for common data types - with all the drawback that might have - more ram needed to store data, more data to transmit = slower, wider buses might requires more physical space/perhaps more power, etc.