33 * 33 bit using 16 bit DSPs - vhdl

I want to write VHDL/Verilog code to multiply two 33-bit vectors using 16-bit DSPs.
I don't understand the mechanism of splitting the two 33-bit vectors into smaller vectors and then using multiplication and addition to get the final result.
Could anyone please explain how to do this?
Thank you.

You don't have to do that.
Just instantiate a 33x33 multiplier and the FPGA mapping tools will take care of the splitting and recombination.
If you insist on doing it yourself, look up "Wallace tree multiplier". That is the principle by which multipliers are built in hardware, with the improvement of using carry-lookahead adders.
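To make the splitting and recombination concrete, here is a minimal software sketch of the arithmetic, not HDL. It assumes unsigned operands; signed inputs would need extra sign handling. The names split_limbs and multiply_33x33 are made up for illustration. Each inner multiplication is a 16x16 product, which is what one 16-bit DSP would compute, and the shifts place each partial product at its correct weight before the additions.

```python
LIMB_BITS = 16
LIMB_MASK = (1 << LIMB_BITS) - 1


def split_limbs(value, width=33):
    """Split an unsigned value into 16-bit limbs, least significant first."""
    count = -(-width // LIMB_BITS)  # ceiling division: 33 bits -> 3 limbs
    return [(value >> (LIMB_BITS * i)) & LIMB_MASK for i in range(count)]


def multiply_33x33(a, b):
    """Multiply two unsigned 33-bit values using only 16x16 partial products."""
    result = 0
    for i, a_limb in enumerate(split_limbs(a)):
        for j, b_limb in enumerate(split_limbs(b)):
            partial = a_limb * b_limb  # fits in 32 bits: one DSP multiply
            result += partial << (LIMB_BITS * (i + j))  # weight 2^(16*(i+j))
    return result
```

The top limb of a 33-bit value holds a single bit, so in hardware the partial products that involve it reduce to a conditional add rather than a full multiply.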

Related

FPGA LUT with multiple outputs

I am working on designing a Mandelbrot viewer and I am designing hardware for squaring values. My squarer is built recursively: a 4-bit squarer relies on two 2-bit squarers, so my 16-bit squarer has two 8-bit squarers, and each of those has two 4-bit squarers.
As you can see, the recursion makes the design blow up in complexity. To help speed up my design, I would like to use a 4-input ROM that emulates a 4-bit squarer, so that when you enter 3 into the ROM it outputs 9, and when you enter 15 it outputs 225.
I know that a normal LUT implemented in a logic cell may have 3 or 4 input variables and only 1 output, but I need an 8-bit output, so I need more of a ROM than a LUT.
Any and all help is appreciated. I'm curious how the FPGA will store those ROMs and whether storing the squares in ROM would be faster than computing the 4-input square.
-
Jarvi
To square a 4-bit number explicitly using LUTs, you would need eight 4-input LUTs. Each LUT's output would give you one bit of the 8-bit product.
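As a rough illustration of the one-LUT-per-output-bit idea, the sketch below computes a 16-entry truth table for each of the eight output bits and then evaluates a square by looking up one bit per table. It is a software model only; the function names are hypothetical, and the way a real tool packs or initializes LUTs is vendor-specific.

```python
def squarer_lut_inits():
    """Return eight 16-bit truth tables, one per output bit of x*x."""
    inits = []
    for bit in range(8):
        init = 0
        for x in range(16):
            if ((x * x) >> bit) & 1:
                init |= 1 << x  # entry x of this LUT outputs 1
        inits.append(init)
    return inits


def square_via_luts(x, inits):
    """Evaluate x*x for a 4-bit x by reading one bit from each table."""
    return sum(((inits[bit] >> x) & 1) << bit for bit in range(8))
```

In this model the table for output bit 1 comes out as all zeros, because bit 1 of a perfect square is always 0, so a synthesis tool could tie that output to a constant instead of spending a LUT on it.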
The best overall size and fmax for your design may be achieved with this approach, by using larger block RAM primitives (as ROM), by using dedicated MAC (multiply-accumulate) units, or by using the normal multiplication operator * and relying on your synthesis tool's optimization.
You may also want to review some research papers related to this topic, for example here.

Bitboard algorithms for board sizes greater than 64?

I know the Magic BitBoard technique is useful for modern games that are played on an 8x8 grid because it aligns perfectly with a single 64-bit integer, but is the idea extensible to board sizes greater than 64 squares?
Some games like Shogi have larger board sizes such as 81 squares, which doesn't cleanly fit into a 64-bit integer.
I assume you'd have to use multiple integers, but would it be better to use two 64-bit integers or something like three 32-bit ones?
I know there probably isn't a trivial answer to this, but what kind of knowledge would I need in order to research something like this? I only have some basic/intermediate algorithms and data structures knowledge.
Yes, you could do this with a structure that contains multiple integers of varying lengths. For example, you could use 11 unsigned bytes. Or a 64-bit integer and a 32-bit integer, etc. Anything that will add up to 81 or more bits.
I rather like the idea of three 32-bit integers because you can store three rows per integer. It makes your indexing code simpler than if you used a 64-bit integer and a 32-bit integer. Nine 16-bit words would work well, too, but you're wasting almost half your bits.
You could use 11 unsigned bytes, but the indexing is kind of ugly.
All things considered, I'd probably go with the 3 32-bit integers, using the low 27 bits of each.

Simple data structure for the Othello board game?

I wrote my program ages ago here as a uni project, and it works to some extent (you may try the Monkey and Novice levels :) ).
I'd like to redesign and re-implement it to practice data structures and algorithms.
In my previous project, min-max search and alpha-beta pruning were missing, as was an opening dictionary.
Because the game board is symmetric both horizontally and vertically, I need a better data structure than my previous approach:
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 11 12 13 14 15 16 17 18 -1
-1 21 22 23 24 25 26 27 28 -1
-1 31 32 33 34 35 36 37 38 -1
. . . . . .
In this way, one can easily calculate the adjacent positions given any cell value like this:
x-11 x-10 x-9
x-1 x x+1
x+9 x+10 x+11
Those -1s are acting like "walls" to prevent wrong calculation.
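The walled layout can be sketched as a flat list of 100 cells. This is only an illustration under assumptions: it uses 0 for an empty playable cell and -1 for a wall, whereas the grid above shows the cell numbers themselves, and the helper names are hypothetical.

```python
WIDTH = 10
OFFSETS = (-11, -10, -9, -1, 1, 9, 10, 11)  # the eight neighbor directions
WALL = -1


def make_board():
    """Build a 10x10 board whose outer ring is walls and whose inner 8x8 is empty."""
    board = [WALL] * (WIDTH * WIDTH)
    for row in range(1, 9):
        for col in range(1, 9):
            board[row * WIDTH + col] = 0
    return board


def playable_neighbors(board, x):
    """Return the neighbor cells of x that are on the playable 8x8 area."""
    return [x + d for d in OFFSETS if board[x + d] != WALL]
```

Because every playable cell is surrounded by at least one ring of walls, `x + d` never indexes outside the list for a playable `x`.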
The biggest issue is that it doesn't take symmetry or orientation into account; for example, a single opening such as the parallel opening would have 4 corresponding cases in the database, one for each orientation.
Any good suggestions? I am also considering trying Ruby to get quicker calculation speed than PHP (just for min-max with alpha-beta pruning, in case I program it to look n steps ahead).
Many thanks for the suggestions in advance.
When you hash a position to store or look up in your database, take hashes of all eight symmetric positions, and store or look up only the smallest of the eight. Thus all symmetric positions hash to the same value.
This reduces the size of your database by up to a factor of 8 but multiplies the cost of hashing by 8. Is this a good trade-off? It depends on how big your database is and how often you do database lookups.
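Here is a minimal sketch of the smallest-of-eight idea, assuming the board is a tuple of row tuples holding small integers (for example 0 for empty, 1 and 2 for the two colors). The function names are hypothetical, and Python's built-in hash stands in for whatever hash your database actually uses.

```python
def rotate(board):
    """Rotate a square board (tuple of row tuples) 90 degrees clockwise."""
    return tuple(zip(*board[::-1]))


def mirror(board):
    """Flip the board left to right."""
    return tuple(row[::-1] for row in board)


def symmetries(board):
    """Return the eight rotations and reflections of the board."""
    result = []
    current = board
    for _ in range(4):
        result.append(current)
        result.append(mirror(current))
        current = rotate(current)
    return result


def canonical_hash(board):
    """Hash all eight symmetric positions and keep the smallest value."""
    return min(hash(s) for s in symmetries(board))
```

Any rotation or reflection of a position produces the same set of eight boards, so `canonical_hash` returns the same value for all of them.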
After you move to C/C++ :-) consider representing the game board as "bit-boards", i.e. two 64-bit vectors, one for white and one for black, e.g. struct Board { uint64_t white, black; };
With care you can then avoid array indexing to test piece positions, and in fact can search in parallel for all up-captures, up-right-captures, etc. from a position using a series of bit logical operators, shifts, and masks, and no loops (!). Much faster.
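The following sketch shows one common way this is done, written in Python with integers masked to 64 bits instead of C. It assumes bit index = row * 8 + column; the direction table, masks, and function names are illustrative choices rather than anything from the answer above. The two loops have fixed trip counts (8 directions, 5 extension steps) and could be fully unrolled; there is no loop over squares.

```python
FULL = (1 << 64) - 1
NOT_COL_0 = 0xFEFEFEFEFEFEFEFE  # drops bits that wrapped around into column 0
NOT_COL_7 = 0x7F7F7F7F7F7F7F7F  # drops bits that wrapped around into column 7

# (shift amount, mask applied after the shift) for the eight directions
DIRECTIONS = (
    (1, NOT_COL_0), (-1, NOT_COL_7),   # east, west
    (8, FULL), (-8, FULL),             # south, north
    (9, NOT_COL_0), (7, NOT_COL_7),    # south-east, south-west
    (-7, NOT_COL_0), (-9, NOT_COL_7),  # north-east, north-west
)


def shift(bits, amount, mask):
    """Move every set bit one step in a direction, discarding wrap-around."""
    if amount > 0:
        return (bits << amount) & mask & FULL
    return (bits >> -amount) & mask


def legal_moves(own, opp):
    """Return a bitboard of empty squares where `own` captures in some direction."""
    empty = ~(own | opp) & FULL
    moves = 0
    for amount, mask in DIRECTIONS:
        # Opponent discs directly adjacent to our discs in this direction.
        run = shift(own, amount, mask) & opp
        # Extend the run over further opponent discs (at most 6 in a line).
        for _ in range(5):
            run |= shift(run, amount, mask) & opp
        # An empty square just past a run is a legal move.
        moves |= shift(run, amount, mask) & empty
    return moves
```

With the standard starting position (white on (3,3) and (4,4), black on (3,4) and (4,3)), calling `legal_moves(black, white)` yields the four squares (2,3), (3,2), (4,5) and (5,4).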
This representation idea is orthogonal to your question of opening book symmetries though.
Happy hacking.
The problem is easy to deal with if you separate the presentation of the board from the internal representation. Once the opening move is made, you get a parallel, diagonal, or perpendicular opening. Each one of them can be in any of the 4 orientations. Rotate the internal board representation until it is aligned with your opening book. Then simply take the rotation into account when drawing the board.
In regard to play, you need to look into Mobility Theory. Take a look at Hugo Calendar's book on the topic. Also, Michael Buro has written a bit about his program Logistello. A FAQ
As that parallel opening only applies for the very first move, I would just make the first move fixed.
If you really want speed, I'd recommend C++.
I would also imagine that checking whether the space is on the board is faster than checking whether the space contains a -1.

about sigma delta filter bit-width

I am working on a Sigma Delta ADC project, and need to decide the bit-width of the digital filter.
My filter has 4 stages. The first is a CIC whose bit width is 29 bits according to the OSR. My final filter output is only 24 bits, which means the other 3 FIR filters need to drop 5 bits in total. If my input is 4-bit signed, my output is 24-bit signed, and OSR=256, how do I decide the bit reduction for each of the 3 filters? What is the performance impact of reducing the bit width? Any comments?
I think your problem might be a bit too domain-specific for StackOverflow. I do code-based DSP myself, and I have only the dimmest understanding of what you're asking. KVR Audio has an excellent forum where you might be able to get some help with this problem.

How would I go about implementing this algorithm?

A while back I was trying to bruteforce a remote control which sent a 12 bit binary 'key'.
The device I made worked, but was very slow as it was trying every combination at about 50 bits per second (4096 codes = 49152 bits = ~16 minutes)
I opened the receiver and found it was using a shift register to check the codes and no delay was required between attempts. This meant that the receiver was simply looking at the last 12 bits to be received to see if they were a match to the key.
This meant that if the stream 111111111111000000000000 was sent through, it had effectively tried all of these codes.
111111111111 111111111110 111111111100 111111111000
111111110000 111111100000 111111000000 111110000000
111100000000 111000000000 110000000000 100000000000
000000000000
In this case, I have used 24 bits to try 13 12-bit combinations (about 85% compression, since sending them separately would take 156 bits).
Does anyone know of an algorithm that could reduce my 49152 bits sent by taking advantage of this?
What you're talking about is a de Bruijn sequence. If you don't care about how it works, you just want the result, here it is.
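For reference, here is a short sketch of the well-known recursive construction of a de Bruijn sequence (built from Lyndon words). The wrapper remote_stream is a hypothetical helper for this particular problem: a de Bruijn sequence is cyclic, so a linear bit stream has to repeat the first n - 1 bits at the end to cover the windows that wrap around.

```python
def de_bruijn(k, n):
    """Return a de Bruijn sequence B(k, n) as a list of symbols 0..k-1."""
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


def remote_stream(n=12):
    """Linear bit string in which every n-bit code appears as a window."""
    cyclic = de_bruijn(2, n)
    linear = cyclic + cyclic[:n - 1]  # unroll the cyclic sequence
    return "".join(str(bit) for bit in linear)
```

For 12-bit codes the stream is 4096 + 11 = 4107 bits long instead of 49152, which at 50 bits per second is roughly 82 seconds.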
Off the top of my head, I suppose flipping one bit in each 12-bit sequence would take care of another 13 combinations, for example 111111111101000000000010, then 111111111011000000000100, etc. But you still have to do a lot of permutations. Even with one bit flipped, I think you still have to do 111111111101000000000100, etc. Then flip two bits on one side and one on the other, and so on.
