About sigma-delta filter bit-width

I am working on a Sigma-Delta ADC project and need to decide the bit widths of the digital filter chain.
My filter has 4 stages; the first is a CIC whose bit width is 29 bits according to the OSR. My final filter output is only 24 bits, so the other 3 FIR filters need to reduce the width by 5 bits in total. If my input is 4-bit signed and my output is 24-bit signed, with OSR = 256, how do I decide the bit reduction of each of the 3 filters? What is the performance impact of the bit-width reduction? Any comments?

I think your problem might be a bit too domain-specific for StackOverflow. I do code-based DSP myself, and I have only the dimmest understanding of what you're asking. KVR Audio has an excellent forum where you might be able to get some help with this problem.

Related

How does Golomb Code improve efficiency in H.265?

I'm parsing an HEVC (H.265) header and I noticed that many values are in Exp-Golomb code notation. One of them, for example, is the width.
Let's suppose a width value of 1600; in Exp-Golomb code it is written as:
g=000000000011001000001
Call "leadingZeros" (lz) the run of zeros at the start (from left to right).
Here lz is composed of 10 zeros, followed by a marker bit 1. Let's call b the remaining lz bits of the codeword.
To decode the code, where b=1001000001 (or decimal 577), you do:
a=2^lz-1;
n=a+to_decimal(b)
where to_decimal converts from a binary string to a decimal value.
So you have 1023 + 577 = 1600.
Question:
With Golomb you're using 21 bits to represent 1600.
But 1600 in binary takes 11 bits (110 0100 0000).
Also the Golomb method does not allow for a custom number of bits to represent values.
So... why is Golomb code used in compression algorithms like H.265?
Well, compression of the High-Level Syntax (HLS) is usually not a critical priority in video compression. If you do the math for a typical resolution (e.g. 1080p) at a typical bandwidth (e.g. 7 Mbps), you will see that saving a few bits to signal frame-level and sequence-level information is really negligible.
However, since Exp-Golomb code is also used in signalling large DCT coefficients, one might ask the same question in that context. And there it would be a valid compression concern, as in residual coding efficiency is everything! To answer that question there is a lot of well-established literature, dating back to AVC times.
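For concreteness, here is a minimal Python sketch of the unsigned Exp-Golomb decoding walked through in the question (the function name and test strings are illustrative, not from any real parser):

# Decode one unsigned Exp-Golomb (ue(v)) codeword given as a bit string.
def decode_exp_golomb(bits):
    lz = 0
    while bits[lz] == '0':          # count the leading zeros
        lz += 1
    b = bits[lz + 1 : lz + 1 + lz]  # skip the marker '1', read lz info bits
    a = 2 ** lz - 1
    return a + (int(b, 2) if b else 0)

print(decode_exp_golomb("000000000011001000001"))  # -> 1600
print(decode_exp_golomb("1"))                      # -> 0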

FPGA LUT with multiple outputs

I am working on designing a Mandelbrot viewer and I am designing hardware for squaring values. My squarer is built recursively: a 4-bit squarer relies on two 2-bit squarers, so my 16-bit squarer has two 8-bit squarers, and each of those has two 4-bit squarers.
As you can see, the recursion makes the design blow up in complexity. To help speed up my design I would like to use a 4-input ROM that emulates a 4-bit squarer: when you enter 3 into the ROM it outputs 9, and when you enter 15 it outputs 225.
I know that a normal LUT implemented in a logic cell may have 3 or 4 input variables and only 1 output, but I need an 8-bit output, so I need more of a ROM than a LUT.
Any and all help is appreciated. I'm curious how the FPGA will store those ROMs, and whether storing it in ROM would be faster than computing the 4-input square.
-
Jarvi
To square a 4-bit number explicitly using LUTs, you would need 8 4-input LUTs; each LUT's output gives you one bit of the 8-bit square.
Better overall size and fmax may be achieved by using larger block RAM primitives (as ROM), dedicated MAC (multiply-accumulate) units, or simply the normal multiplication operator * and relying on your synthesis tool's optimization.
You may also want to review some research papers related to this topic, for example here.
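To convince yourself the ROM-plus-recursion scheme works before committing it to hardware, here is a hypothetical Python model of one recursion level. The 16-entry list plays the role of the 4-input/8-output ROM; note the cross term is a plain multiply here for clarity, which is exactly the part that makes the recursive structure grow:

# 16-entry ROM: 4-bit input, 8-bit output (the 4-bit squarer)
SQ4 = [x * x for x in range(16)]

def square8(x):
    # (16h + l)^2 = 256*h^2 + 32*h*l + l^2
    h, l = x >> 4, x & 0xF
    return (SQ4[h] << 8) + ((h * l) << 5) + SQ4[l]

assert all(square8(x) == x * x for x in range(256))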

How to add a LUT in VHDL to generate a sine

I've made an I2S transmitter to generate a "sound" out of my FPGA. The next step I would like to take is to create a sine. I've made 16 samples in a LUT. My question is how to implement something like this in VHDL, and also how to load the samples in sequence. Who has tried this already and could share their knowledge?
I've made a lookup table with 16 samples (plus a final row repeating the start of the next period at 2π):
0 0π
0.382683432 1/8π
0.707106781 1/4π
0.923879533 3/8π
1 1/2π
0.923879533 5/8π
0.707106781 3/4π
0.382683432 7/8π
3.23114E-15 1π
-0.382683432 1 1/8π
-0.707106781 1 1/4π
-0.923879533 1 3/8π
-1 1 1/2π
-0.923879533 1 5/8π
-0.707106781 1 3/4π
-0.382683432 1 7/8π
-6.46228E-15 2π
The simplest solution is to make a ROM, which is just a big case statement.
FPGA synthesis tools will map this onto one or more LUTs.
Note that for bigger tables only 1/4 of the wave is stored; the other values are derived from it by symmetry.
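The sequencing itself is just a free-running sample counter used as the ROM address. As for storing only a quarter of the wave, the folding logic looks like this; a Python sketch standing in for the eventual VHDL, assuming the 16-sample period above (N/4 + 1 = 5 entries stored so the peak is included):

import math

# Quarter-wave ROM: sin over [0, pi/2] only
QUARTER = [math.sin(math.pi * k / 8) for k in range(5)]

def sine_sample(idx):
    # Reconstruct sample idx (0..15) of the full 16-sample period
    k = idx & 0x7                                    # position in half period
    half = QUARTER[k] if k <= 4 else QUARTER[8 - k]  # mirror second quarter
    return half if idx < 8 else -half                # negate second half

full_table = [sine_sample(i) for i in range(16)]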
I would like to send out 24-bit samples; do you also know how to do that with this data (in binary)?
24 bits (signed) means you have to convert your floating-point values to integer values in the range -8388608..8388607. (For symmetry reasons you would use -8388607..8388607.)
Thus, multiply the sine values (which you know are in the range -1..1) by 8388607.
The frequency of the sine depends on how fast (how many samples per second) you send them.
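A minimal sketch of that conversion (Python for brevity; the helper name is made up):

def to_24bit_signed(x):
    # Scale a sample in [-1.0, 1.0] to -8388607..8388607 (8388607 = 2**23 - 1)
    return round(x * 8388607)

# two's-complement 24-bit pattern as it would be shifted out over I2S:
word = to_24bit_signed(-0.382683432) & 0xFFFFFF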

33 * 33 bit multiplication using 16-bit DSPs

I want to write VHDL/Verilog code to multiply two 33-bit vectors using 16-bit DSPs.
I really don't understand the mechanism of splitting the two 33-bit vectors into smaller vectors, then using multiplication and addition to get the final result.
Could anyone please explain how to do this?
Thank you.
You don't have to do that.
Just instantiate a 33x33 multiplier and the FPGA mapping tools will take care of the splitting and recombination.
If you insist on doing it yourself, look up "Wallace tree multiplier". That is the principle of how multipliers are built in hardware, with the improvement of using carry-lookahead adders.
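If you do want to see the mechanics: it is ordinary long multiplication in base 2^16. Split each 33-bit operand into 16-bit limbs, let one DSP form each partial product, and add the products back with the appropriate shifts. A purely illustrative, unsigned Python sketch:

# a = a2*2^32 + a1*2^16 + a0, likewise for b; a2 and b2 are single bits,
# so their partial products reduce to AND/shift logic rather than DSPs.
def mul33(a, b):
    a0, a1, a2 = a & 0xFFFF, (a >> 16) & 0xFFFF, a >> 32
    b0, b1, b2 = b & 0xFFFF, (b >> 16) & 0xFFFF, b >> 32
    acc = 0
    for i, x in enumerate((a0, a1, a2)):
        for j, y in enumerate((b0, b1, b2)):
            acc += (x * y) << (16 * (i + j))  # one partial product each
    return acc

assert mul33((1 << 33) - 1, (1 << 33) - 1) == ((1 << 33) - 1) ** 2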

How would I go about implementing this algorithm?

A while back I was trying to brute-force a remote control which sent a 12-bit binary 'key'.
The device I made worked, but was very slow, as it was trying every combination at about 50 bits per second (4096 codes x 12 bits = 49152 bits = ~16 minutes).
I opened the receiver and found it was using a shift register to check the codes and no delay was required between attempts. This meant that the receiver was simply looking at the last 12 bits to be received to see if they were a match to the key.
This meant that if the stream 111111111111000000000000 was sent through, it had effectively tried all of these codes.
111111111111 111111111110 111111111100 111111111000
111111110000 111111100000 111111000000 111110000000
111100000000 111000000000 110000000000 100000000000
000000000000
In this case, I have used 24 bits to try 13 12-bit combinations (about 85% fewer bits than sending them separately).
Does anyone know of an algorithm that could reduce my 49152 bits sent by taking advantage of this?
What you're talking about is a de Bruijn sequence. If you don't care about how it works and just want the result, here it is.
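If you would rather generate it yourself, the standard FKM (Lyndon word) construction fits in a few lines of Python. B(2,12) is 4096 bits long; appending the first 11 bits lets the final windows wrap, so every 12-bit code appears within 4107 transmitted bits instead of 49152:

# FKM algorithm: concatenate the Lyndon words whose length divides n.
def de_bruijn(k, n):
    a = [0] * (k * n)
    seq = []
    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1 : p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)
    db(1, 1)
    return ''.join(map(str, seq))

s = de_bruijn(2, 12)
stream = s + s[:11]    # 4107 bits covering all 4096 codes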
Off the top of my head, I suppose flipping one bit in each 12-bit sequence would take care of another 13 combinations, for example 111111111101000000000010, then 111111111011000000000100, etc. But you still have to do a lot of permutations; even with one bit I think you still have to do 111111111101000000000100, etc. Then flip two bits on one side and one on the other, and so on.