I am trying to convolve a 16-bit input data stream with a Dirac Delta on a Xilinx Virtex 7.
More specifically, instead of multiplying my input stream by a cosine in the time domain, I would like to convolve it with the following expression in the frequency domain:
F(f) = 0.5 * (delta(f - f0) + delta(f + f0))
Does anybody have any idea how to implement that? The only possibly relevant Xilinx IP core for my problem seems to be the FIR Compiler, but I don't know how to represent my function F(f) as the 'coefficient' input of this IP core.
EDIT: mathematically, since the target convolution involves only Dirac deltas, there may be a shortcut that avoids the convolution by simply evaluating the input function at the point f0. But I have no idea how to implement that either.
Thank you in advance
Xilinx has an IP core to perform the Fast Fourier Transform on the FPGA. Once in the frequency domain, you are somewhat on your own to perform your operations. You could use the FIR IP core, but since your function is quite simple it would waste a lot of resources compared to a custom implementation. Finally, the Xilinx core can do an inverse FFT to go back to the time domain.
AFAIK, there is no core to help perform convolution in frequency domain. So don't forget to overlap-add your transforms to do the proper calculation. Matlab will be your friend there!
Finally, you may be interested in the Number Theoretic Transform (NTT). The algorithm is more efficient than the FFT on an FPGA and can be used to perform convolution. The drawbacks are that there are limitations on the length of the transform you can have, and that the "frequency-domain coefficients" are totally unrelated to frequency (they are somewhat random). If all you want is fast convolution, the NTT is for you; if you're looking for other uses for these Fourier coefficients, it's not. However, the NTT expression of the cosine would be much more complicated and would defeat the purpose of your work, but I thought you may be interested from an academic standpoint. As I stated in my comment, multiplying with a cosine is simpler in the time domain after all.
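To make the equivalence concrete, here is a minimal software sketch (not FPGA code) that checks it numerically: multiplying by a cosine in the time domain gives the same spectrum as circularly convolving the DFT with 0.5 * (delta(k - k0) + delta(k + k0)), which is just two shifted copies of the spectrum. The names dft, modulate_time and shift_spectrum, the block length N and the bin index K0 are illustrative assumptions, and the cosine is assumed to fall exactly on a DFT bin.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT, good enough for a small demonstration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def modulate_time(x, k0):
    """Multiply the signal by cos(2*pi*k0*t/N) in the time domain."""
    n = len(x)
    return [x[t] * math.cos(2 * math.pi * k0 * t / n) for t in range(n)]

def shift_spectrum(spec, k0):
    """Circular convolution with 0.5*(delta[k-k0] + delta[k+k0]):
    the average of two circularly shifted copies of the spectrum."""
    n = len(spec)
    return [0.5 * (spec[(k - k0) % n] + spec[(k + k0) % n]) for k in range(n)]

N, K0 = 16, 3                                  # assumed block length and cosine bin
x = [((7 * t) % 5) - 2.0 for t in range(N)]    # arbitrary test input
via_time = dft(modulate_time(x, K0))           # cosine multiply, then transform
via_freq = shift_spectrum(dft(x), K0)          # transform, then shift-and-add
max_err = max(abs(a - b) for a, b in zip(via_time, via_freq))
```

In this sketch max_err comes out at rounding-error level, which is why the time-domain multiply (one multiplier per sample) is the cheaper route on hardware.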
I'm using the librosa library to convert music segments into mel-spectrograms to use as inputs for my neural network, as shown in the docs here.
How is this different from MFCCs, if at all? Are there any advantages or disadvantages to using either?
To get MFCC, compute the DCT on the mel-spectrogram. The mel-spectrogram is often log-scaled before.
MFCC is a very compressible representation, often using just 20 or 13 coefficients instead of the 32-64 bands of a mel-spectrogram. The MFCCs are a bit more decorrelated, which can be beneficial with linear models like Gaussian Mixture Models. With lots of data and strong classifiers like Convolutional Neural Networks, the mel-spectrogram can often perform better.
I suppose, jonnor's answer is not exactly correct. There are two steps:
1. Take logs of Mel spectrogram.
2. Compute DCT on logs.
Moreover, taking logs seems to be "the main part" for training NN: https://qr.ae/TWtPLD
A key difference is that the mel-spectrogram has the semantics of a spectrum, whereas MFCC is in a sense a 'spectrum of a spectrum'. The real question is thus: what is the purpose of applying the DCT to the mel-spectrogram? That question has good answers here and there.
Note that librosa now also has an mfcc function. Looking at its implementation basically confirms that it works by:
1. calling melspectrogram,
2. converting its output to logs (via power_to_db),
3. taking the DCT of the frequencies, as if they were a signal,
4. truncating the new 'spectrum of spectrum' after the first n_mfcc coefficients.
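The log, DCT and truncation steps above can be sketched for a single frame without librosa. This is only an illustration under assumptions: mfcc_from_mel_frame and dct2_ortho are hypothetical helper names, the input is a made-up list of mel-band powers, and the plain 10*log10 conversion is a simplification of what power_to_db does (no reference level or dynamic-range clipping).

```python
import math

def dct2_ortho(values):
    """Orthonormal DCT-II of a real sequence (naive O(N^2) version)."""
    n = len(values)
    out = []
    for k in range(n):
        acc = sum(values[i] * math.cos(math.pi * (i + 0.5) * k / n)
                  for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * acc)
    return out

def mfcc_from_mel_frame(mel_power, n_mfcc=13, floor=1e-10):
    """Log-scale one frame of mel-band powers, take the DCT across the
    bands, and keep only the first n_mfcc coefficients."""
    log_mel = [10.0 * math.log10(max(p, floor)) for p in mel_power]
    return dct2_ortho(log_mel)[:n_mfcc]

# Made-up frame with 40 mel bands, purely for illustration.
frame = [1.0 + 0.5 * math.sin(0.3 * band) for band in range(40)]
coeffs = mfcc_from_mel_frame(frame, n_mfcc=13)
```

A frame with identical energy in every band ends up with only the zeroth coefficient non-zero, which illustrates why the DCT output is compact for smooth spectra.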
I have a stream of data measurements with an initial increasing phase that is followed by a plateau. The measurements are noisy without clear bound. I would like to stop ingesting the stream when the plateau is detected:
while (not_const)
{
    add_measurement( stream.get() );
    not_const = !is_const();
}
Is there a well-known algorithm for dealing with this kind of problem? I know about Kalman filters, but I am not sure whether they are meant for this.
The Kalman filter will cover your noise, so long as the variance is calculable. Yes, it can help in this situation. Depending on your application, you may find that the first derivative of a moving average will do as well for you. Kalman merely optimizes some linear parameters to give a "best" prediction of actual (vs observed-through-noise) values.
You still need to handle your interpretation of that prediction series. You need to define what constitutes a "plateau". How close to 0 do you need the computable slope? Does that figure depend on the preceding input? How abrupt is the transition between the increase and the plateau? The latter considerations would suggest looking at the second derivative as well: a quick-change detector of some ilk.
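As a rough illustration of the "first derivative of a moving average" idea, the sketch below stops once the averaged slope has stayed near zero for several consecutive samples. The class name PlateauDetector and the parameters window, slope_tol and patience are hypothetical choices, and the thresholds would need tuning against the real noise level; this is not a Kalman filter.

```python
from collections import deque

class PlateauDetector:
    """Flags a plateau when the slope of a moving average stays small."""

    def __init__(self, window=8, slope_tol=0.05, patience=5):
        self.window = window          # samples in the moving average
        self.slope_tol = slope_tol    # |slope| below this counts as flat
        self.patience = patience      # consecutive flat steps required
        self.recent = deque(maxlen=window)
        self.prev_avg = None
        self.flat_count = 0

    def add(self, value):
        """Feed one sample; return True once a plateau has been detected."""
        self.recent.append(value)
        if len(self.recent) < self.window:
            return False              # average not meaningful yet
        avg = sum(self.recent) / self.window
        if self.prev_avg is not None:
            slope = avg - self.prev_avg   # first difference of the average
            if abs(slope) < self.slope_tol:
                self.flat_count += 1
            else:
                self.flat_count = 0
        self.prev_avg = avg
        return self.flat_count >= self.patience

def samples_until_plateau(stream, detector):
    """Consume the stream until the detector fires; None if it never does."""
    for count, value in enumerate(stream, start=1):
        if detector.add(value):
            return count
    return None

# Synthetic data: a ramp that levels off at 10, plus a small deterministic ripple.
signal = [min(0.2 * i, 10.0) + 0.02 * ((i * 7) % 5 - 2) for i in range(200)]
stopped_at = samples_until_plateau(signal, PlateauDetector())
```

The patience counter is one simple answer to "what constitutes a plateau"; a second-difference check could be added in the same place if the transition needs to be detected faster.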
I need to calculate the function y=x/(1+x^2) on a small FPGA in fixed point. Can you help me find the best algorithm?
I thought of those possibilities:
as the FPGA is small I think I will use shift&add/subtract algebra; a multiplier and a divider will use about the same number of cycles, right?
this function is similar to a digital filter, can I calculate divisionless using a feedback loop?
I don't think I will have much memory so I'd prefer not using a LUT
Are there other options?
The time requirement isn't very strict so I thought of a simple shift&subtract but if there is something simpler, that might be better.
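For reference, the simple shift-and-subtract idea can be modelled in software before committing to hardware. The sketch below is only a bit-level model under assumptions: a hypothetical Q-format with FRAC_BITS fractional bits, one multiply (or shift-and-add equivalent) to form x^2, and a restoring divider that produces one quotient bit per iteration. The function names are made up for illustration.

```python
FRAC_BITS = 12            # assumed fixed-point format: 12 fractional bits
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0

def restoring_divide(numerator, denominator, bits):
    """Unsigned division by shift-and-subtract, one quotient bit per step."""
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Shift the next numerator bit into the partial remainder.
        remainder = (remainder << 1) | ((numerator >> i) & 1)
        quotient <<= 1
        if remainder >= denominator:      # trial subtraction succeeds
            remainder -= denominator
            quotient |= 1
    return quotient

def x_over_one_plus_x2(x_fixed):
    """Compute x / (1 + x^2) on fixed-point integers."""
    sign = -1 if x_fixed < 0 else 1
    mag = abs(x_fixed)
    x_squared = (mag * mag) >> FRAC_BITS  # x^2, truncated to FRAC_BITS
    denominator = ONE + x_squared         # never zero, so no divide-by-zero case
    numerator = mag << FRAC_BITS          # pre-scale so the quotient keeps FRAC_BITS
    bits = max(numerator.bit_length(), 1) # hardware would use a fixed width instead
    return sign * restoring_divide(numerator, denominator, bits)

def to_fixed(x):
    return int(round(x * ONE))

def to_float(q):
    return q / ONE

approx = to_float(x_over_one_plus_x2(to_fixed(0.5)))
```

The divider needs one cycle per quotient bit if implemented serially, so the cycle count is set by the numerator width rather than by the input value.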
I am implementing a SPICE solver. I have the following problem: say I put two diodes and a current source in series (standard diodes). I use MNA and Boost LU decomposition. The problem is that the nodal matrix very quickly becomes near-singular. I think I have to scale the values, but I don't know how and I couldn't find anything on the Internet. Any ideas how to do this scaling?
From a numerical perspective, there is a scaling technique for this kind of near-singular matrix. Basically, you divide each row of A by the sum (or maximum) of the absolute values in that row. For more details, look at KLU, which is a linear solver for circuit simulation.
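A minimal sketch of that row scaling, assuming a dense matrix stored as a list of rows and using the maximum-absolute-value variant, could look like this. scale_rows is a hypothetical helper, not part of KLU or Boost; note that the right-hand side has to be scaled by the same factors so the solution is unchanged.

```python
def scale_rows(a, b):
    """Divide each row of A, and the matching entry of b, by the largest
    absolute value in that row. The solution of A x = b is unchanged."""
    scaled_a, scaled_b = [], []
    for row, rhs in zip(a, b):
        factor = max(abs(v) for v in row)
        if factor == 0.0:
            factor = 1.0          # leave an all-zero row untouched
        scaled_a.append([v / factor for v in row])
        scaled_b.append(rhs / factor)
    return scaled_a, scaled_b

# Toy system with badly mismatched row magnitudes (values are made up).
A = [[1e-12, 2e-12],
     [3.0, 1.0]]
b = [3e-12, 5.0]
A_s, b_s = scale_rows(A, b)
```

After scaling, every row has a largest entry of magnitude 1, so pivoting decisions in the LU step are no longer dominated by the units of one equation.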
From the perspective of SPICE simulation, it uses the so-called Gmin stepping technique to iteratively approach the real answer. You can find this in the documentation of the SPICE-like project QUCS (Quite Universal Circuit Simulator).
Scaling does not help when the matrix has both very large and very small entries.
It is necessary to use some or all of the many tricks that were developed for circuit solver applications. A good start is clipping the range of the exponential and log function arguments to reasonable values -- in most circuits a diode forward voltage is never more than 1V and the diode reverse current not less than 1pA.
Actually, look at all library functions and wrap them in code that makes their arguments and results suitable for circuit-solving purposes. Simple clipping is sometimes good enough, but it is way better to make sure the functions stay (twice) differentiable and continuous.
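One way to wrap the exponential along these lines is sketched below: above a cutoff, the function continues as its second-order Taylor polynomial, so the value and the first and second derivatives all match at the joint. The names safe_exp and diode_current, the cutoff X_MAX and the diode parameters IS and VT are illustrative assumptions, not values taken from any particular simulator.

```python
import math

X_MAX = 40.0      # assumed cutoff for the exponent argument
VT = 0.025852     # assumed thermal voltage in volts (about 300 K)
IS = 1e-14        # assumed diode saturation current in amperes

def safe_exp(x, x_max=X_MAX):
    """exp(x) below x_max; beyond it, a quadratic continuation that keeps
    the function and its first two derivatives continuous at x_max."""
    if x <= x_max:
        return math.exp(x)
    d = x - x_max
    return math.exp(x_max) * (1.0 + d + 0.5 * d * d)

def safe_exp_derivative(x, x_max=X_MAX):
    """Derivative of safe_exp, needed for the Newton-Raphson Jacobian."""
    if x <= x_max:
        return math.exp(x)
    return math.exp(x_max) * (1.0 + (x - x_max))

def diode_current(v):
    """Shockley diode equation using the limited exponential."""
    return IS * (safe_exp(v / VT) - 1.0)

# A wild Newton iterate such as 100 V no longer overflows.
current_at_100v = diode_current(100.0)
```

With a plain math.exp the same call would raise OverflowError, and hard clipping instead would give a zero derivative that stalls the Newton iteration.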
I am attempting to extract pitch data from an audio stream. From what I can see, it looks as though FFT is the best algorithm to use.
Rather than digging straight into the math, could someone help me understand what this FFT algorithm does?
Please don't say something obvious like 'FFT extracts frequency data from a raw signal.' I need the next level of detail.
What do I pass in, and what do I get out?
Once I understand the interface clearly, this will help me to understand the implementation.
I take it I need to pass in an audio buffer, and I need to tell it how many bytes to use for each computation (say the most recent 1024 bytes from this buffer), and maybe I need to specify the range of pitches I want it to detect. Now what is it going to pass back? An array of frequency bins? What are these?
(Edit:) I have found a C++ algorithm to use (if I can only understand it)
Performous extracts pitch from the microphone. Also the code is open source. Here is a description of what the algorithm does, from the guy that coded it.
PCM input (with buffering)
FFT (1024 samples at a time, remove 200 samples from front of the buffer afterwards)
Reassignment method (against the previous FFT that was 200 samples earlier)
Filtering of peaks (this part could be done much better or even left out)
Combining peaks into sets of harmonics (we call the combination a tone)
Temporal filtering of tones (update the set of tones detected earlier instead of simply using the newly detected ones)
Pick the best vocal tone (frequency limits, weighting, could use the harmonic array also but I don't think we do)
But could someone help me understand how this works? What is it that is getting sent from the FFT to the Reassignment method?
The FFT is just one building block in the process, and it may not be the best approach for pitch detection. Read up on pitch detection and decide which algorithm you want to use first (this will depend on what exactly you are trying to measure the pitch of: speech, a single musical instrument, other types of sound, etc.). Get this right before getting into low-level details such as the FFT (some, but not all, pitch detection algorithms use the FFT internally).
There are numerous similar questions on SO already, e.g. Real-time pitch detection using FFT and Pitch detection using FFT for trumpet, and there is good overview material on Wikipedia etc - read these and then decide whether you still want to roll your own FFT-based solution or perhaps use an existing library which is suitable for your particular application.
There is an element of choice here. The most straightforward to implement is 2^n complex numbers in and 2^n complex numbers out, so maybe you should start with that.
In the special case of a DCT (discrete cosine transform), what typically goes in is 2^n real samples (often floats), and out come 2^n real values, often floats too. The DCT is closely related to the FFT, but it takes only real values and analyses the function in terms of cosines.
It is smart (but commonly skipped) to define a struct to handle the complex values. Traditionally FFTs are done in-place, but it works fine if you don't.
It can be useful to instantiate a class that contains a work buffer for the FFT (if you don't want to do the FFT in-place), and reuse that for several FFTs.
In goes N samples of PCM (purely real complex numbers). Out comes N bins of frequency domain (each bin corresponding to a 1/N slice of sample rate). Each bin is a complex number. Rather than real and imaginary parts, these values should generally be handled in polar format (absolute value and argument). The absolute value tells the amount of sound near the bin center frequency while the argument tells the phase (at which position the sine wave is travelling).
Most often coders only use the magnitude (absolute value) and throw away the phase angle (argument).
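To make the interface concrete, here is a small sketch of "N real samples in, N complex bins out", followed by the conversion to magnitude and phase. The recursive radix-2 fft function is a textbook illustration rather than the Performous code, and SAMPLE_RATE, N and TONE_BIN are made-up values. For real input the upper half of the bins mirrors the lower half, so only bins 0 to N/2 are inspected.

```python
import cmath
import math

def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

SAMPLE_RATE = 8000                      # assumed sample rate in Hz
N = 64                                  # samples per FFT block
TONE_BIN = 8                            # put the test tone exactly on a bin
tone_hz = TONE_BIN * SAMPLE_RATE / N    # centre frequency of that bin

# In: N real PCM samples (a pure sine here).
pcm = [math.sin(2 * math.pi * tone_hz * t / SAMPLE_RATE) for t in range(N)]

# Out: N complex bins; bin k is centred on k * SAMPLE_RATE / N Hz.
bins = fft(pcm)
magnitudes = [abs(c) for c in bins[: N // 2 + 1]]      # amount of sound per bin
phases = [cmath.phase(c) for c in bins[: N // 2 + 1]]  # phase angle per bin

peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / N
```

A tone that does not sit exactly on a bin centre spreads over neighbouring bins, which is the reason pitch detectors add steps such as the reassignment method and peak filtering after the FFT.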