Is modulation index = "Out" level in Modulator Operator - audio-processing

I'm reading about algorithms in Frequency Modulation. In most synthesizers each operator in an algorithm has an "Out" level knob; in carriers this knob controls the output volume. For modulators, however, the level knob decides how much the modulator changes the carrier.
Is this amount the Modulation Index?

Short answer: yes, I think you're understanding the concept correctly.
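The modulation index is the ratio of the carrier's peak frequency deviation to the modulator frequency. It is therefore directly proportional to the amplitude of the modulator (which sets the deviation), and inversely proportional to the frequency of the modulator.
The formula for the modulation index is: $I = \Delta f / f_m$, where $\Delta f$ is the peak frequency deviation of the carrier and $f_m$ is the modulator frequency.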
You've mentioned that you can set the output level for each operator on your synth. For FM radio, the amplitude of the carrier wave is constant. In music synthesizers you can adapt it to tweak sounds.
That's also because you often have more complex algorithms than what's used for FM radio (one modulator+carrier only). In a DX7 you can cascade up to 6 operators, and in the FS1R and Montage you have 8.
In FM synths you'd use it to get more or fewer sideband frequencies in the resulting signal.
By the way, if you're talking about FM synths:
It's mostly an implementation detail, but they don't actually modulate the frequency but the phase.
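To make the phase-modulation remark concrete, here is a minimal sketch of a two-operator voice. The function names and the idea that the modulator's "Out" level maps linearly onto the index are assumptions for illustration; a real synth may scale its level knob differently.

```python
import math

def fm_operator_pair(carrier_hz, modulator_hz, index, sample_rate=44100, n_samples=8):
    """Two-operator FM voice implemented as phase modulation.

    `index` stands in for the modulator's output level: it scales the
    modulator sine before that sine is added to the carrier's phase.
    """
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        modulator = index * math.sin(2 * math.pi * modulator_hz * t)
        out.append(math.sin(2 * math.pi * carrier_hz * t + modulator))
    return out

def modulation_index(peak_deviation_hz, modulator_hz):
    """Index = peak frequency deviation of the carrier / modulator frequency."""
    return peak_deviation_hz / modulator_hz
```

With `index` set to 0 the output is the plain carrier sine; raising it spreads energy into more sidebands around the carrier.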

Related

power sampling in GNU Radio

I am using GNU Radio with a HackRF. I need to pick out every frequency whose level matches a selected decibel level, or lies above a certain decibel threshold, and save the frequency/dB pairs to a file.
To solve this, I decided to recreate the "QT GUI Frequency Sink" algorithm in an Embedded Python Block, but unfortunately I lack the knowledge of how to convert complex64 data to an FFT frequency/amplitude signal. I have been stuck for several months and would be glad of any advice.
First off: the hackRF is not a calibrated measurement device. You need to use a calibrated device to know at any given gain, sampling rate and RF frequency, what a digital power corresponds to in physical units. No exceptions.
> but unfortunately I lack knowledge of how to convert complex64 data to fft freq / amplitude signal.
You literally just apply the FFT of the length of your choosing, then calculate the magnitude of each complex value in that result. There's nothing more to it. (If the FFT being an algorithm that works on vectors of complex numbers and produces vectors of complex numbers of the same size confuses you, you don't have a programming problem, but a math basics problem!)
In GNU Radio, you can ask for your signal to come in multiples of any given length, so that getting vectors of input samples to transform is trivial python slicing.
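As a rough sketch of "apply the FFT, then take the magnitude of each value", the following uses only the standard library. The small recursive `fft` helper stands in for a library call such as `numpy.fft.fft`; `bins_above` and its parameters are hypothetical names, and the levels are relative to digital full scale, not calibrated power.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def bins_above(samples, sample_rate, center_hz, threshold_db):
    """Return sorted (frequency_hz, level_db) pairs for bins at or above threshold_db."""
    n = len(samples)
    spectrum = fft(samples)
    hits = []
    for k, value in enumerate(spectrum):
        # Bins from n/2 upwards hold the negative baseband frequencies.
        offset_hz = (k if k < n // 2 else k - n) * sample_rate / n
        magnitude = abs(value) / n              # 1.0 for a full-scale complex tone
        level_db = 20 * math.log10(max(magnitude, 1e-12))
        if level_db >= threshold_db:
            hits.append((center_hz + offset_hz, level_db))
    return sorted(hits)
```

Each returned pair can then be written out as one line of a text or CSV file.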

How do fourier processing algorithms deal with "data edges"

I am doing some interesting experiments with audio and image files and Fast-Fourier Transforms (FFTs).
Fast Fourier Transforms are used in signal processing rather than other Fourier Transform algorithms because for large quantities of data they are the only (or one of the only) viable algorithm variants to use, as they scale as O(n log(n)), rather than n^2 as the naive implementation does.
The disadvantage is that the data must be stored in an array which has 2^n elements, for n integer.
When processing some data which does not have 2^n elements, the simple approach is to extend the array to be length 2^n and fill the "empty" elements with zero. (Assuming the mean value of the input signal is zero.)
I wrote a program to process some audio samples taken from WAV files. I tried implementing things such as a low-cut filter. In this case I found that my output signal (after doing the reverse transform) cuts to zero amplitude after a certain period of time. This is obviously not what one would expect of a low-pass filter.
I could dump my code at this point, but that is neither useful, nor legal as the source of my algorithm is a text-book with closed source code.
Instead I shall ask the following question.
Is padding out the array with zeros the best possible thing to do? Could this be causing my program to produce the unexpected results I am seeing? If I understand Fourier mathematics correctly, having a bunch of zeros at the end of my array will introduce a large amount of low- and high-frequency content, as this essentially looks like a step function (a low-frequency square wave). Should I be doing something else, such as implementing my band-pass filter in a different way, for example splitting the data into smaller groups of, say, 1024 samples and applying the FT, filter and IFT (inverse FT) to those small groups?
This question has been tagged with theory as it is not related to any specific programming language. (I assume that is the correct tag to use?)
Edit: It's now working beautifully, thanks all, I was able to pinpoint the 2 mistakes I made using the information below.
All finite-length DFTs and FFTs effectively multiply longer data (source data or a WAV file longer than the FFT) by a rectangular window, which convolves the spectrum with a (periodic) sinc function. Zero-padding uses a shorter rectangular window, which convolves the spectrum with a wider sinc function.
Filtering by multiplication of FFTs results in circular convolution, which wraps the impulse response of the filter around the FFT/IFFT result (e.g. the end of your filtered signal will interfere with the beginning of the filtered signal within the IFFT result). So you want to zero-pad your data before the FFT, and then see the impulse response of your filter go to zero at or before the very end of the filtered result (e.g. not wrap around). Look up the overlap-add and overlap-save algorithms, for using short FFTs for fast convolution filtering of longer signals, which take care of the filter impulse response extending into the zero-padded portion.
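The wrap-around can be seen on a tiny example. This is only a sketch: the naive O(N^2) `dft` helper stands in for an FFT library call, and the signal and two-tap filter are made-up values.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT, standing in for an FFT library call."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def filter_via_dft(signal, impulse_response, padded_length):
    """Zero-pad both inputs to padded_length, multiply spectra, transform back."""
    x = list(signal) + [0.0] * (padded_length - len(signal))
    h = list(impulse_response) + [0.0] * (padded_length - len(impulse_response))
    product = [a * b for a, b in zip(dft(x), dft(h))]
    return [v.real for v in dft(product, inverse=True)]

signal = [1.0, 2.0, 3.0, 4.0]
impulse_response = [0.5, 0.5]   # two-point moving average

# No extra padding: the filter tail wraps onto the first output sample.
wrapped = filter_via_dft(signal, impulse_response, len(signal))

# Padding to len(signal) + len(impulse_response) - 1 gives linear convolution.
linear = filter_via_dft(signal, impulse_response,
                        len(signal) + len(impulse_response) - 1)
```

In `wrapped` the first sample is contaminated by the last input sample, while `linear` carries the filter tail in its extra final sample instead.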
You can also use FFTs that are not a power of 2 in length. Any length that can be factored into small primes will work with most modern FFT libraries.
It depends what you are interested in.
If you are just interested in spectrum magnitude, then place the real data in the middle of the window to be processed. Just know that this time shift will put a phase shift into the spectrum result.
Regardless of the number of points, do not forget to place a window on your data. Wikipedia has a good write up on the windowing functions at https://en.wikipedia.org/wiki/Window_function.
If you do not perform some sort of windowing on your real world data, the padded signal will appear to have a step up and a step down at the end of the valid data (which puts a lot of noise into your spectrum giving you the false impression that you have a noise floor).
So, my recommendation, if you primarily care about magnitude:
- develop a Hamming window for the number of points of valid data you have.
- apply the Hamming window to the data you have.
After that you have OPTIONS:
A) if your sample count is slightly above a power of two, use the lower power of two (i.e. if you have 1400 points, do two 1024-point FFTs with overlap). The results of these two FFTs can be "smartly" combined for an aggregate spectrum. Depending on your fidelity needs, you can do this with more FFTs with a larger portion of overlapped data. Try to keep the overlap less than 10% to account for your window edges that will get attenuated by the start and end of the windowing functions.
B) place your windowed data anywhere in the FFT input vector (beginning, middle or end, it should only impact your phase results - which is why I asked if phase is important).
If it turns out phase is important, start your valid windowed data at the beginning of the FFT vector.
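A minimal sketch of the window-then-pad recipe, assuming the standard symmetric Hamming definition and placing the windowed data at the start of the FFT input vector. Both function names are hypothetical.

```python
import math

def hamming(n_points):
    """Symmetric Hamming window: 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def window_and_pad(data, fft_length):
    """Window only the valid data, then zero-fill up to fft_length."""
    window = hamming(len(data))
    windowed = [d * w for d, w in zip(data, window)]
    return windowed + [0.0] * (fft_length - len(windowed))
```

The window is built for the number of valid points, not for the FFT length, so the data tapers smoothly into the zero padding.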
Regarding your spectrum observations (I just went through the same thing two weeks ago). If you are looking at a wave file converted from a lossy compression, you are going to be starting with a band limited signal, so expect the spectrum to do an abrupt drop. My first lossless wave file plot had a huge bald spot from Fs/10 -> 9Fs/10 (which is expected). For your plots - also display your data in logarithmic bins (linear bins will give you misleading info and squish the lower frequency elements which are the bulk of the signal in compressed music files).
FYI - I recommended Hamming (because I did the same thing). A decoded compressed audio signal will only use a portion of your spectrum (a decoded 320 kbps stream is effectively sampled at 10 kHz); even when decoded to a 44.1 kHz representation, all of the interesting data should be below 5 kHz.
Best of luck
J.R.
P.S. this is my first post here, chime back if you want some pretty pictures from TeraPlot.
This is a question for http://dsp.stackexchange.com but yes, zero-padding is perfectly legitimate here.
Here’s why the filtered signal (once it’s back in the time-domain) goes to zero after some time: imagine linearly-convolving the zero-padded signal with your low-pass filter’s impulse response (using the slow O(N^2) time-domain filter implementation). The output will go to zero after the original signal is done, when the filter is just being fed with zeros, right? That result will be the same as the output of FFT-based fast convolution. It’s perfectly normal. Just crop the output signal to the same length of the input and move on with your life.
Caveat on FFT orders: just because power-of-two FFT lengths are “the fastest” in terms of operation count, while FFTs of lengths with low prime factors (3, 5, 7) have slightly higher operation counts, you may find that zero-padding to a low-prime-factor is faster in terms of real-world runtime because of memory costs. A pathological example: if you have a 1025-long signal, you probably don’t want to zero-pad to 2048 and eat the cost of allocating a nearly 2x memory buffer, and running a nearly 2x longer FFT. You’d try 1080-length FFT or something (1080 = 2^3 * 3^3 * 5: nextprod is your friend) and wouldn’t be surprised if it completed much faster than power-of-two.
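For illustration, a nextprod-style search is only a few lines. This sketch assumes the allowed prime factors are 2, 3 and 5; `next_fast_length` is a made-up name, and libraries ship their own equivalents (for example `scipy.fft.next_fast_len`).

```python
def next_fast_length(n, factors=(2, 3, 5)):
    """Smallest integer >= n whose prime factors all come from `factors`."""
    candidate = max(n, 1)
    while True:
        remainder = candidate
        for f in factors:
            while remainder % f == 0:
                remainder //= f
        if remainder == 1:
            return candidate
        candidate += 1
```

For the 1025-sample example this returns 1080 rather than 2048.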

When should I use Low-pass Filter?

I'm trying to find the pitch of a guitar string. Sound is coming in through a mic at a sample rate of 44100 Hz. I'm using a buffer size of 2048 bytes. Considering the Nyquist rate, there is no point in using a bigger buffer size. After receiving the data, I apply a Hanning window... and this is the point where I get confused. Should I use a low-pass filter in the time domain or take the FFT first? If I took the FFT first, wouldn't it be easier to use just the first half of the samples, disregarding the other half, because I need frequencies in the range of 50-1000 Hz? After the FFT I will use Harmonic Product Spectrum to find the fundamental frequency.
What you suggest makes some sense: if you don't need low frequencies you don't need to use long samples. With long samples you gain frequency resolution, which might be useful in some circumstances, but you lose time resolution (in the sense that successive samples are further apart).
A few things that don't make sense:
1) using a low-pass digital filter in the computation prior to the FFT (I'm assuming this is what you mean) just takes extra computation time and doesn't really gain you anything.
2) "Considering the Nyquist rate there is no point for using bigger buffer size": these aren't really related. The Nyquist rate determines the maximum frequency of the FFT, and the buffer size determines the frequency resolution, and therefore also the lowest frequency.
It really depends on your pitch detection algorithm, but why would you use a low-pass filter in the first place?
In addition, a guitar usually produces spectral information way beyond 1000Hz. Notes on the high E string easily produce harmonics at 4-5kHz and beyond, and these harmonics are exactly what will make your HPS nice and clear.
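To show why the upper harmonics help, here is a minimal Harmonic Product Spectrum sketch that works on an already computed magnitude spectrum. The function names and the choice of three harmonics are assumptions for illustration.

```python
def harmonic_product_spectrum(magnitudes, n_harmonics=3):
    """Multiply the spectrum by copies of itself compressed by 2, 3, ..., n_harmonics."""
    usable = len(magnitudes) // n_harmonics
    hps = []
    for k in range(usable):
        product = 1.0
        for h in range(1, n_harmonics + 1):
            product *= magnitudes[k * h]   # bin k, its 2nd harmonic, its 3rd, ...
        hps.append(product)
    return hps

def strongest_bin(hps, min_bin=1):
    """Index of the largest HPS value, skipping the DC bin."""
    return max(range(min_bin, len(hps)), key=lambda k: hps[k])
```

A bin only scores highly when its multiples are also strong, so the fundamental wins even when one of its harmonics is louder. Low-pass filtering those harmonics away would remove exactly the evidence the method relies on.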
The less data used or the shorter your FFT, the lower the resulting FFT frequency resolution.
From what I read here a guitar ranges from 82.4 (open 6th string) to 659.2 (12th fret on 1st string) and the difference between the lowest 2 notes is about 5Hz.
If possible, I would apply an analog filter after the mic, but before the sampling circuit. Failing that, you would normally apply an FIR filter before shaping everything with the Hanning function. You could also use Decimation to reduce the sample rate, or simply choose a lower sample rate to start with.
Since you are doing an FFT anyway, simply throw away results above 1000 Hz. Sadly, you can't cut back on how long you record: frequency resolution is the sample rate divided by the number of samples, so a lower sample rate only lets you use proportionally fewer samples for the same resolution.
2048 samples at 44100 Hz will give the same resolution as 1024 samples at 22050 Hz.
Which is the same as 512 samples at 11025 Hz.
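The arithmetic behind this is just the sample rate divided by the number of samples. A small sketch, with helper names invented for illustration:

```python
import math

def bin_width_hz(sample_rate, n_samples):
    """Spacing between adjacent FFT bins."""
    return sample_rate / n_samples

def samples_needed(sample_rate, resolution_hz):
    """Fewest samples whose bin spacing is at most resolution_hz."""
    return math.ceil(sample_rate / resolution_hz)
```

`bin_width_hz(44100, 2048)` is about 21.5 Hz, which is much coarser than the roughly 5 Hz gap between the two lowest guitar notes mentioned earlier; `samples_needed(44100, 5)` shows how long the buffer would have to be for 5 Hz bins.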

Algorithm to decide if digital audio data is clipping?

Is there an algorithm or some heuristic to decide whether digital audio data is clipping?
The simple answer is that if any sample has the minimum or maximum value (-32768 and +32767 respectively for 16-bit samples), you can consider it clipping. This isn't strictly true, since that value may actually be the correct value, but there is no way to tell whether +32767 really should have been +33000.
For a more complicated answer: There is such a thing as sample counting clipping detectors that require x consecutive samples to be at the max/min value for them to be considered clipping (where x may be as high as 7). The theory here is that clipping in just a few samples is not audible.
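A sample-counting detector of that kind might look like the sketch below. It assumes 16-bit signed samples and a default run length of 3; both the function name and that default are illustrative choices, not taken from any particular tool.

```python
def find_clipped_runs(samples, min_run=3, low=-32768, high=32767):
    """Return (start_index, length) for each run of >= min_run samples pinned at a rail."""
    runs = []
    run_start = None   # index where the current run began
    run_value = None   # which rail (low or high) the current run sits on
    for i, s in enumerate(samples):
        rail = s if s in (low, high) else None
        if rail is not None and rail == run_value:
            continue                      # still inside the same run
        if run_value is not None and i - run_start >= min_run:
            runs.append((run_start, i - run_start))
        run_start, run_value = (i, rail) if rail is not None else (None, None)
    # Flush a run that reaches the end of the data.
    if run_value is not None and len(samples) - run_start >= min_run:
        runs.append((run_start, len(samples) - run_start))
    return runs
```

Setting `min_run=1` reduces this to the simple "any sample at the rail" test.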
That said, there is audio equipment that clips quite audibly even at values below the maximum (and above the minimum). Typical advice is to master music to peak at -0.3 dB instead of 0.0 dB for this reason. You might want to consider any sample above that level to be clipping. It all depends on what you need it for.
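Turning a level such as -0.3 dB into a sample threshold is a one-line conversion. This sketch assumes 16-bit samples with 32767 as full scale; the helper names are hypothetical.

```python
def dbfs_to_sample_level(dbfs, full_scale=32767):
    """Convert a level in dB relative to full scale into a linear sample magnitude."""
    return full_scale * 10 ** (dbfs / 20)

def samples_over(samples, dbfs=-0.3, full_scale=32767):
    """Indices of samples whose magnitude exceeds the given dBFS level."""
    limit = dbfs_to_sample_level(dbfs, full_scale)
    return [i for i, s in enumerate(samples) if abs(s) > limit]
```

At -0.3 dB the limit sits a little over a thousand counts below full scale, so this flags samples that a strict "equals the rail" test would miss.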
If you ever receive values at the maximum or minimum, then you are, by definition, clipping. Those values represent their particular value as well as all values beyond, and so they are best used as outside bounds detectors.
-Adam
For digital audio data, the term "clipping" doesn't really carry a lot of meaning other than "max amplitude". In the analog world, audio data comes from some hardware which usually contains a "clipping register", which allows you the possibility of a maximum amplitude that isn't clipped.
What might be better suited to digital audio is to set some threshold based on the limitations of your output D/A. If you're doing VOIP, then choose some threshold typical of handsets or cell phones, and call it "clipping" if your digital audio gets above that. If you're outputting to high-end home theater systems, then you probably won't have any "clipping".
I just noticed that there even are some nice implementations.
For example in Audacity:
Analyze → Find Clipping…
What Adam said. You could also add some logic to detect maximum amplitude values over a period of time and only flag those, but the essence is to determine if/when the signal hits the maximum amplitude.

Peak detection of measured signal

We use a data acquisition card to take readings from a device that increases its signal to a peak and then falls back to near the original value. To find the peak value we currently search the array for the highest reading and use the index to determine the timing of the peak value which is used in our calculations.
This works well if the highest value is the peak we are looking for but if the device is not working correctly we can see a second peak which can be higher than the initial peak. We take 10 readings a second from 16 devices over a 90 second period.
My initial thoughts are to cycle through the readings checking to see if the previous and next points are less than the current to find a peak and construct an array of peaks. Maybe we should be looking at a average of a number of points either side of the current position to allow for noise in the system. Is this the best way to proceed or are there better techniques?
We do use LabVIEW and I have checked the LAVA forums and there are a number of interesting examples. This is part of our test software and we are trying to avoid using too many non-standard VI libraries so I was hoping for feedback on the process/algorithms involved rather than specific code.
There are lots and lots of classic peak detection methods, any of which might work. You'll have to see what, in particular, bounds the quality of your data. Here are basic descriptions:
1) Between any two points in your data, (x(0), y(0)) and (x(n), y(n)), add up |y(i + 1) - y(i)| for 0 <= i < n and call this T ("travel"), and set R ("rise") to |y(n) - y(0)| + k for suitably small k. T/R > 1 indicates a peak. This works OK if large travel due to noise is unlikely or if noise distributes symmetrically around a base curve shape. For your application, accept the earliest peak with a score above a given threshold, or analyze the curve of travel per rise values for more interesting properties.
2) Use matched filters to score similarity to a standard peak shape (essentially, use a normalized dot-product against some shape to get a cosine-metric of similarity).
3) Deconvolve against a standard peak shape and check for high values (though I often find method 2 to be less sensitive to noise for simple instrumentation output).
4) Smooth the data and check for triplets of equally spaced points where, if x0 < x1 < x2, y1 > 0.5 * (y0 + y2), or check Euclidean distances like this: D((x0, y0), (x1, y1)) + D((x1, y1), (x2, y2)) > D((x0, y0), (x2, y2)), which relies on the triangle inequality. Using simple ratios will again provide you a scoring mechanism.
5) Fit a very simple 2-gaussian mixture model to your data (for example, Numerical Recipes has a nice ready-made chunk of code). Take the earlier peak. This will deal correctly with overlapping peaks.
6) Find the best match in the data to a simple Gaussian, Cauchy, Poisson, or what-have-you curve. Evaluate this curve over a broad range and subtract it from a copy of the data after noting its peak location. Repeat. Take the earliest peak whose model parameters (standard deviation probably, but some applications might care about kurtosis or other features) meet some criterion. Watch out for artifacts left behind when peaks are subtracted from the data. Best match might be determined by the kind of match scoring suggested in #2 above.
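As an illustration of the first method in the list, here is a sketch of the travel-over-rise score. It assumes "travel" means the sum of absolute point-to-point differences (a plain signed sum would collapse to y(n) - y(0)), and the function name and default `k` are made up.

```python
def travel_over_rise(y, k=1e-6):
    """Total absolute travel along the segment divided by its net rise (plus k)."""
    travel = sum(abs(b - a) for a, b in zip(y, y[1:]))
    rise = abs(y[-1] - y[0]) + k
    return travel / rise
```

A steadily rising segment scores just under 1, while a segment that climbs and falls back again scores well above 1.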
I've done what you're doing before: finding peaks in DNA sequence data, finding peaks in derivatives estimated from measured curves, and finding peaks in histograms.
I encourage you to attend carefully to proper baselining. Wiener filtering or other filtering or simple histogram analysis is often an easy way to baseline in the presence of noise.
Finally, if your data is typically noisy and you're getting data off the card as unreferenced single-ended output (or even referenced, just not differential), and if you're averaging lots of observations into each data point, try sorting those observations and throwing away the first and last quartile and averaging what remains. There are a host of such outlier elimination tactics that can be really useful.
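The quartile-trimming idea fits in a few lines. A sketch, assuming a non-empty list of raw observations per data point; `interquartile_mean` is a hypothetical name.

```python
def interquartile_mean(observations):
    """Sort, drop the lowest and highest quarter, and average what remains."""
    ordered = sorted(observations)
    cut = len(ordered) // 4
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)
```

With fewer than four observations nothing is trimmed and this falls back to the plain mean.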
You could try signal averaging, i.e. for each point, average the value with the surrounding 3 or more points. If the noise blips are huge, then even this may not help.
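A centered moving average is one simple way to do this. In this sketch the window shrinks at the ends of the data rather than padding, which is an assumption; `half_width=1` averages each point with its two neighbours.

```python
def moving_average(values, half_width=1):
    """Replace each point by the mean of itself and up to half_width points either side."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

A single-sample blip is spread out and reduced, but as noted, a large enough blip will still dominate its neighbourhood.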
I realise that this was language agnostic, but guessing that you are using LabVIEW, there are lots of pre-packaged signal processing VIs that come with LabVIEW that you can use to do smoothing and noise reduction. The NI forums are a great place to get more specialised help on this sort of thing.
This problem has been studied in some detail.
There are a set of very up-to-date implementations in the TSpectrum* classes of ROOT (a nuclear/particle physics analysis tool). The code works in one- to three-dimensional data.
The ROOT source code is available, so you can grab this implementation if you want.
From the TSpectrum class documentation:
The algorithms used in this class have been published in the following references:
[1] M. Morhac et al.: Background elimination methods for multidimensional coincidence gamma-ray spectra. Nuclear Instruments and Methods in Physics Research A 401 (1997) 113-132.
[2] M. Morhac et al.: Efficient one- and two-dimensional Gold deconvolution and its application to gamma-ray spectra decomposition. Nuclear Instruments and Methods in Physics Research A 401 (1997) 385-408.
[3] M. Morhac et al.: Identification of peaks in multidimensional coincidence gamma-ray spectra. Nuclear Instruments and Methods in Physics Research A 443 (2000) 108-125.
The papers are linked from the class documentation for those of you who don't have a NIM online subscription.
The short version of what is done is that the histogram is flattened to eliminate noise, and then local maxima are detected by brute force in the flattened histogram.
I would like to contribute to this thread an algorithm that I have developed myself:
It is based on the principle of dispersion: if a new datapoint is a given x number of standard deviations away from some moving mean, the algorithm signals (also called z-score). The algorithm is very robust because it constructs a separate moving mean and deviation, such that signals do not corrupt the threshold. Future signals are therefore identified with approximately the same accuracy, regardless of the amount of previous signals. The algorithm takes 3 inputs: lag = the lag of the moving window, threshold = the z-score at which the algorithm signals and influence = the influence (between 0 and 1) of new signals on the mean and standard deviation. For example, a lag of 5 will use the last 5 observations to smooth the data. A threshold of 3.5 will signal if a datapoint is 3.5 standard deviations away from the moving mean. And an influence of 0.5 gives signals half of the influence that normal datapoints have. Likewise, an influence of 0 ignores signals completely for recalculating the new threshold: an influence of 0 is therefore the most robust option.
It works as follows:
Pseudocode
# Let y be a vector of timeseries data of at least length lag+2
# Let t be the length of y
# Let mean() be a function that calculates the mean
# Let std() be a function that calculates the standard deviation
# Let absolute() be the absolute value function
# Settings (the ones below are examples: choose what is best for your data)
set lag to 5;          # lag 5 for the smoothing functions
set threshold to 3.5;  # 3.5 standard deviations for signal
set influence to 0.5;  # between 0 and 1, where 1 is normal influence, 0.5 is half
# Initialise variables
set signals to vector 0,...,0 of length of y;   # Initialise signal results
set filteredY to y(1,...,lag);                  # Initialise filtered series
set avgFilter to null;                          # Initialise average filter
set stdFilter to null;                          # Initialise std. filter
set avgFilter(lag) to mean(y(1,...,lag));       # Initialise first value
set stdFilter(lag) to std(y(1,...,lag));        # Initialise first value
for i=lag+1,...,t do
  if absolute(y(i) - avgFilter(i-1)) > threshold*stdFilter(i-1) then
    if y(i) > avgFilter(i-1) then
      set signals(i) to +1;                     # Positive signal
    else
      set signals(i) to -1;                     # Negative signal
    end
    # Reduce the influence of the signal on the filtered series
    set filteredY(i) to influence*y(i) + (1-influence)*filteredY(i-1);
  else
    set signals(i) to 0;                        # No signal
    set filteredY(i) to y(i);
  end
  # Adjust the filters over the last lag filtered values
  set avgFilter(i) to mean(filteredY(i-lag+1,...,i));
  set stdFilter(i) to std(filteredY(i-lag+1,...,i));
end
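A direct Python transcription of the pseudocode could look like the sketch below. It assumes a rolling window of exactly `lag` filtered values and uses the population standard deviation (`statistics.pstdev`); the function name is made up, and other implementations may use the sample standard deviation instead.

```python
from statistics import mean, pstdev

def zscore_peaks(y, lag=5, threshold=3.5, influence=0.5):
    """Return a list of +1 / -1 / 0 signals, one per element of y."""
    if len(y) < lag + 2:
        raise ValueError("need at least lag + 2 samples")
    signals = [0] * len(y)
    filtered = list(y[:lag])          # filtered series, seeded with raw data
    avg = mean(filtered)              # moving mean of the last `lag` values
    std = pstdev(filtered)            # moving deviation of the last `lag` values
    for i in range(lag, len(y)):
        if abs(y[i] - avg) > threshold * std:
            signals[i] = 1 if y[i] > avg else -1
            # Signals only partly update the filtered series.
            filtered.append(influence * y[i] + (1 - influence) * filtered[-1])
        else:
            filtered.append(y[i])
        window = filtered[-lag:]
        avg = mean(window)
        std = pstdev(window)
    return signals
```

With `influence=0` a detected spike leaves the moving mean and deviation untouched, so the threshold for later points is not inflated by earlier signals.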
> For more information, see original answer
This method is basically from David Marr's book "Vision"
Gaussian blur your signal with the expected width of your peaks. This gets rid of noise spikes, and your phase data is undamaged.
Then edge detect (LoG, the Laplacian of Gaussian, will do).
The edges you find are the edges of features (like peaks).
Look between edges for peaks, sort the peaks by size, and you're done.
I have used variations on this and they work very well.
I think you want to cross-correlate your signal with an expected, exemplar signal. But, it has been such a long time since I studied signal processing and even then I didn't take much notice.
I don't know very much about instrumentation, so this might be totally impractical, but then again it might be a helpful different direction. If you know how the readings can fail, and there is a certain interval between peaks given such failures, why not do gradient descent at each interval. If the descent brings you back to an area you've searched before, you can abandon it. Depending upon the shape of the sampled surface, this also might help you find peaks faster than search.
Is there a qualitative difference between the desired peak and the unwanted second peak? If both peaks are "sharp" -- i.e. short in time duration -- when looking at the signal in the frequency domain (by doing FFT) you'll get energy at most bands. But if the "good" peak reliably has energy present at frequencies not existing in the "bad" peak, or vice versa, you may be able to automatically differentiate them that way.
You could apply some Standard Deviation to your logic and take notice of peaks over x%.

Resources