How to spectrum-invert a sampled audio signal - algorithm

I am looking for simple (pseudo)code that spectrum-inverts a sampled audio signal, ideally in C++.
The code should support different sample rates (16/32/48 kHz).

Mixing the signal by Fs/2 will swap high frequencies and low frequencies - think of rotating the spectrum around the unit circle by half a turn. You can achieve this rotation by multiplying every other sample by -1.
Mixing by Fs/2 is equivalent to mixing by exp(j*pi*n). If x is the input and y the output,
y[n] = x[n] * exp(j*pi*n) = x[n] * [cos(pi*n) + j*sin(pi*n)]
This simplifies nicely because sin(pi*n) is 0 for integer n, and cos(pi*n) alternates between 1 and -1.
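For illustration, a minimal C++ sketch of that sign flip, assuming mono samples stored as floats in a std::vector (the function name spectrumInvert is just a placeholder). Because the flip is always about Fs/2, the same code works unchanged at 16, 32, or 48 kHz:

#include <cstddef>
#include <vector>

// Spectrum inversion: multiply every other sample by -1,
// i.e. y[n] = x[n] * (-1)^n. Valid for any sample rate.
void spectrumInvert(std::vector<float>& samples)
{
    for (std::size_t n = 1; n < samples.size(); n += 2)
        samples[n] = -samples[n];
}

If the audio is interleaved stereo, flip every other frame (pair of samples) rather than every other interleaved sample, so both channels see the same (-1)^n sequence.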

In order to get something that has the same type of temporal structure as the original, you need to
Create a spectrogram (with some window size)
Pick some upper and lower frequency bounds that you'll flip
Flip the spectrogram's intensities within those bounds
Resynthesize a sound signal consistent with those frequencies
Since it's an audio signal, it doesn't much matter that the phases will be all messed up. You can't generally hear them anyway. Except for the flipping part, ARSS does the spectrogram creation and sound resynthesis.
Otherwise, you can just take an FFT, invert the amplitudes of the components, and take the inverse FFT. But that will be essentially nonsensical, as it will completely scramble the temporal structure of the sound as well as the frequency structure.

It does not make much sense to use a cosine. For a digital signal it is not necessary to run a real ring modulator here; at Nyquist a cosine is a square wave anyway.
So you just multiply every other sample by -1 and you are done.
No latency, no aliasing, no nothing.

Related

What algorithm can find noise in a signal?

In class the lecturer asked us a question: during a concert the conductor hears a false note (noise). Translating this into signal fundamentals, how does the conductor detect this noise?
My guess is that it may be related to the Fourier transform, but I'm not sure I'm even close to the answer.
Check out a spectrogram. It's a 3D representation of a signal's frequency content over time.
As a general routine:
divide your time-domain recording of a musical piece into suitably sized time segments, small enough that they can capture the shortest musical note you want to represent (preferably smaller than this amount as you want more granular measurements of when the note starts/stops).
take the Fourier transform of each time segment, and represent this information in a spectrogram (X-axis for time, Y-axis for frequency, and Z-axis (colour) for signal power).
do appropriate filtering on each time segment to keep only frequencies with significant signal power.
compare this against your sheet music. Sheet music is essentially a spectrogram, telling you which notes (frequencies) should be played at which times (using the BPM or time signature of the music). If you have a note present in the spectrogram but not in the sheet music, it's spurious or accidental (or the result of a badly formed spectrogram). A rough code sketch of the spectrogram and thresholding steps follows below.
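Here is a hedged C++ sketch of those steps; the function and parameter names (significantFrequencies, segmentSize, powerThreshold) are made up for illustration, and a naive DFT stands in for a proper FFT library:

#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// For each time segment, return the frequencies (Hz) whose power exceeds the threshold.
std::vector<std::vector<double>> significantFrequencies(const std::vector<double>& x,
                                                        double sampleRate,
                                                        std::size_t segmentSize,
                                                        double powerThreshold)
{
    const double pi = 3.14159265358979323846;
    std::vector<std::vector<double>> result;
    for (std::size_t start = 0; start + segmentSize <= x.size(); start += segmentSize)
    {
        std::vector<double> freqs;
        for (std::size_t k = 1; k < segmentSize / 2; ++k)            // skip DC, stop below Nyquist
        {
            std::complex<double> bin(0.0, 0.0);
            for (std::size_t n = 0; n < segmentSize; ++n)
            {
                double w = 0.5 - 0.5 * std::cos(2.0 * pi * n / (segmentSize - 1)); // Hann window
                bin += w * x[start + n]
                     * std::exp(std::complex<double>(0.0, -2.0 * pi * k * n / segmentSize));
            }
            if (std::norm(bin) / segmentSize > powerThreshold)       // |X[k]|^2, roughly normalized
                freqs.push_back(k * sampleRate / segmentSize);
        }
        result.push_back(freqs);
    }
    return result;
}

Each inner vector plays the role of one spectrogram column; comparing those frequencies against the expected notes from the sheet music flags the spurious ones.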

What is a good algorithm to detect silence over a variety of recording environments?

My app processes samples from microphone audio streams. The task I'm asking about: programmatically make a good guess at what ranges of the audio stream samples should be considered signal versus noise. The "signal", in this case, is human speech.
The audio I'm getting from users comes from recording environments that I can't control or know much about. These users may be speaking into a professional microphone from a treated space or into a crummy laptop mic in their living room. Very noisy environments with excessive background noise, e.g. a busy restaurant, are outside of what I need to accommodate.
It would make the analysis simpler, but I don't want to request the user to record a room noise sample within my app. And I don't want the user to manually designate a range of audio as silence.
If I'm just looking at recorded audio within a DAW, it is simple and intuitive to spot where silence (nobody talking) is within the waveform. Basically, there's a bunch of relatively flat horizontal lines that are closer to negative infinity dB (absolute silence) than any other flat lines.
I have no problems using various sound APIs to access samples. I've omitted technical specifics because I'm just asking about a suitable algorithm, and even if an existing library were available, I'd prefer my own implementation.
What algorithm is good for classifying sample ranges as silence for the requirements I've described?
Here is one simple algorithm that has its pros and cons:
Perform the calculation of noise floor as described below once. Or to continuously respond to changing conditions during recording, you can perform this calculation at some reasonably responsive interval like once every second.
Let K = a constant ratio of noise to signal, e.g. 10%, expected in worst cases.
Let A = the sample with the highest amplitude from the microphone audio stream.
Noise floor is A * K
Once the noise floor has been calculated, you can apply it to any range of the audio stream samples to classify values above the noise floor as signal and below the noise floor as noise.
With the above algorithm, samples are assumed to be stored in a typical computer audio format, e.g. PCM, where 0 is silence and a negative/positive sample value is air pressure creating sound. Negative samples can be negated to positive values when evaluating them.
K could be something like 10%. It's the noise/signal ratio expected in one of the poorest recording environments you'd like to support. Analyzing test recordings will show what the ratio should be. If the actual noise in a recording is louder than this ratio predicts, that noise will be misclassified as signal by the algorithm; setting K higher than necessary instead risks misclassifying quiet speech as noise.
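A minimal C++ sketch of this noise-floor rule, assuming 16-bit PCM samples in a std::vector (the function name classifySignal is just illustrative):

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Noise floor = (highest absolute amplitude A) * K. Samples above the floor
// are classified as signal, samples below it as noise.
std::vector<bool> classifySignal(const std::vector<int16_t>& pcm, double k = 0.10)
{
    double peak = 0.0;                               // A: highest amplitude in the range
    for (int16_t s : pcm)
        peak = std::max(peak, std::abs(static_cast<double>(s)));

    const double noiseFloor = peak * k;

    std::vector<bool> isSignal(pcm.size());
    for (std::size_t i = 0; i < pcm.size(); ++i)
        isSignal[i] = std::abs(static_cast<double>(pcm[i])) > noiseFloor;
    return isSignal;
}

To track changing conditions, recompute the floor on, say, one-second blocks as described above.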
Pros:
Easy to implement.
Computationally inexpensive. O(n) for a single pass over a sample array to find the highest peak value.
Cons:
It depends on the samples used to calculate the noise floor actually containing signal (speech). So there has to be some way, outside of the algorithm, of knowing that the samples contain signal.
Any loud noise that isn't speech but has a higher amplitude, e.g. a hand clap, can raise the noise floor above the speech level, causing speech to be misclassified as noise.
The K value is a fudge factor. There are more direct ways to detect the actual noise floor from the samples.

Filtering rotational acceleration (Appropriate use for Kalman filter?)

I'm working on a project in which a rod is attached at one end to a rotating shaft. So, as the shaft rotates from 0 to ~100 degrees back-and-forth (in the xy plane), so does the rod. I mounted a 3-axis accelerometer at the end of the moving rod, and I measured the distance of the accelerometer from the center of rotation (i.e., the length of the rod) to be about 38 cm. I have collected a lot of data, but I'm in need of help to find the best method to filter it. First, here's a plot of the raw data:
I think the data makes sense: while it's ramping up, the acceleration should be linearly increasing, and when it's ramping down, it should linearly decrease. If it's moving at constant speed, the acceleration will be ~zero. Keep in mind though that sometimes the speed changes (is higher) from one "trial" to the other. In this case, there were ~120 "trials" or movements/sweeps, with data sampled at 148 Hz.
For filtering, I've tried a low-pass filter and then an exponentially decreasing moving average, and neither plot looked too hot. And although I'm not good at interpreting these, here is what I got when plotting power versus frequency:
What I was hoping to get help with here is finding a really good method to filter this data. The one thing that keeps coming up time and time again (especially on this site) is the Kalman filter. While there's lots of code online that helps implement these in MATLAB, I haven't been able to really understand it, and therefore neglect to include my work on it here. So, is a Kalman filter appropriate here, for rotational acceleration? If so, can someone help me implement one in MATLAB and interpret it? Is there something I'm not seeing that may be just as good or better and relatively simple?
Here's the data I'm talking about. Looking at it more closely/zooming in gives a better appreciation for what's going on in the movement, I think:
http://cl.ly/433B1h3m1L0t?_ga=1.81885205.2093327149.1426657579
Edit: OK, here is the plot of both relevant dimensions collected from the accelerometer. I am neglecting to include the up-and-down dimension, as the accelerometer shows a near-constant ~1 g there, so I think it's safe to say it's not capturing much rotational motion. Red is what I believe is the centripetal component, and blue is tangential. I have no idea how to combine them though, which is why I (maybe wrongfully?) ignored it in my post.
And here is the data for the other dimension:
http://cl.ly/1u133033182V?_ga=1.74069905.2093327149.1426657579
Forget the Kalman filter, see the note at the end of the answer for the reason why.
Use a simple moving-average filter (like I showed you in an earlier reply, if I recall), which is in essence a low-pass filter:
n = 30 ; %// length of the filter
kernel = ones(1,n)./n ;
ysm = filter( kernel , 1 , flipud(filter( kernel , 1 , flipud(y) )) ) ;
%// assuming your data "y" are in COLUMN (otherwise change 'flipud' to 'fliplr')
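%// filtering the flipped signal, flipping back, and filtering again runs the moving
%// average backward and then forward, which cancels its phase delay (zero-phase smoothing)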
Note: if you have access to the Curve Fitting Toolbox, you can simply use ys = smooth(y,30) ; to get nearly the same result.
I get:
which once zoomed look like:
You can play with the parameter n to increase or decrease the smoothing.
The gray signal is your original signal. I strongly suspect that the noise spikes you are getting are just due to the vibrations of your rod. (Depending on the length-to-cross-section ratio of your rod, you can get significant vibrations at the end of a 38 cm rod. These vibrations take the shape of oscillations around the main carrier signal, which definitely looks like what I am seeing in your signal.)
Note:
The Kalman filter is way overkill for simple filtering of noisy data. The Kalman filter is used when you want to calculate a value (a position, if I follow your example) based on some noisy measurement; but to refine the calculation, the Kalman filter will also use a prediction of the position based on the previous state (position) and the inertial data (how fast you were rotating, for example). For that prediction you need a "model" of the behavior of your system, which you do not seem to have.
In your case, you would need to calculate the acceleration seen by the accelerometer based on the (known or theoretical) rotation speed of the shaft at any point in time, the distance of the accelerometer from the center of rotation, and, probably to make it more precise, a dynamic model of the main vibration modes of your rod. Then for each step, compare that to the actual measurement... which seems a bit heavy for your case.
Look at the quick figure explaining the Kalman filter process in this wikipedia entry : Kalman filter, and read on if you want to understand it more.
I propose a low-pass filter for you, but an ordinary first-order inertial model instead of a Kalman filter. I designed a filter with a pass-band up to 10 Hz (~0.1 of your sample frequency). The discrete model has the following equation:
y[k] = 0.9418*y[k-1] + 0.05824*u[k-1]
where u is your measured vector and y is the vector after filtering. The recursion starts at sample number 1, so you can just assign 0 to sample number 0.
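As a minimal illustration, the same difference equation in C++, with the coefficients taken verbatim from the equation above (the function name is a placeholder):

#include <cstddef>
#include <vector>

// First-order low-pass: y[k] = 0.9418*y[k-1] + 0.05824*u[k-1], with y[0] = 0.
std::vector<double> firstOrderLowpass(const std::vector<double>& u)
{
    std::vector<double> y(u.size(), 0.0);
    for (std::size_t k = 1; k < u.size(); ++k)
        y[k] = 0.9418 * y[k - 1] + 0.05824 * u[k - 1];
    return y;
}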

Finding the frequency of a sine wave in the microphone

I am looking for a way to get the frequency of a sine wave from a tape recorder plugged into a microphone socket on a Windows PC. It's for a small project I'm working on to see if I can store data on sound tapes, so I'll be reading and writing frequencies to the tape to store data.
Thanks
A simple way to estimate the frequency of a sine wave is to do a spectrum analysis and look for the "loudest" frequency (roughly):
take one chunk of audio (for example 256 samples) from the audio file, or from the audio input
window the audio chunk ^
compute its power spectrum (using an FFT algorithm^^)
look for the dominant frequency, which should be the frequency of the sine wave
repeat as long as you have audio data
I expect it to work well with simple tones; a rough sketch of the per-chunk step is given after the footnotes below.
^ see http://en.wikipedia.org/wiki/Window_function
^^ there are plenty of FFT implementations available, for example http://www.fftw.org/
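A hedged C++ sketch of the per-chunk, dominant-frequency step (the Hann window is written out inline, and a naive DFT keeps it self-contained; in practice the inner loop would be an FFT call, e.g. via FFTW):

#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Estimate the dominant frequency (Hz) of one chunk of audio (e.g. 256 samples).
double dominantFrequency(const std::vector<double>& chunk, double sampleRate)
{
    const double pi = 3.14159265358979323846;
    const std::size_t N = chunk.size();
    double bestPower = 0.0;
    std::size_t bestBin = 0;
    for (std::size_t k = 1; k <= N / 2; ++k)                         // skip DC
    {
        std::complex<double> bin(0.0, 0.0);
        for (std::size_t n = 0; n < N; ++n)
        {
            double w = 0.5 - 0.5 * std::cos(2.0 * pi * n / (N - 1)); // Hann window
            bin += w * chunk[n] * std::exp(std::complex<double>(0.0, -2.0 * pi * k * n / N));
        }
        if (std::norm(bin) > bestPower)                              // power = |X[k]|^2
        {
            bestPower = std::norm(bin);
            bestBin = k;
        }
    }
    return bestBin * sampleRate / N;
}

The resolution is sampleRate/N Hz per bin (about 172 Hz for 256 samples at 44.1 kHz), so use longer chunks or interpolate around the peak if you need finer estimates.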
If there's only one sine wave at any given time, you can count how many times per second the signal changes its value from positive to negative (in other words, crosses 0) or the other way around, and that will give you the frequency. Or you can measure the time between consecutive zero crossings. This is a pretty simple and cheap solution.
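A minimal C++ sketch of the zero-crossing idea, assuming a mono buffer of floating-point samples (the function name is illustrative):

#include <cstddef>
#include <vector>

// Count positive-going zero crossings; one full cycle of a sine contains exactly one,
// so crossings / duration gives the frequency in Hz.
double zeroCrossingFrequency(const std::vector<double>& x, double sampleRate)
{
    std::size_t crossings = 0;
    for (std::size_t n = 1; n < x.size(); ++n)
        if (x[n - 1] < 0.0 && x[n] >= 0.0)
            ++crossings;
    const double duration = x.size() / sampleRate;   // chunk length in seconds
    return crossings / duration;
}

On a noisy tape signal it helps to add a small hysteresis band around zero (or low-pass filter first) so that noise riding on the waveform doesn't produce extra crossings.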

Anti-aliasing: Preferred ways of determining maximum frequency?

I've been reading up a bit on anti-aliasing and it seems to make sense, but there is one thing I'm not too sure of: how exactly do you find the maximum frequency of a signal (in the context of graphics)?
I realize there's more than one case so I assume there is more than one answer. But first let me state a simple algorithm that I think would represent maximum frequency so someone can tell me if I'm conceptualizing it the wrong way.
Let's say it's for a one-dimensional, finite, greyscale image (in pixels). Am I correct in assuming you could simply scan the entire pixel line (in the spatial domain) looking for the minimum oscillation, and that the inverse of that smallest oscillation would be the maximum frequency?
Example values: {23, 26, 28, 22, 48, 49, 51, 49}
Frequency : set it pertains to
(1/2) = 0.5 : {28, 22}
(1/4) = 0.25 : {22, 48, 49, 51}
So would 0.5 be the maximum frequency?
And what would be the ideal way to calculate this for a pixel line like the one above?
And on a more theoretical note, what if your sampling input was infinite (more like the real world)? Would a valid process be sort of like:
Predetermine a decent interval for point sampling
Determine max frequency from point sampling
while(2*maxFrequency > pointSamplingInterval)
{
pointSamplingInterval*=2
Redetermine maxFrequency from point sampling (with new interval)
}
I know these algorithms are fraught with inefficiencies, so what are some of the preferred ways? (Not looking for something crazy-optimized, just fundamentally better concepts)
The proper way to approach this is using a Fourier transform (in practice, an FFT, or fast Fourier transform).
The theory works as follows: if you have a set of pixels with color/grayscale values, then we can say that the image is represented in the "spatial domain"; that is, each individual number specifies the image at a particular spatial location.
However, what we really want is a representation of the image in the "frequency domain". Instead of each individual number specifying each pixel, each number represents the amplitude of a particular frequency in the image as a whole.
The tool which converts from the "spatial domain" to the "frequency domain" is the Fourier Transform. The output of the FT will be a sequence of numbers specifying the relative contribution of different frequencies.
In order to find the maximum frequency, you perform the FT, and look at the amplitudes that you get for the high frequencies - then it is just a matter of searching from the highest frequency down until you hit your "minimum significant amplitude" threshold.
You can code your own FFT, but it is much easier in practice to use a pre-packaged library such as FFTW.
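A hedged sketch of that search in C++, using a naive DFT over one line of pixels (minAmplitude and the normalization are illustrative choices; an FFT library such as FFTW would replace the inner loop):

#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Return the highest spatial frequency (cycles per pixel) whose amplitude
// exceeds minAmplitude, scanning from the Nyquist bin downward.
double maxSignificantFrequency(const std::vector<double>& pixels, double minAmplitude)
{
    const double pi = 3.14159265358979323846;
    const std::size_t N = pixels.size();
    for (std::size_t k = N / 2; k >= 1; --k)
    {
        std::complex<double> bin(0.0, 0.0);
        for (std::size_t n = 0; n < N; ++n)
            bin += pixels[n] * std::exp(std::complex<double>(0.0, -2.0 * pi * k * n / N));
        if (2.0 * std::abs(bin) / N > minAmplitude)    // rough amplitude normalization
            return static_cast<double>(k) / N;          // cycles per pixel
    }
    return 0.0;                                         // nothing above the threshold
}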
You don't scan a signal for the highest frequency and then choose your sampling frequency: You choose a sampling frequency that's high enough to capture the things you want to capture, and then you filter the signal to remove high frequencies. You throw away everything higher than half the sampling rate before you sample it.
Am I correct in assuming you could simply scan the entire pixel line (in the spatial domain) looking for the minimum oscillation and the inverse of that smallest oscillation would be the maximum frequency?
If you have a line of pixels, then the sampling is already done. It's too late to apply an antialiasing filter. The highest frequency that could be present is half the sampling frequency ("1/2px", I guess).
And on a more theoretical note, what if your sampling input was infinite (more like the real world)?
Yes, that's when you use the filter. First, you have a continuous function, like a real-life image (infinite sampling rate). Then you filter it to remove everything above fs/2, and then you sample it at fs (digitize the image into pixels). Cameras often don't do enough of this filtering, which is why you get Moiré patterns when you photograph bricks, etc.
If you're anti-aliasing computer graphics, you have to think of the ideal continuous mathematical function first, and think through how you would filter it and digitize it to produce the output on the screen.
For instance, if you want to generate a square wave with a computer, you can't just naively alternate between maximum and minimum values. That would be just like sampling a real life signal without filtering first. The higher harmonics wrap back into the baseband and cause lots of spurious spikes in the spectrum. You need to generate points as if they were sampled from a filtered continuous mathematical function:
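For example, a hedged C++ sketch of that idea for a square wave: instead of alternating between +1 and -1, sum only the odd harmonics of the Fourier series that lie below Nyquist (the function and parameter names are made up):

#include <cmath>
#include <cstddef>
#include <vector>

// Band-limited square wave: sum odd harmonics of the Fourier series, stopping below Fs/2.
std::vector<double> bandlimitedSquare(double freq, double sampleRate, std::size_t numSamples)
{
    const double pi = 3.14159265358979323846;
    std::vector<double> out(numSamples, 0.0);
    for (std::size_t n = 0; n < numSamples; ++n)
    {
        const double t = n / sampleRate;
        for (int h = 1; h * freq < sampleRate / 2.0; h += 2)   // odd harmonics only
            out[n] += std::sin(2.0 * pi * h * freq * t) / h;
        out[n] *= 4.0 / pi;                                    // square-wave Fourier scaling
    }
    return out;
}

Sampling this series is equivalent to sampling the ideal square wave after a low-pass at Fs/2, so no harmonics alias back into the baseband.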
I think this article from the O'Reilly site might also be useful to you ... http://www.onlamp.com/pub/a/python/2001/01/31/numerically.html ... in there they're referring to frequency analysis of sound files, but it gives you the idea.
I think what you need is an application of Fourier analysis (http://en.wikipedia.org/wiki/Fourier_analysis). I've studied this but never used it, so take it with a pinch of salt, but I believe that if you apply it correctly to your set of numbers you will get a set of frequencies which are components of the series, and then you can pick off the highest one.
I can't point you at a piece of code that does this, but I'm sure it is out there somewhere.
