What algorithm can find noise in a signal? - algorithm

In class the lecturer asked us a question: during a concert, the conductor hears a false note (noise). Translating this into signal fundamentals, how does the conductor detect this noise?
My guess is that it may be related to the Fourier transform, but I'm not sure I'm even close to the answer.

Check out a spectrogram. It's a 3D representation of the frequency domain over time.
As a general routine:
Divide your time-domain recording of a musical piece into suitably sized time segments, small enough that they can capture the shortest musical note you want to represent (preferably smaller than that, as you want more granular measurements of when each note starts and stops).
Take the Fourier transform of each time segment, and represent this information in a spectrogram (X-axis for time, Y-axis for frequency, and Z-axis (colour) for signal power).
Do appropriate filtering on each time segment to keep only frequencies with significant signal power.
Compare this against your sheet music. Sheet music is essentially a spectrogram, telling you which notes (frequencies) should be played at which times (using the BPM or time signature of the music). If a note is present in the spectrogram but not in the sheet music, it's spurious or accidental (or the result of a badly formed spectrogram).
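The routine above can be sketched roughly as follows. This is a minimal illustration, not a full spectrogram tool: the function names, the synthetic tones, and the idea of representing the sheet music as a list of expected frequencies (one per segment) are all assumptions made for the example.

```python
import numpy as np

def dominant_frequencies(samples, sample_rate, segment_len):
    """Return the strongest frequency (Hz) in each consecutive time segment."""
    n_segments = len(samples) // segment_len
    window = np.hanning(segment_len)            # reduces spectral leakage
    freqs = np.fft.rfftfreq(segment_len, d=1.0 / sample_rate)
    peaks = []
    for i in range(n_segments):
        segment = samples[i * segment_len:(i + 1) * segment_len]
        power = np.abs(np.fft.rfft(segment * window)) ** 2
        peaks.append(freqs[np.argmax(power)])
    return np.array(peaks)

def find_wrong_segments(peaks, expected, tolerance_hz):
    """Indices of segments whose strongest frequency differs from the score."""
    return [i for i, (p, e) in enumerate(zip(peaks, expected))
            if abs(p - e) > tolerance_hz]

# Synthetic "performance": three correct notes (440 Hz) and one false note (466 Hz).
sample_rate = 8000
segment_len = 2000                              # 0.25 s per segment
t = np.arange(segment_len) / sample_rate
played = [440.0, 440.0, 466.0, 440.0]
samples = np.concatenate([np.sin(2 * np.pi * f * t) for f in played])

expected = [440.0, 440.0, 440.0, 440.0]         # hypothetical "sheet music"
peaks = dominant_frequencies(samples, sample_rate, segment_len)
wrong = find_wrong_segments(peaks, expected, tolerance_hz=10.0)
```

Real music has several notes and harmonics at once, so a practical version would keep every frequency above a power threshold per segment rather than only the single strongest one.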

Related

What is a good algorithm to detect silence over a variety of recording environments?

My app processes samples from microphone audio streams. The task I'm asking about: programmatically make a good guess at what ranges of the audio stream samples should be considered signal versus noise. The "signal", in this case, is human speech.
The audio I'm getting from users comes from recording environments that I can't control or know much about. These users may be speaking into a professional microphone from a treated space or into a crummy laptop mic in their living room. Very noisy environments with excessive background noise, e.g. a busy restaurant, are outside of what I need to accommodate.
It would make the analysis simpler, but I don't want to request the user to record a room noise sample within my app. And I don't want the user to manually designate a range of audio as silence.
If I'm just looking at recorded audio within a DAW, it is simple and intuitive to spot where silence (nobody talking) is within the waveform. Basically, there's a bunch of relatively flat horizontal lines that are closer to negative infinity dB (absolute silence) than any other flat lines.
I have no problems using various sound APIs to access samples. I've omitted technical specifics because I'm just asking about a suitable algorithm, and even if an existing library were available, I'd prefer my own implementation.
What algorithm is good for classifying sample ranges as silence for the requirements I've described?
Here is one simple algorithm that has its pros and cons:
Perform the calculation of noise floor as described below once. Or to continuously respond to changing conditions during recording, you can perform this calculation at some reasonably responsive interval like once every second.
Let K = a constant ratio of noise to signal, e.g. 10%, expected in worst cases.
Let A = the sample with the highest amplitude from the microphone audio stream.
Noise floor is A * K
Once the noise floor has been calculated, you can apply it to any range of the audio stream samples to classify values above the noise floor as signal and below the noise floor as noise.
With the above algorithm, samples are assumed to be stored in a typical computer audio format, e.g. PCM, where 0 is silence and a negative/positive sample value is air pressure creating sound. Negative samples can be negated to positive values when evaluating them.
K could be something like 10%. It's the noise/signal ratio expected in one of the poorest recording environments you'd like to support. Analyzing test recordings will show what the ratio should be. The higher the ratio is, the more noise will be misclassified as signal by the algorithm.
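As a rough sketch of the calculation described above (the function names and the sample values are made up for illustration, and K = 10% is just the example value from the text):

```python
def noise_floor(samples, k=0.10):
    """Noise floor = K times the highest absolute amplitude in the samples."""
    peak = max(abs(s) for s in samples)     # A: negative samples are negated
    return peak * k

def classify(samples, floor):
    """True where a sample's magnitude is above the floor (signal), else False."""
    return [abs(s) > floor for s in samples]

# Hypothetical 16-bit PCM samples: room noise, a burst of speech, room noise.
samples = [12, -30, 25, 8000, -12000, 9500, -40, 18]
floor = noise_floor(samples, k=0.10)
is_signal = classify(samples, floor)
```

Speech waveforms cross zero constantly, so in practice it probably makes more sense to apply the floor to the peak of short windows (say, a few tens of milliseconds) than to individual samples.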
Pros:
Easy to implement.
Computationally inexpensive. O(n) for a single pass over a sample array to find the highest peak value.
Cons:
It depends on the samples used to calculate a noise floor having signal (speech) in them. So there has to be some way of knowing the samples contain signal outside of the algorithm.
Any loud noise that isn't speech but has a higher amplitude, e.g. a hand clap, can cause the noise floor to rise above the level of speech, causing speech to be misclassified as noise.
The K value is a fudge factor. There are more direct ways to detect the actual noise floor from the samples.

Reducing one frequency in song

How would I take a song input and output the same song without certain frequencies?
Based on my research so far, the song should be broken down into chunks, each chunk run through an FFT, the target frequencies reduced, the result run through an inverse FFT, and the chunks stitched back together. However, I am unsure if this is the right approach to take, and if so, how I would convert from the audio to FFT input (which seems to be a vector or matrix), how to reduce the target frequencies, how to convert back from the FFT output to audio, and how to restitch.
For background, my grandpa loves music. However, recently he cannot listen to it, as he has become hypersensitive to certain frequencies within songs. I'm a high school student who has some background coding, and am just getting into algorithmic work and thus have very little experience using these algorithms. Please excuse me if these are basic questions; any pointers would be helpful.
EDIT: So far, I've understood the basic premise of the FFT (through basic 3blue1brown YouTube videos and the like) and that it is available through scipy/numpy, and figured out how to convert from YouTube to 0.25 second chunks in a wav format.
Your approach is right.
Concerning subquestions:
From the audio to FFT input: assign the audio samples to the real part of a complex signal; the imaginary part is zero.
How to reduce the target frequencies: multiply the FFT results near the target frequency by a smooth function (to diminish artifacts) that is zero at that frequency and reaches 1.0 some samples away.
How to convert back: apply the inverse FFT (don't forget the scale multiplier, such as 1/N, if your library does not apply it for you) and copy the real part into the audio channel.
Also consider using simpler digital filtering: a band-stop or notch filter.
Arbitrarily found examples: example1, example2
(calculation of parameters perhaps requires some understanding of DSP)
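A minimal sketch of the FFT approach described in this answer, using NumPy. The function name, the raised-cosine shape of the smoothing function, and the test tones are assumptions chosen for illustration; `rfft`/`irfft` are used because the audio samples are real, which is equivalent to setting the imaginary part to zero.

```python
import numpy as np

def attenuate_frequency(samples, sample_rate, target_hz, width_hz):
    """Suppress a band around target_hz using a smooth gain in the frequency domain."""
    spectrum = np.fft.rfft(samples)                  # real input -> half spectrum
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    distance = np.abs(freqs - target_hz)
    # Raised-cosine gain: 0 at target_hz, rising smoothly to 1 at width_hz away.
    gain = np.where(distance < width_hz,
                    0.5 - 0.5 * np.cos(np.pi * distance / width_hz),
                    1.0)
    # NumPy's irfft already applies the 1/N scale factor.
    return np.fft.irfft(spectrum * gain, n=len(samples))

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate             # one second of audio
keep = np.sin(2 * np.pi * 440 * t)                   # tone we want to keep
unwanted = np.sin(2 * np.pi * 3000 * t)              # tone we want to remove
filtered = attenuate_frequency(keep + unwanted, sample_rate, 3000, 200)
```

When a long song is processed in chunks, the chunk boundaries can click; overlapping windowed chunks (overlap-add) is the usual remedy, and it is one reason a time-domain notch filter may be the simpler route.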

FFT image comparison (theoretical)

Can anybody explain to me (in simplified terms) what happens if I do an image comparison with an FFT? I somehow don't understand how it's possible to convert a picture into frequencies and how this is used to differentiate between two images. Via Google I cannot find a simple description that I (as a non-mathematician and non-computer-scientist) could understand.
Any help would be very appreciated!
Thanks!
Alas, a good description of an FFT might involve subjects such as the calculus of complex variables and the computational theory of recursive algorithms. So a simple description may not be very accurate.
Think about sound. Looking at the waveforms of the sound produced by two singers might not tell you much. The two waveforms would just be long, complicated, messy-looking squiggles. But a frequency meter could quickly tell you that one person was singing way off pitch and whether they were a soprano or a bass. So from the frequency meter readings you might be able to determine that certain waveforms were not a good match for who was singing.
An FFT is like a big bunch of frequency meters. And each scan line of a photo is a waveform.
Around two centuries ago, some guy named Fourier showed that any reasonable-looking waveform squiggle could be matched by an appropriate bunch of sine waves, each at a single frequency. Several decades ago, other people figured out a very clever way of very quickly calculating just which bunch of sine waves that was: the FFT.
A 2D discrete Fourier transform (computed with an FFT) transforms a 2D matrix of, let's say, pixel values into a 2D matrix in the frequency domain. You can use a library like FFTW to convert an image from the ordinary form to the spectral one. The result of your comparison depends on what you really compare.
The Fourier transform works in dimensions other than 2D as well, but for images you'll be interested in a 2D FFT.
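To make the idea concrete, here is one possible sketch in Python with NumPy (instead of FFTW). Comparing normalised magnitude spectra is only one of many choices of "what you really compare"; the function names and the synthetic stripe images are invented for the example.

```python
import numpy as np

def magnitude_spectrum(image):
    """2D FFT magnitude of a grayscale image given as a 2D array."""
    return np.abs(np.fft.fft2(image))

def spectral_distance(image_a, image_b):
    """Mean absolute difference between normalised magnitude spectra."""
    spec_a = magnitude_spectrum(image_a)
    spec_b = magnitude_spectrum(image_b)
    spec_a = spec_a / spec_a.sum()
    spec_b = spec_b / spec_b.sum()
    return float(np.mean(np.abs(spec_a - spec_b)))

# Synthetic 32x32 "images": coarse stripes, the same stripes shifted sideways,
# and much finer stripes.
x = np.arange(32)
coarse = np.tile(1 + np.sin(2 * np.pi * 2 * x / 32), (32, 1))
shifted = np.roll(coarse, 5, axis=1)
fine = np.tile(1 + np.sin(2 * np.pi * 8 * x / 32), (32, 1))

same_pattern = spectral_distance(coarse, shifted)    # expected to be near zero
different = spectral_distance(coarse, fine)          # expected to be clearly larger
```

The shifted image gets almost the same score as the original because the magnitude spectrum ignores where a pattern sits and only records which frequencies it contains. That is useful for some comparisons and useless for others.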

Linearly Normalizing Stack of Images (data?) Prior to Averaging?

I'm writing an application that averages/combines/stacks a series of exposures. This is commonly used to reduce noise in the resultant image.
However, it seems, to optimize the average/stack the exposures are usually first normalized. It seems that this process assigns weights to each of the exposures and then proceeds to combine them. I am guessing that the process computes the overall intensity of each image as the purpose is to match the intensities of all the images in the stack.
My question is, how can I incorporate an algorithm that will allow me to normalize a series of images? I guess the question can be generalized by instead asking "How can I normalize a series of readings?"
An outline in my head appears as follows:
Compute the average of a reference image.
Divide the average of each frame by the average of the reference frame.
The result of each division is the weight for each frame.
Scale/Multiply each pixel in a frame by the weight found for that particular frame.
Does this seem to make sense to anyone? I have tried to google for the past hour but didn't find anything. I also looked at the indices of various image processing books on Amazon, but that didn't turn up anything either.
Each integration consists of signal and assorted noise: some is time-independent (e.g. bias or CCD readout noise), some is time-dependent (e.g. dark current), and some is random (shot noise). The aim is to remove the noise and leave the signal. So you would first subtract the 'fixed' sources using dark frames (which will include dark current, readout and bias), leaving signal plus shot noise. Signal scales as flux times exposure time, and shot noise as the square root of the signal (http://en.wikipedia.org/wiki/Shot_noise), so overall your signal/noise scales as the square root of the integration time (assuming your integrations are short enough that they are not saturated). So by adding frames you are simply increasing the exposure time, and hence the signal/noise ratio. You don't need to normalize first.
To complicate matters, transient non-Gaussian noise is also present (e.g. cosmic ray hits). There are many techniques for dealing with these, but a common one is 'sigma-clipping', where you have an extra pass to calculate the mean and standard deviation of each pixel, and then reject outliers that are many standard deviations from the mean. Real signal will show Gaussian fluctuations around the mean value, whereas transients will show a large deviation in one frame of the stack. Maybe that's what you are thinking of?
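A single-pass version of the sigma-clipping idea might look like the sketch below. The function name, the 3-sigma threshold, and the simulated frames are assumptions for illustration; real stacking tools typically iterate the clipping and often use the median rather than the mean as the reference.

```python
import numpy as np

def sigma_clipped_mean(stack, n_sigma=3.0):
    """Average frames per pixel, ignoring values far from that pixel's mean."""
    stack = np.asarray(stack, dtype=float)       # shape: (frames, rows, cols)
    mean = stack.mean(axis=0)
    std = stack.std(axis=0)
    keep = np.abs(stack - mean) <= n_sigma * std # per-pixel outlier mask
    kept_sum = np.where(keep, stack, 0.0).sum(axis=0)
    kept_count = keep.sum(axis=0)
    return kept_sum / kept_count

# Simulated stack: 20 frames of a flat 2x2 field with Gaussian noise,
# plus one transient (e.g. a cosmic ray hit) in a single frame.
rng = np.random.default_rng(0)
frames = 100.0 + rng.normal(0.0, 1.0, size=(20, 2, 2))
frames[7, 0, 0] = 10000.0

plain = frames.mean(axis=0)                      # transient contaminates pixel (0, 0)
clipped = sigma_clipped_mean(frames, n_sigma=3.0)
```

With only a handful of frames a single outlier inflates the standard deviation so much that it can never exceed three sigma, so this simple form needs a reasonably deep stack (here 20 frames) to work.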

Anti-aliasing: Preferred ways of determining maximum frequency?

I've been reading up a bit on anti-aliasing and it seems to make sense, but there is one thing I'm not too sure of: how exactly do you find the maximum frequency of a signal (in the context of graphics)?
I realize there's more than one case so I assume there is more than one answer. But first let me state a simple algorithm that I think would represent maximum frequency so someone can tell me if I'm conceptualizing it the wrong way.
Let's say it's for a 1-dimensional, finite, greyscale image (in pixels). Am I correct in assuming you could simply scan the entire pixel line (in the spatial domain) looking for the minimum oscillation, and the inverse of that smallest oscillation would be the maximum frequency?
Ex values {23,26,28,22,48,49,51,49}
Frequency:Pertaining to Set {}
(1/2) = .5 : {28,22}
(1/4) = .25 : {22,48,49,51}
So would .5 be the maximum frequency?
And what would be the ideal way to calculate this for a similar pixel line as the one above?
And on a more theoretical note, what if your sampling input was infinite (more like the real world)? Would a valid process be sort of like:
Predetermine a decent interval for point sampling
Determine max frequency from point sampling
while (2 * maxFrequency > pointSamplingInterval)
{
    pointSamplingInterval *= 2
    Redetermine maxFrequency from point sampling (with new interval)
}
I know these algorithms are fraught with inefficiencies, so what are some of the preferred ways? (Not looking for something crazy-optimized, just fundamentally better concepts)
The proper way to approach this is using a Fourier transform (in practice, an FFT, or fast Fourier transform).
The theory works as follows: if you have a set of pixels with color/grayscale values, then we can say that the image is represented by pixels in the "spatial domain"; that is, each individual number specifies the image at a particular spatial location.
However, what we really want is a representation of the image in the "frequency domain". Instead of each individual number specifying each pixel, each number represents the amplitude of a particular frequency in the image as a whole.
The tool which converts from the "spatial domain" to the "frequency domain" is the Fourier Transform. The output of the FT will be a sequence of numbers specifying the relative contribution of different frequencies.
In order to find the maximum frequency, you perform the FT, and look at the amplitudes that you get for the high frequencies - then it is just a matter of searching from the highest frequency down until you hit your "minimum significant amplitude" threshold.
You can code your own FFT, but it is much easier in practice to use a pre-packaged library such as FFTW.
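The search described in this answer could be sketched like this, using NumPy's FFT rather than FFTW. The function name is made up, the pixel line is the example from the question, and the "minimum significant amplitude" threshold of 1.0 grey level is an arbitrary assumption; picking that threshold is the real judgment call.

```python
import numpy as np

def max_significant_frequency(pixels, threshold):
    """Highest frequency (cycles per pixel) whose FFT amplitude exceeds threshold."""
    pixels = np.asarray(pixels, dtype=float)
    # Remove the mean so the constant (zero-frequency) term does not dominate.
    amplitudes = np.abs(np.fft.rfft(pixels - pixels.mean())) / len(pixels)
    freqs = np.fft.rfftfreq(len(pixels))         # 0 .. 0.5 cycles per pixel
    # Search from the highest frequency down to the first significant one.
    for freq, amp in zip(freqs[::-1], amplitudes[::-1]):
        if amp > threshold:
            return float(freq)
    return 0.0

line = [23, 26, 28, 22, 48, 49, 51, 49]          # pixel line from the question
highest = max_significant_frequency(line, threshold=1.0)
lenient = max_significant_frequency(line, threshold=0.1)
```

Whatever the threshold, the result can never exceed 0.5 cycles per pixel, because that is the highest frequency an already-sampled line can represent.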
You don't scan a signal for the highest frequency and then choose your sampling frequency: You choose a sampling frequency that's high enough to capture the things you want to capture, and then you filter the signal to remove high frequencies. You throw away everything higher than half the sampling rate before you sample it.
"Am I correct in assuming you could simply scan the entire pixel line (in the spatial domain) looking for the minimum oscillation and the inverse of that smallest oscillation would be the maximum frequency?"
If you have a line of pixels, then the sampling is already done. It's too late to apply an antialiasing filter. The highest frequency that could be present is half the sampling frequency ("1/2px", I guess).
"And on a more theoretical note, what if your sampling input was infinite (more like the real world)?"
Yes, that's when you use the filter. First, you have a continuous function, like a real-life image (infinite sampling rate). Then you filter it to remove everything above fs/2, and then you sample it at fs (digitize the image into pixels). Many cameras do little or no filtering of this kind, which is why you get moiré patterns when you photograph bricks, etc.
If you're anti-aliasing computer graphics, you have to think of the ideal continuous mathematical function first, and think through how you would filter it and digitize it to produce the output on the screen.
For instance, if you want to generate a square wave with a computer, you can't just naively alternate between maximum and minimum values. That would be just like sampling a real life signal without filtering first. The higher harmonics wrap back into the baseband and cause lots of spurious spikes in the spectrum. You need to generate points as if they were sampled from a filtered continuous mathematical function:
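One way to illustrate this is to build the square wave from its Fourier series and simply stop adding harmonics at half the sampling rate. The sketch below assumes that approach (truncating the series acts as an ideal low-pass filter on the mathematical square wave); the function names and the 1234 Hz / 8000 Hz values are arbitrary choices for the example.

```python
import numpy as np

def bandlimited_square(freq_hz, sample_rate, n_samples):
    """Square wave built only from odd harmonics below half the sampling rate."""
    t = np.arange(n_samples) / sample_rate
    wave = np.zeros(n_samples)
    harmonic = 1
    while harmonic * freq_hz < sample_rate / 2:
        wave += np.sin(2 * np.pi * harmonic * freq_hz * t) / harmonic
        harmonic += 2                             # a square wave has odd harmonics only
    return 4 / np.pi * wave

def naive_square(freq_hz, sample_rate, n_samples):
    """Square wave made by alternating between +1 and -1 with no filtering."""
    t = np.arange(n_samples) / sample_rate
    return np.where((t * freq_hz) % 1.0 < 0.5, 1.0, -1.0)

def amplitude_spectrum(signal):
    """Amplitude per frequency bin, scaled so a unit sine gives 1.0."""
    return np.abs(np.fft.rfft(signal)) / (len(signal) / 2)

sample_rate, n = 8000, 8000                       # 1 Hz per FFT bin
clean = amplitude_spectrum(bandlimited_square(1234, sample_rate, n))
naive = amplitude_spectrum(naive_square(1234, sample_rate, n))

# The 5th harmonic (6170 Hz) lies above 4000 Hz; when unfiltered it folds
# back to 8000 - 6170 = 1830 Hz.
alias_bin = 1830
```

Comparing `clean[alias_bin]` with `naive[alias_bin]` shows the spurious spike that appears only in the naive version.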
I think this article from the O'Reilly site might also be useful to you: http://www.onlamp.com/pub/a/python/2001/01/31/numerically.html. It refers to frequency analysis of sound files, but it gives you the idea.
I think what you need is an application of Fourier analysis (http://en.wikipedia.org/wiki/Fourier_analysis). I've studied this but never used it, so take it with a pinch of salt, but I believe that if you apply it correctly to your set of numbers you will get a set of frequencies which are components of the series, and then you can pick off the highest one.
I can't point you at a piece of code that does this, but I'm sure it is out there somewhere.

Resources