I have built a little program that encodes binary data into a sound. For example the following binary input:
00101101
will produce a 'sound' like this:
################..S.SS.S################
where each character represents a constant unit of time. # stands for a 880 Hertz sine wave which is used to determine start and end of transmission, . stands for silence, representing the zeroes, and S stands for a 440 Hertz sine wave, representing the ones. Obviously, the part in the middle is much longer in practice.
The essence of my question is: How can I invert this operation?
The sound file is transmitted to the recipient via simple playback and recording of the sound. That means I am not trying to decode the original sound file which would be easy.
Obviously I have to analyze the recorded data with respect to frequency. But how? I have read a bit about Fourier Transform but I am quite lost here.
I am not sure where to start but I know that this is not trivial and probably requires quite some knowledge about signal processing. Can somebody point me in the right direction?
BTW: I am doing this in Ruby (I know, it's slow - it's just a proof of concept) but the problem itself is not programming language specific so any answers are very welcome.
Your problem is clearly trying to demodulate an FSK modulated signal. I would recommend implementing a correlation bank tuned to each frequency, it is a lot faster than fft if speed is one of your concerns
If you know the frequencies and the modulation rate, you can try using 2 sliding Goertzel filters for FSK demodulation.
Related
I have an application that is attempting to detect sirens from audio data. However, my understanding of audio concepts and terminology is elementary.
The first step of my application is to detect pitched sounds. The algorithm I implemented for this is as follows:
Split audio data into windows
Transform the data in each window to frequency domain using a FFT
Extract the magnitude of the dominant frequency (ignore bucket 0). Let this be maxMag
Extract the mean magnitude over all the FFT buckets (ignore bucket 0). Let this be meanMag
If maxMag / meanMag > some threshold, then the window contains pitched sound
Does this algorithm make sense? Is my terminology correct?
Thank you.
If you are detecting a single tone (or a small set of tones) you don't need to do a full FFT. You can use the Goertzel Algorithm to detect a specific tone. You probably don't care about the level of the tone you are looking for relative to everything else, so you should be able to avoid the "dominant frequency" test unless you have some reason for only detecting the tone if it is the loudest tone in the environment.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I want to develop an app for detecting wind according the audio stream.
I need some expert thoughts here, just to give me guide lines or some links, I know this is not easy task but I am planning to put a lot of effort here.
My plan is to detect some common patterns in the stream, and if the values are close to this common patterns of the wind noise I will notify that match is found, if the values are closer to the known pattern great, I can be sure that the wind is detected, if the values doesn't match with the patterns then I guess there is no so much wind....
That is my plan at first, but I need to learn how this things are done. Is there some open project already doing this ? Or is there someone who is doing research on this topics ?
The reason I write on this forum is because I do not know how to google it, the things I found was not I was looking for. I really do not know how to start developing this kind of algorithm.
EDIT 1 :
I tried to record a wind, and when I open the saved audio file for me it was just a bunch of numbers :). I do not even see in what format should I save this, is wave good enough ? Should I use something else, or what if I convert the wind noise audio file in mp3 : is this gonna help with parsing ?
Well I got many questions, that is because I do not know from where to read more about this kind of topic. I tag my question with guidlines so I hope someone will help me.
There must be something that is detectable, cause the wind noise is so common, there must be somehow to detect this, we need only someone to give me tips, someone who is familiar with this topic.
I just came across this post I have recently made a library which can detect wind noise in recordings.
I made a model of wind noise and created a database of examples and then trained a Machine Learning algorithm to detect and meter the wind level in a perceptually weighted way.
The C++/C code is here if it is of use to anyone!
The science for your problem is called "pattern classification", especially the subfield of "audio pattern classification". The task is abstracted as classifying a sound recording into two classes (wind and not wind). You seem to have no strong background in signal processing yet, so let me insert one central warning:
Pattern classification is not as easy as it looks at first. Humans excel at pattern classification. Computers don't.
A good first approach is often to compute the correlation of the Fourier transform of your signal and a sample. Don't know how much that will depend on wind speed, however.
You might want to have a look at the bag-of-frames approach, it was used successfully to classify ambient noise.
As #thiton mentioned this is an example of audio pattern classification.
Main characteristics for wind: it's a shaped (band/hp filtered) white noise with small semi-random fluctuations in amplitude and pitch. At least that's how most synthesizers reproduce it and it sounds quite convincing.
You have to check the spectral content and change in the wavefile, so you'll need FFT. Input format doesn't really matter, but obviously raw material (wav) is better.
Once you got that you should detect that it's close to some kind of colored noise and then perhaps extract series of pitch and amplitude and try to use classic pattern classification algorithm for that data set. I think supervised learning could work here.
This is actually a hard problem to solve.
Assuming you have only a single microphone data. The raw data you get when you open an audio file (time-domain signal) has some, but not a lot of information for this kind of processing. You need to go into the frequency domain using FFTs and look at the statistics of the the frequency bins and use that to build a classifier using SVM or Random Forests.
With all due respect to #Karoly-Horvath, I would also not use any recordings that has undergone compression, such as mp3. Audio compression algorithms always distorts the higher frequencies, which as it turns out, is an important feature in detecting wind now. If possible, get the raw PCM data from a mic.
You also need to make sure your recording is sampled at at least 24kHz so you have information of the signal up to 12kHz.
Finally - the wind shape in the frequency domain is not a simple filtered white noise. The characteristics is that it usually has high energy in the low frequencies (a rumbling type of sound) with sheering and flapping sounds in the high frequencies. The high frequency energy is quite transient, so if your FFT size is too big, you will miss this important feature.
If you have 2 microphone data, then this gets a little bit easier. Wind, when recorded, is a local phenomenon. Sure, in recordings, you can hear the rustling of leaves or the sound of chimes caused by the wind. But that is not wind-noise and should not be filtered out.
The actual annoying wind noise you hear in a recording is the air hitting the membrane of your microphone. That effect is a local event - and can be exploited if you have 2 microphones. It can be exploited because the event is local to each individual mic and is not correlated with the other mic. Of course, where the 2 mics are placed in relations to each other is also important. They have to be reasonably close to each other (say, within 8 inches).
A time-domain correlation can then be used to determine the presence of wind noise. (All the other recorded sound are correlated with each other because the mics are fairly close to each other, so a high correlation means no wind, low correlation means wind). If you are going with this approach, your input audio file need not be uncompressed. A reasonable compression algorithm won't affect this.
I hope this overview helps.
I have the following problem: I have 2 signals over time. They are from the same source so they should be the same. I want to check if they really are.
Complications:
they may be measured with different sample rates
the start / end time do not correlate. The measurement does not start at the same time and end at the same time.
there may be an time offset between the two signals.
My thoughts go along Fourier transformation, convolution and statistical methods for comparison. Can someone post me some links where I can find more information on how to handle this?
You can easily correct for the phase by just shifting them so their centers of mass line up. (Or alternatively, in the Fourier domain just multiplying by the inverse of the phase of the first coefficient.)
Similarly, if you want to line up the images given only partial data, you can just cross correlate and take the maximal value (which is again easy to do in the Fourier domain).
That leaves the only tricky part of this process as dealing with the sampling rates. Now if you know a-priori what the sample rates are, (and if they are related by a rational number), you can just use sinc interpolation/downsampling to rescale them to a common sampling rate:
https://ccrma.stanford.edu/~jos/st/Bandlimited_Interpolation_Time_Limited_Signals.html
If you don't know the sampling rate, you may be a bit screwed. Technically, you can try just brute forcing over all the different rescalings of your signal, but doing this tends to be either slow or else give mediocre results.
As a last suggestion, if you just want to match sounds exactly you can try using the cepstrum and verifying that the peaks of the signal are close enough to within some tolerance. This type of analysis is used a lot in sound and speech recognition, with some refinements to make it operate a bit more locally. It tends to work best with frequency modulated data like speech and music:
http://en.wikipedia.org/wiki/Cepstrum
Fourier transformation does sound like the right way.
There is too much mathematical information for me to just start explaining here so if you really wanna know what's going on with that (cause I don't think you can just use FT without understanding it) you should use this reference from MIT OpenCourseWare: http://ocw.mit.edu/courses/mathematics/18-103-fourier-analysis-theory-and-applications-spring-2004/lecture-notes/
Hope it helped.
If you are working with a linux box and the waveforms that need to be processed have already been recorded, you can try to use the file command to display details about the recording. It gives you the sampling rate when it is invoked on a wav file, though I am not sure what format you are recording in.
If the signals are time-shifted with respect to each other, you may try to convolve one with a delta function with increasing delays and then comparing. On MATLAB, conv and all should be good enough.
These are just 'crude' attempts (almost like hacking at the problem). There may be algorithms that are shift-invariant that may do a better job.
Hope that helps.
I am attempting to extract pitch data from an audio stream. From what I can see, it looks as though FFT is the best algorithm to use.
Rather than digging straight into the math, could someone help me understand what this FFT algorithm does?
Please don't say something obvious like 'FFT extracts frequency data from a raw signal.' I need the next level of detail.
What do I pass in, and what do I get out?
Once I understand the interface clearly, this will help me to understand the implementation.
I take it I need to pass in an audio buffer, I need to tell it how many bytes to use for each computation (say the most recent 1024 bytes from this buffer). and maybe I need to specify the range of pitches I want it to detect. Now it is going to pass back what? An array of frequency bins? What are these?
(Edit:) I have found a C++ algorithm to use (if I can only understand it)
Performous extracts pitch from the microphone. Also the code is open source. Here is a description of what the algorithm does, from the guy that coded it.
PCM input (with buffering)
FFT (1024 samples at a time, remove 200 samples from front of the buffer afterwards)
Reassignment method (against the previous FFT that was 200 samples earlier)
Filtering of peaks (this part could be done much better or even left out)
Combining peaks into sets of harmonics (we call the combination a tone)
Temporal filtering of tones (update the set of tones detected earlier instead of simply using the newly detected ones)
Pick the best vocal tone (frequency limits, weighting, could use the harmonic array also but I don't think we do)
But could someone help me understand how this works? What is it that is getting sent from the FFT to the Reassignment method?
The FFT is just one building block in the process, and it may not be the best approach for pitch detection. Read up on pitch detection and decide which algo you want to use first (this will depend on what exactly you are trying to measure the pitch of - speech, single musical instrument, other types of sound, etc. Get this right before getting into low level details such as the FFT (some, but not all pitch detection algorithms use the FFT internally).
There are numerous similar questions on SO already, e.g. Real-time pitch detection using FFT and Pitch detection using FFT for trumpet, and there is good overview material on Wikipedia etc - read these and then decide whether you still want to roll your own FFT-based solution or perhaps use an existing library which is suitable for your particular application.
There is an element of choice here. The most straightforward to implement is to do (2^n samples in) complex numbers in, and 2^n complex numbers out, so maybe you should start with that.
In the special case of a DCT(discrete cosine transform), typically what goes in is 2^n samples (often floats), and out go 2^n values, often floats too. DCT is an FFT but that takes only the real values, and analyses the function in terms of cosines.
It is smart (but commonly skipped) to define a struct to handle the complex values. Traditionally FFT's are done in-place, but it works fine if you don't.
It can be useful to instantiate a class that contains a work buffer for the FFT (if you don't want to do the FFT in-place), and reuse that for several FFTs.
In goes N samples of PCM (purely real complex numbers). Out comes N bins of frequency domain (each bin corresponding to a 1/N slice of sample rate). Each bin is a complex number. Rather than real and imaginary parts, these values should generally be handled in polar format (absolute value and argument). The absolute value tells the amount of sound near the bin center frequency while the argument tells the phase (at which position the sine wave is travelling).
Most often coders only use the magnitude (absolute value) and throw away the phase angle (argument).
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 4 years ago.
Improve this question
I currently have an array full of data which from what I believe is the Amplitude of my wave file. It is currently at a low -32768 and at a high 32767.
I also have the SampleRate which was 16,000hz.
My understanding of sound isn't very good; does anyone know from this how I can calculate the Frequency?
Help greatly appreciated,
Monkeyguy.
What exactly is it that you're wanting to do? The method will depend entirely on what you're hoping to achieve. Do you have a signal that contains a single sinusoid, eg a detector from a piece of mechanical equipment? Or more likely, are you wanting to play/sing into a microphone and transcribe the music?
In both cases, the FFT will be your first port of call. In the first case this may be pretty much all you need, as FFTs are good for isolated steady-state sinusoids. In the latter case, you have got a very long road ahead of you in order to get any useful results at all. Pitch recognition is a difficult problem, and merely throwing some FFTs won't get you very far. You'll need to have a good grounding in digital signal processing and also the characteristics of musical signals, and then probably your best bet is to use an autocorrelation based method instead.
See my previous answer on a related subject for some links which may be useful: Algorithms for determining the key of an audio sample
In almost all cases, an audio file has no single frequency. A sound in which the sound wave has a single frequency, is (typically) a pure sine tone, and sounds like this:
http://www.wolframalpha.com/input/?i=sound+440+Hz&a=*MC.~-_*PlaySoundTone-&a=*FS-_**DopplerShift.fo-.*DopplerShift.vs-.*DopplerShift.c--&f3=10+m/s&f=DopplerShift.vs_10+m/s&f4=340.3+m/s&f=DopplerShift.c_340.3+m/s&a=*FVarOpt.1-_***DopplerShift.fo-.*DopplerShift.fs--.***DopplerShift.DopplerRatio---.*--&a=*FVarOpt.2-_**-.***DopplerShift.vo--.**DopplerShift.vw---.**DopplerShift.fo-.*DopplerShift.fs---
This is a pure 440 Hz sine wave. (It was not possible to make a proper link of this, due to MarkDown limitations.)
A general sound, such as a recording (of speech, music, or just urban noise), consists of (an infinite number of) combinations of such sine waves, superimposed. That is, if you were to draw the graph of pressure vs. time (at a given point in space) of the wave, or, (more or less) equivalently, the position of the speaker's membrane as a function of time, it would hence not be a pure sine wave, but something much more complicated. (Indeed, how could all the information of a Beethoven symphony be represented in a simple sine wave, that is completely determined by only its frequency, a single number?)
The sampling rate of a digital recording is merely the number of samples per second of the sound wave. Indeed, a physical sound wave has an amplitude p(t) at each time, so, because there are an infinite number of times t between 0 s and 10 s (say), theoretically, to save the audio we would need an infinite number of bytes (each sample requireing a fixed number of bytes -- for instance, a 16-bit recording utilizes 16 bits, or 2 bytes, per sample -- of course, the higher the "bit number" is, the higher quality we get; for a 16-bit sound, we have 216 = 65536 levels to choose from when specifying a single sample). In practice, a sound is sampled, so that the amplitude p(t) is saved only at fixed intervals. For instance, a typical audio CD has a sampling rate of 44.1 kHz; that is, a sample is saved every 22.7 µs.
Hence, a pure sine wave of any frequency, or any recording, could be stored on a computer using any sampling rate, the quality of the recording determined by the sampling rate (the higher the better). [Technical note: Of course there is a lower limit (in some sense) on the sampling rate. This is called the Nyquist rate.]
To determine the mean frequency of the sound at any small time, you could use some advanced techniques from Fourier analysis, but it is not entirely trivial.
This is just what I remember from physics, and I'm definitely no music expert, either.
Unless it's a recording of a constant tone, it probably doesn't have a single frequency. Each tone has a different frequency, which is why they sound different. There is generally a relationship between a wave's (not wav) frequency and wavelength, but none that I know of regarding amplitude.
Your SampleRate is similar to a frequency, being measured in Hz, but it only tells you about the precision of the recording, rather than the actual frequency of the sounds recorded.
As a quick addendum to the other two answers, if you are trying to measure frequencies within the sound file itself, you will need to look into the Fast Fourier Transform (FFT), which is an algorithm used to determine the strength of the frequencies within a sampled data set.
While it's true that an audio recording will not have a single frequency, you can find the fundamental frequency easily enough. Start at the beginning of your sample, and trace through it; you're looking for the highest absolute value, and in a wave with multiple frequencies you will not know what it is until you're back to zero. Remember the highest or lowest value you've seen thus far. Now, trace forward, hopefully in the opposite direction. You're looking for the next peak or trough of similar absolute value as the one you found, using the same method as before. Find out how many samples there are between your two highest absolute value readings. Divide your sample rate by this number (it better not be zero) and then divide by 2. This is the lowest, or fundamental, frequency of your recording at this point.
You can also generate a sinusoidal function that represents a synthetic waveform at a given frequency, and subtract this waveform's instantaneous values from your sample. Find the difference in the root-mean-square amplitudes of the before and after samples. This difference is a rough approximation of the amplitude of the signal at that frequency. Repeat this process, doubling the frequency each time. You can use this to create a basic EQ spectrum.