How to get volume from mic input on WP7 [duplicate] - windows-phone-7

Given two byte arrays of data captured from a microphone, how can I determine which one has more spikes in noise? I would assume there is an algorithm I can apply to the data, but I have no idea where to start.
Getting down to it, I need to be able to determine when a baby is crying vs ambient noise in the room.
If it helps, I am using the Microsoft.Xna.Framework.Audio.Microphone class to capture the sound.

you can convert each sample (normalised to a range 1.0 to -1.0) into a decibel rating by applying the formula
dB = 20 * log-base-10 (sample-value)
To be honest, so long as you don't mind the occasional false positive, and your microphone is set up OK, you should have no problem telling the difference between a baby crying and ambient background noise, without going through the hassle of doing an FFT.
I'd recommend you having a look at the source code for a noise gate, which does pretty much what you are after, with configurable attack times & thresholds.

First use a Fast Fourier Transform to transform the signal into the frequency domain.
Then check if the signal in the typical "cry-frequencies" is significantly higher than the other amplitudes.
The preprocessor of the speex codec supports noise vs signal detection, but I don't know if you can get it to work with XNA.
Or if you really want some kind of loudness calculate the sum of squares of the amplitudes from the frequencies you're interested in (for example 50-20000Hz) and if the average of that over the last 30 seconds is significantly higher than the average over the last 10 minutes or exceeds a certain absolute threshold sound the alarm.

Louder at what point? The signal's average amplitude will tell you which one is louder on average, but that is kind of a dumb, brute force way to go about it. It may work for you in practice though.
Getting down to it, I need to be able to determine when a baby is crying vs ambient noise in the room.
Ok, so, I'm just throwing out ideas here; I am by no means an expert on audio processing.
If you know your input, i.e., a baby crying (relatively loud with a high pitch) versus ambient noise (relatively quiet), you should be able to analyze the signal in terms of pitch (frequency) and amplitude (loudness). Of course, if during he recording someone drops some pots and pans onto the kitchen floor, that will be tough to discern.
As a first pass I would simply traverse the signal, maintaining a standard deviation of pitch and amplitude throughout, and then set a flag when those deviations jump beyond some threshold that you will have to define. When they come back down you may be able to safely assume that you captured the baby's cry.
Again, just throwing you an idea here. You will have to see how it works in practice with actual data.

I agree with #Ed Swangren, it will take a lot of playing with samples of data for a lot of sources. To me, it sounds like the trick will be to limit or hopefully eliminate false positives. My experience with babies is they are much louder crying than the environment. so, keeping track of the average measurements (freq/amp/??) of the normal environment and then classifying how well the changes match the characteristics of a crying baby which changes from kid to kid, so you'll probably want a system that 'learns'. Best of luck.
update: you might find this library useful http://naudio.codeplex.com/

Related

Is GPS inaccuracy consistent over short time spans?

I'm interested in developing a semi-autonomous RC lawnmower.
That is, the operator would decide when to stop, turn, etc., but could request "slightly overlap previous cut" and the mower would automatically do so. (Having operated high-end RC mowers at trade shows, this is the tedious part. Overcoming that, plus the high cost -- which I believe is possible -- would make a commercial success.)
This feature would require accurate horizontal positioning. I have investigated ultrasonic, laser, optical, and GPS. Each has its problems in this application. (I'll resist the temptation to go off on these tangents here.)
So... my question...
I know GPS horizontal accuracy is only 3-4m. Not good enough, but:
I don't need to know where I am on the planet. I only need to know where I am relative to where I was a minute ago.
So, my question is, is the inaccuracy consistent in the short term? if so, I think it would work for me. If it varies wildly by +- 1.5m from one second to the next, then it will not work.
I have tried to find this information but have had no success (possibly because of the ubiquity of other GPS-accuracy discussion), so I appreciate any guidance.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Edit ~~~~~~~~~~~~~~~~~~~~~~
It's looking to me like GPS is not just skewed but granular. I'd be interested in hearing from anyone who can give better insight into this, but for now I'm going to explore other options.
I realized that even though my intended application is "outdoor", this question is technically in the field of "indoor positioning systems" so I am adding that tag.
My latest thinking is to have 3 "intelligent" high-dB ultrasonic (US) speaker units. The mower emits RF requests for a tone from each speaker in rapid sequence, measuring the time it takes to "hear" each unit's response, thereby calculating distance to each of these fixed point and using trilateration to get position. if the fixed-point speakers are 300' away from the mower, the mower may have moved several feet between the 1st and 3rd response, so this would have to be allowed for in the software. If it is possible to differentiate 3 different US frequencies, they could be requested/received "simultaneously". Though you still run into issues when you're close to one fixed unit and far from another. So some software correction may still be necessary. If we can assume the mower is moving in a straight line, this isn't too complicated.
Another variation is the mower does not request the tones. The fixed units send RF "here comes tone from unit A" etc., and the mower unit just monitors both RF info and US tones. This may simplify things somewhat, but it seems it really requires the ability to determine which speaker a tone is coming from.
This seems like the kind of thing you could (and should) measure empirically. Just set a GPS of your liking down in the middle of a field on a clear day and wait an hour. Then come back and see what you find.
Because I'm in a city, I can't run out and do this for you. However, I found a paper entitled iGeoTrans – A novel iOS application for GPS positioning in geosciences.
That includes this figure which duplicates the test I propose. You'll note that both the iPhone4 and Garmin eTrex10 perform pretty poorly versus the accuracy you say you need.
But the authors do some Math Magic™ to reduce the uncertainty in the position, presumably by using some kind of averaging. That gets them to a 3.53m RMSE measure.
If you have real-time differential GPS, you can do better. But this requires relatively expensive hardware and software.
Even aside from the above, you have the potential issue of GPS reflection and multipath error. What if your mower has to go under a deck, or thick trees, or near the wall of a house? These common yard features will likely break the assumptions needed to make a good averaging algorithm work and even frustrate attempts at DGPS by blocking critical signals.
To my mind, this seems like a computer vision problem. And not just because that'll give you more accurate row overlaps... you definitely don't want to run over a dog!
In my opinion a standard GPS is no way accurate enough for this application. A typical consumer grade receiver that I have used has a position accuracy defined as a CEP of 2.5 metres. This means that for a stationary receiver in a "perfect" sky view environment over time 50% of the position fixes will lie within a circle with a radius of 2.5 metres. If you look at the position that the receiver reports it appears to wander at random around the true position sometimes moving a number of metres away from its true location. When I have monitored the position data from a number of stationary units that I have used they could appear to be moving at speeds of up to 0.5 metres per second. In your application this would mean that the lawnmower could be out of position by some not insignificant distance (with disastrous consequences for your prized flowerbeds).
There is a way that this can be done, as has been proved by the tractor manufacturers who can position the seed drills and agricultural sprayers to millimetre accuracy. These systems use Differential GPS where there is a fixed reference station positioned in the neighbourhood of the tractor being controlled. This reference station transmits error corrections to the mobile unit allowing it to correct its reported position to a high degree of accuracy. Unfortunately this sort of positioning system is very expensive.

How to detect threshing in accelerometer?

I'm writing an application controlled by an accelerometer built into a wrist watch. I want one of the commands to be "wildly swinging your forehand". How do I detect it and measure for how long it goes?
To supplement John Fisher's suggestion, I would add: Look at analyzing this with spectral/Fourier transform techniques. I would expect to see a strong signal characteristic at low frequencies, but it could easily vary from user to user.
If the characteristic is there, signal processing techniques can help you isolate it and detect it.
Write something to record the measurements while you flail your arm around. Do it several times in different ways. Analyze the measurements for a pattern you can use.
Use a Kalman filter to track a moving average of the absolute value of the current acceleration? Or, if you can, the current acceleration minus gravity?
If that goes over a threshold, that indicates a lot of acceleration has been happening recently. That suggests thrashing.

Detecting wind noise [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I want to develop an app for detecting wind according the audio stream.
I need some expert thoughts here, just to give me guide lines or some links, I know this is not easy task but I am planning to put a lot of effort here.
My plan is to detect some common patterns in the stream, and if the values are close to this common patterns of the wind noise I will notify that match is found, if the values are closer to the known pattern great, I can be sure that the wind is detected, if the values doesn't match with the patterns then I guess there is no so much wind....
That is my plan at first, but I need to learn how this things are done. Is there some open project already doing this ? Or is there someone who is doing research on this topics ?
The reason I write on this forum is because I do not know how to google it, the things I found was not I was looking for. I really do not know how to start developing this kind of algorithm.
EDIT 1 :
I tried to record a wind, and when I open the saved audio file for me it was just a bunch of numbers :). I do not even see in what format should I save this, is wave good enough ? Should I use something else, or what if I convert the wind noise audio file in mp3 : is this gonna help with parsing ?
Well I got many questions, that is because I do not know from where to read more about this kind of topic. I tag my question with guidlines so I hope someone will help me.
There must be something that is detectable, cause the wind noise is so common, there must be somehow to detect this, we need only someone to give me tips, someone who is familiar with this topic.
I just came across this post I have recently made a library which can detect wind noise in recordings.
I made a model of wind noise and created a database of examples and then trained a Machine Learning algorithm to detect and meter the wind level in a perceptually weighted way.
The C++/C code is here if it is of use to anyone!
The science for your problem is called "pattern classification", especially the subfield of "audio pattern classification". The task is abstracted as classifying a sound recording into two classes (wind and not wind). You seem to have no strong background in signal processing yet, so let me insert one central warning:
Pattern classification is not as easy as it looks at first. Humans excel at pattern classification. Computers don't.
A good first approach is often to compute the correlation of the Fourier transform of your signal and a sample. Don't know how much that will depend on wind speed, however.
You might want to have a look at the bag-of-frames approach, it was used successfully to classify ambient noise.
As #thiton mentioned this is an example of audio pattern classification.
Main characteristics for wind: it's a shaped (band/hp filtered) white noise with small semi-random fluctuations in amplitude and pitch. At least that's how most synthesizers reproduce it and it sounds quite convincing.
You have to check the spectral content and change in the wavefile, so you'll need FFT. Input format doesn't really matter, but obviously raw material (wav) is better.
Once you got that you should detect that it's close to some kind of colored noise and then perhaps extract series of pitch and amplitude and try to use classic pattern classification algorithm for that data set. I think supervised learning could work here.
This is actually a hard problem to solve.
Assuming you have only a single microphone data. The raw data you get when you open an audio file (time-domain signal) has some, but not a lot of information for this kind of processing. You need to go into the frequency domain using FFTs and look at the statistics of the the frequency bins and use that to build a classifier using SVM or Random Forests.
With all due respect to #Karoly-Horvath, I would also not use any recordings that has undergone compression, such as mp3. Audio compression algorithms always distorts the higher frequencies, which as it turns out, is an important feature in detecting wind now. If possible, get the raw PCM data from a mic.
You also need to make sure your recording is sampled at at least 24kHz so you have information of the signal up to 12kHz.
Finally - the wind shape in the frequency domain is not a simple filtered white noise. The characteristics is that it usually has high energy in the low frequencies (a rumbling type of sound) with sheering and flapping sounds in the high frequencies. The high frequency energy is quite transient, so if your FFT size is too big, you will miss this important feature.
If you have 2 microphone data, then this gets a little bit easier. Wind, when recorded, is a local phenomenon. Sure, in recordings, you can hear the rustling of leaves or the sound of chimes caused by the wind. But that is not wind-noise and should not be filtered out.
The actual annoying wind noise you hear in a recording is the air hitting the membrane of your microphone. That effect is a local event - and can be exploited if you have 2 microphones. It can be exploited because the event is local to each individual mic and is not correlated with the other mic. Of course, where the 2 mics are placed in relations to each other is also important. They have to be reasonably close to each other (say, within 8 inches).
A time-domain correlation can then be used to determine the presence of wind noise. (All the other recorded sound are correlated with each other because the mics are fairly close to each other, so a high correlation means no wind, low correlation means wind). If you are going with this approach, your input audio file need not be uncompressed. A reasonable compression algorithm won't affect this.
I hope this overview helps.

Comparing 2 one dimensional signals

I have the following problem: I have 2 signals over time. They are from the same source so they should be the same. I want to check if they really are.
Complications:
they may be measured with different sample rates
the start / end time do not correlate. The measurement does not start at the same time and end at the same time.
there may be an time offset between the two signals.
My thoughts go along Fourier transformation, convolution and statistical methods for comparison. Can someone post me some links where I can find more information on how to handle this?
You can easily correct for the phase by just shifting them so their centers of mass line up. (Or alternatively, in the Fourier domain just multiplying by the inverse of the phase of the first coefficient.)
Similarly, if you want to line up the images given only partial data, you can just cross correlate and take the maximal value (which is again easy to do in the Fourier domain).
That leaves the only tricky part of this process as dealing with the sampling rates. Now if you know a-priori what the sample rates are, (and if they are related by a rational number), you can just use sinc interpolation/downsampling to rescale them to a common sampling rate:
https://ccrma.stanford.edu/~jos/st/Bandlimited_Interpolation_Time_Limited_Signals.html
If you don't know the sampling rate, you may be a bit screwed. Technically, you can try just brute forcing over all the different rescalings of your signal, but doing this tends to be either slow or else give mediocre results.
As a last suggestion, if you just want to match sounds exactly you can try using the cepstrum and verifying that the peaks of the signal are close enough to within some tolerance. This type of analysis is used a lot in sound and speech recognition, with some refinements to make it operate a bit more locally. It tends to work best with frequency modulated data like speech and music:
http://en.wikipedia.org/wiki/Cepstrum
Fourier transformation does sound like the right way.
There is too much mathematical information for me to just start explaining here so if you really wanna know what's going on with that (cause I don't think you can just use FT without understanding it) you should use this reference from MIT OpenCourseWare: http://ocw.mit.edu/courses/mathematics/18-103-fourier-analysis-theory-and-applications-spring-2004/lecture-notes/
Hope it helped.
If you are working with a linux box and the waveforms that need to be processed have already been recorded, you can try to use the file command to display details about the recording. It gives you the sampling rate when it is invoked on a wav file, though I am not sure what format you are recording in.
If the signals are time-shifted with respect to each other, you may try to convolve one with a delta function with increasing delays and then comparing. On MATLAB, conv and all should be good enough.
These are just 'crude' attempts (almost like hacking at the problem). There may be algorithms that are shift-invariant that may do a better job.
Hope that helps.

Algorithms needed on filtering the noise caused by the vibration

For example you measure the data coming from some device, it can be a mass of the object moving on the bridge. Because it is moving the mass will give data which will vibrate in some amplitude depending on the mass of the object. Bigger the mass - bigger the vibrations.
Are there any methods for filtering such kind of noise from that data?
May be using some formulas of vibrations? Have no idea what kind of formulas or algorithms (filters) can be used here. Please suggest anything.
EDIT 2:
Better picture, I just draw it for better understanding:
Not very good picture. From that graph you can see that the frequency is the same every
time, but the amplitude chanbges periodically. Something like that I have when there are no objects on the moving road. (conveyer belt). vibrating near zero value.
When the object moves, I there are the same waves with changing amplitude.
The graph can tell that there may be some force applying to the system and which produces forced occilations. So I am interested in removing such kind of noise. I do not know what force causes such occilations. Soon I hope I will get some data on the non moving road with and without object on it for comparison with moving road case.
What you have in your last plot is basically an amplitude modulated oscillation coming from a function like:
f[x] := 10 * (4 + Sin[x]) * Sin[80 * x]
The constants have been chosen to match your plot (using just a rule of thumb)
The Plot of this function is
That isn't "noise" (although may be some noise is there too), but can be filtered easily.
Let's see your data for the static and moving payloads ....
Edit
Based on your response to several comments, and based in my previous experience with weighting devices:
You are interfacing the physical world, not just getting input from a mouse and keyboard. It is very important for you understand the device, how it works and how it is designed.
You need a calibration procedure. You have to use several master weights to be sure that the device is working properly and linearly in the whole scale, and that the static case is measured much better than your dynamic needs.
You'll not be able to predict if you can measure with several loads in the conveyor until you do some experiments and look very carefully at the resulting plots
You need to be sure that a load placed anywhere in the conveyor shows the same reading. Or at least you should be able to correlate reading and position.
As I said before, you need a lot of info, and it seems that is not available. I always worked as a team with the engineers designing the device.
Don't hesitate to add more info ...
Have you tried filters with lowpass characteristics? There are different approaches for smoothing data (i.e. Savitzky-Golay, Gauss, moving average) but often, a simple N-point median filter is already sufficient.
It really depends on what you're after.
Take a look at this book:
The Scientist and Engineer's Guide to Digital Signal Processing
You can download it for free. In particular, check chapters 14 and 15.
If the frequency changes with mass and you're trying to measure mass, why not measure the frequency of the oscillations and use that as your primary measure?
Otherwise you need a notch filter which is tunable - figure out the frequency of the "noise" and tune the notch filter to that.
Another book to try is Lyons Understanding Digital Signal Processing
In order to smooth the signal, I'd average the previous 2 * n samples where n is the maximum expected wavelength of the vibrations.
This should cause most of the noise to be eliminated.
If you have some idea of the range of frequencies, you could do a simple average as long as the measurement period were sufficiently long to give you the level of accuracy you want to achieve. The more wavelengths worth of data you average against, the smaller the ratio of contributed error from a partial wavelength.
I'd suggest first simulating/modeling this in software like Matlab.
Data you'll need to consider:
The expected range of vibration frequencies
The measurement accuracy you want to achieve
The expected range of mass you'll want to measure
The function of mass to vibration amplitude
You should be able to apply the same principles as noise-cancelling microphones: put two sensors out, then subtract the secondary sensor's (farther away from the good signal source) signal from the primary sensor's (closer to the good signal source) signal.
Obviously, this works best if the "noise" will reach both sensors fairly equally while the "signal" reaches the primary sensor much more strongly.
For things like sound, this is pretty easy to do in the sensor itself, which makes your software a lot easier and more performant. Depending on what you're measuring, this might be easier to do with multiple sets of hardware and doing the cancellation in software.
If you can characterize the frequency spectra of the unwanted vibration noise, you might be able to synthesize a set of (near) minimum phase notch or band reject filter(s) to allow you to acquire your desired signal at your desired S/N ratio with minimized latency or data set size.
Filtering noisy digital signals is straight forward, as previous posters have noted. There are lots of references. You have not however stated what your objectives are clearly, so we cannot point you into a good direction. Are you looking for a single measurement of a single object on a bridge? [Then see other answers].
Are you monitoring traffic on this bridge and weighing each entity as it passes by? Then you need to determine when entities are on the sensor and when they are not. Typically, as long as the sensor's noise floor is significantly lower than the signal you're measuring this can be accomplished by simple thresholding.
Are you trying to measure the vibrations of the bridge caused by other vehicles? In which case you need either a more expensive sensor if you're having problems doing this, or a clearer measuring objective.

Resources