I want to change the frequency of a voice recording by changing the sample rate on Mac OS X.
This is a research project aimed at people who stutter. It's essential that the latency is very low – this is, for instance, why I'm not considering Fast Fourier Transforms. Instead, I want to collect samples at a rate of, say, 44kHz, then do one of two things:
1) Play the samples back twice as slowly (i.e. 22kHz). This will result in increasing asynchrony with the source. It would be useful if I could restart the sampling every 1 second or so to prevent the asynchrony from becoming too noticeable.
2) Play the samples back twice as quickly. Obviously, it's impossible to do this continuously (i.e. can't play back samples which haven't been collected yet). To get around this, I'm intending to gate the playback with a square wave. Samples will be played back twice as quickly as they were recorded during the peak of the square wave. Nothing will be heard (but samples will still be collected) during the trough of the square wave.
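For reference, the two schemes above can be sketched offline on a plain list of samples. This is only a simulation under assumptions of mine: the output runs at the same fixed rate as the input, so half speed is modelled by holding each sample twice and double speed by skipping every other sample, with no filtering at all. The names `half_speed`, `gated_double_speed`, `restart_every` and `period` are hypothetical and are not PortAudio or Core Audio calls.

```python
def half_speed(samples, restart_every):
    """Option 1: half-speed playback, resynchronised every `restart_every` samples."""
    out = []
    for start in range(0, len(samples), restart_every):
        block = samples[start:start + restart_every]
        # Hold each input sample for two output samples. Only the first half
        # of the block is heard before playback restarts at the next block.
        out.extend(block[i // 2] for i in range(len(block)))
    return out


def gated_double_speed(samples, period):
    """Option 2: silent trough, then double-speed playback during the peak."""
    if period % 2:
        raise ValueError("period must be an even number of samples")
    half = period // 2
    out = []
    for start in range(0, len(samples) - period + 1, period):
        block = samples[start:start + period]
        out.extend([0] * half)   # trough: keep collecting, play nothing
        out.extend(block[::2])   # peak: every second sample of the whole period
    return out
```

In the gated version, the read position never overtakes the write position: at the t-th output sample of the peak, sample 2t of the period is needed, and period/2 + t samples have already been collected.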
I've prepared a PDF which describes the project in more detail here:
A friend has helped me with some of the programming for this using PortAudio. Unfortunately, we're getting very long latencies. I think this might be because PortAudio is working at too high a level. From the code, it looks to me as if PortAudio is buffering the incoming audio stream and then making alterations which are prima facie similar to the ones I've described above, but which are in fact operations on the buffered stream.
This isn't what I want at all. It's essential that the processing unit does as little as possible. Referring to the conditions (1) and (2) above, all the computer should do is to (1) play back the samples without any manipulation but twice as slowly; or (2) store the incoming samples then play them back twice as quickly. There should be no other processing whatsoever. I think this is the only way I'll get the very low latencies I'm looking for.
I wondered if it would be better to try doing this directly in Core Audio for OS X, rather than using PortAudio? This would limit platform compatibility. But the low latency is much more important than compatibility.
Am I likely to be able to do what I want using a mid-level service, such as Audio Units? Or would I need to write directly for a low-level service such as I/O Kit? How would I go about it?

It looks like the best option for you would be something like Max/MSP or Pure Data. These let you avoid working with text-based languages and should let you rapidly develop what you're looking to do. I/O Kit is a bit too low-level for what you're trying to do.
Since Max is not a text-based language, sharing the code itself is a bit tricky on sites like Stack Overflow, so I've included a screengrab. You can copy and paste Max code, but it's a bit ugly and inappropriate for this.
Here's a quick description. The box that says rect~ 1 generates a square wave at 1 Hz. The snapshot~ box captures the values it outputs. The if boxes check whether each value is greater than zero or less than zero (peaks and troughs). On a trough, the record~ box records the signal from the microphone box and stores it in a buffer. The groove~ box is a sampler that plays back the audio in this buffer: when it receives a bang from the if box, it starts playback. The sig~ box controls the playback rate.
Also, you may not know this, but the PDF you're trying to share is unavailable.
One other thing: if latency is important, you should learn about something called a click train. This is basically where you send a signal with a single 1 at the start and time how long it takes for that value to get through your system.
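As a rough illustration of that measurement, the sketch below finds the first sample in a recording of the system's output that rises above a threshold and converts its position to milliseconds. It assumes the test signal's single 1 was sent at sample 0 and that the recording started at the same instant; `impulse_latency_ms` and its `threshold` parameter are hypothetical names chosen for this example.

```python
def impulse_latency_ms(recorded, sample_rate, threshold=0.5):
    """Return the delay of the first sample at or above `threshold`, in ms.

    Assumes the impulse entered the system at sample 0 of `recorded`.
    Returns None if the impulse never shows up.
    """
    for index, value in enumerate(recorded):
        if abs(value) >= threshold:
            return 1000.0 * index / sample_rate
    return None
```

The threshold needs to sit above the noise floor of the recording, otherwise background noise will be mistaken for the impulse.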


I want to develop an app for detecting wind from an audio stream.
I need some expert thoughts here, just some guidelines or links. I know this is not an easy task, but I am planning to put a lot of effort into it.
My plan is to detect some common patterns in the stream. If the values are close to these known wind-noise patterns, I will report that a match was found; the closer the match, the more confident I can be that wind is detected. If the values don't match the patterns, then I guess there is not much wind.
That is my initial plan, but I need to learn how these things are done. Is there an open project already doing this? Or is anyone doing research on these topics?
The reason I'm writing on this forum is that I don't know how to google it; what I found was not what I was looking for. I really don't know how to start developing this kind of algorithm.
EDIT 1:
I tried recording some wind, and when I opened the saved audio file, it was just a bunch of numbers :). I don't even know what format I should save this in. Is WAV good enough? Should I use something else? What if I convert the wind noise audio file to MP3: would that help with parsing?
I have many questions because I don't know where to read more about this kind of topic. I tagged my question with guidelines, so I hope someone will help me.
There must be something detectable, because wind noise is so common. There must be some way to detect it; I just need tips from someone who is familiar with this topic.
I just came across this post. I recently made a library which can detect wind noise in recordings.
I made a model of wind noise and created a database of examples and then trained a Machine Learning algorithm to detect and meter the wind level in a perceptually weighted way.
The C++/C code is here if it is of use to anyone!
The science for your problem is called "pattern classification", especially the subfield of "audio pattern classification". The task is abstracted as classifying a sound recording into two classes (wind and not wind). You seem to have no strong background in signal processing yet, so let me insert one central warning:
Pattern classification is not as easy as it looks at first. Humans excel at pattern classification. Computers don't.
A good first approach is often to compute the correlation between the Fourier transform of your signal and that of a reference sample. I don't know how much that will depend on wind speed, however.
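A minimal sketch of that idea, using only the standard library: take the magnitude spectrum of a frame and of a reference frame of the same length, then compute their normalised correlation (cosine similarity). The function names are hypothetical, and the naive DFT here is an assumption made to keep the example dependency-free; a real implementation would use an FFT library.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..n/2 (naive O(n^2) DFT, fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_similarity(frame, reference):
    """Normalised correlation of two magnitude spectra.

    Close to 1 means the spectra have the same shape; close to 0 means
    they share almost no energy in the same bins.
    """
    if len(frame) != len(reference):
        raise ValueError("frames must have the same length")
    a = magnitude_spectrum(frame)
    b = magnitude_spectrum(reference)
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / norm
```

Because only magnitudes are compared, a time shift between the frame and the reference does not change the score.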
You might want to have a look at the bag-of-frames approach; it was used successfully to classify ambient noise.
As @thiton mentioned, this is an example of audio pattern classification.
Main characteristics of wind: it's shaped (band-pass or high-pass filtered) white noise with small semi-random fluctuations in amplitude and pitch. At least that's how most synthesizers reproduce it, and it sounds quite convincing.
You have to check the spectral content of the wave file and how it changes, so you'll need an FFT. Input format doesn't really matter, but obviously raw material (WAV) is better.
Once you have that, you should detect that it's close to some kind of colored noise, then perhaps extract series of pitch and amplitude values and try a classic pattern classification algorithm on that data set. I think supervised learning could work here.
This is actually a hard problem to solve.
Assuming you only have data from a single microphone: the raw data you get when you open an audio file (the time-domain signal) has some, but not a lot of, information for this kind of processing. You need to go into the frequency domain using FFTs, look at the statistics of the frequency bins, and use those to build a classifier with an SVM or Random Forests.
With all due respect to @Karoly-Horvath, I would also not use any recordings that have undergone compression, such as MP3. Audio compression algorithms always distort the higher frequencies, which, as it turns out, are an important feature in detecting wind noise. If possible, get the raw PCM data from a mic.
You also need to make sure your recording is sampled at 24 kHz or higher, so that you have information about the signal up to 12 kHz.
Finally, the wind shape in the frequency domain is not simply filtered white noise. Characteristically, it has high energy in the low frequencies (a rumbling type of sound), with shearing and flapping sounds in the high frequencies. The high-frequency energy is quite transient, so if your FFT size is too big, you will miss this important feature.
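To make the single-microphone approach a little more concrete, here is a hedged sketch of one possible per-frame feature: the fraction of spectral energy below a cutoff, computed on short frames so that transient high-frequency energy is not averaged away. The 500 Hz cutoff, the 256-sample frame size and the function names are my assumptions for illustration, not values from any reference; the per-frame values would then be fed to whatever classifier you train.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def low_band_ratio(frame, sample_rate, cutoff_hz):
    """Fraction of the frame's spectral energy in bins below cutoff_hz."""
    n = len(frame)
    if n == 0 or n & (n - 1):
        raise ValueError("frame length must be a power of two")
    power = [abs(c) ** 2 for c in fft(frame)[: n // 2 + 1]]
    total = sum(power)
    if total == 0:
        return 0.0
    bin_hz = sample_rate / n  # frequency spacing between adjacent bins
    low = sum(p for k, p in enumerate(power) if k * bin_hz < cutoff_hz)
    return low / total


def frame_ratios(samples, sample_rate, frame_size=256, cutoff_hz=500.0):
    """Low-band energy ratio for each non-overlapping frame of the signal."""
    return [
        low_band_ratio(samples[i:i + frame_size], sample_rate, cutoff_hz)
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]
```

At a 24 kHz sample rate, a 256-sample frame spans roughly 10.7 ms, which is short enough to track brief bursts; the mean and variance of these ratios over time are examples of the bin statistics mentioned above.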
If you have data from 2 microphones, this gets a little bit easier. Wind noise, when recorded, is a local phenomenon. Sure, in recordings you can hear the rustling of leaves or the sound of chimes caused by the wind. But that is not wind noise and should not be filtered out.
The actual annoying wind noise you hear in a recording is the air hitting the membrane of your microphone. That effect is a local event, and it can be exploited if you have 2 microphones, because the event is local to each individual mic and is not correlated with the other mic. Of course, where the 2 mics are placed in relation to each other is also important. They have to be reasonably close to each other (say, within 8 inches).
A time-domain correlation can then be used to determine the presence of wind noise. (All the other recorded sounds are correlated with each other because the mics are fairly close together, so a high correlation means no wind and a low correlation means wind.) If you are going with this approach, your input audio file need not be uncompressed. A reasonable compression algorithm won't affect this.
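A small sketch of the two-microphone check, assuming both channels are available as equal-length lists of samples. It takes the largest absolute Pearson correlation over a few sample lags, to allow for the short travel time between the mics, and flags wind when that peak is low. The `max_lag` and `threshold` defaults are placeholders I picked for illustration and would need tuning on real recordings.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def peak_correlation(mic_a, mic_b, max_lag):
    """Largest |correlation| between the channels over lags -max_lag..max_lag."""
    if len(mic_a) != len(mic_b):
        raise ValueError("channels must have the same length")
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = mic_a[lag:], mic_b[:len(mic_b) - lag]
        else:
            x, y = mic_a[:lag], mic_b[-lag:]
        best = max(best, abs(pearson(x, y)))
    return best


def looks_like_wind(mic_a, mic_b, max_lag=3, threshold=0.5):
    """Low peak correlation between close mics suggests local wind noise."""
    return peak_correlation(mic_a, mic_b, max_lag) < threshold
```

In practice this would be run frame by frame, with `max_lag` chosen from the mic spacing and sample rate so that it covers the longest plausible acoustic delay between the two mics.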
I hope this overview helps.

