Mac OS X: Audio frequency shift by change of sample rate?

I want to change the frequency of a voice recording by changing sample rate on Mac OS X.
This is a research project aimed at people who stutter. It's essential that the latency is very low – this is, for instance, why I'm not considering Fast Fourier Transforms. Instead, I want to collect samples at a rate of, say, 44kHz, then do one of two things:
1) Play the samples back twice as slowly (i.e. at 22 kHz). This will result in increasing asynchrony with the source. It would be useful if I could restart the sampling every second or so to prevent the asynchrony from becoming too noticeable.
2) Play the samples back twice as quickly. Obviously, it's impossible to do this continuously (you can't play back samples which haven't been collected yet). To get around this, I'm intending to gate the playback with a square wave. Samples will be played back twice as quickly as they were recorded during the peak of the square wave. Nothing will be heard (but samples will still be collected) during the trough of the square wave.
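To make condition (1) concrete, here is a minimal sketch of the kind of callback I have in mind, using PortAudio for illustration. Mono float samples at 44.1 kHz; the ring-buffer size and the once-per-second resync are illustrative, error checking is omitted, and the code is untested:

// Condition (1): store incoming samples in a ring buffer and read them back
// at half speed, snapping the read pointer back to "now" once per second.
#include <portaudio.h>

struct State {
    float ring[1 << 16] = {};   // ~1.5 s of audio at 44.1 kHz
    unsigned writePos = 0;      // advances 1 sample per input sample
    double readPos = 0.0;       // advances 0.5 samples per output sample
    unsigned sinceResync = 0;
};

static int callback(const void *in, void *out, unsigned long frames,
                    const PaStreamCallbackTimeInfo *, PaStreamCallbackFlags,
                    void *user) {
    State *s = static_cast<State *>(user);
    const float *input = static_cast<const float *>(in);
    float *output = static_cast<float *>(out);
    for (unsigned long i = 0; i < frames; ++i) {
        s->ring[s->writePos++ & 0xFFFF] = input[i];          // store sample
        output[i] = s->ring[unsigned(s->readPos) & 0xFFFF];  // half-speed read
        s->readPos += 0.5;
        if (++s->sinceResync >= 44100) {   // once per second...
            s->readPos = s->writePos;      // ...jump back to "now"
            s->sinceResync = 0;            // (a real version would crossfade)
        }
    }
    return paContinue;
}

int main() {
    static State s;
    PaStream *stream;
    Pa_Initialize();
    Pa_OpenDefaultStream(&stream, 1, 1, paFloat32, 44100,
                         paFramesPerBufferUnspecified, callback, &s);
    Pa_StartStream(stream);
    Pa_Sleep(10000);   // run for 10 seconds
    Pa_StopStream(stream);
    Pa_Terminate();
}

The point is that the callback does nothing except store samples and read them back at half speed; condition (2) would be the same structure with the read pointer advancing by 2 and the output gated by the square wave.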
I've prepared a PDF which describes the project in more detail here:
https://www.dropbox.com/s/8u3tz7d9hhxd3t9/Frequency%20shift%20techniques.pdf?dl=0
A friend has helped me with some of the programming for this using PortAudio. Unfortunately, we're getting very long latencies. I think this might be because PortAudio is working at too high a level. From the code, it looks to me as if PortAudio is buffering the incoming audio stream and then making alterations which are prima facie similar to the ones I've described above, but which are in fact operations on the buffered stream.
This isn't what I want at all. It's essential that the processing unit does as little as possible. Referring to the conditions (1) and (2) above, all the computer should do is to (1) play back the samples without any manipulation but twice as slowly; or (2) store the incoming samples then play them back twice as quickly. There should be no other processing whatsoever. I think this is the only way I'll get the very low latencies I'm looking for.
I wondered if it would be better to try doing this directly in Core Audio for OS X rather than using PortAudio. This would limit platform compatibility, but the low latency is much more important than compatibility.
Am I likely to be able to do what I want using a mid-level service, such as Audio Units? Or would I need to write directly for a low-level service such as I/O Kit? How would I go about it?

It looks like the best thing for you would be to use something like Max/MSP or Pure Data. This will let you avoid working with text-based languages and should help you rapidly develop what you're looking to do. I/O Kit is a bit too low-level for what you're trying to do.
Since Max is not a text-based language, sharing the code itself is a bit tricky on sites like Stack Overflow, so I've included a screengrab. You can copy and paste Max code, but it's a bit ugly and inappropriate for this.
Here's a quick description. The box that says rect~ 1 is generating a square wave at 1 Hz. The snapshot~ box is capturing the values this spits out. The if boxes check whether it's greater than zero or less than zero (peaks and troughs). On a trough, the record~ box records the signal from the microphone box and stores it in a buffer. The groove~ box is a sampler: when it receives a bang from the if box, it plays back the audio in this buffer. The sig~ box is being used to control the playback rate.
Also, you may not know this, but the PDF you're trying to share is unavailable.
One other thing: if latency is important, you should learn about a technique sometimes called a click train. This is basically where you send a signal with a single 1 at the start and time how long it takes for that value to get through your system.
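For illustration, here is a rough, untested sketch of that measurement in C++ with PortAudio, assuming the output is physically looped back to the input (via a cable or the room); the detection threshold is illustrative:

// Emit a single unit impulse, then count samples until it reappears at the
// input. The count is the round-trip latency through the whole system.
#include <portaudio.h>
#include <cmath>
#include <cstdio>

struct ClickState {
    bool sent = false;
    long samplesSinceClick = -1;  // samples elapsed since the click went out
    long measuredLatency = -1;    // round-trip latency in samples, once found
};

static int cb(const void *in, void *out, unsigned long frames,
              const PaStreamCallbackTimeInfo *, PaStreamCallbackFlags,
              void *user) {
    ClickState *s = static_cast<ClickState *>(user);
    const float *input = static_cast<const float *>(in);
    float *output = static_cast<float *>(out);
    for (unsigned long i = 0; i < frames; ++i) {
        output[i] = 0.0f;
        if (!s->sent) {                       // emit the single impulse
            output[i] = 1.0f;
            s->sent = true;
            s->samplesSinceClick = 0;
        } else if (s->samplesSinceClick >= 0 && s->measuredLatency < 0) {
            ++s->samplesSinceClick;
            if (std::fabs(input[i]) > 0.5f)   // illustrative threshold
                s->measuredLatency = s->samplesSinceClick;
        }
    }
    return paContinue;
}

int main() {
    static ClickState s;
    PaStream *stream;
    Pa_Initialize();
    Pa_OpenDefaultStream(&stream, 1, 1, paFloat32, 44100,
                         paFramesPerBufferUnspecified, cb, &s);
    Pa_StartStream(stream);
    Pa_Sleep(2000);
    Pa_StopStream(stream);
    Pa_Terminate();
    if (s.measuredLatency >= 0)
        std::printf("round trip: %ld samples (%.1f ms)\n", s.measuredLatency,
                    1000.0 * s.measuredLatency / 44100.0);
    else
        std::printf("no click detected\n");
}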

Related

Hardware for image processing (object recognition) from video

What I want to do is to recognize road signs using an embedded device with a camera. I was thinking about the Raspberry Pi 2B, but I don't know if its power is sufficient. I don't have to analyze every frame of the video, but the more frames per second I can analyze the better, especially at high movement speeds.
The question is: are there any better boards that could be used for a task like this? (It would be best if they could run Linux/Windows 10 themselves, as I am going to use OpenCV.)
For a problem like this you can try to over-analyse it and pick the hardware before solving the problem, but that is putting the cart before the horse.
First, take some video.
Second, digitize it and get it onto your daily driver, or whatever your preferred software development computer is.
Then start working on the algorithms to solve whatever problem you want to solve, bearing in mind that you eventually want to embed this, so you may need to lean towards lighter-weight libraries, or roll your own, rather than heavyweight or operating-system-dependent solutions (feeding it into Photoshop is not a solution, nor is some MATLAB thing).
You may find that you need better video; that is important information in itself.
Eventually, as you get close to the algorithm, you can either prototype it on a Raspberry Pi or BeagleBone board, or use a simulator if man-hours are cheaper for you than hardware. Work out how many operations per second (or per sign, or whatever) you need, and then, with some derating, how many operations per second you think you can get on platform X. That estimate is not deterministic, even with experiments, since one changed line of code can completely change the performance, especially if you are on the edge. An instruction set simulator is not going to mimic the pipeline exactly, but you can take an open-source one and modify it to count instructions, or types of instructions (branches vs. non-branches, etc.), and roughly convert that to clocks. Again, this is only worth it if hardware is more expensive than man-hours; at the price of a Raspberry Pi or a BeagleBone, Black or White, it is hard not to just buy one and try it.
A valid Stack Overflow question would be: "I have this video clip and I am trying to detect whether the car has passed a road sign or not; here is my code (or my algorithm), but it doesn't work." Once past that hurdle, another question could be: "I have detected that there is a sign in this frame, but I cannot tell whether it is a stop sign, a yield sign, or something else; here is my algorithm or code, here are my results, and here are my expected results." Another valid question would be: "I have an algorithm that works, but I am not able to optimize it for platform X; I am within N percent (a smallish number, less than 20% for example). Can it be optimized further?"
Hardware for fast object recognition
You can use a Raspberry Pi 4 with Google's USB Accelerator.

What are good polling algorithms?

I am trying to create an application for polling different sensors. I want to make this polling efficient, so that I don't poll a slow-changing sensor very frequently. On the other hand, there may be some sensors, like a temperature sensor, whose values keep changing frequently.
I found an analogy in Twitter. When a Twitter stream is open, how does it automatically give real-time notifications of new tweets? It must be polling a web server for tweets, right? How is that polling rate decided? There must be some algorithm, or maybe I am missing the proper term/keyword to Google for this. It must have something to do with push- and pull-based architectures, right?
For getting accurate sensor waveforms, search for the Nyquist rate, e.g., the Wikipedia page on it.
If you are going to be filtering your sensor streams, it is best to sample at a fixed frequency to make the filter math easier.
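For example, a fixed-rate polling loop might look like the following minimal sketch; readSensor() is a hypothetical stand-in for whatever driver call you have, and 100 Hz is an illustrative rate (at least twice the highest signal frequency you care about, per the Nyquist criterion):

#include <chrono>
#include <cstdio>
#include <thread>

// Hypothetical stub: replace with your real driver call.
double readSensor() { return 0.0; }

int main() {
    using clock = std::chrono::steady_clock;
    const auto period = std::chrono::milliseconds(10);   // 100 Hz sampling
    auto next = clock::now();
    for (;;) {
        std::printf("%f\n", readSensor());
        next += period;                       // fixed schedule, so no drift
        std::this_thread::sleep_until(next);  // unlike sleep_for(period)
    }
}

Scheduling against absolute deadlines (sleep_until) rather than relative delays keeps the sample spacing fixed even when readSensor() takes a variable amount of time.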

How do I interpret the audio stream from the microphone to detect a certain sound in WP7?

I am using the basic methods from http://msdn.microsoft.com/en-us/library/gg442302(v=vs.92).aspx to access the microphone, but I am trying to detect the occurrence of a specific sound, like a clap. How does one interpret the stream from the microphone? What exactly do the floats in the buffer represent?
Thanks
I think this might help: http://en.wikipedia.org/wiki/Pulse-code_modulation. The values in a way represent the offset of the microphone's mechanical part from its middle position, but I am sure the theory and vocabulary go much deeper.
When it comes to recognizing sounds, it can get arbitrarily complex, but a clap might be a simple task: you basically want to detect a sudden increase in volume, which would manifest as a sharp, short-term increase in the moving average of absolute values in the stream. So I'd put sliding windows on the stream and keep checking against thresholds: one short window for the high-volume threshold, and two adjacent, longer, lower-threshold windows to make sure there was no such noise before and after the clap.
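A minimal sketch of that scheme, assuming you already have the samples in a buffer; all window lengths and thresholds are illustrative, not tuned:

#include <cmath>
#include <cstddef>
#include <vector>

// Mean absolute value of samples[start, start + len).
static double meanAbs(const std::vector<float> &s, std::size_t start,
                      std::size_t len) {
    double sum = 0.0;
    for (std::size_t i = start; i < start + len; ++i) sum += std::fabs(s[i]);
    return sum / len;
}

// True if a clap-like transient starts at index 'pos': one short loud window
// flanked by two longer quiet windows.
bool isClapAt(const std::vector<float> &samples, std::size_t pos) {
    const std::size_t shortWin = 512;   // ~12 ms at 44.1 kHz: the clap itself
    const std::size_t longWin = 4096;   // ~93 ms: quiet context on each side
    if (pos < longWin || pos + shortWin + longWin > samples.size())
        return false;
    double before = meanAbs(samples, pos - longWin, longWin);
    double during = meanAbs(samples, pos, shortWin);
    double after  = meanAbs(samples, pos + shortWin, longWin);
    return during > 0.2 && before < 0.05 && after < 0.05;  // illustrative
}

Scanning pos across the stream (or keeping running sums instead of recomputing the means) gives you the sliding-window behaviour.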

Detecting wind noise [closed]

I want to develop an app for detecting wind from an audio stream.
I need some expert thoughts here, just to give me guidelines or some links. I know this is not an easy task, but I am planning to put a lot of effort into it.
My plan is to detect some common patterns in the stream, and if the incoming values are close to these known wind-noise patterns, I will report that a match was found. The closer the values are to the known patterns, the more confident I can be that wind was detected; if the values don't match the patterns, I'll assume there isn't much wind.
That is my plan, but first I need to learn how these things are done. Is there an open project already doing this? Or is there someone doing research on this topic?
The reason I write on this forum is that I don't know how to Google for this; the things I found were not what I was looking for. I really don't know how to start developing this kind of algorithm.
EDIT 1:
I tried to record some wind, and when I opened the saved audio file it was just a bunch of numbers :). I don't even know what format I should save it in. Is WAV good enough? Should I use something else? And what if I convert the wind-noise audio file to MP3: will that help with parsing?
Well, I have many questions; that is because I don't know where to read more about this kind of topic. I tagged my question with "guidelines", so I hope someone will help me.
There must be something detectable, because wind noise is so common; there must be some way to detect it. I just need tips from someone who is familiar with this topic.
I just came across this post. I have recently made a library which can detect wind noise in recordings.
I made a model of wind noise and created a database of examples and then trained a Machine Learning algorithm to detect and meter the wind level in a perceptually weighted way.
The C++/C code is here if it is of use to anyone!
The science for your problem is called "pattern classification", especially the subfield of "audio pattern classification". The task is abstracted as classifying a sound recording into two classes (wind and not wind). You seem to have no strong background in signal processing yet, so let me insert one central warning:
Pattern classification is not as easy as it looks at first. Humans excel at pattern classification. Computers don't.
A good first approach is often to compute the correlation of the Fourier transform of your signal and a sample. Don't know how much that will depend on wind speed, however.
You might want to have a look at the bag-of-frames approach, it was used successfully to classify ambient noise.
As @thiton mentioned, this is an example of audio pattern classification.
Main characteristics of wind: it's shaped (band/high-pass filtered) white noise with small semi-random fluctuations in amplitude and pitch. At least that's how most synthesizers reproduce it, and it sounds quite convincing.
You have to check the spectral content and how it changes over the file, so you'll need an FFT. The input format doesn't really matter, but obviously raw material (WAV) is better.
Once you have that, you should detect that the signal is close to some kind of colored noise, then perhaps extract a series of pitch and amplitude values and run a classic pattern classification algorithm on that data set. I think supervised learning could work here.
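One concrete way to test how noise-like a frame is (not the only one) is spectral flatness: the geometric mean over the arithmetic mean of the power spectrum, which approaches 1 for white-ish noise and 0 for tonal sounds. A rough sketch, using a naive DFT for clarity where real code would use an FFT library:

#include <cmath>
#include <complex>
#include <vector>

// Power spectrum of one frame via a naive O(n^2) DFT (illustrative only).
static std::vector<double> powerSpectrum(const std::vector<float> &frame) {
    const double kPi = 3.141592653589793;
    const std::size_t n = frame.size();
    std::vector<double> power(n / 2);
    for (std::size_t k = 0; k < n / 2; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            double angle = -2.0 * kPi * double(k) * double(t) / double(n);
            acc += std::complex<double>(frame[t] * std::cos(angle),
                                        frame[t] * std::sin(angle));
        }
        power[k] = std::norm(acc);   // |X[k]|^2
    }
    return power;
}

// Spectral flatness: near 1.0 => noise-like (flat), near 0.0 => tonal.
double spectralFlatness(const std::vector<float> &frame) {
    std::vector<double> p = powerSpectrum(frame);
    if (p.empty()) return 0.0;
    double logSum = 0.0, sum = 0.0;
    for (double v : p) { logSum += std::log(v + 1e-12); sum += v; }
    double geoMean = std::exp(logSum / p.size());
    double ariMean = sum / p.size();
    return geoMean / ariMean;
}

A frame of wind should give a flatness well above that of speech or music, so the per-frame flatness could be one input feature for the classifier.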
This is actually a hard problem to solve.
Assuming you have data from only a single microphone: the raw data you get when you open an audio file (the time-domain signal) has some, but not a lot of, information for this kind of processing. You need to go into the frequency domain using FFTs, look at the statistics of the frequency bins, and use those to build a classifier using SVMs or Random Forests.
With all due respect to @Karoly-Horvath, I would also not use any recordings that have undergone compression, such as MP3. Audio compression algorithms always distort the higher frequencies, which, as it turns out, are an important feature in detecting wind noise. If possible, get the raw PCM data from a mic.
You also need to make sure your recording is sampled at at least 24 kHz so you have information about the signal up to 12 kHz.
Finally, the shape of wind in the frequency domain is not simply filtered white noise. It usually has high energy in the low frequencies (a rumbling type of sound) with shearing and flapping sounds in the high frequencies. The high-frequency energy is quite transient, so if your FFT size is too big, you will miss this important feature.
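As a crude illustration of that last point, a first difference of the signal acts as a rough high-pass filter, and the frame-to-frame variance of its energy captures the flapping/shearing transience. This is a simplification of proper FFT-bin statistics, and the frame length is illustrative:

#include <cstddef>
#include <vector>

// Energy of the first-differenced (crudely high-passed) signal in one frame.
static double highBandEnergy(const std::vector<float> &x, std::size_t start,
                             std::size_t len) {
    double e = 0.0;
    for (std::size_t i = start + 1; i < start + len; ++i) {
        double d = x[i] - x[i - 1];
        e += d * d;
    }
    return e;
}

// Variance of high-band energy across short frames: high for gusty wind.
double highBandTransience(const std::vector<float> &x) {
    const std::size_t frame = 256;  // ~11 ms at 24 kHz: short, so transients show
    std::vector<double> energies;
    for (std::size_t s = 0; s + frame <= x.size(); s += frame)
        energies.push_back(highBandEnergy(x, s, frame));
    if (energies.empty()) return 0.0;
    double mean = 0.0;
    for (double e : energies) mean += e;
    mean /= energies.size();
    double var = 0.0;
    for (double e : energies) var += (e - mean) * (e - mean);
    return var / energies.size();
}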
If you have data from two microphones, this gets a little easier. Wind, when recorded, is a local phenomenon. Sure, in recordings you can hear the rustling of leaves or the sound of chimes caused by the wind. But that is not wind noise and should not be filtered out.
The actual annoying wind noise you hear in a recording is the air hitting the membrane of your microphone. That effect is a local event, and it can be exploited if you have two microphones: the event is local to each individual mic and is not correlated with the other mic. Of course, where the two mics are placed in relation to each other also matters. They have to be reasonably close to each other (say, within 8 inches).
A time-domain correlation can then be used to determine the presence of wind noise. (All the other recorded sounds are correlated with each other because the mics are fairly close together, so high correlation means no wind and low correlation means wind.) If you go with this approach, your input audio file need not be uncompressed; a reasonable compression algorithm won't affect it.
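A minimal sketch of that two-microphone test; the frame length is whatever you slice your streams into, and the 0.5 threshold is purely illustrative:

#include <cmath>
#include <cstddef>
#include <vector>

// Normalized correlation of two equal-length frames captured simultaneously
// from the two mics (zero-mean signals assumed).
double micCorrelation(const std::vector<float> &a, const std::vector<float> &b) {
    double dot = 0.0, ea = 0.0, eb = 0.0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        dot += a[i] * b[i];
        ea += a[i] * a[i];
        eb += b[i] * b[i];
    }
    return dot / (std::sqrt(ea * eb) + 1e-12);
}

// Acoustic sound is common to both mics; wind buffeting is local to each
// capsule, so low correlation over a frame hints at wind.
bool windLikely(const std::vector<float> &micA, const std::vector<float> &micB) {
    return micCorrelation(micA, micB) < 0.5;   // illustrative threshold
}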
I hope this overview helps.

Algorithm for matching data

I have a project where I am testing a device that is very sensitive to noise (electromagnetic, radio, etc.). The device generates 5-6 bytes per second of binary data (it looks like gibberish to an untrained eye) based on a given input (audio).
Depending on the noise, sometimes the device will miss characters, sometimes it will insert random characters, and sometimes multiples of both.
I have written an app that gives the user the ability to see, on the fly, the errors the device generates (as compared to the master file, i.e. what the device should output in ideal conditions). My algorithm takes each byte in the live data and compares it to the byte at the same position in the known master file. If the bytes don't match, I search for a match nearby, within a window of 10 characters either way from the current position. If one matches (plus a validation or two), I visually mark the location in the UI and register an error.
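Roughly, the comparison loop looks like the following sketch (illustrative only; my real code does a validation or two before accepting a resync):

#include <cstdint>
#include <cstdio>
#include <vector>

void compareStreams(const std::vector<std::uint8_t> &live,
                    const std::vector<std::uint8_t> &master) {
    const long window = 10;   // how far to look for a resync point
    std::size_t m = 0;        // current position in the master file
    int errors = 0;
    for (std::size_t i = 0; i < live.size() && m < master.size(); ++i, ++m) {
        if (live[i] == master[m]) continue;   // bytes agree, move on
        ++errors;                             // mismatch: register an error
        for (long d = -window; d <= window; ++d) {   // look nearby in master
            long cand = long(m) + d;
            if (d != 0 && cand >= 0 && cand < long(master.size()) &&
                live[i] == master[std::size_t(cand)]) {
                m = std::size_t(cand);        // resynchronize and carry on
                break;
            }
        }
    }
    std::printf("%d mismatches\n", errors);
}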
This approach works reasonably well and, given the speed of the incoming data, actually works in real time. However, I feel that what I am doing is not optimal, and that the approach would fall apart if the data streamed at higher rates.
Are there other approaches I could take? Are there known algorithms for this type of thing?
I read many years ago that NASA's data collection outfits (i.e. the ones that communicate with craft in space and on the Moon/Mars) have had only 0.00001% data loss despite tremendous interference in space.
Any ideas?
I presume the main interest is the signal generated by the device? What is more important: detecting when an error has occurred, or making the signal robust against such errors? I have been doing a lot of signal processing lately, and denoising signals is part of my routine; I'm basically trying to estimate the real signal and remove any contaminants.
I don't know how the signal generated by the device is used further, but if it's being recorded to a computer, you can easily apply some denoising. Try wavelet denoising, for instance; you will find packages for this in several languages.
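To give a flavour of what wavelet denoising does, here is a minimal one-level Haar sketch: transform, soft-threshold the detail coefficients, invert. Real packages do multi-level decompositions with better wavelets, and the threshold here is illustrative:

#include <cmath>
#include <cstddef>
#include <vector>

std::vector<double> haarDenoise(const std::vector<double> &x, double thresh) {
    std::size_t n = x.size() & ~std::size_t(1);   // use an even sample count
    std::vector<double> approx(n / 2), detail(n / 2);
    const double s = std::sqrt(0.5);
    for (std::size_t i = 0; i < n / 2; ++i) {     // forward Haar transform
        approx[i] = s * (x[2 * i] + x[2 * i + 1]);
        detail[i] = s * (x[2 * i] - x[2 * i + 1]);
    }
    for (double &d : detail) {                    // soft-threshold the details
        if (d > thresh) d -= thresh;
        else if (d < -thresh) d += thresh;
        else d = 0.0;
    }
    std::vector<double> y(n);
    for (std::size_t i = 0; i < n / 2; ++i) {     // inverse transform
        y[2 * i]     = s * (approx[i] + detail[i]);
        y[2 * i + 1] = s * (approx[i] - detail[i]);
    }
    return y;
}

Small detail coefficients are mostly noise, so shrinking them towards zero while keeping the approximation intact removes contamination without smearing the underlying signal too much.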
