Can you isolate audio data given two mics? - algorithm

So I had an idea for a program, but I really have no idea if it would work, and I didn't really know where to post this question. I used which computer science programming stack exchange sites do I post on in order to determine, that since this is an algorithmic question (can this algorithm be written), I thought this would be the best place.
So the idea is this. Imagine you are sitting at a table with someone else each of you have a mic, as if you're doing a podcast or something. Your voice is being heard by your mic, and the other mic, albeit faintly.
The question is, can you use the distance between the two mics to act as a filter on your voice only?
Your voice is heard by the mic in front of you first, then by the mic further away from you at a very specific interval, since they are stationary. Any audio elsewhere in the room reaches the two mics at different interval signatures, for instance, if at a perpendicular angle the audio would reach the two mics at the same time.
It seems to me that this relationship could be used to isolate audio coming from a certain angle and filter out all the rest. Is this possible? Is it done already?

Related

Viable use of genetic algorithms to train neural nets in a poker bot?

I am designing a bot to play Texas Hold'Em Poker on tables of up to ten players, and the design includes a few feed forward neural networks (FFNN). These neural nets each have 8 to 12 inputs, 2 to 6 outputs, and 1 or 2 hidden layers, so there are a few hundred weights that I have to optimize. My main issue with training through back propagation is getting enough training data. I play poker in my spare time, but not enough to gather data on my own. I have looked into purchasing a few million hands off of a poker site, but I don't think my wallet will be very happy with me if I do... So, I have decided on approaching this by designing a genetic algorithm. I have seen examples of FFNNs being trained to play games like Super Mario and Tetris using genetic algorithms, but never for a game like poker, so I want to know if this is a viable approach to training my bot.
First, let me give a little background information (this may be confusing if you are unfamiliar with poker). I have a system in place that allows the bot to put its opponents on a specific range of hands so that it can make intelligent decisions accordingly, but it relies entirely on accurate output from three different neural networks:
NN_1) This determines how likely it is that an opponent is a) playing the actual value of his hand, b) bluffing, or c) playing a hand with the potential to become stronger later on.
NN_2) This assumes the opponent is playing the actual value of his hand and outputs the likely strength. It represents option (a) from the first neural net.
NN_3) This does the same thing as NN_2 but instead assumes the opponent is bluffing, representing option (b).
Then I have an algorithm for option (c) that does not use a FFNN. The outputs for (a), (b), and (c) are then combined based on the output from NN_1 to update my opponent's range.
Whenever the bot is faced with a decision (i.e. should it fold, call, or raise?), it calculates which is most profitable based on its opponents' hand ranges and how they are likely to respond to different bet sizes. This is where the fourth and final neural net comes in. It takes inputs based on properties unique to each player and the state of the table, and it outputs the likelihood of the opponent folding, calling, or raising.
The bot will also have a value for aggression (how likely it is to raise instead of call) and its opening range (which hands to play pre-flop). These four neural networks and two values will define each generation of bots in my genetic algorithm.
Here is my plan for training:
I will be simulating multiple large tournaments with 10n initial bots each with random values for everything. For the first few dozen tournaments, they will all be placed on tables of 10. They will play until either one bot is left or they play, say, 1,000 hands. If they reach that hand limit, the remaining bots will instantly go all-in every hand until one is left. After each table has completed, the most accurate FFNNs will be placed in the winning bot that will move on to the next round (even if the bot containing the best FFNN was not the winner). The winning bot will retain its aggression and opening range values. The tournament ends when only 100 bots remain, and random variations on those bots will generate the players for the next tournament. I'm assuming the first few tournaments will be complete chaos, so I don't want to narrow down my options too much early on.
If by some miracle, the bots actually develop a profitable, or at least somewhat coherent, strategy (I will check for this periodically), I will begin decreasing the amount of variation between bots. Anyone who plays poker could tell you that there are different types of players each with different strategies. I want to make sure that I am allowing enough room for different strategies to develop throughout this process. Then I may develop some sort of "super bot" that can switch between those different strategies if one is failing.
So, are there any glaring issue with this approach? If so, how would you recommend fixing them? Do you have any advice for speeding up this process or increasing my chances of success? I just want to make sure I'm not about to waste hundreds of hours on something doomed to fail. Also, if this site is not the correct place to be asking this question, please refer me to another website before flagging this. I would really appreciate it. Thanks all!
It will be difficult to use ANN for poker bot. It is better to think for expert system. You can use odds calculator to have numerical evaluation of the hand strength and after that expert system for money management (risk management). ANNs are good in other problems.

Is there an algorithm to combine audio from multiple microphones to improve audio quality?

Consider a situation where you have multiple microphones, each capable of transmitting the audio they pick up over a wifi network (meaning that the audio can be time delayed by several milliseconds or more).
Is there an algorithm that can combine the audio from multiple microphones to produce a higher quality audio recording, detecting and correcting for any time delay?
For the detection/correction of time delay, you might want to look for "feature extraction". It determines key points in the audio to match up.
This works best if all the microphones are hearing (roughly) the same thing, though. For a studio-type environment, where each mic is directional and aimed at a different instrument, it may have a very hard time identifying common features.
I'm unsure of what "higher quality" means to you, though. I assume you mean the least amount of noise. If that's the case, you might be interested in this answer, which is about noise detection. You can calculate the signal/noise ratio of each input and weight them as you see fit when combining.
There are other ways to reduce noise as well. You could simply run one of many noise reduction techniques on each input, or on the mixed output.
If you mean something else by "quality", then you might be headed into tougher areas. There is a reason professional mixers get paid, because computers aren't good at telling what sounds "better".
Of course, there may not be any need to reinvent the wheel at all. There are probably several open-source programs that do this kind of stuff. I'd think the Audacity source would have everything you want.

Artificial Intelligence for card battle based game

I want to make a game card battle based game. In this cards have specific attributes which can increase player's hp/attack/defense or attack an enemy to reduce his hp/attack/defense
I am trying to make an AI for this game. AI has to predict which card it will select on the basis of current situations like AI's hp/attack/defense and Enemy's hp/attack/defense. Since AI cannot see enemy's card hence it cannot predict future moves.
I searched few techniques of AI like minmax but I think minmax will not be suitable since AI cannot predict any future moves.
I am searching for a technique which is very flexible so that i can add a large variety of cards later.
Can you please suggest a technique for such game.
Thanks
This isn't an ActionScript 3 topic per se but I do think it's rather interesting.
First I'd suggest picking up Yu-Gi-Oh's Stardust Accelerator World Championship 2009 for the Nintendo DS or a comparable game.
The game has a fairly advanced computer AI system that not only deals with expected advantage or disadvantage in terms of hit points but also card advantage and combos. If you're taking on a challenge like this, I definately recommend you do the required research (plus, when playing video games is research, who can complain?)
My suggestion for building an AI is as follows:
As the computer decides its move, create an array of Move objects. Then have it create a new Move object for each possible Move that it can see.
For each move object, calculate how much less HP the opponent will have, how many cards they will still have, how many creatures,etc.
Have the computer decide what's most important (more damage, more card advantage) and have it play that move.
More sophisticated AI's will also think several turns in advance and perhaps "see" moves that others do not.
I suggest you look at this game of Reversi I built a few weeks back for fun in Flash. This has a very basic AI implemented, but the basics could be applied to your situation.
Basically, the way that game works is after each move (player or CPU, so I can determine if the player made the right move in comparison to what the CPU would have made), I create a Vector of each possible legal move. I then decide which move provides the highest score change, and set that as best move. However, I also check to see if the move would result in the other player having access to a corner (if you've never played, the player who grabs the corners generally wins). If it does, I tell the CPU to avoid that move and check the second best move and so on. The end result is a CPU who can actually put up a fight.
Keep in mind that this is just a single afternoon of work (for the entire game, from the crappy GUI to the functionality to the AI) so it is very basic and I could do things like run future possible moves through the check sequence as well. Fun fact, though, my moves (which I based the AI on obviously) are the ones the CPU would make nearly 80% of the time. The only time that does not hold true is when I play the game like you would Chess, where your move is made solely to position a move four turns down the line
For your game, you have a ton of variables to consider and not a single point scale as I had. I would suggest listing out each thing and applying a point value to each thing so you can apply importance values to each one. I did something similar for a caching system that automatically determines what is the most important file to keep based on age, usage, size, etc. You then look at each card in the CPU's hand, calculate what each card's value is and play that card (assuming it is legal to do so, of course).
Once you figure that out, you can look into things like what each move could do in the next turn (i.e. "damage" values for each move). And once that is done, you could add functionality to it that would let the CPU make strategic moves that would allow them to use a more powerful card or perform a "finishing" move or however it works in the end.
Again, though, keep it to a simple point based system and keep going from there. You need something you can physically compare, so sticking to a point based system makes that simple.
I apologize for the length of this answer, but I hope it helps in some way.

Detecting wind noise [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I want to develop an app for detecting wind according the audio stream.
I need some expert thoughts here, just to give me guide lines or some links, I know this is not easy task but I am planning to put a lot of effort here.
My plan is to detect some common patterns in the stream, and if the values are close to this common patterns of the wind noise I will notify that match is found, if the values are closer to the known pattern great, I can be sure that the wind is detected, if the values doesn't match with the patterns then I guess there is no so much wind....
That is my plan at first, but I need to learn how this things are done. Is there some open project already doing this ? Or is there someone who is doing research on this topics ?
The reason I write on this forum is because I do not know how to google it, the things I found was not I was looking for. I really do not know how to start developing this kind of algorithm.
EDIT 1 :
I tried to record a wind, and when I open the saved audio file for me it was just a bunch of numbers :). I do not even see in what format should I save this, is wave good enough ? Should I use something else, or what if I convert the wind noise audio file in mp3 : is this gonna help with parsing ?
Well I got many questions, that is because I do not know from where to read more about this kind of topic. I tag my question with guidlines so I hope someone will help me.
There must be something that is detectable, cause the wind noise is so common, there must be somehow to detect this, we need only someone to give me tips, someone who is familiar with this topic.
I just came across this post I have recently made a library which can detect wind noise in recordings.
I made a model of wind noise and created a database of examples and then trained a Machine Learning algorithm to detect and meter the wind level in a perceptually weighted way.
The C++/C code is here if it is of use to anyone!
The science for your problem is called "pattern classification", especially the subfield of "audio pattern classification". The task is abstracted as classifying a sound recording into two classes (wind and not wind). You seem to have no strong background in signal processing yet, so let me insert one central warning:
Pattern classification is not as easy as it looks at first. Humans excel at pattern classification. Computers don't.
A good first approach is often to compute the correlation of the Fourier transform of your signal and a sample. Don't know how much that will depend on wind speed, however.
You might want to have a look at the bag-of-frames approach, it was used successfully to classify ambient noise.
As #thiton mentioned this is an example of audio pattern classification.
Main characteristics for wind: it's a shaped (band/hp filtered) white noise with small semi-random fluctuations in amplitude and pitch. At least that's how most synthesizers reproduce it and it sounds quite convincing.
You have to check the spectral content and change in the wavefile, so you'll need FFT. Input format doesn't really matter, but obviously raw material (wav) is better.
Once you got that you should detect that it's close to some kind of colored noise and then perhaps extract series of pitch and amplitude and try to use classic pattern classification algorithm for that data set. I think supervised learning could work here.
This is actually a hard problem to solve.
Assuming you have only a single microphone data. The raw data you get when you open an audio file (time-domain signal) has some, but not a lot of information for this kind of processing. You need to go into the frequency domain using FFTs and look at the statistics of the the frequency bins and use that to build a classifier using SVM or Random Forests.
With all due respect to #Karoly-Horvath, I would also not use any recordings that has undergone compression, such as mp3. Audio compression algorithms always distorts the higher frequencies, which as it turns out, is an important feature in detecting wind now. If possible, get the raw PCM data from a mic.
You also need to make sure your recording is sampled at at least 24kHz so you have information of the signal up to 12kHz.
Finally - the wind shape in the frequency domain is not a simple filtered white noise. The characteristics is that it usually has high energy in the low frequencies (a rumbling type of sound) with sheering and flapping sounds in the high frequencies. The high frequency energy is quite transient, so if your FFT size is too big, you will miss this important feature.
If you have 2 microphone data, then this gets a little bit easier. Wind, when recorded, is a local phenomenon. Sure, in recordings, you can hear the rustling of leaves or the sound of chimes caused by the wind. But that is not wind-noise and should not be filtered out.
The actual annoying wind noise you hear in a recording is the air hitting the membrane of your microphone. That effect is a local event - and can be exploited if you have 2 microphones. It can be exploited because the event is local to each individual mic and is not correlated with the other mic. Of course, where the 2 mics are placed in relations to each other is also important. They have to be reasonably close to each other (say, within 8 inches).
A time-domain correlation can then be used to determine the presence of wind noise. (All the other recorded sound are correlated with each other because the mics are fairly close to each other, so a high correlation means no wind, low correlation means wind). If you are going with this approach, your input audio file need not be uncompressed. A reasonable compression algorithm won't affect this.
I hope this overview helps.

How to detect the BPM of a song in php [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
How can the tempo/BPM of a song be determined programmatically? What algorithms are commonly used, and what considerations must be made?
This is challenging to explain in a single StackOverflow post. In general, the simplest beat-detection algorithms work by locating peaks in sound energy, which is easy to detect. More sophisticated methods use comb filters and other statistical/waveform methods. For a detailed explication including code samples, check this GameDev article out.
The keywords to search for are "Beat Detection", "Beat Tracking" and "Music Information Retrieval". There is lots of information here: http://www.music-ir.org/
There is a (maybe) annual contest called MIREX where different algorithms are tested on their beat detection performance.
http://nema.lis.illinois.edu/nema_out/mirex2010/results/abt/mck/
That should give you a list of algorithms to test.
A classic algorithm is Beatroot (google it), which is nice and easy to understand. It works like this:
Short-time FFT the music to get a sonogram.
Sum the increases in magnitude over all frequencies for each time step (ignore the decreases). This gives you a 1D time-varying function called the "spectral flux".
Find the peaks using any old peak detection algorithm. These are called "onsets" and correspond to the start of sounds in the music (starts of notes, drum hits, etc).
Construct a histogram of inter-onset-intervals (IOIs). This can be used to find likely tempos.
Initialise a set of "agents" or "hypotheses" for the beat-tracking result. Feed these agents the onsets one at a time in order. Each agent tracks the list of onsets that are also beats, and the current tempo estimate. The agents can either accept the onsets, if they fit closely with their last tracked beat and tempo, ignore them if they are wildly different, or spawn a new agent if they are in-between. Not every beat requires an onset - agents can interpolate.
Each agent is given a score according to how neat its hypothesis is - if all its beat onsets are loud it gets a higher score. If they are all regular it gets a higher score.
The highest scoring agent is the answer.
Downsides to this algorithm in my experience:
The peak-detection is rather ad-hoc and sensitive to threshold parameters and whatnot.
Some music doesn't have obvious onsets on the beats. Obviously it won't work with those.
Difficult to know how to resolve the 60bpm-vs-120bpm issue, especially with live tracking!
Throws away a lot of information by only using a 1D spectral flux. I reckon you can do much better by having a few band-limited spectral fluxes (and maybe one broadband one for drums).
Here is a demo of a live version of this algorithm, showing the spectral flux (black line at the bottom) and onsets (green circles). It's worth considering the fact that the beat is extracted from only the green circles. I've played back the onsets just as clicks, and to be honest I don't think I could hear the beat from them, so in some ways this algorithm is better than people at beat detection. I think the reduction to such a low-dimensional signal is its weak step though.
Annoyingly I did find a very good site with many algorithms and code for beat detection a few years ago. I've totally failed to refind it though.
Edit: Found it!
Here are some great links that should get you started:
http://marsyasweb.appspot.com/
http://www.vamp-plugins.org/download.html
Beat extraction involves the identification of cognitive metric structures in music. Very often these do not correspond to physical sound energy - for example, in most music there is a level of syncopation, which means that the "foot-tapping" beat that we perceive does not correspond to the presence of a physical sound. This means that this is a quite different field to onset detection, which is the detection of the physical sounds, and is performed in a different way.
You could try the Aubio library, which is a plain C library offering both onset and beat extraction tools.
There is also the online Echonest API, although this involves uploading an MP3 to a website and retrieving XML, so might not be so suitable..
EDIT: I came across this last night - a very promising looking C/C++ library, although I haven't used it myself. Vamp Plugins
The general area of research you are interested in is called MUSIC INFORMATION RETRIEVAL
There are many different algorithms that do this but they all are fundamentally centered around ONSET DETECTION.
Onset detection measures the start of an event, the event in this case is a note being played. You can look for changes in the weighted fourier transform (High Frequency Content) you can look for large changes in spectrial content. (Spectrial Difference). (there are a couple of papers that I recommend you look into further down) Once you apply an onset detection algorithm you pick off where the beats are via thresholding.
There are various algorithms that you can use once you've gotten that time localization of the beat. You can turn it into a pulse train (create a signal that is zero for all time and 1 only when your beat happens) then apply a FFT to that and BAM now you have a Frequency of Onsets at the largest peak.
Here are some papers to lead you in the right direction:
https://web.archive.org/web/20120310151026/http://www.elec.qmul.ac.uk/people/juan/Documents/Bello-TSAP-2005.pdf
https://adamhess.github.io/Onset_Detection_Nov302011.pdf
Here is an extension to what some people are discussing:
Someone mentioned looking into applying a machine learning algorithm: Basically collect a bunch of features from the onset detection functions (mentioned above) and combine them with the raw signal in a neural network/logistic regression and learn what makes a beat a beat.
look into Dr Andrew Ng, he has free machine learning lectures from Stanford University online (not the long winded video lectures, there is actually an online distance course)
If you can manage to interface with python code in your project, Echo Nest Remix API is a pretty slick API for python:
There's a method analysis.tempo which will give you the BPM. It can do a whole lot more than simple BPM, as you can see from the API docs or this tutorial
Perform a Fourier transform, and find peaks in the power spectrum. You're looking for peaks below the 20 Hz cutoff for human hearing. I'd guess typically in the 0.1-5ish Hz range to be generous.
SO question that might help: Bpm audio detection Library
Also, here is one of several "peak finding" questions on SO: Peak detection of measured signal
Edit: Not that I do audio processing. It's just a guess based on the fact that you're looking for a frequency domain property of the file...
another edit: It is worth noting that lossy compression formats like mp3, store Fourier domain data rather than time domain data in the first place. With a little cleverness, you can save yourself some heavy computation...but see the thoughtful comment by cobbal.
To repost my answer: The easy way to do it is to have the user tap a button in rhythm with the beat, and count the number of taps divided by the time.
Others have already described some beat-detection methods. I want to add that there are some libraries available that provide techniques and algorithms for this sort of task.
Aubio is one of them, it has a good reputation and it's written in C with a C++ wrapper so you can integrate it easily with a cocoa application (all the audio stuff in Apple's frameworks is also written in C/C++).
There are several methods to get the BPM but the one I find the most effective is the "beat spectrum" (described here).
This algorithm computes a similarity matrix by comparing each short sample of the music with every others. Once the similarity matrix is computed it is possible to get average similarity between every samples pairs {S(T);S(T+1)} for each time interval T: this is the beat spectrum. The first high peak in the beat spectrum is most of the time the beat duration. The best part is you can also do things like music structure or rythm analyses.
I'd imagine this will be easiest in 4-4 dance music, as there should be a single low frequency thud about twice a second.

Resources