Is there an algorithm to combine audio from multiple microphones to improve audio quality?

Consider a situation where you have multiple microphones, each capable of transmitting the audio they pick up over a wifi network (meaning that the audio can be time delayed by several milliseconds or more).
Is there an algorithm that can combine the audio from multiple microphones to produce a higher quality audio recording, detecting and correcting for any time delay?

For the detection/correction of time delay, you might want to look into "feature extraction", which identifies key points in the audio to match up.
This works best if all the microphones are hearing (roughly) the same thing, though. For a studio-type environment, where each mic is directional and aimed at a different instrument, it may have a very hard time identifying common features.
I'm unsure of what "higher quality" means to you, though. I assume you mean the least amount of noise. If that's the case, you might be interested in this answer, which is about noise detection. You can calculate the signal/noise ratio of each input and weight them as you see fit when combining.
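As a minimal sketch of both steps (assuming plain cross-correlation for the delay estimate, which is simpler than full feature extraction, and assuming you already have per-track SNR estimates from somewhere), you could align one track against a reference and mix:

```python
import numpy as np

def align_and_mix(ref, other, snr_ref=1.0, snr_other=1.0):
    """Sketch: cross-correlation delay estimate, then an SNR-weighted mix."""
    corr = np.correlate(other, ref, mode="full")
    delay = int(np.argmax(corr)) - (len(ref) - 1)  # samples 'other' lags 'ref'
    other_aligned = np.roll(other, -delay)         # crude: wraps at the edges
    w = np.array([snr_ref, snr_other], dtype=float)
    w /= w.sum()                                   # weight tracks by estimated SNR
    n = min(len(ref), len(other_aligned))
    return w[0] * ref[:n] + w[1] * other_aligned[:n]
```

For long recordings you would run this per block, since wifi jitter means the delay can drift over time.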
There are other ways to reduce noise as well. You could simply run one of many noise reduction techniques on each input, or on the mixed output.
If you mean something else by "quality", then you might be headed into tougher areas. There is a reason professional mixing engineers get paid: computers aren't good at telling what sounds "better".
Of course, there may not be any need to reinvent the wheel at all. There are probably several open-source programs that do this kind of stuff. I'd think the Audacity source would have everything you want.

Related

Hardware for image processing (object recognition) from video

What I want to do is recognize road signs using an embedded device with a camera. I was thinking about the Raspberry Pi 2B, but I don't know if its power is sufficient. I don't have to analyze every frame of the video, but the more frames per second I can analyze the better, especially at high movement speeds.
Question is: are there any better boards that could be used for a task like this? (It would be best if they could run Linux/Windows 10 themselves, as I am going to use OpenCV.)
For a problem like this you can try to over-analyze it and pick the hardware before solving the problem, but that is basically putting the cart before the horse.
First, take some video.
Second, digitize it or get it onto your daily driver, or whatever your preferred software development computer is.
Then start working on the algorithms to solve whatever problem you want to solve, bearing in mind that you eventually want to embed this, so you may need to lean towards lighter-weight libraries or roll your own rather than heavyweight or operating-system-dependent solutions (feeding it into Photoshop is not a solution, nor is some MATLAB thing). A minimal desktop starting point is sketched below.
You may find that you need better video; that is important information.
Eventually you get close to the algorithm, and THEN (or as you approach the algorithm) you can either prototype it on a Raspberry Pi or BeagleBone board, or use a simulator if man-hours are cheaper for you than hardware. Work out how many operations per second (or per sign, or whatever) you need, then, with some derating, how many operations per second you think you can get on platform X. This is not deterministic, even with experiments, as one changed line of code can completely change the performance, especially if you are on the edge. An instruction set simulator is not going to mimic the pipeline exactly, but you can take an open-source one and modify it to count instructions, or types of instructions (branches vs. non-branches, etc.), and roughly convert that to clocks. Again, this only makes sense if hardware is more expensive than man-hours; at the price of a Raspberry Pi or a BeagleBone Black or White, it is hard not to just buy one and try it.
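For the desktop-prototype step, a hedged OpenCV sketch along these lines pulls frames from a test clip and flags red, sign-sized blobs as candidates. The file name, color ranges, and area gate are placeholder assumptions to tune on real footage, not a working detector:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("dashcam_test.mp4")  # hypothetical test clip
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # red wraps around the hue axis, so combine two ranges
    mask = (cv2.inRange(hsv, (0, 100, 80), (10, 255, 255)) |
            cv2.inRange(hsv, (170, 100, 80), (180, 255, 255)))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 500:  # crude size gate against noise
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("sign candidates", frame)
    if cv2.waitKey(1) == 27:  # Esc quits
        break
cap.release()
```

Timing how long each frame takes here gives you the operations-per-frame numbers the derating step above needs.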
A valid Stack Overflow question would be: I have this video clip and I am trying to detect whether the car has passed a road sign or not, and here is my code (or here is my algorithm) but it doesn't work. Once past that hurdle, another question could be: I have detected that there is a sign in this frame but I cannot tell whether it is a stop sign, a yield sign, or something else; here is my algorithm or code, and here are my results and my expected results. Another valid question would be: I have this algorithm that works, but I am not able to optimize it for platform X; I am within N percent (which needs to be smallish, less than 20% for example) - can this be optimized further?
Hardware for fast object recognition
You can use a Raspberry Pi 4 with Google's Coral USB Accelerator.

Drum sound recognition algorithms

I am thinking of trying to make a program that will automatically generate drum tabs from an audio file containing only the drumming.
I have thought of using an FFT to get the average spectrum peaks during an xxxx-ms interval and then comparing that to a table containing all the drum parts (snare, toms, bass drum and so on) of that specific drum kit and sound gear.
But I have a feeling that it won't be that easy. Do you have any suggestions on which methods I could use to solve my problem?
// Eric
It isn't easy for anything except a trivial signal. Almost all western 'classical' and commercial music features coincident drum sounds.
1: Superposition: The original sources add together in the frequency domain much as they do in the time domain. Each FFT bin contains contributions from all instruments currently being played (and those which are undamped and still decaying, or resonating sympathetically). Unpicking the various sources is hard - and certainly not as simple as a comparison against a library of spectra.
2: The FFT by its definition windows the audio in the time domain and yields the magnitude and phase of the basis function in each bin over that window period. The best you could say is that content appeared in the bin corresponding to a drum sound somewhere within the window period. If you were to compute a 1024-point FFT, the window duration would be about 23 ms at 44.1 kHz. To put this into a musical perspective, 16th notes at 120 bpm are 125 ms apart. You might get away with smaller FFTs.
3: Percussion instrument signals tend to look a lot like noise - at least at the point where the instrument is hit. That is to say, there will be energy spread across the spectrum and no obviously dominant frequencies. After impact, tuned percussion starts to look more 'tonal'.
You probably need to look at a time-domain approach to accurately detect the onset point (onset detection). From there you could look at time or frequency domain characteristics of the signal to try and deduce the instrument in question. There's probably also a lot you could do with a priori knowledge of the genre of music being played, allowing you to predict the patterns that are likely to be present.
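As a hedged illustration of that time-domain onset step (an envelope follower plus a rising-edge threshold; the cutoff, threshold ratio, and debounce gap are guesses to tune per recording, and classifying which drum was hit is a separate problem):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def onset_times(x, sr, env_cutoff=30.0, thresh_ratio=0.3, min_gap=0.05):
    """Time-domain onset sketch: envelope follower + rising-edge threshold."""
    b, a = butter(2, env_cutoff / (sr / 2))       # low-pass for the envelope
    env = filtfilt(b, a, np.abs(x))               # rectify, then smooth
    rise = np.maximum(np.diff(env), 0.0)          # keep only increases in energy
    thresh = thresh_ratio * rise.max()
    hits, last = [], -min_gap
    for i, r in enumerate(rise):
        t = i / sr
        if r > thresh and t - last >= min_gap:    # debounce closely spaced hits
            hits.append(t)
            last = t
    return hits
```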
This is a particular case of the more generalised audio source separation problem. There has been a lot of academic activity in this area, and consequently a lot of published papers describing approaches. Look for "source separation", "music information retrieval", and "audio feature detection".

Detecting wind noise [closed]

I want to develop an app that detects wind from an audio stream.
I need some expert thoughts here, just to give me guidelines or some links. I know this is not an easy task, but I am planning to put a lot of effort into it.
My plan is to identify some common patterns in the stream; if the incoming values are close to these common wind-noise patterns I will report that a match was found. The closer the values are to the known pattern, the more certain I can be that wind is detected; if the values don't match the patterns, then I guess there is not much wind....
That is my plan, but first I need to learn how these things are done. Is there some open project already doing this? Or is there someone doing research on this topic?
The reason I write on this forum is that I do not know how to google for this; the things I found were not what I was looking for. I really do not know how to start developing this kind of algorithm.
EDIT 1:
I tried to record some wind, and when I opened the saved audio file, to me it was just a bunch of numbers :). I do not even know in what format I should save this: is WAV good enough? Should I use something else, and would converting the wind-noise audio file to MP3 help with parsing?
Well, I have many questions, because I do not know where to read more about this kind of topic. I tagged my question with guidelines, so I hope someone will help me.
There must be something detectable: wind noise is so common that there must be some way to detect it. We just need someone familiar with this topic to give me some tips.
I just came across this post. I have recently made a library which can detect wind noise in recordings.
I made a model of wind noise, created a database of examples, and then trained a machine learning algorithm to detect and meter the wind level in a perceptually weighted way.
The C++/C code is here if it is of use to anyone!
The science for your problem is called "pattern classification", especially the subfield of "audio pattern classification". The task is abstracted as classifying a sound recording into two classes (wind and not wind). You seem to have no strong background in signal processing yet, so let me insert one central warning:
Pattern classification is not as easy as it looks at first. Humans excel at pattern classification. Computers don't.
A good first approach is often to compute the correlation of the Fourier transform of your signal with that of a sample. I don't know how much that will depend on wind speed, however.
You might want to have a look at the bag-of-frames approach, it was used successfully to classify ambient noise.
As @thiton mentioned, this is an example of audio pattern classification.
Main characteristics of wind: it is shaped (band-pass/high-pass filtered) white noise with small semi-random fluctuations in amplitude and pitch. At least that's how most synthesizers reproduce it, and it sounds quite convincing.
You have to check the spectral content and how it changes over time in the file, so you'll need an FFT. The input format doesn't really matter, but obviously raw material (WAV) is better.
Once you have that, you should detect that the signal is close to some kind of colored noise, and then perhaps extract series of pitch and amplitude values and run a classic pattern classification algorithm on that data set. I think supervised learning could work here.
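To make the model concrete, here is a rough sketch that synthesizes wind-like noise as described (the band edges and fluctuation rate are assumptions, and the pitch fluctuation is approximated by a fixed band; useful mainly for generating test data):

```python
import numpy as np
from scipy.signal import butter, lfilter

def fake_wind(duration=5.0, sr=16000, seed=0):
    """Shaped white noise with a slow, semi-random amplitude envelope."""
    rng = np.random.default_rng(seed)
    n = int(duration * sr)
    noise = rng.standard_normal(n)
    # slow random amplitude fluctuation, a few control points per second
    env = np.interp(np.arange(n), np.linspace(0, n, 50),
                    0.3 + 0.7 * rng.random(50))
    # shape the noise into the rough band where wind rumble lives
    b, a = butter(2, [60 / (sr / 2), 1200 / (sr / 2)], btype="band")
    return lfilter(b, a, noise) * env
```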
This is actually a hard problem to solve.
Assume you have data from only a single microphone. The raw data you get when you open an audio file (the time-domain signal) has some, but not a lot of, information for this kind of processing. You need to go into the frequency domain using FFTs, look at the statistics of the frequency bins, and use those to build a classifier using an SVM or random forests.
With all due respect to @Karoly-Horvath, I would also not use any recordings that have undergone compression, such as MP3. Audio compression algorithms always distort the higher frequencies, which, as it turns out, are an important feature in detecting wind noise. If possible, get the raw PCM data from a mic.
You also need to make sure your recording is sampled at at least 24 kHz so you have information about the signal up to 12 kHz.
Finally, the wind shape in the frequency domain is not simply filtered white noise. Characteristically, it has high energy in the low frequencies (a rumbling type of sound) with shearing and flapping sounds in the high frequencies. The high-frequency energy is quite transient, so if your FFT size is too big, you will miss this important feature.
If you have data from two microphones, then this gets a little easier. Wind, when recorded, is a local phenomenon. Sure, in recordings you can hear the rustling of leaves or the sound of chimes caused by the wind. But that is not wind noise and should not be filtered out.
The actual annoying wind noise you hear in a recording is the air hitting the membrane of your microphone. That effect is a local event - and can be exploited if you have two microphones, because the event is local to each individual mic and is not correlated with the other mic. Of course, where the two mics are placed in relation to each other is also important; they have to be reasonably close to each other (say, within 8 inches).
A time-domain correlation can then be used to determine the presence of wind noise. (All the other recorded sounds are correlated with each other because the mics are fairly close together, so high correlation means no wind, low correlation means wind.) If you go with this approach, your input audio need not be uncompressed; a reasonable compression algorithm won't affect it.
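A minimal sketch of that correlation test (the frame size and whatever "windy" threshold you apply to the scores are assumptions to tune):

```python
import numpy as np

def wind_scores(mic1, mic2, sr, frame_ms=20):
    """Per-frame normalized correlation between two close-spaced mics."""
    n = int(sr * frame_ms / 1000)
    scores = []
    for i in range(0, min(len(mic1), len(mic2)) - n, n):
        a, b = mic1[i:i + n], mic2[i:i + n]
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b)) + 1e-12
        scores.append(np.dot(a, b) / denom)  # ~1.0 = correlated, ~0 = wind suspect
    return np.array(scores)                  # flag frames below e.g. 0.3 as windy
```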
I hope this overview helps.

Determining the best audio quality

How can you determine the best audio quality in a list of audio files of the same audio clip, without looking at the audio files' headers? The tricky part is that all of the files came from different formats and bit rates, and they were all transcoded to the same format and bit rate. How can this be done efficiently?
Many of the answers outlined here refer to common audio measurements such as THD+N, SNR, etc. However, these do not always correlate well with human hearing of audio artifacts. Lossy audio compression techniques typically degrade measurements like THD+N and SNR, but aim to do so in ways that are difficult for the human ear to detect. A more traditional audio measurement technique may find decreased SNR in a certain frequency band, but does that matter if there's so much energy in adjacent bands that no one would ever notice the difference?
The research paper titled "A Perceptual Audio Quality Measure Based on a Psychoacoustic Sound Representation" outlines an algorithm for quantifying the ability of the human ear to detect audible differences, based on a model of how the ear hears. It takes into account factors that do correlate with audio quality as perceived by humans. The paper includes a study comparing the algorithm's results to subjective double-blind testing, to give you an idea of how well the model works.
I could not find a free copy of this paper but a decent university library should have it on file.
Implementing the algorithm would require some knowledge of audio signal processing in the frequency domain. An undergraduate with DSP experience should be able to implement it. If you don't have the reference waveform, you could use information in this paper to quantify how objectionable artifacts may be.
The algorithm would work on PCM audio, preferably time-aligned, and certainly does not require knowledge of the file type or header.
I'm not a software developer (I'm an audio engineer) and what you hear when you compress with mp3 algorithms is:
- less high-frequency content: so you can check for a loss of energy in the higher range
- distorted stereo: so you can build a Mid/Side matrix and check for THD in the Side channel
- less phase coherency: maybe you can check that with a correlation meter
Hope it helps, it's a difficult task for a computer!
First, I'm not an audio engineer, but I've been trying to keep up with audio compression in general because I have a big MP3 collection, and I have some thoughts to share on the subject.
Is the best audio quality you're looking for from a human perspective? If so, you can't measure it by "objective means" like comparing spectrograms and such.
If a spectrogram is ugly, it doesn't necessarily mean the quality is terrible. What matters is whether someone can distinguish an encoded file from the original source in a blind test. Period. If you want to check the quality of an encoded audio track, you have to conduct a blind ABX test.
LAME (and all other kinds of lossy MP3, AAC, AC3, DTS, ATRAC... compressors) is a so-called perceptual coder. It exploits certain facts about the nature of human audio perception. So, you cannot rely simply on spectrograms to evaluate its quality.
Source
Now, if you are after objective measures, you could use EAQUAL, which stands for Evaluation of Audio Quality:
It's an objective measurement technique used to measure the quality of encoded/decoded audio files (very similar to PEAQ). (...) The results, however, when using objective testing methodologies, are still inconclusive and mostly only used by codec developers and researchers.
...or Friedman statistical analysis tool.
(...) performs several statistical analyses on data sets, which is particularly suited for listening test data.
I'm not saying that spectrum analyzers are useless; that's why I posted some utilities. I'm just saying to be careful with all these statistical methods: as someone in the Hydrogenaudio community once said, you don't listen with your eyes. (Check this thread I posted as well; it's a great resource.) To really prove audio quality from a human perspective, you should test ears and not graphs.
This is a complicated subject, and IMHO I suggest you look to a specialized audio community like Hydrogenaudio.
If I understand correctly, you have a bunch of audio files that started in different formats with varying quality. They've all been converted to the same format, so you can't use the header to figure out which ones were originally high quality and which ones weren't.
This is a hard problem. There are potentially a few tricks that could catch some quality problems, but detecting, say, something that was converted from a low-bitrate compression algorithm like MP3 would be very hard.
Some easy tricks:
Check the maximum amplitude - if it's low, the quality won't be good.
Measure the highest frequency - if it's low, the original might have had a lower sample rate.
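A quick sketch of both tricks, assuming mono PCM samples in a NumPy array (the -60 dB floor and FFT size are arbitrary choices; a transcode from a low bitrate or low sample rate often shows a hard spectral cutoff well below Nyquist):

```python
import numpy as np

def quick_quality_hints(x, sr, n_fft=8192, floor_db=-60.0):
    """Returns (peak amplitude, highest frequency with energy above the floor).

    Assumes len(x) >= n_fft; on real material, average several windows.
    """
    peak = float(np.abs(x).max())                      # trick 1: maximum amplitude
    spec = np.abs(np.fft.rfft(x[:n_fft] * np.hanning(n_fft)))
    spec_db = 20 * np.log10(spec / (spec.max() + 1e-12) + 1e-12)
    freqs = np.fft.rfftfreq(n_fft, 1 / sr)
    above = freqs[spec_db > floor_db]
    return peak, float(above.max()) if len(above) else 0.0  # trick 2
```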
If you have the original, you can estimate how a file was altered by estimating a transfer function. You will need to assume some model: maybe start with a low-pass filter, add some smudging (convolution), and then run an estimator to produce a measure of quality. You could look at the Wikipedia article on Estimation_theory.
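If you do have the original, a minimal version of that estimate could use Welch auto- and cross-spectra (a sketch, not a full quality metric; a strong high-frequency roll-off in |H| would hint at a lossy intermediate step):

```python
import numpy as np
from scipy.signal import csd, welch

def estimate_transfer(original, degraded, sr, nperseg=2048):
    """Estimate |H(f)| = |Pxy(f)| / Pxx(f) between original and degraded audio."""
    f, pxy = csd(original, degraded, fs=sr, nperseg=nperseg)
    _, pxx = welch(original, fs=sr, nperseg=nperseg)
    return f, np.abs(pxy) / (pxx + 1e-15)
```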
I think that disown's answer is good, assuming that you are just trying to estimate a set of parameters. Unfortunately, you also have to define a comparison function for the parameters you have estimated.
What happens if two compressions have both applied a band-pass filter with equally large frequency ranges, but one of them admits higher frequencies than the other? Is one of them better? Which one?
The answer probably depends on which frequencies are being used more in the files you are working with.
An objective measure would be to see which file has lost less entropy. Unfortunately, this is not easy to do correctly.
I'm not too sure about this, but here's a good place to start:
http://en.wikipedia.org/wiki/Signal-to-noise_ratio
I don't think you can calculate SNR from one signal, but if you have a collection of signals then you might be able to work out the SNR comparing all of them.
There are some interesting links at the bottom of the page which could provide some routes of interest as well if that isn't possible.
Also, I'm not an audio engineer, but I know a little about signal processing, is there any way you can measure quantisation levels in audio signals? Perhaps something to look into.
If you do not have the original audio, this is probably a lot of work; it's almost certainly fundamentally impossible in an absolute sense since you can't tell which track's peculiarities are intentional and which bogus. You may even have encodings from different recordings or mixes, in which case plain comparison is fairly meaningless in any case.
Thus, assuming you do not have the original, the best you can probably do is a heuristic approach - which will probably work quite well, but be a lot of effort to implement.
Invest in some audio-processing software and skill; use this to build software that identifies common encoder defects heuristically, based solely on the output. Such defects might be poor temporal locality of sound hits (suggesting overlarge windows in the compression), high correlation between the left and right signals, limited frequency range, etc. (a person with the right experience can probably list dozens).
Rate the quality of the audio on each heuristic on some sliding scale.
Use common sense, and as much time and as many people for testing as you have, to weigh the various factors for relevance. For example, while it might be nice to have frequency reproduction up to 24 kHz, it's not very important; on the other hand, lack of sharpness may be more annoying.
If you're lucky, someone's done the job before you, because this sounds like an expensive proposition.
A New Perceptual Quality Measure for Bit Rate Reduced Audio
http://citeseer.ist.psu.edu/cache/papers/cs/15888/http:zSzzSzwww-ft.ee.tu-berlin.dezSzPublikationenzSzpaperszSzAES1996Copenhagen.pdf/a-new-perceptual-quality.pdf
Perceptual audio coding algorithms perform a drastic irrelevancy reduction in order to achieve a high coding gain. Signal components that are assumed to be unperceivable are not transmitted, and the coding noise is spectrally shaped according to the masking threshold of the audio signal. Simple quality measures (e.g. signal to noise ratio, harmonic distortions), which can not separate these inaudible artefacts from audible errors, can not be used to assess the performance of such coders.
For the quality evaluation of perceptual audio codecs, appropriate measurement algorithms are needed, which detect and assess audible artefacts by comparing the output of the codec with the uncoded reference. A filter bank based perceptual model is presented, which yields better temporal resolution than FFT-based approaches and thus allows a more precise modelling of pre- and post-masking and a refined analysis of the envelopes within each filter channel.
See Also
http://academic.research.microsoft.com/Paper/201987.aspx?viewType=1

How to detect the BPM of a song in php [closed]

How can the tempo/BPM of a song be determined programmatically? What algorithms are commonly used, and what considerations must be made?
This is challenging to explain in a single StackOverflow post. In general, the simplest beat-detection algorithms work by locating peaks in sound energy, which is easy to detect. More sophisticated methods use comb filters and other statistical/waveform methods. For a detailed explication including code samples, check this GameDev article out.
The keywords to search for are "Beat Detection", "Beat Tracking" and "Music Information Retrieval". There is lots of information here: http://www.music-ir.org/
There is a (maybe) annual contest called MIREX where different algorithms are tested on their beat detection performance.
http://nema.lis.illinois.edu/nema_out/mirex2010/results/abt/mck/
That should give you a list of algorithms to test.
A classic algorithm is Beatroot (google it), which is nice and easy to understand. It works like this:
Short-time FFT the music to get a sonogram.
Sum the increases in magnitude over all frequencies for each time step (ignore the decreases). This gives you a 1D time-varying function called the "spectral flux".
Find the peaks using any old peak detection algorithm. These are called "onsets" and correspond to the start of sounds in the music (starts of notes, drum hits, etc).
Construct a histogram of inter-onset-intervals (IOIs). This can be used to find likely tempos.
Initialise a set of "agents" or "hypotheses" for the beat-tracking result. Feed these agents the onsets one at a time in order. Each agent tracks the list of onsets that are also beats, and the current tempo estimate. The agents can either accept the onsets, if they fit closely with their last tracked beat and tempo, ignore them if they are wildly different, or spawn a new agent if they are in-between. Not every beat requires an onset - agents can interpolate.
Each agent is given a score according to how neat its hypothesis is - if all its beat onsets are loud it gets a higher score. If they are all regular it gets a higher score.
The highest scoring agent is the answer.
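A compact sketch of steps 1-4 (the window sizes and the ad-hoc peak-picking threshold are assumptions; as the downsides below note, the peak detection is the fragile part):

```python
import numpy as np

def spectral_flux_onsets(x, sr, n_fft=1024, hop=512):
    """Steps 1-3: short-time FFT -> spectral flux -> crude peak picking."""
    win = np.hanning(n_fft)
    frames = range(0, len(x) - n_fft, hop)
    mags = np.array([np.abs(np.fft.rfft(win * x[i:i + n_fft])) for i in frames])
    # sum only the increases in magnitude between consecutive frames
    flux = np.maximum(np.diff(mags, axis=0), 0.0).sum(axis=1)
    # a frame is an onset if it beats its neighbours and a moving threshold
    thresh = 1.5 * np.convolve(flux, np.ones(16) / 16, mode="same")
    onsets = [i for i in range(1, len(flux) - 1)
              if flux[i] > flux[i - 1] and flux[i] > flux[i + 1]
              and flux[i] > thresh[i]]
    return np.array(onsets) * hop / sr            # onset times in seconds

# Step 4: a histogram of inter-onset intervals suggests candidate tempos.
# iois = np.diff(spectral_flux_onsets(x, sr))
# hist, edges = np.histogram(iois, bins=np.arange(0.1, 1.0, 0.02))
# candidate_bpm = 60.0 / edges[np.argmax(hist)]
```

The agent-based tracking of steps 5-7 then runs on top of these onset times.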
Downsides to this algorithm in my experience:
The peak-detection is rather ad-hoc and sensitive to threshold parameters and whatnot.
Some music doesn't have obvious onsets on the beats. Obviously it won't work with those.
Difficult to know how to resolve the 60bpm-vs-120bpm issue, especially with live tracking!
Throws away a lot of information by only using a 1D spectral flux. I reckon you can do much better by having a few band-limited spectral fluxes (and maybe one broadband one for drums).
Here is a demo of a live version of this algorithm, showing the spectral flux (black line at the bottom) and onsets (green circles). It's worth considering the fact that the beat is extracted from only the green circles. I've played back the onsets just as clicks, and to be honest I don't think I could hear the beat from them, so in some ways this algorithm is better than people at beat detection. I think the reduction to such a low-dimensional signal is its weak step though.
Annoyingly I did find a very good site with many algorithms and code for beat detection a few years ago. I've totally failed to refind it though.
Edit: Found it!
Here are some great links that should get you started:
http://marsyasweb.appspot.com/
http://www.vamp-plugins.org/download.html
Beat extraction involves the identification of cognitive metric structures in music. Very often these do not correspond to physical sound energy - for example, in most music there is a level of syncopation, which means that the "foot-tapping" beat that we perceive does not correspond to the presence of a physical sound. This means that this is a quite different field to onset detection, which is the detection of the physical sounds, and is performed in a different way.
You could try the Aubio library, which is a plain C library offering both onset and beat extraction tools.
There is also the online Echonest API, although this involves uploading an MP3 to a website and retrieving XML, so it might not be so suitable.
EDIT: I came across this last night - a very promising looking C/C++ library, although I haven't used it myself. Vamp Plugins
The general area of research you are interested in is called MUSIC INFORMATION RETRIEVAL.
There are many different algorithms that do this but they all are fundamentally centered around ONSET DETECTION.
Onset detection measures the start of an event; the event in this case is a note being played. You can look for changes in the weighted Fourier transform (High Frequency Content), or you can look for large changes in spectral content (Spectral Difference); there are a couple of papers further down that I recommend. Once you apply an onset detection algorithm, you pick off where the beats are via thresholding.
There are various algorithms that you can use once you've got that time localization of the beats. You can turn it into a pulse train (create a signal that is zero for all time and 1 only when a beat happens), then apply an FFT to that, and BAM - now you have the frequency of onsets at the largest peak.
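A sketch of that pulse-train trick (the 100 Hz pulse grid and the 30-240 bpm search band are assumptions):

```python
import numpy as np

def tempo_from_onsets(onset_times, duration, fs=100.0):
    """Place a 1 at each onset on a coarse time grid, FFT, take the peak."""
    n = int(duration * fs)
    pulses = np.zeros(n)
    idx = (np.asarray(onset_times) * fs).astype(int)
    pulses[np.clip(idx, 0, n - 1)] = 1.0
    spectrum = np.abs(np.fft.rfft(pulses))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band = (freqs >= 0.5) & (freqs <= 4.0)   # 30-240 bpm is musically plausible
    return 60.0 * freqs[band][np.argmax(spectrum[band])]
```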
Here are some papers to lead you in the right direction:
https://web.archive.org/web/20120310151026/http://www.elec.qmul.ac.uk/people/juan/Documents/Bello-TSAP-2005.pdf
https://adamhess.github.io/Onset_Detection_Nov302011.pdf
Here is an extension to what some people are discussing:
Someone mentioned applying a machine learning algorithm: basically, collect a bunch of features from the onset detection functions (mentioned above), combine them with the raw signal in a neural network/logistic regression, and learn what makes a beat a beat.
Look into Dr. Andrew Ng; he has free machine learning lectures from Stanford University online (not the long-winded video lectures - there is actually an online distance course).
If you can manage to interface with python code in your project, Echo Nest Remix API is a pretty slick API for python:
There's a method analysis.tempo which will give you the BPM. It can do a whole lot more than simple BPM, as you can see from the API docs or this tutorial
Perform a Fourier transform, and find peaks in the power spectrum. You're looking for peaks below the 20 Hz cutoff for human hearing. I'd guess typically in the 0.1-5ish Hz range to be generous.
SO question that might help: Bpm audio detection Library
Also, here is one of several "peak finding" questions on SO: Peak detection of measured signal
Edit: Not that I do audio processing. It's just a guess based on the fact that you're looking for a frequency domain property of the file...
Another edit: It is worth noting that lossy compression formats like MP3 store Fourier-domain data rather than time-domain data in the first place. With a little cleverness, you can save yourself some heavy computation... but see the thoughtful comment by cobbal.
To repost my answer: The easy way to do it is to have the user tap a button in rhythm with the beat, and count the number of taps divided by the time.
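That really is as simple as it sounds; a sketch of the counting, with the button wiring left out (the handler name is made up):

```python
import time

taps = []

def on_tap():
    """Call from the button handler; the estimate settles after a few taps."""
    taps.append(time.monotonic())
    if len(taps) >= 2:
        span = taps[-1] - taps[0]
        print("BPM: %.1f" % (60.0 * (len(taps) - 1) / span))
```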
Others have already described some beat-detection methods. I want to add that there are some libraries available that provide techniques and algorithms for this sort of task.
Aubio is one of them; it has a good reputation and it's written in C with a C++ wrapper, so you can integrate it easily with a Cocoa application (all the audio stuff in Apple's frameworks is also written in C/C++).
There are several methods to get the BPM but the one I find the most effective is the "beat spectrum" (described here).
This algorithm computes a similarity matrix by comparing each short sample of the music with every other one. Once the similarity matrix is computed, you can take the average similarity between all pairs of samples separated by a given time interval T: this is the beat spectrum. The first high peak in the beat spectrum usually corresponds to the beat duration. The best part is that you can also do things like music structure or rhythm analysis.
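A sketch of that computation as diagonal averages of the similarity matrix (using per-frame spectra as features and cosine similarity is an assumption on my part; the paper linked above has the details):

```python
import numpy as np

def beat_spectrum(features):
    """features: (n_frames, n_dims) array, e.g. per-frame magnitude spectra."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-9)
    sim = f @ f.T                                   # cosine similarity matrix
    n = len(sim)
    # average similarity along each diagonal = similarity at lag T
    return np.array([np.diag(sim, k).mean() for k in range(1, n // 2)])

# The first strong peak of beat_spectrum(...) gives the beat period in frames.
```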
I'd imagine this will be easiest with 4/4 dance music, as there should be a single low-frequency thud about twice a second.

Resources