MFCC Vector Quantization for Speaker Verification with Hidden Markov Models

I am currently doing a project on speaker verification using Hidden Markov Models. I chose MFCC for my feature extraction, and I also intend to apply VQ to it. I have implemented an HMM and tested it on Eisner's spreadsheet data found here: http://www.cs.jhu.edu/~jason/papers/ and got correct results.
Using voice signals, I seem to have missed something, since I was not getting correct acceptance (I did the probability estimation using the forward algorithm, with no scaling applied). I was wondering what I could have done wrong. I used scikits.talkbox's MFCC function for feature extraction and SciPy's cluster module for vector quantization. Here is what I have written:
import random
from scikits.talkbox.features import mfcc
from scikits.audiolab import wavread
from scipy.cluster.vq import vq, kmeans, whiten

(data, fs) = wavread(file_name)[:2]
mfcc_features = mfcc(data, fs=fs)[0]

# Vector quantization
# collected_feats is a list of spectral vectors taken together from 3 voice samples
random.seed(0)
no_clusters = 64
collected_feats = whiten(collected_feats)
codebook = kmeans(collected_feats, no_clusters)[0]
feature = vq(mfcc_features, codebook)[0]   # vq returns (codes, distortion); keep the codeword indices
# feature is then used as the observation sequence for the hidden Markov model
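For reference, since the likelihood is estimated with the forward algorithm without scaling, here is a minimal log-domain forward sketch (log_pi, log_A and log_B are hypothetical placeholders for a trained model); working in log space avoids the numerical underflow that unscaled forward probabilities tend to hit on longer observation sequences:
import numpy as np
from scipy.special import logsumexp
def log_forward(obs, log_pi, log_A, log_B):
    # obs: sequence of VQ codeword indices; log_pi: (N,) initial log-probabilities;
    # log_A: (N, N) log transition matrix; log_B: (N, M) log emission matrix over codewords
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return logsumexp(alpha)   # log P(obs | model)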
I assumed that the default parameters for scikits' mfcc function are already suitable for speaker verification. The audio files have sampling rates of 8000 Hz and 22050 Hz. Is there something I am lacking here? I chose a codebook of 64 clusters for VQ. Each sample is an isolated word, at least 1 second in duration. I haven't found a Python function yet to remove the silences in the voice samples, so I use Audacity to manually truncate the silent parts. Any help would be appreciated. Thanks!
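(For reference, in case it helps with the silence-removal step: a minimal energy-threshold sketch in plain NumPy; the frame length and threshold are arbitrary assumptions that would need tuning.)
import numpy as np
def trim_silence(signal, frame_len=256, threshold=0.02):
    # Drop leading/trailing frames whose RMS energy falls below a fraction of the peak RMS
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len).astype(float)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    voiced = np.nonzero(rms > threshold * rms.max())[0]
    if len(voiced) == 0:
        return signal
    return signal[voiced[0] * frame_len:(voiced[-1] + 1) * frame_len]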

Well, I am not sure about the HMM approach, but I would recommend using a GMM. ALIZE is a great library for doing that. For silence removal, use the LIUM library. The process is called speaker diarization: the program detects where the speaker is speaking and gives the time stamps.
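If you want to prototype the GMM idea in Python before committing to ALIZE, a minimal sketch with scikit-learn is below; the variable names (enrol_feats, background_feats, test_feats) and the component count are placeholders, not ALIZE's API: a speaker model is scored against a background model trained on other speakers' data.
from sklearn.mixture import GaussianMixture
# enrol_feats / background_feats / test_feats: MFCC frames as (n_frames, n_coeffs) arrays
speaker_gmm = GaussianMixture(n_components=32, covariance_type='diag').fit(enrol_feats)
background_gmm = GaussianMixture(n_components=32, covariance_type='diag').fit(background_feats)
# Average log-likelihood ratio; accept the claimed speaker if it exceeds a tuned threshold
score = speaker_gmm.score(test_feats) - background_gmm.score(test_feats)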

Related

Seeding F# random generator to same state as Matlab

In trying to port over some Matlab code to F#, I'm trying to make sure the translations are accurate. As of now, there are cases where I'm not completely sure whether there are mistakes. Since a lot of the code is statistical in nature, it would be convenient to be able to seed the F# generators to the same state as Matlab's. It would also help with triangulating the exact equations that are wrong. Wanted to ask before I started dumping Matlab generated random numbers to csv files and solving this issue in a manual way.
This is not a definitive answer, as implementing your own random number generator in both MATLAB and F# would probably yield the most reliable results. You are also bound to bump into issues of thread safety in .NET and the shapes of matrices in MATLAB. For example:
In matlab:
rng(200,'twister')
rand(1,5)
ans =
0.9476 0.2265 0.5944 0.4283 0.7641
In F#:
open MathNet.Numerics.Random
let random1b = MersenneTwister(200)
random1b.NextDoubles(5)
val it : float [] = [|0.9476322592; 0.4941436297; 0.2265474238;
0.1485590497; 0.5944201448|]
The 1st, 3rd, and 5th random numbers do match.
Now it's possible you can replicate this somehow by playing around with different versions and/or F# and matlab array dimensions.
The MathNet Random Docs.

Generate Random Numbers non-algorithmically

I am looking for a satisfying solution of how to generate a random number.
I looked at this, this, this and this.
But I am looking for something else.
Most of the posts mention using a pseudo-random recurrence such as R[n+1] = (a*R[n] + b) % m, or some other mathematical function.
But, oddly, I am not looking for these; I want a non-algorithmic answer. Precisely, an "interview" answer: something easy to understand, that doesn't make the interviewer feel I just memorized a method :).
For an interview question, a common answer might be to look at the intervals between keystrokes (ask the user to type something), disc seek times or input from a disconnected source -- that will give you thermal electrons from inside your MIC socket or whatever.
LavaRnd uses a digital camera with the lens cap on, which is a version of the last.
Some operating systems allow indirect access to some of this random input, usually through a secure random function; it is slower but more secure than the usual RNG.
Depending on what job the interview is for, you can talk about testing the raw data to check for entropy, and concentrating the entropy by using a cryptographic hash function like SHA-256.
There are also specialised, and expensive, hardware cards which use various quantum effects to generate true random numbers.
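As a small illustration of concentrating raw entropy with a hash, here is a toy sketch that times keystrokes (Enter presses via input()) and feeds the intervals through SHA-256; the timings carry limited entropy, so this is only for illustration:
import hashlib, time
stamps = []
for _ in range(5):
    input("press Enter: ")                # inter-keystroke timing as the raw entropy source
    stamps.append(time.perf_counter_ns())
raw = b"".join(t.to_bytes(8, "little") for t in stamps)
digest = hashlib.sha256(raw).digest()     # concentrate the entropy
print(int.from_bytes(digest[:8], "little"))   # use 64 bits of it as a random number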
Take the system time, add a seed, and take the result modulo the upper limit. If the upper limit is less than 0, multiply it by -1 and then subtract the max from the result... not very strong, but it meets your requirement?
If you have a UI and only need a couple of random numbers, you can ask the user to move the mouse around, enter a few seeds, or enter a few words, and use those as seeds.

How to implement part of histogram equalization in MATLAB without using for loops and without hurting speed and performance

Suppose that I have these three variables in MATLAB: NewGrayLevels, OldGrayLevels, and OldHistogram.
I want to extract the distinct values in NewGrayLevels and sum the rows of OldHistogram that lie in the same rows as each distinct value.
For example, you can see in NewGrayLevels that the first six rows are equal to zero. It means that 0 in NewGrayLevels has taken its value from (0 1 2 3 4 5) of OldGrayLevels. So the corresponding rows in OldHistogram should be summed.
So 0+2+12+38+113+163=328 would be the frequency of the gray level 0 in the equalized histogram and so on.
Those who are familiar with image processing know that it's part of the histogram equalization algorithm.
Note that I don't want to use built-in function "histeq" available in image processing toolbox and I want to implement it myself.
I know how to write the algorithm with for loops; I'm looking for a faster way that avoids them.
The code using for loops:
for k=0:255
    Condition = NewGrayLevels==k;
    ConditionMultiplied = Condition.*OldHistogram;
    NewHistogram(k+1,1) = sum(ConditionMultiplied);
end
I'm afraid this code will get slow for high-resolution images, because the variables I have uploaded are for a small image downloaded from the internet, but my code may be used for satellite images.
I know you say you don't want to use histeq, but it might be worth your time to look at the MATLAB source file to see how the developers wrote it and copy the parts of their code that you would like to implement. Just do edit('histeq') or edit('histeq.m'), I forget which.
Usually the MATLAB code is vectorized where possible and runs pretty quick. This could save you from having to reinvent the entire wheel, just the parts you want to change.
I can't think of a way to implement this without a for loop somewhere, but one optimisation you could make would be to use indexing instead of multiplication:
for k=0:255
    Condition = NewGrayLevels==k; % These act as logical indices into OldHistogram
    NewHistogram(k+1,1) = sum(OldHistogram(Condition)); % Removes a vector multiplication, some additions, and an index-to-double conversion
end
Edit:
On rereading your initial post, I think that the way to do this without a for loop is to use accumarray (I find this a difficult function to understand, so read the documentation and search online and on here for examples to do so):
NewHistogram = accumarray(1+NewGrayLevels,OldHistogram);
This should work so long as your maximum value in NewGrayLevels (+1 because you are starting at zero) is equal to the length of OldHistogram.
Well, I understood that there's no need to write the code that @Hugh Nolan suggested. See the explanation here:
%The green lines are because after writing the code, I understood that
%there's no need to calculate the equalized histogram in
%"HistogramEqualization" function and after gaining the equalized image
%matrix you can pass it to the "ExtractHistogram" function
% (which there's no loops in it) to acquire the
%equalized histogram.
%But I didn't delete those lines of code because I had tried a lot to
%understand the algorithm and write them.
For more information and studying the code, please see my next question.

When to stop the looping in random number generators?

I'm not sure StackOverflow is the right place to ask this question, because this question is half-programming and half-mathematics. And also really sorry if my question is stupid ^_^
I'm studying Monte Carlo simulations via the "Monte Carlo Methods" book. One of the first things I must learn about is the random number generator (RNG). The basic algorithm of an RNG is:
1. Initialize: draw the seed S_0 from the distribution µ on S. Set t = 1.
2. Transition: set S_t = f(S_{t-1}).
3. Output: set U_t = g(S_t).
4. Repeat: set t = t + 1 and return to Step 2.
(µ is a probability distribution on the finite set of states S, the input is S_0, and the random number we desire is the output U_t.)
It is not hard to understand, but the problem here is that I don't see the random factor, which should lie in the number of repeats. How can we decide when to stop the loop of the RNG? All the examples I read which implement an RNG loop for 100 times, and they return the same values for a specific seed. It is not random at all >_<
Can someone explain what I'm missing here? Any help will be appreciated. Thanks everyone
You can't get a true sequence of random numbers on a computer, without specialized hardware. (Such specialized hardware performs the equivalent of an initial roll of the dice using physics to provide the randomness. Electronic ones often use the electronic noise of specialized diodes at constant temperatures; others use radioactive decay events.)
Without that specialized hardware, what you can generate are pseudorandom numbers which, as you've observed, always generate the same sequence of numbers for the same initial seed. For simple applications, you can often get away with generating an initial seed from the time of invocation, which is effectively random.
And when I say "simple applications," I am excluding cryptography. (Not just that, but especially that.)
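As a concrete illustration of both points (same seed gives the same sequence; a clock-based seed gives a different one), here is a minimal linear congruential sketch with arbitrary constants:
import time
def lcg(seed, n, a=1103515245, c=12345, m=2**31):
    s, out = seed, []
    for _ in range(n):
        s = (a * s + c) % m      # transition: S_t = f(S_{t-1})
        out.append(s / m)        # output: U_t = g(S_t)
    return out
print(lcg(1, 3))                 # same seed -> same "random" numbers every run
print(lcg(1, 3))
print(lcg(time.time_ns(), 3))    # seeding from the clock gives a different stream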
Sometimes when you are trying to debug a simulation, you actually want to have a reproducible stream of "random" numbers, so you might specifically set a stream to start with a specific seed.
For instance, in the answer to "Creating a facet_wrap plot with ggplot2 with different annotations in each plot", rcs starts by creating a reproducible set of data using the R code
set.seed(1)
df <- data.frame(x=rnorm(300), y=rnorm(300), cl=gl(3,100)) # create test data
before going on to demonstrate how to answer the actual question.

Time delay estimation between two audio signals

I have two audio recordings of the same signal made by 2 different microphones (for example, in WAV format), but one of them is recorded with a delay of, for example, several seconds.
It's easy to identify such a delay visually when viewing these signals in some kind of waveform viewer - i.e. just spotting the first visible peak in each signal and checking that they have the same shape:
(waveform screenshot omitted; source: greycat.ru)
But how do I do it programmatically - find out what this delay (t) is? The two digitized signals are slightly different (because the microphones are different, were at different positions, due to ADC setups, etc.).
I've dug around a bit and found out that this problem is usually called "time-delay estimation" and that there are myriad approaches to it - for example, this one.
But are there any simple and ready-made solutions, such as command-line utility, library or straight-forward algorithm available?
Conclusion: I've found no simple implementation and done a simple command-line utility myself - available at https://bitbucket.org/GreyCat/calc-sound-delay (GPLv3-licensed). It implements a very simple search-for-maximum algorithm described at Wikipedia.
The technique you're looking for is called cross correlation. It's a very simple, if somewhat compute intensive technique which can be used for solving various problems, including measuring the time difference (aka lag) between two similar signals (the signals do not need to be identical).
If you have a reasonable idea of your lag value (or at least the range of lag values that are expected) then you can reduce the total amount of computation considerably. Ditto if you can put a definite limit on how much accuracy you need.
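For a quick experiment, here is a minimal cross-correlation sketch with SciPy; the file names are placeholders, and the snippet assumes mono recordings at the same sample rate:
import numpy as np
from scipy.io import wavfile
from scipy.signal import correlate
fs, a = wavfile.read("mic_a.wav")        # placeholder file names
_,  b = wavfile.read("mic_b.wav")
corr = correlate(a.astype(float), b.astype(float), mode="full")
lag = np.argmax(corr) - (len(b) - 1)     # sample offset of the correlation peak
delay_seconds = lag / fs                 # positive: a lags behind b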
Having had the same problem, and not having found a tool to sync the start of video/audio recordings automatically, I decided to make syncstart (github).
It is a command line tool. The basic code behind it is this:
import numpy as np
from scipy import fft
from scipy.io import wavfile
r1,s1 = wavfile.read(in1)   # in1, in2: paths of the two input wav files
r2,s2 = wavfile.read(in2)
assert r1==r2, "syncstart normalizes using ffmpeg"
fs = r1                     # common sample rate
ls1 = len(s1)
ls2 = len(s2)
padsize = ls1+ls2+1
padsize = 2**(int(np.log(padsize)/np.log(2))+1)
s1pad = np.zeros(padsize)
s1pad[:ls1] = s1
s2pad = np.zeros(padsize)
s2pad[:ls2] = s2
corr = fft.ifft(fft.fft(s1pad)*np.conj(fft.fft(s2pad)))   # cross-correlation via FFT
ca = np.absolute(corr)
xmax = np.argmax(ca)                                      # position of the correlation peak
if xmax > padsize // 2:                                   # peak wrapped into the negative-lag half
    file,offset = in2,(padsize-xmax)/fs
else:
    file,offset = in1,xmax/fs
A very straightforward thing to do is just to check whether the peaks exceed some threshold; the time between the high peak on line A and the high peak on line B is probably your delay. Just try tinkering a bit with the thresholds, and if the graphs are usually as clear as the picture you posted, you should be fine.
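If the recordings really are that clean, a threshold-crossing sketch like the following may already be enough; the 0.5 ratio is an arbitrary assumption:
import numpy as np
def first_crossing(signal, fs, ratio=0.5):
    # time (in seconds) of the first sample whose magnitude reaches a fraction of the peak
    threshold = ratio * np.max(np.abs(signal))
    return np.argmax(np.abs(signal) >= threshold) / fs
# delay estimate: difference of the first-crossing times of the two recordings
# delay = first_crossing(b, fs) - first_crossing(a, fs)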
