I am trying to shift the pitch of a file by 20 Hz, but when I do this in Praat and then get the mean pitch, I never see a 20 Hz difference, just something close to it.
For example, I have a 0.85 s file with "108.07459844192924 Hz (mean pitch in SELECTION)"; if I go to manipulation, get the pitch tier and shift it by 20 Hz, the result is a file with 126.12524578822578 Hz (mean pitch in SELECTION).
I have already tried changing the time step and the minimum and maximum pitch when creating the Manipulation object; that doesn't seem to be the problem.
This is my script (I have tested doing it manually and get the same result):
Note: The array dur_files[] has 10 files with different lengths
for i from 1 to 10
for j from 0 to 10
selectObject: dur_files[i]
durat_mod = Get end time
manip = To Manipulation: 0.005, 10, 1000
selectObject: manip
pitch_tier = Extract pitch tier
selectObject: pitch_tier
Shift frequencies: 0, durat_mod, 3*(j-5), "Hertz"
plusObject: manip
Replace pitch tier
removeObject: pitch_tier
selectObject: manip
resynth = Get resynthesis (overlap-add)
removeObject: manip
selectObject: resynth
Rename: selected$("Sound") + "_pitch-" + string$(j - 5)
lib_files[i,j] = selected()
lib_files_name$[i,j] = selected$()
endfor
endfor
OK, after almost a year I can answer my own question:
Finding the pitch (F0) of a speech recording is not a straightforward thing; pitch only makes sense in the voiced parts of the recording (e.g. the vowels).
Praat does in fact shift the pitch as intended; it is the mean pitch measurement that can carry an error.
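To see why the measured mean can move by something other than the shift, here is a hypothetical Python sketch (toy numbers, not real Praat output): the mean pitch is averaged only over frames judged voiced, so if the voicing decisions differ between the original and the shifted analysis, the two means differ by something other than the applied shift.

```python
# Toy pitch tracks in Hz; None marks frames judged unvoiced.
# The shifted track is the original + 20 Hz, but one boundary frame is
# (hypothetically) classified as voiced only in the second analysis.
original = [None, 105.0, 108.0, 110.0, None, None]
shifted = [None, 125.0, 128.0, 130.0, 98.0, None]

def mean_pitch(track):
    """Mean over voiced frames only, as a pitch listing would report it."""
    voiced = [f for f in track if f is not None]
    return sum(voiced) / len(voiced)

diff = mean_pitch(shifted) - mean_pitch(original)
print(diff)  # ~12.58 Hz, not 20: the extra voiced frame pulls the mean down
```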
I have a figure with Bit Error Rate on the x axis and Data Rate on the y axis, and I want to find the minimum Bit Error Rate and the maximum Data Rate in this figure. That is, the figure contains 18 points and I want to find the optimal one, but I cannot work out how.
I believe what you are asking is to find the optimal ratio between BitErrorRate and DataRate. In that case you need to calculate the ratio of BitErrorRate per DataRate, and then find the min or max of that, depending on whether you are looking for fewer or more BitErrorRate per DataRate. Assuming the BitErrorRate and DataRate values are saved in arrays, you could use code like this:
% Give some random numbers to illustrate functionality
BitErrorRate = [2 7 3 5 8 1];
DataRate = [1 3 4 2 6 5];
% Find the ratio of BitErrorRate per DataRate
ErrorRatio = BitErrorRate ./ DataRate;
% Optimal ratio with minimal errors per data rate and corresponding index
[MinError, MinIndex] = min(ErrorRatio);
% Print results in console
disp(['Optimal error rate ratio is ' num2str(ErrorRatio(MinIndex)) ...
' BitErrorRate per DataRate with a Bit Error Rate of ' ...
num2str(BitErrorRate(MinIndex)) ' and a Data Rate of ' ...
num2str(DataRate(MinIndex)) '.']);
% Sort columns by DataRate (ascending); after sorting, row 1 is DataRate,
% row 2 is BitErrorRate and row 3 is ErrorRatio
SortedForDataRate = sortrows([DataRate;BitErrorRate;ErrorRatio]')';
fig = figure(1);
subplot(3,1,1)
plot(SortedForDataRate(1,:))
title('DataRate')
subplot(3,1,2)
plot(SortedForDataRate(2,:))
title('BitErrorRate')
subplot(3,1,3)
plot(SortedForDataRate(3,:))
title('ErrorRatio')
My supervisor has suggested the following code for extracting vertical slices from a set of 40 x 40 images and plotting them as a time series (24 hrs). The images are stored in an array images = FLTARR(no_images, 40, 40), and the corresponding times in an array UT = FLTARR(no_images):
PLOT, [0],[0], /NODATA, XRANGE=[0,24], XSTYLE, YRANGE=[-40,40], /YSTYLE
FOR i=0, no_images-1 DO BEGIN
FOR j=0, 39 DO BEGIN
POLYFILL, UT(i)+[0,0,1,1]*2/60.0, (j+[0,1,1,0])*2-40, COL=[work out what colour you want the pixel to be in terms of the value in images(i,20,j) ]
ENDFOR
ENDFOR
The images were taken at 2 minute intervals.
I understand what is being done here - essentially drawing small rectangles to represent the pixels in the images. My question is: what should the COL argument be? Can someone give me an example? At the minute, to test the code I've just put in a fixed value (e.g. 255), which obviously just gives a block of the same colour. How can I get different colours corresponding to the pixel values?
Just use:
max_value = max(images[*, 20, *])
color=images[i, 20, j] / float(max_value) * 255
Make sure you are in indexed color, i.e., you have done:
device, decomposed=0
and used LOADCT or TVLCT to load the color table you want to use.
You could also use new graphics if you have IDL 8.4+
no_images = 100
ut = FINDGEN(100)/100*24
img = bytscl(randomu(seed,no_images,40))
y = FINDGEN(40)
p = PLOT(ut, y, XSTYLE=1, YSTYLE=1, XRANGE=[0,24], YRANGE=[0,40], $
/NODATA, TITLE='My Data',XTITLE='X',YTITLE='Y')
i = IMAGE(img, ut, y, /OVERPLOT, RGB_TABLE=74, ASPECT_RATIO=0.5)
You can adjust the ASPECT_RATIO to make your image appear more or less square. And it's easy to change the RGB_TABLE to whatever one you'd like.
I want to modify the pitch at two different parts of a wav file. To do that, I have the starting and ending times of each part from the corresponding TextGrid file of the wav file. Is it possible to modify the pitch at just those two parts?
You can use a Manipulation object to make any changes you want to the original sound's pitch.
# Original sound made of three consecutive notes
snd[1] = Create Sound as pure tone: "A", 1, 0, 0.3, 44100, 220, 0.2, 0.01, 0.01
snd[2] = Create Sound as pure tone: "B", 1, 0, 0.3, 44100, 247, 0.2, 0.01, 0.01
snd[3] = Create Sound as pure tone: "C", 1, 0, 0.3, 44100, 277, 0.2, 0.01, 0.01
selectObject(snd[1], snd[2], snd[3])
sound = Concatenate
Rename: "original"
removeObject(snd[1], snd[2], snd[3])
selectObject(sound)
Play
# We will invert the pitch, so that the notes play in the opposite direction
manipulation = To Manipulation: 0.01, 200, 300
pitchtier = Extract pitch tier
# We copy it because we want to modify it, not create one from scratch
# and we want to be able to read the values of the original from somewhere
original = Copy: "old"
points = Get number of points
# This for loop looks at the values of the original pitch tier and writes them
# onto the new pitch tier
for p to points
selectObject(original)
f = Get value at index: points - p + 1
t = Get time from index: p
# If you uncomment the if block, the changes will only affect the first and last
# quarter of the sound
# if t < 0.25 or t > 0.75
selectObject(pitchtier)
Remove point: p
Add point: t, f
# endif
endfor
# We replace the pitch tier
selectObject(pitchtier, manipulation)
Replace pitch tier
# Resynthesize
selectObject(manipulation)
new_sound = Get resynthesis (overlap-add)
# And clean up
removeObject(original, pitchtier, manipulation)
selectObject(new_sound)
Rename: "modified"
Play
You change the pitch tier by adding points at different times with different pitch values (in Hertz), and when you do the resynthesis Praat will modify the original values so they match the ones you specified.
In your case, you can use the time values from the TextGrid to know when the modified PitchTier points need to be added and leave the rest alone. You can also manipulate duration like this.
In the example, the script changes the value of each of the points in the original pitch tier with the value of the points in the inverted order, so that the first point will have the value of the last one. The if block inside the for is one way of limiting those changes to a subset of the pitch tier, but how you do this will depend on the sort of changes you are trying to make.
How do I estimate the SNR from a single audio file containing speech?
I know of two methods:
log power histogram percentile difference (aka the "NIST quick method"), described here: http://labrosa.ee.columbia.edu/~dpwe/tmp/nist/doc/stnr.txt
10*log10( (S-N)/N ), where
S = sum{x[i]^2 * e[i]}
N = sum{x[i]^2 * (1-e[i])}
e[i] some sort of voice activity detection (speech/non-speech indicator)
Are there any better methods that do not require stereo data (or the data in both clean and noisy versions)? I would also like to avoid the "second method" described in the NIST document (see 1.), which makes strong assumptions about the distributions.
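For concreteness, method 2 above is easy to sketch once you have some e[i]; this is a hypothetical Python illustration where a hand-made speech/non-speech indicator stands in for a real VAD:

```python
import math

def snr_db(x, e):
    """SNR per method 2: S and N are energies accumulated over samples
    flagged as speech (e[i] = 1) and non-speech (e[i] = 0)."""
    S = sum(xi * xi * ei for xi, ei in zip(x, e))
    N = sum(xi * xi * (1 - ei) for xi, ei in zip(x, e))
    return 10 * math.log10((S - N) / N)

# Toy signal: "speech" samples of amplitude 1.0, "noise" of amplitude 0.1,
# with a hand-made indicator e standing in for a real VAD
x = [1.0, -1.0, 1.0, 0.1, -0.1, 0.1]
e = [1, 1, 1, 0, 0, 0]
print(snr_db(x, e))  # ~19.96 dB
```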
The human voice uses frequencies from 300 Hz to 3 kHz; this is the band (old) telephone systems used. The human voice never uses all of these frequencies at once, which is why we can use a frequency analysis to find the noise floor - without any reference or voice activity detection e[i]:
Compute FFT with a frequency resolution of ~ 10 - 20 Hz.
With a sample rate of 48 kHz you would use an FFT length of samplerate/resolution = 4800 samples, which should then be rounded to the nearest power of 2, i.e. 4096.
Identify the necessary bins which hold the results from 300 - 3000 Hz.
The bin index k holds the result for frequency k*samplerate/FFT_length. For the above 48 kHz input and FFT length 4096 this is k(300 Hz) = 300 * 4096 / 48000 ~= 26 and k(3000 Hz) = 3000 * 4096 / 48000 = 256.
Calculate the energy in each necessary bin: E[k] = FFT[k].re ^2 + FFT[k].im ^2. It depends on your FFT algorithm "where" the real and imaginary parts are written.
N = min{ E[k=26..256] } * number_of_bins (= 256-26+1 = 231)
S = sum{ E[k=26..256] }
SNR = (S-N)/N. The level is 10*log10(SNR).
As the SNR varies over time, go back to step 1 with some new samples - probably with some overlap
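The steps above can be sketched in Python (a rough single-frame illustration; a real implementation would window the signal and, per the last step, repeat over overlapping frames):

```python
import numpy as np

def estimate_snr_db(samples, samplerate=48000, fft_len=4096):
    """Noise-floor SNR over the 300-3000 Hz speech band (single frame)."""
    spectrum = np.fft.rfft(samples[:fft_len])
    # Bin k holds the result for frequency k * samplerate / fft_len
    k_lo = int(round(300 * fft_len / samplerate))   # ~26 for 48 kHz, 4096
    k_hi = int(round(3000 * fft_len / samplerate))  # 256
    E = np.abs(spectrum[k_lo:k_hi + 1]) ** 2        # per-bin energy
    N = E.min() * E.size  # noise floor: quietest bin scaled to the whole band
    S = E.sum()           # total energy in the band
    return 10 * np.log10((S - N) / N)

# Hypothetical check: a 1 kHz tone buried in weak noise should score high
t = np.arange(4096) / 48000.0
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 1000.0 * t) + 0.01 * rng.standard_normal(4096)
print(estimate_snr_db(tone))
```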
On every frame of my application, I can call timeGetTime() to retrieve the current elapsed milliseconds, and subtract the previous frame's value of timeGetTime() to get the time between the two frames. However, to get the frame rate of the application, I have to use the formula fps = 1000/delay(ms). So for instance if the delay was 16 milliseconds, then 1000/16 = 62.5 (stored in memory as 62). Then let's say the delay became 17 milliseconds: 1000/17 = 58, and so on:
1000/10=100
1000/11=90
1000/12=83
1000/13=76
1000/14=71
1000/15=66
1000/16=62
1000/17=58
1000/18=55
1000/19=52
1000/20=50
As you can see, for consecutive values of the delay there are pretty big gaps between the frame rates. So how do programs like FRAPS determine frame rates that fall between these values (e.g. 51, 53, 54, 56, 57, etc.)?
Why would you do this on every frame? You'll find that if you do it on every tenth frame, and then divide the elapsed time by 10, you'll easily get frame rates within the gaps you're seeing. You'll also probably find your frame rates are higher, since you're doing less admin work within the loop :-)
In other words, something like (pseudo code):
chkpnt = 10
cntr = chkpnt
baseTime = now()
do lots of times:
display next frame
cntr--
if cntr == 0:
cntr = chkpnt
newTime = now()
display "framerate = " chkpnt * 1000 / (newTime - baseTime)
baseTime = newTime
In addition to #Marko's suggestion to use a better timer, the key trick for a smoothly varying and better approximate evaluation of the frame rate is to use a moving average -- don't consider only the very latest delay you've observed, consider the average of (say) the last five. You can compute the latter as a floating-point number, to get more possible values for the frame rate (which you can still round to the nearest integer of course).
For minimal computation, consider a "fifo queue" of the last 5 delays (pseudocode)...:
array = [16, 16, 16, 16, 16] # initial estimate
totdelay = 80
while not Done:
newest = latestDelay()
oldest = array.pop(0)
array.append(newest)
totdelay += (newest - oldest)
estimatedFramerate = 5000 / totdelay  # 5 frames * 1000 ms/s / total delay in ms
...
Not sure, but maybe you need better (high resolution) timer. Check QueryPerformanceTimer.
Instead of a moving average (as #Alex suggests) I suggest a Low-Pass Filter. It's easier to calculate, and can be tweaked to have an arbitrary amount of value smoothing with no change to performance or memory usage. In short (demonstrated in JavaScript):
var smoothing = 10; // The larger this value, the more smoothing
var fps = 30; // some likely starting value
var lastUpdate = new Date;
function onFrameUpdate(){
var now = new Date;
var frameTime = now - lastUpdate;
var frameFPS = 1000/frameTime; // frameTime is in milliseconds
// Here's the magic
fps += (frameFPS - fps) / smoothing;
lastUpdate = now;
}
For a pretty demo of this functionality, see my live example here:
http://phrogz.net/JS/framerate-independent-low-pass-filter.html