Windows Vista Speech Recognition engine sampling rates

I need to recognize speech from a set of WAV files that were not recorded on the computer doing the recognition. I know that if the WAV files are recorded at the same sampling rate as the recordings the engine was trained on, the results will be better.
But my question is: what is the sampling rate for which Vista's engine was trained? I cannot seem to find this information anywhere.
And also... do you know any method to convert WAV files from one sampling rate to another, from C#?
Thanks!

The default sampling rate for the SAPI desktop engines is 11 kHz. The desktop engines work well with any sampling rate above that. Also, SAPI will resample the audio for you if you use the SpBindToFile helper. You didn't mention which programming language you're using, so I assumed C++.
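Since the question also asks about converting WAV files from one sampling rate to another in code, here is a minimal offline-resampling sketch. It assumes Python with scipy installed and uses hypothetical file names; the asker mentioned C#, where a library such as NAudio offers an equivalent resampling step, but the shape of the operation is the same:

    # Resample a WAV file to the engine's preferred rate (sketch, assumes scipy).
    from scipy.io import wavfile
    from scipy.signal import resample_poly

    def resample_wav(src_path, dst_path, target_rate=11025):
        """Read a WAV file and write a copy resampled to target_rate."""
        rate, data = wavfile.read(src_path)  # data: int16/float array
        # Polyphase resampling: up by target_rate, down by the source rate
        # (scipy reduces the ratio internally).
        resampled = resample_poly(data, target_rate, rate, axis=0)
        wavfile.write(dst_path, target_rate, resampled.astype(data.dtype))

    resample_wav("input-44k.wav", "input-11k.wav")  # hypothetical names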

Related

Is the format of a Windows audio loopback capture fixed? Or is it sound card dependent?

I am using the Windows Core Audio API to do loopback capture and then process the data. On my machine I get a 48000 Hz sampling rate with 32-bit floats for the format. Is that what Windows is using internally? I'm wondering whether I'm tapping the output before any hardware-specific conversion, so the format is always the same, or whether I might be getting 16-bit ints on some other machine.
There is clearly some variation between machines, at least with respect to sample rate, as WASAPI on my machine gives 32-bit floats at 44100 Hz. The documentation for GetMixFormat (remarks section, paragraphs 2 and 3) suggests that the supplied format is the internal format used for mixing, and that it may well differ from what the sound card actually accepts as input, but it doesn't make clear exactly which formats may be used. I suspect this is intentionally vague, so as to encourage developers to handle multiple formats in case they are used somewhere. That said, given that they abstract the mix format from the sound card, I would be surprised if different internal formats were used on different machines.
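For a quick per-machine check, here is a hedged sketch using the third-party Python sounddevice package (pip install sounddevice). PortAudio reports the default output device's sample rate, which is not guaranteed to equal the exact WASAPI mix format, but it makes the machine-to-machine variation easy to see:

    # Probe the default output device (assumes the sounddevice package).
    import sounddevice as sd

    info = sd.query_devices(kind="output")  # default output device
    # e.g. "Speakers (...)" 48000.0 on one machine, 44100.0 on another
    print(info["name"], info["default_samplerate"])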

What is the fastest way to combine audio files on a web server?

Disclaimer: Forgive my ignorance of audio/sound processing; my background is web and mobile development, and this is a bespoke requirement for one of my clients!
I have a requirement to concatenate 4 audio files, with a background track playing behind all 4 audio files. The source audio files can be created in any format, or have any treatment applied to them, to improve the processing time, but the output quality is still important. For clarity, the input files could be named as follows (.wav is only an example format):
background.wav
segment-a.wav
segment-b.wav
segment-c.wav
segment-d.wav
And would need to be structured something like this:
[------------------------------background.wav------------------------------]
[--segment-a.wav--][--segment-b.wav--][--segment-c.wav--][--segment-d.wav--]
I have managed to use the SoX tool to achieve the concatenation portion of the above using MP3 files, but on a reasonably fast computer I am getting roughly an hour's worth of concatenated audio per minute of processing, which isn't fast enough for my requirements, and I haven't applied the background sound or any nice-to-haves such as trimming/fading yet.
My questions are:
Is SoX the best/only tool for this kind of operation?
Is there any way to make the process faster without sacrificing (too much) quality?
Would changing the input file format result in improved performance? If so, which format is best?
Any suggestions from this excellent community would be much appreciated!
SoX may not be the best tool, but I doubt you will find anything much better without hand-coding.
I would venture to guess that you are doing pretty well to process that much audio in that time. You might do better, but you'll have to experiment. You are right that the main way to improve speed is probably to change the file format.
MP3 and OGG will probably give you similar performance, so first identify how MP3 compares to uncompressed audio such as WAV or AIFF. If MP3/OGG is better, try different compression ratios and sample rates to see which goes faster. With WAV files, you can try lowering the sample rate (you can do this with MP3/OGG as well). If this is speech, you can probably go as low as 8 kHz, which should speed things up considerably. For music, I would say 32 kHz, but it depends on the requirements. Also, try mono instead of stereo, which should also speed things up.
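To make that concrete, here is a minimal sketch of the whole pipeline, assuming the sox binary is on PATH and using the file names from the question; dropping to 8 kHz mono up front is where most of the speedup should come from:

    # Concatenate four segments and mix a background under them with SoX.
    import subprocess

    def sox(*args):
        subprocess.run(["sox", *args], check=True)

    segments = ["segment-a.wav", "segment-b.wav",
                "segment-c.wav", "segment-d.wav"]

    # 1. Downsample everything to 8 kHz mono first (fine for speech).
    small = []
    for name in ["background.wav"] + segments:
        out = name.replace(".wav", "-8k.wav")
        sox(name, "-r", "8000", "-c", "1", out)
        small.append(out)

    # 2. Concatenate the four segments end to end.
    sox(*small[1:], "speech.wav")

    # 3. Mix (-m) the background under the concatenated speech.
    sox("-m", small[0], "speech.wav", "final.wav")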

Fast encoding video codec?

Can anybody compare popular video codecs by encoding speed? I understand that better compression usually requires more processing time, but it's also possible that some codecs still provide comparably good compression with fast encoding. Any comparison links?
Thanks for your help!
[EDIT]: Codecs can be compared by the algorithms they use, regardless of the particular implementation, hardware, or video source; something like big-O notation for mathematical algorithms.
When comparing VP8 and x264, VP8 shows 5-25 times lower encoding speed with 20-30% lower quality on average. For example, the x264 High-Speed preset is faster and has higher quality than any of the VP8 presets on average.
It's tough to compare feature sets vs. speed/quality.
See a quality comparison at http://www.compression.ru/video/codec_comparison/h264_2012/
The following paragraph is from VP9 encoding/decoding performance vs. HEVC/H.264 by Ronald S. Bultje:
x264 is an incredibly well-optimized encoder, and many people still use it. It's not that they don't want better bitrate/quality ratios, but rather, they complain that when they try to switch, it turns out these new codecs have much slower encoders, and when you increase their speed settings (which lowers their quality), the gains go away. Let's measure that! So, I picked a target bitrate of 4000kbps for each encoder, using otherwise the same settings as earlier, but instead of using the slow presets, I used variable-speed presets (x265/x264: --preset=placebo-ultrafast; libvpx: --cpu-used=0-7).
This is one of those topics where Your Mileage May Vary widely. If I were in your position, I'd start off with a bit of research on Wikipedia, and then gather the tools to do some testing and benchmarking. The source video format will probably affect overall encoding speed, so you should test with video that you intend to use on the Production system.
Video encoding time can vary widely depending on the hardware used, and whether you used an accelerator card, and so on. It's difficult for us to make any hard and fast recommendations without explicit knowledge of your particular set up.
The only way to make decisions like this is to test these things yourself. I've done the same thing when comparing virtualisation tools. It's fun too!
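In that spirit, here is a rough benchmarking sketch in Python, assuming ffmpeg with libx264 is installed and sample.mp4 stands in for a clip representative of your production footage; timing each preset on your own hardware beats any published table:

    # Time several x264 presets at a fixed bitrate (assumes ffmpeg on PATH).
    import subprocess, time

    for preset in ["ultrafast", "veryfast", "medium", "slow"]:
        start = time.perf_counter()
        subprocess.run(
            ["ffmpeg", "-y", "-i", "sample.mp4",
             "-c:v", "libx264", "-preset", preset, "-b:v", "4000k",
             f"out-{preset}.mp4"],
            check=True, capture_output=True)
        print(f"{preset}: {time.perf_counter() - start:.1f}s")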

What is a good format for storing compressed sounds on Windows?

Currently we use .wav files for storing our sounds with our product. However, these can get large. I know there are many different sound formats out there, but what is the best one to use that:
1) Works on all Windows-based systems (XP+)
2) Doesn't add a lot of extra code (i.e., including a 3 MB library to play MP3s would offset any gains from removing the .wav files)
3) Isn't GPL or some code I can't use (ideally just something in the Windows SDK, or maybe a different compression scheme for .wav that compresses better and works nicely with sndPlaySound(..) or something similar)
Any ideas would be appreciated, thanks!
While WAV files are typically uncompressed, they can be compressed with various codecs and still be played with the system APIs. The largest factors in the overall size are the number of channels (mono or stereo), the sample rate (11 kHz, 44.1 kHz, etc.), and the sample size (8-bit, 16-bit, 24-bit). This link discusses the various compression schemes supported for WAV files and the associated file sizes:
http://en.wikipedia.org/wiki/WAV
Beyond that, you could resort to encoding the data as WMA files, which are also richly supported without third-party libraries, but would probably require using the Windows Media SDK or DirectShow for playback.
This article discusses the WMA codecs and levels of compression that can be expected:
http://www.microsoft.com/windows/windowsmedia/forpros/codecs/audio.aspx
If it's the totality of the files that gets large, rather than individual files, and the time taken by an extra unpacking step does not prevent timely playback, you might consider zipping up the files yourself and unzipping them as needed. I realize this sounds, and in many cases may be, inefficient, but if MP3 is ruled out it may be worth looking at, depending on other considerations not mentioned in your question.
I'd look at DirectShow and see if you can use the DirectShow MP3 or WMA codecs to compress the audio stream. All the DLLs are in-box on Windows, so there's no additional redistributable needed.
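As a small, hedged illustration of the compressed-WAV route: Python's winsound module wraps the Win32 PlaySound API (the successor to sndPlaySound), and a WAV re-encoded to ADPCM (for example with: ffmpeg -i in.wav -c:a adpcm_ms out-adpcm.wav) should still play through it, provided the matching ACM codec is installed:

    # Windows-only: play a codec-compressed WAV via the PlaySound API.
    import winsound

    # SND_FILENAME: treat the first argument as a path; blocks until done.
    winsound.PlaySound("out-adpcm.wav", winsound.SND_FILENAME)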

Real-time equalizer for all audio on computer

Is it possible to capture all the sound from a computer and have it pass through an equalizer before reaching the speakers?
How can you program a band-pass filter for it?
EDIT: I'm trying to do this on Windows (with Python? heh), but if there is a generic, cross-platform approach, that would be great.
On the GNU/Linux platform with a real-time pre-emption enabled kernel, you have the JACK Audio Connection Kit. Put simply, JACK allows you to connect JACK-aware audio programs such that you could capture all the sound from your computer.
You would then pass this captured sound into another JACK audio program which hosts your equalizer plugin. The equalizer plugin, on Linux at least, will be either a LADSPA plugin or a plugin in LADSPA's successor standard, LV2.
You can program a band-pass filter if you have a very good grasp of fairly advanced mathematics (IMHO) and excellent knowledge of digital signal processing in general. If you don't have these skills, I would strongly discourage you from coding a band-pass filter yourself; just use one of the many freely available implementations.
http://jackaudio.org
http://ladspa.org
http://lv2plug.in
see also:
http://musicdsp.org
You can implement an equalizer either using discrete band-pass filters or in the frequency domain (FFT -> equalize -> IFFT). For band-pass filters, you can either combine a lowpass and a highpass filter or use one of various common designs, such as a damped resonator.
How you actually implement the above will depend on what OS, programming language, etc, you are using.
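For example, since the question mentions Python, here is a minimal band-pass sketch using scipy (one of those freely available implementations); an equalizer would be a bank of such filters with a gain per band:

    # Butterworth band-pass filter applied to raw samples (assumes scipy/numpy).
    import numpy as np
    from scipy.signal import butter, lfilter

    def bandpass(samples, rate, low_hz, high_hz, order=4):
        """Keep frequencies between low_hz and high_hz (in Hz)."""
        nyquist = rate / 2.0
        b, a = butter(order, [low_hz / nyquist, high_hz / nyquist], btype="band")
        return lfilter(b, a, samples)

    # Hypothetical usage: isolate roughly the speech band (300 Hz - 3 kHz)
    # from one second of white noise sampled at 44.1 kHz.
    rate = 44100
    filtered = bandpass(np.random.randn(rate), rate, 300.0, 3000.0)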
