Is it possible to use the NSSpeechRecognizer with an pre-recorded audio file instead of direct microphone input?
Or is there any other speech-to-text framework for Objective-C/Cocoa available?
Added:
Rather than using voice at the machine that is running the application external devices (e.g. iPhone) could be used for sending just an recorded audio stream to that desktop application. The desktop Cocoa app then would process and do whatever it's supposed to do using the assigned commands.
Thanks.
I don't see any obvious way to switch the input programmatically, though the "Speech" companion guide's first paragraph in the "Recognizing Speech" section seems to imply other inputs can be used. I think this is meant to be set via System Preferences, though. I'm guessing it uses the primary audio input device selected there.
I suspect, though, you're looking for open-ended speech recognition, which NSSpeechRecognizer is not. If you're looking to transform any pre-recorded audio into text (ie, make a transcript of a recording), you're completely out of luck with NSSpeechRecognizer, as you must give it an array of "commands" to listen for.
Theoretically, you could feed it the whole dictionary, but I don't think that would work since you usually have to give it clear, distinct commands. Its performance would suffer, I would guess, if you gave it a bunch of stuff to analyze for (in real time).
Your best bet is to look at third-party open source solutions. There are a few generalized packages out there (none specifically for Cocoa/Objective-C), but this poses another question: What kind of recognition are you looking for? The two main forms of speech recognition ('trained' is more accurate but less flexible for different voices and the recording environment, whereas 'open' is generally much less accurate).
It'd probably be best if you stated exactly what you're trying to accomplish.
Related
We are trying to use a full-duplex stream from two different devices on two different clock domains. PortAudio reports flags periodically, which I've put down to the interfaces being on different clock domains.
I have read and understood the behaviour documented in http://www.portaudio.com/docs/proposals/001-UnderflowOverflowHandling.html, and it is consistent with my observations. These flags are of course accompanied by clicks. I can reproduce this behaviour easily in the pa_fuzz example, so I believe it to be expected behaviour.
What I'm unsure of exactly what I should do with this information? The flags tell me that an overflow/underflow has already happened; how could I feasibly implement any resampling with this?
Am I going about this completely wrong? What is the typical usage pattern for PortAudio full-duplex using two different devices?
I'm using Core Audio. Thanks!
I don't have a proper answer for handling it at the PortAudio level, but there is a platform-specific solution that you could consider.
On OS X, you can create an aggregate device that combines the two devices, and enable drift correction to have CoreAudio perform the resampling. The resulting aggregate will have the inputs and outputs for both devices, so it may take additional logic to select the input and output channels of interest.
You could try it first by creating an aggregate device using Audio MIDI Setup.app. Apple does have some support notes on it, like
Create an Aggregate Device to combine multiple audio interfaces, and Set aggregate device settings in Audio MIDI Setup on Mac (which may have conflicting guidance on selecting drift correction)
I'm not sure it's formally documented other than headers like AudioHardware.h, but it is possible to create an aggregate device programatically as well. I have only experimented with it myself, and recall having some intermittent issues configuring the device (perhaps an asynchronous behavior with some operations? ).
Of course, if you're using PortAudio and trying to find a portable solution, this won't solve the issue generally.
I want to make a password manager (for windows) using a fingerprint reader as a validator, but I want to make it from scratch. I want to take the output of the usb fingerprint reader (I don't know exactly the format of the output, but I guess is a grayscale image) and process it (using my own fingerprint comparison algorithm).
I've searched on google and youtube, but I only found fingerprint readers with their own comparison algorithm and software.
Can you please help me with some examples of fingerprints readers that gives you the output and some tutorials in which describes how they access the it in real time?
Windows has something called the Windows Biometric Framework which you should be able to program to and which should do all the heavy lifting for you, see:
https://learn.microsoft.com/en-us/windows/desktop/secbiomet/biometric-service-api-portal
So just look for a reader supporting that (which is probably all of them).
I hope you don't mind me saying this, but MS (or perhaps the hardware vendor) can probably write a better comparison algorithm than you and doing things this way means that you'll get the same matches (or not) as, say, Windows' own logon procedure. Imagine the confusion you would sow in the minds of your users if you didn't.
I'd like to write some code that generates some pretty simple musical tones (notes) and has them output through the speaker (whatever sound device).
I suspect I'll likely need to generate as MIDI data, which I can go figure out independently, but I'm new to audio programming generally and I'm not sure what the best entry point into the system frameworks is. AudioToolbox has these MusicSequence objects. There's also Core MIDI and Core Audio. None has an obvious interface for "here's a data structure for a bunch of notes, now call this method to play them", so I'll presumably need some combination of these to cobble it together.
I'm confident that OS X supports this. If anyone has context with this kind of work, I'd appreciate a couple basic pointers on where in the docs (or other resources) to start looking for building whatever structures represent music data and where you'd turn around and trigger playback.
OS X does support this, but it's a lot more inherently complex than it might seem at first. There are essentially three pieces:
MusicSequence is the "data structure for a bunch of notes" (along with timing information in the form of a tempo/meter map.
MusicPlayer is the object that controls playback of the MusicSequence.
AUGraph is what you'd use to create an instrument object and hook it up to your physical outputs, to turn the note data into sound.
There's a lot of potential variety in how you set up the AUGraph. For example, the default General MIDI synthesizer is the built-in DLSMusicDevice, but you could also load an FM synth, a sampler, or any number of other instrument units. From there, you could be processing the audio in various ways and routing it to various devices. All that stuff that falls in the general category of "audio processing" happens within the AUGraph.
Apple's PlaySequence sample code does mostly what you're looking for. It's a C++ project—but MusicSequence, MusicPlayer, and AUGraph are plain C APIs, so it should be a decent starting point. https://developer.apple.com/library/mac/samplecode/PlaySequence/Introduction/Intro.html
OK, the first issue. I am trying to write a virtual soundboard that will output to multiple devices at once. I would prefer OpenAL for this, but if I have to switch over to MS libs (I'm writing this initially on Windows 7) I will.
Anyway, the idea is that you have a bunch of sound files loaded up and ready to play. You're on Skype, and someone fails in a major way, so you hit the play button on the Price is Right fail ditty. Both you and your friends hear this sound at the same time, and have a good laugh about it.
I've gotten OAL to the point where I can play on the default device, and selecting a device at this point seems rather trivial. However, from what I understand, each OAL device needs its context to be current in order for the buffer to populate/propagate properly. Which means, in a standard program, the sound would play on one device, and then the device would be switched and the sound buffered then played on the second device.
Is this possible at all, with any audio library? Would threads be involved, and would those be safe?
Then, the next problem is, in order for it to integrate seamlessly with end-user setups, it would need to be able to either output to the default recording device, or intercept the recording device, mix it with the sound, and output it as another playback device. Is either of these possible, and if both are, which is more feasible? I think it would be preferable to be able to output to the recording device itself, as then the program wouldn't have to be running in order to have the microphone still work for calls.
If I understood well there are two questions here, mainly.
Is it possible to play a sound on two or more audio output devices simultaneously, and how to achieve this?
Is it possible to loop back data through a audio input (recording) device so that is is played on the respective monitor i.e for example sent through the audio stream of Skype to your partner, in your respective case.
Answer to 1: This is absolutely feasable, all independent audio outputs of your system can play sounds simultaneously. For example some professional audio interfaces (for music production) have 8, 16, 64 independent outputs of which all can be played sound simultaneously. That means that each output device maintains its own buffer that it consumes independently (apart from concurrency on eventual shared memory to feed the buffer).
How?
Most audio frameworks / systems provide functions to get a "device handle" which will need you to pass a callback for feeding the buffer with samples (so does Open AL for example). This will be called independently and asynchroneously by the framework / system (ultimately the audio device driver(s)).
Since this all works asynchroneously you dont necessarily need multi-threading here. All you need to do in principle is maintaining two (or more) audio output device handles, each with a seperate buffer consuming callback, to feed the two (or more) seperate devices.
Note You can also play several sounds on one single device. Most devices / systems allow this kind of "resources sharing". Actually, that is one purpose for which sound cards are actually made for. To mix together all the sounds produced by the various programs (and hence take off that heavy burden from the CPU). When you use one (physical) device to play several sounds, the concept is the same as with multiple device. For each sound you get a logical device handle. Only that those handle refers to several "channels" of one physical device.
What should you use?
Open AL seems a little like using heavy artillery for this simple task I would say (since you dont want that much portability, and probably dont plan to implement your own codec and effects ;) )
I would recommend you to use Qt here. It is highly portable (Win/Mac/Linux) and it has a very handy class that will do the job for you: http://qt-project.org/doc/qt-5.0/qtmultimedia/qaudiooutput.html
Check the example in the documentation to see how to play a WAV file, with a couple of lines of code. To play several WAV files simultaneously you simply have to open several QAudioOutput (basically put the code from the example in a function and call it as often as you want). Note that you have to close / stop the QAudioOutput in order for the sound to stop playing.
Answer to 2: What you want to do is called a loopback. Only a very limited number of sound cards i.e audio devices provide a so called loopback input device, which would permit for recording what is currently output by the main output mix of the soundcard for example. However, even this kind of device provided, it will not permit you to loop back anything into the microphone input device. The microphone input device only takes data from the microphone D/A converter. This is deep in the H/W, you can not mix in anything on your level there.
This said, it will be very very hard (IMHO practicably impossible) to have Skype send your sound with a standard setup to your conversation partner. Only thing I can think of would be having an audio device with loopback capabilities (or simply have a physical cable connection a possible monitor line out to any recording line in), and have then Skype set up to use this looped back device as an input. However, Skype will not pick up from your microphone anymore, hence, you wont have a conversation ;)
Note: When saying "simultaneous" playback here, we are talking about synchronizing the playback of two sounds as concerned by real-time perception (in the range of 10-20ms). We are not looking at actual synchronization on a sample level, and the related clock jitter and phase shifting issues that come into play when sending sound onto two physical devices with two independent (free running) clocks. Thus, when the application demands in phase signal generation on independent devices, clock recovery mechanisms are necessary, which may be provided by the drivers or OS.
Note: Virtual audio device software such as Virtual Audio Cable will provide virtual devices to achieve loopback functionnality in Windows. Frameworks such as Jack Audio may achieve the same in UX environment.
There is a very easy way to output audio on two devices at the same time:
For Realtek devices you can use the Audio-mixer "trick" (but this will give you a delay / echo);
For everything else (and without echo) you can use Voicemeeter (which is totaly free).
I have explained BOTH solutions in this video: https://youtu.be/lpvae_2WOSQ
Best Regards
I'm looking for an OSX (or Linux?) application that can recieve data from a webcam/video-input and let you do some image processing on the pixels in something similar to c or python or perl, not that bothered about the processing language.
I was considering throwing one together but figured I'd try and find one that exists already first before I start re-inventing the wheel.
Wanting to do some experiments with object detection and reading of dials and numbers.
If you're willing to do a little coding, you want to take a look at QTKit, the QuickTime framework for Cocoa. QTKit will let you easity set up an input source from the webcam (intro here). You can also apply Core Image filters to the stream (demo code here). If you want to use OpenGL to render or apply filters to the movie, check out Core Video (examples here).
Using theMyMovieFilter demo should get you up and running very quickly.
Found a cross platform tool called 'Processing', actually ran the windows version to avoid further complications getting the webcams to work.
Had to install quick time, and something called gVid to get it to work but after the initial hurdle coding seems like C; (I think it gets "compiled" into Java), and it runs quite fast; even scanning pixels from the webcam in real time.
Still to get it working on OSX.
Depending on what processing you want to do (i.e. if it's a filter that's available in Apple's Core Image filter library), the built-in Photo Booth app may be all you need. There's a comercial set of add-on filters available from the Apple store as well (http://www.apple.com/downloads/macosx/imaging_3d/composerfxeffectsforphotobooth.html).