I am trying to write a small DirectShow application using C++. My main issue is grabbing the raw data from the microphone and saving it as BYTES.
Is this possible using a DirectShow filter? What interface am I supposed to use in order to get the data?
Right now, I am able to write a recorded AVI file using the following filter graph:
Microphone->Avi Mux->File writer
This graph works fine.
I have tried to use the Sample Grabber (which Microsoft has deprecated), but I lack the knowledge of what to do with this IBaseFilter type.
By design, a DirectShow topology needs to be complete, starting with a source (the microphone) and terminating with a renderer filter, and data exchange in a DirectShow pipeline is private to the connected filters, with no data exposed to the controlling application.
This is what confuses you: you apparently want to export content from the pipeline into the outside world, and that is not exactly the way DirectShow is designed to work.
The "intended", "DirectShow way" is to develop a custom renderer filter which would connect to the microphone filter and receive its data. More often than not developers prefer to not take this path since developing a custom filter is a sort of complicated.
The popular solution is to build the pipeline Microphone --> Sample Grabber --> Null Renderer. The Sample Grabber is a filter that exposes the data passing through it via its SampleCB callback. Even though it is getting harder with time, you can still find tons of code that does the job. Most developers prefer this path: build the pipeline out of ready-to-use blocks and forget about the DirectShow API.
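For illustration, here is a minimal sketch of that pipeline. It assumes you already have the microphone capture filter (e.g. from the System Device Enumerator), that the capture graph builder has been attached to the graph via SetFiltergraph, and that the ISampleGrabber/ISampleGrabberCB declarations are available (they come from qedit.h, which newer SDKs no longer ship, so projects often copy those declarations into their own header). The raw bytes arrive in SampleCB:

    #include <dshow.h>
    #include <atlbase.h>
    #include <qedit.h> // ISampleGrabber etc.; copy the declarations if your SDK lacks it

    // Callback invoked for every media sample passing through the Sample Grabber.
    class GrabberCB : public ISampleGrabberCB
    {
    public:
        // The grabber does not own our lifetime, so dummy ref-counting on a
        // long-lived object is common practice here.
        STDMETHODIMP_(ULONG) AddRef()  { return 2; }
        STDMETHODIMP_(ULONG) Release() { return 1; }
        STDMETHODIMP QueryInterface(REFIID riid, void** ppv)
        {
            if (riid == IID_ISampleGrabberCB || riid == IID_IUnknown)
            {
                *ppv = static_cast<ISampleGrabberCB*>(this);
                return S_OK;
            }
            return E_NOINTERFACE;
        }

        STDMETHODIMP SampleCB(double SampleTime, IMediaSample* pSample)
        {
            BYTE* pData = nullptr;
            if (SUCCEEDED(pSample->GetPointer(&pData)))
            {
                long cb = pSample->GetActualDataLength();
                // pData[0..cb) is the raw audio (PCM bytes) - copy it out here.
            }
            return S_OK;
        }
        STDMETHODIMP BufferCB(double, BYTE*, long) { return S_OK; }
    };

    GrabberCB g_grabberCB;

    // Build: Microphone -> Sample Grabber -> Null Renderer (error handling trimmed).
    HRESULT BuildGraph(IGraphBuilder* pGraph, ICaptureGraphBuilder2* pBuilder,
                       IBaseFilter* pMicrophone)
    {
        pGraph->AddFilter(pMicrophone, L"Microphone");

        CComPtr<IBaseFilter> pGrabberFilter;
        pGrabberFilter.CoCreateInstance(CLSID_SampleGrabber);
        pGraph->AddFilter(pGrabberFilter, L"Sample Grabber");

        CComQIPtr<ISampleGrabber> pGrabber(pGrabberFilter);
        pGrabber->SetCallback(&g_grabberCB, 0); // 0 = deliver via SampleCB

        CComPtr<IBaseFilter> pNull;
        pNull.CoCreateInstance(CLSID_NullRenderer);
        pGraph->AddFilter(pNull, L"Null Renderer");

        // Let the capture graph builder wire the three filters together.
        return pBuilder->RenderStream(&PIN_CATEGORY_CAPTURE, &MEDIATYPE_Audio,
                                      pMicrophone, pGrabberFilter, pNull);
    }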
Yet another option would be to not use DirectShow at all. Given its current state, this API is an unlucky choice; you should rather be looking at WASAPI capture instead.
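If you go the WASAPI route instead, a shared-mode capture loop looks roughly like the sketch below (polling mode, endless loop, error handling trimmed; a real application would check every HRESULT, use event-driven buffering, and add a stop condition):

    #include <windows.h>
    #include <mmdeviceapi.h>
    #include <audioclient.h>
    #include <atlbase.h>

    void CaptureMicrophone()
    {
        CoInitializeEx(nullptr, COINIT_MULTITHREADED);

        // Default capture endpoint (the microphone).
        CComPtr<IMMDeviceEnumerator> pEnum;
        pEnum.CoCreateInstance(__uuidof(MMDeviceEnumerator));
        CComPtr<IMMDevice> pDevice;
        pEnum->GetDefaultAudioEndpoint(eCapture, eConsole, &pDevice);

        CComPtr<IAudioClient> pClient;
        pDevice->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr,
                          reinterpret_cast<void**>(&pClient));

        WAVEFORMATEX* pwfx = nullptr;
        pClient->GetMixFormat(&pwfx); // shared-mode format, typically 32-bit float PCM

        // 1-second buffer (in 100 ns units), plain polling mode.
        pClient->Initialize(AUDCLNT_SHAREMODE_SHARED, 0, 10000000, 0, pwfx, nullptr);

        CComPtr<IAudioCaptureClient> pCapture;
        pClient->GetService(__uuidof(IAudioCaptureClient),
                            reinterpret_cast<void**>(&pCapture));

        pClient->Start();
        for (;;)
        {
            Sleep(10);
            UINT32 packetFrames = 0;
            pCapture->GetNextPacketSize(&packetFrames);
            while (packetFrames != 0)
            {
                BYTE* pData = nullptr;
                UINT32 frames = 0;
                DWORD flags = 0;
                pCapture->GetBuffer(&pData, &frames, &flags, nullptr, nullptr);
                // frames * pwfx->nBlockAlign raw bytes at pData - save them here.
                pCapture->ReleaseBuffer(frames);
                pCapture->GetNextPacketSize(&packetFrames);
            }
        }
        // Unreachable in this sketch: pClient->Stop(); CoTaskMemFree(pwfx);
    }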
The AVAudioEngine and related AVAudioNode objects seem to be quite powerful for audio processing, but it's difficult to see how to automate parameter changes with them. I'm sure there must be something more effective than, as a crude example, manually using a Timer to change values.
AVMutableAudioMixInputParameters includes a method setVolumeRampFromStartVolume:toEndVolume:timeRange:, but I cannot see how I could integrate that with AVAudioPlayerNode (connected with AVAudioEffectNodes) and use that method to fade the volume over time. Instead, I have only seen examples of AVMutableAudioMixInputParameters working with AVMutableComposition instances, and none of them include AVAudioNode objects.
Can anyone post or link to some code samples that combine the use of AVAudioNodes with setVolumeRampFromStartVolume:toEndVolume:timeRange: or explain best practices for automating parameter changes on a node over time?
Many thanks
The AVAudioEngine is a real-time engine, but AVMutableComposition seems to be a non-real-time object, so the two are incompatible. An alternative is to build and insert your own real-time fader AUAudioUnit node.
I want to use the Medialooks multisource filter in my application. It has an entry in
HKEY_CURRENT_USER\SOFTWARE\Classes\CLSID\
but I still have to add the filter manually, using its CLSID and the AddFilter function.
Is there any way to make DirectShow's RenderFile function automatically create the graph by enumerating the filters from the registry?
I checked in GraphEdit (graphedt.exe): if I manually insert and connect the filters, I can play the videos properly; otherwise the graph is not rendered automatically.
The ability to connect filters and obtain a topology of your interest is one thing; having this connection take place during Intelligent Connect is another. For Intelligent Connect and RenderFile, the filters of interest must be registered with accurate DirectShow registration details: merit and media types. Quite often filters lack this registration (and at other times they "over-register" themselves, so that they get picked up even when they are an obvious misfit).
Even though you can re-register the filter yourself with alternate registration details (see IFilterMapper2::RegisterFilter), you typically do not do that: it is the filter developer's business to register accurately. The better alternative for you is to build the graph using AddFilter calls, where you have fine control over graph construction, possibly as a fallback construction method for when RenderFile fails in the first place.
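A rough sketch of that combination follows; CLSID_MedialooksMultisource is a placeholder for the actual CLSID from the filter's registry entry, and the filter is assumed to implement IFileSourceFilter:

    #include <dshow.h>
    #include <atlbase.h>

    // Placeholder: substitute the CLSID found under HKCU\SOFTWARE\Classes\CLSID\.
    static const GUID CLSID_MedialooksMultisource = { /* fill in from the registry */ };

    HRESULT BuildPlaybackGraph(IGraphBuilder* pGraph, LPCWSTR pszFile)
    {
        // First chance: let Intelligent Connect do everything.
        HRESULT hr = pGraph->RenderFile(pszFile, nullptr);
        if (SUCCEEDED(hr))
            return hr;

        // Fallback: add the source filter explicitly, load the file, then let
        // the graph builder complete each output branch.
        CComPtr<IBaseFilter> pSource;
        hr = pSource.CoCreateInstance(CLSID_MedialooksMultisource);
        if (FAILED(hr)) return hr;
        hr = pGraph->AddFilter(pSource, L"Multisource");
        if (FAILED(hr)) return hr;

        CComQIPtr<IFileSourceFilter> pFileSource(pSource);
        if (!pFileSource) return E_NOINTERFACE;
        hr = pFileSource->Load(pszFile, nullptr);
        if (FAILED(hr)) return hr;

        CComPtr<IEnumPins> pEnum;
        pSource->EnumPins(&pEnum);
        for (CComPtr<IPin> pPin; pEnum->Next(1, &pPin, nullptr) == S_OK; pPin.Release())
        {
            PIN_DIRECTION dir;
            pPin->QueryDirection(&dir);
            if (dir == PINDIR_OUTPUT)
                pGraph->Render(pPin); // Intelligent Connect from this pin onwards
        }
        return S_OK;
    }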
I'm having problems writing a custom DS source filter to play a TS stream that was dumped to multiple files. [ed: The point being to re-play a continuous stream from these separate files]
First I tried modifying the Async file sample: no go - the data "pull" model seems to put all the controlling logic in the splitter filter, so I couldn't trick it into believing it has a continuous stream.
Then I tried modifying the PushSource desktop sample: it appears we have to babysit the MPEG demultiplexer this way, creating its output pins and parsing the data myself to get the IDs, etc. I managed to get GraphStudio to auto-wire something up (using a strange DTV-DVD decoder), but it doesn't play anything despite the source filter pushing the right data downstream.
Does anyone have experience in this area to help/suggest anything?
Have you found a solution to your problem yet?
I am writing a similar DirectShow filter, currently for playing only one file, but I think that modifying it to play several files should not be a problem.
I built this filter starting from the "Push Source Bitmap" sample filter, but I had to make a lot of changes to it.
I also had to build the graph using an application that I wrote (so not using GraphEdit): connect the "MPEG-2 Demultiplexer" to the new filter, add one PSI output (mapped to PID 0 = PAT), and connect the "MPEG-2 Sections and Tables Filter" to this PSI output.
After that, I used the "MPEG-2 Sections and Tables Filter" to read the PAT table and the PMT PIDs defined inside it. Next, I also mapped all the PMT PIDs to the same "MPEG-2 Sections and Tables Filter" and parsed the PMT tables to learn the elementary stream PIDs and media types, and then I created one video output and one audio output based on this information (there may be more than one audio and video stream, but at the current stage I retain only the first one). Note that this requires running the partial graph temporarily in order to parse the tables, and then stopping it in order to create the video and audio output pins (with the proper media types) and connect the decoders and renderers.
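For reference, the PAT-mapping step above looks roughly like the sketch below (error handling kept minimal; exact header names for the demultiplexer and PSI interfaces vary a little between SDK versions). The "MPEG-2 Sections and Tables Filter" is then connected to the returned pin and queried for IMpeg2Data to read the actual tables:

    #include <dshow.h>
    #include <atlbase.h>
    #include <bdaiface.h>  // IMPEG2PIDMap, MEDIA_MPEG2_PSI
    #include <mpeg2data.h> // IMpeg2Data; the demux interfaces on recent SDKs

    HRESULT CreatePatPin(IBaseFilter* pDemuxFilter, IPin** ppPsiPin)
    {
        // Media type of a PSI sections output pin.
        AM_MEDIA_TYPE mt = {};
        mt.majortype = MEDIATYPE_MPEG2_SECTIONS;
        mt.subtype   = MEDIASUBTYPE_MPEG2_DATA;

        CComQIPtr<IMpeg2Demultiplexer> pDemux(pDemuxFilter);
        if (!pDemux) return E_NOINTERFACE;
        HRESULT hr = pDemux->CreateOutputPin(&mt, const_cast<LPWSTR>(L"PSI"), ppPsiPin);
        if (FAILED(hr)) return hr;

        // Map PID 0 (the PAT) onto the new pin; repeat for each PMT PID later.
        CComQIPtr<IMPEG2PIDMap> pPidMap(*ppPsiPin);
        if (!pPidMap) return E_NOINTERFACE;
        ULONG pid = 0;
        return pPidMap->MapPID(1, &pid, MEDIA_MPEG2_PSI);
    }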
In addition to that, I have a piece of information that you could find interesting: it appears that when connected, the "MPEG-2 Demultiplexer" searches the graph for a filter exposing the "IBDA_NetworkProvider" interface, and if it finds one, it registers itself with it using the IBDA_NetworkProvider::RegisterDeviceFilter method.
I think that you could use this to detect the insertion of the "MPEG-2 Demultiplexer" into the graph (by exposing the "IBDA_NetworkProvider" interface from your filter) and try to perform the above operations from your source filter, thus allowing your filter to be used inside GraphEdit, with the "MPEG-2 Demultiplexer" babysat by the filter itself instead of by a surrounding application.
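If you want to experiment with this trick, a skeleton might look like the sketch below. Note this is speculative: whether the demultiplexer really calls RegisterDeviceFilter in a non-BDA graph is exactly what you would have to verify, and OnDemuxInserted is a hypothetical hook of your own:

    #include <streams.h>  // DirectShow base classes
    #include <bdaiface.h> // IBDA_NetworkProvider

    class CMyTsSource : public CSource, public IBDA_NetworkProvider
    {
    public:
        DECLARE_IUNKNOWN

        STDMETHODIMP NonDelegatingQueryInterface(REFIID riid, void** ppv)
        {
            if (riid == __uuidof(IBDA_NetworkProvider))
                return GetInterface(static_cast<IBDA_NetworkProvider*>(this), ppv);
            return CSource::NonDelegatingQueryInterface(riid, ppv);
        }

        // The demultiplexer is expected to call this upon joining the graph;
        // treat it as the notification to start the PSI setup described above.
        STDMETHODIMP RegisterDeviceFilter(IUnknown* pUnkFilterControl,
                                          ULONG* ppvRegistrationContext)
        {
            *ppvRegistrationContext = 1;        // dummy registration context
            OnDemuxInserted(pUnkFilterControl); // hypothetical hook of your own
            return S_OK;
        }
        STDMETHODIMP UnRegisterDeviceFilter(ULONG) { return S_OK; }

        // The rest of IBDA_NetworkProvider is irrelevant here, so just stubs.
        STDMETHODIMP PutSignalSource(ULONG)  { return E_NOTIMPL; }
        STDMETHODIMP GetSignalSource(ULONG*) { return E_NOTIMPL; }
        STDMETHODIMP GetNetworkType(GUID*)   { return E_NOTIMPL; }
        STDMETHODIMP PutTuningSpace(REFGUID) { return E_NOTIMPL; }
        STDMETHODIMP GetTuningSpace(GUID*)   { return E_NOTIMPL; }

    private:
        void OnDemuxInserted(IUnknown* pDemux); // implement the table parsing here
        // ... constructor, pins and the rest of the source filter as usual
    };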
Gingko
I created a TS source filter that reads a network stream, so that is continuous too; one difference from reading a file, though, is that a network stream automatically gives me the correct speed. I think you should be able to use a similar approach.
I based my filter on the FBall example from the DirectX SDK.
The filter derives from CSource, IFileSourceFilter, IAMFilterMiscFlags and ISpecifyPropertyPages. The output pin derives just from CSourceStream.
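For reference, that class layout might look like the skeleton below (using the baseclasses library shipped with the SDK samples; method bodies and the factory template omitted):

    #include <streams.h> // DirectShow base classes (CSource, CSourceStream)

    class CTsOutputPin;

    class CTsSourceFilter
        : public CSource
        , public IFileSourceFilter
        , public IAMFilterMiscFlags
        , public ISpecifyPropertyPages
    {
    public:
        DECLARE_IUNKNOWN

        STDMETHODIMP NonDelegatingQueryInterface(REFIID riid, void** ppv)
        {
            if (riid == IID_IFileSourceFilter)
                return GetInterface(static_cast<IFileSourceFilter*>(this), ppv);
            if (riid == IID_IAMFilterMiscFlags)
                return GetInterface(static_cast<IAMFilterMiscFlags*>(this), ppv);
            if (riid == IID_ISpecifyPropertyPages)
                return GetInterface(static_cast<ISpecifyPropertyPages*>(this), ppv);
            return CSource::NonDelegatingQueryInterface(riid, ppv);
        }

        // IFileSourceFilter: remembers which stream/URL to read.
        STDMETHODIMP Load(LPCOLESTR pszFileName, const AM_MEDIA_TYPE* pmt);
        STDMETHODIMP GetCurFile(LPOLESTR* ppszFileName, AM_MEDIA_TYPE* pmt);

        // IAMFilterMiscFlags: report ourselves as a live source.
        STDMETHODIMP_(ULONG) GetMiscFlags() { return AM_FILTER_MISC_FLAGS_IS_SOURCE; }

        // ISpecifyPropertyPages
        STDMETHODIMP GetPages(CAUUID* pPages);

    private:
        CTsSourceFilter(LPUNKNOWN pUnk, HRESULT* phr);
    };

    // The output pin; FillBuffer is where the TS data is pushed downstream.
    class CTsOutputPin : public CSourceStream
    {
    public:
        CTsOutputPin(HRESULT* phr, CSource* pParent)
            : CSourceStream(NAME("TS Out"), phr, pParent, L"Output") {}

        HRESULT FillBuffer(IMediaSample* pSample) override;    // supply TS packets
        HRESULT GetMediaType(CMediaType* pMediaType) override; // e.g. MEDIATYPE_Stream
        HRESULT DecideBufferSize(IMemAllocator* pAlloc,
                                 ALLOCATOR_PROPERTIES* pRequest) override;
    };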
If you have problems decoding the audio/video, maybe first try a simple MPEG-2 stream, for example from a DVB source. Also make sure you have a decoder installed and that it accepts the format (for example, ffdshow has MPEG-2 decoding turned off by default).
Is it possible to use NSSpeechRecognizer with a pre-recorded audio file instead of direct microphone input?
Or is there any other speech-to-text framework for Objective-C/Cocoa available?
Added:
Rather than using voice at the machine that runs the application, external devices (e.g. an iPhone) could be used to send a recorded audio stream to that desktop application. The desktop Cocoa app would then process it and do whatever it is supposed to do using the assigned commands.
Thanks.
I don't see any obvious way to switch the input programmatically, though the "Speech" companion guide's first paragraph in the "Recognizing Speech" section seems to imply other inputs can be used. I think this is meant to be set via System Preferences, though. I'm guessing it uses the primary audio input device selected there.
I suspect, though, you're looking for open-ended speech recognition, which NSSpeechRecognizer is not. If you're looking to transform any pre-recorded audio into text (ie, make a transcript of a recording), you're completely out of luck with NSSpeechRecognizer, as you must give it an array of "commands" to listen for.
Theoretically, you could feed it the whole dictionary, but I don't think that would work, since you usually have to give it clear, distinct commands. Its performance would suffer, I would guess, if you gave it a large set of phrases to listen for in real time.
Your best bet is to look at third-party open-source solutions. There are a few generalized packages out there (none specifically for Cocoa/Objective-C), but this poses another question: what kind of recognition are you looking for? There are two main forms of speech recognition: "trained" is more accurate but less flexible across different voices and recording environments, whereas "open" is generally much less accurate.
It'd probably be best if you stated exactly what you're trying to accomplish.
I'm looking for an OS X (or Linux?) application that can receive data from a webcam/video input and let you do some image processing on the pixels in something similar to C, Python, or Perl; I'm not that bothered about the processing language.
I was considering throwing one together, but figured I'd try to find one that already exists before re-inventing the wheel.
I want to do some experiments with object detection and with reading dials and numbers.
If you're willing to do a little coding, you want to take a look at QTKit, the QuickTime framework for Cocoa. QTKit will let you easily set up an input source from the webcam (intro here). You can also apply Core Image filters to the stream (demo code here). If you want to use OpenGL to render or apply filters to the movie, check out Core Video (examples here).
Using the MyMovieFilter demo should get you up and running very quickly.
I found a cross-platform tool called "Processing"; I actually ran the Windows version to avoid further complications getting the webcams to work.
I had to install QuickTime, and something called gVid, to get it working, but after the initial hurdle the coding seems C-like (I think it gets "compiled" into Java), and it runs quite fast, even when scanning pixels from the webcam in real time.
I still have to get it working on OS X.
Depending on what processing you want to do (i.e. if it's a filter that's available in Apple's Core Image filter library), the built-in Photo Booth app may be all you need. There's a commercial set of add-on filters available from the Apple store as well (http://www.apple.com/downloads/macosx/imaging_3d/composerfxeffectsforphotobooth.html).