How to interpret Apple Core Media Frame Quanta numbers? - macos

Apple's Core Media framework on macOS offers an API called CMTimeCodeFormatDescriptionGetFrameQuanta.
I'm trying to use this in a Swift Mac program's user interface as part of a pick list of video formats available from webcams and other video sources. It returns an integer. The doc says this about the units of its result.
Returns the frames/sec for timecode (e.g. 30) OR frames/tick for counter mode.
How do I make sense of frames/tick? The built-in iSight camera gives back 0, 1932, or 35632, depending on the resolution of the format. A Logitech USB cam gives back mostly 73600, but sometimes 0 or 1932, also depending on resolution.
What does this number mean? How can I render it in a way that makes sense to end users, like 30fps or 24fps?

Related

Real time microphone audio manipulation windows

I would like to make an app (target: Windows PC) that lets you modify the microphone input in real time, for example by adding sound effects or even modulating your voice.
I searched the internet and only found people saying that it would not be possible without using a virtual audio cable.
However, I know of some apps with similar behavior (Voicemod, Resonance) that do not use a virtual audio cable, so I would like some help with how it can be done (just the name of a capable library would be enough) or where to start.
Firstly, you can use ready-made professional software for that: a digital audio workstation (DAW) combined with any of the huge number of available plugins.
See 5 steps to real-time process your instrument in the DAW.
And What is (audio) direct monitoring?
If you are sure you have to write your own, you can use libraries for real-time audio processing (as far as I know, C++ is better for this than C#).
These libraries really work. They are specially designed for real-time use.
https://github.com/thestk/rtaudio
http://www.portaudio.com/
See also https://en.wikipedia.org/wiki/Csound
If you don't have a professional sound interface yet but want to minimize latency, read about ASIO4ALL.
The linked tutorial worked for me. In it, a sound is recorded and saved to a .wav.
The key to having this stream to a speaker would be opening a SourceDataLine and outputting to that instead of writing to a wav file. So, instead of outputting on line 59 to AudioSystem.write, output to a SourceDataLine write method.
IDK if there will be a feedback issue. Probably good to output to headphones and not your speakers!
To add an effect, the AudioInputLine has to be accessed and processed in segments. In each segment the following needs to happen:
obtain the byte array from the AudioInputLine
convert the audio bytes to PCM
apply your audio effect to the PCM (if the effect is a volume change over time, this could be done by progressively altering a volume factor between 0 and 1 and multiplying each PCM value by that factor)
convert back to audio bytes
write to the SourceDataLine
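The middle three steps (bytes to PCM, effect, PCM back to bytes) can be sketched as below. This is only an illustration, written in C++ rather than the tutorial's language, and it assumes mono, 16-bit signed, little-endian audio. The function names (bytesToPcm, applyFade, pcmToBytes, processSegment) are made up for this sketch and do not come from any audio library. Reading from the input line and writing to the output line are left out.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode 16-bit signed little-endian bytes into PCM samples.
// A trailing odd byte, if any, is ignored.
std::vector<std::int16_t> bytesToPcm(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::int16_t> pcm(bytes.size() / 2);
    for (std::size_t i = 0; i < pcm.size(); ++i) {
        const unsigned lo = bytes[2 * i];
        const unsigned hi = bytes[2 * i + 1];
        pcm[i] = static_cast<std::int16_t>(
            static_cast<std::uint16_t>(lo | (hi << 8)));
    }
    return pcm;
}

// Multiply each sample by a volume factor that moves linearly
// from startGain (first sample) to endGain (last sample).
void applyFade(std::vector<std::int16_t>& pcm, double startGain, double endGain) {
    const std::size_t n = pcm.size();
    for (std::size_t i = 0; i < n; ++i) {
        const double t =
            (n > 1) ? static_cast<double>(i) / static_cast<double>(n - 1) : 0.0;
        const double gain = startGain + (endGain - startGain) * t;
        double v = pcm[i] * gain;
        if (v > 32767.0) v = 32767.0;    // clamp to the 16-bit range
        if (v < -32768.0) v = -32768.0;
        pcm[i] = static_cast<std::int16_t>(std::lround(v));
    }
}

// Encode PCM samples back into 16-bit little-endian bytes.
std::vector<std::uint8_t> pcmToBytes(const std::vector<std::int16_t>& pcm) {
    std::vector<std::uint8_t> bytes(pcm.size() * 2);
    for (std::size_t i = 0; i < pcm.size(); ++i) {
        const std::uint16_t u = static_cast<std::uint16_t>(pcm[i]);
        bytes[2 * i] = static_cast<std::uint8_t>(u & 0xFF);
        bytes[2 * i + 1] = static_cast<std::uint8_t>(u >> 8);
    }
    return bytes;
}

// Process one segment: decode, apply the effect, encode.
std::vector<std::uint8_t> processSegment(const std::vector<std::uint8_t>& in,
                                         double startGain, double endGain) {
    std::vector<std::int16_t> pcm = bytesToPcm(in);
    applyFade(pcm, startGain, endGain);
    return pcmToBytes(pcm);
}
```

To fade across several segments, the end factor of one segment would be passed as the start factor of the next, so the volume does not jump at segment boundaries.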
All these steps have been covered in StackOverflow posts.
The linked tutorial simplifies how file locations, threads, and stopping and starting are handled. But most importantly, it shows a working, live audio line from the microphone.

How to capture from WDM Streaming audio sources (Realtek HD Audio)

I'm trying to use various audio sources in DirectShow and I have these capture devices in my system which I think are quite common (provided by chipset drivers):
Realtek HD Audio Line input
Realtek HD Audio Stereo input
Realtek HD Audio Mic input
They look like capture sources, expose analog input and 24-bit PCM output, and can connect the output to other filters (renderer etc.).
But the return code from IMediaFilter::Run of the capture filter is ERROR_BAD_COMMAND which does not say much. I tried it in my program and also in GraphStudioNext which did not reveal any extra information.
Is it possible to use these for capture and how?
Update
For instance, I tried this graph with mic input (actually connected and working). In this setup, the graph does not start (ERROR_BAD_COMMAND) but with the other source, it would start.
This is the same device but with different drivers. The one that works is from the category "Audio capture sources"; the one that does not is from "WDM Streaming Capture Devices".
The easiest way to check the device with GraphStudioNext is to build a recording graph with the PCM audio input device itself, the AVI Mux filter and the File Writer filter connected like this (with default media types):
You hit Run and the recording graph produces a non-empty file via the File Writer in the location you were prompted for while building the graph.
--
So now I realized your question is a bit different. You see filters corresponding to your audio input device both under
Audio Capture Sources -- CLSID_AudioInputDeviceCategory
WDM Streaming Capture Devices -- AM_KSCATEGORY_CAPTURE
And the question is that the first filter works and the other does not.
A similar filter from AM_KSCATEGORY_CAPTURE seems to connect into the topology, but an attempt to run it triggers ERROR_BAD_COMMAND.
First of all, these are indeed different filters. Even though the underlying hardware might be the same, the "frontend" filters are different. The wrapper that "works" is the Audio Capture Filter backed by the WDM device. In the other case it is the generic WDM Filter Proxy, whose behavior is, generally speaking, undefined. The filter is not documented and, I am guessing, it either does not receive sufficient initialization or does not implement the required behavior, so this proxy is not, and is not supposed to be, interchangeable with the Audio Capture Filter.

How to use a video as an avcapture input

I'm trying to test a variety of camera inputs against an application. Since it would be prohibitive to get the webcam to do the exact same thing every time, and to change lenses, I would like to just shoot videos and use them as input.
I can see how to query OSX for AVCapture devices, but is it possible to create one and register it with the system, while feeding it frames from a saved video file?

Registry location of DirectShow audio capture devices

I am executing VLC from my application to capture and encode from a DirectShow audio capture device. VLC sends the encoded data to my application via STDOUT. I need a way to enumerate DirectShow audio capture devices. Unfortunately, VLC doesn't seem to provide any non-GUI way for this.
While looking for a simple way to get a list of device names, I stumbled on these registry keys where child keys are named after audio capture devices:
HKEY_CURRENT_USER\Software\Microsoft\ActiveMovie\devenum 64-bit\{33D9A762-90C8-11D0-BD43-00A0C911CE86}
HKEY_CURRENT_USER\Software\Microsoft\ActiveMovie\devenum\{33D9A762-90C8-11D0-BD43-00A0C911CE86}
Is this registry location guaranteed to be in the same place for other machines and recent versions of DirectX? Short of implementing a ton of DirectX code, is there some other way to get a list of the DirectShow audio device names? (Possibly through some output of a diagnostic tool.)
The list of DirectShow devices (DirectShow is a Windows core API, no longer a part of DirectX) is provided on request by enumerators that list a specific category (audio input devices in this case, CLSID_AudioInputDeviceCategory). This is the GUID in question, and the registry does not necessarily contain entries for all devices there. Instead, the enumerator provides the list of devices programmatically via the API, combining the available devices of different types.
There is no well-defined, documented way to affect the enumeration order.
The easiest way to enumerate the devices is Windows SDK GraphEdt.exe tool, or its nicer alternate options GraphStudio/GraphStudioNext. Ctrl+F and then select the category:
You can also enumerate devices and their capabilities with EnumerateAudioCaptureFilterCapabilities command line tool (source code), where "Friendly Name" lines list devices in enumeration order:
Moniker Display Name: #device:cm:{33D9A762-90C8-11D0-BD43-00A0C911CE86}\Stereo Mix (Realtek High Defini
Friendly Name: Stereo Mix (Realtek High Defini
Pin: Capture
Capability Count: 23
Capability 0:
AM_MEDIA_TYPE:
.bFixedSizeSamples: 1
.bTemporalCompression: 0
.lSampleSize: 4
.cbFormat: 18
WAVEFORMATEX:
.wFormatTag: 1
.nChannels: 2
.nSamplesPerSec: 44100
.nAvgBytesPerSec: 176400
.nBlockAlign: 4
.wBitsPerSample: 16
.cbSize: 0
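As a side note on reading this dump: for plain PCM (wFormatTag 1), the derived WAVEFORMATEX fields follow from the channel count, sample rate and bit depth. The small sketch below shows the arithmetic. PcmFormat and makePcmFormat are hypothetical names used only here, as a portable stand-in for the real Windows structure.

```cpp
#include <cstdint>

// Hypothetical stand-in for the WAVEFORMATEX fields shown in the dump.
struct PcmFormat {
    std::uint16_t channels;        // nChannels
    std::uint32_t samplesPerSec;   // nSamplesPerSec
    std::uint16_t bitsPerSample;   // wBitsPerSample
    std::uint16_t blockAlign;      // nBlockAlign
    std::uint32_t avgBytesPerSec;  // nAvgBytesPerSec
};

// Fill in the derived fields for uncompressed PCM.
PcmFormat makePcmFormat(std::uint16_t channels,
                        std::uint32_t samplesPerSec,
                        std::uint16_t bitsPerSample) {
    PcmFormat f{};
    f.channels = channels;
    f.samplesPerSec = samplesPerSec;
    f.bitsPerSample = bitsPerSample;
    // One block holds one sample for every channel.
    f.blockAlign = static_cast<std::uint16_t>(channels * bitsPerSample / 8);
    // Byte rate is blocks per second times bytes per block.
    f.avgBytesPerSec = samplesPerSec * f.blockAlign;
    return f;
}
```

For the capability above (2 channels, 44100 Hz, 16 bits) this gives a block alignment of 4 bytes and 176400 bytes per second, matching the dump.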
To affect the order, such as to place a device of interest at the top of the list, I can only think of API hooking, which is possible but not recommended for wide use because it alters standard system behavior.

DirectShow - How to read a file from a source filter

I'm writing a DirectShow source filter which is registered as a CLSID_VideoInputDeviceCategory, so it can be seen as a Video Capture Device (from Skype, for example, it is viewed as another WebCam).
My source filter is based on the VCam example from here, and, for now, the filter produces the exact output as this example (random colored pixels with one Video output pin, no audio yet), all implemented in the FillBuffer() method of the one and only output pin.
Now the real scenario will be a bit more tricky. The filter uses a file handle to a hardware device, opened using the CreateFile() API call (opening the device is out of my control, and is done by a third-party library). It should then read chunks of data from this handle (usually 256-512 byte chunks).
The device is a WinUSB device and the third-party framework just "gives" me an opened file handle to read chunks from.
The data read by the filter is a *.mp4 file, which is streamed from the device to the "handle".
This scenario is equivalent to a source filter reading from a *.mp4 file on the disk (in "chunks") and pushing its data to the DirectShow graph, but without the ability to read the file entirely from start to end, so the file size is unknown (Correct?).
I'm pretty new to DirectShow and I feel as though I'm missing some basic concepts. I'll be happy if anyone can direct me to solutions\resources\explanations for the following questions:
1) From various sources on the web and Microsoft SDK (v7.1) samples, I understood that for an application (such as Skype) to build a correct & valid DirectShow graph (so it will render the Video & Audio successfully), the source filter pin (inherits from CSourceStream) should implement the method "GetMediaType". Depending on the returned value from this implemented function, an application will be able to build the correct graph to render the data, thus, build the correct order of filters. If this is correct - How would I implement it in my case so that the graph will be built to render *.mp4 input in chunks (we can assume constant chunk sizes)?
2) I've noticed that the FillBuffer() method is supposed to call SetTime() for the IMediaSample object it gets (and fills). I'm reading raw *.mp4 data from the device. Will I have to parse the data and extract the frames & time values from the stream? If yes, an example would be great.
3) Will I have to split the data received from the file handle (the "chunks") into Video & Audio, or can the data be pushed to the graph without the need to manipulate it in the source filter? If a split is needed, how can it be done (the data is not continuous, and is split into chunks), and will this affect the desired implementation of "GetMediaType"?
Please feel free to correct me if I'm using incorrect terminology.
Thanks :-)
This is a good question. This is doable, but there are some specifics involved.
First of all, your filter registered under the CLSID_VideoInputDeviceCategory category is expected to behave as a live video source. By registering it there you make it discoverable by applications (such as Skype, as you mentioned), and those applications will attempt to configure the video resolution and will expect video to arrive at a real-time rate. Some applications (such as Skype) do not expect compressed video such as H.264 there, or would just reject such a device. Nor can you attach audio directly to this filter, as applications would not even look for audio there (I am not sure if you have audio on your filter, but you mentioned an .MP4 file, so audio might be there).
On your questions:
1 - You would have a better picture of application requirements by checking what interface methods applications call on your filter. Most of the methods are implemented by BaseClasses and convert the calls into internal methods such as GetMediaType. Yes, you need to implement it, and by doing so you will, among other things, enable your filter to connect with downstream filter pins by trying the specific media types you support.
Again, those cannot be MP4 chunks, even if such an approach can work in other DirectShow graphs. When implementing a video capture device you should be delivering exactly video frames, preferably decompressed (well, those could be compressed too, but you are going to immediately have compatibility issues with applications).
A solution you might be thinking of is to embed a fully featured graph internally, into which you inject your MP4 chunks. The pipeline then parses and decodes them and delivers frames to your custom renderer, from which you re-expose them through your virtual device. This might be a good design, though it assumes a certain understanding of how filters work internally.
2 - Your device is typically treated as/expected to be a live source, which means that you deliver video in realtime and frames are not necessarily time stamped. So you can put times there and yes you definitely need to extract time stamps from your original media (or have it done by internal graph as mentioned in item 1 above), however be prepared that applications strip time stamps especially for preview purposes, since the source is "live".
3 - Getting back to audio, you cannot implement audio on the same virtual device. Well, you can, and this filter might even work in a custom-built graph, but this is not going to work with applications. They will be looking for a separate audio device, and if you implement one, they will instantiate it separately. So you are expected to implement both a virtual video source and a virtual audio source, and implement internal synchronization behind the scenes. This is where timestamps will be important: by providing them correctly you will keep lip sync in the live session as it was originally in the media file you are streaming from.
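To illustrate the timestamps mentioned in items 2 and 3: DirectShow sample times are expressed in 100-nanosecond units. The sketch below shows one way the start and end times of a video frame and of an audio buffer could be computed from a common zero point, assuming a constant frame rate and sample rate. SampleTimes, videoFrameTimes and audioBufferTimes are hypothetical names for this example, not DirectShow APIs. In a real filter the resulting values would be passed to IMediaSample::SetTime.

```cpp
#include <cstdint>

// DirectShow reference time counts 100-nanosecond units.
constexpr std::int64_t kUnitsPerSecond = 10000000;

// Start and end time of one media sample, in 100 ns units.
struct SampleTimes {
    std::int64_t start;
    std::int64_t end;
};

// Times for video frame number frameIndex (0-based) at a constant
// rate of rateNum / rateDen frames per second, e.g. 30000 / 1001.
SampleTimes videoFrameTimes(std::int64_t frameIndex,
                            std::int64_t rateNum,
                            std::int64_t rateDen) {
    const std::int64_t start =
        frameIndex * kUnitsPerSecond * rateDen / rateNum;
    const std::int64_t end =
        (frameIndex + 1) * kUnitsPerSecond * rateDen / rateNum;
    return {start, end};
}

// Times for an audio buffer that begins at sample firstSampleIndex
// and holds sampleCount samples per channel.
SampleTimes audioBufferTimes(std::int64_t firstSampleIndex,
                             std::int64_t sampleCount,
                             std::int64_t samplesPerSec) {
    const std::int64_t start =
        firstSampleIndex * kUnitsPerSecond / samplesPerSec;
    const std::int64_t end =
        (firstSampleIndex + sampleCount) * kUnitsPerSecond / samplesPerSec;
    return {start, end};
}
```

Computing each time from the running frame or sample index, rather than adding a rounded duration repeatedly, avoids accumulating rounding error between the two streams.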
