WMV encoding using Media Foundation: specifying "Number of B Frames"

I am encoding video to WMV using Media Foundation SDK. I see that the number of B frames can be set using a property, but I have no clue how/where to set it.
That property is called MFPKEY_NUMBFRAMES and is described here:
http://msdn.microsoft.com/en-us/library/windows/desktop/ff819354%28v=vs.85%29.aspx
Our code does roughly the following:
call MFStartup
call MFCreateAttributes once so we can set muxer, video and audio attributes
configure the IMFAttributes created in the previous step, for example by setting the video bitrate: pVideoOverrides->SetUINT32(MF_MT_AVG_BITRATE, m_iVideoBitrateBPS);
create sink writer by calling IMFReadWriteClassFactory::CreateInstanceFromURL
for each frame, call WriteSample on the sink writer
call MFShutdown
Am I supposed to set the b-frames property on the IMFAttribute on which I also set the video bitrate?

The property applies to the Windows Media Video 9 Encoder. That is, you need to locate that encoder on your topology and adjust the property there. Other topology elements (e.g. the multiplexer) might accept other properties, but this one has no effect on them.
MSDN gives you step-by-step instructions in Configuring a WMV Encoder, where it says:
To specify the target bitrate, set the MF_MT_AVG_BITRATE attribute on the media type.
You can alter other encoder properties in the same way. There is also a detailed step-by-step Tutorial: 1-Pass Windows Media Encoding which walks through the entire process.
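In practice, with the sink writer used in the question, one way to reach the encoder is IMFSinkWriterEx::GetTransformForStream (available on Windows 8 and later) and then set MFPKEY_NUMBFRAMES on the encoder's IPropertyStore. The sketch below is only my illustration of that idea, not code from the question or the MSDN pages: the helper name and stream index are placeholders, and it assumes the call is made after SetInputMediaType but before BeginWriting. Linking wmcodecdspuuid.lib provides the property key definition.

#include <wrl/client.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <propsys.h>
#include <wmcodecdsp.h>   // MFPKEY_NUMBFRAMES; link wmcodecdspuuid.lib

// Sketch: set "Number of B Frames" on the WMV encoder hosted by the sink writer.
HRESULT SetNumberOfBFrames(IMFSinkWriter *pWriter, DWORD videoStreamIndex, LONG numBFrames)
{
    Microsoft::WRL::ComPtr<IMFSinkWriterEx> writerEx;
    HRESULT hr = pWriter->QueryInterface(IID_PPV_ARGS(&writerEx));
    if (FAILED(hr)) return hr;

    // Walk the transforms attached to the video stream until one exposes
    // IPropertyStore (the Windows Media Video 9 Encoder does).
    for (DWORD i = 0; ; ++i)
    {
        GUID category = GUID_NULL;
        Microsoft::WRL::ComPtr<IMFTransform> transform;
        hr = writerEx->GetTransformForStream(videoStreamIndex, i, &category, &transform);
        if (FAILED(hr)) return hr;   // no more transforms: give up with the error

        Microsoft::WRL::ComPtr<IPropertyStore> props;
        if (SUCCEEDED(transform.As(&props)))
        {
            PROPVARIANT var;
            PropVariantInit(&var);
            var.vt = VT_I4;          // MFPKEY_NUMBFRAMES is a VT_I4 property
            var.lVal = numBFrames;
            hr = props->SetValue(MFPKEY_NUMBFRAMES, var);
            PropVariantClear(&var);
            return hr;
        }
    }
}

So, to answer the question directly: no, this is not the same IMFAttributes on which MF_MT_AVG_BITRATE is set; that attribute belongs to the media type, whereas MFPKEY_NUMBFRAMES is a codec property of the encoder itself.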

Related

Name of the property that specifies the desired bit rate in CBR audio encoding

I'm trying to configure a "Windows Media Audio Standard" DMO codec to compress in single-pass, constant bit-rate (CBR) mode. Unfortunately, I cannot find in the MSDN documentation how to pass the desired bit rate to the encoder object.
In other words, I'm looking for the equivalent of MFPKEY_RMAX, which seems to be the desired bit-rate setting for two-pass variable bit-rate encoding, but for single-pass CBR encoding.
Finally found it.
The key I required is MF_MT_AUDIO_AVG_BYTES_PER_SECOND and is documented here:
Choose the encoding bit rate.
For CBR encoding, you must know the bit rate at which you want to encode the stream before the encoding session begins. You must set the bit rate while you are configuring the encoder. To do this, while you are performing media type negotiation, check the MF_MT_AUDIO_AVG_BYTES_PER_SECOND attribute (for audio streams) or the MF_MT_AVG_BITRATE attribute (for video streams) of the available output media types and choose an output media type that has the average bit rate closest to the target bit rate you want to achieve. For more information, see Media Type Negotiation on the Encoder.
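In code, that selection can be done by enumerating the encoder's available output types and keeping the one whose MF_MT_AUDIO_AVG_BYTES_PER_SECOND (the bit rate divided by 8) is closest to the target. The sketch below is my own illustration rather than code from the documentation: it talks to the encoder through its MFT interface (the DMO path via IMediaObject::GetOutputType is analogous) and assumes CBR mode has already been selected on the encoder.

#include <cstdint>
#include <wrl/client.h>
#include <mfapi.h>
#include <mferror.h>
#include <mftransform.h>

// Sketch: pick the available output type whose average byte rate is closest to the target.
HRESULT PickClosestOutputType(IMFTransform *pEncoder, UINT32 targetBytesPerSecond,
                              IMFMediaType **ppChosen)
{
    *ppChosen = nullptr;
    Microsoft::WRL::ComPtr<IMFMediaType> best;
    UINT32 bestDelta = UINT32_MAX;

    for (DWORD i = 0; ; ++i)
    {
        Microsoft::WRL::ComPtr<IMFMediaType> type;
        HRESULT hr = pEncoder->GetOutputAvailableType(0, i, &type);
        if (hr == MF_E_NO_MORE_TYPES) break;
        if (FAILED(hr)) return hr;

        UINT32 bps = MFGetAttributeUINT32(type.Get(), MF_MT_AUDIO_AVG_BYTES_PER_SECOND, 0);
        UINT32 delta = (bps > targetBytesPerSecond) ? bps - targetBytesPerSecond
                                                    : targetBytesPerSecond - bps;
        if (delta < bestDelta) { bestDelta = delta; best = type; }
    }

    if (!best) return E_FAIL;
    *ppChosen = best.Detach();
    return S_OK;
}

The chosen type is then applied with pEncoder->SetOutputType(0, *ppChosen, 0).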

AddSourceFilter behavior

The following code successfully renders an MPG file that has no audio:
IBaseFilter *pRenderer;
CoCreateInstance(CLSID_VideoRenderer, NULL, CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&pRenderer));
IFileSourceFilter *pSourceFilter;
IBaseFilter *pBaseFilter;
CoCreateInstance(CLSID_AsyncReader, NULL, CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&pSourceFilter));
pSourceFilter->QueryInterface(IID_PPV_ARGS(&pBaseFilter));
pGraphBuilder->AddFilter(pRenderer, L"Renderer Filter");
pSourceFilter->Load(filename, NULL);
pGraphBuilder->AddFilter(pBaseFilter, L"File Source Filter");
But it fails with a WMV file that has audio. The failure happens at the following call, when I connect the only output of the video source with the only input of the video renderer.
pGraphBuilder->Connect(pOutPin[0], pInPin[0])
Which returns -2147220969. If I replace the code above with the following:
IBaseFilter *pRenderer;
CoCreateInstance(CLSID_VideoRenderer, NULL, CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&pRenderer));
IBaseFilter *pBaseFilter;
pGraphBuilder->AddSourceFilter(filename, L"Renderer Filter", &pBaseFilter);
pGraphBuilder->AddFilter(pRenderer, L"Renderer Filter");
then the MPG plays fine with:
pGraphBuilder->Connect(pOutPin[0], pInPin[0])
while the WMV gives the same error as above; it does play, however (upside down), when I instead use:
pGraphBuilder->Connect(pOutPin[1], pInPin[0])
All of this means that the second coding style creates a source with two output pins, and probably audio is mapped to the first one. Or, maybe, an A/V splitter is inserted automatically by DirectShow.
My understanding is that AddSourceFilter can create a splitter transparently. Is it correct?
If I want to do it manually, which component should I use?
Why does the WMV video render upside-down?
Which returns -2147220969
Which is 0x80040217 VFW_E_CANNOT_CONNECT "No combination of intermediate filters could be found to make the connection."
which is the result of your manually adding CLSID_AsyncReader: Windows Media files are typically rendered through a different source filter (use GraphEdit from the Windows SDK to render the file and you will be able to inspect the resulting topology).
My understanding is that AddSourceFilter can create a splitter transparently. Is it correct?
Yes, if the splitter is compatible with the Async Reader, which is not the case here.
If I want to do it manually, which component should I use?
Use GraphEdit to create topologies interactively and you will have an idea of what to do in code.
Why does the WMV video render upside-down?
Because of the topology. Most likely you have an odd combination of filters in the pipeline, including third-party ones. Inspecting the effective topology is the key to resolving the problem.
Use pGraphBuilder->AddSourceFilter() to add the source filter for a specific file. Don't assume that the File Source (Async) is the right source filter (for some formats, the source and demux are combined into a single filter).
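As a minimal sketch of that advice (my own code, with a hypothetical helper name and reduced error handling): let AddSourceFilter pick the proper source filter for the file, then render its output pins so the graph builder inserts the splitter, decoders and default renderers automatically.

#include <dshow.h>
#pragma comment(lib, "strmiids.lib")

// Sketch: add the correct source filter for the file and render every output pin it exposes.
HRESULT RenderFileWithProperSource(IGraphBuilder *pGraphBuilder, LPCWSTR filename)
{
    IBaseFilter *pSource = nullptr;
    HRESULT hr = pGraphBuilder->AddSourceFilter(filename, L"Source", &pSource);
    if (FAILED(hr)) return hr;

    IEnumPins *pEnum = nullptr;
    hr = pSource->EnumPins(&pEnum);
    if (SUCCEEDED(hr))
    {
        IPin *pPin = nullptr;
        while (pEnum->Next(1, &pPin, nullptr) == S_OK)
        {
            PIN_DIRECTION dir;
            pPin->QueryDirection(&dir);
            if (dir == PINDIR_OUTPUT)
                pGraphBuilder->Render(pPin);   // intermediate filters are added for you
            pPin->Release();
        }
        pEnum->Release();
    }

    pSource->Release();
    return hr;
}

If you need your own video renderer in the graph, add it with AddFilter before connecting and use IGraphBuilder::Connect from the demultiplexer's video output pin; intelligent connect will still insert any decoder required.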

Build an encoder on Android (FFmpeg)

I need to build an encoder on Android, to encode the video stream captured by the camera to H.264.
I've got the libffmpeg.so file, but I don't know how to use it.
I'm new to this. Could anyone give me some suggestions?
To use the FFmpeg libraries on Android, you would have to integrate them as OMX components.
For ffmpeg compilation and OMX generation, you could refer to this link: FFmpeg on Android
Once you have the OMX component ready, you will have to integrate it into Android by registering it in media_codecs.xml. If you want your encoder to always be the one invoked, make sure it is the first codec registered in the list.
For the encoder, you will have to consider a couple of important points.
One, if you wish to optimize your system, then you may want to avoid copying of frames from the source (camera, surface or some other source) to the input port of your OMX encoder component. Hence, your codec will have to support passing of buffers through metadata (Reference: http://androidxref.com/4.2.2_r1/xref/frameworks/av/media/libmediaplayerservice/StagefrightRecorder.cpp#1413). If you require more information on this topic, please raise a separate question.
Two, the encoder will have to support the standard OMX indices as well as some new ones. For example, for Miracast, a new index, prependSPSPPSToIDRFrames, is introduced, which is supported through getExtensionIndex. For reference, see http://androidxref.com/4.2.2_r1/xref/frameworks/av/media/libstagefright/ACodec.cpp#891 .
In addition to the aforementioned index, the encoder will also get a request to enableGraphicBuffers with a FALSE boolean value. The most important point for these two indices is to ensure that the OMX component doesn't fail when they are invoked.
With these modifications, you should be able to integrate your encoder into Stagefright framework.
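As a rough illustration of the two points above (my own sketch, not from the answer): the extension strings are the ones the Android 4.2 framework queries, but the vendor index values, the function name and the surrounding component code are hypothetical, and a real component also needs the matching GetParameter/SetParameter handling.

#include <string.h>
#include <OMX_Core.h>
#include <OMX_Index.h>

// Hypothetical vendor-defined parameter indices for this component.
enum {
    kIndexStoreMetaDataInBuffers     = OMX_IndexVendorStartUnused + 1,
    kIndexPrependSPSPPSToIDRFrames   = OMX_IndexVendorStartUnused + 2,
    kIndexEnableAndroidNativeBuffers = OMX_IndexVendorStartUnused + 3,
};

// Sketch of the component's GetExtensionIndex entry point: map the Android-specific
// extension names to vendor indices and never fail hard on them.
OMX_ERRORTYPE MyEncoderGetExtensionIndex(
        OMX_HANDLETYPE /*hComponent*/, OMX_STRING name, OMX_INDEXTYPE *index)
{
    if (!strcmp(name, "OMX.google.android.index.storeMetaDataInBuffers")) {
        *index = (OMX_INDEXTYPE)kIndexStoreMetaDataInBuffers;      // point one: metadata buffers
        return OMX_ErrorNone;
    }
    if (!strcmp(name, "OMX.google.android.index.prependSPSPPSToIDRFrames")) {
        *index = (OMX_INDEXTYPE)kIndexPrependSPSPPSToIDRFrames;    // point two: Miracast
        return OMX_ErrorNone;
    }
    if (!strcmp(name, "OMX.google.android.index.enableAndroidNativeBuffers")) {
        *index = (OMX_INDEXTYPE)kIndexEnableAndroidNativeBuffers;  // queried by enableGraphicBuffers
        return OMX_ErrorNone;
    }
    return OMX_ErrorUnsupportedIndex;
}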

What is the minimum amount of metadata needed to stream video only, using libx264 to encode at the server and libffmpeg to decode at the client?

I want to stream video (no audio) from a server to a client. I will encode the video using libx264 and decode it with ffmpeg. I plan to use fixed settings (at the very least they will be known in advance by both the client and the server). I was wondering if I can avoid wrapping the compressed video in a container format (like mp4 or mkv).
Right now I am able to encode my frames using x264_encoder_encode. I get a compressed frame back, and I can do that for every frame. What extra information (if anything at all) do I need to send to the client so that ffmpeg can decode the compressed frames, and, more importantly, how can I obtain it with libx264? I assume I may need to generate NAL information (x264_nal_encode?). Having an idea of the minimum necessary to get the video across, and how to put the pieces together, would be really helpful.
I found out that the minimum amount of information is the NAL units from each frame; these give me a raw H.264 stream. If I write that stream to a file with a .h264 extension, I can watch it using VLC.
I can also open such a file using ffmpeg, but if I want to stream it, then it makes more sense to use RTSP, and a good open source library for that is Live555: http://www.live555.com/liveMedia/
In their FAQ they mention how to send the output from your encoder to Live555, and there is source code for both a client and a server. I have yet to finish coding this, but it seems like a reasonable solution.
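For reference, here is a minimal sketch of the file-writing part described above. It is my own illustration: the resolution, preset, frame count and file name are arbitrary, the input pictures are just whatever x264_picture_alloc provides, and error handling is minimal. With the default b_annexb = 1 the NAL payloads already carry start codes and are laid out contiguously, so the encoder output can be written back to back to produce a raw Annex-B .h264 stream; x264_encoder_headers() can additionally be used if SPS/PPS must be handed to a streaming library out of band.

#include <cstdio>
extern "C" {
#include <x264.h>
}

int main()
{
    const int width = 640, height = 480;

    x264_param_t param;
    x264_param_default_preset(&param, "veryfast", "zerolatency");
    param.i_width  = width;
    param.i_height = height;
    param.i_csp    = X264_CSP_I420;
    x264_param_apply_profile(&param, "baseline");

    x264_t *enc = x264_encoder_open(&param);
    if (!enc) return 1;

    x264_picture_t pic_in, pic_out;
    x264_picture_alloc(&pic_in, X264_CSP_I420, width, height);

    FILE *out = std::fopen("stream.h264", "wb");
    if (!out) return 1;

    for (int frame = 0; frame < 100; ++frame)   // feed 100 dummy frames
    {
        pic_in.i_pts = frame;

        x264_nal_t *nals = nullptr;
        int num_nals = 0;
        int size = x264_encoder_encode(enc, &nals, &num_nals, &pic_in, &pic_out);
        if (size > 0)
            std::fwrite(nals[0].p_payload, 1, size, out);   // all NAL payloads are contiguous
    }

    while (x264_encoder_delayed_frames(enc) > 0)            // flush delayed frames
    {
        x264_nal_t *nals = nullptr;
        int num_nals = 0;
        int size = x264_encoder_encode(enc, &nals, &num_nals, nullptr, &pic_out);
        if (size > 0)
            std::fwrite(nals[0].p_payload, 1, size, out);
    }

    std::fclose(out);
    x264_picture_clean(&pic_in);
    x264_encoder_close(enc);
    return 0;
}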

Can I get raw video frames from DirectShow without playback

I'm working on a media player using Media Foundation. I want to support VOB file playback. However, Media Foundation currently does not support the VOB container, so I want to use DirectShow for this.
My idea is not to take an alternate path using a DirectShow graph, but just to grab a video frame and pass it to the same pipeline in Media Foundation. In Media Foundation, I have an IMFSourceReader which simply reads frames from the video file. Is there a DirectShow equivalent which just gives me the frames, without needing to create a graph, start a playback cycle, and then try to extract frames from the renderer's pin? (To be clearer: does DirectShow support an architecture wherein it could give me raw frames without actually having to play the video?)
I've read about ISampleGrabber, but it's deprecated and I think it won't fit my architecture. I've not worked with DirectShow before.
You have to build a graph and accept frames from the respective parser/demultiplexer filter, which reads the container and delivers individual frames on its output.
The playback does not have to be real-time, nor do you need to fake painting those video frames somewhere. Once you get the data you need in a Sample Grabber filter, or a custom filter, you can terminate the pipeline with a Null Renderer. That is, you can arrange getting the frames you need in a more or less convenient way.
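A minimal sketch of such a pipeline, assuming the deprecated Sample Grabber from qedit.h (shipped with older SDKs) is acceptable; the helper name and the callback implementation are left to you, error handling is trimmed, and the capture graph builder is assumed to be already attached to the graph with SetFiltergraph.

#include <dshow.h>
#include <qedit.h>   // ISampleGrabber, CLSID_SampleGrabber, CLSID_NullRenderer (older SDKs)
#pragma comment(lib, "strmiids.lib")

// Sketch: source -> Sample Grabber -> Null Renderer, so frames reach your callback
// without ever being drawn on screen.
HRESULT BuildGrabberGraph(IGraphBuilder *pGraph, ICaptureGraphBuilder2 *pBuilder,
                          LPCWSTR filename, ISampleGrabberCB *pCallback)
{
    IBaseFilter *pSource = nullptr, *pGrabberF = nullptr, *pNull = nullptr;
    ISampleGrabber *pGrabber = nullptr;

    pGraph->AddSourceFilter(filename, L"Source", &pSource);

    CoCreateInstance(CLSID_SampleGrabber, nullptr, CLSCTX_INPROC_SERVER,
                     IID_PPV_ARGS(&pGrabberF));
    pGraph->AddFilter(pGrabberF, L"Sample Grabber");
    pGrabberF->QueryInterface(IID_PPV_ARGS(&pGrabber));

    // Ask for uncompressed RGB24 so a decoder is inserted upstream automatically.
    AM_MEDIA_TYPE mt = {};
    mt.majortype = MEDIATYPE_Video;
    mt.subtype   = MEDIASUBTYPE_RGB24;
    pGrabber->SetMediaType(&mt);
    pGrabber->SetCallback(pCallback, 1);   // 1 = BufferCB receives a copy of each frame

    CoCreateInstance(CLSID_NullRenderer, nullptr, CLSCTX_INPROC_SERVER,
                     IID_PPV_ARGS(&pNull));
    pGraph->AddFilter(pNull, L"Null Renderer");

    // Connect source -> grabber -> null renderer; the splitter and decoder are
    // inserted in between by the graph builder.
    HRESULT hr = pBuilder->RenderStream(nullptr, &MEDIATYPE_Video,
                                        pSource, pGrabberF, pNull);

    pNull->Release(); pGrabber->Release(); pGrabberF->Release(); pSource->Release();
    return hr;
}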
You can use the Monogram frame grabber filter to connect to the VOB DirectShow filter's output; it works great. See the comments there for how to connect the output to an external application.
