FFMPEG - H.264 encoding BGR image data to YUV420P video file resulting in empty video - ffmpeg

I'm new to FFmpeg and trying to use it to do some screen capture to a video file, but after a lot of online searching I am stumped as to what I'm doing wrong. Basically, I've already done the work of capturing screen data via DirectX, which stores it in a BGR pixel format, and I'm just trying to put each frame into a video file. There are two functions: setup, which does all the FFmpeg initialization work, and addImage, which is called in the main program loop and puts each buffer of BGR image data into the video file. The technique I'm using is to make two frames, one with the BGR data and one with YUV420P (it doesn't need to be the latter, but after a lot of trial and error it was all I was able to get working with H.264), use sws_scale to copy data between the two, and then send that frame to video.mp4. The file does seem to be written to successfully (the file size grows and grows as the program runs), but when I try to view it in VLC I see nothing: VLC fails to fetch a length for the video, and both the codec and media information dialogs are empty. I turned on FFmpeg verbose logging, but all that comes out is the following:
Setting default whitelist 'Epu��'
Timestamps are unset in a packet for stream -1259342440. This is deprecated and will stop working in the future. Fix your code to set the timestamps properly
Encoder did not produce proper pts, making some up.
From what I am reading, I understand these to be warnings rather than errors that would totally corrupt my video file. I separately went through all the error codes being returned and everything seems nominal to me (zero for success on most calls, -11 sometimes from avcodec_receive_packet, but the docs indicate that's expected sometimes).
Based on my understanding of things as they are, this should be working, but isn't, and the logs and error codes give me nothing to go on, so I reckon someone with experience with this could save me a ton of time. The code is as follows:
VideoService.h
#ifndef VIDEO_SERVICE_H
#define VIDEO_SERVICE_H
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavutil/imgutils.h>
#include <libswscale/swscale.h>
}
class VideoService {
public:
void setup();
void addImage(unsigned char* data, int lineSize, int width, int height, int align);
private:
AVCodecContext* context;
AVFormatContext* formatContext;
AVFrame* bgrFrame;
AVFrame* yuvFrame;
AVStream* videoStream;
SwsContext* swsContext;
};
#endif
VideoService.cpp
#include "VideoService.h"
#include <stdio.h>
void FfmpegLogCallback(void *ptr, int level, const char *fmt, va_list vargs)
{
FILE* f = fopen("ffmpeg.txt", "a");
fprintf(f, fmt, vargs);
fclose(f);
}
void VideoService::setup() {
int result = 0;
av_log_set_level(AV_LOG_VERBOSE);
av_log_set_callback(FfmpegLogCallback);
bgrFrame = av_frame_alloc();
bgrFrame->width = 1920;
bgrFrame->height = 1080;
bgrFrame->format = AV_PIX_FMT_BGRA;
bgrFrame->time_base.num = 1;
bgrFrame->time_base.den = 60;
result = av_frame_get_buffer(bgrFrame, 1);
yuvFrame = av_frame_alloc();
yuvFrame->width = 1920;
yuvFrame->height = 1080;
yuvFrame->format = AV_PIX_FMT_YUV420P;
yuvFrame->time_base.num = 1;
yuvFrame->time_base.den = 60;
result = av_frame_get_buffer(yuvFrame, 1);
const AVOutputFormat* outputFormat = av_guess_format("mp4", "video.mp4", "video/mp4");
result = avformat_alloc_output_context2(
&formatContext,
outputFormat,
"mp4",
"video.mp4"
);
formatContext->oformat = outputFormat;
const AVCodec* codec = avcodec_find_encoder(AVCodecID::AV_CODEC_ID_H264);
result = avio_open2(&formatContext->pb, "video.mp4", AVIO_FLAG_WRITE, NULL, NULL);
videoStream = avformat_new_stream(formatContext, codec);
AVCodecParameters* codecParameters = videoStream->codecpar;
codecParameters->codec_type = AVMediaType::AVMEDIA_TYPE_VIDEO;
codecParameters->codec_id = AVCodecID::AV_CODEC_ID_HEVC;
codecParameters->width = 1920;
codecParameters->height = 1080;
codecParameters->format = AVPixelFormat::AV_PIX_FMT_YUV420P;
videoStream->time_base.num = 1;
videoStream->time_base.den = 60;
result = avformat_write_header(formatContext, NULL);
codec = avcodec_find_encoder(videoStream->codecpar->codec_id);
context = avcodec_alloc_context3(codec);
context->time_base.num = 1;
context->time_base.den = 60;
avcodec_parameters_to_context(context, videoStream->codecpar);
result = avcodec_open2(context, codec, nullptr);
swsContext = sws_getContext(1920, 1080, AV_PIX_FMT_BGRA, 1920, 1080, AV_PIX_FMT_YUV420P, 0, 0, 0, 0);
}
void VideoService::addImage(unsigned char* data, int lineSize, int width, int height, int align) {
int result = 0;
result = av_image_fill_arrays(bgrFrame->data, bgrFrame->linesize, data, AV_PIX_FMT_BGRA, 1920, 1080, 1);
sws_scale(swsContext, bgrFrame->data, bgrFrame->linesize, 0, 1080, &yuvFrame->data[0], yuvFrame->linesize);
result = avcodec_send_frame(context, yuvFrame);
AVPacket *packet = av_packet_alloc();
result = avcodec_receive_packet(context, packet);
if (result != 0) {
return;
}
result = av_interleaved_write_frame(formatContext, packet);
}
My environment is Windows 10, I'm building with clang++ 12.0.1, and using the FFmpeg 5.1 libs.

See the official sample, muxing.c.
Fix your code as follows:
Set the fields of the AVCodecContext and call avcodec_parameters_from_context(), instead of calling avcodec_parameters_to_context(). You should set at least width, height, bit_rate, pix_fmt, framerate and time_base. (See the implementation of add_stream() in the sample.)
Specify a scaling algorithm such as SWS_BILINEAR when calling sws_getContext(). (A default algorithm is selected if you pass 0, but that is undocumented behavior.)
Set the pts (presentation timestamp) field of each AVFrame.
Implement a loop that calls avcodec_receive_packet() after each avcodec_send_frame(); see write_frame() in the sample. (A single frame can result in multiple packets.) A combined sketch of these fixes follows below.
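Putting those points together, a rough and untested sketch in the spirit of muxing.c might look like the following (FFmpeg 5.x API; the bitrate, the 120 blank test frames, and the name writeTestVideo are illustrative, and error checking is omitted for brevity):
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
}

// Encodes a number of blank frames to "video.mp4" just to show the call order.
static void writeTestVideo()
{
    AVFormatContext* fmtCtx = nullptr;
    avformat_alloc_output_context2(&fmtCtx, nullptr, nullptr, "video.mp4");

    const AVCodec* codec = avcodec_find_encoder(AV_CODEC_ID_H264);
    AVStream* stream = avformat_new_stream(fmtCtx, codec);

    // Configure the encoder context first...
    AVCodecContext* ctx = avcodec_alloc_context3(codec);
    ctx->width = 1920;
    ctx->height = 1080;
    ctx->pix_fmt = AV_PIX_FMT_YUV420P;
    ctx->time_base = {1, 60};
    ctx->framerate = {60, 1};
    ctx->bit_rate = 4000000; // illustrative value
    if (fmtCtx->oformat->flags & AVFMT_GLOBALHEADER)
        ctx->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
    avcodec_open2(ctx, codec, nullptr);

    // ...then copy its parameters into the stream (not the other way around).
    avcodec_parameters_from_context(stream->codecpar, ctx);
    stream->time_base = ctx->time_base;

    avio_open(&fmtCtx->pb, "video.mp4", AVIO_FLAG_WRITE);
    avformat_write_header(fmtCtx, nullptr); // may adjust stream->time_base

    AVFrame* yuv = av_frame_alloc();
    yuv->width = ctx->width;
    yuv->height = ctx->height;
    yuv->format = ctx->pix_fmt;
    av_frame_get_buffer(yuv, 0);

    AVPacket* pkt = av_packet_alloc();
    for (int64_t i = 0; i < 120; i++) {
        av_frame_make_writable(yuv);
        // Normally you would sws_scale() the BGRA capture into 'yuv' here,
        // using a context created with an explicit flag such as SWS_BILINEAR.
        yuv->pts = i; // counted in ctx->time_base units
        avcodec_send_frame(ctx, yuv);
        while (avcodec_receive_packet(ctx, pkt) == 0) { // one frame may yield several packets
            av_packet_rescale_ts(pkt, ctx->time_base, stream->time_base);
            pkt->stream_index = stream->index;
            av_interleaved_write_frame(fmtCtx, pkt);
            av_packet_unref(pkt);
        }
    }
    // Flush the encoder, then finish the file.
    avcodec_send_frame(ctx, nullptr);
    while (avcodec_receive_packet(ctx, pkt) == 0) {
        av_packet_rescale_ts(pkt, ctx->time_base, stream->time_base);
        pkt->stream_index = stream->index;
        av_interleaved_write_frame(fmtCtx, pkt);
        av_packet_unref(pkt);
    }
    av_write_trailer(fmtCtx);
    av_packet_free(&pkt);
    av_frame_free(&yuv);
    avcodec_free_context(&ctx);
    avio_closep(&fmtCtx->pb);
    avformat_free_context(fmtCtx);
}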

Related

Encoding of raw frames (D3D11Texture2D) to an rtsp stream using libav*

I have managed to create an RTSP stream using libav* and a DirectX texture (which I am obtaining from the GDI API using the BitBlt method). Here's my approach for creating the live RTSP stream:
Create output context and stream (skipping the checks here)
avformat_alloc_output_context2(&ofmt_ctx, NULL, "rtsp", rtsp_url); //RTSP
vid_codec = avcodec_find_encoder(ofmt_ctx->oformat->video_codec);
vid_stream = avformat_new_stream(ofmt_ctx,vid_codec);
vid_codec_ctx = avcodec_alloc_context3(vid_codec);
Set codec params
codec_ctx->codec_tag = 0;
codec_ctx->codec_id = ofmt_ctx->oformat->video_codec;
//codec_ctx->codec_type = AVMEDIA_TYPE_VIDEO;
codec_ctx->width = width; codec_ctx->height = height;
codec_ctx->gop_size = 12;
//codec_ctx->gop_size = 40;
//codec_ctx->max_b_frames = 3;
codec_ctx->pix_fmt = target_pix_fmt; // AV_PIX_FMT_YUV420P
codec_ctx->framerate = { stream_fps, 1 };
codec_ctx->time_base = { 1, stream_fps};
if (fctx->oformat->flags & AVFMT_GLOBALHEADER)
{
codec_ctx->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
}
Initialize video stream
if (avcodec_parameters_from_context(stream->codecpar, codec_ctx) < 0)
{
Debug::Error("Could not initialize stream codec parameters!");
return false;
}
AVDictionary* codec_options = nullptr;
if (codec->id == AV_CODEC_ID_H264) {
av_dict_set(&codec_options, "profile", "high", 0);
av_dict_set(&codec_options, "preset", "fast", 0);
av_dict_set(&codec_options, "tune", "zerolatency", 0);
}
// open video encoder
int ret = avcodec_open2(codec_ctx, codec, &codec_options);
if (ret<0) {
Debug::Error("Could not open video encoder: ", avcodec_get_name(codec->id), " error ret: ", AVERROR(ret));
return false;
}
stream->codecpar->extradata = codec_ctx->extradata;
stream->codecpar->extradata_size = codec_ctx->extradata_size;
Start streaming
// Create new frame and allocate buffer
AVFrame* AllocateFrameBuffer(AVCodecContext* codec_ctx, double width, double height)
{
AVFrame* frame = av_frame_alloc();
std::vector<uint8_t> framebuf(av_image_get_buffer_size(codec_ctx->pix_fmt, width, height, 1));
av_image_fill_arrays(frame->data, frame->linesize, framebuf.data(), codec_ctx->pix_fmt, width, height, 1);
frame->width = width;
frame->height = height;
frame->format = static_cast<int>(codec_ctx->pix_fmt);
//Debug::Log("framebuf size: ", framebuf.size(), " frame format: ", frame->format);
return frame;
}
void RtspStream(AVFormatContext* ofmt_ctx, AVStream* vid_stream, AVCodecContext* vid_codec_ctx, char* rtsp_url)
{
printf("Output stream info:\n");
av_dump_format(ofmt_ctx, 0, rtsp_url, 1);
const int width = WindowManager::Get().GetWindow(RtspStreaming::WindowId())->GetTextureWidth();
const int height = WindowManager::Get().GetWindow(RtspStreaming::WindowId())->GetTextureHeight();
//DirectX BGRA to h264 YUV420p
SwsContext* conversion_ctx = sws_getContext(width, height, src_pix_fmt,
vid_stream->codecpar->width, vid_stream->codecpar->height, target_pix_fmt,
SWS_BICUBIC | SWS_BITEXACT, nullptr, nullptr, nullptr);
if (!conversion_ctx)
{
Debug::Error("Could not initialize sample scaler!");
return;
}
AVFrame* frame = AllocateFrameBuffer(vid_codec_ctx,vid_codec_ctx->width,vid_codec_ctx->height);
if (!frame) {
Debug::Error("Could not allocate video frame\n");
return;
}
if (avformat_write_header(ofmt_ctx, NULL) < 0) {
Debug::Error("Error occurred when writing header");
return;
}
if (av_frame_get_buffer(frame, 0) < 0) {
Debug::Error("Could not allocate the video frame data\n");
return;
}
int frame_cnt = 0;
//av start time in microseconds
int64_t start_time_av = av_gettime();
AVRational time_base = vid_stream->time_base;
AVRational time_base_q = { 1, AV_TIME_BASE };
// frame pixel data info
int data_size = width * height * 4;
uint8_t* data = new uint8_t[data_size];
// AVPacket* pkt = av_packet_alloc();
while (RtspStreaming::IsStreaming())
{
/* make sure the frame data is writable */
if (av_frame_make_writable(frame) < 0)
{
Debug::Error("Can't make frame writable");
break;
}
//get copy/ref of the texture
//uint8_t* data = WindowManager::Get().GetWindow(RtspStreaming::WindowId())->GetBuffer();
if (!WindowManager::Get().GetWindow(RtspStreaming::WindowId())->GetPixels(data, 0, 0, width, height))
{
Debug::Error("Failed to get frame buffer. ID: ", RtspStreaming::WindowId());
std::this_thread::sleep_for (std::chrono::seconds(2));
continue;
}
//printf("got pixels data\n");
// convert BGRA to yuv420 pixel format
int srcStrides[1] = { 4 * width };
if (sws_scale(conversion_ctx, &data, srcStrides, 0, height, frame->data, frame->linesize) < 0)
{
Debug::Error("Unable to scale d3d11 texture to frame. ", frame_cnt);
break;
}
//Debug::Log("frame pts: ", frame->pts, " time_base:", av_rescale_q(1, vid_codec_ctx->time_base, vid_stream->time_base));
frame->pts = frame_cnt++;
//frame_cnt++;
//printf("scale conversion done\n");
//encode to the video stream
int ret = avcodec_send_frame(vid_codec_ctx, frame);
if (ret < 0)
{
Debug::Error("Error sending frame to codec context! ",frame_cnt);
break;
}
AVPacket* pkt = av_packet_alloc();
//av_init_packet(pkt);
ret = avcodec_receive_packet(vid_codec_ctx, pkt);
if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
{
//av_packet_unref(pkt);
av_packet_free(&pkt);
continue;
}
else if (ret < 0)
{
Debug::Error("Error during receiving packet: ",AVERROR(ret));
//av_packet_unref(pkt);
av_packet_free(&pkt);
break;
}
if (pkt->pts == AV_NOPTS_VALUE)
{
//Write PTS
//Duration between 2 frames (us)
int64_t calc_duration = (double)AV_TIME_BASE / av_q2d(vid_stream->r_frame_rate);
//Parameters
pkt->pts = (double)(frame_cnt * calc_duration) / (double)(av_q2d(time_base) * AV_TIME_BASE);
pkt->dts = pkt->pts;
pkt->duration = (double)calc_duration / (double)(av_q2d(time_base) * AV_TIME_BASE);
}
int64_t pts_time = av_rescale_q(pkt->dts, time_base, time_base_q);
int64_t now_time = av_gettime() - start_time_av;
if (pts_time > now_time)
av_usleep(pts_time - now_time);
//pkt.pts = av_rescale_q_rnd(pkt.pts, in_stream->time_base, out_stream->time_base, (AVRounding)(AV_ROUND_NEAR_INF | AV_ROUND_PASS_MINMAX));
//pkt.dts = av_rescale_q_rnd(pkt.dts, in_stream->time_base, out_stream->time_base, (AVRounding)(AV_ROUND_NEAR_INF | AV_ROUND_PASS_MINMAX));
//pkt.duration = av_rescale_q(pkt.duration, in_stream->time_base, out_stream->time_base);
//pkt->pos = -1;
//write frame and send
if (av_interleaved_write_frame(ofmt_ctx, pkt)<0)
{
Debug::Error("Error muxing packet, frame number:",frame_cnt);
break;
}
//Debug::Log("RTSP streaming...");
//sstd::this_thread::sleep_for(std::chrono::milliseconds(1000/20));
//av_packet_unref(pkt);
av_packet_free(&pkt);
}
//av_free_packet(pkt);
delete[] data;
/* Write the trailer, if any. The trailer must be written before you
* close the CodecContexts open when you wrote the header; otherwise
* av_write_trailer() may try to use memory that was freed on
* av_codec_close(). */
av_write_trailer(ofmt_ctx);
av_frame_unref(frame);
av_frame_free(&frame);
printf("streaming thread CLOSED!\n");
}
Now, this allows me to connect to my RTSP server and maintain the connection. However, on the RTSP client side I am getting either a gray frame or a single static frame.
I would appreciate it if you could help with the following questions:
Firstly, why is the stream not working in spite of the continued connection to the server and the frames being updated?
Video codec: by default the RTSP format uses the MPEG-4 codec; is it possible to use H.264? When I manually set it to AV_CODEC_ID_H264 the program fails at avcodec_open2 with a return value of -22.
Do I need to create and allocate a new "AVFrame" and "AVPacket" for every frame? Or can I just reuse a global variable for this?
Do I need to explicitly add some code for real-time streaming? (Like the "-re" flag we use with the ffmpeg CLI.)
It would be great if you could point out some example code for creating a livestream. I have checked the following resources:
https://github.com/FFmpeg/FFmpeg/blob/master/doc/examples/encode_video.c
streaming FLV to RTMP with FFMpeg using H264 codec and C++ API to flv.js
https://medium.com/swlh/streaming-video-with-ffmpeg-and-directx-11-7395fcb372c4
Update
While testing I found that I am able to play the stream using ffplay, while it gets stuck in VLC. Here is a snapshot of the ffplay log
The basic construction and initialization seem to be okay. Find my responses to your questions below.
why the stream is not working in spite of continued connection to the server and updating frames?
If you're getting an error or a broken stream, you might want to check the presentation and decoding timestamps (pts/dts) of your packets.
In your code, I notice that you're taking time_base from the video stream object, which is not guaranteed to be the same as codec->time_base and usually varies depending on the active stream.
AVRational time_base = vid_stream->time_base;
AVRational time_base_q = { 1, AV_TIME_BASE };
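One way to avoid that mismatch is to set frame->pts in vid_codec_ctx->time_base units (as your loop already does with frame_cnt) and then let libav convert the packet timestamps for you. A small sketch, reusing the variable names from the question:
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
}

// Sketch: convert a packet's pts/dts/duration from the encoder's time_base to
// the stream's time_base right before muxing, instead of hand-computing them.
static void PrepareForMuxing(AVPacket* pkt, AVCodecContext* vid_codec_ctx, AVStream* vid_stream)
{
    av_packet_rescale_ts(pkt, vid_codec_ctx->time_base, vid_stream->time_base);
    pkt->stream_index = vid_stream->index;
}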
Video codec. By default rtsp format uses Mpeg4 codec, is it possible to use h264?
I don't see why not... RTSP is just a protocol for carrying your packets over the network, so you should be able to use AV_CODEC_ID_H264 for encoding the stream.
Do I need to create and allocate new "AVFrame" and "AVPacket" for every frame? Or can I just reuse global variable for this?
In libav, during encoding, a single packet holds one encoded video frame, while a single packet can contain multiple audio frames. I should cite a reference for this but can't seem to find one at the moment. Either way, allocating a new packet for every frame, as your loop does, works; a single packet can also be reused as long as it is unreferenced after each write.
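For what it's worth, a short sketch of the reuse pattern (not the asker's exact code): one AVPacket allocated up front, unreferenced after each use, with a loop because a single sent frame may produce zero, one, or several packets.
extern "C" {
#include <libavcodec/avcodec.h>
}

// Sketch: drain whatever the encoder has ready into a caller-provided packet.
// av_packet_unref() drops the previous payload so the same AVPacket can be
// reused on the next iteration and on the next frame.
static void DrainEncoder(AVCodecContext* enc, AVPacket* pkt)
{
    while (avcodec_receive_packet(enc, pkt) == 0) {
        // ...rescale timestamps and hand the packet to the muxer here...
        av_packet_unref(pkt);
    }
}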
Do I need to explicitly define some code for real-time streaming? (Like in ffmpeg we use "-re" flag).
You don't need to add anything else for real-time streaming, although you might want to limit the number of frame updates that you pass to the encoder to save some performance.
For me, the difference between ffplay capturing fine and VLC capturing badly (for UDP packets) was the pkt_size=xxx attribute (ffmpeg -re -i test.mp4 -f mpegts udp://127.0.0.1:23000?pkt_size=1316) (VLC: open media, network tab, udp://#:23000:pkt_size=1316). So VLC is only able to capture if pkt_size is defined (and equal on both sides).

How to change the settings of AVCodecContext after initialization (FFMPEG)

I have a question about libavcodec that I can't find the answer to online. I'm trying to use H.264 to encode frames. The issue I'm having is that the frames I wish to encode have variable widths and heights. I understand that to encode frames in libavcodec, you need to pass "width" and "height" parameters to the AVCodecContext struct, and then initialize it as such:
AVCodec *codec = avcodec_find_encoder(AV_CODEC_ID_H264);
AVCodecContext *context = avcodec_alloc_context3(codec);
context->width = 1920;
//OTHER SETTINGS HERE
//FINALLY...
avcodec_open2(context, codec, NULL);
Let's say that, after I've initialized this context, I need to encode a different frame that now has a width of 900. I can't simply do context->width = 900 because the context has already been set to a width of 1920 and initialized. I could create an entirely new AVCodecContext and delete the previous one with avcodec_close() as follows:
AVCodec *codec = avcodec_find_encoder(AV_CODEC_ID_H264);
AVCodecContext *context = avcodec_alloc_context3(codec);
context->width = 900;
//OTHER SETTINGS HERE
//FINALLY...
avcodec_open2(context, codec, NULL);
// DO THE ENCODING HERE
avcodec_close(context);
But my program has been crashing unexpectedly when I do this, and I feel like recreating the AVCodecContext every time I need to change a simple width/height setting is inefficient to begin with. Does anyone have any suggestions as to how I can go about doing this? Thank you very much!
That's not a thing. You must reinitialize the encoder, or scale/pad the frames to the same size.
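If you take the scale route, a rough sketch (assuming the encoder stays open at a fixed size and the incoming frames are already YUV420P; the helper name and the use of SWS_BILINEAR are illustrative) could look like this:
extern "C" {
#include <libavutil/frame.h>
#include <libswscale/swscale.h>
}

// Sketch: rescale an arbitrarily sized frame to the fixed size the encoder was
// opened with, so the AVCodecContext never has to change. Note this stretches
// the picture; preserving aspect ratio would need an extra padding step.
static AVFrame* ScaleToEncoderSize(const AVFrame* src, int encWidth, int encHeight)
{
    AVFrame* dst = av_frame_alloc();
    dst->width = encWidth;
    dst->height = encHeight;
    dst->format = src->format; // e.g. AV_PIX_FMT_YUV420P
    av_frame_get_buffer(dst, 0);

    SwsContext* sws = sws_getContext(src->width, src->height, (AVPixelFormat)src->format,
                                     encWidth, encHeight, (AVPixelFormat)dst->format,
                                     SWS_BILINEAR, nullptr, nullptr, nullptr);
    sws_scale(sws, src->data, src->linesize, 0, src->height, dst->data, dst->linesize);
    sws_freeContext(sws);

    dst->pts = src->pts; // keep the original timestamp
    return dst;          // caller frees with av_frame_free()
}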
I had the same problem, and I solved it like this:
First I changed the codecContext width and height (they should be even numbers):
while (w%2 != 0) {
w--;
}
while (h%2 != 0) {
h--;
}
cctx->bit_rate = w*h*10;
cctx->width = w;
cctx->height = h;
Second, I initialized the codec with the changed codecContext:
codec->init(cctx);
And finally I destroyed and recreated the AVFrame (I couldn't find a method to reinitialize a frame without recreating it, and simply changing its width and height doesn't work):
if (videoFrame) {
av_frame_free(&videoFrame);
}
if (!videoFrame) {
videoFrame = av_frame_alloc();
videoFrame->format = AV_PIX_FMT_YUV420P;
videoFrame->width = cctx->width;
videoFrame->height = cctx->height;
if ((av_frame_get_buffer(videoFrame, 32)) < 0) {
std::cout << "Failed to allocate picture" << std::endl;
return;
}
}

Replace Bento4 with libav / ffmpeg

We use Bento4 - a really well designed SDK - to demux mp4 files in .mov containers. Decoding is done by our own codec, so only the raw (intraframe) samples are needed. So far this works pretty straightforwardly:
AP4_Track *test_videoTrack = nullptr;
AP4_ByteStream *input = nullptr;
AP4_Result result = AP4_FileByteStream::Create(filename, AP4_FileByteStream::STREAM_MODE_READ, input);
AP4_File m_file (*input, true);
//
// Read movie tracks, and metadata, find the video track
size_t index = 0;
uint32_t m_width = 0, m_height = 0;
auto item = m_file.GetMovie()->GetTracks().FirstItem();
auto track = item->GetData();
if (track->GetType() == AP4_Track::TYPE_VIDEO)
{
m_width = (uint32_t)((double)test_videoTrack->GetWidth() / double(1 << 16));
m_height = (uint32_t)((double)test_videoTrack->GetHeight() / double(1 << 16));
std::string codec("unknown");
auto sd = track->GetSampleDescription(0);
AP4_String c;
if (AP4_SUCCEEDED(sd->GetCodecString(c)))
{
codec = c.GetChars();
}
// Find and instantiate the decoder
AP4_Sample sample;
AP4_DataBuffer sampleData;
test_videoTrack->ReadSample(0, sample, sampleData);
}
For several reasons we would prefer replacing Bento4 with libav/ffmpeg (mainly because we already have it in the project and want to reduce dependencies).
How would we (preferably in pseudo-code) replace the Bento4 tasks done above with libav? Please remember that the codec used is not in the ffmpeg library, so we cannot use the standard ffmpeg decoding examples. Opening the media file simply fails, and without a decoder we get no size or any other info so far. What we need would be to:
open the media file
get contained tracks (possibly also audio)
get track size / length info
get track samples by index
It turned out to be very easy:
AVFormatContext* inputFile = avformat_alloc_context();
avformat_open_input(&inputFile, filename, nullptr, nullptr);
avformat_find_stream_info(inputFile, nullptr);
//Get just two streams...First Video & First Audio
int videoStreamIndex = -1, audioStreamIndex = -1;
for (int i = 0; i < inputFile->nb_streams; i++)
{
if (inputFile->streams[i]->codec->codec_type == AVMEDIA_TYPE_VIDEO && videoStreamIndex == -1)
{
videoStreamIndex = i;
}
else if (inputFile->streams[i]->codec->codec_type == AVMEDIA_TYPE_AUDIO && audioStreamIndex == -1)
{
audioStreamIndex = i;
}
}
Now test for the correct codec tag
// get codec id
char ct[64] = {0};
static const char* codec_id = "MPAK";
av_get_codec_tag_string( ct, sizeof(ct),inputFile->streams[videoStreamIndex]->codec->codec_tag);
assert(strncmp(ct, codec_id, strlen(codec_id)) == 0);
I did not know that the sizes are set even before a codec is chosen (or even available).
// lookup size
Size2D mediasize(inputFile->streams[videoStreamIndex]->codec->width, inputFile->streams[videoStreamIndex]->codec->height);
Seeking by frame and unpacking (video) is done like this:
AVStream* s = inputFile->streams[videoStreamIndex];
int64_t seek_ts = (int64_t(frame_index) * s->r_frame_rate.den * s->time_base.den) / (int64_t(s->r_frame_rate.num) * s->time_base.num);
av_seek_frame(inputFile, videoStreamIndex, seek_ts, AVSEEK_FLAG_ANY);
AVPacket pkt;
av_read_frame(inputFile, &pkt);
Now the packet contains a frame ready to unpack with our own decoder.
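A side note for anyone doing this on current FFmpeg releases: the AVStream::codec field used above was deprecated and later removed, and the same information now lives in AVStream::codecpar. A rough, untested sketch of the equivalent probing (the ProbeFile name is illustrative):
extern "C" {
#include <libavformat/avformat.h>
#include <libavutil/avutil.h>
}
#include <cstdio>

// Sketch: open the file, pick the best video and audio streams, and read the
// size and codec tag from codecpar instead of the removed stream->codec.
static void ProbeFile(const char* filename)
{
    AVFormatContext* ic = nullptr;
    if (avformat_open_input(&ic, filename, nullptr, nullptr) < 0)
        return;
    avformat_find_stream_info(ic, nullptr);

    int videoStreamIndex = av_find_best_stream(ic, AVMEDIA_TYPE_VIDEO, -1, -1, nullptr, 0);
    int audioStreamIndex = av_find_best_stream(ic, AVMEDIA_TYPE_AUDIO, -1, -1, nullptr, 0);
    if (videoStreamIndex >= 0) {
        const AVCodecParameters* par = ic->streams[videoStreamIndex]->codecpar;
        char tag[AV_FOURCC_MAX_STRING_SIZE] = {0};
        av_fourcc_make_string(tag, par->codec_tag); // e.g. "MPAK"
        std::printf("video %dx%d, tag %s, audio stream %d\n",
                    par->width, par->height, tag, audioStreamIndex);
    }
    avformat_close_input(&ic);
}
Seeking by frame index and reading packets with av_seek_frame()/av_read_frame() works the same way as in the snippet above.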

IMFTransform interface of Color Converter DSP giving E_INVALIDARG on SetInputType/SetOutputType

I'm trying to use the Color Converter DMO (http://msdn.microsoft.com/en-us/library/windows/desktop/ff819079(v=vs.85).aspx) to convert RGB24 to YV12/NV12 via Media Foundation. I've created an instance of the Color Converter DSP via CLSID_CColorConvertDMO and then tried to set the needed input/output types, but the calls always return E_INVALIDARG, even when using media types that are returned by GetOutputAvailableType and GetInputAvailableType. If I set the media type to NULL then I get the error that the media type is invalid, which makes sense. I've seen examples from MSDN where people do the same - enumerate available types and then set them as input types - and they claim it works, but I'm kinda stuck on the E_INVALIDARG. I understand that this is hard to answer without a code example; if no one has had similar experience, I'll try to post a snippet, but maybe someone has experienced the same issue?
This DMO/DSP is dual-interfaced: it is both a DMO with IMediaObject and an MFT with IMFTransform. The two interfaces share a lot in common, and here is a code snippet to test initialization of an RGB24 into YV12 conversion:
#include "stdafx.h"
#include <dshow.h>
#include <dmo.h>
#include <wmcodecdsp.h>
#pragma comment(lib, "strmiids.lib")
#pragma comment(lib, "wmcodecdspuuid.lib")
int _tmain(int argc, _TCHAR* argv[])
{
ATLVERIFY(SUCCEEDED(CoInitialize(NULL)));
CComPtr<IMediaObject> pMediaObject;
ATLVERIFY(SUCCEEDED(pMediaObject.CoCreateInstance(CLSID_CColorConvertDMO)));
VIDEOINFOHEADER InputVideoInfoHeader;
ZeroMemory(&InputVideoInfoHeader, sizeof InputVideoInfoHeader);
InputVideoInfoHeader.bmiHeader.biSize = sizeof InputVideoInfoHeader.bmiHeader;
InputVideoInfoHeader.bmiHeader.biWidth = 1920;
InputVideoInfoHeader.bmiHeader.biHeight = 1080;
InputVideoInfoHeader.bmiHeader.biPlanes = 1;
InputVideoInfoHeader.bmiHeader.biBitCount = 24;
InputVideoInfoHeader.bmiHeader.biCompression = BI_RGB;
InputVideoInfoHeader.bmiHeader.biSizeImage = 1080 * (1920 * 3);
DMO_MEDIA_TYPE InputMediaType;
ZeroMemory(&InputMediaType, sizeof InputMediaType);
InputMediaType.majortype = MEDIATYPE_Video;
InputMediaType.subtype = MEDIASUBTYPE_RGB24;
InputMediaType.bFixedSizeSamples = TRUE;
InputMediaType.bTemporalCompression = FALSE;
InputMediaType.lSampleSize = InputVideoInfoHeader.bmiHeader.biSizeImage;
InputMediaType.formattype = FORMAT_VideoInfo;
InputMediaType.cbFormat = sizeof InputVideoInfoHeader;
InputMediaType.pbFormat = (BYTE*) &InputVideoInfoHeader;
const HRESULT nSetInputTypeResult = pMediaObject->SetInputType(0, &InputMediaType, 0);
_tprintf(_T("nSetInputTypeResult 0x%08x\n"), nSetInputTypeResult);
VIDEOINFOHEADER OutputVideoInfoHeader = InputVideoInfoHeader;
OutputVideoInfoHeader.bmiHeader.biBitCount = 12;
OutputVideoInfoHeader.bmiHeader.biCompression = MAKEFOURCC('Y', 'V', '1', '2');
OutputVideoInfoHeader.bmiHeader.biSizeImage = 1080 * 1920 * 12 / 8;
DMO_MEDIA_TYPE OutputMediaType = InputMediaType;
OutputMediaType.subtype = MEDIASUBTYPE_YV12;
OutputMediaType.lSampleSize = OutputVideoInfoHeader.bmiHeader.biSizeImage;
OutputMediaType.cbFormat = sizeof OutputVideoInfoHeader;
OutputMediaType.pbFormat = (BYTE*) &OutputVideoInfoHeader;
const HRESULT nSetOutputTypeResult = pMediaObject->SetOutputType(0, &OutputMediaType, 0);
_tprintf(_T("nSetOutputTypeResult 0x%08x\n"), nSetOutputTypeResult);
// TODO: ProcessInput, ProcessOutput
pMediaObject.Release();
CoUninitialize();
return 0;
}
This should work fine and print two S_OKs out...
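For completeness, the same object can also be driven through the IMFTransform interface named in the question. The type negotiation would look roughly like the untested sketch below; the 1920x1080/30 fps attributes are illustrative, and the IMFTransform can be obtained by QueryInterface from the IMediaObject created above (or by CoCreateInstance of CLSID_CColorConvertDMO directly):
#include <atlbase.h>
#include <mfapi.h>
#include <mftransform.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfuuid.lib")

// Sketch: RGB24 in, YV12 out via IMFTransform. Frame size, frame rate and
// interlace mode are the attributes the color converter typically expects
// before it accepts a type.
HRESULT ConfigureColorConverter(IMFTransform* pTransform)
{
    CComPtr<IMFMediaType> pInputType;
    HRESULT hr = MFCreateMediaType(&pInputType);
    if (FAILED(hr)) return hr;
    pInputType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    pInputType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB24);
    MFSetAttributeSize(pInputType, MF_MT_FRAME_SIZE, 1920, 1080);
    MFSetAttributeRatio(pInputType, MF_MT_FRAME_RATE, 30, 1);
    pInputType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
    hr = pTransform->SetInputType(0, pInputType, 0);
    if (FAILED(hr)) return hr;

    CComPtr<IMFMediaType> pOutputType;
    hr = MFCreateMediaType(&pOutputType);
    if (FAILED(hr)) return hr;
    pOutputType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    pOutputType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_YV12);
    MFSetAttributeSize(pOutputType, MF_MT_FRAME_SIZE, 1920, 1080);
    MFSetAttributeRatio(pOutputType, MF_MT_FRAME_RATE, 30, 1);
    pOutputType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
    return pTransform->SetOutputType(0, pOutputType, 0);
}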

Setting volumes for multiple streams using Media Foundation

I'm providing audio code for an application that will have multiple streams of audio being played back at the same time. I'm a bit confused by all of the different options, and there are some specific things that I don't quite understand.
I am using the IAudioClient calls to get and set volumes. Is that the best way to get volumes for multiple streams?
It appears that I have to call IAudioClient::Initialize. This function requires a WAVEFORMATEX structure. Are any parameters from that other than the number of channels used in volume setting? Also, it appears that Initialize can only be used once, and volume setting and reading happens many times. Should I save the reference to the IAudioClient and use it each time, or can I release it each time I get or set a volume?
How do I differentiate between two streams being played on the same device (endpoint)?
Here's the code that sets the volume (with the usual checks that each call succeeded removed to save space):
hr = CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&DeviceEnumerator));
hr = DeviceEnumerator->GetDevice((wchar_t *)currentPlaybackDevice.id, &pPlaybackDevice);
hr = pPlaybackDevice->Activate(__uuidof(IAudioClient), CLSCTX_INPROC_SERVER, NULL, reinterpret_cast<void **>(&pPlaybackClient));
hr = pPlaybackClient->Initialize(AUDCLNT_SHAREMODE_SHARED, 0, 0, 0, &pWaveFormat, 0);
hr = pPlaybackClient->GetService(__uuidof(IAudioStreamVolume), (void **)&pStreamVolume);
hr = pStreamVolume->GetChannelCount(&channels);
for(UINT32 i = 0; i < channels; i++)
chanVolumes[i] = playbackLevel;
hr = pStreamVolume->SetAllVolumes(channels, chanVolumes);
The number of channels is irrelevant to volume. To adjust volume you need to obtain the IAudioStreamVolume or IChannelAudioVolume interfaces. MSDN writes:
The IAudioStreamVolume interface enables a client to control and monitor the volume levels for all of the channels in an audio stream. The client obtains a reference to the IAudioStreamVolume interface on a stream object by calling the IAudioClient::GetService method with parameter riid set to REFIID IID_IAudioStreamVolume.
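As a minimal sketch of that GetService call, assuming pAudioClient has already been Initialize'd as in the question's code (each initialized stream gets its own IAudioStreamVolume, which is also how two streams on the same endpoint are controlled independently):
#include <atlbase.h>
#include <audioclient.h>
#include <vector>

// Sketch: obtain IAudioStreamVolume from an initialized IAudioClient and apply
// the same level to every channel of this one stream.
HRESULT SetStreamVolume(IAudioClient* pAudioClient, float level)
{
    CComPtr<IAudioStreamVolume> pStreamVolume;
    HRESULT hr = pAudioClient->GetService(__uuidof(IAudioStreamVolume), (void**)&pStreamVolume);
    if (FAILED(hr)) return hr;
    UINT32 channels = 0;
    hr = pStreamVolume->GetChannelCount(&channels);
    if (FAILED(hr)) return hr;
    std::vector<float> volumes(channels, level); // one entry per channel
    return pStreamVolume->SetAllVolumes(channels, volumes.data());
}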
Here is a code snippet for you. It plays a synthesized sine wave at a louder volume for a few seconds, then continues with an updated volume, playing quietly.
#define _USE_MATH_DEFINES
#include <math.h>
#include <mmdeviceapi.h>
#include <audioclient.h>
#define _A ATLASSERT
#define __C ATLENSURE_SUCCEEDED
#define __D ATLENSURE_THROW
int _tmain(int argc, _TCHAR* argv[])
{
__C(CoInitialize(NULL));
CComPtr<IMMDeviceEnumerator> pMmDeviceEnumerator;
__C(pMmDeviceEnumerator.CoCreateInstance(__uuidof(MMDeviceEnumerator)));
CComPtr<IMMDevice> pMmDevice;
__C(pMmDeviceEnumerator->GetDefaultAudioEndpoint(eRender, eMultimedia, &pMmDevice));
CComPtr<IAudioClient> pAudioClient;
__C(pMmDevice->Activate(__uuidof(IAudioClient), CLSCTX_ALL, NULL, (VOID**) &pAudioClient));
CComHeapPtr<WAVEFORMATEX> pWaveFormatEx;
__C(pAudioClient->GetMixFormat(&pWaveFormatEx));
static const REFERENCE_TIME g_nBufferTime = 60 * 1000 * 10000i64; // 1 minute
__C(pAudioClient->Initialize(AUDCLNT_SHAREMODE_SHARED, 0, g_nBufferTime, 0, pWaveFormatEx, NULL));
#pragma region Data
CComPtr<IAudioRenderClient> pAudioRenderClient;
__C(pAudioClient->GetService(__uuidof(IAudioRenderClient), (VOID**) &pAudioRenderClient));
UINT32 nSampleCount = (UINT32) (g_nBufferTime / (1000 * 10000i64) * pWaveFormatEx->nSamplesPerSec) / 2;
_A(pWaveFormatEx->wFormatTag == WAVE_FORMAT_EXTENSIBLE);
const WAVEFORMATEXTENSIBLE* pWaveFormatExtensible = (const WAVEFORMATEXTENSIBLE*) (const WAVEFORMATEX*) pWaveFormatEx;
_A(pWaveFormatExtensible->SubFormat == KSDATAFORMAT_SUBTYPE_IEEE_FLOAT);
// ASSU: Mixing format is IEEE Float PCM
BYTE* pnData = NULL;
__C(pAudioRenderClient->GetBuffer(nSampleCount, &pnData));
FLOAT* pfFloatData = (FLOAT*) pnData;
for(UINT32 nSampleIndex = 0; nSampleIndex < nSampleCount; nSampleIndex++)
for(WORD nChannelIndex = 0; nChannelIndex < pWaveFormatEx->nChannels; nChannelIndex++)
pfFloatData[nSampleIndex * pWaveFormatEx->nChannels + nChannelIndex] = sin(1000.0f * nSampleIndex / pWaveFormatEx->nSamplesPerSec * 2 * M_PI);
__C(pAudioRenderClient->ReleaseBuffer(nSampleCount, 0));
#pragma endregion
CComPtr<ISimpleAudioVolume> pSimpleAudioVolume;
__C(pAudioClient->GetService(__uuidof(ISimpleAudioVolume), (VOID**) &pSimpleAudioVolume));
__C(pSimpleAudioVolume->SetMasterVolume(0.50f, NULL));
_tprintf(_T("Playing Loud\n"));
__C(pAudioClient->Start());
Sleep(5 * 1000);
_tprintf(_T("Playing Quiet\n"));
__C(pSimpleAudioVolume->SetMasterVolume(0.10f, NULL));
Sleep(15 * 1000);
// NOTE: We don't care for termination crash
return 0;
}
