Capturing 48 kHz audio with FFmpeg and DirectShow (dshow input) - ffmpeg

I tried to capture audio at 48 kHz with FFmpeg, using the code below:
AVInputFormat* ifmt = av_find_input_format("dshow");
CHECK_POINTER_RETURN_VALUE(ifmt, false)
pFmtCtx = avformat_alloc_context();
CHECK_POINTER_RETURN_VALUE(pFmtCtx, false)
AVDictionary *param = nullptr;
std::string sr = std::to_string(48000);
av_dict_set(&param, "sample_rate",sr.c_str(), 0);
int error = avformat_open_input(&pFmtCtx, ffName.c_str(), ifmt, &param);
if (error != 0) {
char buf[1024];
av_strerror(error, buf, sizeof(buf));
LOG(ERROR)<<"open audio device failed,err is "<<buf;
return false;
}
However, avformat_open_input fails and the error reads "I/O error". If the sample rate is 44100, everything works.
Does FFmpeg not support capturing 48 kHz audio?

This was an issue with the DirectShow API that FFmpeg was using. It has been resolved with a change to FFmpeg: https://github.com/FFmpeg/FFmpeg/commit/d9a9b4c877b85fea5a5bad74c3d592a756047f79
Specifically, DirectShow doesn't adequately describe the audio device capabilities with AUDIO_STREAM_CONFIG_CAPS when the audio device supports both 44.1 kHz and 48 kHz as clock multiples. WAVEFORMATEX within the AM_MEDIA_TYPE must be used instead.

As #die maus mentioned, the fact that this works if sample rate is set to 44100, but not 48000, likely indicates that your input device does not support sampling at 48 kHz. This is not a limitation of FFmpeg, but of the hardware.
And as #moi suggested, unless you have a specific need for 48 kHz, 44.1 kHz should work just fine.
If you really need 48 kHz (e.g. you are sending the audio to something else that expects 48 kHz), you can resample the audio. FFmpeg includes libswresample for this purpose; see the example here.
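When resampling captured 44.1 kHz audio to 48 kHz, the destination buffer has to be sized for the larger sample count. The following is only a sketch of that arithmetic; resampled_count is a hypothetical helper, not an FFmpeg function. Real libswresample code usually also adds the resampler's internal delay (swr_get_delay) before rounding up, which this sketch leaves out.

```cpp
#include <cstdint>

// Hypothetical helper (not part of FFmpeg): number of output samples needed
// to hold in_samples after converting from in_rate to out_rate.
// The result is rounded up so the destination buffer is never too small.
// Assumes in_rate > 0 and non-negative sample counts.
inline int64_t resampled_count(int64_t in_samples, int64_t in_rate, int64_t out_rate) {
    return (in_samples * out_rate + in_rate - 1) / in_rate;
}
```

For example, a 1024-sample frame captured at 44100 Hz needs room for 1115 samples at 48000 Hz.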

Related

Is it possible to decode MPEG4 frames without delay with ffmpeg?

I use ffmpeg's MPEG4 decoder. The decoder has CODEC_CAP_DELAY capability among others. It means the decoder will give me decoded frames with latency of 1 frame.
I have a set of MPEG4 (I- and P-) frames from an AVI file and feed them to the ffmpeg decoder. For the very first I-frame the decoder gives me nothing, although it decodes the frames successfully. I can force the decoder to return the decoded frame with a second call to avcodec_decode_video2, passing nulls (flushing it), but if I do so for each frame I get artifacts in the first group of pictures (e.g. the second decoded P-frame is gray).
If I do not force the ffmpeg decoder to give me the decoded frame right away, it works flawlessly and without artifacts.
Question: is it possible to get a decoded frame without giving the decoder the next frame, and without artifacts?
Small example of how decoding is implemented for each frame:
// decode
int got_frame = 0;
int err = 0;
int tries = 5;
do
{
err = avcodec_decode_video2(m_CodecContext, m_Frame, &got_frame, &m_Packet);
/* some codecs, such as MPEG, transmit the I and P frame with a
latency of one frame. You must do the following to have a
chance to get the last frame of the video */
m_Packet.data = NULL;
m_Packet.size = 0;
--tries;
}
while (err >= 0 && got_frame == 0 && tries > 0);
But, as I said, that gives me artifacts in the first GOP.
Use the "-flags +low_delay" option (or in code, set AVCodecContext.flags |= CODEC_FLAG_LOW_DELAY; newer FFmpeg versions name this flag AV_CODEC_FLAG_LOW_DELAY).
I tested several options, and "-flags low_delay" and "-probesize 32" matter more than the others. The code below worked for me.
AVDictionary* avDic = nullptr;
av_dict_set(&avDic, "flags", "low_delay", 0);
av_dict_set(&avDic, "probesize", "32", 0);
const int errorCode = avformat_open_input(&pFormatCtx, mUrl.c_str(), nullptr, &avDic);

Reading RTSP stream with FFMpeg library - how to use avcodec_open2?

While trying to read an RTSP stream I ran into problems with both the code and the documentation. Short description: whatever I do, avcodec_open2 either fails (saying "codec type or id mismatches") or the width and height of the codec context are 0 after the call (which makes the rest of the code useless). The stream itself opens normally in VLC player, and av_dump_format() displays correct info. My code is based on technique's answer to this question.
Long description: my code is in C#, but here is the C++ equivalent of the FFmpeg calls (I reduced my code to this minimum and the problem persists):
av_register_all();
avformat_network_init(); //return code ignored
AVFormatContext* formatContext = avformat_alloc_context();
if (avformat_open_input(&formatContext, stream_path, null, null) != 0) {
return;
}
if (avformat_find_stream_info(formatContext, null) < 0) {
return;
}
int videoStreamIndex = 0;
for (int i = 0; i < formatContext->nb_streams; ++i) {
AVStream* s = formatContext->streams[i];
if (s->codec == null) continue;
AVCodecContext c = *(s->codec);
if (c.codec_type == AVMEDIA_TYPE_VIDEO) videoStreamIndex = i;
}
//start reading packets from stream and write them to file
//av_read_play(formatContext); //return code ignored
//this call would print "method PLAY failed: 455 Method Not Valid in This State"
//seems to be the case that for rtsp stream it isn't needed
AVCodec* codec = null;
codec = avcodec_find_decoder(AV_CODEC_ID_H264);
if (codec == null) {
return;
}
AVCodecContext* codecContext = avcodec_alloc_context3(null);
avcodec_get_context_defaults3(codecContext, codec);//return code ignored
avcodec_copy_context(codecContext, formatContext->streams[videoStreamIndex]->codec); //return code ignored
av_dump_format(formatContext, videoStreamIndex, stream_path, 0);
if (avcodec_open2(codecContext, codec, null) < 0) {
return;
}
The code actually uses the DLL version of the FFmpeg library; avcodec-55.dll and avformat-55.dll are used.
The documentation says something odd about the order in which the calls can be made (that copy_context should be called before get_context_defaults); the current code is kept as close as possible to technique's version. As written, it results in a non-zero return from avcodec_open2 with the "codec type or id mismatches" message. Changing the order does little good: avcodec_open2 then succeeds, but both codecContext->width and codecContext->height are 0 afterwards.
Also, the documentation doesn't mention what the default value for the third argument of avcodec_open2 should be, but the source code seems to take into account that options can be NULL.
Output of av_dump_format is as follows:
Input #0, rtsp, from 'rtsp://xx.xx.xx.xx:xx/video.pro1':
Metadata:
title : QStream
comment : QStreaming Media
Duration: N/A, start: 0.000000, bitrate: 64 kb/s
Stream #0:0: Video: h264 (Baseline), yuvj420p(pc), 1920x1080, 30 fps, 25 tbr, 90k tbn, 60 tbc
Stream #0:1: Audio: pcm_mulaw, 8000 Hz, 1 channels, s16, 64 kb/s
First, what does av_dump_format show? Are you sure your video stream codec is H264? You are trying to open the codec as if it were H264.
To open any codec, change your avcodec_find_decoder call to pass it the source codec id:
codec = avcodec_find_decoder(formatContext->streams[videoStreamIndex]->codec->codec_id);
By the way (ignore this if you are sticking with C# rather than the C++ code): you do not need to make a copy of the initial AVCodecContext when you are looking for the video stream. You can do the following (note that you may want to keep a pointer to the initial codec context, see below):
AVCodecContext* c = s->codec;
if (c->codec_type == AVMEDIA_TYPE_VIDEO) {
videoStreamIndex = i;
initialVideoCodecCtx = c;
}
Next point, not really relevant in this case: instead of looping through all the streams, FFmpeg has a helper function for it:
int videoStreamIndex = av_find_best_stream(formatContext, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);
Last point: I think the first point alone should be enough to make avcodec_open2 work, but you might not be able to decode your stream. You opened the codec for the new codec context, but no codec is opened for the initial context. Why did you make a copy of the initial codec context? That is useful if you want to record your stream to another file (i.e. transcode), but if you only want to decode your stream, it is much easier to use the initial context and pass it, instead of the new one, to avcodec_decode_video2.
To sum up, replace your code after avformat_find_stream_info with the following (warning: no error checking):
int videoStreamIndex = av_find_best_stream(formatContext, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);
AVCodecContext* codecCtx = formatContext->streams[videoStreamIndex]->codec;
AVCodec* codec = avcodec_find_decoder(codecCtx->codec_id);
// tune codecCtx if you want special decoding options. See FFmpeg docs for a list of members
if (avcodec_open2(codecCtx, codec, null) < 0) {
return;
}
// use av_read_frame(formatContext, ...) to read packets
// use avcodec_decode_video2(codecCtx, ...) to decode packets
If avcodec_open2 does not fail, and you still see width and height being 0 this might be expected. Notice that the stream (frame) dimensions are not always known until you actually start decoding.
You should use the AVFrame values in order to initialize your decoding buffers, after your first avcodec_decode_video2 decoding call.

{OpenAL(+FFmpeg)} How to queue variable size buffer due to ogg format?

(First of all, I apologize for my poor English; it is not my native language.)
I use FFmpeg to decode audio files and play them with OpenAL by streaming (i.e. the "queue" and "unqueue" functions of OpenAL).
When I play an .ogg file, I find that nb_samples varies (because Ogg has a variable bit rate?). There are frames with nb_samples of 128, 512 and 1024. As a result, I must call alDeleteBuffers and alGenBuffers before I call alBufferSamplesSOFT (similar to alBufferData), because alBufferSamplesSOFT fails unless I recreate the buffer.
Note: alBufferSamplesSOFT is provided by OpenAL Soft. You can treat it as alBufferData.
Nevertheless, I think doing this is clumsy and inefficient. Is there a smarter method? Here is the relevant part of the code:
while (av_read_frame(...) == 0){
avcodec_decode_audio4(...);
swr_convert(...); // to convert PCM format from FLTP to FLT
alDeleteBuffers(1, theBuffers[?]);
alGenBuffers(1, theBuffers[?]);
alBufferSamplesSOFT(...); // put those data into OpenAL buffer
}
If I don't do this, updating the OpenAL buffer fails. Is there a way to create a variable-size buffer or one big buffer? Or is there a way to change the size of a buffer?
Thanks, everyone.
You can rely on the FIFO functionality of libavresample (supported by FFmpeg), such as:
... decode the audio into srcaudio frame...
// Resample the input into the audioSampleBuffer until the whole decoded data has been processed
if ( (err = avresample_convert( audioResampleCtx,
NULL,
0,
0,
srcaudio.data,
0,
srcaudio.nb_samples )) < 0 )
{
warning( "Error resampling decoded audio: %d", err );
return -1;
}
while( avresample_available( audioResampleCtx ) >= audioFrame->nb_samples )
{
// Read a frame audio data from the resample fifo
if ( avresample_read( audioResampleCtx, audioFrame->data, audioFrame->nb_samples ) != audioFrame->nb_samples )
{
warning( "Error reading resampled audio: %d", err );
return -1;
}
// ... audioFrame now holds exactly audioFrame->nb_samples samples ...
}
This way you always get a fixed-size buffer back from avresample_read. It also helps when the conversion produces more samples from a source frame than nb_samples.
The code piece is from my Karaoke Lyrics Editor; you can check the respective source file here if you need more information.
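The FIFO idea in this answer does not depend on libavresample. As a rough illustration, here is a minimal sketch of a FIFO that takes decoded frames of any length and returns fixed-size chunks, so every OpenAL buffer can keep the same size. SampleFifo is a hypothetical class written for this example, and it assumes mono (or already interleaved) float samples and a chunk size greater than zero. FFmpeg's own AVAudioFifo (av_audio_fifo_write / av_audio_fifo_read) serves a similar purpose.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical helper: accumulates decoded samples of any length and hands
// them back in fixed-size chunks.
class SampleFifo {
public:
    // chunk_size is assumed to be greater than zero.
    explicit SampleFifo(std::size_t chunk_size) : chunk_size_(chunk_size) {}

    // Append one decoded frame; count may differ from call to call.
    void push(const float* samples, std::size_t count) {
        fifo_.insert(fifo_.end(), samples, samples + count);
    }

    // True when at least one full chunk is waiting.
    bool ready() const { return fifo_.size() >= chunk_size_; }

    // Remove and return exactly chunk_size samples. Call only when ready().
    std::vector<float> pop() {
        const auto n = static_cast<std::ptrdiff_t>(chunk_size_);
        std::vector<float> out(fifo_.begin(), fifo_.begin() + n);
        fifo_.erase(fifo_.begin(), fifo_.begin() + n);
        return out;
    }

    // Number of samples currently buffered.
    std::size_t size() const { return fifo_.size(); }

private:
    std::size_t chunk_size_;
    std::deque<float> fifo_;
};
```

In the decode loop you would push each converted frame and then, while ready() is true, pop a chunk and queue it on an OpenAL buffer. Samples left over at the end of the file still need to be flushed separately.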

Parsing FFMpeg AVPackets into h264 nal units

I am using FFmpeg to encode live video and stream it using Live555. I am able to encode the video and get the output AVPackets.
1. Convert the BGR Image to YUV422P format using FFMpeg's SWScale
// initialize a BGR to YUV422P converter using FFmpeg
ctx = sws_getContext(codecContext->width, codecContext->height, AV_PIX_FMT_BGR24, codecContext->width, codecContext->height, AV_PIX_FMT_YUV422P, SWS_BICUBIC, 0, 0, 0);
tempFrame = av_frame_alloc();
int num_bytes = avpicture_get_size(PIX_FMT_BGR24, codecContext->width, codecContext->height);
uint8_t* frame2_buffer = (uint8_t*)av_malloc(num_bytes*sizeof(uint8_t));
avpicture_fill((AVPicture*)tempFrame, frame2_buffer, PIX_FMT_BGR24, codecContext->width, codecContext->height);
// inside the loop of where frames are being encoded where rawFrame is a BGR image
tempFrame->data[0] = reinterpret_cast<uint8_t*>(rawFrame->_data);
sws_scale(ctx, tempFrame->data, tempFrame->linesize, 0, frame->height, frame->data, frame->linesize);
For encoding each frame:
ret = avcodec_encode_video2(codecContext, &packet, frame, &got_output);
if(ret < 0)
{
fprintf(stderr, "Error in encoding frame\n");
exit(1);
}
if(got_output)
{
//printf("Received frame! pushing to queue\n");
OutputFrame *outFrame = new OutputFrame();
outFrame->_data = packet.buf->data;
outFrame->_bufferSize = packet.buf->size;
outputQueue.push_back(outFrame);
}
Up to here it works fine. I am able to write these frames to a file and play it with VLC. After this I have to pass the output frames to Live555. I think the AVPackets I am getting here are not necessarily single H264 NAL units, which is what Live555 requires.
How do I break an AVPacket into NAL units that can be passed to Live555?
H264VideoStreamDiscreteFramer expects data without the start code '\x00\x00\x00\x01'.
You need to remove the first 4 bytes, either in your LiveDeviceSource or by inserting a FramedFilter to do this job.
Alternatively, you can try using an H264VideoStreamFramer, like the testH264VideoStreamer test program.
If it helps, here is one of my attempts with live555, implementing an RTSP server fed from a V4L2 capture: https://github.com/mpromonet/h264_v4l2_rtspserver
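An encoder packet may hold several NAL units (for example SPS, PPS and an IDR slice), each preceded by a 3-byte or 4-byte start code. As a sketch of how such a packet could be split into individual NAL units with the start codes removed, consider the following. split_annexb is a hypothetical helper, not a live555 or FFmpeg function, and it assumes the input is a complete Annex-B buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: split an Annex-B buffer into NAL units, dropping the
// 00 00 01 and 00 00 00 01 start codes.
std::vector<std::vector<uint8_t>> split_annexb(const uint8_t* data, std::size_t size) {
    std::vector<std::vector<uint8_t>> nals;
    std::size_t nal_start = 0;
    bool open = false;  // becomes true once the first start code is seen
    std::size_t i = 0;
    while (i + 3 <= size) {
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
            if (open) {
                std::size_t end = i;
                // A NAL unit never ends in a zero byte, so zeros found here
                // belong to the next (4-byte) start code or to padding.
                while (end > nal_start && data[end - 1] == 0) --end;
                nals.emplace_back(data + nal_start, data + end);
            }
            nal_start = i + 3;
            open = true;
            i += 3;
        } else {
            ++i;
        }
    }
    // The last NAL unit runs to the end of the buffer.
    if (open && nal_start < size) nals.emplace_back(data + nal_start, data + size);
    return nals;
}
```

Each returned vector could then be delivered as one frame to the discrete framer. Bytes before the first start code are ignored.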

How can i mux H264 stream into MP4 file via libavformat

I want to build an application that first decodes a multimedia file (such as test.mp4, whose video codec is H264) to get a video stream and an audio stream, then modifies the audio stream, and finally encodes the video stream (using libx264) and the audio stream into a result file (result.mp4). To improve efficiency, I skipped decoding and re-encoding the video stream: I read each video packet with av_read_frame and write it directly to the result file with av_write_frame. But there is no picture in the output file, and the output file is fairly small.
I traced the FFmpeg code and found that in av_write_frame -> mov_write_packet -> ff_mov_write_packet, it calls ff_avc_parse_nal_units to obtain the size of the NAL unit, but the return value is very small (such as 208 bytes).
I found that the H264 stream in the MP4 file is not stored in Annex-B format, so the start code (0x000001) cannot be found. How can I convert the H264 stream to Annex-B format and make it work?
I added a start code at the beginning of every frame manually, but it still does not work.
Can anyone give me a hint? Thanks very much.
The following code is similar to mine:
// write the stream header, if any
av_write_header(pFormatCtxEnc);
.........
/**
* Init of Encoder and Decoder
*/
bool KeyFlag = false;
bool KeyFlagEx = false;
// Read frames and save frames to disk
int iPts = 1;
av_init_packet(&packet);
while(av_read_frame(pFormatCtxDec, &packet)>=0)
{
if (packet.flags == 1)
KeyFlag = true;
if (!KeyFlag)
continue;
if (m_bStop)
{
break;
}
// Is this a packet from the video stream?
if(packet.stream_index == videoStream)
{
currentframeNum ++;
if (progressCB != NULL && currentframeNum%20 == 0)
{
float fpercent = (float)currentframeNum/frameNum;
progressCB(fpercent,m_pUser);
}
if (currentframeNum >= beginFrame && currentframeNum <= endFrane)
{
if (packet.flags == 1)
KeyFlagEx = true;
if (!KeyFlagEx)
continue;
packet.dts = iPts ++;
av_write_frame(pFormatCtxEnc, &packet);
}
}
// Free the packet that was allocated by av_read_frame
}
// write the trailer, if any
av_write_trailer(pFormatCtxEnc);
/**
* Release of encoder and decoder
*/
return true;
You might try this: libavcodec/h264_mp4toannexb_bsf.c. It converts bitstream without start codes to bitstream with start codes.
Using your source file, does ffmpeg -i src.mp4 -vcodec copy -an dst.mp4 work? Does it work if you add -bsf h264_mp4toannexb? (all using the same version/build of ffmpeg as you are trying to use programmatically of course)
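To illustrate what "without start codes" means here: in MP4, each NAL unit is preceded by a big-endian length field instead of a start code. Below is a minimal sketch of the core rewrite, assuming a 4-byte length field (the actual field size is recorded in the stream's avcC extradata). avcc_to_annexb is a hypothetical helper, not FFmpeg's implementation; the real h264_mp4toannexb filter also inserts the SPS/PPS from the extradata, which this sketch does not do.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: rewrite a packet of 4-byte big-endian length-prefixed
// NAL units as Annex-B (each NAL unit preceded by 00 00 00 01).
// Returns an empty vector if a length field overruns the packet.
std::vector<uint8_t> avcc_to_annexb(const uint8_t* data, std::size_t size) {
    static const uint8_t start_code[4] = {0, 0, 0, 1};
    std::vector<uint8_t> out;
    std::size_t pos = 0;
    while (pos + 4 <= size) {
        // Read the big-endian NAL unit length.
        const std::size_t len = (static_cast<std::size_t>(data[pos]) << 24) |
                                (static_cast<std::size_t>(data[pos + 1]) << 16) |
                                (static_cast<std::size_t>(data[pos + 2]) << 8) |
                                static_cast<std::size_t>(data[pos + 3]);
        pos += 4;
        if (len > size - pos) return {};  // malformed packet
        out.insert(out.end(), start_code, start_code + 4);
        out.insert(out.end(), data + pos, data + pos + len);
        pos += len;
    }
    return out;
}
```

This also shows why prepending a single start code to each packet is not enough: the length fields inside the packet remain in place, and a packet can contain more than one NAL unit.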
