I use FFmpeg's MPEG-4 decoder. The decoder has the CODEC_CAP_DELAY capability, among others, which means it delivers decoded frames with a latency of one frame.
I have a set of MPEG-4 (I- and P-) frames from an AVI file and feed the FFmpeg decoder with these frames. For the very first I-frame the decoder gives me nothing, although it decodes the frame successfully. I can force the decoder to hand over the decoded frame with a second call to avcodec_decode_video2, passing nulls (flushing it), but if I do that for every frame I get artifacts in the first group of pictures (e.g. the second decoded P-frame comes out gray).
If I don't force the decoder to return the decoded frame immediately, it works flawlessly and without artifacts.
Question: is it possible to get the decoded frame without giving the decoder the next frame, and without artifacts?
A small example of how decoding is implemented for each frame:
// decode
int got_frame = 0;
int err = 0;
int tries = 5;
do
{
    err = avcodec_decode_video2(m_CodecContext, m_Frame, &got_frame, &m_Packet);
    /* Some codecs, such as MPEG, transmit the I and P frames with a
       latency of one frame. You must do the following to have a
       chance to get the last frame of the video. */
    m_Packet.data = NULL;
    m_Packet.size = 0;
    --tries;
}
while (err >= 0 && got_frame == 0 && tries > 0);
But as I said, that gives me artifacts for the first GOP.
Use the "-flags +low_delay" option (or in code, set AVCodecContext.flags |= CODEC_FLAG_LOW_DELAY).
I tested several options, and "-flags low_delay" and "-probesize 32" mattered more than the others. The code below worked for me.
AVDictionary* avDic = nullptr;
av_dict_set(&avDic, "flags", "low_delay", 0);
av_dict_set(&avDic, "probesize", "32", 0);
const int errorCode = avformat_open_input(&pFormatCtx, mUrl.c_str(), nullptr, &avDic);
Context
I'm attempting to extract raw image data for each I-frame from an MPEG-2 transport stream carrying H.264 in Annex-B format. The video contains an I-frame every 2 seconds. I've read that an I-frame can be found after a NAL unit start code with a type of 5 (i.e. "Coded slice of an IDR picture"). The byte payload of these NAL units contains all the data necessary to construct a full frame, albeit, to my understanding, in H.264-encoded form.
I would like to build a solution that extracts these I-frames from an incoming byte stream by finding the NAL units that contain them, saving the payload, and decoding the payload into some ubiquitous raw image format to access pixel data etc.
Note: I would like to avoid depending on external binaries like ffmpeg if possible, and more importantly if feasible!
PoC
So far I have built a PoC in Rust to find the byte offset and byte size of I-frames:
use std::fs::File;
use std::io::{prelude::*, BufReader};

extern crate image;

fn main() {
    let file = File::open("vodpart-0.ts").unwrap();
    let reader = BufReader::new(file);
    let mut idr_payload = Vec::<u8>::new();
    let mut total_idr_frame_count: u16 = 0;
    let mut is_idr_payload = false;
    let mut is_nalu_type_code = false;
    let mut start_code_vec = Vec::<u8>::new();
    for (pos, byte_result) in reader.bytes().enumerate() {
        let byte = byte_result.unwrap();
        if is_nalu_type_code {
            is_idr_payload = false;
            is_nalu_type_code = false;
            start_code_vec.clear();
            if byte == 101 {
                is_idr_payload = true;
                total_idr_frame_count += 1;
                println!("Found IDR picture at byte offset {}", pos);
            }
            continue;
        }
        if is_idr_payload {
            idr_payload.push(byte);
        }
        if byte == 0 {
            start_code_vec.push(byte);
            continue;
        }
        if byte == 1 && start_code_vec.len() >= 2 {
            if is_idr_payload {
                let payload = idr_payload.len() - start_code_vec.len() + 1;
                println!("Previous NALu payload is {} bytes long\n", payload);
                save_image(idr_payload.as_slice(), total_idr_frame_count);
                idr_payload.clear();
            }
            is_nalu_type_code = true;
            continue;
        }
        start_code_vec.clear();
    }
    println!();
    println!("total i frame count: {}", total_idr_frame_count);
    println!();
    println!("done!");
}

fn save_image(buffer: &[u8], index: u16) {
    let image_name = format!("image-{}.jpg", index);
    image::save_buffer(image_name, buffer, 858, 480, image::ColorType::Rgb8).unwrap()
}
The result of which looks like:
Found IDR picture at byte offset 870
Previous NALu payload is 202929 bytes long
Found IDR picture at byte offset 1699826
Previous NALu payload is 185069 bytes long
Found IDR picture at byte offset 3268686
Previous NALu payload is 145218 bytes long
Found IDR picture at byte offset 4898270
Previous NALu payload is 106114 bytes long
Found IDR picture at byte offset 6482358
Previous NALu payload is 185638 bytes long
total i frame count: 5
done!
This is correct: based on my research using H.264 bitstream viewers etc., there are definitely 5 I-frames at those byte offsets!
The issue is that I don't understand how to convert the H.264 bytestream payload to a raw RGB image data format. The resulting images, once converted to JPEG, are just a fuzzy mess that takes up roughly 10% of the image area.
Questions
Is there a decoding step that needs to be performed?
Am I approaching this correctly and is this feasible to attempt myself, or should I be relying on another lib?
Any help would be greatly appreciated!
“Is there a decoding step that needs to be performed?”
Yes. And writing a decoder from scratch is EXTREMELY complicated: the document that describes it (ISO 14496-10) is over 750 pages long. You should use a library, and libavcodec from FFmpeg is really your only option. (Unless you only need baseline profile, in which case you can use the open-source decoder from Android.)
You can compile a custom version of libavcodec to exclude things you don’t need.
I am trying to create a live RTMP stream containing animation generated with NVIDIA OptiX. The stream is received by nginx + the RTMP module and broadcast in MPEG-DASH format. The full chain, up to the dash.js player, works if the video is first saved to an .flv file and then sent with ffmpeg, without any reformatting, using the command:
ffmpeg -re -i my_video.flv -c:v copy -f flv rtmp://x.x.x.x:1935/dash/test
But I want to stream directly from the code, and with this I am failing... Nginx logs the error "dash: invalid avcc received (2: No such file or directory)". It then seems to receive the stream correctly (segments are rolling, the DASH manifest is there), but the stream cannot be played in the browser.
I can see only one difference in the manifest between the direct stream and the stream from the file: the codecs attribute of the representation is missing for the direct stream. I get codecs="avcc1.000000" instead of the "avc1.640028" I get when streaming from the file.
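For context on that attribute: the codecs value is the RFC 6381 string avc1.PPCCLL, built from the profile_idc, constraint-flag, and level_idc bytes of the SPS, the same three bytes that follow the version byte of the avcC header. A minimal sketch of the formatting (my own illustrative helper, not an FFmpeg API):

```cpp
#include <cstdio>
#include <string>

// Format the RFC 6381 "avc1.PPCCLL" codecs string from the three
// profile/constraint/level bytes of an H.264 SPS.
std::string avc1_codec_string(unsigned profile_idc,
                              unsigned constraint_flags,
                              unsigned level_idc) {
    char buf[16];
    std::snprintf(buf, sizeof(buf), "avc1.%02x%02x%02x",
                  profile_idc & 0xFF, constraint_flags & 0xFF, level_idc & 0xFF);
    return std::string(buf);
}
```

High profile (0x64), no constraint flags, level 4.0 (0x28) yields "avc1.640028", matching the file-based stream; when the muxer never sees valid codec configuration, those bytes stay zero, which lines up with the bogus "avcc1.000000".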
My code opens the stream:
av_register_all();
AVOutputFormat* fmt = av_guess_format("flv", file_name, nullptr);
fmt->video_codec = AV_CODEC_ID_H264;
AVFormatContext* _oc;
avformat_alloc_output_context2(&_oc, fmt, nullptr, "rtmp://x.x.x.x:1935/dash/test");
AVStream* _vs = avformat_new_stream(_oc, nullptr);
_vs->id = 0;
_vs->time_base = AVRational{ 1, 25 };
_vs->avg_frame_rate = AVRational{ 25, 1 };
AVCodecParameters* vpar = _vs->codecpar;
vpar->codec_id = fmt->video_codec;
vpar->codec_type = AVMEDIA_TYPE_VIDEO;
vpar->format = AV_PIX_FMT_YUV420P;
vpar->profile = FF_PROFILE_H264_HIGH;
vpar->level = _level;
vpar->width = _width;
vpar->height = _height;
vpar->bit_rate = _avg_bitrate;
avio_open(&_oc->pb, _oc->filename, AVIO_FLAG_WRITE);
avformat_write_header(_oc, nullptr);
Width, height, bitrate, level and profile I get from the NVENC encoder settings. I also do error checking, omitted here. Then I have a loop writing each encoded packet, with IDR frames etc. all prepared on the fly with NVENC. The loop body is:
auto & pkt_data = _packets[i];
AVPacket pkt = { 0 };
av_init_packet(&pkt);
pkt.pts = av_rescale_q(_n_frames++, AVRational{ 1, 25 }, _vs->time_base);
pkt.duration = av_rescale_q(1, AVRational{ 1, 25 }, _vs->time_base);
pkt.dts = pkt.pts;
pkt.stream_index = _vs->index;
pkt.data = pkt_data.data();
pkt.size = (int)pkt_data.size();
if (!memcmp(pkt_data.data(), "\x00\x00\x00\x01\x67", 5))
{
    pkt.flags |= AV_PKT_FLAG_KEY;
}
av_write_frame(_oc, &pkt);
Obviously ffmpeg writes the avcc data somewhere... I have no clue where to add it so that the RTMP server can recognize it. Or am I missing something else?
Any hint greatly appreciated, folks!
Thanks to Gyan's comment I was able to solve the issue. Following the AV_CODEC_FLAG_GLOBAL_HEADER flag in the wrapper, one can see how the global header is added, which was missing in my case. You could call the NVENC API function nvEncGetSequenceParams directly, but since I am using the SDK anyway, this is a bit cleaner.
So I had to attach the header to AVCodecParameters::extradata:
std::vector<uint8_t> payload;
_encoder->GetSequenceParams(payload);
vpar->extradata_size = payload.size();
vpar->extradata = (uint8_t*)av_mallocz(payload.size() + AV_INPUT_BUFFER_PADDING_SIZE);
memcpy(vpar->extradata, payload.data(), payload.size());
_encoder is my instance of NvEncoder from SDK.
The wrapper does the same thing, but via the deprecated AVCodecContext struct.
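For reference, what the muxer derives from that extradata is an AVCDecoderConfigurationRecord ("avcC", defined in ISO/IEC 14496-15). A rough sketch of its layout for a single SPS and PPS, payloads given without Annex-B start codes (illustrative only; FFmpeg assembles this internally):

```cpp
#include <cstdint>
#include <vector>

// Minimal AVCDecoderConfigurationRecord for one SPS and one PPS,
// whose payloads are given without Annex-B start codes.
std::vector<uint8_t> build_avcc(const std::vector<uint8_t>& sps,
                                const std::vector<uint8_t>& pps) {
    std::vector<uint8_t> r;
    r.push_back(1);           // configurationVersion
    r.push_back(sps[1]);      // AVCProfileIndication (byte after the NAL header)
    r.push_back(sps[2]);      // profile_compatibility
    r.push_back(sps[3]);      // AVCLevelIndication
    r.push_back(0xFF);        // reserved bits + lengthSizeMinusOne = 3
    r.push_back(0xE1);        // reserved bits + numOfSequenceParameterSets = 1
    r.push_back(static_cast<uint8_t>(sps.size() >> 8));    // SPS length (big-endian)
    r.push_back(static_cast<uint8_t>(sps.size() & 0xFF));
    r.insert(r.end(), sps.begin(), sps.end());
    r.push_back(1);           // numOfPictureParameterSets
    r.push_back(static_cast<uint8_t>(pps.size() >> 8));    // PPS length (big-endian)
    r.push_back(static_cast<uint8_t>(pps.size() & 0xFF));
    r.insert(r.end(), pps.begin(), pps.end());
    return r;
}
```

Bytes 1-3 of this record are exactly the profile/constraint/level triple that ends up in the manifest's codecs attribute, which is why the stream worked once the SPS/PPS header was attached.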
I have a strange problem in my C/C++ FFmpeg transcoder, which takes an input MP4 (varying input codecs) and produces an output MP4 (x264 baseline & AAC-LC at a 44100 Hz sample rate with libfdk_aac):
The resulting MP4 has fine video (x264), and the audio (AAC-LC) works fine as well, but only plays until exactly the midpoint of the video.
The audio is not slowed down, not stretched, and doesn't stutter. It just stops right in the middle of the video.
One hint may be that the input file has a sample rate of 22050 Hz, and 22050/44100 is 0.5, but I really don't get why that would make the sound simply stop after half the time. I'd expect such an error to result in sound at the wrong speed. Everything works just fine if I don't try to enforce 44100 and instead just use the incoming sample_rate.
Another guess would be that the pts calculation doesn't work. But the audio sounds just fine (until it stops), and I do exactly the same for the video part, where it works flawlessly. "Exactly", as in the same code, but with "audio" variables replaced by "video" variables.
FFmpeg reports no errors during the whole process. I also flush the decoders/encoders/interleaved writing after all the packet reading from the input is done. It works well for the video, so I doubt there is much wrong with my general approach.
Here are the functions of my code (stripped off the error handling & other class stuff):
AudioCodecContext Setup
outContext->_audioCodec = avcodec_find_encoder(outContext->_audioTargetCodecID);
outContext->_audioStream =
    avformat_new_stream(outContext->_formatContext, outContext->_audioCodec);
outContext->_audioCodecContext = outContext->_audioStream->codec;
outContext->_audioCodecContext->channels = 2;
outContext->_audioCodecContext->channel_layout = av_get_default_channel_layout(2);
outContext->_audioCodecContext->sample_rate = 44100;
outContext->_audioCodecContext->sample_fmt = outContext->_audioCodec->sample_fmts[0];
outContext->_audioCodecContext->bit_rate = 128000;
outContext->_audioCodecContext->strict_std_compliance = FF_COMPLIANCE_EXPERIMENTAL;
outContext->_audioCodecContext->time_base =
    (AVRational){1, outContext->_audioCodecContext->sample_rate};
outContext->_audioStream->time_base =
    (AVRational){1, outContext->_audioCodecContext->sample_rate};

int retVal = avcodec_open2(outContext->_audioCodecContext, outContext->_audioCodec, NULL);
Resampler Setup
outContext->_audioResamplerContext =
    swr_alloc_set_opts(NULL,
                       outContext->_audioCodecContext->channel_layout,
                       outContext->_audioCodecContext->sample_fmt,
                       outContext->_audioCodecContext->sample_rate,
                       _inputContext._audioCodecContext->channel_layout,
                       _inputContext._audioCodecContext->sample_fmt,
                       _inputContext._audioCodecContext->sample_rate,
                       0, NULL);
int retVal = swr_init(outContext->_audioResamplerContext);
Decoding
decodedBytes = avcodec_decode_audio4(_inputContext._audioCodecContext,
                                     _inputContext._audioTempFrame,
                                     &p_gotAudioFrame, &_inputContext._currentPacket);
Converting (only if decoding produced a frame, of course)
int retVal = swr_convert(outContext->_audioResamplerContext,
                         outContext->_audioConvertedFrame->data,
                         outContext->_audioConvertedFrame->nb_samples,
                         (const uint8_t**)_inputContext._audioTempFrame->data,
                         _inputContext._audioTempFrame->nb_samples);
Encoding (only if decoding produced a frame, of course)
outContext->_audioConvertedFrame->pts =
    av_frame_get_best_effort_timestamp(_inputContext._audioTempFrame);

// Init the new packet
av_init_packet(&outContext->_audioPacket);
outContext->_audioPacket.data = NULL;
outContext->_audioPacket.size = 0;

// Encode
int retVal = avcodec_encode_audio2(outContext->_audioCodecContext,
                                   &outContext->_audioPacket,
                                   outContext->_audioConvertedFrame,
                                   &p_gotPacket);

// Set pts/dts time stamps for interleaved writing
av_packet_rescale_ts(&outContext->_audioPacket,
                     outContext->_audioCodecContext->time_base,
                     outContext->_audioStream->time_base);
outContext->_audioPacket.stream_index = outContext->_audioStream->index;
Writing (only if encoding produced a packet, of course)
int retVal = av_interleaved_write_frame(outContext->_formatContext, &outContext->_audioPacket);
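As a side note, the av_packet_rescale_ts call above is just rational rescaling of the timestamps. A sketch of the arithmetic for positive values (my own helper for illustration, not FFmpeg's implementation):

```cpp
#include <cstdint>

// The arithmetic behind av_rescale_q for positive values: convert a
// tick count from time base bq_num/bq_den to time base cq_num/cq_den,
// rounding to the nearest tick (as AV_ROUND_NEAR_INF does for positives).
int64_t rescale_q_sketch(int64_t a, int64_t bq_num, int64_t bq_den,
                         int64_t cq_num, int64_t cq_den) {
    int64_t num = a * bq_num * cq_den;
    int64_t den = bq_den * cq_num;
    return (num + den / 2) / den;
}
```

For example, one second of audio pts (44100 ticks in a 1/44100 codec time base) becomes 90000 ticks in a hypothetical 1/90000 stream time base. In the setup above, codec and stream time bases are both 1/44100, so the call is effectively a no-op.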
I am quite out of ideas about what would cause such a behaviour.
So, I finally managed to figure things out myself.
The problem was indeed in the difference of the sample_rate.
You'd assume that a call to swr_convert() would give you all the samples you need for converting the audio frame when called like I did.
Of course, that would be too easy.
Instead, you need to call swr_convert (potentially) multiple times per frame and buffer its output, if required. Then you need to grab a single frame from the buffer and that is what you will have to encode.
Here is my new convertAudioFrame function:
// Calculate the number of output samples
int numOutputSamples = av_rescale_rnd(
    swr_get_delay(outContext->_audioResamplerContext,
                  _inputContext._audioCodecContext->sample_rate)
        + _inputContext._audioTempFrame->nb_samples,
    outContext->_audioCodecContext->sample_rate,
    _inputContext._audioCodecContext->sample_rate,
    AV_ROUND_UP);
if (numOutputSamples == 0)
{
    return;
}

uint8_t* tempSamples;
av_samples_alloc(&tempSamples, NULL,
                 outContext->_audioCodecContext->channels, numOutputSamples,
                 outContext->_audioCodecContext->sample_fmt, 0);

int retVal = swr_convert(outContext->_audioResamplerContext,
                         &tempSamples,
                         numOutputSamples,
                         (const uint8_t**)_inputContext._audioTempFrame->data,
                         _inputContext._audioTempFrame->nb_samples);

// Write to the audio fifo
if (retVal > 0)
{
    retVal = av_audio_fifo_write(outContext->_audioFifo, (void**)&tempSamples, retVal);
}
av_freep(&tempSamples);

// Get a frame from the audio fifo
int samplesAvailable = av_audio_fifo_size(outContext->_audioFifo);
if (samplesAvailable > 0)
{
    retVal = av_audio_fifo_read(outContext->_audioFifo,
                                (void**)outContext->_audioConvertedFrame->data,
                                outContext->_audioCodecContext->frame_size);
    // We got a frame, so also set its pts
    if (retVal > 0)
    {
        p_gotConvertedFrame = 1;
        if (_inputContext._audioTempFrame->pts != AV_NOPTS_VALUE)
        {
            outContext->_audioConvertedFrame->pts = _inputContext._audioTempFrame->pts;
        }
        else if (_inputContext._audioTempFrame->pkt_pts != AV_NOPTS_VALUE)
        {
            outContext->_audioConvertedFrame->pts = _inputContext._audioTempFrame->pkt_pts;
        }
    }
}
I basically call this function until there are no more frames in the audio fifo buffer.
So the audio was only half as long because I only encoded as many frames as I decoded, when I actually needed to encode twice as many frames due to the doubled sample_rate.
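The frame-count doubling follows directly from the av_rescale_rnd arithmetic above. A self-contained sketch of that sample accounting (assuming the usual AAC frame size of 1024 samples):

```cpp
#include <cstdint>

// Output sample count for one converted frame, as computed by the
// av_rescale_rnd(..., AV_ROUND_UP) call above: scale the pending input
// samples by out_rate/in_rate, rounding up.
int64_t out_samples(int64_t delay, int64_t in_samples,
                    int64_t out_rate, int64_t in_rate) {
    int64_t num = (delay + in_samples) * out_rate;
    return (num + in_rate - 1) / in_rate;   // round up
}
```

With no pending resampler delay, a 1024-sample frame at 22050 Hz becomes 2048 samples at 44100 Hz, i.e. two 1024-sample AAC frames per decoded frame; encoding only one output frame per input frame is exactly what produced audio of half the length.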
I am using FFmpeg to decode live video and stream it using Live555. I am able to decode the video and I'm getting the output AVPackets.
1. Convert the BGR image to YUV422P format using FFmpeg's swscale
// initialize a BGR-to-YUV422P converter using FFmpeg
ctx = sws_getContext(codecContext->width, codecContext->height, AV_PIX_FMT_BGR24,
                     codecContext->width, codecContext->height, AV_PIX_FMT_YUV422P,
                     SWS_BICUBIC, 0, 0, 0);
tempFrame = av_frame_alloc();
int num_bytes = avpicture_get_size(PIX_FMT_BGR24, codecContext->width, codecContext->height);
uint8_t* frame2_buffer = (uint8_t*)av_malloc(num_bytes * sizeof(uint8_t));
avpicture_fill((AVPicture*)tempFrame, frame2_buffer, PIX_FMT_BGR24, codecContext->width, codecContext->height);

// inside the loop where frames are being encoded, rawFrame being a BGR image
tempFrame->data[0] = reinterpret_cast<uint8_t*>(rawFrame->_data);
sws_scale(ctx, tempFrame->data, tempFrame->linesize, 0, frame->height, frame->data, frame->linesize);
2. Encode each frame
ret = avcodec_encode_video2(codecContext, &packet, frame, &got_output);
if (ret < 0)
{
    fprintf(stderr, "Error in encoding frame\n");
    exit(1);
}
if (got_output)
{
    //printf("Received frame! pushing to queue\n");
    OutputFrame* outFrame = new OutputFrame();
    outFrame->_data = packet.buf->data;
    outFrame->_bufferSize = packet.buf->size;
    outputQueue.push_back(outFrame);
}
Up to here it works fine: I am able to write these frames to a file and play it using VLC. After this I have to pass the output frames to Live555. I think the AVPackets I am getting here are not necessarily single H.264 NAL units, which is what Live555 requires.
How can I break an AVPacket into NAL units that can be passed to Live555?
H264VideoStreamDiscreteFramer expects data without the start code '\x00\x00\x00\x01'.
You need to remove the first 4 bytes, either in your LiveDeviceSource or by inserting a FramedFilter to do this job.
Alternatively, you can try to use an H264VideoStreamFramer, like the testH264VideoStreamer test program does.
If it helps, you can find one of my attempts with live555, implementing an RTSP server fed from V4L2 capture: https://github.com/mpromonet/h264_v4l2_rtspserver
I want to build an application that first decodes a multimedia file (such as a test.mp4 file whose video codec is H264) into a video stream and an audio stream, then makes some changes to the audio stream, and finally encodes the video stream (using libx264) and the audio stream into a result file (result.mp4). To improve efficiency, I omitted the decoding and encoding of the video stream: I get each video packet via the function av_read_frame, then output it directly into the result file via the function av_write_frame. But there is no picture in the output file, and the size of the output file is fairly small.
I traced the ffmpeg code and found that av_write_frame -> mov_write_packet -> ff_mov_write_packet calls the function ff_avc_parse_nal_units to obtain the size of each NAL unit, but the return value is very small (e.g. 208 bytes).
I found that the H264 stream in the MP4 file is not stored in Annex-B format, so the muxer can't find the start code (0x000001). My problem now is how to convert the H264 stream to Annex-B format and make it work.
I added a start code at the beginning of every frame manually, but it still doesn't work.
Can anyone give me a hint? Thanks very much.
Following is code similar to mine:
// write the stream header, if any
av_write_header(pFormatCtxEnc);
.........
/**
 * Init of encoder and decoder
 */
bool KeyFlag = false;
bool KeyFlagEx = false;

// Read frames and write them to the output
int iPts = 1;
av_init_packet(&packet);
while (av_read_frame(pFormatCtxDec, &packet) >= 0)
{
    if (packet.flags == 1)
        KeyFlag = true;
    if (!KeyFlag)
        continue;
    if (m_bStop)
    {
        break;
    }
    // Is this a packet from the video stream?
    if (packet.stream_index == videoStream)
    {
        currentframeNum++;
        if (progressCB != NULL && currentframeNum % 20 == 0)
        {
            float fpercent = (float)currentframeNum / frameNum;
            progressCB(fpercent, m_pUser);
        }
        if (currentframeNum >= beginFrame && currentframeNum <= endFrane)
        {
            if (packet.flags == 1)
                KeyFlagEx = true;
            if (!KeyFlagEx)
                continue;
            packet.dts = iPts++;
            av_write_frame(pFormatCtxEnc, &packet);
        }
    }
    // Free the packet that was allocated by av_read_frame
    av_free_packet(&packet);
}
// write the trailer, if any
av_write_trailer(pFormatCtxEnc);
/**
 * Release of encoder and decoder
 */
return true;
You might try this: libavcodec/h264_mp4toannexb_bsf.c. It converts a bitstream without start codes to a bitstream with start codes.
Using your source file, does ffmpeg -i src.mp4 -vcodec copy -an dst.mp4 work? Does it work if you add -bsf h264_mp4toannexb? (all using the same version/build of ffmpeg as you are trying to use programmatically of course)